* Large repo and pack.packsizelimit
From: th.acker66 @ 2012-05-03 11:57 UTC
To: git
Hello,
I am using MSysgit 1.7.9 on WinXP 32bit and have a very large repo
(10GB in .git; 20GB in the source tree). I had to set
pack.packsizelimit=1024MB to prevent "out of memory" during repacking
in git-gc, and everything seemed to work fine.
When I tried to clone this repo, an "out of memory" error occurred
because the packs to be transferred by the git protocol are not limited
by pack.packsizelimit. I "fixed" this by setting
transfer.unpackLimit=100000 and thus transferring only loose objects.
This is very slow, but it works.
In this cloned repo, git-gc again causes "out of memory" because it
tries to pack all loose objects in one go, seemingly not respecting
pack.packsizelimit.
(Setting --window-memory=512m in git-repack did not help here.)
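In config terms, what I did boils down to this (the values are the ones
described above):

    git config pack.packsizelimit 1024m     # cap packs written by repack/gc
    git config transfer.unpackLimit 100000  # explode fetched objects into
                                            # loose objects instead of a pack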
Am I doing anything wrong here, or is this a bug/feature in git?
BTW1: The repo is very large but contains only one really large file
(1.2GB); all other files are smaller than 256MB.
BTW2: I cannot use 1.7.10 due to the HTTP authorization bug.
---
Thomas
* Re: Large repo and pack.packsizelimit
From: Jeff King @ 2012-05-08 20:31 UTC
To: th.acker66; +Cc: Nicolas Pitre, git
On Thu, May 03, 2012 at 01:57:58PM +0200, th.acker66@arcor.de wrote:
> I am using MSysgit 1.7.9 on WinXP 32bit and have a very large repo
> (10GB in .git; 20GB in source tree). I had to set
> pack.packsizelimit=1024MB to prevent "out of memory" during repacking
> in git-gc and everything seemed to work fine.
>
> When I tried to clone this repo an "out of memory" occurred because the
> packs to be transferred by the git protocol are not limited by
> pack.packsizelimit.
Yes, pack-objects respects pack.packsizelimit when creating local packs,
but incoming packs from the network (which are processed by index-pack)
are not split.
This should be fixed in git. Unfortunately, I don't know that it is as
trivial as just splitting the incoming stream; we would also have to
make sure that there were no cross-pack deltas in the result.
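Concretely, the asymmetry looks something like this (a sketch; the repo
URL is made up):

    # Writing packs locally: pack-objects honors the limit and emits
    # several packs, each no larger than the configured size.
    git -c pack.packSizeLimit=1024m repack -a -d

    # Receiving packs: the data arrives as one stream, and index-pack
    # stores it as a single pack; pack.packSizeLimit is not consulted.
    git clone git://example.com/big-repo.git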
-Peff
* Re: Large repo and pack.packsizelimit
From: Nicolas Pitre @ 2012-05-08 21:13 UTC
To: Jeff King; +Cc: th.acker66, git
On Tue, 8 May 2012, Jeff King wrote:
> On Thu, May 03, 2012 at 01:57:58PM +0200, th.acker66@arcor.de wrote:
>
> > I am using MSysgit 1.7.9 on WinXP 32bit and have a very large repo
> > (10GB in .git; 20GB in source tree). I had to set
> > pack.packsizelimit=1024MB to prevent "out of memory" during repacking
> > in git-gc and everything seemed to work fine.
> >
> > When I tried to clone this repo an "out of memory" occurred because the
> > packs to be transferred by the git protocol are not limited by
> > pack.packsizelimit.
>
> Yes, pack-objects respects pack.packsizelimit when creating local packs,
> but incoming packs from the network (which are processed by index-pack)
> are not split.
>
> This should be fixed in git. Unfortunately, I don't know that it is as
> trivial as just splitting the incoming stream; we would also have to
> make sure that there were no cross-pack deltas in the result.
IMHO this is the wrong fix. The pack size limit was created to deal
with storage media of limited capacity. In this case, the repack
process should be told to limit its memory usage, and index-pack should
simply be taught to cope.
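The write side can already be told to bound its memory appetite with the
existing knobs, e.g. (real settings, illustrative values):

    git -c pack.windowMemory=256m \
        -c pack.deltaCacheSize=128m \
        -c pack.threads=1 \
        repack -a -d -f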
Nicolas
* Re: Large repo and pack.packsizelimit
From: Jeff King @ 2012-05-08 21:20 UTC
To: Nicolas Pitre; +Cc: th.acker66, git
On Tue, May 08, 2012 at 05:13:13PM -0400, Nicolas Pitre wrote:
> > This should be fixed in git. Unfortunately, I don't know that it is as
> > trivial as just splitting the incoming stream; we would also have to
> > make sure that there were no cross-pack deltas in the result.
>
> IMHO this is the wrong fix. The pack size limit was created to deal
> with storage media of limited capacity. In this case, the repack
> process should be told to limit its memory usage, and index-pack
> should simply be taught to cope.
Hmm, you're right. I was thinking it helped to deal with memory
addressing issues for 32-bit systems, but I guess
core.packedGitWindowSize should be handling that. IOW, the 10G packfile
should work just fine for normal access.
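For example (both are real settings; the values are only illustrative,
32-bit-friendly choices):

    # Cap how much pack data gets mmap'ed at once for read access:
    git config core.packedGitWindowSize 32m
    git config core.packedGitLimit 256m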
However, the OP did say he got an "out of memory" error during the
clone. So maybe there is a problem to be fixed in index-pack there.
-Peff
* Re: Large repo and pack.packsizelimit
From: Nicolas Pitre @ 2012-05-08 21:52 UTC
To: Jeff King; +Cc: th.acker66, git
On Tue, 8 May 2012, Jeff King wrote:
> On Tue, May 08, 2012 at 05:13:13PM -0400, Nicolas Pitre wrote:
>
> > > This should be fixed in git. Unfortunately, I don't know that it is as
> > > trivial as just splitting the incoming stream; we would also have to
> > > make sure that there were no cross-pack deltas in the result.
> >
> > IMHO this is the wrong fix. The pack size limit was created to deal
> > with storage media of limited capacity. In this case, the repack
> > process should be told to limit its memory usage, and index-pack
> > should simply be taught to cope.
>
> Hmm, you're right. I was thinking it helped to deal with memory
> addressing issues for 32-bit systems, but I guess
> core.packedGitWindowSize should be handling that. IOW, the 10G packfile
> should work just fine for normal access.
>
> However, the OP did say he got an "out of memory" error during the
> clone. So maybe there is a problem to be fixed in index-pack there.
Was the OOM on the remote side (pack-objects) or on the local side
(index-pack)?
Nicolas
* Re: Large repo and pack.packsizelimit
From: Thomas @ 2012-05-09 9:36 UTC
To: git
Nicolas Pitre <nico <at> fluxnic.net> writes:
>
> On Tue, 8 May 2012, Jeff King wrote:
>
> > On Tue, May 08, 2012 at 05:13:13PM -0400, Nicolas Pitre wrote:
> >
> > > > This should be fixed in git. Unfortunately, I don't know that it is as
> > > > trivial as just splitting the incoming stream; we would also have to
> > > > make sure that there were no cross-pack deltas in the result.
> > >
> > > IMHO this is the wrong fix. The pack size limit was created to
> > > deal with storage media of limited capacity. In this case, the
> > > repack process should be told to limit its memory usage, and
> > > index-pack should simply be taught to cope.
> >
> > Hmm, you're right. I was thinking it helped to deal with memory
> > addressing issues for 32-bit systems, but I guess
> > core.packedGitWindowSize should be handling that. IOW, the 10G packfile
> > should work just fine for normal access.
> >
> > However, the OP did say he got an "out of memory" error during the
> > clone. So maybe there is a problem to be fixed in index-pack there.
>
> Was the OOM on the remote side (pack-objects) or on the local side
> (index-pack)?
>
> Nicolas
>
To be exact, I did the clone locally on the same machine, so the clone
itself worked, but I got the OOM during the first fetch. I "fixed" this
by setting transfer.unpackLimit=100000, which caused only loose objects
to be transferred. So in this case I think the OOM was on the remote
side. But there is another OOM if I try to repack locally.
It seems to me that neither pack-objects nor index-pack respects
pack.packsizelimit: both always try to pack everything (all objects to
be transferred, or all local loose objects) into one pack.
I could live with transfer.unpackLimit=100000, but the local OOM stops
me from using the cloned repo.
---
Thomas
* Re: Large repo and pack.packsizelimit
From: Nguyen Thai Ngoc Duy @ 2012-05-09 10:50 UTC
To: Thomas; +Cc: git
On Wed, May 9, 2012 at 4:36 PM, Thomas <th.acker66@arcor.de> wrote:
> To be exact, I did the clone locally on the same machine, so the clone
> itself worked, but I got the OOM during the first fetch. I "fixed"
> this by setting transfer.unpackLimit=100000, which caused only loose
> objects to be transferred. So in this case I think the OOM was on the
> remote side. But there is another OOM if I try to repack locally.
> It seems to me that neither pack-objects nor index-pack respects
> pack.packsizelimit: both always try to pack everything (all objects to
> be transferred, or all local loose objects) into one pack.
> I could live with transfer.unpackLimit=100000, but the local OOM stops
> me from using the cloned repo.
I have some patches to make index-pack work better with large blobs,
but they're not ready yet. I think pack-objects works fine with large
blobs as long as they are all in packs. Are there any loose objects in
the source repo?
It's strange that you chose 256MB as the upper limit for small objects
in your first mail. Do you have a lot of files of 10MB or more? By
default, files smaller than 512MB are held in memory for delta
compression, so a lot of big (but smaller than 512MB) files can quickly
consume all memory. If that's the case, maybe you can lower
core.bigFileThreshold.
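For example (the 10MB value is just an illustration):

    # Objects above this size are streamed instead of being held in
    # memory for delta compression:
    git config core.bigFileThreshold 10m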
Also, maybe try removing the 1.2GB file from the source repo and see if
it works better. That could give us some hints about where the problem is.
--
Duy
* Re: Large repo and pack.packsizelimit
From: Thomas @ 2012-05-09 11:46 UTC
To: git
Nguyen Thai Ngoc Duy <pclouds <at> gmail.com> writes:
>
> On Wed, May 9, 2012 at 4:36 PM, Thomas <th.acker66 <at> arcor.de> wrote:
> > To be exact, I did the clone locally on the same machine, so the
> > clone itself worked, but I got the OOM during the first fetch. I
> > "fixed" this by setting transfer.unpackLimit=100000, which caused
> > only loose objects to be transferred. So in this case I think the
> > OOM was on the remote side. But there is another OOM if I try to
> > repack locally.
> > It seems to me that neither pack-objects nor index-pack respects
> > pack.packsizelimit: both always try to pack everything (all objects
> > to be transferred, or all local loose objects) into one pack.
> > I could live with transfer.unpackLimit=100000, but the local OOM
> > stops me from using the cloned repo.
>
> I have some patches to make index-pack work better with large blobs,
> but they're not ready yet. I think pack-objects works fine with large
> blobs as long as they are all in packs. Are there any loose objects in
> the source repo?
> It's strange that you chose 256MB as the upper limit for small objects
> in your first mail. Do you have a lot of files of 10MB or more? By
> default, files smaller than 512MB are held in memory for delta
> compression, so a lot of big (but smaller than 512MB) files can
> quickly consume all memory. If that's the case, maybe you can lower
> core.bigFileThreshold.
> Also, maybe try removing the 1.2GB file from the source repo and see
> if it works better. That could give us some hints about where the
> problem is.
I am already using core.bigFileThreshold=256MB, so the large file(s)
should not be the problem (most of the files in the repo are "standard"
source code files; I tried even smaller values for bigFileThreshold and
packsizelimit, but with no success).
As long as I worked with the original repo, which was updated regularly,
everything worked well once pack.packsizelimit was set to 1024MB (even
with the 1.2GB file). Repack seems not to grow a pack any further once
packsizelimit is exceeded (so my packs are all slightly larger than
1024MB), BUT it also seems to try to put everything into one pack,
regardless of packsizelimit, in the following cases:
(1) all objects to be transferred to another repo
(2) all loose objects when starting a local repack
Case (1) can be fixed by transfer.unpackLimit, but there is no fix
for (2).
---
Thomas
* Re: Large repo and pack.packsizelimit
From: Junio C Hamano @ 2012-05-09 17:30 UTC
To: Thomas; +Cc: git
Thomas <th.acker66@arcor.de> writes:
> (1) all objects to be transferred to another repo
> (2) all loose objects when starting a local repack
> Case (1) can be fixed by transfer.unpackLimit, but there is no fix for (2).
Technically (1) is putting everything in a single pack to transfer, and it
is only the receiving end that does the chopping.
For (2), you could do something like
keep=$( git rev-list --objects $some_rev |
git pack-objects --delta-base-offset pack ) &&
mv pack-$keep.pack pack-$keep.idx .git/objects/pack/ &&
echo "keep $some_rev" >.git/objects/pack/pack-$keep.keep
after finding a suitable $some_rev that is old enough so that it will be
an ancestor of anything that matters in the future and gives small enough
packfiles. You may want to try doing the above multiple times, by picking
a few strategic ranges, e.g.
    for some_rev in v1.0 v1.0..v2.0 v2.0..v2.4 v2.4..v3.0
    do
        ... the above four lines come here ...
    done
The objects stored in .keep packs won't participate in future repacks,
so your "git repack -a -d" after that will put everything that is needed
only for versions newer than v3.0 into a single new pack.
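Put together (the same commands as above, joined into one runnable
sketch; the tag names are just examples):

    for some_rev in v1.0 v1.0..v2.0 v2.0..v2.4 v2.4..v3.0
    do
        # Pack each historic range into its own pack file ...
        keep=$( git rev-list --objects $some_rev |
                git pack-objects --delta-base-offset pack ) &&
        mv pack-$keep.pack pack-$keep.idx .git/objects/pack/ &&
        # ... and mark it with a .keep file so future repacks skip it.
        echo "keep $some_rev" >.git/objects/pack/pack-$keep.keep ||
        break
    done
    # Everything not in a kept pack ends up in one new (smaller) pack:
    git repack -a -d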
* Re: Large repo and pack.packsizelimit
From: Thomas @ 2012-05-10 11:42 UTC
To: git
Junio C Hamano <gitster <at> pobox.com> writes:
>
> Thomas <th.acker66 <at> arcor.de> writes:
>
> > (1) all objects to be transferred to another repo
> > (2) all loose objects when starting a local repack
> > Case (1) can be fixed by transfer.unpackLimit, but there is no fix for (2).
>
> Technically (1) is putting everything in a single pack to transfer, and it
> is only the receiving end that does the chopping.
>
> For (2), you could do something like
>
>     keep=$( git rev-list --objects $some_rev |
>             git pack-objects --delta-base-offset pack ) &&
>     mv pack-$keep.pack pack-$keep.idx .git/objects/pack/ &&
>     echo "keep $some_rev" >.git/objects/pack/pack-$keep.keep
>
> after finding a suitable $some_rev that is old enough so that it will be
> an ancestor of anything that matters in the future and gives small enough
> packfiles. You may want to try doing the above multiple times, by picking
> a few strategic ranges, e.g.
>
>     for some_rev in v1.0 v1.0..v2.0 v2.0..v2.4 v2.4..v3.0
>     do
>         ... the above four lines come here ...
>     done
>
Not really a porcelain-level solution ;-) but I will try it.
Thanks!
Is there any chance that (1) and (2) will be solved by using packsizelimit
in the future?
Will there be any porcelain/plumbing commands for creating/deleting
.keep files for packs?
---
Thomas