From: "Dirk Süsserott" <newsletter@dirk.my1.cc>
To: John <john@puckerupgames.com>
Cc: git@vger.kernel.org, Jeff King <peff@peff.net>
Subject: Re: serious performance issues with images, audio files, and other "non-code" data
Date: Fri, 14 May 2010 19:26:59 +0200
Message-ID: <4BED87E3.60000@dirk.my1.cc>
In-Reply-To: <4BED47EA.9090905@puckerupgames.com>
On 14.05.2010 14:54, John wrote:
> Thanks so much. It's version 1.5.6.5. I compiled it 3 months ago. For
> example, in one repo, there are 1200 source files, each on average 109K
> in size, for a total size of 127M. The largest source file is 82M. Most
> of the non-text source files are already compressed.
>
> I packed the bare repo, then ran `gc --aggressive`. Then I did a `git
> pull`, which took 35 minutes. The git processes in `top` seemed to peak
> at around 300M of memory. Since then, I added 'binary -delta' to the
> .gitattributes for various files, based on suggestions from this mailing
> list, but by that time did not wish to repeat the 35 minute pull to test
> it out. Let's hope that made a difference.
>
> You can simulate it all by generating a batch of 1-100 MB files from
> /dev/urandom (since they won't compress), committing them, then doing it
> again many times to simulate edits. Every few iterations, push it somewhere.
>
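A rough sketch of that simulation, scaled down so it finishes in seconds. The file names, counts, sizes, and the throwaway bare "remote" are all assumptions for illustration; raise `size_kb` and `rounds` to approach the 1-100 MB case described above.

```shell
#!/bin/sh
# Sketch: fill a scratch repo with incompressible files, "edit" them a few
# times, then push to a local bare repo and report how long the push took.
set -e

work=$(mktemp -d)
git init -q --bare "$work/remote.git"
git init -q "$work/repo"
cd "$work/repo"
git config user.name "Test User"
git config user.email "test@example.com"

rounds=3      # simulated edit rounds (assumed value)
files=4       # files rewritten per round (assumed value)
size_kb=256   # size of each file in KiB (assumed value)

i=1
while [ "$i" -le "$rounds" ]; do
    f=1
    while [ "$f" -le "$files" ]; do
        # Fresh random bytes stand in for an edited binary asset.
        dd if=/dev/urandom of="blob$f.bin" bs=1024 count="$size_kb" 2>/dev/null
        f=$((f + 1))
    done
    git add .
    git commit -q -m "round $i"
    i=$((i + 1))
done

# The push is where objects get packed for transfer.
start=$(date +%s)
git push -q "$work/remote.git" HEAD:refs/heads/master
end=$(date +%s)
echo "push took $((end - start)) second(s)"
```

Running it once with and once without a `-delta` entry in `.gitattributes` should give a feel for how much of the time goes into delta search.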
>
> I noticed some other folks on this list apparently having the same
> issues, but they don't know it yet ("git hangs while compressing
> objects", etc.). That's probably the first symptom they'll see. It
> *appears* to hang, but it's really spinning away on the `pack` gizmo.
>
> I'm open to alternative suggestions -- some kind of dual-mode, where
> text files are "fully" version'd, diff'd, delta'd, index'd, stash'd,
> pack'd, compress'd, object'd and whatever else git needs to do, while
> non-text files are archived in a "lesser" manner. On the other hand, I
> get the sense that the LAST thing git needs is another "mode"!
>
>
>
>
> On 05/14/2010 01:10 AM, Jeff King wrote:
>> On Wed, May 12, 2010 at 02:53:53PM -0400, John wrote:
>>
>>> We're seeing serious performance issues with repos that store media
>>> files, even relatively small files. For example, a web site with less
>>> than 100 MB of images can take minutes to commit, push, or pull when
>>> images have changed.
>>
>> That sounds way too slow in my experience. I have a repository with 3
>> gigabytes of photos and videos. Committing 20M of new images takes a
>> second or two. The biggest slowdown is doing the sha1 over the new data
>> (which actually happens during "git add").
>>
>> What version of git are you using? Have you tried "commit -q" to
>> suppress the diff at the end of commit?
>>
>> Can you show us exactly what commands you're using, along with timings
>> so we can see where the slowness is?
>>
>> For pushing and pulling, you're probably seeing delta compression, which
>> can be slow for large files (though again, minutes seems kind of slow to
>> me). It _can_ be worth doing for images, if you do things like change
>> only exif tags but not the image data itself. But if the images
>> themselves are changing, you probably want to try setting the "-delta"
>> attribute. Like:
>>
>> echo '*.jpg -delta' > .gitattributes
>>
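One way to confirm that the attribute is actually picked up is `git check-attr`. This is only a sketch in a scratch repository; the paths `photo.jpg` and `notes.txt` are made-up names and do not need to exist for the check to work.

```shell
#!/bin/sh
# Sketch: set -delta for JPEGs and ask git how it resolves the attribute.
set -e

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

echo '*.jpg -delta' > .gitattributes

# photo.jpg matches the pattern, notes.txt does not (hypothetical paths).
attr_jpg=$(git check-attr delta -- photo.jpg)
attr_txt=$(git check-attr delta -- notes.txt)
echo "$attr_jpg"
echo "$attr_txt"
```

A matching path should report `delta: unset`, while a path that no pattern covers reports `delta: unspecified` and keeps the normal delta behaviour.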
>> Also, consider repacking your repository, which will generate a packfile
>> that will be re-used during push and pull.
>>
>>> Our first guess was that git is repeatedly attempting to
>>> compress/decompress data that had already been compressed. We tried
>>
>> Git does spend a fair bit of time in zlib for some workloads, but it
>> should not create problems on the order of minutes.
>>
>>> core.compression 0 ## Docs say this disables compression.
>>> Didn't seem to work.
>>
>> That should disable zlib compression of loose objects and objects within
>> packfiles. It can save a little time for objects which won't compress,
>> but you will lose the size benefits for any text files.
>>
>> But it won't turn off delta compression, which is what the
>> "compressing..." phase during push and pull is doing. And which is much
>> more likely the cause of slowness.
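To see what `core.compression 0` does and does not touch, here is a small sketch that writes two highly compressible text blobs as loose objects, one before and one after changing the setting, and compares their on-disk sizes. The file names and the `obj_path` helper are assumptions for illustration; delta compression is not involved at all here, which is the point.

```shell
#!/bin/sh
# Sketch: compare loose-object sizes with default zlib vs. core.compression 0.
set -e

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Helper (hypothetical): map an object id to its loose-object path.
obj_path() {
    echo ".git/objects/$(echo "$1" | cut -c1-2)/$(echo "$1" | cut -c3-)"
}

# 1000 identical lines: text that zlib shrinks dramatically.
yes 'the same line of text' | head -n 1000 > a.txt
id_default=$(git hash-object -w a.txt)

# Now turn zlib compression off and write a second, similar blob.
git config core.compression 0
yes 'another line of text!' | head -n 1000 > b.txt
id_stored=$(git hash-object -w b.txt)

size_default=$(wc -c < "$(obj_path "$id_default")" | tr -d ' ')
size_stored=$(wc -c < "$(obj_path "$id_stored")" | tr -d ' ')
echo "default zlib level:  $size_default bytes"
echo "core.compression 0:  $size_stored bytes"
```

The second object should come out roughly the size of the input file, which shows the cost for text files mentioned above.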
>>
>>> pack.depth 1 ## Unclear what this does.
>>
>> It says you can't make a chain of deltas deeper than 1. It's probably
>> not what you want.
>>
>>> pack.window 0 ## No idea what this does.
>>
>> It sets the number of other objects git will consider when doing delta
>> compression. Setting it low should improve your push/pull times. But you
>> will lose the substantial benefit of delta-compression of your non-image
>> files (and git's meta objects). So the "-delta" option above for
>> specific files is a much better solution.
>>
>>> gc.auto 0 ## We hope this disables automatic packing.
>>
>> It disables automatic repacking when you have a lot of objects. You
>> _have_ to pack when pushing and pulling, since packfiles are the
>> on-the-wire format. What will help is:
>>
>> 1. Having repositories already packed, since git can re-use the packed
>> data.
>>
>> 2. Using -delta so that things which delta poorly are just copied into
>> the packfile as-is.
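Put together, the two points above might look like the following sketch in a scratch repository. The `*.bin` pattern and the file name `asset.bin` are assumptions; `git count-objects -v` is used only to confirm that everything ended up in a single pack.

```shell
#!/bin/sh
# Sketch: mark binary assets as -delta, commit one, then repack everything.
set -e

repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "Test User"
git config user.email "test@example.com"

# Files matching this pattern are copied into packs without delta attempts.
echo '*.bin -delta' > .gitattributes

# A small incompressible file standing in for a media asset.
dd if=/dev/urandom of=asset.bin bs=1024 count=64 2>/dev/null
git add .
git commit -q -m "add asset"

# Pack all objects into one packfile and drop the now-redundant loose copies.
git repack -a -d -q

stats=$(git count-objects -v)
echo "$stats"
```

After the repack, `count` (loose objects) should be 0 and `packs` should be 1, so a later push or pull has packed data to reuse.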
>>
>>> Is there a trick to getting git to simply "copy files as is"? In
>>> other words, don't attempt to compress them, don't attempt to "diff"
>>> them, just store/copy/transfer the files as-is?
>>
>> Hopefully you can pick out the answer to that question from the above
>> statements. :)
>>
>> -Peff
>
Hi John,
Peff explained it very well. Some time ago, I had a similar problem:
http://www.mentby.com/Group/git/how-to-prevent-git-from-compressing-certain-files.html
and he helped me as well. You may want to have a look at that
thread.
Dirk