From: mkoegler@auto.tuwien.ac.at (Martin Koegler)
To: Nicolas Pitre <nico@cam.org>
Cc: Junio C Hamano <gitster@pobox.com>, git@vger.kernel.org
Subject: Re: [RFC Patch] Preventing corrupt objects from entering the repository
Date: Mon, 11 Feb 2008 20:56:23 +0100
Message-ID: <20080211195623.GA21878@auto.tuwien.ac.at>
In-Reply-To: <alpine.LFD.1.00.0802101929310.2732@xanadu.home>
On Sun, Feb 10, 2008 at 07:33:34PM -0500, Nicolas Pitre wrote:
> On Sun, 10 Feb 2008, Junio C Hamano wrote:
>
> > mkoegler@auto.tuwien.ac.at (Martin Koegler) writes:
> >
> > > This patch adds a cache to keep the object data in memory. The delta
> > > resolving code must also search in the cache.
> >
> > I have to wonder what the memory pressure in real-life usage
> > will be like.
> FWIW, I don't like this idea.
>
> I'm struggling to find ways to improve performances of
> pack-objects/index-pack with those large repositories that are becoming
> more common (i.e. GCC, OOO, Mozilla, etc.) Anything that increase
> memory usage isn't very welcome IMHO.

Maybe I have missed something, but all repack problems reported on the
git mailing list happen during the deltifying phase. The problematic
files are mostly bigger blobs. I'm aware of these problems, so my
patch does not keep any blobs in memory.

As we are talking about memory, let's ignore unpack-objects, which is
used for small packs. Let's compare the memory usage of index-pack to
that of pack-objects:

If the check is disabled (--strict not passed), the only additional
allocation is one (unused) pointer for each object in the received
pack file.

On i386, struct object_entry is 84 bytes in pack-objects, but only 52
in index-pack. Both programs keep a struct object_entry in memory for
each object for their whole runtime. So in this case, index-pack uses
less memory than pack-objects.

If the --strict option is passed, more memory is used:

* Again, we add one pointer to struct object_entry. object_entry is
  still smaller (52 < 84 bytes).

* index-pack allocates a struct blob/tree/commit/tag for each object in
  the pack. pack-objects also allocates one per object: only a struct
  object in the best case (reading from a pack file), otherwise a
  struct blob/tree/commit/tag. These objects are kept in memory for
  the whole runtime of pack-objects.

  So, depending on the parameters of pack-objects, index-pack uses up
  to 24 additional bytes per object, but struct object_entry is 32
  bytes smaller.

* index-pack allocates a struct blob/tree/commit/tag for each link to
  an object outside the pack. I don't know the pack-objects code well
  enough to comment on this point.

* index-pack keeps the data for each tag/tree/commit in the pack in
  memory. In the next version, I don't need to keep the tag/commit
  data in memory. Tree data could be reconstructed from the written
  pack, but I'm not sure whether the memory saved would justify the
  additional code (resolving deltas again).

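To make the arithmetic above concrete, here is a rough sketch in Python. It uses only the i386 figures quoted in this mail (84, 52, and up to 24 bytes) and assumes that the 52 bytes already include the added pointer. The object count and all names are hypothetical. It covers only the fixed per-object bookkeeping; the tree/commit/tag data kept in memory and the objects allocated for links outside the pack are not included.

```python
# Per-object sizes on i386, as quoted in this mail.
PACK_OBJECTS_ENTRY = 84  # struct object_entry in pack-objects
INDEX_PACK_ENTRY = 52    # struct object_entry in index-pack (assumed to include the new pointer)
STRICT_EXTRA_MAX = 24    # worst-case extra bytes per object with --strict


def index_pack_minus_pack_objects(n_objects, strict):
    """Return the difference in fixed per-object bookkeeping, in bytes.

    A negative result means index-pack needs less memory than pack-objects.
    """
    index_pack = INDEX_PACK_ENTRY + (STRICT_EXTRA_MAX if strict else 0)
    return n_objects * (index_pack - PACK_OBJECTS_ENTRY)


if __name__ == "__main__":
    n = 1_000_000  # hypothetical number of objects in the pack
    for strict in (False, True):
        diff = index_pack_minus_pack_objects(n, strict)
        print(f"strict={strict}: {diff} bytes relative to pack-objects")
```

Under these assumptions, even the worst case with --strict stays 8 bytes per object below pack-objects for this bookkeeping part.
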
So my conclusion is that the memory usage of index-pack with --strict
should not be much worse than that of pack-objects.

Please remember that --strict is used for pushing data.

Regards,
Martin Kögler