git.vger.kernel.org archive mirror
From: mkoegler@auto.tuwien.ac.at (Martin Koegler)
To: Nicolas Pitre <nico@cam.org>
Cc: Junio C Hamano <gitster@pobox.com>, git@vger.kernel.org
Subject: Re: [RFC Patch] Preventing corrupt objects from entering the repository
Date: Mon, 11 Feb 2008 20:56:23 +0100
Message-ID: <20080211195623.GA21878@auto.tuwien.ac.at>
In-Reply-To: <alpine.LFD.1.00.0802101929310.2732@xanadu.home>

On Sun, Feb 10, 2008 at 07:33:34PM -0500, Nicolas Pitre wrote:
> On Sun, 10 Feb 2008, Junio C Hamano wrote:
> 
> > mkoegler@auto.tuwien.ac.at (Martin Koegler) writes:
> > 
> > > This patch adds a cache to keep the object data in memory. The delta
> > > resolving code must also search in the cache.
> > 
> > I have to wonder what the memory pressure in real-life usage
> > will be like.

> FWIW, I don't like this idea.
>
> I'm struggling to find ways to improve performances of 
> pack-objects/index-pack with those large repositories that are becoming 
> more common (i.e. GCC, OOO, Mozilla, etc.)  Anything that increase 
> memory usage isn't very welcome IMHO.

Maybe I have missed something, but all repack problems reported on the
git mailing list happen during the deltification phase. The problematic
files are mostly large blobs. I'm aware of these problems, so my
patch does not keep any blobs in memory.

As we are talking about memory, let's ignore unpack-objects, which is
only used for small packs. Let's compare the memory usage of index-pack
to that of pack-objects:

If strict checking is disabled (no --strict passed), the only additional
allocation is one (unused) pointer for each object in the received pack
file.

On i386, struct object_entry is 84 bytes in pack-objects, but only 52
in index-pack. Both programs keep one struct object_entry per object in
memory for their entire runtime. So in this case, index-pack uses less
memory than pack-objects.

If the --strict option is passed, more memory is used:

* Again, we add one pointer to struct object_entry. object_entry is
  still smaller (52 < 84 bytes).

* index-pack allocates a struct blob/tree/commit/tag for each object in the pack.

  In the best case (reading from a pack file), pack-objects allocates
  only a struct object; otherwise it allocates a struct
  blob/tree/commit/tag. These objects are kept in memory for the
  entire runtime of pack-objects.

  So, depending on the parameters of pack-objects, index-pack uses up
  to 24 additional bytes per object, but its struct object_entry is 32
  bytes smaller.

* index-pack allocates a struct blob/tree/commit/tag for each reference to an object outside the pack.

  I don't know the pack-objects code well enough to comment on this
  point.

* index-pack keeps the data for each tag/tree/commit in the pack in memory.

  In the next version, I won't need to keep the tag/commit data in
  memory. Tree data could be reconstructed from the written pack,
  but I'm not sure whether the memory saved would justify the
  additional code (resolving deltas again).

So my conclusion is that the memory usage of index-pack with --strict
should not be much worse than that of pack-objects.

Please remember that --strict is used for pushing data.

Regards, Martin Kögler


Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20080211195623.GA21878@auto.tuwien.ac.at \
    --to=mkoegler@auto.tuwien.ac.at \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=nico@cam.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).