From: mkoegler@auto.tuwien.ac.at (Martin Koegler)
To: Junio C Hamano <junkio@cox.net>
Cc: git@vger.kernel.org
Subject: Re: [PATCH] git-pack-objects: cache small deltas between big objects
Date: Mon, 21 May 2007 19:00:43 +0200
Message-ID: <20070521170043.GA27063@auto.tuwien.ac.at>
In-Reply-To: <7v646mixnm.fsf@assigned-by-dhcp.cox.net>
On Sun, May 20, 2007 at 09:54:53PM -0700, Junio C Hamano wrote:
> Martin Koegler <mkoegler@auto.tuwien.ac.at> writes:
> > diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
> > index d165f10..13429d0 100644
> > --- a/builtin-pack-objects.c
> > +++ b/builtin-pack-objects.c
> > ...
> > @@ -1294,10 +1302,17 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
> >  	if (!delta_buf)
> >  		return 0;
> >
> > +	if (trg_entry->delta_data)
> > +		free (trg_entry->delta_data);
> > +	trg_entry->delta_data = 0;
> >  	trg_entry->delta = src_entry;
> >  	trg_entry->delta_size = delta_size;
> >  	trg_entry->depth = src_entry->depth + 1;
> > -	free(delta_buf);
> > +	/* cache delta, if objects are large enough compared to delta size */
> > +	if ((src_size >> 20) + (trg_size >> 21) > (delta_size >> 10))
> > +		trg_entry->delta_data = delta_buf;
> > +	else
> > +		free(delta_buf);
> >  	return 1;
> >  }
>
> Care to justify this arithmetic? Why isn't it for example like
> this?
>
> ((src_size + trg_size) >> 10) > delta_size
I wanted to avoid a possible overflow in (src_size + trg_size), so
I shift both sides.
> I am puzzled by the shifts on both ends, and differences between
> 20 and 21.
I base the maximum allowed delta_size for caching on the memory
required to create the delta. For the src entry, you need a delta
index, which is (about) the same size as the src entry. So I count
the src entry double.
I divide the required memory by 1024, so that the delta is about three
orders of magnitude smaller and will not cause a big increase in memory
usage, e.g.:
For two 100MB (uncompressed) blobs, we need 300MB of memory to do the
delta (with the default window size of 10, up to 1900MB for all delta
indexes in the worst case). The patch will limit the delta size for
the target blob to 150kB.
The caching policy only caches really small deltas for really big
objects, as I wanted to avoid out-of-memory situations. A future patch
should probably replace it with a better strategy.
Regards, Martin Kögler