From: mkoegler@auto.tuwien.ac.at (Martin Koegler)
To: Nicolas Pitre <nico@cam.org>
Cc: Junio C Hamano <junkio@cox.net>, git@vger.kernel.org
Subject: Re: [PATCH] improve delta long block matching with big files
Date: Sat, 26 May 2007 17:19:09 +0200
Message-ID: <20070526151909.GA9429@auto.tuwien.ac.at>

Nicolas Pitre wrote:
> Martin Koegler noted that create_delta() performs a new hash lookup
> after every block copy encoding, and each block copy is currently
> limited to 64KB.
> 
> In case of larger identical blocks, the next hash lookup would normally
> point to the next 64KB block in the reference buffer and multiple block
> copy operations will be consecutively encoded.
> 
> It is, however, possible that the reference buffer is sparsely indexed
> if hash buckets have been trimmed down in create_delta_index() because
> hashing of the reference buffer isn't well balanced.  In that case the
> hash lookup following a block copy might fail to match anything, and
> the fact that the reference buffer still matches beyond the previous
> 64KB block will be missed.
> 
> Let's rework the code so that buffer comparison isn't bounded to 64KB
> anymore.  The match size should be as large as possible up front, and
> only then should multiple block copies be encoded to cover it all.
> Also, fewer hash lookups will be performed in the end.
> 
> According to Martin, this patch should reduce his 92MB pack down to 75MB
> with the dataset he has.
> 
> Tests performed on the Linux kernel repo show a slightly smaller pack and
> a slightly faster repack.
>
Acked-by: Martin Koegler <mkoegler@auto.tuwien.ac.at>
> Signed-off-by: Nicolas Pitre <nico@cam.org>
---

The patch results in a 75 MB pack file for my repository and is
faster:

Total 6452 (delta 4581), reused 1522 (delta 0)
10073.11user 5200.33system 4:14:36elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+1371504760minor)pagefaults 0swaps
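
To illustrate the approach described in the quoted text (measure the
whole match first, then cover it with several block copies), here is a
minimal sketch in C. It is not the code from the patch: the names
match_length(), split_match(), struct copy_op and MAX_COPY_SIZE are
hypothetical, and the only fact taken from the discussion is the 64KB
limit per block copy.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: one block copy covers at most 64KB, as stated in the thread. */
#define MAX_COPY_SIZE 0x10000u

/* Hypothetical record for one block copy out of the reference buffer. */
struct copy_op {
	size_t offset;	/* start of the block in the reference buffer */
	size_t size;	/* between 1 and MAX_COPY_SIZE bytes */
};

/*
 * Hypothetical helper: count how many bytes agree between ref[ref_off..]
 * and trg[trg_off..].  The comparison is bounded only by the buffer
 * ends, not by the 64KB copy limit.
 */
static size_t match_length(const unsigned char *ref, size_t ref_len, size_t ref_off,
			   const unsigned char *trg, size_t trg_len, size_t trg_off)
{
	size_t n = 0;

	while (ref_off + n < ref_len && trg_off + n < trg_len &&
	       ref[ref_off + n] == trg[trg_off + n])
		n++;
	return n;
}

/*
 * Hypothetical helper: cover one match of `size` bytes starting at
 * `offset` with consecutive block copies of at most MAX_COPY_SIZE
 * bytes each.  No hash lookup is needed between the copies.
 * Returns the number of entries written to `ops`.
 */
static size_t split_match(size_t offset, size_t size,
			  struct copy_op *ops, size_t max_ops)
{
	size_t count = 0;

	while (size > 0) {
		size_t chunk = size < MAX_COPY_SIZE ? size : MAX_COPY_SIZE;

		assert(count < max_ops);	/* caller must provide enough room */
		ops[count].offset = offset;
		ops[count].size = chunk;
		count++;
		offset += chunk;
		size -= chunk;
	}
	return count;
}
```

In this sketch a match of two full 64KB blocks plus 5 bytes is split
into three block copies, and the hash index is consulted only once, for
the start of the match.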

Regards, Martin Kögler
