git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Nicolas Pitre <nico@cam.org>
To: Linus Torvalds <torvalds@osdl.org>
Cc: Junio C Hamano <junkio@cox.net>,
	git@vger.kernel.org, Carl Baldwin <cnb@fc.hp.com>
Subject: Re: [PATCH] diff-delta: produce optimal pack data
Date: Sat, 25 Feb 2006 00:35:26 -0500 (EST)	[thread overview]
Message-ID: <Pine.LNX.4.64.0602250012230.31162@localhost.localdomain> (raw)
In-Reply-To: <Pine.LNX.4.64.0602242326381.31162@localhost.localdomain>


OOps.... Forgot to complete one paragraph.

On Sat, 25 Feb 2006, Nicolas Pitre wrote:

> I of course looked at the time to pack vs the size reduction in my 
> tests.  And really like I said above the cost is well balanced.  The 
> only issue is that smaller blocks are more likely to trap into 
> patological data sets.  But that problem does exist with larger blocks 
> too, to a lesser degree of course but still.  For example, using a 16 
> block size with adler32, computing a delta between two files 

... as provided by Carl takes up to _nine_ minutes for a _single_ delta !

So regardless of the block size used, the issue right now has more to do 
with that combinatorial explosion than the actual block size.  And 
preventing that patological case from expending out of bounds is pretty 
easy to do.

OK I just tested a tentative patch to trap that case and the time to 
delta those two 20MB files passed from over 9 minutes to only 36 seconds 
here, with less than 10% in delta size difference.  So I think I might 
be on the right track.  Further tuning might help even further.


Nicolas

  reply	other threads:[~2006-02-25  5:35 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2006-02-22  1:45 [PATCH] diff-delta: produce optimal pack data Nicolas Pitre
2006-02-24  8:49 ` Junio C Hamano
2006-02-24 15:37   ` Nicolas Pitre
2006-02-24 23:55     ` Junio C Hamano
2006-02-24 17:44   ` Carl Baldwin
2006-02-24 17:56     ` Nicolas Pitre
2006-02-24 18:35       ` Carl Baldwin
2006-02-24 18:57         ` Nicolas Pitre
2006-02-24 19:23           ` Carl Baldwin
2006-02-24 20:02             ` Nicolas Pitre
2006-02-24 20:40               ` Carl Baldwin
2006-02-24 21:12                 ` Nicolas Pitre
2006-02-24 22:50                   ` Carl Baldwin
2006-02-25  3:53                     ` Nicolas Pitre
2006-02-24 20:02             ` Linus Torvalds
2006-02-24 20:19               ` Nicolas Pitre
2006-02-24 20:53               ` Junio C Hamano
2006-02-24 21:39                 ` Nicolas Pitre
2006-02-24 21:48                   ` Nicolas Pitre
2006-02-25  0:45                   ` Linus Torvalds
2006-02-25  3:07                     ` Nicolas Pitre
2006-02-25  4:05                       ` Linus Torvalds
2006-02-25  5:10                         ` Nicolas Pitre
2006-02-25  5:35                           ` Nicolas Pitre [this message]
2006-03-07 23:48                             ` [RFH] zlib gurus out there? Junio C Hamano
2006-03-08  0:59                               ` Linus Torvalds
2006-03-08  1:22                                 ` Junio C Hamano
2006-03-08  2:00                                   ` Linus Torvalds
2006-03-08  9:46                                     ` Johannes Schindelin
2006-03-08 10:45                               ` [PATCH] write_sha1_file(): Perform Z_FULL_FLUSH between header and data Sergey Vlasov
2006-03-08 11:04                                 ` Junio C Hamano
2006-03-08 14:17                                   ` Sergey Vlasov
2006-02-25 19:18                           ` [PATCH] diff-delta: produce optimal pack data Linus Torvalds
2006-02-24 18:49       ` Carl Baldwin
2006-02-24 19:03         ` Nicolas Pitre

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Pine.LNX.4.64.0602250012230.31162@localhost.localdomain \
    --to=nico@cam.org \
    --cc=cnb@fc.hp.com \
    --cc=git@vger.kernel.org \
    --cc=junkio@cox.net \
    --cc=torvalds@osdl.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).