* RFC: New diff-delta.c implementation
@ 2006-04-21 21:16 Geert Bosch
2006-04-22 3:19 ` Nicolas Pitre
` (3 more replies)
0 siblings, 4 replies; 32+ messages in thread
From: Geert Bosch @ 2006-04-21 21:16 UTC (permalink / raw)
To: Git Mailing List; +Cc: Junio C Hamano
[-- Attachment #1: Type: text/plain, Size: 1056 bytes --]
I wrote a new binary differencing algorithm that is both faster
and generates smaller deltas than the current implementation.
The format is compatible with that used by patch-delta, so
it should be easy to integrate.
Originally, I wrote this for the GDIFF format, see http://www.w3.org/
TR/NOTE-gdiff-19970901.
The adaptation for GIT format was relatively simple, but is not
thoroughly tested.
The code is not derived from libxdiff, but uses the rabin_slide
function written
by David Mazieres (dm@uun.org). Also the tables are generated using
his code.
Finally, this was developed on Darwin, and not a Linux system, so
some changes may be needed.
Initial testing seems quite positive, take for example git-1.2.5.tar
vs git-1.2.6.tar
on my PowerBook (both with -O2 -DNDEBUG):
current: 2.281s, patch size 36563
new : 0.109s, patch size 16199
Please feel free to play around with this code, and give feedback.
Keep in mind this wasn't originally written for GIT, and C is not
my native language, so don't mind my formatting etc.
-Geert
[-- Attachment #2: diff-delta.c --]
[-- Type: application/octet-stream, Size: 30518 bytes --]
#include <unistd.h>
#include <stdlib.h>
#include <assert.h>
#include <string.h>
#include <sys/types.h>
/* MIN_HTAB_SIZE is fixed amount to be added to the size of the hash table
used for indexing and must be a power of two. This allows for small files
to have a sparse hash table, since in that case it's cheap.
Hash table sizes are rounded up to a power of two to avoid integer division.
*/
#define MIN_HTAB_SIZE 8192
#define MAX_HTAB_SIZE (1024*1024*1024)
/* Diffing files of gigabyte range is impractical with the current
algorithm, so we're assuming 32-bit sizes everywhere.
Size leaves some room for expansion when diffing random files. */
#define MAX_SIZE (0x7eff0000)
/* Initial size of copies table, dynamically extended as needed. */
#define MAX_COPIES 4096
/* Matching is done using a sliding window for which a Rabin
polynomial is computed. The advantage of such polynomials is
that they can efficiently be updated at every position.
The tables needed for this are precomputed, as it is desirable
to use the same polynomial all the time for repeatable results.
*/
#define RABIN_WINDOW_SIZE 22
#define RABIN_SHIFT 55
static unsigned long long T[256] =
{ 0x0000000000000000ULL, 0xb15e234bd3792f63ULL, 0x62bc4697a6f25ec6ULL,
0xd3e265dc758b71a5ULL, 0x7426ae649e9d92efULL, 0xc5788d2f4de4bd8cULL,
0x169ae8f3386fcc29ULL, 0xa7c4cbb8eb16e34aULL, 0x59137f82ee420abdULL,
0xe84d5cc93d3b25deULL, 0x3baf391548b0547bULL, 0x8af11a5e9bc97b18ULL,
0x2d35d1e670df9852ULL, 0x9c6bf2ada3a6b731ULL, 0x4f899771d62dc694ULL,
0xfed7b43a0554e9f7ULL, 0x0378dc4e0ffd3a19ULL, 0xb226ff05dc84157aULL,
0x61c49ad9a90f64dfULL, 0xd09ab9927a764bbcULL, 0x775e722a9160a8f6ULL,
0xc600516142198795ULL, 0x15e234bd3792f630ULL, 0xa4bc17f6e4ebd953ULL,
0x5a6ba3cce1bf30a4ULL, 0xeb35808732c61fc7ULL, 0x38d7e55b474d6e62ULL,
0x8989c61094344101ULL, 0x2e4d0da87f22a24bULL, 0x9f132ee3ac5b8d28ULL,
0x4cf14b3fd9d0fc8dULL, 0xfdaf68740aa9d3eeULL, 0x06f1b89c1ffa7432ULL,
0xb7af9bd7cc835b51ULL, 0x644dfe0bb9082af4ULL, 0xd513dd406a710597ULL,
0x72d716f88167e6ddULL, 0xc38935b3521ec9beULL, 0x106b506f2795b81bULL,
0xa1357324f4ec9778ULL, 0x5fe2c71ef1b87e8fULL, 0xeebce45522c151ecULL,
0x3d5e8189574a2049ULL, 0x8c00a2c284330f2aULL, 0x2bc4697a6f25ec60ULL,
0x9a9a4a31bc5cc303ULL, 0x49782fedc9d7b2a6ULL, 0xf8260ca61aae9dc5ULL,
0x058964d210074e2bULL, 0xb4d74799c37e6148ULL, 0x67352245b6f510edULL,
0xd66b010e658c3f8eULL, 0x71afcab68e9adcc4ULL, 0xc0f1e9fd5de3f3a7ULL,
0x13138c2128688202ULL, 0xa24daf6afb11ad61ULL, 0x5c9a1b50fe454496ULL,
0xedc4381b2d3c6bf5ULL, 0x3e265dc758b71a50ULL, 0x8f787e8c8bce3533ULL,
0x28bcb53460d8d679ULL, 0x99e2967fb3a1f91aULL, 0x4a00f3a3c62a88bfULL,
0xfb5ed0e81553a7dcULL, 0x0de371383ff4e864ULL, 0xbcbd5273ec8dc707ULL,
0x6f5f37af9906b6a2ULL, 0xde0114e44a7f99c1ULL, 0x79c5df5ca1697a8bULL,
0xc89bfc17721055e8ULL, 0x1b7999cb079b244dULL, 0xaa27ba80d4e20b2eULL,
0x54f00ebad1b6e2d9ULL, 0xe5ae2df102cfcdbaULL, 0x364c482d7744bc1fULL,
0x87126b66a43d937cULL, 0x20d6a0de4f2b7036ULL, 0x918883959c525f55ULL,
0x426ae649e9d92ef0ULL, 0xf334c5023aa00193ULL, 0x0e9bad763009d27dULL,
0xbfc58e3de370fd1eULL, 0x6c27ebe196fb8cbbULL, 0xdd79c8aa4582a3d8ULL,
0x7abd0312ae944092ULL, 0xcbe320597ded6ff1ULL, 0x1801458508661e54ULL,
0xa95f66cedb1f3137ULL, 0x5788d2f4de4bd8c0ULL, 0xe6d6f1bf0d32f7a3ULL,
0x3534946378b98606ULL, 0x846ab728abc0a965ULL, 0x23ae7c9040d64a2fULL,
0x92f05fdb93af654cULL, 0x41123a07e62414e9ULL, 0xf04c194c355d3b8aULL,
0x0b12c9a4200e9c56ULL, 0xba4ceaeff377b335ULL, 0x69ae8f3386fcc290ULL,
0xd8f0ac785585edf3ULL, 0x7f3467c0be930eb9ULL, 0xce6a448b6dea21daULL,
0x1d8821571861507fULL, 0xacd6021ccb187f1cULL, 0x5201b626ce4c96ebULL,
0xe35f956d1d35b988ULL, 0x30bdf0b168bec82dULL, 0x81e3d3fabbc7e74eULL,
0x2627184250d10404ULL, 0x97793b0983a82b67ULL, 0x449b5ed5f6235ac2ULL,
0xf5c57d9e255a75a1ULL, 0x086a15ea2ff3a64fULL, 0xb93436a1fc8a892cULL,
0x6ad6537d8901f889ULL, 0xdb8870365a78d7eaULL, 0x7c4cbb8eb16e34a0ULL,
0xcd1298c562171bc3ULL, 0x1ef0fd19179c6a66ULL, 0xafaede52c4e54505ULL,
0x51796a68c1b1acf2ULL, 0xe027492312c88391ULL, 0x33c52cff6743f234ULL,
0x829b0fb4b43add57ULL, 0x255fc40c5f2c3e1dULL, 0x9401e7478c55117eULL,
0x47e3829bf9de60dbULL, 0xf6bda1d02aa74fb8ULL, 0x1bc6e2707fe9d0c8ULL,
0xaa98c13bac90ffabULL, 0x797aa4e7d91b8e0eULL, 0xc82487ac0a62a16dULL,
0x6fe04c14e1744227ULL, 0xdebe6f5f320d6d44ULL, 0x0d5c0a8347861ce1ULL,
0xbc0229c894ff3382ULL, 0x42d59df291abda75ULL, 0xf38bbeb942d2f516ULL,
0x2069db65375984b3ULL, 0x9137f82ee420abd0ULL, 0x36f333960f36489aULL,
0x87ad10dddc4f67f9ULL, 0x544f7501a9c4165cULL, 0xe511564a7abd393fULL,
0x18be3e3e7014ead1ULL, 0xa9e01d75a36dc5b2ULL, 0x7a0278a9d6e6b417ULL,
0xcb5c5be2059f9b74ULL, 0x6c98905aee89783eULL, 0xddc6b3113df0575dULL,
0x0e24d6cd487b26f8ULL, 0xbf7af5869b02099bULL, 0x41ad41bc9e56e06cULL,
0xf0f362f74d2fcf0fULL, 0x2311072b38a4beaaULL, 0x924f2460ebdd91c9ULL,
0x358befd800cb7283ULL, 0x84d5cc93d3b25de0ULL, 0x5737a94fa6392c45ULL,
0xe6698a0475400326ULL, 0x1d375aec6013a4faULL, 0xac6979a7b36a8b99ULL,
0x7f8b1c7bc6e1fa3cULL, 0xced53f301598d55fULL, 0x6911f488fe8e3615ULL,
0xd84fd7c32df71976ULL, 0x0badb21f587c68d3ULL, 0xbaf391548b0547b0ULL,
0x4424256e8e51ae47ULL, 0xf57a06255d288124ULL, 0x269863f928a3f081ULL,
0x97c640b2fbdadfe2ULL, 0x30028b0a10cc3ca8ULL, 0x815ca841c3b513cbULL,
0x52becd9db63e626eULL, 0xe3e0eed665474d0dULL, 0x1e4f86a26fee9ee3ULL,
0xaf11a5e9bc97b180ULL, 0x7cf3c035c91cc025ULL, 0xcdade37e1a65ef46ULL,
0x6a6928c6f1730c0cULL, 0xdb370b8d220a236fULL, 0x08d56e51578152caULL,
0xb98b4d1a84f87da9ULL, 0x475cf92081ac945eULL, 0xf602da6b52d5bb3dULL,
0x25e0bfb7275eca98ULL, 0x94be9cfcf427e5fbULL, 0x337a57441f3106b1ULL,
0x8224740fcc4829d2ULL, 0x51c611d3b9c35877ULL, 0xe09832986aba7714ULL,
0x16259348401d38acULL, 0xa77bb003936417cfULL, 0x7499d5dfe6ef666aULL,
0xc5c7f69435964909ULL, 0x62033d2cde80aa43ULL, 0xd35d1e670df98520ULL,
0x00bf7bbb7872f485ULL, 0xb1e158f0ab0bdbe6ULL, 0x4f36eccaae5f3211ULL,
0xfe68cf817d261d72ULL, 0x2d8aaa5d08ad6cd7ULL, 0x9cd48916dbd443b4ULL,
0x3b1042ae30c2a0feULL, 0x8a4e61e5e3bb8f9dULL, 0x59ac04399630fe38ULL,
0xe8f227724549d15bULL, 0x155d4f064fe002b5ULL, 0xa4036c4d9c992dd6ULL,
0x77e10991e9125c73ULL, 0xc6bf2ada3a6b7310ULL, 0x617be162d17d905aULL,
0xd025c2290204bf39ULL, 0x03c7a7f5778fce9cULL, 0xb29984bea4f6e1ffULL,
0x4c4e3084a1a20808ULL, 0xfd1013cf72db276bULL, 0x2ef27613075056ceULL,
0x9fac5558d42979adULL, 0x38689ee03f3f9ae7ULL, 0x8936bdabec46b584ULL,
0x5ad4d87799cdc421ULL, 0xeb8afb3c4ab4eb42ULL, 0x10d42bd45fe74c9eULL,
0xa18a089f8c9e63fdULL, 0x72686d43f9151258ULL, 0xc3364e082a6c3d3bULL,
0x64f285b0c17ade71ULL, 0xd5aca6fb1203f112ULL, 0x064ec327678880b7ULL,
0xb710e06cb4f1afd4ULL, 0x49c75456b1a54623ULL, 0xf899771d62dc6940ULL,
0x2b7b12c1175718e5ULL, 0x9a25318ac42e3786ULL, 0x3de1fa322f38d4ccULL,
0x8cbfd979fc41fbafULL, 0x5f5dbca589ca8a0aULL, 0xee039fee5ab3a569ULL,
0x13acf79a501a7687ULL, 0xa2f2d4d1836359e4ULL, 0x7110b10df6e82841ULL,
0xc04e924625910722ULL, 0x678a59fece87e468ULL, 0xd6d47ab51dfecb0bULL,
0x05361f696875baaeULL, 0xb4683c22bb0c95cdULL, 0x4abf8818be587c3aULL,
0xfbe1ab536d215359ULL, 0x2803ce8f18aa22fcULL, 0x995dedc4cbd30d9fULL,
0x3e99267c20c5eed5ULL, 0x8fc70537f3bcc1b6ULL, 0x5c2560eb8637b013ULL,
0xed7b43a0554e9f70ULL
};
static unsigned long long U[256] =
{ 0x0000000000000000ULL, 0x079343d61ab9f60eULL, 0x0f2687ac3573ec1cULL,
0x08b5c47a2fca1a12ULL, 0x1e4d0f586ae7d838ULL, 0x19de4c8e705e2e36ULL,
0x116b88f45f943424ULL, 0x16f8cb22452dc22aULL, 0x3c9a1eb0d5cfb070ULL,
0x3b095d66cf76467eULL, 0x33bc991ce0bc5c6cULL, 0x342fdacafa05aa62ULL,
0x22d711e8bf286848ULL, 0x2544523ea5919e46ULL, 0x2df196448a5b8454ULL,
0x2a62d59290e2725aULL, 0x79343d61ab9f60e0ULL, 0x7ea77eb7b12696eeULL,
0x7612bacd9eec8cfcULL, 0x7181f91b84557af2ULL, 0x67793239c178b8d8ULL,
0x60ea71efdbc14ed6ULL, 0x685fb595f40b54c4ULL, 0x6fccf643eeb2a2caULL,
0x45ae23d17e50d090ULL, 0x423d600764e9269eULL, 0x4a88a47d4b233c8cULL,
0x4d1be7ab519aca82ULL, 0x5be32c8914b708a8ULL, 0x5c706f5f0e0efea6ULL,
0x54c5ab2521c4e4b4ULL, 0x5356e8f33b7d12baULL, 0x433659888447eea3ULL,
0x44a51a5e9efe18adULL, 0x4c10de24b13402bfULL, 0x4b839df2ab8df4b1ULL,
0x5d7b56d0eea0369bULL, 0x5ae81506f419c095ULL, 0x525dd17cdbd3da87ULL,
0x55ce92aac16a2c89ULL, 0x7fac473851885ed3ULL, 0x783f04ee4b31a8ddULL,
0x708ac09464fbb2cfULL, 0x771983427e4244c1ULL, 0x61e148603b6f86ebULL,
0x66720bb621d670e5ULL, 0x6ec7cfcc0e1c6af7ULL, 0x69548c1a14a59cf9ULL,
0x3a0264e92fd88e43ULL, 0x3d91273f3561784dULL, 0x3524e3451aab625fULL,
0x32b7a09300129451ULL, 0x244f6bb1453f567bULL, 0x23dc28675f86a075ULL,
0x2b69ec1d704cba67ULL, 0x2cfaafcb6af54c69ULL, 0x06987a59fa173e33ULL,
0x010b398fe0aec83dULL, 0x09befdf5cf64d22fULL, 0x0e2dbe23d5dd2421ULL,
0x18d5750190f0e60bULL, 0x1f4636d78a491005ULL, 0x17f3f2ada5830a17ULL,
0x1060b17bbf3afc19ULL, 0x3732905adbf6f225ULL, 0x30a1d38cc14f042bULL,
0x381417f6ee851e39ULL, 0x3f875420f43ce837ULL, 0x297f9f02b1112a1dULL,
0x2eecdcd4aba8dc13ULL, 0x265918ae8462c601ULL, 0x21ca5b789edb300fULL,
0x0ba88eea0e394255ULL, 0x0c3bcd3c1480b45bULL, 0x048e09463b4aae49ULL,
0x031d4a9021f35847ULL, 0x15e581b264de9a6dULL, 0x1276c2647e676c63ULL,
0x1ac3061e51ad7671ULL, 0x1d5045c84b14807fULL, 0x4e06ad3b706992c5ULL,
0x4995eeed6ad064cbULL, 0x41202a97451a7ed9ULL, 0x46b369415fa388d7ULL,
0x504ba2631a8e4afdULL, 0x57d8e1b50037bcf3ULL, 0x5f6d25cf2ffda6e1ULL,
0x58fe6619354450efULL, 0x729cb38ba5a622b5ULL, 0x750ff05dbf1fd4bbULL,
0x7dba342790d5cea9ULL, 0x7a2977f18a6c38a7ULL, 0x6cd1bcd3cf41fa8dULL,
0x6b42ff05d5f80c83ULL, 0x63f73b7ffa321691ULL, 0x646478a9e08be09fULL,
0x7404c9d25fb11c86ULL, 0x73978a044508ea88ULL, 0x7b224e7e6ac2f09aULL,
0x7cb10da8707b0694ULL, 0x6a49c68a3556c4beULL, 0x6dda855c2fef32b0ULL,
0x656f4126002528a2ULL, 0x62fc02f01a9cdeacULL, 0x489ed7628a7eacf6ULL,
0x4f0d94b490c75af8ULL, 0x47b850cebf0d40eaULL, 0x402b1318a5b4b6e4ULL,
0x56d3d83ae09974ceULL, 0x51409becfa2082c0ULL, 0x59f55f96d5ea98d2ULL,
0x5e661c40cf536edcULL, 0x0d30f4b3f42e7c66ULL, 0x0aa3b765ee978a68ULL,
0x0216731fc15d907aULL, 0x058530c9dbe46674ULL, 0x137dfbeb9ec9a45eULL,
0x14eeb83d84705250ULL, 0x1c5b7c47abba4842ULL, 0x1bc83f91b103be4cULL,
0x31aaea0321e1cc16ULL, 0x3639a9d53b583a18ULL, 0x3e8c6daf1492200aULL,
0x391f2e790e2bd604ULL, 0x2fe7e55b4b06142eULL, 0x2874a68d51bfe220ULL,
0x20c162f77e75f832ULL, 0x2752212164cc0e3cULL, 0x6e6520b5b7ede44aULL,
0x69f66363ad541244ULL, 0x6143a719829e0856ULL, 0x66d0e4cf9827fe58ULL,
0x70282feddd0a3c72ULL, 0x77bb6c3bc7b3ca7cULL, 0x7f0ea841e879d06eULL,
0x789deb97f2c02660ULL, 0x52ff3e056222543aULL, 0x556c7dd3789ba234ULL,
0x5dd9b9a95751b826ULL, 0x5a4afa7f4de84e28ULL, 0x4cb2315d08c58c02ULL,
0x4b21728b127c7a0cULL, 0x4394b6f13db6601eULL, 0x4407f527270f9610ULL,
0x17511dd41c7284aaULL, 0x10c25e0206cb72a4ULL, 0x18779a78290168b6ULL,
0x1fe4d9ae33b89eb8ULL, 0x091c128c76955c92ULL, 0x0e8f515a6c2caa9cULL,
0x063a952043e6b08eULL, 0x01a9d6f6595f4680ULL, 0x2bcb0364c9bd34daULL,
0x2c5840b2d304c2d4ULL, 0x24ed84c8fcced8c6ULL, 0x237ec71ee6772ec8ULL,
0x35860c3ca35aece2ULL, 0x32154feab9e31aecULL, 0x3aa08b90962900feULL,
0x3d33c8468c90f6f0ULL, 0x2d53793d33aa0ae9ULL, 0x2ac03aeb2913fce7ULL,
0x2275fe9106d9e6f5ULL, 0x25e6bd471c6010fbULL, 0x331e7665594dd2d1ULL,
0x348d35b343f424dfULL, 0x3c38f1c96c3e3ecdULL, 0x3babb21f7687c8c3ULL,
0x11c9678de665ba99ULL, 0x165a245bfcdc4c97ULL, 0x1eefe021d3165685ULL,
0x197ca3f7c9afa08bULL, 0x0f8468d58c8262a1ULL, 0x08172b03963b94afULL,
0x00a2ef79b9f18ebdULL, 0x0731acafa34878b3ULL, 0x5467445c98356a09ULL,
0x53f4078a828c9c07ULL, 0x5b41c3f0ad468615ULL, 0x5cd28026b7ff701bULL,
0x4a2a4b04f2d2b231ULL, 0x4db908d2e86b443fULL, 0x450ccca8c7a15e2dULL,
0x429f8f7edd18a823ULL, 0x68fd5aec4dfada79ULL, 0x6f6e193a57432c77ULL,
0x67dbdd4078893665ULL, 0x60489e966230c06bULL, 0x76b055b4271d0241ULL,
0x712316623da4f44fULL, 0x7996d218126eee5dULL, 0x7e0591ce08d71853ULL,
0x5957b0ef6c1b166fULL, 0x5ec4f33976a2e061ULL, 0x567137435968fa73ULL,
0x51e2749543d10c7dULL, 0x471abfb706fcce57ULL, 0x4089fc611c453859ULL,
0x483c381b338f224bULL, 0x4faf7bcd2936d445ULL, 0x65cdae5fb9d4a61fULL,
0x625eed89a36d5011ULL, 0x6aeb29f38ca74a03ULL, 0x6d786a25961ebc0dULL,
0x7b80a107d3337e27ULL, 0x7c13e2d1c98a8829ULL, 0x74a626abe640923bULL,
0x7335657dfcf96435ULL, 0x20638d8ec784768fULL, 0x27f0ce58dd3d8081ULL,
0x2f450a22f2f79a93ULL, 0x28d649f4e84e6c9dULL, 0x3e2e82d6ad63aeb7ULL,
0x39bdc100b7da58b9ULL, 0x3108057a981042abULL, 0x369b46ac82a9b4a5ULL,
0x1cf9933e124bc6ffULL, 0x1b6ad0e808f230f1ULL, 0x13df149227382ae3ULL,
0x144c57443d81dcedULL, 0x02b49c6678ac1ec7ULL, 0x0527dfb06215e8c9ULL,
0x0d921bca4ddff2dbULL, 0x0a01581c576604d5ULL, 0x1a61e967e85cf8ccULL,
0x1df2aab1f2e50ec2ULL, 0x15476ecbdd2f14d0ULL, 0x12d42d1dc796e2deULL,
0x042ce63f82bb20f4ULL, 0x03bfa5e99802d6faULL, 0x0b0a6193b7c8cce8ULL,
0x0c992245ad713ae6ULL, 0x26fbf7d73d9348bcULL, 0x2168b401272abeb2ULL,
0x29dd707b08e0a4a0ULL, 0x2e4e33ad125952aeULL, 0x38b6f88f57749084ULL,
0x3f25bb594dcd668aULL, 0x37907f2362077c98ULL, 0x30033cf578be8a96ULL,
0x6355d40643c3982cULL, 0x64c697d0597a6e22ULL, 0x6c7353aa76b07430ULL,
0x6be0107c6c09823eULL, 0x7d18db5e29244014ULL, 0x7a8b9888339db61aULL,
0x723e5cf21c57ac08ULL, 0x75ad1f2406ee5a06ULL, 0x5fcfcab6960c285cULL,
0x585c89608cb5de52ULL, 0x50e94d1aa37fc440ULL, 0x577a0eccb9c6324eULL,
0x4182c5eefcebf064ULL, 0x46118638e652066aULL, 0x4ea44242c9981c78ULL,
0x49370194d321ea76ULL
};
static unsigned char rabin_window[RABIN_WINDOW_SIZE];
static unsigned rabin_pos = 0;
static void rabin_reset();
static u_int64_t rabin_slide(u_int64_t fp, unsigned char m);
#define MIN(x,y) ((y)<(x) ? (y) : (x))
#define MAX(x,y) ((y)>(x) ? (y) : (x))
/* FIXME: There must be a better way to do this... */
#if !defined(_BIG_ENDIAN) && defined(__BIG_ENDIAN) && defined(__BYTE_ORDER)
static const int big_endian = (__BYTE_ORDER == __BIG_ENDIAN);
#elif !defined(_BIG_ENDIAN) && !defined(_LITTLE_ENDIAN)
#error "Exactly one of _BIG_ENDIAN or _LITTLE_ENDIAN must be defined"
#elif defined(_BIG_ENDIAN) && defined(_LITTLE_ENDIAN)
#error "Only one of _BIG_ENDIAN or _LITTLE_ENDIAN may be defined"
#elif defined(_BIG_ENDIAN)
static const int big_endian = 1;
#else
static cont int big_endian = 0;
#endif
/* The copies array is the central data structure for
diff generation. Data statements are implicit,
for ranges not covered by any copy command.
The sum of tgt and length for each entry must be
monotonically increasing, and data ranges
must be non-overlapping. This is accomplished by
not extending matches backwards during initial matching.
Copies may have zero length, to make it quick to
delete copies during optimization. However, the last
copy in the list must always be a non-trivial copy.
Before committing copies, an important optimization
is performed: during a backward pass through the copies array,
each entry is extended backwards, and redundant copies are
eliminated.
If each match were extended backwards on insertion, the same
data may be matched an arbitrary number of times, resulting in
potentially quadratic time behavior.
*/
typedef struct copyinfo
{ unsigned src;
unsigned tgt;
unsigned length;
} CopyInfo;
static CopyInfo *copies;
static int copy_count = 0;
static unsigned max_copies = 0; /* Dynamically increased */
static unsigned *idx;
static unsigned idx_size;
static unsigned char *idx_data;
static unsigned idx_data_len;
static void
rabin_reset ()
{ bzero (rabin_window, sizeof (rabin_window));
}
static u_int64_t
rabin_slide (u_int64_t fp, unsigned char m)
{ unsigned char om;
if (++rabin_pos == RABIN_WINDOW_SIZE) rabin_pos = 0;
om = rabin_window[rabin_pos];
fp ^= U[om];
rabin_window[rabin_pos] = m;
fp = ((fp << 8) | m) ^ T[fp >> RABIN_SHIFT];
return fp;
}
void init_idx (unsigned char *data, size_t len, int level)
{ static unsigned index_step
= RABIN_WINDOW_SIZE / sizeof (unsigned) * sizeof (unsigned);
size_t j, k;
unsigned char ch = 0;
unsigned maxofs[256];
unsigned maxlen[256];
unsigned maxfp[256];
unsigned runlen = 0;
u_int64_t fp = 0;
assert (len <= MAX_SIZE);
assert (level >= 0 && level <= 9);
bzero (maxofs, 256 * sizeof (unsigned));
bzero (maxlen, 256 * sizeof (unsigned));
bzero (maxfp, 256 * sizeof (unsigned));
/* index_step must be multiple of word size */
if (level >= 1)
{ index_step = MIN (index_step, 4 * sizeof (unsigned));
/* Use smaller step size for higher optimization levels or smaller files */
if (level >= 3 || len <= 65536)
{ index_step = MIN (index_step, 3 * sizeof (unsigned));
}
if (level >= 4 || len <= 32768)
{ index_step = MIN (index_step, 2 * sizeof (unsigned));
}
if (level >= 6 || len < 16384)
{ index_step = MIN (index_step, 1 * sizeof (unsigned));
} }
assert (index_step && !(index_step % sizeof (unsigned)));
/* Add fixed amount to hash table size, as small files will benefit
a lot without using significantly more memory or time. */
idx_size = (level + 1) * (len / index_step) / 2 + MIN_HTAB_SIZE;
idx_size = MIN (idx_size, MAX_HTAB_SIZE - 1); /* So rounding up works */
/* Round up to next power of two, but limit to MAX_HTAB_SIZE. */
{ unsigned s = MIN_HTAB_SIZE;
while (s < idx_size) s += s;
idx_size = s;
}
idx_data = data;
idx_data_len = len;
idx = (unsigned *) calloc (idx_size, sizeof (unsigned));
/* It is tempting to first index higher addresses, so hashes of lower
addresses will get preference in the hash table. However, for
repetitive patterns with a period that is a divisor of the fingerprint
window, this may mean the match is not anchored at the end.
Furthermore, even when using a window length that is prime, the
benefits are small and the irregularity of the first matches being
more important is not worth it. */
rabin_reset();
ch = 0;
runlen = 0;
for (j = 0; j + index_step < len; j += index_step)
{ unsigned char pch = 0;
unsigned hash;
/* hot loop, use word loads. */
for (k = 0; k < index_step; k+= sizeof (unsigned))
{ unsigned w = *((unsigned *) (data + (j + k)));
unsigned n;
for (n = 0; n < sizeof (unsigned); n++)
{ pch = ch;
ch = big_endian ? (w>>24) & 0xff : w & 0xff;
w = big_endian ? (w<<8) : (w>>8);
if (ch != pch) runlen = 0;
runlen++;
fp = rabin_slide (fp, ch);
} }
/* See if there is a word-aligned window-sized run of equal characters */
if (runlen >= RABIN_WINDOW_SIZE + (sizeof (unsigned) - 1))
{ /* Skip ahead to end of run of identical input characters */
while (j + k < len && data[j + k] == ch) { k++; runlen++; }
/* Although matches are usually anchored at the end, in the case
of extended runs of equal characters it is better to anchor after the
first RABIN_WINDOW_SIZE bytes. This allows for quick skip ahead
while matching such runs, avoiding unneeded fingerprint calculations.
Also, when anchoring at the end, matches will be generated after
every word, because the fingerprint stays constant. Even though
all matches would get combined during match optimization,
it wastes time and space.
*/
if (runlen > maxlen[pch] + 4)
{ unsigned ofs;
/* ofs points RABIN_WINDOW_SIZE bytes after the start of the run,
rounded up to the next word */
ofs = j + k - runlen + RABIN_WINDOW_SIZE + (sizeof (unsigned) - 1);
ofs -= (ofs % sizeof (unsigned));
maxofs [pch] = ofs;
maxlen [pch] = runlen;
assert (maxfp[pch] == 0 || maxfp[pch] == (unsigned) fp);
maxfp [pch] = (unsigned) fp;
}
/* Keep input aligned as if no special run processing had taken place */
j += k - (k % index_step) - index_step;
k = index_step;
}
/* Testing showed that avoiding collisions using secondary hashing, or
hash chaining had little effect and is not worth the time. */
hash = ((unsigned) fp) & (idx_size - 1);
idx [hash] = j + k;
}
/* Lastly, index the longest runs of equal characters found before.
This ensures we always match the longerst such runs available. */
for (j = 0; j < 256; j++)
{ if (maxlen[j])
{ idx[maxfp[j] % idx_size] = maxofs[j];
} } }
static unsigned header_length (unsigned srclen, unsigned tgtlen)
{ unsigned len = 0;
assert (srclen <= MAX_SIZE && tgtlen <= MAX_SIZE);
/* GIT headers start with the length of the source and target,
with 7 bits per byte, most significant byte first, and
the high bit indicating continuation. */
while (srclen >= 0x7f) { len++; srclen >>= 7; }
while (tgtlen >= 0x7f) { len++; tgtlen >>= 7; }
return len + 2;
}
static unsigned data_length (unsigned length)
{ assert (length > 0 && length <= MAX_SIZE);
/* Can only include 0x7f data bytes per command */
return (length / 0x7f) * 0x80 + length % 0x7f + 1;
}
static unsigned copy_length (unsigned offset, unsigned length)
{ /* Can only copy 0xffffff bytes per command. For longer commands,
break into pieces of that size. It might be slightly more
efficient to break into pieces of size 0xff0000, but it's not
worth adding complexity for that rare case. */
unsigned osize = !!(offset & 0xff) + !!(offset & 0xff00)
+ !!(offset & 0xff0000) + !!(offset & 0xff000000);
assert (offset < MAX_SIZE && length < MAX_SIZE);
return 1 + (length / 0xffffff ) * (osize + 4) + osize +
+ !!(length & 0xff) + !!(length & 0xff00) + !!(length & 0xff0000);
}
static unsigned process_copies (unsigned char *data, unsigned length)
{ int j;
unsigned ptr = length;
unsigned patch_bytes = 0;
/* Work through the copies backwards, extending each one backwards. */
for (j = copy_count - 1; j >= 0; j--)
{ CopyInfo *copy = copies+j;
unsigned src = copy->src;
unsigned tgt = copy->tgt;
unsigned len = copy->length;
int data_follows;
if (tgt + len > ptr)
{ /* Part of copy already covered by later one, so shorten copy. */
if (ptr < tgt)
{ /* Copy completely disappeared, but guess that a backward extension
might still be useful. This extension is non-contiguous, as it is
irrelevant whether the skipped data would have matched or not.
Be careful to not extend past the beginning of the source. */
unsigned adjust = tgt - ptr;
tgt = ptr;
src = (src < adjust) ? 0 : src - adjust;
copy->tgt = tgt;
copy->src = src;
}
len = ptr - tgt;
}
while (src && tgt && idx_data[src - 1] == data[tgt - 1])
{ src-- ; tgt--; }
len += copy->tgt - tgt;
data_follows = tgt + len < ptr;
if (len < (data_follows ? 16 : 10)) len = 0;
/* A short copy may cost as much as 6 bytes for the copy and
5 as result of an extra data command.
It's not worth having extra copies in order to just save a byte or two.
Being too smart here may hurt later compression as well.
*/
if (len && data_follows)
{ /* Some target data is not covered by the copies, account for
the DATA command that will follow the copy. */
patch_bytes += data_length (ptr - (tgt + len));
}
/* Everything about the copy is known and will not change.
Write back the new information and update the patch size
with the size of the copy instruction. */
copy->length = len;
copy->src = src;
copy->tgt = tgt;
/* Remove empty copies at end of list. */
copy_count -= (!len && j == copy_count - 1);
if (len)
{ /* update patch size for copy command */
patch_bytes += copy_length (src, len);
ptr = tgt;
} }
/* Account for data before first copy */
for (j = 0; j < copy_count; j++)
{ if (copies[j].length)
{ if (copies[j].tgt) patch_bytes += data_length (copies[j].tgt);
break;
} }
/* Case where no copies remain: entire file is a data statement. */
if (!copy_count && length) patch_bytes += data_length (length);
/* Account for header */
patch_bytes += header_length (idx_data_len, length);
return patch_bytes;
}
/* Match data against the current index and record all possible copies */
static int find_copies (unsigned char *data, size_t len)
{ size_t j = 0;
u_int64_t fp = 0;
unsigned w = 0; /* shift register for quick content verification */
rabin_reset ();
while (j < RABIN_WINDOW_SIZE && j < len)
{ unsigned char ch = data[j++];
fp = rabin_slide (fp, ch);
w = big_endian ? w<<8 | ch : w>>8 | ch<<((sizeof (w) - 1) * 8);
}
while (j < len)
{ unsigned char ch = data[j++];
unsigned hash, ofs;
fp = rabin_slide (fp, ch);
hash = fp & (idx_size - 1);
ofs = idx[hash];
w = big_endian ? w<<8 | ch : w>>8 | ch<<((sizeof (w) - 1) * 8);
/* Invariant:
data[0] .. data[j-1] has been processed
w contains last sizeof (unsigned) bytes of processed data
fp is fingerprint of sliding window ending at j-1
ofs is zero or points just past tentative match
ofs is a multiple of index_step
*/
if (ofs && *((unsigned *) (idx_data + ofs - sizeof (w))) == w)
{ /* Found a match. Now try to extend it forward. */
unsigned runlen = sizeof (w);
unsigned tgt = j - runlen;
unsigned src = ofs - runlen;
unsigned maxrun = MIN (idx_data_len - src, len - tgt);
CopyInfo *copy;
if (copy_count == max_copies)
{ max_copies *= 2;
if (!max_copies)
{ max_copies = MAX_COPIES;
copies = malloc(max_copies * sizeof (CopyInfo));
}
else
{ copies = realloc(copies, max_copies * sizeof (CopyInfo));
}
if (!copies) return 0;
}
copy = copies + copy_count;
/* Hot loop */
while (runlen < maxrun && data[tgt + runlen] == idx_data[src + runlen])
{ runlen++; }
copy->src = src;
copy->tgt = tgt;
copy->length = runlen;
copy_count++;
/* For runs extending more than RABIN_WINDOW_SIZE bytes beyond j,
skip ahead to prevent useless fingerprint computations. */
if (tgt + runlen > j + RABIN_WINDOW_SIZE)
{ j = tgt + runlen - RABIN_WINDOW_SIZE;
}
/* Quickly scan ahead without looking for matches
until the end of this run */
while (j + sizeof (w) < tgt + runlen) fp = rabin_slide (fp, data[j++]);
while (j < tgt + runlen)
{ unsigned char ch = data[j++];
fp = rabin_slide (fp, ch);
w = big_endian ? w<<8 | ch : w>>8 | ch<<((sizeof (w) - 1) * 8);
} } }
return 1;
}
unsigned calculate_delta (void *to_buf, unsigned long to_size)
{ unsigned delta_size;
assert (to_size < MAX_SIZE);
if (!find_copies ((unsigned char *) to_buf, to_size)) return 0;
delta_size = process_copies ((unsigned char *) to_buf, to_size);
return delta_size;
}
static unsigned char *
write_header (unsigned char *patch, unsigned srclen, unsigned tgtlen)
{
assert (srclen <= MAX_SIZE && tgtlen <= MAX_SIZE);
while (srclen >= 0x7f)
{ *patch++= (srclen & 0x7f) | 0x80;
srclen >>= 7;
}
*patch++ = srclen;
while (tgtlen >= 0x7f)
{ *patch++ = (tgtlen & 0x7f) | 0x80;
tgtlen >>= 7;
}
*patch++ = tgtlen;
return patch;
}
static unsigned char *
write_data (unsigned char *patch, unsigned char *data, unsigned size)
{ assert (size > 0 && size < MAX_SIZE);
/* The return value must be equal to patch + data_length (patch, size).
This correspondence is essential for calculating the patch size. */
/* GIT has no data commands for large data, rest is same as GDIFF */
while (size > 0x7f)
{ *patch++ = (unsigned char) 0x7f;
memcpy (patch, data, 0x7f);
data += 0x7f;
patch += 0x7f;
size -= 0x7f;
}
*patch++ = (unsigned char) size;
memcpy (patch, data, size);
return patch + size;
}
static unsigned char *
write_copy (unsigned char *patch, unsigned offset, unsigned size)
{ /* The return value must be equal to patch + copy_length (patch,offset,size).
This correspondence is essential for calculating the patch size. */
while (size > 0)
{ unsigned chunksize = MIN (0xffffff, size);
unsigned char cmd = 1;
cmd = (cmd<<1) | !!(size & 0xff0000);
cmd = (cmd<<1) | !!(size & 0x00ff00);
cmd = (cmd<<1) | !!(size & 0x0000ff);
cmd = (cmd<<1) | !!(offset & 0xff000000);
cmd = (cmd<<1) | !!(offset & 0x00ff0000);
cmd = (cmd<<1) | !!(offset & 0x0000ff00);
cmd = (cmd<<1) | !!(offset & 0x000000ff);
*patch++ = cmd | 0x80;
if (cmd & 0x01) *patch++ = offset & 0xff;
if (cmd & 0x02) *patch++ = (offset >> 8) & 0xff;
if (cmd & 0x04) *patch++ = (offset >> 16) & 0xff;
if (cmd & 0x08) *patch++ = (offset >> 24) & 0xff;
if (cmd & 0x10) *patch++ = size & 0xff;
if (cmd & 0x20) *patch++ = (size >> 8) & 0xff;
if (cmd & 0x40) *patch++ = (size >> 16) & 0xff;
size -= chunksize;
}
return patch;
}
void*
create_delta (unsigned char *data, unsigned len, unsigned delta_size)
{ unsigned char *delta = (unsigned char *) malloc (delta_size);
unsigned char *ptr = delta;
unsigned offset = 0;
unsigned data_commands = 0;
unsigned copy_commands = 0;
int j;
ptr = write_header (ptr, idx_data_len, len);
for (j = 0; j < copy_count; j++)
{ CopyInfo *copy = copies + j;
unsigned copylen = copy->length;
if (copylen)
{ if (copy->tgt > offset)
{ assert (delta_size - (ptr - delta) > data_length (copy->tgt - offset));
ptr = write_data (ptr, data + offset, copy->tgt - offset);
data_commands++;
}
assert (delta_size - (ptr - delta) >= copy_length (copy->src, copylen));
ptr = write_copy (ptr, copy->src, copylen);
copy_commands++;
offset = copy->tgt + copylen;
} }
if (offset < len)
{ assert (delta_size - (ptr - delta) >= data_length (len - offset));
ptr = write_data (ptr, data + offset, len - offset);
data_commands++;
}
assert (ptr - delta == (int) delta_size);
return delta;
}
void *diff_delta(void *from_buf, unsigned long from_size,
void *to_buf, unsigned long to_size,
unsigned long *delta_size, unsigned long max_size)
{ unsigned dsize;
assert (from_size <= MAX_SIZE && to_size <= MAX_SIZE);
init_idx (from_buf, from_size, 1); /* Use optimization level 1 */
dsize = calculate_delta (to_buf, to_size);
if (!dsize) return 0;
*delta_size = dsize;
return create_delta (to_buf, to_size, *delta_size);
}
^ permalink raw reply [flat|nested] 32+ messages in thread* Re: RFC: New diff-delta.c implementation 2006-04-21 21:16 RFC: New diff-delta.c implementation Geert Bosch @ 2006-04-22 3:19 ` Nicolas Pitre 2006-04-22 11:04 ` Geert Bosch 2006-04-22 5:21 ` Davide Libenzi ` (2 subsequent siblings) 3 siblings, 1 reply; 32+ messages in thread From: Nicolas Pitre @ 2006-04-22 3:19 UTC (permalink / raw) To: Geert Bosch; +Cc: Git Mailing List, Junio C Hamano On Fri, 21 Apr 2006, Geert Bosch wrote: > I wrote a new binary differencing algorithm that is both faster > and generates smaller deltas than the current implementation. > The format is compatible with that used by patch-delta, so > it should be easy to integrate. It looks really interesting. It ignores the max_size argument but that is trivially fixed. Then it triggers some assertions in the code when running the test suite. > Originally, I wrote this for the GDIFF format, see > http://www.w3.org/TR/NOTE-gdiff-19970901. > The adaptation for GIT format was relatively simple, but is not thoroughly > tested. Some trivial tests look fine but it fail on some others. > The code is not derived from libxdiff, but uses the rabin_slide function > written > by David Mazieres (dm@uun.org). Also the tables are generated using his code. > Finally, this was developed on Darwin, and not a Linux system, so some changes > may be needed. It does compile out of the box on Linux. > Please feel free to play around with this code, and give feedback. > Keep in mind this wasn't originally written for GIT, and C is not > my native language, so don't mind my formatting etc. I did reformat it a bit to be more inline with the rest of GIT's coding style (and to help me read it). I'll look at fixing the issues I can fix and post it back. Nicolas ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: RFC: New diff-delta.c implementation 2006-04-22 3:19 ` Nicolas Pitre @ 2006-04-22 11:04 ` Geert Bosch 2006-04-22 11:13 ` Junio C Hamano 0 siblings, 1 reply; 32+ messages in thread From: Geert Bosch @ 2006-04-22 11:04 UTC (permalink / raw) To: Nicolas Pitre; +Cc: Git Mailing List, Junio C Hamano [-- Attachment #1: Type: text/plain, Size: 854 bytes --] On Apr 21, 2006, at 23:19, Nicolas Pitre wrote: > It looks really interesting. > > It ignores the max_size argument but that is trivially fixed. > > Then it triggers some assertions in the code when running the test > suite. Yes, these were errors in the change to the GIT output format. They were trivially fixed. More importantly, I didn't finalize the indexing data, since the code was originally used in a stand-alone program that would terminate after the diff. > I did reformat it a bit to be more inline with the rest of GIT's > coding > style (and to help me read it). I'll look at fixing the issues I can > fix and post it back. Please apply the attached patch first. -Geert BTW. It's a shame that we don't reuse the index when comparing one source against multiple targets. Creating the index takes about 70% of the time. [-- Attachment #2: pat --] [-- Type: application/octet-stream, Size: 3585 bytes --] --- diff-delta.c.orig 2006-04-22 06:53:44.000000000 -0400 +++ diff-delta.c 2006-04-22 06:59:14.000000000 -0400 @@ -29,7 +29,7 @@ #define RABIN_WINDOW_SIZE 22 #define RABIN_SHIFT 55 -static unsigned long long T[256] = +static const unsigned long long T[256] = { 0x0000000000000000ULL, 0xb15e234bd3792f63ULL, 0x62bc4697a6f25ec6ULL, 0xd3e265dc758b71a5ULL, 0x7426ae649e9d92efULL, 0xc5788d2f4de4bd8cULL, 0x169ae8f3386fcc29ULL, 0xa7c4cbb8eb16e34aULL, 0x59137f82ee420abdULL, @@ -118,7 +118,7 @@ 0xed7b43a0554e9f70ULL }; -static unsigned long long U[256] = +static const unsigned long long U[256] = { 0x0000000000000000ULL, 0x079343d61ab9f60eULL, 0x0f2687ac3573ec1cULL, 0x08b5c47a2fca1a12ULL, 0x1e4d0f586ae7d838ULL, 0x19de4c8e705e2e36ULL, 0x116b88f45f943424ULL, 0x16f8cb22452dc22aULL, 0x3c9a1eb0d5cfb070ULL, @@ -261,7 +261,7 @@ static int copy_count = 0; static unsigned max_copies = 0; /* Dynamically increased */ -static unsigned *idx; +static unsigned *idx = 0; static unsigned idx_size; static unsigned char *idx_data; static unsigned idx_data_len; @@ -406,6 +406,19 @@ { idx[maxfp[j] % idx_size] = maxofs[j]; } } } +static void finalize_idx() +{ if (max_copies > MAX_COPIES) + { free (copies); + max_copies = 0; + copy_count = 0; + } + if (idx) free (idx); + idx = 0; + idx_size = 0; + idx_data = 0; + idx_data_len = 0; +} + static unsigned header_length (unsigned srclen, unsigned tgtlen) { unsigned len = 0; assert (srclen <= MAX_SIZE && tgtlen <= MAX_SIZE); @@ -420,10 +433,12 @@ } static unsigned data_length (unsigned length) -{ assert (length > 0 && length <= MAX_SIZE); +{ unsigned len = length % 0x7f; + assert (length > 0 && length <= MAX_SIZE); /* Can only include 0x7f data bytes per command */ - return (length / 0x7f) * 0x80 + length % 0x7f + 1; + if (len) len++; + return len + (length / 0x7f) * 0x80; } static unsigned copy_length (unsigned offset, unsigned length) @@ -624,8 +639,7 @@ static unsigned char * write_header (unsigned char *patch, unsigned srclen, unsigned tgtlen) -{ - assert (srclen <= MAX_SIZE && tgtlen <= MAX_SIZE); +{ assert (srclen <= MAX_SIZE && tgtlen <= MAX_SIZE); while (srclen >= 0x7f) { *patch++= (srclen & 0x7f) | 0x80; @@ -658,8 +672,9 @@ *patch++ = (unsigned char) size; memcpy (patch, data, size); + patch += size; - return patch + size; + return patch; } static unsigned char * @@ -687,6 +702,7 @@ if (cmd & 0x40) *patch++ = (size >> 16) & 0xff; size -= chunksize; } + return patch; } @@ -699,7 +715,9 @@ unsigned copy_commands = 0; int j; + assert (delta_size - (ptr - delta) >= header_length (idx_data_len, len)); ptr = write_header (ptr, idx_data_len, len); + assert (ptr - delta > 0); for (j = 0; j < copy_count; j++) { CopyInfo *copy = copies + j; @@ -732,11 +750,17 @@ void *diff_delta(void *from_buf, unsigned long from_size, void *to_buf, unsigned long to_size, unsigned long *delta_size, unsigned long max_size) -{ unsigned dsize; +{ unsigned char *delta = 0; + unsigned dsize; assert (from_size <= MAX_SIZE && to_size <= MAX_SIZE); init_idx (from_buf, from_size, 1); /* Use optimization level 1 */ + dsize = calculate_delta (to_buf, to_size); - if (!dsize) return 0; - *delta_size = dsize; - return create_delta (to_buf, to_size, *delta_size); + if (dsize && (!max_size || dsize <= max_size)) + { delta = create_delta (to_buf, to_size, dsize); + *delta_size = delta ? dsize : 0; + } + + finalize_idx (); + return delta; } ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: RFC: New diff-delta.c implementation 2006-04-22 11:04 ` Geert Bosch @ 2006-04-22 11:13 ` Junio C Hamano 2006-04-22 12:35 ` Geert Bosch 2006-04-22 12:45 ` Nicolas Pitre 0 siblings, 2 replies; 32+ messages in thread From: Junio C Hamano @ 2006-04-22 11:13 UTC (permalink / raw) To: Geert Bosch; +Cc: Nicolas Pitre, Git Mailing List Geert Bosch <bosch@adacore.com> writes: > BTW. It's a shame that we don't reuse the index when comparing one > source > against multiple targets. Creating the index takes about 70% of > the time. (Please line-wrap sensibly). I think we tried that with Nico/Davide's delta already, and IIRC we had mixed results. It really depends on how big an index for a source is. Keep in mind that we keep --window (default=10) of the source text in-core, and you are suggesting to keep index in-core as well, so we need to take memory pressure into account. Having said that, the initial number you posted suggests the algorithm is very fast, in which case reusing index may not matter. We will see ;-). ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: RFC: New diff-delta.c implementation 2006-04-22 11:13 ` Junio C Hamano @ 2006-04-22 12:35 ` Geert Bosch 2006-04-22 12:51 ` Nicolas Pitre 2006-04-22 12:45 ` Nicolas Pitre 1 sibling, 1 reply; 32+ messages in thread From: Geert Bosch @ 2006-04-22 12:35 UTC (permalink / raw) To: Junio C Hamano; +Cc: Nicolas Pitre, Git Mailing List On Apr 22, 2006, at 07:13, Junio C Hamano wrote: > (Please line-wrap sensibly). Apologies. I'm using a broken mailer, which insists on breaking longish lines. I'll try to avoid this. > I think we tried that with Nico/Davide's delta already, and IIRC > we had mixed results. > > It really depends on how big an index for a source is. Keep > in mind that we keep --window (default=10) of the source > text in-core, and you are suggesting to keep index in-core > as well, so we need to take memory pressure into account. The point is that if we do the following comparisons: diff A B diff A C diff A D then we should keep A and its index in core for all three comparisons. Discarding and recreating can never be better :) > Having said that, the initial number you posted suggests the > algorithm is very fast, in which case reusing index may not > matter. We will see ;-). BTW, after applying the obvious fixes, I get the following message: potomac%./git-pack-objects --no-reuse-delta --stdout <lst Generating pack... Done counting 16984 objects. Deltifying 16984 objects. 100% (16984/16984) done fatal: delta size changed Is this expected now I have a different algorithm? -Geert ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: RFC: New diff-delta.c implementation 2006-04-22 12:35 ` Geert Bosch @ 2006-04-22 12:51 ` Nicolas Pitre 2006-04-22 13:39 ` Geert Bosch 0 siblings, 1 reply; 32+ messages in thread From: Nicolas Pitre @ 2006-04-22 12:51 UTC (permalink / raw) To: Geert Bosch; +Cc: Junio C Hamano, Git Mailing List On Sat, 22 Apr 2006, Geert Bosch wrote: > BTW, after applying the obvious fixes, I get the following > message: > potomac%./git-pack-objects --no-reuse-delta --stdout <lst > Generating pack... > Done counting 16984 objects. > Deltifying 16984 objects. > 100% (16984/16984) done > fatal: delta size changed > > Is this expected now I have a different algorithm? It should not. First, pack-objects tries to find the best object combinations producing the smallest delta. Then there is a second pass where the best delta are actually written out. When that message appears that means the delta size for the same object pair does not match between those two passes. Nicolas ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: RFC: New diff-delta.c implementation 2006-04-22 12:51 ` Nicolas Pitre @ 2006-04-22 13:39 ` Geert Bosch 2006-04-22 17:03 ` Junio C Hamano 0 siblings, 1 reply; 32+ messages in thread From: Geert Bosch @ 2006-04-22 13:39 UTC (permalink / raw) To: Nicolas Pitre; +Cc: Junio C Hamano, Git Mailing List On Apr 22, 2006, at 08:51, Nicolas Pitre wrote: > First, pack-objects tries to find the best object combinations > producing the smallest delta. Then there is a second pass > where the best delta are actually written out. When that > message appears that means the delta size for the same object > pair does not match between those two passes. OK, thanks for that info. There are very few comments in the code, or specs of either the file format used, or for function arguments. I'll look a the code again with this info. What is the exact role of the max_size parameter that is passed to diff_delta? I took it to mean return 0 if the size of the delta would be bigger than max_size and max_size is nonzero. I only set *delta_size when returning a nonzero delta. -Geert ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: RFC: New diff-delta.c implementation 2006-04-22 13:39 ` Geert Bosch @ 2006-04-22 17:03 ` Junio C Hamano 2006-04-22 17:28 ` Geert Bosch 0 siblings, 1 reply; 32+ messages in thread From: Junio C Hamano @ 2006-04-22 17:03 UTC (permalink / raw) To: Geert Bosch; +Cc: Nicolas Pitre, Git Mailing List Geert Bosch <bosch@adacore.com> writes: > On Apr 22, 2006, at 08:51, Nicolas Pitre wrote: >> First, pack-objects tries to find the best object combinations >> producing the smallest delta. Then there is a second pass >> where the best delta are actually written out. When that >> message appears that means the delta size for the same object >> pair does not match between those two passes. > > OK, thanks for that info. There are very few comments in the > code, or specs of either the file format used, or > for function arguments. I'll look a the code again with this > info. Initially I thought it would be irrelevant to your work, but generating packs is the only way to really exercise the diff-delta code these days; Documentation/technical/pack-format.txt might help. > What is the exact role of the max_size parameter that is > passed to diff_delta? I took it to mean return 0 if > the size of the delta would be bigger than max_size and > max_size is nonzero. No, that is a _strong_ hint to tell diff_delta to quit early without wasting cycles if the result exceeds the given size, either because we already have a delta smaller than that, or because we expect to get an undeltified representation compressed down to that size. So if your algorithm cannot notice early stage of the processing if the result would exceed that max_size, just code things to ignore it first. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: RFC: New diff-delta.c implementation 2006-04-22 17:03 ` Junio C Hamano @ 2006-04-22 17:28 ` Geert Bosch 2006-04-22 17:57 ` Junio C Hamano 0 siblings, 1 reply; 32+ messages in thread From: Geert Bosch @ 2006-04-22 17:28 UTC (permalink / raw) To: Junio C Hamano; +Cc: Nicolas Pitre, Git Mailing List On Apr 22, 2006, at 13:03, Junio C Hamano wrote: >> What is the exact role of the max_size parameter that >> is passed to diff_delta? I took it to mean return 0 if >> the size of the delta would be bigger than max_size and >> max_size is nonzero. > > No, that is a _strong_ hint to tell diff_delta to quit > early without wasting cycles if the result exceeds the > given size, either because we already have a delta smaller > than that, or because we expect to get an undeltified > representation compressed down to that size. So if > your algorithm cannot notice early stage of the processing > if the result would exceed that max_size, just code things > to ignore it first. That's about how I implemented it in my last patch. Is it correct that 0 means that there is no max_size? Should I set *delta_size to 0 when doing an early return, or leave it alone? Note that this really is a micro-optimization, since all the expensive stuff is the indexing and then the matching. Since my algorithm matches both forward and backward, there is no way to know that the patch size can't be optimized until after matching completes, even though it will never actually create the patch or allocate memory for it. -Geert ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: RFC: New diff-delta.c implementation 2006-04-22 17:28 ` Geert Bosch @ 2006-04-22 17:57 ` Junio C Hamano 0 siblings, 0 replies; 32+ messages in thread From: Junio C Hamano @ 2006-04-22 17:57 UTC (permalink / raw) To: Geert Bosch; +Cc: Nicolas Pitre, Git Mailing List Geert Bosch <bosch@adacore.com> writes: > That's about how I implemented it in my last patch. > Is it correct that 0 means that there is no max_size? > Should I set *delta_size to 0 when doing an early > return, or leave it alone? When failing, I think the convention is to return NULL and what you do with *delta_size does not matter, but Nico and pack-objects.c code can answer that better than I can ;-). > Note that this really is a micro-optimization, True. max_size is a hint, and the caller validates the size independently anyway, so ignoring is fine. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: RFC: New diff-delta.c implementation 2006-04-22 11:13 ` Junio C Hamano 2006-04-22 12:35 ` Geert Bosch @ 2006-04-22 12:45 ` Nicolas Pitre 2006-04-22 14:17 ` Geert Bosch 2006-04-22 17:29 ` Junio C Hamano 1 sibling, 2 replies; 32+ messages in thread From: Nicolas Pitre @ 2006-04-22 12:45 UTC (permalink / raw) To: Junio C Hamano; +Cc: Geert Bosch, Git Mailing List On Sat, 22 Apr 2006, Junio C Hamano wrote: > Geert Bosch <bosch@adacore.com> writes: > > > BTW. It's a shame that we don't reuse the index when comparing one > > source > > against multiple targets. Creating the index takes about 70% of > > the time. > > I think we tried that with Nico/Davide's delta already, and IIRC > we had mixed results. Well, actually I was measuring a 10% speed improvement with a quick and naive (not memory efficient) approach for pack-objects with the current algorithm. > It really depends on how big an index for a source is. Keep in > mind that we keep --window (default=10) of the source text > in-core, and you are suggesting to keep index in-core as well, > so we need to take memory pressure into account. The idea to avoid memory pressure is to reverse the window processing such that the object to delta against is constant for the entire window instead of the current logic where the target object is constant. This way there would be only one index in memory at all time. Nicolas ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: RFC: New diff-delta.c implementation 2006-04-22 12:45 ` Nicolas Pitre @ 2006-04-22 14:17 ` Geert Bosch 2006-04-22 17:29 ` Junio C Hamano 1 sibling, 0 replies; 32+ messages in thread From: Geert Bosch @ 2006-04-22 14:17 UTC (permalink / raw) To: Nicolas Pitre; +Cc: Junio C Hamano, Git Mailing List On Apr 22, 2006, at 08:45, Nicolas Pitre wrote: > The idea to avoid memory pressure is to reverse the window processing > such that the object to delta against is constant for the entire > window > instead of the current logic where the target object is constant. > This > way there would be only one index in memory at all time. Right, this is essential. In my measurements, diff-delta spends about 70% of time generating the index, and 30% matching. Right now, for 10 candidates per file, we'd do 11 units of work, since we repeat the final delta. When reusing the index, and keeping the smallest delta around, we'd use 0.7 + 3 = 3.7 units of work. This is almost a 3x speedup. There's no way we can get decent performance without this. With the similarity fingerprints, another factor 2x should be attainable, by only considering the 3 files with the nearest fingerprints. -Geert ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: RFC: New diff-delta.c implementation 2006-04-22 12:45 ` Nicolas Pitre 2006-04-22 14:17 ` Geert Bosch @ 2006-04-22 17:29 ` Junio C Hamano 2006-04-22 19:58 ` Nicolas Pitre 1 sibling, 1 reply; 32+ messages in thread From: Junio C Hamano @ 2006-04-22 17:29 UTC (permalink / raw) To: Nicolas Pitre; +Cc: Geert Bosch, Git Mailing List Nicolas Pitre <nico@cam.org> writes: > Well, actually I was measuring a 10% speed improvement with a quick and > naive (not memory efficient) approach for pack-objects with the current > algorithm. >... > The idea to avoid memory pressure is to reverse the window processing > such that the object to delta against is constant for the entire window > instead of the current logic where the target object is constant. This > way there would be only one index in memory at all time. Your are right. The first led to the latter unexplored idea. I expect to be offline most of the day today, and have other things I can work on for the next few days anyway, so if you or somebody else have an inclination and energy to reverse the delta window, I would appreciate that. Maybe the calling convention of diff-delta.c would become something like this? struct delta_index; /* opaque to the caller; implementation * defines what's in it. */ /* returns a newly allocated struct delta_index. * input "buf" pointer can be stored in the struct, but "buf" * does not belong to diff-delta module (i.e. borrowed reference). */ struct delta_index *delta_index( void *buf, /* input: from buffer */ unsigned long size, /* input: from size */ ); /* ... so free the structure and its internal data, but * do not free the borrowed reference! */ void free_delta_index(struct delta_index *); /* Take "from", an already preprocessed delta_index for the * traditional from_buffer/from_size, and to_buf/to_size, and * produce delta in newly allocated buffer (caller should * free() when it is done), and return the result size in * *delta_size. Stop early if the result would exceed max_size. */ void *diff_delta( struct delta_index *from, /* input: prepared by delta_index() */ void *to_buf, /* input: destination buffer */ unsigned long to_size, /* input: destination size */ unsigned long *delta_size, /* output: result size */ unsigned long max_size /* input: do not waste cycles if you cannot generate result smaller than this */ ); and the calling convention would be: struct unpacked *s, *d; unsigned long max_size; /* precompute the index */ struct delta_index *src = delta_index(s->data, s->entry->size); /* do the delta */ void *delta_buf = diff_delta(src, d->data, d->entry->size, &sz, max_size); /* do useful thing here on delta_buf and sz */ free(delta_buf); /* the caller can reuse *src with other *d, * but when it is done... */ free_delta_index(src); ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: RFC: New diff-delta.c implementation 2006-04-22 17:29 ` Junio C Hamano @ 2006-04-22 19:58 ` Nicolas Pitre 0 siblings, 0 replies; 32+ messages in thread From: Nicolas Pitre @ 2006-04-22 19:58 UTC (permalink / raw) To: Junio C Hamano; +Cc: Geert Bosch, Git Mailing List On Sat, 22 Apr 2006, Junio C Hamano wrote: > Nicolas Pitre <nico@cam.org> writes: > > > Well, actually I was measuring a 10% speed improvement with a quick and > > naive (not memory efficient) approach for pack-objects with the current > > algorithm. > >... > > The idea to avoid memory pressure is to reverse the window processing > > such that the object to delta against is constant for the entire window > > instead of the current logic where the target object is constant. This > > way there would be only one index in memory at all time. > > Your are right. The first led to the latter unexplored idea. > > I expect to be offline most of the day today, and have other > things I can work on for the next few days anyway, so if you or > somebody else have an inclination and energy to reverse the > delta window, I would appreciate that. I'll probably give it a try. I'm still reviewing Geert's code right now and found minor things pertaining to the GIT delta encoding here and there which probably explain why it doesn't pack the Linux kernel archive yet. Nicolas ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: RFC: New diff-delta.c implementation 2006-04-21 21:16 RFC: New diff-delta.c implementation Geert Bosch 2006-04-22 3:19 ` Nicolas Pitre @ 2006-04-22 5:21 ` Davide Libenzi 2006-04-22 9:12 ` Geert Bosch 2006-04-22 12:36 ` Rene Scharfe 2006-04-22 20:36 ` Davide Libenzi 3 siblings, 1 reply; 32+ messages in thread From: Davide Libenzi @ 2006-04-22 5:21 UTC (permalink / raw) To: Geert Bosch; +Cc: Git Mailing List, Junio C Hamano On Fri, 21 Apr 2006, Geert Bosch wrote: > I wrote a new binary differencing algorithm that is both faster > and generates smaller deltas than the current implementation. > The format is compatible with that used by patch-delta, so > it should be easy to integrate. > > Originally, I wrote this for the GDIFF format, see > http://www.w3.org/TR/NOTE-gdiff-19970901. > The adaptation for GIT format was relatively simple, but is not thoroughly > tested. > The code is not derived from libxdiff, but uses the rabin_slide function > written > by David Mazieres (dm@uun.org). Also the tables are generated using his code. > Finally, this was developed on Darwin, and not a Linux system, so some > changes may be needed. > > Initial testing seems quite positive, take for example git-1.2.5.tar vs > git-1.2.6.tar > on my PowerBook (both with -O2 -DNDEBUG): > > current: 2.281s, patch size 36563 > new : 0.109s, patch size 16199 Geert, the code needs some works IMO ;), but otherwise very lever idea to use Rabin's polynomials and impressive results! - Davide ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: RFC: New diff-delta.c implementation 2006-04-22 5:21 ` Davide Libenzi @ 2006-04-22 9:12 ` Geert Bosch 0 siblings, 0 replies; 32+ messages in thread From: Geert Bosch @ 2006-04-22 9:12 UTC (permalink / raw) To: Davide Libenzi; +Cc: Git Mailing List, Junio C Hamano On Apr 22, 2006, at 01:21, Davide Libenzi wrote: > Geert, the code needs some works IMO ;), but otherwise very lever > idea to use Rabin's polynomials and impressive results! I'm fixing the obvious mistakes: as the code was lifted from my stand-alone command processing a single source file, I didn't bother freeing memory, as process exit would do that anyway. Also, there are a few issues that cropped up as I adapted the output from GDIFF format to the one used by GIT. I'll post a new version when I'm done. Please withhold further judgment until then. :) -Geert ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: RFC: New diff-delta.c implementation 2006-04-21 21:16 RFC: New diff-delta.c implementation Geert Bosch 2006-04-22 3:19 ` Nicolas Pitre 2006-04-22 5:21 ` Davide Libenzi @ 2006-04-22 12:36 ` Rene Scharfe 2006-04-24 2:57 ` Geert Bosch 2006-04-22 20:36 ` Davide Libenzi 3 siblings, 1 reply; 32+ messages in thread From: Rene Scharfe @ 2006-04-22 12:36 UTC (permalink / raw) To: Geert Bosch; +Cc: Git Mailing List, Junio C Hamano Hello Geert, Geert Bosch schrieb: > I wrote a new binary differencing algorithm that is both faster and > generates smaller deltas than the current implementation. The format > is compatible with that used by patch-delta, so it should be easy to > integrate. [...] > Initial testing seems quite positive, take for example git-1.2.5.tar > vs git-1.2.6.tar on my PowerBook (both with -O2 -DNDEBUG): > > current: 2.281s, patch size 36563 > new : 0.109s, patch size 16199 > > Please feel free to play around with this code, and give feedback. > Keep in mind this wasn't originally written for GIT, and C is not my > native language, so don't mind my formatting etc. nice speedup! Though I cannot comment on what it actually does, I have some comments on style. B-) Could you please send your code inline, not as an attachment? And possibly as a patch with a Signed-off-by: tag (see Documentation/SubmittingPatches)? Regarding your FIXME comment about endianess: I think you are looking for htonl(). Use it to convert the values from host byte order to network byte order (= big endian) and you can get rid of those ugly branches. You can use "indent -npro -kr -i8 -ts8 -l80 -ss -ncs" to reformat your code into a similar style as used in the rest of git (settings taken from Lindent which is shipped with the Linux source). After converting to htonl() "make test" ran fine on my x86 box. Here is what I get when I try to repack the git repo, though: $ git repack -a -d Generating pack... Done counting 18985 objects. Deltifying 18985 objects. git-pack-objects: diff-delta.c:766: create_delta: Assertion `ptr - delta == (int)delta_size' failed. Please let me know if you need more details. Thanks, René ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: RFC: New diff-delta.c implementation 2006-04-22 12:36 ` Rene Scharfe @ 2006-04-24 2:57 ` Geert Bosch 2006-04-24 5:27 ` Nicolas Pitre 2006-04-25 18:22 ` Rene Scharfe 0 siblings, 2 replies; 32+ messages in thread From: Geert Bosch @ 2006-04-24 2:57 UTC (permalink / raw) To: Rene Scharfe; +Cc: Git Mailing List, Junio C Hamano On Sat, Apr 22, 2006 at 02:36:04PM +0200, Rene Scharfe wrote: > Could you please send your code inline, not as an attachment? And > possibly as a patch with a Signed-off-by: tag (see > Documentation/SubmittingPatches)? For various reasons, mostly to do with managing and searching huge mailboxes, I'm using Apple Mail. What sucks though is that automatic line wrapping can't be turned off. This never got fixed, so it's useless for posting inline patches. That said, I now leave a synchronized copy of the git repository on my mailserver and use mutt for this reply. Hopefully things will be better. Note that I sent this code as a RFC, with explicit disclaimers about style. So, I did not want to sign off on this code, since I pretty much knew there would be some problems with the undocumented ("proprietary", according the libxdiff site) file format. In contrast the GDIFF fileformat was documented very well, and I have a version of this code that works flawlessly with that format. > Regarding your FIXME comment about endianess: I think you are looking > for htonl(). Use it to convert the values from host byte order to > network byte order (= big endian) and you can get rid of those ugly > branches. Ah, I'll use that. It's of course a slight change that all processing now is big-endian centric, but that might actually even result in better code in this case. I'm just assuming any decent system has some highly optimized macro for this and will never do a function call. This is used in the most performance critical loops, and doing function calls here will lead to horrendous performance. > > You can use "indent -npro -kr -i8 -ts8 -l80 -ss -ncs" to reformat your > code into a similar style as used in the rest of git (settings taken > from Lindent which is shipped with the Linux source). Although I cringe at 8-space indenting, and find much of the GIT code close to unreadable for lack of design-level comments, I'll gladly reformat any code to conform to existing code standards. Please let me know if you've got documentation on that, as it would be helpful for me to know what the standard is. (No flame intended. :-) > > After converting to htonl() "make test" ran fine on my x86 box. Here is > what I get when I try to repack the git repo, though: > > $ git repack -a -d > Generating pack... > Done counting 18985 objects. > Deltifying 18985 objects. > git-pack-objects: diff-delta.c:766: create_delta: Assertion `ptr - > delta == (int)delta_size' failed. > > Please let me know if you need more details. This was a result of incorrect calculation of the size of copy and data commands. I fixed this in a follow-up patch sent to the list. For any bug reports, they're easiest to fix if you can find a reproducer using test-delta. -Geert ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: RFC: New diff-delta.c implementation 2006-04-24 2:57 ` Geert Bosch @ 2006-04-24 5:27 ` Nicolas Pitre 2006-04-24 15:19 ` Geert Bosch 2006-04-24 18:44 ` Geert Bosch 2006-04-25 18:22 ` Rene Scharfe 1 sibling, 2 replies; 32+ messages in thread From: Nicolas Pitre @ 2006-04-24 5:27 UTC (permalink / raw) To: Geert Bosch; +Cc: Rene Scharfe, Git Mailing List, Junio C Hamano [-- Attachment #1: Type: TEXT/PLAIN, Size: 3400 bytes --] On Sun, 23 Apr 2006, Geert Bosch wrote: > Note that I sent this code as a RFC, with explicit disclaimers about > style. I took the liberty of reworking the coding style, as well as simplifying some constructs a bit. I also fixed all the format encoding and size computations bugs the original code has so it works perfectly now. > So, I did not want to sign off on this code, since I pretty > much knew there would be some problems with the undocumented > ("proprietary", according the libxdiff site) file format. In contrast > the GDIFF fileformat was documented very well, and I have a version > of this code that works flawlessly with that format. Note that the GIT encoding is itself different (a bit denser) from the xdiff encoding. > > Regarding your FIXME comment about endianess: I think you are looking > > for htonl(). Use it to convert the values from host byte order to > > network byte order (= big endian) and you can get rid of those ugly > > branches. > Ah, I'll use that. It's of course a slight change that all processing > now is big-endian centric, but that might actually even result in > better code in this case. I'm just assuming any decent system has > some highly optimized macro for this and will never do a function call. > This is used in the most performance critical loops, and doing function > calls here will lead to horrendous performance. Please just completely drop that word fetch "optimization". Not only does it produce worse assembly code in the end due to the extra loop counter and extra shifting in turn increasing register pressure, but it also assumes that the target data buffer is always word aligned which is not guaranteed (casting a char pointer to an unsigned and dereferencing it is not portable since on some architecture such misaligned accesses are not allowed). And it even has worse performance. For example, with word fetch and shift code, packing the Linux kernel archive on my P4 at 3GHz: $ git-repack -a -f ... real 7m5.727s user 6m33.797s sys 0m31.394s And with the word fetch optimization completely ripped out: $ git-repack -a -f ... real 6m45.443s user 6m13.911s sys 0m31.434s So you must be extremely careful when trying to implement such kind of optimizations. Now disabling all assert() statements gives: $ git-repack -a -f ... real 6m42.478s user 6m12.031s sys 0m31.090s But here comes the sad part. Even after simplifying the code as much as I could, performance is still significantly worse than the current diff-delta.c code. Repacking again the same Linux kernel repository with the current code: $ git-repack -a -f ... real 4m20.742s user 3m52.255s sys 0m28.758s The final pack is smaller with your code but not significantly: 117867049 bytes vs 118824550 bytes with the current code, i.e. less than 1% difference. You can find attached my current version of your code. In the mean time I'll try replacing the adler32 hashing with your Rabin polynomial hashing into the current code just to see if it helps (and I think it will help quite a bit). The best solution might be a mix of both approaches. Of course there is the issue of reversing the object matching window logic in pack-object.c to further improve things, but I consider that an orthogonal improvement at this point which both approaches will benefit from anyway. Nicolas [-- Attachment #2: Type: TEXT/PLAIN, Size: 40628 bytes --] #include <unistd.h> #include <stdlib.h> #include <assert.h> #include <string.h> #include <sys/types.h> #undef assert #define assert(x) do { } while (0) /* * MIN_HTAB_SIZE is fixed amount to be added to the size of the hash table * used for indexing and must be a power of two. This allows for small files * to have a sparse hash table, since in that case it's cheap. * Hash table sizes are rounded up to a power of two to avoid integer division. */ #define MIN_HTAB_SIZE 8192 #define MAX_HTAB_SIZE (1024*1024*1024) /* * Diffing files of gigabyte range is impractical with the current * algorithm, so we're assuming 32-bit sizes everywhere. * Size leaves some room for expansion when diffing random files. */ #define MAX_SIZE (0x7eff0000) /* Initial size of copies table, dynamically extended as needed. */ #define MAX_COPIES 4096 /* * Matching is done using a sliding window for which a Rabin * polynomial is computed. The advantage of such polynomials is * that they can efficiently be updated at every position. * The tables needed for this are precomputed, as it is desirable * to use the same polynomial all the time for repeatable results. */ #if 0 /* those don't seem to work */ #define RABIN_POLY 0x25bd5331c0d7096dULL #define RABIN_DEGREE 61 #define RABIN_SHIFT 53 #define RABIN_WINDOW_SIZE 22 static const u_int64_t T[256] = { 0x6ec7f55241791bb7ULL, 0x96f54cc7035c25b4ULL, 0xb3481ff6c38b2cd9ULL, 0xdd8feaa482f2376eULL, 0xf832b99542253e03ULL, 0x0857cabfc66f4205ULL, 0x2dea998e06b84b68ULL, 0x432d6cdc47c150dfULL, 0x66903fed871659b2ULL, 0x9ea28678c53367b1ULL, 0xbb1fd54905e46edcULL, 0xd5d8201b449d756bULL, 0xf065732a844a7c06ULL, 0x10af957f8cde840aULL, 0x3512c64e4c098d67ULL, 0x5bd5331c0d7096d0ULL, 0x7e68602dcda79fbdULL, 0x865ad9b88f82a1beULL, 0xa3e78a894f55a8d3ULL, 0xcd207fdb0e2cb364ULL, 0xe89d2ceacefbba09ULL, 0x18f85fc04ab1c60fULL, 0x3d450cf18a66cf62ULL, 0x5382f9a3cb1fd4d5ULL, 0x763faa920bc8ddb8ULL, 0x8e0d130749ede3bbULL, 0xabb04036893aead6ULL, 0xc577b564c843f161ULL, 0xe0cae6550894f80cULL, 0x04e279ced96a0179ULL, 0x215f2aff19bd0814ULL, 0x4f98dfad58c413a3ULL, 0x6a258c9c98131aceULL, 0x92173509da3624cdULL, 0xb7aa66381ae12da0ULL, 0xd96d936a5b983617ULL, 0xfcd0c05b9b4f3f7aULL, 0x0cb5b3711f05437cULL, 0x2908e040dfd24a11ULL, 0x47cf15129eab51a6ULL, 0x627246235e7c58cbULL, 0x9a40ffb61c5966c8ULL, 0xbffdac87dc8e6fa5ULL, 0xd13a59d59df77412ULL, 0xf4870ae45d207d7fULL, 0x144decb155b48573ULL, 0x31f0bf8095638c1eULL, 0x5f374ad2d41a97a9ULL, 0x7a8a19e314cd9ec4ULL, 0x82b8a07656e8a0c7ULL, 0xa705f347963fa9aaULL, 0xc9c20615d746b21dULL, 0xec7f55241791bb70ULL, 0x1c1a260e93dbc776ULL, 0x39a7753f530cce1bULL, 0x5760806d1275d5acULL, 0x72ddd35cd2a2dcc1ULL, 0x8aef6ac99087e2c2ULL, 0xaf5239f85050ebafULL, 0xc195ccaa1129f018ULL, 0xe4289f9bd1fef975ULL, 0x09c4f39db2d402f2ULL, 0x2c79a0ac72030b9fULL, 0x42be55fe337a1028ULL, 0x670306cff3ad1945ULL, 0x9f31bf5ab1882746ULL, 0xba8cec6b715f2e2bULL, 0xd44b19393026359cULL, 0xf1f64a08f0f13cf1ULL, 0x0193392274bb40f7ULL, 0x242e6a13b46c499aULL, 0x4ae99f41f515522dULL, 0x6f54cc7035c25b40ULL, 0x976675e577e76543ULL, 0xb2db26d4b7306c2eULL, 0xdc1cd386f6497799ULL, 0xf9a180b7369e7ef4ULL, 0x196b66e23e0a86f8ULL, 0x3cd635d3fedd8f95ULL, 0x5211c081bfa49422ULL, 0x77ac93b07f739d4fULL, 0x8f9e2a253d56a34cULL, 0xaa237914fd81aa21ULL, 0xc4e48c46bcf8b196ULL, 0xe159df777c2fb8fbULL, 0x113cac5df865c4fdULL, 0x3481ff6c38b2cd90ULL, 0x5a460a3e79cbd627ULL, 0x7ffb590fb91cdf4aULL, 0x87c9e09afb39e149ULL, 0xa274b3ab3beee824ULL, 0xccb346f97a97f393ULL, 0xe90e15c8ba40fafeULL, 0x0d268a536bbe038bULL, 0x289bd962ab690ae6ULL, 0x465c2c30ea101151ULL, 0x63e17f012ac7183cULL, 0x9bd3c69468e2263fULL, 0xbe6e95a5a8352f52ULL, 0xd0a960f7e94c34e5ULL, 0xf51433c6299b3d88ULL, 0x057140ecadd1418eULL, 0x20cc13dd6d0648e3ULL, 0x4e0be68f2c7f5354ULL, 0x6bb6b5beeca85a39ULL, 0x93840c2bae8d643aULL, 0xb6395f1a6e5a6d57ULL, 0xd8feaa482f2376e0ULL, 0xfd43f979eff47f8dULL, 0x1d891f2ce7608781ULL, 0x38344c1d27b78eecULL, 0x56f3b94f66ce955bULL, 0x734eea7ea6199c36ULL, 0x8b7c53ebe43ca235ULL, 0xaec100da24ebab58ULL, 0xc006f5886592b0efULL, 0xe5bba6b9a545b982ULL, 0x15ded593210fc584ULL, 0x306386a2e1d8cce9ULL, 0x5ea473f0a0a1d75eULL, 0x7b1920c16076de33ULL, 0x832b99542253e030ULL, 0xa696ca65e284e95dULL, 0xc8513f37a3fdf2eaULL, 0xedec6c06632afb87ULL, 0x1389e73b65a805e4ULL, 0x3634b40aa57f0c89ULL, 0x58f34158e406173eULL, 0x7d4e126924d11e53ULL, 0x857cabfc66f42050ULL, 0xa0c1f8cda623293dULL, 0xce060d9fe75a328aULL, 0xebbb5eae278d3be7ULL, 0x1bde2d84a3c747e1ULL, 0x3e637eb563104e8cULL, 0x50a48be72269553bULL, 0x7519d8d6e2be5c56ULL, 0x8d2b6143a09b6255ULL, 0xa8963272604c6b38ULL, 0xc651c7202135708fULL, 0xe3ec9411e1e279e2ULL, 0x03267244e97681eeULL, 0x269b217529a18883ULL, 0x485cd42768d89334ULL, 0x6de18716a80f9a59ULL, 0x95d33e83ea2aa45aULL, 0xb06e6db22afdad37ULL, 0xdea998e06b84b680ULL, 0xfb14cbd1ab53bfedULL, 0x0b71b8fb2f19c3ebULL, 0x2eccebcaefceca86ULL, 0x400b1e98aeb7d131ULL, 0x65b64da96e60d85cULL, 0x9d84f43c2c45e65fULL, 0xb839a70dec92ef32ULL, 0xd6fe525fadebf485ULL, 0xf343016e6d3cfde8ULL, 0x176b9ef5bcc2049dULL, 0x32d6cdc47c150df0ULL, 0x5c1138963d6c1647ULL, 0x79ac6ba7fdbb1f2aULL, 0x819ed232bf9e2129ULL, 0xa42381037f492844ULL, 0xcae474513e3033f3ULL, 0xef592760fee73a9eULL, 0x1f3c544a7aad4698ULL, 0x3a81077bba7a4ff5ULL, 0x5446f229fb035442ULL, 0x71fba1183bd45d2fULL, 0x89c9188d79f1632cULL, 0xac744bbcb9266a41ULL, 0xc2b3beeef85f71f6ULL, 0xe70eeddf3888789bULL, 0x07c40b8a301c8097ULL, 0x227958bbf0cb89faULL, 0x4cbeade9b1b2924dULL, 0x6903fed871659b20ULL, 0x9131474d3340a523ULL, 0xb48c147cf397ac4eULL, 0xda4be12eb2eeb7f9ULL, 0xfff6b21f7239be94ULL, 0x0f93c135f673c292ULL, 0x2a2e920436a4cbffULL, 0x44e9675677ddd048ULL, 0x61543467b70ad925ULL, 0x99668df2f52fe726ULL, 0xbcdbdec335f8ee4bULL, 0xd21c2b917481f5fcULL, 0xf7a178a0b456fc91ULL, 0x1a4d14a6d77c0716ULL, 0x3ff0479717ab0e7bULL, 0x5137b2c556d215ccULL, 0x748ae1f496051ca1ULL, 0x8cb85861d42022a2ULL, 0xa9050b5014f72bcfULL, 0xc7c2fe02558e3078ULL, 0xe27fad3395593915ULL, 0x121ade1911134513ULL, 0x37a78d28d1c44c7eULL, 0x5960787a90bd57c9ULL, 0x7cdd2b4b506a5ea4ULL, 0x84ef92de124f60a7ULL, 0xa152c1efd29869caULL, 0xcf9534bd93e1727dULL, 0xea28678c53367b10ULL, 0x0ae281d95ba2831cULL, 0x2f5fd2e89b758a71ULL, 0x419827bada0c91c6ULL, 0x6425748b1adb98abULL, 0x9c17cd1e58fea6a8ULL, 0xb9aa9e2f9829afc5ULL, 0xd76d6b7dd950b472ULL, 0xf2d0384c1987bd1fULL, 0x02b54b669dcdc119ULL, 0x270818575d1ac874ULL, 0x49cfed051c63d3c3ULL, 0x6c72be34dcb4daaeULL, 0x944007a19e91e4adULL, 0xb1fd54905e46edc0ULL, 0xdf3aa1c21f3ff677ULL, 0xfa87f2f3dfe8ff1aULL, 0x1eaf6d680e16066fULL, 0x3b123e59cec10f02ULL, 0x55d5cb0b8fb814b5ULL, 0x7068983a4f6f1dd8ULL, 0x885a21af0d4a23dbULL, 0xade7729ecd9d2ab6ULL, 0xc32087cc8ce43101ULL, 0xe69dd4fd4c33386cULL, 0x16f8a7d7c879446aULL, 0x3345f4e608ae4d07ULL, 0x5d8201b449d756b0ULL, 0x783f528589005fddULL, 0x800deb10cb2561deULL, 0xa5b0b8210bf268b3ULL, 0xcb774d734a8b7304ULL, 0xeeca1e428a5c7a69ULL, 0x0e00f81782c88265ULL, 0x2bbdab26421f8b08ULL, 0x457a5e74036690bfULL, 0x60c70d45c3b199d2ULL, 0x98f5b4d08194a7d1ULL, 0xbd48e7e14143aebcULL, 0xd38f12b3003ab50bULL, 0xf6324182c0edbc66ULL, 0x065732a844a7c060ULL, 0x23ea61998470c90dULL, 0x4d2d94cbc509d2baULL, 0x6890c7fa05dedbd7ULL, 0x90a27e6f47fbe5d4ULL, 0xb51f2d5e872cecb9ULL, 0xdbd8d80cc655f70eULL, 0xfe658b3d0682fe63ULL }; static const u_int64_t U[256] = { 0x0000000000000000ULL, 0x067b86c43d6a6cb0ULL, 0x0cf70d887ad4d960ULL, 0x0a8c8b4c47beb5d0ULL, 0x19ee1b10f5a9b2c0ULL, 0x1f959dd4c8c3de70ULL, 0x151916988f7d6ba0ULL, 0x1362905cb2170710ULL, 0x166165102b846cedULL, 0x101ae3d416ee005dULL, 0x1a9668985150b58dULL, 0x1cedee5c6c3ad93dULL, 0x0f8f7e00de2dde2dULL, 0x09f4f8c4e347b29dULL, 0x03787388a4f9074dULL, 0x0503f54c99936bfdULL, 0x097f991197dfd0b7ULL, 0x0f041fd5aab5bc07ULL, 0x05889499ed0b09d7ULL, 0x03f3125dd0616567ULL, 0x1091820162766277ULL, 0x16ea04c55f1c0ec7ULL, 0x1c668f8918a2bb17ULL, 0x1a1d094d25c8d7a7ULL, 0x1f1efc01bc5bbc5aULL, 0x19657ac58131d0eaULL, 0x13e9f189c68f653aULL, 0x1592774dfbe5098aULL, 0x06f0e71149f20e9aULL, 0x008b61d57498622aULL, 0x0a07ea993326d7faULL, 0x0c7c6c5d0e4cbb4aULL, 0x12ff32232fbfa16eULL, 0x1484b4e712d5cddeULL, 0x1e083fab556b780eULL, 0x1873b96f680114beULL, 0x0b112933da1613aeULL, 0x0d6aaff7e77c7f1eULL, 0x07e624bba0c2caceULL, 0x019da27f9da8a67eULL, 0x049e5733043bcd83ULL, 0x02e5d1f73951a133ULL, 0x08695abb7eef14e3ULL, 0x0e12dc7f43857853ULL, 0x1d704c23f1927f43ULL, 0x1b0bcae7ccf813f3ULL, 0x118741ab8b46a623ULL, 0x17fcc76fb62cca93ULL, 0x1b80ab32b86071d9ULL, 0x1dfb2df6850a1d69ULL, 0x1777a6bac2b4a8b9ULL, 0x110c207effdec409ULL, 0x026eb0224dc9c319ULL, 0x041536e670a3afa9ULL, 0x0e99bdaa371d1a79ULL, 0x08e23b6e0a7776c9ULL, 0x0de1ce2293e41d34ULL, 0x0b9a48e6ae8e7184ULL, 0x0116c3aae930c454ULL, 0x076d456ed45aa8e4ULL, 0x140fd532664daff4ULL, 0x127453f65b27c344ULL, 0x18f8d8ba1c997694ULL, 0x1e835e7e21f31a24ULL, 0x004337779fa84bb1ULL, 0x0638b1b3a2c22701ULL, 0x0cb43affe57c92d1ULL, 0x0acfbc3bd816fe61ULL, 0x19ad2c676a01f971ULL, 0x1fd6aaa3576b95c1ULL, 0x155a21ef10d52011ULL, 0x1321a72b2dbf4ca1ULL, 0x16225267b42c275cULL, 0x1059d4a389464becULL, 0x1ad55fefcef8fe3cULL, 0x1caed92bf392928cULL, 0x0fcc49774185959cULL, 0x09b7cfb37ceff92cULL, 0x033b44ff3b514cfcULL, 0x0540c23b063b204cULL, 0x093cae6608779b06ULL, 0x0f4728a2351df7b6ULL, 0x05cba3ee72a34266ULL, 0x03b0252a4fc92ed6ULL, 0x10d2b576fdde29c6ULL, 0x16a933b2c0b44576ULL, 0x1c25b8fe870af0a6ULL, 0x1a5e3e3aba609c16ULL, 0x1f5dcb7623f3f7ebULL, 0x19264db21e999b5bULL, 0x13aac6fe59272e8bULL, 0x15d1403a644d423bULL, 0x06b3d066d65a452bULL, 0x00c856a2eb30299bULL, 0x0a44ddeeac8e9c4bULL, 0x0c3f5b2a91e4f0fbULL, 0x12bc0554b017eadfULL, 0x14c783908d7d866fULL, 0x1e4b08dccac333bfULL, 0x18308e18f7a95f0fULL, 0x0b521e4445be581fULL, 0x0d29988078d434afULL, 0x07a513cc3f6a817fULL, 0x01de95080200edcfULL, 0x04dd60449b938632ULL, 0x02a6e680a6f9ea82ULL, 0x082a6dcce1475f52ULL, 0x0e51eb08dc2d33e2ULL, 0x1d337b546e3a34f2ULL, 0x1b48fd9053505842ULL, 0x11c476dc14eeed92ULL, 0x17bff01829848122ULL, 0x1bc39c4527c83a68ULL, 0x1db81a811aa256d8ULL, 0x173491cd5d1ce308ULL, 0x114f170960768fb8ULL, 0x022d8755d26188a8ULL, 0x04560191ef0be418ULL, 0x0eda8adda8b551c8ULL, 0x08a10c1995df3d78ULL, 0x0da2f9550c4c5685ULL, 0x0bd97f9131263a35ULL, 0x0155f4dd76988fe5ULL, 0x072e72194bf2e355ULL, 0x144ce245f9e5e445ULL, 0x12376481c48f88f5ULL, 0x18bbefcd83313d25ULL, 0x1ec06909be5b5195ULL, 0x00866eef3f509762ULL, 0x06fde82b023afbd2ULL, 0x0c71636745844e02ULL, 0x0a0ae5a378ee22b2ULL, 0x196875ffcaf925a2ULL, 0x1f13f33bf7934912ULL, 0x159f7877b02dfcc2ULL, 0x13e4feb38d479072ULL, 0x16e70bff14d4fb8fULL, 0x109c8d3b29be973fULL, 0x1a1006776e0022efULL, 0x1c6b80b3536a4e5fULL, 0x0f0910efe17d494fULL, 0x0972962bdc1725ffULL, 0x03fe1d679ba9902fULL, 0x05859ba3a6c3fc9fULL, 0x09f9f7fea88f47d5ULL, 0x0f82713a95e52b65ULL, 0x050efa76d25b9eb5ULL, 0x03757cb2ef31f205ULL, 0x1017ecee5d26f515ULL, 0x166c6a2a604c99a5ULL, 0x1ce0e16627f22c75ULL, 0x1a9b67a21a9840c5ULL, 0x1f9892ee830b2b38ULL, 0x19e3142abe614788ULL, 0x136f9f66f9dff258ULL, 0x151419a2c4b59ee8ULL, 0x067689fe76a299f8ULL, 0x000d0f3a4bc8f548ULL, 0x0a8184760c764098ULL, 0x0cfa02b2311c2c28ULL, 0x12795ccc10ef360cULL, 0x1402da082d855abcULL, 0x1e8e51446a3bef6cULL, 0x18f5d780575183dcULL, 0x0b9747dce54684ccULL, 0x0decc118d82ce87cULL, 0x07604a549f925dacULL, 0x011bcc90a2f8311cULL, 0x041839dc3b6b5ae1ULL, 0x0263bf1806013651ULL, 0x08ef345441bf8381ULL, 0x0e94b2907cd5ef31ULL, 0x1df622cccec2e821ULL, 0x1b8da408f3a88491ULL, 0x11012f44b4163141ULL, 0x177aa980897c5df1ULL, 0x1b06c5dd8730e6bbULL, 0x1d7d4319ba5a8a0bULL, 0x17f1c855fde43fdbULL, 0x118a4e91c08e536bULL, 0x02e8decd7299547bULL, 0x049358094ff338cbULL, 0x0e1fd345084d8d1bULL, 0x086455813527e1abULL, 0x0d67a0cdacb48a56ULL, 0x0b1c260991dee6e6ULL, 0x0190ad45d6605336ULL, 0x07eb2b81eb0a3f86ULL, 0x1489bbdd591d3896ULL, 0x12f23d1964775426ULL, 0x187eb65523c9e1f6ULL, 0x1e0530911ea38d46ULL, 0x00c55998a0f8dcd3ULL, 0x06bedf5c9d92b063ULL, 0x0c325410da2c05b3ULL, 0x0a49d2d4e7466903ULL, 0x192b428855516e13ULL, 0x1f50c44c683b02a3ULL, 0x15dc4f002f85b773ULL, 0x13a7c9c412efdbc3ULL, 0x16a43c888b7cb03eULL, 0x10dfba4cb616dc8eULL, 0x1a533100f1a8695eULL, 0x1c28b7c4ccc205eeULL, 0x0f4a27987ed502feULL, 0x0931a15c43bf6e4eULL, 0x03bd2a100401db9eULL, 0x05c6acd4396bb72eULL, 0x09bac08937270c64ULL, 0x0fc1464d0a4d60d4ULL, 0x054dcd014df3d504ULL, 0x03364bc57099b9b4ULL, 0x1054db99c28ebea4ULL, 0x162f5d5dffe4d214ULL, 0x1ca3d611b85a67c4ULL, 0x1ad850d585300b74ULL, 0x1fdba5991ca36089ULL, 0x19a0235d21c90c39ULL, 0x132ca8116677b9e9ULL, 0x15572ed55b1dd559ULL, 0x0635be89e90ad249ULL, 0x004e384dd460bef9ULL, 0x0ac2b30193de0b29ULL, 0x0cb935c5aeb46799ULL, 0x123a6bbb8f477dbdULL, 0x1441ed7fb22d110dULL, 0x1ecd6633f593a4ddULL, 0x18b6e0f7c8f9c86dULL, 0x0bd470ab7aeecf7dULL, 0x0daff66f4784a3cdULL, 0x07237d23003a161dULL, 0x0158fbe73d507aadULL, 0x045b0eaba4c31150ULL, 0x0220886f99a97de0ULL, 0x08ac0323de17c830ULL, 0x0ed785e7e37da480ULL, 0x1db515bb516aa390ULL, 0x1bce937f6c00cf20ULL, 0x114218332bbe7af0ULL, 0x17399ef716d41640ULL, 0x1b45f2aa1898ad0aULL, 0x1d3e746e25f2c1baULL, 0x17b2ff22624c746aULL, 0x11c979e65f2618daULL, 0x02abe9baed311fcaULL, 0x04d06f7ed05b737aULL, 0x0e5ce43297e5c6aaULL, 0x082762f6aa8faa1aULL, 0x0d2497ba331cc1e7ULL, 0x0b5f117e0e76ad57ULL, 0x01d39a3249c81887ULL, 0x07a81cf674a27437ULL, 0x14ca8caac6b57327ULL, 0x12b10a6efbdf1f97ULL, 0x183d8122bc61aa47ULL, 0x1e4607e6810bc6f7ULL }; #else /* the original ones */ #define RABIN_WINDOW_SIZE 22 #define RABIN_SHIFT 55 static const u_int64_t T[256] = { 0x0000000000000000ULL, 0xb15e234bd3792f63ULL, 0x62bc4697a6f25ec6ULL, 0xd3e265dc758b71a5ULL, 0x7426ae649e9d92efULL, 0xc5788d2f4de4bd8cULL, 0x169ae8f3386fcc29ULL, 0xa7c4cbb8eb16e34aULL, 0x59137f82ee420abdULL, 0xe84d5cc93d3b25deULL, 0x3baf391548b0547bULL, 0x8af11a5e9bc97b18ULL, 0x2d35d1e670df9852ULL, 0x9c6bf2ada3a6b731ULL, 0x4f899771d62dc694ULL, 0xfed7b43a0554e9f7ULL, 0x0378dc4e0ffd3a19ULL, 0xb226ff05dc84157aULL, 0x61c49ad9a90f64dfULL, 0xd09ab9927a764bbcULL, 0x775e722a9160a8f6ULL, 0xc600516142198795ULL, 0x15e234bd3792f630ULL, 0xa4bc17f6e4ebd953ULL, 0x5a6ba3cce1bf30a4ULL, 0xeb35808732c61fc7ULL, 0x38d7e55b474d6e62ULL, 0x8989c61094344101ULL, 0x2e4d0da87f22a24bULL, 0x9f132ee3ac5b8d28ULL, 0x4cf14b3fd9d0fc8dULL, 0xfdaf68740aa9d3eeULL, 0x06f1b89c1ffa7432ULL, 0xb7af9bd7cc835b51ULL, 0x644dfe0bb9082af4ULL, 0xd513dd406a710597ULL, 0x72d716f88167e6ddULL, 0xc38935b3521ec9beULL, 0x106b506f2795b81bULL, 0xa1357324f4ec9778ULL, 0x5fe2c71ef1b87e8fULL, 0xeebce45522c151ecULL, 0x3d5e8189574a2049ULL, 0x8c00a2c284330f2aULL, 0x2bc4697a6f25ec60ULL, 0x9a9a4a31bc5cc303ULL, 0x49782fedc9d7b2a6ULL, 0xf8260ca61aae9dc5ULL, 0x058964d210074e2bULL, 0xb4d74799c37e6148ULL, 0x67352245b6f510edULL, 0xd66b010e658c3f8eULL, 0x71afcab68e9adcc4ULL, 0xc0f1e9fd5de3f3a7ULL, 0x13138c2128688202ULL, 0xa24daf6afb11ad61ULL, 0x5c9a1b50fe454496ULL, 0xedc4381b2d3c6bf5ULL, 0x3e265dc758b71a50ULL, 0x8f787e8c8bce3533ULL, 0x28bcb53460d8d679ULL, 0x99e2967fb3a1f91aULL, 0x4a00f3a3c62a88bfULL, 0xfb5ed0e81553a7dcULL, 0x0de371383ff4e864ULL, 0xbcbd5273ec8dc707ULL, 0x6f5f37af9906b6a2ULL, 0xde0114e44a7f99c1ULL, 0x79c5df5ca1697a8bULL, 0xc89bfc17721055e8ULL, 0x1b7999cb079b244dULL, 0xaa27ba80d4e20b2eULL, 0x54f00ebad1b6e2d9ULL, 0xe5ae2df102cfcdbaULL, 0x364c482d7744bc1fULL, 0x87126b66a43d937cULL, 0x20d6a0de4f2b7036ULL, 0x918883959c525f55ULL, 0x426ae649e9d92ef0ULL, 0xf334c5023aa00193ULL, 0x0e9bad763009d27dULL, 0xbfc58e3de370fd1eULL, 0x6c27ebe196fb8cbbULL, 0xdd79c8aa4582a3d8ULL, 0x7abd0312ae944092ULL, 0xcbe320597ded6ff1ULL, 0x1801458508661e54ULL, 0xa95f66cedb1f3137ULL, 0x5788d2f4de4bd8c0ULL, 0xe6d6f1bf0d32f7a3ULL, 0x3534946378b98606ULL, 0x846ab728abc0a965ULL, 0x23ae7c9040d64a2fULL, 0x92f05fdb93af654cULL, 0x41123a07e62414e9ULL, 0xf04c194c355d3b8aULL, 0x0b12c9a4200e9c56ULL, 0xba4ceaeff377b335ULL, 0x69ae8f3386fcc290ULL, 0xd8f0ac785585edf3ULL, 0x7f3467c0be930eb9ULL, 0xce6a448b6dea21daULL, 0x1d8821571861507fULL, 0xacd6021ccb187f1cULL, 0x5201b626ce4c96ebULL, 0xe35f956d1d35b988ULL, 0x30bdf0b168bec82dULL, 0x81e3d3fabbc7e74eULL, 0x2627184250d10404ULL, 0x97793b0983a82b67ULL, 0x449b5ed5f6235ac2ULL, 0xf5c57d9e255a75a1ULL, 0x086a15ea2ff3a64fULL, 0xb93436a1fc8a892cULL, 0x6ad6537d8901f889ULL, 0xdb8870365a78d7eaULL, 0x7c4cbb8eb16e34a0ULL, 0xcd1298c562171bc3ULL, 0x1ef0fd19179c6a66ULL, 0xafaede52c4e54505ULL, 0x51796a68c1b1acf2ULL, 0xe027492312c88391ULL, 0x33c52cff6743f234ULL, 0x829b0fb4b43add57ULL, 0x255fc40c5f2c3e1dULL, 0x9401e7478c55117eULL, 0x47e3829bf9de60dbULL, 0xf6bda1d02aa74fb8ULL, 0x1bc6e2707fe9d0c8ULL, 0xaa98c13bac90ffabULL, 0x797aa4e7d91b8e0eULL, 0xc82487ac0a62a16dULL, 0x6fe04c14e1744227ULL, 0xdebe6f5f320d6d44ULL, 0x0d5c0a8347861ce1ULL, 0xbc0229c894ff3382ULL, 0x42d59df291abda75ULL, 0xf38bbeb942d2f516ULL, 0x2069db65375984b3ULL, 0x9137f82ee420abd0ULL, 0x36f333960f36489aULL, 0x87ad10dddc4f67f9ULL, 0x544f7501a9c4165cULL, 0xe511564a7abd393fULL, 0x18be3e3e7014ead1ULL, 0xa9e01d75a36dc5b2ULL, 0x7a0278a9d6e6b417ULL, 0xcb5c5be2059f9b74ULL, 0x6c98905aee89783eULL, 0xddc6b3113df0575dULL, 0x0e24d6cd487b26f8ULL, 0xbf7af5869b02099bULL, 0x41ad41bc9e56e06cULL, 0xf0f362f74d2fcf0fULL, 0x2311072b38a4beaaULL, 0x924f2460ebdd91c9ULL, 0x358befd800cb7283ULL, 0x84d5cc93d3b25de0ULL, 0x5737a94fa6392c45ULL, 0xe6698a0475400326ULL, 0x1d375aec6013a4faULL, 0xac6979a7b36a8b99ULL, 0x7f8b1c7bc6e1fa3cULL, 0xced53f301598d55fULL, 0x6911f488fe8e3615ULL, 0xd84fd7c32df71976ULL, 0x0badb21f587c68d3ULL, 0xbaf391548b0547b0ULL, 0x4424256e8e51ae47ULL, 0xf57a06255d288124ULL, 0x269863f928a3f081ULL, 0x97c640b2fbdadfe2ULL, 0x30028b0a10cc3ca8ULL, 0x815ca841c3b513cbULL, 0x52becd9db63e626eULL, 0xe3e0eed665474d0dULL, 0x1e4f86a26fee9ee3ULL, 0xaf11a5e9bc97b180ULL, 0x7cf3c035c91cc025ULL, 0xcdade37e1a65ef46ULL, 0x6a6928c6f1730c0cULL, 0xdb370b8d220a236fULL, 0x08d56e51578152caULL, 0xb98b4d1a84f87da9ULL, 0x475cf92081ac945eULL, 0xf602da6b52d5bb3dULL, 0x25e0bfb7275eca98ULL, 0x94be9cfcf427e5fbULL, 0x337a57441f3106b1ULL, 0x8224740fcc4829d2ULL, 0x51c611d3b9c35877ULL, 0xe09832986aba7714ULL, 0x16259348401d38acULL, 0xa77bb003936417cfULL, 0x7499d5dfe6ef666aULL, 0xc5c7f69435964909ULL, 0x62033d2cde80aa43ULL, 0xd35d1e670df98520ULL, 0x00bf7bbb7872f485ULL, 0xb1e158f0ab0bdbe6ULL, 0x4f36eccaae5f3211ULL, 0xfe68cf817d261d72ULL, 0x2d8aaa5d08ad6cd7ULL, 0x9cd48916dbd443b4ULL, 0x3b1042ae30c2a0feULL, 0x8a4e61e5e3bb8f9dULL, 0x59ac04399630fe38ULL, 0xe8f227724549d15bULL, 0x155d4f064fe002b5ULL, 0xa4036c4d9c992dd6ULL, 0x77e10991e9125c73ULL, 0xc6bf2ada3a6b7310ULL, 0x617be162d17d905aULL, 0xd025c2290204bf39ULL, 0x03c7a7f5778fce9cULL, 0xb29984bea4f6e1ffULL, 0x4c4e3084a1a20808ULL, 0xfd1013cf72db276bULL, 0x2ef27613075056ceULL, 0x9fac5558d42979adULL, 0x38689ee03f3f9ae7ULL, 0x8936bdabec46b584ULL, 0x5ad4d87799cdc421ULL, 0xeb8afb3c4ab4eb42ULL, 0x10d42bd45fe74c9eULL, 0xa18a089f8c9e63fdULL, 0x72686d43f9151258ULL, 0xc3364e082a6c3d3bULL, 0x64f285b0c17ade71ULL, 0xd5aca6fb1203f112ULL, 0x064ec327678880b7ULL, 0xb710e06cb4f1afd4ULL, 0x49c75456b1a54623ULL, 0xf899771d62dc6940ULL, 0x2b7b12c1175718e5ULL, 0x9a25318ac42e3786ULL, 0x3de1fa322f38d4ccULL, 0x8cbfd979fc41fbafULL, 0x5f5dbca589ca8a0aULL, 0xee039fee5ab3a569ULL, 0x13acf79a501a7687ULL, 0xa2f2d4d1836359e4ULL, 0x7110b10df6e82841ULL, 0xc04e924625910722ULL, 0x678a59fece87e468ULL, 0xd6d47ab51dfecb0bULL, 0x05361f696875baaeULL, 0xb4683c22bb0c95cdULL, 0x4abf8818be587c3aULL, 0xfbe1ab536d215359ULL, 0x2803ce8f18aa22fcULL, 0x995dedc4cbd30d9fULL, 0x3e99267c20c5eed5ULL, 0x8fc70537f3bcc1b6ULL, 0x5c2560eb8637b013ULL, 0xed7b43a0554e9f70ULL }; static const u_int64_t U[256] = { 0x0000000000000000ULL, 0x079343d61ab9f60eULL, 0x0f2687ac3573ec1cULL, 0x08b5c47a2fca1a12ULL, 0x1e4d0f586ae7d838ULL, 0x19de4c8e705e2e36ULL, 0x116b88f45f943424ULL, 0x16f8cb22452dc22aULL, 0x3c9a1eb0d5cfb070ULL, 0x3b095d66cf76467eULL, 0x33bc991ce0bc5c6cULL, 0x342fdacafa05aa62ULL, 0x22d711e8bf286848ULL, 0x2544523ea5919e46ULL, 0x2df196448a5b8454ULL, 0x2a62d59290e2725aULL, 0x79343d61ab9f60e0ULL, 0x7ea77eb7b12696eeULL, 0x7612bacd9eec8cfcULL, 0x7181f91b84557af2ULL, 0x67793239c178b8d8ULL, 0x60ea71efdbc14ed6ULL, 0x685fb595f40b54c4ULL, 0x6fccf643eeb2a2caULL, 0x45ae23d17e50d090ULL, 0x423d600764e9269eULL, 0x4a88a47d4b233c8cULL, 0x4d1be7ab519aca82ULL, 0x5be32c8914b708a8ULL, 0x5c706f5f0e0efea6ULL, 0x54c5ab2521c4e4b4ULL, 0x5356e8f33b7d12baULL, 0x433659888447eea3ULL, 0x44a51a5e9efe18adULL, 0x4c10de24b13402bfULL, 0x4b839df2ab8df4b1ULL, 0x5d7b56d0eea0369bULL, 0x5ae81506f419c095ULL, 0x525dd17cdbd3da87ULL, 0x55ce92aac16a2c89ULL, 0x7fac473851885ed3ULL, 0x783f04ee4b31a8ddULL, 0x708ac09464fbb2cfULL, 0x771983427e4244c1ULL, 0x61e148603b6f86ebULL, 0x66720bb621d670e5ULL, 0x6ec7cfcc0e1c6af7ULL, 0x69548c1a14a59cf9ULL, 0x3a0264e92fd88e43ULL, 0x3d91273f3561784dULL, 0x3524e3451aab625fULL, 0x32b7a09300129451ULL, 0x244f6bb1453f567bULL, 0x23dc28675f86a075ULL, 0x2b69ec1d704cba67ULL, 0x2cfaafcb6af54c69ULL, 0x06987a59fa173e33ULL, 0x010b398fe0aec83dULL, 0x09befdf5cf64d22fULL, 0x0e2dbe23d5dd2421ULL, 0x18d5750190f0e60bULL, 0x1f4636d78a491005ULL, 0x17f3f2ada5830a17ULL, 0x1060b17bbf3afc19ULL, 0x3732905adbf6f225ULL, 0x30a1d38cc14f042bULL, 0x381417f6ee851e39ULL, 0x3f875420f43ce837ULL, 0x297f9f02b1112a1dULL, 0x2eecdcd4aba8dc13ULL, 0x265918ae8462c601ULL, 0x21ca5b789edb300fULL, 0x0ba88eea0e394255ULL, 0x0c3bcd3c1480b45bULL, 0x048e09463b4aae49ULL, 0x031d4a9021f35847ULL, 0x15e581b264de9a6dULL, 0x1276c2647e676c63ULL, 0x1ac3061e51ad7671ULL, 0x1d5045c84b14807fULL, 0x4e06ad3b706992c5ULL, 0x4995eeed6ad064cbULL, 0x41202a97451a7ed9ULL, 0x46b369415fa388d7ULL, 0x504ba2631a8e4afdULL, 0x57d8e1b50037bcf3ULL, 0x5f6d25cf2ffda6e1ULL, 0x58fe6619354450efULL, 0x729cb38ba5a622b5ULL, 0x750ff05dbf1fd4bbULL, 0x7dba342790d5cea9ULL, 0x7a2977f18a6c38a7ULL, 0x6cd1bcd3cf41fa8dULL, 0x6b42ff05d5f80c83ULL, 0x63f73b7ffa321691ULL, 0x646478a9e08be09fULL, 0x7404c9d25fb11c86ULL, 0x73978a044508ea88ULL, 0x7b224e7e6ac2f09aULL, 0x7cb10da8707b0694ULL, 0x6a49c68a3556c4beULL, 0x6dda855c2fef32b0ULL, 0x656f4126002528a2ULL, 0x62fc02f01a9cdeacULL, 0x489ed7628a7eacf6ULL, 0x4f0d94b490c75af8ULL, 0x47b850cebf0d40eaULL, 0x402b1318a5b4b6e4ULL, 0x56d3d83ae09974ceULL, 0x51409becfa2082c0ULL, 0x59f55f96d5ea98d2ULL, 0x5e661c40cf536edcULL, 0x0d30f4b3f42e7c66ULL, 0x0aa3b765ee978a68ULL, 0x0216731fc15d907aULL, 0x058530c9dbe46674ULL, 0x137dfbeb9ec9a45eULL, 0x14eeb83d84705250ULL, 0x1c5b7c47abba4842ULL, 0x1bc83f91b103be4cULL, 0x31aaea0321e1cc16ULL, 0x3639a9d53b583a18ULL, 0x3e8c6daf1492200aULL, 0x391f2e790e2bd604ULL, 0x2fe7e55b4b06142eULL, 0x2874a68d51bfe220ULL, 0x20c162f77e75f832ULL, 0x2752212164cc0e3cULL, 0x6e6520b5b7ede44aULL, 0x69f66363ad541244ULL, 0x6143a719829e0856ULL, 0x66d0e4cf9827fe58ULL, 0x70282feddd0a3c72ULL, 0x77bb6c3bc7b3ca7cULL, 0x7f0ea841e879d06eULL, 0x789deb97f2c02660ULL, 0x52ff3e056222543aULL, 0x556c7dd3789ba234ULL, 0x5dd9b9a95751b826ULL, 0x5a4afa7f4de84e28ULL, 0x4cb2315d08c58c02ULL, 0x4b21728b127c7a0cULL, 0x4394b6f13db6601eULL, 0x4407f527270f9610ULL, 0x17511dd41c7284aaULL, 0x10c25e0206cb72a4ULL, 0x18779a78290168b6ULL, 0x1fe4d9ae33b89eb8ULL, 0x091c128c76955c92ULL, 0x0e8f515a6c2caa9cULL, 0x063a952043e6b08eULL, 0x01a9d6f6595f4680ULL, 0x2bcb0364c9bd34daULL, 0x2c5840b2d304c2d4ULL, 0x24ed84c8fcced8c6ULL, 0x237ec71ee6772ec8ULL, 0x35860c3ca35aece2ULL, 0x32154feab9e31aecULL, 0x3aa08b90962900feULL, 0x3d33c8468c90f6f0ULL, 0x2d53793d33aa0ae9ULL, 0x2ac03aeb2913fce7ULL, 0x2275fe9106d9e6f5ULL, 0x25e6bd471c6010fbULL, 0x331e7665594dd2d1ULL, 0x348d35b343f424dfULL, 0x3c38f1c96c3e3ecdULL, 0x3babb21f7687c8c3ULL, 0x11c9678de665ba99ULL, 0x165a245bfcdc4c97ULL, 0x1eefe021d3165685ULL, 0x197ca3f7c9afa08bULL, 0x0f8468d58c8262a1ULL, 0x08172b03963b94afULL, 0x00a2ef79b9f18ebdULL, 0x0731acafa34878b3ULL, 0x5467445c98356a09ULL, 0x53f4078a828c9c07ULL, 0x5b41c3f0ad468615ULL, 0x5cd28026b7ff701bULL, 0x4a2a4b04f2d2b231ULL, 0x4db908d2e86b443fULL, 0x450ccca8c7a15e2dULL, 0x429f8f7edd18a823ULL, 0x68fd5aec4dfada79ULL, 0x6f6e193a57432c77ULL, 0x67dbdd4078893665ULL, 0x60489e966230c06bULL, 0x76b055b4271d0241ULL, 0x712316623da4f44fULL, 0x7996d218126eee5dULL, 0x7e0591ce08d71853ULL, 0x5957b0ef6c1b166fULL, 0x5ec4f33976a2e061ULL, 0x567137435968fa73ULL, 0x51e2749543d10c7dULL, 0x471abfb706fcce57ULL, 0x4089fc611c453859ULL, 0x483c381b338f224bULL, 0x4faf7bcd2936d445ULL, 0x65cdae5fb9d4a61fULL, 0x625eed89a36d5011ULL, 0x6aeb29f38ca74a03ULL, 0x6d786a25961ebc0dULL, 0x7b80a107d3337e27ULL, 0x7c13e2d1c98a8829ULL, 0x74a626abe640923bULL, 0x7335657dfcf96435ULL, 0x20638d8ec784768fULL, 0x27f0ce58dd3d8081ULL, 0x2f450a22f2f79a93ULL, 0x28d649f4e84e6c9dULL, 0x3e2e82d6ad63aeb7ULL, 0x39bdc100b7da58b9ULL, 0x3108057a981042abULL, 0x369b46ac82a9b4a5ULL, 0x1cf9933e124bc6ffULL, 0x1b6ad0e808f230f1ULL, 0x13df149227382ae3ULL, 0x144c57443d81dcedULL, 0x02b49c6678ac1ec7ULL, 0x0527dfb06215e8c9ULL, 0x0d921bca4ddff2dbULL, 0x0a01581c576604d5ULL, 0x1a61e967e85cf8ccULL, 0x1df2aab1f2e50ec2ULL, 0x15476ecbdd2f14d0ULL, 0x12d42d1dc796e2deULL, 0x042ce63f82bb20f4ULL, 0x03bfa5e99802d6faULL, 0x0b0a6193b7c8cce8ULL, 0x0c992245ad713ae6ULL, 0x26fbf7d73d9348bcULL, 0x2168b401272abeb2ULL, 0x29dd707b08e0a4a0ULL, 0x2e4e33ad125952aeULL, 0x38b6f88f57749084ULL, 0x3f25bb594dcd668aULL, 0x37907f2362077c98ULL, 0x30033cf578be8a96ULL, 0x6355d40643c3982cULL, 0x64c697d0597a6e22ULL, 0x6c7353aa76b07430ULL, 0x6be0107c6c09823eULL, 0x7d18db5e29244014ULL, 0x7a8b9888339db61aULL, 0x723e5cf21c57ac08ULL, 0x75ad1f2406ee5a06ULL, 0x5fcfcab6960c285cULL, 0x585c89608cb5de52ULL, 0x50e94d1aa37fc440ULL, 0x577a0eccb9c6324eULL, 0x4182c5eefcebf064ULL, 0x46118638e652066aULL, 0x4ea44242c9981c78ULL, 0x49370194d321ea76ULL }; #endif static unsigned char rabin_window[RABIN_WINDOW_SIZE]; static unsigned rabin_pos = 0; #define MIN(x,y) ((y)<(x) ? (y) : (x)) #define MAX(x,y) ((y)>(x) ? (y) : (x)) /* * The copies array is the central data structure for diff generation. * Data statements are implicit, for ranges not covered by any copy command. * * The sum of tgt and length for each entry must be monotonically increasing, * and data ranges must be non-overlapping. This is accomplished by not * extending matches backwards during initial matching. * * Copies may have zero length, to make it quick to delete copies during * optimization. However, the last copy in the list must always be a * non-trivial copy. * * Before committing copies, an important optimization is performed: during * a backward pass through the copies array, each entry is extended backwards, * and redundant copies are eliminated. * * If each match were extended backwards on insertion, the same data may be * matched an arbitrary number of times, resulting in potentially quadratic * time behavior. */ typedef struct copyinfo { unsigned src; unsigned tgt; unsigned length; } CopyInfo; static CopyInfo *copies; static int copy_count = 0; static unsigned max_copies = 0; /* Dynamically increased */ static unsigned *idx; static unsigned idx_size; static unsigned char *idx_data; static unsigned idx_data_len; static void rabin_reset(void) { memset(rabin_window, 0, sizeof(rabin_window)); } static u_int64_t rabin_slide (u_int64_t fp, unsigned char m) { unsigned char om; if (++rabin_pos == RABIN_WINDOW_SIZE) rabin_pos = 0; om = rabin_window[rabin_pos]; fp ^= U[om]; rabin_window[rabin_pos] = m; fp = ((fp << 8) | m) ^ T[fp >> RABIN_SHIFT]; return fp; } static void init_idx (unsigned char *data, size_t len, int level) { unsigned index_step = RABIN_WINDOW_SIZE / sizeof(unsigned) * sizeof(unsigned); size_t j, k; unsigned char ch = 0; unsigned maxofs[256]; unsigned maxlen[256]; unsigned maxfp[256]; unsigned runlen = 0; u_int64_t fp = 0; assert (len <= MAX_SIZE); assert (level >= 0 && level <= 9); memset(maxofs, 0, sizeof(maxofs)); memset(maxlen, 0, sizeof(maxlen)); memset(maxfp, 0, sizeof(maxfp)); /* index_step must be multiple of word size */ if (level >= 1) index_step = MIN(index_step, 4 * sizeof(unsigned)); /* Use smaller step size for higher optimization levels or smaller files */ if (level >= 3 || len <= 65536) index_step = MIN (index_step, 3 * sizeof (unsigned)); if (level >= 4 || len <= 32768) index_step = MIN (index_step, 2 * sizeof (unsigned)); if (level >= 6 || len < 16384) index_step = MIN (index_step, 1 * sizeof (unsigned)); assert (index_step && !(index_step % sizeof (unsigned))); /* Add fixed amount to hash table size, as small files will benefit a lot without using significantly more memory or time. */ idx_size = (level + 1) * (len / index_step) / 2 + MIN_HTAB_SIZE; idx_size = MIN (idx_size, MAX_HTAB_SIZE - 1); /* So rounding up works */ /* Round up to next power of two, but limit to MAX_HTAB_SIZE. */ { unsigned s = MIN_HTAB_SIZE; while (s < idx_size) s += s; idx_size = s; } idx_data = data; idx_data_len = len; idx = calloc(idx_size, sizeof(unsigned)); /* It is tempting to first index higher addresses, so hashes of lower addresses will get preference in the hash table. However, for repetitive patterns with a period that is a divisor of the fingerprint window, this may mean the match is not anchored at the end. Furthermore, even when using a window length that is prime, the benefits are small and the irregularity of the first matches being more important is not worth it. */ rabin_reset(); ch = 0; runlen = 0; for (j = 0; j + index_step < len; j += index_step) { unsigned char pch = 0; unsigned hash; for (k = 0; k < index_step; k++) { pch = ch; ch = data[j + k]; if (ch != pch) runlen = 0; runlen++; fp = rabin_slide(fp, ch); } /* See if there is a word-aligned window-sized run of equal characters */ if (runlen >= RABIN_WINDOW_SIZE + sizeof(unsigned) - 1) { /* Skip ahead to end of run of identical input characters */ while (j + k < len && data[j + k] == ch) { k++; runlen++; } /* Although matches are usually anchored at the end, in the case of extended runs of equal characters it is better to anchor after the first RABIN_WINDOW_SIZE bytes. This allows for quick skip ahead while matching such runs, avoiding unneeded fingerprint calculations. Also, when anchoring at the end, matches will be generated after every word, because the fingerprint stays constant. Even though all matches would get combined during match optimization, it wastes time and space. */ if (runlen > maxlen[pch] + 4) { unsigned ofs; /* ofs points RABIN_WINDOW_SIZE bytes after the start of the run, rounded up to the next word */ ofs = j + k - runlen + RABIN_WINDOW_SIZE + (sizeof (unsigned) - 1); ofs -= ofs % sizeof(unsigned); maxofs[pch] = ofs; maxlen [pch] = runlen; assert(maxfp[pch] == 0 || maxfp[pch] == (unsigned)fp); maxfp[pch] = (unsigned)fp; } /* Keep input aligned as if no special run processing had taken place */ j += k - (k % index_step) - index_step; k = index_step; } /* Testing showed that avoiding collisions using secondary hashing, or hash chaining had little effect and is not worth the time. */ hash = ((unsigned)fp) & (idx_size - 1); idx[hash] = j + k; } /* Lastly, index the longest runs of equal characters found before. This ensures we always match the longerst such runs available. */ for (j = 0; j < 256; j++) if (maxlen[j]) idx[maxfp[j] % idx_size] = maxofs[j]; } /* Match data against the current index and record all possible copies */ static int find_copies(unsigned char *data, size_t len) { size_t j = 0; u_int64_t fp = 0; rabin_reset(); while (j < RABIN_WINDOW_SIZE && j < len) fp = rabin_slide(fp, data[j++]); while (j < len) { unsigned hash, ofs, src, tgt, runlen, maxrun; fp = rabin_slide(fp, data[j++]); hash = fp & (idx_size - 1); ofs = idx[hash]; /* Invariant: data[0] .. data[j-1] has been processed fp is fingerprint of sliding window ending at j-1 ofs is zero or points just past tentative match ofs is a multiple of index_step */ if (!ofs) continue; runlen = 0; tgt = j - 4; src = ofs - 4; maxrun = MIN(idx_data_len - src, len - tgt); /* Hot loop */ while (runlen < maxrun && data[tgt + runlen] == idx_data[src + runlen]) runlen++; if (runlen < 4) continue; if (copy_count == max_copies) { max_copies *= 2; if (!copies) { max_copies = MAX_COPIES; copies = malloc(max_copies * sizeof(CopyInfo)); } else { copies = realloc(copies, max_copies * sizeof (CopyInfo)); } if (!copies) return 0; } copies[copy_count].src = src; copies[copy_count].tgt = tgt; copies[copy_count].length = runlen; copy_count++; /* For runs extending more than RABIN_WINDOW_SIZE bytes beyond j, skip ahead to prevent useless fingerprint computations. */ if (tgt + runlen > j + RABIN_WINDOW_SIZE) j = tgt + runlen - RABIN_WINDOW_SIZE; /* Quickly scan ahead without looking for matches until the end of this run */ while (j < tgt + runlen) fp = rabin_slide(fp, data[j++]); } return 1; } static unsigned header_length(unsigned srclen, unsigned tgtlen) { unsigned len = 0; assert (srclen <= MAX_SIZE && tgtlen <= MAX_SIZE); /* GIT headers start with the length of the source and target, with 7 bits per byte, least significant byte first, and the high bit indicating continuation. */ do { len++; srclen >>= 7; } while (srclen); do { len++; tgtlen >>= 7; } while (tgtlen); return len; } static unsigned char *write_header(unsigned char *patch, unsigned srclen, unsigned tgtlen) { assert (srclen <= MAX_SIZE && tgtlen <= MAX_SIZE); while (srclen >= 0x80) { *patch++ = srclen | 0x80; srclen >>= 7; } *patch++ = srclen; while (tgtlen >= 0x80) { *patch++ = tgtlen | 0x80; tgtlen >>= 7; } *patch++ = tgtlen; return patch; } static unsigned data_length(unsigned length) { /* Can only include 0x7f data bytes per command */ unsigned partial = length % 0x7f; assert (length > 0 && length <= MAX_SIZE); if (partial) partial++; return partial + (length / 0x7f) * 0x80; } static unsigned char *write_data(unsigned char *patch, unsigned char *data, unsigned size) { assert (size > 0 && size < MAX_SIZE); /* The return value must be equal to patch + data_length (patch, size). This correspondence is essential for calculating the patch size. */ /* GIT has no data commands for large data, rest is same as GDIFF */ do { unsigned s = size; if (s > 0x7f) s = 0x7f; *patch++ = s; memcpy(patch, data, s); data += s; patch += s; size -= s; } while (size); return patch; } static unsigned copy_length (unsigned offset, unsigned length) { unsigned size = 0; assert (offset < MAX_SIZE && length < MAX_SIZE); /* For now we only copy a maximum of 0x10000 bytes per command. Longer copies are broken into pieces of that size. */ do { signed s = length; if (s > 0x10000) s = 0x10000; size += !!(s & 0xff) + !!(s & 0xff00); size += !!(offset & 0xff) + !!(offset & 0xff00) + !!(offset & 0xff0000) + !!(offset & 0xff000000); size += 1; offset += s; length -= s; } while (length); return size; } static unsigned char * write_copy(unsigned char *patch, unsigned offset, unsigned size) { /* The return value must be equal to patch + copy_length (patch,offset,size). This correspondence is essential for calculating the patch size. */ do { unsigned char c = 0x80, *cmd = patch++; unsigned v, s = size; if (s > 0x10000) s = 0x10000; v = offset; if (v & 0xff) c |= 0x01, *patch++ = v; v >>= 8; if (v & 0xff) c |= 0x02, *patch++ = v; v >>= 8; if (v & 0xff) c |= 0x04, *patch++ = v; v >>= 8; if (v & 0xff) c |= 0x08, *patch++ = v; v = s; if (v & 0xff) c |= 0x10, *patch++ = v; v >>= 8; if (v & 0xff) c |= 0x20, *patch++ = v; *cmd = c; offset += s; size -= s; } while (size); return patch; } static unsigned process_copies (unsigned char *data, unsigned length, unsigned maxlen) { int j; unsigned ptr = length; unsigned patch_bytes = header_length(idx_data_len, length); /* Work through the copies backwards, extending each one backwards. */ for (j = copy_count - 1; j >= 0; j--) { CopyInfo *copy = copies+j; unsigned src = copy->src; unsigned tgt = copy->tgt; unsigned len = copy->length; int data_follows; if (tgt + len > ptr) { /* Part of copy already covered by later one, so shorten copy. */ if (ptr < tgt) { /* Copy completely disappeared, but guess that a backward extension might still be useful. This extension is non-contiguous, as it is irrelevant whether the skipped data would have matched or not. Be careful to not extend past the beginning of the source. */ unsigned adjust = tgt - ptr; tgt = ptr; src = (src < adjust) ? 0 : src - adjust; copy->tgt = tgt; copy->src = src; } len = ptr - tgt; } while (src && tgt && idx_data[src - 1] == data[tgt - 1]) { src--; tgt--; } len += copy->tgt - tgt; data_follows = (tgt + len < ptr); /* A short copy may cost as much as 6 bytes for the copy and 5 as result of an extra data command. It's not worth having extra copies in order to just save a byte or two. Being too smart here may hurt later compression as well. */ if (len < (data_follows ? 16 : 10)) len = 0; /* Some target data is not covered by the copies, account for the DATA command that will follow the copy. */ if (len && data_follows) patch_bytes += data_length(ptr - (tgt + len)); /* Everything about the copy is known and will not change. Write back the new information and update the patch size with the size of the copy instruction. */ copy->length = len; copy->src = src; copy->tgt = tgt; if (len) { /* update patch size for copy command */ patch_bytes += copy_length (src, len); ptr = tgt; } else if (j == copy_count - 1) { /* Remove empty copies at end of list. */ copy_count--; } if (patch_bytes > maxlen) return 0; } /* Account for data before first copy */ if (ptr != 0) patch_bytes += data_length(ptr); if (patch_bytes > maxlen) return 0; return patch_bytes; } static unsigned calculate_delta(void *to_buf, unsigned long to_size, unsigned long max_size) { assert (to_size < MAX_SIZE); if (!find_copies(to_buf, to_size)) return 0; return process_copies(to_buf, to_size, max_size); } static void *create_delta (unsigned char *data, unsigned len, unsigned delta_size) { unsigned char *delta = malloc(delta_size); unsigned char *ptr = delta; unsigned offset = 0; int j; ptr = write_header(ptr, idx_data_len, len); for (j = 0; j < copy_count; j++) { CopyInfo *copy = copies + j; unsigned copylen = copy->length; if (!copylen) continue; if (copy->tgt > offset) { assert (delta_size - (ptr - delta) > data_length(copy->tgt - offset)); ptr = write_data(ptr, data + offset, copy->tgt - offset); } assert(delta_size - (ptr - delta) >= copy_length(copy->src, copylen)); ptr = write_copy(ptr, copy->src, copylen); offset = copy->tgt + copylen; } if (offset < len) ptr = write_data(ptr, data + offset, len - offset); assert(ptr - delta == delta_size); return delta; } static void finalize_idx() { free(copies); copies = 0; max_copies = 0; copy_count = 0; free(idx); idx = 0; idx_size = 0; idx_data = 0; idx_data_len = 0; } void *diff_delta(void *from_buf, unsigned long from_size, void *to_buf, unsigned long to_size, unsigned long *delta_size, unsigned long max_size) { unsigned char *delta = 0; unsigned dsize; assert (from_size <= MAX_SIZE && to_size <= MAX_SIZE); init_idx(from_buf, from_size, 1); /* Use optimization level 1 */ if (!max_size) max_size = MAX_SIZE; dsize = calculate_delta (to_buf, to_size, max_size); if (dsize) delta = create_delta (to_buf, to_size, dsize); finalize_idx (); if (delta) *delta_size = dsize; return delta; } ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: RFC: New diff-delta.c implementation 2006-04-24 5:27 ` Nicolas Pitre @ 2006-04-24 15:19 ` Geert Bosch 2006-04-24 15:57 ` Nicolas Pitre 2006-04-24 20:37 ` Petr Baudis 2006-04-24 18:44 ` Geert Bosch 1 sibling, 2 replies; 32+ messages in thread From: Geert Bosch @ 2006-04-24 15:19 UTC (permalink / raw) To: Nicolas Pitre; +Cc: Rene Scharfe, Git Mailing List, Junio C Hamano On Mon, Apr 24, 2006 at 01:27:07AM -0400, Nicolas Pitre wrote: > I took the liberty of reworking the coding style, as well as simplifying > some constructs a bit. > > I also fixed all the format encoding and size computations bugs the > original code has so it works perfectly now. Thanks for fixing that! I'll have a more detailed look at your changes later, as not all changes are immediately obvious since the formatting is different. > Note that the GIT encoding is itself different (a bit denser) from the > xdiff encoding. OK, I mostly looked a patch-delta.c to reverse engineer the format, and mainly looked at libxdiff in the hope to find some format description there. > Please just completely drop that word fetch "optimization". Not only > does it produce worse assembly code in the end due to the extra loop > counter and extra shifting in turn increasing register pressure, but it > also assumes that the target data buffer is always word aligned which is > not guaranteed (casting a char pointer to an unsigned and dereferencing > it is not portable since on some architecture such misaligned accesses > are not allowed). And it even has worse performance. For example, with > word fetch and shift code, packing the Linux kernel archive on my P4 at > 3GHz: > user 6m33.797s [vs.] > user 6m13.911s > So you must be extremely careful when trying to implement such kind of > optimizations. As a compiler writer, I am more than aware of this. Your change results in a 70% slowdown on PowerPC. On many targets byte-accesses are relatively slow. Indeed, I had expect this to be mostly neutral on x86, and the 5% slowdown you saw is not unexpected. Note that in this case aligned accesses are guaranteed, because the index_step is guaranteed to be a multiple of the word size. > But here comes the sad part. Even after simplifying the code as much as > I could, performance is still significantly worse than the current > diff-delta.c code. Repacking again the same Linux kernel repository > with the current code: That's unexpected, but I can see how this could be if most files have very few differences and are relatively small. For such cases, almost any hash will do, and the more complicated hashing will be more compute intensive. I have benchmarked my original diff code on a set of large files with lots of changes. These are hardest to get right, and hardest to get good performance with. Just try diffing any two large (uncompressed) tar files, and you'll see. On many of such large files, the new code is orders of magnitude faster. On these cases, the resulting deltas are also much smaller. The comparison is a bit between a O(n^2) sort that is fast on small or mostly sorted inputs (but horrible on large ones) and a more complex O(nlogn) algorithm that is a bit slower for the simple cases, but far faster for more complex cases. > In the mean time I'll try replacing the adler32 hashing with your Rabin > polynomial hashing into the current code just to see if it helps (and I > think it will help quite a bit). The best solution might be a mix of > both approaches. I think the most interesting part of my code is in process_copies. This optimization results in elimination of about half the commands in large test cases, while taking virtually no time at all. Also, I believe that any form of hash chaining of secondary hashing is useless. I have done many benchmarks, and in all cases a slight increase of hash table size was both more effective for pack size and faster due to less cache misses. > Of course there is the issue of reversing the object matching window > logic in pack-object.c to further improve things, but I consider that an > orthogonal improvement at this point which both approaches will benefit > from anyway. Right, this will be the most important one for performance for now. Thanks again for taking the time to look at my code and fixing the errors in handling the git file format. -Geert PS. Somehow your code had "double line spacing" :) ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: RFC: New diff-delta.c implementation 2006-04-24 15:19 ` Geert Bosch @ 2006-04-24 15:57 ` Nicolas Pitre 2006-04-24 16:31 ` Geert Bosch ` (2 more replies) 2006-04-24 20:37 ` Petr Baudis 1 sibling, 3 replies; 32+ messages in thread From: Nicolas Pitre @ 2006-04-24 15:57 UTC (permalink / raw) To: Geert Bosch; +Cc: Rene Scharfe, Git Mailing List, Junio C Hamano [-- Attachment #1: Type: TEXT/PLAIN, Size: 4115 bytes --] On Mon, 24 Apr 2006, Geert Bosch wrote: > On Mon, Apr 24, 2006 at 01:27:07AM -0400, Nicolas Pitre wrote: > > Please just completely drop that word fetch "optimization". Not only > > does it produce worse assembly code in the end due to the extra loop > > counter and extra shifting in turn increasing register pressure, but it > > also assumes that the target data buffer is always word aligned which is > > not guaranteed (casting a char pointer to an unsigned and dereferencing > > it is not portable since on some architecture such misaligned accesses > > are not allowed). And it even has worse performance. For example, with > > word fetch and shift code, packing the Linux kernel archive on my P4 at > > 3GHz: > > > user 6m33.797s > [vs.] > > user 6m13.911s > > So you must be extremely careful when trying to implement such kind of > > optimizations. > > As a compiler writer, I am more than aware of this. > Your change results in a 70% slowdown on PowerPC. Huh!? > On many targets > byte-accesses are relatively slow. Indeed, I had expect this to be > mostly neutral on x86, and the 5% slowdown you saw is not unexpected. FWIW I'd expect a slowdown on ARM as well but I didn't bench it. > Note that in this case aligned accesses are guaranteed, because the > index_step is guaranteed to be a multiple of the word size. Sure, but the data buffers aren't necessarily so. At least some code to align index_step with actual memory offset if necessary should be considered. > > But here comes the sad part. Even after simplifying the code as much as > > I could, performance is still significantly worse than the current > > diff-delta.c code. Repacking again the same Linux kernel repository > > with the current code: > That's unexpected, but I can see how this could be if most files have > very few differences and are relatively small. For such cases, almost > any hash will do, and the more complicated hashing will be more compute > intensive. > > > I have benchmarked my original diff code on a set of large files with > lots of changes. These are hardest to get right, and hardest to get > good performance with. Just try diffing any two large (uncompressed) > tar files, and you'll see. On many of such large files, the new code > is orders of magnitude faster. On these cases, the resulting deltas > are also much smaller. > > The comparison is a bit between a O(n^2) sort that is fast on small > or mostly sorted inputs (but horrible on large ones) and a more > complex O(nlogn) algorithm that is a bit slower for the simple > cases, but far faster for more complex cases. Indeed. And since the primary goal for GIT is to manage relatively small files with relatively few differences then we have to optimize for that case while trying to simply limit the dammage in the other cases. > > In the mean time I'll try replacing the adler32 hashing with your Rabin > > polynomial hashing into the current code just to see if it helps (and I > > think it will help quite a bit). The best solution might be a mix of > > both approaches. > I think the most interesting part of my code is in process_copies. > This optimization results in elimination of about half the commands > in large test cases, while taking virtually no time at all. Yes, I found that part really nice and clever. > Also, I believe that any form of hash chaining of secondary hashing > is useless. I have done many benchmarks, and in all cases a slight > increase of hash table size was both more effective for pack size and > faster due to less cache misses. Well, I did lots of benchmarks too over the Linux kernel repository with the current code. It is of course a dataset quite different from two large files. And simply increasing the hash size to improve on pack size did increase CPU usage quite significantly. > PS. Somehow your code had "double line spacing" :) Gah. Your original version must have CRLF line terminations, and vi simply notice that and writes the file back with CRLF by default. Find attached a version with those converted to LF only. Nicolas [-- Attachment #2: Type: TEXT/PLAIN, Size: 39703 bytes --] #include <unistd.h> #include <stdlib.h> #include <assert.h> #include <string.h> #include <sys/types.h> #undef assert #define assert(x) do { } while (0) /* * MIN_HTAB_SIZE is fixed amount to be added to the size of the hash table * used for indexing and must be a power of two. This allows for small files * to have a sparse hash table, since in that case it's cheap. * Hash table sizes are rounded up to a power of two to avoid integer division. */ #define MIN_HTAB_SIZE 8192 #define MAX_HTAB_SIZE (1024*1024*1024) /* * Diffing files of gigabyte range is impractical with the current * algorithm, so we're assuming 32-bit sizes everywhere. * Size leaves some room for expansion when diffing random files. */ #define MAX_SIZE (0x7eff0000) /* Initial size of copies table, dynamically extended as needed. */ #define MAX_COPIES 4096 /* * Matching is done using a sliding window for which a Rabin * polynomial is computed. The advantage of such polynomials is * that they can efficiently be updated at every position. * The tables needed for this are precomputed, as it is desirable * to use the same polynomial all the time for repeatable results. */ #if 0 /* those don't seem to work */ #define RABIN_POLY 0x25bd5331c0d7096dULL #define RABIN_DEGREE 61 #define RABIN_SHIFT 53 #define RABIN_WINDOW_SIZE 22 static const u_int64_t T[256] = { 0x6ec7f55241791bb7ULL, 0x96f54cc7035c25b4ULL, 0xb3481ff6c38b2cd9ULL, 0xdd8feaa482f2376eULL, 0xf832b99542253e03ULL, 0x0857cabfc66f4205ULL, 0x2dea998e06b84b68ULL, 0x432d6cdc47c150dfULL, 0x66903fed871659b2ULL, 0x9ea28678c53367b1ULL, 0xbb1fd54905e46edcULL, 0xd5d8201b449d756bULL, 0xf065732a844a7c06ULL, 0x10af957f8cde840aULL, 0x3512c64e4c098d67ULL, 0x5bd5331c0d7096d0ULL, 0x7e68602dcda79fbdULL, 0x865ad9b88f82a1beULL, 0xa3e78a894f55a8d3ULL, 0xcd207fdb0e2cb364ULL, 0xe89d2ceacefbba09ULL, 0x18f85fc04ab1c60fULL, 0x3d450cf18a66cf62ULL, 0x5382f9a3cb1fd4d5ULL, 0x763faa920bc8ddb8ULL, 0x8e0d130749ede3bbULL, 0xabb04036893aead6ULL, 0xc577b564c843f161ULL, 0xe0cae6550894f80cULL, 0x04e279ced96a0179ULL, 0x215f2aff19bd0814ULL, 0x4f98dfad58c413a3ULL, 0x6a258c9c98131aceULL, 0x92173509da3624cdULL, 0xb7aa66381ae12da0ULL, 0xd96d936a5b983617ULL, 0xfcd0c05b9b4f3f7aULL, 0x0cb5b3711f05437cULL, 0x2908e040dfd24a11ULL, 0x47cf15129eab51a6ULL, 0x627246235e7c58cbULL, 0x9a40ffb61c5966c8ULL, 0xbffdac87dc8e6fa5ULL, 0xd13a59d59df77412ULL, 0xf4870ae45d207d7fULL, 0x144decb155b48573ULL, 0x31f0bf8095638c1eULL, 0x5f374ad2d41a97a9ULL, 0x7a8a19e314cd9ec4ULL, 0x82b8a07656e8a0c7ULL, 0xa705f347963fa9aaULL, 0xc9c20615d746b21dULL, 0xec7f55241791bb70ULL, 0x1c1a260e93dbc776ULL, 0x39a7753f530cce1bULL, 0x5760806d1275d5acULL, 0x72ddd35cd2a2dcc1ULL, 0x8aef6ac99087e2c2ULL, 0xaf5239f85050ebafULL, 0xc195ccaa1129f018ULL, 0xe4289f9bd1fef975ULL, 0x09c4f39db2d402f2ULL, 0x2c79a0ac72030b9fULL, 0x42be55fe337a1028ULL, 0x670306cff3ad1945ULL, 0x9f31bf5ab1882746ULL, 0xba8cec6b715f2e2bULL, 0xd44b19393026359cULL, 0xf1f64a08f0f13cf1ULL, 0x0193392274bb40f7ULL, 0x242e6a13b46c499aULL, 0x4ae99f41f515522dULL, 0x6f54cc7035c25b40ULL, 0x976675e577e76543ULL, 0xb2db26d4b7306c2eULL, 0xdc1cd386f6497799ULL, 0xf9a180b7369e7ef4ULL, 0x196b66e23e0a86f8ULL, 0x3cd635d3fedd8f95ULL, 0x5211c081bfa49422ULL, 0x77ac93b07f739d4fULL, 0x8f9e2a253d56a34cULL, 0xaa237914fd81aa21ULL, 0xc4e48c46bcf8b196ULL, 0xe159df777c2fb8fbULL, 0x113cac5df865c4fdULL, 0x3481ff6c38b2cd90ULL, 0x5a460a3e79cbd627ULL, 0x7ffb590fb91cdf4aULL, 0x87c9e09afb39e149ULL, 0xa274b3ab3beee824ULL, 0xccb346f97a97f393ULL, 0xe90e15c8ba40fafeULL, 0x0d268a536bbe038bULL, 0x289bd962ab690ae6ULL, 0x465c2c30ea101151ULL, 0x63e17f012ac7183cULL, 0x9bd3c69468e2263fULL, 0xbe6e95a5a8352f52ULL, 0xd0a960f7e94c34e5ULL, 0xf51433c6299b3d88ULL, 0x057140ecadd1418eULL, 0x20cc13dd6d0648e3ULL, 0x4e0be68f2c7f5354ULL, 0x6bb6b5beeca85a39ULL, 0x93840c2bae8d643aULL, 0xb6395f1a6e5a6d57ULL, 0xd8feaa482f2376e0ULL, 0xfd43f979eff47f8dULL, 0x1d891f2ce7608781ULL, 0x38344c1d27b78eecULL, 0x56f3b94f66ce955bULL, 0x734eea7ea6199c36ULL, 0x8b7c53ebe43ca235ULL, 0xaec100da24ebab58ULL, 0xc006f5886592b0efULL, 0xe5bba6b9a545b982ULL, 0x15ded593210fc584ULL, 0x306386a2e1d8cce9ULL, 0x5ea473f0a0a1d75eULL, 0x7b1920c16076de33ULL, 0x832b99542253e030ULL, 0xa696ca65e284e95dULL, 0xc8513f37a3fdf2eaULL, 0xedec6c06632afb87ULL, 0x1389e73b65a805e4ULL, 0x3634b40aa57f0c89ULL, 0x58f34158e406173eULL, 0x7d4e126924d11e53ULL, 0x857cabfc66f42050ULL, 0xa0c1f8cda623293dULL, 0xce060d9fe75a328aULL, 0xebbb5eae278d3be7ULL, 0x1bde2d84a3c747e1ULL, 0x3e637eb563104e8cULL, 0x50a48be72269553bULL, 0x7519d8d6e2be5c56ULL, 0x8d2b6143a09b6255ULL, 0xa8963272604c6b38ULL, 0xc651c7202135708fULL, 0xe3ec9411e1e279e2ULL, 0x03267244e97681eeULL, 0x269b217529a18883ULL, 0x485cd42768d89334ULL, 0x6de18716a80f9a59ULL, 0x95d33e83ea2aa45aULL, 0xb06e6db22afdad37ULL, 0xdea998e06b84b680ULL, 0xfb14cbd1ab53bfedULL, 0x0b71b8fb2f19c3ebULL, 0x2eccebcaefceca86ULL, 0x400b1e98aeb7d131ULL, 0x65b64da96e60d85cULL, 0x9d84f43c2c45e65fULL, 0xb839a70dec92ef32ULL, 0xd6fe525fadebf485ULL, 0xf343016e6d3cfde8ULL, 0x176b9ef5bcc2049dULL, 0x32d6cdc47c150df0ULL, 0x5c1138963d6c1647ULL, 0x79ac6ba7fdbb1f2aULL, 0x819ed232bf9e2129ULL, 0xa42381037f492844ULL, 0xcae474513e3033f3ULL, 0xef592760fee73a9eULL, 0x1f3c544a7aad4698ULL, 0x3a81077bba7a4ff5ULL, 0x5446f229fb035442ULL, 0x71fba1183bd45d2fULL, 0x89c9188d79f1632cULL, 0xac744bbcb9266a41ULL, 0xc2b3beeef85f71f6ULL, 0xe70eeddf3888789bULL, 0x07c40b8a301c8097ULL, 0x227958bbf0cb89faULL, 0x4cbeade9b1b2924dULL, 0x6903fed871659b20ULL, 0x9131474d3340a523ULL, 0xb48c147cf397ac4eULL, 0xda4be12eb2eeb7f9ULL, 0xfff6b21f7239be94ULL, 0x0f93c135f673c292ULL, 0x2a2e920436a4cbffULL, 0x44e9675677ddd048ULL, 0x61543467b70ad925ULL, 0x99668df2f52fe726ULL, 0xbcdbdec335f8ee4bULL, 0xd21c2b917481f5fcULL, 0xf7a178a0b456fc91ULL, 0x1a4d14a6d77c0716ULL, 0x3ff0479717ab0e7bULL, 0x5137b2c556d215ccULL, 0x748ae1f496051ca1ULL, 0x8cb85861d42022a2ULL, 0xa9050b5014f72bcfULL, 0xc7c2fe02558e3078ULL, 0xe27fad3395593915ULL, 0x121ade1911134513ULL, 0x37a78d28d1c44c7eULL, 0x5960787a90bd57c9ULL, 0x7cdd2b4b506a5ea4ULL, 0x84ef92de124f60a7ULL, 0xa152c1efd29869caULL, 0xcf9534bd93e1727dULL, 0xea28678c53367b10ULL, 0x0ae281d95ba2831cULL, 0x2f5fd2e89b758a71ULL, 0x419827bada0c91c6ULL, 0x6425748b1adb98abULL, 0x9c17cd1e58fea6a8ULL, 0xb9aa9e2f9829afc5ULL, 0xd76d6b7dd950b472ULL, 0xf2d0384c1987bd1fULL, 0x02b54b669dcdc119ULL, 0x270818575d1ac874ULL, 0x49cfed051c63d3c3ULL, 0x6c72be34dcb4daaeULL, 0x944007a19e91e4adULL, 0xb1fd54905e46edc0ULL, 0xdf3aa1c21f3ff677ULL, 0xfa87f2f3dfe8ff1aULL, 0x1eaf6d680e16066fULL, 0x3b123e59cec10f02ULL, 0x55d5cb0b8fb814b5ULL, 0x7068983a4f6f1dd8ULL, 0x885a21af0d4a23dbULL, 0xade7729ecd9d2ab6ULL, 0xc32087cc8ce43101ULL, 0xe69dd4fd4c33386cULL, 0x16f8a7d7c879446aULL, 0x3345f4e608ae4d07ULL, 0x5d8201b449d756b0ULL, 0x783f528589005fddULL, 0x800deb10cb2561deULL, 0xa5b0b8210bf268b3ULL, 0xcb774d734a8b7304ULL, 0xeeca1e428a5c7a69ULL, 0x0e00f81782c88265ULL, 0x2bbdab26421f8b08ULL, 0x457a5e74036690bfULL, 0x60c70d45c3b199d2ULL, 0x98f5b4d08194a7d1ULL, 0xbd48e7e14143aebcULL, 0xd38f12b3003ab50bULL, 0xf6324182c0edbc66ULL, 0x065732a844a7c060ULL, 0x23ea61998470c90dULL, 0x4d2d94cbc509d2baULL, 0x6890c7fa05dedbd7ULL, 0x90a27e6f47fbe5d4ULL, 0xb51f2d5e872cecb9ULL, 0xdbd8d80cc655f70eULL, 0xfe658b3d0682fe63ULL }; static const u_int64_t U[256] = { 0x0000000000000000ULL, 0x067b86c43d6a6cb0ULL, 0x0cf70d887ad4d960ULL, 0x0a8c8b4c47beb5d0ULL, 0x19ee1b10f5a9b2c0ULL, 0x1f959dd4c8c3de70ULL, 0x151916988f7d6ba0ULL, 0x1362905cb2170710ULL, 0x166165102b846cedULL, 0x101ae3d416ee005dULL, 0x1a9668985150b58dULL, 0x1cedee5c6c3ad93dULL, 0x0f8f7e00de2dde2dULL, 0x09f4f8c4e347b29dULL, 0x03787388a4f9074dULL, 0x0503f54c99936bfdULL, 0x097f991197dfd0b7ULL, 0x0f041fd5aab5bc07ULL, 0x05889499ed0b09d7ULL, 0x03f3125dd0616567ULL, 0x1091820162766277ULL, 0x16ea04c55f1c0ec7ULL, 0x1c668f8918a2bb17ULL, 0x1a1d094d25c8d7a7ULL, 0x1f1efc01bc5bbc5aULL, 0x19657ac58131d0eaULL, 0x13e9f189c68f653aULL, 0x1592774dfbe5098aULL, 0x06f0e71149f20e9aULL, 0x008b61d57498622aULL, 0x0a07ea993326d7faULL, 0x0c7c6c5d0e4cbb4aULL, 0x12ff32232fbfa16eULL, 0x1484b4e712d5cddeULL, 0x1e083fab556b780eULL, 0x1873b96f680114beULL, 0x0b112933da1613aeULL, 0x0d6aaff7e77c7f1eULL, 0x07e624bba0c2caceULL, 0x019da27f9da8a67eULL, 0x049e5733043bcd83ULL, 0x02e5d1f73951a133ULL, 0x08695abb7eef14e3ULL, 0x0e12dc7f43857853ULL, 0x1d704c23f1927f43ULL, 0x1b0bcae7ccf813f3ULL, 0x118741ab8b46a623ULL, 0x17fcc76fb62cca93ULL, 0x1b80ab32b86071d9ULL, 0x1dfb2df6850a1d69ULL, 0x1777a6bac2b4a8b9ULL, 0x110c207effdec409ULL, 0x026eb0224dc9c319ULL, 0x041536e670a3afa9ULL, 0x0e99bdaa371d1a79ULL, 0x08e23b6e0a7776c9ULL, 0x0de1ce2293e41d34ULL, 0x0b9a48e6ae8e7184ULL, 0x0116c3aae930c454ULL, 0x076d456ed45aa8e4ULL, 0x140fd532664daff4ULL, 0x127453f65b27c344ULL, 0x18f8d8ba1c997694ULL, 0x1e835e7e21f31a24ULL, 0x004337779fa84bb1ULL, 0x0638b1b3a2c22701ULL, 0x0cb43affe57c92d1ULL, 0x0acfbc3bd816fe61ULL, 0x19ad2c676a01f971ULL, 0x1fd6aaa3576b95c1ULL, 0x155a21ef10d52011ULL, 0x1321a72b2dbf4ca1ULL, 0x16225267b42c275cULL, 0x1059d4a389464becULL, 0x1ad55fefcef8fe3cULL, 0x1caed92bf392928cULL, 0x0fcc49774185959cULL, 0x09b7cfb37ceff92cULL, 0x033b44ff3b514cfcULL, 0x0540c23b063b204cULL, 0x093cae6608779b06ULL, 0x0f4728a2351df7b6ULL, 0x05cba3ee72a34266ULL, 0x03b0252a4fc92ed6ULL, 0x10d2b576fdde29c6ULL, 0x16a933b2c0b44576ULL, 0x1c25b8fe870af0a6ULL, 0x1a5e3e3aba609c16ULL, 0x1f5dcb7623f3f7ebULL, 0x19264db21e999b5bULL, 0x13aac6fe59272e8bULL, 0x15d1403a644d423bULL, 0x06b3d066d65a452bULL, 0x00c856a2eb30299bULL, 0x0a44ddeeac8e9c4bULL, 0x0c3f5b2a91e4f0fbULL, 0x12bc0554b017eadfULL, 0x14c783908d7d866fULL, 0x1e4b08dccac333bfULL, 0x18308e18f7a95f0fULL, 0x0b521e4445be581fULL, 0x0d29988078d434afULL, 0x07a513cc3f6a817fULL, 0x01de95080200edcfULL, 0x04dd60449b938632ULL, 0x02a6e680a6f9ea82ULL, 0x082a6dcce1475f52ULL, 0x0e51eb08dc2d33e2ULL, 0x1d337b546e3a34f2ULL, 0x1b48fd9053505842ULL, 0x11c476dc14eeed92ULL, 0x17bff01829848122ULL, 0x1bc39c4527c83a68ULL, 0x1db81a811aa256d8ULL, 0x173491cd5d1ce308ULL, 0x114f170960768fb8ULL, 0x022d8755d26188a8ULL, 0x04560191ef0be418ULL, 0x0eda8adda8b551c8ULL, 0x08a10c1995df3d78ULL, 0x0da2f9550c4c5685ULL, 0x0bd97f9131263a35ULL, 0x0155f4dd76988fe5ULL, 0x072e72194bf2e355ULL, 0x144ce245f9e5e445ULL, 0x12376481c48f88f5ULL, 0x18bbefcd83313d25ULL, 0x1ec06909be5b5195ULL, 0x00866eef3f509762ULL, 0x06fde82b023afbd2ULL, 0x0c71636745844e02ULL, 0x0a0ae5a378ee22b2ULL, 0x196875ffcaf925a2ULL, 0x1f13f33bf7934912ULL, 0x159f7877b02dfcc2ULL, 0x13e4feb38d479072ULL, 0x16e70bff14d4fb8fULL, 0x109c8d3b29be973fULL, 0x1a1006776e0022efULL, 0x1c6b80b3536a4e5fULL, 0x0f0910efe17d494fULL, 0x0972962bdc1725ffULL, 0x03fe1d679ba9902fULL, 0x05859ba3a6c3fc9fULL, 0x09f9f7fea88f47d5ULL, 0x0f82713a95e52b65ULL, 0x050efa76d25b9eb5ULL, 0x03757cb2ef31f205ULL, 0x1017ecee5d26f515ULL, 0x166c6a2a604c99a5ULL, 0x1ce0e16627f22c75ULL, 0x1a9b67a21a9840c5ULL, 0x1f9892ee830b2b38ULL, 0x19e3142abe614788ULL, 0x136f9f66f9dff258ULL, 0x151419a2c4b59ee8ULL, 0x067689fe76a299f8ULL, 0x000d0f3a4bc8f548ULL, 0x0a8184760c764098ULL, 0x0cfa02b2311c2c28ULL, 0x12795ccc10ef360cULL, 0x1402da082d855abcULL, 0x1e8e51446a3bef6cULL, 0x18f5d780575183dcULL, 0x0b9747dce54684ccULL, 0x0decc118d82ce87cULL, 0x07604a549f925dacULL, 0x011bcc90a2f8311cULL, 0x041839dc3b6b5ae1ULL, 0x0263bf1806013651ULL, 0x08ef345441bf8381ULL, 0x0e94b2907cd5ef31ULL, 0x1df622cccec2e821ULL, 0x1b8da408f3a88491ULL, 0x11012f44b4163141ULL, 0x177aa980897c5df1ULL, 0x1b06c5dd8730e6bbULL, 0x1d7d4319ba5a8a0bULL, 0x17f1c855fde43fdbULL, 0x118a4e91c08e536bULL, 0x02e8decd7299547bULL, 0x049358094ff338cbULL, 0x0e1fd345084d8d1bULL, 0x086455813527e1abULL, 0x0d67a0cdacb48a56ULL, 0x0b1c260991dee6e6ULL, 0x0190ad45d6605336ULL, 0x07eb2b81eb0a3f86ULL, 0x1489bbdd591d3896ULL, 0x12f23d1964775426ULL, 0x187eb65523c9e1f6ULL, 0x1e0530911ea38d46ULL, 0x00c55998a0f8dcd3ULL, 0x06bedf5c9d92b063ULL, 0x0c325410da2c05b3ULL, 0x0a49d2d4e7466903ULL, 0x192b428855516e13ULL, 0x1f50c44c683b02a3ULL, 0x15dc4f002f85b773ULL, 0x13a7c9c412efdbc3ULL, 0x16a43c888b7cb03eULL, 0x10dfba4cb616dc8eULL, 0x1a533100f1a8695eULL, 0x1c28b7c4ccc205eeULL, 0x0f4a27987ed502feULL, 0x0931a15c43bf6e4eULL, 0x03bd2a100401db9eULL, 0x05c6acd4396bb72eULL, 0x09bac08937270c64ULL, 0x0fc1464d0a4d60d4ULL, 0x054dcd014df3d504ULL, 0x03364bc57099b9b4ULL, 0x1054db99c28ebea4ULL, 0x162f5d5dffe4d214ULL, 0x1ca3d611b85a67c4ULL, 0x1ad850d585300b74ULL, 0x1fdba5991ca36089ULL, 0x19a0235d21c90c39ULL, 0x132ca8116677b9e9ULL, 0x15572ed55b1dd559ULL, 0x0635be89e90ad249ULL, 0x004e384dd460bef9ULL, 0x0ac2b30193de0b29ULL, 0x0cb935c5aeb46799ULL, 0x123a6bbb8f477dbdULL, 0x1441ed7fb22d110dULL, 0x1ecd6633f593a4ddULL, 0x18b6e0f7c8f9c86dULL, 0x0bd470ab7aeecf7dULL, 0x0daff66f4784a3cdULL, 0x07237d23003a161dULL, 0x0158fbe73d507aadULL, 0x045b0eaba4c31150ULL, 0x0220886f99a97de0ULL, 0x08ac0323de17c830ULL, 0x0ed785e7e37da480ULL, 0x1db515bb516aa390ULL, 0x1bce937f6c00cf20ULL, 0x114218332bbe7af0ULL, 0x17399ef716d41640ULL, 0x1b45f2aa1898ad0aULL, 0x1d3e746e25f2c1baULL, 0x17b2ff22624c746aULL, 0x11c979e65f2618daULL, 0x02abe9baed311fcaULL, 0x04d06f7ed05b737aULL, 0x0e5ce43297e5c6aaULL, 0x082762f6aa8faa1aULL, 0x0d2497ba331cc1e7ULL, 0x0b5f117e0e76ad57ULL, 0x01d39a3249c81887ULL, 0x07a81cf674a27437ULL, 0x14ca8caac6b57327ULL, 0x12b10a6efbdf1f97ULL, 0x183d8122bc61aa47ULL, 0x1e4607e6810bc6f7ULL }; #else /* the original ones */ #define RABIN_WINDOW_SIZE 22 #define RABIN_SHIFT 55 static const u_int64_t T[256] = { 0x0000000000000000ULL, 0xb15e234bd3792f63ULL, 0x62bc4697a6f25ec6ULL, 0xd3e265dc758b71a5ULL, 0x7426ae649e9d92efULL, 0xc5788d2f4de4bd8cULL, 0x169ae8f3386fcc29ULL, 0xa7c4cbb8eb16e34aULL, 0x59137f82ee420abdULL, 0xe84d5cc93d3b25deULL, 0x3baf391548b0547bULL, 0x8af11a5e9bc97b18ULL, 0x2d35d1e670df9852ULL, 0x9c6bf2ada3a6b731ULL, 0x4f899771d62dc694ULL, 0xfed7b43a0554e9f7ULL, 0x0378dc4e0ffd3a19ULL, 0xb226ff05dc84157aULL, 0x61c49ad9a90f64dfULL, 0xd09ab9927a764bbcULL, 0x775e722a9160a8f6ULL, 0xc600516142198795ULL, 0x15e234bd3792f630ULL, 0xa4bc17f6e4ebd953ULL, 0x5a6ba3cce1bf30a4ULL, 0xeb35808732c61fc7ULL, 0x38d7e55b474d6e62ULL, 0x8989c61094344101ULL, 0x2e4d0da87f22a24bULL, 0x9f132ee3ac5b8d28ULL, 0x4cf14b3fd9d0fc8dULL, 0xfdaf68740aa9d3eeULL, 0x06f1b89c1ffa7432ULL, 0xb7af9bd7cc835b51ULL, 0x644dfe0bb9082af4ULL, 0xd513dd406a710597ULL, 0x72d716f88167e6ddULL, 0xc38935b3521ec9beULL, 0x106b506f2795b81bULL, 0xa1357324f4ec9778ULL, 0x5fe2c71ef1b87e8fULL, 0xeebce45522c151ecULL, 0x3d5e8189574a2049ULL, 0x8c00a2c284330f2aULL, 0x2bc4697a6f25ec60ULL, 0x9a9a4a31bc5cc303ULL, 0x49782fedc9d7b2a6ULL, 0xf8260ca61aae9dc5ULL, 0x058964d210074e2bULL, 0xb4d74799c37e6148ULL, 0x67352245b6f510edULL, 0xd66b010e658c3f8eULL, 0x71afcab68e9adcc4ULL, 0xc0f1e9fd5de3f3a7ULL, 0x13138c2128688202ULL, 0xa24daf6afb11ad61ULL, 0x5c9a1b50fe454496ULL, 0xedc4381b2d3c6bf5ULL, 0x3e265dc758b71a50ULL, 0x8f787e8c8bce3533ULL, 0x28bcb53460d8d679ULL, 0x99e2967fb3a1f91aULL, 0x4a00f3a3c62a88bfULL, 0xfb5ed0e81553a7dcULL, 0x0de371383ff4e864ULL, 0xbcbd5273ec8dc707ULL, 0x6f5f37af9906b6a2ULL, 0xde0114e44a7f99c1ULL, 0x79c5df5ca1697a8bULL, 0xc89bfc17721055e8ULL, 0x1b7999cb079b244dULL, 0xaa27ba80d4e20b2eULL, 0x54f00ebad1b6e2d9ULL, 0xe5ae2df102cfcdbaULL, 0x364c482d7744bc1fULL, 0x87126b66a43d937cULL, 0x20d6a0de4f2b7036ULL, 0x918883959c525f55ULL, 0x426ae649e9d92ef0ULL, 0xf334c5023aa00193ULL, 0x0e9bad763009d27dULL, 0xbfc58e3de370fd1eULL, 0x6c27ebe196fb8cbbULL, 0xdd79c8aa4582a3d8ULL, 0x7abd0312ae944092ULL, 0xcbe320597ded6ff1ULL, 0x1801458508661e54ULL, 0xa95f66cedb1f3137ULL, 0x5788d2f4de4bd8c0ULL, 0xe6d6f1bf0d32f7a3ULL, 0x3534946378b98606ULL, 0x846ab728abc0a965ULL, 0x23ae7c9040d64a2fULL, 0x92f05fdb93af654cULL, 0x41123a07e62414e9ULL, 0xf04c194c355d3b8aULL, 0x0b12c9a4200e9c56ULL, 0xba4ceaeff377b335ULL, 0x69ae8f3386fcc290ULL, 0xd8f0ac785585edf3ULL, 0x7f3467c0be930eb9ULL, 0xce6a448b6dea21daULL, 0x1d8821571861507fULL, 0xacd6021ccb187f1cULL, 0x5201b626ce4c96ebULL, 0xe35f956d1d35b988ULL, 0x30bdf0b168bec82dULL, 0x81e3d3fabbc7e74eULL, 0x2627184250d10404ULL, 0x97793b0983a82b67ULL, 0x449b5ed5f6235ac2ULL, 0xf5c57d9e255a75a1ULL, 0x086a15ea2ff3a64fULL, 0xb93436a1fc8a892cULL, 0x6ad6537d8901f889ULL, 0xdb8870365a78d7eaULL, 0x7c4cbb8eb16e34a0ULL, 0xcd1298c562171bc3ULL, 0x1ef0fd19179c6a66ULL, 0xafaede52c4e54505ULL, 0x51796a68c1b1acf2ULL, 0xe027492312c88391ULL, 0x33c52cff6743f234ULL, 0x829b0fb4b43add57ULL, 0x255fc40c5f2c3e1dULL, 0x9401e7478c55117eULL, 0x47e3829bf9de60dbULL, 0xf6bda1d02aa74fb8ULL, 0x1bc6e2707fe9d0c8ULL, 0xaa98c13bac90ffabULL, 0x797aa4e7d91b8e0eULL, 0xc82487ac0a62a16dULL, 0x6fe04c14e1744227ULL, 0xdebe6f5f320d6d44ULL, 0x0d5c0a8347861ce1ULL, 0xbc0229c894ff3382ULL, 0x42d59df291abda75ULL, 0xf38bbeb942d2f516ULL, 0x2069db65375984b3ULL, 0x9137f82ee420abd0ULL, 0x36f333960f36489aULL, 0x87ad10dddc4f67f9ULL, 0x544f7501a9c4165cULL, 0xe511564a7abd393fULL, 0x18be3e3e7014ead1ULL, 0xa9e01d75a36dc5b2ULL, 0x7a0278a9d6e6b417ULL, 0xcb5c5be2059f9b74ULL, 0x6c98905aee89783eULL, 0xddc6b3113df0575dULL, 0x0e24d6cd487b26f8ULL, 0xbf7af5869b02099bULL, 0x41ad41bc9e56e06cULL, 0xf0f362f74d2fcf0fULL, 0x2311072b38a4beaaULL, 0x924f2460ebdd91c9ULL, 0x358befd800cb7283ULL, 0x84d5cc93d3b25de0ULL, 0x5737a94fa6392c45ULL, 0xe6698a0475400326ULL, 0x1d375aec6013a4faULL, 0xac6979a7b36a8b99ULL, 0x7f8b1c7bc6e1fa3cULL, 0xced53f301598d55fULL, 0x6911f488fe8e3615ULL, 0xd84fd7c32df71976ULL, 0x0badb21f587c68d3ULL, 0xbaf391548b0547b0ULL, 0x4424256e8e51ae47ULL, 0xf57a06255d288124ULL, 0x269863f928a3f081ULL, 0x97c640b2fbdadfe2ULL, 0x30028b0a10cc3ca8ULL, 0x815ca841c3b513cbULL, 0x52becd9db63e626eULL, 0xe3e0eed665474d0dULL, 0x1e4f86a26fee9ee3ULL, 0xaf11a5e9bc97b180ULL, 0x7cf3c035c91cc025ULL, 0xcdade37e1a65ef46ULL, 0x6a6928c6f1730c0cULL, 0xdb370b8d220a236fULL, 0x08d56e51578152caULL, 0xb98b4d1a84f87da9ULL, 0x475cf92081ac945eULL, 0xf602da6b52d5bb3dULL, 0x25e0bfb7275eca98ULL, 0x94be9cfcf427e5fbULL, 0x337a57441f3106b1ULL, 0x8224740fcc4829d2ULL, 0x51c611d3b9c35877ULL, 0xe09832986aba7714ULL, 0x16259348401d38acULL, 0xa77bb003936417cfULL, 0x7499d5dfe6ef666aULL, 0xc5c7f69435964909ULL, 0x62033d2cde80aa43ULL, 0xd35d1e670df98520ULL, 0x00bf7bbb7872f485ULL, 0xb1e158f0ab0bdbe6ULL, 0x4f36eccaae5f3211ULL, 0xfe68cf817d261d72ULL, 0x2d8aaa5d08ad6cd7ULL, 0x9cd48916dbd443b4ULL, 0x3b1042ae30c2a0feULL, 0x8a4e61e5e3bb8f9dULL, 0x59ac04399630fe38ULL, 0xe8f227724549d15bULL, 0x155d4f064fe002b5ULL, 0xa4036c4d9c992dd6ULL, 0x77e10991e9125c73ULL, 0xc6bf2ada3a6b7310ULL, 0x617be162d17d905aULL, 0xd025c2290204bf39ULL, 0x03c7a7f5778fce9cULL, 0xb29984bea4f6e1ffULL, 0x4c4e3084a1a20808ULL, 0xfd1013cf72db276bULL, 0x2ef27613075056ceULL, 0x9fac5558d42979adULL, 0x38689ee03f3f9ae7ULL, 0x8936bdabec46b584ULL, 0x5ad4d87799cdc421ULL, 0xeb8afb3c4ab4eb42ULL, 0x10d42bd45fe74c9eULL, 0xa18a089f8c9e63fdULL, 0x72686d43f9151258ULL, 0xc3364e082a6c3d3bULL, 0x64f285b0c17ade71ULL, 0xd5aca6fb1203f112ULL, 0x064ec327678880b7ULL, 0xb710e06cb4f1afd4ULL, 0x49c75456b1a54623ULL, 0xf899771d62dc6940ULL, 0x2b7b12c1175718e5ULL, 0x9a25318ac42e3786ULL, 0x3de1fa322f38d4ccULL, 0x8cbfd979fc41fbafULL, 0x5f5dbca589ca8a0aULL, 0xee039fee5ab3a569ULL, 0x13acf79a501a7687ULL, 0xa2f2d4d1836359e4ULL, 0x7110b10df6e82841ULL, 0xc04e924625910722ULL, 0x678a59fece87e468ULL, 0xd6d47ab51dfecb0bULL, 0x05361f696875baaeULL, 0xb4683c22bb0c95cdULL, 0x4abf8818be587c3aULL, 0xfbe1ab536d215359ULL, 0x2803ce8f18aa22fcULL, 0x995dedc4cbd30d9fULL, 0x3e99267c20c5eed5ULL, 0x8fc70537f3bcc1b6ULL, 0x5c2560eb8637b013ULL, 0xed7b43a0554e9f70ULL }; static const u_int64_t U[256] = { 0x0000000000000000ULL, 0x079343d61ab9f60eULL, 0x0f2687ac3573ec1cULL, 0x08b5c47a2fca1a12ULL, 0x1e4d0f586ae7d838ULL, 0x19de4c8e705e2e36ULL, 0x116b88f45f943424ULL, 0x16f8cb22452dc22aULL, 0x3c9a1eb0d5cfb070ULL, 0x3b095d66cf76467eULL, 0x33bc991ce0bc5c6cULL, 0x342fdacafa05aa62ULL, 0x22d711e8bf286848ULL, 0x2544523ea5919e46ULL, 0x2df196448a5b8454ULL, 0x2a62d59290e2725aULL, 0x79343d61ab9f60e0ULL, 0x7ea77eb7b12696eeULL, 0x7612bacd9eec8cfcULL, 0x7181f91b84557af2ULL, 0x67793239c178b8d8ULL, 0x60ea71efdbc14ed6ULL, 0x685fb595f40b54c4ULL, 0x6fccf643eeb2a2caULL, 0x45ae23d17e50d090ULL, 0x423d600764e9269eULL, 0x4a88a47d4b233c8cULL, 0x4d1be7ab519aca82ULL, 0x5be32c8914b708a8ULL, 0x5c706f5f0e0efea6ULL, 0x54c5ab2521c4e4b4ULL, 0x5356e8f33b7d12baULL, 0x433659888447eea3ULL, 0x44a51a5e9efe18adULL, 0x4c10de24b13402bfULL, 0x4b839df2ab8df4b1ULL, 0x5d7b56d0eea0369bULL, 0x5ae81506f419c095ULL, 0x525dd17cdbd3da87ULL, 0x55ce92aac16a2c89ULL, 0x7fac473851885ed3ULL, 0x783f04ee4b31a8ddULL, 0x708ac09464fbb2cfULL, 0x771983427e4244c1ULL, 0x61e148603b6f86ebULL, 0x66720bb621d670e5ULL, 0x6ec7cfcc0e1c6af7ULL, 0x69548c1a14a59cf9ULL, 0x3a0264e92fd88e43ULL, 0x3d91273f3561784dULL, 0x3524e3451aab625fULL, 0x32b7a09300129451ULL, 0x244f6bb1453f567bULL, 0x23dc28675f86a075ULL, 0x2b69ec1d704cba67ULL, 0x2cfaafcb6af54c69ULL, 0x06987a59fa173e33ULL, 0x010b398fe0aec83dULL, 0x09befdf5cf64d22fULL, 0x0e2dbe23d5dd2421ULL, 0x18d5750190f0e60bULL, 0x1f4636d78a491005ULL, 0x17f3f2ada5830a17ULL, 0x1060b17bbf3afc19ULL, 0x3732905adbf6f225ULL, 0x30a1d38cc14f042bULL, 0x381417f6ee851e39ULL, 0x3f875420f43ce837ULL, 0x297f9f02b1112a1dULL, 0x2eecdcd4aba8dc13ULL, 0x265918ae8462c601ULL, 0x21ca5b789edb300fULL, 0x0ba88eea0e394255ULL, 0x0c3bcd3c1480b45bULL, 0x048e09463b4aae49ULL, 0x031d4a9021f35847ULL, 0x15e581b264de9a6dULL, 0x1276c2647e676c63ULL, 0x1ac3061e51ad7671ULL, 0x1d5045c84b14807fULL, 0x4e06ad3b706992c5ULL, 0x4995eeed6ad064cbULL, 0x41202a97451a7ed9ULL, 0x46b369415fa388d7ULL, 0x504ba2631a8e4afdULL, 0x57d8e1b50037bcf3ULL, 0x5f6d25cf2ffda6e1ULL, 0x58fe6619354450efULL, 0x729cb38ba5a622b5ULL, 0x750ff05dbf1fd4bbULL, 0x7dba342790d5cea9ULL, 0x7a2977f18a6c38a7ULL, 0x6cd1bcd3cf41fa8dULL, 0x6b42ff05d5f80c83ULL, 0x63f73b7ffa321691ULL, 0x646478a9e08be09fULL, 0x7404c9d25fb11c86ULL, 0x73978a044508ea88ULL, 0x7b224e7e6ac2f09aULL, 0x7cb10da8707b0694ULL, 0x6a49c68a3556c4beULL, 0x6dda855c2fef32b0ULL, 0x656f4126002528a2ULL, 0x62fc02f01a9cdeacULL, 0x489ed7628a7eacf6ULL, 0x4f0d94b490c75af8ULL, 0x47b850cebf0d40eaULL, 0x402b1318a5b4b6e4ULL, 0x56d3d83ae09974ceULL, 0x51409becfa2082c0ULL, 0x59f55f96d5ea98d2ULL, 0x5e661c40cf536edcULL, 0x0d30f4b3f42e7c66ULL, 0x0aa3b765ee978a68ULL, 0x0216731fc15d907aULL, 0x058530c9dbe46674ULL, 0x137dfbeb9ec9a45eULL, 0x14eeb83d84705250ULL, 0x1c5b7c47abba4842ULL, 0x1bc83f91b103be4cULL, 0x31aaea0321e1cc16ULL, 0x3639a9d53b583a18ULL, 0x3e8c6daf1492200aULL, 0x391f2e790e2bd604ULL, 0x2fe7e55b4b06142eULL, 0x2874a68d51bfe220ULL, 0x20c162f77e75f832ULL, 0x2752212164cc0e3cULL, 0x6e6520b5b7ede44aULL, 0x69f66363ad541244ULL, 0x6143a719829e0856ULL, 0x66d0e4cf9827fe58ULL, 0x70282feddd0a3c72ULL, 0x77bb6c3bc7b3ca7cULL, 0x7f0ea841e879d06eULL, 0x789deb97f2c02660ULL, 0x52ff3e056222543aULL, 0x556c7dd3789ba234ULL, 0x5dd9b9a95751b826ULL, 0x5a4afa7f4de84e28ULL, 0x4cb2315d08c58c02ULL, 0x4b21728b127c7a0cULL, 0x4394b6f13db6601eULL, 0x4407f527270f9610ULL, 0x17511dd41c7284aaULL, 0x10c25e0206cb72a4ULL, 0x18779a78290168b6ULL, 0x1fe4d9ae33b89eb8ULL, 0x091c128c76955c92ULL, 0x0e8f515a6c2caa9cULL, 0x063a952043e6b08eULL, 0x01a9d6f6595f4680ULL, 0x2bcb0364c9bd34daULL, 0x2c5840b2d304c2d4ULL, 0x24ed84c8fcced8c6ULL, 0x237ec71ee6772ec8ULL, 0x35860c3ca35aece2ULL, 0x32154feab9e31aecULL, 0x3aa08b90962900feULL, 0x3d33c8468c90f6f0ULL, 0x2d53793d33aa0ae9ULL, 0x2ac03aeb2913fce7ULL, 0x2275fe9106d9e6f5ULL, 0x25e6bd471c6010fbULL, 0x331e7665594dd2d1ULL, 0x348d35b343f424dfULL, 0x3c38f1c96c3e3ecdULL, 0x3babb21f7687c8c3ULL, 0x11c9678de665ba99ULL, 0x165a245bfcdc4c97ULL, 0x1eefe021d3165685ULL, 0x197ca3f7c9afa08bULL, 0x0f8468d58c8262a1ULL, 0x08172b03963b94afULL, 0x00a2ef79b9f18ebdULL, 0x0731acafa34878b3ULL, 0x5467445c98356a09ULL, 0x53f4078a828c9c07ULL, 0x5b41c3f0ad468615ULL, 0x5cd28026b7ff701bULL, 0x4a2a4b04f2d2b231ULL, 0x4db908d2e86b443fULL, 0x450ccca8c7a15e2dULL, 0x429f8f7edd18a823ULL, 0x68fd5aec4dfada79ULL, 0x6f6e193a57432c77ULL, 0x67dbdd4078893665ULL, 0x60489e966230c06bULL, 0x76b055b4271d0241ULL, 0x712316623da4f44fULL, 0x7996d218126eee5dULL, 0x7e0591ce08d71853ULL, 0x5957b0ef6c1b166fULL, 0x5ec4f33976a2e061ULL, 0x567137435968fa73ULL, 0x51e2749543d10c7dULL, 0x471abfb706fcce57ULL, 0x4089fc611c453859ULL, 0x483c381b338f224bULL, 0x4faf7bcd2936d445ULL, 0x65cdae5fb9d4a61fULL, 0x625eed89a36d5011ULL, 0x6aeb29f38ca74a03ULL, 0x6d786a25961ebc0dULL, 0x7b80a107d3337e27ULL, 0x7c13e2d1c98a8829ULL, 0x74a626abe640923bULL, 0x7335657dfcf96435ULL, 0x20638d8ec784768fULL, 0x27f0ce58dd3d8081ULL, 0x2f450a22f2f79a93ULL, 0x28d649f4e84e6c9dULL, 0x3e2e82d6ad63aeb7ULL, 0x39bdc100b7da58b9ULL, 0x3108057a981042abULL, 0x369b46ac82a9b4a5ULL, 0x1cf9933e124bc6ffULL, 0x1b6ad0e808f230f1ULL, 0x13df149227382ae3ULL, 0x144c57443d81dcedULL, 0x02b49c6678ac1ec7ULL, 0x0527dfb06215e8c9ULL, 0x0d921bca4ddff2dbULL, 0x0a01581c576604d5ULL, 0x1a61e967e85cf8ccULL, 0x1df2aab1f2e50ec2ULL, 0x15476ecbdd2f14d0ULL, 0x12d42d1dc796e2deULL, 0x042ce63f82bb20f4ULL, 0x03bfa5e99802d6faULL, 0x0b0a6193b7c8cce8ULL, 0x0c992245ad713ae6ULL, 0x26fbf7d73d9348bcULL, 0x2168b401272abeb2ULL, 0x29dd707b08e0a4a0ULL, 0x2e4e33ad125952aeULL, 0x38b6f88f57749084ULL, 0x3f25bb594dcd668aULL, 0x37907f2362077c98ULL, 0x30033cf578be8a96ULL, 0x6355d40643c3982cULL, 0x64c697d0597a6e22ULL, 0x6c7353aa76b07430ULL, 0x6be0107c6c09823eULL, 0x7d18db5e29244014ULL, 0x7a8b9888339db61aULL, 0x723e5cf21c57ac08ULL, 0x75ad1f2406ee5a06ULL, 0x5fcfcab6960c285cULL, 0x585c89608cb5de52ULL, 0x50e94d1aa37fc440ULL, 0x577a0eccb9c6324eULL, 0x4182c5eefcebf064ULL, 0x46118638e652066aULL, 0x4ea44242c9981c78ULL, 0x49370194d321ea76ULL }; #endif static unsigned char rabin_window[RABIN_WINDOW_SIZE]; static unsigned rabin_pos = 0; #define MIN(x,y) ((y)<(x) ? (y) : (x)) #define MAX(x,y) ((y)>(x) ? (y) : (x)) /* * The copies array is the central data structure for diff generation. * Data statements are implicit, for ranges not covered by any copy command. * * The sum of tgt and length for each entry must be monotonically increasing, * and data ranges must be non-overlapping. This is accomplished by not * extending matches backwards during initial matching. * * Copies may have zero length, to make it quick to delete copies during * optimization. However, the last copy in the list must always be a * non-trivial copy. * * Before committing copies, an important optimization is performed: during * a backward pass through the copies array, each entry is extended backwards, * and redundant copies are eliminated. * * If each match were extended backwards on insertion, the same data may be * matched an arbitrary number of times, resulting in potentially quadratic * time behavior. */ typedef struct copyinfo { unsigned src; unsigned tgt; unsigned length; } CopyInfo; static CopyInfo *copies; static int copy_count = 0; static unsigned max_copies = 0; /* Dynamically increased */ static unsigned *idx; static unsigned idx_size; static unsigned char *idx_data; static unsigned idx_data_len; static void rabin_reset(void) { memset(rabin_window, 0, sizeof(rabin_window)); } static u_int64_t rabin_slide (u_int64_t fp, unsigned char m) { unsigned char om; if (++rabin_pos == RABIN_WINDOW_SIZE) rabin_pos = 0; om = rabin_window[rabin_pos]; fp ^= U[om]; rabin_window[rabin_pos] = m; fp = ((fp << 8) | m) ^ T[fp >> RABIN_SHIFT]; return fp; } static void init_idx (unsigned char *data, size_t len, int level) { unsigned index_step = RABIN_WINDOW_SIZE / sizeof(unsigned) * sizeof(unsigned); size_t j, k; unsigned char ch = 0; unsigned maxofs[256]; unsigned maxlen[256]; unsigned maxfp[256]; unsigned runlen = 0; u_int64_t fp = 0; assert (len <= MAX_SIZE); assert (level >= 0 && level <= 9); memset(maxofs, 0, sizeof(maxofs)); memset(maxlen, 0, sizeof(maxlen)); memset(maxfp, 0, sizeof(maxfp)); /* index_step must be multiple of word size */ if (level >= 1) index_step = MIN(index_step, 4 * sizeof(unsigned)); /* Use smaller step size for higher optimization levels or smaller files */ if (level >= 3 || len <= 65536) index_step = MIN (index_step, 3 * sizeof (unsigned)); if (level >= 4 || len <= 32768) index_step = MIN (index_step, 2 * sizeof (unsigned)); if (level >= 6 || len < 16384) index_step = MIN (index_step, 1 * sizeof (unsigned)); assert (index_step && !(index_step % sizeof (unsigned))); /* Add fixed amount to hash table size, as small files will benefit a lot without using significantly more memory or time. */ idx_size = (level + 1) * (len / index_step) / 2 + MIN_HTAB_SIZE; idx_size = MIN (idx_size, MAX_HTAB_SIZE - 1); /* So rounding up works */ /* Round up to next power of two, but limit to MAX_HTAB_SIZE. */ { unsigned s = MIN_HTAB_SIZE; while (s < idx_size) s += s; idx_size = s; } idx_data = data; idx_data_len = len; idx = calloc(idx_size, sizeof(unsigned)); /* It is tempting to first index higher addresses, so hashes of lower addresses will get preference in the hash table. However, for repetitive patterns with a period that is a divisor of the fingerprint window, this may mean the match is not anchored at the end. Furthermore, even when using a window length that is prime, the benefits are small and the irregularity of the first matches being more important is not worth it. */ rabin_reset(); ch = 0; runlen = 0; for (j = 0; j + index_step < len; j += index_step) { unsigned char pch = 0; unsigned hash; for (k = 0; k < index_step; k++) { pch = ch; ch = data[j + k]; if (ch != pch) runlen = 0; runlen++; fp = rabin_slide(fp, ch); } /* See if there is a word-aligned window-sized run of equal characters */ if (runlen >= RABIN_WINDOW_SIZE + sizeof(unsigned) - 1) { /* Skip ahead to end of run of identical input characters */ while (j + k < len && data[j + k] == ch) { k++; runlen++; } /* Although matches are usually anchored at the end, in the case of extended runs of equal characters it is better to anchor after the first RABIN_WINDOW_SIZE bytes. This allows for quick skip ahead while matching such runs, avoiding unneeded fingerprint calculations. Also, when anchoring at the end, matches will be generated after every word, because the fingerprint stays constant. Even though all matches would get combined during match optimization, it wastes time and space. */ if (runlen > maxlen[pch] + 4) { unsigned ofs; /* ofs points RABIN_WINDOW_SIZE bytes after the start of the run, rounded up to the next word */ ofs = j + k - runlen + RABIN_WINDOW_SIZE + (sizeof (unsigned) - 1); ofs -= ofs % sizeof(unsigned); maxofs[pch] = ofs; maxlen [pch] = runlen; assert(maxfp[pch] == 0 || maxfp[pch] == (unsigned)fp); maxfp[pch] = (unsigned)fp; } /* Keep input aligned as if no special run processing had taken place */ j += k - (k % index_step) - index_step; k = index_step; } /* Testing showed that avoiding collisions using secondary hashing, or hash chaining had little effect and is not worth the time. */ hash = ((unsigned)fp) & (idx_size - 1); idx[hash] = j + k; } /* Lastly, index the longest runs of equal characters found before. This ensures we always match the longerst such runs available. */ for (j = 0; j < 256; j++) if (maxlen[j]) idx[maxfp[j] % idx_size] = maxofs[j]; } /* Match data against the current index and record all possible copies */ static int find_copies(unsigned char *data, size_t len) { size_t j = 0; u_int64_t fp = 0; rabin_reset(); while (j < RABIN_WINDOW_SIZE && j < len) fp = rabin_slide(fp, data[j++]); while (j < len) { unsigned hash, ofs, src, tgt, runlen, maxrun; fp = rabin_slide(fp, data[j++]); hash = fp & (idx_size - 1); ofs = idx[hash]; /* Invariant: data[0] .. data[j-1] has been processed fp is fingerprint of sliding window ending at j-1 ofs is zero or points just past tentative match ofs is a multiple of index_step */ if (!ofs) continue; runlen = 0; tgt = j - 4; src = ofs - 4; maxrun = MIN(idx_data_len - src, len - tgt); /* Hot loop */ while (runlen < maxrun && data[tgt + runlen] == idx_data[src + runlen]) runlen++; if (runlen < 4) continue; if (copy_count == max_copies) { max_copies *= 2; if (!copies) { max_copies = MAX_COPIES; copies = malloc(max_copies * sizeof(CopyInfo)); } else { copies = realloc(copies, max_copies * sizeof (CopyInfo)); } if (!copies) return 0; } copies[copy_count].src = src; copies[copy_count].tgt = tgt; copies[copy_count].length = runlen; copy_count++; /* For runs extending more than RABIN_WINDOW_SIZE bytes beyond j, skip ahead to prevent useless fingerprint computations. */ if (tgt + runlen > j + RABIN_WINDOW_SIZE) j = tgt + runlen - RABIN_WINDOW_SIZE; /* Quickly scan ahead without looking for matches until the end of this run */ while (j < tgt + runlen) fp = rabin_slide(fp, data[j++]); } return 1; } static unsigned header_length(unsigned srclen, unsigned tgtlen) { unsigned len = 0; assert (srclen <= MAX_SIZE && tgtlen <= MAX_SIZE); /* GIT headers start with the length of the source and target, with 7 bits per byte, least significant byte first, and the high bit indicating continuation. */ do { len++; srclen >>= 7; } while (srclen); do { len++; tgtlen >>= 7; } while (tgtlen); return len; } static unsigned char *write_header(unsigned char *patch, unsigned srclen, unsigned tgtlen) { assert (srclen <= MAX_SIZE && tgtlen <= MAX_SIZE); while (srclen >= 0x80) { *patch++ = srclen | 0x80; srclen >>= 7; } *patch++ = srclen; while (tgtlen >= 0x80) { *patch++ = tgtlen | 0x80; tgtlen >>= 7; } *patch++ = tgtlen; return patch; } static unsigned data_length(unsigned length) { /* Can only include 0x7f data bytes per command */ unsigned partial = length % 0x7f; assert (length > 0 && length <= MAX_SIZE); if (partial) partial++; return partial + (length / 0x7f) * 0x80; } static unsigned char *write_data(unsigned char *patch, unsigned char *data, unsigned size) { assert (size > 0 && size < MAX_SIZE); /* The return value must be equal to patch + data_length (patch, size). This correspondence is essential for calculating the patch size. */ /* GIT has no data commands for large data, rest is same as GDIFF */ do { unsigned s = size; if (s > 0x7f) s = 0x7f; *patch++ = s; memcpy(patch, data, s); data += s; patch += s; size -= s; } while (size); return patch; } static unsigned copy_length (unsigned offset, unsigned length) { unsigned size = 0; assert (offset < MAX_SIZE && length < MAX_SIZE); /* For now we only copy a maximum of 0x10000 bytes per command. Longer copies are broken into pieces of that size. */ do { signed s = length; if (s > 0x10000) s = 0x10000; size += !!(s & 0xff) + !!(s & 0xff00); size += !!(offset & 0xff) + !!(offset & 0xff00) + !!(offset & 0xff0000) + !!(offset & 0xff000000); size += 1; offset += s; length -= s; } while (length); return size; } static unsigned char * write_copy(unsigned char *patch, unsigned offset, unsigned size) { /* The return value must be equal to patch + copy_length (patch,offset,size). This correspondence is essential for calculating the patch size. */ do { unsigned char c = 0x80, *cmd = patch++; unsigned v, s = size; if (s > 0x10000) s = 0x10000; v = offset; if (v & 0xff) c |= 0x01, *patch++ = v; v >>= 8; if (v & 0xff) c |= 0x02, *patch++ = v; v >>= 8; if (v & 0xff) c |= 0x04, *patch++ = v; v >>= 8; if (v & 0xff) c |= 0x08, *patch++ = v; v = s; if (v & 0xff) c |= 0x10, *patch++ = v; v >>= 8; if (v & 0xff) c |= 0x20, *patch++ = v; *cmd = c; offset += s; size -= s; } while (size); return patch; } static unsigned process_copies (unsigned char *data, unsigned length, unsigned maxlen) { int j; unsigned ptr = length; unsigned patch_bytes = header_length(idx_data_len, length); /* Work through the copies backwards, extending each one backwards. */ for (j = copy_count - 1; j >= 0; j--) { CopyInfo *copy = copies+j; unsigned src = copy->src; unsigned tgt = copy->tgt; unsigned len = copy->length; int data_follows; if (tgt + len > ptr) { /* Part of copy already covered by later one, so shorten copy. */ if (ptr < tgt) { /* Copy completely disappeared, but guess that a backward extension might still be useful. This extension is non-contiguous, as it is irrelevant whether the skipped data would have matched or not. Be careful to not extend past the beginning of the source. */ unsigned adjust = tgt - ptr; tgt = ptr; src = (src < adjust) ? 0 : src - adjust; copy->tgt = tgt; copy->src = src; } len = ptr - tgt; } while (src && tgt && idx_data[src - 1] == data[tgt - 1]) { src--; tgt--; } len += copy->tgt - tgt; data_follows = (tgt + len < ptr); /* A short copy may cost as much as 6 bytes for the copy and 5 as result of an extra data command. It's not worth having extra copies in order to just save a byte or two. Being too smart here may hurt later compression as well. */ if (len < (data_follows ? 16 : 10)) len = 0; /* Some target data is not covered by the copies, account for the DATA command that will follow the copy. */ if (len && data_follows) patch_bytes += data_length(ptr - (tgt + len)); /* Everything about the copy is known and will not change. Write back the new information and update the patch size with the size of the copy instruction. */ copy->length = len; copy->src = src; copy->tgt = tgt; if (len) { /* update patch size for copy command */ patch_bytes += copy_length (src, len); ptr = tgt; } else if (j == copy_count - 1) { /* Remove empty copies at end of list. */ copy_count--; } if (patch_bytes > maxlen) return 0; } /* Account for data before first copy */ if (ptr != 0) patch_bytes += data_length(ptr); if (patch_bytes > maxlen) return 0; return patch_bytes; } static unsigned calculate_delta(void *to_buf, unsigned long to_size, unsigned long max_size) { assert (to_size < MAX_SIZE); if (!find_copies(to_buf, to_size)) return 0; return process_copies(to_buf, to_size, max_size); } static void *create_delta (unsigned char *data, unsigned len, unsigned delta_size) { unsigned char *delta = malloc(delta_size); unsigned char *ptr = delta; unsigned offset = 0; int j; ptr = write_header(ptr, idx_data_len, len); for (j = 0; j < copy_count; j++) { CopyInfo *copy = copies + j; unsigned copylen = copy->length; if (!copylen) continue; if (copy->tgt > offset) { assert (delta_size - (ptr - delta) > data_length(copy->tgt - offset)); ptr = write_data(ptr, data + offset, copy->tgt - offset); } assert(delta_size - (ptr - delta) >= copy_length(copy->src, copylen)); ptr = write_copy(ptr, copy->src, copylen); offset = copy->tgt + copylen; } if (offset < len) ptr = write_data(ptr, data + offset, len - offset); assert(ptr - delta == delta_size); return delta; } static void finalize_idx() { free(copies); copies = 0; max_copies = 0; copy_count = 0; free(idx); idx = 0; idx_size = 0; idx_data = 0; idx_data_len = 0; } void *diff_delta(void *from_buf, unsigned long from_size, void *to_buf, unsigned long to_size, unsigned long *delta_size, unsigned long max_size) { unsigned char *delta = 0; unsigned dsize; assert (from_size <= MAX_SIZE && to_size <= MAX_SIZE); init_idx(from_buf, from_size, 1); /* Use optimization level 1 */ if (!max_size) max_size = MAX_SIZE; dsize = calculate_delta (to_buf, to_size, max_size); if (dsize) delta = create_delta (to_buf, to_size, dsize); finalize_idx (); if (delta) *delta_size = dsize; return delta; } ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: RFC: New diff-delta.c implementation 2006-04-24 15:57 ` Nicolas Pitre @ 2006-04-24 16:31 ` Geert Bosch 2006-04-24 18:24 ` Geert Bosch 2006-04-24 18:27 ` Geert Bosch 2006-04-24 19:21 ` Rutger Nijlunsing 2 siblings, 1 reply; 32+ messages in thread From: Geert Bosch @ 2006-04-24 16:31 UTC (permalink / raw) To: Nicolas Pitre; +Cc: Rene Scharfe, Git Mailing List, Junio C Hamano > Sure, but the data buffers aren't necessarily so. At least some > code to > align index_step with actual memory offset if necessary should be > considered. I don't quite understand. Do you mean that if I mmap a file, I can't count on the memory being word-aligned? Note that I traverse the target buffer byte-by-byte, but the matches in the source buffer are always aligned to index_step. Doing a word load there instead of individual byte loads actually is a significant speedup on PPC, and most other non-Intel platforms. > Indeed. And since the primary goal for GIT is to manage relatively > small files with relatively few differences then we have to > optimize for > that case while trying to simply limit the dammage in the other cases. Yes, I'll look at those now. Also, comparing one index against 10 target files may change the profile quite a bit. The new algorithm spends most time on indexing, but when comparing against many files, the find_copy part suddenly becomes dominant. > Well, I did lots of benchmarks too over the Linux kernel repository > with the current code. It is of course a dataset quite different from > two large files. And simply increasing the hash size to improve on > pack > size did increase CPU usage quite significantly. BTW, I still get: potomac%git-rev-list --objects e64961b0573b0e72bd55eab6d36bd97f859f9516 | ./git-pack-objects --no- reuse-delta --stdout Generating pack... Done counting 17005 objects. Deltifying 17005 objects. 100% (17005/17005) done fatal: delta size changed (This is for my git.git tree.) >> PS. Somehow your code had "double line spacing" :) > Gah. Your original version must have CRLF line terminations, and vi > simply notice that and writes the file back with CRLF by default. > Find > attached a version with those converted to LF only. Strange, my platform doesn't use CRLF, and my sources all have pristine LF line terminations. -Geert ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: RFC: New diff-delta.c implementation 2006-04-24 16:31 ` Geert Bosch @ 2006-04-24 18:24 ` Geert Bosch 0 siblings, 0 replies; 32+ messages in thread From: Geert Bosch @ 2006-04-24 18:24 UTC (permalink / raw) To: Geert Bosch; +Cc: Nicolas Pitre, Rene Scharfe, Git Mailing List, Junio C Hamano On Apr 24, 2006, at 12:31, Geert Bosch wrote: > BTW, I still get: > potomac%git-rev-list --objects > e64961b0573b0e72bd55eab6d36bd97f859f9516 | ./git-pack-objects --no- > reuse-delta --stdout > Generating pack... > Done counting 17005 objects. > Deltifying 17005 objects. > 100% (17005/17005) done > fatal: delta size changed > (This is for my git.git tree.) Ignore this, I hadn't had my coffee yet. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: RFC: New diff-delta.c implementation 2006-04-24 15:57 ` Nicolas Pitre 2006-04-24 16:31 ` Geert Bosch @ 2006-04-24 18:27 ` Geert Bosch 2006-04-24 19:21 ` Rutger Nijlunsing 2 siblings, 0 replies; 32+ messages in thread From: Geert Bosch @ 2006-04-24 18:27 UTC (permalink / raw) To: Nicolas Pitre; +Cc: Rene Scharfe, Git Mailing List, Junio C Hamano On Apr 24, 2006, at 11:57, Nicolas Pitre wrote: >> As a compiler writer, I am more than aware of this. >> Your change results in a 70% slowdown on PowerPC. > > Huh!? 2.5s instead of 1.5s comparing gcc-2.95.1.tar against gcc-2.95.2.tar. Byte loads are relatively expensive on this platform. I'll put the word loads back in and make sure they don't slow things down on x86. -Geert ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: RFC: New diff-delta.c implementation 2006-04-24 15:57 ` Nicolas Pitre 2006-04-24 16:31 ` Geert Bosch 2006-04-24 18:27 ` Geert Bosch @ 2006-04-24 19:21 ` Rutger Nijlunsing 2 siblings, 0 replies; 32+ messages in thread From: Rutger Nijlunsing @ 2006-04-24 19:21 UTC (permalink / raw) To: Nicolas Pitre; +Cc: Geert Bosch, Rene Scharfe, Git Mailing List, Junio C Hamano On Mon, Apr 24, 2006 at 11:57:38AM -0400, Nicolas Pitre wrote: > On Mon, 24 Apr 2006, Geert Bosch wrote: > > > > The comparison is a bit between a O(n^2) sort that is fast on small > > or mostly sorted inputs (but horrible on large ones) and a more > > complex O(nlogn) algorithm that is a bit slower for the simple > > cases, but far faster for more complex cases. > > Indeed. And since the primary goal for GIT is to manage relatively > small files with relatively few differences then we have to optimize for > that case while trying to simply limit the dammage in the other cases. Like others (the large-Maildir-storage thread comes to mind), I am looking into storing more diverse data (say, $HOME) into git repo's and I would mind the O(n log n) instead of O(n^2) where the constant factor of the first is larger than the constant factor of the second. ...but then again, I'm just a user ;) -- Rutger Nijlunsing ---------------------------------- eludias ed dse.nl never attribute to a conspiracy which can be explained by incompetence ---------------------------------------------------------------------- ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: RFC: New diff-delta.c implementation 2006-04-24 15:19 ` Geert Bosch 2006-04-24 15:57 ` Nicolas Pitre @ 2006-04-24 20:37 ` Petr Baudis 1 sibling, 0 replies; 32+ messages in thread From: Petr Baudis @ 2006-04-24 20:37 UTC (permalink / raw) To: Geert Bosch; +Cc: Nicolas Pitre, Rene Scharfe, Git Mailing List, Junio C Hamano Dear diary, on Mon, Apr 24, 2006 at 05:19:01PM CEST, I got a letter where Geert Bosch <bosch@adacore.com> said that... > > But here comes the sad part. Even after simplifying the code as much as > > I could, performance is still significantly worse than the current > > diff-delta.c code. Repacking again the same Linux kernel repository > > with the current code: > That's unexpected, but I can see how this could be if most files have > very few differences and are relatively small. For such cases, almost > any hash will do, and the more complicated hashing will be more compute > intensive. > > > I have benchmarked my original diff code on a set of large files with > lots of changes. These are hardest to get right, and hardest to get > good performance with. Just try diffing any two large (uncompressed) > tar files, and you'll see. On many of such large files, the new code > is orders of magnitude faster. On these cases, the resulting deltas > are also much smaller. > > The comparison is a bit between a O(n^2) sort that is fast on small > or mostly sorted inputs (but horrible on large ones) and a more > complex O(nlogn) algorithm that is a bit slower for the simple > cases, but far faster for more complex cases. Can't you just switch between different delta algorithms based on some heuristic like the blob size? -- Petr "Pasky" Baudis Stuff: http://pasky.or.cz/ Right now I am having amnesia and deja-vu at the same time. I think I have forgotten this before. ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: RFC: New diff-delta.c implementation 2006-04-24 5:27 ` Nicolas Pitre 2006-04-24 15:19 ` Geert Bosch @ 2006-04-24 18:44 ` Geert Bosch 1 sibling, 0 replies; 32+ messages in thread From: Geert Bosch @ 2006-04-24 18:44 UTC (permalink / raw) To: Nicolas Pitre; +Cc: Rene Scharfe, Git Mailing List, Junio C Hamano On Apr 24, 2006, at 01:27, Nicolas Pitre wrote: > But here comes the sad part. Even after simplifying the code as > much as > I could, performance is still significantly worse than the current > diff-delta.c code. Repacking again the same Linux kernel repository > with the current code: Changing the level parameter in the call to init_idx to 0 gives a significant speedup. After that, by far the most time is spent computing hashes. I have some ideas of cutting down on that for this test case. > The final pack is smaller with your code but not significantly: > 117867049 bytes vs 118824550 bytes with the current code, i.e. less > than > 1% difference. I'm doing tests on the git.git repository now, and even though I see similar performance, I noted that the new algorithm packs more files, so the pack size is not the only thing to look at. Still, it remains the case that when you compare two files with just one or two changes, the simplest algorithm is still good enough. It will be very interesting how things will work out when comparing 10 files at a time. Then the extra cost of building the index isn't that significant, and the higher quality of the index may then pay off. -Geert ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: RFC: New diff-delta.c implementation 2006-04-24 2:57 ` Geert Bosch 2006-04-24 5:27 ` Nicolas Pitre @ 2006-04-25 18:22 ` Rene Scharfe 1 sibling, 0 replies; 32+ messages in thread From: Rene Scharfe @ 2006-04-25 18:22 UTC (permalink / raw) To: Geert Bosch; +Cc: Git Mailing List, Junio C Hamano Geert Bosch schrieb: > On Sat, Apr 22, 2006 at 02:36:04PM +0200, Rene Scharfe wrote: >> You can use "indent -npro -kr -i8 -ts8 -l80 -ss -ncs" to reformat your >> code into a similar style as used in the rest of git (settings taken >> from Lindent which is shipped with the Linux source). > Although I cringe at 8-space indenting, and find much of the GIT > code close to unreadable for lack of design-level comments, I'll > gladly reformat any code to conform to existing code standards. > Please let me know if you've got documentation on that, as it would > be helpful for me to know what the standard is. (No flame intended. :-) I'm not aware of a document mandating a certain formatting. The output of that indent call should come close to a "standard format", because Linus followed this style from the beginning and Junio didn't go astray. Don't worry too much about it. I just wanted to point out an easy way to reformat your code to use sane indenting. :-> René ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: RFC: New diff-delta.c implementation 2006-04-21 21:16 RFC: New diff-delta.c implementation Geert Bosch ` (2 preceding siblings ...) 2006-04-22 12:36 ` Rene Scharfe @ 2006-04-22 20:36 ` Davide Libenzi 2006-04-23 2:31 ` Geert Bosch 3 siblings, 1 reply; 32+ messages in thread From: Davide Libenzi @ 2006-04-22 20:36 UTC (permalink / raw) To: Geert Bosch; +Cc: Git Mailing List, Junio C Hamano On Fri, 21 Apr 2006, Geert Bosch wrote: > I wrote a new binary differencing algorithm that is both faster > and generates smaller deltas than the current implementation. > The format is compatible with that used by patch-delta, so > it should be easy to integrate. > > Originally, I wrote this for the GDIFF format, see > http://www.w3.org/TR/NOTE-gdiff-19970901. > The adaptation for GIT format was relatively simple, but is not thoroughly > tested. > The code is not derived from libxdiff, but uses the rabin_slide function > written > by David Mazieres (dm@uun.org). Also the tables are generated using his code. > Finally, this was developed on Darwin, and not a Linux system, so some > changes may be needed. > > Initial testing seems quite positive, take for example git-1.2.5.tar vs > git-1.2.6.tar > on my PowerBook (both with -O2 -DNDEBUG): > > current: 2.281s, patch size 36563 > new : 0.109s, patch size 16199 > > Please feel free to play around with this code, and give feedback. > Keep in mind this wasn't originally written for GIT, and C is not > my native language, so don't mind my formatting etc. Geert, I saw you're using a shift of 55 bits, that gives an degree of the root polynomial of 63, that is not prime. Where did you get the root polynomial, and why you did not chose 61 as degree of the root? Just curious ... - Davide ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: RFC: New diff-delta.c implementation 2006-04-22 20:36 ` Davide Libenzi @ 2006-04-23 2:31 ` Geert Bosch 2006-04-24 19:10 ` Davide Libenzi 0 siblings, 1 reply; 32+ messages in thread From: Geert Bosch @ 2006-04-23 2:31 UTC (permalink / raw) To: Davide Libenzi; +Cc: git On Sat, Apr 22, 2006 at 01:36:07PM -0700, Davide Libenzi wrote: > Geert, I saw you're using a shift of 55 bits, that gives an degree of the > root polynomial of 63, that is not prime. Where did you get the root > polynomial, and why you did not chose 61 as degree of the root? > Just curious ... The polynomial was randomly created using code by David Mazieres, that is part of LBFS. I chose a (irreducible) polynomial of degree 63 as that was the same as LBFS did. As for my purposes it's best to have a constant polynomial and I wanted to have all the code for the computations in the same compilation unit for performance, I decided to just have a little program print out the tables and include it directly. The chosen polynomial was 0xb15e234bd3792f63. Later on I haven't revisited this decision, although I agree that it'd probably be a good idea to use a polynomial of prime degree, even though we're not looking for cryptographically strong hashes here. Below I include new tables for degree 61. The window size of 22 was found by plotting graphs on a number of largish test cases (30-100MB) and seeing how the size of the compressed output changed. It is essential to do all comparisons using gzipped output. I've been tempted a number of times to include new optimizations, only to find out that the uncompressed size reduced, but final compressed size grew. More smaller copies and literal data segments is generally worse. -Geert #define RABIN_POLY 0x25bd5331c0d7096dULL #define RABIN_DEGREE 61 #define RABIN_SHIFT 53 #define RABIN_WINDOW_SIZE 22 unsigned long long T[256] = { 0x0000000000000000ULL, 0x25bd5331c0d7096dULL, 0x4b7aa66381ae12daULL, 0x6ec7f55241791bb7ULL, 0x96f54cc7035c25b4ULL, 0xb3481ff6c38b2cd9ULL, 0xdd8feaa482f2376eULL, 0xf832b99542253e03ULL, 0x0857cabfc66f4205ULL, 0x2dea998e06b84b68ULL, 0x432d6cdc47c150dfULL, 0x66903fed871659b2ULL, 0x9ea28678c53367b1ULL, 0xbb1fd54905e46edcULL, 0xd5d8201b449d756bULL, 0xf065732a844a7c06ULL, 0x10af957f8cde840aULL, 0x3512c64e4c098d67ULL, 0x5bd5331c0d7096d0ULL, 0x7e68602dcda79fbdULL, 0x865ad9b88f82a1beULL, 0xa3e78a894f55a8d3ULL, 0xcd207fdb0e2cb364ULL, 0xe89d2ceacefbba09ULL, 0x18f85fc04ab1c60fULL, 0x3d450cf18a66cf62ULL, 0x5382f9a3cb1fd4d5ULL, 0x763faa920bc8ddb8ULL, 0x8e0d130749ede3bbULL, 0xabb04036893aead6ULL, 0xc577b564c843f161ULL, 0xe0cae6550894f80cULL, 0x04e279ced96a0179ULL, 0x215f2aff19bd0814ULL, 0x4f98dfad58c413a3ULL, 0x6a258c9c98131aceULL, 0x92173509da3624cdULL, 0xb7aa66381ae12da0ULL, 0xd96d936a5b983617ULL, 0xfcd0c05b9b4f3f7aULL, 0x0cb5b3711f05437cULL, 0x2908e040dfd24a11ULL, 0x47cf15129eab51a6ULL, 0x627246235e7c58cbULL, 0x9a40ffb61c5966c8ULL, 0xbffdac87dc8e6fa5ULL, 0xd13a59d59df77412ULL, 0xf4870ae45d207d7fULL, 0x144decb155b48573ULL, 0x31f0bf8095638c1eULL, 0x5f374ad2d41a97a9ULL, 0x7a8a19e314cd9ec4ULL, 0x82b8a07656e8a0c7ULL, 0xa705f347963fa9aaULL, 0xc9c20615d746b21dULL, 0xec7f55241791bb70ULL, 0x1c1a260e93dbc776ULL, 0x39a7753f530cce1bULL, 0x5760806d1275d5acULL, 0x72ddd35cd2a2dcc1ULL, 0x8aef6ac99087e2c2ULL, 0xaf5239f85050ebafULL, 0xc195ccaa1129f018ULL, 0xe4289f9bd1fef975ULL, 0x09c4f39db2d402f2ULL, 0x2c79a0ac72030b9fULL, 0x42be55fe337a1028ULL, 0x670306cff3ad1945ULL, 0x9f31bf5ab1882746ULL, 0xba8cec6b715f2e2bULL, 0xd44b19393026359cULL, 0xf1f64a08f0f13cf1ULL, 0x0193392274bb40f7ULL, 0x242e6a13b46c499aULL, 0x4ae99f41f515522dULL, 0x6f54cc7035c25b40ULL, 0x976675e577e76543ULL, 0xb2db26d4b7306c2eULL, 0xdc1cd386f6497799ULL, 0xf9a180b7369e7ef4ULL, 0x196b66e23e0a86f8ULL, 0x3cd635d3fedd8f95ULL, 0x5211c081bfa49422ULL, 0x77ac93b07f739d4fULL, 0x8f9e2a253d56a34cULL, 0xaa237914fd81aa21ULL, 0xc4e48c46bcf8b196ULL, 0xe159df777c2fb8fbULL, 0x113cac5df865c4fdULL, 0x3481ff6c38b2cd90ULL, 0x5a460a3e79cbd627ULL, 0x7ffb590fb91cdf4aULL, 0x87c9e09afb39e149ULL, 0xa274b3ab3beee824ULL, 0xccb346f97a97f393ULL, 0xe90e15c8ba40fafeULL, 0x0d268a536bbe038bULL, 0x289bd962ab690ae6ULL, 0x465c2c30ea101151ULL, 0x63e17f012ac7183cULL, 0x9bd3c69468e2263fULL, 0xbe6e95a5a8352f52ULL, 0xd0a960f7e94c34e5ULL, 0xf51433c6299b3d88ULL, 0x057140ecadd1418eULL, 0x20cc13dd6d0648e3ULL, 0x4e0be68f2c7f5354ULL, 0x6bb6b5beeca85a39ULL, 0x93840c2bae8d643aULL, 0xb6395f1a6e5a6d57ULL, 0xd8feaa482f2376e0ULL, 0xfd43f979eff47f8dULL, 0x1d891f2ce7608781ULL, 0x38344c1d27b78eecULL, 0x56f3b94f66ce955bULL, 0x734eea7ea6199c36ULL, 0x8b7c53ebe43ca235ULL, 0xaec100da24ebab58ULL, 0xc006f5886592b0efULL, 0xe5bba6b9a545b982ULL, 0x15ded593210fc584ULL, 0x306386a2e1d8cce9ULL, 0x5ea473f0a0a1d75eULL, 0x7b1920c16076de33ULL, 0x832b99542253e030ULL, 0xa696ca65e284e95dULL, 0xc8513f37a3fdf2eaULL, 0xedec6c06632afb87ULL, 0x1389e73b65a805e4ULL, 0x3634b40aa57f0c89ULL, 0x58f34158e406173eULL, 0x7d4e126924d11e53ULL, 0x857cabfc66f42050ULL, 0xa0c1f8cda623293dULL, 0xce060d9fe75a328aULL, 0xebbb5eae278d3be7ULL, 0x1bde2d84a3c747e1ULL, 0x3e637eb563104e8cULL, 0x50a48be72269553bULL, 0x7519d8d6e2be5c56ULL, 0x8d2b6143a09b6255ULL, 0xa8963272604c6b38ULL, 0xc651c7202135708fULL, 0xe3ec9411e1e279e2ULL, 0x03267244e97681eeULL, 0x269b217529a18883ULL, 0x485cd42768d89334ULL, 0x6de18716a80f9a59ULL, 0x95d33e83ea2aa45aULL, 0xb06e6db22afdad37ULL, 0xdea998e06b84b680ULL, 0xfb14cbd1ab53bfedULL, 0x0b71b8fb2f19c3ebULL, 0x2eccebcaefceca86ULL, 0x400b1e98aeb7d131ULL, 0x65b64da96e60d85cULL, 0x9d84f43c2c45e65fULL, 0xb839a70dec92ef32ULL, 0xd6fe525fadebf485ULL, 0xf343016e6d3cfde8ULL, 0x176b9ef5bcc2049dULL, 0x32d6cdc47c150df0ULL, 0x5c1138963d6c1647ULL, 0x79ac6ba7fdbb1f2aULL, 0x819ed232bf9e2129ULL, 0xa42381037f492844ULL, 0xcae474513e3033f3ULL, 0xef592760fee73a9eULL, 0x1f3c544a7aad4698ULL, 0x3a81077bba7a4ff5ULL, 0x5446f229fb035442ULL, 0x71fba1183bd45d2fULL, 0x89c9188d79f1632cULL, 0xac744bbcb9266a41ULL, 0xc2b3beeef85f71f6ULL, 0xe70eeddf3888789bULL, 0x07c40b8a301c8097ULL, 0x227958bbf0cb89faULL, 0x4cbeade9b1b2924dULL, 0x6903fed871659b20ULL, 0x9131474d3340a523ULL, 0xb48c147cf397ac4eULL, 0xda4be12eb2eeb7f9ULL, 0xfff6b21f7239be94ULL, 0x0f93c135f673c292ULL, 0x2a2e920436a4cbffULL, 0x44e9675677ddd048ULL, 0x61543467b70ad925ULL, 0x99668df2f52fe726ULL, 0xbcdbdec335f8ee4bULL, 0xd21c2b917481f5fcULL, 0xf7a178a0b456fc91ULL, 0x1a4d14a6d77c0716ULL, 0x3ff0479717ab0e7bULL, 0x5137b2c556d215ccULL, 0x748ae1f496051ca1ULL, 0x8cb85861d42022a2ULL, 0xa9050b5014f72bcfULL, 0xc7c2fe02558e3078ULL, 0xe27fad3395593915ULL, 0x121ade1911134513ULL, 0x37a78d28d1c44c7eULL, 0x5960787a90bd57c9ULL, 0x7cdd2b4b506a5ea4ULL, 0x84ef92de124f60a7ULL, 0xa152c1efd29869caULL, 0xcf9534bd93e1727dULL, 0xea28678c53367b10ULL, 0x0ae281d95ba2831cULL, 0x2f5fd2e89b758a71ULL, 0x419827bada0c91c6ULL, 0x6425748b1adb98abULL, 0x9c17cd1e58fea6a8ULL, 0xb9aa9e2f9829afc5ULL, 0xd76d6b7dd950b472ULL, 0xf2d0384c1987bd1fULL, 0x02b54b669dcdc119ULL, 0x270818575d1ac874ULL, 0x49cfed051c63d3c3ULL, 0x6c72be34dcb4daaeULL, 0x944007a19e91e4adULL, 0xb1fd54905e46edc0ULL, 0xdf3aa1c21f3ff677ULL, 0xfa87f2f3dfe8ff1aULL, 0x1eaf6d680e16066fULL, 0x3b123e59cec10f02ULL, 0x55d5cb0b8fb814b5ULL, 0x7068983a4f6f1dd8ULL, 0x885a21af0d4a23dbULL, 0xade7729ecd9d2ab6ULL, 0xc32087cc8ce43101ULL, 0xe69dd4fd4c33386cULL, 0x16f8a7d7c879446aULL, 0x3345f4e608ae4d07ULL, 0x5d8201b449d756b0ULL, 0x783f528589005fddULL, 0x800deb10cb2561deULL, 0xa5b0b8210bf268b3ULL, 0xcb774d734a8b7304ULL, 0xeeca1e428a5c7a69ULL, 0x0e00f81782c88265ULL, 0x2bbdab26421f8b08ULL, 0x457a5e74036690bfULL, 0x60c70d45c3b199d2ULL, 0x98f5b4d08194a7d1ULL, 0xbd48e7e14143aebcULL, 0xd38f12b3003ab50bULL, 0xf6324182c0edbc66ULL, 0x065732a844a7c060ULL, 0x23ea61998470c90dULL, 0x4d2d94cbc509d2baULL, 0x6890c7fa05dedbd7ULL, 0x90a27e6f47fbe5d4ULL, 0xb51f2d5e872cecb9ULL, 0xdbd8d80cc655f70eULL, 0xfe658b3d0682fe63ULL }; unsigned long long U[256] = { 0x0000000000000000ULL, 0x067b86c43d6a6cb0ULL, 0x0cf70d887ad4d960ULL, 0x0a8c8b4c47beb5d0ULL, 0x19ee1b10f5a9b2c0ULL, 0x1f959dd4c8c3de70ULL, 0x151916988f7d6ba0ULL, 0x1362905cb2170710ULL, 0x166165102b846cedULL, 0x101ae3d416ee005dULL, 0x1a9668985150b58dULL, 0x1cedee5c6c3ad93dULL, 0x0f8f7e00de2dde2dULL, 0x09f4f8c4e347b29dULL, 0x03787388a4f9074dULL, 0x0503f54c99936bfdULL, 0x097f991197dfd0b7ULL, 0x0f041fd5aab5bc07ULL, 0x05889499ed0b09d7ULL, 0x03f3125dd0616567ULL, 0x1091820162766277ULL, 0x16ea04c55f1c0ec7ULL, 0x1c668f8918a2bb17ULL, 0x1a1d094d25c8d7a7ULL, 0x1f1efc01bc5bbc5aULL, 0x19657ac58131d0eaULL, 0x13e9f189c68f653aULL, 0x1592774dfbe5098aULL, 0x06f0e71149f20e9aULL, 0x008b61d57498622aULL, 0x0a07ea993326d7faULL, 0x0c7c6c5d0e4cbb4aULL, 0x12ff32232fbfa16eULL, 0x1484b4e712d5cddeULL, 0x1e083fab556b780eULL, 0x1873b96f680114beULL, 0x0b112933da1613aeULL, 0x0d6aaff7e77c7f1eULL, 0x07e624bba0c2caceULL, 0x019da27f9da8a67eULL, 0x049e5733043bcd83ULL, 0x02e5d1f73951a133ULL, 0x08695abb7eef14e3ULL, 0x0e12dc7f43857853ULL, 0x1d704c23f1927f43ULL, 0x1b0bcae7ccf813f3ULL, 0x118741ab8b46a623ULL, 0x17fcc76fb62cca93ULL, 0x1b80ab32b86071d9ULL, 0x1dfb2df6850a1d69ULL, 0x1777a6bac2b4a8b9ULL, 0x110c207effdec409ULL, 0x026eb0224dc9c319ULL, 0x041536e670a3afa9ULL, 0x0e99bdaa371d1a79ULL, 0x08e23b6e0a7776c9ULL, 0x0de1ce2293e41d34ULL, 0x0b9a48e6ae8e7184ULL, 0x0116c3aae930c454ULL, 0x076d456ed45aa8e4ULL, 0x140fd532664daff4ULL, 0x127453f65b27c344ULL, 0x18f8d8ba1c997694ULL, 0x1e835e7e21f31a24ULL, 0x004337779fa84bb1ULL, 0x0638b1b3a2c22701ULL, 0x0cb43affe57c92d1ULL, 0x0acfbc3bd816fe61ULL, 0x19ad2c676a01f971ULL, 0x1fd6aaa3576b95c1ULL, 0x155a21ef10d52011ULL, 0x1321a72b2dbf4ca1ULL, 0x16225267b42c275cULL, 0x1059d4a389464becULL, 0x1ad55fefcef8fe3cULL, 0x1caed92bf392928cULL, 0x0fcc49774185959cULL, 0x09b7cfb37ceff92cULL, 0x033b44ff3b514cfcULL, 0x0540c23b063b204cULL, 0x093cae6608779b06ULL, 0x0f4728a2351df7b6ULL, 0x05cba3ee72a34266ULL, 0x03b0252a4fc92ed6ULL, 0x10d2b576fdde29c6ULL, 0x16a933b2c0b44576ULL, 0x1c25b8fe870af0a6ULL, 0x1a5e3e3aba609c16ULL, 0x1f5dcb7623f3f7ebULL, 0x19264db21e999b5bULL, 0x13aac6fe59272e8bULL, 0x15d1403a644d423bULL, 0x06b3d066d65a452bULL, 0x00c856a2eb30299bULL, 0x0a44ddeeac8e9c4bULL, 0x0c3f5b2a91e4f0fbULL, 0x12bc0554b017eadfULL, 0x14c783908d7d866fULL, 0x1e4b08dccac333bfULL, 0x18308e18f7a95f0fULL, 0x0b521e4445be581fULL, 0x0d29988078d434afULL, 0x07a513cc3f6a817fULL, 0x01de95080200edcfULL, 0x04dd60449b938632ULL, 0x02a6e680a6f9ea82ULL, 0x082a6dcce1475f52ULL, 0x0e51eb08dc2d33e2ULL, 0x1d337b546e3a34f2ULL, 0x1b48fd9053505842ULL, 0x11c476dc14eeed92ULL, 0x17bff01829848122ULL, 0x1bc39c4527c83a68ULL, 0x1db81a811aa256d8ULL, 0x173491cd5d1ce308ULL, 0x114f170960768fb8ULL, 0x022d8755d26188a8ULL, 0x04560191ef0be418ULL, 0x0eda8adda8b551c8ULL, 0x08a10c1995df3d78ULL, 0x0da2f9550c4c5685ULL, 0x0bd97f9131263a35ULL, 0x0155f4dd76988fe5ULL, 0x072e72194bf2e355ULL, 0x144ce245f9e5e445ULL, 0x12376481c48f88f5ULL, 0x18bbefcd83313d25ULL, 0x1ec06909be5b5195ULL, 0x00866eef3f509762ULL, 0x06fde82b023afbd2ULL, 0x0c71636745844e02ULL, 0x0a0ae5a378ee22b2ULL, 0x196875ffcaf925a2ULL, 0x1f13f33bf7934912ULL, 0x159f7877b02dfcc2ULL, 0x13e4feb38d479072ULL, 0x16e70bff14d4fb8fULL, 0x109c8d3b29be973fULL, 0x1a1006776e0022efULL, 0x1c6b80b3536a4e5fULL, 0x0f0910efe17d494fULL, 0x0972962bdc1725ffULL, 0x03fe1d679ba9902fULL, 0x05859ba3a6c3fc9fULL, 0x09f9f7fea88f47d5ULL, 0x0f82713a95e52b65ULL, 0x050efa76d25b9eb5ULL, 0x03757cb2ef31f205ULL, 0x1017ecee5d26f515ULL, 0x166c6a2a604c99a5ULL, 0x1ce0e16627f22c75ULL, 0x1a9b67a21a9840c5ULL, 0x1f9892ee830b2b38ULL, 0x19e3142abe614788ULL, 0x136f9f66f9dff258ULL, 0x151419a2c4b59ee8ULL, 0x067689fe76a299f8ULL, 0x000d0f3a4bc8f548ULL, 0x0a8184760c764098ULL, 0x0cfa02b2311c2c28ULL, 0x12795ccc10ef360cULL, 0x1402da082d855abcULL, 0x1e8e51446a3bef6cULL, 0x18f5d780575183dcULL, 0x0b9747dce54684ccULL, 0x0decc118d82ce87cULL, 0x07604a549f925dacULL, 0x011bcc90a2f8311cULL, 0x041839dc3b6b5ae1ULL, 0x0263bf1806013651ULL, 0x08ef345441bf8381ULL, 0x0e94b2907cd5ef31ULL, 0x1df622cccec2e821ULL, 0x1b8da408f3a88491ULL, 0x11012f44b4163141ULL, 0x177aa980897c5df1ULL, 0x1b06c5dd8730e6bbULL, 0x1d7d4319ba5a8a0bULL, 0x17f1c855fde43fdbULL, 0x118a4e91c08e536bULL, 0x02e8decd7299547bULL, 0x049358094ff338cbULL, 0x0e1fd345084d8d1bULL, 0x086455813527e1abULL, 0x0d67a0cdacb48a56ULL, 0x0b1c260991dee6e6ULL, 0x0190ad45d6605336ULL, 0x07eb2b81eb0a3f86ULL, 0x1489bbdd591d3896ULL, 0x12f23d1964775426ULL, 0x187eb65523c9e1f6ULL, 0x1e0530911ea38d46ULL, 0x00c55998a0f8dcd3ULL, 0x06bedf5c9d92b063ULL, 0x0c325410da2c05b3ULL, 0x0a49d2d4e7466903ULL, 0x192b428855516e13ULL, 0x1f50c44c683b02a3ULL, 0x15dc4f002f85b773ULL, 0x13a7c9c412efdbc3ULL, 0x16a43c888b7cb03eULL, 0x10dfba4cb616dc8eULL, 0x1a533100f1a8695eULL, 0x1c28b7c4ccc205eeULL, 0x0f4a27987ed502feULL, 0x0931a15c43bf6e4eULL, 0x03bd2a100401db9eULL, 0x05c6acd4396bb72eULL, 0x09bac08937270c64ULL, 0x0fc1464d0a4d60d4ULL, 0x054dcd014df3d504ULL, 0x03364bc57099b9b4ULL, 0x1054db99c28ebea4ULL, 0x162f5d5dffe4d214ULL, 0x1ca3d611b85a67c4ULL, 0x1ad850d585300b74ULL, 0x1fdba5991ca36089ULL, 0x19a0235d21c90c39ULL, 0x132ca8116677b9e9ULL, 0x15572ed55b1dd559ULL, 0x0635be89e90ad249ULL, 0x004e384dd460bef9ULL, 0x0ac2b30193de0b29ULL, 0x0cb935c5aeb46799ULL, 0x123a6bbb8f477dbdULL, 0x1441ed7fb22d110dULL, 0x1ecd6633f593a4ddULL, 0x18b6e0f7c8f9c86dULL, 0x0bd470ab7aeecf7dULL, 0x0daff66f4784a3cdULL, 0x07237d23003a161dULL, 0x0158fbe73d507aadULL, 0x045b0eaba4c31150ULL, 0x0220886f99a97de0ULL, 0x08ac0323de17c830ULL, 0x0ed785e7e37da480ULL, 0x1db515bb516aa390ULL, 0x1bce937f6c00cf20ULL, 0x114218332bbe7af0ULL, 0x17399ef716d41640ULL, 0x1b45f2aa1898ad0aULL, 0x1d3e746e25f2c1baULL, 0x17b2ff22624c746aULL, 0x11c979e65f2618daULL, 0x02abe9baed311fcaULL, 0x04d06f7ed05b737aULL, 0x0e5ce43297e5c6aaULL, 0x082762f6aa8faa1aULL, 0x0d2497ba331cc1e7ULL, 0x0b5f117e0e76ad57ULL, 0x01d39a3249c81887ULL, 0x07a81cf674a27437ULL, 0x14ca8caac6b57327ULL, 0x12b10a6efbdf1f97ULL, 0x183d8122bc61aa47ULL, 0x1e4607e6810bc6f7ULL }; ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: RFC: New diff-delta.c implementation 2006-04-23 2:31 ` Geert Bosch @ 2006-04-24 19:10 ` Davide Libenzi 2006-04-24 19:23 ` Geert Bosch 0 siblings, 1 reply; 32+ messages in thread From: Davide Libenzi @ 2006-04-24 19:10 UTC (permalink / raw) To: Geert Bosch; +Cc: git [-- Attachment #1: Type: TEXT/PLAIN, Size: 1664 bytes --] On Sat, 22 Apr 2006, Geert Bosch wrote: > On Sat, Apr 22, 2006 at 01:36:07PM -0700, Davide Libenzi wrote: >> Geert, I saw you're using a shift of 55 bits, that gives an degree of the >> root polynomial of 63, that is not prime. Where did you get the root >> polynomial, and why you did not chose 61 as degree of the root? >> Just curious ... > > The polynomial was randomly created using code by David Mazieres, that > is part of LBFS. I chose a (irreducible) polynomial of degree 63 as > that was the same as LBFS did. As for my purposes it's best to have > a constant polynomial and I wanted to have all the code for > the computations in the same compilation unit for performance, > I decided to just have a little program print out the tables > and include it directly. The chosen polynomial was 0xb15e234bd3792f63. > > Later on I haven't revisited this decision, although I agree that > it'd probably be a good idea to use a polynomial of prime degree, > even though we're not looking for cryptographically strong hashes here. Right, but you are looking at highest equal-probability distribution over your hash buckets ;) Anyway, thanks for bringing Rabin's polynomial fingerprint up from the forgotten lands. Performance and delta size are quite amazing, and I decided to add Rabin's delta to libxdiff. I hacked some code (attached) to generate T/U tables. Since libxdiff must be portable everywhere, even on system w/out 64 bits support, I use xrabin to create both 64 bits tables (poly degree 61) and 32 bits tables (poly degree 31), and store them in a .c file letting the build environment to pick the correct one for the platform. - Davide [-- Attachment #2: Type: TEXT/x-csrc, Size: 7281 bytes --] /* * xrabin by Davide Libenzi (Rabin's polynomial generator) * Copyright (C) 2006 Davide Libenzi * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation; either version 2 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA * * Davide Libenzi <davidel@xmailserver.org> * * * Hints, ideas and code for the implementation came from: * * Rabin's original paper: http://www.xmailserver.org/rabin.pdf * Chan & Lu's paper: http://www.xmailserver.org/rabin_impl.pdf * Broder's paper: http://www.xmailserver.org/rabin_apps.pdf * LBFS source code: http://www.fs.net/sfswww/lbfs/ * Geert Bosch's post: http://marc.theaimsgroup.com/?l=git&m=114565424620771&w=2 * */ #include <stdio.h> #include <stdlib.h> #include <string.h> #include <time.h> #if defined(WIN32) #define strtoll _strtoi64 #endif #if !defined(XRAB_WORD_TYPE) #if defined(WIN32) #define XRAB_WORD_TYPE __int64 #else /* #if defined(WIN32) */ #define XRAB_WORD_TYPE long long #endif /* #if defined(WIN32) */ #endif /* #if !defined(XRAB_WORD_TYPE) */ #if !defined(XRAB_WORD_PFMT) #if defined(WIN32) #define XRAB_WORD_PFMT "0x%I64x" #else /* #if defined(WIN32) */ #define XRAB_WORD_PFMT "0x%llx" #endif /* #if defined(WIN32) */ #endif /* #if !defined(XRAB_WORD_PFMT) */ #define XPLYW_BITS (sizeof(xply_word) * 8) #define XPLYW_MSB ((xply_word) 1 << (sizeof(xply_word) * 8 - 1)) typedef unsigned XRAB_WORD_TYPE xply_word; static int xrab_isprime(int n) { if (n > 3) { if (n & 1) { int i, hn = n / 2; for (i = 3; i < hn; i += 2) if (!(n % i)) return 0; } else return 0; } return 1; } static int xrab_fls(xply_word v) { int r, s; xply_word mask = ~(((xply_word) 1 << (XPLYW_BITS / 2)) - 1); if (v == 0) return 0; for (r = XPLYW_BITS, s = r / 2; s != 0;) { if ((v & mask) == 0) { v <<= s; r -= s; } s /= 2; mask <<= s; } return r; } static xply_word xrab_polymod(xply_word nh, xply_word nl, xply_word d) { int i, k = xrab_fls(d) - 1; d <<= (XPLYW_BITS - 1) - k; if (nh) { if (nh & XPLYW_MSB) nh ^= d; for (i = XPLYW_BITS - 2; i >= 0; i--) if (nh & ((xply_word) 1) << i) { nh ^= d >> (XPLYW_BITS - 1) - i; nl ^= d << i + 1; } } for (i = XPLYW_BITS - 1; i >= k; i--) if (nl & ((xply_word) 1 << i)) nl ^= d >> (XPLYW_BITS - 1) - i; return nl; } static xply_word xrab_polygcd(xply_word x, xply_word y) { for (;;) { if (!y) return x; x = xrab_polymod(0, x, y); if (!x) return y; y = xrab_polymod(0, y, x); } } static void xrab_polymult(xply_word *php, xply_word *plp, xply_word x, xply_word y) { int i; xply_word ph = 0, pl = 0; if (x & 1) pl = y; for (i = 1; i < XPLYW_BITS; i++) if (x & (((xply_word) 1) << i)) { ph ^= y >> (XPLYW_BITS - i); pl ^= y << i; } if (php) *php = ph; if (plp) *plp = pl; } static xply_word xrab_polymmult(xply_word x, xply_word y, xply_word d) { xply_word h, l; xrab_polymult(&h, &l, x, y); return xrab_polymod(h, l, d); } static int xrab_polyirreducible(xply_word f) { xply_word u = 2; int i, m = (xrab_fls(f) - 1) >> 1; for (i = 0; i < m; i++) { u = xrab_polymmult(u, u, f); if (xrab_polygcd(f, u ^ 2) != 1) return 0; } return 1; } static void xrab_rndgen(xply_word *f) { unsigned int i; xply_word g; for (i = 0, g = 0; i < sizeof(xply_word); i++) g ^= (g << 11) + (unsigned int) rand() + (g >> 7); *f = g; } static int xrab_polygen(int degree, xply_word *ply) { xply_word msb, mask, f; if (degree <= 0 || degree >= XPLYW_BITS) return -1; msb = ((xply_word) 1) << degree; mask = msb - 1; srand(time(NULL)); do { xrab_rndgen(&f); f = (f & mask) | msb; } while (!xrab_polyirreducible(f)); *ply = f; return 0; } static int xarb_calc_tu(xply_word poly, int size, xply_word *t, xply_word *u) { int j, xshift, shift; xply_word t1, ssh; xshift = xrab_fls(poly) - 1; shift = xshift - 8; if (shift < 0) return -1; t1 = xrab_polymod(0, ((xply_word) 1) << xshift, poly); for (j = 0; j < 256; j++) t[j] = xrab_polymmult(j, t1, poly) | ((xply_word) j << xshift); for (j = 1, ssh = 1; j < size; j++) ssh = (ssh << 8) ^ t[ssh >> shift]; for (j = 0; j < 256; j++) u[j] = xrab_polymmult(j, ssh, poly); return 0; } int main(int ac, char **av) { int i, size = 24, degree = 0, shift; xply_word ply = 0, t[256], u[256]; for (i = 1; i < ac; i++) { if (strcmp(av[i], "-s") == 0) { if (++i < ac) size = atol(av[i]); } else if (strcmp(av[i], "-p") == 0) { if (++i < ac) ply = (xply_word) strtoll(av[i], NULL, 16); } else if (strcmp(av[i], "-d") == 0) { if (++i < ac) degree = atol(av[i]); } } if (degree && (degree < 8 || degree >= XPLYW_BITS)) { fprintf(stderr, "degree (%d) out of bound for the poly word size (8..%u)\n", degree, XPLYW_BITS); return 1; } if (degree == 0) for (degree = XPLYW_BITS - 1; !xrab_isprime(degree); degree--); if (ply == 0 && xrab_polygen(degree, &ply) < 0) return 2; shift = (xrab_fls(ply) - 1) - 8; fprintf(stderr, "found poly = " XRAB_WORD_PFMT " (shift %d)\n", ply, shift); if (xarb_calc_tu(ply, size, t, u) < 0) return 3; fprintf(stdout, "#if defined(XRABPLY_TYPE%d)\n\n", XPLYW_BITS); fprintf(stdout, "#if !defined(XV%d)\n", XPLYW_BITS); fprintf(stdout, "#define XV%d(v) ((xply_word) v ## ULL)\n", XPLYW_BITS); fprintf(stdout, "#endif\n\n"); fprintf(stdout, "#define XRAB_ROOTPOLY XV%d(" XRAB_WORD_PFMT ")\n\n", XPLYW_BITS, ply); fprintf(stdout, "#define XRAB_SHIFT %d\n", shift); fprintf(stdout, "#define XRAB_WNDSIZE %d\n\n", size); fprintf(stdout, "typedef unsigned XRABPLY_TYPE%d xply_word;\n\n", XPLYW_BITS); fprintf(stdout, "static const xply_word T[256] = {\n"); for (i = 0; i < 256; i++) { if (i) { fputs(",", stdout); if (i % 4 == 0) fputs("\n\t", stdout); else fputs(" ", stdout); } else fputs("\t", stdout); fprintf(stdout, "XV%d(" XRAB_WORD_PFMT ")", XPLYW_BITS, t[i]); } fprintf(stdout, "\n};\n\n"); fprintf(stdout, "static const xply_word U[256] = {\n"); for (i = 0; i < 256; i++) { if (i) { fputs(",", stdout); if (i % 4 == 0) fputs("\n\t", stdout); else fputs(" ", stdout); } else fputs("\t", stdout); fprintf(stdout, "XV%d(" XRAB_WORD_PFMT ")", XPLYW_BITS, u[i]); } fprintf(stdout, "\n};\n\n"); fprintf(stdout, "#endif /* if defined(XRABPLY_TYPE%d) */\n\n", XPLYW_BITS); return 0; } ^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: RFC: New diff-delta.c implementation 2006-04-24 19:10 ` Davide Libenzi @ 2006-04-24 19:23 ` Geert Bosch 0 siblings, 0 replies; 32+ messages in thread From: Geert Bosch @ 2006-04-24 19:23 UTC (permalink / raw) To: Davide Libenzi; +Cc: git On Apr 24, 2006, at 15:10, Davide Libenzi wrote: > Right, but you are looking at highest equal-probability > distribution over your hash buckets ;) > Anyway, thanks for bringing Rabin's polynomial fingerprint up from > the forgotten lands. Performance and delta size are quite amazing, > and I decided to add Rabin's delta to libxdiff. > I hacked some code (attached) to generate T/U tables. Since > libxdiff must be portable everywhere, even on system w/out 64 bits > support, I use xrabin to create both 64 bits tables (poly degree > 61) and 32 bits tables (poly degree 31), and store them in a .c > file letting the build environment to pick the correct one for the > platform. It might actually make sense to use the 32-bit code for GIT as well, since it turns out that on the typical small source files with few differences, the full 64-bit Rabin is a problem for performance. When diffing large files (my main interest), this is more than offset by the better hash quality. For tiny files with few changes it appears to be overkill... -Geert ^ permalink raw reply [flat|nested] 32+ messages in thread
end of thread, other threads:[~2006-04-25 18:22 UTC | newest] Thread overview: 32+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2006-04-21 21:16 RFC: New diff-delta.c implementation Geert Bosch 2006-04-22 3:19 ` Nicolas Pitre 2006-04-22 11:04 ` Geert Bosch 2006-04-22 11:13 ` Junio C Hamano 2006-04-22 12:35 ` Geert Bosch 2006-04-22 12:51 ` Nicolas Pitre 2006-04-22 13:39 ` Geert Bosch 2006-04-22 17:03 ` Junio C Hamano 2006-04-22 17:28 ` Geert Bosch 2006-04-22 17:57 ` Junio C Hamano 2006-04-22 12:45 ` Nicolas Pitre 2006-04-22 14:17 ` Geert Bosch 2006-04-22 17:29 ` Junio C Hamano 2006-04-22 19:58 ` Nicolas Pitre 2006-04-22 5:21 ` Davide Libenzi 2006-04-22 9:12 ` Geert Bosch 2006-04-22 12:36 ` Rene Scharfe 2006-04-24 2:57 ` Geert Bosch 2006-04-24 5:27 ` Nicolas Pitre 2006-04-24 15:19 ` Geert Bosch 2006-04-24 15:57 ` Nicolas Pitre 2006-04-24 16:31 ` Geert Bosch 2006-04-24 18:24 ` Geert Bosch 2006-04-24 18:27 ` Geert Bosch 2006-04-24 19:21 ` Rutger Nijlunsing 2006-04-24 20:37 ` Petr Baudis 2006-04-24 18:44 ` Geert Bosch 2006-04-25 18:22 ` Rene Scharfe 2006-04-22 20:36 ` Davide Libenzi 2006-04-23 2:31 ` Geert Bosch 2006-04-24 19:10 ` Davide Libenzi 2006-04-24 19:23 ` Geert Bosch
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).