Git development
* RFC: New diff-delta.c implementation
From: Geert Bosch @ 2006-04-21 21:16 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 1056 bytes --]

I wrote a new binary differencing algorithm that is both faster
and generates smaller deltas than the current implementation.
The format is compatible with that used by patch-delta, so
it should be easy to integrate.
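
For readers unfamiliar with that format: the attached write_header stores the
source and target sizes 7 bits per byte, low-order group first, with the high
bit marking continuation. A minimal sketch of that encoding and its inverse
follows. The names put_size and get_size are hypothetical (they come neither
from git nor from the attachment), and this sketch continues at 0x80 where the
attachment continues at 0x7f; both produce streams the same decoder can read.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: append `size` as a 7-bit-per-byte varint,
   low-order group first. Returns a pointer past the last byte written. */
static unsigned char *put_size(unsigned char *out, unsigned long size)
{
    while (size >= 0x80) {
        *out++ = (unsigned char)((size & 0x7f) | 0x80); /* more bytes follow */
        size >>= 7;
    }
    *out++ = (unsigned char)size;                       /* final byte, high bit clear */
    return out;
}

/* Hypothetical helper: decode one varint and advance *in past it.
   Assumes the input is well formed (no bounds checking). */
static unsigned long get_size(const unsigned char **in)
{
    unsigned long size = 0;
    unsigned shift = 0;
    unsigned char c;

    do {
        c = *(*in)++;
        size |= (unsigned long)(c & 0x7f) << shift;
        shift += 7;
    } while (c & 0x80);
    return size;
}
```

Encoding 300 this way should give the two bytes 0xac 0x02, and decoding them
returns 300 again.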

Originally, I wrote this for the GDIFF format, see
http://www.w3.org/TR/NOTE-gdiff-19970901.
The adaptation for GIT format was relatively simple, but is not
thoroughly tested.
The code is not derived from libxdiff, but uses the rabin_slide
function written by David Mazieres (dm@uun.org). Also the tables
are generated using his code.
Finally, this was developed on Darwin, and not a Linux system, so
some changes may be needed.
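
What makes a sliding-window fingerprint attractive here is that it can be
updated in constant time when the window advances by one byte. The sketch
below shows only that property, using a plain multiplicative rolling hash as a
simpler stand-in. It is not the Rabin polynomial, and it is not the
rabin_slide function used in the attachment. WINDOW, BASE and the function
names are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WINDOW 22     /* same width as RABIN_WINDOW_SIZE in the attachment */
#define BASE   257u   /* arbitrary odd multiplier (assumption) */

/* Hash of the WINDOW bytes starting at p, computed from scratch (mod 2^32). */
static uint32_t hash_window(const unsigned char *p)
{
    uint32_t h = 0;
    size_t i;

    for (i = 0; i < WINDOW; i++)
        h = h * BASE + p[i];
    return h;
}

/* BASE^(WINDOW-1) mod 2^32: the weight of the byte about to leave the window.
   Compute it once and pass it to hash_slide. */
static uint32_t top_weight(void)
{
    uint32_t w = 1;
    size_t i;

    for (i = 1; i < WINDOW; i++)
        w *= BASE;
    return w;
}

/* Slide the window by one byte in O(1): drop `out`, append `in`. */
static uint32_t hash_slide(uint32_t h, uint32_t top,
                           unsigned char out, unsigned char in)
{
    h -= top * out;          /* remove the oldest byte's contribution */
    return h * BASE + in;    /* shift the remaining bytes, add the new one */
}
```

Sliding across a buffer with hash_slide gives the same value at every position
as recomputing with hash_window, which is the behaviour the matcher relies on.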

Initial testing seems quite positive. Take for example git-1.2.5.tar
vs git-1.2.6.tar on my PowerBook (both with -O2 -DNDEBUG):

current: 2.281s, patch size 36563
new    : 0.109s, patch size 16199

Please feel free to play around with this code, and give feedback.
Keep in mind this wasn't originally written for GIT, and C is not
my native language, so don't mind my formatting etc.

   -Geert


[-- Attachment #2: diff-delta.c --]
[-- Type: application/octet-stream, Size: 30518 bytes --]

#include <unistd.h>
#include <stdlib.h>
#include <assert.h>
#include <string.h>
#include <strings.h> /* bzero */
#include <sys/types.h>

/* MIN_HTAB_SIZE is a fixed amount added to the size of the hash table
   used for indexing, and must be a power of two. This allows small files
   to have a sparse hash table, since in that case it's cheap.
   Hash table sizes are rounded up to a power of two to avoid integer division.
*/
#define MIN_HTAB_SIZE 8192
#define MAX_HTAB_SIZE (1024*1024*1024)

/* Diffing files of gigabyte range is impractical with the current
   algorithm, so we're assuming 32-bit sizes everywhere.
   Size leaves some room for expansion when diffing random files.  */
#define MAX_SIZE (0x7eff0000)
/* Initial size of copies table, dynamically extended as needed. */
#define MAX_COPIES 4096

/* Matching is done using a sliding window for which a Rabin
   polynomial is computed. The advantage of such polynomials is
   that they can efficiently be updated at every position.
   The tables needed for this are precomputed, as it is desirable
   to use the same polynomial all the time for repeatable results.
*/

#define RABIN_WINDOW_SIZE 22
#define RABIN_SHIFT 55

static unsigned long long T[256] =
{ 0x0000000000000000ULL, 0xb15e234bd3792f63ULL, 0x62bc4697a6f25ec6ULL,
  0xd3e265dc758b71a5ULL, 0x7426ae649e9d92efULL, 0xc5788d2f4de4bd8cULL,
  0x169ae8f3386fcc29ULL, 0xa7c4cbb8eb16e34aULL, 0x59137f82ee420abdULL,
  0xe84d5cc93d3b25deULL, 0x3baf391548b0547bULL, 0x8af11a5e9bc97b18ULL,
  0x2d35d1e670df9852ULL, 0x9c6bf2ada3a6b731ULL, 0x4f899771d62dc694ULL,
  0xfed7b43a0554e9f7ULL, 0x0378dc4e0ffd3a19ULL, 0xb226ff05dc84157aULL,
  0x61c49ad9a90f64dfULL, 0xd09ab9927a764bbcULL, 0x775e722a9160a8f6ULL,
  0xc600516142198795ULL, 0x15e234bd3792f630ULL, 0xa4bc17f6e4ebd953ULL,
  0x5a6ba3cce1bf30a4ULL, 0xeb35808732c61fc7ULL, 0x38d7e55b474d6e62ULL,
  0x8989c61094344101ULL, 0x2e4d0da87f22a24bULL, 0x9f132ee3ac5b8d28ULL,
  0x4cf14b3fd9d0fc8dULL, 0xfdaf68740aa9d3eeULL, 0x06f1b89c1ffa7432ULL,
  0xb7af9bd7cc835b51ULL, 0x644dfe0bb9082af4ULL, 0xd513dd406a710597ULL,
  0x72d716f88167e6ddULL, 0xc38935b3521ec9beULL, 0x106b506f2795b81bULL,
  0xa1357324f4ec9778ULL, 0x5fe2c71ef1b87e8fULL, 0xeebce45522c151ecULL,
  0x3d5e8189574a2049ULL, 0x8c00a2c284330f2aULL, 0x2bc4697a6f25ec60ULL,
  0x9a9a4a31bc5cc303ULL, 0x49782fedc9d7b2a6ULL, 0xf8260ca61aae9dc5ULL,
  0x058964d210074e2bULL, 0xb4d74799c37e6148ULL, 0x67352245b6f510edULL,
  0xd66b010e658c3f8eULL, 0x71afcab68e9adcc4ULL, 0xc0f1e9fd5de3f3a7ULL,
  0x13138c2128688202ULL, 0xa24daf6afb11ad61ULL, 0x5c9a1b50fe454496ULL,
  0xedc4381b2d3c6bf5ULL, 0x3e265dc758b71a50ULL, 0x8f787e8c8bce3533ULL,
  0x28bcb53460d8d679ULL, 0x99e2967fb3a1f91aULL, 0x4a00f3a3c62a88bfULL,
  0xfb5ed0e81553a7dcULL, 0x0de371383ff4e864ULL, 0xbcbd5273ec8dc707ULL,
  0x6f5f37af9906b6a2ULL, 0xde0114e44a7f99c1ULL, 0x79c5df5ca1697a8bULL,
  0xc89bfc17721055e8ULL, 0x1b7999cb079b244dULL, 0xaa27ba80d4e20b2eULL,
  0x54f00ebad1b6e2d9ULL, 0xe5ae2df102cfcdbaULL, 0x364c482d7744bc1fULL,
  0x87126b66a43d937cULL, 0x20d6a0de4f2b7036ULL, 0x918883959c525f55ULL,
  0x426ae649e9d92ef0ULL, 0xf334c5023aa00193ULL, 0x0e9bad763009d27dULL,
  0xbfc58e3de370fd1eULL, 0x6c27ebe196fb8cbbULL, 0xdd79c8aa4582a3d8ULL,
  0x7abd0312ae944092ULL, 0xcbe320597ded6ff1ULL, 0x1801458508661e54ULL,
  0xa95f66cedb1f3137ULL, 0x5788d2f4de4bd8c0ULL, 0xe6d6f1bf0d32f7a3ULL,
  0x3534946378b98606ULL, 0x846ab728abc0a965ULL, 0x23ae7c9040d64a2fULL,
  0x92f05fdb93af654cULL, 0x41123a07e62414e9ULL, 0xf04c194c355d3b8aULL,
  0x0b12c9a4200e9c56ULL, 0xba4ceaeff377b335ULL, 0x69ae8f3386fcc290ULL,
  0xd8f0ac785585edf3ULL, 0x7f3467c0be930eb9ULL, 0xce6a448b6dea21daULL,
  0x1d8821571861507fULL, 0xacd6021ccb187f1cULL, 0x5201b626ce4c96ebULL,
  0xe35f956d1d35b988ULL, 0x30bdf0b168bec82dULL, 0x81e3d3fabbc7e74eULL,
  0x2627184250d10404ULL, 0x97793b0983a82b67ULL, 0x449b5ed5f6235ac2ULL,
  0xf5c57d9e255a75a1ULL, 0x086a15ea2ff3a64fULL, 0xb93436a1fc8a892cULL,
  0x6ad6537d8901f889ULL, 0xdb8870365a78d7eaULL, 0x7c4cbb8eb16e34a0ULL,
  0xcd1298c562171bc3ULL, 0x1ef0fd19179c6a66ULL, 0xafaede52c4e54505ULL,
  0x51796a68c1b1acf2ULL, 0xe027492312c88391ULL, 0x33c52cff6743f234ULL,
  0x829b0fb4b43add57ULL, 0x255fc40c5f2c3e1dULL, 0x9401e7478c55117eULL,
  0x47e3829bf9de60dbULL, 0xf6bda1d02aa74fb8ULL, 0x1bc6e2707fe9d0c8ULL,
  0xaa98c13bac90ffabULL, 0x797aa4e7d91b8e0eULL, 0xc82487ac0a62a16dULL,
  0x6fe04c14e1744227ULL, 0xdebe6f5f320d6d44ULL, 0x0d5c0a8347861ce1ULL,
  0xbc0229c894ff3382ULL, 0x42d59df291abda75ULL, 0xf38bbeb942d2f516ULL,
  0x2069db65375984b3ULL, 0x9137f82ee420abd0ULL, 0x36f333960f36489aULL,
  0x87ad10dddc4f67f9ULL, 0x544f7501a9c4165cULL, 0xe511564a7abd393fULL,
  0x18be3e3e7014ead1ULL, 0xa9e01d75a36dc5b2ULL, 0x7a0278a9d6e6b417ULL,
  0xcb5c5be2059f9b74ULL, 0x6c98905aee89783eULL, 0xddc6b3113df0575dULL,
  0x0e24d6cd487b26f8ULL, 0xbf7af5869b02099bULL, 0x41ad41bc9e56e06cULL,
  0xf0f362f74d2fcf0fULL, 0x2311072b38a4beaaULL, 0x924f2460ebdd91c9ULL,
  0x358befd800cb7283ULL, 0x84d5cc93d3b25de0ULL, 0x5737a94fa6392c45ULL,
  0xe6698a0475400326ULL, 0x1d375aec6013a4faULL, 0xac6979a7b36a8b99ULL,
  0x7f8b1c7bc6e1fa3cULL, 0xced53f301598d55fULL, 0x6911f488fe8e3615ULL,
  0xd84fd7c32df71976ULL, 0x0badb21f587c68d3ULL, 0xbaf391548b0547b0ULL,
  0x4424256e8e51ae47ULL, 0xf57a06255d288124ULL, 0x269863f928a3f081ULL,
  0x97c640b2fbdadfe2ULL, 0x30028b0a10cc3ca8ULL, 0x815ca841c3b513cbULL,
  0x52becd9db63e626eULL, 0xe3e0eed665474d0dULL, 0x1e4f86a26fee9ee3ULL,
  0xaf11a5e9bc97b180ULL, 0x7cf3c035c91cc025ULL, 0xcdade37e1a65ef46ULL,
  0x6a6928c6f1730c0cULL, 0xdb370b8d220a236fULL, 0x08d56e51578152caULL,
  0xb98b4d1a84f87da9ULL, 0x475cf92081ac945eULL, 0xf602da6b52d5bb3dULL,
  0x25e0bfb7275eca98ULL, 0x94be9cfcf427e5fbULL, 0x337a57441f3106b1ULL,
  0x8224740fcc4829d2ULL, 0x51c611d3b9c35877ULL, 0xe09832986aba7714ULL,
  0x16259348401d38acULL, 0xa77bb003936417cfULL, 0x7499d5dfe6ef666aULL,
  0xc5c7f69435964909ULL, 0x62033d2cde80aa43ULL, 0xd35d1e670df98520ULL,
  0x00bf7bbb7872f485ULL, 0xb1e158f0ab0bdbe6ULL, 0x4f36eccaae5f3211ULL,
  0xfe68cf817d261d72ULL, 0x2d8aaa5d08ad6cd7ULL, 0x9cd48916dbd443b4ULL,
  0x3b1042ae30c2a0feULL, 0x8a4e61e5e3bb8f9dULL, 0x59ac04399630fe38ULL,
  0xe8f227724549d15bULL, 0x155d4f064fe002b5ULL, 0xa4036c4d9c992dd6ULL,
  0x77e10991e9125c73ULL, 0xc6bf2ada3a6b7310ULL, 0x617be162d17d905aULL,
  0xd025c2290204bf39ULL, 0x03c7a7f5778fce9cULL, 0xb29984bea4f6e1ffULL,
  0x4c4e3084a1a20808ULL, 0xfd1013cf72db276bULL, 0x2ef27613075056ceULL,
  0x9fac5558d42979adULL, 0x38689ee03f3f9ae7ULL, 0x8936bdabec46b584ULL,
  0x5ad4d87799cdc421ULL, 0xeb8afb3c4ab4eb42ULL, 0x10d42bd45fe74c9eULL,
  0xa18a089f8c9e63fdULL, 0x72686d43f9151258ULL, 0xc3364e082a6c3d3bULL,
  0x64f285b0c17ade71ULL, 0xd5aca6fb1203f112ULL, 0x064ec327678880b7ULL,
  0xb710e06cb4f1afd4ULL, 0x49c75456b1a54623ULL, 0xf899771d62dc6940ULL,
  0x2b7b12c1175718e5ULL, 0x9a25318ac42e3786ULL, 0x3de1fa322f38d4ccULL,
  0x8cbfd979fc41fbafULL, 0x5f5dbca589ca8a0aULL, 0xee039fee5ab3a569ULL,
  0x13acf79a501a7687ULL, 0xa2f2d4d1836359e4ULL, 0x7110b10df6e82841ULL,
  0xc04e924625910722ULL, 0x678a59fece87e468ULL, 0xd6d47ab51dfecb0bULL,
  0x05361f696875baaeULL, 0xb4683c22bb0c95cdULL, 0x4abf8818be587c3aULL,
  0xfbe1ab536d215359ULL, 0x2803ce8f18aa22fcULL, 0x995dedc4cbd30d9fULL,
  0x3e99267c20c5eed5ULL, 0x8fc70537f3bcc1b6ULL, 0x5c2560eb8637b013ULL,
  0xed7b43a0554e9f70ULL
};

static unsigned long long U[256] =
{ 0x0000000000000000ULL, 0x079343d61ab9f60eULL, 0x0f2687ac3573ec1cULL,
  0x08b5c47a2fca1a12ULL, 0x1e4d0f586ae7d838ULL, 0x19de4c8e705e2e36ULL,
  0x116b88f45f943424ULL, 0x16f8cb22452dc22aULL, 0x3c9a1eb0d5cfb070ULL,
  0x3b095d66cf76467eULL, 0x33bc991ce0bc5c6cULL, 0x342fdacafa05aa62ULL,
  0x22d711e8bf286848ULL, 0x2544523ea5919e46ULL, 0x2df196448a5b8454ULL,
  0x2a62d59290e2725aULL, 0x79343d61ab9f60e0ULL, 0x7ea77eb7b12696eeULL,
  0x7612bacd9eec8cfcULL, 0x7181f91b84557af2ULL, 0x67793239c178b8d8ULL,
  0x60ea71efdbc14ed6ULL, 0x685fb595f40b54c4ULL, 0x6fccf643eeb2a2caULL,
  0x45ae23d17e50d090ULL, 0x423d600764e9269eULL, 0x4a88a47d4b233c8cULL,
  0x4d1be7ab519aca82ULL, 0x5be32c8914b708a8ULL, 0x5c706f5f0e0efea6ULL,
  0x54c5ab2521c4e4b4ULL, 0x5356e8f33b7d12baULL, 0x433659888447eea3ULL,
  0x44a51a5e9efe18adULL, 0x4c10de24b13402bfULL, 0x4b839df2ab8df4b1ULL,
  0x5d7b56d0eea0369bULL, 0x5ae81506f419c095ULL, 0x525dd17cdbd3da87ULL,
  0x55ce92aac16a2c89ULL, 0x7fac473851885ed3ULL, 0x783f04ee4b31a8ddULL,
  0x708ac09464fbb2cfULL, 0x771983427e4244c1ULL, 0x61e148603b6f86ebULL,
  0x66720bb621d670e5ULL, 0x6ec7cfcc0e1c6af7ULL, 0x69548c1a14a59cf9ULL,
  0x3a0264e92fd88e43ULL, 0x3d91273f3561784dULL, 0x3524e3451aab625fULL,
  0x32b7a09300129451ULL, 0x244f6bb1453f567bULL, 0x23dc28675f86a075ULL,
  0x2b69ec1d704cba67ULL, 0x2cfaafcb6af54c69ULL, 0x06987a59fa173e33ULL,
  0x010b398fe0aec83dULL, 0x09befdf5cf64d22fULL, 0x0e2dbe23d5dd2421ULL,
  0x18d5750190f0e60bULL, 0x1f4636d78a491005ULL, 0x17f3f2ada5830a17ULL,
  0x1060b17bbf3afc19ULL, 0x3732905adbf6f225ULL, 0x30a1d38cc14f042bULL,
  0x381417f6ee851e39ULL, 0x3f875420f43ce837ULL, 0x297f9f02b1112a1dULL,
  0x2eecdcd4aba8dc13ULL, 0x265918ae8462c601ULL, 0x21ca5b789edb300fULL,
  0x0ba88eea0e394255ULL, 0x0c3bcd3c1480b45bULL, 0x048e09463b4aae49ULL,
  0x031d4a9021f35847ULL, 0x15e581b264de9a6dULL, 0x1276c2647e676c63ULL,
  0x1ac3061e51ad7671ULL, 0x1d5045c84b14807fULL, 0x4e06ad3b706992c5ULL,
  0x4995eeed6ad064cbULL, 0x41202a97451a7ed9ULL, 0x46b369415fa388d7ULL,
  0x504ba2631a8e4afdULL, 0x57d8e1b50037bcf3ULL, 0x5f6d25cf2ffda6e1ULL,
  0x58fe6619354450efULL, 0x729cb38ba5a622b5ULL, 0x750ff05dbf1fd4bbULL,
  0x7dba342790d5cea9ULL, 0x7a2977f18a6c38a7ULL, 0x6cd1bcd3cf41fa8dULL,
  0x6b42ff05d5f80c83ULL, 0x63f73b7ffa321691ULL, 0x646478a9e08be09fULL,
  0x7404c9d25fb11c86ULL, 0x73978a044508ea88ULL, 0x7b224e7e6ac2f09aULL,
  0x7cb10da8707b0694ULL, 0x6a49c68a3556c4beULL, 0x6dda855c2fef32b0ULL,
  0x656f4126002528a2ULL, 0x62fc02f01a9cdeacULL, 0x489ed7628a7eacf6ULL,
  0x4f0d94b490c75af8ULL, 0x47b850cebf0d40eaULL, 0x402b1318a5b4b6e4ULL,
  0x56d3d83ae09974ceULL, 0x51409becfa2082c0ULL, 0x59f55f96d5ea98d2ULL,
  0x5e661c40cf536edcULL, 0x0d30f4b3f42e7c66ULL, 0x0aa3b765ee978a68ULL,
  0x0216731fc15d907aULL, 0x058530c9dbe46674ULL, 0x137dfbeb9ec9a45eULL,
  0x14eeb83d84705250ULL, 0x1c5b7c47abba4842ULL, 0x1bc83f91b103be4cULL,
  0x31aaea0321e1cc16ULL, 0x3639a9d53b583a18ULL, 0x3e8c6daf1492200aULL,
  0x391f2e790e2bd604ULL, 0x2fe7e55b4b06142eULL, 0x2874a68d51bfe220ULL,
  0x20c162f77e75f832ULL, 0x2752212164cc0e3cULL, 0x6e6520b5b7ede44aULL,
  0x69f66363ad541244ULL, 0x6143a719829e0856ULL, 0x66d0e4cf9827fe58ULL,
  0x70282feddd0a3c72ULL, 0x77bb6c3bc7b3ca7cULL, 0x7f0ea841e879d06eULL,
  0x789deb97f2c02660ULL, 0x52ff3e056222543aULL, 0x556c7dd3789ba234ULL,
  0x5dd9b9a95751b826ULL, 0x5a4afa7f4de84e28ULL, 0x4cb2315d08c58c02ULL,
  0x4b21728b127c7a0cULL, 0x4394b6f13db6601eULL, 0x4407f527270f9610ULL,
  0x17511dd41c7284aaULL, 0x10c25e0206cb72a4ULL, 0x18779a78290168b6ULL,
  0x1fe4d9ae33b89eb8ULL, 0x091c128c76955c92ULL, 0x0e8f515a6c2caa9cULL,
  0x063a952043e6b08eULL, 0x01a9d6f6595f4680ULL, 0x2bcb0364c9bd34daULL,
  0x2c5840b2d304c2d4ULL, 0x24ed84c8fcced8c6ULL, 0x237ec71ee6772ec8ULL,
  0x35860c3ca35aece2ULL, 0x32154feab9e31aecULL, 0x3aa08b90962900feULL,
  0x3d33c8468c90f6f0ULL, 0x2d53793d33aa0ae9ULL, 0x2ac03aeb2913fce7ULL,
  0x2275fe9106d9e6f5ULL, 0x25e6bd471c6010fbULL, 0x331e7665594dd2d1ULL,
  0x348d35b343f424dfULL, 0x3c38f1c96c3e3ecdULL, 0x3babb21f7687c8c3ULL,
  0x11c9678de665ba99ULL, 0x165a245bfcdc4c97ULL, 0x1eefe021d3165685ULL,
  0x197ca3f7c9afa08bULL, 0x0f8468d58c8262a1ULL, 0x08172b03963b94afULL,
  0x00a2ef79b9f18ebdULL, 0x0731acafa34878b3ULL, 0x5467445c98356a09ULL,
  0x53f4078a828c9c07ULL, 0x5b41c3f0ad468615ULL, 0x5cd28026b7ff701bULL,
  0x4a2a4b04f2d2b231ULL, 0x4db908d2e86b443fULL, 0x450ccca8c7a15e2dULL,
  0x429f8f7edd18a823ULL, 0x68fd5aec4dfada79ULL, 0x6f6e193a57432c77ULL,
  0x67dbdd4078893665ULL, 0x60489e966230c06bULL, 0x76b055b4271d0241ULL,
  0x712316623da4f44fULL, 0x7996d218126eee5dULL, 0x7e0591ce08d71853ULL,
  0x5957b0ef6c1b166fULL, 0x5ec4f33976a2e061ULL, 0x567137435968fa73ULL,
  0x51e2749543d10c7dULL, 0x471abfb706fcce57ULL, 0x4089fc611c453859ULL,
  0x483c381b338f224bULL, 0x4faf7bcd2936d445ULL, 0x65cdae5fb9d4a61fULL,
  0x625eed89a36d5011ULL, 0x6aeb29f38ca74a03ULL, 0x6d786a25961ebc0dULL,
  0x7b80a107d3337e27ULL, 0x7c13e2d1c98a8829ULL, 0x74a626abe640923bULL,
  0x7335657dfcf96435ULL, 0x20638d8ec784768fULL, 0x27f0ce58dd3d8081ULL,
  0x2f450a22f2f79a93ULL, 0x28d649f4e84e6c9dULL, 0x3e2e82d6ad63aeb7ULL,
  0x39bdc100b7da58b9ULL, 0x3108057a981042abULL, 0x369b46ac82a9b4a5ULL,
  0x1cf9933e124bc6ffULL, 0x1b6ad0e808f230f1ULL, 0x13df149227382ae3ULL,
  0x144c57443d81dcedULL, 0x02b49c6678ac1ec7ULL, 0x0527dfb06215e8c9ULL,
  0x0d921bca4ddff2dbULL, 0x0a01581c576604d5ULL, 0x1a61e967e85cf8ccULL,
  0x1df2aab1f2e50ec2ULL, 0x15476ecbdd2f14d0ULL, 0x12d42d1dc796e2deULL,
  0x042ce63f82bb20f4ULL, 0x03bfa5e99802d6faULL, 0x0b0a6193b7c8cce8ULL,
  0x0c992245ad713ae6ULL, 0x26fbf7d73d9348bcULL, 0x2168b401272abeb2ULL,
  0x29dd707b08e0a4a0ULL, 0x2e4e33ad125952aeULL, 0x38b6f88f57749084ULL,
  0x3f25bb594dcd668aULL, 0x37907f2362077c98ULL, 0x30033cf578be8a96ULL,
  0x6355d40643c3982cULL, 0x64c697d0597a6e22ULL, 0x6c7353aa76b07430ULL,
  0x6be0107c6c09823eULL, 0x7d18db5e29244014ULL, 0x7a8b9888339db61aULL,
  0x723e5cf21c57ac08ULL, 0x75ad1f2406ee5a06ULL, 0x5fcfcab6960c285cULL,
  0x585c89608cb5de52ULL, 0x50e94d1aa37fc440ULL, 0x577a0eccb9c6324eULL,
  0x4182c5eefcebf064ULL, 0x46118638e652066aULL, 0x4ea44242c9981c78ULL,
  0x49370194d321ea76ULL
};

static unsigned char rabin_window[RABIN_WINDOW_SIZE];
static unsigned rabin_pos = 0;
static void rabin_reset();
static u_int64_t rabin_slide(u_int64_t fp, unsigned char m);

#define MIN(x,y) ((y)<(x) ? (y) : (x))
#define MAX(x,y) ((y)>(x) ? (y) : (x))

/* FIXME: There must be a better way to do this... */
#if !defined(_BIG_ENDIAN) && defined(__BIG_ENDIAN) && defined(__BYTE_ORDER)
static const int big_endian = (__BYTE_ORDER == __BIG_ENDIAN);
#elif !defined(_BIG_ENDIAN) && !defined(_LITTLE_ENDIAN)
#error "Exactly one of _BIG_ENDIAN or _LITTLE_ENDIAN must be defined"
#elif defined(_BIG_ENDIAN) && defined(_LITTLE_ENDIAN)
#error "Only one of _BIG_ENDIAN or _LITTLE_ENDIAN may be defined"
#elif defined(_BIG_ENDIAN)
static const int big_endian = 1;
#else
static const int big_endian = 0;
#endif

/* The copies array is the central data structure for 
   diff generation.  Data statements are implicit, 
   for ranges not covered by any copy command.

   The sum of tgt and length for each entry must be
   monotonically increasing, and data ranges 
   must be non-overlapping. This is accomplished by
   not extending matches backwards during initial matching.

   Copies may have zero length, to make it quick to
   delete copies during optimization. However, the last
   copy in the list must always be a non-trivial copy.

   Before committing copies, an important optimization
   is performed: during a backward pass through the copies array,
   each entry is extended backwards, and redundant copies are
   eliminated.

   If each match were extended backwards on insertion, the same
   data may be matched an arbitrary number of times, resulting in
   potentially quadratic time behavior.
*/

typedef struct copyinfo
{ unsigned	src;
  unsigned	tgt;
  unsigned	length;
} CopyInfo;
  
static CopyInfo *copies;
static int	copy_count = 0;
static unsigned max_copies = 0; /* Dynamically increased */

static unsigned *idx;
static unsigned idx_size;
static unsigned char *idx_data;
static unsigned idx_data_len;

static void
rabin_reset ()
{ bzero (rabin_window, sizeof (rabin_window));
}

static u_int64_t
rabin_slide (u_int64_t fp, unsigned char m)
{ unsigned char om;
  if (++rabin_pos == RABIN_WINDOW_SIZE) rabin_pos = 0;
  om = rabin_window[rabin_pos];
  fp ^= U[om];
  rabin_window[rabin_pos] = m;
  fp = ((fp << 8) | m) ^ T[fp >> RABIN_SHIFT];

  return fp;
}

void init_idx (unsigned char *data, size_t len, int level)
{ static unsigned index_step 
                 = RABIN_WINDOW_SIZE / sizeof (unsigned) * sizeof (unsigned);
  size_t j, k;
  unsigned char ch = 0;
  unsigned maxofs[256];
  unsigned maxlen[256];
  unsigned maxfp[256];
  unsigned runlen = 0;
  u_int64_t fp = 0;

  assert (len <= MAX_SIZE);
  assert (level >= 0 && level <= 9);
  bzero (maxofs, 256 * sizeof (unsigned));
  bzero (maxlen, 256 * sizeof (unsigned));
  bzero (maxfp, 256 * sizeof (unsigned));

  /* index_step must be multiple of word size */
  if (level >= 1)
  { index_step = MIN (index_step, 4 * sizeof (unsigned));
    /* Use smaller step size for higher optimization levels or smaller files */
    if (level >= 3 || len <= 65536)
    { index_step = MIN (index_step, 3 * sizeof (unsigned));
    }
    if (level >= 4 || len <= 32768)
    { index_step = MIN (index_step, 2 * sizeof (unsigned));
    }
    if (level >= 6 || len < 16384)
    { index_step = MIN (index_step, 1 * sizeof (unsigned));
  } }
  assert (index_step && !(index_step % sizeof (unsigned)));

  /* Add fixed amount to hash table size, as small files will benefit
     a lot without using significantly more memory or time. */
  idx_size = (level + 1) * (len / index_step) / 2 + MIN_HTAB_SIZE;
  idx_size = MIN (idx_size, MAX_HTAB_SIZE - 1); /* So rounding up works */

  /* Round up to next power of two, but limit to MAX_HTAB_SIZE. */
  { unsigned s = MIN_HTAB_SIZE;
    while (s < idx_size) s += s;
    idx_size = s;
  }

  idx_data = data;
  idx_data_len = len;
  idx = (unsigned *) calloc (idx_size, sizeof (unsigned)); 

  /* It is tempting to first index higher addresses, so hashes of lower
     addresses will get preference in the hash table. However, for
     repetitive patterns with a period that is a divisor of the fingerprint
     window, this may mean the match is not anchored at the end. 
     Furthermore, even when using a window length that is prime, the
     benefits are small and the irregularity of the first matches being
     more important is not worth it. */

  rabin_reset();
  
  ch = 0;
  runlen = 0;

  for (j = 0; j + index_step < len; j += index_step)
  { unsigned char pch = 0;
    unsigned hash;

    /* hot loop, use word loads. */
    for (k = 0; k < index_step; k+= sizeof (unsigned))
    { unsigned w = *((unsigned *) (data + (j + k)));
      unsigned n;

      for (n = 0; n < sizeof (unsigned); n++)
      { pch = ch;
        ch = big_endian ? (w>>24) & 0xff : w & 0xff;
        w = big_endian ? (w<<8) : (w>>8);
        if (ch != pch) runlen = 0;
        runlen++;
        fp = rabin_slide (fp, ch);
    } }

    /* See if there is a word-aligned window-sized run of equal characters */
    if (runlen >= RABIN_WINDOW_SIZE + (sizeof (unsigned) - 1))
    { /* Skip ahead to end of run of identical input characters */
      while (j + k < len && data[j + k] == ch) { k++; runlen++; }

      /* Although matches are usually anchored at the end, in the case
         of extended runs of equal characters it is better to anchor after the
         first RABIN_WINDOW_SIZE bytes. This allows for quick skip ahead 
         while matching such runs, avoiding unneeded fingerprint calculations.
         Also, when anchoring at the end, matches will be generated after
         every word, because the fingerprint stays constant. Even though
         all matches would get combined during match optimization, 
         it wastes time and space.
      */
      if (runlen > maxlen[pch] + 4)
      { unsigned ofs;
        /* ofs points RABIN_WINDOW_SIZE bytes after the start of the run,
           rounded up to the next word */
        ofs = j + k - runlen + RABIN_WINDOW_SIZE + (sizeof (unsigned) - 1);
        ofs -= (ofs % sizeof (unsigned));
        maxofs [pch] = ofs;
        maxlen [pch] = runlen;
        assert (maxfp[pch] == 0 || maxfp[pch] == (unsigned) fp);
        maxfp [pch] = (unsigned) fp;
      }
      /* Keep input aligned as if no special run processing had taken place */
      j += k - (k % index_step) - index_step;
      k = index_step;
    }

    /* Testing showed that avoiding collisions using secondary hashing, or
       hash chaining had little effect and is not worth the time. */

    hash = ((unsigned) fp) & (idx_size - 1);
    idx [hash] = j + k;
  }

  /* Lastly, index the longest runs of equal characters found before.
     This ensures we always match the longest such runs available.  */

  for (j = 0; j < 256; j++)
  { if (maxlen[j]) 
    { idx[maxfp[j] % idx_size] = maxofs[j];
} } }

static unsigned header_length (unsigned srclen, unsigned tgtlen)
{ unsigned len = 0;
  assert (srclen <= MAX_SIZE && tgtlen <= MAX_SIZE);

  /* GIT headers start with the length of the source and target,
     with 7 bits per byte, least significant group first, and
     the high bit indicating continuation. */
  while (srclen >= 0x7f) { len++; srclen >>= 7; }
  while (tgtlen >= 0x7f) { len++; tgtlen >>= 7; }

  return len + 2;
}

static unsigned data_length (unsigned length)
{ assert (length > 0 && length <= MAX_SIZE);

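  /* Can only include 0x7f data bytes per command, and each command
     costs one extra byte. This must match what write_data emits,
     including when length is an exact multiple of 0x7f. */
  return length + (length + 0x7e) / 0x7f;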
}

static unsigned copy_length (unsigned offset, unsigned length)
{ /* Can only copy 0xffffff bytes per command. For longer commands,
     break into pieces of that size. It might be slightly more
     efficient to break into pieces of size 0xff0000, but it's not
     worth adding complexity for that rare case. */
  unsigned osize = !!(offset & 0xff) + !!(offset & 0xff00) 
                   + !!(offset & 0xff0000) + !!(offset & 0xff000000); 
  assert (offset < MAX_SIZE && length < MAX_SIZE);

  return 1 + (length / 0xffffff ) * (osize + 4) + osize
           + !!(length & 0xff) + !!(length & 0xff00) + !!(length & 0xff0000);
}

static unsigned process_copies (unsigned char *data, unsigned length)
{ int j;
  unsigned ptr = length;
  unsigned patch_bytes = 0;

  /* Work through the copies backwards, extending each one backwards. */
  for (j = copy_count - 1; j >= 0; j--)
  { CopyInfo *copy = copies+j;
    unsigned src = copy->src;
    unsigned tgt = copy->tgt;
    unsigned len = copy->length;
    int data_follows;

    if (tgt + len > ptr)
    { /* Part of copy already covered by later one, so shorten copy. */

      if (ptr < tgt)
      {  /* Copy completely disappeared, but guess that a backward extension
            might still be useful. This extension is non-contiguous, as it is
            irrelevant whether the skipped data would have matched or not.
            Be careful to not extend past the beginning of the source. */
         unsigned adjust = tgt - ptr;

         tgt = ptr;
         src = (src < adjust) ? 0 : src - adjust;

         copy->tgt = tgt;
         copy->src = src;
      }
       
      len = ptr - tgt;
    }

    while (src && tgt && idx_data[src - 1] == data[tgt - 1])
    { src-- ; tgt--; }

    len += copy->tgt - tgt;

    data_follows = tgt + len < ptr;

    if (len < (data_follows ? 16 : 10)) len = 0;
    /* A short copy may cost as much as 6 bytes for the copy and
       5 as result of an extra data command.
       It's not worth having extra copies in order to just save a byte or two.
       Being too smart here may hurt later compression as well.
    */

    if (len && data_follows)
    { /* Some target data is not covered by the copies, account for
         the DATA command that will follow the copy. */
      patch_bytes += data_length (ptr - (tgt + len));
    }

    /* Everything about the copy is known and will not change.
       Write back the new information and update the patch size
       with the size of the copy instruction. */
    copy->length = len;
    copy->src = src;
    copy->tgt = tgt;

    /* Remove empty copies at end of list. */
    copy_count -= (!len && j == copy_count - 1);

    if (len)
    { /* update patch size for copy command */
      patch_bytes += copy_length (src, len);
      ptr = tgt;

  } }

  /* Account for data before first copy */
  for (j = 0; j < copy_count; j++)
  { if (copies[j].length)
    { if (copies[j].tgt) patch_bytes += data_length (copies[j].tgt);
      break;
  } }

  /* Case where no copies remain: entire file is a data statement. */
  if (!copy_count && length) patch_bytes += data_length (length);

  /* Account for header */
  patch_bytes += header_length (idx_data_len, length);

  return patch_bytes;
}

/* Match data against the current index and record all possible copies */
static int find_copies (unsigned char *data, size_t len)
{ size_t j = 0;
  u_int64_t fp = 0;

  unsigned w = 0; /* shift register for quick content verification */

  rabin_reset ();

  while (j < RABIN_WINDOW_SIZE && j < len)
  { unsigned char ch = data[j++];
    fp = rabin_slide (fp, ch);
    w = big_endian ? w<<8 | ch : w>>8 | ch<<((sizeof (w) - 1) * 8);
  }

  while (j < len) 
  { unsigned char ch = data[j++];
    unsigned hash, ofs;

    fp = rabin_slide (fp, ch);
    hash = fp & (idx_size - 1);
    ofs = idx[hash];

    w = big_endian ? w<<8 | ch : w>>8 | ch<<((sizeof (w) - 1) * 8);

    /* Invariant:
         data[0] .. data[j-1] has been processed
         w contains last sizeof (unsigned) bytes of processed data
         fp is fingerprint of sliding window ending at j-1
         ofs is zero or points just past tentative match
         ofs is a multiple of index_step
    */

    if (ofs && *((unsigned *) (idx_data + ofs - sizeof (w))) == w)
    { /* Found a match. Now try to extend it forward. */
      unsigned runlen = sizeof (w);
      unsigned tgt = j - runlen;
      unsigned src = ofs - runlen;
      unsigned maxrun = MIN (idx_data_len - src, len - tgt);
      CopyInfo *copy;

      if (copy_count == max_copies)
      { max_copies *= 2;
        if (!max_copies)
        { max_copies = MAX_COPIES;
          copies = malloc(max_copies * sizeof (CopyInfo));
        }
        else
        { copies = realloc(copies, max_copies * sizeof (CopyInfo));
        }

        if (!copies) return 0;
      }

      copy = copies + copy_count;

      /* Hot loop */
      while (runlen < maxrun && data[tgt + runlen] == idx_data[src + runlen])
      { runlen++; }

      copy->src = src;
      copy->tgt = tgt;
      copy->length = runlen;
      copy_count++;

      /* For runs extending more than RABIN_WINDOW_SIZE bytes beyond j,
         skip ahead to prevent useless fingerprint computations. */
      if (tgt + runlen > j + RABIN_WINDOW_SIZE)
      { j = tgt  + runlen - RABIN_WINDOW_SIZE;
      }

      /* Quickly scan ahead without looking for matches
         until the end of this run */

      while (j + sizeof (w) < tgt + runlen) fp = rabin_slide (fp, data[j++]);

      while (j < tgt + runlen)
      { unsigned char ch = data[j++];
        fp = rabin_slide (fp, ch);

        w = big_endian ? w<<8 | ch : w>>8 | ch<<((sizeof (w) - 1) * 8);
  } } }

  return 1;
}

unsigned calculate_delta (void *to_buf, unsigned long to_size)
{ unsigned delta_size;

  assert (to_size < MAX_SIZE);

  if (!find_copies ((unsigned char *) to_buf, to_size)) return 0;
  delta_size = process_copies ((unsigned char *) to_buf, to_size);

  return delta_size;
}

static unsigned char *
write_header (unsigned char *patch, unsigned srclen, unsigned tgtlen)
{ 
  assert (srclen <= MAX_SIZE && tgtlen <= MAX_SIZE);

  while (srclen >= 0x7f) 
  { *patch++= (srclen & 0x7f) | 0x80;
    srclen >>= 7; 
  }
  *patch++ = srclen;
  while (tgtlen >= 0x7f)
  { *patch++ = (tgtlen & 0x7f) | 0x80;
    tgtlen >>= 7;
  }
  *patch++ = tgtlen;

  return patch;
}

static unsigned char *
write_data (unsigned char *patch, unsigned char *data, unsigned size)
{ assert (size > 0 && size < MAX_SIZE);
  /* The return value must be equal to patch + data_length (size).
     This correspondence is essential for calculating the patch size.  */

  /* GIT has no data commands for large data, rest is same as GDIFF */
  while (size > 0x7f)
  { *patch++ = (unsigned char) 0x7f;
    memcpy (patch, data, 0x7f);
    data += 0x7f;
    patch += 0x7f;
    size -= 0x7f;
  }

  *patch++ = (unsigned char) size; 
  memcpy (patch, data, size);

  return patch + size;
} 

static unsigned char *
write_copy (unsigned char *patch, unsigned offset, unsigned size)
{ /* The return value must be equal to patch + copy_length (offset, size).
     This correspondence is essential for calculating the patch size.  */

  while (size > 0)
  { unsigned chunksize = MIN (0xffffff, size);
    unsigned char cmd = 1;
    cmd = (cmd<<1) | !!(size & 0xff0000);
    cmd = (cmd<<1) | !!(size & 0x00ff00);
    cmd = (cmd<<1) | !!(size & 0x0000ff); 
    cmd = (cmd<<1) | !!(offset & 0xff000000);
    cmd = (cmd<<1) | !!(offset & 0x00ff0000);
    cmd = (cmd<<1) | !!(offset & 0x0000ff00);
    cmd = (cmd<<1) | !!(offset & 0x000000ff);
    *patch++ = cmd | 0x80;
    if (cmd & 0x01) *patch++ = offset & 0xff;
    if (cmd & 0x02) *patch++ = (offset >> 8) & 0xff;
    if (cmd & 0x04) *patch++ = (offset >> 16) & 0xff;
    if (cmd & 0x08) *patch++ = (offset >> 24) & 0xff;
    if (cmd & 0x10) *patch++ = size & 0xff;
    if (cmd & 0x20) *patch++ = (size >> 8) & 0xff;
    if (cmd & 0x40) *patch++ = (size >> 16) & 0xff;
    size -= chunksize;
  } 
  return patch;
} 

void*
create_delta (unsigned char *data, unsigned len, unsigned delta_size)
{ unsigned char *delta = (unsigned char *) malloc (delta_size);
  unsigned char *ptr = delta;
  unsigned offset = 0;
  unsigned data_commands = 0;
  unsigned copy_commands = 0;
  int j;

  ptr = write_header (ptr, idx_data_len, len);

  for (j = 0; j < copy_count; j++)
  { CopyInfo *copy = copies + j;
    unsigned copylen = copy->length;

    if (copylen)
    { if (copy->tgt > offset)
      { assert (delta_size - (ptr - delta) > data_length (copy->tgt - offset));
        ptr = write_data (ptr, data + offset, copy->tgt - offset);
        data_commands++;
      }

      assert (delta_size - (ptr - delta) >= copy_length (copy->src, copylen));

      ptr = write_copy (ptr, copy->src, copylen);
      copy_commands++;
      offset = copy->tgt + copylen;
  } }

  if (offset < len)
  { assert (delta_size - (ptr - delta) >= data_length (len - offset));
    ptr = write_data (ptr, data + offset, len - offset);
    data_commands++;
  }
  assert (ptr - delta == (int) delta_size);

  return delta;
}

void *diff_delta(void *from_buf, unsigned long from_size,
                 void *to_buf, unsigned long to_size,
                 unsigned long *delta_size, unsigned long max_size)
{ unsigned dsize;
  assert (from_size <= MAX_SIZE && to_size <= MAX_SIZE);
  init_idx (from_buf, from_size, 1); /* Use optimization level 1 */
  dsize = calculate_delta (to_buf, to_size);
  if (!dsize) return 0;
  *delta_size = dsize;
  return create_delta (to_buf, to_size, *delta_size);
}
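
To check the output of diff_delta, a decoder for the command stream is handy.
The following is a minimal sketch of one, inferred only from write_header,
write_data and write_copy above. It is not git's patch-delta, the names
read_size and apply_delta are hypothetical, and it assumes a well-formed delta
(only basic consistency checks are made, and the header is not bounds-checked).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Decode one 7-bit-per-byte size, low-order group first. */
static unsigned long read_size(const unsigned char **p)
{
    unsigned long v = 0;
    unsigned shift = 0;
    unsigned char c;

    do {
        c = *(*p)++;
        v |= (unsigned long)(c & 0x7f) << shift;
        shift += 7;
    } while (c & 0x80);
    return v;
}

/* Hypothetical decoder: rebuilds the target in `out` (capacity out_cap).
   Returns the target length, or (size_t)-1 if the delta is inconsistent. */
static size_t apply_delta(const unsigned char *src, size_t src_len,
                          const unsigned char *delta, size_t delta_len,
                          unsigned char *out, size_t out_cap)
{
    const unsigned char *p = delta, *end = delta + delta_len;
    size_t n = 0;
    unsigned long src_size = read_size(&p);
    unsigned long tgt_size = read_size(&p);

    if (src_size != src_len || tgt_size > out_cap)
        return (size_t)-1;

    while (p < end) {
        unsigned char cmd = *p++;

        if (cmd & 0x80) {            /* copy: flag bits select operand bytes */
            unsigned long ofs = 0, len = 0;
            if (cmd & 0x01) ofs |= *p++;
            if (cmd & 0x02) ofs |= (unsigned long)*p++ << 8;
            if (cmd & 0x04) ofs |= (unsigned long)*p++ << 16;
            if (cmd & 0x08) ofs |= (unsigned long)*p++ << 24;
            if (cmd & 0x10) len |= *p++;
            if (cmd & 0x20) len |= (unsigned long)*p++ << 8;
            if (cmd & 0x40) len |= (unsigned long)*p++ << 16;
            if (ofs + len > src_len || n + len > tgt_size)
                return (size_t)-1;
            memcpy(out + n, src + ofs, len);
            n += len;
        } else {                     /* data: cmd is the literal byte count */
            if (cmd == 0 || (size_t)(end - p) < cmd || n + cmd > tgt_size)
                return (size_t)-1;
            memcpy(out + n, p, cmd);
            p += cmd;
            n += cmd;
        }
    }
    return n == tgt_size ? n : (size_t)-1;
}
```

Feeding it the source buffer and the buffer returned by diff_delta should
reproduce the target exactly; a mismatch points at a disagreement between the
size calculation and the writer functions.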


* Re: git-log produces no output
From: Linus Torvalds @ 2006-04-21 20:25 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vpsjasnh5.fsf@assigned-by-dhcp.cox.net>



On Fri, 21 Apr 2006, Junio C Hamano wrote:
>
> Linus Torvalds <torvalds@osdl.org> writes:
> 
> > On Fri, 21 Apr 2006, Linus Torvalds wrote:
> >> 
> >> This patch would have made things a lot more obvious.
> >
> > Actually, scratch that one, and use this one instead. Much better, and 
> > actually allows Bob's crazy PAGER environment variable to work, rather 
> > than just reporting an error about it.
> 
> Agreed, this is much better than just punting.  Sign-off?

Yup, you can just go wild with the 

	Signed-off-by: Linus Torvalds <torvalds@osdl.org>

lines. My employment contract requires that everything I produce is open 
source ;)

> BTW: The extended extended SHA1 is a great addition.  I do not
> usually have contrib/colordiff checked out (it is in "pu", not
> in "next"), but I can easily do:
> 
> 	git tar-tree pu:contrib/colordiff colordiff | tar xf -

Ahh, yes. That is one situation where a sub-tree SHA1 makes more sense 
than most (the fact that it works with "git diff" and directory renames is 
likely more of a curiosity than anything widely useful, I think)

> BTW: Allow me to try "git fmt-patch -1" ;-).

Looks good to me.

		Linus

^ permalink raw reply

* Re: git-log produces no output
From: Junio C Hamano @ 2006-04-21 20:05 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git
In-Reply-To: <Pine.LNX.4.64.0604211223561.3701@g5.osdl.org>

Linus Torvalds <torvalds@osdl.org> writes:

> On Fri, 21 Apr 2006, Linus Torvalds wrote:
>> 
>> This patch would have made things a lot more obvious.
>
> Actually, scratch that one, and use this one instead. Much better, and 
> actually allows Bob's crazy PAGER environment variable to work, rather 
> than just reporting an error about it.

Agreed, this is much better than just punting.  Sign-off?

BTW: The extended extended SHA1 is a great addition.  I do not
usually have contrib/colordiff checked out (it is in "pu", not
in "next"), but I can easily do:

	git tar-tree pu:contrib/colordiff colordiff | tar xf -

Of course "git tar-tree pu | tar xf - contrib/colordiff" would
work for this particular case, but that is beside the point.

BTW: Allow me to try "git fmt-patch -1" ;-).

-- >8 --
From 34fd1c9ac5845d541e3196983df7f993e751b544  Thu Apr 7 15:13:13 2005
From: Linus Torvalds <torvalds@osdl.org>
Date: Fri Apr 21 12:25:13 2006 -0700
Subject: git-log produces no output

When $PAGER is set to 'less -i', we used to fail because we
assumed the $PAGER is a command and simply exec'ed it.

Try exec first, and then run it through shell if it fails.  This
allows even funkier PAGERs like these ;-):

	PAGER='sed -e "s/^/`date`: /" | more'
	PAGER='contrib/colordiff.perl | less -RS'

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
---
 pager.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/pager.c b/pager.c
index e5ba273..f7b8e78 100644
--- a/pager.c
+++ b/pager.c
@@ -8,6 +8,7 @@ #include "cache.h"
 static void run_pager(const char *pager)
 {
 	execlp(pager, pager, NULL);
+	execl("/bin/sh", "sh", "-c", pager, NULL);
 }
 
 void setup_pager(void)
@@ -47,5 +48,6 @@ void setup_pager(void)
 
 	setenv("LESS", "-S", 0);
 	run_pager(pager);
+	die("unable to execute pager '%s'", pager);
 	exit(255);
 }
-- 
1.3.0.gd1e3

^ permalink raw reply related

* Re: git-log produces no output
From: Bob Portmann @ 2006-04-21 19:38 UTC (permalink / raw)
  To: Linus Torvalds, Junio C Hamano; +Cc: Git Mailing List
In-Reply-To: <Pine.LNX.4.64.0604211223561.3701@g5.osdl.org>

Yes, that fixes it, even with the crazy PAGER (which is going to be
plain 'more' from now on:-)

Thanks,
Bob

--- Linus Torvalds <torvalds@osdl.org> wrote:

> On Fri, 21 Apr 2006, Linus Torvalds wrote:
> > 
> > This patch would have made things a lot more obvious.
> 
> Actually, scratch that one, and use this one instead. Much better, and
> actually allows Bob's crazy PAGER environment variable to work, rather
> than just reporting an error about it.
> 
> 		Linus
> ---
> diff --git a/pager.c b/pager.c
> index b063353..9a30939 100644
> --- a/pager.c
> +++ b/pager.c
> @@ -8,6 +8,7 @@ #include "cache.h"
>  static void run_pager(const char *pager)
>  {
>  	execlp(pager, pager, NULL);
> +	execl("/bin/sh", "sh", "-c", pager, NULL);
>  }
>  
>  void setup_pager(void)
> @@ -47,5 +48,6 @@ void setup_pager(void)
>  
>  	setenv("LESS", "-S", 0);
>  	run_pager(pager);
> +	die("unable to execute pager '%s'", pager);
>  	exit(255);
>  }
> 


^ permalink raw reply

* Re: git-log produces no output
From: Linus Torvalds @ 2006-04-21 19:25 UTC (permalink / raw)
  To: Bob Portmann, Junio C Hamano; +Cc: Git Mailing List
In-Reply-To: <Pine.LNX.4.64.0604211214560.3701@g5.osdl.org>



On Fri, 21 Apr 2006, Linus Torvalds wrote:
> 
> This patch would have made things a lot more obvious.

Actually, scratch that one, and use this one instead. Much better, and 
actually allows Bob's crazy PAGER environment variable to work, rather 
than just reporting an error about it.
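
A minimal sketch of the "try exec first, then hand it to the shell" order
that the patch below adds to run_pager(), wrapped in a fork so the result
can be observed.  The helper name run_with_fallback is hypothetical and
the exit-status plumbing is only there for illustration:

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run cmd in a child: first as a literal program name, then, if that
 * exec fails, through "/bin/sh -c".  Returns the child's exit status,
 * or -1 if it could not be started or did not exit normally. */
static int run_with_fallback(const char *cmd)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		/* only returns if no program with exactly this name exists */
		execlp(cmd, cmd, (char *) NULL);
		/* let the shell split arguments, pipes, quotes, ... */
		execl("/bin/sh", "sh", "-c", cmd, (char *) NULL);
		_exit(127);	/* both execs failed */
	}
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A plain "more" is still started directly by the first exec, so the shell
is only involved for values such as 'less -i' that name no single program.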

		Linus
---
diff --git a/pager.c b/pager.c
index b063353..9a30939 100644
--- a/pager.c
+++ b/pager.c
@@ -8,6 +8,7 @@ #include "cache.h"
 static void run_pager(const char *pager)
 {
 	execlp(pager, pager, NULL);
+	execl("/bin/sh", "sh", "-c", pager, NULL);
 }
 
 void setup_pager(void)
@@ -47,5 +48,6 @@ void setup_pager(void)
 
 	setenv("LESS", "-S", 0);
 	run_pager(pager);
+	die("unable to execute pager '%s'", pager);
 	exit(255);
 }

^ permalink raw reply related

* Re: git-log produces no output
From: Linus Torvalds @ 2006-04-21 19:18 UTC (permalink / raw)
  To: Bob Portmann, Junio C Hamano; +Cc: Git Mailing List
In-Reply-To: <20060421184815.22939.qmail@web60319.mail.yahoo.com>



On Fri, 21 Apr 2006, Bob Portmann wrote:
> 
> Yes, this is the problem.  It works when I send it to a file.  It seems
> that having any extra options in my PAGER command messes it up
> (see below).  If git-log was a shell script I would imagine that some
> quotes are missing:-)

It's the other way around: it's got "too much" quoting.

"git log" will literally _execute_ the PAGER environment, not pass it to a 
shell, and not interpret any arguments.

So it will look for a program called "more -i" (space and all), and no 
such program exists, so the execve fails, and git log ends up being 
silent.
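
To see the failure in isolation, here is a minimal sketch that execs a
string as one literal program name in a child and reports the errno back
over a close-on-exec pipe.  The helper name try_literal_exec is
hypothetical and this is not the code in pager.c:

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Try to exec `pager` as a single program name, spaces and all.
 * Returns 0 if the exec succeeded, the child's errno if it failed,
 * and -1 if the pipe or fork could not be set up. */
static int try_literal_exec(const char *pager)
{
	int fd[2];
	int err = 0;
	ssize_t n;
	pid_t pid;

	if (pipe(fd) < 0)
		return -1;
	pid = fork();
	if (pid < 0) {
		close(fd[0]);
		close(fd[1]);
		return -1;
	}
	if (pid == 0) {
		close(fd[0]);
		/* a successful exec closes the pipe without writing to it */
		fcntl(fd[1], F_SETFD, FD_CLOEXEC);
		execlp(pager, pager, (char *) NULL);
		err = errno;
		if (write(fd[1], &err, sizeof(err)) != (ssize_t) sizeof(err))
			_exit(126);
		_exit(127);
	}
	close(fd[1]);
	n = read(fd[0], &err, sizeof(err));
	close(fd[0]);
	waitpid(pid, NULL, 0);
	return n == (ssize_t) sizeof(err) ? err : 0;
}
```

On a typical system try_literal_exec("more -i") comes back nonzero
(usually ENOENT), since no file with that exact name is found on the PATH.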

This patch would have made things a lot more obvious.

Junio?

		Linus
---
diff --git a/pager.c b/pager.c
index b063353..9204641 100644
--- a/pager.c
+++ b/pager.c
@@ -47,5 +47,6 @@ void setup_pager(void)
 
 	setenv("LESS", "-S", 0);
 	run_pager(pager);
+	die("unable to execute pager '%s'", pager);
 	exit(255);
 }

^ permalink raw reply related

* Re: git-log produces no output
From: Bob Portmann @ 2006-04-21 19:13 UTC (permalink / raw)
  To: Paolo Ciarrocchi; +Cc: Linus Torvalds, Git Mailing List
In-Reply-To: <4d8e3fd30604211158w71e97efew9646203a5510f409@mail.gmail.com>

--- Paolo Ciarrocchi <paolo.ciarrocchi@gmail.com> wrote:
> On 4/21/06, Bob Portmann <bportmann@yahoo.com> wrote:
> [...]
> > test-log> export PAGER='more -i'
> > test-log> git log
> > test-log>
> >
> > Adding the option -i (which should do nothing) has eliminated the
> > output.
> 
> Well, on my machine:
> paolo@Italia:~/gkernel$ more -i
> more: unknown option "-i"
> usage: more [-dflpcsu] [+linenum | +/pattern] name1 name2 ...

Well, then try an option that does work.  On my system less and more
are essentially the same and -i just causes it to ignore case on
searches.  The PAGER is automatically setup to 'less -XRse' on my
system (not sure why).  Changing it to 'more' makes git-log work, but
it should work with 'less -XRse' as well since that works with other
commands on the system (e.g. man) and git-whatchanged as well.

Bob
 
> Ciao,
> --
> Paolo
> http://paolociarrocchi.googlepages.com
> 


^ permalink raw reply

* Re: git-log produces no output
From: Paolo Ciarrocchi @ 2006-04-21 18:58 UTC (permalink / raw)
  To: Bob Portmann; +Cc: Linus Torvalds, Git Mailing List
In-Reply-To: <20060421184815.22939.qmail@web60319.mail.yahoo.com>

On 4/21/06, Bob Portmann <bportmann@yahoo.com> wrote:
[...]
> test-log> export PAGER='more -i'
> test-log> git log
> test-log>
>
> Adding the option -i (which should do nothing) has eliminated the
> output.

Well, on my machine:
paolo@Italia:~/gkernel$ more -i
more: unknown option "-i"
usage: more [-dflpcsu] [+linenum | +/pattern] name1 name2 ...

Ciao,
--
Paolo
http://paolociarrocchi.googlepages.com

^ permalink raw reply

* Re: git-log produces no output
From: Bob Portmann @ 2006-04-21 18:48 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List
In-Reply-To: <Pine.LNX.4.64.0604211102000.3701@g5.osdl.org>

--- Linus Torvalds <torvalds@osdl.org> wrote:
> On Fri, 21 Apr 2006, Bob Portmann wrote:
> >
> >  I cannot get any output out of it and am wondering if I am using it
> > correctly or it is broken.
> 
> You're using it correctly, but it isn't broken for me. 
> 
> > As I understand it, git-log should just print out the log messages but
> > not the changes, whereas git-whatchanged will print out both.
> 
> Well, in 1.3.0, "git log" can actually do both, and you can get the 
> whatchanged output by just saying "git log -p".
> 
> But yes, without the "-p", you should get just the log.
> 
> And that's exactly what I get, both with current HEAD git, and with a
> v1.3.0 checkout.
> 
> > test-log> git log
> > test-log> 
> > 
> > As you can see git log produces no output.  I've tried it with other
> > options with the same result.
> 
> Very strange indeed. Can you do
> 
> 	git log > file
> 
> to see if that changes (and see if the file contains anything)? The reason
> I mention that is that by default "git log" will start a pager for you,
> and if you somehow have a broken PAGER setup, I could imagine exactly the
> behaviour you see (although I don't see why "git whatchanged" would work
> either, in that case).

Yes, this is the problem.  It works when I send it to a file.  It seems
that having any extra options in my PAGER command messes it up
(see below).  If git-log was a shell script I would imagine that some
quotes are missing:-)

Bob

test-log> export PAGER='more'
test-log> git log
commit 9a4d7602fff052b6796c2862edddd11ae2e45d08
Author: Bob Portmann <portmann@removed>
Date:   Fri Apr 21 10:56:11 2006 -0600

    Two line hello

commit a38306518c5e5e8eb630c02a47bec2a9fc292025
Author: Bob Portmann <portmann@removed>
Date:   Fri Apr 21 10:55:44 2006 -0600

    One line hello

test-log> export PAGER='more -i'
test-log> git log
test-log> 

Adding the option -i (which should do nothing) has eliminated the
output.


> Finally, if that doesn't output anything either, please do (for just that
> small repository, so that the trace is also small)
> 
> 	strace -o git-trace git log > /dev/null
> 
> and send out the result. Again, for PAGER reasons, that "> /dev/null" is
> actually important, because we don't want to trigger the pager code.
> 
> 		Linus
> 


^ permalink raw reply

* Re: git-log produces no output
From: Linus Torvalds @ 2006-04-21 18:11 UTC (permalink / raw)
  To: Bob Portmann; +Cc: Git Mailing List
In-Reply-To: <20060421172001.44441.qmail@web60325.mail.yahoo.com>



On Fri, 21 Apr 2006, Bob Portmann wrote:
>
>  I cannot get any output out of it and am wondering if I am using it
> correctly or it is broken.

You're using it correctly, but it isn't broken for me. 

> As I understand it, git-log should just print out the log messages but 
> not the changes, whereas git-whatchanged will print out both.

Well, in 1.3.0, "git log" can actually do both, and you can get the 
whatchanged output by just saying "git log -p".

But yes, without the "-p", you should get just the log.

And that's exactly what I get, both with current HEAD git, and with a 
v1.3.0 checkout.

> test-log> git log
> test-log> 
> 
> As you can see git log produces no output.  I've tried it with other
> options with the same result.

Very strange indeed. Can you do

	git log > file

to see if that changes (and see if the file contains anything)? The reason 
I mention that is that by default "git log" will start a pager for you, 
and if you somehow have a broken PAGER setup, I could imagine exactly the 
behaviour you see (although I don't see why "git whatchanged" would work 
either, in that case).

Finally, if that doesn't output anything either, please do (for just that 
small repository, so that the trace is also small)

	strace -o git-trace git log > /dev/null

and send out the result. Again, for PAGER reasons, that "> /dev/null" is 
actually important, because we don't want to trigger the pager code.
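
The redirect matters because a pager only makes sense when the output is
a terminal.  A minimal sketch of such a check, assuming the decision
rests on isatty() alone; the helper name output_wants_pager is
hypothetical and the real setup_pager() may look at more than this:

```c
#include <unistd.h>

/* Return 1 if output on out_fd would go through a pager, 0 if it is
 * redirected to a file, a pipe or /dev/null and should be written
 * directly. */
static int output_wants_pager(int out_fd)
{
	return isatty(out_fd) ? 1 : 0;
}
```

With "git log > file" the descriptor is a regular file, so under this
assumption the pager code is never reached and a broken PAGER setting
cannot swallow the output.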

		Linus

^ permalink raw reply

* Split up builtin commands into separate files from git.c
From: Linus Torvalds @ 2006-04-21 17:27 UTC (permalink / raw)
  To: Junio C Hamano, Git Mailing List


Right now this splits it into "builtin-log.c" for log-related commands
("log", "show" and "whatchanged"), and "builtin-help.c" for the
informational commands (usage printing and "help" and "version").

This just makes things easier to read, I find.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
---

[ Hey, make of this what you will. There are no real code changes, except I
  used that "git_version_string[]" variable to make things easier to 
  build, and I renamed "cmd_wc" to "cmd_whatchanged", since when it's 
  split up into another file, we don't want to have a gratuitous 
  short-hand that we have to remember across files.

  I find things easier to work with the more you split them up along 
  conceptual lines, and as we do more and more built-ins, git.c would end 
  up a horrible mess unless we do _something_ like this.

  But if people don't like it, it's not a big deal. Another throw-away 
  patch from me. ]
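
The split works because git.c only needs a name-to-function table for the
builtins, and every builtin shares the (argc, argv, envp) signature that
builtin.h declares.  A minimal sketch of such a lookup; the demo_*
handlers, the builtin_cmd struct and find_builtin() are hypothetical
stand-ins, not the code in handle_internal_command():

```c
#include <stddef.h>
#include <string.h>

typedef int (*builtin_fn)(int argc, const char **argv, char **envp);

struct builtin_cmd {
	const char *name;
	builtin_fn fn;
};

/* stand-ins for cmd_log() and cmd_whatchanged() */
static int demo_log(int argc, const char **argv, char **envp)
{
	(void) argc; (void) argv; (void) envp;
	return 0;
}

static int demo_whatchanged(int argc, const char **argv, char **envp)
{
	(void) argc; (void) argv; (void) envp;
	return 0;
}

static const struct builtin_cmd builtin_table[] = {
	{ "log", demo_log },
	{ "whatchanged", demo_whatchanged },
};

/* Return the handler for name, or NULL if it is not a builtin. */
static builtin_fn find_builtin(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(builtin_table) / sizeof(builtin_table[0]); i++)
		if (!strcmp(builtin_table[i].name, name))
			return builtin_table[i].fn;
	return NULL;
}
```

Spelling out the full name in the handler, as with cmd_whatchanged, keeps
the table readable once the handlers live in other files.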

 Makefile       |    9 +-
 builtin-help.c |  241 ++++++++++++++++++++++++++++++++++++++++++++
 builtin-log.c  |   69 +++++++++++++
 builtin.h      |   23 ++++
 git.c          |  305 +-------------------------------------------------------
 5 files changed, 342 insertions(+), 305 deletions(-)

diff --git a/Makefile b/Makefile
index 3ecd674..a83c502 100644
--- a/Makefile
+++ b/Makefile
@@ -213,6 +213,9 @@ LIB_OBJS = \
 	fetch-clone.o revision.o pager.o tree-walk.o xdiff-interface.o \
 	$(DIFF_OBJS)
 
+BUILTIN_OBJS = \
+	builtin-log.o builtin-help.o
+
 GITLIBS = $(LIB_FILE) $(XDIFF_LIB)
 LIBS = $(GITLIBS) -lz
 
@@ -462,10 +465,10 @@ all:
 strip: $(PROGRAMS) git$X
 	$(STRIP) $(STRIP_OPTS) $(PROGRAMS) git$X
 
-git$X: git.c common-cmds.h $(GITLIBS)
+git$X: git.c common-cmds.h $(BUILTIN_OBJS) $(GITLIBS)
 	$(CC) -DGIT_VERSION='"$(GIT_VERSION)"' \
 		$(ALL_CFLAGS) -o $@ $(filter %.c,$^) \
-		$(ALL_LDFLAGS) $(LIBS)
+		$(BUILTIN_OBJS) $(ALL_LDFLAGS) $(LIBS)
 
 $(BUILT_INS): git$X
 	rm -f $@ && ln git$X $@
@@ -565,7 +568,7 @@ init-db.o: init-db.c
 	$(CC) -c $(ALL_CFLAGS) \
 		-DDEFAULT_GIT_TEMPLATE_DIR='"$(template_dir_SQ)"' $*.c
 
-$(LIB_OBJS): $(LIB_H)
+$(LIB_OBJS) $(BUILTIN_OBJS): $(LIB_H)
 $(patsubst git-%$X,%.o,$(PROGRAMS)): $(GITLIBS)
 $(DIFF_OBJS): diffcore.h
 
diff --git a/builtin-help.c b/builtin-help.c
new file mode 100644
index 0000000..10a59cc
--- /dev/null
+++ b/builtin-help.c
@@ -0,0 +1,241 @@
+/*
+ * builtin-help.c
+ *
+ * Builtin help-related commands (help, usage, version)
+ */
+#include "cache.h"
+#include "builtin.h"
+#include "exec_cmd.h"
+#include "common-cmds.h"
+
+static const char git_usage[] =
+	"Usage: git [--version] [--exec-path[=GIT_EXEC_PATH]] [--help] COMMAND [ ARGS ]";
+
+/* most gui terms set COLUMNS (although some don't export it) */
+static int term_columns(void)
+{
+	char *col_string = getenv("COLUMNS");
+	int n_cols = 0;
+
+	if (col_string && (n_cols = atoi(col_string)) > 0)
+		return n_cols;
+
+#ifdef TIOCGWINSZ
+	{
+		struct winsize ws;
+		if (!ioctl(1, TIOCGWINSZ, &ws)) {
+			if (ws.ws_col)
+				return ws.ws_col;
+		}
+	}
+#endif
+
+	return 80;
+}
+
+static void oom(void)
+{
+	fprintf(stderr, "git: out of memory\n");
+	exit(1);
+}
+
+static inline void mput_char(char c, unsigned int num)
+{
+	while(num--)
+		putchar(c);
+}
+
+static struct cmdname {
+	size_t len;
+	char name[1];
+} **cmdname;
+static int cmdname_alloc, cmdname_cnt;
+
+static void add_cmdname(const char *name, int len)
+{
+	struct cmdname *ent;
+	if (cmdname_alloc <= cmdname_cnt) {
+		cmdname_alloc = cmdname_alloc + 200;
+		cmdname = realloc(cmdname, cmdname_alloc * sizeof(*cmdname));
+		if (!cmdname)
+			oom();
+	}
+	ent = malloc(sizeof(*ent) + len);
+	if (!ent)
+		oom();
+	ent->len = len;
+	memcpy(ent->name, name, len);
+	ent->name[len] = 0;
+	cmdname[cmdname_cnt++] = ent;
+}
+
+static int cmdname_compare(const void *a_, const void *b_)
+{
+	struct cmdname *a = *(struct cmdname **)a_;
+	struct cmdname *b = *(struct cmdname **)b_;
+	return strcmp(a->name, b->name);
+}
+
+static void pretty_print_string_list(struct cmdname **cmdname, int longest)
+{
+	int cols = 1, rows;
+	int space = longest + 1; /* min 1 SP between words */
+	int max_cols = term_columns() - 1; /* don't print *on* the edge */
+	int i, j;
+
+	if (space < max_cols)
+		cols = max_cols / space;
+	rows = (cmdname_cnt + cols - 1) / cols;
+
+	qsort(cmdname, cmdname_cnt, sizeof(*cmdname), cmdname_compare);
+
+	for (i = 0; i < rows; i++) {
+		printf("  ");
+
+		for (j = 0; j < cols; j++) {
+			int n = j * rows + i;
+			int size = space;
+			if (n >= cmdname_cnt)
+				break;
+			if (j == cols-1 || n + rows >= cmdname_cnt)
+				size = 1;
+			printf("%-*s", size, cmdname[n]->name);
+		}
+		putchar('\n');
+	}
+}
+
+static void list_commands(const char *exec_path, const char *pattern)
+{
+	unsigned int longest = 0;
+	char path[PATH_MAX];
+	int dirlen;
+	DIR *dir = opendir(exec_path);
+	struct dirent *de;
+
+	if (!dir) {
+		fprintf(stderr, "git: '%s': %s\n", exec_path, strerror(errno));
+		exit(1);
+	}
+
+	dirlen = strlen(exec_path);
+	if (PATH_MAX - 20 < dirlen) {
+		fprintf(stderr, "git: insanely long exec-path '%s'\n",
+			exec_path);
+		exit(1);
+	}
+
+	memcpy(path, exec_path, dirlen);
+	path[dirlen++] = '/';
+
+	while ((de = readdir(dir)) != NULL) {
+		struct stat st;
+		int entlen;
+
+		if (strncmp(de->d_name, "git-", 4))
+			continue;
+		strcpy(path+dirlen, de->d_name);
+		if (stat(path, &st) || /* stat, not lstat */
+		    !S_ISREG(st.st_mode) ||
+		    !(st.st_mode & S_IXUSR))
+			continue;
+
+		entlen = strlen(de->d_name);
+		if (4 < entlen && !strcmp(de->d_name + entlen - 4, ".exe"))
+			entlen -= 4;
+
+		if (longest < entlen)
+			longest = entlen;
+
+		add_cmdname(de->d_name + 4, entlen-4);
+	}
+	closedir(dir);
+
+	printf("git commands available in '%s'\n", exec_path);
+	printf("----------------------------");
+	mput_char('-', strlen(exec_path));
+	putchar('\n');
+	pretty_print_string_list(cmdname, longest - 4);
+	putchar('\n');
+}
+
+static void list_common_cmds_help(void)
+{
+	int i, longest = 0;
+
+	for (i = 0; i < ARRAY_SIZE(common_cmds); i++) {
+		if (longest < strlen(common_cmds[i].name))
+			longest = strlen(common_cmds[i].name);
+	}
+
+	puts("The most commonly used git commands are:");
+	for (i = 0; i < ARRAY_SIZE(common_cmds); i++) {
+		printf("    %s", common_cmds[i].name);
+		mput_char(' ', longest - strlen(common_cmds[i].name) + 4);
+		puts(common_cmds[i].help);
+	}
+	puts("(use 'git help -a' to get a list of all installed git commands)");
+}
+
+void cmd_usage(int show_all, const char *exec_path, const char *fmt, ...)
+{
+	if (fmt) {
+		va_list ap;
+
+		va_start(ap, fmt);
+		printf("git: ");
+		vprintf(fmt, ap);
+		va_end(ap);
+		putchar('\n');
+	}
+	else
+		puts(git_usage);
+
+	if (exec_path) {
+		putchar('\n');
+		if (show_all)
+			list_commands(exec_path, "git-*");
+		else
+			list_common_cmds_help();
+        }
+
+	exit(1);
+}
+
+static void show_man_page(const char *git_cmd)
+{
+	const char *page;
+
+	if (!strncmp(git_cmd, "git", 3))
+		page = git_cmd;
+	else {
+		int page_len = strlen(git_cmd) + 4;
+		char *p = malloc(page_len + 1);
+		strcpy(p, "git-");
+		strcpy(p + 4, git_cmd);
+		p[page_len] = 0;
+		page = p;
+	}
+
+	execlp("man", "man", page, NULL);
+}
+
+int cmd_version(int argc, const char **argv, char **envp)
+{
+	printf("git version %s\n", git_version_string);
+	return 0;
+}
+
+int cmd_help(int argc, const char **argv, char **envp)
+{
+	const char *help_cmd = argv[1];
+	if (!help_cmd)
+		cmd_usage(0, git_exec_path(), NULL);
+	else if (!strcmp(help_cmd, "--all") || !strcmp(help_cmd, "-a"))
+		cmd_usage(1, git_exec_path(), NULL);
+	else
+		show_man_page(help_cmd);
+	return 0;
+}
+
+
diff --git a/builtin-log.c b/builtin-log.c
new file mode 100644
index 0000000..418101d
--- /dev/null
+++ b/builtin-log.c
@@ -0,0 +1,69 @@
+/*
+ * Builtin "git log" and related commands (show, whatchanged)
+ *
+ * (C) Copyright 2006 Linus Torvalds
+ *		 2006 Junio Hamano
+ */ 
+#include "cache.h"
+#include "commit.h"
+#include "diff.h"
+#include "revision.h"
+#include "log-tree.h"
+
+static int cmd_log_wc(int argc, const char **argv, char **envp,
+		      struct rev_info *rev)
+{
+	struct commit *commit;
+
+	rev->abbrev = DEFAULT_ABBREV;
+	rev->commit_format = CMIT_FMT_DEFAULT;
+	rev->verbose_header = 1;
+	argc = setup_revisions(argc, argv, rev, "HEAD");
+
+	if (argc > 1)
+		die("unrecognized argument: %s", argv[1]);
+
+	prepare_revision_walk(rev);
+	setup_pager();
+	while ((commit = get_revision(rev)) != NULL) {
+		log_tree_commit(rev, commit);
+		free(commit->buffer);
+		commit->buffer = NULL;
+	}
+	return 0;
+}
+
+int cmd_whatchanged(int argc, const char **argv, char **envp)
+{
+	struct rev_info rev;
+
+	init_revisions(&rev);
+	rev.diff = 1;
+	rev.diffopt.recursive = 1;
+	return cmd_log_wc(argc, argv, envp, &rev);
+}
+
+int cmd_show(int argc, const char **argv, char **envp)
+{
+	struct rev_info rev;
+
+	init_revisions(&rev);
+	rev.diff = 1;
+	rev.diffopt.recursive = 1;
+	rev.combine_merges = 1;
+	rev.dense_combined_merges = 1;
+	rev.always_show_header = 1;
+	rev.ignore_merges = 0;
+	rev.no_walk = 1;
+	return cmd_log_wc(argc, argv, envp, &rev);
+}
+
+int cmd_log(int argc, const char **argv, char **envp)
+{
+	struct rev_info rev;
+
+	init_revisions(&rev);
+	rev.always_show_header = 1;
+	rev.diffopt.recursive = 1;
+	return cmd_log_wc(argc, argv, envp, &rev);
+}
diff --git a/builtin.h b/builtin.h
new file mode 100644
index 0000000..47408a0
--- /dev/null
+++ b/builtin.h
@@ -0,0 +1,23 @@
+#ifndef BUILTIN_H
+#define BUILTIN_H
+
+#ifndef PATH_MAX
+# define PATH_MAX 4096
+#endif
+
+extern const char git_version_string[];
+
+void cmd_usage(int show_all, const char *exec_path, const char *fmt, ...)
+#ifdef __GNUC__
+	__attribute__((__format__(__printf__, 3, 4), __noreturn__))
+#endif
+	;
+
+extern int cmd_help(int argc, const char **argv, char **envp);
+extern int cmd_version(int argc, const char **argv, char **envp);
+
+extern int cmd_whatchanged(int argc, const char **argv, char **envp);
+extern int cmd_show(int argc, const char **argv, char **envp);
+extern int cmd_log(int argc, const char **argv, char **envp);
+
+#endif
diff --git a/git.c b/git.c
index 40b7e42..aa2b814 100644
--- a/git.c
+++ b/git.c
@@ -11,215 +11,8 @@ #include <stdarg.h>
 #include <sys/ioctl.h>
 #include "git-compat-util.h"
 #include "exec_cmd.h"
-#include "common-cmds.h"
 
-#include "cache.h"
-#include "commit.h"
-#include "diff.h"
-#include "revision.h"
-#include "log-tree.h"
-
-#ifndef PATH_MAX
-# define PATH_MAX 4096
-#endif
-
-static const char git_usage[] =
-	"Usage: git [--version] [--exec-path[=GIT_EXEC_PATH]] [--help] COMMAND [ ARGS ]";
-
-/* most gui terms set COLUMNS (although some don't export it) */
-static int term_columns(void)
-{
-	char *col_string = getenv("COLUMNS");
-	int n_cols = 0;
-
-	if (col_string && (n_cols = atoi(col_string)) > 0)
-		return n_cols;
-
-#ifdef TIOCGWINSZ
-	{
-		struct winsize ws;
-		if (!ioctl(1, TIOCGWINSZ, &ws)) {
-			if (ws.ws_col)
-				return ws.ws_col;
-		}
-	}
-#endif
-
-	return 80;
-}
-
-static void oom(void)
-{
-	fprintf(stderr, "git: out of memory\n");
-	exit(1);
-}
-
-static inline void mput_char(char c, unsigned int num)
-{
-	while(num--)
-		putchar(c);
-}
-
-static struct cmdname {
-	size_t len;
-	char name[1];
-} **cmdname;
-static int cmdname_alloc, cmdname_cnt;
-
-static void add_cmdname(const char *name, int len)
-{
-	struct cmdname *ent;
-	if (cmdname_alloc <= cmdname_cnt) {
-		cmdname_alloc = cmdname_alloc + 200;
-		cmdname = realloc(cmdname, cmdname_alloc * sizeof(*cmdname));
-		if (!cmdname)
-			oom();
-	}
-	ent = malloc(sizeof(*ent) + len);
-	if (!ent)
-		oom();
-	ent->len = len;
-	memcpy(ent->name, name, len);
-	ent->name[len] = 0;
-	cmdname[cmdname_cnt++] = ent;
-}
-
-static int cmdname_compare(const void *a_, const void *b_)
-{
-	struct cmdname *a = *(struct cmdname **)a_;
-	struct cmdname *b = *(struct cmdname **)b_;
-	return strcmp(a->name, b->name);
-}
-
-static void pretty_print_string_list(struct cmdname **cmdname, int longest)
-{
-	int cols = 1, rows;
-	int space = longest + 1; /* min 1 SP between words */
-	int max_cols = term_columns() - 1; /* don't print *on* the edge */
-	int i, j;
-
-	if (space < max_cols)
-		cols = max_cols / space;
-	rows = (cmdname_cnt + cols - 1) / cols;
-
-	qsort(cmdname, cmdname_cnt, sizeof(*cmdname), cmdname_compare);
-
-	for (i = 0; i < rows; i++) {
-		printf("  ");
-
-		for (j = 0; j < cols; j++) {
-			int n = j * rows + i;
-			int size = space;
-			if (n >= cmdname_cnt)
-				break;
-			if (j == cols-1 || n + rows >= cmdname_cnt)
-				size = 1;
-			printf("%-*s", size, cmdname[n]->name);
-		}
-		putchar('\n');
-	}
-}
-
-static void list_commands(const char *exec_path, const char *pattern)
-{
-	unsigned int longest = 0;
-	char path[PATH_MAX];
-	int dirlen;
-	DIR *dir = opendir(exec_path);
-	struct dirent *de;
-
-	if (!dir) {
-		fprintf(stderr, "git: '%s': %s\n", exec_path, strerror(errno));
-		exit(1);
-	}
-
-	dirlen = strlen(exec_path);
-	if (PATH_MAX - 20 < dirlen) {
-		fprintf(stderr, "git: insanely long exec-path '%s'\n",
-			exec_path);
-		exit(1);
-	}
-
-	memcpy(path, exec_path, dirlen);
-	path[dirlen++] = '/';
-
-	while ((de = readdir(dir)) != NULL) {
-		struct stat st;
-		int entlen;
-
-		if (strncmp(de->d_name, "git-", 4))
-			continue;
-		strcpy(path+dirlen, de->d_name);
-		if (stat(path, &st) || /* stat, not lstat */
-		    !S_ISREG(st.st_mode) ||
-		    !(st.st_mode & S_IXUSR))
-			continue;
-
-		entlen = strlen(de->d_name);
-		if (4 < entlen && !strcmp(de->d_name + entlen - 4, ".exe"))
-			entlen -= 4;
-
-		if (longest < entlen)
-			longest = entlen;
-
-		add_cmdname(de->d_name + 4, entlen-4);
-	}
-	closedir(dir);
-
-	printf("git commands available in '%s'\n", exec_path);
-	printf("----------------------------");
-	mput_char('-', strlen(exec_path));
-	putchar('\n');
-	pretty_print_string_list(cmdname, longest - 4);
-	putchar('\n');
-}
-
-static void list_common_cmds_help(void)
-{
-	int i, longest = 0;
-
-	for (i = 0; i < ARRAY_SIZE(common_cmds); i++) {
-		if (longest < strlen(common_cmds[i].name))
-			longest = strlen(common_cmds[i].name);
-	}
-
-	puts("The most commonly used git commands are:");
-	for (i = 0; i < ARRAY_SIZE(common_cmds); i++) {
-		printf("    %s", common_cmds[i].name);
-		mput_char(' ', longest - strlen(common_cmds[i].name) + 4);
-		puts(common_cmds[i].help);
-	}
-	puts("(use 'git help -a' to get a list of all installed git commands)");
-}
-
-#ifdef __GNUC__
-static void cmd_usage(int show_all, const char *exec_path, const char *fmt, ...)
-	__attribute__((__format__(__printf__, 3, 4), __noreturn__));
-#endif
-static void cmd_usage(int show_all, const char *exec_path, const char *fmt, ...)
-{
-	if (fmt) {
-		va_list ap;
-
-		va_start(ap, fmt);
-		printf("git: ");
-		vprintf(fmt, ap);
-		va_end(ap);
-		putchar('\n');
-	}
-	else
-		puts(git_usage);
-
-	if (exec_path) {
-		putchar('\n');
-		if (show_all)
-			list_commands(exec_path, "git-*");
-		else
-			list_common_cmds_help();
-        }
-
-	exit(1);
-}
+#include "builtin.h"
 
 static void prepend_to_path(const char *dir, int len)
 {
@@ -240,99 +33,7 @@ static void prepend_to_path(const char *
 	setenv("PATH", path, 1);
 }
 
-static void show_man_page(const char *git_cmd)
-{
-	const char *page;
-
-	if (!strncmp(git_cmd, "git", 3))
-		page = git_cmd;
-	else {
-		int page_len = strlen(git_cmd) + 4;
-		char *p = malloc(page_len + 1);
-		strcpy(p, "git-");
-		strcpy(p + 4, git_cmd);
-		p[page_len] = 0;
-		page = p;
-	}
-
-	execlp("man", "man", page, NULL);
-}
-
-static int cmd_version(int argc, const char **argv, char **envp)
-{
-	printf("git version %s\n", GIT_VERSION);
-	return 0;
-}
-
-static int cmd_help(int argc, const char **argv, char **envp)
-{
-	const char *help_cmd = argv[1];
-	if (!help_cmd)
-		cmd_usage(0, git_exec_path(), NULL);
-	else if (!strcmp(help_cmd, "--all") || !strcmp(help_cmd, "-a"))
-		cmd_usage(1, git_exec_path(), NULL);
-	else
-		show_man_page(help_cmd);
-	return 0;
-}
-
-static int cmd_log_wc(int argc, const char **argv, char **envp,
-		      struct rev_info *rev)
-{
-	struct commit *commit;
-
-	rev->abbrev = DEFAULT_ABBREV;
-	rev->commit_format = CMIT_FMT_DEFAULT;
-	rev->verbose_header = 1;
-	argc = setup_revisions(argc, argv, rev, "HEAD");
-
-	if (argc > 1)
-		die("unrecognized argument: %s", argv[1]);
-
-	prepare_revision_walk(rev);
-	setup_pager();
-	while ((commit = get_revision(rev)) != NULL) {
-		log_tree_commit(rev, commit);
-		free(commit->buffer);
-		commit->buffer = NULL;
-	}
-	return 0;
-}
-
-static int cmd_wc(int argc, const char **argv, char **envp)
-{
-	struct rev_info rev;
-
-	init_revisions(&rev);
-	rev.diff = 1;
-	rev.diffopt.recursive = 1;
-	return cmd_log_wc(argc, argv, envp, &rev);
-}
-
-static int cmd_show(int argc, const char **argv, char **envp)
-{
-	struct rev_info rev;
-
-	init_revisions(&rev);
-	rev.diff = 1;
-	rev.diffopt.recursive = 1;
-	rev.combine_merges = 1;
-	rev.dense_combined_merges = 1;
-	rev.always_show_header = 1;
-	rev.ignore_merges = 0;
-	rev.no_walk = 1;
-	return cmd_log_wc(argc, argv, envp, &rev);
-}
-
-static int cmd_log(int argc, const char **argv, char **envp)
-{
-	struct rev_info rev;
-
-	init_revisions(&rev);
-	rev.always_show_header = 1;
-	rev.diffopt.recursive = 1;
-	return cmd_log_wc(argc, argv, envp, &rev);
-}
+const char git_version_string[] = GIT_VERSION;
 
 static void handle_internal_command(int argc, const char **argv, char **envp)
 {
@@ -344,7 +45,7 @@ static void handle_internal_command(int 
 		{ "version", cmd_version },
 		{ "help", cmd_help },
 		{ "log", cmd_log },
-		{ "whatchanged", cmd_wc },
+		{ "whatchanged", cmd_whatchanged },
 		{ "show", cmd_show },
 	};
 	int i;

^ permalink raw reply related

* git-log produces no output
From: Bob Portmann @ 2006-04-21 17:20 UTC (permalink / raw)
  To: Git Mailing List

I am just starting out with git and have a noob question about git-log.
I cannot get any output out of it and am wondering if I am using it
correctly or it is broken.  As I understand it, git-log should just
print out the log messages but not the changes, whereas git-whatchanged
will print out both.  But while git-whatchanged works, git-log never
does.  I have a trivial example below which shows what I mean.  But I
get the same result using my real archives and out of git.git as well.

Thanks,
Bob

PS I'm using git 1.3.0 and have tried this on both Mac OS X and Linux
with the same results.

Trivial example:
git-test> mkdir test-log
git-test> cd test-log
test-log> git-init-db
defaulting to local storage area
test-log> echo "Hello World" >hello
test-log> git add .
test-log> git commit -a -m 'One line hello'
Committing initial tree 117c62a8c5e01758bd284126a6af69deab9dbbe2
test-log> echo "Hello World 2" >>hello
test-log> git commit -a -m 'Two line hello'
test-log> git whatchanged -p
diff-tree 9a4d7602fff052b6796c2862edddd11ae2e45d08 (from a38306518c5e5e8eb630c02
Author: Bob Portmann <portmann@xxxx.xx.xx>
Date:   Fri Apr 21 10:56:11 2006 -0600

    Two line hello

diff --git a/hello b/hello
index 557db03..514e5c5 100644
--- a/hello
+++ b/hello
@@ -1 +1,2 @@
 Hello World
+Hello World 2
test-log> git log
test-log> 

As you can see git log produces no output.  I've tried it with other
options with the same result.



^ permalink raw reply related

* Re: [RESEND] [PATCH] fix gitk with lots of tags
From: Linus Torvalds @ 2006-04-21 15:19 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: Jim Radford, Junio C Hamano, Git Mailing List
In-Reply-To: <17480.50829.466038.316769@cargo.ozlabs.ibm.com>



On Fri, 21 Apr 2006, Paul Mackerras wrote:
> 
> Junio, did you tell me some time ago about a flag to git-rev-parse
> that spits out just the file/directory names?  What was it again?

	git-rev-parse --no-flags --no-revs "$@"

should do what you want.

		Linus

^ permalink raw reply

* Re: [RESEND] [PATCH] fix gitk with lots of tags
From: Paul Mackerras @ 2006-04-21 11:48 UTC (permalink / raw)
  To: Jim Radford; +Cc: Junio C Hamano, Git Mailing List
In-Reply-To: <20060418180614.GA31543@blackbean.org>

Jim Radford writes:

> I've gotten no response from Paul on this patch[1].  If it seems ok to
> you, would you mind putting it in your queue for him?  I hate to see
> gitk die with "argument list too long" messages.  They're so 640k.

The reservation I have about this is that I need to be able to tell
the file/directory names from the tags/heads/SHA1 IDs.  After the pass
through git-rev-parse it's easy; I just take the things that match
^[a-f0-9]{40}$ as IDs and the rest as file/directory names or
switches.
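
The ID test described above can be sketched in C.  This is only an
illustration of the ^[a-f0-9]{40}$ rule, not code from gitk or
git-rev-parse; the function name looks_like_sha1 is made up for the
example.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return 1 if s is exactly 40 lowercase hex digits, i.e. it matches
 * ^[a-f0-9]{40}$, and 0 otherwise.  (Hypothetical helper.)
 */
static int looks_like_sha1(const char *s)
{
	size_t i;

	if (strlen(s) != 40)
		return 0;
	for (i = 0; i < 40; i++) {
		char c = s[i];
		if (!((c >= '0' && c <= '9') || (c >= 'a' && c <= 'f')))
			return 0;
	}
	return 1;
}
```

Anything that fails this test would then be treated as a file/directory
name or a switch, which is exactly why the check only works after
git-rev-parse has expanded tags and heads into full IDs.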

Junio, did you tell me some time ago about a flag to git-rev-parse
that spits out just the file/directory names?  What was it again?

> [1] Maybe he judges people by the color of their IP address?

As in _black_bean.org? :)

>     Then again, he could just be busy.

Yeah.  Or just returned from international travel, or something like
that. :)

Paul.

^ permalink raw reply

* Re: n-heads and patch dependency chains
From: Andreas Ericsson @ 2006-04-21  8:50 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7v4q0oyt3w.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano wrote:
> Jon Loeliger <jdl@freescale.com> writes:
> 
> 
>>On Tue, 2006-04-04 at 06:47, Andreas Ericsson wrote:
>>
>>
>>>No, I mean that this would commit both to the testing branch (being the 
>>>result of several merged topic-branches) and to the topic-branch merged 
>>>in. Commit as in regular commit, with a commit-message and a patch. The 
>>>resulting repository would be the exact same as if the change was 
>>>committed only to the topic-branch and then cherry-picked on to the 
>>>testing-branch.
> 
> 
> To be consistent, I think the result should be "as if the change
> was committed only to the topic-branch and then the topic-branch
> was *merged* into the testing-branch", since you start your
> testing branch as "being the result of several merged topic-branches".
> 
> I do that (manually) all the time, with:
> 
> 	$ git checkout next
>         $ hack hack hack
> 
>         $ git checkout -m one/topic
>         $ git commit -o this-path that-path
>         $ git checkout next
>         $ git pull . one/topic
> 
> Giving a short-hand for the last four-command sequence would
> certainly be nice.
> 

Ah. That's easier than what I originally looked at doing.

> 
>>I am your number one fan!  If I finish reading these 600+
>>messages, will I find out you have already implemented it,
>>it's committed, and you just need me to test it now? :-)
> 
> 
> Likewise... ;-)
> 

Sorry to disappoint you so far. I'll see if I can turn up my 
shell-skills a notch or two and get the hang of the commit-script enough 
to implement it.

-- 
Andreas Ericsson                   andreas.ericsson@op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

^ permalink raw reply

* Re: 1.3.0 creating bigger packs than 1.2.3
From: Nicolas Pitre @ 2006-04-21  3:07 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: Junio C Hamano, git
In-Reply-To: <20060421024012.GA1213@spearce.org>

On Thu, 20 Apr 2006, Shawn Pearce wrote:
> Nicolas Pitre <nico@cam.org> wrote:
> > With the patch above the Linux kernel pack is 0.3% smaller with 1% more 
> > CPU usage.  But like for the diff-delta hash list limiting code this 
> > small overhead is certainly a good compromise to avoid big degradations 
> > in some other cases.
> 
> Hmm.  See the email I just sent. I was seeing a good 10% increase
> in my own tests on a Linux kernel repository.  But I guess I can
> hope that my test was flawed somehow and it really is closer to a 1%
> increase in running time, making it more likely that the above fix
> makes it into GIT.

Well, I repeated the kernel run and this time it took 2.5% more CPU with 
the patch.

But the thing is that I get a +/- 1% difference between successive runs.  
So while the patch does add a certain overhead, it appears to be in the 
same range as noise here.


Nicolas

^ permalink raw reply

* Re: 1.3.0 creating bigger packs than 1.2.3
From: Shawn Pearce @ 2006-04-21  2:40 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Junio C Hamano, git
In-Reply-To: <Pine.LNX.4.64.0604202213470.2215@localhost.localdomain>

Nicolas Pitre <nico@cam.org> wrote:
> On Thu, 20 Apr 2006, Shawn Pearce wrote:
> 
> > Based on Linus' comment I changed your patch to just the following.
> > It still produced the 46M pack file, so the first hunk appears to
> > not have had much of an effect with this data.
> > 
> > From a running time perspective it appears as though this patch is
> > making things slightly better, not worse.  I ran it a few times
> > for each case always using the 46M pack as input for
> >  "git-repack -a -d -f".
> > 
> >   'next'       137.13 real        95.82 user        25.24 sys
> >   'next'+patch 131.62 real        89.35 user        28.56 sys
> > 
> > but even if the running time was an extra 6 seconds I'd still rather
> > spend 4% more running time to use 1/2 the storage space.
> > 
> > 
> > diff --git a/pack-objects.c b/pack-objects.c
> > index 09f4f2c..f7d6217 100644
> > --- a/pack-objects.c
> > +++ b/pack-objects.c
> > @@ -1052,7 +1052,7 @@ static int try_delta(struct unpacked *cu
> >         if (cur_entry->delta)
> >                 max_size = cur_entry->delta_size-1;
> >         if (sizediff >= max_size)
> > -               return -1;
> > +               return 0;
> >         delta_buf = diff_delta(old->data, oldsize,
> >                                cur->data, size, &delta_size, max_size);
> >         if (!delta_buf)
> 
> I can confirm this is indeed the best fix so far.  Any "smarter" 
> solution I could think of did increase the size of the final pack quite 
> spectacularly and rather unexpectedly with Shawn's repository.

Wow.  I'm such a trouble maker.  *grin*
 
> Of course removing the if (sizediff >= max_size) entirely does produce a 
> smaller pack (39MB) but it takes about twice the CPU.

Eh, that's not worth it.  7M disk space saved for twice the work isn't
that good of a tradeoff.  I'm not in favor of that version.

> With the patch above the Linux kernel pack is 0.3% smaller with 1% more 
> CPU usage.  But like for the diff-delta hash list limiting code this 
> small overhead is certainly a good compromise to avoid big degradations 
> in some other cases.

Hmm.  See the email I just sent. I was seeing a good 10% increase
in my own tests on a Linux kernel repository.  But I guess I can
hope that my test was flawed somehow and it really is closer to a 1%
increase in running time, making it more likely that the above fix
makes it into GIT.

-- 
Shawn.

^ permalink raw reply

* Re: 1.3.0 creating bigger packs than 1.2.3
From: Shawn Pearce @ 2006-04-21  2:32 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Nicolas Pitre, git
In-Reply-To: <20060421012029.GB819@spearce.org>

I just tried the patch below on a couple-month-old Linux 2.6
repository from Linus (last commit: Feb 14 2006).  It did not
decrease the pack file size by much despite the higher delta:

  'next'       Total 189435, written 189435 (delta 142093), reused 44057 (delta 0)
  'next'+patch Total 189435, written 189435 (delta 142712), reused 43954 (delta 0)

  'next'       104464297 bytes
  'next'+patch 104092920 bytes (99.6% of 'next')

  'next'       328.98 real       206.02 user        93.60 sys
  'next'+patch 363.06 real       218.98 user        94.72 sys

So it looks like the patch is taking longer to run, and by about 10%.
An expensive price to pay for what amounts to only a 0.4% reduction
in pack size on the kernel.
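
For anyone rechecking the percentages, the arithmetic can be sketched as
a small C helper.  The name percent_change is made up for this example;
the inputs are the byte counts and "real" times quoted above.

```c
/*
 * Relative change from base to value, in percent.
 * Positive means an increase, negative a decrease.
 */
static double percent_change(double base, double value)
{
	return (value - base) / base * 100.0;
}
```

Calling percent_change(104464297.0, 104092920.0) gives the change in
pack size, and percent_change(328.98, 363.06) the change in wall-clock
time; the same helper applied to the "user" column (206.02 vs 218.98)
shows a smaller increase than the "real" column does.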


Shawn Pearce <spearce@spearce.org> wrote:
> Based on Linus' comment I changed your patch to just the following.
> It still produced the 46M pack file, so the first hunk appears to
> not have had much of an effect with this data.
> 
> From a running time perspective it appears as though this patch is
> making things slightly better, not worse.  I ran it a few times
> for each case always using the 46M pack as input for
>  "git-repack -a -d -f".
> 
>   'next'       137.13 real        95.82 user        25.24 sys
>   'next'+patch 131.62 real        89.35 user        28.56 sys
> 
> but even if the running time was an extra 6 seconds I'd still rather
> spend 4% more running time to use 1/2 the storage space.
> 
> 
> diff --git a/pack-objects.c b/pack-objects.c
> index 09f4f2c..f7d6217 100644
> --- a/pack-objects.c
> +++ b/pack-objects.c
> @@ -1052,7 +1052,7 @@ static int try_delta(struct unpacked *cu
>         if (cur_entry->delta)
>                 max_size = cur_entry->delta_size-1;
>         if (sizediff >= max_size)
> -               return -1;
> +               return 0;
>         delta_buf = diff_delta(old->data, oldsize,
>                                cur->data, size, &delta_size, max_size);
>         if (!delta_buf)

-- 
Shawn.

^ permalink raw reply

* Re: 1.3.0 creating bigger packs than 1.2.3
From: Nicolas Pitre @ 2006-04-21  2:28 UTC (permalink / raw)
  To: Shawn Pearce; +Cc: Junio C Hamano, git
In-Reply-To: <20060421012029.GB819@spearce.org>

On Thu, 20 Apr 2006, Shawn Pearce wrote:

> Based on Linus' comment I changed your patch to just the following.
> It still produced the 46M pack file, so the first hunk appears to
> not have had much of an effect with this data.
> 
> From a running time perspective it appears as though this patch is
> making things slightly better, not worse.  I ran it a few times
> for each case always using the 46M pack as input for
>  "git-repack -a -d -f".
> 
>   'next'       137.13 real        95.82 user        25.24 sys
>   'next'+patch 131.62 real        89.35 user        28.56 sys
> 
> but even if the running time was an extra 6 seconds I'd still rather
> spend 4% more running time to use 1/2 the storage space.
> 
> 
> diff --git a/pack-objects.c b/pack-objects.c
> index 09f4f2c..f7d6217 100644
> --- a/pack-objects.c
> +++ b/pack-objects.c
> @@ -1052,7 +1052,7 @@ static int try_delta(struct unpacked *cu
>         if (cur_entry->delta)
>                 max_size = cur_entry->delta_size-1;
>         if (sizediff >= max_size)
> -               return -1;
> +               return 0;
>         delta_buf = diff_delta(old->data, oldsize,
>                                cur->data, size, &delta_size, max_size);
>         if (!delta_buf)

I can confirm this is indeed the best fix so far.  Any "smarter" 
solution I could think of did increase the size of the final pack quite 
spectacularly and rather unexpectedly with Shawn's repository.

Of course removing the if (sizediff >= max_size) entirely does produce a 
smaller pack (39MB) but it takes about twice the CPU.

With the patch above the Linux kernel pack is 0.3% smaller with 1% more 
CPU usage.  But like for the diff-delta hash list limiting code this 
small overhead is certainly a good compromise to avoid big degradations 
in some other cases.


Nicolas

^ permalink raw reply

* PATCH RESEND: git-svnimport memory leak plug
From: Auke Kok @ 2006-04-21  2:12 UTC (permalink / raw)
  To: git


With today's git-svnimport I was unable to convert a modest 140MB svn repository to git. The process would bomb out after 2000 revisions, and I have 20000. It consumed over 1.5GB of VM space on my 1GB machine and died with a 'cannot fork' error. This also killed my project server today after I discovered it the wrong way.

The original patch sent earlier by Santi Béjar fixes the problem. I include the patch again here so it can be merged. My mailer will probably nuke it, so here's the link to the archive post:

http://marc.theaimsgroup.com/?l=git&m=114345884526971&w=2


Please include this patch.

Cheers,

Auke



On 3/24/06, Santi Béjar <sbejar@gmail.com> wrote:
> Jan-Benedict Glaw <jbglaw@lug-owl.de> writes:
>
> > On Wed, 2006-03-22 14:33:37 +0100, Jan-Benedict Glaw <jbglaw@lug-owl.de> wrote:
> >
> > Since it seems nobody looked at the GCC import run (which means to use
> > the svnimport), I ran it again, under strace control:
> >
> >> GCC
> >> ~~~
> >> $ /home/jbglaw/bin/git svnimport -C gcc -v svn://gcc.gnu.org/svn/gcc
> >
> >> Committed change 3936:/ 1993-03-31 05:44:03)
> >> Commit ID ceff85145f8671fb2a9d826a761cedc2a507bd1e
> >> Writing to refs/heads/origin
> >> DONE: 3936 origin ceff85145f8671fb2a9d826a761cedc2a507bd1e
> >> ... 3937 trunk/gcc/final.c ...
> >> Can't fork at /home/jbglaw/bin/git-svnimport line 379.
> >
>
> I have the same (?) problem with one of my svn repository. It worked
> before (I've redone the import with the -r flag), so I bisected it.
> The problematic commit seems to be:
>
> diff-tree 4802426... (from 525c0d7...)
> Author: Karl  Hasselström <kha@treskal.com>
> Date:   Sun Feb 26 06:11:27 2006 +0100
>
>     svnimport: Convert executable flag
>
>     Convert the svn:executable property to file mode 755 when converting
>     an SVN repository to GIT.
>
>     Signed-off-by: Karl Hasselström <kha@treskal.com>
>     Signed-off-by: Junio C Hamano <junkio@cox.net>
>
> :100755 100755 ee2940f... 6603b96... M  git-svnimport.perl
>
> I think it has a memory leak, it used up to 140m of memory.
>
> $ git reset --hard 4802426^
> $ time ../git-svnimport.perl file:///path/
> Use of uninitialized value in string eq at ../git-svnimport.perl line 463.
> Use of uninitialized value in substitution (s///) at ../git-svnimport.perl line 466.
> real    0m55.801s
> user    0m30.578s
> sys     0m23.084s
>
> $ git reset --hard 4802426
> $ time ../git-svnimport.perl file:///path/
> Use of uninitialized value in string eq at ../git-svnimport.perl line 463.
> Use of uninitialized value in substitution (s///) at ../git-svnimport.perl line 466.
> Can't fork at /home/santi/usr/src/scm/git/git-svnimport.perl line 331.
> real    6m2.163s
> user    0m20.332s
> sys     0m50.180s
>
> and it didn't finish. Hope it helps.

And this patch fixes my problems.

---

Introduced in 4802426.

Signed-off-by: Santi Béjar <sbejar@gmail.com>
---
 git-svnimport.perl |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/git-svnimport.perl b/git-svnimport.perl
index 639aa41..f2cf062 100755
--- a/git-svnimport.perl
+++ b/git-svnimport.perl
@@ -135,8 +135,10 @@

        print "... $rev $path ...\n" if $opt_v;
        my (undef, $properties);
+       my $pool = SVN::Pool->new();
        eval { (undef, $properties)
-                  = $self->{'svn'}->get_file($path,$rev,$fh); };
+                  = $self->{'svn'}->get_file($path,$rev,$fh,$pool); };
+       $pool->clear;
        if($@) {
                return undef if $@ =~ /Attempted to get checksum/;
                die $@;

^ permalink raw reply related

* Re: 1.3.0 creating bigger packs than 1.2.3
From: Shawn Pearce @ 2006-04-21  1:20 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Nicolas Pitre, git
In-Reply-To: <7vfyk8vscl.fsf@assigned-by-dhcp.cox.net>

Based on Linus' comment I changed your patch to just the following.
It still produced the 46M pack file, so the first hunk appears to
not have had much of an effect with this data.

From a running time perspective it appears as though this patch is
making things slightly better, not worse.  I ran it a few times
for each case always using the 46M pack as input for
 "git-repack -a -d -f".

  'next'       137.13 real        95.82 user        25.24 sys
  'next'+patch 131.62 real        89.35 user        28.56 sys

but even if the running time was an extra 6 seconds I'd still rather
spend 4% more running time to use 1/2 the storage space.


diff --git a/pack-objects.c b/pack-objects.c
index 09f4f2c..f7d6217 100644
--- a/pack-objects.c
+++ b/pack-objects.c
@@ -1052,7 +1052,7 @@ static int try_delta(struct unpacked *cu
        if (cur_entry->delta)
                max_size = cur_entry->delta_size-1;
        if (sizediff >= max_size)
-               return -1;
+               return 0;
        delta_buf = diff_delta(old->data, oldsize,
                               cur->data, size, &delta_size, max_size);
        if (!delta_buf)

^ permalink raw reply related

* Re: 1.3.0 creating bigger packs than 1.2.3
From: Shawn Pearce @ 2006-04-21  1:01 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Nicolas Pitre, git
In-Reply-To: <7vy7xzvpsg.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano <junkio@cox.net> wrote:
[snip]
> I suspect the test patch makes pack-objects a lot more
> expensive.

Which patch are you talking about: the previous patch or the one in
the message I'm now replying to?

> The code before the test patch said "if the size is very small
> or size difference is too great, do not consider this, and do
> not consider any more objects in the delta window, because we
> know they are either even smaller of the same path, they have
> different names, or they are of different type".  The test patch
> you tried was a quick and dirty hack that said "under the
> too-small condition, skip this one, but keep trying the rest of
> the delta window".
> 
> Here is a cleaned up patch.  What it does is "under the
> too-small condition, see if the object has the same basename,
> and if so keep going, but otherwise skip the rest as before".
[snip]
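
The decision rule described in the quoted text can be sketched roughly
as follows.  This is a simplified illustration, not the real
try_delta(): the struct, the function name classify_candidate, the
value of DIRBITS and the hash layout (basename bits above DIRBITS) are
all assumptions made for the example, and the "already has a smaller
delta" adjustment of max_size is left out.

```c
#define DIRBITS 12	/* assumed value; low bits hold the directory part */

struct entry {
	unsigned long size;
	unsigned int hash;	/* assumed: basename hash in bits above DIRBITS */
};

/*
 * Decide what the delta window loop should do with one candidate base.
 * Returns  1: sizes are close enough, attempt a delta
 *          0: skip this candidate but keep scanning the window
 *         -1: stop scanning the window
 */
static int classify_candidate(const struct entry *cur,
			      const struct entry *old)
{
	unsigned long sizediff = old->size > cur->size ?
		old->size - cur->size : cur->size - old->size;

	if (cur->size >= 50) {
		unsigned long max_size = cur->size / 2 - 20;
		if (sizediff < max_size)
			return 1;
	}
	/* Too small or too different: keep going while basenames match */
	if (((cur->hash ^ old->hash) >> DIRBITS) == 0)
		return 0;
	return -1;
}
```

Under these assumptions, a too-small t/Makefile no longer ends the scan
as long as the next entries share its basename hash, which is the
behaviour the cleaned-up patch is aiming for.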

The patch below does not help very much:

  Total 46391, written 46391 (delta 6686), reused 37979 (delta 0)
  129M pack-7f766f5af5547554bacb28c0294bd562589dc5e7.pack

> diff --git a/pack-objects.c b/pack-objects.c
> index 09f4f2c..2173709 100644
> --- a/pack-objects.c
> +++ b/pack-objects.c
> @@ -1036,8 +1036,6 @@ static int try_delta(struct unpacked *cu
>  	oldsize = old_entry->size;
>  	sizediff = oldsize > size ? oldsize - size : size - oldsize;
>  
> -	if (size < 50)
> -		return -1;
>  	if (old_entry->depth >= max_depth)
>  		return 0;
>  
> @@ -1048,20 +1046,27 @@ static int try_delta(struct unpacked *cu
>  	 * more space-efficient (deletes don't have to say _what_ they
>  	 * delete).
>  	 */
> -	max_size = size / 2 - 20;
> -	if (cur_entry->delta)
> -		max_size = cur_entry->delta_size-1;
> -	if (sizediff >= max_size)
> -		return -1;
> -	delta_buf = diff_delta(old->data, oldsize,
> -			       cur->data, size, &delta_size, max_size);
> -	if (!delta_buf)
> +	if (50 <= size) {
> +		max_size = size / 2 - 20;
> +		if (cur_entry->delta)
> +			max_size = cur_entry->delta_size-1;
> +		if (sizediff < max_size) {
> +			delta_buf = diff_delta(old->data, oldsize,
> +					       cur->data, size,
> +					       &delta_size, max_size);
> +			if (!delta_buf)
> +				return 0;
> +			cur_entry->delta = old_entry;
> +			cur_entry->delta_size = delta_size;
> +			cur_entry->depth = old_entry->depth + 1;
> +			free(delta_buf);
> +			return 0;
> +		}
> +	}
> +	/* Keep going as long as the basename matches */
> +	if (((cur_entry->hash ^ old_entry->hash) >>DIRBITS) == 0)
>  		return 0;
> -	cur_entry->delta = old_entry;
> -	cur_entry->delta_size = delta_size;
> -	cur_entry->depth = old_entry->depth + 1;
> -	free(delta_buf);
> -	return 0;
> +	return -1;
>  }
>  
>  static void progress_interval(int signum)
> 

-- 
Shawn.

^ permalink raw reply

* Re: 1.3.0 creating bigger packs than 1.2.3
From: Nicolas Pitre @ 2006-04-21  0:52 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vfyk8vscl.fsf@assigned-by-dhcp.cox.net>

On Thu, 20 Apr 2006, Junio C Hamano wrote:

> Nicolas Pitre <nico@cam.org> writes:
> 
> >> But I suspect we have a built-in "we sort bigger to smaller, and
> >> we cut off when we switch bins" somewhere in find_delta() loop,
> >> which I do not recall touching when I did that change, so that
> >> may be interfering and preventing 0-11-AdjLite.deg from all over
> >> the place to delta against each other.
> >
> > I just cannot find something that would do that in the code.  When 
> > --no-reuse-delta is specified, the only things that will break the loop
> > in find_delta() is when try_delta() returns -1, and that happens only 
> > when changing object type or when the size difference is too big, but 
> > nothing looks at the name hash.
> 
> The list is sorted by type then hash then size (type_size_sort),
> so if you have t/Makefile that are big medium small too-small
> and then doc/Makefile that are big medium, once you see the
> too-small t/Makefile it would not consider the big doc/Makefile
> as a candidate X-<.

Bingo!  I didn't think it all through before.
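
The ordering problem can be reproduced with a small comparator.  This
is a sketch under assumptions, not the actual type_size_sort: the
struct layout is invented, and ascending order for type and hash is a
guess; only "type, then hash, then size, bigger first" comes from the
description above.

```c
#include <stdlib.h>

struct obj {
	int type;
	unsigned int hash;	/* name hash, e.g. one value per path */
	unsigned long size;
};

/* Order by type, then name hash, then size with bigger objects first. */
static int type_size_cmp(const void *pa, const void *pb)
{
	const struct obj *a = pa;
	const struct obj *b = pb;

	if (a->type != b->type)
		return a->type < b->type ? -1 : 1;
	if (a->hash != b->hash)
		return a->hash < b->hash ? -1 : 1;
	if (a->size != b->size)
		return a->size > b->size ? -1 : 1;
	return 0;
}
```

Sorting four same-type objects with sizes 500 and 5 under one hash and
900 and 100 under another yields 500, 5, 900, 100.  The 900-byte object
is therefore first compared against the 5-byte one, and if that
comparison ends the window scan, the 500-byte object is never tried as
a base.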


Nicolas

^ permalink raw reply

* [cogito-0.17.2] Bug in cg-log selection
From: Blaisorblade @ 2006-04-21  0:08 UTC (permalink / raw)
  To: Petr Baudis; +Cc: git

(Please CC me on replies as I'm not subscribed).

On a standard Linux tree:
$ cg-branch-ls
origin  
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-2.6.16.y.git

This command selects two commits instead of the single correct one (which is
selected by git-whatchanged):
$ cg-log -r v2.6.16:v2.6.16.9 kernel/power/process.c
shows commits 6b2467e45179a336f1e5b70d2b2ae1fe89a00133 and 
1dd6f008de5a04251d9cbe4c1cf67e4c708f9fe9, but the latter doesn't touch that 
file, and
$ git-whatchanged -p v2.6.16..v2.6.16.9 kernel/power/process.c
only shows the first commit.

Verbatim output of short listing:

$ cg-log -c -s -r v2.6.16:v2.6.16.9 kernel/power/process.c|cat
6b2467e45179a336f1e5b70d2b2ae1fe89a00133 Pavel Machek    2006-04-17 13:16 
[PATCH] Fix suspend with traced tasks
1dd6f008de5a04251d9cbe4c1cf67e4c708f9fe9 Jeff Garzik     2006-03-27 22:47 
[PATCH] sata_mv: fix irq port status usage

Bye
-- 
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade

	

	
		

^ permalink raw reply

* Cogito bug on Debian
From: Martin Langhoff @ 2006-04-20 23:17 UTC (permalink / raw)
  To: Git Mailing List, Petr Baudis

This was spotted circulating on Catalyst's IRC channel. Apparently,
the bug "causes non-serious data loss".

     http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=330031

cheers,


martin

^ permalink raw reply

