Git development

Git development
 help / color / mirror / Atom feed

* Re: bug: git-repack -a -d produces broken pack on NFS
From: Alex Riesen @ 2006-04-28 22:27 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Git Mailing List
In-Reply-To: <Pine.LNX.4.64.0604271500500.3701@g5.osdl.org>

Linus Torvalds, Fri, Apr 28, 2006 00:11:13 +0200:
> > NFS server: 2.6.15
> > Client: 2.6.17-rc2
> > mount options: tigra:/home /net/home nfs rw,nosuid,nodev,noatime,vers=3,rsize=8192,wsize=32768,hard,intr,proto=udp,timeo=7,retrans=3,addr=tigra 0 0
> 
> It's repeatable? Can you check if it goes away if your remove "intr"?

It does not go away if I remove intr:

    $ grep 'nfs\>' /proc/mounts
    tigra:/home /net/home nfs rw,nosuid,nodev,noatime,vers=3,rsize=8192,wsize=32768,hard,proto=udp,timeo=7,retrans=3,addr=tigra 0 0

And this is really a broken packfile:

    $ git fsck-objects --full
    git-fsck-objects: error: Packfile .git/objects/pack/pack-9021635f04e29bb9f3313a54124f64589eca5764.pack SHA1 mismatch with itself
    git-fsck-objects: fatal: failed to read delta-pack base object a23816d3e9a1684794c8e5a8f1cc0cce26fb61d8

And I actually was kind of sure about the hardware (like in: "it
worked flawlessly for in the past 2 years"). Until looked today in the
logs and saw this:

    Apr 19 11:49:35 tigra kernel: eth1: tx underrun with maximum tx threshold, txcfg 0xd0f0102e.
    Apr 19 11:49:35 tigra kernel: eth1: Link wake-up event 0xffffffff
    Apr 19 11:49:35 tigra kernel: eth1: PCI error 0xf00000

Well, this is actually not _that_ day. And this:

    Apr 28 23:42:19 tigra kernel: eth1: tx underrun with maximum tx threshold, txcfg 0xd0f0102e.

is not exactly the time of most recent test (the one without the "hard"
mount option). But this _is_ that very same interface, and "PCI error"
looks nasty. Ok, looking at the card... Seats kinda skewed in the
slot, pressing on it... Wow! (lights go out):

    Apr 29 00:13:35 tigra kernel: eth1: Link wake-up event 0x00020b
    Apr 29 00:13:35 tigra kernel: eth1: PCI error 0xf00000
    Apr 29 00:13:39 tigra kernel: NETDEV WATCHDOG: eth1: transmit timed out
    Apr 29 00:13:39 tigra kernel: eth1: Transmit timed out, status 0x000000, resetting...
    Apr 29 00:13:39 tigra kernel: eth1: DSPCFG accepted after 0 usec.
    Apr 29 00:13:39 tigra kernel: eth1: Setting full-duplex based on negotiated link capability.

Redoing test... (Two times only, it's late already):

    $SRC/test2.git$ git repack -a -d
    Generating pack...
    Done counting 235775 objects.
    Deltifying 235775 objects.
     100% (235775/235775) done
    Writing 235775 objects.
     100% (235775/235775) done
    Total 235775, written 235775 (delta 181885), reused 223766 (delta 171462)
    Pack pack-9021635f04e29bb9f3313a54124f64589eca5764 created.
    $SRC/test2.git$ git fsck-objects --full
    dangling blob 419301f9bff67932cb9551f2d8436b277a3022b0
    $SRC/test2.git$ git repack -a -d
    Generating pack...
    Done counting 235775 objects.
    Deltifying 235775 objects.
     100% (235775/235775) done
    Writing 235775 objects.
     100% (235775/235775) done
    Total 235775, written 235775 (delta 181958), reused 235702 (delta 181885)
    Pack pack-9021635f04e29bb9f3313a54124f64589eca5764 created.
    $SRC/test2.git$ git fsck-objects --full
    dangling blob 419301f9bff67932cb9551f2d8436b277a3022b0

Hmm... Ok, apologies everyone, I'm just lazy and stupid.

Still, would be nice not to loose a repository just because
user is an idiot.

^ permalink raw reply

* Re: bug: git-repack -a -d produces broken pack on NFS
From: Linus Torvalds @ 2006-04-28 23:18 UTC (permalink / raw)
  To: Alex Riesen; +Cc: Junio C Hamano, Git Mailing List
In-Reply-To: <20060428222750.GA6462@steel.home>

On Sat, 29 Apr 2006, Alex Riesen wrote:
>
> mount option). But this _is_ that very same interface, and "PCI error"
> looks nasty. Ok, looking at the card... Seats kinda skewed in the
> slot, pressing on it... Wow! (lights go out):
> 
>     Apr 29 00:13:35 tigra kernel: eth1: Link wake-up event 0x00020b
>     Apr 29 00:13:35 tigra kernel: eth1: PCI error 0xf00000
>     Apr 29 00:13:39 tigra kernel: NETDEV WATCHDOG: eth1: transmit timed out
>     Apr 29 00:13:39 tigra kernel: eth1: Transmit timed out, status 0x000000, resetting...
>     Apr 29 00:13:39 tigra kernel: eth1: DSPCFG accepted after 0 usec.
>     Apr 29 00:13:39 tigra kernel: eth1: Setting full-duplex based on negotiated link capability.

Ok, that "PCI error" meaning is not entirely clear, but it sounds like 
noise on the line. The driver just has a comment saying

        /* Hmmmmm, it's not clear how to recover from PCI faults. */

and it's actually pretty possible (and even likely) that whatever 
corrupted the pack-file happened on the _send_ side, so by the time we get 
a PCI error report, there's already a packet out the door that likely has 
corrupted data.

Now, corrupted data would normally be caught by the UDP checksum, but:

 - a lot of modern cards do the IP checksum on the card. Whether the 
   natsemi driver does or not, I have no clue. If it does, it would always 
   make the checksum match the (corrupted) data.

 - the IP-level checksums are really quite weak. Realistically, you really 
   really _really_ want to have link-layer checksums working, but since 
   the link-level checksum is _always_ computed by the card (and since a 
   PCI error would have corrupted the packet data that the card saw), 
   link-level checksums will always have the same error that the data got, 
   and pass.

   So even if we did the UDP checksum in software, it has a reasonable 
   chance of not triggering. It's just 16 bits, and not a _good_ 16 bits 
   at that.

Anyway, definitely looks hw-induced.

> Hmm... Ok, apologies everyone, I'm just lazy and stupid.

Having flaky hw is just unhappy. I'm glad git caught it quickly.

> Still, would be nice not to loose a repository just because
> user is an idiot.

Well, there's two sides to this:

 - packs are very much designed to be very very dense. That means that 
   even the _slightest_ corruption will likely totally destroy them. Even 
   a single-bit flip likely causes massive damage to an archive.

   This is a direct and unavoidable consequence of compression and 
   density.  A less dense format automatically has a lot more redundant 
   data, and the less dense it is, the more likely you can recover from 
   single-bit errors (or even worse ones).

   So this is the downside to compression and packing. All forms of 
   compression (and we do both traditional stream-compression with zlib 
   and aggressive delta compression between objects) inevitably make the 
   effects of corruption much worse, and much harder to recover from.

   That's the bad part.

 - The good part is that replication is obviously trivial, and git is 
   inherently very good at making incremental backups.

Now, we could try to have something that tries to recover as much of a 
corrupted pack-file as possible. Right now, git-unpack-objects just dies 
(as early as possible) if the pack is corrupt. Having some mode where it 
tries to recover from errors and continue would probably be a good 
debugging and recovery tool.

		Linus

^ permalink raw reply

* Do not use zlib 1.1.3 with git packs!
From: Johannes Schindelin @ 2006-04-29  0:52 UTC (permalink / raw)
  To: git

Hi,

I had a strange effect when trying to repack a git repository on my iBook: 
first, "git-repack -a -d" would quit without an error message when about 
57% of the objects were written (*not* when deltifying them!).

It became even stranger when I tracked it to a segmentation fault in 
adler32(), where the debugger insisted that a buffer was NULL, but the 
calling code insisted it was not.

Upgrading to zlib 1.2.3 helped. That is, after I had a complete systen 
fsck-up, since virtually every binary, including su, login and getty, are 
linked to zlib on Mac OS X. (Yeah, yeah, no Linux, I know.)

Ciao,
Dscho

P.S.: This _might_ be related to the git-repack issue that came up a few 
days ago.

^ permalink raw reply

* Re: Do not use zlib 1.1.3 with git packs!
From: Johannes Schindelin @ 2006-04-29  1:47 UTC (permalink / raw)
  To: git
In-Reply-To: <Pine.LNX.4.63.0604290245510.30565@wbgn013.biozentrum.uni-wuerzburg.de>

Hi,

On Sat, 29 Apr 2006, Johannes Schindelin wrote:

> Upgrading to zlib 1.2.3 helped.

Apparently I was too enthusiastic to have a working system again.

The problem showed again, but with a different repository.

This time, though, I have an idea what could be the culprit.

In create_delta(), there might be illegal accesses. The function adler32() 
is called for BLK_SIZE bytes (which is 16 bytes at the moment), starting 
from data, which is initially trg_buf, and is incremented until it is 
(trg_buf + trg_size).

I gather that close to the end, adler32() tries to read 15 bytes after the 
end of the allocated target buffer.

Am I wrong?

Ciao,
Dscho

^ permalink raw reply

* Re: Do not use zlib 1.1.3 with git packs!
From: Nicolas Pitre @ 2006-04-29  3:20 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git
In-Reply-To: <Pine.LNX.4.63.0604290341380.22696@wbgn013.biozentrum.uni-wuerzburg.de>

On Sat, 29 Apr 2006, Johannes Schindelin wrote:

> Hi,
> 
> On Sat, 29 Apr 2006, Johannes Schindelin wrote:
> 
> > Upgrading to zlib 1.2.3 helped.
> 
> Apparently I was too enthusiastic to have a working system again.
> 
> The problem showed again, but with a different repository.
> 
> This time, though, I have an idea what could be the culprit.
> 
> In create_delta(), there might be illegal accesses. The function adler32() 
> is called for BLK_SIZE bytes (which is 16 bytes at the moment), starting 
> from data, which is initially trg_buf, and is incremented until it is 
> (trg_buf + trg_size).
> 
> I gather that close to the end, adler32() tries to read 15 bytes after the 
> end of the allocated target buffer.
> 
> Am I wrong?

You're not.  My bad.

(I'm testing a version where adler32 has been replaced with rabin 
 polynomial so that issue will be gone at the same time.
 A patch should be coming in less than an hour.)

Nicolas

^ permalink raw reply

* [PATCH] replace adler32 with Rabin's polynomial in diff-delta
From: Nicolas Pitre @ 2006-04-29  4:58 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Schindelin, git

This brings another small repacking speedup for sensibly the same pack 
size.  On the Linux kernel repo, git-repack -a -f is 3.7% faster for a 
0.4% larger pack.

Credits to Geert Bosch who brought the Rabin's polynomial idea to my 
attention.

This also eliminate the issue of adler32() reading past the data buffer, 
as noticed by Johannes Schindelin.

Signed-off-by: Nicolas Pitre <nico@cam.org>

---

diff --git a/Makefile b/Makefile
index 8ce27a6..c13208f 100644
--- a/Makefile
+++ b/Makefile
@@ -609,7 +609,7 @@ test-date$X: test-date.c date.o ctype.o
 	$(CC) $(ALL_CFLAGS) -o $@ $(ALL_LDFLAGS) test-date.c date.o ctype.o
 
 test-delta$X: test-delta.c diff-delta.o patch-delta.o
-	$(CC) $(ALL_CFLAGS) -o $@ $(ALL_LDFLAGS) $^ -lz
+	$(CC) $(ALL_CFLAGS) -o $@ $(ALL_LDFLAGS) $^
 
 check:
 	for i in *.c; do sparse $(ALL_CFLAGS) $(SPARSE_FLAGS) $$i || exit; done
diff --git a/diff-delta.c b/diff-delta.c
index fdedf94..a40a89c 100644
--- a/diff-delta.c
+++ b/diff-delta.c
@@ -20,18 +20,106 @@
 
 #include <stdlib.h>
 #include <string.h>
-#include <zlib.h>
 #include "delta.h"
 
 
-/* block size: min = 16, max = 64k, power of 2 */
-#define BLK_SIZE 16
-
 /* maximum hash entry list for the same hash bucket */
 #define HASH_LIMIT 64
 
-#define GR_PRIME 0x9e370001
-#define HASH(v, shift) (((unsigned int)(v) * GR_PRIME) >> (shift))
+#define RABIN_SHIFT 23
+#define RABIN_WINDOW 16
+
+static const unsigned int T[256] = {
+	0x00000000, 0xab59b4d1, 0x56b369a2, 0xfdeadd73, 0x063f6795, 0xad66d344,
+	0x508c0e37, 0xfbd5bae6, 0x0c7ecf2a, 0xa7277bfb, 0x5acda688, 0xf1941259,
+	0x0a41a8bf, 0xa1181c6e, 0x5cf2c11d, 0xf7ab75cc, 0x18fd9e54, 0xb3a42a85,
+	0x4e4ef7f6, 0xe5174327, 0x1ec2f9c1, 0xb59b4d10, 0x48719063, 0xe32824b2,
+	0x1483517e, 0xbfdae5af, 0x423038dc, 0xe9698c0d, 0x12bc36eb, 0xb9e5823a,
+	0x440f5f49, 0xef56eb98, 0x31fb3ca8, 0x9aa28879, 0x6748550a, 0xcc11e1db,
+	0x37c45b3d, 0x9c9defec, 0x6177329f, 0xca2e864e, 0x3d85f382, 0x96dc4753,
+	0x6b369a20, 0xc06f2ef1, 0x3bba9417, 0x90e320c6, 0x6d09fdb5, 0xc6504964,
+	0x2906a2fc, 0x825f162d, 0x7fb5cb5e, 0xd4ec7f8f, 0x2f39c569, 0x846071b8,
+	0x798aaccb, 0xd2d3181a, 0x25786dd6, 0x8e21d907, 0x73cb0474, 0xd892b0a5,
+	0x23470a43, 0x881ebe92, 0x75f463e1, 0xdeadd730, 0x63f67950, 0xc8afcd81,
+	0x354510f2, 0x9e1ca423, 0x65c91ec5, 0xce90aa14, 0x337a7767, 0x9823c3b6,
+	0x6f88b67a, 0xc4d102ab, 0x393bdfd8, 0x92626b09, 0x69b7d1ef, 0xc2ee653e,
+	0x3f04b84d, 0x945d0c9c, 0x7b0be704, 0xd05253d5, 0x2db88ea6, 0x86e13a77,
+	0x7d348091, 0xd66d3440, 0x2b87e933, 0x80de5de2, 0x7775282e, 0xdc2c9cff,
+	0x21c6418c, 0x8a9ff55d, 0x714a4fbb, 0xda13fb6a, 0x27f92619, 0x8ca092c8,
+	0x520d45f8, 0xf954f129, 0x04be2c5a, 0xafe7988b, 0x5432226d, 0xff6b96bc,
+	0x02814bcf, 0xa9d8ff1e, 0x5e738ad2, 0xf52a3e03, 0x08c0e370, 0xa39957a1,
+	0x584ced47, 0xf3155996, 0x0eff84e5, 0xa5a63034, 0x4af0dbac, 0xe1a96f7d,
+	0x1c43b20e, 0xb71a06df, 0x4ccfbc39, 0xe79608e8, 0x1a7cd59b, 0xb125614a,
+	0x468e1486, 0xedd7a057, 0x103d7d24, 0xbb64c9f5, 0x40b17313, 0xebe8c7c2,
+	0x16021ab1, 0xbd5bae60, 0x6cb54671, 0xc7ecf2a0, 0x3a062fd3, 0x915f9b02,
+	0x6a8a21e4, 0xc1d39535, 0x3c394846, 0x9760fc97, 0x60cb895b, 0xcb923d8a,
+	0x3678e0f9, 0x9d215428, 0x66f4eece, 0xcdad5a1f, 0x3047876c, 0x9b1e33bd,
+	0x7448d825, 0xdf116cf4, 0x22fbb187, 0x89a20556, 0x7277bfb0, 0xd92e0b61,
+	0x24c4d612, 0x8f9d62c3, 0x7836170f, 0xd36fa3de, 0x2e857ead, 0x85dcca7c,
+	0x7e09709a, 0xd550c44b, 0x28ba1938, 0x83e3ade9, 0x5d4e7ad9, 0xf617ce08,
+	0x0bfd137b, 0xa0a4a7aa, 0x5b711d4c, 0xf028a99d, 0x0dc274ee, 0xa69bc03f,
+	0x5130b5f3, 0xfa690122, 0x0783dc51, 0xacda6880, 0x570fd266, 0xfc5666b7,
+	0x01bcbbc4, 0xaae50f15, 0x45b3e48d, 0xeeea505c, 0x13008d2f, 0xb85939fe,
+	0x438c8318, 0xe8d537c9, 0x153feaba, 0xbe665e6b, 0x49cd2ba7, 0xe2949f76,
+	0x1f7e4205, 0xb427f6d4, 0x4ff24c32, 0xe4abf8e3, 0x19412590, 0xb2189141,
+	0x0f433f21, 0xa41a8bf0, 0x59f05683, 0xf2a9e252, 0x097c58b4, 0xa225ec65,
+	0x5fcf3116, 0xf49685c7, 0x033df00b, 0xa86444da, 0x558e99a9, 0xfed72d78,
+	0x0502979e, 0xae5b234f, 0x53b1fe3c, 0xf8e84aed, 0x17bea175, 0xbce715a4,
+	0x410dc8d7, 0xea547c06, 0x1181c6e0, 0xbad87231, 0x4732af42, 0xec6b1b93,
+	0x1bc06e5f, 0xb099da8e, 0x4d7307fd, 0xe62ab32c, 0x1dff09ca, 0xb6a6bd1b,
+	0x4b4c6068, 0xe015d4b9, 0x3eb80389, 0x95e1b758, 0x680b6a2b, 0xc352defa,
+	0x3887641c, 0x93ded0cd, 0x6e340dbe, 0xc56db96f, 0x32c6cca3, 0x999f7872,
+	0x6475a501, 0xcf2c11d0, 0x34f9ab36, 0x9fa01fe7, 0x624ac294, 0xc9137645,
+	0x26459ddd, 0x8d1c290c, 0x70f6f47f, 0xdbaf40ae, 0x207afa48, 0x8b234e99,
+	0x76c993ea, 0xdd90273b, 0x2a3b52f7, 0x8162e626, 0x7c883b55, 0xd7d18f84,
+	0x2c043562, 0x875d81b3, 0x7ab75cc0, 0xd1eee811
+};
+
+static const unsigned int U[256] = {
+	0x00000000, 0x7eb5200d, 0x5633f4cb, 0x2886d4c6, 0x073e5d47, 0x798b7d4a,
+	0x510da98c, 0x2fb88981, 0x0e7cba8e, 0x70c99a83, 0x584f4e45, 0x26fa6e48,
+	0x0942e7c9, 0x77f7c7c4, 0x5f711302, 0x21c4330f, 0x1cf9751c, 0x624c5511,
+	0x4aca81d7, 0x347fa1da, 0x1bc7285b, 0x65720856, 0x4df4dc90, 0x3341fc9d,
+	0x1285cf92, 0x6c30ef9f, 0x44b63b59, 0x3a031b54, 0x15bb92d5, 0x6b0eb2d8,
+	0x4388661e, 0x3d3d4613, 0x39f2ea38, 0x4747ca35, 0x6fc11ef3, 0x11743efe,
+	0x3eccb77f, 0x40799772, 0x68ff43b4, 0x164a63b9, 0x378e50b6, 0x493b70bb,
+	0x61bda47d, 0x1f088470, 0x30b00df1, 0x4e052dfc, 0x6683f93a, 0x1836d937,
+	0x250b9f24, 0x5bbebf29, 0x73386bef, 0x0d8d4be2, 0x2235c263, 0x5c80e26e,
+	0x740636a8, 0x0ab316a5, 0x2b7725aa, 0x55c205a7, 0x7d44d161, 0x03f1f16c,
+	0x2c4978ed, 0x52fc58e0, 0x7a7a8c26, 0x04cfac2b, 0x73e5d470, 0x0d50f47d,
+	0x25d620bb, 0x5b6300b6, 0x74db8937, 0x0a6ea93a, 0x22e87dfc, 0x5c5d5df1,
+	0x7d996efe, 0x032c4ef3, 0x2baa9a35, 0x551fba38, 0x7aa733b9, 0x041213b4,
+	0x2c94c772, 0x5221e77f, 0x6f1ca16c, 0x11a98161, 0x392f55a7, 0x479a75aa,
+	0x6822fc2b, 0x1697dc26, 0x3e1108e0, 0x40a428ed, 0x61601be2, 0x1fd53bef,
+	0x3753ef29, 0x49e6cf24, 0x665e46a5, 0x18eb66a8, 0x306db26e, 0x4ed89263,
+	0x4a173e48, 0x34a21e45, 0x1c24ca83, 0x6291ea8e, 0x4d29630f, 0x339c4302,
+	0x1b1a97c4, 0x65afb7c9, 0x446b84c6, 0x3adea4cb, 0x1258700d, 0x6ced5000,
+	0x4355d981, 0x3de0f98c, 0x15662d4a, 0x6bd30d47, 0x56ee4b54, 0x285b6b59,
+	0x00ddbf9f, 0x7e689f92, 0x51d01613, 0x2f65361e, 0x07e3e2d8, 0x7956c2d5,
+	0x5892f1da, 0x2627d1d7, 0x0ea10511, 0x7014251c, 0x5facac9d, 0x21198c90,
+	0x099f5856, 0x772a785b, 0x4c921c31, 0x32273c3c, 0x1aa1e8fa, 0x6414c8f7,
+	0x4bac4176, 0x3519617b, 0x1d9fb5bd, 0x632a95b0, 0x42eea6bf, 0x3c5b86b2,
+	0x14dd5274, 0x6a687279, 0x45d0fbf8, 0x3b65dbf5, 0x13e30f33, 0x6d562f3e,
+	0x506b692d, 0x2ede4920, 0x06589de6, 0x78edbdeb, 0x5755346a, 0x29e01467,
+	0x0166c0a1, 0x7fd3e0ac, 0x5e17d3a3, 0x20a2f3ae, 0x08242768, 0x76910765,
+	0x59298ee4, 0x279caee9, 0x0f1a7a2f, 0x71af5a22, 0x7560f609, 0x0bd5d604,
+	0x235302c2, 0x5de622cf, 0x725eab4e, 0x0ceb8b43, 0x246d5f85, 0x5ad87f88,
+	0x7b1c4c87, 0x05a96c8a, 0x2d2fb84c, 0x539a9841, 0x7c2211c0, 0x029731cd,
+	0x2a11e50b, 0x54a4c506, 0x69998315, 0x172ca318, 0x3faa77de, 0x411f57d3,
+	0x6ea7de52, 0x1012fe5f, 0x38942a99, 0x46210a94, 0x67e5399b, 0x19501996,
+	0x31d6cd50, 0x4f63ed5d, 0x60db64dc, 0x1e6e44d1, 0x36e89017, 0x485db01a,
+	0x3f77c841, 0x41c2e84c, 0x69443c8a, 0x17f11c87, 0x38499506, 0x46fcb50b,
+	0x6e7a61cd, 0x10cf41c0, 0x310b72cf, 0x4fbe52c2, 0x67388604, 0x198da609,
+	0x36352f88, 0x48800f85, 0x6006db43, 0x1eb3fb4e, 0x238ebd5d, 0x5d3b9d50,
+	0x75bd4996, 0x0b08699b, 0x24b0e01a, 0x5a05c017, 0x728314d1, 0x0c3634dc,
+	0x2df207d3, 0x534727de, 0x7bc1f318, 0x0574d315, 0x2acc5a94, 0x54797a99,
+	0x7cffae5f, 0x024a8e52, 0x06852279, 0x78300274, 0x50b6d6b2, 0x2e03f6bf,
+	0x01bb7f3e, 0x7f0e5f33, 0x57888bf5, 0x293dabf8, 0x08f998f7, 0x764cb8fa,
+	0x5eca6c3c, 0x207f4c31, 0x0fc7c5b0, 0x7172e5bd, 0x59f4317b, 0x27411176,
+	0x1a7c5765, 0x64c97768, 0x4c4fa3ae, 0x32fa83a3, 0x1d420a22, 0x63f72a2f,
+	0x4b71fee9, 0x35c4dee4, 0x1400edeb, 0x6ab5cde6, 0x42331920, 0x3c86392d,
+	0x133eb0ac, 0x6d8b90a1, 0x450d4467, 0x3bb8646a
+};
 
 struct index_entry {
 	const unsigned char *ptr;
@@ -42,13 +130,13 @@ struct index_entry {
 struct delta_index {
 	const void *src_buf;
 	unsigned long src_size;
-	unsigned int hash_shift;
+	unsigned int hash_mask;
 	struct index_entry *hash[0];
 };
 
 struct delta_index * create_delta_index(const void *buf, unsigned long bufsize)
 {
-	unsigned int i, hsize, hshift, entries, *hash_count;
+	unsigned int i, hsize, hmask, entries, *hash_count;
 	const unsigned char *data, *buffer = buf;
 	struct delta_index *index;
 	struct index_entry *entry, **hash;
@@ -57,12 +145,14 @@ struct delta_index * create_delta_index(
 	if (!buf || !bufsize)
 		return NULL;
 
-	/* determine index hash size */
-	entries = bufsize  / BLK_SIZE;
+	/* Determine index hash size.  Note that indexing skips the
+	   first byte to allow for optimizing the rabin polynomial
+	   initialization in create_delta(). */
+	entries = (bufsize - 1)  / RABIN_WINDOW;
 	hsize = entries / 4;
 	for (i = 4; (1 << i) < hsize && i < 31; i++);
 	hsize = 1 << i;
-	hshift = 32 - i;
+	hmask = hsize - 1;
 
 	/* allocate lookup index */
 	mem = malloc(sizeof(*index) +
@@ -78,7 +168,7 @@ struct delta_index * create_delta_index(
 
 	index->src_buf = buf;
 	index->src_size = bufsize;
-	index->hash_shift = hshift;
+	index->hash_mask = hmask;
 	memset(hash, 0, hsize * sizeof(*hash));
 
 	/* allocate an array to count hash entries */
@@ -89,17 +179,19 @@ struct delta_index * create_delta_index(
 	}
 
 	/* then populate the index */
-	data = buffer + entries * BLK_SIZE - BLK_SIZE;
+	data = buffer + entries * RABIN_WINDOW - RABIN_WINDOW;
 	while (data >= buffer) {
-		unsigned int val = adler32(0, data, BLK_SIZE);
-		i = HASH(val, hshift);
-		entry->ptr = data;
+		unsigned int val = 0;
+		for (i = 1; i <= RABIN_WINDOW; i++)
+			val = ((val << 8) | data[i]) ^ T[val >> RABIN_SHIFT];
+		i = val & hmask;
+		entry->ptr = data + RABIN_WINDOW;
 		entry->val = val;
 		entry->next = hash[i];
 		hash[i] = entry++;
 		hash_count[i]++;
-		data -= BLK_SIZE;
- 	}
+		data -= RABIN_WINDOW;
+	}
 
 	/*
 	 * Determine a limit on the number of entries in the same hash
@@ -136,20 +228,18 @@ void free_delta_index(struct delta_index
 	free(index);
 }
 
-/* provide the size of the copy opcode given the block offset and size */
-#define COPYOP_SIZE(o, s) \
-    (!!(o & 0xff) + !!(o & 0xff00) + !!(o & 0xff0000) + !!(o & 0xff000000) + \
-     !!(s & 0xff) + !!(s & 0xff00) + 1)
-
-/* the maximum size for any opcode */
-#define MAX_OP_SIZE COPYOP_SIZE(0xffffffff, 0xffffffff)
+/*
+ * The maximum size for any opcode sequence, including the initial header
+ * plus rabin window plus biggest copy.
+ */
+#define MAX_OP_SIZE	(5 + 5 + 1 + RABIN_WINDOW + 7) 
 
 void *
 create_delta(const struct delta_index *index,
 	     const void *trg_buf, unsigned long trg_size,
 	     unsigned long *delta_size, unsigned long max_size)
 {
-	unsigned int i, outpos, outsize, hash_shift;
+	unsigned int i, outpos, outsize, hash_mask, val;
 	int inscnt;
 	const unsigned char *ref_data, *ref_top, *data, *top;
 	unsigned char *out;
@@ -185,14 +275,22 @@ create_delta(const struct delta_index *i
 	ref_top = ref_data + index->src_size;
 	data = trg_buf;
 	top = trg_buf + trg_size;
-	hash_shift = index->hash_shift;
-	inscnt = 0;
+	hash_mask = index->hash_mask;
+
+	outpos++;
+	val = 0;
+	for (i = 0; i < RABIN_WINDOW && data < top; i++, data++) {
+		out[outpos++] = *data;
+		val = ((val << 8) | *data) ^ T[val >> RABIN_SHIFT];
+	}
+	inscnt = i;
 
 	while (data < top) {
 		unsigned int moff = 0, msize = 0;
 		struct index_entry *entry;
-		unsigned int val = adler32(0, data, BLK_SIZE);
-		i = HASH(val, hash_shift);
+		val ^= U[data[-RABIN_WINDOW]];
+		val = ((val << 8) | *data) ^ T[val >> RABIN_SHIFT];
+		i = val & hash_mask;
 		for (entry = index->hash[i]; entry; entry = entry->next) {
 			const unsigned char *ref = entry->ptr;
 			const unsigned char *src = data;
@@ -214,7 +312,7 @@ create_delta(const struct delta_index *i
 			}
 		}
 
-		if (!msize || msize < COPYOP_SIZE(moff, msize)) {
+		if (msize < 4) {
 			if (!inscnt)
 				outpos++;
 			out[outpos++] = *data++;
@@ -226,6 +324,20 @@ create_delta(const struct delta_index *i
 		} else {
 			unsigned char *op;
 
+			if (msize >= RABIN_WINDOW) {
+				const unsigned char *sk;
+				sk = data + msize - RABIN_WINDOW;
+				val = 0;
+				for (i = 0; i < RABIN_WINDOW; i++)
+					val = ((val << 8) | *sk++) ^ T[val >> RABIN_SHIFT];
+			} else {
+				const unsigned char *sk = data + 1;
+				for (i = 1; i < msize; i++) {
+					val ^= U[sk[-RABIN_WINDOW]];
+					val = ((val << 8) | *sk++) ^ T[val >> RABIN_SHIFT];
+				}
+			}
+
 			if (inscnt) {
 				while (moff && ref_data[moff-1] == data[-1]) {
 					if (msize == 0x10000)
@@ -270,9 +382,8 @@ create_delta(const struct delta_index *i
 			if (max_size && outsize >= max_size)
 				outsize = max_size + MAX_OP_SIZE + 1;
 			if (max_size && outpos > max_size)
-				out = NULL;
-			else
-				out = realloc(out, outsize);
+				break;
+			out = realloc(out, outsize);
 			if (!out) {
 				free(tmp);
 				return NULL;
@@ -283,6 +394,11 @@ create_delta(const struct delta_index *i
 	if (inscnt)
 		out[outpos - inscnt - 1] = inscnt;
 
+	if (max_size && outpos > max_size) {
+		free(out);
+		return NULL;
+	}
+
 	*delta_size = outpos;
 	return out;
 }

^ permalink raw reply related

* Re: Two gitweb feature requests
From: Jeff King @ 2006-04-29  5:02 UTC (permalink / raw)
  To: git
In-Reply-To: <Pine.LNX.4.64.0604281116020.3701@g5.osdl.org>

On Fri, Apr 28, 2006 at 11:23:11AM -0700, Linus Torvalds wrote:

> The downside is that you'd have two different web-pages for the same tree 
> depending on which commit it came from. Which is not a downside from a 
> user perspective, but it's a downside from a caching/server perspective, 
> since it means less reuse of pages (maybe gitweb already does that, 
> though).

The gitweb request for a tree already contains not only the tree hash,
but also the commit hash and the filename path. It's possible (but more
expensive than typical tree requests) to find '..' by munging the path
and traversing the tree from the root.

-Peff

^ permalink raw reply

* Re: Do not use zlib 1.1.3 with git packs!
From: Junio C Hamano @ 2006-04-29  6:36 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: git, Johannes Schindelin
In-Reply-To: <Pine.LNX.4.64.0604282316520.18816@localhost.localdomain>

Nicolas Pitre <nico@cam.org> writes:

> On Sat, 29 Apr 2006, Johannes Schindelin wrote:
>
>> I gather that close to the end, adler32() tries to read 15 bytes after the 
>> end of the allocated target buffer.
>> 
>> Am I wrong?
>
> You're not.  My bad.

Thanks both, for catching this while still in "next".

^ permalink raw reply

* [PATCH] built-in diff.
From: Junio C Hamano @ 2006-04-29  7:19 UTC (permalink / raw)
  To: git

This starts to replace the shell script version of "git diff".

Signed-off-by: Junio C Hamano <junkio@cox.net>
---

 * Should apply on top of "next" with trivial conflict fixups
   (hint: "git am -3").

   It is not fully compatible with the shell script version yet,
   so it is called "git diffn" for now.  It does not do the
   fancy option defaulting, other than "git diffn --cached" to
   mean "git diffn --cached HEAD", and I do not think I got the
   defaulting to --cc right either.  Also it does not default to
   do -M either.  If somebody cares deeply, patches are welcome.
   When its option defaulting gets compatible with the shell
   script version, or when its new default options becomes
   accepted, we will rename it to "git diff" and merge it in.

   To ask for the diff-raw output, you can give --raw option.

 Makefile       |    2 
 builtin-diff.c |  332 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 builtin.h      |    1 
 git.c          |    1 
 4 files changed, 335 insertions(+), 1 deletions(-)

diff --git a/Makefile b/Makefile
index 8ce27a6..277c1ac 100644
--- a/Makefile
+++ b/Makefile
@@ -214,7 +214,7 @@ LIB_OBJS = \
 	$(DIFF_OBJS)
 
 BUILTIN_OBJS = \
-	builtin-log.o builtin-help.o
+	builtin-log.o builtin-help.o builtin-diff.o
 
 GITLIBS = $(LIB_FILE) $(XDIFF_LIB)
 LIBS = $(GITLIBS) -lz
diff --git a/builtin-diff.c b/builtin-diff.c
new file mode 100644
index 0000000..1968212
--- /dev/null
+++ b/builtin-diff.c
@@ -0,0 +1,332 @@
+/*
+ * Builtin "git diff"
+ *
+ * Copyright (c) 2006 Junio C Hamano
+ */
+#include "cache.h"
+#include "commit.h"
+#include "blob.h"
+#include "tag.h"
+#include "diff.h"
+#include "diffcore.h"
+#include "revision.h"
+#include "log-tree.h"
+#include "builtin.h"
+
+/* NEEDSWORK: struct object has place for name but we _do_
+ * know mode when we extracted the blob out of a tree, which
+ * we currently lose.
+ */
+struct blobinfo {
+	unsigned char sha1[20];
+	const char *name;
+};
+
+static const char builtin_diff_usage[] =
+"diff <options> <rev>{0,2} -- <path>*";
+
+static int builtin_diff_files(struct rev_info *revs,
+			      int argc, const char **argv)
+{
+	int silent = 0;
+	while (1 < argc) {
+		const char *arg = argv[1];
+		if (!strcmp(arg, "--base"))
+			revs->max_count = 1;
+		else if (!strcmp(arg, "--ours"))
+			revs->max_count = 2;
+		else if (!strcmp(arg, "--theirs"))
+			revs->max_count = 3;
+		else if (!strcmp(arg, "-q"))
+			silent = 1;
+		else if (!strcmp(arg, "--raw"))
+			revs->diffopt.output_format = DIFF_FORMAT_RAW;
+		else
+			usage(builtin_diff_usage);
+		argv++; argc--;
+	}
+	/*
+	 * Make sure there are NO revision (i.e. pending object) parameter,
+	 * rev.max_count is reasonable (0 <= n <= 3),
+	 * there is no other revision filtering parameters.
+	 */
+	if (revs->pending_objects ||
+	    revs->min_age != -1 ||
+	    revs->max_age != -1)
+		usage(builtin_diff_usage);
+	/*
+	 * Backward compatibility wart - "diff-files -s" used to
+	 * defeat the common diff option "-s" which asked for
+	 * DIFF_FORMAT_NO_OUTPUT.
+	 */
+	if (revs->diffopt.output_format == DIFF_FORMAT_NO_OUTPUT)
+		revs->diffopt.output_format = DIFF_FORMAT_RAW;
+	return run_diff_files(revs, silent);
+}
+
+static void stuff_change(struct diff_options *opt,
+			 unsigned old_mode, unsigned new_mode,
+			 const unsigned char *old_sha1,
+			 const unsigned char *new_sha1,
+			 const char *old_name,
+			 const char *new_name)
+{
+	struct diff_filespec *one, *two;
+
+	if (memcmp(null_sha1, old_sha1, 20) &&
+	    memcmp(null_sha1, new_sha1, 20) &&
+	    !memcmp(old_sha1, new_sha1, 20))
+		return;
+
+	if (opt->reverse_diff) {
+		unsigned tmp;
+		const
+			const unsigned char *tmp_u;
+		const char *tmp_c;
+		tmp = old_mode; old_mode = new_mode; new_mode = tmp;
+		tmp_u = old_sha1; old_sha1 = new_sha1; new_sha1 = tmp_u;
+		tmp_c = old_name; old_name = new_name; new_name = tmp_c;
+	}
+	one = alloc_filespec(old_name);
+	two = alloc_filespec(new_name);
+	fill_filespec(one, old_sha1, old_mode);
+	fill_filespec(two, new_sha1, new_mode);
+
+	/* NEEDSWORK: shouldn't this part of diffopt??? */
+	diff_queue(&diff_queued_diff, one, two);
+}
+
+static int builtin_diff_b_f(struct rev_info *revs,
+			    int argc, const char **argv,
+			    struct blobinfo *blob,
+			    const char *path)
+{
+	/* Blob vs file in the working tree*/
+	struct stat st;
+
+	while (1 < argc) {
+		const char *arg = argv[1];
+		if (!strcmp(arg, "--raw"))
+			revs->diffopt.output_format = DIFF_FORMAT_RAW;
+		else
+			usage(builtin_diff_usage);
+		argv++; argc--;
+	}
+	if (lstat(path, &st))
+		die("'%s': %s", path, strerror(errno));
+	if (!(S_ISREG(st.st_mode) || S_ISLNK(st.st_mode)))
+		die("'%s': not a regular file or symlink", path);
+	stuff_change(&revs->diffopt,
+		     canon_mode(st.st_mode), canon_mode(st.st_mode),
+		     blob[0].sha1, null_sha1,
+		     blob[0].name, path);
+	diffcore_std(&revs->diffopt);
+	diff_flush(&revs->diffopt);
+	return 0;
+}
+
+static int builtin_diff_blobs(struct rev_info *revs,
+			      int argc, const char **argv,
+			      struct blobinfo *blob)
+{
+	/* Blobs */
+	unsigned mode = canon_mode(S_IFREG | 0644);
+
+	while (1 < argc) {
+		const char *arg = argv[1];
+		if (!strcmp(arg, "--raw"))
+			revs->diffopt.output_format = DIFF_FORMAT_RAW;
+		else
+			usage(builtin_diff_usage);
+		argv++; argc--;
+	}
+	stuff_change(&revs->diffopt,
+		     mode, mode,
+		     blob[0].sha1, blob[1].sha1,
+		     blob[1].name, blob[1].name);
+	diffcore_std(&revs->diffopt);
+	diff_flush(&revs->diffopt);
+	return 0;
+}
+
+static int builtin_diff_index(struct rev_info *revs,
+			      int argc, const char **argv)
+{
+	int cached = 0;
+	while (1 < argc) {
+		const char *arg = argv[1];
+		if (!strcmp(arg, "--cached"))
+			cached = 1;
+		else if (!strcmp(arg, "--raw"))
+			revs->diffopt.output_format = DIFF_FORMAT_RAW;
+		else
+			usage(builtin_diff_usage);
+		argv++; argc--;
+	}
+	/*
+	 * Make sure there is one revision (i.e. pending object),
+	 * and there is no revision filtering parameters.
+	 */
+	if (!revs->pending_objects || revs->pending_objects->next ||
+	    revs->max_count != -1 || revs->min_age != -1 ||
+	    revs->max_age != -1)
+		usage(builtin_diff_usage);
+	return run_diff_index(revs, cached);
+}
+
+static int builtin_diff_tree(struct rev_info *revs,
+			     int argc, const char **argv,
+			     struct object_list *ent)
+{
+	/* We saw two trees, ent[0] and ent[1].
+	 * unless ent[0] is unintesting, they are swapped
+	 */
+	const unsigned char *(sha1[2]);
+	int swap = 1;
+	while (1 < argc) {
+		const char *arg = argv[1];
+		if (!strcmp(arg, "--raw"))
+			revs->diffopt.output_format = DIFF_FORMAT_RAW;
+		else
+			usage(builtin_diff_usage);
+		argv++; argc--;
+	}
+	if (ent[0].item->flags & UNINTERESTING)
+		swap = 0;
+	sha1[swap] = ent[0].item->sha1;
+	sha1[1-swap] = ent[1].item->sha1;
+	diff_tree_sha1(sha1[0], sha1[1], "", &revs->diffopt);
+	log_tree_diff_flush(revs);
+	return 0;
+}
+
+static void add_head(struct rev_info *revs)
+{
+	unsigned char sha1[20];
+	struct object *obj;
+	if (get_sha1("HEAD", sha1))
+		return;
+	obj = parse_object(sha1);
+	if (!obj)
+		return;
+	add_object(obj, &revs->pending_objects, NULL, "HEAD");
+}
+
+int cmd_diff(int argc, const char **argv, char **envp)
+{
+	struct rev_info rev;
+	struct object_list *list, ent[2];
+	int ents = 0, blobs = 0, paths = 0;
+	const char *path = NULL;
+	struct blobinfo blob[2];
+
+	/*
+	 * We could get N tree-ish in the rev.pending_objects list.
+	 * Also there could be M blobs there, and P pathspecs.
+	 *
+	 * N=0, M=0:
+	 *	cache vs files (diff-files)
+	 * N=0, M=2:
+	 *      compare two random blobs.  P must be zero.
+	 * N=0, M=1, P=1:
+	 *	compare a blob with a working tree file.
+	 *
+	 * N=1, M=0:
+	 *      tree vs cache (diff-index --cached)
+	 *
+	 * N=2, M=0:
+	 *      tree vs tree (diff-tree)
+	 *
+	 * Other cases are errors.
+	 */
+	
+	git_config(git_diff_config);
+	init_revisions(&rev);
+	rev.diffopt.output_format = DIFF_FORMAT_PATCH;
+
+	argc = setup_revisions(argc, argv, &rev, NULL);
+	/* Do we have --cached and not have a pending object, then
+	 * default to HEAD by hand.  Eek.
+	 */
+	if (!rev.pending_objects) {
+		int i;
+		for (i = 1; i < argc; i++) {
+			const char *arg = argv[i];
+			if (!strcmp(arg, "--"))
+				break;
+			else if (!strcmp(arg, "--cached")) {
+				add_head(&rev);
+				break;
+			}
+		}
+	}
+
+	for (list = rev.pending_objects; list; list = list->next) {
+		struct object *obj = list->item;
+		const char *name = list->name;
+		int flags = (obj->flags & UNINTERESTING);
+		if (!obj->parsed)
+			obj = parse_object(obj->sha1);
+		obj = deref_tag(obj, NULL, 0);
+		if (!obj)
+			die("invalid object '%s' given.", name);
+		if (!strcmp(obj->type, commit_type))
+			obj = &((struct commit *)obj)->tree->object;
+		if (!strcmp(obj->type, tree_type)) {
+			if (2 <= ents)
+				die("more than two trees given: '%s'", name);
+			obj->flags |= flags;
+			ent[ents].item = obj;
+			ent[ents].name = name;
+			ents++;
+			continue;
+		}
+		if (!strcmp(obj->type, blob_type)) {
+			if (2 <= blobs)
+				die("more than two blobs given: '%s'", name);
+			memcpy(blob[blobs].sha1, obj->sha1, 20);
+			blob[blobs].name = name;
+			blobs++;
+			continue;
+			
+		}
+		die("unhandled object '%s' given.", name);
+	}
+	if (rev.prune_data) {
+		const char **pathspec = rev.prune_data;
+		while (*pathspec) {
+			if (!path)
+				path = *pathspec;
+			paths++;
+			pathspec++;
+		}
+	}
+
+	/*
+	 * Now, do the arguments look reasonable?
+	 */
+	if (!ents) {
+		switch (blobs) {
+		case 0:
+			return builtin_diff_files(&rev, argc, argv);
+			break;
+		case 1:
+			if (paths != 1)
+				usage(builtin_diff_usage);
+			return builtin_diff_b_f(&rev, argc, argv, blob, path);
+			break;
+		case 2:
+			return builtin_diff_blobs(&rev, argc, argv, blob);
+			break;
+		default:
+			usage(builtin_diff_usage);
+		}
+	}
+	else if (blobs)
+		usage(builtin_diff_usage);
+	else if (ents == 1)
+		return builtin_diff_index(&rev, argc, argv);
+	else if (ents == 2)
+		return builtin_diff_tree(&rev, argc, argv, ent);
+	usage(builtin_diff_usage);
+}
diff --git a/builtin.h b/builtin.h
index 47408a0..52ffa52 100644
--- a/builtin.h
+++ b/builtin.h
@@ -19,5 +19,6 @@ extern int cmd_version(int argc, const c
 extern int cmd_whatchanged(int argc, const char **argv, char **envp);
 extern int cmd_show(int argc, const char **argv, char **envp);
 extern int cmd_log(int argc, const char **argv, char **envp);
+extern int cmd_diff(int argc, const char **argv, char **envp);
 
 #endif
diff --git a/git.c b/git.c
index 01b7e28..ff9b87a 100644
--- a/git.c
+++ b/git.c
@@ -46,6 +46,7 @@ static void handle_internal_command(int 
 		{ "log", cmd_log },
 		{ "whatchanged", cmd_whatchanged },
 		{ "show", cmd_show },
+		{ "diffn", cmd_diff },
 	};
 	int i;
 
-- 
1.3.1.g1d19

^ permalink raw reply related

* Re: Two gitweb feature requests
From: Jakub Narebski @ 2006-04-29  7:27 UTC (permalink / raw)
  To: git

It would be nice if gitweb provided for the GIT repository description 
which doesn't fit the "Description" column and is shortened, and any 
other text which doesn't fit it's own column, full text of said field 
as the "title" attribute for the cell 'td' element, or encompasing 
'span' element. It would result in having full, not shortened text of 
said field (e.g. repository description) in the pop-up on mouse hover 
over said field. To simplify things it could be provided 
unconditionally, regardless whether the field needs shortening or not.

By the way, would it be possible for the repository without provided 
description to stand out more, perhaps by changing formatting of 
default description: 
  "Unnamed repository; edit this file to name it for gitweb."
to use ALL CAPS for attention, like that: 
  "UNNAMED REPOSITORY; edit this file to name it for gitweb."
or perhaps if possible use HTML formatting [additionally] for that.

-- 
Jakub Narebski
Poland

^ permalink raw reply

* [ANNOUNCE] qgit-1.2
From: Marco Costalba @ 2006-04-29  7:54 UTC (permalink / raw)
  To: git, linux-kernel

This is qgit-1.2

With qgit you will be able to browse revisions history, view patch content
and changed files, graphically following different development branches.

FEATURES

 - View revisions, diffs, files history, files annotation, archive tree.

 - Commit changes visually cherry picking modified files.

 - Apply or format patch series from selected commits, drag and
   drop commits between two instances of qgit.

  - qgit implements a GUI for the most common StGIT commands like push/pop
   and apply/format patches. You can also create new patches or refresh
   current top one using the same semantics of git commit, i.e. cherry
   picking single modified files.

NEW IN THIS RELEASE

A lot of work has been done from 1.1

Main new features:

*Add support for code range filtering*
To filter a mouse selected code text across all revisions file history,
this is very useful to quick search for revisions that modified a given
chunk of code.

*Much improved graph for partial repos views*
Thanks to new --boundary git-rev-list option graphs for partial repo
views are now much more clear and easy to follow.

*Support for launching an external diff viewer*
Useful, as example, to view the diffs in a 'tiled pane' mode. Default
is kompare, but it is easily user settable.

*Show stat info in patch viewer*
Use new --patch-with-stat git-diff-tree option to prepend stat info in
patch viewer.

*Adjustable font size in revision list view*
To squeeze revision graph without compromise patch and file content views.

*Key bindings for scrolling patch and revision logs*
Same keyboard navigation map of gitk. See help (F1) for details.

*Support for 'working dir' pseudo-tree in tree view*
To browse working dir changes and files in a tree like view. This is useful
for people coming from others SCM systems.

*Loading speed improvements*
Heavy optimization work to speed-up start up times, about 20%-30% from
qgit 1.1.1

NOTE: interface change.
Browsing file history revisions in annotate window does not update main view
anymore. Double click is now required to update main view. This makes qgit
less 'jumpy' and let browse file history without loosing main view selection.

Please note that you will need git 1.3.0 or newer.

DOWNLOAD

Tarball is at
http://prdownloads.sourceforge.net/qgit/qgit-1.2.tar.bz2?download

Git archive is at
http://digilander.libero.it/mcostalba/scm/qgit.git

See http://digilander.libero.it/mcostalba/ for detailed download information.

INSTALLATION

git 1.3.0 is required.

To install from tarball:

./configure
make
make install-strip

To install from git archive:

autoreconf -i
./configure
make
make install-strip

Or check the shipped README for detailed information.

CHANGELOG from 1.2rc2

- do not update main view when browsing file history: double click is
now required

- show "working dir" pseudo rev also when there is no change

- add support for 'working dir' pseudo-tree in tree view

- add key bindings for scrolling patch and revision logs views

- add support for adjustable font size in revision list view

- add "revision diff" entry in context menu

- support lane mouse selection also for not free lanes

- help view should not be a top level window

- restore selection when removing code filtering

- fix a bad crash in diff viewer window

- fix a inconsistency in spinbox in annotation view

- fix an off by one bug that creates invalid pixmaps

- fix highlighting of selected revision if a reference name is used

- small GUI tweaks

For a complete changelog see shipped ChangeLog file or git repository
revision's history

	Marco

^ permalink raw reply

* [PATCH] built-in diff: assorted updates.
From: Junio C Hamano @ 2006-04-29  9:29 UTC (permalink / raw)
  To: git
In-Reply-To: <7virosalxb.fsf@assigned-by-dhcp.cox.net>

"git diff(n)" without --base, --ours, etc. defaults to --cc,
which usually is the same as -p unless you are in the middle of
a conflicted merge, just like the shell script version.

"git diff(n) blobA blobB path" complains and dies.

"git diff(n) tree0 tree1 tree2...treeN" does combined diff that
shows a merge of tree1..treeN to result in tree0.

Giving "-c" option to any command that defaults to "--cc" turns
off dense-combined flag.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---

 * Comes on top of the previous "built-in diff" patch.

   Now I think this is more or less what Linus envisioned when
   he started converting the rev related option parsing to use
   setup_revisions(), including the exotic combinations the
   shell script "git diff" does not support:

   git diff(n) HEAD~10:Makefile Makefile	        ; blob vs any file
   git diff(n) HEAD~10:Makefile HEAD~20:Makefile        ; two blobs
   git diff(n) v1.0.0 v1.0.0^1 v1.0.0^2	; combine 'em

   The normal ones are also supported:

   git diff(n) -- Documentation				; index vs working
   git diff(n) --cached 				; tree vs index
   git diff(n) HEAD					; tree vs working
   git diff(n) HEAD~10 HEAD~8				; two trees
   git diff(n) HEAD:arch/i386 HEAD:arch/x86_64		; two trees

 builtin-diff.c |   55 ++++++++++++++++++++++++++++++++++++++++++++++---------
 combine-diff.c |   47 ++++++++++++++++++++++++++++++-----------------
 diff.h         |    2 ++
 revision.c     |    1 +
 4 files changed, 79 insertions(+), 26 deletions(-)

diff --git a/builtin-diff.c b/builtin-diff.c
index 1968212..c543105 100644
--- a/builtin-diff.c
+++ b/builtin-diff.c
@@ -47,13 +47,17 @@ static int builtin_diff_files(struct rev
 	}
 	/*
 	 * Make sure there are NO revision (i.e. pending object) parameter,
-	 * rev.max_count is reasonable (0 <= n <= 3),
-	 * there is no other revision filtering parameters.
+	 * specified rev.max_count is reasonable (0 <= n <= 3), and
+	 * there is no other revision filtering parameter.
 	 */
 	if (revs->pending_objects ||
 	    revs->min_age != -1 ||
-	    revs->max_age != -1)
+	    revs->max_age != -1 ||
+	    3 < revs->max_count)
 		usage(builtin_diff_usage);
+	if (revs->max_count < 0 &&
+	    (revs->diffopt.output_format == DIFF_FORMAT_PATCH))
+		revs->combine_merges = revs->dense_combined_merges = 1;
 	/*
 	 * Backward compatibility wart - "diff-files -s" used to
 	 * defeat the common diff option "-s" which asked for
@@ -178,9 +182,6 @@ static int builtin_diff_tree(struct rev_
 			     int argc, const char **argv,
 			     struct object_list *ent)
 {
-	/* We saw two trees, ent[0] and ent[1].
-	 * unless ent[0] is unintesting, they are swapped
-	 */
 	const unsigned char *(sha1[2]);
 	int swap = 1;
 	while (1 < argc) {
@@ -191,6 +192,10 @@ static int builtin_diff_tree(struct rev_
 			usage(builtin_diff_usage);
 		argv++; argc--;
 	}
+
+	/* We saw two trees, ent[0] and ent[1].
+	 * unless ent[0] is unintesting, they are swapped
+	 */
 	if (ent[0].item->flags & UNINTERESTING)
 		swap = 0;
 	sha1[swap] = ent[0].item->sha1;
@@ -200,6 +205,33 @@ static int builtin_diff_tree(struct rev_
 	return 0;
 }
 
+static int builtin_diff_combined(struct rev_info *revs,
+				 int argc, const char **argv,
+				 struct object_list *ent,
+				 int ents)
+{
+	const unsigned char (*parent)[20];
+	int i;
+
+	while (1 < argc) {
+		const char *arg = argv[1];
+		if (!strcmp(arg, "--raw"))
+			revs->diffopt.output_format = DIFF_FORMAT_RAW;
+		else
+			usage(builtin_diff_usage);
+		argv++; argc--;
+	}
+	if (!revs->dense_combined_merges && !revs->combine_merges)
+		revs->dense_combined_merges = revs->combine_merges = 1;
+	parent = xmalloc(ents * sizeof(*parent));
+	/* Again, the revs are all reverse */
+	for (i = 0; i < ents; i++)
+		memcpy(parent + i, ent[ents - 1 - i].item->sha1, 20);
+	diff_tree_combined(parent[0], parent + 1, ents - 1,
+			   revs->dense_combined_merges, revs);
+	return 0;
+}
+
 static void add_head(struct rev_info *revs)
 {
 	unsigned char sha1[20];
@@ -215,7 +247,7 @@ static void add_head(struct rev_info *re
 int cmd_diff(int argc, const char **argv, char **envp)
 {
 	struct rev_info rev;
-	struct object_list *list, ent[2];
+	struct object_list *list, ent[100];
 	int ents = 0, blobs = 0, paths = 0;
 	const char *path = NULL;
 	struct blobinfo blob[2];
@@ -273,8 +305,9 @@ int cmd_diff(int argc, const char **argv
 		if (!strcmp(obj->type, commit_type))
 			obj = &((struct commit *)obj)->tree->object;
 		if (!strcmp(obj->type, tree_type)) {
-			if (2 <= ents)
-				die("more than two trees given: '%s'", name);
+			if (ARRAY_SIZE(ent) <= ents)
+				die("more than %d trees given: '%s'",
+				    ARRAY_SIZE(ent), name);
 			obj->flags |= flags;
 			ent[ents].item = obj;
 			ent[ents].name = name;
@@ -316,6 +349,8 @@ int cmd_diff(int argc, const char **argv
 			return builtin_diff_b_f(&rev, argc, argv, blob, path);
 			break;
 		case 2:
+			if (paths)
+				usage(builtin_diff_usage);
 			return builtin_diff_blobs(&rev, argc, argv, blob);
 			break;
 		default:
@@ -328,5 +363,7 @@ int cmd_diff(int argc, const char **argv
 		return builtin_diff_index(&rev, argc, argv);
 	else if (ents == 2)
 		return builtin_diff_tree(&rev, argc, argv, ent);
+	else
+		return builtin_diff_combined(&rev, argc, argv, ent, ents);
 	usage(builtin_diff_usage);
 }
diff --git a/combine-diff.c b/combine-diff.c
index ca36f5d..8a8fe38 100644
--- a/combine-diff.c
+++ b/combine-diff.c
@@ -831,15 +831,16 @@ void show_combined_diff(struct combine_d
 	}
 }
 
-void diff_tree_combined_merge(const unsigned char *sha1,
-			     int dense, struct rev_info *rev)
+void diff_tree_combined(const unsigned char *sha1,
+			const unsigned char parent[][20],
+			int num_parent,
+			int dense,
+			struct rev_info *rev)
 {
 	struct diff_options *opt = &rev->diffopt;
-	struct commit *commit = lookup_commit(sha1);
 	struct diff_options diffopts;
-	struct commit_list *parents;
 	struct combine_diff_path *p, *paths = NULL;
-	int num_parent, i, num_paths;
+	int i, num_paths;
 	int do_diffstat;
 
 	do_diffstat = (opt->output_format == DIFF_FORMAT_DIFFSTAT ||
@@ -849,17 +850,8 @@ void diff_tree_combined_merge(const unsi
 	diffopts.with_stat = 0;
 	diffopts.recursive = 1;
 
-	/* count parents */
-	for (parents = commit->parents, num_parent = 0;
-	     parents;
-	     parents = parents->next, num_parent++)
-		; /* nothing */
-
 	/* find set of paths that everybody touches */
-	for (parents = commit->parents, i = 0;
-	     parents;
-	     parents = parents->next, i++) {
-		struct commit *parent = parents->item;
+	for (i = 0; i < num_parent; i++) {
 		/* show stat against the first parent even
 		 * when doing combined diff.
 		 */
@@ -867,8 +859,7 @@ void diff_tree_combined_merge(const unsi
 			diffopts.output_format = DIFF_FORMAT_DIFFSTAT;
 		else
 			diffopts.output_format = DIFF_FORMAT_NO_OUTPUT;
-		diff_tree_sha1(parent->object.sha1, commit->object.sha1, "",
-			       &diffopts);
+		diff_tree_sha1(parent[i], sha1, "", &diffopts);
 		diffcore_std(&diffopts);
 		paths = intersect_paths(paths, i, num_parent);
 
@@ -907,3 +898,25 @@ void diff_tree_combined_merge(const unsi
 		free(tmp);
 	}
 }
+
+void diff_tree_combined_merge(const unsigned char *sha1,
+			     int dense, struct rev_info *rev)
+{
+	int num_parent;
+	const unsigned char (*parent)[20];
+	struct commit *commit = lookup_commit(sha1);
+	struct commit_list *parents;
+
+	/* count parents */
+	for (parents = commit->parents, num_parent = 0;
+	     parents;
+	     parents = parents->next, num_parent++)
+		; /* nothing */
+
+	parent = xmalloc(num_parent * sizeof(*parent));
+	for (parents = commit->parents, num_parent = 0;
+	     parents;
+	     parents = parents->next, num_parent++)
+		memcpy(parent + num_parent, parents->item->object.sha1, 20);
+	diff_tree_combined(sha1, parent, num_parent, dense, rev);
+}
diff --git a/diff.h b/diff.h
index 7150b90..b3b2c4d 100644
--- a/diff.h
+++ b/diff.h
@@ -75,6 +75,8 @@ #define combine_diff_path_size(n, l) \
 extern void show_combined_diff(struct combine_diff_path *elem, int num_parent,
 			      int dense, struct rev_info *);
 
+extern void diff_tree_combined(const unsigned char *sha1, const unsigned char parent[][20], int num_parent, int dense, struct rev_info *rev);
+
 extern void diff_tree_combined_merge(const unsigned char *sha1, int, struct rev_info *);
 
 extern void diff_addremove(struct diff_options *,
diff --git a/revision.c b/revision.c
index f2a9f25..4381d45 100644
--- a/revision.c
+++ b/revision.c
@@ -664,6 +664,7 @@ int setup_revisions(int argc, const char
 			}
 			if (!strcmp(arg, "-c")) {
 				revs->diff = 1;
+				revs->dense_combined_merges = 0;
 				revs->combine_merges = 1;
 				continue;
 			}
-- 
1.3.1.ga0c5

^ permalink raw reply related

* [PATCH/RFC] Extended SHA1 -- "rev^#" syntax to mean "all parents"
From: Junio C Hamano @ 2006-04-29  9:45 UTC (permalink / raw)
  To: git
In-Reply-To: <7virosalxb.fsf@assigned-by-dhcp.cox.net>

A short-hand "rev^#" is understood to be "all parents of the
named commit" with this patch.  So you can do

	git show v1.0.0^#

to view the parents of a merge commit,

	gitk ^v1.0.0^# v1.0.4

to view the log between two revs (including the bottom one), and

	git diff --cc v1.1.0 v1.0.0^#

to inspect what got changed from the merge parents of v1.0.0 to v1.1.0.

This might be just my shiny new toy that is not very useful in
practice.  I needed it to do the multi-tree diff on Len's
infamous 12-way Octopus; typing "diff --cc funmerge funmerge^1
funmerge^2 funmerge^3 ..." was too painful.

Signed-off-by: Junio C Hamano <junkio@cox.net>

---

 revision.c |   37 +++++++++++++++++++++++++++++++++++++
 1 files changed, 37 insertions(+), 0 deletions(-)

diff --git a/revision.c b/revision.c
index f2a9f25..194f35b 100644
--- a/revision.c
+++ b/revision.c
@@ -477,6 +477,36 @@ static void handle_all(struct rev_info *
 	for_each_ref(handle_one_ref);
 }
 
+static int add_parents_only(struct rev_info *revs, const char *arg, int flags)
+{
+	unsigned char sha1[20];
+	struct object *it;
+	struct commit *commit;
+	struct commit_list *parents;
+
+	if (*arg == '^') {
+		flags ^= UNINTERESTING;
+		arg++;
+	}
+	if (get_sha1(arg, sha1))
+		return 0;
+	while (1) {
+		it = get_reference(revs, arg, sha1, 0);
+		if (strcmp(it->type, tag_type))
+			break;
+		memcpy(sha1, ((struct tag*)it)->tagged->sha1, 20);
+	}
+	if (strcmp(it->type, commit_type))
+		return 0;
+	commit = (struct commit *)it;
+	for (parents = commit->parents; parents; parents = parents->next) {
+		it = &parents->item->object;
+		it->flags |= flags;
+		add_pending_object(revs, it, arg);
+	}
+	return 1;
+}
+
 void init_revisions(struct rev_info *revs)
 {
 	memset(revs, 0, sizeof(*revs));
@@ -746,6 +776,13 @@ int setup_revisions(int argc, const char
 			}
 			*dotdot = '.';
 		}
+		dotdot = strstr(arg, "^#");
+		if (dotdot && !dotdot[2]) {
+			*dotdot = 0;
+			if (add_parents_only(revs, arg, flags))
+				continue;
+			*dotdot = '^';
+		}
 		local_flags = 0;
 		if (*arg == '^') {
 			local_flags = UNINTERESTING;
-- 
1.3.1.ga0c5

^ permalink raw reply related

* Re: Two gitweb feature requests
From: Jakub Narebski @ 2006-04-29 11:16 UTC (permalink / raw)
  To: git
In-Reply-To: <1146144425.11909.450.camel@pmac.infradead.org>

It would be nice if gitweb provided for the GIT repository description which
doesn't fit the "Description" column and is shortened, and any other text
which doesn't fit it's own column, full text of said field as the "title"
attribute for the cell 'td' element, or encompasing 'span' element. It
would result in having full, not shortened text of said field (e.g.
repository description) in the pop-up on mouse hover over said field. To
simplify things it could be provided unconditionally, regardless whether
the field needs shortening or not.

By the way, would it be possible for the repository without provided
description to stand out more, perhaps by changing formatting of default
description: 
  "Unnamed repository; edit this file to name it for gitweb."
to use ALL CAPS for attention, like that: 
  "UNNAMED REPOSITORY; edit this file to name it for gitweb."
or perhaps if possible use HTML formatting [additionally] for that.

-- 
Jakub Narebski
Warsaw, Poland

^ permalink raw reply

* Re: fatal: git-write-tree: not able to write tree
From: colin @ 2006-04-29 13:23 UTC (permalink / raw)
  To: junkio; +Cc: git

diff --git a/git-am.sh b/git-am.sh
index eab4aa8..872145b 100755
--- a/git-am.sh
+++ b/git-am.sh
@@ -376,6 +376,13 @@ do
 			echo "No changes - did you forget update-index?"
 			stop_here $this
 		fi
+		unmerged=$(git-ls-files -u)
+		if test -n "$unmerged"
+		then
+			echo "You still have unmerged paths in your index"
+			echo "did you forget update-index?"
+			stop_here $this
+		fi
 		apply_status=0
 		;;
 	esac

Er... it's very non-obvious to me why you'd want to stick a workaround
here when you could instead fix git-write-tree to do it.  That seems
like The Right Thing.
(It would also be helpful to mention at least one unmerged file by name.)

^ permalink raw reply related

* Features ask for git-send-email
From: Bertrand Jacquin @ 2006-04-29 13:30 UTC (permalink / raw)
  To: Git Mailing List

Hi,

Could it be possible to add a features in git-send-email.perl to
accept a differrent charset as iso-8859-1 ? I would like to send
fr_FR.utf8 mail as I use git to manager a latex files tree which are
written in utf8.

Any objection ?

--
Beber
#e.fr@freenode

^ permalink raw reply

* Re: cg-clone not fetching all tags?
From: Wolfgang Denk @ 2006-04-29 14:00 UTC (permalink / raw)
  To: Git Mailing List
In-Reply-To: <20060427105251.AA4B2353DAC@atlas.denx.de>

[Repost because there was zero response, not even somebody telling me
to RTFM or so.]

Hi,

it seems that "cg-clone" does not fetch all tags any more - only  the
most  recent ones (modiufied in the last N days?) seem to be fetched?
[Eventually the "N days"  might  correspond  to  "changing  tools  to
version X", but I have no way to find out.]

This happens only when using HTTP; using ssh  or  rsync  works  fine.
Also,  if  we follow the "cg-clone" by a "git-fetch -t" command, this
will load the missing tags.

Is this intentional, or am I doing anything wrong?

[For testing, try "cg-clone http://www.denx.de/git/u-boot.git"]

Best regards,

Wolfgang Denk

-- 
Software Engineering:  Embedded and Realtime Systems,  Embedded Linux
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
In theory, there is no difference between  theory  and  practice.  In
practice, however, there is.

^ permalink raw reply

* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links (and other commit links ideas)
From: Jakub Narebski @ 2006-04-29 14:59 UTC (permalink / raw)
  To: git
In-Reply-To: <e2kset$lk2$1@sea.gmane.org>

Jakub Narebski wrote:

> Junio C Hamano wrote:
> 
>> Jakub Narebski <jnareb@gmail.com> writes:
>> 
>>> Actually, this can be resolved using automatic history grafts to the
>>> remote repository we pulled from, if the commit is not present on local
>>> side (and removing graft when commit appears on local side).
>> 
>> You do not even need history grafts.  The "cherry-pick source"
>> was a bad example.  Maybe using "related" as a way to implement
>> "bind" would have been a better example -- we want inter-commit
>> relationship that requires connectivity but without ancestry for
>> them.
>> 
>> You can just have two kinds of 'related'.  One that means
>> connectivity, the other that does not.
> 
> Good idea.
> 
> Another problem for core git, but I think orthogonal to the
> "related"/"note" distinction is if the relation (or note) should be used
> as helper in merges, perhaps by some agreed upon convention on the
> comment/description/value part (e.g. "mergehelper" or "mergeinfo").

Scratch that. It would be better for merge strategy just to check for
defined set of "links" and "notes", e.g. "prior" (pu-prior) and
"original" (cherrypick).

But there would be problem with connectivity provided by "link" relations,
namely info/grafts file, which deal only with parents. For example when we
cauterize history using grafts (e.g. for shallow clone) the "link" like
"prior" reaching to the cut-off part of the history might make your day ;-)

Well, we could always drop all the connectivity, and make
  link sha1 description...
shortcut for
  note link sha1 description...

-- 
Jakub Narebski
Warsaw, Poland

^ permalink raw reply

* Re: [PATCH/RFC] Extended SHA1 -- "rev^#" syntax to mean "all parents"
From: Johannes Schindelin @ 2006-04-29 15:38 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vaca490ln.fsf@assigned-by-dhcp.cox.net>

Hi,

On Sat, 29 Apr 2006, Junio C Hamano wrote:

> A short-hand "rev^#" is understood to be "all parents of the
> named commit" with this patch.

Just my 2/100: Why not "rev^*"? I could remember that more easily.

Ciao,
Dscho

^ permalink raw reply

* [PATCH] git-format-patch: Use rfc2822 compliant date.
From: Huw Davies @ 2006-04-29 15:50 UTC (permalink / raw)


Signed-off-by: Huw Davies <huw@codeweavers.com>


---

 git-format-patch.sh |    9 ++++-----
 1 files changed, 4 insertions(+), 5 deletions(-)

fa12b0e43a0559101551d697866c01a92778c67f
diff --git a/git-format-patch.sh b/git-format-patch.sh
index c7133bc..c077f44 100755
--- a/git-format-patch.sh
+++ b/git-format-patch.sh
@@ -205,11 +205,10 @@ sub show_date {
     }
     my $t = $time + $minutes * 60;
     my ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday) = gmtime($t);
-    return sprintf("%s %s %d %02d:%02d:%02d %d %+05d",
-		   $weekday_names[$wday],
-		   $month_names[$mon],
-		   $mday, $hour, $min, $sec,
-		   $year+1900, $tz);
+    return sprintf("%s, %d %s %d %02d:%02d:%02d %+05d",
+		   $weekday_names[$wday], $mday,
+		   $month_names[$mon], $year+1900,
+		   $hour, $min, $sec, $tz);
 }
 
 print "From nobody Mon Sep 17 00:00:00 2001\n";
-- 
1.3.0

^ permalink raw reply related

* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links (and
From: linux @ 2006-04-29 16:51 UTC (permalink / raw)
  To: git

Boy, this is an interesting discussion!
On the one hand, it seems "obvious" to me that extra links might be
useful.  But Linus's minimalist points have a lot of merit.

I have to agree, it's important to think of a single practical use before
adding the feature.  So let's do a little brainstorming...

For just referring to another commit, there's no problem putting
it in the body.  A sensible porcelain GUI will, when it seems something
that looks like an object identifier in a comment, and that object
identifier exists, make it a clickable link.  So a comment like:

"This fixes the same problem as <commit>, but is a cleaner
(albeit more invasive) fix."

Would do the right thing: the user reading it could easily jump
to the other comment.  A "header" link as opposed to a "comment"
link just has the property of being unambiguous.  No heuristic
will guess that a link should exist when there isn't.

So, what is that property useful for?

Now, one thing that porcelains provide, in addition to "parent" links,
is "child" links.  Useful.  But it could be done with commit comment
links as well, and it's not clear that having the link in the commit
header as opposed to the comment would help much.  You still have to
find and uncompress part of each commit to generate the history
tree.  Does uncompressing the rest of it and running a heuristic
over the text for really cost that much?

I'm not convinced it's needed for that feature.  (I'd sooner argue for
never compressing commit objects in packs on the grounds that the
repeated uncompression while browsing is worth saving more than the
relatively minor disk space.)

So to be valuable, and inadvisable to express with a specially
formatted comment, it has to be something that would be Very Bad
to get wrong.  What qualifies?

Maybe some merge algorithm information?  If the merge could be told that
this change "is the same" as that change, so it can be skipped when
cherry-picking that branch, and the information was wrong, that could
cause lots of problems.

But given that git-cherry already uses (imperfect) heuristics to
detect already-merged patches, and they seem to work well enough, is
that a strong enough argument?  Is there some other merge application
where it would help?

Now, the "this other object should exist in the repository, and it's an
error if you can't fetch it" link obviously needs to be unambiguously
distinguished from, say, a reference to the (Linux kernel) dodecapus merge
in a git tree checkin comment.  But, as Linus says, what reason is there
for including it?  What do you need the commit in the repository for?

Well, the only reason that you need ANY commit in the repository is
because it's part of history, and comparing it with other versions is
meaningful.  So what trees, not already in the ancestry graph of a
given commit, are useful to compare to?  In particular, useful for some
automated process; manual comparisons can always be done manually.

Nothing's jumping out at me.  Any suggestions?

^ permalink raw reply

* Re: [PATCH/RFC] Extended SHA1 -- "rev^#" syntax to mean "all parents"
From: Linus Torvalds @ 2006-04-29 16:59 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Junio C Hamano, git
In-Reply-To: <Pine.LNX.4.63.0604291737390.25093@wbgn013.biozentrum.uni-wuerzburg.de>

On Sat, 29 Apr 2006, Johannes Schindelin wrote:
> 
> > A short-hand "rev^#" is understood to be "all parents of the
> > named commit" with this patch.
> 
> Just my 2/100: Why not "rev^*"? I could remember that more easily.

Yeah, that (or ^@ - to match shell "$1" "$2" .. "$@") was my reaction too.

The "rev^#" thing should to my mind return the _number_ of parents, the 
same way "$#" does in shell. I actually pronounce that '#' character 
mentally as "number", but maybe that's just because I'm totally mentally 
damaged by shell programming, and everybody else probably calls it "hash" 
(and some people apparently call it "pound", for some really sick reason).

So "rev^#" literally reads as "revision parent number" to me. Useful? 
Maybe. Maybe not.

		Linus

^ permalink raw reply

* Re: cg-clone not fetching all tags?
From: Petr Baudis @ 2006-04-29 17:05 UTC (permalink / raw)
  To: Wolfgang Denk; +Cc: Git Mailing List
In-Reply-To: <20060429140042.1FB37353DAC@atlas.denx.de>

  Hi,

Dear diary, on Sat, Apr 29, 2006 at 04:00:42PM CEST, I got a letter
where Wolfgang Denk <wd@denx.de> said that...
> it seems that "cg-clone" does not fetch all tags any more - only  the
> most  recent ones (modiufied in the last N days?) seem to be fetched?
> [Eventually the "N days"  might  correspond  to  "changing  tools  to
> version X", but I have no way to find out.]
> 
> This happens only when using HTTP; using ssh  or  rsync  works  fine.
> Also,  if  we follow the "cg-clone" by a "git-fetch -t" command, this
> will load the missing tags.
> 
> Is this intentional, or am I doing anything wrong?
> 
> [For testing, try "cg-clone http://www.denx.de/git/u-boot.git"]

  you need to run git-update-server-info every time you add or update a
tag (or best every time you push). See the NOTES section of
cg-admin-setuprepo documentation for details on how to set it up to be
called automagically at every push.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Right now I am having amnesia and deja-vu at the same time.  I think
I have forgotten this before.

^ permalink raw reply

* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links (and
From: Linus Torvalds @ 2006-04-29 17:35 UTC (permalink / raw)
  To: linux; +Cc: git
In-Reply-To: <20060429165151.2570.qmail@science.horizon.com>

On Sat, 29 Apr 2006, linux@horizon.com wrote:
> 
> Well, the only reason that you need ANY commit in the repository is
> because it's part of history, and comparing it with other versions is
> meaningful.  So what trees, not already in the ancestry graph of a
> given commit, are useful to compare to?  In particular, useful for some
> automated process; manual comparisons can always be done manually.
> 
> Nothing's jumping out at me.  Any suggestions?

The only thing that I've ever wondered about is the "base commit of a 
merge".

Now, the thing is, we can always compute it. That's true _iff_ we've 
merged using the standard merge mechanism, but it wasn't always true 
historically (eg the original merges were computed with the original 
"git-merge-base" algorithm, which just picked the _first_ merge base it 
would find, while these days we use multiple ones for criss-cross merges).

So I would not totally object if a merge algorithm added a

	merge-base <sha1>

notation. But while it _could_ be just a "note merge-base <sha1>", it 
should _not_ be a "link <sha1> merge-base".

Let me explain why I think there are differences between those three 
options, and why I actually think that two of them are "valid" ideas, 
while the third one is not.

 - Case 1: the

	merge-base <sha1>

   is a "valid" idea (where there might of course be more than one <sha1>, 
   and possibly more than one "merge-base" line: you'd have to have some 
   rule for what happens for a recursive merge), although it has the 
   generally big down-side of being redundant information in all current 
   setups.

   It's redundant, but at the same time it's information that in _theory_ 
   might not be redundant, because I can see a situation where a merge was 
   forced by manually specifying a merge base (eg a special merge like the 
   original "gitk" merge, merging two initially unrelated projects 
   together).

   In theory. So it could be real information for a merge commit. And we'd 
   enforce some kind of real semantics for it - and it would have a really 
   solid technical meaning: assuming we define the multi-merge-base 
   semantics properly it would NEVER have any question about "what are 
   best practices?" or "what does this mean?".

   So this "case 1" actually has technical consequences, but you can, for 
   example, actually _check_ them. You can make fsck literally complain if 
   the merge base doesn't make sense. There's a clear "technical 
   violation", which might not be entirely trivial to figure out, but 
   thanks to it having a good meaning and a strict definition, it's 
   _there_.

Now, in all honesty, I don't think "case 1" is a _good_ thing to do. I'm 
just saying that I wouldn't be as upset about it as I've been over this 
"link" discussion. The reason I think "case 1" sucks is simply that I 
think you can in _practice_ get all the benefits much better with "case 
2", even if that one doesn't imply any actual git semantics:

 - Case 2: the

	note merge-base <sha1>

   thing is _also_ a perfectly valid idea, because now it's also very 
   well-defined: the "note" part tells you that git doesn't actually 
   impose any semantics what-so-ever on it, so it's really just a comment, 
   and as in case 1 above, once you see it as a comment, the _meaning_ of 
   it is immediately clear. It's literally just a note from the merge 
   algorithm saying "I used this as a merge base".

   The "note" syntax actually has a huge advantage. When you see it as a 
   comment from the merge algorithm, you immediately think it might also 
   be a good idea to add a few other notes. So a merge commit might 
   actually have

	note merge-algorithm recursive
	note merge-conflicts none
	note merge-base <sha1>

   all make total sense. It's telling you what the algorithm used was, and 
   that it didn't neen any manual fixups. It's also telling you that none 
   of this has _any_ impact what-so-ever from a "git semantics" angle, and 
   that this is nothing but a note for anybody who starts digging into it.

So now I've shown _two_ examples of some kind of header that I think 
actually makes sense, and that I would not argue against on those grounds. 
Especially the "note" thing I think is fine. So why, oh why, do I hate the 
"link" thing so much?

 - Case 3: the

	link <sha1> merge-base

   thing is a horrible and nasty thing that we should never ever support.

   Why? Because it's literally designed to both have some semantic meaning 
   ("git will fetch the <sha1> and use it for connectivity analysis") 
   _and_ at the same time the whole syntax it's designed to _not_ have any 
   real meaning ("you can have any kind of link, and I don't know what 
   it actually means from a conceptual standpoint").

   So it has a meaning from an _implementation_ angle, but at the same 
   time it does not have a "higher cause". That is EVIL. When they say 
   "The road to hell is paved with good intentions", the implication there 
   is not that good intentions is bad per se, but that you should 
   understand that there are "Unintended Consequences".

   And if you cannot limit the thing to a very _specific_ higher-level 
   meaning, you by definition will have those "unintended consequences".

In short, the difference between three headers that on the face of it say 
exactly the same thing: "merge-base <sha1>", "note merge-base <sha1>", and 
"link merge-base <sha1>" is not that they have different syntax (hey, even 
the syntax itself is almost identical), but exactly the fact that they 
have different implications and _meaning_.

Two of the three have no unintended consequences. One ("note") has no 
technical "consequences" at _all_, by definition. The other "merge-base" 
has no technical "unintended" at all, because it's throught through, and 
has been fully defined.

The third? "unintended consequences". It doesn't have a clear definition 
("It's cool. You can use it for any link you want"). So pretty much BY 
DESIGN, it's set up so that you don't know what the consequences of it 
will be for a project.

And that's why "case 3" it's bad. Even though it looks very much like the 
two other ones.

			Linus

^ permalink raw reply

* Re: [RFC] [PATCH 0/5] Implement 'prior' commit object links (and
From: Jakub Narebski @ 2006-04-29 18:07 UTC (permalink / raw)
  To: git
In-Reply-To: <Pine.LNX.4.64.0604291006270.3701@g5.osdl.org>

Linus Torvalds wrote:

>  - Case 1: the
> 
>         merge-base <sha1>
[...]
>  - Case 2: the
> 
>         note merge-base <sha1>
[...]
>  - Case 3: the
> 
>         link <sha1> merge-base
[...]

> In short, the difference between three headers that on the face of it say
> exactly the same thing: "merge-base <sha1>", "note merge-base <sha1>", and
> "link merge-base <sha1>" is not that they have different syntax (hey, even
> the syntax itself is almost identical), but exactly the fact that they
> have different implications and _meaning_.
> 
> Two of the three have no unintended consequences. One ("note") has no
> technical "consequences" at _all_, by definition. The other "merge-base"
> has no technical "unintended" at all, because it's throught through, and
> has been fully defined.
> 
> The third? "unintended consequences". It doesn't have a clear definition
> ("It's cool. You can use it for any link you want"). So pretty much BY
> DESIGN, it's set up so that you don't know what the consequences of it
> will be for a project.
> 
> And that's why "case 3" it's bad. Even though it looks very much like the
> two other ones.

IF (and that is big if) git commit header will be extended to have some
extra "link" (enforcing connectivity) headers, like proposed "bind" for
subprojects, "prev" for pu-like union branches, "merge-base" for merges,
there would be repeated work on enforcing connectivity. Hence generic
"link" header (formerly "related") proposal. Having fsck report broken
links (or not), having purge removing commits (objects) reachable only via
"link" headers, having pull download commits via "link" headers... have I
forgot anything? It _seems_ that this part is common, and does not depend
on semantics.

But with "links" (connectivity headers) there always would be some other
consequences. For example info/grafts deals for now only with commit
parents, and extending the format could be difficult.

And of course if we want connectivity, this is for some reason, so the
"link" has some other consequences, for example "prev" and "merge-base" for
merging, "bind" for checkout, merge (but differently), etc.

I think that if it is 'helper' information (i.e. information which is
helpful, but we can do without it) and of no real importance to user then
use "note". If it is of importance to user (for example "cherrypick" or
"reverted") and of use to git, then repeat such info in "note" header to
avoid relying on parsing free-form part aka. commit comment. If
connectivity is needed... hmmm...

-- 
Jakub Narebski
Warsaw, Poland

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox