Git development


* cogito cg-update fails
From: Benjamin Herrenschmidt @ 2005-05-03  3:19 UTC
  To: Git Mailing List

Hi Folks !

I have something weird happening with cogito. What I did is:

 - downloaded & installed the 0.8 archive
 - cg-init <rsync path>
 - built & installed that, removed the 0.8 files
 - a bit later: cg-update origin to check for new stuff

The last one fails with:

benh@pogo:~/cogito$ cg-update origin
MOTD:
MOTD:   .../.. stripped kernel.org legal blurb

receiving file list ... done
.git/refs/heads/origin

sent 119 bytes  received 857 bytes  390.40 bytes/sec
total size is 41  speedup is 0.04
rsync: link_stat "/home/benh/cogito/origin/objects/." failed: No such file or directory (2)
building file list ... done
rsync error: some files could not be transferred (code 23) at main.c(702)

sent 17 bytes  received 20 bytes  74.00 bytes/sec
total size is 0  speedup is 0.00
cg-pull: objects pull failed

So it looks like it's trying to rsync to a bogus destination ...

Ben.


* Re: More problems...
From: Daniel Barkalow @ 2005-05-03  2:56 UTC
  To: Petr Baudis
  Cc: Linus Torvalds, Anton Altaparmakov, Russell King, Junio C Hamano,
	Ryan Anderson, git
In-Reply-To: <20050503014816.GQ20818@pasky.ji.cz>

On Tue, 3 May 2005, Petr Baudis wrote:

> BTW, I've just committed support for pulling from remote repositories
> over the HTTP and SSH protocols (http://your.git/repo,
> git+ssh://root@git.nasa.gov/srv/git/mars) (note that I was unable to
> test the SSH stuff properly now; success reports or patches welcome).
> Also, the local hardlinking access is now done over git-local-pull,
> therefore the cp errors should go away now.

Before you get too far with the SSH version, I have some protocol changes
which (1) allow transmission of things other than objects; (2) allow the
pushing side to report that it doesn't have something without killing the
connection; (3) send refs. (1) and (2) are needed to make the protocol
extensible; (3) takes advantage of (1) to make it possible to maintain a
remote repository without doing anything other than rpush to it.

This goes with my patches from the weekend to enable git-*-pull to
transfer refs/ files in the same process.

> I'm not yet decided whether locations like
> 
> 	kernel.org:/pub/scm/cogito/cogito.git
> 
> should invoke rsync, rpull, throw an error or print a fortune cookie.

Probably not rpull, which requires a login, at least not unless the others
fail. I think that http-pull is going to be nicer in the long run than
rsync, since the remote repository could have a bunch of mingled heads
and http-pull will get exclusively the interesting stuff. If you're trying
to push, then rpush, since that's the only push.

Personally, I've been using http://... for http-pull, rsync://... for
rsync, and //... for rpull/rpush (which is somewhat justified wrt the URI
standard for using the program's default method).
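
A minimal sketch of that prefix convention, in Python purely for
illustration. The name pick_transport is made up, and nothing here is
Cogito's actual dispatch code; it only mirrors the mapping described
above (http:// to http-pull, rsync:// to rsync, //... to rpull, and
rpush for any push).

```python
def pick_transport(location: str, pushing: bool = False) -> str:
    """Pick a transfer tool from the location prefix (illustrative only)."""
    if pushing:
        # rpush is the only push method mentioned in this thread.
        return "rpush"
    if location.startswith("http://"):
        return "http-pull"
    if location.startswith("rsync://"):
        return "rsync"
    if location.startswith("//"):
        # Scheme-less form: fall back to the program's default method.
        return "rpull"
    # Anything else (e.g. host:/path) is deliberately left undecided here.
    raise ValueError(f"no transport chosen for {location!r}")
```

A location such as kernel.org:/pub/scm/cogito/cogito.git matches none
of the prefixes, so this sketch raises an error rather than guessing.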

	-Daniel
*This .sig left intentionally blank*


* Re: [PATCH] Git-prune-script loses blobs referenced from an uncommitted cache.
From: Linus Torvalds @ 2005-05-03  2:56 UTC
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vis21no03.fsf@assigned-by-dhcp.cox.net>

On Mon, 2 May 2005, Junio C Hamano wrote:
>
> When a new blob is registered with update-cache, and before the cache
> is written as a tree and committed, git-fsck-cache will find the blob
> unreachable.  This patch fixes git-prune-script to keep such blob objects
> referenced from the cache.

Actually, I'd almost rather just have git-fsck-cache do a
"read_cache()" and walk through that, marking the sha1's as "needed".

That's useful for another reason: not only does it mean that we don't drop 
objects that may be in the current index, but it _also_ means that we 
check that the current index actually has everything that it needs. I had 
that situation a few times after I did a "convert-cache" - where I had an 
old index file that pointed to the old objects _before_ the conversion.
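
A rough sketch of that idea, in Python with plain sets standing in for
the object database and the index. All names are hypothetical; the real
fsck-cache is C and walks actual objects.

```python
def fsck_with_index(all_objects, reachable_from_heads, index_entries):
    """Return (prunable, missing) once index entries count as "needed".

    all_objects          -- ids present in the object database
    reachable_from_heads -- ids reachable from commits
    index_entries        -- ids referenced by the current index
    """
    needed = set(reachable_from_heads) | set(index_entries)
    # Only objects that neither a commit nor the index refers to may go.
    prunable = set(all_objects) - needed
    # Index entries pointing at objects we do not have, e.g. a stale
    # index left over from before a conversion.
    missing = set(index_entries) - set(all_objects)
    return prunable, missing
```

The first result covers the "don't drop objects in the index" case, the
second the "index points at objects that are gone" case.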

I noticed it the hard way, and happily it's easily fixed by just doing a 
"git-read-cache <new-head>", but it was actually very confusing when it 
happened, and it would have been good to have fsck-cache warn about it.

		Linus


* Re: Mercurial 0.4b vs git patchbomb benchmark
From: Linus Torvalds @ 2005-05-03  2:48 UTC
  To: Matt Mackall; +Cc: Bill Davidsen, Morten Welinder, Sean, linux-kernel, git
In-Reply-To: <20050503000011.GA22038@waste.org>

On Mon, 2 May 2005, Matt Mackall wrote:
> 
> Umm.. I am _not_ calculating the SHA of the delta itself. That'd be
> silly.

It's not silly.

Meta-data consistency is supremely important. If people can corrupt their 
metadata in strange and unobservable ways, that's almost as bad as 
corrupting the data itself. In fact, to some degree it's worse, since you 
make people trust the thing, but you don't actually guarantee it.

So how _do_ you guarantee consistency of a tree and the history that led 
up to it? 

And by that I don't mean any of the individual blobs - I realize that it's 
perfectly valid to just check out every single version, and have the sha1 
of that. But how do you guarantee that the sha's you check are the sha's 
that you saved in the first place, and somebody didn't replace something 
in the middle?

In other words, you need to hash the metadata too. Otherwise how do you
consistency-check the _collection_ of files?

It's absolutely not enough to just protect single-file content. That 
doesn't help one whit. It's not what a SCM is all about. You have to 
protect the state of _multiple_ files, ie the metadata has to be 
verifiable too.

If that meta-data is the index, then the index needs to be protected by a
SHA1. In git, that's why we don't just sha1 every blob, but every tree and
every commit. That's the thing that gets consistency _beyond_ a single
file.
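
To make the chaining concrete, here is a small Python sketch. The
object id follows the "<type> <size>" plus NUL header that the
sha1_file.c code in this thread hashes; the tree and commit bodies are
simplified stand-ins, not git's real formats, and the function names
are invented for the example.

```python
import hashlib

def object_id(obj_type: str, body: bytes) -> str:
    # Header is "<type> <size>" plus a NUL byte, then the raw body.
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()

def toy_tree(files: dict) -> str:
    # Simplified listing, NOT git's real tree format: one line per file,
    # each naming the blob id of that file's content.
    listing = "\n".join(
        f"{object_id('blob', data)} {name}" for name, data in sorted(files.items())
    )
    return object_id("tree", listing.encode())

def toy_commit(tree_id: str, parent_id: str, message: str) -> str:
    # Simplified commit body, again only for illustration.
    body = f"tree {tree_id}\nparent {parent_id}\n\n{message}\n"
    return object_id("commit", body.encode())
```

Changing one byte in one file changes its blob id, hence the tree id,
hence the commit id, so a single commit sha1 pins down the whole
collection of files.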

> As various people have pointed out, you can hack delta transmission
> and file revision indexing on top of git. But to do that, you'll need
> to build the same indices that Mercurial has. And you'll need to check
> their integrity.

No, absolutely not.

Building indices on top of git would be stupid. You can _cache_ deltas,
but there's a big difference between an index that actually describes how
random blobs go together, and a cache of a delta between two
well-specified end-points. And in particular, there is no "consistency" to
a delta. You don't need it.

Why? Because either the delta is correct, or it isn't. If it's correct,
the end result will be the right sha1. If it's not, the end result will be
something else. So when you do a "pull" from another repository, you can
trivially check whether the deltas you got were valid: did applying them
result in the same sha1 that the other repository had?

So git really validates the _only_ thing that matters: it validates the 
state of the data. It doesn't validate anything else, but it validates 
that one thing very completely indeed.
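
As an illustration of that check, a Python sketch with a toy delta
format (a list of copy/insert steps). The format and the function names
are assumptions made up for this example, not zdelta's or git's.

```python
import hashlib

def object_id(obj_type: str, body: bytes) -> str:
    # "<type> <size>" plus a NUL byte, then the raw body.
    header = f"{obj_type} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()

def apply_delta(base: bytes, delta) -> bytes:
    # Toy delta: ("copy", offset, length) takes bytes from the base,
    # ("insert", data) adds new bytes.
    out = bytearray()
    for step in delta:
        if step[0] == "copy":
            _, offset, length = step
            out += base[offset:offset + length]
        else:
            out += step[1]
    return bytes(out)

def delta_is_valid(base: bytes, delta, expected_id: str) -> bool:
    # The delta carries no checksum of its own: only the result is checked.
    return object_id("blob", apply_delta(base, delta)) == expected_id
```

A corrupted delta simply produces a different sha1 and is rejected, so
nothing about the delta itself has to be trusted.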

		Linus


* Re: [PATCH] delta compressed git
From: Nicolas Pitre @ 2005-05-03  2:43 UTC
  To: Chris Mason; +Cc: git
In-Reply-To: <200505022130.10958.mason@suse.com>

On Mon, 2 May 2005, Chris Mason wrote:



* Re: [PATCH] Add exclude file support to cg-status
From: Matt Porter @ 2005-05-03  2:33 UTC
  To: Junio C Hamano; +Cc: Petr Baudis, git
In-Reply-To: <7vd5s9nmio.fsf@assigned-by-dhcp.cox.net>

On Mon, May 02, 2005 at 06:09:19PM -0700, Junio C Hamano wrote:
> >>>>> "MP" == Matt Porter <mporter@kernel.crashing.org> writes:
> 
> MP> Adds a trivial per-repository exclude file implementation for
> MP> cg-status on top of the new git-ls-files option.
> 
>  
> MP> +EXCLUDEFILE=.git/exclude
> 
> Good intentions, but shouldn't the file be .git/info/exclude
> (i.e. under .git/info)?

My reasoning for not doing something like this was that there is
only ever one exclude file.  In other instances of cogito-specific
data in the .git directory, there is a subdir named for the class
of data being stored there (e.g. branches, refs).  In this case,
it didn't seem necessary.  On the other hand, this made me
wonder whether there should just be a .git/cginfo subdir under
which exclude, branches, refs, etc. all live, since they are
cogito-specific functionality. Something like:

.git/cginfo/

	    exclude
	    branches/
	    refs/

and so on...

-Matt


* Re: More problems...
From: Petr Baudis @ 2005-05-03  1:48 UTC
  To: Linus Torvalds
  Cc: Anton Altaparmakov, Russell King, Junio C Hamano, Ryan Anderson,
	git
In-Reply-To: <Pine.LNX.4.58.0505021509530.3594@ppc970.osdl.org>

Dear diary, on Tue, May 03, 2005 at 12:19:16AM CEST, I got a letter
where Linus Torvalds <torvalds@osdl.org> told me that...
> But for "normal" situations, where you have a tree or two, the hardlinking 
> win might not be big enough to warrant the maintenance headache. With 
> hardlinking, you _do_ need to "trust" the other trees to some degree.

As long as the trees aren't yours and you aren't doing something really
horrible with them...

$ time git-local-pull -a -l $(cat ~/git-devel/.git/HEAD) ~/git-devel/.git/
real    0m0.332s

$ time git-local-pull -a $(cat ~/git-devel/.git/HEAD) ~/git-devel/.git/
real    0m4.306s

And this is only a 13M Cogito object database. I think one of the
important things is to encourage branching, therefore it must be fast
enough; that's why I really wanted to do hardlinks. The disk space is
important, but the speed hit is probably equally (if not more) so.

BTW, the object database files should have mode 0444 or such; they really
_are_ read-only and making them so mode-wise could help against some
mistakes too.
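
For illustration, a small Python sketch of the two ways of sharing an
object file, plus the read-only mode. The helper names share_object and
demo are invented here, and the sketch assumes a POSIX filesystem that
supports hardlinks.

```python
import os
import shutil
import stat
import tempfile

def share_object(src: str, dst: str, hardlink: bool = True) -> None:
    """Make dst available from src, by hardlink or by full copy."""
    if hardlink:
        os.link(src, dst)          # same inode, no data copied
    else:
        shutil.copyfile(src, dst)  # independent copy of the bytes
    # Object files never change, so drop write permission (0444).
    # For a hardlink this also affects src, since both names share an inode.
    os.chmod(dst, 0o444)

def demo():
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "object")
        with open(src, "wb") as f:
            f.write(b"payload")
        linked = os.path.join(d, "linked")
        copied = os.path.join(d, "copied")
        share_object(src, linked, hardlink=True)
        share_object(src, copied, hardlink=False)
        ino = os.stat(src).st_ino
        return (
            os.stat(linked).st_ino == ino,   # hardlink shares the inode
            os.stat(copied).st_ino == ino,   # copy does not
            stat.S_IMODE(os.stat(linked).st_mode),
        )
```

The hardlink path touches only directory entries, which is where the
speed difference in the timings above comes from; the shared inode is
also why the other tree has to be trusted.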

It's clear that Cogito should have a way to choose whether to hardlink
or copy; the question is which one should be the default one and how
should it be specified.  I thought about using file:// vs. just local
path to differentiate between copy and hardlinking, but that'd be
totally non-obvious, therefore bad UI-wise.

BTW, I've just committed support for pulling from remote repositories
over the HTTP and SSH protocols (http://your.git/repo,
git+ssh://root@git.nasa.gov/srv/git/mars) (note that I was unable to
test the SSH stuff properly now; success reports or patches welcome).
Also, the local hardlinking access is now done over git-local-pull,
therefore the cp errors should go away now.

I'm not yet decided whether locations like

	kernel.org:/pub/scm/cogito/cogito.git

should invoke rsync, rpull, throw an error or print a fortune cookie.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor


* Re: How to get bash to shut up about SIGPIPE?
From: Paul Jackson @ 2005-05-03  1:44 UTC
  To: Petr Baudis; +Cc: rene.scharfe, torvalds, git
In-Reply-To: <20050502231743.GL20818@pasky.ji.cz>

> Could you elaborate on how exactly is it supposed to help?

The key code is in bash/jobs.c.

If you have a bash while or for loop feeding a pipe that shuts down
while the loop is still running commands that try to write the pipe
(perhaps you were pipe'ing to "head -1", and it exit'd, having read its
one line), then the next command to attempt to write that pipe will die,
and the bash instance that is handling that loop (and forked that
command that just died) will notice the command died with a SIGPIPE
signal.

At this point, one of three possible things happens:

 1) If your bash is compiled with DONT_REPORT_SIGPIPE defined, then
    that bash instance quietly leaves.  The consensus around here
    is that this is "good(tm)."

 2) If not so compiled, then:

	2a) If you set a trap on SIGPIPE in that shell, it prints:

		fprintf (stderr, "%s", j_strsignal (termsig));

	2b) Else if you did not trap SIGPIPE, it prints:

		fprintf (stderr, "%s: line %d: ", get_name_for_error (),
				(line_number == 0) ? 1 : line_number);
		pretty_print_job (job, JLIST_NONINTERACTIVE, stderr);

The pretty_print_job() in (2b) can be a multi-line confusion.

Sample output for (2a):

	Broken pipe

Sample output in simple one line case for (2b):

	foo: line 2: 11663 Broken pipe             cat /etc/termcap
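
The underlying event (a writer killed by SIGPIPE once its reader goes
away) can be reproduced outside bash. The following is a hedged Python
sketch for POSIX systems; it says nothing about bash's reporting logic,
and the names CHILD and writer_exit_status are invented for the example.
The child resets SIGPIPE to its default action because the Python
interpreter ignores that signal at startup.

```python
import signal
import subprocess
import sys

# Child program: restore default SIGPIPE handling, then write forever.
CHILD = (
    "import signal, sys\n"
    "signal.signal(signal.SIGPIPE, signal.SIG_DFL)\n"
    "while True:\n"
    "    sys.stdout.write('line\\n')\n"
)

def writer_exit_status() -> int:
    proc = subprocess.Popen([sys.executable, "-c", CHILD],
                            stdout=subprocess.PIPE)
    proc.stdout.readline()   # read one line, like "head -1"
    proc.stdout.close()      # reader goes away; the pipe is now broken
    # A negative return code means "killed by that signal number".
    return proc.wait()
```

The parent here plays the role of the bash instance that notices its
command died with SIGPIPE; what it prints at that point is the part that
differs between the cases listed above.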

A couple of others are reporting different behaviour than what I report
above - including Linus.

So it is almost certain that I don't understand all I know about this.

What behaviour do you see?

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@engr.sgi.com> 1.650.933.1373, 1.925.600.0401


* [PATCH] delta compressed git
From: Chris Mason @ 2005-05-03  1:30 UTC
  To: git

[-- Attachment #1: Type: text/plain, Size: 1817 bytes --]

Hello everyone,

Here's an early form of some code for delta compression in git archives.  It 
builds on top of my packed item patch from before.  Using this patch will 
create repositories that can't be read by unpatched git, and it is only ready 
for light testing.  The file format might change slightly in later revs.

Deltas live as subfiles in packed files, and the packed item header has the 
sha1 of the file the delta is against. Deltas are never taken against deltas, 
only whole files (so the chain length is only 1).  
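
As a rough model of that header layout, following create_packed_header()
and find_packed_header() in the attached delta-tree.diff: each entry is
a 20-byte sha1, then "<type> <len>" and a NUL byte, then (for deltas
only) the 20-byte sha1 of the reference file. The Python below is only a
sketch of that reading; build_header and find_item are names invented
for the example.

```python
def build_header(items) -> bytes:
    """items: list of (sha1, type, length, ref_sha1_or_None).

    sha1 values are 20 raw bytes, as in the packed header.
    """
    out = bytearray()
    for sha1, obj_type, length, ref in items:
        out += sha1
        out += f"{obj_type} {length}".encode() + b"\0"
        if obj_type == "delta":
            out += ref  # sha1 of the whole file the delta is against
    return bytes(out)

def find_item(header: bytes, wanted: bytes):
    """Return (type, ref_sha1_or_None, data_offset) for wanted, or None."""
    pos = 0
    offset = 0  # sum of the lengths of the items stored before this one
    while pos < len(header):
        sha1 = header[pos:pos + 20]
        end = header.index(b"\0", pos + 20)
        obj_type, length = header[pos + 20:end].decode().split(" ")
        pos = end + 1
        ref = None
        if obj_type == "delta":
            ref = header[pos:pos + 20]
            pos += 20
        if sha1 == wanted:
            return obj_type, ref, offset
        offset += int(length)
    return None
```

In the patch, the offset found this way is added to the size of the
compressed header to locate the item's data inside the packed file.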

When importing all of Ingo's bk->cvs patches into git (28,000 changesets), 
delta git applies the patches faster (2hrs vs 2.5hrs), consumes less space 
(900MB vs 2.5GB), and checks out the resulting git tree faster in hot and 
cold caches.

Another 200MB or so would be saved by packing trees and commits into the same 
files as the blobs.  This is easy to do, but makes the patch harder to 
maintain because I need to move code around in commit-tree.c and 
write-tree.c.  So I've left those bits out for now.

Because the packed files are created per changeset, if a changeset only 
modifies one file the delta will still end up using a whole block.  So, you 
could get much higher space savings with a tool to walk back over existing 
changesets and pack them together.  This doesn't exist yet, but wouldn't be 
difficult, and I expect it to get close to the mercurial/bk repository sizes.

The patch uses zdelta for delta compression, which you can download here:
http://cis.poly.edu/zdelta/

I'm open to suggestions on better delta libs.  I picked this one because it 
was easy to code.  In order for things to work with git you need to apply the 
attached zdelta.diff to the zdelta-2.1 sources.  It fixes a silly default in 
the Makefile and a symbol collision with zlib.

-chris

[-- Attachment #2: zdelta.diff --]
[-- Type: text/x-diff, Size: 1715 bytes --]

diff -ur zdelta-2.1.orig/infcodes.c zdelta-2.1/infcodes.c
--- zdelta-2.1.orig/infcodes.c	2003-10-26 19:30:09.000000000 -0500
+++ zdelta-2.1/infcodes.c	2005-05-02 16:03:37.000000000 -0400
@@ -145,7 +145,7 @@
     if (m >= MAX_MATCH && n >= 10) 
     {
       UPDATE
-      r = inflate_fast(c->lbits, c->dbits, c->zdbits, 
+      r = zd_inflate_fast(c->lbits, c->dbits, c->zdbits, 
 		       c->ltree, c->dtree, c->zdtree, s, z);
       LOAD
       if (r != ZD_OK)
Only in zdelta-2.1: infcodes.o
diff -ur zdelta-2.1.orig/inffast.c zdelta-2.1/inffast.c
--- zdelta-2.1.orig/inffast.c	2003-10-26 19:30:09.000000000 -0500
+++ zdelta-2.1/inffast.c	2005-05-02 16:03:20.000000000 -0400
@@ -8,7 +8,7 @@
 /* zdelta:
  *
  * modified: 
- *          inflate_fast
+ *          zd_inflate_fast
  * added:
  *          --
  * removed:
@@ -41,7 +41,7 @@
 /*
  * zdelta: modified
  */
-int inflate_fast(bl, bd, bzd, tl, td, tzd, s, z)
+int zd_inflate_fast(bl, bd, bzd, tl, td, tzd, s, z)
 uInt bl, bd, bzd;
 inflate_huft *tl;
 inflate_huft *td;
diff -ur zdelta-2.1.orig/inffast.h zdelta-2.1/inffast.h
--- zdelta-2.1.orig/inffast.h	2003-10-26 19:30:13.000000000 -0500
+++ zdelta-2.1/inffast.h	2005-05-02 16:02:58.000000000 -0400
@@ -22,7 +22,7 @@
 
 #ifndef ZD_INFFAST_H
 #define ZD_INFFAST_H
-extern int inflate_fast OF((
+extern int zd_inflate_fast OF((
     uInt,
     uInt,
     uInt,
diff -ur zdelta-2.1.orig/Makefile zdelta-2.1/Makefile
--- zdelta-2.1.orig/Makefile	2004-02-13 18:19:51.000000000 -0500
+++ zdelta-2.1/Makefile	2005-05-02 15:30:08.000000000 -0400
@@ -35,7 +35,7 @@
 
 CC=gcc
 
-CFLAGS= -O2 -W -Wall -pedantic -ansi -g -DREFNUM=2
+CFLAGS= -O2 -W -Wall -pedantic -ansi -g -DREFNUM=1
 
 LDSHARED=$(CC)
 CPP=$(CC) -E

[-- Attachment #3: delta-tree.diff --]
[-- Type: text/x-diff, Size: 23440 bytes --]

Index: Makefile
===================================================================
--- 89fdfd09b281fdf5071bc13a30ef683bd6851b61/Makefile  (mode:100644 sha1:2d2913b6b98ac836b43755b1304d2a838dad87dd)
+++ uncommitted/Makefile  (mode:100644)
@@ -36,7 +36,7 @@
 LIB_OBJS += diff.o
 
 LIBS = $(LIB_FILE)
-LIBS += -lz
+LIBS += -lzd -lz
 
 ifdef MOZILLA_SHA1
   SHA1_HEADER="mozilla-sha1/sha1.h"
Index: cache.h
===================================================================
--- 89fdfd09b281fdf5071bc13a30ef683bd6851b61/cache.h  (mode:100644 sha1:3277d48708f885fa1b7cc56c9d16061c65a2eeb9)
+++ uncommitted/cache.h  (mode:100644)
@@ -16,6 +16,7 @@
 
 #include SHA1_HEADER
 #include <zlib.h>
+#include <zdlib.h>
 
 /*
  * Basic data structures for the directory cache
@@ -64,6 +65,18 @@
 	char name[0];
 };
 
+struct packed_item {
+	/* length of compressed data */
+	unsigned long len;
+	struct packed_item *next;
+	/* sha1 of uncompressed data */
+	char sha1[20];
+	char refsha1[20];
+	char type[20];
+	/* compressed data */
+	char *data;
+};
+
 #define CE_NAMEMASK  (0x0fff)
 #define CE_STAGEMASK (0x3000)
 #define CE_STAGESHIFT 12
@@ -119,7 +132,7 @@
 
 /* Read and unpack a sha1 file into memory, write memory to a sha1 file */
 extern void * map_sha1_file(const unsigned char *sha1, unsigned long *size);
-extern void * unpack_sha1_file(void *map, unsigned long mapsize, char *type, unsigned long *size);
+extern void * unpack_sha1_file(const unsigned char *sha1, void *map, unsigned long mapsize, char *type, unsigned long *size);
 extern void * read_sha1_file(const unsigned char *sha1, char *type, unsigned long *size);
 extern int write_sha1_file(char *buf, unsigned long len, const char *type, unsigned char *return_sha1);
 
@@ -135,6 +148,10 @@
 /* Convert to/from hex/sha1 representation */
 extern int get_sha1_hex(const char *hex, unsigned char *sha1);
 extern char *sha1_to_hex(const unsigned char *sha1);	/* static buffer result! */
+extern int pack_sha1_buffer(void *buf, unsigned long buf_len, char *type,
+                            unsigned char *returnsha1, unsigned char *refsha1, 
+			    struct packed_item **);
+int write_packed_buffer(struct packed_item *head);
 
 /* General helper functions */
 extern void usage(const char *err);
Index: fsck-cache.c
===================================================================
--- 89fdfd09b281fdf5071bc13a30ef683bd6851b61/fsck-cache.c  (mode:100644 sha1:f9b1431dd8f4f3b426a7e410de952277aaa11401)
+++ uncommitted/fsck-cache.c  (mode:100644)
@@ -142,7 +142,7 @@
 		if (map) {
 			char type[100];
 			unsigned long size;
-			void *buffer = unpack_sha1_file(map, mapsize, type, &size);
+			void *buffer = unpack_sha1_file(sha1, map, mapsize, type, &size);
 			if (!buffer)
 				return -1;
 			if (check_sha1_signature(sha1, buffer, size, type) < 0)
Index: git-mktag.c
===================================================================
--- 89fdfd09b281fdf5071bc13a30ef683bd6851b61/git-mktag.c  (mode:100644 sha1:5d2830dc2bdfa2e76afc3fd4687db8faffaefba2)
+++ uncommitted/git-mktag.c  (mode:100644)
@@ -31,7 +31,7 @@
 	if (map) {
 		char type[100];
 		unsigned long size;
-		void *buffer = unpack_sha1_file(map, mapsize, type, &size);
+		void *buffer = unpack_sha1_file(sha1,map,mapsize,type,&size);
 
 		if (buffer) {
 			if (!strcmp(type, expected_type))
Index: sha1_file.c
===================================================================
--- 89fdfd09b281fdf5071bc13a30ef683bd6851b61/sha1_file.c  (mode:100644 sha1:db2880e389e556dd3a5eef02aa8a3bb235528057)
+++ uncommitted/sha1_file.c  (mode:100644)
@@ -139,31 +139,195 @@
 	return map;
 }
 
-void * unpack_sha1_file(void *map, unsigned long mapsize, char *type, unsigned long *size)
+static int find_packed_header(const unsigned char *sha1, char *buf, unsigned long buf_len,
+		              unsigned char *refsha1, char *type, unsigned long *offset)
+{
+	char *p;
+	p = buf;
+
+	*offset = 0;
+	while(p < buf + buf_len) {
+		unsigned long item_len;
+		unsigned char item_sha[20];
+		memcpy(item_sha, p, 20);
+		sscanf(p + 20, "%s %lu ", type, &item_len);
+		p += 20 + strlen(p + 20) + 1;
+		if (strcmp(type, "delta") == 0) {
+			memcpy(refsha1, p, 20);
+			p += 20;
+		}
+		if (memcmp(item_sha, sha1, 20) == 0)
+			return 0;
+		*offset += item_len;
+	}
+	return -1;
+}
+
+
+static void * _unpack_sha1_file(z_stream *stream, const unsigned char *sha1, void *map, 
+			unsigned long mapsize, char *type, unsigned long *size)
 {
 	int ret, bytes;
+	char buffer[8192];
+	char *buf;
+
+	/* Get the data stream */
+	memset(stream, 0, sizeof(*stream));
+	stream->next_in = map;
+	stream->avail_in = mapsize;
+	stream->next_out = buffer;
+	stream->avail_out = sizeof(buffer);
+
+	inflateInit(stream);
+	ret = inflate(stream, 0);
+	if (ret < Z_OK) {
+		return NULL;
+	}
+	if (sscanf(buffer, "%10s %lu", type, size) != 2) {
+		return NULL;
+	}
+	bytes = strlen(buffer) + 1;
+	buf = xmalloc(*size);
+
+	memcpy(buf, buffer + bytes, stream->total_out - bytes);
+	bytes = stream->total_out - bytes;
+	if (bytes < *size && ret == Z_OK) {
+		stream->next_out = buf + bytes;
+		stream->avail_out = *size - bytes;
+		while (inflate(stream, Z_FINISH) == Z_OK)
+			/* nothing */;
+	}
+	inflateEnd(stream);
+	return buf;
+}
+static int find_sha1_ref(unsigned char *sha1)
+{
+	unsigned char foundsha1[20];
 	z_stream stream;
+	char *buf;
+	unsigned long header_len;
+	char type[20];
+	char *map;
+	unsigned long mapsize;
+	unsigned long offset;
+
+	map = map_sha1_file(sha1, &mapsize);
+	if (!map)
+		return -1;
+	buf = _unpack_sha1_file(&stream, sha1, map, mapsize, type, &header_len);
+
+	if (!buf)
+		goto fail;
+	if (strcmp(type, "packed"))
+		goto fail;
+        if (find_packed_header(sha1, buf, header_len, foundsha1, type, &offset))
+		goto fail;
+	munmap(map, mapsize);
+	free(buf);
+
+	if (strcmp(type, "delta"))
+		return 0;
+	memcpy(sha1, foundsha1, 20);
+	return 0;
+fail:
+	munmap(map, mapsize);
+	free(buf);
+	return -1;
+}
+
+static void * unpack_delta(char *refsha1, char *delta_start, 
+			   unsigned long delta_len, char *type, 
+			   unsigned long *size)
+{
+	zd_stream dstream;
+	int ret, bytes;
 	char buffer[8192];
 	char *buf;
+	char *refbuffer = NULL;
+	unsigned long refsize = 0;
 
+	memset(&dstream, 0, sizeof(dstream));
+	refbuffer = read_sha1_file(refsha1, type, &refsize);
+	if (!refbuffer) {
+		return NULL;
+	}
+	dstream.base[0] = refbuffer;
+	dstream.base_avail[0] = refsize;
+	dstream.refnum = 1;
+	dstream.next_in = delta_start;
+	dstream.avail_in = delta_len;
+	dstream.next_out = buffer;
+	dstream.avail_out = sizeof(buffer);
+	ret = zd_inflateInit(&dstream);
+	ret = zd_inflate(&dstream, 0);
+	if (sscanf(buffer, "%10s %lu", type, size) != 2) {
+		free(refbuffer);
+		return NULL;
+	}
+	bytes = strlen(buffer) + 1;
+	buf = xmalloc(*size);
+	memcpy(buf, buffer + bytes, 
+		dstream.total_out - bytes);
+	bytes = dstream.total_out - bytes;
+	if (bytes < *size && ret == ZD_OK) {
+		dstream.next_out = buf + bytes;
+		dstream.avail_out = *size - bytes;
+		while (zd_inflate(&dstream, ZD_FINISH) == ZD_OK)
+			/* nothing */;
+	}
+	zd_inflateEnd(&dstream);
+	free(refbuffer);
+	return buf;
+}
+
+void * unpack_sha1_file(const unsigned char *sha1, void *map, 
+			unsigned long mapsize, char *type, unsigned long *size)
+{
+	int ret, bytes;
+	z_stream stream;
+	char buffer[8192];
+	char *buf;
+	unsigned long offset;
+	unsigned long header_len;
+	unsigned char refsha1[20];
+	unsigned char headertype[20];
+
+	buf = _unpack_sha1_file(&stream, sha1, map, mapsize, type, size);
+	if (!buf)
+		return buf;
+	if (strcmp(type, "packed"))
+		return buf;
+
+	if (!sha1) {
+		free(buf);
+		return NULL;
+	}
+	header_len = *size;
+        if (find_packed_header(sha1, buf, header_len, refsha1, headertype, &offset)) {
+		free(buf);
+		return NULL;
+	}
+	offset += stream.total_in;
+	free(buf);
+	if (!strcmp(headertype, "delta"))
+		return unpack_delta(refsha1, map+offset, mapsize-offset, type,size);
 	/* Get the data stream */
 	memset(&stream, 0, sizeof(stream));
-	stream.next_in = map;
-	stream.avail_in = mapsize;
+	buf = NULL;
+	stream.next_in = map + offset;
+	stream.avail_in = mapsize - offset;
 	stream.next_out = buffer;
 	stream.avail_out = sizeof(buffer);
+	ret = inflateInit(&stream);
 
-	inflateInit(&stream);
 	ret = inflate(&stream, 0);
-	if (ret < Z_OK)
-		return NULL;
-	if (sscanf(buffer, "%10s %lu", type, size) != 2)
+	if (sscanf(buffer, "%10s %lu", type, size) != 2) {
 		return NULL;
-
+	}
 	bytes = strlen(buffer) + 1;
 	buf = xmalloc(*size);
-
-	memcpy(buf, buffer + bytes, stream.total_out - bytes);
+	memcpy(buf, buffer + bytes, 
+		stream.total_out - bytes);
 	bytes = stream.total_out - bytes;
 	if (bytes < *size && ret == Z_OK) {
 		stream.next_out = buf + bytes;
@@ -182,7 +346,7 @@
 
 	map = map_sha1_file(sha1, &mapsize);
 	if (map) {
-		buf = unpack_sha1_file(map, mapsize, type, size);
+		buf = unpack_sha1_file(sha1, map, mapsize, type, size);
 		munmap(map, mapsize);
 		return buf;
 	}
@@ -268,7 +432,8 @@
 	/* Set it up */
 	memset(&stream, 0, sizeof(stream));
 	deflateInit(&stream, Z_BEST_COMPRESSION);
-	size = deflateBound(&stream, len+hdrlen);
+	// size = zd_deflateBound(&stream, len+hdrlen);
+	size = len + hdrlen + 12;
 	compressed = xmalloc(size);
 
 	/* Compress it */
@@ -413,3 +578,323 @@
 		return 1;
 	return 0;
 }
+
+static void *pack_delta_buffer(void *buf, unsigned long buf_len, char *metadata, int metadata_size, unsigned long *compsize, unsigned char *refsha1)
+{
+	char *compressed;
+	zd_stream stream;
+	unsigned long size;
+	char *refbuffer = NULL;
+	char reftype[20];
+	unsigned long refsize = 0;
+	int ret;
+
+	if (find_sha1_ref(refsha1)) {
+		return NULL;
+	}
+	refbuffer = read_sha1_file(refsha1, reftype, &refsize);
+
+	/* note, we could just continue without the delta here */
+	if (!refbuffer) {
+		free(refbuffer);
+		return NULL;
+	}
+
+	/* Set it up */
+	memset(&stream, 0, sizeof(stream));
+	/* TODO, real deflate bound here */
+	size = buf_len + metadata_size + 12;
+	compressed = xmalloc(size);
+
+	/*
+	 * ASCII size + nul byte
+	 */	
+	stream.base[0] = refbuffer;
+	stream.base_avail[0] = refsize;
+	stream.refnum = 1;
+	stream.next_in = metadata;
+	stream.avail_in = metadata_size;
+	stream.next_out = compressed;
+	stream.avail_out = size;
+	ret = zd_deflateInit(&stream, ZD_BEST_COMPRESSION);
+	/* TODO check for -ENOMEM */
+	while ((ret = zd_deflate(&stream, 0)) == ZD_OK)
+		/* nothing */;
+
+	stream.next_in = buf;
+	stream.avail_in = buf_len;
+	/* Compress it */
+	while ((ret = zd_deflate(&stream, ZD_FINISH)) == ZD_OK)
+		/* nothing */;
+	ret = zd_deflateEnd(&stream);
+	size = stream.total_out;
+	*compsize = size;
+
+	/* ugh, we're comparing the compressed size against the uncompressed size
+	 * of the reference buffer.  But, this is as good as we can do without
+	 * an extra read
+	 */
+	if (size > refsize) {
+		free(refbuffer);
+		free(compressed);
+		return NULL;
+	}
+	free(refbuffer);
+	return compressed;
+}
+
+static void *pack_buffer(void *buf, unsigned long buf_len, char *metadata, int metadata_size, unsigned long *compsize)
+{
+	char *compressed;
+	z_stream stream;
+	unsigned long size;
+	int ret;
+
+	/* Set it up */
+	memset(&stream, 0, sizeof(stream));
+	/* TODO, real deflate bound here */
+	size = buf_len + metadata_size + 12;
+	compressed = xmalloc(size);
+
+	/*
+	 * ASCII size + nul byte
+	 */	
+	stream.next_in = metadata;
+	stream.avail_in = metadata_size;
+	stream.next_out = compressed;
+	stream.avail_out = size;
+	ret = deflateInit(&stream, Z_BEST_COMPRESSION);
+	/* TODO check for -ENOMEM */
+	while ((ret = deflate(&stream, 0)) == Z_OK)
+		/* nothing */;
+
+	stream.next_in = buf;
+	stream.avail_in = buf_len;
+	/* Compress it */
+	while ((ret = deflate(&stream, Z_FINISH)) == Z_OK)
+		/* nothing */;
+	ret = deflateEnd(&stream);
+	size = stream.total_out;
+	*compsize = size;
+	return compressed;
+}
+
+int pack_sha1_buffer(void *buf, unsigned long buf_len, char *type,
+		     unsigned char *returnsha1,
+		     unsigned char *refsha1,
+		     struct packed_item **packed_item)
+{
+	unsigned char sha1[20];
+	SHA_CTX c;
+	char *filename;
+	struct stat st;
+	char *compressed = NULL;
+	unsigned long size;
+	struct packed_item *item;
+	char *metadata = xmalloc(200);
+	int metadata_size;
+	int delta = 0;
+
+	*packed_item = NULL;
+
+	metadata_size = 1 + sprintf(metadata, "%s %lu", type, buf_len);
+
+	/* Sha1.. */
+	SHA1_Init(&c);
+	SHA1_Update(&c, metadata, metadata_size);
+	SHA1_Update(&c, buf, buf_len);
+	SHA1_Final(sha1, &c);
+
+	if (returnsha1)
+		memcpy(returnsha1, sha1, 20);
+
+	filename = sha1_file_name(sha1);
+	if (stat(filename, &st) == 0)
+		goto out;
+
+	
+	if (refsha1) {
+		compressed = pack_delta_buffer(buf, buf_len, metadata, metadata_size, &size, refsha1);
+		delta = (compressed != NULL);
+	}
+	if (!compressed) {
+		compressed = pack_buffer(buf, buf_len, metadata, metadata_size, &size);
+	}
+	free(metadata);
+	if (!compressed) {
+		return -1;
+	}
+	item = xmalloc(sizeof(struct packed_item));
+	memcpy(item->sha1, sha1, 20);
+	if (delta) {
+		strcpy(item->type, "delta");
+		memcpy(item->refsha1, refsha1, 20);
+	} else {
+		strcpy(item->type, type);
+		memset(item->refsha1, 0, 20);
+	}
+	item->len = size;
+	item->next = NULL;
+	item->data = compressed;
+	*packed_item = item;
+out:
+	return 0;
+}
+
+static char *create_packed_header(struct packed_item *head, unsigned long *size)
+{
+	char *metadata = NULL;
+	int metadata_size = 0;
+	*size = 0;
+	int entry_size = 0;
+
+	while(head) {
+		char *p;
+		metadata = realloc(metadata, metadata_size + 220);
+		if (!metadata)
+			return NULL;
+		p = metadata+metadata_size;
+		memcpy(p, head->sha1, 20);
+		p += 20;
+		entry_size = 1 + sprintf(p, "%s %lu", head->type, head->len);
+		metadata_size += entry_size + 20;
+		if (strcmp(head->type, "delta") == 0) {
+			memcpy(p + entry_size, head->refsha1, 20);
+			metadata_size += 20;
+		}
+
+		head = head->next;
+	}
+	*size = metadata_size;
+	return metadata;
+}
+
+#define WRITE_BUFFER_SIZE 8192
+static char write_buffer[WRITE_BUFFER_SIZE];
+static unsigned long write_buffer_len;
+
+static int c_write(int fd, void *data, unsigned int len)
+{
+	while (len) {
+		unsigned int buffered = write_buffer_len;
+		unsigned int partial = WRITE_BUFFER_SIZE - buffered;
+		if (partial > len)
+			partial = len;
+		memcpy(write_buffer + buffered, data, partial);
+		buffered += partial;
+		if (buffered == WRITE_BUFFER_SIZE) {
+			if (write(fd, write_buffer, WRITE_BUFFER_SIZE) != WRITE_BUFFER_SIZE)
+				return -1;
+			buffered = 0;
+		}
+		write_buffer_len = buffered;
+		len -= partial;
+		data += partial;
+ 	}
+ 	return 0;
+}
+
+static int c_flush(int fd)
+{
+	if (write_buffer_len) {
+		int left = write_buffer_len;
+		if (write(fd, write_buffer, left) != left)
+			return -1;
+		write_buffer_len = 0;
+	}
+	return 0;
+}
+int write_packed_buffer(struct packed_item *head)
+{
+	unsigned char sha1[20];
+	SHA_CTX c;
+	char *filename;
+	char *metadata = xmalloc(200);
+	char *header;
+	int metadata_size;
+	int fd;
+	int ret = 0;
+	unsigned long header_len;
+	struct packed_item *item;
+	char *compressed;
+	z_stream stream;
+	unsigned long size;
+	int zdret;
+
+	header = create_packed_header(head, &header_len);
+	metadata_size = 1+sprintf(metadata, "packed %lu", header_len);
+	/* 
+	 * the header contains the sha1 of each item, so we only sha1 the
+	 * header
+	 */ 
+	SHA1_Init(&c);
+	SHA1_Update(&c, metadata, metadata_size);
+	SHA1_Update(&c, header, header_len);
+	SHA1_Final(sha1, &c);
+
+	filename = strdup(sha1_file_name(sha1));
+	fd = open(filename, O_WRONLY | O_CREAT | O_EXCL, 0666);
+	if (fd < 0) {
+		/* add collision check! */
+		if (errno != EEXIST) {
+			ret = -errno;
+		}
+		goto out_nofile;
+	}
+	/* compress just the header info */
+	memset(&stream, 0, sizeof(stream));
+	deflateInit(&stream, Z_BEST_COMPRESSION);
+	/* TODO, bounds check */
+	size = header_len + metadata_size + 12;
+	compressed = xmalloc(size);
+
+	stream.next_in = metadata;
+	stream.avail_in = metadata_size;
+	stream.next_out = compressed;
+	stream.avail_out = size;
+	while ((zdret = deflate(&stream, 0)) == Z_OK)
+		/* nothing */;
+	stream.next_in = header;
+	stream.avail_in = header_len;
+	while ((zdret = deflate(&stream, Z_FINISH)) == Z_OK)
+		/* nothing */;
+	zdret = deflateEnd(&stream);
+	size = stream.total_out;
+
+	c_write(fd, compressed, size);
+	free(compressed);
+
+	item = head;
+	while(item) {
+		if (c_write(fd, item->data, item->len)) {
+			ret = -EIO;
+			goto out;
+		}
+		item = item->next;
+	}
+	if (c_flush(fd)) {
+		ret = -EIO;
+		goto out;
+	}
+	item = head;
+	while(item) {
+		char *item_file;
+		struct packed_item *next = item->next;
+		item_file = sha1_file_name(item->sha1);
+		if (link(filename, item_file) && errno != EEXIST) {
+			ret = -errno;
+			break;
+		}
+		free(item->data);
+		free(item);
+		item = next;
+	}
+	unlink(filename);
+out:
+	close(fd);
+out_nofile:
+	free(header);
+	free(metadata);
+	free(filename);
+	return ret;
+}
Index: update-cache.c
===================================================================
--- 89fdfd09b281fdf5071bc13a30ef683bd6851b61/update-cache.c  (mode:100644 sha1:16e1bb9aea6413db35039042289605124d759501)
+++ uncommitted/update-cache.c  (mode:100644)
@@ -31,55 +31,39 @@
 	return (unsigned long)ptr > (unsigned long)-1000L;
 }
 
-static int index_fd(unsigned char *sha1, int fd, struct stat *st)
+static int index_fd(unsigned char *sha1, unsigned char *refsha1, int fd, struct stat *st, struct packed_item **head, struct packed_item **tail, unsigned long *packed_size, int *packed_nr)
 {
-	z_stream stream;
 	unsigned long size = st->st_size;
-	int max_out_bytes = size + 200;
-	void *out = xmalloc(max_out_bytes);
-	void *metadata = xmalloc(200);
-	int metadata_size;
 	void *in;
-	SHA_CTX c;
+	int ret;
+	struct packed_item *new_item;
 
 	in = "";
 	if (size)
 		in = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
 	close(fd);
-	if (!out || (int)(long)in == -1)
+	if ((int)(long)in == -1) {
 		return -1;
-
-	metadata_size = 1+sprintf(metadata, "blob %lu", size);
-
-	SHA1_Init(&c);
-	SHA1_Update(&c, metadata, metadata_size);
-	SHA1_Update(&c, in, size);
-	SHA1_Final(sha1, &c);
-
-	memset(&stream, 0, sizeof(stream));
-	deflateInit(&stream, Z_BEST_COMPRESSION);
-
-	/*
-	 * ASCII size + nul byte
-	 */	
-	stream.next_in = metadata;
-	stream.avail_in = metadata_size;
-	stream.next_out = out;
-	stream.avail_out = max_out_bytes;
-	while (deflate(&stream, 0) == Z_OK)
-		/* nothing */;
-
-	/*
-	 * File content
-	 */
-	stream.next_in = in;
-	stream.avail_in = size;
-	while (deflate(&stream, Z_FINISH) == Z_OK)
-		/*nothing */;
-
-	deflateEnd(&stream);
-	
-	return write_sha1_buffer(sha1, out, stream.total_out);
+	}
+	ret = pack_sha1_buffer(in, size, "blob", sha1, refsha1, &new_item);
+	if (new_item) {
+		if (*tail)
+			(*tail)->next = new_item;
+		*tail = new_item;
+		if (!*head)
+			*head = new_item;
+		*packed_size += new_item->len;
+		(*packed_nr)++;
+		if (*packed_size > (512 * 1024) || *packed_nr > 1024) {
+			write_packed_buffer(*head);
+			*head = NULL;
+			*tail = NULL;
+			*packed_size = 0;
+			*packed_nr = 0;
+		}
+	}
+	munmap(in, size);
+	return ret;
 }
 
 /*
@@ -102,12 +86,14 @@
 	ce->ce_size = htonl(st->st_size);
 }
 
-static int add_file_to_cache(char *path)
+static int add_file_to_cache(char *path, struct packed_item **packed_head, struct packed_item **packed_tail, unsigned long *packed_size, int *packed_nr)
 {
 	int size, namelen;
 	struct cache_entry *ce;
 	struct stat st;
 	int fd;
+	int pos;
+	unsigned char *refsha1 = NULL;
 
 	fd = open(path, O_RDONLY);
 	if (fd < 0) {
@@ -129,8 +115,12 @@
 	fill_stat_cache_info(ce, &st);
 	ce->ce_mode = create_ce_mode(st.st_mode);
 	ce->ce_flags = htons(namelen);
+	pos = cache_name_pos(ce->name, namelen);
+	if (pos >= 0)
+		refsha1 = active_cache[pos]->sha1;
 
-	if (index_fd(ce->sha1, fd, &st) < 0)
+	if (index_fd(ce->sha1, refsha1, fd, &st, packed_head, 
+		     packed_tail, packed_size, packed_nr) < 0)
 		return -1;
 
 	return add_cache_entry(ce, allow_add);
@@ -311,6 +301,10 @@
 	int allow_options = 1;
 	static char lockfile[MAXPATHLEN+1];
 	const char *indexfile = get_index_file();
+	struct packed_item *packed_head = NULL;
+	struct packed_item *packed_tail = NULL;
+	unsigned long packed_size = 0;
+	int packed_nr = 0;
 
 	snprintf(lockfile, sizeof(lockfile), "%s.lock", indexfile);
 
@@ -362,8 +356,13 @@
 			fprintf(stderr, "Ignoring path %s\n", argv[i]);
 			continue;
 		}
-		if (add_file_to_cache(path))
+		if (add_file_to_cache(path, &packed_head, &packed_tail, &packed_size, &packed_nr))
 			die("Unable to add %s to database", path);
+
+	}
+	if (packed_head) {
+		if (write_packed_buffer(packed_head))
+			die("write packed buffer failed");
 	}
 	if (write_cache(newfd, active_cache, active_nr) || rename(lockfile, indexfile))
 		die("Unable to write new cachefile");
Index: write-tree.c
===================================================================
--- 89fdfd09b281fdf5071bc13a30ef683bd6851b61/write-tree.c  (mode:100644 sha1:168352853d37bdca71d68ad8312b87b84477dea1)
+++ uncommitted/write-tree.c  (mode:100644)
@@ -5,24 +5,13 @@
  */
 #include "cache.h"
 
-static int check_valid_sha1(unsigned char *sha1)
-{
-	char *filename = sha1_file_name(sha1);
-	int ret;
-
-	/* If we were anal, we'd check that the sha1 of the contents actually matches */
-	ret = access(filename, R_OK);
-	if (ret)
-		perror(filename);
-	return ret;
-}
-
-static int write_tree(struct cache_entry **cachep, int maxentries, const char *base, int baselen, unsigned char *returnsha1)
+static int write_tree(struct cache_entry **cachep, int maxentries, const char *base, int baselen, unsigned char *returnsha1, struct packed_item **head)
 {
 	unsigned char subdir_sha1[20];
 	unsigned long size, offset;
 	char *buffer;
 	int nr;
+	struct packed_item *item;
 
 	/* Guess at some random initial size */
 	size = 8192;
@@ -50,7 +39,7 @@
 		if (dirname) {
 			int subdir_written;
 
-			subdir_written = write_tree(cachep + nr, maxentries - nr, pathname, dirname-pathname+1, subdir_sha1);
+			subdir_written = write_tree(cachep + nr, maxentries - nr, pathname, dirname-pathname+1, subdir_sha1, head);
 			nr += subdir_written;
 
 			/* Now we need to write out the directory entry into this tree.. */
@@ -62,9 +51,6 @@
 			sha1 = subdir_sha1;
 		}
 
-		if (check_valid_sha1(sha1) < 0)
-			exit(1);
-
 		entrylen = pathlen - baselen;
 		if (offset + entrylen + 100 > size) {
 			size = alloc_nr(offset + entrylen + 100);
@@ -77,7 +63,11 @@
 		nr++;
 	} while (nr < maxentries);
 
-	write_sha1_file(buffer, offset, "tree", returnsha1);
+	pack_sha1_buffer(buffer, offset, "tree", returnsha1, NULL, &item);
+	if (item) {
+		item->next = *head;
+		*head = item;
+	}
 	free(buffer);
 	return nr;
 }
@@ -87,6 +77,7 @@
 	int i, unmerged;
 	int entries = read_cache();
 	unsigned char sha1[20];
+	struct packed_item *head = NULL;
 
 	if (entries <= 0)
 		die("write-tree: no cache contents to write");
@@ -107,8 +98,12 @@
 		die("write-tree: not able to write tree");
 
 	/* Ok, write it out */
-	if (write_tree(active_cache, entries, "", 0, sha1) != entries)
+	if (write_tree(active_cache, entries, "", 0, sha1, &head) != entries)
 		die("write-tree: internal error");
+	if (head) {
+		if (write_packed_buffer(head))
+			die("write_packed_buffer error");
+	}
 	printf("%s\n", sha1_to_hex(sha1));
 	return 0;
 }
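
For readers following the batching logic in index_fd() above, here is a
minimal, self-contained sketch of the same idea: append to a tail-tracked
list, bump the counters through pointers, and flush once a threshold is
crossed. The names struct item, queue_add and flush_batch, and the tiny
limits, are hypothetical stand-ins for illustration, not the patch's API.
Note the parentheses in (*nr)++: a bare *nr++ would advance the pointer
rather than the count.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the patch's struct packed_item. */
struct item {
	unsigned long len;
	struct item *next;
};

#define MAX_BATCH_BYTES 1024UL	/* illustrative; the patch uses 512 * 1024 */
#define MAX_BATCH_ITEMS 4	/* illustrative; the patch uses 1024 */

static int flushes;	/* counts flushes so the behaviour is observable */

/* Stand-in for write_packed_buffer(): here it only frees the list. */
static void flush_batch(struct item *head)
{
	while (head) {
		struct item *next = head->next;
		free(head);
		head = next;
	}
	flushes++;
}

/* Append one item of the given length; flush when either limit is exceeded. */
static int queue_add(struct item **head, struct item **tail,
		     unsigned long *size, int *nr, unsigned long len)
{
	struct item *it = malloc(sizeof(*it));

	if (!it)
		return -1;
	it->len = len;
	it->next = NULL;

	if (*tail)
		(*tail)->next = it;
	*tail = it;
	if (!*head)
		*head = it;

	*size += it->len;
	(*nr)++;	/* parentheses matter: *nr++ would move the pointer */

	if (*size > MAX_BATCH_BYTES || *nr > MAX_BATCH_ITEMS) {
		flush_batch(*head);
		*head = *tail = NULL;
		*size = 0;
		*nr = 0;
	}
	return 0;
}
```

A caller keeps head, tail, size and nr as locals initialised to zero/NULL
and passes their addresses on every call, much as update-cache's main()
does in the patch.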

^ permalink raw reply

* Re: Mercurial 0.4b vs git patchbomb benchmark
From: Matt Mackall @ 2005-05-03  1:29 UTC (permalink / raw)
  To: Bodo Eggert <harvested.in.lkml@posting.7eggert.dyndns.org>
  Cc: Linus Torvalds, Ryan Anderson, Bill Davidsen, Andrea Arcangeli,
	linux-kernel, git
In-Reply-To: <E1DSm1T-0002Tc-FV@be1.7eggert.dyndns.org>

On Tue, May 03, 2005 at 03:16:26AM +0200, Bodo Eggert <harvested.in.lkml@posting.7eggert.dyndns.org> wrote:
> Linus Torvalds <torvalds@osdl.org> wrote:
> > On Mon, 2 May 2005, Ryan Anderson wrote:
> >> On Mon, May 02, 2005 at 09:31:06AM -0700, Linus Torvalds wrote:
> 
> >> > That said, I think the /usr/bin/env trick is stupid too. It may be more
> >> > portable for various Linux distributions, but if you want _true_
> >> > portability, you use /bin/sh, and you do something like
> >> > 
> >> > #!/bin/sh
> >> > exec perl perlscript.pl "$@"
> >> if 0;
> 
> exec may fail.
> 
> #!/bin/sh
> exec perl -x $0 ${1+"$@"} || exit 127
> #!perl
> 
> >> You don't really want Perl to get itself into an exec loop.
> > 
> > This would _not_ be "perlscript.pl" itself. This is the shell-script, and
> > it's not called ".pl".
> 
> In this thread, it originally was.

In this thread, it was originally a Python script. In particular, one
aimed at managing the Linux kernel source. I'm going to use
/usr/bin/env; systems where that doesn't exist can edit the source.

--
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply

* Re: Mercurial 0.4b vs git patchbomb benchmark
From: Bodo Eggert <harvested.in.lkml@posting.7eggert.dyndns.org> @ 2005-05-03  1:16 UTC (permalink / raw)
  To: Linus Torvalds, Ryan Anderson, Bill Davidsen, Andrea Arcangeli,
	Matt Mackall, linux-kernel, git
In-Reply-To: <3ZNdz-6gK-9@gated-at.bofh.it>

Linus Torvalds <torvalds@osdl.org> wrote:
> On Mon, 2 May 2005, Ryan Anderson wrote:
>> On Mon, May 02, 2005 at 09:31:06AM -0700, Linus Torvalds wrote:

>> > That said, I think the /usr/bin/env trick is stupid too. It may be more
>> > portable for various Linux distributions, but if you want _true_
>> > portability, you use /bin/sh, and you do something like
>> > 
>> > #!/bin/sh
>> > exec perl perlscript.pl "$@"
>> if 0;

exec may fail.

#!/bin/sh
exec perl -x $0 ${1+"$@"} || exit 127
#!perl

>> You don't really want Perl to get itself into an exec loop.
> 
> This would _not_ be "perlscript.pl" itself. This is the shell-script, and
> it's not called ".pl".

In this thread, it originally was.
-- 

"Our parents, worse than our grandparents, gave birth to us who are worse than
they, and we shall in our turn bear offspring still more evil."
        -- Horace (BC 65-8)


^ permalink raw reply

* Re: [PATCH] Add exclude file support to cg-status
From: Junio C Hamano @ 2005-05-03  1:09 UTC (permalink / raw)
  To: Matt Porter; +Cc: Petr Baudis, git
In-Reply-To: <20050502171042.A24299@cox.net>

>>>>> "MP" == Matt Porter <mporter@kernel.crashing.org> writes:

MP> Adds a trivial per-repository exclude file implementation for
MP> cg-status on top of the new git-ls-files option.

MP> +EXCLUDEFILE=.git/exclude

Good intentions, but shouldn't the file be .git/info/exclude
(i.e. under .git/info)?

^ permalink raw reply

* Re: Trying to use AUTHOR_DATE
From: H. Peter Anvin @ 2005-05-03  0:38 UTC (permalink / raw)
  To: Krzysztof Halasa
  Cc: David Woodhouse, Edgar Toernig, Linus Torvalds, Luck, Tony, git
In-Reply-To: <m38y2xdubr.fsf@defiant.localdomain>

Krzysztof Halasa wrote:
> "H. Peter Anvin" <hpa@zytor.com> writes:
> 
> 
>>No.  You cannot get 61.
> 
> I was told it would be possible if two leap seconds were needed in some
> point of time. Have never occurred yet, and maybe never will.
> 
> Well, it seems it would need two seconds a month (at least 13 leap seconds
> a year) -> not in this century if ever, and it wouldn't be UTC anymore.
> 

It's certainly not permitted by the current UTC definition, which only 
allows 4 leap seconds per year.  61 comes from a typo in an old version 
of the POSIX standard.

>>You can, however, get jumps from 58 to 00.
> 
> Correct, that would be a deletion. Not yet tried, either, but they say
> it's possible.

... and permitted by the current UTC standard.

	-hpa

^ permalink raw reply

* [PATCH] Git-prune-script loses blobs referenced from an uncommitted cache.
From: Junio C Hamano @ 2005-05-03  0:37 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

When a new blob is registered with update-cache, and before the cache
is written as a tree and committed, git-fsck-cache will find the blob
unreachable.  This patch fixes git-prune-script to keep such blob objects
referenced from the cache.

Without this fix, "diff-cache -p --cached" after git-prune-script has
pruned the blob object will fail mysteriously and git-write-tree would
also fail.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---

git-prune-script |   32 +++++++++++++++++++++++++++++++-
1 files changed, 31 insertions(+), 1 deletion(-)

--- a/git-prune-script
+++ b/git-prune-script
@@ -1,2 +1,32 @@
 #!/bin/sh
-git-fsck-cache --unreachable $(cat .git/HEAD ) | grep unreachable | cut -d' ' -f3 | sed 's:^\(..\):.git/objects/\1/:' | xargs rm
+
+tmp=.git-prune-script-$$
+trap "rm -f $tmp-*" 0 1 2 3 15 
+
+# Defaulting to include .git/refs/*/* may be debatable from the
+# purist POV but power users can always give explicit parameters
+# to the script anyway.
+case "$#" in
+0) set x $(cat .git/HEAD .git/refs/*/*); shift ;;
+esac
+
+git-fsck-cache --unreachable "$@" |
+sed -ne 's/unreachable [^ ][^ ]* //p' |
+sort >$tmp-unreachable
+
+# This causes extra objects to be kept if the cache has an entry
+# with an unusual name like "this\n0 0123...abcdef 0 file", but
+# we are trying not to discard information and keeping extra in
+# an unusual situation would be OK.
+git-ls-files --stage |
+sed -ne 's|^[0-7][0-7]* \([0-9a-f][0-9a-f]*\) [0-3] .*|\1|p' |
+sort >$tmp-keep
+
+comm -23 $tmp-unreachable $tmp-keep |
+sed -e 's|\(..\)|\1/|' | {
+	case "$SHA1_FILE_DIRECTORY" in
+	'') cd .git/objects/ ;;
+	*) cd "$SHA1_FILE_DIRECTORY" ;;
+	esac || exit
+	xargs -r echo rm -f
+}
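
The "comm -23" step above is a set difference over two sorted lists:
it keeps the lines found only in the first file. As a rough sketch of
that operation in C (the function name only_in_first is hypothetical,
and both inputs are assumed to be sorted in strcmp order):

```c
#include <stddef.h>
#include <string.h>

/*
 * Store in out[] every entry of a[] that has no match in b[].
 * Both arrays must be sorted by strcmp(); out[] needs room for
 * na pointers. Returns the number of entries stored.
 */
static size_t only_in_first(const char **a, size_t na,
			    const char **b, size_t nb,
			    const char **out)
{
	size_t i = 0, j = 0, n = 0;

	while (i < na) {
		/* Once b is exhausted, everything left in a is unique. */
		int cmp = j < nb ? strcmp(a[i], b[j]) : -1;

		if (cmp < 0)
			out[n++] = a[i++];	/* only in a: keep it */
		else if (cmp > 0)
			j++;			/* only in b: skip it */
		else {
			i++;			/* in both: drop it */
			j++;
		}
	}
	return n;
}
```

In the script's terms, a[] plays the role of the unreachable list and
b[] the list of objects still referenced from the cache; what remains
is what may be pruned.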


^ permalink raw reply

* Re: Trying to use AUTHOR_DATE
From: Krzysztof Halasa @ 2005-05-03  0:30 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: David Woodhouse, Edgar Toernig, Linus Torvalds, Luck, Tony, git
In-Reply-To: <4276B8A1.7070500@zytor.com>

"H. Peter Anvin" <hpa@zytor.com> writes:

> No.  You cannot get 61.

I was told it would be possible if two leap seconds were needed in some
point of time. Have never occurred yet, and maybe never will.

Well, it seems it would need two seconds a month (at least 13 leap seconds
a year) -> not in this century if ever, and it wouldn't be UTC anymore.

> You can, however, get jumps from 58 to 00.

Correct, that would be a deletion. Not yet tried, either, but they say
it's possible.
-- 
Krzysztof Halasa

^ permalink raw reply

* Re: Anyone working on a CVS->git converter?
From: Kay Sievers @ 2005-05-03  0:28 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Git Mailing List
In-Reply-To: <4275857A.1050106@zytor.com>

[-- Attachment #1: Type: text/plain, Size: 788 bytes --]

On Sun, 2005-05-01 at 18:42 -0700, H. Peter Anvin wrote:
> Anyone working on a CVS->git converter?  I'd like to move klibc 
> development into git.

I tried it with two completely stupid scripts and the nice cvsps.

Here is the tree to browse:
  http://ehlo.org/~kay/git/gitweb.cgi?p=klibc.git;a=log


In the CVS repo directory export patchsets as individual patches with a
header containing metadata:
  cvsps -x -b HEAD -g -p ../../patches/

split exported patches into individual files like author data, log, file list:
  for i in `seq 1 546`; do ../parse-cvsps-patch.pl ../patches/$i.patch ;done

apply it to a completely empty git-repo:
  for i in `seq 1 546`; do ../apply.sh ../patches/$i.patch ;done

Stupid scripts are attached. cvsps is here:
  http://www.cobite.com/cvsps/

Kay

[-- Attachment #2: apply.sh --]
[-- Type: application/x-shellscript, Size: 564 bytes --]

[-- Attachment #3: parse-cvsps-patch.pl --]
[-- Type: application/x-perl, Size: 1528 bytes --]

^ permalink raw reply

* [PATCH] Short-cut error return path in git-local-pull.
From: Junio C Hamano @ 2005-05-03  0:26 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

When git-local-pull with -l option gets ENOENT attempting to create
a hard link, there is no point falling back to other copy methods.
This patch implements a short-cut to detect that case.
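
The short-cut can be sketched in isolation as follows. This is only an
illustration under stated assumptions: the enum and the helper name
try_hardlink are hypothetical, and, like the patch, it treats ENOENT
as "the source object is missing" even though link(2) can also report
ENOENT for a missing directory component of the destination.

```c
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch. */
enum fetch_result { FETCH_LINKED, FETCH_MISSING, FETCH_TRY_NEXT };

/*
 * Try to hard-link src to dst. ENOENT is treated as final, on the
 * assumption that no other copy method can succeed when the source
 * does not exist. Any other failure (EXDEV, EPERM, EEXIST, ...)
 * lets the caller fall back to symlink or plain copy.
 */
static enum fetch_result try_hardlink(const char *src, const char *dst)
{
	if (!link(src, dst))
		return FETCH_LINKED;
	if (errno == ENOENT) {
		fprintf(stderr, "does not exist %s\n", src);
		return FETCH_MISSING;
	}
	return FETCH_TRY_NEXT;
}
```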

Signed-off-by: Junio C Hamano <junkio@cox.net>
---

local-pull.c |   25 ++++++++++++++++---------
1 files changed, 16 insertions(+), 9 deletions(-)

--- a/local-pull.c
+++ b/local-pull.c
@@ -39,12 +39,19 @@ int fetch(unsigned char *sha1)
 	filename[object_name_start+1] = hex[1];
 	filename[object_name_start+2] = '/';
 	strcpy(filename + object_name_start + 3, hex + 2);
-	if (use_link && !link(filename, dest_filename)) {
-		say("Hardlinked %s.\n", hex);
-		return 0;
+	if (use_link) {
+		if (!link(filename, dest_filename)) {
+			say("link %s\n", hex);
+			return 0;
+		}
+		/* If we got ENOENT there is no point continuing. */
+		if (errno == ENOENT) {
+			fprintf(stderr, "does not exist %s\n", filename);
+			return -1;
+		}
 	}
 	if (use_symlink && !symlink(filename, dest_filename)) {
-		say("Symlinked %s.\n", hex);
+		say("symlink %s\n", hex);
 		return 0;
 	}
 	if (use_filecopy) {
@@ -54,13 +61,13 @@ int fetch(unsigned char *sha1)
 		ifd = open(filename, O_RDONLY);
 		if (ifd < 0 || fstat(ifd, &st) < 0) {
 			close(ifd);
-			fprintf(stderr, "Cannot open %s\n", filename);
+			fprintf(stderr, "cannot open %s\n", filename);
 			return -1;
 		}
 		map = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, ifd, 0);
 		close(ifd);
 		if (-1 == (int)(long)map) {
-			fprintf(stderr, "Cannot mmap %s\n", filename);
+			fprintf(stderr, "cannot mmap %s\n", filename);
 			return -1;
 		}
 		ofd = open(dest_filename, O_WRONLY | O_CREAT | O_EXCL, 0666);
@@ -69,13 +76,13 @@ int fetch(unsigned char *sha1)
 		munmap(map, st.st_size);
 		close(ofd);
 		if (status)
-			fprintf(stderr, "Cannot write %s (%ld bytes)\n",
+			fprintf(stderr, "cannot write %s (%ld bytes)\n",
 				dest_filename, st.st_size);
 		else
-			say("Copied %s.\n", hex);
+			say("copy %s\n", hex);
 		return status;
 	}
-	fprintf(stderr, "No copy method was provided to copy %s.\n", hex);
+	fprintf(stderr, "failed to copy %s with given copy methods.\n", hex);
 	return -1;
 }
 


^ permalink raw reply

* Re: Mercurial 0.4b vs git patchbomb benchmark
From: Kevin Smith @ 2005-05-03  0:24 UTC (permalink / raw)
  Cc: git
In-Reply-To: <200505022106.OAA28850@emf.net>

Tom Lord wrote:
> More bluntly, given just a (1),(3) pair, Bob is extending his vulnerability
> to include a reliance on Alice's patch-computing tools.   If Alice were
> known to be signing a (1),(2) pair which she had reviewed in detail,
> then Bob's vulnerability stays at just his local patch-handling tools
> and his general trust of Alice.

I'm no expert, but it seems the opposite argument could be made as well.
By signing (1)(3), I am asserting that (3) is, in fact, what I intended
the end result to be. If I instead sign (1)(2), then it is possible that
your patching tools might end up producing something other than (3).

Personally, I still like the self-contained nature of signing (1)(2),
but I haven't yet heard a security argument in its favor.

Kevin

^ permalink raw reply

* [PATCH] Make git-*-pull say who wants it for missing objects.
From: Junio C Hamano @ 2005-05-03  0:13 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

This patch updates pull.c, the engine that decides what object is needed
given a commit to traverse from, to report which commit was calling for
the object that cannot be retrieved from the remote side.  This complements
git-fsck-cache in that it checks the consistency of the remote repository
for reachability.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---

pull.c |   37 ++++++++++++++++++++++++++++++-------
1 files changed, 30 insertions(+), 7 deletions(-)

--- a/pull.c
+++ b/pull.c
@@ -7,12 +7,31 @@
 int get_tree = 0;
 int get_history = 0;
 int get_all = 0;
+static unsigned char current_commit_sha1[20];
 
-static int make_sure_we_have_it(unsigned char *sha1)
+static const char commitS[] = "commit";
+static const char treeS[] = "tree";
+static const char blobS[] = "blob";
+
+static void report_missing(const char *what, const unsigned char *missing)
+{
+	char missing_hex[41];
+
+	strcpy(missing_hex, sha1_to_hex(missing));
+	fprintf(stderr,
+		"Cannot obtain needed %s %s\nwhile processing commit %s.\n",
+		what, missing_hex, sha1_to_hex(current_commit_sha1));
+}
+
+static int make_sure_we_have_it(const char *what, unsigned char *sha1)
 {
+	int status;
 	if (has_sha1_file(sha1))
 		return 0;
-	return fetch(sha1);	
+	status = fetch(sha1);
+	if (status && what)
+		report_missing(what, sha1);
+	return status;
 }
 
 static int process_tree(unsigned char *sha1)
@@ -24,7 +43,8 @@ static int process_tree(unsigned char *s
 		return -1;
 
 	for (entries = tree->entries; entries; entries = entries->next) {
-		if (make_sure_we_have_it(entries->item.tree->object.sha1))
+		const char *what = entries->directory ? treeS : blobS;
+		if (make_sure_we_have_it(what, entries->item.tree->object.sha1))
 			return -1;
 		if (entries->directory) {
 			if (process_tree(entries->item.tree->object.sha1))
@@ -38,14 +58,14 @@ static int process_commit(unsigned char 
 {
 	struct commit *obj = lookup_commit(sha1);
 
-	if (make_sure_we_have_it(sha1))
+	if (make_sure_we_have_it(commitS, sha1))
 		return -1;
 
 	if (parse_commit(obj))
 		return -1;
 
 	if (get_tree) {
-		if (make_sure_we_have_it(obj->tree->object.sha1))
+		if (make_sure_we_have_it(treeS, obj->tree->object.sha1))
 			return -1;
 		if (process_tree(obj->tree->object.sha1))
 			return -1;
@@ -57,7 +77,8 @@ static int process_commit(unsigned char 
 		for (; parents; parents = parents->next) {
 			if (has_sha1_file(parents->item->object.sha1))
 				continue;
-			if (make_sure_we_have_it(parents->item->object.sha1)) {
+			if (make_sure_we_have_it(NULL,
+						 parents->item->object.sha1)) {
 				/* The server might not have it, and
 				 * we don't mind. 
 				 */
@@ -65,6 +86,7 @@ static int process_commit(unsigned char 
 			}
 			if (process_commit(parents->item->object.sha1))
 				return -1;
+			memcpy(current_commit_sha1, sha1, 20);
 		}
 	}
 	return 0;
@@ -77,8 +99,9 @@ int pull(char *target)
 	retval = get_sha1_hex(target, sha1);
 	if (retval)
 		return retval;
-	retval = make_sure_we_have_it(sha1);
+	retval = make_sure_we_have_it(commitS, sha1);
 	if (retval)
 		return retval;
+	memcpy(current_commit_sha1, sha1, 20);
 	return process_commit(sha1);
 }
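
A note on why report_missing() copies the first hex string before
formatting: this is presumably because sha1_to_hex() hands back a
pointer into a single static buffer, so a second call would overwrite
the first result. The sketch below shows the pattern with a
hypothetical stand-in, to_hex(), written for this example; it is not
the git function itself.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for sha1_to_hex(): one shared static buffer. */
static const char *to_hex(const unsigned char *sha1)
{
	static char buf[41];
	static const char digits[] = "0123456789abcdef";

	for (int i = 0; i < 20; i++) {
		buf[2 * i] = digits[sha1[i] >> 4];
		buf[2 * i + 1] = digits[sha1[i] & 0xf];
	}
	buf[40] = '\0';
	return buf;
}

/* Format both ids into out; the first is copied before the second call. */
static void format_missing(char *out, size_t outlen, const char *what,
			   const unsigned char *missing,
			   const unsigned char *commit)
{
	char missing_hex[41];

	strcpy(missing_hex, to_hex(missing));	/* save before buf is reused */
	snprintf(out, outlen,
		 "Cannot obtain needed %s %s while processing commit %s.",
		 what, missing_hex, to_hex(commit));
}
```

Passing to_hex(missing) and to_hex(commit) directly as two arguments
of the same snprintf() call would print the same id twice, since both
pointers refer to the one buffer.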


^ permalink raw reply

* [PATCH] Add exclude file support to cg-status
From: Matt Porter @ 2005-05-03  0:10 UTC (permalink / raw)
  To: Petr Baudis; +Cc: git

Adds a trivial per-repository exclude file implementation for
cg-status on top of the new git-ls-files option.

Signed-off-by: Matt Porter <mporter@kernel.crashing.org>

--- 002c6f1e4924965e2101d2e6447855f10c55df41/cg-status  (mode:100755 sha1:9e7f0e59284a3d15cda35bbd5579c44d8eda05d5)
+++ 1b8c5395679e5c04734b1c86445a6355124ada7e/cg-status  (mode:100755 sha1:6669e36f5ff5d5964882b58ba43a5dcab4fd7fc6)
@@ -7,8 +7,14 @@
 
 . cg-Xlib
 
+EXCLUDEFILE=.git/exclude
+EXCLUDE=
+if [ -f $EXCLUDEFILE ]; then
+	EXCLUDE="--exclude-from=$EXCLUDEFILE"
+fi
+
 {
-	git-ls-files -z -t --others --deleted --unmerged
+	git-ls-files -z -t --others --deleted --unmerged $EXCLUDE
 } | sort -z -k 2 | xargs -0 sh -c '
 while [ "$1" ]; do
 	tag=${1% *};

^ permalink raw reply

* Re: Mercurial 0.4b vs git patchbomb benchmark
From: Matt Mackall @ 2005-05-03  0:00 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Bill Davidsen, Morten Welinder, Sean, linux-kernel, git
In-Reply-To: <Pine.LNX.4.58.0505021540070.3594@ppc970.osdl.org>

On Mon, May 02, 2005 at 03:49:49PM -0700, Linus Torvalds wrote:
> > >  - you can drop old objects.
> > 
> > You can't drop old objects without dropping all the changesets that
> > refer to them or otherwise being prepared to deal with the broken
> > links.

[...]

> I could write this up in ten minutes. It's really simple.

It's still simple in Mercurial, but more importantly Mercurial _won't
need it_. Dropping history is a work-around, not a feature.

> > > delta models very fundamentally don't support this. 
> > 
> > The latter can be done in a pretty straightforward manner in mercurial
> > with one pass over the data. But I have a goal to make keeping the
> > whole history cheap enough that no one balks at it.
> 
> With deltas, you have two choices:
> 
>  - change all the sha1 names (ie a pruned tree would no longer be 
>    compatible with a non-pruned one)
>  - make the delta part not show up as part of the sha1 name (which means 
>    that it's unprotected).
> 
> which one would you have?

Umm.. I am _not_ calculating the SHA of the delta itself. That'd be
silly.

There are an arbitrary number of ways to calculate a delta between two
files. Similarly, there are an arbitrary number of ways to compress a
file (gzip has at least 9, not counting all the permutations of
flush). The only sensible thing to do is store a hash of the raw text
and check it against the fully restored text, because that's what you
care about being correct.

In Mercurial, deltas are just a storage detail and are effectively
completely hidden from everything except the innermost part of the
back-end. What's important is that Mercurial knows that A is a
revision of B in the backend and thus has enough information to
opportunistically attempt to calculate a delta.

So if the day ever comes when I want to prune the head of a log, I
simply reconstruct the first version to keep, store it in a new file,
then append all the deltas, unmodified. And fix up the offsets in the
indices. None of the hashes change.
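
To make the "hash the restored text, not the delta" point concrete,
here is a small sketch. Everything in it is an assumption made for
illustration: the delta record is a deliberately simple
replace-a-range edit (not Mercurial's actual format), and FNV-1a
stands in for a real hash such as SHA-1 to keep the example short.

```c
#include <stdint.h>
#include <string.h>

/* FNV-1a, used only as a compact stand-in for a real hash like SHA-1. */
static uint32_t text_hash(const char *s)
{
	uint32_t h = 2166136261u;

	while (*s) {
		h ^= (unsigned char)*s++;
		h *= 16777619u;
	}
	return h;
}

/* A deliberately simple delta: replace del bytes at off with ins. */
struct delta {
	size_t off, del;
	const char *ins;
	uint32_t result_hash;	/* hash of the full text after applying */
};

/*
 * Apply d to text in place (text has capacity cap). Returns 0 if the
 * restored text matches the recorded hash, -1 on a bad delta or a
 * hash mismatch.
 */
static int apply_and_verify(char *text, size_t cap, const struct delta *d)
{
	size_t len = strlen(text), ins = strlen(d->ins);

	if (d->off > len || d->del > len - d->off ||
	    len - d->del + ins + 1 > cap)
		return -1;
	/* Shift the tail (including the NUL), then drop in the new bytes. */
	memmove(text + d->off + ins, text + d->off + d->del,
		len - d->off - d->del + 1);
	memcpy(text + d->off, d->ins, ins);
	return text_hash(text) == d->result_hash ? 0 : -1;
}
```

Because the hash covers the reconstructed text, any delta encoding
(or compression setting) that yields the same text verifies against
the same hash, which is why rebasing a chain onto a newly stored full
version need not change any hashes.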

> > What is a tree re-linker? Finds duplicate files and hard-links them?
> > Ok, that makes some sense. But it's a win on one machine and a lose
> > everywhere else.
> 
> Where would it be a loss? Especially since with git, it's cheap (you don't 
> need to compare content to find objects to link - you can just compare 
> filename listings).

Git repositories will be 10x larger than Mercurial everywhere that
doesn't benefit from this linking of unrelated trees. That is, folks
who aren't running gitbits.net.

> > I've added an "hg verify" command to Mercurial. It doesn't attempt to
> > fix anything up yet, but it can catch a couple things that git
> > probably can't (like file revisions that aren't owned by any
> > changeset), namely because there's more metadata around to look at.
> 
> git-fsck-cache catches exactly those kinds of things. And since it checks
> pretty much every _single_ assumption in git (which is not a lot, since
> git doesn't have a lot of assumptions), I guarantee you that you can't
> find any more than it does (the filename ordering is the big missing
> piece: I _still_ don't verify that trees are ordered. I've been mentioning
> it since the beginning, but I'm lazy).
> 
> In other words, your verifier can't verify anything more. It's entirely 
> possible that more things can go _wrong_, since you have more indexes, so 
> your verifier will have more to check, but that's not an advantage, that's 
> a downside.

Uh, no. It's just like a filesystem. Redundancy is what lets you
recover.

The extra indices are also very useful in their own right:

- they let you easily do delta storage
- they let you efficiently do delta transmission
- they let you find past revisions of a file in O(1)
- they let you efficiently do "annotate"
- they let you do smarter merge

At least the first four seem fairly critical to me.

As various people have pointed out, you can hack delta transmission
and file revision indexing on top of git. But to do that, you'll need
to build the same indices that Mercurial has. And you'll need to check
their integrity.

Unfortunately, since the git back-end refuses to know anything about
the relation between file revisions, this will all happen in the front
end, and you'll have done almost all the work needed to do delta
storage without actually getting it. How sad.

You'll also likely end up with something quite a bit more complicated
than Mercurial because of the extra layering. This all strongly suggests
to me that the git back-end is just a little bit too simple.

-- 
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply

* Re: Trying to use AUTHOR_DATE
From: H. Peter Anvin @ 2005-05-02 23:32 UTC (permalink / raw)
  To: Krzysztof Halasa
  Cc: David Woodhouse, Edgar Toernig, Linus Torvalds, Luck, Tony, git
In-Reply-To: <m3mzrddx44.fsf@defiant.localdomain>

Krzysztof Halasa wrote:
> "H. Peter Anvin" <hpa@zytor.com> writes:
> 
> 
>>It is, but you can't assume you don't have that.
> 
> 
> Yes, if you use NTP time (directly - not the system time) you can get
> second=60 (and, in theory, even 61 - not to be expected soon).
> 

No.  You cannot get 61.  You can, however, get jumps from 58 to 00.

> 
>> Either way, you just
>>treat it the same as the following second.
> 
> Sure, that's the safe way.

^ permalink raw reply

* Re: Trying to use AUTHOR_DATE
From: Krzysztof Halasa @ 2005-05-02 23:30 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: David Woodhouse, Edgar Toernig, Linus Torvalds, Luck, Tony, git
In-Reply-To: <4276A906.2040403@zytor.com>

"H. Peter Anvin" <hpa@zytor.com> writes:

> It is, but you can't assume you don't have that.

Yes, if you use NTP time (directly - not the system time) you can get
second=60 (and, in theory, even 61 - not to be expected soon).

>  Either way, you just
> treat it the same as the following second.

Sure, that's the safe way.
-- 
Krzysztof Halasa

^ permalink raw reply

* [RFC] git-diff-cache sans --cached and unmerged paths
From: Junio C Hamano @ 2005-05-02 23:21 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

Linus, 

    git-diff-cache without --cached says "U filename" (or
"unmerged filename") when working with an unmerged cache entry.
Since the form without --cached is to mean "look at the work
tree", I think it should be changed to report the mode and the
magic 0{40} SHA1.  What do you think?

I was manually fixing up a merge and I wanted to compare the
merge result in the work tree with the pre-merge HEAD version
from either heads, but this behaviour (yes I am the guilty one)
makes it cumbersome, and that is the reason behind this
question.

BTW, when you have a chance, could you please give the
executable bit to git-apply-patch-script, pretty please.  This
is my fourth attempt ;-).

^ permalink raw reply

* Re: How to get bash to shut up about SIGPIPE?
From: Petr Baudis @ 2005-05-02 23:17 UTC (permalink / raw)
  To: Rene Scharfe; +Cc: Paul Jackson, Linus Torvalds, git
In-Reply-To: <20050430110410.GA25322@lsrfire.ath.cx>

Dear diary, on Sat, Apr 30, 2005 at 01:04:10PM CEST, I got a letter
where Rene Scharfe <rene.scharfe@lsrfire.ath.cx> told me that...
> On Fri, Apr 29, 2005 at 11:29:22PM -0700, Paul Jackson wrote:
> > Linus replied to pj:
> > > > Code Sample 2:
> > > > ...
> > > Didn't change anything for me. Same thing.
> > 
> > I don't believe you did what I did.
> > 
> > The source code for bash, both 2.x and 3.x versions, clearly displays a
> > simpler error message (no line number or redisplay of your script
> > commands) in the case that you set a trap.  And I tested both shells on
> > a multiprocessor, to verify that they behaved as I expected, running
> > these silly little scripts.
> 
> I don't have a multiprocessor and I see the same.  Are you sure it's SMP
> dependent?
> 
> Your solution (trapping _inside_ the job, too) works for me, btw.  Here's
> a patch for cg-log that reduces the clutter to two "Broken pipe" lines
> (pun not intended).

Could you elaborate on how exactly is it supposed to help? I see
identical behaviour with the traps and without them (UP, bash-2.05b).

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

^ permalink raw reply
