Git development


* Re: gitweb wishlist
From: Thomas Glanzmann @ 2005-05-20 19:13 UTC
  To: git
In-Reply-To: <1116615600.12975.33.camel@dhcp-188>

Hello Kay,
I would like to be able to click on the file name itself instead of the
separate 'blob' link in a directory view, because that is more intuitive and
you can already click on directories:

http://www.kernel.org/git/?p=git/git.git;a=tree;h=665a48af9e192ed84d2707c95d4c0d9c45eb45ad;hb=411746940f02f6fb90c4b6b97c6f07cee599c2e1

Thanks for this great tool!

Sincerely,
	Thomas


* Re: gitweb wishlist
From: Linus Torvalds @ 2005-05-20 19:22 UTC
  To: Kay Sievers; +Cc: Petr Baudis, Git Mailing List, Peter Anvin
In-Reply-To: <1116615600.12975.33.camel@dhcp-188>

On Fri, 20 May 2005, Kay Sievers wrote:
> 
> Something like this?:
>   http://kernel.org/git/?p=git/git.git;a=commitdiff;h=de809dbbce497e0d107562615c1d85ff35b4e0c5

Btw, at least for me, this looks much more interesting than the "commit" 
thing, and maybe it would make sense to make the summary links be to the 
"commitdiff" instead of the "commit"?

Or is it just so much more expensive to generate, that we want to not have
people go there normally? (hpa cc'd, since he may have some insight into
whether this is likely to be an issue or not? It's not like git-diff-tree
is that expensive, but it _does_ end up doing a "diff" against each
changed file, of course, modulo any caching of results).

		Linus


* add conf file support to gitweb
From: Andres Salomon @ 2005-05-20 19:29 UTC
  To: git, Kay Sievers

[-- Attachment #1: Type: text/plain, Size: 405 bytes --]

Hi,

The attached patch makes gitweb read and eval variables from
an /etc/gitweb.conf file.  This is useful for distributions; I'm
packaging gitweb for debian, and want to have a separate config file
that users can edit that won't get overwritten when they upgrade gitweb.
Even if you don't take this patch, please consider some other method
that decouples the configuration from the gitweb.cgi script.



[-- Attachment #2: gitweb.conf.patch --]
[-- Type: text/x-patch, Size: 687 bytes --]

Index: gitweb.cgi
===================================================================
--- 8b7a4b08ba4892970a2531d4c1584e3881a13586/gitweb.cgi  (mode:100644)
+++ fe8329b147103e115e2ad727bfca34c2ecfa901d/gitweb.cgi  (mode:100755)
@@ -40,6 +40,16 @@
 #my $projects_list = $projectroot;
 my $projects_list = "index/index.aux";
 
+# allow config file to override settings above
+if (-r '/etc/gitweb.conf') {
+	open(CONF, '/etc/gitweb.conf') || die_error(undef, "Cannot open /etc/gitweb.conf.");
+	while (<CONF>) {
+		chomp;
+		eval($_) if ($_ =~ /^\s*(\$[\w]+)\s*=\s*(.*)\s*$/);
+	}
+	close(CONF);
+}
+
 # input validation and dispatch
 my $action = $cgi->param('a');
 if (defined $action) {


* Re: gitweb and kernel.org
From: Jeff Garzik @ 2005-05-20 19:47 UTC
  To: Kay Sievers; +Cc: Git Mailing List, FTP Admin
In-Reply-To: <1116615502.12975.29.camel@dhcp-188>

Kay Sievers wrote:
> Initial support for branches added! :)

Wow, that was fast.  Thanks.

	Jeff




* Re: gitweb wishlist
From: H. Peter Anvin @ 2005-05-20 20:34 UTC
  To: Linus Torvalds; +Cc: Kay Sievers, Petr Baudis, Git Mailing List
In-Reply-To: <Pine.LNX.4.58.0505201219420.2206@ppc970.osdl.org>

Linus Torvalds wrote:
> 
> On Fri, 20 May 2005, Kay Sievers wrote:
> 
>>Something like this?:
>>  http://kernel.org/git/?p=git/git.git;a=commitdiff;h=de809dbbce497e0d107562615c1d85ff35b4e0c5
> 
> 
> Btw, at least for me, this looks much more interesting than the "commit" 
> thing, and maybe it would make sense to make the summary links be to the 
> "commitdiff" instead of the "commit"?
> 
> Or is it just so much more expensive to generate, that we want to not have
> people go there normally? (hpa cc'd, since he may have some insight into
> whether this is likely to be an issue or not? It's not like git-diff-tree
> is that expensive, but it _does_ end up doing a "diff" against each
> changed file, of course, modulo any caching of results).
> 

What I ended up doing for the diff viewer on kernel.org is that every 
page that's generated gets stuffed in a cache (locklessly indexed by a 
SHA-1 of a canonicalized form of the query); the pages people actually 
see are then simply pulled from the cache.  This caching was just an
enormous win.  In the case of the diff viewer, the header is generated 
each time, since I allow the user to select a custom style sheet (and 
don't want to cache versions for each style sheet), but that's a trivial 
detail.

	-hpa
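
The cache key described above can be sketched in C.  This is only an
illustration under stated assumptions: it assumes the query uses the
';'-separated key=value form seen in the gitweb URLs in this thread,
and it assumes that "canonicalized" means sorting the parameters so
that reordered but equivalent queries map to one string.  The name
canonicalize_query() is hypothetical, and the SHA-1 step that would
turn the canonical string into the cache index is left out.

```c
#include <stdlib.h>
#include <string.h>

/* Compare two C strings through the pointers qsort() hands us. */
static int cmp_param(const void *a, const void *b)
{
	return strcmp(*(char *const *)a, *(char *const *)b);
}

/*
 * Hypothetical helper: rewrite a ';'-separated query so that
 * equivalent queries with reordered parameters yield the same string.
 * Returns a malloc()ed string (caller frees) or NULL on failure.
 */
char *canonicalize_query(const char *query)
{
	size_t len = strlen(query), n = 0, i;
	char *copy = malloc(len + 1);
	char *out = malloc(len + 1);
	/* at most (len + 1) / 2 non-empty parameters fit in len bytes */
	char **params = malloc((len / 2 + 1) * sizeof(*params));
	char *p;

	if (!copy || !out || !params) {
		free(copy);
		free(out);
		free(params);
		return NULL;
	}
	memcpy(copy, query, len + 1);

	/* Split on ';'; strtok() skips empty parameters. */
	for (p = strtok(copy, ";"); p; p = strtok(NULL, ";"))
		params[n++] = p;

	qsort(params, n, sizeof(*params), cmp_param);

	/* Join the sorted parameters back together with ';'. */
	out[0] = '\0';
	for (i = 0; i < n; i++) {
		if (i)
			strcat(out, ";");
		strcat(out, params[i]);
	}
	free(params);
	free(copy);
	return out;
}
```

A caller would hash the returned string and use the digest as the file
name of the cached page, so "a=commitdiff;h=..." and "h=...;a=commitdiff"
would share one cache entry.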


* Re: gitweb wishlist
From: Linus Torvalds @ 2005-05-20 20:49 UTC
  To: H. Peter Anvin; +Cc: Kay Sievers, Petr Baudis, Git Mailing List
In-Reply-To: <428E49DD.406@zytor.com>

On Fri, 20 May 2005, H. Peter Anvin wrote:

> 
> What I ended up doing for the diff viewer on kernel.org is that every 
> page that's generated gets stuffed in a cache (locklessly indexed by a 
> SHA-1 of a canonicalized form of the query); the pages people actually 
> see are then simply pulled from the cache.  This caching was just an
> enormous win.

Ok. That still leaves the bandwidth issue (the full diffs are bigger than 
the commit object), but usually the diffs in individual commits aren't 
_that_ large, so maybe it's a non-issue.

Oh, btw, I notice that you moved klibc over to git - care to share your
cvs->git script (I assume you scripted it ;)? That would seem to be an 
obvious addition to the core stuff..

		Linus


* Re: gitweb wishlist
From: H. Peter Anvin @ 2005-05-20 20:50 UTC
  To: Linus Torvalds; +Cc: Kay Sievers, Petr Baudis, Git Mailing List
In-Reply-To: <Pine.LNX.4.58.0505201346330.2206@ppc970.osdl.org>

Linus Torvalds wrote:
> 
> Oh, btw, I notice that you moved klibc over to git - care to share your
> cvs->git script (I assume you scripted it ;)? That would seem to be an 
> obvious addition to the core stuff..
> 

Actually, Kay did the conversion... the scripts are clearly very 
cantankerous, because if *I* run them -- I tried -- they don't work! 
Since it's Kay's work, I'll leave them to him, but I would definitely 
love to move more of my CVS repos over to git, especially syslinux.

	-hpa


* [PATCH 1/3] delta read
From: Nicolas Pitre @ 2005-05-20 20:57 UTC
  To: Linus Torvalds; +Cc: git

This patch makes the core code aware of delta objects and undeltafy
them as needed.  The convention is to use read_sha1_file() to have
undeltafication done automatically (most users do that already so
this is transparent).

If the delta object itself has to be accessed then it must be done
through map_sha1_file() and unpack_sha1_file().

In that context mktag.c has been switched to read_sha1_file() as there
is no reason to do the full map+unpack manually.

Signed-off-by: Nicolas Pitre <nico@cam.org>
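
As a reading aid for the diff below: the code treats an unpacked
"delta" object as a 20-byte SHA-1 of the reference object followed by
the delta data that is handed to patch_delta().  A minimal sketch of
that layout follows; struct delta_view and split_delta_buffer() are
hypothetical names used only for illustration and are not part of the
patch.

```c
#include <stddef.h>

#define REF_SHA1_LEN 20

/* Hypothetical view into an unpacked "delta" object buffer. */
struct delta_view {
	const unsigned char *ref_sha1;	/* first 20 bytes: SHA-1 of the reference */
	const unsigned char *data;	/* remaining bytes: input for patch_delta() */
	unsigned long data_size;
};

/*
 * Split an unpacked delta buffer into its two parts.
 * Returns 0 on success, -1 if the buffer cannot hold a reference
 * SHA-1 plus at least one byte of delta data.
 */
int split_delta_buffer(const void *buf, unsigned long size,
		       struct delta_view *out)
{
	const unsigned char *p = buf;

	if (!buf || size <= REF_SHA1_LEN)
		return -1;
	out->ref_sha1 = p;
	out->data = p + REF_SHA1_LEN;
	out->data_size = size - REF_SHA1_LEN;
	return 0;
}
```

The size check mirrors the "delta_size > 20" test in the sha1_file.c
hunk: a buffer of 20 bytes or fewer is rejected before any reference
lookup is attempted.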

Index: git/sha1_file.c
===================================================================
--- git.orig/sha1_file.c
+++ git/sha1_file.c
@@ -9,6 +9,7 @@
 #include <stdarg.h>
 #include <limits.h>
 #include "cache.h"
+#include "delta.h"
 
 #ifndef O_NOATIME
 #if defined(__linux__) && (defined(__i386__) || defined(__PPC__))
@@ -353,6 +354,19 @@
 	if (map) {
 		buf = unpack_sha1_file(map, mapsize, type, size);
 		munmap(map, mapsize);
+		if (buf && !strcmp(type, "delta")) {
+			void *ref = NULL, *delta = buf;
+			unsigned long ref_size, delta_size = *size;
+			buf = NULL;
+			if (delta_size > 20)
+				ref = read_sha1_file(delta, type, &ref_size);
+			if (ref)
+				buf = patch_delta(ref, ref_size,
+						  delta+20, delta_size-20, 
+						  size);
+			free(delta);
+			free(ref);
+		}
 		return buf;
 	}
 	return NULL;
Index: git/mktag.c
===================================================================
--- git.orig/mktag.c
+++ git/mktag.c
@@ -25,20 +25,14 @@
 static int verify_object(unsigned char *sha1, const char *expected_type)
 {
 	int ret = -1;
-	unsigned long mapsize;
-	void *map = map_sha1_file(sha1, &mapsize);
+	char type[100];
+	unsigned long size;
+	void *buffer = read_sha1_file(sha1, type, &size);
 
-	if (map) {
-		char type[100];
-		unsigned long size;
-		void *buffer = unpack_sha1_file(map, mapsize, type, &size);
-
-		if (buffer) {
-			if (!strcmp(type, expected_type))
-				ret = check_sha1_signature(sha1, buffer, size, type);
-			free(buffer);
-		}
-		munmap(map, mapsize);
+	if (buffer) {
+		if (!strcmp(type, expected_type))
+			ret = check_sha1_signature(sha1, buffer, size, type);
+		free(buffer);
 	}
 	return ret;
 }


* [PATCH 2/3] delta check
From: Nicolas Pitre @ 2005-05-20 20:59 UTC
  To: Linus Torvalds; +Cc: git

This patch adds knowledge of delta objects to fsck-cache and various
object parsing code.  A new switch to git-fsck-cache is provided to
display the maximum delta depth found in a repository.

Signed-off-by: Nicolas Pitre <nico@cam.org>
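
The "maximum delta depth" reported by the new switch can be pictured
as the longest chain of deltas hanging off a non-delta object.  The
sketch below shows only that depth computation, with the same
"deepest + 1" shape as process_deltas() in the diff; struct node and
struct node_list are simplified, hypothetical stand-ins for the
patch's struct object and struct object_list, and no object data is
read or verified.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the patch's object and object_list. */
struct node;
struct node_list {
	struct node *item;
	struct node_list *next;
};
struct node {
	struct node_list *attached_deltas;	/* deltas made against this node */
};

/*
 * Length of the longest chain of deltas reachable through `list`:
 * 0 for an empty list, otherwise 1 plus the deepest chain found
 * below any entry in the list.
 */
int delta_chain_depth(const struct node_list *list)
{
	int deepest = 0;

	if (!list)
		return 0;
	for (; list; list = list->next) {
		int depth = delta_chain_depth(list->item->attached_deltas);
		if (deepest < depth)
			deepest = depth;
	}
	return deepest + 1;
}
```

Because each delta appears in exactly one reference's list, walking
from the delta heads downward visits every delta once, which is the
property the comment in expand_deltas() relies on.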

Index: git/fsck-cache.c
===================================================================
--- git.orig/fsck-cache.c
+++ git/fsck-cache.c
@@ -6,15 +6,46 @@
 #include "tree.h"
 #include "blob.h"
 #include "tag.h"
+#include "delta.h"
 
 #define REACHABLE 0x0001
 
 static int show_root = 0;
 static int show_tags = 0;
 static int show_unreachable = 0;
+static int show_max_delta_depth = 0;
 static int keep_cache_objects = 0; 
 static unsigned char head_sha1[20];
 
+static void expand_deltas(void)
+{
+	int i, max_depth = 0;
+
+	/*
+	 * To be as efficient as possible we look for delta heads and
+	 * recursively process them going backward, and parsing
+	 * resulting objects along the way.  This allows for processing
+	 * each delta object only once regardless of the delta depth.
+	 */
+	for (i = 0; i < nr_objs; i++) {
+		struct object *obj = objs[i];
+		if (obj->parsed && !obj->delta && obj->attached_deltas) {
+			int depth = 0;
+			char type[10];
+			unsigned long size;
+			void *buf = read_sha1_file(obj->sha1, type, &size);
+			if (!buf)
+				continue;
+			depth = process_deltas(buf, size, obj->type,
+					       obj->attached_deltas);
+			if (max_depth < depth)
+				max_depth = depth;
+		}
+	}
+	if (show_max_delta_depth)
+		printf("maximum delta depth = %d\n", max_depth);
+}
+
 static void check_connectivity(void)
 {
 	int i;
@@ -25,7 +56,12 @@
 		struct object_list *refs;
 
 		if (!obj->parsed) {
-			printf("missing %s %s\n", obj->type, sha1_to_hex(obj->sha1));
+			if (obj->delta)
+				printf("unresolved delta %s\n",
+				       sha1_to_hex(obj->sha1));
+			else
+				printf("missing %s %s\n",
+				       obj->type, sha1_to_hex(obj->sha1));
 			continue;
 		}
 
@@ -43,7 +79,12 @@
 			continue;
 
 		if (show_unreachable && !(obj->flags & REACHABLE)) {
-			printf("unreachable %s %s\n", obj->type, sha1_to_hex(obj->sha1));
+			if (obj->attached_deltas)
+				printf("foreign delta reference %s\n", 
+				       sha1_to_hex(obj->sha1));
+			else
+				printf("unreachable %s %s\n",
+				       obj->type, sha1_to_hex(obj->sha1));
 			continue;
 		}
 
@@ -201,6 +242,8 @@
 		return fsck_commit((struct commit *) obj);
 	if (obj->type == tag_type)
 		return fsck_tag((struct tag *) obj);
+	if (!obj->type && obj->delta)
+		return 0;
 	return -1;
 }
 
@@ -384,6 +427,10 @@
 			show_root = 1;
 			continue;
 		}
+		if (!strcmp(arg, "--delta-depth")) {
+			show_max_delta_depth = 1;
+			continue;
+		}
 		if (!strcmp(arg, "--cache")) {
 			keep_cache_objects = 1;
 			continue;
@@ -400,6 +447,8 @@
 	}
 	fsck_sha1_list();
 
+	expand_deltas();
+
 	heads = 0;
 	for (i = 1; i < argc; i++) {
 		const char *arg = argv[i]; 
@@ -423,7 +472,7 @@
 	}
 
 	/*
-	 * If we've not been gived any explicit head information, do the
+	 * If we've not been given any explicit head information, do the
 	 * default ones from .git/refs. We also consider the index file
 	 * in this case (ie this implies --cache).
 	 */
Index: git/delta.c
===================================================================
--- /dev/null
+++ git/delta.c
@@ -0,0 +1,115 @@
+#include "object.h"
+#include "blob.h"
+#include "tree.h"
+#include "commit.h"
+#include "tag.h"
+#include "delta.h"
+#include "cache.h"
+#include <string.h>
+
+/* the delta object definition (it can alias any other object) */
+struct delta {
+	union {
+		struct object object;
+		struct blob blob;
+		struct tree tree;
+		struct commit commit;
+		struct tag tag;
+	} u;
+};
+
+struct delta *lookup_delta(unsigned char *sha1)
+{
+	struct object *obj = lookup_object(sha1);
+	if (!obj) {
+		struct delta *ret = xmalloc(sizeof(struct delta));
+		memset(ret, 0, sizeof(struct delta));
+		created_object(sha1, &ret->u.object);
+		return ret;
+	}
+	return (struct delta *) obj;
+}
+
+int parse_delta_buffer(struct delta *item, void *buffer, unsigned long size)
+{
+	struct object *reference;
+	struct object_list *p;
+
+	if (item->u.object.delta)
+		return 0;
+	item->u.object.delta = 1;
+	if (size <= 20)
+		return -1;
+	reference = lookup_object(buffer);
+	if (!reference) {
+		struct delta *ref = xmalloc(sizeof(struct delta));
+		memset(ref, 0, sizeof(struct delta));
+		created_object(buffer, &ref->u.object);
+		reference = &ref->u.object;
+	}
+
+	p = xmalloc(sizeof(*p));
+	p->item = &item->u.object;
+	p->next = reference->attached_deltas;
+	reference->attached_deltas = p;
+	return 0;
+}
+
+int process_deltas(void *src, unsigned long src_size, const char *src_type,
+		   struct object_list *delta_list)
+{
+	int deepest = 0;
+	do {
+		struct object *obj = delta_list->item;
+		static char type[10];
+		void *map, *delta, *buf;
+		unsigned long map_size, delta_size, buf_size;
+		map = map_sha1_file(obj->sha1, &map_size);
+		if (!map)
+			continue;
+		delta = unpack_sha1_file(map, map_size, type, &delta_size);
+		munmap(map, map_size);
+		if (!delta)
+			continue;
+		if (strcmp(type, "delta") || delta_size <= 20) {
+			free(delta);
+			continue;
+		}
+		buf = patch_delta(src, src_size,
+				  delta+20, delta_size-20,
+				  &buf_size);
+		free(delta);
+		if (!buf)
+			continue;
+		if (check_sha1_signature(obj->sha1, buf, buf_size, src_type) < 0)
+			printf("sha1 mismatch for delta %s\n", sha1_to_hex(obj->sha1));
+		if (obj->type && obj->type != src_type) {
+			error("got %s when expecting %s for delta %s",
+			      src_type, obj->type, sha1_to_hex(obj->sha1));
+			free(buf);
+			continue;
+		}
+		obj->type = src_type;
+		if (src_type == blob_type) {
+			parse_blob_buffer((struct blob *)obj, buf, buf_size);
+		} else if (src_type == tree_type) {
+			parse_tree_buffer((struct tree *)obj, buf, buf_size);
+		} else if (src_type == commit_type) {
+			parse_commit_buffer((struct commit *)obj, buf, buf_size);
+		} else if (src_type == tag_type) {
+			parse_tag_buffer((struct tag *)obj, buf, buf_size);
+		} else {
+			error("unknown object type %s", src_type);
+			free(buf);
+			continue;
+		}
+		if (obj->attached_deltas) {
+			int depth = process_deltas(buf, buf_size, src_type,
+						   obj->attached_deltas);
+			if (deepest < depth)
+				deepest = depth;
+		}
+		free(buf);
+	} while ((delta_list = delta_list->next));
+	return deepest + 1;
+}
Index: git/tag.c
===================================================================
--- git.orig/tag.c
+++ git/tag.c
@@ -13,6 +13,8 @@
                 ret->object.type = tag_type;
                 return ret;
         }
+	if (!obj->type)
+		obj->type = tag_type;
         if (obj->type != tag_type) {
                 error("Object %s is a %s, not a tree", 
                       sha1_to_hex(sha1), obj->type);
Index: git/tree.c
===================================================================
--- git.orig/tree.c
+++ git/tree.c
@@ -83,6 +83,8 @@
 		ret->object.type = tree_type;
 		return ret;
 	}
+	if (!obj->type)
+		obj->type = tree_type;
 	if (obj->type != tree_type) {
 		error("Object %s is a %s, not a tree", 
 		      sha1_to_hex(sha1), obj->type);
Index: git/blob.c
===================================================================
--- git.orig/blob.c
+++ git/blob.c
@@ -14,6 +14,8 @@
 		ret->object.type = blob_type;
 		return ret;
 	}
+	if (!obj->type)
+		obj->type = blob_type;
 	if (obj->type != blob_type) {
 		error("Object %s is a %s, not a blob", 
 		      sha1_to_hex(sha1), obj->type);
Index: git/delta.h
===================================================================
--- git.orig/delta.h
+++ git/delta.h
@@ -1,6 +1,21 @@
+#ifndef DELTA_H
+#define DELTA_H
+
+/* handling of delta buffers */
 extern void *diff_delta(void *from_buf, unsigned long from_size,
 			void *to_buf, unsigned long to_size,
 		        unsigned long *delta_size);
 extern void *patch_delta(void *src_buf, unsigned long src_size,
 			 void *delta_buf, unsigned long delta_size,
 			 unsigned long *dst_size);
+
+/* handling of delta objects */
+struct delta;
+struct object_list;
+extern struct delta *lookup_delta(unsigned char *sha1);
+extern int parse_delta_buffer(struct delta *item, void *buffer, unsigned long size);
+extern int parse_delta(struct delta *item, unsigned char *sha1);
+extern int process_deltas(void *src, unsigned long src_size,
+			  const char *src_type, struct object_list *delta);
+
+#endif
Index: git/commit.c
===================================================================
--- git.orig/commit.c
+++ git/commit.c
@@ -37,6 +37,8 @@
 		ret->object.type = commit_type;
 		return ret;
 	}
+	if (!obj->type)
+		obj->type = commit_type;
 	return check_commit(obj, sha1);
 }
 
Index: git/object.c
===================================================================
--- git.orig/object.c
+++ git/object.c
@@ -4,6 +4,7 @@
 #include "commit.h"
 #include "cache.h"
 #include "tag.h"
+#include "delta.h"
 #include <stdlib.h>
 #include <string.h>
 
@@ -104,6 +105,7 @@
 	unsigned long mapsize;
 	void *map = map_sha1_file(sha1, &mapsize);
 	if (map) {
+		int is_delta;
 		struct object *obj;
 		char type[100];
 		unsigned long size;
@@ -111,9 +113,14 @@
 		munmap(map, mapsize);
 		if (!buffer)
 			return NULL;
-		if (check_sha1_signature(sha1, buffer, size, type) < 0)
+		is_delta = !strcmp(type, "delta");
+		if (!is_delta && check_sha1_signature(sha1, buffer, size, type) < 0)
 			printf("sha1 mismatch %s\n", sha1_to_hex(sha1));
-		if (!strcmp(type, "blob")) {
+		if (is_delta) {
+			struct delta *delta = lookup_delta(sha1);
+			parse_delta_buffer(delta, buffer, size);
+			obj = (struct object *) delta;
+		} else if (!strcmp(type, "blob")) {
 			struct blob *blob = lookup_blob(sha1);
 			parse_blob_buffer(blob, buffer, size);
 			obj = &blob->object;
Index: git/Makefile
===================================================================
--- git.orig/Makefile
+++ git/Makefile
@@ -36,7 +36,7 @@
 	$(INSTALL) $(PROG) $(SCRIPTS) $(dest)$(bin)
 
 LIB_OBJS=read-cache.o sha1_file.o usage.o object.o commit.o tree.o blob.o \
-	 tag.o date.o index.o diff-delta.o patch-delta.o
+	 tag.o delta.o date.o index.o diff-delta.o patch-delta.o
 LIB_FILE=libgit.a
 LIB_H=cache.h object.h blob.h tree.h commit.h tag.h delta.h
 
Index: git/object.h
===================================================================
--- git.orig/object.h
+++ git/object.h
@@ -9,10 +9,12 @@
 struct object {
 	unsigned parsed : 1;
 	unsigned used : 1;
+	unsigned delta : 1;
 	unsigned int flags;
 	unsigned char sha1[20];
 	const char *type;
 	struct object_list *refs;
+	struct object_list *attached_deltas;
 };
 
 extern int nr_objs;


* [PATCH 3/3] delta creation
From: Nicolas Pitre @ 2005-05-20 21:00 UTC
  To: Linus Torvalds; +Cc: git

This patch adds the ability to actually create delta objects using a
new tool: git-mkdelta.  It uses an ordered list of potential objects
to deltafy against earlier objects in the list.  A cap on the depth of
delta references can be provided as well, otherwise the default is to
not have any limit.  A limit of 0 will also undeltafy any given object.

Also provided is the beginning of a script to deltafy an entire
repository.

Signed-off-by: Nicolas Pitre <nico@cam.org>
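
Two of the checks mkdelta makes before storing a delta are the depth
cap and the size comparison.  The sketch below isolates just those
two tests; the other skip conditions in main() (type mismatch, empty
reference, loop detection, and the later comparison of compressed
sizes) are left out.  The enum and the function name
choose_delta_action() are hypothetical and are not part of the patch.

```c
/* Possible outcomes for one target object; names are illustrative. */
enum delta_action {
	DELTA_SKIP_DEPTH,	/* chain would exceed the depth cap */
	DELTA_SKIP_SIZE,	/* delta plus 20-byte reference is not smaller */
	DELTA_STORE		/* worth trying to store as a delta */
};

/*
 * Decide whether a freshly computed delta is a candidate for
 * replacing the original object.  As in the patch, a depth_max of
 * (unsigned int)-1 means "no limit" and 0 means "never deltafy".
 */
enum delta_action choose_delta_action(unsigned int depth_ref,
				      unsigned int depth_max,
				      unsigned long size_delta,
				      unsigned long size_orig)
{
	/* same test as depth_ref + 1 > depth_max, without the wrap-around */
	if (depth_ref >= depth_max)
		return DELTA_SKIP_DEPTH;
	/* the stored delta also carries the 20-byte reference SHA-1 */
	if (size_delta + 20 >= size_orig)
		return DELTA_SKIP_SIZE;
	return DELTA_STORE;
}
```

With the default of no limit only the size test can reject a delta;
with --max-depth=0 every target is rejected on depth, which matches
the description of 0 as the "undeltafy everything" setting.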

Index: git/Makefile
===================================================================
--- git.orig/Makefile
+++ git/Makefile
@@ -19,7 +19,8 @@
 INSTALL=install
 
 SCRIPTS=git-apply-patch-script git-merge-one-file-script git-prune-script \
-	git-pull-script git-tag-script git-resolve-script git-whatchanged
+	git-pull-script git-tag-script git-resolve-script git-whatchanged \
+	git-deltafy-script
 
 PROG=   git-update-cache git-diff-files git-init-db git-write-tree \
 	git-read-tree git-commit-tree git-cat-file git-fsck-cache \
@@ -28,7 +29,7 @@
 	git-unpack-file git-export git-diff-cache git-convert-cache \
 	git-http-pull git-rpush git-rpull git-rev-list git-mktag \
 	git-diff-helper git-tar-tree git-local-pull git-write-blob \
-	git-get-tar-commit-id
+	git-get-tar-commit-id git-mkdelta
 
 all: $(PROG)
 
@@ -107,6 +108,7 @@
 git-diff-helper: diff-helper.c
 git-tar-tree: tar-tree.c
 git-write-blob: write-blob.c
+git-mkdelta: mkdelta.c
 
 git-http-pull: LIBS += -lcurl
 
Index: git/mkdelta.c
===================================================================
--- /dev/null
+++ git/mkdelta.c
@@ -0,0 +1,317 @@
+/*
+ * Deltafication of a GIT database.
+ *
+ * (C) 2005 Nicolas Pitre <nico@cam.org>
+ *
+ * This code is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include "cache.h"
+#include "delta.h"
+
+static int replace_object(char *buf, unsigned long size, unsigned char *sha1)
+{
+	char tmpfile[PATH_MAX];
+	int fd;
+
+	snprintf(tmpfile, sizeof(tmpfile), "%s/obj_XXXXXX", get_object_directory());
+	fd = mkstemp(tmpfile);
+	if (fd < 0)
+		return error("%s: %s\n", tmpfile, strerror(errno));
+	if (write(fd, buf, size) != size) {
+		perror("unable to write file");
+		close(fd);
+		unlink(tmpfile);
+		return -1;
+	}
+	fchmod(fd, 0444);
+	close(fd);
+	if (rename(tmpfile, sha1_file_name(sha1))) {
+		perror("unable to replace original object");
+		unlink(tmpfile);
+		return -1;
+	}
+	return 0;
+}
+
+static void *create_object(char *buf, unsigned long len, char *hdr, int hdrlen,
+			   unsigned long *retsize)
+{
+	char *compressed;
+	unsigned long size;
+	z_stream stream;
+
+	/* Set it up */
+	memset(&stream, 0, sizeof(stream));
+	deflateInit(&stream, Z_BEST_COMPRESSION);
+	size = deflateBound(&stream, len+hdrlen);
+	compressed = xmalloc(size);
+
+	/* Compress it */
+	stream.next_out = compressed;
+	stream.avail_out = size;
+
+	/* First header.. */
+	stream.next_in = hdr;
+	stream.avail_in = hdrlen;
+	while (deflate(&stream, 0) == Z_OK)
+		/* nothing */;
+
+	/* Then the data itself.. */
+	stream.next_in = buf;
+	stream.avail_in = len;
+	while (deflate(&stream, Z_FINISH) == Z_OK)
+		/* nothing */;
+	deflateEnd(&stream);
+	*retsize = stream.total_out;
+	return compressed;
+}
+
+static int restore_original_object(char *buf, unsigned long len,
+				   char *type, unsigned char *sha1)
+{
+	char hdr[50];
+	int hdrlen, ret;
+	void *compressed;
+	unsigned long size;
+
+	hdrlen = sprintf(hdr, "%s %lu", type, len)+1;
+	compressed = create_object(buf, len, hdr, hdrlen, &size);
+	ret = replace_object(compressed, size, sha1);
+	free(compressed);
+	return ret;
+}
+
+static void *create_delta_object(char *buf, unsigned long len,
+				 unsigned char *sha1_ref, unsigned long *size)
+{
+	char hdr[50];
+	int hdrlen;
+
+	/* Generate the header + sha1 of reference for delta */
+	hdrlen = sprintf(hdr, "delta %lu", len+20)+1;
+	memcpy(hdr + hdrlen, sha1_ref, 20);
+	hdrlen += 20;
+
+	return create_object(buf, len, hdr, hdrlen, size);
+}
+
+static unsigned long get_object_size(unsigned char *sha1)
+{
+	struct stat st;
+	if (stat(sha1_file_name(sha1), &st))
+		die("%s: %s", sha1_to_hex(sha1), strerror(errno));
+	return st.st_size;
+}
+
+static void *get_buffer(unsigned char *sha1, char *type, unsigned long *size)
+{
+	unsigned long mapsize;
+	void *map = map_sha1_file(sha1, &mapsize);
+	if (map) {
+		void *buffer = unpack_sha1_file(map, mapsize, type, size);
+		munmap(map, mapsize);
+		if (buffer)
+			return buffer;
+	}
+	error("unable to get object %s", sha1_to_hex(sha1));
+	return NULL;
+}
+
+static void *expand_delta(void *delta, unsigned long delta_size, char *type,
+			  unsigned long *size, unsigned int *depth, char *head)
+{
+	void *buf = NULL;
+	(*depth)++;
+	if (delta_size < 20) {
+		error("delta object is bad");
+		free(delta);
+	} else {
+		unsigned long ref_size;
+		void *ref = get_buffer(delta, type, &ref_size);
+		if (ref && !strcmp(type, "delta"))
+			ref = expand_delta(ref, ref_size, type, &ref_size,
+					   depth, head);
+		else
+			memcpy(head, delta, 20);
+		if (ref)
+			buf = patch_delta(ref, ref_size, delta+20,
+					  delta_size-20, size);
+		free(ref);
+		free(delta);
+	}
+	return buf;
+}
+
+static char *mkdelta_usage =
+"mkdelta [ --max-depth=N ] <reference_sha1> <target_sha1> [ <next_sha1> ... ]";
+
+int main(int argc, char **argv)
+{
+	unsigned char sha1_ref[20], sha1_trg[20], head_ref[20], head_trg[20];
+	char type_ref[20], type_trg[20];
+	void *buf_ref, *buf_trg, *buf_delta;
+	unsigned long size_ref, size_trg, size_orig, size_delta;
+	unsigned int depth_ref, depth_trg, depth_max = -1;
+	int i, verbose = 0;
+
+	for (i = 1; i < argc; i++) {
+		if (!strcmp(argv[i], "-v")) {
+			verbose = 1;
+		} else if (!strcmp(argv[i], "-d") && i+1 < argc) {
+			depth_max = atoi(argv[++i]);
+		} else if (!strncmp(argv[i], "--max-depth=", 12)) {
+			depth_max = atoi(argv[i]+12);
+		} else
+			break;
+	}
+
+	if (i + (depth_max != 0) >= argc)
+		usage(mkdelta_usage);
+
+	if (get_sha1(argv[i], sha1_ref))
+		die("bad sha1 %s", argv[i]);
+	depth_ref = 0;
+	buf_ref = get_buffer(sha1_ref, type_ref, &size_ref);
+	if (buf_ref && !strcmp(type_ref, "delta"))
+		buf_ref = expand_delta(buf_ref, size_ref, type_ref,
+				       &size_ref, &depth_ref, head_ref);
+	else
+		memcpy(head_ref, sha1_ref, 20);
+	if (!buf_ref)
+		die("unable to obtain initial object %s", argv[i]);
+
+	if (depth_ref > depth_max) {
+		if (restore_original_object(buf_ref, size_ref, type_ref, sha1_ref))
+			die("unable to restore %s", argv[i]);
+		if (verbose)
+			printf("undelta %s (depth was %d)\n", argv[i], depth_ref);
+		depth_ref = 0;
+	}
+
+	/*
+	 * TODO: deltafication should be tried against any early object
+	 * in the object list and not only the previous object.
+	 */
+
+	while (++i < argc) {
+		if (get_sha1(argv[i], sha1_trg))
+			die("bad sha1 %s", argv[i]);
+		depth_trg = 0;
+		buf_trg = get_buffer(sha1_trg, type_trg, &size_trg);
+		if (buf_trg && !size_trg) {
+			if (verbose)
+				printf("skip    %s (object is empty)\n", argv[i]);
+			continue;
+		}
+		size_orig = size_trg;
+		if (buf_trg && !strcmp(type_trg, "delta")) {
+			if (!memcmp(buf_trg, sha1_ref, 20)) {
+				/* delta already in place */
+				depth_ref++;
+				memcpy(sha1_ref, sha1_trg, 20);
+				buf_ref = patch_delta(buf_ref, size_ref,
+						      buf_trg+20, size_trg-20,
+						      &size_ref);
+				if (!buf_ref)
+					die("unable to apply delta %s", argv[i]);
+				if (depth_ref > depth_max) {
+					if (restore_original_object(buf_ref, size_ref,
+								    type_ref, sha1_ref))
+						die("unable to restore %s", argv[i]);
+					if (verbose)
+						printf("undelta %s (depth was %d)\n", argv[i], depth_ref);
+					depth_ref = 0;
+					continue;
+				}
+				if (verbose)
+					printf("skip    %s (delta already in place)\n", argv[i]);
+				continue;
+			}
+			buf_trg = expand_delta(buf_trg, size_trg, type_trg,
+					       &size_trg, &depth_trg, head_trg);
+		} else
+			memcpy(head_trg, sha1_trg, 20);
+		if (!buf_trg)
+			die("unable to read target object %s", argv[i]);
+
+		if (depth_trg > depth_max) {
+			if (restore_original_object(buf_trg, size_trg, type_trg, sha1_trg))
+				die("unable to restore %s", argv[i]);
+			if (verbose)
+				printf("undelta %s (depth was %d)\n", argv[i], depth_trg);
+			depth_trg = 0;
+			size_orig = size_trg;
+		}
+
+		if (depth_max == 0)
+			goto skip;
+
+		if (strcmp(type_ref, type_trg))
+			die("type mismatch for object %s", argv[i]);
+
+		if (!size_ref) {
+			if (verbose)
+				printf("skip    %s (initial object is empty)\n", argv[i]);
+			goto skip;
+		}
+		
+		if (depth_ref + 1 > depth_max) {
+			if (verbose)
+				printf("skip    %s (exceeding max link depth)\n", argv[i]);
+			goto skip;
+		}
+
+		if (!memcmp(head_ref, sha1_trg, 20)) {
+			if (verbose)
+				printf("skip    %s (would create a loop)\n", argv[i]);
+			goto skip;
+		}
+
+		buf_delta = diff_delta(buf_ref, size_ref, buf_trg, size_trg, &size_delta);
+		if (!buf_delta)
+			die("out of memory");
+
+		/* no need to even try to compress if original
+		   uncompressed is already smaller */
+		if (size_delta+20 < size_orig) {
+			void *buf_obj;
+			unsigned long size_obj;
+			buf_obj = create_delta_object(buf_delta, size_delta,
+						      sha1_ref, &size_obj);
+			free(buf_delta);
+			size_orig = get_object_size(sha1_trg);
+			if (size_obj >= size_orig) {
+				free(buf_obj);
+				if (verbose)
+					printf("skip    %s (original is smaller)\n", argv[i]);
+				goto skip;
+			}
+			if (replace_object(buf_obj, size_obj, sha1_trg))
+				die("unable to write delta for %s", argv[i]);
+			free(buf_obj);
+			depth_ref++;
+			if (verbose)
+				printf("delta   %s (size=%ld.%02ld%%, depth=%d)\n",
+				       argv[i], size_obj*100 / size_orig,
+				       (size_obj*10000 / size_orig)%100,
+				       depth_ref);
+		} else {
+			free(buf_delta);
+			if (verbose)
+				printf("skip    %s (original is smaller)\n", argv[i]);
+			skip:
+			depth_ref = depth_trg;
+			memcpy(head_ref, head_trg, 20);
+		}
+
+		free(buf_ref);
+		buf_ref = buf_trg;
+		size_ref = size_trg;
+		memcpy(sha1_ref, sha1_trg, 20);
+	}
+
+	return 0;
+}
Index: git/git-deltafy-script
===================================================================
--- /dev/null
+++ git/git-deltafy-script
@@ -0,0 +1,39 @@
+#!/bin/bash
+
+# Script to deltafy an entire GIT repository based on the commit list.
+# The most recent version of a file is the reference and previous versions
+# are stored as deltas against the best earlier version available, and so
+# on for successive versions going back in time.  This way the delta
+# overhead is pushed towards older versions of any given file.
+#
+# NOTE: the "best earlier version" is not implemented in mkdelta yet
+#       and therefore only the next earlier version is used at this time.
+#
+# TODO: deltafy tree objects as well.
+#
+# The -d argument sets a limit on the delta chain depth.
+# If 0 is passed then everything is undeltafied.
+
+set -e
+
+depth=
+[ "$1" == "-d" ] && depth="--max-depth=$2" && shift 2
+
+curr_file=""
+
+git-rev-list HEAD |
+git-diff-tree -r --stdin |
+sed -n '/^\*/ s/^.*->\(.\{41\}\)\(.*\)$/\2 \1/p' | sort | uniq |
+while read file sha1; do
+	if [ "$file" == "$curr_file" ]; then
+		list="$list $sha1"
+	else
+		if [ "$list" ]; then
+			echo "Processing $curr_file"
+			echo "$head $list" | xargs git-mkdelta $depth -v
+		fi
+		curr_file="$file"
+		list=""
+		head="$sha1"
+	fi
+done


* checkout-cache -f: a better way?
From: Jeff Garzik @ 2005-05-20 21:05 UTC
  To: Git Mailing List

[-- Attachment #1: Type: text/plain, Size: 563 bytes --]

Being a weirdo, I don't use cogito for kernel development, just git 
itself.  I store branches in .git/refs/heads/ per the de facto standard, 
and use the attached script to switch the working directory from one 
branch to another.

Problem is, 'git-checkout-cache -q -f -a' really pounds the disk, and 
takes quite a while.

Is there any way to avoid -f, while ensuring that the working directory 
truly represents the new branch?

BitKeeper has a secret checkout arg '-S', which will leave files 
untouched if the mtime/size information is unchanged.

	Jeff
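
For illustration, the mtime/size test mentioned above could look like
the following in C.  This is a sketch under stated assumptions, not a
description of how BitKeeper or git implements it: it assumes the
caller has recorded a size and mtime for each file at checkout time,
and the function name file_looks_unchanged() is hypothetical.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <time.h>

/*
 * Hypothetical check: returns 1 if the file at `path` still has the
 * recorded size and mtime (so a checkout could leave it alone),
 * 0 if either differs or the file cannot be stat()ed.
 */
int file_looks_unchanged(const char *path, off_t cached_size,
			 time_t cached_mtime)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return 0;
	return st.st_size == cached_size && st.st_mtime == cached_mtime;
}
```

Such a check trades certainty for speed: it avoids rewriting a file
whose metadata matches, but it would miss a content change that
preserves both size and mtime.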

[-- Attachment #2: git-switch-tree --]
[-- Type: text/plain, Size: 381 bytes --]

#!/bin/sh

if [ "x$1" != "x" ]
then
	if [ "$1" = "master" ]
	then
		( cd .git && rm -f HEAD && ln -s refs/heads/master HEAD )
	else
		if [ ! -f ".git/refs/heads/$1" ]
		then
			echo "Branch $1 not found."
			exit 1
		fi

		( cd .git && rm -f HEAD && ln -s "refs/heads/$1" HEAD )
	fi
fi

git-read-tree $(cat .git/HEAD) && \
	git-checkout-cache -q -f -a && \
	git-update-cache --refresh

^ permalink raw reply

* Re: gitweb wishlist
From: Thomas Glanzmann @ 2005-05-20 21:16 UTC (permalink / raw)
  To: Git Mailing List
In-Reply-To: <428E4D8C.3020606@zytor.com>

[-- Attachment #1: Type: text/plain, Size: 496 bytes --]

Hello,
I imported the mutt CVS repository for the 1.5 branch into GIT using the
following command, but it is a hack. I also think that I will use something
like it to build CVS->GIT vendor tracking.

cvsps -x -z 10 -b HEAD -g -p ../../patches/

I then used the attached script to import the patches into GIT. It works
quite well.

See also msgid: 1115080139.21105.18.camel@localhost.localdomain there
are the scripts which he used to convert the CVS to GIT for HPA. My
scripts are based on his work.

	Thomas

[-- Attachment #2: cvsps-import.pl --]
[-- Type: text/plain, Size: 1810 bytes --]

#!/usr/bin/perl

use strict;
use warnings;
use File::Temp qw/ tempfile tempdir /;

# ---------------------
# PatchSet 1 
# Date: 2002/07/23 07:41:30
# Author: hpa
# Branch: HEAD
# Tag: (none) 
# Log:
# Initial revision
# 
# Members: 
# 	klibc.cvsroot/snprintf.c:INITIAL->1.1 
# 	klibc.cvsroot/vsnprintf.c:INITIAL->1.1 
# 	klibc.cvsroot/klibc/Makefile:INITIAL->1.1 
# 	klibc.cvsroot/klibc/snprintf.c:INITIAL->1.1 
# 	klibc.cvsroot/klibc/vsnprintf.c:INITIAL->1.1 
# 
# --- /dev/null	2005-04-30 18:00:24.840397008 +0200
# +++ klibc/klibc.cvsroot/snprintf.c	2005-05-02 19:57:42.879913000 +0200
# @@ -0,0 +1,19 @@
# +/*

my $patch = $ARGV[0];

my %committer = (
	brendan  => [ 'Brendan Cully',   'brendan@kublai.com' ],
	me       => [ 'Michael Elkins',  'me@sigpipe.org' ],
	roessler => [ 'Thomas Roessler', 'roessler@does-not-exist.org' ]
);

my @log = ();

$ENV{GIT_AUTHOR_EMAIL} = "";
$ENV{GIT_COMMITTER_EMAIL} = "";

open(my $fd, '<', $patch) or die "Cannot open $patch: $!\n";
while (my $line = <$fd>) {
	if ($line =~ m/^Date: (.*)/) {
		$ENV{GIT_AUTHOR_DATE} = $1;

	} elsif ($line =~ m/^Author: (.*)/) {
		if (defined($committer{$1})) {
			$ENV{GIT_COMMITTER_NAME}  = $committer{$1}[0];
			$ENV{GIT_COMMITTER_EMAIL} = $committer{$1}[1];
			$ENV{GIT_AUTHOR_NAME}     = $committer{$1}[0];
			$ENV{GIT_AUTHOR_EMAIL}    = $committer{$1}[1];
		} else {
			$ENV{GIT_COMMITTER_NAME} = $1;
			$ENV{GIT_AUTHOR_NAME}    = $1;
		}

	} elsif ($line =~ m/^Log:/) {
		while (my $line = <$fd>) {
			if ($line =~ m/^Members: $/) {
				pop(@log);
				last;
			} elsif ($line =~ /^From: (.+) <([^>]+@[^>]+)>$/) {
				$ENV{GIT_AUTHOR_NAME}  = $1;
				$ENV{GIT_AUTHOR_EMAIL} = $2;
			}
			push @log, $line;
		}
	}
}

close($fd);

my ($fh, $logfile) = tempfile(UNLINK => 1);
print $fh @log;
# Close first so the log is flushed to disk before git reads it.
close($fh);
system("git patch $patch < $logfile");
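
The header parsing in the script above can also be sketched in shell. This is
only an illustration, assuming a cvsps patch header with "Date:" and "Author:"
lines at the start of a line, as the Perl regexes expect. The name
patchset_field is hypothetical.

```shell
# Hypothetical helper: print the value of the first "Field: value" header
# line read from stdin. The field name is used as a sed pattern, so it
# should be a plain word such as Author or Date.
patchset_field() {
	sed -n "s/^$1: //p" | head -n 1
}
```

A call such as `patchset_field Author < some.patch` would print the CVS
login, which could then be looked up in a table like %committer.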

^ permalink raw reply

* Re: gitweb wishlist
From: Kay Sievers @ 2005-05-20 21:41 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Petr Baudis, Git Mailing List, Peter Anvin
In-Reply-To: <Pine.LNX.4.58.0505201219420.2206@ppc970.osdl.org>

On Fri, 2005-05-20 at 12:22 -0700, Linus Torvalds wrote:
> 
> On Fri, 20 May 2005, Kay Sievers wrote:
> > 
> > Something like this?:
> >   http://kernel.org/git/?p=git/git.git;a=commitdiff;h=de809dbbce497e0d107562615c1d85ff35b4e0c5
> 
> Btw, at least for me, this looks much more interesting than the "commit" 
> thing, and maybe it would make sense to make the summary links be to the 
> "commitdiff" instead of the "commit"?

How about this:
  http://www.kernel.org/git/?p=git/git.git;a=summary

The default link is still the same, but you can use the link at the end.

Kay


^ permalink raw reply

* Re: gitweb wishlist
From: Kay Sievers @ 2005-05-20 22:04 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Linus Torvalds, Petr Baudis, Git Mailing List
In-Reply-To: <428E4D8C.3020606@zytor.com>

[-- Attachment #1: Type: text/plain, Size: 1607 bytes --]

On Fri, 2005-05-20 at 13:50 -0700, H. Peter Anvin wrote:
> Linus Torvalds wrote:
> > 
> > Oh, btw, I notice that you moved klibc over to git - care to share your
> > cvs->git script (I assume you scripted it ;)? That would seem to be an 
> > obvious addition to the core stuff..
> > 
> 
> Actually, Kay did the conversion... the scripts are clearly very 
> cantankerous, because if *I* run them -- I tried -- they don't work! 
> Since it's Kay's work, I'll leave them to him, but I would definitely 
> love to move more of my CVS repos over to git, especially syslinux.

Here we go;

These scripts are just a quick hack; I just wanted to know how nicely the
stupid cvs file history can be converted to git commits.

It exports the CVS repo to individual patches with the help of the nice
cvsps. (Every patch contains something like a "ChangeSet", found by
searching for file revisions with the same checkin date.)

Then the patches with the header are split into individual files for
committing them into git (similar to Linus' git-mbox-tools).

If we reach a CVS tag with a patch during sequential patching, the
script throws away the whole current working tree and checks the
revision out of CVS. This way we make sure that the git-tag matches
the tree CVS has tagged. (I've encountered two mismatches between the
"patch-chain" and the CVS revision-tag. The corrections for these are
hardcoded into the script. :)

For every CVS revision-tag a git-tag without any content except the name
is created.

And the klibc-repo was created with a patched git-commit to fake the
commit date with the author date. :)

Good luck with it,
Kay

[-- Attachment #2: export-to-git.sh --]
[-- Type: application/x-shellscript, Size: 2581 bytes --]

[-- Attachment #3: split-cvsps-patch.pl --]
[-- Type: application/x-perl, Size: 2386 bytes --]

^ permalink raw reply

* Re: gitweb wishlist
From: H. Peter Anvin @ 2005-05-20 22:13 UTC (permalink / raw)
  To: Kay Sievers; +Cc: Linus Torvalds, Petr Baudis, Git Mailing List
In-Reply-To: <1116626652.12975.118.camel@dhcp-188>

Kay Sievers wrote:
> 
> And the klibc-repo was created with a patched git-commit to fake the
> commit date with the author date. :)
> 

In fact, I kind of wish we'd also made committer == author.

Since this whole thing is an import from another revision control 
system, one really wants that.  It's one of those very rare situations 
in which fudging the commit date is not only fully legitimate, but darn 
near required.

	-hpa

^ permalink raw reply

* Re: checkout-cache -f: a better way?
From: Junio C Hamano @ 2005-05-20 22:38 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Git Mailing List
In-Reply-To: <428E5102.60003@pobox.com>

>>>>> "JG" == Jeff Garzik <jgarzik@pobox.com> writes:

JG> Being a weirdo, I don't use cogito for kernel development, just git
JG> itself.

My customer, in other words ;-).

JG> git-read-tree $(cat .git/HEAD) && \
JG> 	git-checkout-cache -q -f -a && \
JG> 	git-update-cache --refresh

I have to check checkout-cache.c, but assuming that you start
from an already populated work tree with a valid cache when you
do the git-read-tree at the third line from the last, using
"git-read-tree -m HEAD" (you do not need to say $(cat .git/HEAD)
in the modern git anymore) would be a good place to start.

Also the modern git-checkout-cache has a '-u' option and with it
you should not need 'git-update-cache --refresh' after that.

Let me know if you have any problems.  Single tree '-m' is what
Linus did and '-u' option to git-checkout-cache is mine.

^ permalink raw reply

* Re: gitweb wishlist
From: Linus Torvalds @ 2005-05-20 23:25 UTC (permalink / raw)
  To: Kay Sievers; +Cc: H. Peter Anvin, Petr Baudis, Git Mailing List
In-Reply-To: <1116626652.12975.118.camel@dhcp-188>

On Sat, 21 May 2005, Kay Sievers wrote:
> 
> These scripts are just a quick hack; I just wanted to know how nicely the
> stupid cvs file history can be converted to git commits.

Ugh, indeed.

Is it a cvsps bug or what that causes you to have to re-order the patches?  
Or is it that you don't handle branches or something in CVS? The fact that
you also remove one of the tags "suppress ash-branch" in that same number
sequence that you had to fix up by re-ordering seems to imply that the
breakage has something to do with branching.

Does anybody have any suggestions for a nice and smallish CVS project that
has branches that I should look at?

		Linus

^ permalink raw reply

* Re: checkout-cache -f: a better way?
From: Linus Torvalds @ 2005-05-20 23:33 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Git Mailing List
In-Reply-To: <428E5102.60003@pobox.com>

On Fri, 20 May 2005, Jeff Garzik wrote:
> 
> Problem is, 'git-checkout-cache -q -f -a' really pounds the disk, and 
> takes quite a while.

No. "git" is perfect, and "git-checkout-cache -f" already does exactly 
what you want.

> Is there any way to avoid -f, while ensuring that the working directory 
> truly represents the new branch?

You don't need to avoid -f, it already has the logic to avoid writing 
files that are already up-to-date.

HOWEVER, your script is broken:

	git-read-tree $(cat .git/HEAD) && \
	        git-checkout-cache -q -f -a && \
	        git-update-cache --refresh

you need to use the "-m" switch to git-read-tree to tell it to merge the 
index information from your previous tree with the new one.

Also, don't do the "$(cat .git/HEAD)" thing any more, since modern git 
does this so much more nicely, and allows you to use your branch names 
directly.

Finally, use the new "-u" flag to git-checkout-cache, which will update 
the cache as it goes along. 

In other words, those lines in your script should look like this:

	git-read-tree -m HEAD && git-checkout-cache -q -f -u -a

and you'll be a lot happier.

			Linus
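
A minimal sketch of the idea of not rewriting files that are already up to
date. As described above, git-checkout-cache gets this from the index
information; this sketch swaps in a plain content comparison with cmp
instead. write_if_changed is a hypothetical helper, not a git command.

```shell
# Hypothetical helper, not part of git: copy "$1" over "$2" only when the
# contents differ, so an unchanged destination is never rewritten.
write_if_changed() {
	src=$1
	dst=$2
	if [ -f "$dst" ] && cmp -s "$src" "$dst"; then
		echo "skip $dst"
	else
		cp "$src" "$dst" && echo "write $dst"
	fi
}
```

Reading both files to compare them is more expensive than comparing cached
stat data, so this only shows the control flow, not the performance.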

^ permalink raw reply

* Re: checkout-cache -f: a better way?
From: Jeff Garzik @ 2005-05-20 23:33 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List
In-Reply-To: <7vacmpsetb.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano wrote:
>>>>>>"JG" == Jeff Garzik <jgarzik@pobox.com> writes:
> 
> 
> JG> Being a weirdo, I don't use cogito for kernel development, just git
> JG> itself.
> 
> My customer, in other words ;-).
> 
> JG> git-read-tree $(cat .git/HEAD) && \
> JG> 	git-checkout-cache -q -f -a && \
> JG> 	git-update-cache --refresh
> 
> I have to check checkout-cache.c, but assuming that you start
> from an already populated work tree with a valid cache when you
> do the git-read-tree at the third line from the last, using
> "git-read-tree -m HEAD" (you do not need to say $(cat .git/HEAD)
> in the modern git anymore) would be a good place to start.
> 
> Also the modern git-checkout-cache has a '-u' option and with it
> you should not need 'git-update-cache --refresh' after that.
> 
> Let me know if you have any problems.  Single tree '-m' is what
> Linus did and '-u' option to git-checkout-cache is mine.

Pardon my ignorance (I'm slow :)), but how do those changes address the 
fact that git-checkout-cache appears to checkout the entire kernel tree 
(over 100MB of writes) when using '-f' ?

git-checkout-cache -f writes out every file, even if it exists, correct?

	Jeff

^ permalink raw reply

* Re: checkout-cache -f: a better way?
From: Junio C Hamano @ 2005-05-20 23:39 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Git Mailing List
In-Reply-To: <428E73B9.1080907@pobox.com>

>>>>> "JG" == Jeff Garzik <jgarzik@pobox.com> writes:

JG> git-checkout-cache -f writes out every file, even if it exists, correct?

No, that's not correct.  To translate my prose, you would want
this:

    git-read-tree -m HEAD && git-checkout-cache -q -f -u -a

(notice that I do not have git-update-cache --refresh after
that).

^ permalink raw reply

* Re: checkout-cache -f: a better way?
From: Linus Torvalds @ 2005-05-20 23:51 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Git Mailing List
In-Reply-To: <Pine.LNX.4.58.0505201626560.2206@ppc970.osdl.org>



On Fri, 20 May 2005, Linus Torvalds wrote:
> 
> In other words, those lines in your script should look like this:
> 
> 	git-read-tree -m HEAD && git-checkout-cache -q -f -u -a
> 
> and you'll be a lot happier.

Btw, I do realize that I'm a total wiener, and that my inability to use 
"getopt_long()" is shameful and stupid. 

What can I say? I'm easily confused, and besides, I really seldom program 
in user mode.

So if somebody were to getopt'ify git, _without_ adding crapola like
autoconf (which probably implies that git would just require GNU getopt),
and others agree that it's ok to just say that we expect getopt_long() to
exist, then I'd not have any objections to making the above just be

	git-read-tree -m HEAD && git-checkout-cache -fqua

(to which the beavis-and-butthead in me says "hehhehhehh.. He said fqua.  
Hehhehh. fire fire fire.")

		Linus
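
A minimal sketch of what bundled short options look like to a parser. It uses
the shell's builtin getopts rather than the C getopt_long() discussed above,
and parse_flags with its four flag letters is a hypothetical stand-in for the
-f, -q, -u and -a options of git-checkout-cache.

```shell
# Hypothetical helper: accept -f, -q, -u and -a either separately or
# bundled (for example -fqua) and report which ones were seen.
parse_flags() {
	OPTIND=1	# restart parsing for every call
	force=0 quiet=0 update=0 all=0
	while getopts fqua opt; do
		case $opt in
		f) force=1 ;;
		q) quiet=1 ;;
		u) update=1 ;;
		a) all=1 ;;
		*) return 1 ;;	# unknown option, getopts already complained
		esac
	done
	echo "force=$force quiet=$quiet update=$update all=$all"
}
```

With this, `parse_flags -fqua` and `parse_flags -q -f -u -a` report the same
result.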

^ permalink raw reply

* Re: checkout-cache -f: a better way?
From: Jeff Garzik @ 2005-05-20 23:55 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List
In-Reply-To: <Pine.LNX.4.58.0505201641160.2206@ppc970.osdl.org>

[-- Attachment #1: Type: text/plain, Size: 720 bytes --]

Linus Torvalds wrote:
> 
> On Fri, 20 May 2005, Linus Torvalds wrote:
> 
>>In other words, those lines in your script should look like this:
>>
>>	git-read-tree -m HEAD && git-checkout-cache -q -f -u -a
>>
>>and you'll be a lot happier.
> 
> 
> Btw, I do realize that I'm a total wiener, and that my inability to use 
> "getopt_long()" is shameful and stupid. 

info libc argp :)  argp is a lot more flexible, but with the same basic 
structure as getopt_long().

If you pick a random git program, I would be willing to convert it as an 
example.  I attached my implementation of ipcrm[1] as an example.

	Jeff


[1] from 'posixutils', my project to implement all the POSIX command 
line utilities.  Yes, I'm crazy too.

[-- Attachment #2: ipcrm.c --]
[-- Type: text/x-csrc, Size: 5023 bytes --]

/*
 * Copyright 2004-2005 Jeff Garzik <jgarzik@pobox.com>
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; see the file COPYING.  If not, write to
 * the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA.
 *
 */


#ifndef HAVE_CONFIG_H
#error missing autoconf-generated config.h.
#endif
#include "posixutils-config.h"

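#include <sys/types.h>
#include <errno.h>	/* errno, used in the error reporting below */
#include <stdio.h>	/* fprintf, stderr */
#include <stdlib.h>
#include <string.h>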
#include <sys/ipc.h>
#include <sys/msg.h>
#include <sys/sem.h>
#include <sys/shm.h>
#include <argp.h>
#include <libpu.h>


static const char doc[] =
N_("ipcrm - remove a message queue, semaphore set or shared memory id");

static struct argp_option options[] = {
	{ NULL, 'q', "msgid", 0,
	  N_("Remove message queue identifier msgid from system") },
	{ NULL, 'm', "shmid", 0,
	  N_("Remove shared memory identifier shmid from system") },
	{ NULL, 's', "semid", 0,
	  N_("Remove semaphore identifier semid from system") },
	{ NULL, 'Q', "msgkey", 0,
	  N_("Remove message queue identifier, created with key msgkey, from system") },
	{ NULL, 'M', "shmkey", 0,
	  N_("Remove shared memory identifier, created with key shmkey, from system") },
	{ NULL, 'S', "semkey", 0,
	  N_("Remove semaphore identifier, created with key semkey, from system") },
	{ }
};

static error_t parse_opt (int key, char *arg, struct argp_state *state);
static const struct argp argp = { options, parse_opt, NULL, doc };

enum parse_options_bits {
	OPT_MSG			= (1 << 0),
	OPT_SHM			= (1 << 1),
	OPT_SEM			= (1 << 2),
	OPT_KEY			= (1 << 3),
};

struct arglist {
	struct arglist		*next;
	int			mask;
	unsigned long		arg;
};

#ifdef _SEM_SEMUN_UNDEFINED
   union semun
   {
     int val;
     struct semid_ds *buf;
     unsigned short int *array;
     struct seminfo *__buf;
   };
#endif

static int exit_status = EXIT_SUCCESS;
static struct arglist *arglist;


static const char *arg_name(int mask)
{
	if (mask & OPT_MSG) return "msg";
	if (mask & OPT_SHM) return "shm";
	if (mask & OPT_SEM) return "sem";
	return NULL;
}

static void push_opt(int mask, unsigned long arg)
{
	struct arglist *tmp, *node = xcalloc(1, sizeof(struct arglist));

	node->mask = mask;
	node->arg = arg;

	tmp = arglist;
	if (!tmp) {
		arglist = node;
	} else {
		while (tmp->next)
			tmp = tmp->next;
		tmp->next = node;
	}
}

static void push_arg_opt(int mask, const char *arg)
{
	int base = (mask & OPT_KEY) ? 0 : 10;
	char *end = NULL;
	unsigned long l;

	l = strtoul(arg, &end, base);

	if ((*end != 0) ||	/* entire string is -not- valid */
	    ((mask & OPT_KEY) && (l == IPC_PRIVATE))) {
		fprintf(stderr, "%s%s '%s' invalid\n",
			arg_name(mask),
			mask & OPT_KEY ? "key" : "id",
			arg);
		exit_status = EXIT_FAILURE;
		return;
	}

	push_opt(mask, l);
}

static error_t parse_opt (int key, char *arg, struct argp_state *state)
{
	switch (key) {
	case 'q': push_arg_opt(OPT_MSG, arg); break;
	case 'm': push_arg_opt(OPT_SHM, arg); break;
	case 's': push_arg_opt(OPT_SEM, arg); break;
	case 'Q': push_arg_opt(OPT_MSG | OPT_KEY, arg); break;
	case 'M': push_arg_opt(OPT_SHM | OPT_KEY, arg); break;
	case 'S': push_arg_opt(OPT_SEM | OPT_KEY, arg); break;

	default:
		return ARGP_ERR_UNKNOWN;
	}

	return 0;
}

static void pinterr(const char *msg, long l)
{
	fprintf(stderr, msg, l, strerror(errno));
	exit_status = 1;
}

static void remove_one(int mask, unsigned long arg)
{
	int rc;
	int id = (int) arg;
	const char *errmsg = NULL;

	if (mask & OPT_KEY) {
		if (mask & OPT_MSG)
			id = msgget(arg, 0);
		else if (mask & OPT_SHM)
			id = shmget(arg, 0, 0);
		else if (mask & OPT_SEM)
			id = semget(arg, 0, 0);
		else
			abort();	/* should never happen */
	}

	if (id < 0) {
		pinterr("key 0x%lx lookup failed: %s\n", arg);
		return;
	}

	if (mask & OPT_MSG) {
		rc = msgctl(id, IPC_RMID, NULL);
		errmsg = "msgctl(0x%x): %s\n";
	}
	else if (mask & OPT_SHM) {
		rc = shmctl(id, IPC_RMID, NULL);
		errmsg = "shmctl(0x%x): %s\n";
	}
	else if (mask & OPT_SEM) {
		union semun dummy;
		dummy.val = 0;

		rc = semctl(id, 0, IPC_RMID, dummy);
		errmsg = "semctl(0x%x): %s\n";
	}

	else
		abort();	/* should never happen */

	if (rc < 0) {
		fprintf(stderr, errmsg, id, strerror(errno));
		exit_status = 1;
	}
}

static void remove_stuff(void)
{
	struct arglist *tmp = arglist;

	while (tmp) {
		remove_one(tmp->mask, tmp->arg);
		tmp = tmp->next;
	}
}

int main (int argc, char *argv[])
{
	error_t rc;

	pu_init();

	rc = argp_parse(&argp, argc, argv, 0, NULL, NULL);
	if (rc) {
		fprintf(stderr, "argp_parse failed: %s\n", strerror(rc));
		return 1;
	}

	remove_stuff();

	return exit_status;
}


^ permalink raw reply

* Re: checkout-cache -f: a better way?
From: Jeff Garzik @ 2005-05-20 23:58 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List
In-Reply-To: <7vvf5dqxfq.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano wrote:
>>>>>>"JG" == Jeff Garzik <jgarzik@pobox.com> writes:
> 
> 
> JG> git-checkout-cache -f writes out every file, even if it exists, correct?
> 
> No, that's not correct.  To translate my prose, you would want
> this:

Thanks, I stand corrected :)

>     git-read-tree -m HEAD && git-checkout-cache -q -f -u -a
> 
> (notice that I do not have git-update-cache --refresh after
> that).

Yep, thanks.  Script does seem faster now.  Numbers for hot cache (first
is pre-modification, second is with your mod):

[jgarzik@pretzel libata-dev]$ time git-switch-tree adma-mwi

real    0m7.069s
user    0m4.183s
sys     0m2.817s
[jgarzik@pretzel libata-dev]$ time git-switch-tree adma

real    0m0.389s
user    0m0.294s
sys     0m0.094s

^ permalink raw reply

* Re: checkout-cache -f: a better way?
From: Junio C Hamano @ 2005-05-20 23:59 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jeff Garzik, Git Mailing List
In-Reply-To: <Pine.LNX.4.58.0505201641160.2206@ppc970.osdl.org>

>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:

LT> (to which the beavis-and-butthead in me says "hehhehhehh.. He said fqua.  
LT> Hehhehh. fire fire fire.")

Earlier this week I sent out a "Request for Help" listing
some janitorial work, of which this was one of the items.  I
believe Jeff suggested use of argp over GNU getopt(), but other
than that I do not think we had any volunteers (hint hint).  I
haven't looked into any of the RFH items myself yet.

^ permalink raw reply

* Re: gitweb wishlist
From: Linus Torvalds @ 2005-05-21  0:50 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Kay Sievers, Petr Baudis, Thomas Glanzmann, Git Mailing List
In-Reply-To: <428E745C.30304@zytor.com>

[ Thomas added to cc, since he seems to have also worked on this ]

On Fri, 20 May 2005, H. Peter Anvin wrote:
> 
> Here is my "main" OSS CVS repository; look at the syslinux module.  It 
> has at least some minor branching.

Ok, "cvsps" output scares me. I wonder what

	WARNING: Invalid PatchSet 775, Tag syslinux-2_12-pre7:
	    memdisk/init32.asm:1.3=after, memdisk/Makefile:1.26=before. Treated as 'before'
	WARNING: Invalid PatchSet 775, Tag syslinux-2_12-pre7:
	    memdisk/init32.asm:1.3=after, memdisk/e820test.c:1.7=before. Treated as 'before'
	...

means..

Also, your syslinux repo is interesting and shows another thing: doing a

	cvsps -g -p separate

ends badly with

	Directing PatchSet 938 to file separate/938.patch
	cvs rdiff: failed to read diff file header /tmp/cvso8PswZ for mdiskchk.com,v: end of file
	system command returned non-zero exit status: 1: aborting

which doesn't look very promising and causes an empty diff for
mdiskchk.com. Trying with --cvs-direct shows the reason:

	Index: syslinux/sample/mdiskchk.com
	===================================================================
	RCS file: /home/torvalds/src/osscvs/cvsroot/syslinux/sample/mdiskchk.com,v
	retrieving revision 1.1
	retrieving revision 1.2
	diff -u -r1.1 -r1.2
	Binary files /tmp/cvsU6MGU0 and /tmp/cvsiskFVR differ

which shows that anything that bases itself on diffs (ie uses "-g" with
cvsps) is just doomed to failure, since there's no good way to handle
binary data. Both Kay's and Thomas' scripts try to do the "-g" thing,
and that's just not right.

So the cvs->git thing would need to be based on the actual objects, which 
obviously fits git quite well, but I was really hoping to have cvsps give 
some nice intermediate format..

So it looks like we should avoid the diff format, and instead use

	cvsps -p separate

and then just parse the "Members" thing and turn each of them either
into a "delete"  (for ->.*DEAD) or "cvs checkout -rxxx" (for ".*->xxx").

Handling branches by literally treating them as different heads in git 
sounds quite simple, and indeed it looks like the basic logic for cvs->git 
translation would be

	for-each-patch-from-cvsps
	do
		git-read-tree -m branchname-from-patch
		git-checkout-cache -f -u -q -a
		for-each-member-in-patch
		do
			if [ DEAD ]; then
				rm member
				git-update-cache --remove member
			else
				cvs co -rREV member
				git-update-cache --add member
			fi
		done
		# one commit per patch; save the id before overwriting the head
		commit=$(cat commit-message-from-patch |
			git-commit-tree $(git-write-tree) -p branchname-from-patch)
		echo $commit > .git/refs/heads/branchname-from-patch
	done

which looks like it should work, and handle binary files right.
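
A minimal sketch of the "Members" step, assuming entries of the form
"path:OLD->NEW" with a DEAD marker somewhere after the "->" for removed
files. member_action is a hypothetical helper, the "(DEAD)" spelling in the
comment is an assumption, and the sketch only prints the action instead of
running cvs or git.

```shell
# Hypothetical helper: map one "Members:" entry to an action.
# Input looks like "dir/file.c:1.1->1.2"; a removed file is assumed to
# look like "dir/file.c:1.2->1.3(DEAD)".
member_action() {
	entry=$1
	file=${entry%:*}	# strip the ":OLD->NEW" suffix
	rev=${entry##*->}	# keep what follows the last "->"
	case $rev in
	*DEAD*) echo "delete $file" ;;
	*)      echo "checkout $rev $file" ;;
	esac
}
```

The caller would need to strip the leading tab and any trailing space from
each entry before passing it in.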

There seem to be two questions:

 - what to do about branch creation (ie a branch name we haven't seen
   before): it looks like cvsps doesn't tell you what the _originating_
   branch was for a new branch (that may be my confusion - maybe you can't
   create branches off branches in CVS?)

   For syslinux, it looks like you can always base it on HEAD, or possibly 
   just the previous patch (which looks like it is always HEAD). The above 
   pseudo-script will actually do that automatically, simply by virtue of
   the "git-read-tree -m" at the top of the loop failing when the
   branchname doesn't exist yet.

 - whether to bother to create merge entries for when somebody tried to 
   merge a branch back or forth in CVS. 

   CVS fundamentally doesn't have the notion of such a thing, and cvsps 
   can't either. But we could try to guess, based on the commit message, 
   perhaps.

   NOTE! Such a "merge" would not have any real GIT merge functionality 
   what-so-ever. It would just introduce a second parent into the commit, 
   nothing more.
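
A minimal sketch of the branch-creation fallback from the first question,
under the assumption stated there that an unseen branch can be based on HEAD.
parent_ref is a hypothetical helper that takes the git directory and the
branch name from the patch.

```shell
# Hypothetical helper: choose the parent ref for a patch on branch "$2",
# looking for an existing head under the git directory "$1".
parent_ref() {
	gitdir=$1
	branch=$2
	if [ -f "$gitdir/refs/heads/$branch" ]; then
		echo "$branch"
	else
		echo "HEAD"	# branch not seen before: assume it forks from HEAD
	fi
}
```

Making the choice explicit like this avoids depending on the read-tree
failure, at the cost of hardcoding the guess about where branches start.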

Bah. What crud.

		Linus

^ permalink raw reply
