Git development


* Re: blame not working well?
From: Junio C Hamano @ 2006-04-06  1:26 UTC (permalink / raw)
  To: Fredrik Kuivinen; +Cc: git
In-Reply-To: <7vacazy33w.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano <junkio@cox.net> writes:

> I was having fun updating blame.c to use the built-in xdiff
> instead of spawning and reading from external GNU diff (it is
> currently in "next" branch).  It seems to pass the trivial
> testsuite case but I noticed for example annotating Makefile,
> sha1_name.c, or blame.c in git.git repository seems to show
> quite bogus annotation.  One extreme case is the Makefile: all
> but one line is blamed on the very initial commit made by
> Linus X-<.  The good news for me is that the version before this
> change has the same breakage.  The bad news is that this seems to
> have been broken for some time.
>
> Bisecting indicates 2a0925be3512451834ec9a3e023f4cff23c1cfb7 is
> the first bad commit, but I do not see how the change can break
> it.  I'll continue digging into it, but if you have a chance, could
> you take a look, too?

It turns out that the only change needed to fix the breakage
was this one-liner.  get_revision() used to always rewrite
parents when prune and dense were specified, but the updated code
skips the rewrite_parents() call, which would have culled those
parents, during the output filtering phase unless the caller says
it is interested in the parent field by setting rev.parents.
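To make "rewriting parents" concrete: when history is simplified to the commits that touch a path, each surviving commit's parent pointers are redirected past the pruned commits to the nearest ancestor that survived. The sketch below is a toy illustration under simplifying assumptions (a hypothetical `struct commit` with a `pruned` flag, only first parents followed past pruned commits); it is not revision.c's actual rewrite_parents().

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PARENTS 4

/* Hypothetical, simplified commit; not git's struct commit. */
struct commit {
    const char *name;
    int pruned;                       /* would be culled: does not touch the path */
    int nparents;
    struct commit *parents[MAX_PARENTS];
};

/* Follow first parents past pruned commits; NULL if history runs out. */
static struct commit *rewrite_one(struct commit *p)
{
    while (p && p->pruned)
        p = p->nparents ? p->parents[0] : NULL;
    return p;
}

/* Replace every parent of c with its nearest unpruned ancestor and drop
 * parents whose whole ancestry was pruned. */
static void rewrite_parents(struct commit *c)
{
    int i, j = 0;

    assert(c->nparents <= MAX_PARENTS);
    for (i = 0; i < c->nparents; i++) {
        struct commit *p = rewrite_one(c->parents[i]);
        if (p)
            c->parents[j++] = p;
    }
    c->nparents = j;
}
```

If a caller skips this step, it keeps walking the original parent pointers, including commits that the path-limited walk meant to skip.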

-- >8 --
[PATCH] blame.c: fix completely broken ancestry traversal.

Recent revision.c updates completely broke the assignment of
blames by not rewriting the commit->parents field unless explicitly
asked to by the caller.  The caller needs to set rev.parents.

Signed-off-by: Junio C Hamano <junkio@cox.net>

---

 blame.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

ba3c93743a8151e3663e1fda6b3cb165d8373ddf
diff --git a/blame.c b/blame.c
index 98f9992..9bb34e6 100644
--- a/blame.c
+++ b/blame.c
@@ -813,6 +813,7 @@ int main(int argc, const char **argv)
 	rev.prune_fn = simplify_commit;
 	rev.topo_setter = topo_setter;
 	rev.topo_getter = topo_getter;
+	rev.parents = 1;
 	rev.limited = 1;

 	commit_list_insert(start_commit, &rev.commits);
-- 
1.3.0.rc2.g9cda


* [PATCH] Add documentation for git-imap-send.
From: Mike McCormack @ 2006-04-06  3:32 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 248 bytes --]

Signed-off-by: Mike McCormack <mike@codeweavers.com>


---

  Documentation/git-imap-send.txt |   60 
+++++++++++++++++++++++++++++++++++++++
  1 files changed, 60 insertions(+), 0 deletions(-)
  create mode 100644 Documentation/git-imap-send.txt


[-- Attachment #2: 39f36da01434f743e36a7b0d6e8f625ad7785b2b.diff --]
[-- Type: text/x-patch, Size: 1350 bytes --]

39f36da01434f743e36a7b0d6e8f625ad7785b2b
diff --git a/Documentation/git-imap-send.txt b/Documentation/git-imap-send.txt
new file mode 100644
index 0000000..cfc0d88
--- /dev/null
+++ b/Documentation/git-imap-send.txt
@@ -0,0 +1,60 @@
+git-imap-send(1)
+================
+
+NAME
+----
+git-imap-send - Dump a mailbox from stdin into an imap folder
+
+
+SYNOPSIS
+--------
+'git-imap-send'
+
+
+DESCRIPTION
+-----------
+This command uploads a mailbox generated with git-format-patch
+into an imap drafts folder.  This allows patches to be sent as
+other email is sent with mail clients that cannot read mailbox
+files directly.
+
+Typical usage is something like:
+
+git-format-patch --signoff --stdout --attach origin | git-imap-send
+
+
+CONFIGURATION
+-------------
+
+git-imap-send requires the following values in the repository
+configuration file (shown with examples):
+
+[imap]
+    Folder = "INBOX.Drafts"
+
+[imap]
+    Tunnel = "ssh -q user@server.com /usr/bin/imapd ./Maildir 2> /dev/null"
+
+[imap]
+    Host = imap.server.com
+    User = bob
+    Password = pwd
+    Port = 143
+
+
+BUGS
+----
+Doesn't handle lines starting with "From " in the message body.
+
+
+Author
+------
+Derived from isync 1.0.1 by Mike McCormack.
+
+Documentation
+--------------
+Documentation by Mike McCormack
+
+GIT
+---
+Part of the gitlink:git[7] suite

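The BUGS entry above comes from how mbox framing works: a line beginning with "From " starts a new message, so such a line inside a message body is ambiguous. One common convention, mboxrd, escapes a body line matching `>*From ` by prepending one more `>` when writing, and strips one `>` when reading. The sketch below shows only the reading side; the helper names are hypothetical, it treats every unquoted "From " line as a separator, and the source does not say which convention git-format-patch output follows.

```c
#include <assert.h>
#include <string.h>

/* Nonzero if `line` is zero or more '>' followed by "From ". */
static int is_from_line(const char *line)
{
    assert(line != NULL);
    while (*line == '>')
        line++;
    return strncmp(line, "From ", 5) == 0;
}

/* An unquoted "From " line starts the next message. */
static int is_message_separator(const char *line)
{
    return strncmp(line, "From ", 5) == 0;
}

/* Undo one level of mboxrd quoting on a body line. */
static const char *unquote_body_line(const char *line)
{
    if (line[0] == '>' && is_from_line(line))
        return line + 1;
    return line;
}
```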


* Re: Cygwin can't handle huge packfiles?
From: Junio C Hamano @ 2006-04-06  4:13 UTC (permalink / raw)
  To: Kees-Jan Dijkzeul; +Cc: git
In-Reply-To: <fa0b6e200604050624h13ebd8deg241ae98cef1f5a74@mail.gmail.com>

"Kees-Jan Dijkzeul" <k.j.dijkzeul@gmail.com> writes:

> I'm trying to get Git to manage my company's source tree. We're
> writing software for digital TV sets. Anyway, the archive is about 5Gb
> in size and contains binaries, zip files, Excel sheets, meeting minutes
> and whatnot. So it doesn't compress very well. The 1.5Gb pack file
> hardly contains any history at all (five commits or so). On the flip
> side, for now I'll be the only one adding to the archive, so at least
> it will not grow that fast ;-)
>
> Anyway, to reconstitute the tree, I need very nearly the entire pack,
> so limiting the pack size won't do much good, as git will still try to
> allocate a total of 1.5Gb memory (which, unfortunately, isn't there
> :-)

Right now we LRU the pack files and evict older ones when we
mmap too many, but the unit of eviction is the whole file, so it
would not help the case like yours at all.  It might be possible
to mmap only part of a packfile, but it would involve fairly
major surgery to sha1_file.c.
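One way to picture mapping "only part of a packfile" is to keep the LRU eviction but make its unit a fixed-size window instead of a whole file. The sketch below does only the bookkeeping (no actual mmap() calls); the window size, byte budget, slot count and all names are hypothetical, not sha1_file.c's API.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_WINDOWS 16

/* Bookkeeping for one mapped window; no real mmap() happens here. */
struct win {
    int pack;                 /* which pack file the window belongs to */
    size_t offset;            /* window start, a multiple of win_size */
    unsigned long last_used;  /* logical clock value of the last access */
    int in_use;
};

struct win_cache {
    size_t limit;             /* budget for all mapped windows, in bytes */
    size_t win_size;          /* every window covers this many bytes */
    size_t mapped;            /* bytes currently "mapped" */
    unsigned long clock;
    struct win w[MAX_WINDOWS];
};

/* Drop the least recently used window; returns 0 if nothing was mapped. */
static int evict_lru(struct win_cache *c)
{
    int victim = -1;
    for (int i = 0; i < MAX_WINDOWS; i++)
        if (c->w[i].in_use &&
            (victim < 0 || c->w[i].last_used < c->w[victim].last_used))
            victim = i;
    if (victim < 0)
        return 0;
    c->w[victim].in_use = 0;
    c->mapped -= c->win_size;
    return 1;
}

static int free_slot(const struct win_cache *c)
{
    for (int i = 0; i < MAX_WINDOWS; i++)
        if (!c->w[i].in_use)
            return i;
    return -1;
}

/* Return the slot of the window covering byte `pos` of `pack`, "mapping"
 * a new window and evicting old ones when needed. */
static int use_window(struct win_cache *c, int pack, size_t pos)
{
    size_t start = pos - pos % c->win_size;
    int slot;

    assert(c->win_size > 0 && c->win_size <= c->limit);
    c->clock++;
    for (int i = 0; i < MAX_WINDOWS; i++)
        if (c->w[i].in_use && c->w[i].pack == pack && c->w[i].offset == start) {
            c->w[i].last_used = c->clock;
            return i;                                   /* hit */
        }
    while (c->mapped + c->win_size > c->limit && evict_lru(c))
        ;                                               /* respect the byte budget */
    while ((slot = free_slot(c)) < 0)
        evict_lru(c);                                   /* respect the slot limit */
    c->w[slot].pack = pack;
    c->w[slot].offset = start;
    c->w[slot].last_used = c->clock;
    c->w[slot].in_use = 1;
    c->mapped += c->win_size;
    return slot;
}
```

With, say, a 2 MiB budget and 1 MiB windows, walking a 1.5 GB pack never keeps more than two windows mapped, whereas whole-file eviction has to map the entire pack at once.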


* Fixes to parsecvs
From: Keith Packard @ 2006-04-06  6:36 UTC (permalink / raw)
  To: Git Mailing List; +Cc: keithp

[-- Attachment #1: Type: text/plain, Size: 656 bytes --]

Note: parsecvs remains available from:

	git://git.freedesktop.org/~keithp/parsecvs

I've "fixed" the lexer to permit getc/ungetc in the data parsing
functions. This should resolve the flex -l / -X problems.

Jim Radford sent a patch to add '/' as a legal tag character.

I added my custom edit-change-log script for people dealing with
X.org-style commit messages.

And, it deals with import branch revisions that aren't supposed to
get merged back to the trunk, creating a custom branch name based on the
branch revision (which must be global across all files).

5e5f4c012aec2db012a08b1c7ed5219ed5100111

-- 
keith.packard@intel.com

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 191 bytes --]


* [PATCH] Thin pack generation: optimization.
From: Junio C Hamano @ 2006-04-06  6:58 UTC (permalink / raw)
  To: git
In-Reply-To: <7vd5fzlnyt.fsf@assigned-by-dhcp.cox.net>

Jens Axboe noticed that recent "git push" has become very slow
since we made --thin transfer the default.

Thin pack generation to push a handful of revisions that touch a
relatively small number of paths out of a huge tree was stupid; it
registered _everything_ from the excluded revisions.  As a
result, the "Counting objects" phase was unnecessarily expensive.

This changes the logic to register the blobs and trees from
excluded revisions only for paths we are actually going to send
to the other end.

Signed-off-by: Junio C Hamano <junkio@cox.net>

---

 pack-objects.c |  119 +++++++++++++++++++++++++++++++++++++-------------------
 1 files changed, 79 insertions(+), 40 deletions(-)

b5026f319fc873d03a02f15992b7e431b4b5ad03
diff --git a/pack-objects.c b/pack-objects.c
index 9346392..f1bd8a6 100644
--- a/pack-objects.c
+++ b/pack-objects.c
@@ -505,24 +505,17 @@ static unsigned name_hash(struct name_pa
 	 * but close enough.
 	 */
 	hash = (name_hash<<DIRBITS) | (hash & ((1U<<DIRBITS )-1));
-
-	if (0) { /* debug */
-		n = name + strlen(name);
-		if (n != name && n[-1] == '\n')
-			n--;
-		while (name <= --n)
-			fputc(*n, stderr);
-		for (p = path; p; p = p->up) {
-			fputc('/', stderr);
-			n = p->elem + p->len;
-			while (p->elem <= --n)
-				fputc(*n, stderr);
-		}
-		fprintf(stderr, "\t%08x\n", hash);
-	}
 	return hash;
 }
+
+static struct pbase_tree {
+	struct pbase_tree *next;
 
+	unsigned char sha1[20];
+	void *tree_data;
+	unsigned long tree_size;
+} *pbase_tree;
+
 static int add_object_entry(const unsigned char *sha1, unsigned hash, int exclude)
 {
 	unsigned int idx = nr_objects;
@@ -585,58 +578,99 @@ static int add_object_entry(const unsign
 		}
 	}
 	return status;
+}
+
+static int name_cmp_len(const char *name)
+{
+	int i;
+	for (i = 0; name[i] && name[i] != '\n' && name[i] != '/'; i++)
+		;
+	return i;
 }
 
-static void add_pbase_tree(struct tree_desc *tree, struct name_path *up)
+static void add_pbase_object(struct tree_desc *tree,
+			     struct name_path *up,
+			     const char *name,
+			     int cmplen)
 {
 	while (tree->size) {
 		const unsigned char *sha1;
-		const char *name;
-		unsigned mode, hash;
+		const char *entry_name;
+		int entry_len;
+		unsigned mode;
 		unsigned long size;
 		char type[20];
 
-		sha1 = tree_entry_extract(tree, &name, &mode);
+		sha1 = tree_entry_extract(tree, &entry_name, &mode);
 		update_tree_entry(tree);
-		if (!has_sha1_file(sha1))
-			continue;
-		if (sha1_object_info(sha1, type, &size))
-			continue;
-
-		hash = name_hash(up, name);
-		if (!add_object_entry(sha1, hash, 1))
+		entry_len = strlen(entry_name);
+		if (entry_len != cmplen ||
+		    memcmp(entry_name, name, cmplen) ||
+		    !has_sha1_file(sha1) ||
+		    sha1_object_info(sha1, type, &size))
 			continue;
-
+		if (name[cmplen] != '/') {
+			unsigned hash = name_hash(up, name);
+			add_object_entry(sha1, hash, 1);
+			return;
+		}
 		if (!strcmp(type, tree_type)) {
 			struct tree_desc sub;
 			void *elem;
 			struct name_path me;
 
+			/* We probably should cache these
+			 * intermediate tree buffers...
+			 */
 			elem = read_sha1_file(sha1, type, &sub.size);
 			sub.buf = elem;
 			if (sub.buf) {
+				const char *down = name+cmplen+1;
+				int downlen = name_cmp_len(down);
+
 				me.up = up;
-				me.elem = name;
-				me.len = strlen(name);
-				add_pbase_tree(&sub, &me);
+				me.elem = entry_name;
+				me.len = entry_len;
+				add_pbase_object(&sub, &me, down, downlen);
 				free(elem);
 			}
 		}
 	}
 }
+
+static void add_preferred_base_object(char *name)
+{
+	struct pbase_tree *it;
+	int cmplen = name_cmp_len(name);
 
+	for (it = pbase_tree; it; it = it->next) {
+		if (cmplen == 0) {
+			unsigned hash = name_hash(NULL, "");
+			add_object_entry(it->sha1, hash, 1);
+		}
+		else {
+			struct tree_desc tree;
+			tree.buf = it->tree_data;
+			tree.size = it->tree_size;
+			add_pbase_object(&tree, NULL, name, cmplen);
+		}
+	}
+}
+
 static void add_preferred_base(unsigned char *sha1)
 {
-	struct tree_desc tree;
-	void *elem;
+	struct pbase_tree *it;
 
-	elem = read_object_with_reference(sha1, tree_type, &tree.size, NULL);
-	tree.buf = elem;
-	if (!tree.buf)
+	for (it = pbase_tree; it && memcmp(it->sha1, sha1, 20); it = it->next)
+		;
+	if (it)
 		return;
-	if (add_object_entry(sha1, name_hash(NULL, ""), 1))
-		add_pbase_tree(&tree, NULL);
-	free(elem);
+	it = xcalloc(1, sizeof(*it));
+	it->next = pbase_tree;
+	pbase_tree = it;
+	memcpy(it->sha1, sha1, 20);
+	it->tree_data = read_object_with_reference(sha1, tree_type,
+						   &it->tree_size, NULL);
 }
 
 static void check_object(struct object_entry *entry)
@@ -1051,6 +1085,7 @@ int main(int argc, char **argv)
 	char line[PATH_MAX + 20];
 	int window = 10, depth = 10, pack_to_stdout = 0;
 	struct object_entry **list;
+	int num_preferred_base = 0;
 	int i;
 
 	setup_git_directory();
@@ -1116,6 +1151,7 @@ int main(int argc, char **argv)
 
 	for (;;) {
 		unsigned char sha1[20];
+		unsigned hash;
 
 		if (!fgets(line, sizeof(line), stdin)) {
 			if (feof(stdin))
@@ -1132,12 +1168,15 @@ int main(int argc, char **argv)
 			if (get_sha1_hex(line+1, sha1))
 				die("expected edge sha1, got garbage:\n %s",
 				    line+1);
-			add_preferred_base(sha1);
+			if (num_preferred_base++ < window)
+				add_preferred_base(sha1);
 			continue;
 		}
 		if (get_sha1_hex(line, sha1))
 			die("expected sha1, got garbage:\n %s", line);
-		add_object_entry(sha1, name_hash(NULL, line+41), 0);
+		add_preferred_base_object(line+41);
+		hash = name_hash(NULL, line+41);
+		add_object_entry(sha1, hash, 0);
 	}
 	if (progress)
 		fprintf(stderr, "Done counting %d objects.\n", nr_objects);
-- 
1.3.0.rc2.g9cda

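The core of the change is that, for each path being sent, only the entry at that same path in each preferred-base tree is registered, found by descending one path component at a time (the job of name_cmp_len() and add_pbase_object() in the patch). A toy version over a hypothetical in-memory tree, rather than git's tree objects, might look like this:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory tree node; not git's tree object format. */
struct node {
    const char *name;          /* entry name within its parent */
    const struct node *kids;   /* children if this is a tree, else NULL */
    int nkids;
};

/* Same idea as name_cmp_len(): length of the first path component,
 * stopping at '/', '\n' (fgets() leaves it in) or the end of string. */
static int component_len(const char *path)
{
    int i;
    for (i = 0; path[i] && path[i] != '\n' && path[i] != '/'; i++)
        ;
    return i;
}

/* Return the base-tree entry at `path`, visiting only the entries that lie
 * on that path; an empty path names the root tree itself. */
static const struct node *base_entry_for_path(const struct node *root,
                                              const char *path)
{
    int len = component_len(path);

    assert(root != NULL);
    if (len == 0)
        return root;
    for (int i = 0; i < root->nkids; i++) {
        const struct node *e = &root->kids[i];
        if (strlen(e->name) != (size_t)len || memcmp(e->name, path, (size_t)len))
            continue;
        if (path[len] != '/')
            return e;          /* last component: register this object */
        return e->kids ? base_entry_for_path(e, path + len + 1) : NULL;
    }
    return NULL;               /* path does not exist in this base tree */
}

/* Example base tree: Makefile, src/main.c, src/util.c */
static const struct node example_src[] = {
    { "main.c", NULL, 0 }, { "util.c", NULL, 0 }
};
static const struct node example_root_kids[] = {
    { "Makefile", NULL, 0 }, { "src", example_src, 2 }
};
static const struct node example_root = { "", example_root_kids, 2 };
```

The cost per sent path is proportional to the tree entries along that path, instead of to the size of the whole excluded tree.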

* Re: Fixes to parsecvs
From: Jan-Benedict Glaw @ 2006-04-06 12:08 UTC (permalink / raw)
  To: Keith Packard; +Cc: Git Mailing List
In-Reply-To: <1144305392.2303.240.camel@neko.keithp.com>

[-- Attachment #1: Type: text/plain, Size: 1977 bytes --]

On Wed, 2006-04-05 23:36:32 -0700, Keith Packard <keithp@keithp.com> wrote:
> note, parsecvs remains available from:
> 
> 	git://git.freedesktop.org/~keithp/parsecvs

It now compiles out-of-the-box for me, nice work.

However, it would be nice if you'd add a short description about how
to use it. Something like this:
---------------------------------------------------------------------
There's still a lot of work to do on parsecvs, but if you want to give
it a run, first create a copy of the whole CVS tree and go to the base
directory of this copy. (You'll find a lot of *,v files in this directory
and all its subdirectories.)
Now feed all ,v filenames into parsecvs. Keep in mind that an
`edit-change-log' executable needs to be in your $PATH (a one-line
script that just exits with 0 will do the job):

	find . -type f -name '*,v' -print | parsecvs

This will create the .git/ directory and put all the objects, commits
and tree information into this new git repository.
---------------------------------------------------------------------

I just ran it against a locally rsync'ed copy of the Binutils ,v
files. Looking at the progress bar, it is basically ready:


Load:               winsup/configure.in,v ....................* 27704 of 27704


But it seems it now starts to really consume memory:

jbglaw@bixie:~/bin$ ps axflwww|egrep '(VSZ|parsecvs)'|grep -v grep
F   UID   PID  PPID PRI  NI    VSZ   RSS WCHAN  STAT TTY        TIME COMMAND
0  1000 15564 22879  18   0 2805084 549996 finish T  pts/10    30:51 |       \_ parsecvs

How well does this work with even larger repositories?

MfG, JBG

-- 
Jan-Benedict Glaw       jbglaw@lug-owl.de    . +49-172-7608481             _ O _
"Eine Freie Meinung in  einem Freien Kopf    | Gegen Zensur | Gegen Krieg  _ _ O
 für einen Freien Staat voll Freier Bürger"  | im Internet! |   im Irak!   O O O
ret = do_actions((curr | FREE_SPEECH) & ~(NEW_COPYRIGHT_LAW | DRM | TCPA));

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]


* [PATCH] gitk: Fix Tcl error when merge has interesting diffs.
From: Mark Wooding @ 2006-04-06 12:16 UTC (permalink / raw)
  To: git

From: Mark Wooding <mdw@distorted.org.uk>

If a merge commit with nontrivial diffs is selected, gitk reports a Tcl
error:

wrong # args: should be "getmergediffline mdf id np"
    while executing
"getmergediffline file7 9fdb62af92c741addbea15545f214a6e89460865"

Change 79b2c75e... introduced the `np' argument to getmergediffline, but
failed to pass it when the function detaches and reattaches to the file
to make way for an update.

Signed-off-by: Mark Wooding <mdw@distorted.org.uk>
---

 gitk |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/gitk b/gitk
index 26fa79a..3b92820 100755
--- a/gitk
+++ b/gitk
@@ -2700,7 +2700,7 @@ proc getmergediffline {mdf id np} {
 	incr nextupdate 100
 	fileevent $mdf readable {}
 	update
-	fileevent $mdf readable [list getmergediffline $mdf $id]
+	fileevent $mdf readable [list getmergediffline $mdf $id $np]
     }
 }


* Re: unchecked uses of strdup
From: Alex Riesen @ 2006-04-06 14:11 UTC (permalink / raw)
  To: Jim Meyering; +Cc: git
In-Reply-To: <87d5fwau3z.fsf_-_@rho.meyering.net>

On 4/5/06, Jim Meyering <jim@meyering.net> wrote:
> There are quite a few uses of strdup in git's sources.
> Here's one that can cause trouble if it ever returns NULL:
>
>     [from fsck-objects.c]
>     static int fsck_head_link(void)
>     {
>             unsigned char sha1[20];
>             const char *git_HEAD = strdup(git_path("HEAD"));
>             const char *git_refs_heads_master = resolve_ref(git_HEAD, sha1, 1);
>
> The problem is that resolve_ref does an unconditional `stat'
> on the parameter corresponding to the maybe-NULL git_HEAD.

That's actually alright (aside from a nice core file). Worse are the cases
where a NULL would cause some "normal" behaviour, e.g. arguments
for which a NULL value has a meaning.
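A common remedy for unchecked strdup() is a wrapper that never returns NULL and dies instead, in the spirit of the xcalloc() already used in pack-objects.c in this thread. The name xstrdup_or_die() and the exit code are hypothetical; it uses malloc() and memcpy() so it does not depend on strdup() being declared.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: duplicate a string or terminate; never returns NULL,
 * so a NULL can never leak into callers such as resolve_ref(). */
static char *xstrdup_or_die(const char *s)
{
    size_t len;
    char *copy;

    assert(s != NULL);
    len = strlen(s) + 1;           /* include the terminating NUL */
    copy = malloc(len);
    if (!copy) {
        fprintf(stderr, "fatal: out of memory duplicating a string\n");
        exit(128);
    }
    memcpy(copy, s, len);
    return copy;
}
```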


* Re: Fixes to parsecvs
From: Keith Packard @ 2006-04-06 14:48 UTC (permalink / raw)
  To: Jan-Benedict Glaw; +Cc: keithp, Git Mailing List
In-Reply-To: <20060406120812.GO13324@lug-owl.de>

[-- Attachment #1: Type: text/plain, Size: 2098 bytes --]

On Thu, 2006-04-06 at 14:08 +0200, Jan-Benedict Glaw wrote:
> On Wed, 2006-04-05 23:36:32 -0700, Keith Packard <keithp@keithp.com> wrote:
> > note, parsecvs remains available from:
> > 
> > 	git://git.freedesktop.org/~keithp/parsecvs
> 
> It now compiles out-of-the-box for me, nice work.

cool

> 
> However, it would be nice if you'd add a short description about how
> to use it. Something like this:

I'd rather just fix the usage to be more sane; that shouldn't take but a
few minutes...

> I just ran it against a locally rsync'ed copy of the Binutils ,v
> files. Looking at the progress bar, it is basically ready:
> 
> 
> Load:               winsup/configure.in,v ....................* 27704 of 27704

Now all of the ,v files have been parsed and each revision placed in
the .git repository as a blob.

> But it seems it now starts to really consume memory:

Yeah, it's doing the change set computation, which is not very space
efficient; it computes the entire set of files at each commit which can
take 'a bit' of space with a large number of files over a long period of
time. Obviously computing revision deltas and saving those would make it
use a lot less memory.
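The memory difference described here can be sketched: instead of storing the full set of file revisions at every commit (commits times files entries), each commit records only the files it changed, and a full snapshot is rebuilt on demand by replaying those changes. The structures and names below are hypothetical, not parsecvs internals.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: one file touched by a commit, and the revision it moved to. */
struct change {
    int file;   /* index into the list of all files */
    int rev;    /* new revision number; 0 means "file absent" */
};

/* Hypothetical: a commit stores only what it changed. */
struct commit_delta {
    const struct change *changes;
    size_t nchanges;
};

/* Rebuild the revision of every file as of commit `upto` by replaying the
 * deltas of commits 0..upto in order; untouched files stay at 0. */
static void snapshot_at(const struct commit_delta *history, size_t upto,
                        int *revs, size_t nfiles)
{
    for (size_t f = 0; f < nfiles; f++)
        revs[f] = 0;
    for (size_t c = 0; c <= upto; c++)
        for (size_t i = 0; i < history[c].nchanges; i++) {
            const struct change *ch = &history[c].changes[i];
            assert(ch->file >= 0 && (size_t)ch->file < nfiles);
            revs[ch->file] = ch->rev;
        }
}
```

Storage then grows with the total number of file changes rather than with commits times files; the price is replay time, which a real tool might bound by also keeping an occasional full snapshot.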

> jbglaw@bixie:~/bin$ ps axflwww|egrep '(VSZ|parsecvs)'|grep -v grep
> F   UID   PID  PPID PRI  NI    VSZ   RSS WCHAN  STAT TTY        TIME COMMAND
> 0  1000 15564 22879  18   0 2805084 549996 finish T  pts/10    30:51 |       \_ parsecvs

I'd run a large repository on a large machine; I managed to get
postgresql to run on my laptop (615M CVS with 6000 files), but anything
larger I'd probably want to get it onto a big enough machine. The
question is whether it needs to be more efficient so that people can
constantly convert repositories or whether moving the repository to a
sufficiently large machine for the one-time conversion is 'good enough'.

> How well does this work with even larger repositories?

postgresql is the largest I've run; starting with a 615M CVS repository,
it built a 1.7G .git tree, which packed down to 125M.

-- 
keith.packard@intel.com

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 191 bytes --]


* Re: Fixes to parsecvs
From: Johannes Schindelin @ 2006-04-06 15:26 UTC (permalink / raw)
  To: Keith Packard; +Cc: Git Mailing List
In-Reply-To: <1144334896.2303.259.camel@neko.keithp.com>

Hi,

On Thu, 6 Apr 2006, Keith Packard wrote:

> On Thu, 2006-04-06 at 14:08 +0200, Jan-Benedict Glaw wrote:
> 
> > But it seems it now starts to really consume memory:
> 
> The question is whether it needs to be more efficient so that people can 
> constantly convert repositories or whether moving the repository to a 
> sufficiently large machine for the one-time conversion is 'good enough'.

Keep in mind that there are many more valid uses for tracking a CVS 
repository than to import it once.

Ciao,
Dscho


* Re: Fixes to parsecvs
From: Jan-Benedict Glaw @ 2006-04-06 16:09 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Keith Packard, Git Mailing List
In-Reply-To: <Pine.LNX.4.63.0604061723410.23681@wbgn013.biozentrum.uni-wuerzburg.de>

[-- Attachment #1: Type: text/plain, Size: 1518 bytes --]

On Thu, 2006-04-06 17:26:14 +0200, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> On Thu, 6 Apr 2006, Keith Packard wrote:
> > On Thu, 2006-04-06 at 14:08 +0200, Jan-Benedict Glaw wrote:
> > > But it seems it now starts to really consume memory:
> > The question is whether it needs to be more efficient so that people can 
> > constantly convert repositories or whether moving the repository to a 
> > sufficiently large machine for the one-time conversion is 'good enough'.
> 
> Keep in mind that there are many more valid uses for tracking a CVS 
> repository than to import it once.

Even the simplest use case reveals this. (It's also what I'm
about to do with the converted GCC repository.)

Get the repo, locally track the changes (so the imported branches are
all like "vendor branches") and do your own work in local branches.

I'll do this e.g. to be able to easily re-diff patches, which I want to
put into GIT just because it's so much more convenient than SVN.
However, this is only possible because I'm able to keep track of
upstream SVN changes. They probably won't change their SCM again so
soon after having introduced SVN.

MfG, JBG

-- 
Jan-Benedict Glaw       jbglaw@lug-owl.de    . +49-172-7608481             _ O _
"Eine Freie Meinung in  einem Freien Kopf    | Gegen Zensur | Gegen Krieg  _ _ O
 für einen Freien Staat voll Freier Bürger"  | im Internet! |   im Irak!   O O O
ret = do_actions((curr | FREE_SPEECH) & ~(NEW_COPYRIGHT_LAW | DRM | TCPA));

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]


* [PATCH] cg-rm: Add option to remove all files which are gone from the working copy
From: Dennis Stosberg @ 2006-04-06 16:17 UTC (permalink / raw)
  To: git

Add an -a option to cg-rm, which removes all files that have been
physically deleted.

Signed-off-by: Dennis Stosberg <dennis@stosberg.net>


---

 cg-rm |   19 +++++++++++++++++--
 1 files changed, 17 insertions(+), 2 deletions(-)

ac5fc0c8d9c9ccecba7cbf83a74a163bff79f8f4
diff --git a/cg-rm b/cg-rm
index 5ab5dc8..0276632 100755
--- a/cg-rm
+++ b/cg-rm
@@ -19,13 +19,18 @@ #
 # -r:: Remove files recursively
 #	If you pass cg-rm this flag and any directory names, it will try
 #	to remove files in those directories recursively.
+#
+# -a:: Remove all files which are gone from the working copy
+#	Remove all files which have been deleted in the working copy
+#	from the index.
 
-USAGE="cg-rm [-f] [-n] [-r] FILE..."
+USAGE="cg-rm [-f] [-n] [-r] [-a] FILE..."
 
 . "${COGITO_LIB}"cg-Xlib || exit 1
 
 delete=
 recursive=
+rmgone=
 while optparse; do
 	if optparse -f; then
 		delete=1
@@ -33,12 +38,14 @@ while optparse; do
 		delete=
 	elif optparse -r; then
 		recursive=1
+	elif optparse -a; then
+		rmgone=1
 	else
 		optfail
 	fi
 done
 
-[ -n "${ARGS[*]}" ] || usage
+[ -n "${ARGS[*]}" -o "$rmgone" ] || usage
 
 TMPFILE="$(mktemp -t gitrm.XXXXXX)" || exit 1
 TMPDIRFILE="$(mktemp -t gitrm.XXXXXX)" || exit 1
@@ -57,6 +64,14 @@ for file in "${ARGS[@]}"; do
 		echo "$file" >>"$TMPFILE"
 	fi
 done
+
+if [ "$rmgone" ]; then
+	cg-status -s \! -n -w >>"$TMPFILE"
+	if [ ! $(cat "$TMPFILE" | sed -n "$=") ]; then
+		rm "$TMPFILE" "$TMPDIRFILE"
+		die "no files to remove"
+	fi
+fi
 
 cat "$TMPFILE" | sed 's/^/Removing file /'
 if [ "$delete" ]; then (
-- 
1.3-rc2.GIT


* Re: Fixes to parsecvs
From: Keith Packard @ 2006-04-06 17:36 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: keithp, Git Mailing List
In-Reply-To: <Pine.LNX.4.63.0604061723410.23681@wbgn013.biozentrum.uni-wuerzburg.de>

[-- Attachment #1: Type: text/plain, Size: 587 bytes --]

On Thu, 2006-04-06 at 17:26 +0200, Johannes Schindelin wrote:

> Keep in mind that there are many more valid uses for tracking a CVS 
> repository than to import it once.

Sure, but we should fix parsecvs to handle incremental CVS tracking if
that's one of the goals for this utility. git-cvsimport does this by
skipping commits earlier than a fixed time; if we did that, we'd
eliminate the huge memory usage except for initial imports. I haven't
considered how this might be done in detail yet; I have no personal need
for this functionality.

-- 
keith.packard@intel.com

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 191 bytes --]


* git-clone and cg-clone
From: Belmar-Letelier @ 2006-04-06 18:10 UTC (permalink / raw)
  To: git

Hello

I'm a cogito user.

Since 0.17, to take advantage of cg-switch, I use:

$ git-clone  xxx
$ cg-branch-add origin xxx

instead of

$ cg-clone xxx

because cg-clone did not fetch all the heads.

Is there a better way to do this ?

-- 
Luis Belmar-Letelier


* Re: parsecvs tool now creates git repositories
From: Jim Radford @ 2006-04-06 18:15 UTC (permalink / raw)
  To: Keith Packard; +Cc: Git Mailing List
In-Reply-To: <1144305392.2303.240.camel@neko.keithp.com>

Hi Keith,

Here's one more build patch.  For some reason the Fedora lex doesn't
want a space after the -o.

Almost all of the errors I was seeing in the last version were fixed
with your "branches that don't get merged back to the trunk" fix.

Thanks,
-Jim

diff --git a/Makefile b/Makefile
index 4ca6ffd..137ed34 100644
--- a/Makefile
+++ b/Makefile
@@ -4,7 +4,7 @@ GCC_WARNINGS3=-Wnested-externs -fno-stri
 GCC_WARNINGS=$(GCC_WARNINGS1) $(GCC_WARNINGS2) $(GCC_WARNINGS3)
 CFLAGS=-O0 -g $(GCC_WARNINGS)
 YFLAGS=-d -l
-LFLAGS=-l -o lex.c
+LFLAGS=-l -olex.c

 SRCS=gram.y lex.l cvs.h parsecvs.c cvsutil.c \
        revlist.c atom.c revcvs.c git.c gitutil.c


* gitweb: A git:// link for the project?
From: Jan-Benedict Glaw @ 2006-04-06 19:14 UTC (permalink / raw)
  To: Kay Sievers; +Cc: Git Mailing List

[-- Attachment #1: Type: text/plain, Size: 1729 bytes --]

Hi Kay!

I've got another small idea for gitweb, probably only a line or two,
but Perl isn't exactly my preferred language :)

The idea is to have a 'Copy Link Location'-able link presented by
gitweb, maybe placed to the right of summary, shortlog and log.

That should just link to a URL prefix
(git://git.yourbox.tld/some/path) plus the project path. I guess it's
something like this, but I haven't tested it. It should be good enough to
get an idea of what I meant:

Signed-off-by: Jan-Benedict Glaw <jbglaw@lug-owl.de>

diff --git a/gitweb.cgi b/gitweb.cgi
index c1bb624..b2942fd 100755
--- a/gitweb.cgi
+++ b/gitweb.cgi
@@ -26,6 +26,9 @@ my $rss_link =		"";
 #my $projectroot =	"/pub/scm";
 my $projectroot =	"/home/kay/public_html/pub/scm";
 
+# URL prefix for the git:// link
+my $urlprefix =		"git://git.kernel.org/pub/scm";
+
 # location of the git-core binaries
 my $gitbin =		"/usr/bin";
 
@@ -920,6 +923,7 @@ sub git_project_list {
 		      $cgi->a({-href => "$my_uri?" . esc_param("p=$pr->{'path'};a=summary")}, "summary") .
 		      " | " . $cgi->a({-href => "$my_uri?" . esc_param("p=$pr->{'path'};a=shortlog")}, "shortlog") .
 		      " | " . $cgi->a({-href => "$my_uri?" . esc_param("p=$pr->{'path'};a=log")}, "log") .
+		      " | " . $cgi->a({-href => "$urlprefix" . esc_param("/$pr->{'path'}")}, "GIT") .
 		      "</td>\n" .
 		      "</tr>\n";
 	}


-- 
Jan-Benedict Glaw       jbglaw@lug-owl.de    . +49-172-7608481             _ O _
"Eine Freie Meinung in  einem Freien Kopf    | Gegen Zensur | Gegen Krieg  _ _ O
 für einen Freien Staat voll Freier Bürger"  | im Internet! |   im Irak!   O O O
ret = do_actions((curr | FREE_SPEECH) & ~(NEW_COPYRIGHT_LAW | DRM | TCPA));

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 189 bytes --]


* Re: parsecvs tool now creates git repositories
From: Keith Packard @ 2006-04-06 20:12 UTC (permalink / raw)
  To: Jim Radford; +Cc: keithp, Git Mailing List
In-Reply-To: <20060406181502.GA15741@blackbean.org>

[-- Attachment #1: Type: text/plain, Size: 560 bytes --]

On Thu, 2006-04-06 at 11:15 -0700, Jim Radford wrote:
> Hi Keith,
> 
> Here's one more build patch.  For some reason the Fedora lex doesn't
> want a space after the -o.

I probably shouldn't even use the -o flag; all it does is change the
#line directives in the output file to point at lex.c instead of
<stdout>. I'm sure it'll break something.

> Almost all of the errors I was seeing in the last version were fixed
> with your "branches that don't get merged back to the trunk" fix.

That's good news at least.

-- 
keith.packard@intel.com

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 191 bytes --]


* [PATCH] fix gitk with lots of tags
From: Jim Radford @ 2006-04-06 20:36 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: Junio C Hamano, Git Mailing List

Hi Paul,

This fix allows gitk to be used on repositories with lots of tags.  It
bypasses git-rev-parse and passes its arguments to git-rev-list
directly to avoid command line length restrictions.

Signed-Off-By: Jim Radford <radford@blackbean.org>

-Jim

---
diff --git a/gitk b/gitk
index 26fa79a..40672fb 100755
--- a/gitk
+++ b/gitk
@@ -17,19 +17,11 @@ proc gitdir {} {
 }
 
 proc parse_args {rargs} {
-    global parsed_args
-
-    if {[catch {
-	set parse_args [concat --default HEAD $rargs]
-	set parsed_args [split [eval exec git-rev-parse $parse_args] "\n"]
-    }]} {
-	# if git-rev-parse failed for some reason...
-	if {$rargs == {}} {
-	    set rargs HEAD
-	}
-	set parsed_args $rargs
+    if {$rargs == {}} {
+        return HEAD
+    } else {
+	return $rargs
     }
-    return $parsed_args
 }
 
 proc start_rev_list {rlargs} {


* Re: Cygwin can't handle huge packfiles?
From: linux @ 2006-04-06 20:57 UTC (permalink / raw)
  To: git, junkio; +Cc: linux

> Right now we LRU the pack files and evict older ones when we
> mmap too many, but the unit of eviction is the whole file, so it
> would not help the case like yours at all.  It might be possible
> to mmap only part of a packfile, but it would involve fairly
> major surgery to sha1_file.c.

The simplest solution seems to be to limit pack file size to a reasonable
fraction of a 32-bit address space.  Say, 0.5 G.

That should be a fairly straightforward hack to git-pack-objects.
It already emits two files; just make it emit more.

You can tweak the heuristics to try to find a good break point: start
thinking about splitting the pack when you get to one size, but don't
force a break until you hit a harder limit as long as the deltas are
working well.

This can all be adjustable with a command line and/or config file option
to allow for the eventual demise of 32-bit systems.
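A minimal sketch of the soft/hard limit heuristic described above. The struct, the parameter names, and reducing "the deltas are working well" to a single flag are assumptions; only the idea of a soft and a hard limit, and the 0.5 G figure, come from the message.

```c
#include <assert.h>

/* Hypothetical policy knobs for splitting output into several packs. */
struct split_policy {
    unsigned long long soft_limit; /* past this, look for a break point */
    unsigned long long hard_limit; /* never let a pack grow beyond this */
};

/* Decide whether the next object should go into a new pack.
 * cur_size:  bytes already written to the current pack
 * next_size: bytes the next object would add
 * delta_ok:  nonzero if the next object deltifies well against this pack */
static int should_start_new_pack(const struct split_policy *p,
                                 unsigned long long cur_size,
                                 unsigned long long next_size,
                                 int delta_ok)
{
    assert(p->soft_limit <= p->hard_limit);
    if (cur_size == 0)
        return 0;                  /* never emit an empty pack */
    if (cur_size + next_size > p->hard_limit)
        return 1;                  /* forced break */
    if (cur_size >= p->soft_limit && !delta_ok)
        return 1;                  /* past the soft limit, deltas not paying off */
    return 0;
}
```

For example, with an assumed 400 MiB soft limit and a 512 MiB hard limit, a pack at 450 MiB keeps growing while the next object deltifies well, but is closed as soon as one does not.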


* Re: Fix up diffcore-rename scoring
From: Geert Bosch @ 2006-04-06 21:01 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Git Mailing List
In-Reply-To: <Pine.LNX.4.64.0603122316160.3618@g5.osdl.org>

[-- Attachment #1: Type: text/plain, Size: 5171 bytes --]

On Mar 13, 2006, at 02:44, Linus Torvalds wrote:
> It might be that the fast delta thing is a good way to ask "is this even
> worth considering", to cut down the O(m*n) rename/copy detection to
> something much smaller, and then use xdelta() to actually figure out what
> is a good rename and what isn't from a much smaller set of potential
> targets.

Here's a possible way to do that first cut. Basically,
compute a short (256-bit) fingerprint for each file, such
that the Hamming distance between two fingerprints is a measure
for their similarity. I'll include a draft write up below.

My initial implementation seems reasonably fast, works
great for 4000 (decompressed) files (25M) randomly plucked
from an old git.git repository without packs. It works OK for
comparing tar archives for GCC releases, but then it becomes
clear that random walks aren't that random anymore and
become dominated by repeated information, such as tar headers.

Speed is about 10MB/sec on my PowerBook, but one could cache
fingerprints so they only need to be computed once.
The nice thing is that one can quickly find similar files
only using the fingerprint (and in practice file size),
no filenames: this seems to fit the git model well.

I'll attach my test implementation below; it uses
David Mazieres's Rabinpoly code and D. Phillips's fls code.
Please don't mind my C coding; it's not my native language.
Also, this may have some Darwinisms, although it should
work on Linux too.

   -Geert

Estimating Similarity

To estimate the similarity of strings A and B, let
SA and SB be the sets of all substrings of length W
of A and B, respectively. Similarity is then defined as the
size of the intersection of SA and SB divided by the size
of their union.
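
For small inputs the definition can be computed exactly by brute force. The sketch below is only a reference point for what the fingerprint estimates; the function names are made up for illustration, and the approach is quadratic in the input size, which is exactly why a fingerprint is wanted for real files:

```c
#include <stddef.h>
#include <string.h>

/* Nonzero if the w-byte window at s + i already starts at some
   earlier position j < i of the same string. */
static int seen_before(const char *s, size_t i, size_t w)
{
	for (size_t j = 0; j < i; j++)
		if (memcmp(s + j, s + i, w) == 0)
			return 1;
	return 0;
}

/* Nonzero if the w-byte window at s + i occurs anywhere in t. */
static int window_in(const char *s, size_t i, const char *t, size_t tn, size_t w)
{
	for (size_t j = 0; j + w <= tn; j++)
		if (memcmp(s + i, t + j, w) == 0)
			return 1;
	return 0;
}

/* Number of distinct w-byte substrings of s (length n). */
static size_t distinct_windows(const char *s, size_t n, size_t w)
{
	size_t count = 0;
	for (size_t i = 0; i + w <= n; i++)
		if (!seen_before(s, i, w))
			count++;
	return count;
}

/* |SA intersect SB| / |SA union SB| for windows of w >= 1 bytes.
   Two strings with no complete window at all are treated as equal. */
double window_jaccard(const char *a, size_t an, const char *b, size_t bn, size_t w)
{
	size_t sa = distinct_windows(a, an, w);
	size_t sb = distinct_windows(b, bn, w);
	size_t common = 0;

	for (size_t i = 0; i + w <= an; i++)
		if (!seen_before(a, i, w) && window_in(a, i, b, bn, w))
			common++;

	size_t uni = sa + sb - common;
	return uni ? (double)common / (double)uni : 1.0;
}
```

For example, with W = 3, "abcd" and "abce" share only "abc" out of three distinct windows, giving a similarity of 1/3.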

The length W of these substrings is the window size, and is here
chosen somewhat arbitrarily to be 48. The idea is to make them not
so short that all context is lost (like counting symbol frequencies),
but not so long that a few small changes can affect a large portion
of substrings.  Of course, a single symbol change may affect up to
48 substrings.
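
To make the "up to 48" concrete: a changed byte at position p lies in every window that starts between p - W + 1 and p, clipped at both ends of the string. A small helper (hypothetical name) that counts them:

```c
#include <stddef.h>

/* Number of w-byte windows of an n-byte string that contain byte p.
   Requires 1 <= w <= n and p < n.  Windows containing p start in
   [p - w + 1, p], clipped to the valid start range [0, n - w]. */
size_t windows_containing(size_t n, size_t w, size_t p)
{
	size_t first = p >= w - 1 ? p - (w - 1) : 0;
	size_t last = p <= n - w ? p : n - w;
	return last - first + 1;
}
```

In a 100-byte string with W = 48, a change in the middle touches all 48 windows, while a change to the very first or last byte touches only one.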

Let "&" be the string concatenation operator.
If A = S2 & S1 & S2 & S3 & S2, and B = S2 & S3 & S2 & S1 & S2,
then if the length of S2 is at least W - 1, the strings
will have the same set of substrings and be considered equal
for the purpose of similarity checking.  This behavior is actually
welcome, since reordering sufficiently separated pieces of a
document does not make it substantially different.
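
The reordering claim can be checked directly on a toy example. The sketch below (a hypothetical brute-force helper) compares the sets of W-byte substrings of two strings. With W = 4 and S2 = "xyz" (length W - 1), swapping S1 = "AB" and S3 = "CD" leaves the set unchanged, while a shorter S2 = "xy" does not:

```c
#include <stddef.h>
#include <string.h>

/* Nonzero if the w-byte window at s + i occurs anywhere in t. */
static int occurs_in(const char *s, size_t i, const char *t, size_t tn, size_t w)
{
	for (size_t j = 0; j + w <= tn; j++)
		if (memcmp(s + i, t + j, w) == 0)
			return 1;
	return 0;
}

/* Nonzero if a and b have exactly the same set of w-byte substrings
   (multiplicity and order are ignored, as in the definition above). */
int same_window_set(const char *a, size_t an, const char *b, size_t bn, size_t w)
{
	for (size_t i = 0; i + w <= an; i++)
		if (!occurs_in(a, i, b, bn, w))
			return 0;
	for (size_t i = 0; i + w <= bn; i++)
		if (!occurs_in(b, i, a, an, w))
			return 0;
	return 1;
}
```

Here "xyzABxyzCDxyz" and "xyzCDxyzABxyz" compare equal, but "xyABxyCDxy" and "xyCDxyABxy" do not: the first contains the window "BxyC", which only exists when S2 is too short to separate S1 from S3.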

Instead of computing the ratio of identical substrings directly,
compute a 1-bit hash for each substring and calculate the difference
between the number of zeroes and ones. If the hashes appear random,
this difference follows a binomial distribution. Two files are
considered "likely similar" if their differences have the same sign.
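
A sketch of this single-bit test. It hashes every window from scratch with 64-bit FNV-1a and uses the top bit as the 1-bit hash. FNV-1a is a stand-in chosen for brevity, not the Rabin fingerprint the attached code uses, and the function names are illustrative:

```c
#include <stddef.h>
#include <stdint.h>

/* 64-bit FNV-1a; a stand-in for the Rabin fingerprint. */
static uint64_t fnv1a64(const unsigned char *p, size_t n)
{
	uint64_t h = 0xcbf29ce484222325ULL;   /* FNV offset basis */
	while (n--) {
		h ^= *p++;
		h *= 0x100000001b3ULL;            /* FNV prime */
	}
	return h;
}

/* One step up for every w-byte window whose 1-bit hash (the top bit
   of its 64-bit hash) is 1, one step down otherwise; the result is
   (#ones - #zeros). */
long one_bit_walk(const char *s, size_t n, size_t w)
{
	long d = 0;
	for (size_t i = 0; i + w <= n; i++)
		d += (fnv1a64((const unsigned char *)s + i, w) >> 63) ? 1 : -1;
	return d;
}

/* "Likely similar" as in the text: both walks ended on the same side
   of the origin (or both exactly at it). */
int likely_similar(long da, long db)
{
	return (da > 0) == (db > 0) && (da < 0) == (db < 0);
}
```

With only one bit per string this is a weak test; the fingerprint described below repeats it 256 times with independent hash bits.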

The assumption that the hashes are randomly distributed is not
true if there are many repeated substrings. For most applications,
it will be sufficient to ignore such repetitions (by using a small
cache of recently encountered hashes) as they do not convey much
actual information. For example, for purposes of finding small
deltas between strings, duplicating existing text will not significantly
increase the delta.
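
The "small cache of recently encountered hashes" can be a direct-mapped table: each window hash goes to a slot chosen from the hash itself, and a window whose hash already sits in its slot is counted as a repetition and skipped. This sketch simplifies what process_data() in the attached gsimm.c does (gsimm.c picks the slot from bits 32 to 39 and compares only the low 32 bits). FNV-1a again stands in for the Rabin hash, and the names are made up:

```c
#include <stddef.h>
#include <stdint.h>

#define DUP_CACHE_SLOTS 256   /* same size as DUP_CACHE_SIZE in gsimm.c */

/* 64-bit FNV-1a; a stand-in for the Rabin fingerprint. */
static uint64_t fnv1a64(const unsigned char *p, size_t n)
{
	uint64_t h = 0xcbf29ce484222325ULL;
	while (n--) {
		h ^= *p++;
		h *= 0x100000001b3ULL;
	}
	return h;
}

/* Returns how many w-byte windows of s would be fed into the counters;
   *dups (if non-NULL) receives how many were skipped as repetitions. */
size_t count_non_duplicates(const char *s, size_t n, size_t w, size_t *dups)
{
	uint64_t cache[DUP_CACHE_SLOTS];
	unsigned char used[DUP_CACHE_SLOTS] = {0};
	size_t kept = 0, skipped = 0;

	for (size_t i = 0; i + w <= n; i++) {
		uint64_t h = fnv1a64((const unsigned char *)s + i, w);
		size_t slot = (size_t)(h % DUP_CACHE_SLOTS);

		if (used[slot] && cache[slot] == h) {
			skipped++;          /* seen recently: ignore */
		} else {
			used[slot] = 1;     /* remember it, evicting the old entry */
			cache[slot] = h;
			kept++;
		}
	}
	if (dups)
		*dups = skipped;
	return kept;
}
```

A run of identical bytes, for example, contributes only its first window: 100 copies of 'a' with W = 4 give 97 windows, of which 96 are skipped.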

For a string with N substrings, of which K changed, perform a random
walk of N steps in 1-dimensional space (see [1]): what is the probability
that the origin was crossed an odd number of times in the last K steps?
As the expected distance from the origin after N steps is
sqrt(2 * N / pi), this probability gets progressively smaller for
larger N at a given ratio of N and K. For larger files, the result
should be quite stable.

In order to strengthen this similarity check and be able to
quantify the degree of similarity, many independent 1-bit hashes
are computed and counted for each string and assembled into
a bit vector of 256 bits, called the fingerprint. Each bit
of the fingerprint represents the result of an independent
statistical experiment. For similar strings, corresponding bits
are more likely to be the same than for random strings.
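
Turning the counters into the fingerprint is just a sign test per counter. This sketch packs the bits most-significant-first within each byte, which is the order freq_to_md() in the attached gsimm.c uses; the names are illustrative:

```c
#include <stdint.h>

#define FP_BITS  256
#define FP_BYTES (FP_BITS / 8)

/* Bit j of the fingerprint is 1 when random-walk counter j ended up
   positive, 0 when it is zero or negative.  Counter 8*j+0 lands in the
   most significant bit of byte j. */
void counters_to_fingerprint(const int counters[FP_BITS], uint8_t fp[FP_BYTES])
{
	for (int j = 0; j < FP_BYTES; j++) {
		uint8_t byte = 0;
		for (int k = 0; k < 8; k++)
			byte = (uint8_t)(byte << 1 | (counters[8 * j + k] > 0));
		fp[j] = byte;
	}
}
```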

For efficiency, a 64-bit hash is computed using an irreducible
Rabin polynomial of degree 63. The algebraic properties
of these allow for efficient calculation over a sliding window
of the input. [2] As the cryptographic advantages of randomly
generated hash functions are not required, a fixed polynomial
has been chosen.
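
What makes a sliding-window hash cheap is that moving the window by one byte costs O(1): remove the outgoing byte's contribution, shift, and add the incoming byte. The sketch below shows that property with ordinary Rabin-Karp arithmetic modulo 2^64 (an arbitrary odd multiplier, names made up). It is a swapped-in, simpler variant, not the GF(2) polynomial arithmetic that rabinpoly.c implements:

```c
#include <stddef.h>
#include <stdint.h>

#define RK_BASE 0x100000001b3ULL   /* arbitrary odd multiplier (assumption) */

/* Hash of one w-byte window computed from scratch:
   sum of s[k] * RK_BASE^(w-1-k), modulo 2^64. */
uint64_t rk_hash(const unsigned char *s, size_t w)
{
	uint64_t h = 0;
	for (size_t k = 0; k < w; k++)
		h = h * RK_BASE + s[k];
	return h;
}

/* RK_BASE^(w-1) modulo 2^64: the weight of the outgoing byte. */
uint64_t rk_top_power(size_t w)
{
	uint64_t p = 1;
	for (size_t k = 1; k < w; k++)
		p *= RK_BASE;
	return p;
}

/* Slide the window one byte: drop `out`, append `in`.  Unsigned
   arithmetic wraps, so everything is implicitly modulo 2^64. */
uint64_t rk_slide(uint64_t h, uint64_t top, unsigned char out, unsigned char in)
{
	return (h - out * top) * RK_BASE + in;
}
```

Each rk_slide() result equals rk_hash() recomputed over the new window; rabinpoly.c gets the same effect over GF(2) with its U[] table for the outgoing byte.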

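This 64-bit hash is expanded to 256 bits by using three bits
to select one of 8 groups of 32 fingerprint counters; the low 32
bits of the hash then decide, one bit per counter, whether each
counter in that group is incremented or decremented. So, for every
8-bit character, the polynomial is updated once and 32 counters
change. Since any given counter is in the selected group for only
one window in eight, each of the 256 counters represents a random
walk of about N / 8 steps, for a string of length N.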
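
A sketch of that per-window update, written one counter at a time (gsimm.c does the same thing more cleverly, by counting 4-bit patterns and distributing them afterwards). Which three bits select the group is an implementation choice; here, as in gsimm.c, they are the lowest bits of the upper 32-bit half. The names are illustrative:

```c
#include <stdint.h>

#define FP_BITS         256
#define BITS_PER_WINDOW 32
#define FP_GROUPS       (FP_BITS / BITS_PER_WINDOW)   /* 8 groups: 3 bits */

/* Apply one window's 64-bit hash to the 256 counters: the upper half
   picks one of 8 groups of 32 counters, and each of the 32 low bits
   moves its counter up (bit set) or down (bit clear). */
void update_counters(int counters[FP_BITS], uint64_t hash)
{
	unsigned group = (unsigned)(hash >> 32) % FP_GROUPS;
	uint32_t bits = (uint32_t)hash;
	int *c = counters + group * BITS_PER_WINDOW;

	for (int b = 0; b < BITS_PER_WINDOW; b++)
		c[b] += ((bits >> b) & 1) ? 1 : -1;
}
```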

The similarity of A and B can now be expressed as the Hamming
distance between the two bit vectors, divided by the expected
distance between two random vectors (128 bits for 256-bit
fingerprints). This similarity score is
a number between 0 and 2, where smaller values mean the strings
are more similar, and values of 1 or more mean they are dissimilar.
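
A sketch of the score, with a portable bit count (names illustrative). Note that the -r option of the attached gsimm.c reports a slightly different figure: it divides the distance d by MD_LENGTH * 4 - 1 = 127 and prints 100 * (1 - min(1, d / 127)) as a percentage.

```c
#include <stdint.h>

#define FP_BYTES           32                   /* 256-bit fingerprint */
#define FP_RANDOM_DISTANCE (FP_BYTES * 8 / 2)   /* expected for random: 128 */

/* Number of differing bits between two fingerprints. */
int fp_hamming(const uint8_t a[FP_BYTES], const uint8_t b[FP_BYTES])
{
	int d = 0;
	for (int j = 0; j < FP_BYTES; j++) {
		uint8_t x = a[j] ^ b[j];
		while (x) {                  /* clear the lowest set bit */
			x &= (uint8_t)(x - 1);
			d++;
		}
	}
	return d;
}

/* Score in [0, 2]: 0 for identical fingerprints, about 1 for unrelated
   ones, 2 for exact complements. */
double fp_score(const uint8_t a[FP_BYTES], const uint8_t b[FP_BYTES])
{
	return (double)fp_hamming(a, b) / FP_RANDOM_DISTANCE;
}
```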

One of the unique properties of this fingerprint is the
ability to compare files in different locations by transmitting
only their fingerprints.

[-- Attachment #2: gsimm.c --]
[-- Type: application/octet-stream, Size: 10801 bytes --]

#include <unistd.h>
#include <stdlib.h>
#include <fcntl.h>
#include <libgen.h>
#include <stdio.h>
#include <assert.h>
#include <math.h>
#include <string.h>   /* strncpy */

#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>

#include "rabinpoly.h"

/* Length of file message digest (MD) in bytes. Longer MDs are
   better, but increase processing time for diminishing returns.
   Must be a multiple of NUM_HASHES_PER_CHAR / 8, and at least 24
   for good results.
*/
#define MD_LENGTH 32
#define MD_BITS (MD_LENGTH * 8)

/* Has to be a power of two. Since the Rabin hash only has 63
   usable bits, the number of hashes is limited to 32.
   Lower powers of two could be used for speeding up processing
   of very large files.  */
#define NUM_HASHES_PER_CHAR 32

/* For the final counting, do not count each bit individually, but
   group them. Must be a power of two, at most NUM_HASHES_PER_CHAR.
   However, larger sizes result in higher cache usage. Use 8 bits
   per group for efficient processing of large files on fast machines
   with decent caches, or 4 bits for faster processing of small files
   and for machines with small caches.  */
#define GROUP_BITS 4
#define GROUP_COUNTERS (1<<GROUP_BITS)

/* The RABIN_WINDOW_SIZE is the size of the fingerprint window used by
   the Rabin algorithm. This is not a modifiable parameter.

   The first RABIN_WINDOW_SIZE - 1 bytes are skipped, in order to ensure
   fingerprints are good hashes. This does somewhat reduce the
   influence of the first few bytes in the file (they're part of
   fewer windows, like the last few bytes), but that actually isn't
   so bad as files often start with fixed content that may bias comparisons.
*/

/* The MIN_FILE_SIZE indicates the absolute minimal file size that
   can be processed. As indicated above, the first and last 
   RABIN_WINDOW_SIZE - 1 bytes are skipped. 
   In order to get at least an average of 12 samples
   per bit in the final message digest, require at least 3 * MD_LENGTH
   complete windows in the file.  */
#define MIN_FILE_SIZE (3 * MD_LENGTH + 2 * (RABIN_WINDOW_SIZE - 1))

/* Limit matching algorithm to files less than 256 MB, so we can use
   32 bit integers everywhere without fear of overflow. For larger
   files we should add logic to mmap the file by piece and accumulate
   the frequency counts. */
#define MAX_FILE_SIZE (256*1024*1024 - 1)

/* Size of cache used to eliminate duplicate substrings.
   Make small enough to comfortably fit in L1 cache.  */
#define DUP_CACHE_SIZE 256

#define MIN(x,y) ((y)<(x) ? (y) : (x))
#define MAX(x,y) ((y)>(x) ? (y) : (x))

typedef struct fileinfo
{ char		*name;
  size_t	length;
  u_char	md[MD_LENGTH];
  int		match;
} File;

int flag_verbose = 0;
int flag_debug = 0;
int flag_warning = 0;
char *flag_relative = 0;

char cmd[12] = "        ...";
char md_strbuf[MD_LENGTH * 2 + 1];
u_char relative_md [MD_LENGTH];

File *file;
int    file_count;
size_t file_bytes;

FILE *msgout;

char hex[17] = "0123456789abcdef";
double pi = 3.14159265358979323846;

int freq[MD_BITS];
u_int64_t freq_dups = 0;

void usage()
{  fprintf (stderr, "usage: %s [-dhvw] [-r fingerprint] file ...\n", cmd);
   fprintf (stderr, " -d\tdebug output, repeat for more verbosity\n");
   fprintf (stderr, " -h\tshow this usage information\n");
   fprintf (stderr, " -r\tshow distance relative to fingerprint "
                    "(%u hex digits)\n", MD_LENGTH * 2);
   fprintf (stderr, " -v\tverbose output, repeat for even more verbosity\n");
   fprintf (stderr, " -w\tenable warnings for suspect statistics\n");
   exit (1);
}

int dist (u_char *l, u_char *r)
{ int j, k;
  int d = 0;

  for (j = 0; j < MD_LENGTH; j++)
  { u_char ch = l[j] ^ r[j];

    for (k = 0; k < 8; k++) d += ((ch & (1<<k)) > 0);
  } 

  return d;
}

char *md_to_str(u_char *md)
{ int j;

  for (j = 0; j < MD_LENGTH; j++)
  { u_char ch = md[j];

    md_strbuf[j*2] = hex[ch >> 4];
    md_strbuf[j*2+1] = hex[ch & 0xF];
  }

  md_strbuf[j*2] = 0;
  return md_strbuf;
}

u_char *str_to_md(char *str, u_char *md)
{ int j;

  if (!md || !str) return 0;

  bzero (md, MD_LENGTH);

  for (j = 0; j < MD_LENGTH * 2; j++)
  { char ch = str[j];

    if (ch >= '0' && ch <= '9')
    { md [j/2] = (md [j/2] << 4) + (ch - '0'); 
    }
    else
    { ch |= 32;

      if (ch < 'a' || ch > 'f') break;
      md [j/2] = (md[j/2] << 4) + (ch - 'a' + 10);
  } } 

  return (j != MD_LENGTH * 2 || str[j] != 0) ? 0 : md;
}

void freq_to_md(u_char *md)
{ int j, k;
  int num = MD_BITS;

  for (j = 0; j < MD_LENGTH; j++)
  { u_char ch = 0;

    for (k = 0; k < 8; k++) ch = 2*ch + (freq[8*j+k] > 0);
    md[j] = ch;
  }

  if (flag_debug)
  { for (j = 0; j < num; j++)
    { if (j % 8 == 0) printf ("\n%3i: ", j);
      printf ("%7i ", freq[j]);
    }
    printf ("\n");
  }
  bzero (freq, sizeof(freq));
  freq_dups = 0;
}

void process_data (char *name, u_char *data, unsigned len, u_char *md)
{ size_t j = 0;
  u_int32_t ofs;
  u_int32_t dup_cache[DUP_CACHE_SIZE];
  u_int32_t count [MD_BITS * (GROUP_COUNTERS/GROUP_BITS)];
  bzero (dup_cache, DUP_CACHE_SIZE * sizeof (u_int32_t));
  bzero (count, (MD_BITS * (GROUP_COUNTERS/GROUP_BITS) * sizeof (u_int32_t)));

  /* Ignore incomplete substrings */
  while (j < len && j < RABIN_WINDOW_SIZE) rabin_slide8 (data[j++]);

  while (j < len)
  { u_int64_t hash;
    u_int32_t ofs, sum;
    u_char idx;
    int k;

    hash = rabin_slide8 (data[j++]);

    /* In order to update a much larger frequency table
       with only 32 bits of checksum, randomly select a
       part of the table to update. The selection should
       only depend on the content of the represented data,
       and be independent of the bits used for the update.

       Instead of updating 32 individual counters, process
       the checksum in NUM_HASHES_PER_CHAR / GROUP_BITS groups of
       GROUP_BITS bits, and count the frequency of each bit pattern.
    */

    idx = (hash >> 32);
    sum = (u_int32_t) hash;
    ofs = idx % (MD_BITS / NUM_HASHES_PER_CHAR) * NUM_HASHES_PER_CHAR;
    idx %= DUP_CACHE_SIZE;
    if (dup_cache[idx] == sum)
    { freq_dups++; 
    }
    else
    { dup_cache[idx] = sum; 
      for (k = 0; k < NUM_HASHES_PER_CHAR / GROUP_BITS; k++)
      { count[ofs * GROUP_COUNTERS / GROUP_BITS + (sum % GROUP_COUNTERS)]++;
        ofs += GROUP_BITS;
        sum >>= GROUP_BITS;
  } } }

  /* Distribute the occurrences of each bit group over the frequency table. */
  for (ofs = 0; ofs < MD_BITS; ofs += GROUP_BITS)
  { int j;
    for (j = 0; j < GROUP_COUNTERS; j++)
    { int k;
      for (k = 0; k < GROUP_BITS; k++)
      { freq[ofs + k] += ((1<<k) & j) 
          ? count[ofs * GROUP_COUNTERS / GROUP_BITS + j]
          : -count[ofs * GROUP_COUNTERS / GROUP_BITS + j];
  } } }

  { int j;
    int num = MD_BITS;
    int stat_warn = 0;
    double sum = 0.0;
    double sumsqr = 0.0;
    double average, variance, stddev, bits, exp_average, max_average;

    assert (num >= 2);

    sum = 0;

    for (j = 0; j < num; j++)
    { double f = fabs ((double) freq[j]);
      sum += f;
      sumsqr += f*f;
    }

    variance = (sumsqr - (sum * sum / num)) / (num - 1);
    average = sum / num;
    stddev = sqrt (variance);
    bits = (NUM_HASHES_PER_CHAR * (file[file_count].length - freq_dups)) 
             / (8 * MD_LENGTH);
    /* Random files, or short files with few repetitions should have
       average very close to the expected average. Large deviations
       show there is too much redundancy, or there is another problem
       with the statistical fundamentals of the algorithm. */
    exp_average = sqrt (2 * bits / pi);
    max_average = 2.0 * pow (2 * bits / pi, 0.6);

    stat_warn = flag_warning
      && (average < exp_average * 0.5 || average > max_average);
    if (stat_warn)
    { fprintf (stdout, "%s: warning: "
               "too much redundancy, fingerprint may not be accurate\n",
               file[file_count].name);

    }

    if (flag_verbose > 1 || (flag_verbose && stat_warn))
    { printf 
        ("%i frequencies, average %5.1f, std dev %5.1f, %2.1f %% duplicates, "
         "\"%s\"\n",
         num, average, stddev,
         100.0 * freq_dups / (double) file[file_count].length,
         file[file_count].name);
      printf
        ("%1.0f expected bits per frequency, "
         "expected average %1.1f, max average %1.1f\n",
         bits, exp_average, max_average);
  } }

  if (md)
  { rabin_reset();
    freq_to_md (md);
    if (flag_relative)
    { int d = dist (md, relative_md);
      double sim = 1.0 - MIN (1.0, (double) (d) / (MD_LENGTH * 4 - 1));
      fprintf (stdout, "%s %llu %u %s %u %3.1f\n", 
               md_to_str (md), (long long unsigned) 0, len, name, 
               d, 100.0 * sim);
    }
    else
    {
      fprintf (stdout, "%s %llu %u %s\n", 
               md_to_str (md), (long long unsigned) 0, len, name);
} } }

void process_file (char *name)
{ int fd;
  struct stat fs;
  u_char *data;
  File *fi = file+file_count;

  fd = open (name, O_RDONLY, 0);
  if (fd < 0) 
  { perror (name);
    exit (2);
  }

  if (fstat (fd, &fs))
  { perror (name);
    exit (2);
  }

  if (fs.st_size >= MIN_FILE_SIZE
      && fs.st_size <= MAX_FILE_SIZE)
  { fi->length = fs.st_size;
    fi->name = name;

    data = (u_char *) mmap (0, fs.st_size, PROT_READ, MAP_PRIVATE, fd, 0);

    if (data == (u_char *) -1)
    { perror (name);
      exit (2);
    }

    process_data (name, data, fs.st_size, fi->md);
    munmap (data, fs.st_size);
    file_bytes += fs.st_size;
    file_count++;
  } else if (flag_verbose) 
  { fprintf (stdout, "skipping %s (size %llu)\n", name,
             (unsigned long long) fs.st_size); }

  close (fd);
}

int main (int argc, char *argv[])
{ int ch, j;

  strncpy (cmd, basename (argv[0]), 8);
  msgout = stdout;

  while ((ch = getopt(argc, argv, "dhr:vw")) != -1)
  { switch (ch) 
    { case 'd': flag_debug++;
		break;
      case 'r': if (!optarg)
                { fprintf (stderr, "%s: missing argument for -r\n", cmd);
                  return 1;
                }
                if (str_to_md (optarg, relative_md)) flag_relative = optarg;
                else
                { fprintf (stderr, "%s: not a valid fingerprint\n", optarg);
                  return 1;
                }
                break;
      case 'v': flag_verbose++;
                break;
      case 'w': flag_warning++;
                break;
      default : usage();
                return (ch != 'h');
  } }

  argc -= optind;
  argv += optind;

  if (argc == 0) usage();

  rabin_reset ();
  if (flag_verbose && flag_relative)
  { fprintf (stdout, "distances are relative to %s\n", flag_relative);
  }

  file = (File *) calloc (argc, sizeof (File));

  for (j = 0; j < argc; j++) process_file (argv[j]);

  if (flag_verbose) 
  { fprintf (stdout, "%zu bytes in %i files\n", file_bytes, file_count);
  }

  return 0;
}

[-- Attachment #3: rabinpoly.c --]
[-- Type: application/octet-stream, Size: 3648 bytes --]

/*
 *
 * Copyright (C) 1999 David Mazieres (dm@uun.org)
 *
 * This program is free software; you can redistribute it and/or
 * modify it under the terms of the GNU General Public License as
 * published by the Free Software Foundation; either version 2, or (at
 * your option) any later version.
 *
 * This program is distributed in the hope that it will be useful, but
 * WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 * General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307
 * USA
 *
 */

  /* Faster generic_fls */
  /* (c) 2002, D.Phillips and Sistina Software */

#include "rabinpoly.h"
#define MSB64 0x8000000000000000ULL

static inline unsigned fls8(unsigned n)
{
       return n & 0xf0?
           n & 0xc0? (n >> 7) + 7: (n >> 5) + 5:
           n & 0x0c? (n >> 3) + 3: n - ((n + 1) >> 2);
}

static inline unsigned fls16(unsigned n)
{
       return n & 0xff00? fls8(n >> 8) + 8: fls8(n);
}

static inline unsigned fls32(unsigned n)
{
       return n & 0xffff0000? fls16(n >> 16) + 16: fls16(n);
}

static inline unsigned fls64(unsigned long long n) /* should be u64 */
{
       return n & 0xffffffff00000000ULL? fls32(n >> 32) + 32: fls32(n);
}

static u_int64_t polymod (u_int64_t nh, u_int64_t nl, u_int64_t d);
static void      polymult (u_int64_t *php, u_int64_t *plp,
                           u_int64_t x, u_int64_t y);
static u_int64_t polymmult (u_int64_t x, u_int64_t y, u_int64_t d);

static u_int64_t poly = 0xb15e234bd3792f63ull;	// Actual polynomial
static u_int64_t T[256];			// Lookup table for mod
static int shift;

u_int64_t append8 (u_int64_t p, u_char m) 
{ return ((p << 8) | m) ^ T[p >> shift]; 
}

static u_int64_t
polymod (u_int64_t nh, u_int64_t nl, u_int64_t d)
{ assert (d);
  int i;
  int k = fls64 (d) - 1;
  d <<= 63 - k;

  if (nh) {
    if (nh & MSB64)
      nh ^= d;
    for (i = 62; i >= 0; i--)
      if (nh & 1ULL << i) {
	nh ^= d >> (63 - i);
	nl ^= d << (i + 1);
      }
  }
  for (i = 63; i >= k; i--)
    if (nl & 1ULL << i)
      nl ^= d >> (63 - i);
  return nl;
}

static void
polymult (u_int64_t *php, u_int64_t *plp, u_int64_t x, u_int64_t y)
{ int i;
  u_int64_t ph = 0, pl = 0;
  if (x & 1)
    pl = y;
  for (i = 1; i < 64; i++)
    if (x & (1ULL << i)) {
      ph ^= y >> (64 - i);
      pl ^= y << i;
    }
  if (php)
    *php = ph;
  if (plp)
    *plp = pl;
}

static u_int64_t
polymmult (u_int64_t x, u_int64_t y, u_int64_t d)
{
  u_int64_t h, l;
  polymult (&h, &l, x, y);
  return polymod (h, l, d);
}

static int size = RABIN_WINDOW_SIZE;
static u_int64_t fingerprint = 0;
static int bufpos = -1;
static u_int64_t U[256];
static u_char buf[RABIN_WINDOW_SIZE];

void rabin_init ()
{ assert (poly >= 0x100);
  u_int64_t sizeshift = 1;
  int xshift = fls64 (poly) - 1;
  int i, j;
  shift = xshift - 8;
  u_int64_t T1 = polymod (0, 1ULL << xshift, poly);
  for (j = 0; j < 256; j++)
    T[j] = polymmult (j, T1, poly) | ((u_int64_t) j << xshift);

  for (i = 1; i < size; i++)
    sizeshift = append8 (sizeshift, 0);
  for (i = 0; i < 256; i++)
    U[i] = polymmult (i, sizeshift, poly);
  bzero (buf, sizeof (buf));
}

void
rabin_reset ()
{ rabin_init();
  fingerprint = 0; 
  bzero (buf, sizeof (buf));
}

u_int64_t
rabin_slide8 (u_char m)
{ u_char om;
  if (++bufpos >= size) bufpos = 0;

  om = buf[bufpos];
  buf[bufpos] = m;
  fingerprint = append8 (fingerprint ^ U[om], m);

  return fingerprint;
}

[-- Attachment #4: rabinpoly.h --]
[-- Type: application/octet-stream, Size: 1015 bytes --]

/*
 *
 * Copyright (C) 2000 David Mazieres (dm@uun.org)
 *
 * This program is free software; you can redistribute it and/or
 * modify it under the terms of the GNU General Public License as
 * published by the Free Software Foundation; either version 2, or (at
 * your option) any later version.
 *
 * This program is distributed in the hope that it will be useful, but
 * WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
 * General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307
 * USA
 *
 * Translated to C and simplified by Geert Bosch (bosch@gnat.com)
 */

#include <assert.h>
#include <strings.h>
#include <sys/types.h>

#ifndef RABIN_WINDOW_SIZE
#define RABIN_WINDOW_SIZE 48
#endif
void rabin_reset(); 
u_int64_t rabin_slide8(u_char c); 

* Re: parsecvs tool now creates git repositories
From: Martin Langhoff @ 2006-04-06 21:51 UTC
  To: Keith Packard; +Cc: Jim Radford, Git Mailing List
In-Reply-To: <1144354356.2303.270.camel@neko.keithp.com>

On 4/7/06, Keith Packard <keithp@keithp.com> wrote:
> > Almost all of the errors I was seeing in the last version were fixed
> > with your "branches that don't get merged back to the trunk" fix.
>
> That's good news at least.

I'm re-running my import of Moodle's cvs (20K commits) with the newer
parsecvs. The previous attempt looked very good except that

 - file additions were recorded with one-commit-per-file. I am not
sure how rcs is recording these, but the user does enter a common
message at "commit" time. Perhaps the file addition action could be
ignored then?

 - some tags made on a branch show up in HEAD. This may be due to
partial-tree branches, but I am not sure.

cheers


m

* Re: git-clone and cg-clone
From: Nicolas Vilz 'niv' @ 2006-04-06 22:14 UTC
  Cc: git
In-Reply-To: <44355978.3080205@itaapy.com>

Belmar-Letelier wrote:
> Since 0.17, to take advantage of cg-switch
> 
> I use:
> 
> $ git-clone  xxx
> $ cg-branch-add origin xxx
> 
> instead of
> 
> $ cg-clone xxx
> 
> because cg-clone did not fetch all the heads.
> 
> Is there a better way to do this ?
> 

Well, at first I was also using cg clone... but I also realized that
only one branch is pulled from the repository.

If you use git clone, then all tags and branches will be pulled... so
every time I start using a fresh repository and start pulling its
origin, I use git clone instead of cg-clone.

I also use git checkout instead of cg-switch... well, I think I haven't
had a use for the effects cg-switch has, and always wanted git
checkout... and wondered about the files which were missing from the
index of the new branch.

I think that's the difference between porcelain and plumbing...

Sincerely,
Nicolas

* Re: parsecvs tool now creates git repositories
From: Keith Packard @ 2006-04-06 22:19 UTC
  To: Martin Langhoff; +Cc: keithp, Jim Radford, Git Mailing List
In-Reply-To: <46a038f90604061451m4522e3f3qceae2331751a307c@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 844 bytes --]

On Fri, 2006-04-07 at 09:51 +1200, Martin Langhoff wrote:

>  - file additions were recorded with one-commit-per-file. I am not
> sure how rcs is recording these, but the user does enter a common
> message at "commit" time. Perhaps the file addition action could be
> ignored then?

If the log message is identical, and the dates are in-range, parsecvs
"should" put the adds in the same commit. 

>  - some tags made on a branch show up in HEAD. This may be due to
> partial-tree branches, but I am not sure.

Finding branch points is not perfect; it's complicated by bizarre
behaviour when adding files and casual CVS changes which make precise
branch points hard to detect. Can I get at this repository to play with?
I'd like to see if we can't get the branch point detection more
accurate.

-- 
keith.packard@intel.com

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 191 bytes --]

* Re: parsecvs tool now creates git repositories
From: Martin Langhoff @ 2006-04-06 23:22 UTC
  To: Keith Packard; +Cc: Jim Radford, Git Mailing List
In-Reply-To: <1144361968.2303.288.camel@neko.keithp.com>

On 4/7/06, Keith Packard <keithp@keithp.com> wrote:
> On Fri, 2006-04-07 at 09:51 +1200, Martin Langhoff wrote:
>
> >  - file additions were recorded with one-commit-per-file. I am not
> > sure how rcs is recording these, but the user does enter a common
> > message at "commit" time. Perhaps the file addition action could be
> > ignored then?
>
> If the log message is identical, and the dates are in-range, parsecvs
> "should" put the adds in the same commit.

parsecvs is committing them with the "added file foo.x" message, not
the actual commit message.

> >  - some tags made on a branch show up in HEAD. This may be due to
> > partial-tree branches, but I am not sure.
>
> Finding branch points is not perfect; it's complicated by bizarre
> behaviour when adding files and casual CVS changes which make precise
> branch points hard to detect. Can I get at this repository to play with?

I fetch it with something along the lines of...

while true; do
     wget -qc http://cvs.sourceforge.net/cvstarballs/moodle-cvsroot.tar.bz2 && break
     sleep 5
done

and then import the "moodle" module.

cheers,


m

* Re: Cygwin can't handle huge packfiles?
From: Junio C Hamano @ 2006-04-06 23:53 UTC
  To: linux; +Cc: git
In-Reply-To: <20060406205724.12216.qmail@science.horizon.com>

linux@horizon.com writes:

>> Right now we LRU the pack files and evict older ones when we
>> mmap too many, but the unit of eviction is the whole file, so it
>> would not help the case like yours at all.  It might be possible
>> to mmap only part of a packfile, but it would involve fairly
>> major surgery to sha1_file.c.
>
> The simplest solution seems to be to limit pack file size to a reasonable
> fraction of a 32-bit address space.  Say, 0.5 G.

I do not think that would help the original poster's situation
where only 5 revs result in a 1.5G pack.  I would _almost_ say
"do not pack such a repository", but there is the initial
cloning over git-aware transports which always results in a
repository with a single pack.
