Git development
* Re: Git (svn) merge - but ignore certain commits?
From: "Peter Valdemar Mørch (Lists)" @ 2009-01-08 19:17 UTC (permalink / raw)
  To: Peter Harris git-at-peter.is-a-geek.org |Lists|; +Cc: git
In-Reply-To: <eaa105840901081029h220e06e4m1a1af693e908751e@mail.gmail.com>

Peter Harris git-at-peter.is-a-geek.org |Lists| wrote:
> Well, the real problem is that it *isn't* a repeated merge. Subversion
> rebased your trunk on you, so you...
> 
>> I ended up using git cherry-pick, and diff and patch / git diff and git
>> apply.
> 
> ...wind up needing to do this.
> 
> Don't rebase trunk (which implies ditching subversion,
> (un)fortunately), and repeated merges should Just Work. See, for
> example, the git repository itself, where the master branch is
> repeatedly merged into next.

Ah, yes. I understand. Thanks for making it more clear to me. There are 
two different problems at play here:

1) git svn doesn't help with the fact that svn can't handle the repeated 
merge problem (just noise here)

2) The git-only repeated-merge problem still exists, if I want a commit 
on the branch, but *do not* want it merged back to "master". This I 
still don't see a solution for. E.g.:

---A---B---C---D--+ "master"
     \--E---F---G-/  "branch"

Here I want F and G merged back to "master", but *not* E (which is a 
quick-and-dirty but safe version of B). That still seems not to be 
possible. What I did was:

---A---B---C---D--+- "master"
    |             /
    |\--F---G----+    "devbranch"
    |             \
     \--E----------+-   "branch"

(So F and G got merged from "devbranch" to both "master" and "branch", 
but E stayed on "branch" only)
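
A minimal sketch of that three-branch layout, under made-up assumptions:
every commit just adds one file named after it, the repository is a
throwaway directory from mktemp, and the helper names (demo_devbranch,
add) are hypothetical. It only aims to show that E never reaches
"master" when F and G live on their own branch.

```shell
#!/bin/sh
# Sketch only: file names and commit layout are invented for illustration.
demo_devbranch() (
	set -e
	repo=$(mktemp -d)
	trap 'rm -rf "$repo"' EXIT
	cd "$repo"
	git init -q
	git symbolic-ref HEAD refs/heads/master
	git config user.name "Example"
	git config user.email "example@example.invalid"

	add() {	# add FILE: commit one new file named after the commit
		echo "$1" >"$1" && git add "$1" && git commit -q -m "add $1"
	}

	add base.txt				# A
	git branch devbranch			# both side branches start at A
	git branch branch
	add b.txt; add c.txt; add d.txt		# B, C, D on master

	git checkout -q devbranch
	add f.txt; add g.txt			# F, G

	git checkout -q branch
	add e.txt				# E stays on "branch" only
	git merge -q -m "Merge devbranch into branch" devbranch >/dev/null

	git checkout -q master
	git merge -q -m "Merge devbranch into master" devbranch >/dev/null

	# List the files each branch ends up with.
	git ls-tree --name-only master | sed 's/^/master: /'
	git ls-tree --name-only branch | sed 's/^/branch: /'
)

demo_devbranch
```

The listing should contain e.txt under "branch" but not under "master",
while f.txt and g.txt appear under both.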

I could do that because the system worked somewhat without E and I was 
able to develop/test F and G without E. But I'd still be out of luck if 
I needed to work on "branch". There seems to me to be no way in the 
first two-branch scenario to do repeated merges from "branch" to 
"master" if I need to avoid that E gets merged back to "master".

But thanks, Peter, for helping me understand. "git svn" and the fact 
that E happened to be a revert were just noise and had nothing to do 
with the core problem (2). That still has no solution, or am I missing 
something?

Peter
-- 
Peter Valdemar Mørch
http://www.morch.com


* Re: X-Debbugs-Cc didn't make it to git@vger.kernel.org
From: jidanni @ 2009-01-08 19:36 UTC (permalink / raw)
  To: git; +Cc: 507475
In-Reply-To: <87ej0qf3gx.fsf@jidanni.org>

>>>>> "j" == jidanni  <jidanni@jidanni.org> writes:

j> Bummer, on
j> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=507475
j> http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=507476
j> I used X-Debbugs-Cc, and it says Report forwarded to git@vger.kernel.org
j> but I don't see them here on nntp:gmane.comp.version-control.git .
j> Perhaps they got filtered out?

Why of course,
$ git checkout origin/todo
$ GET 'http://bugs.debian.org/cgi-bin/bugreport.cgi?msg=2;mbox=yes;bug=507475'|./taboo.perl
43 Delivered-To: submit@bugs.debian.org
matches /^[-\w_]*:/ && m!Delivered-To:!

# These are Majordomo's  global  majordomo.cf  as used at
# vger.kernel.org.

So forget about any X-Debbugs-Cc reaching any kernel.org list.
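
As a rough illustration of the match reported above, here is a sketch
that approximates the Perl condition with two grep calls. The function
name is_taboo_header is hypothetical, and taboo.perl itself is not
reproduced here, so this only mirrors the one rule shown in the output.

```shell
#!/bin/sh
# is_taboo_header LINE: succeed if LINE looks like a mail header line
# (name made of word characters and dashes, then a colon) and also
# contains "Delivered-To:". Approximates: /^[-\w_]*:/ && m!Delivered-To:!
is_taboo_header() {
	printf '%s\n' "$1" |
		grep -E '^[-[:alnum:]_]*:' |
		grep -q 'Delivered-To:'
}

if is_taboo_header 'Delivered-To: submit@bugs.debian.org'; then
	echo "rejected"
fi
```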


* Re: [PATCH (topgit)] tg-patch: add support for generating patches against worktree and index
From: martin f krafft @ 2009-01-08 19:53 UTC (permalink / raw)
  To: Kirill Smelkov, Petr Baudis, Git Mailing List
In-Reply-To: <1231438975-13624-1-git-send-email-kirr@landau.phys.spbu.ru>

[-- Attachment #1: Type: text/plain, Size: 1072 bytes --]

also sprach Kirill Smelkov <kirr@landau.phys.spbu.ru> [2009.01.09.0722 +1300]:
> This implements `tg patch -i` and `tg patch -w` to see current
> patch as generated against not-yet-committed index and worktree.

I think at this early stage, it would make sense to use long options
and not reserve short options yet. Unless Petr disagrees, I'd kindly
ask you to use long options instead. Once TopGit has been around for
a while, we can provide short options for the most important long
options.

This is possibly too conservative, but I've been bitten by lack of
new letters before because I've used them all up for options that
later turned out not to be needed.

I have not yet had the time to actually look at the patch.

-- 
martin | http://madduck.net/ | http://two.sentenc.es/
 
"when zarathustra was alone... he said to his heart: 'could it be
 possible! this old saint in the forest hath not yet heard of it, that
 god is dead!'"
                                                 - friedrich nietzsche
 
spamtraps: madduck.bogus@madduck.net


* gitweb index performance (Re: [PATCH] gitweb: support the rel=vcs-* microformat)
From: Joey Hess @ 2009-01-08 19:54 UTC (permalink / raw)
  To: git
In-Reply-To: <gk4bk5$9dq$1@ger.gmane.org>

[-- Attachment #1: Type: text/plain, Size: 2147 bytes --]

Giuseppe Bilotta wrote:
> > There is a small overhead in including the microformat on project list
> > and forks list pages, but getting the project descriptions for those pages
> > already incurs a similar overhead, and the ability to get every repo url
> > in one place seems worthwhile.
> 
> I agree with this, although people with very large project lists may
> differ ... do we have timings on these?

AFAICS, when displaying the project list, gitweb reads each project's
description file, falling back to reading its config file if there is no
description file.

If performance was a problem here, the thing to do would be to add
project descriptions to the $project_list file, and use those in
preference to the description files. If a large site has done that,
they've not sent in the patch. :-)

With my patch, it will read each cloneurl file too. The best way to
optimise that for large sites seems to be to add an option that would
ignore the cloneurl files and config file and always use
@git_base_url_list.

I checked the only large site I have access to (git.debian.org) and they
use a $project_list file, but I see no other performance tuning. That's
a 2 GHz machine; it takes gitweb 28 (!) seconds to generate the nearly 1
MB index web page for 1671 repositories:

/srv/git.debian.org/http/cgi-bin/gitweb.cgi  3.04s user 9.24s system 43% cpu 28.515 total

Notice that most of the time is spent by child processes. For each
repository, gitweb runs git-for-each-ref to determine the time of the
last commit.

If that is removed (say if there were a way to get the info w/o
forking), performance improves nicely:

./gitweb.cgi > /dev/null  1.29s user 1.08s system 69% cpu 3.389 total

Making it not read description files for each project, as I suggest above,
is the next best optimisation:

./gitweb.cgi > /dev/null  1.08s user 0.05s system 96% cpu 1.170 total

So, I think it makes sense to optimise gitweb and offer knobs for performance
tuning at the expense of the flexibility of description and cloneurl files.
But, git-for-each-ref is swamping everything else.
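
For scale, the three wall-clock totals above can be turned into a rough
per-repository average. This is only a sketch: per_repo_ms is a
hypothetical helper, and dividing the total by 1671 ignores any fixed
startup cost and the fact that the runs were not made under identical
conditions.

```shell
#!/bin/sh
# per_repo_ms TOTAL_SECONDS REPOS: average wall-clock milliseconds per repo
per_repo_ms() {
	LC_ALL=C awk -v total="$1" -v repos="$2" \
		'BEGIN { printf "%.1f\n", total / repos * 1000 }'
}

per_repo_ms 28.515 1671	# stock gitweb
per_repo_ms 3.389 1671	# without git-for-each-ref
per_repo_ms 1.170 1671	# also without reading description files
```

That is about 17 ms per repository as shipped, about 2 ms without the
fork, and under 1 ms once the description files are skipped as well.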

-- 
see shy jo


* Re: [PATCH 0/3] Teach Git about the patience diff algorithm
From: Adeodato Simó @ 2009-01-08 19:55 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Clemens Buchacher, Johannes Schindelin, Pierre Habouzit, davidel,
	Francis Galiegue, Git ML

[-- Attachment #1: Type: text/plain, Size: 1062 bytes --]

* Linus Torvalds [Fri, 02 Jan 2009 08:42:04 -0800]:

> Yes, this one is a real patience diff change, but it's also the same one 
> that I've seen in the google fanboi findings. What google did _not_ show 
> was any real-life examples, or anybody doing any critical analysis.

This comes a bit late and maybe it's redundant, but somebody just sent
to a Debian mailing list a patch that was hard to read, and patience
improved it. (I realize it's quite similar in spirit to the "toy
patience example" that google returns, but this at least is a *real*
example where patience helped me read a patch.)

I'm also attaching bzr diff output, because it's still more readable
IMHO. (I realize that's independent of patience, as you explained, but
I'm making a point that it'd be nice to have this addressed by somebody
knowledgeable.)

Thanks,

-- 
Adeodato Simó                                     dato at net.com.org.es
Debian Developer                                  adeodato at debian.org
 
- Are you sure we're good?
- Always.
                -- Rory and Lorelai

[-- Attachment #2: git.diff --]
[-- Type: text/x-diff, Size: 3615 bytes --]

diff --git util_sock.c util_sock.c
index e20768e..f7b9145 100644
--- util_sock.c
+++ util_sock.c
@@ -1037,40 +1037,109 @@ NTSTATUS read_data(int fd, char *buffer, size_t N)
 }
 
 /****************************************************************************
- Write data to a fd.
+ Write all data from an iov array
 ****************************************************************************/
 
-ssize_t write_data(int fd, const char *buffer, size_t N)
+ssize_t write_data_iov(int fd, const struct iovec *orig_iov, int iovcnt)
 {
-	size_t total=0;
-	ssize_t ret;
-	char addr[INET6_ADDRSTRLEN];
+	int i;
+	size_t to_send;
+	ssize_t thistime;
+	size_t sent;
+	struct iovec *iov_copy, *iov;
 
-	while (total < N) {
-		ret = sys_write(fd,buffer + total,N - total);
+	to_send = 0;
+	for (i=0; i<iovcnt; i++) {
+		to_send += orig_iov[i].iov_len;
+	}
 
-		if (ret == -1) {
-			if (fd == get_client_fd()) {
-				/* Try and give an error message saying
-				 * what client failed. */
-				DEBUG(0,("write_data: write failure in "
-					"writing to client %s. Error %s\n",
-					get_peer_addr(fd,addr,sizeof(addr)),
-					strerror(errno) ));
-			} else {
-				DEBUG(0,("write_data: write failure. "
-					"Error = %s\n", strerror(errno) ));
+	thistime = sys_writev(fd, orig_iov, iovcnt);
+	if ((thistime <= 0) || (thistime == to_send)) {
+		return thistime;
+	}
+	sent = thistime;
+
+	/*
+	 * We could not send everything in one call. Make a copy of iov that
+	 * we can mess with. We keep a copy of the array start in iov_copy for
+	 * the TALLOC_FREE, because we're going to modify iov later on,
+	 * discarding elements.
+	 */
+
+	iov_copy = (struct iovec *)TALLOC_MEMDUP(
+		talloc_tos(), orig_iov, sizeof(struct iovec) * iovcnt);
+
+	if (iov_copy == NULL) {
+		errno = ENOMEM;
+		return -1;
+	}
+	iov = iov_copy;
+
+	while (sent < to_send) {
+		/*
+		 * We have to discard "thistime" bytes from the beginning
+		 * iov array, "thistime" contains the number of bytes sent
+		 * via writev last.
+		 */
+		while (thistime > 0) {
+			if (thistime < iov[0].iov_len) {
+				char *new_base =
+					(char *)iov[0].iov_base + thistime;
+				iov[0].iov_base = new_base;
+				iov[0].iov_len -= thistime;
+				break;
 			}
-			return -1;
+			thistime -= iov[0].iov_len;
+			iov += 1;
+			iovcnt -= 1;
 		}
 
-		if (ret == 0) {
-			return total;
+		thistime = sys_writev(fd, iov, iovcnt);
+		if (thistime <= 0) {
+			break;
 		}
+		sent += thistime;
+	}
+
+	TALLOC_FREE(iov_copy);
+	return sent;
+}
+
+/****************************************************************************
+ Write data to a fd.
+****************************************************************************/
+
+/****************************************************************************
+ Write data to a fd.
+****************************************************************************/
+
+ssize_t write_data(int fd, const char *buffer, size_t N)
+{
+	ssize_t ret;
+	struct iovec iov;
+
+	iov.iov_base = CONST_DISCARD(char *, buffer);
+	iov.iov_len = N;
+
+	ret = write_data_iov(fd, &iov, 1);
+	if (ret >= 0) {
+		return ret;
+	}
 
-		total += ret;
+	if (fd == get_client_fd()) {
+		char addr[INET6_ADDRSTRLEN];
+		/*
+		 * Try and give an error message saying what client failed.
+		 */
+		DEBUG(0, ("write_data: write failure in writing to client %s. "
+			  "Error %s\n", get_peer_addr(fd,addr,sizeof(addr)),
+			  strerror(errno)));
+	} else {
+		DEBUG(0,("write_data: write failure. Error = %s\n",
+			 strerror(errno) ));
 	}
-	return (ssize_t)total;
+
+	return -1;
 }
 
 /****************************************************************************

[-- Attachment #3: git_patience.diff --]
[-- Type: text/x-diff, Size: 3538 bytes --]

diff --git util_sock.c util_sock.c
index e20768e..f7b9145 100644
--- util_sock.c
+++ util_sock.c
@@ -1037,40 +1037,109 @@ NTSTATUS read_data(int fd, char *buffer, size_t N)
 }
 
 /****************************************************************************
+ Write all data from an iov array
+****************************************************************************/
+
+ssize_t write_data_iov(int fd, const struct iovec *orig_iov, int iovcnt)
+{
+	int i;
+	size_t to_send;
+	ssize_t thistime;
+	size_t sent;
+	struct iovec *iov_copy, *iov;
+
+	to_send = 0;
+	for (i=0; i<iovcnt; i++) {
+		to_send += orig_iov[i].iov_len;
+	}
+
+	thistime = sys_writev(fd, orig_iov, iovcnt);
+	if ((thistime <= 0) || (thistime == to_send)) {
+		return thistime;
+	}
+	sent = thistime;
+
+	/*
+	 * We could not send everything in one call. Make a copy of iov that
+	 * we can mess with. We keep a copy of the array start in iov_copy for
+	 * the TALLOC_FREE, because we're going to modify iov later on,
+	 * discarding elements.
+	 */
+
+	iov_copy = (struct iovec *)TALLOC_MEMDUP(
+		talloc_tos(), orig_iov, sizeof(struct iovec) * iovcnt);
+
+	if (iov_copy == NULL) {
+		errno = ENOMEM;
+		return -1;
+	}
+	iov = iov_copy;
+
+	while (sent < to_send) {
+		/*
+		 * We have to discard "thistime" bytes from the beginning
+		 * iov array, "thistime" contains the number of bytes sent
+		 * via writev last.
+		 */
+		while (thistime > 0) {
+			if (thistime < iov[0].iov_len) {
+				char *new_base =
+					(char *)iov[0].iov_base + thistime;
+				iov[0].iov_base = new_base;
+				iov[0].iov_len -= thistime;
+				break;
+			}
+			thistime -= iov[0].iov_len;
+			iov += 1;
+			iovcnt -= 1;
+		}
+
+		thistime = sys_writev(fd, iov, iovcnt);
+		if (thistime <= 0) {
+			break;
+		}
+		sent += thistime;
+	}
+
+	TALLOC_FREE(iov_copy);
+	return sent;
+}
+
+/****************************************************************************
+ Write data to a fd.
+****************************************************************************/
+
+/****************************************************************************
  Write data to a fd.
 ****************************************************************************/
 
 ssize_t write_data(int fd, const char *buffer, size_t N)
 {
-	size_t total=0;
 	ssize_t ret;
-	char addr[INET6_ADDRSTRLEN];
+	struct iovec iov;
 
-	while (total < N) {
-		ret = sys_write(fd,buffer + total,N - total);
+	iov.iov_base = CONST_DISCARD(char *, buffer);
+	iov.iov_len = N;
 
-		if (ret == -1) {
-			if (fd == get_client_fd()) {
-				/* Try and give an error message saying
-				 * what client failed. */
-				DEBUG(0,("write_data: write failure in "
-					"writing to client %s. Error %s\n",
-					get_peer_addr(fd,addr,sizeof(addr)),
-					strerror(errno) ));
-			} else {
-				DEBUG(0,("write_data: write failure. "
-					"Error = %s\n", strerror(errno) ));
-			}
-			return -1;
-		}
-
-		if (ret == 0) {
-			return total;
-		}
+	ret = write_data_iov(fd, &iov, 1);
+	if (ret >= 0) {
+		return ret;
+	}
 
-		total += ret;
+	if (fd == get_client_fd()) {
+		char addr[INET6_ADDRSTRLEN];
+		/*
+		 * Try and give an error message saying what client failed.
+		 */
+		DEBUG(0, ("write_data: write failure in writing to client %s. "
+			  "Error %s\n", get_peer_addr(fd,addr,sizeof(addr)),
+			  strerror(errno)));
+	} else {
+		DEBUG(0,("write_data: write failure. Error = %s\n",
+			 strerror(errno) ));
 	}
-	return (ssize_t)total;
+
+	return -1;
 }
 
 /****************************************************************************

[-- Attachment #4: bzr.diff --]
[-- Type: text/x-diff, Size: 3432 bytes --]

--- util_sock.c
+++ util_sock.c
@@ -1037,40 +1037,109 @@
 }
 
 /****************************************************************************
+ Write all data from an iov array
+****************************************************************************/
+
+ssize_t write_data_iov(int fd, const struct iovec *orig_iov, int iovcnt)
+{
+	int i;
+	size_t to_send;
+	ssize_t thistime;
+	size_t sent;
+	struct iovec *iov_copy, *iov;
+
+	to_send = 0;
+	for (i=0; i<iovcnt; i++) {
+		to_send += orig_iov[i].iov_len;
+	}
+
+	thistime = sys_writev(fd, orig_iov, iovcnt);
+	if ((thistime <= 0) || (thistime == to_send)) {
+		return thistime;
+	}
+	sent = thistime;
+
+	/*
+	 * We could not send everything in one call. Make a copy of iov that
+	 * we can mess with. We keep a copy of the array start in iov_copy for
+	 * the TALLOC_FREE, because we're going to modify iov later on,
+	 * discarding elements.
+	 */
+
+	iov_copy = (struct iovec *)TALLOC_MEMDUP(
+		talloc_tos(), orig_iov, sizeof(struct iovec) * iovcnt);
+
+	if (iov_copy == NULL) {
+		errno = ENOMEM;
+		return -1;
+	}
+	iov = iov_copy;
+
+	while (sent < to_send) {
+		/*
+		 * We have to discard "thistime" bytes from the beginning
+		 * iov array, "thistime" contains the number of bytes sent
+		 * via writev last.
+		 */
+		while (thistime > 0) {
+			if (thistime < iov[0].iov_len) {
+				char *new_base =
+					(char *)iov[0].iov_base + thistime;
+				iov[0].iov_base = new_base;
+				iov[0].iov_len -= thistime;
+				break;
+			}
+			thistime -= iov[0].iov_len;
+			iov += 1;
+			iovcnt -= 1;
+		}
+
+		thistime = sys_writev(fd, iov, iovcnt);
+		if (thistime <= 0) {
+			break;
+		}
+		sent += thistime;
+	}
+
+	TALLOC_FREE(iov_copy);
+	return sent;
+}
+
+/****************************************************************************
+ Write data to a fd.
+****************************************************************************/
+
+/****************************************************************************
  Write data to a fd.
 ****************************************************************************/
 
 ssize_t write_data(int fd, const char *buffer, size_t N)
 {
-	size_t total=0;
 	ssize_t ret;
-	char addr[INET6_ADDRSTRLEN];
-
-	while (total < N) {
-		ret = sys_write(fd,buffer + total,N - total);
-
-		if (ret == -1) {
-			if (fd == get_client_fd()) {
-				/* Try and give an error message saying
-				 * what client failed. */
-				DEBUG(0,("write_data: write failure in "
-					"writing to client %s. Error %s\n",
-					get_peer_addr(fd,addr,sizeof(addr)),
-					strerror(errno) ));
-			} else {
-				DEBUG(0,("write_data: write failure. "
-					"Error = %s\n", strerror(errno) ));
-			}
-			return -1;
-		}
-
-		if (ret == 0) {
-			return total;
-		}
-
-		total += ret;
-	}
-	return (ssize_t)total;
+	struct iovec iov;
+
+	iov.iov_base = CONST_DISCARD(char *, buffer);
+	iov.iov_len = N;
+
+	ret = write_data_iov(fd, &iov, 1);
+	if (ret >= 0) {
+		return ret;
+	}
+
+	if (fd == get_client_fd()) {
+		char addr[INET6_ADDRSTRLEN];
+		/*
+		 * Try and give an error message saying what client failed.
+		 */
+		DEBUG(0, ("write_data: write failure in writing to client %s. "
+			  "Error %s\n", get_peer_addr(fd,addr,sizeof(addr)),
+			  strerror(errno)));
+	} else {
+		DEBUG(0,("write_data: write failure. Error = %s\n",
+			 strerror(errno) ));
+	}
+
+	return -1;
 }
 
 /****************************************************************************


* Re: Git (svn) merge - but ignore certain commits?
From: Peter Harris @ 2009-01-08 20:00 UTC (permalink / raw)
  To: Peter Valdemar Mørch (Lists); +Cc: git
In-Reply-To: <4966513C.1010707@sneakemail.com>

On Thu, Jan 8, 2009 at 2:17 PM, "Peter Valdemar Mørch (Lists)" wrote:
>
> E.g.:
>
> ---A---B---C---D--+ "master"
>    \--E---F---G-/  "branch"
>
> Here I want F and G merged back to "master", but *not* E (which is a
> quick-and-dirty but safe version of B).

Stop and think about that for a second.

Rephrased, "I want to cherry pick a few commits to master using the
merge command".

That sounds rather silly when I put it that way. What do you really want? Hmm.

Maybe you want to cherry pick those commits. Maybe (if this is still
an unpublished branch), you want to "git rebase --onto B E" your
branch to get the non-dirty version of E, then merge.

Or maybe you do want to merge, but you're getting confused by not
seeing the automatic conflict markers. You could merge --no-commit the
branch, fix the conflicts (E conflicts logically with B, even if 'git
merge' doesn't automatically mark it as such -- 'git revert -n E' may
even do most of the work), and only then commit the merge revision.
Repeated merges from this state will not keep trying to import E
(since E is already in the history).
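
A minimal sketch of that last option, under made-up assumptions: every
commit adds one file named after it, so E's change is simply hack.txt,
and the names demo_selective_merge and add are hypothetical. Instead of
'git revert -n E' it removes E's file by hand with 'git rm' before
committing the merge, which is enough for this toy case.

```shell
#!/bin/sh
# Sketch only: file names and commit layout are invented for illustration.
demo_selective_merge() (
	set -e
	repo=$(mktemp -d)
	trap 'rm -rf "$repo"' EXIT
	cd "$repo"
	git init -q
	git symbolic-ref HEAD refs/heads/master
	git config user.name "Example"
	git config user.email "example@example.invalid"

	add() {	# add FILE: commit one new file named after the commit
		echo "$1" >"$1" && git add "$1" && git commit -q -m "add $1"
	}

	add base.txt				# A
	git checkout -q -b branch
	add hack.txt				# E: must stay off master
	add f.txt; add g.txt			# F, G
	git checkout -q master
	add fix.txt; add c.txt; add d.txt	# B (the proper fix), C, D

	# First merge: stop before committing, undo E's effect, then commit.
	git merge -q --no-ff --no-commit branch >/dev/null 2>&1
	git rm -q -f hack.txt
	git commit -q -m "Merge branch, without E's hack"

	# More work on branch, then merge again. E is already part of
	# master's history, so the hack is not brought back.
	git checkout -q branch
	add h.txt				# H
	git checkout -q master
	git merge -q -m "Merge branch again" branch >/dev/null

	git ls-files
)

demo_selective_merge
```

The final listing should include f.txt, g.txt and h.txt from the branch
but not hack.txt.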

Peter Harris


* Re: [PATCH 0/3] Teach Git about the patience diff algorithm
From: Adeodato Simó @ 2009-01-08 20:06 UTC (permalink / raw)
  To: git
In-Reply-To: <20090108195511.GA8734@chistera.yi.org>

(My apologies for breaking the thread.)

-- 
Adeodato Simó                                     dato at net.com.org.es
Debian Developer                                  adeodato at debian.org
 
«¡Pero si es tan español que debe de tener el cerebro en forma de botijo,
con pitorro y todo!»
                -- Javier Cercas, “La velocidad de la luz”


* Re: [PATCH (topgit)] tg-patch: add support for generating patches against worktree and index
From: Kirill Smelkov @ 2009-01-08 20:16 UTC (permalink / raw)
  To: martin f krafft; +Cc: Petr Baudis, Git Mailing List
In-Reply-To: <20090108195356.GA14644@lapse.rw.madduck.net>

On Fri, Jan 09, 2009 at 08:53:56AM +1300, martin f krafft wrote:
> also sprach Kirill Smelkov <kirr@landau.phys.spbu.ru> [2009.01.09.0722 +1300]:
> > This implements `tg patch -i` and `tg patch -w` to see current
> > patch as generated against not-yet-committed index and worktree.
> 
> I think at this early stage, it would make sense to use long options
> and not reserve short options yet. Unless Petr disagrees, I'd kindly
> ask you to use long options instead. Once TopGit has been around for
> a while, we can provide short options for the most important long
> options.
> 
> This is possibly too conservative, but I've been bitten by lack of
> new letters before because I've used them all up for options that
> later turned out not to be needed.

I agree, but when I found myself needing something like
`tg patch --index`, I spotted this in the README:


    --- a/README
    +++ b/README
    @@ -284,8 +284,9 @@ tg patch
            tg patch will be able to automatically send the patches by mail
            or save them to files. (TODO)
    
    -       TODO: tg patch -i to base at index instead of branch,
    -               -w for working tree

So I concluded -i/-w was planned from the beginning.


I myself would call these options --index and --work or something
like that, but I'll be ok with any option.


Thanks,
Kirill


* Re: [PATCH] Wrap inflateInit to retry allocation after releasing pack memory
From: Linus Torvalds @ 2009-01-08 20:22 UTC (permalink / raw)
  To: R. Tyler Ballance
  Cc: Shawn O. Pearce, Junio C Hamano, Nicolas Pitre, Jan Krüger,
	Git ML, kb
In-Reply-To: <1231438552.8870.645.camel@starfruit>



On Thu, 8 Jan 2009, R. Tyler Ballance wrote:
> > 
> > Tyler - does this make the corruption errors go away, and be replaced by 
> > hard failures with "out of memory" reporting?
> 
> Yeah, looks like it:

Well, I was hoping that you'd have a confirmation from your own huge repo, 
but I do suspect it's all the same thing, so I guess this counts as 
confirmation too.

> > This patch is potentially pretty noisy, on purpose. I didn't remove the 
> > reporting from places that already do so - some of them have stricter 
> > errors than this.
> 
> I'm assuming this patch is going to be reworked, if so, I'll back it out
> of our internal 1.6.1 build and anxiously await The Real Deal(tm)

Oh, it shouldn't be any noisier under _normal_ load - it's more that 
certain real corruption cases will now report the error twice. That said, 
the new errors should actually be more informative than the old ones, so 
even that isn't necessarily all bad.

Junio - I think we should apply this, and likely to the stable branch too. 
Add the re-trying the inflateInit() after shrinking pack windows on top of 
it.

			Linus


* Re: [PATCH] Wrap inflateInit to retry allocation after releasing pack memory
From: R. Tyler Ballance @ 2009-01-08 20:37 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Shawn O. Pearce, Junio C Hamano, Nicolas Pitre, Jan Krüger,
	Git ML, kb
In-Reply-To: <alpine.LFD.2.00.0901081216060.3283@localhost.localdomain>

[-- Attachment #1: Type: text/plain, Size: 2325 bytes --]

On Thu, 2009-01-08 at 12:22 -0800, Linus Torvalds wrote:
> 
> On Thu, 8 Jan 2009, R. Tyler Ballance wrote:
> > > 
> > > Tyler - does this make the corruption errors go away, and be replaced by 
> > > hard failures with "out of memory" reporting?
> > 
> > Yeah, looks like it:
> 
> Well, I was hoping that you'd have a confirmation from your own huge repo, 
> but I do suspect it's all the same thing, so I guess this counts as 
> confirmation too.

I never got a real solid "consistent" reproduction case with our
repository, just a lot of users that experienced the issue. I think the
Linux repro case is a far better example, and yeah, it's sorta
confirmation (waiting for operations here to deploy the patched 1.6.1 to
dev machines).

> 
> > > This patch is potentially pretty noisy, on purpose. I didn't remove the 
> > > reporting from places that already do so - some of them have stricter 
> > > errors than this.
> > 
> > I'm assuming this patch is going to be reworked, if so, I'll back it out
> > of our internal 1.6.1 build and anxiously await The Real Deal(tm)
> 
> Oh, it shouldn't be any noisier under _normal_ load - it's more that 
> certain real corruption cases will now report the error twice. That said, 
> the new errors should actually be more informative than the old ones, so 
> even that isn't necessarily all bad.
> 
> Junio - I think we should apply this, and likely to the stable branch too. 
> Add the re-trying the inflateInit() after shrinking pack windows on top of 
> it.

I really appreciate this, guys. This is one of the longer threads I've
participated in (spanning over a month), and I'm glad you were finally
able to track the issue down.

Moving forward, I'll try to get a reproduction case with the
kernel tree or something equally big since I know it's frustrating to
play the game of telephone with a proprietary code base ("try this? what
does that do? okay, then this?").

Linus, I'll have a chance to look at your comments on my "variable
packed git window size" patch this weekend, and I'll follow-up in the
appropriate thread.


I'm relatively certain that after this witch hunt, I can get Slide to
cover a round of beers at LinuxWorld or the nearest GitTogether ;)


Cheers
-- 
-R. Tyler Ballance
Slide, Inc.


* Re: [PATCH (topgit)] tg-patch: add support for generating patches against worktree and index
From: Kirill Smelkov @ 2009-01-08 21:11 UTC (permalink / raw)
  To: martin f krafft; +Cc: Petr Baudis, Git Mailing List
In-Reply-To: <20090108201614.GA4185@roro3>

I'm sorry, but I've found a mistake in the `case` statement of my code:

diff --git a/tg-patch.sh b/tg-patch.sh
index db1ad09..d701c54 100644
--- a/tg-patch.sh
+++ b/tg-patch.sh
@@ -17,8 +17,8 @@ while [ -n "$1" ]; do
        case "$arg" in
        -i)
                topic='(i)'
-               diff_opts="$diff_opts --cached";;
-               diff_committed_only=;
+               diff_opts="$diff_opts --cached";
+               diff_committed_only=;;
        -w)
                topic='(w)'
                diff_committed_only=;;


So here is the corrected patch:


From: Kirill Smelkov <kirr@landau.phys.spbu.ru>
To: Petr Baudis <pasky@suse.cz>
Cc: martin f krafft <madduck@madduck.net>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: [PATCH (topgit)] tg-patch: add support for generating patches against worktree and index

This implements `tg patch -i` and `tg patch -w` to see current patch as
generated against not-yet-committed index and worktree.


NOTE: unfortunately `git cat-file blob <file>` does not provide an option
to cat file from worktree (only from an object or from index), so I had to
roll my own `cat_file topic:file` with special support for '(i)' and
'(w)' topics.

Signed-off-by: Kirill Smelkov <kirr@landau.phys.spbu.ru>

---
 README                     |    5 +++--
 contrib/tg-completion.bash |    6 ++++++
 tg-patch.sh                |   31 +++++++++++++++++++++++++------
 tg.sh                      |   21 +++++++++++++++++++++
 4 files changed, 55 insertions(+), 8 deletions(-)

diff --git a/README b/README
index 1d38365..5796112 100644
--- a/README
+++ b/README
@@ -284,8 +284,9 @@ tg patch
 	tg patch will be able to automatically send the patches by mail
 	or save them to files. (TODO)
 
-	TODO: tg patch -i to base at index instead of branch,
-		-w for working tree
+	Options:
+	  -i		base patch generation on index instead of branch
+	  -w		base patch generation on working tree instead of branch
 
 tg mail
 ~~~~~~~
diff --git a/contrib/tg-completion.bash b/contrib/tg-completion.bash
index 9641d04..de8a7b5 100755
--- a/contrib/tg-completion.bash
+++ b/contrib/tg-completion.bash
@@ -359,6 +359,12 @@ _tg_patch ()
 	local cur="${COMP_WORDS[COMP_CWORD]}"
 
 	case "$cur" in
+	-*)
+		__tgcomp "
+			-i
+			-w
+		"
+		;;
 	*)
 		__tgcomp "$(__tg_topics)"
 	esac
diff --git a/tg-patch.sh b/tg-patch.sh
index dc699d2..d701c54 100644
--- a/tg-patch.sh
+++ b/tg-patch.sh
@@ -5,14 +5,25 @@
 
 name=
 
+topic=
+diff_opts=
+diff_committed_only=yes	# will be unset for index/worktree
+
 
 ## Parse options
 
 while [ -n "$1" ]; do
 	arg="$1"; shift
 	case "$arg" in
+	-i)
+		topic='(i)'
+		diff_opts="$diff_opts --cached";
+		diff_committed_only=;;
+	-w)
+		topic='(w)'
+		diff_committed_only=;;
 	-*)
-		echo "Usage: tg [...] patch [NAME]" >&2
+		echo "Usage: tg [...] patch [-i | -w] [NAME]" >&2
 		exit 1;;
 	*)
 		[ -z "$name" ] || die "name already specified ($name)"
@@ -20,31 +31,39 @@ while [ -n "$1" ]; do
 	esac
 done
 
+
+[ -n "$name"  -a  -z "$diff_committed_only" ]  &&
+	die "-i/-w are mutually exclusive with NAME"
+
 [ -n "$name" ] || name="$(git symbolic-ref HEAD | sed 's#^refs/\(heads\|top-bases\)/##')"
 base_rev="$(git rev-parse --short --verify "refs/top-bases/$name" 2>/dev/null)" ||
 	die "not a TopGit-controlled branch"
 
+# if not index/worktree, topic is current branch
+[ -z "$topic" ] && topic="$name"
+
+
 
 setup_pager
 
-git cat-file blob "$name:.topmsg"
+cat_file "$topic:.topmsg"
 echo
-[ -n "$(git grep '^[-]--' "$name" -- ".topmsg")" ] || echo '---'
+[ -n "$(git grep $diff_opts '^[-]--' ${diff_committed_only:+"$name"} -- ".topmsg")" ] || echo '---'
 
 # Evil obnoxious hack to work around the lack of git diff --exclude
 git_is_stupid="$(mktemp -t tg-patch-changes.XXXXXX)"
-git diff-tree --name-only "$base_rev" "$name" |
+git diff --name-only $diff_opts "$base_rev" ${diff_committed_only:+"$name"} -- |
 	fgrep -vx ".topdeps" |
 	fgrep -vx ".topmsg" >"$git_is_stupid" || : # fgrep likes to fail randomly?
 if [ -s "$git_is_stupid" ]; then
-	cat "$git_is_stupid" | xargs git diff --patch-with-stat "$base_rev" "$name" --
+	cat "$git_is_stupid" | xargs git diff --patch-with-stat $diff_opts "$base_rev" ${diff_committed_only:+"$name"} --
 else
 	echo "No changes."
 fi
 rm "$git_is_stupid"
 
 echo '-- '
-echo "tg: ($base_rev..) $name (depends on: $(git cat-file blob "$name:.topdeps" | paste -s -d' '))"
+echo "tg: ($base_rev..) $name (depends on: $(cat_file "$topic:.topdeps" | paste -s -d' '))"
 branch_contains "$name" "$base_rev" ||
 	echo "tg: The patch is out-of-date wrt. the base! Run \`$tg update\`."
 
diff --git a/tg.sh b/tg.sh
index b64fc3a..1762f03 100644
--- a/tg.sh
+++ b/tg.sh
@@ -17,6 +17,27 @@ die()
 	exit 1
 }
 
+# cat_file "topic:file"
+# Like `git cat-file blob $1`, but topics '(i)' and '(w)' mean index and worktree
+cat_file()
+{
+	arg="$1"
+	case "$arg" in
+	'(w):'*)
+		arg=$(echo "$arg" | tail --bytes=+5)
+		cat "$arg"
+		return
+		;;
+	'(i):'*)
+		# ':file' means cat from index
+		arg=$(echo "$arg" | tail --bytes=+5)
+		git cat-file blob ":$arg"
+		;;
+	*)
+		git cat-file blob "$arg"
+	esac
+}
+
 # setup_hook NAME
 setup_hook()
 {
-- 
tg: (a3a5be1..) t/tg-patch-worktree (depends on: t/tg-patch-setup-pager)

* Re: [BUG PATCH RFC] mailinfo: correctly handle multiline 'Subject:' header
From: Kirill Smelkov @ 2009-01-08 23:11 UTC (permalink / raw)
  To: Junio C Hamano, Alexander Potashev; +Cc: git
In-Reply-To: <7vy6xm5i6h.fsf@gitster.siamese.dyndns.org>

On Thu, Jan 08, 2009 at 12:13:42AM -0800, Junio C Hamano wrote:
> Kirill Smelkov <kirr@landau.phys.spbu.ru> writes:
> 
> > On Fri, Dec 26, 2008 at 09:38:41PM +0300, Kirill Smelkov wrote:
> >> When native language (RU) is in use, subject header usually contains several
> >> parts, e.g.
> > ...
> > Junio, All,
> >
> > What about this patch?
> 
> What's most interesting is that I do not recall seeing this patch before.
> Neither gmane (which is my back-up interface to the mailing list) nor my
> mailbox seems to have a copy, and from the look of quoted parts (namely,
> some Russian strings in the message), it is not implausible that my spam
> filter (either on my receiving end or at the ISP) may have eaten it.
> 
> > It at least exposes a bug in git-mailinfo wrt handling of multiline
> > subjects, documents it in detail, and adds a test for it.
> >
> > ..., but may I try to attract git
> > community attention one more time?
> 
> It is very appreciated.

Thanks!


On Thu, Jan 08, 2009 at 12:35:52AM -0800, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
> 
> > Kirill Smelkov <kirr@landau.phys.spbu.ru> writes:
> > ...
> >> http://marc.info/?l=git&m=123031899307286&w=2
> >
> > I have not had chance to look at your patch at marc yet, but from the look
> > of your problem description, I presume you could trigger this with any
> > utf-8 b-encoded loooooong subject line?
> 
> Ok, I took a look at it after downloading from the marc archive.
> 
> > diff --git a/builtin-mailinfo.c b/builtin-mailinfo.c
> > index e890f7a..d138bc3 100644
> > --- a/builtin-mailinfo.c
> > +++ b/builtin-mailinfo.c
> > @@ -436,6 +436,14 @@ static struct strbuf *decode_b_segment(const struct strbuf *b_seg)
> >  			 * for now we just trust the data.
> >  			 */
> >  			c = 0;
> > +
> > +			/* XXX: the following is needed not to output NUL in
> > +			 * the resulting string
> > +			 *
> > +			 * This seems to be ok, but I'm not 100% sure -- that's
> > +			 * why this is an RFC.
> > +			 */
> > +			continue;
> >  		}
> >  		else
> >  			continue; /* garbage */
> 
> B encoding (RFC 2045) encodes an octet stream into a sequence of groups of
> 4 letters from a 64-char alphabet, each of which encodes 6 bits, plus zero
> or more padding chars '=' to make the result a multiple of 4.
> 
>  * If the length of the payload is a multiple of 3 octets, there is no
>    special handling.  Padding char '=' is not produced;
> 
>  * If it is a multiple of 3 octets plus one, the remaining one octet is
>    encoded with two letters, and two padding chars '=' are added;
> 
>  * If it is a multiple of 3 octets plus two, the remaining two octets are
>    encoded with three letters, and one padding char '=' is added.
> 
> Hence, a "correct" implementation should decode the input as if '=' were
> the same as 'A' (which encodes 6 bits of 0) til the end, making sure that
> the padding char '=' appears only at the end of the input, that no char
> outside the Base64 encoding alphabet appears in the input, and that the
> length of the entire encoded string is multiple of 4.  Finally it would
> discard either one or two octets (depending on the number of padding chars
> it saw) from the end of the output.
> 
> Our decode_b_segment() however emits each octet as it completes, without
> waiting for the 24-bit group that contains it to complete.  When decoding
> a correctly encoded input, by the time we see a padding '=', all the real
> payload octets are complete and we would not have any real information
> still kept in the variable "acc" (accumulator), so ignoring '=' (you do
> not even need to assign c = 0) like your patch did would work just fine.
> An alternative would be to count the number of padding at the end and drop
> the NULs from the output as necessary after the loop but that does not add
> any value to the current code.
> 
> Ideally we should validate the encoded string a bit more carefully (see
> the "correct" implementation above), and warn if a malformed input is
> found (but probably not reject outright).  But as a low-impact fix for the
> maintenance branches, I think your fix is very good.
> 
> 	Side note: I suspect that the existing code was Ok before strbuf
> 	conversion as we assumed NUL terminated output buffer.

Junio, thanks for the explanation.

I've updated the patch and included your analysis in its description.
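
To make the padding argument concrete, here is a minimal standalone sketch of
such a decoder. It is an illustration only, not the code in
builtin-mailinfo.c: the name decode_b and the caller-supplied output buffer
are assumptions made for this example (the real decode_b_segment() works on
strbufs). It emits each octet as soon as it completes and skips '=' like any
other character outside the alphabet, so padding never produces a NUL:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper for illustration: decode NUL-terminated base64
 * text `in` into `out` and return the number of octets written.
 * `out` must have room for at least strlen(in) bytes.
 */
size_t decode_b(const char *in, unsigned char *out)
{
	size_t n = 0;
	unsigned acc = 0;	/* bits carried over to the next octet */
	int pos = 0;		/* position inside the 4-letter group */
	int c;

	assert(in && out);
	while ((c = (unsigned char)*in++) != 0) {
		if (c == '+')
			c = 62;
		else if (c == '/')
			c = 63;
		else if ('A' <= c && c <= 'Z')
			c -= 'A';
		else if ('a' <= c && c <= 'z')
			c -= 'a' - 26;
		else if ('0' <= c && c <= '9')
			c -= '0' - 52;
		else
			continue; /* padding '=' and garbage are skipped */

		switch (pos++) {
		case 0:
			acc = c << 2;
			break;
		case 1:
			out[n++] = (unsigned char)(acc | (c >> 4));
			acc = (c & 15) << 4;
			break;
		case 2:
			out[n++] = (unsigned char)(acc | (c >> 2));
			acc = (c & 3) << 6;
			break;
		case 3:
			out[n++] = (unsigned char)(acc | c);
			acc = pos = 0;
			break;
		}
	}
	return n;
}
```

With this loop "YQ==" decodes to the single octet 'a'. Mapping '=' to 0
instead would append two NUL octets after it, which is the kind of stray NUL
the patch gets rid of.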

> > @@ -513,7 +521,15 @@ static int decode_header_bq(struct strbuf *it)
> >  		strbuf_reset(&piecebuf);
> >  		rfc2047 = 1;
> >  
> > -		if (in != ep) {
> > +		/* XXX: the following is needed not to output '\n' on every
> > +		 * multi-line segment in Subject.
> > +		 *
> > +		 * I suspect this is not 100% correct, but I'm not a MIME guy
> > +		 * -- that's why this is an RFC.
> > +		 */
> > +
> > +		/* if in does not end with '=?=', we emit it as is */
> > +		if (in <= (ep-2) && !(ep[-1]=='\n' && ep[-2]=='=')) {
> >  			strbuf_add(&outbuf, in, ep - in);
> >  			in = ep;
> > 
> >  		}
> 
> I am not a MIME guy either (and mailinfo has a big comment that says we do
> not really do MIME --- we just pretend to do), but let me give it a try.
> 
> RFC2047 specifies that an encoded-word ("=?charset?encoding?...?=") may
> not be more than 75 characters long, and multiple encoded-words, separated
> by CRLF SPACE can be used to encode more text if needed.
> 
> It further specifies that an encoded-word can appear next to ordinary text
> or another encoded-word but it must be separated by linear white space,
> and says that such linear white space is to be ignored when displaying.
> 
> Which means that we should be eating the CRLF SPACE we see if we have seen
> an encoded-word immediately before and we are about to process another
> encoded-word.
> 
> Based on the above discussion, here is what I came up with.  It passes
> your test, but I ran out of energy to try breaking it seriously in any
> other way than just running the existing test suite.  

Thanks again very much!

I used to maintain software myself, so I think I understand what you mean
by 'ran out of energy'. I'll do my best to help improve this patch and to
get it merged.
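
The whitespace rule quoted above can be tried out in isolation with a sketch
like the following. It is not the patch itself: the names
drop_lws_between_encoded_words and encoded_word_len are hypothetical, the
encoded-word check is deliberately simplistic, and nothing is decoded. The
function only copies a header value and drops whitespace that sits between
two adjacent encoded-words, while whitespace next to ordinary text is kept:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Length of the encoded-word ("=?charset?enc?text?=") at p, or 0 if none. */
static size_t encoded_word_len(const char *p)
{
	const char *cp, *ep;

	if (p[0] != '=' || p[1] != '?')
		return 0;
	cp = strchr(p + 2, '?');		/* end of charset */
	if (!cp || !cp[1] || cp[2] != '?')	/* one-letter encoding + '?' */
		return 0;
	ep = strstr(cp + 3, "?=");		/* end of encoded text */
	if (!ep)
		return 0;
	return (size_t)(ep + 2 - p);
}

/*
 * Copy header value `in` to `out` (which needs strlen(in) + 1 bytes),
 * dropping whitespace runs that separate two adjacent encoded-words.
 * Encoded-words themselves are copied undecoded.
 */
void drop_lws_between_encoded_words(const char *in, char *out)
{
	int after_encoded = 0;	/* did we just copy an encoded-word? */

	assert(in && out);
	while (*in) {
		size_t len = encoded_word_len(in);

		if (len) {
			memcpy(out, in, len);
			out += len;
			in += len;
			after_encoded = 1;
		} else if (after_encoded && isspace((unsigned char)*in)) {
			const char *scan = in;

			while (isspace((unsigned char)*scan))
				scan++;
			if (!encoded_word_len(scan)) {
				/* whitespace before ordinary text is kept */
				memcpy(out, in, (size_t)(scan - in));
				out += scan - in;
			}
			in = scan;
			after_encoded = 0;
		} else {
			*out++ = *in++;
			after_encoded = 0;
		}
	}
	*out = '\0';
}
```

Fed with the RFC2047 samples, "(=?ISO-8859-1?Q?a?= =?ISO-8859-1?Q?b?=)" loses
the space between the two encoded-words, whereas "(=?ISO-8859-1?Q?a?= b)" is
copied unchanged.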

> We might want to steal some test cases from the "8. Examples" section of
> RFC2047 and add them to t5100.

Good idea. I took all the examples and incorporated them into our
testsuite.

> 
> Thanks.
> 
>  builtin-mailinfo.c |   27 +++++++++++++++++++--------
>  1 files changed, 19 insertions(+), 8 deletions(-)
> 
> diff --git c/builtin-mailinfo.c w/builtin-mailinfo.c
> index e890f7a..fcb32c9 100644
> --- c/builtin-mailinfo.c
> +++ w/builtin-mailinfo.c
> @@ -430,13 +430,6 @@ static struct strbuf *decode_b_segment(const struct strbuf *b_seg)
>  			c -= 'a' - 26;
>  		else if ('0' <= c && c <= '9')
>  			c -= '0' - 52;
> -		else if (c == '=') {
> -			/* padding is almost like (c == 0), except we do
> -			 * not output NUL resulting only from it;
> -			 * for now we just trust the data.
> -			 */
> -			c = 0;
> -		}
>  		else
>  			continue; /* garbage */
>  		switch (pos++) {
> @@ -514,7 +507,25 @@ static int decode_header_bq(struct strbuf *it)
>  		rfc2047 = 1;
>  
>  		if (in != ep) {
> -			strbuf_add(&outbuf, in, ep - in);
> +			/*
> +			 * We are about to process an encoded-word
> +			 * that begins at ep, but there is something
> +			 * before the encoded word.
> +			 */
> +			char *scan;
> +			for (scan = in; scan < ep; scan++)
> +				if (!isspace(*scan))
> +					break;
> +
> +			if (scan != ep || in == it->buf) {
> +				/*
> +				 * We should not lose that "something",
> +				 * unless we have just processed an
> +				 * encoded-word, and there is only LWS
> +				 * before the one we are about to process.
> +				 */
> +				strbuf_add(&outbuf, in, ep - in);
> +			}
>  			in = ep;
>  		}
>  		/* E.g.

Based on the above description the code looks good now. I've
incorporated it into the patch and added tests from RFC2047 (see patch
below).

On Thu, Jan 08, 2009 at 01:08:13PM +0300, Alexander Potashev wrote:
> On 21:38 Fri 26 Dec     , Kirill Smelkov wrote:
> > When native language (RU) is in use, subject header usually contains several
> > parts, e.g.
> > 
> > Subject: [Navy-patches] [PATCH]
> > 	=?utf-8?b?0JjQt9C80LXQvdGR0L0g0YHQv9C40YHQvtC6INC/0LA=?=
> > 	=?utf-8?b?0LrQtdGC0L7QsiDQvdC10L7QsdGF0L7QtNC40LzRi9GFINC00LvRjyA=?=
> > 	=?utf-8?b?0YHQsdC+0YDQutC4?=
> > 
> 
> >  t/t5100/info0012    |    5 ++++
> >  t/t5100/msg0012     |    7 ++++++
> >  t/t5100/patch0012   |   30 +++++++++++++++++++++++++++++
> >  t/t5100/sample.mbox |   52 +++++++++++++++++++++++++++++++++++++++++++++++++++
> >  6 files changed, 112 insertions(+), 2 deletions(-)
> 
> The testcases are too long; a minimal mbox with an encoded "Subject:" would
> be enough to test the mailinfo parser, and that is all you need to test
> here.

Thanks Alexander for pointing this out.

I based my testcase on the tests that are already there; e.g.
t/t5100/{info,msg,patch}00{04,05,09,10,11} are of approximately the same
size and are also based on real mails.

Is this ok?


As for the new tests based on the RFC2047 examples, I've tried to keep them
to the bare minimum.


Changes since v1:

 o incorporated Junio's description and code about padding
 o incorporated Junio's description and code about LWS between encoded
   words
 o incorporated tests from RFC2047 examples  (one testresult is unclear
   -- see patch description)


From: Kirill Smelkov <kirr@landau.phys.spbu.ru>
Subject: mailinfo: correctly handle multiline 'Subject:' header

When a native language (RU) is in use, the subject header usually consists of
several parts, e.g.

Subject: [Navy-patches] [PATCH]
	=?utf-8?b?0JjQt9C80LXQvdGR0L0g0YHQv9C40YHQvtC6INC/0LA=?=
	=?utf-8?b?0LrQtdGC0L7QsiDQvdC10L7QsdGF0L7QtNC40LzRi9GFINC00LvRjyA=?=
	=?utf-8?b?0YHQsdC+0YDQutC4?=

( which btw should be extracted by git-mailinfo to:

    'Subject: Изменён список пакетов необходимых для сборки' )

This exposes several bugs in builtin-mailinfo.c which we try to fix:

1. decode_b_segment: do not append an explicit NUL -- the explicit NUL prevented
   correct header construction when the parts are concatenated via strbuf_addbuf
   in decode_header_bq. Fixes:

-Subject: Изменён список пакетов необходимых для сборки
+Subject: Изменён список па

Junio:

> B encoding (RFC 2045) encodes an octet stream into a sequence of groups of
> 4 letters from a 64-char alphabet, each of which encodes 6 bits, plus zero
> or more padding chars '=' to make the result a multiple of 4.
>
>  * If the length of the payload is a multiple of 3 octets, there is no
>    special handling.  Padding char '=' is not produced;
>
>  * If it is a multiple of 3 octets plus one, the remaining one octet is
>    encoded with two letters, and two padding chars '=' are added;
>
>  * If it is a multiple of 3 octets plus two, the remaining two octets are
>    encoded with three letters, and one padding char '=' is added.
>
> Hence, a "correct" implementation should decode the input as if '=' were
> the same as 'A' (which encodes 6 bits of 0) til the end, making sure that
> the padding char '=' appears only at the end of the input, that no char
> outside the Base64 encoding alphabet appears in the input, and that the
> length of the entire encoded string is multiple of 4.  Finally it would
> discard either one or two octets (depending on the number of padding chars
> it saw) from the end of the output.
>
> Our decode_b_segment() however emits each octet as it completes, without
> waiting for the 24-bit group that contains it to complete.  When decoding
> a correctly encoded input, by the time we see a padding '=', all the real
> payload octets are complete and we would not have any real information
> still kept in the variable "acc" (accumulator), so ignoring '=' (you do
> not even need to assign c = 0) like your patch did would work just fine.
> An alternative would be to count the number of padding at the end and drop
> the NULs from the output as necessary after the loop but that does not add
> any value to the current code.
>
> Ideally we should validate the encoded string a bit more carefully (see
> the "correct" implementation above), and warn if a malformed input is
> found (but probably not reject outright).  But as a low-impact fix for the
> maintenance branches, I think your fix is very good.
>
> 	Side note: I suspect that the existing code was Ok before strbuf
> 	conversion as we assumed NUL terminated output buffer.


Then

2. whitespace between encoded-words should be removed

-Subject: Изменён список пакетов необходимых для сборки
+Subject: Изменён список па кетов необходимых для сборки

Junio:

> I am not a MIME guy either (and mailinfo has a big comment that says we do
> not really do MIME --- we just pretend to do), but let me give it a try.
>
> RFC2047 specifies that an encoded-word ("=?charset?encoding?...?=") may
> not be more than 75 characters long, and multiple encoded-words, separated
> by CRLF SPACE can be used to encode more text if needed.
>
> It further specifies that an encoded-word can appear next to ordinary text
> or another encoded-word but it must be separated by linear white space,
> and says that such linear white space is to be ignored when displaying.
>
> Which means that we should be eating the CRLF SPACE we see if we have seen
> an encoded-word immediately before and we are about to process another
> encoded-word.

Also, as suggested by Junio, test cases from the "8. Examples" section of
RFC2047 are added to the t5100 testsuite in order to catch other MIME problems.

    [but I'm not sure whether testresult with Nathaniel Borenstein
     (םולש ןב ילטפנ) is correct -- see rfc2047-info-0004]

Big-thanks-to: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Kirill Smelkov <kirr@landau.phys.spbu.ru>

---
 builtin-mailinfo.c           |   27 +++++++++++++++------
 t/t5100-mailinfo.sh          |   24 ++++++++++++++++++-
 t/t5100/info0012             |    5 ++++
 t/t5100/msg0012              |    7 +++++
 t/t5100/patch0012            |   30 ++++++++++++++++++++++++
 t/t5100/rfc2047-info-0001    |    4 +++
 t/t5100/rfc2047-info-0002    |    4 +++
 t/t5100/rfc2047-info-0003    |    4 +++
 t/t5100/rfc2047-info-0004    |    5 ++++
 t/t5100/rfc2047-info-0005    |    2 +
 t/t5100/rfc2047-info-0006    |    2 +
 t/t5100/rfc2047-info-0007    |    2 +
 t/t5100/rfc2047-info-0008    |    2 +
 t/t5100/rfc2047-info-0009    |    2 +
 t/t5100/rfc2047-info-0010    |    2 +
 t/t5100/rfc2047-info-0011    |    2 +
 t/t5100/rfc2047-samples.mbox |   48 ++++++++++++++++++++++++++++++++++++++
 t/t5100/sample.mbox          |   52 ++++++++++++++++++++++++++++++++++++++++++
 18 files changed, 215 insertions(+), 9 deletions(-)

diff --git a/builtin-mailinfo.c b/builtin-mailinfo.c
index f7c8c08..77a7121 100644
--- a/builtin-mailinfo.c
+++ b/builtin-mailinfo.c
@@ -430,13 +430,6 @@ static struct strbuf *decode_b_segment(const struct strbuf *b_seg)
 			c -= 'a' - 26;
 		else if ('0' <= c && c <= '9')
 			c -= '0' - 52;
-		else if (c == '=') {
-			/* padding is almost like (c == 0), except we do
-			 * not output NUL resulting only from it;
-			 * for now we just trust the data.
-			 */
-			c = 0;
-		}
 		else
 			continue; /* garbage */
 		switch (pos++) {
@@ -514,7 +507,25 @@ static int decode_header_bq(struct strbuf *it)
 		rfc2047 = 1;
 
 		if (in != ep) {
-			strbuf_add(&outbuf, in, ep - in);
+			/*
+			 * We are about to process an encoded-word
+			 * that begins at ep, but there is something
+			 * before the encoded word.
+			 */
+			char *scan;
+			for (scan = in; scan < ep; scan++)
+				if (!isspace(*scan))
+					break;
+
+			if (scan != ep || in == it->buf) {
+				/*
+				 * We should not lose that "something",
+				 * unless we have just processed an
+				 * encoded-word, and there is only LWS
+				 * before the one we are about to process.
+				 */
+				strbuf_add(&outbuf, in, ep - in);
+			}
 			in = ep;
 		}
 		/* E.g.
diff --git a/t/t5100-mailinfo.sh b/t/t5100-mailinfo.sh
index fe14589..625c204 100755
--- a/t/t5100-mailinfo.sh
+++ b/t/t5100-mailinfo.sh
@@ -11,7 +11,7 @@ test_expect_success 'split sample box' \
 	'git mailsplit -o. "$TEST_DIRECTORY"/t5100/sample.mbox >last &&
 	last=`cat last` &&
 	echo total is $last &&
-	test `cat last` = 11'
+	test `cat last` = 12'
 
 for mail in `echo 00*`
 do
@@ -26,6 +26,28 @@ do
 	'
 done
 
+
+test_expect_success 'split box with rfc2047 samples' \
+	'mkdir rfc2047 &&
+	git mailsplit -orfc2047 "$TEST_DIRECTORY"/t5100/rfc2047-samples.mbox \
+	  >rfc2047/last &&
+	last=`cat rfc2047/last` &&
+	echo total is $last &&
+	test `cat rfc2047/last` = 11'
+
+for mail in `echo rfc2047/00*`
+do
+	test_expect_success "mailinfo $mail" '
+		git mailinfo -u $mail-msg $mail-patch <$mail >$mail-info &&
+		echo msg &&
+		test_cmp "$TEST_DIRECTORY"/t5100/empty $mail-msg &&
+		echo patch &&
+		test_cmp "$TEST_DIRECTORY"/t5100/empty $mail-patch &&
+		echo info &&
+		test_cmp "$TEST_DIRECTORY"/t5100/rfc2047-info-$(basename $mail) $mail-info
+	'
+done
+
 test_expect_success 'respect NULs' '
 
 	git mailsplit -d3 -o. "$TEST_DIRECTORY"/t5100/nul-plain &&
diff --git a/t/t5100/empty b/t/t5100/empty
new file mode 100644
index 0000000..e69de29
diff --git a/t/t5100/info0012 b/t/t5100/info0012
new file mode 100644
index 0000000..ac1216f
--- /dev/null
+++ b/t/t5100/info0012
@@ -0,0 +1,5 @@
+Author: Dmitriy Blinov
+Email: bda@mnsspb.ru
+Subject: Изменён список пакетов необходимых для сборки
+Date: Wed, 12 Nov 2008 17:54:41 +0300
+
diff --git a/t/t5100/msg0012 b/t/t5100/msg0012
new file mode 100644
index 0000000..1dc2bf7
--- /dev/null
+++ b/t/t5100/msg0012
@@ -0,0 +1,7 @@
+textlive-* исправлены на texlive-*
+docutils заменён на python-docutils
+
+Действительно, оказалось, что rest2web вытягивает за собой
+python-docutils. В то время как сам rest2web не нужен.
+
+Signed-off-by: Dmitriy Blinov <bda@mnsspb.ru>
diff --git a/t/t5100/patch0012 b/t/t5100/patch0012
new file mode 100644
index 0000000..36a0b68
--- /dev/null
+++ b/t/t5100/patch0012
@@ -0,0 +1,30 @@
+---
+ howto/build_navy.txt |    6 +++---
+ 1 files changed, 3 insertions(+), 3 deletions(-)
+
+diff --git a/howto/build_navy.txt b/howto/build_navy.txt
+index 3fd3afb..0ee807e 100644
+--- a/howto/build_navy.txt
++++ b/howto/build_navy.txt
+@@ -119,8 +119,8 @@
+    - libxv-dev
+    - libusplash-dev
+    - latex-make
+-   - textlive-lang-cyrillic
+-   - textlive-latex-extra
++   - texlive-lang-cyrillic
++   - texlive-latex-extra
+    - dia
+    - python-pyrex
+    - libtool
+@@ -128,7 +128,7 @@
+    - sox
+    - cython
+    - imagemagick
+-   - docutils
++   - python-docutils
+ 
+ #. на машине dinar: добавить свой открытый ssh-ключ в authorized_keys2 пользователя ddev
+ #. на своей машине: отредактировать /etc/sudoers (команда ``visudo``) примерно следующим образом::
+-- 
+1.5.6.5
diff --git a/t/t5100/rfc2047-info-0001 b/t/t5100/rfc2047-info-0001
new file mode 100644
index 0000000..0a383b0
--- /dev/null
+++ b/t/t5100/rfc2047-info-0001
@@ -0,0 +1,4 @@
+Author: Keith Moore
+Email: moore@cs.utk.edu
+Subject: If you can read this you understand the example.
+
diff --git a/t/t5100/rfc2047-info-0002 b/t/t5100/rfc2047-info-0002
new file mode 100644
index 0000000..881be75
--- /dev/null
+++ b/t/t5100/rfc2047-info-0002
@@ -0,0 +1,4 @@
+Author: Olle Järnefors
+Email: ojarnef@admin.kth.se
+Subject: Time for ISO 10646?
+
diff --git a/t/t5100/rfc2047-info-0003 b/t/t5100/rfc2047-info-0003
new file mode 100644
index 0000000..d0f7891
--- /dev/null
+++ b/t/t5100/rfc2047-info-0003
@@ -0,0 +1,4 @@
+Author: Patrik Fältström
+Email: paf@nada.kth.se
+Subject: RFC-HDR care and feeding
+
diff --git a/t/t5100/rfc2047-info-0004 b/t/t5100/rfc2047-info-0004
new file mode 100644
index 0000000..850f831
--- /dev/null
+++ b/t/t5100/rfc2047-info-0004
@@ -0,0 +1,5 @@
+Author: Nathaniel Borenstein  
+     (םולש ןב ילטפנ)
+Email: nsb@thumper.bellcore.com
+Subject: Test of new header generator
+
diff --git a/t/t5100/rfc2047-info-0005 b/t/t5100/rfc2047-info-0005
new file mode 100644
index 0000000..c27be3b
--- /dev/null
+++ b/t/t5100/rfc2047-info-0005
@@ -0,0 +1,2 @@
+Subject: (a)
+
diff --git a/t/t5100/rfc2047-info-0006 b/t/t5100/rfc2047-info-0006
new file mode 100644
index 0000000..9dad474
--- /dev/null
+++ b/t/t5100/rfc2047-info-0006
@@ -0,0 +1,2 @@
+Subject: (a b)
+
diff --git a/t/t5100/rfc2047-info-0007 b/t/t5100/rfc2047-info-0007
new file mode 100644
index 0000000..294f195
--- /dev/null
+++ b/t/t5100/rfc2047-info-0007
@@ -0,0 +1,2 @@
+Subject: (ab)
+
diff --git a/t/t5100/rfc2047-info-0008 b/t/t5100/rfc2047-info-0008
new file mode 100644
index 0000000..294f195
--- /dev/null
+++ b/t/t5100/rfc2047-info-0008
@@ -0,0 +1,2 @@
+Subject: (ab)
+
diff --git a/t/t5100/rfc2047-info-0009 b/t/t5100/rfc2047-info-0009
new file mode 100644
index 0000000..294f195
--- /dev/null
+++ b/t/t5100/rfc2047-info-0009
@@ -0,0 +1,2 @@
+Subject: (ab)
+
diff --git a/t/t5100/rfc2047-info-0010 b/t/t5100/rfc2047-info-0010
new file mode 100644
index 0000000..9dad474
--- /dev/null
+++ b/t/t5100/rfc2047-info-0010
@@ -0,0 +1,2 @@
+Subject: (a b)
+
diff --git a/t/t5100/rfc2047-info-0011 b/t/t5100/rfc2047-info-0011
new file mode 100644
index 0000000..9dad474
--- /dev/null
+++ b/t/t5100/rfc2047-info-0011
@@ -0,0 +1,2 @@
+Subject: (a b)
+
diff --git a/t/t5100/rfc2047-samples.mbox b/t/t5100/rfc2047-samples.mbox
new file mode 100644
index 0000000..3ca2470
--- /dev/null
+++ b/t/t5100/rfc2047-samples.mbox
@@ -0,0 +1,48 @@
+From nobody Mon Sep 17 00:00:00 2001
+From: =?US-ASCII?Q?Keith_Moore?= <moore@cs.utk.edu>
+To: =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld@dkuug.dk>
+CC: =?ISO-8859-1?Q?Andr=E9?= Pirard <PIRARD@vm1.ulg.ac.be>
+Subject: =?ISO-8859-1?B?SWYgeW91IGNhbiByZWFkIHRoaXMgeW8=?=
+ =?ISO-8859-2?B?dSB1bmRlcnN0YW5kIHRoZSBleGFtcGxlLg==?=
+
+From nobody Mon Sep 17 00:00:00 2001
+From: =?ISO-8859-1?Q?Olle_J=E4rnefors?= <ojarnef@admin.kth.se>
+To: ietf-822@dimacs.rutgers.edu, ojarnef@admin.kth.se
+Subject: Time for ISO 10646?
+
+From nobody Mon Sep 17 00:00:00 2001
+To: Dave Crocker <dcrocker@mordor.stanford.edu>
+Cc: ietf-822@dimacs.rutgers.edu, paf@comsol.se
+From: =?ISO-8859-1?Q?Patrik_F=E4ltstr=F6m?= <paf@nada.kth.se>
+Subject: Re: RFC-HDR care and feeding
+
+From nobody Mon Sep 17 00:00:00 2001
+From: Nathaniel Borenstein <nsb@thumper.bellcore.com>
+      (=?iso-8859-8?b?7eXs+SDv4SDp7Oj08A==?=)
+To: Greg Vaudreuil <gvaudre@NRI.Reston.VA.US>, Ned Freed
+   <ned@innosoft.com>, Keith Moore <moore@cs.utk.edu>
+Subject: Test of new header generator
+MIME-Version: 1.0
+Content-type: text/plain; charset=ISO-8859-1
+
+From nobody Mon Sep 17 00:00:00 2001
+Subject: (=?ISO-8859-1?Q?a?=)
+
+From nobody Mon Sep 17 00:00:00 2001
+Subject: (=?ISO-8859-1?Q?a?= b)
+
+From nobody Mon Sep 17 00:00:00 2001
+Subject: (=?ISO-8859-1?Q?a?= =?ISO-8859-1?Q?b?=)
+
+From nobody Mon Sep 17 00:00:00 2001
+Subject: (=?ISO-8859-1?Q?a?=  =?ISO-8859-1?Q?b?=)
+
+From nobody Mon Sep 17 00:00:00 2001
+Subject: (=?ISO-8859-1?Q?a?=
+    =?ISO-8859-1?Q?b?=)
+
+From nobody Mon Sep 17 00:00:00 2001
+Subject: (=?ISO-8859-1?Q?a_b?=)
+
+From nobody Mon Sep 17 00:00:00 2001
+Subject: (=?ISO-8859-1?Q?a?= =?ISO-8859-2?Q?_b?=)
diff --git a/t/t5100/sample.mbox b/t/t5100/sample.mbox
index 4bf7947..94da4da 100644
--- a/t/t5100/sample.mbox
+++ b/t/t5100/sample.mbox
@@ -501,3 +501,55 @@ index 3e5fe51..aabfe5c 100644
 
 --=-=-=--
 
+From bda@mnsspb.ru Wed Nov 12 17:54:41 2008
+From: Dmitriy Blinov <bda@mnsspb.ru>
+To: navy-patches@dinar.mns.mnsspb.ru
+Date: Wed, 12 Nov 2008 17:54:41 +0300
+Message-Id: <1226501681-24923-1-git-send-email-bda@mnsspb.ru>
+X-Mailer: git-send-email 1.5.6.5
+MIME-Version: 1.0
+Content-Type: text/plain;
+  charset=utf-8
+Content-Transfer-Encoding: 8bit
+Subject: [Navy-patches] [PATCH]
+	=?utf-8?b?0JjQt9C80LXQvdGR0L0g0YHQv9C40YHQvtC6INC/0LA=?=
+	=?utf-8?b?0LrQtdGC0L7QsiDQvdC10L7QsdGF0L7QtNC40LzRi9GFINC00LvRjyA=?=
+	=?utf-8?b?0YHQsdC+0YDQutC4?=
+
+textlive-* исправлены на texlive-*
+docutils заменён на python-docutils
+
+Действительно, оказалось, что rest2web вытягивает за собой
+python-docutils. В то время как сам rest2web не нужен.
+
+Signed-off-by: Dmitriy Blinov <bda@mnsspb.ru>
+---
+ howto/build_navy.txt |    6 +++---
+ 1 files changed, 3 insertions(+), 3 deletions(-)
+
+diff --git a/howto/build_navy.txt b/howto/build_navy.txt
+index 3fd3afb..0ee807e 100644
+--- a/howto/build_navy.txt
++++ b/howto/build_navy.txt
+@@ -119,8 +119,8 @@
+    - libxv-dev
+    - libusplash-dev
+    - latex-make
+-   - textlive-lang-cyrillic
+-   - textlive-latex-extra
++   - texlive-lang-cyrillic
++   - texlive-latex-extra
+    - dia
+    - python-pyrex
+    - libtool
+@@ -128,7 +128,7 @@
+    - sox
+    - cython
+    - imagemagick
+-   - docutils
++   - python-docutils
+ 
+ #. на машине dinar: добавить свой открытый ssh-ключ в authorized_keys2 пользователя ddev
+ #. на своей машине: отредактировать /etc/sudoers (команда ``visudo``) примерно следующим образом::
+-- 
+1.5.6.5
-- 
tg: (c123b7c..) t/mailinfo-multiline-subject (depends on: master)

Thanks,
Kirill

* [PATCH 0/2] Allow cloning to an existing empty directory
From: Alexander Potashev @ 2009-01-08 23:24 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List, Alexander Potashev

The problem I ran into today was that I couldn't clone a repo onto a
separate filesystem. I created a new LVM volume, made a filesystem (XFS)
on it and mounted it on a directory.

But I wasn't able to clone the repo into that directory. It's impossible to
mount a filesystem on a non-existent directory, right? But Git refuses to
clone into an existing directory.

The solution in my first patch allows cloning into an existing empty
directory. However, the same approach would still not work on ext2-like
filesystems, because they have a lost+found directory, i.e. the root
directory of such a filesystem is never empty.



The first patch adds a function (is_pseudo_dir_name) to compare a
string with "." and ".."; the second patch reuses that function in
the rest of the code.


Alexander Potashev (2):
  Allow cloning to an existing empty directory
  Use is_pseudo_dir_name everywhere

 builtin-clone.c         |    8 +++++---
 builtin-count-objects.c |    5 ++---
 builtin-fsck.c          |   14 ++++----------
 builtin-prune.c         |   14 ++++----------
 builtin-rerere.c        |   11 +++++------
 dir.c                   |   31 +++++++++++++++++++++++--------
 dir.h                   |    8 ++++++++
 entry.c                 |    5 ++---
 remote.c                |    6 ++----
 transport.c             |    4 +---
 10 files changed, 56 insertions(+), 50 deletions(-)

* [PATCH 1/2] Allow cloning to an existing empty directory
From: Alexander Potashev @ 2009-01-08 23:24 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List, Alexander Potashev
In-Reply-To: <1231457063-29186-1-git-send-email-aspotashev@gmail.com>

The die() message is changed accordingly.

The previous behaviour was to allow cloning only when the destination
directory doesn't exist.

A new inline function, is_pseudo_dir_name, returns a non-zero value if
the given name is either "." or "..". It is applicable in a lot of other
places in the Git source code.

Signed-off-by: Alexander Potashev <aspotashev@gmail.com>
---
 builtin-clone.c |    8 +++++---
 dir.c           |   19 +++++++++++++++++++
 dir.h           |    8 ++++++++
 3 files changed, 32 insertions(+), 3 deletions(-)

diff --git a/builtin-clone.c b/builtin-clone.c
index f1a1a0c..e732f15 100644
--- a/builtin-clone.c
+++ b/builtin-clone.c
@@ -357,6 +357,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	struct stat buf;
 	const char *repo_name, *repo, *work_tree, *git_dir;
 	char *path, *dir;
+	int dest_exists;
 	const struct ref *refs, *head_points_at, *remote_head, *mapped_refs;
 	struct strbuf key = STRBUF_INIT, value = STRBUF_INIT;
 	struct strbuf branch_top = STRBUF_INIT, reflog_msg = STRBUF_INIT;
@@ -406,8 +407,9 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 		dir = guess_dir_name(repo_name, is_bundle, option_bare);
 	strip_trailing_slashes(dir);
 
-	if (!stat(dir, &buf))
-		die("destination directory '%s' already exists.", dir);
+	if ((dest_exists = !stat(dir, &buf)) && !is_empty_dir(dir))
+		die("destination path '%s' already exists and is not "
+			"an empty directory.", dir);
 
 	strbuf_addf(&reflog_msg, "clone: from %s", repo);
 
@@ -431,7 +433,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 		if (safe_create_leading_directories_const(work_tree) < 0)
 			die("could not create leading directories of '%s': %s",
 					work_tree, strerror(errno));
-		if (mkdir(work_tree, 0755))
+		if (!dest_exists && mkdir(work_tree, 0755))
 			die("could not create work tree dir '%s': %s.",
 					work_tree, strerror(errno));
 		set_git_work_tree(work_tree);
diff --git a/dir.c b/dir.c
index 0131983..bd97e50 100644
--- a/dir.c
+++ b/dir.c
@@ -779,6 +779,25 @@ int is_inside_dir(const char *dir)
 	return get_relative_cwd(buffer, sizeof(buffer), dir) != NULL;
 }
 
+int is_empty_dir(const char *path)
+{
+	DIR *dir = opendir(path);
+	struct dirent *e;
+	int ret = 1;
+
+	if (!dir)
+		return 0;
+
+	while ((e = readdir(dir)) != NULL)
+		if (!is_pseudo_dir_name(e->d_name)) {
+			ret = 0;
+			break;
+		}
+
+	closedir(dir);
+	return ret;
+}
+
 int remove_dir_recursively(struct strbuf *path, int only_empty)
 {
 	DIR *dir = opendir(path->buf);
diff --git a/dir.h b/dir.h
index 768425a..940e057 100644
--- a/dir.h
+++ b/dir.h
@@ -77,6 +77,14 @@ extern int file_exists(const char *);
 extern char *get_relative_cwd(char *buffer, int size, const char *dir);
 extern int is_inside_dir(const char *dir);
 
+static inline int is_pseudo_dir_name(const char *name)
+{
+	return name[0] == '.' && (name[1] == '\0' ||
+		(name[1] == '.' && name[2] == '\0')); /* "." and ".." */
+}
+
+extern int is_empty_dir(const char *dir);
+
 extern void setup_standard_excludes(struct dir_struct *dir);
 extern int remove_dir_recursively(struct strbuf *path, int only_empty);
 
-- 
1.6.1.77.g84c9

* [PATCH 2/2] Use is_pseudo_dir_name everywhere
From: Alexander Potashev @ 2009-01-08 23:24 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List, Alexander Potashev
In-Reply-To: <1231457063-29186-2-git-send-email-aspotashev@gmail.com>

Signed-off-by: Alexander Potashev <aspotashev@gmail.com>
---
 builtin-count-objects.c |    5 ++---
 builtin-fsck.c          |   14 ++++----------
 builtin-prune.c         |   14 ++++----------
 builtin-rerere.c        |   11 +++++------
 dir.c                   |   12 ++++--------
 entry.c                 |    5 ++---
 remote.c                |    6 ++----
 transport.c             |    4 +---
 8 files changed, 24 insertions(+), 47 deletions(-)

diff --git a/builtin-count-objects.c b/builtin-count-objects.c
index ab35b65..492a173 100644
--- a/builtin-count-objects.c
+++ b/builtin-count-objects.c
@@ -5,6 +5,7 @@
  */
 
 #include "cache.h"
+#include "dir.h"
 #include "builtin.h"
 #include "parse-options.h"
 
@@ -21,9 +22,7 @@ static void count_objects(DIR *d, char *path, int len, int verbose,
 		const char *cp;
 		int bad = 0;
 
-		if ((ent->d_name[0] == '.') &&
-		    (ent->d_name[1] == 0 ||
-		     ((ent->d_name[1] == '.') && (ent->d_name[2] == 0))))
+		if (is_pseudo_dir_name(ent->d_name))
 			continue;
 		for (cp = ent->d_name; *cp; cp++) {
 			int ch = *cp;
diff --git a/builtin-fsck.c b/builtin-fsck.c
index 297b2c4..291ca8e 100644
--- a/builtin-fsck.c
+++ b/builtin-fsck.c
@@ -10,6 +10,7 @@
 #include "tree-walk.h"
 #include "fsck.h"
 #include "parse-options.h"
+#include "dir.h"
 
 #define REACHABLE 0x0001
 #define SEEN      0x0002
@@ -395,19 +396,12 @@ static void fsck_dir(int i, char *path)
 	while ((de = readdir(dir)) != NULL) {
 		char name[100];
 		unsigned char sha1[20];
-		int len = strlen(de->d_name);
 
-		switch (len) {
-		case 2:
-			if (de->d_name[1] != '.')
-				break;
-		case 1:
-			if (de->d_name[0] != '.')
-				break;
+		if (is_pseudo_dir_name(de->d_name))
 			continue;
-		case 38:
+		if (strlen(de->d_name) == 38) {
 			sprintf(name, "%02x", i);
-			memcpy(name+2, de->d_name, len+1);
+			memcpy(name+2, de->d_name, 39);
 			if (get_sha1_hex(name, sha1) < 0)
 				break;
 			add_sha1_list(sha1, DIRENT_SORT_HINT(de));
diff --git a/builtin-prune.c b/builtin-prune.c
index 7b4ec80..06b61ea 100644
--- a/builtin-prune.c
+++ b/builtin-prune.c
@@ -5,6 +5,7 @@
 #include "builtin.h"
 #include "reachable.h"
 #include "parse-options.h"
+#include "dir.h"
 
 static const char * const prune_usage[] = {
 	"git prune [-n] [-v] [--expire <time>] [--] [<head>...]",
@@ -61,19 +62,12 @@ static int prune_dir(int i, char *path)
 	while ((de = readdir(dir)) != NULL) {
 		char name[100];
 		unsigned char sha1[20];
-		int len = strlen(de->d_name);
 
-		switch (len) {
-		case 2:
-			if (de->d_name[1] != '.')
-				break;
-		case 1:
-			if (de->d_name[0] != '.')
-				break;
+		if (is_pseudo_dir_name(de->d_name))
 			continue;
-		case 38:
+		if (strlen(de->d_name) == 38) {
 			sprintf(name, "%02x", i);
-			memcpy(name+2, de->d_name, len+1);
+			memcpy(name+2, de->d_name, 39);
 			if (get_sha1_hex(name, sha1) < 0)
 				break;
 
diff --git a/builtin-rerere.c b/builtin-rerere.c
index d4dec6b..1ac5225 100644
--- a/builtin-rerere.c
+++ b/builtin-rerere.c
@@ -1,5 +1,6 @@
 #include "builtin.h"
 #include "cache.h"
+#include "dir.h"
 #include "string-list.h"
 #include "rerere.h"
 #include "xdiff/xdiff.h"
@@ -59,17 +60,15 @@ static void garbage_collect(struct string_list *rr)
 	git_config(git_rerere_gc_config, NULL);
 	dir = opendir(git_path("rr-cache"));
 	while ((e = readdir(dir))) {
-		const char *name = e->d_name;
-		if (name[0] == '.' &&
-		    (name[1] == '\0' || (name[1] == '.' && name[2] == '\0')))
+		if (is_pseudo_dir_name(e->d_name))
 			continue;
-		then = rerere_created_at(name);
+		then = rerere_created_at(e->d_name);
 		if (!then)
 			continue;
-		cutoff = (has_resolution(name)
+		cutoff = (has_resolution(e->d_name)
 			  ? cutoff_resolve : cutoff_noresolve);
 		if (then < now - cutoff * 86400)
-			string_list_append(name, &to_remove);
+			string_list_append(e->d_name, &to_remove);
 	}
 	for (i = 0; i < to_remove.nr; i++)
 		unlink_rr_item(to_remove.items[i].string);
diff --git a/dir.c b/dir.c
index bd97e50..cdd3beb 100644
--- a/dir.c
+++ b/dir.c
@@ -585,10 +585,8 @@ static int read_directory_recursive(struct dir_struct *dir, const char *path, co
 			int len, dtype;
 			int exclude;
 
-			if ((de->d_name[0] == '.') &&
-			    (de->d_name[1] == 0 ||
-			     !strcmp(de->d_name + 1, ".") ||
-			     !strcmp(de->d_name + 1, "git")))
+			if (is_pseudo_dir_name(de->d_name) ||
+			     !strcmp(de->d_name, ".git"))
 				continue;
 			len = strlen(de->d_name);
 			/* Ignore overly long pathnames! */
@@ -812,10 +810,8 @@ int remove_dir_recursively(struct strbuf *path, int only_empty)
 	len = path->len;
 	while ((e = readdir(dir)) != NULL) {
 		struct stat st;
-		if ((e->d_name[0] == '.') &&
-		    ((e->d_name[1] == 0) ||
-		     ((e->d_name[1] == '.') && e->d_name[2] == 0)))
-			continue; /* "." and ".." */
+		if (is_pseudo_dir_name(e->d_name))
+			continue;
 
 		strbuf_setlen(path, len);
 		strbuf_addstr(path, e->d_name);
diff --git a/entry.c b/entry.c
index aa2ee46..9c6a9cf 100644
--- a/entry.c
+++ b/entry.c
@@ -1,5 +1,6 @@
 #include "cache.h"
 #include "blob.h"
+#include "dir.h"
 
 static void create_directories(const char *path, const struct checkout *state)
 {
@@ -62,9 +63,7 @@ static void remove_subtree(const char *path)
 	*name++ = '/';
 	while ((de = readdir(dir)) != NULL) {
 		struct stat st;
-		if ((de->d_name[0] == '.') &&
-		    ((de->d_name[1] == 0) ||
-		     ((de->d_name[1] == '.') && de->d_name[2] == 0)))
+		if (is_pseudo_dir_name(de->d_name))
 			continue;
 		strcpy(name, de->d_name);
 		if (lstat(pathbuf, &st))
diff --git a/remote.c b/remote.c
index 570e112..2fb5143 100644
--- a/remote.c
+++ b/remote.c
@@ -4,6 +4,7 @@
 #include "commit.h"
 #include "diff.h"
 #include "revision.h"
+#include "dir.h"
 
 static struct refspec s_tag_refspec = {
 	0,
@@ -634,10 +635,7 @@ static struct refspec *parse_push_refspec(int nr_refspec, const char **refspec)
 
 static int valid_remote_nick(const char *name)
 {
-	if (!name[0] || /* not empty */
-	    (name[0] == '.' && /* not "." */
-	     (!name[1] || /* not ".." */
-	      (name[1] == '.' && !name[2]))))
+	if (!name[0] || is_pseudo_dir_name(name))
 		return 0;
 	return !strchr(name, '/'); /* no slash */
 }
diff --git a/transport.c b/transport.c
index 56831c5..d4e3c25 100644
--- a/transport.c
+++ b/transport.c
@@ -50,9 +50,7 @@ static int read_loose_refs(struct strbuf *path, int name_offset,
 	memset (&list, 0, sizeof(list));
 
 	while ((de = readdir(dir))) {
-		if (de->d_name[0] == '.' && (de->d_name[1] == '\0' ||
-				(de->d_name[1] == '.' &&
-				 de->d_name[2] == '\0')))
+		if (is_pseudo_dir_name(de->d_name))
 			continue;
 		ALLOC_GROW(list.entries, list.nr + 1, list.alloc);
 		list.entries[list.nr++] = xstrdup(de->d_name);
-- 
1.6.1.77.g84c9
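
For readers who want to try the helper outside the git tree, here is a
minimal standalone sketch.  The predicate is copied from the patch;
count_real_entries() is a hypothetical helper written only for this
illustration and is not part of the series.

```c
#include <dirent.h>
#include <stddef.h>

/* Same predicate as in the patch: true only for "." and "..". */
static int is_pseudo_dir_name(const char *name)
{
	return name[0] == '.' && (name[1] == '\0' ||
		(name[1] == '.' && name[2] == '\0'));
}

/*
 * Hypothetical helper (not part of the patch): count the real entries
 * of a directory, skipping "." and "..".  Returns -1 if the directory
 * cannot be opened.
 */
static int count_real_entries(const char *path)
{
	DIR *dir = opendir(path);
	struct dirent *e;
	int n = 0;

	if (!dir)
		return -1;
	while ((e = readdir(dir)) != NULL)
		if (!is_pseudo_dir_name(e->d_name))
			n++;
	closedir(dir);
	return n;
}
```

Note that ".git" and "..." are ordinary names as far as the predicate
is concerned, which is why the dir.c hunk still has to test for ".git"
separately.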

^ permalink raw reply related

* Re: gitweb index performance (Re: [PATCH] gitweb: support the rel=vcs-* microformat)
From: J.H. @ 2009-01-08 23:53 UTC (permalink / raw)
  To: Joey Hess; +Cc: git
In-Reply-To: <20090108195446.GB18025@gnu.kitenet.net>

Joey Hess wrote:
> Giuseppe Bilotta wrote:
>   
>>> There is a small overhead in including the microformat on project list
>>> and forks list pages, but getting the project descriptions for those pages
>>> already incurs a similar overhead, and the ability to get every repo url
>>> in one place seems worthwhile.
>>>       
>> I agree with this, although people with very large project lists may
>> differ ... do we have timings on these?
>>     
>
> AFAICS, when displaying the project list, gitweb reads each project's
> description file, falling back to reading its config file if there is no
> description file.
>
> If performance was a problem here, the thing to do would be to add
> project descriptions to the $project_list file, and use those in
> preference to the description files. If a large site has done that,
> they've not sent in the patch. :-)
>   

No, because all the large sites have pain points and issues elsewhere in 
the app.  Most of the large sites (I can at least speak for Kernel.org) 
have built full caching layers into gitweb itself to deal with the 
problem.  This means that we don't have to worry about nickel-and-dime 
performance improvements that are specific to one section, but can do a 
very broad sweep and get dramatically better performance across all of 
gitweb.  Those patches have all made it back out onto the mailing list, 
but for a number of different reasons none have been accepted into the 
mainline branch.

> With my patch, it will read each cloneurl file too. The best way to
> optimise that for large sites seems to be to add an option that would
> ignore the cloneurl files and config file and always use
> @git_base_url_list.
>
> I checked the only large site I have access to (git.debian.org) and they
> use a $project_list file, but I see no other performance tuning. That's
> a 2 ghz machine; it takes gitweb 28 (!) seconds to generate the nearly 1
> MB index web page for 1671 repositories:
>   

Look at either Lea's or my caching engines, it will help dramatically on 
something of that size.

> /srv/git.debian.org/http/cgi-bin/gitweb.cgi  3.04s user 9.24s system 43% cpu 28.515 total
>
> Notice that most of the time is spent by child processes. For each
> repository, gitweb runs git-for-each-ref to determine the time of the
> last commit.
>
> If that is removed (say if there were a way to get the info w/o
> forking), performance improves nicely:
>
> ./gitweb.cgi > /dev/null  1.29s user 1.08s system 69% cpu 3.389 total
>
> Making it not read description files for each project, as I suggest above,
> is the next best optimisation:
>
> ./gitweb.cgi > /dev/null  1.08s user 0.05s system 96% cpu 1.170 total
>
> So, I think it makes sense to optimise gitweb and offer knobs for performance
> tuning at the expense of the flexibility of description and cloneurl files.
> But, git-for-each-ref is swamping everything else

The problem is that the knobs are going to be very fine-grained; you 
really are better off looking at one of the caching engines that is 
available now.  Performance options are hard because it's difficult to 
convey the complex tradeoffs to anyone, so keeping knobs like that to a 
minimum is really a necessity.

- John 'Warthog9' Hawley

^ permalink raw reply

* Re: [RFC/PATCH 2/3] replace_object: add mechanism to replace objects found in "refs/replace/"
From: Junio C Hamano @ 2009-01-08 23:55 UTC (permalink / raw)
  To: Christian Couder; +Cc: git
In-Reply-To: <200901081831.22616.chriscool@tuxfamily.org>

Christian Couder <chriscool@tuxfamily.org> writes:

> Yeah, but read_sha1_file is called to read all object files, not just 
> commits. So putting the hook there will:
>
> 	1) add a lookup overhead when reading any object,
> 	2) make it possible to replace any object,

I actually see (2) as an improvement, and (1) as an associated cost.

> And there is also the following problem:
>
> 	3) this function is often called like this:
>
> 	buffer = read_sha1_file(sha1, &type, &size);
> 	if (!buffer)
> 		die("Cannot read %s", sha1_to_hex(sha1));
>
> 	so in case of error, it will give an error message with a bad sha1
> 	in it because the sha1 of the file that we cannot read is the sha1
> 	in the replace ref not the one passed to read_sha1_file.

You have refs/replace/$A that records $B, to tell git that the real object
$A in the history is replaced by another object $B.  The caller feeds $A
in the above snippet to read_sha1_file(), and your read_sha1_file()
notices that it needs to read $B instead, returns the buffer from the
object $B, and reports its type and size.  If there is no $B available, it
may return NULL and the caller says "I asked for $A but in this repository
I cannot get to it".  That sounds consistent to me, but I agree it would
be more helpful to report "and the reason why I cannot get to it is
because you have replacement defined as $B which you do not have."

> To avoid the above problems, maybe we can try to also improve what 
> read_sha1_file does:
>
> 1) allow callers to pass a type in the "type" argument and only lookup in 
> the replace refs if we say we want a commit, but this makes calling this 
> function more error prone

This is debatable, but can go either way.

> 2) when we say we want an object with a given type, check if the object we 
> read has this type (and die if not)

That we already do anyway, don't we?  parse_commit() gets data from
read_sha1_file() and would complain if it gets a blob, etc.

> 3) die in read_sha1_file when there is an error and we are replacing so that 
> callers don't need to die themselves and so that we can always report an 
> accurate sha1 in the error message

I expect the use of grafts and object replacement (or if you insist,
"commit replacement") to be rather rare, and I think it is probably OK to
declare it a grave repository misconfiguration if somebody claims that $A
is replaced by $B without actually having $B:

	void *read_sha1_file(const unsigned char *sha1,
			     enum object_type *type, unsigned long *size)
	{
		void *data;
		const unsigned char *replacement = lookup_replace_object(sha1);
		if (replacement) {
			data = read_sha1_file(replacement, type, size);
			if (!data)
				die("replacement %s not found for %s",
				    sha1_to_hex(replacement),
				    sha1_to_hex(sha1));
		} else {
			data = read_object(sha1, type, size);
		}
		... existing code ...
		return data;
	}


To disable replacement for connectivity walkers, lookup_replace_object()
can look at some global variable defined in environment.c, perhaps.
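
A toy sketch of that last idea, with every name assumed for
illustration only: the table stands in for refs/replace/, object names
are plain strings rather than SHA-1s, and read_replace_refs is a
hypothetical global switch, not an existing variable.

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a refs/replace/ table: object name -> replacement. */
struct replace_entry {
	const char *original;
	const char *replacement;
};

static const struct replace_entry replace_table[] = {
	{ "A", "B" },
};

/* Hypothetical global switch; connectivity walkers would clear it. */
static int read_replace_refs = 1;

/* Returns the replacement name, or NULL if none applies. */
static const char *lookup_replace_object(const char *name)
{
	size_t i;

	if (!read_replace_refs)
		return NULL;
	for (i = 0; i < sizeof(replace_table) / sizeof(replace_table[0]); i++)
		if (!strcmp(replace_table[i].original, name))
			return replace_table[i].replacement;
	return NULL;
}
```

With the switch cleared, a lookup for "A" returns NULL and the caller
falls through to reading the real object, which is what a connectivity
walker would want.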

^ permalink raw reply

* [RFC PATCH] make diff --color-words customizable
From: Thomas Rast @ 2009-01-09  0:05 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin

Allows for user-configurable word splits when using --color-words.
This can make the diff more readable if the regex is configured
according to the language of the file.

For now the (POSIX extended) regex must be set via the environment
variable GIT_DIFF_WORDS_REGEX.  Each (non-overlapping) match of the
regex is considered a word.  Any characters not matched are considered
whitespace.  For example, for C try

  GIT_DIFF_WORDS_REGEX='[0-9]+|[a-zA-Z_][a-zA-Z0-9_]*|(\+|-|&|\|){1,2}|\S'

and for TeX try

  GIT_DIFF_WORDS_REGEX='\\[a-zA-Z@]+ *|\{|\}|\\.|[^\{} [:space:]]+'

Signed-off-by: Thomas Rast <trast@student.ethz.ch>

---

Word diff becomes much more useful especially with TeX, where it is
common to run together \sequences\of\commands\like\this that the
current --color-words treats as a single word.

Apart from possible bugs, the main issue is: where should I put the
configuration for this?
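
To make the "each non-overlapping match is a word" rule concrete, here
is a small standalone sketch, separate from the patch.  count_words()
is a hypothetical name, and the patterns below use [^[:space:]]+
because \S is a GNU extension rather than part of POSIX extended
regular expressions.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Count the non-overlapping matches ("words") of a POSIX extended
 * regex in text.  Returns -1 if the pattern does not compile.
 */
static int count_words(const char *pattern, const char *text)
{
	regex_t re;
	regmatch_t m;
	int count = 0;

	if (regcomp(&re, pattern, REG_EXTENDED))
		return -1;
	while (*text && regexec(&re, text, 1, &m, 0) == 0) {
		if (m.rm_eo == m.rm_so) {
			/* Empty match: step one byte to guarantee progress. */
			text++;
			continue;
		}
		count++;
		text += m.rm_eo;
	}
	regfree(&re);
	return count;
}
```

With the whitespace-only pattern, the text \foo\bar baz splits into
two words; with a pattern that also starts a word at each backslash
command, it splits into three, which is the TeX case described above.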


 diff.c |  142 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++-------
 1 files changed, 127 insertions(+), 15 deletions(-)

diff --git a/diff.c b/diff.c
index d235482..c1e24de 100644
--- a/diff.c
+++ b/diff.c
@@ -321,6 +321,7 @@ struct diff_words_buffer {
 	long alloc;
 	long current; /* output pointer */
 	int suppressed_newline;
+	enum diff_word_boundaries *boundaries;
 };
 
 static void diff_words_append(char *line, unsigned long len,
@@ -336,21 +337,35 @@ static void diff_words_append(char *line, unsigned long len,
 	buffer->text.size += len;
 }
 
+enum diff_word_boundaries {
+	DIFF_WORD_CONT,
+	DIFF_WORD_START,
+	DIFF_WORD_SPACE
+};
+
+
 struct diff_words_data {
 	struct diff_words_buffer minus, plus;
 	FILE *file;
+	enum diff_word_boundaries *minus_boundaries, *plus_boundaries;
 };
 
-static void print_word(FILE *file, struct diff_words_buffer *buffer, int len, int color,
+static int print_word(FILE *file, struct diff_words_buffer *buffer, int len, int color,
 		int suppress_newline)
 {
 	const char *ptr;
 	int eol = 0;
 
 	if (len == 0)
-		return;
+		return len;
 
 	ptr  = buffer->text.ptr + buffer->current;
+
+	if (buffer->boundaries[buffer->current+len-1] == DIFF_WORD_START) {
+		buffer->boundaries[buffer->current+len-1] = DIFF_WORD_CONT;
+		len--;
+	}
+
 	buffer->current += len;
 
 	if (ptr[len - 1] == '\n') {
@@ -368,6 +383,8 @@ static void print_word(FILE *file, struct diff_words_buffer *buffer, int len, in
 		else
 			putc('\n', file);
 	}
+
+	return len;
 }
 
 static void fn_out_diff_words_aux(void *priv, char *line, unsigned long len)
@@ -391,13 +408,79 @@ static void fn_out_diff_words_aux(void *priv, char *line, unsigned long len)
 				   &diff_words->plus, len, DIFF_FILE_NEW, 0);
 			break;
 		case ' ':
-			print_word(diff_words->file,
-				   &diff_words->plus, len, DIFF_PLAIN, 0);
+			len = print_word(diff_words->file,
+					 &diff_words->plus, len, DIFF_PLAIN, 0);
 			diff_words->minus.current += len;
 			break;
 	}
 }
 
+static char *worddiff_default = "\\S+";
+static regex_t worddiff_regex;
+static int worddiff_regex_compiled = 0;
+
+static int scan_word_boundaries(struct diff_words_buffer *buf)
+{
+	enum diff_word_boundaries *boundaries = buf->boundaries;
+	char *text = buf->text.ptr;
+	int len = buf->text.size;
+
+	int i = 0;
+	int count = 0;
+	int ret;
+	regmatch_t matches[1];
+	int offset, wordlen;
+	char *strz;
+
+	if (!text)
+		return 0;
+
+	if (!worddiff_regex_compiled) {
+		char *wd_pat = getenv("GIT_DIFF_WORDS_REGEX");
+		if (!wd_pat)
+			wd_pat = worddiff_default;
+		ret = regcomp(&worddiff_regex, wd_pat, REG_EXTENDED);
+		if (ret) {
+			char errbuf[1024];
+			regerror(ret, &worddiff_regex, errbuf, 1024);
+			die("word diff regex failed to compile: '%s': %s",
+			    wd_pat, errbuf);
+		}
+		worddiff_regex_compiled = 1;
+	}
+
+	strz = xmalloc(len+1);
+	memcpy(strz, text, len);
+	strz[len] = '\0';
+
+	while (i < len) {
+		ret = regexec(&worddiff_regex, strz+i, 1, matches, 0);
+		if (ret == REG_NOMATCH) {
+			/* the rest is whitespace */
+			while (i < len)
+				boundaries[i++] = DIFF_WORD_SPACE;
+			break;
+		}
+
+		offset = matches[0].rm_so;
+		while (offset-- > 0 && i < len)
+			boundaries[i++] = DIFF_WORD_SPACE;
+
+		wordlen = matches[0].rm_eo - matches[0].rm_so;
+		if (wordlen-- > 0 && i < len) {
+			boundaries[i++] = DIFF_WORD_START;
+			count++;
+		}
+		while (wordlen-- > 0 && i < len)
+			boundaries[i++] = DIFF_WORD_CONT;
+	}
+
+	free(strz);
+
+	return count;
+}
+
+
 /* this executes the word diff on the accumulated buffers */
 static void diff_words_show(struct diff_words_data *diff_words)
 {
@@ -406,23 +489,50 @@ static void diff_words_show(struct diff_words_data *diff_words)
 	xdemitcb_t ecb;
 	mmfile_t minus, plus;
 	int i;
+	char *p;
+	int bcount;
 
 	memset(&xpp, 0, sizeof(xpp));
 	memset(&xecfg, 0, sizeof(xecfg));
-	minus.size = diff_words->minus.text.size;
-	minus.ptr = xmalloc(minus.size);
-	memcpy(minus.ptr, diff_words->minus.text.ptr, minus.size);
-	for (i = 0; i < minus.size; i++)
-		if (isspace(minus.ptr[i]))
-			minus.ptr[i] = '\n';
+
+	diff_words->minus.boundaries = xmalloc(diff_words->minus.text.size * sizeof(enum diff_word_boundaries));
+	bcount = scan_word_boundaries(&diff_words->minus);
+	minus.size = diff_words->minus.text.size + bcount;
+	minus.ptr = xmalloc(minus.size + bcount);
+	p = minus.ptr;
+	for (i = 0; i < diff_words->minus.text.size; i++) {
+		switch (diff_words->minus.boundaries[i]) {
+		case DIFF_WORD_START:
+			*p++ = '\n';
+			/* fall through */
+		case DIFF_WORD_CONT:
+			*p++ = diff_words->minus.text.ptr[i];
+			break;
+		case DIFF_WORD_SPACE:
+			*p++ = '\n';
+			break;
+		}
+	}
 	diff_words->minus.current = 0;
 
-	plus.size = diff_words->plus.text.size;
+	diff_words->plus.boundaries = xmalloc(diff_words->plus.text.size * sizeof(enum diff_word_boundaries));
+	bcount = scan_word_boundaries(&diff_words->plus);
+	plus.size = diff_words->plus.text.size + bcount;
 	plus.ptr = xmalloc(plus.size);
-	memcpy(plus.ptr, diff_words->plus.text.ptr, plus.size);
-	for (i = 0; i < plus.size; i++)
-		if (isspace(plus.ptr[i]))
-			plus.ptr[i] = '\n';
+	p = plus.ptr;
+	for (i = 0; i < diff_words->plus.text.size; i++) {
+		switch (diff_words->plus.boundaries[i]) {
+		case DIFF_WORD_START:
+			*p++ = '\n';
+			/* fall through */
+		case DIFF_WORD_CONT:
+			*p++ = diff_words->plus.text.ptr[i];
+			break;
+		case DIFF_WORD_SPACE:
+			*p++ = '\n';
+			break;
+		}
+	}
 	diff_words->plus.current = 0;
 
 	xpp.flags = XDF_NEED_MINIMAL;
@@ -432,6 +542,8 @@ static void diff_words_show(struct diff_words_data *diff_words)
 	free(minus.ptr);
 	free(plus.ptr);
 	diff_words->minus.text.size = diff_words->plus.text.size = 0;
+	free(diff_words->minus.boundaries);
+	free(diff_words->plus.boundaries);
 
 	if (diff_words->minus.suppressed_newline) {
 		putc('\n', diff_words->file);
-- 
tg: (c123b7c..) t/word-diff-regex (depends on: origin/master)

^ permalink raw reply related

* git-cache-meta -- simple file meta data caching and applying
From: jidanni @ 2009-01-09  0:13 UTC (permalink / raw)
  To: git

Gentlemen, I have whipped up this:

#!/bin/sh -e
#git-cache-meta -- simple file meta data caching and applying.
#Simpler than etckeeper, metastore, setgitperms, etc.
: ${GIT_CACHE_META_FILE=.git_cache_meta}
case $@ in
    --store|--stdout)
	case $1 in --store) exec > $GIT_CACHE_META_FILE; esac
	find $(git ls-files) \
	    \( -user ${USER?} -o -printf 'chowm %u %p\n' \) \
	    \( -group $USER -o -printf 'chgrp %g %p\n' \) \
	    \( \( -type l -o -perm 755 -o -perm 644 \) -o -printf 'chmod %#m %p\n' \);;
    --apply) sh -e $GIT_CACHE_META_FILE;;
    *) 1>&2 echo "Usage: $0 --store|--stdout|--apply"; exit 1;;
esac

^ permalink raw reply

* Re: gitweb index performance (Re: [PATCH] gitweb: support the rel=vcs-* microformat)
From: Miklos Vajna @ 2009-01-09  0:16 UTC (permalink / raw)
  To: J.H., git; +Cc: Joey Hess
In-Reply-To: <496691EC.1070805@eaglescrag.net>

On Thu, Jan 08, 2009 at 03:53:16PM -0800, "J.H." <warthog19@eaglescrag.net> wrote:
> Look at either Lea's or my caching engines, it will help dramatically on 
> something of that size.

repo.or.cz uses a single patch for caching the project list only:

http://repo.or.cz/w/git/repo.git?a=commit;h=152fb0b22d36c6981ac3c4403b69ad91b27a1bc6

you are probably better off with such a small patch instead of using a
gitweb fork.

^ permalink raw reply

* Re: gitweb index performance (Re: [PATCH] gitweb: support the rel=vcs-* microformat)
From: Johannes Schindelin @ 2009-01-09  0:19 UTC (permalink / raw)
  To: J.H.; +Cc: Joey Hess, git
In-Reply-To: <496691EC.1070805@eaglescrag.net>

Hi,

On Thu, 8 Jan 2009, J.H. wrote:

> Look at either Lea's or my caching engines, it will help dramatically on 
> something of that size.

Speaking of which, do you have any performance comparisons between the 
two?

Ciao,
Dscho

^ permalink raw reply

* Re: git-cache-meta -- simple file meta data caching and applying
From: Jay Soffian @ 2009-01-09  0:22 UTC (permalink / raw)
  To: jidanni; +Cc: git
In-Reply-To: <87hc49jq04.fsf@jidanni.org>

On Thu, Jan 8, 2009 at 7:13 PM,  <jidanni@jidanni.org> wrote:
> Gentlemen, I have whipped up this:
>
> #!/bin/sh -e
> #git-cache-meta -- simple file meta data caching and applying.
> #Simpler than etckeeper, metastore, setgitperms, etc.
> : ${GIT_CACHE_META_FILE=.git_cache_meta}
> case $@ in
>    --store|--stdout)
>        case $1 in --store) exec > $GIT_CACHE_META_FILE; esac
>        find $(git ls-files) \
>            \( -user ${USER?} -o -printf 'chowm %u %p\n' \) \
>            \( -group $USER -o -printf 'chgrp %g %p\n' \) \
>            \( \( -type l -o -perm 755 -o -perm 644 \) -o -printf 'chmod %#m %p\n' \);;
>    --apply) sh -e $GIT_CACHE_META_FILE;;
>    *) 1>&2 echo "Usage: $0 --store|--stdout|--apply"; exit 1;;
> esac

It doesn't handle paths which contain white-space. "chown" is typo'd
as "chowm". To be useful, the contribution might also include
instructions on how it should be used with git, and perhaps also
reasoning for why someone would want to use it in place of etckeeper,
metastore, setgitperms, etc.
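
On the white-space point: any tool that writes "chmod MODE PATH" lines
for sh to replay later has to quote each path.  A minimal sketch of one
common approach (wrap the path in single quotes and escape embedded
single quotes) follows; shell_quote() is a hypothetical helper written
for this mail, not something the script or git provides under that
name.

```c
#include <stddef.h>

/*
 * Write src to dst wrapped in single quotes so that a POSIX shell
 * reads it back as one word.  Embedded single quotes become '\''.
 * Returns the length written (excluding the NUL), or -1 if dst is
 * too small.
 */
static int shell_quote(char *dst, size_t dstlen, const char *src)
{
	size_t n = 0;

/* Append one byte, always leaving room for the trailing NUL. */
#define PUT(c) do { if (n + 1 >= dstlen) return -1; dst[n++] = (c); } while (0)
	PUT('\'');
	for (; *src; src++) {
		if (*src == '\'') {
			PUT('\''); PUT('\\'); PUT('\''); PUT('\'');
		} else {
			PUT(*src);
		}
	}
	PUT('\'');
#undef PUT
	dst[n] = '\0';
	return (int)n;
}
```

A path such as "my file" would then be stored as 'my file', and the
--apply step would see it as a single argument.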

j.

^ permalink raw reply

* Re: [RFC PATCH] make diff --color-words customizable
From: Johannes Schindelin @ 2009-01-09  0:25 UTC (permalink / raw)
  To: Thomas Rast; +Cc: git
In-Reply-To: <1231459505-14395-1-git-send-email-trast@student.ethz.ch>

Hi,

On Fri, 9 Jan 2009, Thomas Rast wrote:

> Allows for user-configurable word splits when using --color-words. This 
> can make the diff more readable if the regex is configured according to 
> the language of the file.
> 
> For now the (POSIX extended) regex must be set via the environment
> variable GIT_DIFF_WORDS_REGEX.  Each (non-overlapping) match of the
> regex is considered a word.  Any characters not matched are considered
> whitespace.  For example, for C try
> 
>   GIT_DIFF_WORDS_REGEX='[0-9]+|[a-zA-Z_][a-zA-Z0-9_]*|(\+|-|&|\|){1,2}|\S'
> 
> and for TeX try
> 
>   GIT_DIFF_WORDS_REGEX='\\[a-zA-Z@]+ *|\{|\}|\\.|[^\{} [:space:]]+'

Interesting idea.  However, I think it would be better to do the opposite, 
have _word_ patterns.  And even better to have _one_ pattern.

Then we could have a --color-words-regex=<regex> option.

BTW I think you could do what you intended to do with a _way_ smaller 
and more intuitive patch.

Ciao,
Dscho

^ permalink raw reply

* Re: gitweb index performance (Re: [PATCH] gitweb: support the rel=vcs-* microformat)
From: J.H. @ 2009-01-09  0:26 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Joey Hess, git
In-Reply-To: <alpine.DEB.1.00.0901090118431.30769@pacific.mpi-cbg.de>

Johannes Schindelin wrote:
> Hi,
>
> On Thu, 8 Jan 2009, J.H. wrote:
>
>   
>> Look at either Lea's or my caching engines, it will help dramatically on 
>> something of that size.
>>     
>
> Speaking of which, do you have any performance comparisons between the 
> two?
>   

Lea's got some.  I can see if I can dig up my copy (or if she's paying 
attention maybe she can publish them), though either one is orders of 
magnitude faster than the normal code.  Beyond that, which one is faster 
goes back and forth, mainly because of the different approaches we each 
took to the caching.  Generally speaking I would push people more 
towards Lea's work than mine; if nothing else, hers is more in line with 
current gitweb, though I have had some thoughts about undoing my file 
breakout and getting my code base back up to speed.

- John 'Warthog9' Hawley

^ permalink raw reply

* Re: git-cache-meta -- simple file meta data caching and applying
From: Jay Soffian @ 2009-01-09  0:28 UTC (permalink / raw)
  To: jidanni; +Cc: git
In-Reply-To: <76718490901081622q618c43d0t333882cbe44f6b30@mail.gmail.com>

On Thu, Jan 8, 2009 at 7:22 PM, Jay Soffian <jaysoffian@gmail.com> wrote:
> It doesn't handle paths which contain white-space. "chown" is typo'd
> as "chowm". To be useful, the contribution might also include
> instructions on how it should be used with git, and perhaps also
> reasoning for why someone would want to use it in place of etckeeper,
> metastore, setgitperms, etc.

It will also blow up if the output of "git ls-files" exceeds the
system limit on the number of command-line arguments. Also, it might
be worth mentioning that it requires GNU find.

j.

^ permalink raw reply

