Git development
* Re: gitweb index performance (Re: [PATCH] gitweb: support the rel=vcs-* microformat)
From: Jakub Narebski @ 2009-01-10  1:44 UTC
  To: J.H.; +Cc: Joey Hess, git, Giuseppe Bilotta
In-Reply-To: <496691EC.1070805@eaglescrag.net>

"J.H." <warthog19@eaglescrag.net> writes:
> Joey Hess wrote:
>> Giuseppe Bilotta wrote:
>>
>>>> There is a small overhead in including the microformat on project list
>>>> and forks list pages, but getting the project descriptions for those pages
>>>> already incurs a similar overhead, and the ability to get every repo url
>>>> in one place seems worthwhile.
>>>>
>>> I agree with this, although people with very large project lists may
>>> differ ... do we have timings on these?
>>>
>>
>> AFAICS, when displaying the project list, gitweb reads each project's
>> description file, falling back to reading its config file if there is no
>> description file.
>>
>> If performance was a problem here, the thing to do would be to add
>> project descriptions to the $project_list file, and use those in
>> preference to the description files. If a large site has done that,
>> they've not sent in the patch. :-)
> 
> No, because all the large sites have pain points and issues elsewhere
> in the app.  Most of the large sites (I can at least speak for
> Kernel.org) have built full caching layers into gitweb
> itself to deal with the problem.  This means that we don't have to
> worry about nickel-and-dime performance improvements that are specific
> to one section, but can do a very broad sweep and get dramatically
> better performance across all of gitweb.  Those patches have all made
> it back out onto the mailing list, but for a number of different
> reasons none have been accepted into the mainline branch.

An additional issue is that when you add or delete a repository (project),
you have to correct or regenerate the projects_index file.  While that is,
I think, quite easy for git hosting sites such as repo.or.cz, it is
harder for sites which offer gitweb just like they offer WWW homepages:
as a service, with repositories created (and descriptions updated)
outside of gitweb's control.

-- 
Jakub Narebski
Poland
ShadeHawk on #git

* Re: [PATCH] Get format-patch to show first commit after root commit
From: Nathan W. Panike @ 2009-01-10  1:37 UTC
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vmye0yohu.fsf@gitster.siamese.dyndns.org>

Hi:

On Fri, Jan 9, 2009 at 6:49 PM, Junio C Hamano <gitster@pobox.com> wrote:

> I do not see anything special you do for "one commit" case in your patch,
> yet the proposed commit message keeps stressing "-1", which puzzles me.

I was trying to address the concerns Alexander brought up earlier
in the thread.

> Wouldn't it suffice to simply say something like:
>
>    You need to explicitly ask for --root to obtain a patch for the root
>    commit.  This may have been a good way to make sure that the user
>    realizes that a patch from the root commit won't be applicable to a
>    history with existing data, but we should assume the user knows what
>    he is doing when the user explicitly specifies a range of commits that
>    includes the root commit.
>

Indeed it would.  I was giving a specific case that shows what problem
this patch addresses.

> Three issues.
>
>  - The "if(){" violates style by not having one SP before "(" and after ")",
>   and surrounds a single statement with needless { } pair.  You need one SP
>   on each side of the = (assignment) as well.
>
>  - Because rev.show_root_diff is a no-op for non-root commit anyway, I do not
>   think you even want a conditional there.
>
>  - It is a bad style to muck with rev.* while it is actively used for
>   iteration (note that the above part is in a while loop that iterates over
>   &rev).

Thanks for the advice.  I shall adhere to it next time I submit a patch.

> I think the attached would be a better patch.  We already have a
> configuration to control if we show the patch for a root commit by
> default, and we can reuse it here.  The configuration defaults to true
> these days.

I did not realize this configuration was available.  The patch below
is much more elegant.

> Because the code before the hunk must check if the user said "--root
> commit" or just "commit" from the command line and behave quite
> differently by looking at rev.show_root_diff, we cannot do this assignment
> before the command line parsing like other commands in the log family.
>
>  builtin-log.c |    8 ++++++++
>  1 files changed, 8 insertions(+), 0 deletions(-)
>
> diff --git c/builtin-log.c w/builtin-log.c
> index 4a02ee9..2d2c111 100644
> --- c/builtin-log.c
> +++ w/builtin-log.c
> @@ -935,6 +935,14 @@ int cmd_format_patch(int argc, const char **argv, const char *prefix)
>                 * get_revision() to do the usual traversal.
>                 */
>        }
> +
> +       /*
> +        * We cannot move this anywhere earlier because we do want to
> +        * know if --root was given explicitly from the command line.
> +        */
> +       if (default_show_root)
> +               rev.show_root_diff = 1;
> +
>        if (cover_letter) {
>                /* remember the range */
>                int i;
>

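Junio's point about ordering can be illustrated with a small C model. This is a sketch, not builtin-log.c. The names struct opts, parse_args() and single_commit_means_since() are hypothetical, and only the one field that matters is modelled.

A lone <commit> argument to format-patch means "commits since <commit>" unless --root was given. The decision therefore has to look at show_root_diff before the configured default is folded in.

```c
#include <string.h>

/* Hypothetical stand-in for the single rev_info field that matters here. */
struct opts {
    int show_root_diff;
};

/* Toy parser: only "--root" is recognised; anything else is a revision. */
void parse_args(struct opts *o, int argc, const char **argv)
{
    for (int i = 0; i < argc; i++)
        if (strcmp(argv[i], "--root") == 0)
            o->show_root_diff = 1;
}

/*
 * Returns 1 if a single <commit> argument is read as "commits since
 * <commit>", 0 if it is read as "everything up to <commit>, from the
 * root".  `apply_default_first` chooses whether the configured default
 * (default_show_root in the patch) is applied before or after parsing.
 * The final show_root_diff value is stored in *show_root_diff_out.
 */
int single_commit_means_since(int default_show_root, int apply_default_first,
                              int argc, const char **argv,
                              int *show_root_diff_out)
{
    struct opts o = { 0 };
    int since;

    if (apply_default_first && default_show_root)
        o.show_root_diff = 1;   /* too early: now indistinguishable from --root */

    parse_args(&o, argc, argv);
    since = !o.show_root_diff;  /* must reflect only the command line */

    if (!apply_default_first && default_show_root)
        o.show_root_diff = 1;   /* safe: the decision has already been made */

    *show_root_diff_out = o.show_root_diff;
    return since;
}
```

With the default enabled and applied after parsing, "format-patch HEAD~3" still means "since HEAD~3", and the root diff is shown. Applied before parsing, the same command is misread as "--root HEAD~3".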
Thanks,

Nathan Panike

* Re: [EGIT PATCH 3/3] Present full name of file revision
From: Robin Rosenberg @ 2009-01-10  1:28 UTC
  To: spearce; +Cc: git
In-Reply-To: <1231550077-1057-4-git-send-email-robin.rosenberg@dewire.com>

On Saturday 10 January 2009 02:14:37, Robin Rosenberg wrote:
> The name need not be a path.

Drop that part of the comment. Not true.

-- robin

* [EGIT PATCH 3/3] Present full name of file revision
From: Robin Rosenberg @ 2009-01-10  1:14 UTC
  To: spearce; +Cc: git, Robin Rosenberg
In-Reply-To: <1231550077-1057-3-git-send-email-robin.rosenberg@dewire.com>

The name need not be a path.

Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
---
 .../core/internal/storage/GitFileRevision.java     |    5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/org.spearce.egit.core/src/org/spearce/egit/core/internal/storage/GitFileRevision.java b/org.spearce.egit.core/src/org/spearce/egit/core/internal/storage/GitFileRevision.java
index 21ba19e..c762e2e 100644
--- a/org.spearce.egit.core/src/org/spearce/egit/core/internal/storage/GitFileRevision.java
+++ b/org.spearce.egit.core/src/org/spearce/egit/core/internal/storage/GitFileRevision.java
@@ -32,7 +32,7 @@
 
 	/**
 	 * Obtain a file revision for a specific blob of an existing commit.
-	 * 
+	 *
 	 * @param db
 	 *            the repository this commit was loaded out of, and that this
 	 *            file's blob should also be reachable through.
@@ -56,8 +56,7 @@ GitFileRevision(final String fileName) {
 	}
 
 	public String getName() {
-		final int last = path.lastIndexOf('/');
-		return last >= 0 ? path.substring(last + 1) : path;
+		return path;
 	}
 
 	public boolean isPropertyMissing() {
-- 
1.6.1.rc3.56.gd0306

* [EGIT PATCH 2/3] Present type of change with file revision in diff viewer
From: Robin Rosenberg @ 2009-01-10  1:14 UTC
  To: spearce; +Cc: git, Robin Rosenberg
In-Reply-To: <1231550077-1057-2-git-send-email-robin.rosenberg@dewire.com>

Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
---
 .../spearce/egit/ui/internal/history/RevDiff.java  |   40 +++++++++++++++++++-
 1 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/org.spearce.egit.ui/src/org/spearce/egit/ui/internal/history/RevDiff.java b/org.spearce.egit.ui/src/org/spearce/egit/ui/internal/history/RevDiff.java
index 020ec73..084da3b 100644
--- a/org.spearce.egit.ui/src/org/spearce/egit/ui/internal/history/RevDiff.java
+++ b/org.spearce.egit.ui/src/org/spearce/egit/ui/internal/history/RevDiff.java
@@ -2,9 +2,13 @@
 
 import org.eclipse.compare.ITypedElement;
 import org.eclipse.compare.structuremergeviewer.IStructureComparator;
+import org.eclipse.jface.resource.ImageDescriptor;
+import org.eclipse.jface.viewers.DecorationOverlayIcon;
+import org.eclipse.jface.viewers.IDecoration;
 import org.eclipse.swt.graphics.Image;
 import org.eclipse.team.internal.ui.history.FileRevisionTypedElement;
 import org.spearce.egit.core.internal.storage.GitFileRevision;
+import org.spearce.egit.ui.UIIcons;
 import org.spearce.jgit.lib.Repository;
 
 class RevDiff {
@@ -46,9 +50,41 @@ public String getType() {
 	public Object[] getChildren() {
 		FileRevisionTypedElement[] ret = new FileRevisionTypedElement[fileDiffs.length];
 		for (int i = 0; i < ret.length; ++i) {
+			final FileDiff d = fileDiffs[i];
 			ret[i] = new FileRevisionTypedElement(GitFileRevision.inCommit(db,
-					fileDiffs[i].commit, fileDiffs[i].path,
-					fileDiffs[i].blobs[side]));
+					d.commit, d.path, d.blobs[side])) {
+				private Image image;
+
+				@Override
+				protected void finalize() throws Throwable {
+					if (image != null)
+						image.dispose();
+				}
+
+				@Override
+				public Image getImage() {
+					if (image == null) {
+						ImageDescriptor overlay;
+						switch (d.change.charAt(0)) {
+						case 'A':
+							overlay = UIIcons.OVR_PENDING_ADD;
+							break;
+						case 'M':
+							overlay = UIIcons.OVR_SHARED;
+							break;
+						case 'D':
+							overlay = UIIcons.OVR_PENDING_REMOVE;
+							break;
+						default:
+							return super.getImage(); // Should not happen...
+						}
+						image = new DecorationOverlayIcon(super.getImage(),
+								overlay, IDecoration.BOTTOM_RIGHT)
+								.createImage();
+					}
+					return image;
+				}
+			};
 		}
 		return ret;
 	}
-- 
1.6.1.rc3.56.gd0306

* [EGIT PATCH 1/3] Support viewing all changes in a single compare editor
From: Robin Rosenberg @ 2009-01-10  1:14 UTC
  To: spearce; +Cc: git, Robin Rosenberg
In-Reply-To: <1231550077-1057-1-git-send-email-robin.rosenberg@dewire.com>

To avoid having to click on every file listed as a diff, an extra
diff entry is inserted at the top. Double-clicking on it launches
a compare editor for all changed files.

Signed-off-by: Robin Rosenberg <robin.rosenberg@dewire.com>
---
 .../ui/internal/history/CommitFileDiffViewer.java  |   40 ++++++++++++--
 .../internal/history/FileDiffContentProvider.java  |   11 +++-
 .../ui/internal/history/FileDiffLabelProvider.java |   49 +++++++++++++++--
 .../spearce/egit/ui/internal/history/RevDiff.java  |   55 ++++++++++++++++++++
 4 files changed, 141 insertions(+), 14 deletions(-)
 create mode 100644 org.spearce.egit.ui/src/org/spearce/egit/ui/internal/history/RevDiff.java

diff --git a/org.spearce.egit.ui/src/org/spearce/egit/ui/internal/history/CommitFileDiffViewer.java b/org.spearce.egit.ui/src/org/spearce/egit/ui/internal/history/CommitFileDiffViewer.java
index ebec261..7549aa4 100644
--- a/org.spearce.egit.ui/src/org/spearce/egit/ui/internal/history/CommitFileDiffViewer.java
+++ b/org.spearce.egit.ui/src/org/spearce/egit/ui/internal/history/CommitFileDiffViewer.java
@@ -64,9 +64,17 @@ public void open(final OpenEvent event) {
 				if (s.isEmpty() || !(s instanceof IStructuredSelection))
 					return;
 				final IStructuredSelection iss = (IStructuredSelection) s;
-				final FileDiff d = (FileDiff) iss.getFirstElement();
-				if (walker != null && d.blobs.length == 2)
-					showTwoWayFileDiff(d);
+				if (iss.getFirstElement() instanceof RevDiff)
+					showTwoWayDiff((RevDiff)iss.getFirstElement());
+				else {
+					FileDiff d = (FileDiff)iss.getFirstElement();
+					if (walker != null && d.blobs.length == 2) {
+						if (iss.size() == 1)
+							showTwoWayFileDiff(d);
+						else
+							showTwoWayDiff(iss.toArray());
+					}
+				}
 			}
 		});
 
@@ -98,6 +106,23 @@ void showTwoWayFileDiff(final FileDiff d) {
 		CompareUI.openCompareEditor(in);
 	}
 
+	void showTwoWayDiff(RevDiff d) {
+		final GitCompareFileRevisionEditorInput in = new GitCompareFileRevisionEditorInput(d.left, d.right, null);
+		CompareUI.openCompareEditor(in);
+	}
+
+	void showTwoWayDiff(final Object[] d) {
+		FileDiff[] diffs = new FileDiff[d.length];
+		System.arraycopy(d, 0, diffs, 0, d.length);
+
+		final Repository db = walker.getRepository();
+		DiffSide base = new DiffSide(diffs, 0, db);
+		DiffSide next = new DiffSide(diffs, 1, db);
+
+		final GitCompareFileRevisionEditorInput in = new GitCompareFileRevisionEditorInput(base, next, null);
+		CompareUI.openCompareEditor(in);
+	}
+
 	TreeWalk getTreeWalk() {
 		return walker;
 	}
@@ -124,13 +149,16 @@ void doCopy() {
 		if (s.isEmpty() || !(s instanceof IStructuredSelection))
 			return;
 		final IStructuredSelection iss = (IStructuredSelection) s;
-		final Iterator<FileDiff> itr = iss.iterator();
+		final Iterator itr = iss.iterator();
 		final StringBuilder r = new StringBuilder();
 		while (itr.hasNext()) {
-			final FileDiff d = itr.next();
+			Object o = itr.next();
 			if (r.length() > 0)
 				r.append("\n");
-			r.append(d.path);
+			if (o instanceof FileDiff)
+				r.append(((FileDiff)o).path);
+			else
+				r.append(((RevDiff)o).left.getChildren().length + " files");
 		}
 
 		clipboard.setContents(new Object[] { r.toString() },
diff --git a/org.spearce.egit.ui/src/org/spearce/egit/ui/internal/history/FileDiffContentProvider.java b/org.spearce.egit.ui/src/org/spearce/egit/ui/internal/history/FileDiffContentProvider.java
index c84e9f3..25e7714 100644
--- a/org.spearce.egit.ui/src/org/spearce/egit/ui/internal/history/FileDiffContentProvider.java
+++ b/org.spearce.egit.ui/src/org/spearce/egit/ui/internal/history/FileDiffContentProvider.java
@@ -20,7 +20,7 @@
 
 	private RevCommit commit;
 
-	private FileDiff[] diff;
+	private Object[] diff;
 
 	public void inputChanged(final Viewer newViewer, final Object oldInput,
 			final Object newInput) {
@@ -32,7 +32,14 @@ public void inputChanged(final Viewer newViewer, final Object oldInput,
 	public Object[] getElements(final Object inputElement) {
 		if (diff == null && walk != null && commit != null) {
 			try {
-				diff = FileDiff.compute(walk, commit);
+				FileDiff[] fdiff = FileDiff.compute(walk, commit);
+				if (fdiff.length <= 1) {
+					diff = fdiff;
+				} else {
+					diff = new Object[fdiff.length + 1];
+					diff[0] = new RevDiff(fdiff, walk.getRepository());
+					System.arraycopy(fdiff, 0, diff, 1, fdiff.length);
+				}
 			} catch (IOException err) {
 				Activator.error("Can't get file difference of "
 						+ commit.getId() + ".", err);
diff --git a/org.spearce.egit.ui/src/org/spearce/egit/ui/internal/history/FileDiffLabelProvider.java b/org.spearce.egit.ui/src/org/spearce/egit/ui/internal/history/FileDiffLabelProvider.java
index 60b3a5a..c78ba6e 100644
--- a/org.spearce.egit.ui/src/org/spearce/egit/ui/internal/history/FileDiffLabelProvider.java
+++ b/org.spearce.egit.ui/src/org/spearce/egit/ui/internal/history/FileDiffLabelProvider.java
@@ -14,12 +14,49 @@
 class FileDiffLabelProvider extends BaseLabelProvider implements
 		ITableLabelProvider {
 	public String getColumnText(final Object element, final int columnIndex) {
-		final FileDiff c = (FileDiff) element;
-		switch (columnIndex) {
-		case 0:
-			return c.change;
-		case 1:
-			return c.path;
+		if (element instanceof FileDiff) {
+			final FileDiff c = (FileDiff) element;
+			switch (columnIndex) {
+			case 0:
+				return c.change;
+			case 1:
+				return c.path;
+			}
+		} else {
+			final RevDiff c = (RevDiff) element;
+			switch (columnIndex) {
+			case 0:
+				return "\u03a3";
+			case 1:
+				{
+					int mod = 0;
+					int add = 0;
+					int del = 0;
+					for (int i = 0; i < c.left.fileDiffs.length; ++i) {
+						if (c.left.fileDiffs[i].change.equals("A"))
+							add++;
+						if (c.left.fileDiffs[i].change.equals("M"))
+							mod++;
+						if (c.left.fileDiffs[i].change.equals("D"))
+							del++;
+					}
+					StringBuilder b = new StringBuilder();
+					if (add > 0) {
+						b.append(add + " added");
+					}
+					if (mod > 0) {
+						if (b.length() > 0)
+							b.append(", ");
+						b.append(mod + " changed");
+					}
+					if (del > 0) {
+						if (b.length() > 0)
+							b.append(", ");
+						b.append(del + " deleted");
+					}
+					return b.toString();
+				}
+			}
 		}
 		return "";
 	}
diff --git a/org.spearce.egit.ui/src/org/spearce/egit/ui/internal/history/RevDiff.java b/org.spearce.egit.ui/src/org/spearce/egit/ui/internal/history/RevDiff.java
new file mode 100644
index 0000000..020ec73
--- /dev/null
+++ b/org.spearce.egit.ui/src/org/spearce/egit/ui/internal/history/RevDiff.java
@@ -0,0 +1,55 @@
+package org.spearce.egit.ui.internal.history;
+
+import org.eclipse.compare.ITypedElement;
+import org.eclipse.compare.structuremergeviewer.IStructureComparator;
+import org.eclipse.swt.graphics.Image;
+import org.eclipse.team.internal.ui.history.FileRevisionTypedElement;
+import org.spearce.egit.core.internal.storage.GitFileRevision;
+import org.spearce.jgit.lib.Repository;
+
+class RevDiff {
+	DiffSide left;
+
+	DiffSide right;
+
+	RevDiff(FileDiff[] fileDiffs, Repository db) {
+		left = new DiffSide(fileDiffs, 0, db);
+		right = new DiffSide(fileDiffs, 1, db);
+	}
+}
+
+class DiffSide implements ITypedElement, IStructureComparator {
+	final FileDiff[] fileDiffs;
+
+	private final int side;
+
+	private final Repository db;
+
+	DiffSide(FileDiff[] fileDiffs, int side, Repository db) {
+		this.fileDiffs = fileDiffs;
+		this.side = side;
+		this.db = db;
+	}
+
+	public Image getImage() {
+		return null;
+	}
+
+	public String getName() {
+		return "EGit diff";
+	}
+
+	public String getType() {
+		return FOLDER_TYPE;
+	}
+
+	public Object[] getChildren() {
+		FileRevisionTypedElement[] ret = new FileRevisionTypedElement[fileDiffs.length];
+		for (int i = 0; i < ret.length; ++i) {
+			ret[i] = new FileRevisionTypedElement(GitFileRevision.inCommit(db,
+					fileDiffs[i].commit, fileDiffs[i].path,
+					fileDiffs[i].blobs[side]));
+		}
+		return ret;
+	}
+}
-- 
1.6.1.rc3.56.gd0306

* [EGIT PATCH 0/3] Show all changed files in the same compare editor
From: Robin Rosenberg @ 2009-01-10  1:14 UTC
  To: spearce; +Cc: git, Robin Rosenberg

Hereby I resurrect some of the goodness the old compare editor had,
by enabling it to show all changes in the same compare editor, so one
can browse back and forth using Ctrl-. and Ctrl-, instead of having
to click on each changed file.

Question: do we need both the file diff viewer AND the compare editor?
We used to have the compare editor update automatically when one
switched revisions in the history pane.

-- robin

Robin Rosenberg (3):
  Support viewing all changes in a single compare editor
  Present type of change with file revision in diff viewer
  Present full name of file revision

 .../core/internal/storage/GitFileRevision.java     |    5 +-
 .../ui/internal/history/CommitFileDiffViewer.java  |   40 +++++++--
 .../internal/history/FileDiffContentProvider.java  |   11 ++-
 .../ui/internal/history/FileDiffLabelProvider.java |   49 +++++++++--
 .../spearce/egit/ui/internal/history/RevDiff.java  |   91 ++++++++++++++++++++
 5 files changed, 179 insertions(+), 17 deletions(-)
 create mode 100644 org.spearce.egit.ui/src/org/spearce/egit/ui/internal/history/RevDiff.java

* Re: git submodule merge madness
From: Ask Bjørn Hansen @ 2009-01-10  1:13 UTC
  To: Ask Bjørn Hansen; +Cc: git
In-Reply-To: <ADC7A3B1-6756-4258-93CD-DB40C7D2793C@develooper.com>


On Jan 9, 2009, at 1:50 PM, Ask Bjørn Hansen wrote:

> The typical problem is that we get an error trying to merge a
> "pre-submodule" branch into master:
>
> 	fatal: cannot read object 894c77319a18c4d48119c2985a9275c9f5883584 'some/sub/dir': It is a submodule!
>
> Mark Levedahl wrote an example in July, but I don't think he got any  
> replies:  http://marc.info/?l=git&m=121587851313303

Replying to myself for the archives:

Looking in the code, I noticed it's the recursive merge algorithm
giving that error.  Making it use the resolve strategy
("git merge -s resolve") made it work in the cases I had today, yay.

  - ask

-- 
http://develooper.com/ - http://askask.com/

* Re: gitweb index performance (Re: [PATCH] gitweb: support the rel=vcs-* microformat)
From: Jakub Narebski @ 2009-01-10  1:11 UTC
  To: Joey Hess; +Cc: git, Giuseppe Bilotta
In-Reply-To: <20090108195446.GB18025@gnu.kitenet.net>

Joey Hess <joey@kitenet.net> writes:
> Giuseppe Bilotta wrote:

> > > There is a small overhead in including the microformat on project list
> > > and forks list pages, but getting the project descriptions for those pages
> > > already incurs a similar overhead, and the ability to get every repo url
> > > in one place seems worthwhile.
> > 
> > I agree with this, although people with very large project lists may
> > differ ... do we have timings on these?
> 
> AFAICS, when displaying the project list, gitweb reads each project's
> description file, falling back to reading its config file if there is no
> description file.
> 
> If performance was a problem here, the thing to do would be to add
> project descriptions to the $project_list file, and use those in
> preference to the description files. If a large site has done that,
> they've not sent in the patch. :-)

I sent such a patch, but IIRC it fell through the cracks, partly
because it was sent during a feature freeze.  I have "gitweb: Extend
project_index file format by project description" in my StGit stack.
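A reader for such an extended index might look like the C sketch below. Several points are assumptions rather than something shown in this thread:

- Fields are URI-encoded and separated by single spaces, with '+' standing for a space. This follows the convention of gitweb's path/owner project list.
- The optional third description field is hypothetical, since the format of the patch mentioned above is not shown here.
- parse_index_line(), next_field() and unescape() are illustrative names.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Decode a URI-encoded field in place: '+' -> ' ', "%XX" -> byte. */
void unescape(char *s)
{
    char *out = s;

    for (; *s; s++) {
        if (*s == '+') {
            *out++ = ' ';
        } else if (*s == '%' && isxdigit((unsigned char)s[1]) &&
                   isxdigit((unsigned char)s[2])) {
            char hex[3] = { s[1], s[2], '\0' };
            *out++ = (char)strtol(hex, NULL, 16);
            s += 2;
        } else {
            *out++ = *s;
        }
    }
    *out = '\0';
}

/* Split off the next space-separated field; returns NULL if none is left. */
static char *next_field(char **cursor)
{
    char *start = *cursor, *sp;

    if (!start || !*start)
        return NULL;
    sp = strchr(start, ' ');
    if (sp) {
        *sp = '\0';
        *cursor = sp + 1;
    } else {
        *cursor = NULL;
    }
    return start;
}

/*
 * Hypothetical index line: "path owner [description]", split in place.
 * *descr is NULL when the optional third field is absent; a caller would
 * then fall back to reading the repository's description file.
 * Returns 0 on success, -1 if path or owner is missing.
 */
int parse_index_line(char *line, char **path, char **owner, char **descr)
{
    char *cursor = line;

    line[strcspn(line, "\n")] = '\0';
    *path = next_field(&cursor);
    *owner = next_field(&cursor);
    *descr = next_field(&cursor);
    if (!*path || !*owner)
        return -1;
    unescape(*path);
    unescape(*owner);
    if (*descr)
        unescape(*descr);
    return 0;
}
```

The appeal of this layout is that a project list page could be rendered from one file read, instead of opening a description file per repository. The cost is the regeneration issue raised elsewhere in this thread.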

> 
> With my patch, it will read each cloneurl file too. The best way to
> optimise that for large sites seems to be to add an option that would
> ignore the cloneurl files and config file and always use
> @git_base_url_list.

Good idea.

> 
> I checked the only large site I have access to (git.debian.org) and they
> use a $project_list file, but I see no other performance tuning. That's
> a 2 GHz machine; it takes gitweb 28 (!) seconds to generate the nearly
> 1 MB index web page for 1671 repositories:
> 
> /srv/git.debian.org/http/cgi-bin/gitweb.cgi  3.04s user 9.24s system 43% cpu 28.515 total
>
> 
> Notice that most of the time is spent by child processes. For each
> repository, gitweb runs git-for-each-ref to determine the time of the
> last commit.
> 
> If that is removed (say if there were a way to get the info w/o
> forking), performance improves nicely:
> 
> ./gitweb.cgi > /dev/null  1.29s user 1.08s system 69% cpu 3.389 total
> 
> Making it not read description files for each project, as I suggest above,
> is the next best optimisation:
> 
> ./gitweb.cgi > /dev/null  1.08s user 0.05s system 96% cpu 1.170 total
> 
> So, I think it makes sense to optimise gitweb and offer knobs for performance
> tuning at the expense of the flexibility of description and cloneurl files.
> But, git-for-each-ref is swamping everything else.

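A rough per-repository cost can be read off the three timings above by dividing each difference by the 1671 repositories. These are single wall-clock runs, so the figures are only indicative:

```latex
\begin{align*}
t_{\text{for-each-ref}} &\approx \frac{28.515\,\text{s} - 3.389\,\text{s}}{1671}
  = \frac{25.126\,\text{s}}{1671} \approx 15.0\,\text{ms per repository},\\
t_{\text{description}} &\approx \frac{3.389\,\text{s} - 1.170\,\text{s}}{1671}
  = \frac{2.219\,\text{s}}{1671} \approx 1.3\,\text{ms per repository}.
\end{align*}
```

On these numbers, the fork of git-for-each-ref costs roughly ten times as much per repository as reading its description.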
One solution would be to limit the number of projects displayed on a
page, for example to 100 projects, although that would mainly reduce
the problem of handling a large page on the client side, and less so
the server load, unless we do _not_ sort projects by age.
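Sorting by age defeats paging, as a short C sketch shows. The names struct project, last_change(), page_by_name() and page_by_age() are hypothetical. last_change() only counts its calls and returns a dummy value, standing in for running git for-each-ref.

Paging by name needs ages only for the projects actually shown. Paging by age must query every project before it knows which 100 to show.

```c
#include <stdlib.h>
#include <string.h>

struct project {
    const char *name;
    long age;                 /* filled in by last_change() */
};

static int age_queries;       /* stands in for forked git for-each-ref runs */

/* Hypothetical: pretend to look up a project's last change. */
static long last_change(const struct project *p)
{
    age_queries++;
    return (long)strlen(p->name);  /* dummy value for the sketch */
}

static int by_name(const void *a, const void *b)
{
    const struct project *x = a, *y = b;
    return strcmp(x->name, y->name);
}

static int by_age(const void *a, const void *b)
{
    const struct project *x = a, *y = b;
    return (x->age > y->age) - (x->age < y->age);
}

/* Sorted by name: only the `limit` displayed projects need an age. */
void page_by_name(struct project *ps, size_t n, size_t limit)
{
    qsort(ps, n, sizeof *ps, by_name);
    for (size_t i = 0; i < n && i < limit; i++)
        ps[i].age = last_change(&ps[i]);
}

/* Sorted by age: every age is needed before the first page is known. */
void page_by_age(struct project *ps, size_t n, size_t limit)
{
    for (size_t i = 0; i < n; i++)
        ps[i].age = last_change(&ps[i]);
    qsort(ps, n, sizeof *ps, by_age);
    (void)limit;              /* only the first `limit` would be rendered */
}
```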

Another solution would be to use caching: repo.or.cz uses one approach
(caching only the projects_list action), kernel.org another (the
gitweb caching from the GSoC 2008 project).

-- 
Jakub Narebski
Poland
ShadeHawk on #git

* Re: [PATCH] gitweb: support the rel=vcs-* microformat
From: Jakub Narebski @ 2009-01-10  1:04 UTC
  To: Giuseppe Bilotta; +Cc: git, Joey Hess
In-Reply-To: <gk4bk5$9dq$1@ger.gmane.org>

Giuseppe Bilotta <giuseppe.bilotta@gmail.com> writes: 
> On Thursday 08 January 2009 00:24, Joey Hess wrote:
> 
> > The rel=vcs-* microformat allows a web page to indicate the locations of
> > repositories related to it in a machine-parseable manner.
> > (See http://kitenet.net/~joey/rfc/rel-vcs/)
> 
> Have you considered submitting the microformat to microformats.org?
> That would make the microformat more official and would be a good
> first step toward wider coverage of it, and additional reviews.

Good thinking.  BTW, microformats.org is IIRC a wiki (or at least part
of it is), so it should be easy to do...

> 
> > Make gitweb use the microformat if it has been configured with project url
> > information in any of the usual ways. On the project summary page, the
> > repository URL display is simply marked up using the microformat. On the
> > project list page and forks list page, the microformat is embedded in the
> > header, since the URLs do not appear on the page.
> > 
> > The microformat could be included on other pages too, but I've skipped
> > doing so for now, since it would mean reading another file for every page
> > displayed.
> > 
> > There is a small overhead in including the microformat on project list
> > and forks list pages, but getting the project descriptions for those pages
> > already incurs a similar overhead, and the ability to get every repo url
> > in one place seems worthwhile.
> 
> I agree with this, although people with very large project lists may
> differ ... do we have timings on these?

I think that while adding this microformat to the 'summary' page is a
non-issue, we might want to be able to configure it out so that it is
not used for the projects_list page (which might be very large).

And what about OPML, RSS and Atom formats?

>  
> > This changes git_get_project_url_list() to not check wantarray, and only
> > return in list context -- the only way it is used AFAICS. It memoizes
> > both that function and git_get_project_description(), to avoid redundant
> > file reads.
> 
> You may want to consider splitting the patch into three: memoizing
> of git_get_project_description(), reworking of
> git_get_project_url_list(), and the actual rel=vcs-* insertions.

Very good idea.  Small, single feature patches are nice.

[...]
> >  sub git_get_project_description {
> >       my $path = shift;
> >  
> > +     return $project_descriptions{$path} if exists $project_descriptions{$path};
> > +
> 
> This line is bordering on the 80 characters, so you may want to
> consider moving 'my $descr' here, with something such as
> 
> my $descr = $project_descriptions{$path};
> return $descr if exists $descr;
> 
> Also, I'm no Perl guru, so I'm not sure about exists vs defined here.

You might have an undefined value under an existing key, but I guess we
can assume the two are equivalent here.  While 'exists' is closer to
what you check (does the key exist in the hash), further on you rely
on the fact that $descr is not undefined.
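The exists/defined distinction matters for memoization because "no description" is itself a result worth caching. Here is a minimal C sketch of the idea, with hypothetical names (struct desc_cache, get_description(), read_description_file()) and a fixed-size table.

The cache records that a path has been looked up separately from the value found. A repository without a description file is therefore read once, and NULL is a cached answer rather than a miss. Checking only for a non-NULL value, the analogue of Perl's 'defined', would re-read the file on every call for such repositories.

```c
#include <stddef.h>
#include <string.h>

#define CACHE_SIZE 64

struct desc_cache {
    const char *path[CACHE_SIZE];   /* key present: like Perl's exists */
    const char *descr[CACHE_SIZE];  /* may be NULL: like Perl's undef */
    size_t used;
};

static int file_reads;              /* counts simulated file reads */

/* Hypothetical reader: only "git.git" has a description file here. */
static const char *read_description_file(const char *path)
{
    file_reads++;
    return strcmp(path, "git.git") == 0 ? "example description" : NULL;
}

/*
 * Memoized lookup that also caches the "no description" answer.
 * Paths are stored by pointer, so callers must keep them alive.
 */
const char *get_description(struct desc_cache *c, const char *path)
{
    const char *d;

    for (size_t i = 0; i < c->used; i++)
        if (strcmp(c->path[i], path) == 0)
            return c->descr[i];     /* hit, even when the value is NULL */

    d = read_description_file(path);
    if (c->used < CACHE_SIZE) {
        c->path[c->used] = path;
        c->descr[c->used] = d;
        c->used++;
    }
    return d;
}
```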

[...]
> >  ## ======================================================================
> >  ## ======================================================================
> >  ## actions
> > @@ -4380,7 +4422,9 @@ sub git_project_list {
> >               die_error(404, "No projects found");
> >       }
> >  
> > -     git_header_html();
> > +     my $extraheader=git_links_header(map { $_->{path} } @list);
> > +
> > +     git_header_html(undef, undef, $extraheader);
> >       if (-f $home_text) {
> >               print "<div class=\"index_include\">\n";
> >               insert_file($home_text);
> > @@ -4405,8 +4449,10 @@ sub git_forks {
> >       if (!@list) {
> >               die_error(404, "No forks found");
> >       }
> > +     
> > +     my $extraheader=git_links_header(map { $_->{path} } @list);
> >  
> > -     git_header_html();
> > +     git_header_html(undef, undef, $extraheader);
> 
> This makes me wonder if it would be worth it to turn git_header_html
> into -param => value style, but I'm not really sure it's worth it.

It is git_header_html(STATUS, EXPIRES, EXTRA)

Hmmm... having checked now, we use either git_header_html() in gitweb
(the most common case), git_header_html(STATUS) in die_error, or in
a few cases git_header_html(undef, $expires); and now
git_header_html(undef, undef, $extra), so named parameters might be a
good idea... I don't have an opinion here...

-- 
Jakub Narebski
Poland
ShadeHawk on #git

* [PATCH v2] make diff --color-words customizable
From: Thomas Rast @ 2009-01-10  0:57 UTC
  To: git; +Cc: Junio C Hamano, Johannes Schindelin, Teemu Likonen
In-Reply-To: <87wsd48wam.fsf@iki.fi>

Allows for user-configurable word splits via a regular expression when
using --color-words.  This can make the diff more readable if the
regex is configured according to the language of the file.

The regex can be specified either through an optional argument
--color-words=<regex> or through the attributes mechanism, similar to
the funcname pattern.

Each non-overlapping match of the regex is a word; everything in
between is whitespace.  We disallow matching the empty string (because
it results in an endless loop) or a newline (breaks color escapes and
interacts badly with the input coming from the usual line diff).  To
help the user, we set REG_NEWLINE so that [^...] and . do not match
newlines.

--color-words works (and always worked) by splitting words onto one
line each, and using the normal line-diff machinery to get a word
diff.  Since we cannot reuse the current approach of simply
overwriting uninteresting characters with '\n', we insert an
artificial '\n' at the end of each detected word.  Its presence must
be tracked so that we can distinguish artificial from source newlines.
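The splitting step described above can be sketched in C with POSIX <regex.h>. This is a simplified illustration, not the code in diff.c. The function name split_words() and the caller-supplied output buffer are assumptions made for the sketch. It handles a single input line, so it does not need to track which newlines are artificial.

The sketch compiles the pattern with REG_EXTENDED | REG_NEWLINE and treats each non-overlapping match as a word. Everything between matches is skipped as whitespace, empty matches are refused, and every word ends with an artificial '\n', so that a line diff over the result becomes a word diff.

```c
#include <regex.h>
#include <string.h>

/*
 * Hypothetical helper (not git's diff.c): split `line` into words using
 * the extended regular expression `pattern`.  Each non-overlapping match
 * becomes one word, copied to `out` and followed by an artificial '\n'.
 * Text between matches is treated as whitespace and dropped.
 * Returns the number of words, or -1 for a bad pattern, an empty match,
 * or an output buffer that is too small.
 */
int split_words(const char *pattern, const char *line,
                char *out, size_t outsz)
{
    regex_t re;
    regmatch_t m;
    const char *p = line;
    size_t used = 0;
    int words = 0;

    if (outsz == 0)
        return -1;
    out[0] = '\0';

    /* REG_NEWLINE keeps '.' and [^...] from matching a newline. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NEWLINE) != 0)
        return -1;

    /* Only the first search starts at the beginning of the line. */
    while (*p && regexec(&re, p, 1, &m, p == line ? 0 : REG_NOTBOL) == 0) {
        size_t len = (size_t)(m.rm_eo - m.rm_so);

        if (len == 0) {               /* empty match: p would never advance */
            words = -1;
            break;
        }
        if (used + len + 2 > outsz) { /* room for word, '\n' and NUL */
            words = -1;
            break;
        }
        memcpy(out + used, p + m.rm_so, len);
        used += len;
        out[used++] = '\n';           /* artificial word terminator */
        words++;
        p += m.rm_eo;                 /* resume after this match */
    }
    out[used] = '\0';
    regfree(&re);
    return words;
}
```

For example, with the pattern [A-Za-z_][A-Za-z0-9_]*|[0-9]+ the line x1 = foo(42); becomes the three words x1, foo and 42. The =, (, ) and ; are treated as whitespace.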

Insertion of spaces is somewhat subtle.  We echo a "context" space
twice (once on each side of the diff) if it follows directly after a
word, by "skipping" it during the translation (instead of generating a
'\n').  While this loses a tiny bit of accuracy, it runs together long
sequences of changed words into one removed and one added block,
making the diff much more readable.  As a side-effect, the splitting
regex '\S+' currently results in the exact same output as the original
code.  The existing code still stays in place in case no regex is
provided, for performance.

We also build in patterns for some of the languages that already had
funcname regexes.  They are designed to group UTF-8 sequences into a
single word to make sure they remain readable.

Thanks to Johannes Schindelin for the option handling code.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>

---

Thomas Rast wrote:
> I'll come up with a fixed patch, and probably make it both
> funcname-like (Jeff's idea) and command line configurable.

I think this should do.  Getting the spaces right was harder than I
thought; originally it only tracked _END and _BODY, but then a changed
sentence looked like a lot of separate word changes, making it
extremely confusing.

Teemu Likonen wrote:
> I agree with that too. A good thing about the current --color-words is
> that it automatically works with UTF-8 encoded text. This is _very_
> important as --color-words is usually the best diff tool for
> human-language texts.

Thanks for pointing this out.  I put a [\x80-\xff]+ clause in the
built-in patterns that do not already match high-bit characters, so
that they will keep them together no matter what.  Unfortunately, it's
rather hard to get the same effect "by hand", as neither the shell, nor
git-config, nor regex.c seems to expand \xNN or \NNN.  You'll need $''
in bash (is this POSIX?), 'echo -e', a very large keyboard, or a
pattern that can be written in terms of a negated class.

(I briefly considered forcing "|[\x80-\xff]+|\S" into the regular
expression, but the former is very encoding-specific.  Maybe at least
"|\S" would be a good addition.)
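The effect of a catch-all alternative can be checked with a small C sketch using POSIX <regex.h>. Two assumptions apply. First, the helper count_words() is hypothetical. Second, the catch-all is written as [^[:space:]], since \S is a GNU extension rather than POSIX ERE syntax.

Characters that no alternative matches are treated as whitespace. Without the catch-all, a change in punctuation is therefore invisible to the word diff.

```c
#include <regex.h>

/*
 * Hypothetical helper: count the non-overlapping matches of the extended
 * regex `pattern` in `line`, i.e. the "words" a regex-driven word diff
 * would compare.  Returns -1 on a bad pattern or an empty match.
 */
int count_words(const char *pattern, const char *line)
{
    regex_t re;
    regmatch_t m;
    const char *p = line;
    int words = 0;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NEWLINE) != 0)
        return -1;
    while (*p && regexec(&re, p, 1, &m, p == line ? 0 : REG_NOTBOL) == 0) {
        if (m.rm_eo == m.rm_so) {   /* empty match would loop forever */
            words = -1;
            break;
        }
        words++;
        p += m.rm_eo;               /* continue after this match */
    }
    regfree(&re);
    return words;
}
```

With the identifier-only pattern [A-Za-z_][A-Za-z0-9_]*, the line foo(bar); yields 2 words, so changing the ; to a , would not show up. With |[^[:space:]] appended, it yields 5: foo, (, bar, ) and ;.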



 Documentation/diff-options.txt  |   18 +++-
 Documentation/gitattributes.txt |   21 ++++
 diff.c                          |  199 +++++++++++++++++++++++++++++++++++----
 diff.h                          |    1 +
 t/t4033-diff-color-words.sh     |   90 ++++++++++++++++++
 userdiff.c                      |   27 ++++--
 userdiff.h                      |    1 +
 7 files changed, 330 insertions(+), 27 deletions(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 671f533..d22c06b 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -91,8 +91,22 @@ endif::git-format-patch[]
 	Turn off colored diff, even when the configuration file
 	gives the default to color output.
 
---color-words::
-	Show colored word diff, i.e. color words which have changed.
+--color-words[=<regex>]::
+	Show colored word diff, i.e., color words which have changed.
+	By default, a new word only starts at whitespace, so that a
+	'word' is defined as a maximal sequence of non-whitespace
+	characters.  The optional argument <regex> can be used to
+	configure this.  It can also be set via a diff driver, see
+	linkgit:gitattributes[1]; if a <regex> is given explicitly, it
+	overrides any diff driver setting.
++
+The <regex> must be an (extended) regular expression.  When set, every
+non-overlapping match of the <regex> is considered a word.  (Regular
+expression semantics ensure that quantifiers grab a maximal sequence
+of characters.)  Anything between these matches is considered
+whitespace and ignored for the purposes of finding differences.  You
+may want to append `|\S` to your regular expression to make sure that
+it matches all non-whitespace characters.
 
 --no-renames::
 	Turn off rename detection, even when the configuration
diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 8af22ec..67f5522 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -334,6 +334,27 @@ patterns are available:
 - `tex` suitable for source code for LaTeX documents.
 
 
+Customizing word diff
+^^^^^^^^^^^^^^^^^^^^^
+
+You can customize the rules that `git diff --color-words` uses to
+split words in a line, by specifying an appropriate regular expression
+in the "diff.*.wordregex" configuration variable.  For example, in TeX
+a backslash followed by a sequence of letters forms a command, but
+several such commands can be run together without intervening
+whitespace.  To separate them, use a regular expression such as
+
+------------------------
+[diff "tex"]
+	wordregex = "\\\\[a-zA-Z]+|[{}]|\\\\.|[^\\{} \t]+"
+------------------------
+
+Similar to 'xfuncname', a built in value is provided for the drivers
+`bibtex`, `html`, `java`, `php`, `python` and `tex`.  See the
+documentation of --color-words in linkgit:git-diff[1] for the precise
+semantics.
+
+
 Performing text diffs of binary files
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
diff --git a/diff.c b/diff.c
index d235482..620911e 100644
--- a/diff.c
+++ b/diff.c
@@ -321,6 +321,7 @@ struct diff_words_buffer {
 	long alloc;
 	long current; /* output pointer */
 	int suppressed_newline;
+	enum diff_word_boundaries *boundaries;
 };
 
 static void diff_words_append(char *line, unsigned long len,
@@ -336,23 +337,55 @@ static void diff_words_append(char *line, unsigned long len,
 	buffer->text.size += len;
 }
 
+/*
+ * We use these to save the word boundaries.  WORD_BODY and WORD_END
+ * signal a word, meaning that after the WORD_END character an
+ * artificial newline will be inserted.
+ */
+enum diff_word_boundaries {
+	DIFF_WORD_UNDEF,
+	DIFF_WORD_BODY,
+	DIFF_WORD_END,
+	DIFF_WORD_SPACE,
+	DIFF_WORD_SKIP
+};
+
 struct diff_words_data {
 	struct diff_words_buffer minus, plus;
 	FILE *file;
+	regex_t *word_regex;
+	enum diff_word_boundaries *minus_boundaries, *plus_boundaries;
 };
 
-static void print_word(FILE *file, struct diff_words_buffer *buffer, int len, int color,
+static int print_word(FILE *file, struct diff_words_buffer *buffer, int len, int color,
 		int suppress_newline)
 {
 	const char *ptr;
 	int eol = 0;
 
 	if (len == 0)
-		return;
+		return len;
 
 	ptr  = buffer->text.ptr + buffer->current;
+
+	if (buffer->boundaries
+	    && (buffer->boundaries[buffer->current] == DIFF_WORD_BODY
+		|| buffer->boundaries[buffer->current] == DIFF_WORD_END)) {
+		/* account for the artificial newline */
+		len--;
+		/* we still have len>0 because it is a word */
+	}
+
 	buffer->current += len;
 
+	if (buffer->boundaries
+	    && buffer->boundaries[buffer->current] == DIFF_WORD_SKIP) {
+		/* we had an artificial newline, but the next whitespace
+		 * character right after was skipped because of it */
+		buffer->current++;
+		len++;
+	}
+
 	if (ptr[len - 1] == '\n') {
 		eol = 1;
 		len--;
@@ -368,6 +401,10 @@ static void print_word(FILE *file, struct diff_words_buffer *buffer, int len, in
 		else
 			putc('\n', file);
 	}
+
+	/* we need to return how many chars to skip on the other side,
+	 * so account for the (held off) \n */
+	return len+eol;
 }
 
 static void fn_out_diff_words_aux(void *priv, char *line, unsigned long len)
@@ -391,13 +428,106 @@ static void fn_out_diff_words_aux(void *priv, char *line, unsigned long len)
 				   &diff_words->plus, len, DIFF_FILE_NEW, 0);
 			break;
 		case ' ':
-			print_word(diff_words->file,
-				   &diff_words->plus, len, DIFF_PLAIN, 0);
+			len = print_word(diff_words->file,
+					 &diff_words->plus, len, DIFF_PLAIN, 0);
 			diff_words->minus.current += len;
 			break;
 	}
 }
 
+static void scan_word_boundaries(regex_t *pattern, struct diff_words_buffer *buf,
+				 mmfile_t *mmfile)
+{
+	char *text = buf->text.ptr;
+	int len = buf->text.size;
+	int i = 0;
+	int count = 0;
+	int ret;
+	regmatch_t matches[1];
+	int offset, wordlen;
+	char *strz, *p;
+
+	/* overallocate by 1 so we can safely peek past the end for a SKIP */
+	buf->boundaries = xmalloc((len+1) * sizeof(enum diff_word_boundaries));
+	buf->boundaries[len] = DIFF_WORD_UNDEF;
+
+	if (!text) {
+		mmfile->ptr = NULL;
+		mmfile->size = 0;
+		return;
+	}
+
+	strz = xmalloc(len+1);
+	memcpy(strz, text, len);
+	strz[len] = '\0';
+
+	while (i < len) {
+		ret = regexec(pattern, strz+i, 1, matches, 0);
+		if (ret == REG_NOMATCH) {
+			/* the rest is whitespace */
+			if (i > 0 && i < len) {
+				buf->boundaries[i++] = DIFF_WORD_SKIP;
+				count--;
+			}
+			while (i < len)
+				buf->boundaries[i++] = DIFF_WORD_SPACE;
+			break;
+		}
+
+		offset = matches[0].rm_so;
+		if (offset > 0 && i > 0) {
+			buf->boundaries[i++] = DIFF_WORD_SKIP;
+			count--;
+			offset--;
+		}
+		while (offset-- > 0)
+			buf->boundaries[i++] = DIFF_WORD_SPACE;
+
+		wordlen = matches[0].rm_eo - matches[0].rm_so;
+		while (wordlen > 1) {
+			if (strz[i] == '\n')
+				die("word regex matched a newline near '%s'",
+				    strz+i);
+			buf->boundaries[i++] = DIFF_WORD_BODY;
+			wordlen--;
+		}
+		if (wordlen > 0) {
+			if (strz[i] == '\n')
+				die("word regex matched a newline near '%s'",
+				    strz+i);
+			buf->boundaries[i++] = DIFF_WORD_END;
+			count++;
+		} else {
+			die("word regex matched the empty string at '%s'",
+			    strz+i);
+		}
+	}
+
+	free(strz);
+
+	mmfile->size = len + count;
+	mmfile->ptr = xmalloc(mmfile->size);
+	p = mmfile->ptr;
+	for (i = 0; i < len; i++) {
+		switch (buf->boundaries[i]) {
+		case DIFF_WORD_BODY:
+			*p++ = text[i];
+			break;
+		case DIFF_WORD_END:
+			*p++ = text[i];
+			*p++ = '\n'; /* insert an artificial newline */
+			break;
+		case DIFF_WORD_SPACE:
+			*p++ = '\n';
+			break;
+		case DIFF_WORD_SKIP:
+			/* nothing */
+			break;
+		}
+	}
+}
+
+
 /* this executes the word diff on the accumulated buffers */
 static void diff_words_show(struct diff_words_data *diff_words)
 {
@@ -409,22 +539,31 @@ static void diff_words_show(struct diff_words_data *diff_words)
 
 	memset(&xpp, 0, sizeof(xpp));
 	memset(&xecfg, 0, sizeof(xecfg));
-	minus.size = diff_words->minus.text.size;
-	minus.ptr = xmalloc(minus.size);
-	memcpy(minus.ptr, diff_words->minus.text.ptr, minus.size);
-	for (i = 0; i < minus.size; i++)
-		if (isspace(minus.ptr[i]))
-			minus.ptr[i] = '\n';
-	diff_words->minus.current = 0;
 
-	plus.size = diff_words->plus.text.size;
-	plus.ptr = xmalloc(plus.size);
-	memcpy(plus.ptr, diff_words->plus.text.ptr, plus.size);
-	for (i = 0; i < plus.size; i++)
-		if (isspace(plus.ptr[i]))
-			plus.ptr[i] = '\n';
+	if (!diff_words->word_regex) {
+		minus.size = diff_words->minus.text.size;
+		minus.ptr = xmalloc(minus.size);
+		memcpy(minus.ptr, diff_words->minus.text.ptr, minus.size);
+		for (i = 0; i < minus.size; i++)
+			if (isspace(minus.ptr[i]))
+				minus.ptr[i] = '\n';
+
+		plus.size = diff_words->plus.text.size;
+		plus.ptr = xmalloc(plus.size);
+		memcpy(plus.ptr, diff_words->plus.text.ptr, plus.size);
+		for (i = 0; i < plus.size; i++)
+			if (isspace(plus.ptr[i]))
+				plus.ptr[i] = '\n';
+	} else {
+		scan_word_boundaries(diff_words->word_regex,
+				     &diff_words->minus, &minus);
+		scan_word_boundaries(diff_words->word_regex,
+				     &diff_words->plus, &plus);
+	}
+	diff_words->minus.current = 0;
 	diff_words->plus.current = 0;
 
+
 	xpp.flags = XDF_NEED_MINIMAL;
 	xecfg.ctxlen = diff_words->minus.alloc + diff_words->plus.alloc;
 	xdi_diff_outf(&minus, &plus, fn_out_diff_words_aux, diff_words,
@@ -432,6 +571,8 @@ static void diff_words_show(struct diff_words_data *diff_words)
 	free(minus.ptr);
 	free(plus.ptr);
 	diff_words->minus.text.size = diff_words->plus.text.size = 0;
+	free(diff_words->minus.boundaries);
+	free(diff_words->plus.boundaries);
 
 	if (diff_words->minus.suppressed_newline) {
 		putc('\n', diff_words->file);
@@ -461,6 +602,7 @@ static void free_diff_words_data(struct emit_callback *ecbdata)
 
 		free (ecbdata->diff_words->minus.text.ptr);
 		free (ecbdata->diff_words->plus.text.ptr);
+		free(ecbdata->diff_words->word_regex);
 		free(ecbdata->diff_words);
 		ecbdata->diff_words = NULL;
 	}
@@ -1323,6 +1465,12 @@ static const struct userdiff_funcname *diff_funcname_pattern(struct diff_filespe
 	return one->driver->funcname.pattern ? &one->driver->funcname : NULL;
 }
 
+static const char *userdiff_word_regex(struct diff_filespec *one)
+{
+	diff_filespec_load_driver(one);
+	return one->driver->word_regex;
+}
+
 void diff_set_mnemonic_prefix(struct diff_options *options, const char *a, const char *b)
 {
 	if (!options->a_prefix)
@@ -1483,6 +1631,19 @@ static void builtin_diff(const char *name_a,
 			ecbdata.diff_words =
 				xcalloc(1, sizeof(struct diff_words_data));
 			ecbdata.diff_words->file = o->file;
+			if (!o->word_regex)
+				o->word_regex = userdiff_word_regex(one);
+			if (!o->word_regex)
+				o->word_regex = userdiff_word_regex(two);
+			if (o->word_regex) {
+				ecbdata.diff_words->word_regex = (regex_t *)
+					xmalloc(sizeof(regex_t));
+				if (regcomp(ecbdata.diff_words->word_regex,
+					    o->word_regex,
+					    REG_EXTENDED|REG_NEWLINE))
+					die ("Invalid regular expression: %s",
+					     o->word_regex);
+			}
 		}
 		xdi_diff_outf(&mf1, &mf2, fn_out_consume, &ecbdata,
 			      &xpp, &xecfg, &ecb);
@@ -2494,6 +2655,10 @@ int diff_opt_parse(struct diff_options *options, const char **av, int ac)
 		DIFF_OPT_CLR(options, COLOR_DIFF);
 	else if (!strcmp(arg, "--color-words"))
 		options->flags |= DIFF_OPT_COLOR_DIFF | DIFF_OPT_COLOR_DIFF_WORDS;
+	else if (!prefixcmp(arg, "--color-words=")) {
+		options->flags |= DIFF_OPT_COLOR_DIFF | DIFF_OPT_COLOR_DIFF_WORDS;
+		options->word_regex = arg + 14;
+	}
 	else if (!strcmp(arg, "--exit-code"))
 		DIFF_OPT_SET(options, EXIT_WITH_STATUS);
 	else if (!strcmp(arg, "--quiet"))
diff --git a/diff.h b/diff.h
index 4d5a327..23cd90c 100644
--- a/diff.h
+++ b/diff.h
@@ -98,6 +98,7 @@ struct diff_options {
 
 	int stat_width;
 	int stat_name_width;
+	const char *word_regex;
 
 	/* this is set by diffcore for DIFF_FORMAT_PATCH */
 	int found_changes;
diff --git a/t/t4033-diff-color-words.sh b/t/t4033-diff-color-words.sh
new file mode 100755
index 0000000..536cdac
--- /dev/null
+++ b/t/t4033-diff-color-words.sh
@@ -0,0 +1,90 @@
+#!/bin/sh
+
+
+test_description='diff --color-words'
+. ./test-lib.sh
+
+cat <<EOF > test_a
+foo_bar_baz
+a qu_ux b c
+alpha beta gamma delta
+EOF
+
+cat <<EOF > test_b
+foo_baz_baz
+a qu_new_ux b c
+alpha 4 2 delta
+EOF
+
+# t4026-diff-color.sh tests the color escapes, so we assume they do
+# not change
+
+munge () {
+    tail -n +5 | tr '\033' '!'
+}
+
+cat <<EOF > expect-plain
+![36m@@ -1,3 +1,3 @@![m
+![31mfoo_bar_baz![m![32mfoo_baz_baz![m
+a ![m![31mqu_ux ![m![32mqu_new_ux ![mb ![mc![m
+alpha ![m![31mbeta ![m![31mgamma ![m![32m4 ![m![32m2 ![mdelta![m
+EOF
+
+test_expect_success 'default settings' '
+	git diff --no-index --color-words test_a test_b |
+		munge > actual-plain &&
+	test_cmp expect-plain actual-plain
+'
+
+test_expect_success 'trivial regex yields same as default' '
+	git diff --no-index --color-words="\\S+" test_a test_b |
+		munge > actual-trivial &&
+	test_cmp expect-plain actual-trivial
+'
+
+cat <<EOF > expect-chars
+![36m@@ -1,3 +1,3 @@![m
+f![mo![mo![m_![mb![ma![m![31mr![m![32mz![m_![mb![ma![mz![m
+a ![mq![mu![m_![m![32mn![m![32me![m![32mw![m![32m_![mu![mx ![mb ![mc![m
+a![ml![mp![mh![ma ![m![31mb![m![31me![m![31mt![m![31ma ![m![31mg![m![31ma![m![31mm![m![31mm![m![31ma ![m![32m4 ![m![32m2 ![md![me![ml![mt![ma![m
+EOF
+
+test_expect_success 'character by character regex' '
+	git diff --no-index --color-words="\\S" test_a test_b |
+		munge > actual-chars &&
+	test_cmp expect-chars actual-chars
+'
+
+cat <<EOF > expect-nontrivial
+![36m@@ -1,3 +1,3 @@![m
+foo![m_![m![31mbar![m![32mbaz![m_![mbaz![m
+a ![mqu![m_![m![32mnew![m![32m_![mux ![mb ![mc![m
+alpha ![m![31mbeta ![m![31mgamma ![m![32m4![m![32m ![m![32m2![m![32m ![mdelta![m
+EOF
+
+test_expect_success 'nontrivial regex' '
+	git diff --no-index --color-words="[a-z]+|_" test_a test_b |
+		munge > actual-nontrivial &&
+	test_cmp expect-nontrivial actual-nontrivial
+'
+
+test_expect_success 'set a diff driver' '
+	git config diff.testdriver.wordregex "\\S" &&
+	cat <<EOF > .gitattributes
+test_* diff=testdriver
+EOF
+'
+
+test_expect_success 'use default supplied by driver' '
+	git diff --no-index --color-words test_a test_b |
+		munge > actual-chars-2 &&
+	test_cmp expect-chars actual-chars-2
+'
+
+test_expect_success 'option overrides default' '
+	git diff --no-index --color-words="[a-z]+|_" test_a test_b |
+		munge > actual-nontrivial-2 &&
+	test_cmp expect-nontrivial actual-nontrivial-2
+'
+
+test_done
diff --git a/userdiff.c b/userdiff.c
index 3681062..7fd9a07 100644
--- a/userdiff.c
+++ b/userdiff.c
@@ -6,13 +6,17 @@ static struct userdiff_driver *drivers;
 static int ndrivers;
 static int drivers_alloc;
 
-#define FUNCNAME(name, pattern) \
+#define FUNCNAME(name, pattern)			\
 	{ name, NULL, -1, { pattern, REG_EXTENDED } }
+#define PATTERNS(name, pattern, wordregex)			\
+	{ name, NULL, -1, { pattern, REG_EXTENDED }, NULL, wordregex }
 static struct userdiff_driver builtin_drivers[] = {
-FUNCNAME("html", "^[ \t]*(<[Hh][1-6][ \t].*>.*)$"),
-FUNCNAME("java",
+PATTERNS("html", "^[ \t]*(<[Hh][1-6][ \t].*>.*)$",
+	 "[^<>= \t]+|\\S"),
+PATTERNS("java",
 	 "!^[ \t]*(catch|do|for|if|instanceof|new|return|switch|throw|while)\n"
-	 "^[ \t]*(([ \t]*[A-Za-z_][A-Za-z_0-9]*){2,}[ \t]*\\([^;]*)$"),
+	 "^[ \t]*(([ \t]*[A-Za-z_][A-Za-z_0-9]*){2,}[ \t]*\\([^;]*)$",
+	 "[a-zA-Z_][a-zA-Z0-9_]*|[-+0-9.e]+|[-+*/]=|\\+\\+|--|\\S|[\x80-\xff]+"),
 FUNCNAME("objc",
 	 /* Negate C statements that can look like functions */
 	 "!^[ \t]*(do|for|if|else|return|switch|while)\n"
@@ -27,14 +31,19 @@ FUNCNAME("pascal",
 		"implementation|initialization|finalization)[ \t]*.*)$"
 	 "\n"
 	 "^(.*=[ \t]*(class|record).*)$"),
-FUNCNAME("php", "^[\t ]*((function|class).*)"),
-FUNCNAME("python", "^[ \t]*((class|def)[ \t].*)$"),
+PATTERNS("php", "^[\t ]*((function|class).*)",
+	 "\\$?[a-zA-Z_][a-zA-Z0-9_]*|[-+0-9.e]+|[-+*/]=|\\+\\+|--|->|\\S|[\x80-\xff]+"),
+PATTERNS("python", "^[ \t]*((class|def)[ \t].*)$",
+	 "[a-zA-Z_][a-zA-Z0-9_]*|[-+0-9.e]+|[-+*/]=|//|\\S|[\x80-\xff]+"),
 FUNCNAME("ruby", "^[ \t]*((class|module|def)[ \t].*)$"),
-FUNCNAME("bibtex", "(@[a-zA-Z]{1,}[ \t]*\\{{0,1}[ \t]*[^ \t\"@',\\#}{~%]*).*$"),
-FUNCNAME("tex", "^(\\\\((sub)*section|chapter|part)\\*{0,1}\\{.*)$"),
+PATTERNS("bibtex", "(@[a-zA-Z]{1,}[ \t]*\\{{0,1}[ \t]*[^ \t\"@',\\#}{~%]*).*$",
+	 "[={}\"]|[^={}\" \t]+"),
+PATTERNS("tex", "^(\\\\((sub)*section|chapter|part)\\*{0,1}\\{.*)$",
+	 "\\\\[a-zA-Z@]+|[{}]|\\\\.|[^\\{} \t]+"),
 { "default", NULL, -1, { NULL, 0 } },
 };
 #undef FUNCNAME
+#undef PATTERNS
 
 static struct userdiff_driver driver_true = {
 	"diff=true",
@@ -134,6 +143,8 @@ int userdiff_config(const char *k, const char *v)
 		return parse_string(&drv->external, k, v);
 	if ((drv = parse_driver(k, v, "textconv")))
 		return parse_string(&drv->textconv, k, v);
+	if ((drv = parse_driver(k, v, "wordregex")))
+		return parse_string(&drv->word_regex, k, v);
 
 	return 0;
 }
diff --git a/userdiff.h b/userdiff.h
index ba29457..2aab13e 100644
--- a/userdiff.h
+++ b/userdiff.h
@@ -12,6 +12,7 @@ struct userdiff_driver {
 	int binary;
 	struct userdiff_funcname funcname;
 	const char *textconv;
+	const char *word_regex;
 };
 
 int userdiff_config(const char *k, const char *v);
-- 
tg: (c123b7c..) t/word-diff-regex (depends on: origin/master)

* Re: [PATCH] gitweb: support the rel=vcs-* microformat
From: Jakub Narebski @ 2009-01-10  0:52 UTC (permalink / raw)
  To: Joey Hess; +Cc: git
In-Reply-To: <20090107232427.GA18958@gnu.kitenet.net>

Joey Hess <joey@kitenet.net> writes:

> The rel=vcs-* microformat allows a web page to indicate the locations of
> repositories related to it in a machine-parseable manner.
> (See http://kitenet.net/~joey/rfc/rel-vcs/)
> 
> Make gitweb use the microformat if it has been configured with project url
> information in any of the usual ways. On the project summary page, the
> repository URL display is simply marked up using the microformat. On the
> project list page and forks list page, the microformat is embedded in the
> header, since the URLs do not appear on the page.

I think having LINK elements for the 'summary' page as well would be a
good idea. This microformat is, I think, mainly for machines, and I
guess machines can more easily read a few LINK elements in the fairly
small HEAD of the page than scan all of the many link (A) elements on
the page for those matching the vcs-* microformat.
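Here is a minimal C sketch of emitting rel="vcs-git" LINK elements for the page HEAD, one per repository URL. It is not gitweb code, which is Perl. links_header() and the example URLs are hypothetical. The title uses the "$project git repository" form, and empty URLs are skipped, as the summary-page loop does with `next unless $git_url`. The sketch does not implement HTML escaping; it simply rejects values that contain a double quote.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write one rel="vcs-git" LINK element per URL
 * into `out`.  Empty URLs are skipped.  No HTML escaping is done, so
 * values containing '"' are rejected.  Returns the number of bytes
 * written, or -1 on error or if `out` is too small.
 */
static int links_header(char *out, size_t outlen, const char *project,
			const char *const *urls, size_t nurls)
{
	size_t used = 0;
	size_t i;

	if (outlen == 0 || strchr(project, '"'))
		return -1;
	out[0] = '\0';
	for (i = 0; i < nurls; i++) {
		int n;

		if (urls[i][0] == '\0')
			continue;
		if (strchr(urls[i], '"'))
			return -1;
		/* note the space before "/>" for HTML compatibility */
		n = snprintf(out + used, outlen - used,
			     "<link rel=\"vcs-git\" href=\"%s\" title=\"%s git repository\" />\n",
			     urls[i], project);
		if (n < 0 || (size_t)n >= outlen - used)
			return -1;
		used += (size_t)n;
	}
	return (int)used;
}
```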

Besides, I am not sure whether hyperlinking an SCP-style repository
URL makes sense at all; I am also not sure it makes good sense to
hyperlink URLs which you cannot click on (unless you use SPAN or
ABBR instead of A to mark repo links...)

> 
> The microformat could be included on other pages too, but I've skipped
> doing so for now, since it would mean reading another file for every page
> displayed.

Also, it is not necessary: if some tool wants to get repo links for a
given project, it can get the 'summary' page; if some tool wants to get
a list of all repos, it can access one of the projects list actions.

> 
> There is a small overhead in including the microformat on project list
> and forks list pages, but getting the project descriptions for those pages
> already incurs a similar overhead, and the ability to get every repo url
> in one place seems worthwhile.

By the way, do you have any benchmarks for that?

> 
> This changes git_get_project_url_list() to not check wantarray, and only
> return in list context -- the only way it is used AFAICS. It memoizes
> both that function and git_get_project_description(), to avoid redundant
> file reads.

I would also add that, from what I understand, you have made the
git_get_project_url_list() subroutine self-sufficient: it now
considers both per-repository configuration (gitweb.url in config,
cloneurl file in $GIT_DIR) and global gitweb configuration
(@git_base_url_list variable).

Simplifying the code so that it always returns a list and does not
check wantarray is a side issue, orthogonal to the issue mentioned
above.

> 
> Signed-off-by: Joey Hess <joey@gnu.kitenet.net>
> ---
>  gitweb/gitweb.perl |   78 +++++++++++++++++++++++++++++++++++++++++----------
>  1 files changed, 62 insertions(+), 16 deletions(-)
> 
> This incorporates Giuseppe Bilotta's feedback, and uses new features
> of the microformat. You can see this version running at
> http://git.ikiwiki.info/
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 99f71b4..c238717 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -2020,9 +2020,14 @@ sub git_get_path_by_hash {
>  ## ......................................................................
>  ## git utility functions, directly accessing git repository
>  
> +{
> +my %project_descriptions; # cache
> +

Won't we get warnings (and perhaps errors) from mod_perl? Shouldn't
this be "our %project_descriptions;"?

>  sub git_get_project_description {
>  	my $path = shift;
>  
> +	return $project_descriptions{$path} if exists $project_descriptions{$path};
> +
>  	$git_dir = "$projectroot/$path";
>  	open my $fd, "$git_dir/description"
>  		or return git_get_project_config('description');
> @@ -2031,7 +2036,9 @@ sub git_get_project_description {
>  	if (defined $descr) {
>  		chomp $descr;
>  	}
> -	return $descr;
> +	return $project_descriptions{$path}=$descr;
> +}
> +
>  }

If we use title="$project git repository" for rel="vcs-git" links,
is the extra complication still worth it just to avoid computing the
project description twice in the 'summary' view for a project?
IIRC, for the 'projects_list' view it is already cached in the
@projects list under the 'descr' key...

>  
>  sub git_get_project_ctags {
> @@ -2099,18 +2106,30 @@ sub git_show_project_tagcloud {
>  	}
>  }
>  
> +{
> +my %project_url_lists; # cache
> +

Same question: would it work correctly for mod_perl?

>  sub git_get_project_url_list {
> +	# use per project git URL list in $projectroot/$path/cloneurl
> +	# or make project git URL from git base URL and project name
>  	my $path = shift;
>  
> +	return @{$project_url_lists{$path}} if exists $project_url_lists{$path};
> +
> +	my @ret;
>  	$git_dir = "$projectroot/$path";
> -	open my $fd, "$git_dir/cloneurl"
> -		or return wantarray ?
> -		@{ config_to_multi(git_get_project_config('url')) } :
> -		   config_to_multi(git_get_project_config('url'));
> -	my @git_project_url_list = map { chomp; $_ } <$fd>;
> -	close $fd;
> +	if (open my $fd, "$git_dir/cloneurl") {
> +		@ret = map { chomp; $_ } <$fd>;
> +		close $fd;
> +	} else {
> +	       @ret = @{ config_to_multi(git_get_project_config('url')) };
> +	}
> +	@ret=map { "$_/$project" } @git_base_url_list if ! @ret;

Style: 

+	@ret = map { "$_/$project" } @git_base_url_list if !@ret;

or even

+	@ret = map { "$_/$project" } @git_base_url_list unless @ret;

> +
> +	$project_url_lists{$path}=\@ret;
> +	return @ret;
> +}
>  
> -	return wantarray ? @git_project_url_list : \@git_project_url_list;
>  }

Again: is it worth caching? It is only for 'summary'; for
'projects_list' it might be better to extend the @projects list instead.

>  
>  sub git_get_projects_list {
> @@ -2856,6 +2875,7 @@ sub blob_contenttype {
>  sub git_header_html {
>  	my $status = shift || "200 OK";
>  	my $expires = shift;
> +	my $extraheader = shift;
>  
>  	my $title = "$site_name";
>  	if (defined $project) {
> @@ -2953,6 +2973,8 @@ EOF
>  		print qq(<link rel="shortcut icon" href="$favicon" type="image/png" />\n);
>  	}
>  
> +	print $extraheader if defined $extraheader;
> +
>  	print "</head>\n" .
>  	      "<body>\n";
>  

Good solution, but shouldn't this rather be put into a separate commit,
simply extending git_header_html to allow adding extra data (no need
to name it $extraheader I think, $extra would be enough) to the HTML
header (the HEAD element contents)?

> @@ -4365,6 +4387,26 @@ sub git_search_grep_body {
>  	print "</table>\n";
>  }
>  
> +sub git_link_title {
> +	my $project=shift;
> +	
> +	my $description=git_get_project_description($project);
> +	return $project.(length $description ? " - $description" : "");
> +}

Style (whitespace around '='); also, IMHO "$project git repository"
is better than "$project - $description", not least because of the
default description template:
  "Unnamed repository; edit this file to name it for gitweb."

> +
> +# generates header with links to the specified projects
> +sub git_links_header {

Good abstraction, but I'm not so sure about the subroutine name.

> +	my $ret='';
> +	foreach my $project (@_) {

Style: I'd rather use named variables, like "my @projects = @_";
also, elsewhere we usually put spaces around '='.

> +		# rel=vcs-* microformat
> +		my $title=git_link_title($project);

Good abstraction.

> +		foreach my $url git_get_project_url_list($project) {
> +			$ret.=qq{<link rel="vcs-git" href="$url" title="$title"/>\n}

To be HTML-compatible, it is better to use

> +			$ret.=qq{<link rel="vcs-git" href="$url" title="$title" />\n}

(note the space before "/>").

> +		}
> +	}
> +	return $ret;
> +}
> +
>  ## ======================================================================
>  ## ======================================================================
>  ## actions
> @@ -4380,7 +4422,9 @@ sub git_project_list {
>  		die_error(404, "No projects found");
>  	}
>  
> -	git_header_html();
> +	my $extraheader=git_links_header(map { $_->{path} } @list);
> +
> +	git_header_html(undef, undef, $extraheader);
>  	if (-f $home_text) {
>  		print "<div class=\"index_include\">\n";
>  		insert_file($home_text);
> @@ -4405,8 +4449,10 @@ sub git_forks {
>  	if (!@list) {
>  		die_error(404, "No forks found");
>  	}
> +	
> +	my $extraheader=git_links_header(map { $_->{path} } @list);
>  
> -	git_header_html();
> +	git_header_html(undef, undef, $extraheader);
>  	git_print_page_nav('','');
>  	git_print_header_div('summary', "$project forks");
>  	git_project_list_body(\@list, $order);
> @@ -4468,14 +4514,14 @@ sub git_summary {
>  		print "<tr id=\"metadata_lchange\"><td>last change</td><td>$cd{'rfc2822'}</td></tr>\n";
>  	}
>  
> -	# use per project git URL list in $projectroot/$project/cloneurl
> -	# or make project git URL from git base URL and project name
>  	my $url_tag = "URL";
> -	my @url_list = git_get_project_url_list($project);
> -	@url_list = map { "$_/$project" } @git_base_url_list unless @url_list;
> -	foreach my $git_url (@url_list) {
> +	my $title=git_link_title($project);
> +	foreach my $git_url (git_get_project_url_list($project)) {
>  		next unless $git_url;
> -		print "<tr class=\"metadata_url\"><td>$url_tag</td><td>$git_url</td></tr>\n";
> +		print "<tr class=\"metadata_url\"><td>$url_tag</td><td>".
> +		      # rel=vcs-* microformat
> +		      "<a rel=\"vcs-git\" href=\"$git_url\" title=\"$title\">$git_url</a>".
> +		      "</td></tr>\n";
>  		$url_tag = "";
>  	}

Non-clickable hyperlink... hmmm...

>  
> -- 
> 1.5.6.5
> 
> 
> 
> -- 
> see shy jo

-- 
Jakub Narebski
Poland
ShadeHawk on #git

* Re: [PATCH] Get format-patch to show first commit after root commit
From: Junio C Hamano @ 2009-01-10  0:49 UTC (permalink / raw)
  To: Nathan W. Panike; +Cc: git
In-Reply-To: <1231536787-20685-1-git-send-email-nathan.panike@gmail.com>

"Nathan W. Panike" <nathan.panike@gmail.com> writes:

> Rework this patch to try to handle the case where one does
>
> git format-patch -n ...
>
> and n is a number larger than 1.

It is unclear what "this patch" is in the context of this proposed commit
message.

> git format-patch -1 e83c5163316f89bfbde
> ...

I do not think the current backward compatible behaviour to avoid
surprising the end user by creating a huge initial import diff is a
particularly good idea.

I do not see anything special you do for "one commit" case in your patch,
yet the proposed commit message keeps stressing "-1", which puzzles me.

Wouldn't it suffice to simply say something like:

    You need to explicitly ask for --root to obtain a patch for the root
    commit.  This may have been a good way to make sure that the user
    realizes that a patch from the root commit won't be applicable to a
    history with existing data, but we should assume the user knows what
    he is doing when the user explicitly specifies a range of commits that
    includes the root commit.

Perhaps there are other reasons, which I may not remember, why --root
is not the default, though.

> Signed-off-by: Nathan W. Panike <nathan.panike@gmail.com>
> ---
>  builtin-log.c |    3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
>
> diff --git a/builtin-log.c b/builtin-log.c
> index 4a02ee9..0eca15f 100644
> --- a/builtin-log.c
> +++ b/builtin-log.c
> @@ -975,6 +975,9 @@ int cmd_format_patch(int argc, const char **argv, const char *prefix)
>  		nr++;
>  		list = xrealloc(list, nr * sizeof(list[0]));
>  		list[nr - 1] = commit;
> +		if(!commit->parents){
> +			rev.show_root_diff=1;
> +		}

Three issues.

 - The "if(){" violates style by not having one SP before "(" and after ")",
   and surrounds a single statement with a needless { } pair.  You need one SP
   on each side of the = (assignment) as well.

 - Because rev.show_root_diff is a no-op for non-root commit anyway, I do not
   think you even want a conditional there.

 - It is bad style to muck with rev.* while it is actively used for
   iteration (note that the above part is in a while loop that iterates over
   &rev).

I think the attached would be a better patch.  We already have a
configuration variable that controls whether we show the patch for a
root commit by default, and we can reuse it here.  The configuration
defaults to true these days.

Because the code before the hunk must check, by looking at
rev.show_root_diff, whether the user said "--root commit" or just
"commit" on the command line, and behave quite differently in the two
cases, we cannot do this assignment before the command-line parsing
as other commands in the log family do.
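A deliberately simplified C model shows why the order matters. It assumes, as the hunk context suggests, that the code before the hunk handles one case specially: a single revision with no count limit and no --root is read as "git format-patch origin", meaning everything origin does not have. In every other case, get_revision() is left to do the usual traversal. The enum, struct and function names are hypothetical and do not come from builtin-log.c.

```c
#include <stdbool.h>

enum fp_mode {
	FP_SINCE_UPSTREAM,	/* "format-patch origin": what origin lacks */
	FP_TRAVERSE		/* "format-patch -22 HEAD" or "--root HEAD" */
};

struct fp_setup {
	enum fp_mode mode;
	bool show_root_diff;
};

/* Simplified stand-in for the check done after option parsing. */
static enum fp_mode classify(int nr_revs, int max_count, bool show_root_diff)
{
	if (nr_revs == 1 && max_count < 0 && !show_root_diff)
		return FP_SINCE_UPSTREAM;
	return FP_TRAVERSE;
}

static struct fp_setup format_patch_setup(int nr_revs, int max_count,
					  bool root_on_cmdline,
					  bool default_show_root)
{
	struct fp_setup s;

	/* classify using only what the user typed ... */
	s.mode = classify(nr_revs, max_count, root_on_cmdline);
	/* ... and only then fold in the configured default */
	s.show_root_diff = root_on_cmdline || default_show_root;
	return s;
}
```

If the default were applied first, a plain "git format-patch origin" would reach classify(1, -1, true) and wrongly select FP_TRAVERSE.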

 builtin-log.c |    8 ++++++++
 1 files changed, 8 insertions(+), 0 deletions(-)

diff --git c/builtin-log.c w/builtin-log.c
index 4a02ee9..2d2c111 100644
--- c/builtin-log.c
+++ w/builtin-log.c
@@ -935,6 +935,14 @@ int cmd_format_patch(int argc, const char **argv, const char *prefix)
 		 * get_revision() to do the usual traversal.
 		 */
 	}
+
+	/*
+	 * We cannot move this anywhere earlier because we do want to
+	 * know if --root was given explicitly from the comand line.
+	 */
+	if (default_show_root)
+		rev.show_root_diff = 1;
+
 	if (cover_letter) {
 		/* remember the range */
 		int i;

* Re: [PATCH] gitweb: support the rel=vcs microformat
From: Jakub Narebski @ 2009-01-10  0:03 UTC (permalink / raw)
  To: Joey Hess; +Cc: Giuseppe Bilotta, git
In-Reply-To: <20090107190238.GA3909@gnu.kitenet.net>

Joey Hess <joey@kitenet.net> writes:
> Joey Hess wrote:

> > Another approach would be to just memoize git_get_project_description
> > and git_get_project_url_list.
> 
> Especially since git_get_project_description is already called more than
> once for some pages.

Hmmm... this is an idea worth checking.
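The memoization idea can be sketched in C as a small per-path cache placed in front of an expensive lookup, analogous to the %project_descriptions hash in Joey's revised patch. Everything here is hypothetical. memo_lookup() uses a fixed 64-slot table where a real implementation would more likely use a hash table. fake_load() stands in for reading $GIT_DIR/description and counts how often it is called.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MEMO_SLOTS 64

/* The loader's result must stay valid for the life of the program. */
typedef const char *(*loader_fn)(const char *path);

struct memo_entry {
	char *path;		/* owned copy of the key */
	const char *value;	/* NULL ("no description") is cached too */
};

static struct memo_entry memo[MEMO_SLOTS];
static size_t memo_used;

static const char *memo_lookup(const char *path, loader_fn load)
{
	const char *value;
	size_t i, len;
	char *copy;

	for (i = 0; i < memo_used; i++)
		if (!strcmp(memo[i].path, path))
			return memo[i].value;	/* hit: no file read */

	value = load(path);			/* miss: do the work once */
	if (memo_used < MEMO_SLOTS) {
		len = strlen(path) + 1;
		copy = malloc(len);
		if (copy) {
			memcpy(copy, path, len);
			memo[memo_used].path = copy;
			memo[memo_used].value = value;
			memo_used++;
		}
	}
	return value;
}

/* Hypothetical stand-in for reading $GIT_DIR/description. */
static int load_calls;

static const char *fake_load(const char *path)
{
	load_calls++;
	return strcmp(path, "empty.git") ? "A demo project" : NULL;
}
```

The entries are never invalidated. In a long-running process, an edited description file would therefore not be noticed.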

-- 
Jakub Narebski
Poland
ShadeHawk on #git

* Re: [PATCH] gitweb: support the rel=vcs microformat
From: Jakub Narebski @ 2009-01-10  0:01 UTC (permalink / raw)
  To: Joey Hess; +Cc: Giuseppe Bilotta, git
In-Reply-To: <20090107184113.GA31795@gnu.kitenet.net>

Joey Hess <joey@kitenet.net> writes:
> Giuseppe Bilotta wrote:
> > Joey Hess <joey@kitenet.net> writes:

> > > Thanks for the feedback. There are some changes happening to the
> > > microformat that should make gitweb's job slightly easier, I'll respin
> > > the patch soon.
> > 
> > Let me know about this too, I very much like the idea of this microformat.
> 
> FYI, I've updated the microformat's page with the changes. The
> significant one for gitweb is that it can now be applied to <a> links.
> So on the project page, the display of the git URL could be converted to
> a link using the microformat, and there's no need to get the info
> earlier to put it in the header. Unfortunatly, the same can't be done to
> the project list page, unless it's changed to have "git" links as seen
> on vger.kernel.org's gitweb.

I'm not sure if making repository URLs into hyperlinks is a good
idea.  You cannot (should not) click on them in an ordinary web
browser; they are meant to be used by git (that is also an additional
reason why I am not so sure about the idea of 'git' links on the
projects_list page).

Besides, LINK elements in the page HEAD are meant mainly for machines;
I think it might be more important to add them there for machines,
even if the same URLs also appear elsewhere as A elements (links) or
just as plain-text URLs.  For example, we have LINK elements for
alternate versions, among others OPML for projectless pages and
RSS/Atom for project pages, even though those links are also in the
page body.

So I'd rather have them as LINKs...
-- 
Jakub Narebski
Poland
ShadeHawk on #git

* Re: [PATCH] gitweb: support the rel=vcs microformat
From: Jakub Narebski @ 2009-01-09 23:56 UTC (permalink / raw)
  To: Giuseppe Bilotta; +Cc: git
In-Reply-To: <gk2794$djn$1@ger.gmane.org>

Giuseppe Bilotta <giuseppe.bilotta@gmail.com> writes:

> On Wednesday 07 January 2009 05:25, Joey Hess wrote:
> 
> > The rel=vcs microformat allows a web page to indicate the locations of
> > repositories related to it in a machine-parseable manner.
> > (See http://kitenet.net/~joey/rfc/rel-vcs/)
> 
> Interesting idea, I like it. However, I see a problem in the proposed
> implementation versus the spec. According to the spec:
> 
> """
> The "title" is optional, but recommended if there are multiple, different
> repositories linked to on one page. It is a human-readable description of the
> repository.
> [...]
> If there are multiple repositories listed, without titles, tools
> should assume they are different repositories.
> """

Good catch.

> 
> In this patch you do NOT add titles to the rel=vcs links, which means that
> everything works fine only if there is a single URL for each project. If a
> project has different URLs, it's going to appear multiple times as _different_
> projects to a spec-compliant reader.
> 
> A possible solution would be to make @git_url_list into a map keyed by the
> project name and having the description and repo URL(s) as values.
> 
> Since there is the possibility of different projects having the same
> description (e.g. the default one), the link title could be composed of
> "$project - $description" rather than simply $description.
> 
> Note that both in summary and in project list view you already retrieve the
> description, so there are no additional disk hits.

Wouldn't "$project git repository" (i.e. do not use description at
all) be a simpler, faster and also _better_ solution?
 

-- 
Jakub Narebski
Poland
ShadeHawk on #git

* Re: [PATCH] gitweb: support the rel=vcs microformat
From: Jakub Narebski @ 2009-01-09 23:49 UTC (permalink / raw)
  To: Joey Hess; +Cc: git
In-Reply-To: <20090107042518.GB24735@gnu.kitenet.net>

Joey Hess <joey@kitenet.net> writes:

> The rel=vcs microformat allows a web page to indicate the locations of
> repositories related to it in a machine-parseable manner.
> (See http://kitenet.net/~joey/rfc/rel-vcs/)

Here is an example from the above-mentioned page:

  <head>
  <link rel="vcs-git" href="git://example.org/foo.git" 
        title="foo git repository" />
  </head>

  <a rel="vcs-git" href="git://example.org/foo.git" 
     title="git repository">git://example.org/foo.git</a>
  <a rel="vcs-git" href="git://example.org/foo.git">git repository</a>

There is one problem that the above microformat does not solve, but it
is a problem only for git hosting sites like repo.or.cz or GitHub:
it does not allow distinguishing between a fetch (read) link and a
push (write, publish) link.  This is not a problem for standard
(unmodified) gitweb, as it shows only read-only git repository links.

We also have to decide what to put in the 'title' attribute; I think
the simplest would be to put "$project git repository" or something
(for example "git/git.git git repository").
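A minimal C sketch of producing such a LINK element with the title built as "$project git repository", as proposed above. `format_vcs_git_link()` and `html_escape()` are hypothetical helpers, not gitweb code; escaping the attribute values is an assumption added here because project names and URLs may contain characters such as `&` or `"`.

```c
#include <stdio.h>
#include <string.h>

/* Copy s into out (capacity cap), escaping characters special in HTML attributes. */
static void html_escape(char *out, size_t cap, const char *s)
{
	size_t len = 0;

	out[0] = '\0';
	for (; *s; s++) {
		char one[2] = { *s, '\0' };
		const char *rep = one;
		size_t n;

		switch (*s) {
		case '&': rep = "&amp;"; break;
		case '<': rep = "&lt;"; break;
		case '>': rep = "&gt;"; break;
		case '"': rep = "&quot;"; break;
		}
		n = strlen(rep);
		if (len + n >= cap)
			break;                      /* truncate rather than overflow */
		memcpy(out + len, rep, n + 1);  /* also copies the terminating NUL */
		len += n;
	}
}

/* Hypothetical helper: <link rel="vcs-git" href="URL" title="PROJECT git repository" /> */
static void format_vcs_git_link(char *out, size_t cap,
				const char *project, const char *url)
{
	char esc_url[512], esc_project[512];

	html_escape(esc_url, sizeof(esc_url), url);
	html_escape(esc_project, sizeof(esc_project), project);
	snprintf(out, cap,
		 "<link rel=\"vcs-git\" href=\"%s\" title=\"%s git repository\" />",
		 esc_url, esc_project);
}
```

Using the project path rather than the description keeps the title unique per repository without any extra disk access.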

One thing I worry about is that those links (or at least some of them)
are not meant for the browser to open; also, the SCP/SSH-like syntax
for the SSH protocol, in the form 'user@host:/path/to/repo.git/',
does not follow URL rules.
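A tool consuming rel="vcs-git" links might need to tell such scp-like locations apart from real URLs. The sketch below uses a heuristic that, as far as I know, resembles what git itself does (no "://", and a ':' that appears before any '/'); `is_scp_like()` is a hypothetical name, and Windows drive prefixes are deliberately ignored.

```c
#include <string.h>

/* Returns 1 if s looks like scp-style "[user@]host:path" rather than a URL or local path. */
static int is_scp_like(const char *s)
{
	const char *colon, *slash;

	if (strstr(s, "://"))
		return 0;                  /* scheme://..., a real URL */
	colon = strchr(s, ':');
	if (!colon)
		return 0;                  /* plain local path */
	slash = strchr(s, '/');
	return !slash || colon < slash; /* ':' must come before any '/' */
}
```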

> 
> Make gitweb use the microformat in the header of pages it generates,
> if it has been configured with project url information in any of the usual
> ways.

There are two somewhat separate issues here: marking existing and future
URLs (current project fetch URLs, which IIRC are not hyperlinked now;
planned/future 'git' links on the project list page; perhaps also links
in OPML and RSS/Atom feeds) with 'rel="vcs-git"', and adding <link .../>
elements to the page header.

> 
> Since getting the urls can require hitting disk, I avoided putting the
> microformat on *every* page gitweb generates. Just put it on the project
> summary page, the project list page, and the forks list page.
>
> The first of these already looks up the urls, so adding the microformat was
> free. 

I assume that this patch is only about adding <link ... /> elements to
the page head?  I think in the case of the 'summary' view for a project
it is an excellent idea (similar to having 'prev' and 'next' link
elements in a chaptered online book in HTML), and it would allow
automation using gitweb as a kind of service announcement.

> There is a small overhead in including the microformat on the latter
> two pages [projects list and list of forks], but getting the project
> descriptions for those pages already incurs a similar overhead, and
> the ability to get every repo url in one place seems worthwhile.

There is also OPML, which might be worth checking.

By the way, for the 'projects_list' and 'forks' actions we have to
decide whether to show _all_ links for each project (there can be more
than one), or only some main git link (as in the case of the proposed
'git' link).  And whether we trust @git_base_url_list, or take it as a
default and examine per-repository configuration (more costly).

What is more important: the 'project_list' page is already overly large
when hosting a very large number of repositories (there were some
patches adding pagination for 'project_list', and perhaps they will
be resent).  Adding <link .../> elements would only add to its size;
and if the page is divided into pages, we would also have to take that
into account.

> 
> This changes git_get_project_description() to not check wantarray, and only
> return in list context -- the only way it is used AFAICS.

Errr... what? Why do you change the git_get_project_description()
subroutine? I don't think it would be a good source for the 'title'
attribute; perhaps for a 'desc' attribute, and only after sanitizing
"Unnamed repository; edit this file to name it for gitweb."

Errata: ah, it is the git_get_project_url_list() subroutine...

> 
> Signed-off-by: Joey Hess <joey@gnu.kitenet.net>
> ---
>  gitweb/gitweb.perl |   38 ++++++++++++++++++++++++++------------
>  1 files changed, 26 insertions(+), 12 deletions(-)
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 99f71b4..3f8a228 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -789,6 +789,9 @@ $git_dir = "$projectroot/$project" if $project;
>  our @snapshot_fmts = gitweb_get_feature('snapshot');
>  @snapshot_fmts = filter_snapshot_fmts(@snapshot_fmts);
>  
> +# populated later with git urls for the project
> +our @git_url_list;
> +

I'm not sure why this has to be global, but I assume that you want to
avoid recalculating it in git_header_html.

>  # dispatch
>  if (!defined $action) {
>  	if (defined $hash) {
> @@ -2100,17 +2103,22 @@ sub git_show_project_tagcloud {
>  }
>  
>  sub git_get_project_url_list {
> +	# use per project git URL list in $projectroot/$path/cloneurl
> +	# or make project git URL from git base URL and project name

I think I'd rather use a separate subroutine for the second case.

>  	my $path = shift;
>  
> +	my @ret;
> +
>  	$git_dir = "$projectroot/$path";
> -	open my $fd, "$git_dir/cloneurl"
> -		or return wantarray ?
> -		@{ config_to_multi(git_get_project_config('url')) } :
> -		   config_to_multi(git_get_project_config('url'));
> -	my @git_project_url_list = map { chomp; $_ } <$fd>;
> -	close $fd;
> +	if (open my $fd, "$git_dir/cloneurl") {
> +		@ret = map { chomp; $_ } <$fd>;
> +		close $fd;
> +	}
> +	else {

Style: "} else {"

> +	       @ret = @{ config_to_multi(git_get_project_config('url')) };
> +	}
>  
> -	return wantarray ? @git_project_url_list : \@git_project_url_list;
> +	return @ret ? @ret : map { "$_/$project" } @git_base_url_list;
>  }

Hmmm... currently gitweb does it in the caller:

	my @url_list = git_get_project_url_list($project);
	@url_list = map { "$_/$project" } @git_base_url_list unless @url_list;

Why do you want to put this in git_get_project_url_list()? Please
explain (here and in the commit message too; the commit message has to
mention that you change the semantics a bit, and explain why you did
so).
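Wherever the fallback ends up living, the rule itself is small. A C sketch of it, assuming a hypothetical `build_url_list()` that receives the per-project URLs (from cloneurl or the url config) and the base URL list: a non-empty per-project list is used as-is, otherwise "/$project" is appended to each base URL, which is what the current caller code above does.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Portable string duplicate (strdup is POSIX, not ISO C). */
static char *copy_string(const char *s)
{
	size_t len = strlen(s) + 1;
	char *d = malloc(len);

	if (d)
		memcpy(d, s, len);
	return d;
}

/*
 * Hypothetical sketch: fill out[] (capacity max) with the project's URLs.
 * Per-project URLs win; otherwise derive "<base>/<project>" from each base URL.
 * Returns the number of entries written; the caller frees them.
 */
static size_t build_url_list(char **out, size_t max,
			     const char *const *project_urls, size_t n_project,
			     const char *const *base_urls, size_t n_base,
			     const char *project)
{
	size_t i, n = 0;

	if (n_project > 0) {
		for (i = 0; i < n_project && n < max; i++) {
			char *u = copy_string(project_urls[i]);
			if (!u)
				break;
			out[n++] = u;
		}
		return n;
	}
	for (i = 0; i < n_base && n < max; i++) {
		size_t len = strlen(base_urls[i]) + 1 + strlen(project) + 1;
		char *u = malloc(len);
		if (!u)
			break;
		snprintf(u, len, "%s/%s", base_urls[i], project);
		out[n++] = u;
	}
	return n;
}
```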

>  
>  sub git_get_projects_list {
> @@ -2953,6 +2961,10 @@ EOF

Sidenote: this should be

  @@ -2953,6 +2961,10 @@ sub git_header_html {

but I'm not sure if it would be possible to automate...

>  		print qq(<link rel="shortcut icon" href="$favicon" type="image/png" />\n);
>  	}
>  
> +	foreach my $url (@git_url_list) {
> +		print qq{<link rel="vcs" type="git" href="$url" />\n};
> +	}
> +

Errr... on the page mentioned above (http://kitenet.net/~joey/rel-vcs/) it is

  <link rel="vcs-git" href="$url" title="$project git repository" />

and not

  <link rel="vcs" type="git" href="$url" />

Besides, the 'type' attribute for A and LINK elements is about the
advisory content type of the document the link points to (HTML 4.01):

 type = content-type [CI]
    This attribute gives an advisory hint as to the content type of
    the content available at the link target address. It allows user
    agents to opt to use a fallback mechanism rather than fetch the
    content if they are advised that they will get content in a
    content type they do not support.  Authors who use this attribute
    take responsibility to manage the risk that it may become
    inconsistent with the content available at the link target
    address.  
    For the current list of registered content types, please consult
    [MIMETYPES].

>  	print "</head>\n" .
>  	      "<body>\n";
>  
> @@ -4380,6 +4392,8 @@ sub git_project_list {
>  		die_error(404, "No projects found");
>  	}
>  
> +	@git_url_list = map { git_get_project_url_list($_->{path}) } @list;
> +
>  	git_header_html();
>  	if (-f $home_text) {
>  		print "<div class=\"index_include\">\n";
> @@ -4400,6 +4414,8 @@ sub git_forks {
>  	if (defined $order && $order !~ m/none|project|descr|owner|age/) {
>  		die_error(400, "Unknown order parameter");
>  	}
> +	
> +	@git_url_list = map { git_get_project_url_list($_->{path}) } @list;
>  
>  	my @list = git_get_projects_list($project);
>  	if (!@list) {

Those two are pretty straightforward, but please note that
'project_list' view (action) might be _already_ too large...

> @@ -4457,6 +4473,8 @@ sub git_summary {
>  		@forklist = git_get_projects_list($project);
>  	}
>  
> +	@git_url_list = git_get_project_url_list($project);
> +
>  	git_header_html();
>  	git_print_page_nav('summary','', $head);
>  
> @@ -4468,12 +4486,8 @@ sub git_summary {
>  		print "<tr id=\"metadata_lchange\"><td>last change</td><td>$cd{'rfc2822'}</td></tr>\n";
>  	}
>  
> -	# use per project git URL list in $projectroot/$project/cloneurl
> -	# or make project git URL from git base URL and project name
>  	my $url_tag = "URL";
> -	my @url_list = git_get_project_url_list($project);
> -	@url_list = map { "$_/$project" } @git_base_url_list unless @url_list;
> -	foreach my $git_url (@url_list) {
> +	foreach my $git_url (@git_url_list) {
>  		next unless $git_url;
>  		print "<tr class=\"metadata_url\"><td>$url_tag</td><td>$git_url</td></tr>\n";
>  		$url_tag = "";
> -- 
> 1.5.6.5

This is also pretty straightforward: it moves the calculation earlier so
that the results can be shared with git_header_html (using a global
variable).

-- 
Jakub Narebski
Poland
ShadeHawk on #git

* [PATCH 2/2] grep: don't call regexec() for fixed strings
From: René Scharfe @ 2009-01-09 23:18 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Junio C Hamano
In-Reply-To: <4967D8F8.9070508@lsrfire.ath.cx>

Add the new flag "fixed" to struct grep_pat and set it if the pattern
doesn't contain any regex control characters, in addition to when the
flag -F/--fixed-strings was specified.

This gives a nice speed up on msysgit, where regexec() seems to be
extra slow.  Before (best of five runs):

	$ time git grep grep v1.6.1 >/dev/null

	real    0m0.552s
	user    0m0.000s
	sys     0m0.000s

	$ time git grep -F grep v1.6.1 >/dev/null

	real    0m0.170s
	user    0m0.000s
	sys     0m0.015s

With the patch:

	$ time git grep grep v1.6.1 >/dev/null

	real    0m0.173s
	user    0m0.000s
	sys     0m0.000s

The difference is much smaller on Linux, but still measurable.
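The idea can be illustrated with a standalone sketch that does not reuse grep.c's helpers. The metacharacter set given to strpbrk() is an assumption approximating the patch's isregexspecial() (glob specials plus $ ( ) + . ^ { |), and case-insensitive searches go through regexec(), as the patch does for REG_ICASE. `pattern_is_fixed()` and `line_matches()` are hypothetical names.

```c
#include <regex.h>
#include <string.h>

/* Assumed metacharacter set, approximating isregexspecial() in the patch. */
static const char regex_specials[] = "\\[*?$()+.^{|";

static int pattern_is_fixed(const char *pat)
{
	return strpbrk(pat, regex_specials) == NULL;
}

/* Returns 1 if line matches pat; uses strstr() when no regex is needed. */
static int line_matches(const char *line, const char *pat, int icase)
{
	regex_t re;
	int hit;

	if (!icase && pattern_is_fixed(pat))
		return strstr(line, pat) != NULL;  /* fast path: plain substring search */

	if (regcomp(&re, pat, REG_NOSUB | (icase ? REG_ICASE : 0)))
		return 0;                           /* treat a bad pattern as no match */
	hit = !regexec(&re, line, 0, NULL, 0);
	regfree(&re);
	return hit;
}
```

A real implementation would compile the regex once per pattern rather than per line, as grep.c does.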

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
---
 grep.c |   29 +++++++++++++++++++++++++----
 grep.h |    1 +
 2 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/grep.c b/grep.c
index 394703b..a1092df 100644
--- a/grep.c
+++ b/grep.c
@@ -28,9 +28,31 @@ void append_grep_pattern(struct grep_opt *opt, const char *pat,
 	p->next = NULL;
 }
 
+static int isregexspecial(int c)
+{
+	return isspecial(c) || c == '$' || c == '(' || c == ')' || c == '+' ||
+			       c == '.' || c == '^' || c == '{' || c == '|';
+}
+
+static int is_fixed(const char *s)
+{
+	while (!isregexspecial(*s))
+		s++;
+	return !*s;
+}
+
 static void compile_regexp(struct grep_pat *p, struct grep_opt *opt)
 {
-	int err = regcomp(&p->regexp, p->pattern, opt->regflags);
+	int err;
+
+	if (opt->fixed || is_fixed(p->pattern))
+		p->fixed = 1;
+	if (opt->regflags & REG_ICASE)
+		p->fixed = 0;
+	if (p->fixed)
+		return;
+
+	err = regcomp(&p->regexp, p->pattern, opt->regflags);
 	if (err) {
 		char errbuf[1024];
 		char where[1024];
@@ -159,8 +181,7 @@ void compile_grep_patterns(struct grep_opt *opt)
 		case GREP_PATTERN: /* atom */
 		case GREP_PATTERN_HEAD:
 		case GREP_PATTERN_BODY:
-			if (!opt->fixed)
-				compile_regexp(p, opt);
+			compile_regexp(p, opt);
 			break;
 		default:
 			opt->extended = 1;
@@ -314,7 +335,7 @@ static int match_one_pattern(struct grep_opt *opt, struct grep_pat *p, char *bol
 	}
 
  again:
-	if (!opt->fixed) {
+	if (!p->fixed) {
 		regex_t *exp = &p->regexp;
 		hit = !regexec(exp, bol, ARRAY_SIZE(pmatch),
 			       pmatch, 0);
diff --git a/grep.h b/grep.h
index 45a222d..5102ce3 100644
--- a/grep.h
+++ b/grep.h
@@ -30,6 +30,7 @@ struct grep_pat {
 	const char *pattern;
 	enum grep_header_field field;
 	regex_t regexp;
+	unsigned fixed:1;
 };
 
 enum grep_expr_node {
-- 
1.6.1

* [PATCH 1/2] grep -w: forward to next possible position after rejected match
From: René Scharfe @ 2009-01-09 23:08 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Junio C Hamano

grep -w accepts only matches between non-word characters.  If a match
from regexec() doesn't meet this criterion, grep continues its search
after the first character of that match.

We can be a bit smarter here and skip all positions that follow a word
character first, as they can't match our criteria.  This way we can
consume characters quite cheaply and don't need to special-case the
handling of the beginning of a line.
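The skipping rule can be sketched in isolation. To stay self-contained, this sketch finds candidate matches with strstr() instead of regexec(), and word_char() assumes the definition grep.c uses (alphanumeric or underscore); `find_word()` is a hypothetical name.

```c
#include <ctype.h>
#include <string.h>

static int word_char(char ch)
{
	return isalnum((unsigned char)ch) || ch == '_';
}

/* Returns the first whole-word occurrence of a non-empty needle in line, or NULL. */
static const char *find_word(const char *line, const char *needle)
{
	const char *bol = line;
	const char *eol = line + strlen(line);
	size_t len = strlen(needle);

	while (bol < eol) {
		const char *m = strstr(bol, needle);
		if (!m)
			return NULL;
		if ((m == line || !word_char(m[-1])) &&
		    (m + len == eol || !word_char(m[len])))
			return m;
		/* Rejected: skip every position that follows a word char,
		 * since no word match can start there. */
		bol = m + 1;
		while (bol < eol && word_char(bol[-1]))
			bol++;
	}
	return NULL;
}
```

Because every restart position follows a non-word character, the start-of-word check needs no separate "true beginning of line" flag.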

Here's a contrived example command on msysgit (best of five runs):

	$ time git grep -w ...... v1.6.1 >/dev/null

	real    0m1.611s
	user    0m0.000s
	sys     0m0.015s

With the patch it's quite a bit faster:

	$ time git grep -w ...... v1.6.1 >/dev/null

	real    0m1.179s
	user    0m0.000s
	sys     0m0.015s

More common search patterns will gain a lot less, but it's a nice clean
up anyway.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
---
 grep.c |   11 +++++++----
 1 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/grep.c b/grep.c
index 49e9319..394703b 100644
--- a/grep.c
+++ b/grep.c
@@ -294,7 +294,6 @@ static struct {
 static int match_one_pattern(struct grep_opt *opt, struct grep_pat *p, char *bol, char *eol, enum grep_context ctx)
 {
 	int hit = 0;
-	int at_true_bol = 1;
 	int saved_ch = 0;
 	regmatch_t pmatch[10];
 
@@ -337,7 +336,7 @@ static int match_one_pattern(struct grep_opt *opt, struct grep_pat *p, char *bol
 		 * either end of the line, or at word boundary
 		 * (i.e. the next char must not be a word char).
 		 */
-		if ( ((pmatch[0].rm_so == 0 && at_true_bol) ||
+		if ( ((pmatch[0].rm_so == 0) ||
 		      !word_char(bol[pmatch[0].rm_so-1])) &&
 		     ((pmatch[0].rm_eo == (eol-bol)) ||
 		      !word_char(bol[pmatch[0].rm_eo])) )
@@ -349,10 +348,14 @@ static int match_one_pattern(struct grep_opt *opt, struct grep_pat *p, char *bol
 			/* There could be more than one match on the
 			 * line, and the first match might not be
 			 * strict word match.  But later ones could be!
+			 * Forward to the next possible start, i.e. the
+			 * next position following a non-word char.
 			 */
 			bol = pmatch[0].rm_so + bol + 1;
-			at_true_bol = 0;
-			goto again;
+			while (word_char(bol[-1]) && bol < eol)
+				bol++;
+			if (bol < eol)
+				goto again;
 		}
 	}
 	if (p->token == GREP_PATTERN_HEAD && saved_ch)
-- 
1.6.1

* [PATCH] t7700: demonstrate misbehavior of 'repack -a' when local packs exist
From: Brandon Casey @ 2009-01-09 22:14 UTC (permalink / raw)
  To: git

The ability to "...fatten [the] local repository by packing everything that
is needed by the local ref into a single new pack, including things that are
borrowed from alternates"[1] is supposed to be provided by the '-a' or '-A'
options to repack when '-l' is not used, but there is a flaw.  For each
pack in the local repository without a .keep file, repack supplies a
--unpacked=<pack> argument to pack-objects.

The --unpacked option to pack-objects, with or without an argument, causes
pack-objects to ignore any object which is packed in a pack not mentioned
in an argument to --unpacked=.  So, if there are local packs, and
'repack -a' is called, then any objects which reside in packs accessible
through alternates will _not_ be packed.  If there are no local packs, then
no --unpacked argument will be supplied, and repack will behave as expected.
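A simplified model of the selection rule described above, as a C sketch. This is not pack-objects' code: `want_object()` is hypothetical, a NULL pack name stands for a loose object, and `unpacked_given` records whether any --unpacked option was passed at all.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model: once any --unpacked option is given, a packed
 * object is kept only if its pack is named by --unpacked=<pack>.
 * pack == NULL means the object is loose.
 */
static int want_object(const char *pack, const char *const *unpacked,
		       size_t n_unpacked, int unpacked_given)
{
	size_t i;

	if (!unpacked_given || !pack)
		return 1;
	for (i = 0; i < n_unpacked; i++)
		if (!strcmp(pack, unpacked[i]))
			return 1;
	return 0;   /* e.g. a pack reached through alternates */
}
```

With no local packs, no --unpacked option is passed, so objects from the alternate's packs are kept; that is the case where repack behaves as expected.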

[1] http://mid.gmane.org/7v8wrwidi3.fsf@gitster.siamese.dyndns.org

Signed-off-by: Brandon Casey <casey@nrlssc.navy.mil>
---


FYI: I won't be looking into a fix for this immediately. So if someone else
     has the time and the inclination, please be my guest. Also, you can thank
     a blown transformer, and lots of disk loss, for the discovery of this bug.

-brandon


 t/t7700-repack.sh |   19 +++++++++++++++++++
 1 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
index 3f602ea..f5682d6 100755
--- a/t/t7700-repack.sh
+++ b/t/t7700-repack.sh
@@ -69,5 +69,24 @@ test_expect_success 'packed obs in alt ODB are repacked even when local repo is
 	done
 '
 
+test_expect_failure 'packed obs in alt ODB are repacked when local repo has packs' '
+	rm -f .git/objects/pack/* &&
+	echo new_content >> file1 &&
+	git add file1 &&
+	git commit -m more_content &&
+	git repack &&
+	git repack -a -d &&
+	myidx=$(ls -1 .git/objects/pack/*.idx) &&
+	test -f "$myidx" &&
+	for p in alt_objects/pack/*.idx; do
+		git verify-pack -v $p | sed -n -e "/^[0-9a-f]\{40\}/p"
+	done | while read sha1 rest; do
+		if ! ( git verify-pack -v $myidx | grep "^$sha1" ); then
+			echo "Missing object in local pack: $sha1"
+			return 1
+		fi
+	done
+'
+
 test_done
 
-- 
1.6.1.76.gc123b

* git submodule merge madness
From: Ask Bjørn Hansen @ 2009-01-09 21:50 UTC (permalink / raw)
  To: git

Hi,

We've (again) replaced a few directories with submodules.  Man, it's  
madness!

The typical problem is that we get an error trying to merge a
"pre-submodule" branch into master:

	fatal: cannot read object 894c77319a18c4d48119c2985a9275c9f5883584 'some/sub/dir': It is a submodule!

Mark Levedahl wrote an example in July, but I don't think he got any
replies: http://marc.info/?l=git&m=121587851313303

Any ideas?  Is there something we can do?  I see a strong
correlation between adding a new submodule and the number of "git
sucks" messages on our internal IRC server.


  - ask

-- 
http://develooper.com/ - http://askask.com/

* [PATCH] Get format-patch to show first commit after root commit
From: Nathan W. Panike @ 2009-01-09 21:33 UTC (permalink / raw)
  To: git; +Cc: Nathan W. Panike

Rework this patch to try to handle the case where one does

git format-patch -n ...

and n is a number larger than 1.  Currently, the command

git format-patch -1 e83c5163316f89bfbde

in the git repository creates an empty file.  Instead, one is
forced to do

git format-patch -1 --root e83c5163316f89bfbde

This seems arbitrary.  This patch fixes this case, so that

git format-patch -1 e83c5163316f89bfbde

will produce an actual patch.

Signed-off-by: Nathan W. Panike <nathan.panike@gmail.com>
---
 builtin-log.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/builtin-log.c b/builtin-log.c
index 4a02ee9..0eca15f 100644
--- a/builtin-log.c
+++ b/builtin-log.c
@@ -975,6 +975,9 @@ int cmd_format_patch(int argc, const char **argv, const char *prefix)
 		nr++;
 		list = xrealloc(list, nr * sizeof(list[0]));
 		list[nr - 1] = commit;
+		if(!commit->parents){
+			rev.show_root_diff=1;
+		}
 	}
 	total = nr;
 	if (!keep_subject && auto_number && total > 1)
-- 
1.6.1.76.gc123b.dirty

* Re: 1.5.6.5 fails to clone git.kernel.org/[...]/rostedt/linux-2.6-rt
From: Tim Shepard @ 2009-01-09 21:08 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Johannes Schindelin, Daniel Barkalow, Miklos Vajna
In-Reply-To: <7vpriw26uo.fsf@gitster.siamese.dyndns.org>



Junio,

Thank you for your good explanation.

Also thanks to Miklos Vajna who also replied to suggest using git:// transport.


(Over 3 years ago I used git clone to get a copy of torvalds/linux-2.6.git
 using the rsync transport, and I copied the recipe I used then.  It has been
 working for "git pull" updates, and for one from-scratch re-clone
 I did sometime last year. I had no reason to suspect it was broken.)

This morning I re-started the clone using git:// transport and it worked OK.


			-Tim Shepard
			 shep@alum.mit.edu

* Re: [PATCH 0/3] Teach Git about the patience diff algorithm
From: Junio C Hamano @ 2009-01-09 20:53 UTC (permalink / raw)
  To: Johannes Schindelin
  Cc: Adeodato Simó, Linus Torvalds, Clemens Buchacher,
	Pierre Habouzit, davidel, Francis Galiegue, Git ML
In-Reply-To: <alpine.DEB.1.00.0901091405460.30769@pacific.mpi-cbg.de>

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> On Thu, 8 Jan 2009, Junio C Hamano wrote:
>
>> If we find the "common" context lines that have only blank and 
>> punctuation letters in Dscho output, turn each of them into "-" and "+", 
>> and rearrange them so that all "-" are together followed by "+", it will 
>> match Bzr output.
>
> So we'd need something like this (I still think we should treat curly 
> brackets the same as punctuation, and for good measure I just handled 
> everything that is not alphanumerical the same):

I meant by punctuation to include curlies (my wording may have been wrong
but from the example with " }" line it should have been obvious).

But I agree with both points Linus raised.  The criterion for picking what
to pretend is unmatching should be "small insignificant lines" (small goes
for both size and also the number of consecutive "insignificant" lines),
and the coalescing should be done to join blocks of consecutive changed
lines of a significant size (so you do not join two 1- or 2-line "changed
line" blocks by pretending that a 1-line unchanged insignificant line in
between them is unmatching).
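The criteria above could be prototyped roughly as follows. Everything here is an illustrative assumption, not xdiff code: a line counts as "insignificant" if it has no alphanumeric characters, `max_gap` and `min_block` are placeholder thresholds, and the later regrouping of '-' and '+' lines is not shown.

```c
#include <ctype.h>
#include <stddef.h>

/* Assumption: a line is "insignificant" if it has no alphanumeric characters. */
static int insignificant(const char *line)
{
	for (; *line; line++)
		if (isalnum((unsigned char)*line))
			return 0;
	return 1;
}

/*
 * changed[i] is 1 for a changed line, 0 for a common one.  A run of at most
 * max_gap common lines, all insignificant, is marked as changed when the
 * changed blocks on both sides have at least min_block lines each.
 * Returns how many lines were reclassified.
 */
static size_t coalesce(const char *const *lines, int *changed, size_t n,
		       size_t max_gap, size_t min_block)
{
	size_t i = 0, flipped = 0;

	while (i < n && !changed[i])            /* skip leading common lines */
		i++;
	while (i < n) {
		size_t a_end, gap_end, b_end, k;
		int all_insignificant = 1;

		for (a_end = i; a_end < n && changed[a_end]; a_end++)
			;                       /* block A: [i, a_end) */
		for (gap_end = a_end; gap_end < n && !changed[gap_end]; gap_end++)
			;                       /* common run: [a_end, gap_end) */
		for (b_end = gap_end; b_end < n && changed[b_end]; b_end++)
			;                       /* block B: [gap_end, b_end) */
		for (k = a_end; k < gap_end; k++)
			if (!insignificant(lines[k]))
				all_insignificant = 0;

		if (gap_end < n && gap_end - a_end <= max_gap &&
		    a_end - i >= min_block && b_end - gap_end >= min_block &&
		    all_insignificant) {
			for (k = a_end; k < gap_end; k++)
				changed[k] = 1;
			flipped += gap_end - a_end;
		}
		i = gap_end;                    /* block B becomes the next block A */
	}
	return flipped;
}
```

Requiring a minimum size on both neighbouring blocks is what keeps two tiny hunks from being glued together by a lone " }" line.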

* Re: [PATCH] Get format-patch to show first commit after root commit
From: Alexander Potashev @ 2009-01-09 20:29 UTC (permalink / raw)
  To: Nathan W. Panike; +Cc: git
In-Reply-To: <1231529725-19767-1-git-send-email-nathan.panike@gmail.com>

Hello!

I experienced this problem today while preparing a simple patch for a
reply in the "[PATCH 2/2] Use is_pseudo_dir_name everywhere" thread.
I used a workaround: add a file, commit, remove it, commit, add it once
again, commit and after all format-patch.

On 13:35 Fri 09 Jan     , Nathan W. Panike wrote:
> Currently, the command
> 
> git format-patch -1 e83c5163316f89bfbde
> 
> in the git repository creates an empty file.  Instead, one is
> forced to do
> 
> git format-patch -1 --root e83c5163316f89bfbde
> 
> This seems arbitrary.  This patch fixes this case, so that
> 
> git format-patch -1 e83c5163316f89bfbde

Your patch doesn't solve the problem if there are more than one commit
(say, 2 commits) and you run 'git format-patch -2'. Even with your patch
format-patch writes an empty patch file corresponding to the root commit
(actually, it creates 2 patches, but the first is empty).

Please, correct me if I'm wrong.

> 
> will produce an actual patch.
> ---
>  builtin-log.c |    2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
> 
> diff --git a/builtin-log.c b/builtin-log.c
> index 4a02ee9..5e7b61f 100644
> --- a/builtin-log.c
> +++ b/builtin-log.c
> @@ -977,6 +977,8 @@ int cmd_format_patch(int argc, const char **argv, const char *prefix)
>  		list[nr - 1] = commit;
>  	}
>  	total = nr;
> +	if (total == 1 && !list[0]->parents)
> +		rev.show_root_diff=1;
>  	if (!keep_subject && auto_number && total > 1)
>  		numbered = 1;
>  	if (numbered)
> -- 
> 1.6.1.76.gc123b.dirty
