Git development
 help / color / mirror / Atom feed
* Re: rev-list/tree committer/author information.
From: Petr Baudis @ 2005-05-16 22:19 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, Daniel Barkalow, Git Mailing List
In-Reply-To: <7vekc6onzc.fsf_-_@assigned-by-dhcp.cox.net>

Dear diary, on Mon, May 16, 2005 at 11:33:11PM CEST, I got a letter
where Junio C Hamano <junkio@cox.net> told me that...
> >>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:
> 
> LT> Anyway, everything I've read so far makes sense, and it
> LT> might make sense to continue git development using just
> LT> git-pb. The only thing I personally think sucks is the
> LT> author/committer matching of git-rev-list/tree, since it
> LT> would seem like somebody might well like to match on an
> LT> arbitrary part of a commit, and special-casing
> LT> author/committer seems somewhat broken.
> 
> Well, that author/committer thing is not in git-pb yet, if I am
> not mistaken [*1].
> 
> The only reason why I did it that way was because the strategy
> taken by "struct object" derivatives seemed to pick up bare
> absolute minimum to support actual callers that have immediate
> need for information stored in structural fields, as opposed to
> designing for helping yet to be written callers by adding fields
> to hold information of "having this might also help somebody in
> the future" type.  And the author and committer names are in the
> structured fields while signed-off-by and others are not.  Also
> when author / committer name strings are intern'ed like the way
> I did, the memory consumption even for a long sequence of
> commits are kept reasonably low.  However,...

I like Linus' suggestion. At the very least, what about making the
matching generic for the header? Something like --match-header or
whatever.

> *1* Petr has been applying quite good judgements. I would have
> polluted git-jc with that patch already if I were still running
> it.  So far, I have been generally happy with his acceptance
> criteria for external patches.  Anything he places on hold or
> just outright returns to me, I later find rooms for improvements
> myself, and the later rounds that eventually get accepted always
> turn out to be far cleaner, thanks to his comments.

Thanks for the praise :-), but I'm actually quite unhappy with putting
patches "on hold" and it's not intentional. It's just that I don't feel
right about the patch enough to apply it immediately, and I either don't
have time to voice my concerns if they are non-trivial, or I just want
some time to think about it.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

^ permalink raw reply

* Re: Summary of core GIT while you are away.
From: Petr Baudis @ 2005-05-16 22:22 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, braddr, nico, david, Git Mailing List
In-Reply-To: <Pine.LNX.4.58.0505160837080.28162@ppc970.osdl.org>

Dear diary, on Mon, May 16, 2005 at 06:10:10PM CEST, I got a letter
where Linus Torvalds <torvalds@osdl.org> told me that...
> The only question is whether you want to have it be human-readable by
> default (indent the message lines with a <tab>, and nonheaders with
> <tab>+4*<space> or something), and then have a "-z" flag to do the
> machine-readable version described above.

I'd prefer that, in order to be consistent with the diff tools and such.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

^ permalink raw reply

* Re: [PATCH 3/4] Implement git-checkout-cache -u to update stat information in the cache.
From: Junio C Hamano @ 2005-05-16 22:34 UTC (permalink / raw)
  To: Petr Baudis; +Cc: git, torvalds
In-Reply-To: <20050516220153.GA8609@pasky.ji.cz>

>>>>> "PB" == Petr Baudis <pasky@ucw.cz> writes:

PB> That reminds me, the patches would be a little easier for me to process
PB> if you followed the /dev/null convention.

Oops, thanks for spotting a bug in "jit-log -p" ;-).


^ permalink raw reply

* Re: rev-list/tree committer/author information.
From: Daniel Barkalow @ 2005-05-16 22:35 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, pasky, Git Mailing List
In-Reply-To: <7vekc6onzc.fsf_-_@assigned-by-dhcp.cox.net>

On Mon, 16 May 2005, Junio C Hamano wrote:

> The only reason why I did it that way was because the strategy
> taken by "struct object" derivatives seemed to pick up bare
> absolute minimum to support actual callers that have immediate
> need for information stored in structural fields, as opposed to
> designing for helping yet to be written callers by adding fields
> to hold information of "having this might also help somebody in
> the future" type.  And the author and committer names are in the
> structured fields while signed-off-by and others are not.  Also
> when author / committer name strings are intern'ed like the way
> I did, the memory consumption even for a long sequence of
> commits are kept reasonably low.  However,...

I think if we had a good reason to make author and committer structured,
there must be some reason we'll want them parsed out eventually, although
we may not have reached the point at which it makes sense to do it.

> The above implies to keep the unpacked raw data as a whole to be
> accessible to the callers for at least commit objects and if we
> go that route I think it would make more sense to do that
> uniformly for everything (probably except for pure "blob"
> objects for size concerns but we might as well do them while we
> are at it).  On the other hand, the current lifetime rules being
> what it is, that strategy may introduce memory consumption
> problems when working on a huge project.

Perhaps have a struct object field for the unpacked data, have it filled
by a unpack_object(struct object *) function, have the parse functions use
it if it's filled, and have a function to free it (and NULL the field).

So, if you want the actual contents, you'd call
unpack_object(&commit->obj) before parse_commit(commit), and
close_object(&commit_obj) when you're done, and the parse wouldn't
duplicate the work. The lifetime rules only really care that the return
value from lookup_*() stays the same for the duration of the program, not
that the fields in it don't get cleared (aside from the flags and the
hash).

This could be useful even for blobs; I'm working on a program to do
diff/merge on commits and everything under them (taking advantage of all
available information if the fast path leads to a conflict); it would be
nice not to have to go through the filesystem with the blob data.

	-Daniel
*This .sig left intentionally blank*


^ permalink raw reply

* Re: Mercurial 0.4e vs git network pull
From: Tristan Wibberley @ 2005-05-16 22:22 UTC (permalink / raw)
  To: git
In-Reply-To: <20050515124042.GE13024@pasky.ji.cz>

On Sun, 2005-05-15 at 14:40 +0200, Petr Baudis wrote:
> Dear diary, on Sun, May 15, 2005 at 01:22:19PM CEST, I got a letter
> where "Adam J. Richter" <adam@yggdrasil.com> told me that...
> > 
> > 	I don't understand what was wrong with Jeff Garzik's previous
> > suggestion of using http/1.1 pipelining to coalesce the round trips.
> > If you're worried about queuing too many http/1.1 requests, the client
> > could adopt a policy of not having more than a certain number of
> > requests outstanding or perhaps even making a new http connection
> > after a certain number of requests to avoid starving other clients
> > when the number of clients doing one of these transfers exceeds the
> > number of threads that the http server uses.
> 
> The problem is that to fetch a revision tree, you have to
> 
> 	send request for commit A
> 	receive commit A
> 	look at commit A for list of its parents
> 	send request for the parents
> 	receive the parents
> 	look inside for list of its parents
> 	...

What about IMAP? You could ask for just the parents for several messages
(via a message header), then start asking for message bodies (with the
juicy stuff in). You could also ask for a list of the new commits then
ask for each of the bodies (several at a time). Not as good as a "Just
give me all new data", but an *awful* lot more efficient than HTTP. And
very flexible. You just need to map changesets to IMAP messages (if such
a mapping can actually make sense :)

Prolly a bit more work though.

--
Tristan Wibberley

The opinions expressed in this message are my own opinions and not those
of my employer.



^ permalink raw reply

* Re: [PATCH 3/4] Implement git-checkout-cache -u to update stat information in the cache.
From: Linus Torvalds @ 2005-05-16 22:32 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Junio C Hamano, git
In-Reply-To: <20050516220153.GA8609@pasky.ji.cz>



On Tue, 17 May 2005, Petr Baudis wrote:
>
> Dear diary, on Sun, May 15, 2005 at 11:23:12PM CEST, I got a letter
> where Junio C Hamano <junkio@cox.net> told me that...
> > With -u flag, git-checkout-cache picks up the stat information
> > from newly created file and updates the cache.  This removes the
> > need to run git-update-cache --refresh immediately after running
> > git-checkout-cache.
> 
> I actually feel ok with this, but I wonder about Linus' opinion about
> it.  :-)

I don't think I mind any more.

My initial reluctance to do this was based on the fact that I wanted to 
avoid having something that updates things "both ways" (ie updates both 
the index file _and_ the checked-out stuff), and the largest reason for 
that was just worrying about stability.

However, it's not like we've been having major stability issues, afaik, 
and as long as the index file locking is honoured (which the patch seemed 
to do), I don't have any real issues with it.

		Linus

^ permalink raw reply

* Re: [PATCH 2/4] Tweak diff output further to make it a bit less distracting.
From: Junio C Hamano @ 2005-05-16 22:51 UTC (permalink / raw)
  To: Petr Baudis; +Cc: git, torvalds
In-Reply-To: <20050516220559.GC8609@pasky.ji.cz>

>>>>> "PB" == Petr Baudis <pasky@ucw.cz> writes:

PB> What about just throwing away the newlines and just passing '@.'?

Let's just drop the patch altogether unless anybody else has
better justification and pressing needs.

The current one is tolerable, except I do not like the word
"mode" very much.

    --- a/frotz
    +++ b/frotz
    @@ xxx @@
    + asdfasdf
    # mode: 100644 100755 nitfol
    --- a/nitfol
    +++ b/nitfol
    @@ yyy @@
    - asdfasdf
    + asdfasdfasdf
    --- a/rezrov
    +++ b/rezrov
    @@ zzz @@
     ...

This is what we would have got with the patch, which as you say
gives an illusion as if there should exist an empty line in
"frotz" and "nitfol", after the lines the hunks are applied to.
I should not have pushed it to begin with.

    --- a/frotz
    +++ b/frotz
    @@ xxx @@
    + asdfasdf

    @. 100644 100755 nitfol
    --- a/nitfol
    +++ b/nitfol
    @@ yyy @@
    - asdfasdf
    + asdfasdfasdf

    --- a/rezrov
    +++ b/rezrov
    @@ zzz @@
     ...



^ permalink raw reply

* Re: [PATCH 3/4] Implement git-checkout-cache -u to update stat information in the cache.
From: Junio C Hamano @ 2005-05-16 23:11 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Petr Baudis, git
In-Reply-To: <Pine.LNX.4.58.0505161530230.18337@ppc970.osdl.org>

>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:

LT> ... and as long as the index file locking is honoured
LT> (which the patch seemed to do) ...

Yes, it does.  But that was one thing I was not very sure about
this patch.  I am reasonably sure this "index.c" conflicts with
the libification effort, and when the libification finishes, I
feel that what "index.c" does should be part of it.


^ permalink raw reply

* Re: rev-list/tree committer/author information.
From: Junio C Hamano @ 2005-05-16 23:17 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: Linus Torvalds, pasky, Git Mailing List
In-Reply-To: <Pine.LNX.4.21.0505161747290.30848-100000@iabervon.org>

>>>>> "DB" == Daniel Barkalow <barkalow@iabervon.org> writes:

DB> Perhaps have a struct object field for the unpacked data,...
DB> ..., and the parse wouldn't duplicate the work.

Absolutely.  That is what I wanted to see ever since I started
looking at "struct object" derivatives.


^ permalink raw reply

* [PATCH 0/2] Introducing git-run-with-user-path program.
From: Junio C Hamano @ 2005-05-16 23:21 UTC (permalink / raw)
  To: pasky, torvalds; +Cc: git
In-Reply-To: <7vu0l3puzg.fsf@assigned-by-dhcp.cox.net>

I've polished the driver part a bit more, so that it can do
something similar to "git-ls-files --others" can do, but with
the (to-be-defined) "ignore rules Porcelain layers would agree
upon". 

 [PATCH 1/2] Introduce git-run-with-user-path helper program.
 [PATCH 2/2] Add sample ignore logic to git-run-with-user-path command.

The first one adds a path canonicalization helper with path
ignore hooks but no ignore logic implementation (it passes
everything that passes verify_path()).  The second one adds a
sample ignore logic implementation using PCRE.

Although the second one is done primarily as an example and to
start a mailing list discussion, it should be also safe to merge
if you decide to take patch 1, because the logic is used only by
git-run-with-user-path which is a new program. no Porcelain uses
right now.


^ permalink raw reply

* Re: [PATCH 2/4] Tweak diff output further to make it a bit less distracting.
From: Linus Torvalds @ 2005-05-16 23:28 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Petr Baudis, git
In-Reply-To: <7vsm0mn5s1.fsf@assigned-by-dhcp.cox.net>



On Mon, 16 May 2005, Junio C Hamano wrote:

>     # mode: 100644 100755 nitfol
>     --- a/nitfol
>     +++ b/nitfol

>     @. 100644 100755 nitfol
>     --- a/nitfol
>     +++ b/nitfol

I have to say, I muct prefer the first over the second.

I don't know why people think "line noise" means "computer readable".  To
me, "@." look slike line noise, and worse, it's clearly _less_
disambiguous than spelling out "mode". The fact that it (on purpose, I
assume) looks somewhat like the "@@" beginning of a patch hunk for the
_previous_ file makes it even worse.

Terseness is filne, and thus maybe it could be just

	# 100644 100755 nitfol

but on the other hand that doesn't really have any advantages either. 
Except being better than the inexplicable "@."

Me, I'd prefer making it clear whether the file is "new", "removed" or
"changed". That's really what matters, and at least the "new" case does
need the mode (while the "removed" case does not). So instead of talking
about "mode", which is largely irrelevant (the important thing is to
indicate whether it's new or old), we should talk about how the file has
changed.

There's another issue I noticed with the current default 'diff' output: I 
often (almost always) want to have a way to go to "next file", which in 
traditional diffs I just do with "/^diff" when paging with 'less'. The 
current one doesn't have a way to do that - searching for '^--- ' comes 
closest, but is ambiguous, since it might be a part of a _patch_ that 
contains a line that got removed and that started with "-- ".

So I'd actually prefer some output format that fixed that thing too, by 
having a header for each file.

Making the header be "diff a/file b/file" would make the thing look the 
same as a regular diff, and that also makes it disambiguous (and thus easy 
for machines to parse) to have a _second_ or third line for things like 
mode change information.

So my preferred format would actually be

	diff -git a/filename b/filename
	<optional extended header lines>
	--- a/filename
	+++ b/filename
	@@ -xx ....

where the "optional extended header lines" would be very plain, like

	old mode 100644
	new mode 100755

or

	new file mode 100644

or

	deleted file mode 100644

or something like that.

In fact anything _except_ for something that starts with "-" "+" " " or
"@"  which have special meanings inside diffs (whether you want the "mode"
for the deleted file case or not is up to you - it could be an added
sanity check that the diff actually matches, but on the other hand there's
really nothing you could do anyway except warn if the mode didn't match,
so..)

This makes it very easy to parse mechanically: look for a line that starts
with "diff -git " (which cannot be part of the "meat" of the patch), and
then you know that you've found the start of an extended patch.

Why the "-git"? A normal patch header already looks something like the 
following (head of the pre-git 2.6.12-rc1 patch):

	diff -Nru a/CREDITS b/CREDITS
	--- a/CREDITS   2005-03-17 17:35:10 -08:00
	+++ b/CREDITS   2005-03-17 17:35:10 -08:00
	@@ -34,8 +34,9 @@
	 E: airlied@linux.ie
	 W: http://www.csn.ul.ie/~airlied
	 D: NFS over TCP patches
	-S: University of Limerick
	-S: Ireland
	+D: in-kernel DRM Maintainer

so the "-git" thing there is equivalent to the "-Nru" part, telling how
the diff was generated, and being an added hint that we may have extended
headers before the actual patch (which wouldn't be sensible in a non-git
patch).

One final note: I actually think that "rename patches" make a ton of 
sense, even if git itself doesn't track renames. If we ever have a "smart 
diff" thing that can generate inter-file diffs, I'd like to eventually see

	diff -git a/kernel/sched.c b/kernel/sched.c.old
	rename kernel/sched.c kernel/sched.c.old
	old mode 100644
	new mode 100755
	--- a/kernel/sched.c
	+++ b/kernel/sched.c.old
	@@ -1,5 +1,5 @@
	 /*
	- *  kernel/sched.c
	+ *  kernel/sched.c.old
	  *
	  *  Kernel scheduler and related syscalls
	  *

Notice? We could have a mode change, a rename _and_ a content change, all
at the same time under the same header. That's obviously a totally idiotic
example, but the point is that if we have a nice "extended diff header"
setup, the format is very easily able to accomodate things like this.

And it's both human-readable _and_ automatically parseable. And my old 
"/^diff " thing would still work, and still find each new entry, so I'd 
not have to teach my old fingers new tricks.

(This is, btw, the reason for the format of "git-diff-tree -v --stdin", in 
case anybody wondered. Take a look, and notice how piping the output to 
"less" and then doing "/^diff-tree " gives the expected results. Same 
deal. Human-readable _and_ machine-parseable at the same time!).

		Linus

^ permalink raw reply

* Re: [PATCH 1/2] Introduce git-run-with-user-path helper program.
From: Junio C Hamano @ 2005-05-16 23:40 UTC (permalink / raw)
  To: pasky, torvalds; +Cc: git
In-Reply-To: <7v8y2en4e8.fsf@assigned-by-dhcp.cox.net>

A new command git-run-with-user-path takes a command and paths
that are filesystem paths (either relative to the cwd or
absolute pathname).  It canonicalizes these paths to be usable
by the core GIT commands, filters using the ignore pattern rule,
chdir(2)'s to the top level of the tree and runs the given
command with these canonicalizd paths as its arguments.

This version contains necessary hooks to implement the ignore
pattern rule, but it does not implement any ignore pattern
rules, waiting for more mailing list discussions.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---

Documentation/git-run-with-user-path.txt |  100 +++++++++++++++
Makefile                                 |    7 -
paths.c                                  |  199 +++++++++++++++++++++++++++++++
paths.h                                  |   14 ++
run-with-user-path.c                     |   87 +++++++++++++
t/README                                 |    1 
t/t7000-git-run-with-user-path-basic.sh  |   87 +++++++++++++
update-cache.c                           |   29 ----
8 files changed, 495 insertions(+), 29 deletions(-)
Documentation/git-run-with-user-path.txt (. --> 100644)
paths.c (. --> 100644)
paths.h (. --> 100644)
run-with-user-path.c (. --> 100644)
t/t7000-git-run-with-user-path-basic.sh (. --> 100755)

--- /dev/null
+++ b/Documentation/git-run-with-user-path.txt
@@ -0,0 +1,100 @@
+git-run-with-user-path(1)
+=========================
+v0.1, May 2005
+
+NAME
+----
+git-run-with-user-path - Run command from the top after canonicalizing paths.
+
+
+SYNOPSIS
+--------
+'git-run-with-user-path' [options] <command> <argument>... '--' <path>...
+'git-run-with-user-path' [options] ( '--echo-z' | '--echo' ) <path>...
+
+DESCRIPTION
+-----------
+In the first form, this command takes a <command>, zero or more
+<argument> and zero or more <path> arguments.  <path> arguments name
+objects on the filesystem, <command> is typically a core GIT command,
+and <argument> are the initial arguments to the <command>.
+
+It first finds the project top directory (the directory that corresponds
+to the top of the tree structure GIT_INDEX_FILE describes).  When the
+environment variable GIT_PROJECT_TOP is set, the value of the variable
+is used.  Then the <path> parameters are canonicalized to be relative to
+the project top.  It then chdir(2)'s to the project top directory and
+runs the given <command>, with <argument> and these canonicalized <path>
+arguments.
+
+This is useful for the Porcelain layer to run core GIT commands from
+subdirectories.  For example, if linux-2.6.git tree is checked out in
+/usr/src/linux, you can do:
+
+    $ cd /usr/src/linux/fs
+    $ ... work in fs directory making changes ...
+    $ git-run-with-user-path git-diff-tree -r HEAD -- ext? ../include/linux
+    $ find ext? ../include/linux ! -type d -print0 |
+      xargs -0 git-run-with-user-path git-update-cache --add -- --
+
+The above is roughly equivalent to:
+
+    $ cd /usr/src/linux
+    $ git-diff-tree -r HEAD fs/ext? include/linux
+    $ find fs/ext? include/linux ! -type d -print0 |
+      xargs git-update-cache --add --
+
+In the second form, this command just canonicalizes and filters paths,
+and writes the resulting paths to the standard output.  '--echo'
+separates each output record with a newline while '--echo-z' uses NUL as
+the record separator.
+
+    $ find ext? ../include/linux ! -type d -print0 |
+      xargs -0 git-run-with-user-path --echo --show-ignore
+
+This example gives you the list of ignored files, one per line.  It is
+roughly equivalent to:
+
+    $ find ext? ../include/linux ! -type d -print0 |
+      xargs -0 -n1 git-run-with-user-path --show-ignore echo --
+
+
+OPTIONS
+-------
+
+--no-ignore::
+
+	By default, the path arguments are filtered with the same ignore
+	rules Porcelain layers use.  With --no-ignore flag, there is no
+	such filtering done.
+
+--show-ignore::
+
+	This causes the filter to be applied in reverse.  Only the paths
+	ignored by the ignore rules are passed to the command (or
+	output).
+
+
+ENVIRONMENT VARIABLES
+---------------------
+
+'GIT_PROJECT_TOP'::
+	If the 'GIT_PROJECT_TOP' environment variable is set then it
+	specifies the directory that corresponds to the top level of the
+	tree structure GIT_INDEX_FILE describes.  When this environment
+	variable is not defined, the closest parent directory that has
+	.git/ subdirectory in it is looked for and used.
+
+
+Author
+------
+Written by Junio C Hamano <junkio@cox.net>
+
+Documentation
+--------------
+Documentation by Junio C Hamano.
+
+GIT
+---
+Part of the link:git.html[git] suite
+
--- a/Makefile
+++ b/Makefile
@@ -28,7 +28,7 @@
 	git-unpack-file git-export git-diff-cache git-convert-cache \
 	git-http-pull git-rpush git-rpull git-rev-list git-mktag \
 	git-diff-helper git-tar-tree git-local-pull git-write-blob \
-	git-get-tar-commit-id
+	git-get-tar-commit-id git-run-with-user-path
 
 all: $(PROG)
 
@@ -46,6 +46,9 @@
 LIB_H += diff.h
 LIB_OBJS += diff.o
 
+LIB_H += paths.h
+LIB_OBJS += paths.o
+
 LIB_OBJS += gitenv.o
 
 LIBS = $(LIB_FILE)
@@ -100,6 +103,7 @@
 git-rpush: rsh.c
 git-rpull: rsh.c pull.c
 git-rev-list: rev-list.c
+git-run-with-user-path: run-with-user-path.c 
 git-mktag: mktag.c
 git-diff-helper: diff-helper.c
 git-tar-tree: tar-tree.c
@@ -117,6 +121,7 @@
 sha1_file.o: $(LIB_H)
 usage.o: $(LIB_H)
 diff.o: $(LIB_H)
+paths.o: $(LIB_H)
 strbuf.o: $(LIB_H)
 gitenv.o: $(LIB_H)
 
--- /dev/null
+++ b/paths.c
@@ -0,0 +1,199 @@
+/*
+ * Copyright (c) 2005 Junio C Hamano
+ */
+#include <string.h>
+#include "cache.h"
+#include "paths.h"
+
+/****************************************************************/
+
+/* Ignore list handling part */
+
+/*
+ * We fundamentally don't like some paths: we don't want
+ * dot or dot-dot anywhere, and in fact, we don't even want
+ * any other dot-files (.git or anything else). They
+ * are hidden, for chist sake.
+ *
+ * Also, we don't want double slashes or slashes at the
+ * end that can make pathnames ambiguous.
+ */
+int verify_path(const char *path)
+{
+	char c;
+
+	goto inside;
+	for (;;) {
+		if (!c)
+			return 1;
+		if (c == '/') {
+inside:
+			c = *path++;
+			if (c != '/' && c != '.' && c != '\0')
+				continue;
+			return 0;
+		}
+		c = *path++;
+	}
+}
+
+static int initialize_ignore_list(void)
+{
+	/* Put the Porcelain layer ignore logic initialization here.
+	 * Return non-zero after issuing appropriate error message
+	 * if initialization fails.
+	 */
+	return 0;
+}
+
+int path_ignored(const char *path)
+{
+	if (!verify_path(path))
+		return 1;
+
+	/* Put the Porcelain layer ignore logic here.
+	 * Return non-zero if path is to be ignored.
+	 */
+	return 0;
+}
+
+
+/****************************************************************/
+
+/* Path canonicalization part */
+
+char *git_project_top = NULL;
+static char git_cwd[PATH_MAX];
+
+static int find_project_top(void)
+{
+	char path[PATH_MAX];
+	int dir_length;
+
+	if (!getcwd(git_cwd, sizeof(git_cwd)))
+		return error("cannot get cwd to find GIT_PROJECT_TOP");
+
+	git_project_top = gitenv("GIT_PROJECT_TOP");
+	if (git_project_top)
+		return 0;
+
+	strcpy(path, git_cwd);
+	while (path[0] && strcmp(path, "/") && !git_project_top) {
+		char *cp;
+		struct stat st;
+		dir_length = strlen(path);
+		path[dir_length] = '/';
+
+		strcpy(path + dir_length + 1, ".git");
+		if (stat(path, &st) < 0) {
+			if (errno != ENOENT)
+				return error("%s: %s", path, strerror(errno));
+			/* notfound */
+		}
+		else if (S_ISDIR(st.st_mode)) {
+			path[dir_length] = 0;
+			git_project_top = strdup(path);
+			break;
+		}
+		else
+			return error("%s: not a directory", path);
+		path[dir_length] = 0;
+		cp = strrchr(path, '/');
+		if (cp)
+			*cp = 0;
+	}
+	if (!git_project_top)
+		return error("cannot find GIT_PROJECT_TOP");
+
+	return 0;
+}
+
+char *canon_path(const char *path)
+{
+	/* path is either absolute path from root fs or
+	 * relative to the git_cwd.  What is the relative path
+	 * for that thing, viewed from GIT_PROJECT_TOP?
+	 */
+	char *cp, *op, *endp, *result = NULL;
+	char *work = xmalloc(strlen(git_cwd) + strlen(path) + 2);
+	int pfxlen = strlen(git_project_top);
+
+	if (path[0] == '/')
+		strcpy(work, path);
+	else
+		sprintf(work, "%s/%s", git_cwd, path);
+	/* We will copy to *op starting from *cp while removing
+	 * nonsense.  Initially op and cp are both set to one
+	 * past the root-level '/'.
+	 */
+	op = cp = work + 1;
+	endp = cp + strlen(cp);
+	while (cp < endp) {
+		char *ep = strchr(cp, '/');
+		if (!ep)
+			ep = endp; /* at terminating NUL */
+		/* Now look at what is between cp and ep. */
+		if (ep == cp) {
+			/* Remove double slashes.
+			 * "/xxx//foo" ==> "/xxx//foo"
+			 *    cp^              cp^
+			 */
+			cp++;
+			continue;
+		}
+		if (*cp == '.') {
+			/* dot something.  What is it? */
+			if (cp[1] == 0 || cp[1] == '/') {
+				/* Remove trailing dot.
+				 * "/xxx/." ==> "/xxx/."
+				 *     cp^           cp^
+				 * "/xxx/./foo" ==> "/xxx/./foo"
+				 *     cp^               cp^
+				 */
+				cp = ep;
+				continue;
+			}
+			if (cp[1] == '.' && (cp[2] == 0 || cp[2] == '/')) {
+				/* Uplevel.
+				 * "/xxx/../foo" ==> "/xxx/../foo"
+				 *     cp^                  cp^
+				 * while backspacing "xxx" in the op
+				 */
+				cp = cp + 3;
+				op -= 2;
+				if (op < work)
+					op = work + 1;
+				while (*op != '/' && work < op)
+					op--;
+				op++;
+				continue;
+			}
+		}
+		/* otherwise there is no funnies */
+		while (cp <= ep && *cp)
+			*op++ = *cp++;
+	}
+	*op = 0;
+	if (op[-1] == '/' && op != work)
+		op[-1] = 0;
+
+	if (!strncmp(git_project_top, work, pfxlen) &&
+	    (work[pfxlen] == '/' || work[pfxlen] == 0))
+		result = strdup(work + pfxlen + 1);
+	/* otherwise, path is outside of git-project-top and we ignore it. */
+
+	free(work);
+	return result;
+}
+
+/****************************************************************/
+
+int setup_paths(void)
+{
+	if (find_project_top())
+		return -1;
+	if (initialize_ignore_list())
+		return -1;
+	return 0;
+}
+
--- /dev/null
+++ b/paths.h
@@ -0,0 +1,14 @@
+/*
+ * Copyright (c) 2005 Junio C Hamano
+ */
+#ifndef _PATHS_H_
+#define _PATHS_H_
+
+int setup_paths(void);
+extern char *git_project_top;
+
+char *canon_path(const char *);
+int path_ignored(const char *);
+int verify_path(const char *);
+
+#endif
--- /dev/null
+++ b/run-with-user-path.c
@@ -0,0 +1,87 @@
+/*
+ * Copyright (c) 2005 Junio C Hamano
+ */
+#include <unistd.h>
+#include "cache.h"
+#include "paths.h"
+
+static int no_ignore = 0, show_ignore = 0, echo = 0, line_termination = '\n';
+
+static const char *usage_rwup = 
+"git-run-with-user-path [ --no-ignore ] [ --show-ignore ] "
+"( --echo-z | --echo | <command> <argument>... -- <path>... )";
+
+static int prepare_path_args(char **exec_param, char **path)
+{
+	int i, cnt;
+	char *canon;
+
+	for (i = cnt = 0; path[i]; i++) {
+		canon = canon_path(path[i]);
+		if (no_ignore ||
+		    !!path_ignored(canon) == !!show_ignore)
+			exec_param[cnt++] = canon;
+	}
+	return cnt;
+}
+
+int main(int ac, char **av)
+{
+	char **exec_param;
+	int i, command_end, cnt_path;
+
+	if (setup_paths())
+		exit(1);
+
+	while (1 < ac && av[1][0] == '-') {
+		if (!strcmp(av[1], "--no-ignore"))
+			no_ignore = 1;
+		else if (!strcmp(av[1], "--show-ignore"))
+			show_ignore = 1;
+		else if (!strcmp(av[1], "--echo"))
+			echo = 1;
+		else if (!strcmp(av[1], "--echo-z")) {
+			echo = 1;
+			line_termination = 0;
+		}
+		else
+			break;
+		ac--; av++;
+	}
+
+	exec_param = xcalloc(ac, sizeof(char *));
+
+	if (echo) {
+		cnt_path = prepare_path_args(exec_param, av + 1);
+		for (i = 0; exec_param[i]; i++)
+			printf("%s%c", exec_param[i], line_termination);
+		exit(0);
+	}
+
+	for (i = 1; i < ac; i++)
+		if (!strcmp(av[i], "--"))
+			break;
+	if (ac <= i)
+		die(usage_rwup); /* no -- to start path */
+
+	command_end = i; /* pointing at -- */
+	/* command command arg1 arg2 ... path1 path2 ... NULL */
+	for (i = 1; i < command_end; i++)
+		exec_param[i - 1] = av[i];
+
+	/* We need to special case -- (end of command) followed
+	 * immediately by -- (beginning of paths); otherwise
+	 * "git-run-with-user-path git-update-cache --add -- -- foo"
+	 * would try canonicalize and filter path arguments starting
+	 * from -- (beginning of paths), which is not what we want.
+	 */
+	if (!strcmp(av[command_end + 1], "--")) {
+		exec_param[command_end-1] = av[command_end + 1];
+		command_end++;
+	}
+	cnt_path = prepare_path_args(exec_param + command_end - 1,
+				     av + command_end + 1);
+	chdir(git_project_top);
+	execvp(exec_param[0], exec_param);
+	exit(1);
+}
--- a/t/README
+++ b/t/README
@@ -73,6 +73,7 @@
 	4 - the diff commands
 	5 - the pull and exporting commands
 	6 - the revision tree commands (even e.g. merge-base)
+	7 - the non-core commands and helpers
 
 Second digit tells the particular command we are testing.
 
--- /dev/null
+++ b/t/t7000-git-run-with-user-path-basic.sh
@@ -0,0 +1,87 @@
+#!/bin/sh
+#
+# Copyright (c) 2005, Junio C Hamano
+#
+
+test_description='git-run-with-user-path basic test.
+
+The command is used to help running core GIT commands that always
+expect to be run from the top level directory (i.e. the directory
+that corresponds to the top of tree GIT_INDEX_FILE describes).
+'
+
+. ./test-lib.sh
+
+LF='
+'
+HERE=$(pwd)
+
+test_expect_success \
+setup '
+mkdir path0 path1 path1/path2
+for p in path0/file0 path1/file1 path1/path2/file2
+do
+    echo hello >$p
+    git-update-cache --add -- $p
+done
+'
+
+test_expect_success \
+'finding paths from a subdirectory' '
+    case "$(cd path0 &&
+            git-run-with-user-path --no-ignore cat -- \
+	    file0 ../path1/path2/file2)" in
+    "hello${LF}hello") : ;;
+    *) (exit 1) ;;
+    esac
+'
+
+test_expect_success \
+'feeding find output via xargs from a subdirectory' '
+    case "$(cd path0 &&
+	    find . ../path1 -type f -print0 |
+	    xargs -r -0 git-run-with-user-path --no-ignore cat --)" in
+    "hello${LF}hello${LF}hello") : ;;
+    *) (exit 1) ;;
+    esac
+'
+
+cd $HERE
+test_expect_success \
+'handling special case -- at the end of command' '
+    case "$(cd path0 &&
+            git-run-with-user-path --no-ignore echo -- \
+	    -- ../path1/file1)" in
+    "-- path1/file1") : ;;
+    *) (exit 1) ;;
+    esac	    
+'
+
+cd $HERE
+test_expect_success \
+'the --echo option' '
+    case "$(cd path0 &&
+            git-run-with-user-path --no-ignore --echo ../path1/file1)" in
+    "path1/file1") : ;;
+    *) (exit 1) ;;
+    esac	    
+'
+
+cd $HERE
+mv .git .svn
+GIT_DIR=$(pwd)/.svn
+GIT_PROJECT_TOP=$(pwd)
+export GIT_DIR GIT_PROJECT_TOP
+
+test_expect_success \
+'feeding find output via xargs from a subdirectory (with GIT_PROJECT_TOP)' '
+    case "$(cd path0 &&
+            find . ../path1 -type f -print0 |
+	    xargs -r -0 git-run-with-user-path --no-ignore cat --)" in
+    "hello${LF}hello${LF}hello") : ;;
+    *) (exit 1) ;;
+    esac
+    cd ..
+'
+
+test_done
--- a/update-cache.c
+++ b/update-cache.c
@@ -5,6 +5,7 @@
  */
 #include <signal.h>
 #include "cache.h"
+#include "paths.h"
 
 /*
  * Default to not allowing changes to the list of files. The
@@ -257,34 +258,6 @@
 	return has_errors;
 }
 
-/*
- * We fundamentally don't like some paths: we don't want
- * dot or dot-dot anywhere, and in fact, we don't even want
- * any other dot-files (.git or anything else). They
- * are hidden, for chist sake.
- *
- * Also, we don't want double slashes or slashes at the
- * end that can make pathnames ambiguous.
- */
-static int verify_path(char *path)
-{
-	char c;
-
-	goto inside;
-	for (;;) {
-		if (!c)
-			return 1;
-		if (c == '/') {
-inside:
-			c = *path++;
-			if (c != '/' && c != '.' && c != '\0')
-				continue;
-			return 0;
-		}
-		c = *path++;
-	}
-}
-
 static int add_cacheinfo(char *arg1, char *arg2, char *arg3)
 {
 	int size, len, option;
------------------------------------------------


^ permalink raw reply

* [PATCH 2/2] Add sample ignore logic to git-run-with-user-path command.
From: Junio C Hamano @ 2005-05-16 23:41 UTC (permalink / raw)
  To: pasky, torvalds; +Cc: git
In-Reply-To: <7v8y2en4e8.fsf@assigned-by-dhcp.cox.net>

This adds a sample ignore file logic to git-run-with-user-path
command.  This is primarily to serve as an example for plugging
ignore file logic to the previously introduced framework, and to
spur mailing list discussions on what the final ignore file
logic should be, and where the information should come from.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---

Documentation/git-run-with-user-path.txt |   32 ++++++++
Makefile                                 |    6 +
paths.c                                  |  117 +++++++++++++++++++++++++++++--
t/t7001-git-run-with-user-path-ignore.sh |   67 +++++++++++++++++
4 files changed, 215 insertions(+), 7 deletions(-)
t/t7001-git-run-with-user-path-ignore.sh (. --> 100755)

--- a/Documentation/git-run-with-user-path.txt
+++ b/Documentation/git-run-with-user-path.txt
@@ -75,6 +75,38 @@
 	output).
 
 
+IGNORE FILES
+------------
+
+This command currently uses a pcre based implementation to express
+ignore patterns.  The purpose of this implementation is to primarily
+serve as an example and to start GIT mailing list discussions, and by no
+means is cast in stone.  This section describes what this sample
+implementation does.
+
+The information used to define which paths to ignore is read from two
+files.  Both files use the same syntax.
+
+First, $CIT_DIR/ignore is read.  Then, the file whose path (relative to
+the project top) recorded in $GIT_DIR/info/ignore-file is read next.
+The latter file is expected to be revision controlled with GIT.
+
+These two files should record one ignore record per line.  A line that
+is empty, and a line that starts with a '#' are ignored and used as
+comments.
+
+Each ignore record is a pcre regular expression, optionally prefixed
+with a '!'.  To determine if a path is to be ignored, the path is
+matched against each ignore record in the order they appear in the
+ignore file.  If the ignore record matches the path, and it does not
+have the optional '!', then the path is ignored.  Otherwise, the path is
+not ignored.  In either case, the rest of ignore records are not used
+after the first match.  This means (1) an earlier entry in an ignore
+file has precedence over later ones, and (2) the entries in
+$GIT_DIR/ignore file have precedence over the ones in the file named
+by $GIT_DIR/info/ignore-file.
+
+
 ENVIRONMENT VARIABLES
 ---------------------
 
--- a/Makefile
+++ b/Makefile
@@ -54,6 +54,12 @@
 LIBS = $(LIB_FILE)
 LIBS += -lz
 
+IGNORE_USING_PCRE=1
+
+ifdef IGNORE_USING_PCRE
+  LIBS += -lpcreposix
+endif
+
 ifdef MOZILLA_SHA1
   SHA1_HEADER="mozilla-sha1/sha1.h"
   LIB_OBJS += mozilla-sha1/sha1.o
--- a/paths.c
+++ b/paths.c
@@ -2,6 +2,7 @@
  * Copyright (c) 2005 Junio C Hamano
  */
 #include <string.h>
+#include <pcreposix.h>
 #include "cache.h"
 #include "paths.h"
 
@@ -37,23 +38,125 @@
 	}
 }
 
+static struct ignore_entry {
+	int negate;
+	regex_t regexp;
+} **ignore_list;
+static int ignore_nr;
+static int ignore_alloc;
+
+static void add_ignore(const char *buf)
+{
+	struct ignore_entry *ie = xmalloc(sizeof(*ie));
+	if (buf[0] == '!') {
+		ie->negate = 1;
+		buf++;
+	}
+	else
+		ie->negate = 0;
+
+	if (regcomp(&(ie->regexp), buf, 0)) {
+		fprintf(stderr, "bad regexp <%s>\n", buf);
+		free(ie);
+		return;
+	}
+	if (ignore_alloc <= ignore_nr) {
+		ignore_alloc = alloc_nr(ignore_alloc);
+		ignore_list = xrealloc(ignore_list,
+				       ignore_alloc * sizeof(ie));
+	}
+	ignore_list[ignore_nr++] = ie;
+}
+
+static void read_ignore_list(const char *path)
+{
+	FILE *in;
+	char buf[1024];
+	in = fopen(path, "r");
+	if (!in)
+		return;
+	while (fgets(buf, sizeof(buf), in) != NULL) {
+		int l = strlen(buf);
+		/* An empty line and a line that starts with # is comment. */
+		if (buf[0] != '#' && buf[0] != '\n' && buf[l-1] == '\n') {
+			buf[l-1] = 0;
+			add_ignore(buf);
+		}
+	}
+	fclose(in);
+}
+
+static void read_ignore_list_from_file(const char *path)
+{
+	char filename[PATH_MAX];
+	int len;
+	FILE *in;
+
+	in = fopen(path, "r");
+	if (!in)
+		return;
+	strcpy(filename, git_project_top);
+	len = strlen(filename);
+	filename[len++] = '/';
+	if (fgets(filename + len, sizeof(filename) - len, in) == NULL) {
+		fclose(in);
+		return;
+	}
+	fclose(in);
+	len = strlen(filename);
+	if (filename[len-1] != '\n')
+		return;
+	filename[len-1] = 0;
+	read_ignore_list(filename);
+}
+
 static int initialize_ignore_list(void)
 {
-	/* Put the Porcelain layer ignore logic initialization here.
-	 * Return non-zero after issuing appropriate error message
-	 * if initialization fails.
-	 */
+	char *git_dir = gitenv("GIT_DIR");
+	char path[PATH_MAX];
+	int git_dir_len;
+
+	if (! git_dir)
+		sprintf(path, "%s/.git", git_project_top);
+	else
+		strcpy(path, git_dir);
+	git_dir_len = strlen(path);
+	path[git_dir_len++] = '/';
+
+	/* read private list first, and then shared list. */
+	strcpy(path + git_dir_len, "ignore");
+	read_ignore_list(path);
+
+	strcpy(path + git_dir_len, "info/ignore-file");
+	read_ignore_list_from_file(path);
+
 	return 0;
 }
 
 int path_ignored(const char *path)
 {
+	int i;
+
 	if (!verify_path(path))
 		return 1;
 
-	/* Put the Porcelain layer ignore logic here.
-	 * Return non-zero if path is to be ignored.
-	 */
+	for (i = 0; i < ignore_nr; i++) {
+		int status;
+		regmatch_t pmatch[10];
+		char errbuf[1024];
+
+		status = regexec(&(ignore_list[i]->regexp), path,
+				 sizeof(pmatch)/sizeof(pmatch[0]),
+				 pmatch, 0);
+		if (!status)
+			return !ignore_list[i]->negate;
+		if (status == REG_NOMATCH)
+			continue;
+
+		regerror(status, &(ignore_list[i]->regexp), errbuf,
+			 sizeof(errbuf));
+		fprintf(stderr, "pcre regexp execution error <%s>\n", errbuf);
+	}
 	return 0;
 }
 
--- /dev/null
+++ b/t/t7001-git-run-with-user-path-ignore.sh
@@ -0,0 +1,67 @@
+#!/bin/sh
+#
+# Copyright (c) 2005, Junio C Hamano
+#
+
+test_description='git-run-with-user-path basic test (part #2).
+
+The command is used to help running core GIT commands that always
+expect to be run from the top level directory (i.e. the directory
+that corresponds to the top of tree GIT_INDEX_FILE describes).
+
+It knows how to handle ignore files convention used by the Porcelain
+layer implementation.
+'
+
+. ./test-lib.sh
+
+LF='
+'
+HERE=$(pwd)
+
+test_expect_success \
+setup '
+echo ".*1\$" >.git/ignore &&
+echo ".*0\$" >dontdiff &&
+mkdir .git/info &&
+echo "dontdiff" >.git/info/ignore-file &&
+mkdir path0 path1 path1/path2 &&
+for p in path0/file0 path1/file1 path1/path2/file2
+do
+    echo hello >$p || exit 1
+done
+'
+
+cd $HERE
+test_expect_success \
+'finding paths from a subdirectory' '
+    case "$(cd path0 &&
+            git-run-with-user-path echo -- \
+	    file0 ../path1/path2/file2)" in
+    "path1/path2/file2") : ;;
+    *) (exit 1) ;;
+    esac
+'
+
+cd $HERE
+test_expect_success \
+'feeding find output via xargs from a subdirectory' '
+    case "$(cd path0 &&
+	    find . ../path1 -type f -print0 |
+	    xargs -r -0 git-run-with-user-path ls -1 --)" in
+    "path1/path2/file2") : ;;
+    *) (exit 1) ;;
+    esac
+'
+
+cd $HERE
+test_expect_success \
+'using !negate pattern' '
+    echo "!path0/file0$" >>.git/ignore &&
+    case "$(git-run-with-user-path ls -1 -- path0/* path1/file1)" in
+    "path0/file0") : ;;
+    *) (exit 1) ;;
+    esac
+'
+
+test_done
------------------------------------------------


^ permalink raw reply

* Re: [PATCH 2/4] Tweak diff output further to make it a bit less distracting.
From: Junio C Hamano @ 2005-05-16 23:44 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Petr Baudis, git
In-Reply-To: <Pine.LNX.4.58.0505161556260.18337@ppc970.osdl.org>

>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:

LT> I have to say, I muct prefer the first over the second.

Likewise.

LT> ... (whether you want the "mode"
LT> for the deleted file case or not is up to you - it could be an added
LT> sanity check that the diff actually matches, but on the other hand there's
LT> really nothing you could do anyway except warn if the mode didn't match,
LT> so..)

Applying patch in reverse comes to mind...

I'd agree what you said about "diff -git" in the rest of your
message makes the most sense.


^ permalink raw reply

* Re: git-rev-list  in local commit order
From: Linus Torvalds @ 2005-05-16 23:46 UTC (permalink / raw)
  To: Sean; +Cc: tglx, git
In-Reply-To: <1629.10.10.10.24.1116278725.squirrel@linux1>



On Mon, 16 May 2005, Sean wrote:
> 
> And you also haven't addressed what to do when someone else uses say,
> Linus' repoid, as their own.  It seems like a risk to have the operation
> of each repository depend on a value anyone else can duplicate.  Linus
> can't control what repoid everyone else uses, he can control the time on
> his own machine.  Unique repoid's are an illusion.

Yes. I'm not ahuge fan of the notion of "repo ID's". One reason is that I
actually really really like the notion of anonymous repositories, so that
when I do something stupid, and blow away one of my less successful
repositories and continue with another one, nobody ever sees it (and yes,
this happens - in my BK usage I occasionally cloned my repo for some
testing, and then ended up using the _cloned_ repo for the real work, and
totally blowing away ymy original one, and renamed my cloned one back to
where my main one is).

That said, while I actually think that time/date matters, I don't think it 
should matter a lot.

I don't see why people don't just use the "committer" name for this.  
That's really what you want, and it ends up being a very good
approximation of "repository ID" for a commit. Sure, people end up having
multiple reposiories, and thus you'll occasionally see merges that end up
merging two heads with the same "repo ID", but does anybody really care? I
doubt it.

For example, if you have a company Q&A policy that says that you want to 
keep commits to different repos separate, just make sure that those repos 
are on different machines or are accessed with different users. Or write 
some simple wrapper scripts that make sure to set GIT_COMMITTER_EMAIL to 
the proper value (say, the wrapper could be as simple as

	#!/bin/sh
	export GIT_COMMITTER_EMAIL=$(cat .git/committer_email)
	real-git-commit "$@"

and then you just create a ".git/committer_email" file per repository that 
contains the "repo ID" you want to fake.

		Linus

^ permalink raw reply

* Re: [PATCH 2/4] Tweak diff output further to make it a bit less distracting.
From: Daniel Barkalow @ 2005-05-17  0:10 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Petr Baudis, git
In-Reply-To: <Pine.LNX.4.58.0505161556260.18337@ppc970.osdl.org>

On Mon, 16 May 2005, Linus Torvalds wrote:

> One final note: I actually think that "rename patches" make a ton of 
> sense, even if git itself doesn't track renames. If we ever have a "smart 
> diff" thing that can generate inter-file diffs, I'd like to eventually see
> 
> 	diff -git a/kernel/sched.c b/kernel/sched.c.old
> 	rename kernel/sched.c kernel/sched.c.old
> 	old mode 100644
> 	new mode 100755

I'd like something like:

diff -git a/kernel/sched.c b/kernel/sched.c.old
filename -- kernel/sched.c
filename ++ kernel/sched.c.old
mode -- 100644
mode ++ 100755
--- a/kernel/sched.c
+++ b/kernel/sched.c.old
@@ -1,5 +1,5 @@
(etc.)

because I actually start thinking of the two sides as "-" and "+", and I'd
actually have to think about which is "old" and which is "new", and which
way the "rename" line goes, and so forth. I'd actually be happier with
just a "mode -- 100644" line for a deleted file, also. If I'm looking at a
patch, and I read Makefile with '-' and '+' versions of the lists of
objects, and then get to a "new file" line, I have to think about it to
associate the '+' side with having the file and the '-' side with not
having it.

	-Daniel
*This .sig left intentionally blank*


^ permalink raw reply

* [PATCH] Fix diff output take #3.
From: Junio C Hamano @ 2005-05-17  0:11 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Petr Baudis, git
In-Reply-To: <Pine.LNX.4.58.0505161556260.18337@ppc970.osdl.org>

This implements the output format suggested by Linus in
<Pine.LNX.4.58.0505161556260.18337@ppc970.osdl.org>

Signed-off-by: Junio C Hamano <junkio@cox.net>
---

diff.c                 |   14 +++++++-------
t/t4000-diff-format.sh |    7 +++++--
2 files changed, 12 insertions(+), 9 deletions(-)

diff -git a/diff.c b/diff.c
--- a/diff.c
+++ b/diff.c
@@ -83,7 +83,6 @@
 			 struct diff_tempfile *temp)
 {
 	int i, next_at;
-	const char *git_prefix = "# mode: ";
 	const char *diff_cmd = "diff -L'%s%s' -L'%s%s'";
 	const char *diff_arg  = "'%s' '%s'||:"; /* "||:" is to return 0 */
 	const char *input_name_sq[2];
@@ -123,15 +122,16 @@
 	next_at += snprintf(cmd+next_at, cmd_size-next_at,
 			    diff_arg, input_name_sq[0], input_name_sq[1]);
 
+	printf("diff -git a/%s b/%s\n", name, name);
 	if (!path1[0][0])
-		printf("%s. %s %s\n", git_prefix, temp[1].mode, name);
+		printf("new file mode %s\n", temp[1].mode);
 	else if (!path1[1][0])
-		printf("%s%s . %s\n", git_prefix, temp[0].mode, name);
+		printf("deleted file mode %s\n", temp[0].mode);
 	else {
-		if (strcmp(temp[0].mode, temp[1].mode))
-			printf("%s%s %s %s\n", git_prefix,
-			       temp[0].mode, temp[1].mode, name);
-
+		if (strcmp(temp[0].mode, temp[1].mode)) {
+			printf("old mode %s\n", temp[0].mode);
+			printf("new mode %s\n", temp[1].mode);
+		}
 		if (strncmp(temp[0].mode, temp[1].mode, 3))
 			/* we do not run diff between different kind
 			 * of objects.
diff -git a/t/t4000-diff-format.sh b/t/t4000-diff-format.sh
--- a/t/t4000-diff-format.sh
+++ b/t/t4000-diff-format.sh
@@ -26,7 +26,9 @@
     'git-diff-files -p after editing work tree.' \
     'git-diff-files -p >current'
 cat >expected <<\EOF
-# mode: 100644 100755 path0
+diff -git a/path0 b/path0
+old mode 100644
+new mode 100755
 --- a/path0
 +++ b/path0
 @@ -1,3 +1,3 @@
@@ -34,7 +36,8 @@
  Line 2
 -line 3
 +Line 3
-# mode: 100755 . path1
+diff -git a/path1 b/path1
+deleted file mode 100755
 --- a/path1
 +++ /dev/null
 @@ -1,3 +0,0 @@
------------------------------------------------


^ permalink raw reply

* Re: [PATCH 1/2] Introduce git-run-with-user-path helper program.
From: Junio C Hamano @ 2005-05-17  4:15 UTC (permalink / raw)
  To: pasky; +Cc: torvalds, git
In-Reply-To: <7v3bsmn3ia.fsf_-_@assigned-by-dhcp.cox.net>

*Blush*.  I sent out a bad copy for this patch.

In run-with-user-path.c, in main():

+	/* We need to special case -- (end of command) followed
+	 * immediately by -- (beginning of paths); otherwise
+	 * "git-run-with-user-path git-update-cache --add -- -- foo"
+	 * would try canonicalize and filter path arguments starting
+	 * from -- (beginning of paths), which is not what we want.
+	 */
+	if (!strcmp(av[command_end + 1], "--")) {
+		exec_param[command_end-1] = av[command_end + 1];
+		command_end++;
+	}

The above if() statement should be:

    if (command_end + 1 < ac && !strcmp(av[command_end + 1], "--")) {

Otherwise "./git-run-with-user-path echo --" would segfault.
Sorry about the noise.



^ permalink raw reply

* [RFH] Janitor projects around core GIT
From: Junio C Hamano @ 2005-05-17  4:48 UTC (permalink / raw)
  To: git

Here is a list of things I would like to see done by people who
want to get their hands dirty ;-).

 * Rewrite command line parsing code, probably using GNU getopt.
   I have three gripes about option parsing in the current code:

   - Some pretend to take long options, but allow only --long
     without allowing --lon or --lo.  Some say "--flag=argument"
     while others say "--flag argument", which is inconsistent
     and confusing.

   - For some commands the order of options matter for no
     apparently good reason.  The most prominent example is
     "checkout-cache -a -f" vs "checkout-cache -f -a"; yes I
     know about the comment Linus wrote, but it is more a
     warning for people not to get confused by this behaviour;
     not a justification for that confusing behaviour.

   - Related to the second point, many commands perform actions
     as they go parsing and processing the options.  It would be
     cleaner and easier to add new options later if the option
     processing loop is reorganized so that optionss are done
     first and after all options are collected, real processing
     begins.  The current way lets you make "update-cache foo
     --add bar" to refuse to add foo but to allow adding bar,
     but I do not think being able to do that kind of thing is
     buying us much.

 * Extend coverage of tests to more commands in the t/ directory.

 * const-ness cleanups.

 * Use correct types for sizes of things, as HPA suggested the
   other day.

 * It would be nice if somebody who is handy with checker/sparse
   type tools to run them on core GIT part and/or run core GIT
   part under purify.



^ permalink raw reply

* Re: [RFH] Janitor projects around core GIT
From: Jeff Garzik @ 2005-05-17  4:56 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vpsvqihkh.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano wrote:
>  * Rewrite command line parsing code, probably using GNU getopt.
>    I have three gripes about option parsing in the current code:


Use argp.  It supports short and long options, and is highly flexible. 
"info argp" should work on most Linux boxes.

It's in glibc, and if people care about porting git to Solaris/whatever, 
there is a GPL'd version already out there.

	Jeff



^ permalink raw reply

* [PATCH] Make object contents optionally available
From: Daniel Barkalow @ 2005-05-17  4:56 UTC (permalink / raw)
  To: git, Petr Baudis, Junio C Hamano; +Cc: Linus Torvalds

This adds contents and size fields to struct object. If unpack_object is
called on an object, it will fill in the contents field with the complete
raw contents of the object. If free_object_contents is called on an
object, the contents will be freed. If contents is filled when an object
is parsed, it is not unpacked an extra time, but the contents are not
retained if they were not unpacked before parsing.

This patch also centralizes the code to unpack and check the type of
objects, making it worthwhile even without callers for the new functions.

Signed-off-by: Daniel Barkalow <barkalow@iabervon.org>
Index: blob.c
===================================================================
--- 4fdd3c0825943d0d489ce9ca46cb853b41db3807/blob.c  (mode:100644 sha1:280f5241577ac029e9d5a7eb5bf895642b342fc8)
+++ e63ecfdf25ec8d61fe16b6c22152a2e45e60efd2/blob.c  (mode:100644 sha1:68c1adc2b1009fbc5525207b0ddaac34e9c4413b)
@@ -30,21 +30,15 @@
 
 int parse_blob(struct blob *item)
 {
-        char type[20];
-        void *buffer;
-        unsigned long size;
+	int unpacked = !!item->object.contents;
 	int ret;
 
         if (item->object.parsed)
                 return 0;
-        buffer = read_sha1_file(item->object.sha1, type, &size);
-        if (!buffer)
-                return error("Could not read %s",
-                             sha1_to_hex(item->object.sha1));
-        if (strcmp(type, blob_type))
-                return error("Object %s not a blob",
-                             sha1_to_hex(item->object.sha1));
-	ret = parse_blob_buffer(item, buffer, size);
-	free(buffer);
+	unpack_object(&item->object);
+	ret = parse_blob_buffer(item, item->object.contents, 
+			       item->object.size);
+	if (!unpacked)
+		free_object_contents(&item->object);
 	return ret;
 }
Index: commit.c
===================================================================
--- 4fdd3c0825943d0d489ce9ca46cb853b41db3807/commit.c  (mode:100644 sha1:706c7cba08ebc2100c2dbf63ed1238f39324f750)
+++ e63ecfdf25ec8d61fe16b6c22152a2e45e60efd2/commit.c  (mode:100644 sha1:38324dcc16785caa4baafb260c2fa9c0af1248a9)
@@ -69,24 +69,16 @@
 
 int parse_commit(struct commit *item)
 {
-	char type[20];
-	void *buffer;
-	unsigned long size;
+	int unpacked = !!item->object.contents;
 	int ret;
 
 	if (item->object.parsed)
 		return 0;
-	buffer = read_sha1_file(item->object.sha1, type, &size);
-	if (!buffer)
-		return error("Could not read %s",
-			     sha1_to_hex(item->object.sha1));
-	if (strcmp(type, commit_type)) {
-		free(buffer);
-		return error("Object %s not a commit",
-			     sha1_to_hex(item->object.sha1));
-	}
-	ret = parse_commit_buffer(item, buffer, size);
-	free(buffer);
+	unpack_object(&item->object);
+	ret = parse_commit_buffer(item, item->object.contents, 
+				  item->object.size);
+	if (!unpacked)
+		free_object_contents(&item->object);
 	return ret;
 }
 
Index: object.c
===================================================================
--- 4fdd3c0825943d0d489ce9ca46cb853b41db3807/object.c  (mode:100644 sha1:b5a62e7f87f24c2ab0ea83f3c445d81bcbff027a)
+++ e63ecfdf25ec8d61fe16b6c22152a2e45e60efd2/object.c  (mode:100644 sha1:3e712d5f665d22e06e310932447629b0badc6b48)
@@ -99,6 +99,34 @@
 	}
 }
 
+int unpack_object(struct object *obj)
+{
+	unsigned long mapsize;
+	void *map;
+	char type[20];
+	if (obj->contents)
+		return 0;
+	map = map_sha1_file(obj->sha1, &mapsize);
+	if (!map)
+		return error("Could not map %s",
+			     sha1_to_hex(obj->sha1));
+	obj->contents = unpack_sha1_file(map, mapsize, type, &obj->size);
+	munmap(map, mapsize);
+	if (!obj->contents)
+		return error("Could not unpack %s",
+			     sha1_to_hex(obj->sha1));
+	if (strcmp(type, obj->type))
+		return error("Object %s not a %s",
+			     sha1_to_hex(obj->sha1), obj->type);
+	return 0;
+}
+
+void free_object_contents(struct object *obj)
+{
+	free(obj->contents);
+	obj->contents = NULL;
+}
+
 struct object *parse_object(unsigned char *sha1)
 {
 	unsigned long mapsize;
Index: object.h
===================================================================
--- 4fdd3c0825943d0d489ce9ca46cb853b41db3807/object.h  (mode:100644 sha1:09700d376077b2d6136620faf6efb77ee679deeb)
+++ e63ecfdf25ec8d61fe16b6c22152a2e45e60efd2/object.h  (mode:100644 sha1:848aee6a1b9138618d5e44abf8567d2c2d3c35c1)
@@ -13,6 +13,8 @@
 	unsigned char sha1[20];
 	const char *type;
 	struct object_list *refs;
+	char *contents;
+	unsigned long size;
 };
 
 extern int nr_objs;
@@ -25,6 +27,12 @@
 /** Returns the object, having parsed it to find out what it is. **/
 struct object *parse_object(unsigned char *sha1);
 
+/** Unpacks the object into the contents field. **/
+int unpack_object(struct object *obj);
+
+/** Deallocates the unpacked contents of the object. **/
+void free_object_contents(struct object *obj);
+
 void add_ref(struct object *refer, struct object *target);
 
 void mark_reachable(struct object *obj, unsigned int mask);
Index: tag.c
===================================================================
--- 4fdd3c0825943d0d489ce9ca46cb853b41db3807/tag.c  (mode:100644 sha1:22deb243ad58b2c57fb8652fe2d08c571ee3e781)
+++ e63ecfdf25ec8d61fe16b6c22152a2e45e60efd2/tag.c  (mode:100644 sha1:7fc7960c866a2704ec8f230d24259bc331f7f209)
@@ -66,23 +66,15 @@
 
 int parse_tag(struct tag *item)
 {
-	char type[20];
-	void *data;
-	unsigned long size;
+	int unpacked = !!item->object.contents;
 	int ret;
 
 	if (item->object.parsed)
 		return 0;
-	data = read_sha1_file(item->object.sha1, type, &size);
-	if (!data)
-		return error("Could not read %s",
-			     sha1_to_hex(item->object.sha1));
-	if (strcmp(type, tag_type)) {
-		free(data);
-		return error("Object %s not a tag",
-			     sha1_to_hex(item->object.sha1));
-	}
-	ret = parse_tag_buffer(item, data, size);
-	free(data);
+	unpack_object(&item->object);
+	ret = parse_tag_buffer(item, item->object.contents, 
+			       item->object.size);
+	if (!unpacked)
+		free_object_contents(&item->object);
 	return ret;
 }
Index: tree.c
===================================================================
--- 4fdd3c0825943d0d489ce9ca46cb853b41db3807/tree.c  (mode:100644 sha1:ca800a85f771be1bd10d6575d93ca05bd3fc381c)
+++ e63ecfdf25ec8d61fe16b6c22152a2e45e60efd2/tree.c  (mode:100644 sha1:3b10428aa0a8b9da32c0deaa16edea38c74da0c0)
@@ -140,23 +140,15 @@
 
 int parse_tree(struct tree *item)
 {
-	 char type[20];
-	 void *buffer;
-	 unsigned long size;
-	 int ret;
+	int unpacked = !!item->object.contents;
+	int ret;
 
 	if (item->object.parsed)
 		return 0;
-	buffer = read_sha1_file(item->object.sha1, type, &size);
-	if (!buffer)
-		return error("Could not read %s",
-			     sha1_to_hex(item->object.sha1));
-	if (strcmp(type, tree_type)) {
-		free(buffer);
-		return error("Object %s not a tree",
-			     sha1_to_hex(item->object.sha1));
-	}
-	ret = parse_tree_buffer(item, buffer, size);
-	free(buffer);
+	unpack_object(&item->object);
+	ret = parse_tree_buffer(item, item->object.contents, 
+				item->object.size);
+	if (!unpacked)
+		free_object_contents(&item->object);
 	return ret;
 }


^ permalink raw reply

* Re: [PATCH 2/4] Tweak diff output further to make it a bit less distracting.
From: Petr Baudis @ 2005-05-17  7:01 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, git
In-Reply-To: <Pine.LNX.4.58.0505161556260.18337@ppc970.osdl.org>

Dear diary, on Tue, May 17, 2005 at 01:28:31AM CEST, I got a letter
where Linus Torvalds <torvalds@osdl.org> told me that...
> 
> 
> On Mon, 16 May 2005, Junio C Hamano wrote:
> 
> >     # mode: 100644 100755 nitfol
> >     --- a/nitfol
> >     +++ b/nitfol
> 
> >     @. 100644 100755 nitfol
> >     --- a/nitfol
> >     +++ b/nitfol
> 
> I have to say, I muct prefer the first over the second.

Glad. :-)

> One final note: I actually think that "rename patches" make a ton of 
> sense, even if git itself doesn't track renames. If we ever have a "smart 
> diff" thing that can generate inter-file diffs, I'd like to eventually see
> 
> 	diff -git a/kernel/sched.c b/kernel/sched.c.old
> 	rename kernel/sched.c kernel/sched.c.old
> 	old mode 100644
> 	new mode 100755
> 	--- a/kernel/sched.c
> 	+++ b/kernel/sched.c.old
> 	@@ -1,5 +1,5 @@
> 	 /*
> 	- *  kernel/sched.c
> 	+ *  kernel/sched.c.old
> 	  *
> 	  *  Kernel scheduler and related syscalls
> 	  *
> 
> Notice? We could have a mode change, a rename _and_ a content change, all
> at the same time under the same header. That's obviously a totally idiotic
> example, but the point is that if we have a nice "extended diff header"
> setup, the format is very easily able to accomodate things like this.

Actually, if the git diff format is fixed, do we even need the explicit
rename line? It could be enough if the filenames on the diff line would
be just different. Or you want it because of clarity?

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

^ permalink raw reply

* Re: git-rev-list  in local commit order
From: Thomas Gleixner @ 2005-05-17  9:52 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Sean, git
In-Reply-To: <Pine.LNX.4.58.0505161638090.18337@ppc970.osdl.org>

On Mon, 2005-05-16 at 16:46 -0700, Linus Torvalds wrote:
> Yes. I'm not ahuge fan of the notion of "repo ID's". One reason is that I
> actually really really like the notion of anonymous repositories, so that
> when I do something stupid, and blow away one of my less successful
> repositories and continue with another one, nobody ever sees it (and yes,
> this happens - in my BK usage I occasionally cloned my repo for some
> testing, and then ended up using the _cloned_ repo for the real work, and
> totally blowing away ymy original one, and renamed my cloned one back to
> where my main one is).

What you blow away is a work space. But at the end you push the result
of whatever work space you kept into a public available repository. Also
BK stores a somewhat hidden repository (not workspace) id.

My idea of repository id was not the notion of workspace seperation. I
dont care in which directory and on which machine you or who ever
commits a line of code. I care where the change appears in a public
repository, which is unique.

> I don't see why people don't just use the "committer" name for this.  
> That's really what you want, and it ends up being a very good
> approximation of "repository ID" for a commit. Sure, people end up having
> multiple reposiories, and thus you'll occasionally see merges that end up
> merging two heads with the same "repo ID", but does anybody really care? I
> doubt it.

I came up with this after I started "git tracker" and David Woodhouse
asked me to make it possible to look at the history of his repository
from the repositiory POV rather than from the cloned global history of
torvalds/linux-2.6.git. 

Sure I have retrieved the information from committer name and committer
mail, but when I tried to do the same with Dave Millers and Gregs
repositories it turned out to be impossible as they use the same
name/mail for each of their repositories.

> For example, if you have a company Q&A policy that says that you want to 
> keep commits to different repos separate, just make sure that those repos 
> are on different machines or are accessed with different users. Or write 
> some simple wrapper scripts that make sure to set GIT_COMMITTER_EMAIL to 
> the proper value (say, the wrapper could be as simple as
> 
> 	#!/bin/sh
> 	export GIT_COMMITTER_EMAIL=$(cat .git/committer_email)
> 	real-git-commit "$@"
> 
> and then you just create a ".git/committer_email" file per repository that 
> contains the "repo ID" you want to fake.

Might be a workable solution. I would prefer if this would be a part of
the core plumbing.
Would you accept a patch for commit-tree which tries to check this file
for existance and use the content in case its there?

tglx



^ permalink raw reply

* git tracker updated
From: Thomas Gleixner @ 2005-05-17 10:10 UTC (permalink / raw)
  To: git

An updated version of git tracker is online on

http://www.tglx.de/gittracker


- Head forward handling reworked

- Commit view with google like page selection instead of the time span
based selection

- File revision and annotate reworked

- File revision with links to the commit

- Files, blobs and diffs can be downloaded

- Javascript added to some of the selectors (Jan-Benedikt Glaw suggested
this)

tglx



^ permalink raw reply

* Name of test directory (was: [RFH] Janitor projects around core GIT)
From: Kevin Smith @ 2005-05-17 12:30 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vpsvqihkh.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano wrote:
>  * Extend coverage of tests to more commands in the t/ directory.

Thanks for adding unit tests to the project! I may have missed it, but 
why is the directory named t/ instead of tests/ ?

Kevin

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox