Git development

Git development
 help / color / mirror / Atom feed

* Re: how to display file history?
From: Linus Torvalds @ 2006-05-15 18:03 UTC (permalink / raw)
  To: Marco Costalba; +Cc: Eric W. Biederman, Junio C Hamano, Brown, Len, git
In-Reply-To: <e5bfff550605151022m7b9ddcd9y53cd745e16ff6b22@mail.gmail.com>

On Mon, 15 May 2006, Marco Costalba wrote:
> 
> Well, it works but the nice boundary circles are not shown, and qgit
> always adds  --boundary to command line args to feed git-rev-list, but
> in this case it seems the --boundary option didn't do his job.

Well, it did do it's job, but you expected different counting priorities 
than git-rev-list actually gives you.

For the "limit by number" case, the commit counting counts _both_ 
"primary" commits and "boundary" commits, and will limit the total output 
to the number specified.

qgit would seem to prefer to have the commit counting only affect the 
"primary" commits, and not the "boundary" ones at all. Which might be 
sensible, but it's not the semantics it has now.

gitk doesn't care, because it uses the boundary commits just as hints.

I _think_ gitk is correct here, and qgit is being too strict in its 
semantic understanding of what the boundary commits mean. But I think so 
mostly because it would actually be pretty hard to do otherwise (ie the 
git-rev-list commit counting is largely defined by it's _implementation_, 
not necessarily by what you want it to do ;)

		Linus

^ permalink raw reply

* gateway status?
From: David Lang @ 2006-05-15 18:26 UTC (permalink / raw)
  To: git

I seem to remember seeing discussion of gateways to cvs/svn that would let 
a project use a git repository and allow clients to use cvs/svn clients to 
retreive data.

am I remembering correctly, and are these tools ready for production use? 
the popfile project is getting ready to abandon sourceforge and move to 
self-hosting, but before I suggest that they use git I need to know the 
current status of these projects (I think the ability to export directly 
into the other interfaces is a significant advantage)

David Lang

-- 
There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies.
  -- C.A.R. Hoare

^ permalink raw reply

* Re: how to display file history?
From: Marco Costalba @ 2006-05-15 18:32 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Eric W. Biederman, Junio C Hamano, Brown, Len, git
In-Reply-To: <Pine.LNX.4.64.0605151055080.3866@g5.osdl.org>

On 5/15/06, Linus Torvalds <torvalds@osdl.org> wrote:
>
>
> qgit would seem to prefer to have the commit counting only affect the
> "primary" commits, and not the "boundary" ones at all. Which might be
> sensible, but it's not the semantics it has now.
>
> gitk doesn't care, because it uses the boundary commits just as hints.
>

$ git-rev-list --topo-order --after="Apr 10" --before="Apr 11" HEAD |wc
     14      14     574
$ git-rev-list --topo-order --boundary --after="Apr 10" --before="Apr
11" HEAD |wc
     18      18     742

Boundary revisions in this case are _not_ passed through search
filtering. Using --boundary option we get revisions ouside given
filter range.

This does not apply to our previous example. So at least --boundary
behaviour it's a little bit inconsistent at the moment.

        Marco

^ permalink raw reply

* Re: how to display file history?
From: Linus Torvalds @ 2006-05-15 18:51 UTC (permalink / raw)
  To: Marco Costalba; +Cc: Eric W. Biederman, Junio C Hamano, Brown, Len, git
In-Reply-To: <e5bfff550605151132s4605a241sd3132aaeb2de6a39@mail.gmail.com>

On Mon, 15 May 2006, Marco Costalba wrote:
> 
> $ git-rev-list --topo-order --after="Apr 10" --before="Apr 11" HEAD |wc
>     14      14     574
> $ git-rev-list --topo-order --boundary --after="Apr 10" --before="Apr
> 11" HEAD |wc
>     18      18     742
> 
> Boundary revisions in this case are _not_ passed through search
> filtering. Using --boundary option we get revisions ouside given
> filter range.

Right. And the commit counting is a special filter, and "boundary" is 
special in that it doesn't normally honor some other filters (it _does_ 
honor path-based ones, though, I think).

So you really should see "--boundary" as a heuristic, and as a hack to 
help you close the loop on uninteresting commits _faster_. But if 
something else has closed it for other reasons, you shouldn't depend on 
it.

		Linus

^ permalink raw reply

* Re: [PATCH] send-email: allow sendmail binary to be used instead of SMTP
From: Eric Wong @ 2006-05-15 19:10 UTC (permalink / raw)
  To: Martin Langhoff; +Cc: Junio C Hamano, git
In-Reply-To: <46a038f90605150337l3357ce3by22834823eee7b87c@mail.gmail.com>

Martin Langhoff <martin.langhoff@gmail.com> wrote:
> Thanks Eric! git-send-email used to default to using local binaries.
> It was only with the switch to Net::SMTP that the default changed to
> localhost:25.

You're welcome.

That's odd, though, looking at Mail::Sendmail, I don't think it actually
uses the sendmail binary, either.  The FAQ
<http://alma.ch/perl/Mail-Sendmail-FAQ.html> confirms this, too.

Of course documentation may be lying and the code I'm looking at may be
*very* well obfuscated :D

-- 
Eric Wong

^ permalink raw reply

* RE: how to display file history?
From: Brown, Len @ 2006-05-15 19:04 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, git

 
>> >	git log --stat -- A
>> 
>> very handy indeed.
>> 
>> I was surprised on initial use that --stat is
>> limited to the file specified in "A" and doesn't
>> expand to describe the entire commit that touches "A".
>> (ie. the stat output is a subset of what is associated
>> with the displayed commit comments).
>> 
>> This, of course, is clear now, it just isn't what
>> I expected on first use.
>
>Well,  you can obviously have your cake and eat it too (ie 
>"--full-diff").
>
>I don't often end up using the "--full-diff" thing. It's almost never 
>actually worth it until I find the diff that I actually start caring 
>about, and the full diff just makes it harder to see the part 
>I explicitly told git I was interested in.

sounds good.

>So the default "show only diffs for the files asked for" 
>behaviour is in my opinion much superior (and it used to be the only
one),
>because the "show the whole thing" part ends up being something you use
only once 
>you've already skimmed the default case and decide to go deeper.

I agree.

>Of course, "gitk" ends up using the full diff by default in its diff 
>window. I'm not convinced that's the right thing, but usually 
>when I use gitk I'm primarily looking at the history and the commit
>messages to decide if it's a relevant one, not the diff, so I don't
think 
>it matters.

Yeah, I agree that gitk is fine how it is.

The only part I don't agree with above is the word "obviously".
'--full-diff --stats' didn't jump out at me from the man page.

To be fair, yes, I should probably take the time to read the docs
through
and not rely on the man pages, they've changed a lot since I last
looked.

-Len

^ permalink raw reply

* Re: [PATCH] Add "--branches", "--tags" and "--remotes" options to git-rev-parse.
From: Jakub Narebski @ 2006-05-15 20:11 UTC (permalink / raw)
  To: git
In-Reply-To: <BAYC1-PASMTP043948149786B7EE06DED3AEA20@CEZ.ICE>

Sean Estabrooks wrote:

> On Sat, 13 May 2006 10:38:23 -0700
> Junio C Hamano <junkio@cox.net> wrote:
> 
>> Makes sense perhaps.
>> 
>> I understand you added --tags for completeness.  Probably it
>> would make sense to add --remotes if you are shooting for that.
[...]

> +--remotes::
> +     Show tag refs found in `$GIT_DIR/refs/remotes`.
> +

In `Documentation/repository-layout.txt' there is information about
`$GIT_DIR/remotes' not `$GIT_DIR/refs/remotes` so either the patch should
be amended, or the layout documentation should be updated, or the layout
repository should be changed.

-- 
Jakub Narebski
Warsaw, Poland

^ permalink raw reply

* Re: The git newbie experience
From: Carl Worth @ 2006-05-15 20:42 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Shawn Pearce, Tommi Virtanen, git
In-Reply-To: <7v1wuvvg0j.fsf@assigned-by-dhcp.cox.net>

[-- Attachment #1: Type: text/plain, Size: 3477 bytes --]

On Mon, 15 May 2006 01:39:08 -0700, Junio C Hamano wrote:
>  - Jack stashes away what he has been working on and cleans up
>    his mess.
> 
>    git checkout -b stash ;# risks error when "stash" exists
>    git commit -a -m 'Stashing WIP'
>    git checkout master ;# assuming that was where he was

I really like the proposal made elsewhere to implement a new:

	commit -b <newbranch>

which would then allow for a single command to achieve at least the
first two commands above:

	git commit -a -b stash -m 'Stashing WIP'

It might even make sense for this command to effectively perform all
three of the above commands. That is, should "commit -b" also checkout
the newly created branch or should it leave HEAD unchanged. I'm not
sure.

> You have to teach the new user to (1) name something, only to
> immediately discard it when he returns to what he was in the
> middle of, (2) remember to clean up the temporary thing once he
> is done lest he forgets to clean it up (and common names like
> "stash", "tmp" will be reused by accident causing grief next
> time he needs to do another stash), and (3) use of --no-commit
> pull.

I threw out a simple git-stash earlier, (which stashed to a branch
rather than to a file). I've spent some time using it, and am now
quite sure it's the wrong thing, and the above problems outline the
defect quite well:

1) Naming.

   Here, git-stash is doing too much. I prefer the idea of a stash
   command using a branch rather than a patch file, (and allowing one
   stash per branch rather than one stash per repository). But the
   namespace of branches is something the user owns, and we should
   avoid adding commands that steal from it unnecessarily. So my
   git-stash fails on this point, while "commit -b <newbranch>" is
   much better.

2) Cleanup and --no-commit pull

   Here, git-stash is doing too little. It's really only performing
   one piece of what needs to be done in order to switch back and
   forth between different topics of work.

So here are my thoughts on what I'd like instead:

In git, a branch is what we use to name a topic of work.

Historically, a branch has been extremely lightweight, (a name and
reference to a parent for subsequent commits). But there's been a
recent trend (in proposals at least) to add other, useful things to a
branch, (as in the discussions of branch-specific configuration).

In my work, I've found that the uncommitted state of my working tree
is something that I associate very strongly with my "current topic"
and expect the branch-changing commands respected that.

In particular, when using checkout to change branches, unless I've
specifically stated with "-m" that I want to carry my changes along, I
would like git to stash my working tree "into" the branch I'm
switching away from.

Similarly, when switching to a branch, I'd like to have the working
tree restored to what it was the last time I switched away from that
branch.

Does that seem unreasonable to anyone?

The only snag I've imagined is that when using "checkout -m" to switch
to a branch that also had a stashed working tree, then there's a merge
to be performed and that could obviously conflict. I've intentionally
not mentioned how the stashing/restoring should be implemented, since
the user shouldn't care. But a merge conflict is one case where the
implementation might leak out to the user. The wimpy thing to do would
be to refuse to allow "checkout -m" to a branch with stashed changes.

-Carl

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply

* Re: The git newbie experience
From: Junio C Hamano @ 2006-05-15 20:47 UTC (permalink / raw)
  To: Carl Baldwin; +Cc: Tommi Virtanen, Shawn Pearce, git
In-Reply-To: <20060515164610.GA24295@hpsvcnb.fc.hp.com>

Carl Baldwin <cnb@fc.hp.com> writes:

> My implementation actually wrote the working file state into the object
> store as a tree and stored a reference to the tree under something like
> .git/refs/undo (or .git/refs/stash).  Redo was a simple merge of this
> tree back onto the current working files.
>
> I think I would like something like this better than the 'generate
> binary patch and reapply the patch later.

When you think of the "binary patch" as a human readable
representation of your (hierarchical set of) tree objects, you
would realize that these two approaches aren't that much
different at the tree merge level, and it's just a matter of
which representation is more convenient and human readable.

Pros and cons I see are:

 * Branch approach needs to teach users only one thing -- create
   a branch, merge with it, throw it away.  Which is something
   the user needs to know anyway, so it is a plus.

 * Branch approach needs to store a full postimage tree and the
   base commit (so you can use it as a merge base); the
   postimage tree includes paths that are not involved in the
   change being stashed.

 * Patch records only the object names of paths that are relevant
   to the stash.  Instead of keeping the full postimage tree, it
   creates one on the fly when you actually do the unstashing.

 * Patch is human readable and can be used for purposes other
   than falling back to a three-way merge.  When cleanly applies
   apply + write-tree is faster than a tree merge.

 * Patch could be verbose if the change being stashed is large;
   after all the primary information used are the object names
   recorded on the "index" lines and the patch text itself is a
   waste from storage point of view.  This is a disadvantage of
   the "patch" approach, but its readability might offset it.
   If a change being stashed is large, the user had better be
   doing it on a separate topic branch anyway, so this might not
   be a big issue.

^ permalink raw reply

* Re: [PATCH] commit: allow --pretty= args to be abbreviated
From: Junio C Hamano @ 2006-05-15 20:47 UTC (permalink / raw)
  To: Eric Wong; +Cc: git
In-Reply-To: <20060515003405.GA5533@localdomain>

Eric Wong <normalperson@yhbt.net> writes:

> Unlike the original one, this one only does prefix matches, so
> you can't do --pretty=er anymore :)

Sounds good.  But then you know how long the unique prefix
are for each candidate, so wouldn't this rather be redundant, I
wonder?

> +
> +	/* look for abbreviations */
> +	len = strlen(arg);
> +	found = -1;
> +	for (i = 0; i < ARRAY_SIZE(cmt_fmts); i++) {
> +		if (!strncmp(cmt_fmts[i].n, arg, len)) {
> +			if (found >= 0)
> +				die("invalid --pretty format: %s", arg);
> +			found = i;
> +		}
> +	}
> +	if (found >= 0)
> +		return cmt_fmts[found].v;
> +	die("invalid --pretty format: %s", arg);
>  }

It would probably be better to say "ambiguous" not "invalid" in
the die() message.

^ permalink raw reply

* Re: [PATCH] send-email: allow sendmail binary to be used instead of SMTP
From: Ryan Anderson @ 2006-05-15 21:01 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Eric Wong, git
In-Reply-To: <7vmzdjtya4.fsf@assigned-by-dhcp.cox.net>

On Mon, May 15, 2006 at 02:47:31AM -0700, Junio C Hamano wrote:
> Eric Wong <normalperson@yhbt.net> writes:
> 
> > I believe this is what Martin wanted.  I think it's a good idea since
> > sendmail binaries tend to be more flexible, but I'm ok with it either
> > way.
> 
> I am not opposed to have an option to run a local submission
> agent binary (I said I like that if(){}else{} there, didn't I?).
> The ability to do so is a good thing.  I am not however sure
> about changing the default when no option is specified on the
> command line.

I think, in practice, that /usr/lib/sendmail will exist anywhere you hve
something running on port 25, at least on unixy machines.  In my
searches at an old job, that appeared to be the canonical place to call
sendmail from, and every MTA appears to provide an appropriate binary
there.

So, I'm not overly worried about it.

^ permalink raw reply

* Re: The git newbie experience
From: Junio C Hamano @ 2006-05-15 21:10 UTC (permalink / raw)
  To: Carl Worth; +Cc: git
In-Reply-To: <877j4nvx2w.wl%cworth@cworth.org>

Carl Worth <cworth@cworth.org> writes:

> In particular, when using checkout to change branches, unless I've
> specifically stated with "-m" that I want to carry my changes along, I
> would like git to stash my working tree "into" the branch I'm
> switching away from.
>
> Similarly, when switching to a branch, I'd like to have the working
> tree restored to what it was the last time I switched away from that
> branch.
>
> Does that seem unreasonable to anyone?

I would not call it unreasonable to want to have an easy access
to that mode of operation _as_ _well_, as an option.

If you were suggesting to make it the only way for future git to
work (I think you are not), then it sounds very unreasonable to
me.

The implementation behind the scene does not matter, but I think 
set of "stashes" that can be attached to each branch would work
well for what you would want.  OTOH, isn't it called stgit?

The reason why I want it to stay as an option is because I often
do a throw-away patch on top of whatever branch is checked out
in my working tree while reading the list traffic to compose a
response with an alternative patch.  Usually I follow that by a
"git reset" once the message is sent out, but when I like the
idea well enough, I do "checkout -b jc/that-topic master" to
switch to a new branch with the dirty state carried along, to
work on it further.  I do not want that workflow to require -m
flag, which means something different (-m means "I accept the
possibility of the three-way merge failing while doing this
switch that needs a merge").  Instead, I want "checkout -b" to
try carrying state forward and stop if it cannot without
file-level merging (i.e. the current behaviour).  Then I can
think if I want to do a stash with "git diff", or if I want to
do the temporary branch not based on "master" but the current
branch (which is guaranteed to work without -m) and deal with
the mess later.

^ permalink raw reply

* Re: [PATCH] send-email: allow sendmail binary to be used instead of SMTP
From: Junio C Hamano @ 2006-05-15 21:13 UTC (permalink / raw)
  To: Ryan Anderson; +Cc: git
In-Reply-To: <20060515210110.GR32076@h4x0r5.com>

Ryan Anderson <ryan@michonline.com> writes:

> I think, in practice, that /usr/lib/sendmail will exist anywhere you hve
> something running on port 25, at least on unixy machines.  In my
> searches at an old job, that appeared to be the canonical place to call
> sendmail from, and every MTA appears to provide an appropriate binary
> there.
>
> So, I'm not overly worried about it.

exim, postfix and friends?

I used to know somebody who port-forwarded 25/tcp to central
smtp server from smaller machines in her intranet installation,
but I would say that is rare.  I am not worried about it either;
I just wanted to make sure _somebody_ thought the potential
issues through and agreed with the change the patch makes.

^ permalink raw reply

* [RFC, PATCH] Teach revision.c about renames
From: Fredrik Kuivinen @ 2006-05-15 20:37 UTC (permalink / raw)
  To: git; +Cc: junkio

Hi,

The attached patch is a work in progress to add support for rename
following to the revision walking code in revision.c.

When the rename following is enabled (with the new '--renames' flag)
any pathspecs given on the command line are interpreted as
filenames. Those filenames are then followed through the commit graph.

For example, 'git log --renames -- git-fetch.sh' will show us the
history of 'git-fetch.sh' till commit
215a7ad1ef790467a4cd3f0dcffbd6e5f04c38f7 where it was renamed from
git-fetch-script, we then get the history of git-fetch-script instead.

This works with all commands that use revision.c. So 'gitk --renames
git-fetch.sh' and 'git whatchanged --renames git-fetch.sh' also
work. Multiple filenames are supposed to work (e.g., 'gitk --renames
-- git-fetch.sh Documentation/technical/pack-protocol.txt')

The patch currently have a few issues (that I am aware of):

* A linked list is used to keep track of which filenames that we care
  about for each commit. A more efficient data structure may be a good
  idea.

* Memory leaks. In particular the struct path_lists are never freed.

* Pathspecs that aren't filenames cause assertion failures. (for
  example 'git log --renames -- Documentation/') I don't really know
  what we should do in this case. Should we interpret 'Documentation/'
  as if the user supplied us with all the files in the Documentation
  directory and then track those? Or, should we show commits that
  change anything under Documentation/ (i.e., the old non-rename
  following behaviour) and additionally track files that are moved
  from Documentation/ to some other location. Yet another alternative
  is to disallow non-filenames. Thoughts?

* Is '--renames' a good name for the flag? I considered
  '--follow-renames' first, but found it a bit too long.


Any comments would be greatly appreciated.

- Fredrik

Signed-off-by: Fredrik Kuivinen <freku045@student.liu.se>

---

 log-tree.c |    7 +
 revision.c |  347 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 revision.h |   30 +++++
 3 files changed, 381 insertions(+), 3 deletions(-)

diff --git a/log-tree.c b/log-tree.c
index b90ba67..8ed500a 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -84,11 +84,12 @@ int log_tree_diff_flush(struct rev_info 
 }
 
 static int diff_root_tree(struct rev_info *opt,
-			  const unsigned char *new, const char *base)
+			  struct commit* commit, const char *base)
 {
 	int retval;
 	void *tree;
 	struct tree_desc empty, real;
+	const unsigned char* new = commit->object.sha1;
 
 	tree = read_object_with_reference(new, tree_type, &real.size, NULL);
 	if (!tree)
@@ -97,6 +98,7 @@ static int diff_root_tree(struct rev_inf
 
 	empty.buf = "";
 	empty.size = 0;
+	rev_setup_diffopt_paths(opt, commit, NULL);
 	retval = diff_tree(&empty, &real, base, &opt->diffopt);
 	free(tree);
 	log_tree_diff_flush(opt);
@@ -129,7 +131,7 @@ static int log_tree_diff(struct rev_info
 	parents = commit->parents;
 	if (!parents) {
 		if (opt->show_root_diff)
-			diff_root_tree(opt, sha1, "");
+			diff_root_tree(opt, commit, "");
 		return !opt->loginfo;
 	}
 
@@ -148,6 +150,7 @@ static int log_tree_diff(struct rev_info
 	for (;;) {
 		struct commit *parent = parents->item;
 
+		rev_setup_diffopt_paths(opt, commit, parent);
 		diff_tree_sha1(parent->object.sha1, sha1, "", &opt->diffopt);
 		log_tree_diff_flush(opt);
 
diff --git a/revision.c b/revision.c
index 2294b16..523ab29 100644
--- a/revision.c
+++ b/revision.c
@@ -1,12 +1,17 @@
+#include <assert.h>
+
 #include "cache.h"
 #include "tag.h"
 #include "blob.h"
 #include "tree.h"
 #include "commit.h"
 #include "diff.h"
+#include "diffcore.h"
 #include "refs.h"
 #include "revision.h"
 
+#define DEBUG 0
+
 static char *path_name(struct name_path *path, const char *name)
 {
 	struct name_path *p;
@@ -507,6 +512,320 @@ static int add_parents_only(struct rev_i
 	return 1;
 }
 
+/* Rename following code */
+const struct path_list* path_list_find(const struct path_list* list,
+					      const char* path)
+{
+	for(; list; list = list->next) {
+		if (!strcmp(path, list->item))
+			return list;
+	}
+
+	return NULL;
+}
+
+struct path_list* path_list_insert(char *item, struct path_list **list_p)
+{
+	struct path_list *new_list = xmalloc(sizeof(struct path_list));
+	new_list->item = item;
+	new_list->next = *list_p;
+	*list_p = new_list;
+	return new_list;
+}
+
+void path_list_print(FILE* file, struct path_list* list)
+{
+	for(; list; list = list->next)
+		fprintf(file, "%s ", list->item);
+	fputc('\n', file);
+}
+
+void free_path_list(struct path_list *list)
+{
+	while (list) {
+		struct path_list *temp = list;
+		list = temp->next;
+		free(temp);
+	}
+}
+
+const char** convert_to_pathspec(struct path_list* l1, struct path_list* l2)
+{
+	struct path_list* l;
+	const char** ret;
+	int i, len = 0;
+
+	for(l = l1; l; l = l->next, len++)
+		;
+	for(l = l2; l; l = l->next, len++)
+		;
+
+	ret = xmalloc((len+1)*sizeof(char*));
+	for(i = 0; l1; l1 = l1->next, i++)
+		ret[i] = l1->item;
+ 	for(; l2; l2 = l2->next, i++)
+		ret[i] = l2->item;
+
+	ret[len] = NULL;
+	return ret;
+}
+
+static struct rev_commit_info* get_util(struct commit *commit)
+{
+	struct rev_commit_info *util = commit->object.util;
+
+	if (util)
+		return util;
+
+	util = xmalloc(sizeof(struct rev_commit_info));
+	util->paths = NULL;
+	util->topo_data = NULL;
+	commit->object.util = util;
+	return util;
+}
+
+void rev_setup_diffopt_paths(struct rev_info* revs,
+			     struct commit* commit,
+			     struct commit* parent)
+{
+	const char** pathspec =
+		convert_to_pathspec(get_util(commit)->paths,
+				    parent ? get_util(parent)->paths : NULL);
+
+	/* FIXME we can't free revs->diffopt.paths here because the
+	 * initial value of that variable isn't necessarily allocated
+	 * by malloc.
+	 *
+	 * free(revs->diffopt.paths); */
+	diff_tree_release_paths(&revs->diffopt);
+	diff_tree_setup_paths(pathspec, &revs->diffopt);
+}
+
+
+static void topo_setter(struct commit* c, void* data)
+{
+	struct rev_commit_info* util = c->object.util;
+	util->topo_data = data;
+}
+
+static void* topo_getter(struct commit* c)
+{
+	struct rev_commit_info* util = c->object.util;
+	return util->topo_data;
+}
+
+
+static int same_tree_as_empty_paths(struct rev_info *revs, struct tree* t1,
+				    struct path_list* paths)
+{
+	int ret;
+
+	const char** pathspec = convert_to_pathspec(paths, NULL);
+	diff_tree_setup_paths(pathspec, &revs->pruning);
+	ret = rev_same_tree_as_empty(revs, t1);
+	diff_tree_release_paths(&revs->pruning);
+	free(pathspec);
+	return ret;
+}
+
+
+static struct path_list* file_removals;
+static void file_add_remove_ren(struct diff_options *options,
+				int addremove, unsigned mode,
+				const unsigned char *sha1,
+				const char *base, const char *path)
+{
+	if (DEBUG)
+		printf("%c base: '%s' path: '%s'\n", addremove, base, path);
+
+	if (addremove == '-') {
+		char* p = xmalloc(strlen(base) + strlen(path) + 1);
+		strcpy(p, base);
+		strcat(p, path);
+		path_list_insert(p, &file_removals);
+		tree_difference = REV_TREE_NEW;
+	} else {
+		assert(0);
+	}
+}
+
+static void file_change_ren(struct diff_options *options,
+			    unsigned old_mode, unsigned new_mode,
+			    const unsigned char *old_sha1,
+			    const unsigned char *new_sha1,
+			    const char *base, const char *path)
+{
+	if (tree_difference == REV_TREE_SAME)
+		tree_difference = REV_TREE_DIFFERENT;
+}
+
+static int compare_tree_paths(struct rev_info* revs,
+			      struct commit* parent, struct commit* commit,
+			      struct path_list** added)
+{
+	const char** pathspec;
+	struct diff_options dopts;
+
+	diff_setup(&dopts);
+	dopts.recursive = 1;
+	dopts.add_remove = file_add_remove_ren;
+	dopts.change = file_change_ren;
+	dopts.output_format = DIFF_FORMAT_NO_OUTPUT;
+
+	pathspec = convert_to_pathspec(get_util(commit)->paths, NULL);
+	diff_tree_setup_paths(pathspec, &dopts);
+
+	if (diff_setup_done(&dopts) < 0)
+		die("diff_setup_done failed");
+
+	file_removals = NULL;
+	tree_difference = REV_TREE_SAME;
+
+	diff_tree_sha1(commit->tree->object.sha1, parent->tree->object.sha1,
+		       "", &dopts);
+
+	diff_flush(&dopts);
+	diff_tree_release_paths(&dopts);
+	free(pathspec);
+
+	*added = file_removals;
+	return tree_difference;
+}
+
+static struct path_list* find_renames(struct commit* commit,
+				      struct commit* parent,
+				      struct path_list* paths)
+{
+	int i;
+	struct diff_options dopts;
+	struct path_list* ret = NULL;
+
+	if (DEBUG) {
+		printf("find_renames commit: %s ",
+		       sha1_to_hex(commit->object.sha1));
+		puts(sha1_to_hex(parent->object.sha1));
+		printf("rename from paths: ");
+		path_list_print(stdout, paths);
+	}
+
+	diff_setup(&dopts);
+	dopts.recursive = 1;
+	dopts.detect_rename = DIFF_DETECT_RENAME;
+	dopts.output_format = DIFF_FORMAT_NO_OUTPUT;
+
+	if (diff_setup_done(&dopts) < 0)
+		die("diff_setup_done failed");
+
+	diff_tree_sha1(commit->tree->object.sha1, parent->tree->object.sha1,
+		       "", &dopts);
+	diffcore_std(&dopts);
+
+	for (i = 0; i < diff_queued_diff.nr; i++) {
+		struct diff_filepair *p = diff_queued_diff.queue[i];
+
+		if (0 && p->status == 'R' && DEBUG)
+			printf("rename %s -> %s\n", p->one->path, p->two->path);
+
+		if (p->status == 'R' && path_list_find(paths, p->one->path)) {
+			if (DEBUG)
+				printf("rename %s -> %s\n",
+				       p->one->path, p->two->path);
+			path_list_insert(strdup(p->two->path), &ret);
+		}
+	}
+	diff_flush(&dopts);
+
+	if (DEBUG) {
+		printf("rename result: ");
+		path_list_print(stdout, ret);
+	}
+	return ret;
+}
+
+static struct path_list* rewrite_paths(struct rev_info *revs,
+				       struct commit* commit,
+				       struct commit* parent,
+				       struct path_list* removed)
+{
+	struct path_list* new_paths = find_renames(commit, parent, removed);
+	struct path_list* cpaths = get_util(commit)->paths;
+	for(; cpaths; cpaths = cpaths->next) {
+		if (!path_list_find(removed, cpaths->item))
+			path_list_insert(cpaths->item, &new_paths);
+	}
+	return new_paths;
+}
+
+static void simplify_commit_rename(struct rev_info *revs,
+				   struct commit *commit)
+{
+	struct commit_list **pp, *parent;
+
+	if (!commit->tree)
+		return;
+
+	{
+		struct rev_commit_info* util = get_util(commit);
+		if (!util->paths) {
+			util->paths = revs->initial_paths;
+			if (DEBUG)
+				printf("NULL paths\n");
+		}
+	}
+
+	if (!commit->parents) {
+		struct rev_commit_info* util = commit->object.util;
+		if (!same_tree_as_empty_paths(revs, commit->tree,
+					      util->paths))
+			commit->object.flags |= TREECHANGE;
+		return;
+	}
+
+	pp = &commit->parents;
+	while ((parent = *pp) != NULL) {
+		struct commit *p = parent->item;
+		struct path_list* removed;
+
+		if (p->object.flags & UNINTERESTING) {
+			pp = &parent->next;
+			continue;
+		}
+
+		parse_commit(p);
+		switch (compare_tree_paths(revs, p, commit, &removed)) {
+		case REV_TREE_SAME:
+			parent->next = NULL;
+			commit->parents = parent;
+			get_util(p)->paths = get_util(commit)->paths;
+			return;
+
+		case REV_TREE_NEW:
+		{
+			struct path_list* new_paths =
+				rewrite_paths(revs, commit, p, removed);
+			if (new_paths) {
+				struct rev_commit_info* putil = get_util(p);
+				if (!putil->paths)
+					putil->paths = new_paths;
+			} else {
+				*pp = parent->next;
+				continue;
+			}
+		}
+
+		/* fallthrough */
+		case REV_TREE_DIFFERENT:
+			pp = &parent->next;
+			if (!get_util(p)->paths)
+				get_util(p)->paths = get_util(commit)->paths;
+			continue;
+		}
+		die("bad tree compare for commit %s",
+		    sha1_to_hex(commit->object.sha1));
+	}
+	commit->object.flags |= TREECHANGE;
+}
+
 void init_revisions(struct rev_info *revs)
 {
 	memset(revs, 0, sizeof(*revs));
@@ -742,6 +1061,11 @@ int setup_revisions(int argc, const char
 				revs->full_diff = 1;
 				continue;
 			}
+			if (!strcmp(arg, "--renames")) {
+				revs->follow_renames = 1;
+				continue;
+			}
+
 			opts = diff_opt_parse(&revs->diffopt, argv+i, argc-i);
 			if (opts > 0) {
 				revs->diff = 1;
@@ -841,6 +1165,29 @@ int setup_revisions(int argc, const char
 		    (revs->diffopt.output_format != DIFF_FORMAT_DIFFSTAT))
 			revs->diffopt.output_format = DIFF_FORMAT_PATCH;
 	}
+	if (revs->follow_renames && revs->prune_data) {
+		if (revs->remove_empty_trees)
+			die("--renames and --remove-empty are currently "
+			    "mutually exclusive");
+		revs->prune_fn = simplify_commit_rename;
+		revs->topo_setter = topo_setter;
+		revs->topo_getter = topo_getter;
+		revs->limited = 1;
+
+		{
+			char** p;
+			for(p = revs->prune_data; *p; p++) {
+				if (DEBUG)
+					printf("filename: %s\n", *p);
+				path_list_insert(strdup(*p),
+						 &revs->initial_paths);
+			}
+		}
+
+		if (DEBUG)
+			printf("prefix: %s\n", revs->prefix);
+	}
+
 	revs->diffopt.abbrev = revs->abbrev;
 	diff_setup_done(&revs->diffopt);
 
diff --git a/revision.h b/revision.h
index 48d7b4c..e73a327 100644
--- a/revision.h
+++ b/revision.h
@@ -39,7 +39,8 @@ struct rev_info {
 			limited:1,
 			unpacked:1,
 			boundary:1,
-			parents:1;
+			parents:1,
+			follow_renames:1;
 
 	/* Diff flags */
 	unsigned int	diff:1,
@@ -68,6 +69,9 @@ struct rev_info {
 	struct diff_options diffopt;
 	struct diff_options pruning;
 
+	/* Rename following */
+	struct path_list* initial_paths;
+
 	topo_sort_set_fn_t topo_setter;
 	topo_sort_get_fn_t topo_getter;
 };
@@ -99,4 +103,28 @@ extern struct object_list **add_object(s
 				       struct name_path *path,
 				       const char *name);
 
+
+
+struct path_list {
+	char* item;
+	struct path_list *next;
+};
+
+struct rev_commit_info {
+	struct path_list* paths;
+	void* topo_data;
+};
+
+extern const struct path_list*
+path_list_find(const struct path_list* list, const char* path);
+extern struct path_list*
+path_list_insert(char *item, struct path_list **list_p);
+extern void path_list_print(FILE*, struct path_list* list);
+extern void free_path_list(struct path_list *list);
+extern const char** convert_to_pathspec(struct path_list* l1,
+					struct path_list* l2);
+extern void rev_setup_diffopt_paths(struct rev_info* revs,
+				    struct commit* commit,
+				    struct commit* parent);
+
 #endif

^ permalink raw reply related

* Re: [PATCH] send-email: allow sendmail binary to be used instead of SMTP
From: Ryan Anderson @ 2006-05-15 21:52 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Ryan Anderson, git
In-Reply-To: <7vhd3rgfey.fsf@assigned-by-dhcp.cox.net>

On Mon, May 15, 2006 at 02:13:25PM -0700, Junio C Hamano wrote:
> Ryan Anderson <ryan@michonline.com> writes:
> 
> > I think, in practice, that /usr/lib/sendmail will exist anywhere you hve
> > something running on port 25, at least on unixy machines.  In my
> > searches at an old job, that appeared to be the canonical place to call
> > sendmail from, and every MTA appears to provide an appropriate binary
> > there.
> >
> > So, I'm not overly worried about it.
> 
> exim, postfix and friends?

At least as packaged by Debian and Ubuntu, exim and postfix both fit
that rule.

It's just that there is no other canonical name for the binary that
provides the functionality, so everyone steals the name.

> I used to know somebody who port-forwarded 25/tcp to central
> smtp server from smaller machines in her intranet installation,
> but I would say that is rare.  I am not worried about it either;
> I just wanted to make sure _somebody_ thought the potential
> issues through and agreed with the change the patch makes.

Fair enough.

^ permalink raw reply

* Re: [PATCH] send-email: allow sendmail binary to be used instead of SMTP
From: Martin Langhoff @ 2006-05-15 22:07 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Ryan Anderson, git
In-Reply-To: <7vhd3rgfey.fsf@assigned-by-dhcp.cox.net>

On 5/16/06, Junio C Hamano <junkio@cox.net> wrote:
> Ryan Anderson <ryan@michonline.com> writes:
>
> > I think, in practice, that /usr/lib/sendmail will exist anywhere you hve
> > something running on port 25, at least on unixy machines.
...
> exim, postfix and friends?

/usr/sbin/sendmail is there for any current MTA, so if qmail, exim,
postfix, sendmail and all MTAs I can think of install a working
sendmail binary.

Newish desktop-targetted distros (and MacOSX) are leaning towards not
running an MTA on port 25 any more unless you ask them for it. They'll
do crontab-driven queue flushes, so the messages you feed to
/usr/sbin/sendmail will be sent a few minutes later.

That is why IMHO /usr/sbin/sendmail is a better default than
localhost:25, but I can live with either ;-)

> I used to know somebody who port-forwarded 25/tcp to central
> smtp server from smaller machines in her intranet installation,

I can understand that, but I think the changes in unixy distros
mentioned above make it unnecessary today.

cheers,

martin

^ permalink raw reply

* Fix silly typo in new builtin grep
From: Linus Torvalds @ 2006-05-16  0:54 UTC (permalink / raw)
  To: Junio C Hamano, Git Mailing List

The "-F" flag apparently got mis-translated due to some over-eager 
copy-paste work into a duplicate "-H" when using the external grep.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
---

Me likee the new built-in grep. The ability to say

	git grep __make_request v2.6.13 -- '*.c'

to grep for it in a specific version is well worth the fact that it 
obviously ends up being slower than grepping in the currently checked-out 
tree. It's doing a hell of a lot more, but despite that it's not at all 
that slow.

(In fact, I would say that doing the above command in just 4 seconds is 
damn impressive - it's a large code-base, and v2.6.13 is several months, 
and over 20 _thousand_ revisions ago).

And now it doesn't have any performance downsides, so it's all upside. 

Good job. Pls merge into mainline, I think the "grep per revision" is a 
killer feature.

diff --git a/builtin-grep.c b/builtin-grep.c
index 3d6e515..66111de 100644
--- a/builtin-grep.c
+++ b/builtin-grep.c
@@ -455,7 +455,7 @@ static int external_grep(struct grep_opt
 	push_arg("grep");
 	push_arg("-H");
 	if (opt->fixed)
-		push_arg("-H");
+		push_arg("-F");
 	if (opt->linenum)
 		push_arg("-n");
 	if (opt->regflags & REG_EXTENDED)

^ permalink raw reply related

* Re: Fix silly typo in new builtin grep
From: Linus Torvalds @ 2006-05-16  1:07 UTC (permalink / raw)
  To: Junio C Hamano, Git Mailing List
In-Reply-To: <Pine.LNX.4.64.0605151743360.3866@g5.osdl.org>

On Mon, 15 May 2006, Linus Torvalds wrote:
> 
> Me likee the new built-in grep. The ability to say
> 
> 	git grep __make_request v2.6.13 -- '*.c'
> 
> to grep for it in a specific version is well worth the fact that it 
> obviously ends up being slower than grepping in the currently checked-out 
> tree. It's doing a hell of a lot more, but despite that it's not at all 
> that slow.

Side note: it looks like the version generation really isn't much of a 
cost. Grepping in v2.6.13 isn't really noticeably slower than the 
"pre-external grep" was for grepping in the checked-out file tree.

So it looks like we _could_ improve the grepping of specific versions 
noticeably if we were to have a better regex library that was as optimized 
as what the external GNU grep seems to do. The actual revision data 
generation doesn't seem to be the biggest cost, and at least in _theory_ 
we could probably speed things up by a factor of two with a faster regex 
library.

That's good to keep in mind. It may be that the glibc regexp is just not 
very good, but quite frankly, I would personally not be surprised if it's 
better than most (ie windows, for example).

			Linus

^ permalink raw reply

* Re: [PATCH] simple euristic for further free packing improvements
From: Junio C Hamano @ 2006-05-16  1:51 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: git
In-Reply-To: <Pine.LNX.4.64.0605151129540.18071@localhost.localdomain>

Nicolas Pitre <nico@cam.org> writes:

> Given that the early eviction of objects with maximum delta depth 
> may exhibit bad packing on its own, why not considering a bias against 
> deep base objects in try_delta() to mitigate that bad behavior.

This is really a good stuff.  Thanks.  Oh, and thanks for
noticing my puzzlement expressed with "#if 0" ;-).

> @@ -1038,8 +1038,8 @@ static int try_delta(struct unpacked *tr
>  
>  	/* Now some size filtering euristics. */
>  	size = trg_entry->size;
> -	max_size = size / 2 - 20;
> -	if (trg_entry->delta)
> +	max_size = (size/2 - 20) / (src_entry->depth + 1);
> +	if (trg_entry->delta && trg_entry->delta_size <= max_size)
>  		max_size = trg_entry->delta_size-1;
>  	src_size = src_entry->size;
>  	sizediff = src_size < size ? size - src_size : 0;

At the first glance, this seems rather too agressive.  It makes
me wonder if it is a good balance to penalize the second
generation base by requiring it to produce a small delta that is
at most half as we normally would (and the third generation a
third), or maybe the penalty should kick in more gradually, like
e.g. ((max_depth * 2 - src_entry->depth) / (max_depth * 2).

Having said that, judging from your past patches, I learned to
trust that you have tried tweaking this part and settled on this
simplicity and elegance, so I'll take the patch as is -- if
somebody wants to play with it that can always be done to
further improve things.

^ permalink raw reply

* Re: Fix silly typo in new builtin grep
From: Morten Welinder @ 2006-05-16  2:10 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Git Mailing List
In-Reply-To: <Pine.LNX.4.64.0605151801100.3866@g5.osdl.org>

If I read the code right, it calls regexec for every single character
on every single line.  No wonder that takes a while!  Just call it
once and it'll search for its match quite nicely.

If that's not enough, the two obvious optimizations are...

1. If the pattern contains no regexp characters  (and that is very
    common), do a strstr.

2. If the pattern must start with a specific character, search for that
    by itself.

M.

^ permalink raw reply

* Re: [PATCH] simple euristic for further free packing improvements
From: Nicolas Pitre @ 2006-05-16  2:53 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7v4pzqhh3t.fsf@assigned-by-dhcp.cox.net>

On Mon, 15 May 2006, Junio C Hamano wrote:

> Nicolas Pitre <nico@cam.org> writes:
> 
> > @@ -1038,8 +1038,8 @@ static int try_delta(struct unpacked *tr
> >  
> >  	/* Now some size filtering euristics. */
> >  	size = trg_entry->size;
> > -	max_size = size / 2 - 20;
> > -	if (trg_entry->delta)
> > +	max_size = (size/2 - 20) / (src_entry->depth + 1);
> > +	if (trg_entry->delta && trg_entry->delta_size <= max_size)
> >  		max_size = trg_entry->delta_size-1;
> >  	src_size = src_entry->size;
> >  	sizediff = src_size < size ? size - src_size : 0;
> 
> At the first glance, this seems rather too agressive.  It makes
> me wonder if it is a good balance to penalize the second
> generation base by requiring it to produce a small delta that is
> at most half as we normally would (and the third generation a
> third), or maybe the penalty should kick in more gradually, like
> e.g. ((max_depth * 2 - src_entry->depth) / (max_depth * 2).
> 
> Having said that, judging from your past patches, I learned to
> trust that you have tried tweaking this part and settled on this
> simplicity and elegance, so I'll take the patch as is -- if
> somebody wants to play with it that can always be done to
> further improve things.

Actually I didn't play with that part that much.  The only thing I tried 
besides this version was (size - 20) / (src_entry->depth + 1) but it 
produced larger packs than the current version.

So I thought it was better to provide a simple initial rule and leave 
possible improvements for later.


Nicolas

^ permalink raw reply

* Re: Fix silly typo in new builtin grep
From: Junio C Hamano @ 2006-05-16  3:18 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git
In-Reply-To: <Pine.LNX.4.64.0605151743360.3866@g5.osdl.org>

Linus Torvalds <torvalds@osdl.org> writes:

> The "-F" flag apparently got mis-translated due to some over-eager 
> copy-paste work into a duplicate "-H" when using the external grep.

Thanks.  I've pushed it out to "master", along with some other
stuff.

> Me likee the new built-in grep. The ability to say
>
> 	git grep __make_request v2.6.13 -- '*.c'
>
> to grep for it in a specific version is well worth the fact that it 
> obviously ends up being slower than grepping in the currently checked-out 
> tree. It's doing a hell of a lot more, but despite that it's not at all 
> that slow.
>
> (In fact, I would say that doing the above command in just 4 seconds is 
> damn impressive - it's a large code-base, and v2.6.13 is several months, 
> and over 20 _thousand_ revisions ago).

That is a BS praise and you know it ;-).  You do not have delta
chains that are 20k long, so grepping from the tree 10 revs ago
and from the tree 20k revs ago would not make a difference.

It _would_ be impressive to CVS folks, but even there each path
would not have 20k revisions.  The kernel patches tend to touch
3 paths per patch on average, so 60k changes over 18k files
distributed unevenly -- my guess (I could count but haven't) is
probably 200 revisions at most for most frequently touched file.

^ permalink raw reply

* Re: Fix silly typo in new builtin grep
From: Linus Torvalds @ 2006-05-16  3:27 UTC (permalink / raw)
  To: Morten Welinder; +Cc: Junio C Hamano, Git Mailing List
In-Reply-To: <118833cc0605151910s7619ddf0x8f014adba2a1eba5@mail.gmail.com>

On Mon, 15 May 2006, Morten Welinder wrote:
>
> If I read the code right, it calls regexec for every single character
> on every single line.  No wonder that takes a while!  Just call it
> once and it'll search for its match quite nicely.

No, it calls it once per pattern per line.

But yes, it calls it once per line, instead of calling it on some bigger 
boundary. Partly because of the line-based output, partly probably because 
regexec() is not actually amenable to a "<buffer,size>" kind of usage, but 
is based on NUL-terminated strings.

> 1. If the pattern contains no regexp characters  (and that is very
>    common), do a strstr.
> 
> 2. If the pattern must start with a specific character, search for that
>    by itself.

Yeah, we could do some simple stuff, and see if it helps..

		Linus

^ permalink raw reply

* Re: Fix silly typo in new builtin grep
From: Linus Torvalds @ 2006-05-16  3:32 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vu07qfyj0.fsf@assigned-by-dhcp.cox.net>

On Mon, 15 May 2006, Junio C Hamano wrote:
> >
> > (In fact, I would say that doing the above command in just 4 seconds is 
> > damn impressive - it's a large code-base, and v2.6.13 is several months, 
> > and over 20 _thousand_ revisions ago).
> 
> That is a BS praise and you know it ;-).  You do not have delta
> chains that are 20k long, so grepping from the tree 10 revs ago
> and from the tree 20k revs ago would not make a difference.

Oh, I agree. I meant in a "general version-control sense". I doubt a lot 
of other version-control systems could do it. Git can, exactly because 
it's whole-file based, and our deltas are limited.

So it's not that "builtin-grep" is wonderful. It's that _git_ is
wonderful, and the builtin-grep just shows one of the end results.

That's why we have killer features. To show off.

(That said, git will slow down a tad too - the pack-file access won't be 
as optimized for an old version tree, and so you'll seek around some more 
for the cold-cache case).

		Linus

^ permalink raw reply

* Pickaxe usage question -- only matching on added string
From: Martin Langhoff @ 2006-05-16  3:37 UTC (permalink / raw)
  To: git

Documentation for diffcore-pickaxe (in Documentation/diffcore.txt) says:

When diffcore-pickaxe is in use, it checks if there are
filepairs whose "original" side has the specified string and
whose "result" side does not.  Such a filepair represents "the
string appeared in this changeset".  It also checks for the
opposite case that loses the specified string.

Now, is there a way to get diffcore to match only on 'added' (or on
'removed', for that matter)? I am tring to identify commtis that added
patches to a project, and I seem to be getting matches that add and
remove when I do:

   git-whatchanged -p -C -S"\t" master

cheers,

martin

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox