[PATCH] (experimental) per-topic shortlog.

git.vger.kernel.org archive mirror

* [PATCH] (experimental) per-topic shortlog.
From: Junio C Hamano @ 2006-11-27  0:44 UTC
  To: git; +Cc: Johannes Schindelin

This implements an experimental "git log-fpc" command that shows
short-log style output sorted by topics.

A "topic" is identified by going through the first-parent
chains; this ignores the fast-forward case, but for a top-level
integrator it often is good enough.

For example, if the commit ancestry graph looks like this:

         x---x---x---X---o---*---o---o---o HEAD
          \                 /
           o---o---o---o---o

and the command line asks for

	git log-fpc --no-merges X..

it first finds all the 'o' commits.  Then it emits the four
commits on the upper line (assume the merge '*' has the commit
that is a child of X as its first parent in the picture).  When
it does so, it shows the list of authors for these four commits on
one line, followed by the titles of these commits.  After that, it
does the same for the five commits on the lower line.
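To make the grouping rule concrete, here is a minimal, self-contained sketch of the same first-parent walk on a toy in-memory graph. It is not the patch's code: struct toy_commit and group_by_first_parent() are hypothetical names, the in_result/shown fields stand in for the FPC_RESULT/FPC_SHOWN flag bits, and result[] is assumed to be in whatever order the revision walk produced.

```c
#include <stddef.h>

/*
 * Hypothetical stand-in for "struct commit": only the first parent is
 * kept, because the grouping never looks at the other parents.
 * Commits that are not in result[] must start with in_result = 0.
 */
struct toy_commit {
	const char *title;
	struct toy_commit *first_parent; /* NULL for a root commit */
	int in_result; /* plays the role of FPC_RESULT */
	int shown;     /* plays the role of FPC_SHOWN */
};

/*
 * result[] holds the commits the revision walk listed, in walk order.
 * Each pass starts at the first commit not yet shown, follows its
 * first-parent chain down to the root, and claims every listed commit
 * it meets that no earlier pass claimed.  group_of[i] receives the
 * group number of result[i].  Returns the number of groups.
 */
static int group_by_first_parent(struct toy_commit **result, int nr,
				 int *group_of)
{
	int groups = 0;
	int i, j;

	for (i = 0; i < nr; i++) {
		result[i]->in_result = 1;
		result[i]->shown = 0;
	}

	for (;;) {
		struct toy_commit *c;

		for (i = 0; i < nr; i++)
			if (!result[i]->shown)
				break;
		if (i >= nr)
			return groups;

		for (c = result[i]; c; c = c->first_parent) {
			if (!c->in_result || c->shown)
				continue;
			c->shown = 1;
			for (j = 0; j < nr; j++) /* quadratic lookup, fine for a sketch */
				if (result[j] == c)
					group_of[j] = groups;
		}
		groups++;
	}
}
```

For the picture above, with the four upper 'o' commits, the merge '*' (not listed because of --no-merges) and the five lower 'o' commits, this yields two groups of four and five commits. Like the patch, it walks each first-parent chain all the way to the root.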

---

I initially wanted to do this inside Johannes's enhanced
shortlog, but ended up doing this as a pretty much independent
thing, because the shortlog implementation stringifies the
information from the commits too early to be easily enhanced for
this purpose.

If this turns out to be a better way to present shortlog,
however, this should become an option to git-shortlog.

A sample output from:

	git log-fpc --no-merges v1.4.4.1..f64d7fd2

looks like this (f64d7fd2 was the tip of master when the last
"What's in" message was sent out).  It shows that many "fixes"
and git-svn enhancements were directly done on "master" (that is
the first group), while many gitweb enhancements, changing the
output from "prune -n", "git branch" enhancements, etc. were
first cooked in separate topic branches and then later merged
into 'master'.

To this output, I can manually add a topic title to the
beginning of each group, and the result would make a better
overview than what I currently send out in the "What's in"
message, which is generated with shortlog.

----------------------------------------------------------------

Eric Wong (6), Junio C Hamano (5), Lars Hjemli, Jakub Narebski,
 Iñaki Arenaza, Petr Baudis, Andy Parkins, and René Scharfe
 git-fetch: exit with non-zero status when fast-forward check fails
 git-svn: exit with status 1 for test failures
 git-svn: correctly access repos when only given partial read permissions
 git-branch -D: make it work even when on a yet-to-be-born branch
 Add -v and --abbrev options to git-branch
 git-clone: stop dumb protocol from copying refs outside heads/ and tags/.
 gitweb: (style) use chomp without parentheses consistently.
 gitweb: Replace SPC with &nbsp; also in tag comment
 git-svn: handle authentication without relying on cached tokens on disk
 git-cvsimport: add support for CVS pserver method HTTP/1.x proxying
 Make git-clone --use-separate-remote the default
 refs outside refs/{heads,tags} match less strongly.
 Increase length of function name buffer
 git-svn: preserve uncommitted changes after dcommit
 git-svn: correctly handle revision 0 in SVN repositories
 git-svn: error out from dcommit on a parent-less commit
 archive-zip: don't use sizeof(struct ...)

Junio C Hamano and Andy Parkins
 Typefix builtin-prune.c::prune_object()
 Improve git-prune -n output

Peter Baumann
 config option log.showroot to show the diff of root commits

Andy Parkins
 Add support to git-branch to show local and remote branches

Jakub Narebski (7)
 gitweb: Finish restoring "blob" links in git_difftree_body
 gitweb: Refactor feed generation, make output prettier, add Atom feed
 gitweb: Add an option to href() to return full URL
 gitweb: New improved formatting of chunk header in diff
 gitweb: Default to $hash_base or HEAD for $hash in "commit" and "commitdiff"
 gitweb: Buffer diff header to deal with split patches + git_patchset_body refactoring
 gitweb: Protect against possible warning in git_commitdiff

----------------------------------------------------------------

 builtin-log.c |  177 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 builtin.h     |    1 +
 git.c         |    1 +
 3 files changed, 179 insertions(+), 0 deletions(-)

diff --git a/builtin-log.c b/builtin-log.c
index 7acf5d3..1c2838c 100644
--- a/builtin-log.c
+++ b/builtin-log.c
@@ -99,6 +99,183 @@ int cmd_log(int argc, const char **argv, const char *prefix)
 	return cmd_log_walk(&rev);
 }
 
+/* bits #0..7 in revision.h, #8..11 in commit.c */
+#define FPC_RESULT (1u<<12)
+#define FPC_SHOWN  (1u<<13)
+
+struct author_record {
+	char *name;
+	int count;
+};
+struct author_count {
+	int nr, alloc;
+	struct author_record **au;
+};
+
+static int cmp_count(const void *a_, const void *b_)
+{
+	struct author_record **a = (struct author_record **) a_;
+	struct author_record **b = (struct author_record **) b_;
+	return (*b)->count - (*a)->count;
+}
+
+static void add_author(struct commit *c, struct author_count *ac)
+{
+	const char *buf = c->buffer;
+	char *au = strstr(buf, "\nauthor ");
+	char *eon;
+	struct author_record *ar;
+	int i;
+
+	if (!au)
+		return; /* oops */
+	au += 7;
+	while (*au && isspace(*au))
+		au++;
+	if (!*au)
+		return; /* oops */
+	eon = strchr(au, '<');
+	if (!eon)
+		return; /* oops */
+	while (au < --eon && isspace(*eon))
+		; /* back back back... */
+	eon++;
+	for (i = 0; i < ac->nr; i++)
+		if (!strncmp(ac->au[i]->name, au, eon-au) &&
+		    strlen(ac->au[i]->name) == eon - au) {
+			/* found it */
+			ac->au[i]->count++;
+			return;
+		}
+	if (ac->alloc <= ac->nr) {
+		ac->alloc = alloc_nr(ac->alloc);
+		ac->au = xrealloc(ac->au, sizeof(struct author_record *) *
+				  ac->alloc);
+	}
+	ar = xcalloc(1, sizeof(struct author_record));
+	ar->name = xmalloc(eon - au + 1);
+	memcpy(ar->name, au, eon - au);
+	ar->name[eon - au] = 0;
+	ar->count = 1;
+	ac->au[ac->nr++] = ar;
+}
+
+static void show_fpc(struct object_array *list)
+{
+	int i;
+	struct author_count ac;
+
+	if (!list->nr)
+		return;
+	memset(&ac, 0, sizeof(ac));
+	for (i = 0; i < list->nr; i++)
+		add_author((struct commit *) list->objects[i].item, &ac);
+	qsort(ac.au, ac.nr, sizeof(struct author_record *), cmp_count);
+
+	for (i = 0; i < ac.nr; i++) {
+		if (i) {
+			if (i < ac.nr - 1)
+				fputs(", ", stdout);
+			else if (ac.nr != 2)
+				fputs(", and ", stdout);
+			else
+				fputs(" and ", stdout);
+		}
+		if (ac.au[i]->count < 2)
+			printf("%s", ac.au[i]->name);
+		else
+			printf("%s (%d)", ac.au[i]->name, ac.au[i]->count);
+		free(ac.au[i]->name);
+		free(ac.au[i]);
+	}
+	free(ac.au);
+	putchar('\n');
+
+	for (i = 0; i < list->nr; i++) {
+		struct commit *c = (struct commit *) list->objects[i].item;
+		char *buf = c->buffer;
+		char *it = "<unnamed>";
+		int len = strlen(it);
+		buf = strstr(buf, "\n\n");
+		if (buf) {
+			char *lineend;
+			while (*buf && isspace(*buf))
+				buf++;
+			if (!*buf)
+				goto emit;
+			lineend = strchr(buf, '\n');
+			if (!lineend)
+				goto emit;
+			while (buf < lineend && isspace(*lineend))
+				lineend--;
+			len = lineend - buf + 1;
+			it = buf;
+		}
+	emit:
+		printf(" %.*s\n", len, it);
+	}
+	putchar('\n');
+}
+
+int cmd_log_fpc(int argc, const char **argv, const char *prefix)
+{
+	struct rev_info rev;
+	struct commit *c;
+	struct object_array result = { 0, 0, NULL };
+	int i;
+
+	git_config(git_log_config);
+	init_revisions(&rev, prefix);
+	rev.always_show_header = 1;
+	cmd_log_init(argc, argv, prefix, &rev);
+
+	prepare_revision_walk(&rev);
+	while ((c = get_revision(&rev)) != NULL)
+		add_object_array(&(c->object), NULL, &result);
+
+	/* clear flags and mark them "relevant" */
+	for (i = 0; i < result.nr; i++)
+		result.objects[i].item->flags |= FPC_RESULT;
+
+	for (;;) {
+		struct object_array current;
+
+		for (i = 0; i < result.nr; i++) {
+			if (!(result.objects[i].item->flags & FPC_SHOWN))
+				break;
+		}
+		if (i >= result.nr)
+			break;
+
+		memset(&current, 0, sizeof(current));
+		c = (struct commit *) result.objects[i].item;
+		while (c) {
+			int flags = c->object.flags;
+
+			if ((flags & (FPC_RESULT|FPC_SHOWN)) == FPC_RESULT) {
+				add_object_array(&(c->object), NULL, &current);
+				c->object.flags |= FPC_SHOWN;
+			}
+			if (!c->object.parsed)
+				parse_object(c->object.sha1);
+			if (!c->parents)
+				break;
+			c = c->parents->item;
+		}
+
+		/* Finally, show the series. */
+		show_fpc(&current);
+	}
+
+	/* free them */
+	for (i = 0; i < result.nr; i++) {
+		c = (struct commit *) result.objects[i].item;
+		free(c->buffer);
+		free_commit_list(c->parents);
+	}
+	return 0;
+}
+
 static int istitlechar(char c)
 {
 	return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
diff --git a/builtin.h b/builtin.h
index 43fed32..a94540d 100644
--- a/builtin.h
+++ b/builtin.h
@@ -38,6 +38,7 @@ extern int cmd_grep(int argc, const char **argv, const char *prefix);
 extern int cmd_help(int argc, const char **argv, const char *prefix);
 extern int cmd_init_db(int argc, const char **argv, const char *prefix);
 extern int cmd_log(int argc, const char **argv, const char *prefix);
+extern int cmd_log_fpc(int argc, const char **argv, const char *prefix);
 extern int cmd_ls_files(int argc, const char **argv, const char *prefix);
 extern int cmd_ls_tree(int argc, const char **argv, const char *prefix);
 extern int cmd_mailinfo(int argc, const char **argv, const char *prefix);
diff --git a/git.c b/git.c
index 1aa07a5..65d98bd 100644
--- a/git.c
+++ b/git.c
@@ -243,6 +243,7 @@ static void handle_internal_command(int argc, const char **argv, char **envp)
 		{ "help", cmd_help },
 		{ "init-db", cmd_init_db },
 		{ "log", cmd_log, RUN_SETUP | USE_PAGER },
+		{ "log-fpc", cmd_log_fpc, RUN_SETUP | USE_PAGER },
 		{ "ls-files", cmd_ls_files, RUN_SETUP },
 		{ "ls-tree", cmd_ls_tree, RUN_SETUP },
 		{ "mailinfo", cmd_mailinfo },
-- 
1.4.4.1.ge3fb



* Re: [PATCH] (experimental) per-topic shortlog.
From: Linus Torvalds @ 2006-11-27  1:06 UTC
  To: Junio C Hamano; +Cc: git, Johannes Schindelin

On Sun, 26 Nov 2006, Junio C Hamano wrote:
>
> This implements an experimental "git log-fpc" command that shows
> short-log style output sorted by topics.
> 
> A "topic" is identified by going through the first-parent
> chains; this ignores the fast-forward case, but for a top-level
> integrator it often is good enough.

Umm. May I suggest that you try this with the kernel repo too..

There, the "first parent chain" tends to be less interesting than a lot of 
other heuristics:

 - committer

   If the committer changes, you should probably consider it a break, the 
   same way a second parent would be a break. You probably won't see this 
   in the git archive, because there tends to be a single committer, but 
   on something like the kernel where we really merge other peoples repos, 
   it's going to be as good as (or better than) looking at "other parents".

 - subdirectory heuristics

   Again, with git it's not very interesting, but I bet that you'd be able 
   to use heuristics like "the bulk of the changes were contained within 
   this directory tree" for projects like the kernel, and automatically 
   decide on "topics" like drivers/scsi, fs/ext3 etc.
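A rough sketch of the "bulk of the changes within one directory tree" test, assuming "bulk" means a strict majority of the changed paths (counting files, not lines) and that a topic is named by the first one or two directory levels. dir_prefix_len() and dominant_dir() are hypothetical helpers, not git functions.

```c
#include <string.h>

/*
 * Length of the leading directory prefix of "path", cut after at most
 * "depth" components: "fs/ext3/inode.c" gives "fs/ext3" for depth 2
 * and "fs" for depth 1; a top-level file such as "Makefile" gives "".
 */
static size_t dir_prefix_len(const char *path, int depth)
{
	const char *slash = strrchr(path, '/');
	size_t dirlen = slash ? (size_t)(slash - path) : 0;
	size_t i;
	int seen = 0;

	for (i = 0; i < dirlen; i++)
		if (path[i] == '/' && ++seen == depth)
			return i;
	return dirlen;
}

/*
 * Assumed heuristic: if a strict majority of the changed paths share
 * one directory prefix, copy it to "out" and return 1; else return 0.
 * Quadratic, which is fine for the paths of a single commit.
 */
static int dominant_dir(const char **paths, int nr, int depth,
			char *out, size_t outsz)
{
	int i, j;

	for (i = 0; i < nr; i++) {
		size_t len = dir_prefix_len(paths[i], depth);
		int count = 0;

		for (j = 0; j < nr; j++)
			if (dir_prefix_len(paths[j], depth) == len &&
			    !strncmp(paths[i], paths[j], len))
				count++;
		if (2 * count > nr && len < outsz) {
			memcpy(out, paths[i], len);
			out[len] = '\0';
			return 1;
		}
	}
	return 0;
}
```

With depth 2, a commit touching three files under drivers/scsi and one under fs/ext3 would be labeled drivers/scsi; a commit with no majority directory simply stays unlabeled.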

In other words, I don't think the "fpc" decision is even very interesting. 
If you _really_ want to do a cool shortlogger, I bet it can be done, but I 
suspect that it would be a LOT cooler to do some automatic bayesian 
clustering based on committer, author and list of filenames changed.

Of course, such a thing done well would probably be worthy of a doctoral 
thesis or something. Maybe somebody on this list who is into bayesian 
clustering and doesn't have a thesis subject...

(Of course, since I haven't been in a University setting for the last ten 
years, maybe bayesian clustering isn't the cool thing to work on any 
more).

Anyway, "topics" really should be something that is extremely open to 
various clustering models, bayesian or not ..


* Re: [PATCH] (experimental) per-topic shortlog.
From: Junio C Hamano @ 2006-11-27  1:38 UTC
  To: Linus Torvalds; +Cc: git

Linus Torvalds <torvalds@osdl.org> writes:

> On Sun, 26 Nov 2006, Junio C Hamano wrote:
>>
>> This implements an experimental "git log-fpc" command that shows
>> short-log style output sorted by topics.
>> 
>> A "topic" is identified by going through the first-parent
>> chains; this ignores the fast-forward case, but for a top-level
>> integrator it often is good enough.
>
> Umm. May I suggest that you try this with the kernel repo too..

Have you?

I've compared 

	gitk HEAD~40..HEAD

and 

	git-log-fpc --no-merges HEAD~40..HEAD

Admittedly, the first group ("from the tip of the master") tends
to be seriously mixed up without a fixed theme (well the theme
appears to be "fix trivial warnings and compilation breakages
not limited to any particular subsystem"), but I find the other
groups quite a sane representation of what actually happened.

My copy of your tree is a bit old (HEAD is at 1abbfb412), but I
see:

 - a two-commit series on MIPS via Ralf Baechle,
 - a four-commit series on ARM via Russell King,
 - a three-commit series on POWERPC via Paul Mackerras,
 - a seventeen-commit series in net/ area via Dave Miller,
 - a three-commit series on x86_64 via Andi Kleen.
 ...

As you said, also breaking a group when the committer changes
would handle the fast-forward case and make it even better.


* Re: [PATCH] (experimental) per-topic shortlog.
From: Linus Torvalds @ 2006-11-27  1:53 UTC
  To: Junio C Hamano; +Cc: git



On Sun, 26 Nov 2006, Junio C Hamano wrote:
> 
>  - a two-commit series on MIPS via Ralf Baechle,
>  - a four-commit series on ARM via Russell King,
>  - a three-commit series on POWERPC via Paul Mackerras,
>  - a seventeen-commit series in net/ area via Dave Miller,
>  - a three-commit series on x86_64 via Andi Kleen.

You'll reasonably often see in the kernel:

 - a patch-series by Andrew (where nothing but filename clustering really 
   would help: the committer is me, and the thing is linear)

 - linearly on top of that, a git merge that was a fast-forward 
   (especially from the subset of people who actively rebase their trees: 
   that notably includes Dave Miller, but also for example the DVB people)

so purely a first-parent logic would not catch that case at all (but the 
committer would at least catch the "patch-series by Andrew" -> "Merge of 
network tree by Davem" break).
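The committer break can be layered on top of a first-parent run cheaply. A minimal sketch, assuming the committer names of one run have already been extracted in walk order (newest first); split_by_committer() is a hypothetical helper.

```c
#include <string.h>

/*
 * committers[i] is the committer of the i-th commit along one
 * first-parent run.  run_of[i] receives a sub-group number that is
 * bumped every time the committer differs from the previous commit.
 * Returns the number of sub-groups (0 for an empty run).
 */
static int split_by_committer(const char **committers, int nr, int *run_of)
{
	int i, run = 0;

	for (i = 0; i < nr; i++) {
		if (i && strcmp(committers[i], committers[i - 1]))
			run++;
		run_of[i] = run;
	}
	return nr ? run + 1 : 0;
}
```

In the scenario above, a fast-forwarded networking series committed by Dave Miller sitting on top of Andrew's series committed by Linus would come out as two sub-groups even though the history is linear.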

But especially with long patch-series through Andrew, it would be nice to 
have some other heuristics (although they _tend_ to be fairly random, 
especially at the end of the release cycle - at the beginning, I tend to 
have series of 100-200 patches that often _could_ be clearly clustered 
into a few clusters).

Anyway, the real win of clustering would likely be for big releases, ie
something like "v2.6.18..v2.6.19-rc1", where there's definitely some
clustering even apart from just merging (although the merge topology will
definitely get some of it).



* Re: [PATCH] (experimental) per-topic shortlog.
From: Junio C Hamano @ 2006-11-27  1:55 UTC
  To: Linus Torvalds; +Cc: git

Linus Torvalds <torvalds@osdl.org> writes:

> On Sun, 26 Nov 2006, Junio C Hamano wrote:
>>
>> This implements an experimental "git log-fpc" command that shows
>> short-log style output sorted by topics.
>> 
>> A "topic" is identified by going through the first-parent
>> chains; this ignores the fast-forward case, but for a top-level
>> integrator it often is good enough.

After sending out a response, I re-read your message because I
did not quite get where bayesian would come into the picture.

I think I should have used the word "topic branch" instead of
"topic".  In other words, I was not interested in sifting the
various totally unrelated linear commits into groups that deal
with distinct problems.

But again you are showing your superior intelligence by setting
the problem in a much grander scheme ;-), where there is no such
developer discipline that would help the shortlogger (like use
of topic branches).  In such a case, you would need a set of
heuristics that you described.


* Re: [PATCH] (experimental) per-topic shortlog.
From: Linus Torvalds @ 2006-11-27  2:52 UTC
  To: Junio C Hamano; +Cc: git

On Sun, 26 Nov 2006, Junio C Hamano wrote:
> 
> I think I should have used the word "topic branch" instead of
> "topic".  In other words, I was not interested in sifting the
> various totally unrelated linear commits into groups that deal
> with distinct problems.

Well, I think you've grown slightly jaded by the fact that git has very
active "normal" development, that is actually done by you on the main 
branch, and you do basically zero rebasing along the side branches.

I think that's actually likely the exception rather than the rule. It's 
much more likely that people have almost _all_ active development done on 
side branches, and that - together with rebasing of the side branches - 
inevitably means that the "main branch" ends up not having such a clean 
set of "topic branch" merges.

In addition, on a more mature tree, a lot (probably _most_) of the commits 
aren't really "topics" at all, but "maintenance", which exacerbates the 
problem: you don't have a "line of development of this feature", you tend 
to have much more of a random "fix this general area", where the only 
common theme may be the fact that things are _related_ to some common 
subsystem, but not that they are a "topic branch" in the _development_ 
sense.

Put another way: bugs get fixed one by one, not in a nice linear fashion 
by "topic".

So I'm coming at it from a totally different project - where "topic 
branches" simply aren't delineated as much, and even when they are, they 
tend to be merged in multiple steps (and they pull both ways when they 
aren't re-based).

So that's why I don't think the pure branch topology is as interesting. A 
single line of development ends up being useful for you, and we'll 
certainly see _some_ of that, but in the kernel, I pretty much guarantee 
that you probably get better "topic clustering" by going simply by author, 
like the old standard "git shortlog" does. Because that will tend to get 
the clustering at a finer granularity (ie not just "networking", but 
things like "packet filtering" etc).

So the "sort by people" actually works fairly well, but it's kind of an 
"incidental" thing, and it _would_ be potentially useful to have other 
ways of grouping things.

See? It's not about "superior intelligence", it's about simply a totally 
different development phase (and a less strictly defined problem space).


* Re: [PATCH] (experimental) per-topic shortlog.
From: Junio C Hamano @ 2006-11-27  6:48 UTC
  To: Linus Torvalds; +Cc: git

Linus Torvalds <torvalds@osdl.org> writes:

> I think that's actually likely the exception rather than the rule. It's 
> much more likely that people have almost _all_ active development done on 
> side branches, and that - together with rebasing of the side branches - 
> inevitably means that the "main branch" ends up not having such a clean 
> set of "topic branch" merges.

You are absolutely right about "Andrew patchbomb" which is
linear and does not have the series boundary.  Import from
mostly linear foreign SCM would have the same issue.  Merge
topology would not help us at all in these cases.

> In addition, on a more mature tree, a lot (probably _most_) of the commits 
> aren't really "topics" at all, but "maintenance", which exacerbates the 
> problem: you don't have a "line of development of this feature",
> ...
> Put another way: bugs get fixed one by one, not in a nice linear fashion 
> by "topic".

Again, you are right, but that only means topic based grouping
is not for everybody, and certainly is not suitable for a long
stretch of commits on the trunk of a mature project because they
tend to touch everything and are not all that clustered.  If those
bugs were fixed by committing on separate topic branches and
then later merged, the topology based clustering would get the
grouping right, but I would imagine we would end up seeing
hundreds of such short groups which would not be useful at all.
In such cases, it would be much more useful to have one huge
group that says "these are small fixes, each of which may touch
different areas -- they are not related but grouped together
because they are all small, obviously correct and harmless
fixes".  So I suspect that is a slightly different issue -- it
just illustrates the need for an "ungrouped" bin.
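One way to provide that "ungrouped" bin is to fold every group below a size threshold into a single catch-all bin. The sketch assumes the groups have already been formed (for example by the first-parent walk) and that a fixed minimum size is a good enough test; fold_small_groups() and min_size are hypothetical.

```c
/*
 * sizes[g] is the number of commits in group g.  Groups with fewer
 * than min_size commits are folded into one "ungrouped" bin.
 * bin_of[g] receives the output bin for group g; the ungrouped bin,
 * if any, is numbered after all the real groups.
 * Returns the number of output bins.
 */
static int fold_small_groups(const int *sizes, int ngroups, int min_size,
			     int *bin_of)
{
	int g, bins = 0, have_misc = 0;

	for (g = 0; g < ngroups; g++) {
		if (sizes[g] >= min_size)
			bin_of[g] = bins++;
		else
			have_misc = 1;
	}
	for (g = 0; g < ngroups; g++)
		if (sizes[g] < min_size)
			bin_of[g] = bins; /* the catch-all bin */
	return bins + have_misc;
}
```

With group sizes 17, 1, 4, 1 and 2 and a minimum of 3, the two real groups keep their own bins and the three small ones share a third, catch-all bin.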

> So I'm coming at it from a totally different project - where "topic 
> branches" simply aren't delineated as much, and even when they are, they 
> tend to be merged in multiple steps (and they pull both ways when they 
> aren't re-based).

I agree that multi-step merges and merging both ways happen
in real life, but I had the impression that fpc handles that
topology reasonably well, unless those "merges from upstream" are
of the "too frequent, automated and useless" kind.

> ... but in the kernel, I pretty much guarantee 
> that you probably get better "topic clustering" by going simply by author, 
> like the old standard "git shortlog" does. Because that will tend to get 
> the clustering at a finer granularity (ie not just "networking", but 
> things like "packet filtering" etc).
>
> So the "sort by people" actually works fairly well, but it's kind of an 
> "incidental" thing, and it _would_ be potentially useful to have other 
> ways of grouping things.

I think "networking" vs "packet filtering" largely depends on
how the networking subsystem you pull from is managed.  If
netfilter comes as e-mailed patches to DaveM and are applied
onto the trunk of networking subsystem, we will face exactly the
same problem as we have with Andrew's patchbomb to your trunk.

If it were managed on a separate topic branch in the networking
subsystem repository (either DaveM manages them in his
repository as a topic, or DaveM pulls from netfilter git
repository -- I do not know how that part of the patchflow
works), I would imagine you would get the same "per topic"
grouping.

Another factor is that the author population of a wide and
mature project like the kernel tends to be more diverse, and a
single person tends to be focused on one thing at a time while
others work on different things.  There is enough work in one
specific area for one person to do, and the project is too wide
for one person to be everywhere.


* Re: [PATCH] (experimental) per-topic shortlog.
From: Linus Torvalds @ 2006-11-27 16:20 UTC
  To: Junio C Hamano; +Cc: git

On Sun, 26 Nov 2006, Junio C Hamano wrote:
> 
> I think "networking" vs "packet filtering" largely depends on
> how the networking subsystem you pull from is managed.  If
> netfilter comes as e-mailed patches to DaveM and are applied
> onto the trunk of networking subsystem, we will face exactly the
> same problem as we have with Andrew's patchbomb to your trunk.

Most of the subsystems end up using patches - they're simply better ways 
to move things around and have people comment on them than saying "please 
pull on this tree to see my suggestion".  I do it myself: even when I 
_generate_ the diff in my tree, I will often just do a 

	git diff > ~/diff

and then import the thing into my mailer, and say "Maybe something like 
this?".

So I think patches are fundamentally the core way to get things in the 
periphery into just about any system. Maybe we do it more than most just 
because we're so _used_ to them, but I actually think that if the kernel 
does it more than most (and I'm not sure it does), it's simply because the 
thing about patches is that they really _work_.

So yes, the network subsystem tends to be entirely linear by the time it 
hits me. That's true of a lot of other subsystems too (SCSI etc). There are
a _few_ subsystems that actually have real topic branches: ACPI and
network driver development come to mind, but they seem to actually be the
exception rather than the rule.

(I think that a lot of people work like I occasionally do: they do have 
their own local branches for some stuff, but they end up re-linearizing 
and keeping them active with "git rebase", so the branches really are 
purely local, rather than something that is visible in the end result).

But the REAL reason I'd love to see a smarter "data-mining" git log 
(whether it does things by bayesian clustering or any other kind of 
grouping technology) is that this is actually something that people ask 
for: when I make my "git shortlog" for major releases, the thing is often 
thousands of lines long, and it would be _beautiful_ if that could be 
data-mined somewhat more intelligently.

So, for example, do a simple

	git shortlog v2.6.17..v2.6.18

(with the shortlog in "next" that can do this - btw, why doesn't it 
default to using PAGER like "git log" does?), and realize that it's about 
8500 lines of stuff, and nobody can really be expected to read it. It's 
not a "shortlog" in other words.

So what would a _nice_ "shortlog" do? I'd _love_ to see ways to make it 
more concise, more "short" for something like this. Look at the output as 
a _non_kernel_ person, and what does it tell you? Not a lot. It's just too 
big.

Examples of what I think would be _really_ useful (much more so than 
going by "topic branches", even if they existed):

 - Clustering.

   The author-based clustering does work, but it would be even better to 
   cluster by other methods ("subsystem" - either by subdirectory, or by 
   noticing filename patterns, or even patterns in the patches: there's a
   lot of academic work on clustering human text, perhaps not as much on 
   clustering patches).

 - Shortening

   The "shortlog" often isn't. It's wonderful for small things as-is, but 
   once it reaches a hundred lines or more, it's less so. It would often 
   be nice to be able to say "only show the 100 biggest patches" (or 
   preferably something smarter like "the 25 biggest clusters, with a 
   short 4-line clustering explanation", but even just the "biggest 
   patches" is useful in itself and much simpler)

 - External annotations (eventually)

   One of the things that people like LWN editor Jonathan Corbet would 
   want is a way to say which patches are "important". But the thing is, 
   "importance" is (a) fleeting and (b) not necessarily as obvious when 
   the commit is made as it is afterwards. So you cannot (and must not) 
   mark things "important" at commit-time, and it thus can't really be 
   part of the repo itself, but at the same time, this is definitely 
   something that _could_ be somehow logged/annotated externally.
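The simpler "only show the N biggest patches" variant needs nothing more than a size per commit and a sort. This sketch assumes "size" means insertions plus deletions as a diffstat would report them; struct patch_size and biggest_patches() are hypothetical.

```c
#include <stdlib.h>

struct patch_size {
	const char *title;
	int lines_changed; /* assumed: insertions + deletions from a diffstat */
};

/* Biggest first; compares instead of subtracting to avoid overflow. */
static int by_size_desc(const void *a_, const void *b_)
{
	const struct patch_size *a = a_, *b = b_;

	return (a->lines_changed < b->lines_changed) -
	       (a->lines_changed > b->lines_changed);
}

/* Sorts p[] in place, biggest first, and returns how many to show. */
static int biggest_patches(struct patch_size *p, int nr, int limit)
{
	qsort(p, nr, sizeof(*p), by_size_desc);
	return nr < limit ? nr : limit;
}
```

qsort() is not stable, so patches of equal size come out in no particular order; a real implementation might break ties by date.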

Now, I realize that these are all pipe-dreams, but so was my old "a better 
annotate than annotate" a year or two ago. So I'm not saying that people 
should work on this, I'm just saying that it's worth perhaps thinking 
about, because I think the git model does actually give us the power to 
_do_ things like this. Eventually.

And the reason? Performance! Git is fast enough that we really _can_ 
afford to do things like "generate diffs for every single commit in the 
range v2.6.17..v2.6.18" and it takes me just 20 seconds to do on a 
reasonable machine with "git log -p". So good performance means that we 
can _afford_ to do a diffstat for everything (or, just raw diffs to make 
it even cheaper - quite often you care more about _which_ files and how 
many files something touched than the actual size of the diff in those 
files itself), and to use that diffstat to some day generate shortlogs that
are more useful for people like Jonathan Corbet and others who just want
to get an overview of "what happened".


* Re: [PATCH] (experimental) per-topic shortlog.
From: Johannes Schindelin @ 2006-11-27 23:46 UTC
  To: Linus Torvalds; +Cc: Junio C Hamano, git

Hi,

On Sun, 26 Nov 2006, Linus Torvalds wrote:

> Of course, such a thing done well would probably be worthy of a doctoral 
> thesis or something. Maybe somebody on this list who is into bayesian 
> clustering and doesn't have a thesis subject...

Funny you should mention it... I recently was exposed to Formal Concept 
Analysis, and immediately thought that this would have applications in the 
visualization of source code histories.

Maybe there is a way to apply Bayesian Inference to determine a subset 
which bears the highest information / subset size ratio.

As for reducing the number of lines in the shortlog: taking myself as an 
example, I often touch the same code several times, just to fix bugs. So, 
if the same code was touched several times, just take the first oneline, 
and add "(+fixes)". Of course, this is more like a wedding between 
shortlog and annotate, and likely to be slow.
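A cheap approximation of the "(+fixes)" idea that avoids annotate entirely: treat two commits as touching "the same code" when they carry the same key, for example the same single changed path. That is much coarser than line-level blame and is an assumption made only for this sketch; struct oneline, fold_fixes() and format_oneline() are hypothetical names.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical record for one shortlog line.  "touched" is an assumed
 * key for "the same code" (here simply one changed path); real
 * line-level overlap would need annotate/blame.
 */
struct oneline {
	const char *title;
	const char *touched;
	int folded_into; /* index of the earlier commit, or -1 */
	int has_fixes;   /* later commits were folded into this one */
};

/*
 * items[] must be oldest first.  A commit whose key matches an
 * earlier, unfolded commit is folded into that commit.
 * Returns the number of lines that remain to be printed.
 */
static int fold_fixes(struct oneline *items, int nr)
{
	int i, j, kept = 0;

	for (i = 0; i < nr; i++) {
		items[i].folded_into = -1;
		items[i].has_fixes = 0;
		for (j = 0; j < i; j++) {
			if (items[j].folded_into < 0 &&
			    !strcmp(items[j].touched, items[i].touched)) {
				items[i].folded_into = j;
				items[j].has_fixes = 1;
				break;
			}
		}
		if (items[i].folded_into < 0)
			kept++;
	}
	return kept;
}

/* Render a kept line, appending " (+fixes)" when something was folded in. */
static void format_oneline(const struct oneline *it, char *buf, size_t sz)
{
	snprintf(buf, sz, "%s%s", it->title, it->has_fixes ? " (+fixes)" : "");
}
```

Given an oldest-first list where a later commit touches the same file as the first one, the first keeps its oneline with " (+fixes)" appended and the later one drops out of the output.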

Ciao,
Dscho


* Re: [PATCH] (experimental) per-topic shortlog.
From: Junio C Hamano @ 2006-11-28  0:09 UTC
  To: Johannes Schindelin; +Cc: git, Linus Torvalds

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> As for reducing the number of lines in the shortlog: taking myself as an 
> example, I often touch the same code several times, just to fix bugs. So, 
> if the same code was touched several times, just take the first oneline, 
> and add "(+fixes)". Of course, this is more like a wedding between 
> shortlog and annotate, and likely to be slow.

Interesting.  While driving to work this morning I had the same
thought.  A revision that does not appear in the output from

	for file in $(list of files the commit touches)
        do
		git blame v2.6.17..v2.6.18 -- $file
	done

can safely be omitted from the shortlog, because later changes
fully supersede it.
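
Junio's filter amounts to a set difference: the commits in the range,
minus every commit that blame still credits with at least one line at
the top of the range.  A minimal sketch of that step, assuming the
blame output was produced with -l (so the first field of each line is
the full SHA-1) and that the commit list comes from something like
"git rev-list v2.6.17..v2.6.18"; the helper name superseded_commits
is hypothetical:

```sh
# superseded_commits RANGE_FILE BLAME_FILE  (hypothetical helper)
#
# RANGE_FILE: one full SHA-1 per line, the commits in the range.
# BLAME_FILE: concatenated "git blame -l" output for the files the
#             range touches; field 1 is the SHA-1 blamed for the line.
#
# Prints, in RANGE_FILE order, the commits that own no surviving line,
# i.e. the candidates to omit from the shortlog.
superseded_commits () {
	awk '
		# First file: remember every commit that is still blamed.
		FILENAME == ARGV[1] { blamed[$1] = 1; next }
		# Second file: print the commits that were never blamed.
		!($1 in blamed)
	' "$2" "$1"
}
```

Merges rarely own lines of their own, so they will show up in this
list too; feeding it a "git rev-list --no-merges" commit list avoids
that.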

I think the list of "important" changes is an interesting
problem, but the importance may not directly be related to the
number of paths a patch touches (e.g. "you reorder the members
of a structure everybody uses in one include file and everything
starts performing faster due to better cache behaviour" would be
a few lines of a single header file).  Better clues for judging
importance are also likely to be found outside the repository.
"The patch discussed by many people on the list" and "the patch
that took many iterations to reach its final shape" would
certainly be interesting ones, but that information is often not
found in the repository.


* Re: [PATCH] (experimental) per-topic shortlog.
  2006-11-28  0:09     ` Junio C Hamano
@ 2006-11-28 13:11       ` Jeff King
  2006-11-28 13:43         ` Johannes Schindelin
  2006-11-29  0:57         ` Junio C Hamano
  0 siblings, 2 replies; 18+ messages in thread
From: Jeff King @ 2006-11-28 13:11 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Schindelin, git, Linus Torvalds

On Mon, Nov 27, 2006 at 04:09:53PM -0800, Junio C Hamano wrote:

> Interesting.  While driving to work this morning I had the same
> thought.  A revision that does not appear in the output from
> 
> 	for file in $(list of files the commit touches)
>         do
> 		git blame v2.6.17..v2.6.18 -- $file
> 	done

Just for fun, I took a look at what we might see by ordering commits by
their "amount of blamedness". That is, the count of lines introduced by
a commit which were not later superseded. The script I used is below:

#!/bin/sh

start=$1; shift
end=$1; shift

start_sha1=`git-rev-parse $start^{}`
git-rev-list --parents $start..$end >revs
echo $start_sha1 >>revs
for i in `git-diff --raw -r $start $end | cut -f2`; do
  echo blaming $i... >&2
  git-blame -l -S revs $i | cut -d' ' -f1
done |
  grep -v $start_sha1 |
  sort | uniq -c | sort -rn |
  while read count hash; do
    echo "$count `git-rev-list --max-count=1 --pretty=oneline $hash`"
  done

The top 15 for v1.4.3 to v1.4.4 are:

1604 6973dcaee76ef7b7bfcabd2f26e76205aae07858 Libify diff-files.
1100 9f613ddd21cbd05bfc139d9b1551b5780aa171f6 Add git-for-each-ref: helper for language bindings
1050 cee7f245dcaef6dade28464f59420095a9949aac git-pickaxe: blame rewritten.
700 58e60dd203362ecb9fdea765dcc2eb573892dbaf Add support for pushing to a remote repository using HTTP/DAV
571 9f1afe05c3ab7228e21ba3666c6e35d693149b37 gitk: New improved gitk
524 197e8951abd2ebf2c70d0847bb0b38b16b92175b http-push: support for updating remote info/refs
504 83b5d2f5b0c95fe102bc3d1cc2947abbdf5e5c5b builtin-grep: make pieces of it available as library.
462 aa1dbc9897822c8acb284b35c40da60f3debca91 Update http-push functionality
344 a57a9493df00b6fbb3699fda8ceedf4ac0783ac6 Added Perl git-cvsimport-script
343 f8b28a4078a29cbf93cac6f9edd8d5c203777313 gitk: Add a tree-browsing mode
323 00449f992b629f7f7884fb2cf46ff411a2a4f381 Make git-fmt-merge-msg a builtin
285 fd8ccbec4f0161b14f804a454e68b29e24840ad3 gitk: Work around Tcl's non-standard names for encodings
283 9cf6d3357aaaaa89dd86cc156221b7b604e9358c Add git-index-pack utility
277 e4fbbfe9eccd37c0f9c060eac181ce05988db76c Add git-zip-tree
256 da7c24dd9c75d014780179f8eb843968919e4c46 gitk: Basic support for highlighting one view within another

The bottom 15 are:

1 076b2324cdca9a2825c569cf9ec02d219c237e26 show-branch: make it work in a subdirectory.
1 061303f0b50a648db8e0af23791fc56181f6bf93 cvsimport: always set $ENV{GIT_INDEX_FILE} to $index{$branch}
1 057bc808b4aa2e7795f9bd395e68071301bc0b74 path-list: fix path-list-insert return value
1 04c13d38772c77997d8789ee2067cc351b66e2aa Save the maxwidth setting in the ~/.gitk file.
1 041a7308de3e6af36c5a6cc3412b542f42314f3f sha1_name.c: prepare to make get_tree_entry() reusable from others.
1 0360e99d06acfbb0fcb72215cf6749591ee53290 [PATCH] Fix git-rev-parse --default and --flags handling
1 02d3dca3bff6a67dead9f5b97dfe3576fe5b14e5 revision.c: fix "dense" under --remove-empty
1 02c5cba2007856465710aa37cd41b404372ab95b find_unique_abbrev() with len=0 should not abbreviate
1 02853588a48eddbaa42b58764960394e416d68bf Typofix in Makefile comment.
1 024701f1d88d79f3777bf45c82437f40a80b6eaa Make pack-objects chattier.
1 021b6e454944a4fba878651ebf9bfe0a3f6c3077 Make index file locking code reusable to others.
1 01ff767a3266a1876ce24a200c45786083768fda Merge branch 'lt/refs' into next
1 01385e275828c1116ea9bfcf827f82f450ee8f5f Comment fixes.
1 013049c985e4095106e545559c17bc594d56468d revert/cherry-pick: handle single quote in author name.
1 0086e2c854e3af3209915e4ec2f933bcef400050 Rename lost+found to lost-found.

This approach isn't without value; the top lines really _are_ important
changes, as they show where a lot of work (line-wise) went.  The bottom
lines are relatively unimportant (oh boy, comment fixes!).  But there
are obviously some one-liners that are very interesting. For example:
  1 0abc0260fa3419de649fcc1444e3d256a17ca6c7 pager: default to LESS=FRSX not LESS=FRS
generated quite a bit of discussion on the list, and end users would
care about it.

I think it's clear that "important commits" is going to be something we
determine through heuristics; blame-able lines is probably a heuristic
worth considering.


* Re: [PATCH] (experimental) per-topic shortlog.
  2006-11-28 13:11       ` Jeff King
@ 2006-11-28 13:43         ` Johannes Schindelin
  2006-11-28 13:56           ` Jeff King
  2006-11-29  0:57         ` Junio C Hamano
  1 sibling, 1 reply; 18+ messages in thread
From: Johannes Schindelin @ 2006-11-28 13:43 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git, Linus Torvalds

Hi,

On Tue, 28 Nov 2006, Jeff King wrote:

> I think it's clear that "important commits" is going to be something we 
> determine through heuristics; blame-able lines is probably a heuristic 
> worth considering.

I was surprised that more of my stuff was not in the top 15, since I 
quite often submitted less-than-finished patches. merge-recursive in 
particular was quite a bit of work for Alex and me.

BTW merge-recursive is a perfect example of why this approach will break 
down: most of the rewrite in C took place in a private repository, over 
quite a few commits. Those commits do not show up in the git repository.

I fully expect the linux repository to behave similarly, since most 
features are cooked elsewhere, and not all of them are pulled; some 
are applied as patches (i.e. they appear out of nowhere from the 
repository's viewpoint).

Ciao,
Dscho


* Re: [PATCH] (experimental) per-topic shortlog.
  2006-11-28 13:43         ` Johannes Schindelin
@ 2006-11-28 13:56           ` Jeff King
  0 siblings, 0 replies; 18+ messages in thread
From: Jeff King @ 2006-11-28 13:56 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Junio C Hamano, git, Linus Torvalds

On Tue, Nov 28, 2006 at 02:43:49PM +0100, Johannes Schindelin wrote:

> BTW merge-recursive is a perfect example why this approach will break 
> down: most of the rewrite in C took place in a private repository with 
> quite some commits. This does not show in the git repository.
> 
> I fully expect the linux repository to behave similarly, since most of the 
> features are cooked elsewhere, and not all of them are pulled, but some 
> are applied (i.e. they appear out of nowhere from the repository's 
> viewpoint).

Yes, I think this would be more useful in concert with some sort of
grouping. If we can make a group of commits related to merge-recursive,
and score them as a single item, then they can be compared to other
groups (which may consist of a single commit or several).
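
A minimal sketch of scoring groups instead of single commits, assuming
some earlier step (for example the first-parent topic detection this
thread started with) has already produced a "<sha1> <group>" map; the
helper name score_groups and both input formats are hypothetical, and
group names are assumed to be single words:

```sh
# score_groups GROUP_MAP COUNTS  (hypothetical helper)
#
# GROUP_MAP: "<sha1> <group-name>" per line.
# COUNTS:    "<count> <sha1>" per line, i.e. the "sort | uniq -c"
#            stage of the blamedness scripts in this thread.
#
# Prints "<total> <group>" per group, highest total first.
score_groups () {
	awk '
		# First file: remember which group each listed commit is in.
		FILENAME == ARGV[1] { group[$1] = $2; next }
		# Second file: add each count to its group total; a commit
		# missing from the map is scored as a group of its own.
		{
			g = ($2 in group) ? group[$2] : $2
			score[g] += $1
		}
		END { for (g in score) print score[g], g }
	' "$1" "$2" | sort -rn
}
```

The grouping itself is the hard part; this only shows that once such a
map exists, the blamedness counts add up per group with no further git
work.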


* Re: [PATCH] (experimental) per-topic shortlog.
  2006-11-28 13:11       ` Jeff King
  2006-11-28 13:43         ` Johannes Schindelin
@ 2006-11-29  0:57         ` Junio C Hamano
  2006-12-01  8:11           ` Jeff King
  1 sibling, 1 reply; 18+ messages in thread
From: Junio C Hamano @ 2006-11-29  0:57 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> Just for fun, I took a look at what we might see by ordering commits by
> their "amount of blamedness". That is, the count of lines introduced by
> a commit which were not later superseded. The script I used is below:
>
> #!/bin/sh
>
> start=$1; shift
> end=$1; shift
>
> start_sha1=`git-rev-parse $start^{}`
> git-rev-list --parents $start..$end >revs
> echo $start_sha1 >>revs
> for i in `git-diff --raw -r $start $end | cut -f2`; do
>   echo blaming $i... >&2
>   git-blame -l -S revs $i | cut -d' ' -f1
> done |
>   grep -v $start_sha1 |
>   sort | uniq -c | sort -rn |
>   while read count hash; do
>     echo "$count `git-rev-list --max-count=1 --pretty=oneline $hash`"
>   done
>
> The top 15 for v1.4.3 to v1.4.4 are:
>
> 1604 6973dcaee76ef7b7bfcabd2f26e76205aae07858 Libify diff-files.

Something is SERIOUSLY wrong.

That commit is not even between v1.4.3 and v1.4.4.


* Re: [PATCH] (experimental) per-topic shortlog.
  2006-11-29  0:57         ` Junio C Hamano
@ 2006-12-01  8:11           ` Jeff King
  2006-12-01 10:55             ` Junio C Hamano
  0 siblings, 1 reply; 18+ messages in thread
From: Jeff King @ 2006-12-01  8:11 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Tue, Nov 28, 2006 at 04:57:00PM -0800, Junio C Hamano wrote:

> > The top 15 for v1.4.3 to v1.4.4 are:
> >
> > 1604 6973dcaee76ef7b7bfcabd2f26e76205aae07858 Libify diff-files.
> 
> Something is SERIOUSLY wrong.
> 
> That commit is not even between v1.4.3 and v1.4.4.

Hmm, you're right. I haven't quite figured out what went wrong with the
script I posted. However, a somewhat simpler approach is to just use the
revision limiting in git-blame. The problem with this is that commits
whose parents aren't in the revision range end up getting blamed for a
lot of lines they're not responsible for.

As a quick hack, I just threw out any revisions whose parents weren't in
range. This is wrong, since those revisions probably _do_ have some
correctly blamed lines. It made me wonder about a possible feature for
git-blame: when we can't pass the blame up further, instead of taking
responsibility, output a "no responsibility line" (blaming on commit
0{40}, or some other format). I think this should be more informative
when there is a limit on the range of revisions.

The top of the "blamedness" list for v1.4.3..v1.4.4 is below. Important
things do seem to float to the top, but it would probably be much more
accurate if we were scoring groups of commits (generated by some other
analysis).

-Peff

-- >8 --
1050 cee7f245dcaef6dade28464f59420095a9949aac git-pickaxe: blame rewritten.
223 fe142b3a4577a6692a39e2386ed649664ad8bd20 Rework cvsexportcommit to handle binary files for all cases.
219 c31820c26b8f164433e67d28c403ca0df0316055 Make git-branch a builtin
216 636171cb80255682bdfc9bf5a98c9e66d4c0444a make index-pack able to complete thin packs.
182 b1f33d626501c3e080b324e182f1da76f49b5bf9 Swap the porcelain and plumbing commands in the git man page
173 744d0ac33ab579845808b8b01e526adc4678a226 gitweb: New improved patchset view
169 e30496dfcb98a305a57b835c248cbc3aa2376bfc gitweb: Support for 'forks'
142 5b329a5f5e3625cdc204e3d274c89646816f384c t6022: ignoring untracked files by merge-recursive when they do not matter
134 c0990ff36f0b9b8e806c8f649a0888d05bb22c37 Add man page for git-show-ref
128 780e6e735be189097dad4b223d8edeb18cce1928 make pack data reuse compatible with both delta types
121 2d477051ef260aad352d63fc7d9c07e4ebb4359b add the capability for index-pack to read from a stream
116 576162a45f35e157427300066b0ff566ff698a0f remove .keep pack lock files when done with refs update
110 e827633a5d7d627eb1170b2d0c71e944d0d56faf Built-in cherry


* Re: [PATCH] (experimental) per-topic shortlog.
  2006-12-01  8:11           ` Jeff King
@ 2006-12-01 10:55             ` Junio C Hamano
  2006-12-01 11:00               ` Junio C Hamano
  2006-12-01 11:23               ` Jeff King
  0 siblings, 2 replies; 18+ messages in thread
From: Junio C Hamano @ 2006-12-01 10:55 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Jeff King <peff@peff.net> writes:

> On Tue, Nov 28, 2006 at 04:57:00PM -0800, Junio C Hamano wrote:
>
>> > The top 15 for v1.4.3 to v1.4.4 are:
>> >
>> > 1604 6973dcaee76ef7b7bfcabd2f26e76205aae07858 Libify diff-files.
>> 
>> Something is SERIOUSLY wrong.
>> 
>> That commit is not even between v1.4.3 and v1.4.4.
>
> Hmm, you're right. I haven't quite figured out what went wrong with the
> script I posted. However, a somewhat simpler approach is to just use the
> revision limiting in git-blame. The problem with this is that commits
> whose parents aren't in the revision range end up getting blamed for a
> lot of lines they're not responsible for.

The way you used "-S revs" was wrong.  It only installs grafts
temporarily and does nothing else, and the grafts you installed
that way exactly matched the true parenthood except at the bottom
commit, so side branches merged during the timeframe leaked right
through them.  The digger started from your HEAD (whatever that
happened to be), followed the true parenthood, and found a very
old ancestor.

A "bit more correct" script would have been something like this.

-- >8 --
#!/bin/sh
#
# Usage: sh ./run-me v1.4.3 v1.4.4
#
bottom=${1?bottom} top=${2?top}
bottom=$(git rev-parse --verify "$bottom^0")
range="$bottom..$top"
top=$(git rev-parse --verify "$top^0")

for path in $(git diff --name-only -r --diff-filter=AM "$range")
do
	echo >&2 "* $path"
	git blame -l -C "$range" -- "$path"
done | sed -e 's/ .*//' | sort | uniq -c | sort -n -r |
while read num hash
do
	test "$hash" = "$bottom" && continue
	it=$(git rev-list --pretty=oneline --abbrev --abbrev-commit -1 "$hash")
	printf '%6d %s\n' $num "$it"
done
-- 8< --

But as you correctly observed, even the above script is wrong.
The top one blamed with the above script is this commit:

  8301 808239a Merge branch 'sk/ftp'

But that is an ancestor of v1.4.3!

What's wrong is that the ancestry graph around that commit
roughly looks like this:

         z---o---o---o
        /             \
  808239a--v1.4.3--o---*---o---v1.4.4

The pickaxe passes the blame around to the parents but does not
allow "boundary" commits to pass the blame on to their parents.
As a result, the blame at the commit marked with '*' is split
along both branches, and after the leftmost commit 'z' passes
its blame to its parent, the blame stops there and 808239a ends
up being blamed, even though it is an ancestor of the original
"boundary" commit v1.4.3 given on the command line.  What's wrong
with my script quoted above is the filter that compares $hash with
$bottom; it needs to check whether $hash is $bottom or an ancestor
of it.
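
The missing check can be done with plain git rev-list: $hash is
$bottom or one of its ancestors exactly when "git rev-list $hash
^$bottom" lists nothing.  A minimal sketch (the helper name
in_history_of is hypothetical; later git releases also provide
"git merge-base --is-ancestor" for the same test):

```sh
# in_history_of COMMIT BOTTOM  (hypothetical helper)
#
# Succeeds when COMMIT is BOTTOM itself or an ancestor of BOTTOM,
# i.e. when nothing reachable from COMMIT is missing from BOTTOM's
# history.  --max-count=1 keeps the output to at most one line.
in_history_of () {
	test -z "$(git rev-list --max-count=1 "$1" "^$2")"
}
```

In the script quoted above, this would replace the
test "$hash" = "$bottom" line with
in_history_of "$hash" "$bottom" && continue.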

With that change, the top commits are:

  1109 9f613dd Add git-for-each-ref: helper for language bindings
  1087 cee7f24 git-pickaxe: blame rewritten.
   218 c31820c Make git-branch a builtin
   209 636171c make index-pack able to complete thin packs.
   200 fe142b3 Rework cvsexportcommit to handle binary files for all cases.

which looks a bit more reasonable (I did not realize
for-each-ref was that big, but in fact it has its own
mini-language).

While what blame outputs is technically correct, it is not very
useful for this kind of application.  As you said, it probably
makes sense to gray-out the lines that are blamed on boundary
commits.

Side note: one might be tempted to say "then blame v1.4.3 for
lines that 808239a is blamed for", but that is not a good
workaround.  The original command line could have more than one
bottom commit, the final blame might go to a common ancestor of
them, and we would then have to choose one of them at random,
which is worse than telling the truth as we currently do.

And here is an experimental patch to do that.

-- >8 --
[PATCH] git-blame: mark lines blamed on boundary commits.

Lines can be blamed on a commit that is older than the boundary
commit given on the command line when a merge with a branch that
forked before the boundary is involved.  Mark them specially so
that later changes in the range of interest can be easily
identified.

In the porcelain format, the header lines that describe such a
commit get an extra "boundary" line.  In the human-readable
format, the commit SHA-1 is prefixed with a '-' character.

Signed-off-by: Junio C Hamano <junkio@cox.net>

---

 builtin-blame.c |   15 ++++++++++++++-
 1 files changed, 14 insertions(+), 1 deletions(-)

diff --git a/builtin-blame.c b/builtin-blame.c
index dc3ffea..46a9d0e 100644
--- a/builtin-blame.c
+++ b/builtin-blame.c
@@ -1090,6 +1090,11 @@ static void assign_blame(struct scoreboard *sb, struct rev_info *revs, int opt)
 		if (!(commit->object.flags & UNINTERESTING) &&
 		    !(revs->max_age != -1 && commit->date  < revs->max_age))
 			pass_blame(sb, suspect, opt);
+		else {
+			commit->object.flags |= UNINTERESTING;
+			if (commit->object.parsed)
+				mark_parents_uninteresting(commit);
+		}

 		/* Take responsibility for the remaining entries */
 		for (ent = sb->ent; ent; ent = ent->next)
@@ -1273,6 +1278,8 @@ static void emit_porcelain(struct scoreboard *sb, struct blame_entry *ent)
 		printf("committer-tz %s\n", ci.committer_tz);
 		printf("filename %s\n", suspect->path);
 		printf("summary %s\n", ci.summary);
+		if (suspect->commit->object.flags & UNINTERESTING)
+			printf("boundary\n");
 	}
 	else if (suspect->commit->object.flags & MORE_THAN_ONE_PATH)
 		printf("filename %s\n", suspect->path);
@@ -1308,8 +1315,14 @@ static void emit_other(struct scoreboard *sb, struct blame_entry *ent, int opt)
 	cp = nth_line(sb, ent->lno);
 	for (cnt = 0; cnt < ent->num_lines; cnt++) {
 		char ch;
+		int length = (opt & OUTPUT_LONG_OBJECT_NAME) ? 40 : 8;
+
+		if (suspect->commit->object.flags & UNINTERESTING) {
+			length--;
+			putchar('-');
+		}

-		printf("%.*s", (opt & OUTPUT_LONG_OBJECT_NAME) ? 40 : 8, hex);
+		printf("%.*s", length, hex);
 		if (opt & OUTPUT_ANNOTATE_COMPAT)
 			printf("\t(%10s\t%10s\t%d)", ci.author,
 			       format_time(ci.author_time, ci.author_tz,


* Re: [PATCH] (experimental) per-topic shortlog.
  2006-12-01 10:55             ` Junio C Hamano
@ 2006-12-01 11:00               ` Junio C Hamano
  2006-12-01 11:23               ` Jeff King
  1 sibling, 0 replies; 18+ messages in thread
From: Junio C Hamano @ 2006-12-01 11:00 UTC (permalink / raw)
  To: Jeff King; +Cc: git

Junio C Hamano <junkio@cox.net> writes:

> What's wrong is that the ancestry graph around that commit
> roughly looks like this:
>
>          z---o---o---o
>         /             \
>   808239a--v1.4.3--o---*---o---v1.4.4
> ...
> While what blame outputs is technically correct, it is not very
> useful for this kind of application.  As you said, it probably
> makes sense to gray-out the lines that are blamed on boundary
> commits.
>
> Side note: one might be tempted to say "then blame v1.4.3 for
> lines that 808239a is blamed for", but that is a good

Gaah.  "that is NOT a good workaround".  Sorry about the wasted
bandwidth.

> workaround.  The original command line could have more than one
> bottom commits, and the final blame might go to a common
> ancestor of them, and we need to randomly choose between them,
> which is worse than telling the truth as we currently do.


* Re: [PATCH] (experimental) per-topic shortlog.
  2006-12-01 10:55             ` Junio C Hamano
  2006-12-01 11:00               ` Junio C Hamano
@ 2006-12-01 11:23               ` Jeff King
  1 sibling, 0 replies; 18+ messages in thread
From: Jeff King @ 2006-12-01 11:23 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

On Fri, Dec 01, 2006 at 02:55:34AM -0800, Junio C Hamano wrote:

> [PATCH] git-blame: mark lines blamed on boundary commits.

Excellent. This is exactly what I had in mind, and it seems to produce
sensible results out of the box:

git-diff --raw -r --diff-filter=AM $1 | cut -f2 |
  while read f; do
    git-blame -l $1 -- $f | grep -v ^- | cut -d' ' -f1
  done |
  sort | uniq -c | sort -rn |
  while read count hash; do
    echo "$count `git-rev-list --max-count=1 --pretty=oneline $hash`"
  done

Thanks!

