From: Linus Torvalds <torvalds@osdl.org>
To: Junio C Hamano <junkio@cox.net>
Cc: git@vger.kernel.org, Johannes Schindelin <Johannes.Schindelin@gmx.de>
Subject: Re: [PATCH] (experimental) per-topic shortlog.
Date: Sun, 26 Nov 2006 17:06:08 -0800 (PST)
Message-ID: <Pine.LNX.4.64.0611261652520.30076@woody.osdl.org>
In-Reply-To: <7v8xhxsopp.fsf@assigned-by-dhcp.cox.net>
On Sun, 26 Nov 2006, Junio C Hamano wrote:
>
> This implements an experimental "git log-fpc" command that shows
> short-log style output sorted by topics.
>
> A "topic" is identified by going through the first-parent
> chains; this ignores the fast-forward case, but for a top-level
> integrator it often is good enough.
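The first-parent grouping quoted above can be sketched roughly as follows. This is not Junio's patch and does not use git's revision-walking code. The `parents` dictionary (commit name to parent names, first parent first) and both function names are hypothetical, chosen only to show the shape of the walk. Each merge on the main line labels a topic. The commits reached from its second parent, following first parents until something already seen, form that topic. Everything else on the main line, fast-forwarded commits included, lands in a single "direct" bucket, which is exactly the case the quoted text says it ignores.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical history model: commit name -> list of parent names,
# with the first parent listed first.
History = Dict[str, List[str]]


def first_parent_chain(parents: History, start: str) -> List[str]:
    """Follow only first parents from `start` down to a root commit."""
    chain: List[str] = []
    commit: Optional[str] = start
    while commit is not None:
        chain.append(commit)
        ps = parents[commit]
        commit = ps[0] if ps else None
    return chain


def topics_by_first_parent(
    parents: History, tip: str
) -> Tuple[List[str], Dict[str, List[str]]]:
    """Split history into direct main-line commits and per-merge topics."""
    mainline = first_parent_chain(parents, tip)
    seen = set(mainline)
    direct: List[str] = []             # non-merge main-line commits (incl. fast-forwards)
    topics: Dict[str, List[str]] = {}  # merge commit -> commits it brought in
    for commit in mainline:
        ps = parents[commit]
        if len(ps) < 2:
            direct.append(commit)
            continue
        brought_in: List[str] = []
        for side in ps[1:]:
            # Walk the side branch by first parents until we reach
            # something already accounted for (usually the main line).
            c: Optional[str] = side
            while c is not None and c not in seen:
                seen.add(c)
                brought_in.append(c)
                cps = parents[c]
                c = cps[0] if cps else None
        topics[commit] = brought_in
    return direct, topics
```

With E on top of merge M1 (parents B and D), where D sits on C and both B and C sit on A, this gives direct == ["E", "B", "A"] and topics == {"M1": ["D", "C"]}.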
Umm. May I suggest that you try this with the kernel repo too?
There, the "first parent chain" tends to be less interesting than a lot of
other heuristics:
- committer
If the committer changes, you should probably consider it a break, the
same way a second parent would be a break. You probably won't see this
in the git archive, because there tends to be a single committer, but
on something like the kernel where we really merge other people's repos,
it's going to be as good as (or better than) looking at "other parents".
- subdirectory heuristics
Again, with git it's not very interesting, but I bet that you'd be able
to use heuristics like "the bulk of the changes were contained within
this directory tree" for projects like the kernel, and automatically
decide on "topics" like drivers/scsi, fs/ext3 etc.
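The two heuristics above could be combined along these lines. This is only a sketch under illustrative assumptions: the `Commit` record, the two-level directory prefix and the 50% threshold are hypothetical choices, not anything git provides. Commits are taken in log order, a new group starts whenever the committer changes or a merge is seen, and each group is then named after the directory prefix that covers at least half of the files it touches.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Commit:
    """Hypothetical commit record; not a git data structure."""
    sha: str
    committer: str
    paths: List[str]   # files changed by this commit
    parents: int = 1   # number of parents; > 1 means a merge


def segment_by_committer(commits: List[Commit]) -> List[List[Commit]]:
    """Start a new group when the committer changes or a merge appears
    (the merge itself opens the new group)."""
    groups: List[List[Commit]] = []
    current: List[Commit] = []
    for c in commits:
        if current and (c.committer != current[-1].committer or c.parents > 1):
            groups.append(current)
            current = []
        current.append(c)
    if current:
        groups.append(current)
    return groups


def dominant_directory(
    group: List[Commit], depth: int = 2, threshold: float = 0.5
) -> Optional[str]:
    """Name a group after the directory prefix holding most of its changes,
    or return None if no prefix reaches `threshold`."""
    counts: Counter = Counter()
    for c in group:
        for path in c.paths:
            # Keep at most `depth` leading directory components;
            # top-level files count as ".".
            prefix = "/".join(path.split("/")[:-1][:depth]) or "."
            counts[prefix] += 1
    total = sum(counts.values())
    if total == 0:
        return None
    prefix, n = counts.most_common(1)[0]
    return prefix if n / total >= threshold else None
```

A run by one committer touching drivers/scsi followed by another touching fs/ext3 (plus a top-level Makefile) comes out as two groups named "drivers/scsi" and "fs/ext3".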
In other words, I don't think the "fpc" decision is even very interesting.
If you _really_ want to do a cool shortlogger, I bet it can be done, but I
suspect that it would be a LOT cooler to do some automatic Bayesian
clustering based on committer, author and list of filenames changed.
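As a much simpler stand-in for real Bayesian clustering, here is a greedy single-linkage grouping over feature sets built from author, committer and changed directories, scored with Jaccard similarity. It is not Bayesian and not anything git does. The token scheme, the `threshold` of 0.5 and all function names are hypothetical, and a serious model would presumably weight the features differently rather than treat every token equally.

```python
from typing import FrozenSet, List, Tuple


def features(author: str, committer: str, paths: List[str], depth: int = 2) -> FrozenSet[str]:
    """Encode a commit as a set of tokens (hypothetical feature scheme)."""
    feats = {f"author:{author}", f"committer:{committer}"}
    for path in paths:
        feats.add("dir:" + ("/".join(path.split("/")[:-1][:depth]) or "."))
    return frozenset(feats)


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """|a & b| / |a | b|, defined as 1.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def greedy_cluster(
    items: List[Tuple[str, FrozenSet[str]]], threshold: float = 0.5
) -> List[List[str]]:
    """Single pass: add each commit to the cluster holding its most similar
    member (single linkage), or start a new cluster if none reaches
    `threshold`."""
    clusters: List[List[Tuple[str, FrozenSet[str]]]] = []
    for sha, feats in items:
        best_idx, best_score = -1, 0.0
        for i, cluster in enumerate(clusters):
            score = max(jaccard(feats, other) for _, other in cluster)
            if score > best_score:
                best_idx, best_score = i, score
        if best_idx >= 0 and best_score >= threshold:
            clusters[best_idx].append((sha, feats))
        else:
            clusters.append([(sha, feats)])
    return [[sha for sha, _ in cluster] for cluster in clusters]
```

Being a single greedy pass, the result depends on input order; that is the price of keeping it this short.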
Of course, such a thing done well would probably be worthy of a doctoral
thesis or something. Maybe somebody on this list who is into Bayesian
clustering and doesn't have a thesis subject...
(Of course, since I haven't been in a University setting for the last ten
years, maybe Bayesian clustering isn't the cool thing to work on any
more).
Anyway, "topics" really should be something that is extremely open to
various clustering models, Bayesian or not.