* [RFC/RFH] Fun things with git-notes, or: patch tracking backwards
@ 2009-02-09 14:08 Thomas Rast
2009-02-09 16:19 ` SZEDER Gábor
2009-02-10 22:42 ` [RFC/RFH] Fun things with git-notes, or: patch tracking backwards Thomas Rast
0 siblings, 2 replies; 11+ messages in thread
From: Thomas Rast @ 2009-02-09 14:08 UTC (permalink / raw)
To: git
Hi everyone
I'll start with the fun part first; try this in a git.git clone:
    git fetch git://repo.or.cz/git/trast.git mailnotes &&
    GIT_NOTES_REF=FETCH_HEAD git log origin/pu
I played around with some python code over the weekend that
automatically filters through git@vger.kernel.org history and scans
for patches, threads, and "What's cooking" messages. So far it seems
to be working ok.
The net effect is that we get a backwards patch tracker: instead of
tracking the patches via some other means (say a web interface), the
automatic annotations can reconstruct more information about the
patches than what is eventually contained in the commits, and make it
available via git-notes.
Right now it only applies the patches that it finds and associates
them with known commits to annotate them. I eventually want to scan
for replies to the patch text as well, and insert them into the notes. I
also plan to publish topics for the patches somewhere (they're already
applied locally) and track at least a rudimentary status, perhaps one
of "unreplied", "replied" and "accepted". (Distinguishing "rejected"
seems AI-complete.)
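The rudimentary status described above could be derived roughly as follows. This is only a sketch under assumptions: the names `PatchStatus` and `patch_status`, and the inputs (the commit a patch was matched to, if any, and a count of replies found on the list), are hypothetical and not taken from trackgit.

```python
from enum import Enum
from typing import Optional


class PatchStatus(Enum):
    UNREPLIED = "unreplied"
    REPLIED = "replied"
    ACCEPTED = "accepted"


def patch_status(matched_commit: Optional[str], reply_count: int) -> PatchStatus:
    """Classify a mailed patch (hypothetical helper, not trackgit's code).

    matched_commit: sha of the commit the patch was associated with, or None.
    reply_count:    number of list replies found for the patch mail.
    """
    if matched_commit is not None:
        # The patch was found among the known commits.
        return PatchStatus.ACCEPTED
    if reply_count > 0:
        return PatchStatus.REPLIED
    return PatchStatus.UNREPLIED
```

There is deliberately no "rejected" state here, since nothing in these two inputs can tell a rejected patch apart from one that is still under discussion.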
The whole process is a bunch of python scripts currently available at
git://repo.or.cz/trackgit.git
http://repo.or.cz/w/trackgit.git
It's all a bit after-the-fact right now, and can't cope with a few
things yet, for example patch series that aren't in git.git appearing
out of sequence to the mail reader.
Runtime is okay-ish so that I should be able to run it as a cronjob;
note regeneration is almost negligible (<1min), importing a month's
worth of mails takes on the order of 20min, and scanning history (to
know about the base blobs) since v1.6.0 is around 3min.
So the RFC bit is, is this useful to anyone? What information would
you like to see in it, and what could be left out?
And the RFH: I don't have a full mail archive, not even since I joined
the list. There also doesn't seem to be a convenient download button
on gmane. Does anyone have (or know of) an archive going at least
back to v1.6.0 (not sure if any further back is interesting), which
was released in August last year, that you could send to me?
Thanks in advance :-)
--
Thomas Rast
trast@{inf,student}.ethz.ch
* Re: [RFC/RFH] Fun things with git-notes, or: patch tracking backwards
2009-02-09 14:08 [RFC/RFH] Fun things with git-notes, or: patch tracking backwards Thomas Rast
@ 2009-02-09 16:19 ` SZEDER Gábor
2009-02-09 16:29 ` Thomas Rast
2009-02-09 17:49 ` tool and worktree Giuseppe Bilotta
2009-02-10 22:42 ` [RFC/RFH] Fun things with git-notes, or: patch tracking backwards Thomas Rast
1 sibling, 2 replies; 11+ messages in thread
From: SZEDER Gábor @ 2009-02-09 16:19 UTC (permalink / raw)
To: Thomas Rast; +Cc: git
Hi Thomas,
On Mon, Feb 09, 2009 at 03:08:08PM +0100, Thomas Rast wrote:
> And the RFH: I don't have a full mail archive, not even since I joined
> the list. There also doesn't seem to be a convenient download button
> on gmane.
you can download emails in mbox format from gmane by running
wget http://download.gmane.org/gmane.comp.version-control.git/X/Y
which will download all emails starting at the email with "gmane id" X
and ending at the email with "gmane id" Y-1.
So, if you want to download the whole archive, you could run
wget http://download.gmane.org/gmane.comp.version-control.git/1/109090
(but of course that 109090 will be larger by the time you are reading
this). However, there's a catch: gmane's script execution time is
limited to 30 sec, so you will not get the whole archive, but only the
first few thousand emails (on my network I got around
6k emails in one run). Therefore you'll need a loop that downloads
only a few thousand emails in each iteration.
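A minimal sketch of such a loop in Python. The chunk size of 5000 is an assumption chosen to stay below the roughly 6k emails observed per run, and the helper names (`chunk_ranges`, `chunk_urls`, `download_all`) are made up for illustration. The end id of each range is exclusive, matching the X/Y semantics described above.

```python
import urllib.request
from typing import Iterator, List, Tuple

BASE_URL = "http://download.gmane.org/gmane.comp.version-control.git"


def chunk_ranges(first: int, last: int, step: int = 5000) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) pairs covering gmane ids first..last-1; end is exclusive."""
    if step <= 0:
        raise ValueError("step must be positive")
    start = first
    while start < last:
        end = min(start + step, last)
        yield start, end
        start = end


def chunk_urls(first: int, last: int, step: int = 5000) -> List[str]:
    """Build one download URL per chunk."""
    return [f"{BASE_URL}/{start}/{end}" for start, end in chunk_ranges(first, last, step)]


def download_all(first: int, last: int, out_path: str, step: int = 5000) -> None:
    """Append every chunk to a single mbox file (performs network access)."""
    with open(out_path, "ab") as out:
        for url in chunk_urls(first, last, step):
            with urllib.request.urlopen(url) as response:
                out.write(response.read())
```

Calling `download_all(1, 109090, "git-list.mbox")` would fetch the range from the example above. The sketch does not detect a chunk that was cut short by the 30 sec limit, so a smaller `step` may be needed on a slow connection.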
Best,
Gábor
* Re: [RFC/RFH] Fun things with git-notes, or: patch tracking backwards
2009-02-09 16:19 ` SZEDER Gábor
@ 2009-02-09 16:29 ` Thomas Rast
2009-02-09 17:49 ` tool and worktree Giuseppe Bilotta
1 sibling, 0 replies; 11+ messages in thread
From: Thomas Rast @ 2009-02-09 16:29 UTC (permalink / raw)
To: SZEDER Gábor; +Cc: git
SZEDER Gábor wrote:
> you can download emails in mbox format from gmane by running
>
> wget http://download.gmane.org/gmane.comp.version-control.git/X/Y
>
> which will download all emails starting at the email with "gmane id" X
> and ending at the email with "gmane id" Y-1.
Thanks, that will do nicely!
--
Thomas Rast
trast@{inf,student}.ethz.ch
* Re: tool and worktree
2009-02-09 16:19 ` SZEDER Gábor
2009-02-09 16:29 ` Thomas Rast
@ 2009-02-09 17:49 ` Giuseppe Bilotta
1 sibling, 0 replies; 11+ messages in thread
From: Giuseppe Bilotta @ 2009-02-09 17:49 UTC (permalink / raw)
To: git, git; +Cc: Giuseppe Bilotta, Shawn O. Pearce, bill lam
On Sunday 08 February 2009 20:25, Shawn O. Pearce wrote:
> bill lam <cbill.lam@gmail.com> wrote:
>> I track /etc using a config
>>
>> [core]
>> repositoryformatversion = 0
>> filemode = true
>> bare = false
>> worktree = /etc
>> logAllRefUpdates = true
>> excludesfile =
>>
>> But that can not be handled by tools,
>>
>> git gui : cannot use funny .git directory .
>
> If someone sends patches for git-gui, maybe. This use case of
> different repository and worktree isn't very common for git-gui
> so it doesn't support it.
I came across the same problem. I also sent a tentative patchset to
solve the problem about 12 hours ago:
http://thread.gmane.org/gmane.comp.version-control.git/109035
--
Giuseppe "Oblomov" Bilotta
* Re: [RFC/RFH] Fun things with git-notes, or: patch tracking backwards
2009-02-09 14:08 [RFC/RFH] Fun things with git-notes, or: patch tracking backwards Thomas Rast
2009-02-09 16:19 ` SZEDER Gábor
@ 2009-02-10 22:42 ` Thomas Rast
2009-02-10 22:52 ` Junio C Hamano
2009-02-10 22:59 ` Junio C Hamano
1 sibling, 2 replies; 11+ messages in thread
From: Thomas Rast @ 2009-02-10 22:42 UTC (permalink / raw)
To: git
Thomas Rast wrote:
> git fetch git://repo.or.cz/git/trast.git mailnotes &&
> GIT_NOTES_REF=FETCH_HEAD git log origin/pu
An update: I have fully automated the process, it now fetches mails
from Gmane over HTTP which gives it the Gmane URLs for free. I'm
rather happy with the latter feature, especially since Konsole has a
feature to recognize and open links directly.
I have imported all commits, and mails since roughly July 2008
(starting with Gmane 89000). In this timeframe there were 1802
non-merge commits, and the mailnotes tree now holds 1122 annotations.
I won't import mails any further back until the parsing-related code
has become reasonably stable, but this at least covers the post-v1.6.0
commits.
--
Thomas Rast
trast@{inf,student}.ethz.ch
* Re: [RFC/RFH] Fun things with git-notes, or: patch tracking backwards
2009-02-10 22:42 ` [RFC/RFH] Fun things with git-notes, or: patch tracking backwards Thomas Rast
@ 2009-02-10 22:52 ` Junio C Hamano
2009-02-10 22:59 ` Junio C Hamano
1 sibling, 0 replies; 11+ messages in thread
From: Junio C Hamano @ 2009-02-10 22:52 UTC (permalink / raw)
To: Thomas Rast; +Cc: git
Thomas Rast <trast@student.ethz.ch> writes:
> Thomas Rast wrote:
>> git fetch git://repo.or.cz/git/trast.git mailnotes &&
>> GIT_NOTES_REF=FETCH_HEAD git log origin/pu
>
> An update: I have fully automated the process, it now fetches mails
> from Gmane over HTTP which gives it the Gmane URLs for free. I'm
> rather happy with the latter feature, especially since Konsole has a
> feature to recognize and open links directly.
>
> I have imported all commits, and mails since roughly July 2008
> (starting with Gmane 89000). In this timeframe there were 1802
> non-merge commits, and the mailnotes tree now holds 1122 annotations.
Wonderful.
* Re: [RFC/RFH] Fun things with git-notes, or: patch tracking backwards
2009-02-10 22:42 ` [RFC/RFH] Fun things with git-notes, or: patch tracking backwards Thomas Rast
2009-02-10 22:52 ` Junio C Hamano
@ 2009-02-10 22:59 ` Junio C Hamano
2009-02-10 23:12 ` Thomas Rast
1 sibling, 1 reply; 11+ messages in thread
From: Junio C Hamano @ 2009-02-10 22:59 UTC (permalink / raw)
To: Thomas Rast; +Cc: git
Thomas Rast <trast@student.ethz.ch> writes:
> I have imported all commits, and mails since roughly July 2008
> (starting with Gmane 89000). In this timeframe there were 1802
> non-merge commits, and the mailnotes tree now holds 1122 annotations.
How do you match the mails to commits?
I am curious what the right balance for the matching algorithm should be,
between being forgiving about amending of commit log message and the patch
text to fix minor typos and obvious bugs, and being strict not to cause
false matches to a message that contains the second iteration of the
patch, when what was committed was the first iteration.
* Re: [RFC/RFH] Fun things with git-notes, or: patch tracking backwards
2009-02-10 22:59 ` Junio C Hamano
@ 2009-02-10 23:12 ` Thomas Rast
2009-02-10 23:30 ` Junio C Hamano
0 siblings, 1 reply; 11+ messages in thread
From: Thomas Rast @ 2009-02-10 23:12 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
Junio C Hamano wrote:
> Thomas Rast <trast@student.ethz.ch> writes:
>
> > I have imported all commits, and mails since roughly July 2008
> > (starting with Gmane 89000). In this timeframe there were 1802
> > non-merge commits, and the mailnotes tree now holds 1122 annotations.
>
> How do you match the mails to commits?
>
> I am curious what the right balance for the matching algorithm should be,
> between being forgiving about amending of commit log message and the patch
> text to fix minor typos and obvious bugs, and being strict not to cause
> false matches to a message that contains the second iteration of the
> patch, when what was committed was the first iteration.
Right now it's just the patch-id. Maybe filtering (author,subject)
and then picking the one that is the most similar could work.
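To illustrate why a patch-id style match survives rebasing but not content fixes, here is a rough approximation in Python. It is explicitly not git's own patch-id algorithm and will not produce the same hashes as `git patch-id`; it only mimics the idea of hashing the diff while ignoring hunk line numbers and whitespace. The function name `rough_patch_id` is hypothetical.

```python
import hashlib


def rough_patch_id(diff_text: str) -> str:
    """Hash a unified diff, ignoring line numbers and whitespace.

    An approximation for illustration only, NOT git's patch-id algorithm.
    """
    digest = hashlib.sha1()
    in_hunk = False
    for line in diff_text.split("\n"):
        if line.startswith("diff --git "):
            in_hunk = False  # a new file section starts
            continue
        if line.startswith("@@"):
            in_hunk = True   # hunk header: skipped, it carries the line numbers
            continue
        if in_hunk:
            keep = line[:1] in ("+", "-", " ")
        else:
            # Before the first hunk only the file name lines are hashed;
            # "index ..." lines hold blob ids and are skipped.
            keep = line.startswith(("--- ", "+++ "))
        stripped = "".join(line.split())  # drop all whitespace
        if keep and stripped:
            digest.update(stripped.encode("utf-8"))
            digest.update(b"\n")
    return digest.hexdigest()
```

Two mails carrying the same change at different offsets hash identically, while any edit to an added or removed line (such as a typo fix applied while committing) changes the hash. That is the strictness the question above is about.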
--
Thomas Rast
trast@{inf,student}.ethz.ch
* Re: [RFC/RFH] Fun things with git-notes, or: patch tracking backwards
2009-02-10 23:12 ` Thomas Rast
@ 2009-02-10 23:30 ` Junio C Hamano
2009-02-11 22:52 ` Thomas Rast
0 siblings, 1 reply; 11+ messages in thread
From: Junio C Hamano @ 2009-02-10 23:30 UTC (permalink / raw)
To: Thomas Rast; +Cc: git
Thomas Rast <trast@student.ethz.ch> writes:
> Junio C Hamano wrote:
>> Thomas Rast <trast@student.ethz.ch> writes:
>>
>> > I have imported all commits, and mails since roughly July 2008
>> > (starting with Gmane 89000). In this timeframe there were 1802
>> > non-merge commits, and the mailnotes tree now holds 1122 annotations.
>>
>> How do you match the mails to commits?
>>
>> I am curious what the right balance for the matching algorithm should be,
>> between being forgiving about amending of commit log message and the patch
>> text to fix minor typos and obvious bugs, and being strict not to cause
>> false matches to a message that contains the second iteration of the
>> patch, when what was committed was the first iteration.
>
> Right now it's just the patch-id. Maybe filtering (author,subject)
> and then picking the one that is the most similar could work.
Yeah, I actually was thinking about matching the (date, author) tuple and
nothing else, as it is unlikely you would have dups.
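A sketch of the (date, author) matching in Python, under stated assumptions: commits are given as (sha, author email, author timestamp) tuples, for instance from `git log --format='%H %ae %at'`, and the mail's From and Date headers are the ones that ended up as the commit's author information. All function names are hypothetical.

```python
from email.utils import parseaddr, parsedate_to_datetime
from typing import Dict, Iterable, List, Tuple

Key = Tuple[str, int]  # (lower-cased author email, author date as epoch seconds)


def mail_key(from_header: str, date_header: str) -> Key:
    """Normalise a mail's From and Date headers into a lookup key."""
    _, address = parseaddr(from_header)
    timestamp = int(parsedate_to_datetime(date_header).timestamp())
    return (address.lower(), timestamp)


def index_commits(commits: Iterable[Tuple[str, str, int]]) -> Dict[Key, List[str]]:
    """Index (sha, author_email, author_timestamp) tuples by (email, timestamp)."""
    index: Dict[Key, List[str]] = {}
    for sha, author_email, author_timestamp in commits:
        key = (author_email.lower(), int(author_timestamp))
        index.setdefault(key, []).append(sha)
    return index


def match_mail(index: Dict[Key, List[str]], from_header: str, date_header: str) -> List[str]:
    """Return the shas whose author and author date equal the mail's From and Date."""
    return index.get(mail_key(from_header, date_header), [])
```

Comparing epoch seconds rather than date strings keeps the match independent of how the timezone offset is written. The lookup returns a list so that the unlikely duplicates stay visible instead of being silently dropped.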
By the way, the note to f6b98e4 (git-web--browse: Fix check for
/bin/start, 2009-02-08) is interesting.
Ramsay's 104332 was the same as what ended up in the commit, but his
second patch that can be found by going to the thread from it is obviously
a better alternative. In short, I screwed up, by not recalling the
previous round. Sorry.
I find the "Extra-Notes:" tag a bit too loud, but I am probably a minority
who thinks everything but the Message-ID can be dropped, so please don't
take it as a feature request ;-)
* Re: [RFC/RFH] Fun things with git-notes, or: patch tracking backwards
2009-02-10 23:30 ` Junio C Hamano
@ 2009-02-11 22:52 ` Thomas Rast
2009-02-11 22:58 ` Thomas Rast
0 siblings, 1 reply; 11+ messages in thread
From: Thomas Rast @ 2009-02-11 22:52 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
Junio C Hamano wrote:
> Thomas Rast <trast@student.ethz.ch> writes:
> > Right now it's just the patch-id. Maybe filtering (author,subject)
> > and then picking the one that is the most similar could work.
>
> Yeah, I actually was thinking about matching the (date, author) tuple and
> nothing else, as it is unlikely you would have dups.
Thanks, good idea. I changed the code to parse the required data, and
we're now up to 1502 annotations.
Unfortunately I noticed there's a bug in the mail input stage:
Python's mailbox module assumes any '^From ' line starts a new mail,
while gmane apparently uses a slightly different format that also
relies on the preceding blank line (and always uses the same 'From
news@gmane.org Tue Mar 04 03:33:20 2003' separator), and doesn't
quote '^From ' in the bodies. So any mail containing such body lines
got chopped in the middle, and any patches contained in them
won't apply because of the missing headers.
A quick perl run shows that there are 26 mails affected among the
89000+ mails that I've (again) imported. The fix should be easy, but
I'm already short on sleep.
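One possible shape for the fix, sketched in Python: split only on the exact gmane separator line when it follows a blank line (or starts the file), instead of on every '^From ' line. This is an assumption about what a fix could look like, not the change that went into trackgit. The function names are hypothetical and the input is assumed to use LF line endings.

```python
import email
from email.message import Message
from typing import List, Optional

# The fixed separator line quoted above.
GMANE_SEPARATOR = "From news@gmane.org Tue Mar 04 03:33:20 2003"


def _join(lines: List[str]) -> str:
    """Rebuild one message, normalised to end with a single newline."""
    return "\n".join(lines).rstrip("\n") + "\n"


def split_gmane_mbox(text: str) -> List[str]:
    """Split a gmane mbox download into raw messages.

    A new message starts only at a line equal to GMANE_SEPARATOR that is
    preceded by a blank line or the start of the file, so unquoted
    'From ' lines inside bodies no longer cut a mail in two.
    """
    messages: List[str] = []
    current: Optional[List[str]] = None  # None until the first separator is seen
    prev_blank = True                    # start of file counts as a boundary
    for line in text.split("\n"):
        if line == GMANE_SEPARATOR and prev_blank:
            if current is not None:
                messages.append(_join(current))
            current = []
        elif current is not None:
            current.append(line)
        prev_blank = line == ""
    if current is not None:
        messages.append(_join(current))
    return messages


def parse_gmane_mbox(text: str) -> List[Message]:
    """Parse every raw message with the standard email package."""
    return [email.message_from_string(raw) for raw in split_gmane_mbox(text)]
```

A body line that happens to equal the separator exactly and follows a blank line would still split a mail, but that is far rarer than an ordinary sentence starting with "From ".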
> I find the "Extra-Notes:" tag a bit too loud, but I am probably a minority
> who thinks everything but the Message-ID can be dropped, so please don't
> take it as a feature request ;-)
I refactored the final formatting stage a bit to let it do several
notes trees, and you can now take your pick:
git://repo.or.cz/trackgit.git notes/full
git://repo.or.cz/trackgit.git notes/terse
The latter only has 'Message-Id' and 'Archived-At'.
--
Thomas Rast
trast@{inf,student}.ethz.ch
end of thread (~2009-02-11 23:00 UTC)
Thread overview: 11+ messages
2009-02-09 14:08 [RFC/RFH] Fun things with git-notes, or: patch tracking backwards Thomas Rast
2009-02-09 16:19 ` SZEDER Gábor
2009-02-09 16:29 ` Thomas Rast
2009-02-09 17:49 ` tool and worktree Giuseppe Bilotta
2009-02-10 22:42 ` [RFC/RFH] Fun things with git-notes, or: patch tracking backwards Thomas Rast
2009-02-10 22:52 ` Junio C Hamano
2009-02-10 22:59 ` Junio C Hamano
2009-02-10 23:12 ` Thomas Rast
2009-02-10 23:30 ` Junio C Hamano
2009-02-11 22:52 ` Thomas Rast
2009-02-11 22:58 ` Thomas Rast