From: Junio C Hamano <junkio@cox.net>
To: Ryan Anderson <ryan@michonline.com>
Cc: Linus Torvalds <torvalds@osdl.org>, git@vger.kernel.org
Subject: [PATCH] GIT commit statistics.
Date: Fri, 11 Nov 2005 23:44:20 -0800
Message-ID: <7v7jbeia3v.fsf_-_@assigned-by-dhcp.cox.net>
In-Reply-To: <43758D21.3060107@michonline.com> (Ryan Anderson's message of "Sat, 12 Nov 2005 01:35:13 -0500")
Ryan Anderson <ryan@michonline.com> writes:
> Junio C Hamano wrote:
>
>> Just for fun, I randomly picked two heads/master commits from
>> linux-2.6 repository ... and fed the commits
>> between the two to a little script that looks at commits and
>> tries to stat what they did (the script ignores renames so they
>> appear as deletes and adds).
>
> Mind sharing the script?
>
> It'd be nice to know if these stats are typical, or unusual when you get
> numbers from a variety of other trees.
Very unpolished but here they are.
I misread the trivial count in my original message. Trivial and
Merge are counted separately, so among 3957 commits, merges were
297 (72 trivials and 225 others).
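For anyone who wants to check the shares those corrected counts
represent, here is a minimal sketch of the arithmetic. It uses awk for
the floating-point division and the same %.2f formatting the report
script prints; the helper name "pct" is made up for this illustration
and is not part of the patch.

```shell
#!/bin/sh
# Share of a count within a total, formatted like the report (%.2f%%).
# "pct" is a throwaway helper name, not part of the patch.
pct () {
	awk -v n="$1" -v t="$2" 'BEGIN { printf "%.2f%%\n", n * 100.0 / t }'
}

# Counts taken from the corrected numbers in this message.
total=3957 trivials=72 others=225
merges=$((trivials + others))

echo "Trivial Merges: $trivials ($(pct $trivials $total))"
echo "Merges: $others ($(pct $others $total))"
echo "All merges: $merges ($(pct $merges $total))"
```

The two report lines are computed against the total number of commit
objects, not against the number of merges, which is why they do not
add up to 100%.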
-- >8 -- cut here -- >8 --
Subject: [PATCH] GIT commit statistics
A set of scripts that read the existing commit history, and
show various stats.
Sample usage:
# Arguments are given to git-rev-list; if none are given, they
# default to ORIG_HEAD.. to retrace what was just pulled.
$ ./contrib/jc-git-stat-1.sh v0.99.9g..maint |
./contrib/jc-git-stat-1-log.perl
Total commit objects: 43
Trivial Merges: 1 (2.33%)
Merges: 1 (2.33%)
Number of paths touched by non-merge commits:
average 3.00, median 2, min 2, max 18
Number of merge parents:
average 2.50, median 3, min 2, max 3
Number of merge bases:
average 1.00, median 1, min 1, max 1
File level merges:
average 0.50, median 1, min 0, max 1
Number of changed paths from the first parent:
average 28.00, median 52, min 4, max 52
File level 3-ways:
average 1.00, median 1, min 1, max 1
* "Trivial Merges" are the ones done by read-tree --trivial;
* "Merges" are other merges;
* "File level merges" are paths not collapsed by the read-tree 3-way
merge (i.e. paths handed to merge-one-file);
* "File level 3-ways" are paths on which merge-one-file would have run
'merge'.
Signed-off-by: Junio C Hamano <junkio@cox.net>
---
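The first script emits one record per commit, and the Perl script only
looks at lines that start with a type letter (C for a plain change, T
for a trivial merge, M for any other merge, F for a merge that
read-tree could not handle) followed by a 40-hex object name. As a
rough sketch of that record format, the helper below tallies record
types from a saved log. The name "tally_records" and the file name
stat.log are made up for illustration, and the awk check only
approximates the Perl regexp.

```shell
#!/bin/sh
# Count records by type in a saved jc-git-stat-1.sh log read from stdin.
# "tally_records" is a hypothetical helper, not part of the patch.
tally_records () {
	awk '
	# Accept only "<type> <40-hex object name> ..." lines, roughly as
	# the Perl reader does; anything else is silently skipped.
	$1 ~ /^[CTMF]$/ && length($2) == 40 && $2 ~ /^[0-9a-f]+$/ { n[$1]++ }
	END {
		printf "C=%d T=%d M=%d F=%d\n", n["C"], n["T"], n["M"], n["F"]
	}'
}
```

One possible way to use it, assuming the log is saved first:

	$ ./contrib/jc-git-stat-1.sh v0.99.9g..maint >stat.log
	$ tally_records <stat.log
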
contrib/jc-git-stat-1-log.perl | 87 ++++++++++++++++++++++++++++++++++++++++
contrib/jc-git-stat-1-mof.sh | 59 +++++++++++++++++++++++++++
contrib/jc-git-stat-1.sh | 74 ++++++++++++++++++++++++++++++++++
3 files changed, 220 insertions(+), 0 deletions(-)
create mode 100755 contrib/jc-git-stat-1-log.perl
create mode 100755 contrib/jc-git-stat-1-mof.sh
create mode 100755 contrib/jc-git-stat-1.sh
applies-to: 9a0f0c748316751fbf593a21f2b16bcdd975095a
2cb3da4b260ed82dc379a11d91f55fe774a2ea49
diff --git a/contrib/jc-git-stat-1-log.perl b/contrib/jc-git-stat-1-log.perl
new file mode 100755
index 0000000..b70af2b
--- /dev/null
+++ b/contrib/jc-git-stat-1-log.perl
@@ -0,0 +1,87 @@
+#!/usr/bin/perl
+
+my ($patches, $failures, $merges, $trivials) = (0, 0, 0, 0);
+my (@patch_paths,
+ @parent_counts,
+ @base_counts,
+ @merge_counts,
+ @path_counts,
+ @res_counts,
+ @merge_m,
+ @merge_a,
+ @merge_d,
+ @merge_c,
+ @merge_u);
+
+sub avg_median {
+ my ($ary) = shift;
+ my ($msg) = shift;
+ my @a = sort { $a <=> $b } @$ary;
+ my $sum = 0;
+ for (@a) { $sum += $_ }
+ return unless (@a && $sum);
+ my ($avg, $med) = ($sum/@a, $a[(@a/2)]);
+ my ($min, $max) = ($a[0], $a[$#a]);
+ printf "%s:\n\taverage %.2f, median %d, min %d, max %d\n",
+ $msg, $avg, $med, $min, $max;
+}
+
+while (<>) {
+ next unless (s/^([MCFT]) [0-9a-f]{40} //);
+ chomp;
+ my $type = $1;
+ if ($type eq 'F') {
+ $failures++;
+ next;
+ }
+ if ($type eq 'C') {
+ $patches++;
+ push @patch_paths, $_;
+ next;
+ }
+ if ($type eq 'M') {
+ $merges++;
+ }
+ elsif ($type eq 'T') {
+ $trivials++;
+ }
+ else {
+ die "?? $type";
+ }
+ s/^(\d+) (\d+) (\d+) (\d+) (\d+) *//;
+ push @parent_counts, $1;
+ push @base_counts, $2;
+ push @merge_counts, $3;
+ push @path_counts, $4;
+ push @res_counts, $5;
+ if ($type eq 'M') {
+ /M=(\d+) A=(\d+) D=(\d+) C=(\d+) U=(\d+)/ or die;
+ push @merge_m, $1;
+ push @merge_a, $2;
+ push @merge_d, $3;
+ push @merge_c, $4;
+ push @merge_u, $5;
+ }
+}
+
+my $total = ($failures+$patches+$merges+$trivials);
+print "Total commit objects: $total\n";
+printf "Trivial Merges: $trivials (%.2f%%)\n", ($trivials * 100.0/$total);
+printf "Merges: $merges (%.2f%%)\n", ($merges * 100.0/$total);
+if ($failures) {
+ print "Failures: $failures\n";
+}
+
+avg_median(\@patch_paths, "Number of paths touched by non-merge commits");
+avg_median(\@parent_counts, "Number of merge parents");
+avg_median(\@base_counts, "Number of merge bases");
+avg_median(\@merge_counts, "File level merges");
+avg_median(\@path_counts, "Number of changed paths from the first parent");
+#avg_median(\@res_counts, "");
+avg_median(\@merge_m, "File level 3-ways");
+avg_median(\@merge_a, "Paths added");
+avg_median(\@merge_d, "Paths deleted");
+avg_median(\@merge_c, "Paths identically added with wrong permission");
+avg_median(\@merge_u, "Paths added differently");
+
+
diff --git a/contrib/jc-git-stat-1-mof.sh b/contrib/jc-git-stat-1-mof.sh
new file mode 100755
index 0000000..2be6d8b
--- /dev/null
+++ b/contrib/jc-git-stat-1-mof.sh
@@ -0,0 +1,59 @@
+#!/bin/sh
+#
+# Copyright (c) Linus Torvalds, 2005
+# Copyright (c) Junio C Hamano, 2005
+#
+# This is modified from the git per-file merge script, called with
+#
+# $1 - original file SHA1 (or empty)
+# $2 - file in branch1 SHA1 (or empty)
+# $3 - file in branch2 SHA1 (or empty)
+# $4 - pathname in repository
+# $5 - original file mode (or empty)
+# $6 - file in branch1 mode (or empty)
+# $7 - file in branch2 mode (or empty)
+#
+# Handle some trivial cases.. The _really_ trivial cases have
+# been handled already by git-read-tree, but that one doesn't
+# do any merges that might change the tree layout.
+
+case "${1:-.}${2:-.}${3:-.}" in
+#
+# Deleted in both or deleted in one and unchanged in the other
+#
+"$1.." | "$1.$1" | "$1$1.")
+ echo D
+ ;;
+
+#
+# Added in one.
+#
+".$2." | "..$3" )
+ echo A
+ ;;
+
+#
+# Added in both (check for same permissions).
+#
+".$3$2")
+ if [ "$6" != "$7" ]; then
+ echo C
+ else
+ echo A
+ fi
+ ;;
+
+#
+# Modified in both, but differently.
+#
+"$1$2$3")
+ echo M
+ ;;
+
+".$2$3")
+ echo U
+ ;;
+*)
+ echo C
+ ;;
+esac
diff --git a/contrib/jc-git-stat-1.sh b/contrib/jc-git-stat-1.sh
new file mode 100755
index 0000000..b03c69d
--- /dev/null
+++ b/contrib/jc-git-stat-1.sh
@@ -0,0 +1,74 @@
+#!/bin/sh
+
+MOF=`dirname "$0"`/jc-git-stat-1-mof.sh
+
+GIT_INDEX_FILE=.tmp-index
+export GIT_INDEX_FILE
+LF='
+'
+
+check_merge () {
+ rm -f $GIT_INDEX_FILE
+ commit=$1
+ shift
+
+ case "$#" in
+ 2)
+ MB=$(git-merge-base --all "$@")
+ ;;
+ *)
+ MB=$(git-show-branch --merge-base "$@")
+ ;;
+ esac
+ basecnt=$(echo "$MB" | wc -l)
+
+ if git-read-tree --trivial -m $MB "$@" 2>/dev/null
+ then
+ type=T
+ pathcnt=$(git-diff-index --cached --name-status "$1" | wc -l)
+ rescnt=$(git-diff-index --cached --name-status "$commit" | wc -l)
+ echo "T $commit $# $basecnt 0 $pathcnt $rescnt"
+ elif git-read-tree -m $MB "$@" 2>/dev/null
+ then
+ script='s/^ *\([0-9]*\) *\([A-Z]\)/\2=\1/'
+ type=M
+ mergecnt=$(git-ls-files --unmerged | sort -k 4,4 -u | wc -l)
+ pathcnt=$(git-diff-index --cached --name-status "$1" | wc -l)
+
+ C=0 A=0 M=0 U=0 D=0
+ eval `git-merge-index -o "$MOF" -a |
+ sort |
+ uniq -c |
+ sed -e "$script"`
+ rescnt=$(git-diff-index --cached --name-status "$commit" | wc -l)
+ echo "M $commit $# $basecnt $mergecnt $pathcnt $rescnt M=$M A=$A D=$D C=$C U=$U"
+ else
+ echo "F $commit $# $basecnt"
+ fi
+}
+
+check_patch () {
+ pathcnt=$(git-diff-tree --name-status -r "$1" | wc -l)
+ echo "C $1 $pathcnt"
+}
+
+case "$#" in
+0)
+ set ORIG_HEAD.. ;;
+esac
+
+git-rev-list --parents "$@" |
+while read commit parents
+do
+ case "$parents" in
+ ?*' '?*)
+ # Merge
+ check_merge $commit $parents
+ ;;
+ *)
+ # Change
+ check_patch $commit $parents
+ ;;
+ esac
+done
+
---
0.99.9.GIT