git.vger.kernel.org archive mirror
From: Junio C Hamano <junkio@cox.net>
To: Ryan Anderson <ryan@michonline.com>
Cc: Linus Torvalds <torvalds@osdl.org>, git@vger.kernel.org
Subject: [PATCH] GIT commit statistics.
Date: Fri, 11 Nov 2005 23:44:20 -0800
Message-ID: <7v7jbeia3v.fsf_-_@assigned-by-dhcp.cox.net>
In-Reply-To: <43758D21.3060107@michonline.com> (Ryan Anderson's message of "Sat, 12 Nov 2005 01:35:13 -0500")

Ryan Anderson <ryan@michonline.com> writes:

> Junio C Hamano wrote:
>
>> Just for fun, I randomly picked two heads/master commits from
>> linux-2.6 repository ... and fed the commits
>> between the two to a little script that looks at commits and
>> tries to stat what they did (the script ignores renames so they
>> appear as deletes and adds).
>
> Mind sharing the script?
>
> It'd be nice to know if these stats are typical or unusual when you get
> numbers from a variety of other trees.

Very unpolished, but here they are.

I misread the trivial count in my original message.  Trivial and
Merge are counted separately, so among 3957 commits, merges were
297 (72 trivials and 225 others).
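For reference, the log script below prints each of these as count * 100.0 / total with two decimals. A small sh/awk sketch (the `pct` helper is hypothetical, not part of the patch) reproduces that formula for the counts quoted above:

```sh
#!/bin/sh
# Reproduce the percentage formula of jc-git-stat-1-log.perl
# (count * 100.0 / total, two decimals) for the counts quoted above.
total=3957
trivials=72
merges=225

# POSIX sh arithmetic is integer-only, so let awk do the division.
pct () {
	awk -v n="$1" -v t="$2" 'BEGIN { printf "%.2f", n * 100.0 / t }'
}

echo "Trivial Merges: $trivials ($(pct "$trivials" "$total")%)"
echo "Merges: $merges ($(pct "$merges" "$total")%)"
# The combined line is extra; the Perl script does not print it.
all=$((trivials + merges))
echo "All merges: $all ($(pct "$all" "$total")%)"
```

This gives 1.82% for the trivial merges, 5.69% for the other merges, and 7.51% for all 297 merges together.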

-- >8 -- cut here -- >8 --
Subject: [PATCH] GIT commit statistics

A set of scripts that read the existing commit history, and
show various stats.

Sample usage:

    # Arguments are given to git-rev-list; if none are given, they
    # default to ORIG_HEAD.. to retrace what was just pulled.
    $ ./contrib/jc-git-stat-1.sh v0.99.9g..maint |
      ./contrib/jc-git-stat-1-log.perl
    Total commit objects: 43
    Trivial Merges: 1 (2.33%)
    Merges: 1 (2.33%)
    Number of paths touched by non-merge commits:
	    average 3.00, median 2, min 2, max 18
    Number of merge parents:
	    average 2.50, median 3, min 2, max 3
    Number of merge bases:
	    average 1.00, median 1, min 1, max 1
    File level merges:
	    average 0.50, median 1, min 0, max 1
    Number of changed paths from the first parent:
	    average 28.00, median 52, min 4, max 52
    File level 3-ways:
	    average 1.00, median 1, min 1, max 1

 * "Trivial Merges" are the ones done by read-tree --trivial;
 * "Merges" are other merges;
 * "File level merges" are paths not collapsed by read-tree 3-way (i.e.
   given to merge-one-file);
 * "File level 3-ways" are paths on which merge-one-file would have run 'merge'.
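The log script reads one record per commit from jc-git-stat-1.sh: "C <sha1> <paths>" for a non-merge commit, "T" and "M" records carrying the parent, merge base, file-level merge, changed-path and residual counts, and "F" for a merge that read-tree could not handle. As a hypothetical illustration (the `summarize` helper and the sample records are made up), a single column can be summarized with sort and awk using the same rule as avg_median, where the median is the element at index int(n/2) of the sorted list:

```sh
#!/bin/sh
# Hypothetical helper mirroring avg_median: print average, median
# (element int(n/2) of the sorted list, i.e. upper middle for even n),
# min and max; print nothing for an empty or all-zero column.
summarize () {
	sort -n | awk -v msg="$1" '
		{ v[NR - 1] = $1; sum += $1 }
		END {
			if (NR == 0 || sum == 0) exit
			printf "%s:\n\taverage %.2f, median %d, min %d, max %d\n",
				msg, sum / NR, v[int(NR / 2)], v[0], v[NR - 1]
		}'
}

# Made-up records in the format jc-git-stat-1.sh prints.
records='C 1111111111111111111111111111111111111111 2
M 2222222222222222222222222222222222222222 2 1 1 52 4 M=1 A=0 D=0 C=0 U=0
C 3333333333333333333333333333333333333333 18
C 4444444444444444444444444444444444444444 2'

# Keep only non-merge ("C") records and feed their path counts.
printf '%s\n' "$records" |
awk '$1 == "C" { print $3 }' |
summarize "Number of paths touched by non-merge commits"
```

With these three "C" records it reports an average of 7.33, median 2, min 2 and max 18.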

Signed-off-by: Junio C Hamano <junkio@cox.net>

---

 contrib/jc-git-stat-1-log.perl |   87 ++++++++++++++++++++++++++++++++++++++++
 contrib/jc-git-stat-1-mof.sh   |   59 +++++++++++++++++++++++++++
 contrib/jc-git-stat-1.sh       |   74 ++++++++++++++++++++++++++++++++++
 3 files changed, 220 insertions(+), 0 deletions(-)
 create mode 100755 contrib/jc-git-stat-1-log.perl
 create mode 100755 contrib/jc-git-stat-1-mof.sh
 create mode 100755 contrib/jc-git-stat-1.sh

applies-to: 9a0f0c748316751fbf593a21f2b16bcdd975095a
2cb3da4b260ed82dc379a11d91f55fe774a2ea49
diff --git a/contrib/jc-git-stat-1-log.perl b/contrib/jc-git-stat-1-log.perl
new file mode 100755
index 0000000..b70af2b
--- /dev/null
+++ b/contrib/jc-git-stat-1-log.perl
@@ -0,0 +1,87 @@
+#!/usr/bin/perl
+
+my ($patches, $failures, $merges, $trivials) = (0, 0, 0, 0);
+my (@patch_paths,
+    @parent_counts,
+    @base_counts,
+    @merge_counts,
+    @path_counts,
+    @res_counts,
+    @merge_m, 
+    @merge_a,
+    @merge_d, 
+    @merge_c, 
+    @merge_u);
+
+sub avg_median {
+    my ($ary) = shift;
+    my ($msg) = shift;
+    my @a = sort { $a <=> $b } @$ary;
+    my $sum = 0;
+    for (@a) { $sum += $_ }
+    return unless (@a && $sum);
+    my ($avg, $med) = ($sum/@a, $a[(@a/2)]);
+    my ($min, $max) = ($a[0], $a[$#a]);
+    printf "%s:\n\taverage %.2f, median %d, min %d, max %d\n",
+    	$msg, $avg, $med, $min, $max;
+}
+
+while (<>) {
+    next unless (s/^([MCFT]) [0-9a-f]{40} //);
+    chomp;
+    my $type = $1;
+    if ($type eq 'F') {
+	$failures++;
+	next;
+    }
+    if ($type eq 'C') {
+	$patches++;
+	push @patch_paths, $_;
+	next;
+    }
+    if ($type eq 'M') {
+	$merges++;
+    }
+    elsif ($type eq 'T') {
+	$trivials++;
+    }
+    else {
+	die "?? $type";
+    }
+    s/^(\d+) (\d+) (\d+) (\d+) (\d+) *//;
+    push @parent_counts, $1;
+    push @base_counts, $2;
+    push @merge_counts, $3;
+    push @path_counts, $4;
+    push @res_counts, $5;
+    if ($type eq 'M') {
+	/M=(\d+) A=(\d+) D=(\d+) C=(\d+) U=(\d+)/ or die;
+	push @merge_m, $1;
+	push @merge_a, $2;
+	push @merge_d, $3;
+	push @merge_c, $4;
+	push @merge_u, $5;
+    }
+}
+
+my $total = ($failures+$patches+$merges+$trivials);
+print "Total commit objects: $total\n";
+printf "Trivial Merges: $trivials (%.2f%%)\n", ($trivials * 100.0/$total);
+printf "Merges: $merges (%.2f%%)\n", ($merges * 100.0/$total);
+if ($failures) {
+    print "Failures: $failures\n";
+}
+
+avg_median(\@patch_paths, "Number of paths touched by non-merge commits");
+avg_median(\@parent_counts, "Number of merge parents");
+avg_median(\@base_counts, "Number of merge bases");
+avg_median(\@merge_counts, "File level merges");
+avg_median(\@path_counts, "Number of changed paths from the first parent");
+#avg_median(\@res_counts, "");
+avg_median(\@merge_m, "File level 3-ways");
+avg_median(\@merge_a, "Paths added");
+avg_median(\@merge_d, "Paths deleted");
+avg_median(\@merge_c, "Paths identically added with wrong permission");
+avg_median(\@merge_u, "Paths added differently");
+
+
diff --git a/contrib/jc-git-stat-1-mof.sh b/contrib/jc-git-stat-1-mof.sh
new file mode 100755
index 0000000..2be6d8b
--- /dev/null
+++ b/contrib/jc-git-stat-1-mof.sh
@@ -0,0 +1,59 @@
+#!/bin/sh
+#
+# Copyright (c) Linus Torvalds, 2005
+# Copyright (c) Junio C Hamano, 2005
+#
+# This is modified from the git per-file merge script, called with
+#
+#   $1 - original file SHA1 (or empty)
+#   $2 - file in branch1 SHA1 (or empty)
+#   $3 - file in branch2 SHA1 (or empty)
+#   $4 - pathname in repository
+#   $5 - original file mode (or empty)
+#   $6 - file in branch1 mode (or empty)
+#   $7 - file in branch2 mode (or empty)
+#
+# Handle some trivial cases.. The _really_ trivial cases have
+# been handled already by git-read-tree, but that one doesn't
+# do any merges that might change the tree layout.
+
+case "${1:-.}${2:-.}${3:-.}" in
+#
+# Deleted in both or deleted in one and unchanged in the other
+#
+"$1.." | "$1.$1" | "$1$1.")
+	echo D
+	;;
+
+#
+# Added in one.
+#
+".$2." | "..$3" )
+	echo A
+	;;
+
+#
+# Added in both (check for same permissions).
+#
+".$3$2")
+	if [ "$6" != "$7" ]; then
+		echo C
+	else
+		echo A
+	fi
+	;;
+
+#
+# Modified in both, but differently.
+#
+"$1$2$3")
+	echo M
+	;;
+
+".$2$3")
+	echo U
+	;;
+*)
+	echo C
+	;;
+esac
diff --git a/contrib/jc-git-stat-1.sh b/contrib/jc-git-stat-1.sh
new file mode 100755
index 0000000..b03c69d
--- /dev/null
+++ b/contrib/jc-git-stat-1.sh
@@ -0,0 +1,74 @@
+#!/bin/sh
+
+MOF=`dirname "$0"`/jc-git-stat-1-mof.sh
+
+GIT_INDEX_FILE=.tmp-index
+export GIT_INDEX_FILE
+LF='
+'
+
+check_merge () {
+	rm -f $GIT_INDEX_FILE
+	commit=$1
+	shift
+
+	case "$#" in
+	2)
+		MB=$(git-merge-base --all "$@")
+		;;
+	*)
+		MB=$(git-show-branch --merge-base "$@")
+		;;
+	esac
+	basecnt=$(echo "$MB" | wc -l)
+
+	if git-read-tree --trivial -m $MB "$@" 2>/dev/null
+	then
+		type=T
+		pathcnt=$(git-diff-index --cached --name-status "$1" | wc -l)
+		rescnt=$(git-diff-index --cached --name-status "$commit" | wc -l)
+		echo "T $commit $# $basecnt 0 $pathcnt $rescnt"
+	elif git-read-tree -m $MB "$@" 2>/dev/null
+	then
+	        script='s/^ *\([0-9]*\) *\([A-Z]\)/\2=\1/'
+		type=M
+		mergecnt=$(git-ls-files --unmerged | sort -k 4,4 -u | wc -l)
+		pathcnt=$(git-diff-index --cached --name-status "$1" | wc -l)
+
+		C=0 A=0 M=0 U=0 D=0
+		eval `git-merge-index -o "$MOF" -a |
+			sort |
+			uniq -c |
+			sed -e "$script"`
+		rescnt=$(git-diff-index --cached --name-status "$commit" | wc -l)
+		echo "M $commit $# $basecnt $mergecnt $pathcnt $rescnt M=$M A=$A D=$D C=$C U=$U"
+	else
+		echo "F $commit $# $basecnt"
+	fi
+}
+
+check_patch () {
+	pathcnt=$(git-diff-tree --name-status -r "$1" | wc -l)
+	echo "C $1 $pathcnt"
+}
+
+case "$#" in
+0)
+	set ORIG_HEAD.. ;;
+esac
+
+git-rev-list --parents "$@" |
+while read commit parents
+do
+	case "$parents" in
+	?*' '?*)
+		# Merge
+		check_merge $commit $parents
+		;;
+	*)
+		# Change
+		check_patch $commit $parents
+		;;
+	esac
+done
+
---
0.99.9.GIT
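check_merge counts merge bases by piping $MB to wc -l. A quick sketch with two placeholder object names shows why $MB has to be quoted for that count to be right: unquoted, the newline-separated list is field-split and echo rejoins it on a single line.

```sh
#!/bin/sh
# git-merge-base --all prints one merge base per line; these two
# placeholder object names stand in for its output.
MB='aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb'

# Unquoted: the newline becomes a word break and echo joins the words
# with a space, so wc -l always sees one line.  $(( )) strips the
# padding some wc implementations add.
unquoted=$(( $(echo $MB | wc -l) ))

# Quoted: the embedded newline survives, one line per merge base.
quoted=$(( $(printf '%s\n' "$MB" | wc -l) ))

echo "unquoted: $unquoted, quoted: $quoted"
```

It prints "unquoted: 1, quoted: 2".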

Thread overview: 58+ messages
2005-11-07 16:48 Comments on recursive merge Linus Torvalds
2005-11-07 16:56 ` Linus Torvalds
2005-11-07 23:19   ` [PATCH] merge-recursive: Only print relevant rename messages Fredrik Kuivinen
2005-11-07 23:54     ` Junio C Hamano
2005-11-09 10:36       ` Fredrik Kuivinen
2005-11-07 22:58 ` Comments on recursive merge Fredrik Kuivinen
2005-11-08  0:13   ` Junio C Hamano
2005-11-08  0:33     ` Linus Torvalds
2005-11-08  0:59       ` Junio C Hamano
2005-11-08 11:58       ` Johannes Schindelin
2005-11-08 21:02         ` Fredrik Kuivinen
2005-11-08 21:47           ` Junio C Hamano
2005-11-08 21:52           ` Linus Torvalds
2005-11-08 22:36             ` Fredrik Kuivinen
2005-11-08 23:05               ` Linus Torvalds
2005-11-08 23:18                 ` Johannes Schindelin
2005-11-09  0:18                   ` Linus Torvalds
2005-11-09  6:10                     ` Junio C Hamano
2005-11-09  0:32                 ` Petr Baudis
2005-11-09  0:51                   ` Linus Torvalds
2005-11-09  0:59                     ` Junio C Hamano
2005-11-09  1:22                       ` Linus Torvalds
2005-11-09  1:42                         ` Junio C Hamano
2005-11-09 10:20                           ` Junio C Hamano
2005-11-09 14:59                             ` Petr Baudis
2005-11-09 16:30                             ` Linus Torvalds
2005-11-09 20:13                               ` Junio C Hamano
2005-11-09 21:58                                 ` Linus Torvalds
2005-11-09 22:56                                   ` Junio C Hamano
2005-11-09 23:34                                     ` Linus Torvalds
2005-11-11  2:58                                   ` merge-base: fully contaminate the well Junio C Hamano
2005-11-11  5:36                                     ` Linus Torvalds
2005-11-11  6:04                                       ` Junio C Hamano
2005-11-11 16:18                                         ` Linus Torvalds
2005-11-11  8:28                                       ` Junio C Hamano
2005-11-08 23:04           ` Comments on recursive merge Johannes Schindelin
2005-11-08 16:21       ` [RFC/PATCH] Make git-recursive the default strategy for git-pull Junio C Hamano
2005-11-11 22:25       ` Comments on recursive merge Junio C Hamano
2005-11-11 22:53         ` Linus Torvalds
2005-11-12  0:42           ` Junio C Hamano
2005-11-12  6:35         ` Ryan Anderson
2005-11-12  7:44           ` Junio C Hamano [this message]
2005-11-12 12:19             ` [PATCH] GIT commit statistics Martin Langhoff
2005-11-12 12:53               ` Petr Baudis
2005-11-15 10:04                 ` Catalin Marinas
2005-11-15 15:29                   ` Chuck Lever
2005-11-12 19:04               ` Johannes Schindelin
2005-11-13 10:59               ` Junio C Hamano
2005-11-13 20:42                 ` Martin Langhoff
2005-11-14  3:33                   ` Junio C Hamano
2005-11-14  4:01                     ` Martin Langhoff
2005-11-14  6:06                       ` Junio C Hamano
2005-11-14  8:51                         ` Martin Langhoff
2005-11-14  9:25                           ` Petr Baudis
2005-11-14 21:25                             ` Martin Langhoff
2005-11-14  9:27                           ` Junio C Hamano
2005-11-15  3:00                           ` Junio C Hamano
2005-11-13 11:11               ` Petr Baudis
