* How to generate feature branch statistics?
From: Ernesto Maserati @ 2016-07-20 8:05 UTC
To: git

I assume that feature branches are not merged into master frequently
enough. Because of that, we discover bugs later than we could with more
continuous code integration. I don't want to discuss here whether feature
branches are good or bad.

I just want to ask whether there is a way to generate a statistic for the
average duration of feature branches until they are merged to master. I
would like to know if it is 1 day, 2 days or, let's say, 8 or 17 days. It
would also be interesting to see the statistical outliers.

I hope my motivation is clear, as well as the kind of git repository data
I would like to produce.

Any ideas?

* Re: How to generate feature branch statistics?
From: Jeff King @ 2016-07-20 13:14 UTC
To: Ernesto Maserati; +Cc: git

On Wed, Jul 20, 2016 at 10:05:09AM +0200, Ernesto Maserati wrote:

> I assume that feature branches are not frequently enough merged into
> master. Because of that we discover bugs later than we could with a more
> continuous code integration. I don't want to discuss here whether feature
> branches are good or bad.
>
> I want just to ask is there a way how to generate a statistic for the
> average duration of feature branches until they are merged to the master? I
> would like to know if it is 1 day, 2 days or lets say 8 or 17 days. Also it
> would be interesting to see the statistical outliers.

In a workflow that merges feature branches to master, you can generally
recognize them by looking for merges along the first-parent chain of
commits:

  git log --first-parent --merges master

(Depending on your workflow, some feature branches may be fast-forwards
with no merge commit, so this is just a sampling. Some workflows use
"git merge --no-ff" to merge in feature branches, so this would see all
of them).

And then for each merge, you can get the set of commits that were merged
in (it is the commits in the second parent that are not in the first).
The bottom-most one is the "start" of the branch (or close to it; of
course the author started writing code before they made a commit), and
the "end" is the merge itself.

So something like:

  git rev-list --first-parent --merges master |
  while read merge; do
    start=$(git log --format=%at $merge^1..$merge^2 | tail -1)
    end=$(git log -1 --format=%at $merge)
    subject=$(git log -1 --format=%s $merge)
    echo "$((end - start)) $subject"
  done

That should output a sequence of topic branch merges prefixed by the
number of seconds they were active.

Two exercises for the reader:

  1. Converting seconds into some more useful time scale. :)

  2. This can probably be done with fewer invocations of git, which
     would be more efficient.

-Peff

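A minimal sketch for the first exercise above (not part of the original message). It assumes each input line has the form "<seconds> <subject>", as printed by the loop, and that awk and sort are available. The function names to_days and summarize are made up for illustration.

```shell
#!/bin/sh

# Read "<seconds> <subject>" lines and print "<days><TAB><subject>",
# longest-lived branch first, so outliers show up at the top.
to_days() {
    awk '{
        secs = $1
        sub(/^[^ ]+ /, "")    # drop the seconds column, keep the subject
        printf "%.1f\t%s\n", secs / 86400, $0
    }' | sort -rn
}

# Read the same input and print count, mean, median and maximum in days.
summarize() {
    sort -n | awk '{
        v[NR] = $1 / 86400
        sum += v[NR]
    }
    END {
        if (NR == 0) exit
        if (NR % 2) median = v[(NR + 1) / 2]
        else median = (v[NR / 2] + v[NR / 2 + 1]) / 2
        printf "count=%d mean=%.1f median=%.1f max=%.1f\n", NR, sum / NR, median, v[NR]
    }'
}
```

If the loop's output is saved to a file (say durations.txt, a made-up name), "summarize < durations.txt" gives the overall picture and "to_days < durations.txt | head" lists the longest-lived branches. The median is reported next to the mean because a few very old branches can pull the mean up considerably.
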
* Re: How to generate feature branch statistics?
From: Junio C Hamano @ 2016-07-20 18:49 UTC
To: Jeff King; +Cc: Ernesto Maserati, git

Jeff King <peff@peff.net> writes:

> In a workflow that merges feature branches to master, you can generally
> recognize them by looking for merges along the first-parent chain of
> commits:
>
>   git log --first-parent --merges master
>
> (Depending on your workflow, some feature branches may be fast-forwards
> with no merge commit, so this is just a sampling. Some workflows use
> "git merge --no-ff" to merge in feature branches, so this would see all
> of them).

> And then for each merge, you can get the set of commits that were merged
> in (it is the commits in the second parent that are not in the first).
> The bottom-most one is the "start" of the branch (or close to it; of
> course the author started writing code before they made a commit), and
> the "end" is the merge itself.

A few things to keep in mind are

 * A feature branch may be merged to the master multiple times, when
   the feature branch is properly managed. E.g. it may once have been
   thought to be complete with 3 commits and merged to 'master'; then
   a bug is discovered, the branch gains a fourth commit to fix the
   bug, and it is merged to 'master' again, resulting in a topology
   like this:

         A---B---C-----------D (feature)
        /         \           \
    ---o---o---o---1---o---o---2---o (master)

   "git log --first-parent --merges master" will first find commit
   '2' that merged the feature for the second time, bringing in
   commit 'D', and then it will find commit '1' that merged the
   feature previously, bringing in commits 'A', 'B' and 'C'.

 * A feature branch that depends on other features may have merges of
   its own.
You may start a feature X that depends on another features Y and Z that are not yet in 'master', in addition to depending on things in 'master' that have been added since Y and Z forked from it. In such a case, your feature X may look like this: .-------------------1----------2--------x---x (feature X) / / / y---y---y (feature Y) / / / / / ---o---o---o---o---o---o---o---0 (master) / \ / z---z (feature Z) / \ / .----------------------. where '1' and '2' are merges of feature Y and then Z into the tip of 'master' when you start working on feature X. And then feature Y and feature Z may graduate to 'master' before your feature X is ready to do so, resulting in something like: .-------------------1----------2--------x---x (feature X) / / / y---y---y (feature Y) ---- / ------- / --. / / / \ ---o---o---o---o---o---o---o---o---o---o---o---o---Y---Z (master) \ / / z---z (feature Z) ---------- / ----------. \ / .----------------------. where 'Y' and 'Z' are merges of features Y and Z to 'master'. After that, feature X may become ready to be merged, resulting in: .-------------------1----------2--------x---x (feature X) / / / \ y---y---y (feature Y) ---- / ------- / --. \ / / / \ \ ---o---o---o---o---o---o---o---o---o---o---o---o---Y---Z---o---X (master) \ / / z---z (feature Z) ---------- / ----------. \ / .----------------------. When "git log --first-parent --merges master" finds X, it would notice that it pulled in commits '1', '2' and two 'x'. The "tool" to inspect the history needs to be careful deciding if '1' and '2' are the part of feature X. There are variants that make it tricky (e.g. 'Y' may not have yet been merged to 'master' when 'X' is merged, in which case you may end up pulling both 'x' and 'y' into 'master' with a single merge), which should be avoided if feature branches are managed carefully, but not everybody is careful when managing their history. 
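One rough way to spot the situations described in the list above is to count, for a given merge on the first-parent chain, how many commits it brought in and how many of those are themselves merges. This is only a sketch, not something proposed in the thread; merge_contents is a made-up helper name, and it assumes the topic is the second parent of the merge.

```shell
#!/bin/sh

# Hypothetical helper: summarise what one merge commit brought in.
# Usage: merge_contents <merge-commit>
merge_contents() {
    merge=$1
    # Commits reachable from the merged side but not from the mainline side.
    total=$(git rev-list --count "$merge^1..$merge^2")
    # Of those, how many are merges themselves (e.g. other topics merged in).
    nested=$(git rev-list --count --merges "$merge^1..$merge^2")
    echo "commits=$total nested_merges=$nested"
}
```

A non-zero nested_merges value suggests that the topic merged other branches before graduating, so the oldest commit in the range may belong to a different feature. A second merge of the same topic simply shows up as a merge with a small commit count.
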

Coming back to the introduction of the original message:

>> I assume that feature branches are not frequently enough merged into
>> master. Because of that we discover bugs later than we could with a more
>> continuous code integration. I don't want to discuss here whether feature
>> branches are good or bad.

For our own history and workflow, the duration between the inception
of a topic branch and the time it gets merged to 'master' is not all
that interesting. More interesting numbers are:

 * The duration between the time a topic hits 'next' and the time it
   gets merged to 'master'. This is the time the developers and
   testers are using the new feature in their own work to make sure
   it does not have any ill effect.

 * The percentage of topics that are merged to 'master' with some
   follow-up changes made after they hit 'next'. This approximates
   the number of bugs that are caught by developers and testers
   before a new feature goes to the general public.

* Re: How to generate feature branch statistics?
From: Jakub Narębski @ 2016-07-20 23:10 UTC
To: Junio C Hamano, Jeff King; +Cc: Ernesto Maserati, git

On 2016-07-20 at 20:49, Junio C Hamano wrote:

> For our own history and workflow, the duration between the inception
> of a topic branch and the time it gets merged to 'master' is not all
> that interesting.

NB: if I haven't messed something up (the git history is not a simple
merging of topic branches into mainline), the shortest time from
creating a branch to merging it in git.git is 7 seconds (probably
a bugfix-type topic branch), and the longest, if I did it correctly,
is slightly less than 4 years (???): 641830c.

-- 
Jakub Narębski

* Re: How to generate feature branch statistics?
From: Junio C Hamano @ 2016-07-20 23:31 UTC
To: Jakub Narębski; +Cc: Jeff King, Ernesto Maserati, Git Mailing List

On Wed, Jul 20, 2016 at 4:10 PM, Jakub Narębski <jnareb@gmail.com> wrote:
> On 2016-07-20 at 20:49, Junio C Hamano wrote:
>
>> For our own history and workflow, the duration between the inception
>> of a topic branch and the time it gets merged to 'master' is not all
>> that interesting.
>
> Nb. if I haven't messed something up (the git history is not simple
> merging of topic branches into mainline), the shortest time from
> creating a branch to merging it in git.git is 7 seconds (probably
> it was a bugfix-type of a topic branch), the longest if I did it
> correctly is slightly less than 4 years (???): 641830c.

The former is quite understandable. The point of having such a topic
is so that it can be merged down to older maintenance releases.

* Re: How to generate feature branch statistics?
From: Jakub Narębski @ 2016-07-20 13:56 UTC
To: Ernesto Maserati, git

On 2016-07-20 at 10:05, Ernesto Maserati wrote:

> I assume that feature branches are not frequently enough merged into
> master. Because of that we discover bugs later than we could with a more
> continuous code integration. I don't want to discuss here whether feature
> branches are good or bad.
>
> I want just to ask is there a way how to generate a statistic for the
> average duration of feature branches until they are merged to the master? I
> would like to know if it is 1 day, 2 days or lets say 8 or 17 days. Also it
> would be interesting to see the statistical outliers.
>
> I hope my motivation became clear and what kind of git repository data I
> would like to produce.
>
> Any ideas?

There are at least two tools that generate statistics about a git
repository, namely Gitstat (https://sourceforge.net/projects/gitstat) and
GitStats (https://github.com/hoxu/gitstats), both producing repo
statistics as a web page. You can probably find more... but I don't know
if any of them includes the statistics you need.

I assume that you have some way of determining whether a merge in the
'master' branch is a merge of a topic branch or of a long-lived
graduation branch (e.g. 'maint' or equivalent). To simplify the
situation, I assume that the only merges in master are merges of topic
branches:

  git rev-list --min-parents=2 master |
  while read merge_rev; do

You might want to add "--grep=maint --invert-grep" or something like
that to exclude merges of the 'maint' branch.

We can get the date of the merge (authordate with %ad/%at, or
committerdate with %cd/%ct) as an epoch (seconds since 1970, which is
good for comparing datetimes and getting the interval between two
events):

    MERGE_DATE=$(git show -s --date=format:%s --pretty=%ad $merge_rev)

Assuming that topic branches are always merged using a two-head merge,
as the second parent (so that the --first-parent ancestry of master
stays on the master branch only), we can get the first revision on a
merged topic branch with

    FIRST_REV=$(git rev-list $merge_rev^2 ^$merge_rev^1 | tail -1)

We can extract the date from this revision in the same way

    FIRST_DATE=$(git show -s --pretty=%at $FIRST_REV)

Print the difference (here to standard output; you might want to write
to a file)

    echo $(expr $MERGE_DATE - $FIRST_DATE)

And finish the loop.

  done

Then pass the output to some histogramming or statistics tool... or use
a spreadsheet. Note that the results are in seconds.

HTH (not checked much)
-- 
Jakub Narębski

* Re: How to generate feature branch statistics?
From: Jakub Narębski @ 2016-07-20 18:10 UTC
To: Ernesto Maserati, git

On 2016-07-20 at 15:56, Jakub Narębski wrote:
> On 2016-07-20 at 10:05, Ernesto Maserati wrote:
>
>> I assume that feature branches are not frequently enough merged into
>> master. Because of that we discover bugs later than we could with a more
>> continuous code integration. I don't want to discuss here whether feature
>> branches are good or bad.
>>
>> I want just to ask is there a way how to generate a statistic for the
>> average duration of feature branches until they are merged to the master? I
>> would like to know if it is 1 day, 2 days or lets say 8 or 17 days. Also it
>> would be interesting to see the statistical outliers.
>>
>> I hope my motivation became clear and what kind of git repository data I
>> would like to produce.
>>
>> Any ideas?
>
> There are at least two tools to generate statistics about git repository,
> namely Gitstat (https://sourceforge.net/projects/gitstat) and GitStats
> (https://github.com/hoxu/gitstats), both generating repo statistics as
> a web page. You can probably find more... but I don't know if any includes
> the statistics you need.
>
> I assume that you have some way of determining if the merge in 'master'
> branch is a merge of a topic branch, or of long-lived graduation branch
> (e.g. 'maint' or equivalent). To simplify the situation, I assume that
> the only merges in master are merges of topic branches:
>
>   git rev-list --min-parents=2 master |

Self-correction: here you need to use --first-parent, as in Peff's
answer (which also uses fewer git invocations, and less git porcelain).

I wonder if this is something libgit2 would be helpful for...

-- 
Jakub Narębski

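For reference, the fragments from the earlier message can be assembled into one function with the --first-parent correction applied. This is a sketch under assumptions rather than the author's final script: the name branch_lifetimes is made up, author dates (%at) are used for both ends, variables are quoted, and merges that bring in no new commits are skipped.

```shell
#!/bin/sh

# Hypothetical helper: print "<seconds> <merge commit>" for every merge
# on the first-parent chain of a branch (default: master).
branch_lifetimes() {
    branch=${1:-master}
    git rev-list --first-parent --min-parents=2 "$branch" |
    while read -r merge_rev; do
        # Oldest commit reachable from the second parent but not the first.
        first_rev=$(git rev-list "$merge_rev^2" "^$merge_rev^1" | tail -1)
        # Skip merges that brought in no new commits.
        [ -n "$first_rev" ] || continue
        merge_date=$(git show -s --pretty=%at "$merge_rev")
        first_date=$(git show -s --pretty=%at "$first_rev")
        echo "$((merge_date - first_date)) $merge_rev"
    done
}
```

Running "branch_lifetimes master | sort -n" puts the shortest-lived branches first and the outliers last. The results are in seconds, as before.
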
end of thread (newest message: 2016-07-20 23:31 UTC)

Thread overview: 7+ messages

2016-07-20  8:05 How to generate feature branch statistics? Ernesto Maserati
2016-07-20 13:14 ` Jeff King
2016-07-20 18:49   ` Junio C Hamano
2016-07-20 23:10     ` Jakub Narębski
2016-07-20 23:31       ` Junio C Hamano
2016-07-20 13:56 ` Jakub Narębski
2016-07-20 18:10   ` Jakub Narębski