git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Thomas Rast <trast@student.ethz.ch>
To: Junio C Hamano <gitster@pobox.com>
Cc: <git@vger.kernel.org>, Jakub Narebski <jnareb@gmail.com>
Subject: Re: [PATCH] Avoid loading commits twice in log with diffs
Date: Mon, 4 Mar 2013 10:58:54 +0100	[thread overview]
Message-ID: <871ubvtrj5.fsf@pctrast.inf.ethz.ch> (raw)
In-Reply-To: <7va9qlc690.fsf@alter.siamese.dyndns.org> (Junio C. Hamano's message of "Sat, 02 Mar 2013 23:06:03 -0800")

Junio C Hamano <gitster@pobox.com> writes:

> Thomas Rast <trast@student.ethz.ch> writes:
>
>> Test                      with patch        before
>> --------------------------------------------------------------------
>> 4000.2: log --raw -3000   0.50(0.43+0.06)   0.54(0.46+0.06) +7.0%***
>> 4000.3: log -p -3000      2.34(2.20+0.13)   2.37(2.22+0.13) +1.2%
>> --------------------------------------------------------------------
>> Significance hints:  '.' 0.1  '*' 0.05  '**' 0.01  '***' 0.001
>
> It may be a silly question but what is a significance hint?

That's from my still-not-rerolled perf improvements series from, ahem,
ages ago:

  http://thread.gmane.org/gmane.comp.version-control.git/192884

I stole the idea from R (and in fact use R to compute it).  The ***
tells you that the probability of the 7% difference is attributable to
chance only with 0.1% probability, or in other words, it's highly likely
that the difference is *not* down to chance.  On the other hand, note
that the p4000.3 measurements do not show a significant difference
(significance hint is absent).

The statistical background: Assume that the two series of measurements
are drawn from two normal distributions (possibly with different
mean/variance).  Welch's t-test estimates the probability of the null
hypothesis that the two distributions in fact had the same mean.  If you
can reject the null hypothesis, you have essentially proven that there's
*some* difference in the timing runs.  (Hopefully for the better, and
hopefully _not_ caused just by CPU scaling or such.)

By the way, in the above test Jakub pointed me at the Dumbbench perl
module.  I've had a look at the ideas within, and I'll try to put some
sample rejection along their lines into perf-lib.  However, several
things make the module itself rather deserve the name.  Most
prominently, I can only get it to print timings to stdout in a fixed
format designed for human consumption.

>> --- a/log-tree.c
>> +++ b/log-tree.c
>> @@ -715,7 +715,7 @@ static int log_tree_diff(struct rev_info *opt, struct commit *commit, struct log
>>  {
>>  	int showed_log;
>>  	struct commit_list *parents;
>> -	unsigned const char *sha1 = commit->object.sha1;
>> +	unsigned const char *sha1 = commit->tree->object.sha1;
>
> Overall I think this goes in the right direction and I can see why
> the changes in later two hunks are correct, but I am not sure if we
> can safely assume that the caller has parsed the incoming commit and
> we have a good commit->tree here already.
>
> Right now, this static function has a single call-site in a public
> function log_tree_commit(), whose existing callers may all pass an
> already parsed commit, but I feel somewhat uneasy to do the above
> without some mechanism in place (either parse it here or in the
> caller if unparsed, or document that log_tree_commit() must be
> called with a parsed commit and perhaps add an assert there) to
> ensure that the invariant is not broken in the future.

In that case I'll reroll with the parsing -- it will have about the same
cost as the assertion, since it'll just check ->object.parsed and
return.

-- 
Thomas Rast
trast@{inf,student}.ethz.ch

  reply	other threads:[~2013-03-04  9:59 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-03-02 10:05 [PATCH] Avoid loading commits twice in log with diffs Thomas Rast
2013-03-03  7:06 ` Junio C Hamano
2013-03-04  9:58   ` Thomas Rast [this message]
2013-03-28  8:19   ` [PATCH v2] " Thomas Rast
2013-03-28 13:51     ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=871ubvtrj5.fsf@pctrast.inf.ethz.ch \
    --to=trast@student.ethz.ch \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jnareb@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).