From: David Kastrup <dak@gnu.org>
To: git@vger.kernel.org
Subject: Re: [PATCH] blame.c: prepare_lines should not call xrealloc for every line
Date: Thu, 06 Feb 2014 11:29:16 +0100 [thread overview]
Message-ID: <87mwi4k003.fsf@fencepost.gnu.org> (raw)
In-Reply-To: xmqqk3d92t0z.fsf@gitster.dls.corp.google.com
Junio C Hamano <gitster@pobox.com> writes:
> David Kastrup <dak@gnu.org> writes:
>
>> It's snake oil making debugging harder.
>
> OK, that is a sensible argument.
>
>>> This was fun ;-)
>>
>> At the expense of seriously impacting my motivation to do any further
>> code cleanup on Git.
>
> Well, I said it was "fun" because I was learning from a new person
> who haven't made much technical or code aethesitics discussion here
> so far. If teaching others frustrates you and stops contributing to
> the project, that is a loss.
So let's post something noteworthy technical: the current code iteration
sported the following routine:
static struct blame_entry *blame_merge(struct blame_entry *list1,
struct blame_entry *list2)
{
struct blame_entry *p1 = list1, *p2 = list2,
**tail = &list1;
if (!p1)
return p2;
if (!p2)
return p1;
if (p1->s_lno <= p2->s_lno) {
do {
tail = &p1->next;
if ((p1 = *tail) == NULL) {
*tail = p2;
return list1;
}
} while (p1->s_lno <= p2->s_lno);
}
for (;;) {
*tail = p2;
do {
tail = &p2->next;
if ((p2 = *tail) == NULL) {
*tail = p1;
return list1;
}
} while (p1->s_lno > p2->s_lno);
*tail = p1;
do {
tail = &p1->next;
if ((p1 = *tail) == NULL) {
*tail = p2;
return list1;
}
} while (p1->s_lno <= p2->s_lno);
}
}
This is decidedly more complex than the equivalent
static struct blame_entry *blame_merge(struct blame_entry *list1,
struct blame_entry *list2)
{
struct blame_entry *p1 = list1, *p2 = list2,
**tail = &list1;
while (p1 && p2) {
if (p1->s_lno <= p2->s_lno) {
*tail = p1;
p1 = *(tail = &p1->next);
} else {
*tail = p2;
p2 = *(tail = &p2->next);
}
}
*tail = p1 ? p1 : p2;
return list1;
}
Why not use the latter one? Apart from the somewhat
obsessive-compulsive avoidance of checking the same condition twice in a
row (which nowadays compilers are pretty capable of taking into account
by themselves) the actually decisive difference is to avoid rewriting
links that are already pointing to the right place.
Those are the only write accesses that take place, and since we are
talking about _sorting_, it is to be expected that in the long run,
every record ends up in a cacheline different from its ultimate
neighbors. Also, assuming a more or less random distribution and
equally sized chains, about half of the links will already be correct.
In practice, the situation tends to be more degenerate, leading to a
_higher_ ratio of already correct links. Cutting down the expensive
writeout of cachelines by a factor of more than 2 makes quite a
difference in performance.
Does it matter here? Unlikely. Profiling tools tend to be useless for
seeing the impact of dirty cachelines since their operation
significantly interferes with the read/write patterns so this kind of
detail often escapes notice. But the total contribution of this stage
to the runtime of git blame will be insignificant either way.
> new blood bringing in new ideas is fine, but I'd want to see those new
> ideas backed by solid thiniking, and that means they may have to
> explain themselves to those who are new to their ideas.
Well, there _is_ solid thinking behind the above code, but there _also_
is solid thinking behind the notion that it won't matter in this case.
I still prefer not pushing out code in the wild that I would hesitate to
employ in _all_ comparable circumstances. Consider it as a pattern or a
style: a lot of thinking went into it once, and what this should buy you
is to never have to think about this issue again.
C++ is better at actual code reuse, but with C you basically get pattern
reuse. Your head is instantiating the templates. And it's a good idea
not to crowd one's head with unnecessary and/or suboptimal patterns.
--
David Kastrup
next prev parent reply other threads:[~2014-02-06 10:29 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-02-04 20:06 [PATCH] blame.c: prepare_lines should not call xrealloc for every line David Kastrup
2014-02-04 20:10 ` David Kastrup
2014-02-04 20:49 ` Junio C Hamano
2014-02-04 21:00 ` Junio C Hamano
2014-02-04 21:09 ` David Kastrup
2014-02-04 22:28 ` Philip Oakley
2014-02-04 22:48 ` Philip Oakley
2014-02-04 20:24 ` Junio C Hamano
2014-02-04 20:52 ` David Kastrup
2014-02-04 21:03 ` Junio C Hamano
2014-02-04 21:11 ` David Kastrup
2014-02-04 21:41 ` Junio C Hamano
2014-02-04 21:27 ` David Kastrup
2014-02-04 21:44 ` Junio C Hamano
2014-02-04 21:48 ` David Kastrup
2014-02-04 22:06 ` Junio C Hamano
2014-02-05 8:39 ` David Kastrup
2014-02-05 20:39 ` Junio C Hamano
2014-02-06 0:34 ` David Kastrup
2014-02-06 10:29 ` David Kastrup [this message]
2014-02-05 9:22 ` David Kastrup
2014-02-05 20:34 ` Junio C Hamano
2014-02-05 23:45 ` David Kastrup
-- strict thread matches above, loose matches on Subject: below --
2014-02-04 21:40 David Kastrup
2014-02-04 21:46 David Kastrup
2014-02-12 14:27 David Kastrup
2014-02-12 19:36 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87mwi4k003.fsf@fencepost.gnu.org \
--to=dak@gnu.org \
--cc=git@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.