From mboxrd@z Thu Jan 1 00:00:00 1970 From: David Kastrup Subject: Re: [PATCH] blame.c: prepare_lines should not call xrealloc for every line Date: Thu, 06 Feb 2014 11:29:16 +0100 Organization: Organization?!? Message-ID: <87mwi4k003.fsf@fencepost.gnu.org> References: <1391544367-14599-1-git-send-email-dak@gnu.org> <874n4ewouz.fsf@fencepost.gnu.org> <87zjm6v99y.fsf@fencepost.gnu.org> <87vbwuuf5g.fsf@fencepost.gnu.org> Mime-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: QUOTED-PRINTABLE To: git@vger.kernel.org X-From: git-owner@vger.kernel.org Thu Feb 06 11:29:52 2014 Return-path: Envelope-to: gcvg-git-2@plane.gmane.org Received: from vger.kernel.org ([209.132.180.67]) by plane.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1WBMDP-0002f9-9U for gcvg-git-2@plane.gmane.org; Thu, 06 Feb 2014 11:29:51 +0100 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754289AbaBFK3e convert rfc822-to-quoted-printable (ORCPT ); Thu, 6 Feb 2014 05:29:34 -0500 Received: from plane.gmane.org ([80.91.229.3]:41905 "EHLO plane.gmane.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754010AbaBFK3c (ORCPT ); Thu, 6 Feb 2014 05:29:32 -0500 Received: from list by plane.gmane.org with local (Exim 4.69) (envelope-from ) id 1WBMD3-0002PV-Og for git@vger.kernel.org; Thu, 06 Feb 2014 11:29:29 +0100 Received: from x2f4740e.dyn.telefonica.de ([2.244.116.14]) by main.gmane.org with esmtp (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Thu, 06 Feb 2014 11:29:29 +0100 Received: from dak by x2f4740e.dyn.telefonica.de with local (Gmexim 0.1 (Debian)) id 1AlnuQ-0007hv-00 for ; Thu, 06 Feb 2014 11:29:29 +0100 X-Injected-Via-Gmane: http://gmane.org/ X-Complaints-To: usenet@ger.gmane.org X-Gmane-NNTP-Posting-Host: x2f4740e.dyn.telefonica.de X-Face: 2FEFf>]>q>2iw=B6,xrUubRI>pR&Ml9=ao@P@i)L:\urd*t9M~y1^:+Y]'C0~{mAl`oQuAl \!3KEIp?*w`|bL5qr,H)LFO6Q=qx~iH4DN;i";/yuIsqbLLCh/!U#X[S~(5eZ41to5f%E@'ELIi$t^ Vc\LWP@J5p^rst0+('>Er0=^1{]M9!p?&:\z]|;&=NP3AhB!B_bi^]Pfkw User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3.50 (gnu/linux) Cancel-Lock: sha1:hit1dZx/SUL9ogqWNqzifL/fnfw= Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org Archived-At: Junio C Hamano writes: > David Kastrup writes: > >> It's snake oil making debugging harder. > > OK, that is a sensible argument. > >>> This was fun ;-) >> >> At the expense of seriously impacting my motivation to do any furthe= r >> code cleanup on Git. > > Well, I said it was "fun" because I was learning from a new person > who haven't made much technical or code aethesitics discussion here > so far. If teaching others frustrates you and stops contributing to > the project, that is a loss. So let's post something noteworthy technical: the current code iteratio= n sported the following routine: static struct blame_entry *blame_merge(struct blame_entry *list1, struct blame_entry *list2) { struct blame_entry *p1 =3D list1, *p2 =3D list2, **tail =3D &list1; if (!p1) return p2; if (!p2) return p1; if (p1->s_lno <=3D p2->s_lno) { do { tail =3D &p1->next; if ((p1 =3D *tail) =3D=3D NULL) { *tail =3D p2; return list1; } } while (p1->s_lno <=3D p2->s_lno); } for (;;) { *tail =3D p2; do { tail =3D &p2->next; if ((p2 =3D *tail) =3D=3D NULL) { *tail =3D p1; return list1; } } while (p1->s_lno > p2->s_lno); *tail =3D p1; do { tail =3D &p1->next; if ((p1 =3D *tail) =3D=3D NULL) { *tail =3D p2; return list1; } } while (p1->s_lno <=3D p2->s_lno); } } This is decidedly more complex than the equivalent static struct blame_entry *blame_merge(struct blame_entry *list1, struct blame_entry *list2) { struct blame_entry *p1 =3D list1, *p2 =3D list2, **tail =3D &list1; while (p1 && p2) { if (p1->s_lno <=3D p2->s_lno) { *tail =3D p1; p1 =3D *(tail =3D &p1->next); } else { *tail =3D p2; p2 =3D *(tail =3D &p2->next); } } *tail =3D p1 ? p1 : p2; return list1; } Why not use the latter one? Apart from the somewhat obsessive-compulsive avoidance of checking the same condition twice in = a row (which nowadays compilers are pretty capable of taking into account by themselves) the actually decisive difference is to avoid rewriting links that are already pointing to the right place. Those are the only write accesses that take place, and since we are talking about _sorting_, it is to be expected that in the long run, every record ends up in a cacheline different from its ultimate neighbors. Also, assuming a more or less random distribution and equally sized chains, about half of the links will already be correct. In practice, the situation tends to be more degenerate, leading to a _higher_ ratio of already correct links. Cutting down the expensive writeout of cachelines by a factor of more than 2 makes quite a difference in performance. Does it matter here? Unlikely. Profiling tools tend to be useless for seeing the impact of dirty cachelines since their operation significantly interferes with the read/write patterns so this kind of detail often escapes notice. But the total contribution of this stage to the runtime of git blame will be insignificant either way. > new blood bringing in new ideas is fine, but I'd want to see those ne= w > ideas backed by solid thiniking, and that means they may have to > explain themselves to those who are new to their ideas. Well, there _is_ solid thinking behind the above code, but there _also_ is solid thinking behind the notion that it won't matter in this case. I=A0still prefer not pushing out code in the wild that I=A0would hesita= te to employ in _all_ comparable circumstances. Consider it as a pattern or = a style: a lot of thinking went into it once, and what this should buy yo= u is to never have to think about this issue again. C++ is better at actual code reuse, but with C you basically get patter= n reuse. Your head is instantiating the templates. And it's a good idea not to crowd one's head with unnecessary and/or suboptimal patterns. --=20 David Kastrup