From: David Kastrup <dak@gnu.org>
To: git@vger.kernel.org
Subject: Re: How to substructure rewrites?
Date: Mon, 27 Jan 2014 17:27:38 +0100 [thread overview]
Message-ID: <87eh3t8k5h.fsf@fencepost.gnu.org> (raw)
In-Reply-To: xmqqppndpgbg.fsf@gitster.dls.corp.google.com
Junio C Hamano <gitster@pobox.com> writes:
> David Kastrup <dak@gnu.org> writes:
>
>> As it can easily be guessed, the "add xxx function" commits are
>> basically adding not-yet-used code (and so will not disrupt
>> compilation), but everything starting with "Reorganize blame data
>> structures" up until the final commit will not work or compile since the
>> code does not match the data structures.
>>
>> So there is little point in substructing all that, right? Even
>> something seemingly isolated like
>>
>> commit f64b41c472442ae9971321fe8f62c3885ba4d8b7
>> Author: David Kastrup <dak@gnu.org>
>> Date: Sun Jan 19 02:16:21 2014 +0100
>>
>> blame.c: Let output determine MORE_THAN_ONE_PATH more efficiently
>>
>> is not really useful as a separate commit since while it does implement
>> a particular task, this is done starting with non-working code relying
>> on no-longer existent data structures.
>
> Small pieces that are incrementally added with their own
> documentation would certainly be a lot easier to read than one big
> ball of wax.
Sure. The problem is that my rewrite is characterized by doing as
little as possible in order to achieve identical results (with the
conceivable exception of picking a different, equally scored variant in
those parts of the algorithm choosing a maximum). That also means that
the basic logic and layout of the program stays the same while the data
flow and parts of the data structures are replaced.
> I am wondering if it would make it easier for everybody to tentatively
> do "git-blame vs git-blame2" dance here, just like we did "git-blame
> vs git-annotate" dance some years ago. That is, to add a completely
> new command and have them in parallel while cooking in 'next' (or we
> could even keep them in a few releases if we are not absolutely
> certain about the correctness of the result of the new code), aiming
> to eventually retire the current implementation and replace it with
> the new one. We have already have test infrastructure to allow us to
> run variants of blames, too, to help that kind of transition.
Well, the point is that the implementation is supposed to
a) deliver identical results
b) reuse as much code as possible
so there is no real point in working with a separate source file.
For the "if we are not absolutely certain about the correctness of the
result of the new code" angle, this should be covered with the usual
stable/unstable/proposed division most projects have in some way or
another for quality assurance. I have absolutely no clue how Git
organizes that, but it would usually mean that the new code is not
placed in a different _file_ (or a differently named command) but rather
in a different _branch_ as compared with the current implementation.
>> In general, the rule is likely "any commit should not create a
>> non-working state" right?
>
> Yes.
My current aim is to complete the code to the point where it is
a) fully operative and delivering equivalent results to the current
implementation
b) in every aspect at least as efficient as the current implementation
and in a state that is not basically less comprehensible than what I
started with
Since the change of the data structures and data flow requires changing
all affected program parts to get to a working state, and since I don't
have ambitions to do more than that which is required to get there,
I don't see how the bulk of the work can sensibly avoid coming as one
"omnibus" patch. Most changes, however, will be understandable quite
well locally.
For example, currently the code has a number of loops traversing one
global linked list, ignoring all entries not relevant to a particular
target, and doing something with the rest. Those loops generally are
replaced with a simpler loop just running through a single _completely_
relevant linked list. Even while those replacements are scattered
throughout the patch, they make sense without having to look at the rest
of the patch.
--
David Kastrup
prev parent reply other threads:[~2014-01-27 16:27 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-01-25 12:44 How to substructure rewrites? David Kastrup
2014-01-25 18:23 ` [PATCH 0/3] "Teaser" patch for rewriting blame for efficiency David Kastrup
2014-01-25 18:23 ` [PATCH 1/3] builtin/blame.c: struct blame_entry does not need a prev link David Kastrup
2014-01-25 18:23 ` [PATCH 2/3] Eliminate same_suspect function in builtin/blame.c David Kastrup
2014-01-25 18:23 ` [PATCH 3/3] builtin/blame.c: large-scale rewrite David Kastrup
2014-01-27 16:54 ` Junio C Hamano
2014-01-27 19:45 ` David Kastrup
2014-01-27 20:51 ` Junio C Hamano
2014-01-27 21:21 ` David Kastrup
2014-01-27 15:58 ` How to substructure rewrites? Junio C Hamano
2014-01-27 16:27 ` David Kastrup [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87eh3t8k5h.fsf@fencepost.gnu.org \
--to=dak@gnu.org \
--cc=git@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).