From: David Kastrup <dak@gnu.org>
To: git@vger.kernel.org
Subject: Re: [PATCH 3/3] builtin/blame.c: large-scale rewrite
Date: Mon, 27 Jan 2014 20:45:06 +0100 [thread overview]
Message-ID: <87a9eh8b0d.fsf@fencepost.gnu.org> (raw)
In-Reply-To: xmqq8uu1pdqq.fsf@gitster.dls.corp.google.com
Junio C Hamano <gitster@pobox.com> writes:
> David Kastrup <dak@gnu.org> writes:
>
>> The previous implementation uses a sorted linear list of struct
>> blame_entry in a struct scoreboard for organizing all partial or
>> completed work. Every task that is done requires going through the
>> whole list where most entries are not relevant to the task at hand.
>>
>> This commit reorganizes the data structures in order to have each
>> remaining subtask work with its own sorted linear list it can work off
>> front to back. Subtasks are organized into "struct origin" chains
>> hanging off particular commits. Commits are organized into a priority
>> queue, processing them in commit date order in order to keep most of
>> the work affecting a particular blob collated even in the presence of
>> an extensive merge history. In that manner, linear searches can be
>> basically avoided anywhere. They still are done for identifying one
>> of multiple analyzed files in a given commit, but the degenerate case
>> of a single large file being assembled from a multitude of smaller
>> files in the past is not likely to occur often enough to warrant
>> special treatment.
>> ---
>
> Sign-off?
Not while this is not fit for merging because of #if 0 etc and missing
functionality. This is just for review.
> Actually, I'd like to take my previous suggestion to add this as
> blame2 (to replace blame in the future) back. That was based on my
> fear/hope to see an implementation based on a completely different
> approach, but the basic premise of having one blame_entry record per
> the lines of the final image in the scoreboard, using diff between
> parents to the child to find common lines for passing the blame up,
> etc. have not changed at all and the change is all about organizing
> the pieces of data in a *much* *better* way to avoid needlessly
> finding what we already have computed.
Yes. Basically, the call graph outside of blame.c itself should be
pretty much the same.
> Style; please make /* and */ sit on their own line without anything
> else in multi-line comments.
Will do.
>> + struct origin *next;
>> struct commit *commit;
>> + /* `suspects' contains blame entries that may be attributed to
>> + * this origin's commit or to parent commits. When a commit
>> + * is being processed, all suspects will be moved, either by
>> + * assigning them to an origin in a different commit, or by
>> + * shipping them to the scoreboard's ent list because they
>> + * cannot be attributed to a different commit.
>> + */
>> + struct blame_entry *suspects;
>> mmfile_t file;
>> unsigned char blob_sha1[20];
>> unsigned mode;
>> + /* shipped gets set when shipping any suspects to the final
>> + * blame list instead of other commits
>> + */
>> + char shipped;
>
> Unused?
More like "added, forgotten, added under yet another name":
>> + char guilty;
I'll have to decide which one to keep.
>> + /* Should be present exactly once in commit chain */
>> + for (p = o->commit->util; p; l = p, p = p->next) {
>> + if (p == o)
>> + {
>
> Style; please have { sit with the control structure that opens the
> block, unless it is the opening brace of the function body or
> struct/union definition.
Ok.
>> +static struct blame_entry *
>> +blame_merge(struct blame_entry *list1,
>> + struct blame_entry *list2)
>
> Style; we do not do GNU-ish "function name begins a line". Even
> though I personally find it easier to use for things like 'grep',
> that is not the style we use around here.
Ok. I'm also certain to have a few "space between function name and
arguments" left and will grep for those before submitting the next
version.
Next version will at least include -M option, possibly leaving -C for
later.
--
David Kastrup
next prev parent reply other threads:[~2014-01-27 19:45 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-01-25 12:44 How to substructure rewrites? David Kastrup
2014-01-25 18:23 ` [PATCH 0/3] "Teaser" patch for rewriting blame for efficiency David Kastrup
2014-01-25 18:23 ` [PATCH 1/3] builtin/blame.c: struct blame_entry does not need a prev link David Kastrup
2014-01-25 18:23 ` [PATCH 2/3] Eliminate same_suspect function in builtin/blame.c David Kastrup
2014-01-25 18:23 ` [PATCH 3/3] builtin/blame.c: large-scale rewrite David Kastrup
2014-01-27 16:54 ` Junio C Hamano
2014-01-27 19:45 ` David Kastrup [this message]
2014-01-27 20:51 ` Junio C Hamano
2014-01-27 21:21 ` David Kastrup
2014-01-27 15:58 ` How to substructure rewrites? Junio C Hamano
2014-01-27 16:27 ` David Kastrup
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87a9eh8b0d.fsf@fencepost.gnu.org \
--to=dak@gnu.org \
--cc=git@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.