From: Linus Torvalds <torvalds@linux-foundation.org>
To: Junio C Hamano <gitster@pobox.com>,
Git Mailing List <git@vger.kernel.org>
Subject: Wrong damage counting in diffcore_count_changes?
Date: Fri, 4 Dec 2009 12:07:47 -0800 (PST) [thread overview]
Message-ID: <alpine.LFD.2.00.0912041200120.24579@localhost.localdomain> (raw)
Ok, so I had somebody actually ask me about '--dirstat', and as a result I
ended up looking at how well the numbers it calculates really reflect the
damage done to a file.
And to my horror, it doesn't necessarily reflect the damage well at all!
Now, dirstat just takes the same damage numbers that git uses to estimate
similarity for renames, so if dirstat gets odd numbers, then that implies
that file similarity will also be somewhat odd.
So looking at _why_ the dirstat numbers were odd, I came up with this
patch that seems to make much sense. What used to happen is that
diffcore_count_changes() simply ignored any hashes in the destination that
didn't match hashes in the source. EXCEPT if the source hash didn't exist
at all, in which case it would count _one_ destination hash that happened
to have the "next" hash value.
This changes it so that
- whenever it bypasses a destination hash (because it doesn't match a
source), it counts the bytes associated with that as "literal added"
- at the end (once we have used up all the source hashes), we do the same
thing with the remaining destination hashes.
- when hashes do match, and we use the difference in counts as a value,
we also use up that destination hash entry (the 'd++'
This _seems_ to make --dirstat output more sensible, and I'd hope that
that in turn should mean that file rename detection should also be more
sensible. But I haven't actually verified it in any way. Maybe I just
screwed up file rename detection entirely.
Did I miss something?
Linus
---
diffcore-delta.c | 11 ++++++++++-
1 files changed, 10 insertions(+), 1 deletions(-)
diff --git a/diffcore-delta.c b/diffcore-delta.c
index e670f85..7cf431d 100644
--- a/diffcore-delta.c
+++ b/diffcore-delta.c
@@ -201,10 +201,15 @@ int diffcore_count_changes(struct diff_filespec *src,
while (d->cnt) {
if (d->hashval >= s->hashval)
break;
+ la += d->cnt;
d++;
}
src_cnt = s->cnt;
- dst_cnt = d->hashval == s->hashval ? d->cnt : 0;
+ dst_cnt = 0;
+ if (d->cnt && d->hashval == s->hashval) {
+ dst_cnt = d->cnt;
+ d++;
+ }
if (src_cnt < dst_cnt) {
la += dst_cnt - src_cnt;
sc += src_cnt;
@@ -213,6 +218,10 @@ int diffcore_count_changes(struct diff_filespec *src,
sc += dst_cnt;
s++;
}
+ while (d->cnt) {
+ la += d->cnt;
+ d++;
+ }
if (!src_count_p)
free(src_count);
next reply other threads:[~2009-12-04 20:08 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-12-04 20:07 Linus Torvalds [this message]
2009-12-04 20:47 ` Wrong damage counting in diffcore_count_changes? Junio C Hamano
2009-12-04 22:48 ` Linus Torvalds
2009-12-04 23:20 ` Linus Torvalds
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=alpine.LFD.2.00.0912041200120.24579@localhost.localdomain \
--to=torvalds@linux-foundation.org \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox