git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Tay Ray Chuan <rctay89@gmail.com>
To: "René Scharfe" <rene.scharfe@lsrfire.ath.cx>
Cc: Junio C Hamano <gitster@pobox.com>,
	Alexander Pepper <pepper@inf.fu-berlin.de>,
	git@vger.kernel.org
Subject: Re: Bug: git log --numstat counts wrong
Date: Fri, 23 Sep 2011 14:30:40 +0800	[thread overview]
Message-ID: <CALUzUxprUFGMR-WVEMOOvYiwkev1cfxHOyBmZq9bKJcHq5E2VA@mail.gmail.com> (raw)
In-Reply-To: <4E7B5F28.2020204@lsrfire.ath.cx>

On Fri, Sep 23, 2011 at 12:15 AM, René Scharfe
<rene.scharfe@lsrfire.ath.cx> wrote:
> The patch below reverts a part of 27af01d5523 that's not explained in its
> commit message and doesn't seem to contribute to the intended speedup.  It
> seems to restore the original diff output.  I don't know how it's actually
> doing that, though, as I haven't dug into the code at all.
>
> [snip]
>
> diff --git a/xdiff/xprepare.c b/xdiff/xprepare.c
> index 5a33d1a..e419f4f 100644
> --- a/xdiff/xprepare.c
> +++ b/xdiff/xprepare.c
> @@ -383,7 +383,7 @@ static int xdl_clean_mmatch(char const *dis, long i, long s, long e) {
>  * might be potentially discarded if they happear in a run of discardable.
>  */
>  static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xdf2) {
> -       long i, nm, nreff;
> +       long i, nm, nreff, mlim;
>        xrecord_t **recs;
>        xdlclass_t *rcrec;
>        char *dis, *dis1, *dis2;
> @@ -396,16 +396,20 @@ static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xd
>        dis1 = dis;
>        dis2 = dis1 + xdf1->nrec + 1;
>
> +       if ((mlim = xdl_bogosqrt(xdf1->nrec)) > XDL_MAX_EQLIMIT)
> +               mlim = XDL_MAX_EQLIMIT;
>        for (i = xdf1->dstart, recs = &xdf1->recs[xdf1->dstart]; i <= xdf1->dend; i++, recs++) {
>                rcrec = cf->rcrecs[(*recs)->ha];
>                nm = rcrec ? rcrec->len2 : 0;
> -               dis1[i] = (nm == 0) ? 0: 1;
> +               dis1[i] = (nm == 0) ? 0: (nm >= mlim) ? 2: 1;
>        }
>
> +       if ((mlim = xdl_bogosqrt(xdf2->nrec)) > XDL_MAX_EQLIMIT)
> +               mlim = XDL_MAX_EQLIMIT;
>        for (i = xdf2->dstart, recs = &xdf2->recs[xdf2->dstart]; i <= xdf2->dend; i++, recs++) {
>                rcrec = cf->rcrecs[(*recs)->ha];
>                nm = rcrec ? rcrec->len1 : 0;
> -               dis2[i] = (nm == 0) ? 0: 1;
> +               dis2[i] = (nm == 0) ? 0: (nm >= mlim) ? 2: 1;
>        }
>
>        for (nreff = 0, i = xdf1->dstart, recs = &xdf1->recs[xdf1->dstart];
>
>
>

Thanks for the patch, René.

Sorry for not explaining that part of the change.

My understanding of mlim is that it "caps" how deep the for loop at
around line 387 goes through a hash bucket/record chaing to find a
matching record from side A in side B (and vice-versa in a later
loop), probably to prevent running time from becoming too long.

But with 27af01d, this is no longer a concern. We can get an *exact*,
pre-computed count of matching records in the other side, so we don't
have go through the hash bucket. Thus mlim is no longer needed.

So re-introducing mlim doesn't seem right, even though it may fix this
"bug" (ie restore the old behaviour).

-- 
Cheers,
Ray Chuan

  reply	other threads:[~2011-09-23  6:30 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-09-21  9:03 Bug: git log --numstat counts wrong Alexander Pepper
2011-09-21 12:24 ` Junio C Hamano
2011-09-21 13:40   ` Alexander Pepper
2011-09-21 14:23   ` Alexander Pepper
2011-09-21 20:35     ` Junio C Hamano
2011-09-22 13:19       ` Alexander Pepper
2011-09-22 17:32         ` Junio C Hamano
2011-09-22 16:15       ` René Scharfe
2011-09-23  6:30         ` Tay Ray Chuan [this message]
2011-09-25 13:39         ` [PATCH] Revert removal of multi-match discard heuristic in 27af01 Tay Ray Chuan
2011-09-25 17:53           ` René Scharfe
2011-09-22 17:51       ` Bug: git log --numstat counts wrong Junio C Hamano
2011-09-23  9:18         ` Tay Ray Chuan
2011-09-23 16:38           ` Tay Ray Chuan
2011-09-23 19:23             ` Junio C Hamano
2011-09-23 10:30         ` Alexander Pepper

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CALUzUxprUFGMR-WVEMOOvYiwkev1cfxHOyBmZq9bKJcHq5E2VA@mail.gmail.com \
    --to=rctay89@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=pepper@inf.fu-berlin.de \
    --cc=rene.scharfe@lsrfire.ath.cx \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).