* Bug: git log --numstat counts wrong @ 2011-09-21 9:03 Alexander Pepper 2011-09-21 12:24 ` Junio C Hamano 0 siblings, 1 reply; 16+ messages in thread From: Alexander Pepper @ 2011-09-21 9:03 UTC (permalink / raw) To: git Hello there. I already reported some similar bug with git log --numstat to this mailinglist (see http://www.spinics.net/lists/git/msg163358.html ). Back then empty lines seems to be the issue, but the bug was never fixed. I found another case, where git log --numstat counts wrong. This time git log --numstat yields bigger numbers than diffstat. Minimal example: $ git clone https://github.com/voldemort/voldemort.git $ cd voldemort/ $ git show 48a07e7e533f507228e8d1c99d4d48e175e14260 -- src/java/voldemort/server/storage/StorageService.java | diffstat StorageService.java | 19 ++++++++++--------- 1 file changed, 10 insertions(+), 9 deletions(-) $ git log --numstat 48a07e7e533f507228e8d1c99d4d48e175e14260 [...] 11 10 src/java/voldemort/server/storage/StorageService.java So git log --numstat claimes that 11 lines where added, where diffstat only counts 10! A closer look inside the StorageService.java reveals no empty lines. My system: * Mac osx 10.6.8 * git 1.7.5.4 (but also check with a self compiled 1.7.6.1) Can you confirm that this is a bug? If so, are there plans to fix it in the future? Greetings from Berlin Alex ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Bug: git log --numstat counts wrong 2011-09-21 9:03 Bug: git log --numstat counts wrong Alexander Pepper @ 2011-09-21 12:24 ` Junio C Hamano 2011-09-21 13:40 ` Alexander Pepper 2011-09-21 14:23 ` Alexander Pepper 0 siblings, 2 replies; 16+ messages in thread From: Junio C Hamano @ 2011-09-21 12:24 UTC (permalink / raw) To: Alexander Pepper; +Cc: git Alexander Pepper <pepper@inf.fu-berlin.de> writes: > $ git log --numstat 48a07e7e533f507228e8d1c99d4d48e175e14260 > [...] > 11 10 src/java/voldemort/server/storage/StorageService.java Didn't we update it this already? I seem to get 10/9 here not 11/10. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Bug: git log --numstat counts wrong 2011-09-21 12:24 ` Junio C Hamano @ 2011-09-21 13:40 ` Alexander Pepper 2011-09-21 14:23 ` Alexander Pepper 1 sibling, 0 replies; 16+ messages in thread From: Alexander Pepper @ 2011-09-21 13:40 UTC (permalink / raw) To: Junio C Hamano; +Cc: git Am 21.09.2011 um 14:24 schrieb Junio C Hamano: >> $ git log --numstat 48a07e7e533f507228e8d1c99d4d48e175e14260 >> [...] >> 11 10 src/java/voldemort/server/storage/StorageService.java > > Didn't we update it this already? I seem to get 10/9 here not 11/10. I just compiled git fresh from the master (4b5eac7f) and the issue is still active there: $ ../git/git --version git version 1.7.7.rc0.72.g4b5ea $ ../git/git log --numstat 48a07e7e533f507228e8d1c99d4d48e175e14260 [...] 11 10 src/java/voldemort/server/storage/StorageService.java Should I check another branch? ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Bug: git log --numstat counts wrong 2011-09-21 12:24 ` Junio C Hamano 2011-09-21 13:40 ` Alexander Pepper @ 2011-09-21 14:23 ` Alexander Pepper 2011-09-21 20:35 ` Junio C Hamano 1 sibling, 1 reply; 16+ messages in thread From: Alexander Pepper @ 2011-09-21 14:23 UTC (permalink / raw) To: Junio C Hamano; +Cc: git Am 21.09.2011 um 14:24 schrieb Junio C Hamano: >> $ git log --numstat 48a07e7e533f507228e8d1c99d4d48e175e14260 >> [...] >> 11 10 src/java/voldemort/server/storage/StorageService.java > > Didn't we update it this already? I seem to get 10/9 here not 11/10. Besides master I tried some other branches. Current 'maint' (cd2b8ae9), 'master' (4b5eac7f) and v1.7.7-rc0 (a452d148) are still counting wrong, but 'next' (3be2039f) and 'pu' (b4bcbace) do count correctly. I'll have a closer look with the 'next' branch if all counts are correct now. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Bug: git log --numstat counts wrong 2011-09-21 14:23 ` Alexander Pepper @ 2011-09-21 20:35 ` Junio C Hamano 2011-09-22 13:19 ` Alexander Pepper ` (2 more replies) 0 siblings, 3 replies; 16+ messages in thread From: Junio C Hamano @ 2011-09-21 20:35 UTC (permalink / raw) To: Alexander Pepper; +Cc: git, Tay Ray Chuan Alexander Pepper <pepper@inf.fu-berlin.de> writes: > Am 21.09.2011 um 14:24 schrieb Junio C Hamano: >>> $ git log --numstat 48a07e7e533f507228e8d1c99d4d48e175e14260 >>> [...] >>> 11 10 src/java/voldemort/server/storage/StorageService.java >> >> Didn't we update it this already? I seem to get 10/9 here not 11/10. > > Current 'maint' (cd2b8ae9), 'master' (4b5eac7f)... That's a tad old master you seem to have. Strangely, bisection points at 27af01d5523, which was supposed to be only about performance and never about correctness. There is something fishy going on.... ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Bug: git log --numstat counts wrong 2011-09-21 20:35 ` Junio C Hamano @ 2011-09-22 13:19 ` Alexander Pepper 2011-09-22 17:32 ` Junio C Hamano 2011-09-22 16:15 ` René Scharfe 2011-09-22 17:51 ` Bug: git log --numstat counts wrong Junio C Hamano 2 siblings, 1 reply; 16+ messages in thread From: Alexander Pepper @ 2011-09-22 13:19 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, Tay Ray Chuan Am 21.09.2011 um 22:35 schrieb Junio C Hamano: > That's a tad old master you seem to have. Up until now I only followed the github clone, but that seems to be dated. Since kernel.org is still down, I'll now follow the google code clone. Am 21.09.2011 um 22:35 schrieb Junio C Hamano: >> Am 21.09.2011 um 14:24 schrieb Junio C Hamano: >>>> $ git log --numstat 48a07e7e533f507228e8d1c99d4d48e175e14260 >>>> [...] >>>> 11 10 src/java/voldemort/server/storage/StorageService.java >>> >>> Didn't we update it this already? I seem to get 10/9 here not 11/10. >> >> Current 'maint' (cd2b8ae9), 'master' (4b5eac7f)... > > Strangely, bisection points at 27af01d5523, which was supposed to be only > about performance and never about correctness. There is something fishy > going on.... I also did some tests, and besides --numstat also git show sometimes show different patches in comparison to older git versions. The last version with the "old" git show output is 1.7.7.rc0 and the first version with the "new" git show output is 1.7.7.rc1. with git version 1.7.7.rc0: $ git show 679e1d5cd007a0a9cb2813bd155622d7a1e904bd : [...] diff --git a/src/java/voldemort/store/stats/RequestCounter.java b/src/java/voldemort/store/stats/RequestCounter.java index b012e98..c6be603 100644 --- a/src/java/voldemort/store/stats/RequestCounter.java +++ b/src/java/voldemort/store/stats/RequestCounter.java @@ -64,16 +64,21 @@ public class RequestCounter { Accumulator accum = values.get(); long now = System.currentTimeMillis(); - if(now - accum.startTimeMS > durationMS) { - Accumulator newWithTotal = accum.newWithTotal(); - values.set(newWithTotal); - - /* - * try to set. if we fail, then someone else set it, so just keep going - */ - if(values.compareAndSet(accum, newWithTotal)) { - return newWithTotal; - } + /* + * if still in the window, just return it + */ + if(now - accum.startTimeMS <= durationMS) { + return accum; + } + + /* + * try to set. if we fail, then someone else set it, so just return that new one + */ + + Accumulator newWithTotal = accum.newWithTotal(); + + if(values.compareAndSet(accum, newWithTotal)) { + return newWithTotal; } return values.get(); with git version 1.7.7.rc1: $ git show 679e1d5cd007a0a9cb2813bd155622d7a1e904bd [...] diff --git a/src/java/voldemort/store/stats/RequestCounter.java b/src/java/voldemort/store/stats/RequestCounter.java index b012e98..c6be603 100644 --- a/src/java/voldemort/store/stats/RequestCounter.java +++ b/src/java/voldemort/store/stats/RequestCounter.java @@ -64,16 +64,21 @@ public class RequestCounter { Accumulator accum = values.get(); long now = System.currentTimeMillis(); - if(now - accum.startTimeMS > durationMS) { - Accumulator newWithTotal = accum.newWithTotal(); - values.set(newWithTotal); + /* + * if still in the window, just return it + */ + if(now - accum.startTimeMS <= durationMS) { + return accum; + } - /* - * try to set. if we fail, then someone else set it, so just keep going - */ - if(values.compareAndSet(accum, newWithTotal)) { - return newWithTotal; - } + /* + * try to set. if we fail, then someone else set it, so just return that new one + */ + + Accumulator newWithTotal = accum.newWithTotal(); + + if(values.compareAndSet(accum, newWithTotal)) { + return newWithTotal; } return values.get(); The difference is, that now it's shown as two delete and two added hunks instead of one bigger delete and one bigger added hunk. with git version 1.7.7.rc1: $ git show 679e1d5cd007a0a9cb2813bd155622d7a1e904bd | diffstat RequestCounter.java | 23 ++++++++++++++--------- 1 file changed, 14 insertions(+), 9 deletions(-) with git version 1.7.7.rc0: $ git show 679e1d5cd007a0a9cb2813bd155622d7a1e904bd | diffstat RequestCounter.java | 25 +++++++++++++++---------- 1 file changed, 15 insertions(+), 10 deletions(-) When used git version 1.7.7.rc1 I didn't observed any case where git show and git log --numstat mismatch. I'm only a little confused, that 'git show' yields different results, depending on the git version. Greetings from Berlin Alex ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: Bug: git log --numstat counts wrong 2011-09-22 13:19 ` Alexander Pepper @ 2011-09-22 17:32 ` Junio C Hamano 0 siblings, 0 replies; 16+ messages in thread From: Junio C Hamano @ 2011-09-22 17:32 UTC (permalink / raw) To: Alexander Pepper; +Cc: git, Tay Ray Chuan Alexander Pepper <pepper@inf.fu-berlin.de> writes: > When used git version 1.7.7.rc1 I didn't observed any case where git > show and git log --numstat mismatch. I'm only a little confused, that > 'git show' yields different results, depending on the git version. In general it is not surprising nor unexpected--as long as both patches describe the change correctly, they are both valid. What was unexpected to me was that 27af01d (xdiff/xprepare: improve O(n*m) performance in xdl_cleanup_records(), 2011-08-17) which was supposed to be only about performance and not about logic made that difference. ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Bug: git log --numstat counts wrong 2011-09-21 20:35 ` Junio C Hamano 2011-09-22 13:19 ` Alexander Pepper @ 2011-09-22 16:15 ` René Scharfe 2011-09-23 6:30 ` Tay Ray Chuan 2011-09-25 13:39 ` [PATCH] Revert removal of multi-match discard heuristic in 27af01 Tay Ray Chuan 2011-09-22 17:51 ` Bug: git log --numstat counts wrong Junio C Hamano 2 siblings, 2 replies; 16+ messages in thread From: René Scharfe @ 2011-09-22 16:15 UTC (permalink / raw) To: git; +Cc: git Am 21.09.2011 22:35, schrieb Junio C Hamano: > Alexander Pepper <pepper@inf.fu-berlin.de> writes: > >> Am 21.09.2011 um 14:24 schrieb Junio C Hamano: >>>> $ git log --numstat 48a07e7e533f507228e8d1c99d4d48e175e14260 >>>> [...] >>>> 11 10 src/java/voldemort/server/storage/StorageService.java >>> >>> Didn't we update it this already? I seem to get 10/9 here not 11/10. >> >> Current 'maint' (cd2b8ae9), 'master' (4b5eac7f)... > > That's a tad old master you seem to have. > > Strangely, bisection points at 27af01d5523, which was supposed to be only > about performance and never about correctness. There is something fishy > going on.... The patch below reverts a part of 27af01d5523 that's not explained in its commit message and doesn't seem to contribute to the intended speedup. It seems to restore the original diff output. I don't know how it's actually doing that, though, as I haven't dug into the code at all. Alexander, can you confirm that this patch restores the old behaviour of git diff and git show for your test cases? Ray, are you able to write a commit message for this patch if it turns out to be useful? René diff --git a/xdiff/xprepare.c b/xdiff/xprepare.c index 5a33d1a..e419f4f 100644 --- a/xdiff/xprepare.c +++ b/xdiff/xprepare.c @@ -383,7 +383,7 @@ static int xdl_clean_mmatch(char const *dis, long i, long s, long e) { * might be potentially discarded if they happear in a run of discardable. */ static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xdf2) { - long i, nm, nreff; + long i, nm, nreff, mlim; xrecord_t **recs; xdlclass_t *rcrec; char *dis, *dis1, *dis2; @@ -396,16 +396,20 @@ static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xd dis1 = dis; dis2 = dis1 + xdf1->nrec + 1; + if ((mlim = xdl_bogosqrt(xdf1->nrec)) > XDL_MAX_EQLIMIT) + mlim = XDL_MAX_EQLIMIT; for (i = xdf1->dstart, recs = &xdf1->recs[xdf1->dstart]; i <= xdf1->dend; i++, recs++) { rcrec = cf->rcrecs[(*recs)->ha]; nm = rcrec ? rcrec->len2 : 0; - dis1[i] = (nm == 0) ? 0: 1; + dis1[i] = (nm == 0) ? 0: (nm >= mlim) ? 2: 1; } + if ((mlim = xdl_bogosqrt(xdf2->nrec)) > XDL_MAX_EQLIMIT) + mlim = XDL_MAX_EQLIMIT; for (i = xdf2->dstart, recs = &xdf2->recs[xdf2->dstart]; i <= xdf2->dend; i++, recs++) { rcrec = cf->rcrecs[(*recs)->ha]; nm = rcrec ? rcrec->len1 : 0; - dis2[i] = (nm == 0) ? 0: 1; + dis2[i] = (nm == 0) ? 0: (nm >= mlim) ? 2: 1; } for (nreff = 0, i = xdf1->dstart, recs = &xdf1->recs[xdf1->dstart]; ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: Bug: git log --numstat counts wrong 2011-09-22 16:15 ` René Scharfe @ 2011-09-23 6:30 ` Tay Ray Chuan 2011-09-25 13:39 ` [PATCH] Revert removal of multi-match discard heuristic in 27af01 Tay Ray Chuan 1 sibling, 0 replies; 16+ messages in thread From: Tay Ray Chuan @ 2011-09-23 6:30 UTC (permalink / raw) To: René Scharfe; +Cc: Junio C Hamano, Alexander Pepper, git On Fri, Sep 23, 2011 at 12:15 AM, René Scharfe <rene.scharfe@lsrfire.ath.cx> wrote: > The patch below reverts a part of 27af01d5523 that's not explained in its > commit message and doesn't seem to contribute to the intended speedup. It > seems to restore the original diff output. I don't know how it's actually > doing that, though, as I haven't dug into the code at all. > > [snip] > > diff --git a/xdiff/xprepare.c b/xdiff/xprepare.c > index 5a33d1a..e419f4f 100644 > --- a/xdiff/xprepare.c > +++ b/xdiff/xprepare.c > @@ -383,7 +383,7 @@ static int xdl_clean_mmatch(char const *dis, long i, long s, long e) { > * might be potentially discarded if they happear in a run of discardable. > */ > static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xdf2) { > - long i, nm, nreff; > + long i, nm, nreff, mlim; > xrecord_t **recs; > xdlclass_t *rcrec; > char *dis, *dis1, *dis2; > @@ -396,16 +396,20 @@ static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xd > dis1 = dis; > dis2 = dis1 + xdf1->nrec + 1; > > + if ((mlim = xdl_bogosqrt(xdf1->nrec)) > XDL_MAX_EQLIMIT) > + mlim = XDL_MAX_EQLIMIT; > for (i = xdf1->dstart, recs = &xdf1->recs[xdf1->dstart]; i <= xdf1->dend; i++, recs++) { > rcrec = cf->rcrecs[(*recs)->ha]; > nm = rcrec ? rcrec->len2 : 0; > - dis1[i] = (nm == 0) ? 0: 1; > + dis1[i] = (nm == 0) ? 0: (nm >= mlim) ? 2: 1; > } > > + if ((mlim = xdl_bogosqrt(xdf2->nrec)) > XDL_MAX_EQLIMIT) > + mlim = XDL_MAX_EQLIMIT; > for (i = xdf2->dstart, recs = &xdf2->recs[xdf2->dstart]; i <= xdf2->dend; i++, recs++) { > rcrec = cf->rcrecs[(*recs)->ha]; > nm = rcrec ? rcrec->len1 : 0; > - dis2[i] = (nm == 0) ? 0: 1; > + dis2[i] = (nm == 0) ? 0: (nm >= mlim) ? 2: 1; > } > > for (nreff = 0, i = xdf1->dstart, recs = &xdf1->recs[xdf1->dstart]; > > > Thanks for the patch, René. Sorry for not explaining that part of the change. My understanding of mlim is that it "caps" how deep the for loop at around line 387 goes through a hash bucket/record chaing to find a matching record from side A in side B (and vice-versa in a later loop), probably to prevent running time from becoming too long. But with 27af01d, this is no longer a concern. We can get an *exact*, pre-computed count of matching records in the other side, so we don't have go through the hash bucket. Thus mlim is no longer needed. So re-introducing mlim doesn't seem right, even though it may fix this "bug" (ie restore the old behaviour). -- Cheers, Ray Chuan ^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH] Revert removal of multi-match discard heuristic in 27af01 2011-09-22 16:15 ` René Scharfe 2011-09-23 6:30 ` Tay Ray Chuan @ 2011-09-25 13:39 ` Tay Ray Chuan 2011-09-25 17:53 ` René Scharfe 1 sibling, 1 reply; 16+ messages in thread From: Tay Ray Chuan @ 2011-09-25 13:39 UTC (permalink / raw) To: Git Mailing List; +Cc: Junio C Hamano, René Scharfe From: René Scharfe <rene.scharfe@lsrfire.ath.cx> 27af01d (xdiff/xprepare: improve O(n*m) performance in xdl_cleanup_records(), 2011-08-17) was supposed to be a performance boost only. However, it unexpectedly changed the behaviour of diff. Revert a part of 27af01d that removes logic that mark lines as "multi-match" (ie. dis[i] == 2). This was preventing the multi-match discard heuristic (performed in xdl_cleanup_records() and xdl_clean_mmatch()) from executing. Reported-by: Alexander Pepper <pepper@inf.fu-berlin.de> Signed-off-by: Tay Ray Chuan <rctay89@gmail.com> --- Junio, this replaces the patch the one in the 'rs/diff-cleanup-records-fix' topic in 'pu'. The only difference is in the patch message. René, will need your SOB on this. Thanks for working to produce the patch. Please disregard my earlier message [1], further reading has shown my previous understanding to be wrong. [1] <CALUzUxprUFGMR-WVEMOOvYiwkev1cfxHOyBmZq9bKJcHq5E2VA@mail.gmail.com> --- xdiff/xprepare.c | 10 +++++++--- 1 files changed, 7 insertions(+), 3 deletions(-) diff --git a/xdiff/xprepare.c b/xdiff/xprepare.c index 05a8f01..4c447ca 100644 --- a/xdiff/xprepare.c +++ b/xdiff/xprepare.c @@ -398,7 +398,7 @@ static int xdl_clean_mmatch(char const *dis, long i, long s, long e) { * might be potentially discarded if they happear in a run of discardable. */ static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xdf2) { - long i, nm, nreff; + long i, nm, nreff, mlim; xrecord_t **recs; xdlclass_t *rcrec; char *dis, *dis1, *dis2; @@ -411,16 +411,20 @@ static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xd dis1 = dis; dis2 = dis1 + xdf1->nrec + 1; + if ((mlim = xdl_bogosqrt(xdf1->nrec)) > XDL_MAX_EQLIMIT) + mlim = XDL_MAX_EQLIMIT; for (i = xdf1->dstart, recs = &xdf1->recs[xdf1->dstart]; i <= xdf1->dend; i++, recs++) { rcrec = cf->rcrecs[(*recs)->ha]; nm = rcrec ? rcrec->len2 : 0; - dis1[i] = (nm == 0) ? 0: 1; + dis1[i] = (nm == 0) ? 0: (nm >= mlim) ? 2: 1; } + if ((mlim = xdl_bogosqrt(xdf2->nrec)) > XDL_MAX_EQLIMIT) + mlim = XDL_MAX_EQLIMIT; for (i = xdf2->dstart, recs = &xdf2->recs[xdf2->dstart]; i <= xdf2->dend; i++, recs++) { rcrec = cf->rcrecs[(*recs)->ha]; nm = rcrec ? rcrec->len1 : 0; - dis2[i] = (nm == 0) ? 0: 1; + dis2[i] = (nm == 0) ? 0: (nm >= mlim) ? 2: 1; } for (nreff = 0, i = xdf1->dstart, recs = &xdf1->recs[xdf1->dstart]; -- 1.7.7.rc3.432.g6bcf0 ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH] Revert removal of multi-match discard heuristic in 27af01 2011-09-25 13:39 ` [PATCH] Revert removal of multi-match discard heuristic in 27af01 Tay Ray Chuan @ 2011-09-25 17:53 ` René Scharfe 0 siblings, 0 replies; 16+ messages in thread From: René Scharfe @ 2011-09-25 17:53 UTC (permalink / raw) To: Tay Ray Chuan; +Cc: Git Mailing List, Junio C Hamano Am 25.09.2011 15:39, schrieb Tay Ray Chuan: > From: René Scharfe <rene.scharfe@lsrfire.ath.cx> > > 27af01d (xdiff/xprepare: improve O(n*m) performance in > xdl_cleanup_records(), 2011-08-17) was supposed to be a performance > boost only. However, it unexpectedly changed the behaviour of diff. > > Revert a part of 27af01d that removes logic that mark lines as > "multi-match" (ie. dis[i] == 2). This was preventing the multi-match > discard heuristic (performed in xdl_cleanup_records() and > xdl_clean_mmatch()) from executing. > > Reported-by: Alexander Pepper <pepper@inf.fu-berlin.de> > Signed-off-by: Tay Ray Chuan <rctay89@gmail.com> > > --- > > Junio, this replaces the patch the one in the > 'rs/diff-cleanup-records-fix' topic in 'pu'. The only difference is in > the patch message. > > René, will need your SOB on this. Thanks for working to produce the > patch. Please disregard my earlier message [1], further reading has > shown my previous understanding to be wrong. Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx> > > [1] <CALUzUxprUFGMR-WVEMOOvYiwkev1cfxHOyBmZq9bKJcHq5E2VA@mail.gmail.com> > --- > xdiff/xprepare.c | 10 +++++++--- > 1 files changed, 7 insertions(+), 3 deletions(-) > > diff --git a/xdiff/xprepare.c b/xdiff/xprepare.c > index 05a8f01..4c447ca 100644 > --- a/xdiff/xprepare.c > +++ b/xdiff/xprepare.c > @@ -398,7 +398,7 @@ static int xdl_clean_mmatch(char const *dis, long i, long s, long e) { > * might be potentially discarded if they happear in a run of discardable. > */ > static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xdf2) { > - long i, nm, nreff; > + long i, nm, nreff, mlim; > xrecord_t **recs; > xdlclass_t *rcrec; > char *dis, *dis1, *dis2; > @@ -411,16 +411,20 @@ static int xdl_cleanup_records(xdlclassifier_t *cf, xdfile_t *xdf1, xdfile_t *xd > dis1 = dis; > dis2 = dis1 + xdf1->nrec + 1; > > + if ((mlim = xdl_bogosqrt(xdf1->nrec)) > XDL_MAX_EQLIMIT) > + mlim = XDL_MAX_EQLIMIT; > for (i = xdf1->dstart, recs = &xdf1->recs[xdf1->dstart]; i <= xdf1->dend; i++, recs++) { > rcrec = cf->rcrecs[(*recs)->ha]; > nm = rcrec ? rcrec->len2 : 0; > - dis1[i] = (nm == 0) ? 0: 1; > + dis1[i] = (nm == 0) ? 0: (nm >= mlim) ? 2: 1; > } > > + if ((mlim = xdl_bogosqrt(xdf2->nrec)) > XDL_MAX_EQLIMIT) > + mlim = XDL_MAX_EQLIMIT; > for (i = xdf2->dstart, recs = &xdf2->recs[xdf2->dstart]; i <= xdf2->dend; i++, recs++) { > rcrec = cf->rcrecs[(*recs)->ha]; > nm = rcrec ? rcrec->len1 : 0; > - dis2[i] = (nm == 0) ? 0: 1; > + dis2[i] = (nm == 0) ? 0: (nm >= mlim) ? 2: 1; > } > > for (nreff = 0, i = xdf1->dstart, recs = &xdf1->recs[xdf1->dstart]; ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Bug: git log --numstat counts wrong 2011-09-21 20:35 ` Junio C Hamano 2011-09-22 13:19 ` Alexander Pepper 2011-09-22 16:15 ` René Scharfe @ 2011-09-22 17:51 ` Junio C Hamano 2011-09-23 9:18 ` Tay Ray Chuan 2011-09-23 10:30 ` Alexander Pepper 2 siblings, 2 replies; 16+ messages in thread From: Junio C Hamano @ 2011-09-22 17:51 UTC (permalink / raw) To: Alexander Pepper; +Cc: git, Tay Ray Chuan Junio C Hamano <gitster@pobox.com> writes: > Alexander Pepper <pepper@inf.fu-berlin.de> writes: > >> Am 21.09.2011 um 14:24 schrieb Junio C Hamano: >>>> $ git log --numstat 48a07e7e533f507228e8d1c99d4d48e175e14260 >>>> [...] >>>> 11 10 src/java/voldemort/server/storage/StorageService.java >>> >>> Didn't we update it this already? I seem to get 10/9 here not 11/10. >> >> Current 'maint' (cd2b8ae9), 'master' (4b5eac7f)... > > That's a tad old master you seem to have. > > Strangely, bisection points at 27af01d5523, which was supposed to be only > about performance and never about correctness. There is something fishy > going on.... In any case, I think the real issue is that depending on how much context you ask, the resulting diff is different (and both are valid diffs). If you ask "log -p" (or "diff" or "show") to produce a patch, then we use the default 3-line context. And then you feed that to an external diffstat to count the number of deleted and added lines to get one set of numbers. The --numstat (and --diffstat) code seems to be running the internal diff machinery with 0-line context and counting the resulting diff internally. And of course the results between the above two would be different because diff can match lines differently when given different number of context lines to include in the result. So perhaps a good sanity-check for you to try (note: not checking your sanity, but checking the sanity of the above analysis) would be to do: $ git show 48a07e7e53 -- $that_path | diffstat $ git show -U0 48a07e7e53 -- $that_path | diffstat $ git show --numstat 48a07e7e53 -- $that_path $ git show --stat 48a07e7e53 -- $that_path and see how they compare (make sure to use the same version of git for these experiments). The first one uses the default 3-lines context, the second one forces 0-line context, and the last two uses 0-line context hardwired in the code. Applying the following patch should make the last two use the default context or -U$num given from the command line to be consistent with the codepath where we generate textual patches. diff.c | 2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) diff --git a/diff.c b/diff.c index 9038f19..302ef33 100644 --- a/diff.c +++ b/diff.c @@ -2251,6 +2251,8 @@ static void builtin_diffstat(const char *name_a, const char *name_b, memset(&xpp, 0, sizeof(xpp)); memset(&xecfg, 0, sizeof(xecfg)); xpp.flags = o->xdl_opts; + xecfg.ctxlen = o->context; + xecfg.interhunkctxlen = o->interhunkcontext; xdi_diff_outf(&mf1, &mf2, diffstat_consume, diffstat, &xpp, &xecfg); } ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: Bug: git log --numstat counts wrong 2011-09-22 17:51 ` Bug: git log --numstat counts wrong Junio C Hamano @ 2011-09-23 9:18 ` Tay Ray Chuan 2011-09-23 16:38 ` Tay Ray Chuan 2011-09-23 10:30 ` Alexander Pepper 1 sibling, 1 reply; 16+ messages in thread From: Tay Ray Chuan @ 2011-09-23 9:18 UTC (permalink / raw) To: Junio C Hamano; +Cc: Alexander Pepper, git On Fri, Sep 23, 2011 at 1:51 AM, Junio C Hamano <gitster@pobox.com> wrote: > Junio C Hamano <gitster@pobox.com> writes: > >> Alexander Pepper <pepper@inf.fu-berlin.de> writes: >> >>> Am 21.09.2011 um 14:24 schrieb Junio C Hamano: >>>>> $ git log --numstat 48a07e7e533f507228e8d1c99d4d48e175e14260 >>>>> [...] >>>>> 11 10 src/java/voldemort/server/storage/StorageService.java >>>> >>>> Didn't we update it this already? I seem to get 10/9 here not 11/10. >>> >>> Current 'maint' (cd2b8ae9), 'master' (4b5eac7f)... >> >> That's a tad old master you seem to have. >> >> Strangely, bisection points at 27af01d5523, which was supposed to be only >> about performance and never about correctness. There is something fishy >> going on.... > > In any case, I think the real issue is that depending on how much context > you ask, the resulting diff is different (and both are valid diffs). If > you ask "log -p" (or "diff" or "show") to produce a patch, then we use the > default 3-line context. And then you feed that to an external diffstat to > count the number of deleted and added lines to get one set of numbers. > > The --numstat (and --diffstat) code seems to be running the internal diff > machinery with 0-line context and counting the resulting diff internally. > > And of course the results between the above two would be different because > diff can match lines differently when given different number of context > lines to include in the result. > > So perhaps a good sanity-check for you to try (note: not checking your > sanity, but checking the sanity of the above analysis) would be to do: > > $ git show 48a07e7e53 -- $that_path | diffstat > $ git show -U0 48a07e7e53 -- $that_path | diffstat > $ git show --numstat 48a07e7e53 -- $that_path > $ git show --stat 48a07e7e53 -- $that_path > > and see how they compare (make sure to use the same version of git for > these experiments). The first one uses the default 3-lines context, the > second one forces 0-line context, and the last two uses 0-line context > hardwired in the code. > > Applying the following patch should make the last two use the default > context or -U$num given from the command line to be consistent with the > codepath where we generate textual patches. > > diff.c | 2 ++ > 1 files changed, 2 insertions(+), 0 deletions(-) > > diff --git a/diff.c b/diff.c > index 9038f19..302ef33 100644 > --- a/diff.c > +++ b/diff.c > @@ -2251,6 +2251,8 @@ static void builtin_diffstat(const char *name_a, const char *name_b, > memset(&xpp, 0, sizeof(xpp)); > memset(&xecfg, 0, sizeof(xecfg)); > xpp.flags = o->xdl_opts; > + xecfg.ctxlen = o->context; > + xecfg.interhunkctxlen = o->interhunkcontext; > xdi_diff_outf(&mf1, &mf2, diffstat_consume, diffstat, > &xpp, &xecfg); > } Thanks Junio. But wait, where does this patch go? Before or after 27af01d? If I'm understanding the situation correctly, this patch won't change the reporting 10/9 for --numstat, no? Anyway, this patch looks right. Acked-by: Tay Ray Chuan <rctay89@gmail.com> Interesting to find that we have many xdiff users that don't respect diff options on the command line (or may not acess to them), like patch-id, merge. I wonder if there would be less conflicts if merge's ctxlen could be overriden... -- Cheers, Ray Chuan ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Bug: git log --numstat counts wrong 2011-09-23 9:18 ` Tay Ray Chuan @ 2011-09-23 16:38 ` Tay Ray Chuan 2011-09-23 19:23 ` Junio C Hamano 0 siblings, 1 reply; 16+ messages in thread From: Tay Ray Chuan @ 2011-09-23 16:38 UTC (permalink / raw) To: Junio C Hamano; +Cc: Alexander Pepper, git On Fri, Sep 23, 2011 at 5:18 PM, Tay Ray Chuan <rctay89@gmail.com> wrote: > On Fri, Sep 23, 2011 at 1:51 AM, Junio C Hamano <gitster@pobox.com> wrote: >> [snip] >> Applying the following patch should make the last two use the default >> context or -U$num given from the command line to be consistent with the >> codepath where we generate textual patches. >> >> diff.c | 2 ++ >> 1 files changed, 2 insertions(+), 0 deletions(-) >> >> diff --git a/diff.c b/diff.c >> index 9038f19..302ef33 100644 >> --- a/diff.c >> +++ b/diff.c >> @@ -2251,6 +2251,8 @@ static void builtin_diffstat(const char *name_a, const char *name_b, >> memset(&xpp, 0, sizeof(xpp)); >> memset(&xecfg, 0, sizeof(xecfg)); >> xpp.flags = o->xdl_opts; >> + xecfg.ctxlen = o->context; >> + xecfg.interhunkctxlen = o->interhunkcontext; >> xdi_diff_outf(&mf1, &mf2, diffstat_consume, diffstat, >> &xpp, &xecfg); >> } > > Thanks Junio. > > But wait, where does this patch go? Before or after 27af01d? If I'm > understanding the situation correctly, this patch won't change the > reporting 10/9 for --numstat, no? I think I can answer this - on to v1.7.6, which is before 27af01d was merged in. > Anyway, this patch looks right. On further thought, I think the patch merely side-steps the problem - ie. that -U0 generates "incorrect" diffs. Further digging reveals a xdiff-interface.c::trim_common_tail(); commenting its one and only call (patch below) gives back 10/9. Note that it only has effect when -U0. I think this function is incorrect. xdl_cleanup_records() and xdl_clean_mmatch() may potentially look into common tail lines, so it may not be "safe" to drop all common tail lines. -- >8 -- diff --git a/xdiff-interface.c b/xdiff-interface.c index 0e2c169..da4fab6 100644 --- a/xdiff-interface.c +++ b/xdiff-interface.c @@ -131,7 +131,7 @@ mmfile_t a = *mf1; mmfile_t b = *mf2; - trim_common_tail(&a, &b, xecfg->ctxlen); +/* trim_common_tail(&a, &b, xecfg->ctxlen); */ return xdl_diff(&a, &b, xpp, xecfg, xecb); } -- >8 -- -- Cheers, Ray Chuan ^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: Bug: git log --numstat counts wrong 2011-09-23 16:38 ` Tay Ray Chuan @ 2011-09-23 19:23 ` Junio C Hamano 0 siblings, 0 replies; 16+ messages in thread From: Junio C Hamano @ 2011-09-23 19:23 UTC (permalink / raw) To: Tay Ray Chuan; +Cc: Alexander Pepper, git Tay Ray Chuan <rctay89@gmail.com> writes: > On further thought, I think the patch merely side-steps the problem - > ie. that -U0 generates "incorrect" diffs. I do not know if there anything "incorrect" about it. When the file have common blocks lines in different places that are not modified, the comparison between preimage and postimage is free to choose how these common blocks are matched, and for that reason it is incorrect to expect that "diff -U0", "diff -U3" and "diff -U20" would produce the identical results. > I think this function is incorrect. xdl_cleanup_records() and > xdl_clean_mmatch() may potentially look into common tail lines, so it > may not be "safe" to drop all common tail lines. I would prefer to keep that common trimming optimization, and also to see the same common trimming logic extended to trim (and adjust offsets) at the beginning as well in the longer term. If xdl_cleanup_records() and xdl_clean_mmatch() need to become aware of the change in the total number of lines made by trim_common_tail(), please make it so. > > -- >8 -- > diff --git a/xdiff-interface.c b/xdiff-interface.c > index 0e2c169..da4fab6 100644 > --- a/xdiff-interface.c > +++ b/xdiff-interface.c > @@ -131,7 +131,7 @@ > mmfile_t a = *mf1; > mmfile_t b = *mf2; > > - trim_common_tail(&a, &b, xecfg->ctxlen); > +/* trim_common_tail(&a, &b, xecfg->ctxlen); */ > > return xdl_diff(&a, &b, xpp, xecfg, xecb); > } > -- >8 -- ^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Bug: git log --numstat counts wrong 2011-09-22 17:51 ` Bug: git log --numstat counts wrong Junio C Hamano 2011-09-23 9:18 ` Tay Ray Chuan @ 2011-09-23 10:30 ` Alexander Pepper 1 sibling, 0 replies; 16+ messages in thread From: Alexander Pepper @ 2011-09-23 10:30 UTC (permalink / raw) To: Junio C Hamano; +Cc: git, Tay Ray Chuan Am 22.09.2011 um 19:51 schrieb Junio C Hamano: > So perhaps a good sanity-check for you to try (note: not checking your > sanity, but checking the sanity of the above analysis) would be to do: > > $ git show 48a07e7e53 -- $that_path | diffstat > $ git show -U0 48a07e7e53 -- $that_path | diffstat > $ git show --numstat 48a07e7e53 -- $that_path > $ git show --stat 48a07e7e53 -- $that_path [...] > Applying the following patch should make the last two use the default > context or -U$num given from the command line to be consistent with the > codepath where we generate textual patches. > > diff.c | 2 ++ > 1 files changed, 2 insertions(+), 0 deletions(-) > > diff --git a/diff.c b/diff.c > index 9038f19..302ef33 100644 > --- a/diff.c > +++ b/diff.c > @@ -2251,6 +2251,8 @@ static void builtin_diffstat(const char *name_a, const char *name_b, > memset(&xpp, 0, sizeof(xpp)); > memset(&xecfg, 0, sizeof(xecfg)); > xpp.flags = o->xdl_opts; > + xecfg.ctxlen = o->context; > + xecfg.interhunkctxlen = o->interhunkcontext; > xdi_diff_outf(&mf1, &mf2, diffstat_consume, diffstat, > &xpp, &xecfg); > } First of all: thank you for your extended feedback, your respectful e-mails and your patch! I did some benachmarking. I compared git version 1.7.6.3 with 1.7.7.rc2 and 1.7.7.rc2 with the above patch. My test setup: git version 1.7.6.3: 740a8fc2 git version 1.7.7.rc2: 167a5800 git version 1.7.7.rc2': 167a5800 with the above patch applied. Tuple (15,07) shows 15 lines added and 7 lines removed $ git show $rev -- $that_path | diffstat $ git show -U0 $rev -- $that_path | diffstat $ git log --numstat -n1 --oneline $rev $ git show --stat --oneline -n1 $rev -- $that_path Test 1: repo='https://github.com/voldemort/voldemort.git' rev='48a07e7e' that_path='src/java/voldemort/server/storage/StorageService.java' Results: 1.7.6.3 1.7.7.rc2 1.7.7.rc2' (10,09) (10,09) (10,09) (11,10) (10,09) (10,09) (11,10) (10,09) (10,09) (11,10) (10,09) (10,09) Test 2: repo='https://github.com/voldemort/voldemort.git' rev='c21ad764' that_path='contrib/hadoop-store-builder/src/java/voldemort/store/readonly/mr/HadoopStoreBuilderReducer.java' Results: 1.7.6.3 1.7.7.rc2 1.7.7.rc2' (30,27) (25,22) (25,22) (25,22) (25,22) (25,22) (25,22) (25,22) (25,22) (25,22) (25,22) (25,22) Test 3: repo='private repo' rev='bd61f26e' that_path='[...]JmeterTest/loadtests/JMeterLoadTest.jmx' Results: 1.7.6.3 1.7.7.rc2 1.7.7.rc2' (450,3544) (450,3544) (450,3544) (401,3495) (401,3495) (401,3495) (401,3495) (401,3495) (450,3544) (401,3495) (401,3495) (450,3544) In Test 1 the patch seems to be different formatted from 1.7.7.rc2, so the context doesn't matter with the newer version. In Test 2 the patch seems to be different formatted from 1.7.7.rc2, so the context doesn't matter with the newer version. In Test 3 (which is a private repo) where a lot of different lines where changes, some single lines, some multiple lines long makes the biggest difference. With the different contexts different output is observed. I'm sorry that I can not find an open source example for that, but your patch seems to fix this. So it seems that the patch output in general changed between 1.7.6.3 and 1.7.7.rc2. If I had the right to vote for your patch, I would give it a +1 :-) Greetings from Berlin Alex PS: Will this patch be in the final version of 1.7.7? ^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2011-09-25 17:53 UTC | newest] Thread overview: 16+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2011-09-21 9:03 Bug: git log --numstat counts wrong Alexander Pepper 2011-09-21 12:24 ` Junio C Hamano 2011-09-21 13:40 ` Alexander Pepper 2011-09-21 14:23 ` Alexander Pepper 2011-09-21 20:35 ` Junio C Hamano 2011-09-22 13:19 ` Alexander Pepper 2011-09-22 17:32 ` Junio C Hamano 2011-09-22 16:15 ` René Scharfe 2011-09-23 6:30 ` Tay Ray Chuan 2011-09-25 13:39 ` [PATCH] Revert removal of multi-match discard heuristic in 27af01 Tay Ray Chuan 2011-09-25 17:53 ` René Scharfe 2011-09-22 17:51 ` Bug: git log --numstat counts wrong Junio C Hamano 2011-09-23 9:18 ` Tay Ray Chuan 2011-09-23 16:38 ` Tay Ray Chuan 2011-09-23 19:23 ` Junio C Hamano 2011-09-23 10:30 ` Alexander Pepper
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).