* git-diff-tree rename detection bug
@ 2005-09-14 16:47 Wayne Scott
2005-09-14 18:09 ` Junio C Hamano
2005-09-14 18:51 ` Junio C Hamano
0 siblings, 2 replies; 23+ messages in thread
From: Wayne Scott @ 2005-09-14 16:47 UTC (permalink / raw)
To: git
Look at the diffs between ad6571a78ac74e9fa27e581834709067dba459af and
it's parent with and without rename detection enabled. (In linux-2.6
git tree)
(formated for narrow screens)
$ REV=ad6571a78ac74e9fa27e581834709067dba459af
$ git-diff-tree -r $REV^1 $REV | grep termios.h
:000000 100644 0000000000000000000000000000000000000000
237533bb0e9f1a3e640c4906d8b350deafd315b9 A include/asm-powerpc/termios.h
:100644 000000 97c6287a6cbaa5903ee1a5934a5553e9e485d8e7
0000000000000000000000000000000000000000 D include/asm-ppc/termios.h
:100644 000000 02c3d283aa62bc1b4d7c5d1b22ce03ee4b8771eb
0000000000000000000000000000000000000000 D include/asm-ppc64/termios.h
$ git-diff-tree -r -M $REV^1 $REV | grep termios.h
:000000 100644 0000000000000000000000000000000000000000
237533bb0e9f1a3e640c4906d8b350deafd315b9 A include/asm-powerpc/termios.h
:100644 000000 97c6287a6cbaa5903ee1a5934a5553e9e485d8e7
0000000000000000000000000000000000000000 D include/asm-ppc/termios.h
Notice how the the fact that include/asm-ppc64/termios.h is deleted gets lost?
Looks broken to me.
-Wayne
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: git-diff-tree rename detection bug
2005-09-14 16:47 git-diff-tree rename detection bug Wayne Scott
@ 2005-09-14 18:09 ` Junio C Hamano
2005-09-14 18:40 ` Wayne Scott
2005-09-14 18:51 ` Junio C Hamano
1 sibling, 1 reply; 23+ messages in thread
From: Junio C Hamano @ 2005-09-14 18:09 UTC (permalink / raw)
To: wsc9tt; +Cc: git
Wayne Scott <wsc9tt@gmail.com> writes:
> Look at the diffs between ad6571a78ac74e9fa27e581834709067dba459af and
> it's parent with and without rename detection enabled. (In linux-2.6
> git tree)
> Notice how the the fact that include/asm-ppc64/termios.h is
> deleted gets lost? Looks broken to me.
I suspect that what is deleted is not asm-ppc64/termios.h but
asm-ppc/termios.h. The below is an output without grep which
seems to confuse things.
$ git-diff-tree -r -M ad6571a78ac74e9fa27e581834709067dba459af |
sed -ne 's/^.* \([AMCRDU]\)/\1/p'
R092 include/asm-ppc64/mman.h include/asm-powerpc/mman.h
R097 include/asm-ppc64/termbits.h include/asm-powerpc/termbits.h
R098 include/asm-ppc64/termios.h include/asm-powerpc/termios.h
D include/asm-ppc/mman.h
D include/asm-ppc/termbits.h
D include/asm-ppc/termios.h
$ git-diff-tree -r ad6571a78ac74e9fa27e581834709067dba459af |
sed -ne 's/^.* \([AMCRDU]\)/\1/p'
A include/asm-powerpc/mman.h
A include/asm-powerpc/termbits.h
A include/asm-powerpc/termios.h
D include/asm-ppc/mman.h
D include/asm-ppc/termbits.h
D include/asm-ppc/termios.h
D include/asm-ppc64/mman.h
D include/asm-ppc64/termbits.h
D include/asm-ppc64/termios.h
The first 3 A and last 3 D are accounted for in the -M output as
renames from asm-ppc64 to asm-powerpc. Middle 3 D from asm-ppc
are shown in the -M output. So I do not think we are losing
anything. Am I missing something?
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: git-diff-tree rename detection bug
2005-09-14 18:09 ` Junio C Hamano
@ 2005-09-14 18:40 ` Wayne Scott
2005-09-14 20:24 ` Linus Torvalds
0 siblings, 1 reply; 23+ messages in thread
From: Wayne Scott @ 2005-09-14 18:40 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
On 9/14/05, Junio C Hamano <junkio@cox.net> wrote:
> Wayne Scott <wsc9tt@gmail.com> writes:
>
> > Look at the diffs between ad6571a78ac74e9fa27e581834709067dba459af and
> > it's parent with and without rename detection enabled. (In linux-2.6
> > git tree)
>
> > Notice how the the fact that include/asm-ppc64/termios.h is
> > deleted gets lost? Looks broken to me.
>
> I suspect that what is deleted is not asm-ppc64/termios.h but
> asm-ppc/termios.h. The below is an output without grep which
> seems to confuse things.
>
> $ git-diff-tree -r -M ad6571a78ac74e9fa27e581834709067dba459af |
> sed -ne 's/^.* \([AMCRDU]\)/\1/p'
> R092 include/asm-ppc64/mman.h include/asm-powerpc/mman.h
> R097 include/asm-ppc64/termbits.h include/asm-powerpc/termbits.h
> R098 include/asm-ppc64/termios.h include/asm-powerpc/termios.h
> D include/asm-ppc/mman.h
> D include/asm-ppc/termbits.h
> D include/asm-ppc/termios.h
>
> $ git-diff-tree -r ad6571a78ac74e9fa27e581834709067dba459af |
> sed -ne 's/^.* \([AMCRDU]\)/\1/p'
> A include/asm-powerpc/mman.h
> A include/asm-powerpc/termbits.h
> A include/asm-powerpc/termios.h
> D include/asm-ppc/mman.h
> D include/asm-ppc/termbits.h
> D include/asm-ppc/termios.h
> D include/asm-ppc64/mman.h
> D include/asm-ppc64/termbits.h
> D include/asm-ppc64/termios.h
>
> The first 3 A and last 3 D are accounted for in the -M output as
> renames from asm-ppc64 to asm-powerpc. Middle 3 D from asm-ppc
> are shown in the -M output. So I do not think we are losing
> anything. Am I missing something?
Odd. I get the same answer on my x86 box:
$ git-diff-tree -r -M ad6571a78ac74e9fa27e581834709067dba459af | sed
-ne 's/^.* \([AMCRDU]\)/\1/p'
R092 include/asm-ppc64/mman.h include/asm-powerpc/mman.h
R097 include/asm-ppc64/termbits.h include/asm-powerpc/termbits.h
R098 include/asm-ppc64/termios.h include/asm-powerpc/termios.h
D include/asm-ppc/mman.h
D include/asm-ppc/termbits.h
D include/asm-ppc/termios.h
But here is the output on my quad xeon running in 64-bit mode: (fedora core 2)
$ git-diff-tree -r -M ad6571a78ac74e9fa27e581834709067dba459af | sed
-ne 's/^.* \([AMCRDU]\)/\1/p'
R092 include/asm-ppc64/mman.h include/asm-powerpc/mman.h
R097 include/asm-ppc64/termbits.h include/asm-powerpc/termbits.h
A include/asm-powerpc/termios.h
D include/asm-ppc/mman.h
D include/asm-ppc/termbits.h
D include/asm-ppc/termios.h
This is the same version of git both rebuilt just for this test.
However, I noticed a whole collection of errors from valgrind when I
run this command line:
==13457== Invalid read of size 4
==13457== at 0x805402C: locate_rename_dst (diffcore-rename.c:28)
==13457== by 0x805464B: diffcore_rename (diffcore-rename.c:356)
==13457== by 0x805249D: diffcore_std (diff.c:1093)
==13457== by 0x8049B48: call_diff_flush (diff-tree.c:273)
==13457== by 0x804A225: diff_tree_sha1_top (diff-tree.c:298)
==13457== by 0x804A30B: diff_tree_commit (diff-tree.c:363)
==13457== by 0x804A884: main (diff-tree.c:551)
==13457== Address 0x1BBD781C is 20 bytes inside a block of size 71 free'd
==13457== at 0x1B9003C3: free (vg_replace_malloc.c:235)
==13457== by 0x805205B: diff_free_filepair (diff.c:775)
==13457== by 0x805422A: diffcore_rename (diffcore-rename.c:415)
==13457== by 0x805249D: diffcore_std (diff.c:1093)
==13457== by 0x8049B48: call_diff_flush (diff-tree.c:273)
==13457== by 0x804A225: diff_tree_sha1_top (diff-tree.c:298)
==13457== by 0x804A30B: diff_tree_commit (diff-tree.c:363)
==13457== by 0x804A884: main (diff-tree.c:551)
==13457==
==13457== Invalid read of size 1
==13457== at 0x1B90140D: strcmp (mac_replace_strmem.c:332)
==13457== by 0x8054038: locate_rename_dst (diffcore-rename.c:28)
==13457== by 0x805464B: diffcore_rename (diffcore-rename.c:356)
==13457== by 0x805249D: diffcore_std (diff.c:1093)
==13457== by 0x8049B48: call_diff_flush (diff-tree.c:273)
==13457== by 0x804A225: diff_tree_sha1_top (diff-tree.c:298)
==13457== by 0x804A30B: diff_tree_commit (diff-tree.c:363)
==13457== by 0x804A884: main (diff-tree.c:551)
==13457== Address 0x1BBD7830 is 40 bytes inside a block of size 71 free'd
==13457== at 0x1B9003C3: free (vg_replace_malloc.c:235)
==13457== by 0x805205B: diff_free_filepair (diff.c:775)
==13457== by 0x805422A: diffcore_rename (diffcore-rename.c:415)
==13457== by 0x805249D: diffcore_std (diff.c:1093)
==13457== by 0x8049B48: call_diff_flush (diff-tree.c:273)
==13457== by 0x804A225: diff_tree_sha1_top (diff-tree.c:298)
==13457== by 0x804A30B: diff_tree_commit (diff-tree.c:363)
==13457== by 0x804A884: main (diff-tree.c:551)
==13457==
==13457== Invalid read of size 1
==13457== at 0x1B901423: strcmp (mac_replace_strmem.c:332)
==13457== by 0x8054038: locate_rename_dst (diffcore-rename.c:28)
==13457== by 0x805464B: diffcore_rename (diffcore-rename.c:356)
==13457== by 0x805249D: diffcore_std (diff.c:1093)
==13457== by 0x8049B48: call_diff_flush (diff-tree.c:273)
==13457== by 0x804A225: diff_tree_sha1_top (diff-tree.c:298)
==13457== by 0x804A30B: diff_tree_commit (diff-tree.c:363)
==13457== by 0x804A884: main (diff-tree.c:551)
==13504== Invalid read of size 4
==13504== at 0x805402C: locate_rename_dst (diffcore-rename.c:28)
==13504== by 0x805464B: diffcore_rename (diffcore-rename.c:356)
==13504== by 0x805249D: diffcore_std (diff.c:1093)
==13504== by 0x8049B48: call_diff_flush (diff-tree.c:273)
==13504== by 0x804A225: diff_tree_sha1_top (diff-tree.c:298)
==13504== by 0x804A30B: diff_tree_commit (diff-tree.c:363)
==13504== by 0x804A884: main (diff-tree.c:551)
==13504== Address 0x1BBD781C is 20 bytes inside a block of size 71 free'd
==13504== at 0x1B9003C3: free (vg_replace_malloc.c:235)
==13504== by 0x805205B: diff_free_filepair (diff.c:775)
==13504== by 0x805422A: diffcore_rename (diffcore-rename.c:415)
==13504== by 0x805249D: diffcore_std (diff.c:1093)
==13504== by 0x8049B48: call_diff_flush (diff-tree.c:273)
==13504== by 0x804A225: diff_tree_sha1_top (diff-tree.c:298)
==13504== by 0x804A30B: diff_tree_commit (diff-tree.c:363)
==13504== by 0x804A884: main (diff-tree.c:551)
==13504==
==13504== Invalid read of size 1
==13504== at 0x1B90140D: strcmp (mac_replace_strmem.c:332)
==13504== by 0x8054038: locate_rename_dst (diffcore-rename.c:28)
==13504== by 0x805464B: diffcore_rename (diffcore-rename.c:356)
==13504== by 0x805249D: diffcore_std (diff.c:1093)
==13504== by 0x8049B48: call_diff_flush (diff-tree.c:273)
==13504== by 0x804A225: diff_tree_sha1_top (diff-tree.c:298)
==13504== by 0x804A30B: diff_tree_commit (diff-tree.c:363)
==13504== by 0x804A884: main (diff-tree.c:551)
==13504== Address 0x1BBD7830 is 40 bytes inside a block of size 71 free'd
==13504== at 0x1B9003C3: free (vg_replace_malloc.c:235)
==13504== by 0x805205B: diff_free_filepair (diff.c:775)
==13504== by 0x805422A: diffcore_rename (diffcore-rename.c:415)
==13504== by 0x805249D: diffcore_std (diff.c:1093)
==13504== by 0x8049B48: call_diff_flush (diff-tree.c:273)
==13504== by 0x804A225: diff_tree_sha1_top (diff-tree.c:298)
==13504== by 0x804A30B: diff_tree_commit (diff-tree.c:363)
==13504== by 0x804A884: main (diff-tree.c:551)
==13504==
==13504== Invalid read of size 1
==13504== at 0x1B901423: strcmp (mac_replace_strmem.c:332)
==13504== by 0x8054038: locate_rename_dst (diffcore-rename.c:28)
==13504== by 0x805464B: diffcore_rename (diffcore-rename.c:356)
==13504== by 0x805249D: diffcore_std (diff.c:1093)
==13504== by 0x8049B48: call_diff_flush (diff-tree.c:273)
==13504== by 0x804A225: diff_tree_sha1_top (diff-tree.c:298)
==13504== by 0x804A30B: diff_tree_commit (diff-tree.c:363)
==13504== by 0x804A884: main (diff-tree.c:551)
==13504== Address 0x1BBD7831 is 41 bytes inside a block of size 71 free'd
==13504== at 0x1B9003C3: free (vg_replace_malloc.c:235)
==13504== by 0x805205B: diff_free_filepair (diff.c:775)
==13504== by 0x805422A: diffcore_rename (diffcore-rename.c:415)
==13504== by 0x805249D: diffcore_std (diff.c:1093)
==13504== by 0x8049B48: call_diff_flush (diff-tree.c:273)
==13504== by 0x804A225: diff_tree_sha1_top (diff-tree.c:298)
==13504== by 0x804A30B: diff_tree_commit (diff-tree.c:363)
==13504== by 0x804A884: main (diff-tree.c:551)
==13504==
Perhaps that explains the difference.
-Wayne
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: git-diff-tree rename detection bug
2005-09-14 16:47 git-diff-tree rename detection bug Wayne Scott
2005-09-14 18:09 ` Junio C Hamano
@ 2005-09-14 18:51 ` Junio C Hamano
[not found] ` <59a6e5830509141208282166c8@mail.gmail.com>
1 sibling, 1 reply; 23+ messages in thread
From: Junio C Hamano @ 2005-09-14 18:51 UTC (permalink / raw)
To: wsc9tt; +Cc: git
Wayne Scott <wsc9tt@gmail.com> writes:
> Look at the diffs between ad6571a78ac74e9fa27e581834709067dba459af and
> it's parent with and without rename detection enabled. (In linux-2.6
> git tree)
> $ git-diff-tree -r -M $REV^1 $REV | grep termios.h
> :000000 100644 0000000000000000000000000000000000000000
> 237533bb0e9f1a3e640c4906d8b350deafd315b9 A include/asm-powerpc/termios.h
> :100644 000000 97c6287a6cbaa5903ee1a5934a5553e9e485d8e7
> 0000000000000000000000000000000000000000 D include/asm-ppc/termios.h
>
> Notice how the the fact that include/asm-ppc64/termios.h is deleted gets lost?
> Looks broken to me.
It looks broken to me, too. I rebuilt from a reasonably ancient
source (v0.99) and re-run the test but I could not get it to
produce 'A' for include/asm-powerpc/termios.h. So I rewound it
further to 4d235c8044a638108b67e22f94b2876657130fc8 commit,
which is really ancient version, but it still says it is renamed
from asm-ppc64 directory. FWIW, all the v0.99* tagged versions
seem to detect that rename correctly and not lose anything in my
tests.
Which version of git do you run and on what platform? It might
be that something in the diffcore chain is broken in non-i386
and/or non-GNU/Linux and/or non-GCC environment.
Shoot, I thought it would be a good practice-case for me to use
'git bisect' in reverse to find the commit that fixed a bug ;-).
My copy of linux-2.6 repository for testing is fully packed so I
could not try the commit that introduced diffcore-rename.c, but
that is what I really wanted to try.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: git-diff-tree rename detection bug
[not found] ` <59a6e5830509141208282166c8@mail.gmail.com>
@ 2005-09-14 19:19 ` Junio C Hamano
2005-09-14 20:53 ` Linus Torvalds
0 siblings, 1 reply; 23+ messages in thread
From: Junio C Hamano @ 2005-09-14 19:19 UTC (permalink / raw)
To: wsc9tt; +Cc: git
Could you revert one or both of these commits and try the same
test on the 64-bit box you saw problems with please?
Commit: 90a734dc7f37a7bd1f3beec4d33acad559360f6c
Author: Yasushi SHOJI <yashi@atmark-techno.com>
Date: Sun Aug 21 16:14:16 2005 +0900
[PATCH] possible memory leak in diff.c::diff_free_filepair()
Commit: 068eac91ce04b9aca163acb1927c3878c45d1a07
Author: Yasushi SHOJI <yashi@atmark-techno.com>
Date: Sat Aug 13 19:58:56 2005 +0900
[PATCH] plug memory leak in diff.c::diff_free_filepair()
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: git-diff-tree rename detection bug
2005-09-14 18:40 ` Wayne Scott
@ 2005-09-14 20:24 ` Linus Torvalds
2005-09-14 20:41 ` Fix alloc_filespec() initialization Linus Torvalds
2005-09-15 2:23 ` git-diff-tree rename detection bug Paul Mackerras
0 siblings, 2 replies; 23+ messages in thread
From: Linus Torvalds @ 2005-09-14 20:24 UTC (permalink / raw)
To: Wayne Scott; +Cc: Junio C Hamano, git
On Wed, 14 Sep 2005, Wayne Scott wrote:
>
> However, I noticed a whole collection of errors from valgrind when I
> run this command line:
I get even more, including:
==3234== Use of uninitialised value of size 4
==3234== at 0x80507C1: alloc_filespec (diff.c:224)
==3234== by 0x8052387: diff_addremove (diff.c:1144)
==3234== by 0x8049B74: show_file (diff-tree.c:97)
==3234== by 0x8049E17: diff_tree (diff-tree.c:118)
Damn, too bad valgrind doesn't work on ppc64, so I can't use it on my main
machine. It seems to be in development on ppc32, so maybe some day.
I'll look at it on my other machines instead,
Linus
^ permalink raw reply [flat|nested] 23+ messages in thread
* Fix alloc_filespec() initialization
2005-09-14 20:24 ` Linus Torvalds
@ 2005-09-14 20:41 ` Linus Torvalds
2005-09-15 2:23 ` git-diff-tree rename detection bug Paul Mackerras
1 sibling, 0 replies; 23+ messages in thread
From: Linus Torvalds @ 2005-09-14 20:41 UTC (permalink / raw)
To: Wayne Scott; +Cc: Junio C Hamano, git
This simplifies and fixes the initialization of a "diff_filespec" when
allocated.
The old code would not initialize "sha1_valid". Noticed by valgrind.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
---
This does not fix the issue Wayne saw, but I'm not going to look at the
later valgrind errors before I've fixed the first ones.
On Wed, 14 Sep 2005, Linus Torvalds wrote:
>
> I get even more, including:
>
> ==3234== Use of uninitialised value of size 4
> ==3234== at 0x80507C1: alloc_filespec (diff.c:224)
> ==3234== by 0x8052387: diff_addremove (diff.c:1144)
> ==3234== by 0x8049B74: show_file (diff-tree.c:97)
> ==3234== by 0x8049E17: diff_tree (diff-tree.c:118)
diff --git a/diff.c b/diff.c
--- a/diff.c
+++ b/diff.c
@@ -214,14 +214,10 @@ struct diff_filespec *alloc_filespec(con
{
int namelen = strlen(path);
struct diff_filespec *spec = xmalloc(sizeof(*spec) + namelen + 1);
+
+ memset(spec, 0, sizeof(*spec));
spec->path = (char *)(spec + 1);
- strcpy(spec->path, path);
- spec->should_free = spec->should_munmap = 0;
- spec->xfrm_flags = 0;
- spec->size = 0;
- spec->data = NULL;
- spec->mode = 0;
- memset(spec->sha1, 0, 20);
+ memcpy(spec->path, path, namelen+1);
return spec;
}
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: git-diff-tree rename detection bug
2005-09-14 19:19 ` Junio C Hamano
@ 2005-09-14 20:53 ` Linus Torvalds
2005-09-14 21:55 ` Junio C Hamano
0 siblings, 1 reply; 23+ messages in thread
From: Linus Torvalds @ 2005-09-14 20:53 UTC (permalink / raw)
To: Junio C Hamano; +Cc: wsc9tt, git
On Wed, 14 Sep 2005, Junio C Hamano wrote:
>
> Could you revert one or both of these commits and try the same
> test on the 64-bit box you saw problems with please?
>
> Commit: 90a734dc7f37a7bd1f3beec4d33acad559360f6c
> Author: Yasushi SHOJI <yashi@atmark-techno.com>
> Date: Sun Aug 21 16:14:16 2005 +0900
>
> [PATCH] possible memory leak in diff.c::diff_free_filepair()
>
> Commit: 068eac91ce04b9aca163acb1927c3878c45d1a07
> Author: Yasushi SHOJI <yashi@atmark-techno.com>
> Date: Sat Aug 13 19:58:56 2005 +0900
>
> [PATCH] plug memory leak in diff.c::diff_free_filepair()
Undoing that second one (068eac91ce04b9aca163acb1927c3878c45d1a07) fixes
the valgrind errors.
Linus
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: git-diff-tree rename detection bug
2005-09-14 20:53 ` Linus Torvalds
@ 2005-09-14 21:55 ` Junio C Hamano
0 siblings, 0 replies; 23+ messages in thread
From: Junio C Hamano @ 2005-09-14 21:55 UTC (permalink / raw)
To: Linus Torvalds; +Cc: git
Linus Torvalds <torvalds@osdl.org> writes:
> Undoing that second one (068eac91ce04b9aca163acb1927c3878c45d1a07) fixes
> the valgrind errors.
Thanks; reverted and applied your other patch and pushed them
out.
I think the filepairs are sometimes shared so to be able to free
them properly, we need to reference count. Ugh.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: git-diff-tree rename detection bug
2005-09-14 20:24 ` Linus Torvalds
2005-09-14 20:41 ` Fix alloc_filespec() initialization Linus Torvalds
@ 2005-09-15 2:23 ` Paul Mackerras
2005-09-15 2:26 ` Paul Mackerras
` (2 more replies)
1 sibling, 3 replies; 23+ messages in thread
From: Paul Mackerras @ 2005-09-15 2:23 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Wayne Scott, Junio C Hamano, git
Linus Torvalds writes:
> Damn, too bad valgrind doesn't work on ppc64, so I can't use it on my main
> machine. It seems to be in development on ppc32, so maybe some day.
How about... today? :-) My port of Valgrind-2.4.1 to ppc32 works
pretty well. You can get it from:
http://www.valgrind.org/downloads/variants.html?pmk
I assume you're compiling git as 32-bit executables on your G5. I
don't see any reason why the git binaries would need to be 64-bit.
Regards,
Paul.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: git-diff-tree rename detection bug
2005-09-15 2:23 ` git-diff-tree rename detection bug Paul Mackerras
@ 2005-09-15 2:26 ` Paul Mackerras
2005-09-15 3:36 ` Linus Torvalds
2005-09-15 3:29 ` Linus Torvalds
2005-09-15 5:58 ` H. Peter Anvin
2 siblings, 1 reply; 23+ messages in thread
From: Paul Mackerras @ 2005-09-15 2:26 UTC (permalink / raw)
To: Linus Torvalds, Wayne Scott, Junio C Hamano, git
I wrote:
> How about... today? :-) My port of Valgrind-2.4.1 to ppc32 works
> pretty well. You can get it from:
I meant to say explicitly that it runs quite happily under a ppc64
kernel, but only does 32-bit executables at present.
Paul.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: git-diff-tree rename detection bug
2005-09-15 2:23 ` git-diff-tree rename detection bug Paul Mackerras
2005-09-15 2:26 ` Paul Mackerras
@ 2005-09-15 3:29 ` Linus Torvalds
2005-09-16 5:59 ` Junio C Hamano
2005-09-15 5:58 ` H. Peter Anvin
2 siblings, 1 reply; 23+ messages in thread
From: Linus Torvalds @ 2005-09-15 3:29 UTC (permalink / raw)
To: Paul Mackerras; +Cc: Wayne Scott, Junio C Hamano, git
On Thu, 15 Sep 2005, Paul Mackerras wrote:
>
> Linus Torvalds writes:
>
> > Damn, too bad valgrind doesn't work on ppc64, so I can't use it on my main
> > machine. It seems to be in development on ppc32, so maybe some day.
>
> How about... today? :-) My port of Valgrind-2.4.1 to ppc32 works
> pretty well. You can get it from:
>
> http://www.valgrind.org/downloads/variants.html?pmk
Ahh. Compiling right now.
> I assume you're compiling git as 32-bit executables on your G5. I
> don't see any reason why the git binaries would need to be 64-bit.
I use whatever the defaults are. And yes, it seems to me -m32.
Linus
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: git-diff-tree rename detection bug
2005-09-15 2:26 ` Paul Mackerras
@ 2005-09-15 3:36 ` Linus Torvalds
2005-09-15 4:52 ` Paul Mackerras
0 siblings, 1 reply; 23+ messages in thread
From: Linus Torvalds @ 2005-09-15 3:36 UTC (permalink / raw)
To: Paul Mackerras; +Cc: Wayne Scott, Junio C Hamano, git
On Thu, 15 Sep 2005, Paul Mackerras wrote:
>
> I meant to say explicitly that it runs quite happily under a ppc64
> kernel, but only does 32-bit executables at present.
It works, but it does end up complaining about things like
==23756== Invalid read of size 4
==23756== at 0x25A38990: strlen (in /lib/libc-2.3.5.so)
..
==23756== Address 0x25B86754 is 3 bytes after a block of size 17 alloc'd
which seems to be just strlen prefetching the next word or something like
that.
But it's still nicer than not having it at all, even if it appears I'll
have to do some filtering of my own.
Linus
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: git-diff-tree rename detection bug
2005-09-15 3:36 ` Linus Torvalds
@ 2005-09-15 4:52 ` Paul Mackerras
2005-09-15 8:17 ` Junio C Hamano
2005-09-15 14:49 ` Josef Weidendorfer
0 siblings, 2 replies; 23+ messages in thread
From: Paul Mackerras @ 2005-09-15 4:52 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Wayne Scott, Junio C Hamano, git
Linus Torvalds writes:
> It works, but it does end up complaining about things like
>
> ==23756== Invalid read of size 4
> ==23756== at 0x25A38990: strlen (in /lib/libc-2.3.5.so)
> ..
> ==23756== Address 0x25B86754 is 3 bytes after a block of size 17 alloc'd
>
> which seems to be just strlen prefetching the next word or something like
> that.
The strlen() in glibc for ppc is unbearably clever hand-coded
assembly, which loads up 8 bytes at a time (once it has the address
8-byte aligned), and does various ANDs and ORs and ADDs and
conditional branches. If some of the 8 bytes aren't defined, it will
in many cases branch one way or the other based on the undefined
bytes, but end up computing the same result on either branch.
Valgrind is right in that strlen is loading up some bytes that are
past the end of a malloc'd block. In fact those bytes don't end up
affecting the result, and in fact the load couldn't cause a segfault,
but it's not surprising that Valgrind can't see that, since the value
of the extra bytes can actually affect whether a conditional branch is
taken or not, but we end up with the same result either way.
Valgrind sets LD_PRELOAD so that you get a simple Valgrind-supplied
set of string functions, including strlen, from vgpreload_memcheck.so
rather than the fancy glibc ones. However, that doesn't seem to catch
the calls to strlen from inside glibc - the call from vfprintf is a
direct branch rather than going through the PLT, for instance.
I could add a suppression to suppress all errors in strlen, but that
would mean you would miss real errors, where the string is not
null-terminated within the malloc'd block, and strlen runs off the
end.
I wish I had a good answer for this problem, but I don't. Maybe we
need a debugging version of glibc that doesn't use the fancy
bit-fiddling algorithms in the string functions.
(Just for interest: here are the comments from strlen.S:
1) Given a word 'x', we can test to see if it contains any 0 bytes
by subtracting 0x01010101, and seeing if any of the high bits of each
byte changed from 0 to 1. This works because the least significant
0 byte must have had no incoming carry (otherwise it's not the least
significant), so it is 0x00 - 0x01 == 0xff. For all other
byte values, either they have the high bit set initially, or when
1 is subtracted you get a value in the range 0x00-0x7f, none of which
have their high bit set. The expression here is
(x + 0xfefefeff) & ~(x | 0x7f7f7f7f), which gives 0x00000000 when
there were no 0x00 bytes in the word.
2) Given a word 'x', we can test to see _which_ byte was zero by
calculating ~(((x & 0x7f7f7f7f) + 0x7f7f7f7f) | x | 0x7f7f7f7f).
This produces 0x80 in each byte that was zero, and 0x00 in all
the other bytes. The '| 0x7f7f7f7f' clears the low 7 bits in each
byte, and the '| x' part ensures that bytes with the high bit set
produce 0x00. The addition will carry into the high bit of each byte
iff that byte had one of its low 7 bits set. We can then just see
which was the most significant bit set and divide by 8 to find how
many to add to the index.
This is from the book 'The PowerPC Compiler Writer's Guide',
by Steve Hoxey, Faraydon Karim, Bill Hay and Hank Warren.
)
Paul.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: git-diff-tree rename detection bug
2005-09-15 2:23 ` git-diff-tree rename detection bug Paul Mackerras
2005-09-15 2:26 ` Paul Mackerras
2005-09-15 3:29 ` Linus Torvalds
@ 2005-09-15 5:58 ` H. Peter Anvin
2005-09-15 7:41 ` Junio C Hamano
` (2 more replies)
2 siblings, 3 replies; 23+ messages in thread
From: H. Peter Anvin @ 2005-09-15 5:58 UTC (permalink / raw)
To: Paul Mackerras; +Cc: Linus Torvalds, Wayne Scott, Junio C Hamano, git
Paul Mackerras wrote:
>
> I assume you're compiling git as 32-bit executables on your G5. I
> don't see any reason why the git binaries would need to be 64-bit.
>
Well, git seems to assume it can mmap() the entirety of any file under
its control, so a 64-bit git could handle larger files.
Still, I'm using 32-bit git on ppc64.
-hpa
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: git-diff-tree rename detection bug
2005-09-15 5:58 ` H. Peter Anvin
@ 2005-09-15 7:41 ` Junio C Hamano
2005-09-15 14:55 ` Linus Torvalds
2005-09-15 20:57 ` Nicolas Pitre
2 siblings, 0 replies; 23+ messages in thread
From: Junio C Hamano @ 2005-09-15 7:41 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: Paul Mackerras, Linus Torvalds, Wayne Scott, git
"H. Peter Anvin" <hpa@zytor.com> writes:
> Paul Mackerras wrote:
>> I assume you're compiling git as 32-bit executables on your G5. I
>> don't see any reason why the git binaries would need to be 64-bit.
>>
>
> Well, git seems to assume it can mmap() the entirety of any file under
> its control, so a 64-bit git could handle larger files.
>
> Still, I'm using 32-bit git on ppc64.
Which is a valid thing to do because git also assumes that a
file offset fits within 32-bit as far as I know. Each object in
a packfile should also fit in 32-bit offset and the size of one
packfile also needs to be within 32-bit offset. I was unsure if
the last limitation would cause problems in the real life but we
will see in a couple of years ;-).
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: git-diff-tree rename detection bug
2005-09-15 4:52 ` Paul Mackerras
@ 2005-09-15 8:17 ` Junio C Hamano
2005-09-15 14:49 ` Josef Weidendorfer
1 sibling, 0 replies; 23+ messages in thread
From: Junio C Hamano @ 2005-09-15 8:17 UTC (permalink / raw)
To: Paul Mackerras; +Cc: Linus Torvalds, Wayne Scott, git
Paul Mackerras <paulus@samba.org> writes:
> The strlen() in glibc for ppc is unbearably clever hand-coded
> assembly, which loads up 8 bytes at a time (once it has the address
> 8-byte aligned), and does various ANDs and ORs and ADDs and
> conditional branches. If some of the 8 bytes aren't defined, it will
> in many cases branch one way or the other based on the undefined
> bytes, but end up computing the same result on either branch.
This reminds me of what I did in my previous life, writing a
memory allocation checker -- this was before Valgrind -- and
found out that strcpy in the C library that came with Solaris
had a similar clever trick. What was interesting was that
copying a string starting at the (PAGESIZE-3)th byte on a page
and NUL terminated at the end of the same page ended up
prefetching the first word from the next page (please do not ask
me about the details -- I do not remember the disassembly of
that part of the code anymore). It was not an inconvenience for
our memory checker but was a real bug -- the next page could
very well be unaccessible.
The bug was fixed in the next version of the C library when we
updated our Solaris box.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: git-diff-tree rename detection bug
2005-09-15 4:52 ` Paul Mackerras
2005-09-15 8:17 ` Junio C Hamano
@ 2005-09-15 14:49 ` Josef Weidendorfer
2005-09-20 10:50 ` [Valgrind-developers] " Paul Mackerras
1 sibling, 1 reply; 23+ messages in thread
From: Josef Weidendorfer @ 2005-09-15 14:49 UTC (permalink / raw)
To: git; +Cc: valgrind-developers
On Thursday 15 September 2005 06:52, Paul Mackerras wrote:
> Valgrind sets LD_PRELOAD so that you get a simple Valgrind-supplied
> set of string functions, including strlen, from vgpreload_memcheck.so
> rather than the fancy glibc ones. However, that doesn't seem to catch
> the calls to strlen from inside glibc - the call from vfprintf is a
> direct branch rather than going through the PLT, for instance.
Just curious: Why is it using LD_PRELOAD and not the VGs symbol redirection
mechanism, which should catch strlen even if used inside of glibc?
Josef
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: git-diff-tree rename detection bug
2005-09-15 5:58 ` H. Peter Anvin
2005-09-15 7:41 ` Junio C Hamano
@ 2005-09-15 14:55 ` Linus Torvalds
2005-09-16 12:53 ` Matthias Urlichs
2005-09-15 20:57 ` Nicolas Pitre
2 siblings, 1 reply; 23+ messages in thread
From: Linus Torvalds @ 2005-09-15 14:55 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: Paul Mackerras, Wayne Scott, Junio C Hamano, git
On Wed, 14 Sep 2005, H. Peter Anvin wrote:
>
> Well, git seems to assume it can mmap() the entirety of any file under
> its control, so a 64-bit git could handle larger files.
Right now git isn't 64-bit clean anyway (well, it should _work_ fine on
64-bit architectures, but it just can't handle files >32 bits).
I think the core object format should be all perfectly ok (since all the
sizes etc are in ascii), so this is not a huge deal. If you have an
existing git archive and suddenly notice that you need 32+ bit file
formats, you don't have to throw your archive away and re-generate it from
scratch in some new format.
In fact, I think even the streaming pack format is 64-bit-safe: it has
binary sizes, but they are all length-encoded, and I think the data
structure is safe.
Also, the index file - in order to not be unnecessarily big, and to be
able to use standard htonl/ntohl helpers etc - uses 32-bit lengths for
files. Even that _that_ should be 64-bit safe, because the length isn't
actually _used_ for anything but to verify the curret state, and as with
the timestamps etc, it's ok to "only" check the low 32 bits.
So the on-disk format should be capable of handling 64-bit entities.
HOWEVER. I might be wrong. I tried to think about it, but I didn't care
too much, because I think git would suck at truly huge files. If only
because compressing them will take forever. And the _implementation_ uses
"unsigned int" and has never been tested with anything else, so it would
need a lot of testing.
Also, the "pack index" file can only handle 32-bit offsets - you can make
a pack-file that is bigger than that, and it should be fine from a
_streaming_ standpoint (ie in the way we use them for network transport),
but you can't index them in .git/objects/packs.
Which isn't a disaster: you might choose to say that you never pack huge
files. That might be ok for some cases (maybe the huge file is a one-off
satellite picture). It would suck if the huge file is a incrementally
created log-file, where packing really would be nice.
IF we ever hit this, and IF people decide that git actually makes sense
for those kinds of files, we CAN change the pack index format. It has a
version number and everything, so we can even do it gently. Hopefully that
would be the only actual on-disk format that would need to change. But
regardless, the git code itself would need a lot of verification to make
sure that it handles big files correctly.
I'm not seeing that as a high priority. Maybe in five years ;)
Linus
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: git-diff-tree rename detection bug
2005-09-15 5:58 ` H. Peter Anvin
2005-09-15 7:41 ` Junio C Hamano
2005-09-15 14:55 ` Linus Torvalds
@ 2005-09-15 20:57 ` Nicolas Pitre
2 siblings, 0 replies; 23+ messages in thread
From: Nicolas Pitre @ 2005-09-15 20:57 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Paul Mackerras, Linus Torvalds, Wayne Scott, Junio C Hamano, git
On Wed, 14 Sep 2005, H. Peter Anvin wrote:
> Paul Mackerras wrote:
> >
> > I assume you're compiling git as 32-bit executables on your G5. I
> > don't see any reason why the git binaries would need to be 64-bit.
> >
>
> Well, git seems to assume it can mmap() the entirety of any file under its
> control, so a 64-bit git could handle larger files.
But beware that the delta code would break since it currently won't cope
with any file offset larger than 32 bits. I reserved the zero byte to
prefix any extended encoding but felt it wasn't really needed yet and I
just didn't bother coding it.
Nicolas
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: git-diff-tree rename detection bug
2005-09-15 3:29 ` Linus Torvalds
@ 2005-09-16 5:59 ` Junio C Hamano
0 siblings, 0 replies; 23+ messages in thread
From: Junio C Hamano @ 2005-09-16 5:59 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Paul Mackerras, Wayne Scott, git
Linus Torvalds <torvalds@osdl.org> writes:
> On Thu, 15 Sep 2005, Paul Mackerras wrote:
>>
>> Linus Torvalds writes:
>>
>> > Damn, too bad valgrind doesn't work on ppc64, so I can't use it on my main
>> > machine. It seems to be in development on ppc32, so maybe some day.
FWIW, I think I fixed the leak that patch we reverted was
attempting to fix, and on my i386 box valgrind seems much
happier.
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: git-diff-tree rename detection bug
2005-09-15 14:55 ` Linus Torvalds
@ 2005-09-16 12:53 ` Matthias Urlichs
0 siblings, 0 replies; 23+ messages in thread
From: Matthias Urlichs @ 2005-09-16 12:53 UTC (permalink / raw)
To: git
Hi, Linus Torvalds wrote:
> I'm not seeing that as a high priority. Maybe in five years ;)
Hmm. Just for comparison, how long was the time between "this will never
run on anything but i386" and "mkdir arch/alpha"? ;-)
--
Matthias Urlichs | {M:U} IT Design @ m-u-it.de | smurf@smurf.noris.de
Disclaimer: The quote was selected randomly. Really. | http://smurf.noris.de
- -
...He who laughs does not believe in what he laughs at, but neither
does he hate it. Therefore, laughing at evil means not preparing oneself to
combat it, and laughing at good means denying the power through which good is
self-propagating.
-- Umberto Eco, "The Name of the Rose"
^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [Valgrind-developers] Re: git-diff-tree rename detection bug
2005-09-15 14:49 ` Josef Weidendorfer
@ 2005-09-20 10:50 ` Paul Mackerras
0 siblings, 0 replies; 23+ messages in thread
From: Paul Mackerras @ 2005-09-20 10:50 UTC (permalink / raw)
To: Josef Weidendorfer; +Cc: git, valgrind-developers
Josef Weidendorfer writes:
> Just curious: Why is it using LD_PRELOAD and not the VGs symbol redirection
> mechanism, which should catch strlen even if used inside of glibc?
Valgrind-2.4.1 does symbol redirects for stpcpy and strnlen in
libc.so.6, and stpcpy and strchr in ld-linux.so.2. I could extend
that list on ppc to include strlen et al., but I don't know if that
would solve the problem for ld.so, since strlen doesn't appear in the
symbol table for ld.so.
Paul.
^ permalink raw reply [flat|nested] 23+ messages in thread
end of thread, other threads:[~2005-09-20 10:50 UTC | newest]
Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-09-14 16:47 git-diff-tree rename detection bug Wayne Scott
2005-09-14 18:09 ` Junio C Hamano
2005-09-14 18:40 ` Wayne Scott
2005-09-14 20:24 ` Linus Torvalds
2005-09-14 20:41 ` Fix alloc_filespec() initialization Linus Torvalds
2005-09-15 2:23 ` git-diff-tree rename detection bug Paul Mackerras
2005-09-15 2:26 ` Paul Mackerras
2005-09-15 3:36 ` Linus Torvalds
2005-09-15 4:52 ` Paul Mackerras
2005-09-15 8:17 ` Junio C Hamano
2005-09-15 14:49 ` Josef Weidendorfer
2005-09-20 10:50 ` [Valgrind-developers] " Paul Mackerras
2005-09-15 3:29 ` Linus Torvalds
2005-09-16 5:59 ` Junio C Hamano
2005-09-15 5:58 ` H. Peter Anvin
2005-09-15 7:41 ` Junio C Hamano
2005-09-15 14:55 ` Linus Torvalds
2005-09-16 12:53 ` Matthias Urlichs
2005-09-15 20:57 ` Nicolas Pitre
2005-09-14 18:51 ` Junio C Hamano
[not found] ` <59a6e5830509141208282166c8@mail.gmail.com>
2005-09-14 19:19 ` Junio C Hamano
2005-09-14 20:53 ` Linus Torvalds
2005-09-14 21:55 ` Junio C Hamano
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).