* general protection faults with "git grep" version 1.7.7.1
@ 2011-10-24 20:11 Markus Trippelsdorf
2011-10-24 21:49 ` Richard W.M. Jones
0 siblings, 1 reply; 15+ messages in thread
From: Markus Trippelsdorf @ 2011-10-24 20:11 UTC (permalink / raw)
To: git
Suddenly I'm getting strange protection faults when I run "git grep" on
the gcc tree:
git[4245] general protection ip:7f291f01461f sp:7fff5618a8b0 error:0 in libc-2.14.90.so[7f291ef9a000+15d000]
% gdb git
GNU gdb (Gentoo 7.3.1 p1) 7.3.1
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
For bug reporting instructions, please see:
<http://bugs.gentoo.org/>...
Reading symbols from /usr/bin/git...done.
(gdb) run grep composite_pointer_type
Starting program: /usr/bin/git grep composite_pointer_type
warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7ffff7ffa000
[Thread debugging using libthread_db enabled]
[New Thread 0x7ffff7859700 (LWP 18367)]
[New Thread 0x7ffff7058700 (LWP 18368)]
[New Thread 0x7ffff6857700 (LWP 18369)]
[New Thread 0x7ffff6056700 (LWP 18370)]
[New Thread 0x7ffff5855700 (LWP 18371)]
[New Thread 0x7ffff5054700 (LWP 18372)]
[New Thread 0x7ffff4853700 (LWP 18373)]
[New Thread 0x7ffff4052700 (LWP 18374)]
Program received signal SIGSEGV, Segmentation fault.
_int_malloc (av=0x7ffff7bbb600, bytes=21) at malloc.c:3463
3463 while ((pp = catomic_compare_and_exchange_val_acq (fb, victim->fd, victim))
(gdb) bt
#0 _int_malloc (av=0x7ffff7bbb600, bytes=21) at malloc.c:3463
#1 0x00007ffff78d7300 in __GI___libc_malloc (bytes=21) at malloc.c:2924
#2 0x00007ffff78dc692 in __GI___strdup (s=0x7ffff2665760 "gcc/ada/i-cexten.ads") at strdup.c:43
#3 0x00000000004d5069 in xstrdup (str=0x7ffff2665760 "gcc/ada/i-cexten.ads") at wrapper.c:23
#4 0x000000000042c448 in grep_file_async (filename=0x7ffff2665760 "gcc/ada/i-cexten.ads", name=0x59fee0 "gcc/ada/i-cexten.ads",
opt=<optimized out>) at builtin/grep.c:148
#5 grep_file (opt=0x7fffffffbfc0, filename=0x7ffff2665760 "gcc/ada/i-cexten.ads") at builtin/grep.c:459
#6 0x000000000042ddb0 in grep_cache (cached=0, pathspec=0x7fffffffbf70, opt=0x7fffffffbfc0) at builtin/grep.c:528
#7 cmd_grep (argc=<optimized out>, argv=0x7ffff2665760, prefix=0x0) at builtin/grep.c:1062
#8 0x00000000004045b0 in run_builtin (argv=0x7fffffffe110, argc=2, p=0x536ba0) at git.c:308
#9 handle_internal_command (argc=2, argv=0x7fffffffe110) at git.c:466
#10 0x00000000004047ac in run_argv (argv=0x7fffffffdfa0, argcp=0x7fffffffdfac) at git.c:512
#11 main (argc=2, argv=0x7fffffffe110) at git.c:585
(gdb)
or:
% git grep composite_pointer_type
*** glibc detected *** git: double free or corruption (fasttop): 0x0000000001919800 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x792b6)[0x7f6ad1d392b6]
git[0x42bebb]
/lib64/libpthread.so.0(+0x7c9e)[0x7f6ad202ec9e]
/lib64/libc.so.6(clone+0x6d)[0x7f6ad1d99b8d]
======= Memory map: ========
00400000-00536000 r-xp 00000000 08:12 2310166 /usr/bin/git
00536000-0053d000 rw-p 00136000 08:12 2310166 /usr/bin/git
0053d000-0058b000 rw-p 00000000 00:00 0
01906000-01927000 rw-p 00000000 00:00 0 [heap]
...
And strange output:
...
gcc/cp/typeck.c: result_type = composite_pointer_type (type0, type1, op0, op1,
gcc/cp/typeck.c: result_type = composite_pointer_type (type0, type1, op0, op1,
error: 'gcc/testsuite/ada/acats/tests/c5/c54a24a.ada': short read No such file or directory
error: 'gcc/testsuite/ada/acats/tests/c5/c54a13a.ada': short read No such file or directory
error: 'gcc/testsuite/ada/acats/tests/c6/c64104m.ada': short read No such file or directory
error: 'gcc/testsuite/ada/acats/tests/cd/cd7101f.dep': short read No such file or directory
error: 'gcc/testsuite/ada/acats/tests/ce/ce3904a.ada': short read No such file or directory
error: 'gcc/testsuite/g++.dg/abi/pr39188-3b.C': short read No such file or directory
error: '<90>Dz^A': short read Is a directory
Note that all the above files actually exist.
All of this started with version v1.7.7.1, which I installed today. I
never had any problems with git before.
Any ideas what might be going on?
--
Markus
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: general protection faults with "git grep" version 1.7.7.1
2011-10-24 20:11 general protection faults with "git grep" version 1.7.7.1 Markus Trippelsdorf
@ 2011-10-24 21:49 ` Richard W.M. Jones
2011-10-24 22:58 ` Markus Trippelsdorf
2011-10-25 13:50 ` Thomas Rast
0 siblings, 2 replies; 15+ messages in thread
From: Richard W.M. Jones @ 2011-10-24 21:49 UTC (permalink / raw)
To: Markus Trippelsdorf, meyering; +Cc: git
On Mon, Oct 24, 2011 at 10:11:53PM +0200, Markus Trippelsdorf wrote:
> Suddenly I'm getting strange protection faults when I run "git grep" on
> the gcc tree:
Jim Meyering and I are trying to chase what looks like a similar or
identical bug in git-grep. We've not got much further than gdb and
valgrind so far, but see:
https://bugzilla.redhat.com/show_bug.cgi?id=747377
It's slightly suspicious that this bug only started to happen with the
latest glibc, but that could be coincidence, or could be just that
glibc exposes a latent bug in git-grep.
Rich.
--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine. Supports Linux and Windows.
http://et.redhat.com/~rjones/virt-df/
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: general protection faults with "git grep" version 1.7.7.1
2011-10-24 21:49 ` Richard W.M. Jones
@ 2011-10-24 22:58 ` Markus Trippelsdorf
2011-10-25 0:00 ` Bernt Hansen
2011-10-25 13:50 ` Thomas Rast
1 sibling, 1 reply; 15+ messages in thread
From: Markus Trippelsdorf @ 2011-10-24 22:58 UTC (permalink / raw)
To: Richard W.M. Jones; +Cc: meyering, git
On 2011.10.24 at 22:49 +0100, Richard W.M. Jones wrote:
> On Mon, Oct 24, 2011 at 10:11:53PM +0200, Markus Trippelsdorf wrote:
> > Suddenly I'm getting strange protection faults when I run "git grep" on
> > the gcc tree:
>
> Jim Meyering and I are trying to chase what looks like a similar or
> identical bug in git-grep. We've not got much further than gdb and
> valgrind so far, but see:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=747377
>
> It's slightly suspicious that this bug only started to happen with the
> latest glibc, but that could be coincidence, or could be just that
> glibc exposes a latent bug in git-grep.
Thanks for the pointer.
Compiling git with -O1 "solves" the problem for me.
This issue is independent of the exact git version being used (I tried
three different ones and always hit the problem).
It happens always on the _second_ run of "git grep" on my machine. The
first run always succeeds. So this might be a cache related issue.
--
Markus
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: general protection faults with "git grep" version 1.7.7.1
2011-10-24 22:58 ` Markus Trippelsdorf
@ 2011-10-25 0:00 ` Bernt Hansen
2011-10-25 5:53 ` Jeff King
0 siblings, 1 reply; 15+ messages in thread
From: Bernt Hansen @ 2011-10-25 0:00 UTC (permalink / raw)
To: Markus Trippelsdorf, Jeff King; +Cc: Richard W.M. Jones, meyering, git
Markus Trippelsdorf <markus@trippelsdorf.de> writes:
> On 2011.10.24 at 22:49 +0100, Richard W.M. Jones wrote:
>> On Mon, Oct 24, 2011 at 10:11:53PM +0200, Markus Trippelsdorf wrote:
>> > Suddenly I'm getting strange protection faults when I run "git grep" on
>> > the gcc tree:
>>
>> Jim Meyering and I are trying to chase what looks like a similar or
>> identical bug in git-grep. We've not got much further than gdb and
>> valgrind so far, but see:
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=747377
>>
>> It's slightly suspicious that this bug only started to happen with the
>> latest glibc, but that could be coincidence, or could be just that
>> glibc exposes a latent bug in git-grep.
>
> Thanks for the pointer.
>
> Compiling git with -O1 "solves" the problem for me.
> This issue is independent of the exact git version being used (I tried
> three different ones and always hit the problem).
> It happens always on the _second_ run of "git grep" on my machine. The
> first run always succeeds. So this might be a cache related issue.
Hi,
I updated from an old commit 2883969 (Sync with maint, 2011-10-15)
to origin/master 10b2a48 (Merge branch 'maint', 2011-10-23) today and
promptly got segfaults on git status in my org-mode repository.
Going back to the old commit makes it work again.
Git bisect identifies the following commit as the problem:
[2548183badb98d62079beea62f9d2e1f47e99902] fix phantom untracked files when core.ignorecase is set
I'm doing make && make install to my local home directory.
On the above commit I get this:
--8<---------------cut here---------------start------------->8---
bernt@gollum:~/git/org$ git status
*** glibc detected *** git: free(): invalid next size (normal): 0x08ce1a88 ***
======= Backtrace: =========
/lib/i686/cmov/libc.so.6(+0x6b281)[0xb749d281]
/lib/i686/cmov/libc.so.6(+0x6cad8)[0xb749ead8]
/lib/i686/cmov/libc.so.6(cfree+0x6d)[0xb74a1bbd]
/lib/i686/cmov/libc.so.6(+0x5c0a0)[0xb748e0a0]
/lib/i686/cmov/libc.so.6(fopen64+0x2c)[0xb749067c]
git[0x80ba49c]
git[0x810d0ab]
git[0x80627af]
git[0x804b867]
git[0x804ba73]
/lib/i686/cmov/libc.so.6(__libc_start_main+0xe6)[0xb7448c76]
git[0x804b141]
======= Memory map: ========
08048000-0814f000 r-xp 00000000 08:01 6555228 /home/bernt/git/bin/git
0814f000-08154000 rw-p 00106000 08:01 6555228 /home/bernt/git/bin/git
08154000-0819c000 rw-p 00000000 00:00 0
08cd7000-08cf8000 rw-p 00000000 00:00 0 [heap]
b7300000-b7321000 rw-p 00000000 00:00 0
b7321000-b7400000 ---p 00000000 00:00 0
b740f000-b742c000 r-xp 00000000 08:01 4603917 /lib/libgcc_s.so.1
b742c000-b742d000 rw-p 0001c000 08:01 4603917 /lib/libgcc_s.so.1
b742d000-b742e000 rw-p 00000000 00:00 0
b742e000-b7430000 r-xp 00000000 08:01 4620339 /lib/i686/cmov/libdl-2.11.2.so
b7430000-b7431000 r--p 00001000 08:01 4620339 /lib/i686/cmov/libdl-2.11.2.so
b7431000-b7432000 rw-p 00002000 08:01 4620339 /lib/i686/cmov/libdl-2.11.2.so
b7432000-b7572000 r-xp 00000000 08:01 4622760 /lib/i686/cmov/libc-2.11.2.so
b7572000-b7574000 r--p 0013f000 08:01 4622760 /lib/i686/cmov/libc-2.11.2.so
b7574000-b7575000 rw-p 00141000 08:01 4622760 /lib/i686/cmov/libc-2.11.2.so
b7575000-b7578000 rw-p 00000000 00:00 0
b7578000-b758d000 r-xp 00000000 08:01 4620333 /lib/i686/cmov/libpthread-2.11.2.so
b758d000-b758e000 r--p 00014000 08:01 4620333 /lib/i686/cmov/libpthread-2.11.2.so
b758e000-b758f000 rw-p 00015000 08:01 4620333 /lib/i686/cmov/libpthread-2.11.2.so
b758f000-b7592000 rw-p 00000000 00:00 0
b7592000-b76cf000 r-xp 00000000 08:01 794957 /usr/lib/i686/cmov/libcrypto.so.0.9.8
b76cf000-b76e7000 rw-p 0013c000 08:01 794957 /usr/lib/i686/cmov/libcrypto.so.0.9.8
b76e7000-b76ea000 rw-p 00000000 00:00 0
b76ea000-b76fd000 r-xp 00000000 08:01 286811 /usr/lib/libz.so.1.2.3.4
b76fd000-b76fe000 rw-p 00013000 08:01 286811 /usr/lib/libz.so.1.2.3.4
b771b000-b771d000 rw-p 00000000 00:00 0
b771d000-b771e000 r-xp 00000000 00:00 0 [vdso]
b771e000-b7739000 r-xp 00000000 08:01 4604271 /lib/ld-2.11.2.so
b7739000-b773a000 r--p 0001a000 08:01 4604271 /lib/ld-2.11.2.so
b773a000-b773b000 rw-p 0001b000 08:01 4604271 /lib/ld-2.11.2.so
bfa3d000-bfa52000 rw-p 00000000 00:00 0 [stack]
Aborted (core dumped)
--8<---------------cut here---------------end--------------->8---
Let me know if I can provide any more information.
Regards
Bernt
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: general protection faults with "git grep" version 1.7.7.1
2011-10-25 0:00 ` Bernt Hansen
@ 2011-10-25 5:53 ` Jeff King
2011-10-25 11:11 ` Bernt Hansen
0 siblings, 1 reply; 15+ messages in thread
From: Jeff King @ 2011-10-25 5:53 UTC (permalink / raw)
To: Bernt Hansen; +Cc: Markus Trippelsdorf, Richard W.M. Jones, meyering, git
On Mon, Oct 24, 2011 at 08:00:15PM -0400, Bernt Hansen wrote:
> I updated from an old commit 2883969 (Sync with maint, 2011-10-15)
> to origin/master 10b2a48 (Merge branch 'maint', 2011-10-23) today and
> promptly got segfaults on git status in my org-mode repository.
>
> Going back to the old commit makes it work again.
>
> Git bisect identifies the following commit as the problem:
>
> [2548183badb98d62079beea62f9d2e1f47e99902] fix phantom untracked files when core.ignorecase is set
I think this is a separate problem. See this thread and patch:
http://thread.gmane.org/gmane.comp.version-control.git/184094/focus=184148
-Peff
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: general protection faults with "git grep" version 1.7.7.1
2011-10-25 5:53 ` Jeff King
@ 2011-10-25 11:11 ` Bernt Hansen
0 siblings, 0 replies; 15+ messages in thread
From: Bernt Hansen @ 2011-10-25 11:11 UTC (permalink / raw)
To: Jeff King; +Cc: Markus Trippelsdorf, Richard W.M. Jones, meyering, git
Jeff King <peff@peff.net> writes:
> On Mon, Oct 24, 2011 at 08:00:15PM -0400, Bernt Hansen wrote:
>
>> I updated from an old commit 2883969 (Sync with maint, 2011-10-15)
>> to origin/master 10b2a48 (Merge branch 'maint', 2011-10-23) today and
>> promptly got segfaults on git status in my org-mode repository.
>>
>> Going back to the old commit makes it work again.
>>
>> Git bisect identifies the following commit as the problem:
>>
>> [2548183badb98d62079beea62f9d2e1f47e99902] fix phantom untracked files when core.ignorecase is set
>
> I think this is a separate problem. See this thread and patch:
>
> http://thread.gmane.org/gmane.comp.version-control.git/184094/focus=184148
Thanks,
I'll look at that thread.
-Bernt
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: general protection faults with "git grep" version 1.7.7.1
2011-10-24 21:49 ` Richard W.M. Jones
2011-10-24 22:58 ` Markus Trippelsdorf
@ 2011-10-25 13:50 ` Thomas Rast
2011-10-25 15:17 ` Jim Meyering
2011-10-25 15:37 ` Jeff King
1 sibling, 2 replies; 15+ messages in thread
From: Thomas Rast @ 2011-10-25 13:50 UTC (permalink / raw)
To: Richard W.M. Jones
Cc: Markus Trippelsdorf, meyering, git, Shawn O. Pearce, Jeff King,
Nicolas Pitre
[Shawn, Peff, Nicolas: maybe you can say something on the
(non)raciness of xmalloc() in parallel with read_sha1_file(). See the
last paragraph below.]
Richard W.M. Jones wrote:
> On Mon, Oct 24, 2011 at 10:11:53PM +0200, Markus Trippelsdorf wrote:
> > Suddenly I'm getting strange protection faults when I run "git grep" on
> > the gcc tree:
>
> Jim Meyering and I are trying to chase what looks like a similar or
> identical bug in git-grep. We've not got much further than gdb and
> valgrind so far, but see:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=747377
>
> It's slightly suspicious that this bug only started to happen with the
> latest glibc, but that could be coincidence, or could be just that
> glibc exposes a latent bug in git-grep.
I'm tempted to write this off as a GCC bug. If that's ok for you,
I'll leave further investigation and communication with the GCC folks
to you.
My findings are as follows:
It's easy to reproduce the behavior described in the above bug report,
using an F16 beta install in a VM. I gave the VM two cores, but
didn't test what happens with only one. By "easy" I mean I didn't
have to do any fiddling and it crashes at least one out of two times.
I looked at how git builds grep.o by saying
rm builtin/grep.o; make V=1
I then modified this to give me the assembly output from the compiler
gcc -S -s builtin/grep.o -c -MF builtin/.depend/grep.o.d -MMD -MP -g -O2 -Wall -I. -DHAVE_PATHS_H -DSHA1_HEADER='<openssl/sha.h>' -DNO_STRLCPY -DNO_MKSTEMPS builtin/grep.c
and looked at the result. To interpret the output, I would like to
remind you of the following snippets:
#define grep_lock() pthread_mutex_lock(&grep_mutex)
#define grep_unlock() pthread_mutex_unlock(&grep_mutex)
...
static struct work_item *get_work(void)
{
struct work_item *ret;
grep_lock();
while (todo_start == todo_end && !all_work_added) {
pthread_cond_wait(&cond_add, &grep_mutex);
}
...
}
...
static void *run(void *arg)
{
int hit = 0;
struct grep_opt *opt = arg;
while (1) {
struct work_item *w = get_work();
...
}
...
}
Getting back to assembly, near the beginning of run() I see (labels
and .p2align snipped):
.loc 1 162 0
movl todo_end(%rip), %ebx
.loc 1 125 0
movl $grep_mutex, %edi
call pthread_mutex_lock
.loc 1 126 0
movl todo_start(%rip), %eax
cmpl %ebx, %eax
I should say that I don't really know much about assembly, in
particular not enough to write two correct lines of it. But I can't
help noticing that it moved the load of todo_end *out of* the section
where grep_mutex is locked. And the comment near the top of the file
does say that the whole todo_* family is supposed to be protected by
that mutex. What's extra odd is that the .loc seems to indicate that
the moved load comes from work_done() instead of get_work(), which is
an entirely separate locked section!
Un-inlining the get_work helper using __attribute__((noinline)) makes
the assembly
movl $grep_mutex, %edi
call pthread_mutex_lock
.loc 1 127 0
movl todo_start(%rip), %eax
cmpl todo_end(%rip), %eax
je .L15
instead; i.e., the load is now after the lock. (Note that line
numbers were wiggled by inserting an __attribute__ line.) The
beginning of run() turns into exactly the same code if I instead
prohibit inlining of work_done().
So AFAICS, we're just unlucky to hit a GCC optimizer bug that voids
all guarantees given on locks.
That being said, I'm not entirely convinced that the code in
builtin/grep.c works in the face of memory pressure. It guards
against concurrent access to read_sha1_file() with the
read_sha1_mutex, but any call to xmalloc() outside of that mutex can
still potentially invoke the try_to_free_routine. Maybe one of the
pack experts can say whether this is safe. (However, I implemented
locking around try_to_free_routine as a quick hack and it did not fix
the issue discussed in the bug report.)
--
Thomas Rast
trast@{inf,student}.ethz.ch
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: general protection faults with "git grep" version 1.7.7.1
2011-10-25 13:50 ` Thomas Rast
@ 2011-10-25 15:17 ` Jim Meyering
2011-10-25 15:32 ` Markus Trippelsdorf
2011-10-25 16:00 ` Thomas Rast
2011-10-25 15:37 ` Jeff King
1 sibling, 2 replies; 15+ messages in thread
From: Jim Meyering @ 2011-10-25 15:17 UTC (permalink / raw)
To: Thomas Rast
Cc: Richard W.M. Jones, Markus Trippelsdorf, git, Shawn O. Pearce,
Jeff King, Nicolas Pitre
Thomas Rast wrote:
> [Shawn, Peff, Nicolas: maybe you can say something on the
> (non)raciness of xmalloc() in parallel with read_sha1_file(). See the
> last paragraph below.]
>
> Richard W.M. Jones wrote:
>> On Mon, Oct 24, 2011 at 10:11:53PM +0200, Markus Trippelsdorf wrote:
>> > Suddenly I'm getting strange protection faults when I run "git grep" on
>> > the gcc tree:
>>
>> Jim Meyering and I are trying to chase what looks like a similar or
>> identical bug in git-grep. We've not got much further than gdb and
>> valgrind so far, but see:
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=747377
>>
>> It's slightly suspicious that this bug only started to happen with the
>> latest glibc, but that could be coincidence, or could be just that
>> glibc exposes a latent bug in git-grep.
>
> I'm tempted to write this off as a GCC bug. If that's ok for you,
> I'll leave further investigation and communication with the GCC folks
> to you.
>
> My findings are as follows:
>
> It's easy to reproduce the behavior described in the above bug report,
> using an F16 beta install in a VM. I gave the VM two cores, but
> didn't test what happens with only one. By "easy" I mean I didn't
> have to do any fiddling and it crashes at least one out of two times.
>
> I looked at how git builds grep.o by saying
>
> rm builtin/grep.o; make V=1
>
> I then modified this to give me the assembly output from the compiler
>
> gcc -S -s builtin/grep.o -c -MF builtin/.depend/grep.o.d -MMD -MP -g -O2 -Wall -I. -DHAVE_PATHS_H -DSHA1_HEADER='<openssl/sha.h>' -DNO_STRLCPY -DNO_MKSTEMPS builtin/grep.c
...
> So AFAICS, we're just unlucky to hit a GCC optimizer bug that voids
> all guarantees given on locks.
Thanks for the investigation.
Actually, isn't gcc -O2's code-motion justified?
While we *know* that those globals may be modified asynchronously,
builtin/grep.c forgot to tell gcc about that.
Once you do that (via "volatile"), gcc knows not to move things.
This patch solved the problem for me:
>From 8521b8033b8ecbff2e459f9e0070beb712b9b73d Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering@redhat.com>
Date: Tue, 25 Oct 2011 17:07:05 +0200
Subject: [PATCH] declare grep's thread-related global scalars to be
"volatile"
This avoids heap corruption problems that would otherwise
arise when gcc -O2 moves code out of critical sections.
For details, see http://bugzilla.redhat.com/747377 and
http://thread.gmane.org/gmane.comp.version-control.git/184184/focus=184205
Signed-off-by: Jim Meyering <meyering@redhat.com>
---
builtin/grep.c | 12 ++++++------
1 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/builtin/grep.c b/builtin/grep.c
index 7d0779f..38f92de 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -64,14 +64,14 @@ struct work_item {
*/
#define TODO_SIZE 128
static struct work_item todo[TODO_SIZE];
-static int todo_start;
-static int todo_end;
-static int todo_done;
+static volatile int todo_start;
+static volatile int todo_end;
+static volatile int todo_done;
-/* Has all work items been added? */
-static int all_work_added;
+/* Have all work items been added? */
+static volatile int all_work_added;
-/* This lock protects all the variables above. */
+/* This lock protects all of the above variables. */
static pthread_mutex_t grep_mutex;
/* Used to serialize calls to read_sha1_file. */
--
1.7.7.419.g87009
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: general protection faults with "git grep" version 1.7.7.1
2011-10-25 15:17 ` Jim Meyering
@ 2011-10-25 15:32 ` Markus Trippelsdorf
2011-10-25 16:00 ` Thomas Rast
1 sibling, 0 replies; 15+ messages in thread
From: Markus Trippelsdorf @ 2011-10-25 15:32 UTC (permalink / raw)
To: Jim Meyering
Cc: Thomas Rast, Richard W.M. Jones, git, Shawn O. Pearce, Jeff King,
Nicolas Pitre
On 2011.10.25 at 17:17 +0200, Jim Meyering wrote:
> Thomas Rast wrote:
> > [Shawn, Peff, Nicolas: maybe you can say something on the
> > (non)raciness of xmalloc() in parallel with read_sha1_file(). See the
> > last paragraph below.]
> >
> > Richard W.M. Jones wrote:
> >> On Mon, Oct 24, 2011 at 10:11:53PM +0200, Markus Trippelsdorf wrote:
> >> > Suddenly I'm getting strange protection faults when I run "git grep" on
> >> > the gcc tree:
> >>
> >> Jim Meyering and I are trying to chase what looks like a similar or
> >> identical bug in git-grep. We've not got much further than gdb and
> >> valgrind so far, but see:
> >>
> >> https://bugzilla.redhat.com/show_bug.cgi?id=747377
> >>
> >> It's slightly suspicious that this bug only started to happen with the
> >> latest glibc, but that could be coincidence, or could be just that
> >> glibc exposes a latent bug in git-grep.
> >
> > I'm tempted to write this off as a GCC bug. If that's ok for you,
> > I'll leave further investigation and communication with the GCC folks
> > to you.
> >
> > My findings are as follows:
> >
> > It's easy to reproduce the behavior described in the above bug report,
> > using an F16 beta install in a VM. I gave the VM two cores, but
> > didn't test what happens with only one. By "easy" I mean I didn't
> > have to do any fiddling and it crashes at least one out of two times.
> >
> > I looked at how git builds grep.o by saying
> >
> > rm builtin/grep.o; make V=1
> >
> > I then modified this to give me the assembly output from the compiler
> >
> > gcc -S -s builtin/grep.o -c -MF builtin/.depend/grep.o.d -MMD -MP -g -O2 -Wall -I. -DHAVE_PATHS_H -DSHA1_HEADER='<openssl/sha.h>' -DNO_STRLCPY -DNO_MKSTEMPS builtin/grep.c
> ...
> > So AFAICS, we're just unlucky to hit a GCC optimizer bug that voids
> > all guarantees given on locks.
>
> Thanks for the investigation.
> Actually, isn't gcc -O2's code-motion justified?
> While we *know* that those globals may be modified asynchronously,
> builtin/grep.c forgot to tell gcc about that.
> Once you do that (via "volatile"), gcc knows not to move things.
>
> This patch solved the problem for me:
Yes. This fixes the issue here also.
(BTW the only recent pthread related change in glibc was the removal
of the gettimeofday vsyscall)
--
Markus
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: general protection faults with "git grep" version 1.7.7.1
2011-10-25 15:17 ` Jim Meyering
2011-10-25 15:32 ` Markus Trippelsdorf
@ 2011-10-25 16:00 ` Thomas Rast
2011-10-25 16:07 ` Thomas Rast
2011-10-25 16:37 ` Jim Meyering
1 sibling, 2 replies; 15+ messages in thread
From: Thomas Rast @ 2011-10-25 16:00 UTC (permalink / raw)
To: Jim Meyering
Cc: Richard W.M. Jones, Markus Trippelsdorf, git, Shawn O. Pearce,
Jeff King, Nicolas Pitre
Jim Meyering wrote:
> Thomas Rast wrote:
> > [GCC moves access to a file-static variable across pthread_mutex_lock()]
>
> Thanks for the investigation.
> Actually, isn't gcc -O2's code-motion justified?
> While we *know* that those globals may be modified asynchronously,
> builtin/grep.c forgot to tell gcc about that.
I'm somewhat unwilling to believe that:
* "volatile" enforces three unrelated things, see e.g. [1].
* Removing "static" would do the same as it prevents the compiler from
proving at compile-time that pthread_mutex_lock() cannot affect the
variable in question.
If this is correct, it also means that all code in all pthreads
tutorials I can find works merely by the accident of not declaring
their variables "static".
Furthermore, a future smarter compiler with better link-time
optimization might again prove the same and eliminate the
"superfluous" load.
However, as a result of the discussion I now have a shorter testcase:
#include <pthread.h>
int y;
static int x;
pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
void test ()
{
y = x;
pthread_mutex_lock(&m);
x = x + 1;
pthread_mutex_unlock(&m);
}
GCC 4.6.1 on F16 again assumes 'x' was not modified across the lock.
I also tested GCC 4.5.1 and 4.4.5, which instead issue a direct
add-to-memory instruction
addl $1, x(%rip)
in the locked part.
In the event that you and GCC 4.6.1 are right, I still vote for
removing 'static' instead of adding 'volatile' so as to allow basic
optimizations.
--
Thomas Rast
trast@{inf,student}.ethz.ch
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: general protection faults with "git grep" version 1.7.7.1
2011-10-25 16:00 ` Thomas Rast
@ 2011-10-25 16:07 ` Thomas Rast
2011-10-25 16:37 ` Jim Meyering
1 sibling, 0 replies; 15+ messages in thread
From: Thomas Rast @ 2011-10-25 16:07 UTC (permalink / raw)
To: Jim Meyering
Cc: Richard W.M. Jones, Markus Trippelsdorf, git, Shawn O. Pearce,
Jeff King, Nicolas Pitre
Thomas Rast wrote:
> Jim Meyering wrote:
> > Thomas Rast wrote:
> > > [GCC moves access to a file-static variable across pthread_mutex_lock()]
> >
> > Thanks for the investigation.
> > Actually, isn't gcc -O2's code-motion justified?
> > While we *know* that those globals may be modified asynchronously,
> > builtin/grep.c forgot to tell gcc about that.
>
> I'm somewhat unwilling to believe that:
>
> * "volatile" enforces three unrelated things, see e.g. [1].
Argh, forgot my reference:
[1] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2016.html
section "Existing portable uses"
--
Thomas Rast
trast@{inf,student}.ethz.ch
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: general protection faults with "git grep" version 1.7.7.1
2011-10-25 16:00 ` Thomas Rast
2011-10-25 16:07 ` Thomas Rast
@ 2011-10-25 16:37 ` Jim Meyering
2011-10-25 16:54 ` Thomas Rast
1 sibling, 1 reply; 15+ messages in thread
From: Jim Meyering @ 2011-10-25 16:37 UTC (permalink / raw)
To: Thomas Rast
Cc: Richard W.M. Jones, Markus Trippelsdorf, git, Shawn O. Pearce,
Jeff King, Nicolas Pitre
Thomas Rast wrote:
> Jim Meyering wrote:
>> Thomas Rast wrote:
>> > [GCC moves access to a file-static variable across pthread_mutex_lock()]
>>
>> Thanks for the investigation.
>> Actually, isn't gcc -O2's code-motion justified?
>> While we *know* that those globals may be modified asynchronously,
>> builtin/grep.c forgot to tell gcc about that.
>
> I'm somewhat unwilling to believe that:
You're right to be skeptical.
I should have stuck with "using volatile works around the problem for me".
The real problem seems to be in glibc, with its addition of
the "leaf" attribute to those synchronization primitives:
http://bugzilla.redhat.com/747377#c22
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: general protection faults with "git grep" version 1.7.7.1
2011-10-25 16:37 ` Jim Meyering
@ 2011-10-25 16:54 ` Thomas Rast
2011-10-25 20:24 ` Jim Meyering
0 siblings, 1 reply; 15+ messages in thread
From: Thomas Rast @ 2011-10-25 16:54 UTC (permalink / raw)
To: Jim Meyering
Cc: Richard W.M. Jones, Markus Trippelsdorf, git, Shawn O. Pearce,
Jeff King, Nicolas Pitre
Jim Meyering wrote:
> Thomas Rast wrote:
> > Jim Meyering wrote:
> >> Thomas Rast wrote:
> >> > [GCC moves access to a file-static variable across pthread_mutex_lock()]
> >>
> >> Thanks for the investigation.
> >> Actually, isn't gcc -O2's code-motion justified?
> >> While we *know* that those globals may be modified asynchronously,
> >> builtin/grep.c forgot to tell gcc about that.
> >
> > I'm somewhat unwilling to believe that:
>
> You're right to be skeptical.
> I should have stuck with "using volatile works around the problem for me".
> The real problem seems to be in glibc, with its addition of
> the "leaf" attribute to those synchronization primitives:
>
> http://bugzilla.redhat.com/747377#c22
Aha. Glad you found it :-)
Meanwhile I read
http://www.hpl.hp.com/techreports/2004/HPL-2004-209.html
which discusses a similar issue in section 4.3, but is very
interesting on its own. It's funny how it says
We know of at least three optimizing compilers (two of them
production compilers) that performed this transformation at some
point during their lifetime; usually at least partially reversing
the decision when the implications on multi-threaded code became
known.
I guess that would be four now if it was literally the same problem.
--
Thomas Rast
trast@{inf,student}.ethz.ch
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: general protection faults with "git grep" version 1.7.7.1
2011-10-25 16:54 ` Thomas Rast
@ 2011-10-25 20:24 ` Jim Meyering
0 siblings, 0 replies; 15+ messages in thread
From: Jim Meyering @ 2011-10-25 20:24 UTC (permalink / raw)
To: Thomas Rast
Cc: Richard W.M. Jones, Markus Trippelsdorf, git, Shawn O. Pearce,
Jeff King, Nicolas Pitre
Thomas Rast wrote:
...
>> The real problem seems to be in glibc, with its addition of
>> the "leaf" attribute to those synchronization primitives:
>>
>> http://bugzilla.redhat.com/747377#c22
>
> Aha. Glad you found it :-)
>
> Meanwhile I read
>
> http://www.hpl.hp.com/techreports/2004/HPL-2004-209.html
>
> which discusses a similar issue in section 4.3, but is very
> interesting on its own. It's funny how it says
>
> We know of at least three optimizing compilers (two of them
> production compilers) that performed this transformation at some
> point during their lifetime; usually at least partially reversing
> the decision when the implications on multi-threaded code became
> known.
>
> I guess that would be four now if it was literally the same problem.
Yep. For those not following the BZ comments at the about URL,
POSIX is quite clear. Quoting from
http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap04.html#tag_04_11:
The following functions synchronize memory with respect to other threads:
fork
pthread_barrier_wait
pthread_cond_broadcast
pthread_cond_signal
pthread_cond_timedwait
pthread_cond_wait
pthread_create
pthread_join
pthread_mutex_lock
pthread_mutex_timedlock
pthread_mutex_trylock
pthread_mutex_unlock
pthread_spin_lock
pthread_spin_trylock
pthread_spin_unlock
pthread_rwlock_rdlock
pthread_rwlock_timedrdlock
pthread_rwlock_timedwrlock
pthread_rwlock_tryrdlock
pthread_rwlock_trywrlock
pthread_rwlock_unlock
pthread_rwlock_wrlock
sem_post
sem_timedwait
sem_trywait
sem_wait
semctl
semop
wait
waitpid
glibc's addition of the leaf attribute to any of those
appears to make gcc violate that.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: general protection faults with "git grep" version 1.7.7.1
2011-10-25 13:50 ` Thomas Rast
2011-10-25 15:17 ` Jim Meyering
@ 2011-10-25 15:37 ` Jeff King
1 sibling, 0 replies; 15+ messages in thread
From: Jeff King @ 2011-10-25 15:37 UTC (permalink / raw)
To: Thomas Rast
Cc: Richard W.M. Jones, Markus Trippelsdorf, meyering, git,
Shawn O. Pearce, Nicolas Pitre
On Tue, Oct 25, 2011 at 03:50:21PM +0200, Thomas Rast wrote:
> That being said, I'm not entirely convinced that the code in
> builtin/grep.c works in the face of memory pressure. It guards
> against concurrent access to read_sha1_file() with the
> read_sha1_mutex, but any call to xmalloc() outside of that mutex can
> still potentially invoke the try_to_free_routine. Maybe one of the
> pack experts can say whether this is safe. (However, I implemented
> locking around try_to_free_routine as a quick hack and it did not fix
> the issue discussed in the bug report.)
Yes, I think it needs to set try_to_free_routine. See this thread:
http://thread.gmane.org/gmane.comp.version-control.git/180446
which discusses a possible subtlety with doing so.
-Peff
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2011-10-25 20:24 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-10-24 20:11 general protection faults with "git grep" version 1.7.7.1 Markus Trippelsdorf
2011-10-24 21:49 ` Richard W.M. Jones
2011-10-24 22:58 ` Markus Trippelsdorf
2011-10-25 0:00 ` Bernt Hansen
2011-10-25 5:53 ` Jeff King
2011-10-25 11:11 ` Bernt Hansen
2011-10-25 13:50 ` Thomas Rast
2011-10-25 15:17 ` Jim Meyering
2011-10-25 15:32 ` Markus Trippelsdorf
2011-10-25 16:00 ` Thomas Rast
2011-10-25 16:07 ` Thomas Rast
2011-10-25 16:37 ` Jim Meyering
2011-10-25 16:54 ` Thomas Rast
2011-10-25 20:24 ` Jim Meyering
2011-10-25 15:37 ` Jeff King
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).