git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* general protection faults with "git grep" version 1.7.7.1
@ 2011-10-24 20:11 Markus Trippelsdorf
  2011-10-24 21:49 ` Richard W.M. Jones
  0 siblings, 1 reply; 15+ messages in thread
From: Markus Trippelsdorf @ 2011-10-24 20:11 UTC (permalink / raw)
  To: git

Suddenly I'm getting strange protection faults when I run "git grep" on
the gcc tree:

git[4245] general protection ip:7f291f01461f sp:7fff5618a8b0 error:0 in libc-2.14.90.so[7f291ef9a000+15d000]

 % gdb git
GNU gdb (Gentoo 7.3.1 p1) 7.3.1
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
For bug reporting instructions, please see:
<http://bugs.gentoo.org/>...
Reading symbols from /usr/bin/git...done.
(gdb) run grep composite_pointer_type
Starting program: /usr/bin/git grep composite_pointer_type
warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7ffff7ffa000
[Thread debugging using libthread_db enabled]
[New Thread 0x7ffff7859700 (LWP 18367)]
[New Thread 0x7ffff7058700 (LWP 18368)]
[New Thread 0x7ffff6857700 (LWP 18369)]
[New Thread 0x7ffff6056700 (LWP 18370)]
[New Thread 0x7ffff5855700 (LWP 18371)]
[New Thread 0x7ffff5054700 (LWP 18372)]
[New Thread 0x7ffff4853700 (LWP 18373)]
[New Thread 0x7ffff4052700 (LWP 18374)]

Program received signal SIGSEGV, Segmentation fault.
_int_malloc (av=0x7ffff7bbb600, bytes=21) at malloc.c:3463
3463        while ((pp = catomic_compare_and_exchange_val_acq (fb, victim->fd, victim))
(gdb) bt
#0  _int_malloc (av=0x7ffff7bbb600, bytes=21) at malloc.c:3463
#1  0x00007ffff78d7300 in __GI___libc_malloc (bytes=21) at malloc.c:2924
#2  0x00007ffff78dc692 in __GI___strdup (s=0x7ffff2665760 "gcc/ada/i-cexten.ads") at strdup.c:43
#3  0x00000000004d5069 in xstrdup (str=0x7ffff2665760 "gcc/ada/i-cexten.ads") at wrapper.c:23
#4  0x000000000042c448 in grep_file_async (filename=0x7ffff2665760 "gcc/ada/i-cexten.ads", name=0x59fee0 "gcc/ada/i-cexten.ads", 
    opt=<optimized out>) at builtin/grep.c:148
#5  grep_file (opt=0x7fffffffbfc0, filename=0x7ffff2665760 "gcc/ada/i-cexten.ads") at builtin/grep.c:459
#6  0x000000000042ddb0 in grep_cache (cached=0, pathspec=0x7fffffffbf70, opt=0x7fffffffbfc0) at builtin/grep.c:528
#7  cmd_grep (argc=<optimized out>, argv=0x7ffff2665760, prefix=0x0) at builtin/grep.c:1062
#8  0x00000000004045b0 in run_builtin (argv=0x7fffffffe110, argc=2, p=0x536ba0) at git.c:308
#9  handle_internal_command (argc=2, argv=0x7fffffffe110) at git.c:466
#10 0x00000000004047ac in run_argv (argv=0x7fffffffdfa0, argcp=0x7fffffffdfac) at git.c:512
#11 main (argc=2, argv=0x7fffffffe110) at git.c:585
(gdb) 

or:

 % git grep composite_pointer_type
*** glibc detected *** git: double free or corruption (fasttop): 0x0000000001919800 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x792b6)[0x7f6ad1d392b6]
git[0x42bebb]
/lib64/libpthread.so.0(+0x7c9e)[0x7f6ad202ec9e]
/lib64/libc.so.6(clone+0x6d)[0x7f6ad1d99b8d]
======= Memory map: ========
00400000-00536000 r-xp 00000000 08:12 2310166                            /usr/bin/git
00536000-0053d000 rw-p 00136000 08:12 2310166                            /usr/bin/git
0053d000-0058b000 rw-p 00000000 00:00 0
01906000-01927000 rw-p 00000000 00:00 0                                  [heap]
...

And strange output:
...
gcc/cp/typeck.c:        result_type = composite_pointer_type (type0, type1, op0, op1,
gcc/cp/typeck.c:        result_type = composite_pointer_type (type0, type1, op0, op1,
error: 'gcc/testsuite/ada/acats/tests/c5/c54a24a.ada': short read No such file or directory
error: 'gcc/testsuite/ada/acats/tests/c5/c54a13a.ada': short read No such file or directory
error: 'gcc/testsuite/ada/acats/tests/c6/c64104m.ada': short read No such file or directory
error: 'gcc/testsuite/ada/acats/tests/cd/cd7101f.dep': short read No such file or directory
error: 'gcc/testsuite/ada/acats/tests/ce/ce3904a.ada': short read No such file or directory
error: 'gcc/testsuite/g++.dg/abi/pr39188-3b.C': short read No such file or directory
error: '<90>Dz^A': short read Is a directory

Note that all the above files actually exist.
All of this started with version v1.7.7.1, which I installed today. I
never had any problems with git before.
Any ideas what might be going on?
-- 
Markus

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: general protection faults with "git grep" version 1.7.7.1
  2011-10-24 20:11 general protection faults with "git grep" version 1.7.7.1 Markus Trippelsdorf
@ 2011-10-24 21:49 ` Richard W.M. Jones
  2011-10-24 22:58   ` Markus Trippelsdorf
  2011-10-25 13:50   ` Thomas Rast
  0 siblings, 2 replies; 15+ messages in thread
From: Richard W.M. Jones @ 2011-10-24 21:49 UTC (permalink / raw)
  To: Markus Trippelsdorf, meyering; +Cc: git

On Mon, Oct 24, 2011 at 10:11:53PM +0200, Markus Trippelsdorf wrote:
> Suddenly I'm getting strange protection faults when I run "git grep" on
> the gcc tree:

Jim Meyering and I are trying to chase what looks like a similar or
identical bug in git-grep.  We've not got much further than gdb and
valgrind so far, but see:

https://bugzilla.redhat.com/show_bug.cgi?id=747377

It's slightly suspicious that this bug only started to happen with the
latest glibc, but that could be coincidence, or could be just that
glibc exposes a latent bug in git-grep.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine.  Supports Linux and Windows.
http://et.redhat.com/~rjones/virt-df/

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: general protection faults with "git grep" version 1.7.7.1
  2011-10-24 21:49 ` Richard W.M. Jones
@ 2011-10-24 22:58   ` Markus Trippelsdorf
  2011-10-25  0:00     ` Bernt Hansen
  2011-10-25 13:50   ` Thomas Rast
  1 sibling, 1 reply; 15+ messages in thread
From: Markus Trippelsdorf @ 2011-10-24 22:58 UTC (permalink / raw)
  To: Richard W.M. Jones; +Cc: meyering, git

On 2011.10.24 at 22:49 +0100, Richard W.M. Jones wrote:
> On Mon, Oct 24, 2011 at 10:11:53PM +0200, Markus Trippelsdorf wrote:
> > Suddenly I'm getting strange protection faults when I run "git grep" on
> > the gcc tree:
> 
> Jim Meyering and I are trying to chase what looks like a similar or
> identical bug in git-grep.  We've not got much further than gdb and
> valgrind so far, but see:
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=747377
> 
> It's slightly suspicious that this bug only started to happen with the
> latest glibc, but that could be coincidence, or could be just that
> glibc exposes a latent bug in git-grep.

Thanks for the pointer.

Compiling git with -O1 "solves" the problem for me. 
This issue is independent of the exact git version being used (I tried
three different ones and always hit the problem).
It happens always on the _second_ run of "git grep" on my machine. The
first run always succeeds. So this might be a cache related issue.

-- 
Markus

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: general protection faults with "git grep" version 1.7.7.1
  2011-10-24 22:58   ` Markus Trippelsdorf
@ 2011-10-25  0:00     ` Bernt Hansen
  2011-10-25  5:53       ` Jeff King
  0 siblings, 1 reply; 15+ messages in thread
From: Bernt Hansen @ 2011-10-25  0:00 UTC (permalink / raw)
  To: Markus Trippelsdorf, Jeff King; +Cc: Richard W.M. Jones, meyering, git

Markus Trippelsdorf <markus@trippelsdorf.de> writes:

> On 2011.10.24 at 22:49 +0100, Richard W.M. Jones wrote:
>> On Mon, Oct 24, 2011 at 10:11:53PM +0200, Markus Trippelsdorf wrote:
>> > Suddenly I'm getting strange protection faults when I run "git grep" on
>> > the gcc tree:
>> 
>> Jim Meyering and I are trying to chase what looks like a similar or
>> identical bug in git-grep.  We've not got much further than gdb and
>> valgrind so far, but see:
>> 
>> https://bugzilla.redhat.com/show_bug.cgi?id=747377
>> 
>> It's slightly suspicious that this bug only started to happen with the
>> latest glibc, but that could be coincidence, or could be just that
>> glibc exposes a latent bug in git-grep.
>
> Thanks for the pointer.
>
> Compiling git with -O1 "solves" the problem for me. 
> This issue is independent of the exact git version being used (I tried
> three different ones and always hit the problem).
> It happens always on the _second_ run of "git grep" on my machine. The
> first run always succeeds. So this might be a cache related issue.

Hi,

I updated from an old commit 2883969 (Sync with maint, 2011-10-15)
to origin/master 10b2a48 (Merge branch 'maint', 2011-10-23) today and
promptly got segfaults on git status in my org-mode repository.

Going back to the old commit makes it work again.

Git bisect identifies the following commit as the problem:

[2548183badb98d62079beea62f9d2e1f47e99902] fix phantom untracked files when core.ignorecase is set

I'm doing make && make install to my local home directory.

On the above commit I get this:

--8<---------------cut here---------------start------------->8---
bernt@gollum:~/git/org$ git status
*** glibc detected *** git: free(): invalid next size (normal): 0x08ce1a88 ***
======= Backtrace: =========
/lib/i686/cmov/libc.so.6(+0x6b281)[0xb749d281]
/lib/i686/cmov/libc.so.6(+0x6cad8)[0xb749ead8]
/lib/i686/cmov/libc.so.6(cfree+0x6d)[0xb74a1bbd]
/lib/i686/cmov/libc.so.6(+0x5c0a0)[0xb748e0a0]
/lib/i686/cmov/libc.so.6(fopen64+0x2c)[0xb749067c]
git[0x80ba49c]
git[0x810d0ab]
git[0x80627af]
git[0x804b867]
git[0x804ba73]
/lib/i686/cmov/libc.so.6(__libc_start_main+0xe6)[0xb7448c76]
git[0x804b141]
======= Memory map: ========
08048000-0814f000 r-xp 00000000 08:01 6555228    /home/bernt/git/bin/git
0814f000-08154000 rw-p 00106000 08:01 6555228    /home/bernt/git/bin/git
08154000-0819c000 rw-p 00000000 00:00 0 
08cd7000-08cf8000 rw-p 00000000 00:00 0          [heap]
b7300000-b7321000 rw-p 00000000 00:00 0 
b7321000-b7400000 ---p 00000000 00:00 0 
b740f000-b742c000 r-xp 00000000 08:01 4603917    /lib/libgcc_s.so.1
b742c000-b742d000 rw-p 0001c000 08:01 4603917    /lib/libgcc_s.so.1
b742d000-b742e000 rw-p 00000000 00:00 0 
b742e000-b7430000 r-xp 00000000 08:01 4620339    /lib/i686/cmov/libdl-2.11.2.so
b7430000-b7431000 r--p 00001000 08:01 4620339    /lib/i686/cmov/libdl-2.11.2.so
b7431000-b7432000 rw-p 00002000 08:01 4620339    /lib/i686/cmov/libdl-2.11.2.so
b7432000-b7572000 r-xp 00000000 08:01 4622760    /lib/i686/cmov/libc-2.11.2.so
b7572000-b7574000 r--p 0013f000 08:01 4622760    /lib/i686/cmov/libc-2.11.2.so
b7574000-b7575000 rw-p 00141000 08:01 4622760    /lib/i686/cmov/libc-2.11.2.so
b7575000-b7578000 rw-p 00000000 00:00 0 
b7578000-b758d000 r-xp 00000000 08:01 4620333    /lib/i686/cmov/libpthread-2.11.2.so
b758d000-b758e000 r--p 00014000 08:01 4620333    /lib/i686/cmov/libpthread-2.11.2.so
b758e000-b758f000 rw-p 00015000 08:01 4620333    /lib/i686/cmov/libpthread-2.11.2.so
b758f000-b7592000 rw-p 00000000 00:00 0 
b7592000-b76cf000 r-xp 00000000 08:01 794957     /usr/lib/i686/cmov/libcrypto.so.0.9.8
b76cf000-b76e7000 rw-p 0013c000 08:01 794957     /usr/lib/i686/cmov/libcrypto.so.0.9.8
b76e7000-b76ea000 rw-p 00000000 00:00 0 
b76ea000-b76fd000 r-xp 00000000 08:01 286811     /usr/lib/libz.so.1.2.3.4
b76fd000-b76fe000 rw-p 00013000 08:01 286811     /usr/lib/libz.so.1.2.3.4
b771b000-b771d000 rw-p 00000000 00:00 0 
b771d000-b771e000 r-xp 00000000 00:00 0          [vdso]
b771e000-b7739000 r-xp 00000000 08:01 4604271    /lib/ld-2.11.2.so
b7739000-b773a000 r--p 0001a000 08:01 4604271    /lib/ld-2.11.2.so
b773a000-b773b000 rw-p 0001b000 08:01 4604271    /lib/ld-2.11.2.so
bfa3d000-bfa52000 rw-p 00000000 00:00 0          [stack]
Aborted (core dumped)
--8<---------------cut here---------------end--------------->8---

Let me know if I can provide any more information.

Regards
Bernt

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: general protection faults with "git grep" version 1.7.7.1
  2011-10-25  0:00     ` Bernt Hansen
@ 2011-10-25  5:53       ` Jeff King
  2011-10-25 11:11         ` Bernt Hansen
  0 siblings, 1 reply; 15+ messages in thread
From: Jeff King @ 2011-10-25  5:53 UTC (permalink / raw)
  To: Bernt Hansen; +Cc: Markus Trippelsdorf, Richard W.M. Jones, meyering, git

On Mon, Oct 24, 2011 at 08:00:15PM -0400, Bernt Hansen wrote:

> I updated from an old commit 2883969 (Sync with maint, 2011-10-15)
> to origin/master 10b2a48 (Merge branch 'maint', 2011-10-23) today and
> promptly got segfaults on git status in my org-mode repository.
> 
> Going back to the old commit makes it work again.
> 
> Git bisect identifies the following commit as the problem:
> 
> [2548183badb98d62079beea62f9d2e1f47e99902] fix phantom untracked files when core.ignorecase is set

I think this is a separate problem. See this thread and patch:

  http://thread.gmane.org/gmane.comp.version-control.git/184094/focus=184148

-Peff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: general protection faults with "git grep" version 1.7.7.1
  2011-10-25  5:53       ` Jeff King
@ 2011-10-25 11:11         ` Bernt Hansen
  0 siblings, 0 replies; 15+ messages in thread
From: Bernt Hansen @ 2011-10-25 11:11 UTC (permalink / raw)
  To: Jeff King; +Cc: Markus Trippelsdorf, Richard W.M. Jones, meyering, git

Jeff King <peff@peff.net> writes:

> On Mon, Oct 24, 2011 at 08:00:15PM -0400, Bernt Hansen wrote:
>
>> I updated from an old commit 2883969 (Sync with maint, 2011-10-15)
>> to origin/master 10b2a48 (Merge branch 'maint', 2011-10-23) today and
>> promptly got segfaults on git status in my org-mode repository.
>> 
>> Going back to the old commit makes it work again.
>> 
>> Git bisect identifies the following commit as the problem:
>> 
>> [2548183badb98d62079beea62f9d2e1f47e99902] fix phantom untracked files when core.ignorecase is set
>
> I think this is a separate problem. See this thread and patch:
>
>   http://thread.gmane.org/gmane.comp.version-control.git/184094/focus=184148

Thanks,

I'll look at that thread.

-Bernt

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: general protection faults with "git grep" version 1.7.7.1
  2011-10-24 21:49 ` Richard W.M. Jones
  2011-10-24 22:58   ` Markus Trippelsdorf
@ 2011-10-25 13:50   ` Thomas Rast
  2011-10-25 15:17     ` Jim Meyering
  2011-10-25 15:37     ` Jeff King
  1 sibling, 2 replies; 15+ messages in thread
From: Thomas Rast @ 2011-10-25 13:50 UTC (permalink / raw)
  To: Richard W.M. Jones
  Cc: Markus Trippelsdorf, meyering, git, Shawn O. Pearce, Jeff King,
	Nicolas Pitre

[Shawn, Peff, Nicolas: maybe you can say something on the
(non)raciness of xmalloc() in parallel with read_sha1_file().  See the
last paragraph below.]

Richard W.M. Jones wrote:
> On Mon, Oct 24, 2011 at 10:11:53PM +0200, Markus Trippelsdorf wrote:
> > Suddenly I'm getting strange protection faults when I run "git grep" on
> > the gcc tree:
> 
> Jim Meyering and I are trying to chase what looks like a similar or
> identical bug in git-grep.  We've not got much further than gdb and
> valgrind so far, but see:
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=747377
> 
> It's slightly suspicious that this bug only started to happen with the
> latest glibc, but that could be coincidence, or could be just that
> glibc exposes a latent bug in git-grep.

I'm tempted to write this off as a GCC bug.  If that's ok for you,
I'll leave further investigation and communication with the GCC folks
to you.

My findings are as follows:

It's easy to reproduce the behavior described in the above bug report,
using an F16 beta install in a VM.  I gave the VM two cores, but
didn't test what happens with only one.  By "easy" I mean I didn't
have to do any fiddling and it crashes at least one out of two times.

I looked at how git builds grep.o by saying

  rm builtin/grep.o; make V=1

I then modified this to give me the assembly output from the compiler

  gcc -S -s builtin/grep.o -c -MF builtin/.depend/grep.o.d -MMD -MP  -g -O2 -Wall -I.  -DHAVE_PATHS_H -DSHA1_HEADER='<openssl/sha.h>'  -DNO_STRLCPY -DNO_MKSTEMPS  builtin/grep.c

and looked at the result.  To interpret the output, I would like to
remind you of the following snippets:

  #define grep_lock() pthread_mutex_lock(&grep_mutex)
  #define grep_unlock() pthread_mutex_unlock(&grep_mutex)
...
  static struct work_item *get_work(void)
  {
          struct work_item *ret;

          grep_lock();
          while (todo_start == todo_end && !all_work_added) {
                  pthread_cond_wait(&cond_add, &grep_mutex);
          }

...
  }
...
  static void *run(void *arg)
  {
          int hit = 0;
          struct grep_opt *opt = arg;

          while (1) {
                  struct work_item *w = get_work();
...
          }
...
  }

Getting back to assembly, near the beginning of run() I see (labels
and .p2align snipped):

	.loc 1 162 0
	movl	todo_end(%rip), %ebx
	.loc 1 125 0
	movl	$grep_mutex, %edi
	call	pthread_mutex_lock
	.loc 1 126 0
	movl	todo_start(%rip), %eax
	cmpl	%ebx, %eax

I should say that I don't really know much about assembly, in
particular not enough to write two correct lines of it.  But I can't
help noticing that it moved the load of todo_end *out of* the section
where grep_mutex is locked.  And the comment near the top of the file
does say that the whole todo_* family is supposed to be protected by
that mutex.  What's extra odd is that the .loc seems to indicate that
the moved load comes from work_done() instead of get_work(), which is
an entirely separate locked section!

Un-inlining the get_work helper using __attribute__((noinline)) makes
the assembly

	movl	$grep_mutex, %edi
	call	pthread_mutex_lock
	.loc 1 127 0
	movl	todo_start(%rip), %eax
	cmpl	todo_end(%rip), %eax
	je	.L15

instead; i.e., the load is now after the lock.  (Note that line
numbers were wiggled by inserting an __attribute__ line.)  The
beginning of run() turns into exactly the same code if I instead
prohibit inlining of work_done().

So AFAICS, we're just unlucky to hit a GCC optimizer bug that voids
all guarantees given on locks.


That being said, I'm not entirely convinced that the code in
builtin/grep.c works in the face of memory pressure.  It guards
against concurrent access to read_sha1_file() with the
read_sha1_mutex, but any call to xmalloc() outside of that mutex can
still potentially invoke the try_to_free_routine.  Maybe one of the
pack experts can say whether this is safe.  (However, I implemented
locking around try_to_free_routine as a quick hack and it did not fix
the issue discussed in the bug report.)

-- 
Thomas Rast
trast@{inf,student}.ethz.ch

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: general protection faults with "git grep" version 1.7.7.1
  2011-10-25 13:50   ` Thomas Rast
@ 2011-10-25 15:17     ` Jim Meyering
  2011-10-25 15:32       ` Markus Trippelsdorf
  2011-10-25 16:00       ` Thomas Rast
  2011-10-25 15:37     ` Jeff King
  1 sibling, 2 replies; 15+ messages in thread
From: Jim Meyering @ 2011-10-25 15:17 UTC (permalink / raw)
  To: Thomas Rast
  Cc: Richard W.M. Jones, Markus Trippelsdorf, git, Shawn O. Pearce,
	Jeff King, Nicolas Pitre

Thomas Rast wrote:
> [Shawn, Peff, Nicolas: maybe you can say something on the
> (non)raciness of xmalloc() in parallel with read_sha1_file().  See the
> last paragraph below.]
>
> Richard W.M. Jones wrote:
>> On Mon, Oct 24, 2011 at 10:11:53PM +0200, Markus Trippelsdorf wrote:
>> > Suddenly I'm getting strange protection faults when I run "git grep" on
>> > the gcc tree:
>>
>> Jim Meyering and I are trying to chase what looks like a similar or
>> identical bug in git-grep.  We've not got much further than gdb and
>> valgrind so far, but see:
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=747377
>>
>> It's slightly suspicious that this bug only started to happen with the
>> latest glibc, but that could be coincidence, or could be just that
>> glibc exposes a latent bug in git-grep.
>
> I'm tempted to write this off as a GCC bug.  If that's ok for you,
> I'll leave further investigation and communication with the GCC folks
> to you.
>
> My findings are as follows:
>
> It's easy to reproduce the behavior described in the above bug report,
> using an F16 beta install in a VM.  I gave the VM two cores, but
> didn't test what happens with only one.  By "easy" I mean I didn't
> have to do any fiddling and it crashes at least one out of two times.
>
> I looked at how git builds grep.o by saying
>
>   rm builtin/grep.o; make V=1
>
> I then modified this to give me the assembly output from the compiler
>
>   gcc -S -s builtin/grep.o -c -MF builtin/.depend/grep.o.d -MMD -MP  -g -O2 -Wall -I.  -DHAVE_PATHS_H -DSHA1_HEADER='<openssl/sha.h>'  -DNO_STRLCPY -DNO_MKSTEMPS  builtin/grep.c
...
> So AFAICS, we're just unlucky to hit a GCC optimizer bug that voids
> all guarantees given on locks.

Thanks for the investigation.
Actually, isn't gcc -O2's code-motion justified?
While we *know* that those globals may be modified asynchronously,
builtin/grep.c forgot to tell gcc about that.
Once you do that (via "volatile"), gcc knows not to move things.

This patch solved the problem for me:

>From 8521b8033b8ecbff2e459f9e0070beb712b9b73d Mon Sep 17 00:00:00 2001
From: Jim Meyering <meyering@redhat.com>
Date: Tue, 25 Oct 2011 17:07:05 +0200
Subject: [PATCH] declare grep's thread-related global scalars to be
 "volatile"

This avoids heap corruption problems that would otherwise
arise when gcc -O2 moves code out of critical sections.
For details, see http://bugzilla.redhat.com/747377 and
http://thread.gmane.org/gmane.comp.version-control.git/184184/focus=184205

Signed-off-by: Jim Meyering <meyering@redhat.com>
---
 builtin/grep.c |   12 ++++++------
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index 7d0779f..38f92de 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -64,14 +64,14 @@ struct work_item {
  */
 #define TODO_SIZE 128
 static struct work_item todo[TODO_SIZE];
-static int todo_start;
-static int todo_end;
-static int todo_done;
+static volatile int todo_start;
+static volatile int todo_end;
+static volatile int todo_done;

-/* Has all work items been added? */
-static int all_work_added;
+/* Have all work items been added? */
+static volatile int all_work_added;

-/* This lock protects all the variables above. */
+/* This lock protects all of the above variables. */
 static pthread_mutex_t grep_mutex;

 /* Used to serialize calls to read_sha1_file. */
--
1.7.7.419.g87009

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: general protection faults with "git grep" version 1.7.7.1
  2011-10-25 15:17     ` Jim Meyering
@ 2011-10-25 15:32       ` Markus Trippelsdorf
  2011-10-25 16:00       ` Thomas Rast
  1 sibling, 0 replies; 15+ messages in thread
From: Markus Trippelsdorf @ 2011-10-25 15:32 UTC (permalink / raw)
  To: Jim Meyering
  Cc: Thomas Rast, Richard W.M. Jones, git, Shawn O. Pearce, Jeff King,
	Nicolas Pitre

On 2011.10.25 at 17:17 +0200, Jim Meyering wrote:
> Thomas Rast wrote:
> > [Shawn, Peff, Nicolas: maybe you can say something on the
> > (non)raciness of xmalloc() in parallel with read_sha1_file().  See the
> > last paragraph below.]
> >
> > Richard W.M. Jones wrote:
> >> On Mon, Oct 24, 2011 at 10:11:53PM +0200, Markus Trippelsdorf wrote:
> >> > Suddenly I'm getting strange protection faults when I run "git grep" on
> >> > the gcc tree:
> >>
> >> Jim Meyering and I are trying to chase what looks like a similar or
> >> identical bug in git-grep.  We've not got much further than gdb and
> >> valgrind so far, but see:
> >>
> >> https://bugzilla.redhat.com/show_bug.cgi?id=747377
> >>
> >> It's slightly suspicious that this bug only started to happen with the
> >> latest glibc, but that could be coincidence, or could be just that
> >> glibc exposes a latent bug in git-grep.
> >
> > I'm tempted to write this off as a GCC bug.  If that's ok for you,
> > I'll leave further investigation and communication with the GCC folks
> > to you.
> >
> > My findings are as follows:
> >
> > It's easy to reproduce the behavior described in the above bug report,
> > using an F16 beta install in a VM.  I gave the VM two cores, but
> > didn't test what happens with only one.  By "easy" I mean I didn't
> > have to do any fiddling and it crashes at least one out of two times.
> >
> > I looked at how git builds grep.o by saying
> >
> >   rm builtin/grep.o; make V=1
> >
> > I then modified this to give me the assembly output from the compiler
> >
> >   gcc -S -s builtin/grep.o -c -MF builtin/.depend/grep.o.d -MMD -MP  -g -O2 -Wall -I.  -DHAVE_PATHS_H -DSHA1_HEADER='<openssl/sha.h>'  -DNO_STRLCPY -DNO_MKSTEMPS  builtin/grep.c
> ...
> > So AFAICS, we're just unlucky to hit a GCC optimizer bug that voids
> > all guarantees given on locks.
> 
> Thanks for the investigation.
> Actually, isn't gcc -O2's code-motion justified?
> While we *know* that those globals may be modified asynchronously,
> builtin/grep.c forgot to tell gcc about that.
> Once you do that (via "volatile"), gcc knows not to move things.
> 
> This patch solved the problem for me:

Yes. This fixes the issue here also.

(BTW the only recent pthread related change in glibc was the removal
of the gettimeofday vsyscall)

-- 
Markus

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: general protection faults with "git grep" version 1.7.7.1
  2011-10-25 13:50   ` Thomas Rast
  2011-10-25 15:17     ` Jim Meyering
@ 2011-10-25 15:37     ` Jeff King
  1 sibling, 0 replies; 15+ messages in thread
From: Jeff King @ 2011-10-25 15:37 UTC (permalink / raw)
  To: Thomas Rast
  Cc: Richard W.M. Jones, Markus Trippelsdorf, meyering, git,
	Shawn O. Pearce, Nicolas Pitre

On Tue, Oct 25, 2011 at 03:50:21PM +0200, Thomas Rast wrote:

> That being said, I'm not entirely convinced that the code in
> builtin/grep.c works in the face of memory pressure.  It guards
> against concurrent access to read_sha1_file() with the
> read_sha1_mutex, but any call to xmalloc() outside of that mutex can
> still potentially invoke the try_to_free_routine.  Maybe one of the
> pack experts can say whether this is safe.  (However, I implemented
> locking around try_to_free_routine as a quick hack and it did not fix
> the issue discussed in the bug report.)

Yes, I think it needs to set try_to_free_routine. See this thread:

  http://thread.gmane.org/gmane.comp.version-control.git/180446

which discusses a possible subtlety with doing so.

-Peff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: general protection faults with "git grep" version 1.7.7.1
  2011-10-25 15:17     ` Jim Meyering
  2011-10-25 15:32       ` Markus Trippelsdorf
@ 2011-10-25 16:00       ` Thomas Rast
  2011-10-25 16:07         ` Thomas Rast
  2011-10-25 16:37         ` Jim Meyering
  1 sibling, 2 replies; 15+ messages in thread
From: Thomas Rast @ 2011-10-25 16:00 UTC (permalink / raw)
  To: Jim Meyering
  Cc: Richard W.M. Jones, Markus Trippelsdorf, git, Shawn O. Pearce,
	Jeff King, Nicolas Pitre

Jim Meyering wrote:
> Thomas Rast wrote:
> > [GCC moves access to a file-static variable across pthread_mutex_lock()]
> 
> Thanks for the investigation.
> Actually, isn't gcc -O2's code-motion justified?
> While we *know* that those globals may be modified asynchronously,
> builtin/grep.c forgot to tell gcc about that.

I'm somewhat unwilling to believe that:

* "volatile" enforces three unrelated things, see e.g. [1].

* Removing "static" would do the same as it prevents the compiler from
  proving at compile-time that pthread_mutex_lock() cannot affect the
  variable in question.

  If this is correct, it also means that all code in all pthreads
  tutorials I can find works merely by the accident of not declaring
  their variables "static".

  Furthermore, a future smarter compiler with better link-time
  optimization might again prove the same and eliminate the
  "superfluous" load.

However, as a result of the discussion I now have a shorter testcase:

  #include <pthread.h>

  int y;
  static int x;

  pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

  void test ()
  {
          y = x;
          pthread_mutex_lock(&m);
          x = x + 1;
          pthread_mutex_unlock(&m);
  }

GCC 4.6.1 on F16 again assumes 'x' was not modified across the lock.
I also tested GCC 4.5.1 and 4.4.5, which instead issue a direct
add-to-memory instruction

        addl    $1, x(%rip)

in the locked part.

In the event that you and GCC 4.6.1 are right, I still vote for
removing 'static' instead of adding 'volatile' so as to allow basic
optimizations.

-- 
Thomas Rast
trast@{inf,student}.ethz.ch

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: general protection faults with "git grep" version 1.7.7.1
  2011-10-25 16:00       ` Thomas Rast
@ 2011-10-25 16:07         ` Thomas Rast
  2011-10-25 16:37         ` Jim Meyering
  1 sibling, 0 replies; 15+ messages in thread
From: Thomas Rast @ 2011-10-25 16:07 UTC (permalink / raw)
  To: Jim Meyering
  Cc: Richard W.M. Jones, Markus Trippelsdorf, git, Shawn O. Pearce,
	Jeff King, Nicolas Pitre

Thomas Rast wrote:
> Jim Meyering wrote:
> > Thomas Rast wrote:
> > > [GCC moves access to a file-static variable across pthread_mutex_lock()]
> > 
> > Thanks for the investigation.
> > Actually, isn't gcc -O2's code-motion justified?
> > While we *know* that those globals may be modified asynchronously,
> > builtin/grep.c forgot to tell gcc about that.
> 
> I'm somewhat unwilling to believe that:
> 
> * "volatile" enforces three unrelated things, see e.g. [1].

Argh, forgot my reference:
[1] http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2016.html
    section "Existing portable uses"

-- 
Thomas Rast
trast@{inf,student}.ethz.ch

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: general protection faults with "git grep" version 1.7.7.1
  2011-10-25 16:00       ` Thomas Rast
  2011-10-25 16:07         ` Thomas Rast
@ 2011-10-25 16:37         ` Jim Meyering
  2011-10-25 16:54           ` Thomas Rast
  1 sibling, 1 reply; 15+ messages in thread
From: Jim Meyering @ 2011-10-25 16:37 UTC (permalink / raw)
  To: Thomas Rast
  Cc: Richard W.M. Jones, Markus Trippelsdorf, git, Shawn O. Pearce,
	Jeff King, Nicolas Pitre

Thomas Rast wrote:
> Jim Meyering wrote:
>> Thomas Rast wrote:
>> > [GCC moves access to a file-static variable across pthread_mutex_lock()]
>>
>> Thanks for the investigation.
>> Actually, isn't gcc -O2's code-motion justified?
>> While we *know* that those globals may be modified asynchronously,
>> builtin/grep.c forgot to tell gcc about that.
>
> I'm somewhat unwilling to believe that:

You're right to be skeptical.
I should have stuck with "using volatile works around the problem for me".
The real problem seems to be in glibc, with its addition of
the "leaf" attribute to those synchronization primitives:

  http://bugzilla.redhat.com/747377#c22

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: general protection faults with "git grep" version 1.7.7.1
  2011-10-25 16:37         ` Jim Meyering
@ 2011-10-25 16:54           ` Thomas Rast
  2011-10-25 20:24             ` Jim Meyering
  0 siblings, 1 reply; 15+ messages in thread
From: Thomas Rast @ 2011-10-25 16:54 UTC (permalink / raw)
  To: Jim Meyering
  Cc: Richard W.M. Jones, Markus Trippelsdorf, git, Shawn O. Pearce,
	Jeff King, Nicolas Pitre

Jim Meyering wrote:
> Thomas Rast wrote:
> > Jim Meyering wrote:
> >> Thomas Rast wrote:
> >> > [GCC moves access to a file-static variable across pthread_mutex_lock()]
> >>
> >> Thanks for the investigation.
> >> Actually, isn't gcc -O2's code-motion justified?
> >> While we *know* that those globals may be modified asynchronously,
> >> builtin/grep.c forgot to tell gcc about that.
> >
> > I'm somewhat unwilling to believe that:
> 
> You're right to be skeptical.
> I should have stuck with "using volatile works around the problem for me".
> The real problem seems to be in glibc, with its addition of
> the "leaf" attribute to those synchronization primitives:
> 
>   http://bugzilla.redhat.com/747377#c22

Aha.  Glad you found it :-)

Meanwhile I read

  http://www.hpl.hp.com/techreports/2004/HPL-2004-209.html

which discusses a similar issue in section 4.3, but is very
interesting on its own.  It's funny how it says

  We know of at least three optimizing compilers (two of them
  production compilers) that performed this transformation at some
  point during their lifetime; usually at least partially reversing
  the decision when the implications on multi-threaded code became
  known.

I guess that would be four now if it was literally the same problem.

-- 
Thomas Rast
trast@{inf,student}.ethz.ch

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: general protection faults with "git grep" version 1.7.7.1
  2011-10-25 16:54           ` Thomas Rast
@ 2011-10-25 20:24             ` Jim Meyering
  0 siblings, 0 replies; 15+ messages in thread
From: Jim Meyering @ 2011-10-25 20:24 UTC (permalink / raw)
  To: Thomas Rast
  Cc: Richard W.M. Jones, Markus Trippelsdorf, git, Shawn O. Pearce,
	Jeff King, Nicolas Pitre

Thomas Rast wrote:
...
>> The real problem seems to be in glibc, with its addition of
>> the "leaf" attribute to those synchronization primitives:
>>
>>   http://bugzilla.redhat.com/747377#c22
>
> Aha.  Glad you found it :-)
>
> Meanwhile I read
>
>   http://www.hpl.hp.com/techreports/2004/HPL-2004-209.html
>
> which discusses a similar issue in section 4.3, but is very
> interesting on its own.  It's funny how it says
>
>   We know of at least three optimizing compilers (two of them
>   production compilers) that performed this transformation at some
>   point during their lifetime; usually at least partially reversing
>   the decision when the implications on multi-threaded code became
>   known.
>
> I guess that would be four now if it was literally the same problem.

Yep.  For those not following the BZ comments at the about URL,
POSIX is quite clear.  Quoting from
http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap04.html#tag_04_11:

    The following functions synchronize memory with respect to other threads:

        fork
        pthread_barrier_wait
        pthread_cond_broadcast
        pthread_cond_signal
        pthread_cond_timedwait
        pthread_cond_wait
        pthread_create
        pthread_join
        pthread_mutex_lock
        pthread_mutex_timedlock
        pthread_mutex_trylock
        pthread_mutex_unlock
        pthread_spin_lock
        pthread_spin_trylock
        pthread_spin_unlock
        pthread_rwlock_rdlock
        pthread_rwlock_timedrdlock
        pthread_rwlock_timedwrlock
        pthread_rwlock_tryrdlock
        pthread_rwlock_trywrlock
        pthread_rwlock_unlock
        pthread_rwlock_wrlock
        sem_post
        sem_timedwait
        sem_trywait
        sem_wait
        semctl
        semop
        wait
        waitpid

glibc's addition of the leaf attribute to any of those
appears to make gcc violate that.

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2011-10-25 20:24 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-10-24 20:11 general protection faults with "git grep" version 1.7.7.1 Markus Trippelsdorf
2011-10-24 21:49 ` Richard W.M. Jones
2011-10-24 22:58   ` Markus Trippelsdorf
2011-10-25  0:00     ` Bernt Hansen
2011-10-25  5:53       ` Jeff King
2011-10-25 11:11         ` Bernt Hansen
2011-10-25 13:50   ` Thomas Rast
2011-10-25 15:17     ` Jim Meyering
2011-10-25 15:32       ` Markus Trippelsdorf
2011-10-25 16:00       ` Thomas Rast
2011-10-25 16:07         ` Thomas Rast
2011-10-25 16:37         ` Jim Meyering
2011-10-25 16:54           ` Thomas Rast
2011-10-25 20:24             ` Jim Meyering
2011-10-25 15:37     ` Jeff King

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).