From: Benjamin Poirier
Date: Wed, 6 Apr 2011 14:13:58 +0000
Subject: [Qemu-devel] Re: [regression] configure: add opengl detection
To: Michael Walle
Cc: blauwirbel@gmail.com, "Edgar E. Iglesias", linux-bugs@nvidia.com, qemu-devel@nongnu.org
In-Reply-To: <201104042013.00558.michael@walle.cc>
References: <201104042013.00558.michael@walle.cc>

On Mon, Apr 4, 2011 at 6:13 PM, Michael Walle wrote:
> Hi Benjamin,
>
>> Let me know if you need more info.
>
> what happens if you configure with
>
> ./configure --target-list=x86_64-softmmu --disable-opengl
>

Works as usual.

The problem I'm facing stems from linking to libGL and memory protection issues. The particular system I ran this on has the binary nvidia driver and its companion libGL.so.260.19.44. As such I'd take no offense if we wave it off as a "problem in the unsupported binary drivers" and I'll be satisfied configuring with no opengl on that system.

Nevertheless, I investigated a little further to try to show clearly that the problem is on nvidia's side.

1) As stated earlier, qemu segfaults when linked with the opengl libraries.

2) If I start qemu under gdb and configure it not to stop on SIGUSR2 (which I had omitted before; "handle SIGUSR2 nostop noprint"), qemu runs ok. Same goes for strace.

3) If we enable /proc/sys/debug/exception-trace, the kernel printks:

qemu-system-x86[15693]: segfault at 10c7820 ip 00000000010c7820 sp 00007fff71e334c8 error 15

10c7820 is the faulting address. Looking at the core file, we see that 10c7820 is the famous code_gen_prologue:

Program terminated with signal 11, Segmentation fault.
#0  0x00000000010c7820 in code_gen_prologue ()
(gdb) x /20i code_gen_prologue
=> 0x10c7820:  push %rbp
   0x10c7821:  push %rbx
   0x10c7822:  push %r12
   0x10c7824:  push %r13
[...]

By adding some debug code to map_exec() and adding a SIGSEGV handler (that prints /proc/self/maps), I can see that the page containing code_gen_prologue is correctly mprotect()'ed PROT_EXEC. By the time cpu_exec() jumps into it, that mapping is no longer there, the page is not executable, and qemu crashes with a segfault. Here is my debug output:

[...]
0091a000-01125000 rw-p 00000000 00:00 0
[...]
Will now map_exec 0x10c7820
Running mprotect 0x10c7000
Result: 0
[...]
0091a000-010c7000 rw-p 00000000 00:00 0
010c7000-010c8000 rwxp 00000000 00:00 0
010c8000-01125000 rw-p 00000000 00:00 0
[...]
Got SIGSEGV at address: 0x10c7820
[...]
0091a000-01125000 rw-p 00000000 00:00 0

I suspect that the nvidia libraries are messing with memory protection. A look at objdump -R /usr/lib/libGL.so.1 indicates it does need the symbol mprotect(). I tried to confirm this.
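
For illustration, the kind of check described above can be sketched in C roughly as follows. This is only a sketch under assumptions: it is not qemu's map_exec() nor the actual debug patch, it assumes Linux, and the helper names page_start(), make_executable() and perms_for_address() are made up for this example. It rounds an address down to its page, marks the covering pages rwx with mprotect(), and reads the permissions of the containing mapping back from /proc/self/maps.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: round addr down to the start of its page.
 * page_size must be a power of two. */
static uintptr_t page_start(uintptr_t addr, uintptr_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return addr & ~(page_size - 1);
}

/* Hypothetical helper: make every page touching [addr, addr + size)
 * readable, writable and executable. Returns mprotect()'s result. */
static int make_executable(void *addr, size_t size)
{
    uintptr_t page = (uintptr_t)sysconf(_SC_PAGESIZE);
    uintptr_t start = page_start((uintptr_t)addr, page);
    uintptr_t end = ((uintptr_t)addr + size + page - 1) & ~(page - 1);

    return mprotect((void *)start, end - start,
                    PROT_READ | PROT_WRITE | PROT_EXEC);
}

/* Hypothetical helper: copy the permission field (e.g. "rwxp") of the
 * mapping that contains addr into perms. Returns 0 if a mapping was
 * found in /proc/self/maps, -1 otherwise. */
static int perms_for_address(uintptr_t addr, char perms[5])
{
    FILE *f = fopen("/proc/self/maps", "r");
    char line[1024];
    int found = -1;

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        unsigned long lo, hi;
        char p[5];

        /* Each line starts with "start-end perms ...". */
        if (sscanf(line, "%lx-%lx %4s", &lo, &hi, p) == 3 &&
            addr >= lo && addr < hi) {
            memcpy(perms, p, sizeof p);
            found = 0;
            break;
        }
    }
    fclose(f);
    return found;
}
```

Calling perms_for_address() right after make_executable() and again just before jumping into the generated code would show whether something changed the protection in between, which is what the three /proc/self/maps snapshots above show.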

Using a kernel tracer (ftrace, perf or lttng), I can see that there are usually over 500 mprotect system calls before qemu crashes, including this interesting combination (ftrace output):

qemu-system-x86-21216 [002] 87794.633373: sys_mprotect(start: 10c7000, len: 1000, prot: 7)
qemu-system-x86-21216 [000] 87794.806065: sys_mprotect(start: 400000, len: 2f1000, prot: 7)
qemu-system-x86-21216 [000] 87794.806079: sys_mprotect(start: 8f0000, len: 835000, prot: 3)

With prot: 3 (read, write), the last call essentially undoes what was done 100+ ms earlier: its range, 8f0000 to 1125000, covers the page at 10c7000.

In order to track down exactly where that call comes from, I tried using an LD_PRELOAD wrapper around glibc's mprotect(). Source for the wrapper is here:
https://gist.github.com/905600

When I do that, qemu doesn't crash anymore. ftrace reports that the number of mprotect calls is down to 123 and the odd combination is no longer present. I can put the wrapper code within qemu itself and forgo LD_PRELOAD; the result is the same, no crash.

I would've liked to show the weird mprotect call coming out of libGL or libnvidia-whatever so we could point the finger at nvidia, but alas. I'm at a loss as to why it doesn't crash under gdb, strace or with a wrapper. If anyone has thoughts on that, I'm all ears.

Thanks,
-Ben

>
> --
> Michael
>
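
As a side note on reading the sys_mprotect() trace lines in this message, the arithmetic can be checked with a small C sketch. It assumes the Linux values of PROT_READ, PROT_WRITE and PROT_EXEC (1, 2 and 4), and the helper names prot_to_string() and range_covers() are invented for this example. It decodes a prot value into an "rwx" string and tests whether a traced range fully contains a given page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical helper: render an mprotect() prot value the way
 * /proc/self/maps shows permissions, e.g. "rwx" or "rw-". */
static void prot_to_string(int prot, char out[4])
{
    out[0] = (prot & PROT_READ)  ? 'r' : '-';
    out[1] = (prot & PROT_WRITE) ? 'w' : '-';
    out[2] = (prot & PROT_EXEC)  ? 'x' : '-';
    out[3] = '\0';
}

/* Hypothetical helper: true if the page [page, page + page_len) lies
 * entirely inside the traced range [start, start + len). */
static bool range_covers(uintptr_t start, uintptr_t len,
                         uintptr_t page, uintptr_t page_len)
{
    assert(len > 0 && page_len > 0);
    return page >= start && page + page_len <= start + len;
}
```

With the values from the trace, range_covers(0x8f0000, 0x835000, 0x10c7000, 0x1000) is true while range_covers(0x400000, 0x2f1000, 0x10c7000, 0x1000) is false, and prot 3 decodes to "rw-", so only the third call removes the execute permission from the code_gen_prologue page.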