From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from [140.186.70.92] (port=54528 helo=eggs.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1PxFJR-0005Zt-G3 for qemu-devel@nongnu.org; Wed, 09 Mar 2011 04:04:10 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1PxFJQ-00067c-9D for qemu-devel@nongnu.org; Wed, 09 Mar 2011 04:04:09 -0500 Received: from fmmailgate02.web.de ([217.72.192.227]:38724) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1PxFJP-000676-TN for qemu-devel@nongnu.org; Wed, 09 Mar 2011 04:04:08 -0500 Message-ID: <4D774282.9060102@web.de> Date: Wed, 09 Mar 2011 10:04:02 +0100 From: Jan Kiszka MIME-Version: 1.0 Subject: Re: [Qemu-devel] Re: segmentation fault in qemu-kvm-0.14.0 References: <2640D58E-2101-47FA-99B6-28815666651E@dlh.net> <4D772E4C.6020604@web.de> In-Reply-To: Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="------------enigEDB9778BC4E7B5BEB2F53B14" Sender: jan.kiszka@web.de List-Id: qemu-devel.nongnu.org List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Corentin Chary Cc: Peter Lieven , qemu-devel , kvm@vger.kernel.org This is an OpenPGP/MIME signed message (RFC 2440 and 3156) --------------enigEDB9778BC4E7B5BEB2F53B14 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable On 2011-03-09 09:50, Corentin Chary wrote: > On Wed, Mar 9, 2011 at 7:37 AM, Jan Kiszka wrote: >> On 2011-03-08 23:53, Peter Lieven wrote: >>> Hi, >>> >>> during testing of qemu-kvm-0.14.0 i can reproduce the following segfa= ult. i have seen similar crash already in 0.13.0, but had no time to debu= g. >>> my guess is that this segfault is related to the threaded vnc server = which was introduced in qemu 0.13.0. the bug is only triggerable if a vnc= >>> client is attached. it might also be connected to a resolution change= in the guest. i have a backtrace attached. the debugger is still running= if someone >>> needs more output >>> >> >> ... >> >>> Thread 1 (Thread 0x7ffff7ff0700 (LWP 29038)): >>> #0 0x0000000000000000 in ?? () >>> No symbol table info available. >>> #1 0x000000000041d669 in main_loop_wait (nonblocking=3D0) >>> at /usr/src/qemu-kvm-0.14.0/vl.c:1388 >> >> So we are calling a IOHandlerRecord::fd_write handler that is NULL. >> Looking at qemu_set_fd_handler2, this may happen if that function is >> called for an existing io-handler entry with non-NULL write handler, >> passing a NULL write and a non-NULL read handler. And all this without= >> the global mutex held. >> >> And there are actually calls in vnc_client_write_plain and >> vnc_client_write_locked (in contrast to vnc_write) that may generate >> this pattern. It's probably worth validating that the iothread lock is= >> always held when qemu_set_fd_handler2 is invoked to confirm this race >> theory, adding something like >> >> assert(pthread_mutex_trylock(&qemu_mutex) !=3D 0); >> (that's for qemu-kvm only) >> >> BTW, qemu with just --enable-vnc-thread, ie. without io-thread support= , >> should always run into this race as it then definitely lacks a global = mutex. >=20 > I'm not sure what mutex should be locked here (qemu_global_mutex, > qemu_fair_mutex, lock_iothread). In upstream qemu, the latter - if it exists (which is not the case in non-io-thread mode). In qemu-kvm, those locks are yet unused. Rather the code in qemu-kvm.c implements the global lock. > But here is where is should be locked (other vnc_write calls in this > thread should never trigger qemu_set_fd_handler): >=20 > diff --git a/ui/vnc-jobs-async.c b/ui/vnc-jobs-async.c > index 1d4c5e7..e02d891 100644 > --- a/ui/vnc-jobs-async.c > +++ b/ui/vnc-jobs-async.c > @@ -258,7 +258,9 @@ static int vnc_worker_thread_loop(VncJobQueue *queu= e) > goto disconnected; > } >=20 > + /* lock */ > vnc_write(job->vs, vs.output.buffer, vs.output.offset); > + /* unlock */ >=20 > disconnected: > /* Copy persistent encoding data */ > @@ -267,7 +269,9 @@ disconnected: > vnc_unlock_output(job->vs); >=20 > if (flush) { > + /* lock */ > vnc_flush(job->vs); > + /* unlock */ > } Particularly this call is critical as it will trigger vnc_client_write_locked which may NULL'ify the write handler. But I'm also not sure about vnc_send_framebuffer_update. Someone has to go through the threaded vnc code again, very thoroughly. It looks fragile= =2E Jan --------------enigEDB9778BC4E7B5BEB2F53B14 Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.15 (GNU/Linux) Comment: Using GnuPG with SUSE - http://enigmail.mozdev.org/ iEYEARECAAYFAk13QoQACgkQitSsb3rl5xRfnACbBt3h3ZQ3PV2ZYQAIGYT7A4Bz 81kAnRWkfV2aoXDMi6R8SOkPQKIpKhaJ =P+ht -----END PGP SIGNATURE----- --------------enigEDB9778BC4E7B5BEB2F53B14--