From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from [140.186.70.92] (port=54528 helo=eggs.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1PxFJR-0005Zt-G3
	for qemu-devel@nongnu.org; Wed, 09 Mar 2011 04:04:10 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <jan.kiszka@web.de>) id 1PxFJQ-00067c-9D
	for qemu-devel@nongnu.org; Wed, 09 Mar 2011 04:04:09 -0500
Received: from fmmailgate02.web.de ([217.72.192.227]:38724)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <jan.kiszka@web.de>) id 1PxFJP-000676-TN
	for qemu-devel@nongnu.org; Wed, 09 Mar 2011 04:04:08 -0500
Message-ID: <4D774282.9060102@web.de>
Date: Wed, 09 Mar 2011 10:04:02 +0100
From: Jan Kiszka <jan.kiszka@web.de>
MIME-Version: 1.0
Subject: Re: [Qemu-devel] Re: segmentation fault in qemu-kvm-0.14.0
References: <2640D58E-2101-47FA-99B6-28815666651E@dlh.net>	<4D772E4C.6020604@web.de>
	<AANLkTikAOH2nR3T7iqrKj226MrH+8O9DXC=J+7XK-Av_@mail.gmail.com>
In-Reply-To: <AANLkTikAOH2nR3T7iqrKj226MrH+8O9DXC=J+7XK-Av_@mail.gmail.com>
Content-Type: multipart/signed; micalg=pgp-sha1;
	protocol="application/pgp-signature";
	boundary="------------enigEDB9778BC4E7B5BEB2F53B14"
Sender: jan.kiszka@web.de
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Corentin Chary <corentin.chary@gmail.com>
Cc: Peter Lieven <pl@dlh.net>, qemu-devel <qemu-devel@nongnu.org>, kvm@vger.kernel.org

This is an OpenPGP/MIME signed message (RFC 2440 and 3156)
--------------enigEDB9778BC4E7B5BEB2F53B14
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

On 2011-03-09 09:50, Corentin Chary wrote:
> On Wed, Mar 9, 2011 at 7:37 AM, Jan Kiszka <jan.kiszka@web.de> wrote:
>> On 2011-03-08 23:53, Peter Lieven wrote:
>>> Hi,
>>>
>>> during testing of qemu-kvm-0.14.0 i can reproduce the following segfa=
ult. i have seen similar crash already in 0.13.0, but had no time to debu=
g.
>>> my guess is that this segfault is related to the threaded vnc server =
which was introduced in qemu 0.13.0. the bug is only triggerable if a vnc=

>>> client is attached. it might also be connected to a resolution change=
 in the guest. i have a backtrace attached. the debugger is still running=
 if someone
>>> needs more output
>>>
>>
>> ...
>>
>>> Thread 1 (Thread 0x7ffff7ff0700 (LWP 29038)):
>>> #0  0x0000000000000000 in ?? ()
>>> No symbol table info available.
>>> #1  0x000000000041d669 in main_loop_wait (nonblocking=3D0)
>>>     at /usr/src/qemu-kvm-0.14.0/vl.c:1388
>>
>> So we are calling a IOHandlerRecord::fd_write handler that is NULL.
>> Looking at qemu_set_fd_handler2, this may happen if that function is
>> called for an existing io-handler entry with non-NULL write handler,
>> passing a NULL write and a non-NULL read handler. And all this without=

>> the global mutex held.
>>
>> And there are actually calls in vnc_client_write_plain and
>> vnc_client_write_locked (in contrast to vnc_write) that may generate
>> this pattern. It's probably worth validating that the iothread lock is=

>> always held when qemu_set_fd_handler2 is invoked to confirm this race
>> theory, adding something like
>>
>> assert(pthread_mutex_trylock(&qemu_mutex) !=3D 0);
>> (that's for qemu-kvm only)
>>
>> BTW, qemu with just --enable-vnc-thread, ie. without io-thread support=
,
>> should always run into this race as it then definitely lacks a global =
mutex.
>=20
> I'm not sure what mutex should be locked here (qemu_global_mutex,
> qemu_fair_mutex, lock_iothread).

In upstream qemu, the latter - if it exists (which is not the case in
non-io-thread mode).

In qemu-kvm, those locks are yet unused. Rather the code in qemu-kvm.c
implements the global lock.

> But here is where is should be locked (other vnc_write calls in this
> thread should never trigger qemu_set_fd_handler):
>=20
> diff --git a/ui/vnc-jobs-async.c b/ui/vnc-jobs-async.c
> index 1d4c5e7..e02d891 100644
> --- a/ui/vnc-jobs-async.c
> +++ b/ui/vnc-jobs-async.c
> @@ -258,7 +258,9 @@ static int vnc_worker_thread_loop(VncJobQueue *queu=
e)
>          goto disconnected;
>      }
>=20
> +    /* lock */
>      vnc_write(job->vs, vs.output.buffer, vs.output.offset);
> +    /* unlock */
>=20
>  disconnected:
>      /* Copy persistent encoding data */
> @@ -267,7 +269,9 @@ disconnected:
>      vnc_unlock_output(job->vs);
>=20
>      if (flush) {
> +        /* lock */
>          vnc_flush(job->vs);
> +       /* unlock */
>      }

Particularly this call is critical as it will trigger
vnc_client_write_locked which may NULL'ify the write handler.

But I'm also not sure about vnc_send_framebuffer_update. Someone has to
go through the threaded vnc code again, very thoroughly. It looks fragile=
=2E

Jan


--------------enigEDB9778BC4E7B5BEB2F53B14
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.15 (GNU/Linux)
Comment: Using GnuPG with SUSE - http://enigmail.mozdev.org/

iEYEARECAAYFAk13QoQACgkQitSsb3rl5xRfnACbBt3h3ZQ3PV2ZYQAIGYT7A4Bz
81kAnRWkfV2aoXDMi6R8SOkPQKIpKhaJ
=P+ht
-----END PGP SIGNATURE-----

--------------enigEDB9778BC4E7B5BEB2F53B14--