From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:36858)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <serge.fdrv@gmail.com>) id 1bIZKG-0005I0-8C
	for qemu-devel@nongnu.org; Thu, 30 Jun 2016 06:36:05 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <serge.fdrv@gmail.com>) id 1bIZKC-0005SE-2D
	for qemu-devel@nongnu.org; Thu, 30 Jun 2016 06:36:03 -0400
Received: from mail-lf0-x243.google.com ([2a00:1450:4010:c07::243]:33228)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <serge.fdrv@gmail.com>) id 1bIZKB-0005SA-MB
	for qemu-devel@nongnu.org; Thu, 30 Jun 2016 06:35:59 -0400
Received: by mail-lf0-x243.google.com with SMTP id l188so7962301lfe.0
	for <qemu-devel@nongnu.org>; Thu, 30 Jun 2016 03:35:59 -0700 (PDT)
References: <1466375313-7562-1-git-send-email-sergey.fedorov@linaro.org>
	<1466375313-7562-7-git-send-email-sergey.fedorov@linaro.org>
	<87lh1o0y1k.fsf@linaro.org> <5774E8C2.1050506@gmail.com>
	<87furvq85v.fsf@linaro.org>
From: Sergey Fedorov <serge.fdrv@gmail.com>
Message-ID: <5774F60C.3010707@gmail.com>
Date: Thu, 30 Jun 2016 13:35:56 +0300
MIME-Version: 1.0
In-Reply-To: <87furvq85v.fsf@linaro.org>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Subject: Re: [Qemu-devel] [RFC 6/8] linux-user: Support CPU work queue
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel/>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: =?UTF-8?Q?Alex_Benn=c3=a9e?= <alex.bennee@linaro.org>
Cc: Sergey Fedorov <sergey.fedorov@linaro.org>, qemu-devel@nongnu.org, Riku Voipio <riku.voipio@iki.fi>, Peter Crosthwaite <crosthwaite.peter@gmail.com>, patches@linaro.org, Paolo Bonzini <pbonzini@redhat.com>, Richard Henderson <rth@twiddle.net>

On 30/06/16 13:32, Alex Bennée wrote:
> Sergey Fedorov <serge.fdrv@gmail.com> writes:
>
>> On 29/06/16 19:17, Alex Bennée wrote:
>>> So I think there is a deadlock we can get with the async work:
>>>
>>> (gdb) thread apply all bt
>>>
>>> Thread 11 (Thread 0x7ffefeca7700 (LWP 2912)):
>>> #0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
>>> #1  0x00005555555cb777 in wait_cpu_work () at /home/alex/lsrc/qemu/qemu.git/linux-user/main.c:155
>>> #2  0x00005555555a0cee in wait_safe_cpu_work () at /home/alex/lsrc/qemu/qemu.git/cpu-exec-common.c:87
>>> #3  0x00005555555cb8fe in cpu_exec_end (cpu=0x555555bb67e0) at /home/alex/lsrc/qemu/qemu.git/linux-user/main.c:222
>>> #4  0x00005555555cc7a7 in cpu_loop (env=0x555555bbea58) at /home/alex/lsrc/qemu/qemu.git/linux-user/main.c:749
>>> #5  0x00005555555db0b2 in clone_func (arg=0x7fffffffc9c0) at /home/alex/lsrc/qemu/qemu.git/linux-user/syscall.c:5424
>>> #6  0x00007ffff6bed6fa in start_thread (arg=0x7ffefeca7700) at pthread_create.c:333
>>> #7  0x00007ffff6923b5d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>>>
>>> <a bunch of other threads doing the same and then...>
>>>
>>> Thread 3 (Thread 0x7ffff7f38700 (LWP 2904)):
>>> #0  0x00005555555faf5d in safe_syscall_base ()
>>> #1  0x00005555555cfeaf in safe_futex (uaddr=0x7ffff528a0a4, op=128, val=1, timeout=0x0, uaddr2=0x0, val3=-162668384)
>>>     at /home/alex/lsrc/qemu/qemu.git/linux-user/syscall.c:706
>>> #2  0x00005555555dd7cc in do_futex (uaddr=4132298916, op=128, val=1, timeout=0, uaddr2=0, val3=-162668384)
>>>     at /home/alex/lsrc/qemu/qemu.git/linux-user/syscall.c:6246
>>> #3  0x00005555555e8cdb in do_syscall (cpu_env=0x555555a81118, num=240, arg1=-162668380, arg2=128, arg3=1, arg4=0, arg5=0, arg6=-162668384,
>>>     arg7=0, arg8=0) at /home/alex/lsrc/qemu/qemu.git/linux-user/syscall.c:10642
>>> #4  0x00005555555cd20e in cpu_loop (env=0x555555a81118) at /home/alex/lsrc/qemu/qemu.git/linux-user/main.c:883
>>> #5  0x00005555555db0b2 in clone_func (arg=0x7fffffffc9c0) at /home/alex/lsrc/qemu/qemu.git/linux-user/syscall.c:5424
>>> #6  0x00007ffff6bed6fa in start_thread (arg=0x7ffff7f38700) at pthread_create.c:333
>>> #7  0x00007ffff6923b5d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
>>>
>>> So everything is stalled awaiting this thread waking up and draining
>>> its queue. So for linux-user I think we need some mechanism to kick
>>> these syscalls which I assume means throwing a signal at it.
>> Nice catch! How did you get it?
> Running pigz (armhf, debian) to compress stuff.
>
>> We always go through cpu_exec_end()
>> before serving a guest syscall and always go through cpu_exec_start()
>> before entering the guest code execution loop. If we always schedule
>> safe work on the current thread's queue then I think there's a way to
>> make it safe and avoid kicking syscalls.
> Not let the signals complete until safe work is done?

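I'm thinking of waiting for completion of safe work in cpu_exec_start()
as well as in cpu_exec_end(). That way a thread coming back from a
syscall can't enter the guest code execution loop while safe work is
still pending, so we don't need to kick it out of the syscall.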
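
A minimal sketch of that idea follows. It is a simplified, untested model
and not the code from this series: the function names mirror the ones in
the backtrace, but the bodies, the counters (cpus_in_guest,
safe_work_pending) and the one-slot per-thread queue are assumptions made
up for illustration. It also assumes safe work is only ever queued by the
thread that will run it, while that thread is between cpu_exec_start() and
cpu_exec_end().

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

typedef void (*work_fn)(void *opaque);

/* Hypothetical global state, all protected by work_lock. */
static pthread_mutex_t work_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t work_cond = PTHREAD_COND_INITIALIZER;
static int cpus_in_guest;      /* threads between exec_start and exec_end */
static int safe_work_pending;  /* safe work items queued but not yet run */

/* Hypothetical one-slot queue owned by the current thread. */
static _Thread_local work_fn my_work;
static _Thread_local void *my_work_arg;

/* Queue safe work on the *current* thread's queue only. */
void queue_safe_work(work_fn fn, void *opaque)
{
    assert(my_work == NULL);   /* the sketch supports one item per thread */
    my_work = fn;
    my_work_arg = opaque;
    pthread_mutex_lock(&work_lock);
    safe_work_pending++;
    pthread_mutex_unlock(&work_lock);
}

/* Caller holds work_lock. Sleep until no safe work is pending. */
static void wait_safe_cpu_work(void)
{
    while (safe_work_pending > 0) {
        pthread_cond_wait(&work_cond, &work_lock);
    }
}

void cpu_exec_start(void)
{
    pthread_mutex_lock(&work_lock);
    /* The proposed addition: a thread returning from a syscall stops
     * here instead of entering guest code while safe work is pending. */
    wait_safe_cpu_work();
    cpus_in_guest++;
    pthread_mutex_unlock(&work_lock);
}

void cpu_exec_end(void)
{
    pthread_mutex_lock(&work_lock);
    cpus_in_guest--;
    pthread_cond_broadcast(&work_cond);
    if (my_work != NULL) {
        /* Threads blocked in syscalls are not counted in cpus_in_guest,
         * so nobody has to kick them for this wait to finish. */
        while (cpus_in_guest > 0) {
            pthread_cond_wait(&work_cond, &work_lock);
        }
        my_work(my_work_arg);
        my_work = NULL;
        safe_work_pending--;
        pthread_cond_broadcast(&work_cond);
    }
    wait_safe_cpu_work();
    pthread_mutex_unlock(&work_lock);
}

/* Example work item for trying the sketch out. */
void bump_counter(void *opaque)
{
    ++*(int *)opaque;
}
```

In this model the work item runs in cpu_exec_end() of the thread that
queued it, once every other thread has left guest code. A thread sitting
in safe_futex() owns no queued work and is not counted as executing guest
code, so the others never wait on it; it simply blocks in
cpu_exec_start() if safe work is pending when the syscall returns.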

Regards,
Sergey