From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([140.186.70.92]:50161)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <crosa@redhat.com>) id 1RJlQI-0001x3-L6
	for qemu-devel@nongnu.org; Fri, 28 Oct 2011 08:20:39 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <crosa@redhat.com>) id 1RJlQE-0001B7-1N
	for qemu-devel@nongnu.org; Fri, 28 Oct 2011 08:20:34 -0400
Received: from mx1.redhat.com ([209.132.183.28]:24533)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <crosa@redhat.com>) id 1RJlQD-0001Av-NT
	for qemu-devel@nongnu.org; Fri, 28 Oct 2011 08:20:29 -0400
Message-ID: <4EAA9E07.3060704@redhat.com>
Date: Fri, 28 Oct 2011 09:20:23 -0300
From: Cleber Rosa <crosa@redhat.com>
MIME-Version: 1.0
References: <1316443033-6489-1-git-send-email-freddy77@gmail.com>
	<4EA95BFF.6070807@redhat.com>
	<20111027135731.GA21052@stefanha-thinkpad.localdomain>
	<4EA96776.6020807@redhat.com> <4EA96B82.6070507@redhat.com>
	<4EAA9310.2030705@redhat.com>
In-Reply-To: <4EAA9310.2030705@redhat.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH v2] block: avoid SIGUSR2
Reply-To: cleber@redhat.com
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Kevin Wolf <kwolf@redhat.com>
Cc: Lucas Meneghel Rodrigues <lmr@redhat.com>, aliguori@us.ibm.com, Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>, qemu-devel@nongnu.org, Frediano Ziglio <freddy77@gmail.com>, Paolo Bonzini <pbonzini@redhat.com>

On 10/28/2011 08:33 AM, Kevin Wolf wrote:
> Am 27.10.2011 16:32, schrieb Kevin Wolf:
>> Am 27.10.2011 16:15, schrieb Kevin Wolf:
>>> Am 27.10.2011 15:57, schrieb Stefan Hajnoczi:
>>>> On Thu, Oct 27, 2011 at 03:26:23PM +0200, Kevin Wolf wrote:
>>>>> Am 19.09.2011 16:37, schrieb Frediano Ziglio:
>>>>>> Now that iothread is always compiled sending a signal seems only an
>>>>>> additional step. This patch also avoid writing to two pipe (one from signal
>>>>>> and one in qemu_service_io).
>>>>>>
>>>>>> Work with kvm enabled or disabled. strace output is more readable (less syscalls).
>>>>>>
>>>>>> Signed-off-by: Frediano Ziglio<freddy77@gmail.com>
>>>>> Something in this change has bad effects, in the sense that it seems to
>>>>> break bdrv_read_em.
>>>> How does it break bdrv_read_em?  Are you seeing QEMU hung with 100% CPU
>>>> utilization or deadlocked?
>>> Sorry, I should have been more detailed here.
>>>
>>> No, it's nothing obvious, it must be some subtle side effect. The result
>>> of bdrv_read_em itself seems to be correct (return value and checksum of
>>> the read buffer).
>>>
>>> However instead of booting into the DOS setup I only get an error
>>> message "Kein System oder Laufwerksfehler" (don't know how it reads in
>>> English DOS versions), which seems to be produced by the boot sector.
>>>
>>> I excluded all of the minor changes, so I'm sure that it's caused by the
>>> switch from kill() to a direct call of the function that writes into the
>>> pipe.
>>>
>>>> One interesting thing is that qemu_aio_wait() does not release the QEMU
>>>> mutex, so we cannot write to a pipe with the mutex held and then spin
>>>> waiting for the iothread to do work for us.
>>>>
>>>> Exactly how kill and qemu_notify_event() were different I'm not sure
>>>> right now but it could be a factor.
>>> This would cause a hang, right? Then it isn't what I'm seeing.
>> While trying out some more things, I added some fprintfs to
>> posix_aio_process_queue() and suddenly it also fails with the kill()
>> version. So what has changed might really just be the timing, and it
>> could be a race somewhere that has always (?) existed.
> Replying to myself again... It looks like there is a problem with
> reentrancy in fdctrl_transfer_handler. I think this would have been
> guarded by the AsyncContexts before, but we don't have them any more.
>
> qemu-system-x86_64: /root/upstream/qemu/hw/fdc.c:1253:
> fdctrl_transfer_handler: Assertion `reentrancy == 0' failed.
>
> Program received signal SIGABRT, Aborted.
>
> (gdb) bt
> #0  0x0000003ccd2329a5 in raise () from /lib64/libc.so.6
> #1  0x0000003ccd234185 in abort () from /lib64/libc.so.6
> #2  0x0000003ccd22b935 in __assert_fail () from /lib64/libc.so.6
> #3  0x000000000046ff09 in fdctrl_transfer_handler (opaque=<value
> optimized out>, nchan=<value optimized out>, dma_pos=<value optimized out>,
>      dma_len=<value optimized out>) at /root/upstream/qemu/hw/fdc.c:1253
> #4  0x000000000046702c in channel_run () at /root/upstream/qemu/hw/dma.c:348
> #5  DMA_run () at /root/upstream/qemu/hw/dma.c:378
> #6  0x000000000040b0e1 in qemu_bh_poll () at async.c:70
> #7  0x000000000040aa19 in qemu_aio_wait () at aio.c:147
> #8  0x000000000041c355 in bdrv_read_em (bs=0x131fd80, sector_num=19,
> buf=<value optimized out>, nb_sectors=1) at block.c:2896
> #9  0x000000000041b3d2 in bdrv_read (bs=0x131fd80, sector_num=19,
> buf=0x1785a00 "IO      SYS!", nb_sectors=1) at block.c:1062
> #10 0x000000000041b3d2 in bdrv_read (bs=0x131f430, sector_num=19,
> buf=0x1785a00 "IO      SYS!", nb_sectors=1) at block.c:1062
> #11 0x000000000046fbb8 in do_fdctrl_transfer_handler (opaque=0x1785788,
> nchan=2, dma_pos=<value optimized out>, dma_len=512)
>      at /root/upstream/qemu/hw/fdc.c:1178
> #12 0x000000000046fecf in fdctrl_transfer_handler (opaque=<value
> optimized out>, nchan=<value optimized out>, dma_pos=<value optimized out>,
>      dma_len=<value optimized out>) at /root/upstream/qemu/hw/fdc.c:1255
> #13 0x000000000046702c in channel_run () at /root/upstream/qemu/hw/dma.c:348
> #14 DMA_run () at /root/upstream/qemu/hw/dma.c:378
> #15 0x000000000046e456 in fdctrl_start_transfer (fdctrl=0x1785788,
> direction=1) at /root/upstream/qemu/hw/fdc.c:1107
> #16 0x0000000000558a41 in kvm_handle_io (env=0x1323ff0) at
> /root/upstream/qemu/kvm-all.c:834
> #17 kvm_cpu_exec (env=0x1323ff0) at /root/upstream/qemu/kvm-all.c:976
> #18 0x000000000053686a in qemu_kvm_cpu_thread_fn (arg=0x1323ff0) at
> /root/upstream/qemu/cpus.c:661
> #19 0x0000003ccda077e1 in start_thread () from /lib64/libpthread.so.0
> #20 0x0000003ccd2e151d in clone () from /lib64/libc.so.6
>
> I'm afraid that we can only avoid things like this reliably if we
> convert all devices to be direct users of AIO/coroutines. The current
> block layer infrastructure doesn't emulate the behaviour of bdrv_read
> accurately as bottom halves can be run in the nested main loop.
>
> For floppy, the following seems to be a quick fix (Lucas, Cleber, does
> this solve your problems?), though it's not very satisfying. And I'm not
> quite sure yet why it doesn't always happen with kill() in
> posix-aio-compat.c.
>
> diff --git a/hw/dma.c b/hw/dma.c
> index 8a7302a..1d3b6f1 100644
> --- a/hw/dma.c
> +++ b/hw/dma.c
> @@ -358,6 +358,13 @@ static void DMA_run (void)
>       struct dma_cont *d;
>       int icont, ichan;
>       int rearm = 0;
> +    static int running = 0;
> +
> +    if (running) {
> +        goto out;
> +    } else {
> +        running = 0;
> +    }
>
>       d = dma_controllers;
>
> @@ -374,6 +381,8 @@ static void DMA_run (void)
>           }
>       }
>
> +out:
> +    running = 0;
>       if (rearm)
>           qemu_bh_schedule_idle(dma_bh);
>   }
>
> Kevin

Kevin,

In my quick test (compiling qemu.git master + your dma patch, and 
running a FreeDOS floppy image) it does not have any visible difference.

The boot is still stuck after printing "FreeDOS" at the console.

PS: We will trigger a full blown test, with a Windows installation using 
a floppy, but the results with the FreeDOS floppy have been very 
consistent with the full blown test.