From: Kevin Wolf <kwolf@redhat.com>
To: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Cc: Lucas Meneghel Rodrigues <lmr@redhat.com>,
aliguori@us.ibm.com, qemu-devel@nongnu.org,
Frediano Ziglio <freddy77@gmail.com>,
Cleber Rosa <crosa@redhat.com>,
Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v2] block: avoid SIGUSR2
Date: Fri, 28 Oct 2011 13:35:07 +0200
Message-ID: <4EAA936B.3070808@redhat.com>
In-Reply-To: <4EAA9310.2030705@redhat.com>
On 28.10.2011 13:33, Kevin Wolf wrote:
> On 27.10.2011 16:32, Kevin Wolf wrote:
>> On 27.10.2011 16:15, Kevin Wolf wrote:
>>> On 27.10.2011 15:57, Stefan Hajnoczi wrote:
>>>> On Thu, Oct 27, 2011 at 03:26:23PM +0200, Kevin Wolf wrote:
>>>>> On 19.09.2011 16:37, Frediano Ziglio wrote:
>>>>>> Now that iothread is always compiled in, sending a signal seems to be
>>>>>> only an additional step. This patch also avoids writing to two pipes
>>>>>> (one from the signal handler and one in qemu_service_io).
>>>>>>
>>>>>> Works with kvm enabled or disabled. strace output is more readable (fewer syscalls).
>>>>>>
>>>>>> Signed-off-by: Frediano Ziglio <freddy77@gmail.com>
>>>>>
>>>>> Something in this change has bad effects, in the sense that it seems to
>>>>> break bdrv_read_em.
>>>>
>>>> How does it break bdrv_read_em? Are you seeing QEMU hung with 100% CPU
>>>> utilization or deadlocked?
>>>
>>> Sorry, I should have been more detailed here.
>>>
>>> No, it's nothing obvious, it must be some subtle side effect. The result
>>> of bdrv_read_em itself seems to be correct (return value and checksum of
>>> the read buffer).
>>>
>>> However instead of booting into the DOS setup I only get an error
>>> message "Kein System oder Laufwerksfehler" (don't know how it reads in
>>> English DOS versions), which seems to be produced by the boot sector.
>>>
>>> I excluded all of the minor changes, so I'm sure that it's caused by the
>>> switch from kill() to a direct call of the function that writes into the
>>> pipe.
>>>
>>>> One interesting thing is that qemu_aio_wait() does not release the QEMU
>>>> mutex, so we cannot write to a pipe with the mutex held and then spin
>>>> waiting for the iothread to do work for us.
>>>>
>>>> Exactly how kill and qemu_notify_event() were different I'm not sure
>>>> right now but it could be a factor.
>>>
>>> This would cause a hang, right? Then it isn't what I'm seeing.
>>
>> While trying out some more things, I added some fprintfs to
>> posix_aio_process_queue() and suddenly it also fails with the kill()
>> version. So what has changed might really just be the timing, and it
>> could be a race somewhere that has always (?) existed.
>
> Replying to myself again... It looks like there is a problem with
> reentrancy in fdctrl_transfer_handler. I think this would have been
> guarded by the AsyncContexts before, but we don't have them any more.
>
> qemu-system-x86_64: /root/upstream/qemu/hw/fdc.c:1253:
> fdctrl_transfer_handler: Assertion `reentrancy == 0' failed.
>
> Program received signal SIGABRT, Aborted.
>
> (gdb) bt
> #0 0x0000003ccd2329a5 in raise () from /lib64/libc.so.6
> #1 0x0000003ccd234185 in abort () from /lib64/libc.so.6
> #2 0x0000003ccd22b935 in __assert_fail () from /lib64/libc.so.6
> #3 0x000000000046ff09 in fdctrl_transfer_handler (opaque=<value
> optimized out>, nchan=<value optimized out>, dma_pos=<value optimized out>,
> dma_len=<value optimized out>) at /root/upstream/qemu/hw/fdc.c:1253
> #4 0x000000000046702c in channel_run () at /root/upstream/qemu/hw/dma.c:348
> #5 DMA_run () at /root/upstream/qemu/hw/dma.c:378
> #6 0x000000000040b0e1 in qemu_bh_poll () at async.c:70
> #7 0x000000000040aa19 in qemu_aio_wait () at aio.c:147
> #8 0x000000000041c355 in bdrv_read_em (bs=0x131fd80, sector_num=19,
> buf=<value optimized out>, nb_sectors=1) at block.c:2896
> #9 0x000000000041b3d2 in bdrv_read (bs=0x131fd80, sector_num=19,
> buf=0x1785a00 "IO SYS!", nb_sectors=1) at block.c:1062
> #10 0x000000000041b3d2 in bdrv_read (bs=0x131f430, sector_num=19,
> buf=0x1785a00 "IO SYS!", nb_sectors=1) at block.c:1062
> #11 0x000000000046fbb8 in do_fdctrl_transfer_handler (opaque=0x1785788,
> nchan=2, dma_pos=<value optimized out>, dma_len=512)
> at /root/upstream/qemu/hw/fdc.c:1178
> #12 0x000000000046fecf in fdctrl_transfer_handler (opaque=<value
> optimized out>, nchan=<value optimized out>, dma_pos=<value optimized out>,
> dma_len=<value optimized out>) at /root/upstream/qemu/hw/fdc.c:1255
> #13 0x000000000046702c in channel_run () at /root/upstream/qemu/hw/dma.c:348
> #14 DMA_run () at /root/upstream/qemu/hw/dma.c:378
> #15 0x000000000046e456 in fdctrl_start_transfer (fdctrl=0x1785788,
> direction=1) at /root/upstream/qemu/hw/fdc.c:1107
> #16 0x0000000000558a41 in kvm_handle_io (env=0x1323ff0) at
> /root/upstream/qemu/kvm-all.c:834
> #17 kvm_cpu_exec (env=0x1323ff0) at /root/upstream/qemu/kvm-all.c:976
> #18 0x000000000053686a in qemu_kvm_cpu_thread_fn (arg=0x1323ff0) at
> /root/upstream/qemu/cpus.c:661
> #19 0x0000003ccda077e1 in start_thread () from /lib64/libpthread.so.0
> #20 0x0000003ccd2e151d in clone () from /lib64/libc.so.6
>
> I'm afraid that we can only avoid things like this reliably if we
> convert all devices to be direct users of AIO/coroutines. The current
> block layer infrastructure doesn't emulate the behaviour of bdrv_read
> accurately as bottom halves can be run in the nested main loop.
>
> For floppy, the following seems to be a quick fix (Lucas, Cleber, does
> this solve your problems?), though it's not very satisfying. And I'm not
> quite sure yet why it doesn't always happen with kill() in
> posix-aio-compat.c.
>
> diff --git a/hw/dma.c b/hw/dma.c
> index 8a7302a..1d3b6f1 100644
> --- a/hw/dma.c
> +++ b/hw/dma.c
> @@ -358,6 +358,13 @@ static void DMA_run (void)
> struct dma_cont *d;
> int icont, ichan;
> int rearm = 0;
> + static int running = 0;
> +
> + if (running) {
> + goto out;
> + } else {
> + running = 0;
This should be running = 1, obviously. I had the fix disabled while testing something.
> + }
>
> d = dma_controllers;
>
> @@ -374,6 +381,8 @@ static void DMA_run (void)
> }
> }
>
> +out:
> + running = 0;
> if (rearm)
> qemu_bh_schedule_idle(dma_bh);
> }
>
> Kevin
>
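To make the intent of the guard easier to see outside of QEMU, here is a minimal standalone sketch, assuming a handler that ends up in a nested poll which runs the same bottom half again. All names ending in _sketch, and the counters, are hypothetical stand-ins for illustration; none of this is taken from hw/dma.c or hw/fdc.c. The assertion in the handler mirrors the `reentrancy == 0' check from the backtrace above.

```c
#include <assert.h>

/* Hypothetical counters for illustration only; not part of QEMU. */
static int reentrancy;            /* nesting depth of the transfer handler */
static int transfers_done;        /* completed handler invocations */
static int nested_calls_rejected; /* calls refused by the guard */

static void dma_run_sketch(void);

/* Stand-in for the nested main loop: while a synchronous read waits,
 * pending bottom halves are polled, which runs the DMA code again. */
static void nested_poll_sketch(void)
{
    dma_run_sketch();
    dma_run_sketch();
}

/* Stand-in for the transfer handler; it must never be re-entered. */
static void transfer_handler_sketch(void)
{
    assert(reentrancy == 0);
    reentrancy++;
    nested_poll_sketch();   /* the synchronous read would wait here */
    reentrancy--;
    transfers_done++;
}

/* Stand-in for DMA_run() with the guard set to 1 on entry. */
static void dma_run_sketch(void)
{
    static int running = 0;

    if (running) {
        /* Nested call: do nothing and leave the flag set for the
         * outer invocation. */
        nested_calls_rejected++;
        return;
    }
    running = 1;

    transfer_handler_sketch();

    running = 0;
}
```

Calling dma_run_sketch() once from a main() runs the handler a single time and refuses both nested calls; without the guard the assertion in the handler fires on the first nested call. In this sketch the early-return path leaves the flag untouched, so a second nested call is rejected as well. That is a choice made for the illustration and differs from the quoted diff, where the out: label also resets running.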