* problem about blocked monitor when disk image on NFS can not be reached.
From: ya su @ 2011-03-01 5:01 UTC
To: kvm@vger.kernel.org

Hi all,

KVM is started with a disk image on an NFS server. When the NFS server cannot be reached, the monitor is blocked. I changed io_thread to the SCHED_RR policy; it then works, but not smoothly, while waiting for the disk read/write timeout.

I have tested a standalone thread to process kvm_handle_io. It does not start up correctly; it may need qemu_mutex protection.

Since io_thread processes different I/O tasks, is it possible to move the kvm_handle_io and handle_mmio functions into this thread? The problem would still remain, though: the monitor would still be blocked by disk read/write requests.

Does anyone have a good suggestion?

Thanks,
Green

* Re: problem about blocked monitor when disk image on NFS can not be reached.
From: Stefan Hajnoczi @ 2011-03-01 10:51 UTC
To: ya su; +Cc: kvm@vger.kernel.org

On Tue, Mar 1, 2011 at 5:01 AM, ya su <suya94335@gmail.com> wrote:
> KVM is started with a disk image on an NFS server. When the NFS server
> cannot be reached, the monitor is blocked. I changed io_thread to the
> SCHED_RR policy; it then works, but not smoothly, while waiting for the
> disk read/write timeout.

There are some synchronous disk image reads that can put qemu-kvm to sleep until NFS responds or errors. For example, at startup hw/virtio-blk.c calls bdrv_guess_geometry(), which may invoke bdrv_read().

Once the VM is running and you're using virtio-blk, disk I/O should be asynchronous. There are some synchronous cases to do with migration, snapshotting, etc. where we wait for outstanding aio requests. Again, this can block qemu-kvm.

So in short, there's no easy way to avoid blocking the VM in all cases today. You should find, however, that normal read/write operation to a running VM does not cause qemu-kvm to sleep.

Stefan

* Re: problem about blocked monitor when disk image on NFS can not be reached.
From: ya su @ 2011-03-01 12:39 UTC
To: Stefan Hajnoczi; +Cc: kvm@vger.kernel.org

First, sorry for sending the same mail more than once; I did not know it would take so long to come back.

Hi Stefan,

Thanks for your explanation.

How about moving kvm_handle_io/handle_mmio out of the kvm_run function and into kvm_main_loop? These operations are I/O operations, and this would remove the qemu_mutex between the two threads. Is this a reasonable thought?

To keep the monitor responsive to the user in this situation, an easier way is to take monitor I/O out of qemu_mutex protection. This includes the vnc/serial/telnet I/O related to the monitor. As this I/O does not affect the running of the VM itself, it does not need such strict protection.

Any suggestions?

Thanks,
Green

* Re: problem about blocked monitor when disk image on NFS can not be reached.
From: Stefan Hajnoczi @ 2011-03-01 15:01 UTC
To: ya su; +Cc: kvm@vger.kernel.org, Kevin Wolf

On Tue, Mar 1, 2011 at 12:39 PM, ya su <suya94335@gmail.com> wrote:
> How about moving kvm_handle_io/handle_mmio out of the kvm_run function
> and into kvm_main_loop? These operations are I/O operations, and this
> would remove the qemu_mutex between the two threads. Is this a
> reasonable thought?
>
> To keep the monitor responsive to the user in this situation, an easier
> way is to take monitor I/O out of qemu_mutex protection. This includes
> the vnc/serial/telnet I/O related to the monitor. As this I/O does not
> affect the running of the VM itself, it does not need such strict
> protection.

The qemu_mutex protects all QEMU global state. The monitor does some I/O and parsing which is not necessarily global state, but once it begins actually performing the command you sent, access to global state will be required (pretty much any monitor command will operate on global state).

I think there are two options for handling NFS hangs:
1. Ensure that QEMU is never put to sleep by NFS for disk images. The guest continues executing, may time out and notice that storage is unavailable.
2. Pause the VM but keep the monitor running if a timeout error occurs. Not sure if there is a timeout from NFS that we can detect.

For I/O errors (e.g. running out of disk space on the host) there is a configurable policy. You can choose whether to return an error to the guest or to pause the VM. I think we should treat NFS hangs as an extension to this and as a block layer problem rather than an io thread problem.

Can you get backtraces when KVM hangs (gdb command: thread apply all bt)? It would be interesting to see some of the blocking cases that you are hitting.

Stefan
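
The configurable error policy mentioned above (return the error to the guest, or pause the VM) can be pictured with a small Python model. This is only an illustrative sketch, not QEMU code: `ErrorPolicy`, `VmState`, and `handle_io_error` are hypothetical names, and it assumes an NFS timeout surfaces as an ordinary errno value such as `EIO` or `ETIMEDOUT`.

```python
import errno
from dataclasses import dataclass
from enum import Enum


class ErrorPolicy(Enum):
    """Hypothetical per-drive policy: what to do when a disk request fails."""
    REPORT = "report"  # return the error to the guest
    STOP = "stop"      # pause the VM and leave the monitor usable


@dataclass
class VmState:
    running: bool = True
    guest_errors: int = 0


def handle_io_error(vm: VmState, err: int, policy: ErrorPolicy) -> str:
    """Apply the configured policy to a failed disk request."""
    name = errno.errorcode.get(err, str(err))
    if policy is ErrorPolicy.STOP:
        vm.running = False  # the guest is paused; an operator can inspect it
        return f"paused on {name}"
    vm.guest_errors += 1    # the guest sees the failure and decides what to do
    return f"reported {name} to guest"


if __name__ == "__main__":
    vm = VmState()
    print(handle_io_error(vm, errno.EIO, ErrorPolicy.REPORT))
    print(handle_io_error(vm, errno.ETIMEDOUT, ErrorPolicy.STOP))
```

Treating an NFS hang as just another error code is what makes it a block layer concern in this model: the policy function does not care which thread noticed the failure.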

* Re: problem about blocked monitor when disk image on NFS can not be reached.
From: Avi Kivity @ 2011-03-01 15:23 UTC
To: Stefan Hajnoczi; +Cc: ya su, kvm@vger.kernel.org, Kevin Wolf

On 03/01/2011 05:01 PM, Stefan Hajnoczi wrote:
> I think there are two options for handling NFS hangs:
> 1. Ensure that QEMU is never put to sleep by NFS for disk images. The
> guest continues executing, may time out and notice that storage is
> unavailable.

That's the NFS soft mount option.

> 2. Pause the VM but keep the monitor running if a timeout error
> occurs. Not sure if there is a timeout from NFS that we can detect.

The default setting (hard mount) will retry forever in the kernel. Moreover, the other default setting (nointr) means we can't even signal the hung thread.

> For I/O errors (e.g. running out of disk space on the host) there is a
> configurable policy. You can choose whether to return an error to the
> guest or to pause the VM. I think we should treat NFS hangs as an
> extension to this and as a block layer problem rather than an io
> thread problem.

I agree. Mount the share as a soft,intr mount and let the kernel time out and return an I/O error.

> Can you get backtraces when KVM hangs (gdb command: thread apply all
> bt)? It would be interesting to see some of the blocking cases that
> you are hitting.

Won't work (at least under the default configuration) since those threads are uninterruptible. At the very least you need an interruptible mount.

--
error compiling committee.c: too many arguments to function
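
Before relying on kernel timeouts it helps to know how a share is actually mounted. The following Python sketch classifies NFS mounts as hard or soft from text in the Linux `/proc/mounts` layout (device, mount point, filesystem type, options, then two numeric fields). The function name `nfs_mount_modes` and the sample server and paths are made up for illustration.

```python
def nfs_mount_modes(mounts_text: str) -> dict:
    """Map each NFS mount point to 'soft' or 'hard' from /proc/mounts-style text."""
    modes = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] not in ("nfs", "nfs4"):
            continue  # not an NFS mount, or a malformed line
        options = fields[3].split(",")
        # A mount is hard unless 'soft' was requested explicitly.
        modes[fields[1]] = "soft" if "soft" in options else "hard"
    return modes


# Made-up sample in /proc/mounts format; server name and paths are illustrative.
SAMPLE = """\
server:/export/images /var/lib/images nfs rw,hard,timeo=600,retrans=2 0 0
server:/export/iso /mnt/iso nfs4 rw,soft,timeo=100,retrans=3 0 0
/dev/sda1 / ext4 rw,relatime 0 0
"""

if __name__ == "__main__":
    for mount_point, mode in nfs_mount_modes(SAMPLE).items():
        print(mount_point, mode)
```

On a real host the same function could be fed `open("/proc/mounts").read()`; any mount point reported as hard is one where a process can wait on the server indefinitely.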

* Re: problem about blocked monitor when disk image on NFS can not be reached.
From: ya su @ 2011-03-02 10:39 UTC
To: Avi Kivity; +Cc: Stefan Hajnoczi, kvm@vger.kernel.org, Kevin Wolf

Hi all,

The io_thread backtrace is as follows:

#0  0x00007f3086eaa034 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f3086ea5345 in _L_lock_870 () from /lib64/libpthread.so.0
#2  0x00007f3086ea5217 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x0000000000436018 in kvm_mutex_lock () at /root/rpmbuild/BUILD/qemu-kvm-0.14/qemu-kvm.c:1730
#4  qemu_mutex_lock_iothread () at /root/rpmbuild/BUILD/qemu-kvm-0.14/qemu-kvm.c:1744
#5  0x000000000041ca67 in main_loop_wait (nonblocking=<value optimized out>) at /root/rpmbuild/BUILD/qemu-kvm-0.14/vl.c:1377
#6  0x00000000004363e7 in kvm_main_loop () at /root/rpmbuild/BUILD/qemu-kvm-0.14/qemu-kvm.c:1589
#7  0x000000000041dc3a in main_loop (argc=<value optimized out>, argv=<value optimized out>, envp=<value optimized out>) at /root/rpmbuild/BUILD/qemu-kvm-0.14/vl.c:1429
#8  main (argc=<value optimized out>, argv=<value optimized out>, envp=<value optimized out>) at /root/rpmbuild/BUILD/qemu-kvm-0.14/vl.c:3201

The cpu thread backtrace is as follows:

#0  0x00007f3084dff093 in select () from /lib64/libc.so.6
#1  0x00000000004453ea in qemu_aio_wait () at aio.c:193
#2  0x0000000000444175 in bdrv_write_em (bs=0x1ec3090, sector_num=2009871, buf=0x7f3087532800 "F\b\200u\022\366F$\004u\fPV\350\226\367\377\377\003Ft\353\fPV\350\212\367\377\377\353\003\213Ft^]\302\b", nb_sectors=16) at block.c:2577
#3  0x000000000059ca13 in ide_sector_write (s=0x215f508) at /root/rpmbuild/BUILD/qemu-kvm-0.14/hw/ide/core.c:574
#4  0x0000000000438ced in kvm_handle_io (env=0x202ef60) at /root/rpmbuild/BUILD/qemu-kvm-0.14/kvm-all.c:821
#5  kvm_run (env=0x202ef60) at /root/rpmbuild/BUILD/qemu-kvm-0.14/qemu-kvm.c:617
#6  0x0000000000438e09 in kvm_cpu_exec (env=<value optimized out>) at /root/rpmbuild/BUILD/qemu-kvm-0.14/qemu-kvm.c:1233
#7  0x000000000043a0f7 in kvm_main_loop_cpu (_env=0x202ef60) at /root/rpmbuild/BUILD/qemu-kvm-0.14/qemu-kvm.c:1419
#8  ap_main_loop (_env=0x202ef60) at /root/rpmbuild/BUILD/qemu-kvm-0.14/qemu-kvm.c:1466
#9  0x00007f3086ea37e1 in start_thread () from /lib64/libpthread.so.0
#10 0x00007f3084e0653d in clone () from /lib64/libc.so.6

The aio_thread backtrace is as follows:

#0  0x00007f3086eaae83 in pwrite64 () from /lib64/libpthread.so.0
#1  0x0000000000447501 in handle_aiocb_rw_linear (aiocb=0x21cff10, buf=0x7f3087532800 "F\b\200u\022\366F$\004u\fPV\350\226\367\377\377\003Ft\353\fPV\350\212\367\377\377\353\003\213Ft^]\302\b") at posix-aio-compat.c:212
#2  0x0000000000447d48 in handle_aiocb_rw (unused=<value optimized out>) at posix-aio-compat.c:247
#3  aio_thread (unused=<value optimized out>) at posix-aio-compat.c:341
#4  0x00007f3086ea37e1 in start_thread () from /lib64/libpthread.so.0
#5  0x00007f3084e0653d in clone () from /lib64/libc.so.6

I think io_thread is blocked by the cpu thread, which takes the qemu_mutex first. The cpu thread is waiting for the aio_thread's result in the qemu_aio_wait function. The aio_thread spends a long time in pwrite64: about 5-10s, after which it returns an error (it looks like a non-blocking call with a timeout). Only then does the io thread get a chance to receive monitor input, so the monitor appears to block frequently. In this situation, if I stop the VM, the monitor responds faster.

The problem is caused by the unavailability of the block layer. The block layer processes the I/O error in the normal way: it reports the error to the IDE device, and the error is handled in ide_sector_write. The root cause is that the monitor's input and the I/O operation (the pwrite function) must execute serially (under the qemu_mutex semaphore), so a long blocking time in pwrite hinders monitor input.

As Stefan says, it seems difficult to take monitor input out of this protection. For now I will stop the VM if the disk image cannot be reached.
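
The serialization described in this message can be modelled in a few lines of Python. This is a simplified sketch, not QEMU code: `global_mutex` stands in for the single global lock, `slow_write` stands in for a `pwrite` stalled on an unreachable server, and the real three-thread arrangement (vcpu waiting on a separate aio thread) is collapsed into one worker for brevity. All names are hypothetical.

```python
import threading
import time

global_mutex = threading.Lock()  # stands in for the single global lock


def slow_write(delay: float) -> None:
    """Stand-in for a write that stalls on an unreachable NFS server."""
    time.sleep(delay)


def vcpu_thread(delay: float, started: threading.Event) -> None:
    # Synchronous device emulation: the lock is held for the whole write.
    with global_mutex:
        started.set()
        slow_write(delay)


def monitor_latency(delay: float) -> float:
    """Return how long a monitor command waits for the global lock."""
    started = threading.Event()
    worker = threading.Thread(target=vcpu_thread, args=(delay, started))
    worker.start()
    started.wait()           # the vcpu thread now owns the lock
    t0 = time.monotonic()
    with global_mutex:       # the monitor needs the same lock to run a command
        waited = time.monotonic() - t0
    worker.join()
    return waited


if __name__ == "__main__":
    print(f"monitor waited {monitor_latency(0.5):.2f}s for the lock")
```

The monitor's wait tracks the write delay almost exactly, which matches the observation that the monitor only becomes responsive again once the stalled write returns.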

* Re: problem about blocked monitor when disk image on NFS can not be reached.
From: Stefan Hajnoczi @ 2011-03-02 17:26 UTC
To: ya su; +Cc: Avi Kivity, kvm@vger.kernel.org, Kevin Wolf

On Wed, Mar 2, 2011 at 10:39 AM, ya su <suya94335@gmail.com> wrote:
> I think io_thread is blocked by the cpu thread, which takes the
> qemu_mutex first. The cpu thread is waiting for the aio_thread's result
> in the qemu_aio_wait function. The aio_thread spends a long time in
> pwrite64: about 5-10s, after which it returns an error (it looks like a
> non-blocking call with a timeout). Only then does the io thread get a
> chance to receive monitor input, so the monitor appears to block
> frequently. In this situation, if I stop the VM, the monitor responds
> faster.
>
> The problem is caused by the unavailability of the block layer. The
> block layer processes the I/O error in the normal way: it reports the
> error to the IDE device, and the error is handled in ide_sector_write.
> The root cause is that the monitor's input and the I/O operation (the
> pwrite function) must execute serially (under the qemu_mutex
> semaphore), so a long blocking time in pwrite hinders monitor input.
>
> As Stefan says, it seems difficult to take monitor input out of this
> protection. For now I will stop the VM if the disk image cannot be
> reached.

If you switch to -drive if=virtio instead of IDE then the problem should be greatly reduced. Virtio-blk uses aio instead of synchronous calls, which means that the vcpu thread does not run qemu_aio_wait().

Kevin and I have been looking into the limitations imposed by synchronous calls. Today there is unfortunately synchronous code in QEMU and we can hit these NFS hang situations. qemu_aio_wait() runs a nested event loop that does a subset of what the full event loop does. This is why the monitor does not respond.

If all code was asynchronous then only a top-level event loop would be necessary and the monitor would continue to function.

In the immediate term I suggest using virtio-blk instead of IDE.

Stefan
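
The difference between a nested event loop and a single top-level loop can be illustrated with a toy event replay in Python. This is a sketch of the idea only, under the assumption that events arrive as `(source, payload)` pairs; `run` and `nested_aio_wait` are hypothetical names and do not mirror QEMU's actual loop code.

```python
from collections import deque


def run(events, synchronous):
    """Replay (source, payload) events and return them in the order handled.

    'write' starts a disk request, 'aio' completes it, 'monitor' is user input.
    """
    pending = deque(events)
    handled = []

    def nested_aio_wait():
        # Nested loop: dispatches only 'aio' events and defers everything else.
        deferred = []
        while pending:
            source, payload = pending.popleft()
            if source == "aio":
                handled.append((source, payload))
                break
            deferred.append((source, payload))
        # Put the deferred events back at the front, preserving their order.
        pending.extendleft(reversed(deferred))

    while pending:
        source, payload = pending.popleft()
        handled.append((source, payload))
        if source == "write" and synchronous:
            nested_aio_wait()  # block here until the request completes
    return handled


if __name__ == "__main__":
    events = [("write", "req1"), ("monitor", "info status"), ("aio", "req1 done")]
    print("synchronous: ", [s for s, _ in run(events, synchronous=True)])
    print("asynchronous:", [s for s, _ in run(events, synchronous=False)])
```

In the synchronous replay the monitor event is handled only after the disk completion, however long that takes; in the asynchronous replay the top-level loop serves it as soon as it arrives.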

* problem about blocked monitor when disk image on NFS can not be reached.
From: ya su @ 2011-02-28 13:52 UTC
To: kvm

Hi,

KVM is started with a disk image on an NFS server. When the NFS server cannot be reached, the monitor is blocked. I changed io_thread to the SCHED_RR policy; it then works, but not smoothly, while waiting for the disk read/write timeout.

I think one solution is to handle kvm_handle_io in a separate thread. I would put kvm_handle_io in a newly spawned thread, with all I/O requests passed through a queue between io_thread and the new thread. This needs to copy run->io.size * run->io.count bytes from the address (uint8_t *)run + run->io.data_offset.

Is this the right direction? Any suggestions are welcome, thanks!

Green
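
The copy-and-queue handoff proposed in this message might look roughly like the Python sketch below. It illustrates only the data movement: a byte buffer stands in for the shared `run` area, and `snapshot_pio`, `io_worker`, and `demo` are hypothetical names. It does not address the global-state locking concerns raised in the replies, and nothing in the thread confirms that this design was adopted.

```python
import queue
import threading


def snapshot_pio(run_buf, data_offset: int, size: int, count: int) -> bytes:
    """Copy the size * count bytes of port-I/O data that start at data_offset."""
    return bytes(run_buf[data_offset:data_offset + size * count])


def io_worker(requests: queue.Queue, results: list) -> None:
    """Drain queued I/O requests until a None sentinel arrives."""
    while True:
        item = requests.get()
        if item is None:
            break
        port, data = item
        results.append((port, data))  # a real handler would emulate the device here


def demo():
    # Fake shared region: 8 bytes of header, then the data area.
    run_buf = bytearray(b"\x00" * 8 + b"ABCDEFGH")
    requests, results = queue.Queue(), []
    worker = threading.Thread(target=io_worker, args=(requests, results))
    worker.start()
    requests.put((0x1F0, snapshot_pio(run_buf, data_offset=8, size=2, count=3)))
    run_buf[8:] = b"XXXXXXXX"  # the buffer is reused; the queued copy is unaffected
    requests.put(None)
    worker.join()
    return results


if __name__ == "__main__":
    print(demo())
```

The copy is the important part: once the request is queued, the producer is free to overwrite the shared buffer without corrupting what the worker will see.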