problem about blocked monitor when disk image on NFS can not be reached.

Kernel KVM virtualization development

* problem about blocked monitor when disk image on NFS can not be reached.
From: ya su @ 2011-03-01  5:01 UTC
  To: kvm@vger.kernel.org

Hi all,

When KVM is started with a disk image on an NFS server and the NFS
server cannot be reached, the monitor blocks. I changed io_thread to
the SCHED_RR policy; with that change it works, but not smoothly,
while it waits for disk read/write timeouts.

I also tested a standalone thread that processes kvm_handle_io, but the
VM does not start up correctly; it may need qemu_mutex protection.

Since io_thread already processes various I/O tasks, would it be
possible to move the kvm_handle_io and handle_mmio functions into that
thread? Even so, the problem would remain: the monitor would still be
blocked by disk read/write requests.

Does anyone have a good suggestion? Thanks.

Green.

* Re: problem about blocked monitor when disk image on NFS can not be reached.
From: Stefan Hajnoczi @ 2011-03-01 10:51 UTC
  To: ya su; +Cc: kvm@vger.kernel.org

On Tue, Mar 1, 2011 at 5:01 AM, ya su <suya94335@gmail.com> wrote:
> When KVM is started with a disk image on an NFS server and the NFS
> server cannot be reached, the monitor blocks. I changed io_thread to
> the SCHED_RR policy; with that change it works, but not smoothly,
> while it waits for disk read/write timeouts.

There are some synchronous disk image reads that can put qemu-kvm to
sleep until NFS responds or errors.  For example, when starting,
hw/virtio-blk.c calls bdrv_guess_geometry(), which may invoke
bdrv_read().

Once the VM is running and you're using virtio-blk then disk I/O
should be asynchronous.  There are some synchronous cases to do with
migration, snapshotting, etc where we wait for outstanding aio
requests.  Again this can block qemu-kvm.

So in short, there's no easy way to avoid blocking the VM in all cases
today.  You should find, however, that normal read/write operation to
a running VM does not cause qemu-kvm to sleep.

Stefan

* Re: problem about blocked monitor when disk image on NFS can not be reached.
From: ya su @ 2011-03-01 12:39 UTC
  To: Stefan Hajnoczi; +Cc: kvm@vger.kernel.org

First, sorry for sending the same mail more than once; I did not know
it would take so long to show up on the list.

Hi Stefan,

Thanks for your explanation.

How about moving kvm_handle_io/handle_mmio out of the kvm_run function
and into kvm_main_loop? Since these are I/O operations, this would
remove the qemu_mutex locking between the two threads. Is this a
reasonable idea?

To keep the monitor responsive to the user in this situation, an
easier way would be to take monitor I/O out of qemu_mutex protection.
This includes the vnc/serial/telnet I/O related to the monitor; since
this I/O does not affect the running of the VM itself, it does not
need such strict protection.

Any suggestions? Thanks.

Green.


* Re: problem about blocked monitor when disk image on NFS can not be reached.
From: Stefan Hajnoczi @ 2011-03-01 15:01 UTC
  To: ya su; +Cc: kvm@vger.kernel.org, Kevin Wolf

On Tue, Mar 1, 2011 at 12:39 PM, ya su <suya94335@gmail.com> wrote:
> How about moving kvm_handle_io/handle_mmio out of the kvm_run
> function and into kvm_main_loop? Since these are I/O operations,
> this would remove the qemu_mutex locking between the two threads.
> Is this a reasonable idea?
>
> To keep the monitor responsive to the user in this situation, an
> easier way would be to take monitor I/O out of qemu_mutex
> protection. This includes the vnc/serial/telnet I/O related to the
> monitor; since this I/O does not affect the running of the VM
> itself, it does not need such strict protection.

The qemu_mutex protects all QEMU global state.  The monitor does some
I/O and parsing which is not necessarily global state but once it
begins actually performing the command you sent, access to global
state will be required (pretty much any monitor command will operate
on global state).

I think there are two options for handling NFS hangs:
1. Ensure that QEMU is never put to sleep by NFS for disk images.  The
guest continues executing, may time out and notice that storage is
unavailable.
2. Pause the VM but keep the monitor running if a timeout error
occurs.  Not sure if there is a timeout from NFS that we can detect.

For I/O errors (e.g. running out of disk space on the host) there is a
configurable policy.  You can choose whether to return an error to the
guest or to pause the VM.  I think we should treat NFS hangs as an
extension to this and as a block layer problem rather than an io
thread problem.
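
The error policy described above can be sketched roughly as follows.
This is only an illustration in Python, not QEMU code: the names Policy
and on_io_error are hypothetical, and the sketch assumes that an NFS
timeout on a soft mount surfaces as an ordinary I/O error such as EIO.
In QEMU itself this kind of policy is selected with the werror= and
rerror= settings of -drive.

```python
import errno
from enum import Enum


class Policy(Enum):
    """Hypothetical names for the two behaviours described in the mail."""
    REPORT = "report"  # hand the error to the guest
    STOP = "stop"      # pause the VM, keep the monitor usable


def on_io_error(policy, err):
    """Decide what to do with a failed disk request.

    err is an errno value; under the assumption stated above, an NFS
    timeout would arrive here like any other I/O error.
    """
    if policy is Policy.STOP:
        return "pause_vm"
    return "report_to_guest:" + errno.errorcode[err]


print(on_io_error(Policy.STOP, errno.EIO))
print(on_io_error(Policy.REPORT, errno.ENOSPC))
```

With the "stop" behaviour the administrator can fix the storage and
resume the VM from the monitor; with "report" the guest sees the error
and has to cope with it.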

Can you get backtraces when KVM hangs (gdb command: thread apply all
bt)?  It would be interesting to see some of the blocking cases that
you are hitting.

Stefan

* Re: problem about blocked monitor when disk image on NFS can not be reached.
From: Avi Kivity @ 2011-03-01 15:23 UTC
  To: Stefan Hajnoczi; +Cc: ya su, kvm@vger.kernel.org, Kevin Wolf

On 03/01/2011 05:01 PM, Stefan Hajnoczi wrote:
> On Tue, Mar 1, 2011 at 12:39 PM, ya su <suya94335@gmail.com> wrote:
> > How about moving kvm_handle_io/handle_mmio out of the kvm_run
> > function and into kvm_main_loop? Since these are I/O operations,
> > this would remove the qemu_mutex locking between the two threads.
> > Is this a reasonable idea?
> >
> > To keep the monitor responsive to the user in this situation, an
> > easier way would be to take monitor I/O out of qemu_mutex
> > protection. This includes the vnc/serial/telnet I/O related to the
> > monitor; since this I/O does not affect the running of the VM
> > itself, it does not need such strict protection.
>
> The qemu_mutex protects all QEMU global state.  The monitor does some
> I/O and parsing which is not necessarily global state but once it
> begins actually performing the command you sent, access to global
> state will be required (pretty much any monitor command will operate
> on global state).
>
> I think there are two options for handling NFS hangs:
> 1. Ensure that QEMU is never put to sleep by NFS for disk images.  The
> guest continues executing, may time out and notice that storage is
> unavailable.

That's the NFS soft mount option.

> 2. Pause the VM but keep the monitor running if a timeout error
> occurs.  Not sure if there is a timeout from NFS that we can detect.

The default setting (hard mount) will retry forever in the kernel.  
Moreover, the other default setting (nointr) means we can't even signal 
the hung thread.

> For I/O errors (e.g. running out of disk space on the host) there is a
> configurable policy.  You can choose whether to return an error to the
> guest or to pause the VM.  I think we should treat NFS hangs as an
> extension to this and as a block layer problem rather than an io
> thread problem.

I agree.  Mount the share as a soft,intr mount and let the kernel time 
out and return an I/O error.
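
As a rough illustration of the mount suggested above, the following
Python helper only assembles a mount command line and does not run it.
The server name, export path, mount point, and the timeo/retrans values
are placeholders chosen for the example, not values from this thread;
timeo is given in tenths of a second, as documented in nfs(5).

```python
def nfs_soft_mount_argv(export, mountpoint, timeo=100, retrans=3):
    """Build (but do not run) a command for a soft, interruptible NFS mount.

    timeo and retrans are example values; together they bound how long
    the kernel retries before returning an I/O error on a soft mount.
    """
    options = "soft,intr,timeo={},retrans={}".format(timeo, retrans)
    return ["mount", "-t", "nfs", "-o", options, export, mountpoint]


# Placeholder names; running the resulting command needs root privileges.
print(" ".join(nfs_soft_mount_argv("nfs-server:/export/images", "/mnt/images")))
```

Shorter timeo/retrans values make the I/O error arrive sooner, at the
cost of failing requests during brief network glitches.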

> Can you get backtraces when KVM hangs (gdb command: thread apply all
> bt)?  It would be interesting to see some of the blocking cases that
> you are hitting.

Won't work (at least under the default configuration) since those 
threads are uninterruptible.  At the very least you need an 
interruptible mount.

-- 
error compiling committee.c: too many arguments to function


* Re: problem about blocked monitor when disk image on NFS can not be reached.
From: ya su @ 2011-03-02 10:39 UTC
  To: Avi Kivity; +Cc: Stefan Hajnoczi, kvm@vger.kernel.org, Kevin Wolf

Hi all,

The io_thread backtrace is as follows:
#0  0x00007f3086eaa034 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f3086ea5345 in _L_lock_870 () from /lib64/libpthread.so.0
#2  0x00007f3086ea5217 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x0000000000436018 in kvm_mutex_lock () at /root/rpmbuild/BUILD/qemu-kvm-0.14/qemu-kvm.c:1730
#4  qemu_mutex_lock_iothread () at /root/rpmbuild/BUILD/qemu-kvm-0.14/qemu-kvm.c:1744
#5  0x000000000041ca67 in main_loop_wait (nonblocking=<value optimized out>)
    at /root/rpmbuild/BUILD/qemu-kvm-0.14/vl.c:1377
#6  0x00000000004363e7 in kvm_main_loop () at /root/rpmbuild/BUILD/qemu-kvm-0.14/qemu-kvm.c:1589
#7  0x000000000041dc3a in main_loop (argc=<value optimized out>, argv=<value optimized out>,
    envp=<value optimized out>) at /root/rpmbuild/BUILD/qemu-kvm-0.14/vl.c:1429
#8  main (argc=<value optimized out>, argv=<value optimized out>, envp=<value optimized out>)
    at /root/rpmbuild/BUILD/qemu-kvm-0.14/vl.c:3201

The cpu thread backtrace is as follows:
#0  0x00007f3084dff093 in select () from /lib64/libc.so.6
#1  0x00000000004453ea in qemu_aio_wait () at aio.c:193
#2  0x0000000000444175 in bdrv_write_em (bs=0x1ec3090, sector_num=2009871,
    buf=0x7f3087532800 "F\b\200u\022\366F$\004u\fPV\350\226\367\377\377\003Ft\353\fPV\350\212\367\377\377\353\003\213Ft^]\302\b",
    nb_sectors=16) at block.c:2577
#3  0x000000000059ca13 in ide_sector_write (s=0x215f508) at /root/rpmbuild/BUILD/qemu-kvm-0.14/hw/ide/core.c:574
#4  0x0000000000438ced in kvm_handle_io (env=0x202ef60) at /root/rpmbuild/BUILD/qemu-kvm-0.14/kvm-all.c:821
#5  kvm_run (env=0x202ef60) at /root/rpmbuild/BUILD/qemu-kvm-0.14/qemu-kvm.c:617
#6  0x0000000000438e09 in kvm_cpu_exec (env=<value optimized out>)
    at /root/rpmbuild/BUILD/qemu-kvm-0.14/qemu-kvm.c:1233
#7  0x000000000043a0f7 in kvm_main_loop_cpu (_env=0x202ef60)
    at /root/rpmbuild/BUILD/qemu-kvm-0.14/qemu-kvm.c:1419
#8  ap_main_loop (_env=0x202ef60) at /root/rpmbuild/BUILD/qemu-kvm-0.14/qemu-kvm.c:1466
#9  0x00007f3086ea37e1 in start_thread () from /lib64/libpthread.so.0
#10 0x00007f3084e0653d in clone () from /lib64/libc.so.6

The aio_thread backtrace is as follows:
#0  0x00007f3086eaae83 in pwrite64 () from /lib64/libpthread.so.0
#1  0x0000000000447501 in handle_aiocb_rw_linear (aiocb=0x21cff10,
    buf=0x7f3087532800 "F\b\200u\022\366F$\004u\fPV\350\226\367\377\377\003Ft\353\fPV\350\212\367\377\377\353\003\213Ft^]\302\b")
    at posix-aio-compat.c:212
#2  0x0000000000447d48 in handle_aiocb_rw (unused=<value optimized out>) at posix-aio-compat.c:247
#3  aio_thread (unused=<value optimized out>) at posix-aio-compat.c:341
#4  0x00007f3086ea37e1 in start_thread () from /lib64/libpthread.so.0
#5  0x00007f3084e0653d in clone () from /lib64/libc.so.6

I think io_thread is blocked by the cpu thread, which takes qemu_mutex
first. The cpu thread is waiting in qemu_aio_wait() for the result from
aio_thread, and aio_thread spends a long time in pwrite64: the call
takes about 5-10 s and then returns an error (it seems to behave like a
call with a timeout rather than one that blocks forever). Only after
that does io_thread get a chance to receive monitor input, so the
monitor appears to block frequently. In this situation, if I stop the
VM, the monitor responds faster.

The problem is triggered by the block layer being unavailable. The
block layer processes the I/O error in the normal way: it reports the
error to the IDE device, and the error is handled in ide_sector_write.
The root cause is that monitor input and the I/O operation (the pwrite
call) must execute serially (under the qemu_mutex lock), so a pwrite
that blocks for a long time holds up monitor input.
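
The serialization described above can be reproduced in miniature. The
following Python sketch is only a model, not QEMU code: global_lock
stands in for qemu_mutex, the sleep stands in for the stalled pwrite(),
and the function name monitor_wait_seconds is made up for the example.

```python
import threading
import time


def monitor_wait_seconds(io_delay):
    """Return how long a 'monitor' waits for the global lock while a
    'vcpu' thread holds it across a slow synchronous write."""
    global_lock = threading.Lock()   # stands in for qemu_mutex
    lock_taken = threading.Event()

    def vcpu():
        with global_lock:
            lock_taken.set()         # tell the monitor the lock is held
            time.sleep(io_delay)     # stands in for the stalled pwrite()

    worker = threading.Thread(target=vcpu)
    worker.start()
    lock_taken.wait()                # make sure the vcpu got the lock first

    start = time.monotonic()
    with global_lock:                # monitor input needs the same lock
        waited = time.monotonic() - start
    worker.join()
    return waited
```

Calling monitor_wait_seconds(0.3) should return a value close to 0.3:
the monitor is delayed for roughly as long as the write stalls, even
though it never touches the disk itself.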

As Stefan says, it seems difficult to take monitor input out of this
protection. For now I will stop the VM if the disk image cannot be
reached.


* Re: problem about blocked monitor when disk image on NFS can not be reached.
From: Stefan Hajnoczi @ 2011-03-02 17:26 UTC
  To: ya su; +Cc: Avi Kivity, kvm@vger.kernel.org, Kevin Wolf

On Wed, Mar 2, 2011 at 10:39 AM, ya su <suya94335@gmail.com> wrote:
> I think io_thread is blocked by the cpu thread, which takes
> qemu_mutex first. The cpu thread is waiting in qemu_aio_wait() for
> the result from aio_thread, and aio_thread spends a long time in
> pwrite64: the call takes about 5-10 s and then returns an error (it
> seems to behave like a call with a timeout rather than one that
> blocks forever). Only after that does io_thread get a chance to
> receive monitor input, so the monitor appears to block frequently.
> In this situation, if I stop the VM, the monitor responds faster.
>
> The problem is triggered by the block layer being unavailable. The
> block layer processes the I/O error in the normal way: it reports
> the error to the IDE device, and the error is handled in
> ide_sector_write. The root cause is that monitor input and the I/O
> operation (the pwrite call) must execute serially (under the
> qemu_mutex lock), so a pwrite that blocks for a long time holds up
> monitor input.
>
> As Stefan says, it seems difficult to take monitor input out of this
> protection. For now I will stop the VM if the disk image cannot be
> reached.

If you switch to -drive if=virtio instead of IDE then the problem
should be greatly reduced.  Virtio-blk uses aio instead of synchronous
calls, which means that the vcpu thread does not run qemu_aio_wait().

Kevin and I have been looking into the limitations imposed by
synchronous calls.  Today there is unfortunately synchronous code in
QEMU and we can hit these NFS hang situations.  qemu_aio_wait() runs a
nested event loop that does a subset of what the full event loop does.
This is why the monitor does not respond.

If all code was asynchronous then only a top-level event loop would be
necessary and the monitor would continue to function.
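
The difference between the nested and the top-level event loop can be
illustrated with a toy dispatcher. This Python sketch is a simplified
model under stated assumptions, not QEMU's main loop: the event names
and the dispatch function are invented for the example.

```python
from collections import deque


def dispatch(events, synchronous):
    """Handle events in arrival order and return the order of handling.

    In synchronous mode, submitting a write enters a nested loop that
    services only the aio completion and leaves everything else queued.
    """
    pending = deque(events)
    handled = []
    while pending:
        event = pending.popleft()
        handled.append(event)
        if event == "write_submit" and synchronous:
            deferred = deque()
            while pending:                      # nested event loop
                nested = pending.popleft()
                if nested == "aio_done":
                    handled.append(nested)
                    break
                deferred.append(nested)         # e.g. monitor input: not serviced here
            pending.extendleft(reversed(deferred))
    return handled


events = ["write_submit", "monitor_cmd", "aio_done"]
print(dispatch(events, synchronous=True))   # ['write_submit', 'aio_done', 'monitor_cmd']
print(dispatch(events, synchronous=False))  # ['write_submit', 'monitor_cmd', 'aio_done']
```

In the synchronous case the monitor command is handled only after the
write completes, which on a hung NFS mount can take a long time; in the
asynchronous case it is handled as soon as it arrives.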

In the immediate term I suggest using virtio-blk instead of IDE.

Stefan

* problem about blocked monitor when disk image on NFS can not be reached.
From: ya su @ 2011-02-28 13:52 UTC
  To: kvm

Hi,

When KVM is started with a disk image on an NFS server and the NFS
server cannot be reached, the monitor blocks. I changed io_thread to
the SCHED_RR policy; with that change it works, but not smoothly,
while it waits for disk read/write timeouts.

I think one solution is to run kvm_handle_io in a separate thread. I
would put kvm_handle_io in a newly spawned thread and pass all I/O
requests through a queue between io_thread and the new thread. Each
request needs a copy of run->io.size * run->io.count bytes from the
address (uint8_t *)run + run->io.data_offset.
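
The hand-off proposed above can be sketched as follows. This is a
Python model rather than C, and it is only an assumption about how such
a queue might look: run_buf stands in for the shared kvm_run area, and
enqueue_pio is a hypothetical helper, not a QEMU function. The point is
that the data must be copied before queuing, because the shared area is
reused for the next exit.

```python
import queue


def enqueue_pio(run_buf, data_offset, size, count, requests):
    """Copy size * count bytes starting at data_offset and queue the copy."""
    start = data_offset
    end = start + size * count
    requests.put(bytes(run_buf[start:end]))  # bytes() makes a private copy


run_buf = bytearray(b"\x00" * 8 + b"ABCDEFGH")  # 8 header bytes, then data
requests = queue.Queue()
enqueue_pio(run_buf, data_offset=8, size=2, count=3, requests=requests)
run_buf[8:14] = b"\x00" * 6                     # shared area gets reused
print(requests.get_nowait())                    # b'ABCDEF'
```

The worker thread would then take requests from the queue and perform
the device access on its own.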

Is this the right direction? Any suggestions are welcome, thanks!

Green.
