* [Qemu-devel] Qemu-KVM 0.12.3 and Multipath -> Assertion
  From: Peter Lieven @ 2010-05-03 21:26 UTC
  To: kvm, qemu-devel

Hi Qemu/KVM Devel Team,

I'm using qemu-kvm 0.12.3 with the latest kernel, 2.6.33.3.
As backend we use open-iSCSI with dm-multipath.

Multipath is configured to queue I/O if no path is available.

If we create a failure on all paths, qemu starts to consume 100% CPU
due to I/O waits, which is OK so far.

One odd thing: the monitor interface is not responding any more...

What is a real blocker is that KVM crashes with:

kvm: /usr/src/qemu-kvm-0.12.3/hw/ide/internal.h:507: bmdma_active_if:
Assertion `bmdma->unit != (uint8_t)-1' failed.

after multipath has reestablished at least one path.

Any ideas? I remember this was working with earlier kernel/kvm/qemu
versions.

Thanks,
Peter
* [Qemu-devel] Re: Qemu-KVM 0.12.3 and Multipath -> Assertion
  From: André Weidemann @ 2010-05-04 5:38 UTC
  To: Peter Lieven; +Cc: qemu-devel, kvm

Hi Peter,

On 03.05.2010 23:26, Peter Lieven wrote:
> Hi Qemu/KVM Devel Team,
>
> I'm using qemu-kvm 0.12.3 with the latest kernel, 2.6.33.3.
> As backend we use open-iSCSI with dm-multipath.
>
> Multipath is configured to queue I/O if no path is available.
>
> If we create a failure on all paths, qemu starts to consume 100% CPU
> due to I/O waits, which is OK so far.
>
> One odd thing: the monitor interface is not responding any more...
>
> What is a real blocker is that KVM crashes with:
>
> kvm: /usr/src/qemu-kvm-0.12.3/hw/ide/internal.h:507: bmdma_active_if:
> Assertion `bmdma->unit != (uint8_t)-1' failed.
>
> after multipath has reestablished at least one path.
>
> Any ideas? I remember this was working with earlier kernel/kvm/qemu
> versions.

I have the same issue on my machine, although I am using local storage
(LVM or a physical disk) to write my data to. I reported the "Assertion
failed" to the list on March 17th. Marcelo and Avi had asked some
questions back then, but I don't know if they have come up with a fix
for it.

Regards
 André
* Re: [Qemu-devel] Qemu-KVM 0.12.3 and Multipath -> Assertion
  From: Kevin Wolf @ 2010-05-04 8:35 UTC
  To: Peter Lieven; +Cc: qemu-devel, kvm

On 03.05.2010 23:26, Peter Lieven wrote:
> Hi Qemu/KVM Devel Team,
>
> I'm using qemu-kvm 0.12.3 with the latest kernel, 2.6.33.3.
> As backend we use open-iSCSI with dm-multipath.
>
> Multipath is configured to queue I/O if no path is available.
>
> If we create a failure on all paths, qemu starts to consume 100% CPU
> due to I/O waits, which is OK so far.
>
> One odd thing: the monitor interface is not responding any more...
>
> What is a real blocker is that KVM crashes with:
>
> kvm: /usr/src/qemu-kvm-0.12.3/hw/ide/internal.h:507: bmdma_active_if:
> Assertion `bmdma->unit != (uint8_t)-1' failed.
>
> after multipath has reestablished at least one path.

Can you get a stack backtrace with gdb?

> Any ideas? I remember this was working with earlier kernel/kvm/qemu
> versions.

If it works in the same setup with an older qemu version, bisecting
might help.

Kevin
* Re: [Qemu-devel] Qemu-KVM 0.12.3 and Multipath -> Assertion
  From: Peter Lieven @ 2010-05-04 11:38 UTC
  To: Kevin Wolf; +Cc: qemu-devel, kvm

Hi Kevin,

I set a breakpoint at bmdma_active_if. The first two breaks were hit
when the last path in the multipath failed, but the assertion did not
fail. When I kicked one path back in, the breakpoint was reached again,
this time leading to an assertion failure. The stacktrace is from the
point shortly before.

Hope this helps.

BR,
Peter

--

(gdb) b bmdma_active_if
Breakpoint 2 at 0x43f2e0: file /usr/src/qemu-kvm-0.12.3/hw/ide/internal.h, line 507.
(gdb) c
Continuing.
[Switching to Thread 0x7f7b3300d950 (LWP 21171)]

Breakpoint 2, bmdma_active_if (bmdma=0xe31fd8) at /usr/src/qemu-kvm-0.12.3/hw/ide/internal.h:507
507         assert(bmdma->unit != (uint8_t)-1);
(gdb) c
Continuing.

Breakpoint 2, bmdma_active_if (bmdma=0xe31fd8) at /usr/src/qemu-kvm-0.12.3/hw/ide/internal.h:507
507         assert(bmdma->unit != (uint8_t)-1);
(gdb) c
Continuing.

Breakpoint 2, bmdma_active_if (bmdma=0xe31fd8) at /usr/src/qemu-kvm-0.12.3/hw/ide/internal.h:507
507         assert(bmdma->unit != (uint8_t)-1);
(gdb) bt full
#0  bmdma_active_if (bmdma=0xe31fd8) at /usr/src/qemu-kvm-0.12.3/hw/ide/internal.h:507
        __PRETTY_FUNCTION__ = "bmdma_active_if"
#1  0x000000000043f6ba in ide_read_dma_cb (opaque=0xe31fd8, ret=0) at /usr/src/qemu-kvm-0.12.3/hw/ide/core.c:554
        bm = (BMDMAState *) 0xe31fd8
        s = (IDEState *) 0xe17940
        n = 0
        sector_num = 0
#2  0x000000000058730c in dma_bdrv_cb (opaque=0xe17940, ret=0) at /usr/src/qemu-kvm-0.12.3/dma-helpers.c:94
        dbs = (DMAAIOCB *) 0xe17940
        cur_addr = 0
        cur_len = 0
        mem = (void *) 0x0
#3  0x000000000049e510 in qemu_laio_process_completion (s=0xe119c0, laiocb=0xe179c0) at linux-aio.c:68
        ret = 0
#4  0x000000000049e611 in qemu_laio_enqueue_completed (s=0xe119c0, laiocb=0xe179c0) at linux-aio.c:107
No locals.
#5  0x000000000049e787 in qemu_laio_completion_cb (opaque=0xe119c0) at linux-aio.c:144
        iocb = (struct iocb *) 0xe179f0
        laiocb = (struct qemu_laiocb *) 0xe179c0
        val = 1
        ret = 8
        nevents = 1
        i = 0
        events = {{data = 0x0, obj = 0xe179f0, res = 4096, res2 = 0},
                  {data = 0x0, obj = 0x0, res = 0, res2 = 0} <repeats 46 times>,
                  [remaining entries of the event array contain only leftover stack data and are trimmed here]}
        ts = {tv_sec = 0, tv_nsec = 0}
        s = (struct qemu_laio_state *) 0xe119c0
#6  0x000000000049e841 in laio_cancel (blockacb=0xe179c0) at linux-aio.c:184
        laiocb = (struct qemu_laiocb *) 0xe179c0
        event = {data = 0x1, obj = 0x4c7fb1, res = 140167113395792, res2 = 4384262}
        ret = -22
#7  0x000000000049a29b in bdrv_aio_cancel (acb=0xe179c0) at block.c:1792
No locals.
#8  0x000000000058755a in dma_aio_cancel (acb=0xe17940) at /usr/src/qemu-kvm-0.12.3/dma-helpers.c:138
        dbs = (DMAAIOCB *) 0xe17940
#9  0x000000000049a29b in bdrv_aio_cancel (acb=0xe17940) at block.c:1792
No locals.
#10 0x0000000000444a0c in ide_dma_cancel (bm=0xe31fd8) at /usr/src/qemu-kvm-0.12.3/hw/ide/core.c:2838
No locals.
#11 0x0000000000444f39 in bmdma_cmd_writeb (opaque=0xe31fd8, addr=49152, val=8) at /usr/src/qemu-kvm-0.12.3/hw/ide/pci.c:44
        bm = (BMDMAState *) 0xe31fd8
#12 0x00000000004c81bc in ioport_write (index=0, address=49152, data=8) at ioport.c:80
        func = (IOPortWriteFunc *) 0x444f0c <bmdma_cmd_writeb>
        default_func = {0x4c81d0 <default_ioport_writeb>, 0x4c8225 <default_ioport_writew>, 0x4c8282 <default_ioport_writel>}
#13 0x00000000004c8543 in cpu_outb (addr=49152, val=8 '\b') at ioport.c:198
No locals.
#14 0x0000000000429689 in kvm_handle_io (port=49152, data=0x7f7b34ab7000, direction=1, size=1, count=1) at /usr/src/qemu-kvm-0.12.3/kvm-all.c:535
        i = 0
        ptr = (uint8_t *) 0x7f7b34ab7000 "\b"
#15 0x000000000042bac3 in kvm_run (env=0xe17ba0) at /usr/src/qemu-kvm-0.12.3/qemu-kvm.c:964
        r = 0
        kvm = (kvm_context_t) 0xdfb0d0
        run = (struct kvm_run *) 0x7f7b34ab6000
        fd = 15
#16 0x000000000042cdda in kvm_cpu_exec (env=0xe17ba0) at /usr/src/qemu-kvm-0.12.3/qemu-kvm.c:1647
        r = 0
#17 0x000000000042d564 in kvm_main_loop_cpu (env=0xe17ba0) at /usr/src/qemu-kvm-0.12.3/qemu-kvm.c:1889
        run_cpu = 1
#18 0x000000000042d6a5 in ap_main_loop (_env=0xe17ba0) at /usr/src/qemu-kvm-0.12.3/qemu-kvm.c:1939
        env = (struct CPUX86State *) 0xe17ba0
        signals = {__val = {18446744067267100671, 18446744073709551615 <repeats 15 times>}}
        data = (struct ioperm_data *) 0x0
#19 0x00007f7b3448d3ba in start_thread () from /lib/libpthread.so.0
No symbol table info available.
#20 0x00007f7b3350ffcd in clone () from /lib/libc.so.6
No symbol table info available.
#21 0x0000000000000000 in ?? ()
No symbol table info available.
(gdb) c
Continuing.

kvm: /usr/src/qemu-kvm-0.12.3/hw/ide/internal.h:507: bmdma_active_if: Assertion `bmdma->unit != (uint8_t)-1' failed.

Program received signal SIGABRT, Aborted.
0x00007f7b3345cfb5 in raise () from /lib/libc.so.6

Kevin Wolf wrote:
> On 03.05.2010 23:26, Peter Lieven wrote:
>> Hi Qemu/KVM Devel Team,
>>
>> I'm using qemu-kvm 0.12.3 with the latest kernel, 2.6.33.3.
>> As backend we use open-iSCSI with dm-multipath.
>>
>> Multipath is configured to queue I/O if no path is available.
>>
>> If we create a failure on all paths, qemu starts to consume 100% CPU
>> due to I/O waits, which is OK so far.
>>
>> One odd thing: the monitor interface is not responding any more...
>>
>> What is a real blocker is that KVM crashes with:
>>
>> kvm: /usr/src/qemu-kvm-0.12.3/hw/ide/internal.h:507: bmdma_active_if:
>> Assertion `bmdma->unit != (uint8_t)-1' failed.
>>
>> after multipath has reestablished at least one path.
>
> Can you get a stack backtrace with gdb?
>
>> Any ideas? I remember this was working with earlier kernel/kvm/qemu
>> versions.
>
> If it works in the same setup with an older qemu version, bisecting
> might help.
>
> Kevin
* Re: [Qemu-devel] Qemu-KVM 0.12.3 and Multipath -> Assertion
  From: Kevin Wolf @ 2010-05-04 12:20 UTC
  To: Peter Lieven; +Cc: qemu-devel, kvm, Christoph Hellwig

On 04.05.2010 13:38, Peter Lieven wrote:
> Hi Kevin,
>
> I set a breakpoint at bmdma_active_if. The first two breaks were hit
> when the last path in the multipath failed, but the assertion did not
> fail. When I kicked one path back in, the breakpoint was reached
> again, this time leading to an assertion failure. The stacktrace is
> from the point shortly before.
>
> Hope this helps.

Hm, looks like there's something wrong with cancelling requests -
bdrv_aio_cancel might decide that it completes a request (and
consequently calls the callback for it) whereas the IDE emulation
decides that it's done with the request before calling bdrv_aio_cancel.

I haven't looked in much detail what this could break, but does
something like this help?

diff --git a/hw/ide/core.c b/hw/ide/core.c
index 0757528..3cd55e3 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -2838,10 +2838,6 @@ static void ide_dma_restart(IDEState *s, int is_read)
 void ide_dma_cancel(BMDMAState *bm)
 {
     if (bm->status & BM_STATUS_DMAING) {
-        bm->status &= ~BM_STATUS_DMAING;
-        /* cancel DMA request */
-        bm->unit = -1;
-        bm->dma_cb = NULL;
         if (bm->aiocb) {
 #ifdef DEBUG_AIO
             printf("aio_cancel\n");
@@ -2849,6 +2845,10 @@ void ide_dma_cancel(BMDMAState *bm)
 #endif
             bdrv_aio_cancel(bm->aiocb);
             bm->aiocb = NULL;
         }
+        bm->status &= ~BM_STATUS_DMAING;
+        /* cancel DMA request */
+        bm->unit = -1;
+        bm->dma_cb = NULL;
     }
 }

Kevin
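The ordering hazard Kevin describes can be shown in isolation. Below is a
minimal, self-contained C sketch; the struct, the cancel function, and
main() are simplified stand-ins invented for illustration, not the real
qemu definitions. Only the ordering mirrors the patch: if the cancel path
clears its bookkeeping fields before calling into a layer that may still
complete the request (and fire its completion callback), the callback
trips the same kind of assertion as in Peter's backtrace.

    /* Sketch of the re-entrancy hazard behind the assertion. All names
     * are hypothetical stand-ins. Build: cc -o sketch sketch.c */
    #include <assert.h>
    #include <stdint.h>

    struct bmdma_sketch {
        uint8_t unit;                     /* (uint8_t)-1 == no DMA active */
        void (*dma_cb)(struct bmdma_sketch *bm);
    };

    /* Completion callback: like bmdma_active_if(), it asserts that the
     * request it belongs to is still marked active. */
    static void read_dma_cb(struct bmdma_sketch *bm)
    {
        assert(bm->unit != (uint8_t)-1);
    }

    /* Stand-in for bdrv_aio_cancel() on a queued-then-unblocked backend:
     * the "cancel" may synchronously complete the request and run its
     * callback. */
    static void aio_cancel_sketch(struct bmdma_sketch *bm)
    {
        if (bm->dma_cb) {
            bm->dma_cb(bm);
        }
    }

    static void dma_cancel_buggy(struct bmdma_sketch *bm)
    {
        bm->unit = (uint8_t)-1;           /* fields cleared first...      */
        aio_cancel_sketch(bm);            /* ...then the callback asserts */
    }

    static void dma_cancel_fixed(struct bmdma_sketch *bm)
    {
        aio_cancel_sketch(bm);            /* let the request finish first */
        bm->unit = (uint8_t)-1;           /* then mark it inactive        */
        bm->dma_cb = NULL;
    }

    int main(void)
    {
        struct bmdma_sketch bm = { 0, read_dma_cb };
        dma_cancel_fixed(&bm);            /* passes */

        bm.unit = 0;
        bm.dma_cb = read_dma_cb;
        dma_cancel_buggy(&bm);            /* aborts, like the reported bug */
        return 0;
    }

Moving the field resets after the cancel call, as the patch above does,
means the callback still sees a consistent in-flight state if
bdrv_aio_cancel ends up completing the request.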
* Re: [Qemu-devel] Qemu-KVM 0.12.3 and Multipath -> Assertion
  From: Peter Lieven @ 2010-05-04 13:42 UTC
  To: Kevin Wolf; +Cc: qemu-devel, kvm, Christoph Hellwig

Hi Kevin,

you did it *g*

Looks promising. I applied this patch and was not able to reproduce yet :-)

A reliable way to reproduce was to shut down all multipath paths, then
initiate I/O in the VM (e.g. start an application). Of course,
everything hangs at this point.

After reenabling one path, the VM crashed. Now it seems to behave
correctly and just reports a DMA timeout and continues normally
afterwards.

Can you imagine any way of preventing the VM from consuming 100% CPU
in that waiting state? My current approach is to run all VMs with
nice 1, which helped to keep the machine responsive when all VMs (in
my test case 64 on a box) have hanging I/O at the same time.

BR,
Peter

Kevin Wolf wrote:
> On 04.05.2010 13:38, Peter Lieven wrote:
>> Hi Kevin,
>>
>> I set a breakpoint at bmdma_active_if. The first two breaks were hit
>> when the last path in the multipath failed, but the assertion did not
>> fail. When I kicked one path back in, the breakpoint was reached
>> again, this time leading to an assertion failure. The stacktrace is
>> from the point shortly before.
>>
>> Hope this helps.
>
> Hm, looks like there's something wrong with cancelling requests -
> bdrv_aio_cancel might decide that it completes a request (and
> consequently calls the callback for it) whereas the IDE emulation
> decides that it's done with the request before calling bdrv_aio_cancel.
>
> I haven't looked in much detail what this could break, but does
> something like this help?
>
> diff --git a/hw/ide/core.c b/hw/ide/core.c
> index 0757528..3cd55e3 100644
> --- a/hw/ide/core.c
> +++ b/hw/ide/core.c
> @@ -2838,10 +2838,6 @@ static void ide_dma_restart(IDEState *s, int is_read)
>  void ide_dma_cancel(BMDMAState *bm)
>  {
>      if (bm->status & BM_STATUS_DMAING) {
> -        bm->status &= ~BM_STATUS_DMAING;
> -        /* cancel DMA request */
> -        bm->unit = -1;
> -        bm->dma_cb = NULL;
>          if (bm->aiocb) {
>  #ifdef DEBUG_AIO
>              printf("aio_cancel\n");
> @@ -2849,6 +2845,10 @@ void ide_dma_cancel(BMDMAState *bm)
>  #endif
>              bdrv_aio_cancel(bm->aiocb);
>              bm->aiocb = NULL;
>          }
> +        bm->status &= ~BM_STATUS_DMAING;
> +        /* cancel DMA request */
> +        bm->unit = -1;
> +        bm->dma_cb = NULL;
>      }
>  }
>
> Kevin
* Re: [Qemu-devel] Qemu-KVM 0.12.3 and Multipath -> Assertion
  From: Kevin Wolf @ 2010-05-04 14:01 UTC
  To: Peter Lieven; +Cc: qemu-devel, kvm, Christoph Hellwig

On 04.05.2010 15:42, Peter Lieven wrote:
> Hi Kevin,
>
> you did it *g*
>
> Looks promising. I applied this patch and was not able to reproduce yet :-)
>
> A reliable way to reproduce was to shut down all multipath paths, then
> initiate I/O in the VM (e.g. start an application). Of course,
> everything hangs at this point.
>
> After reenabling one path, the VM crashed. Now it seems to behave
> correctly and just reports a DMA timeout and continues normally
> afterwards.

Great, I'm going to submit it as a proper patch then.

Christoph, by now I'm pretty sure it's right, but can you have another
look if this is correct, anyway?

> Can you imagine any way of preventing the VM from consuming 100% CPU
> in that waiting state? My current approach is to run all VMs with
> nice 1, which helped to keep the machine responsive when all VMs (in
> my test case 64 on a box) have hanging I/O at the same time.

I don't have anything particular in mind, but you could just attach gdb
and get another backtrace while it consumes 100% CPU (you'll need to
use "thread apply all bt" to catch everything). Then we should see
where it's hanging.

Kevin
* Re: [Qemu-devel] Qemu-KVM 0.12.3 and Multipath -> Assertion
  From: Christoph Hellwig @ 2010-05-04 17:07 UTC
  To: Kevin Wolf; +Cc: Peter Lieven, qemu-devel, kvm

On Tue, May 04, 2010 at 04:01:35PM +0200, Kevin Wolf wrote:
> Great, I'm going to submit it as a proper patch then.
>
> Christoph, by now I'm pretty sure it's right, but can you have another
> look if this is correct, anyway?

It looks correct to me - we really shouldn't update the fields until
bdrv_aio_cancel has returned. In fact, more often than not we cannot
cancel a request at all, so there's a fairly high chance it will
complete.

Reviewed-by: Christoph Hellwig <hch@lst.de>
* Re: [Qemu-devel] Qemu-KVM 0.12.3 and Multipath -> Assertion
  From: Peter Lieven @ 2010-05-18 11:13 UTC
  To: Christoph Hellwig; +Cc: Kevin Wolf, qemu-devel, kvm

Hi,

will this patch make it into 0.12.4.1?

BR,
Peter

Christoph Hellwig wrote:
> On Tue, May 04, 2010 at 04:01:35PM +0200, Kevin Wolf wrote:
>> Great, I'm going to submit it as a proper patch then.
>>
>> Christoph, by now I'm pretty sure it's right, but can you have another
>> look if this is correct, anyway?
>
> It looks correct to me - we really shouldn't update the fields until
> bdrv_aio_cancel has returned. In fact, more often than not we cannot
> cancel a request at all, so there's a fairly high chance it will
> complete.
>
> Reviewed-by: Christoph Hellwig <hch@lst.de>

--
Mit freundlichen Grüßen/Kind Regards

Peter Lieven

KAMP Netzwerkdienste GmbH
Vestische Str. 89-91 | 46117 Oberhausen
Tel: +49 (0) 208.89 402-50 | Fax: +49 (0) 208.89 402-40
mailto:pl@kamp.de | http://www.kamp.de

Geschäftsführer: Heiner Lante | Michael Lante
Amtsgericht Duisburg | HRB Nr. 12154
USt-Id-Nr.: DE 120607556
* Re: [Qemu-devel] Qemu-KVM 0.12.3 and Multipath -> Assertion
  From: Kevin Wolf @ 2010-05-18 12:14 UTC
  To: Anthony Liguori; +Cc: Peter Lieven, Christoph Hellwig, kvm, qemu-devel

On 18.05.2010 13:13, Peter Lieven wrote:
> Hi,
>
> will this patch make it into 0.12.4.1?
>
> BR,
> Peter

Anthony, can you please cherry-pick commit 38d8dfa1 into stable-0.12?

Kevin

> Christoph Hellwig wrote:
>> On Tue, May 04, 2010 at 04:01:35PM +0200, Kevin Wolf wrote:
>>> Great, I'm going to submit it as a proper patch then.
>>>
>>> Christoph, by now I'm pretty sure it's right, but can you have another
>>> look if this is correct, anyway?
>>
>> It looks correct to me - we really shouldn't update the fields until
>> bdrv_aio_cancel has returned. In fact, more often than not we cannot
>> cancel a request at all, so there's a fairly high chance it will
>> complete.
>>
>> Reviewed-by: Christoph Hellwig <hch@lst.de>
* qemu-kvm hangs if multipath device is queueing (was: Re: [Qemu-devel] Qemu-KVM 0.12.3 and Multipath -> Assertion)
  From: Peter Lieven @ 2010-05-12 14:01 UTC
  To: Kevin Wolf; +Cc: qemu-devel, kvm, Christoph Hellwig

Hi Kevin,

here we go. I created a blocking multipath device (interrupted all
paths). qemu-kvm hangs with 100% CPU. The monitor is not responding
either.

If I restore at least one path, the VM continues.

BR,
Peter


^C
Program received signal SIGINT, Interrupt.
0x00007fd8a6aaea94 in __lll_lock_wait () from /lib/libpthread.so.0
(gdb) bt
#0  0x00007fd8a6aaea94 in __lll_lock_wait () from /lib/libpthread.so.0
#1  0x00007fd8a6aaa190 in _L_lock_102 () from /lib/libpthread.so.0
#2  0x00007fd8a6aa9a7e in pthread_mutex_lock () from /lib/libpthread.so.0
#3  0x000000000042e739 in kvm_mutex_lock () at /usr/src/qemu-kvm-0.12.4/qemu-kvm.c:2524
#4  0x000000000042e76e in qemu_mutex_lock_iothread () at /usr/src/qemu-kvm-0.12.4/qemu-kvm.c:2537
#5  0x000000000040c262 in main_loop_wait (timeout=1000) at /usr/src/qemu-kvm-0.12.4/vl.c:3995
#6  0x000000000042dcf1 in kvm_main_loop () at /usr/src/qemu-kvm-0.12.4/qemu-kvm.c:2126
#7  0x000000000040c98c in main_loop () at /usr/src/qemu-kvm-0.12.4/vl.c:4212
#8  0x000000000041054b in main (argc=30, argv=0x7fff266a77e8, envp=0x7fff266a78e0) at /usr/src/qemu-kvm-0.12.4/vl.c:6252
(gdb) bt full
#0  0x00007fd8a6aaea94 in __lll_lock_wait () from /lib/libpthread.so.0
No symbol table info available.
#1  0x00007fd8a6aaa190 in _L_lock_102 () from /lib/libpthread.so.0
No symbol table info available.
#2  0x00007fd8a6aa9a7e in pthread_mutex_lock () from /lib/libpthread.so.0
No symbol table info available.
#3  0x000000000042e739 in kvm_mutex_lock () at /usr/src/qemu-kvm-0.12.4/qemu-kvm.c:2524
No locals.
#4  0x000000000042e76e in qemu_mutex_lock_iothread () at /usr/src/qemu-kvm-0.12.4/qemu-kvm.c:2537
No locals.
#5  0x000000000040c262 in main_loop_wait (timeout=1000) at /usr/src/qemu-kvm-0.12.4/vl.c:3995
        ioh = (IOHandlerRecord *) 0x0
        rfds = {fds_bits = {1048576, 0 <repeats 15 times>}}
        wfds = {fds_bits = {0 <repeats 16 times>}}
        xfds = {fds_bits = {0 <repeats 16 times>}}
        ret = 1
        nfds = 21
        tv = {tv_sec = 0, tv_usec = 999761}
#6  0x000000000042dcf1 in kvm_main_loop () at /usr/src/qemu-kvm-0.12.4/qemu-kvm.c:2126
        fds = {18, 19}
        mask = {__val = {268443712, 0 <repeats 15 times>}}
        sigfd = 20
#7  0x000000000040c98c in main_loop () at /usr/src/qemu-kvm-0.12.4/vl.c:4212
        r = 0
#8  0x000000000041054b in main (argc=30, argv=0x7fff266a77e8, envp=0x7fff266a78e0) at /usr/src/qemu-kvm-0.12.4/vl.c:6252
        gdbstub_dev = 0x0
        boot_devices_bitmap = 12
        i = 0
        snapshot = 0
        linux_boot = 0
        initrd_filename = 0x0
        kernel_filename = 0x0
        kernel_cmdline = 0x588fac ""
        boot_devices = "dc", '\0' <repeats 30 times>
        ds = (DisplayState *) 0x198bf00
        dcl = (DisplayChangeListener *) 0x0
        cyls = 0
        heads = 0
        secs = 0
        translation = 0
        hda_opts = (QemuOpts *) 0x0
        opts = (QemuOpts *) 0x1957390
        optind = 30
        r = 0x7fff266a8a23 "-usbdevice"
        optarg = 0x7fff266a8a2e "tablet"
        loadvm = 0x0
        machine = (QEMUMachine *) 0x861720
        cpu_model = 0x7fff266a8917 "qemu64,model_id=Intel(R) Xeon(R) CPU", ' ' <repeats 11 times>, "E5520  @ 2.27GHz"
        fds = {644511720, 32767}
        tb_size = 0
        pid_file = 0x7fff266a89bb "/var/run/qemu/vm-150.pid"
        incoming = 0x0
        fd = 0
        pwd = (struct passwd *) 0x0
        chroot_dir = 0x0
        run_as = 0x0
        env = (struct CPUX86State *) 0x0
        show_vnc_port = 0
        params = {0x58cc76 "order", 0x58cc7c "once", 0x58cc81 "menu", 0x0}

Kevin Wolf wrote:
> On 04.05.2010 15:42, Peter Lieven wrote:
>> Hi Kevin,
>>
>> you did it *g*
>>
>> Looks promising. I applied this patch and was not able to reproduce yet :-)
>>
>> A reliable way to reproduce was to shut down all multipath paths, then
>> initiate I/O in the VM (e.g. start an application). Of course,
>> everything hangs at this point.
>>
>> After reenabling one path, the VM crashed. Now it seems to behave
>> correctly and just reports a DMA timeout and continues normally
>> afterwards.
>
> Great, I'm going to submit it as a proper patch then.
>
> Christoph, by now I'm pretty sure it's right, but can you have another
> look if this is correct, anyway?
>
>> Can you imagine any way of preventing the VM from consuming 100% CPU
>> in that waiting state? My current approach is to run all VMs with
>> nice 1, which helped to keep the machine responsive when all VMs (in
>> my test case 64 on a box) have hanging I/O at the same time.
>
> I don't have anything particular in mind, but you could just attach gdb
> and get another backtrace while it consumes 100% CPU (you'll need to
> use "thread apply all bt" to catch everything). Then we should see
> where it's hanging.
>
> Kevin

--
Mit freundlichen Grüßen/Kind Regards

Peter Lieven

KAMP Netzwerkdienste GmbH
Vestische Str. 89-91 | 46117 Oberhausen
Tel: +49 (0) 208.89 402-50 | Fax: +49 (0) 208.89 402-40
mailto:pl@kamp.de | http://www.kamp.de

Geschäftsführer: Heiner Lante | Michael Lante
Amtsgericht Duisburg | HRB Nr. 12154
USt-Id-Nr.: DE 120607556
* [Qemu-devel] Re: qemu-kvm hangs if multipath device is queueing
  From: Kevin Wolf @ 2010-05-14 9:26 UTC
  To: Peter Lieven; +Cc: qemu-devel, kvm, Christoph Hellwig

Hi Peter,

On 12.05.2010 16:01, Peter Lieven wrote:
> Hi Kevin,
>
> here we go. I created a blocking multipath device (interrupted all
> paths). qemu-kvm hangs with 100% CPU. The monitor is not responding
> either.
>
> If I restore at least one path, the VM continues.
>
> BR,
> Peter

This seems to be the backtrace of only one thread, and likely not the
interesting one. Can you please use "thread apply all bt" to get the
backtrace of all threads?

Kevin

> ^C
> Program received signal SIGINT, Interrupt.
> 0x00007fd8a6aaea94 in __lll_lock_wait () from /lib/libpthread.so.0
> (gdb) bt
> #0  0x00007fd8a6aaea94 in __lll_lock_wait () from /lib/libpthread.so.0
> #1  0x00007fd8a6aaa190 in _L_lock_102 () from /lib/libpthread.so.0
> #2  0x00007fd8a6aa9a7e in pthread_mutex_lock () from /lib/libpthread.so.0
> #3  0x000000000042e739 in kvm_mutex_lock () at /usr/src/qemu-kvm-0.12.4/qemu-kvm.c:2524
> #4  0x000000000042e76e in qemu_mutex_lock_iothread () at /usr/src/qemu-kvm-0.12.4/qemu-kvm.c:2537
> #5  0x000000000040c262 in main_loop_wait (timeout=1000) at /usr/src/qemu-kvm-0.12.4/vl.c:3995
> #6  0x000000000042dcf1 in kvm_main_loop () at /usr/src/qemu-kvm-0.12.4/qemu-kvm.c:2126
> #7  0x000000000040c98c in main_loop () at /usr/src/qemu-kvm-0.12.4/vl.c:4212
> #8  0x000000000041054b in main (argc=30, argv=0x7fff266a77e8, envp=0x7fff266a78e0) at /usr/src/qemu-kvm-0.12.4/vl.c:6252
> [... quoted "bt full" output and earlier quote trimmed; they duplicate the message above ...]
* [Qemu-devel] Re: qemu-kvm hangs if multipath device is queueing
  From: Peter Lieven @ 2010-05-18 11:10 UTC
  To: Kevin Wolf; +Cc: qemu-devel, kvm, Christoph Hellwig

Hi Kevin,

here is the backtrace of (hopefully) all threads:

^C
Program received signal SIGINT, Interrupt.
[Switching to Thread 0x7f39b72656f0 (LWP 10695)]
0x00007f39b6c3ea94 in __lll_lock_wait () from /lib/libpthread.so.0

(gdb) thread apply all bt

Thread 2 (Thread 0x7f39b57b8950 (LWP 10698)):
#0  0x00007f39b6c3eedb in read () from /lib/libpthread.so.0
#1  0x000000000049e723 in qemu_laio_completion_cb (opaque=0x22b4010) at linux-aio.c:125
#2  0x000000000049e8ad in laio_cancel (blockacb=0x22ba310) at linux-aio.c:184
#3  0x000000000049a309 in bdrv_aio_cancel (acb=0x22ba310) at block.c:1800
#4  0x0000000000587a52 in dma_aio_cancel (acb=0x22ba170) at /usr/src/qemu-kvm-0.12.4/dma-helpers.c:138
#5  0x000000000049a309 in bdrv_aio_cancel (acb=0x22ba170) at block.c:1800
#6  0x0000000000444aac in ide_dma_cancel (bm=0x2800fd8) at /usr/src/qemu-kvm-0.12.4/hw/ide/core.c:2834
#7  0x0000000000445001 in bmdma_cmd_writeb (opaque=0x2800fd8, addr=49152, val=8) at /usr/src/qemu-kvm-0.12.4/hw/ide/pci.c:44
#8  0x00000000004c85f0 in ioport_write (index=0, address=49152, data=8) at ioport.c:80
#9  0x00000000004c8977 in cpu_outb (addr=49152, val=8 '\b') at ioport.c:198
#10 0x0000000000429731 in kvm_handle_io (port=49152, data=0x7f39b7263000, direction=1, size=1, count=1) at /usr/src/qemu-kvm-0.12.4/kvm-all.c:535
#11 0x000000000042bb8b in kvm_run (env=0x22ba5d0) at /usr/src/qemu-kvm-0.12.4/qemu-kvm.c:968
#12 0x000000000042cea2 in kvm_cpu_exec (env=0x22ba5d0) at /usr/src/qemu-kvm-0.12.4/qemu-kvm.c:1651
#13 0x000000000042d62c in kvm_main_loop_cpu (env=0x22ba5d0) at /usr/src/qemu-kvm-0.12.4/qemu-kvm.c:1893
#14 0x000000000042d76d in ap_main_loop (_env=0x22ba5d0) at /usr/src/qemu-kvm-0.12.4/qemu-kvm.c:1943
#15 0x00007f39b6c383ba in start_thread () from /lib/libpthread.so.0
#16 0x00007f39b5cbafcd in clone () from /lib/libc.so.6
#17 0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7f39b72656f0 (LWP 10695)):
#0  0x00007f39b6c3ea94 in __lll_lock_wait () from /lib/libpthread.so.0
#1  0x00007f39b6c3a190 in _L_lock_102 () from /lib/libpthread.so.0
#2  0x00007f39b6c39a7e in pthread_mutex_lock () from /lib/libpthread.so.0
#3  0x000000000042e739 in kvm_mutex_lock () at /usr/src/qemu-kvm-0.12.4/qemu-kvm.c:2524
#4  0x000000000042e76e in qemu_mutex_lock_iothread () at /usr/src/qemu-kvm-0.12.4/qemu-kvm.c:2537
#5  0x000000000040c262 in main_loop_wait (timeout=1000) at /usr/src/qemu-kvm-0.12.4/vl.c:3995
#6  0x000000000042dcf1 in kvm_main_loop () at /usr/src/qemu-kvm-0.12.4/qemu-kvm.c:2126
#7  0x000000000040c98c in main_loop () at /usr/src/qemu-kvm-0.12.4/vl.c:4212
#8  0x000000000041054b in main (argc=30, argv=0x7fff019f1ca8, envp=0x7fff019f1da0) at /usr/src/qemu-kvm-0.12.4/vl.c:6252
* [Qemu-devel] Re: qemu-kvm hangs if multipath device is queueing
  From: Kevin Wolf @ 2010-05-18 13:22 UTC
  To: Peter Lieven; +Cc: qemu-devel, kvm, Christoph Hellwig

On 18.05.2010 13:10, Peter Lieven wrote:
> Hi Kevin,
>
> here is the backtrace of (hopefully) all threads:
>
> ^C
> Program received signal SIGINT, Interrupt.
> [Switching to Thread 0x7f39b72656f0 (LWP 10695)]
> 0x00007f39b6c3ea94 in __lll_lock_wait () from /lib/libpthread.so.0
>
> (gdb) thread apply all bt
>
> Thread 2 (Thread 0x7f39b57b8950 (LWP 10698)):
> #0  0x00007f39b6c3eedb in read () from /lib/libpthread.so.0
> #1  0x000000000049e723 in qemu_laio_completion_cb (opaque=0x22b4010) at linux-aio.c:125
> #2  0x000000000049e8ad in laio_cancel (blockacb=0x22ba310) at linux-aio.c:184

I think it's stuck here in an endless loop:

    while (laiocb->ret == -EINPROGRESS)
        qemu_laio_completion_cb(laiocb->ctx);

Can you verify this by single-stepping one or two loop iterations? ret
and errno after the read call could be interesting, too.

We'll be stuck in an endless loop if the request doesn't complete,
which might well happen in your scenario.

Not sure what the right thing to do is. We probably need to fail the
bdrv_aio_cancel to avoid blocking the whole program, but I have no idea
what device emulations should do on that condition. As long as we can't
handle that condition correctly, leaving the hang in place is probably
the best option. Maybe add some sleep to avoid 100% CPU consumption.

Kevin
* [Qemu-devel] Re: qemu-kvm hangs if multipath device is queueing
  From: Christoph Hellwig @ 2010-05-19 7:29 UTC
  To: Kevin Wolf; +Cc: Peter Lieven, qemu-devel, kvm, Christoph Hellwig

On Tue, May 18, 2010 at 03:22:36PM +0200, Kevin Wolf wrote:
> I think it's stuck here in an endless loop:
>
>     while (laiocb->ret == -EINPROGRESS)
>         qemu_laio_completion_cb(laiocb->ctx);
>
> Can you verify this by single-stepping one or two loop iterations? ret
> and errno after the read call could be interesting, too.

Maybe the compiler is just too smart. Without some form of barrier it
could just optimize the loop away, as laiocb->ret couldn't change in a
normal single-threaded environment.
* [Qemu-devel] Re: qemu-kvm hangs if multipath device is queueing
  From: Kevin Wolf @ 2010-05-19 7:48 UTC
  To: Christoph Hellwig; +Cc: Peter Lieven, qemu-devel, kvm

On 19.05.2010 09:29, Christoph Hellwig wrote:
> On Tue, May 18, 2010 at 03:22:36PM +0200, Kevin Wolf wrote:
>> I think it's stuck here in an endless loop:
>>
>>     while (laiocb->ret == -EINPROGRESS)
>>         qemu_laio_completion_cb(laiocb->ctx);
>>
>> Can you verify this by single-stepping one or two loop iterations? ret
>> and errno after the read call could be interesting, too.
>
> Maybe the compiler is just too smart. Without some form of barrier it
> could just optimize the loop away, as laiocb->ret couldn't change in a
> normal single-threaded environment.

It probably could in theory, but in practice we're in a read() inside
qemu_laio_completion, so it didn't do it here.

Kevin
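For reference, the class of problem Christoph raises can be reduced to a
few lines. The sketch below is hypothetical and self-contained, not qemu
code, and it deliberately uses an empty loop body, which is where the
optimization genuinely applies; as Kevin points out, the real loop makes
an opaque call that blocks in read(), which already forces the compiler
to reload laiocb->ret.

    /* Hypothetical illustration of a polling loop that needs a barrier
     * or volatile access. Build: cc -o poll poll.c -lpthread */
    #include <pthread.h>
    #include <stdio.h>
    #include <unistd.h>

    #define IN_PROGRESS (-115)    /* numeric value of -EINPROGRESS on Linux */

    /* volatile forces a fresh load on every iteration; with a plain int
     * and no opaque call in the loop body, the compiler would be free
     * to load the value once and spin forever. */
    static volatile int ret = IN_PROGRESS;

    static void *completer(void *arg)
    {
        (void)arg;
        usleep(10000);            /* the request completes a little later */
        ret = 0;
        return NULL;
    }

    int main(void)
    {
        pthread_t t;
        pthread_create(&t, NULL, completer, NULL);
        while (ret == IN_PROGRESS) {
            ;                     /* poll, like the laio_cancel loop */
        }
        pthread_join(t, NULL);
        puts("request completed");
        return 0;
    }

Dropping the volatile qualifier here is what would permit the compiler to
turn the poll into an infinite loop even after the variable changes.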
* [Qemu-devel] Re: qemu-kvm hangs if multipath device is queueing
  From: Peter Lieven @ 2010-05-19 8:18 UTC
  To: Kevin Wolf; +Cc: Christoph Hellwig, kvm, qemu-devel

Kevin Wolf wrote:
> On 19.05.2010 09:29, Christoph Hellwig wrote:
>> On Tue, May 18, 2010 at 03:22:36PM +0200, Kevin Wolf wrote:
>>> I think it's stuck here in an endless loop:
>>>
>>>     while (laiocb->ret == -EINPROGRESS)
>>>         qemu_laio_completion_cb(laiocb->ctx);
>>>
>>> Can you verify this by single-stepping one or two loop iterations? ret
>>> and errno after the read call could be interesting, too.
>>
>> Maybe the compiler is just too smart. Without some form of barrier it
>> could just optimize the loop away, as laiocb->ret couldn't change in a
>> normal single-threaded environment.
>
> It probably could in theory, but in practice we're in a read() inside
> qemu_laio_completion, so it didn't do it here.

If you supply a patch that adds some usleeps at the point in question,
I'm willing to test whether it solves the 100% CPU problem.

> Kevin
* Re: [Qemu-devel] Re: qemu-kvm hangs if multipath device is queueing
  From: Peter Lieven @ 2010-05-23 10:30 UTC
  To: Peter Lieven; +Cc: Kevin Wolf, Christoph Hellwig, kvm, qemu-devel

On 19.05.2010 at 10:18, Peter Lieven wrote:
> Kevin Wolf wrote:
>> On 19.05.2010 09:29, Christoph Hellwig wrote:
>>> On Tue, May 18, 2010 at 03:22:36PM +0200, Kevin Wolf wrote:
>>>> I think it's stuck here in an endless loop:
>>>>
>>>>     while (laiocb->ret == -EINPROGRESS)
>>>>         qemu_laio_completion_cb(laiocb->ctx);
>>>>
>>>> Can you verify this by single-stepping one or two loop iterations? ret
>>>> and errno after the read call could be interesting, too.
>>>
>>> Maybe the compiler is just too smart. Without some form of barrier it
>>> could just optimize the loop away, as laiocb->ret couldn't change in a
>>> normal single-threaded environment.
>>
>> It probably could in theory, but in practice we're in a read() inside
>> qemu_laio_completion, so it didn't do it here.
>
> If you supply a patch that adds some usleeps at the point in question,
> I'm willing to test whether it solves the 100% CPU problem.

Can someone help here? What would be the best option to add some
usleeps?
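The thread shows no follow-up patch, but the change under discussion is
small. The following is a hedged sketch of the idea, not a tested patch:
it keeps the laio_cancel() polling loop quoted earlier but sleeps between
polls, trading cancellation latency for CPU time. The struct layout and
the prototype of qemu_laio_completion_cb() are simplified stand-ins for
the real qemu-kvm 0.12 definitions, and the 10 ms interval is an
arbitrary choice.

    /* Hedged sketch of the suggested change; only the loop shape is
     * taken from the thread, everything else is a stand-in. */
    #include <errno.h>
    #include <time.h>

    struct qemu_laio_state;                       /* opaque here */
    struct qemu_laiocb {
        int ret;                                  /* -EINPROGRESS while pending */
        struct qemu_laio_state *ctx;
    };

    /* Defined in linux-aio.c in the real tree; declared here only so the
     * sketch compiles on its own. */
    void qemu_laio_completion_cb(struct qemu_laio_state *s);

    static void laio_cancel_wait_sketch(struct qemu_laiocb *laiocb)
    {
        /* 10 ms between polls: long enough to stop burning a core,
         * short enough to keep cancellation latency low. */
        struct timespec ts = { 0, 10 * 1000 * 1000 };

        while (laiocb->ret == -EINPROGRESS) {
            qemu_laio_completion_cb(laiocb->ctx); /* drain completions */
            if (laiocb->ret == -EINPROGRESS) {
                nanosleep(&ts, NULL);             /* yield instead of spinning */
            }
        }
    }

Note that Kevin's caveat still applies: if the request never completes
because every path keeps queueing, this loop still never terminates. The
sleep only keeps the process from consuming 100% CPU while it waits; the
hang itself would need a failable bdrv_aio_cancel, as Kevin suggests.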
* Re: [Qemu-devel] Qemu-KVM 0.12.3 and Multipath -> Assertion
  From: André Weidemann @ 2010-05-08 9:53 UTC
  To: Kevin Wolf; +Cc: Peter Lieven, qemu-devel, kvm, Christoph Hellwig

Hi Kevin,

On 04.05.2010 14:20, Kevin Wolf wrote:
> On 04.05.2010 13:38, Peter Lieven wrote:
>> Hi Kevin,
>>
>> I set a breakpoint at bmdma_active_if. The first two breaks were hit
>> when the last path in the multipath failed, but the assertion did not
>> fail. When I kicked one path back in, the breakpoint was reached
>> again, this time leading to an assertion failure. The stacktrace is
>> from the point shortly before.
>>
>> Hope this helps.
>
> Hm, looks like there's something wrong with cancelling requests -
> bdrv_aio_cancel might decide that it completes a request (and
> consequently calls the callback for it) whereas the IDE emulation
> decides that it's done with the request before calling bdrv_aio_cancel.
>
> I haven't looked in much detail what this could break, but does
> something like this help?

Your attached patch fixes the problem I had as well. I ran three
consecutive tests tonight, which all finished without crashing the VM.
I reported my "assertion failed" error on March 14th while doing disk
performance tests using iozone in an Ubuntu 9.10 VM with qemu-kvm
0.12.3.

Thank you very much.
 André