* [Qemu-devel] [PATCH] ide: fix double free @ 2014-07-02 8:50 arei.gonglei 2014-07-02 9:04 ` Paolo Bonzini 0 siblings, 1 reply; 19+ messages in thread From: arei.gonglei @ 2014-07-02 8:50 UTC (permalink / raw) To: qemu-devel; +Cc: kwolf, Chenliang, weidong.huang, Gonglei, stefanha, pbonzini From: Chenliang <chenliang88@huawei.com> Qemu may double free when excutes command "reboot -f" in vm. One path is bdrv_aio_cancel(), the other is dma_bdrv_cb() callback prcocess. Signed-off-by: Chenliang <chenliang88@huawei.com> Signed-off-by: Gonglei <arei.gonglei@huawei.com> --- Qemu crash bt: Program received signal SIGABRT, Aborted. 0x00007f3cb2c76b55 in raise () from /lib64/libc.so.6 (gdb) bt #0 0x00007f3cb2c76b55 in raise () from /lib64/libc.so.6 #1 0x00007f3cb2c78131 in abort () from /lib64/libc.so.6 #2 0x00007f3cb2cb4e0f in __libc_message () from /lib64/libc.so.6 #3 0x00007f3cb2cba618 in malloc_printerr () from /lib64/libc.so.6 #4 0x00007f3cb2cbf65c in free () from /lib64/libc.so.6 #5 0x00007f3cb54146f2 in free_and_trace (mem=0x7f3cb63f3220) at vl.c:3078 #6 0x00007f3cb52126cd in qemu_aio_release (p=0x7f3cb63f3220) at block.c:4262 #7 0x00007f3cb525de5e in dma_complete (dbs=0x7f3cb63f3220, ret=0) at dma-helpers.c:135 #8 0x00007f3cb525df3d in dma_bdrv_cb (opaque=0x7f3cb63f3220, ret=0) at dma-helpers.c:152 #9 0x00007f3cb5212102 in bdrv_co_em_bh (opaque=0x7f3cb6398980) at block.c:4127 #10 0x00007f3cb51f6cef in aio_bh_poll (ctx=0x7f3cb622a8f0) at async.c:70 #11 0x00007f3cb51f695a in aio_poll (ctx=0x7f3cb622a8f0, blocking=false) at aio-posix.c:185 #12 0x00007f3cb51f7056 in aio_ctx_dispatch (source=0x7f3cb622a8f0, callback=0x0, user_data=0x0) at async.c:167 #13 0x00007f3cb48b969a in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0 #14 0x00007f3cb538956d in glib_pollfds_poll () at main-loop.c:188 #15 0x00007f3cb538965e in os_host_main_loop_wait (timeout=0) at main-loop.c:233 #16 0x00007f3cb5389718 in main_loop_wait (nonblocking=0) at main-loop.c:478 #17 0x00007f3cb5411212 in main_loop () at vl.c:2319 #18 0x00007f3cb54186ba in main (argc=57, argv=0x7fff5510c968, envp=0x7fff5510cb38) at vl.c:4803 --- hw/ide/core.c | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/hw/ide/core.c b/hw/ide/core.c index 3a38f1e..76d65c1 100644 --- a/hw/ide/core.c +++ b/hw/ide/core.c @@ -2050,11 +2050,9 @@ void ide_bus_reset(IDEBus *bus) /* pending async DMA */ if (bus->dma->aiocb) { -#ifdef DEBUG_AIO - printf("aio_cancel\n"); -#endif - bdrv_aio_cancel(bus->dma->aiocb); - bus->dma->aiocb = NULL; + bdrv_drain_all(); + bdrv_flush_all(); + assert(bus->dma->aiocb == NULL); } /* reset dma provider too */ -- 1.7.12.4 ^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [Qemu-devel] [PATCH] ide: fix double free 2014-07-02 8:50 [Qemu-devel] [PATCH] ide: fix double free arei.gonglei @ 2014-07-02 9:04 ` Paolo Bonzini 2014-07-02 9:24 ` ChenLiang 0 siblings, 1 reply; 19+ messages in thread From: Paolo Bonzini @ 2014-07-02 9:04 UTC (permalink / raw) To: arei.gonglei, qemu-devel; +Cc: kwolf, Chenliang, weidong.huang, stefanha Il 02/07/2014 10:50, arei.gonglei@huawei.com ha scritto: > if (bus->dma->aiocb) { > -#ifdef DEBUG_AIO > - printf("aio_cancel\n"); > -#endif > - bdrv_aio_cancel(bus->dma->aiocb); > - bus->dma->aiocb = NULL; > + bdrv_drain_all(); > + bdrv_flush_all(); > + assert(bus->dma->aiocb == NULL); > } This is definitely a heavyweight solution, and in fact the bug should not be there in the first place. See dma_complete: static void dma_complete(DMAAIOCB *dbs, int ret) { trace_dma_complete(dbs, ret, dbs->common.cb); dma_bdrv_unmap(dbs); if (dbs->common.cb) { dbs->common.cb(dbs->common.opaque, ret); } qemu_iovec_destroy(&dbs->iov); if (dbs->bh) { qemu_bh_delete(dbs->bh); dbs->bh = NULL; } if (!dbs->in_cancel) { /* Requests may complete while dma_aio_cancel is in progress. In * this case, the AIOCB should not be released because it is still * referenced by dma_aio_cancel. */ qemu_aio_release(dbs); } } Perhaps something like this? diff --git a/dma-helpers.c b/dma-helpers.c index 53cbe92..21b70d12 100644 --- a/dma-helpers.c +++ b/dma-helpers.c @@ -181,15 +181,15 @@ static void dma_aio_cancel(BlockDriverAIOCB *acb) trace_dma_aio_cancel(dbs); + dbs->in_cancel = true; if (dbs->acb) { BlockDriverAIOCB *acb = dbs->acb; dbs->acb = NULL; - dbs->in_cancel = true; bdrv_aio_cancel(acb); - dbs->in_cancel = false; } dbs->common.cb = NULL; dma_complete(dbs, 0); + qemu_aio_release(dbs); } static const AIOCBInfo dma_aiocb_info = { ^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [Qemu-devel] [PATCH] ide: fix double free 2014-07-02 9:04 ` Paolo Bonzini @ 2014-07-02 9:24 ` ChenLiang 2014-07-02 9:25 ` Paolo Bonzini 0 siblings, 1 reply; 19+ messages in thread From: ChenLiang @ 2014-07-02 9:24 UTC (permalink / raw) To: Paolo Bonzini; +Cc: kwolf, arei.gonglei, weidong.huang, qemu-devel, stefanha On 2014/7/2 17:04, Paolo Bonzini wrote: > This is definitely a heavyweight solution, and in fact the bug should > not be there in the first place. See dma_complete: > > static void dma_complete(DMAAIOCB *dbs, int ret) > { > trace_dma_complete(dbs, ret, dbs->common.cb); > > dma_bdrv_unmap(dbs); > if (dbs->common.cb) { > dbs->common.cb(dbs->common.opaque, ret); > } > qemu_iovec_destroy(&dbs->iov); > if (dbs->bh) { > qemu_bh_delete(dbs->bh); > dbs->bh = NULL; > } > if (!dbs->in_cancel) { > /* Requests may complete while dma_aio_cancel is in progress. In > * this case, the AIOCB should not be released because it is still > * referenced by dma_aio_cancel. */ > qemu_aio_release(dbs); > } > } > > Perhaps something like this? > > diff --git a/dma-helpers.c b/dma-helpers.c > index 53cbe92..21b70d12 100644 > --- a/dma-helpers.c > +++ b/dma-helpers.c > @@ -181,15 +181,15 @@ static void dma_aio_cancel(BlockDriverAIOCB *acb) > > trace_dma_aio_cancel(dbs); > > + dbs->in_cancel = true; > if (dbs->acb) { > BlockDriverAIOCB *acb = dbs->acb; > dbs->acb = NULL; > - dbs->in_cancel = true; > bdrv_aio_cancel(acb); > - dbs->in_cancel = false; > } > dbs->common.cb = NULL; > dma_complete(dbs, 0); > + qemu_aio_release(dbs); > } > > static const AIOCBInfo dma_aiocb_info = { > > > . > acked, thanks Best regards Chenliang ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Qemu-devel] [PATCH] ide: fix double free 2014-07-02 9:24 ` ChenLiang @ 2014-07-02 9:25 ` Paolo Bonzini 2014-07-02 9:46 ` Gonglei (Arei) 0 siblings, 1 reply; 19+ messages in thread From: Paolo Bonzini @ 2014-07-02 9:25 UTC (permalink / raw) To: ChenLiang; +Cc: kwolf, arei.gonglei, weidong.huang, qemu-devel, stefanha Il 02/07/2014 11:24, ChenLiang ha scritto: > On 2014/7/2 17:04, Paolo Bonzini wrote: > >> This is definitely a heavyweight solution, and in fact the bug should >> not be there in the first place. See dma_complete: >> >> static void dma_complete(DMAAIOCB *dbs, int ret) >> { >> trace_dma_complete(dbs, ret, dbs->common.cb); >> >> dma_bdrv_unmap(dbs); >> if (dbs->common.cb) { >> dbs->common.cb(dbs->common.opaque, ret); >> } >> qemu_iovec_destroy(&dbs->iov); >> if (dbs->bh) { >> qemu_bh_delete(dbs->bh); >> dbs->bh = NULL; >> } >> if (!dbs->in_cancel) { >> /* Requests may complete while dma_aio_cancel is in progress. In >> * this case, the AIOCB should not be released because it is still >> * referenced by dma_aio_cancel. */ >> qemu_aio_release(dbs); >> } >> } >> >> Perhaps something like this? >> >> diff --git a/dma-helpers.c b/dma-helpers.c >> index 53cbe92..21b70d12 100644 >> --- a/dma-helpers.c >> +++ b/dma-helpers.c >> @@ -181,15 +181,15 @@ static void dma_aio_cancel(BlockDriverAIOCB *acb) >> >> trace_dma_aio_cancel(dbs); >> >> + dbs->in_cancel = true; >> if (dbs->acb) { >> BlockDriverAIOCB *acb = dbs->acb; >> dbs->acb = NULL; >> - dbs->in_cancel = true; >> bdrv_aio_cancel(acb); >> - dbs->in_cancel = false; >> } >> dbs->common.cb = NULL; >> dma_complete(dbs, 0); >> + qemu_aio_release(dbs); >> } >> >> static const AIOCBInfo dma_aiocb_info = { >> >> >> . >> > > acked, thanks Did you test this? :) What is the testcase? Paolo ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Qemu-devel] [PATCH] ide: fix double free 2014-07-02 9:25 ` Paolo Bonzini @ 2014-07-02 9:46 ` Gonglei (Arei) 2014-07-02 10:16 ` Paolo Bonzini 0 siblings, 1 reply; 19+ messages in thread From: Gonglei (Arei) @ 2014-07-02 9:46 UTC (permalink / raw) To: Paolo Bonzini, chenliang (T) Cc: kwolf@redhat.com, Huangweidong (C), qemu-devel@nongnu.org, stefanha@redhat.com > -----Original Message----- > From: Paolo Bonzini [mailto:pbonzini@redhat.com] > Sent: Wednesday, July 02, 2014 5:26 PM > To: chenliang (T) > Cc: Gonglei (Arei); qemu-devel@nongnu.org; kwolf@redhat.com; > stefanha@redhat.com; Huangweidong (C) > Subject: Re: [PATCH] ide: fix double free > > Il 02/07/2014 11:24, ChenLiang ha scritto: > > On 2014/7/2 17:04, Paolo Bonzini wrote: > > > >> This is definitely a heavyweight solution, and in fact the bug should > >> not be there in the first place. See dma_complete: > >> > >> static void dma_complete(DMAAIOCB *dbs, int ret) > >> { > >> trace_dma_complete(dbs, ret, dbs->common.cb); > >> > >> dma_bdrv_unmap(dbs); > >> if (dbs->common.cb) { > >> dbs->common.cb(dbs->common.opaque, ret); > >> } > >> qemu_iovec_destroy(&dbs->iov); > >> if (dbs->bh) { > >> qemu_bh_delete(dbs->bh); > >> dbs->bh = NULL; > >> } > >> if (!dbs->in_cancel) { > >> /* Requests may complete while dma_aio_cancel is in progress. > In > >> * this case, the AIOCB should not be released because it is still > >> * referenced by dma_aio_cancel. */ > >> qemu_aio_release(dbs); > >> } > >> } > >> > >> Perhaps something like this? > >> > >> diff --git a/dma-helpers.c b/dma-helpers.c > >> index 53cbe92..21b70d12 100644 > >> --- a/dma-helpers.c > >> +++ b/dma-helpers.c > >> @@ -181,15 +181,15 @@ static void dma_aio_cancel(BlockDriverAIOCB > *acb) > >> > >> trace_dma_aio_cancel(dbs); > >> > >> + dbs->in_cancel = true; > >> if (dbs->acb) { > >> BlockDriverAIOCB *acb = dbs->acb; > >> dbs->acb = NULL; > >> - dbs->in_cancel = true; > >> bdrv_aio_cancel(acb); > >> - dbs->in_cancel = false; > >> } > >> dbs->common.cb = NULL; > >> dma_complete(dbs, 0); > >> + qemu_aio_release(dbs); > >> } > >> > >> static const AIOCBInfo dma_aiocb_info = { > >> > >> > >> . > >> > > > > acked, thanks > > Did you test this? :) What is the testcase? > > Paolo Tested-by: Gonglei <arei.gonglei@huawei.com> Hi, Paolo. We have tested your above patch, and it works well for us. The qemu command line list as bellow, which started by libvirt : /home/qemu/x86_64-softmmu/qemu-system-x86_64 -name test -S -machine pc-i440fx-1.5,accel=kvm,usb=off -m 4096 -realtime mlock=off -smp 16,sockets=16,cores=1,threads=1 -uuid abbbd27f-b83c-4c3b-8add-98df31eea0b7 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/test.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device lsi,id=scsi0,bus=pci.0,addr=0x5 -drive file=/tmp/qemu-crash/linux_x86_64.iso,if=none,id=drive-ide0-0-0,readonly=on,format=raw,cache=none,aio=native -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -drive file=/home/test.img,if=none,id=drive-ide0-0-1,format=raw,cache=none,aio=native -device ide-hd,bus=ide.0,unit=1,drive=drive-ide0-0-1,id=ide0-0-1 -netdev tap,fd=25,id=hostnet0 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:d9:3a:73,bus=pci.0,addr=0x3 -device usb-tablet,id=input0 -vnc 0.0.0.0:0 -device cirrus-vga,id=video0,vgamem_mb=9,bus=pci.0,addr=0x2 We setup a linux guest os on an IDE disk, named "test.img", when the guest os Finishe the process of install, then I execute "reboot -f" in the VM immediately, And then will crash the qemu. Best regards, -Gonglei ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Qemu-devel] [PATCH] ide: fix double free 2014-07-02 9:46 ` Gonglei (Arei) @ 2014-07-02 10:16 ` Paolo Bonzini 2014-07-02 11:12 ` ChenLiang 2014-07-02 11:33 ` ChenLiang 0 siblings, 2 replies; 19+ messages in thread From: Paolo Bonzini @ 2014-07-02 10:16 UTC (permalink / raw) To: Gonglei (Arei), chenliang (T) Cc: kwolf@redhat.com, Huangweidong (C), qemu-devel@nongnu.org, stefanha@redhat.com Il 02/07/2014 11:46, Gonglei (Arei) ha scritto: > Hi, Paolo. We have tested your above patch, and it works well for us. I'm still not sure where the fix is. I jotted the patch quickly, but I'd rather understand it better before submitting it. Here is it again: --- a/dma-helpers.c +++ b/dma-helpers.c @@ -181,15 +181,15 @@ static void dma_aio_cancel(BlockDriverAIOCB *acb) trace_dma_aio_cancel(dbs); + dbs->in_cancel = true; if (dbs->acb) { BlockDriverAIOCB *acb = dbs->acb; dbs->acb = NULL; - dbs->in_cancel = true; bdrv_aio_cancel(acb); - dbs->in_cancel = false; } dbs->common.cb = NULL; dma_complete(dbs, 0); + qemu_aio_release(dbs); } static const AIOCBInfo dma_aiocb_info = { and here is dma_complete: static void dma_complete(DMAAIOCB *dbs, int ret) { trace_dma_complete(dbs, ret, dbs->common.cb); dma_bdrv_unmap(dbs); if (dbs->common.cb) { dbs->common.cb(dbs->common.opaque, ret); } qemu_iovec_destroy(&dbs->iov); if (dbs->bh) { qemu_bh_delete(dbs->bh); dbs->bh = NULL; } if (!dbs->in_cancel) { /* Requests may complete while dma_aio_cancel is in progress. * this case, the AIOCB should not be released because it is * still referenced by dma_aio_cancel. */ qemu_aio_release(dbs); } } The qemu_aio_release call obviously happened during the second dma_complete call. However, the second dma_complete call does not call the callback, and even if dma_bdrv_unmap resulted in a call to continue_after_map_failure, the bottom half would be cancelled immediately. So we changed from: dbs->in_cancel = false; /* dma_complete calls dma_bdrv_unmap */ for (i = 0; i < dbs->iov.niov; ++i) { /* might schedule dbs->bh */ dma_memory_unmap(dbs->sg->as, dbs->iov.iov[i].iov_base, dbs->iov.iov[i].iov_len, dbs->dir, dbs->iov.iov[i].iov_len); } qemu_iovec_reset(&dbs->iov); /* back to dma_complete, dbs->common.cb == NULL */ qemu_iovec_destroy(&dbs->iov); if (dbs->bh) { /* dbs->bh now unscheduled, so it will never run */ qemu_bh_delete(dbs->bh); dbs->bh = NULL; } /* dbs->in_cancel is false here. */ if (!dbs->in_cancel) { /* Requests may complete while dma_aio_cancel is in progress. * this case, the AIOCB should not be released because it is * still referenced by dma_aio_cancel. */ qemu_aio_release(dbs); } to dbs->in_cancel = false; /* dma_complete calls dma_bdrv_unmap */ for (i = 0; i < dbs->iov.niov; ++i) { /* might schedule dbs->bh */ dma_memory_unmap(dbs->sg->as, dbs->iov.iov[i].iov_base, dbs->iov.iov[i].iov_len, dbs->dir, dbs->iov.iov[i].iov_len); } qemu_iovec_reset(&dbs->iov); /* back to dma_complete, dbs->common.cb == NULL */ qemu_iovec_destroy(&dbs->iov); if (dbs->bh) { /* dbs->bh now unscheduled, so it will never run */ qemu_bh_delete(dbs->bh); dbs->bh = NULL; } /* after the patch dbs->in_cancel is true, the if never triggers */ if (!dbs->in_cancel) { /* Requests may complete while dma_aio_cancel is in progress. * this case, the AIOCB should not be released because it is * still referenced by dma_aio_cancel. */ qemu_aio_release(dbs); } /* back to dma_aio_cancel */ qemu_aio_release(dbs); This doesn't make sense to me. Can you find out where I'm wrong? Paolo ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Qemu-devel] [PATCH] ide: fix double free 2014-07-02 10:16 ` Paolo Bonzini @ 2014-07-02 11:12 ` ChenLiang 2014-07-02 11:24 ` Paolo Bonzini 2014-07-02 11:33 ` ChenLiang 1 sibling, 1 reply; 19+ messages in thread From: ChenLiang @ 2014-07-02 11:12 UTC (permalink / raw) To: Paolo Bonzini Cc: kwolf@redhat.com, Gonglei (Arei), Huangweidong (C), qemu-devel@nongnu.org, stefanha@redhat.com On 2014/7/2 18:16, Paolo Bonzini wrote: > Il 02/07/2014 11:46, Gonglei (Arei) ha scritto: >> Hi, Paolo. We have tested your above patch, and it works well for us. > > I'm still not sure where the fix is. I jotted the patch quickly, but I'd rather understand it better before submitting it. Here is it again: > > --- a/dma-helpers.c > +++ b/dma-helpers.c > @@ -181,15 +181,15 @@ static void dma_aio_cancel(BlockDriverAIOCB *acb) > trace_dma_aio_cancel(dbs); > > + dbs->in_cancel = true; > if (dbs->acb) { > BlockDriverAIOCB *acb = dbs->acb; > dbs->acb = NULL; > - dbs->in_cancel = true; > bdrv_aio_cancel(acb); > - dbs->in_cancel = false; > } > dbs->common.cb = NULL; > dma_complete(dbs, 0); > + qemu_aio_release(dbs); > } > > static const AIOCBInfo dma_aiocb_info = { > > and here is dma_complete: > > static void dma_complete(DMAAIOCB *dbs, int ret) > { > trace_dma_complete(dbs, ret, dbs->common.cb); > dma_bdrv_unmap(dbs); > if (dbs->common.cb) { > dbs->common.cb(dbs->common.opaque, ret); > } > qemu_iovec_destroy(&dbs->iov); > if (dbs->bh) { > qemu_bh_delete(dbs->bh); > dbs->bh = NULL; > } > if (!dbs->in_cancel) { > /* Requests may complete while dma_aio_cancel is in progress. > * this case, the AIOCB should not be released because it is > * still referenced by dma_aio_cancel. */ > qemu_aio_release(dbs); > } > } > > The qemu_aio_release call obviously happened during the second dma_complete call. > > However, the second dma_complete call does not call the callback, and even if dma_bdrv_unmap resulted in a call to continue_after_map_failure, the bottom half would be cancelled immediately. So we changed from: > > dbs->in_cancel = false; > > /* dma_complete calls dma_bdrv_unmap */ > for (i = 0; i < dbs->iov.niov; ++i) { > /* might schedule dbs->bh */ > dma_memory_unmap(dbs->sg->as, dbs->iov.iov[i].iov_base, > dbs->iov.iov[i].iov_len, dbs->dir, > dbs->iov.iov[i].iov_len); > } > qemu_iovec_reset(&dbs->iov); > > /* back to dma_complete, dbs->common.cb == NULL */ > qemu_iovec_destroy(&dbs->iov); > if (dbs->bh) { > /* dbs->bh now unscheduled, so it will never run */ > qemu_bh_delete(dbs->bh); > dbs->bh = NULL; > } > /* dbs->in_cancel is false here. */ > if (!dbs->in_cancel) { > /* Requests may complete while dma_aio_cancel is in progress. > * this case, the AIOCB should not be released because it is > * still referenced by dma_aio_cancel. */ > qemu_aio_release(dbs); > } > > to > > dbs->in_cancel = false; > > /* dma_complete calls dma_bdrv_unmap */ > for (i = 0; i < dbs->iov.niov; ++i) { > /* might schedule dbs->bh */ > dma_memory_unmap(dbs->sg->as, dbs->iov.iov[i].iov_base, > dbs->iov.iov[i].iov_len, dbs->dir, > dbs->iov.iov[i].iov_len); > } > qemu_iovec_reset(&dbs->iov); > > /* back to dma_complete, dbs->common.cb == NULL */ > qemu_iovec_destroy(&dbs->iov); > if (dbs->bh) { > /* dbs->bh now unscheduled, so it will never run */ > qemu_bh_delete(dbs->bh); > dbs->bh = NULL; > } > > /* after the patch dbs->in_cancel is true, the if never triggers */ > if (!dbs->in_cancel) { > /* Requests may complete while dma_aio_cancel is in progress. > * this case, the AIOCB should not be released because it is > * still referenced by dma_aio_cancel. */ > qemu_aio_release(dbs); > } > > /* back to dma_aio_cancel */ > qemu_aio_release(dbs); > > This doesn't make sense to me. Can you find out where I'm wrong? > > Paolo > > . > This patch avoid freeing dbs by dma_complete when dma_aio_cancel is running. Because dma_complete also will be called by dma_bdrv_cb. So double free will never happen. Chenliang ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Qemu-devel] [PATCH] ide: fix double free 2014-07-02 11:12 ` ChenLiang @ 2014-07-02 11:24 ` Paolo Bonzini 0 siblings, 0 replies; 19+ messages in thread From: Paolo Bonzini @ 2014-07-02 11:24 UTC (permalink / raw) To: ChenLiang Cc: kwolf@redhat.com, Gonglei (Arei), Huangweidong (C), qemu-devel@nongnu.org, stefanha@redhat.com Il 02/07/2014 13:12, ChenLiang ha scritto: > This patch avoid freeing dbs by dma_complete when dma_aio_cancel is running. > Because dma_complete also will be called by dma_bdrv_cb. So double free > will never happen. Yes, you already said that. But I'm not sure _why_ the patch avoids freeing dbs, especially since the patch adds a free in dma_aio_cancel itself. My patch shouldn't have any effect. Where is my analysis of it wrong? Paolo ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Qemu-devel] [PATCH] ide: fix double free 2014-07-02 10:16 ` Paolo Bonzini 2014-07-02 11:12 ` ChenLiang @ 2014-07-02 11:33 ` ChenLiang 2014-07-02 11:40 ` Paolo Bonzini 1 sibling, 1 reply; 19+ messages in thread From: ChenLiang @ 2014-07-02 11:33 UTC (permalink / raw) To: Paolo Bonzini Cc: kwolf@redhat.com, Gonglei (Arei), Huangweidong (C), qemu-devel@nongnu.org, stefanha@redhat.com On 2014/7/2 18:16, Paolo Bonzini wrote: > Il 02/07/2014 11:46, Gonglei (Arei) ha scritto: >> Hi, Paolo. We have tested your above patch, and it works well for us. > > I'm still not sure where the fix is. I jotted the patch quickly, but I'd rather understand it better before submitting it. Here is it again: > > --- a/dma-helpers.c > +++ b/dma-helpers.c > @@ -181,15 +181,15 @@ static void dma_aio_cancel(BlockDriverAIOCB *acb) > trace_dma_aio_cancel(dbs); > > + dbs->in_cancel = true; > if (dbs->acb) { > BlockDriverAIOCB *acb = dbs->acb; > dbs->acb = NULL; > - dbs->in_cancel = true; > bdrv_aio_cancel(acb); > - dbs->in_cancel = false; > } > dbs->common.cb = NULL; > dma_complete(dbs, 0); > + qemu_aio_release(dbs); > } > Hmm, dbs->in_cancel will be true always. Although this will avoid freeing dbs by dma_comlete. But it maybe a mistake. > static const AIOCBInfo dma_aiocb_info = { > > and here is dma_complete: > > static void dma_complete(DMAAIOCB *dbs, int ret) > { > trace_dma_complete(dbs, ret, dbs->common.cb); > dma_bdrv_unmap(dbs); > if (dbs->common.cb) { > dbs->common.cb(dbs->common.opaque, ret); > } > qemu_iovec_destroy(&dbs->iov); > if (dbs->bh) { > qemu_bh_delete(dbs->bh); > dbs->bh = NULL; > } > if (!dbs->in_cancel) { > /* Requests may complete while dma_aio_cancel is in progress. > * this case, the AIOCB should not be released because it is > * still referenced by dma_aio_cancel. */ > qemu_aio_release(dbs); > } > } > > The qemu_aio_release call obviously happened during the second dma_complete call. > > However, the second dma_complete call does not call the callback, and even if dma_bdrv_unmap resulted in a call to continue_after_map_failure, the bottom half would be cancelled immediately. So we changed from: > > dbs->in_cancel = false; > > /* dma_complete calls dma_bdrv_unmap */ > for (i = 0; i < dbs->iov.niov; ++i) { > /* might schedule dbs->bh */ > dma_memory_unmap(dbs->sg->as, dbs->iov.iov[i].iov_base, > dbs->iov.iov[i].iov_len, dbs->dir, > dbs->iov.iov[i].iov_len); > } > qemu_iovec_reset(&dbs->iov); > > /* back to dma_complete, dbs->common.cb == NULL */ > qemu_iovec_destroy(&dbs->iov); > if (dbs->bh) { > /* dbs->bh now unscheduled, so it will never run */ > qemu_bh_delete(dbs->bh); > dbs->bh = NULL; > } > /* dbs->in_cancel is false here. */ > if (!dbs->in_cancel) { > /* Requests may complete while dma_aio_cancel is in progress. > * this case, the AIOCB should not be released because it is > * still referenced by dma_aio_cancel. */ > qemu_aio_release(dbs); > } > > to > > dbs->in_cancel = false; > > /* dma_complete calls dma_bdrv_unmap */ > for (i = 0; i < dbs->iov.niov; ++i) { > /* might schedule dbs->bh */ > dma_memory_unmap(dbs->sg->as, dbs->iov.iov[i].iov_base, > dbs->iov.iov[i].iov_len, dbs->dir, > dbs->iov.iov[i].iov_len); > } > qemu_iovec_reset(&dbs->iov); > > /* back to dma_complete, dbs->common.cb == NULL */ > qemu_iovec_destroy(&dbs->iov); > if (dbs->bh) { > /* dbs->bh now unscheduled, so it will never run */ > qemu_bh_delete(dbs->bh); > dbs->bh = NULL; > } > > /* after the patch dbs->in_cancel is true, the if never triggers */ > if (!dbs->in_cancel) { > /* Requests may complete while dma_aio_cancel is in progress. > * this case, the AIOCB should not be released because it is > * still referenced by dma_aio_cancel. */ > qemu_aio_release(dbs); > } > > /* back to dma_aio_cancel */ > qemu_aio_release(dbs); > > This doesn't make sense to me. Can you find out where I'm wrong? > > Paolo > > . > ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Qemu-devel] [PATCH] ide: fix double free 2014-07-02 11:33 ` ChenLiang @ 2014-07-02 11:40 ` Paolo Bonzini 2014-07-02 11:57 ` ChenLiang 0 siblings, 1 reply; 19+ messages in thread From: Paolo Bonzini @ 2014-07-02 11:40 UTC (permalink / raw) To: ChenLiang Cc: kwolf@redhat.com, Gonglei (Arei), Huangweidong (C), qemu-devel@nongnu.org, stefanha@redhat.com Il 02/07/2014 13:33, ChenLiang ha scritto: > On 2014/7/2 18:16, Paolo Bonzini wrote: > >> Il 02/07/2014 11:46, Gonglei (Arei) ha scritto: >>> Hi, Paolo. We have tested your above patch, and it works well for us. >> >> I'm still not sure where the fix is. I jotted the patch quickly, but I'd rather understand it better before submitting it. Here is it again: >> >> --- a/dma-helpers.c >> +++ b/dma-helpers.c >> @@ -181,15 +181,15 @@ static void dma_aio_cancel(BlockDriverAIOCB *acb) >> trace_dma_aio_cancel(dbs); >> >> + dbs->in_cancel = true; >> if (dbs->acb) { >> BlockDriverAIOCB *acb = dbs->acb; >> dbs->acb = NULL; >> - dbs->in_cancel = true; >> bdrv_aio_cancel(acb); >> - dbs->in_cancel = false; >> } >> dbs->common.cb = NULL; >> dma_complete(dbs, 0); >> + qemu_aio_release(dbs); >> } >> > > > Hmm, dbs->in_cancel will be true always. Although this will avoid freeing dbs by dma_comlete. > But it maybe a mistake. This was on purpose; I'm doing the free myself in dma_aio_cancel, so I wanted to avoid the qemu_aio_release from dma_complete. This was in case of a recursive call to dma_complete. But I don't see how that recursive call could happen outside the "if (dbs->acb)"; and inside the "if" the protection is there already. Can you gather the backtraces for _both_ calls to qemu_aio_release, rather than just the second? With what guest are you encountering the bug? Paolo ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Qemu-devel] [PATCH] ide: fix double free 2014-07-02 11:40 ` Paolo Bonzini @ 2014-07-02 11:57 ` ChenLiang 2014-07-02 12:19 ` Paolo Bonzini 0 siblings, 1 reply; 19+ messages in thread From: ChenLiang @ 2014-07-02 11:57 UTC (permalink / raw) To: Paolo Bonzini Cc: kwolf@redhat.com, Gonglei (Arei), Huangweidong (C), qemu-devel@nongnu.org, stefanha@redhat.com On 2014/7/2 19:40, Paolo Bonzini wrote: > Il 02/07/2014 13:33, ChenLiang ha scritto: >> On 2014/7/2 18:16, Paolo Bonzini wrote: >> >>> Il 02/07/2014 11:46, Gonglei (Arei) ha scritto: >>>> Hi, Paolo. We have tested your above patch, and it works well for us. >>> >>> I'm still not sure where the fix is. I jotted the patch quickly, but I'd rather understand it better before submitting it. Here is it again: >>> >>> --- a/dma-helpers.c >>> +++ b/dma-helpers.c >>> @@ -181,15 +181,15 @@ static void dma_aio_cancel(BlockDriverAIOCB *acb) >>> trace_dma_aio_cancel(dbs); >>> >>> + dbs->in_cancel = true; >>> if (dbs->acb) { >>> BlockDriverAIOCB *acb = dbs->acb; >>> dbs->acb = NULL; >>> - dbs->in_cancel = true; >>> bdrv_aio_cancel(acb); >>> - dbs->in_cancel = false; >>> } >>> dbs->common.cb = NULL; >>> dma_complete(dbs, 0); >>> + qemu_aio_release(dbs); >>> } >>> >> >> >> Hmm, dbs->in_cancel will be true always. Although this will avoid freeing dbs by dma_comlete. >> But it maybe a mistake. > > This was on purpose; I'm doing the free myself in dma_aio_cancel, so I wanted to avoid the qemu_aio_release from dma_complete. This was in case of a recursive call to dma_complete. But I don't see how that recursive call could happen outside the "if (dbs->acb)"; and inside the "if" the protection is there already. > > Can you gather the backtraces for _both_ calls to qemu_aio_release, rather than just the second? > > With what guest are you encountering the bug? > > Paolo > > > . > (gdb) bt #0 qemu_aio_release (p=0x7f44788d1290) at block.c:4260 #1 0x00007f4477494e5e in dma_complete (dbs=0x7f44788d1290, ret=0) at dma-helpers.c:135 #2 0x00007f44774952c2 in dma_aio_cancel (acb=0x7f44788d1290) at dma-helpers.c:195 #3 0x00007f447744825b in bdrv_aio_cancel (acb=0x7f44788d1290) at block.c:3848 #4 0x00007f4477513911 in ide_bus_reset (bus=0x7f44785f1bd8) at hw/ide/core.c:1957 #5 0x00007f4477516b3c in piix3_reset (opaque=0x7f44785f1530) at hw/ide/piix.c:113 #6 0x00007f4477647b9f in qemu_devices_reset () at vl.c:2131 #7 0x00007f4477647c0f in qemu_system_reset (report=true) at vl.c:2140 #8 0x00007f4477648127 in main_loop_should_exit () at vl.c:2274 #9 0x00007f447764823a in main_loop () at vl.c:2323 #10 0x00007f447764f6da in main (argc=57, argv=0x7fff5d194378, envp=0x7fff5d194548) at vl.c:4803 Chenliang is ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Qemu-devel] [PATCH] ide: fix double free 2014-07-02 11:57 ` ChenLiang @ 2014-07-02 12:19 ` Paolo Bonzini 2014-07-02 12:46 ` 陈梁 ` (2 more replies) 0 siblings, 3 replies; 19+ messages in thread From: Paolo Bonzini @ 2014-07-02 12:19 UTC (permalink / raw) To: ChenLiang Cc: kwolf@redhat.com, Gonglei (Arei), Huangweidong (C), qemu-devel@nongnu.org, stefanha@redhat.com Il 02/07/2014 13:57, ChenLiang ha scritto: >>> Hmm, dbs->in_cancel will be true always. Although this will avoid freeing dbs by dma_comlete. >>> But it maybe a mistake. >> >> This was on purpose; I'm doing the free myself in dma_aio_cancel, so I wanted to avoid the qemu_aio_release from dma_complete. This was in case of a recursive call to dma_complete. But I don't see how that recursive call could happen outside the "if (dbs->acb)"; and inside the "if" the protection is there already. >> >> Can you gather the backtraces for _both_ calls to qemu_aio_release, rather than just the second? > > (gdb) bt > #0 qemu_aio_release (p=0x7f44788d1290) at block.c:4260 > #1 0x00007f4477494e5e in dma_complete (dbs=0x7f44788d1290, ret=0) at dma-helpers.c:135 > #2 0x00007f44774952c2 in dma_aio_cancel (acb=0x7f44788d1290) at dma-helpers.c:195 > #3 0x00007f447744825b in bdrv_aio_cancel (acb=0x7f44788d1290) at block.c:3848 > #4 0x00007f4477513911 in ide_bus_reset (bus=0x7f44785f1bd8) at hw/ide/core.c:1957 > #5 0x00007f4477516b3c in piix3_reset (opaque=0x7f44785f1530) at hw/ide/piix.c:113 > #6 0x00007f4477647b9f in qemu_devices_reset () at vl.c:2131 > #7 0x00007f4477647c0f in qemu_system_reset (report=true) at vl.c:2140 > #8 0x00007f4477648127 in main_loop_should_exit () at vl.c:2274 > #9 0x00007f447764823a in main_loop () at vl.c:2323 > #10 0x00007f447764f6da in main (argc=57, argv=0x7fff5d194378, envp=0x7fff5d194548) at vl.c:4803 And the second is #7 0x00007f3cb525de5e in dma_complete (dbs=0x7f3cb63f3220, ret=0) at dma-helpers.c:135 #8 0x00007f3cb525df3d in dma_bdrv_cb (opaque=0x7f3cb63f3220, ret=0) at dma-helpers.c:152 #9 0x00007f3cb5212102 in bdrv_co_em_bh (opaque=0x7f3cb6398980) at block.c:4127 #10 0x00007f3cb51f6cef in aio_bh_poll (ctx=0x7f3cb622a8f0) at async.c:70 #11 0x00007f3cb51f695a in aio_poll (ctx=0x7f3cb622a8f0, blocking=false) at aio-posix.c:185 #12 0x00007f3cb51f7056 in aio_ctx_dispatch (source=0x7f3cb622a8f0, callback=0x0, user_data=0x0) at async.c:167 #13 0x00007f3cb48b969a in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0 This explains why my patch "fixes" the bug. It turns a double free into a dangling pointer: the second call now sees in_cancel == true and skips the free. The second call should have happened within dma_aio_cancel's call to bdrv_aio_cancel. This is the real bug. What is your version of QEMU? I cannot see any where bdrv_co_em_bh is at line 4127 or bdrv_aio_cancel is at line 3848. Can you reproduce it with qemu.git master? Paolo ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Qemu-devel] [PATCH] ide: fix double free 2014-07-02 12:19 ` Paolo Bonzini @ 2014-07-02 12:46 ` 陈梁 2014-07-02 12:54 ` 陈梁 2014-07-03 2:23 ` ChenLiang 2 siblings, 0 replies; 19+ messages in thread From: 陈梁 @ 2014-07-02 12:46 UTC (permalink / raw) To: Paolo Bonzini Cc: kwolf@redhat.com, ChenLiang, Huangweidong (C), Gonglei (Arei), qemu-devel@nongnu.org, 陈梁, stefanha@redhat.com > Il 02/07/2014 13:57, ChenLiang ha scritto: >>>> Hmm, dbs->in_cancel will be true always. Although this will avoid freeing dbs by dma_comlete. >>>> But it maybe a mistake. >>> >>> This was on purpose; I'm doing the free myself in dma_aio_cancel, so I wanted to avoid the qemu_aio_release from dma_complete. This was in case of a recursive call to dma_complete. But I don't see how that recursive call could happen outside the "if (dbs->acb)"; and inside the "if" the protection is there already. >>> >>> Can you gather the backtraces for _both_ calls to qemu_aio_release, rather than just the second? >> >> (gdb) bt >> #0 qemu_aio_release (p=0x7f44788d1290) at block.c:4260 >> #1 0x00007f4477494e5e in dma_complete (dbs=0x7f44788d1290, ret=0) at dma-helpers.c:135 >> #2 0x00007f44774952c2 in dma_aio_cancel (acb=0x7f44788d1290) at dma-helpers.c:195 >> #3 0x00007f447744825b in bdrv_aio_cancel (acb=0x7f44788d1290) at block.c:3848 >> #4 0x00007f4477513911 in ide_bus_reset (bus=0x7f44785f1bd8) at hw/ide/core.c:1957 >> #5 0x00007f4477516b3c in piix3_reset (opaque=0x7f44785f1530) at hw/ide/piix.c:113 >> #6 0x00007f4477647b9f in qemu_devices_reset () at vl.c:2131 >> #7 0x00007f4477647c0f in qemu_system_reset (report=true) at vl.c:2140 >> #8 0x00007f4477648127 in main_loop_should_exit () at vl.c:2274 >> #9 0x00007f447764823a in main_loop () at vl.c:2323 >> #10 0x00007f447764f6da in main (argc=57, argv=0x7fff5d194378, envp=0x7fff5d194548) at vl.c:4803 > > And the second is > > #7 0x00007f3cb525de5e in dma_complete (dbs=0x7f3cb63f3220, ret=0) at dma-helpers.c:135 > #8 0x00007f3cb525df3d in dma_bdrv_cb (opaque=0x7f3cb63f3220, ret=0) at dma-helpers.c:152 > #9 0x00007f3cb5212102 in bdrv_co_em_bh (opaque=0x7f3cb6398980) at block.c:4127 > #10 0x00007f3cb51f6cef in aio_bh_poll (ctx=0x7f3cb622a8f0) at async.c:70 > #11 0x00007f3cb51f695a in aio_poll (ctx=0x7f3cb622a8f0, blocking=false) at aio-posix.c:185 > #12 0x00007f3cb51f7056 in aio_ctx_dispatch (source=0x7f3cb622a8f0, callback=0x0, user_data=0x0) > at async.c:167 > #13 0x00007f3cb48b969a in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0 > > This explains why my patch "fixes" the bug. It turns a double free > into a dangling pointer: the second call now sees in_cancel == true > and skips the free. > > The second call should have happened within dma_aio_cancel's call to > bdrv_aio_cancel. This is the real bug. > > What is your version of QEMU? I cannot see any where bdrv_co_em_bh is > at line 4127 or bdrv_aio_cancel is at line 3848. Can you reproduce it > with qemu.git master? > > Paolo > The version of my qemu is 1.5.0. Tomorrow I will reproduce it, I am in home now : ) . Chenliang ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Qemu-devel] [PATCH] ide: fix double free 2014-07-02 12:19 ` Paolo Bonzini 2014-07-02 12:46 ` 陈梁 @ 2014-07-02 12:54 ` 陈梁 2014-07-02 13:09 ` Paolo Bonzini 2014-07-03 2:23 ` ChenLiang 2 siblings, 1 reply; 19+ messages in thread From: 陈梁 @ 2014-07-02 12:54 UTC (permalink / raw) To: Paolo Bonzini Cc: kwolf@redhat.com, ChenLiang, Huangweidong (C), Gonglei (Arei), qemu-devel@nongnu.org, 陈梁, stefanha@redhat.com > Il 02/07/2014 13:57, ChenLiang ha scritto: >>>> Hmm, dbs->in_cancel will be true always. Although this will avoid freeing dbs by dma_comlete. >>>> But it maybe a mistake. >>> >>> This was on purpose; I'm doing the free myself in dma_aio_cancel, so I wanted to avoid the qemu_aio_release from dma_complete. This was in case of a recursive call to dma_complete. But I don't see how that recursive call could happen outside the "if (dbs->acb)"; and inside the "if" the protection is there already. >>> >>> Can you gather the backtraces for _both_ calls to qemu_aio_release, rather than just the second? >> >> (gdb) bt >> #0 qemu_aio_release (p=0x7f44788d1290) at block.c:4260 >> #1 0x00007f4477494e5e in dma_complete (dbs=0x7f44788d1290, ret=0) at dma-helpers.c:135 >> #2 0x00007f44774952c2 in dma_aio_cancel (acb=0x7f44788d1290) at dma-helpers.c:195 >> #3 0x00007f447744825b in bdrv_aio_cancel (acb=0x7f44788d1290) at block.c:3848 >> #4 0x00007f4477513911 in ide_bus_reset (bus=0x7f44785f1bd8) at hw/ide/core.c:1957 >> #5 0x00007f4477516b3c in piix3_reset (opaque=0x7f44785f1530) at hw/ide/piix.c:113 >> #6 0x00007f4477647b9f in qemu_devices_reset () at vl.c:2131 >> #7 0x00007f4477647c0f in qemu_system_reset (report=true) at vl.c:2140 >> #8 0x00007f4477648127 in main_loop_should_exit () at vl.c:2274 >> #9 0x00007f447764823a in main_loop () at vl.c:2323 >> #10 0x00007f447764f6da in main (argc=57, argv=0x7fff5d194378, envp=0x7fff5d194548) at vl.c:4803 > > And the second is > > #7 0x00007f3cb525de5e in dma_complete (dbs=0x7f3cb63f3220, ret=0) at dma-helpers.c:135 > #8 0x00007f3cb525df3d in dma_bdrv_cb (opaque=0x7f3cb63f3220, ret=0) at dma-helpers.c:152 > #9 0x00007f3cb5212102 in bdrv_co_em_bh (opaque=0x7f3cb6398980) at block.c:4127 > #10 0x00007f3cb51f6cef in aio_bh_poll (ctx=0x7f3cb622a8f0) at async.c:70 > #11 0x00007f3cb51f695a in aio_poll (ctx=0x7f3cb622a8f0, blocking=false) at aio-posix.c:185 > #12 0x00007f3cb51f7056 in aio_ctx_dispatch (source=0x7f3cb622a8f0, callback=0x0, user_data=0x0) > at async.c:167 > #13 0x00007f3cb48b969a in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0 > > This explains why my patch "fixes" the bug. It turns a double free > into a dangling pointer: the second call now sees in_cancel == true > and skips the free. > > The second call should have happened within dma_aio_cancel's call to > bdrv_aio_cancel. This is the real bug. IMO, the second need not happened within dma_aio_cancel's call to bdrv_aio_cancel. The double free will be happened if dam_aio_cancel is called. > > What is your version of QEMU? I cannot see any where bdrv_co_em_bh is > at line 4127 or bdrv_aio_cancel is at line 3848. Can you reproduce it > with qemu.git master? > > Paolo > ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Qemu-devel] [PATCH] ide: fix double free 2014-07-02 12:54 ` 陈梁 @ 2014-07-02 13:09 ` Paolo Bonzini 0 siblings, 0 replies; 19+ messages in thread From: Paolo Bonzini @ 2014-07-02 13:09 UTC (permalink / raw) To: 陈梁 Cc: kwolf@redhat.com, ChenLiang, Huangweidong (C), qemu-devel@nongnu.org, Gonglei (Arei), stefanha@redhat.com Il 02/07/2014 14:54, 陈梁 ha scritto: >> > The second call should have happened within dma_aio_cancel's call to >> > bdrv_aio_cancel. This is the real bug. > IMO, the second need not happened within dma_aio_cancel's call to bdrv_aio_cancel. > The double free will be happened if dam_aio_cancel is called. The callback must not be invoked after bdrv_aio_cancel. This is the fundamental invariant of bdrv_aio_cancel. All implementations of AIOCB must respect it, or bugs like this one happen. Here, either bdrv_aio_cancel was not invoked, or the invariant was broken. The other invariant, this time in dma-helpers.c, is that dma_bdrv_cb either exits with no pending AIOCB, or it exits with a non-NULL dbs->acb. If bdrv_aio_cancel was not invoked, this invariant was broken because there is a pending AIOCB but it is not in dbs->acb. Paolo ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Qemu-devel] [PATCH] ide: fix double free 2014-07-02 12:19 ` Paolo Bonzini 2014-07-02 12:46 ` 陈梁 2014-07-02 12:54 ` 陈梁 @ 2014-07-03 2:23 ` ChenLiang 2014-07-03 10:41 ` Paolo Bonzini 2 siblings, 1 reply; 19+ messages in thread From: ChenLiang @ 2014-07-03 2:23 UTC (permalink / raw) To: Paolo Bonzini Cc: kwolf@redhat.com, Gonglei (Arei), Huangweidong (C), qemu-devel@nongnu.org, stefanha@redhat.com On 2014/7/2 20:19, Paolo Bonzini wrote: > Il 02/07/2014 13:57, ChenLiang ha scritto: >>>> Hmm, dbs->in_cancel will be true always. Although this will avoid freeing dbs by dma_comlete. >>>> But it maybe a mistake. >>> >>> This was on purpose; I'm doing the free myself in dma_aio_cancel, so I wanted to avoid the qemu_aio_release from dma_complete. This was in case of a recursive call to dma_complete. But I don't see how that recursive call could happen outside the "if (dbs->acb)"; and inside the "if" the protection is there already. >>> >>> Can you gather the backtraces for _both_ calls to qemu_aio_release, rather than just the second? >> >> (gdb) bt >> #0 qemu_aio_release (p=0x7f44788d1290) at block.c:4260 >> #1 0x00007f4477494e5e in dma_complete (dbs=0x7f44788d1290, ret=0) at dma-helpers.c:135 >> #2 0x00007f44774952c2 in dma_aio_cancel (acb=0x7f44788d1290) at dma-helpers.c:195 >> #3 0x00007f447744825b in bdrv_aio_cancel (acb=0x7f44788d1290) at block.c:3848 >> #4 0x00007f4477513911 in ide_bus_reset (bus=0x7f44785f1bd8) at hw/ide/core.c:1957 >> #5 0x00007f4477516b3c in piix3_reset (opaque=0x7f44785f1530) at hw/ide/piix.c:113 >> #6 0x00007f4477647b9f in qemu_devices_reset () at vl.c:2131 >> #7 0x00007f4477647c0f in qemu_system_reset (report=true) at vl.c:2140 >> #8 0x00007f4477648127 in main_loop_should_exit () at vl.c:2274 >> #9 0x00007f447764823a in main_loop () at vl.c:2323 >> #10 0x00007f447764f6da in main (argc=57, argv=0x7fff5d194378, envp=0x7fff5d194548) at vl.c:4803 > > And the second is > > #7 0x00007f3cb525de5e in dma_complete (dbs=0x7f3cb63f3220, ret=0) at dma-helpers.c:135 > #8 0x00007f3cb525df3d in dma_bdrv_cb (opaque=0x7f3cb63f3220, ret=0) at dma-helpers.c:152 > #9 0x00007f3cb5212102 in bdrv_co_em_bh (opaque=0x7f3cb6398980) at block.c:4127 > #10 0x00007f3cb51f6cef in aio_bh_poll (ctx=0x7f3cb622a8f0) at async.c:70 > #11 0x00007f3cb51f695a in aio_poll (ctx=0x7f3cb622a8f0, blocking=false) at aio-posix.c:185 > #12 0x00007f3cb51f7056 in aio_ctx_dispatch (source=0x7f3cb622a8f0, callback=0x0, user_data=0x0) > at async.c:167 > #13 0x00007f3cb48b969a in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0 > > This explains why my patch "fixes" the bug. It turns a double free > into a dangling pointer: the second call now sees in_cancel == true > and skips the free. > > The second call should have happened within dma_aio_cancel's call to > bdrv_aio_cancel. This is the real bug. > > What is your version of QEMU? I cannot see any where bdrv_co_em_bh is > at line 4127 or bdrv_aio_cancel is at line 3848. Can you reproduce it > with qemu.git master? > > Paolo > > . > qemu master branch bt: Program received signal SIGABRT, Aborted. 0x00007fd548355b55 in raise () from /lib64/libc.so.6 (gdb) bt #0 0x00007fd548355b55 in raise () from /lib64/libc.so.6 #1 0x00007fd548357131 in abort () from /lib64/libc.so.6 #2 0x00007fd548393e0f in __libc_message () from /lib64/libc.so.6 #3 0x00007fd548399618 in malloc_printerr () from /lib64/libc.so.6 #4 0x00007fd54b15e80e in free_and_trace (mem=0x7fd54beb2230) at vl.c:2815 #5 0x00007fd54b3453cd in qemu_aio_release (p=0x7fd54beb2230) at block.c:4813 #6 0x00007fd54b15717d in dma_complete (dbs=0x7fd54beb2230, ret=0) at dma-helpers.c:132 #7 0x00007fd54b157253 in dma_bdrv_cb (opaque=0x7fd54beb2230, ret=0) at dma-helpers.c:148 #8 0x00007fd54b344db8 in bdrv_co_em_bh (opaque=0x7fd54bea4b30) at block.c:4676 #9 0x00007fd54b335a72 in aio_bh_poll (ctx=0x7fd54bcec990) at async.c:81 #10 0x00007fd54b34b1b4 in aio_poll (ctx=0x7fd54bcec990, blocking=false) at aio-posix.c:188 #11 0x00007fd54b335ee0 in aio_ctx_dispatch (source=0x7fd54bcec990, callback=0x0, user_data=0x0) at async.c:211 #12 0x00007fd549e3669a in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0 #13 0x00007fd54b348c45 in glib_pollfds_poll () at main-loop.c:190 #14 0x00007fd54b348d3d in os_host_main_loop_wait (timeout=0) at main-loop.c:235 #15 0x00007fd54b348e2f in main_loop_wait (nonblocking=0) at main-loop.c:484 #16 0x00007fd54b15b0f8 in main_loop () at vl.c:2007 #17 0x00007fd54b162a35 in main (argc=57, argv=0x7fff152720a8, envp=0x7fff15272278) at vl.c:4526 (gdb) bt #0 qemu_aio_release (p=0x7f86420ebec0) at block.c:4811 #1 0x00007f86412b617d in dma_complete (dbs=0x7f86420ebec0, ret=0) at dma-helpers.c:132 #2 0x00007f86412b65ab in dma_aio_cancel (acb=0x7f86420ebec0) at dma-helpers.c:192 #3 0x00007f86414a3996 in bdrv_aio_cancel (acb=0x7f86420ebec0) at block.c:4559 #4 0x00007f86413906af in ide_bus_reset (bus=0x7f8641fe3a20) at hw/ide/core.c:2056 #5 0x00007f86413967d6 in piix3_reset (opaque=0x7f8641fe32a0) at hw/ide/piix.c:114 #6 0x00007f86412b9a37 in qemu_devices_reset () at vl.c:1829 #7 0x00007f86412b9aef in qemu_system_reset (report=true) at vl.c:1842 #8 0x00007f86412b9fe2 in main_loop_should_exit () at vl.c:1971 #9 0x00007f86412ba100 in main_loop () at vl.c:2011 #10 0x00007f86412c1a35 in main (argc=57, argv=0x7fff2e827d38, envp=0x7fff2e827f08) at vl.c:4526 BTW, is it better to rename dbs->in_cancel to dbs->canceled ? Best regards Chenliang ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Qemu-devel] [PATCH] ide: fix double free 2014-07-03 2:23 ` ChenLiang @ 2014-07-03 10:41 ` Paolo Bonzini 2014-07-07 8:12 ` ChenLiang 0 siblings, 1 reply; 19+ messages in thread From: Paolo Bonzini @ 2014-07-03 10:41 UTC (permalink / raw) To: ChenLiang Cc: kwolf@redhat.com, Gonglei (Arei), Huangweidong (C), qemu-devel@nongnu.org, stefanha@redhat.com Il 03/07/2014 04:23, ChenLiang ha scritto: > Program received signal SIGABRT, Aborted. > 0x00007fd548355b55 in raise () from /lib64/libc.so.6 > (gdb) bt > #0 0x00007fd548355b55 in raise () from /lib64/libc.so.6 > #1 0x00007fd548357131 in abort () from /lib64/libc.so.6 > #2 0x00007fd548393e0f in __libc_message () from /lib64/libc.so.6 > #3 0x00007fd548399618 in malloc_printerr () from /lib64/libc.so.6 > #4 0x00007fd54b15e80e in free_and_trace (mem=0x7fd54beb2230) at vl.c:2815 > #5 0x00007fd54b3453cd in qemu_aio_release (p=0x7fd54beb2230) at block.c:4813 > #6 0x00007fd54b15717d in dma_complete (dbs=0x7fd54beb2230, ret=0) at dma-helpers.c:132 > #7 0x00007fd54b157253 in dma_bdrv_cb (opaque=0x7fd54beb2230, ret=0) at dma-helpers.c:148 > #8 0x00007fd54b344db8 in bdrv_co_em_bh (opaque=0x7fd54bea4b30) at block.c:4676 > #9 0x00007fd54b335a72 in aio_bh_poll (ctx=0x7fd54bcec990) at async.c:81 > #10 0x00007fd54b34b1b4 in aio_poll (ctx=0x7fd54bcec990, blocking=false) at aio-posix.c:188 > #11 0x00007fd54b335ee0 in aio_ctx_dispatch (source=0x7fd54bcec990, callback=0x0, user_data=0x0) at async.c:211 > #12 0x00007fd549e3669a in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0 > #13 0x00007fd54b348c45 in glib_pollfds_poll () at main-loop.c:190 > #14 0x00007fd54b348d3d in os_host_main_loop_wait (timeout=0) at main-loop.c:235 > #15 0x00007fd54b348e2f in main_loop_wait (nonblocking=0) at main-loop.c:484 > #16 0x00007fd54b15b0f8 in main_loop () at vl.c:2007 > #17 0x00007fd54b162a35 in main (argc=57, argv=0x7fff152720a8, envp=0x7fff15272278) at vl.c:4526 > > (gdb) bt > #0 qemu_aio_release (p=0x7f86420ebec0) at block.c:4811 > #1 0x00007f86412b617d in dma_complete (dbs=0x7f86420ebec0, ret=0) at dma-helpers.c:132 > #2 0x00007f86412b65ab in dma_aio_cancel (acb=0x7f86420ebec0) at dma-helpers.c:192 > #3 0x00007f86414a3996 in bdrv_aio_cancel (acb=0x7f86420ebec0) at block.c:4559 > #4 0x00007f86413906af in ide_bus_reset (bus=0x7f8641fe3a20) at hw/ide/core.c:2056 > #5 0x00007f86413967d6 in piix3_reset (opaque=0x7f8641fe32a0) at hw/ide/piix.c:114 > #6 0x00007f86412b9a37 in qemu_devices_reset () at vl.c:1829 > #7 0x00007f86412b9aef in qemu_system_reset (report=true) at vl.c:1842 > #8 0x00007f86412b9fe2 in main_loop_should_exit () at vl.c:1971 > #9 0x00007f86412ba100 in main_loop () at vl.c:2011 > #10 0x00007f86412c1a35 in main (argc=57, argv=0x7fff2e827d38, envp=0x7fff2e827f08) at vl.c:4526 Ok, this is the same as your previous backtrace. The bug is still the same: dma_bdrv_cb must not be called dma_aio_cancel has finished the recursive call to bdrv_aio_cancel. > BTW, is it better to rename dbs->in_cancel to dbs->canceled ? If we were to apply my patch, yes. But with the current logic "in_cancel" says "are we inside the recursive call to bdrv_aio_cancel" so I think the answer is no. My patch is just a band-aid, I don't think it should be applied. Paolo ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Qemu-devel] [PATCH] ide: fix double free 2014-07-03 10:41 ` Paolo Bonzini @ 2014-07-07 8:12 ` ChenLiang 2014-07-07 12:38 ` Paolo Bonzini 0 siblings, 1 reply; 19+ messages in thread From: ChenLiang @ 2014-07-07 8:12 UTC (permalink / raw) To: Paolo Bonzini Cc: kwolf@redhat.com, Gonglei (Arei), Huangweidong (C), qemu-devel@nongnu.org, stefanha@redhat.com On 2014/7/3 18:41, Paolo Bonzini wrote: > Il 03/07/2014 04:23, ChenLiang ha scritto: >> Program received signal SIGABRT, Aborted. >> 0x00007fd548355b55 in raise () from /lib64/libc.so.6 >> (gdb) bt >> #0 0x00007fd548355b55 in raise () from /lib64/libc.so.6 >> #1 0x00007fd548357131 in abort () from /lib64/libc.so.6 >> #2 0x00007fd548393e0f in __libc_message () from /lib64/libc.so.6 >> #3 0x00007fd548399618 in malloc_printerr () from /lib64/libc.so.6 >> #4 0x00007fd54b15e80e in free_and_trace (mem=0x7fd54beb2230) at vl.c:2815 >> #5 0x00007fd54b3453cd in qemu_aio_release (p=0x7fd54beb2230) at block.c:4813 >> #6 0x00007fd54b15717d in dma_complete (dbs=0x7fd54beb2230, ret=0) at dma-helpers.c:132 >> #7 0x00007fd54b157253 in dma_bdrv_cb (opaque=0x7fd54beb2230, ret=0) at dma-helpers.c:148 >> #8 0x00007fd54b344db8 in bdrv_co_em_bh (opaque=0x7fd54bea4b30) at block.c:4676 >> #9 0x00007fd54b335a72 in aio_bh_poll (ctx=0x7fd54bcec990) at async.c:81 >> #10 0x00007fd54b34b1b4 in aio_poll (ctx=0x7fd54bcec990, blocking=false) at aio-posix.c:188 >> #11 0x00007fd54b335ee0 in aio_ctx_dispatch (source=0x7fd54bcec990, callback=0x0, user_data=0x0) at async.c:211 >> #12 0x00007fd549e3669a in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0 >> #13 0x00007fd54b348c45 in glib_pollfds_poll () at main-loop.c:190 >> #14 0x00007fd54b348d3d in os_host_main_loop_wait (timeout=0) at main-loop.c:235 >> #15 0x00007fd54b348e2f in main_loop_wait (nonblocking=0) at main-loop.c:484 >> #16 0x00007fd54b15b0f8 in main_loop () at vl.c:2007 >> #17 0x00007fd54b162a35 in main (argc=57, argv=0x7fff152720a8, envp=0x7fff15272278) at vl.c:4526 >> >> (gdb) bt >> #0 qemu_aio_release (p=0x7f86420ebec0) at block.c:4811 >> #1 0x00007f86412b617d in dma_complete (dbs=0x7f86420ebec0, ret=0) at dma-helpers.c:132 >> #2 0x00007f86412b65ab in dma_aio_cancel (acb=0x7f86420ebec0) at dma-helpers.c:192 >> #3 0x00007f86414a3996 in bdrv_aio_cancel (acb=0x7f86420ebec0) at block.c:4559 >> #4 0x00007f86413906af in ide_bus_reset (bus=0x7f8641fe3a20) at hw/ide/core.c:2056 >> #5 0x00007f86413967d6 in piix3_reset (opaque=0x7f8641fe32a0) at hw/ide/piix.c:114 >> #6 0x00007f86412b9a37 in qemu_devices_reset () at vl.c:1829 >> #7 0x00007f86412b9aef in qemu_system_reset (report=true) at vl.c:1842 >> #8 0x00007f86412b9fe2 in main_loop_should_exit () at vl.c:1971 >> #9 0x00007f86412ba100 in main_loop () at vl.c:2011 >> #10 0x00007f86412c1a35 in main (argc=57, argv=0x7fff2e827d38, envp=0x7fff2e827f08) at vl.c:4526 > > Ok, this is the same as your previous backtrace. The bug is still the same: dma_bdrv_cb must not be called dma_aio_cancel has finished the recursive call to bdrv_aio_cancel. > >> BTW, is it better to rename dbs->in_cancel to dbs->canceled ? > > If we were to apply my patch, yes. But with the current logic "in_cancel" says "are we inside the recursive call to bdrv_aio_cancel" so I think the answer is no. My patch is just a band-aid, I don't think it should be applied. > > Paolo > > . > Hi, virtio_blk_reset uses bdrv_drain_all too. static void virtio_blk_reset(VirtIODevice *vdev) { VirtIOBlock *s = VIRTIO_BLK(vdev); #ifdef CONFIG_VIRTIO_BLK_DATA_PLANE if (s->dataplane) { virtio_blk_data_plane_stop(s->dataplane); } #endif /* * This should cancel pending requests, but can't do nicely until there * are per-device request lists. */ bdrv_drain_all(); bdrv_set_enable_write_cache(s->bs, s->original_wce); } Best regards Chenliang ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Qemu-devel] [PATCH] ide: fix double free 2014-07-07 8:12 ` ChenLiang @ 2014-07-07 12:38 ` Paolo Bonzini 0 siblings, 0 replies; 19+ messages in thread From: Paolo Bonzini @ 2014-07-07 12:38 UTC (permalink / raw) To: ChenLiang Cc: kwolf@redhat.com, Gonglei (Arei), Huangweidong (C), qemu-devel@nongnu.org, stefanha@redhat.com Il 07/07/2014 10:12, ChenLiang ha scritto: > Hi, > virtio_blk_reset uses bdrv_drain_all too. That's a limitation of virtio_blk_reset, as the comment above says. virtio-scsi uses bdrv_aio_cancel. :) Paolo ^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2014-07-07 12:38 UTC | newest] Thread overview: 19+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2014-07-02 8:50 [Qemu-devel] [PATCH] ide: fix double free arei.gonglei 2014-07-02 9:04 ` Paolo Bonzini 2014-07-02 9:24 ` ChenLiang 2014-07-02 9:25 ` Paolo Bonzini 2014-07-02 9:46 ` Gonglei (Arei) 2014-07-02 10:16 ` Paolo Bonzini 2014-07-02 11:12 ` ChenLiang 2014-07-02 11:24 ` Paolo Bonzini 2014-07-02 11:33 ` ChenLiang 2014-07-02 11:40 ` Paolo Bonzini 2014-07-02 11:57 ` ChenLiang 2014-07-02 12:19 ` Paolo Bonzini 2014-07-02 12:46 ` 陈梁 2014-07-02 12:54 ` 陈梁 2014-07-02 13:09 ` Paolo Bonzini 2014-07-03 2:23 ` ChenLiang 2014-07-03 10:41 ` Paolo Bonzini 2014-07-07 8:12 ` ChenLiang 2014-07-07 12:38 ` Paolo Bonzini
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).