* [Qemu-devel] [PATCH] ide: fix double free
@ 2014-07-02 8:50 arei.gonglei
2014-07-02 9:04 ` Paolo Bonzini
0 siblings, 1 reply; 19+ messages in thread
From: arei.gonglei @ 2014-07-02 8:50 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, Chenliang, weidong.huang, Gonglei, stefanha, pbonzini
From: Chenliang <chenliang88@huawei.com>
Qemu may double free when excutes command "reboot -f" in vm.
One path is bdrv_aio_cancel(), the other is dma_bdrv_cb()
callback prcocess.
Signed-off-by: Chenliang <chenliang88@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
---
Qemu crash bt:
Program received signal SIGABRT, Aborted.
0x00007f3cb2c76b55 in raise () from /lib64/libc.so.6
(gdb) bt
#0 0x00007f3cb2c76b55 in raise () from /lib64/libc.so.6
#1 0x00007f3cb2c78131 in abort () from /lib64/libc.so.6
#2 0x00007f3cb2cb4e0f in __libc_message () from /lib64/libc.so.6
#3 0x00007f3cb2cba618 in malloc_printerr () from /lib64/libc.so.6
#4 0x00007f3cb2cbf65c in free () from /lib64/libc.so.6
#5 0x00007f3cb54146f2 in free_and_trace (mem=0x7f3cb63f3220) at vl.c:3078
#6 0x00007f3cb52126cd in qemu_aio_release (p=0x7f3cb63f3220) at block.c:4262
#7 0x00007f3cb525de5e in dma_complete (dbs=0x7f3cb63f3220, ret=0) at dma-helpers.c:135
#8 0x00007f3cb525df3d in dma_bdrv_cb (opaque=0x7f3cb63f3220, ret=0) at dma-helpers.c:152
#9 0x00007f3cb5212102 in bdrv_co_em_bh (opaque=0x7f3cb6398980) at block.c:4127
#10 0x00007f3cb51f6cef in aio_bh_poll (ctx=0x7f3cb622a8f0) at async.c:70
#11 0x00007f3cb51f695a in aio_poll (ctx=0x7f3cb622a8f0, blocking=false) at aio-posix.c:185
#12 0x00007f3cb51f7056 in aio_ctx_dispatch (source=0x7f3cb622a8f0, callback=0x0, user_data=0x0)
at async.c:167
#13 0x00007f3cb48b969a in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
#14 0x00007f3cb538956d in glib_pollfds_poll () at main-loop.c:188
#15 0x00007f3cb538965e in os_host_main_loop_wait (timeout=0) at main-loop.c:233
#16 0x00007f3cb5389718 in main_loop_wait (nonblocking=0) at main-loop.c:478
#17 0x00007f3cb5411212 in main_loop () at vl.c:2319
#18 0x00007f3cb54186ba in main (argc=57, argv=0x7fff5510c968, envp=0x7fff5510cb38) at vl.c:4803
---
hw/ide/core.c | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)
diff --git a/hw/ide/core.c b/hw/ide/core.c
index 3a38f1e..76d65c1 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -2050,11 +2050,9 @@ void ide_bus_reset(IDEBus *bus)
/* pending async DMA */
if (bus->dma->aiocb) {
-#ifdef DEBUG_AIO
- printf("aio_cancel\n");
-#endif
- bdrv_aio_cancel(bus->dma->aiocb);
- bus->dma->aiocb = NULL;
+ bdrv_drain_all();
+ bdrv_flush_all();
+ assert(bus->dma->aiocb == NULL);
}
/* reset dma provider too */
--
1.7.12.4
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [Qemu-devel] [PATCH] ide: fix double free
2014-07-02 8:50 [Qemu-devel] [PATCH] ide: fix double free arei.gonglei
@ 2014-07-02 9:04 ` Paolo Bonzini
2014-07-02 9:24 ` ChenLiang
0 siblings, 1 reply; 19+ messages in thread
From: Paolo Bonzini @ 2014-07-02 9:04 UTC (permalink / raw)
To: arei.gonglei, qemu-devel; +Cc: kwolf, Chenliang, weidong.huang, stefanha
Il 02/07/2014 10:50, arei.gonglei@huawei.com ha scritto:
> if (bus->dma->aiocb) {
> -#ifdef DEBUG_AIO
> - printf("aio_cancel\n");
> -#endif
> - bdrv_aio_cancel(bus->dma->aiocb);
> - bus->dma->aiocb = NULL;
> + bdrv_drain_all();
> + bdrv_flush_all();
> + assert(bus->dma->aiocb == NULL);
> }
This is definitely a heavyweight solution, and in fact the bug should
not be there in the first place. See dma_complete:
static void dma_complete(DMAAIOCB *dbs, int ret)
{
trace_dma_complete(dbs, ret, dbs->common.cb);
dma_bdrv_unmap(dbs);
if (dbs->common.cb) {
dbs->common.cb(dbs->common.opaque, ret);
}
qemu_iovec_destroy(&dbs->iov);
if (dbs->bh) {
qemu_bh_delete(dbs->bh);
dbs->bh = NULL;
}
if (!dbs->in_cancel) {
/* Requests may complete while dma_aio_cancel is in progress. In
* this case, the AIOCB should not be released because it is still
* referenced by dma_aio_cancel. */
qemu_aio_release(dbs);
}
}
Perhaps something like this?
diff --git a/dma-helpers.c b/dma-helpers.c
index 53cbe92..21b70d12 100644
--- a/dma-helpers.c
+++ b/dma-helpers.c
@@ -181,15 +181,15 @@ static void dma_aio_cancel(BlockDriverAIOCB *acb)
trace_dma_aio_cancel(dbs);
+ dbs->in_cancel = true;
if (dbs->acb) {
BlockDriverAIOCB *acb = dbs->acb;
dbs->acb = NULL;
- dbs->in_cancel = true;
bdrv_aio_cancel(acb);
- dbs->in_cancel = false;
}
dbs->common.cb = NULL;
dma_complete(dbs, 0);
+ qemu_aio_release(dbs);
}
static const AIOCBInfo dma_aiocb_info = {
^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [Qemu-devel] [PATCH] ide: fix double free
2014-07-02 9:04 ` Paolo Bonzini
@ 2014-07-02 9:24 ` ChenLiang
2014-07-02 9:25 ` Paolo Bonzini
0 siblings, 1 reply; 19+ messages in thread
From: ChenLiang @ 2014-07-02 9:24 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: kwolf, arei.gonglei, weidong.huang, qemu-devel, stefanha
On 2014/7/2 17:04, Paolo Bonzini wrote:
> This is definitely a heavyweight solution, and in fact the bug should
> not be there in the first place. See dma_complete:
>
> static void dma_complete(DMAAIOCB *dbs, int ret)
> {
> trace_dma_complete(dbs, ret, dbs->common.cb);
>
> dma_bdrv_unmap(dbs);
> if (dbs->common.cb) {
> dbs->common.cb(dbs->common.opaque, ret);
> }
> qemu_iovec_destroy(&dbs->iov);
> if (dbs->bh) {
> qemu_bh_delete(dbs->bh);
> dbs->bh = NULL;
> }
> if (!dbs->in_cancel) {
> /* Requests may complete while dma_aio_cancel is in progress. In
> * this case, the AIOCB should not be released because it is still
> * referenced by dma_aio_cancel. */
> qemu_aio_release(dbs);
> }
> }
>
> Perhaps something like this?
>
> diff --git a/dma-helpers.c b/dma-helpers.c
> index 53cbe92..21b70d12 100644
> --- a/dma-helpers.c
> +++ b/dma-helpers.c
> @@ -181,15 +181,15 @@ static void dma_aio_cancel(BlockDriverAIOCB *acb)
>
> trace_dma_aio_cancel(dbs);
>
> + dbs->in_cancel = true;
> if (dbs->acb) {
> BlockDriverAIOCB *acb = dbs->acb;
> dbs->acb = NULL;
> - dbs->in_cancel = true;
> bdrv_aio_cancel(acb);
> - dbs->in_cancel = false;
> }
> dbs->common.cb = NULL;
> dma_complete(dbs, 0);
> + qemu_aio_release(dbs);
> }
>
> static const AIOCBInfo dma_aiocb_info = {
>
>
> .
>
acked, thanks
Best regards
Chenliang
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Qemu-devel] [PATCH] ide: fix double free
2014-07-02 9:24 ` ChenLiang
@ 2014-07-02 9:25 ` Paolo Bonzini
2014-07-02 9:46 ` Gonglei (Arei)
0 siblings, 1 reply; 19+ messages in thread
From: Paolo Bonzini @ 2014-07-02 9:25 UTC (permalink / raw)
To: ChenLiang; +Cc: kwolf, arei.gonglei, weidong.huang, qemu-devel, stefanha
Il 02/07/2014 11:24, ChenLiang ha scritto:
> On 2014/7/2 17:04, Paolo Bonzini wrote:
>
>> This is definitely a heavyweight solution, and in fact the bug should
>> not be there in the first place. See dma_complete:
>>
>> static void dma_complete(DMAAIOCB *dbs, int ret)
>> {
>> trace_dma_complete(dbs, ret, dbs->common.cb);
>>
>> dma_bdrv_unmap(dbs);
>> if (dbs->common.cb) {
>> dbs->common.cb(dbs->common.opaque, ret);
>> }
>> qemu_iovec_destroy(&dbs->iov);
>> if (dbs->bh) {
>> qemu_bh_delete(dbs->bh);
>> dbs->bh = NULL;
>> }
>> if (!dbs->in_cancel) {
>> /* Requests may complete while dma_aio_cancel is in progress. In
>> * this case, the AIOCB should not be released because it is still
>> * referenced by dma_aio_cancel. */
>> qemu_aio_release(dbs);
>> }
>> }
>>
>> Perhaps something like this?
>>
>> diff --git a/dma-helpers.c b/dma-helpers.c
>> index 53cbe92..21b70d12 100644
>> --- a/dma-helpers.c
>> +++ b/dma-helpers.c
>> @@ -181,15 +181,15 @@ static void dma_aio_cancel(BlockDriverAIOCB *acb)
>>
>> trace_dma_aio_cancel(dbs);
>>
>> + dbs->in_cancel = true;
>> if (dbs->acb) {
>> BlockDriverAIOCB *acb = dbs->acb;
>> dbs->acb = NULL;
>> - dbs->in_cancel = true;
>> bdrv_aio_cancel(acb);
>> - dbs->in_cancel = false;
>> }
>> dbs->common.cb = NULL;
>> dma_complete(dbs, 0);
>> + qemu_aio_release(dbs);
>> }
>>
>> static const AIOCBInfo dma_aiocb_info = {
>>
>>
>> .
>>
>
> acked, thanks
Did you test this? :) What is the testcase?
Paolo
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Qemu-devel] [PATCH] ide: fix double free
2014-07-02 9:25 ` Paolo Bonzini
@ 2014-07-02 9:46 ` Gonglei (Arei)
2014-07-02 10:16 ` Paolo Bonzini
0 siblings, 1 reply; 19+ messages in thread
From: Gonglei (Arei) @ 2014-07-02 9:46 UTC (permalink / raw)
To: Paolo Bonzini, chenliang (T)
Cc: kwolf@redhat.com, Huangweidong (C), qemu-devel@nongnu.org,
stefanha@redhat.com
> -----Original Message-----
> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> Sent: Wednesday, July 02, 2014 5:26 PM
> To: chenliang (T)
> Cc: Gonglei (Arei); qemu-devel@nongnu.org; kwolf@redhat.com;
> stefanha@redhat.com; Huangweidong (C)
> Subject: Re: [PATCH] ide: fix double free
>
> Il 02/07/2014 11:24, ChenLiang ha scritto:
> > On 2014/7/2 17:04, Paolo Bonzini wrote:
> >
> >> This is definitely a heavyweight solution, and in fact the bug should
> >> not be there in the first place. See dma_complete:
> >>
> >> static void dma_complete(DMAAIOCB *dbs, int ret)
> >> {
> >> trace_dma_complete(dbs, ret, dbs->common.cb);
> >>
> >> dma_bdrv_unmap(dbs);
> >> if (dbs->common.cb) {
> >> dbs->common.cb(dbs->common.opaque, ret);
> >> }
> >> qemu_iovec_destroy(&dbs->iov);
> >> if (dbs->bh) {
> >> qemu_bh_delete(dbs->bh);
> >> dbs->bh = NULL;
> >> }
> >> if (!dbs->in_cancel) {
> >> /* Requests may complete while dma_aio_cancel is in progress.
> In
> >> * this case, the AIOCB should not be released because it is still
> >> * referenced by dma_aio_cancel. */
> >> qemu_aio_release(dbs);
> >> }
> >> }
> >>
> >> Perhaps something like this?
> >>
> >> diff --git a/dma-helpers.c b/dma-helpers.c
> >> index 53cbe92..21b70d12 100644
> >> --- a/dma-helpers.c
> >> +++ b/dma-helpers.c
> >> @@ -181,15 +181,15 @@ static void dma_aio_cancel(BlockDriverAIOCB
> *acb)
> >>
> >> trace_dma_aio_cancel(dbs);
> >>
> >> + dbs->in_cancel = true;
> >> if (dbs->acb) {
> >> BlockDriverAIOCB *acb = dbs->acb;
> >> dbs->acb = NULL;
> >> - dbs->in_cancel = true;
> >> bdrv_aio_cancel(acb);
> >> - dbs->in_cancel = false;
> >> }
> >> dbs->common.cb = NULL;
> >> dma_complete(dbs, 0);
> >> + qemu_aio_release(dbs);
> >> }
> >>
> >> static const AIOCBInfo dma_aiocb_info = {
> >>
> >>
> >> .
> >>
> >
> > acked, thanks
>
> Did you test this? :) What is the testcase?
>
> Paolo
Tested-by: Gonglei <arei.gonglei@huawei.com>
Hi, Paolo. We have tested your above patch, and it works well for us.
The qemu command line list as bellow, which started by libvirt :
/home/qemu/x86_64-softmmu/qemu-system-x86_64 -name test -S -machine pc-i440fx-1.5,accel=kvm,usb=off -m 4096 -realtime mlock=off -smp 16,sockets=16,cores=1,threads=1 -uuid abbbd27f-b83c-4c3b-8add-98df31eea0b7 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/test.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device lsi,id=scsi0,bus=pci.0,addr=0x5 -drive file=/tmp/qemu-crash/linux_x86_64.iso,if=none,id=drive-ide0-0-0,readonly=on,format=raw,cache=none,aio=native -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -drive file=/home/test.img,if=none,id=drive-ide0-0-1,format=raw,cache=none,aio=native -device ide-hd,bus=ide.0,unit=1,drive=drive-ide0-0-1,id=ide0-0-1 -netdev tap,fd=25,id=hostnet0 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:d9:3a:73,bus=pci.0,addr=0x3 -device usb-tablet,id=input0 -vnc 0.0.0.0:0 -device cirrus-vga,id=video0,vgamem_mb=9,bus=pci.0,addr=0x2
We setup a linux guest os on an IDE disk, named "test.img", when the guest os
Finishe the process of install, then I execute "reboot -f" in the VM immediately,
And then will crash the qemu.
Best regards,
-Gonglei
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Qemu-devel] [PATCH] ide: fix double free
2014-07-02 9:46 ` Gonglei (Arei)
@ 2014-07-02 10:16 ` Paolo Bonzini
2014-07-02 11:12 ` ChenLiang
2014-07-02 11:33 ` ChenLiang
0 siblings, 2 replies; 19+ messages in thread
From: Paolo Bonzini @ 2014-07-02 10:16 UTC (permalink / raw)
To: Gonglei (Arei), chenliang (T)
Cc: kwolf@redhat.com, Huangweidong (C), qemu-devel@nongnu.org,
stefanha@redhat.com
Il 02/07/2014 11:46, Gonglei (Arei) ha scritto:
> Hi, Paolo. We have tested your above patch, and it works well for us.
I'm still not sure where the fix is. I jotted the patch quickly, but
I'd rather understand it better before submitting it. Here is it again:
--- a/dma-helpers.c
+++ b/dma-helpers.c
@@ -181,15 +181,15 @@ static void dma_aio_cancel(BlockDriverAIOCB *acb)
trace_dma_aio_cancel(dbs);
+ dbs->in_cancel = true;
if (dbs->acb) {
BlockDriverAIOCB *acb = dbs->acb;
dbs->acb = NULL;
- dbs->in_cancel = true;
bdrv_aio_cancel(acb);
- dbs->in_cancel = false;
}
dbs->common.cb = NULL;
dma_complete(dbs, 0);
+ qemu_aio_release(dbs);
}
static const AIOCBInfo dma_aiocb_info = {
and here is dma_complete:
static void dma_complete(DMAAIOCB *dbs, int ret)
{
trace_dma_complete(dbs, ret, dbs->common.cb);
dma_bdrv_unmap(dbs);
if (dbs->common.cb) {
dbs->common.cb(dbs->common.opaque, ret);
}
qemu_iovec_destroy(&dbs->iov);
if (dbs->bh) {
qemu_bh_delete(dbs->bh);
dbs->bh = NULL;
}
if (!dbs->in_cancel) {
/* Requests may complete while dma_aio_cancel is in progress.
* this case, the AIOCB should not be released because it is
* still referenced by dma_aio_cancel. */
qemu_aio_release(dbs);
}
}
The qemu_aio_release call obviously happened during the second
dma_complete call.
However, the second dma_complete call does not call the callback, and
even if dma_bdrv_unmap resulted in a call to continue_after_map_failure,
the bottom half would be cancelled immediately. So we changed from:
dbs->in_cancel = false;
/* dma_complete calls dma_bdrv_unmap */
for (i = 0; i < dbs->iov.niov; ++i) {
/* might schedule dbs->bh */
dma_memory_unmap(dbs->sg->as, dbs->iov.iov[i].iov_base,
dbs->iov.iov[i].iov_len, dbs->dir,
dbs->iov.iov[i].iov_len);
}
qemu_iovec_reset(&dbs->iov);
/* back to dma_complete, dbs->common.cb == NULL */
qemu_iovec_destroy(&dbs->iov);
if (dbs->bh) {
/* dbs->bh now unscheduled, so it will never run */
qemu_bh_delete(dbs->bh);
dbs->bh = NULL;
}
/* dbs->in_cancel is false here. */
if (!dbs->in_cancel) {
/* Requests may complete while dma_aio_cancel is in progress.
* this case, the AIOCB should not be released because it is
* still referenced by dma_aio_cancel. */
qemu_aio_release(dbs);
}
to
dbs->in_cancel = false;
/* dma_complete calls dma_bdrv_unmap */
for (i = 0; i < dbs->iov.niov; ++i) {
/* might schedule dbs->bh */
dma_memory_unmap(dbs->sg->as, dbs->iov.iov[i].iov_base,
dbs->iov.iov[i].iov_len, dbs->dir,
dbs->iov.iov[i].iov_len);
}
qemu_iovec_reset(&dbs->iov);
/* back to dma_complete, dbs->common.cb == NULL */
qemu_iovec_destroy(&dbs->iov);
if (dbs->bh) {
/* dbs->bh now unscheduled, so it will never run */
qemu_bh_delete(dbs->bh);
dbs->bh = NULL;
}
/* after the patch dbs->in_cancel is true, the if never triggers */
if (!dbs->in_cancel) {
/* Requests may complete while dma_aio_cancel is in progress.
* this case, the AIOCB should not be released because it is
* still referenced by dma_aio_cancel. */
qemu_aio_release(dbs);
}
/* back to dma_aio_cancel */
qemu_aio_release(dbs);
This doesn't make sense to me. Can you find out where I'm wrong?
Paolo
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Qemu-devel] [PATCH] ide: fix double free
2014-07-02 10:16 ` Paolo Bonzini
@ 2014-07-02 11:12 ` ChenLiang
2014-07-02 11:24 ` Paolo Bonzini
2014-07-02 11:33 ` ChenLiang
1 sibling, 1 reply; 19+ messages in thread
From: ChenLiang @ 2014-07-02 11:12 UTC (permalink / raw)
To: Paolo Bonzini
Cc: kwolf@redhat.com, Gonglei (Arei), Huangweidong (C),
qemu-devel@nongnu.org, stefanha@redhat.com
On 2014/7/2 18:16, Paolo Bonzini wrote:
> Il 02/07/2014 11:46, Gonglei (Arei) ha scritto:
>> Hi, Paolo. We have tested your above patch, and it works well for us.
>
> I'm still not sure where the fix is. I jotted the patch quickly, but I'd rather understand it better before submitting it. Here is it again:
>
> --- a/dma-helpers.c
> +++ b/dma-helpers.c
> @@ -181,15 +181,15 @@ static void dma_aio_cancel(BlockDriverAIOCB *acb)
> trace_dma_aio_cancel(dbs);
>
> + dbs->in_cancel = true;
> if (dbs->acb) {
> BlockDriverAIOCB *acb = dbs->acb;
> dbs->acb = NULL;
> - dbs->in_cancel = true;
> bdrv_aio_cancel(acb);
> - dbs->in_cancel = false;
> }
> dbs->common.cb = NULL;
> dma_complete(dbs, 0);
> + qemu_aio_release(dbs);
> }
>
> static const AIOCBInfo dma_aiocb_info = {
>
> and here is dma_complete:
>
> static void dma_complete(DMAAIOCB *dbs, int ret)
> {
> trace_dma_complete(dbs, ret, dbs->common.cb);
> dma_bdrv_unmap(dbs);
> if (dbs->common.cb) {
> dbs->common.cb(dbs->common.opaque, ret);
> }
> qemu_iovec_destroy(&dbs->iov);
> if (dbs->bh) {
> qemu_bh_delete(dbs->bh);
> dbs->bh = NULL;
> }
> if (!dbs->in_cancel) {
> /* Requests may complete while dma_aio_cancel is in progress.
> * this case, the AIOCB should not be released because it is
> * still referenced by dma_aio_cancel. */
> qemu_aio_release(dbs);
> }
> }
>
> The qemu_aio_release call obviously happened during the second dma_complete call.
>
> However, the second dma_complete call does not call the callback, and even if dma_bdrv_unmap resulted in a call to continue_after_map_failure, the bottom half would be cancelled immediately. So we changed from:
>
> dbs->in_cancel = false;
>
> /* dma_complete calls dma_bdrv_unmap */
> for (i = 0; i < dbs->iov.niov; ++i) {
> /* might schedule dbs->bh */
> dma_memory_unmap(dbs->sg->as, dbs->iov.iov[i].iov_base,
> dbs->iov.iov[i].iov_len, dbs->dir,
> dbs->iov.iov[i].iov_len);
> }
> qemu_iovec_reset(&dbs->iov);
>
> /* back to dma_complete, dbs->common.cb == NULL */
> qemu_iovec_destroy(&dbs->iov);
> if (dbs->bh) {
> /* dbs->bh now unscheduled, so it will never run */
> qemu_bh_delete(dbs->bh);
> dbs->bh = NULL;
> }
> /* dbs->in_cancel is false here. */
> if (!dbs->in_cancel) {
> /* Requests may complete while dma_aio_cancel is in progress.
> * this case, the AIOCB should not be released because it is
> * still referenced by dma_aio_cancel. */
> qemu_aio_release(dbs);
> }
>
> to
>
> dbs->in_cancel = false;
>
> /* dma_complete calls dma_bdrv_unmap */
> for (i = 0; i < dbs->iov.niov; ++i) {
> /* might schedule dbs->bh */
> dma_memory_unmap(dbs->sg->as, dbs->iov.iov[i].iov_base,
> dbs->iov.iov[i].iov_len, dbs->dir,
> dbs->iov.iov[i].iov_len);
> }
> qemu_iovec_reset(&dbs->iov);
>
> /* back to dma_complete, dbs->common.cb == NULL */
> qemu_iovec_destroy(&dbs->iov);
> if (dbs->bh) {
> /* dbs->bh now unscheduled, so it will never run */
> qemu_bh_delete(dbs->bh);
> dbs->bh = NULL;
> }
>
> /* after the patch dbs->in_cancel is true, the if never triggers */
> if (!dbs->in_cancel) {
> /* Requests may complete while dma_aio_cancel is in progress.
> * this case, the AIOCB should not be released because it is
> * still referenced by dma_aio_cancel. */
> qemu_aio_release(dbs);
> }
>
> /* back to dma_aio_cancel */
> qemu_aio_release(dbs);
>
> This doesn't make sense to me. Can you find out where I'm wrong?
>
> Paolo
>
> .
>
This patch avoid freeing dbs by dma_complete when dma_aio_cancel is running.
Because dma_complete also will be called by dma_bdrv_cb. So double free
will never happen.
Chenliang
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Qemu-devel] [PATCH] ide: fix double free
2014-07-02 11:12 ` ChenLiang
@ 2014-07-02 11:24 ` Paolo Bonzini
0 siblings, 0 replies; 19+ messages in thread
From: Paolo Bonzini @ 2014-07-02 11:24 UTC (permalink / raw)
To: ChenLiang
Cc: kwolf@redhat.com, Gonglei (Arei), Huangweidong (C),
qemu-devel@nongnu.org, stefanha@redhat.com
Il 02/07/2014 13:12, ChenLiang ha scritto:
> This patch avoid freeing dbs by dma_complete when dma_aio_cancel is running.
> Because dma_complete also will be called by dma_bdrv_cb. So double free
> will never happen.
Yes, you already said that. But I'm not sure _why_ the patch avoids
freeing dbs, especially since the patch adds a free in dma_aio_cancel
itself. My patch shouldn't have any effect. Where is my analysis of it
wrong?
Paolo
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Qemu-devel] [PATCH] ide: fix double free
2014-07-02 10:16 ` Paolo Bonzini
2014-07-02 11:12 ` ChenLiang
@ 2014-07-02 11:33 ` ChenLiang
2014-07-02 11:40 ` Paolo Bonzini
1 sibling, 1 reply; 19+ messages in thread
From: ChenLiang @ 2014-07-02 11:33 UTC (permalink / raw)
To: Paolo Bonzini
Cc: kwolf@redhat.com, Gonglei (Arei), Huangweidong (C),
qemu-devel@nongnu.org, stefanha@redhat.com
On 2014/7/2 18:16, Paolo Bonzini wrote:
> Il 02/07/2014 11:46, Gonglei (Arei) ha scritto:
>> Hi, Paolo. We have tested your above patch, and it works well for us.
>
> I'm still not sure where the fix is. I jotted the patch quickly, but I'd rather understand it better before submitting it. Here is it again:
>
> --- a/dma-helpers.c
> +++ b/dma-helpers.c
> @@ -181,15 +181,15 @@ static void dma_aio_cancel(BlockDriverAIOCB *acb)
> trace_dma_aio_cancel(dbs);
>
> + dbs->in_cancel = true;
> if (dbs->acb) {
> BlockDriverAIOCB *acb = dbs->acb;
> dbs->acb = NULL;
> - dbs->in_cancel = true;
> bdrv_aio_cancel(acb);
> - dbs->in_cancel = false;
> }
> dbs->common.cb = NULL;
> dma_complete(dbs, 0);
> + qemu_aio_release(dbs);
> }
>
Hmm, dbs->in_cancel will be true always. Although this will avoid freeing dbs by dma_comlete.
But it maybe a mistake.
> static const AIOCBInfo dma_aiocb_info = {
>
> and here is dma_complete:
>
> static void dma_complete(DMAAIOCB *dbs, int ret)
> {
> trace_dma_complete(dbs, ret, dbs->common.cb);
> dma_bdrv_unmap(dbs);
> if (dbs->common.cb) {
> dbs->common.cb(dbs->common.opaque, ret);
> }
> qemu_iovec_destroy(&dbs->iov);
> if (dbs->bh) {
> qemu_bh_delete(dbs->bh);
> dbs->bh = NULL;
> }
> if (!dbs->in_cancel) {
> /* Requests may complete while dma_aio_cancel is in progress.
> * this case, the AIOCB should not be released because it is
> * still referenced by dma_aio_cancel. */
> qemu_aio_release(dbs);
> }
> }
>
> The qemu_aio_release call obviously happened during the second dma_complete call.
>
> However, the second dma_complete call does not call the callback, and even if dma_bdrv_unmap resulted in a call to continue_after_map_failure, the bottom half would be cancelled immediately. So we changed from:
>
> dbs->in_cancel = false;
>
> /* dma_complete calls dma_bdrv_unmap */
> for (i = 0; i < dbs->iov.niov; ++i) {
> /* might schedule dbs->bh */
> dma_memory_unmap(dbs->sg->as, dbs->iov.iov[i].iov_base,
> dbs->iov.iov[i].iov_len, dbs->dir,
> dbs->iov.iov[i].iov_len);
> }
> qemu_iovec_reset(&dbs->iov);
>
> /* back to dma_complete, dbs->common.cb == NULL */
> qemu_iovec_destroy(&dbs->iov);
> if (dbs->bh) {
> /* dbs->bh now unscheduled, so it will never run */
> qemu_bh_delete(dbs->bh);
> dbs->bh = NULL;
> }
> /* dbs->in_cancel is false here. */
> if (!dbs->in_cancel) {
> /* Requests may complete while dma_aio_cancel is in progress.
> * this case, the AIOCB should not be released because it is
> * still referenced by dma_aio_cancel. */
> qemu_aio_release(dbs);
> }
>
> to
>
> dbs->in_cancel = false;
>
> /* dma_complete calls dma_bdrv_unmap */
> for (i = 0; i < dbs->iov.niov; ++i) {
> /* might schedule dbs->bh */
> dma_memory_unmap(dbs->sg->as, dbs->iov.iov[i].iov_base,
> dbs->iov.iov[i].iov_len, dbs->dir,
> dbs->iov.iov[i].iov_len);
> }
> qemu_iovec_reset(&dbs->iov);
>
> /* back to dma_complete, dbs->common.cb == NULL */
> qemu_iovec_destroy(&dbs->iov);
> if (dbs->bh) {
> /* dbs->bh now unscheduled, so it will never run */
> qemu_bh_delete(dbs->bh);
> dbs->bh = NULL;
> }
>
> /* after the patch dbs->in_cancel is true, the if never triggers */
> if (!dbs->in_cancel) {
> /* Requests may complete while dma_aio_cancel is in progress.
> * this case, the AIOCB should not be released because it is
> * still referenced by dma_aio_cancel. */
> qemu_aio_release(dbs);
> }
>
> /* back to dma_aio_cancel */
> qemu_aio_release(dbs);
>
> This doesn't make sense to me. Can you find out where I'm wrong?
>
> Paolo
>
> .
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Qemu-devel] [PATCH] ide: fix double free
2014-07-02 11:33 ` ChenLiang
@ 2014-07-02 11:40 ` Paolo Bonzini
2014-07-02 11:57 ` ChenLiang
0 siblings, 1 reply; 19+ messages in thread
From: Paolo Bonzini @ 2014-07-02 11:40 UTC (permalink / raw)
To: ChenLiang
Cc: kwolf@redhat.com, Gonglei (Arei), Huangweidong (C),
qemu-devel@nongnu.org, stefanha@redhat.com
Il 02/07/2014 13:33, ChenLiang ha scritto:
> On 2014/7/2 18:16, Paolo Bonzini wrote:
>
>> Il 02/07/2014 11:46, Gonglei (Arei) ha scritto:
>>> Hi, Paolo. We have tested your above patch, and it works well for us.
>>
>> I'm still not sure where the fix is. I jotted the patch quickly, but I'd rather understand it better before submitting it. Here is it again:
>>
>> --- a/dma-helpers.c
>> +++ b/dma-helpers.c
>> @@ -181,15 +181,15 @@ static void dma_aio_cancel(BlockDriverAIOCB *acb)
>> trace_dma_aio_cancel(dbs);
>>
>> + dbs->in_cancel = true;
>> if (dbs->acb) {
>> BlockDriverAIOCB *acb = dbs->acb;
>> dbs->acb = NULL;
>> - dbs->in_cancel = true;
>> bdrv_aio_cancel(acb);
>> - dbs->in_cancel = false;
>> }
>> dbs->common.cb = NULL;
>> dma_complete(dbs, 0);
>> + qemu_aio_release(dbs);
>> }
>>
>
>
> Hmm, dbs->in_cancel will be true always. Although this will avoid freeing dbs by dma_comlete.
> But it maybe a mistake.
This was on purpose; I'm doing the free myself in dma_aio_cancel, so I
wanted to avoid the qemu_aio_release from dma_complete. This was in
case of a recursive call to dma_complete. But I don't see how that
recursive call could happen outside the "if (dbs->acb)"; and inside the
"if" the protection is there already.
Can you gather the backtraces for _both_ calls to qemu_aio_release,
rather than just the second?
With what guest are you encountering the bug?
Paolo
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Qemu-devel] [PATCH] ide: fix double free
2014-07-02 11:40 ` Paolo Bonzini
@ 2014-07-02 11:57 ` ChenLiang
2014-07-02 12:19 ` Paolo Bonzini
0 siblings, 1 reply; 19+ messages in thread
From: ChenLiang @ 2014-07-02 11:57 UTC (permalink / raw)
To: Paolo Bonzini
Cc: kwolf@redhat.com, Gonglei (Arei), Huangweidong (C),
qemu-devel@nongnu.org, stefanha@redhat.com
On 2014/7/2 19:40, Paolo Bonzini wrote:
> Il 02/07/2014 13:33, ChenLiang ha scritto:
>> On 2014/7/2 18:16, Paolo Bonzini wrote:
>>
>>> Il 02/07/2014 11:46, Gonglei (Arei) ha scritto:
>>>> Hi, Paolo. We have tested your above patch, and it works well for us.
>>>
>>> I'm still not sure where the fix is. I jotted the patch quickly, but I'd rather understand it better before submitting it. Here is it again:
>>>
>>> --- a/dma-helpers.c
>>> +++ b/dma-helpers.c
>>> @@ -181,15 +181,15 @@ static void dma_aio_cancel(BlockDriverAIOCB *acb)
>>> trace_dma_aio_cancel(dbs);
>>>
>>> + dbs->in_cancel = true;
>>> if (dbs->acb) {
>>> BlockDriverAIOCB *acb = dbs->acb;
>>> dbs->acb = NULL;
>>> - dbs->in_cancel = true;
>>> bdrv_aio_cancel(acb);
>>> - dbs->in_cancel = false;
>>> }
>>> dbs->common.cb = NULL;
>>> dma_complete(dbs, 0);
>>> + qemu_aio_release(dbs);
>>> }
>>>
>>
>>
>> Hmm, dbs->in_cancel will be true always. Although this will avoid freeing dbs by dma_comlete.
>> But it maybe a mistake.
>
> This was on purpose; I'm doing the free myself in dma_aio_cancel, so I wanted to avoid the qemu_aio_release from dma_complete. This was in case of a recursive call to dma_complete. But I don't see how that recursive call could happen outside the "if (dbs->acb)"; and inside the "if" the protection is there already.
>
> Can you gather the backtraces for _both_ calls to qemu_aio_release, rather than just the second?
>
> With what guest are you encountering the bug?
>
> Paolo
>
>
> .
>
(gdb) bt
#0 qemu_aio_release (p=0x7f44788d1290) at block.c:4260
#1 0x00007f4477494e5e in dma_complete (dbs=0x7f44788d1290, ret=0) at dma-helpers.c:135
#2 0x00007f44774952c2 in dma_aio_cancel (acb=0x7f44788d1290) at dma-helpers.c:195
#3 0x00007f447744825b in bdrv_aio_cancel (acb=0x7f44788d1290) at block.c:3848
#4 0x00007f4477513911 in ide_bus_reset (bus=0x7f44785f1bd8) at hw/ide/core.c:1957
#5 0x00007f4477516b3c in piix3_reset (opaque=0x7f44785f1530) at hw/ide/piix.c:113
#6 0x00007f4477647b9f in qemu_devices_reset () at vl.c:2131
#7 0x00007f4477647c0f in qemu_system_reset (report=true) at vl.c:2140
#8 0x00007f4477648127 in main_loop_should_exit () at vl.c:2274
#9 0x00007f447764823a in main_loop () at vl.c:2323
#10 0x00007f447764f6da in main (argc=57, argv=0x7fff5d194378, envp=0x7fff5d194548) at vl.c:4803
Chenliang
is
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Qemu-devel] [PATCH] ide: fix double free
2014-07-02 11:57 ` ChenLiang
@ 2014-07-02 12:19 ` Paolo Bonzini
2014-07-02 12:46 ` 陈梁
` (2 more replies)
0 siblings, 3 replies; 19+ messages in thread
From: Paolo Bonzini @ 2014-07-02 12:19 UTC (permalink / raw)
To: ChenLiang
Cc: kwolf@redhat.com, Gonglei (Arei), Huangweidong (C),
qemu-devel@nongnu.org, stefanha@redhat.com
Il 02/07/2014 13:57, ChenLiang ha scritto:
>>> Hmm, dbs->in_cancel will be true always. Although this will avoid freeing dbs by dma_comlete.
>>> But it maybe a mistake.
>>
>> This was on purpose; I'm doing the free myself in dma_aio_cancel, so I wanted to avoid the qemu_aio_release from dma_complete. This was in case of a recursive call to dma_complete. But I don't see how that recursive call could happen outside the "if (dbs->acb)"; and inside the "if" the protection is there already.
>>
>> Can you gather the backtraces for _both_ calls to qemu_aio_release, rather than just the second?
>
> (gdb) bt
> #0 qemu_aio_release (p=0x7f44788d1290) at block.c:4260
> #1 0x00007f4477494e5e in dma_complete (dbs=0x7f44788d1290, ret=0) at dma-helpers.c:135
> #2 0x00007f44774952c2 in dma_aio_cancel (acb=0x7f44788d1290) at dma-helpers.c:195
> #3 0x00007f447744825b in bdrv_aio_cancel (acb=0x7f44788d1290) at block.c:3848
> #4 0x00007f4477513911 in ide_bus_reset (bus=0x7f44785f1bd8) at hw/ide/core.c:1957
> #5 0x00007f4477516b3c in piix3_reset (opaque=0x7f44785f1530) at hw/ide/piix.c:113
> #6 0x00007f4477647b9f in qemu_devices_reset () at vl.c:2131
> #7 0x00007f4477647c0f in qemu_system_reset (report=true) at vl.c:2140
> #8 0x00007f4477648127 in main_loop_should_exit () at vl.c:2274
> #9 0x00007f447764823a in main_loop () at vl.c:2323
> #10 0x00007f447764f6da in main (argc=57, argv=0x7fff5d194378, envp=0x7fff5d194548) at vl.c:4803
And the second is
#7 0x00007f3cb525de5e in dma_complete (dbs=0x7f3cb63f3220, ret=0) at dma-helpers.c:135
#8 0x00007f3cb525df3d in dma_bdrv_cb (opaque=0x7f3cb63f3220, ret=0) at dma-helpers.c:152
#9 0x00007f3cb5212102 in bdrv_co_em_bh (opaque=0x7f3cb6398980) at block.c:4127
#10 0x00007f3cb51f6cef in aio_bh_poll (ctx=0x7f3cb622a8f0) at async.c:70
#11 0x00007f3cb51f695a in aio_poll (ctx=0x7f3cb622a8f0, blocking=false) at aio-posix.c:185
#12 0x00007f3cb51f7056 in aio_ctx_dispatch (source=0x7f3cb622a8f0, callback=0x0, user_data=0x0)
at async.c:167
#13 0x00007f3cb48b969a in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
This explains why my patch "fixes" the bug. It turns a double free
into a dangling pointer: the second call now sees in_cancel == true
and skips the free.
The second call should have happened within dma_aio_cancel's call to
bdrv_aio_cancel. This is the real bug.
What is your version of QEMU? I cannot see any where bdrv_co_em_bh is
at line 4127 or bdrv_aio_cancel is at line 3848. Can you reproduce it
with qemu.git master?
Paolo
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Qemu-devel] [PATCH] ide: fix double free
2014-07-02 12:19 ` Paolo Bonzini
@ 2014-07-02 12:46 ` 陈梁
2014-07-02 12:54 ` 陈梁
2014-07-03 2:23 ` ChenLiang
2 siblings, 0 replies; 19+ messages in thread
From: 陈梁 @ 2014-07-02 12:46 UTC (permalink / raw)
To: Paolo Bonzini
Cc: kwolf@redhat.com, ChenLiang, Huangweidong (C), Gonglei (Arei),
qemu-devel@nongnu.org, 陈梁, stefanha@redhat.com
> Il 02/07/2014 13:57, ChenLiang ha scritto:
>>>> Hmm, dbs->in_cancel will be true always. Although this will avoid freeing dbs by dma_comlete.
>>>> But it maybe a mistake.
>>>
>>> This was on purpose; I'm doing the free myself in dma_aio_cancel, so I wanted to avoid the qemu_aio_release from dma_complete. This was in case of a recursive call to dma_complete. But I don't see how that recursive call could happen outside the "if (dbs->acb)"; and inside the "if" the protection is there already.
>>>
>>> Can you gather the backtraces for _both_ calls to qemu_aio_release, rather than just the second?
>>
>> (gdb) bt
>> #0 qemu_aio_release (p=0x7f44788d1290) at block.c:4260
>> #1 0x00007f4477494e5e in dma_complete (dbs=0x7f44788d1290, ret=0) at dma-helpers.c:135
>> #2 0x00007f44774952c2 in dma_aio_cancel (acb=0x7f44788d1290) at dma-helpers.c:195
>> #3 0x00007f447744825b in bdrv_aio_cancel (acb=0x7f44788d1290) at block.c:3848
>> #4 0x00007f4477513911 in ide_bus_reset (bus=0x7f44785f1bd8) at hw/ide/core.c:1957
>> #5 0x00007f4477516b3c in piix3_reset (opaque=0x7f44785f1530) at hw/ide/piix.c:113
>> #6 0x00007f4477647b9f in qemu_devices_reset () at vl.c:2131
>> #7 0x00007f4477647c0f in qemu_system_reset (report=true) at vl.c:2140
>> #8 0x00007f4477648127 in main_loop_should_exit () at vl.c:2274
>> #9 0x00007f447764823a in main_loop () at vl.c:2323
>> #10 0x00007f447764f6da in main (argc=57, argv=0x7fff5d194378, envp=0x7fff5d194548) at vl.c:4803
>
> And the second is
>
> #7 0x00007f3cb525de5e in dma_complete (dbs=0x7f3cb63f3220, ret=0) at dma-helpers.c:135
> #8 0x00007f3cb525df3d in dma_bdrv_cb (opaque=0x7f3cb63f3220, ret=0) at dma-helpers.c:152
> #9 0x00007f3cb5212102 in bdrv_co_em_bh (opaque=0x7f3cb6398980) at block.c:4127
> #10 0x00007f3cb51f6cef in aio_bh_poll (ctx=0x7f3cb622a8f0) at async.c:70
> #11 0x00007f3cb51f695a in aio_poll (ctx=0x7f3cb622a8f0, blocking=false) at aio-posix.c:185
> #12 0x00007f3cb51f7056 in aio_ctx_dispatch (source=0x7f3cb622a8f0, callback=0x0, user_data=0x0)
> at async.c:167
> #13 0x00007f3cb48b969a in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
>
> This explains why my patch "fixes" the bug. It turns a double free
> into a dangling pointer: the second call now sees in_cancel == true
> and skips the free.
>
> The second call should have happened within dma_aio_cancel's call to
> bdrv_aio_cancel. This is the real bug.
>
> What is your version of QEMU? I cannot see any where bdrv_co_em_bh is
> at line 4127 or bdrv_aio_cancel is at line 3848. Can you reproduce it
> with qemu.git master?
>
> Paolo
>
The version of my qemu is 1.5.0. Tomorrow I will reproduce it, I am in home now : ) .
Chenliang
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Qemu-devel] [PATCH] ide: fix double free
2014-07-02 12:19 ` Paolo Bonzini
2014-07-02 12:46 ` 陈梁
@ 2014-07-02 12:54 ` 陈梁
2014-07-02 13:09 ` Paolo Bonzini
2014-07-03 2:23 ` ChenLiang
2 siblings, 1 reply; 19+ messages in thread
From: 陈梁 @ 2014-07-02 12:54 UTC (permalink / raw)
To: Paolo Bonzini
Cc: kwolf@redhat.com, ChenLiang, Huangweidong (C), Gonglei (Arei),
qemu-devel@nongnu.org, 陈梁, stefanha@redhat.com
> Il 02/07/2014 13:57, ChenLiang ha scritto:
>>>> Hmm, dbs->in_cancel will be true always. Although this will avoid freeing dbs by dma_comlete.
>>>> But it maybe a mistake.
>>>
>>> This was on purpose; I'm doing the free myself in dma_aio_cancel, so I wanted to avoid the qemu_aio_release from dma_complete. This was in case of a recursive call to dma_complete. But I don't see how that recursive call could happen outside the "if (dbs->acb)"; and inside the "if" the protection is there already.
>>>
>>> Can you gather the backtraces for _both_ calls to qemu_aio_release, rather than just the second?
>>
>> (gdb) bt
>> #0 qemu_aio_release (p=0x7f44788d1290) at block.c:4260
>> #1 0x00007f4477494e5e in dma_complete (dbs=0x7f44788d1290, ret=0) at dma-helpers.c:135
>> #2 0x00007f44774952c2 in dma_aio_cancel (acb=0x7f44788d1290) at dma-helpers.c:195
>> #3 0x00007f447744825b in bdrv_aio_cancel (acb=0x7f44788d1290) at block.c:3848
>> #4 0x00007f4477513911 in ide_bus_reset (bus=0x7f44785f1bd8) at hw/ide/core.c:1957
>> #5 0x00007f4477516b3c in piix3_reset (opaque=0x7f44785f1530) at hw/ide/piix.c:113
>> #6 0x00007f4477647b9f in qemu_devices_reset () at vl.c:2131
>> #7 0x00007f4477647c0f in qemu_system_reset (report=true) at vl.c:2140
>> #8 0x00007f4477648127 in main_loop_should_exit () at vl.c:2274
>> #9 0x00007f447764823a in main_loop () at vl.c:2323
>> #10 0x00007f447764f6da in main (argc=57, argv=0x7fff5d194378, envp=0x7fff5d194548) at vl.c:4803
>
> And the second is
>
> #7 0x00007f3cb525de5e in dma_complete (dbs=0x7f3cb63f3220, ret=0) at dma-helpers.c:135
> #8 0x00007f3cb525df3d in dma_bdrv_cb (opaque=0x7f3cb63f3220, ret=0) at dma-helpers.c:152
> #9 0x00007f3cb5212102 in bdrv_co_em_bh (opaque=0x7f3cb6398980) at block.c:4127
> #10 0x00007f3cb51f6cef in aio_bh_poll (ctx=0x7f3cb622a8f0) at async.c:70
> #11 0x00007f3cb51f695a in aio_poll (ctx=0x7f3cb622a8f0, blocking=false) at aio-posix.c:185
> #12 0x00007f3cb51f7056 in aio_ctx_dispatch (source=0x7f3cb622a8f0, callback=0x0, user_data=0x0)
> at async.c:167
> #13 0x00007f3cb48b969a in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
>
> This explains why my patch "fixes" the bug. It turns a double free
> into a dangling pointer: the second call now sees in_cancel == true
> and skips the free.
>
> The second call should have happened within dma_aio_cancel's call to
> bdrv_aio_cancel. This is the real bug.
IMO, the second need not happened within dma_aio_cancel's call to bdrv_aio_cancel.
The double free will be happened if dam_aio_cancel is called.
>
> What is your version of QEMU? I cannot see any where bdrv_co_em_bh is
> at line 4127 or bdrv_aio_cancel is at line 3848. Can you reproduce it
> with qemu.git master?
>
> Paolo
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Qemu-devel] [PATCH] ide: fix double free
2014-07-02 12:54 ` 陈梁
@ 2014-07-02 13:09 ` Paolo Bonzini
0 siblings, 0 replies; 19+ messages in thread
From: Paolo Bonzini @ 2014-07-02 13:09 UTC (permalink / raw)
To: 陈梁
Cc: kwolf@redhat.com, ChenLiang, Huangweidong (C),
qemu-devel@nongnu.org, Gonglei (Arei), stefanha@redhat.com
Il 02/07/2014 14:54, 陈梁 ha scritto:
>> > The second call should have happened within dma_aio_cancel's call to
>> > bdrv_aio_cancel. This is the real bug.
> IMO, the second need not happened within dma_aio_cancel's call to bdrv_aio_cancel.
> The double free will be happened if dam_aio_cancel is called.
The callback must not be invoked after bdrv_aio_cancel. This is the
fundamental invariant of bdrv_aio_cancel. All implementations of AIOCB
must respect it, or bugs like this one happen.
Here, either bdrv_aio_cancel was not invoked, or the invariant was broken.
The other invariant, this time in dma-helpers.c, is that dma_bdrv_cb
either exits with no pending AIOCB, or it exits with a non-NULL
dbs->acb. If bdrv_aio_cancel was not invoked, this invariant was broken
because there is a pending AIOCB but it is not in dbs->acb.
Paolo
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Qemu-devel] [PATCH] ide: fix double free
2014-07-02 12:19 ` Paolo Bonzini
2014-07-02 12:46 ` 陈梁
2014-07-02 12:54 ` 陈梁
@ 2014-07-03 2:23 ` ChenLiang
2014-07-03 10:41 ` Paolo Bonzini
2 siblings, 1 reply; 19+ messages in thread
From: ChenLiang @ 2014-07-03 2:23 UTC (permalink / raw)
To: Paolo Bonzini
Cc: kwolf@redhat.com, Gonglei (Arei), Huangweidong (C),
qemu-devel@nongnu.org, stefanha@redhat.com
On 2014/7/2 20:19, Paolo Bonzini wrote:
> Il 02/07/2014 13:57, ChenLiang ha scritto:
>>>> Hmm, dbs->in_cancel will be true always. Although this will avoid freeing dbs by dma_comlete.
>>>> But it maybe a mistake.
>>>
>>> This was on purpose; I'm doing the free myself in dma_aio_cancel, so I wanted to avoid the qemu_aio_release from dma_complete. This was in case of a recursive call to dma_complete. But I don't see how that recursive call could happen outside the "if (dbs->acb)"; and inside the "if" the protection is there already.
>>>
>>> Can you gather the backtraces for _both_ calls to qemu_aio_release, rather than just the second?
>>
>> (gdb) bt
>> #0 qemu_aio_release (p=0x7f44788d1290) at block.c:4260
>> #1 0x00007f4477494e5e in dma_complete (dbs=0x7f44788d1290, ret=0) at dma-helpers.c:135
>> #2 0x00007f44774952c2 in dma_aio_cancel (acb=0x7f44788d1290) at dma-helpers.c:195
>> #3 0x00007f447744825b in bdrv_aio_cancel (acb=0x7f44788d1290) at block.c:3848
>> #4 0x00007f4477513911 in ide_bus_reset (bus=0x7f44785f1bd8) at hw/ide/core.c:1957
>> #5 0x00007f4477516b3c in piix3_reset (opaque=0x7f44785f1530) at hw/ide/piix.c:113
>> #6 0x00007f4477647b9f in qemu_devices_reset () at vl.c:2131
>> #7 0x00007f4477647c0f in qemu_system_reset (report=true) at vl.c:2140
>> #8 0x00007f4477648127 in main_loop_should_exit () at vl.c:2274
>> #9 0x00007f447764823a in main_loop () at vl.c:2323
>> #10 0x00007f447764f6da in main (argc=57, argv=0x7fff5d194378, envp=0x7fff5d194548) at vl.c:4803
>
> And the second is
>
> #7 0x00007f3cb525de5e in dma_complete (dbs=0x7f3cb63f3220, ret=0) at dma-helpers.c:135
> #8 0x00007f3cb525df3d in dma_bdrv_cb (opaque=0x7f3cb63f3220, ret=0) at dma-helpers.c:152
> #9 0x00007f3cb5212102 in bdrv_co_em_bh (opaque=0x7f3cb6398980) at block.c:4127
> #10 0x00007f3cb51f6cef in aio_bh_poll (ctx=0x7f3cb622a8f0) at async.c:70
> #11 0x00007f3cb51f695a in aio_poll (ctx=0x7f3cb622a8f0, blocking=false) at aio-posix.c:185
> #12 0x00007f3cb51f7056 in aio_ctx_dispatch (source=0x7f3cb622a8f0, callback=0x0, user_data=0x0)
> at async.c:167
> #13 0x00007f3cb48b969a in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
>
> This explains why my patch "fixes" the bug. It turns a double free
> into a dangling pointer: the second call now sees in_cancel == true
> and skips the free.
>
> The second call should have happened within dma_aio_cancel's call to
> bdrv_aio_cancel. This is the real bug.
>
> What is your version of QEMU? I cannot see any where bdrv_co_em_bh is
> at line 4127 or bdrv_aio_cancel is at line 3848. Can you reproduce it
> with qemu.git master?
>
> Paolo
>
> .
>
qemu master branch bt:
Program received signal SIGABRT, Aborted.
0x00007fd548355b55 in raise () from /lib64/libc.so.6
(gdb) bt
#0 0x00007fd548355b55 in raise () from /lib64/libc.so.6
#1 0x00007fd548357131 in abort () from /lib64/libc.so.6
#2 0x00007fd548393e0f in __libc_message () from /lib64/libc.so.6
#3 0x00007fd548399618 in malloc_printerr () from /lib64/libc.so.6
#4 0x00007fd54b15e80e in free_and_trace (mem=0x7fd54beb2230) at vl.c:2815
#5 0x00007fd54b3453cd in qemu_aio_release (p=0x7fd54beb2230) at block.c:4813
#6 0x00007fd54b15717d in dma_complete (dbs=0x7fd54beb2230, ret=0) at dma-helpers.c:132
#7 0x00007fd54b157253 in dma_bdrv_cb (opaque=0x7fd54beb2230, ret=0) at dma-helpers.c:148
#8 0x00007fd54b344db8 in bdrv_co_em_bh (opaque=0x7fd54bea4b30) at block.c:4676
#9 0x00007fd54b335a72 in aio_bh_poll (ctx=0x7fd54bcec990) at async.c:81
#10 0x00007fd54b34b1b4 in aio_poll (ctx=0x7fd54bcec990, blocking=false) at aio-posix.c:188
#11 0x00007fd54b335ee0 in aio_ctx_dispatch (source=0x7fd54bcec990, callback=0x0, user_data=0x0) at async.c:211
#12 0x00007fd549e3669a in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
#13 0x00007fd54b348c45 in glib_pollfds_poll () at main-loop.c:190
#14 0x00007fd54b348d3d in os_host_main_loop_wait (timeout=0) at main-loop.c:235
#15 0x00007fd54b348e2f in main_loop_wait (nonblocking=0) at main-loop.c:484
#16 0x00007fd54b15b0f8 in main_loop () at vl.c:2007
#17 0x00007fd54b162a35 in main (argc=57, argv=0x7fff152720a8, envp=0x7fff15272278) at vl.c:4526
(gdb) bt
#0 qemu_aio_release (p=0x7f86420ebec0) at block.c:4811
#1 0x00007f86412b617d in dma_complete (dbs=0x7f86420ebec0, ret=0) at dma-helpers.c:132
#2 0x00007f86412b65ab in dma_aio_cancel (acb=0x7f86420ebec0) at dma-helpers.c:192
#3 0x00007f86414a3996 in bdrv_aio_cancel (acb=0x7f86420ebec0) at block.c:4559
#4 0x00007f86413906af in ide_bus_reset (bus=0x7f8641fe3a20) at hw/ide/core.c:2056
#5 0x00007f86413967d6 in piix3_reset (opaque=0x7f8641fe32a0) at hw/ide/piix.c:114
#6 0x00007f86412b9a37 in qemu_devices_reset () at vl.c:1829
#7 0x00007f86412b9aef in qemu_system_reset (report=true) at vl.c:1842
#8 0x00007f86412b9fe2 in main_loop_should_exit () at vl.c:1971
#9 0x00007f86412ba100 in main_loop () at vl.c:2011
#10 0x00007f86412c1a35 in main (argc=57, argv=0x7fff2e827d38, envp=0x7fff2e827f08) at vl.c:4526
BTW, is it better to rename dbs->in_cancel to dbs->canceled ?
Best regards
Chenliang
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Qemu-devel] [PATCH] ide: fix double free
2014-07-03 2:23 ` ChenLiang
@ 2014-07-03 10:41 ` Paolo Bonzini
2014-07-07 8:12 ` ChenLiang
0 siblings, 1 reply; 19+ messages in thread
From: Paolo Bonzini @ 2014-07-03 10:41 UTC (permalink / raw)
To: ChenLiang
Cc: kwolf@redhat.com, Gonglei (Arei), Huangweidong (C),
qemu-devel@nongnu.org, stefanha@redhat.com
Il 03/07/2014 04:23, ChenLiang ha scritto:
> Program received signal SIGABRT, Aborted.
> 0x00007fd548355b55 in raise () from /lib64/libc.so.6
> (gdb) bt
> #0 0x00007fd548355b55 in raise () from /lib64/libc.so.6
> #1 0x00007fd548357131 in abort () from /lib64/libc.so.6
> #2 0x00007fd548393e0f in __libc_message () from /lib64/libc.so.6
> #3 0x00007fd548399618 in malloc_printerr () from /lib64/libc.so.6
> #4 0x00007fd54b15e80e in free_and_trace (mem=0x7fd54beb2230) at vl.c:2815
> #5 0x00007fd54b3453cd in qemu_aio_release (p=0x7fd54beb2230) at block.c:4813
> #6 0x00007fd54b15717d in dma_complete (dbs=0x7fd54beb2230, ret=0) at dma-helpers.c:132
> #7 0x00007fd54b157253 in dma_bdrv_cb (opaque=0x7fd54beb2230, ret=0) at dma-helpers.c:148
> #8 0x00007fd54b344db8 in bdrv_co_em_bh (opaque=0x7fd54bea4b30) at block.c:4676
> #9 0x00007fd54b335a72 in aio_bh_poll (ctx=0x7fd54bcec990) at async.c:81
> #10 0x00007fd54b34b1b4 in aio_poll (ctx=0x7fd54bcec990, blocking=false) at aio-posix.c:188
> #11 0x00007fd54b335ee0 in aio_ctx_dispatch (source=0x7fd54bcec990, callback=0x0, user_data=0x0) at async.c:211
> #12 0x00007fd549e3669a in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
> #13 0x00007fd54b348c45 in glib_pollfds_poll () at main-loop.c:190
> #14 0x00007fd54b348d3d in os_host_main_loop_wait (timeout=0) at main-loop.c:235
> #15 0x00007fd54b348e2f in main_loop_wait (nonblocking=0) at main-loop.c:484
> #16 0x00007fd54b15b0f8 in main_loop () at vl.c:2007
> #17 0x00007fd54b162a35 in main (argc=57, argv=0x7fff152720a8, envp=0x7fff15272278) at vl.c:4526
>
> (gdb) bt
> #0 qemu_aio_release (p=0x7f86420ebec0) at block.c:4811
> #1 0x00007f86412b617d in dma_complete (dbs=0x7f86420ebec0, ret=0) at dma-helpers.c:132
> #2 0x00007f86412b65ab in dma_aio_cancel (acb=0x7f86420ebec0) at dma-helpers.c:192
> #3 0x00007f86414a3996 in bdrv_aio_cancel (acb=0x7f86420ebec0) at block.c:4559
> #4 0x00007f86413906af in ide_bus_reset (bus=0x7f8641fe3a20) at hw/ide/core.c:2056
> #5 0x00007f86413967d6 in piix3_reset (opaque=0x7f8641fe32a0) at hw/ide/piix.c:114
> #6 0x00007f86412b9a37 in qemu_devices_reset () at vl.c:1829
> #7 0x00007f86412b9aef in qemu_system_reset (report=true) at vl.c:1842
> #8 0x00007f86412b9fe2 in main_loop_should_exit () at vl.c:1971
> #9 0x00007f86412ba100 in main_loop () at vl.c:2011
> #10 0x00007f86412c1a35 in main (argc=57, argv=0x7fff2e827d38, envp=0x7fff2e827f08) at vl.c:4526
Ok, this is the same as your previous backtrace. The bug is still the
same: dma_bdrv_cb must not be called dma_aio_cancel has finished the
recursive call to bdrv_aio_cancel.
> BTW, is it better to rename dbs->in_cancel to dbs->canceled ?
If we were to apply my patch, yes. But with the current logic
"in_cancel" says "are we inside the recursive call to bdrv_aio_cancel"
so I think the answer is no. My patch is just a band-aid, I don't think
it should be applied.
Paolo
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Qemu-devel] [PATCH] ide: fix double free
2014-07-03 10:41 ` Paolo Bonzini
@ 2014-07-07 8:12 ` ChenLiang
2014-07-07 12:38 ` Paolo Bonzini
0 siblings, 1 reply; 19+ messages in thread
From: ChenLiang @ 2014-07-07 8:12 UTC (permalink / raw)
To: Paolo Bonzini
Cc: kwolf@redhat.com, Gonglei (Arei), Huangweidong (C),
qemu-devel@nongnu.org, stefanha@redhat.com
On 2014/7/3 18:41, Paolo Bonzini wrote:
> Il 03/07/2014 04:23, ChenLiang ha scritto:
>> Program received signal SIGABRT, Aborted.
>> 0x00007fd548355b55 in raise () from /lib64/libc.so.6
>> (gdb) bt
>> #0 0x00007fd548355b55 in raise () from /lib64/libc.so.6
>> #1 0x00007fd548357131 in abort () from /lib64/libc.so.6
>> #2 0x00007fd548393e0f in __libc_message () from /lib64/libc.so.6
>> #3 0x00007fd548399618 in malloc_printerr () from /lib64/libc.so.6
>> #4 0x00007fd54b15e80e in free_and_trace (mem=0x7fd54beb2230) at vl.c:2815
>> #5 0x00007fd54b3453cd in qemu_aio_release (p=0x7fd54beb2230) at block.c:4813
>> #6 0x00007fd54b15717d in dma_complete (dbs=0x7fd54beb2230, ret=0) at dma-helpers.c:132
>> #7 0x00007fd54b157253 in dma_bdrv_cb (opaque=0x7fd54beb2230, ret=0) at dma-helpers.c:148
>> #8 0x00007fd54b344db8 in bdrv_co_em_bh (opaque=0x7fd54bea4b30) at block.c:4676
>> #9 0x00007fd54b335a72 in aio_bh_poll (ctx=0x7fd54bcec990) at async.c:81
>> #10 0x00007fd54b34b1b4 in aio_poll (ctx=0x7fd54bcec990, blocking=false) at aio-posix.c:188
>> #11 0x00007fd54b335ee0 in aio_ctx_dispatch (source=0x7fd54bcec990, callback=0x0, user_data=0x0) at async.c:211
>> #12 0x00007fd549e3669a in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
>> #13 0x00007fd54b348c45 in glib_pollfds_poll () at main-loop.c:190
>> #14 0x00007fd54b348d3d in os_host_main_loop_wait (timeout=0) at main-loop.c:235
>> #15 0x00007fd54b348e2f in main_loop_wait (nonblocking=0) at main-loop.c:484
>> #16 0x00007fd54b15b0f8 in main_loop () at vl.c:2007
>> #17 0x00007fd54b162a35 in main (argc=57, argv=0x7fff152720a8, envp=0x7fff15272278) at vl.c:4526
>>
>> (gdb) bt
>> #0 qemu_aio_release (p=0x7f86420ebec0) at block.c:4811
>> #1 0x00007f86412b617d in dma_complete (dbs=0x7f86420ebec0, ret=0) at dma-helpers.c:132
>> #2 0x00007f86412b65ab in dma_aio_cancel (acb=0x7f86420ebec0) at dma-helpers.c:192
>> #3 0x00007f86414a3996 in bdrv_aio_cancel (acb=0x7f86420ebec0) at block.c:4559
>> #4 0x00007f86413906af in ide_bus_reset (bus=0x7f8641fe3a20) at hw/ide/core.c:2056
>> #5 0x00007f86413967d6 in piix3_reset (opaque=0x7f8641fe32a0) at hw/ide/piix.c:114
>> #6 0x00007f86412b9a37 in qemu_devices_reset () at vl.c:1829
>> #7 0x00007f86412b9aef in qemu_system_reset (report=true) at vl.c:1842
>> #8 0x00007f86412b9fe2 in main_loop_should_exit () at vl.c:1971
>> #9 0x00007f86412ba100 in main_loop () at vl.c:2011
>> #10 0x00007f86412c1a35 in main (argc=57, argv=0x7fff2e827d38, envp=0x7fff2e827f08) at vl.c:4526
>
> Ok, this is the same as your previous backtrace. The bug is still the same: dma_bdrv_cb must not be called dma_aio_cancel has finished the recursive call to bdrv_aio_cancel.
>
>> BTW, is it better to rename dbs->in_cancel to dbs->canceled ?
>
> If we were to apply my patch, yes. But with the current logic "in_cancel" says "are we inside the recursive call to bdrv_aio_cancel" so I think the answer is no. My patch is just a band-aid, I don't think it should be applied.
>
> Paolo
>
> .
>
Hi,
virtio_blk_reset uses bdrv_drain_all too.
static void virtio_blk_reset(VirtIODevice *vdev)
{
VirtIOBlock *s = VIRTIO_BLK(vdev);
#ifdef CONFIG_VIRTIO_BLK_DATA_PLANE
if (s->dataplane) {
virtio_blk_data_plane_stop(s->dataplane);
}
#endif
/*
* This should cancel pending requests, but can't do nicely until there
* are per-device request lists.
*/
bdrv_drain_all();
bdrv_set_enable_write_cache(s->bs, s->original_wce);
}
Best regards
Chenliang
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [Qemu-devel] [PATCH] ide: fix double free
2014-07-07 8:12 ` ChenLiang
@ 2014-07-07 12:38 ` Paolo Bonzini
0 siblings, 0 replies; 19+ messages in thread
From: Paolo Bonzini @ 2014-07-07 12:38 UTC (permalink / raw)
To: ChenLiang
Cc: kwolf@redhat.com, Gonglei (Arei), Huangweidong (C),
qemu-devel@nongnu.org, stefanha@redhat.com
Il 07/07/2014 10:12, ChenLiang ha scritto:
> Hi,
> virtio_blk_reset uses bdrv_drain_all too.
That's a limitation of virtio_blk_reset, as the comment above says.
virtio-scsi uses bdrv_aio_cancel. :)
Paolo
^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2014-07-07 12:38 UTC | newest]
Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-07-02 8:50 [Qemu-devel] [PATCH] ide: fix double free arei.gonglei
2014-07-02 9:04 ` Paolo Bonzini
2014-07-02 9:24 ` ChenLiang
2014-07-02 9:25 ` Paolo Bonzini
2014-07-02 9:46 ` Gonglei (Arei)
2014-07-02 10:16 ` Paolo Bonzini
2014-07-02 11:12 ` ChenLiang
2014-07-02 11:24 ` Paolo Bonzini
2014-07-02 11:33 ` ChenLiang
2014-07-02 11:40 ` Paolo Bonzini
2014-07-02 11:57 ` ChenLiang
2014-07-02 12:19 ` Paolo Bonzini
2014-07-02 12:46 ` 陈梁
2014-07-02 12:54 ` 陈梁
2014-07-02 13:09 ` Paolo Bonzini
2014-07-03 2:23 ` ChenLiang
2014-07-03 10:41 ` Paolo Bonzini
2014-07-07 8:12 ` ChenLiang
2014-07-07 12:38 ` Paolo Bonzini
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).