qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH] ide: fix double free
@ 2014-07-02  8:50 arei.gonglei
  2014-07-02  9:04 ` Paolo Bonzini
  0 siblings, 1 reply; 19+ messages in thread
From: arei.gonglei @ 2014-07-02  8:50 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Chenliang, weidong.huang, Gonglei, stefanha, pbonzini

From: Chenliang <chenliang88@huawei.com>

Qemu may double free when excutes command "reboot -f" in vm.
One path is bdrv_aio_cancel(), the other is dma_bdrv_cb()
callback prcocess.

Signed-off-by: Chenliang <chenliang88@huawei.com>
Signed-off-by: Gonglei <arei.gonglei@huawei.com>
---
Qemu crash bt:

Program received signal SIGABRT, Aborted.
0x00007f3cb2c76b55 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x00007f3cb2c76b55 in raise () from /lib64/libc.so.6
#1  0x00007f3cb2c78131 in abort () from /lib64/libc.so.6
#2  0x00007f3cb2cb4e0f in __libc_message () from /lib64/libc.so.6
#3  0x00007f3cb2cba618 in malloc_printerr () from /lib64/libc.so.6
#4  0x00007f3cb2cbf65c in free () from /lib64/libc.so.6
#5  0x00007f3cb54146f2 in free_and_trace (mem=0x7f3cb63f3220) at vl.c:3078
#6  0x00007f3cb52126cd in qemu_aio_release (p=0x7f3cb63f3220) at block.c:4262
#7  0x00007f3cb525de5e in dma_complete (dbs=0x7f3cb63f3220, ret=0) at dma-helpers.c:135
#8  0x00007f3cb525df3d in dma_bdrv_cb (opaque=0x7f3cb63f3220, ret=0) at dma-helpers.c:152
#9  0x00007f3cb5212102 in bdrv_co_em_bh (opaque=0x7f3cb6398980) at block.c:4127
#10 0x00007f3cb51f6cef in aio_bh_poll (ctx=0x7f3cb622a8f0) at async.c:70
#11 0x00007f3cb51f695a in aio_poll (ctx=0x7f3cb622a8f0, blocking=false) at aio-posix.c:185
#12 0x00007f3cb51f7056 in aio_ctx_dispatch (source=0x7f3cb622a8f0, callback=0x0, user_data=0x0)
    at async.c:167
#13 0x00007f3cb48b969a in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
#14 0x00007f3cb538956d in glib_pollfds_poll () at main-loop.c:188
#15 0x00007f3cb538965e in os_host_main_loop_wait (timeout=0) at main-loop.c:233
#16 0x00007f3cb5389718 in main_loop_wait (nonblocking=0) at main-loop.c:478
#17 0x00007f3cb5411212 in main_loop () at vl.c:2319
#18 0x00007f3cb54186ba in main (argc=57, argv=0x7fff5510c968, envp=0x7fff5510cb38) at vl.c:4803
---
 hw/ide/core.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/hw/ide/core.c b/hw/ide/core.c
index 3a38f1e..76d65c1 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -2050,11 +2050,9 @@ void ide_bus_reset(IDEBus *bus)
 
     /* pending async DMA */
     if (bus->dma->aiocb) {
-#ifdef DEBUG_AIO
-        printf("aio_cancel\n");
-#endif
-        bdrv_aio_cancel(bus->dma->aiocb);
-        bus->dma->aiocb = NULL;
+        bdrv_drain_all();
+        bdrv_flush_all();
+        assert(bus->dma->aiocb == NULL);
     }
 
     /* reset dma provider too */
-- 
1.7.12.4

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH] ide: fix double free
  2014-07-02  8:50 [Qemu-devel] [PATCH] ide: fix double free arei.gonglei
@ 2014-07-02  9:04 ` Paolo Bonzini
  2014-07-02  9:24   ` ChenLiang
  0 siblings, 1 reply; 19+ messages in thread
From: Paolo Bonzini @ 2014-07-02  9:04 UTC (permalink / raw)
  To: arei.gonglei, qemu-devel; +Cc: kwolf, Chenliang, weidong.huang, stefanha

Il 02/07/2014 10:50, arei.gonglei@huawei.com ha scritto:
>      if (bus->dma->aiocb) {
> -#ifdef DEBUG_AIO
> -        printf("aio_cancel\n");
> -#endif
> -        bdrv_aio_cancel(bus->dma->aiocb);
> -        bus->dma->aiocb = NULL;
> +        bdrv_drain_all();
> +        bdrv_flush_all();
> +        assert(bus->dma->aiocb == NULL);
>      }

This is definitely a heavyweight solution, and in fact the bug should 
not be there in the first place.  See dma_complete:

static void dma_complete(DMAAIOCB *dbs, int ret)
{
    trace_dma_complete(dbs, ret, dbs->common.cb);

    dma_bdrv_unmap(dbs);
    if (dbs->common.cb) {
        dbs->common.cb(dbs->common.opaque, ret);
    }
    qemu_iovec_destroy(&dbs->iov);
    if (dbs->bh) {
        qemu_bh_delete(dbs->bh);
        dbs->bh = NULL;
    }
    if (!dbs->in_cancel) {
        /* Requests may complete while dma_aio_cancel is in progress.  In
         * this case, the AIOCB should not be released because it is still
         * referenced by dma_aio_cancel.  */
        qemu_aio_release(dbs);
    }
}

Perhaps something like this?

diff --git a/dma-helpers.c b/dma-helpers.c
index 53cbe92..21b70d12 100644
--- a/dma-helpers.c
+++ b/dma-helpers.c
@@ -181,15 +181,15 @@ static void dma_aio_cancel(BlockDriverAIOCB *acb)
 
     trace_dma_aio_cancel(dbs);
 
+    dbs->in_cancel = true;
     if (dbs->acb) {
         BlockDriverAIOCB *acb = dbs->acb;
         dbs->acb = NULL;
-        dbs->in_cancel = true;
         bdrv_aio_cancel(acb);
-        dbs->in_cancel = false;
     }
     dbs->common.cb = NULL;
     dma_complete(dbs, 0);
+    qemu_aio_release(dbs);
 }
 
 static const AIOCBInfo dma_aiocb_info = {

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH] ide: fix double free
  2014-07-02  9:04 ` Paolo Bonzini
@ 2014-07-02  9:24   ` ChenLiang
  2014-07-02  9:25     ` Paolo Bonzini
  0 siblings, 1 reply; 19+ messages in thread
From: ChenLiang @ 2014-07-02  9:24 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kwolf, arei.gonglei, weidong.huang, qemu-devel, stefanha

On 2014/7/2 17:04, Paolo Bonzini wrote:

> This is definitely a heavyweight solution, and in fact the bug should 
> not be there in the first place.  See dma_complete:
> 
> static void dma_complete(DMAAIOCB *dbs, int ret)
> {
>     trace_dma_complete(dbs, ret, dbs->common.cb);
> 
>     dma_bdrv_unmap(dbs);
>     if (dbs->common.cb) {
>         dbs->common.cb(dbs->common.opaque, ret);
>     }
>     qemu_iovec_destroy(&dbs->iov);
>     if (dbs->bh) {
>         qemu_bh_delete(dbs->bh);
>         dbs->bh = NULL;
>     }
>     if (!dbs->in_cancel) {
>         /* Requests may complete while dma_aio_cancel is in progress.  In
>          * this case, the AIOCB should not be released because it is still
>          * referenced by dma_aio_cancel.  */
>         qemu_aio_release(dbs);
>     }
> }
> 
> Perhaps something like this?
> 
> diff --git a/dma-helpers.c b/dma-helpers.c
> index 53cbe92..21b70d12 100644
> --- a/dma-helpers.c
> +++ b/dma-helpers.c
> @@ -181,15 +181,15 @@ static void dma_aio_cancel(BlockDriverAIOCB *acb)
>  
>      trace_dma_aio_cancel(dbs);
>  
> +    dbs->in_cancel = true;
>      if (dbs->acb) {
>          BlockDriverAIOCB *acb = dbs->acb;
>          dbs->acb = NULL;
> -        dbs->in_cancel = true;
>          bdrv_aio_cancel(acb);
> -        dbs->in_cancel = false;
>      }
>      dbs->common.cb = NULL;
>      dma_complete(dbs, 0);
> +    qemu_aio_release(dbs);
>  }
>  
>  static const AIOCBInfo dma_aiocb_info = {
> 
> 
> .
> 

acked, thanks

Best regards
Chenliang

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH] ide: fix double free
  2014-07-02  9:24   ` ChenLiang
@ 2014-07-02  9:25     ` Paolo Bonzini
  2014-07-02  9:46       ` Gonglei (Arei)
  0 siblings, 1 reply; 19+ messages in thread
From: Paolo Bonzini @ 2014-07-02  9:25 UTC (permalink / raw)
  To: ChenLiang; +Cc: kwolf, arei.gonglei, weidong.huang, qemu-devel, stefanha

Il 02/07/2014 11:24, ChenLiang ha scritto:
> On 2014/7/2 17:04, Paolo Bonzini wrote:
>
>> This is definitely a heavyweight solution, and in fact the bug should
>> not be there in the first place.  See dma_complete:
>>
>> static void dma_complete(DMAAIOCB *dbs, int ret)
>> {
>>     trace_dma_complete(dbs, ret, dbs->common.cb);
>>
>>     dma_bdrv_unmap(dbs);
>>     if (dbs->common.cb) {
>>         dbs->common.cb(dbs->common.opaque, ret);
>>     }
>>     qemu_iovec_destroy(&dbs->iov);
>>     if (dbs->bh) {
>>         qemu_bh_delete(dbs->bh);
>>         dbs->bh = NULL;
>>     }
>>     if (!dbs->in_cancel) {
>>         /* Requests may complete while dma_aio_cancel is in progress.  In
>>          * this case, the AIOCB should not be released because it is still
>>          * referenced by dma_aio_cancel.  */
>>         qemu_aio_release(dbs);
>>     }
>> }
>>
>> Perhaps something like this?
>>
>> diff --git a/dma-helpers.c b/dma-helpers.c
>> index 53cbe92..21b70d12 100644
>> --- a/dma-helpers.c
>> +++ b/dma-helpers.c
>> @@ -181,15 +181,15 @@ static void dma_aio_cancel(BlockDriverAIOCB *acb)
>>
>>      trace_dma_aio_cancel(dbs);
>>
>> +    dbs->in_cancel = true;
>>      if (dbs->acb) {
>>          BlockDriverAIOCB *acb = dbs->acb;
>>          dbs->acb = NULL;
>> -        dbs->in_cancel = true;
>>          bdrv_aio_cancel(acb);
>> -        dbs->in_cancel = false;
>>      }
>>      dbs->common.cb = NULL;
>>      dma_complete(dbs, 0);
>> +    qemu_aio_release(dbs);
>>  }
>>
>>  static const AIOCBInfo dma_aiocb_info = {
>>
>>
>> .
>>
>
> acked, thanks

Did you test this? :)  What is the testcase?

Paolo

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH] ide: fix double free
  2014-07-02  9:25     ` Paolo Bonzini
@ 2014-07-02  9:46       ` Gonglei (Arei)
  2014-07-02 10:16         ` Paolo Bonzini
  0 siblings, 1 reply; 19+ messages in thread
From: Gonglei (Arei) @ 2014-07-02  9:46 UTC (permalink / raw)
  To: Paolo Bonzini, chenliang (T)
  Cc: kwolf@redhat.com, Huangweidong (C), qemu-devel@nongnu.org,
	stefanha@redhat.com

> -----Original Message-----
> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> Sent: Wednesday, July 02, 2014 5:26 PM
> To: chenliang (T)
> Cc: Gonglei (Arei); qemu-devel@nongnu.org; kwolf@redhat.com;
> stefanha@redhat.com; Huangweidong (C)
> Subject: Re: [PATCH] ide: fix double free
> 
> Il 02/07/2014 11:24, ChenLiang ha scritto:
> > On 2014/7/2 17:04, Paolo Bonzini wrote:
> >
> >> This is definitely a heavyweight solution, and in fact the bug should
> >> not be there in the first place.  See dma_complete:
> >>
> >> static void dma_complete(DMAAIOCB *dbs, int ret)
> >> {
> >>     trace_dma_complete(dbs, ret, dbs->common.cb);
> >>
> >>     dma_bdrv_unmap(dbs);
> >>     if (dbs->common.cb) {
> >>         dbs->common.cb(dbs->common.opaque, ret);
> >>     }
> >>     qemu_iovec_destroy(&dbs->iov);
> >>     if (dbs->bh) {
> >>         qemu_bh_delete(dbs->bh);
> >>         dbs->bh = NULL;
> >>     }
> >>     if (!dbs->in_cancel) {
> >>         /* Requests may complete while dma_aio_cancel is in progress.
> In
> >>          * this case, the AIOCB should not be released because it is still
> >>          * referenced by dma_aio_cancel.  */
> >>         qemu_aio_release(dbs);
> >>     }
> >> }
> >>
> >> Perhaps something like this?
> >>
> >> diff --git a/dma-helpers.c b/dma-helpers.c
> >> index 53cbe92..21b70d12 100644
> >> --- a/dma-helpers.c
> >> +++ b/dma-helpers.c
> >> @@ -181,15 +181,15 @@ static void dma_aio_cancel(BlockDriverAIOCB
> *acb)
> >>
> >>      trace_dma_aio_cancel(dbs);
> >>
> >> +    dbs->in_cancel = true;
> >>      if (dbs->acb) {
> >>          BlockDriverAIOCB *acb = dbs->acb;
> >>          dbs->acb = NULL;
> >> -        dbs->in_cancel = true;
> >>          bdrv_aio_cancel(acb);
> >> -        dbs->in_cancel = false;
> >>      }
> >>      dbs->common.cb = NULL;
> >>      dma_complete(dbs, 0);
> >> +    qemu_aio_release(dbs);
> >>  }
> >>
> >>  static const AIOCBInfo dma_aiocb_info = {
> >>
> >>
> >> .
> >>
> >
> > acked, thanks
> 
> Did you test this? :)  What is the testcase?
> 
> Paolo

Tested-by: Gonglei <arei.gonglei@huawei.com>

Hi, Paolo. We have tested your above patch, and it works well for us.

The qemu command line list as bellow, which started by libvirt :

/home/qemu/x86_64-softmmu/qemu-system-x86_64 -name test -S -machine pc-i440fx-1.5,accel=kvm,usb=off -m 4096 -realtime mlock=off -smp 16,sockets=16,cores=1,threads=1 -uuid abbbd27f-b83c-4c3b-8add-98df31eea0b7 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/test.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -device lsi,id=scsi0,bus=pci.0,addr=0x5 -drive file=/tmp/qemu-crash/linux_x86_64.iso,if=none,id=drive-ide0-0-0,readonly=on,format=raw,cache=none,aio=native -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0,bootindex=1 -drive file=/home/test.img,if=none,id=drive-ide0-0-1,format=raw,cache=none,aio=native -device ide-hd,bus=ide.0,unit=1,drive=drive-ide0-0-1,id=ide0-0-1 -netdev tap,fd=25,id=hostnet0 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:d9:3a:73,bus=pci.0,addr=0x3 -device usb-tablet,id=input0 -vnc 0.0.0.0:0 -device cirrus-vga,id=video0,vgamem_mb=9,bus=pci.0,addr=0x2 

We setup a linux guest os on an IDE disk, named "test.img", when the guest os
Finishe the process of install, then I execute "reboot -f" in the VM immediately, 
And then will crash the qemu.

Best regards,
-Gonglei

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH] ide: fix double free
  2014-07-02  9:46       ` Gonglei (Arei)
@ 2014-07-02 10:16         ` Paolo Bonzini
  2014-07-02 11:12           ` ChenLiang
  2014-07-02 11:33           ` ChenLiang
  0 siblings, 2 replies; 19+ messages in thread
From: Paolo Bonzini @ 2014-07-02 10:16 UTC (permalink / raw)
  To: Gonglei (Arei), chenliang (T)
  Cc: kwolf@redhat.com, Huangweidong (C), qemu-devel@nongnu.org,
	stefanha@redhat.com

Il 02/07/2014 11:46, Gonglei (Arei) ha scritto:
> Hi, Paolo. We have tested your above patch, and it works well for us.

I'm still not sure where the fix is.  I jotted the patch quickly, but 
I'd rather understand it better before submitting it.  Here is it again:

--- a/dma-helpers.c
+++ b/dma-helpers.c
@@ -181,15 +181,15 @@ static void dma_aio_cancel(BlockDriverAIOCB *acb)
      trace_dma_aio_cancel(dbs);

+    dbs->in_cancel = true;
      if (dbs->acb) {
          BlockDriverAIOCB *acb = dbs->acb;
          dbs->acb = NULL;
-        dbs->in_cancel = true;
          bdrv_aio_cancel(acb);
-        dbs->in_cancel = false;
      }
      dbs->common.cb = NULL;
      dma_complete(dbs, 0);
+    qemu_aio_release(dbs);
  }

  static const AIOCBInfo dma_aiocb_info = {

and here is dma_complete:

static void dma_complete(DMAAIOCB *dbs, int ret)
{
     trace_dma_complete(dbs, ret, dbs->common.cb);
     dma_bdrv_unmap(dbs);
     if (dbs->common.cb) {
         dbs->common.cb(dbs->common.opaque, ret);
     }
     qemu_iovec_destroy(&dbs->iov);
     if (dbs->bh) {
         qemu_bh_delete(dbs->bh);
         dbs->bh = NULL;
     }
     if (!dbs->in_cancel) {
         /* Requests may complete while dma_aio_cancel is in progress.
          * this case, the AIOCB should not be released because it is
          * still referenced by dma_aio_cancel.  */
         qemu_aio_release(dbs);
     }
}

The qemu_aio_release call obviously happened during the second 
dma_complete call.

However, the second dma_complete call does not call the callback, and 
even if dma_bdrv_unmap resulted in a call to continue_after_map_failure, 
the bottom half would be cancelled immediately.  So we changed from:

     dbs->in_cancel = false;

     /* dma_complete calls dma_bdrv_unmap */
     for (i = 0; i < dbs->iov.niov; ++i) {
         /* might schedule dbs->bh */
         dma_memory_unmap(dbs->sg->as, dbs->iov.iov[i].iov_base,
                          dbs->iov.iov[i].iov_len, dbs->dir,
                          dbs->iov.iov[i].iov_len);
     }
     qemu_iovec_reset(&dbs->iov);

     /* back to dma_complete, dbs->common.cb == NULL */
     qemu_iovec_destroy(&dbs->iov);
     if (dbs->bh) {
         /* dbs->bh now unscheduled, so it will never run */
         qemu_bh_delete(dbs->bh);
         dbs->bh = NULL;
     }
     /* dbs->in_cancel is false here.  */
     if (!dbs->in_cancel) {
         /* Requests may complete while dma_aio_cancel is in progress.
          * this case, the AIOCB should not be released because it is
          * still referenced by dma_aio_cancel.  */
         qemu_aio_release(dbs);
     }

to

     dbs->in_cancel = false;

     /* dma_complete calls dma_bdrv_unmap */
     for (i = 0; i < dbs->iov.niov; ++i) {
         /* might schedule dbs->bh */
         dma_memory_unmap(dbs->sg->as, dbs->iov.iov[i].iov_base,
                          dbs->iov.iov[i].iov_len, dbs->dir,
                          dbs->iov.iov[i].iov_len);
     }
     qemu_iovec_reset(&dbs->iov);

     /* back to dma_complete, dbs->common.cb == NULL */
     qemu_iovec_destroy(&dbs->iov);
     if (dbs->bh) {
         /* dbs->bh now unscheduled, so it will never run */
         qemu_bh_delete(dbs->bh);
         dbs->bh = NULL;
     }

     /* after the patch dbs->in_cancel is true, the if never triggers */
     if (!dbs->in_cancel) {
         /* Requests may complete while dma_aio_cancel is in progress.
          * this case, the AIOCB should not be released because it is
          * still referenced by dma_aio_cancel.  */
         qemu_aio_release(dbs);
     }

     /* back to dma_aio_cancel */
     qemu_aio_release(dbs);

This doesn't make sense to me.  Can you find out where I'm wrong?

Paolo

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH] ide: fix double free
  2014-07-02 10:16         ` Paolo Bonzini
@ 2014-07-02 11:12           ` ChenLiang
  2014-07-02 11:24             ` Paolo Bonzini
  2014-07-02 11:33           ` ChenLiang
  1 sibling, 1 reply; 19+ messages in thread
From: ChenLiang @ 2014-07-02 11:12 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: kwolf@redhat.com, Gonglei (Arei), Huangweidong (C),
	qemu-devel@nongnu.org, stefanha@redhat.com

On 2014/7/2 18:16, Paolo Bonzini wrote:

> Il 02/07/2014 11:46, Gonglei (Arei) ha scritto:
>> Hi, Paolo. We have tested your above patch, and it works well for us.
> 
> I'm still not sure where the fix is.  I jotted the patch quickly, but I'd rather understand it better before submitting it.  Here is it again:
> 
> --- a/dma-helpers.c
> +++ b/dma-helpers.c
> @@ -181,15 +181,15 @@ static void dma_aio_cancel(BlockDriverAIOCB *acb)
>      trace_dma_aio_cancel(dbs);
> 
> +    dbs->in_cancel = true;
>      if (dbs->acb) {
>          BlockDriverAIOCB *acb = dbs->acb;
>          dbs->acb = NULL;
> -        dbs->in_cancel = true;
>          bdrv_aio_cancel(acb);
> -        dbs->in_cancel = false;
>      }
>      dbs->common.cb = NULL;
>      dma_complete(dbs, 0);
> +    qemu_aio_release(dbs);
>  }
> 
>  static const AIOCBInfo dma_aiocb_info = {
> 
> and here is dma_complete:
> 
> static void dma_complete(DMAAIOCB *dbs, int ret)
> {
>     trace_dma_complete(dbs, ret, dbs->common.cb);
>     dma_bdrv_unmap(dbs);
>     if (dbs->common.cb) {
>         dbs->common.cb(dbs->common.opaque, ret);
>     }
>     qemu_iovec_destroy(&dbs->iov);
>     if (dbs->bh) {
>         qemu_bh_delete(dbs->bh);
>         dbs->bh = NULL;
>     }
>     if (!dbs->in_cancel) {
>         /* Requests may complete while dma_aio_cancel is in progress.
>          * this case, the AIOCB should not be released because it is
>          * still referenced by dma_aio_cancel.  */
>         qemu_aio_release(dbs);
>     }
> }
> 
> The qemu_aio_release call obviously happened during the second dma_complete call.
> 
> However, the second dma_complete call does not call the callback, and even if dma_bdrv_unmap resulted in a call to continue_after_map_failure, the bottom half would be cancelled immediately.  So we changed from:
> 
>     dbs->in_cancel = false;
> 
>     /* dma_complete calls dma_bdrv_unmap */
>     for (i = 0; i < dbs->iov.niov; ++i) {
>         /* might schedule dbs->bh */
>         dma_memory_unmap(dbs->sg->as, dbs->iov.iov[i].iov_base,
>                          dbs->iov.iov[i].iov_len, dbs->dir,
>                          dbs->iov.iov[i].iov_len);
>     }
>     qemu_iovec_reset(&dbs->iov);
> 
>     /* back to dma_complete, dbs->common.cb == NULL */
>     qemu_iovec_destroy(&dbs->iov);
>     if (dbs->bh) {
>         /* dbs->bh now unscheduled, so it will never run */
>         qemu_bh_delete(dbs->bh);
>         dbs->bh = NULL;
>     }
>     /* dbs->in_cancel is false here.  */
>     if (!dbs->in_cancel) {
>         /* Requests may complete while dma_aio_cancel is in progress.
>          * this case, the AIOCB should not be released because it is
>          * still referenced by dma_aio_cancel.  */
>         qemu_aio_release(dbs);
>     }
> 
> to
> 
>     dbs->in_cancel = false;
> 
>     /* dma_complete calls dma_bdrv_unmap */
>     for (i = 0; i < dbs->iov.niov; ++i) {
>         /* might schedule dbs->bh */
>         dma_memory_unmap(dbs->sg->as, dbs->iov.iov[i].iov_base,
>                          dbs->iov.iov[i].iov_len, dbs->dir,
>                          dbs->iov.iov[i].iov_len);
>     }
>     qemu_iovec_reset(&dbs->iov);
> 
>     /* back to dma_complete, dbs->common.cb == NULL */
>     qemu_iovec_destroy(&dbs->iov);
>     if (dbs->bh) {
>         /* dbs->bh now unscheduled, so it will never run */
>         qemu_bh_delete(dbs->bh);
>         dbs->bh = NULL;
>     }
> 
>     /* after the patch dbs->in_cancel is true, the if never triggers */
>     if (!dbs->in_cancel) {
>         /* Requests may complete while dma_aio_cancel is in progress.
>          * this case, the AIOCB should not be released because it is
>          * still referenced by dma_aio_cancel.  */
>         qemu_aio_release(dbs);
>     }
> 
>     /* back to dma_aio_cancel */
>     qemu_aio_release(dbs);
> 
> This doesn't make sense to me.  Can you find out where I'm wrong?
> 
> Paolo
> 
> .
> 

This patch avoid freeing dbs by dma_complete when dma_aio_cancel is running.
Because dma_complete also will be called by dma_bdrv_cb. So double free
will never happen.

Chenliang

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH] ide: fix double free
  2014-07-02 11:12           ` ChenLiang
@ 2014-07-02 11:24             ` Paolo Bonzini
  0 siblings, 0 replies; 19+ messages in thread
From: Paolo Bonzini @ 2014-07-02 11:24 UTC (permalink / raw)
  To: ChenLiang
  Cc: kwolf@redhat.com, Gonglei (Arei), Huangweidong (C),
	qemu-devel@nongnu.org, stefanha@redhat.com

Il 02/07/2014 13:12, ChenLiang ha scritto:
> This patch avoid freeing dbs by dma_complete when dma_aio_cancel is running.
> Because dma_complete also will be called by dma_bdrv_cb. So double free
> will never happen.

Yes, you already said that.  But I'm not sure _why_ the patch avoids 
freeing dbs, especially since the patch adds a free in dma_aio_cancel 
itself.  My patch shouldn't have any effect.  Where is my analysis of it 
wrong?

Paolo

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH] ide: fix double free
  2014-07-02 10:16         ` Paolo Bonzini
  2014-07-02 11:12           ` ChenLiang
@ 2014-07-02 11:33           ` ChenLiang
  2014-07-02 11:40             ` Paolo Bonzini
  1 sibling, 1 reply; 19+ messages in thread
From: ChenLiang @ 2014-07-02 11:33 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: kwolf@redhat.com, Gonglei (Arei), Huangweidong (C),
	qemu-devel@nongnu.org, stefanha@redhat.com

On 2014/7/2 18:16, Paolo Bonzini wrote:

> Il 02/07/2014 11:46, Gonglei (Arei) ha scritto:
>> Hi, Paolo. We have tested your above patch, and it works well for us.
> 
> I'm still not sure where the fix is.  I jotted the patch quickly, but I'd rather understand it better before submitting it.  Here is it again:
> 
> --- a/dma-helpers.c
> +++ b/dma-helpers.c
> @@ -181,15 +181,15 @@ static void dma_aio_cancel(BlockDriverAIOCB *acb)
>      trace_dma_aio_cancel(dbs);
> 
> +    dbs->in_cancel = true;
>      if (dbs->acb) {
>          BlockDriverAIOCB *acb = dbs->acb;
>          dbs->acb = NULL;
> -        dbs->in_cancel = true;
>          bdrv_aio_cancel(acb);
> -        dbs->in_cancel = false;
>      }
>      dbs->common.cb = NULL;
>      dma_complete(dbs, 0);
> +    qemu_aio_release(dbs);
>  }
> 


Hmm, dbs->in_cancel will be true always. Although this will avoid freeing dbs by dma_comlete.
But it maybe a mistake.

>  static const AIOCBInfo dma_aiocb_info = {
> 
> and here is dma_complete:
> 
> static void dma_complete(DMAAIOCB *dbs, int ret)
> {
>     trace_dma_complete(dbs, ret, dbs->common.cb);
>     dma_bdrv_unmap(dbs);
>     if (dbs->common.cb) {
>         dbs->common.cb(dbs->common.opaque, ret);
>     }
>     qemu_iovec_destroy(&dbs->iov);
>     if (dbs->bh) {
>         qemu_bh_delete(dbs->bh);
>         dbs->bh = NULL;
>     }
>     if (!dbs->in_cancel) {
>         /* Requests may complete while dma_aio_cancel is in progress.
>          * this case, the AIOCB should not be released because it is
>          * still referenced by dma_aio_cancel.  */
>         qemu_aio_release(dbs);
>     }
> }
> 
> The qemu_aio_release call obviously happened during the second dma_complete call.
> 
> However, the second dma_complete call does not call the callback, and even if dma_bdrv_unmap resulted in a call to continue_after_map_failure, the bottom half would be cancelled immediately.  So we changed from:
> 
>     dbs->in_cancel = false;
> 
>     /* dma_complete calls dma_bdrv_unmap */
>     for (i = 0; i < dbs->iov.niov; ++i) {
>         /* might schedule dbs->bh */
>         dma_memory_unmap(dbs->sg->as, dbs->iov.iov[i].iov_base,
>                          dbs->iov.iov[i].iov_len, dbs->dir,
>                          dbs->iov.iov[i].iov_len);
>     }
>     qemu_iovec_reset(&dbs->iov);
> 
>     /* back to dma_complete, dbs->common.cb == NULL */
>     qemu_iovec_destroy(&dbs->iov);
>     if (dbs->bh) {
>         /* dbs->bh now unscheduled, so it will never run */
>         qemu_bh_delete(dbs->bh);
>         dbs->bh = NULL;
>     }
>     /* dbs->in_cancel is false here.  */
>     if (!dbs->in_cancel) {
>         /* Requests may complete while dma_aio_cancel is in progress.
>          * this case, the AIOCB should not be released because it is
>          * still referenced by dma_aio_cancel.  */
>         qemu_aio_release(dbs);
>     }
> 
> to
> 
>     dbs->in_cancel = false;
> 
>     /* dma_complete calls dma_bdrv_unmap */
>     for (i = 0; i < dbs->iov.niov; ++i) {
>         /* might schedule dbs->bh */
>         dma_memory_unmap(dbs->sg->as, dbs->iov.iov[i].iov_base,
>                          dbs->iov.iov[i].iov_len, dbs->dir,
>                          dbs->iov.iov[i].iov_len);
>     }
>     qemu_iovec_reset(&dbs->iov);
> 
>     /* back to dma_complete, dbs->common.cb == NULL */
>     qemu_iovec_destroy(&dbs->iov);
>     if (dbs->bh) {
>         /* dbs->bh now unscheduled, so it will never run */
>         qemu_bh_delete(dbs->bh);
>         dbs->bh = NULL;
>     }
> 
>     /* after the patch dbs->in_cancel is true, the if never triggers */
>     if (!dbs->in_cancel) {
>         /* Requests may complete while dma_aio_cancel is in progress.
>          * this case, the AIOCB should not be released because it is
>          * still referenced by dma_aio_cancel.  */
>         qemu_aio_release(dbs);
>     }
> 
>     /* back to dma_aio_cancel */
>     qemu_aio_release(dbs);
> 
> This doesn't make sense to me.  Can you find out where I'm wrong?
> 
> Paolo
> 
> .
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH] ide: fix double free
  2014-07-02 11:33           ` ChenLiang
@ 2014-07-02 11:40             ` Paolo Bonzini
  2014-07-02 11:57               ` ChenLiang
  0 siblings, 1 reply; 19+ messages in thread
From: Paolo Bonzini @ 2014-07-02 11:40 UTC (permalink / raw)
  To: ChenLiang
  Cc: kwolf@redhat.com, Gonglei (Arei), Huangweidong (C),
	qemu-devel@nongnu.org, stefanha@redhat.com

Il 02/07/2014 13:33, ChenLiang ha scritto:
> On 2014/7/2 18:16, Paolo Bonzini wrote:
>
>> Il 02/07/2014 11:46, Gonglei (Arei) ha scritto:
>>> Hi, Paolo. We have tested your above patch, and it works well for us.
>>
>> I'm still not sure where the fix is.  I jotted the patch quickly, but I'd rather understand it better before submitting it.  Here is it again:
>>
>> --- a/dma-helpers.c
>> +++ b/dma-helpers.c
>> @@ -181,15 +181,15 @@ static void dma_aio_cancel(BlockDriverAIOCB *acb)
>>      trace_dma_aio_cancel(dbs);
>>
>> +    dbs->in_cancel = true;
>>      if (dbs->acb) {
>>          BlockDriverAIOCB *acb = dbs->acb;
>>          dbs->acb = NULL;
>> -        dbs->in_cancel = true;
>>          bdrv_aio_cancel(acb);
>> -        dbs->in_cancel = false;
>>      }
>>      dbs->common.cb = NULL;
>>      dma_complete(dbs, 0);
>> +    qemu_aio_release(dbs);
>>  }
>>
>
>
> Hmm, dbs->in_cancel will be true always. Although this will avoid freeing dbs by dma_comlete.
> But it maybe a mistake.

This was on purpose; I'm doing the free myself in dma_aio_cancel, so I 
wanted to avoid the qemu_aio_release from dma_complete.  This was in 
case of a recursive call to dma_complete.  But I don't see how that 
recursive call could happen outside the "if (dbs->acb)"; and inside the 
"if" the protection is there already.

Can you gather the backtraces for _both_ calls to qemu_aio_release, 
rather than just the second?

With what guest are you encountering the bug?

Paolo

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH] ide: fix double free
  2014-07-02 11:40             ` Paolo Bonzini
@ 2014-07-02 11:57               ` ChenLiang
  2014-07-02 12:19                 ` Paolo Bonzini
  0 siblings, 1 reply; 19+ messages in thread
From: ChenLiang @ 2014-07-02 11:57 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: kwolf@redhat.com, Gonglei (Arei), Huangweidong (C),
	qemu-devel@nongnu.org, stefanha@redhat.com

On 2014/7/2 19:40, Paolo Bonzini wrote:

> Il 02/07/2014 13:33, ChenLiang ha scritto:
>> On 2014/7/2 18:16, Paolo Bonzini wrote:
>>
>>> Il 02/07/2014 11:46, Gonglei (Arei) ha scritto:
>>>> Hi, Paolo. We have tested your above patch, and it works well for us.
>>>
>>> I'm still not sure where the fix is.  I jotted the patch quickly, but I'd rather understand it better before submitting it.  Here is it again:
>>>
>>> --- a/dma-helpers.c
>>> +++ b/dma-helpers.c
>>> @@ -181,15 +181,15 @@ static void dma_aio_cancel(BlockDriverAIOCB *acb)
>>>      trace_dma_aio_cancel(dbs);
>>>
>>> +    dbs->in_cancel = true;
>>>      if (dbs->acb) {
>>>          BlockDriverAIOCB *acb = dbs->acb;
>>>          dbs->acb = NULL;
>>> -        dbs->in_cancel = true;
>>>          bdrv_aio_cancel(acb);
>>> -        dbs->in_cancel = false;
>>>      }
>>>      dbs->common.cb = NULL;
>>>      dma_complete(dbs, 0);
>>> +    qemu_aio_release(dbs);
>>>  }
>>>
>>
>>
>> Hmm, dbs->in_cancel will be true always. Although this will avoid freeing dbs by dma_comlete.
>> But it maybe a mistake.
> 
> This was on purpose; I'm doing the free myself in dma_aio_cancel, so I wanted to avoid the qemu_aio_release from dma_complete.  This was in case of a recursive call to dma_complete.  But I don't see how that recursive call could happen outside the "if (dbs->acb)"; and inside the "if" the protection is there already.
> 
> Can you gather the backtraces for _both_ calls to qemu_aio_release, rather than just the second?
> 
> With what guest are you encountering the bug?
> 
> Paolo
> 
> 
> .
> 

(gdb) bt
#0  qemu_aio_release (p=0x7f44788d1290) at block.c:4260
#1  0x00007f4477494e5e in dma_complete (dbs=0x7f44788d1290, ret=0) at dma-helpers.c:135
#2  0x00007f44774952c2 in dma_aio_cancel (acb=0x7f44788d1290) at dma-helpers.c:195
#3  0x00007f447744825b in bdrv_aio_cancel (acb=0x7f44788d1290) at block.c:3848
#4  0x00007f4477513911 in ide_bus_reset (bus=0x7f44785f1bd8) at hw/ide/core.c:1957
#5  0x00007f4477516b3c in piix3_reset (opaque=0x7f44785f1530) at hw/ide/piix.c:113
#6  0x00007f4477647b9f in qemu_devices_reset () at vl.c:2131
#7  0x00007f4477647c0f in qemu_system_reset (report=true) at vl.c:2140
#8  0x00007f4477648127 in main_loop_should_exit () at vl.c:2274
#9  0x00007f447764823a in main_loop () at vl.c:2323
#10 0x00007f447764f6da in main (argc=57, argv=0x7fff5d194378, envp=0x7fff5d194548) at vl.c:4803

Chenliang


is

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH] ide: fix double free
  2014-07-02 11:57               ` ChenLiang
@ 2014-07-02 12:19                 ` Paolo Bonzini
  2014-07-02 12:46                   ` 陈梁
                                     ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Paolo Bonzini @ 2014-07-02 12:19 UTC (permalink / raw)
  To: ChenLiang
  Cc: kwolf@redhat.com, Gonglei (Arei), Huangweidong (C),
	qemu-devel@nongnu.org, stefanha@redhat.com

Il 02/07/2014 13:57, ChenLiang ha scritto:
>>> Hmm, dbs->in_cancel will be true always. Although this will avoid freeing dbs by dma_comlete.
>>> But it maybe a mistake.
>>
>> This was on purpose; I'm doing the free myself in dma_aio_cancel, so I wanted to avoid the qemu_aio_release from dma_complete.  This was in case of a recursive call to dma_complete.  But I don't see how that recursive call could happen outside the "if (dbs->acb)"; and inside the "if" the protection is there already.
>>
>> Can you gather the backtraces for _both_ calls to qemu_aio_release, rather than just the second?
> 
> (gdb) bt
> #0  qemu_aio_release (p=0x7f44788d1290) at block.c:4260
> #1  0x00007f4477494e5e in dma_complete (dbs=0x7f44788d1290, ret=0) at dma-helpers.c:135
> #2  0x00007f44774952c2 in dma_aio_cancel (acb=0x7f44788d1290) at dma-helpers.c:195
> #3  0x00007f447744825b in bdrv_aio_cancel (acb=0x7f44788d1290) at block.c:3848
> #4  0x00007f4477513911 in ide_bus_reset (bus=0x7f44785f1bd8) at hw/ide/core.c:1957
> #5  0x00007f4477516b3c in piix3_reset (opaque=0x7f44785f1530) at hw/ide/piix.c:113
> #6  0x00007f4477647b9f in qemu_devices_reset () at vl.c:2131
> #7  0x00007f4477647c0f in qemu_system_reset (report=true) at vl.c:2140
> #8  0x00007f4477648127 in main_loop_should_exit () at vl.c:2274
> #9  0x00007f447764823a in main_loop () at vl.c:2323
> #10 0x00007f447764f6da in main (argc=57, argv=0x7fff5d194378, envp=0x7fff5d194548) at vl.c:4803

And the second is

#7  0x00007f3cb525de5e in dma_complete (dbs=0x7f3cb63f3220, ret=0) at dma-helpers.c:135
#8  0x00007f3cb525df3d in dma_bdrv_cb (opaque=0x7f3cb63f3220, ret=0) at dma-helpers.c:152
#9  0x00007f3cb5212102 in bdrv_co_em_bh (opaque=0x7f3cb6398980) at block.c:4127
#10 0x00007f3cb51f6cef in aio_bh_poll (ctx=0x7f3cb622a8f0) at async.c:70
#11 0x00007f3cb51f695a in aio_poll (ctx=0x7f3cb622a8f0, blocking=false) at aio-posix.c:185
#12 0x00007f3cb51f7056 in aio_ctx_dispatch (source=0x7f3cb622a8f0, callback=0x0, user_data=0x0)
    at async.c:167
#13 0x00007f3cb48b969a in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0

This explains why my patch "fixes" the bug.  It turns a double free
into a dangling pointer: the second call now sees in_cancel == true
and skips the free.

The second call should have happened within dma_aio_cancel's call to
bdrv_aio_cancel.  This is the real bug.

What is your version of QEMU?  I cannot see any where bdrv_co_em_bh is
at line 4127 or bdrv_aio_cancel is at line 3848.  Can you reproduce it
with qemu.git master?

Paolo

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH] ide: fix double free
  2014-07-02 12:19                 ` Paolo Bonzini
@ 2014-07-02 12:46                   ` 陈梁
  2014-07-02 12:54                   ` 陈梁
  2014-07-03  2:23                   ` ChenLiang
  2 siblings, 0 replies; 19+ messages in thread
From: 陈梁 @ 2014-07-02 12:46 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: kwolf@redhat.com, ChenLiang, Huangweidong (C), Gonglei (Arei),
	qemu-devel@nongnu.org, 陈梁, stefanha@redhat.com


> Il 02/07/2014 13:57, ChenLiang ha scritto:
>>>> Hmm, dbs->in_cancel will be true always. Although this will avoid freeing dbs by dma_comlete.
>>>> But it maybe a mistake.
>>> 
>>> This was on purpose; I'm doing the free myself in dma_aio_cancel, so I wanted to avoid the qemu_aio_release from dma_complete.  This was in case of a recursive call to dma_complete.  But I don't see how that recursive call could happen outside the "if (dbs->acb)"; and inside the "if" the protection is there already.
>>> 
>>> Can you gather the backtraces for _both_ calls to qemu_aio_release, rather than just the second?
>> 
>> (gdb) bt
>> #0  qemu_aio_release (p=0x7f44788d1290) at block.c:4260
>> #1  0x00007f4477494e5e in dma_complete (dbs=0x7f44788d1290, ret=0) at dma-helpers.c:135
>> #2  0x00007f44774952c2 in dma_aio_cancel (acb=0x7f44788d1290) at dma-helpers.c:195
>> #3  0x00007f447744825b in bdrv_aio_cancel (acb=0x7f44788d1290) at block.c:3848
>> #4  0x00007f4477513911 in ide_bus_reset (bus=0x7f44785f1bd8) at hw/ide/core.c:1957
>> #5  0x00007f4477516b3c in piix3_reset (opaque=0x7f44785f1530) at hw/ide/piix.c:113
>> #6  0x00007f4477647b9f in qemu_devices_reset () at vl.c:2131
>> #7  0x00007f4477647c0f in qemu_system_reset (report=true) at vl.c:2140
>> #8  0x00007f4477648127 in main_loop_should_exit () at vl.c:2274
>> #9  0x00007f447764823a in main_loop () at vl.c:2323
>> #10 0x00007f447764f6da in main (argc=57, argv=0x7fff5d194378, envp=0x7fff5d194548) at vl.c:4803
> 
> And the second is
> 
> #7  0x00007f3cb525de5e in dma_complete (dbs=0x7f3cb63f3220, ret=0) at dma-helpers.c:135
> #8  0x00007f3cb525df3d in dma_bdrv_cb (opaque=0x7f3cb63f3220, ret=0) at dma-helpers.c:152
> #9  0x00007f3cb5212102 in bdrv_co_em_bh (opaque=0x7f3cb6398980) at block.c:4127
> #10 0x00007f3cb51f6cef in aio_bh_poll (ctx=0x7f3cb622a8f0) at async.c:70
> #11 0x00007f3cb51f695a in aio_poll (ctx=0x7f3cb622a8f0, blocking=false) at aio-posix.c:185
> #12 0x00007f3cb51f7056 in aio_ctx_dispatch (source=0x7f3cb622a8f0, callback=0x0, user_data=0x0)
>    at async.c:167
> #13 0x00007f3cb48b969a in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
> 
> This explains why my patch "fixes" the bug.  It turns a double free
> into a dangling pointer: the second call now sees in_cancel == true
> and skips the free.
> 
> The second call should have happened within dma_aio_cancel's call to
> bdrv_aio_cancel.  This is the real bug.
> 
> What is your version of QEMU?  I cannot see any where bdrv_co_em_bh is
> at line 4127 or bdrv_aio_cancel is at line 3848.  Can you reproduce it
> with qemu.git master?
> 
> Paolo
> 
The version of my qemu is 1.5.0. Tomorrow I will reproduce it, I am in home now : ) .

Chenliang

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH] ide: fix double free
  2014-07-02 12:19                 ` Paolo Bonzini
  2014-07-02 12:46                   ` 陈梁
@ 2014-07-02 12:54                   ` 陈梁
  2014-07-02 13:09                     ` Paolo Bonzini
  2014-07-03  2:23                   ` ChenLiang
  2 siblings, 1 reply; 19+ messages in thread
From: 陈梁 @ 2014-07-02 12:54 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: kwolf@redhat.com, ChenLiang, Huangweidong (C), Gonglei (Arei),
	qemu-devel@nongnu.org, 陈梁, stefanha@redhat.com


> Il 02/07/2014 13:57, ChenLiang ha scritto:
>>>> Hmm, dbs->in_cancel will be true always. Although this will avoid freeing dbs by dma_comlete.
>>>> But it maybe a mistake.
>>> 
>>> This was on purpose; I'm doing the free myself in dma_aio_cancel, so I wanted to avoid the qemu_aio_release from dma_complete.  This was in case of a recursive call to dma_complete.  But I don't see how that recursive call could happen outside the "if (dbs->acb)"; and inside the "if" the protection is there already.
>>> 
>>> Can you gather the backtraces for _both_ calls to qemu_aio_release, rather than just the second?
>> 
>> (gdb) bt
>> #0  qemu_aio_release (p=0x7f44788d1290) at block.c:4260
>> #1  0x00007f4477494e5e in dma_complete (dbs=0x7f44788d1290, ret=0) at dma-helpers.c:135
>> #2  0x00007f44774952c2 in dma_aio_cancel (acb=0x7f44788d1290) at dma-helpers.c:195
>> #3  0x00007f447744825b in bdrv_aio_cancel (acb=0x7f44788d1290) at block.c:3848
>> #4  0x00007f4477513911 in ide_bus_reset (bus=0x7f44785f1bd8) at hw/ide/core.c:1957
>> #5  0x00007f4477516b3c in piix3_reset (opaque=0x7f44785f1530) at hw/ide/piix.c:113
>> #6  0x00007f4477647b9f in qemu_devices_reset () at vl.c:2131
>> #7  0x00007f4477647c0f in qemu_system_reset (report=true) at vl.c:2140
>> #8  0x00007f4477648127 in main_loop_should_exit () at vl.c:2274
>> #9  0x00007f447764823a in main_loop () at vl.c:2323
>> #10 0x00007f447764f6da in main (argc=57, argv=0x7fff5d194378, envp=0x7fff5d194548) at vl.c:4803
> 
> And the second is
> 
> #7  0x00007f3cb525de5e in dma_complete (dbs=0x7f3cb63f3220, ret=0) at dma-helpers.c:135
> #8  0x00007f3cb525df3d in dma_bdrv_cb (opaque=0x7f3cb63f3220, ret=0) at dma-helpers.c:152
> #9  0x00007f3cb5212102 in bdrv_co_em_bh (opaque=0x7f3cb6398980) at block.c:4127
> #10 0x00007f3cb51f6cef in aio_bh_poll (ctx=0x7f3cb622a8f0) at async.c:70
> #11 0x00007f3cb51f695a in aio_poll (ctx=0x7f3cb622a8f0, blocking=false) at aio-posix.c:185
> #12 0x00007f3cb51f7056 in aio_ctx_dispatch (source=0x7f3cb622a8f0, callback=0x0, user_data=0x0)
>    at async.c:167
> #13 0x00007f3cb48b969a in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
> 
> This explains why my patch "fixes" the bug.  It turns a double free
> into a dangling pointer: the second call now sees in_cancel == true
> and skips the free.
> 
> The second call should have happened within dma_aio_cancel's call to
> bdrv_aio_cancel.  This is the real bug.

IMO, the second need not happened within dma_aio_cancel's call to bdrv_aio_cancel.
The double free will be happened if dam_aio_cancel is called.

> 
> What is your version of QEMU?  I cannot see any where bdrv_co_em_bh is
> at line 4127 or bdrv_aio_cancel is at line 3848.  Can you reproduce it
> with qemu.git master?
> 
> Paolo
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH] ide: fix double free
  2014-07-02 12:54                   ` 陈梁
@ 2014-07-02 13:09                     ` Paolo Bonzini
  0 siblings, 0 replies; 19+ messages in thread
From: Paolo Bonzini @ 2014-07-02 13:09 UTC (permalink / raw)
  To: 陈梁
  Cc: kwolf@redhat.com, ChenLiang, Huangweidong (C),
	qemu-devel@nongnu.org, Gonglei (Arei), stefanha@redhat.com

Il 02/07/2014 14:54, 陈梁 ha scritto:
>> > The second call should have happened within dma_aio_cancel's call to
>> > bdrv_aio_cancel.  This is the real bug.
> IMO, the second need not happened within dma_aio_cancel's call to bdrv_aio_cancel.
> The double free will be happened if dam_aio_cancel is called.

The callback must not be invoked after bdrv_aio_cancel.  This is the 
fundamental invariant of bdrv_aio_cancel.  All implementations of AIOCB 
must respect it, or bugs like this one happen.

Here, either bdrv_aio_cancel was not invoked, or the invariant was broken.

The other invariant, this time in dma-helpers.c, is that dma_bdrv_cb 
either exits with no pending AIOCB, or it exits with a non-NULL 
dbs->acb.  If bdrv_aio_cancel was not invoked, this invariant was broken 
because there is a pending AIOCB but it is not in dbs->acb.

Paolo

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH] ide: fix double free
  2014-07-02 12:19                 ` Paolo Bonzini
  2014-07-02 12:46                   ` 陈梁
  2014-07-02 12:54                   ` 陈梁
@ 2014-07-03  2:23                   ` ChenLiang
  2014-07-03 10:41                     ` Paolo Bonzini
  2 siblings, 1 reply; 19+ messages in thread
From: ChenLiang @ 2014-07-03  2:23 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: kwolf@redhat.com, Gonglei (Arei), Huangweidong (C),
	qemu-devel@nongnu.org, stefanha@redhat.com

On 2014/7/2 20:19, Paolo Bonzini wrote:

> Il 02/07/2014 13:57, ChenLiang ha scritto:
>>>> Hmm, dbs->in_cancel will be true always. Although this will avoid freeing dbs by dma_comlete.
>>>> But it maybe a mistake.
>>>
>>> This was on purpose; I'm doing the free myself in dma_aio_cancel, so I wanted to avoid the qemu_aio_release from dma_complete.  This was in case of a recursive call to dma_complete.  But I don't see how that recursive call could happen outside the "if (dbs->acb)"; and inside the "if" the protection is there already.
>>>
>>> Can you gather the backtraces for _both_ calls to qemu_aio_release, rather than just the second?
>>
>> (gdb) bt
>> #0  qemu_aio_release (p=0x7f44788d1290) at block.c:4260
>> #1  0x00007f4477494e5e in dma_complete (dbs=0x7f44788d1290, ret=0) at dma-helpers.c:135
>> #2  0x00007f44774952c2 in dma_aio_cancel (acb=0x7f44788d1290) at dma-helpers.c:195
>> #3  0x00007f447744825b in bdrv_aio_cancel (acb=0x7f44788d1290) at block.c:3848
>> #4  0x00007f4477513911 in ide_bus_reset (bus=0x7f44785f1bd8) at hw/ide/core.c:1957
>> #5  0x00007f4477516b3c in piix3_reset (opaque=0x7f44785f1530) at hw/ide/piix.c:113
>> #6  0x00007f4477647b9f in qemu_devices_reset () at vl.c:2131
>> #7  0x00007f4477647c0f in qemu_system_reset (report=true) at vl.c:2140
>> #8  0x00007f4477648127 in main_loop_should_exit () at vl.c:2274
>> #9  0x00007f447764823a in main_loop () at vl.c:2323
>> #10 0x00007f447764f6da in main (argc=57, argv=0x7fff5d194378, envp=0x7fff5d194548) at vl.c:4803
> 
> And the second is
> 
> #7  0x00007f3cb525de5e in dma_complete (dbs=0x7f3cb63f3220, ret=0) at dma-helpers.c:135
> #8  0x00007f3cb525df3d in dma_bdrv_cb (opaque=0x7f3cb63f3220, ret=0) at dma-helpers.c:152
> #9  0x00007f3cb5212102 in bdrv_co_em_bh (opaque=0x7f3cb6398980) at block.c:4127
> #10 0x00007f3cb51f6cef in aio_bh_poll (ctx=0x7f3cb622a8f0) at async.c:70
> #11 0x00007f3cb51f695a in aio_poll (ctx=0x7f3cb622a8f0, blocking=false) at aio-posix.c:185
> #12 0x00007f3cb51f7056 in aio_ctx_dispatch (source=0x7f3cb622a8f0, callback=0x0, user_data=0x0)
>     at async.c:167
> #13 0x00007f3cb48b969a in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
> 
> This explains why my patch "fixes" the bug.  It turns a double free
> into a dangling pointer: the second call now sees in_cancel == true
> and skips the free.
> 
> The second call should have happened within dma_aio_cancel's call to
> bdrv_aio_cancel.  This is the real bug.
> 
> What is your version of QEMU?  I cannot see any where bdrv_co_em_bh is
> at line 4127 or bdrv_aio_cancel is at line 3848.  Can you reproduce it
> with qemu.git master?
> 
> Paolo
> 
> .
> 


qemu master branch bt:

Program received signal SIGABRT, Aborted.
0x00007fd548355b55 in raise () from /lib64/libc.so.6
(gdb) bt
#0  0x00007fd548355b55 in raise () from /lib64/libc.so.6
#1  0x00007fd548357131 in abort () from /lib64/libc.so.6
#2  0x00007fd548393e0f in __libc_message () from /lib64/libc.so.6
#3  0x00007fd548399618 in malloc_printerr () from /lib64/libc.so.6
#4  0x00007fd54b15e80e in free_and_trace (mem=0x7fd54beb2230) at vl.c:2815
#5  0x00007fd54b3453cd in qemu_aio_release (p=0x7fd54beb2230) at block.c:4813
#6  0x00007fd54b15717d in dma_complete (dbs=0x7fd54beb2230, ret=0) at dma-helpers.c:132
#7  0x00007fd54b157253 in dma_bdrv_cb (opaque=0x7fd54beb2230, ret=0) at dma-helpers.c:148
#8  0x00007fd54b344db8 in bdrv_co_em_bh (opaque=0x7fd54bea4b30) at block.c:4676
#9  0x00007fd54b335a72 in aio_bh_poll (ctx=0x7fd54bcec990) at async.c:81
#10 0x00007fd54b34b1b4 in aio_poll (ctx=0x7fd54bcec990, blocking=false) at aio-posix.c:188
#11 0x00007fd54b335ee0 in aio_ctx_dispatch (source=0x7fd54bcec990, callback=0x0, user_data=0x0) at async.c:211
#12 0x00007fd549e3669a in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
#13 0x00007fd54b348c45 in glib_pollfds_poll () at main-loop.c:190
#14 0x00007fd54b348d3d in os_host_main_loop_wait (timeout=0) at main-loop.c:235
#15 0x00007fd54b348e2f in main_loop_wait (nonblocking=0) at main-loop.c:484
#16 0x00007fd54b15b0f8 in main_loop () at vl.c:2007
#17 0x00007fd54b162a35 in main (argc=57, argv=0x7fff152720a8, envp=0x7fff15272278) at vl.c:4526

(gdb) bt
#0  qemu_aio_release (p=0x7f86420ebec0) at block.c:4811
#1  0x00007f86412b617d in dma_complete (dbs=0x7f86420ebec0, ret=0) at dma-helpers.c:132
#2  0x00007f86412b65ab in dma_aio_cancel (acb=0x7f86420ebec0) at dma-helpers.c:192
#3  0x00007f86414a3996 in bdrv_aio_cancel (acb=0x7f86420ebec0) at block.c:4559
#4  0x00007f86413906af in ide_bus_reset (bus=0x7f8641fe3a20) at hw/ide/core.c:2056
#5  0x00007f86413967d6 in piix3_reset (opaque=0x7f8641fe32a0) at hw/ide/piix.c:114
#6  0x00007f86412b9a37 in qemu_devices_reset () at vl.c:1829
#7  0x00007f86412b9aef in qemu_system_reset (report=true) at vl.c:1842
#8  0x00007f86412b9fe2 in main_loop_should_exit () at vl.c:1971
#9  0x00007f86412ba100 in main_loop () at vl.c:2011
#10 0x00007f86412c1a35 in main (argc=57, argv=0x7fff2e827d38, envp=0x7fff2e827f08) at vl.c:4526


BTW, is it better to rename dbs->in_cancel to dbs->canceled ?

Best regards
Chenliang

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH] ide: fix double free
  2014-07-03  2:23                   ` ChenLiang
@ 2014-07-03 10:41                     ` Paolo Bonzini
  2014-07-07  8:12                       ` ChenLiang
  0 siblings, 1 reply; 19+ messages in thread
From: Paolo Bonzini @ 2014-07-03 10:41 UTC (permalink / raw)
  To: ChenLiang
  Cc: kwolf@redhat.com, Gonglei (Arei), Huangweidong (C),
	qemu-devel@nongnu.org, stefanha@redhat.com

Il 03/07/2014 04:23, ChenLiang ha scritto:
> Program received signal SIGABRT, Aborted.
> 0x00007fd548355b55 in raise () from /lib64/libc.so.6
> (gdb) bt
> #0  0x00007fd548355b55 in raise () from /lib64/libc.so.6
> #1  0x00007fd548357131 in abort () from /lib64/libc.so.6
> #2  0x00007fd548393e0f in __libc_message () from /lib64/libc.so.6
> #3  0x00007fd548399618 in malloc_printerr () from /lib64/libc.so.6
> #4  0x00007fd54b15e80e in free_and_trace (mem=0x7fd54beb2230) at vl.c:2815
> #5  0x00007fd54b3453cd in qemu_aio_release (p=0x7fd54beb2230) at block.c:4813
> #6  0x00007fd54b15717d in dma_complete (dbs=0x7fd54beb2230, ret=0) at dma-helpers.c:132
> #7  0x00007fd54b157253 in dma_bdrv_cb (opaque=0x7fd54beb2230, ret=0) at dma-helpers.c:148
> #8  0x00007fd54b344db8 in bdrv_co_em_bh (opaque=0x7fd54bea4b30) at block.c:4676
> #9  0x00007fd54b335a72 in aio_bh_poll (ctx=0x7fd54bcec990) at async.c:81
> #10 0x00007fd54b34b1b4 in aio_poll (ctx=0x7fd54bcec990, blocking=false) at aio-posix.c:188
> #11 0x00007fd54b335ee0 in aio_ctx_dispatch (source=0x7fd54bcec990, callback=0x0, user_data=0x0) at async.c:211
> #12 0x00007fd549e3669a in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
> #13 0x00007fd54b348c45 in glib_pollfds_poll () at main-loop.c:190
> #14 0x00007fd54b348d3d in os_host_main_loop_wait (timeout=0) at main-loop.c:235
> #15 0x00007fd54b348e2f in main_loop_wait (nonblocking=0) at main-loop.c:484
> #16 0x00007fd54b15b0f8 in main_loop () at vl.c:2007
> #17 0x00007fd54b162a35 in main (argc=57, argv=0x7fff152720a8, envp=0x7fff15272278) at vl.c:4526
>
> (gdb) bt
> #0  qemu_aio_release (p=0x7f86420ebec0) at block.c:4811
> #1  0x00007f86412b617d in dma_complete (dbs=0x7f86420ebec0, ret=0) at dma-helpers.c:132
> #2  0x00007f86412b65ab in dma_aio_cancel (acb=0x7f86420ebec0) at dma-helpers.c:192
> #3  0x00007f86414a3996 in bdrv_aio_cancel (acb=0x7f86420ebec0) at block.c:4559
> #4  0x00007f86413906af in ide_bus_reset (bus=0x7f8641fe3a20) at hw/ide/core.c:2056
> #5  0x00007f86413967d6 in piix3_reset (opaque=0x7f8641fe32a0) at hw/ide/piix.c:114
> #6  0x00007f86412b9a37 in qemu_devices_reset () at vl.c:1829
> #7  0x00007f86412b9aef in qemu_system_reset (report=true) at vl.c:1842
> #8  0x00007f86412b9fe2 in main_loop_should_exit () at vl.c:1971
> #9  0x00007f86412ba100 in main_loop () at vl.c:2011
> #10 0x00007f86412c1a35 in main (argc=57, argv=0x7fff2e827d38, envp=0x7fff2e827f08) at vl.c:4526

Ok, this is the same as your previous backtrace.  The bug is still the 
same: dma_bdrv_cb must not be called dma_aio_cancel has finished the 
recursive call to bdrv_aio_cancel.

> BTW, is it better to rename dbs->in_cancel to dbs->canceled ?

If we were to apply my patch, yes.  But with the current logic 
"in_cancel" says "are we inside the recursive call to bdrv_aio_cancel" 
so I think the answer is no.  My patch is just a band-aid, I don't think 
it should be applied.

Paolo

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH] ide: fix double free
  2014-07-03 10:41                     ` Paolo Bonzini
@ 2014-07-07  8:12                       ` ChenLiang
  2014-07-07 12:38                         ` Paolo Bonzini
  0 siblings, 1 reply; 19+ messages in thread
From: ChenLiang @ 2014-07-07  8:12 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: kwolf@redhat.com, Gonglei (Arei), Huangweidong (C),
	qemu-devel@nongnu.org, stefanha@redhat.com

On 2014/7/3 18:41, Paolo Bonzini wrote:

> Il 03/07/2014 04:23, ChenLiang ha scritto:
>> Program received signal SIGABRT, Aborted.
>> 0x00007fd548355b55 in raise () from /lib64/libc.so.6
>> (gdb) bt
>> #0  0x00007fd548355b55 in raise () from /lib64/libc.so.6
>> #1  0x00007fd548357131 in abort () from /lib64/libc.so.6
>> #2  0x00007fd548393e0f in __libc_message () from /lib64/libc.so.6
>> #3  0x00007fd548399618 in malloc_printerr () from /lib64/libc.so.6
>> #4  0x00007fd54b15e80e in free_and_trace (mem=0x7fd54beb2230) at vl.c:2815
>> #5  0x00007fd54b3453cd in qemu_aio_release (p=0x7fd54beb2230) at block.c:4813
>> #6  0x00007fd54b15717d in dma_complete (dbs=0x7fd54beb2230, ret=0) at dma-helpers.c:132
>> #7  0x00007fd54b157253 in dma_bdrv_cb (opaque=0x7fd54beb2230, ret=0) at dma-helpers.c:148
>> #8  0x00007fd54b344db8 in bdrv_co_em_bh (opaque=0x7fd54bea4b30) at block.c:4676
>> #9  0x00007fd54b335a72 in aio_bh_poll (ctx=0x7fd54bcec990) at async.c:81
>> #10 0x00007fd54b34b1b4 in aio_poll (ctx=0x7fd54bcec990, blocking=false) at aio-posix.c:188
>> #11 0x00007fd54b335ee0 in aio_ctx_dispatch (source=0x7fd54bcec990, callback=0x0, user_data=0x0) at async.c:211
>> #12 0x00007fd549e3669a in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
>> #13 0x00007fd54b348c45 in glib_pollfds_poll () at main-loop.c:190
>> #14 0x00007fd54b348d3d in os_host_main_loop_wait (timeout=0) at main-loop.c:235
>> #15 0x00007fd54b348e2f in main_loop_wait (nonblocking=0) at main-loop.c:484
>> #16 0x00007fd54b15b0f8 in main_loop () at vl.c:2007
>> #17 0x00007fd54b162a35 in main (argc=57, argv=0x7fff152720a8, envp=0x7fff15272278) at vl.c:4526
>>
>> (gdb) bt
>> #0  qemu_aio_release (p=0x7f86420ebec0) at block.c:4811
>> #1  0x00007f86412b617d in dma_complete (dbs=0x7f86420ebec0, ret=0) at dma-helpers.c:132
>> #2  0x00007f86412b65ab in dma_aio_cancel (acb=0x7f86420ebec0) at dma-helpers.c:192
>> #3  0x00007f86414a3996 in bdrv_aio_cancel (acb=0x7f86420ebec0) at block.c:4559
>> #4  0x00007f86413906af in ide_bus_reset (bus=0x7f8641fe3a20) at hw/ide/core.c:2056
>> #5  0x00007f86413967d6 in piix3_reset (opaque=0x7f8641fe32a0) at hw/ide/piix.c:114
>> #6  0x00007f86412b9a37 in qemu_devices_reset () at vl.c:1829
>> #7  0x00007f86412b9aef in qemu_system_reset (report=true) at vl.c:1842
>> #8  0x00007f86412b9fe2 in main_loop_should_exit () at vl.c:1971
>> #9  0x00007f86412ba100 in main_loop () at vl.c:2011
>> #10 0x00007f86412c1a35 in main (argc=57, argv=0x7fff2e827d38, envp=0x7fff2e827f08) at vl.c:4526
> 
> Ok, this is the same as your previous backtrace.  The bug is still the same: dma_bdrv_cb must not be called dma_aio_cancel has finished the recursive call to bdrv_aio_cancel.
> 
>> BTW, is it better to rename dbs->in_cancel to dbs->canceled ?
> 
> If we were to apply my patch, yes.  But with the current logic "in_cancel" says "are we inside the recursive call to bdrv_aio_cancel" so I think the answer is no.  My patch is just a band-aid, I don't think it should be applied.
> 
> Paolo
> 
> .
> 

Hi,
virtio_blk_reset uses bdrv_drain_all too.

static void virtio_blk_reset(VirtIODevice *vdev)
{
    VirtIOBlock *s = VIRTIO_BLK(vdev);

#ifdef CONFIG_VIRTIO_BLK_DATA_PLANE
    if (s->dataplane) {
        virtio_blk_data_plane_stop(s->dataplane);
    }
#endif

    /*
     * This should cancel pending requests, but can't do nicely until there
     * are per-device request lists.
     */
    bdrv_drain_all();
    bdrv_set_enable_write_cache(s->bs, s->original_wce);
}

Best regards
Chenliang

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [Qemu-devel] [PATCH] ide: fix double free
  2014-07-07  8:12                       ` ChenLiang
@ 2014-07-07 12:38                         ` Paolo Bonzini
  0 siblings, 0 replies; 19+ messages in thread
From: Paolo Bonzini @ 2014-07-07 12:38 UTC (permalink / raw)
  To: ChenLiang
  Cc: kwolf@redhat.com, Gonglei (Arei), Huangweidong (C),
	qemu-devel@nongnu.org, stefanha@redhat.com

Il 07/07/2014 10:12, ChenLiang ha scritto:
> Hi,
> virtio_blk_reset uses bdrv_drain_all too.

That's a limitation of virtio_blk_reset, as the comment above says.

virtio-scsi uses bdrv_aio_cancel. :)

Paolo

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2014-07-07 12:38 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-07-02  8:50 [Qemu-devel] [PATCH] ide: fix double free arei.gonglei
2014-07-02  9:04 ` Paolo Bonzini
2014-07-02  9:24   ` ChenLiang
2014-07-02  9:25     ` Paolo Bonzini
2014-07-02  9:46       ` Gonglei (Arei)
2014-07-02 10:16         ` Paolo Bonzini
2014-07-02 11:12           ` ChenLiang
2014-07-02 11:24             ` Paolo Bonzini
2014-07-02 11:33           ` ChenLiang
2014-07-02 11:40             ` Paolo Bonzini
2014-07-02 11:57               ` ChenLiang
2014-07-02 12:19                 ` Paolo Bonzini
2014-07-02 12:46                   ` 陈梁
2014-07-02 12:54                   ` 陈梁
2014-07-02 13:09                     ` Paolo Bonzini
2014-07-03  2:23                   ` ChenLiang
2014-07-03 10:41                     ` Paolo Bonzini
2014-07-07  8:12                       ` ChenLiang
2014-07-07 12:38                         ` Paolo Bonzini

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).