From mboxrd@z Thu Jan 1 00:00:00 1970
From: Zhang Haoyu
Subject: Re: [Qemu-devel] [PATCH] qcow2: fix double-free of Qcow2DiscardRegion in qcow2_process_discards
Date: Sun, 12 Oct 2014 16:22:38 +0800
Message-ID: <543A3A4E.8020607@gmail.com>
References: <201410111514227991260@sangfor.com> <20141012073432.GA3739@noname.redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: Stefan Hajnoczi, qemu-devel, kvm
To: Kevin Wolf, Zhang Haoyu
Return-path:
Received: from mail-pd0-f181.google.com ([209.85.192.181]:53610 "EHLO mail-pd0-f181.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750772AbaJLIWc (ORCPT ); Sun, 12 Oct 2014 04:22:32 -0400
Received: by mail-pd0-f181.google.com with SMTP id z10so4045720pdj.12 for ; Sun, 12 Oct 2014 01:22:31 -0700 (PDT)
In-Reply-To: <20141012073432.GA3739@noname.redhat.com>
Sender: kvm-owner@vger.kernel.org
List-ID:

On 2014-10-12 15:34, Kevin Wolf wrote:
> On 11.10.2014 at 09:14, Zhang Haoyu wrote:
>> In qcow2_update_snapshot_refcount -> qcow2_process_discards() -> bdrv_discard()
>> may free the Qcow2DiscardRegion which is referenced by "next" pointer in
>> qcow2_process_discards() now, in next iteration, d = next, so g_free(d)
>> will double-free this Qcow2DiscardRegion.
>>
>> qcow2_snapshot_delete
>> |- qcow2_update_snapshot_refcount
>> |-- qcow2_process_discards
>> |--- bdrv_discard
>> |---- aio_poll
>> |----- aio_dispatch
>> |------ bdrv_co_io_em_complete
>> |------- qemu_coroutine_enter(co->coroutine, NULL); <=== coroutine entry is bdrv_co_do_rw
>> |--- g_free(d) <== free first Qcow2DiscardRegion is okay
>> |--- d = next; <== this set is done in QTAILQ_FOREACH_SAFE() macro.
>> |--- g_free(d); <== double-free will happen if during previous iteration, bdrv_discard had free this object.
> Do you have a reproducer for this or did code review lead you to this?
This problem can be reproduced with a loop of savevm -> delvm -> savevm -> delvm ...;
it takes about 4 hours to trigger.
When I deleted the VM snapshot, qemu crashed and left a core file. I debugged the core file and found the double-free and the stack.
So I added a breakpoint at g_free(d); and found that a double-free did indeed happen: the same address was freed twice.
Only the first discard region was not double-freed.
>
> At the moment I can't see how bdrv_discard(bs->file) could ever free a
> Qcow2DiscardRegion of bs, as it's working on a completely different
> BlockDriverState (which usually won't even be a qcow2 one).
I think the "aio_context" in bdrv_discard -> aio_poll(aio_context, true) is the qemu_aio_context,
regardless of whether bs or bs->file is passed to bdrv_discard, so aio_poll(aio_context) will poll all pending aio, not only that of the bs passed in.
>
>> bdrv_co_do_rw
>> |- bdrv_co_do_writev
>> |-- bdrv_co_do_pwritev
>> |--- bdrv_aligned_pwritev
>> |---- qcow2_co_writev
>> |----- qcow2_alloc_cluster_link_l2
>> |------ qcow2_free_any_clusters
>> |------- qcow2_free_clusters
>> |-------- update_refcount
>> |--------- qcow2_process_discards
>> |---------- g_free(d) <== In next iteration, this Qcow2DiscardRegion will be double-free.
> This shouldn't happen in a nested call either, as s->lock can't be taken
> recursively.
Could you explain in detail how s->lock prevents that? The stack above is from gdb, captured when I added a breakpoint at g_free(d).

Thanks,
Zhang Haoyu
>
> Kevin
>
>
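
To make the suspected pattern concrete, here is a minimal, self-contained sketch in plain C. It is not QEMU code and not the proposed patch: Region, Queue, queue_push, queue_pop, drain and demo_reentrant_drain are all hypothetical names, and the nested call to drain() only stands in for bdrv_discard() re-entering qcow2_process_discards() through aio_poll(). The point it illustrates is that a loop which caches "next" before the callback (as QTAILQ_FOREACH_SAFE does) can be left holding a pointer that a nested drain has already freed, while a loop that detaches the head and re-reads it on every iteration frees each element exactly once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for Qcow2DiscardRegion; not the QEMU definition. */
typedef struct Region {
    int id;
    struct Region *next;
} Region;

/* Hypothetical stand-in for the per-image discard list. */
typedef struct Queue {
    Region *head;
    Region *tail;
    int freed;          /* number of regions freed so far */
} Queue;

static void queue_push(Queue *q, int id)
{
    Region *r = malloc(sizeof(*r));
    assert(r != NULL);
    r->id = id;
    r->next = NULL;
    if (q->tail) {
        q->tail->next = r;
    } else {
        q->head = r;
    }
    q->tail = r;
}

/* Detach and return the first region, or NULL if the list is empty. */
static Region *queue_pop(Queue *q)
{
    Region *r = q->head;
    if (r) {
        q->head = r->next;
        if (!q->head) {
            q->tail = NULL;
        }
        r->next = NULL;
    }
    return r;
}

/*
 * Drain the list without caching a "next" pointer across the callback.
 * The region is unlinked before the callback runs, so a nested drain can
 * never see it, and the outer loop re-reads the head afterwards.
 */
static void drain(Queue *q, int depth)
{
    Region *r;

    while ((r = queue_pop(q)) != NULL) {
        /* Assumption: the callback re-enters the drain once, as the
         * reported stack suggests bdrv_discard() may do. */
        if (depth == 0) {
            drain(q, depth + 1);
        }
        free(r);
        q->freed++;
    }
}

/* Push n regions, drain them with one level of re-entry, and return the
 * number of free() calls; it equals n when nothing is freed twice. */
int demo_reentrant_drain(int n)
{
    Queue q = { NULL, NULL, 0 };
    int i;

    for (i = 0; i < n; i++) {
        queue_push(&q, i);
    }
    drain(&q, 0);
    assert(q.head == NULL && q.tail == NULL);
    return q.freed;
}
```

With three regions queued, the outer call detaches the first one, the nested call frees the remaining two, and the outer call then frees the first and finds the list empty. Whether this re-entry can really happen in qcow2 depends on the s->lock question raised above.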