From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <4FB0F89F.6080306@redhat.com>
Date: Mon, 14 May 2012 14:20:47 +0200
From: Kevin Wolf
In-Reply-To: <20120513160331.5b774c93.yizhouzhou@ict.ac.cn>
Subject: Re: [Qemu-devel] fixing qemu-0.1X endless loop in qcow2_alloc_cluster_offset
To: Zhouyi Zhou
Cc: Michael Tokarev, qemu-devel@nongnu.org, Andreas Färber

On 13.05.2012 10:03, Zhouyi Zhou wrote:
> hi all
>
> Sometimes qemu/kvm-0.1x hangs in an endless loop in
> qcow2_alloc_cluster_offset. After some investigation, I found the
> following in posix_aio_process_queue(void *opaque):
>
> 440         ret = qemu_paio_error(acb);
> 441         if (ret == ECANCELED) {
> 442             /* remove the request */
> 443             *pacb = acb->next;
> 444             qemu_aio_release(acb);
> 445             result = 1;
> 446         } else if (ret != EINPROGRESS) {
>
> On line 444 acb is released, but acb->common.opaque is not. It is later
> released by the guest OS via ide_dma_cancel, which in turn calls
> qcow_aio_cancel, which does not check whether its argument is on the
> in-flight list.
> The fix is as follows (Debian 6's qemu-kvm-0.12.5):
> #######################################
> --- block/qcow2.h~	2010-07-27 08:43:53.000000000 +0800
> +++ block/qcow2.h	2012-05-13 15:51:39.000000000 +0800
> @@ -143,6 +143,7 @@
>      QLIST_HEAD(QCowAioDependencies, QCowAIOCB) dependent_requests;
>
>      QLIST_ENTRY(QCowL2Meta) next_in_flight;
> +    int inflight;
>  } QCowL2Meta;
> --- block/qcow2.c~	2012-05-13 15:57:09.000000000 +0800
> +++ block/qcow2.c	2012-05-13 15:57:24.000000000 +0800
> @@ -349,6 +349,10 @@
>      QCowAIOCB *acb = (QCowAIOCB *)blockacb;
>      if (acb->hd_aiocb)
>          bdrv_aio_cancel(acb->hd_aiocb);
> +    if (acb->l2meta.inflight) {
> +        QLIST_REMOVE(&acb->l2meta, next_in_flight);
> +        acb->l2meta.inflight = 0;
> +    }
>      qemu_aio_release(acb);
>  }
>
> @@ -506,6 +510,7 @@
>      acb->n = 0;
>      acb->cluster_offset = 0;
>      acb->l2meta.nb_clusters = 0;
> +    acb->l2meta.inflight = 0;
>      QLIST_INIT(&acb->l2meta.dependent_requests);
>      return acb;
>  }
>
> @@ -534,6 +539,7 @@
>      /* Take the request off the list of running requests */
>      if (m->nb_clusters != 0) {
>          QLIST_REMOVE(m, next_in_flight);
> +        m->inflight = 0;
>      }
>
>      /*
> @@ -632,6 +638,7 @@
>  fail:
>      if (acb->l2meta.nb_clusters != 0) {
>          QLIST_REMOVE(&acb->l2meta, next_in_flight);
> +        acb->l2meta.inflight = 0;
>      }
>  done:
>      if (acb->qiov->niov > 1)
> --- block/qcow2-cluster.c~	2010-07-27 08:43:53.000000000 +0800
> +++ block/qcow2-cluster.c	2012-05-13 15:53:53.000000000 +0800
> @@ -827,6 +827,7 @@
>      m->offset = offset;
>      m->n_start = n_start;
>      m->nb_clusters = nb_clusters;
> +    m->inflight = 1;
>
>  out:
>      m->nb_available = MIN(nb_clusters << (s->cluster_bits - 9), n_end);
>
> Thanks for the investigation
> Zhouyi

The patch looks reasonable to me. Note, however, that while it fixes the
hang, it still causes cluster leaks. I'm not sure whether anyone is
interested in picking these up for old stable releases. Andreas, I think
you were going to take 0.15?

The first version that doesn't have the problem is 1.0.

Kevin