All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Andreas Färber" <afaerber@suse.de>
To: Kevin Wolf <kwolf@redhat.com>, Zhouyi Zhou <yizhouzhou@ict.ac.cn>
Cc: Michael Tokarev <mjt@tls.msk.ru>,
	qemu-devel@nongnu.org, Bruce Rogers <brogers@suse.com>
Subject: Re: [Qemu-devel] fixing qemu-0.1X endless loop in qcow2_alloc_cluster_offset
Date: Tue, 12 Jun 2012 15:33:07 +0200	[thread overview]
Message-ID: <4FD74513.2000500@suse.de> (raw)
In-Reply-To: <4FB0F89F.6080306@redhat.com>

Am 14.05.2012 14:20, schrieb Kevin Wolf:
> Am 13.05.2012 10:03, schrieb Zhouyi Zhou:
>> hi all
>>   
>>   sometimes, qemu/kvm-0.1x will hang in endless loop in qcow2_alloc_cluster_offset.
>>   after some investigation, I found that:
>>   in function posix_aio_process_queue(void *opaque)
>> 440             ret = qemu_paio_error(acb);
>> 441             if (ret == ECANCELED) {
>> 442                 /* remove the request */
>> 443                 *pacb = acb->next;
>> 444                 qemu_aio_release(acb);
>> 445                 result = 1;
>> 446             } else if (ret != EINPROGRESS) {
>>   in line 444 acb got released but acb->common.opaque does not.
>> which will be released via guest OS via ide_dma_cancel which 
>> will in term call qcow_aio_cancel which does not check its argument
>> is in flight list or not.
>>   The fix is as follows: (debian 6's qemu-kvm-0.12.5)
>> #######################################
>> --- block/qcow2.h~      2010-07-27 08:43:53.000000000 +0800
>> +++ block/qcow2.h       2012-05-13 15:51:39.000000000 +0800
>> @@ -143,6 +143,7 @@
>>      QLIST_HEAD(QCowAioDependencies, QCowAIOCB) dependent_requests;
>>  
>>      QLIST_ENTRY(QCowL2Meta) next_in_flight;
>> +    int inflight;       
>>  } QCowL2Meta;
>> --- block/qcow2.c~  2012-05-13 15:57:09.000000000 +0800
>> +++ block/qcow2.c       2012-05-13 15:57:24.000000000 +0800
>> @@ -349,6 +349,10 @@
>>      QCowAIOCB *acb = (QCowAIOCB *)blockacb;
>>      if (acb->hd_aiocb)
>>          bdrv_aio_cancel(acb->hd_aiocb);
>> +    if (acb->l2meta.inflight) {
>> +        QLIST_REMOVE(&acb->l2meta, next_in_flight);
>> +       acb->l2meta.inflight = 0;
>> +    }
>>      qemu_aio_release(acb);
>>  }
>>  
>> @@ -506,6 +510,7 @@
>>      acb->n = 0;
>>      acb->cluster_offset = 0;
>>      acb->l2meta.nb_clusters = 0;
>> +    acb->l2meta.inflight = 0;
>>      QLIST_INIT(&acb->l2meta.dependent_requests);
>>      return acb;
>>  }
>> @@ -534,6 +539,7 @@
>>      /* Take the request off the list of running requests */
>>      if (m->nb_clusters != 0) {
>>          QLIST_REMOVE(m, next_in_flight);
>> +       m->inflight = 0;
>>      }
>>  
>>      /*
>> @@ -632,6 +638,7 @@
>>  fail:
>>      if (acb->l2meta.nb_clusters != 0) {
>>          QLIST_REMOVE(&acb->l2meta, next_in_flight);
>> +       acb->l2meta.inflight  = 0;
>>      }
>>  done:
>>      if (acb->qiov->niov > 1)
>> --- block/qcow2-cluster.c~      2010-07-27 08:43:53.000000000 +0800
>> +++ block/qcow2-cluster.c       2012-05-13 15:53:53.000000000 +0800
>> @@ -827,6 +827,7 @@
>>      m->offset = offset;
>>      m->n_start = n_start;
>>      m->nb_clusters = nb_clusters;
>> +    m->inflight = 1;
>>  
>>  out:
>>      m->nb_available = MIN(nb_clusters << (s->cluster_bits - 9), n_end);
>>
>>  Thanks for investigation
>> Zhouyi
> 
> The patch looks reasonable to me. Note however that while it fixes the
> hang, it still causes cluster leaks. I'm not sure if someone is
> interested in picking these up for old stable releases. Andreas, I think
> you were going to take 0.15? The first version that doesn't have the
> problem is 1.0.

Kevin, the policy as I understood it is to cherry-pick patches from
qemu.git into qemu-stable-x.y.git. So I don't think me applying this
patch to stable-0.15 would be right. I don't spot a particular qcow2 fix
among our 0.15 backports that I have now pushed. Do you have a pointer
which one(s) would fix this issue so that I can recheck?

Zhouyi, could you test git://git.qemu.org/qemu-stable-0.15.git and check
if the problem is still there?

Thanks,
Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

  reply	other threads:[~2012-06-12 13:33 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-05-13  8:03 [Qemu-devel] fixing qemu-0.1X endless loop in qcow2_alloc_cluster_offset Zhouyi Zhou
2012-05-14 12:20 ` Kevin Wolf
2012-06-12 13:33   ` Andreas Färber [this message]
2012-06-12 13:44     ` Kevin Wolf
2012-10-12 15:52       ` Andreas Färber
2012-10-15  9:13         ` Kevin Wolf
2012-10-15 14:28           ` Andreas Färber
2012-10-15 14:46             ` Kevin Wolf
2012-10-16  3:32               ` 周洲仪
2012-11-09 17:21           ` Andreas Färber

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4FD74513.2000500@suse.de \
    --to=afaerber@suse.de \
    --cc=brogers@suse.com \
    --cc=kwolf@redhat.com \
    --cc=mjt@tls.msk.ru \
    --cc=qemu-devel@nongnu.org \
    --cc=yizhouzhou@ict.ac.cn \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.