* [Qemu-devel] QCOW2 bugs releated to qcow2_aio_cancel()
@ 2011-02-03 17:21 Chunqiang Tang
2011-02-03 17:44 ` [Qemu-devel] " Kevin Wolf
0 siblings, 1 reply; 3+ messages in thread
From: Chunqiang Tang @ 2011-02-03 17:21 UTC (permalink / raw)
To: Kevin Wolf; +Cc: Stefan Hajnoczi, qemu-devel
Hi Kevin,
Fast Virtual Disk (FVD) has an automated testing tool (see
http://wiki.qemu.org/Features/FVD/Engineering). For a long time, I knew
that QCOW2 could not pass the automated tests. Today I finally sit down to
look into those bugs. I already submitted multiple patches for different
bugs, but there is one case that I am not certain how to handle. Instead
of creating a potentially broken patch, I though you might be able to
handle it better than me. Bugs showed up when the testing tool injected
aio cancel.
First, the cancelled request is not taken off the list of running
requests, i.e., s->cluster_allocs and next_in_flight. As a result, when
the acb is freed and reused, it formed circles in s->cluster_allocs, and
the qcow2_alloc_cluster_offset() code below went into dead loop. I tried
to add run_dependent_requests() into qcow2_aio_cancel(), but that does not
solve all the problem. Dead loop still occurs.
The second bug is related to QCowAIOCB.bh. There are several issues. 1)
When a request is cancelled, the bh is not cancelled. 2) qcow2_aio_setup()
does not initialize bh=NULL and relies on qcow2_aio_read_bh() to set
bh=NULL. When the acb is reused for another request, bh!=NULL. As a
result, qcow2_schedule_bh() fails on checking "if (acb->bh) return -EIO;"
There may be other bugs related to qcow2_aio_cancel(), as the testing tool
could not run long enough before it hits a bug. As a result, the coverage
is low.
static void qcow2_aio_cancel(BlockDriverAIOCB *blockacb)
{
QCowAIOCB *acb = container_of(blockacb, QCowAIOCB, common);
if (acb->hd_aiocb)
bdrv_aio_cancel(acb->hd_aiocb);
run_dependent_requests(&acb->l2meta); /******* added ******/
qemu_aio_release(acb);
}
int qcow2_alloc_cluster_offset()
{
...
/******* run into dead loop here when a cancelled was not taken off
the list. */
QLIST_FOREACH(old_alloc, &s->cluster_allocs, next_in_flight) {
...
}
}
Regards,
ChunQiang (CQ) Tang, Ph.D.
Tel: +1-914-784-7412 Homepage:
http://www.research.ibm.com/people/c/ctang
^ permalink raw reply [flat|nested] 3+ messages in thread* [Qemu-devel] Re: QCOW2 bugs releated to qcow2_aio_cancel()
2011-02-03 17:21 [Qemu-devel] QCOW2 bugs releated to qcow2_aio_cancel() Chunqiang Tang
@ 2011-02-03 17:44 ` Kevin Wolf
2011-02-04 5:58 ` Stefan Hajnoczi
0 siblings, 1 reply; 3+ messages in thread
From: Kevin Wolf @ 2011-02-03 17:44 UTC (permalink / raw)
To: Chunqiang Tang; +Cc: Stefan Hajnoczi, qemu-devel
Am 03.02.2011 18:21, schrieb Chunqiang Tang:
> Hi Kevin,
>
> Fast Virtual Disk (FVD) has an automated testing tool (see
> http://wiki.qemu.org/Features/FVD/Engineering). For a long time, I knew
> that QCOW2 could not pass the automated tests. Today I finally sit down to
> look into those bugs. I already submitted multiple patches for different
> bugs, but there is one case that I am not certain how to handle. Instead
> of creating a potentially broken patch, I though you might be able to
> handle it better than me. Bugs showed up when the testing tool injected
> aio cancel.
We have discussed this before and yes, aio_cancel is a problem. It is
very hard to do this correctly, and so most block drivers just wait for
completion instead of actually cancelling anything.
Given that a cancel shouldn't happen too often, I think it would be
reasonable to take the same approach for qcow2. I don't think adding a
lot of complexity for getting this right is justified.
Stefan, what do you think? Maybe we could even have a default
implementation in generic block code?
Kevin
^ permalink raw reply [flat|nested] 3+ messages in thread
* [Qemu-devel] Re: QCOW2 bugs releated to qcow2_aio_cancel()
2011-02-03 17:44 ` [Qemu-devel] " Kevin Wolf
@ 2011-02-04 5:58 ` Stefan Hajnoczi
0 siblings, 0 replies; 3+ messages in thread
From: Stefan Hajnoczi @ 2011-02-04 5:58 UTC (permalink / raw)
To: Kevin Wolf; +Cc: Chunqiang Tang, qemu-devel
On Thu, Feb 3, 2011 at 5:44 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> Given that a cancel shouldn't happen too often, I think it would be
> reasonable to take the same approach for qcow2. I don't think adding a
> lot of complexity for getting this right is justified.
>
> Stefan, what do you think? Maybe we could even have a default
> implementation in generic block code?
Yes, I agree that the simple cancellation approach is a good trade-off
since cancellation is rare. It guarantees that the image is
consistent on disk and that in-memory resources are freed properly.
Stefan
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2011-02-04 5:58 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-02-03 17:21 [Qemu-devel] QCOW2 bugs releated to qcow2_aio_cancel() Chunqiang Tang
2011-02-03 17:44 ` [Qemu-devel] " Kevin Wolf
2011-02-04 5:58 ` Stefan Hajnoczi
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.