Re: Endless loop in qcow2_alloc_cluster_offset


From: Kevin Wolf <kwolf@redhat.com>
To: Jan Kiszka <jan.kiszka@siemens.com>
Cc: qemu-devel <qemu-devel@nongnu.org>, kvm <kvm@vger.kernel.org>
Subject: Re: Endless loop in qcow2_alloc_cluster_offset
Date: Mon, 07 Dec 2009 16:00:18 +0100
Message-ID: <4B1D1882.7040404@redhat.com>
In-Reply-To: <4B1D0E34.6070907@siemens.com>

On 07.12.2009 15:16, Jan Kiszka wrote:
>> Likely not. What I did was nothing special, and I did not notice such a
>> crash in the last months.
> 
> And now it happened again (qemu-kvm head, during kernel installation
> from network onto local qcow2-disk). Any clever idea how to proceed with
> this?

I still haven't seen this and I still have no theory on what could be
happening here. I'm just trying to write down what I think must happen
to get into this situation. Maybe you can point out something I'm
missing, or maybe it helps trigger a sudden inspiration on your side.

The crash happens because we have a loop in the s->cluster_allocs list.
A loop can only be created by inserting the same object twice. The only
insertion into this list happens in qcow2_alloc_cluster_offset (though in
an earlier call than the one in the stack trace).
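
To illustrate the double-insert point, here is a minimal sketch. It
assumes the BSD LIST macros from <sys/queue.h>, which have the same shape
as QEMU's QLIST macros; struct alloc, bounded_len and walk_after_inserts
are hypothetical names for this example, not QEMU code.

```c
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for an in-flight allocation; not QEMU's type. */
struct alloc {
    int id;
    LIST_ENTRY(alloc) next;
};

LIST_HEAD(alloc_list, alloc);

/* Walk the list, giving up after `limit` steps.
 * Returns the number of nodes seen, or -1 if the limit was exceeded,
 * which for a short list means the links form a cycle. */
static int bounded_len(struct alloc_list *head, int limit)
{
    struct alloc *a;
    int n = 0;

    LIST_FOREACH(a, head, next) {
        if (++n > limit) {
            return -1;
        }
    }
    return n;
}

/* Insert one and the same node `times` times, then walk the list. */
static int walk_after_inserts(int times)
{
    struct alloc_list head;
    struct alloc a = { .id = 1 };
    int i;

    LIST_INIT(&head);
    for (i = 0; i < times; i++) {
        /* On the second insert, a's next pointer is set to the current
         * head, which is a itself, so the node links to itself. */
        LIST_INSERT_HEAD(&head, &a, next);
    }
    return bounded_len(&head, 1000);
}
```

With a single insert the walk visits one node; with two inserts of the
same node the bounded walk hits its limit, and an unbounded
LIST_FOREACH would never terminate.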

There is only one relevant caller of this function, qcow_aio_write_cb.
Part of it is a call to run_dependent_requests which removes the request
from s->cluster_allocs. So after the QLIST_REMOVE in
run_dependent_requests the request can no longer be in the list, but by
the time qcow2_alloc_cluster_offset is called it must be in it again. So
it must be re-added somewhere between these two calls.
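
One way to catch the unexpected re-insertion at the moment it happens,
rather than when a later list walk hangs, would be a debug-only
membership flag. This is only a sketch under assumptions: struct req,
the in_list field and the tracked_* helpers are made up for
illustration, and it uses the BSD LIST macros from <sys/queue.h> in
place of QLIST.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical request type with a debug-only membership flag. */
struct req {
    bool in_list;
    LIST_ENTRY(req) next;
};

LIST_HEAD(req_list, req);

/* Insert r unless it is already linked. Returns false on a double
 * insert; a debug build could abort() here to get a backtrace of the
 * second inserter. */
static bool tracked_insert(struct req_list *head, struct req *r)
{
    if (r->in_list) {
        return false;
    }
    LIST_INSERT_HEAD(head, r, next);
    r->in_list = true;
    return true;
}

/* Remove r if it is linked. Returns false if it was not in a list. */
static bool tracked_remove(struct req *r)
{
    if (!r->in_list) {
        return false;
    }
    LIST_REMOVE(r, next);
    r->in_list = false;
    return true;
}
```

The flag costs one field per request and turns the eventual endless
loop into an immediate, attributable failure at the second insert.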

In qcow_aio_write_cb there isn't much happening between these calls. The
only thing that could become dangerous is the qcow_aio_write_cb(req, 0)
call that run_dependent_requests makes for queued requests.

> I could try to run the step in a loop, hopefully retriggering it once in
> a (likely longer) while. But then we need some good instrumentation first.

I can't explain what exactly would be going wrong there, but if my
reasoning is right so far, I think that moving this call into a Bottom
Half would help. So if you can reproduce it in a loop, this could be
worth a try.
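
The general idea behind deferring the call can be sketched as follows.
This is not QEMU code: defer, run_deferred and bump are hypothetical
helpers for a tiny fixed-size queue. In QEMU itself the equivalent
would presumably be built on the bottom-half API (qemu_bh_new,
qemu_bh_schedule).

```c
#include <stddef.h>

typedef void deferred_fn(void *opaque);

struct deferred {
    deferred_fn *fn;
    void *opaque;
};

#define MAX_DEFERRED 16

static struct deferred pending[MAX_DEFERRED];
static size_t n_pending;

/* Queue fn(opaque) instead of calling it from the current stack frame.
 * Returns 0 on success, -1 if the queue is full. */
static int defer(deferred_fn *fn, void *opaque)
{
    if (n_pending == MAX_DEFERRED) {
        return -1;
    }
    pending[n_pending].fn = fn;
    pending[n_pending].opaque = opaque;
    n_pending++;
    return 0;
}

/* Run everything queued so far. Meant to be called from the outermost
 * loop, when no caller is halfway through modifying shared state.
 * Callbacks that call defer() are picked up by the next round.
 * Returns the number of callbacks run. */
static size_t run_deferred(void)
{
    struct deferred batch[MAX_DEFERRED];
    size_t n = n_pending;
    size_t i;

    for (i = 0; i < n; i++) {
        batch[i] = pending[i];
    }
    n_pending = 0;
    for (i = 0; i < n; i++) {
        batch[i].fn(batch[i].opaque);
    }
    return n;
}

/* Example callback: increments the int that opaque points to. */
static void bump(void *opaque)
{
    ++*(int *)opaque;
}
```

The point of the indirection is that the callback no longer runs nested
inside the function that is still manipulating the list; it runs later
from a clean stack.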

I'd certainly prefer to understand the problem first, but thinking about
AIO is the perfect way to make your brain hurt...

Kevin
