Re: Endless loop in qcow2_alloc_cluster_offset

From: Jan Kiszka <jan.kiszka@siemens.com>
To: unlisted-recipients:; (no To-header on input)
Cc: Kevin Wolf <kwolf@redhat.com>, qemu-devel <qemu-devel@nongnu.org>,
	kvm <kvm@vger.kernel.org>
Subject: Re: Endless loop in qcow2_alloc_cluster_offset
Date: Mon, 07 Dec 2009 15:16:20 +0100
Message-ID: <4B1D0E34.6070907@siemens.com>
In-Reply-To: <4B055D32.3040601@siemens.com>

Jan Kiszka wrote:
> Kevin Wolf wrote:
>> Hi Jan,
>>
>> On 19.11.2009 13:19, Jan Kiszka wrote:
>>> (gdb) print ((BDRVQcowState *)bs->opaque)->cluster_allocs.lh_first 
>>> $5 = (struct QCowL2Meta *) 0xcb3568
>>> (gdb) print *((BDRVQcowState *)bs->opaque)->cluster_allocs.lh_first 
>>> $6 = {offset = 7417176064, n_start = 0, nb_available = 16, nb_clusters = 0, depends_on = 0xcb3568, dependent_requests = {lh_first = 0x0}, next_in_flight = {le_next = 0xcb3568, le_prev = 0xc4ebd8}}
>>>
>>> So next == first.
>> Oops. Doesn't sound quite right...
>>
>>> Is something fiddling with cluster_allocs concurrently, e.g. some signal
>>> handler? Or what could cause this list corruption? Would it be enough to
>>> move to QLIST_FOREACH_SAFE?
>> Are there any specific signals you're thinking of? Related to block code
> 
> No, was just blind guessing.
> 
>> I can only think of SIGUSR2 and this one shouldn't call any block driver
>> functions directly. You're using aio=threads, I assume? (It's the default)
> 
> Yes, all on defaults.
> 
>> QLIST_FOREACH_SAFE shouldn't make a difference in this place as the loop
>> doesn't insert or remove any elements. If the list is corrupted now, I
>> think it would be corrupted with QLIST_FOREACH_SAFE as well - at best,
>> the endless loop would occur one call later.
>>
>> The only way I see to get such a loop in a list is to re-insert an
>> element that is already part of the list. The only insert is at
>> qcow2-cluster.c:777. The question remains how we got there twice
>> without run_dependent_requests() removing the L2Meta from our list
>> first - because this is definitely wrong...
>>
>> Presumably, it's not reproducible?
> 
> Likely not. What I did was nothing special, and I have not noticed such
> a crash in recent months.

And now it happened again (qemu-kvm head, during a kernel installation
from the network onto a local qcow2 disk). Any clever ideas on how to
proceed with this?

I could try to run the installation step in a loop, hoping to retrigger
the problem eventually (likely after a longer while). But we need some
good instrumentation in place first.
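
One possible shape for such instrumentation, as a minimal sketch only:
it uses the BSD LIST macros from <sys/queue.h> as a stand-in for QEMU's
QLIST macros (assumed here to behave the same way), and `struct meta`,
`list_contains()` and `checked_insert_head()` are hypothetical names,
not QEMU code. It shows that inserting an element that is already the
list head makes its le_next point to itself, matching the gdb dump
quoted above, and how a check before the insert could report the double
insert instead of corrupting the list.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/queue.h>

/* Hypothetical stand-in for QCowL2Meta; only the list linkage matters. */
struct meta {
    int id;
    LIST_ENTRY(meta) next_in_flight;
};

LIST_HEAD(meta_list, meta);

/* Walk the list looking for m. The walk is bounded so that it also
 * terminates on a list that is already corrupted into a cycle. */
static bool list_contains(struct meta_list *head, struct meta *m,
                          int max_steps)
{
    struct meta *it = head->lh_first;
    int steps;

    for (steps = 0; it != NULL && steps < max_steps; steps++) {
        if (it == m) {
            return true;
        }
        it = it->next_in_flight.le_next;
    }
    return false;
}

/* Instrumented insert: report and refuse a double insert. */
static bool checked_insert_head(struct meta_list *head, struct meta *m)
{
    if (list_contains(head, m, 1024)) {
        fprintf(stderr, "double insert of element %p\n", (void *)m);
        return false;
    }
    LIST_INSERT_HEAD(head, m, next_in_flight);
    return true;
}

/* Reproduce the corruption: insert the same element twice, unguarded.
 * Returns true if the element ends up linked to itself. */
static bool double_insert_creates_self_loop(void)
{
    struct meta_list head;
    struct meta m = { .id = 1 };

    LIST_INIT(&head);
    LIST_INSERT_HEAD(&head, &m, next_in_flight);
    LIST_INSERT_HEAD(&head, &m, next_in_flight);   /* the bug */

    return m.next_in_flight.le_next == &m;
}
```

In a real debugging build, the failure branch of such a check would
more usefully abort or dump a backtrace, so that the second caller of
the insert is caught in the act rather than found later as an endless
loop.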

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
