From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jan Kiszka
Subject: Re: Endless loop in qcow2_alloc_cluster_offset
Date: Thu, 19 Nov 2009 15:58:58 +0100
Message-ID: <4B055D32.3040601@siemens.com>
References: <4B0537EB.4000909@siemens.com> <4B055AEF.4030406@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-15
Content-Transfer-Encoding: 7bit
Cc: qemu-devel , kvm
To: Kevin Wolf
Return-path:
Received: from david.siemens.de ([192.35.17.14]:21504 "EHLO david.siemens.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752484AbZKSO7Q (ORCPT ); Thu, 19 Nov 2009 09:59:16 -0500
In-Reply-To: <4B055AEF.4030406@redhat.com>
Sender: kvm-owner@vger.kernel.org
List-ID:

Kevin Wolf wrote:
> Hi Jan,
>
> On 19.11.2009 13:19, Jan Kiszka wrote:
>> (gdb) print ((BDRVQcowState *)bs->opaque)->cluster_allocs.lh_first
>> $5 = (struct QCowL2Meta *) 0xcb3568
>> (gdb) print *((BDRVQcowState *)bs->opaque)->cluster_allocs.lh_first
>> $6 = {offset = 7417176064, n_start = 0, nb_available = 16, nb_clusters = 0, depends_on = 0xcb3568, dependent_requests = {lh_first = 0x0}, next_in_flight = {le_next = 0xcb3568, le_prev = 0xc4ebd8}}
>>
>> So next == first.
>
> Oops. Doesn't sound quite right...
>
>> Is something fiddling with cluster_allocs concurrently, e.g. some signal
>> handler? Or what could cause this list corruption? Would it be enough to
>> move to QLIST_FOREACH_SAFE?
>
> Are there any specific signals you're thinking of? Related to block code

No, was just blind guessing.

> I can only think of SIGUSR2 and this one shouldn't call any block driver
> functions directly. You're using aio=threads, I assume? (It's the default)

Yes, all on defaults.

>
> QLIST_FOREACH_SAFE shouldn't make a difference in this place as the loop
> doesn't insert or remove any elements. If the list is corrupted now, I
> think it would be corrupted with QLIST_FOREACH_SAFE as well - at best,
> the endless loop would occur one call later.
>
> The only way I see to get such a loop in a list is to re-insert an
> element that already is part of the list. The only insert is at
> qcow2-cluster.c:777. Remains the question how we came there twice
> without run_dependent_requests() removing the L2Meta from our list first
> - because this is definitely wrong...
>
> Presumably, it's not reproducible?

Likely not. What I did was nothing special, and I did not notice such a
crash in the last months.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
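
For illustration, here is a minimal sketch of the double-insert theory quoted above. It does not use the QEMU sources: struct node, struct list_head, insert_head() and steps_until_end() are hypothetical stand-ins, assuming a head insertion in the style of the BSD LIST macros that does not check whether the element is already linked. Inserting the current head a second time leaves le_next pointing at the element itself, which would match the "next == first" state in the gdb output.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for QCowL2Meta: only the list linkage is modelled. */
struct node {
    struct node *le_next;   /* next element, NULL at the tail */
    struct node **le_prev;  /* address of the pointer that points at us */
};

struct list_head {
    struct node *lh_first;
};

/* Head insertion in the style of the BSD LIST macros (assumed behaviour);
 * note that it never checks whether elm is already on the list. */
static void insert_head(struct list_head *head, struct node *elm)
{
    elm->le_next = head->lh_first;
    if (elm->le_next != NULL) {
        head->lh_first->le_prev = &elm->le_next;
    }
    head->lh_first = elm;
    elm->le_prev = &head->lh_first;
}

/* Walk the list like a plain FOREACH loop, but give up after max_steps
 * so that a cyclic list does not hang the caller. */
static int steps_until_end(const struct list_head *head, int max_steps)
{
    int steps = 0;
    const struct node *n;

    for (n = head->lh_first; n != NULL && steps < max_steps; n = n->le_next) {
        steps++;
    }
    return steps;
}
```

With a single element inserted once, steps_until_end() stops after one step. After a second insert_head() of the same element, le_next equals the element's own address, so the walk only ends when it hits the step cap. An uncapped FOREACH would spin forever, and a _SAFE variant would not help since nothing is removed during the walk.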