From: Kevin Wolf
Date: Thu, 19 Nov 2009 15:49:19 +0100
To: Jan Kiszka
Cc: qemu-devel, kvm
Subject: [Qemu-devel] Re: Endless loop in qcow2_alloc_cluster_offset
Message-ID: <4B055AEF.4030406@redhat.com>
In-Reply-To: <4B0537EB.4000909@siemens.com>

Hi Jan,

On 19.11.2009 13:19, Jan Kiszka wrote:
> (gdb) print ((BDRVQcowState *)bs->opaque)->cluster_allocs.lh_first
> $5 = (struct QCowL2Meta *) 0xcb3568
> (gdb) print *((BDRVQcowState *)bs->opaque)->cluster_allocs.lh_first
> $6 = {offset = 7417176064, n_start = 0, nb_available = 16, nb_clusters = 0,
>       depends_on = 0xcb3568, dependent_requests = {lh_first = 0x0},
>       next_in_flight = {le_next = 0xcb3568, le_prev = 0xc4ebd8}}
>
> So next == first. Oops. Doesn't sound quite right...
> Is something fiddling with cluster_allocs concurrently, e.g. some signal
> handler? Or what could cause this list corruption? Would it be enough to
> move to QLIST_FOREACH_SAFE?

Are there any specific signals you're thinking of? Related to block code I
can only think of SIGUSR2, and that one shouldn't call any block driver
functions directly. You're using aio=threads, I assume? (It's the default.)

QLIST_FOREACH_SAFE shouldn't make a difference here, as the loop doesn't
insert or remove any elements. If the list is corrupted now, I think it
would be just as corrupted with QLIST_FOREACH_SAFE - at best, the endless
loop would occur one call later.

The only way I can see to get such a loop in a list is to re-insert an
element that is already part of the list. The only insert is at
qcow2-cluster.c:777. That leaves the question of how we got there twice
without run_dependent_requests() removing the L2Meta from our list first -
because that is definitely wrong...

Presumably it's not reproducible?

Kevin
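
Below is a minimal standalone sketch (not QEMU code) of the double-insert
scenario Kevin describes above. It uses the BSD <sys/queue.h> LIST_* macros,
which QEMU's QLIST_* macros follow; the struct and field names are only
illustrative stand-ins for QCowL2Meta and cluster_allocs. Inserting an
element that is already at the head of the list makes its le_next point back
at the element itself, matching le_next == 0xcb3568 for the element at
0xcb3568 in the gdb dump, so any traversal spins forever.

/* Sketch only: reproduces the self-referencing le_next caused by a
 * double insert; names are hypothetical, not QEMU's. */
#include <stdio.h>
#include <sys/queue.h>

struct l2meta {
    int id;
    LIST_ENTRY(l2meta) next_in_flight;   /* stand-in for the QCowL2Meta link */
};

LIST_HEAD(alloc_list, l2meta);

int main(void)
{
    struct alloc_list cluster_allocs;
    struct l2meta m = { .id = 1 };

    LIST_INIT(&cluster_allocs);

    LIST_INSERT_HEAD(&cluster_allocs, &m, next_in_flight);
    /* Second insert of an element that is already on the list: its le_next
     * is set to the current head, i.e. to the element itself. */
    LIST_INSERT_HEAD(&cluster_allocs, &m, next_in_flight);

    printf("&m = %p, m.next_in_flight.le_next = %p\n",
           (void *)&m, (void *)m.next_in_flight.le_next);

    /* A plain LIST_FOREACH (or QLIST_FOREACH) over this list never reaches
     * NULL any more; the bounded loop below just shows the first few steps. */
    struct l2meta *it = cluster_allocs.lh_first;
    for (int i = 0; i < 4 && it != NULL; i++) {
        printf("step %d: %p\n", i, (void *)it);
        it = it->next_in_flight.le_next;
    }
    return 0;
}

Note that the *_FOREACH_SAFE variants only pre-fetch the next element so the
current one may be removed during iteration; they do nothing about a le_next
that points back into the list, which is why switching the loop would at most
delay the hang, as Kevin says.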