From: Jan Kiszka
To: Kevin Wolf
Cc: qemu-devel, kvm
Date: Mon, 07 Dec 2009 17:09:45 +0100
Subject: [Qemu-devel] Re: Endless loop in qcow2_alloc_cluster_offset

Kevin Wolf wrote:
> On 07.12.2009 15:16, Jan Kiszka wrote:
>>> Likely not. What I did was nothing special, and I have not noticed such
>>> a crash in recent months.
>> And now it happened again (qemu-kvm head, during a kernel installation
>> from the network onto a local qcow2 disk). Any clever idea how to
>> proceed with this?
>
> I still haven't seen this, and I still have no theory on what could be
> happening here. I'm just trying to write down what I think must happen
> to get into this situation. Maybe you can point out something I'm
> missing, or maybe it gives you a sudden inspiration.
>
> The crash happens because we have a loop in the s->cluster_allocs list.
> A loop can only be created by inserting an object twice. The only insert
> into this list happens in qcow2_alloc_cluster_offset (though in an
> earlier call than the one in the stack trace).
>
> There is only one relevant caller of this function, qcow_aio_write_cb.
> Part of it is a call to run_dependent_requests, which removes the
> request from s->cluster_allocs. So after the QLIST_REMOVE in
> run_dependent_requests the request can no longer be in the list, but by
> the time qcow2_alloc_cluster_offset is called it must be in the list
> again. It must therefore be added somewhere between these two calls.
>
> In qcow_aio_write_cb there isn't much happening between these calls. The
> only thing that could conceivably become dangerous is the
> qcow_aio_write_cb(req, 0) call for queued requests in
> run_dependent_requests.

If m->nb_clusters is 0, the entry won't be removed from the list. And if
something corrupted nb_clusters so that it became 0 while the entry was
still enqueued, we would see exactly the deadly loop I ran into, right?
Unfortunately, any arbitrary memory corruption that produces such a zero
can cause this...

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux