From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:59361) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Yuott-0007Iy-Vm for qemu-devel@nongnu.org; Tue, 19 May 2015 17:18:10 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Yuots-0007nc-NR for qemu-devel@nongnu.org; Tue, 19 May 2015 17:18:09 -0400
Message-ID: <555BA87F.3060408@ilande.co.uk>
Date: Tue, 19 May 2015 22:17:51 +0100
From: Mark Cave-Ayland
MIME-Version: 1.0
References: <1425939893-14404-1-git-send-email-mark.cave-ayland@ilande.co.uk> <555A4362.8030409@redhat.com> <555BA225.8060602@ilande.co.uk> <555BA4B6.3030101@redhat.com>
In-Reply-To: <555BA4B6.3030101@redhat.com>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [RFC 0/2] macio: split out unaligned DMA access into separate functions
To: John Snow , qemu-devel@nongnu.org, qemu-ppc@nongnu.org, kwolf@redhat.com, stefanha@redhat.com, agraf@suse.de

On 19/05/15 22:01, John Snow wrote:

>> Thanks John. Even though you haven't managed to figure out the problem
>> the patchset attempts to solve, were you at least able to reproduce the
>> image corruption locally?
>>
>> That said, the patchset is still worth including just for the fact that
>> it fixes the flaky CDROM detection here.
>>
>>
>> ATB,
>>
>> Mark.
>>
>
> Yeah, I reproduced the problem you're describing and spent a chunk of my
> time debugging it and trying to figure out a section of the trace that
> coincides with "the problem," but was unable to find anything of
> particular interest.

Well that's definitely a good start :) I did spend some time enabling
tracepoints on the block layer for both good and bad commits, and the
only obvious difference I could see was the batching between multiple
read/write requests.

> I do notice that sometimes we appear to start a new transfer almost
> immediately after one completes, but the code in place to sleep that
> action until the guest finishes programming the DMA command seems to
> catch it and nothing gets maliciously perturbed.
>
> I still wonder somewhat that with the move to async and the strange
> order of how darwin appears to program DMA transfers that we're hitting
> some weird race, but I think that how reliably I hit the exact same
> problem means that I should think again :)
>
> I'll keep poking.

The only other thing I had in the back of my mind was whether the async
code needs some kind of extra write barrier implemented when used in
this way. Unfortunately I haven't had a chance to dig into the block
layer to figure out how to attempt this yet.


ATB,

Mark.