From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jeremy Fitzhardinge
Subject: Re: Xen 4.0.0x allows for data corruption in Dom0
Date: Mon, 08 Mar 2010 15:41:08 -0800
Message-ID: <4B958B14.5030805@goop.org>
References: <4B922A89.2060105@invisiblethingslab.com> <4B957914.4050408@goop.org> <4B957B93.4060401@invisiblethingslab.com> <4B958475.3050407@goop.org> <4B9586E0.2060005@invisiblethingslab.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
In-Reply-To: <4B9586E0.2060005@invisiblethingslab.com>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Joanna Rutkowska
Cc: xen-devel@lists.xensource.com
List-Id: xen-devel@lists.xenproject.org

On 03/08/2010 03:23 PM, Joanna Rutkowska wrote:
> But the corruptions always happen in 32-byte chunks, which might
> suggest it's not a page-related problem (e.g. wrongly re-used page), as
> in that case we would be observing (at least sometimes) much bigger
> chunks of corrupted data, I think.
>

Given that the domU doesn't have any devices or much going on, it could
easily be corrupting memory in only small amounts.

> The reason why I still believe it's a hypervisor related thing, is that
> I'm currently using the very *same* Dom0 kernel (very recent
> xen/stable-2.6.31) with Xen 3.4.2 and the system is damn stable. And I
> really mean extensive use with 5-7 VMs running all the time doing
> various things from Web browsing to kernel building.
>

OK, it's always good to get some positive feedback.

> If I was to make an educated guess I would say it's something related to
> some interrupt handling, i.e. Xen mishandling it, e.g. the handler is
> writing out-of-buffer somewhere and it just happens to land in the Dom0
> fs buffer used by e.g. dd operation.
>

It would be interesting to see what happens if you write the file with
the test domain paused (xm pause ...). If the corruption continues,
then it is almost certainly Xen. If it stops, then it either means the
corruption was caused by pages inappropriately shared between dom0 and
domU, or something like vcpu context switch is corrupting memory (which
would be very sad).

    J
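
The paused-versus-running comparison is easier to read if the file is written with a known pattern, so that any damage can be located and sized. What follows is only a minimal sketch in Python, not the procedure anyone in this thread actually used: the names `pattern_block`, `write_pattern` and `find_corruption` are hypothetical, and the 32-byte block size is an assumption chosen to match the corruption granularity reported above. Each block is derived from its own index, so a misplaced or overwritten block shows up as a mismatch at a known offset.

```python
import hashlib
import os

# Assumption: 32 bytes, matching the corruption granularity reported in the thread.
BLOCK = 32


def pattern_block(index: int) -> bytes:
    """Return a deterministic 32-byte block derived from its index (SHA-256 digest)."""
    return hashlib.sha256(index.to_bytes(8, "little")).digest()


def write_pattern(path: str, n_blocks: int) -> None:
    """Write n_blocks pattern blocks to path and ask the OS to flush them to disk."""
    with open(path, "wb") as f:
        for i in range(n_blocks):
            f.write(pattern_block(i))
        f.flush()
        os.fsync(f.fileno())


def find_corruption(path: str) -> list:
    """Return (offset, length) runs where the file differs from the expected pattern.

    Adjacent bad blocks are merged into a single run, so the run lengths show
    whether damage comes in 32-byte pieces or in larger spans.
    """
    runs = []
    with open(path, "rb") as f:
        index = 0
        while True:
            data = f.read(BLOCK)
            if not data:
                break
            if data != pattern_block(index):
                offset = index * BLOCK
                if runs and runs[-1][0] + runs[-1][1] == offset:
                    # Extend the previous run instead of starting a new one.
                    runs[-1] = (runs[-1][0], runs[-1][1] + len(data))
                else:
                    runs.append((offset, len(data)))
            index += 1
    return runs
```

One way to use it would be to call `write_pattern` and then `find_corruption` on the same path once with the test domain running and once after `xm pause`, and compare the reported runs. Note that the read-back may be served from the dom0 page cache rather than from disk, and that the sketch does not detect a file that is shorter than expected.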