From: Chris Lalancette
Subject: Problem with X on 32 bit guest on 64-bit host
Date: Thu, 05 Feb 2009 15:30:48 +0100
Message-ID: <498AF818.40802@redhat.com>
To: kvm@vger.kernel.org

All,

I've been trying to track down this problem with starting X on a 32-bit guest on a 64-bit host, and I've hit a bit of a wall. Let me describe the setup:

Host:  AMD Barcelona machine, 16GB memory, 8 cores, running 2.6.29-rc2,
       kvm-userspace 3f7cba35281a5b2dba008179a4979d737105574d
Guest: RHEL-5 32-bit guest, single VCPU.

The problem is that inside the 32-bit guest, X refuses to start. Now, on an Intel platform I have hanging around here, this works just fine; I copy the guest over, start it up, and X starts right up. Also, on the Barcelona, with a 64-bit RHEL-5 guest, X starts fine.

I've done quite a bit of tracing inside the guest, and from the guest's perspective, something just isn't right. When X is trying to start, one thing it does is copy a BIOS region from /dev/mem into a shared memory region mapped at 0 inside the X process.
The page fault for the access to the memory region at 0 works just fine, but the very next page fault that is injected is completely bogus; its address is either > TASK_SIZE (which is 0xc0000000), or it has bogus VMA flags set, etc.

Going further, what actually happens is that X uses glibc's optimized memcpy routine, which, in assembly, looks like this:

(gdb) disass memcpy
Dump of assembler code for function memcpy:
0x00387090:  mov    0xc(%esp),%ecx
0x00387094:  mov    %edi,%eax
0x00387096:  mov    0x4(%esp),%edi
0x0038709a:  mov    %esi,%edx
0x0038709c:  mov    0x8(%esp),%esi
0x003870a0:  cld
0x003870a1:  shr    %ecx
0x003870a3:  jae    0x3870a6
0x003870a5:  movsb  %ds:(%esi),%es:(%edi)
0x003870a6:  shr    %ecx
0x003870a8:  jae    0x3870ac
0x003870aa:  movsw  %ds:(%esi),%es:(%edi)
0x003870ac:  rep movsl %ds:(%esi),%es:(%edi)
0x003870ae:  mov    %eax,%edi
0x003870b0:  mov    %edx,%esi
0x003870b2:  mov    0x4(%esp),%eax
0x003870b6:  ret

If I replace that optimized memcpy routine with my own, stupid memcpy (basically just dst[i] = src[i] in a loop), everything works fine and the bogus page fault does not occur. In turn, that leads me to suspect that the rep-prefixed instruction (rep movsl) is not being emulated properly on the host side, but I'm not quite sure of that, nor am I sure where to go from here.

Does anybody have any ideas of what I can do to further track this down?

-- 
Chris Lalancette
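
For reference, the two copy strategies discussed above can be sketched in portable C. This is only an illustration of the access pattern, not the actual glibc source or the replacement used in the test: the names naive_memcpy and chunked_memcpy are hypothetical, and a compiler is free to emit something other than rep movsl for the word loop, so this sketch is not expected to reproduce the bogus page fault by itself.

```c
#include <stddef.h>
#include <string.h>

/* Byte-at-a-time copy, i.e. the "dst[i] = src[i] in a loop" variant.
 * (Hypothetical name; stands in for the simple replacement routine.) */
static void *naive_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
    return dst;
}

/* C rendering of the disassembled routine's control flow
 * (hypothetical name, not the glibc implementation):
 *   - first "shr %ecx": carry = bit 0 of n, copy one byte if set (movsb)
 *   - second "shr %ecx": carry = bit 1 of n, copy two bytes if set (movsw)
 *   - remaining count is n / 4 four-byte words (rep movsl)
 * The small fixed-size memcpy calls avoid unaligned-access problems in C. */
static void *chunked_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    if (n & 1) {                 /* movsb */
        *d++ = *s++;
    }
    if (n & 2) {                 /* movsw */
        memcpy(d, s, 2);
        d += 2;
        s += 2;
    }
    for (size_t words = n >> 2; words > 0; words--) {   /* rep movsl */
        memcpy(d, s, 4);
        d += 4;
        s += 4;
    }
    return dst;
}
```

Both functions should produce identical results for any length; the only difference is the width and grouping of the individual memory accesses, which is the part that matters if the string instruction ends up being handled by the host.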