From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jeremy Fitzhardinge
Subject: Re: Fbdev graphics broken in xen/next dom0
Date: Sat, 27 Mar 2010 14:52:56 -0700
Message-ID: <4BAE7E38.9010506@goop.org>
References: <4B9AA301.6090303@tycho.nsa.gov> <4B9AB559.1070709@goop.org>
 <4B9ADFDB.1070300@tycho.nsa.gov> <4B9AE192.30104@goop.org>
 <20100316004630.GA7622@phenom.dumpdata.com> <4B9FFD87.6010908@tycho.nsa.gov>
 <20100316221952.GA10912@phenom.dumpdata.com> <4BABF7D5.9040109@tycho.nsa.gov>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Arvind R
Cc: George Coker, Eamon Walsh, Xen-devel, Konrad Rzeszutek Wilk
List-Id: xen-devel@lists.xenproject.org

On 03/27/2010 02:14 AM, Arvind R wrote:
> On Fri, Mar 26, 2010 at 5:25 AM, Eamon Walsh wrote:
>
>> On 03/16/2010 06:19 PM, Konrad Rzeszutek Wilk wrote:
>>
> < --- snip --->
>
>> I have attached the serial console output and dmesg output. The
>> initcall and drm debug stuff is present.
>>
>> Also, I get something new when I run the test program. It prints out:
>>
>> # ./silly
>> Mapped /dev/fb0 at 0x7f3237175000
>> Killed
>>
>> Message from syslogd@moss-flapper at Mar 25 19:25:52 ...
>> kernel:Bad pagetable: 000f [#1] SMP
>>
>>
> < --- snip --->
>
>> silly: Corrupted page table at address 7f3237175000
>> PGD 1deaec067 PUD 1db6d1067 PMD 1da569067 PTE fffffffffffff22f
>> Bad pagetable: 000f [#1] SMP
>> last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
>> CPU 1
>> Modules linked in: nfs fscache bridge stp llc ipt_MASQUERADE iptable_nat nf_nat nfsd lockd nfs_acl auth_rpcgss export]
>>
> < --- snip --->
>
>>>> [drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on minor 0
>>>>
>>>>
>>> You look to have a i915 framebuffer on your box.
>>>
>>> I *think* that the i915 is not using KMS and the TTM stuff, so the
>>> patch that Arvind posted would probably not help you.
>>> http://www.mail-archive.com/dri-devel@lists.sourceforge.net/msg48668.html
>>>
>>> So, lets boot your kernel with these command line parameters to get more
>>> data: debug initcall_debug drm.debug=255
>>>
> < --- snip --->
>
>
>>> e-mail thread titled: "Nouveau on dom0". It covers the gamma of things
>>> to troubleshoot this.
>>>
> This is related and most probably due to the same bit. xf86-video-fbdev works
> on bare-metal boot on XenNext with the nouveaufb driver but not on Xen.
> Have upgraded whole chain to tip except xen which is 3.4.3rc3
> Here is the syslog trace:
> kernel: ------------[ cut here ]------------
> kernel: WARNING: at arch/x86/mm/pat.c:872 track_pfn_vma_copy+0x4d/0x86()
> kernel: Hardware name: System Product Name
> kernel: Modules linked in: fbcon font bitblit softcursor nouveau ttm
> drm_kms_helper drm cfbcopyarea cfbimgblt cfbfillrect bridge stp llc
> ipv6 nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs fuse
> kernel: Pid: 5835, comm: Xorg Not tainted 2.6.32-xen0-git20100323+asusp5wd #1
> kernel: Call Trace:
> kernel: [] ? track_pfn_vma_copy+0x4d/0x86
> kernel: [] ? track_pfn_vma_copy+0x4d/0x86
> kernel: [] ? warn_slowpath_common+0x77/0xa3
> kernel: [] ? track_pfn_vma_copy+0x4d/0x86
> kernel: [] ? xen_leave_lazy_mmu+0x25/0x43
> kernel: [] ? copy_page_range+0x76/0x7f8
> kernel: [] ? xen_force_evtchn_callback+0x9/0xa
> kernel: [] ? check_events+0x12/0x20
> kernel: [] ? xen_restore_fl_direct_end+0x0/0x1
> kernel: [] ? dup_mm+0x276/0x409
> kernel: [] ? copy_process+0x9c8/0x10ff
> kernel: [] ? do_fork+0x146/0x2c0
> kernel: [] ? stub_clone+0x13/0x20
> kernel: [] ? system_call_fastpath+0x16/0x1b
> kernel: ---[ end trace c58bf004d15b0c42 ]---
>
> Xorg.log ends with the same message as originally with trying
> accelerated nouveau with misleading
> XKB: Failed to compile keymap
>
> fbdev.c calls fbdevHWMapVidmem in xorg-server/hw/xfree86/fbdevhw.c
> which does a mmap as in silly.c. As far as X is concerned, everything
> is fine, but there is obviously a page-fault problem. Will have to setup
> debug options and trace :-(
>
> The 'corrupted page table' syndrome is also present in the accelerated
> nouveau with AGP cards - so it may be linked to this problem. At least
> this problem can be repeated on many platforms :-)
>

The "corrupt pagetable" comes from the pte having invalid reserved bits
set in it. I think the failure path is this:

The bad bits get set because someone is doing a pfn->mfn conversion on a
page which is already an mfn, and which therefore doesn't have a valid
pfn->mfn mapping. The result of the conversion is either 0xff... or
0x7f... (I forget right now). Either way, a whole lot of bits get set,
but nothing useful.

I'm not quite sure why Xen isn't complaining about this at set-pte time,
but perhaps it looks vaguely valid to it (perhaps it sees the invalid
flags, knows the pte can't be used to access anything, and allows it to
be set?).

The fault itself happens because usermode gets a tlb miss, the CPU finds
a pte with reserved bits set, and it raises the fault.

I'm not sure about the mm/pat.c warning, though. I had a quick look at
that code, but it wasn't obvious to me what's going on there. Something
about handling the IO mapping during a fork(). Not sure if it's related
or not.

    J
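
For illustration only, here is a rough Python model of the arithmetic behind that failure path. It is not the kernel code. The all-ones "invalid" sentinel is an assumption (the message above leaves open whether it is 0xff... or 0x7f...), picked because it reproduces the PTE value fffffffffffff22f from the quoted oops. The `P2M` table, the frame numbers in it, and the 40-bit physical address width are hypothetical. The flag value 0x22f is simply the low 12 bits of that logged PTE, and the error-code decoding follows the standard x86 page-fault error code layout.

```python
PAGE_SHIFT = 12
MASK64 = (1 << 64) - 1

# Assumption: the "no mapping" sentinel is all ones.
INVALID_P2M_ENTRY = MASK64

# Assumption: 40 physical address bits; PTE bits 40..51 are then reserved.
PHYS_BITS = 40

# Hypothetical pfn -> mfn table for ordinary RAM pages of the domain.
P2M = {0x1000: 0x1DB6D1}


def pfn_to_mfn(pfn):
    """Look up the machine frame for a pseudo-physical frame."""
    return P2M.get(pfn, INVALID_P2M_ENTRY)


def make_pte(frame, flags):
    """Combine a frame number and flag bits into a 64-bit PTE value."""
    return ((frame << PAGE_SHIFT) | flags) & MASK64


def reserved_bits(pte):
    """Return the PTE bits that fall between PHYS_BITS and bit 51."""
    mask = ((1 << 52) - 1) & ~((1 << PHYS_BITS) - 1)
    return pte & mask


def decode_error_code(code):
    """Name the low four bits of an x86 page-fault error code."""
    names = ["present", "write", "user", "reserved-bit"]
    return [name for bit, name in enumerate(names) if code & (1 << bit)]


# A framebuffer frame is already an mfn (hypothetical value), so it has
# no entry in the table and the second conversion yields the sentinel.
framebuffer_mfn = 0xD0000
converted = pfn_to_mfn(framebuffer_mfn)
bad_pte = make_pte(converted, 0x22F)

print(hex(bad_pte))
print(hex(reserved_bits(bad_pte)))
print(decode_error_code(0x000F))
```

Under these assumptions the doubly converted frame produces a PTE with every address bit set, including the reserved ones, and the logged error code 000f decodes as a user-mode write to a present page with a reserved-bit violation.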