From mboxrd@z Thu Jan 1 00:00:00 1970
From: Arvind R
Subject: Re: Fbdev graphics broken in xen/next dom0
Date: Sun, 28 Mar 2010 15:03:16 +0530
Message-ID:
References: <4B9AA301.6090303@tycho.nsa.gov> <4B9AB559.1070709@goop.org>
 <4B9ADFDB.1070300@tycho.nsa.gov> <4B9AE192.30104@goop.org>
 <20100316004630.GA7622@phenom.dumpdata.com> <4B9FFD87.6010908@tycho.nsa.gov>
 <20100316221952.GA10912@phenom.dumpdata.com> <4BABF7D5.9040109@tycho.nsa.gov>
 <4BAE7E38.9010506@goop.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Return-path:
In-Reply-To: <4BAE7E38.9010506@goop.org>
List-Unsubscribe: ,
List-Post:
List-Help:
List-Subscribe: ,
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Jeremy Fitzhardinge
Cc: George Coker, Eamon Walsh, Xen-devel, Konrad Rzeszutek Wilk
List-Id: xen-devel@lists.xenproject.org

On Sun, Mar 28, 2010 at 3:22 AM, Jeremy Fitzhardinge wrote:
> On 03/27/2010 02:14 AM, Arvind R wrote:
>>
>> On Fri, Mar 26, 2010 at 5:25 AM, Eamon Walsh wrote:
>>
>>>
>>> On 03/16/2010 06:19 PM, Konrad Rzeszutek Wilk wrote:
>>>
>>
>> < --- snip --->
>>
>>>
>>> I have attached the serial console output and dmesg output.  The
>>> initcall and drm debug stuff is present.
>>>
>>> Also, I get something new when I run the test program.  It prints out:
>>>
>>> # ./silly
>>> Mapped /dev/fb0 at 0x7f3237175000
>>> Killed
>>>
>>> Message from syslogd@moss-flapper at Mar 25 19:25:52 ...
>>>  kernel:Bad pagetable: 000f [#1] SMP
>>>
>>>
>>
>> < --- snip --->
>>
>>>
>>> silly: Corrupted page table at address 7f3237175000
>>> PGD 1deaec067 PUD 1db6d1067 PMD 1da569067 PTE fffffffffffff22f
>>> Bad pagetable: 000f [#1] SMP
>>> last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
>>> CPU 1
>>> Modules linked in: nfs fscache bridge stp llc ipt_MASQUERADE iptable_nat
>>> nf_nat nfsd lockd nfs_acl auth_rpcgss export]
>>>
>>
>> < --- snip --->
>>
>>>>>
>>>>> [drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on minor 0
>>>>>
>>>>>
>>>>
>>>> You look to have a i915 framebuffer on your box.
>>>>
>>>> I *think* that the i915 is not using KMS and the TTM stuff, so the
>>>> patch that Arvind posted would probably not help you.
>>>>
>>>> http://www.mail-archive.com/dri-devel@lists.sourceforge.net/msg48668.html
>>>>
>>>> So, lets boot your kernel with these command line parameters to get more
>>>> data: debug initcall_debug drm.debug=255
>>>>
>>
>> < --- snip --->
>>
>>
>>>>
>>>> e-mail thread titled: "Nouveau on dom0". It covers the gamma of things
>>>> to troubleshoot this.
>>>>
>>
>> This is related and most probably due to the same bit. xf86-video-fbdev
>> works on bare-metal boot on XenNext with the nouveaufb driver but not on Xen.
>> Have upgraded whole chain to tip except xen which is 3.4.3rc3
>> Here is the syslog trace:
>> kernel: ------------[ cut here ]------------
>> kernel: WARNING: at arch/x86/mm/pat.c:872 track_pfn_vma_copy+0x4d/0x86()
>> kernel: Hardware name: System Product Name
>> kernel: Modules linked in: fbcon font bitblit softcursor nouveau ttm
>> drm_kms_helper drm cfbcopyarea cfbimgblt cfbfillrect bridge stp llc
>> ipv6 nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs fuse
>> kernel: Pid: 5835, comm: Xorg Not tainted 2.6.32-xen0-git20100323+asusp5wd #1
>> kernel: Call Trace:
>> kernel:  [] ? track_pfn_vma_copy+0x4d/0x86
>> kernel:  [] ? track_pfn_vma_copy+0x4d/0x86
>> kernel:  [] ? warn_slowpath_common+0x77/0xa3
>> kernel:  [] ? track_pfn_vma_copy+0x4d/0x86
>> kernel:  [] ? xen_leave_lazy_mmu+0x25/0x43
>> kernel:  [] ? copy_page_range+0x76/0x7f8
>> kernel:  [] ? xen_force_evtchn_callback+0x9/0xa
>> kernel:  [] ? check_events+0x12/0x20
>> kernel:  [] ? xen_restore_fl_direct_end+0x0/0x1
>> kernel:  [] ? dup_mm+0x276/0x409
>> kernel:  [] ? copy_process+0x9c8/0x10ff
>> kernel:  [] ? do_fork+0x146/0x2c0
>> kernel:  [] ? stub_clone+0x13/0x20
>> kernel:  [] ? system_call_fastpath+0x16/0x1b
>> kernel: ---[ end trace c58bf004d15b0c42 ]---
>>
>> Xorg.log ends with the same message as originally with trying
>> accelerated nouveau with misleading
>> XKB: Failed to compile keymap
>>
>> fbdev.c calls fbdevHWMapVidmem in xorg-server/hw/xfree86/fbdevhw.c
>> which does a mmap as in silly.c.  As far as X is concerned, everything
>> is fine, but there is obviously a page-fault problem. Will have to setup
>> debug options and trace :-(
>>
>> The 'corrupted page table' syndrome is also present in the accelerated
>> nouveau with AGP cards - so it may be linked to this problem. At least
>> this problem can be repeated on many platforms :-)
>>
>
> The "corrupt pagetable" comes from the pte having invalid reserved bits set
> in it.  I think the failure path is this:
>
> The bad bits get set because someone is doing a pfn->mfn conversion on a
> page which is already an mfn, and doesn't have a valid pfn->mfn mapping, and
> the result of the conversion is either 0xff... or 0x7f... (I forget right
> now).  But either way, a whole lot of bits get set, but nothing useful.  I'm
> not quite sure why Xen isn't complaining about this at set-pte time, but
> perhaps it looks vaguely valid to it (perhaps it sees the invalid flags,
> knows the pte can't be used to access anything, and allows it to be set?).

OK

>  But this fault is happening because usermode gets a tlb miss, and the CPU
> finds a pte with reserved bits set, and raises the fault.

Sorry, no faults!

> I'm not sure about the mm/pat.c warning thought.  I had a quick look at that
> code, but it wasn't obvious to me what's going on there.  Something about
> handing the IO mapping during a fork().  Not sure if its related or not.
>
>    J
>

I was mistaken in assuming a fault. My guess is that Jeremy's failure-path
reasoning is right, minus the fault. The hang occurs after kernel
mode-setting has completed, but usermode (which thinks all is hunky-dory) is
somehow unable to create or write to its map of the framebuffer. The system
stays responsive, but there are no consoles.

The FBDev DDX driver mmaps the framebuffer device once, during initialization,
in fbdevHWMapVidmem. Subsequent calls return the previously mapped address.
Unfortunately, the first mmap of the device finds it already mapped by the
console drivers (I presume), with VM_IO set in the shareable mapping. Is this
the first case where the mapped area is iomem (backed by the graphics card
memory) and is already mapped?

In mmap_region() in mm/mmap.c I see the vma created for the mmap, and it does
not have VM_IO set initially. The driver's f_ops->mmap should be able to set
it. But the common drm_mmap entry point is not being entered at all, in both
the bare-boot (working) and xen-boot (not working) cases! What am I missing?