From mboxrd@z Thu Jan  1 00:00:00 1970
From: Arvind R <arvino55@gmail.com>
Subject: Re: Fbdev graphics broken in xen/next dom0
Date: Sun, 28 Mar 2010 15:03:16 +0530
Message-ID: <d799c4761003280233y4232e3a7gb17d1ec6f0e32a4d@mail.gmail.com>
References: <4B9AA301.6090303@tycho.nsa.gov> <4B9AB559.1070709@goop.org>
	<4B9ADFDB.1070300@tycho.nsa.gov> <4B9AE192.30104@goop.org>
	<20100316004630.GA7622@phenom.dumpdata.com>
	<4B9FFD87.6010908@tycho.nsa.gov>
	<20100316221952.GA10912@phenom.dumpdata.com>
	<4BABF7D5.9040109@tycho.nsa.gov>
	<d799c4761003270214x343c8298j14508430c388a1bf@mail.gmail.com>
	<4BAE7E38.9010506@goop.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Return-path: <xen-devel-bounces@lists.xensource.com>
In-Reply-To: <4BAE7E38.9010506@goop.org>
List-Unsubscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xensource.com?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xensource.com>
List-Help: <mailto:xen-devel-request@lists.xensource.com?subject=help>
List-Subscribe: <http://lists.xensource.com/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xensource.com?subject=subscribe>
Sender: xen-devel-bounces@lists.xensource.com
Errors-To: xen-devel-bounces@lists.xensource.com
To: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: George Coker <gscoker@tycho.nsa.gov>, Eamon Walsh <ewalsh@tycho.nsa.gov>, Xen-devel <xen-devel@lists.xensource.com>, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
List-Id: xen-devel@lists.xenproject.org

On Sun, Mar 28, 2010 at 3:22 AM, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> On 03/27/2010 02:14 AM, Arvind R wrote:
>>
>> On Fri, Mar 26, 2010 at 5:25 AM, Eamon Walsh<ewalsh@tycho.nsa.gov> wrote:
>>
>>>
>>> On 03/16/2010 06:19 PM, Konrad Rzeszutek Wilk wrote:
>>>
>>
>> < --- snip --->
>>
>>>
>>> I have attached the serial console output and dmesg output.  The
>>> initcall and drm debug stuff is present.
>>>
>>> Also, I get something new when I run the test program.  It prints out:
>>>
>>> # ./silly
>>> Mapped /dev/fb0 at 0x7f3237175000
>>> Killed
>>>
>>> Message from syslogd@moss-flapper at Mar 25 19:25:52 ...
>>>  kernel:Bad pagetable: 000f [#1] SMP
>>>
>>>
>>
>> < --- snip --->
>>
>>>
>>> silly: Corrupted page table at address 7f3237175000
>>> PGD 1deaec067 PUD 1db6d1067 PMD 1da569067 PTE fffffffffffff22f
>>> Bad pagetable: 000f [#1] SMP
>>> last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
>>> CPU 1
>>> Modules linked in: nfs fscache bridge stp llc ipt_MASQUERADE iptable_nat
>>> nf_nat nfsd lockd nfs_acl auth_rpcgss export]
>>>
>>
>> < --- snip --->
>>
>>>>>
>>>>> [drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0 on minor 0
>>>>>
>>>>>
>>>>
>>>> You look to have a i915 framebuffer on your box.
>>>>
>>>> I *think* that the i915 is not using KMS and the TTM stuff, so the
>>>> patch that Arvind posted would probably not help you.
>>>>
>>>> http://www.mail-archive.com/dri-devel@lists.sourceforge.net/msg48668.html
>>>>
>>>> So, lets boot your kernel with these command line parameters to get more
>>>> data: debug initcall_debug drm.debug=255
>>>>
>>
>> < --- snip --->
>>
>>
>>>>
>>>> e-mail thread titled: "Nouveau on dom0". It covers the gamma of things
>>>> to troubleshoot this.
>>>>
>>
>> This is related and most probably due to the same bit. xf86-video-fbdev
>> works
>> on bare-metal boot on XenNext with the nouveaufb driver but not on Xen.
>> Have upgraded whole chain to tip except xen which is 3.4.3rc3
>> Here is the syslog trace:
>> kernel: ------------[ cut here ]------------
>> kernel: WARNING: at arch/x86/mm/pat.c:872 track_pfn_vma_copy+0x4d/0x86()
>> kernel: Hardware name: System Product Name
>> kernel: Modules linked in: fbcon font bitblit softcursor nouveau ttm
>> drm_kms_helper drm cfbcopyarea cfbimgblt cfbfillrect bridge stp llc
>> ipv6 nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs fuse
>> kernel: Pid: 5835, comm: Xorg Not tainted 2.6.32-xen0-git20100323+asusp5wd #1
>> kernel: Call Trace:
>> kernel:  [<ffffffff8102c834>] ? track_pfn_vma_copy+0x4d/0x86
>> kernel:  [<ffffffff8102c834>] ? track_pfn_vma_copy+0x4d/0x86
>> kernel:  [<ffffffff8103ce54>] ? warn_slowpath_common+0x77/0xa3
>> kernel:  [<ffffffff8102c834>] ? track_pfn_vma_copy+0x4d/0x86
>> kernel:  [<ffffffff8100c436>] ? xen_leave_lazy_mmu+0x25/0x43
>> kernel:  [<ffffffff81090c49>] ? copy_page_range+0x76/0x7f8
>> kernel:  [<ffffffff8100ddc9>] ? xen_force_evtchn_callback+0x9/0xa
>> kernel:  [<ffffffff8100e572>] ? check_events+0x12/0x20
>> kernel:  [<ffffffff8100e55f>] ? xen_restore_fl_direct_end+0x0/0x1
>> kernel:  [<ffffffff8103b1f2>] ? dup_mm+0x276/0x409
>> kernel:  [<ffffffff8103bd82>] ? copy_process+0x9c8/0x10ff
>> kernel:  [<ffffffff8103c5ff>] ? do_fork+0x146/0x2c0
>> kernel:  [<ffffffff810110a3>] ? stub_clone+0x13/0x20
>> kernel:  [<ffffffff81010d82>] ? system_call_fastpath+0x16/0x1b
>> kernel: ---[ end trace c58bf004d15b0c42 ]---
>>
>> Xorg.log ends with the same message as originally with trying
>> accelerated nouveau with misleading
>> XKB: Failed to compile keymap
>>
>> fbdev.c calls fbdevHWMapVidmem in xorg-server/hw/xfree86/fbdevhw.c
>> which does a mmap as in silly.c.  As far as X is concerned, everything
>> is fine, but there is obviously a page-fault problem. Will have to setup
>> debug options and trace :-(
>>
>> The 'corrupted page table' syndrome is also present in the accelerated
>> nouveau with AGP cards - so it may be linked to this problem. At least
>> this problem can be repeated on many platforms :-)
>>
>
> The "corrupt pagetable" comes from the pte having invalid reserved bits set
> in it.  I think the failure path is this:
>
> The bad bits get set because someone is doing a pfn->mfn conversion on a
> page which is already an mfn, and doesn't have a valid pfn->mfn mapping, and
> the result of the conversion is either 0xff... or 0x7f... (I forget right
> now).  But either way, a whole lot of bits get set, but nothing useful.  I'm
> not quite sure why Xen isn't complaining about this at set-pte time, but
> perhaps it looks vaguely valid to it (perhaps it sees the invalid flags,
> knows the pte can't be used to access anything, and allows it to be set?).

OK
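
For reference, here is how I read the PTE and the error code in Eamon's oops
quoted above. This is only a sketch: the flag layout is the standard x86-64
4 KiB PTE format, the helper names are mine, and MAXPHYADDR = 40 is an
assumption about the box (any width below 52 leads to the same conclusion,
because bits 12-51 of this PTE are all set).

```python
# Bit positions of the architectural flags in an x86-64 4 KiB PTE.
PTE_FLAGS = {0: "P", 1: "RW", 2: "US", 3: "PWT", 4: "PCD",
             5: "A", 6: "D", 7: "PAT", 8: "G"}
# Bit positions in the x86 page-fault error code.
PF_ERR = {0: "P", 1: "W", 2: "U", 3: "RSVD", 4: "I"}


def set_bits(value, names):
    """Return the names of the flag bits that are set in value."""
    return [name for bit, name in sorted(names.items()) if (value >> bit) & 1]


def reserved_bits(pte, maxphyaddr):
    """Return the PTE bits in maxphyaddr..51, which must be zero if present."""
    mask = ((1 << 52) - 1) & ~((1 << maxphyaddr) - 1)
    return pte & mask


pte = 0xfffffffffffff22f   # "PTE fffffffffffff22f" from the oops
err = 0x000f               # "Bad pagetable: 000f"
MAXPHYADDR = 40            # assumption, not taken from the report

print("PTE flags :", set_bits(pte, PTE_FLAGS))
print("fault code:", set_bits(err, PF_ERR))
print("reserved  :", hex(reserved_bits(pte, MAXPHYADDR)))
```

The RSVD bit in the error code lines up with the non-zero reserved field in
the PTE, which is consistent with Jeremy's reading of the oops.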

>  But this fault is happening because usermode gets a tlb miss, and the CPU
> finds a pte with reserved bits set, and raises the fault.

Sorry, no faults!

> I'm not sure about the mm/pat.c warning thought.  I had a quick look at that
> code, but it wasn't obvious to me what's going on there.  Something about
> handing the IO mapping during a fork().  Not sure if its related or not.
>
>    J
>
I was mistaken in assuming a fault. My guess is that Jeremy's failure path is
right, minus the fault. The hang occurs after the kernel mode setting has
completed, but usermode (which thinks all is well) is somehow unable to
create or write to its map of the framebuffer. The system stays responsive,
but there are no consoles.
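
To make the suspected failure path concrete, here is a toy model of it. It is
not the kernel's code: the p2m table, the frame numbers and make_pte are all
made up for illustration, and I am assuming the failed lookup returns all
ones (Jeremy was not sure whether it is 0xff... or 0x7f...).

```python
PAGE_SHIFT = 12
U64 = (1 << 64) - 1
INVALID_P2M_ENTRY = U64     # assumption: a failed pfn->mfn lookup is all ones


def pfn_to_mfn(p2m, pfn):
    """Translate a pseudo-physical frame to a machine frame via the toy p2m."""
    return p2m.get(pfn, INVALID_P2M_ENTRY)


def make_pte(p2m, frame, flags):
    """Build a PTE by treating frame as a pfn, whether or not it really is one."""
    mfn = pfn_to_mfn(p2m, frame)
    return ((mfn << PAGE_SHIFT) | flags) & U64


p2m = {0x1000: 0x2abcd}     # hypothetical: one ordinary RAM page
fb_mfn = 0xd0000            # hypothetical machine frame inside the video aperture
flags = 0x22f               # low 12 bits of the PTE in the oops

good = make_pte(p2m, 0x1000, flags)   # real pfn, translated correctly
bad = make_pte(p2m, fb_mfn, flags)    # already an mfn, so the lookup fails

print(hex(good))
print(hex(bad))
```

Under that assumption the bad case reproduces the PTE value from Eamon's
oops, fffffffffffff22f.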

The FBDev DDX driver mmaps the framebuffer device once, during initialization,
in fbdevHWMapVidmem. Subsequent calls return the previously mapped address.
But unfortunately, the first mmap of the device finds it already mapped by the
console drivers (I presume), with VM_IO set in the shareable mapping.
Is this the first case where the mapped area is iomem (backed by the graphics
card memory) and is already mapped?
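
For anyone who wants to repeat the userspace side without X, the
mmap-and-write sequence can be sketched as below. It is a Python stand-in for
what I understand silly.c and fbdevHWMapVidmem to do, not a copy of either.
map_shared and fill are names I made up, and the length is passed in by hand,
where a real client would presumably take it from the driver's fixed screen
info.

```python
import mmap
import os


def map_shared(path, length):
    """Open path read/write and map length bytes with MAP_SHARED."""
    fd = os.open(path, os.O_RDWR)
    try:
        return mmap.mmap(fd, length, flags=mmap.MAP_SHARED,
                         prot=mmap.PROT_READ | mmap.PROT_WRITE)
    finally:
        # mmap keeps its own duplicate of the descriptor, so ours can go.
        os.close(fd)


def fill(mapping, byte):
    """Write one byte value over the whole mapping, touching every page."""
    mapping[:] = bytes([byte]) * len(mapping)
```

On the affected box the call would be something like
map_shared("/dev/fb0", length) followed by fill(m, 0xff). The mmap itself is
expected to succeed; it is the first write through the mapping that would
exercise the bad pte.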

In mmap_region() in mm/mmap.c I see the vma created for the mmap, and it does
not have VM_IO set initially. The driver's f_ops->mmap should be able to set it.
But the common drm_mmap entry point is not being entered at all, in either the
bare-metal boot (working) or the Xen boot (not working) case!

What am I missing?