public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Paul Gortmaker <paul.gortmaker@windriver.com>
To: Toshi Kani <toshi.kani@hpe.com>
Cc: Borislav Petkov <bp@suse.de>,
	Richard Purdie <richard.purdie@linuxfoundation.org>,
	Toshi Kani <toshi.kani@hp.com>,
	Bruce Ashfield <bruce.ashfield@windriver.com>,
	openembedded-core <openembedded-core@lists.openembedded.org>,
	"Hart, Darren" <darren.hart@intel.com>,
	"saul.wold" <saul.wold@intel.com>, <linux-kernel@vger.kernel.org>
Subject: Re: runtime regression with "x86/mm/pat: Emulate PAT when it is disabled"
Date: Fri, 4 Mar 2016 13:37:14 -0500	[thread overview]
Message-ID: <20160304183713.GA26051@windriver.com> (raw)
In-Reply-To: <1457067768.15454.181.camel@hpe.com>

[Re: runtime regression with "x86/mm/pat: Emulate PAT when it is disabled"] On 03/03/2016 (Thu 22:02) Toshi Kani wrote:

> On Thu, 2016-03-03 at 15:59 -0500, Paul Gortmaker wrote:
> > So, the yocto folks moved from 4.1 to 4.4 and one of their automated
> > qemu x86-32 boot tests started failing.  None of the yocto details seem
> > to matter since I offered to help and I've repropduced it using 100%
> > mainline kernels and a generic distro toolchain as well.
> > 
> > The test case is slightly complicated, in that it relies on uvesafb
> > being modular, and so one has to juggle modules within an ext4 image
> > that qemu boots from.  We tried making uvesafb builtin, but that made
> > the issue magically vanish.  Given PAT, this isn't too surprising.
> > 
> > Richard did the preliminary investigation and analysis, and from that I
> > did a bisect, and found the commit in $SUBJECT to be the root cause, as
> > per the discussion here:
> > 
> > http://lists.openembedded.org/pipermail/openembedded-core/2016-March/1183
> > 97.html
> > 
> > I'd mentioned the above to bpetkov on IRC and after confirming it was
> > still an issue on 4.5-rc6, he'd asked if I had a portable reproducer.  
> > 
> > Not sure how complicated that would be, I set out to make one from my
> > build.   With a little LD_PRELOAD type magic and ensuring all the qemu
> > components are in ./  I have one that runs on an otherwise qemu-free
> > x86-64 box. 
> > 
> > The stand alone reproducer is here; launched in 00-runme:
> > 
> > http://openlinux.wrs.com/pat-splat/reproducer.tar.bz2  
> > 
> > It is nothing fancy, just a generic yocto build of "sato" (gfx enabled
> > rootfs).  When it "works" it boots to a UI touchscreen interface.  When
> > it fails, you get a black screen with a blinking cursor (as seen in
> > "vncviewer localhost:0").
> 
> Thanks for tracking down, and packaging the reproducer.  I simply untar'd
> and ran 00-runme, but was not able to connect with localhost:0.  I am not
> familiar with qemu, so I have not looked into why, though...

Maybe it was localhost:1 in your case?  The qemu should have indicated
what vncserver sessions it started.  Can you paste in the output from
the 00-runme?   I tested the reproducer on a machine that was physically
distinct from the build, and that was a generic ubuntu install, but with
no qemu support installed at all and it worked there.  Plus I got Bruce
to test it worked on his machine, so I'm rather surprised it did not
work for you.

> 
> Anyway, with regarding the error message:
>   "x86/PAT: Xorg:705 map pfn expected mapping type uncached-minus for [mem
> 0xfd000000-0xfdffffff], got write-combining"
> 
> Did it came from the following path during fork()?
>  copy_process
>   copy_mm
>    dup_mm
>     dup_mmap
>      copy_page_range
>       track_pfn_copy
>        reserve_pfn_range

The trace is consistent, and was already captured by Richard, as per:

http://lists.openembedded.org/pipermail/openembedded-core/2016-March/118397.html

which is the link given earlier.  When I say consistent, I mean that I
get essentially the same thing when booting 4.5-rc6:

[   30.098100] x86/PAT: Xorg:509 map pfn expected mapping type uncached-minus for [mem 0xfd000000-0xfdffffff], got write-combining
[   30.106782] ------------[ cut here ]------------
[   30.107093] WARNING: CPU: 0 PID: 509 at /home/paul/poky/build/tmp-glibc/work-shared/qemux86/kernel-source/arch/x86/mm/pat.c:986 untrack_pfn+0x9f/0xb0()
[   30.112553] Modules linked in: 8021q parport_pc parport floppy uvesafb
[   30.113766] CPU: 0 PID: 509 Comm: Xorg Not tainted 4.5.0-rc6-yocto-standard #1
[   30.113806] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
[   30.114078]  00000000 00003286 c0149d78 c13a6c7f 00000000 00000000 c0149dac c1052bcb
[   30.114214]  c1ac7544 00000000 000001fd c1ac1ea4 000003da c104cbdf 000003da c104cbdf
[   30.114214]  00000000 cdcf0528 00000000 c0149dbc c1052ca2 00000009 00000000 c0149de0
[   30.114214] Call Trace:
[   30.114214]  [<c13a6c7f>] dump_stack+0x58/0x79
[   30.114214]  [<c1052bcb>] warn_slowpath_common+0x8b/0xc0
[   30.114214]  [<c104cbdf>] ? untrack_pfn+0x9f/0xb0
[   30.114214]  [<c104cbdf>] ? untrack_pfn+0x9f/0xb0
[   30.114214]  [<c1052ca2>] warn_slowpath_null+0x22/0x30
[   30.114214]  [<c104cbdf>] untrack_pfn+0x9f/0xb0
[   30.114214]  [<c104ecf4>] ? __kunmap_atomic+0x54/0x110
[   30.114214]  [<c114f1cf>] unmap_single_vma+0x56f/0x580
[   30.114214]  [<c11321d0>] ? pagevec_move_tail_fn+0xa0/0xa0
[   30.114214]  [<c1150123>] unmap_vmas+0x43/0x60
[   30.114214]  [<c1154d5f>] exit_mmap+0x5f/0xf0
[   30.114214]  [<c10507bd>] mmput+0x2d/0xa0
[   30.114214]  [<c1051c19>] copy_process.part.47+0x1229/0x1430
[   30.114214]  [<c1051fb4>] _do_fork+0xb4/0x3b0
[   30.114214]  [<c105239c>] SyS_clone+0x2c/0x30
[   30.114214]  [<c1001a04>] do_syscall_32_irqs_on+0x54/0xb0
[   30.114214]  [<c18b06ca>] entry_INT80_32+0x2a/0x2a
[   30.124383] ---[ end trace f7c8a5d94542f94e ]---

> 
> If so, track_pfn_copy() obtained pgprot from a PTE, and called
> reserve_pfn_range() with it.  So, the error message indicates that previous
> ioremap_wc() (i.e. pcm WC) resulted in creating UC- map (i.e. pgprot UC-).
>  pcm is a logical cache type and pgprot is a HW cache type.  They can be
> different when CPU does not have support for a given logical type.  This WC
> to UC- conversion happens when CPU does not support PAT.
> 
> Richard's change, which compares with pgprot values in reserve_pfn_range()
> is a good one, but I do not understand how we get into this mess.  We do
> not have this check when PAT is disabled, and WC is supported when PAT is
> enabled.
> 
> Commit 9cd25aac1 changed the initial values of the pcm<->pgrot conversion
> tables.  The tables should be initialized with the same values after
> pat_init() is called.  Is there any possibility that ioremap_wc() was
> called before pat_init()..?

I don't think it is an initcall ordering thing; recall that I said the
problem seems to go away when built-in vs uvesafb as module.  So given
that, I think it is more related to where the code lands.

> 
> Also, can you send me a whole dmesg output?  I'd like to check how PAT is
> initialized.

I'll send the full file off list vs. spamming everyone with it.  I'm open
to booting the pre-fail commit with PAT specific bootargs and the post-fail
with the same and diffing the two dmesg if there are bootargs you'd like
me to test.  I'd also like to ensure you have a working reproducer locally
so maybe we should look at how that failed 1st.

Thanks,
Paul.
--

> 
> Thanks!
> -Toshi

  reply	other threads:[~2016-03-04 18:37 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-03-03 20:59 runtime regression with "x86/mm/pat: Emulate PAT when it is disabled" Paul Gortmaker
2016-03-03 21:18 ` Paul Gortmaker
2016-03-04  5:02 ` Toshi Kani
2016-03-04 18:37   ` Paul Gortmaker [this message]
2016-03-04 22:12     ` Toshi Kani
2016-03-07  0:35       ` Paul Gortmaker
2016-03-07 16:03         ` Toshi Kani
     [not found]           ` <20160307210852.GC26051@windriver.com>
2016-03-07 23:38             ` Toshi Kani
2016-03-07 23:53               ` Paul Gortmaker
2016-03-08  0:56                 ` Toshi Kani
2016-03-08  1:35                   ` Toshi Kani
2016-03-08  3:28                     ` Paul Gortmaker
2016-03-08 16:38                       ` Toshi Kani
2016-03-10 14:42                     ` Paul Gortmaker
2016-03-10 16:49                       ` Toshi Kani
2016-03-10 17:20                         ` Borislav Petkov
2016-03-10 19:04                           ` Paul Gortmaker
2016-03-10 19:19                             ` Borislav Petkov
2016-03-11 13:23                               ` One Thousand Gnomes
2016-03-11 13:40                                 ` Borislav Petkov
2016-03-11 19:18                                   ` Paolo Bonzini
2016-03-11 22:16                                     ` Borislav Petkov
2016-03-11 22:28                                       ` Bruce Ashfield
2016-03-11 23:29                                         ` Richard Purdie
2016-03-12 12:03                                           ` Borislav Petkov
2016-03-10 20:12                             ` Toshi Kani
2016-03-10 20:04                           ` Toshi Kani
2016-03-10 19:20                             ` Borislav Petkov
2016-03-10 20:24                               ` Toshi Kani
2016-03-10 21:07                                 ` Borislav Petkov
2016-03-10 23:17                                   ` Toshi Kani
2016-03-08  3:16                   ` Paul Gortmaker
2016-03-08 16:13                     ` Toshi Kani
2016-03-08 16:03                       ` Paul Gortmaker
2016-03-08 17:01                         ` Toshi Kani

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160304183713.GA26051@windriver.com \
    --to=paul.gortmaker@windriver.com \
    --cc=bp@suse.de \
    --cc=bruce.ashfield@windriver.com \
    --cc=darren.hart@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=openembedded-core@lists.openembedded.org \
    --cc=richard.purdie@linuxfoundation.org \
    --cc=saul.wold@intel.com \
    --cc=toshi.kani@hp.com \
    --cc=toshi.kani@hpe.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox