From: Stefan Bader
Subject: Re: PATastic fun
Date: Tue, 26 Feb 2013 16:32:37 +0100
Message-ID: <512CD595.2000502@canonical.com>
References: <51277888.50908@canonical.com> <20130222145316.GB8017@phenom.dumpdata.com> <512B2A7C.1050906@canonical.com>
In-Reply-To: <512B2A7C.1050906@canonical.com>
To: "Liu, Jinsong"
Cc: Colin Ian King, "xen-devel@lists.xensource.com", Konrad Rzeszutek Wilk, Sander Eikelenboom
List-Id: xen-devel@lists.xenproject.org

On 25.02.2013 10:10, Stefan Bader wrote:
> On 25.02.2013 04:15, Liu, Jinsong wrote:
>> Konrad Rzeszutek Wilk wrote:
>>> On Fri, Feb 22, 2013 at 02:54:16PM +0100, Stefan Bader wrote:
>>>> Hi Konrad,
>>>
>>> Hey Stefan,
>>>>
>>>> here is another one from the hm-what? department:
>>>
>>> Heh - the really good-bug-hunting one. Let's also include Jinsong as
>>> he has been tracking a similar one with mcelog.
>>>>
>>>> Colin discovered that running the attached program with the fork
>>>> active (e.g.
>>>> "./mmap-example -f 0x10000", the address can be that or
>>>> iomem) triggers the following weird messages:
>>>>
>>>> [ 6824.453724] mmap-example:3481 map pfn expected mapping type
>>>> write-back for [mem 0x00010000-0x00010fff], got uncached-minus
>>>> [ 6824.453776] ------------[ cut here ]------------
>>>> [ 6824.453796] WARNING: at
>>>> /build/buildd/linux-3.8.0/arch/x86/mm/pat.c:774
>>>> untrack_pfn+0xb8/0xd0() ...
>>>> [ 6824.453920] Pid: 3481, comm: mmap-example Tainted: GF
>>>> 3.8.0-6-generic #13-Ubuntu
>>>> [ 6824.453926] Call Trace:
>>>> [ 6824.453944] [] warn_slowpath_common+0x7f/0xc0
>>>> [ 6824.453954] [] warn_slowpath_null+0x1a/0x20
>>>> [ 6824.453963] [] untrack_pfn+0xb8/0xd0
>>>> [ 6824.453975] [] unmap_single_vma+0xac/0x100
>>>> [ 6824.453985] [] unmap_vmas+0x49/0x90
>>>> [ 6824.453995] [] exit_mmap+0x98/0x170
>>>> [ 6824.454007] [] mmput+0x64/0x100
>>>> [ 6824.454017] [] dup_mm+0x445/0x660
>>>> [ 6824.454027] [] copy_process.part.22+0xa5f/0x1510
>>>> [ 6824.454038] [] do_fork+0x91/0x350
>>>> [ 6824.454048] [] sys_clone+0x16/0x20
>>>> [ 6824.454060] [] stub_clone+0x69/0x90
>>>> [ 6824.454069] [] ? system_call_fastpath+0x1a/0x1f
>>>> [ 6824.454076] ---[ end trace 4918cdd0a4c9fea4 ]---
>>>>
>>>> I found that this is related to your band-aid patch
>>>>
>>>> commit 8eaffa67b43e99ae581622c5133e20b0f48bcef1
>>>> Author: Konrad Rzeszutek Wilk
>>>> Date:   Fri Feb 10 09:16:27 2012 -0500
>>>>
>>>>     xen/pat: Disable PAT support for now.
>>>>
>>>> I just do not understand how this happens. From the trace it seems
>>>> the fork fails when duplicating the VMAs (dup_mm calls mmput on
>>>> failure), so maybe the warning is just a side effect of that. The
>>>> primary question, then, is how the _PAGE_PCD bit can become set on
>>>> fork. That bit and _PAGE_PWT are cleared from the supported mask by
>>>> the patch, so I would think nothing should be able to set it...
>>>> But apparently not so.
>>>>
>>>> Not sure it is a big deal since I never saw this in normal operation,
>>>> and it seems to be OK when unmapping before doing the fork. It is just
>>>> plain odd.
>>>
>>> Jinsong mentioned that there is some oddity with the MTRR. Somehow the
>>> ranges are swapped or not correct. Jinsong, could you shed some light
>>> on what you have found so far?
>>>
>>
>> Yes, Sander also once reported a similar weird warning when starting the
>> mcelog daemon, as attached.
>>
>> Basically, it occurs when the mcelog user daemon starts:
>> do_fork
>>  --> copy_process
>>   --> dup_mm
>>    --> dup_mmap
>>     --> copy_page_range
>>      --> track_pfn_copy
>>       --> reserve_pfn_range

So that makes it clearer, as this will do

	reserve_memtype(...)
	 --> pat_x_mtrr_type
	  --> mtrr_type_lookup
	   --> __mtrr_type_lookup

And that can return -1/0xff when MTRR is not enabled/initialized, which
seems to be the case here (given there are no messages for it in dmesg).
This is not equal to MTRR_TYPE_WRBACK and thus becomes
_PAGE_CACHE_UC_MINUS.

It looks like the problem starts early in reserve_memtype:

	if (!pat_enabled) {
		/* This is identical to page table setting without PAT */
		if (new_type) {
			if (req_type == _PAGE_CACHE_WC)
				*new_type = _PAGE_CACHE_UC_MINUS;
			else
				*new_type = req_type & _PAGE_CACHE_MASK;
		}
		return 0;
	}

This would be what we want, but only clearing the PWT and PCD flags from
the supported flags does not change pat_enabled (which is 1 when PAT
support is compiled into the kernel). Unfortunately the variable is local,
and since there are no messages about PAT in dmesg, I would say pat_init()
is not called either. pat_init() might be used to disable PAT support by
clearing the CPU feature flag.

Right now it seems the only workaround to avoid that message is to use
"nopat" on the kernel command line.

-Stefan

>>        --> line 624: flags != want_flags
>> It comes from the page table and the MTRR giving different memory types
>> (_PAGE_CACHE_WB and _PAGE_CACHE_UC_MINUS, respectively).
>>
>> However, why the page table and the MTRR give different memory types is
>> still unclear; the bug is difficult to reproduce reliably.
>>
>> Thanks,
>
> Ok, so this seems to take the same code paths. As for the test program, it
> fails on duplicating some mmap on a fork. The test program does this all
> the time (except for the backtrace warning, which is a warn_once).
> So you say the UC- comes from the MTRR side... Hm, have to look at that.
>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel