Date: Fri, 19 Aug 2016 11:27:07 -0400
From: Konrad Rzeszutek Wilk
To: One Thousand Gnomes
Cc: Jan Beulich, Andrew Cooper, stefan.bader@canonical.com,
 david.vrabel@citrix.com, xen-devel, Boris Ostrovsky,
 chuck.anderson@oracle.com, Juergen Gross, linux-kernel@vger.kernel.org
Subject: Re: [Xen-devel] XSA 154 and ISA region (640K -> 1MB) WB cache instead of UC
Message-ID: <20160819152707.GC26577@char.us.oracle.com>
References: <20160817203238.GA9408@char.us.oracle.com>
 <57B5A4C90200007800106FF2@prv-mh.provo.novell.com>
 <47fabc70-4500-1035-7dc7-7f0a3915471f@citrix.com>
 <57B5B4560200007800107092@prv-mh.provo.novell.com>
 <20160818163544.07c8a051@lxorguk.ukuu.org.uk>
In-Reply-To: <20160818163544.07c8a051@lxorguk.ukuu.org.uk>
User-Agent: Mutt/1.6.2 (2016-07-01)
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Aug 18, 2016 at 04:35:44PM +0100, One Thousand Gnomes wrote:
> On Thu, 18 Aug 2016 05:12:54 -0600
> "Jan Beulich" wrote:
>
> > >>> On 18.08.16 at 12:16, wrote:
> > > On 18/08/16 11:06, Jan Beulich wrote:
> > >>>>> On 17.08.16 at 22:32, wrote:
> > >>> Looking at the kernel it assumes that WB is ok for 640KB->1MB.
> > >>> The comment says:
> > >>> " /* Low ISA region is always mapped WB in page table. No need to track
> > > *"
> > >> As per above it's not clear to me what this comment is backed by.
> > >
> > > This states what is in the pagetables. Not the combined result with MTRRs.
> > >
> > > WB in the pagetables and WC/UC in the MTRRs is a legal combination which
> > > functions correctly.
> >
> > True, but then again - haven't I been told multiple times that Linux
> > nowadays prefers to run without using MTRRs?
>
> The BIOS sets up the fixed MTRR registers for the 640K-1MB window. Those
> are separate to the variable range MTRR registers used for main memory
> with specific mappings for segments A000 to BFFF then C000-C7FF /
> C800-CFFF / etc up to FFFF.

OK, so BIOS-inherited.

Looking at the Intel SDM (figure 11-7), if the MTRR is UC for that range,
then pagetable entries of either UC or WB are fine.

Except Linux's use of the quirk (is_untracked_pat_range) ends up always
requesting WB.

And to combat the splat, the patch:

From 5209635f23786fb88cf0ce77719da8acda63bf65 Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk
Date: Fri, 19 Aug 2016 11:06:44 -0400
Subject: [PATCH] x86/xen: Add x86_platform.is_untracked_pat_range quirk to
 ignore ISA regions.

On x86 whenever VMAs are set up, the 'is_ISA_range' quirk (which this
patch re-implements) is used to figure out whether to ignore the
requested PAT type and always use WB (see 'reserve_memtype').
Specifically it forces the WB type for any region in the ISA space.

From the Intel SDM, the combination of MTRR (UC, which is set up by the
BIOS) and PAT (UC or WB) for the ISA region ends up with the same
value - UC.

However on Xen, due to XSA 154 we enforce that _ANY_ pagetable entry
mapping an MMIO range MUST have the same cacheability - and in this
case we enforce UC.

Which means that with XSA 154 (and without this patch) any application
that maps /dev/mem to get SMBIOS information (like mcelog), and pokes
in the ISA region, will not have a PTE set. That is due to
reserve_pfn_range returning -EINVAL, which results in the PTE not
being set.

[These are debug entries added in 'reserve_pfn_range']
mcelog:2471 0xf0000->0xf1000, req_type=write-back new_type=write-back
mcelog:2471 0xeb000->0xed000, req_type=write-back new_type=write-back
.. above are successful ones, but:
mcelog:2471 0xeb000->0xed000, req_type=uncached new_type=uncached
[again, a debug one:]
mcelog:2471 want=uncached got=write-back strict 0x000eb000-0x000ecfff
mcelog:2471 map pfn expected mapping type uncached for [mem 0x000eb000-0x000ecfff], got write-back
------------[ cut here ]------------
 [] dump_stack+0x63/0x83
 [] warn_slowpath_common+0x95/0xe0
 [] warn_slowpath_null+0x1a/0x20
 [] untrack_pfn+0x93/0xc0
 [] unmap_single_vma+0xa9/0x100
 [] unmap_vmas+0x54/0xa0
 [] exit_mmap+0x9a/0x150
 [] mmput+0x73/0x110
 [] dup_mm+0x105/0x110
 [] copy_process+0x11ed/0x1240
 [] do_fork+0x79/0x280
 [] ? syscall_trace_enter_phase1+0x153/0x180
 [] SyS_clone+0x16/0x20
 [] system_call_fastpath+0x12/0x71

results in that splat.

The effective result of the function below is for 'reserve_memtype'
to ignore the result from the 'x86_platform.is_untracked_pat_range'
quirk. Which means that the splat above does not happen.

Signed-off-by: Konrad Rzeszutek Wilk
---
 arch/x86/xen/enlighten.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index 8ffb089..3238d04 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -283,6 +283,27 @@ static void __init xen_banner(void)
 	       version >> 16, version & 0xffff, extra.extraversion,
 	       xen_feature(XENFEAT_mmu_pt_update_preserve_ad) ? " (preserve-AD)" : "");
 }
+
+/*
+ * On x86 whenever VMAs are set up, the 'is_ISA_range' quirk (which we
+ * re-implement below) is used to figure out whether to ignore the
+ * requested PAT type and always use WB (see 'reserve_memtype').
+ *
+ * The combination of MTRR (UC) and PAT (UC or WB) for the ISA region ends
+ * up with the same value - UC.
+ *
+ * However on Xen, due to XSA 154 we enforce that mappings to _ANY_ MMIO
+ * range MUST have the same cacheability mapping - and in this case
+ * we enforce UC for everything.
+ *
+ * The effective result of the function below is for 'reserve_memtype'
+ * to ignore the result from 'x86_platform.is_untracked_pat_range' quirk.
+ */
+static bool xen_ignore(u64 s, u64 e)
+{
+	return false;
+}
+
 /* Check if running on Xen version (major, minor) or later */
 bool
 xen_running_on_version_or_later(unsigned int major, unsigned int minor)
@@ -1730,6 +1751,8 @@ asmlinkage __visible void __init xen_start_kernel(void)
 		x86_init.mpparse.get_smp_config = x86_init_uint_noop;
 
 		xen_boot_params_init_edd();
+
+		x86_platform.is_untracked_pat_range = xen_ignore;
 	}
 #ifdef CONFIG_PCI
 	/* PCI BIOS service won't work from a PV guest. */
-- 
2.5.5

>
> Alan
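
For anyone following along, here is a rough user-space model of why
swapping the hook changes what mcelog gets back. It is only a sketch and
not the kernel code: the model_* names, the two-value enum and the
simplified ISA bounds check are all assumptions made for illustration,
and the real 'reserve_memtype' goes on to track the range, which is
left out here.

```c
/*
 * User-space model (NOT kernel code) of how the 'is_untracked_pat_range'
 * hook steers the memory type chosen for a range. All model_* names and
 * the simplified bounds check are assumptions made for illustration.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum cache_mode { MODE_WB, MODE_UC };

#define ISA_START 0xa0000ULL	/* 640K */
#define ISA_END   0x100000ULL	/* 1MB  */

typedef bool (*untracked_fn)(uint64_t s, uint64_t e);

/* Simplified stand-in for the native ISA-range quirk. */
static bool model_is_isa_range(uint64_t s, uint64_t e)
{
	return s >= ISA_START && e <= ISA_END;
}

/* Stand-in for the hook the patch installs: no range is ever untracked. */
static bool model_xen_ignore(uint64_t s, uint64_t e)
{
	(void)s;
	(void)e;
	return false;
}

/*
 * Model of the early exit in 'reserve_memtype': an untracked range is
 * forced to WB regardless of the request; otherwise the requested type
 * is kept (the tracking done by the real function is omitted).
 */
static enum cache_mode model_reserve_memtype(untracked_fn is_untracked,
					     uint64_t start, uint64_t end,
					     enum cache_mode req)
{
	assert(start < end);
	if (is_untracked(start, end))
		return MODE_WB;
	return req;
}
```

With model_is_isa_range as the hook, a UC request for 0xeb000-0xed000
comes back as WB, which matches the "want=uncached got=write-back" debug
line above. With model_xen_ignore the same request keeps UC.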