From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from psmtp.com (na3sys010amx127.postini.com [74.125.245.127]) by kanga.kvack.org (Postfix) with SMTP id E019B6B0032 for ; Sat, 27 Apr 2013 23:17:55 -0400 (EDT)
Received: by mail-da0-f51.google.com with SMTP id g27so1423790dan.24 for ; Sat, 27 Apr 2013 20:17:55 -0700 (PDT)
Message-ID: <517C94DA.9070002@gmail.com>
Date: Sun, 28 Apr 2013 11:17:46 +0800
From: Will Huck
MIME-Version: 1.0
Subject: Re: [PATCH] x86: add phys addr validity check for /dev/mem mmap
References: <1364905733-23937-1-git-send-email-fhrbata@redhat.com> <517A0ED8.6000404@gmail.com> <20130426153502.GC3510@dhcp-26-164.brq.redhat.com> <517B777B.5020303@gmail.com> <20130427191349.GA3372@dhcp-26-164.brq.redhat.com>
In-Reply-To: <20130427191349.GA3372@dhcp-26-164.brq.redhat.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: quoted-printable
Sender: owner-linux-mm@kvack.org
List-ID:
To: Frantisek Hrbata
Cc: hpa@zytor.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, tglx@linutronix.de, mingo@redhat.com, x86@kernel.org, oleg@redhat.com, kamaleshb@in.ibm.com, hechjie@cn.ibm.com

On 04/28/2013 03:13 AM, Frantisek Hrbata wrote:
> On Sat, Apr 27, 2013 at 03:00:11PM +0800, Will Huck wrote:
>> On 04/26/2013 11:35 PM, Frantisek Hrbata wrote:
>>> On Fri, Apr 26, 2013 at 01:21:28PM +0800, Will Huck wrote:
>>>> Hi Peter,
>>>> On 04/02/2013 08:28 PM, Frantisek Hrbata wrote:
>>>>> When CR4.PAE is set, 64-bit PTEs are used (ARCH_PHYS_ADDR_T_64BIT is set for
>>>>> X86_64 || X86_PAE). According to [1] Chapter 4 Paging, some higher bits in
>>>>> 64-bit PTEs are reserved and have to be set to zero. For example, for IA-32e
>>>>> paging with 4KB pages ([1] 4.5 IA-32e Paging: Table 4-19), bits 51-M
>>>>> (M = MAXPHYADDR) are reserved. So for a CPU with e.g. a 48-bit phys addr width,
>>>>> bits 51-48 have to be zero. If one of the reserved bits is set, a #PF is
>>>>> generated with the RSVD error code ([1] 4.7 Page-Fault Exceptions).
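The reserved-bit rule above can be expressed as a mask check. A minimal userspace sketch, assuming a 4 KiB IA-32e page-table entry and a MAXPHYADDR between 32 and 52; pte_reserved_mask() and pte_has_reserved_bits() are hypothetical helpers, not kernel API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: mask of PTE bits 51..M, which must be zero for a
 * 4 KiB IA-32e page when MAXPHYADDR is M.
 * Bit 63 (XD) is deliberately left out; it is reserved only when
 * IA32_EFER.NXE is clear.
 */
static uint64_t pte_reserved_mask(unsigned int maxphyaddr)
{
	assert(maxphyaddr >= 32 && maxphyaddr <= 52);

	uint64_t below_52 = (UINT64_C(1) << 52) - 1;        /* bits 0..51 */
	uint64_t below_m = (UINT64_C(1) << maxphyaddr) - 1; /* bits 0..M-1 */

	return below_52 & ~below_m;                         /* bits M..51 */
}

/*
 * Hypothetical helper: would a walk through this entry report RSVD?
 * Reserved bits are only checked when the P flag (bit 0) is set.
 */
static bool pte_has_reserved_bits(uint64_t pte, unsigned int maxphyaddr)
{
	bool present = (pte & 1) != 0;

	return present && (pte & pte_reserved_mask(maxphyaddr)) != 0;
}
```

With M = 48 the mask is 0x000F000000000000, so an entry with bit 51 set, such as the logged PTE 80080000000a0225, is flagged, while the same entry with bit 51 cleared is not.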
>>>>>
>>>>>
>>>>> RSVD flag (bit 3).
>>>>> This flag is 1 if there is no valid translation for the linear address because a
>>>>> reserved bit was set in one of the paging-structure entries used to translate
>>>>> that address. (Because reserved bits are not checked in a paging-structure entry
>>>>> whose P flag is 0, bit 3 of the error code can be set only if bit 0 is also
>>>>> set.)
>>>>>
>>>>>
>>>>> In mmap_mem() the first check is valid_mmap_phys_addr_range(), but it always
>>>>> returns 1 on x86. So it's possible to use any pgoff we want and to set the PTE's
>>>>> reserved bits in remap_pfn_range(). Meaning there is a possibility to use mmap
>>>> In this case, remap_pfn_range() sets up the mapping and the reserved bits for
>>>> MMIO memory, so the MMIO memory is already populated. Why does it trigger
>>>> a #PF?
>>> Hi,
>>>
>>> I think this is described in the quote above for the RSVD flag.
>>>
>>> remap_pfn_range() => page present => touch page => tlb miss =>
>>> walk through paging structures => reserved bit set => #pf with rsvd flag
>> Can a present page also trigger a #PF? Why?
> Yes, please see
> Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3A
>
> 4.7 PAGE-FAULT EXCEPTIONS
>
> · RSVD flag (bit 3).
> This flag is 1 if there is no valid translation for the linear address because
> a reserved bit was set in one of the paging-structure entries used to
> translate that address. (Because reserved bits are not checked in a
> paging-structure entry whose P flag is 0, bit 3 of the error code can be set
> only if bit 0 is also set.) Bits reserved in the paging-structure entries are
> reserved for future functionality. Software developers should be aware that
> such bits may be used in the future and that a paging-structure entry that
> causes a page-fault exception on one processor might not do so in the future.
>
>
> I cannot tell you why. I guess this is more a question for some Intel guys.
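The error-code bits referred to in the SDM text above can be decoded with a few masks. A minimal sketch, assuming only the low five architectural bits matter here; the PF_* names and the helper functions are hypothetical, not the kernel's own definitions:

```c
#include <assert.h>
#include <stdbool.h>

/* Low page-fault error-code bits (SDM Vol. 3A, 4.7); names are hypothetical. */
enum pf_error_bit {
	PF_P    = 1u << 0, /* 0: non-present page; 1: protection violation */
	PF_WR   = 1u << 1, /* 1: the access was a write */
	PF_US   = 1u << 2, /* 1: the access was made in user mode */
	PF_RSVD = 1u << 3, /* 1: reserved bit set in a paging-structure entry */
	PF_ID   = 1u << 4, /* 1: the access was an instruction fetch */
};

/*
 * Reserved-bit fault? Per the quoted SDM text, RSVD can only be set
 * together with P, so a code with RSVD but not P is rejected here.
 */
static bool pf_is_reserved_bit_fault(unsigned int code)
{
	return (code & PF_RSVD) && (code & PF_P);
}

/* User-mode read: U/S set, neither a write nor an instruction fetch. */
static bool pf_is_user_read(unsigned int code)
{
	return (code & PF_US) && !(code & (PF_WR | PF_ID));
}
```

Applied to the error code in the "Bad pagetable: 000d" line of the oops further down, 0xd (binary 1101) reads as a user-mode read of a present page with RSVD set.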
>
> Anyway, this patch is trying to fix the following problem and
> the "Bad pagetable" oops.
>
> ---------------------------------8<--------------------------------------
> #include <err.h>
> #include <fcntl.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <sys/mman.h>
> #include <sys/stat.h>
> #include <sys/types.h>
> #include <unistd.h>
>
> #define die(fmt, ...) err(1, fmt, ##__VA_ARGS__)
>
> /*
>     1) Find some non-system RAM in case CONFIG_STRICT_DEVMEM is defined
>        $ cat /proc/iomem | grep -v "\(System RAM\|reserved\)"
>
>     2) Find the physical address width
>        $ cat /proc/cpuinfo | grep "address sizes"
>
>     PTE bits 51 - M are reserved, where M is the physical address width
>     found in 2)
>     Note: step 2) is actually not needed, we can always set just the 51st bit
>     (0x8000000000000)

What's the meaning here? Do you trigger the oops because the address is
beyond the maximum address the CPU supports, or because it accesses a
reserved page? If the answer is the latter, I don't think it's right. For
example, the kernel code/data section is reserved in memory; does a kernel
access to it trigger an oops? I don't think so.
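To connect the offset chosen by the reproducer with the PTE value in the oops, here is a minimal sketch assuming x86-64 with 4 KiB pages, where the page's physical address sits unshifted in PTE bits 12..51. sketch_make_pte() and sketch_page_align() are hypothetical helpers; the flag bits 0x225 and NX (bit 63) used with them are copied from the logged PTE 80080000000a0225, not derived:

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE UINT64_C(4096)
#define SKETCH_PTE_NX    (UINT64_C(1) << 63) /* execute-disable */

/*
 * Hypothetical helper (not a kernel function): build a 4 KiB x86-64 PTE.
 * The physical address is stored unshifted in bits 12..51, so any bit of
 * the mmap offset in that range, bit 51 included, lands at the same
 * position in the entry.
 */
static uint64_t sketch_make_pte(uint64_t phys, uint64_t flags)
{
	assert((phys & (SKETCH_PAGE_SIZE - 1)) == 0); /* must be page aligned */
	return phys | flags;
}

/* Round an offset up to the next page boundary, as the reproducer does. */
static uint64_t sketch_page_align(uint64_t off)
{
	return (off + (SKETCH_PAGE_SIZE - 1)) & ~(SKETCH_PAGE_SIZE - 1);
}
```

OFFSET 0x80000000a0000 is already page aligned, and sketch_make_pte(0x80000000a0000, 0x225 | SKETCH_PTE_NX) yields 0x80080000000a0225, the logged value, with bit 51 carried straight over from the offset.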
>
>     Set OFFSET macro to
>
>     (start of iomem range found in 1)) | (1 << 51)
>
>     for example
>     0x000a0000 | 0x8000000000000 = 0x80000000a0000
>
>     where 0x000a0000 is the start of the PCI Bus range on my laptop
>
> */
>
> #define OFFSET 0x80000000a0000LL
>
> int main(void)
> {
> 	int fd;
> 	long ps;
> 	long pgoff;
> 	char *map;
> 	volatile char c;	/* volatile: keep the faulting read */
>
> 	ps = sysconf(_SC_PAGE_SIZE);
> 	if (ps == -1)
> 		die("cannot get page size");
>
> 	fd = open("/dev/mem", O_RDONLY);
> 	if (fd == -1)
> 		die("cannot open /dev/mem");
>
> 	/* round OFFSET up to a page boundary */
> 	pgoff = (OFFSET + (ps - 1)) & ~(ps - 1);
> 	printf("%lx\n", pgoff);
>
> 	map = mmap(NULL, ps, PROT_READ, MAP_SHARED, fd, pgoff);
> 	if (map == MAP_FAILED)
> 		die("cannot mmap");
>
> 	/* first touch of the mapped page */
> 	c = map[0];
>
> 	if (munmap(map, ps) == -1)
> 		die("cannot munmap");
>
> 	if (close(fd) == -1)
> 		die("cannot close");
>
> 	return 0;
> }
> ---------------------------------8<--------------------------------------
>
> Apr 27 19:52:29 dhcp-26-164 kernel: [ 6464.814860] pfrsvd: Corrupted page table at address 7f34087c8000
> Apr 27 19:52:29 dhcp-26-164 kernel: [ 6464.817356] PGD 12d0b3067 PUD 12d544067 PMD 12e29d067 PTE 80080000000a0225
> Apr 27 19:52:29 dhcp-26-164 kernel: [ 6464.820216] Bad pagetable: 000d [#1] SMP
> Apr 27 19:52:29 dhcp-26-164 kernel: [ 6464.822821] Modules linked in: fuse ebtable_nat xt_CHECKSUM bridge stp llc ipt_MASQUERADE nf_conntrack_netbios_ns nf_conntrack_broadcast ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3 mdio libcxgbi ib_iser rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi rfcomm bnep arc4 iwldvm mac80211 snd_hda_codec_hdmi snd_hda_codec_conexant snd_hda_intel snd_hda_codec uvcvideo snd_hwdep snd_seq snd_seq_device snd_pcm iTCO_wdt videobuf2_vmalloc videobuf2_memops videobuf2_core videodev btusb snd_page_alloc bluetooth snd_timer thinkpad_acpi iwlwifi media snd i2c_i801 cfg80211 iTCO_vendor_support intel_ips e1000e coretemp lpc_ich mfd_core soundcore rfkill mei microcode nfsd auth_rpcgss nfs_acl lockd sunrpc vhost_net tun macvtap macvlan kvm_intel kvm binfmt_misc uinput dm_crypt crc32c_intel i915 ghash_clmulni_intel firewire_ohci i2c_algo_bit drm_kms_helper firewire_core sdhci_pci crc_itu_t drm sdhci mmc_core i2c_core mxm_wmi video wmi
> Apr 27 19:52:29 dhcp-26-164 kernel: [ 6464.845686] CPU 3
> Apr 27 19:52:29 dhcp-26-164 kernel: [ 6464.845709] Pid: 8751, comm: pfrsvd Not tainted 3.8.1-201.fc18.x86_64 #1 LENOVO 4384AV1/4384AV1
> Apr 27 19:52:29 dhcp-26-164 kernel: [ 6464.852876] RIP: 0033:[<00000000004007db>] [<00000000004007db>] 0x4007da
> Apr 27 19:52:29 dhcp-26-164 kernel: [ 6464.856587] RSP: 002b:00007ffff5c12620 EFLAGS: 00010213
> Apr 27 19:52:29 dhcp-26-164 kernel: [ 6464.860296] RAX: 00007f34087c8000 RBX: 0000000000000000 RCX: 00000030fd4eed6a
> Apr 27 19:52:29 dhcp-26-164 kernel: [ 6464.864061] RDX: 0000000000000001 RSI: 0000000000001000 RDI: 0000000000000000
> Apr 27 19:52:29 dhcp-26-164 kernel: [ 6464.867878] RBP: 00007ffff5c12660 R08: 0000000000000003 R09: 00080000000a0000
> Apr 27 19:52:29 dhcp-26-164 kernel: [ 6464.871706] R10: 0000000000000001 R11: 0000000000000206 R12: 00000000004005f0
> Apr 27 19:52:29 dhcp-26-164 kernel: [ 6464.875566] R13: 00007ffff5c12740 R14: 0000000000000000 R15: 0000000000000000
> Apr 27 19:52:29 dhcp-26-164 kernel: [ 6464.879490] FS: 00007f34087a0740(0000) GS:ffff880137d80000(0000) knlGS:0000000000000000
> Apr 27 19:52:29 dhcp-26-164 kernel: [ 6464.883447] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Apr 27 19:52:29 dhcp-26-164 kernel: [ 6464.887436] CR2: 00007f34087c8000 CR3: 0000000107509000 CR4: 00000000000007e0
> Apr 27 19:52:29 dhcp-26-164 kernel: [ 6464.891495] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Apr 27 19:52:29 dhcp-26-164 kernel: [ 6464.895603] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Apr 27 19:52:29 dhcp-26-164 kernel: [ 6464.899739] Process pfrsvd (pid: 8751, threadinfo ffff880104ea8000, task ffff88012d9e1760)
> Apr 27 19:52:29 dhcp-26-164 kernel: [ 6464.903944]
> Apr 27 19:52:29 dhcp-26-164 kernel: [ 6464.908169] RIP [<00000000004007db>] 0x4007da
> Apr 27 19:52:29 dhcp-26-164 kernel: [ 6464.912447] RSP <00007ffff5c12620>
> Apr 27 19:52:29 dhcp-26-164 kernel: [ 6464.943802] ---[ end trace 1113d12a53145197 ]---
>
> Please note the PTE value 80080000000a0225
>
> HTH
>
> Thank you
>>> I hope I didn't misunderstand your question.
>>>
>>> Thanks
>>>
>>>>> on /dev/mem and cause a system panic. It's probably not that serious, because
>>>>> access to /dev/mem is limited and the system has to have panic_on_oops set, but
>>>>> still I think we should check this and return an error.
>>>>>
>>>>> This patch adds a check for x86 when ARCH_PHYS_ADDR_T_64BIT is set, the same way
>>>>> as it is already done in e.g. ioremap. With this fix, mmap returns -EINVAL if the
>>>>> requested phys addr does not fit within the supported phys addr width.
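The check described above can be modelled in userspace. A minimal sketch, assuming phys_addr_valid() rejects any address with bits set at or above the CPU's physical address width, in the spirit of the ioremap check the patch mentions; model_phys_addr_valid(), model_valid_mmap_range() and the explicit phys_bits parameter are hypothetical stand-ins, not the kernel's definitions:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12

/*
 * Hypothetical model of phys_addr_valid(): an address is valid if no bit
 * at or above phys_bits is set. phys_bits stands in for the CPU's
 * physical address width (boot_cpu_data.x86_phys_bits in the kernel).
 */
static bool model_phys_addr_valid(uint64_t addr, unsigned int phys_bits)
{
	assert(phys_bits > 0 && phys_bits < 64);
	return (addr >> phys_bits) == 0;
}

/*
 * Hypothetical model of the patch's valid_mmap_phys_addr_range(): check
 * the end of the requested range. pfn is 64-bit here, so the shift is
 * always done in 64-bit arithmetic.
 */
static bool model_valid_mmap_range(uint64_t pfn, size_t count,
				   unsigned int phys_bits)
{
	uint64_t end = (pfn << MODEL_PAGE_SHIFT) + count;

	return model_phys_addr_valid(end, phys_bits);
}
```

With a 36-bit width, pfn 0xa0 (the 0x000a0000 range) passes, while the reproducer's pfn 0x80000000a0 is rejected, as it is with a 48-bit width. Taking pfn as uint64_t is a choice of this sketch so the result does not depend on the width of unsigned long on the host.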
>>>>>
>>>>> [1] Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3A
>>>>>
>>>>> Signed-off-by: Frantisek Hrbata
>>>>> ---
>>>>>  arch/x86/include/asm/io.h |  4 ++++
>>>>>  arch/x86/mm/mmap.c        | 13 +++++++++++++
>>>>>  2 files changed, 17 insertions(+)
>>>>>
>>>>> diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
>>>>> index d8e8eef..39607c6 100644
>>>>> --- a/arch/x86/include/asm/io.h
>>>>> +++ b/arch/x86/include/asm/io.h
>>>>> @@ -242,6 +242,10 @@ static inline void flush_write_buffers(void)
>>>>>  #endif
>>>>>  }
>>>>>  
>>>>> +#define ARCH_HAS_VALID_PHYS_ADDR_RANGE
>>>>> +extern int valid_phys_addr_range(phys_addr_t addr, size_t count);
>>>>> +extern int valid_mmap_phys_addr_range(unsigned long pfn, size_t count);
>>>>> +
>>>>>  #endif /* __KERNEL__ */
>>>>>  
>>>>>  extern void native_io_delay(void);
>>>>> diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
>>>>> index 845df68..92ec31c 100644
>>>>> --- a/arch/x86/mm/mmap.c
>>>>> +++ b/arch/x86/mm/mmap.c
>>>>> @@ -31,6 +31,8 @@
>>>>>  #include
>>>>>  #include
>>>>>  
>>>>> +#include "physaddr.h"
>>>>> +
>>>>>  struct __read_mostly va_alignment va_align = {
>>>>>  	.flags = -1,
>>>>>  };
>>>>> @@ -122,3 +124,14 @@ void arch_pick_mmap_layout(struct mm_struct *mm)
>>>>>  		mm->unmap_area = arch_unmap_area_topdown;
>>>>>  	}
>>>>>  }
>>>>> +
>>>>> +int valid_phys_addr_range(phys_addr_t addr, size_t count)
>>>>> +{
>>>>> +	return addr + count <= __pa(high_memory);
>>>>> +}
>>>>> +
>>>>> +int valid_mmap_phys_addr_range(unsigned long pfn, size_t count)
>>>>> +{
>>>>> +	resource_size_t addr = (pfn << PAGE_SHIFT) + count;
>>>>> +	return phys_addr_valid(addr);
>>>>> +}
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>> Please read the FAQ at http://www.tux.org/lkml/
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at http://www.tux.org/lkml/

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org