* Is _PAGE_PROTNONE set only for user mappings? [not found] ` <8c2735ac-0335-6e2a-8341-8266d5d13c30@intel.com> @ 2022-05-11 5:20 ` Hyeonggon Yoo 2022-05-12 10:37 ` Mel Gorman 0 siblings, 1 reply; 10+ messages in thread From: Hyeonggon Yoo @ 2022-05-11 5:20 UTC (permalink / raw) To: Dave Hansen Cc: Tom Lendacky, Rick Edgecombe, Dave Hansen, Andy Lutomirski, Peter Zijlstra, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin, Kirill A. Shutemov, Tianyu Lan, Aneesh Kumar K.V, linux-kernel, linux-mm, vbabka, akpm, mgorman, willy On Tue, May 10, 2022 at 07:39:30AM -0700, Dave Hansen wrote: > On 5/10/22 06:35, Tom Lendacky wrote: > > I'm wondering if adding a specific helper that takes a boolean to > > indicate whether to set the global flag would be best. I'll let some of > > the MM maintainers comment about that. > > First of all, I'm not positive that _PAGE_BIT_PROTNONE is ever used for > kernel mappings. This would all get a lot easier if we decided that > _PAGE_BIT_PROTNONE is only for userspace mappings and we don't have to > worry about it when _PAGE_USER is clear. After quickly skimming code it seems the place that actually sets _PAGE_PROTNONE is via mm/mmap.c's protection_map: > /* description of effects of mapping type and prot in current implementation. > * this is due to the limited x86 page protection hardware. The expected > * behavior is in parens: > * > * map_type prot > * PROT_NONE PROT_READ PROT_WRITE PROT_EXEC > * MAP_SHARED r: (no) no r: (yes) yes r: (no) yes r: (no) yes > * w: (no) no w: (no) no w: (yes) yes w: (no) no > * x: (no) no x: (no) yes x: (no) yes x: (yes) yes > * > * MAP_PRIVATE r: (no) no r: (yes) yes r: (no) yes r: (no) yes > * w: (no) no w: (no) no w: (copy) copy w: (no) no > * x: (no) no x: (no) yes x: (no) yes x: (yes) yes > * > */ > pgprot_t protection_map[16] = { > __P000, __P001, __P010, __P011, __P100, __P101, __P110, __P111, > __S000, __S001, __S010, __S011, __S100, __S101, __S110, __S111 > }; Where __P000, __S000 is PAGE_NONE (_PAGE_ACCESSED | _PAGE_PROTNONE). And protection_map is accessed via: > pgprot_t vm_get_page_prot(unsigned long vm_flags) > { > pgprot_t ret = __pgprot(pgprot_val(protection_map[vm_flags & > (VM_READ|VM_WRITE|VM_EXEC|VM_SHARED)]) | > pgprot_val(arch_vm_get_page_prot(vm_flags))); > > return arch_filter_pgprot(ret); > } > EXPORT_SYMBOL(vm_get_page_prot); I guess it's only set for processes' VMA if no caller is abusing vm_get_page_prot() for kernel mappings. But yeah, just quick guessing does not make us convinced. Let's Cc people working on mm. If kernel never uses _PAGE_PROTNONE for kernel mappings, it's just okay not to clear _PAGE_GLOBAL at first in __change_page_attr() if it's not user address, because no user will confuse _PAGE_GLOBAL as _PAGE_PROTNONE if it's kernel address. right? > > Second, the number of places that do these > __set_pages_p()/__set_pages_np() pairs is pretty limited. Some of them > are *quite* unambiguous over whether they are dealing with the direct map: > > > int set_direct_map_invalid_noflush(struct page *page) > > { > > return __set_pages_np(page, 1); > > } > > > > int set_direct_map_default_noflush(struct page *page) > > { > > return __set_pages_p(page, 1); > > } > > which would make it patently obvious whether __set_pages_p() should > restore the global bit. That would have been a problem in the "old" PTI > days where _some_ of the direct map was exposed to Meltdown. I don't > think we have any of those mappings left, though. They're all aliases > like text and cpu_entry_area. > > It would be nice if someone could look into unraveling > _PAGE_BIT_PROTNONE. We could even probably move it to another bit for > kernel mappings if we actually need it (I'm not convinced we do). -- Thanks, Hyeonggon ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Is _PAGE_PROTNONE set only for user mappings? 2022-05-11 5:20 ` Is _PAGE_PROTNONE set only for user mappings? Hyeonggon Yoo @ 2022-05-12 10:37 ` Mel Gorman 2022-05-13 5:33 ` Hyeonggon Yoo 0 siblings, 1 reply; 10+ messages in thread From: Mel Gorman @ 2022-05-12 10:37 UTC (permalink / raw) To: Hyeonggon Yoo Cc: Dave Hansen, Tom Lendacky, Rick Edgecombe, Dave Hansen, Andy Lutomirski, Peter Zijlstra, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin, Kirill A. Shutemov, Tianyu Lan, Aneesh Kumar K.V, linux-kernel, linux-mm, vbabka, akpm, willy On Wed, May 11, 2022 at 02:20:45PM +0900, Hyeonggon Yoo wrote: > > pgprot_t vm_get_page_prot(unsigned long vm_flags) > > { > > pgprot_t ret = __pgprot(pgprot_val(protection_map[vm_flags & > > (VM_READ|VM_WRITE|VM_EXEC|VM_SHARED)]) | > > pgprot_val(arch_vm_get_page_prot(vm_flags))); > > > > return arch_filter_pgprot(ret); > > } > > EXPORT_SYMBOL(vm_get_page_prot); > > I guess it's only set for processes' VMA if no caller is abusing > vm_get_page_prot() for kernel mappings. > > But yeah, just quick guessing does not make us convinced. > Let's Cc people working on mm. > > If kernel never uses _PAGE_PROTNONE for kernel mappings, it's just okay > not to clear _PAGE_GLOBAL at first in __change_page_attr() if it's not user address, > because no user will confuse _PAGE_GLOBAL as _PAGE_PROTNONE if it's kernel > address. right? > I'm not aware of a case where _PAGE_BIT_PROTNONE is used for a kernel address expecting PROT_NONE semantics instead of the global bit. NUMA Balancing is not going to accidentally treat a kernel address as if it's a NUMA hinting fault. By the time a fault is determining if a PTE access is a numa hinting fault or accesssing a PROT_NONE region, it has been established that it is a userspace address backed by a valid VMA. -- Mel Gorman SUSE Labs ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Is _PAGE_PROTNONE set only for user mappings? 2022-05-12 10:37 ` Mel Gorman @ 2022-05-13 5:33 ` Hyeonggon Yoo 2022-05-16 13:03 ` Mel Gorman 2022-05-16 14:04 ` Dave Hansen 0 siblings, 2 replies; 10+ messages in thread From: Hyeonggon Yoo @ 2022-05-13 5:33 UTC (permalink / raw) To: Mel Gorman Cc: Dave Hansen, Tom Lendacky, Rick Edgecombe, Dave Hansen, Andy Lutomirski, Peter Zijlstra, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin, Kirill A. Shutemov, Tianyu Lan, Aneesh Kumar K.V, linux-kernel, linux-mm, vbabka, akpm, willy On Thu, May 12, 2022 at 11:37:48AM +0100, Mel Gorman wrote: > On Wed, May 11, 2022 at 02:20:45PM +0900, Hyeonggon Yoo wrote: > > > pgprot_t vm_get_page_prot(unsigned long vm_flags) > > > { > > > pgprot_t ret = __pgprot(pgprot_val(protection_map[vm_flags & > > > (VM_READ|VM_WRITE|VM_EXEC|VM_SHARED)]) | > > > pgprot_val(arch_vm_get_page_prot(vm_flags))); > > > > > > return arch_filter_pgprot(ret); > > > } > > > EXPORT_SYMBOL(vm_get_page_prot); > > > > I guess it's only set for processes' VMA if no caller is abusing > > vm_get_page_prot() for kernel mappings. > > > > But yeah, just quick guessing does not make us convinced. > > Let's Cc people working on mm. > > > > If kernel never uses _PAGE_PROTNONE for kernel mappings, it's just okay > > not to clear _PAGE_GLOBAL at first in __change_page_attr() if it's not user address, > > because no user will confuse _PAGE_GLOBAL as _PAGE_PROTNONE if it's kernel > > address. right? > > > > I'm not aware of a case where _PAGE_BIT_PROTNONE is used for a kernel > address expecting PROT_NONE semantics instead of the global bit. NUMA > Balancing is not going to accidentally treat a kernel address as if it's > a NUMA hinting fault. By the time a fault is determining if a PTE access > is a numa hinting fault or accesssing a PROT_NONE region, it has been > established that it is a userspace address backed by a valid VMA. > Thanks Mel, and IIUC nor does do_kern_addr_fault() in arch/x86/mm/fault.c expect _PAGE_PROTNONE instead of _PAGE_GLOBAL. I want to make it clear in the code that _PAGE_PROTNONE is only used for user mappings. How does below look? diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h index 40497a9020c6..f8d02b91a90c 100644 --- a/arch/x86/include/asm/pgtable_types.h +++ b/arch/x86/include/asm/pgtable_types.h @@ -35,7 +35,10 @@ #define _PAGE_BIT_DEVMAP _PAGE_BIT_SOFTW4 /* If _PAGE_BIT_PRESENT is clear, we use these: */ -/* - if the user mapped it with PROT_NONE; pte_present gives true */ +/* + * if the user mapped it with PROT_NONE; pte_present gives true + * this is only used for user mappings (with _PAGE_BIT_USER) + */ #define _PAGE_BIT_PROTNONE _PAGE_BIT_GLOBAL #define _PAGE_PRESENT (_AT(pteval_t, 1) << _PAGE_BIT_PRESENT) @@ -115,7 +118,8 @@ #define _PAGE_DEVMAP (_AT(pteval_t, 0)) #endif -#define _PAGE_PROTNONE (_AT(pteval_t, 1) << _PAGE_BIT_PROTNONE) +#define _PAGE_PROTNONE ((_AT(pteval_t, 1) << _PAGE_BIT_USER) | \ + (_AT(pteval_t, 1) << _PAGE_BIT_PROTNONE)) /* * Set of bits not changed in pte_modify. The pte's -- Thanks, Hyeonggon ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: Is _PAGE_PROTNONE set only for user mappings? 2022-05-13 5:33 ` Hyeonggon Yoo @ 2022-05-16 13:03 ` Mel Gorman 2022-05-16 14:04 ` Dave Hansen 1 sibling, 0 replies; 10+ messages in thread From: Mel Gorman @ 2022-05-16 13:03 UTC (permalink / raw) To: Hyeonggon Yoo Cc: Dave Hansen, Tom Lendacky, Rick Edgecombe, Dave Hansen, Andy Lutomirski, Peter Zijlstra, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin, Kirill A. Shutemov, Tianyu Lan, Aneesh Kumar K.V, linux-kernel, linux-mm, vbabka, akpm, willy On Fri, May 13, 2022 at 02:33:38PM +0900, Hyeonggon Yoo wrote: > On Thu, May 12, 2022 at 11:37:48AM +0100, Mel Gorman wrote: > > On Wed, May 11, 2022 at 02:20:45PM +0900, Hyeonggon Yoo wrote: > > > > pgprot_t vm_get_page_prot(unsigned long vm_flags) > > > > { > > > > pgprot_t ret = __pgprot(pgprot_val(protection_map[vm_flags & > > > > (VM_READ|VM_WRITE|VM_EXEC|VM_SHARED)]) | > > > > pgprot_val(arch_vm_get_page_prot(vm_flags))); > > > > > > > > return arch_filter_pgprot(ret); > > > > } > > > > EXPORT_SYMBOL(vm_get_page_prot); > > > > > > I guess it's only set for processes' VMA if no caller is abusing > > > vm_get_page_prot() for kernel mappings. > > > > > > But yeah, just quick guessing does not make us convinced. > > > Let's Cc people working on mm. > > > > > > If kernel never uses _PAGE_PROTNONE for kernel mappings, it's just okay > > > not to clear _PAGE_GLOBAL at first in __change_page_attr() if it's not user address, > > > because no user will confuse _PAGE_GLOBAL as _PAGE_PROTNONE if it's kernel > > > address. right? > > > > > > > I'm not aware of a case where _PAGE_BIT_PROTNONE is used for a kernel > > address expecting PROT_NONE semantics instead of the global bit. NUMA > > Balancing is not going to accidentally treat a kernel address as if it's > > a NUMA hinting fault. By the time a fault is determining if a PTE access > > is a numa hinting fault or accesssing a PROT_NONE region, it has been > > established that it is a userspace address backed by a valid VMA. > > > > Thanks Mel, and IIUC nor does do_kern_addr_fault() in arch/x86/mm/fault.c > expect _PAGE_PROTNONE instead of _PAGE_GLOBAL. I want to make it clear > in the code that _PAGE_PROTNONE is only used for user mappings. > > How does below look? > I've no strong objections. I worry that this somehow could be used to set PAGE_USER on a kernel mapping page and maybe a comment would be more appropriate. However, I'm failing to imagine how NUMA balancing could be fooled into doing that. -- Mel Gorman SUSE Labs ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Is _PAGE_PROTNONE set only for user mappings? 2022-05-13 5:33 ` Hyeonggon Yoo 2022-05-16 13:03 ` Mel Gorman @ 2022-05-16 14:04 ` Dave Hansen 2022-05-22 3:56 ` Hyeonggon Yoo 2022-05-29 10:32 ` Hyeonggon Yoo 1 sibling, 2 replies; 10+ messages in thread From: Dave Hansen @ 2022-05-16 14:04 UTC (permalink / raw) To: Hyeonggon Yoo, Mel Gorman Cc: Tom Lendacky, Rick Edgecombe, Dave Hansen, Andy Lutomirski, Peter Zijlstra, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin, Kirill A. Shutemov, Tianyu Lan, Aneesh Kumar K.V, linux-kernel, linux-mm, vbabka, akpm, willy On 5/12/22 22:33, Hyeonggon Yoo wrote: > Thanks Mel, and IIUC nor does do_kern_addr_fault() in arch/x86/mm/fault.c > expect _PAGE_PROTNONE instead of _PAGE_GLOBAL. I want to make it clear > in the code that _PAGE_PROTNONE is only used for user mappings. > > How does below look? > > diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h > index 40497a9020c6..f8d02b91a90c 100644 > --- a/arch/x86/include/asm/pgtable_types.h > +++ b/arch/x86/include/asm/pgtable_types.h > @@ -35,7 +35,10 @@ > #define _PAGE_BIT_DEVMAP _PAGE_BIT_SOFTW4 > > /* If _PAGE_BIT_PRESENT is clear, we use these: */ > -/* - if the user mapped it with PROT_NONE; pte_present gives true */ > +/* > + * if the user mapped it with PROT_NONE; pte_present gives true > + * this is only used for user mappings (with _PAGE_BIT_USER) > + */ > #define _PAGE_BIT_PROTNONE _PAGE_BIT_GLOBAL > > #define _PAGE_PRESENT (_AT(pteval_t, 1) << _PAGE_BIT_PRESENT) > @@ -115,7 +118,8 @@ > #define _PAGE_DEVMAP (_AT(pteval_t, 0)) > #endif > > -#define _PAGE_PROTNONE (_AT(pteval_t, 1) << _PAGE_BIT_PROTNONE) > +#define _PAGE_PROTNONE ((_AT(pteval_t, 1) << _PAGE_BIT_USER) | \ > + (_AT(pteval_t, 1) << _PAGE_BIT_PROTNONE)) > > /* > * Set of bits not changed in pte_modify. The pte's I don't like the idea of _PAGE_BIT_USER being so implicit. It is something kernel users should know explicitly that they are messing with. I was thinking of something more along the lines of taking the set_memory.c code and ensuring that it never sets (or even observes) _PAGE_BIT_GLOBAL on a _PAGE_USER mapping. There was also a question of if set_memory.c is ever used on userspace mappings. It would be good to validate whether it's possible in-tree today and if not, enforce that _PAGE_USER PTEs should never even be observed with set_memory.c. The arch/x86/mm/dump_pagetables.c code is also a reasonable place to put assumptions about the page tables since it walks *everything* when asked. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Is _PAGE_PROTNONE set only for user mappings? 2022-05-16 14:04 ` Dave Hansen @ 2022-05-22 3:56 ` Hyeonggon Yoo 2022-05-24 20:22 ` Sean Christopherson 2022-05-29 10:32 ` Hyeonggon Yoo 1 sibling, 1 reply; 10+ messages in thread From: Hyeonggon Yoo @ 2022-05-22 3:56 UTC (permalink / raw) To: Dave Hansen Cc: Mel Gorman, Tom Lendacky, Rick Edgecombe, Dave Hansen, Andy Lutomirski, Peter Zijlstra, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin, Kirill A. Shutemov, Tianyu Lan, Aneesh Kumar K.V, linux-kernel, linux-mm, vbabka, akpm, willy On Mon, May 16, 2022 at 07:04:32AM -0700, Dave Hansen wrote: > On 5/12/22 22:33, Hyeonggon Yoo wrote: > > Thanks Mel, and IIUC nor does do_kern_addr_fault() in arch/x86/mm/fault.c > > expect _PAGE_PROTNONE instead of _PAGE_GLOBAL. I want to make it clear > > in the code that _PAGE_PROTNONE is only used for user mappings. > > > > How does below look? > > > > diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h > > index 40497a9020c6..f8d02b91a90c 100644 > > --- a/arch/x86/include/asm/pgtable_types.h > > +++ b/arch/x86/include/asm/pgtable_types.h > > @@ -35,7 +35,10 @@ > > #define _PAGE_BIT_DEVMAP _PAGE_BIT_SOFTW4 > > > > /* If _PAGE_BIT_PRESENT is clear, we use these: */ > > -/* - if the user mapped it with PROT_NONE; pte_present gives true */ > > +/* > > + * if the user mapped it with PROT_NONE; pte_present gives true > > + * this is only used for user mappings (with _PAGE_BIT_USER) > > + */ > > #define _PAGE_BIT_PROTNONE _PAGE_BIT_GLOBAL > > > > #define _PAGE_PRESENT (_AT(pteval_t, 1) << _PAGE_BIT_PRESENT) > > @@ -115,7 +118,8 @@ > > #define _PAGE_DEVMAP (_AT(pteval_t, 0)) > > #endif > > > > -#define _PAGE_PROTNONE (_AT(pteval_t, 1) << _PAGE_BIT_PROTNONE) > > +#define _PAGE_PROTNONE ((_AT(pteval_t, 1) << _PAGE_BIT_USER) | \ > > + (_AT(pteval_t, 1) << _PAGE_BIT_PROTNONE)) > > > > /* > > * Set of bits not changed in pte_modify. The pte's > > I don't like the idea of _PAGE_BIT_USER being so implicit. It is > something kernel users should know explicitly that they are messing with. > Sounds reasonable. Better explicit than implicit. > I was thinking of something more along the lines of taking the > set_memory.c code and ensuring that it never sets (or even observes) > _PAGE_BIT_GLOBAL on a _PAGE_USER mapping. Yeah that would be a bit more explicit solution. > There was also a question of > if set_memory.c is ever used on userspace mappings. It would be good to > validate whether it's possible in-tree today and if not, enforce that > _PAGE_USER PTEs should never even be observed with set_memory.c. Simply adding dump_stack() tells me my kernel on my machine does not use set_memory.c for userspace mappings but Hmm I'll take a look. > The arch/x86/mm/dump_pagetables.c code is also a reasonable place to put > assumptions about the page tables since it walks *everything* when asked. Thanks for that information! Thanks, Hyeonggon ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Is _PAGE_PROTNONE set only for user mappings? 2022-05-22 3:56 ` Hyeonggon Yoo @ 2022-05-24 20:22 ` Sean Christopherson 2022-05-26 10:33 ` Hyeonggon Yoo 0 siblings, 1 reply; 10+ messages in thread From: Sean Christopherson @ 2022-05-24 20:22 UTC (permalink / raw) To: Hyeonggon Yoo Cc: Dave Hansen, Mel Gorman, Tom Lendacky, Rick Edgecombe, Dave Hansen, Andy Lutomirski, Peter Zijlstra, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin, Kirill A. Shutemov, Tianyu Lan, Aneesh Kumar K.V, linux-kernel, linux-mm, vbabka, akpm, willy On Sun, May 22, 2022, Hyeonggon Yoo wrote: > On Mon, May 16, 2022 at 07:04:32AM -0700, Dave Hansen wrote: > > I was thinking of something more along the lines of taking the > > set_memory.c code and ensuring that it never sets (or even observes) > > _PAGE_BIT_GLOBAL on a _PAGE_USER mapping. > > Yeah that would be a bit more explicit solution. > > > There was also a question of > > if set_memory.c is ever used on userspace mappings. It would be good to > > validate whether it's possible in-tree today and if not, enforce that > > _PAGE_USER PTEs should never even be observed with set_memory.c. > > Simply adding dump_stack() tells me my kernel on my machine does not use > set_memory.c for userspace mappings but Hmm I'll take a look. vc_slow_virt_to_phys() uses lookup_address_in_pgd() with user mappings, but that code is all but guaranteed to be buggy, e.g. doesn't guard against concurrent modifications to user mappings. show_fault_oops() can also call into lookup_address_in_pgd() with a user mapping, though at that point the kernel has bigger problems since it's executing from user memory. And up until commits 44187235cbcc ("KVM: x86/mmu: fix potential races when walking host page table") and 643d95aac59a ("Revert "x86/mm: Introduce lookup_address_in_mm()""), KVM had a similar bug. Generally speaking, set_memory.c is not equipped to play nice with user mappings. It mostly "works", but there are races galore. IMO, hardening set_memory.c to scream if it's handed a user address or encounters _PAGE_USER PTEs would be a very good thing. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Is _PAGE_PROTNONE set only for user mappings? 2022-05-24 20:22 ` Sean Christopherson @ 2022-05-26 10:33 ` Hyeonggon Yoo 0 siblings, 0 replies; 10+ messages in thread From: Hyeonggon Yoo @ 2022-05-26 10:33 UTC (permalink / raw) To: Sean Christopherson Cc: Dave Hansen, Mel Gorman, Tom Lendacky, Rick Edgecombe, Dave Hansen, Andy Lutomirski, Peter Zijlstra, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin, Kirill A. Shutemov, Tianyu Lan, Aneesh Kumar K.V, linux-kernel, linux-mm, vbabka, akpm, willy On Tue, May 24, 2022 at 08:22:30PM +0000, Sean Christopherson wrote: > On Sun, May 22, 2022, Hyeonggon Yoo wrote: > > On Mon, May 16, 2022 at 07:04:32AM -0700, Dave Hansen wrote: > > > I was thinking of something more along the lines of taking the > > > set_memory.c code and ensuring that it never sets (or even observes) > > > _PAGE_BIT_GLOBAL on a _PAGE_USER mapping. > > > > Yeah that would be a bit more explicit solution. > > > > > There was also a question of > > > if set_memory.c is ever used on userspace mappings. It would be good to > > > validate whether it's possible in-tree today and if not, enforce that > > > _PAGE_USER PTEs should never even be observed with set_memory.c. > > > > Simply adding dump_stack() tells me my kernel on my machine does not use > > set_memory.c for userspace mappings but Hmm I'll take a look. > > vc_slow_virt_to_phys() uses lookup_address_in_pgd() with user mappings, but that > code is all but guaranteed to be buggy, e.g. doesn't guard against concurrent > modifications to user mappings. > > show_fault_oops() can also call into lookup_address_in_pgd() with a user mapping, > though at that point the kernel has bigger problems since it's executing from user > memory. > > And up until commits 44187235cbcc ("KVM: x86/mmu: fix potential races when walking > host page table") and 643d95aac59a ("Revert "x86/mm: Introduce lookup_address_in_mm()""), > KVM had a similar bug. Thanks for your helpful insight. I was curious if set_memory*() helpers are used for user mappings. with some quick look ptrace() and uprobes (where updating application's text is needed) use kmap + memcpy or replace_page() instead of set_memory*() API. _lookup_address_cpa() uses init_mm.pgd when cpa.pgd is not specified and the only place that passes pgd is efi subsystem (efi_mm.pgd), which is not a userspace. So it is *obvious* that set_memory*() functions should not be used for user mappings. because that will only result in updating only init_mm's page table. Therefore answering to the first question ('do we really need to unset _PAGE_GLOBAL when we're clearing _PAGE_PRESENT in set_memory.c to avoid confusing it as _PAGE_PROTNONE?'): we don't need to consider PROT_NONE semantics in set_memory.c because we don't (shouldn't) change user mappings in it and _PAGE_PROTNONE is not used for kernel mappings. > Generally speaking, set_memory.c is not equipped to play nice with user mappings. > It mostly "works", but there are races galore. IMO, hardening set_memory.c to scream > if it's handed a user address or encounters _PAGE_USER PTEs would be a very good thing. I agree that would be a good thing too. Thanks, Hyeonggon ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Is _PAGE_PROTNONE set only for user mappings? 2022-05-16 14:04 ` Dave Hansen 2022-05-22 3:56 ` Hyeonggon Yoo @ 2022-05-29 10:32 ` Hyeonggon Yoo 2022-06-02 16:47 ` Dave Hansen 1 sibling, 1 reply; 10+ messages in thread From: Hyeonggon Yoo @ 2022-05-29 10:32 UTC (permalink / raw) To: Dave Hansen Cc: Mel Gorman, Tom Lendacky, Rick Edgecombe, Dave Hansen, Andy Lutomirski, Peter Zijlstra, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin, Kirill A. Shutemov, Tianyu Lan, Aneesh Kumar K.V, linux-kernel, linux-mm, vbabka, akpm, willy On Mon, May 16, 2022 at 07:04:32AM -0700, Dave Hansen wrote: > On 5/12/22 22:33, Hyeonggon Yoo wrote: > > Thanks Mel, and IIUC nor does do_kern_addr_fault() in arch/x86/mm/fault.c > > expect _PAGE_PROTNONE instead of _PAGE_GLOBAL. I want to make it clear > > in the code that _PAGE_PROTNONE is only used for user mappings. > > > > How does below look? > > > > diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h > > index 40497a9020c6..f8d02b91a90c 100644 > > --- a/arch/x86/include/asm/pgtable_types.h > > +++ b/arch/x86/include/asm/pgtable_types.h > > @@ -35,7 +35,10 @@ > > #define _PAGE_BIT_DEVMAP _PAGE_BIT_SOFTW4 > > > > /* If _PAGE_BIT_PRESENT is clear, we use these: */ > > -/* - if the user mapped it with PROT_NONE; pte_present gives true */ > > +/* > > + * if the user mapped it with PROT_NONE; pte_present gives true > > + * this is only used for user mappings (with _PAGE_BIT_USER) > > + */ > > #define _PAGE_BIT_PROTNONE _PAGE_BIT_GLOBAL > > > > #define _PAGE_PRESENT (_AT(pteval_t, 1) << _PAGE_BIT_PRESENT) > > @@ -115,7 +118,8 @@ > > #define _PAGE_DEVMAP (_AT(pteval_t, 0)) > > #endif > > > > -#define _PAGE_PROTNONE (_AT(pteval_t, 1) << _PAGE_BIT_PROTNONE) > > +#define _PAGE_PROTNONE ((_AT(pteval_t, 1) << _PAGE_BIT_USER) | \ > > + (_AT(pteval_t, 1) << _PAGE_BIT_PROTNONE)) > > > > /* > > * Set of bits not changed in pte_modify. The pte's > > I don't like the idea of _PAGE_BIT_USER being so implicit. It is > something kernel users should know explicitly that they are messing with. > > I was thinking of something more along the lines of taking the > set_memory.c code and ensuring that it never sets (or even observes) > _PAGE_BIT_GLOBAL on a _PAGE_USER mapping. There was also a question of > if set_memory.c is ever used on userspace mappings. It would be good to > validate whether it's possible in-tree today and if not, enforce that > _PAGE_USER PTEs should never even be observed with set_memory.c. Writing code I'm a bit confused: commit d1440b23c922d8 ("x86/mm: Factor out pageattr _PAGE_GLOBAL setting") says: "This unconditional setting of _PAGE_GLOBAL is a problem when we have PTI and non-PTI and we want some areas to have _PAGE_GLOBAL and some not." Is this this sentence not valid anymore in PTI, and just unconditionally setting _PAGE_GLOBAL would be okay in kernel side regardless of PTI? I'm wondering it because previously I thought "Let's not clear _PAGE_GLOBAL in set_memory.c for kernel mappings and make pmd/pte_present() not confuse when _PAGE_USER is not set" But you don't like it as it's a bit implicit. Then I wonder - how do we know when to set _PAGE_GLOBAL again? > The arch/x86/mm/dump_pagetables.c code is also a reasonable place to put > assumptions about the page tables since it walks *everything* when asked. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: Is _PAGE_PROTNONE set only for user mappings? 2022-05-29 10:32 ` Hyeonggon Yoo @ 2022-06-02 16:47 ` Dave Hansen 0 siblings, 0 replies; 10+ messages in thread From: Dave Hansen @ 2022-06-02 16:47 UTC (permalink / raw) To: Hyeonggon Yoo Cc: Mel Gorman, Tom Lendacky, Rick Edgecombe, Dave Hansen, Andy Lutomirski, Peter Zijlstra, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86, H. Peter Anvin, Kirill A. Shutemov, Tianyu Lan, Aneesh Kumar K.V, linux-kernel, linux-mm, vbabka, akpm, willy On 5/29/22 03:32, Hyeonggon Yoo wrote: > On Mon, May 16, 2022 at 07:04:32AM -0700, Dave Hansen wrote: > Writing code I'm a bit confused: > commit d1440b23c922d8 ("x86/mm: Factor out pageattr > _PAGE_GLOBAL setting") says: > > "This unconditional setting of _PAGE_GLOBAL is a problem when we have > PTI and non-PTI and we want some areas to have _PAGE_GLOBAL and some > not." > > Is this this sentence not valid anymore in PTI, > and just unconditionally setting _PAGE_GLOBAL would be okay in kernel > side regardless of PTI? I believe it's still valid. IIRC, there are three cases: 1. No KPTI. All kernel mappings are _PAGE_GLOBAL. Basically, for present mappings, if _PAGE_USER is clear, _PAGE_GLOBAL is set. 2. KPTI with PCID hardware support (or in a few other cases): The kernel image is mostly non-global. Anything mapped into userspace *is* marked global, like entry text. 3. KPTI without PCIDs: Basically case #2, but with more of the kernel image left global. So, not only are there different KPTI modes, there a different pars of the kernel that require different _PAGE_GLOBAL behavior. pti_kernel_image_global_ok() in arch/x86/mm/pti.c explains it pretty well. ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2022-06-02 16:47 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <20220506051940.156952-1-42.hyeyoo@gmail.com>
[not found] ` <56f89895-601e-44c9-bda4-5fae6782e27e@amd.com>
[not found] ` <YnpTHMvOO/pLJQ+l@hyeyoo>
[not found] ` <5fe161cb-6c55-6c4d-c208-16c77e115d3f@amd.com>
[not found] ` <8c2735ac-0335-6e2a-8341-8266d5d13c30@intel.com>
2022-05-11 5:20 ` Is _PAGE_PROTNONE set only for user mappings? Hyeonggon Yoo
2022-05-12 10:37 ` Mel Gorman
2022-05-13 5:33 ` Hyeonggon Yoo
2022-05-16 13:03 ` Mel Gorman
2022-05-16 14:04 ` Dave Hansen
2022-05-22 3:56 ` Hyeonggon Yoo
2022-05-24 20:22 ` Sean Christopherson
2022-05-26 10:33 ` Hyeonggon Yoo
2022-05-29 10:32 ` Hyeonggon Yoo
2022-06-02 16:47 ` Dave Hansen
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).