* Re: [PATCH v2] x86/mm: warn on W+x mappings [not found] ` <20151012113605.GB7384-fF5Pk5pvG8Y@public.gmane.org> @ 2015-10-12 12:41 ` Matt Fleming 2015-10-12 12:49 ` Ingo Molnar 0 siblings, 1 reply; 25+ messages in thread From: Matt Fleming @ 2015-10-12 12:41 UTC (permalink / raw) To: Borislav Petkov Cc: Stephen Smalley, Ingo Molnar, x86-DgEjT+Ai2ygdnm+yROfE0A, linux-kernel-u79uwXL29TY76Z2rM5mHXA, keescook-F7+t8E8rja9g9hUCZPvPmw, Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Andy Lutomirski, Denys Vlasenko, Brian Gerst, linux-efi-u79uwXL29TY76Z2rM5mHXA, Ard Biesheuvel On Mon, 12 Oct, at 01:36:05PM, Borislav Petkov wrote: > On Tue, Oct 06, 2015 at 11:37:57AM -0400, Stephen Smalley wrote: > > > What does this range correspond to on your kernel? > > Got a W+X splat here too, on the UEFI box with rc5+tip/master: > > [ 6.792949] rtc_cmos 00:02: setting system clock to 2015-10-12 11:17:03 UTC (1444648623) > [ 6.807863] Freeing unused kernel memory: 1312K (ffffffff81f5f000 - ffffffff820a7000) > [ 6.815831] usb 3-1: new high-speed USB device number 2 using ehci-pci > [ 6.823261] Write protecting the kernel read-only data: 14336k > [ 6.832196] Freeing unused kernel memory: 1796K (ffff88000383f000 - ffff880003a00000) > [ 6.842210] Freeing unused kernel memory: 284K (ffff880003db9000 - ffff880003e00000) > [ 6.850524] ------------[ cut here ]------------ > [ 6.855682] WARNING: CPU: 5 PID: 1 at arch/x86/mm/dump_pagetables.c:225 note_page+0x61e/0x7e0() > [ 6.864944] x86/mm: Found insecure W+X mapping at address ffff88000005e000/0xffff88000005e000 > [ 6.874022] Modules linked in: > [ 6.877643] CPU: 5 PID: 1 Comm: swapper/0 Not tainted 4.3.0-rc5+ #1 > [ 6.884462] Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A13 05/11/2014 > [ 6.892416] ffffffff81caf1f7 ffff88043bdffd60 ffffffff813aab2c ffff88043bdffda8 > [ 6.900460] ffff88043bdffd98 ffffffff81066776 ffff880004e55308 0000000000000004 > [ 6.907816] usb 4-1: new high-speed USB device number 2 using ehci-pci > [ 6.915499] 8000000000000163 ffff88043bdffe98 0000000000000000 ffff88043bdffdf8 > [ 6.923520] Call Trace: > [ 6.926512] [<ffffffff813aab2c>] dump_stack+0x4e/0x82 > [ 6.931551] usb 3-1: New USB device found, idVendor=8087, idProduct=0024 > [ 6.931552] usb 3-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0 > [ 6.933120] hub 3-1:1.0: USB hub found > [ 6.933369] hub 3-1:1.0: 6 ports detected > [ 6.955784] [<ffffffff81066776>] warn_slowpath_common+0x86/0xc0 > [ 6.962341] [<ffffffff810667fc>] warn_slowpath_fmt+0x4c/0x50 > [ 6.968631] [<ffffffff8105bb7e>] note_page+0x61e/0x7e0 > [ 6.974404] [<ffffffff8105c09f>] ptdump_walk_pgd_level_core+0x35f/0x3f0 > [ 6.981651] [<ffffffff8105c1d7>] ptdump_walk_pgd_level_checkwx+0x17/0x20 > [ 6.988996] [<ffffffff81051b0e>] mark_rodata_ro+0xee/0x100 > [ 6.995124] [<ffffffff81828610>] ? rest_init+0x140/0x140 > [ 7.001064] [<ffffffff8182862d>] kernel_init+0x1d/0xe0 > [ 7.006841] [<ffffffff81836f6f>] ret_from_fork+0x3f/0x70 > [ 7.012774] [<ffffffff81828610>] ? rest_init+0x140/0x140 > [ 7.018706] ---[ end trace 920055014e07ef1e ]--- > [ 7.024302] x86/mm: Checked W+X mappings: FAILED, 69568 W+X pages found. > > And yes, there are a bunch of those mappings here too: > > $ grep -c 'RW.*x' /sys/kernel/debug/kernel_page_tables > 75 > > Some of them are the UEFI runtime regions. I guess we can try to map > them as RO maybe, they need to be X. Matt, any reasons against that? I'm glad you asked (but you won't be)! Basically, it's guaranteed that there exist some machines that contain data in EfiRuntimeCode regions (and so require write permission) and code in EfiRuntimeData regions (and therefore require eXecute), because the whole point of the new EFI_PROPERTIES_TABLE feature in UEFI v2.5 was to make it explicit when the firmware does not include such regions. I do have patches sitting in a git branch that begin to implement support for mapping EFI runtime data regions as NX, and code regions as RO when EFI_PROPERTIES_TABLE is enabled, https://git.kernel.org/cgit/linux/kernel/git/mfleming/efi.git/log/?h=memmap Things got stalled when we realised that Linux didn't even boot with the feature enabled, see commit a5caa209ba9c ("x86/efi: Fix boot crash by mapping EFI memmap entries bottom-up at runtime, instead of top-down"). For more information on what the current limitations are for mapping EFI regions, check out this whitepaper, https://firmware.intel.com/sites/default/files/resources/A_Tour_Beyond_BIOS_Memory_Practices_with_UEFI.pdf -- Matt Fleming, Intel Open Source Technology Center ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2] x86/mm: warn on W+x mappings 2015-10-12 12:41 ` [PATCH v2] x86/mm: warn on W+x mappings Matt Fleming @ 2015-10-12 12:49 ` Ingo Molnar [not found] ` <20151012124936.GA6260-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> 0 siblings, 1 reply; 25+ messages in thread From: Ingo Molnar @ 2015-10-12 12:49 UTC (permalink / raw) To: Matt Fleming Cc: Borislav Petkov, Stephen Smalley, x86, linux-kernel, keescook, Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Andy Lutomirski, Denys Vlasenko, Brian Gerst, linux-efi, Ard Biesheuvel * Matt Fleming <matt@codeblueprint.co.uk> wrote: > On Mon, 12 Oct, at 01:36:05PM, Borislav Petkov wrote: > > On Tue, Oct 06, 2015 at 11:37:57AM -0400, Stephen Smalley wrote: > > > > What does this range correspond to on your kernel? > > > > Got a W+X splat here too, on the UEFI box with rc5+tip/master: > > > > [ 6.792949] rtc_cmos 00:02: setting system clock to 2015-10-12 11:17:03 UTC (1444648623) > > [ 6.807863] Freeing unused kernel memory: 1312K (ffffffff81f5f000 - ffffffff820a7000) > > [ 6.815831] usb 3-1: new high-speed USB device number 2 using ehci-pci > > [ 6.823261] Write protecting the kernel read-only data: 14336k > > [ 6.832196] Freeing unused kernel memory: 1796K (ffff88000383f000 - ffff880003a00000) > > [ 6.842210] Freeing unused kernel memory: 284K (ffff880003db9000 - ffff880003e00000) > > [ 6.850524] ------------[ cut here ]------------ > > [ 6.855682] WARNING: CPU: 5 PID: 1 at arch/x86/mm/dump_pagetables.c:225 note_page+0x61e/0x7e0() > > [ 6.864944] x86/mm: Found insecure W+X mapping at address ffff88000005e000/0xffff88000005e000 > > [ 6.874022] Modules linked in: > > [ 6.877643] CPU: 5 PID: 1 Comm: swapper/0 Not tainted 4.3.0-rc5+ #1 > > [ 6.884462] Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A13 05/11/2014 > > [ 6.892416] ffffffff81caf1f7 ffff88043bdffd60 ffffffff813aab2c ffff88043bdffda8 > > [ 6.900460] ffff88043bdffd98 ffffffff81066776 ffff880004e55308 0000000000000004 > > [ 6.907816] usb 4-1: new high-speed USB device number 2 using ehci-pci > > [ 6.915499] 8000000000000163 ffff88043bdffe98 0000000000000000 ffff88043bdffdf8 > > [ 6.923520] Call Trace: > > [ 6.926512] [<ffffffff813aab2c>] dump_stack+0x4e/0x82 > > [ 6.931551] usb 3-1: New USB device found, idVendor=8087, idProduct=0024 > > [ 6.931552] usb 3-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0 > > [ 6.933120] hub 3-1:1.0: USB hub found > > [ 6.933369] hub 3-1:1.0: 6 ports detected > > [ 6.955784] [<ffffffff81066776>] warn_slowpath_common+0x86/0xc0 > > [ 6.962341] [<ffffffff810667fc>] warn_slowpath_fmt+0x4c/0x50 > > [ 6.968631] [<ffffffff8105bb7e>] note_page+0x61e/0x7e0 > > [ 6.974404] [<ffffffff8105c09f>] ptdump_walk_pgd_level_core+0x35f/0x3f0 > > [ 6.981651] [<ffffffff8105c1d7>] ptdump_walk_pgd_level_checkwx+0x17/0x20 > > [ 6.988996] [<ffffffff81051b0e>] mark_rodata_ro+0xee/0x100 > > [ 6.995124] [<ffffffff81828610>] ? rest_init+0x140/0x140 > > [ 7.001064] [<ffffffff8182862d>] kernel_init+0x1d/0xe0 > > [ 7.006841] [<ffffffff81836f6f>] ret_from_fork+0x3f/0x70 > > [ 7.012774] [<ffffffff81828610>] ? rest_init+0x140/0x140 > > [ 7.018706] ---[ end trace 920055014e07ef1e ]--- > > [ 7.024302] x86/mm: Checked W+X mappings: FAILED, 69568 W+X pages found. > > > > And yes, there are a bunch of those mappings here too: > > > > $ grep -c 'RW.*x' /sys/kernel/debug/kernel_page_tables > > 75 > > > > Some of them are the UEFI runtime regions. I guess we can try to map > > them as RO maybe, they need to be X. Matt, any reasons against that? > > I'm glad you asked (but you won't be)! > > Basically, it's guaranteed that there exist some machines that contain > data in EfiRuntimeCode regions (and so require write permission) and > code in EfiRuntimeData regions (and therefore require eXecute), > because the whole point of the new EFI_PROPERTIES_TABLE feature in > UEFI v2.5 was to make it explicit when the firmware does not include > such regions. So why not unmap them after bootup? Is there any reason to call into EFI code while the system is up and running? Thanks, Ingo ^ permalink raw reply [flat|nested] 25+ messages in thread
[parent not found: <20151012124936.GA6260-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>]
* Re: [PATCH v2] x86/mm: warn on W+x mappings [not found] ` <20151012124936.GA6260-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> @ 2015-10-12 12:55 ` Matt Fleming [not found] ` <20151012125548.GE2579-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org> 0 siblings, 1 reply; 25+ messages in thread From: Matt Fleming @ 2015-10-12 12:55 UTC (permalink / raw) To: Ingo Molnar Cc: Borislav Petkov, Stephen Smalley, x86-DgEjT+Ai2ygdnm+yROfE0A, linux-kernel-u79uwXL29TY76Z2rM5mHXA, keescook-F7+t8E8rja9g9hUCZPvPmw, Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Andy Lutomirski, Denys Vlasenko, Brian Gerst, linux-efi-u79uwXL29TY76Z2rM5mHXA, Ard Biesheuvel On Mon, 12 Oct, at 02:49:36PM, Ingo Molnar wrote: > > > So why not unmap them after bootup? Is there any reason to call into EFI code > while the system is up and running? That's where the runtime services code lives. So if you want things like EFI variables (used by the distro installer, among other things) you need to map the runtime regions. You can of course disable that by using the "noefi" kernel parameter, which should unmap everything for you once you've finished booting. -- Matt Fleming, Intel Open Source Technology Center ^ permalink raw reply [flat|nested] 25+ messages in thread
[parent not found: <20151012125548.GE2579-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org>]
* Re: [PATCH v2] x86/mm: warn on W+x mappings [not found] ` <20151012125548.GE2579-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org> @ 2015-10-12 14:17 ` Ingo Molnar [not found] ` <20151012141754.GA6621-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> 0 siblings, 1 reply; 25+ messages in thread From: Ingo Molnar @ 2015-10-12 14:17 UTC (permalink / raw) To: Matt Fleming Cc: Borislav Petkov, Stephen Smalley, x86-DgEjT+Ai2ygdnm+yROfE0A, linux-kernel-u79uwXL29TY76Z2rM5mHXA, keescook-F7+t8E8rja9g9hUCZPvPmw, Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Andy Lutomirski, Denys Vlasenko, Brian Gerst, linux-efi-u79uwXL29TY76Z2rM5mHXA, Ard Biesheuvel * Matt Fleming <matt-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org> wrote: > On Mon, 12 Oct, at 02:49:36PM, Ingo Molnar wrote: > > > > > > So why not unmap them after bootup? Is there any reason to call into EFI code > > while the system is up and running? > > That's where the runtime services code lives. So if you want things like EFI > variables (used by the distro installer, among other things) you need to map the > runtime regions. So EFI variables could be queried during bootup and saved on the Linux side. Calling into firmware after the kernel has booted up is fragile in general - beyond W+X the security considerations. Thanks, Ingo ^ permalink raw reply [flat|nested] 25+ messages in thread
[parent not found: <20151012141754.GA6621-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>]
* Re: [PATCH v2] x86/mm: warn on W+x mappings [not found] ` <20151012141754.GA6621-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> @ 2015-10-12 14:49 ` Matt Fleming [not found] ` <20151012144928.GF2579-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org> 2015-10-14 15:18 ` Ingo Molnar 0 siblings, 2 replies; 25+ messages in thread From: Matt Fleming @ 2015-10-12 14:49 UTC (permalink / raw) To: Ingo Molnar Cc: Borislav Petkov, Stephen Smalley, x86-DgEjT+Ai2ygdnm+yROfE0A, linux-kernel-u79uwXL29TY76Z2rM5mHXA, keescook-F7+t8E8rja9g9hUCZPvPmw, Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Andy Lutomirski, Denys Vlasenko, Brian Gerst, linux-efi-u79uwXL29TY76Z2rM5mHXA, Ard Biesheuvel On Mon, 12 Oct, at 04:17:54PM, Ingo Molnar wrote: > > * Matt Fleming <matt-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org> wrote: > > > On Mon, 12 Oct, at 02:49:36PM, Ingo Molnar wrote: > > > > > > > > > So why not unmap them after bootup? Is there any reason to call into EFI code > > > while the system is up and running? > > > > That's where the runtime services code lives. So if you want things like EFI > > variables (used by the distro installer, among other things) you need to map the > > runtime regions. > > So EFI variables could be queried during bootup and saved on the Linux side. Right, we could do that, but then we wouldn't be able to support creation/updating variables at runtime, such as when you install a distribution for the first time, or want to boot a new kernel filename directly from the firmware without a boot loader (and need to modify the BootXXXX variables). And it's not just EFI variables that need runtime support either, for some platforms the only way to reboot/poweroff is with EFI, such as on the ASUS T100TA (Intel Baytrail-T). That's not to say your suggestion doesn't make sense for some cases, I can definitely see how turning off runtime support but providing a cache of EFI variables would be useful for some scenarios. But I don't think it's ever going to be workable as a default option. > Calling into firmware after the kernel has booted up is fragile in general - > beyond W+X the security considerations. It isn't intended to be fragile, and effort has gone into defining the context under which the EFI runtime services can operate (though there are obviously gaps in that specification). The entire point of the EFI runtime services is that they can be invoked from the OS, and because hardware/firmware developers rely on that when designing platforms, it's going to be something that Linux is going to have to be able to do. Of course, that doesn't we shouldn't be able to turn it off if the user is happy to sacrifice some platform functionality. Additionally, if we've got suggestions for the firmware developers on what we want the runtime context to look like, let's propose it to them. They're pretty receptive in my experience. -- Matt Fleming, Intel Open Source Technology Center ^ permalink raw reply [flat|nested] 25+ messages in thread
[parent not found: <20151012144928.GF2579-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org>]
* Re: [PATCH v2] x86/mm: warn on W+x mappings [not found] ` <20151012144928.GF2579-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org> @ 2015-10-12 15:34 ` Ard Biesheuvel [not found] ` <CAKv+Gu95NoB5cPqZMeBLn7i0JWDDPKX63FuiMVD94HpbGhKrhA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 0 siblings, 1 reply; 25+ messages in thread From: Ard Biesheuvel @ 2015-10-12 15:34 UTC (permalink / raw) To: Matt Fleming Cc: Ingo Molnar, Borislav Petkov, Stephen Smalley, x86-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Kees Cook, Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Andy Lutomirski, Denys Vlasenko, Brian Gerst, linux-efi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org On 12 October 2015 at 16:49, Matt Fleming <matt-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org> wrote: > On Mon, 12 Oct, at 04:17:54PM, Ingo Molnar wrote: >> >> * Matt Fleming <matt-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org> wrote: >> >> > On Mon, 12 Oct, at 02:49:36PM, Ingo Molnar wrote: >> > > >> > > >> > > So why not unmap them after bootup? Is there any reason to call into EFI code >> > > while the system is up and running? >> > >> > That's where the runtime services code lives. So if you want things like EFI >> > variables (used by the distro installer, among other things) you need to map the >> > runtime regions. >> >> So EFI variables could be queried during bootup and saved on the Linux side. > > Right, we could do that, but then we wouldn't be able to support > creation/updating variables at runtime, such as when you install a > distribution for the first time, or want to boot a new kernel filename > directly from the firmware without a boot loader (and need to modify > the BootXXXX variables). > > And it's not just EFI variables that need runtime support either, for > some platforms the only way to reboot/poweroff is with EFI, such as on > the ASUS T100TA (Intel Baytrail-T). > > That's not to say your suggestion doesn't make sense for some cases, I > can definitely see how turning off runtime support but providing a > cache of EFI variables would be useful for some scenarios. But I don't > think it's ever going to be workable as a default option. > >> Calling into firmware after the kernel has booted up is fragile in general - >> beyond W+X the security considerations. > > It isn't intended to be fragile, and effort has gone into defining the > context under which the EFI runtime services can operate (though there > are obviously gaps in that specification). > > The entire point of the EFI runtime services is that they can be > invoked from the OS, and because hardware/firmware developers rely on > that when designing platforms, it's going to be something that Linux > is going to have to be able to do. Of course, that doesn't we > shouldn't be able to turn it off if the user is happy to sacrifice > some platform functionality. > > Additionally, if we've got suggestions for the firmware developers on > what we want the runtime context to look like, let's propose it to > them. They're pretty receptive in my experience. > On arm64, we only map in all of the UEFI runtime services regions during the time any of these services are being invoked. I think this should be mostly feasible on x86 as well, although it would involve yet another rewrite of the EFI region mapping code, and most likely a long list of quirks for platforms that are not able to deal with it correctly for one reason or the other (but that all come down to: 'if you are not doing it like Windows does it, you must be doing it wrong'). That would make the 'secure' way of mapping things an opt-in feature, which is generally not desirable for security features (since it will rarely be used in the real world then). So enabling the Properties Table memprotect feature as soon as the spec defines it in a meaningful way is probably a better way to go, and our current involvement is focused on defining it such that it can be enabled by default by firmwares rather than ending up an obscure switch in the BIOS screen that only the paranoid ever turn on. -- Ard. ^ permalink raw reply [flat|nested] 25+ messages in thread
[parent not found: <CAKv+Gu95NoB5cPqZMeBLn7i0JWDDPKX63FuiMVD94HpbGhKrhA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [PATCH v2] x86/mm: warn on W+x mappings [not found] ` <CAKv+Gu95NoB5cPqZMeBLn7i0JWDDPKX63FuiMVD94HpbGhKrhA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2015-10-12 15:50 ` Matt Fleming [not found] ` <20151012155028.GH2579-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org> 0 siblings, 1 reply; 25+ messages in thread From: Matt Fleming @ 2015-10-12 15:50 UTC (permalink / raw) To: Ard Biesheuvel Cc: Ingo Molnar, Borislav Petkov, Stephen Smalley, x86-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Kees Cook, Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Andy Lutomirski, Denys Vlasenko, Brian Gerst, linux-efi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org On Mon, 12 Oct, at 05:34:53PM, Ard Biesheuvel wrote: > > On arm64, we only map in all of the UEFI runtime services regions > during the time any of these services are being invoked. I think this > should be mostly feasible on x86 as well, although it would involve > yet another rewrite of the EFI region mapping code, and most likely a > long list of quirks for platforms that are not able to deal with it > correctly for one reason or the other (but that all come down to: 'if > you are not doing it like Windows does it, you must be doing it > wrong'). Actually, we use separate page tables for mapping the EFI runtime services on x86 right now. These tables are only used when making runtime calls, just like on arm64. So we've got a little bit of isolation right now. > That would make the 'secure' way of mapping things an opt-in > feature, which is generally not desirable for security features > (since it will rarely be used in the real world then). I'd like to think that we're coming to EFI_PROPERTIES_TABLE early enough that we can work out all the kinks, get things working out of the box in upstream tianocore, and have it be the standard way to expose memory regions. At least that's my hope. > So enabling the Properties Table memprotect feature as soon as the > spec defines it in a meaningful way is probably a better way to go, > and our current involvement is focused on defining it such that it can > be enabled by default by firmwares rather than ending up an obscure > switch in the BIOS screen that only the paranoid ever turn on. Indeed. -- Matt Fleming, Intel Open Source Technology Center ^ permalink raw reply [flat|nested] 25+ messages in thread
[parent not found: <20151012155028.GH2579-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org>]
* Re: [PATCH v2] x86/mm: warn on W+x mappings [not found] ` <20151012155028.GH2579-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org> @ 2015-10-12 16:43 ` Ard Biesheuvel 0 siblings, 0 replies; 25+ messages in thread From: Ard Biesheuvel @ 2015-10-12 16:43 UTC (permalink / raw) To: Matt Fleming Cc: Ingo Molnar, Borislav Petkov, Stephen Smalley, x86-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Kees Cook, Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Andy Lutomirski, Denys Vlasenko, Brian Gerst, linux-efi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org On 12 October 2015 at 17:50, Matt Fleming <matt-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org> wrote: > On Mon, 12 Oct, at 05:34:53PM, Ard Biesheuvel wrote: >> >> On arm64, we only map in all of the UEFI runtime services regions >> during the time any of these services are being invoked. I think this >> should be mostly feasible on x86 as well, although it would involve >> yet another rewrite of the EFI region mapping code, and most likely a >> long list of quirks for platforms that are not able to deal with it >> correctly for one reason or the other (but that all come down to: 'if >> you are not doing it like Windows does it, you must be doing it >> wrong'). > > Actually, we use separate page tables for mapping the EFI runtime > services on x86 right now. These tables are only used when making > runtime calls, just like on arm64. > > So we've got a little bit of isolation right now. > Ah ok. I thought that only applied to the duplicate 1:1 mapping, not to the high mapping. But that does reduce the attack surface considerably. Combined with strict w^x once the UEFI 2.5 feature is fully supported, I am a lot less nervous about RWX EFI runtime regions being used to subvert the system. -- Ard. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2] x86/mm: warn on W+x mappings 2015-10-12 14:49 ` Matt Fleming [not found] ` <20151012144928.GF2579-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org> @ 2015-10-14 15:18 ` Ingo Molnar 2015-10-14 15:30 ` Andy Lutomirski 2015-10-14 21:02 ` Matt Fleming 1 sibling, 2 replies; 25+ messages in thread From: Ingo Molnar @ 2015-10-14 15:18 UTC (permalink / raw) To: Matt Fleming Cc: Borislav Petkov, Stephen Smalley, x86, linux-kernel, keescook, Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Andy Lutomirski, Denys Vlasenko, Brian Gerst, linux-efi, Ard Biesheuvel * Matt Fleming <matt@codeblueprint.co.uk> wrote: > On Mon, 12 Oct, at 04:17:54PM, Ingo Molnar wrote: > > > > * Matt Fleming <matt@codeblueprint.co.uk> wrote: > > > > > On Mon, 12 Oct, at 02:49:36PM, Ingo Molnar wrote: > > > > > > > > > > > > So why not unmap them after bootup? Is there any reason to call into EFI code > > > > while the system is up and running? > > > > > > That's where the runtime services code lives. So if you want things like EFI > > > variables (used by the distro installer, among other things) you need to map the > > > runtime regions. > > > > So EFI variables could be queried during bootup and saved on the Linux side. > > Right, we could do that, but then we wouldn't be able to support > creation/updating variables at runtime, such as when you install a > distribution for the first time, or want to boot a new kernel filename > directly from the firmware without a boot loader (and need to modify > the BootXXXX variables). Do we know the precise position and address range of these variables? We could map them writable (but not executable), and the rest executable (but not writable). That raises the question whether the same physical page ever mixes variables and actual code - but the hope would be that it's suffiently page granular for this to work. Thanks, Ingo ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2] x86/mm: warn on W+x mappings 2015-10-14 15:18 ` Ingo Molnar @ 2015-10-14 15:30 ` Andy Lutomirski 2015-10-14 15:35 ` Borislav Petkov 2015-10-14 21:02 ` Matt Fleming 1 sibling, 1 reply; 25+ messages in thread From: Andy Lutomirski @ 2015-10-14 15:30 UTC (permalink / raw) To: Ingo Molnar Cc: Matt Fleming, Borislav Petkov, Stephen Smalley, X86 ML, linux-kernel@vger.kernel.org, Kees Cook, Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Andy Lutomirski, Denys Vlasenko, Brian Gerst, linux-efi@vger.kernel.org, Ard Biesheuvel On Wed, Oct 14, 2015 at 8:18 AM, Ingo Molnar <mingo@kernel.org> wrote: > > * Matt Fleming <matt@codeblueprint.co.uk> wrote: > >> On Mon, 12 Oct, at 04:17:54PM, Ingo Molnar wrote: >> > >> > * Matt Fleming <matt@codeblueprint.co.uk> wrote: >> > >> > > On Mon, 12 Oct, at 02:49:36PM, Ingo Molnar wrote: >> > > > >> > > > >> > > > So why not unmap them after bootup? Is there any reason to call into EFI code >> > > > while the system is up and running? >> > > >> > > That's where the runtime services code lives. So if you want things like EFI >> > > variables (used by the distro installer, among other things) you need to map the >> > > runtime regions. >> > >> > So EFI variables could be queried during bootup and saved on the Linux side. >> >> Right, we could do that, but then we wouldn't be able to support >> creation/updating variables at runtime, such as when you install a >> distribution for the first time, or want to boot a new kernel filename >> directly from the firmware without a boot loader (and need to modify >> the BootXXXX variables). > > Do we know the precise position and address range of these variables? > > We could map them writable (but not executable), and the rest executable (but not > writable). > > That raises the question whether the same physical page ever mixes variables and > actual code - but the hope would be that it's suffiently page granular for this to > work. Can we just unmap these things until someone tries to do an EFI call, and then unmap them again after the call returns? We already switch pgds for EFI IIRC. --Andy ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2] x86/mm: warn on W+x mappings 2015-10-14 15:30 ` Andy Lutomirski @ 2015-10-14 15:35 ` Borislav Petkov [not found] ` <20151014153522.GC8218-fF5Pk5pvG8Y@public.gmane.org> 0 siblings, 1 reply; 25+ messages in thread From: Borislav Petkov @ 2015-10-14 15:35 UTC (permalink / raw) To: Andy Lutomirski Cc: Ingo Molnar, Matt Fleming, Stephen Smalley, X86 ML, linux-kernel@vger.kernel.org, Kees Cook, Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Andy Lutomirski, Denys Vlasenko, Brian Gerst, linux-efi@vger.kernel.org, Ard Biesheuvel On Wed, Oct 14, 2015 at 08:30:48AM -0700, Andy Lutomirski wrote: > Can we just unmap these things until someone tries to do an EFI call, > and then unmap them again after the call returns? We already switch > pgds for EFI IIRC. hpa did mention an EFI-aware page fault handler at the time. I guess we could do that too... -- Regards/Gruss, Boris. ECO tip #101: Trim your mails when you reply. ^ permalink raw reply [flat|nested] 25+ messages in thread
[parent not found: <20151014153522.GC8218-fF5Pk5pvG8Y@public.gmane.org>]
* Re: [PATCH v2] x86/mm: warn on W+x mappings [not found] ` <20151014153522.GC8218-fF5Pk5pvG8Y@public.gmane.org> @ 2015-10-15 10:10 ` Matt Fleming [not found] ` <20151015101016.GB2975-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org> 0 siblings, 1 reply; 25+ messages in thread From: Matt Fleming @ 2015-10-15 10:10 UTC (permalink / raw) To: Borislav Petkov Cc: Andy Lutomirski, Ingo Molnar, Stephen Smalley, X86 ML, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Kees Cook, Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Andy Lutomirski, Denys Vlasenko, Brian Gerst, linux-efi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Ard Biesheuvel, Ricardo Neri, luv-hn68Rpc1hR1g9hUCZPvPmw On Wed, 14 Oct, at 05:35:22PM, Borislav Petkov wrote: > On Wed, Oct 14, 2015 at 08:30:48AM -0700, Andy Lutomirski wrote: > > Can we just unmap these things until someone tries to do an EFI call, > > and then unmap them again after the call returns? We already switch > > pgds for EFI IIRC. > > hpa did mention an EFI-aware page fault handler at the time. I guess we > could do that too... We do this for the Linux UEFI Validation project kernel [1]. There, we do not map EFI Boot Services regions by default, only if the firmware tries to access them. This gives us the opporunity to print an error message if Boot Services regions are accessed after ExitBootServices() (which is the bug mjg59 describes in commit 916f676f8dc0 ("x86, efi: Retain boot service code until after switching to virtual mode")). But for the issue being discussed in this thread, the thing unmapping the EFI regions buys you is that they're no longer accessible from the x86 sleep/wakeup code paths, since those also use trampoline_pgd which is where the EFI page tables are mapped. And that's probably a good idea. [1] - https://git.kernel.org/cgit/linux/kernel/git/mfleming/efi.git/commit/?h=stable&id=9b78793058bf93958aa9529400cb2617ec1bc958 -- Matt Fleming, Intel Open Source Technology Center ^ permalink raw reply [flat|nested] 25+ messages in thread
[parent not found: <20151015101016.GB2975-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org>]
* Re: [PATCH v2] x86/mm: warn on W+x mappings [not found] ` <20151015101016.GB2975-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org> @ 2015-10-15 10:33 ` Borislav Petkov 2015-10-16 1:45 ` Ricardo Neri 0 siblings, 1 reply; 25+ messages in thread From: Borislav Petkov @ 2015-10-15 10:33 UTC (permalink / raw) To: Matt Fleming Cc: Andy Lutomirski, Ingo Molnar, Stephen Smalley, X86 ML, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Kees Cook, Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Andy Lutomirski, Denys Vlasenko, Brian Gerst, linux-efi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Ard Biesheuvel, Ricardo Neri, luv-hn68Rpc1hR1g9hUCZPvPmw On Thu, Oct 15, 2015 at 11:10:16AM +0100, Matt Fleming wrote: > We do this for the Linux UEFI Validation project kernel [1]. There, we > do not map EFI Boot Services regions by default, only if the firmware > tries to access them. > > This gives us the opporunity to print an error message if Boot > Services regions are accessed after ExitBootServices() (which is the > bug mjg59 describes in commit 916f676f8dc0 ("x86, efi: Retain boot > service code until after switching to virtual mode")). Yeah, that's actually a good idea. Why not upstream it for the wider audience so that people can actually start reporting b0rked UEFIs? With a big and nice FW_BUG splat in there... > But for the issue being discussed in this thread, the thing unmapping > the EFI regions buys you is that they're no longer accessible from the > x86 sleep/wakeup code paths, since those also use trampoline_pgd which > is where the EFI page tables are mapped. > > And that's probably a good idea. > > [1] - https://git.kernel.org/cgit/linux/kernel/git/mfleming/efi.git/commit/?h=stable&id=9b78793058bf93958aa9529400cb2617ec1bc958 In reading that commit message above, the fact that the braindead decision of allowing SetVirtualAddressMap() to be called only once reminds me that we can't really have a PF handler for runtime services as *all* mappings need to be ready before calling SetVirtualAddressMap(). Or, alternatively, we can prep them, call SetVirtualAddressMap() and then unmap them all and map them again at the same addresses only in the PF handler, each time a runtime call happens. When that call finishes, we unmap them again... Hmm, perhaps not worth the trouble... -- Regards/Gruss, Boris. ECO tip #101: Trim your mails when you reply. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2] x86/mm: warn on W+x mappings 2015-10-15 10:33 ` Borislav Petkov @ 2015-10-16 1:45 ` Ricardo Neri 0 siblings, 0 replies; 25+ messages in thread From: Ricardo Neri @ 2015-10-16 1:45 UTC (permalink / raw) To: Borislav Petkov Cc: Matt Fleming, Andy Lutomirski, Ingo Molnar, Stephen Smalley, X86 ML, linux-kernel@vger.kernel.org, Kees Cook, Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Andy Lutomirski, Denys Vlasenko, Brian Gerst, linux-efi@vger.kernel.org, Ard Biesheuvel, luv On Thu, 2015-10-15 at 12:33 +0200, Borislav Petkov wrote: > Yeah, that's actually a good idea. Why not upstream it for the wider > audience so that people can actually start reporting b0rked UEFIs? > With > a big and nice FW_BUG splat in there... We attempted to upstream in the past but later I discovered that my implementation in particular is causing warnings due to SMP. Also, I need to implement an alternative or extension to the current efi_map_region, which have the __init qualifier. This is because mappings might happen after the __inits have been discarded. I have this work currently in my scope. Thanks and BR, Ricardo ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2] x86/mm: warn on W+x mappings 2015-10-14 15:18 ` Ingo Molnar 2015-10-14 15:30 ` Andy Lutomirski @ 2015-10-14 21:02 ` Matt Fleming 2015-10-21 9:42 ` Ingo Molnar 1 sibling, 1 reply; 25+ messages in thread From: Matt Fleming @ 2015-10-14 21:02 UTC (permalink / raw) To: Ingo Molnar Cc: Borislav Petkov, Stephen Smalley, x86, linux-kernel, keescook, Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Andy Lutomirski, Denys Vlasenko, Brian Gerst, linux-efi, Ard Biesheuvel On Wed, 14 Oct, at 05:18:07PM, Ingo Molnar wrote: > > * Matt Fleming <matt@codeblueprint.co.uk> wrote: > > > On Mon, 12 Oct, at 04:17:54PM, Ingo Molnar wrote: > > > > > > * Matt Fleming <matt@codeblueprint.co.uk> wrote: > > > > > > > On Mon, 12 Oct, at 02:49:36PM, Ingo Molnar wrote: > > > > > > > > > > > > > > > So why not unmap them after bootup? Is there any reason to call into EFI code > > > > > while the system is up and running? > > > > > > > > That's where the runtime services code lives. So if you want things like EFI > > > > variables (used by the distro installer, among other things) you need to map the > > > > runtime regions. > > > > > > So EFI variables could be queried during bootup and saved on the Linux side. > > > > Right, we could do that, but then we wouldn't be able to support > > creation/updating variables at runtime, such as when you install a > > distribution for the first time, or want to boot a new kernel filename > > directly from the firmware without a boot loader (and need to modify > > the BootXXXX variables). > > Do we know the precise position and address range of these variables? > > We could map them writable (but not executable), and the rest executable (but not > writable). The variables are stored in NVRAM, which we don't map into the kernel virtual address space. We have to initiate the transaction of writing to the variables by executing EFI runtime services. We obviously have buffers that we pass to the BIOS that contain variable data, but these should be NX anyway because they're regular kernel allocations. > That raises the question whether the same physical page ever mixes variables and > actual code - but the hope would be that it's suffiently page granular for this to > work. I don't think that would ever happen. -- Matt Fleming, Intel Open Source Technology Center ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2] x86/mm: warn on W+x mappings 2015-10-14 21:02 ` Matt Fleming @ 2015-10-21 9:42 ` Ingo Molnar [not found] ` <20151021094242.GA12155-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> 0 siblings, 1 reply; 25+ messages in thread From: Ingo Molnar @ 2015-10-21 9:42 UTC (permalink / raw) To: Matt Fleming Cc: Borislav Petkov, Stephen Smalley, x86, linux-kernel, keescook, Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Andy Lutomirski, Denys Vlasenko, Brian Gerst, linux-efi, Ard Biesheuvel * Matt Fleming <matt@codeblueprint.co.uk> wrote: > > > Right, we could do that, but then we wouldn't be able to support > > > creation/updating variables at runtime, such as when you install a > > > distribution for the first time, or want to boot a new kernel filename > > > directly from the firmware without a boot loader (and need to modify the > > > BootXXXX variables). > > > > Do we know the precise position and address range of these variables? > > > > We could map them writable (but not executable), and the rest executable (but > > not writable). > > The variables are stored in NVRAM, which we don't map into the kernel virtual > address space. [...] Just curious: is there firmware that memory maps those variables privately? > [...] We have to initiate the transaction of writing to the variables by > executing EFI runtime services. > > We obviously have buffers that we pass to the BIOS that contain variable data, > but these should be NX anyway because they're regular kernel allocations. > > > That raises the question whether the same physical page ever mixes variables > > and actual code - but the hope would be that it's suffiently page granular for > > this to work. > > I don't think that would ever happen. Ok, that's promising, so how about this then to solve the security weakness the new warning unearthed: map the whole EFI range as 'r-x (NX)', but detect writes from the page fault handler and transparently allow them to flip over the range to 'rw-'. Note that for security reasons we don't allow a subsequent flipping back to NX if there's an NX fault on the same page, i.e. this new mechanism is a monotonic one-way process that should dynamically 'map out' data pages versus executable pages. It should also be pretty robust, assuming we can take page faults while EFI code is executing and is trying to modify EFI data: is that the case? Thanks, Ingo ^ permalink raw reply [flat|nested] 25+ messages in thread
[parent not found: <20151021094242.GA12155-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>]
* Re: [PATCH v2] x86/mm: warn on W+x mappings [not found] ` <20151021094242.GA12155-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> @ 2015-10-21 12:49 ` Ingo Molnar [not found] ` <20151021124924.GA19262-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> 2015-10-21 20:38 ` Matt Fleming 1 sibling, 1 reply; 25+ messages in thread From: Ingo Molnar @ 2015-10-21 12:49 UTC (permalink / raw) To: Matt Fleming Cc: Borislav Petkov, Stephen Smalley, x86-DgEjT+Ai2ygdnm+yROfE0A, linux-kernel-u79uwXL29TY76Z2rM5mHXA, keescook-F7+t8E8rja9g9hUCZPvPmw, Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Andy Lutomirski, Denys Vlasenko, Brian Gerst, linux-efi-u79uwXL29TY76Z2rM5mHXA, Ard Biesheuvel * Ingo Molnar <mingo-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote: > > * Matt Fleming <matt-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org> wrote: > > > > > Right, we could do that, but then we wouldn't be able to support > > > > creation/updating variables at runtime, such as when you install a > > > > distribution for the first time, or want to boot a new kernel filename > > > > directly from the firmware without a boot loader (and need to modify the > > > > BootXXXX variables). > > > > > > Do we know the precise position and address range of these variables? > > > > > > We could map them writable (but not executable), and the rest executable (but > > > not writable). > > > > The variables are stored in NVRAM, which we don't map into the kernel virtual > > address space. [...] > > Just curious: is there firmware that memory maps those variables privately? > > > [...] We have to initiate the transaction of writing to the variables by > > executing EFI runtime services. > > > > We obviously have buffers that we pass to the BIOS that contain variable data, > > but these should be NX anyway because they're regular kernel allocations. > > > > > That raises the question whether the same physical page ever mixes variables > > > and actual code - but the hope would be that it's suffiently page granular for > > > this to work. > > > > I don't think that would ever happen. > > Ok, that's promising, so how about this then to solve the security weakness the > new warning unearthed: map the whole EFI range as 'r-x (NX)', but detect writes > from the page fault handler and transparently allow them to flip over the range > to 'rw-'. So I meant to say 'page' instead of 'range'. I.e. this dynamic mechanism would flip pages over to 'rw-', as write faults occur from EFI code that writes to them. We don't need to know which regions are writable data, and which regions are executable-code/readonly-data. The following aspect would guarantee safety: > Note that for security reasons we don't allow a subsequent flipping back to NX > if there's an NX fault on the same page, i.e. this new mechanism is a monotonic > one-way process that should dynamically 'map out' data pages versus executable > pages. > > It should also be pretty robust, assuming we can take page faults while EFI code > is executing and is trying to modify EFI data: is that the case? and this is why I asked whether boundaries between 'Code' and 'Writable data' sections are page granular - which they do appear to be. (i.e. there are no singular pages that are both writable data and code at once.) Thanks, Ingo ^ permalink raw reply [flat|nested] 25+ messages in thread
[parent not found: <20151021124924.GA19262-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>]
* Re: [PATCH v2] x86/mm: warn on W+x mappings [not found] ` <20151021124924.GA19262-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> @ 2015-10-21 12:57 ` Ard Biesheuvel 2015-10-21 13:24 ` Borislav Petkov 0 siblings, 1 reply; 25+ messages in thread From: Ard Biesheuvel @ 2015-10-21 12:57 UTC (permalink / raw) To: Ingo Molnar Cc: Matt Fleming, Borislav Petkov, Stephen Smalley, x86-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Kees Cook, Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Andy Lutomirski, Denys Vlasenko, Brian Gerst, linux-efi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org On 21 October 2015 at 14:49, Ingo Molnar <mingo-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote: > > * Ingo Molnar <mingo-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote: > >> >> * Matt Fleming <matt-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org> wrote: >> >> > > > Right, we could do that, but then we wouldn't be able to support >> > > > creation/updating variables at runtime, such as when you install a >> > > > distribution for the first time, or want to boot a new kernel filename >> > > > directly from the firmware without a boot loader (and need to modify the >> > > > BootXXXX variables). >> > > >> > > Do we know the precise position and address range of these variables? >> > > >> > > We could map them writable (but not executable), and the rest executable (but >> > > not writable). >> > >> > The variables are stored in NVRAM, which we don't map into the kernel virtual >> > address space. [...] >> >> Just curious: is there firmware that memory maps those variables privately? >> >> > [...] We have to initiate the transaction of writing to the variables by >> > executing EFI runtime services. >> > >> > We obviously have buffers that we pass to the BIOS that contain variable data, >> > but these should be NX anyway because they're regular kernel allocations. >> > >> > > That raises the question whether the same physical page ever mixes variables >> > > and actual code - but the hope would be that it's suffiently page granular for >> > > this to work. >> > >> > I don't think that would ever happen. >> >> Ok, that's promising, so how about this then to solve the security weakness the >> new warning unearthed: map the whole EFI range as 'r-x (NX)', but detect writes >> from the page fault handler and transparently allow them to flip over the range >> to 'rw-'. > > So I meant to say 'page' instead of 'range'. > > I.e. this dynamic mechanism would flip pages over to 'rw-', as write faults occur > from EFI code that writes to them. > > We don't need to know which regions are writable data, and which regions are > executable-code/readonly-data. > > The following aspect would guarantee safety: > >> Note that for security reasons we don't allow a subsequent flipping back to NX >> if there's an NX fault on the same page, i.e. this new mechanism is a monotonic >> one-way process that should dynamically 'map out' data pages versus executable >> pages. >> >> It should also be pretty robust, assuming we can take page faults while EFI code >> is executing and is trying to modify EFI data: is that the case? > > and this is why I asked whether boundaries between 'Code' and 'Writable data' > sections are page granular - which they do appear to be. (i.e. there are no > singular pages that are both writable data and code at once.) > No, sadly they are not. Only in specific cases (which have to do with the new UEFIv2.5 memory protection feature that got this discussion started in the first place) can we assume that UEFI runtime pages are mappable either RW- or R-X but not RWX, and in those cases, we have the permissions bits that tell us unambiguously which pages are text and which are data. For the remaining cases, which is the vast majority, no such assumptions can be made, and since the UEFI runtime regions are typically populated with a bunch of PE/COFF images (each of which consists of text + data), inferring where the boundaries are between them does not seem tractable (for instance, to only map 'boundary' pages RWX) -- Ard. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2] x86/mm: warn on W+x mappings 2015-10-21 12:57 ` Ard Biesheuvel @ 2015-10-21 13:24 ` Borislav Petkov [not found] ` <20151021132430.GD3575-fF5Pk5pvG8Y@public.gmane.org> 0 siblings, 1 reply; 25+ messages in thread From: Borislav Petkov @ 2015-10-21 13:24 UTC (permalink / raw) To: Ard Biesheuvel Cc: Ingo Molnar, Matt Fleming, Stephen Smalley, x86@kernel.org, linux-kernel@vger.kernel.org, Kees Cook, Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Andy Lutomirski, Denys Vlasenko, Brian Gerst, linux-efi@vger.kernel.org On Wed, Oct 21, 2015 at 02:57:47PM +0200, Ard Biesheuvel wrote: > ... For the remaining cases, which is the vast majority, no such > assumptions can be made, and since the UEFI runtime regions are > typically populated with a bunch of PE/COFF images (each of which > consists of text + data), inferring where the boundaries are between > them does not seem tractable (for instance, to only map 'boundary' > pages RWX) How much of a problem would it be if we still do the on-demand page faulting and map a trailing piece of code together with the data in a page RWX? Still better than mapping the *whole* thing RWX, no? -- Regards/Gruss, Boris. ECO tip #101: Trim your mails when you reply. ^ permalink raw reply [flat|nested] 25+ messages in thread
[parent not found: <20151021132430.GD3575-fF5Pk5pvG8Y@public.gmane.org>]
* Re: [PATCH v2] x86/mm: warn on W+x mappings [not found] ` <20151021132430.GD3575-fF5Pk5pvG8Y@public.gmane.org> @ 2015-10-21 13:28 ` Ard Biesheuvel 2015-10-21 14:36 ` Borislav Petkov 0 siblings, 1 reply; 25+ messages in thread From: Ard Biesheuvel @ 2015-10-21 13:28 UTC (permalink / raw) To: Borislav Petkov Cc: Ingo Molnar, Matt Fleming, Stephen Smalley, x86-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Kees Cook, Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Andy Lutomirski, Denys Vlasenko, Brian Gerst, linux-efi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org On 21 October 2015 at 15:24, Borislav Petkov <bp-Gina5bIWoIWzQB+pC5nmwQ@public.gmane.org> wrote: > On Wed, Oct 21, 2015 at 02:57:47PM +0200, Ard Biesheuvel wrote: >> ... For the remaining cases, which is the vast majority, no such >> assumptions can be made, and since the UEFI runtime regions are >> typically populated with a bunch of PE/COFF images (each of which >> consists of text + data), inferring where the boundaries are between >> them does not seem tractable (for instance, to only map 'boundary' >> pages RWX) > > How much of a problem would it be if we still do the on-demand page > faulting and map a trailing piece of code together with the data in a > page RWX? > > Still better than mapping the *whole* thing RWX, no? > In theory, yes. In practice, since this is supposed to be a security enhancement, we need some kind of ground truth to tell us which pages can be legally modified *and* executed, so that we can detect the illegal cases. My point was that, since a multitude of PE/COFF images can be covered by a single EfiRuntimeServicesCode region, the UEFI memory map does not give us enough information to make the distinction between a page that sits on the text/data boundary of some PE/COFF image and a page that sits wholly in either. -- Ard. ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2] x86/mm: warn on W+x mappings 2015-10-21 13:28 ` Ard Biesheuvel @ 2015-10-21 14:36 ` Borislav Petkov [not found] ` <20151021143651.GE3575-fF5Pk5pvG8Y@public.gmane.org> 0 siblings, 1 reply; 25+ messages in thread From: Borislav Petkov @ 2015-10-21 14:36 UTC (permalink / raw) To: Ard Biesheuvel Cc: Ingo Molnar, Matt Fleming, Stephen Smalley, x86@kernel.org, linux-kernel@vger.kernel.org, Kees Cook, Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Andy Lutomirski, Denys Vlasenko, Brian Gerst, linux-efi@vger.kernel.org On Wed, Oct 21, 2015 at 03:28:56PM +0200, Ard Biesheuvel wrote: > In theory, yes. In practice, since this is supposed to be a security > enhancement, we need some kind of ground truth to tell us which pages > can be legally modified *and* executed, so that we can detect the > illegal cases. My point was that, since a multitude of PE/COFF images > can be covered by a single EfiRuntimeServicesCode region, the UEFI > memory map does not give us enough information to make the distinction > between a page that sits on the text/data boundary of some PE/COFF > image and a page that sits wholly in either. Well, we're going to simply allow the accesses to in-kernel users which fault on those ranges, assuming that in-kernel modifiers are legit and DTRT. Which means, we don't really need to know which pages can be legally modified - we simply trust the in-kernel users. The moment you're able to load an evil kernel module, guarding against those writes is the last thing you need to worry about... -- Regards/Gruss, Boris. ECO tip #101: Trim your mails when you reply. ^ permalink raw reply [flat|nested] 25+ messages in thread
[parent not found: <20151021143651.GE3575-fF5Pk5pvG8Y@public.gmane.org>]
* Re: [PATCH v2] x86/mm: warn on W+x mappings [not found] ` <20151021143651.GE3575-fF5Pk5pvG8Y@public.gmane.org> @ 2015-10-21 18:46 ` Andy Lutomirski [not found] ` <CALCETrWwHLnt6=MiyuQY48zEaOUMuFTg1oN3NDdEOxa6XYMgCw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> 0 siblings, 1 reply; 25+ messages in thread From: Andy Lutomirski @ 2015-10-21 18:46 UTC (permalink / raw) To: Borislav Petkov Cc: Ard Biesheuvel, Ingo Molnar, Matt Fleming, Stephen Smalley, x86-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Kees Cook, Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Andy Lutomirski, Denys Vlasenko, Brian Gerst, linux-efi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org On Wed, Oct 21, 2015 at 7:36 AM, Borislav Petkov <bp-Gina5bIWoIWzQB+pC5nmwQ@public.gmane.org> wrote: > On Wed, Oct 21, 2015 at 03:28:56PM +0200, Ard Biesheuvel wrote: >> In theory, yes. In practice, since this is supposed to be a security >> enhancement, we need some kind of ground truth to tell us which pages >> can be legally modified *and* executed, so that we can detect the >> illegal cases. My point was that, since a multitude of PE/COFF images >> can be covered by a single EfiRuntimeServicesCode region, the UEFI >> memory map does not give us enough information to make the distinction >> between a page that sits on the text/data boundary of some PE/COFF >> image and a page that sits wholly in either. > > Well, we're going to simply allow the accesses to in-kernel users which > fault on those ranges, assuming that in-kernel modifiers are legit and > DTRT. Which means, we don't really need to know which pages can be > legally modified - we simply trust the in-kernel users. > > The moment you're able to load an evil kernel module, guarding against > those writes is the last thing you need to worry about... I don't think we can do a whole lot to help against broken UEFI code, but having anything mapped RWX is a nice target for people trying to exploit kernel bugs. Hence my suggestion to clear W except when actually running UEFI code. If the UEFI stuff is mapped in its own PGD entry, we could just RO that entire PGD entry everywhere except the UEFI pgd (and make sure to clear G so that the TLB entries get zapped). --Andy ^ permalink raw reply [flat|nested] 25+ messages in thread
[parent not found: <CALCETrWwHLnt6=MiyuQY48zEaOUMuFTg1oN3NDdEOxa6XYMgCw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>]
* Re: [PATCH v2] x86/mm: warn on W+x mappings [not found] ` <CALCETrWwHLnt6=MiyuQY48zEaOUMuFTg1oN3NDdEOxa6XYMgCw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org> @ 2015-10-21 20:45 ` Matt Fleming [not found] ` <20151021204522.GB20338-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org> 0 siblings, 1 reply; 25+ messages in thread From: Matt Fleming @ 2015-10-21 20:45 UTC (permalink / raw) To: Andy Lutomirski Cc: Borislav Petkov, Ard Biesheuvel, Ingo Molnar, Stephen Smalley, x86-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Kees Cook, Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Andy Lutomirski, Denys Vlasenko, Brian Gerst, linux-efi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org On Wed, 21 Oct, at 11:46:53AM, Andy Lutomirski wrote: > > If the UEFI stuff is mapped in its own PGD entry, we could just RO > that entire PGD entry everywhere except the UEFI pgd (and make sure to > clear G so that the TLB entries get zapped). What would be the benefit of making it RO as opposed to not having it mapped at all? The mappings only exist in the trampoline_pgd right now for x86 which minimizes the potentially vulnerable code paths to the EFI runtime calls and the suspend/resume code. ^ permalink raw reply [flat|nested] 25+ messages in thread
[parent not found: <20151021204522.GB20338-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org>]
* Re: [PATCH v2] x86/mm: warn on W+x mappings [not found] ` <20151021204522.GB20338-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org> @ 2015-10-21 20:49 ` Andy Lutomirski 0 siblings, 0 replies; 25+ messages in thread From: Andy Lutomirski @ 2015-10-21 20:49 UTC (permalink / raw) To: Matt Fleming Cc: Borislav Petkov, Ard Biesheuvel, Ingo Molnar, Stephen Smalley, x86-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org, linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Kees Cook, Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Andy Lutomirski, Denys Vlasenko, Brian Gerst, linux-efi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org On Wed, Oct 21, 2015 at 1:45 PM, Matt Fleming <matt-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org> wrote: > On Wed, 21 Oct, at 11:46:53AM, Andy Lutomirski wrote: >> >> If the UEFI stuff is mapped in its own PGD entry, we could just RO >> that entire PGD entry everywhere except the UEFI pgd (and make sure to >> clear G so that the TLB entries get zapped). > > What would be the benefit of making it RO as opposed to not having it > mapped at all? Nothing. > The mappings only exist in the trampoline_pgd right now > for x86 which minimizes the potentially vulnerable code paths to the > EFI runtime calls and the suspend/resume code. Oh, I didn't realize it. So what's the problem here? Honestly, while UEFI is full of questionable things, I don't really see how an unprivileged user program should be able to cause malicious input to be send to UEFI code, so it should be quite difficult to exploit a buffer overflow or other errant write in UEFI to escalate privileges from user to anything else. (Kernel -> SMM escalation is a whole different story, but preventing that is SMM's business, not the kernel's. I've actually been a wee bit tempted to write a /dev/smram driver to expose SMRAM using a portfolio of old known exploits.) --Andy -- Andy Lutomirski AMA Capital Management, LLC ^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2] x86/mm: warn on W+x mappings [not found] ` <20151021094242.GA12155-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> 2015-10-21 12:49 ` Ingo Molnar @ 2015-10-21 20:38 ` Matt Fleming 1 sibling, 0 replies; 25+ messages in thread From: Matt Fleming @ 2015-10-21 20:38 UTC (permalink / raw) To: Ingo Molnar Cc: Borislav Petkov, Stephen Smalley, x86-DgEjT+Ai2ygdnm+yROfE0A, linux-kernel-u79uwXL29TY76Z2rM5mHXA, keescook-F7+t8E8rja9g9hUCZPvPmw, Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Andy Lutomirski, Denys Vlasenko, Brian Gerst, linux-efi-u79uwXL29TY76Z2rM5mHXA, Ard Biesheuvel On Wed, 21 Oct, at 11:42:42AM, Ingo Molnar wrote: > > * Matt Fleming <matt-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org> wrote: > > > > > Right, we could do that, but then we wouldn't be able to support > > > > creation/updating variables at runtime, such as when you install a > > > > distribution for the first time, or want to boot a new kernel filename > > > > directly from the firmware without a boot loader (and need to modify the > > > > BootXXXX variables). > > > > > > Do we know the precise position and address range of these variables? > > > > > > We could map them writable (but not executable), and the rest executable (but > > > not writable). > > > > The variables are stored in NVRAM, which we don't map into the kernel virtual > > address space. [...] > > Just curious: is there firmware that memory maps those variables privately? Good question, not sure. I suspect not because it becomes much harder to protect those oh-so-precious variables from errant code wanting to write to them. Usually things get written on x86 from SMM code. > > [...] We have to initiate the transaction of writing to the variables by > > executing EFI runtime services. > > > > We obviously have buffers that we pass to the BIOS that contain variable data, > > but these should be NX anyway because they're regular kernel allocations. > > > > > That raises the question whether the same physical page ever mixes variables > > > and actual code - but the hope would be that it's suffiently page granular for > > > this to work. > > > > I don't think that would ever happen. > > Ok, that's promising, so how about this then to solve the security weakness the > new warning unearthed: map the whole EFI range as 'r-x (NX)', but detect writes > from the page fault handler and transparently allow them to flip over the range to > 'rw-'. > > Note that for security reasons we don't allow a subsequent flipping back to NX if > there's an NX fault on the same page, i.e. this new mechanism is a monotonic > one-way process that should dynamically 'map out' data pages versus executable > pages. > > It should also be pretty robust, assuming we can take page faults while EFI code > is executing and is trying to modify EFI data: is that the case? Yes, we can do that but I think I misunderstood what you were asking when you said, > That raises the question whether the same physical page ever mixes variables and > actual code - but the hope would be that it's suffiently page granular for this to > work. I was talking about EFI variables as defined in the UEFI spec, i.e. backed by some peristent storage mechanism. I wasn't talking about ".data" objects. It *is* possible for physical pages to contain both EFI code and data, as Ard mentioned, and we have no way of distinguishing when EFI code tried to write to a EfiRuntimeServicesCode page/region because there's also legitimate data there and when an exploit attempt is taking place. In which case, I think we'd essentially map everything with execute permission apart from the heap and other dynamically allocated objects stored in EfiRuntimeServicesData. But at that point, can't we just leave all these regions unmapped unless we're in the EFI code paths? And that includes not leaving the mappings around duing the suspend/resume code. ^ permalink raw reply [flat|nested] 25+ messages in thread
end of thread, other threads:[~2015-10-21 20:49 UTC | newest]
Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <1443814185-21552-1-git-send-email-sds@tycho.nsa.gov>
[not found] ` <20151003112701.GA4531@gmail.com>
[not found] ` <5612CBE8.2010504@tycho.nsa.gov>
[not found] ` <20151006073205.GA11115@gmail.com>
[not found] ` <5613EAD5.2070405@tycho.nsa.gov>
[not found] ` <20151012113605.GB7384@pd.tnic>
[not found] ` <20151012113605.GB7384-fF5Pk5pvG8Y@public.gmane.org>
2015-10-12 12:41 ` [PATCH v2] x86/mm: warn on W+x mappings Matt Fleming
2015-10-12 12:49 ` Ingo Molnar
[not found] ` <20151012124936.GA6260-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2015-10-12 12:55 ` Matt Fleming
[not found] ` <20151012125548.GE2579-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org>
2015-10-12 14:17 ` Ingo Molnar
[not found] ` <20151012141754.GA6621-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2015-10-12 14:49 ` Matt Fleming
[not found] ` <20151012144928.GF2579-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org>
2015-10-12 15:34 ` Ard Biesheuvel
[not found] ` <CAKv+Gu95NoB5cPqZMeBLn7i0JWDDPKX63FuiMVD94HpbGhKrhA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-10-12 15:50 ` Matt Fleming
[not found] ` <20151012155028.GH2579-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org>
2015-10-12 16:43 ` Ard Biesheuvel
2015-10-14 15:18 ` Ingo Molnar
2015-10-14 15:30 ` Andy Lutomirski
2015-10-14 15:35 ` Borislav Petkov
[not found] ` <20151014153522.GC8218-fF5Pk5pvG8Y@public.gmane.org>
2015-10-15 10:10 ` Matt Fleming
[not found] ` <20151015101016.GB2975-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org>
2015-10-15 10:33 ` Borislav Petkov
2015-10-16 1:45 ` Ricardo Neri
2015-10-14 21:02 ` Matt Fleming
2015-10-21 9:42 ` Ingo Molnar
[not found] ` <20151021094242.GA12155-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2015-10-21 12:49 ` Ingo Molnar
[not found] ` <20151021124924.GA19262-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2015-10-21 12:57 ` Ard Biesheuvel
2015-10-21 13:24 ` Borislav Petkov
[not found] ` <20151021132430.GD3575-fF5Pk5pvG8Y@public.gmane.org>
2015-10-21 13:28 ` Ard Biesheuvel
2015-10-21 14:36 ` Borislav Petkov
[not found] ` <20151021143651.GE3575-fF5Pk5pvG8Y@public.gmane.org>
2015-10-21 18:46 ` Andy Lutomirski
[not found] ` <CALCETrWwHLnt6=MiyuQY48zEaOUMuFTg1oN3NDdEOxa6XYMgCw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-10-21 20:45 ` Matt Fleming
[not found] ` <20151021204522.GB20338-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org>
2015-10-21 20:49 ` Andy Lutomirski
2015-10-21 20:38 ` Matt Fleming
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).