* Re: [PATCH v2] x86/mm: warn on W+x mappings
[not found] ` <20151012113605.GB7384-fF5Pk5pvG8Y@public.gmane.org>
@ 2015-10-12 12:41 ` Matt Fleming
2015-10-12 12:49 ` Ingo Molnar
0 siblings, 1 reply; 25+ messages in thread
From: Matt Fleming @ 2015-10-12 12:41 UTC (permalink / raw)
To: Borislav Petkov
Cc: Stephen Smalley, Ingo Molnar, x86-DgEjT+Ai2ygdnm+yROfE0A,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
keescook-F7+t8E8rja9g9hUCZPvPmw, Thomas Gleixner, H. Peter Anvin,
Peter Zijlstra, Andy Lutomirski, Denys Vlasenko, Brian Gerst,
linux-efi-u79uwXL29TY76Z2rM5mHXA, Ard Biesheuvel
On Mon, 12 Oct, at 01:36:05PM, Borislav Petkov wrote:
> On Tue, Oct 06, 2015 at 11:37:57AM -0400, Stephen Smalley wrote:
> > > What does this range correspond to on your kernel?
>
> Got a W+X splat here too, on the UEFI box with rc5+tip/master:
>
> [ 6.792949] rtc_cmos 00:02: setting system clock to 2015-10-12 11:17:03 UTC (1444648623)
> [ 6.807863] Freeing unused kernel memory: 1312K (ffffffff81f5f000 - ffffffff820a7000)
> [ 6.815831] usb 3-1: new high-speed USB device number 2 using ehci-pci
> [ 6.823261] Write protecting the kernel read-only data: 14336k
> [ 6.832196] Freeing unused kernel memory: 1796K (ffff88000383f000 - ffff880003a00000)
> [ 6.842210] Freeing unused kernel memory: 284K (ffff880003db9000 - ffff880003e00000)
> [ 6.850524] ------------[ cut here ]------------
> [ 6.855682] WARNING: CPU: 5 PID: 1 at arch/x86/mm/dump_pagetables.c:225 note_page+0x61e/0x7e0()
> [ 6.864944] x86/mm: Found insecure W+X mapping at address ffff88000005e000/0xffff88000005e000
> [ 6.874022] Modules linked in:
> [ 6.877643] CPU: 5 PID: 1 Comm: swapper/0 Not tainted 4.3.0-rc5+ #1
> [ 6.884462] Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A13 05/11/2014
> [ 6.892416] ffffffff81caf1f7 ffff88043bdffd60 ffffffff813aab2c ffff88043bdffda8
> [ 6.900460] ffff88043bdffd98 ffffffff81066776 ffff880004e55308 0000000000000004
> [ 6.907816] usb 4-1: new high-speed USB device number 2 using ehci-pci
> [ 6.915499] 8000000000000163 ffff88043bdffe98 0000000000000000 ffff88043bdffdf8
> [ 6.923520] Call Trace:
> [ 6.926512] [<ffffffff813aab2c>] dump_stack+0x4e/0x82
> [ 6.931551] usb 3-1: New USB device found, idVendor=8087, idProduct=0024
> [ 6.931552] usb 3-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
> [ 6.933120] hub 3-1:1.0: USB hub found
> [ 6.933369] hub 3-1:1.0: 6 ports detected
> [ 6.955784] [<ffffffff81066776>] warn_slowpath_common+0x86/0xc0
> [ 6.962341] [<ffffffff810667fc>] warn_slowpath_fmt+0x4c/0x50
> [ 6.968631] [<ffffffff8105bb7e>] note_page+0x61e/0x7e0
> [ 6.974404] [<ffffffff8105c09f>] ptdump_walk_pgd_level_core+0x35f/0x3f0
> [ 6.981651] [<ffffffff8105c1d7>] ptdump_walk_pgd_level_checkwx+0x17/0x20
> [ 6.988996] [<ffffffff81051b0e>] mark_rodata_ro+0xee/0x100
> [ 6.995124] [<ffffffff81828610>] ? rest_init+0x140/0x140
> [ 7.001064] [<ffffffff8182862d>] kernel_init+0x1d/0xe0
> [ 7.006841] [<ffffffff81836f6f>] ret_from_fork+0x3f/0x70
> [ 7.012774] [<ffffffff81828610>] ? rest_init+0x140/0x140
> [ 7.018706] ---[ end trace 920055014e07ef1e ]---
> [ 7.024302] x86/mm: Checked W+X mappings: FAILED, 69568 W+X pages found.
>
> And yes, there are a bunch of those mappings here too:
>
> $ grep -c 'RW.*x' /sys/kernel/debug/kernel_page_tables
> 75
>
> Some of them are the UEFI runtime regions. I guess we can try to map
> them as RO maybe, they need to be X. Matt, any reasons against that?
I'm glad you asked (but you won't be)!
Basically, it's guaranteed that there exist some machines that contain
data in EfiRuntimeCode regions (and so require write permission) and
code in EfiRuntimeData regions (and therefore require eXecute),
because the whole point of the new EFI_PROPERTIES_TABLE feature in
UEFI v2.5 was to make it explicit when the firmware does not include
such regions.
I do have patches sitting in a git branch that begin to implement
support for mapping EFI runtime data regions as NX, and code regions
as RO when EFI_PROPERTIES_TABLE is enabled,
https://git.kernel.org/cgit/linux/kernel/git/mfleming/efi.git/log/?h=memmap
Things got stalled when we realised that Linux didn't even boot with
the feature enabled, see commit a5caa209ba9c ("x86/efi: Fix boot crash
by mapping EFI memmap entries bottom-up at runtime, instead of
top-down").
For more information on what the current limitations are for mapping
EFI regions, check out this whitepaper,
https://firmware.intel.com/sites/default/files/resources/A_Tour_Beyond_BIOS_Memory_Practices_with_UEFI.pdf
--
Matt Fleming, Intel Open Source Technology Center
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2] x86/mm: warn on W+x mappings
2015-10-12 12:41 ` [PATCH v2] x86/mm: warn on W+x mappings Matt Fleming
@ 2015-10-12 12:49 ` Ingo Molnar
[not found] ` <20151012124936.GA6260-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
0 siblings, 1 reply; 25+ messages in thread
From: Ingo Molnar @ 2015-10-12 12:49 UTC (permalink / raw)
To: Matt Fleming
Cc: Borislav Petkov, Stephen Smalley, x86, linux-kernel, keescook,
Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Andy Lutomirski,
Denys Vlasenko, Brian Gerst, linux-efi, Ard Biesheuvel
* Matt Fleming <matt@codeblueprint.co.uk> wrote:
> On Mon, 12 Oct, at 01:36:05PM, Borislav Petkov wrote:
> > On Tue, Oct 06, 2015 at 11:37:57AM -0400, Stephen Smalley wrote:
> > > > What does this range correspond to on your kernel?
> >
> > Got a W+X splat here too, on the UEFI box with rc5+tip/master:
> >
> > [ 6.792949] rtc_cmos 00:02: setting system clock to 2015-10-12 11:17:03 UTC (1444648623)
> > [ 6.807863] Freeing unused kernel memory: 1312K (ffffffff81f5f000 - ffffffff820a7000)
> > [ 6.815831] usb 3-1: new high-speed USB device number 2 using ehci-pci
> > [ 6.823261] Write protecting the kernel read-only data: 14336k
> > [ 6.832196] Freeing unused kernel memory: 1796K (ffff88000383f000 - ffff880003a00000)
> > [ 6.842210] Freeing unused kernel memory: 284K (ffff880003db9000 - ffff880003e00000)
> > [ 6.850524] ------------[ cut here ]------------
> > [ 6.855682] WARNING: CPU: 5 PID: 1 at arch/x86/mm/dump_pagetables.c:225 note_page+0x61e/0x7e0()
> > [ 6.864944] x86/mm: Found insecure W+X mapping at address ffff88000005e000/0xffff88000005e000
> > [ 6.874022] Modules linked in:
> > [ 6.877643] CPU: 5 PID: 1 Comm: swapper/0 Not tainted 4.3.0-rc5+ #1
> > [ 6.884462] Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A13 05/11/2014
> > [ 6.892416] ffffffff81caf1f7 ffff88043bdffd60 ffffffff813aab2c ffff88043bdffda8
> > [ 6.900460] ffff88043bdffd98 ffffffff81066776 ffff880004e55308 0000000000000004
> > [ 6.907816] usb 4-1: new high-speed USB device number 2 using ehci-pci
> > [ 6.915499] 8000000000000163 ffff88043bdffe98 0000000000000000 ffff88043bdffdf8
> > [ 6.923520] Call Trace:
> > [ 6.926512] [<ffffffff813aab2c>] dump_stack+0x4e/0x82
> > [ 6.931551] usb 3-1: New USB device found, idVendor=8087, idProduct=0024
> > [ 6.931552] usb 3-1: New USB device strings: Mfr=0, Product=0, SerialNumber=0
> > [ 6.933120] hub 3-1:1.0: USB hub found
> > [ 6.933369] hub 3-1:1.0: 6 ports detected
> > [ 6.955784] [<ffffffff81066776>] warn_slowpath_common+0x86/0xc0
> > [ 6.962341] [<ffffffff810667fc>] warn_slowpath_fmt+0x4c/0x50
> > [ 6.968631] [<ffffffff8105bb7e>] note_page+0x61e/0x7e0
> > [ 6.974404] [<ffffffff8105c09f>] ptdump_walk_pgd_level_core+0x35f/0x3f0
> > [ 6.981651] [<ffffffff8105c1d7>] ptdump_walk_pgd_level_checkwx+0x17/0x20
> > [ 6.988996] [<ffffffff81051b0e>] mark_rodata_ro+0xee/0x100
> > [ 6.995124] [<ffffffff81828610>] ? rest_init+0x140/0x140
> > [ 7.001064] [<ffffffff8182862d>] kernel_init+0x1d/0xe0
> > [ 7.006841] [<ffffffff81836f6f>] ret_from_fork+0x3f/0x70
> > [ 7.012774] [<ffffffff81828610>] ? rest_init+0x140/0x140
> > [ 7.018706] ---[ end trace 920055014e07ef1e ]---
> > [ 7.024302] x86/mm: Checked W+X mappings: FAILED, 69568 W+X pages found.
> >
> > And yes, there are a bunch of those mappings here too:
> >
> > $ grep -c 'RW.*x' /sys/kernel/debug/kernel_page_tables
> > 75
> >
> > Some of them are the UEFI runtime regions. I guess we can try to map
> > them as RO maybe, they need to be X. Matt, any reasons against that?
>
> I'm glad you asked (but you won't be)!
>
> Basically, it's guaranteed that there exist some machines that contain
> data in EfiRuntimeCode regions (and so require write permission) and
> code in EfiRuntimeData regions (and therefore require eXecute),
> because the whole point of the new EFI_PROPERTIES_TABLE feature in
> UEFI v2.5 was to make it explicit when the firmware does not include
> such regions.
So why not unmap them after bootup? Is there any reason to call into EFI code
while the system is up and running?
Thanks,
Ingo
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2] x86/mm: warn on W+x mappings
[not found] ` <20151012124936.GA6260-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2015-10-12 12:55 ` Matt Fleming
[not found] ` <20151012125548.GE2579-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org>
0 siblings, 1 reply; 25+ messages in thread
From: Matt Fleming @ 2015-10-12 12:55 UTC (permalink / raw)
To: Ingo Molnar
Cc: Borislav Petkov, Stephen Smalley, x86-DgEjT+Ai2ygdnm+yROfE0A,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
keescook-F7+t8E8rja9g9hUCZPvPmw, Thomas Gleixner, H. Peter Anvin,
Peter Zijlstra, Andy Lutomirski, Denys Vlasenko, Brian Gerst,
linux-efi-u79uwXL29TY76Z2rM5mHXA, Ard Biesheuvel
On Mon, 12 Oct, at 02:49:36PM, Ingo Molnar wrote:
>
>
> So why not unmap them after bootup? Is there any reason to call into EFI code
> while the system is up and running?
That's where the runtime services code lives. So if you want things
like EFI variables (used by the distro installer, among other things)
you need to map the runtime regions.
You can of course disable that by using the "noefi" kernel parameter,
which should unmap everything for you once you've finished booting.
--
Matt Fleming, Intel Open Source Technology Center
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2] x86/mm: warn on W+x mappings
[not found] ` <20151012125548.GE2579-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org>
@ 2015-10-12 14:17 ` Ingo Molnar
[not found] ` <20151012141754.GA6621-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
0 siblings, 1 reply; 25+ messages in thread
From: Ingo Molnar @ 2015-10-12 14:17 UTC (permalink / raw)
To: Matt Fleming
Cc: Borislav Petkov, Stephen Smalley, x86-DgEjT+Ai2ygdnm+yROfE0A,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
keescook-F7+t8E8rja9g9hUCZPvPmw, Thomas Gleixner, H. Peter Anvin,
Peter Zijlstra, Andy Lutomirski, Denys Vlasenko, Brian Gerst,
linux-efi-u79uwXL29TY76Z2rM5mHXA, Ard Biesheuvel
* Matt Fleming <matt-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org> wrote:
> On Mon, 12 Oct, at 02:49:36PM, Ingo Molnar wrote:
> >
> >
> > So why not unmap them after bootup? Is there any reason to call into EFI code
> > while the system is up and running?
>
> That's where the runtime services code lives. So if you want things like EFI
> variables (used by the distro installer, among other things) you need to map the
> runtime regions.
So EFI variables could be queried during bootup and saved on the Linux side.
Calling into firmware after the kernel has booted up is fragile in general -
beyond W+X the security considerations.
Thanks,
Ingo
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2] x86/mm: warn on W+x mappings
[not found] ` <20151012141754.GA6621-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2015-10-12 14:49 ` Matt Fleming
[not found] ` <20151012144928.GF2579-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org>
2015-10-14 15:18 ` Ingo Molnar
0 siblings, 2 replies; 25+ messages in thread
From: Matt Fleming @ 2015-10-12 14:49 UTC (permalink / raw)
To: Ingo Molnar
Cc: Borislav Petkov, Stephen Smalley, x86-DgEjT+Ai2ygdnm+yROfE0A,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
keescook-F7+t8E8rja9g9hUCZPvPmw, Thomas Gleixner, H. Peter Anvin,
Peter Zijlstra, Andy Lutomirski, Denys Vlasenko, Brian Gerst,
linux-efi-u79uwXL29TY76Z2rM5mHXA, Ard Biesheuvel
On Mon, 12 Oct, at 04:17:54PM, Ingo Molnar wrote:
>
> * Matt Fleming <matt-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org> wrote:
>
> > On Mon, 12 Oct, at 02:49:36PM, Ingo Molnar wrote:
> > >
> > >
> > > So why not unmap them after bootup? Is there any reason to call into EFI code
> > > while the system is up and running?
> >
> > That's where the runtime services code lives. So if you want things like EFI
> > variables (used by the distro installer, among other things) you need to map the
> > runtime regions.
>
> So EFI variables could be queried during bootup and saved on the Linux side.
Right, we could do that, but then we wouldn't be able to support
creation/updating variables at runtime, such as when you install a
distribution for the first time, or want to boot a new kernel filename
directly from the firmware without a boot loader (and need to modify
the BootXXXX variables).
And it's not just EFI variables that need runtime support either, for
some platforms the only way to reboot/poweroff is with EFI, such as on
the ASUS T100TA (Intel Baytrail-T).
That's not to say your suggestion doesn't make sense for some cases, I
can definitely see how turning off runtime support but providing a
cache of EFI variables would be useful for some scenarios. But I don't
think it's ever going to be workable as a default option.
> Calling into firmware after the kernel has booted up is fragile in general -
> beyond W+X the security considerations.
It isn't intended to be fragile, and effort has gone into defining the
context under which the EFI runtime services can operate (though there
are obviously gaps in that specification).
The entire point of the EFI runtime services is that they can be
invoked from the OS, and because hardware/firmware developers rely on
that when designing platforms, it's going to be something that Linux
is going to have to be able to do. Of course, that doesn't we
shouldn't be able to turn it off if the user is happy to sacrifice
some platform functionality.
Additionally, if we've got suggestions for the firmware developers on
what we want the runtime context to look like, let's propose it to
them. They're pretty receptive in my experience.
--
Matt Fleming, Intel Open Source Technology Center
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2] x86/mm: warn on W+x mappings
[not found] ` <20151012144928.GF2579-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org>
@ 2015-10-12 15:34 ` Ard Biesheuvel
[not found] ` <CAKv+Gu95NoB5cPqZMeBLn7i0JWDDPKX63FuiMVD94HpbGhKrhA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 25+ messages in thread
From: Ard Biesheuvel @ 2015-10-12 15:34 UTC (permalink / raw)
To: Matt Fleming
Cc: Ingo Molnar, Borislav Petkov, Stephen Smalley,
x86-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Kees Cook,
Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Andy Lutomirski,
Denys Vlasenko, Brian Gerst,
linux-efi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
On 12 October 2015 at 16:49, Matt Fleming <matt-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org> wrote:
> On Mon, 12 Oct, at 04:17:54PM, Ingo Molnar wrote:
>>
>> * Matt Fleming <matt-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org> wrote:
>>
>> > On Mon, 12 Oct, at 02:49:36PM, Ingo Molnar wrote:
>> > >
>> > >
>> > > So why not unmap them after bootup? Is there any reason to call into EFI code
>> > > while the system is up and running?
>> >
>> > That's where the runtime services code lives. So if you want things like EFI
>> > variables (used by the distro installer, among other things) you need to map the
>> > runtime regions.
>>
>> So EFI variables could be queried during bootup and saved on the Linux side.
>
> Right, we could do that, but then we wouldn't be able to support
> creation/updating variables at runtime, such as when you install a
> distribution for the first time, or want to boot a new kernel filename
> directly from the firmware without a boot loader (and need to modify
> the BootXXXX variables).
>
> And it's not just EFI variables that need runtime support either, for
> some platforms the only way to reboot/poweroff is with EFI, such as on
> the ASUS T100TA (Intel Baytrail-T).
>
> That's not to say your suggestion doesn't make sense for some cases, I
> can definitely see how turning off runtime support but providing a
> cache of EFI variables would be useful for some scenarios. But I don't
> think it's ever going to be workable as a default option.
>
>> Calling into firmware after the kernel has booted up is fragile in general -
>> beyond W+X the security considerations.
>
> It isn't intended to be fragile, and effort has gone into defining the
> context under which the EFI runtime services can operate (though there
> are obviously gaps in that specification).
>
> The entire point of the EFI runtime services is that they can be
> invoked from the OS, and because hardware/firmware developers rely on
> that when designing platforms, it's going to be something that Linux
> is going to have to be able to do. Of course, that doesn't we
> shouldn't be able to turn it off if the user is happy to sacrifice
> some platform functionality.
>
> Additionally, if we've got suggestions for the firmware developers on
> what we want the runtime context to look like, let's propose it to
> them. They're pretty receptive in my experience.
>
On arm64, we only map in all of the UEFI runtime services regions
during the time any of these services are being invoked. I think this
should be mostly feasible on x86 as well, although it would involve
yet another rewrite of the EFI region mapping code, and most likely a
long list of quirks for platforms that are not able to deal with it
correctly for one reason or the other (but that all come down to: 'if
you are not doing it like Windows does it, you must be doing it
wrong'). That would make the 'secure' way of mapping things an opt-in
feature, which is generally not desirable for security features (since
it will rarely be used in the real world then).
So enabling the Properties Table memprotect feature as soon as the
spec defines it in a meaningful way is probably a better way to go,
and our current involvement is focused on defining it such that it can
be enabled by default by firmwares rather than ending up an obscure
switch in the BIOS screen that only the paranoid ever turn on.
--
Ard.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2] x86/mm: warn on W+x mappings
[not found] ` <CAKv+Gu95NoB5cPqZMeBLn7i0JWDDPKX63FuiMVD94HpbGhKrhA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-10-12 15:50 ` Matt Fleming
[not found] ` <20151012155028.GH2579-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org>
0 siblings, 1 reply; 25+ messages in thread
From: Matt Fleming @ 2015-10-12 15:50 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: Ingo Molnar, Borislav Petkov, Stephen Smalley,
x86-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Kees Cook,
Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Andy Lutomirski,
Denys Vlasenko, Brian Gerst,
linux-efi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
On Mon, 12 Oct, at 05:34:53PM, Ard Biesheuvel wrote:
>
> On arm64, we only map in all of the UEFI runtime services regions
> during the time any of these services are being invoked. I think this
> should be mostly feasible on x86 as well, although it would involve
> yet another rewrite of the EFI region mapping code, and most likely a
> long list of quirks for platforms that are not able to deal with it
> correctly for one reason or the other (but that all come down to: 'if
> you are not doing it like Windows does it, you must be doing it
> wrong').
Actually, we use separate page tables for mapping the EFI runtime
services on x86 right now. These tables are only used when making
runtime calls, just like on arm64.
So we've got a little bit of isolation right now.
> That would make the 'secure' way of mapping things an opt-in
> feature, which is generally not desirable for security features
> (since it will rarely be used in the real world then).
I'd like to think that we're coming to EFI_PROPERTIES_TABLE early
enough that we can work out all the kinks, get things working out of
the box in upstream tianocore, and have it be the standard way to
expose memory regions.
At least that's my hope.
> So enabling the Properties Table memprotect feature as soon as the
> spec defines it in a meaningful way is probably a better way to go,
> and our current involvement is focused on defining it such that it can
> be enabled by default by firmwares rather than ending up an obscure
> switch in the BIOS screen that only the paranoid ever turn on.
Indeed.
--
Matt Fleming, Intel Open Source Technology Center
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2] x86/mm: warn on W+x mappings
[not found] ` <20151012155028.GH2579-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org>
@ 2015-10-12 16:43 ` Ard Biesheuvel
0 siblings, 0 replies; 25+ messages in thread
From: Ard Biesheuvel @ 2015-10-12 16:43 UTC (permalink / raw)
To: Matt Fleming
Cc: Ingo Molnar, Borislav Petkov, Stephen Smalley,
x86-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Kees Cook,
Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Andy Lutomirski,
Denys Vlasenko, Brian Gerst,
linux-efi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
On 12 October 2015 at 17:50, Matt Fleming <matt-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org> wrote:
> On Mon, 12 Oct, at 05:34:53PM, Ard Biesheuvel wrote:
>>
>> On arm64, we only map in all of the UEFI runtime services regions
>> during the time any of these services are being invoked. I think this
>> should be mostly feasible on x86 as well, although it would involve
>> yet another rewrite of the EFI region mapping code, and most likely a
>> long list of quirks for platforms that are not able to deal with it
>> correctly for one reason or the other (but that all come down to: 'if
>> you are not doing it like Windows does it, you must be doing it
>> wrong').
>
> Actually, we use separate page tables for mapping the EFI runtime
> services on x86 right now. These tables are only used when making
> runtime calls, just like on arm64.
>
> So we've got a little bit of isolation right now.
>
Ah ok. I thought that only applied to the duplicate 1:1 mapping, not
to the high mapping.
But that does reduce the attack surface considerably. Combined with
strict w^x once the UEFI 2.5 feature is fully supported, I am a lot
less nervous about RWX EFI runtime regions being used to subvert the
system.
--
Ard.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2] x86/mm: warn on W+x mappings
2015-10-12 14:49 ` Matt Fleming
[not found] ` <20151012144928.GF2579-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org>
@ 2015-10-14 15:18 ` Ingo Molnar
2015-10-14 15:30 ` Andy Lutomirski
2015-10-14 21:02 ` Matt Fleming
1 sibling, 2 replies; 25+ messages in thread
From: Ingo Molnar @ 2015-10-14 15:18 UTC (permalink / raw)
To: Matt Fleming
Cc: Borislav Petkov, Stephen Smalley, x86, linux-kernel, keescook,
Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Andy Lutomirski,
Denys Vlasenko, Brian Gerst, linux-efi, Ard Biesheuvel
* Matt Fleming <matt@codeblueprint.co.uk> wrote:
> On Mon, 12 Oct, at 04:17:54PM, Ingo Molnar wrote:
> >
> > * Matt Fleming <matt@codeblueprint.co.uk> wrote:
> >
> > > On Mon, 12 Oct, at 02:49:36PM, Ingo Molnar wrote:
> > > >
> > > >
> > > > So why not unmap them after bootup? Is there any reason to call into EFI code
> > > > while the system is up and running?
> > >
> > > That's where the runtime services code lives. So if you want things like EFI
> > > variables (used by the distro installer, among other things) you need to map the
> > > runtime regions.
> >
> > So EFI variables could be queried during bootup and saved on the Linux side.
>
> Right, we could do that, but then we wouldn't be able to support
> creation/updating variables at runtime, such as when you install a
> distribution for the first time, or want to boot a new kernel filename
> directly from the firmware without a boot loader (and need to modify
> the BootXXXX variables).
Do we know the precise position and address range of these variables?
We could map them writable (but not executable), and the rest executable (but not
writable).
That raises the question whether the same physical page ever mixes variables and
actual code - but the hope would be that it's suffiently page granular for this to
work.
Thanks,
Ingo
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2] x86/mm: warn on W+x mappings
2015-10-14 15:18 ` Ingo Molnar
@ 2015-10-14 15:30 ` Andy Lutomirski
2015-10-14 15:35 ` Borislav Petkov
2015-10-14 21:02 ` Matt Fleming
1 sibling, 1 reply; 25+ messages in thread
From: Andy Lutomirski @ 2015-10-14 15:30 UTC (permalink / raw)
To: Ingo Molnar
Cc: Matt Fleming, Borislav Petkov, Stephen Smalley, X86 ML,
linux-kernel@vger.kernel.org, Kees Cook, Thomas Gleixner,
H. Peter Anvin, Peter Zijlstra, Andy Lutomirski, Denys Vlasenko,
Brian Gerst, linux-efi@vger.kernel.org, Ard Biesheuvel
On Wed, Oct 14, 2015 at 8:18 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Matt Fleming <matt@codeblueprint.co.uk> wrote:
>
>> On Mon, 12 Oct, at 04:17:54PM, Ingo Molnar wrote:
>> >
>> > * Matt Fleming <matt@codeblueprint.co.uk> wrote:
>> >
>> > > On Mon, 12 Oct, at 02:49:36PM, Ingo Molnar wrote:
>> > > >
>> > > >
>> > > > So why not unmap them after bootup? Is there any reason to call into EFI code
>> > > > while the system is up and running?
>> > >
>> > > That's where the runtime services code lives. So if you want things like EFI
>> > > variables (used by the distro installer, among other things) you need to map the
>> > > runtime regions.
>> >
>> > So EFI variables could be queried during bootup and saved on the Linux side.
>>
>> Right, we could do that, but then we wouldn't be able to support
>> creation/updating variables at runtime, such as when you install a
>> distribution for the first time, or want to boot a new kernel filename
>> directly from the firmware without a boot loader (and need to modify
>> the BootXXXX variables).
>
> Do we know the precise position and address range of these variables?
>
> We could map them writable (but not executable), and the rest executable (but not
> writable).
>
> That raises the question whether the same physical page ever mixes variables and
> actual code - but the hope would be that it's suffiently page granular for this to
> work.
Can we just unmap these things until someone tries to do an EFI call,
and then unmap them again after the call returns? We already switch
pgds for EFI IIRC.
--Andy
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2] x86/mm: warn on W+x mappings
2015-10-14 15:30 ` Andy Lutomirski
@ 2015-10-14 15:35 ` Borislav Petkov
[not found] ` <20151014153522.GC8218-fF5Pk5pvG8Y@public.gmane.org>
0 siblings, 1 reply; 25+ messages in thread
From: Borislav Petkov @ 2015-10-14 15:35 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Ingo Molnar, Matt Fleming, Stephen Smalley, X86 ML,
linux-kernel@vger.kernel.org, Kees Cook, Thomas Gleixner,
H. Peter Anvin, Peter Zijlstra, Andy Lutomirski, Denys Vlasenko,
Brian Gerst, linux-efi@vger.kernel.org, Ard Biesheuvel
On Wed, Oct 14, 2015 at 08:30:48AM -0700, Andy Lutomirski wrote:
> Can we just unmap these things until someone tries to do an EFI call,
> and then unmap them again after the call returns? We already switch
> pgds for EFI IIRC.
hpa did mention an EFI-aware page fault handler at the time. I guess we
could do that too...
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2] x86/mm: warn on W+x mappings
2015-10-14 15:18 ` Ingo Molnar
2015-10-14 15:30 ` Andy Lutomirski
@ 2015-10-14 21:02 ` Matt Fleming
2015-10-21 9:42 ` Ingo Molnar
1 sibling, 1 reply; 25+ messages in thread
From: Matt Fleming @ 2015-10-14 21:02 UTC (permalink / raw)
To: Ingo Molnar
Cc: Borislav Petkov, Stephen Smalley, x86, linux-kernel, keescook,
Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Andy Lutomirski,
Denys Vlasenko, Brian Gerst, linux-efi, Ard Biesheuvel
On Wed, 14 Oct, at 05:18:07PM, Ingo Molnar wrote:
>
> * Matt Fleming <matt@codeblueprint.co.uk> wrote:
>
> > On Mon, 12 Oct, at 04:17:54PM, Ingo Molnar wrote:
> > >
> > > * Matt Fleming <matt@codeblueprint.co.uk> wrote:
> > >
> > > > On Mon, 12 Oct, at 02:49:36PM, Ingo Molnar wrote:
> > > > >
> > > > >
> > > > > So why not unmap them after bootup? Is there any reason to call into EFI code
> > > > > while the system is up and running?
> > > >
> > > > That's where the runtime services code lives. So if you want things like EFI
> > > > variables (used by the distro installer, among other things) you need to map the
> > > > runtime regions.
> > >
> > > So EFI variables could be queried during bootup and saved on the Linux side.
> >
> > Right, we could do that, but then we wouldn't be able to support
> > creation/updating variables at runtime, such as when you install a
> > distribution for the first time, or want to boot a new kernel filename
> > directly from the firmware without a boot loader (and need to modify
> > the BootXXXX variables).
>
> Do we know the precise position and address range of these variables?
>
> We could map them writable (but not executable), and the rest executable (but not
> writable).
The variables are stored in NVRAM, which we don't map into the kernel
virtual address space. We have to initiate the transaction of writing
to the variables by executing EFI runtime services.
We obviously have buffers that we pass to the BIOS that contain
variable data, but these should be NX anyway because they're regular
kernel allocations.
> That raises the question whether the same physical page ever mixes variables and
> actual code - but the hope would be that it's suffiently page granular for this to
> work.
I don't think that would ever happen.
--
Matt Fleming, Intel Open Source Technology Center
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2] x86/mm: warn on W+x mappings
[not found] ` <20151014153522.GC8218-fF5Pk5pvG8Y@public.gmane.org>
@ 2015-10-15 10:10 ` Matt Fleming
[not found] ` <20151015101016.GB2975-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org>
0 siblings, 1 reply; 25+ messages in thread
From: Matt Fleming @ 2015-10-15 10:10 UTC (permalink / raw)
To: Borislav Petkov
Cc: Andy Lutomirski, Ingo Molnar, Stephen Smalley, X86 ML,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Kees Cook,
Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Andy Lutomirski,
Denys Vlasenko, Brian Gerst,
linux-efi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Ard Biesheuvel,
Ricardo Neri, luv-hn68Rpc1hR1g9hUCZPvPmw
On Wed, 14 Oct, at 05:35:22PM, Borislav Petkov wrote:
> On Wed, Oct 14, 2015 at 08:30:48AM -0700, Andy Lutomirski wrote:
> > Can we just unmap these things until someone tries to do an EFI call,
> > and then unmap them again after the call returns? We already switch
> > pgds for EFI IIRC.
>
> hpa did mention an EFI-aware page fault handler at the time. I guess we
> could do that too...
We do this for the Linux UEFI Validation project kernel [1]. There, we
do not map EFI Boot Services regions by default, only if the firmware
tries to access them.
This gives us the opporunity to print an error message if Boot
Services regions are accessed after ExitBootServices() (which is the
bug mjg59 describes in commit 916f676f8dc0 ("x86, efi: Retain boot
service code until after switching to virtual mode")).
But for the issue being discussed in this thread, the thing unmapping
the EFI regions buys you is that they're no longer accessible from the
x86 sleep/wakeup code paths, since those also use trampoline_pgd which
is where the EFI page tables are mapped.
And that's probably a good idea.
[1] - https://git.kernel.org/cgit/linux/kernel/git/mfleming/efi.git/commit/?h=stable&id=9b78793058bf93958aa9529400cb2617ec1bc958
--
Matt Fleming, Intel Open Source Technology Center
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2] x86/mm: warn on W+x mappings
[not found] ` <20151015101016.GB2975-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org>
@ 2015-10-15 10:33 ` Borislav Petkov
2015-10-16 1:45 ` Ricardo Neri
0 siblings, 1 reply; 25+ messages in thread
From: Borislav Petkov @ 2015-10-15 10:33 UTC (permalink / raw)
To: Matt Fleming
Cc: Andy Lutomirski, Ingo Molnar, Stephen Smalley, X86 ML,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Kees Cook,
Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Andy Lutomirski,
Denys Vlasenko, Brian Gerst,
linux-efi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Ard Biesheuvel,
Ricardo Neri, luv-hn68Rpc1hR1g9hUCZPvPmw
On Thu, Oct 15, 2015 at 11:10:16AM +0100, Matt Fleming wrote:
> We do this for the Linux UEFI Validation project kernel [1]. There, we
> do not map EFI Boot Services regions by default, only if the firmware
> tries to access them.
>
> This gives us the opporunity to print an error message if Boot
> Services regions are accessed after ExitBootServices() (which is the
> bug mjg59 describes in commit 916f676f8dc0 ("x86, efi: Retain boot
> service code until after switching to virtual mode")).
Yeah, that's actually a good idea. Why not upstream it for the wider
audience so that people can actually start reporting b0rked UEFIs? With
a big and nice FW_BUG splat in there...
> But for the issue being discussed in this thread, the thing unmapping
> the EFI regions buys you is that they're no longer accessible from the
> x86 sleep/wakeup code paths, since those also use trampoline_pgd which
> is where the EFI page tables are mapped.
>
> And that's probably a good idea.
>
> [1] - https://git.kernel.org/cgit/linux/kernel/git/mfleming/efi.git/commit/?h=stable&id=9b78793058bf93958aa9529400cb2617ec1bc958
In reading that commit message above, the fact that the braindead
decision of allowing SetVirtualAddressMap() to be called only once
reminds me that we can't really have a PF handler for runtime
services as *all* mappings need to be ready before calling
SetVirtualAddressMap().
Or, alternatively, we can prep them, call SetVirtualAddressMap() and
then unmap them all and map them again at the same addresses only in the
PF handler, each time a runtime call happens. When that call finishes,
we unmap them again...
Hmm, perhaps not worth the trouble...
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2] x86/mm: warn on W+x mappings
2015-10-15 10:33 ` Borislav Petkov
@ 2015-10-16 1:45 ` Ricardo Neri
0 siblings, 0 replies; 25+ messages in thread
From: Ricardo Neri @ 2015-10-16 1:45 UTC (permalink / raw)
To: Borislav Petkov
Cc: Matt Fleming, Andy Lutomirski, Ingo Molnar, Stephen Smalley,
X86 ML, linux-kernel@vger.kernel.org, Kees Cook, Thomas Gleixner,
H. Peter Anvin, Peter Zijlstra, Andy Lutomirski, Denys Vlasenko,
Brian Gerst, linux-efi@vger.kernel.org, Ard Biesheuvel, luv
On Thu, 2015-10-15 at 12:33 +0200, Borislav Petkov wrote:
> Yeah, that's actually a good idea. Why not upstream it for the wider
> audience so that people can actually start reporting b0rked UEFIs?
> With
> a big and nice FW_BUG splat in there...
We attempted to upstream in the past but later I discovered that my
implementation in particular is causing warnings due to SMP. Also, I
need to implement an alternative or extension to the current
efi_map_region, which have the __init qualifier. This is because
mappings might happen after the __inits have been discarded. I have this
work currently in my scope.
Thanks and BR,
Ricardo
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2] x86/mm: warn on W+x mappings
2015-10-14 21:02 ` Matt Fleming
@ 2015-10-21 9:42 ` Ingo Molnar
[not found] ` <20151021094242.GA12155-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
0 siblings, 1 reply; 25+ messages in thread
From: Ingo Molnar @ 2015-10-21 9:42 UTC (permalink / raw)
To: Matt Fleming
Cc: Borislav Petkov, Stephen Smalley, x86, linux-kernel, keescook,
Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Andy Lutomirski,
Denys Vlasenko, Brian Gerst, linux-efi, Ard Biesheuvel
* Matt Fleming <matt@codeblueprint.co.uk> wrote:
> > > Right, we could do that, but then we wouldn't be able to support
> > > creation/updating variables at runtime, such as when you install a
> > > distribution for the first time, or want to boot a new kernel filename
> > > directly from the firmware without a boot loader (and need to modify the
> > > BootXXXX variables).
> >
> > Do we know the precise position and address range of these variables?
> >
> > We could map them writable (but not executable), and the rest executable (but
> > not writable).
>
> The variables are stored in NVRAM, which we don't map into the kernel virtual
> address space. [...]
Just curious: is there firmware that memory maps those variables privately?
> [...] We have to initiate the transaction of writing to the variables by
> executing EFI runtime services.
>
> We obviously have buffers that we pass to the BIOS that contain variable data,
> but these should be NX anyway because they're regular kernel allocations.
>
> > That raises the question whether the same physical page ever mixes variables
> > and actual code - but the hope would be that it's suffiently page granular for
> > this to work.
>
> I don't think that would ever happen.
Ok, that's promising, so how about this then to solve the security weakness the
new warning unearthed: map the whole EFI range as 'r-x (NX)', but detect writes
from the page fault handler and transparently allow them to flip over the range to
'rw-'.
Note that for security reasons we don't allow a subsequent flipping back to NX if
there's an NX fault on the same page, i.e. this new mechanism is a monotonic
one-way process that should dynamically 'map out' data pages versus executable
pages.
It should also be pretty robust, assuming we can take page faults while EFI code
is executing and is trying to modify EFI data: is that the case?
Thanks,
Ingo
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2] x86/mm: warn on W+x mappings
[not found] ` <20151021094242.GA12155-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2015-10-21 12:49 ` Ingo Molnar
[not found] ` <20151021124924.GA19262-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2015-10-21 20:38 ` Matt Fleming
1 sibling, 1 reply; 25+ messages in thread
From: Ingo Molnar @ 2015-10-21 12:49 UTC (permalink / raw)
To: Matt Fleming
Cc: Borislav Petkov, Stephen Smalley, x86-DgEjT+Ai2ygdnm+yROfE0A,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
keescook-F7+t8E8rja9g9hUCZPvPmw, Thomas Gleixner, H. Peter Anvin,
Peter Zijlstra, Andy Lutomirski, Denys Vlasenko, Brian Gerst,
linux-efi-u79uwXL29TY76Z2rM5mHXA, Ard Biesheuvel
* Ingo Molnar <mingo-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
>
> * Matt Fleming <matt-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org> wrote:
>
> > > > Right, we could do that, but then we wouldn't be able to support
> > > > creation/updating variables at runtime, such as when you install a
> > > > distribution for the first time, or want to boot a new kernel filename
> > > > directly from the firmware without a boot loader (and need to modify the
> > > > BootXXXX variables).
> > >
> > > Do we know the precise position and address range of these variables?
> > >
> > > We could map them writable (but not executable), and the rest executable (but
> > > not writable).
> >
> > The variables are stored in NVRAM, which we don't map into the kernel virtual
> > address space. [...]
>
> Just curious: is there firmware that memory maps those variables privately?
>
> > [...] We have to initiate the transaction of writing to the variables by
> > executing EFI runtime services.
> >
> > We obviously have buffers that we pass to the BIOS that contain variable data,
> > but these should be NX anyway because they're regular kernel allocations.
> >
> > > That raises the question whether the same physical page ever mixes variables
> > > and actual code - but the hope would be that it's suffiently page granular for
> > > this to work.
> >
> > I don't think that would ever happen.
>
> Ok, that's promising, so how about this then to solve the security weakness the
> new warning unearthed: map the whole EFI range as 'r-x (NX)', but detect writes
> from the page fault handler and transparently allow them to flip over the range
> to 'rw-'.
So I meant to say 'page' instead of 'range'.
I.e. this dynamic mechanism would flip pages over to 'rw-', as write faults occur
from EFI code that writes to them.
We don't need to know which regions are writable data, and which regions are
executable-code/readonly-data.
The following aspect would guarantee safety:
> Note that for security reasons we don't allow a subsequent flipping back to NX
> if there's an NX fault on the same page, i.e. this new mechanism is a monotonic
> one-way process that should dynamically 'map out' data pages versus executable
> pages.
>
> It should also be pretty robust, assuming we can take page faults while EFI code
> is executing and is trying to modify EFI data: is that the case?
and this is why I asked whether boundaries between 'Code' and 'Writable data'
sections are page granular - which they do appear to be. (i.e. there are no
singular pages that are both writable data and code at once.)
Thanks,
Ingo
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2] x86/mm: warn on W+x mappings
[not found] ` <20151021124924.GA19262-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2015-10-21 12:57 ` Ard Biesheuvel
2015-10-21 13:24 ` Borislav Petkov
0 siblings, 1 reply; 25+ messages in thread
From: Ard Biesheuvel @ 2015-10-21 12:57 UTC (permalink / raw)
To: Ingo Molnar
Cc: Matt Fleming, Borislav Petkov, Stephen Smalley,
x86-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Kees Cook,
Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Andy Lutomirski,
Denys Vlasenko, Brian Gerst,
linux-efi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
On 21 October 2015 at 14:49, Ingo Molnar <mingo-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
>
> * Ingo Molnar <mingo-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
>
>>
>> * Matt Fleming <matt-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org> wrote:
>>
>> > > > Right, we could do that, but then we wouldn't be able to support
>> > > > creation/updating variables at runtime, such as when you install a
>> > > > distribution for the first time, or want to boot a new kernel filename
>> > > > directly from the firmware without a boot loader (and need to modify the
>> > > > BootXXXX variables).
>> > >
>> > > Do we know the precise position and address range of these variables?
>> > >
>> > > We could map them writable (but not executable), and the rest executable (but
>> > > not writable).
>> >
>> > The variables are stored in NVRAM, which we don't map into the kernel virtual
>> > address space. [...]
>>
>> Just curious: is there firmware that memory maps those variables privately?
>>
>> > [...] We have to initiate the transaction of writing to the variables by
>> > executing EFI runtime services.
>> >
>> > We obviously have buffers that we pass to the BIOS that contain variable data,
>> > but these should be NX anyway because they're regular kernel allocations.
>> >
>> > > That raises the question whether the same physical page ever mixes variables
>> > > and actual code - but the hope would be that it's suffiently page granular for
>> > > this to work.
>> >
>> > I don't think that would ever happen.
>>
>> Ok, that's promising, so how about this then to solve the security weakness the
>> new warning unearthed: map the whole EFI range as 'r-x (NX)', but detect writes
>> from the page fault handler and transparently allow them to flip over the range
>> to 'rw-'.
>
> So I meant to say 'page' instead of 'range'.
>
> I.e. this dynamic mechanism would flip pages over to 'rw-', as write faults occur
> from EFI code that writes to them.
>
> We don't need to know which regions are writable data, and which regions are
> executable-code/readonly-data.
>
> The following aspect would guarantee safety:
>
>> Note that for security reasons we don't allow a subsequent flipping back to NX
>> if there's an NX fault on the same page, i.e. this new mechanism is a monotonic
>> one-way process that should dynamically 'map out' data pages versus executable
>> pages.
>>
>> It should also be pretty robust, assuming we can take page faults while EFI code
>> is executing and is trying to modify EFI data: is that the case?
>
> and this is why I asked whether boundaries between 'Code' and 'Writable data'
> sections are page granular - which they do appear to be. (i.e. there are no
> singular pages that are both writable data and code at once.)
>
No, sadly they are not. Only in specific cases (which have to do with
the new UEFIv2.5 memory protection feature that got this discussion
started in the first place) can we assume that UEFI runtime pages are
mappable either RW- or R-X but not RWX, and in those cases, we have
the permissions bits that tell us unambiguously which pages are text
and which are data. For the remaining cases, which is the vast
majority, no such assumptions can be made, and since the UEFI runtime
regions are typically populated with a bunch of PE/COFF images (each
of which consists of text + data), inferring where the boundaries are
between them does not seem tractable (for instance, to only map
'boundary' pages RWX)
--
Ard.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2] x86/mm: warn on W+x mappings
2015-10-21 12:57 ` Ard Biesheuvel
@ 2015-10-21 13:24 ` Borislav Petkov
[not found] ` <20151021132430.GD3575-fF5Pk5pvG8Y@public.gmane.org>
0 siblings, 1 reply; 25+ messages in thread
From: Borislav Petkov @ 2015-10-21 13:24 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: Ingo Molnar, Matt Fleming, Stephen Smalley, x86@kernel.org,
linux-kernel@vger.kernel.org, Kees Cook, Thomas Gleixner,
H. Peter Anvin, Peter Zijlstra, Andy Lutomirski, Denys Vlasenko,
Brian Gerst, linux-efi@vger.kernel.org
On Wed, Oct 21, 2015 at 02:57:47PM +0200, Ard Biesheuvel wrote:
> ... For the remaining cases, which is the vast majority, no such
> assumptions can be made, and since the UEFI runtime regions are
> typically populated with a bunch of PE/COFF images (each of which
> consists of text + data), inferring where the boundaries are between
> them does not seem tractable (for instance, to only map 'boundary'
> pages RWX)
How much of a problem would it be if we still do the on-demand page
faulting and map a trailing piece of code together with the data in a
page RWX?
Still better than mapping the *whole* thing RWX, no?
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2] x86/mm: warn on W+x mappings
[not found] ` <20151021132430.GD3575-fF5Pk5pvG8Y@public.gmane.org>
@ 2015-10-21 13:28 ` Ard Biesheuvel
2015-10-21 14:36 ` Borislav Petkov
0 siblings, 1 reply; 25+ messages in thread
From: Ard Biesheuvel @ 2015-10-21 13:28 UTC (permalink / raw)
To: Borislav Petkov
Cc: Ingo Molnar, Matt Fleming, Stephen Smalley,
x86-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Kees Cook,
Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Andy Lutomirski,
Denys Vlasenko, Brian Gerst,
linux-efi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
On 21 October 2015 at 15:24, Borislav Petkov <bp-Gina5bIWoIWzQB+pC5nmwQ@public.gmane.org> wrote:
> On Wed, Oct 21, 2015 at 02:57:47PM +0200, Ard Biesheuvel wrote:
>> ... For the remaining cases, which is the vast majority, no such
>> assumptions can be made, and since the UEFI runtime regions are
>> typically populated with a bunch of PE/COFF images (each of which
>> consists of text + data), inferring where the boundaries are between
>> them does not seem tractable (for instance, to only map 'boundary'
>> pages RWX)
>
> How much of a problem would it be if we still do the on-demand page
> faulting and map a trailing piece of code together with the data in a
> page RWX?
>
> Still better than mapping the *whole* thing RWX, no?
>
In theory, yes. In practice, since this is supposed to be a security
enhancement, we need some kind of ground truth to tell us which pages
can be legally modified *and* executed, so that we can detect the
illegal cases. My point was that, since a multitude of PE/COFF images
can be covered by a single EfiRuntimeServicesCode region, the UEFI
memory map does not give us enough information to make the distinction
between a page that sits on the text/data boundary of some PE/COFF
image and a page that sits wholly in either.
--
Ard.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2] x86/mm: warn on W+x mappings
2015-10-21 13:28 ` Ard Biesheuvel
@ 2015-10-21 14:36 ` Borislav Petkov
[not found] ` <20151021143651.GE3575-fF5Pk5pvG8Y@public.gmane.org>
0 siblings, 1 reply; 25+ messages in thread
From: Borislav Petkov @ 2015-10-21 14:36 UTC (permalink / raw)
To: Ard Biesheuvel
Cc: Ingo Molnar, Matt Fleming, Stephen Smalley, x86@kernel.org,
linux-kernel@vger.kernel.org, Kees Cook, Thomas Gleixner,
H. Peter Anvin, Peter Zijlstra, Andy Lutomirski, Denys Vlasenko,
Brian Gerst, linux-efi@vger.kernel.org
On Wed, Oct 21, 2015 at 03:28:56PM +0200, Ard Biesheuvel wrote:
> In theory, yes. In practice, since this is supposed to be a security
> enhancement, we need some kind of ground truth to tell us which pages
> can be legally modified *and* executed, so that we can detect the
> illegal cases. My point was that, since a multitude of PE/COFF images
> can be covered by a single EfiRuntimeServicesCode region, the UEFI
> memory map does not give us enough information to make the distinction
> between a page that sits on the text/data boundary of some PE/COFF
> image and a page that sits wholly in either.
Well, we're going to simply allow the accesses to in-kernel users which
fault on those ranges, assuming that in-kernel modifiers are legit and
DTRT. Which means, we don't really need to know which pages can be
legally modified - we simply trust the in-kernel users.
The moment you're able to load an evil kernel module, guarding against
those writes is the last thing you need to worry about...
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2] x86/mm: warn on W+x mappings
[not found] ` <20151021143651.GE3575-fF5Pk5pvG8Y@public.gmane.org>
@ 2015-10-21 18:46 ` Andy Lutomirski
[not found] ` <CALCETrWwHLnt6=MiyuQY48zEaOUMuFTg1oN3NDdEOxa6XYMgCw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
0 siblings, 1 reply; 25+ messages in thread
From: Andy Lutomirski @ 2015-10-21 18:46 UTC (permalink / raw)
To: Borislav Petkov
Cc: Ard Biesheuvel, Ingo Molnar, Matt Fleming, Stephen Smalley,
x86-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Kees Cook,
Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Andy Lutomirski,
Denys Vlasenko, Brian Gerst,
linux-efi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
On Wed, Oct 21, 2015 at 7:36 AM, Borislav Petkov <bp-Gina5bIWoIWzQB+pC5nmwQ@public.gmane.org> wrote:
> On Wed, Oct 21, 2015 at 03:28:56PM +0200, Ard Biesheuvel wrote:
>> In theory, yes. In practice, since this is supposed to be a security
>> enhancement, we need some kind of ground truth to tell us which pages
>> can be legally modified *and* executed, so that we can detect the
>> illegal cases. My point was that, since a multitude of PE/COFF images
>> can be covered by a single EfiRuntimeServicesCode region, the UEFI
>> memory map does not give us enough information to make the distinction
>> between a page that sits on the text/data boundary of some PE/COFF
>> image and a page that sits wholly in either.
>
> Well, we're going to simply allow the accesses to in-kernel users which
> fault on those ranges, assuming that in-kernel modifiers are legit and
> DTRT. Which means, we don't really need to know which pages can be
> legally modified - we simply trust the in-kernel users.
>
> The moment you're able to load an evil kernel module, guarding against
> those writes is the last thing you need to worry about...
I don't think we can do a whole lot to help against broken UEFI code,
but having anything mapped RWX is a nice target for people trying to
exploit kernel bugs. Hence my suggestion to clear W except when
actually running UEFI code.
If the UEFI stuff is mapped in its own PGD entry, we could just RO
that entire PGD entry everywhere except the UEFI pgd (and make sure to
clear G so that the TLB entries get zapped).
--Andy
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2] x86/mm: warn on W+x mappings
[not found] ` <20151021094242.GA12155-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2015-10-21 12:49 ` Ingo Molnar
@ 2015-10-21 20:38 ` Matt Fleming
1 sibling, 0 replies; 25+ messages in thread
From: Matt Fleming @ 2015-10-21 20:38 UTC (permalink / raw)
To: Ingo Molnar
Cc: Borislav Petkov, Stephen Smalley, x86-DgEjT+Ai2ygdnm+yROfE0A,
linux-kernel-u79uwXL29TY76Z2rM5mHXA,
keescook-F7+t8E8rja9g9hUCZPvPmw, Thomas Gleixner, H. Peter Anvin,
Peter Zijlstra, Andy Lutomirski, Denys Vlasenko, Brian Gerst,
linux-efi-u79uwXL29TY76Z2rM5mHXA, Ard Biesheuvel
On Wed, 21 Oct, at 11:42:42AM, Ingo Molnar wrote:
>
> * Matt Fleming <matt-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org> wrote:
>
> > > > Right, we could do that, but then we wouldn't be able to support
> > > > creation/updating variables at runtime, such as when you install a
> > > > distribution for the first time, or want to boot a new kernel filename
> > > > directly from the firmware without a boot loader (and need to modify the
> > > > BootXXXX variables).
> > >
> > > Do we know the precise position and address range of these variables?
> > >
> > > We could map them writable (but not executable), and the rest executable (but
> > > not writable).
> >
> > The variables are stored in NVRAM, which we don't map into the kernel virtual
> > address space. [...]
>
> Just curious: is there firmware that memory maps those variables privately?
Good question, not sure. I suspect not because it becomes much harder
to protect those oh-so-precious variables from errant code wanting to
write to them.
Usually things get written on x86 from SMM code.
> > [...] We have to initiate the transaction of writing to the variables by
> > executing EFI runtime services.
> >
> > We obviously have buffers that we pass to the BIOS that contain variable data,
> > but these should be NX anyway because they're regular kernel allocations.
> >
> > > That raises the question whether the same physical page ever mixes variables
> > > and actual code - but the hope would be that it's suffiently page granular for
> > > this to work.
> >
> > I don't think that would ever happen.
>
> Ok, that's promising, so how about this then to solve the security weakness the
> new warning unearthed: map the whole EFI range as 'r-x (NX)', but detect writes
> from the page fault handler and transparently allow them to flip over the range to
> 'rw-'.
>
> Note that for security reasons we don't allow a subsequent flipping back to NX if
> there's an NX fault on the same page, i.e. this new mechanism is a monotonic
> one-way process that should dynamically 'map out' data pages versus executable
> pages.
>
> It should also be pretty robust, assuming we can take page faults while EFI code
> is executing and is trying to modify EFI data: is that the case?
Yes, we can do that but I think I misunderstood what you were asking
when you said,
> That raises the question whether the same physical page ever mixes variables and
> actual code - but the hope would be that it's suffiently page granular for this to
> work.
I was talking about EFI variables as defined in the UEFI spec, i.e.
backed by some peristent storage mechanism. I wasn't talking about
".data" objects.
It *is* possible for physical pages to contain both EFI code and data,
as Ard mentioned, and we have no way of distinguishing when EFI code
tried to write to a EfiRuntimeServicesCode page/region because there's
also legitimate data there and when an exploit attempt is taking
place.
In which case, I think we'd essentially map everything with execute
permission apart from the heap and other dynamically allocated objects
stored in EfiRuntimeServicesData.
But at that point, can't we just leave all these regions unmapped
unless we're in the EFI code paths? And that includes not leaving the
mappings around duing the suspend/resume code.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2] x86/mm: warn on W+x mappings
[not found] ` <CALCETrWwHLnt6=MiyuQY48zEaOUMuFTg1oN3NDdEOxa6XYMgCw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-10-21 20:45 ` Matt Fleming
[not found] ` <20151021204522.GB20338-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org>
0 siblings, 1 reply; 25+ messages in thread
From: Matt Fleming @ 2015-10-21 20:45 UTC (permalink / raw)
To: Andy Lutomirski
Cc: Borislav Petkov, Ard Biesheuvel, Ingo Molnar, Stephen Smalley,
x86-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Kees Cook,
Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Andy Lutomirski,
Denys Vlasenko, Brian Gerst,
linux-efi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
On Wed, 21 Oct, at 11:46:53AM, Andy Lutomirski wrote:
>
> If the UEFI stuff is mapped in its own PGD entry, we could just RO
> that entire PGD entry everywhere except the UEFI pgd (and make sure to
> clear G so that the TLB entries get zapped).
What would be the benefit of making it RO as opposed to not having it
mapped at all? The mappings only exist in the trampoline_pgd right now
for x86 which minimizes the potentially vulnerable code paths to the
EFI runtime calls and the suspend/resume code.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH v2] x86/mm: warn on W+x mappings
[not found] ` <20151021204522.GB20338-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org>
@ 2015-10-21 20:49 ` Andy Lutomirski
0 siblings, 0 replies; 25+ messages in thread
From: Andy Lutomirski @ 2015-10-21 20:49 UTC (permalink / raw)
To: Matt Fleming
Cc: Borislav Petkov, Ard Biesheuvel, Ingo Molnar, Stephen Smalley,
x86-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, Kees Cook,
Thomas Gleixner, H. Peter Anvin, Peter Zijlstra, Andy Lutomirski,
Denys Vlasenko, Brian Gerst,
linux-efi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
On Wed, Oct 21, 2015 at 1:45 PM, Matt Fleming <matt-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org> wrote:
> On Wed, 21 Oct, at 11:46:53AM, Andy Lutomirski wrote:
>>
>> If the UEFI stuff is mapped in its own PGD entry, we could just RO
>> that entire PGD entry everywhere except the UEFI pgd (and make sure to
>> clear G so that the TLB entries get zapped).
>
> What would be the benefit of making it RO as opposed to not having it
> mapped at all?
Nothing.
> The mappings only exist in the trampoline_pgd right now
> for x86 which minimizes the potentially vulnerable code paths to the
> EFI runtime calls and the suspend/resume code.
Oh, I didn't realize it.
So what's the problem here? Honestly, while UEFI is full of
questionable things, I don't really see how an unprivileged user
program should be able to cause malicious input to be send to UEFI
code, so it should be quite difficult to exploit a buffer overflow or
other errant write in UEFI to escalate privileges from user to
anything else. (Kernel -> SMM escalation is a whole different story,
but preventing that is SMM's business, not the kernel's. I've
actually been a wee bit tempted to write a /dev/smram driver to expose
SMRAM using a portfolio of old known exploits.)
--Andy
--
Andy Lutomirski
AMA Capital Management, LLC
^ permalink raw reply [flat|nested] 25+ messages in thread
end of thread, other threads:[~2015-10-21 20:49 UTC | newest]
Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <1443814185-21552-1-git-send-email-sds@tycho.nsa.gov>
[not found] ` <20151003112701.GA4531@gmail.com>
[not found] ` <5612CBE8.2010504@tycho.nsa.gov>
[not found] ` <20151006073205.GA11115@gmail.com>
[not found] ` <5613EAD5.2070405@tycho.nsa.gov>
[not found] ` <20151012113605.GB7384@pd.tnic>
[not found] ` <20151012113605.GB7384-fF5Pk5pvG8Y@public.gmane.org>
2015-10-12 12:41 ` [PATCH v2] x86/mm: warn on W+x mappings Matt Fleming
2015-10-12 12:49 ` Ingo Molnar
[not found] ` <20151012124936.GA6260-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2015-10-12 12:55 ` Matt Fleming
[not found] ` <20151012125548.GE2579-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org>
2015-10-12 14:17 ` Ingo Molnar
[not found] ` <20151012141754.GA6621-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2015-10-12 14:49 ` Matt Fleming
[not found] ` <20151012144928.GF2579-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org>
2015-10-12 15:34 ` Ard Biesheuvel
[not found] ` <CAKv+Gu95NoB5cPqZMeBLn7i0JWDDPKX63FuiMVD94HpbGhKrhA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-10-12 15:50 ` Matt Fleming
[not found] ` <20151012155028.GH2579-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org>
2015-10-12 16:43 ` Ard Biesheuvel
2015-10-14 15:18 ` Ingo Molnar
2015-10-14 15:30 ` Andy Lutomirski
2015-10-14 15:35 ` Borislav Petkov
[not found] ` <20151014153522.GC8218-fF5Pk5pvG8Y@public.gmane.org>
2015-10-15 10:10 ` Matt Fleming
[not found] ` <20151015101016.GB2975-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org>
2015-10-15 10:33 ` Borislav Petkov
2015-10-16 1:45 ` Ricardo Neri
2015-10-14 21:02 ` Matt Fleming
2015-10-21 9:42 ` Ingo Molnar
[not found] ` <20151021094242.GA12155-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2015-10-21 12:49 ` Ingo Molnar
[not found] ` <20151021124924.GA19262-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2015-10-21 12:57 ` Ard Biesheuvel
2015-10-21 13:24 ` Borislav Petkov
[not found] ` <20151021132430.GD3575-fF5Pk5pvG8Y@public.gmane.org>
2015-10-21 13:28 ` Ard Biesheuvel
2015-10-21 14:36 ` Borislav Petkov
[not found] ` <20151021143651.GE3575-fF5Pk5pvG8Y@public.gmane.org>
2015-10-21 18:46 ` Andy Lutomirski
[not found] ` <CALCETrWwHLnt6=MiyuQY48zEaOUMuFTg1oN3NDdEOxa6XYMgCw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-10-21 20:45 ` Matt Fleming
[not found] ` <20151021204522.GB20338-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org>
2015-10-21 20:49 ` Andy Lutomirski
2015-10-21 20:38 ` Matt Fleming
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).