public inbox for linux-efi@vger.kernel.org
 help / color / mirror / Atom feed
* Re: [GIT PULL] x86/mm changes for v4.4
       [not found]           ` <CA+55aFyC7L85SYen3Uz6e1cvH0jXzQ9_MddHJ=7PxvpOR2U23w@mail.gmail.com>
@ 2015-11-06 11:39             ` Matt Fleming
       [not found]               ` <20151106113943.GB2651-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org>
  0 siblings, 1 reply; 13+ messages in thread
From: Matt Fleming @ 2015-11-06 11:39 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Dave Jones, Ingo Molnar, Linux Kernel Mailing List,
	Thomas Gleixner, H. Peter Anvin, Borislav Petkov, Andrew Morton,
	Andy Lutomirski, Denys Vlasenko, Stephen Smalley, linux-efi

On Thu, 05 Nov, at 01:33:10PM, Linus Torvalds wrote:
> 
> And if this turns out to be due to EFI wanting those permissions, what
> should we do? People have talked about running the EFI callbacks in
> their own private page table setup, which sounds like the right idea,
> but until that actually *happens*....

We have separate page tables today, for a few reasons, but mainly it's
so that we can have an identity mapping of memory present in the
region usually used by user processes - broken firmware still uses
those identity mappings even after the kernel tells it they're
invalid.

Note that when I say "separate" I'm talking about trampoline_pgd[]
which is also used by the x86 suspend/resume code.

However, turns out that the issue with the current scheme is the fact
that trampoline_pgd[] actually shares a couple of PGD entries with
swapper_pg_dir as can be seen in setup_real_mode(),


        trampoline_pgd = (u64 *)__va(real_mode_header->trampoline_pgd);
        trampoline_pgd[0] = init_level4_pgt[pgd_index(__PAGE_OFFSET)].pgd;
        trampoline_pgd[511] = init_level4_pgt[511].pgd;


So when we map the EFI regions in efi_map_regions() we're inserting
them into swapper_pg_dir also, which is why you're seeing the
warnings.

If I remember correctly the rationale for using trampoline_pgd[] was
that it already did what we wanted (provided the identity mapping) and
would save us the overhead of maintaining more page tables for no good
reason. Obviously this entire thread is a good reason.

I suggest we stop using trampoline_pgd[] (since it has a good reason
for sharing the kernel mapping PGD entries) and create our own so that
we can isolate EFI completely.

For the immediate problem of the warnings spewing forth on all UEFI
machines, at the very least the config options needs to be disabled by
default, if not the patch reverted.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [GIT PULL] x86/mm changes for v4.4
       [not found]           ` <20151106065549.GA2031@gmail.com>
@ 2015-11-06 12:39             ` Matt Fleming
       [not found]               ` <20151106123912.GC2651-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org>
       [not found]             ` <CALCETrU2dn4TEj_2QiCPy4Mjw6hCbB84k1RnPzx7sLNygj4D5Q@mail.gmail.com>
  1 sibling, 1 reply; 13+ messages in thread
From: Matt Fleming @ 2015-11-06 12:39 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Stephen Smalley, Dave Jones,
	Linux Kernel Mailing List, Thomas Gleixner, H. Peter Anvin,
	Borislav Petkov, Andrew Morton, Andy Lutomirski, Denys Vlasenko,
	Kees Cook, linux-efi, Ard Biesheuvel

On Fri, 06 Nov, at 07:55:50AM, Ingo Molnar wrote:
> 
>  3) We should fix the EFI permission problem without relying on the firmware: it 
>     appears we could just mark everything R-X optimistically, and if a write fault 
>     happens (it's pretty rare in fact, only triggers when we write to an EFI 
>     variable and so), we can mark the faulting page RW- on the fly, because it 
>     appears that writable EFI sections, while not enumerated very well in 'old' 
>     firmware, are still supposed to be page granular. (Even 'new' firmware I 
>     wouldn't automatically trust to get the enumeration right...)

Sorry, this isn't true. I misled you with one of my earlier posts on
this topic. Let me try and clear things up...

Writing to EFI regions has to do with every invocation of the EFI
runtime services - it's not limited to when you read/write/delete EFI
variables. In fact, EFI variables really have nothing to do with this
discussion, they're a completely opaque concept to the OS, we have no
idea how the firmware implements them. Everything is done via the EFI
boot/runtime services.

The firmware itself will attempt to write to EFI regions when we
invoke the EFI services because that's where the PE/COFF ".data" and
".bss" sections live along with the heap. There's even some relocation
fixups that occur as SetVirtualAddressMap() time so it'll write to
".text" too.

Now, the above PE/COFF sections are usually (always?) contained within
EFI regions of type EfiRuntimeServicesCode. We know this is true
because the firmware folks have told us so, and because stopping that
is the motivation behind the new EFI_PROPERTIES_TABLE feature in UEFI
V2.5.

The data sections within the region are also *not* guaranteed to be
page granular because work was required in Tianocore for emitting
sections with 4k alignment as part of the EFI_PROPERTIES_TABLE
support.

Ultimately, what this means is that if you were to attempt to
dynamically fixup those regions that required write permission, you'd
have to modify the mappings for the majority of the EFI regions
anyway. And if you're blindly allowing write permission as a fixup,
there's not much security to be had.

>     If that 'supposed to be' turns out to be 'not true' (not unheard of in
>     firmware land), then plan B would be to mark pages that generate write faults 
>     RWX as well, to not break functionality. (This 'mark it RWX' is not something 
>     that exploits would have easy access to, and we could also generate a warning
>     [after the EFI call has finished] if it ever triggers.)
> 
>     Admittedly this approach might not be without its own complications, but it 
>     looks reasonably simple (I don't think we need per EFI call page tables, 
>     etc.), and does not assume much about the firmware being able to enumerate its 
>     permissions properly. Were we to merge EFI support today I'd have insisted on 
>     trying such an approach from day 1 on.

We already have separate EFI page tables, though with the caveat that
we share some of swapper_pg_dir's PGD entries. The best solution would
be to stop sharing entries and isolate the EFI mappings from every
other page table structure, so that they're only used during the EFI
service calls.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [GIT PULL] x86/mm changes for v4.4
       [not found]               ` <CALCETrU2dn4TEj_2QiCPy4Mjw6hCbB84k1RnPzx7sLNygj4D5Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-11-06 13:09                 ` Matt Fleming
       [not found]                   ` <20151106130948.GD2651-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org>
  0 siblings, 1 reply; 13+ messages in thread
From: Matt Fleming @ 2015-11-06 13:09 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Ingo Molnar, Linus Torvalds, Stephen Smalley, Dave Jones,
	Linux Kernel Mailing List, Thomas Gleixner, H. Peter Anvin,
	Borislav Petkov, Andrew Morton, Andy Lutomirski, Denys Vlasenko,
	Kees Cook, linux-efi-u79uwXL29TY76Z2rM5mHXA

On Thu, 05 Nov, at 11:05:35PM, Andy Lutomirski wrote:
> 
> Admittedly, we might need to use a certain amount of care to avoid
> interesting conflicts with the vmap mechanism.  We might need to vmap
> all of the EFI stuff, and possibly even all the top-level entries that
> contain EFI stuff (i.e. exactly one of them unless EFI ends up *huge*)
> as a blank not-present region to avoid overlaps, but that's not a big
> deal.

There shouldn't be any room for conflicting with vmap() because the VA
region where we map EFI regions is still carved out especially for us.

Right Boris?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [GIT PULL] x86/mm changes for v4.4
       [not found]                   ` <20151106130948.GD2651-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org>
@ 2015-11-06 13:24                     ` Borislav Petkov
  0 siblings, 0 replies; 13+ messages in thread
From: Borislav Petkov @ 2015-11-06 13:24 UTC (permalink / raw)
  To: Matt Fleming
  Cc: Andy Lutomirski, Ingo Molnar, Linus Torvalds, Stephen Smalley,
	Dave Jones, Linux Kernel Mailing List, Thomas Gleixner,
	H. Peter Anvin, Andrew Morton, Andy Lutomirski, Denys Vlasenko,
	Kees Cook, linux-efi-u79uwXL29TY76Z2rM5mHXA

On Fri, Nov 06, 2015 at 01:09:48PM +0000, Matt Fleming wrote:
> On Thu, 05 Nov, at 11:05:35PM, Andy Lutomirski wrote:
> > 
> > Admittedly, we might need to use a certain amount of care to avoid
> > interesting conflicts with the vmap mechanism.  We might need to vmap
> > all of the EFI stuff, and possibly even all the top-level entries that
> > contain EFI stuff (i.e. exactly one of them unless EFI ends up *huge*)
> > as a blank not-present region to avoid overlaps, but that's not a big
> > deal.
> 
> There shouldn't be any room for conflicting with vmap() because the VA
> region where we map EFI regions is still carved out especially for us.
> 
> Right Boris?

Yap:

ffffea0000000000 - ffffeaffffffffff (=40 bits) virtual memory map (1TB)

vs

ffffffef00000000 - ffffffff00000000 EFI region in trampoline_pgd

the new pagetable will make that issue moot too.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [GIT PULL] x86/mm changes for v4.4
       [not found]               ` <20151106113943.GB2651-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org>
@ 2015-11-07  7:05                 ` Ingo Molnar
       [not found]                   ` <20151107070554.GB6235-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 13+ messages in thread
From: Ingo Molnar @ 2015-11-07  7:05 UTC (permalink / raw)
  To: Matt Fleming
  Cc: Linus Torvalds, Dave Jones, Linux Kernel Mailing List,
	Thomas Gleixner, H. Peter Anvin, Borislav Petkov, Andrew Morton,
	Andy Lutomirski, Denys Vlasenko, Stephen Smalley,
	linux-efi-u79uwXL29TY76Z2rM5mHXA


* Matt Fleming <matt-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org> wrote:

> On Thu, 05 Nov, at 01:33:10PM, Linus Torvalds wrote:
> > 
> > And if this turns out to be due to EFI wanting those permissions, what should 
> > we do? People have talked about running the EFI callbacks in their own private 
> > page table setup, which sounds like the right idea, but until that actually 
> > *happens*....
> 
> We have separate page tables today, for a few reasons, but mainly it's
> so that we can have an identity mapping of memory present in the
> region usually used by user processes - broken firmware still uses
> those identity mappings even after the kernel tells it they're
> invalid.
> 
> Note that when I say "separate" I'm talking about trampoline_pgd[]
> which is also used by the x86 suspend/resume code.
> 
> However, turns out that the issue with the current scheme is the fact
> that trampoline_pgd[] actually shares a couple of PGD entries with
> swapper_pg_dir as can be seen in setup_real_mode(),
> 
> 
>         trampoline_pgd = (u64 *)__va(real_mode_header->trampoline_pgd);
>         trampoline_pgd[0] = init_level4_pgt[pgd_index(__PAGE_OFFSET)].pgd;
>         trampoline_pgd[511] = init_level4_pgt[511].pgd;
> 
> 
> So when we map the EFI regions in efi_map_regions() we're inserting
> them into swapper_pg_dir also, which is why you're seeing the
> warnings.
> 
> If I remember correctly the rationale for using trampoline_pgd[] was
> that it already did what we wanted (provided the identity mapping) and
> would save us the overhead of maintaining more page tables for no good
> reason. Obviously this entire thread is a good reason.
> 
> I suggest we stop using trampoline_pgd[] (since it has a good reason
> for sharing the kernel mapping PGD entries) and create our own so that
> we can isolate EFI completely.

Ok. Could you please make this fix a priority for upcoming EFI changes?

> For the immediate problem of the warnings spewing forth on all UEFI
> machines, at the very least the config options needs to be disabled by
> default, if not the patch reverted.

We'll certainly flip around the default, but reverting would be shooting
the messenger: the EFI code is endangering everyone else today, and for
no good reason as it appears... so the warning very much served its
purpose in pointing out a valid problem.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [GIT PULL] x86/mm changes for v4.4
       [not found]               ` <20151106123912.GC2651-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org>
@ 2015-11-07  7:09                 ` Ingo Molnar
  2015-11-07  7:39                   ` Ard Biesheuvel
  0 siblings, 1 reply; 13+ messages in thread
From: Ingo Molnar @ 2015-11-07  7:09 UTC (permalink / raw)
  To: Matt Fleming
  Cc: Linus Torvalds, Stephen Smalley, Dave Jones,
	Linux Kernel Mailing List, Thomas Gleixner, H. Peter Anvin,
	Borislav Petkov, Andrew Morton, Andy Lutomirski, Denys Vlasenko,
	Kees Cook, linux-efi-u79uwXL29TY76Z2rM5mHXA, Ard Biesheuvel


* Matt Fleming <matt-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org> wrote:

> On Fri, 06 Nov, at 07:55:50AM, Ingo Molnar wrote:
> > 
> >  3) We should fix the EFI permission problem without relying on the firmware: it 
> >     appears we could just mark everything R-X optimistically, and if a write fault 
> >     happens (it's pretty rare in fact, only triggers when we write to an EFI 
> >     variable and so), we can mark the faulting page RW- on the fly, because it 
> >     appears that writable EFI sections, while not enumerated very well in 'old' 
> >     firmware, are still supposed to be page granular. (Even 'new' firmware I 
> >     wouldn't automatically trust to get the enumeration right...)
> 
> Sorry, this isn't true. I misled you with one of my earlier posts on
> this topic. Let me try and clear things up...
> 
> Writing to EFI regions has to do with every invocation of the EFI
> runtime services - it's not limited to when you read/write/delete EFI
> variables. In fact, EFI variables really have nothing to do with this
> discussion, they're a completely opaque concept to the OS, we have no
> idea how the firmware implements them. Everything is done via the EFI
> boot/runtime services.
> 
> The firmware itself will attempt to write to EFI regions when we
> invoke the EFI services because that's where the PE/COFF ".data" and
> ".bss" sections live along with the heap. There's even some relocation
> fixups that occur as SetVirtualAddressMap() time so it'll write to
> ".text" too.
> 
> Now, the above PE/COFF sections are usually (always?) contained within
> EFI regions of type EfiRuntimeServicesCode. We know this is true
> because the firmware folks have told us so, and because stopping that
> is the motivation behind the new EFI_PROPERTIES_TABLE feature in UEFI
> V2.5.
> 
> The data sections within the region are also *not* guaranteed to be
> page granular because work was required in Tianocore for emitting
> sections with 4k alignment as part of the EFI_PROPERTIES_TABLE
> support.
> 
> Ultimately, what this means is that if you were to attempt to
> dynamically fixup those regions that required write permission, you'd
> have to modify the mappings for the majority of the EFI regions
> anyway. And if you're blindly allowing write permission as a fixup,
> there's not much security to be had.

I think you misunderstood my suggestion: the 'fixup' would be changing it from R-X 
to RW-, i.e. it would add 'write' permission but remove 'execute' permission.

Note that there would be no 'RWX' permission at any given moment - which is the 
dangerous combination.

> >     If that 'supposed to be' turns out to be 'not true' (not unheard of in
> >     firmware land), then plan B would be to mark pages that generate write faults 
> >     RWX as well, to not break functionality. (This 'mark it RWX' is not something 
> >     that exploits would have easy access to, and we could also generate a warning
> >     [after the EFI call has finished] if it ever triggers.)
> > 
> >     Admittedly this approach might not be without its own complications, but it 
> >     looks reasonably simple (I don't think we need per EFI call page tables, 
> >     etc.), and does not assume much about the firmware being able to enumerate its 
> >     permissions properly. Were we to merge EFI support today I'd have insisted on 
> >     trying such an approach from day 1 on.
> 
> We already have separate EFI page tables, though with the caveat that
> we share some of swapper_pg_dir's PGD entries. The best solution would
> be to stop sharing entries and isolate the EFI mappings from every
> other page table structure, so that they're only used during the EFI
> service calls.

Absolutely. Can you try to fix this for v4.3?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [GIT PULL] x86/mm changes for v4.4
  2015-11-07  7:09                 ` Ingo Molnar
@ 2015-11-07  7:39                   ` Ard Biesheuvel
  2015-11-08  6:58                     ` Kees Cook
  0 siblings, 1 reply; 13+ messages in thread
From: Ard Biesheuvel @ 2015-11-07  7:39 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Matt Fleming, Linus Torvalds, Stephen Smalley, Dave Jones,
	Linux Kernel Mailing List, Thomas Gleixner, H. Peter Anvin,
	Borislav Petkov, Andrew Morton, Andy Lutomirski, Denys Vlasenko,
	Kees Cook, linux-efi@vger.kernel.org

On 7 November 2015 at 08:09, Ingo Molnar <mingo@kernel.org> wrote:
>
> * Matt Fleming <matt@codeblueprint.co.uk> wrote:
>
>> On Fri, 06 Nov, at 07:55:50AM, Ingo Molnar wrote:
>> >
>> >  3) We should fix the EFI permission problem without relying on the firmware: it
>> >     appears we could just mark everything R-X optimistically, and if a write fault
>> >     happens (it's pretty rare in fact, only triggers when we write to an EFI
>> >     variable and so), we can mark the faulting page RW- on the fly, because it
>> >     appears that writable EFI sections, while not enumerated very well in 'old'
>> >     firmware, are still supposed to be page granular. (Even 'new' firmware I
>> >     wouldn't automatically trust to get the enumeration right...)
>>
>> Sorry, this isn't true. I misled you with one of my earlier posts on
>> this topic. Let me try and clear things up...
>>
>> Writing to EFI regions has to do with every invocation of the EFI
>> runtime services - it's not limited to when you read/write/delete EFI
>> variables. In fact, EFI variables really have nothing to do with this
>> discussion, they're a completely opaque concept to the OS, we have no
>> idea how the firmware implements them. Everything is done via the EFI
>> boot/runtime services.
>>
>> The firmware itself will attempt to write to EFI regions when we
>> invoke the EFI services because that's where the PE/COFF ".data" and
>> ".bss" sections live along with the heap. There's even some relocation
>> fixups that occur as SetVirtualAddressMap() time so it'll write to
>> ".text" too.
>>
>> Now, the above PE/COFF sections are usually (always?) contained within
>> EFI regions of type EfiRuntimeServicesCode. We know this is true
>> because the firmware folks have told us so, and because stopping that
>> is the motivation behind the new EFI_PROPERTIES_TABLE feature in UEFI
>> V2.5.
>>
>> The data sections within the region are also *not* guaranteed to be
>> page granular because work was required in Tianocore for emitting
>> sections with 4k alignment as part of the EFI_PROPERTIES_TABLE
>> support.
>>
>> Ultimately, what this means is that if you were to attempt to
>> dynamically fixup those regions that required write permission, you'd
>> have to modify the mappings for the majority of the EFI regions
>> anyway. And if you're blindly allowing write permission as a fixup,
>> there's not much security to be had.
>
> I think you misunderstood my suggestion: the 'fixup' would be changing it from R-X
> to RW-, i.e. it would add 'write' permission but remove 'execute' permission.
>
> Note that there would be no 'RWX' permission at any given moment - which is the
> dangerous combination.
>

The problem with that is that /any/ page in the UEFI runtime region
may intersect with both .text and .data of any of the PE/COFF images
that make up the runtime firmware (since the PE/COFF sections are not
necessarily page aligned). Such pages require RWX permissions. The
UEFI memory map does not provide the information to identify those
pages a priori (the entire region containing several PE/COFF images
could be covered by a single entry) so it is hard to guess which pages
should be allowed these RWX permissions.

>> >     If that 'supposed to be' turns out to be 'not true' (not unheard of in
>> >     firmware land), then plan B would be to mark pages that generate write faults
>> >     RWX as well, to not break functionality. (This 'mark it RWX' is not something
>> >     that exploits would have easy access to, and we could also generate a warning
>> >     [after the EFI call has finished] if it ever triggers.)
>> >
>> >     Admittedly this approach might not be without its own complications, but it
>> >     looks reasonably simple (I don't think we need per EFI call page tables,
>> >     etc.), and does not assume much about the firmware being able to enumerate its
>> >     permissions properly. Were we to merge EFI support today I'd have insisted on
>> >     trying such an approach from day 1 on.
>>
>> We already have separate EFI page tables, though with the caveat that
>> we share some of swapper_pg_dir's PGD entries. The best solution would
>> be to stop sharing entries and isolate the EFI mappings from every
>> other page table structure, so that they're only used during the EFI
>> service calls.
>
> Absolutely. Can you try to fix this for v4.3?
>
> Thanks,
>
>         Ingo

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [GIT PULL] x86/mm changes for v4.4
       [not found]                   ` <20151107070554.GB6235-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2015-11-07 10:03                     ` Matt Fleming
  0 siblings, 0 replies; 13+ messages in thread
From: Matt Fleming @ 2015-11-07 10:03 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Linus Torvalds, Dave Jones, Linux Kernel Mailing List,
	Thomas Gleixner, H. Peter Anvin, Borislav Petkov, Andrew Morton,
	Andy Lutomirski, Denys Vlasenko, Stephen Smalley,
	linux-efi-u79uwXL29TY76Z2rM5mHXA

On Sat, 07 Nov, at 08:05:54AM, Ingo Molnar wrote:
> 
> * Matt Fleming <matt-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org> wrote:
> 
> > On Thu, 05 Nov, at 01:33:10PM, Linus Torvalds wrote:
> > > 
> > > And if this turns out to be due to EFI wanting those permissions, what should 
> > > we do? People have talked about running the EFI callbacks in their own private 
> > > page table setup, which sounds like the right idea, but until that actually 
> > > *happens*....
> > 
> > We have separate page tables today, for a few reasons, but mainly it's
> > so that we can have an identity mapping of memory present in the
> > region usually used by user processes - broken firmware still uses
> > those identity mappings even after the kernel tells it they're
> > invalid.
> > 
> > Note that when I say "separate" I'm talking about trampoline_pgd[]
> > which is also used by the x86 suspend/resume code.
> > 
> > However, turns out that the issue with the current scheme is the fact
> > that trampoline_pgd[] actually shares a couple of PGD entries with
> > swapper_pg_dir as can be seen in setup_real_mode(),
> > 
> > 
> >         trampoline_pgd = (u64 *)__va(real_mode_header->trampoline_pgd);
> >         trampoline_pgd[0] = init_level4_pgt[pgd_index(__PAGE_OFFSET)].pgd;
> >         trampoline_pgd[511] = init_level4_pgt[511].pgd;
> > 
> > 
> > So when we map the EFI regions in efi_map_regions() we're inserting
> > them into swapper_pg_dir also, which is why you're seeing the
> > warnings.
> > 
> > If I remember correctly the rationale for using trampoline_pgd[] was
> > that it already did what we wanted (provided the identity mapping) and
> > would save us the overhead of maintaining more page tables for no good
> > reason. Obviously this entire thread is a good reason.
> > 
> > I suggest we stop using trampoline_pgd[] (since it has a good reason
> > for sharing the kernel mapping PGD entries) and create our own so that
> > we can isolate EFI completely.
> 
> Ok. Could you please make this fix a priority for upcoming EFI changes?

Yep, I'll get on it.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [GIT PULL] x86/mm changes for v4.4
  2015-11-07  7:39                   ` Ard Biesheuvel
@ 2015-11-08  6:58                     ` Kees Cook
  2015-11-08  7:55                       ` Ard Biesheuvel
  0 siblings, 1 reply; 13+ messages in thread
From: Kees Cook @ 2015-11-08  6:58 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Ingo Molnar, Matt Fleming, Linus Torvalds, Stephen Smalley,
	Dave Jones, Linux Kernel Mailing List, Thomas Gleixner,
	H. Peter Anvin, Borislav Petkov, Andrew Morton, Andy Lutomirski,
	Denys Vlasenko, linux-efi@vger.kernel.org, Matthew Garrett

On Fri, Nov 6, 2015 at 11:39 PM, Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
> On 7 November 2015 at 08:09, Ingo Molnar <mingo@kernel.org> wrote:
>>
>> * Matt Fleming <matt@codeblueprint.co.uk> wrote:
>>
>>> On Fri, 06 Nov, at 07:55:50AM, Ingo Molnar wrote:
>>> >
>>> >  3) We should fix the EFI permission problem without relying on the firmware: it
>>> >     appears we could just mark everything R-X optimistically, and if a write fault
>>> >     happens (it's pretty rare in fact, only triggers when we write to an EFI
>>> >     variable and so), we can mark the faulting page RW- on the fly, because it
>>> >     appears that writable EFI sections, while not enumerated very well in 'old'
>>> >     firmware, are still supposed to be page granular. (Even 'new' firmware I
>>> >     wouldn't automatically trust to get the enumeration right...)
>>>
>>> Sorry, this isn't true. I misled you with one of my earlier posts on
>>> this topic. Let me try and clear things up...
>>>
>>> Writing to EFI regions has to do with every invocation of the EFI
>>> runtime services - it's not limited to when you read/write/delete EFI
>>> variables. In fact, EFI variables really have nothing to do with this
>>> discussion, they're a completely opaque concept to the OS, we have no
>>> idea how the firmware implements them. Everything is done via the EFI
>>> boot/runtime services.
>>>
>>> The firmware itself will attempt to write to EFI regions when we
>>> invoke the EFI services because that's where the PE/COFF ".data" and
>>> ".bss" sections live along with the heap. There's even some relocation
>>> fixups that occur as SetVirtualAddressMap() time so it'll write to
>>> ".text" too.
>>>
>>> Now, the above PE/COFF sections are usually (always?) contained within
>>> EFI regions of type EfiRuntimeServicesCode. We know this is true
>>> because the firmware folks have told us so, and because stopping that
>>> is the motivation behind the new EFI_PROPERTIES_TABLE feature in UEFI
>>> V2.5.
>>>
>>> The data sections within the region are also *not* guaranteed to be
>>> page granular because work was required in Tianocore for emitting
>>> sections with 4k alignment as part of the EFI_PROPERTIES_TABLE
>>> support.
>>>
>>> Ultimately, what this means is that if you were to attempt to
>>> dynamically fixup those regions that required write permission, you'd
>>> have to modify the mappings for the majority of the EFI regions
>>> anyway. And if you're blindly allowing write permission as a fixup,
>>> there's not much security to be had.
>>
>> I think you misunderstood my suggestion: the 'fixup' would be changing it from R-X
>> to RW-, i.e. it would add 'write' permission but remove 'execute' permission.
>>
>> Note that there would be no 'RWX' permission at any given moment - which is the
>> dangerous combination.
>>
>
> The problem with that is that /any/ page in the UEFI runtime region
> may intersect with both .text and .data of any of the PE/COFF images
> that make up the runtime firmware (since the PE/COFF sections are not
> necessarily page aligned). Such pages require RWX permissions. The
> UEFI memory map does not provide the information to identify those
> pages a priori (the entire region containing several PE/COFF images
> could be covered by a single entry) so it is hard to guess which pages
> should be allowed these RWX permissions.

I'm sad that UEFI was designed without even the most basic of memory
protections in mind. UEFI _itself_ should be setting up protective
page mappings. :(

For a boot firmware, it seems to me that safe page table layout would
be a top priority bug. The "reporting issues" page for TianoCore
doesn't actually seem to link to the "Project Tracker":
https://github.com/tianocore/tianocore.github.io/wiki/Reporting-Issues

Does anyone know how to get this correctly reported so future UEFI
releases don't suffer from this?

-Kees

>
>>> >     If that 'supposed to be' turns out to be 'not true' (not unheard of in
>>> >     firmware land), then plan B would be to mark pages that generate write faults
>>> >     RWX as well, to not break functionality. (This 'mark it RWX' is not something
>>> >     that exploits would have easy access to, and we could also generate a warning
>>> >     [after the EFI call has finished] if it ever triggers.)
>>> >
>>> >     Admittedly this approach might not be without its own complications, but it
>>> >     looks reasonably simple (I don't think we need per EFI call page tables,
>>> >     etc.), and does not assume much about the firmware being able to enumerate its
>>> >     permissions properly. Were we to merge EFI support today I'd have insisted on
>>> >     trying such an approach from day 1 on.
>>>
>>> We already have separate EFI page tables, though with the caveat that
>>> we share some of swapper_pg_dir's PGD entries. The best solution would
>>> be to stop sharing entries and isolate the EFI mappings from every
>>> other page table structure, so that they're only used during the EFI
>>> service calls.
>>
>> Absolutely. Can you try to fix this for v4.3?
>>
>> Thanks,
>>
>>         Ingo



-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [GIT PULL] x86/mm changes for v4.4
  2015-11-08  6:58                     ` Kees Cook
@ 2015-11-08  7:55                       ` Ard Biesheuvel
  2015-11-09 21:08                         ` Kees Cook
  0 siblings, 1 reply; 13+ messages in thread
From: Ard Biesheuvel @ 2015-11-08  7:55 UTC (permalink / raw)
  To: Kees Cook
  Cc: Ingo Molnar, Matt Fleming, Linus Torvalds, Stephen Smalley,
	Dave Jones, Linux Kernel Mailing List, Thomas Gleixner,
	H. Peter Anvin, Borislav Petkov, Andrew Morton, Andy Lutomirski,
	Denys Vlasenko, linux-efi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Matthew Garrett

On 8 November 2015 at 07:58, Kees Cook <keescook-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org> wrote:
> On Fri, Nov 6, 2015 at 11:39 PM, Ard Biesheuvel
> <ard.biesheuvel-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
>> On 7 November 2015 at 08:09, Ingo Molnar <mingo-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
>>>
>>> * Matt Fleming <matt-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org> wrote:
>>>
>>>> On Fri, 06 Nov, at 07:55:50AM, Ingo Molnar wrote:
>>>> >
>>>> >  3) We should fix the EFI permission problem without relying on the firmware: it
>>>> >     appears we could just mark everything R-X optimistically, and if a write fault
>>>> >     happens (it's pretty rare in fact, only triggers when we write to an EFI
>>>> >     variable and so), we can mark the faulting page RW- on the fly, because it
>>>> >     appears that writable EFI sections, while not enumerated very well in 'old'
>>>> >     firmware, are still supposed to be page granular. (Even 'new' firmware I
>>>> >     wouldn't automatically trust to get the enumeration right...)
>>>>
>>>> Sorry, this isn't true. I misled you with one of my earlier posts on
>>>> this topic. Let me try and clear things up...
>>>>
>>>> Writing to EFI regions has to do with every invocation of the EFI
>>>> runtime services - it's not limited to when you read/write/delete EFI
>>>> variables. In fact, EFI variables really have nothing to do with this
>>>> discussion, they're a completely opaque concept to the OS, we have no
>>>> idea how the firmware implements them. Everything is done via the EFI
>>>> boot/runtime services.
>>>>
>>>> The firmware itself will attempt to write to EFI regions when we
>>>> invoke the EFI services because that's where the PE/COFF ".data" and
>>>> ".bss" sections live along with the heap. There's even some relocation
>>>> fixups that occur as SetVirtualAddressMap() time so it'll write to
>>>> ".text" too.
>>>>
>>>> Now, the above PE/COFF sections are usually (always?) contained within
>>>> EFI regions of type EfiRuntimeServicesCode. We know this is true
>>>> because the firmware folks have told us so, and because stopping that
>>>> is the motivation behind the new EFI_PROPERTIES_TABLE feature in UEFI
>>>> V2.5.
>>>>
>>>> The data sections within the region are also *not* guaranteed to be
>>>> page granular because work was required in Tianocore for emitting
>>>> sections with 4k alignment as part of the EFI_PROPERTIES_TABLE
>>>> support.
>>>>
>>>> Ultimately, what this means is that if you were to attempt to
>>>> dynamically fixup those regions that required write permission, you'd
>>>> have to modify the mappings for the majority of the EFI regions
>>>> anyway. And if you're blindly allowing write permission as a fixup,
>>>> there's not much security to be had.
>>>
>>> I think you misunderstood my suggestion: the 'fixup' would be changing it from R-X
>>> to RW-, i.e. it would add 'write' permission but remove 'execute' permission.
>>>
>>> Note that there would be no 'RWX' permission at any given moment - which is the
>>> dangerous combination.
>>>
>>
>> The problem with that is that /any/ page in the UEFI runtime region
>> may intersect with both .text and .data of any of the PE/COFF images
>> that make up the runtime firmware (since the PE/COFF sections are not
>> necessarily page aligned). Such pages require RWX permissions. The
>> UEFI memory map does not provide the information to identify those
>> pages a priori (the entire region containing several PE/COFF images
>> could be covered by a single entry) so it is hard to guess which pages
>> should be allowed these RWX permissions.
>
> I'm sad that UEFI was designed without even the most basic of memory
> protections in mind. UEFI _itself_ should be setting up protective
> page mappings. :(
>

Well, the 4 KB alignment of sections was considered prohibitive at the
time from code size pov. But this was a long time ago, obviously.

> For a boot firmware, it seems to me that safe page table layout would
> be a top priority bug. The "reporting issues" page for TianoCore
> doesn't actually seem to link to the "Project Tracker":
> https://github.com/tianocore/tianocore.github.io/wiki/Reporting-Issues
>
> Does anyone know how to get this correctly reported so future UEFI
> releases don't suffer from this?
>

Ugh. Don't get me started on that topic. I have been working with the
UEFI forum since July to get a fundamentally broken implementation of
memory protections fixed. UEFI v2.5 defines a memory protection scheme
that is based on splitting PE/COFF images into separate memory regions
so that R-X and RW- permissions can be applied. Unfortunately, that
broke every OS in existence (including Windows 8), since the OS is
allowed to reorder memory regions when it lays out the virtual
remapping of the UEFI regions, resulting in PE/COFF .data and .text
potentially appearing out of order.

The good news is that we fixed it for the upcoming release (v2.6). I
can't disclose any specifics, though :-(

-- 
Ard.


>>>> >     If that 'supposed to be' turns out to be 'not true' (not unheard of in
>>>> >     firmware land), then plan B would be to mark pages that generate write faults
>>>> >     RWX as well, to not break functionality. (This 'mark it RWX' is not something
>>>> >     that exploits would have easy access to, and we could also generate a warning
>>>> >     [after the EFI call has finished] if it ever triggers.)
>>>> >
>>>> >     Admittedly this approach might not be without its own complications, but it
>>>> >     looks reasonably simple (I don't think we need per EFI call page tables,
>>>> >     etc.), and does not assume much about the firmware being able to enumerate its
>>>> >     permissions properly. Were we to merge EFI support today I'd have insisted on
>>>> >     trying such an approach from day 1 on.
>>>>
>>>> We already have separate EFI page tables, though with the caveat that
>>>> we share some of swapper_pg_dir's PGD entries. The best solution would
>>>> be to stop sharing entries and isolate the EFI mappings from every
>>>> other page table structure, so that they're only used during the EFI
>>>> service calls.
>>>
>>> Absolutely. Can you try to fix this for v4.3?
>>>
>>> Thanks,
>>>
>>>         Ingo
>
>
>
> --
> Kees Cook
> Chrome OS Security

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [GIT PULL] x86/mm changes for v4.4
  2015-11-08  7:55                       ` Ard Biesheuvel
@ 2015-11-09 21:08                         ` Kees Cook
  2015-11-10  7:08                           ` Ard Biesheuvel
  0 siblings, 1 reply; 13+ messages in thread
From: Kees Cook @ 2015-11-09 21:08 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Ingo Molnar, Matt Fleming, Linus Torvalds, Stephen Smalley,
	Dave Jones, Linux Kernel Mailing List, Thomas Gleixner,
	H. Peter Anvin, Borislav Petkov, Andrew Morton, Andy Lutomirski,
	Denys Vlasenko, linux-efi@vger.kernel.org, Matthew Garrett

On Sat, Nov 7, 2015 at 11:55 PM, Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
> On 8 November 2015 at 07:58, Kees Cook <keescook@chromium.org> wrote:
>> On Fri, Nov 6, 2015 at 11:39 PM, Ard Biesheuvel
>> <ard.biesheuvel@linaro.org> wrote:
>>> On 7 November 2015 at 08:09, Ingo Molnar <mingo@kernel.org> wrote:
>>>>
>>>> * Matt Fleming <matt@codeblueprint.co.uk> wrote:
>>>>
>>>>> On Fri, 06 Nov, at 07:55:50AM, Ingo Molnar wrote:
>>>>> >
>>>>> >  3) We should fix the EFI permission problem without relying on the firmware: it
>>>>> >     appears we could just mark everything R-X optimistically, and if a write fault
>>>>> >     happens (it's pretty rare in fact, only triggers when we write to an EFI
>>>>> >     variable and so), we can mark the faulting page RW- on the fly, because it
>>>>> >     appears that writable EFI sections, while not enumerated very well in 'old'
>>>>> >     firmware, are still supposed to be page granular. (Even 'new' firmware I
>>>>> >     wouldn't automatically trust to get the enumeration right...)
>>>>>
>>>>> Sorry, this isn't true. I misled you with one of my earlier posts on
>>>>> this topic. Let me try and clear things up...
>>>>>
>>>>> Writing to EFI regions has to do with every invocation of the EFI
>>>>> runtime services - it's not limited to when you read/write/delete EFI
>>>>> variables. In fact, EFI variables really have nothing to do with this
>>>>> discussion, they're a completely opaque concept to the OS, we have no
>>>>> idea how the firmware implements them. Everything is done via the EFI
>>>>> boot/runtime services.
>>>>>
>>>>> The firmware itself will attempt to write to EFI regions when we
>>>>> invoke the EFI services because that's where the PE/COFF ".data" and
>>>>> ".bss" sections live along with the heap. There's even some relocation
>>>>> fixups that occur as SetVirtualAddressMap() time so it'll write to
>>>>> ".text" too.
>>>>>
>>>>> Now, the above PE/COFF sections are usually (always?) contained within
>>>>> EFI regions of type EfiRuntimeServicesCode. We know this is true
>>>>> because the firmware folks have told us so, and because stopping that
>>>>> is the motivation behind the new EFI_PROPERTIES_TABLE feature in UEFI
>>>>> V2.5.
>>>>>
>>>>> The data sections within the region are also *not* guaranteed to be
>>>>> page granular because work was required in Tianocore for emitting
>>>>> sections with 4k alignment as part of the EFI_PROPERTIES_TABLE
>>>>> support.
>>>>>
>>>>> Ultimately, what this means is that if you were to attempt to
>>>>> dynamically fixup those regions that required write permission, you'd
>>>>> have to modify the mappings for the majority of the EFI regions
>>>>> anyway. And if you're blindly allowing write permission as a fixup,
>>>>> there's not much security to be had.
>>>>
>>>> I think you misunderstood my suggestion: the 'fixup' would be changing it from R-X
>>>> to RW-, i.e. it would add 'write' permission but remove 'execute' permission.
>>>>
>>>> Note that there would be no 'RWX' permission at any given moment - which is the
>>>> dangerous combination.
>>>>
>>>
>>> The problem with that is that /any/ page in the UEFI runtime region
>>> may intersect with both .text and .data of any of the PE/COFF images
>>> that make up the runtime firmware (since the PE/COFF sections are not
>>> necessarily page aligned). Such pages require RWX permissions. The
>>> UEFI memory map does not provide the information to identify those
>>> pages a priori (the entire region containing several PE/COFF images
>>> could be covered by a single entry) so it is hard to guess which pages
>>> should be allowed these RWX permissions.
>>
>> I'm sad that UEFI was designed without even the most basic of memory
>> protections in mind. UEFI _itself_ should be setting up protective
>> page mappings. :(
>>
>
> Well, the 4 KB alignment of sections was considered prohibitive at the
> time from code size pov. But this was a long time ago, obviously.

Heh, yeah, I'd expect max 4K padding to get code/data correctly
aligned on a 2MB binary to not be an issue. :)

>
>> For a boot firmware, it seems to me that safe page table layout would
>> be a top priority bug. The "reporting issues" page for TianoCore
>> doesn't actually seem to link to the "Project Tracker":
>> https://github.com/tianocore/tianocore.github.io/wiki/Reporting-Issues
>>
>> Does anyone know how to get this correctly reported so future UEFI
>> releases don't suffer from this?
>>
>
> Ugh. Don't get me started on that topic. I have been working with the
> UEFI forum since July to get a fundamentally broken implementation of
> memory protections fixed. UEFI v2.5 defines a memory protection scheme
> that is based on splitting PE/COFF images into separate memory regions
> so that R-X and RW- permissions can be applied. Unfortunately, that
> broke every OS in existence (including Windows 8), since the OS is
> allowed to reorder memory regions when it lays out the virtual
> remapping of the UEFI regions, resulting in PE/COFF .data and .text
> potentially appearing out of order.
>
> The good news is that we fixed it for the upcoming release (v2.6). I
> can't disclose any specifics, though :-(

As long as there's motion to getting it fixed, that makes me happy! :)
Does 2.6 get rid of the (AIUI) 2MB limit too?

-Kees

>
> --
> Ard.
>
>
>>>>> >     If that 'supposed to be' turns out to be 'not true' (not unheard of in
>>>>> >     firmware land), then plan B would be to mark pages that generate write faults
>>>>> >     RWX as well, to not break functionality. (This 'mark it RWX' is not something
>>>>> >     that exploits would have easy access to, and we could also generate a warning
>>>>> >     [after the EFI call has finished] if it ever triggers.)
>>>>> >
>>>>> >     Admittedly this approach might not be without its own complications, but it
>>>>> >     looks reasonably simple (I don't think we need per EFI call page tables,
>>>>> >     etc.), and does not assume much about the firmware being able to enumerate its
>>>>> >     permissions properly. Were we to merge EFI support today I'd have insisted on
>>>>> >     trying such an approach from day 1 on.
>>>>>
>>>>> We already have separate EFI page tables, though with the caveat that
>>>>> we share some of swapper_pg_dir's PGD entries. The best solution would
>>>>> be to stop sharing entries and isolate the EFI mappings from every
>>>>> other page table structure, so that they're only used during the EFI
>>>>> service calls.
>>>>
>>>> Absolutely. Can you try to fix this for v4.3?
>>>>
>>>> Thanks,
>>>>
>>>>         Ingo
>>
>>
>>
>> --
>> Kees Cook
>> Chrome OS Security



-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [GIT PULL] x86/mm changes for v4.4
  2015-11-09 21:08                         ` Kees Cook
@ 2015-11-10  7:08                           ` Ard Biesheuvel
       [not found]                             ` <CAKv+Gu9ct9Rwi+_-0KtLq3Gw2Rn+QLhSVt_zbn4zBxfk_Qs16g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 13+ messages in thread
From: Ard Biesheuvel @ 2015-11-10  7:08 UTC (permalink / raw)
  To: Kees Cook
  Cc: Ingo Molnar, Matt Fleming, Linus Torvalds, Stephen Smalley,
	Dave Jones, Linux Kernel Mailing List, Thomas Gleixner,
	H. Peter Anvin, Borislav Petkov, Andrew Morton, Andy Lutomirski,
	Denys Vlasenko, linux-efi@vger.kernel.org, Matthew Garrett

On 9 November 2015 at 22:08, Kees Cook <keescook@chromium.org> wrote:
> On Sat, Nov 7, 2015 at 11:55 PM, Ard Biesheuvel
> <ard.biesheuvel@linaro.org> wrote:
>> On 8 November 2015 at 07:58, Kees Cook <keescook@chromium.org> wrote:
>>> On Fri, Nov 6, 2015 at 11:39 PM, Ard Biesheuvel
>>> <ard.biesheuvel@linaro.org> wrote:
>>>> On 7 November 2015 at 08:09, Ingo Molnar <mingo@kernel.org> wrote:
>>>>>
>>>>> * Matt Fleming <matt@codeblueprint.co.uk> wrote:
>>>>>
>>>>>> On Fri, 06 Nov, at 07:55:50AM, Ingo Molnar wrote:
>>>>>> >
>>>>>> >  3) We should fix the EFI permission problem without relying on the firmware: it
>>>>>> >     appears we could just mark everything R-X optimistically, and if a write fault
>>>>>> >     happens (it's pretty rare in fact, only triggers when we write to an EFI
>>>>>> >     variable and so), we can mark the faulting page RW- on the fly, because it
>>>>>> >     appears that writable EFI sections, while not enumerated very well in 'old'
>>>>>> >     firmware, are still supposed to be page granular. (Even 'new' firmware I
>>>>>> >     wouldn't automatically trust to get the enumeration right...)
>>>>>>
>>>>>> Sorry, this isn't true. I misled you with one of my earlier posts on
>>>>>> this topic. Let me try and clear things up...
>>>>>>
>>>>>> Writing to EFI regions has to do with every invocation of the EFI
>>>>>> runtime services - it's not limited to when you read/write/delete EFI
>>>>>> variables. In fact, EFI variables really have nothing to do with this
>>>>>> discussion, they're a completely opaque concept to the OS, we have no
>>>>>> idea how the firmware implements them. Everything is done via the EFI
>>>>>> boot/runtime services.
>>>>>>
>>>>>> The firmware itself will attempt to write to EFI regions when we
>>>>>> invoke the EFI services because that's where the PE/COFF ".data" and
>>>>>> ".bss" sections live along with the heap. There's even some relocation
>>>>>> fixups that occur as SetVirtualAddressMap() time so it'll write to
>>>>>> ".text" too.
>>>>>>
>>>>>> Now, the above PE/COFF sections are usually (always?) contained within
>>>>>> EFI regions of type EfiRuntimeServicesCode. We know this is true
>>>>>> because the firmware folks have told us so, and because stopping that
>>>>>> is the motivation behind the new EFI_PROPERTIES_TABLE feature in UEFI
>>>>>> V2.5.
>>>>>>
>>>>>> The data sections within the region are also *not* guaranteed to be
>>>>>> page granular because work was required in Tianocore for emitting
>>>>>> sections with 4k alignment as part of the EFI_PROPERTIES_TABLE
>>>>>> support.
>>>>>>
>>>>>> Ultimately, what this means is that if you were to attempt to
>>>>>> dynamically fixup those regions that required write permission, you'd
>>>>>> have to modify the mappings for the majority of the EFI regions
>>>>>> anyway. And if you're blindly allowing write permission as a fixup,
>>>>>> there's not much security to be had.
>>>>>
>>>>> I think you misunderstood my suggestion: the 'fixup' would be changing it from R-X
>>>>> to RW-, i.e. it would add 'write' permission but remove 'execute' permission.
>>>>>
>>>>> Note that there would be no 'RWX' permission at any given moment - which is the
>>>>> dangerous combination.
>>>>>
>>>>
>>>> The problem with that is that /any/ page in the UEFI runtime region
>>>> may intersect with both .text and .data of any of the PE/COFF images
>>>> that make up the runtime firmware (since the PE/COFF sections are not
>>>> necessarily page aligned). Such pages require RWX permissions. The
>>>> UEFI memory map does not provide the information to identify those
>>>> pages a priori (the entire region containing several PE/COFF images
>>>> could be covered by a single entry) so it is hard to guess which pages
>>>> should be allowed these RWX permissions.
>>>
>>> I'm sad that UEFI was designed without even the most basic of memory
>>> protections in mind. UEFI _itself_ should be setting up protective
>>> page mappings. :(
>>>
>>
>> Well, the 4 KB alignment of sections was considered prohibitive at the
>> time from code size pov. But this was a long time ago, obviously.
>
> Heh, yeah, I'd expect max 4K padding to get code/data correctly
> aligned on a 2MB binary to not be an issue. :)
>

This is not about section sizes on ARM. The PE/COFF format does not
use segments, like ELF, so the payload (the sections) needs to be
completely disjoint from the header. This means, when using 4 KB
alignment, that every PE/COFF image wastes ~4 KB in the header and 4
KB on average in the section padding (assuming a .text/.data/.reloc
layout, as is common with PE/COFF)

Considering that a typical UEFI firmware image consists of numerous
(around 50 on average, I think) PE/COFF images, and some of them
execute from NOR flash, the Tianocore tooling (which is the reference
implementation) has always been geared towards keeping the alignment
as small as possible, typically 32 bytes unless data objects need
more. Since the UEFI runtime services are typically implemented by
several of these PE/COFF images, and since the memory they occupy may
be described by a single UEFI memory map entry, there is simply no
easy way to decide which pages need R-X, RW- or RWX. Even looking for
PE/COFF headers in the memory region is not guaranteed to work, since
the PE/COFF header is part of the file format, not the memory format
(i.e., since the header is disjoint from the payload, a PE/COFF loader
is not required to copy the header to memory)

>>
>>> For a boot firmware, it seems to me that safe page table layout would
>>> be a top priority bug. The "reporting issues" page for TianoCore
>>> doesn't actually seem to link to the "Project Tracker":
>>> https://github.com/tianocore/tianocore.github.io/wiki/Reporting-Issues
>>>
>>> Does anyone know how to get this correctly reported so future UEFI
>>> releases don't suffer from this?
>>>
>>
>> Ugh. Don't get me started on that topic. I have been working with the
>> UEFI forum since July to get a fundamentally broken implementation of
>> memory protections fixed. UEFI v2.5 defines a memory protection scheme
>> that is based on splitting PE/COFF images into separate memory regions
>> so that R-X and RW- permissions can be applied. Unfortunately, that
>> broke every OS in existence (including Windows 8), since the OS is
>> allowed to reorder memory regions when it lays out the virtual
>> remapping of the UEFI regions, resulting in PE/COFF .data and .text
>> potentially appearing out of order.
>>
>> The good news is that we fixed it for the upcoming release (v2.6). I
>> can't disclose any specifics, though :-(
>
> As long as there's motion to getting it fixed, that makes me happy! :)
> Does 2.6 get rid of the (AIUI) 2MB limit too?
>

No, there is no such limit in UEFI. If there is a limit like that, it
is an implementation detail of the UEFI support in the OS.

For arm64 (and the upcoming ARM support), the UEFI runtime services
regions are remapped into a virtual userland range that is only active
during the time runtime services are being invoked. (x86 does
something similar, but it shares the page tables with the
suspend/resume code afaiu) These mappings could be page granularity
(since they don't require splitting PUDs or PMDs in the linear
region), with the side note that arm64 mandates 64 KB alignment (to
interoperate with 64 KB pages OSes). This requirement has been added
to the UEFI spec, i.e., a v2.5 compliant arm64 firmware should not
expose UEFI runtime regions that are not 64 KB aligned.

-- 
Ard.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [GIT PULL] x86/mm changes for v4.4
       [not found]                             ` <CAKv+Gu9ct9Rwi+_-0KtLq3Gw2Rn+QLhSVt_zbn4zBxfk_Qs16g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-11-10 20:11                               ` Kees Cook
  0 siblings, 0 replies; 13+ messages in thread
From: Kees Cook @ 2015-11-10 20:11 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Ingo Molnar, Matt Fleming, Linus Torvalds, Stephen Smalley,
	Dave Jones, Linux Kernel Mailing List, Thomas Gleixner,
	H. Peter Anvin, Borislav Petkov, Andrew Morton, Andy Lutomirski,
	Denys Vlasenko, linux-efi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	Matthew Garrett

On Mon, Nov 9, 2015 at 11:08 PM, Ard Biesheuvel
<ard.biesheuvel-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> On 9 November 2015 at 22:08, Kees Cook <keescook-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org> wrote:
>> On Sat, Nov 7, 2015 at 11:55 PM, Ard Biesheuvel
>> <ard.biesheuvel-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
>>> On 8 November 2015 at 07:58, Kees Cook <keescook-F7+t8E8rja9g9hUCZPvPmw@public.gmane.org> wrote:
>>>> On Fri, Nov 6, 2015 at 11:39 PM, Ard Biesheuvel
>>>> <ard.biesheuvel-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
>>>>> On 7 November 2015 at 08:09, Ingo Molnar <mingo-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
>>>>>>
>>>>>> * Matt Fleming <matt-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org> wrote:
>>>>>>
>>>>>>> On Fri, 06 Nov, at 07:55:50AM, Ingo Molnar wrote:
>>>>>>> >
>>>>>>> >  3) We should fix the EFI permission problem without relying on the firmware: it
>>>>>>> >     appears we could just mark everything R-X optimistically, and if a write fault
>>>>>>> >     happens (it's pretty rare in fact, only triggers when we write to an EFI
>>>>>>> >     variable and so), we can mark the faulting page RW- on the fly, because it
>>>>>>> >     appears that writable EFI sections, while not enumerated very well in 'old'
>>>>>>> >     firmware, are still supposed to be page granular. (Even 'new' firmware I
>>>>>>> >     wouldn't automatically trust to get the enumeration right...)
>>>>>>>
>>>>>>> Sorry, this isn't true. I misled you with one of my earlier posts on
>>>>>>> this topic. Let me try and clear things up...
>>>>>>>
>>>>>>> Writing to EFI regions has to do with every invocation of the EFI
>>>>>>> runtime services - it's not limited to when you read/write/delete EFI
>>>>>>> variables. In fact, EFI variables really have nothing to do with this
>>>>>>> discussion, they're a completely opaque concept to the OS, we have no
>>>>>>> idea how the firmware implements them. Everything is done via the EFI
>>>>>>> boot/runtime services.
>>>>>>>
>>>>>>> The firmware itself will attempt to write to EFI regions when we
>>>>>>> invoke the EFI services because that's where the PE/COFF ".data" and
>>>>>>> ".bss" sections live along with the heap. There's even some relocation
>>>>>>> fixups that occur as SetVirtualAddressMap() time so it'll write to
>>>>>>> ".text" too.
>>>>>>>
>>>>>>> Now, the above PE/COFF sections are usually (always?) contained within
>>>>>>> EFI regions of type EfiRuntimeServicesCode. We know this is true
>>>>>>> because the firmware folks have told us so, and because stopping that
>>>>>>> is the motivation behind the new EFI_PROPERTIES_TABLE feature in UEFI
>>>>>>> V2.5.
>>>>>>>
>>>>>>> The data sections within the region are also *not* guaranteed to be
>>>>>>> page granular because work was required in Tianocore for emitting
>>>>>>> sections with 4k alignment as part of the EFI_PROPERTIES_TABLE
>>>>>>> support.
>>>>>>>
>>>>>>> Ultimately, what this means is that if you were to attempt to
>>>>>>> dynamically fixup those regions that required write permission, you'd
>>>>>>> have to modify the mappings for the majority of the EFI regions
>>>>>>> anyway. And if you're blindly allowing write permission as a fixup,
>>>>>>> there's not much security to be had.
>>>>>>
>>>>>> I think you misunderstood my suggestion: the 'fixup' would be changing it from R-X
>>>>>> to RW-, i.e. it would add 'write' permission but remove 'execute' permission.
>>>>>>
>>>>>> Note that there would be no 'RWX' permission at any given moment - which is the
>>>>>> dangerous combination.
>>>>>>
>>>>>
>>>>> The problem with that is that /any/ page in the UEFI runtime region
>>>>> may intersect with both .text and .data of any of the PE/COFF images
>>>>> that make up the runtime firmware (since the PE/COFF sections are not
>>>>> necessarily page aligned). Such pages require RWX permissions. The
>>>>> UEFI memory map does not provide the information to identify those
>>>>> pages a priori (the entire region containing several PE/COFF images
>>>>> could be covered by a single entry) so it is hard to guess which pages
>>>>> should be allowed these RWX permissions.
>>>>
>>>> I'm sad that UEFI was designed without even the most basic of memory
>>>> protections in mind. UEFI _itself_ should be setting up protective
>>>> page mappings. :(
>>>>
>>>
>>> Well, the 4 KB alignment of sections was considered prohibitive at the
>>> time from code size pov. But this was a long time ago, obviously.
>>
>> Heh, yeah, I'd expect max 4K padding to get code/data correctly
>> aligned on a 2MB binary to not be an issue. :)
>>
>
> This is not about section sizes on ARM. The PE/COFF format does not
> use segments, like ELF, so the payload (the sections) needs to be
> completely disjoint from the header. This means, when using 4 KB
> alignment, that every PE/COFF image wastes ~4 KB in the header and 4
> KB on average in the section padding (assuming a .text/.data/.reloc
> layout, as is common with PE/COFF)
>
> Considering that a typical UEFI firmware image consists of numerous
> (around 50 on average, I think) PE/COFF images, and some of them

Oooh, that's no fun. So the linker can't produce merged .text and
.data sections?

> execute from NOR flash, the Tianocore tooling (which is the reference
> implementation) has always been geared towards keeping the alignment
> as small as possible, typically 32 bytes unless data objects need
> more. Since the UEFI runtime services are typically implemented by
> several of these PE/COFF images, and since the memory they occupy may
> be described by a single UEFI memory map entry, there is simply no
> easy way to decide which pages need R-X, RW- or RWX. Even looking for
> PE/COFF headers in the memory region is not guaranteed to work, since
> the PE/COFF header is part of the file format, not the memory format
> (i.e., since the header is disjoint from the payload, a PE/COFF loader
> is not required to copy the header to memory)
>
>>>
>>>> For a boot firmware, it seems to me that safe page table layout would
>>>> be a top priority bug. The "reporting issues" page for TianoCore
>>>> doesn't actually seem to link to the "Project Tracker":
>>>> https://github.com/tianocore/tianocore.github.io/wiki/Reporting-Issues
>>>>
>>>> Does anyone know how to get this correctly reported so future UEFI
>>>> releases don't suffer from this?
>>>>
>>>
>>> Ugh. Don't get me started on that topic. I have been working with the
>>> UEFI forum since July to get a fundamentally broken implementation of
>>> memory protections fixed. UEFI v2.5 defines a memory protection scheme
>>> that is based on splitting PE/COFF images into separate memory regions
>>> so that R-X and RW- permissions can be applied. Unfortunately, that
>>> broke every OS in existence (including Windows 8), since the OS is
>>> allowed to reorder memory regions when it lays out the virtual
>>> remapping of the UEFI regions, resulting in PE/COFF .data and .text
>>> potentially appearing out of order.
>>>
>>> The good news is that we fixed it for the upcoming release (v2.6). I
>>> can't disclose any specifics, though :-(
>>
>> As long as there's motion to getting it fixed, that makes me happy! :)
>> Does 2.6 get rid of the (AIUI) 2MB limit too?
>>
>
> No, there is no such limit in UEFI. If there is a limit like that, it
> is an implementation detail of the UEFI support in the OS.
>
> For arm64 (and the upcoming ARM support), the UEFI runtime services
> regions are remapped into a virtual userland range that is only active
> during the time runtime services are being invoked. (x86 does
> something similar, but it shares the page tables with the
> suspend/resume code afaiu) These mappings could be page granularity
> (since they don't require splitting PUDs or PMDs in the linear
> region), with the side note that arm64 mandates 64 KB alignment (to
> interoperate with 64 KB pages OSes). This requirement has been added
> to the UEFI spec, i.e., a v2.5 compliant arm64 firmware should not
> expose UEFI runtime regions that are not 64 KB aligned.

Cool, thanks for the details!

-Kees

-- 
Kees Cook
Chrome OS Security

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2015-11-10 20:11 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20151103111649.GA3477@gmail.com>
     [not found] ` <CA+55aFzcwO+RSLeHOwAYvjZ5AcVvD9Th2=G3R=ZQY1xf+MkDow@mail.gmail.com>
     [not found]   ` <20151104233907.GA25925@codemonkey.org.uk>
     [not found]     ` <CA+55aFxb14eM6b=ctq65Dx-Ujehj2dbtsVM9rrVOVfLgT=EoHg@mail.gmail.com>
     [not found]       ` <20151105021710.GA22941@codemonkey.org.uk>
     [not found]         ` <CA+55aFyXNFu_TfmBjGedCRujoAbhqiBcia7XOtzSq0uxbVv6MA@mail.gmail.com>
     [not found]           ` <CA+55aFyC7L85SYen3Uz6e1cvH0jXzQ9_MddHJ=7PxvpOR2U23w@mail.gmail.com>
2015-11-06 11:39             ` [GIT PULL] x86/mm changes for v4.4 Matt Fleming
     [not found]               ` <20151106113943.GB2651-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org>
2015-11-07  7:05                 ` Ingo Molnar
     [not found]                   ` <20151107070554.GB6235-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2015-11-07 10:03                     ` Matt Fleming
     [not found]           ` <20151106065549.GA2031@gmail.com>
2015-11-06 12:39             ` Matt Fleming
     [not found]               ` <20151106123912.GC2651-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org>
2015-11-07  7:09                 ` Ingo Molnar
2015-11-07  7:39                   ` Ard Biesheuvel
2015-11-08  6:58                     ` Kees Cook
2015-11-08  7:55                       ` Ard Biesheuvel
2015-11-09 21:08                         ` Kees Cook
2015-11-10  7:08                           ` Ard Biesheuvel
     [not found]                             ` <CAKv+Gu9ct9Rwi+_-0KtLq3Gw2Rn+QLhSVt_zbn4zBxfk_Qs16g-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-11-10 20:11                               ` Kees Cook
     [not found]             ` <CALCETrU2dn4TEj_2QiCPy4Mjw6hCbB84k1RnPzx7sLNygj4D5Q@mail.gmail.com>
     [not found]               ` <CALCETrU2dn4TEj_2QiCPy4Mjw6hCbB84k1RnPzx7sLNygj4D5Q-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-11-06 13:09                 ` Matt Fleming
     [not found]                   ` <20151106130948.GD2651-mF/unelCI9GS6iBeEJttW/XRex20P6io@public.gmane.org>
2015-11-06 13:24                     ` Borislav Petkov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox