* [PATCH] phys_efi_set_virtual_address_map needs va, no pa.
@ 2012-06-20  8:24 Robin Holt
  2012-06-20 12:07 ` Matthew Garrett
  0 siblings, 1 reply; 8+ messages in thread
From: Robin Holt @ 2012-06-20  8:24 UTC
  To: linux-kernel; +Cc: H. Peter Anvin, Matthew Garrett

The kernel allocated memmap may end up being beyond the first
512GB of memory. That early range is identity mapped, while the
remainder of memory is not. The net result is the memmap allocated by
efi_enter_virtual_mode will not be accessible via its __pa as is currently
passed back to EFI.

Since EFI is going to have to parse the passed in table, I believe the
EFI documentation is wrong.

I asked one of our BIOS engineers to look at the Intel reference code
and he said it was obvious that the address would have to be a virtually
accessible address as we are in virtual mode while EFI is handling the
callback.

Signed-off-by: Robin Holt <holt@sgi.com>
Cc: Matthew Garrett <mjg@redhat.com>
Cc: H. Peter Anvin <hpa@linux.intel.com>

---
 arch/x86/platform/efi/efi.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
index 92660ed..ea4317a 100644
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -869,7 +869,7 @@ void __init efi_enter_virtual_mode(void)
 		memmap.desc_size * count,
 		memmap.desc_size,
 		memmap.desc_version,
-		(efi_memory_desc_t *)__pa(new_memmap));
+		new_memmap);
 
 	if (status != EFI_SUCCESS) {
 		pr_alert("Unable to switch EFI into virtual mode "
-- 
1.7.0.4
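For readers wondering where a 512GB boundary comes from: on x86-64 with 4-level paging, one top-level (PGD) entry spans 2^39 bytes, which is 512GB. The sketch below is a userspace model, not kernel code. Tying the identity-mapped range in the description above to a single top-level slot is an assumption, and `pgd_slot` and `identity_mapped` are hypothetical helper names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* x86-64 4-level paging: address bits 47..39 select the top-level slot. */
#define PGDIR_SHIFT  39
#define PTRS_PER_PGD 512

static_assert(PTRS_PER_PGD == (1 << 9), "nine index bits per paging level");

/* Top-level (PGD) slot selected by an address. */
static unsigned int pgd_slot(uint64_t addr)
{
	return (unsigned int)((addr >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1));
}

/*
 * Assumption for illustration: only the first top-level slot carries an
 * identity mapping, so a physical address is usable as a pointer only
 * when it lies below 512GB.
 */
static bool identity_mapped(uint64_t phys)
{
	return (phys >> PGDIR_SHIFT) == 0;
}
```

Under this model a memmap allocated at or above physical address 2^39 fails `identity_mapped()`, which matches the symptom the patch description reports for `__pa(new_memmap)`.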
* Re: [PATCH] phys_efi_set_virtual_address_map needs va, no pa.
  2012-06-20  8:24 [PATCH] phys_efi_set_virtual_address_map needs va, no pa Robin Holt
@ 2012-06-20 12:07 ` Matthew Garrett
  2012-06-20 20:41   ` H. Peter Anvin
  0 siblings, 1 reply; 8+ messages in thread
From: Matthew Garrett @ 2012-06-20 12:07 UTC
  To: Robin Holt; +Cc: linux-kernel, H. Peter Anvin

On Wed, Jun 20, 2012 at 03:24:57AM -0500, Robin Holt wrote:
> The kernel allocated memmap may end up being beyond the first
> 512GB of memory. That early range is identity mapped, while the
> remainder of memory is not. The net result is the memmap allocated by
> efi_enter_virtual_mode will not be accessible via its __pa as is currently
> passed back to EFI.
>
> Since EFI is going to have to parse the passed in table, I believe the
> EFI documentation is wrong.
>
> I asked one of our BIOS engineers to look at the Intel reference code
> and he said it was obvious that the address would have to be a virtually
> accessible address as we are in virtual mode while EFI is handling the
> callback.

No, that's completely wrong. UEFI can't be called in virtual mode until
*after* SetVirtualAddressMap(). The UEFI spec indicates that all
physical memory must have an identity mapping at this stage (section
2.3.4), so if we don't then that's a bug that needs to be fixed.

-- 
Matthew Garrett | mjg59@srcf.ucam.org
* Re: [PATCH] phys_efi_set_virtual_address_map needs va, no pa.
  2012-06-20 12:07 ` Matthew Garrett
@ 2012-06-20 20:41   ` H. Peter Anvin
  2012-06-21  0:27     ` Robin Holt
  0 siblings, 1 reply; 8+ messages in thread
From: H. Peter Anvin @ 2012-06-20 20:41 UTC
  To: Matthew Garrett; +Cc: Robin Holt, linux-kernel, Sakkinen, Jarkko

On 06/20/2012 05:07 AM, Matthew Garrett wrote:
>
> No, that's completely wrong. UEFI can't be called in virtual mode until
> *after* SetVirtualAddressMap(). The UEFI spec indicates that all
> physical memory must have an identity mapping at this stage (section
> 2.3.4), so if we don't then that's a bug that needs to be fixed.
>

I think it is a bug, and with the trampoline work in 3.4 we should
finally have a proper platform to fix it.

In particular, we should keep a full 1:1 page map around, and it should
be the one that is in the trampoline (real_mode_header->trampoline_pgd)
as we need the page directory to be 32-bit addressable.

The right thing to do is to sync the pgds in the 1:1 area, both for 64
bit and for legacy 32 bit (PAE 32 bit doesn't need it, since all the
kernel maps are shared.) This is currently done ad hoc (and
differently!) on both 32 and 64 bits and that really should be fixed.

Once that is properly fixed, we have a usable identity mapping.

On that subject, I have been thinking about the kexec use case. I'm
thinking that if we indeed cannot use either physical mode or a
zero-offset virtual mode, that the most likely sane thing to do is to
use a fixed offset of 2^46 and still use a (pseudo-)1:1 map.

Do we have any data at all on machines that supposedly can't use
identity-mapped EFI?

	-hpa
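The fixed-offset idea floated for the kexec case can be sketched as plain address arithmetic. This is only an illustration of a (pseudo-)1:1 map at offset 2^46: `EFI_VA_OFFSET`, `efi_phys_to_virt` and `efi_virt_to_phys` are hypothetical names, and the bound check assumes 48-bit virtual addresses, where the lower canonical half ends at 2^47.

```c
#include <assert.h>
#include <stdint.h>

/* Fixed offset of 2^46, as suggested in the message (name is hypothetical). */
#define EFI_VA_OFFSET (1ULL << 46)

/* Map a physical address into the offset window. */
static uint64_t efi_phys_to_virt(uint64_t phys)
{
	/* phys + 2^46 stays below 2^47 only while phys is below 2^46. */
	assert(phys < EFI_VA_OFFSET);
	return phys + EFI_VA_OFFSET;
}

/* Inverse of efi_phys_to_virt(). */
static uint64_t efi_virt_to_phys(uint64_t virt)
{
	assert(virt >= EFI_VA_OFFSET);
	return virt - EFI_VA_OFFSET;
}
```

Because the offset is a constant, a kexec'd kernel could in principle recompute the same virtual addresses without any state handed over from the previous kernel, which is what makes the scheme attractive when neither physical mode nor a zero-offset map is available.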
* Re: [PATCH] phys_efi_set_virtual_address_map needs va, no pa.
  2012-06-20 20:41   ` H. Peter Anvin
@ 2012-06-21  0:27     ` Robin Holt
  2012-06-21  0:46       ` H. Peter Anvin
  0 siblings, 1 reply; 8+ messages in thread
From: Robin Holt @ 2012-06-21  0:27 UTC
  To: H. Peter Anvin
  Cc: Matthew Garrett, Robin Holt, linux-kernel, Sakkinen, Jarkko

On Wed, Jun 20, 2012 at 01:41:54PM -0700, H. Peter Anvin wrote:
> On 06/20/2012 05:07 AM, Matthew Garrett wrote:
> >
> > No, that's completely wrong. UEFI can't be called in virtual mode until
> > *after* SetVirtualAddressMap(). The UEFI spec indicates that all
> > physical memory must have an identity mapping at this stage (section
> > 2.3.4), so if we don't then that's a bug that needs to be fixed.
> >
>
> I think it is a bug, and with the trampoline work in 3.4 we should
> finally have a proper platform to fix it.
>
> In particular, we should keep a full 1:1 page map around, and it should
> be the one that is in the trampoline (real_mode_header->trampoline_pgd)
> as we need the page directory to be 32-bit addressable.
>
> The right thing to do is to sync the pgds in the 1:1 area, both for 64
> bit and for legacy 32 bit (PAE 32 bit doesn't need it, since all the
> kernel maps are shared.) This is currently done ad hoc (and
> differently!) on both 32 and 64 bits and that really should be fixed.

What do you need from me? If you want me to help with this, I have a
_WHOLE_ lot of learning to do. Can you give me any pointers?

We are trying to get this finally fixed. We have had work-around code
in SLES11 SP1, SLES11 SP2, and RHEL 6.x. I would love to get this fixed
for future distro snaps.

Thanks,
Robin
* Re: [PATCH] phys_efi_set_virtual_address_map needs va, no pa.
  2012-06-21  0:27     ` Robin Holt
@ 2012-06-21  0:46       ` H. Peter Anvin
  2012-06-21 16:52         ` Robin Holt
  2012-06-21 19:16         ` Konrad Rzeszutek Wilk
  0 siblings, 2 replies; 8+ messages in thread
From: H. Peter Anvin @ 2012-06-21  0:46 UTC
  To: Robin Holt
  Cc: Matthew Garrett, linux-kernel, Sakkinen, Jarkko, Konrad Rzeszutek Wilk

On 06/20/2012 05:27 PM, Robin Holt wrote:
>
> What do you need from me? If you want me to help with this, I have a
> _WHOLE_ lot of learning to do. Can you give me any pointers?
>
> We are trying to get this finally fixed. We have had work-around code
> in SLES11 SP1, SLES11 SP2, and RHEL 6.x. I would love to get this fixed
> for future distro snaps.
>

If you want to tackle it, the task is basically that when we modify the
pgds in 32-bit legacy (non-PAE) mode, we should make the corresponding
modifications to initial_page_table, and in 64-bit mode to
real_mode_header->trampoline_pgd. It might be worthwhile to introduce a
common pointer for both, obviously.

This is currently handled via something called the pgd_list (when we
update the top level kernel address space we walk pgd_list and update
them all), but there are two issues:

1. Obviously, in the case of the 1:1 map, we don't just need to maintain
the kernel area, but the "user space" part of the address space should
contain a copy, as well.

2. To complicate things, there is code in there to grab an mm lock for
the benefit of Xen. The 1:1 map doesn't have an mm associated with it,
so I'm not quite sure how that is to be handled. Perhaps Xen just plain
won't need it and we can just bypass it, but I have no bloody idea.

It is also a bit "cute" how we seem to make a function call to indirect
through a pointer (why on Earth is pgd_page_get_mm() not an inline?!),
and then grab a lock unconditionally, regardless of if we are affected
by Xen or not.

	-hpa
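The pgd_list walk described above can be modelled in a few lines of userspace C. This is a simplified sketch, not the kernel's implementation: `struct pgd_table`, its `next` link and `sync_pgd_entry` are hypothetical stand-ins, and the locking and mm lookup discussed in point 2 are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PTRS_PER_PGD 512

/* Hypothetical stand-in for one top-level page table on the pgd_list. */
struct pgd_table {
	uint64_t entry[PTRS_PER_PGD];
	struct pgd_table *next;
};

/*
 * Propagate one top-level entry from the reference table to every table
 * reachable from `list`. Returns the number of tables updated.
 */
static size_t sync_pgd_entry(const struct pgd_table *ref,
			     struct pgd_table *list, unsigned int slot)
{
	size_t updated = 0;

	assert(slot < PTRS_PER_PGD);
	for (struct pgd_table *p = list; p != NULL; p = p->next) {
		p->entry[slot] = ref->entry[slot];
		updated++;
	}
	return updated;
}
```

A table that is not linked into the list is never visited by the loop, so its entries stay stale after the walk. In this model that is the position the initial and trampoline page tables are in.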
* Re: [PATCH] phys_efi_set_virtual_address_map needs va, no pa.
  2012-06-21  0:46       ` H. Peter Anvin
@ 2012-06-21 16:52         ` Robin Holt
  2012-06-22  0:35           ` H. Peter Anvin
  2012-06-21 19:16         ` Konrad Rzeszutek Wilk
  1 sibling, 1 reply; 8+ messages in thread
From: Robin Holt @ 2012-06-21 16:52 UTC
  To: H. Peter Anvin
  Cc: Robin Holt, Matthew Garrett, linux-kernel, Sakkinen, Jarkko,
	Konrad Rzeszutek Wilk

On Wed, Jun 20, 2012 at 05:46:49PM -0700, H. Peter Anvin wrote:
> On 06/20/2012 05:27 PM, Robin Holt wrote:
> >
> > What do you need from me? If you want me to help with this, I have a
> > _WHOLE_ lot of learning to do. Can you give me any pointers?
> >
> > We are trying to get this finally fixed. We have had work-around code
> > in SLES11 SP1, SLES11 SP2, and RHEL 6.x. I would love to get this fixed
> > for future distro snaps.
> >
>
> If you want to tackle it, the task is basically that when we modify the
> pgds in 32-bit legacy (non-PAE) mode, we should make the corresponding
> modifications to initial_page_table, and in 64-bit mode to
> real_mode_header->trampoline_pgd. It might be worthwhile to introduce a
> common pointer for both, obviously.

I am completely lost as to what should be done. How do we know
which identity maps need to be created? Do we just add them as we are
scanning the e820/EFI memory maps and include the reserved, etc ranges?
Do we look at the table handed to us by EFI at the beginning of boot and
use that as the basis? Or do we simply wait until the kernel's memory
initialization is complete and cover all of physical memory from zero
up to the highest physical address?

> This is currently handled via something called the pgd_list (when we
> update the top level kernel address space we walk pgd_list and update
> them all), but there are two issues:
>
> 1. Obviously, in the case of the 1:1 map, we don't just need to maintain
> the kernel area, but the "user space" part of the address space should
> contain a copy, as well.
>
> 2. To complicate things, there is code in there to grab an mm lock for
> the benefit of Xen. The 1:1 map doesn't have an mm associated with it,
> so I'm not quite sure how that is to be handled. Perhaps Xen just plain
> won't need it and we can just bypass it, but I have no bloody idea.
>
> It is also a bit "cute" how we seem to make a function call to indirect
> through a pointer (why on Earth is pgd_page_get_mm() not an inline?!),
> and then grab a lock unconditionally, regardless of if we are affected
> by Xen or not.
>
> 	-hpa
* Re: [PATCH] phys_efi_set_virtual_address_map needs va, no pa.
  2012-06-21 16:52         ` Robin Holt
@ 2012-06-22  0:35           ` H. Peter Anvin
  0 siblings, 0 replies; 8+ messages in thread
From: H. Peter Anvin @ 2012-06-22  0:35 UTC
  To: Robin Holt
  Cc: H. Peter Anvin, Matthew Garrett, linux-kernel, Sakkinen, Jarkko,
	Konrad Rzeszutek Wilk

On 06/21/2012 09:52 AM, Robin Holt wrote:
>
> I am completely lost as to what should be done. How do we know
> which identity maps need to be created? Do we just add them as we are
> scanning the e820/EFI memory maps and include the reserved, etc ranges?
> Do we look at the table handed to us by EFI at the beginning of boot and
> use that as the basis? Or do we simply wait until the kernel's memory
> initialization is complete and cover all of physical memory from zero
> up to the highest physical address?
>

Robin, we already create the 1:1 maps. Right now there is some
weirdness with some of the issues that you mention, but that is
orthogonal to this.

The 1:1 map created for the kernel is created at a specific offset,
__PAGE_OFFSET, and is propagated into every vm context created by the
kernel. There are two problems:

1. The "initial" (32 bit) or "trampoline" (64 bit) maps aren't on the
list of vm contexts created by the kernel (pgd_list), so they never get
updated after a particular point in the boot.

2. The initial/trampoline maps need these mappings not just at address
__PAGE_OFFSET, but also at address zero (identity mapping), which means
that just adding it to the pgd_list is insufficient.

Note that i386-PAE is unaffected, simply because the contents of the top
(3rd) level is always fixed.

	-hpa
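Problem 2 above, the need for the same mappings at both __PAGE_OFFSET and address zero, can be illustrated with a small userspace model of a 64-bit top-level table. It is a sketch under stated assumptions, not the eventual kernel fix: `sync_trampoline` is a hypothetical helper, page tables are plain arrays, and the direct map is assumed to start on a top-level slot boundary so that whole entries can be reused for the identity view.

```c
#include <assert.h>
#include <stdint.h>

#define PTRS_PER_PGD 512

/*
 * Copy `count` top-level entries of the kernel's direct map into `tramp`
 * twice: at the same slots (the __PAGE_OFFSET view) and starting at
 * slot 0 (the identity view). `direct_slot` is the top-level slot that
 * the direct map starts at.
 */
static void sync_trampoline(uint64_t *tramp, const uint64_t *kernel,
			    unsigned int direct_slot, unsigned int count)
{
	assert(direct_slot + count <= PTRS_PER_PGD);
	assert(count <= direct_slot);	/* the two views must not overlap */

	for (unsigned int i = 0; i < count; i++) {
		uint64_t e = kernel[direct_slot + i];

		tramp[direct_slot + i] = e;	/* offset view */
		tramp[i] = e;			/* identity view */
	}
}
```

Sharing one lower-level table between two top-level slots is what makes the low alias cheap: in this model only the top-level entries are duplicated, which is also why simply adding the table to pgd_list (which would update only the offset view) would not be enough.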
* Re: [PATCH] phys_efi_set_virtual_address_map needs va, no pa.
  2012-06-21  0:46       ` H. Peter Anvin
  2012-06-21 16:52         ` Robin Holt
@ 2012-06-21 19:16         ` Konrad Rzeszutek Wilk
  1 sibling, 0 replies; 8+ messages in thread
From: Konrad Rzeszutek Wilk @ 2012-06-21 19:16 UTC
  To: H. Peter Anvin
  Cc: Robin Holt, Matthew Garrett, linux-kernel, Sakkinen, Jarkko

On Wed, Jun 20, 2012 at 05:46:49PM -0700, H. Peter Anvin wrote:
> On 06/20/2012 05:27 PM, Robin Holt wrote:
> >
> > What do you need from me? If you want me to help with this, I have a
> > _WHOLE_ lot of learning to do. Can you give me any pointers?
> >
> > We are trying to get this finally fixed. We have had work-around code
> > in SLES11 SP1, SLES11 SP2, and RHEL 6.x. I would love to get this fixed
> > for future distro snaps.
> >
>
> If you want to tackle it, the task is basically that when we modify the
> pgds in 32-bit legacy (non-PAE) mode, we should make the corresponding
> modifications to initial_page_table, and in 64-bit mode to
> real_mode_header->trampoline_pgd. It might be worthwhile to introduce a
> common pointer for both, obviously.
>
> This is currently handled via something called the pgd_list (when we
> update the top level kernel address space we walk pgd_list and update
> them all), but there are two issues:
>
> 1. Obviously, in the case of the 1:1 map, we don't just need to maintain
> the kernel area, but the "user space" part of the address space should
> contain a copy, as well.
>
> 2. To complicate things, there is code in there to grab an mm lock for
> the benefit of Xen. The 1:1 map doesn't have an mm associated with it,
> so I'm not quite sure how that is to be handled. Perhaps Xen just plain
> won't need it and we can just bypass it, but I have no bloody idea.

You mean this?

a79e53d8 (Andrea Arcangeli    2011-02-16 15:45:22 -0800 127) 	spin_lock(&pgd_lock);
4f76cd38 (Jeremy Fitzhardinge 2008-03-17 16:36:55 -0700 128) 	pgd_list_del(pgd);
a79e53d8 (Andrea Arcangeli    2011-02-16 15:45:22 -0800 129) 	spin_unlock(&pgd_lock);

which says:

    x86/mm: Fix pgd_lock deadlock

    It's forbidden to take the page_table_lock with the irq disabled
    or if there's contention the IPIs (for tlb flushes) sent with
    the page_table_lock held will never run leading to a deadlock.

    Nobody takes the pgd_lock from irq context so the _irqsave can be
    removed.

Looking before that git commit I see Jeremy's 4f76cd38 unification
of the 32-bit and 64-bit pgtable, and before that:
1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 Linux-2.6.12-rc2

I am not really convinced that lock was put there for Xen as
the git history seems to point to well, way ancient stuff.

Or are you referring to something else?

>
> It is also a bit "cute" how we seem to make a function call to indirect
> through a pointer (why on Earth is pgd_page_get_mm() not an inline?!),
> and then grab a lock unconditionally, regardless of if we are affected
> by Xen or not.
>
> 	-hpa