* Re: [RFC PATCH] kexec, x86/boot: map systab region in identity mapping before accessing it
@ 2019-04-19 10:17 Borislav Petkov
2019-04-19 10:50 ` Baoquan He
0 siblings, 1 reply; 19+ messages in thread

From: Borislav Petkov @ 2019-04-19 10:17 UTC
To: Kairui Song, Thomas Gleixner
Cc: linux-kernel, Junichi Nomura, Dave Young, Chao Fan, Baoquan He, x86@kernel.org, kexec@lists.infradead.org

Breaking thread because this one got too big.

On Fri, Apr 19, 2019 at 04:34:58PM +0800, Kairui Song wrote:
> There are two approach to fix it, detect if the systab is mapped, and
> avoid reading it if not.

Ok, so tglx and I discussed this situation which is slowly getting out
of hand with all the tinkering.

So, here's what we should do - scream loudly now if some of this doesn't
make any sense.

1. Junichi's patch should get the systab check above added and sent to
5.1 so that at least some EFI kexecing can work with 5.1

2. Then, the fact whether the kernel has been kexec'ed and which
addresses it should use early, should all be passed through boot_params
which is either setup by kexec(1) or by the first kernel itself, in the
kexec_file_load() case.

> the systab region is not mapped by the identity mapping provided by
> kexec.

3. Then that needs to be fixed in the first kernel as it is a
shortcoming of us starting to parse systab very early. It is the kexec
setup code's problem not the early compressed stage's problem that the
EFI systab is not mapped.

Anything else I've forgotten? Anything I've misrepresented?

Thx.

--
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
* Re: [RFC PATCH] kexec, x86/boot: map systab region in identity mapping before accessing it
2019-04-19 10:17 [RFC PATCH] kexec, x86/boot: map systab region in identity mapping before accessing it Borislav Petkov
@ 2019-04-19 10:50 ` Baoquan He
2019-04-19 10:55 ` Baoquan He
` (2 more replies)
0 siblings, 3 replies; 19+ messages in thread

From: Baoquan He @ 2019-04-19 10:50 UTC
To: Borislav Petkov
Cc: Kairui Song, Thomas Gleixner, linux-kernel, Junichi Nomura, Dave Young, Chao Fan, x86@kernel.org, kexec@lists.infradead.org

On 04/19/19 at 12:17pm, Borislav Petkov wrote:
> Breaking thread because this one got too big.
>
> On Fri, Apr 19, 2019 at 04:34:58PM +0800, Kairui Song wrote:
> > There are two approach to fix it, detect if the systab is mapped, and
> > avoid reading it if not.
>
> Ok, so tglx and I discussed this situation which is slowly getting out
> of hand with all the tinkering.
>
> So, here's what we should do - scream loudly now if some of this doesn't
> make any sense.
>
> 1. Junichi's patch should get the systab check above added and sent to
> 5.1 so that at least some EFI kexecing can work with 5.1

Talked with Kairui privately just now. Seems Junichi's patch need add
this systab mapping. Since the systab region is not mapped on some
machines. Those machine don't have this issue because they got systab
region luckily covered by 1 GB page mapping in 1st kernel before
kexec jumping.

This issue should happen whether it is KASLR kernel or not KASLR kernel.

>
> 2. Then, the fact whether the kernel has been kexec'ed and which
> addresses it should use early, should all be passed through boot_params
> which is either setup by kexec(1) or by the first kernel itself, in the
> kexec_file_load() case.

Seems no better way to check if it's kexec-ed kernel, except of the
setup data checking of kexec-ed kernel.

It may happen in both kexec_load or kexec_file_load, since we build
ident mapping of kexec for RAM in 1st kernel.

>
> > the systab region is not mapped by the identity mapping provided by
> > kexec.
>
> 3. Then that needs to be fixed in the first kernel as it is a
> shortcoming of us starting to parse systab very early. It is the kexec
> setup code's problem not the early compressed stage's problem that the
> EFI systab is not mapped.

Yeah, adding the systab mapping looks good. Kairui put it in
decompressing stage just because he wants to cover the case in which the
old kernel kexec jumping to 2nd kernel. Now it seems not very
reasonable, we also have the new kernel kexec jumping to old 2nd kernel.

Thanks
Baoquan
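To make the "luckily covered by 1 GB page mapping" point concrete, here is a small user-space model, not kernel code. It assumes the identity map handed to the next kernel is built from RAM ranges only and that every range is mapped with 1 GiB pages throughout, so each RAM range is effectively widened to 1 GiB boundaries. The structure and function names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GIB (1ULL << 30)

/* A physical address range [start, end). */
struct phys_range {
	uint64_t start;
	uint64_t end;
};

/* Round down / up to a 1 GiB boundary (GIB is a power of two). */
static uint64_t round_down_gib(uint64_t x)
{
	return x & ~(GIB - 1);
}

static uint64_t round_up_gib(uint64_t x)
{
	return (x + GIB - 1) & ~(GIB - 1);
}

/*
 * Model of an identity map built from RAM ranges only, using 1 GiB pages:
 * every RAM range is widened to whole 1 GiB pages, so a non-RAM address
 * (e.g. a systab in a reserved region) is reachable only if it happens
 * to share a 1 GiB page with some RAM.
 */
static bool covered_by_gib_ident_map(const struct phys_range *ram,
				     size_t nr, uint64_t addr)
{
	for (size_t i = 0; i < nr; i++) {
		uint64_t lo = round_down_gib(ram[i].start);
		uint64_t hi = round_up_gib(ram[i].end);

		if (addr >= lo && addr < hi)
			return true;
	}
	return false;
}
```

For example, with RAM at [0x100000, 0x7f000000) the widened span is [0, 0x80000000), so a table at 0x7f800000 is readable by accident, while one at 0x8f000000, which shares no 1 GiB page with RAM, is not.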
* Re: [RFC PATCH] kexec, x86/boot: map systab region in identity mapping before accessing it
2019-04-19 10:50 ` Baoquan He
@ 2019-04-19 10:55 ` Baoquan He
2019-04-19 11:20 ` Kairui Song
2019-04-19 11:28 ` [RFC PATCH] kexec, x86/boot: map systab region in identity mapping before accessing it Borislav Petkov
2 siblings, 0 replies; 19+ messages in thread

From: Baoquan He @ 2019-04-19 10:55 UTC
To: Borislav Petkov
Cc: Kairui Song, Thomas Gleixner, linux-kernel, Junichi Nomura, Dave Young, Chao Fan, x86@kernel.org, kexec@lists.infradead.org

On 04/19/19 at 06:50pm, Baoquan He wrote:
> On 04/19/19 at 12:17pm, Borislav Petkov wrote:
> > Breaking thread because this one got too big.
> >
> > On Fri, Apr 19, 2019 at 04:34:58PM +0800, Kairui Song wrote:
> > > There are two approach to fix it, detect if the systab is mapped, and
> > > avoid reading it if not.
> >
> > Ok, so tglx and I discussed this situation which is slowly getting out
> > of hand with all the tinkering.
> >
> > So, here's what we should do - scream loudly now if some of this doesn't
> > make any sense.
> >
> > 1. Junichi's patch should get the systab check above added and sent to
> > 5.1 so that at least some EFI kexecing can work with 5.1
>
> Talked with Kairui privately just now. Seems Junichi's patch need add
> this systab mapping. Since the systab region is not mapped on some
> machines. Those machine don't have this issue because they got systab
> region luckily covered by 1 GB page mapping in 1st kernel before
> kexec jumping.
>
> This issue should happen whether it is KASLR kernel or not KASLR kernel.
>
> >
> > 2. Then, the fact whether the kernel has been kexec'ed and which
> > addresses it should use early, should all be passed through boot_params
> > which is either setup by kexec(1) or by the first kernel itself, in the
> > kexec_file_load() case.
>
> Seems no better way to check if it's kexec-ed kernel, except of the
> setup data checking of kexec-ed kernel.
>
> It may happen in both kexec_load or kexec_file_load, since we build
> ident mapping of kexec for RAM in 1st kernel.
>
> >
> > > the systab region is not mapped by the identity mapping provided by
> > > kexec.
> >
> > 3. Then that needs to be fixed in the first kernel as it is a
> > shortcoming of us starting to parse systab very early. It is the kexec
> > setup code's problem not the early compressed stage's problem that the
> > EFI systab is not mapped.
>
> Yeah, adding the systab mapping looks good. Kairui put it in
                                 ^ in 1st kernel
> decompressing stage just because he wants to cover the case in which the
> old kernel kexec jumping to 2nd kernel. Now it seems not very
> reasonable, we also have the new kernel kexec jumping to old 2nd kernel.
>
> Thanks
> Baoquan
* Re: [RFC PATCH] kexec, x86/boot: map systab region in identity mapping before accessing it
2019-04-19 10:50 ` Baoquan He
2019-04-19 10:55 ` Baoquan He
@ 2019-04-19 11:20 ` Kairui Song
2019-04-19 11:34 ` Borislav Petkov
2019-04-19 11:28 ` [RFC PATCH] kexec, x86/boot: map systab region in identity mapping before accessing it Borislav Petkov
2 siblings, 1 reply; 19+ messages in thread

From: Kairui Song @ 2019-04-19 11:20 UTC
To: Baoquan He
Cc: Borislav Petkov, Thomas Gleixner, Linux Kernel Mailing List, Junichi Nomura, Dave Young, Chao Fan, x86@kernel.org, kexec@lists.infradead.org

On Fri, Apr 19, 2019 at 6:50 PM Baoquan He <bhe@redhat.com> wrote:
>
> On 04/19/19 at 12:17pm, Borislav Petkov wrote:
> > Breaking thread because this one got too big.
> >
> > On Fri, Apr 19, 2019 at 04:34:58PM +0800, Kairui Song wrote:
> > > There are two approach to fix it, detect if the systab is mapped, and
> > > avoid reading it if not.
> >
> > Ok, so tglx and I discussed this situation which is slowly getting out
> > of hand with all the tinkering.
> >
> > So, here's what we should do - scream loudly now if some of this doesn't
> > make any sense.
> >
> > 1. Junichi's patch should get the systab check above added and sent to
> > 5.1 so that at least some EFI kexecing can work with 5.1
>
> Talked with Kairui privately just now. Seems Junichi's patch need add
> this systab mapping. Since the systab region is not mapped on some
> machines. Those machine don't have this issue because they got systab
> region luckily covered by 1 GB page mapping in 1st kernel before
> kexec jumping.
>
> This issue should happen whether it is KASLR kernel or not KASLR kernel.

Thanks for the declaration Bao, I can verify on the machine I have,
the issue still exist without kaslr. Currently, we read rsdp in early
code and fill in boot_params unconditional, so it will read from the
systab anyway.

>
> >
> > 2. Then, the fact whether the kernel has been kexec'ed and which
> > addresses it should use early, should all be passed through boot_params
> > which is either setup by kexec(1) or by the first kernel itself, in the
> > kexec_file_load() case.
>
> Seems no better way to check if it's kexec-ed kernel, except of the
> setup data checking of kexec-ed kernel.
>
> It may happen in both kexec_load or kexec_file_load, since we build
> ident mapping of kexec for RAM in 1st kernel.

For kexec_file_load newer kernel will fill in the acpi_rsdp in
boot_params so it bypassed the kexec_get_rsdp_addr (which will read
from systab). The problem is not fixed, systab mapping still missing,
but not likely to happen with kexec_file_load on newer kernel.

>
> >
> > > the systab region is not mapped by the identity mapping provided by
> > > kexec.
> >
> > 3. Then that needs to be fixed in the first kernel as it is a
> > shortcoming of us starting to parse systab very early. It is the kexec
> > setup code's problem not the early compressed stage's problem that the
> > EFI systab is not mapped.
>
> Yeah, adding the systab mapping looks good. Kairui put it in
> decompressing stage just because he wants to cover the case in which the
> old kernel kexec jumping to 2nd kernel. Now it seems not very
> reasonable, we also have the new kernel kexec jumping to old 1nd kernel.

Yes, kexec only cover RAM in the ident map it prepared for second
kernel, but the systab could be in reserved region, so if it didn't
fall into the 1G padding by accident it will fail when reading from
it. Fix in early code could make sure 2nd kernel always work. Or
should we treat it specially in kexec mapping prepare code?

>
> Thanks
> Baoquan

--
Best Regards,
Kairui Song
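Kairui's point about kexec_file_load can be modelled as a lookup chain in which an address already present in boot_params short-circuits everything after it, so the EFI systab is never dereferenced. This is a simplified sketch, not the kernel's get_rsdp_addr(): struct early_params, the source callbacks and the returned addresses are hypothetical, and the real early code may consult other sources as well.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of what the loader hands to the early boot code. */
struct early_params {
	uint64_t acpi_rsdp_addr;	/* 0 means "not provided by the loader" */
};

/* A lookup source returns the RSDP address, or 0 if it has no answer. */
typedef uint64_t (*rsdp_source_fn)(const struct early_params *bp);

/* Counts how often the systab-reading source actually ran. */
static int systab_reads;

/* Cheap and safe: just use what the loader passed in. */
static uint64_t rsdp_from_params(const struct early_params *bp)
{
	return bp->acpi_rsdp_addr;
}

/*
 * Stand-in for parsing the EFI system table. In real early code this
 * dereferences the systab, which can fault if it is not identity-mapped.
 */
static uint64_t rsdp_from_efi_systab(const struct early_params *bp)
{
	(void)bp;
	systab_reads++;
	return 0xdead000;	/* illustrative address */
}

/* Try each source in order; the first non-zero answer wins. */
static uint64_t find_rsdp(const struct early_params *bp,
			  const rsdp_source_fn *sources, size_t nr)
{
	for (size_t i = 0; i < nr; i++) {
		uint64_t pa = sources[i](bp);

		if (pa)
			return pa;
	}
	return 0;
}
```

When the loader fills acpi_rsdp_addr, as Kairui says newer kernels do for kexec_file_load, the systab source never runs; when it is left at 0, the chain falls through to the systab read, which is where the missing mapping bites.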
* Re: [RFC PATCH] kexec, x86/boot: map systab region in identity mapping before accessing it
2019-04-19 11:20 ` Kairui Song
@ 2019-04-19 11:34 ` Borislav Petkov
2019-04-19 11:50 ` Kairui Song
0 siblings, 1 reply; 19+ messages in thread

From: Borislav Petkov @ 2019-04-19 11:34 UTC
To: Kairui Song
Cc: Baoquan He, Thomas Gleixner, Linux Kernel Mailing List, Junichi Nomura, Dave Young, Chao Fan, x86@kernel.org, kexec@lists.infradead.org

On Fri, Apr 19, 2019 at 07:20:06PM +0800, Kairui Song wrote:
> Thanks for the declaration Bao, I can verify on the machine I have,
> the issue still exist without kaslr. Currently, we read rsdp in early
> code and fill in boot_params unconditional, so it will read from the
> systab anyway.

Yes, and in the future, info required by the kexec'ed kernel - like the
EFI systab address or even whether the kernel has been kexec'ed or comes
from cold boot - should be passed in boot_params. So that we don't have
to do all that ugly dancing in early code.

> Yes, kexec only cover RAM in the ident map it prepared for second
> kernel, but the systab could be in reserved region, so if it didn't
> fall into the 1G padding by accident it will fail when reading from
> it. Fix in early code could make sure 2nd kernel always work. Or
> should we treat it specially in kexec mapping prepare code?

Yes, we should. As I said, this is not early boot code's problem but the
kexec setup code's problem.

If the new kernel cannot get RSDP that early, then it should fail the
same way it failed before. That early RDSP parsing was added for the
movable regions thing working with KASLR.

If it can't get a RDSP for whatever reason, then if KASLR selects
a region overlapping with the movable regions, then it is the old
behavior.

Ok?

--
Regards/Gruss,
    Boris.
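Boris's answer puts the fix on the kexec setup side: the first kernel should make sure the page(s) holding the EFI systab are part of the identity map it prepares, even though they sit outside RAM. A minimal sketch of that idea follows. It only plans page-aligned ranges instead of building page tables, and struct ident_map_plan, build_plan() and the fixed table size are hypothetical, not the actual kexec code or the eventual upstream fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE  4096ULL
#define MAX_RANGES 16

struct phys_range {
	uint64_t start;		/* inclusive */
	uint64_t end;		/* exclusive */
};

/* Hypothetical list of ranges the first kernel will identity-map. */
struct ident_map_plan {
	struct phys_range r[MAX_RANGES];
	size_t nr;
};

/* Append [start, end) widened to page boundaries; false if the plan is full. */
static bool plan_add(struct ident_map_plan *p, uint64_t start, uint64_t end)
{
	if (p->nr == MAX_RANGES)
		return false;
	p->r[p->nr].start = start & ~(PAGE_SIZE - 1);
	p->r[p->nr].end = (end + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1);
	p->nr++;
	return true;
}

/* True if @addr will be reachable through the planned identity map. */
static bool plan_covers(const struct ident_map_plan *p, uint64_t addr)
{
	for (size_t i = 0; i < p->nr; i++)
		if (addr >= p->r[i].start && addr < p->r[i].end)
			return true;
	return false;
}

/*
 * Map all RAM, then explicitly add the EFI systab, which may live in a
 * reserved region, so the next kernel's early code can read it.
 * A systab address of 0 means "not an EFI boot".
 */
static bool build_plan(struct ident_map_plan *p,
		       const struct phys_range *ram, size_t nr_ram,
		       uint64_t systab, uint64_t systab_size)
{
	p->nr = 0;
	for (size_t i = 0; i < nr_ram; i++)
		if (!plan_add(p, ram[i].start, ram[i].end))
			return false;
	if (systab && !plan_add(p, systab, systab + systab_size))
		return false;
	return true;
}
```

Passing a systab address of 0 (a non-EFI boot) leaves only the RAM ranges in the plan; the sizes and addresses used with it are illustrative.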
* Re: [RFC PATCH] kexec, x86/boot: map systab region in identity mapping before accessing it
2019-04-19 11:34 ` Borislav Petkov
@ 2019-04-19 11:50 ` Kairui Song
2019-04-19 14:19 ` [PATCH] x86/boot: Disable RSDP parsing temporarily Borislav Petkov
0 siblings, 1 reply; 19+ messages in thread

From: Kairui Song @ 2019-04-19 11:50 UTC
To: Borislav Petkov
Cc: Baoquan He, Thomas Gleixner, Linux Kernel Mailing List, Junichi Nomura, Dave Young, Chao Fan, x86@kernel.org, kexec@lists.infradead.org

On Fri, Apr 19, 2019 at 7:34 PM Borislav Petkov <bp@alien8.de> wrote:
>
> On Fri, Apr 19, 2019 at 07:20:06PM +0800, Kairui Song wrote:
> > Thanks for the declaration Bao, I can verify on the machine I have,
> > the issue still exist without kaslr. Currently, we read rsdp in early
> > code and fill in boot_params unconditional, so it will read from the
> > systab anyway.
>
> Yes, and in the future, info required by the kexec'ed kernel - like the
> EFI systab address or even whether the kernel has been kexec'ed or comes
> from cold boot - should be passed in boot_params. So that we don't have
> to do all that ugly dancing in early code.
>
> > Yes, kexec only cover RAM in the ident map it prepared for second
> > kernel, but the systab could be in reserved region, so if it didn't
> > fall into the 1G padding by accident it will fail when reading from
> > it. Fix in early code could make sure 2nd kernel always work. Or
> > should we treat it specially in kexec mapping prepare code?
>
> Yes, we should. As I said, this is not early boot code's problem but the
> kexec setup code's problem.
>
> If the new kernel cannot get RSDP that early, then it should fail the
> same way it failed before. That early RDSP parsing was added for the
> movable regions thing working with KASLR.
>
> If it can't get a RDSP for whatever reason, then if KASLR selects
> a region overlapping with the movable regions, then it is the old
> behavior.
>
> Ok?
>

OK. And then fix the mapping issue in 1st kernel is the right way,
I'll skip the update for the early code mapping thing.

--
Best Regards,
Kairui Song
* [PATCH] x86/boot: Disable RSDP parsing temporarily
2019-04-19 11:50 ` Kairui Song
@ 2019-04-19 14:19 ` Borislav Petkov
2019-04-22 9:46 ` [tip:x86/urgent] " tip-bot for Borislav Petkov
0 siblings, 1 reply; 19+ messages in thread

From: Borislav Petkov @ 2019-04-19 14:19 UTC
To: Kairui Song, Thomas Gleixner
Cc: Baoquan He, Linux Kernel Mailing List, Junichi Nomura, Dave Young, Chao Fan, x86@kernel.org, kexec@lists.infradead.org, Ard Biesheuvel, Dave Hansen, H. Peter Anvin, indou.takao, Ingo Molnar, Juergen Gross, Kees Cook, Kirill A. Shutemov, msys.mizuma, Tom Lendacky

Ok, thinking about this more, we believe it is too late in the release
cycle to keep experimenting so the only thing left to do is the below.

This should bring the situation back to what it was before, at 5.0
times, and we'll have plenty of time now to address and properly fix all
the outstanding issues.

---
From: Borislav Petkov <bp@suse.de>

The original intention to move RDSP parsing very early, before KASLR
does its ranges selection, was to accommodate movable memory regions
machines (CONFIG_MEMORY_HOTREMOVE) to still be able to do memory
hotplug.

However, that broke kexec'ing a kernel on EFI machines because depending
on where the EFI systab was mapped, on at least one machine it isn't
present in the kexec mapping of the second kernel, leading to a triple
fault in the early code.

Fixing this properly requires significantly involved surgery and we
cannot allow ourselves to do that, that close to the merge window.

So disable the RSDP parsing code temporarily until it is fixed properly
in the next release cycle.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Chao Fan <fanc.fnst@cn.fujitsu.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: indou.takao@jp.fujitsu.com
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: kasong@redhat.com
Cc: Kees Cook <keescook@chromium.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: msys.mizuma@gmail.com
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86-ml <x86@kernel.org>
---
 arch/x86/boot/compressed/misc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index c0d6c560df69..5a237e8dbf8d 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -352,7 +352,7 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
 		boot_params->hdr.loadflags &= ~KASLR_FLAG;
 
 	/* Save RSDP address for later use. */
-	boot_params->acpi_rsdp_addr = get_rsdp_addr();
+	/* boot_params->acpi_rsdp_addr = get_rsdp_addr(); */
 
 	sanitize_boot_params(boot_params);
 
--
2.21.0

--
Regards/Gruss,
    Boris.
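The fallback this patch relies on, which Boris spells out earlier in the thread, is that without an RSDP the early code has no immovable-memory information, so KASLR may pick any candidate slot, as before 5.1. A sketch of that decision under stated assumptions: the immovable ranges are given as a plain array (in the kernel they would be derived from ACPI tables, namely the SRAT), and slot_allowed() is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct phys_range {
	uint64_t start;
	uint64_t end;	/* exclusive */
};

/* True if [start, end) lies entirely inside one of @ranges. */
static bool inside_any(const struct phys_range *ranges, size_t nr,
		       uint64_t start, uint64_t end)
{
	for (size_t i = 0; i < nr; i++)
		if (start >= ranges[i].start && end <= ranges[i].end)
			return true;
	return false;
}

/*
 * Decide whether a candidate KASLR slot is acceptable.
 * @rsdp == 0 models "early RSDP lookup disabled or failed": with no ACPI
 * tables there is no immovable-memory information, so every slot is
 * allowed (the old behavior). Otherwise the whole image must fit in
 * immovable memory.
 */
static bool slot_allowed(uint64_t rsdp,
			 const struct phys_range *immovable, size_t nr,
			 uint64_t slot, uint64_t image_size)
{
	if (!rsdp)
		return true;
	return inside_any(immovable, nr, slot, slot + image_size);
}
```

With the RSDP assignment commented out, as in the patch, acpi_rsdp_addr stays 0 and every call takes the first branch.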
* [tip:x86/urgent] x86/boot: Disable RSDP parsing temporarily
2019-04-19 14:19 ` [PATCH] x86/boot: Disable RSDP parsing temporarily Borislav Petkov
@ 2019-04-22 9:46 ` tip-bot for Borislav Petkov
0 siblings, 0 replies; 19+ messages in thread

From: tip-bot for Borislav Petkov @ 2019-04-22 9:46 UTC
To: linux-tip-commits
Cc: thomas.lendacky, dave.hansen, keescook, tglx, x86, mingo, ard.biesheuvel, linux-kernel, mingo, kirill.shutemov, hpa, bp, jgross, fanc.fnst, bhe

Commit-ID:  36f0c423552dacaca152324b8e9bda42a6d88865
Gitweb:     https://git.kernel.org/tip/36f0c423552dacaca152324b8e9bda42a6d88865
Author:     Borislav Petkov <bp@suse.de>
AuthorDate: Fri, 19 Apr 2019 15:40:14 +0200
Committer:  Borislav Petkov <bp@suse.de>
CommitDate: Mon, 22 Apr 2019 11:36:43 +0200

x86/boot: Disable RSDP parsing temporarily

The original intention to move RDSP parsing very early, before KASLR
does its ranges selection, was to accommodate movable memory regions
machines (CONFIG_MEMORY_HOTREMOVE) to still be able to do memory
hotplug.

However, that broke kexec'ing a kernel on EFI machines because depending
on where the EFI systab was mapped, on at least one machine it isn't
present in the kexec mapping of the second kernel, leading to a triple
fault in the early code.

Fixing this properly requires significantly involved surgery and we
cannot allow ourselves to do that, that close to the merge window.

So disable the RSDP parsing code temporarily until it is fixed properly
in the next release cycle.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Chao Fan <fanc.fnst@cn.fujitsu.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: indou.takao@jp.fujitsu.com
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: kasong@redhat.com
Cc: Kees Cook <keescook@chromium.org>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: msys.mizuma@gmail.com
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20190419141952.GE10324@zn.tnic
---
 arch/x86/boot/compressed/misc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c
index c0d6c560df69..5a237e8dbf8d 100644
--- a/arch/x86/boot/compressed/misc.c
+++ b/arch/x86/boot/compressed/misc.c
@@ -352,7 +352,7 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
 		boot_params->hdr.loadflags &= ~KASLR_FLAG;
 
 	/* Save RSDP address for later use. */
-	boot_params->acpi_rsdp_addr = get_rsdp_addr();
+	/* boot_params->acpi_rsdp_addr = get_rsdp_addr(); */
 
 	sanitize_boot_params(boot_params);
 
* Re: [RFC PATCH] kexec, x86/boot: map systab region in identity mapping before accessing it
2019-04-19 10:50 ` Baoquan He
2019-04-19 10:55 ` Baoquan He
2019-04-19 11:20 ` Kairui Song
@ 2019-04-19 11:28 ` [RFC PATCH] kexec, x86/boot: map systab region in identity mapping before accessing it Borislav Petkov
2019-04-19 11:36 ` Borislav Petkov
2019-04-19 11:44 ` Baoquan He
2 siblings, 2 replies; 19+ messages in thread

From: Borislav Petkov @ 2019-04-19 11:28 UTC
To: Baoquan He
Cc: Kairui Song, Thomas Gleixner, linux-kernel, Junichi Nomura, Dave Young, Chao Fan, x86@kernel.org, kexec@lists.infradead.org

On Fri, Apr 19, 2019 at 06:50:14PM +0800, Baoquan He wrote:
> Talked with Kairui privately just now. Seems Junichi's patch need add
> this systab mapping. Since the systab region is not mapped on some
> machines. Those machine don't have this issue because they got systab
> region luckily covered by 1 GB page mapping in 1st kernel before
> kexec jumping.

You don't have to repeat all I that - I know what the problem is.

Read what I said again: it is too late for 5.1 to do any involved
surgery.

> > 2. Then, the fact whether the kernel has been kexec'ed and which
> > addresses it should use early, should all be passed through boot_params
> > which is either setup by kexec(1) or by the first kernel itself, in the
> > kexec_file_load() case.
>
> Seems no better way to check if it's kexec-ed kernel, except of the
> setup data checking of kexec-ed kernel.

Why does that "seem" so?

Read again what I said: "should all be passed through boot_params".
Which means, boot_params should be extended with a field of a flag to
say: "this is a kexec'ed kernel".

If it "seems" then it should be made to not "seem" but to work properly.

> Yeah, adding the systab mapping looks good. Kairui put it in
> decompressing stage just because he wants to cover the case in which the
> old kernel kexec jumping to 2nd kernel. Now it seems not very
> reasonable, we also have the new kernel kexec jumping to old 2nd kernel.

I don't think we can guarantee kexec between old<->new kernel to always
work. Otherwise, we can forget all development and improvements of new
kernel.

--
Regards/Gruss,
    Boris.
* Re: [RFC PATCH] kexec, x86/boot: map systab region in identity mapping before accessing it
2019-04-19 11:28 ` [RFC PATCH] kexec, x86/boot: map systab region in identity mapping before accessing it Borislav Petkov
@ 2019-04-19 11:36 ` Borislav Petkov
2019-04-22 14:33 ` Baoquan He
2019-04-19 11:44 ` Baoquan He
1 sibling, 1 reply; 19+ messages in thread

From: Borislav Petkov @ 2019-04-19 11:36 UTC
To: Baoquan He
Cc: Kairui Song, Thomas Gleixner, linux-kernel, Junichi Nomura, Dave Young, Chao Fan, x86@kernel.org, kexec@lists.infradead.org

On Fri, Apr 19, 2019 at 01:28:01PM +0200, Borislav Petkov wrote:
> Read again what I said: "should all be passed through boot_params".
> Which means, boot_params should be extended with a field of a flag to
> say: "this is a kexec'ed kernel".

And by that I mean similar to the XLF_EFI_KEXEC mechanism. The first
kernel or kexec(1) should prepare the info needed by the kexec'ed
kernel.

--
Regards/Gruss,
    Boris.
* Re: [RFC PATCH] kexec, x86/boot: map systab region in identity mapping before accessing it
2019-04-19 11:36 ` Borislav Petkov
@ 2019-04-22 14:33 ` Baoquan He
2019-04-22 15:17 ` Borislav Petkov
0 siblings, 1 reply; 19+ messages in thread

From: Baoquan He @ 2019-04-22 14:33 UTC
To: Junichi Nomura, Borislav Petkov, dyoung
Cc: Kairui Song, Thomas Gleixner, linux-kernel, Chao Fan, x86@kernel.org, kexec@lists.infradead.org

On 04/19/19 at 01:36pm, Borislav Petkov wrote:
> On Fri, Apr 19, 2019 at 01:28:01PM +0200, Borislav Petkov wrote:
> > Read again what I said: "should all be passed through boot_params".
> > Which means, boot_params should be extended with a field of a flag to
> > say: "this is a kexec'ed kernel".
>
> And by that I mean similar to the XLF_EFI_KEXEC mechanism. The first
> kernel or kexec(1) should prepare the info needed by the kexec'ed
> kernel.

We have set the loader type to '0x0D << 4' for kexec specifically, in both
kexec_load and kexec_file_load. We can check this to identify if it's
kexec-ed kernel or not.

Update patch with it?

static void *bzImage64_load(struct kimage *image, char *kernel,
			    unsigned long kernel_len, char *initrd,
			    unsigned long initrd_len, char *cmdline,
			    unsigned long cmdline_len)
{

	...
	/* bootloader info. Do we need a separate ID for kexec kernel loader? */
	params->hdr.type_of_loader = 0x0D << 4;
	...
}
* Re: [RFC PATCH] kexec, x86/boot: map systab region in identity mapping before accessing it
2019-04-22 14:33 ` Baoquan He
@ 2019-04-22 15:17 ` Borislav Petkov
2019-04-26 9:51 ` Baoquan He
0 siblings, 1 reply; 19+ messages in thread

From: Borislav Petkov @ 2019-04-22 15:17 UTC
To: Baoquan He, H. Peter Anvin
Cc: Junichi Nomura, dyoung, Kairui Song, Thomas Gleixner, linux-kernel, Chao Fan, x86@kernel.org, kexec@lists.infradead.org

+ hpa

On Mon, Apr 22, 2019 at 10:33:46PM +0800, Baoquan He wrote:
> On 04/19/19 at 01:36pm, Borislav Petkov wrote:
> > On Fri, Apr 19, 2019 at 01:28:01PM +0200, Borislav Petkov wrote:
> > > Read again what I said: "should all be passed through boot_params".
> > > Which means, boot_params should be extended with a field of a flag to
> > > say: "this is a kexec'ed kernel".
> >
> > And by that I mean similar to the XLF_EFI_KEXEC mechanism. The first
> > kernel or kexec(1) should prepare the info needed by the kexec'ed
> > kernel.
>
> We have set the loader type to '0x0D << 4' for kexec specifically, in both
> kexec_load and kexec_file_load. We can check this to identify if it's
> kexec-ed kernel or not.
>
> Update patch with it?
>
> static void *bzImage64_load(struct kimage *image, char *kernel,
> 			    unsigned long kernel_len, char *initrd,
> 			    unsigned long initrd_len, char *cmdline,
> 			    unsigned long cmdline_len)
> {
>
> 	...
> 	/* bootloader info. Do we need a separate ID for kexec kernel loader? */
> 	params->hdr.type_of_loader = 0x0D << 4;

That's already documented in Documentation/x86/boot.txt

Field name:     type_of_loader
Type:           write (obligatory)
Offset/size:    0x210/1
Protocol:       2.00+

...

        D  kexec-tools

And yes, the question in the code is still valid: do we need a separate ID.

I'd say no and we'll simply call 0xD all kernels loaded using a
kexec-type syscall.

IMO.

--
Regards/Gruss,
    Boris.
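The type_of_loader byte quoted above uses a 0xTV encoding: the high nibble T is the boot loader ID and the low nibble V its version, so 0x0D << 4 (0xD0) means "kexec-tools, version 0". The sketch below reads the byte straight out of a raw zero-page buffer at offset 0x210; the helper names are illustrative. It ignores the boot protocol's extended-ID case (T = 0xE with the real ID in ext_loader_type), which does not apply to ID 0xD.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Offset of type_of_loader in the zero page, per the x86 boot protocol. */
#define TYPE_OF_LOADER_OFF 0x210
/* Loader ID "D" (kexec-tools) in the boot protocol's table. */
#define LOADER_ID_KEXEC    0xD

/* Extract T (loader ID) from the 0xTV encoding. */
static unsigned int loader_id(uint8_t type_of_loader)
{
	return type_of_loader >> 4;
}

/* Extract V (loader version) from the 0xTV encoding. */
static unsigned int loader_version(uint8_t type_of_loader)
{
	return type_of_loader & 0xf;
}

/* True if the zero page says the kernel was loaded by a kexec-type loader. */
static bool loaded_by_kexec(const uint8_t *zero_page)
{
	return loader_id(zero_page[TYPE_OF_LOADER_OFF]) == LOADER_ID_KEXEC;
}
```

For example, GRUB 2 writes 0x72 (T = 7, V = 2), so loaded_by_kexec() returns false for it, while the 0xD0 written by kexec_file_load returns true.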
* Re: [RFC PATCH] kexec, x86/boot: map systab region in identity mapping before accessing it
2019-04-22 15:17 ` Borislav Petkov
@ 2019-04-26 9:51 ` Baoquan He
2019-04-26 9:58 ` Borislav Petkov
0 siblings, 1 reply; 19+ messages in thread

From: Baoquan He @ 2019-04-26 9:51 UTC
To: Borislav Petkov
Cc: H. Peter Anvin, Junichi Nomura, dyoung, Kairui Song, Thomas Gleixner, linux-kernel, Chao Fan, x86@kernel.org, kexec@lists.infradead.org

On 04/22/19 at 05:17pm, Borislav Petkov wrote:
> + hpa
>
> On Mon, Apr 22, 2019 at 10:33:46PM +0800, Baoquan He wrote:
> > On 04/19/19 at 01:36pm, Borislav Petkov wrote:
> > > On Fri, Apr 19, 2019 at 01:28:01PM +0200, Borislav Petkov wrote:
> > > > Read again what I said: "should all be passed through boot_params".
> > > > Which means, boot_params should be extended with a field of a flag to
> > > > say: "this is a kexec'ed kernel".
> > >
> > > And by that I mean similar to the XLF_EFI_KEXEC mechanism. The first
> > > kernel or kexec(1) should prepare the info needed by the kexec'ed
> > > kernel.
> >
> > We have set the loader type to '0x0D << 4' for kexec specifically, in both
> > kexec_load and kexec_file_load. We can check this to identify if it's
> > kexec-ed kernel or not.
> >
> > Update patch with it?
> >
> > static void *bzImage64_load(struct kimage *image, char *kernel,
> > 			    unsigned long kernel_len, char *initrd,
> > 			    unsigned long initrd_len, char *cmdline,
> > 			    unsigned long cmdline_len)
> > {
> >
> > 	...
> > 	/* bootloader info. Do we need a separate ID for kexec kernel loader? */
> > 	params->hdr.type_of_loader = 0x0D << 4;
>
> That's already documented in Documentation/x86/boot.txt
>
> Field name:     type_of_loader
> Type:           write (obligatory)
> Offset/size:    0x210/1
> Protocol:       2.00+
>
> ...
>
>         D  kexec-tools
>
> And yes, the question in the code is still valid: do we need a separate ID.
>
> I'd say no and we'll simply call 0xD all kernels loaded using a
> kexec-type syscall.

Yes, agree. Time has proved we don't need a separate ID, just 0x0D is
fine for both kexec/kdump. We can clear it away now.

I can make a patch to add a bit into xloadflags, to indicate that this
is kexec-ed kernel. It can help to differentiate kexec-ed kernel from
kdump kernel. As we know, kdump kernel is recognized with /proc/vmcore
existence. While during kernel initialization stage, or /proc/vmcore is
not validated in some cases, the adding bit may help.

Thoughts?

Thanks
Baoquan
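The loader ID alone answers only "was this kernel started through a kexec-type syscall?": 0xD covers both a plain kexec reboot and a kdump kernel. Here is a sketch of how one extra header bit, as proposed above, would separate the three cases. XLF_HYPOTHETICAL_KDUMP and its bit position are invented for illustration and are not part of the x86 boot protocol; only the 0xD loader ID comes from the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag, NOT part of the real x86 boot protocol. */
#define XLF_HYPOTHETICAL_KDUMP (1u << 15)

enum boot_kind {
	BOOT_FIRST_KERNEL,	/* started by firmware / a regular bootloader */
	BOOT_KEXEC,		/* plain kexec reboot */
	BOOT_KDUMP,		/* crash-capture kernel loaded via kexec */
};

/*
 * Loader ID 0xD (high nibble of type_of_loader) marks any kernel started
 * through a kexec-type syscall; the hypothetical flag then splits kdump
 * from a plain kexec.
 */
static enum boot_kind classify_boot(uint8_t type_of_loader, uint16_t flags)
{
	if ((type_of_loader >> 4) != 0xD)
		return BOOT_FIRST_KERNEL;
	return (flags & XLF_HYPOTHETICAL_KDUMP) ? BOOT_KDUMP : BOOT_KEXEC;
}
```

Per the follow-up in this thread, only the first distinction (first kernel vs. kexec-loaded kernel) was actually needed for the early-boot problem.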
* Re: [RFC PATCH] kexec, x86/boot: map systab region in identity mapping before accessing it
2019-04-26 9:51 ` Baoquan He
@ 2019-04-26 9:58 ` Borislav Petkov
2019-04-26 10:16 ` Baoquan He
0 siblings, 1 reply; 19+ messages in thread

From: Borislav Petkov @ 2019-04-26 9:58 UTC
To: Baoquan He
Cc: H. Peter Anvin, Junichi Nomura, dyoung, Kairui Song, Thomas Gleixner, linux-kernel, Chao Fan, x86@kernel.org, kexec@lists.infradead.org

On Fri, Apr 26, 2019 at 05:51:34PM +0800, Baoquan He wrote:
> I can make a patch to add a bit into xloadflags, to indicate that this
> is kexec-ed kernel. It can help to differentiate kexec-ed kernel from
> kdump kernel.

From the recent snafu, the only thing we needed is to differentiate
between the *first* kernel and the following kernel(s) which has been
started/loaded using a kexec syscall.

--
Regards/Gruss,
    Boris.
* Re: [RFC PATCH] kexec, x86/boot: map systab region in identity mapping before accessing it
2019-04-26 9:58 ` Borislav Petkov
@ 2019-04-26 10:16 ` Baoquan He
0 siblings, 0 replies; 19+ messages in thread

From: Baoquan He @ 2019-04-26 10:16 UTC
To: Borislav Petkov
Cc: H. Peter Anvin, Junichi Nomura, dyoung, Kairui Song, Thomas Gleixner, linux-kernel, Chao Fan, x86@kernel.org, kexec@lists.infradead.org

On 04/26/19 at 11:58am, Borislav Petkov wrote:
> On Fri, Apr 26, 2019 at 05:51:34PM +0800, Baoquan He wrote:
> > I can make a patch to add a bit into xloadflags, to indicate that this
> > is kexec-ed kernel. It can help to differentiate kexec-ed kernel from
> > kdump kernel.
>
> From the recent snafu, the only thing we needed is to differentiate
> between the *first* kernel and the following kernel(s) which has been
> started/loaded using a kexec syscall.

OK. To make sure I got it, the loader type 0xD is enough for this,
right? It's fine to me, we can add it later if needed. I remember
there's an issue in intel/amd iommu, in which we need differentiate
between kexec/kdump kernel, but not very sure. I will check it when I
have time to work on that.

Thanks
Baoquan
* Re: [RFC PATCH] kexec, x86/boot: map systab region in identity mapping before accessing it
2019-04-19 11:28 ` [RFC PATCH] kexec, x86/boot: map systab region in identity mapping before accessing it Borislav Petkov
2019-04-19 11:36 ` Borislav Petkov
@ 2019-04-19 11:44 ` Baoquan He
1 sibling, 0 replies; 19+ messages in thread
From: Baoquan He @ 2019-04-19 11:44 UTC (permalink / raw)
To: Borislav Petkov
Cc: Kairui Song, Thomas Gleixner, linux-kernel, Junichi Nomura,
Dave Young, Chao Fan, x86@kernel.org, kexec@lists.infradead.org
On 04/19/19 at 01:28pm, Borislav Petkov wrote:
> Why does that "seem" so?
>
> Read again what I said: "should all be passed through boot_params".
> Which means, boot_params should be extended with a field of a flag to
> say: "this is a kexec'ed kernel".
>
> If it "seems" then it should be made to not "seem" but to work properly.
No objection to extending boot_params with a flag field to mark a kexec'ed
kernel, or a kdump kernel as well. We currently check for a kdump kernel
via /proc/vmcore.
>
> > Yeah, adding the systab mapping looks good. Kairui put it in
> > decompressing stage just because he wants to cover the case in which the
> > old kernel kexec jumping to 2nd kernel. Now it seems not very
> > reasonable, we also have the new kernel kexec jumping to old 2nd kernel.
>
> I don't think we can guarantee kexec between old<->new kernel to always
> work. Otherwise, we can forget all development and improvements of new
> kernel.
I personally agree with this. Much earlier, we tried to remove the 896 MB
limitation of crashkernel=xM and extend it to 4G or the whole RAM, but it
was rejected by Linus since he worried it could break kdump on old kernels.
I may not remember this correctly.
* [PATCH] x86/boot: Use efi_setup_data for searching RSDP on kexec-ed kernels
@ 2019-04-16 9:52 Borislav Petkov
2019-04-19 8:34 ` [RFC PATCH] kexec, x86/boot: map systab region in identity mapping before accessing it Kairui Song
0 siblings, 1 reply; 19+ messages in thread
From: Borislav Petkov @ 2019-04-16 9:52 UTC (permalink / raw)
To: Junichi Nomura
Cc: Dave Young, Chao Fan, Baoquan He, Kairui Song, x86@kernel.org,
kexec@lists.infradead.org, linux-kernel@vger.kernel.org
I'll queue the patch below in the next few days if there are no more complaints:
---
From: Junichi Nomura <j-nomura@ce.jp.nec.com>
Commit
3a63f70bf4c3a ("x86/boot: Early parse RSDP and save it in boot_params")
broke kexec boot on EFI systems. efi_get_rsdp_addr() in the early
parsing code tries to search RSDP from the EFI tables but that will
crash because the table address is virtual when the kernel was booted by
kexec (set_virtual_address_map() has run in the first kernel and cannot
be run again in the second kernel).
In the case of kexec, the physical address of EFI tables is provided via
efi_setup_data in boot_params, which is set up by kexec(1).
Factor out the table parsing code and use different pointers depending
on whether the kernel is booted by kexec or not.
[ bp: Massage. ]
Fixes: 3a63f70bf4c3a ("x86/boot: Early parse RSDP and save it in boot_params")
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Chao Fan <fanc.fnst@cn.fujitsu.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dave Young <dyoung@redhat.com>
Link: https://lkml.kernel.org/r/20190408231011.GA5402@jeru.linux.bs1.fc.nec.co.jp
---
arch/x86/boot/compressed/acpi.c | 143 ++++++++++++++++++++++++--------
1 file changed, 107 insertions(+), 36 deletions(-)
diff --git a/arch/x86/boot/compressed/acpi.c b/arch/x86/boot/compressed/acpi.c
index 0ef4ad55b29b..8cecce1ac0cd 100644
--- a/arch/x86/boot/compressed/acpi.c
+++ b/arch/x86/boot/compressed/acpi.c
@@ -44,17 +44,109 @@ static acpi_physical_address get_acpi_rsdp(void)
return addr;
}
-/* Search EFI system tables for RSDP. */
-static acpi_physical_address efi_get_rsdp_addr(void)
+/*
+ * Search EFI system tables for RSDP. If both ACPI_20_TABLE_GUID and
+ * ACPI_TABLE_GUID are found, take the former, which has more features.
+ */
+static acpi_physical_address
+__efi_get_rsdp_addr(unsigned long config_tables, unsigned int nr_tables,
+ bool efi_64)
{
acpi_physical_address rsdp_addr = 0;
#ifdef CONFIG_EFI
- unsigned long systab, systab_tables, config_tables;
+ int i;
+
+ /* Get EFI tables from systab. */
+ for (i = 0; i < nr_tables; i++) {
+ acpi_physical_address table;
+ efi_guid_t guid;
+
+ if (efi_64) {
+ efi_config_table_64_t *tbl = (efi_config_table_64_t *) config_tables + i;
+
+ guid = tbl->guid;
+ table = tbl->table;
+
+ if (!IS_ENABLED(CONFIG_X86_64) && table >> 32) {
+ debug_putstr("Error getting RSDP address: EFI config table located above 4GB.\n");
+ return 0;
+ }
+ } else {
+ efi_config_table_32_t *tbl = (efi_config_table_32_t *) config_tables + i;
+
+ guid = tbl->guid;
+ table = tbl->table;
+ }
+
+ if (!(efi_guidcmp(guid, ACPI_TABLE_GUID)))
+ rsdp_addr = table;
+ else if (!(efi_guidcmp(guid, ACPI_20_TABLE_GUID)))
+ return table;
+ }
+#endif
+ return rsdp_addr;
+}
+
+/* EFI/kexec support is 64-bit only. */
+#ifdef CONFIG_X86_64
+static struct efi_setup_data *get_kexec_setup_data_addr(void)
+{
+ struct setup_data *data;
+ u64 pa_data;
+
+ pa_data = boot_params->hdr.setup_data;
+ while (pa_data) {
+ data = (struct setup_data *)pa_data;
+ if (data->type == SETUP_EFI)
+ return (struct efi_setup_data *)(pa_data + sizeof(struct setup_data));
+
+ pa_data = data->next;
+ }
+ return NULL;
+}
+
+static acpi_physical_address kexec_get_rsdp_addr(void)
+{
+ efi_system_table_64_t *systab;
+ struct efi_setup_data *esd;
+ struct efi_info *ei;
+ char *sig;
+
+ esd = (struct efi_setup_data *)get_kexec_setup_data_addr();
+ if (!esd)
+ return 0;
+
+ if (!esd->tables) {
+ debug_putstr("Wrong kexec SETUP_EFI data.\n");
+ return 0;
+ }
+
+ ei = &boot_params->efi_info;
+ sig = (char *)&ei->efi_loader_signature;
+ if (strncmp(sig, EFI64_LOADER_SIGNATURE, 4)) {
+ debug_putstr("Wrong kexec EFI loader signature.\n");
+ return 0;
+ }
+
+ /* Get systab from boot params. */
+ systab = (efi_system_table_64_t *) (ei->efi_systab | ((__u64)ei->efi_systab_hi << 32));
+ if (!systab)
+ error("EFI system table not found in kexec boot_params.");
+
+ return __efi_get_rsdp_addr((unsigned long)esd->tables, systab->nr_tables, true);
+}
+#else
+static acpi_physical_address kexec_get_rsdp_addr(void) { return 0; }
+#endif /* CONFIG_X86_64 */
+
+static acpi_physical_address efi_get_rsdp_addr(void)
+{
+#ifdef CONFIG_EFI
+ unsigned long systab, config_tables;
unsigned int nr_tables;
struct efi_info *ei;
bool efi_64;
- int size, i;
char *sig;
ei = &boot_params->efi_info;
@@ -88,49 +180,20 @@ static acpi_physical_address efi_get_rsdp_addr(void)
config_tables = stbl->tables;
nr_tables = stbl->nr_tables;
- size = sizeof(efi_config_table_64_t);
} else {
efi_system_table_32_t *stbl = (efi_system_table_32_t *)systab;
config_tables = stbl->tables;
nr_tables = stbl->nr_tables;
- size = sizeof(efi_config_table_32_t);
}
if (!config_tables)
error("EFI config tables not found.");
- /* Get EFI tables from systab. */
- for (i = 0; i < nr_tables; i++) {
- acpi_physical_address table;
- efi_guid_t guid;
-
- config_tables += size;
-
- if (efi_64) {
- efi_config_table_64_t *tbl = (efi_config_table_64_t *)config_tables;
-
- guid = tbl->guid;
- table = tbl->table;
-
- if (!IS_ENABLED(CONFIG_X86_64) && table >> 32) {
- debug_putstr("Error getting RSDP address: EFI config table located above 4GB.\n");
- return 0;
- }
- } else {
- efi_config_table_32_t *tbl = (efi_config_table_32_t *)config_tables;
-
- guid = tbl->guid;
- table = tbl->table;
- }
-
- if (!(efi_guidcmp(guid, ACPI_TABLE_GUID)))
- rsdp_addr = table;
- else if (!(efi_guidcmp(guid, ACPI_20_TABLE_GUID)))
- return table;
- }
+ return __efi_get_rsdp_addr(config_tables, nr_tables, efi_64);
+#else
+ return 0;
#endif
- return rsdp_addr;
}
static u8 compute_checksum(u8 *buffer, u32 length)
@@ -220,6 +283,14 @@ acpi_physical_address get_rsdp_addr(void)
if (!pa)
pa = boot_params->acpi_rsdp_addr;
+ /*
+ * Try to get EFI data from setup_data. This can happen when we're a
+ * kexec'ed kernel and kexec(1) has passed all the required EFI info to
+ * us.
+ */
+ if (!pa)
+ pa = kexec_get_rsdp_addr();
+
if (!pa)
pa = efi_get_rsdp_addr();
--
2.21.0
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
* [RFC PATCH] kexec, x86/boot: map systab region in identity mapping before accessing it
2019-04-16 9:52 [PATCH] x86/boot: Use efi_setup_data for searching RSDP on kexec-ed kernels Borislav Petkov
@ 2019-04-19 8:34 ` Kairui Song
2019-04-19 8:58 ` Baoquan He
0 siblings, 1 reply; 19+ messages in thread
From: Kairui Song @ 2019-04-19 8:34 UTC (permalink / raw)
To: linux-kernel
Cc: Borislav Petkov, Junichi Nomura, Dave Young, Chao Fan, Baoquan He,
Kairui Song, x86@kernel.org, kexec@lists.infradead.org
The previous patch "x86/boot: Use efi_setup_data for searching RSDP on
kexec-ed kernels" always resets some machines. This is a follow-up to that
patch.
The reason is that, by default, the systab region is not mapped by the
identity mapping provided by kexec, so the kernel accesses an unmapped
memory region and faults. Because kexec tends to pad the mapped regions up
to PUD or PMD size, the systab may be covered by the mapping by accident,
which is why it worked on some machines; but that is fragile and easily
broken.
There are two approaches to fix it. One is to detect whether the systab is
mapped and avoid reading it if not. The other is to ensure the region is
mapped, either by checking for and mapping the systab in the first kernel
before kexec, or by mapping the systab in the early code before reading it.
Mapping in the early code should cover every case (otherwise booting from
an older kernel will still fail). This patch is a draft implementation: it
adds a helper, add_identity_map_pgd(), which can be used to add extra
identity mappings at a very early stage, and calls it before reading the
systab. There should be no need to unmap the region, as the early page
table is discarded later.
However, the patch also includes some refactoring that introduces a lot of
changes: it moves some page-table-related code from kaslr_64.c to
pgtable_64.c. If the approach works out, I can prepare separate cleanup
patches.
Signed-off-by: Kairui Song <kasong@redhat.com> --- arch/x86/boot/compressed/acpi.c | 5 + arch/x86/boot/compressed/kaslr_64.c | 109 +-------------------- arch/x86/boot/compressed/misc.c | 2 + arch/x86/boot/compressed/pgtable.h | 11 +++ arch/x86/boot/compressed/pgtable_64.c | 131 +++++++++++++++++++++++++- arch/x86/include/asm/boot.h | 8 +- 6 files changed, 156 insertions(+), 110 deletions(-) diff --git a/arch/x86/boot/compressed/acpi.c b/arch/x86/boot/compressed/acpi.c index 8cecce1ac0cd..a513b0f9bfda 100644 --- a/arch/x86/boot/compressed/acpi.c +++ b/arch/x86/boot/compressed/acpi.c @@ -2,6 +2,7 @@ #define BOOT_CTYPE_H #include "misc.h" #include "error.h" +#include "pgtable.h" #include "../string.h" #include <linux/numa.h> @@ -134,6 +135,10 @@ static acpi_physical_address kexec_get_rsdp_addr(void) if (!systab) error("EFI system table not found in kexec boot_params."); + add_identity_map_pgd((unsigned long)systab, + (unsigned long)systab + sizeof(*systab), + early_boot_top_pgt); + return __efi_get_rsdp_addr((unsigned long)esd->tables, systab->nr_tables, true); } #else diff --git a/arch/x86/boot/compressed/kaslr_64.c b/arch/x86/boot/compressed/kaslr_64.c index 748456c365f4..ec7093e192bf 100644 --- a/arch/x86/boot/compressed/kaslr_64.c +++ b/arch/x86/boot/compressed/kaslr_64.c @@ -8,121 +8,21 @@ * Copyright (C) 2016 Kees Cook */ -/* - * Since we're dealing with identity mappings, physical and virtual - * addresses are the same, so override these defines which are ultimately - * used by the headers in misc.h. - */ -#define __pa(x) ((unsigned long)(x)) -#define __va(x) ((void *)((unsigned long)(x))) - -/* No PAGE_TABLE_ISOLATION support needed either: */ -#undef CONFIG_PAGE_TABLE_ISOLATION - #include "misc.h" - -/* These actually do the work of building the kernel identity maps. 
*/ -#include <asm/init.h> -#include <asm/pgtable.h> -/* Use the static base for this part of the boot process */ -#undef __PAGE_OFFSET -#define __PAGE_OFFSET __PAGE_OFFSET_BASE -#include "../../mm/ident_map.c" +#include "pgtable.h" /* Used by pgtable.h asm code to force instruction serialization. */ unsigned long __force_order; -/* Used to track our page table allocation area. */ -struct alloc_pgt_data { - unsigned char *pgt_buf; - unsigned long pgt_buf_size; - unsigned long pgt_buf_offset; -}; - -/* - * Allocates space for a page table entry, using struct alloc_pgt_data - * above. Besides the local callers, this is used as the allocation - * callback in mapping_info below. - */ -static void *alloc_pgt_page(void *context) -{ - struct alloc_pgt_data *pages = (struct alloc_pgt_data *)context; - unsigned char *entry; - - /* Validate there is space available for a new page. */ - if (pages->pgt_buf_offset >= pages->pgt_buf_size) { - debug_putstr("out of pgt_buf in " __FILE__ "!?\n"); - debug_putaddr(pages->pgt_buf_offset); - debug_putaddr(pages->pgt_buf_size); - return NULL; - } - - entry = pages->pgt_buf + pages->pgt_buf_offset; - pages->pgt_buf_offset += PAGE_SIZE; - - return entry; -} - -/* Used to track our allocated page tables. */ -static struct alloc_pgt_data pgt_data; - /* The top level page table entry pointer. */ static unsigned long top_level_pgt; -phys_addr_t physical_mask = (1ULL << __PHYSICAL_MASK_SHIFT) - 1; - -/* - * Mapping information structure passed to kernel_ident_mapping_init(). - * Due to relocation, pointers must be assigned at run time not build time. - */ -static struct x86_mapping_info mapping_info; - /* Locates and clears a region for a new top level page table. */ void initialize_identity_maps(void) { - /* If running as an SEV guest, the encryption mask is required. 
*/ - set_sev_encryption_mask(); - - /* Exclude the encryption mask from __PHYSICAL_MASK */ - physical_mask &= ~sme_me_mask; - - /* Init mapping_info with run-time function/buffer pointers. */ - mapping_info.alloc_pgt_page = alloc_pgt_page; - mapping_info.context = &pgt_data; - mapping_info.page_flag = __PAGE_KERNEL_LARGE_EXEC | sme_me_mask; - mapping_info.kernpg_flag = _KERNPG_TABLE; - - /* - * It should be impossible for this not to already be true, - * but since calling this a second time would rewind the other - * counters, let's just make sure this is reset too. - */ - pgt_data.pgt_buf_offset = 0; - - /* - * If we came here via startup_32(), cr3 will be _pgtable already - * and we must append to the existing area instead of entirely - * overwriting it. - * - * With 5-level paging, we use '_pgtable' to allocate the p4d page table, - * the top-level page table is allocated separately. - * - * p4d_offset(top_level_pgt, 0) would cover both the 4- and 5-level - * cases. On 4-level paging it's equal to 'top_level_pgt'. - */ - top_level_pgt = read_cr3_pa(); - if (p4d_offset((pgd_t *)top_level_pgt, 0) == (p4d_t *)_pgtable) { - debug_putstr("booted via startup_32()\n"); - pgt_data.pgt_buf = _pgtable + BOOT_INIT_PGT_SIZE; - pgt_data.pgt_buf_size = BOOT_PGT_SIZE - BOOT_INIT_PGT_SIZE; - memset(pgt_data.pgt_buf, 0, pgt_data.pgt_buf_size); - } else { - debug_putstr("booted via startup_64()\n"); - pgt_data.pgt_buf = _pgtable; - pgt_data.pgt_buf_size = BOOT_PGT_SIZE; - memset(pgt_data.pgt_buf, 0, pgt_data.pgt_buf_size); + top_level_pgt = early_boot_top_pgt; + if ((p4d_t *)top_level_pgt != (p4d_t *)_pgtable) top_level_pgt = (unsigned long)alloc_pgt_page(&pgt_data); - } } /* @@ -141,8 +41,7 @@ void add_identity_map(unsigned long start, unsigned long size) return; /* Build the mapping. 
*/ - kernel_ident_mapping_init(&mapping_info, (pgd_t *)top_level_pgt, - start, end); + add_identity_map_pgd(start, end, top_level_pgt); } /* diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c index c0d6c560df69..6b3548080d15 100644 --- a/arch/x86/boot/compressed/misc.c +++ b/arch/x86/boot/compressed/misc.c @@ -345,6 +345,8 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap, const unsigned long kernel_total_size = VO__end - VO__text; unsigned long virt_addr = LOAD_PHYSICAL_ADDR; + initialize_pgtable_alloc(); + /* Retain x86 boot parameters pointer passed from startup_32/64. */ boot_params = rmode; diff --git a/arch/x86/boot/compressed/pgtable.h b/arch/x86/boot/compressed/pgtable.h index 6ff7e81b5628..443df2b65fbf 100644 --- a/arch/x86/boot/compressed/pgtable.h +++ b/arch/x86/boot/compressed/pgtable.h @@ -16,5 +16,16 @@ extern unsigned long *trampoline_32bit; extern void trampoline_32bit_src(void *return_ptr); +extern struct alloc_pgt_data pgt_data; + +extern unsigned long early_boot_top_pgt; + +void *alloc_pgt_page(void *context); + +int add_identity_map_pgd(unsigned long pstart, + unsigned long pend, unsigned long pgd); + +void initialize_pgtable_alloc(void); + #endif /* __ASSEMBLER__ */ #endif /* BOOT_COMPRESSED_PAGETABLE_H */ diff --git a/arch/x86/boot/compressed/pgtable_64.c b/arch/x86/boot/compressed/pgtable_64.c index f8debf7aeb4c..cd36cf9e6a5c 100644 --- a/arch/x86/boot/compressed/pgtable_64.c +++ b/arch/x86/boot/compressed/pgtable_64.c @@ -1,9 +1,30 @@ +/* + * Since we're dealing with identity mappings, physical and virtual + * addresses are the same, so override these defines which are ultimately + * used by the headers in misc.h. 
+ */ +#define __pa(x) ((unsigned long)(x)) +#define __va(x) ((void *)((unsigned long)(x))) + +/* No PAGE_TABLE_ISOLATION support needed either: */ +#undef CONFIG_PAGE_TABLE_ISOLATION + +#include "misc.h" +#include "pgtable.h" +#include "../string.h" + #include <linux/efi.h> #include <asm/e820/types.h> #include <asm/processor.h> #include <asm/efi.h> -#include "pgtable.h" -#include "../string.h" + +/* For handling early ident mapping */ +#include <asm/init.h> +#include <asm/pgtable.h> +/* Use the static base for this part of the boot process */ +#undef __PAGE_OFFSET +#define __PAGE_OFFSET __PAGE_OFFSET_BASE +#include "../../mm/ident_map.c" /* * __force_order is used by special_insns.h asm code to force instruction @@ -14,6 +35,28 @@ */ unsigned long __force_order; +/* Used to track our page table allocation area. */ +struct alloc_pgt_data { + unsigned char *pgt_buf; + unsigned long pgt_buf_size; + unsigned long pgt_buf_offset; +}; + +/* Used to track our allocated page tables. */ +struct alloc_pgt_data pgt_data; + +/* Track the first loaded boot page table. */ +unsigned long early_boot_top_pgt; + +phys_addr_t physical_mask = (1ULL << __PHYSICAL_MASK_SHIFT) - 1; + +/* + * Mapping information structure passed to kernel_ident_mapping_init(). + * Due to relocation, pointers must be assigned at run time not build time. + */ +static struct x86_mapping_info mapping_info; + +/* For handling trampoline. */ #define BIOS_START_MIN 0x20000U /* 128K, less than this is insane */ #define BIOS_START_MAX 0x9f000U /* 640K, absolute maximum */ @@ -202,3 +245,87 @@ void cleanup_trampoline(void *pgtable) /* Restore trampoline memory */ memcpy(trampoline_32bit, trampoline_save, TRAMPOLINE_32BIT_SIZE); } + +/* + * Allocates space for a page table entry, using struct alloc_pgt_data + * above. Besides the local callers, this is used as the allocation + * callback in mapping_info below. 
+ */ +void *alloc_pgt_page(void *context) +{ + struct alloc_pgt_data *pages = (struct alloc_pgt_data *)context; + unsigned char *entry; + + /* Validate there is space available for a new page. */ + if (pages->pgt_buf_offset >= pages->pgt_buf_size) { + debug_putstr("out of pgt_buf in " __FILE__ "!?\n"); + debug_putaddr(pages->pgt_buf_offset); + debug_putaddr(pages->pgt_buf_size); + return NULL; + } + + entry = pages->pgt_buf + pages->pgt_buf_offset; + pages->pgt_buf_offset += PAGE_SIZE; + + return entry; +} + +/* Locates and clears a region for update or create page table. */ +void initialize_pgtable_alloc(void) +{ + /* If running as an SEV guest, the encryption mask is required. */ + set_sev_encryption_mask(); + + /* Exclude the encryption mask from __PHYSICAL_MASK */ + physical_mask &= ~sme_me_mask; + + /* Init mapping_info with run-time function/buffer pointers. */ + mapping_info.alloc_pgt_page = alloc_pgt_page; + mapping_info.context = &pgt_data; + mapping_info.page_flag = __PAGE_KERNEL_LARGE_EXEC | sme_me_mask; + mapping_info.kernpg_flag = _KERNPG_TABLE; + + /* + * It should be impossible for this not to already be true, + * but since calling this a second time would rewind the other + * counters, let's just make sure this is reset too. + */ + pgt_data.pgt_buf_offset = 0; + + /* + * If we came here via startup_32(), cr3 will be _pgtable already + * and we must append to the existing area instead of entirely + * overwriting it. + * + * With 5-level paging, we use '_pgtable' to allocate the p4d page + * table, the top-level page table is allocated separately. + * + * p4d_offset(early_boot_top_pgt, 0) would cover both the 4- and 5-level + * cases. On 4-level paging it's equal to 'early_boot_top_pgt'. 
+ */ + + early_boot_top_pgt = read_cr3_pa(); + early_boot_top_pgt = (unsigned long)p4d_offset( + (pgd_t *)early_boot_top_pgt, 0); + if ((p4d_t *)early_boot_top_pgt == (p4d_t *)_pgtable) { + debug_putstr("booted via startup_32()\n"); + pgt_data.pgt_buf = _pgtable + BOOT_INIT_PGT_SIZE; + pgt_data.pgt_buf_size = BOOT_PGT_SIZE - BOOT_INIT_PGT_SIZE; + memset(pgt_data.pgt_buf, 0, pgt_data.pgt_buf_size); + } else { + debug_putstr("booted via startup_64()\n"); + pgt_data.pgt_buf = _pgtable; + pgt_data.pgt_buf_size = BOOT_PGT_SIZE; + memset(pgt_data.pgt_buf, 0, pgt_data.pgt_buf_size); + } +} + +/* + * Helper for mapping extra memory region in very early stage + * before extract and execute the actual kernel + */ +int add_identity_map_pgd(unsigned long pstart, unsigned long pend, + unsigned long pgd) +{ + kernel_ident_mapping_init(&mapping_info, (pgd_t *)pgd, pstart, pend); +} diff --git a/arch/x86/include/asm/boot.h b/arch/x86/include/asm/boot.h index 680c320363db..fb37eb98b65d 100644 --- a/arch/x86/include/asm/boot.h +++ b/arch/x86/include/asm/boot.h @@ -33,6 +33,8 @@ #ifdef CONFIG_X86_64 # define BOOT_STACK_SIZE 0x4000 +/* Reserve one page for possible extra mapping requirement */ +# define BOOT_EXTRA_PGT_SIZE (1*4096) # define BOOT_INIT_PGT_SIZE (6*4096) # ifdef CONFIG_RANDOMIZE_BASE /* @@ -43,12 +45,12 @@ * Total is 19 pages. */ # ifdef CONFIG_X86_VERBOSE_BOOTUP -# define BOOT_PGT_SIZE (19*4096) +# define BOOT_PGT_SIZE ((19 * 4096) + BOOT_EXTRA_PGT_SIZE) # else /* !CONFIG_X86_VERBOSE_BOOTUP */ -# define BOOT_PGT_SIZE (17*4096) +# define BOOT_PGT_SIZE ((17 * 4096) + BOOT_EXTRA_PGT_SIZE) # endif # else /* !CONFIG_RANDOMIZE_BASE */ -# define BOOT_PGT_SIZE BOOT_INIT_PGT_SIZE +# define BOOT_PGT_SIZE (BOOT_INIT_PGT_SIZE + BOOT_EXTRA_PGT_SIZE) # endif #else /* !CONFIG_X86_64 */ -- 2.20.1 ^ permalink raw reply related [flat|nested] 19+ messages in thread
* Re: [RFC PATCH] kexec, x86/boot: map systab region in identity mapping before accessing it 2019-04-19 8:34 ` [RFC PATCH] kexec, x86/boot: map systab region in identity mapping before accessing it Kairui Song @ 2019-04-19 8:58 ` Baoquan He 2019-04-19 9:39 ` Kairui Song 0 siblings, 1 reply; 19+ messages in thread From: Baoquan He @ 2019-04-19 8:58 UTC (permalink / raw) To: Kairui Song Cc: linux-kernel, Borislav Petkov, Junichi Nomura, Dave Young, Chao Fan, x86@kernel.org, kexec@lists.infradead.org On 04/19/19 at 04:34pm, Kairui Song wrote: > /* Locates and clears a region for a new top level page table. */ > void initialize_identity_maps(void) > { > - /* If running as an SEV guest, the encryption mask is required. */ > - set_sev_encryption_mask(); > - > - /* Exclude the encryption mask from __PHYSICAL_MASK */ > - physical_mask &= ~sme_me_mask; > - > - /* Init mapping_info with run-time function/buffer pointers. */ > - mapping_info.alloc_pgt_page = alloc_pgt_page; > - mapping_info.context = &pgt_data; > - mapping_info.page_flag = __PAGE_KERNEL_LARGE_EXEC | sme_me_mask; > - mapping_info.kernpg_flag = _KERNPG_TABLE; > - > - /* > - * It should be impossible for this not to already be true, > - * but since calling this a second time would rewind the other > - * counters, let's just make sure this is reset too. > - */ > - pgt_data.pgt_buf_offset = 0; > - > - /* > - * If we came here via startup_32(), cr3 will be _pgtable already > - * and we must append to the existing area instead of entirely > - * overwriting it. > - * > - * With 5-level paging, we use '_pgtable' to allocate the p4d page table, > - * the top-level page table is allocated separately. > - * > - * p4d_offset(top_level_pgt, 0) would cover both the 4- and 5-level > - * cases. On 4-level paging it's equal to 'top_level_pgt'. 
> - */ > - top_level_pgt = read_cr3_pa(); > - if (p4d_offset((pgd_t *)top_level_pgt, 0) == (p4d_t *)_pgtable) { > - debug_putstr("booted via startup_32()\n"); > - pgt_data.pgt_buf = _pgtable + BOOT_INIT_PGT_SIZE; > - pgt_data.pgt_buf_size = BOOT_PGT_SIZE - BOOT_INIT_PGT_SIZE; > - memset(pgt_data.pgt_buf, 0, pgt_data.pgt_buf_size); > - } else { > - debug_putstr("booted via startup_64()\n"); > - pgt_data.pgt_buf = _pgtable; > - pgt_data.pgt_buf_size = BOOT_PGT_SIZE; > - memset(pgt_data.pgt_buf, 0, pgt_data.pgt_buf_size); > + top_level_pgt = early_boot_top_pgt; > + if ((p4d_t *)top_level_pgt != (p4d_t *)_pgtable) > top_level_pgt = (unsigned long)alloc_pgt_page(&pgt_data); Kairui, will you make a patchset to include these changes separately later on? I don't get the purposes of code changes. E.g here, I don't know why you introduce a new variable early_boot_top_pgt, and allocate the page table, even though they have been done in the old initialize_identity_maps(). Thanks Baoquan > - } > } > > /* > @@ -141,8 +41,7 @@ void add_identity_map(unsigned long start, unsigned long size) > return; > > /* Build the mapping. */ > - kernel_ident_mapping_init(&mapping_info, (pgd_t *)top_level_pgt, > - start, end); > + add_identity_map_pgd(start, end, top_level_pgt); > } > > /* > diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c > index c0d6c560df69..6b3548080d15 100644 > --- a/arch/x86/boot/compressed/misc.c > +++ b/arch/x86/boot/compressed/misc.c > @@ -345,6 +345,8 @@ asmlinkage __visible void *extract_kernel(void *rmode, memptr heap, > const unsigned long kernel_total_size = VO__end - VO__text; > unsigned long virt_addr = LOAD_PHYSICAL_ADDR; > > + initialize_pgtable_alloc(); > + > /* Retain x86 boot parameters pointer passed from startup_32/64. 
*/ > boot_params = rmode; > > diff --git a/arch/x86/boot/compressed/pgtable.h b/arch/x86/boot/compressed/pgtable.h > index 6ff7e81b5628..443df2b65fbf 100644 > --- a/arch/x86/boot/compressed/pgtable.h > +++ b/arch/x86/boot/compressed/pgtable.h > @@ -16,5 +16,16 @@ extern unsigned long *trampoline_32bit; > > extern void trampoline_32bit_src(void *return_ptr); > > +extern struct alloc_pgt_data pgt_data; > + > +extern unsigned long early_boot_top_pgt; > + > +void *alloc_pgt_page(void *context); > + > +int add_identity_map_pgd(unsigned long pstart, > + unsigned long pend, unsigned long pgd); > + > +void initialize_pgtable_alloc(void); > + > #endif /* __ASSEMBLER__ */ > #endif /* BOOT_COMPRESSED_PAGETABLE_H */ > diff --git a/arch/x86/boot/compressed/pgtable_64.c b/arch/x86/boot/compressed/pgtable_64.c > index f8debf7aeb4c..cd36cf9e6a5c 100644 > --- a/arch/x86/boot/compressed/pgtable_64.c > +++ b/arch/x86/boot/compressed/pgtable_64.c > @@ -1,9 +1,30 @@ > +/* > + * Since we're dealing with identity mappings, physical and virtual > + * addresses are the same, so override these defines which are ultimately > + * used by the headers in misc.h. 
> + */ > +#define __pa(x) ((unsigned long)(x)) > +#define __va(x) ((void *)((unsigned long)(x))) > + > +/* No PAGE_TABLE_ISOLATION support needed either: */ > +#undef CONFIG_PAGE_TABLE_ISOLATION > + > +#include "misc.h" > +#include "pgtable.h" > +#include "../string.h" > + > #include <linux/efi.h> > #include <asm/e820/types.h> > #include <asm/processor.h> > #include <asm/efi.h> > -#include "pgtable.h" > -#include "../string.h" > + > +/* For handling early ident mapping */ > +#include <asm/init.h> > +#include <asm/pgtable.h> > +/* Use the static base for this part of the boot process */ > +#undef __PAGE_OFFSET > +#define __PAGE_OFFSET __PAGE_OFFSET_BASE > +#include "../../mm/ident_map.c" > > /* > * __force_order is used by special_insns.h asm code to force instruction > @@ -14,6 +35,28 @@ > */ > unsigned long __force_order; > > +/* Used to track our page table allocation area. */ > +struct alloc_pgt_data { > + unsigned char *pgt_buf; > + unsigned long pgt_buf_size; > + unsigned long pgt_buf_offset; > +}; > + > +/* Used to track our allocated page tables. */ > +struct alloc_pgt_data pgt_data; > + > +/* Track the first loaded boot page table. */ > +unsigned long early_boot_top_pgt; > + > +phys_addr_t physical_mask = (1ULL << __PHYSICAL_MASK_SHIFT) - 1; > + > +/* > + * Mapping information structure passed to kernel_ident_mapping_init(). > + * Due to relocation, pointers must be assigned at run time not build time. > + */ > +static struct x86_mapping_info mapping_info; > + > +/* For handling trampoline. */ > #define BIOS_START_MIN 0x20000U /* 128K, less than this is insane */ > #define BIOS_START_MAX 0x9f000U /* 640K, absolute maximum */ > > @@ -202,3 +245,87 @@ void cleanup_trampoline(void *pgtable) > /* Restore trampoline memory */ > memcpy(trampoline_32bit, trampoline_save, TRAMPOLINE_32BIT_SIZE); > } > + > +/* > + * Allocates space for a page table entry, using struct alloc_pgt_data > + * above. 
Besides the local callers, this is used as the allocation > + * callback in mapping_info below. > + */ > +void *alloc_pgt_page(void *context) > +{ > + struct alloc_pgt_data *pages = (struct alloc_pgt_data *)context; > + unsigned char *entry; > + > + /* Validate there is space available for a new page. */ > + if (pages->pgt_buf_offset >= pages->pgt_buf_size) { > + debug_putstr("out of pgt_buf in " __FILE__ "!?\n"); > + debug_putaddr(pages->pgt_buf_offset); > + debug_putaddr(pages->pgt_buf_size); > + return NULL; > + } > + > + entry = pages->pgt_buf + pages->pgt_buf_offset; > + pages->pgt_buf_offset += PAGE_SIZE; > + > + return entry; > +} > + > +/* Locates and clears a region for update or create page table. */ > +void initialize_pgtable_alloc(void) > +{ > + /* If running as an SEV guest, the encryption mask is required. */ > + set_sev_encryption_mask(); > + > + /* Exclude the encryption mask from __PHYSICAL_MASK */ > + physical_mask &= ~sme_me_mask; > + > + /* Init mapping_info with run-time function/buffer pointers. */ > + mapping_info.alloc_pgt_page = alloc_pgt_page; > + mapping_info.context = &pgt_data; > + mapping_info.page_flag = __PAGE_KERNEL_LARGE_EXEC | sme_me_mask; > + mapping_info.kernpg_flag = _KERNPG_TABLE; > + > + /* > + * It should be impossible for this not to already be true, > + * but since calling this a second time would rewind the other > + * counters, let's just make sure this is reset too. > + */ > + pgt_data.pgt_buf_offset = 0; > + > + /* > + * If we came here via startup_32(), cr3 will be _pgtable already > + * and we must append to the existing area instead of entirely > + * overwriting it. > + * > + * With 5-level paging, we use '_pgtable' to allocate the p4d page > + * table, the top-level page table is allocated separately. > + * > + * p4d_offset(early_boot_top_pgt, 0) would cover both the 4- and 5-level > + * cases. On 4-level paging it's equal to 'early_boot_top_pgt'. 
> + */ > + > + early_boot_top_pgt = read_cr3_pa(); > + early_boot_top_pgt = (unsigned long)p4d_offset( > + (pgd_t *)early_boot_top_pgt, 0); > + if ((p4d_t *)early_boot_top_pgt == (p4d_t *)_pgtable) { > + debug_putstr("booted via startup_32()\n"); > + pgt_data.pgt_buf = _pgtable + BOOT_INIT_PGT_SIZE; > + pgt_data.pgt_buf_size = BOOT_PGT_SIZE - BOOT_INIT_PGT_SIZE; > + memset(pgt_data.pgt_buf, 0, pgt_data.pgt_buf_size); > + } else { > + debug_putstr("booted via startup_64()\n"); > + pgt_data.pgt_buf = _pgtable; > + pgt_data.pgt_buf_size = BOOT_PGT_SIZE; > + memset(pgt_data.pgt_buf, 0, pgt_data.pgt_buf_size); > + } > +} > + > +/* > + * Helper for mapping extra memory region in very early stage > + * before extract and execute the actual kernel > + */ > +int add_identity_map_pgd(unsigned long pstart, unsigned long pend, > + unsigned long pgd) > +{ > + kernel_ident_mapping_init(&mapping_info, (pgd_t *)pgd, pstart, pend); > +} > diff --git a/arch/x86/include/asm/boot.h b/arch/x86/include/asm/boot.h > index 680c320363db..fb37eb98b65d 100644 > --- a/arch/x86/include/asm/boot.h > +++ b/arch/x86/include/asm/boot.h > @@ -33,6 +33,8 @@ > #ifdef CONFIG_X86_64 > # define BOOT_STACK_SIZE 0x4000 > > +/* Reserve one page for possible extra mapping requirement */ > +# define BOOT_EXTRA_PGT_SIZE (1*4096) > # define BOOT_INIT_PGT_SIZE (6*4096) > # ifdef CONFIG_RANDOMIZE_BASE > /* > @@ -43,12 +45,12 @@ > * Total is 19 pages. > */ > # ifdef CONFIG_X86_VERBOSE_BOOTUP > -# define BOOT_PGT_SIZE (19*4096) > +# define BOOT_PGT_SIZE ((19 * 4096) + BOOT_EXTRA_PGT_SIZE) > # else /* !CONFIG_X86_VERBOSE_BOOTUP */ > -# define BOOT_PGT_SIZE (17*4096) > +# define BOOT_PGT_SIZE ((17 * 4096) + BOOT_EXTRA_PGT_SIZE) > # endif > # else /* !CONFIG_RANDOMIZE_BASE */ > -# define BOOT_PGT_SIZE BOOT_INIT_PGT_SIZE > +# define BOOT_PGT_SIZE (BOOT_INIT_PGT_SIZE + BOOT_EXTRA_PGT_SIZE) > # endif > > #else /* !CONFIG_X86_64 */ > -- > 2.20.1 > ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [RFC PATCH] kexec, x86/boot: map systab region in identity mapping before accessing it 2019-04-19 8:58 ` Baoquan He @ 2019-04-19 9:39 ` Kairui Song 0 siblings, 0 replies; 19+ messages in thread From: Kairui Song @ 2019-04-19 9:39 UTC (permalink / raw) To: Baoquan He Cc: Linux Kernel Mailing List, Borislav Petkov, Junichi Nomura, Dave Young, Chao Fan, x86@kernel.org, kexec@lists.infradead.org

On Fri, Apr 19, 2019 at 4:58 PM Baoquan He <bhe@redhat.com> wrote:
>
> On 04/19/19 at 04:34pm, Kairui Song wrote:
> > /* Locates and clears a region for a new top level page table. */
> > void initialize_identity_maps(void)
> > {
> > -	/* If running as an SEV guest, the encryption mask is required. */
> > -	set_sev_encryption_mask();
> > -
> > -	/* Exclude the encryption mask from __PHYSICAL_MASK */
> > -	physical_mask &= ~sme_me_mask;
> > -
> > -	/* Init mapping_info with run-time function/buffer pointers. */
> > -	mapping_info.alloc_pgt_page = alloc_pgt_page;
> > -	mapping_info.context = &pgt_data;
> > -	mapping_info.page_flag = __PAGE_KERNEL_LARGE_EXEC | sme_me_mask;
> > -	mapping_info.kernpg_flag = _KERNPG_TABLE;
> > -
> > -	/*
> > -	 * It should be impossible for this not to already be true,
> > -	 * but since calling this a second time would rewind the other
> > -	 * counters, let's just make sure this is reset too.
> > -	 */
> > -	pgt_data.pgt_buf_offset = 0;
> > -
> > -	/*
> > -	 * If we came here via startup_32(), cr3 will be _pgtable already
> > -	 * and we must append to the existing area instead of entirely
> > -	 * overwriting it.
> > -	 *
> > -	 * With 5-level paging, we use '_pgtable' to allocate the p4d page table,
> > -	 * the top-level page table is allocated separately.
> > -	 *
> > -	 * p4d_offset(top_level_pgt, 0) would cover both the 4- and 5-level
> > -	 * cases. On 4-level paging it's equal to 'top_level_pgt'.
> > -	 */
> > -	top_level_pgt = read_cr3_pa();
> > -	if (p4d_offset((pgd_t *)top_level_pgt, 0) == (p4d_t *)_pgtable) {
> > -		debug_putstr("booted via startup_32()\n");
> > -		pgt_data.pgt_buf = _pgtable + BOOT_INIT_PGT_SIZE;
> > -		pgt_data.pgt_buf_size = BOOT_PGT_SIZE - BOOT_INIT_PGT_SIZE;
> > -		memset(pgt_data.pgt_buf, 0, pgt_data.pgt_buf_size);
> > -	} else {
> > -		debug_putstr("booted via startup_64()\n");
> > -		pgt_data.pgt_buf = _pgtable;
> > -		pgt_data.pgt_buf_size = BOOT_PGT_SIZE;
> > -		memset(pgt_data.pgt_buf, 0, pgt_data.pgt_buf_size);
> > +	top_level_pgt = early_boot_top_pgt;
> > +	if ((p4d_t *)top_level_pgt != (p4d_t *)_pgtable)
> > 		top_level_pgt = (unsigned long)alloc_pgt_page(&pgt_data);
>
> Kairui, will you make a patchset to include these changes separately
> later on? I don't get the purpose of these code changes. E.g. here, I
> don't know why you introduce a new variable early_boot_top_pgt, and
> allocate the page table, even though they have been done in the old
> initialize_identity_maps().
>
> Thanks
> Baoquan
>

OK, right, it's not a good idea to mix these things together. I'll resend
the patch and send the cleanup separately. Without the cleanup the fix may
bring in some extra burden with certain kernel configs, but that should be
OK for the fix.

--
Best Regards,
Kairui Song
^ permalink raw reply [flat|nested] 19+ messages in thread
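Baoquan's question is why the refactor introduces early_boot_top_pgt. One reading of the quoted hunks: initialize_pgtable_alloc() records the live top-level table once, and initialize_identity_maps() later reuses it when it is _pgtable (the startup_32() path) or otherwise takes a fresh page from the allocator. The user-space sketch below models that split under stated assumptions: 4-level paging (so p4d_offset(pgd, 0) is the pgd itself), a fake CR3 value passed in place of read_cr3_pa(), and made-up page counts. pgtable_alloc_setup(), identity_maps_setup() and loader_pgt are hypothetical names, not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE   4096UL
#define INIT_PAGES  2 /* hypothetical stand-in for BOOT_INIT_PGT_SIZE / PAGE_SIZE */
#define TOTAL_PAGES 8 /* hypothetical stand-in for BOOT_PGT_SIZE / PAGE_SIZE */

/* Stand-in for the _pgtable area in the decompressor image. */
static _Alignas(4096) unsigned char pgtable[TOTAL_PAGES * PAGE_SIZE];
/* Stand-in for a top-level table built by someone else (e.g. a loader). */
static _Alignas(4096) unsigned char loader_pgt[PAGE_SIZE];

static unsigned char *pgt_buf;
static unsigned long pgt_buf_size, pgt_buf_offset;
static uintptr_t early_boot_top_pgt; /* recorded once, in step 1 */
static uintptr_t top_level_pgt;      /* table that new mappings go into */

/* Same bump-allocation scheme as the quoted alloc_pgt_page(). */
static void *alloc_pgt_page(void)
{
	void *entry;

	assert(pgt_buf_offset % PAGE_SIZE == 0);
	if (pgt_buf_offset >= pgt_buf_size)
		return NULL;
	entry = pgt_buf + pgt_buf_offset;
	pgt_buf_offset += PAGE_SIZE;
	return entry;
}

/* Step 1: record the live top-level table and set up the allocator. */
static void pgtable_alloc_setup(uintptr_t cr3)
{
	early_boot_top_pgt = cr3;
	pgt_buf_offset = 0;
	if (early_boot_top_pgt == (uintptr_t)pgtable) {
		/* startup_32(): the first pages are live tables, keep them. */
		pgt_buf = pgtable + INIT_PAGES * PAGE_SIZE;
		pgt_buf_size = (TOTAL_PAGES - INIT_PAGES) * PAGE_SIZE;
	} else {
		/* startup_64(): the whole area is free for allocation. */
		pgt_buf = pgtable;
		pgt_buf_size = TOTAL_PAGES * PAGE_SIZE;
	}
}

/* Step 2: reuse the recorded table if it is ours, else start a new one. */
static void identity_maps_setup(void)
{
	top_level_pgt = early_boot_top_pgt;
	if (top_level_pgt != (uintptr_t)pgtable)
		top_level_pgt = (uintptr_t)alloc_pgt_page();
}
```

In this model the CR3 check happens once, at allocator setup; step 2 only decides whether a new top-level page is needed, which is why the recorded value has to outlive the first function.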
end of thread, other threads:[~2019-04-26 10:16 UTC | newest]

Thread overview: 19+ messages
2019-04-19 10:17 [RFC PATCH] kexec, x86/boot: map systab region in identity mapping before accessing it Borislav Petkov
2019-04-19 10:50 ` Baoquan He
2019-04-19 10:55 ` Baoquan He
2019-04-19 11:20 ` Kairui Song
2019-04-19 11:34 ` Borislav Petkov
2019-04-19 11:50 ` Kairui Song
2019-04-19 14:19 ` [PATCH] x86/boot: Disable RSDP parsing temporarily Borislav Petkov
2019-04-22 9:46 ` [tip:x86/urgent] " tip-bot for Borislav Petkov
2019-04-19 11:28 ` [RFC PATCH] kexec, x86/boot: map systab region in identity mapping before accessing it Borislav Petkov
2019-04-19 11:36 ` Borislav Petkov
2019-04-22 14:33 ` Baoquan He
2019-04-22 15:17 ` Borislav Petkov
2019-04-26 9:51 ` Baoquan He
2019-04-26 9:58 ` Borislav Petkov
2019-04-26 10:16 ` Baoquan He
2019-04-19 11:44 ` Baoquan He
-- strict thread matches above, loose matches on Subject: below --
2019-04-16 9:52 [PATCH] x86/boot: Use efi_setup_data for searching RSDP on kexec-ed kernels Borislav Petkov
2019-04-19 8:34 ` [RFC PATCH] kexec, x86/boot: map systab region in identity mapping before accessing it Kairui Song
2019-04-19 8:58 ` Baoquan He
2019-04-19 9:39 ` Kairui Song