* [PATCH v2 0/2] x86/mm: unmapping and marking-as-I/O in arch_init_memory()
@ 2025-08-11 10:48 Jan Beulich
2025-08-11 10:49 ` [PATCH v2 1/2] x86/mkelf32: pad load segment to 2Mb boundary Jan Beulich
2025-08-11 10:50 ` [PATCH v2 2/2] x86/mm: drop unmapping from marking-as-I/O in arch_init_memory() Jan Beulich
0 siblings, 2 replies; 8+ messages in thread
From: Jan Beulich @ 2025-08-11 10:48 UTC (permalink / raw)
To: xen-devel@lists.xenproject.org; +Cc: Andrew Cooper, Roger Pau Monné
What we unmap there are mappings we should never have established in
the first place. Arrange for us to only ever map RAM, and then drop the
unmapping code, which was flawed anyway.
Nothing with a similar effect as patch 1 needs doing for xen.efi: Prior
to GNU binutils commit a844415db878 ("bfd/PE: correct SizeOfImage
calculation") too large a size was calculated. With that change in place,
image size is properly rounded up to a multiple of 2Mb.
1: mkelf32: pad load segment to 2Mb boundary
2: mm: drop unmapping from marking-as-I/O in arch_init_memory()
Jan

* [PATCH v2 1/2] x86/mkelf32: pad load segment to 2Mb boundary
2025-08-11 10:48 [PATCH v2 0/2] x86/mm: unmapping and marking-as-I/O in arch_init_memory() Jan Beulich
@ 2025-08-11 10:49 ` Jan Beulich
2025-08-12 16:18 ` Roger Pau Monné
2025-08-11 10:50 ` [PATCH v2 2/2] x86/mm: drop unmapping from marking-as-I/O in arch_init_memory() Jan Beulich
1 sibling, 1 reply; 8+ messages in thread
From: Jan Beulich @ 2025-08-11 10:49 UTC (permalink / raw)
To: xen-devel@lists.xenproject.org; +Cc: Andrew Cooper, Roger Pau Monné

In order to legitimately set up initial mappings past _end[], we need
to make sure that the entire mapped range is inside a RAM region.
Therefore we need to inform the bootloader (or alike) that our allocated
size is larger than just the next SECTION_ALIGN-ed boundary past _end[].

This allows dropping a command line option from the tool, which was
introduced to work around a supposed linker bug, when the problem was
really Xen's.

While adjusting adjacent code, correct the argc check to also cover the
case correctly when --notes was passed.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
There's no good Fixes: tag, I don't think, as in theory the issue could
even have happened when we still required to be loaded at a fixed
physical address (1Mb originally, later 2Mb), and when we statically
mapped the low 16Mb. If we assumed such can't happen below 16Mb, these
two should be added:
Fixes: e4dd91ea85a3 ("x86: Ensure RAM holes really are not mapped in Xen's ongoing 1:1 physmap")
Fixes: 7cd7f2f5e116 ("x86/boot: Remove the preconstructed low 16M superpage mappings")
---
v2: New.

--- a/xen/arch/x86/Makefile
+++ b/xen/arch/x86/Makefile
@@ -130,8 +130,7 @@ orphan-handling-$(call ld-option,--orpha
 
 $(TARGET): TMP = $(dot-target).elf32
 $(TARGET): $(TARGET)-syms $(efi-y) $(obj)/boot/mkelf32
-	$(obj)/boot/mkelf32 $(notes_phdrs) $(TARGET)-syms $(TMP) $(XEN_IMG_OFFSET) \
-	               `$(NM) $(TARGET)-syms | sed -ne 's/^\([^ ]*\) . __2M_rwdata_end$$/0x\1/p'`
+	$(obj)/boot/mkelf32 $(notes_phdrs) $(TARGET)-syms $(TMP) $(XEN_IMG_OFFSET)
 	od -t x4 -N 8192 $(TMP) | grep 1badb002 > /dev/null || \
 		{ echo "No Multiboot1 header found" >&2; false; }
 	od -t x4 -N 32768 $(TMP) | grep e85250d6 > /dev/null || \
--- a/xen/arch/x86/boot/mkelf32.c
+++ b/xen/arch/x86/boot/mkelf32.c
@@ -248,7 +248,6 @@ static void do_read(int fd, void *data,
 
 int main(int argc, char **argv)
 {
-    uint64_t final_exec_addr;
     uint32_t loadbase, dat_siz, mem_siz, note_base, note_sz, offset;
     char *inimage, *outimage;
     int infd, outfd;
@@ -261,22 +260,24 @@ int main(int argc, char **argv)
     Elf64_Ehdr in64_ehdr;
     Elf64_Phdr in64_phdr;
 
-    if ( argc < 5 )
+    if ( argc < 4 )
     {
+ help:
         fprintf(stderr, "Usage: mkelf32 [--notes] <in-image> <out-image> "
-                "<load-base> <final-exec-addr>\n");
+                "<load-base>\n");
         return 1;
     }
 
     if ( !strcmp(argv[1], "--notes") )
     {
+        if ( argc < 5 )
+            goto help;
         i = 2;
         num_phdrs = 2;
     }
     inimage = argv[i++];
     outimage = argv[i++];
     loadbase = strtoul(argv[i++], NULL, 16);
-    final_exec_addr = strtoull(argv[i++], NULL, 16);
 
     infd = open(inimage, O_RDONLY);
     if ( infd == -1 )
@@ -339,9 +340,12 @@ int main(int argc, char **argv)
     (void)lseek(infd, in64_phdr.p_offset, SEEK_SET);
     dat_siz = (uint32_t)in64_phdr.p_filesz;
 
-    /* Do not use p_memsz: it does not include BSS alignment padding. */
-    /*mem_siz = (uint32_t)in64_phdr.p_memsz;*/
-    mem_siz = (uint32_t)(final_exec_addr - in64_phdr.p_vaddr);
+    /*
+     * We don't pad .bss in the linker script, but during early boot we map
+     * the Xen image using 2M pages. To avoid running into adjacent non-RAM
+     * regions, pad the segment to the next 2M boundary.
+     */
+    mem_siz = ((uint32_t)in64_phdr.p_memsz + (1U << 21) - 1) & (-1U << 21);
 
     note_sz = note_base = offset = 0;
     if ( num_phdrs > 1 )
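
The padding described in the comment above is the usual power-of-two round-up. A minimal standalone sketch follows, assuming a 32-bit size as mkelf32 uses; the helper name `round_up_pow2` and the sample values are illustrative and not part of mkelf32.c.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not in mkelf32.c): round 'size' up to the next
 * multiple of (1 << shift).  Works only because the alignment is a power
 * of two.  A shift of 21 gives 2 MiB granularity, a shift of 20 gives
 * 1 MiB.  Sizes close to UINT32_MAX would wrap; that is not handled here.
 */
static uint32_t round_up_pow2(uint32_t size, unsigned int shift)
{
    uint32_t align;

    assert(shift < 32);
    align = UINT32_C(1) << shift;

    return (size + align - 1) & ~(align - 1);
}
```

With a shift of 21, a segment whose p_memsz is 0x2a5000 would be reported as 0x400000, while a size that is already a multiple of 2 MiB is left unchanged.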

* Re: [PATCH v2 1/2] x86/mkelf32: pad load segment to 2Mb boundary
2025-08-11 10:49 ` [PATCH v2 1/2] x86/mkelf32: pad load segment to 2Mb boundary Jan Beulich
@ 2025-08-12 16:18 ` Roger Pau Monné
2025-08-14 7:02 ` Jan Beulich
0 siblings, 1 reply; 8+ messages in thread
From: Roger Pau Monné @ 2025-08-12 16:18 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel@lists.xenproject.org, Andrew Cooper

On Mon, Aug 11, 2025 at 12:49:57PM +0200, Jan Beulich wrote:
> In order to legitimately set up initial mappings past _end[], we need
> to make sure that the entire mapped range is inside a RAM region.
> Therefore we need to inform the bootloader (or alike) that our allocated
> size is larger than just the next SECTION_ALIGN-ed boundary past _end[].
>
> This allows dropping a command line option from the tool, which was
> introduced to work around a supposed linker bug, when the problem was
> really Xen's.
>
> While adjusting adjacent code, correct the argc check to also cover the
> case correctly when --notes was passed.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> ---
> There's no good Fixes: tag, I don't think, as in theory the issue could
> even have happened when we still required to be loaded at a fixed
> physical address (1Mb originally, later 2Mb), and when we statically
> mapped the low 16Mb. If we assumed such can't happen below 16Mb, these
> two should be added:
> Fixes: e4dd91ea85a3 ("x86: Ensure RAM holes really are not mapped in Xen's ongoing 1:1 physmap")
> Fixes: 7cd7f2f5e116 ("x86/boot: Remove the preconstructed low 16M superpage mappings")
> ---
> v2: New.
>
> --- a/xen/arch/x86/Makefile
> +++ b/xen/arch/x86/Makefile
> @@ -130,8 +130,7 @@ orphan-handling-$(call ld-option,--orpha
>
>  $(TARGET): TMP = $(dot-target).elf32
>  $(TARGET): $(TARGET)-syms $(efi-y) $(obj)/boot/mkelf32
> -	$(obj)/boot/mkelf32 $(notes_phdrs) $(TARGET)-syms $(TMP) $(XEN_IMG_OFFSET) \
> -	               `$(NM) $(TARGET)-syms | sed -ne 's/^\([^ ]*\) . __2M_rwdata_end$$/0x\1/p'`
> +	$(obj)/boot/mkelf32 $(notes_phdrs) $(TARGET)-syms $(TMP) $(XEN_IMG_OFFSET)
>  	od -t x4 -N 8192 $(TMP) | grep 1badb002 > /dev/null || \
>  		{ echo "No Multiboot1 header found" >&2; false; }
>  	od -t x4 -N 32768 $(TMP) | grep e85250d6 > /dev/null || \
> --- a/xen/arch/x86/boot/mkelf32.c
> +++ b/xen/arch/x86/boot/mkelf32.c
> @@ -248,7 +248,6 @@ static void do_read(int fd, void *data,
>
>  int main(int argc, char **argv)
>  {
> -    uint64_t final_exec_addr;
>      uint32_t loadbase, dat_siz, mem_siz, note_base, note_sz, offset;
>      char *inimage, *outimage;
>      int infd, outfd;
> @@ -261,22 +260,24 @@ int main(int argc, char **argv)
>      Elf64_Ehdr in64_ehdr;
>      Elf64_Phdr in64_phdr;
>
> -    if ( argc < 5 )
> +    if ( argc < 4 )
>      {
> + help:
>          fprintf(stderr, "Usage: mkelf32 [--notes] <in-image> <out-image> "
> -                "<load-base> <final-exec-addr>\n");
> +                "<load-base>\n");
>          return 1;
>      }
>
>      if ( !strcmp(argv[1], "--notes") )
>      {
> +        if ( argc < 5 )
> +            goto help;
>          i = 2;
>          num_phdrs = 2;
>      }
>      inimage = argv[i++];
>      outimage = argv[i++];
>      loadbase = strtoul(argv[i++], NULL, 16);
> -    final_exec_addr = strtoull(argv[i++], NULL, 16);
>
>      infd = open(inimage, O_RDONLY);
>      if ( infd == -1 )
> @@ -339,9 +340,12 @@ int main(int argc, char **argv)
>      (void)lseek(infd, in64_phdr.p_offset, SEEK_SET);
>      dat_siz = (uint32_t)in64_phdr.p_filesz;
>
> -    /* Do not use p_memsz: it does not include BSS alignment padding. */
> -    /*mem_siz = (uint32_t)in64_phdr.p_memsz;*/
> -    mem_siz = (uint32_t)(final_exec_addr - in64_phdr.p_vaddr);
> +    /*
> +     * We don't pad .bss in the linker script, but during early boot we map
> +     * the Xen image using 2M pages. To avoid running into adjacent non-RAM
> +     * regions, pad the segment to the next 2M boundary.

Won't it be easier to pad in the linker script? We could still have
__bss_end before the padding, so that initialization isn't done to the
extra padding area. Otherwise it would be helpful to mention why the
padding must be done here (opposed to being done in the linker
script).

Thanks, Roger.

* Re: [PATCH v2 1/2] x86/mkelf32: pad load segment to 2Mb boundary
2025-08-12 16:18 ` Roger Pau Monné
@ 2025-08-14 7:02 ` Jan Beulich
2025-08-14 14:04 ` Roger Pau Monné
0 siblings, 1 reply; 8+ messages in thread
From: Jan Beulich @ 2025-08-14 7:02 UTC (permalink / raw)
To: Roger Pau Monné; +Cc: xen-devel@lists.xenproject.org, Andrew Cooper

On 12.08.2025 18:18, Roger Pau Monné wrote:
> On Mon, Aug 11, 2025 at 12:49:57PM +0200, Jan Beulich wrote:
>> @@ -339,9 +340,12 @@ int main(int argc, char **argv)
>>      (void)lseek(infd, in64_phdr.p_offset, SEEK_SET);
>>      dat_siz = (uint32_t)in64_phdr.p_filesz;
>>
>> -    /* Do not use p_memsz: it does not include BSS alignment padding. */
>> -    /*mem_siz = (uint32_t)in64_phdr.p_memsz;*/
>> -    mem_siz = (uint32_t)(final_exec_addr - in64_phdr.p_vaddr);
>> +    /*
>> +     * We don't pad .bss in the linker script, but during early boot we map
>> +     * the Xen image using 2M pages. To avoid running into adjacent non-RAM
>> +     * regions, pad the segment to the next 2M boundary.
>
> Won't it be easier to pad in the linker script? We could still have
> __bss_end before the padding, so that initialization isn't done to the
> extra padding area. Otherwise it would be helpful to mention why the
> padding must be done here (opposed to being done in the linker
> script).

The way the linker script currently is written doesn't lend itself to do
the padding there: It would either mean to introduce an artificial
padding section (which I'd dislike), or it would result in _end[] and
__2M_rwdata_end[] also moving, which pretty clearly we don't want. Maybe
there are other options that I simply don't see.

A further complication would be xen.efi's .reloc, which we don't want to
needlessly move either. That may be coverable by pre-processor
conditionals, but I wanted to mention the aspect nevertheless.

Jan

* Re: [PATCH v2 1/2] x86/mkelf32: pad load segment to 2Mb boundary
2025-08-14 7:02 ` Jan Beulich
@ 2025-08-14 14:04 ` Roger Pau Monné
0 siblings, 0 replies; 8+ messages in thread
From: Roger Pau Monné @ 2025-08-14 14:04 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel@lists.xenproject.org, Andrew Cooper

On Thu, Aug 14, 2025 at 09:02:35AM +0200, Jan Beulich wrote:
> On 12.08.2025 18:18, Roger Pau Monné wrote:
> > On Mon, Aug 11, 2025 at 12:49:57PM +0200, Jan Beulich wrote:
> >> @@ -339,9 +340,12 @@ int main(int argc, char **argv)
> >>      (void)lseek(infd, in64_phdr.p_offset, SEEK_SET);
> >>      dat_siz = (uint32_t)in64_phdr.p_filesz;
> >>
> >> -    /* Do not use p_memsz: it does not include BSS alignment padding. */
> >> -    /*mem_siz = (uint32_t)in64_phdr.p_memsz;*/
> >> -    mem_siz = (uint32_t)(final_exec_addr - in64_phdr.p_vaddr);
> >> +    /*
> >> +     * We don't pad .bss in the linker script, but during early boot we map
> >> +     * the Xen image using 2M pages. To avoid running into adjacent non-RAM
> >> +     * regions, pad the segment to the next 2M boundary.
> >
> > Won't it be easier to pad in the linker script? We could still have
> > __bss_end before the padding, so that initialization isn't done to the
> > extra padding area. Otherwise it would be helpful to mention why the
> > padding must be done here (opposed to being done in the linker
> > script).
>
> The way the linker script currently is written doesn't lend itself to do
> the padding there: It would either mean to introduce an artificial
> padding section (which I'd dislike), or it would result in _end[] and
> __2M_rwdata_end[] also moving, which pretty clearly we don't want. Maybe
> there are other options that I simply don't see.

We could move both _end and __2M_rwdata_end inside the .bss section,
but that's also ugly IMO. I would probably prefer the extra padding
section.

> A further complication would be xen.efi's .reloc, which we don't want to
> needlessly move either. That may be coverable by pre-processor
> conditionals, but I wanted to mention the aspect nevertheless.

Yeah, we could make the extra padding section depend on pre-processor
checks.

I think I would prefer the usage of such extra section rather than
mangling the elf program headers afterwards, but since we are already
doing it:

Acked-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks, Roger.

* [PATCH v2 2/2] x86/mm: drop unmapping from marking-as-I/O in arch_init_memory()
2025-08-11 10:48 [PATCH v2 0/2] x86/mm: unmapping and marking-as-I/O in arch_init_memory() Jan Beulich
2025-08-11 10:49 ` [PATCH v2 1/2] x86/mkelf32: pad load segment to 2Mb boundary Jan Beulich
@ 2025-08-11 10:50 ` Jan Beulich
2025-08-12 16:30 ` Roger Pau Monné
1 sibling, 1 reply; 8+ messages in thread
From: Jan Beulich @ 2025-08-11 10:50 UTC (permalink / raw)
To: xen-devel@lists.xenproject.org; +Cc: Andrew Cooper, Roger Pau Monné

The unmapping part would have wanted to cover UNUSABLE regions as well,
and it would now have been necessary for space outside the low 16Mb
(wherever Xen is placed). However, with everything up to the next 2Mb
boundary now properly backed by RAM, we don't need to unmap anything
anymore: Space up to __2M_rwdata_end[] is properly reserved, whereas
space past that mark (up to the next 2Mb boundary) is ordinary RAM.

While there, limit the scopes of involved variables.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
v2: Drop unmapping code altogether.

--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -275,8 +275,6 @@ static void __init assign_io_page(struct
 
 void __init arch_init_memory(void)
 {
-    unsigned long i, pfn, rstart_pfn, rend_pfn, iostart_pfn, ioend_pfn;
-
     /*
      * Basic guest-accessible flags:
      *   PRESENT, R/W, USER, A/D, AVAIL[0,1,2], AVAIL_HIGH, NX (if available).
@@ -292,12 +290,17 @@ void __init arch_init_memory(void)
      * case the low 1MB.
      */
     BUG_ON(pvh_boot && trampoline_phys != 0x1000);
-    for ( i = 0; i < 0x100; i++ )
+    for ( unsigned int i = 0; i < MB(1) >> PAGE_SHIFT; i++ )
         assign_io_page(mfn_to_page(_mfn(i)));
 
-    /* Any areas not specified as RAM by the e820 map are considered I/O. */
-    for ( i = 0, pfn = 0; pfn < max_page; i++ )
+    /*
+     * Any areas not specified as RAM or UNUSABLE by the e820 map are
+     * considered I/O.
+     */
+    for ( unsigned long i = 0, pfn = 0; pfn < max_page; i++ )
     {
+        unsigned long rstart_pfn, rend_pfn;
+
         while ( (i < e820.nr_map) &&
                 (e820.map[i].type != E820_RAM) &&
                 (e820.map[i].type != E820_UNUSABLE) )
@@ -317,17 +320,6 @@ void __init arch_init_memory(void)
                            PFN_DOWN(e820.map[i].addr + e820.map[i].size));
         }
 
-        /*
-         * Make sure any Xen mappings of RAM holes above 1MB are blown away.
-         * In particular this ensures that RAM holes are respected even in
-         * the statically-initialised 1-16MB mapping area.
-         */
-        iostart_pfn = max_t(unsigned long, pfn, 1UL << (20 - PAGE_SHIFT));
-        ioend_pfn = min(rstart_pfn, 16UL << (20 - PAGE_SHIFT));
-        if ( iostart_pfn < ioend_pfn )
-            destroy_xen_mappings((unsigned long)mfn_to_virt(iostart_pfn),
-                                 (unsigned long)mfn_to_virt(ioend_pfn));
-
         /* Mark as I/O up to next RAM region. */
         for ( ; pfn < rstart_pfn; pfn++ )
         {
@@ -365,6 +357,7 @@ void __init arch_init_memory(void)
         const l3_pgentry_t *l3idle = map_l3t_from_l4e(
             idle_pg_table[l4_table_offset(split_va)]);
         l3_pgentry_t *l3tab = map_domain_page(l3mfn);
+        unsigned int i;
 
         for ( i = 0; i < l3_table_offset(split_va); ++i )
             l3tab[i] = l3idle[i];
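
The "anything not RAM or UNUSABLE is I/O" walk can be illustrated outside of Xen. The following is a simplified sketch only: the `struct region` type, the `REGION_*` constants and `count_io_pfns()` are hypothetical stand-ins rather than Xen's e820 structures, regions are given directly in page frame numbers, and the map is assumed to be sorted and non-overlapping. Instead of marking pages it merely counts the frames that would be treated as I/O.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for e820 entries; all names are hypothetical. */
enum region_type { REGION_RAM, REGION_RESERVED, REGION_UNUSABLE };

struct region {
    uint64_t start_pfn;     /* first page frame of the region */
    uint64_t end_pfn;       /* one past the last page frame */
    enum region_type type;
};

/*
 * Count the page frames below max_pfn that lie outside every RAM or
 * UNUSABLE region, i.e. the frames the walk would consider I/O.
 */
static uint64_t count_io_pfns(const struct region *map, size_t nr,
                              uint64_t max_pfn)
{
    uint64_t pfn = 0, io = 0;

    for ( size_t i = 0; pfn < max_pfn; i++ )
    {
        uint64_t rstart, rend;

        /* Skip entries that are neither RAM nor UNUSABLE. */
        while ( i < nr && map[i].type != REGION_RAM &&
                map[i].type != REGION_UNUSABLE )
            i++;

        if ( i >= nr )
            rstart = rend = max_pfn;    /* no further such region */
        else
        {
            rstart = map[i].start_pfn < max_pfn ? map[i].start_pfn : max_pfn;
            rend = map[i].end_pfn < max_pfn ? map[i].end_pfn : max_pfn;
        }

        /* Everything between the cursor and the region start is I/O. */
        if ( rstart > pfn )
            io += rstart - pfn;

        /* Advance the cursor past the region. */
        if ( rend > pfn )
            pfn = rend;
    }

    return io;
}
```

The cursor only ever moves forward, so each frame is classified once; the real function additionally has to deal with page validity and the separately handled low 1MB, which this sketch leaves out.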

* Re: [PATCH v2 2/2] x86/mm: drop unmapping from marking-as-I/O in arch_init_memory()
2025-08-11 10:50 ` [PATCH v2 2/2] x86/mm: drop unmapping from marking-as-I/O in arch_init_memory() Jan Beulich
@ 2025-08-12 16:30 ` Roger Pau Monné
2025-08-14 7:04 ` Jan Beulich
0 siblings, 1 reply; 8+ messages in thread
From: Roger Pau Monné @ 2025-08-12 16:30 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel@lists.xenproject.org, Andrew Cooper

On Mon, Aug 11, 2025 at 12:50:23PM +0200, Jan Beulich wrote:
> The unmapping part would have wanted to cover UNUSABLE regions as well,
> and it would now have been necessary for space outside the low 16Mb
> (wherever Xen is placed). However, with everything up to the next 2Mb
> boundary now properly backed by RAM, we don't need to unmap anything
> anymore: Space up to __2M_rwdata_end[] is properly reserved, whereas
> space past that mark (up to the next 2Mb boundary) is ordinary RAM.

Oh, I see, so this was done to unmap trailing space when the Xen image
region is mapped using 2M pages.

> While there, limit the scopes of involved variables.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Acked-by: Roger Pau Monné <roger.pau@citrix.com>

> ---
> v2: Drop unmapping code altogether.
>
> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -275,8 +275,6 @@ static void __init assign_io_page(struct
>
>  void __init arch_init_memory(void)
>  {
> -    unsigned long i, pfn, rstart_pfn, rend_pfn, iostart_pfn, ioend_pfn;
> -
>      /*
>       * Basic guest-accessible flags:
>       *   PRESENT, R/W, USER, A/D, AVAIL[0,1,2], AVAIL_HIGH, NX (if available).
> @@ -292,12 +290,17 @@ void __init arch_init_memory(void)
>       * case the low 1MB.
>       */
>      BUG_ON(pvh_boot && trampoline_phys != 0x1000);
> -    for ( i = 0; i < 0x100; i++ )
> +    for ( unsigned int i = 0; i < MB(1) >> PAGE_SHIFT; i++ )

I would use PFN_DOWN() rather than the shift, but that's just my
preference.

Thanks, Roger.

* Re: [PATCH v2 2/2] x86/mm: drop unmapping from marking-as-I/O in arch_init_memory()
2025-08-12 16:30 ` Roger Pau Monné
@ 2025-08-14 7:04 ` Jan Beulich
0 siblings, 0 replies; 8+ messages in thread
From: Jan Beulich @ 2025-08-14 7:04 UTC (permalink / raw)
To: Roger Pau Monné; +Cc: xen-devel@lists.xenproject.org, Andrew Cooper

On 12.08.2025 18:30, Roger Pau Monné wrote:
> On Mon, Aug 11, 2025 at 12:50:23PM +0200, Jan Beulich wrote:
>> The unmapping part would have wanted to cover UNUSABLE regions as well,
>> and it would now have been necessary for space outside the low 16Mb
>> (wherever Xen is placed). However, with everything up to the next 2Mb
>> boundary now properly backed by RAM, we don't need to unmap anything
>> anymore: Space up to __2M_rwdata_end[] is properly reserved, whereas
>> space past that mark (up to the next 2Mb boundary) is ordinary RAM.
>
> Oh, I see, so this was done to unmap trailing space when the Xen image
> region is mapped using 2M pages.
>
>> While there, limit the scopes of involved variables.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>
> Acked-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks.

>> @@ -292,12 +290,17 @@ void __init arch_init_memory(void)
>>       * case the low 1MB.
>>       */
>>      BUG_ON(pvh_boot && trampoline_phys != 0x1000);
>> -    for ( i = 0; i < 0x100; i++ )
>> +    for ( unsigned int i = 0; i < MB(1) >> PAGE_SHIFT; i++ )
>
> I would use PFN_DOWN() rather than the shift, but that's just my
> preference.

Oh, yes, fine with me.

Jan
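
The two spellings discussed here denote the same frame count, and both match the literal 0x100 that the patch replaces. A small sketch, assuming 4 KiB pages; the macros below are local stand-ins written for this example rather than copies of Xen's headers, and the function names are hypothetical.

```c
#include <assert.h>

/* Local stand-ins for illustration; 4 KiB pages are assumed. */
#define PAGE_SHIFT  12
#define MB(x)       ((unsigned long)(x) << 20)
#define PFN_DOWN(x) ((x) >> PAGE_SHIFT)

/* Number of page frames in the low 1 MiB, written with an explicit shift. */
static unsigned long low_1mb_frames_shift(void)
{
    return MB(1) >> PAGE_SHIFT;
}

/* The same quantity, written with PFN_DOWN(). */
static unsigned long low_1mb_frames_pfn_down(void)
{
    return PFN_DOWN(MB(1));
}
```

Both helpers yield 0x100, i.e. 256 frames of 4 KiB each.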