* [Qemu-devel] [PULL 0/1] e820: pass high memory too. @ 2013-10-17 11:09 Gerd Hoffmann 2013-10-17 11:09 ` [Qemu-devel] [PATCH 1/1] " Gerd Hoffmann 2013-10-18 11:31 ` [Qemu-devel] [PULL 0/1] " Gerd Hoffmann 0 siblings, 2 replies; 9+ messages in thread From: Gerd Hoffmann @ 2013-10-17 11:09 UTC (permalink / raw) To: qemu-devel; +Cc: Gerd Hoffmann Hi, See patch description for all the details. Patch has been out for review for a while without objections. please pull, Gerd The following changes since commit 1680d485777ecf436d724631ea8722cc0c66990e: Merge remote-tracking branch 'rth/tcg-ldst-6' into staging (2013-10-14 09:59:59 -0700) are available in the git repository at: git://git.kraxel.org/qemu e820.1 for you to fetch changes up to 0624c7f916b4d97f17726d9b295d6a6b0dc5076d: e820: pass high memory too. (2013-10-17 13:06:11 +0200) ---------------------------------------------------------------- Gerd Hoffmann (1): e820: pass high memory too. hw/i386/pc.c | 8 ++++++++ 1 file changed, 8 insertions(+) ^ permalink raw reply [flat|nested] 9+ messages in thread
* [Qemu-devel] [PATCH 1/1] e820: pass high memory too. 2013-10-17 11:09 [Qemu-devel] [PULL 0/1] e820: pass high memory too Gerd Hoffmann @ 2013-10-17 11:09 ` Gerd Hoffmann 2013-10-17 13:00 ` Andrea Arcangeli 2013-10-18 11:31 ` [Qemu-devel] [PULL 0/1] " Gerd Hoffmann 1 sibling, 1 reply; 9+ messages in thread From: Gerd Hoffmann @ 2013-10-17 11:09 UTC (permalink / raw) To: qemu-devel; +Cc: Andrea Arcangeli, Gerd Hoffmann, Anthony Liguori We have a fw_cfg entry to pass e820 entries from qemu to the firmware. Today it's used to pass reservations only. This patch makes qemu pass entries for RAM too. This allows to pass RAM sizes larger than 1TB to the firmware and it will also allow to pass non-contignous memory ramges should we decide to implement that some day, say for our virtual numa nodes. Obviously this needs some extra care to not break existing firware. SeaBIOS loads the entries and happily adds them without looking at the type. Which is problematic for memory below 4g as this will overwrite reservations added for bios memory etc. For memory above 4g it works just fine, seabios will merge the entry derived from cmos with the one loaded from fw_cfg. OVMF doesn't look at the fw_cfg e820 table. coreboot doesn't look at the fw_cfg e820 table. Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Reviewed-By: Igor Mammedov <imammedo@redhat.com> --- hw/i386/pc.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/hw/i386/pc.c b/hw/i386/pc.c index 0c313fe..ec5508b 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -1134,12 +1134,20 @@ FWCfgState *pc_memory_init(MemoryRegion *system_memory, memory_region_init_alias(ram_below_4g, NULL, "ram-below-4g", ram, 0, below_4g_mem_size); memory_region_add_subregion(system_memory, 0, ram_below_4g); + if (0) { + /* + * Ideally we should do that too, but that would ruin the e820 + * reservations added by seabios before initializing fw_cfg. + */ + e820_add_entry(0, below_4g_mem_size, E820_RAM); + } if (above_4g_mem_size > 0) { ram_above_4g = g_malloc(sizeof(*ram_above_4g)); memory_region_init_alias(ram_above_4g, NULL, "ram-above-4g", ram, below_4g_mem_size, above_4g_mem_size); memory_region_add_subregion(system_memory, 0x100000000ULL, ram_above_4g); + e820_add_entry(0x100000000ULL, above_4g_mem_size, E820_RAM); } -- 1.8.3.1 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [Qemu-devel] [PATCH 1/1] e820: pass high memory too. 2013-10-17 11:09 ` [Qemu-devel] [PATCH 1/1] " Gerd Hoffmann @ 2013-10-17 13:00 ` Andrea Arcangeli 2013-10-17 14:30 ` Gerd Hoffmann 0 siblings, 1 reply; 9+ messages in thread From: Andrea Arcangeli @ 2013-10-17 13:00 UTC (permalink / raw) To: Gerd Hoffmann; +Cc: qemu-devel, Anthony Liguori Hi, On Thu, Oct 17, 2013 at 01:09:38PM +0200, Gerd Hoffmann wrote: > We have a fw_cfg entry to pass e820 entries from qemu to the firmware. > Today it's used to pass reservations only. This patch makes qemu pass > entries for RAM too. > > This allows to pass RAM sizes larger than 1TB to the firmware and it > will also allow to pass non-contignous memory ramges should we decide > to implement that some day, say for our virtual numa nodes. > > Obviously this needs some extra care to not break existing firware. > > SeaBIOS loads the entries and happily adds them without looking at the > type. Which is problematic for memory below 4g as this will overwrite > reservations added for bios memory etc. For memory above 4g it works > just fine, seabios will merge the entry derived from cmos with the one > loaded from fw_cfg. The reason for not fixing the cmos and defer the fixage of the >1TB boot, is to develop a better approach, and this mixture of e820 and cmos doesn't look like an improvement. The only thing it avoids is to touch seabios but it provides no benefit whatsoever if compared to fixing the cmos which looks cleaner to me than having to compute a mix of cmos and e820 in seabios (and potentially having other bioses following this mix-incomplete-API). I thought the reason of deferring the fixage of >1TB boot to wait for a better approach and better API, I didn't think the end result had to be a mix API that adds no value. The premise that "this will also allow to pass non-contiguous memory" is partly false, as you can't use the e820 API below 4g so there's no way to create non contiguous memory with this mix-cmos-e820-API. So instead of adding "if (0)" patches and requiring bioses to mix information from e820 maps and cmos to boot with more than 1TB, why can't simply seabios can be fixed to preserve its own reservations (fragmenting the e820 map passed by qemu) while it build the e820 map? So then we the qemu API becomes: e820_add_entry(0, ram_size, E820_RAM); No mix. And seabios shall as well crash if the highest ram address in the e820 map provided by qemu (truncated to 40 bits) doesn't match the cmos information, to be sure we notice if we break backwards compatibility with the cmos API. This intermediate mixed paravirt API doesn't make much sense to me. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Qemu-devel] [PATCH 1/1] e820: pass high memory too. 2013-10-17 13:00 ` Andrea Arcangeli @ 2013-10-17 14:30 ` Gerd Hoffmann 2013-10-17 16:15 ` Andrea Arcangeli 0 siblings, 1 reply; 9+ messages in thread From: Gerd Hoffmann @ 2013-10-17 14:30 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: qemu-devel, Anthony Liguori On Do, 2013-10-17 at 15:00 +0200, Andrea Arcangeli wrote: > Hi, > > On Thu, Oct 17, 2013 at 01:09:38PM +0200, Gerd Hoffmann wrote: > > We have a fw_cfg entry to pass e820 entries from qemu to the firmware. > > Today it's used to pass reservations only. This patch makes qemu pass > > entries for RAM too. > > > > This allows to pass RAM sizes larger than 1TB to the firmware and it > > will also allow to pass non-contignous memory ramges should we decide > > to implement that some day, say for our virtual numa nodes. > > > > Obviously this needs some extra care to not break existing firware. > > > > SeaBIOS loads the entries and happily adds them without looking at the > > type. Which is problematic for memory below 4g as this will overwrite > > reservations added for bios memory etc. For memory above 4g it works > > just fine, seabios will merge the entry derived from cmos with the one > > loaded from fw_cfg. > > The reason for not fixing the cmos and defer the fixage of the >1TB > boot, is to develop a better approach, and this mixture of e820 and > cmos doesn't look like an improvement. The only thing it avoids is to > touch seabios but it provides no benefit whatsoever if compared to > fixing the cmos which looks cleaner to me than having to compute a mix > of cmos and e820 in seabios (and potentially having other bioses > following this mix-incomplete-API). e820 allows to pass non-contignous ram ranges to seabios (not that qemu supports that today, but when implemented the qemu/seabios interface will deal with it just fine). How you'll do that with the cmos? > I thought the reason of deferring the fixage of >1TB boot to wait for > a better approach and better API, I didn't think the end result had to > be a mix API that adds no value. IMO e820 is better than CMOS. > The premise that "this will also allow to pass non-contiguous memory" > is partly false, as you can't use the e820 API below 4g so there's no > way to create non contiguous memory with this mix-cmos-e820-API. Sure you can. Why do you think you can't? > So instead of adding "if (0)" patches and requiring bioses to mix > information from e820 maps and cmos to boot with more than 1TB, why > can't simply seabios can be fixed to preserve its own reservations > (fragmenting the e820 map passed by qemu) while it build the e820 map? > > So then we the qemu API becomes: > > e820_add_entry(0, ram_size, E820_RAM); That is the goal. seabios will be fixed to deal with this correctly. I don't want break old seabios versions though (especially not before we have a seabios release which can handle it), so I'll wait with flipping the switch for that. cheers, Gerd ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Qemu-devel] [PATCH 1/1] e820: pass high memory too. 2013-10-17 14:30 ` Gerd Hoffmann @ 2013-10-17 16:15 ` Andrea Arcangeli 2013-10-18 8:55 ` Gerd Hoffmann 0 siblings, 1 reply; 9+ messages in thread From: Andrea Arcangeli @ 2013-10-17 16:15 UTC (permalink / raw) To: Gerd Hoffmann; +Cc: qemu-devel, Anthony Liguori On Thu, Oct 17, 2013 at 04:30:27PM +0200, Gerd Hoffmann wrote: > On Do, 2013-10-17 at 15:00 +0200, Andrea Arcangeli wrote: > > Hi, > > > > On Thu, Oct 17, 2013 at 01:09:38PM +0200, Gerd Hoffmann wrote: > > > We have a fw_cfg entry to pass e820 entries from qemu to the firmware. > > > Today it's used to pass reservations only. This patch makes qemu pass > > > entries for RAM too. > > > > > > This allows to pass RAM sizes larger than 1TB to the firmware and it > > > will also allow to pass non-contignous memory ramges should we decide > > > to implement that some day, say for our virtual numa nodes. > > > > > > Obviously this needs some extra care to not break existing firware. > > > > > > SeaBIOS loads the entries and happily adds them without looking at the > > > type. Which is problematic for memory below 4g as this will overwrite > > > reservations added for bios memory etc. For memory above 4g it works > > > just fine, seabios will merge the entry derived from cmos with the one > > > loaded from fw_cfg. > > > > The reason for not fixing the cmos and defer the fixage of the >1TB > > boot, is to develop a better approach, and this mixture of e820 and > > cmos doesn't look like an improvement. The only thing it avoids is to > > touch seabios but it provides no benefit whatsoever if compared to > > fixing the cmos which looks cleaner to me than having to compute a mix > > of cmos and e820 in seabios (and potentially having other bioses > > following this mix-incomplete-API). > > e820 allows to pass non-contignous ram ranges to seabios (not that qemu > supports that today, but when implemented the qemu/seabios interface > will deal with it just fine). How you'll do that with the cmos? You're changing the qemu-bios paravirt protocol, and to boot with >1TB seabios is now requires to mix information from two APIs (rtc and e820 fw_cfg command). > IMO e820 is better than CMOS. Agreed. > > > The premise that "this will also allow to pass non-contiguous memory" > > is partly false, as you can't use the e820 API below 4g so there's no > > way to create non contiguous memory with this mix-cmos-e820-API. > > Sure you can. Why do you think you can't? How do you specify an hole below 4g unless you modify seabios first? > That is the goal. seabios will be fixed to deal with this correctly. > I don't want break old seabios versions though (especially not before we > have a seabios release which can handle it), so I'll wait with flipping > the switch for that. Why to ship qemu with an intermediate paravirt protocol? And if you don't want to break old seabios I guess you should use a new fw_cfg command. Just to show you how flakey this intermediate paravirt interface is, assume I boot with -m 1029g. So "high" is 1g in seabios. So RamSizeOver4G is 1g. RamSizeOver4G = high; add_e820(0x100000000ull, high, E820_RAM); so far so good for e820 maps, that gets overwritten later. But that's not the end of it. Then seabios does: r64_mem.base = ALIGN(0x100000000LL + RamSizeOver4G, align_mem); r64_pref.base = ALIGN(r64_mem.base + sum_mem, align_pref); So seabios will map pci space at 5g where there is ram, instead of at 1024g. And in smbios (what is smbios anyway? :) add_struct(19, p, 0, RamSize >> 20, 0); if (RamSizeOver4G) add_struct(19, p, 4096, RamSizeOver4G >> 20, 1); I doubt you intended the above range to be 4g-5g on a with 1029g of ram. Not to tell: int ram_mb = (RamSize + RamSizeOver4G) >> 20; ram_mb is actually 5G when it should be 1029g. In short your change is already breaking current seabios. But even if it would work, my fundamental problem is the fact this is a flakey mixture of APIs to create a new intermediate paravirt interface that some other bios could have the idea to support and if they do, they risk a breakage again when qemu speaks the final paravirt protocol that allows real ram holes to be created below 4g. If we don't want to fix the rtc interface to fix the 1TB, to get a better paravirt protocol implemented instead, well then the only way is to first modify seabios to pick the ramover4g info from the highest address of the e820 table, and to avoid the e820 reservations to be overwritten by ram ranges below 4g. And then use a different fw_cfg command value, if you intend to be backwards compatible with old seabios that wouldn't cope with qemu initially passing (0, ram_size) as e820 range for the RAM. When seabios speaks the new paravirt interface, only then modify qemu to use the new paravirt interface. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Qemu-devel] [PATCH 1/1] e820: pass high memory too. 2013-10-17 16:15 ` Andrea Arcangeli @ 2013-10-18 8:55 ` Gerd Hoffmann 2013-10-18 13:31 ` Andrea Arcangeli 0 siblings, 1 reply; 9+ messages in thread From: Gerd Hoffmann @ 2013-10-18 8:55 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: qemu-devel, Anthony Liguori Hi, > > > The premise that "this will also allow to pass non-contiguous memory" > > > is partly false, as you can't use the e820 API below 4g so there's no > > > way to create non contiguous memory with this mix-cmos-e820-API. > > > > Sure you can. Why do you think you can't? > > How do you specify an hole below 4g unless you modify seabios first? First chunk of memory in cmos. Full list in e820 (first chunk temporarily disabled). > > That is the goal. seabios will be fixed to deal with this correctly. > > I don't want break old seabios versions though (especially not before we > > have a seabios release which can handle it), so I'll wait with flipping > > the switch for that. > > Why to ship qemu with an intermediate paravirt protocol? > > And if you don't want to break old seabios I guess you should use a > new fw_cfg command. That'll work too. /me was just trying to avoid having two almost identical interfaces in fw_cfg. > Just to show you how flakey this intermediate paravirt interface is, > assume I boot with -m 1029g. > [ snip ] To get more than 1TB work correctly (with all corner cases covered) both qemu and seabios need to be updated. No matter whenever we'll go implement that using cmos, by extending e820 interface or using a new e820 interface. > So "high" is 1g in seabios. So > RamSizeOver4G is 1g. Patch to address that exists. > In short your change is already breaking current seabios. No. > When seabios speaks the new paravirt interface, only then modify qemu > to use the new paravirt interface. This isn't RHEL. We can't enforce any ordering with rpm dependencies. cheers, Gerd ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Qemu-devel] [PATCH 1/1] e820: pass high memory too. 2013-10-18 8:55 ` Gerd Hoffmann @ 2013-10-18 13:31 ` Andrea Arcangeli 2013-10-18 14:37 ` Gerd Hoffmann 0 siblings, 1 reply; 9+ messages in thread From: Andrea Arcangeli @ 2013-10-18 13:31 UTC (permalink / raw) To: Gerd Hoffmann; +Cc: qemu-devel, Anthony Liguori On Fri, Oct 18, 2013 at 10:55:12AM +0200, Gerd Hoffmann wrote: > Hi, > > > > > The premise that "this will also allow to pass non-contiguous memory" > > > > is partly false, as you can't use the e820 API below 4g so there's no > > > > way to create non contiguous memory with this mix-cmos-e820-API. > > > > > > Sure you can. Why do you think you can't? > > > > How do you specify an hole below 4g unless you modify seabios first? > > First chunk of memory in cmos. > Full list in e820 (first chunk temporarily disabled). How do you know the full list below 4g isn't wiping out non-ram seabios mappings? > To get more than 1TB work correctly (with all corner cases covered) both > qemu and seabios need to be updated. No matter whenever we'll go > implement that using cmos, by extending e820 interface or using a new > e820 interface. Agreed. > > > So "high" is 1g in seabios. So > > RamSizeOver4G is 1g. > > Patch to address that exists. I don't get what you mean here. > > > In short your change is already breaking current seabios. > > No. I guess next thing is to try if it kernel boots and the root filesystem gets mounted with -m 1029g. Maybe something lucky happens and it actually boots but I think there's a reasonable chance that it doesn't boot considering the broken addresses where it will map pci space and other bits. > > When seabios speaks the new paravirt interface, only then modify qemu > > to use the new paravirt interface. > > This isn't RHEL. We can't enforce any ordering with rpm dependencies. We ship the bios.bin binary in qemu? In the same commit you add a proper paravirt interface and you update bios.bin. Did anybody ever update seabios paravirt protocols before in qemu lifetime? There wasn't even seabios initially, there shall be a way to upgrade the bios, isn't there? Furthermore by adding a new fw_cfg like I suggested above (which is the only way to avoid breaking current seabios interface), old seabios and new seabios can keep working fine. The current e820 API simply is unusable if seabios overwrites the non-ram ranges it reserved earlier. And mixing it with cmos sounds a bad idea as cmos is truncated and leads to various variables in current seabios to end up truncated to 1g instead of 1029g. If people uses non standard bioses, they better know what they're doing. And if they want to be really safe and sure it's all tested and flawless, they should buy RHEL/RHEV in the first place. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Qemu-devel] [PATCH 1/1] e820: pass high memory too. 2013-10-18 13:31 ` Andrea Arcangeli @ 2013-10-18 14:37 ` Gerd Hoffmann 0 siblings, 0 replies; 9+ messages in thread From: Gerd Hoffmann @ 2013-10-18 14:37 UTC (permalink / raw) To: Andrea Arcangeli; +Cc: qemu-devel, Anthony Liguori On Fr, 2013-10-18 at 15:31 +0200, Andrea Arcangeli wrote: > On Fri, Oct 18, 2013 at 10:55:12AM +0200, Gerd Hoffmann wrote: > > Hi, > > > > > > > The premise that "this will also allow to pass non-contiguous memory" > > > > > is partly false, as you can't use the e820 API below 4g so there's no > > > > > way to create non contiguous memory with this mix-cmos-e820-API. > > > > > > > > Sure you can. Why do you think you can't? > > > > > > How do you specify an hole below 4g unless you modify seabios first? > > > > First chunk of memory in cmos. > > Full list in e820 (first chunk temporarily disabled). > > How do you know the full list below 4g isn't wiping out non-ram > seabios mappings? Unlikely, but might happen if you have pci devices with large bars. seabios needs updates to handle non-contignous memory correctly. First, for this one. Second, to fix the same issue above 4G (as you've raised in your previous mail). Third, to make sure smbios information is correct. Maybe more. But that is (a) internal to seabios and doesn't require adjustments to the qemu/seabios interface and (b) only needed to handle stuff not working today, i.e. there are no regressions. > > To get more than 1TB work correctly (with all corner cases covered) both > > qemu and seabios need to be updated. No matter whenever we'll go > > implement that using cmos, by extending e820 interface or using a new > > e820 interface. > > Agreed. > > > > > > So "high" is 1g in seabios. So > > > RamSizeOver4G is 1g. > > > > Patch to address that exists. > > I don't get what you mean here. Patch for seabios, not qemu. RamSizeOver4G must be adjusted according to the e820 entries. > > > In short your change is already breaking current seabios. > > > > No. > > I guess next thing is to try if it kernel boots and the root > filesystem gets mounted with -m 1029g. Maybe something lucky happens > and it actually boots but I think there's a reasonable chance that it > doesn't boot considering the broken addresses where it will map pci > space and other bits. Likewise broken without the patch, possibly with other symptoms. More than 1TB just doesn't work correctly without patching both qemu and seabios. > > > When seabios speaks the new paravirt interface, only then modify qemu > > > to use the new paravirt interface. > > > > This isn't RHEL. We can't enforce any ordering with rpm dependencies. > > We ship the bios.bin binary in qemu? In the same commit you add a > proper paravirt interface and you update bios.bin. That doesn't help much. Distros usually compile seabios themself from the sources. > Did anybody ever > update seabios paravirt protocols before in qemu lifetime? The process is: (1) commit patches to qemu. (2) commit patches to seabios. (3) wait for next seabios release. (4) update seabios. > Furthermore by adding a new fw_cfg like I suggested above (which is > the only way to avoid breaking current seabios interface), old seabios > and new seabios can keep working fine. The current e820 API simply is > unusable if seabios overwrites the non-ram ranges it reserved > earlier. I'll go brew up patches doing that. cheers, Gerd ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Qemu-devel] [PULL 0/1] e820: pass high memory too. 2013-10-17 11:09 [Qemu-devel] [PULL 0/1] e820: pass high memory too Gerd Hoffmann 2013-10-17 11:09 ` [Qemu-devel] [PATCH 1/1] " Gerd Hoffmann @ 2013-10-18 11:31 ` Gerd Hoffmann 1 sibling, 0 replies; 9+ messages in thread From: Gerd Hoffmann @ 2013-10-18 11:31 UTC (permalink / raw) To: qemu-devel On Do, 2013-10-17 at 13:09 +0200, Gerd Hoffmann wrote: > Hi, > > See patch description for all the details. Patch has been out for > review for a while without objections. Revoking pull request. There are objections now. cheers, Gerd ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2013-10-18 14:37 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2013-10-17 11:09 [Qemu-devel] [PULL 0/1] e820: pass high memory too Gerd Hoffmann 2013-10-17 11:09 ` [Qemu-devel] [PATCH 1/1] " Gerd Hoffmann 2013-10-17 13:00 ` Andrea Arcangeli 2013-10-17 14:30 ` Gerd Hoffmann 2013-10-17 16:15 ` Andrea Arcangeli 2013-10-18 8:55 ` Gerd Hoffmann 2013-10-18 13:31 ` Andrea Arcangeli 2013-10-18 14:37 ` Gerd Hoffmann 2013-10-18 11:31 ` [Qemu-devel] [PULL 0/1] " Gerd Hoffmann
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).