* maybe revert commit c275a57f5ec3 "xen/balloon: Set balloon's initial state to number of existing RAM pages"
@ 2017-03-22 21:16 Dan Streetman
  2017-03-23  2:13 ` Boris Ostrovsky
  0 siblings, 1 reply; 18+ messages in thread

From: Dan Streetman @ 2017-03-22 21:16 UTC (permalink / raw)
To: Boris Ostrovsky, Konrad Rzeszutek Wilk
Cc: David Vrabel, Juergen Gross, xen-devel, linux-kernel

I have a question about a problem introduced by this commit:
c275a57f5ec3056f732843b11659d892235faff7
"xen/balloon: Set balloon's initial state to number of existing RAM pages"

It changed the xen balloon current_pages calculation to start with the number
of physical pages in the system, instead of max_pfn. Since get_num_physpages()
does not include holes, it's always less than the e820 map's max_pfn.

However, the problem that commit introduced is: if the hypervisor sets the
balloon target equal to the e820 map's max_pfn, then the balloon target will
*always* be higher than the initial current pages. Even if the hypervisor sets
the target to (e820 max_pfn - holes), if the OS adds any holes, the balloon
target will be higher than the current pages. This is the situation, for
example, for Amazon AWS instances. The result is that the xen balloon will
always immediately hotplug some memory at boot, but then make only
(max_pfn - get_num_physpages()) of it available to the system.

This balloon-hotplugged memory can cause problems if the hypervisor wasn't
expecting it; specifically, the system's physical page addresses will now
exceed the e820 map's max_pfn, due to the balloon-hotplugged pages; if the
hypervisor isn't expecting pt-device DMA to/from those physical pages above
the e820 max_pfn, it causes problems.
For example:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1668129

The additional small amount of balloon memory can cause other problems as
well, for example:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457

Anyway, I'd like to ask: was the original commit added because hypervisors are
supposed to set their balloon target to the guest system's number of phys
pages (max_pfn - holes)? The mailing list discussion and commit description
seem to indicate that. However, I'm not sure how that is possible, because the
kernel reserves its own holes, regardless of any predefined holes in the e820
map; for example, the kernel reserves 64k (by default) at phys addr 0 (the
amount of reservation is configurable via CONFIG_X86_RESERVE_LOW). So the
hypervisor really has no way to know what the "right" target to specify is;
unless it knows the exact guest OS and kernel version, and kernel config
values, it will never be able to correctly specify its target to be exactly
(e820 max_pfn - all holes).

Should this commit be reverted? Should the xen balloon target be adjusted
based on kernel-added e820 holes? Should something else be done?

For context, Amazon Linux has simply disabled Xen ballooning completely.
Likewise, we're planning to disable Xen ballooning in the Ubuntu kernel for
Amazon AWS-specific kernels (but not for non-AWS Ubuntu kernels). However, if
reverting this patch makes sense in a bigger context (i.e. for Xen users
besides AWS), that would allow more Ubuntu kernels to work correctly in AWS
instances.

^ permalink raw reply [flat|nested] 18+ messages in thread
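[Editorial note: the mismatch Dan describes can be sketched numerically. All
page counts below are made-up illustrative values, not taken from a real
guest; the point is only the sign of the difference between the two
quantities.]

```shell
# Illustrative-only sketch: commit c275a57f5ec3 initializes the balloon's
# current_pages from get_num_physpages() (RAM pages, excluding holes),
# while the toolstack may set the target to the e820 max_pfn.
max_pfn=$(( 0x40000 ))        # assumed e820 max_pfn: 1 GB guest, 4 KiB pages
hole_pages=$(( 0x2000 ))      # assumed pages lost to e820 holes/reservations
num_physpages=$(( max_pfn - hole_pages ))

target=$max_pfn               # what the hypervisor/toolstack sets
current=$num_physpages        # what the commit starts current_pages at
deficit=$(( target - current ))

echo "target=$target current=$current deficit=$deficit"
# A positive deficit is what makes the balloon driver hotplug memory
# immediately at boot, as described above.
```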
* Re: maybe revert commit c275a57f5ec3 "xen/balloon: Set balloon's initial state to number of existing RAM pages" 2017-03-22 21:16 maybe revert commit c275a57f5ec3 "xen/balloon: Set balloon's initial state to number of existing RAM pages" Dan Streetman @ 2017-03-23 2:13 ` Boris Ostrovsky 2017-03-23 7:56 ` Juergen Gross 2017-03-24 20:34 ` Dan Streetman 0 siblings, 2 replies; 18+ messages in thread From: Boris Ostrovsky @ 2017-03-23 2:13 UTC (permalink / raw) To: Dan Streetman, Konrad Rzeszutek Wilk Cc: David Vrabel, Juergen Gross, xen-devel, linux-kernel On 03/22/2017 05:16 PM, Dan Streetman wrote: > I have a question about a problem introduced by this commit: > c275a57f5ec3056f732843b11659d892235faff7 > "xen/balloon: Set balloon's initial state to number of existing RAM pages" > > It changed the xen balloon current_pages calculation to start with the > number of physical pages in the system, instead of max_pfn. Since > get_num_physpages() does not include holes, it's always less than the > e820 map's max_pfn. > > However, the problem that commit introduced is, if the hypervisor sets > the balloon target to equal to the e820 map's max_pfn, then the > balloon target will *always* be higher than the initial current pages. > Even if the hypervisor sets the target to (e820 max_pfn - holes), if > the OS adds any holes, the balloon target will be higher than the > current pages. This is the situation, for example, for Amazon AWS > instances. The result is, the xen balloon will always immediately > hotplug some memory at boot, but then make only (max_pfn - > get_num_physpages()) available to the system. > > This balloon-hotplugged memory can cause problems, if the hypervisor > wasn't expecting it; specifically, the system's physical page > addresses now will exceed the e820 map's max_pfn, due to the > balloon-hotplugged pages; if the hypervisor isn't expecting pt-device > DMA to/from those physical pages above the e820 max_pfn, it causes > problems. 
For example: > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1668129 > > The additional small amount of balloon memory can cause other problems > as well, for example: > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457 > > Anyway, I'd like to ask, was the original commit added because > hypervisors are supposed to set their balloon target to the guest > system's number of phys pages (max_pfn - holes)? The mailing list > discussion and commit description seem to indicate that. IIRC the problem that this was trying to fix was that since max_pfn includes holes, upon booting we'd immediately balloon down by the (typically, MMIO) hole size. If you boot a guest with ~4+GB memory you should see this. > However I'm > not sure how that is possible, because the kernel reserves its own > holes, regardless of any predefined holes in the e820 map; for > example, the kernel reserves 64k (by default) at phys addr 0 (the > amount of reservation is configurable via CONFIG_X86_RESERVE_LOW). So > the hypervisor really has no way to know what the "right" target to > specify is; unless it knows the exact guest OS and kernel version, and > kernel config values, it will never be able to correctly specify its > target to be exactly (e820 max_pfn - all holes). > > Should this commit be reverted? Should the xen balloon target be > adjusted based on kernel-added e820 holes? I think the second one but shouldn't current_pages be updated, and not the target? The latter is set by Xen (toolstack, via xenstore usually). Also, the bugs above (at least one of them) talk about NVMe and I wonder whether the memory that they add is of RAM type --- I believe it has its own type and so perhaps that introduces additional inconsistencies. AWS may have added their own support for that, which we don't have upstream yet. -boris > Should something else be > done? > > For context, Amazon Linux has simply disabled Xen ballooning > completely. 
Likewise, we're planning to disable Xen ballooning in the > Ubuntu kernel for Amazon AWS-specific kernels (but not for non-AWS > Ubuntu kernels). However, if reverting this patch makes sense in a > bigger context (i.e. Xen users besides AWS), that would allow more > Ubuntu kernels to work correctly in AWS instances. > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: maybe revert commit c275a57f5ec3 "xen/balloon: Set balloon's initial state to number of existing RAM pages" 2017-03-23 2:13 ` Boris Ostrovsky @ 2017-03-23 7:56 ` Juergen Gross 2017-03-24 20:30 ` Dan Streetman 2017-03-24 20:34 ` Dan Streetman 1 sibling, 1 reply; 18+ messages in thread From: Juergen Gross @ 2017-03-23 7:56 UTC (permalink / raw) To: Dan Streetman Cc: Boris Ostrovsky, Konrad Rzeszutek Wilk, xen-devel, linux-kernel On 23/03/17 03:13, Boris Ostrovsky wrote: > > > On 03/22/2017 05:16 PM, Dan Streetman wrote: >> I have a question about a problem introduced by this commit: >> c275a57f5ec3056f732843b11659d892235faff7 >> "xen/balloon: Set balloon's initial state to number of existing RAM >> pages" >> >> It changed the xen balloon current_pages calculation to start with the >> number of physical pages in the system, instead of max_pfn. Since >> get_num_physpages() does not include holes, it's always less than the >> e820 map's max_pfn. >> >> However, the problem that commit introduced is, if the hypervisor sets >> the balloon target to equal to the e820 map's max_pfn, then the >> balloon target will *always* be higher than the initial current pages. >> Even if the hypervisor sets the target to (e820 max_pfn - holes), if >> the OS adds any holes, the balloon target will be higher than the >> current pages. This is the situation, for example, for Amazon AWS >> instances. The result is, the xen balloon will always immediately >> hotplug some memory at boot, but then make only (max_pfn - >> get_num_physpages()) available to the system. >> >> This balloon-hotplugged memory can cause problems, if the hypervisor >> wasn't expecting it; specifically, the system's physical page >> addresses now will exceed the e820 map's max_pfn, due to the >> balloon-hotplugged pages; if the hypervisor isn't expecting pt-device >> DMA to/from those physical pages above the e820 max_pfn, it causes >> problems. 
For example: >> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1668129 >> >> The additional small amount of balloon memory can cause other problems >> as well, for example: >> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457 >> >> Anyway, I'd like to ask, was the original commit added because >> hypervisors are supposed to set their balloon target to the guest >> system's number of phys pages (max_pfn - holes)? The mailing list >> discussion and commit description seem to indicate that. > > > IIRC the problem that this was trying to fix was that since max_pfn > includes holes, upon booting we'd immediately balloon down by the > (typically, MMIO) hole size. > > If you boot a guest with ~4+GB memory you should see this. > > >> However I'm >> not sure how that is possible, because the kernel reserves its own >> holes, regardless of any predefined holes in the e820 map; for >> example, the kernel reserves 64k (by default) at phys addr 0 (the >> amount of reservation is configurable via CONFIG_X86_RESERVE_LOW). So >> the hypervisor really has no way to know what the "right" target to >> specify is; unless it knows the exact guest OS and kernel version, and >> kernel config values, it will never be able to correctly specify its >> target to be exactly (e820 max_pfn - all holes). >> >> Should this commit be reverted? Should the xen balloon target be >> adjusted based on kernel-added e820 holes? > > I think the second one but shouldn't current_pages be updated, and not > the target? The latter is set by Xen (toolstack, via xenstore usually). Right. Looking into a HVM domU I can't see any problem related to CONFIG_X86_RESERVE_LOW: it is set to 64 on my system. The domU is configured with 2048 MB of RAM, 8MB being video RAM. Looking into /sys/devices/system/xen_memory/xen_memory0 I can see the current size and target size do match: both are 2088960 kB (2 GB - 8 MB). Ballooning down and up to 2048 MB again doesn't change the picture. 
So which additional holes are added by the kernel on AWS via which functions?

Juergen

^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: maybe revert commit c275a57f5ec3 "xen/balloon: Set balloon's initial state to number of existing RAM pages" 2017-03-23 7:56 ` Juergen Gross @ 2017-03-24 20:30 ` Dan Streetman 0 siblings, 0 replies; 18+ messages in thread From: Dan Streetman @ 2017-03-24 20:30 UTC (permalink / raw) To: Juergen Gross Cc: Boris Ostrovsky, Konrad Rzeszutek Wilk, xen-devel, linux-kernel On Thu, Mar 23, 2017 at 3:56 AM, Juergen Gross <jgross@suse.com> wrote: > On 23/03/17 03:13, Boris Ostrovsky wrote: >> >> >> On 03/22/2017 05:16 PM, Dan Streetman wrote: >>> I have a question about a problem introduced by this commit: >>> c275a57f5ec3056f732843b11659d892235faff7 >>> "xen/balloon: Set balloon's initial state to number of existing RAM >>> pages" >>> >>> It changed the xen balloon current_pages calculation to start with the >>> number of physical pages in the system, instead of max_pfn. Since >>> get_num_physpages() does not include holes, it's always less than the >>> e820 map's max_pfn. >>> >>> However, the problem that commit introduced is, if the hypervisor sets >>> the balloon target to equal to the e820 map's max_pfn, then the >>> balloon target will *always* be higher than the initial current pages. >>> Even if the hypervisor sets the target to (e820 max_pfn - holes), if >>> the OS adds any holes, the balloon target will be higher than the >>> current pages. This is the situation, for example, for Amazon AWS >>> instances. The result is, the xen balloon will always immediately >>> hotplug some memory at boot, but then make only (max_pfn - >>> get_num_physpages()) available to the system. >>> >>> This balloon-hotplugged memory can cause problems, if the hypervisor >>> wasn't expecting it; specifically, the system's physical page >>> addresses now will exceed the e820 map's max_pfn, due to the >>> balloon-hotplugged pages; if the hypervisor isn't expecting pt-device >>> DMA to/from those physical pages above the e820 max_pfn, it causes >>> problems. 
For example: >>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1668129 >>> >>> The additional small amount of balloon memory can cause other problems >>> as well, for example: >>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457 >>> >>> Anyway, I'd like to ask, was the original commit added because >>> hypervisors are supposed to set their balloon target to the guest >>> system's number of phys pages (max_pfn - holes)? The mailing list >>> discussion and commit description seem to indicate that. >> >> >> IIRC the problem that this was trying to fix was that since max_pfn >> includes holes, upon booting we'd immediately balloon down by the >> (typically, MMIO) hole size. >> >> If you boot a guest with ~4+GB memory you should see this. >> >> >>> However I'm >>> not sure how that is possible, because the kernel reserves its own >>> holes, regardless of any predefined holes in the e820 map; for >>> example, the kernel reserves 64k (by default) at phys addr 0 (the >>> amount of reservation is configurable via CONFIG_X86_RESERVE_LOW). So >>> the hypervisor really has no way to know what the "right" target to >>> specify is; unless it knows the exact guest OS and kernel version, and >>> kernel config values, it will never be able to correctly specify its >>> target to be exactly (e820 max_pfn - all holes). >>> >>> Should this commit be reverted? Should the xen balloon target be >>> adjusted based on kernel-added e820 holes? >> >> I think the second one but shouldn't current_pages be updated, and not >> the target? The latter is set by Xen (toolstack, via xenstore usually). > > Right. > > Looking into a HVM domU I can't see any problem related to > CONFIG_X86_RESERVE_LOW: it is set to 64 on my system. The domU is sorry I brought that up; I was only giving an example. 
It's not directly relevant to this and may have distracted from the actual
problem; in fact, on closer inspection, X86_RESERVE_LOW uses
memblock_reserve(), which removes the range from managed memory but not from
the e820 map (and thus doesn't remove it from get_num_physpages()). Only phys
page 0 is actually reserved in the e820 map.

> configured with 2048 MB of RAM, 8MB being video RAM. Looking into
> /sys/devices/system/xen_memory/xen_memory0 I can see the current
> size and target size do match: both are 2088960 kB (2 GB - 8 MB).
>
> Ballooning down and up to 2048 MB again doesn't change the picture.
>
> So which additional holes are added by the kernel on AWS via which
> functions?

I'll use two AWS types as examples, t2.micro (1G mem) and t2.large (8G mem).

In the micro, the results of ballooning are obvious, because the hotplugged
memory always goes into the Normal zone; but since the base memory is only 1G,
it's contained entirely in the DMA32/DMA zones. So we get:

$ grep -E '(start_pfn|present|spanned|managed)' /proc/zoneinfo
        spanned  4095
        present  3997
        managed  3976
  start_pfn:           1
        spanned  258048
        present  258048
        managed  249606
  start_pfn:           4096
        spanned  32768
        present  32768
        managed  11
  start_pfn:           262144

As you can see, none of the e820 memory went into the Normal zone; the balloon
driver hotplugged 128M (32k pages), but only made 11 pages available. Having a
memory zone with only 11 pages really screwed with kswapd, since the zone's
memory watermarks were all 0. That was the second bug I referenced in my
initial email.
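[Editorial note: the zero watermarks come from how the kernel splits the
min_free_kbytes budget across zones roughly in proportion to each zone's
managed pages, using integer division, so an 11-page zone rounds to zero. A
rough sketch, not the kernel's exact __setup_per_zone_wmarks code; the
min_free_pages budget below is an assumed value, the zone sizes are from the
t2.micro zoneinfo output above.]

```shell
# Approximate per-zone min watermark: min_free_pages * zone_managed / total_managed
min_free_pages=11264                      # assumed: min_free_kbytes=45056 on 4 KiB pages
total_managed=$(( 3976 + 249606 + 11 ))   # DMA + DMA32 + Normal, from zoneinfo above

for zone_managed in 3976 249606 11; do
    wmark=$(( min_free_pages * zone_managed / total_managed ))
    echo "managed=$zone_managed min_wmark=$wmark"
done
# The final iteration is the 11-page Normal zone: integer division
# truncates its share to 0, leaving kswapd with all-zero watermarks there.
```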
Anyway, if we look at the large instance, you don't really notice the
additional balloon memory:

$ grep -E '(start_pfn|present|spanned|managed)' /proc/zoneinfo
        spanned  4095
        present  3997
        managed  3976
  start_pfn:           1
        spanned  1044480
        present  978944
        managed  958778
  start_pfn:           4096
        spanned  1146880
        present  1146880
        managed  1080666
  start_pfn:           1048576

but doing the actual math shows the problem:

$ printf "%x\n" $[ 1048576 + 1146880 ]
218000
$ printf "%x\n" $[ 1048576 + 1080666 ]
207d5a
$ dmesg | grep e820
[    0.000000] e820: BIOS-provided physical RAM map:
[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009dfff] usable
[    0.000000] BIOS-e820: [mem 0x000000000009e000-0x000000000009ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000efffffff] usable
[    0.000000] BIOS-e820: [mem 0x00000000fc000000-0x00000000ffffffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000020fffffff] usable
[    0.000000] e820: update [mem 0x00000000-0x00000fff] usable ==> reserved
[    0.000000] e820: remove [mem 0x000a0000-0x000fffff] usable
[    0.000000] e820: last_pfn = 0x210000 max_arch_pfn = 0x400000000
[    0.000000] e820: last_pfn = 0xf0000 max_arch_pfn = 0x400000000
[    0.000000] e820: [mem 0xf0000000-0xfbffffff] available for PCI devices
[    0.595083] e820: reserve RAM buffer [mem 0x0009e000-0x0009ffff]

So we can see the balloon driver hotplugged those extra 0x8000 pages, and made
some of them available. The target has been set to:

$ printf "%x\n" $( cat /sys/devices/system/xen_memory/xen_memory0/target )
200000000

while the e820 map provides:

$ printf "%x\n" $[ 0x210000000 - 0x100000000 + 0xf0000000 - 0x100000 + 0x9e000 - 0x1000 ]
1fff9d000

and current memory is:

/sys/devices/system/xen_memory/xen_memory0$ printf "%x\n" $[ $( cat info/current_kb ) * 1024 ]
1fffa8000

so the balloon driver has added...
$ echo $[ ( 0x1fffa8000 - 0x1fff9d000 ) / 4096 ]
11

exactly 11 pages, just like the micro instance type. I'm not sure where the
balloon driver gets that 11-page calculation, nor am I sure why the current_kb
is actually less than the balloon target.

^ permalink raw reply [flat|nested] 18+ messages in thread
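[Editorial note: as a sanity check on the arithmetic above, the e820 usable
bytes can be summed directly from the three usable ranges in the quoted dmesg
output, after the page-0 reservation and the 0xa0000-0xfffff removal. This
reproduces Dan's 0x1fff9d000 figure; each term is (end_exclusive - start) in
bytes.]

```shell
# Sum the t2.large guest's usable e820 ranges, from the dmesg output above:
#   0x1000      - 0x9dfff    (low RAM, minus reserved page 0)
#   0x100000    - 0xefffffff (RAM below the MMIO hole)
#   0x100000000 - 0x20fffffff (RAM above 4 GB)
usable_bytes=$(( (0x9e000 - 0x1000) \
               + (0xf0000000 - 0x100000) \
               + (0x210000000 - 0x100000000) ))
printf 'e820 usable: 0x%x bytes, %d pages\n' \
       "$usable_bytes" "$(( usable_bytes / 4096 ))"
```

This matches the `printf "%x\n" $[ ... ]` one-liner in the message above, and
makes the balloon target's excess over usable RAM (0x200000000 - 0x1fff9d000)
easy to recompute.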
* Re: maybe revert commit c275a57f5ec3 "xen/balloon: Set balloon's initial state to number of existing RAM pages" 2017-03-23 2:13 ` Boris Ostrovsky 2017-03-23 7:56 ` Juergen Gross @ 2017-03-24 20:34 ` Dan Streetman 2017-03-24 21:10 ` Konrad Rzeszutek Wilk 1 sibling, 1 reply; 18+ messages in thread From: Dan Streetman @ 2017-03-24 20:34 UTC (permalink / raw) To: Boris Ostrovsky Cc: Konrad Rzeszutek Wilk, David Vrabel, Juergen Gross, xen-devel, linux-kernel On Wed, Mar 22, 2017 at 10:13 PM, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote: > > > On 03/22/2017 05:16 PM, Dan Streetman wrote: >> >> I have a question about a problem introduced by this commit: >> c275a57f5ec3056f732843b11659d892235faff7 >> "xen/balloon: Set balloon's initial state to number of existing RAM pages" >> >> It changed the xen balloon current_pages calculation to start with the >> number of physical pages in the system, instead of max_pfn. Since >> get_num_physpages() does not include holes, it's always less than the >> e820 map's max_pfn. >> >> However, the problem that commit introduced is, if the hypervisor sets >> the balloon target to equal to the e820 map's max_pfn, then the >> balloon target will *always* be higher than the initial current pages. >> Even if the hypervisor sets the target to (e820 max_pfn - holes), if >> the OS adds any holes, the balloon target will be higher than the >> current pages. This is the situation, for example, for Amazon AWS >> instances. The result is, the xen balloon will always immediately >> hotplug some memory at boot, but then make only (max_pfn - >> get_num_physpages()) available to the system. >> >> This balloon-hotplugged memory can cause problems, if the hypervisor >> wasn't expecting it; specifically, the system's physical page >> addresses now will exceed the e820 map's max_pfn, due to the >> balloon-hotplugged pages; if the hypervisor isn't expecting pt-device >> DMA to/from those physical pages above the e820 max_pfn, it causes >> problems. 
For example: >> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1668129 >> >> The additional small amount of balloon memory can cause other problems >> as well, for example: >> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457 >> >> Anyway, I'd like to ask, was the original commit added because >> hypervisors are supposed to set their balloon target to the guest >> system's number of phys pages (max_pfn - holes)? The mailing list >> discussion and commit description seem to indicate that. > > > > IIRC the problem that this was trying to fix was that since max_pfn includes > holes, upon booting we'd immediately balloon down by the (typically, MMIO) > hole size. > > If you boot a guest with ~4+GB memory you should see this. > > >> However I'm >> not sure how that is possible, because the kernel reserves its own >> holes, regardless of any predefined holes in the e820 map; for >> example, the kernel reserves 64k (by default) at phys addr 0 (the >> amount of reservation is configurable via CONFIG_X86_RESERVE_LOW). So >> the hypervisor really has no way to know what the "right" target to >> specify is; unless it knows the exact guest OS and kernel version, and >> kernel config values, it will never be able to correctly specify its >> target to be exactly (e820 max_pfn - all holes). >> >> Should this commit be reverted? Should the xen balloon target be >> adjusted based on kernel-added e820 holes? > > > I think the second one but shouldn't current_pages be updated, and not the > target? The latter is set by Xen (toolstack, via xenstore usually). > > Also, the bugs above (at least one of them) talk about NVMe and I wonder > whether the memory that they add is of RAM type --- I believe it has its own > type and so perhaps that introduces additional inconsistencies. AWS may have > added their own support for that, which we don't have upstream yet. The type of memory doesn't have anything to do with it. 
The problem with NVMe is it's a passthrough device, so the guest talks directly to the NVMe controller and does DMA with it. But the hypervisor does swiotlb translation between the guest physical memory, and the host physical memory, so that the NVMe device can correctly DMA to the right memory in the host. However, the hypervisor only has the guest's physical memory up to the max e820 pfn mapped; it didn't expect the balloon driver to hotplug any additional memory above the e820 max pfn, so when the NVMe driver in the guest tries to tell the NVMe controller to DMA to that balloon-hotplugged memory, the hypervisor fails the NVMe request, because it can't do the guest-to-host phys mem mapping, since the guest phys address is outside the expected max range. > > -boris > > > >> Should something else be >> done? >> >> For context, Amazon Linux has simply disabled Xen ballooning >> completely. Likewise, we're planning to disable Xen ballooning in the >> Ubuntu kernel for Amazon AWS-specific kernels (but not for non-AWS >> Ubuntu kernels). However, if reverting this patch makes sense in a >> bigger context (i.e. Xen users besides AWS), that would allow more >> Ubuntu kernels to work correctly in AWS instances. >> > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: maybe revert commit c275a57f5ec3 "xen/balloon: Set balloon's initial state to number of existing RAM pages" 2017-03-24 20:34 ` Dan Streetman @ 2017-03-24 21:10 ` Konrad Rzeszutek Wilk 2017-03-24 21:26 ` Dan Streetman 0 siblings, 1 reply; 18+ messages in thread From: Konrad Rzeszutek Wilk @ 2017-03-24 21:10 UTC (permalink / raw) To: Dan Streetman Cc: Boris Ostrovsky, David Vrabel, Juergen Gross, xen-devel, linux-kernel On Fri, Mar 24, 2017 at 04:34:23PM -0400, Dan Streetman wrote: > On Wed, Mar 22, 2017 at 10:13 PM, Boris Ostrovsky > <boris.ostrovsky@oracle.com> wrote: > > > > > > On 03/22/2017 05:16 PM, Dan Streetman wrote: > >> > >> I have a question about a problem introduced by this commit: > >> c275a57f5ec3056f732843b11659d892235faff7 > >> "xen/balloon: Set balloon's initial state to number of existing RAM pages" > >> > >> It changed the xen balloon current_pages calculation to start with the > >> number of physical pages in the system, instead of max_pfn. Since > >> get_num_physpages() does not include holes, it's always less than the > >> e820 map's max_pfn. > >> > >> However, the problem that commit introduced is, if the hypervisor sets > >> the balloon target to equal to the e820 map's max_pfn, then the > >> balloon target will *always* be higher than the initial current pages. > >> Even if the hypervisor sets the target to (e820 max_pfn - holes), if > >> the OS adds any holes, the balloon target will be higher than the > >> current pages. This is the situation, for example, for Amazon AWS > >> instances. The result is, the xen balloon will always immediately > >> hotplug some memory at boot, but then make only (max_pfn - > >> get_num_physpages()) available to the system. 
> >> > >> This balloon-hotplugged memory can cause problems, if the hypervisor > >> wasn't expecting it; specifically, the system's physical page > >> addresses now will exceed the e820 map's max_pfn, due to the > >> balloon-hotplugged pages; if the hypervisor isn't expecting pt-device > >> DMA to/from those physical pages above the e820 max_pfn, it causes > >> problems. For example: > >> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1668129 > >> > >> The additional small amount of balloon memory can cause other problems > >> as well, for example: > >> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457 > >> > >> Anyway, I'd like to ask, was the original commit added because > >> hypervisors are supposed to set their balloon target to the guest > >> system's number of phys pages (max_pfn - holes)? The mailing list > >> discussion and commit description seem to indicate that. > > > > > > > > IIRC the problem that this was trying to fix was that since max_pfn includes > > holes, upon booting we'd immediately balloon down by the (typically, MMIO) > > hole size. > > > > If you boot a guest with ~4+GB memory you should see this. > > > > > >> However I'm > >> not sure how that is possible, because the kernel reserves its own > >> holes, regardless of any predefined holes in the e820 map; for > >> example, the kernel reserves 64k (by default) at phys addr 0 (the > >> amount of reservation is configurable via CONFIG_X86_RESERVE_LOW). So > >> the hypervisor really has no way to know what the "right" target to > >> specify is; unless it knows the exact guest OS and kernel version, and > >> kernel config values, it will never be able to correctly specify its > >> target to be exactly (e820 max_pfn - all holes). > >> > >> Should this commit be reverted? Should the xen balloon target be > >> adjusted based on kernel-added e820 holes? > > > > > > I think the second one but shouldn't current_pages be updated, and not the > > target? 
The latter is set by Xen (toolstack, via xenstore usually). > > > > Also, the bugs above (at least one of them) talk about NVMe and I wonder > > whether the memory that they add is of RAM type --- I believe it has its own > > type and so perhaps that introduces additional inconsistencies. AWS may have > > added their own support for that, which we don't have upstream yet. > > The type of memory doesn't have anything to do with it. > > The problem with NVMe is it's a passthrough device, so the guest talks > directly to the NVMe controller and does DMA with it. But the > hypervisor does swiotlb translation between the guest physical memory, Um, the hypervisor does not have SWIOTLB support, only IOMMU support. > and the host physical memory, so that the NVMe device can correctly > DMA to the right memory in the host. > > However, the hypervisor only has the guest's physical memory up to the > max e820 pfn mapped; it didn't expect the balloon driver to hotplug > any additional memory above the e820 max pfn, so when the NVMe driver > in the guest tries to tell the NVMe controller to DMA to that > balloon-hotplugged memory, the hypervisor fails the NVMe request, But when the memory hotplug happens the hypercalls are done to raise the max pfn. > because it can't do the guest-to-host phys mem mapping, since the > guest phys address is outside the expected max range. > > > > > > > -boris > > > > > > > >> Should something else be > >> done? > >> > >> For context, Amazon Linux has simply disabled Xen ballooning > >> completely. Likewise, we're planning to disable Xen ballooning in the > >> Ubuntu kernel for Amazon AWS-specific kernels (but not for non-AWS > >> Ubuntu kernels). However, if reverting this patch makes sense in a > >> bigger context (i.e. Xen users besides AWS), that would allow more > >> Ubuntu kernels to work correctly in AWS instances. > >> > > ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: maybe revert commit c275a57f5ec3 "xen/balloon: Set balloon's initial state to number of existing RAM pages" 2017-03-24 21:10 ` Konrad Rzeszutek Wilk @ 2017-03-24 21:26 ` Dan Streetman 2017-03-25 1:33 ` Boris Ostrovsky 0 siblings, 1 reply; 18+ messages in thread From: Dan Streetman @ 2017-03-24 21:26 UTC (permalink / raw) To: Konrad Rzeszutek Wilk Cc: Boris Ostrovsky, David Vrabel, Juergen Gross, xen-devel, linux-kernel On Fri, Mar 24, 2017 at 5:10 PM, Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> wrote: > On Fri, Mar 24, 2017 at 04:34:23PM -0400, Dan Streetman wrote: >> On Wed, Mar 22, 2017 at 10:13 PM, Boris Ostrovsky >> <boris.ostrovsky@oracle.com> wrote: >> > >> > >> > On 03/22/2017 05:16 PM, Dan Streetman wrote: >> >> >> >> I have a question about a problem introduced by this commit: >> >> c275a57f5ec3056f732843b11659d892235faff7 >> >> "xen/balloon: Set balloon's initial state to number of existing RAM pages" >> >> >> >> It changed the xen balloon current_pages calculation to start with the >> >> number of physical pages in the system, instead of max_pfn. Since >> >> get_num_physpages() does not include holes, it's always less than the >> >> e820 map's max_pfn. >> >> >> >> However, the problem that commit introduced is, if the hypervisor sets >> >> the balloon target to equal to the e820 map's max_pfn, then the >> >> balloon target will *always* be higher than the initial current pages. >> >> Even if the hypervisor sets the target to (e820 max_pfn - holes), if >> >> the OS adds any holes, the balloon target will be higher than the >> >> current pages. This is the situation, for example, for Amazon AWS >> >> instances. The result is, the xen balloon will always immediately >> >> hotplug some memory at boot, but then make only (max_pfn - >> >> get_num_physpages()) available to the system. 
>> >> >> >> This balloon-hotplugged memory can cause problems, if the hypervisor >> >> wasn't expecting it; specifically, the system's physical page >> >> addresses now will exceed the e820 map's max_pfn, due to the >> >> balloon-hotplugged pages; if the hypervisor isn't expecting pt-device >> >> DMA to/from those physical pages above the e820 max_pfn, it causes >> >> problems. For example: >> >> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1668129 >> >> >> >> The additional small amount of balloon memory can cause other problems >> >> as well, for example: >> >> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1518457 >> >> >> >> Anyway, I'd like to ask, was the original commit added because >> >> hypervisors are supposed to set their balloon target to the guest >> >> system's number of phys pages (max_pfn - holes)? The mailing list >> >> discussion and commit description seem to indicate that. >> > >> > >> > >> > IIRC the problem that this was trying to fix was that since max_pfn includes >> > holes, upon booting we'd immediately balloon down by the (typically, MMIO) >> > hole size. >> > >> > If you boot a guest with ~4+GB memory you should see this. >> > >> > >> >> However I'm >> >> not sure how that is possible, because the kernel reserves its own >> >> holes, regardless of any predefined holes in the e820 map; for >> >> example, the kernel reserves 64k (by default) at phys addr 0 (the >> >> amount of reservation is configurable via CONFIG_X86_RESERVE_LOW). So >> >> the hypervisor really has no way to know what the "right" target to >> >> specify is; unless it knows the exact guest OS and kernel version, and >> >> kernel config values, it will never be able to correctly specify its >> >> target to be exactly (e820 max_pfn - all holes). >> >> >> >> Should this commit be reverted? Should the xen balloon target be >> >> adjusted based on kernel-added e820 holes? 
>> > >> > >> > I think the second one but shouldn't current_pages be updated, and not the >> > target? The latter is set by Xen (toolstack, via xenstore usually). >> > >> > Also, the bugs above (at least one of them) talk about NVMe and I wonder >> > whether the memory that they add is of RAM type --- I believe it has its own >> > type and so perhaps that introduces additional inconsistencies. AWS may have >> > added their own support for that, which we don't have upstream yet. >> >> The type of memory doesn't have anything to do with it. >> >> The problem with NVMe is it's a passthrough device, so the guest talks >> directly to the NVMe controller and does DMA with it. But the >> hypervisor does swiotlb translation between the guest physical memory, > > Um, the hypervisor does not have SWIOTLB support, only IOMMU support. heh, well I have no special insight into Amazon's hypervisor, so I have no idea what underlying memory remapping mechanism it uses :) > >> and the host physical memory, so that the NVMe device can correctly >> DMA to the right memory in the host. >> >> However, the hypervisor only has the guest's physical memory up to the >> max e820 pfn mapped; it didn't expect the balloon driver to hotplug >> any additional memory above the e820 max pfn, so when the NVMe driver >> in the guest tries to tell the NVMe controller to DMA to that >> balloon-hotplugged memory, the hypervisor fails the NVMe request, > > But when the memory hotplug happens the hypercalls are done to > raise the max pfn. well...all I can say is it rejects DMA above the e820 range. so this very well may be a hypervisor bug, where it should add the balloon memory region to whatever does the NVMe passthrough device iommu mapping. I think we can all agree that the *ideal* situation would be, for the balloon driver to not immediately hotplug memory so it can add 11 more pages, so maybe I just need to figure out why the balloon driver thinks it needs 11 more pages, and fix that. 
> >> because it can't do the guest-to-host phys mem mapping, since the >> guest phys address is outside the expected max range. >> >> >> >> > >> > -boris >> > >> > >> > >> >> Should something else be >> >> done? >> >> >> >> For context, Amazon Linux has simply disabled Xen ballooning >> >> completely. Likewise, we're planning to disable Xen ballooning in the >> >> Ubuntu kernel for Amazon AWS-specific kernels (but not for non-AWS >> >> Ubuntu kernels). However, if reverting this patch makes sense in a >> >> bigger context (i.e. Xen users besides AWS), that would allow more >> >> Ubuntu kernels to work correctly in AWS instances. >> >> >> > ^ permalink raw reply [flat|nested] 18+ messages in thread
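The balloon worker behaviour being discussed here boils down to a credit calculation. A toy model of it (a deliberate simplification of the logic in drivers/xen/balloon.c; the real driver also hotplugs a memory section before asking Xen to populate it, which is elided, and the hypervisor stub is hypothetical):

```python
def current_credit(target_pages, current_pages):
    """Positive credit: the guest believes the hypervisor still owes it pages."""
    return target_pages - current_pages

def balloon_process(state, populate_physmap):
    """One pass of the worker: try to close the gap between target and current."""
    credit = current_credit(state["target"], state["current"])
    if credit <= 0:
        return 0
    granted = populate_physmap(credit)  # pages Xen actually backs (may be < credit)
    state["current"] += granted
    return granted

# AWS-like behaviour from this thread: Xen backs only 0xb pages and then
# returns 0 forever, so the worker keeps retrying at its maximum interval.
state = {"target": 0x200000, "current": 0x1fff9d}
grants = iter([0xb])
granted = balloon_process(state, lambda n: next(grants, 0))
print(hex(granted), hex(current_credit(state["target"], state["current"])))  # 0xb 0x58
```

With these numbers the credit never reaches zero, which is exactly the endless retry loop Dan describes.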
* Re: maybe revert commit c275a57f5ec3 "xen/balloon: Set balloon's initial state to number of existing RAM pages" 2017-03-24 21:26 ` Dan Streetman @ 2017-03-25 1:33 ` Boris Ostrovsky 2017-03-27 19:57 ` Dan Streetman 0 siblings, 1 reply; 18+ messages in thread From: Boris Ostrovsky @ 2017-03-25 1:33 UTC (permalink / raw) To: Dan Streetman, Konrad Rzeszutek Wilk Cc: Juergen Gross, xen-devel, linux-kernel > > I think we can all agree that the *ideal* situation would be, for the > balloon driver to not immediately hotplug memory so it can add 11 more > pages, so maybe I just need to figure out why the balloon driver > thinks it needs 11 more pages, and fix that. How does the new memory appear in the guest? Via online_pages()? Or is ballooning triggered from watch_target()? -boris ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: maybe revert commit c275a57f5ec3 "xen/balloon: Set balloon's initial state to number of existing RAM pages" 2017-03-25 1:33 ` Boris Ostrovsky @ 2017-03-27 19:57 ` Dan Streetman 2017-03-28 1:57 ` Boris Ostrovsky 0 siblings, 1 reply; 18+ messages in thread From: Dan Streetman @ 2017-03-27 19:57 UTC (permalink / raw) To: Boris Ostrovsky Cc: Konrad Rzeszutek Wilk, Juergen Gross, xen-devel, linux-kernel On Fri, Mar 24, 2017 at 9:33 PM, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote: > >> >> I think we can all agree that the *ideal* situation would be, for the >> balloon driver to not immediately hotplug memory so it can add 11 more >> pages, so maybe I just need to figure out why the balloon driver >> thinks it needs 11 more pages, and fix that. > > > > How does the new memory appear in the guest? Via online_pages()? > > Or is ballooning triggered from watch_target()? yes, it's triggered from watch_target() which then calls online_pages() with the new memory. I added some debug (all numbers are in hex): [ 0.500080] xen:balloon: Initialising balloon driver [ 0.503027] xen:balloon: balloon_init: current/target pages 1fff9d [ 0.504044] xen_balloon: Initialising balloon driver [ 0.508046] xen_balloon: watch_target: new target 800000 kb [ 0.508046] xen:balloon: balloon_set_new_target: target 200000 [ 0.524024] xen:balloon: current_credit: target pages 200000 current pages 1fff9d credit 63 [ 0.567055] xen:balloon: balloon_process: current_credit 63 [ 0.568005] xen:balloon: reserve_additional_memory: adding memory resource for 8000 pages [ 3.694443] online_pages: pfn 210000 nr_pages 8000 type 0 [ 3.701072] xen:balloon: current_credit: target pages 200000 current pages 1fff9d credit 63 [ 3.701074] xen:balloon: balloon_process: current_credit 63 [ 3.701075] xen:balloon: increase_reservation: nr_pages 63 [ 3.701170] xen:balloon: increase_reservation: done, current_pages 1fffa8 [ 3.701172] xen:balloon: current_credit: target pages 200000 current pages 1fffa8 credit 58 [ 
3.701173] xen:balloon: balloon_process: current_credit 58 [ 3.701173] xen:balloon: increase_reservation: nr_pages 58 [ 3.701180] xen:balloon: increase_reservation: XENMEM_populate_physmap err 0 [ 5.708085] xen:balloon: current_credit: target pages 200000 current pages 1fffa8 credit 58 [ 5.708088] xen:balloon: balloon_process: current_credit 58 [ 5.708089] xen:balloon: increase_reservation: nr_pages 58 [ 5.708106] xen:balloon: increase_reservation: XENMEM_populate_physmap err 0 [ 9.716065] xen:balloon: current_credit: target pages 200000 current pages 1fffa8 credit 58 [ 9.716068] xen:balloon: balloon_process: current_credit 58 [ 9.716069] xen:balloon: increase_reservation: nr_pages 58 [ 9.716087] xen:balloon: increase_reservation: XENMEM_populate_physmap err 0 and that continues forever at the max interval (32), since max_retry_count is unlimited. So I think I understand things now; first, the current_pages is set properly based on the e820 map: $ dmesg|grep -i e820 [ 0.000000] e820: BIOS-provided physical RAM map: [ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009dfff] usable [ 0.000000] BIOS-e820: [mem 0x000000000009e000-0x000000000009ffff] reserved [ 0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved [ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000efffffff] usable [ 0.000000] BIOS-e820: [mem 0x00000000fc000000-0x00000000ffffffff] reserved [ 0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000020fffffff] usable [ 0.000000] e820: update [mem 0x00000000-0x00000fff] usable ==> reserved [ 0.000000] e820: remove [mem 0x000a0000-0x000fffff] usable [ 0.000000] e820: last_pfn = 0x210000 max_arch_pfn = 0x400000000 [ 0.000000] e820: last_pfn = 0xf0000 max_arch_pfn = 0x400000000 [ 0.000000] e820: [mem 0xf0000000-0xfbffffff] available for PCI devices [ 0.528007] e820: reserve RAM buffer [mem 0x0009e000-0x0009ffff] ubuntu@ip-172-31-60-112:~$ printf "%x\n" $[ 0x210000 - 0x100000 + 0xf0000 - 0x100 + 0x9e - 1 ] 1fff9d then, the xen 
balloon notices its target has been set to 200000 by the hypervisor. That target does account for the hole at 0xf0000 to 0x100000, but it doesn't account for the hole at 0xe0 to 0x100 ( 0x20 pages), nor the hole at 0x9e to 0xa0 ( 2 pages ), nor the unlisted hole (that the kernel removes) at 0xa0 to 0xe0 ( 0x40 pages). That's 0x62 pages, plus the 1-page hole at addr 0 that the kernel always reserves, is 0x63 pages of holes, which aren't accounted for in the hypervisor's target. so the balloon driver hotplugs the memory, and tries to increase its reservation to provide the needed pages to get the current_pages up to the target. However, when it calls the hypervisor to populate the physmap, the hypervisor only allows 11 (0xb) pages to be populated; all calls after that get back 0 from the hypervisor. Do you think the hypervisor's balloon target should account for the e820 holes (and for the kernel's added hole at addr 0)? Alternately/additionally, if the hypervisor doesn't want to support ballooning, should it just return error from the call to populate the physmap, and not allow those 11 pages? At this point, it doesn't seem to me like the kernel is doing anything wrong, correct? ^ permalink raw reply [flat|nested] 18+ messages in thread
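Dan's hole accounting above can be replayed mechanically. A small sketch (the pfn ranges are transcribed from the dmesg output in this message; the variable names are mine):

```python
# Usable RAM ranges as (start_pfn, end_pfn) after the kernel's own trimming:
# page 0 is reserved (CONFIG_X86_RESERVE_LOW), pfns 0x9e-0x100 and the MMIO
# hole at 0xf0000-0x100000 are not RAM.
E820_USABLE = [
    (0x1,      0x9e),      # low memory, minus the reserved first page
    (0x100,    0xf0000),   # 1 MiB .. 0xefffffff
    (0x100000, 0x210000),  # 4 GiB .. 0x20fffffff
]

num_physpages = sum(end - start for start, end in E820_USABLE)
target = 0x200000                 # toolstack target: e820 max_pfn minus the MMIO hole
credit = target - num_physpages   # pages the balloon driver thinks it is owed

print(hex(num_physpages), hex(credit))  # 0x1fff9d 0x63
```

The 0x63 pages of credit are exactly the small holes the toolstack's target does not know about.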
* Re: maybe revert commit c275a57f5ec3 "xen/balloon: Set balloon's initial state to number of existing RAM pages" 2017-03-27 19:57 ` Dan Streetman @ 2017-03-28 1:57 ` Boris Ostrovsky 2017-03-28 8:08 ` [Xen-devel] " Jan Beulich 0 siblings, 1 reply; 18+ messages in thread From: Boris Ostrovsky @ 2017-03-28 1:57 UTC (permalink / raw) To: Dan Streetman Cc: Konrad Rzeszutek Wilk, Juergen Gross, xen-devel, linux-kernel On 03/27/2017 03:57 PM, Dan Streetman wrote: > On Fri, Mar 24, 2017 at 9:33 PM, Boris Ostrovsky > <boris.ostrovsky@oracle.com> wrote: >> >>> >>> I think we can all agree that the *ideal* situation would be, for the >>> balloon driver to not immediately hotplug memory so it can add 11 more >>> pages, so maybe I just need to figure out why the balloon driver >>> thinks it needs 11 more pages, and fix that. >> >> >> >> How does the new memory appear in the guest? Via online_pages()? >> >> Or is ballooning triggered from watch_target()? > > yes, it's triggered from watch_target() which then calls > online_pages() with the new memory. 
I added some debug (all numbers > are in hex): > > [ 0.500080] xen:balloon: Initialising balloon driver > [ 0.503027] xen:balloon: balloon_init: current/target pages 1fff9d > [ 0.504044] xen_balloon: Initialising balloon driver > [ 0.508046] xen_balloon: watch_target: new target 800000 kb > [ 0.508046] xen:balloon: balloon_set_new_target: target 200000 > [ 0.524024] xen:balloon: current_credit: target pages 200000 > current pages 1fff9d credit 63 > [ 0.567055] xen:balloon: balloon_process: current_credit 63 > [ 0.568005] xen:balloon: reserve_additional_memory: adding memory > resource for 8000 pages > [ 3.694443] online_pages: pfn 210000 nr_pages 8000 type 0 > [ 3.701072] xen:balloon: current_credit: target pages 200000 > current pages 1fff9d credit 63 > [ 3.701074] xen:balloon: balloon_process: current_credit 63 > [ 3.701075] xen:balloon: increase_reservation: nr_pages 63 > [ 3.701170] xen:balloon: increase_reservation: done, current_pages 1fffa8 > [ 3.701172] xen:balloon: current_credit: target pages 200000 > current pages 1fffa8 credit 58 > [ 3.701173] xen:balloon: balloon_process: current_credit 58 > [ 3.701173] xen:balloon: increase_reservation: nr_pages 58 > [ 3.701180] xen:balloon: increase_reservation: XENMEM_populate_physmap err 0 > [ 5.708085] xen:balloon: current_credit: target pages 200000 > current pages 1fffa8 credit 58 > [ 5.708088] xen:balloon: balloon_process: current_credit 58 > [ 5.708089] xen:balloon: increase_reservation: nr_pages 58 > [ 5.708106] xen:balloon: increase_reservation: XENMEM_populate_physmap err 0 > [ 9.716065] xen:balloon: current_credit: target pages 200000 > current pages 1fffa8 credit 58 > [ 9.716068] xen:balloon: balloon_process: current_credit 58 > [ 9.716069] xen:balloon: increase_reservation: nr_pages 58 > [ 9.716087] xen:balloon: increase_reservation: XENMEM_populate_physmap err 0 > > > and that continues forever at the max interval (32), since > max_retry_count is unlimited. 
So I think I understand things now; > first, the current_pages is set properly based on the e820 map: > > $ dmesg|grep -i e820 > [ 0.000000] e820: BIOS-provided physical RAM map: > [ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009dfff] usable > [ 0.000000] BIOS-e820: [mem 0x000000000009e000-0x000000000009ffff] reserved > [ 0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved > [ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x00000000efffffff] usable > [ 0.000000] BIOS-e820: [mem 0x00000000fc000000-0x00000000ffffffff] reserved > [ 0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000020fffffff] usable > [ 0.000000] e820: update [mem 0x00000000-0x00000fff] usable ==> reserved > [ 0.000000] e820: remove [mem 0x000a0000-0x000fffff] usable > [ 0.000000] e820: last_pfn = 0x210000 max_arch_pfn = 0x400000000 > [ 0.000000] e820: last_pfn = 0xf0000 max_arch_pfn = 0x400000000 > [ 0.000000] e820: [mem 0xf0000000-0xfbffffff] available for PCI devices > [ 0.528007] e820: reserve RAM buffer [mem 0x0009e000-0x0009ffff] > ubuntu@ip-172-31-60-112:~$ printf "%x\n" $[ 0x210000 - 0x100000 + > 0xf0000 - 0x100 + 0x9e - 1 ] > 1fff9d > > > then, the xen balloon notices its target has been set to 200000 by the > hypervisor. That target does account for the hole at 0xf0000 to > 0x100000, but it doesn't account for the hole at 0xe0 to 0x100 ( 0x20 > pages), nor the hole at 0x9e to 0xa0 ( 2 pages ), nor the unlisted > hole (that the kernel removes) at 0xa0 to 0xe0 ( 0x40 pages). That's > 0x62 pages, plus the 1-page hole at addr 0 that the kernel always > reserves, is 0x63 pages of holes, which aren't accounted for in the > hypervisor's target. > > so the balloon driver hotplugs the memory, and tries to increase its > reservation to provide the needed pages to get the current_pages up to > the target. 
However, when it calls the hypervisor to populate the > physmap, the hypervisor only allows 11 (0xb) pages to be populated; > all calls after that get back 0 from the hypervisor. > > Do you think the hypervisor's balloon target should account for the > e820 holes (and for the kernel's added hole at addr 0)? > Alternately/additionally, if the hypervisor doesn't want to support > ballooning, should it just return error from the call to populate the > physmap, and not allow those 11 pages? > > At this point, it doesn't seem to me like the kernel is doing anything > wrong, correct? > I think there is indeed a disconnect between target memory (provided by the toolstack) and current memory (i.e actual pages available to the guest). For example [ 0.000000] BIOS-e820: [mem 0x000000000009e000-0x000000000009ffff] reserved [ 0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved are missed in target calculation. The hvmloader marks them as RESERVED (in build_e820_table()) but target value is not aware of this action. And then the same problem repeats when kernel removes 0x000a0000-0x000fffff chunk. (BTW, this is all happening before the new 0x8000 pages are onlined, which takes places much later and is a separate and what looks to me an unrelated event). -boris ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Xen-devel] maybe revert commit c275a57f5ec3 "xen/balloon: Set balloon's initial state to number of existing RAM pages" 2017-03-28 1:57 ` Boris Ostrovsky @ 2017-03-28 8:08 ` Jan Beulich 2017-03-28 14:27 ` Boris Ostrovsky 0 siblings, 1 reply; 18+ messages in thread From: Jan Beulich @ 2017-03-28 8:08 UTC (permalink / raw) To: Dan Streetman, Boris Ostrovsky; +Cc: xen-devel, Juergen Gross, linux-kernel >>> On 28.03.17 at 03:57, <boris.ostrovsky@oracle.com> wrote: > I think there is indeed a disconnect between target memory (provided by > the toolstack) and current memory (i.e actual pages available to the guest). > > For example > > [ 0.000000] BIOS-e820: [mem 0x000000000009e000-0x000000000009ffff] > reserved > [ 0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] > reserved > > are missed in target calculation. The hvmloader marks them as RESERVED > (in build_e820_table()) but target value is not aware of this action. > > And then the same problem repeats when kernel removes > 0x000a0000-0x000fffff chunk. But this is all in-guest behavior, i.e. nothing an entity outside the guest (tool stack or hypervisor) should need to be aware of. That said, there is still room for improvement in the tools I think: Regions which architecturally aren't RAM (namely the 0xa0000-0xfffff range) would probably better not be accounted for as RAM as far as ballooning is concerned. In the hypervisor, otoh, all memory assigned to the guest (i.e. including such backing ROMs) needs to be accounted. Jan ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Xen-devel] maybe revert commit c275a57f5ec3 "xen/balloon: Set balloon's initial state to number of existing RAM pages" 2017-03-28 8:08 ` [Xen-devel] " Jan Beulich @ 2017-03-28 14:27 ` Boris Ostrovsky 2017-03-28 15:04 ` Jan Beulich 2017-03-28 15:30 ` Juergen Gross 0 siblings, 2 replies; 18+ messages in thread From: Boris Ostrovsky @ 2017-03-28 14:27 UTC (permalink / raw) To: Jan Beulich, Dan Streetman; +Cc: xen-devel, Juergen Gross, linux-kernel On 03/28/2017 04:08 AM, Jan Beulich wrote: >>>> On 28.03.17 at 03:57, <boris.ostrovsky@oracle.com> wrote: >> I think there is indeed a disconnect between target memory (provided by >> the toolstack) and current memory (i.e actual pages available to the guest). >> >> For example >> >> [ 0.000000] BIOS-e820: [mem 0x000000000009e000-0x000000000009ffff] >> reserved >> [ 0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] >> reserved >> >> are missed in target calculation. The hvmloader marks them as RESERVED >> (in build_e820_table()) but target value is not aware of this action. >> >> And then the same problem repeats when kernel removes >> 0x000a0000-0x000fffff chunk. > But this is all in-guest behavior, i.e. nothing an entity outside the > guest (tool stack or hypervisor) should need to be aware of. That > said, there is still room for improvement in the tools I think: > Regions which architecturally aren't RAM (namely the > 0xa0000-0xfffff range) would probably better not be accounted > for as RAM as far as ballooning is concerned. In the hypervisor, > otoh, all memory assigned to the guest (i.e. including such backing > ROMs) needs to be accounted. On the Linux side we should not include in balloon calculations pages reserved by trim_bios_range(), i.e. (BIOS_END-BIOS_BEGIN) + 1. Which leaves hvmloader's special pages (and possibly memory under 0xA0000 which may get reserved). Can we pass this info to guests via xenstore? -boris ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Xen-devel] maybe revert commit c275a57f5ec3 "xen/balloon: Set balloon's initial state to number of existing RAM pages" 2017-03-28 14:27 ` Boris Ostrovsky @ 2017-03-28 15:04 ` Jan Beulich 2017-03-28 15:30 ` Juergen Gross 1 sibling, 0 replies; 18+ messages in thread From: Jan Beulich @ 2017-03-28 15:04 UTC (permalink / raw) To: Boris Ostrovsky; +Cc: Dan Streetman, xen-devel, Juergen Gross, linux-kernel >>> On 28.03.17 at 16:27, <boris.ostrovsky@oracle.com> wrote: > Which leaves hvmloader's special pages (and possibly memory under > 0xA0000 which may get reserved). Can we pass this info to guests via > xenstore? I'm perhaps the wrong one to ask regarding xenstore, but for in-guest communication this seems an at least strange approach to me. Jan ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Xen-devel] maybe revert commit c275a57f5ec3 "xen/balloon: Set balloon's initial state to number of existing RAM pages" 2017-03-28 14:27 ` Boris Ostrovsky 2017-03-28 15:04 ` Jan Beulich @ 2017-03-28 15:30 ` Juergen Gross 2017-03-28 16:32 ` Boris Ostrovsky 2017-07-08 0:59 ` Konrad Rzeszutek Wilk 1 sibling, 2 replies; 18+ messages in thread From: Juergen Gross @ 2017-03-28 15:30 UTC (permalink / raw) To: Boris Ostrovsky, Jan Beulich, Dan Streetman; +Cc: xen-devel, linux-kernel On 28/03/17 16:27, Boris Ostrovsky wrote: > On 03/28/2017 04:08 AM, Jan Beulich wrote: >>>>> On 28.03.17 at 03:57, <boris.ostrovsky@oracle.com> wrote: >>> I think there is indeed a disconnect between target memory (provided by >>> the toolstack) and current memory (i.e actual pages available to the guest). >>> >>> For example >>> >>> [ 0.000000] BIOS-e820: [mem 0x000000000009e000-0x000000000009ffff] >>> reserved >>> [ 0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] >>> reserved >>> >>> are missed in target calculation. The hvmloader marks them as RESERVED >>> (in build_e820_table()) but target value is not aware of this action. >>> >>> And then the same problem repeats when kernel removes >>> 0x000a0000-0x000fffff chunk. >> But this is all in-guest behavior, i.e. nothing an entity outside the >> guest (tool stack or hypervisor) should need to be aware of. That >> said, there is still room for improvement in the tools I think: >> Regions which architecturally aren't RAM (namely the >> 0xa0000-0xfffff range) would probably better not be accounted >> for as RAM as far as ballooning is concerned. In the hypervisor, >> otoh, all memory assigned to the guest (i.e. including such backing >> ROMs) needs to be accounted. > > On the Linux side we should not include in balloon calculations pages > reserved by trim_bios_range(), i.e. (BIOS_END-BIOS_BEGIN) + 1. > > Which leaves hvmloader's special pages (and possibly memory under > 0xA0000 which may get reserved). 
Can we pass this info to guests via > xenstore? I'd rather keep an internal difference between online pages and E820-map count value in the balloon driver. This should work always. Juergen ^ permalink raw reply [flat|nested] 18+ messages in thread
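Juergen's suggestion could be sketched as follows (a hedged sketch only; the field name `target_unpopulated` and the exact placement of the subtraction are my assumptions, not necessarily what an eventual patch would use):

```python
# Keep an internal delta between what the E820 map promises and what was
# actually handed to the page allocator, and subtract it before comparing
# against the toolstack-provided target.
e820_ram_pages = 0x200000            # pages the E820 map counts as RAM
online_pages   = 0x1fff9d            # pages actually given to the allocator
target_unpopulated = e820_ram_pages - online_pages   # 0x63 pages of holes

def current_credit(target, current):
    # A toolstack target of "all E820 RAM" no longer yields positive credit,
    # so boot no longer triggers a spurious memory hotplug.
    return target - target_unpopulated - current

print(hex(current_credit(0x200000, 0x1fff9d)))  # 0x0
```

With the delta tracked guest-side, the toolstack needs no knowledge of kernel-created holes, which addresses Jan's objection that this is all in-guest behaviour.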
* Re: [Xen-devel] maybe revert commit c275a57f5ec3 "xen/balloon: Set balloon's initial state to number of existing RAM pages" 2017-03-28 15:30 ` Juergen Gross @ 2017-03-28 16:32 ` Boris Ostrovsky 2017-03-29 4:36 ` Juergen Gross 2017-07-08 0:59 ` Konrad Rzeszutek Wilk 1 sibling, 1 reply; 18+ messages in thread From: Boris Ostrovsky @ 2017-03-28 16:32 UTC (permalink / raw) To: Juergen Gross, Jan Beulich, Dan Streetman; +Cc: xen-devel, linux-kernel On 03/28/2017 11:30 AM, Juergen Gross wrote: > On 28/03/17 16:27, Boris Ostrovsky wrote: >> On 03/28/2017 04:08 AM, Jan Beulich wrote: >>>>>> On 28.03.17 at 03:57, <boris.ostrovsky@oracle.com> wrote: >>>> I think there is indeed a disconnect between target memory (provided by >>>> the toolstack) and current memory (i.e actual pages available to the guest). >>>> >>>> For example >>>> >>>> [ 0.000000] BIOS-e820: [mem 0x000000000009e000-0x000000000009ffff] >>>> reserved >>>> [ 0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] >>>> reserved >>>> >>>> are missed in target calculation. The hvmloader marks them as RESERVED >>>> (in build_e820_table()) but target value is not aware of this action. >>>> >>>> And then the same problem repeats when kernel removes >>>> 0x000a0000-0x000fffff chunk. >>> But this is all in-guest behavior, i.e. nothing an entity outside the >>> guest (tool stack or hypervisor) should need to be aware of. That >>> said, there is still room for improvement in the tools I think: >>> Regions which architecturally aren't RAM (namely the >>> 0xa0000-0xfffff range) would probably better not be accounted >>> for as RAM as far as ballooning is concerned. In the hypervisor, >>> otoh, all memory assigned to the guest (i.e. including such backing >>> ROMs) needs to be accounted. >> On the Linux side we should not include in balloon calculations pages >> reserved by trim_bios_range(), i.e. (BIOS_END-BIOS_BEGIN) + 1. 
>> >> Which leaves hvmloader's special pages (and possibly memory under >> 0xA0000 which may get reserved). Can we pass this info to guests via >> xenstore? > I'd rather keep an internal difference between online pages and E820-map > count value in the balloon driver. This should work always. We could indeed base calculation on initial state of e820 and not count the holes toward ballooning needs. I am not sure this will work for memory unplug though, where a hole can be created in the map and we will be supposed to handle disappearing memory via ballooning. Or am I creating a problem where none exists? -boris ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Xen-devel] maybe revert commit c275a57f5ec3 "xen/balloon: Set balloon's initial state to number of existing RAM pages" 2017-03-28 16:32 ` Boris Ostrovsky @ 2017-03-29 4:36 ` Juergen Gross 0 siblings, 0 replies; 18+ messages in thread From: Juergen Gross @ 2017-03-29 4:36 UTC (permalink / raw) To: Boris Ostrovsky, Jan Beulich, Dan Streetman; +Cc: xen-devel, linux-kernel On 28/03/17 18:32, Boris Ostrovsky wrote: > On 03/28/2017 11:30 AM, Juergen Gross wrote: >> On 28/03/17 16:27, Boris Ostrovsky wrote: >>> On 03/28/2017 04:08 AM, Jan Beulich wrote: >>>>>>> On 28.03.17 at 03:57, <boris.ostrovsky@oracle.com> wrote: >>>>> I think there is indeed a disconnect between target memory (provided by >>>>> the toolstack) and current memory (i.e actual pages available to the guest). >>>>> >>>>> For example >>>>> >>>>> [ 0.000000] BIOS-e820: [mem 0x000000000009e000-0x000000000009ffff] >>>>> reserved >>>>> [ 0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] >>>>> reserved >>>>> >>>>> are missed in target calculation. The hvmloader marks them as RESERVED >>>>> (in build_e820_table()) but target value is not aware of this action. >>>>> >>>>> And then the same problem repeats when kernel removes >>>>> 0x000a0000-0x000fffff chunk. >>>> But this is all in-guest behavior, i.e. nothing an entity outside the >>>> guest (tool stack or hypervisor) should need to be aware of. That >>>> said, there is still room for improvement in the tools I think: >>>> Regions which architecturally aren't RAM (namely the >>>> 0xa0000-0xfffff range) would probably better not be accounted >>>> for as RAM as far as ballooning is concerned. In the hypervisor, >>>> otoh, all memory assigned to the guest (i.e. including such backing >>>> ROMs) needs to be accounted. >>> On the Linux side we should not include in balloon calculations pages >>> reserved by trim_bios_range(), i.e. (BIOS_END-BIOS_BEGIN) + 1. 
>>> >>> Which leaves hvmloader's special pages (and possibly memory under >>> 0xA0000 which may get reserved). Can we pass this info to guests via >>> xenstore? >> I'd rather keep an internal difference between online pages and E820-map >> count value in the balloon driver. This should work always. > > We could indeed base calculation on initial state of e820 and not count > the holes toward ballooning needs. I am not sure this will work for > memory unplug though, where a hole can be created in the map and we will > be supposed to handle disappearing memory via ballooning. > > Or am I creating a problem where none exists? I'm rather sure memory has to be offlined before being deleted from the E820 map. Juergen ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Xen-devel] maybe revert commit c275a57f5ec3 "xen/balloon: Set balloon's initial state to number of existing RAM pages" 2017-03-28 15:30 ` Juergen Gross 2017-03-28 16:32 ` Boris Ostrovsky @ 2017-07-08 0:59 ` Konrad Rzeszutek Wilk 2017-07-09 13:16 ` Juergen Gross 1 sibling, 1 reply; 18+ messages in thread From: Konrad Rzeszutek Wilk @ 2017-07-08 0:59 UTC (permalink / raw) To: Juergen Gross Cc: Boris Ostrovsky, Jan Beulich, Dan Streetman, xen-devel, linux-kernel On Tue, Mar 28, 2017 at 05:30:24PM +0200, Juergen Gross wrote: > On 28/03/17 16:27, Boris Ostrovsky wrote: > > On 03/28/2017 04:08 AM, Jan Beulich wrote: > >>>>> On 28.03.17 at 03:57, <boris.ostrovsky@oracle.com> wrote: > >>> I think there is indeed a disconnect between target memory (provided by > >>> the toolstack) and current memory (i.e actual pages available to the guest). > >>> > >>> For example > >>> > >>> [ 0.000000] BIOS-e820: [mem 0x000000000009e000-0x000000000009ffff] > >>> reserved > >>> [ 0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] > >>> reserved > >>> > >>> are missed in target calculation. The hvmloader marks them as RESERVED > >>> (in build_e820_table()) but target value is not aware of this action. > >>> > >>> And then the same problem repeats when kernel removes > >>> 0x000a0000-0x000fffff chunk. > >> But this is all in-guest behavior, i.e. nothing an entity outside the > >> guest (tool stack or hypervisor) should need to be aware of. That > >> said, there is still room for improvement in the tools I think: > >> Regions which architecturally aren't RAM (namely the > >> 0xa0000-0xfffff range) would probably better not be accounted > >> for as RAM as far as ballooning is concerned. In the hypervisor, > >> otoh, all memory assigned to the guest (i.e. including such backing > >> ROMs) needs to be accounted. > > > > On the Linux side we should not include in balloon calculations pages > > reserved by trim_bios_range(), i.e. (BIOS_END-BIOS_BEGIN) + 1. 
> > > > Which leaves hvmloader's special pages (and possibly memory under > > 0xA0000 which may get reserved). Can we pass this info to guests via > > xenstore? > > I'd rather keep an internal difference between online pages and E820-map > count value in the balloon driver. This should work always. Did we ever come with a patch for this? ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [Xen-devel] maybe revert commit c275a57f5ec3 "xen/balloon: Set balloon's initial state to number of existing RAM pages" 2017-07-08 0:59 ` Konrad Rzeszutek Wilk @ 2017-07-09 13:16 ` Juergen Gross 0 siblings, 0 replies; 18+ messages in thread From: Juergen Gross @ 2017-07-09 13:16 UTC (permalink / raw) To: Konrad Rzeszutek Wilk Cc: Boris Ostrovsky, Jan Beulich, Dan Streetman, xen-devel, linux-kernel On 08/07/17 02:59, Konrad Rzeszutek Wilk wrote: > On Tue, Mar 28, 2017 at 05:30:24PM +0200, Juergen Gross wrote: >> On 28/03/17 16:27, Boris Ostrovsky wrote: >>> On 03/28/2017 04:08 AM, Jan Beulich wrote: >>>>>>> On 28.03.17 at 03:57, <boris.ostrovsky@oracle.com> wrote: >>>>> I think there is indeed a disconnect between target memory (provided by >>>>> the toolstack) and current memory (i.e actual pages available to the guest). >>>>> >>>>> For example >>>>> >>>>> [ 0.000000] BIOS-e820: [mem 0x000000000009e000-0x000000000009ffff] >>>>> reserved >>>>> [ 0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] >>>>> reserved >>>>> >>>>> are missed in target calculation. The hvmloader marks them as RESERVED >>>>> (in build_e820_table()) but target value is not aware of this action. >>>>> >>>>> And then the same problem repeats when kernel removes >>>>> 0x000a0000-0x000fffff chunk. >>>> But this is all in-guest behavior, i.e. nothing an entity outside the >>>> guest (tool stack or hypervisor) should need to be aware of. That >>>> said, there is still room for improvement in the tools I think: >>>> Regions which architecturally aren't RAM (namely the >>>> 0xa0000-0xfffff range) would probably better not be accounted >>>> for as RAM as far as ballooning is concerned. In the hypervisor, >>>> otoh, all memory assigned to the guest (i.e. including such backing >>>> ROMs) needs to be accounted. >>> >>> On the Linux side we should not include in balloon calculations pages >>> reserved by trim_bios_range(), i.e. (BIOS_END-BIOS_BEGIN) + 1. 
>>> >>> Which leaves hvmloader's special pages (and possibly memory under >>> 0xA0000 which may get reserved). Can we pass this info to guests via >>> xenstore? >> >> I'd rather keep an internal difference between online pages and E820-map >> count value in the balloon driver. This should work always. > > Did we ever come with a patch for this? Yes, I've sent V2 recently: https://lists.xen.org/archives/html/xen-devel/2017-07/msg00530.html Juergen ^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads: [~2017-07-09 13:16 UTC | newest]

Thread overview: 18+ messages
2017-03-22 21:16 maybe revert commit c275a57f5ec3 "xen/balloon: Set balloon's initial state to number of existing RAM pages" Dan Streetman
2017-03-23  2:13 ` Boris Ostrovsky
2017-03-23  7:56 ` Juergen Gross
2017-03-24 20:30 ` Dan Streetman
2017-03-24 20:34 ` Dan Streetman
2017-03-24 21:10 ` Konrad Rzeszutek Wilk
2017-03-24 21:26 ` Dan Streetman
2017-03-25  1:33 ` Boris Ostrovsky
2017-03-27 19:57 ` Dan Streetman
2017-03-28  1:57 ` Boris Ostrovsky
2017-03-28  8:08 ` [Xen-devel] " Jan Beulich
2017-03-28 14:27 ` Boris Ostrovsky
2017-03-28 15:04 ` Jan Beulich
2017-03-28 15:30 ` Juergen Gross
2017-03-28 16:32 ` Boris Ostrovsky
2017-03-29  4:36 ` Juergen Gross
2017-07-08  0:59 ` Konrad Rzeszutek Wilk
2017-07-09 13:16 ` Juergen Gross