* Excluding init_on_free for pages for initial balloon down (Xen)
@ 2026-03-01 15:04 Marek Marczykowski-Górecki
2026-03-02 6:36 ` Jürgen Groß
0 siblings, 1 reply; 9+ messages in thread
From: Marek Marczykowski-Górecki @ 2026-03-01 15:04 UTC (permalink / raw)
To: xen-devel
Cc: Juergen Gross, Boris Ostrovsky, Andrew Morton, David Hildenbrand
Hi,
Some time ago I made a change to disable scrubbing pages that are
ballooned out during system boot. I'll paste the whole commit message as
it's relevant here:
197ecb3802c0 xen/balloon: add runtime control for scrubbing ballooned out pages
Scrubbing pages on initial balloon down can take some time, especially
in nested virtualization case (nested EPT is slow). When HVM/PVH guest is
started with memory= significantly lower than maxmem=, all the extra
pages will be scrubbed before returning to Xen. But since most of them
weren't used at all at that point, Xen needs to populate them first
(from populate-on-demand pool). In nested virt case (Xen inside KVM)
this slows down the guest boot by 15-30s with just 1.5GB needed to be
returned to Xen.
Add runtime parameter to enable/disable it, to allow initially disabling
scrubbing, then enable it back during boot (for example in initramfs).
Such usage relies on assumption that a) most pages ballooned out during
initial boot weren't used at all, and b) even if they were, very few
secrets are in the guest at that time (before any serious userspace
kicks in).
Convert CONFIG_XEN_SCRUB_PAGES to CONFIG_XEN_SCRUB_PAGES_DEFAULT (also
enabled by default), controlling default value for the new runtime
switch.
Now, I face the same issue with init_on_free/init_on_alloc (not sure
which one applies here, probably the latter one), which several
distributions enable by default. The result is (see timestamps):
[2026-02-24 01:12:55] [ 7.485151] xen:balloon: Waiting for initial ballooning down having finished.
[2026-02-24 01:14:14] [ 86.581510] xen:balloon: Initial ballooning down finished.
But here the situation is a bit more complicated:
init_on_free/init_on_alloc apply to all pages, not just those for the
balloon driver. I see two approaches to solve the issue:
1. Similar to xen_scrub_pages=, add a runtime switch for
init_on_free/init_on_alloc, then force them off during boot, and
re-enable them early in initramfs.
2. Somehow adjust the balloon driver to bypass init_on_alloc when ballooning
a page out.
The first approach is likely easier to implement, but also has some
drawbacks: it may leave some kernel structures that are allocated
early with garbage data in uninitialized places. While that may
not matter during early boot, such structures may survive for quite some
time, and an attacker might use them later on to exploit some other
bug. This wasn't really a concern with xen_scrub_pages, as those pages
were immediately ballooned out.
The second approach sounds architecturally better, and maybe
init_on_alloc could be always bypassed during balloon out? The balloon
driver can scrub the page on its own already (which is enabled by
default). That of course assumes the issue is only about init_on_alloc,
not init_on_free (or both) - which I haven't really confirmed yet...
If going this way, I see the balloon driver does basically
alloc_page(GFP_BALLOON), where GFP_BALLOON is:
/* When ballooning out (allocating memory to return to Xen) we don't really
   want the kernel to try too hard since that can trigger the oom killer. */
#define GFP_BALLOON \
	(GFP_HIGHUSER | __GFP_NOWARN | __GFP_NORETRY | __GFP_NOMEMALLOC)
Would that be about adding some new flag here? Or maybe there is already
one for this purpose?
Any opinions?
PS issue tracked at https://github.com/QubesOS/qubes-issues/issues/10723
--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
* Re: Excluding init_on_free for pages for initial balloon down (Xen)
From: Jürgen Groß @ 2026-03-02 6:36 UTC
To: Marek Marczykowski-Górecki, xen-devel
Cc: Boris Ostrovsky, Andrew Morton, David Hildenbrand
On 01.03.26 16:04, Marek Marczykowski-Górecki wrote:
> [...]
> If going this way, I see the balloon driver does basically
> alloc_page(GFP_BALLOON), where GFP_BALLOON is:
>
> /* When ballooning out (allocating memory to return to Xen) we don't really
>    want the kernel to try too hard since that can trigger the oom killer. */
> #define GFP_BALLOON \
> 	(GFP_HIGHUSER | __GFP_NOWARN | __GFP_NORETRY | __GFP_NOMEMALLOC)
>
> Would that be about adding some new flag here? Or maybe there is already
> one for this purpose?

There doesn't seem to be a flag for that.

But I think adding a new flag __GFP_NO_INIT and testing that in
want_init_on_alloc() _before_ checking CONFIG_INIT_ON_ALLOC_DEFAULT_ON
would be a sensible approach.

> Any opinions?

You are aware of the "init_on_alloc" boot parameter? So if this is fine
for you, you could just use approach 1 above without any kernel patches
needed.

Juergen
* Re: Excluding init_on_free for pages for initial balloon down (Xen)
From: David Hildenbrand (Arm) @ 2026-03-02 8:40 UTC
To: Jürgen Groß, Marek Marczykowski-Górecki, xen-devel
Cc: Boris Ostrovsky, Andrew Morton
On 3/2/26 07:36, Jürgen Groß wrote:
> On 01.03.26 16:04, Marek Marczykowski-Górecki wrote:
>> [...]
>> Would that be about adding some new flag here? Or maybe there is already
>> one for this purpose?
>
> There doesn't seem to be a flag for that.
>
> But I think adding a new flag __GFP_NO_INIT and testing that in
> want_init_on_alloc() _before_ checking CONFIG_INIT_ON_ALLOC_DEFAULT_ON
> would be a sensible approach.

People argued against such flags in the past, because they will simply get
abused by arbitrary drivers that want to be smart.

Whatever leaves the buddy shall be zeroed out. If double-zeroing
happens, the second zeroing could get optimized out by checking
something like user_alloc_needs_zeroing().

See mm/huge_memory.c:vma_alloc_anon_folio_pmd() as an example where we
avoid double-zeroing.

>> Any opinions?
>
> You are aware of the "init_on_alloc" boot parameter? So if this is fine
> for you, you could just use approach 1 above without any kernel patches
> needed.

I don't think init_on_alloc can be enabled after boot. IIUC, 1) would
require a runtime switch.

--
Cheers,
David
* Re: Excluding init_on_free for pages for initial balloon down (Xen)
From: Marek Marczykowski-Górecki @ 2026-03-02 11:01 UTC
To: David Hildenbrand (Arm)
Cc: Jürgen Groß, xen-devel, Boris Ostrovsky, Andrew Morton
On Mon, Mar 02, 2026 at 09:40:29AM +0100, David Hildenbrand (Arm) wrote:
> On 3/2/26 07:36, Jürgen Groß wrote:
> > [...]
> > But I think adding a new flag __GFP_NO_INIT and testing that in
> > want_init_on_alloc() _before_ checking CONFIG_INIT_ON_ALLOC_DEFAULT_ON
> > would be a sensible approach.
>
> People argued against such flags in the past, because they will simply get
> abused by arbitrary drivers that want to be smart.

Could it be named differently to discourage such usage? Maybe
__GFP_BALLOON_OUT?

> Whatever leaves the buddy shall be zeroed out. If double-zeroing
> happens, the second zeroing could get optimized out by checking
> something like user_alloc_needs_zeroing().
>
> See mm/huge_memory.c:vma_alloc_anon_folio_pmd() as an example where we
> avoid double-zeroing.

It isn't just reducing double-zeroing to single zeroing. It's about
avoiding zeroing such pages at all. If a domU is started with
populate-on-demand, many (sometimes most) of its pages are populated in
EPT. The idea of PoD is to start the guest with a high static memory size
but a low actual allocation, and fake it until the balloon driver kicks in
and makes the domU really not use more pages than it has. When the balloon
driver tries to return those pages to the hypervisor, normally it would just
take unallocated pages one by one and make Linux not use them. But if _any_
zeroing is happening, each page first needs to be mapped to the guest by
the hypervisor (one trip through EPT), just to be removed again a
moment later...

> > You are aware of the "init_on_alloc" boot parameter? So if this is fine
> > for you, you could just use approach 1 above without any kernel patches
> > needed.
>
> I don't think init_on_alloc can be enabled after boot. IIUC, 1) would
> require a runtime switch.

Indeed.

--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
* Re: Excluding init_on_free for pages for initial balloon down (Xen)
From: Jan Beulich @ 2026-03-02 11:05 UTC
To: Marek Marczykowski-Górecki
Cc: Jürgen Groß, xen-devel, Boris Ostrovsky, Andrew Morton, David Hildenbrand (Arm)
On 02.03.2026 12:01, Marek Marczykowski-Górecki wrote:
> [...]
> It isn't just reducing double-zeroing to single zeroing. It's about
> avoiding zeroing such pages at all. If a domU is started with
> populate-on-demand, many (sometimes most) of its pages are populated in
> EPT.

ITYM "unpopulated in EPT"?

Jan
* Re: Excluding init_on_free for pages for initial balloon down (Xen)
From: Marek Marczykowski-Górecki @ 2026-03-02 11:11 UTC
To: Jan Beulich
Cc: Jürgen Groß, xen-devel, Boris Ostrovsky, Andrew Morton, David Hildenbrand (Arm)
On Mon, Mar 02, 2026 at 12:05:57PM +0100, Jan Beulich wrote:
> On 02.03.2026 12:01, Marek Marczykowski-Górecki wrote:
> > It isn't just reducing double-zeroing to single zeroing. It's about
> > avoiding zeroing such pages at all. If a domU is started with
> > populate-on-demand, many (sometimes most) of its pages are populated in
> > EPT.
>
> ITYM "unpopulated in EPT"?

Yes...

--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
* Re: Excluding init_on_free for pages for initial balloon down (Xen)
From: David Hildenbrand (Arm) @ 2026-03-02 14:54 UTC
To: Marek Marczykowski-Górecki
Cc: Jürgen Groß, xen-devel, Boris Ostrovsky, Andrew Morton, Vlastimil Babka
>> Whatever leaves the buddy shall be zeroed out. If double-zeroing
>> happens, the second zeroing could get optimized out by checking
>> something like user_alloc_needs_zeroing().
>>
>> See mm/huge_memory.c:vma_alloc_anon_folio_pmd() as an example where we
>> avoid double-zeroing.
>
> It isn't just reducing double-zeroing to single zeroing. It's about
> avoiding zeroing such pages at all. If a domU is started with
> populate-on-demand, many (sometimes most) of its pages are populated in
> EPT. The idea of PoD is to start the guest with a high static memory size
> but a low actual allocation, and fake it until the balloon driver kicks in
> and makes the domU really not use more pages than it has. When the balloon
> driver tries to return those pages to the hypervisor, normally it would just
> take unallocated pages one by one and make Linux not use them. But if _any_
> zeroing is happening, each page first needs to be mapped to the guest by
> the hypervisor (one trip through EPT), just to be removed again a
> moment later...

The same is true for most balloon drivers, including virtio-balloon.

So far nobody really cared about that, though, as init_on_free usually
comes with such a high performance price tag that people in cheap VMs
(where you overcommit etc) don't enable it.

__GFP_BALLOON_OUT is just nasty.

We could probably have a special allocation interface (not exposed to
arbitrary kernel modules) and have things like mm/balloon.c consume that.

IIUC, xen balloon does not use the memory balloon infrastructure,
though. So we'd need some EXPORT_SYMBOL_FOR_MODULES() magic.

Like a

struct page *alloc_balloon_pages(gfp_t gfp, unsigned int order);

Where we only support a subset of gfp flags, for example, to not have
to deal with mempolicy.

But it needs a bit of code to make it fly, so I am not sure if the page
allocator wants to support that.

--
Cheers,
David
* Re: Excluding init_on_free for pages for initial balloon down (Xen)
From: Marek Marczykowski-Górecki @ 2026-03-02 15:11 UTC
To: David Hildenbrand (Arm)
Cc: Jürgen Groß, xen-devel, Boris Ostrovsky, Andrew Morton, Vlastimil Babka, linux-mm
On Mon, Mar 02, 2026 at 03:54:12PM +0100, David Hildenbrand (Arm) wrote:
> [...]
> We could probably have a special allocation interface (not exposed to
> arbitrary kernel modules) and have things like mm/balloon.c consume that.
>
> IIUC, xen balloon does not use the memory balloon infrastructure,
> though.

Is there some fundamental reason for that? By looking at the code, the
migration to use mm/balloon.c shouldn't be that hard (famous last
words...).

> So we'd need some EXPORT_SYMBOL_FOR_MODULES() magic.

Then this wouldn't be necessary.

> Like a
>
> struct page *alloc_balloon_pages(gfp_t gfp, unsigned int order);
>
> Where we only support a subset of gfp flags, for example, to not have
> to deal with mempolicy.
>
> But it needs a bit of code to make it fly, so I am not sure if the page
> allocator wants to support that.

PS adding linux-mm, which I forgot initially...

--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
* Re: Excluding init_on_free for pages for initial balloon down (Xen)
From: Jürgen Groß @ 2026-03-02 15:21 UTC
To: Marek Marczykowski-Górecki, David Hildenbrand (Arm)
Cc: xen-devel, Boris Ostrovsky, Andrew Morton, Vlastimil Babka, linux-mm
On 02.03.26 16:11, Marek Marczykowski-Górecki wrote:
> On Mon, Mar 02, 2026 at 03:54:12PM +0100, David Hildenbrand (Arm) wrote:
>> [...]
>> We could probably have a special allocation interface (not exposed to
>> arbitrary kernel modules) and have things like mm/balloon.c consume that.
>>
>> IIUC, xen balloon does not use the memory balloon infrastructure,
>> though.
>
> Is there some fundamental reason for that? By looking at the code, the
> migration to use mm/balloon.c shouldn't be that hard (famous last
> words...).

I wanted to do that for years, but -ENOTIME. Patches welcome. :-)

Juergen