public inbox for linux-kernel@vger.kernel.org
* [PATCH RFC] Drivers: hv: balloon: Temporary disable the driver on ARM64 when PAGE_SIZE != 4k
From: Vitaly Kuznetsov @ 2022-01-05 16:50 UTC
  To: linux-hyperv
  Cc: K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger, Wei Liu,
	Dexuan Cui, Michael Kelley, Boqun Feng, linux-kernel

The Hyper-V ballooning and memory hotplug protocol always seems to
assume a 4k page size, so all PFNs in the structures used for
communication are 4k PFNs. When a different page size is in use on the
guest (e.g. 64k), things go terribly wrong all over:
- When reporting statistics, post_status() reports them in guest pages
and the hypervisor sees very low memory usage.
- When ballooning, the guest reports back PFNs of the allocated pages
but the hypervisor treats them as 4k PFNs.
- When unballooning or memory hotplugging, PFNs coming from the host
are 4k PFNs and may not even be 64k aligned, making them difficult to
handle.

While statistics and ballooning requests would be relatively easy to
handle by converting between guest and hypervisor page sizes in the
communication structures, handling unballooning and memory hotplug
requests seems to be harder. In particular, when ballooning up,
alloc_balloon_pages() shatters huge pages so that an unballooning
request can be handled for any part of them. It is not possible to
shatter a 64k page into 4k pages, so it is unclear how to handle
unballooning for a sub-range if such a request ever comes; we can't
just report a 64k page as 16 separate 4k pages.

Ideally, the protocol between the guest and the host should be changed
to allow for different guest page sizes.

While there is no solution for the above-mentioned problems, it seems
we're better off without the driver in the problematic cases.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 drivers/hv/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
index 0747a8f1fcee..fb353a13e5c4 100644
--- a/drivers/hv/Kconfig
+++ b/drivers/hv/Kconfig
@@ -25,7 +25,7 @@ config HYPERV_UTILS
 
 config HYPERV_BALLOON
 	tristate "Microsoft Hyper-V Balloon driver"
-	depends on HYPERV
+	depends on HYPERV && (X86 || (ARM64 && ARM64_4K_PAGES))
 	select PAGE_REPORTING
 	help
 	  Select this option to enable Hyper-V Balloon driver.
-- 
2.33.1



* Re: [PATCH RFC] Drivers: hv: balloon: Temporary disable the driver on ARM64 when PAGE_SIZE != 4k
From: Wei Liu @ 2022-01-05 20:22 UTC
  To: Vitaly Kuznetsov
  Cc: linux-hyperv, K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Wei Liu, Dexuan Cui, Michael Kelley, Boqun Feng, linux-kernel

On Wed, Jan 05, 2022 at 05:50:28PM +0100, Vitaly Kuznetsov wrote:
> The Hyper-V ballooning and memory hotplug protocol always seems to
> assume a 4k page size, so all PFNs in the structures used for
> communication are 4k PFNs. When a different page size is in use on the
> guest (e.g. 64k), things go terribly wrong all over:
> - When reporting statistics, post_status() reports them in guest pages
> and the hypervisor sees very low memory usage.
> - When ballooning, the guest reports back PFNs of the allocated pages
> but the hypervisor treats them as 4k PFNs.
> - When unballooning or memory hotplugging, PFNs coming from the host
> are 4k PFNs and may not even be 64k aligned, making them difficult to
> handle.
> 
> While statistics and ballooning requests would be relatively easy to
> handle by converting between guest and hypervisor page sizes in the
> communication structures, handling unballooning and memory hotplug
> requests seems to be harder. In particular, when ballooning up,
> alloc_balloon_pages() shatters huge pages so that an unballooning
> request can be handled for any part of them. It is not possible to
> shatter a 64k page into 4k pages, so it is unclear how to handle
> unballooning for a sub-range if such a request ever comes; we can't
> just report a 64k page as 16 separate 4k pages.
> 

How does virtio-balloon handle it? Does its protocol handle different
page sizes?


> Ideally, the protocol between the guest and the host should be changed
> to allow for different guest page sizes.
> 
> While there is no solution for the above-mentioned problems, it seems
> we're better off without the driver in the problematic cases.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  drivers/hv/Kconfig | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
> index 0747a8f1fcee..fb353a13e5c4 100644
> --- a/drivers/hv/Kconfig
> +++ b/drivers/hv/Kconfig
> @@ -25,7 +25,7 @@ config HYPERV_UTILS
>  
>  config HYPERV_BALLOON
>  	tristate "Microsoft Hyper-V Balloon driver"
> -	depends on HYPERV
> +	depends on HYPERV && (X86 || (ARM64 && ARM64_4K_PAGES))
>  	select PAGE_REPORTING
>  	help
>  	  Select this option to enable Hyper-V Balloon driver.
> -- 
> 2.33.1
> 


* Re: [PATCH RFC] Drivers: hv: balloon: Temporary disable the driver on ARM64 when PAGE_SIZE != 4k
From: Vitaly Kuznetsov @ 2022-01-06  8:46 UTC
  To: Wei Liu, David Hildenbrand
  Cc: linux-hyperv, K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Wei Liu, Dexuan Cui, Michael Kelley, Boqun Feng, linux-kernel

Wei Liu <wei.liu@kernel.org> writes:

> On Wed, Jan 05, 2022 at 05:50:28PM +0100, Vitaly Kuznetsov wrote:
>> The Hyper-V ballooning and memory hotplug protocol always seems to
>> assume a 4k page size, so all PFNs in the structures used for
>> communication are 4k PFNs. When a different page size is in use on the
>> guest (e.g. 64k), things go terribly wrong all over:
>> - When reporting statistics, post_status() reports them in guest pages
>> and the hypervisor sees very low memory usage.
>> - When ballooning, the guest reports back PFNs of the allocated pages
>> but the hypervisor treats them as 4k PFNs.
>> - When unballooning or memory hotplugging, PFNs coming from the host
>> are 4k PFNs and may not even be 64k aligned, making them difficult to
>> handle.
>> 
>> While statistics and ballooning requests would be relatively easy to
>> handle by converting between guest and hypervisor page sizes in the
>> communication structures, handling unballooning and memory hotplug
>> requests seems to be harder. In particular, when ballooning up,
>> alloc_balloon_pages() shatters huge pages so that an unballooning
>> request can be handled for any part of them. It is not possible to
>> shatter a 64k page into 4k pages, so it is unclear how to handle
>> unballooning for a sub-range if such a request ever comes; we can't
>> just report a 64k page as 16 separate 4k pages.
>> 
>
> How does virtio-balloon handle it? Does its protocol handle different
> page sizes?
>

Let's ask the expert :)

David,

how does virtio-balloon (and virtio-mem) deal with different page sizes
between guest and host?

>
>> Ideally, the protocol between the guest and the host should be changed
>> to allow for different guest page sizes.
>> 
>> While there is no solution for the above-mentioned problems, it seems
>> we're better off without the driver in the problematic cases.
>> 
>> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
>> ---
>>  drivers/hv/Kconfig | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
>> index 0747a8f1fcee..fb353a13e5c4 100644
>> --- a/drivers/hv/Kconfig
>> +++ b/drivers/hv/Kconfig
>> @@ -25,7 +25,7 @@ config HYPERV_UTILS
>>  
>>  config HYPERV_BALLOON
>>  	tristate "Microsoft Hyper-V Balloon driver"
>> -	depends on HYPERV
>> +	depends on HYPERV && (X86 || (ARM64 && ARM64_4K_PAGES))
>>  	select PAGE_REPORTING
>>  	help
>>  	  Select this option to enable Hyper-V Balloon driver.
>> -- 
>> 2.33.1
>> 
>

-- 
Vitaly



* Re: [PATCH RFC] Drivers: hv: balloon: Temporary disable the driver on ARM64 when PAGE_SIZE != 4k
From: David Hildenbrand @ 2022-01-06 10:05 UTC
  To: Vitaly Kuznetsov, Wei Liu
  Cc: linux-hyperv, K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Dexuan Cui, Michael Kelley, Boqun Feng, linux-kernel

On 06.01.22 09:46, Vitaly Kuznetsov wrote:
> Wei Liu <wei.liu@kernel.org> writes:
> 
>> On Wed, Jan 05, 2022 at 05:50:28PM +0100, Vitaly Kuznetsov wrote:
>>> The Hyper-V ballooning and memory hotplug protocol always seems to
>>> assume a 4k page size, so all PFNs in the structures used for
>>> communication are 4k PFNs. When a different page size is in use on the
>>> guest (e.g. 64k), things go terribly wrong all over:
>>> - When reporting statistics, post_status() reports them in guest pages
>>> and the hypervisor sees very low memory usage.
>>> - When ballooning, the guest reports back PFNs of the allocated pages
>>> but the hypervisor treats them as 4k PFNs.
>>> - When unballooning or memory hotplugging, PFNs coming from the host
>>> are 4k PFNs and may not even be 64k aligned, making them difficult to
>>> handle.
>>>
>>> While statistics and ballooning requests would be relatively easy to
>>> handle by converting between guest and hypervisor page sizes in the
>>> communication structures, handling unballooning and memory hotplug
>>> requests seems to be harder. In particular, when ballooning up,
>>> alloc_balloon_pages() shatters huge pages so that an unballooning
>>> request can be handled for any part of them. It is not possible to
>>> shatter a 64k page into 4k pages, so it is unclear how to handle
>>> unballooning for a sub-range if such a request ever comes; we can't
>>> just report a 64k page as 16 separate 4k pages.
>>>
>>
>> How does virtio-balloon handle it? Does its protocol handle different
>> page sizes?
>>
> 
> Let's ask the expert)
> 
> David,
> 
> how does virtio-balloon (and virtio-mem) deal with different page sizes
> between guest and host?

virtio-mem is easy; virtio-balloon is more involved. virtio-balloon
similarly has 4k granularity as part of the protocol.


1. virtio-mem:

It has a per-device block size determined by the device, usually around
1 MiB or bigger. virtio-mem usually uses the THP size (e.g., 2 MiB on
x86_64) unless larger huge pages are used for backing device memory in
the hypervisor. So the actual base page size doesn't play any role.

Resizing is triggered by a resize request towards the guest, and it's
always up to the guest to select device blocks to (un)plug.

E.g., plugged_size: 200 MiB, requested_size: 300 MiB -> guest is
requested to plug 100 MiB (select unplugged device blocks and request to
plug them)

E.g., plugged_size: 300 MiB, requested_size: 200 MiB -> guest is
requested to unplug 100 MiB (select plugged device blocks and request to
unplug them)


1.1 host granularity < guest granularity

Assume the device supports 2 MiB and the guest 4 MiB, which is the
case on current x86_64. The guest will simply (un)plug in 4 MiB
granularity, logically mapping each guest block to two 2 MiB device
blocks. Requests not aligned to 4 MiB cannot be fully processed.


1.2 guest granularity < host granularity

Assume the device supports 2 MiB and the guest 1 MiB. The guest will
simply (un)plug in 2 MiB granularity.



2. virtio-balloon

It's based on 4k pages.

Inflation/deflation is triggered by a balloon size change request.
It's always up to the guest to select pages to inflate/deflate.

E.g., current_size: 200 MiB, target_size: 300 MiB -> guest is requested
to inflate the balloon by 100 MiB (select deflated pages and request to
inflate them)

E.g., current_size: 300 MiB, target_size: 200 MiB -> guest is requested
to deflate 100 MiB (select inflated pages and request to deflate them)

2.1 guest granularity > 4k:

Assume the guest has a page size of 16k. Inflation/deflation requests
not aligned to 16k cannot be fully processed. Otherwise, the guest
simply inflates/deflates 16k pages by logically inflating/deflating 4
consecutive 4k pages. It's worth noting that inflation/deflation
requests of 4k pages cannot be rejected by the host.

VIRTIO_BALLOON_PAGES_PER_PAGE expresses exactly that. set_page_pfns()
simply iterates over VIRTIO_BALLOON_PAGES_PER_PAGE "4k sub-pages".
2.2 host granularity > 4k:

Assume the host has a page size of 16k. From the guest's POV we don't
know and don't care; we just operate on 4k pages. In the hypervisor
it's problematic, though: if the guest inflated a 4k page, we cannot
easily free up a 16k page. We'd have to track the state of each and
every individual 4k page, which is undesirable: once the complete 16k
page was inflated, we could free it. QEMU only tracks this for
consecutive inflation requests, which sometimes works.



Long story short, if I recall all the details about the HV balloon
correctly, the real issue is that while the guest is in charge of
selecting pages to inflate, it's the *hypervisor* that selects pages
to deflate, which makes it impossible to handle case 2.1 in a way
similar to virtio-balloon. The hypervisor would have to know about the
guest granularity in order to request deflation of, say, 16k pages
rather than 4k pages. The devil might be in the details.

-- 
Thanks,

David / dhildenb


