* [RFC PATCH] virtio_balloon: Support wait on ACK for hinting
@ 2026-01-19 15:42 Jack Thomson
2026-01-19 15:50 ` David Hildenbrand (Red Hat)
0 siblings, 1 reply; 4+ messages in thread
From: Jack Thomson @ 2026-01-19 15:42 UTC (permalink / raw)
To: mst, david, jasowang
Cc: xuanzhuo, eperezma, virtualization, linux-kernel, kalyazin,
xmarcalx, jackabt
From: Jack Thomson <jackabt@amazon.com>
This RFC patch adds a new virtio feature to the virtio-balloon driver.
When the feature is negotiated, free page hinting waits for the device
to ack each hinted range before committing it to the free_page_list.
This lets the device modify the range without the pages being reclaimed
from the free_page_list before the ack is sent. As expected, testing
shows this adds overhead to the hinting run, increasing its duration by
~30% with our Firecracker setup. Currently free page hinting is used
mainly for live migration, but this would open it up to a new use-case.

We would like to combine this with MADV_DONTNEED to reduce the RSS of a
guest. We'd like to use hinting because of the flexibility of control it
brings compared to reporting, allowing memory to be reclaimed in
deterministic periods. In our tests the traditional balloon device was
much slower than hinting for these workloads. Without this
synchronization, hinted pages may be reclaimed from the free list before
the device finishes processing them, which makes hinting unsuitable for
this use-case.
Signed-off-by: Jack Thomson <jackabt@amazon.com>
---
drivers/virtio/virtio_balloon.c | 21 ++++++++++++++++++---
include/uapi/linux/virtio_balloon.h | 1 +
2 files changed, 19 insertions(+), 3 deletions(-)
diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 74fe59f5a78c..82b560422279 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -596,8 +596,11 @@ static int init_vqs(struct virtio_balloon *vb)
vqs_info[VIRTIO_BALLOON_VQ_STATS].callback = stats_request;
}
- if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT))
+ if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
vqs_info[VIRTIO_BALLOON_VQ_FREE_PAGE].name = "free_page_vq";
+ if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_HINT_WAIT_ON_ACK))
+ vqs_info[VIRTIO_BALLOON_VQ_FREE_PAGE].callback = balloon_ack;
+ }
if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_REPORTING)) {
vqs_info[VIRTIO_BALLOON_VQ_REPORTING].name = "reporting_vq";
@@ -669,8 +672,11 @@ static int send_cmd_id_start(struct virtio_balloon *vb)
virtio_balloon_cmd_id_received(vb));
sg_init_one(&sg, &vb->cmd_id_active, sizeof(vb->cmd_id_active));
err = virtqueue_add_outbuf(vq, &sg, 1, &vb->cmd_id_active, GFP_KERNEL);
- if (!err)
+ if (!err) {
virtqueue_kick(vq);
+ if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_HINT_WAIT_ON_ACK))
+ wait_event(vb->acked, virtqueue_get_buf(vq, &unused));
+ }
return err;
}
@@ -686,8 +692,11 @@ static int send_cmd_id_stop(struct virtio_balloon *vb)
sg_init_one(&sg, &vb->cmd_id_stop, sizeof(vb->cmd_id_stop));
err = virtqueue_add_outbuf(vq, &sg, 1, &vb->cmd_id_stop, GFP_KERNEL);
- if (!err)
+ if (!err) {
virtqueue_kick(vq);
+ if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_HINT_WAIT_ON_ACK))
+ wait_event(vb->acked, virtqueue_get_buf(vq, &unused));
+ }
return err;
}
@@ -722,6 +731,8 @@ static int get_free_page_and_send(struct virtio_balloon *vb)
return err;
}
virtqueue_kick(vq);
+ if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_HINT_WAIT_ON_ACK))
+ wait_event(vb->acked, virtqueue_get_buf(vq, &unused));
spin_lock_irq(&vb->free_page_list_lock);
balloon_page_push(&vb->free_page_list, page);
vb->num_free_page_blocks++;
@@ -1186,6 +1197,9 @@ static int virtballoon_validate(struct virtio_device *vdev)
else if (!virtio_has_feature(vdev, VIRTIO_BALLOON_F_PAGE_POISON))
__virtio_clear_bit(vdev, VIRTIO_BALLOON_F_REPORTING);
+ if (!virtio_has_feature(vdev, VIRTIO_BALLOON_F_FREE_PAGE_HINT))
+ __virtio_clear_bit(vdev, VIRTIO_BALLOON_F_HINT_WAIT_ON_ACK);
+
__virtio_clear_bit(vdev, VIRTIO_F_ACCESS_PLATFORM);
return 0;
}
@@ -1197,6 +1211,7 @@ static unsigned int features[] = {
VIRTIO_BALLOON_F_FREE_PAGE_HINT,
VIRTIO_BALLOON_F_PAGE_POISON,
VIRTIO_BALLOON_F_REPORTING,
+ VIRTIO_BALLOON_F_HINT_WAIT_ON_ACK,
};
static struct virtio_driver virtio_balloon_driver = {
diff --git a/include/uapi/linux/virtio_balloon.h b/include/uapi/linux/virtio_balloon.h
index ee35a372805d..86698ab06261 100644
--- a/include/uapi/linux/virtio_balloon.h
+++ b/include/uapi/linux/virtio_balloon.h
@@ -37,6 +37,7 @@
#define VIRTIO_BALLOON_F_FREE_PAGE_HINT 3 /* VQ to report free pages */
#define VIRTIO_BALLOON_F_PAGE_POISON 4 /* Guest is using page poisoning */
#define VIRTIO_BALLOON_F_REPORTING 5 /* Page reporting virtqueue */
+#define VIRTIO_BALLOON_F_HINT_WAIT_ON_ACK 6 /* Page hinting waits on device ack */
/* Size of a PFN in the balloon interface. */
#define VIRTIO_BALLOON_PFN_SHIFT 12
base-commit: 24d479d26b25bce5faea3ddd9fa8f3a6c3129ea7
--
2.43.0
* Re: [RFC PATCH] virtio_balloon: Support wait on ACK for hinting
2026-01-19 15:42 [RFC PATCH] virtio_balloon: Support wait on ACK for hinting Jack Thomson
@ 2026-01-19 15:50 ` David Hildenbrand (Red Hat)
2026-01-19 16:30 ` Thomson, Jack
0 siblings, 1 reply; 4+ messages in thread
From: David Hildenbrand (Red Hat) @ 2026-01-19 15:50 UTC (permalink / raw)
To: Jack Thomson, mst, jasowang
Cc: xuanzhuo, eperezma, virtualization, linux-kernel, kalyazin,
xmarcalx, jackabt
On 1/19/26 16:42, Jack Thomson wrote:
> From: Jack Thomson <jackabt@amazon.com>
>
> This RFC patch adds a new virtio feature for the virtio-balloon driver
> during free page hinting, which will wait on device ack before
> committing the range to the free_page_list. The reason for the change is
> it allows the device to modify this range without it being reclaimed
> from the free_page_list before the ack is sent. As expected, testing
> shows this adds overhead to the hinting run duration, increasing it by
> ~30% with our Firecracker setup. Currently free page hinting is used
> mainly for live migration, but this would open it up for a new use-case.
>
> We would like to leverage this with MADV_DONTNEED to reduce RSS of a
> guest. We'd like to use hinting because of the flexibility of control it
> brings compared to reporting, allowing memory to be reclaimed in
> deterministic periods.
Can you elaborate in more detail why you don't simply use reporting,
like QEMU?
Could you instead see optimizations being done to reporting that could
make it fly for your use case?
Hinting is a rather special case thing only used for reducing VM
migration time in QEMU AFAIR.
--
Cheers
David
* Re: [RFC PATCH] virtio_balloon: Support wait on ACK for hinting
2026-01-19 15:50 ` David Hildenbrand (Red Hat)
@ 2026-01-19 16:30 ` Thomson, Jack
2026-02-11 20:22 ` David Hildenbrand (Arm)
0 siblings, 1 reply; 4+ messages in thread
From: Thomson, Jack @ 2026-01-19 16:30 UTC (permalink / raw)
To: David Hildenbrand (Red Hat), mst, jasowang
Cc: xuanzhuo, eperezma, virtualization, linux-kernel, kalyazin,
xmarcalx, jackabt
On 19/01/2026 3:50 pm, David Hildenbrand (Red Hat) wrote:
> On 1/19/26 16:42, Jack Thomson wrote:
>> From: Jack Thomson <jackabt@amazon.com>
>>
>> This RFC patch adds a new virtio feature for the virtio-balloon driver
>> during free page hinting, which will wait on device ack before
>> committing the range to the free_page_list. The reason for the change is
>> it allows the device to modify this range without it being reclaimed
>> from the free_page_list before the ack is sent. As expected, testing
>> shows this adds overhead to the hinting run duration, increasing it by
>> ~30% with our Firecracker setup. Currently free page hinting is used
>> mainly for live migration, but this would open it up for a new use-case.
>>
>> We would like to leverage this with MADV_DONTNEED to reduce RSS of a
>> guest. We'd like to use hinting because of the flexibility of control it
>> brings compared to reporting, allowing memory to be reclaimed in
>> deterministic periods.
>
> Can you elaborate in more detail why you don't simply use reporting,
> like QEMU?
Ideally we'd like to use hinting, as the API allows us to control when
this reclamation takes place so as not to impact active VMs. For example,
if we know a VM is idle we can reclaim memory, but also cancel the
reclamation quickly if the VM receives new work (something we can't do
quickly with the traditional balloon).
> Could you instead see optimizations being done to reporting that could
> make it fly for your use case?
One thing that I considered was keeping reporting running but skipping
reported ranges during active times. But this may lead to missed
reclamation opportunities.
>
> Hinting is a rather special case thing only used for reducing VM
> migration time in QEMU AFAIR.
>
Yeah, its API allowing direct control is what interested us. It made a
great pairing with MADV_DONTNEED; it just needed the synchronisation
from this extension to make it safe.
--
Thanks,
Jack
* Re: [RFC PATCH] virtio_balloon: Support wait on ACK for hinting
2026-01-19 16:30 ` Thomson, Jack
@ 2026-02-11 20:22 ` David Hildenbrand (Arm)
0 siblings, 0 replies; 4+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-11 20:22 UTC (permalink / raw)
To: Thomson, Jack, mst, jasowang
Cc: xuanzhuo, eperezma, virtualization, linux-kernel, kalyazin,
xmarcalx, jackabt
On 1/19/26 17:30, Thomson, Jack wrote:
>
>
> On 19/01/2026 3:50 pm, David Hildenbrand (Red Hat) wrote:
>> On 1/19/26 16:42, Jack Thomson wrote:
>>> From: Jack Thomson <jackabt@amazon.com>
>>>
>>> This RFC patch adds a new virtio feature for the virtio-balloon driver
>>> during free page hinting, which will wait on device ack before
>>> committing the range to the free_page_list. The reason for the change is
>>> it allows the device to modify this range without it being reclaimed
>>> from the free_page_list before the ack is sent. As expected, testing
>>> shows this adds overhead to the hinting run duration, increasing it by
>>> ~30% with our Firecracker setup. Currently free page hinting is used
>>> mainly for live migration, but this would open it up for a new use-case.
>>>
>>> We would like to leverage this with MADV_DONTNEED to reduce RSS of a
>>> guest. We'd like to use hinting because of the flexibility of control it
>>> brings compared to reporting, allowing memory to be reclaimed in
>>> deterministic periods.
>>
>> Can you elaborate in more detail why you don't simply use reporting,
>> like QEMU?
>
> Ideally we'd like to use hinting as the API allows us to control when
> this reclamation takes place so as not to impact active VMs. For example
> if we know a VM is idle we can reclaim memory but also cancel the
> reclamation quickly if the VM receives new work (something we can't do
> quickly with the traditional balloon.)
>
>> Could you instead see optimizations being done to reporting that could
>> make it fly for your use case?
>
> One thing that I considered was having reporting running but skip
> reported ranges during active times. But this may lead to missing
> reclamation opportunities.
We could implement a pause+continue option for free-page-reporting, so
the device could tell the VM to pause reporting and later to resume it.
>
>>
>> Hinting is a rather special case thing only used for reducing VM
>> migration time in QEMU AFAIR.
>>
>
> Yeah, its API allowing direct control was what interested us. With this
> extension it made a great pairing just needed the synchronisation to
> make it safe.
Sorry for getting back to you only now.
So, right now the thing is that hinted pages can get reused by the VM
any time. The hypervisor must detect if that happened and not discard
the pages in that case.
While that works for live migration with bitmap dirty tracking (and is a
bit confusing ...), it doesn't work when you want to MADV_DONTNEED that
memory, because the hypervisor could issue MADV_DONTNEED just after the
VM reused the memory.
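To illustrate why that ordering is fatal, here is a minimal userspace sketch (not part of the patch or the thread). It assumes Linux and uses an anonymous private mapping as a stand-in for guest memory; the function name reuse_then_discard() is made up for illustration. The "guest" writes to a page it has just reused, the "hypervisor" then issues MADV_DONTNEED, and the data is gone on the next access.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Illustration only: one anonymous private page stands in for guest memory.
 * Returns the first byte of the page as read after the discard, or -1 on
 * error. The name and structure are assumptions, not kernel or VMM code.
 */
static int reuse_then_discard(unsigned char pattern)
{
	long page_size = sysconf(_SC_PAGESIZE);
	unsigned char *page;
	int first;

	if (page_size <= 0)
		return -1;

	page = mmap(NULL, (size_t)page_size, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (page == MAP_FAILED)
		return -1;

	/* "Guest" reuses the hinted page and stores data in it. */
	memset(page, pattern, (size_t)page_size);

	/* "Hypervisor" discards the page after the reuse: this is the race. */
	if (madvise(page, (size_t)page_size, MADV_DONTNEED) != 0) {
		munmap(page, (size_t)page_size);
		return -1;
	}

	/* The next access is served from a fresh zero-filled page. */
	first = page[0];
	munmap(page, (size_t)page_size);
	return first;
}
```

Calling reuse_then_discard(0xAB) is expected to return 0 rather than 0xAB on Linux, i.e. the guest's data was silently dropped.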
So you are proposing to let the VM wait for the ack before possibly
reusing the pages.
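The ordering difference can be written down as a tiny model. This is only a sketch under assumptions: struct hinted_block and the three helpers are hypothetical names, not driver code, and "on_free_page_list" merely stands for "the guest may get this block back".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one hinted block; not a kernel structure. */
struct hinted_block {
	bool on_free_page_list;	/* guest may reclaim the block */
	bool device_done;	/* device processed the block and acked */
};

/* Guest: hint was sent and the queue kicked. */
static void guest_send_hint(struct hinted_block *b, bool wait_on_ack)
{
	b->device_done = false;
	/* Without the feature the block is committed right after the kick. */
	b->on_free_page_list = !wait_on_ack;
}

/*
 * Device: process the block (e.g. discard it), then ack.
 * Returns true if the guest could have reclaimed the block meanwhile.
 */
static bool device_process_and_ack(struct hinted_block *b)
{
	bool raced = b->on_free_page_list;

	b->device_done = true;
	return raced;
}

/* Guest: woken by the ack, it may now commit the block. */
static void guest_on_ack(struct hinted_block *b)
{
	if (b->device_done)
		b->on_free_page_list = true;
}
```

With wait_on_ack set, device_process_and_ack() never sees a block that is already on the list; without it, the block is reclaimable for the whole time the device is working on it.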
One thing to note is that free page hinting in Linux allocates memory
through

	alloc_pages(VIRTIO_BALLOON_FREE_PAGE_ALLOC_FLAG,
		    VIRTIO_BALLOON_HINT_BLOCK_ORDER);

Meaning:
a) Limited to MAX_ORDER chunks (e.g., 4MB on x86). This is even bigger
than the free-page-reporting granularity (pageblock order, 2MB on
x86)
b) Cannot free memory on ZONE_MOVABLE or CMA in the VM (as these are
unmovable allocations)
So it's a bit suboptimal.
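For reference, the two granularities follow from the allocation order. The sketch below assumes 4 KiB base pages, order 10 for the hinting block and order 9 for the pageblock; the orders are assumptions chosen to match the 4MB and 2MB figures above, and block_bytes() is a made-up helper.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes covered by one buddy allocation of the given order. */
static size_t block_bytes(unsigned int order, size_t page_size)
{
	return page_size << order;
}
```

Under those assumptions block_bytes(10, 4096) gives 4 MiB per hinted block and block_bytes(9, 4096) gives 2 MiB per reported block.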
Also, in contrast to free-page-reporting, these pages are not going to
get reused unless we run into the shrinker, which is a bit suboptimal as
well. Free-page-reporting is a lot more optimized for that, as it just
returns reported pages back to the buddy immediately.
So if possible, I would suggest extending free-page-reporting instead.
--
Cheers,
David