* [virtio-dev] [PATCH v19 QEMU 1/4] virtio-balloon: Implement support for page poison tracking feature
2020-04-10 3:41 [virtio-dev] [PATCH v19 QEMU 0/4] virtio-balloon: add support for free page reporting Alexander Duyck
@ 2020-04-10 3:41 ` Alexander Duyck
2020-04-15 8:08 ` [virtio-dev] " David Hildenbrand
2020-04-10 3:41 ` [virtio-dev] [PATCH v19 QEMU 2/4] linux-headers: update to contain virito-balloon free page reporting Alexander Duyck
` (2 subsequent siblings)
3 siblings, 1 reply; 22+ messages in thread
From: Alexander Duyck @ 2020-04-10 3:41 UTC (permalink / raw)
To: pbonzini, david, mst; +Cc: virtio-dev, qemu-devel
From: Alexander Duyck <alexander.h.duyck@linux.intel.com>
We need to make certain to advertise support for page poison tracking if
we want to actually get data on if the guest will be poisoning pages. So
if free page hinting is active we should add page poisoning support and
let the guest disable it if it isn't using it.
Page poisoning will result in a page being dirtied on free. As such we
cannot really avoid having to copy the page at least one more time since
we will need to write the poison value to the destination. As such we can
just ignore free page hinting if page poisoning is enabled as it will
actually reduce the work we have to do.
Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
---
hw/virtio/virtio-balloon.c | 26 ++++++++++++++++++++++----
include/hw/virtio/virtio-balloon.h | 1 +
2 files changed, 23 insertions(+), 4 deletions(-)
diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index a4729f7fc930..1c6d36a29a04 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -531,6 +531,15 @@ static void virtio_balloon_free_page_start(VirtIOBalloon *s)
return;
}
+ /*
+ * If page poisoning is enabled then we probably shouldn't bother with
+ * the hinting since the poisoning will dirty the page and invalidate
+ * the work we are doing anyway.
+ */
+ if (virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_PAGE_POISON)) {
+ return;
+ }
+
if (s->free_page_report_cmd_id == UINT_MAX) {
s->free_page_report_cmd_id =
VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN;
@@ -618,12 +627,10 @@ static size_t virtio_balloon_config_size(VirtIOBalloon *s)
if (s->qemu_4_0_config_size) {
return sizeof(struct virtio_balloon_config);
}
- if (virtio_has_feature(features, VIRTIO_BALLOON_F_PAGE_POISON)) {
+ if (virtio_has_feature(features, VIRTIO_BALLOON_F_PAGE_POISON) ||
+ virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
return sizeof(struct virtio_balloon_config);
}
- if (virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
- return offsetof(struct virtio_balloon_config, poison_val);
- }
return offsetof(struct virtio_balloon_config, free_page_report_cmd_id);
}
@@ -634,6 +641,7 @@ static void virtio_balloon_get_config(VirtIODevice *vdev, uint8_t *config_data)
config.num_pages = cpu_to_le32(dev->num_pages);
config.actual = cpu_to_le32(dev->actual);
+ config.poison_val = cpu_to_le32(dev->poison_val);
if (dev->free_page_report_status == FREE_PAGE_REPORT_S_REQUESTED) {
config.free_page_report_cmd_id =
@@ -697,6 +705,9 @@ static void virtio_balloon_set_config(VirtIODevice *vdev,
qapi_event_send_balloon_change(vm_ram_size -
((ram_addr_t) dev->actual << VIRTIO_BALLOON_PFN_SHIFT));
}
+ dev->poison_val = virtio_vdev_has_feature(vdev,
+ VIRTIO_BALLOON_F_PAGE_POISON) ?
+ le32_to_cpu(config.poison_val) : 0;
trace_virtio_balloon_set_config(dev->actual, oldactual);
}
@@ -706,6 +717,9 @@ static uint64_t virtio_balloon_get_features(VirtIODevice *vdev, uint64_t f,
VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
f |= dev->host_features;
virtio_add_feature(&f, VIRTIO_BALLOON_F_STATS_VQ);
+ if (virtio_has_feature(f, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
+ virtio_add_feature(&f, VIRTIO_BALLOON_F_PAGE_POISON);
+ }
return f;
}
@@ -854,6 +868,8 @@ static void virtio_balloon_device_reset(VirtIODevice *vdev)
g_free(s->stats_vq_elem);
s->stats_vq_elem = NULL;
}
+
+ s->poison_val = 0;
}
static void virtio_balloon_set_status(VirtIODevice *vdev, uint8_t status)
@@ -916,6 +932,8 @@ static Property virtio_balloon_properties[] = {
VIRTIO_BALLOON_F_DEFLATE_ON_OOM, false),
DEFINE_PROP_BIT("free-page-hint", VirtIOBalloon, host_features,
VIRTIO_BALLOON_F_FREE_PAGE_HINT, false),
+ DEFINE_PROP_BIT("x-page-poison", VirtIOBalloon, host_features,
+ VIRTIO_BALLOON_F_PAGE_POISON, false),
/* QEMU 4.0 accidentally changed the config size even when free-page-hint
* is disabled, resulting in QEMU 3.1 migration incompatibility. This
* property retains this quirk for QEMU 4.1 machine types.
diff --git a/include/hw/virtio/virtio-balloon.h b/include/hw/virtio/virtio-balloon.h
index d1c968d2376e..7fe78e5c14d7 100644
--- a/include/hw/virtio/virtio-balloon.h
+++ b/include/hw/virtio/virtio-balloon.h
@@ -70,6 +70,7 @@ typedef struct VirtIOBalloon {
uint32_t host_features;
bool qemu_4_0_config_size;
+ uint32_t poison_val;
} VirtIOBalloon;
#endif
---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
^ permalink raw reply related [flat|nested] 22+ messages in thread* [virtio-dev] Re: [PATCH v19 QEMU 1/4] virtio-balloon: Implement support for page poison tracking feature
2020-04-10 3:41 ` [virtio-dev] [PATCH v19 QEMU 1/4] virtio-balloon: Implement support for page poison tracking feature Alexander Duyck
@ 2020-04-15 8:08 ` David Hildenbrand
2020-04-15 17:17 ` Alexander Duyck
0 siblings, 1 reply; 22+ messages in thread
From: David Hildenbrand @ 2020-04-15 8:08 UTC (permalink / raw)
To: Alexander Duyck, pbonzini, mst; +Cc: virtio-dev, qemu-devel
On 10.04.20 05:41, Alexander Duyck wrote:
> From: Alexander Duyck <alexander.h.duyck@linux.intel.com>
>
> We need to make certain to advertise support for page poison tracking if
> we want to actually get data on if the guest will be poisoning pages. So
> if free page hinting is active we should add page poisoning support and
> let the guest disable it if it isn't using it.
>
> Page poisoning will result in a page being dirtied on free. As such we
> cannot really avoid having to copy the page at least one more time since
> we will need to write the poison value to the destination. As such we can
> just ignore free page hinting if page poisoning is enabled as it will
> actually reduce the work we have to do.
>
> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> ---
> hw/virtio/virtio-balloon.c | 26 ++++++++++++++++++++++----
> include/hw/virtio/virtio-balloon.h | 1 +
> 2 files changed, 23 insertions(+), 4 deletions(-)
>
> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> index a4729f7fc930..1c6d36a29a04 100644
> --- a/hw/virtio/virtio-balloon.c
> +++ b/hw/virtio/virtio-balloon.c
> @@ -531,6 +531,15 @@ static void virtio_balloon_free_page_start(VirtIOBalloon *s)
> return;
> }
>
> + /*
> + * If page poisoning is enabled then we probably shouldn't bother with
> + * the hinting since the poisoning will dirty the page and invalidate
> + * the work we are doing anyway.
> + */
> + if (virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_PAGE_POISON)) {
Why not check for the poison value instead? (as you do in patch #3) ?
> + return;
> + }
> +
> if (s->free_page_report_cmd_id == UINT_MAX) {
> s->free_page_report_cmd_id =
> VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN;
We should rename all "free_page_report" stuff we can to
"free_page_hint"/"free_page_hinting" to avoid confusion (e.g., on my
side :) ) before adding free page reporting .
(looking at the virtio-balloon linux header, it's also confusing but
we're stuck with that - maybe we should add better comments)
> @@ -618,12 +627,10 @@ static size_t virtio_balloon_config_size(VirtIOBalloon *s)
> if (s->qemu_4_0_config_size) {
> return sizeof(struct virtio_balloon_config);
> }
> - if (virtio_has_feature(features, VIRTIO_BALLOON_F_PAGE_POISON)) {
> + if (virtio_has_feature(features, VIRTIO_BALLOON_F_PAGE_POISON) ||
> + virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
> return sizeof(struct virtio_balloon_config);
> }
> - if (virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
> - return offsetof(struct virtio_balloon_config, poison_val);
> - }
I am not sure this change is completely sane. Why is that necessary at all?
> return offsetof(struct virtio_balloon_config, free_page_report_cmd_id);
> }
>
> @@ -634,6 +641,7 @@ static void virtio_balloon_get_config(VirtIODevice *vdev, uint8_t *config_data)
>
> config.num_pages = cpu_to_le32(dev->num_pages);
> config.actual = cpu_to_le32(dev->actual);
> + config.poison_val = cpu_to_le32(dev->poison_val);
>
> if (dev->free_page_report_status == FREE_PAGE_REPORT_S_REQUESTED) {
> config.free_page_report_cmd_id =
> @@ -697,6 +705,9 @@ static void virtio_balloon_set_config(VirtIODevice *vdev,
> qapi_event_send_balloon_change(vm_ram_size -
> ((ram_addr_t) dev->actual << VIRTIO_BALLOON_PFN_SHIFT));
> }
> + dev->poison_val = virtio_vdev_has_feature(vdev,
> + VIRTIO_BALLOON_F_PAGE_POISON) ?
> + le32_to_cpu(config.poison_val) : 0;
Can we just do a
dev->poison_val = 0;
if (virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_PAGE_POISON)) {
dev->poison_val = le32_to_cpu(config.poison_val);
}
instead?
> trace_virtio_balloon_set_config(dev->actual, oldactual);
> }
>
> @@ -706,6 +717,9 @@ static uint64_t virtio_balloon_get_features(VirtIODevice *vdev, uint64_t f,
> VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
> f |= dev->host_features;
> virtio_add_feature(&f, VIRTIO_BALLOON_F_STATS_VQ);
> + if (virtio_has_feature(f, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
> + virtio_add_feature(&f, VIRTIO_BALLOON_F_PAGE_POISON);
> + }
>
> return f;
> }
> @@ -854,6 +868,8 @@ static void virtio_balloon_device_reset(VirtIODevice *vdev)
> g_free(s->stats_vq_elem);
> s->stats_vq_elem = NULL;
> }
> +
> + s->poison_val = 0;
> }
>
> static void virtio_balloon_set_status(VirtIODevice *vdev, uint8_t status)
> @@ -916,6 +932,8 @@ static Property virtio_balloon_properties[] = {
> VIRTIO_BALLOON_F_DEFLATE_ON_OOM, false),
> DEFINE_PROP_BIT("free-page-hint", VirtIOBalloon, host_features,
> VIRTIO_BALLOON_F_FREE_PAGE_HINT, false),
> + DEFINE_PROP_BIT("x-page-poison", VirtIOBalloon, host_features,
> + VIRTIO_BALLOON_F_PAGE_POISON, false),
Just curious, why an x- feature?
--
Thanks,
David / dhildenb
---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
^ permalink raw reply [flat|nested] 22+ messages in thread* [virtio-dev] Re: [PATCH v19 QEMU 1/4] virtio-balloon: Implement support for page poison tracking feature
2020-04-15 8:08 ` [virtio-dev] " David Hildenbrand
@ 2020-04-15 17:17 ` Alexander Duyck
2020-04-15 18:16 ` David Hildenbrand
0 siblings, 1 reply; 22+ messages in thread
From: Alexander Duyck @ 2020-04-15 17:17 UTC (permalink / raw)
To: David Hildenbrand
Cc: Paolo Bonzini, Michael S. Tsirkin, virtio-dev, qemu-devel
On Wed, Apr 15, 2020 at 1:08 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 10.04.20 05:41, Alexander Duyck wrote:
> > From: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> >
> > We need to make certain to advertise support for page poison tracking if
> > we want to actually get data on if the guest will be poisoning pages. So
> > if free page hinting is active we should add page poisoning support and
> > let the guest disable it if it isn't using it.
> >
> > Page poisoning will result in a page being dirtied on free. As such we
> > cannot really avoid having to copy the page at least one more time since
> > we will need to write the poison value to the destination. As such we can
> > just ignore free page hinting if page poisoning is enabled as it will
> > actually reduce the work we have to do.
> >
> > Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> > ---
> > hw/virtio/virtio-balloon.c | 26 ++++++++++++++++++++++----
> > include/hw/virtio/virtio-balloon.h | 1 +
> > 2 files changed, 23 insertions(+), 4 deletions(-)
> >
> > diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> > index a4729f7fc930..1c6d36a29a04 100644
> > --- a/hw/virtio/virtio-balloon.c
> > +++ b/hw/virtio/virtio-balloon.c
> > @@ -531,6 +531,15 @@ static void virtio_balloon_free_page_start(VirtIOBalloon *s)
> > return;
> > }
> >
> > + /*
> > + * If page poisoning is enabled then we probably shouldn't bother with
> > + * the hinting since the poisoning will dirty the page and invalidate
> > + * the work we are doing anyway.
> > + */
> > + if (virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_PAGE_POISON)) {
>
> Why not check for the poison value instead? (as you do in patch #3) ?
So if I recall correctly the vdev has feature requires the host to
indicate that the feature is in use. If page poisoning is not enabled
on the host then it clears the flag on its end and we can proceed with
the feature.
The comment above explains the "why". Basically poisoning a page will
dirty it. So why hint a page as free when that will drop it back into
the guest and result in it being dirtied again. What you end up with
is all the pages that were temporarily placed in the balloon are dirty
after the hinting report is finished at which point you made things
worse instead of helping to improve them.
>
> > + return;
> > + }
> > +
> > if (s->free_page_report_cmd_id == UINT_MAX) {
> > s->free_page_report_cmd_id =
> > VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN;
>
> We should rename all "free_page_report" stuff we can to
> "free_page_hint"/"free_page_hinting" to avoid confusion (e.g., on my
> side :) ) before adding free page reporting .
>
> (looking at the virtio-balloon linux header, it's also confusing but
> we're stuck with that - maybe we should add better comments)
Are we stuck? Couldn't we just convert it to an anonymous union with
free_page_hint_cmd_id and then use that where needed?
> > @@ -618,12 +627,10 @@ static size_t virtio_balloon_config_size(VirtIOBalloon *s)
> > if (s->qemu_4_0_config_size) {
> > return sizeof(struct virtio_balloon_config);
> > }
> > - if (virtio_has_feature(features, VIRTIO_BALLOON_F_PAGE_POISON)) {
> > + if (virtio_has_feature(features, VIRTIO_BALLOON_F_PAGE_POISON) ||
> > + virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
> > return sizeof(struct virtio_balloon_config);
> > }
> > - if (virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
> > - return offsetof(struct virtio_balloon_config, poison_val);
> > - }
>
> I am not sure this change is completely sane. Why is that necessary at all?
The poison_val is stored at the end of the structure and is required
in order to make free page hinting to work. What this change is doing
is forcing it so that we report the config size as the full size if
either poisoning or hinting are enabled since the poison val is the
last member of the config structure.
If the question is why bother reducing the size if free page hinting
is not present then I guess we could simplify this and just report
report the size of the config for all cases.
> > return offsetof(struct virtio_balloon_config, free_page_report_cmd_id);
> > }
> >
> > @@ -634,6 +641,7 @@ static void virtio_balloon_get_config(VirtIODevice *vdev, uint8_t *config_data)
> >
> > config.num_pages = cpu_to_le32(dev->num_pages);
> > config.actual = cpu_to_le32(dev->actual);
> > + config.poison_val = cpu_to_le32(dev->poison_val);
> >
> > if (dev->free_page_report_status == FREE_PAGE_REPORT_S_REQUESTED) {
> > config.free_page_report_cmd_id =
> > @@ -697,6 +705,9 @@ static void virtio_balloon_set_config(VirtIODevice *vdev,
> > qapi_event_send_balloon_change(vm_ram_size -
> > ((ram_addr_t) dev->actual << VIRTIO_BALLOON_PFN_SHIFT));
> > }
> > + dev->poison_val = virtio_vdev_has_feature(vdev,
> > + VIRTIO_BALLOON_F_PAGE_POISON) ?
> > + le32_to_cpu(config.poison_val) : 0;
>
> Can we just do a
>
>
> dev->poison_val = 0;
> if (virtio_vdev_has_feature(vdev, VIRTIO_BALLOON_F_PAGE_POISON)) {
> dev->poison_val = le32_to_cpu(config.poison_val);
> }
>
> instead?
I can change it to that if that is what is preferred.
> > trace_virtio_balloon_set_config(dev->actual, oldactual);
> > }
> >
> > @@ -706,6 +717,9 @@ static uint64_t virtio_balloon_get_features(VirtIODevice *vdev, uint64_t f,
> > VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
> > f |= dev->host_features;
> > virtio_add_feature(&f, VIRTIO_BALLOON_F_STATS_VQ);
> > + if (virtio_has_feature(f, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
> > + virtio_add_feature(&f, VIRTIO_BALLOON_F_PAGE_POISON);
> > + }
> >
> > return f;
> > }
> > @@ -854,6 +868,8 @@ static void virtio_balloon_device_reset(VirtIODevice *vdev)
> > g_free(s->stats_vq_elem);
> > s->stats_vq_elem = NULL;
> > }
> > +
> > + s->poison_val = 0;
> > }
> >
> > static void virtio_balloon_set_status(VirtIODevice *vdev, uint8_t status)
> > @@ -916,6 +932,8 @@ static Property virtio_balloon_properties[] = {
> > VIRTIO_BALLOON_F_DEFLATE_ON_OOM, false),
> > DEFINE_PROP_BIT("free-page-hint", VirtIOBalloon, host_features,
> > VIRTIO_BALLOON_F_FREE_PAGE_HINT, false),
> > + DEFINE_PROP_BIT("x-page-poison", VirtIOBalloon, host_features,
> > + VIRTIO_BALLOON_F_PAGE_POISON, false),
>
> Just curious, why an x- feature?
It was something I didn't expect the users to enable. It gets enabled
when either free page hinting or free page reporting is enabled. So if
you look you will see that in virtio_balloon_get_features the page
poison feature is added if free page hinting is present. The guest
will clear the feature bit if poisoning is not enabled in the guest.
That results in the vdev getting the bit cleared.
Part of it was also about making this work with the existing feature
code as it had been added to the upstream kernel.
---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
^ permalink raw reply [flat|nested] 22+ messages in thread* [virtio-dev] Re: [PATCH v19 QEMU 1/4] virtio-balloon: Implement support for page poison tracking feature
2020-04-15 17:17 ` Alexander Duyck
@ 2020-04-15 18:16 ` David Hildenbrand
2020-04-15 19:28 ` Alexander Duyck
0 siblings, 1 reply; 22+ messages in thread
From: David Hildenbrand @ 2020-04-15 18:16 UTC (permalink / raw)
To: Alexander Duyck; +Cc: Paolo Bonzini, Michael S. Tsirkin, virtio-dev, qemu-devel
>
> The comment above explains the "why". Basically poisoning a page will
> dirty it. So why hint a page as free when that will drop it back into
> the guest and result in it being dirtied again. What you end up with
> is all the pages that were temporarily placed in the balloon are dirty
> after the hinting report is finished at which point you made things
> worse instead of helping to improve them.
Let me think this through. What happens on free page hinting is that
a) we tell the guest that migration starts
b) it allocates pages (alloc_pages()), sends them to us and adds them to
a list
b) we exclude these pages on migration
c) we tell the guest that migration is over
d) the guest frees all allocated pages
The "issue" with VIRTIO_BALLOON_F_PAGE_POISON is, that in d), the guest
will fill all pages again with a pattern (or zero).
AFAIKs, it's either
1) Not performing free page hinting, migrating pages filled with a
poison value (or zero).
2) Performing free page hinting, not migrating pages filled with a
poison value (or zero), letting only the guest fill them again.
I have the feeling, that 2) is still better for migration, because we
don't migrate useless pages and let the guest reconstruct the content?
(having a poison value of zero might be debatable)
Can you tell me what I am missing? :)
>
>>
>>> + return;
>>> + }
>>> +
>>> if (s->free_page_report_cmd_id == UINT_MAX) {
>>> s->free_page_report_cmd_id =
>>> VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN;
>>
>> We should rename all "free_page_report" stuff we can to
>> "free_page_hint"/"free_page_hinting" to avoid confusion (e.g., on my
>> side :) ) before adding free page reporting .
>>
>> (looking at the virtio-balloon linux header, it's also confusing but
>> we're stuck with that - maybe we should add better comments)
>
> Are we stuck? Couldn't we just convert it to an anonymous union with
> free_page_hint_cmd_id and then use that where needed?
Saw your patch already :)
>
>>> @@ -618,12 +627,10 @@ static size_t virtio_balloon_config_size(VirtIOBalloon *s)
>>> if (s->qemu_4_0_config_size) {
>>> return sizeof(struct virtio_balloon_config);
>>> }
>>> - if (virtio_has_feature(features, VIRTIO_BALLOON_F_PAGE_POISON)) {
>>> + if (virtio_has_feature(features, VIRTIO_BALLOON_F_PAGE_POISON) ||
>>> + virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
>>> return sizeof(struct virtio_balloon_config);
>>> }
>>> - if (virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
>>> - return offsetof(struct virtio_balloon_config, poison_val);
>>> - }
>>
>> I am not sure this change is completely sane. Why is that necessary at all?
>
> The poison_val is stored at the end of the structure and is required
> in order to make free page hinting to work. What this change is doing
Required to make it work? In the kernel,
commit 2e991629bcf55a43681aec1ee096eeb03cf81709
Author: Wei Wang <wei.w.wang@intel.com>
Date: Mon Aug 27 09:32:19 2018 +0800
virtio-balloon: VIRTIO_BALLOON_F_PAGE_POISON
was merged after
commit 86a559787e6f5cf662c081363f64a20cad654195
Author: Wei Wang <wei.w.wang@intel.com>
Date: Mon Aug 27 09:32:17 2018 +0800
virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
So I assume it's perfectly fine to not have it.
I'd say it's the responsibility of the guest to *not* use
VIRTIO_BALLOON_F_FREE_PAGE_HINT in case it is using page poisoning
without VIRTIO_BALLOON_F_PAGE_POISON. Otherwise it will shoot itself
into the foot.
> is forcing it so that we report the config size as the full size if
> either poisoning or hinting are enabled since the poison val is the
> last member of the config structure.
>
> If the question is why bother reducing the size if free page hinting
> is not present then I guess we could simplify this and just report
> report the size of the config for all cases.
I guess the idea is that if you migrate from one QEMU to another, your
config size will not suddenly change (fenced by a feature set that will
not change during migration, observable by a running guest).
My guess would be that we cannot simply change existing configurations
(defined via feature bits) as you do here - see e.g., qemu-4-0-config-size.
[...]
>>>
>>> @@ -706,6 +717,9 @@ static uint64_t virtio_balloon_get_features(VirtIODevice *vdev, uint64_t f,
>>> VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
>>> f |= dev->host_features;
>>> virtio_add_feature(&f, VIRTIO_BALLOON_F_STATS_VQ);
>>> + if (virtio_has_feature(f, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
>>> + virtio_add_feature(&f, VIRTIO_BALLOON_F_PAGE_POISON);
>>> + }
>>>
>>> return f;
>>> }
>>> @@ -854,6 +868,8 @@ static void virtio_balloon_device_reset(VirtIODevice *vdev)
>>> g_free(s->stats_vq_elem);
>>> s->stats_vq_elem = NULL;
>>> }
>>> +
>>> + s->poison_val = 0;
>>> }
>>>
>>> static void virtio_balloon_set_status(VirtIODevice *vdev, uint8_t status)
>>> @@ -916,6 +932,8 @@ static Property virtio_balloon_properties[] = {
>>> VIRTIO_BALLOON_F_DEFLATE_ON_OOM, false),
>>> DEFINE_PROP_BIT("free-page-hint", VirtIOBalloon, host_features,
>>> VIRTIO_BALLOON_F_FREE_PAGE_HINT, false),
>>> + DEFINE_PROP_BIT("x-page-poison", VirtIOBalloon, host_features,
>>> + VIRTIO_BALLOON_F_PAGE_POISON, false),
>>
>> Just curious, why an x- feature?
>
> It was something I didn't expect the users to enable. It gets enabled
> when either free page hinting or free page reporting is enabled. So if
> you look you will see that in virtio_balloon_get_features the page
> poison feature is added if free page hinting is present. The guest
> will clear the feature bit if poisoning is not enabled in the guest.
> That results in the vdev getting the bit cleared.
The weird thing is that setting it to "false" will still enable it
automatically, depending on other features. Not sure how helpful that is.
I'd prefer to simply always enable it, similar to
VIRTIO_BALLOON_F_STATS_VQ - but it's late and I am confused how
migration with compat machines will work. :)
So, I wonder if we need this feature bit here at all.
--
Thanks,
David / dhildenb
---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
^ permalink raw reply [flat|nested] 22+ messages in thread* [virtio-dev] Re: [PATCH v19 QEMU 1/4] virtio-balloon: Implement support for page poison tracking feature
2020-04-15 18:16 ` David Hildenbrand
@ 2020-04-15 19:28 ` Alexander Duyck
[not found] ` <EDD25A47-8A8D-4F9B-9875-B983A1BA72C2@redhat.com>
0 siblings, 1 reply; 22+ messages in thread
From: Alexander Duyck @ 2020-04-15 19:28 UTC (permalink / raw)
To: David Hildenbrand, Wang, Wei W
Cc: Paolo Bonzini, Michael S. Tsirkin, virtio-dev, qemu-devel
On Wed, Apr 15, 2020 at 11:17 AM David Hildenbrand <david@redhat.com> wrote:
>
> >
> > The comment above explains the "why". Basically poisoning a page will
> > dirty it. So why hint a page as free when that will drop it back into
> > the guest and result in it being dirtied again. What you end up with
> > is all the pages that were temporarily placed in the balloon are dirty
> > after the hinting report is finished at which point you made things
> > worse instead of helping to improve them.
>
> Let me think this through. What happens on free page hinting is that
>
> a) we tell the guest that migration starts
> b) it allocates pages (alloc_pages()), sends them to us and adds them to
> a list
> b) we exclude these pages on migration
> c) we tell the guest that migration is over
> d) the guest frees all allocated pages
>
> The "issue" with VIRTIO_BALLOON_F_PAGE_POISON is, that in d), the guest
> will fill all pages again with a pattern (or zero).
They should have already been filled with the poison pattern before we
got to d). A bigger worry is that we at some point in the future
update the kernel so that d) doesn't trigger re-poisoning, in which
case the pages won't be marked as dirty, we will have skipped the
migration, and we have no idea what will be in the pages at the end.
> AFAIKs, it's either
>
> 1) Not performing free page hinting, migrating pages filled with a
> poison value (or zero).
> 2) Performing free page hinting, not migrating pages filled with a
> poison value (or zero), letting only the guest fill them again.
>
> I have the feeling, that 2) is still better for migration, because we
> don't migrate useless pages and let the guest reconstruct the content?
> (having a poison value of zero might be debatable)
>
> Can you tell me what I am missing? :)
The goal of the free page hinting was to reduce the number of pages
that have to be migrated, in the second case you point out is is
basically deferring it and will make the post-copy quite more
expensive since all of the free memory will be pushed to the post-copy
which I would think would be undesirable since it means the VM would
have to be down for a greater amount of time with the poisoning
enabled.
The worst case scenario would be one in which the VM was suspended for
migration while there were still pages in the balloon that needed to
be drained. In such a case you would have them in an indeterminate
state since the last page poisoning for them might have been ignored
since they were marked as "free", so they are just going to be
whatever value they were if they had been migrated at all.
> >
> >>
> >>> + return;
> >>> + }
> >>> +
> >>> if (s->free_page_report_cmd_id == UINT_MAX) {
> >>> s->free_page_report_cmd_id =
> >>> VIRTIO_BALLOON_FREE_PAGE_REPORT_CMD_ID_MIN;
> >>
> >> We should rename all "free_page_report" stuff we can to
> >> "free_page_hint"/"free_page_hinting" to avoid confusion (e.g., on my
> >> side :) ) before adding free page reporting .
> >>
> >> (looking at the virtio-balloon linux header, it's also confusing but
> >> we're stuck with that - maybe we should add better comments)
> >
> > Are we stuck? Couldn't we just convert it to an anonymous union with
> > free_page_hint_cmd_id and then use that where needed?
>
> Saw your patch already :)
>
> >
> >>> @@ -618,12 +627,10 @@ static size_t virtio_balloon_config_size(VirtIOBalloon *s)
> >>> if (s->qemu_4_0_config_size) {
> >>> return sizeof(struct virtio_balloon_config);
> >>> }
> >>> - if (virtio_has_feature(features, VIRTIO_BALLOON_F_PAGE_POISON)) {
> >>> + if (virtio_has_feature(features, VIRTIO_BALLOON_F_PAGE_POISON) ||
> >>> + virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
> >>> return sizeof(struct virtio_balloon_config);
> >>> }
> >>> - if (virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
> >>> - return offsetof(struct virtio_balloon_config, poison_val);
> >>> - }
> >>
> >> I am not sure this change is completely sane. Why is that necessary at all?
> >
> > The poison_val is stored at the end of the structure and is required
> > in order to make free page hinting to work. What this change is doing
>
> Required to make it work? In the kernel,
>
> commit 2e991629bcf55a43681aec1ee096eeb03cf81709
> Author: Wei Wang <wei.w.wang@intel.com>
> Date: Mon Aug 27 09:32:19 2018 +0800
>
> virtio-balloon: VIRTIO_BALLOON_F_PAGE_POISON
>
> was merged after
>
> commit 86a559787e6f5cf662c081363f64a20cad654195
> Author: Wei Wang <wei.w.wang@intel.com>
> Date: Mon Aug 27 09:32:17 2018 +0800
>
> virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT
>
> So I assume it's perfectly fine to not have it.
>
> I'd say it's the responsibility of the guest to *not* use
> VIRTIO_BALLOON_F_FREE_PAGE_HINT in case it is using page poisoning
> without VIRTIO_BALLOON_F_PAGE_POISON. Otherwise it will shoot itself
> into the foot.
Based on the timing I am guessing the page poisoning was in the same
patch set as the free page hinting since there is only 2 seconds
between when the two are merged. If I recall the page poisoning logic
was added after the issue was pointed out that it needed to address
it.
I can probably make some changes to virtballoon_validate in the kernel
driver to address the fact that if we are poisoning pages we need to
either have the PAGE_POISON feature or we need to disable page
reporting and page hinting.
> > is forcing it so that we report the config size as the full size if
> > either poisoning or hinting are enabled since the poison val is the
> > last member of the config structure.
> >
> > If the question is why bother reducing the size if free page hinting
> > is not present then I guess we could simplify this and just report
> > report the size of the config for all cases.
>
> I guess the idea is that if you migrate from one QEMU to another, your
> config size will not suddenly change (fenced by a feature set that will
> not change during migration, observable by a running guest).
>
> My guess would be that we cannot simply change existing configurations
> (defined via feature bits) as you do here - see e.g., qemu-4-0-config-size.
Okay, I guess I can revert that bit.
> [...]
>
> >>>
> >>> @@ -706,6 +717,9 @@ static uint64_t virtio_balloon_get_features(VirtIODevice *vdev, uint64_t f,
> >>> VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
> >>> f |= dev->host_features;
> >>> virtio_add_feature(&f, VIRTIO_BALLOON_F_STATS_VQ);
> >>> + if (virtio_has_feature(f, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
> >>> + virtio_add_feature(&f, VIRTIO_BALLOON_F_PAGE_POISON);
> >>> + }
> >>>
> >>> return f;
> >>> }
> >>> @@ -854,6 +868,8 @@ static void virtio_balloon_device_reset(VirtIODevice *vdev)
> >>> g_free(s->stats_vq_elem);
> >>> s->stats_vq_elem = NULL;
> >>> }
> >>> +
> >>> + s->poison_val = 0;
> >>> }
> >>>
> >>> static void virtio_balloon_set_status(VirtIODevice *vdev, uint8_t status)
> >>> @@ -916,6 +932,8 @@ static Property virtio_balloon_properties[] = {
> >>> VIRTIO_BALLOON_F_DEFLATE_ON_OOM, false),
> >>> DEFINE_PROP_BIT("free-page-hint", VirtIOBalloon, host_features,
> >>> VIRTIO_BALLOON_F_FREE_PAGE_HINT, false),
> >>> + DEFINE_PROP_BIT("x-page-poison", VirtIOBalloon, host_features,
> >>> + VIRTIO_BALLOON_F_PAGE_POISON, false),
> >>
> >> Just curious, why an x- feature?
> >
> > It was something I didn't expect the users to enable. It gets enabled
> > when either free page hinting or free page reporting is enabled. So if
> > you look you will see that in virtio_balloon_get_features the page
> > poison feature is added if free page hinting is present. The guest
> > will clear the feature bit if poisoning is not enabled in the guest.
> > That results in the vdev getting the bit cleared.
>
> The weird thing is that setting it to "false" will still enable it
> automatically, depending on other features. Not sure how helpful that is.
>
> I'd prefer to simply always enable it, similar to
> VIRTIO_BALLOON_F_STATS_VQ - but it's late and I am confused how
> migration with compat machines will work. :)
>
> So, I wonder if we need this feature bit here at all.
I can drop it. I don't really recall why I added that piece.
---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
^ permalink raw reply [flat|nested] 22+ messages in thread
* [virtio-dev] [PATCH v19 QEMU 2/4] linux-headers: update to contain virito-balloon free page reporting
2020-04-10 3:41 [virtio-dev] [PATCH v19 QEMU 0/4] virtio-balloon: add support for free page reporting Alexander Duyck
2020-04-10 3:41 ` [virtio-dev] [PATCH v19 QEMU 1/4] virtio-balloon: Implement support for page poison tracking feature Alexander Duyck
@ 2020-04-10 3:41 ` Alexander Duyck
2020-04-10 3:41 ` [virtio-dev] [PATCH v19 QEMU 3/4] virtio-balloon: Provide an interface for " Alexander Duyck
2020-04-10 3:41 ` [virtio-dev] [PATCH v19 QEMU 4/4] memory: Do not allow direct write access to rom_device regions Alexander Duyck
3 siblings, 0 replies; 22+ messages in thread
From: Alexander Duyck @ 2020-04-10 3:41 UTC (permalink / raw)
To: pbonzini, david, mst; +Cc: virtio-dev, qemu-devel
From: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Sync the latest upstream changes for free page reporting. To be
replaced by a full linux header sync.
Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
---
include/standard-headers/linux/virtio_balloon.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/include/standard-headers/linux/virtio_balloon.h b/include/standard-headers/linux/virtio_balloon.h
index 9375ca2a70de..1c5f6d6f2de6 100644
--- a/include/standard-headers/linux/virtio_balloon.h
+++ b/include/standard-headers/linux/virtio_balloon.h
@@ -36,6 +36,7 @@
#define VIRTIO_BALLOON_F_DEFLATE_ON_OOM 2 /* Deflate balloon on OOM */
#define VIRTIO_BALLOON_F_FREE_PAGE_HINT 3 /* VQ to report free pages */
#define VIRTIO_BALLOON_F_PAGE_POISON 4 /* Guest is using page poisoning */
+#define VIRTIO_BALLOON_F_REPORTING 5 /* Page reporting virtqueue */
/* Size of a PFN in the balloon interface. */
#define VIRTIO_BALLOON_PFN_SHIFT 12
---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
^ permalink raw reply related [flat|nested] 22+ messages in thread
* [virtio-dev] [PATCH v19 QEMU 3/4] virtio-balloon: Provide an interface for free page reporting
2020-04-10 3:41 [virtio-dev] [PATCH v19 QEMU 0/4] virtio-balloon: add support for free page reporting Alexander Duyck
2020-04-10 3:41 ` [virtio-dev] [PATCH v19 QEMU 1/4] virtio-balloon: Implement support for page poison tracking feature Alexander Duyck
2020-04-10 3:41 ` [virtio-dev] [PATCH v19 QEMU 2/4] linux-headers: update to contain virito-balloon free page reporting Alexander Duyck
@ 2020-04-10 3:41 ` Alexander Duyck
2020-04-15 8:17 ` [virtio-dev] " David Hildenbrand
2020-04-10 3:41 ` [virtio-dev] [PATCH v19 QEMU 4/4] memory: Do not allow direct write access to rom_device regions Alexander Duyck
3 siblings, 1 reply; 22+ messages in thread
From: Alexander Duyck @ 2020-04-10 3:41 UTC (permalink / raw)
To: pbonzini, david, mst; +Cc: virtio-dev, qemu-devel
From: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Add support for free page reporting. The idea is to function very similar
to how the balloon works in that we basically end up madvising the page as
not being used. However we don't really need to bother with any deflate
type logic since the page will be faulted back into the guest when it is
read or written to.
This provides a new way of letting the guest proactively report free
pages to the hypervisor, so the hypervisor can reuse them. In contrast to
inflate/deflate that is triggered via the hypervisor explicitly.
Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
---
hw/virtio/virtio-balloon.c | 63 +++++++++++++++++++++++++++++++++++-
include/hw/virtio/virtio-balloon.h | 2 +
2 files changed, 62 insertions(+), 3 deletions(-)
diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index 1c6d36a29a04..86d8b48a8e3a 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -321,6 +321,57 @@ static void balloon_stats_set_poll_interval(Object *obj, Visitor *v,
balloon_stats_change_timer(s, 0);
}
+static void virtio_balloon_handle_report(VirtIODevice *vdev, VirtQueue *vq)
+{
+ VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
+ VirtQueueElement *elem;
+
+ while ((elem = virtqueue_pop(vq, sizeof(VirtQueueElement)))) {
+ unsigned int i;
+
+ for (i = 0; i < elem->in_num; i++) {
+ void *addr = elem->in_sg[i].iov_base;
+ size_t size = elem->in_sg[i].iov_len;
+ ram_addr_t ram_offset;
+ size_t rb_page_size;
+ RAMBlock *rb;
+
+ if (qemu_balloon_is_inhibited() || dev->poison_val) {
+ continue;
+ }
+
+ /*
+ * There is no need to check the memory section to see if
+ * it is ram/readonly/romd like there is for handle_output
+ * below. If the region is not meant to be written to then
+ * address_space_map will have allocated a bounce buffer
+ * and it will be freed in address_space_unmap and trigger
+ * and unassigned_mem_write before failing to copy over the
+ * buffer. If more than one bad descriptor is provided it
+ * will return NULL after the first bounce buffer and fail
+ * to map any resources.
+ */
+ rb = qemu_ram_block_from_host(addr, false, &ram_offset);
+ if (!rb) {
+ trace_virtio_balloon_bad_addr(elem->in_addr[i]);
+ continue;
+ }
+
+ /* For now we will simply ignore unaligned memory regions */
+ rb_page_size = qemu_ram_pagesize(rb);
+ if (!QEMU_IS_ALIGNED(ram_offset | size, rb_page_size)) {
+ continue;
+ }
+
+ ram_block_discard_range(rb, ram_offset, size);
+ }
+
+ virtqueue_push(vq, elem, 0);
+ virtio_notify(vdev, vq);
+ g_free(elem);
+ }
+}
+
static void virtio_balloon_handle_output(VirtIODevice *vdev, VirtQueue *vq)
{
VirtIOBalloon *s = VIRTIO_BALLOON(vdev);
@@ -628,7 +679,8 @@ static size_t virtio_balloon_config_size(VirtIOBalloon *s)
return sizeof(struct virtio_balloon_config);
}
if (virtio_has_feature(features, VIRTIO_BALLOON_F_PAGE_POISON) ||
- virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
+ virtio_has_feature(features, VIRTIO_BALLOON_F_FREE_PAGE_HINT) ||
+ virtio_has_feature(features, VIRTIO_BALLOON_F_REPORTING)) {
return sizeof(struct virtio_balloon_config);
}
return offsetof(struct virtio_balloon_config, free_page_report_cmd_id);
@@ -717,7 +769,8 @@ static uint64_t virtio_balloon_get_features(VirtIODevice *vdev, uint64_t f,
VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
f |= dev->host_features;
virtio_add_feature(&f, VIRTIO_BALLOON_F_STATS_VQ);
- if (virtio_has_feature(f, VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
+ if (virtio_has_feature(f, VIRTIO_BALLOON_F_FREE_PAGE_HINT) ||
+ virtio_has_feature(f, VIRTIO_BALLOON_F_REPORTING)) {
virtio_add_feature(&f, VIRTIO_BALLOON_F_PAGE_POISON);
}
@@ -807,6 +860,10 @@ static void virtio_balloon_device_realize(DeviceState *dev, Error **errp)
s->dvq = virtio_add_queue(vdev, 128, virtio_balloon_handle_output);
s->svq = virtio_add_queue(vdev, 128, virtio_balloon_receive_stats);
+ if (virtio_has_feature(s->host_features, VIRTIO_BALLOON_F_REPORTING)) {
+ s->rvq = virtio_add_queue(vdev, 32, virtio_balloon_handle_report);
+ }
+
if (virtio_has_feature(s->host_features,
VIRTIO_BALLOON_F_FREE_PAGE_HINT)) {
s->free_page_vq = virtio_add_queue(vdev, VIRTQUEUE_MAX_SIZE,
@@ -940,6 +997,8 @@ static Property virtio_balloon_properties[] = {
*/
DEFINE_PROP_BOOL("qemu-4-0-config-size", VirtIOBalloon,
qemu_4_0_config_size, false),
+ DEFINE_PROP_BIT("free-page-reporting", VirtIOBalloon, host_features,
+ VIRTIO_BALLOON_F_REPORTING, true),
DEFINE_PROP_LINK("iothread", VirtIOBalloon, iothread, TYPE_IOTHREAD,
IOThread *),
DEFINE_PROP_END_OF_LIST(),
diff --git a/include/hw/virtio/virtio-balloon.h b/include/hw/virtio/virtio-balloon.h
index 7fe78e5c14d7..db5bf7127112 100644
--- a/include/hw/virtio/virtio-balloon.h
+++ b/include/hw/virtio/virtio-balloon.h
@@ -42,7 +42,7 @@ enum virtio_balloon_free_page_report_status {
typedef struct VirtIOBalloon {
VirtIODevice parent_obj;
- VirtQueue *ivq, *dvq, *svq, *free_page_vq;
+ VirtQueue *ivq, *dvq, *svq, *free_page_vq, *rvq;
uint32_t free_page_report_status;
uint32_t num_pages;
uint32_t actual;
---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
^ permalink raw reply related [flat|nested] 22+ messages in thread* [virtio-dev] Re: [PATCH v19 QEMU 3/4] virtio-balloon: Provide an interface for free page reporting
2020-04-10 3:41 ` [virtio-dev] [PATCH v19 QEMU 3/4] virtio-balloon: Provide an interface for " Alexander Duyck
@ 2020-04-15 8:17 ` David Hildenbrand
2020-04-15 9:03 ` David Hildenbrand
2020-04-15 15:31 ` Alexander Duyck
0 siblings, 2 replies; 22+ messages in thread
From: David Hildenbrand @ 2020-04-15 8:17 UTC (permalink / raw)
To: Alexander Duyck, pbonzini, mst; +Cc: virtio-dev, qemu-devel
On 10.04.20 05:41, Alexander Duyck wrote:
> From: Alexander Duyck <alexander.h.duyck@linux.intel.com>
>
> Add support for free page reporting. The idea is to function very similar
> to how the balloon works in that we basically end up madvising the page as
> not being used. However we don't really need to bother with any deflate
> type logic since the page will be faulted back into the guest when it is
> read or written to.
>
> This provides a new way of letting the guest proactively report free
> pages to the hypervisor, so the hypervisor can reuse them. In contrast to
> inflate/deflate that is triggered via the hypervisor explicitly.
Much better, thanks!
>
> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> ---
> hw/virtio/virtio-balloon.c | 63 +++++++++++++++++++++++++++++++++++-
> include/hw/virtio/virtio-balloon.h | 2 +
> 2 files changed, 62 insertions(+), 3 deletions(-)
>
> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> index 1c6d36a29a04..86d8b48a8e3a 100644
> --- a/hw/virtio/virtio-balloon.c
> +++ b/hw/virtio/virtio-balloon.c
> @@ -321,6 +321,57 @@ static void balloon_stats_set_poll_interval(Object *obj, Visitor *v,
> balloon_stats_change_timer(s, 0);
> }
>
> +static void virtio_balloon_handle_report(VirtIODevice *vdev, VirtQueue *vq)
> +{
> + VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
> + VirtQueueElement *elem;
> +
> + while ((elem = virtqueue_pop(vq, sizeof(VirtQueueElement)))) {
> + unsigned int i;
> +
> + for (i = 0; i < elem->in_num; i++) {
> + void *addr = elem->in_sg[i].iov_base;
> + size_t size = elem->in_sg[i].iov_len;
> + ram_addr_t ram_offset;
> + size_t rb_page_size;
> + RAMBlock *rb;
> +
> + if (qemu_balloon_is_inhibited() || dev->poison_val) {
> + continue;
actually, you want to do that in the outer loop, no?
> + }
> +
> + /*
> + * There is no need to check the memory section to see if
> + * it is ram/readonly/romd like there is for handle_output
> + * below. If the region is not meant to be written to then
> + * address_space_map will have allocated a bounce buffer
> + * and it will be freed in address_space_unmap and trigger
> + * and unassigned_mem_write before failing to copy over the
> + * buffer. If more than one bad descriptor is provided it
> + * will return NULL after the first bounce buffer and fail
> + * to map any resources.
> + */
> + rb = qemu_ram_block_from_host(addr, false, &ram_offset);
> + if (!rb) {
> + trace_virtio_balloon_bad_addr(elem->in_addr[i]);
> + continue;
> + }
> +
> + /* For now we will simply ignore unaligned memory regions */
> + rb_page_size = qemu_ram_pagesize(rb);
> + if (!QEMU_IS_ALIGNED(ram_offset | size, rb_page_size)) {
/me thinks you can drop rb_page_size
I *think* there is still one remaining case to handle: Crossing RAM blocks.
Most probably you should check
/* For now, ignore crossing RAM blocks. */
if (ram_offset + size >= qemu_ram_get_used_length()) {
continue;
}
otherwise ram_block_discard_range() will report an error.
--
Thanks,
David / dhildenb
---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
^ permalink raw reply [flat|nested] 22+ messages in thread* [virtio-dev] Re: [PATCH v19 QEMU 3/4] virtio-balloon: Provide an interface for free page reporting
2020-04-15 8:17 ` [virtio-dev] " David Hildenbrand
@ 2020-04-15 9:03 ` David Hildenbrand
2020-04-15 15:31 ` Alexander Duyck
1 sibling, 0 replies; 22+ messages in thread
From: David Hildenbrand @ 2020-04-15 9:03 UTC (permalink / raw)
To: Alexander Duyck, pbonzini, mst; +Cc: virtio-dev, qemu-devel
On 15.04.20 10:17, David Hildenbrand wrote:
> On 10.04.20 05:41, Alexander Duyck wrote:
>> From: Alexander Duyck <alexander.h.duyck@linux.intel.com>
>>
>> Add support for free page reporting. The idea is to function very similar
>> to how the balloon works in that we basically end up madvising the page as
>> not being used. However we don't really need to bother with any deflate
>> type logic since the page will be faulted back into the guest when it is
>> read or written to.
>>
>> This provides a new way of letting the guest proactively report free
>> pages to the hypervisor, so the hypervisor can reuse them. In contrast to
>> inflate/deflate that is triggered via the hypervisor explicitly.
>
> Much better, thanks!
>
>>
>> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
>> ---
>> hw/virtio/virtio-balloon.c | 63 +++++++++++++++++++++++++++++++++++-
>> include/hw/virtio/virtio-balloon.h | 2 +
>> 2 files changed, 62 insertions(+), 3 deletions(-)
>>
>> diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
>> index 1c6d36a29a04..86d8b48a8e3a 100644
>> --- a/hw/virtio/virtio-balloon.c
>> +++ b/hw/virtio/virtio-balloon.c
>> @@ -321,6 +321,57 @@ static void balloon_stats_set_poll_interval(Object *obj, Visitor *v,
>> balloon_stats_change_timer(s, 0);
>> }
>>
>> +static void virtio_balloon_handle_report(VirtIODevice *vdev, VirtQueue *vq)
>> +{
>> + VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
>> + VirtQueueElement *elem;
>> +
>> + while ((elem = virtqueue_pop(vq, sizeof(VirtQueueElement)))) {
>> + unsigned int i;
>> +
>> + for (i = 0; i < elem->in_num; i++) {
>> + void *addr = elem->in_sg[i].iov_base;
>> + size_t size = elem->in_sg[i].iov_len;
>> + ram_addr_t ram_offset;
>> + size_t rb_page_size;
>> + RAMBlock *rb;
>> +
>> + if (qemu_balloon_is_inhibited() || dev->poison_val) {
>> + continue;
>
> actually, you want to do that in the outer loop, no?
>
>> + }
>> +
>> + /*
>> + * There is no need to check the memory section to see if
>> + * it is ram/readonly/romd like there is for handle_output
>> + * below. If the region is not meant to be written to then
>> + * address_space_map will have allocated a bounce buffer
>> + * and it will be freed in address_space_unmap and trigger
>> + * and unassigned_mem_write before failing to copy over the
>> + * buffer. If more than one bad descriptor is provided it
>> + * will return NULL after the first bounce buffer and fail
>> + * to map any resources.
>> + */
>> + rb = qemu_ram_block_from_host(addr, false, &ram_offset);
>> + if (!rb) {
>> + trace_virtio_balloon_bad_addr(elem->in_addr[i]);
>> + continue;
>> + }
>> +
>> + /* For now we will simply ignore unaligned memory regions */
>> + rb_page_size = qemu_ram_pagesize(rb);
>> + if (!QEMU_IS_ALIGNED(ram_offset | size, rb_page_size)) {
>
> /me thinks you can drop rb_page_size
>
> I *think* there is still one remaining case to handle: Crossing RAM blocks.
>
> Most probably you should check
>
> /* For now, ignore crossing RAM blocks. */
> if (ram_offset + size >= qemu_ram_get_used_length()) {
> continue;
(should be an > I guess)
--
Thanks,
David / dhildenb
---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
^ permalink raw reply [flat|nested] 22+ messages in thread* [virtio-dev] Re: [PATCH v19 QEMU 3/4] virtio-balloon: Provide an interface for free page reporting
2020-04-15 8:17 ` [virtio-dev] " David Hildenbrand
2020-04-15 9:03 ` David Hildenbrand
@ 2020-04-15 15:31 ` Alexander Duyck
1 sibling, 0 replies; 22+ messages in thread
From: Alexander Duyck @ 2020-04-15 15:31 UTC (permalink / raw)
To: David Hildenbrand
Cc: Paolo Bonzini, Michael S. Tsirkin, virtio-dev, qemu-devel
On Wed, Apr 15, 2020 at 1:17 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 10.04.20 05:41, Alexander Duyck wrote:
> > From: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> >
> > Add support for free page reporting. The idea is to function very similar
> > to how the balloon works in that we basically end up madvising the page as
> > not being used. However we don't really need to bother with any deflate
> > type logic since the page will be faulted back into the guest when it is
> > read or written to.
> >
> > This provides a new way of letting the guest proactively report free
> > pages to the hypervisor, so the hypervisor can reuse them. In contrast to
> > inflate/deflate that is triggered via the hypervisor explicitly.
>
> Much better, thanks!
>
> >
> > Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> > ---
> > hw/virtio/virtio-balloon.c | 63 +++++++++++++++++++++++++++++++++++-
> > include/hw/virtio/virtio-balloon.h | 2 +
> > 2 files changed, 62 insertions(+), 3 deletions(-)
> >
> > diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
> > index 1c6d36a29a04..86d8b48a8e3a 100644
> > --- a/hw/virtio/virtio-balloon.c
> > +++ b/hw/virtio/virtio-balloon.c
> > @@ -321,6 +321,57 @@ static void balloon_stats_set_poll_interval(Object *obj, Visitor *v,
> > balloon_stats_change_timer(s, 0);
> > }
> >
> > +static void virtio_balloon_handle_report(VirtIODevice *vdev, VirtQueue *vq)
> > +{
> > + VirtIOBalloon *dev = VIRTIO_BALLOON(vdev);
> > + VirtQueueElement *elem;
> > +
> > + while ((elem = virtqueue_pop(vq, sizeof(VirtQueueElement)))) {
> > + unsigned int i;
> > +
> > + for (i = 0; i < elem->in_num; i++) {
> > + void *addr = elem->in_sg[i].iov_base;
> > + size_t size = elem->in_sg[i].iov_len;
> > + ram_addr_t ram_offset;
> > + size_t rb_page_size;
> > + RAMBlock *rb;
> > +
> > + if (qemu_balloon_is_inhibited() || dev->poison_val) {
> > + continue;
>
> actually, you want to do that in the outer loop, no?
I'll move that. Odds are compiler was doing that anyway.
> > + }
> > +
> > + /*
> > + * There is no need to check the memory section to see if
> > + * it is ram/readonly/romd like there is for handle_output
> > + * below. If the region is not meant to be written to then
> > + * address_space_map will have allocated a bounce buffer
> > + * and it will be freed in address_space_unmap and trigger
> > + * and unassigned_mem_write before failing to copy over the
> > + * buffer. If more than one bad descriptor is provided it
> > + * will return NULL after the first bounce buffer and fail
> > + * to map any resources.
> > + */
> > + rb = qemu_ram_block_from_host(addr, false, &ram_offset);
> > + if (!rb) {
> > + trace_virtio_balloon_bad_addr(elem->in_addr[i]);
> > + continue;
> > + }
> > +
> > + /* For now we will simply ignore unaligned memory regions */
> > + rb_page_size = qemu_ram_pagesize(rb);
> > + if (!QEMU_IS_ALIGNED(ram_offset | size, rb_page_size)) {
>
> /me thinks you can drop rb_page_size
You mean just fold it into the statement? I guess I can do that.
> I *think* there is still one remaining case to handle: Crossing RAM blocks.
>
> Most probably you should check
>
> /* For now, ignore crossing RAM blocks. */
> if (ram_offset + size >= qemu_ram_get_used_length()) {
> continue;
> }
>
> otherwise ram_block_discard_range() will report an error.
Makes sense. I can add that into the QEMU_IS_ALIGNED check.
---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
^ permalink raw reply [flat|nested] 22+ messages in thread
* [virtio-dev] [PATCH v19 QEMU 4/4] memory: Do not allow direct write access to rom_device regions
2020-04-10 3:41 [virtio-dev] [PATCH v19 QEMU 0/4] virtio-balloon: add support for free page reporting Alexander Duyck
` (2 preceding siblings ...)
2020-04-10 3:41 ` [virtio-dev] [PATCH v19 QEMU 3/4] virtio-balloon: Provide an interface for " Alexander Duyck
@ 2020-04-10 3:41 ` Alexander Duyck
2020-04-10 10:50 ` [virtio-dev] " Paolo Bonzini
3 siblings, 1 reply; 22+ messages in thread
From: Alexander Duyck @ 2020-04-10 3:41 UTC (permalink / raw)
To: pbonzini, david, mst; +Cc: virtio-dev, qemu-devel
From: Alexander Duyck <alexander.h.duyck@linux.intel.com>
According to the documentation in memory.h a ROM memory region will be
backed by RAM for reads, but is supposed to go through a callback for
writes. Currently we were not checking for the existence of the rom_device
flag when determining if we could perform a direct write or not.
To correct that add a check to memory_region_is_direct so that if the
memory region has the rom_device flag set we will return false for all
checks where is_write is set.
Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
---
include/exec/memory.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 1614d9a02c0c..e000bd2f97b2 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -2351,8 +2351,8 @@ void address_space_write_cached_slow(MemoryRegionCache *cache,
static inline bool memory_access_is_direct(MemoryRegion *mr, bool is_write)
{
if (is_write) {
- return memory_region_is_ram(mr) &&
- !mr->readonly && !memory_region_is_ram_device(mr);
+ return memory_region_is_ram(mr) && !mr->readonly &&
+ !mr->rom_device && !memory_region_is_ram_device(mr);
} else {
return (memory_region_is_ram(mr) && !memory_region_is_ram_device(mr)) ||
memory_region_is_romd(mr);
---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
^ permalink raw reply related [flat|nested] 22+ messages in thread* [virtio-dev] Re: [PATCH v19 QEMU 4/4] memory: Do not allow direct write access to rom_device regions
2020-04-10 3:41 ` [virtio-dev] [PATCH v19 QEMU 4/4] memory: Do not allow direct write access to rom_device regions Alexander Duyck
@ 2020-04-10 10:50 ` Paolo Bonzini
2020-04-13 22:48 ` Alexander Duyck
0 siblings, 1 reply; 22+ messages in thread
From: Paolo Bonzini @ 2020-04-10 10:50 UTC (permalink / raw)
To: Alexander Duyck, david, mst; +Cc: virtio-dev, qemu-devel
On 10/04/20 05:41, Alexander Duyck wrote:
> From: Alexander Duyck <alexander.h.duyck@linux.intel.com>
>
> According to the documentation in memory.h a ROM memory region will be
> backed by RAM for reads, but is supposed to go through a callback for
> writes. Currently we were not checking for the existence of the rom_device
> flag when determining if we could perform a direct write or not.
>
> To correct that add a check to memory_region_is_direct so that if the
> memory region has the rom_device flag set we will return false for all
> checks where is_write is set.
>
> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> ---
> include/exec/memory.h | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 1614d9a02c0c..e000bd2f97b2 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -2351,8 +2351,8 @@ void address_space_write_cached_slow(MemoryRegionCache *cache,
> static inline bool memory_access_is_direct(MemoryRegion *mr, bool is_write)
> {
> if (is_write) {
> - return memory_region_is_ram(mr) &&
> - !mr->readonly && !memory_region_is_ram_device(mr);
> + return memory_region_is_ram(mr) && !mr->readonly &&
> + !mr->rom_device && !memory_region_is_ram_device(mr);
> } else {
> return (memory_region_is_ram(mr) && !memory_region_is_ram_device(mr)) ||
> memory_region_is_romd(mr);
>
Good catch. I queued this up for 5.0.
Thanks,
Paolo
---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
^ permalink raw reply [flat|nested] 22+ messages in thread* [virtio-dev] Re: [PATCH v19 QEMU 4/4] memory: Do not allow direct write access to rom_device regions
2020-04-10 10:50 ` [virtio-dev] " Paolo Bonzini
@ 2020-04-13 22:48 ` Alexander Duyck
2020-04-14 7:36 ` David Hildenbrand
0 siblings, 1 reply; 22+ messages in thread
From: Alexander Duyck @ 2020-04-13 22:48 UTC (permalink / raw)
To: Paolo Bonzini, Michael S. Tsirkin, David Hildenbrand
Cc: virtio-dev, qemu-devel
On Fri, Apr 10, 2020 at 3:50 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 10/04/20 05:41, Alexander Duyck wrote:
> > From: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> >
> > According to the documentation in memory.h a ROM memory region will be
> > backed by RAM for reads, but is supposed to go through a callback for
> > writes. Currently we were not checking for the existence of the rom_device
> > flag when determining if we could perform a direct write or not.
> >
> > To correct that add a check to memory_region_is_direct so that if the
> > memory region has the rom_device flag set we will return false for all
> > checks where is_write is set.
> >
> > Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
> > ---
> > include/exec/memory.h | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/exec/memory.h b/include/exec/memory.h
> > index 1614d9a02c0c..e000bd2f97b2 100644
> > --- a/include/exec/memory.h
> > +++ b/include/exec/memory.h
> > @@ -2351,8 +2351,8 @@ void address_space_write_cached_slow(MemoryRegionCache *cache,
> > static inline bool memory_access_is_direct(MemoryRegion *mr, bool is_write)
> > {
> > if (is_write) {
> > - return memory_region_is_ram(mr) &&
> > - !mr->readonly && !memory_region_is_ram_device(mr);
> > + return memory_region_is_ram(mr) && !mr->readonly &&
> > + !mr->rom_device && !memory_region_is_ram_device(mr);
> > } else {
> > return (memory_region_is_ram(mr) && !memory_region_is_ram_device(mr)) ||
> > memory_region_is_romd(mr);
> >
>
> Good catch. I queued this up for 5.0.
>
> Thanks,
>
> Paolo
Thanks Paolo,
It looks like you only pulled this patch correct?
If so, David & Michael, do I need to resubmit the first 3 in this
series or can those be pulled separately?
Thanks.
Alex
---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
^ permalink raw reply [flat|nested] 22+ messages in thread* [virtio-dev] Re: [PATCH v19 QEMU 4/4] memory: Do not allow direct write access to rom_device regions
2020-04-13 22:48 ` Alexander Duyck
@ 2020-04-14 7:36 ` David Hildenbrand
0 siblings, 0 replies; 22+ messages in thread
From: David Hildenbrand @ 2020-04-14 7:36 UTC (permalink / raw)
To: Alexander Duyck, Paolo Bonzini, Michael S. Tsirkin; +Cc: virtio-dev, qemu-devel
On 14.04.20 00:48, Alexander Duyck wrote:
> On Fri, Apr 10, 2020 at 3:50 AM Paolo Bonzini <pbonzini@redhat.com> wrote:
>>
>> On 10/04/20 05:41, Alexander Duyck wrote:
>>> From: Alexander Duyck <alexander.h.duyck@linux.intel.com>
>>>
>>> According to the documentation in memory.h a ROM memory region will be
>>> backed by RAM for reads, but is supposed to go through a callback for
>>> writes. Currently we were not checking for the existence of the rom_device
>>> flag when determining if we could perform a direct write or not.
>>>
>>> To correct that add a check to memory_region_is_direct so that if the
>>> memory region has the rom_device flag set we will return false for all
>>> checks where is_write is set.
>>>
>>> Signed-off-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
>>> ---
>>> include/exec/memory.h | 4 ++--
>>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/include/exec/memory.h b/include/exec/memory.h
>>> index 1614d9a02c0c..e000bd2f97b2 100644
>>> --- a/include/exec/memory.h
>>> +++ b/include/exec/memory.h
>>> @@ -2351,8 +2351,8 @@ void address_space_write_cached_slow(MemoryRegionCache *cache,
>>> static inline bool memory_access_is_direct(MemoryRegion *mr, bool is_write)
>>> {
>>> if (is_write) {
>>> - return memory_region_is_ram(mr) &&
>>> - !mr->readonly && !memory_region_is_ram_device(mr);
>>> + return memory_region_is_ram(mr) && !mr->readonly &&
>>> + !mr->rom_device && !memory_region_is_ram_device(mr);
>>> } else {
>>> return (memory_region_is_ram(mr) && !memory_region_is_ram_device(mr)) ||
>>> memory_region_is_romd(mr);
>>>
>>
>> Good catch. I queued this up for 5.0.
>>
>> Thanks,
>>
>> Paolo
>
> Thanks Paolo,
>
> It looks like you only pulled this patch correct?
>
> If so, David & Michael, do I need to resubmit the first 3 in this
> series or can those be pulled separately?
QEMU is currently in hard freeze. I'll have a final look over the
patches. If nothing jumps at me (and nothing changed upstream in the
meantime), Michael will queue them without a resend.
Thanks!
--
Thanks,
David / dhildenb
---------------------------------------------------------------------
To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
^ permalink raw reply [flat|nested] 22+ messages in thread