* [PATCH v2 1/4] softmmu/physmem: Warn with ram_block_discard_range() on MAP_PRIVATE file mapping
2023-07-06 7:56 [PATCH v2 0/4] virtio-mem: Support "x-ignore-shared" migration David Hildenbrand
@ 2023-07-06 7:56 ` David Hildenbrand
2023-07-06 8:10 ` Juan Quintela
` (2 more replies)
2023-07-06 7:56 ` [PATCH v2 2/4] virtio-mem: Skip most of virtio_mem_unplug_all() without plugged memory David Hildenbrand
` (3 subsequent siblings)
4 siblings, 3 replies; 27+ messages in thread
From: David Hildenbrand @ 2023-07-06 7:56 UTC (permalink / raw)
To: qemu-devel
Cc: David Hildenbrand, Michael S. Tsirkin, Juan Quintela, Peter Xu,
Leonardo Bras, Paolo Bonzini, Philippe Mathieu-Daudé,
Peng Tao, Mario Casquero
ram_block_discard_range() cannot possibly do the right thing in
MAP_PRIVATE file mappings in the general case.
To achieve the documented semantics, we also have to punch a hole into
the file, possibly messing with other MAP_PRIVATE/MAP_SHARED mappings
of such a file.
For example, using VM templating -- see commit b17fbbe55cba ("migration:
allow private destination ram with x-ignore-shared") -- in combination with
any mechanism that relies on discarding of RAM is problematic. This
includes:
* Postcopy live migration
* virtio-balloon inflation/deflation or free-page-reporting
* virtio-mem
So at least warn that something possibly dangerous is going on
when using ram_block_discard_range() in these cases.
Acked-by: Peter Xu <peterx@redhat.com>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
softmmu/physmem.c | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index bda475a719..4ee157bda4 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -3456,6 +3456,24 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length)
* so a userfault will trigger.
*/
#ifdef CONFIG_FALLOCATE_PUNCH_HOLE
+ /*
+ * We'll discard data from the actual file, even though we only
+ * have a MAP_PRIVATE mapping, possibly messing with other
+ * MAP_PRIVATE/MAP_SHARED mappings. There is no easy way to
+ * change that behavior whithout violating the promised
+ * semantics of ram_block_discard_range().
+ *
+ * Only warn, because it work as long as nobody else uses that
+ * file.
+ */
+ if (!qemu_ram_is_shared(rb)) {
+ warn_report_once("ram_block_discard_range: Discarding RAM"
+ " in private file mappings is possibly"
+ " dangerous, because it will modify the"
+ " underlying file and will affect other"
+ " users of the file");
+ }
+
ret = fallocate(rb->fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
start, length);
if (ret) {
--
2.41.0
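The hazard described in the commit message can be reproduced with a small
standalone program. The following sketch is illustrative only -- it is not
QEMU code, and it uses memfd_create() purely so the example is
self-contained; a MAP_PRIVATE mapping of any file on a hole-punch-capable
filesystem (ext4, tmpfs, ...) behaves the same way:

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 1 if punching a hole through one user's fd is observed by a
 * second, independent MAP_PRIVATE mapping of the same file. */
static int punch_affects_other_users(void)
{
    long page = sysconf(_SC_PAGESIZE);
    int fd = memfd_create("demo", 0);

    if (fd < 0 || ftruncate(fd, page) != 0) {
        return -1;
    }

    /* First user fills the file via a MAP_SHARED mapping. */
    char *shared = mmap(NULL, page, PROT_READ | PROT_WRITE,
                        MAP_SHARED, fd, 0);
    if (shared == MAP_FAILED) {
        return -1;
    }
    memset(shared, 'A', page);

    /* Second user maps the same file MAP_PRIVATE, read-only, and
     * observes the pagecache content. */
    char *priv = mmap(NULL, page, PROT_READ, MAP_PRIVATE, fd, 0);
    if (priv == MAP_FAILED || priv[0] != 'A') {
        return -1;
    }

    /* "Discarding" by punching a hole into the file ... */
    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                  0, page) != 0) {
        return -1;
    }

    /* ... zaps the data under every mapping of the file. */
    return priv[0] == 0 && shared[0] == 0;
}
```

Punching the hole removes the page from the file's page cache, so every
other user of the file -- including private mappings that have not yet
copied the page on write -- subsequently reads zeroes.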
* Re: [PATCH v2 1/4] softmmu/physmem: Warn with ram_block_discard_range() on MAP_PRIVATE file mapping
2023-07-06 7:56 ` [PATCH v2 1/4] softmmu/physmem: Warn with ram_block_discard_range() on MAP_PRIVATE file mapping David Hildenbrand
@ 2023-07-06 8:10 ` Juan Quintela
2023-07-06 8:31 ` David Hildenbrand
2023-07-06 8:49 ` David Hildenbrand
2023-10-18 3:02 ` Xiaoyao Li
2 siblings, 1 reply; 27+ messages in thread
From: Juan Quintela @ 2023-07-06 8:10 UTC (permalink / raw)
To: David Hildenbrand
Cc: qemu-devel, Michael S. Tsirkin, Peter Xu, Leonardo Bras,
Paolo Bonzini, Philippe Mathieu-Daudé, Peng Tao,
Mario Casquero
David Hildenbrand <david@redhat.com> wrote:
> ram_block_discard_range() cannot possibly do the right thing in
> MAP_PRIVATE file mappings in the general case.
>
> To achieve the documented semantics, we also have to punch a hole into
> the file, possibly messing with other MAP_PRIVATE/MAP_SHARED mappings
> of such a file.
>
> For example, using VM templating -- see commit b17fbbe55cba ("migration:
> allow private destination ram with x-ignore-shared") -- in combination with
> any mechanism that relies on discarding of RAM is problematic. This
> includes:
> * Postcopy live migration
> * virtio-balloon inflation/deflation or free-page-reporting
> * virtio-mem
>
> So at least warn that there is something possibly dangerous is going on
> when using ram_block_discard_range() in these cases.
>
> Acked-by: Peter Xu <peterx@redhat.com>
> Tested-by: Mario Casquero <mcasquer@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
(at least we give a warning)
But I wonder if we can do better and test that:
* Postcopy live migration
We can check if we are on postcopy, or put a marker so we know that
postcopy can have problems when started.
* virtio-balloon inflation/deflation or free-page-reporting
We can check if we have ever used virtio-balloon.
* virtio-mem
We can check if we have used virtio-mem.
I am just wondering if that is even possible?
Thanks, Juan.
* Re: [PATCH v2 1/4] softmmu/physmem: Warn with ram_block_discard_range() on MAP_PRIVATE file mapping
2023-07-06 8:10 ` Juan Quintela
@ 2023-07-06 8:31 ` David Hildenbrand
2023-07-06 13:20 ` Juan Quintela
0 siblings, 1 reply; 27+ messages in thread
From: David Hildenbrand @ 2023-07-06 8:31 UTC (permalink / raw)
To: quintela
Cc: qemu-devel, Michael S. Tsirkin, Peter Xu, Leonardo Bras,
Paolo Bonzini, Philippe Mathieu-Daudé, Peng Tao,
Mario Casquero
On 06.07.23 10:10, Juan Quintela wrote:
> David Hildenbrand <david@redhat.com> wrote:
>> ram_block_discard_range() cannot possibly do the right thing in
>> MAP_PRIVATE file mappings in the general case.
>>
>> To achieve the documented semantics, we also have to punch a hole into
>> the file, possibly messing with other MAP_PRIVATE/MAP_SHARED mappings
>> of such a file.
>>
>> For example, using VM templating -- see commit b17fbbe55cba ("migration:
>> allow private destination ram with x-ignore-shared") -- in combination with
>> any mechanism that relies on discarding of RAM is problematic. This
>> includes:
>> * Postcopy live migration
>> * virtio-balloon inflation/deflation or free-page-reporting
>> * virtio-mem
>>
>> So at least warn that there is something possibly dangerous is going on
>> when using ram_block_discard_range() in these cases.
>>
>> Acked-by: Peter Xu <peterx@redhat.com>
>> Tested-by: Mario Casquero <mcasquer@redhat.com>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>
> Reviewed-by: Juan Quintela <quintela@redhat.com>
>
> (at least we give a warning)
>
> But I wonder if we can do better and test that:
> * Postcopy live migration
>
> We can check if we are on postcopy, or put a marker so we know that
> postcopy can have problems when started.
>
> * virtio-balloon inflation/deflation or free-page-reporting
>
> We can check if we have ever used virtio-balloon.
>
> * virtio-mem
>
> We can check if we have used virtio-men
>
>
> I am just wondering if that is even possible?
Now we warn when any of these features actually tries discarding RAM
(calling ram_block_discard_range()).
As these features trigger discarding of RAM, once we reach this point we
know that they are actually getting used (in contrast to, e.g., libvirt
attaching a virtio-balloon device to each VM by default without anybody
ever using it).
The alternative would be checking the RAM for compatibility when
configuring each feature. I decided to warn at a central place for now,
which covers all users.
Thanks!
--
Cheers,
David / dhildenb
* Re: [PATCH v2 1/4] softmmu/physmem: Warn with ram_block_discard_range() on MAP_PRIVATE file mapping
2023-07-06 8:31 ` David Hildenbrand
@ 2023-07-06 13:20 ` Juan Quintela
2023-07-06 13:23 ` David Hildenbrand
0 siblings, 1 reply; 27+ messages in thread
From: Juan Quintela @ 2023-07-06 13:20 UTC (permalink / raw)
To: David Hildenbrand
Cc: qemu-devel, Michael S. Tsirkin, Peter Xu, Leonardo Bras,
Paolo Bonzini, Philippe Mathieu-Daudé, Peng Tao,
Mario Casquero
David Hildenbrand <david@redhat.com> wrote:
> On 06.07.23 10:10, Juan Quintela wrote:
>> David Hildenbrand <david@redhat.com> wrote:
>>> ram_block_discard_range() cannot possibly do the right thing in
>>> MAP_PRIVATE file mappings in the general case.
>>>
>>> To achieve the documented semantics, we also have to punch a hole into
>>> the file, possibly messing with other MAP_PRIVATE/MAP_SHARED mappings
>>> of such a file.
>>>
>>> For example, using VM templating -- see commit b17fbbe55cba ("migration:
>>> allow private destination ram with x-ignore-shared") -- in combination with
>>> any mechanism that relies on discarding of RAM is problematic. This
>>> includes:
>>> * Postcopy live migration
>>> * virtio-balloon inflation/deflation or free-page-reporting
>>> * virtio-mem
>>>
>>> So at least warn that there is something possibly dangerous is going on
>>> when using ram_block_discard_range() in these cases.
>>>
>>> Acked-by: Peter Xu <peterx@redhat.com>
>>> Tested-by: Mario Casquero <mcasquer@redhat.com>
>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> Reviewed-by: Juan Quintela <quintela@redhat.com>
>> (at least we give a warning)
>> But I wonder if we can do better and test that:
>> * Postcopy live migration
>> We can check if we are on postcopy, or put a marker so we know
>> that
>> postcopy can have problems when started.
>> * virtio-balloon inflation/deflation or free-page-reporting
>> We can check if we have ever used virtio-balloon.
>> * virtio-mem
>> We can check if we have used virtio-men
>> I am just wondering if that is even possible?
>
> Now we warn when any of these features actually tries discarding RAM
> (calling ram_block_discard_range()).
>
> As these features trigger discarding of RAM, once we reach this point
> we know that they are getting used. (in comparison to default libvirt
> attaching a virtio-balloon device without anybody ever using it)
>
>
> The alternative would be checking the RAM for compatibility when
> configuring each features. I decided to warn at a central place for
> now, which covers any users.
I think this is the right thing to do.
Patient: It hurts when I do this.
Doctor: Don't do that.
O:-)
Now more seriously, at this point we are very late to do anything
sensible. I think the normal thing is to flag incompatible things
already while we are configuring them.
We have been following that approach with migration for some time now,
and everybody is happier.
Later, Juan.
* Re: [PATCH v2 1/4] softmmu/physmem: Warn with ram_block_discard_range() on MAP_PRIVATE file mapping
2023-07-06 13:20 ` Juan Quintela
@ 2023-07-06 13:23 ` David Hildenbrand
0 siblings, 0 replies; 27+ messages in thread
From: David Hildenbrand @ 2023-07-06 13:23 UTC (permalink / raw)
To: quintela
Cc: qemu-devel, Michael S. Tsirkin, Peter Xu, Leonardo Bras,
Paolo Bonzini, Philippe Mathieu-Daudé, Peng Tao,
Mario Casquero
On 06.07.23 15:20, Juan Quintela wrote:
> David Hildenbrand <david@redhat.com> wrote:
>> On 06.07.23 10:10, Juan Quintela wrote:
>>> David Hildenbrand <david@redhat.com> wrote:
>>>> ram_block_discard_range() cannot possibly do the right thing in
>>>> MAP_PRIVATE file mappings in the general case.
>>>>
>>>> To achieve the documented semantics, we also have to punch a hole into
>>>> the file, possibly messing with other MAP_PRIVATE/MAP_SHARED mappings
>>>> of such a file.
>>>>
>>>> For example, using VM templating -- see commit b17fbbe55cba ("migration:
>>>> allow private destination ram with x-ignore-shared") -- in combination with
>>>> any mechanism that relies on discarding of RAM is problematic. This
>>>> includes:
>>>> * Postcopy live migration
>>>> * virtio-balloon inflation/deflation or free-page-reporting
>>>> * virtio-mem
>>>>
>>>> So at least warn that there is something possibly dangerous is going on
>>>> when using ram_block_discard_range() in these cases.
>>>>
>>>> Acked-by: Peter Xu <peterx@redhat.com>
>>>> Tested-by: Mario Casquero <mcasquer@redhat.com>
>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>> Reviewed-by: Juan Quintela <quintela@redhat.com>
>>> (at least we give a warning)
>>> But I wonder if we can do better and test that:
>>> * Postcopy live migration
>>> We can check if we are on postcopy, or put a marker so we know
>>> that
>>> postcopy can have problems when started.
>>> * virtio-balloon inflation/deflation or free-page-reporting
>>> We can check if we have ever used virtio-balloon.
>>> * virtio-mem
>>> We can check if we have used virtio-men
>>> I am just wondering if that is even possible?
>>
>> Now we warn when any of these features actually tries discarding RAM
>> (calling ram_block_discard_range()).
>>
>> As these features trigger discarding of RAM, once we reach this point
>> we know that they are getting used. (in comparison to default libvirt
>> attaching a virtio-balloon device without anybody ever using it)
>>
>>
>> The alternative would be checking the RAM for compatibility when
>> configuring each features. I decided to warn at a central place for
>> now, which covers any users.
>
> I think this is the right thing to do.
>
> Patient: It hurts when I do this.
> Doctor: Don't do that.
>
> O:-)
:)
>
> Now more seriously, at this point we are very late to do anything
> sensible. I think that the normal thing when we are configuring
> incompatible things just flag it.
>
> We are following that approach with migration for some time now, and
> everybody is happier.
For the time being I'll move forward with this patch.
I agree that warning early is nicer (but warning early for, e.g.,
virtio-balloon doesn't make too much sense: libvirt blindly adds it to
each VM just to query guest statistics and never inflates the balloon).
In any case we'll want to warn here as well, because we know that new
callers will easily miss that limitation / those checks.
--
Cheers,
David / dhildenb
* Re: [PATCH v2 1/4] softmmu/physmem: Warn with ram_block_discard_range() on MAP_PRIVATE file mapping
2023-07-06 7:56 ` [PATCH v2 1/4] softmmu/physmem: Warn with ram_block_discard_range() on MAP_PRIVATE file mapping David Hildenbrand
2023-07-06 8:10 ` Juan Quintela
@ 2023-07-06 8:49 ` David Hildenbrand
2023-10-18 3:02 ` Xiaoyao Li
2 siblings, 0 replies; 27+ messages in thread
From: David Hildenbrand @ 2023-07-06 8:49 UTC (permalink / raw)
To: qemu-devel
Cc: Michael S. Tsirkin, Juan Quintela, Peter Xu, Leonardo Bras,
Paolo Bonzini, Philippe Mathieu-Daudé, Peng Tao,
Mario Casquero
> #ifdef CONFIG_FALLOCATE_PUNCH_HOLE
> + /*
> + * We'll discard data from the actual file, even though we only
> + * have a MAP_PRIVATE mapping, possibly messing with other
> + * MAP_PRIVATE/MAP_SHARED mappings. There is no easy way to
> + * change that behavior whithout violating the promised
> + * semantics of ram_block_discard_range().
> + *
> + * Only warn, because it work as long as nobody else uses that
>
I'll fixup
s/work/works/
--
Cheers,
David / dhildenb
* Re: [PATCH v2 1/4] softmmu/physmem: Warn with ram_block_discard_range() on MAP_PRIVATE file mapping
2023-07-06 7:56 ` [PATCH v2 1/4] softmmu/physmem: Warn with ram_block_discard_range() on MAP_PRIVATE file mapping David Hildenbrand
2023-07-06 8:10 ` Juan Quintela
2023-07-06 8:49 ` David Hildenbrand
@ 2023-10-18 3:02 ` Xiaoyao Li
2023-10-18 7:42 ` David Hildenbrand
2 siblings, 1 reply; 27+ messages in thread
From: Xiaoyao Li @ 2023-10-18 3:02 UTC (permalink / raw)
To: David Hildenbrand, qemu-devel
Cc: Michael S. Tsirkin, Juan Quintela, Peter Xu, Leonardo Bras,
Paolo Bonzini, Philippe Mathieu-Daudé, Peng Tao,
Mario Casquero
David,
On 7/6/2023 3:56 PM, David Hildenbrand wrote:
> ram_block_discard_range() cannot possibly do the right thing in
> MAP_PRIVATE file mappings in the general case.
>
> To achieve the documented semantics, we also have to punch a hole into
> the file, possibly messing with other MAP_PRIVATE/MAP_SHARED mappings
> of such a file.
>
> For example, using VM templating -- see commit b17fbbe55cba ("migration:
> allow private destination ram with x-ignore-shared") -- in combination with
> any mechanism that relies on discarding of RAM is problematic. This
> includes:
> * Postcopy live migration
> * virtio-balloon inflation/deflation or free-page-reporting
> * virtio-mem
>
> So at least warn that there is something possibly dangerous is going on
> when using ram_block_discard_range() in these cases.
>
> Acked-by: Peter Xu <peterx@redhat.com>
> Tested-by: Mario Casquero <mcasquer@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
> softmmu/physmem.c | 18 ++++++++++++++++++
> 1 file changed, 18 insertions(+)
>
> diff --git a/softmmu/physmem.c b/softmmu/physmem.c
> index bda475a719..4ee157bda4 100644
> --- a/softmmu/physmem.c
> +++ b/softmmu/physmem.c
> @@ -3456,6 +3456,24 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length)
> * so a userfault will trigger.
> */
> #ifdef CONFIG_FALLOCATE_PUNCH_HOLE
> + /*
> + * We'll discard data from the actual file, even though we only
> + * have a MAP_PRIVATE mapping, possibly messing with other
> + * MAP_PRIVATE/MAP_SHARED mappings. There is no easy way to
> + * change that behavior whithout violating the promised
> + * semantics of ram_block_discard_range().
> + *
> + * Only warn, because it work as long as nobody else uses that
> + * file.
> + */
> + if (!qemu_ram_is_shared(rb)) {
> + warn_report_once("ram_block_discard_range: Discarding RAM"
> + " in private file mappings is possibly"
> + " dangerous, because it will modify the"
> + " underlying file and will affect other"
> + " users of the file");
> + }
> +
TDX has two types of memory backends for each RAM block: shared memory
and private memory. Private memory is serviced by guest memfd, and
shared memory can also be backed by an fd.
At any time, only one of the two needs to be valid, which means the
other one can be discarded. We do implement the memory discard when TDX
converts the memory[1]. It will trigger this warning 100% of the time,
because by default the guest memfd is not mapped as shared (MAP_SHARED).
Simply removing the warning would defeat the purpose of this patch. The
other option is to skip the warning for the TDX case, which looks very
hacky. Do you have any idea?
[1]
https://lore.kernel.org/all/20230914035117.3285885-18-xiaoyao.li@intel.com/
> ret = fallocate(rb->fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
> start, length);
> if (ret) {
* Re: [PATCH v2 1/4] softmmu/physmem: Warn with ram_block_discard_range() on MAP_PRIVATE file mapping
2023-10-18 3:02 ` Xiaoyao Li
@ 2023-10-18 7:42 ` David Hildenbrand
2023-10-18 9:02 ` Xiaoyao Li
0 siblings, 1 reply; 27+ messages in thread
From: David Hildenbrand @ 2023-10-18 7:42 UTC (permalink / raw)
To: Xiaoyao Li, qemu-devel
Cc: Michael S. Tsirkin, Juan Quintela, Peter Xu, Leonardo Bras,
Paolo Bonzini, Philippe Mathieu-Daudé, Peng Tao,
Mario Casquero
On 18.10.23 05:02, Xiaoyao Li wrote:
> David,
>
> On 7/6/2023 3:56 PM, David Hildenbrand wrote:
>> ram_block_discard_range() cannot possibly do the right thing in
>> MAP_PRIVATE file mappings in the general case.
>>
>> To achieve the documented semantics, we also have to punch a hole into
>> the file, possibly messing with other MAP_PRIVATE/MAP_SHARED mappings
>> of such a file.
>>
>> For example, using VM templating -- see commit b17fbbe55cba ("migration:
>> allow private destination ram with x-ignore-shared") -- in combination with
>> any mechanism that relies on discarding of RAM is problematic. This
>> includes:
>> * Postcopy live migration
>> * virtio-balloon inflation/deflation or free-page-reporting
>> * virtio-mem
>>
>> So at least warn that there is something possibly dangerous is going on
>> when using ram_block_discard_range() in these cases.
>>
>> Acked-by: Peter Xu <peterx@redhat.com>
>> Tested-by: Mario Casquero <mcasquer@redhat.com>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>> softmmu/physmem.c | 18 ++++++++++++++++++
>> 1 file changed, 18 insertions(+)
>>
>> diff --git a/softmmu/physmem.c b/softmmu/physmem.c
>> index bda475a719..4ee157bda4 100644
>> --- a/softmmu/physmem.c
>> +++ b/softmmu/physmem.c
>> @@ -3456,6 +3456,24 @@ int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length)
>> * so a userfault will trigger.
>> */
>> #ifdef CONFIG_FALLOCATE_PUNCH_HOLE
>> + /*
>> + * We'll discard data from the actual file, even though we only
>> + * have a MAP_PRIVATE mapping, possibly messing with other
>> + * MAP_PRIVATE/MAP_SHARED mappings. There is no easy way to
>> + * change that behavior whithout violating the promised
>> + * semantics of ram_block_discard_range().
>> + *
>> + * Only warn, because it work as long as nobody else uses that
>> + * file.
>> + */
>> + if (!qemu_ram_is_shared(rb)) {
>> + warn_report_once("ram_block_discard_range: Discarding RAM"
>> + " in private file mappings is possibly"
>> + " dangerous, because it will modify the"
>> + " underlying file and will affect other"
>> + " users of the file");
>> + }
>> +
>
> TDX has two types of memory backend for each RAM, shared memory and
> private memory. Private memory is serviced by guest memfd and shared
> memory can also be backed with a fd.
>
> At any time, only one type needs to be valid, which means the opposite
> can be discarded. We do implement the memory discard when TDX converts
> the memory[1]. It will trigger this warning 100% because by default the
> guest memfd is not mapped as shared (MAP_SHARED).
If MAP_PRIVATE is not involved and you are taking the pages directly out
of the memfd, you should mark that thing as shared. Anonymous memory is
never involved.
"Private memory" is only private from the guest POV, not from a mmap()
point of view.
Two different concepts of "private".
>
> Simply remove the warning will fail the purpose of this patch. The other
> option is to skip the warning for TDX case, which looks vary hacky. Do
> you have any idea?
For TDX, all memory backends / RAMBlocks should be marked as "shared",
and you should fail if that is not provided by the user.
--
Cheers,
David / dhildenb
* Re: [PATCH v2 1/4] softmmu/physmem: Warn with ram_block_discard_range() on MAP_PRIVATE file mapping
2023-10-18 7:42 ` David Hildenbrand
@ 2023-10-18 9:02 ` Xiaoyao Li
2023-10-18 9:26 ` David Hildenbrand
0 siblings, 1 reply; 27+ messages in thread
From: Xiaoyao Li @ 2023-10-18 9:02 UTC (permalink / raw)
To: David Hildenbrand, qemu-devel
Cc: Michael S. Tsirkin, Juan Quintela, Peter Xu, Leonardo Bras,
Paolo Bonzini, Philippe Mathieu-Daudé, Peng Tao,
Mario Casquero
On 10/18/2023 3:42 PM, David Hildenbrand wrote:
> On 18.10.23 05:02, Xiaoyao Li wrote:
>> David,
>>
>> On 7/6/2023 3:56 PM, David Hildenbrand wrote:
>>> ram_block_discard_range() cannot possibly do the right thing in
>>> MAP_PRIVATE file mappings in the general case.
>>>
>>> To achieve the documented semantics, we also have to punch a hole into
>>> the file, possibly messing with other MAP_PRIVATE/MAP_SHARED mappings
>>> of such a file.
>>>
>>> For example, using VM templating -- see commit b17fbbe55cba ("migration:
>>> allow private destination ram with x-ignore-shared") -- in
>>> combination with
>>> any mechanism that relies on discarding of RAM is problematic. This
>>> includes:
>>> * Postcopy live migration
>>> * virtio-balloon inflation/deflation or free-page-reporting
>>> * virtio-mem
>>>
>>> So at least warn that there is something possibly dangerous is going on
>>> when using ram_block_discard_range() in these cases.
>>>
>>> Acked-by: Peter Xu <peterx@redhat.com>
>>> Tested-by: Mario Casquero <mcasquer@redhat.com>
>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>> ---
>>> softmmu/physmem.c | 18 ++++++++++++++++++
>>> 1 file changed, 18 insertions(+)
>>>
>>> diff --git a/softmmu/physmem.c b/softmmu/physmem.c
>>> index bda475a719..4ee157bda4 100644
>>> --- a/softmmu/physmem.c
>>> +++ b/softmmu/physmem.c
>>> @@ -3456,6 +3456,24 @@ int ram_block_discard_range(RAMBlock *rb,
>>> uint64_t start, size_t length)
>>> * so a userfault will trigger.
>>> */
>>> #ifdef CONFIG_FALLOCATE_PUNCH_HOLE
>>> + /*
>>> + * We'll discard data from the actual file, even though
>>> we only
>>> + * have a MAP_PRIVATE mapping, possibly messing with other
>>> + * MAP_PRIVATE/MAP_SHARED mappings. There is no easy way to
>>> + * change that behavior whithout violating the promised
>>> + * semantics of ram_block_discard_range().
>>> + *
>>> + * Only warn, because it work as long as nobody else
>>> uses that
>>> + * file.
>>> + */
>>> + if (!qemu_ram_is_shared(rb)) {
>>> + warn_report_once("ram_block_discard_range:
>>> Discarding RAM"
>>> + " in private file mappings is
>>> possibly"
>>> + " dangerous, because it will modify
>>> the"
>>> + " underlying file and will affect
>>> other"
>>> + " users of the file");
>>> + }
>>> +
>>
>> TDX has two types of memory backend for each RAM, shared memory and
>> private memory. Private memory is serviced by guest memfd and shared
>> memory can also be backed with a fd.
>>
>> At any time, only one type needs to be valid, which means the opposite
>> can be discarded. We do implement the memory discard when TDX converts
>> the memory[1]. It will trigger this warning 100% because by default the
>> guest memfd is not mapped as shared (MAP_SHARED).
>
> If MAP_PRIVATE is not involved and you are taking the pages directly out
> of the memfd, you should mark that thing as shared.
Is it a general rule of Linux? Or just a rule of QEMU's memory discard?
> Anonymous memory is never involved.
Could you please elaborate more on this? What do you want to express
here regarding anonymous memory? (Sorry, I'm a newbie with the mmap stuff)
>
> "Private memory" is only private from the guest POV, not from a mmap()
> point of view.
>
> Two different concepts of "private".
>
>>
>> Simply remove the warning will fail the purpose of this patch. The other
>> option is to skip the warning for TDX case, which looks vary hacky. Do
>> you have any idea?
>
> For TDX, all memory backends / RAMBlocks should be marked as "shared",
> and you should fail if that is not provided by the user.
As I asked above, I want to understand the logic clearly. Is mapping as
shared a must for supporting memory discard? I.e., if we want to support
memory discard after a memory type change, must the memory be mapped
with MAP_SHARED?
* Re: [PATCH v2 1/4] softmmu/physmem: Warn with ram_block_discard_range() on MAP_PRIVATE file mapping
2023-10-18 9:02 ` Xiaoyao Li
@ 2023-10-18 9:26 ` David Hildenbrand
2023-10-18 16:27 ` Xiaoyao Li
0 siblings, 1 reply; 27+ messages in thread
From: David Hildenbrand @ 2023-10-18 9:26 UTC (permalink / raw)
To: Xiaoyao Li, qemu-devel
Cc: Michael S. Tsirkin, Juan Quintela, Peter Xu, Leonardo Bras,
Paolo Bonzini, Philippe Mathieu-Daudé, Peng Tao,
Mario Casquero
On 18.10.23 11:02, Xiaoyao Li wrote:
> On 10/18/2023 3:42 PM, David Hildenbrand wrote:
>> On 18.10.23 05:02, Xiaoyao Li wrote:
>>> David,
>>>
>>> On 7/6/2023 3:56 PM, David Hildenbrand wrote:
>>>> ram_block_discard_range() cannot possibly do the right thing in
>>>> MAP_PRIVATE file mappings in the general case.
>>>>
>>>> To achieve the documented semantics, we also have to punch a hole into
>>>> the file, possibly messing with other MAP_PRIVATE/MAP_SHARED mappings
>>>> of such a file.
>>>>
>>>> For example, using VM templating -- see commit b17fbbe55cba ("migration:
>>>> allow private destination ram with x-ignore-shared") -- in
>>>> combination with
>>>> any mechanism that relies on discarding of RAM is problematic. This
>>>> includes:
>>>> * Postcopy live migration
>>>> * virtio-balloon inflation/deflation or free-page-reporting
>>>> * virtio-mem
>>>>
>>>> So at least warn that there is something possibly dangerous is going on
>>>> when using ram_block_discard_range() in these cases.
>>>>
>>>> Acked-by: Peter Xu <peterx@redhat.com>
>>>> Tested-by: Mario Casquero <mcasquer@redhat.com>
>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>>> ---
>>>> softmmu/physmem.c | 18 ++++++++++++++++++
>>>> 1 file changed, 18 insertions(+)
>>>>
>>>> diff --git a/softmmu/physmem.c b/softmmu/physmem.c
>>>> index bda475a719..4ee157bda4 100644
>>>> --- a/softmmu/physmem.c
>>>> +++ b/softmmu/physmem.c
>>>> @@ -3456,6 +3456,24 @@ int ram_block_discard_range(RAMBlock *rb,
>>>> uint64_t start, size_t length)
>>>> * so a userfault will trigger.
>>>> */
>>>> #ifdef CONFIG_FALLOCATE_PUNCH_HOLE
>>>> + /*
>>>> + * We'll discard data from the actual file, even though
>>>> we only
>>>> + * have a MAP_PRIVATE mapping, possibly messing with other
>>>> + * MAP_PRIVATE/MAP_SHARED mappings. There is no easy way to
>>>> + * change that behavior whithout violating the promised
>>>> + * semantics of ram_block_discard_range().
>>>> + *
>>>> + * Only warn, because it work as long as nobody else
>>>> uses that
>>>> + * file.
>>>> + */
>>>> + if (!qemu_ram_is_shared(rb)) {
>>>> + warn_report_once("ram_block_discard_range:
>>>> Discarding RAM"
>>>> + " in private file mappings is
>>>> possibly"
>>>> + " dangerous, because it will modify
>>>> the"
>>>> + " underlying file and will affect
>>>> other"
>>>> + " users of the file");
>>>> + }
>>>> +
>>>
>>> TDX has two types of memory backend for each RAM, shared memory and
>>> private memory. Private memory is serviced by guest memfd and shared
>>> memory can also be backed with a fd.
>>>
>>> At any time, only one type needs to be valid, which means the opposite
>>> can be discarded. We do implement the memory discard when TDX converts
>>> the memory[1]. It will trigger this warning 100% because by default the
>>> guest memfd is not mapped as shared (MAP_SHARED).
>>
>> If MAP_PRIVATE is not involved and you are taking the pages directly out
>> of the memfd, you should mark that thing as shared.
>
> Is it the general rule of Linux? Of just the rule of QEMU memory discard?
>
MAP_SHARED vs. MAP_PRIVATE is a common UNIX principle, and that's what
this flag and the check are about.
From mmap(2)
MAP_SHARED: Share this mapping. Updates to the mapping are visible to
other processes mapping the same region, and (in the case of file-backed
mappings) are carried through to the underlying file.
MAP_PRIVATE: Create a private copy-on-write mapping. Updates to the
mapping are not visible to other processes mapping the same file, and
are not carried through to the underlying file. It is unspecified
whether changes made to the file after the mmap() call are visible in
the mapped region.
For your purpose (no mmap() at all), we behave like MAP_SHARED -- as if
nothing special is done. No copy-on-write, no anonymous memory.
>> Anonymous memory is never involved.
>
> Could you please elaborate more on this? What do you want to express
> here regrading anonymous memory? (Sorry that I'm newbie for mmap stuff)
Anonymous memory is memory that is private to a specific process, and
(see MAP_PRIVATE) modifications remain private to the process and are
not reflected to the file.
If you have a MAP_PRIVATE file mapping and write to a virtual memory
location, you'll get a process-private copy of the underlying pagecache
page. That's what we call anonymous memory, because it does not belong
to a specific file. fallocate(PUNCH_HOLE) would not free up that
anonymous memory.
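That copy-on-write behavior can be sketched with a short, hypothetical
demo (not QEMU code; memfd_create() merely stands in for an arbitrary
file backing a MAP_PRIVATE mapping):

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 1 if a private CoW copy survives a hole punch and is only
 * freed by MADV_DONTNEED. */
static int cow_copy_needs_madvise(void)
{
    long page = sysconf(_SC_PAGESIZE);
    int fd = memfd_create("demo", 0);

    if (fd < 0 || ftruncate(fd, page) != 0) {
        return -1;
    }

    char *p = mmap(NULL, page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) {
        return -1;
    }

    /* The write creates a process-private anonymous copy of the
     * pagecache page. */
    p[0] = 'X';

    /* Punching a hole frees the file's page, but not the anonymous
     * copy -- the write stays visible. */
    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                  0, page) != 0 || p[0] != 'X') {
        return -1;
    }

    /* Only MADV_DONTNEED drops the anonymous copy; the next access
     * falls back to the (now empty) file and reads zero. */
    if (madvise(p, page, MADV_DONTNEED) != 0) {
        return -1;
    }
    return p[0] == 0;
}
```

The write instantiates an anonymous page; fallocate() only affects the
file, so the mapping keeps reading 'X' until MADV_DONTNEED discards the
private copy.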
>
>>
>> "Private memory" is only private from the guest POV, not from a mmap()
>> point of view.
>>
>> Two different concepts of "private".
>>
>>>
>>> Simply remove the warning will fail the purpose of this patch. The other
>>> option is to skip the warning for TDX case, which looks vary hacky. Do
>>> you have any idea?
>>
>> For TDX, all memory backends / RAMBlocks should be marked as "shared",
>> and you should fail if that is not provided by the user.
>
> As I asked above, I want to understand the logic clearly. Is mapped as
> shared is a must to support the memory discard? i.e., if we want to
> support memory discard after memory type change, then the memory must be
> mapped with MAP_SHARED?
MAP_PRIVATE means that it's not sufficient to only fallocate(punch) the
fd to free up all memory for a virtual address, because there might be
anonymous memory in a private mapping that has to be freed up using
MADV_DONTNEED. That's why this function handles both cases differently,
and warns if something possibly nasty is being done.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v2 1/4] softmmu/physmem: Warn with ram_block_discard_range() on MAP_PRIVATE file mapping
2023-10-18 9:26 ` David Hildenbrand
@ 2023-10-18 16:27 ` Xiaoyao Li
2023-10-19 8:26 ` David Hildenbrand
0 siblings, 1 reply; 27+ messages in thread
From: Xiaoyao Li @ 2023-10-18 16:27 UTC (permalink / raw)
To: David Hildenbrand, qemu-devel
Cc: Michael S. Tsirkin, Juan Quintela, Peter Xu, Leonardo Bras,
Paolo Bonzini, Philippe Mathieu-Daudé, Peng Tao,
Mario Casquero
On 10/18/2023 5:26 PM, David Hildenbrand wrote:
> On 18.10.23 11:02, Xiaoyao Li wrote:
>> On 10/18/2023 3:42 PM, David Hildenbrand wrote:
>>> On 18.10.23 05:02, Xiaoyao Li wrote:
>>>> David,
>>>>
>>>> On 7/6/2023 3:56 PM, David Hildenbrand wrote:
>>>>> ram_block_discard_range() cannot possibly do the right thing in
>>>>> MAP_PRIVATE file mappings in the general case.
>>>>>
>>>>> To achieve the documented semantics, we also have to punch a hole into
>>>>> the file, possibly messing with other MAP_PRIVATE/MAP_SHARED mappings
>>>>> of such a file.
>>>>>
>>>>> For example, using VM templating -- see commit b17fbbe55cba
>>>>> ("migration:
>>>>> allow private destination ram with x-ignore-shared") -- in
>>>>> combination with
>>>>> any mechanism that relies on discarding of RAM is problematic. This
>>>>> includes:
>>>>> * Postcopy live migration
>>>>> * virtio-balloon inflation/deflation or free-page-reporting
>>>>> * virtio-mem
>>>>>
>>>>> So at least warn that there is something possibly dangerous is
>>>>> going on
>>>>> when using ram_block_discard_range() in these cases.
>>>>>
>>>>> Acked-by: Peter Xu <peterx@redhat.com>
>>>>> Tested-by: Mario Casquero <mcasquer@redhat.com>
>>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>>>> ---
>>>>> softmmu/physmem.c | 18 ++++++++++++++++++
>>>>> 1 file changed, 18 insertions(+)
>>>>>
>>>>> diff --git a/softmmu/physmem.c b/softmmu/physmem.c
>>>>> index bda475a719..4ee157bda4 100644
>>>>> --- a/softmmu/physmem.c
>>>>> +++ b/softmmu/physmem.c
>>>>> @@ -3456,6 +3456,24 @@ int ram_block_discard_range(RAMBlock *rb,
>>>>> uint64_t start, size_t length)
>>>>> * so a userfault will trigger.
>>>>> */
>>>>> #ifdef CONFIG_FALLOCATE_PUNCH_HOLE
>>>>> + /*
>>>>> + * We'll discard data from the actual file, even though
>>>>> we only
>>>>> + * have a MAP_PRIVATE mapping, possibly messing with
>>>>> other
>>>>> + * MAP_PRIVATE/MAP_SHARED mappings. There is no easy
>>>>> way to
>>>>> + * change that behavior whithout violating the promised
>>>>> + * semantics of ram_block_discard_range().
>>>>> + *
>>>>> + * Only warn, because it work as long as nobody else
>>>>> uses that
>>>>> + * file.
>>>>> + */
>>>>> + if (!qemu_ram_is_shared(rb)) {
>>>>> + warn_report_once("ram_block_discard_range:
>>>>> Discarding RAM"
>>>>> + " in private file mappings is
>>>>> possibly"
>>>>> + " dangerous, because it will modify
>>>>> the"
>>>>> + " underlying file and will affect
>>>>> other"
>>>>> + " users of the file");
>>>>> + }
>>>>> +
>>>>
>>>> TDX has two types of memory backend for each RAM, shared memory and
>>>> private memory. Private memory is serviced by guest memfd and shared
>>>> memory can also be backed with a fd.
>>>>
>>>> At any time, only one type needs to be valid, which means the opposite
>>>> can be discarded. We do implement the memory discard when TDX converts
>>>> the memory[1]. It will trigger this warning 100% because by default the
>>>> guest memfd is not mapped as shared (MAP_SHARED).
>>>
>>> If MAP_PRIVATE is not involved and you are taking the pages directly out
>>> of the memfd, you should mark that thing as shared.
>>
>> Is it the general rule of Linux? Of just the rule of QEMU memory discard?
>>
>
> MAP_SHARED vs. MAP_PRIVATE is a common UNIX principle, and that's what
> this flag and the check is about.
>
> From mmap(2)
>
> MAP_SHARED: Share this mapping. Updates to the mapping are visible to
> other processes mapping the same region, and (in the case of file-backed
> mappings) are carried through to the underlying file.
>
> MAP_PRIVATE: Create a private copy-on-write mapping. Updates to the
> mapping are not visible to other processes mapping the same file, and
> are not carried through to the underlying file. It is unspecified
> whether changes made to the file after the mmap() call are visible in
> the mapped region.
>
> For your purpose (no mmap() at all), we behave like MAP_SHARED -- as if
> nothing special is done. No Copy-on-write, no anonymous memory.
>
>>> Anonymous memory is never involved.
>>
>> Could you please elaborate more on this? What do you want to express
>> here regrading anonymous memory? (Sorry that I'm newbie for mmap stuff)
>
> Anonymous memory is memory that is private to a specific process, and
> (see MAP_PRIVATE) modifications remain private to the process and are
> not reflected to the file.
>
> If you have a MAP_PRIVATE file mapping and write to a virtual memory
> location, you'll get a process-private copy of the underlying pagecache
> page. that's what we call anonymous memory, because it does not belong
> to a specific file. fallocate(punch) would not free up that anonymous
> memory.
For guest memfd, it does implement kvm_gmem_fallocate as .fallocate()
callback, which calls truncate_inode_pages_range() [*].
I'm not sure if it frees up the memory. I need to learn it.
[*]
https://github.com/kvm-x86/linux/blob/911b515af3ec5f53992b9cc162cf7d3893c2fbe2/virt/kvm/guest_memfd.c#L147C73-L147C73
>>
>>>
>>> "Private memory" is only private from the guest POV, not from a mmap()
>>> point of view.
>>>
>>> Two different concepts of "private".
>>>
>>>>
>>>> Simply remove the warning will fail the purpose of this patch. The
>>>> other
>>>> option is to skip the warning for TDX case, which looks vary hacky. Do
>>>> you have any idea?
>>>
>>> For TDX, all memory backends / RAMBlocks should be marked as "shared",
>>> and you should fail if that is not provided by the user.
>>
>> As I asked above, I want to understand the logic clearly. Is mapped as
>> shared is a must to support the memory discard? i.e., if we want to
>> support memory discard after memory type change, then the memory must be
>> mapped with MAP_SHARED?
>
> MAP_PIRVATE means that it's not sufficient to only fallocate(punch) the
> fd to free up all memory for a virtual address, because there might be
> anonymous memory in a private mapping that has to be freed up using
> MADV_DONTNEED.
I can understand this. But it seems unrelated to my question: Is mapped
as shared is a must to support the memory discard?
e.g., if use below parameters to specify the RAM for a VM
-object memory-backend-memfd,id=mem0,size=2G \
-machine memory-backend=mem0
since not specifying "share" property, the ram_block doesn't have
RAM_SHARED set. If want to discard some range of this memfd, it triggers
the warning. Is this warning expected?
I know it is not a possible case for current QEMU, but it's true for
future TDX VM. TDX VM can have memory-backend-memfd as the backend for
shared memory and kvm gmem memfd for private memory. At any time, for
any memory range, only one type is valid, thus the range in opposite
memfd can be fallocated.
Here I get your message as "the ramblock needs to have RAM_SHARED flag
to allow the fallocate of the memfd". This is what I don't understand.
> That's why this functions handles both cases differently,
> and warns if something possibly nasty is being done.
>
* Re: [PATCH v2 1/4] softmmu/physmem: Warn with ram_block_discard_range() on MAP_PRIVATE file mapping
2023-10-18 16:27 ` Xiaoyao Li
@ 2023-10-19 8:26 ` David Hildenbrand
2023-10-19 9:26 ` Xiaoyao Li
0 siblings, 1 reply; 27+ messages in thread
From: David Hildenbrand @ 2023-10-19 8:26 UTC (permalink / raw)
To: Xiaoyao Li, qemu-devel
Cc: Michael S. Tsirkin, Juan Quintela, Peter Xu, Leonardo Bras,
Paolo Bonzini, Philippe Mathieu-Daudé, Peng Tao,
Mario Casquero
On 18.10.23 18:27, Xiaoyao Li wrote:
> On 10/18/2023 5:26 PM, David Hildenbrand wrote:
>> On 18.10.23 11:02, Xiaoyao Li wrote:
>>> On 10/18/2023 3:42 PM, David Hildenbrand wrote:
>>>> On 18.10.23 05:02, Xiaoyao Li wrote:
>>>>> David,
>>>>>
>>>>> On 7/6/2023 3:56 PM, David Hildenbrand wrote:
>>>>>> ram_block_discard_range() cannot possibly do the right thing in
>>>>>> MAP_PRIVATE file mappings in the general case.
>>>>>>
>>>>>> To achieve the documented semantics, we also have to punch a hole into
>>>>>> the file, possibly messing with other MAP_PRIVATE/MAP_SHARED mappings
>>>>>> of such a file.
>>>>>>
>>>>>> For example, using VM templating -- see commit b17fbbe55cba
>>>>>> ("migration:
>>>>>> allow private destination ram with x-ignore-shared") -- in
>>>>>> combination with
>>>>>> any mechanism that relies on discarding of RAM is problematic. This
>>>>>> includes:
>>>>>> * Postcopy live migration
>>>>>> * virtio-balloon inflation/deflation or free-page-reporting
>>>>>> * virtio-mem
>>>>>>
>>>>>> So at least warn that there is something possibly dangerous is
>>>>>> going on
>>>>>> when using ram_block_discard_range() in these cases.
>>>>>>
>>>>>> Acked-by: Peter Xu <peterx@redhat.com>
>>>>>> Tested-by: Mario Casquero <mcasquer@redhat.com>
>>>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>>>>> ---
>>>>>> softmmu/physmem.c | 18 ++++++++++++++++++
>>>>>> 1 file changed, 18 insertions(+)
>>>>>>
>>>>>> diff --git a/softmmu/physmem.c b/softmmu/physmem.c
>>>>>> index bda475a719..4ee157bda4 100644
>>>>>> --- a/softmmu/physmem.c
>>>>>> +++ b/softmmu/physmem.c
>>>>>> @@ -3456,6 +3456,24 @@ int ram_block_discard_range(RAMBlock *rb,
>>>>>> uint64_t start, size_t length)
>>>>>> * so a userfault will trigger.
>>>>>> */
>>>>>> #ifdef CONFIG_FALLOCATE_PUNCH_HOLE
>>>>>> + /*
>>>>>> + * We'll discard data from the actual file, even though
>>>>>> we only
>>>>>> + * have a MAP_PRIVATE mapping, possibly messing with
>>>>>> other
>>>>>> + * MAP_PRIVATE/MAP_SHARED mappings. There is no easy
>>>>>> way to
>>>>>> + * change that behavior whithout violating the promised
>>>>>> + * semantics of ram_block_discard_range().
>>>>>> + *
>>>>>> + * Only warn, because it work as long as nobody else
>>>>>> uses that
>>>>>> + * file.
>>>>>> + */
>>>>>> + if (!qemu_ram_is_shared(rb)) {
>>>>>> + warn_report_once("ram_block_discard_range:
>>>>>> Discarding RAM"
>>>>>> + " in private file mappings is
>>>>>> possibly"
>>>>>> + " dangerous, because it will modify
>>>>>> the"
>>>>>> + " underlying file and will affect
>>>>>> other"
>>>>>> + " users of the file");
>>>>>> + }
>>>>>> +
>>>>>
>>>>> TDX has two types of memory backend for each RAM, shared memory and
>>>>> private memory. Private memory is serviced by guest memfd and shared
>>>>> memory can also be backed with a fd.
>>>>>
>>>>> At any time, only one type needs to be valid, which means the opposite
>>>>> can be discarded. We do implement the memory discard when TDX converts
>>>>> the memory[1]. It will trigger this warning 100% because by default the
>>>>> guest memfd is not mapped as shared (MAP_SHARED).
>>>>
>>>> If MAP_PRIVATE is not involved and you are taking the pages directly out
>>>> of the memfd, you should mark that thing as shared.
>>>
>>> Is it the general rule of Linux? Of just the rule of QEMU memory discard?
>>>
>>
>> MAP_SHARED vs. MAP_PRIVATE is a common UNIX principle, and that's what
>> this flag and the check is about.
>>
>> From mmap(2)
>>
>> MAP_SHARED: Share this mapping. Updates to the mapping are visible to
>> other processes mapping the same region, and (in the case of file-backed
>> mappings) are carried through to the underlying file.
>>
>> MAP_PRIVATE: Create a private copy-on-write mapping. Updates to the
>> mapping are not visible to other processes mapping the same file, and
>> are not carried through to the underlying file. It is unspecified
>> whether changes made to the file after the mmap() call are visible in
>> the mapped region.
>>
>> For your purpose (no mmap() at all), we behave like MAP_SHARED -- as if
>> nothing special is done. No Copy-on-write, no anonymous memory.
>>
>>>> Anonymous memory is never involved.
>>>
>>> Could you please elaborate more on this? What do you want to express
>>> here regrading anonymous memory? (Sorry that I'm newbie for mmap stuff)
>>
>> Anonymous memory is memory that is private to a specific process, and
>> (see MAP_PRIVATE) modifications remain private to the process and are
>> not reflected to the file.
>>
>> If you have a MAP_PRIVATE file mapping and write to a virtual memory
>> location, you'll get a process-private copy of the underlying pagecache
>> page. that's what we call anonymous memory, because it does not belong
>> to a specific file. fallocate(punch) would not free up that anonymous
>> memory.
>
> For guest memfd, it does implement kvm_gmem_fallocate as .fallocate()
> callback, which calls truncate_inode_pages_range() [*].
>
> I'm not sure if it frees up the memory. I need to learn it.
>
> [*]
> https://github.com/kvm-x86/linux/blob/911b515af3ec5f53992b9cc162cf7d3893c2fbe2/virt/kvm/guest_memfd.c#L147C73-L147C73
>
>>>
>>>>
>>>> "Private memory" is only private from the guest POV, not from a mmap()
>>>> point of view.
>>>>
>>>> Two different concepts of "private".
>>>>
>>>>>
>>>>> Simply remove the warning will fail the purpose of this patch. The
>>>>> other
>>>>> option is to skip the warning for TDX case, which looks vary hacky. Do
>>>>> you have any idea?
>>>>
>>>> For TDX, all memory backends / RAMBlocks should be marked as "shared",
>>>> and you should fail if that is not provided by the user.
>>>
>>> As I asked above, I want to understand the logic clearly. Is mapped as
>>> shared is a must to support the memory discard? i.e., if we want to
>>> support memory discard after memory type change, then the memory must be
>>> mapped with MAP_SHARED?
>>
>> MAP_PIRVATE means that it's not sufficient to only fallocate(punch) the
>> fd to free up all memory for a virtual address, because there might be
>> anonymous memory in a private mapping that has to be freed up using
>> MADV_DONTNEED.
>
> I can understand this. But it seems unrelated to my question: Is mapped
> as shared is a must to support the memory discard?
Sorry, I don't quite get what you are asking that I haven't answered
yet. Let's talk about the issue you are seeing below.
>
> e.g., if use below parameters to specify the RAM for a VM
>
> -object memory-backend-memfd,id=mem0,size=2G \
> -machine memory-backend=mem0
>
> since not specifying "share" property, the ram_block doesn't have
> RAM_SHARED set. If want to discard some range of this memfd, it triggers
> the warning. Is this warning expected?
That should not be the case. See "memfd_backend_instance_init" where we
set share=true. In memfd_backend_memory_alloc(), we set RAM_SHARED.
We only default to share=off for memory-backend-file (well, and
memory-backend-ram).
So are you sure you get this error message in the configuration you are
describing here?
>
> I know it is not a possible case for current QEMU, but it's true for
> future TDX VM. TDX VM can have memory-backend-memfd as the backend for
> shared memory and kvm gmem memfd for private memory. At any time, for
> any memory range, only one type is valid, thus the range in opposite
> memfd can be fallocated.
Right.
>
> Here I get your message as "the ramblock needs to have RAM_SHARED flag
> to allow the fallocate of the memfd". This is what I don't understand.
The problem I am seeing is that either
(a) Someone explicitly sets share=off for some reason for
memory-backend-memfd, triggering the warning.
(b) We somehow lose RAM_SHARED in above configuration, which would be
bad and trigger the warning.
Can you make sure that (a) is not the case?
--
Cheers,
David / dhildenb
* Re: [PATCH v2 1/4] softmmu/physmem: Warn with ram_block_discard_range() on MAP_PRIVATE file mapping
2023-10-19 8:26 ` David Hildenbrand
@ 2023-10-19 9:26 ` Xiaoyao Li
2023-10-19 9:43 ` David Hildenbrand
0 siblings, 1 reply; 27+ messages in thread
From: Xiaoyao Li @ 2023-10-19 9:26 UTC (permalink / raw)
To: David Hildenbrand, qemu-devel
Cc: Michael S. Tsirkin, Juan Quintela, Peter Xu, Leonardo Bras,
Paolo Bonzini, Philippe Mathieu-Daudé, Peng Tao,
Mario Casquero
On 10/19/2023 4:26 PM, David Hildenbrand wrote:
> On 18.10.23 18:27, Xiaoyao Li wrote:
>> On 10/18/2023 5:26 PM, David Hildenbrand wrote:
>>> On 18.10.23 11:02, Xiaoyao Li wrote:
>>>> On 10/18/2023 3:42 PM, David Hildenbrand wrote:
>>>>> On 18.10.23 05:02, Xiaoyao Li wrote:
>>>>>> David,
>>>>>>
>>>>>> On 7/6/2023 3:56 PM, David Hildenbrand wrote:
>>>>>>> ram_block_discard_range() cannot possibly do the right thing in
>>>>>>> MAP_PRIVATE file mappings in the general case.
>>>>>>>
>>>>>>> To achieve the documented semantics, we also have to punch a hole
>>>>>>> into
>>>>>>> the file, possibly messing with other MAP_PRIVATE/MAP_SHARED
>>>>>>> mappings
>>>>>>> of such a file.
>>>>>>>
>>>>>>> For example, using VM templating -- see commit b17fbbe55cba
>>>>>>> ("migration:
>>>>>>> allow private destination ram with x-ignore-shared") -- in
>>>>>>> combination with
>>>>>>> any mechanism that relies on discarding of RAM is problematic. This
>>>>>>> includes:
>>>>>>> * Postcopy live migration
>>>>>>> * virtio-balloon inflation/deflation or free-page-reporting
>>>>>>> * virtio-mem
>>>>>>>
>>>>>>> So at least warn that there is something possibly dangerous is
>>>>>>> going on
>>>>>>> when using ram_block_discard_range() in these cases.
>>>>>>>
>>>>>>> Acked-by: Peter Xu <peterx@redhat.com>
>>>>>>> Tested-by: Mario Casquero <mcasquer@redhat.com>
>>>>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>>>>>> ---
>>>>>>> softmmu/physmem.c | 18 ++++++++++++++++++
>>>>>>> 1 file changed, 18 insertions(+)
>>>>>>>
>>>>>>> diff --git a/softmmu/physmem.c b/softmmu/physmem.c
>>>>>>> index bda475a719..4ee157bda4 100644
>>>>>>> --- a/softmmu/physmem.c
>>>>>>> +++ b/softmmu/physmem.c
>>>>>>> @@ -3456,6 +3456,24 @@ int ram_block_discard_range(RAMBlock *rb,
>>>>>>> uint64_t start, size_t length)
>>>>>>> * so a userfault will trigger.
>>>>>>> */
>>>>>>> #ifdef CONFIG_FALLOCATE_PUNCH_HOLE
>>>>>>> + /*
>>>>>>> + * We'll discard data from the actual file, even though
>>>>>>> we only
>>>>>>> + * have a MAP_PRIVATE mapping, possibly messing with
>>>>>>> other
>>>>>>> + * MAP_PRIVATE/MAP_SHARED mappings. There is no easy
>>>>>>> way to
>>>>>>> + * change that behavior whithout violating the promised
>>>>>>> + * semantics of ram_block_discard_range().
>>>>>>> + *
>>>>>>> + * Only warn, because it work as long as nobody else
>>>>>>> uses that
>>>>>>> + * file.
>>>>>>> + */
>>>>>>> + if (!qemu_ram_is_shared(rb)) {
>>>>>>> + warn_report_once("ram_block_discard_range:
>>>>>>> Discarding RAM"
>>>>>>> + " in private file mappings is
>>>>>>> possibly"
>>>>>>> + " dangerous, because it will
>>>>>>> modify
>>>>>>> the"
>>>>>>> + " underlying file and will affect
>>>>>>> other"
>>>>>>> + " users of the file");
>>>>>>> + }
>>>>>>> +
>>>>>>
>>>>>> TDX has two types of memory backend for each RAM, shared memory and
>>>>>> private memory. Private memory is serviced by guest memfd and shared
>>>>>> memory can also be backed with a fd.
>>>>>>
>>>>>> At any time, only one type needs to be valid, which means the
>>>>>> opposite
>>>>>> can be discarded. We do implement the memory discard when TDX
>>>>>> converts
>>>>>> the memory[1]. It will trigger this warning 100% because by
>>>>>> default the
>>>>>> guest memfd is not mapped as shared (MAP_SHARED).
>>>>>
>>>>> If MAP_PRIVATE is not involved and you are taking the pages
>>>>> directly out
>>>>> of the memfd, you should mark that thing as shared.
>>>>
>>>> Is it the general rule of Linux? Of just the rule of QEMU memory
>>>> discard?
>>>>
>>>
>>> MAP_SHARED vs. MAP_PRIVATE is a common UNIX principle, and that's what
>>> this flag and the check is about.
>>>
>>> From mmap(2)
>>>
>>> MAP_SHARED: Share this mapping. Updates to the mapping are visible to
>>> other processes mapping the same region, and (in the case of file-backed
>>> mappings) are carried through to the underlying file.
>>>
>>> MAP_PRIVATE: Create a private copy-on-write mapping. Updates to the
>>> mapping are not visible to other processes mapping the same file, and
>>> are not carried through to the underlying file. It is unspecified
>>> whether changes made to the file after the mmap() call are visible in
>>> the mapped region.
>>>
>>> For your purpose (no mmap() at all), we behave like MAP_SHARED -- as if
>>> nothing special is done. No Copy-on-write, no anonymous memory.
>>>
>>>>> Anonymous memory is never involved.
>>>>
>>>> Could you please elaborate more on this? What do you want to express
>>>> here regrading anonymous memory? (Sorry that I'm newbie for mmap stuff)
>>>
>>> Anonymous memory is memory that is private to a specific process, and
>>> (see MAP_PRIVATE) modifications remain private to the process and are
>>> not reflected to the file.
>>>
>>> If you have a MAP_PRIVATE file mapping and write to a virtual memory
>>> location, you'll get a process-private copy of the underlying pagecache
>>> page. that's what we call anonymous memory, because it does not belong
>>> to a specific file. fallocate(punch) would not free up that anonymous
>>> memory.
>>
>> For guest memfd, it does implement kvm_gmem_fallocate as .fallocate()
>> callback, which calls truncate_inode_pages_range() [*].
>>
>> I'm not sure if it frees up the memory. I need to learn it.
>>
>> [*]
>> https://github.com/kvm-x86/linux/blob/911b515af3ec5f53992b9cc162cf7d3893c2fbe2/virt/kvm/guest_memfd.c#L147C73-L147C73
>>
>>>>
>>>>>
>>>>> "Private memory" is only private from the guest POV, not from a mmap()
>>>>> point of view.
>>>>>
>>>>> Two different concepts of "private".
>>>>>
>>>>>>
>>>>>> Simply remove the warning will fail the purpose of this patch. The
>>>>>> other
>>>>>> option is to skip the warning for TDX case, which looks vary
>>>>>> hacky. Do
>>>>>> you have any idea?
>>>>>
>>>>> For TDX, all memory backends / RAMBlocks should be marked as "shared",
>>>>> and you should fail if that is not provided by the user.
>>>>
>>>> As I asked above, I want to understand the logic clearly. Is mapped as
>>>> shared is a must to support the memory discard? i.e., if we want to
>>>> support memory discard after memory type change, then the memory
>>>> must be
>>>> mapped with MAP_SHARED?
>>>
>>> MAP_PIRVATE means that it's not sufficient to only fallocate(punch) the
>>> fd to free up all memory for a virtual address, because there might be
>>> anonymous memory in a private mapping that has to be freed up using
>>> MADV_DONTNEED.
>>
>> I can understand this. But it seems unrelated to my question: Is mapped
>> as shared is a must to support the memory discard?
>
> Sorry, I don't quite get what you are asking that I haven't answered
> yet. Let's talk about the issue you are seeing below.
>
>>
>> e.g., if use below parameters to specify the RAM for a VM
>>
>> -object memory-backend-memfd,id=mem0,size=2G \
>> -machine memory-backend=mem0
>>
>> since not specifying "share" property, the ram_block doesn't have
>> RAM_SHARED set. If want to discard some range of this memfd, it triggers
>> the warning. Is this warning expected?
>
> That should not be the case. See "memfd_backend_instance_init" where we
> set share=true. In memfd_backend_memory_alloc(), we set RAM_SHARED.
>
> We only default to share=off for memory-backend-file (well, and
> memory-backend-ram).
>
> So are you sure you get this error message in the configuration you are
> describing here?
Sorry, I made a mistake. I was using "-object
memory-backend-ram,id=mem0,size=2G" instead of "memory-backend-memfd".
Yes, when using "memory-backend-memfd" as the backend for TDX shared
memory, it doesn't trigger the warning, because
memfd_backend_instance_init() sets "share" to true.
When using "memory-backend-ram" as the backend for TDX shared memory,
the warning is triggered when converting memory from private (kvm gmem)
to shared (backend-ram). In this case, there is a valid fd (the kvm gmem
fd), so it goes down the need_fallocate path. However,
qemu_ram_is_shared(rb) returns false because "memory-backend-ram"
doesn't default "share" to on. Thus, the warning is triggered.
It seems I need to figure out a proper solution to refactor
ram_block_discard_range(), to make it applicable to the kvm gmem discard case.
>>
>> I know it is not a possible case for current QEMU, but it's true for
>> future TDX VM. TDX VM can have memory-backend-memfd as the backend for
>> shared memory and kvm gmem memfd for private memory. At any time, for
>> any memory range, only one type is valid, thus the range in opposite
>> memfd can be fallocated.
>
> Right.
>
>>
>> Here I get your message as "the ramblock needs to have RAM_SHARED flag
>> to allow the fallocate of the memfd". This is what I don't understand.
>
> The problem I am seeing is that either
>
> (a) Someone explicitly sets share=off for some reason for
> memory-backend-memfd, triggering the warning.
>
> (b) We somehow lose RAM_SHARED in above configuration, which would be
> bad and trigger the warning.
>
> Can you make sure that (a) is not the case?
My reply above answers it. Sorry for the confusion.
* Re: [PATCH v2 1/4] softmmu/physmem: Warn with ram_block_discard_range() on MAP_PRIVATE file mapping
2023-10-19 9:26 ` Xiaoyao Li
@ 2023-10-19 9:43 ` David Hildenbrand
0 siblings, 0 replies; 27+ messages in thread
From: David Hildenbrand @ 2023-10-19 9:43 UTC (permalink / raw)
To: Xiaoyao Li, qemu-devel
Cc: Michael S. Tsirkin, Juan Quintela, Peter Xu, Leonardo Bras,
Paolo Bonzini, Philippe Mathieu-Daudé, Peng Tao,
Mario Casquero
On 19.10.23 11:26, Xiaoyao Li wrote:
> On 10/19/2023 4:26 PM, David Hildenbrand wrote:
>> On 18.10.23 18:27, Xiaoyao Li wrote:
>>> On 10/18/2023 5:26 PM, David Hildenbrand wrote:
>>>> On 18.10.23 11:02, Xiaoyao Li wrote:
>>>>> On 10/18/2023 3:42 PM, David Hildenbrand wrote:
>>>>>> On 18.10.23 05:02, Xiaoyao Li wrote:
>>>>>>> David,
>>>>>>>
>>>>>>> On 7/6/2023 3:56 PM, David Hildenbrand wrote:
>>>>>>>> ram_block_discard_range() cannot possibly do the right thing in
>>>>>>>> MAP_PRIVATE file mappings in the general case.
>>>>>>>>
>>>>>>>> To achieve the documented semantics, we also have to punch a hole
>>>>>>>> into
>>>>>>>> the file, possibly messing with other MAP_PRIVATE/MAP_SHARED
>>>>>>>> mappings
>>>>>>>> of such a file.
>>>>>>>>
>>>>>>>> For example, using VM templating -- see commit b17fbbe55cba
>>>>>>>> ("migration:
>>>>>>>> allow private destination ram with x-ignore-shared") -- in
>>>>>>>> combination with
>>>>>>>> any mechanism that relies on discarding of RAM is problematic. This
>>>>>>>> includes:
>>>>>>>> * Postcopy live migration
>>>>>>>> * virtio-balloon inflation/deflation or free-page-reporting
>>>>>>>> * virtio-mem
>>>>>>>>
>>>>>>>> So at least warn that there is something possibly dangerous is
>>>>>>>> going on
>>>>>>>> when using ram_block_discard_range() in these cases.
>>>>>>>>
>>>>>>>> Acked-by: Peter Xu <peterx@redhat.com>
>>>>>>>> Tested-by: Mario Casquero <mcasquer@redhat.com>
>>>>>>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>>>>>>> ---
>>>>>>>> softmmu/physmem.c | 18 ++++++++++++++++++
>>>>>>>> 1 file changed, 18 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/softmmu/physmem.c b/softmmu/physmem.c
>>>>>>>> index bda475a719..4ee157bda4 100644
>>>>>>>> --- a/softmmu/physmem.c
>>>>>>>> +++ b/softmmu/physmem.c
>>>>>>>> @@ -3456,6 +3456,24 @@ int ram_block_discard_range(RAMBlock *rb,
>>>>>>>> uint64_t start, size_t length)
>>>>>>>> * so a userfault will trigger.
>>>>>>>> */
>>>>>>>> #ifdef CONFIG_FALLOCATE_PUNCH_HOLE
>>>>>>>> + /*
>>>>>>>> + * We'll discard data from the actual file, even though
>>>>>>>> we only
>>>>>>>> + * have a MAP_PRIVATE mapping, possibly messing with
>>>>>>>> other
>>>>>>>> + * MAP_PRIVATE/MAP_SHARED mappings. There is no easy
>>>>>>>> way to
>>>>>>>> + * change that behavior whithout violating the promised
>>>>>>>> + * semantics of ram_block_discard_range().
>>>>>>>> + *
>>>>>>>> + * Only warn, because it work as long as nobody else
>>>>>>>> uses that
>>>>>>>> + * file.
>>>>>>>> + */
>>>>>>>> + if (!qemu_ram_is_shared(rb)) {
>>>>>>>> + warn_report_once("ram_block_discard_range:
>>>>>>>> Discarding RAM"
>>>>>>>> + " in private file mappings is
>>>>>>>> possibly"
>>>>>>>> + " dangerous, because it will
>>>>>>>> modify
>>>>>>>> the"
>>>>>>>> + " underlying file and will affect
>>>>>>>> other"
>>>>>>>> + " users of the file");
>>>>>>>> + }
>>>>>>>> +
>>>>>>>
>>>>>>> TDX has two types of memory backend for each RAM, shared memory and
>>>>>>> private memory. Private memory is serviced by guest memfd and shared
>>>>>>> memory can also be backed with a fd.
>>>>>>>
>>>>>>> At any time, only one type needs to be valid, which means the
>>>>>>> opposite
>>>>>>> can be discarded. We do implement the memory discard when TDX
>>>>>>> converts
>>>>>>> the memory[1]. It will trigger this warning 100% because by
>>>>>>> default the
>>>>>>> guest memfd is not mapped as shared (MAP_SHARED).
>>>>>>
>>>>>> If MAP_PRIVATE is not involved and you are taking the pages
>>>>>> directly out
>>>>>> of the memfd, you should mark that thing as shared.
>>>>>
>>>>> Is it the general rule of Linux? Of just the rule of QEMU memory
>>>>> discard?
>>>>>
>>>>
>>>> MAP_SHARED vs. MAP_PRIVATE is a common UNIX principle, and that's what
>>>> this flag and the check is about.
>>>>
>>>> From mmap(2)
>>>>
>>>> MAP_SHARED: Share this mapping. Updates to the mapping are visible to
>>>> other processes mapping the same region, and (in the case of file-backed
>>>> mappings) are carried through to the underlying file.
>>>>
>>>> MAP_PRIVATE: Create a private copy-on-write mapping. Updates to the
>>>> mapping are not visible to other processes mapping the same file, and
>>>> are not carried through to the underlying file. It is unspecified
>>>> whether changes made to the file after the mmap() call are visible in
>>>> the mapped region.
>>>>
>>>> For your purpose (no mmap() at all), we behave like MAP_SHARED -- as if
>>>> nothing special is done. No Copy-on-write, no anonymous memory.
>>>>
>>>>>> Anonymous memory is never involved.
>>>>>
>>>>> Could you please elaborate more on this? What do you want to express
>>>>> here regrading anonymous memory? (Sorry that I'm newbie for mmap stuff)
>>>>
>>>> Anonymous memory is memory that is private to a specific process, and
>>>> (see MAP_PRIVATE) modifications remain private to the process and are
>>>> not reflected to the file.
>>>>
>>>> If you have a MAP_PRIVATE file mapping and write to a virtual memory
>>>> location, you'll get a process-private copy of the underlying pagecache
>>>> page. that's what we call anonymous memory, because it does not belong
>>>> to a specific file. fallocate(punch) would not free up that anonymous
>>>> memory.
>>>
>>> For guest memfd, it does implement kvm_gmem_fallocate as .fallocate()
>>> callback, which calls truncate_inode_pages_range() [*].
>>>
>>> I'm not sure if it frees up the memory. I need to learn it.
>>>
>>> [*]
>>> https://github.com/kvm-x86/linux/blob/911b515af3ec5f53992b9cc162cf7d3893c2fbe2/virt/kvm/guest_memfd.c#L147C73-L147C73
>>>
>>>>>
>>>>>>
>>>>>> "Private memory" is only private from the guest POV, not from a mmap()
>>>>>> point of view.
>>>>>>
>>>>>> Two different concepts of "private".
>>>>>>
>>>>>>>
>>>>>>> Simply remove the warning will fail the purpose of this patch. The
>>>>>>> other
>>>>>>> option is to skip the warning for TDX case, which looks vary
>>>>>>> hacky. Do
>>>>>>> you have any idea?
>>>>>>
>>>>>> For TDX, all memory backends / RAMBlocks should be marked as "shared",
>>>>>> and you should fail if that is not provided by the user.
>>>>>
>>>>> As I asked above, I want to understand the logic clearly. Is mapping
>>>>> as shared a must to support memory discard? I.e., if we want to
>>>>> support memory discard after a memory type change, must the memory
>>>>> be mapped with MAP_SHARED?
>>>>
>>>> MAP_PRIVATE means that it's not sufficient to only fallocate(punch) the
>>>> fd to free up all memory for a virtual address, because there might be
>>>> anonymous memory in a private mapping that has to be freed up using
>>>> MADV_DONTNEED.
>>>
>>> I can understand this. But it seems unrelated to my question: Is
>>> mapping as shared a must to support memory discard?
>>
>> Sorry, I don't quite get what you are asking that I haven't answered
>> yet. Let's talk about the issue you are seeing below.
>>
>>>
>>> e.g., if use below parameters to specify the RAM for a VM
>>>
>>> -object memory-backend-memfd,id=mem0,size=2G \
>>> -machine memory-backend=mem0
>>>
>>> since not specifying "share" property, the ram_block doesn't have
>>> RAM_SHARED set. If want to discard some range of this memfd, it triggers
>>> the warning. Is this warning expected?
>>
>> That should not be the case. See "memfd_backend_instance_init" where we
>> set share=true. In memfd_backend_memory_alloc(), we set RAM_SHARED.
>>
>> We only default to share=off for memory-backend-file (well, and
>> memory-backend-ram).
>>
>> So are you sure you get this error message in the configuration you are
>> describing here?
>
> Sorry, I made a mistake. I was using "-object
> memory-backend-ram,id=mem0,size=2G" instead of "memory-backend-memfd".
>
> yes, when using "memory-backend-memfd" as the backend for TDX shared
> memory, it doesn't trigger the warning because
> memfd_backend_instance_init() set "share" to true.
>
> When using "memory-backend-ram" as the backend for TDX shared memory,
> the warning is triggered when converting memory from private (kvm gmem)
> to shared (backend-ram). In this case, there is a valid fd (the kvm gmem
> fd), so it goes down the need_fallocate path. However,
> qemu_ram_is_shared(rb) returns false because "memory-backend-ram"
> doesn't default "share" to on. Thus, the warning is triggered.
>
> It seems I need to figure out a proper solution: refactoring
> ram_block_discard_range() to make it applicable to the kvm gmem discard case.
You should probably ignore any RAMBlock flags completely when
fallocate(punch)'ing the kvm_gmem_fd. The kvm_gmem_fd is rather special
"secondary storage that's never mapped", independent of the ordinary
RAMBlock memory.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH v2 2/4] virtio-mem: Skip most of virtio_mem_unplug_all() without plugged memory
2023-07-06 7:56 [PATCH v2 0/4] virtio-mem: Support "x-ignore-shared" migration David Hildenbrand
2023-07-06 7:56 ` [PATCH v2 1/4] softmmu/physmem: Warn with ram_block_discard_range() on MAP_PRIVATE file mapping David Hildenbrand
@ 2023-07-06 7:56 ` David Hildenbrand
2023-07-06 8:15 ` Juan Quintela
2023-07-06 7:56 ` [PATCH v2 3/4] migration/ram: Expose ramblock_is_ignored() as migrate_ram_is_ignored() David Hildenbrand
` (2 subsequent siblings)
4 siblings, 1 reply; 27+ messages in thread
From: David Hildenbrand @ 2023-07-06 7:56 UTC (permalink / raw)
To: qemu-devel
Cc: David Hildenbrand, Michael S. Tsirkin, Juan Quintela, Peter Xu,
Leonardo Bras, Paolo Bonzini, Philippe Mathieu-Daudé,
Peng Tao, Mario Casquero
Already when starting QEMU we perform one system reset that ends up
triggering virtio_mem_unplug_all() with no actual memory plugged yet.
That, in turn, will trigger ram_block_discard_range() and perform some
other actions that are not required in that case.
Let's optimize virtio_mem_unplug_all() for the case that no memory is
plugged. This will be beneficial for x-ignore-shared support as well.
Tested-by: Mario Casquero <mcasquer@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
hw/virtio/virtio-mem.c | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index ec0ae32589..a922c21380 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -621,20 +621,20 @@ static int virtio_mem_unplug_all(VirtIOMEM *vmem)
{
RAMBlock *rb = vmem->memdev->mr.ram_block;
- if (virtio_mem_is_busy()) {
- return -EBUSY;
- }
-
- if (ram_block_discard_range(rb, 0, qemu_ram_get_used_length(rb))) {
- return -EBUSY;
- }
- virtio_mem_notify_unplug_all(vmem);
-
- bitmap_clear(vmem->bitmap, 0, vmem->bitmap_size);
if (vmem->size) {
+ if (virtio_mem_is_busy()) {
+ return -EBUSY;
+ }
+ if (ram_block_discard_range(rb, 0, qemu_ram_get_used_length(rb))) {
+ return -EBUSY;
+ }
+ virtio_mem_notify_unplug_all(vmem);
+
+ bitmap_clear(vmem->bitmap, 0, vmem->bitmap_size);
vmem->size = 0;
notifier_list_notify(&vmem->size_change_notifiers, &vmem->size);
}
+
trace_virtio_mem_unplugged_all();
virtio_mem_resize_usable_region(vmem, vmem->requested_size, true);
return 0;
--
2.41.0
^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH v2 2/4] virtio-mem: Skip most of virtio_mem_unplug_all() without plugged memory
2023-07-06 7:56 ` [PATCH v2 2/4] virtio-mem: Skip most of virtio_mem_unplug_all() without plugged memory David Hildenbrand
@ 2023-07-06 8:15 ` Juan Quintela
2023-07-06 8:38 ` David Hildenbrand
0 siblings, 1 reply; 27+ messages in thread
From: Juan Quintela @ 2023-07-06 8:15 UTC (permalink / raw)
To: David Hildenbrand
Cc: qemu-devel, Michael S. Tsirkin, Peter Xu, Leonardo Bras,
Paolo Bonzini, Philippe Mathieu-Daudé, Peng Tao,
Mario Casquero
David Hildenbrand <david@redhat.com> wrote:
> Already when starting QEMU we perform one system reset that ends up
> triggering virtio_mem_unplug_all() with no actual memory plugged yet.
> That, in turn will trigger ram_block_discard_range() and perform some
> other actions that are not required in that case.
>
> Let's optimize virtio_mem_unplug_all() for the case that no memory is
> plugged. This will be beneficial for x-ignore-shared support as well.
>
> Tested-by: Mario Casquero <mcasquer@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
It works, so ...
Reviewed-by: Juan Quintela <quintela@redhat.com>
> RAMBlock *rb = vmem->memdev->mr.ram_block;
>
> - if (virtio_mem_is_busy()) {
> - return -EBUSY;
> - }
> -
> - if (ram_block_discard_range(rb, 0, qemu_ram_get_used_length(rb))) {
> - return -EBUSY;
> - }
> - virtio_mem_notify_unplug_all(vmem);
> -
> - bitmap_clear(vmem->bitmap, 0, vmem->bitmap_size);
> if (vmem->size) {
> + if (virtio_mem_is_busy()) {
> + return -EBUSY;
I see that the only way that virtio_mem_is_busy() is true is if we are
in the middle of a migration. In the case that vmem->size is 0, we don't
care. So we are good.
> + }
> + if (ram_block_discard_range(rb, 0, qemu_ram_get_used_length(rb))) {
> + return -EBUSY;
> + }
Nothing to discard, so also good.
> + virtio_mem_notify_unplug_all(vmem);
Nothing to notify, so also good.
> + bitmap_clear(vmem->bitmap, 0, vmem->bitmap_size);
> vmem->size = 0;
> notifier_list_notify(&vmem->size_change_notifiers, &vmem->size);
> }
> +
> trace_virtio_mem_unplugged_all();
> virtio_mem_resize_usable_region(vmem, vmem->requested_size, true);
> return 0;
Now that we are here: do you remember _why_ we allow virtio-mem
plug/unplug in the middle of a migration?
We forbid plug/unplug for everything else. Why do we need to plug/unplug
virtio-mem during migration?
Thanks, Juan.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v2 2/4] virtio-mem: Skip most of virtio_mem_unplug_all() without plugged memory
2023-07-06 8:15 ` Juan Quintela
@ 2023-07-06 8:38 ` David Hildenbrand
2023-07-06 13:27 ` Juan Quintela
0 siblings, 1 reply; 27+ messages in thread
From: David Hildenbrand @ 2023-07-06 8:38 UTC (permalink / raw)
To: quintela
Cc: qemu-devel, Michael S. Tsirkin, Peter Xu, Leonardo Bras,
Paolo Bonzini, Philippe Mathieu-Daudé, Peng Tao,
Mario Casquero
On 06.07.23 10:15, Juan Quintela wrote:
> David Hildenbrand <david@redhat.com> wrote:
>> Already when starting QEMU we perform one system reset that ends up
>> triggering virtio_mem_unplug_all() with no actual memory plugged yet.
>> That, in turn will trigger ram_block_discard_range() and perform some
>> other actions that are not required in that case.
>>
>> Let's optimize virtio_mem_unplug_all() for the case that no memory is
>> plugged. This will be beneficial for x-ignore-shared support as well.
>>
>> Tested-by: Mario Casquero <mcasquer@redhat.com>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>
> It works, so ...
>
> Reviewed-by: Juan Quintela <quintela@redhat.com>
Thanks!
[...]
>> + bitmap_clear(vmem->bitmap, 0, vmem->bitmap_size);
>> vmem->size = 0;
>> notifier_list_notify(&vmem->size_change_notifiers, &vmem->size);
>> }
>> +
>> trace_virtio_mem_unplugged_all();
>> virtio_mem_resize_usable_region(vmem, vmem->requested_size, true);
>> return 0;
>
> Now that we are here: do you remember _why_ we allow virtio-mem
> plug/unplug in the middle of a migration?
>
> We forbid plug/unplug for everything else. Why do we need to plug/unplug
> virtio-mem during migration?
With virtio-mem you tell the VM the desired size for the device
(requested-size), and the VM will select blocks to (un)plug and send
(un)plug requests to the hypervisor in order to reach the requested size.
So changing the requested size in the hypervisor (by the QEMU user) and
the VM processing that resize request is asynchronous -- similar to
memory ballooning.
As the VM can send these (un)plug requests at any time, and we
specifically don't want to allow (un)plug during migration, we have
virtio_mem_is_busy() to reject such requests and tell the VM "please
try again later".
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v2 2/4] virtio-mem: Skip most of virtio_mem_unplug_all() without plugged memory
2023-07-06 8:38 ` David Hildenbrand
@ 2023-07-06 13:27 ` Juan Quintela
0 siblings, 0 replies; 27+ messages in thread
From: Juan Quintela @ 2023-07-06 13:27 UTC (permalink / raw)
To: David Hildenbrand
Cc: qemu-devel, Michael S. Tsirkin, Peter Xu, Leonardo Bras,
Paolo Bonzini, Philippe Mathieu-Daudé, Peng Tao,
Mario Casquero
David Hildenbrand <david@redhat.com> wrote:
> On 06.07.23 10:15, Juan Quintela wrote:
>> David Hildenbrand <david@redhat.com> wrote:
>>> Already when starting QEMU we perform one system reset that ends up
>>> triggering virtio_mem_unplug_all() with no actual memory plugged yet.
>>> That, in turn will trigger ram_block_discard_range() and perform some
>>> other actions that are not required in that case.
>>>
>>> Let's optimize virtio_mem_unplug_all() for the case that no memory is
>>> plugged. This will be beneficial for x-ignore-shared support as well.
>>>
>>> Tested-by: Mario Casquero <mcasquer@redhat.com>
>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> It works, so ...
>> Reviewed-by: Juan Quintela <quintela@redhat.com>
>
> Thanks!
>
> [...]
>
>>> + bitmap_clear(vmem->bitmap, 0, vmem->bitmap_size);
>>> vmem->size = 0;
>>> notifier_list_notify(&vmem->size_change_notifiers, &vmem->size);
>>> }
>>> +
>>> trace_virtio_mem_unplugged_all();
>>> virtio_mem_resize_usable_region(vmem, vmem->requested_size, true);
>>> return 0;
>> Once that we are here. Do you remember _why_ do we allow virtio-mem
>> plug/unplug in the middle of a migration.
>> We forbid to plug/unplug everything else. Why do we need to
>> plug/unplug
>> virtio-mem during migration?
>
> With virtio-mem you tell the VM the desired size for the device
> (requested-size), and the VM will select blocks to (un)plug and send
> (un)plug requests to the hypervisor in order to reach the requested
> size.
>
> So changing the requested size in the hypervisor (by the QEMU user)
> and the VM processing that resize request is asynchronous -- similar
> to memory ballooning.
>
> As the VM can send these (un)plug requests any time, and we exactly
> don't want to allow (un)plug during migration, we have
> virtio_mem_is_busy() to reject any such requests to tell the VM
> "please try again later".
Ahh.
I see it now.
Thanks, Juan.
^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH v2 3/4] migration/ram: Expose ramblock_is_ignored() as migrate_ram_is_ignored()
2023-07-06 7:56 [PATCH v2 0/4] virtio-mem: Support "x-ignore-shared" migration David Hildenbrand
2023-07-06 7:56 ` [PATCH v2 1/4] softmmu/physmem: Warn with ram_block_discard_range() on MAP_PRIVATE file mapping David Hildenbrand
2023-07-06 7:56 ` [PATCH v2 2/4] virtio-mem: Skip most of virtio_mem_unplug_all() without plugged memory David Hildenbrand
@ 2023-07-06 7:56 ` David Hildenbrand
2023-07-06 8:16 ` Juan Quintela
2023-07-06 7:56 ` [PATCH v2 4/4] virtio-mem: Support "x-ignore-shared" migration David Hildenbrand
2023-07-06 14:03 ` [PATCH v2 0/4] " Michael S. Tsirkin
4 siblings, 1 reply; 27+ messages in thread
From: David Hildenbrand @ 2023-07-06 7:56 UTC (permalink / raw)
To: qemu-devel
Cc: David Hildenbrand, Michael S. Tsirkin, Juan Quintela, Peter Xu,
Leonardo Bras, Paolo Bonzini, Philippe Mathieu-Daudé,
Peng Tao, Mario Casquero
virtio-mem wants to know whether it should not mess with the RAMBlock
content (e.g., discard RAM, preallocate memory) on incoming migration.
So let's expose that function as migrate_ram_is_ignored() in
migration/misc.h
Acked-by: Peter Xu <peterx@redhat.com>
Tested-by: Mario Casquero <mcasquer@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
include/migration/misc.h | 1 +
migration/postcopy-ram.c | 2 +-
migration/ram.c | 14 +++++++-------
migration/ram.h | 3 +--
4 files changed, 10 insertions(+), 10 deletions(-)
diff --git a/include/migration/misc.h b/include/migration/misc.h
index 5ebe13b4b9..7dcc0b5c2c 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -40,6 +40,7 @@ int precopy_notify(PrecopyNotifyReason reason, Error **errp);
void ram_mig_init(void);
void qemu_guest_free_page_hint(void *addr, size_t len);
+bool migrate_ram_is_ignored(RAMBlock *block);
/* migration/block.c */
diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index 5615ec29eb..29aea9456d 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -408,7 +408,7 @@ bool postcopy_ram_supported_by_host(MigrationIncomingState *mis, Error **errp)
/*
* We don't support postcopy with some type of ramblocks.
*
- * NOTE: we explicitly ignored ramblock_is_ignored() instead we checked
+ * NOTE: we explicitly ignored migrate_ram_is_ignored() instead we checked
* all possible ramblocks. This is because this function can be called
* when creating the migration object, during the phase RAM_MIGRATABLE
* is not even properly set for all the ramblocks.
diff --git a/migration/ram.c b/migration/ram.c
index 5283a75f02..0ada6477e8 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -194,7 +194,7 @@ static bool postcopy_preempt_active(void)
return migrate_postcopy_preempt() && migration_in_postcopy();
}
-bool ramblock_is_ignored(RAMBlock *block)
+bool migrate_ram_is_ignored(RAMBlock *block)
{
return !qemu_ram_is_migratable(block) ||
(migrate_ignore_shared() && qemu_ram_is_shared(block)
@@ -696,7 +696,7 @@ static void pss_find_next_dirty(PageSearchStatus *pss)
unsigned long size = rb->used_length >> TARGET_PAGE_BITS;
unsigned long *bitmap = rb->bmap;
- if (ramblock_is_ignored(rb)) {
+ if (migrate_ram_is_ignored(rb)) {
/* Points directly to the end, so we know no dirty page */
pss->page = size;
return;
@@ -780,7 +780,7 @@ unsigned long colo_bitmap_find_dirty(RAMState *rs, RAMBlock *rb,
*num = 0;
- if (ramblock_is_ignored(rb)) {
+ if (migrate_ram_is_ignored(rb)) {
return size;
}
@@ -2260,7 +2260,7 @@ static int ram_save_host_page(RAMState *rs, PageSearchStatus *pss)
unsigned long start_page = pss->page;
int res;
- if (ramblock_is_ignored(pss->block)) {
+ if (migrate_ram_is_ignored(pss->block)) {
error_report("block %s should not be migrated !", pss->block->idstr);
return 0;
}
@@ -3347,7 +3347,7 @@ static inline RAMBlock *ram_block_from_stream(MigrationIncomingState *mis,
return NULL;
}
- if (ramblock_is_ignored(block)) {
+ if (migrate_ram_is_ignored(block)) {
error_report("block %s should not be migrated !", id);
return NULL;
}
@@ -3958,7 +3958,7 @@ static int ram_load_precopy(QEMUFile *f)
}
if (migrate_ignore_shared()) {
hwaddr addr = qemu_get_be64(f);
- if (ramblock_is_ignored(block) &&
+ if (migrate_ram_is_ignored(block) &&
block->mr->addr != addr) {
error_report("Mismatched GPAs for block %s "
"%" PRId64 "!= %" PRId64,
@@ -4254,7 +4254,7 @@ static void ram_mig_ram_block_resized(RAMBlockNotifier *n, void *host,
RAMBlock *rb = qemu_ram_block_from_host(host, false, &offset);
Error *err = NULL;
- if (ramblock_is_ignored(rb)) {
+ if (migrate_ram_is_ignored(rb)) {
return;
}
diff --git a/migration/ram.h b/migration/ram.h
index ea1f3c25b5..145c915ca7 100644
--- a/migration/ram.h
+++ b/migration/ram.h
@@ -36,11 +36,10 @@
extern XBZRLECacheStats xbzrle_counters;
extern CompressionStats compression_counters;
-bool ramblock_is_ignored(RAMBlock *block);
/* Should be holding either ram_list.mutex, or the RCU lock. */
#define RAMBLOCK_FOREACH_NOT_IGNORED(block) \
INTERNAL_RAMBLOCK_FOREACH(block) \
- if (ramblock_is_ignored(block)) {} else
+ if (migrate_ram_is_ignored(block)) {} else
#define RAMBLOCK_FOREACH_MIGRATABLE(block) \
INTERNAL_RAMBLOCK_FOREACH(block) \
--
2.41.0
^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH v2 3/4] migration/ram: Expose ramblock_is_ignored() as migrate_ram_is_ignored()
2023-07-06 7:56 ` [PATCH v2 3/4] migration/ram: Expose ramblock_is_ignored() as migrate_ram_is_ignored() David Hildenbrand
@ 2023-07-06 8:16 ` Juan Quintela
0 siblings, 0 replies; 27+ messages in thread
From: Juan Quintela @ 2023-07-06 8:16 UTC (permalink / raw)
To: David Hildenbrand
Cc: qemu-devel, Michael S. Tsirkin, Peter Xu, Leonardo Bras,
Paolo Bonzini, Philippe Mathieu-Daudé, Peng Tao,
Mario Casquero
David Hildenbrand <david@redhat.com> wrote:
> virtio-mem wants to know whether it should not mess with the RAMBlock
> content (e.g., discard RAM, preallocate memory) on incoming migration.
>
> So let's expose that function as migrate_ram_is_ignored() in
> migration/misc.h
Reviewed-by: Juan Quintela <quintela@redhat.com>
^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH v2 4/4] virtio-mem: Support "x-ignore-shared" migration
2023-07-06 7:56 [PATCH v2 0/4] virtio-mem: Support "x-ignore-shared" migration David Hildenbrand
` (2 preceding siblings ...)
2023-07-06 7:56 ` [PATCH v2 3/4] migration/ram: Expose ramblock_is_ignored() as migrate_ram_is_ignored() David Hildenbrand
@ 2023-07-06 7:56 ` David Hildenbrand
2023-07-06 11:06 ` Juan Quintela
2023-07-06 11:59 ` Juan Quintela
2023-07-06 14:03 ` [PATCH v2 0/4] " Michael S. Tsirkin
4 siblings, 2 replies; 27+ messages in thread
From: David Hildenbrand @ 2023-07-06 7:56 UTC (permalink / raw)
To: qemu-devel
Cc: David Hildenbrand, Michael S. Tsirkin, Juan Quintela, Peter Xu,
Leonardo Bras, Paolo Bonzini, Philippe Mathieu-Daudé,
Peng Tao, Mario Casquero
To achieve desired "x-ignore-shared" functionality, we should not
discard all RAM when realizing the device and not mess with
preallocation/postcopy when loading device state. In essence, we should
not touch RAM content.
As "x-ignore-shared" gets set after realizing the device, we cannot
rely on that. Let's simply skip discarding of RAM on incoming migration.
Note that virtio_mem_post_load() will call
virtio_mem_restore_unplugged() -- unless "x-ignore-shared" is set. So
once migration finished we'll have a consistent state.
The initial system reset will also not discard any RAM, because
virtio_mem_unplug_all() will not call ram_block_discard_range() when no
memory is plugged (which is the case before loading the device state).
Note that something like VM templating -- see commit b17fbbe55cba
("migration: allow private destination ram with x-ignore-shared") -- is
currently incompatible with virtio-mem and ram_block_discard_range() will
warn in case a private file mapping is supplied by virtio-mem.
For VM templating with virtio-mem, it makes more sense to either
(a) Create the template without the virtio-mem device and hotplug a
virtio-mem device to the new VM instances using proper own memory
backend.
(b) Use a virtio-mem device that doesn't provide any memory in the
template (requested-size=0) and use private anonymous memory.
Tested-by: Mario Casquero <mcasquer@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
hw/virtio/virtio-mem.c | 47 ++++++++++++++++++++++++++++++++++--------
1 file changed, 38 insertions(+), 9 deletions(-)
diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index a922c21380..3f41e00e74 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -18,6 +18,7 @@
#include "sysemu/numa.h"
#include "sysemu/sysemu.h"
#include "sysemu/reset.h"
+#include "sysemu/runstate.h"
#include "hw/virtio/virtio.h"
#include "hw/virtio/virtio-bus.h"
#include "hw/virtio/virtio-mem.h"
@@ -901,11 +902,23 @@ static void virtio_mem_device_realize(DeviceState *dev, Error **errp)
return;
}
- ret = ram_block_discard_range(rb, 0, qemu_ram_get_used_length(rb));
- if (ret) {
- error_setg_errno(errp, -ret, "Unexpected error discarding RAM");
- ram_block_coordinated_discard_require(false);
- return;
+ /*
+ * We don't know at this point whether shared RAM is migrated using
+ * QEMU or migrated using the file content. "x-ignore-shared" will be
+ * configured after realizing the device. So in case we have an
+ * incoming migration, simply always skip the discard step.
+ *
+ * Otherwise, make sure that we start with a clean slate: either the
+ * memory backend might get reused or the shared file might still have
+ * memory allocated.
+ */
+ if (!runstate_check(RUN_STATE_INMIGRATE)) {
+ ret = ram_block_discard_range(rb, 0, qemu_ram_get_used_length(rb));
+ if (ret) {
+ error_setg_errno(errp, -ret, "Unexpected error discarding RAM");
+ ram_block_coordinated_discard_require(false);
+ return;
+ }
}
virtio_mem_resize_usable_region(vmem, vmem->requested_size, true);
@@ -977,10 +990,6 @@ static int virtio_mem_post_load(void *opaque, int version_id)
RamDiscardListener *rdl;
int ret;
- if (vmem->prealloc && !vmem->early_migration) {
- warn_report("Proper preallocation with migration requires a newer QEMU machine");
- }
-
/*
* We started out with all memory discarded and our memory region is mapped
* into an address space. Replay, now that we updated the bitmap.
@@ -993,6 +1002,18 @@ static int virtio_mem_post_load(void *opaque, int version_id)
}
}
+ /*
+ * If shared RAM is migrated using the file content and not using QEMU,
+ * don't mess with preallocation and postcopy.
+ */
+ if (migrate_ram_is_ignored(vmem->memdev->mr.ram_block)) {
+ return 0;
+ }
+
+ if (vmem->prealloc && !vmem->early_migration) {
+ warn_report("Proper preallocation with migration requires a newer QEMU machine");
+ }
+
if (migration_in_incoming_postcopy()) {
return 0;
}
@@ -1025,6 +1046,14 @@ static int virtio_mem_post_load_early(void *opaque, int version_id)
return 0;
}
+ /*
+ * If shared RAM is migrated using the file content and not using QEMU,
+ * don't mess with preallocation and postcopy.
+ */
+ if (migrate_ram_is_ignored(rb)) {
+ return 0;
+ }
+
/*
* We restored the bitmap and verified that the basic properties
* match on source and destination, so we can go ahead and preallocate
--
2.41.0
^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH v2 4/4] virtio-mem: Support "x-ignore-shared" migration
2023-07-06 7:56 ` [PATCH v2 4/4] virtio-mem: Support "x-ignore-shared" migration David Hildenbrand
@ 2023-07-06 11:06 ` Juan Quintela
2023-07-06 11:27 ` David Hildenbrand
2023-07-06 11:59 ` Juan Quintela
1 sibling, 1 reply; 27+ messages in thread
From: Juan Quintela @ 2023-07-06 11:06 UTC (permalink / raw)
To: David Hildenbrand
Cc: qemu-devel, Michael S. Tsirkin, Peter Xu, Leonardo Bras,
Paolo Bonzini, Philippe Mathieu-Daudé, Peng Tao,
Mario Casquero
David Hildenbrand <david@redhat.com> wrote:
> To achieve desired "x-ignore-shared" functionality, we should not
> discard all RAM when realizing the device and not mess with
> preallocation/postcopy when loading device state. In essence, we should
> not touch RAM content.
>
> As "x-ignore-shared" gets set after realizing the device, we cannot
> rely on that. Let's simply skip discarding of RAM on incoming migration.
> Note that virtio_mem_post_load() will call
> virtio_mem_restore_unplugged() -- unless "x-ignore-shared" is set. So
> once migration finished we'll have a consistent state.
>
> The initial system reset will also not discard any RAM, because
> virtio_mem_unplug_all() will not call ram_block_discard_range() when no
> memory is plugged (which is the case before loading the device state).
>
> Note that something like VM templating -- see commit b17fbbe55cba
> ("migration: allow private destination ram with x-ignore-shared")
And here I am, I reviewed the patch, and 4 years later I don't remember
anything about it O:-)
> -- is
> currently incompatible with virtio-mem and ram_block_discard_range() will
> warn in case a private file mapping is supplied by virtio-mem.
If it is incompatible, a warning alone is not enough.
>
> For VM templating with virtio-mem, it makes more sense to either
> (a) Create the template without the virtio-mem device and hotplug a
> virtio-mem device to the new VM instances using proper own memory
> backend.
> (b) Use a virtio-mem device that doesn't provide any memory in the
> template (requested-size=0) and use private anonymous memory.
>
> Tested-by: Mario Casquero <mcasquer@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
> hw/virtio/virtio-mem.c | 47 ++++++++++++++++++++++++++++++++++--------
> 1 file changed, 38 insertions(+), 9 deletions(-)
>
> diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
> index a922c21380..3f41e00e74 100644
> --- a/hw/virtio/virtio-mem.c
> +++ b/hw/virtio/virtio-mem.c
> @@ -18,6 +18,7 @@
> #include "sysemu/numa.h"
> #include "sysemu/sysemu.h"
> #include "sysemu/reset.h"
> +#include "sysemu/runstate.h"
> #include "hw/virtio/virtio.h"
> #include "hw/virtio/virtio-bus.h"
> #include "hw/virtio/virtio-mem.h"
> @@ -901,11 +902,23 @@ static void virtio_mem_device_realize(DeviceState *dev, Error **errp)
> return;
> }
>
> - ret = ram_block_discard_range(rb, 0, qemu_ram_get_used_length(rb));
> - if (ret) {
> - error_setg_errno(errp, -ret, "Unexpected error discarding RAM");
> - ram_block_coordinated_discard_require(false);
> - return;
> + /*
> + * We don't know at this point whether shared RAM is migrated using
> + * QEMU or migrated using the file content. "x-ignore-shared" will be
> + * configured after realizing the device. So in case we have an
> + * incoming migration, simply always skip the discard step.
> + *
> + * Otherwise, make sure that we start with a clean slate: either the
> + * memory backend might get reused or the shared file might still have
> + * memory allocated.
> + */
> + if (!runstate_check(RUN_STATE_INMIGRATE)) {
> + ret = ram_block_discard_range(rb, 0, qemu_ram_get_used_length(rb));
> + if (ret) {
> + error_setg_errno(errp, -ret, "Unexpected error discarding RAM");
> + ram_block_coordinated_discard_require(false);
> + return;
> + }
> }
Makes sense.
>
> virtio_mem_resize_usable_region(vmem, vmem->requested_size, true);
> @@ -977,10 +990,6 @@ static int virtio_mem_post_load(void *opaque, int version_id)
> RamDiscardListener *rdl;
> int ret;
>
> - if (vmem->prealloc && !vmem->early_migration) {
> - warn_report("Proper preallocation with migration requires a newer QEMU machine");
> - }
> -
> /*
> * We started out with all memory discarded and our memory region is mapped
> * into an address space. Replay, now that we updated the bitmap.
> @@ -993,6 +1002,18 @@ static int virtio_mem_post_load(void *opaque, int version_id)
> }
> }
>
> + /*
> + * If shared RAM is migrated using the file content and not using QEMU,
> + * don't mess with preallocation and postcopy.
> + */
> + if (migrate_ram_is_ignored(vmem->memdev->mr.ram_block)) {
> + return 0;
> + }
> +
> + if (vmem->prealloc && !vmem->early_migration) {
> + warn_report("Proper preallocation with migration requires a newer QEMU machine");
> + }
> +
Could you explain why you are putting the check after calling
virtio_mem_notify_populate_cb()?
What is it expected to do for file-backed RAM? I got lost when I
saw that it just calls:
static int virtio_mem_notify_populate_cb(MemoryRegionSection *s, void *arg)
{
RamDiscardListener *rdl = arg;
return rdl->notify_populate(rdl, s);
}
I ended up in vfio, and got completely confused about what is going on there.
> if (migration_in_incoming_postcopy()) {
> return 0;
> }
> @@ -1025,6 +1046,14 @@ static int virtio_mem_post_load_early(void *opaque, int version_id)
> return 0;
> }
>
> + /*
> + * If shared RAM is migrated using the file content and not using QEMU,
> + * don't mess with preallocation and postcopy.
> + */
> + if (migrate_ram_is_ignored(rb)) {
> + return 0;
> + }
> +
> /*
> * We restored the bitmap and verified that the basic properties
> * match on source and destination, so we can go ahead and preallocate
OK.
Thanks, Juan.
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v2 4/4] virtio-mem: Support "x-ignore-shared" migration
2023-07-06 11:06 ` Juan Quintela
@ 2023-07-06 11:27 ` David Hildenbrand
0 siblings, 0 replies; 27+ messages in thread
From: David Hildenbrand @ 2023-07-06 11:27 UTC (permalink / raw)
To: quintela
Cc: qemu-devel, Michael S. Tsirkin, Peter Xu, Leonardo Bras,
Paolo Bonzini, Philippe Mathieu-Daudé, Peng Tao,
Mario Casquero
On 06.07.23 13:06, Juan Quintela wrote:
> David Hildenbrand <david@redhat.com> wrote:
>> To achieve desired "x-ignore-shared" functionality, we should not
>> discard all RAM when realizing the device and not mess with
>> preallocation/postcopy when loading device state. In essence, we should
>> not touch RAM content.
>>
>> As "x-ignore-shared" gets set after realizing the device, we cannot
>> rely on that. Let's simply skip discarding of RAM on incoming migration.
>> Note that virtio_mem_post_load() will call
>> virtio_mem_restore_unplugged() -- unless "x-ignore-shared" is set. So
>> once migration finished we'll have a consistent state.
>>
>> The initial system reset will also not discard any RAM, because
>> virtio_mem_unplug_all() will not call ram_block_discard_range() when no
>> memory is plugged (which is the case before loading the device state).
>>
>> Note that something like VM templating -- see commit b17fbbe55cba
>> ("migration: allow private destination ram with x-ignore-shared")
>
> And here I am, I reviewed the patch, and 4 years later I don't remember
> anything about it O:-)
:)
[...]
>> + /*
>> + * If shared RAM is migrated using the file content and not using QEMU,
>> + * don't mess with preallocation and postcopy.
>> + */
>> + if (migrate_ram_is_ignored(vmem->memdev->mr.ram_block)) {
>> + return 0;
>> + }
>> +
>> + if (vmem->prealloc && !vmem->early_migration) {
>> + warn_report("Proper preallocation with migration requires a newer QEMU machine");
>> + }
>> +
>
> Could you explain why you are putting the check after calling
> virtio_mem_notify_populate_cb()?
>
> What is it expected to do for file-backed RAM? I got lost when I
> saw that it just calls:
>
> static int virtio_mem_notify_populate_cb(MemoryRegionSection *s, void *arg)
> {
> RamDiscardListener *rdl = arg;
>
> return rdl->notify_populate(rdl, s);
> }
>
>
> I ended up in vfio, and got completely confused about what is going on there.
:)
Once we reached virtio_mem_post_load(), we restored the bitmap that
contains the state of all device blocks (plugged vs. unplugged).
Whenever we modify the bitmap (plug / unplug), we have to notify
(RamDiscardManager) listeners so that they are aware of the state
change and can act accordingly.
For example, vfio will go ahead and register the newly plugged blocks
with the kernel (DMA-map them via vfio), where the kernel will end up
long-term pinning these pages. Effectively, we only end up DMA-mapping
plugged memory blocks, so only these get pinned by the kernel (and we
can actually release the memory of unplugged blocks).
So here (virtio_mem_post_load()), we just restored the bitmap from the
migration stream and effectively went from 0 plugged blocks (bitmap
empty) before migration to "maybe some plugged blocks in the bitmap".
So we go over the bitmap and tell the world (vfio) to go ahead and
DMA-map these blocks that are suddenly plugged.
And that part is independent of the actual RAM migration /
x-ignore-shared, so we have to do it unconditionally.
Thanks for the thorough review!
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH v2 4/4] virtio-mem: Support "x-ignore-shared" migration
2023-07-06 7:56 ` [PATCH v2 4/4] virtio-mem: Support "x-ignore-shared" migration David Hildenbrand
2023-07-06 11:06 ` Juan Quintela
@ 2023-07-06 11:59 ` Juan Quintela
1 sibling, 0 replies; 27+ messages in thread
From: Juan Quintela @ 2023-07-06 11:59 UTC (permalink / raw)
To: David Hildenbrand
Cc: qemu-devel, Michael S. Tsirkin, Peter Xu, Leonardo Bras,
Paolo Bonzini, Philippe Mathieu-Daudé, Peng Tao,
Mario Casquero
David Hildenbrand <david@redhat.com> wrote:
> To achieve desired "x-ignore-shared" functionality, we should not
> discard all RAM when realizing the device and not mess with
> preallocation/postcopy when loading device state. In essence, we should
> not touch RAM content.
>
> As "x-ignore-shared" gets set after realizing the device, we cannot
> rely on that. Let's simply skip discarding of RAM on incoming migration.
> Note that virtio_mem_post_load() will call
> virtio_mem_restore_unplugged() -- unless "x-ignore-shared" is set. So
> once migration has finished we'll have a consistent state.
>
> The initial system reset will also not discard any RAM, because
> virtio_mem_unplug_all() will skip most of its work when no
> memory is plugged (which is the case before loading the device state).
>
> Note that something like VM templating -- see commit b17fbbe55cba
> ("migration: allow private destination ram with x-ignore-shared") -- is
> currently incompatible with virtio-mem and ram_block_discard_range() will
> warn in case a private file mapping is supplied by virtio-mem.
>
> For VM templating with virtio-mem, it makes more sense to either
> (a) Create the template without the virtio-mem device and hotplug a
> virtio-mem device to the new VM instances using proper own memory
> backend.
> (b) Use a virtio-mem device that doesn't provide any memory in the
> template (requested-size=0) and use private anonymous memory.
>
> Tested-by: Mario Casquero <mcasquer@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
After that very nice explanation:
Reviewed-by: Juan Quintela <quintela@redhat.com>
* Re: [PATCH v2 0/4] virtio-mem: Support "x-ignore-shared" migration
2023-07-06 7:56 [PATCH v2 0/4] virtio-mem: Support "x-ignore-shared" migration David Hildenbrand
` (3 preceding siblings ...)
2023-07-06 7:56 ` [PATCH v2 4/4] virtio-mem: Support "x-ignore-shared" migration David Hildenbrand
@ 2023-07-06 14:03 ` Michael S. Tsirkin
2023-07-07 12:21 ` David Hildenbrand
4 siblings, 1 reply; 27+ messages in thread
From: Michael S. Tsirkin @ 2023-07-06 14:03 UTC (permalink / raw)
To: David Hildenbrand
Cc: qemu-devel, Juan Quintela, Peter Xu, Leonardo Bras, Paolo Bonzini,
Philippe Mathieu-Daudé, Peng Tao, Mario Casquero
On Thu, Jul 06, 2023 at 09:56:05AM +0200, David Hildenbrand wrote:
> If there is no further feedback, I'll queue this myself shortly.
>
> Stumbling over "x-ignore-shared" migration support for virtio-mem on
> my todo list, I remember talking to Dave G. a while ago about how
> ram_block_discard_range() in MAP_PRIVATE file mappings is possibly
> harmful when the file is used somewhere else -- for example, with VM
> templating in multiple VMs.
>
> This series adds a warning to ram_block_discard_range() in that problematic
> case and adds "x-ignore-shared" migration support for virtio-mem, which
> is pretty straightforward. The last patch also documents how VM templating
> interacts with virtio-mem.
>
> v1 -> v2:
> - Pick up tags
> - "virtio-mem: Support "x-ignore-shared" migration"
> -> Fix spelling mistake
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Juan Quintela <quintela@redhat.com>
> Cc: Peter Xu <peterx@redhat.com>
> Cc: Leonardo Bras <leobras@redhat.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: "Philippe Mathieu-Daudé" <philmd@linaro.org>
> Cc: Peng Tao <tao.peng@linux.alibaba.com>
> Cc: Mario Casquero <mcasquer@redhat.com>
>
> David Hildenbrand (4):
> softmmu/physmem: Warn with ram_block_discard_range() on MAP_PRIVATE
> file mapping
> virtio-mem: Skip most of virtio_mem_unplug_all() without plugged
> memory
> migration/ram: Expose ramblock_is_ignored() as
> migrate_ram_is_ignored()
> virtio-mem: Support "x-ignore-shared" migration
>
> hw/virtio/virtio-mem.c | 67 ++++++++++++++++++++++++++++------------
> include/migration/misc.h | 1 +
> migration/postcopy-ram.c | 2 +-
> migration/ram.c | 14 ++++-----
> migration/ram.h | 3 +-
> softmmu/physmem.c | 18 +++++++++++
> 6 files changed, 76 insertions(+), 29 deletions(-)
>
> --
> 2.41.0
* Re: [PATCH v2 0/4] virtio-mem: Support "x-ignore-shared" migration
2023-07-06 14:03 ` [PATCH v2 0/4] " Michael S. Tsirkin
@ 2023-07-07 12:21 ` David Hildenbrand
0 siblings, 0 replies; 27+ messages in thread
From: David Hildenbrand @ 2023-07-07 12:21 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: qemu-devel, Juan Quintela, Peter Xu, Leonardo Bras, Paolo Bonzini,
Philippe Mathieu-Daudé, Peng Tao, Mario Casquero
On 06.07.23 16:03, Michael S. Tsirkin wrote:
> On Thu, Jul 06, 2023 at 09:56:05AM +0200, David Hildenbrand wrote:
>> If there is no further feedback, I'll queue this myself shortly.
>>
>> Stumbling over "x-ignore-shared" migration support for virtio-mem on
>> my todo list, I remember talking to Dave G. a while ago about how
>> ram_block_discard_range() in MAP_PRIVATE file mappings is possibly
>> harmful when the file is used somewhere else -- for example, with VM
>> templating in multiple VMs.
>>
>> This series adds a warning to ram_block_discard_range() in that problematic
>> case and adds "x-ignore-shared" migration support for virtio-mem, which
>> is pretty straightforward. The last patch also documents how VM templating
>> interacts with virtio-mem.
>>
>> v1 -> v2:
>> - Pick up tags
>> - "virtio-mem: Support "x-ignore-shared" migration"
>> -> Fix spelling mistake
>
> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Thanks, I queued this to
https://github.com/davidhildenbrand/qemu.git mem-next
--
Cheers,
David / dhildenb