From: David Hildenbrand <david@redhat.com>
To: Peter Xu <peterx@redhat.com>
Cc: qemu-devel@nongnu.org, "Michael S. Tsirkin" <mst@redhat.com>,
"Juan Quintela" <quintela@redhat.com>,
"Leonardo Bras" <leobras@redhat.com>,
"Paolo Bonzini" <pbonzini@redhat.com>,
"Philippe Mathieu-Daudé" <philmd@linaro.org>,
"Peng Tao" <tao.peng@linux.alibaba.com>
Subject: Re: [PATCH v1 1/4] softmmu/physmem: Warn with ram_block_discard_range() on MAP_PRIVATE file mapping
Date: Wed, 21 Jun 2023 18:17:37 +0200
Message-ID: <9f7afce0-ff7f-33f8-4f39-bba77f2b2ba4@redhat.com>
In-Reply-To: <ZJMdZRoeu9BVm0z8@x1n>
On 21.06.23 17:55, Peter Xu wrote:
> On Tue, Jun 20, 2023 at 03:03:51PM +0200, David Hildenbrand wrote:
>> ram_block_discard_range() cannot possibly do the right thing in
>> MAP_PRIVATE file mappings in the general case.
>>
>> To achieve the documented semantics, we also have to punch a hole into
>> the file, possibly messing with other MAP_PRIVATE/MAP_SHARED mappings
>> of such a file.
>>
>> For example, using VM templating -- see commit b17fbbe55cba ("migration:
>> allow private destination ram with x-ignore-shared") -- in combination with
>> any mechanism that relies on discarding of RAM is problematic. This
>> includes:
>> * Postcopy live migration
>> * virtio-balloon inflation/deflation or free-page-reporting
>> * virtio-mem
>>
>> So at least warn that something possibly dangerous is going on
>> when using ram_block_discard_range() in these cases.
>
> The issue is probably valid.
>
> One thing I worry about is when the user (or QEMU instance) exclusively
> owns the file and just forgot to attach share=on: where it used to work
> perfectly, it'll now show this warning. But I agree it's maybe good to
> remind them to attach share=on.
For memory-backend-memfd "share=on" is fortunately the default. For
memory-backend-file it isn't (and in most cases you do want share=on,
like for hugetlbfs or tmpfs).
Missing the "share=on" for memory-backend-file can have sane use cases,
but for the common /dev/shm/ case it even results in an undesired
double-memory consumption (just like memory-backend-memfd,share=off).
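To illustrate the double-memory consumption with share=off, here is a minimal Linux sketch, not QEMU code (assumptions: glibc >= 2.27 for memfd_create(); the function name is hypothetical). It maps one memfd page both MAP_SHARED and MAP_PRIVATE: a store through the private view is copied on write into a new anonymous page, so the file keeps the old data while the process now holds a second copy of that page.

```c
#define _GNU_SOURCE
#include <sys/mman.h>
#include <unistd.h>

/*
 * Returns 1 if a store through the MAP_PRIVATE view stayed private
 * (file and MAP_SHARED view unchanged), 0 if it reached the file,
 * -1 on error or unexpected behavior.
 */
static int private_store_stays_private(void)
{
    long psize = sysconf(_SC_PAGESIZE);
    char *shared = MAP_FAILED, *priv = MAP_FAILED;
    int ret = -1;
    int fd = memfd_create("share-demo", 0);

    if (fd < 0)
        return -1;
    if (ftruncate(fd, psize) < 0)
        goto out;

    shared = mmap(NULL, psize, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    priv = mmap(NULL, psize, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (shared == MAP_FAILED || priv == MAP_FAILED)
        goto out;

    shared[0] = 'S';          /* lands in the memfd page cache */
    if (priv[0] != 'S')       /* untouched private view reads the file */
        goto out;
    priv[0] = 'P';            /* copy-on-write: new anonymous page */
    ret = (shared[0] == 'S' && priv[0] == 'P');

out:
    if (shared != MAP_FAILED)
        munmap(shared, psize);
    if (priv != MAP_FAILED)
        munmap(priv, psize);
    close(fd);
    return ret;
}
```

Called from a small main() on Linux, this should return 1: the guest-visible data now exists twice, once in the memfd and once in anonymous memory.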
>
> For real private mem users, the warning can be of real help; one should
> probably leverage things like file snapshots provided by modern file
> systems, so each VM has its own snapshot RAM file and maps it with
> share=on, I suppose.
Yes, I agree. Although we recently learned that fs-backed VM RAM (SSD)
performs poorly and will severely wear out your SSD :(
>
> For the long term, maybe we should simply support private mem here via
> MADV_DONTNEED. I assume that's the right semantics for postcopy (we just
> need to support MINOR faults, though; MISSING faults will definitely stop
> working... but the rest of the framework shouldn't need much change), and
> I hope that's also the semantics that balloon/virtio-mem want here. Not
> sure whether/when that's strongly needed, assuming the corner case above
> can still be worked around properly by other means.
I briefly thought about that but came to the conclusion that fixing it
is not that easy. So I went with the warning.
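The condition the warning keys on can be sketched roughly as follows. This is not the patch itself: struct ram_block_sketch and both helpers are hypothetical stand-ins. In QEMU the corresponding information lives in RAMBlock (its fd and qemu_ram_is_shared()), and a once-only message would go through warn_report_once().

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical, heavily reduced stand-in for QEMU's RAMBlock; only the
 * two properties the warning depends on are modeled.
 */
struct ram_block_sketch {
    const char *idstr;
    int fd;          /* >= 0 if the block is backed by a file/memfd */
    bool shared;     /* true for share=on, i.e. MAP_SHARED */
};

/*
 * Discarding a file-backed block means punching a hole into the file.
 * With MAP_SHARED that matches the mapping's semantics; with MAP_PRIVATE
 * the file may be a template that other mappings still rely on.
 */
static bool discard_is_dangerous(const struct ram_block_sketch *rb)
{
    return rb->fd >= 0 && !rb->shared;
}

/* Print the warning at most once per process. */
static void maybe_warn_discard(const struct ram_block_sketch *rb)
{
    static bool warned;

    if (!warned && discard_is_dangerous(rb)) {
        warned = true;
        fprintf(stderr,
                "warning: discarding RAM of '%s' in a private file mapping "
                "modifies the underlying file and may affect other users "
                "of that file\n", rb->idstr);
    }
}
```

Anonymous memory (fd < 0) is deliberately not flagged: MADV_DONTNEED alone already gives it the documented semantics.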
As documented, ram_block_discard_range() guarantees two things:
a) Reads return 0 after discarding succeeded
b) Postcopy works, because the next access triggers a fault
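Guarantee a) on a MAP_SHARED mapping can be checked with a minimal Linux sketch (function name hypothetical; this only mimics the punch-hole path, it is not QEMU's code): once FALLOC_FL_PUNCH_HOLE frees the backing page of the memfd, the next access through the shared mapping faults in a fresh zero-filled page.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/falloc.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns the first byte read back after discarding, or -1 on error. */
static int read_after_punch_hole(void)
{
    long psize = sysconf(_SC_PAGESIZE);
    int ret = -1;
    char *mem;
    int fd = memfd_create("discard-demo", 0);

    if (fd < 0)
        return -1;
    if (ftruncate(fd, psize) < 0)
        goto out_fd;
    mem = mmap(NULL, psize, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (mem == MAP_FAILED)
        goto out_fd;

    mem[0] = 'X';  /* populate the page with non-zero data */
    /* Free the page in the file; the kernel also zaps shared mappings. */
    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                  0, psize) == 0)
        ret = (unsigned char)mem[0];  /* refaults a zero-filled page */

    munmap(mem, psize);
out_fd:
    close(fd);
    return ret;
}
```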
If we simply dropped the FALLOC_FL_PUNCH_HOLE:
1) For hugetlb, only newer kernels support MADV_DONTNEED. So there is no
way to just discard in a private mapping here that works for kernels we
still care about.
2) free-page-reporting wants to read 0's when re-accessing discarded
memory. If there is still something there in the file, that won't work.
3) Regarding postcopy on MAP_PRIVATE shmem, I am not sure if it will
actually do what you want if the pagecache holds a page. Maybe it works,
but I am not so sure. Needs investigation.
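Point 2) can be reproduced with a similar sketch, assuming a memfd whose file content is non-zero, as with a VM template (function name hypothetical): MADV_DONTNEED on a MAP_PRIVATE mapping only drops the private copy-on-write page, so the next read maps the file's page cache again and returns the template byte instead of 0.

```c
#define _GNU_SOURCE
#include <sys/mman.h>
#include <unistd.h>

/* Returns the first byte read back after MADV_DONTNEED, or -1 on error. */
static int read_after_dontneed_private(void)
{
    long psize = sysconf(_SC_PAGESIZE);
    int ret = -1;
    char *mem;
    int fd = memfd_create("dontneed-demo", 0);

    if (fd < 0)
        return -1;
    /* Non-zero data in the file itself, like a VM template. */
    if (ftruncate(fd, psize) < 0 || pwrite(fd, "T", 1, 0) != 1)
        goto out_fd;
    mem = mmap(NULL, psize, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (mem == MAP_FAILED)
        goto out_fd;

    mem[0] = 'G';  /* guest write: COW into a private anonymous page */
    if (madvise(mem, psize, MADV_DONTNEED) == 0)
        ret = (unsigned char)mem[0];  /* refaults from the file: 'T' */

    munmap(mem, psize);
out_fd:
    close(fd);
    return ret;
}
```

The discarded page reads back as 'T' rather than 0, which is exactly what free-page-reporting cannot tolerate.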
>
> For now, a warning looks all sane.
>
>>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>
> Acked-by: Peter Xu <peterx@redhat.com>
Thanks!
--
Cheers,
David / dhildenb
Thread overview: 15+ messages
2023-06-20 13:03 [PATCH v1 0/4] virtio-mem: Support "x-ignore-shared" migration David Hildenbrand
2023-06-20 13:03 ` [PATCH v1 1/4] softmmu/physmem: Warn with ram_block_discard_range() on MAP_PRIVATE file mapping David Hildenbrand
2023-06-21 15:55 ` Peter Xu
2023-06-21 16:17 ` David Hildenbrand [this message]
2023-06-21 16:55 ` Peter Xu
2023-06-22 13:10 ` David Hildenbrand
2023-06-22 14:54 ` Peter Xu
2023-06-20 13:03 ` [PATCH v1 2/4] virtio-mem: Skip most of virtio_mem_unplug_all() without plugged memory David Hildenbrand
2023-06-20 13:03 ` [PATCH v1 3/4] migration/ram: Expose ramblock_is_ignored() as migrate_ram_is_ignored() David Hildenbrand
2023-06-21 15:56 ` Peter Xu
2023-06-20 13:03 ` [PATCH v1 4/4] virtio-mem: Support "x-ignore-shared" migration David Hildenbrand
2023-06-20 13:06 ` Michael S. Tsirkin
2023-06-20 13:40 ` David Hildenbrand
2023-07-06 5:59 ` [PATCH v1 0/4] " Mario Casquero
2023-07-06 7:19 ` David Hildenbrand