qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: David Hildenbrand <david@redhat.com>
To: William Roche <william.roche@oracle.com>,
	kvm@vger.kernel.org, qemu-devel@nongnu.org, qemu-arm@nongnu.org
Cc: peterx@redhat.com, pbonzini@redhat.com,
	richard.henderson@linaro.org, philmd@linaro.org,
	peter.maydell@linaro.org, mtosatti@redhat.com,
	joao.m.martins@oracle.com
Subject: Re: [PATCH v1 2/4] accel/kvm: Keep track of the HWPoisonPage page_size
Date: Mon, 4 Nov 2024 15:10:59 +0100	[thread overview]
Message-ID: <e9ffa8b7-0e2a-46fb-a2e7-af701024a7d2@redhat.com> (raw)
In-Reply-To: <ea82558c-eef1-4af6-bb90-5902f04ce06c@oracle.com>

> 
> Ok, I can remove the first patch that is created to track the kernel
> provided page size and pass it to the kvm_hwpoison_page_add() function,
> but we could deal with the page size at the kvm_hwpoison_page_add()
> function level as we don't rely on the kernel provided info, but just
> the RAMBlock page size.

Great!

>>> I see that ram_block_discard_range() adds more control before
>>> discarding the RAM region and can also call madvise() in addition to
>>> the fallocate punch hole for standard sized memory pages. Now as the
>>> range is supposed to be recreated, I'm not convinced that these
>>> madvise calls are necessary.
>>
>> They are the proper replacement for the mmap(MAP_FIXED) + fallocate.
>>
>> That function handles all cases of properly discarding guest RAM.
> 
> In the case of hugetlbfs pages, ram_block_discard_range() does the
> punch-hole fallocate call (and prints out the warning messages).
> The madvise call is only done when (rb->page_size ==
> qemu_real_host_page_size()) which isn't true for hugetlbfs.
> So need_madvise is false and neither QEMU_MADV_REMOVE nor
> QEMU_MADV_DONTNEED madvise calls is performed.

See my other mail regarding fallocte()+hugetlb oddities.

The warning is for MAP_PRIVATE mappings where we cannot be sure that we are
not doing harm to somebody else that is mapping the file :(

See

commit 1d44ff586f8a8e113379430750b5a0a2a3f64cf9
Author: David Hildenbrand <david@redhat.com>
Date:   Thu Jul 6 09:56:06 2023 +0200

     softmmu/physmem: Warn with ram_block_discard_range() on MAP_PRIVATE file mapping
     
     ram_block_discard_range() cannot possibly do the right thing in
     MAP_PRIVATE file mappings in the general case.
     
     To achieve the documented semantics, we also have to punch a hole into
     the file, possibly messing with other MAP_PRIVATE/MAP_SHARED mappings
     of such a file.
     
     For example, using VM templating -- see commit b17fbbe55cba ("migration:
     allow private destination ram with x-ignore-shared") -- in combination with
     any mechanism that relies on discarding of RAM is problematic. This
     includes:
     * Postcopy live migration
     * virtio-balloon inflation/deflation or free-page-reporting
     * virtio-mem
     
     So at least warn that there is something possibly dangerous is going on
     when using ram_block_discard_range() in these cases.

So the warning is the best we can do to say "this is possibly very
problematic, and it might be undesirable".

For hugetlb, users should switch to using memory-backend-memfd or
memory-backend-file,share=on.

> 
> 
>>
>>>
>>> But we can also notice that this function will report the following
>>> warning in all cases of not shared file backends:
>>> "ram_block_discard_range: Discarding RAM in private file mappings is
>>> possibly dangerous, because it will modify the underlying file and
>>> will affect other users of the file"
>>
>> Yes, because it's a clear warning sign that something weird is
>> happening. You might be throwing away data that some other process might
>> be relying on.
>>
>> How are you making QEMU consume hugetlbs?
> 
> A classical way to consume (not shared) hugetlbfs pages is done with the
> creation of a file that is opened, mmapped by the Qemu instance but we
> also delete the file system entry so that if the Qemu instance dies, the
> resources are released. This file is usually not shared.

Right, see above. We should be using memory-backend-file,share=on with that,
just like we would with shmem/tmpfs :(

The ugly bit is that the legacy "-mem-path" option translates to
"memory-backend-file,share=off", and we cannot easily change that.

That option really should not be used anymore.

> 
> 
>>
>> We could suppress these warnings, but let's first see how you are able
>> to trigger it.
> 
> The warning is always displayed when such a hugetlbfs VM impacted by a
> memory error is rebooted.
> I understand the reason why we have this message, but in the case of
> hugetlbfs classical use this (new) message on reboot is probably too
> worrying...  But loosing memory is already very worrying ;)

See above; we cannot easily identify "we map this file MAP_PRIVATE
but we are guaranteed to be the single user", so punching a hole in that
file might just corrupt data for another user (e.g., VM templating) without
any warning.

Again, we could suppress the warning, but not using MAP_PRIVATE with
a hugetlb file would be even better.

(hugetlb contains other hacks that make sure that MAP_PRIVATE on a file
won't result in a double memory consumption -- with shmem/tmpfs it would
result in a double memory consumption!)

Are the users you are aware of using "-mem-path" or "-object memory-backend-file"?

We might be able to change the default for the latter with a new QEMU version,
maybe ...


-- 
Cheers,

David / dhildenb



  reply	other threads:[~2024-11-04 14:12 UTC|newest]

Thread overview: 119+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-09-10  9:07 [RFC 0/6] hugetlbfs largepage RAS project “William Roche
2024-09-10  9:07 ` [RFC 1/6] accel/kvm: SIGBUS handler should also deal with si_addr_lsb “William Roche
2024-09-10  9:07 ` [RFC 2/6] accel/kvm: Keep track of the HWPoisonPage sizes “William Roche
2024-09-10  9:07 ` [RFC 3/6] system/physmem: Remap memory pages on reset based on the page size “William Roche
2024-09-10  9:07 ` [RFC 4/6] system: Introducing hugetlbfs largepage RAS feature “William Roche
2024-09-10  9:07 ` [RFC 5/6] system/hugetlb_ras: Handle madvise SIGBUS signal on listener “William Roche
2024-09-10  9:07 ` [RFC 6/6] system/hugetlb_ras: Replay lost BUS_MCEERR_AO signals on VM resume “William Roche
2024-09-10 10:02 ` [RFC RESEND 0/6] hugetlbfs largepage RAS project “William Roche
2024-09-10 10:02   ` [RFC RESEND 1/6] accel/kvm: SIGBUS handler should also deal with si_addr_lsb “William Roche
2024-09-10 10:02   ` [RFC RESEND 2/6] accel/kvm: Keep track of the HWPoisonPage sizes “William Roche
2024-09-10 10:02   ` [RFC RESEND 3/6] system/physmem: Remap memory pages on reset based on the page size “William Roche
2024-09-10 10:02   ` [RFC RESEND 4/6] system: Introducing hugetlbfs largepage RAS feature “William Roche
2024-09-10 10:02   ` [RFC RESEND 5/6] system/hugetlb_ras: Handle madvise SIGBUS signal on listener “William Roche
2024-09-10 10:02   ` [RFC RESEND 6/6] system/hugetlb_ras: Replay lost BUS_MCEERR_AO signals on VM resume “William Roche
2024-09-10 11:36   ` [RFC RESEND 0/6] hugetlbfs largepage RAS project David Hildenbrand
2024-09-10 16:24     ` William Roche
2024-09-11 22:07       ` David Hildenbrand
2024-09-12 17:07         ` William Roche
2024-09-19 16:52           ` William Roche
2024-10-09 15:45             ` Peter Xu
2024-10-10 20:35               ` William Roche
2024-10-22 21:34               ` [PATCH v1 0/4] hugetlbfs memory HW error fixes “William Roche
2024-10-22 21:35                 ` [PATCH v1 1/4] accel/kvm: SIGBUS handler should also deal with si_addr_lsb “William Roche
2024-10-22 21:35                 ` [PATCH v1 2/4] accel/kvm: Keep track of the HWPoisonPage page_size “William Roche
2024-10-23  7:28                   ` David Hildenbrand
2024-10-25 23:27                     ` William Roche
2024-10-28 16:42                       ` David Hildenbrand
2024-10-30  1:56                         ` William Roche
2024-11-04 14:10                           ` David Hildenbrand [this message]
2024-10-25 23:30                     ` William Roche
2024-10-22 21:35                 ` [PATCH v1 3/4] system/physmem: Largepage punch hole before reset of memory pages “William Roche
2024-10-23  7:30                   ` David Hildenbrand
2024-10-25 23:27                     ` William Roche
2024-10-28 17:01                       ` David Hildenbrand
2024-10-30  1:56                         ` William Roche
2024-11-04 13:30                           ` David Hildenbrand
2024-11-07 10:21                             ` [PATCH v2 0/7] hugetlbfs memory HW error fixes “William Roche
2024-11-07 10:21                               ` [PATCH v2 1/7] accel/kvm: Keep track of the HWPoisonPage page_size “William Roche
2024-11-12 10:30                                 ` David Hildenbrand
2024-11-12 18:17                                   ` William Roche
2024-11-12 21:35                                     ` David Hildenbrand
2024-11-07 10:21                               ` [PATCH v2 2/7] system/physmem: poisoned memory discard on reboot “William Roche
2024-11-12 11:07                                 ` David Hildenbrand
2024-11-12 18:17                                   ` William Roche
2024-11-12 22:06                                     ` David Hildenbrand
2024-11-07 10:21                               ` [PATCH v2 3/7] accel/kvm: Report the loss of a large memory page “William Roche
2024-11-12 11:13                                 ` David Hildenbrand
2024-11-12 18:17                                   ` William Roche
2024-11-12 22:22                                     ` David Hildenbrand
2024-11-15 21:03                                       ` William Roche
2024-11-18  9:45                                         ` David Hildenbrand
2024-11-07 10:21                               ` [PATCH v2 4/7] numa: Introduce and use ram_block_notify_remap() “William Roche
2024-11-07 10:21                               ` [PATCH v2 5/7] hostmem: Factor out applying settings “William Roche
2024-11-07 10:21                               ` [PATCH v2 6/7] hostmem: Handle remapping of RAM “William Roche
2024-11-12 13:45                                 ` David Hildenbrand
2024-11-12 18:17                                   ` William Roche
2024-11-12 22:24                                     ` David Hildenbrand
2024-11-07 10:21                               ` [PATCH v2 7/7] system/physmem: Memory settings applied on remap notification “William Roche
2024-10-22 21:35                 ` [PATCH v1 4/4] accel/kvm: Report the loss of a large memory page “William Roche
2024-10-28 16:32             ` [RFC RESEND 0/6] hugetlbfs largepage RAS project David Hildenbrand
2024-11-25 14:27         ` [PATCH v3 0/7] hugetlbfs memory HW error fixes “William Roche
2024-11-25 14:27           ` [PATCH v3 1/7] hwpoison_page_list and qemu_ram_remap are based of pages “William Roche
2024-11-25 14:27           ` [PATCH v3 2/7] system/physmem: poisoned memory discard on reboot “William Roche
2024-11-25 14:27           ` [PATCH v3 3/7] accel/kvm: Report the loss of a large memory page “William Roche
2024-11-25 14:27           ` [PATCH v3 4/7] numa: Introduce and use ram_block_notify_remap() “William Roche
2024-11-25 14:27           ` [PATCH v3 5/7] hostmem: Factor out applying settings “William Roche
2024-11-25 14:27           ` [PATCH v3 6/7] hostmem: Handle remapping of RAM “William Roche
2024-11-25 14:27           ` [PATCH v3 7/7] system/physmem: Memory settings applied on remap notification “William Roche
2024-12-02 15:41           ` [PATCH v3 0/7] hugetlbfs memory HW error fixes William Roche
2024-12-02 16:00             ` David Hildenbrand
2024-12-03  0:15               ` William Roche
2024-12-03 14:08                 ` David Hildenbrand
2024-12-03 14:39                   ` William Roche
2024-12-03 15:00                     ` David Hildenbrand
2024-12-06 18:26                       ` William Roche
2024-12-09 21:25                         ` David Hildenbrand
2024-12-14 13:45         ` [PATCH v4 0/7] Poisoned memory recovery on reboot “William Roche
2024-12-14 13:45           ` [PATCH v4 1/7] hwpoison_page_list and qemu_ram_remap are based on pages “William Roche
2025-01-08 21:34             ` David Hildenbrand
2025-01-10 20:56               ` William Roche
2025-01-14 13:56                 ` David Hildenbrand
2024-12-14 13:45           ` [PATCH v4 2/7] system/physmem: poisoned memory discard on reboot “William Roche
2025-01-08 21:44             ` David Hildenbrand
2025-01-10 20:56               ` William Roche
2025-01-14 14:00                 ` David Hildenbrand
2025-01-27 21:15                   ` William Roche
2024-12-14 13:45           ` [PATCH v4 3/7] accel/kvm: Report the loss of a large memory page “William Roche
2024-12-14 13:45           ` [PATCH v4 4/7] numa: Introduce and use ram_block_notify_remap() “William Roche
2024-12-14 13:45           ` [PATCH v4 5/7] hostmem: Factor out applying settings “William Roche
2025-01-08 21:58             ` David Hildenbrand
2025-01-10 20:56               ` William Roche
2024-12-14 13:45           ` [PATCH v4 6/7] hostmem: Handle remapping of RAM “William Roche
2025-01-08 21:51             ` [PATCH v4 6/7] c David Hildenbrand
2025-01-10 20:57               ` [PATCH v4 6/7] hostmem: Handle remapping of RAM William Roche
2024-12-14 13:45           ` [PATCH v4 7/7] system/physmem: Memory settings applied on remap notification “William Roche
2025-01-08 21:53             ` David Hildenbrand
2025-01-10 20:57               ` William Roche
2025-01-14 14:01                 ` David Hildenbrand
2025-01-08 21:22           ` [PATCH v4 0/7] Poisoned memory recovery on reboot David Hildenbrand
2025-01-10 20:55             ` William Roche
2025-01-10 21:13         ` [PATCH v5 0/6] " “William Roche
2025-01-10 21:14           ` [PATCH v5 1/6] system/physmem: handle hugetlb correctly in qemu_ram_remap() “William Roche
2025-01-14 14:02             ` David Hildenbrand
2025-01-27 21:16               ` William Roche
2025-01-28 18:41                 ` David Hildenbrand
2025-01-10 21:14           ` [PATCH v5 2/6] system/physmem: poisoned memory discard on reboot “William Roche
2025-01-14 14:07             ` David Hildenbrand
2025-01-27 21:16               ` William Roche
2025-01-10 21:14           ` [PATCH v5 3/6] accel/kvm: Report the loss of a large memory page “William Roche
2025-01-14 14:09             ` David Hildenbrand
2025-01-27 21:16               ` William Roche
2025-01-28 18:45                 ` David Hildenbrand
2025-01-10 21:14           ` [PATCH v5 4/6] numa: Introduce and use ram_block_notify_remap() “William Roche
2025-01-10 21:14           ` [PATCH v5 5/6] hostmem: Factor out applying settings “William Roche
2025-01-10 21:14           ` [PATCH v5 6/6] hostmem: Handle remapping of RAM “William Roche
2025-01-14 14:11             ` David Hildenbrand
2025-01-27 21:16               ` William Roche
2025-01-14 14:12           ` [PATCH v5 0/6] Poisoned memory recovery on reboot David Hildenbrand
2025-01-27 21:16             ` William Roche

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e9ffa8b7-0e2a-46fb-a2e7-af701024a7d2@redhat.com \
    --to=david@redhat.com \
    --cc=joao.m.martins@oracle.com \
    --cc=kvm@vger.kernel.org \
    --cc=mtosatti@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=peterx@redhat.com \
    --cc=philmd@linaro.org \
    --cc=qemu-arm@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=richard.henderson@linaro.org \
    --cc=william.roche@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).