qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: David Hildenbrand <david@redhat.com>
To: Gavin Shan <gshan@redhat.com>, qemu-arm@nongnu.org
Cc: peter.maydell@linaro.org, drjones@redhat.com,
	Michal Privoznik <mprivozn@redhat.com>,
	richard.henderson@linaro.org, qemu-devel@nongnu.org,
	eric.auger@redhat.com, shan.gavin@gmail.com,
	Jonathan.Cameron@huawei.com, imammedo@redhat.com
Subject: Re: [PATCH 1/1] hw/arm/virt: Support for virtio-mem-pci
Date: Wed, 1 Dec 2021 10:03:05 +0100	[thread overview]
Message-ID: <1ae3f686-aaa9-4402-433e-325f422275df@redhat.com> (raw)
In-Reply-To: <c44a9072-0f21-9321-1828-23aadea5fea3@redhat.com>

>>>
>>>     * It has been passing the tests with various combinations like 64KB
>>>       and 4KB page sizes on host and guest, different memory device
>>>       backends like normal, transparent huge page and HugeTLB, plus
>>>       migration.
>>
>> Perfect. A note that hugetlbfs isn't fully supported/safe to use until
>> we have preallocation support in QEMU (WIP).
>>
> 
> Yes, there is some warnings raised to enlarge 'request-size' on
> host with 64KB page size. Note that the memory backends I used
> in the testings always have "prealloc=on" property though.

1. prealloc=on

"prealloc=on" on the memory backend won't get the job done, because the first
thing virtio-mem does is discard all memory in the memory backend again when
it initializes. So it's an expensive NOP :) See

https://lkml.kernel.org/r/20211130104136.40927-9-david@redhat.com

for the virtio-mem "prealloc=on" option that preallocates memory when
exposing that memory to the VM.

To use huge pages in a safe way with virtio-mem, we need "reserve=off" on the
memory backend and "prealloc=on" on the virtio-mem device. (I'm in the process
of documenting that on virtio-mem.gitlab.io/ to make it clearer)


2. Warning on arm64 with 64k

I assume the warning you're seeing is regarding the block-size:

"Read unsupported THP size: ..." followed by
"Could not detect THP size, falling back to ..."

The right thing to do for now should be to remove that sanity check:

diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index d5a578142b..33c32afeb1 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -78,11 +78,8 @@ static uint32_t virtio_mem_thp_size(void)
     if (g_file_get_contents(HPAGE_PMD_SIZE_PATH, &content, NULL, NULL) &&
         !qemu_strtou64(content, &endptr, 0, &tmp) &&
         (!endptr || *endptr == '\n')) {
-        /*
-         * Sanity-check the value, if it's too big (e.g., aarch64 with 64k base
-         * pages) or weird, fallback to something smaller.
-         */
-        if (!tmp || !is_power_of_2(tmp) || tmp > 16 * MiB) {
+        /* Sanity-check the value and fallback to something reasonable.  */
+        if (!tmp || !is_power_of_2(tmp)) {
             warn_report("Read unsupported THP size: %" PRIx64, tmp);
         } else {
             thp_size = tmp;

This will not affect setups we care about ( x86-64 KVM ).

It will imply that with a arm64 64k host, we can only hot(un)plug in
512 MiB granularity when not using hugetlb witht he default block-size.
However, that is already the case with arm64 64k guests as well.
The suggestion will be to use arm64 4k with virtio-mem in the host and
the guest for increased flexibility -- fortunately most distros
already have performed the switch to 4k on arm64, so we don't really
care IMHO.

To support block_size < THP size when not using hugetlb,
we'll have to disable THP (via MADV_NOHUGEPAGE) permanently for the memory
region, also making sure that e.g., postcopy won't re-enable it by adding
a proper flag (RAM_NOHUGEPAGE) to the RAMBlock. Because the issue is that
once the guest touches some memory, we might populate a THP that would cover
plugged and unplugged memory, which is bad.

Instead of warning in virtio_mem_device_realize() when
	vmem->block_size < virtio_mem_default_block_size(rb)
we'd have to disable THP.


Further, we should fixup the default THP size on arm64 in case we're
running on an older kernel where we cannot sense the THP size:


diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index d5a578142b..371cee380a 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -38,13 +38,21 @@
  */
 #define VIRTIO_MEM_MIN_BLOCK_SIZE ((uint32_t)(1 * MiB))
 
+static uint32_t virtio_mem_default_thp_size(void)
+{
+#if defined(__aarch64__)
+    if (qemu_real_host_page_size == 64 * KiB) {
+        return 512 * MiB;
+    }
+#endif
 #if defined(__x86_64__) || defined(__arm__) || defined(__aarch64__) || \
     defined(__powerpc64__)
-#define VIRTIO_MEM_DEFAULT_THP_SIZE ((uint32_t)(2 * MiB))
+    return 2 * MiB;
 #else
-        /* fallback to 1 MiB (e.g., the THP size on s390x) */
-#define VIRTIO_MEM_DEFAULT_THP_SIZE VIRTIO_MEM_MIN_BLOCK_SIZE
+    /* fallback to 1 MiB (e.g., the THP size on s390x) */
+    return VIRTIO_MEM_MIN_BLOCK_SIZE;
 #endif
+}
 
 /*
  * We want to have a reasonable default block size such that
@@ -90,7 +98,7 @@ static uint32_t virtio_mem_thp_size(void)
     }
 
     if (!thp_size) {
-        thp_size = VIRTIO_MEM_DEFAULT_THP_SIZE;
+        thp_size = virtio_mem_default_thp_size();
         warn_report("Could not detect THP size, falling back to %" PRIx64
                     "  MiB.", thp_size / MiB);
     }



In the context of proper arm64 support, adding the above two changes
should be good enough. If you agree, can you include them in your v2
series as a separate patch?

Supporting block_size < thp_size when not using hugetlb is a different
work IMHO, if someone ever cares about that.



Thanks!

-- 
Thanks,

David / dhildenb



  reply	other threads:[~2021-12-01  9:04 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-30  0:33 [PATCH 0/1] hw/arm/virt: Support for virtio-mem-pci Gavin Shan
2021-11-30  0:33 ` [PATCH 1/1] " Gavin Shan
2021-11-30  9:37   ` David Hildenbrand
2021-12-01  5:09     ` Gavin Shan
2021-12-01  9:03       ` David Hildenbrand [this message]
2021-12-03  3:42         ` Gavin Shan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1ae3f686-aaa9-4402-433e-325f422275df@redhat.com \
    --to=david@redhat.com \
    --cc=Jonathan.Cameron@huawei.com \
    --cc=drjones@redhat.com \
    --cc=eric.auger@redhat.com \
    --cc=gshan@redhat.com \
    --cc=imammedo@redhat.com \
    --cc=mprivozn@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=qemu-arm@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=richard.henderson@linaro.org \
    --cc=shan.gavin@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).