qemu-devel.nongnu.org archive mirror
From: Xiaoyao Li <xiaoyao.li@intel.com>
To: "Paolo Bonzini" <pbonzini@redhat.com>,
	"David Hildenbrand" <david@redhat.com>,
	"Igor Mammedov" <imammedo@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	"Marcel Apfelbaum" <marcel.apfelbaum@gmail.com>,
	"Richard Henderson" <richard.henderson@linaro.org>,
	"Peter Xu" <peterx@redhat.com>,
	"Philippe Mathieu-Daudé" <philmd@linaro.org>,
	"Cornelia Huck" <cohuck@redhat.com>,
	"Daniel P. Berrangé" <berrange@redhat.com>,
	"Eric Blake" <eblake@redhat.com>,
	"Markus Armbruster" <armbru@redhat.com>,
	"Marcelo Tosatti" <mtosatti@redhat.com>
Cc: qemu-devel@nongnu.org, kvm@vger.kernel.org, xiaoyao.li@intel.com,
	Michael Roth <michael.roth@amd.com>,
	isaku.yamahata@gmail.com, Sean Christopherson <seanjc@google.com>,
	Claudio Fontana <cfontana@suse.de>
Subject: [RFC PATCH v2 15/21] physmem: extract ram_block_discard_range_fd() from ram_block_discard_range()
Date: Wed, 13 Sep 2023 23:51:11 -0400
Message-ID: <20230914035117.3285885-16-xiaoyao.li@intel.com>
In-Reply-To: <20230914035117.3285885-1-xiaoyao.li@intel.com>

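Extract the actual discard logic (fallocate() hole punching and
madvise()) from ram_block_discard_range() into a separate function,
ram_block_discard_range_fd(), which takes an explicit fd as an input
parameter. The alignment and sanity checks stay in
ram_block_discard_range(), which now calls the new function with rb->fd.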

A later patch will use ram_block_discard_range_fd() to discard private
memory ranges from the gmem fd. Private <-> shared memory conversion
only requires 4KB alignment instead of RAMBlock.page_size alignment, so
the conversion path needs to skip the RAMBlock.page_size checks done in
ram_block_discard_range().

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
---
 softmmu/physmem.c | 192 ++++++++++++++++++++++++----------------------
 1 file changed, 100 insertions(+), 92 deletions(-)

diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index 34d580ec0d39..6ee6bc794f44 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -3425,117 +3425,125 @@ int qemu_ram_foreach_block(RAMBlockIterFunc func, void *opaque)
     return ret;
 }
 
+static int ram_block_discard_range_fd(RAMBlock *rb, uint64_t start,
+                                      size_t length, int fd)
+{
+    uint8_t *host_startaddr = rb->host + start;
+    bool need_madvise, need_fallocate;
+    int ret = -1;
+
+    errno = ENOTSUP; /* If we are missing MADVISE etc */
+
+    /* The logic here is messy;
+     *    madvise DONTNEED fails for hugepages
+     *    fallocate works on hugepages and shmem
+     *    shared anonymous memory requires madvise REMOVE
+     */
+    need_madvise = (rb->page_size == qemu_host_page_size) && (rb->fd == fd);
+    need_fallocate = fd != -1;
+
+    if (need_fallocate) {
+        /* For a file, this causes the area of the file to be zero'd
+         * if read, and for hugetlbfs also causes it to be unmapped
+         * so a userfault will trigger.
+         */
+#ifdef CONFIG_FALLOCATE_PUNCH_HOLE
+        /*
+         * We'll discard data from the actual file, even though we only
+         * have a MAP_PRIVATE mapping, possibly messing with other
+         * MAP_PRIVATE/MAP_SHARED mappings. There is no easy way to
+         * change that behavior without violating the promised
+         * semantics of ram_block_discard_range().
+         *
+         * Only warn, because it works as long as nobody else uses that
+         * file.
+         */
+        if (!qemu_ram_is_shared(rb)) {
+            warn_report_once("%s: Discarding RAM"
+                             " in private file mappings is possibly"
+                             " dangerous, because it will modify the"
+                             " underlying file and will affect other"
+                             " users of the file", __func__);
+        }
+
+        ret = fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
+                        start, length);
+        if (ret) {
+            ret = -errno;
+            error_report("%s: Failed to fallocate %s:%" PRIx64 " +%zx (%d)",
+                         __func__, rb->idstr, start, length, ret);
+            return ret;
+        }
+#else
+        ret = -ENOSYS;
+        error_report("%s: fallocate not available/file "
+                     "%s:%" PRIx64 " +%zx (%d)",
+                     __func__, rb->idstr, start, length, ret);
+        return ret;
+#endif
+    }
+
+    if (need_madvise) {
+        /* For normal RAM this causes it to be unmapped,
+         * for shared memory it causes the local mapping to disappear
+         * and to fall back on the file contents (which we just
+         * fallocate'd away).
+         */
+#if defined(CONFIG_MADVISE)
+        if (qemu_ram_is_shared(rb) && fd < 0) {
+            ret = madvise(host_startaddr, length, QEMU_MADV_REMOVE);
+        } else {
+            ret = madvise(host_startaddr, length, QEMU_MADV_DONTNEED);
+        }
+        if (ret) {
+            ret = -errno;
+            error_report("%s: Failed to discard range %s:%" PRIx64 " +%zx (%d)",
+                         __func__, rb->idstr, start, length, ret);
+            return ret;
+        }
+#else
+        ret = -ENOSYS;
+        error_report("%s: MADVISE not available %s:%" PRIx64 " +%zx (%d)",
+                     __func__, rb->idstr, start, length, ret);
+        return ret;
+#endif
+    }
+
+    trace_ram_block_discard_range(rb->idstr, host_startaddr, length,
+                                  need_madvise, need_fallocate, ret);
+    return ret;
+}
+
 /*
  * Unmap pages of memory from start to start+length such that
  * they a) read as 0, b) Trigger whatever fault mechanism
  * the OS provides for postcopy.
+ *
  * The pages must be unmapped by the end of the function.
- * Returns: 0 on success, none-0 on failure
- *
+ * Returns: 0 on success, non-zero on failure.
  */
 int ram_block_discard_range(RAMBlock *rb, uint64_t start, size_t length)
 {
-    int ret = -1;
-
     uint8_t *host_startaddr = rb->host + start;
 
     if (!QEMU_PTR_IS_ALIGNED(host_startaddr, rb->page_size)) {
         error_report("%s: Unaligned start address: %p",
                      __func__, host_startaddr);
-        goto err;
+        return -1;
     }
 
-    if ((start + length) <= rb->max_length) {
-        bool need_madvise, need_fallocate;
-        if (!QEMU_IS_ALIGNED(length, rb->page_size)) {
-            error_report("%s: Unaligned length: %zx", __func__, length);
-            goto err;
-        }
-
-        errno = ENOTSUP; /* If we are missing MADVISE etc */
-
-        /* The logic here is messy;
-         *    madvise DONTNEED fails for hugepages
-         *    fallocate works on hugepages and shmem
-         *    shared anonymous memory requires madvise REMOVE
-         */
-        need_madvise = (rb->page_size == qemu_host_page_size);
-        need_fallocate = rb->fd != -1;
-        if (need_fallocate) {
-            /* For a file, this causes the area of the file to be zero'd
-             * if read, and for hugetlbfs also causes it to be unmapped
-             * so a userfault will trigger.
-             */
-#ifdef CONFIG_FALLOCATE_PUNCH_HOLE
-            /*
-             * We'll discard data from the actual file, even though we only
-             * have a MAP_PRIVATE mapping, possibly messing with other
-             * MAP_PRIVATE/MAP_SHARED mappings. There is no easy way to
-             * change that behavior whithout violating the promised
-             * semantics of ram_block_discard_range().
-             *
-             * Only warn, because it works as long as nobody else uses that
-             * file.
-             */
-            if (!qemu_ram_is_shared(rb)) {
-                warn_report_once("%s: Discarding RAM"
-                                 " in private file mappings is possibly"
-                                 " dangerous, because it will modify the"
-                                 " underlying file and will affect other"
-                                 " users of the file", __func__);
-            }
-
-            ret = fallocate(rb->fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
-                            start, length);
-            if (ret) {
-                ret = -errno;
-                error_report("%s: Failed to fallocate %s:%" PRIx64 " +%zx (%d)",
-                             __func__, rb->idstr, start, length, ret);
-                goto err;
-            }
-#else
-            ret = -ENOSYS;
-            error_report("%s: fallocate not available/file"
-                         "%s:%" PRIx64 " +%zx (%d)",
-                         __func__, rb->idstr, start, length, ret);
-            goto err;
-#endif
-        }
-        if (need_madvise) {
-            /* For normal RAM this causes it to be unmapped,
-             * for shared memory it causes the local mapping to disappear
-             * and to fall back on the file contents (which we just
-             * fallocate'd away).
-             */
-#if defined(CONFIG_MADVISE)
-            if (qemu_ram_is_shared(rb) && rb->fd < 0) {
-                ret = madvise(host_startaddr, length, QEMU_MADV_REMOVE);
-            } else {
-                ret = madvise(host_startaddr, length, QEMU_MADV_DONTNEED);
-            }
-            if (ret) {
-                ret = -errno;
-                error_report("%s: Failed to discard range "
-                             "%s:%" PRIx64 " +%zx (%d)",
-                             __func__, rb->idstr, start, length, ret);
-                goto err;
-            }
-#else
-            ret = -ENOSYS;
-            error_report("%s: MADVISE not available %s:%" PRIx64 " +%zx (%d)",
-                         __func__, rb->idstr, start, length, ret);
-            goto err;
-#endif
-        }
-        trace_ram_block_discard_range(rb->idstr, host_startaddr, length,
-                                      need_madvise, need_fallocate, ret);
-    } else {
+    if ((start + length) > rb->max_length) {
         error_report("%s: Overrun block '%s' (%" PRIu64 "/%zx/" RAM_ADDR_FMT")",
                      __func__, rb->idstr, start, length, rb->max_length);
+        return -1;
     }
 
-err:
-    return ret;
+    if (!QEMU_IS_ALIGNED(length, rb->page_size)) {
+        error_report("%s: Unaligned length: %zx", __func__, length);
+        return -1;
+    }
+
+    return ram_block_discard_range_fd(rb, start, length, rb->fd);
 }
 
 bool ramblock_is_pmem(RAMBlock *rb)
-- 
2.34.1




Thread overview: 52+ messages
2023-09-14  3:50 [RFC PATCH v2 00/21] QEMU gmem implemention Xiaoyao Li
2023-09-14  3:50 ` [RFC PATCH v2 01/21] *** HACK *** linux-headers: Update headers to pull in gmem APIs Xiaoyao Li
2023-09-14  3:50 ` [RFC PATCH v2 02/21] RAMBlock: Add support of KVM private gmem Xiaoyao Li
2023-09-15  2:04   ` Wang, Lei
2023-09-15  3:45     ` Xiaoyao Li
2023-09-21  8:55   ` David Hildenbrand
2023-09-22  0:22     ` Xiaoyao Li
2023-09-22  7:08       ` David Hildenbrand
2023-10-08  2:59         ` Xiaoyao Li
2023-10-06 11:07   ` Daniel P. Berrangé
2023-09-14  3:50 ` [RFC PATCH v2 03/21] HostMem: Add private property and associate it with RAM_KVM_GMEM Xiaoyao Li
2023-09-19  9:46   ` Markus Armbruster
2023-09-19 23:24     ` Xiaoyao Li
2023-09-20  7:30       ` Markus Armbruster
2023-09-20 14:35         ` Xiaoyao Li
2023-09-20 14:37           ` David Hildenbrand
2023-09-20 15:42             ` Markus Armbruster
2023-09-21  8:38               ` Xiaoyao Li
2023-09-21  8:45                 ` David Hildenbrand
2023-09-14  3:51 ` [RFC PATCH v2 04/21] memory: Introduce memory_region_has_gmem_fd() Xiaoyao Li
2023-09-21  8:46   ` David Hildenbrand
2023-09-22  0:22     ` Xiaoyao Li
2023-09-14  3:51 ` [RFC PATCH v2 05/21] kvm: Enable KVM_SET_USER_MEMORY_REGION2 for memslot Xiaoyao Li
2023-09-21  8:56   ` David Hildenbrand
2023-09-22  0:23     ` Xiaoyao Li
2023-09-14  3:51 ` [RFC PATCH v2 06/21] i386: Add support for sw-protected-vm object Xiaoyao Li
2023-09-14  3:51 ` [RFC PATCH v2 07/21] i386/pc: Drop pc_machine_kvm_type() Xiaoyao Li
2023-09-21  8:51   ` David Hildenbrand
2023-09-22  0:24     ` Xiaoyao Li
2023-09-22  7:11       ` David Hildenbrand
2023-09-23  7:32   ` David Woodhouse
2023-09-14  3:51 ` [RFC PATCH v2 08/21] target/i386: Implement mc->kvm_type() to get VM type Xiaoyao Li
2023-09-14  3:51 ` [RFC PATCH v2 09/21] target/i386: Introduce kvm_confidential_guest_init() Xiaoyao Li
2023-09-14  3:51 ` [RFC PATCH v2 10/21] i386/kvm: Implement kvm_sw_protected_vm_init() for sw-protcted-vm specific functions Xiaoyao Li
2023-09-14  3:51 ` [RFC PATCH v2 11/21] kvm: Introduce support for memory_attributes Xiaoyao Li
2023-09-14  3:51 ` [RFC PATCH v2 12/21] kvm/memory: Introduce the infrastructure to set the default shared/private value Xiaoyao Li
2023-09-14  3:51 ` [RFC PATCH v2 13/21] i386/kvm: Set memory to default private for KVM_X86_SW_PROTECTED_VM Xiaoyao Li
2023-09-14  3:51 ` [RFC PATCH v2 14/21] physmem: replace function name with __func__ in ram_block_discard_range() Xiaoyao Li
2023-09-14  3:51 ` Xiaoyao Li [this message]
2023-09-14  3:51 ` [RFC PATCH v2 16/21] physmem: Introduce ram_block_convert_range() Xiaoyao Li
2023-09-14  3:51 ` [RFC PATCH v2 17/21] kvm: handle KVM_EXIT_MEMORY_FAULT Xiaoyao Li
2023-09-14  3:51 ` [RFC PATCH v2 18/21] trace/kvm: Add trace for page convertion between shared and private Xiaoyao Li
2023-09-14  3:51 ` [RFC PATCH v2 19/21] pci-host/q35: Move PAM initialization above SMRAM initialization Xiaoyao Li
2023-09-14  3:51 ` [RFC PATCH v2 20/21] q35: Introduce smm_ranges property for q35-pci-host Xiaoyao Li
2023-09-14  3:51 ` [RFC PATCH v2 21/21] i386: Disable SMM mode for X86_SW_PROTECTED_VM Xiaoyao Li
2023-09-14 13:09 ` [RFC PATCH v2 00/21] QEMU gmem implemention David Hildenbrand
2023-09-15  1:10   ` Sean Christopherson
2023-09-21  9:11     ` David Hildenbrand
2023-09-22  7:03       ` Xiaoyao Li
2023-09-22  7:10         ` David Hildenbrand
2023-09-15  3:37   ` Xiaoyao Li
2023-09-21  8:59     ` David Hildenbrand
