From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([208.118.235.92]:52369)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <borntraeger@de.ibm.com>) id 1SeP94-0004G0-S7
	for qemu-devel@nongnu.org; Tue, 12 Jun 2012 07:20:24 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <borntraeger@de.ibm.com>) id 1SeP8y-0006fE-RT
	for qemu-devel@nongnu.org; Tue, 12 Jun 2012 07:20:22 -0400
Received: from e06smtp12.uk.ibm.com ([195.75.94.108]:59505)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <borntraeger@de.ibm.com>) id 1SeP8y-0006eX-Fl
	for qemu-devel@nongnu.org; Tue, 12 Jun 2012 07:20:16 -0400
Received: from /spool/local
	by e06smtp12.uk.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use
	Only! Violators will be prosecuted
	for <qemu-devel@nongnu.org> from <borntraeger@de.ibm.com>;
	Tue, 12 Jun 2012 12:20:13 +0100
Received: from d06av11.portsmouth.uk.ibm.com (d06av11.portsmouth.uk.ibm.com
	[9.149.37.252])
	by d06nrmr1307.portsmouth.uk.ibm.com (8.13.8/8.13.8/NCO v10.0) with
	ESMTP id q5CBKBjB2650338
	for <qemu-devel@nongnu.org>; Tue, 12 Jun 2012 12:20:11 +0100
Received: from d06av11.portsmouth.uk.ibm.com (loopback [127.0.0.1])
	by d06av11.portsmouth.uk.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with
	ESMTP id q5CBKA5B004587
	for <qemu-devel@nongnu.org>; Tue, 12 Jun 2012 05:20:11 -0600
Message-ID: <4FD725EB.7050501@de.ibm.com>
Date: Tue, 12 Jun 2012 13:20:11 +0200
From: Christian Borntraeger <borntraeger@de.ibm.com>
MIME-Version: 1.0
References: <1338984323-21914-1-git-send-email-jfrei@de.ibm.com>
	<1338984323-21914-3-git-send-email-jfrei@de.ibm.com>
	<4FD70CC0.7000901@suse.de>
In-Reply-To: <4FD70CC0.7000901@suse.de>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH 2/8] s390: autodetect map private
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Alexander Graf <agraf@suse.de>
Cc: Jens Freimann <jfrei@de.ibm.com>, Cornelia Huck <cornelia.huck@de.ibm.com>, Jens Freimann <jfrei@linux.vnet.ibm.com>, Heinz Graalfs <graalfs@linux.vnet.ibm.com>, qemu-devel <qemu-devel@nongnu.org>

> Is there any way we can move the above code to target-s390x? Having the
> branch below is already invasive enough for generic code, but we really
> don't need all the special s390 quirks to live here.


Hmm, we have to have a special hook somehow.
What about this approach?

-----------------------

By default qemu will use MAP_PRIVATE for guest pages. This will write
protect pages and thus break on s390 systems that dont support this feature.
Therefore qemu has a hack to always use MAP_SHARED for s390. But MAP_SHARED
has other problems (no dirty pages tracking, a lot more swap overhead etc.)
Newer systems allow the distinction via KVM_CAP_S390_COW. With this feature
qemu can use the standard qemu alloc if available, otherwise it will use
the old s390 hack.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 exec.c             |   19 ++++---------------
 kvm.h              |    1 +
 oslib-posix.c      |    3 +++
 target-s390x/kvm.c |   34 ++++++++++++++++++++++++++++++++++
 4 files changed, 42 insertions(+), 15 deletions(-)

diff --git a/exec.c b/exec.c
index a587e7a..9b9b8e1 100644
--- a/exec.c
+++ b/exec.c
@@ -2645,26 +2645,15 @@ ram_addr_t qemu_ram_alloc_from_ptr(ram_addr_t size, void *host,
             exit(1);
 #endif
         } else {
-#if defined(TARGET_S390X) && defined(CONFIG_KVM)
-            /* S390 KVM requires the topmost vma of the RAM to be smaller than
-               an system defined value, which is at least 256GB. Larger systems
-               have larger values. We put the guest between the end of data
-               segment (system break) and this value. We use 32GB as a base to
-               have enough room for the system break to grow. */
-            new_block->host = mmap((void*)0x800000000, size,
-                                   PROT_EXEC|PROT_READ|PROT_WRITE,
-                                   MAP_SHARED | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
-            if (new_block->host == MAP_FAILED) {
-                fprintf(stderr, "Allocating RAM failed\n");
-                abort();
-            }
-#else
             if (xen_enabled()) {
                 xen_ram_alloc(new_block->offset, size, mr);
+#if defined(TARGET_S390X)
+            } else if (kvm_enabled()) {
+                new_block->host = kvm_arch_alloc(size);
+#endif
             } else {
                 new_block->host = qemu_vmalloc(size);
             }
-#endif
             qemu_madvise(new_block->host, size, QEMU_MADV_MERGEABLE);
         }
     }
diff --git a/kvm.h b/kvm.h
index 9c7b0ea..9d50016 100644
--- a/kvm.h
+++ b/kvm.h
@@ -102,6 +102,7 @@ int kvm_vcpu_ioctl(CPUArchState *env, int type, ...);
 
 extern const KVMCapabilityInfo kvm_arch_required_capabilities[];
 
+void *kvm_arch_alloc(ram_addr_t size);
 void kvm_arch_pre_run(CPUArchState *env, struct kvm_run *run);
 void kvm_arch_post_run(CPUArchState *env, struct kvm_run *run);
 
diff --git a/oslib-posix.c b/oslib-posix.c
index b6a3c7f..93902ac 100644
--- a/oslib-posix.c
+++ b/oslib-posix.c
@@ -41,6 +41,9 @@ extern int daemon(int, int);
       therefore we need special code which handles running on Valgrind. */
 #  define QEMU_VMALLOC_ALIGN (512 * 4096)
 #  define CONFIG_VALGRIND
+#elif defined(__linux__) && defined(__s390x__)
+   /* Use 1 MiB (segment size) alignment so gmap can be used by KVM. */
+#  define QEMU_VMALLOC_ALIGN (256 * 4096)
 #else
 #  define QEMU_VMALLOC_ALIGN getpagesize()
 #endif
diff --git a/target-s390x/kvm.c b/target-s390x/kvm.c
index 90aad61..ccf5daa 100644
--- a/target-s390x/kvm.c
+++ b/target-s390x/kvm.c
@@ -135,6 +135,40 @@ int kvm_arch_get_registers(CPUS390XState *env)
     return 0;
 }
 
+/*
+ * Legacy layout for s390:
+ * Older S390 KVM requires the topmost vma of the RAM to be
+ * smaller than an system defined value, which is at least 256GB.
+ * Larger systems have larger values. We put the guest between
+ * the end of data segment (system break) and this value. We
+ * use 32GB as a base to have enough room for the system break
+ * to grow. We also have to use MAP parameters that avoid
+ * read-only mapping of guest pages.
+ */
+static void *legacy_s390_alloc(ram_addr_t size)
+{
+    void *mem;
+
+    mem = mmap((void *) 0x800000000ULL, size,
+               PROT_EXEC|PROT_READ|PROT_WRITE,
+               MAP_SHARED | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
+    if (mem == MAP_FAILED) {
+        fprintf(stderr, "Allocating RAM failed\n");
+        abort();
+    }
+    return mem;
+}
+
+void *kvm_arch_alloc(ram_addr_t size)
+{
+    if (kvm_check_extension(kvm_state, KVM_CAP_S390_GMAP) &&
+        kvm_check_extension(kvm_state, KVM_CAP_S390_COW)) {
+        return qemu_vmalloc(size);
+    } else {
+        return legacy_s390_alloc(size);
+    }
+}
+
 int kvm_arch_insert_sw_breakpoint(CPUS390XState *env, struct kvm_sw_breakpoint *bp)
 {
     static const uint8_t diag_501[] = {0x83, 0x24, 0x05, 0x01};