qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: "Michael S. Tsirkin" <mst@redhat.com>
To: qemu-devel@nongnu.org
Cc: "Peter Maydell" <peter.maydell@linaro.org>,
	"Jason A. Donenfeld" <Jason@zx2c4.com>,
	x86@kernel.org, "Philippe Mathieu-Daudé" <philmd@linaro.org>,
	"H . Peter Anvin" <hpa@zytor.com>,
	"Borislav Petkov" <bp@alien8.de>,
	"Eric Biggers" <ebiggers@kernel.org>,
	"Eric Biggers" <ebiggers@google.com>,
	"Mathias Krause" <minipli@grsecurity.net>,
	"Sergio Lopez" <slp@redhat.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Richard Henderson" <richard.henderson@linaro.org>,
	"Eduardo Habkost" <eduardo@habkost.net>,
	"Marcel Apfelbaum" <marcel.apfelbaum@gmail.com>,
	"Gerd Hoffmann" <kraxel@redhat.com>
Subject: [PULL 10/56] x86: don't let decompressed kernel image clobber setup_data
Date: Mon, 30 Jan 2023 15:19:59 -0500	[thread overview]
Message-ID: <20230130201810.11518-11-mst@redhat.com> (raw)
In-Reply-To: <20230130201810.11518-1-mst@redhat.com>

From: "Jason A. Donenfeld" <Jason@zx2c4.com>

The setup_data links are appended to the compressed kernel image. Since
the kernel image is typically loaded at 0x100000, setup_data lives at
`0x100000 + compressed_size`, which does not get relocated during the
kernel's boot process.

The kernel typically decompresses the image starting at address
0x1000000 (note: there's one more zero there than the compressed image
above). This usually is fine for most kernels.

However, if the compressed image is actually quite large, then
setup_data will live at a `0x100000 + compressed_size` that extends into
the decompressed zone at 0x1000000. In other words, if compressed_size
is larger than `0x1000000 - 0x100000`, then the decompression step will
clobber setup_data, resulting in crashes.

Visually, what happens now is that QEMU appends setup_data to the kernel
image:

          kernel image            setup_data
   |--------------------------||----------------|
0x100000                  0x100000+l1     0x100000+l1+l2

The problem is that this decompresses to 0x1000000 (one more zero). So
if l1 is > (0x1000000-0x100000), then this winds up looking like:

          kernel image            setup_data
   |--------------------------||----------------|
0x100000                  0x100000+l1     0x100000+l1+l2

                                 d e c o m p r e s s e d   k e r n e l
                     |-------------------------------------------------------------|
                0x1000000                                                     0x1000000+l3

The decompressed kernel seemingly overwriting the compressed kernel
image isn't a problem, because that gets relocated to a higher address
early on in the boot process, at the end of startup_64. setup_data,
however, stays in the same place, since those links are self referential
and nothing fixes them up.  So the decompressed kernel clobbers it.

Fix this by appending setup_data to the cmdline blob rather than the
kernel image blob, which remains at a lower address that won't get
clobbered.

This could have been done by overwriting the initrd blob instead, but
that poses big difficulties, such as no longer being able to use memory
mapped files for initrd, hurting performance, and, more importantly, the
initrd address calculation is hard coded in qboot, and it always grows
down rather than up, which means lots of brittle semantics would have to
be changed around, incurring more complexity. In contrast, using cmdline
is simple and doesn't interfere with anything.

The microvm machine has a gross hack where it fiddles with fw_cfg data
after the fact. So this hack is updated to account for this appending,
by reserving some bytes.

Fixup-by: Michael S. Tsirkin <mst@redhat.com>
Cc: x86@kernel.org
Cc: Philippe Mathieu-Daudé <philmd@linaro.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Message-Id: <20221230220725.618763-1-Jason@zx2c4.com>
Message-ID: <20230128061015-mutt-send-email-mst@kernel.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Eric Biggers <ebiggers@google.com>
Tested-by: Mathias Krause <minipli@grsecurity.net>
---
 include/hw/i386/microvm.h |  5 ++--
 include/hw/nvram/fw_cfg.h |  9 +++++++
 hw/i386/microvm.c         | 15 +++++++----
 hw/i386/x86.c             | 52 +++++++++++++++++++++------------------
 hw/nvram/fw_cfg.c         |  9 +++++++
 5 files changed, 59 insertions(+), 31 deletions(-)

diff --git a/include/hw/i386/microvm.h b/include/hw/i386/microvm.h
index fad97a891d..e8af61f194 100644
--- a/include/hw/i386/microvm.h
+++ b/include/hw/i386/microvm.h
@@ -50,8 +50,9 @@
  */
 
 /* Platform virtio definitions */
-#define VIRTIO_MMIO_BASE      0xfeb00000
-#define VIRTIO_CMDLINE_MAXLEN 64
+#define VIRTIO_MMIO_BASE                0xfeb00000
+#define VIRTIO_CMDLINE_MAXLEN           64
+#define VIRTIO_CMDLINE_TOTAL_MAX_LEN    ((VIRTIO_CMDLINE_MAXLEN + 1) * 16)
 
 #define GED_MMIO_BASE         0xfea00000
 #define GED_MMIO_BASE_MEMHP   (GED_MMIO_BASE + 0x100)
diff --git a/include/hw/nvram/fw_cfg.h b/include/hw/nvram/fw_cfg.h
index 2e503904dc..990dcdbb2e 100644
--- a/include/hw/nvram/fw_cfg.h
+++ b/include/hw/nvram/fw_cfg.h
@@ -139,6 +139,15 @@ void fw_cfg_add_bytes_callback(FWCfgState *s, uint16_t key,
                                void *data, size_t len,
                                bool read_only);
 
+/**
+ * fw_cfg_read_bytes_ptr:
+ * @s: fw_cfg device being modified
+ * @key: selector key value for new fw_cfg item
+ *
+ * Reads an existing fw_cfg data pointer.
+ */
+void *fw_cfg_read_bytes_ptr(FWCfgState *s, uint16_t key);
+
 /**
  * fw_cfg_add_string:
  * @s: fw_cfg device being modified
diff --git a/hw/i386/microvm.c b/hw/i386/microvm.c
index 170a331e3f..29f30dd6d3 100644
--- a/hw/i386/microvm.c
+++ b/hw/i386/microvm.c
@@ -378,7 +378,8 @@ static void microvm_fix_kernel_cmdline(MachineState *machine)
     MicrovmMachineState *mms = MICROVM_MACHINE(machine);
     BusState *bus;
     BusChild *kid;
-    char *cmdline;
+    char *cmdline, *existing_cmdline;
+    size_t len;
 
     /*
      * Find MMIO transports with attached devices, and add them to the kernel
@@ -387,7 +388,8 @@ static void microvm_fix_kernel_cmdline(MachineState *machine)
      * Yes, this is a hack, but one that heavily improves the UX without
      * introducing any significant issues.
      */
-    cmdline = g_strdup(machine->kernel_cmdline);
+    existing_cmdline = fw_cfg_read_bytes_ptr(x86ms->fw_cfg, FW_CFG_CMDLINE_DATA);
+    cmdline = g_strdup(existing_cmdline);
     bus = sysbus_get_default();
     QTAILQ_FOREACH(kid, &bus->children, sibling) {
         DeviceState *dev = kid->child;
@@ -411,9 +413,12 @@ static void microvm_fix_kernel_cmdline(MachineState *machine)
         }
     }
 
-    fw_cfg_modify_i32(x86ms->fw_cfg, FW_CFG_CMDLINE_SIZE, strlen(cmdline) + 1);
-    fw_cfg_modify_string(x86ms->fw_cfg, FW_CFG_CMDLINE_DATA, cmdline);
-
+    len = strlen(cmdline);
+    if (len > VIRTIO_CMDLINE_TOTAL_MAX_LEN + strlen(existing_cmdline)) {
+        fprintf(stderr, "qemu: virtio mmio cmdline too large, skipping\n");
+    } else {
+        memcpy(existing_cmdline, cmdline, len + 1);
+    }
     g_free(cmdline);
 }
 
diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index 78cc131926..eaff4227bd 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -50,6 +50,7 @@
 #include "hw/intc/i8259.h"
 #include "hw/rtc/mc146818rtc.h"
 #include "target/i386/sev.h"
+#include "hw/i386/microvm.h"
 
 #include "hw/acpi/cpu_hotplug.h"
 #include "hw/irq.h"
@@ -813,12 +814,18 @@ void x86_load_linux(X86MachineState *x86ms,
     const char *kernel_filename = machine->kernel_filename;
     const char *initrd_filename = machine->initrd_filename;
     const char *dtb_filename = machine->dtb;
-    const char *kernel_cmdline = machine->kernel_cmdline;
+    char *kernel_cmdline;
     SevKernelLoaderContext sev_load_ctx = {};
     enum { RNG_SEED_LENGTH = 32 };
 
-    /* Align to 16 bytes as a paranoia measure */
-    cmdline_size = (strlen(kernel_cmdline) + 16) & ~15;
+    /*
+     * Add the NUL terminator, some padding for the microvm cmdline fiddling
+     * hack, and then align to 16 bytes as a paranoia measure
+     */
+    cmdline_size = (strlen(machine->kernel_cmdline) + 1 +
+                    VIRTIO_CMDLINE_TOTAL_MAX_LEN + 16) & ~15;
+    /* Make a copy, since we might append arbitrary bytes to it later. */
+    kernel_cmdline = g_strndup(machine->kernel_cmdline, cmdline_size);
 
     /* load the kernel header */
     f = fopen(kernel_filename, "rb");
@@ -959,12 +966,6 @@ void x86_load_linux(X86MachineState *x86ms,
         initrd_max = x86ms->below_4g_mem_size - acpi_data_size - 1;
     }
 
-    fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_ADDR, cmdline_addr);
-    fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_SIZE, strlen(kernel_cmdline) + 1);
-    fw_cfg_add_string(fw_cfg, FW_CFG_CMDLINE_DATA, kernel_cmdline);
-    sev_load_ctx.cmdline_data = (char *)kernel_cmdline;
-    sev_load_ctx.cmdline_size = strlen(kernel_cmdline) + 1;
-
     if (protocol >= 0x202) {
         stl_p(header + 0x228, cmdline_addr);
     } else {
@@ -1091,27 +1092,24 @@ void x86_load_linux(X86MachineState *x86ms,
             exit(1);
         }
 
-        setup_data_offset = QEMU_ALIGN_UP(kernel_size, 16);
-        kernel_size = setup_data_offset + sizeof(SetupData) + dtb_size;
-        kernel = g_realloc(kernel, kernel_size);
-
-
-        setup_data = (SetupData *)(kernel + setup_data_offset);
+        setup_data_offset = cmdline_size;
+        cmdline_size += sizeof(SetupData) + dtb_size;
+        kernel_cmdline = g_realloc(kernel_cmdline, cmdline_size);
+        setup_data = (void *)kernel_cmdline + setup_data_offset;
         setup_data->next = cpu_to_le64(first_setup_data);
-        first_setup_data = prot_addr + setup_data_offset;
+        first_setup_data = cmdline_addr + setup_data_offset;
         setup_data->type = cpu_to_le32(SETUP_DTB);
         setup_data->len = cpu_to_le32(dtb_size);
-
         load_image_size(dtb_filename, setup_data->data, dtb_size);
     }
 
-    if (!legacy_no_rng_seed) {
-        setup_data_offset = QEMU_ALIGN_UP(kernel_size, 16);
-        kernel_size = setup_data_offset + sizeof(SetupData) + RNG_SEED_LENGTH;
-        kernel = g_realloc(kernel, kernel_size);
-        setup_data = (SetupData *)(kernel + setup_data_offset);
+    if (!legacy_no_rng_seed && protocol >= 0x209) {
+        setup_data_offset = cmdline_size;
+        cmdline_size += sizeof(SetupData) + RNG_SEED_LENGTH;
+        kernel_cmdline = g_realloc(kernel_cmdline, cmdline_size);
+        setup_data = (void *)kernel_cmdline + setup_data_offset;
         setup_data->next = cpu_to_le64(first_setup_data);
-        first_setup_data = prot_addr + setup_data_offset;
+        first_setup_data = cmdline_addr + setup_data_offset;
         setup_data->type = cpu_to_le32(SETUP_RNG_SEED);
         setup_data->len = cpu_to_le32(RNG_SEED_LENGTH);
         qemu_guest_getrandom_nofail(setup_data->data, RNG_SEED_LENGTH);
@@ -1122,6 +1120,12 @@ void x86_load_linux(X86MachineState *x86ms,
         fw_cfg_add_bytes(fw_cfg, FW_CFG_KERNEL_DATA, kernel, kernel_size);
     }
 
+    fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_ADDR, cmdline_addr);
+    fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_SIZE, cmdline_size);
+    fw_cfg_add_bytes(fw_cfg, FW_CFG_CMDLINE_DATA, kernel_cmdline, cmdline_size);
+    sev_load_ctx.cmdline_data = (char *)kernel_cmdline;
+    sev_load_ctx.cmdline_size = cmdline_size;
+
     fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ADDR, prot_addr);
     fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_SIZE, kernel_size);
     sev_load_ctx.kernel_data = (char *)kernel;
@@ -1134,7 +1138,7 @@ void x86_load_linux(X86MachineState *x86ms,
      * kernel on the other side of the fw_cfg interface matches the hash of the
      * file the user passed in.
      */
-    if (!sev_enabled()) {
+    if (!sev_enabled() && first_setup_data) {
         SetupDataFixup *fixup = g_malloc(sizeof(*fixup));
 
         memcpy(setup, header, MIN(sizeof(header), setup_size));
diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c
index a00881bc64..432754eda4 100644
--- a/hw/nvram/fw_cfg.c
+++ b/hw/nvram/fw_cfg.c
@@ -741,6 +741,15 @@ void fw_cfg_add_bytes(FWCfgState *s, uint16_t key, void *data, size_t len)
     fw_cfg_add_bytes_callback(s, key, NULL, NULL, NULL, data, len, true);
 }
 
+void *fw_cfg_read_bytes_ptr(FWCfgState *s, uint16_t key)
+{
+    int arch = !!(key & FW_CFG_ARCH_LOCAL);
+
+    key &= FW_CFG_ENTRY_MASK;
+    assert(key < fw_cfg_max_entry(s));
+    return s->entries[arch][key].data;
+}
+
 void fw_cfg_add_string(FWCfgState *s, uint16_t key, const char *value)
 {
     size_t sz = strlen(value) + 1;
-- 
MST



WARNING: multiple messages have this Message-ID (diff)
From: "Michael S. Tsirkin" <mst@redhat.com>
To: qemu-devel@nongnu.org
Cc: "Peter Maydell" <peter.maydell@linaro.org>,
	"Jason A. Donenfeld" <Jason@zx2c4.com>,
	x86@kernel.org, "Philippe Mathieu-Daudé" <philmd@linaro.org>,
	"H . Peter Anvin" <hpa@zytor.com>,
	"Borislav Petkov" <bp@alien8.de>,
	"Eric Biggers" <ebiggers@kernel.org>,
	"Eric Biggers" <ebiggers@google.com>,
	"Mathias Krause" <minipli@grsecurity.net>,
	"Sergio Lopez" <slp@redhat.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Richard Henderson" <richard.henderson@linaro.org>,
	"Eduardo Habkost" <eduardo@habkost.net>,
	"Marcel Apfelbaum" <marcel.apfelbaum@gmail.com>,
	"Gerd Hoffmann" <kraxel@redhat.com>
Subject: [PULL 10/56] x86: don't let decompressed kernel image clobber setup_data
Date: Mon, 30 Jan 2023 15:19:14 -0500	[thread overview]
Message-ID: <20230130201810.11518-11-mst@redhat.com> (raw)
Message-ID: <20230130201914.W-v9wz4dn-FTms5fcnuE2JVYROA9mOBZR4fAXjFL4qI@z> (raw)
In-Reply-To: <20230130201810.11518-1-mst@redhat.com>

From: "Jason A. Donenfeld" <Jason@zx2c4.com>

The setup_data links are appended to the compressed kernel image. Since
the kernel image is typically loaded at 0x100000, setup_data lives at
`0x100000 + compressed_size`, which does not get relocated during the
kernel's boot process.

The kernel typically decompresses the image starting at address
0x1000000 (note: there's one more zero there than the compressed image
above). This usually is fine for most kernels.

However, if the compressed image is actually quite large, then
setup_data will live at a `0x100000 + compressed_size` that extends into
the decompressed zone at 0x1000000. In other words, if compressed_size
is larger than `0x1000000 - 0x100000`, then the decompression step will
clobber setup_data, resulting in crashes.

Visually, what happens now is that QEMU appends setup_data to the kernel
image:

          kernel image            setup_data
   |--------------------------||----------------|
0x100000                  0x100000+l1     0x100000+l1+l2

The problem is that this decompresses to 0x1000000 (one more zero). So
if l1 is > (0x1000000-0x100000), then this winds up looking like:

          kernel image            setup_data
   |--------------------------||----------------|
0x100000                  0x100000+l1     0x100000+l1+l2

                                 d e c o m p r e s s e d   k e r n e l
                     |-------------------------------------------------------------|
                0x1000000                                                     0x1000000+l3

The decompressed kernel seemingly overwriting the compressed kernel
image isn't a problem, because that gets relocated to a higher address
early on in the boot process, at the end of startup_64. setup_data,
however, stays in the same place, since those links are self referential
and nothing fixes them up.  So the decompressed kernel clobbers it.

Fix this by appending setup_data to the cmdline blob rather than the
kernel image blob, which remains at a lower address that won't get
clobbered.

This could have been done by overwriting the initrd blob instead, but
that poses big difficulties, such as no longer being able to use memory
mapped files for initrd, hurting performance, and, more importantly, the
initrd address calculation is hard coded in qboot, and it always grows
down rather than up, which means lots of brittle semantics would have to
be changed around, incurring more complexity. In contrast, using cmdline
is simple and doesn't interfere with anything.

The microvm machine has a gross hack where it fiddles with fw_cfg data
after the fact. So this hack is updated to account for this appending,
by reserving some bytes.

Fixup-by: Michael S. Tsirkin <mst@redhat.com>
Cc: x86@kernel.org
Cc: Philippe Mathieu-Daudé <philmd@linaro.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Message-Id: <20221230220725.618763-1-Jason@zx2c4.com>
Message-ID: <20230128061015-mutt-send-email-mst@kernel.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Eric Biggers <ebiggers@google.com>
Tested-by: Mathias Krause <minipli@grsecurity.net>
---
 include/hw/i386/microvm.h |  5 ++--
 include/hw/nvram/fw_cfg.h |  9 +++++++
 hw/i386/microvm.c         | 15 +++++++----
 hw/i386/x86.c             | 52 +++++++++++++++++++++------------------
 hw/nvram/fw_cfg.c         |  9 +++++++
 5 files changed, 59 insertions(+), 31 deletions(-)

diff --git a/include/hw/i386/microvm.h b/include/hw/i386/microvm.h
index fad97a891d..e8af61f194 100644
--- a/include/hw/i386/microvm.h
+++ b/include/hw/i386/microvm.h
@@ -50,8 +50,9 @@
  */
 
 /* Platform virtio definitions */
-#define VIRTIO_MMIO_BASE      0xfeb00000
-#define VIRTIO_CMDLINE_MAXLEN 64
+#define VIRTIO_MMIO_BASE                0xfeb00000
+#define VIRTIO_CMDLINE_MAXLEN           64
+#define VIRTIO_CMDLINE_TOTAL_MAX_LEN    ((VIRTIO_CMDLINE_MAXLEN + 1) * 16)
 
 #define GED_MMIO_BASE         0xfea00000
 #define GED_MMIO_BASE_MEMHP   (GED_MMIO_BASE + 0x100)
diff --git a/include/hw/nvram/fw_cfg.h b/include/hw/nvram/fw_cfg.h
index 2e503904dc..990dcdbb2e 100644
--- a/include/hw/nvram/fw_cfg.h
+++ b/include/hw/nvram/fw_cfg.h
@@ -139,6 +139,15 @@ void fw_cfg_add_bytes_callback(FWCfgState *s, uint16_t key,
                                void *data, size_t len,
                                bool read_only);
 
+/**
+ * fw_cfg_read_bytes_ptr:
+ * @s: fw_cfg device being modified
+ * @key: selector key value for new fw_cfg item
+ *
+ * Reads an existing fw_cfg data pointer.
+ */
+void *fw_cfg_read_bytes_ptr(FWCfgState *s, uint16_t key);
+
 /**
  * fw_cfg_add_string:
  * @s: fw_cfg device being modified
diff --git a/hw/i386/microvm.c b/hw/i386/microvm.c
index 170a331e3f..29f30dd6d3 100644
--- a/hw/i386/microvm.c
+++ b/hw/i386/microvm.c
@@ -378,7 +378,8 @@ static void microvm_fix_kernel_cmdline(MachineState *machine)
     MicrovmMachineState *mms = MICROVM_MACHINE(machine);
     BusState *bus;
     BusChild *kid;
-    char *cmdline;
+    char *cmdline, *existing_cmdline;
+    size_t len;
 
     /*
      * Find MMIO transports with attached devices, and add them to the kernel
@@ -387,7 +388,8 @@ static void microvm_fix_kernel_cmdline(MachineState *machine)
      * Yes, this is a hack, but one that heavily improves the UX without
      * introducing any significant issues.
      */
-    cmdline = g_strdup(machine->kernel_cmdline);
+    existing_cmdline = fw_cfg_read_bytes_ptr(x86ms->fw_cfg, FW_CFG_CMDLINE_DATA);
+    cmdline = g_strdup(existing_cmdline);
     bus = sysbus_get_default();
     QTAILQ_FOREACH(kid, &bus->children, sibling) {
         DeviceState *dev = kid->child;
@@ -411,9 +413,12 @@ static void microvm_fix_kernel_cmdline(MachineState *machine)
         }
     }
 
-    fw_cfg_modify_i32(x86ms->fw_cfg, FW_CFG_CMDLINE_SIZE, strlen(cmdline) + 1);
-    fw_cfg_modify_string(x86ms->fw_cfg, FW_CFG_CMDLINE_DATA, cmdline);
-
+    len = strlen(cmdline);
+    if (len > VIRTIO_CMDLINE_TOTAL_MAX_LEN + strlen(existing_cmdline)) {
+        fprintf(stderr, "qemu: virtio mmio cmdline too large, skipping\n");
+    } else {
+        memcpy(existing_cmdline, cmdline, len + 1);
+    }
     g_free(cmdline);
 }
 
diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index 78cc131926..eaff4227bd 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -50,6 +50,7 @@
 #include "hw/intc/i8259.h"
 #include "hw/rtc/mc146818rtc.h"
 #include "target/i386/sev.h"
+#include "hw/i386/microvm.h"
 
 #include "hw/acpi/cpu_hotplug.h"
 #include "hw/irq.h"
@@ -813,12 +814,18 @@ void x86_load_linux(X86MachineState *x86ms,
     const char *kernel_filename = machine->kernel_filename;
     const char *initrd_filename = machine->initrd_filename;
     const char *dtb_filename = machine->dtb;
-    const char *kernel_cmdline = machine->kernel_cmdline;
+    char *kernel_cmdline;
     SevKernelLoaderContext sev_load_ctx = {};
     enum { RNG_SEED_LENGTH = 32 };
 
-    /* Align to 16 bytes as a paranoia measure */
-    cmdline_size = (strlen(kernel_cmdline) + 16) & ~15;
+    /*
+     * Add the NUL terminator, some padding for the microvm cmdline fiddling
+     * hack, and then align to 16 bytes as a paranoia measure
+     */
+    cmdline_size = (strlen(machine->kernel_cmdline) + 1 +
+                    VIRTIO_CMDLINE_TOTAL_MAX_LEN + 16) & ~15;
+    /* Make a copy, since we might append arbitrary bytes to it later. */
+    kernel_cmdline = g_strndup(machine->kernel_cmdline, cmdline_size);
 
     /* load the kernel header */
     f = fopen(kernel_filename, "rb");
@@ -959,12 +966,6 @@ void x86_load_linux(X86MachineState *x86ms,
         initrd_max = x86ms->below_4g_mem_size - acpi_data_size - 1;
     }
 
-    fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_ADDR, cmdline_addr);
-    fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_SIZE, strlen(kernel_cmdline) + 1);
-    fw_cfg_add_string(fw_cfg, FW_CFG_CMDLINE_DATA, kernel_cmdline);
-    sev_load_ctx.cmdline_data = (char *)kernel_cmdline;
-    sev_load_ctx.cmdline_size = strlen(kernel_cmdline) + 1;
-
     if (protocol >= 0x202) {
         stl_p(header + 0x228, cmdline_addr);
     } else {
@@ -1091,27 +1092,24 @@ void x86_load_linux(X86MachineState *x86ms,
             exit(1);
         }
 
-        setup_data_offset = QEMU_ALIGN_UP(kernel_size, 16);
-        kernel_size = setup_data_offset + sizeof(SetupData) + dtb_size;
-        kernel = g_realloc(kernel, kernel_size);
-
-
-        setup_data = (SetupData *)(kernel + setup_data_offset);
+        setup_data_offset = cmdline_size;
+        cmdline_size += sizeof(SetupData) + dtb_size;
+        kernel_cmdline = g_realloc(kernel_cmdline, cmdline_size);
+        setup_data = (void *)kernel_cmdline + setup_data_offset;
         setup_data->next = cpu_to_le64(first_setup_data);
-        first_setup_data = prot_addr + setup_data_offset;
+        first_setup_data = cmdline_addr + setup_data_offset;
         setup_data->type = cpu_to_le32(SETUP_DTB);
         setup_data->len = cpu_to_le32(dtb_size);
-
         load_image_size(dtb_filename, setup_data->data, dtb_size);
     }
 
-    if (!legacy_no_rng_seed) {
-        setup_data_offset = QEMU_ALIGN_UP(kernel_size, 16);
-        kernel_size = setup_data_offset + sizeof(SetupData) + RNG_SEED_LENGTH;
-        kernel = g_realloc(kernel, kernel_size);
-        setup_data = (SetupData *)(kernel + setup_data_offset);
+    if (!legacy_no_rng_seed && protocol >= 0x209) {
+        setup_data_offset = cmdline_size;
+        cmdline_size += sizeof(SetupData) + RNG_SEED_LENGTH;
+        kernel_cmdline = g_realloc(kernel_cmdline, cmdline_size);
+        setup_data = (void *)kernel_cmdline + setup_data_offset;
         setup_data->next = cpu_to_le64(first_setup_data);
-        first_setup_data = prot_addr + setup_data_offset;
+        first_setup_data = cmdline_addr + setup_data_offset;
         setup_data->type = cpu_to_le32(SETUP_RNG_SEED);
         setup_data->len = cpu_to_le32(RNG_SEED_LENGTH);
         qemu_guest_getrandom_nofail(setup_data->data, RNG_SEED_LENGTH);
@@ -1122,6 +1120,12 @@ void x86_load_linux(X86MachineState *x86ms,
         fw_cfg_add_bytes(fw_cfg, FW_CFG_KERNEL_DATA, kernel, kernel_size);
     }
 
+    fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_ADDR, cmdline_addr);
+    fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_SIZE, cmdline_size);
+    fw_cfg_add_bytes(fw_cfg, FW_CFG_CMDLINE_DATA, kernel_cmdline, cmdline_size);
+    sev_load_ctx.cmdline_data = (char *)kernel_cmdline;
+    sev_load_ctx.cmdline_size = cmdline_size;
+
     fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ADDR, prot_addr);
     fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_SIZE, kernel_size);
     sev_load_ctx.kernel_data = (char *)kernel;
@@ -1134,7 +1138,7 @@ void x86_load_linux(X86MachineState *x86ms,
      * kernel on the other side of the fw_cfg interface matches the hash of the
      * file the user passed in.
      */
-    if (!sev_enabled()) {
+    if (!sev_enabled() && first_setup_data) {
         SetupDataFixup *fixup = g_malloc(sizeof(*fixup));
 
         memcpy(setup, header, MIN(sizeof(header), setup_size));
diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c
index a00881bc64..432754eda4 100644
--- a/hw/nvram/fw_cfg.c
+++ b/hw/nvram/fw_cfg.c
@@ -741,6 +741,15 @@ void fw_cfg_add_bytes(FWCfgState *s, uint16_t key, void *data, size_t len)
     fw_cfg_add_bytes_callback(s, key, NULL, NULL, NULL, data, len, true);
 }
 
+void *fw_cfg_read_bytes_ptr(FWCfgState *s, uint16_t key)
+{
+    int arch = !!(key & FW_CFG_ARCH_LOCAL);
+
+    key &= FW_CFG_ENTRY_MASK;
+    assert(key < fw_cfg_max_entry(s));
+    return s->entries[arch][key].data;
+}
+
 void fw_cfg_add_string(FWCfgState *s, uint16_t key, const char *value)
 {
     size_t sz = strlen(value) + 1;
-- 
MST



  parent reply	other threads:[~2023-01-30 20:24 UTC|newest]

Thread overview: 66+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-30 20:18 [PULL 00/56] virtio,pc,pci: features, cleanups, fixes Michael S. Tsirkin
2023-01-30 20:18 ` [PULL 01/56] shpc: disallow unplug when power indicator is blinking Michael S. Tsirkin
2023-01-30 20:18 ` [PULL 02/56] hw/i386/acpi-build: Remove unused attributes Michael S. Tsirkin
2023-01-30 20:18 ` [PULL 03/56] hw/isa/isa-bus: Turn isa_build_aml() into qbus_build_aml() Michael S. Tsirkin
2023-01-30 20:18 ` [PULL 04/56] hw/acpi/piix4: No need to #include "hw/southbridge/piix.h" Michael S. Tsirkin
2023-01-30 20:18 ` [PULL 05/56] hw/acpi/acpi_dev_interface: Remove unused parameter from AcpiDeviceIfClass::madt_cpu Michael S. Tsirkin
2023-01-30 20:19 ` [PULL 06/56] vhost-user: Correct a reference of TARGET_AARCH64 Michael S. Tsirkin
2023-01-30 20:19 ` [PULL 07/56] hw/pci-host: Use register definitions from PCI standard Michael S. Tsirkin
2023-01-30 20:19 ` [PULL 08/56] virtio-rng-pci: fix migration compat for vectors Michael S. Tsirkin
2023-01-30 20:19 ` [PULL 09/56] intel-iommu: Document iova_tree Michael S. Tsirkin
2023-01-30 20:19 ` [PULL 12/56] tests: acpi: cleanup arguments to make them more readable Michael S. Tsirkin
2023-01-30 20:19 ` [PULL 13/56] tests: acpi: whitelist DSDT blobs for tests that use pci-bridges Michael S. Tsirkin
2023-01-30 20:19 ` [PULL 14/56] tests: acpi: extend pcihp with nested bridges Michael S. Tsirkin
2023-01-30 20:19 ` [PULL 15/56] tests: acpi: update expected blobs Michael S. Tsirkin
2023-01-30 20:19 ` [PULL 16/56] tests: acpi: cleanup use_uefi argument usage Michael S. Tsirkin
2023-01-30 20:19 ` [PULL 17/56] pci_bridge: remove whitespace Michael S. Tsirkin
2023-01-30 20:19 ` [PULL 18/56] x86: acpi: pcihp: clean up duplicate bridge_in_acpi assignment Michael S. Tsirkin
2023-01-30 20:19 ` [PULL 19/56] pci: acpi hotplug: rename x-native-hotplug to x-do-not-expose-native-hotplug-cap Michael S. Tsirkin
2023-01-30 20:19 ` [PULL 20/56] pcihp: piix4: do not call acpi_pcihp_reset() when ACPI PCI hotplug is disabled Michael S. Tsirkin
2023-01-30 20:19 ` [PULL 21/56] pci: acpihp: assign BSEL only to coldplugged bridges Michael S. Tsirkin
2023-01-30 20:19 ` [PULL 22/56] x86: pcihp: fix invalid AML PCNT calls to hotplugged bridges Michael S. Tsirkin
2023-01-30 20:19 ` [PULL 23/56] tests: boot_sector_test: avoid crashing if status is not available yet Michael S. Tsirkin
2023-01-30 20:19 ` Michael S. Tsirkin [this message]
2023-01-30 20:19   ` [PULL 10/56] x86: don't let decompressed kernel image clobber setup_data Michael S. Tsirkin
2023-01-31 19:39   ` Jason A. Donenfeld
2023-01-31 21:27     ` Michael S. Tsirkin
2023-01-31 20:54   ` H. Peter Anvin
2023-01-31 21:22     ` Jason A. Donenfeld
2023-02-01  5:40       ` H. Peter Anvin
2023-01-31 23:32   ` Jason A. Donenfeld
2023-01-30 20:20 ` [PULL 11/56] tests: qtest: print device_add error before failing test Michael S. Tsirkin
2023-01-30 20:20 ` [PULL 26/56] tests: acpi: add reboot cycle to bridge test Michael S. Tsirkin
2023-01-30 20:20 ` [PULL 27/56] tests: acpi: whitelist DSDT before refactoring acpi based PCI hotplug machinery Michael S. Tsirkin
2023-01-30 20:20 ` [PULL 28/56] pcihp: drop pcihp_bridge_en dependency when composing PCNT method Michael S. Tsirkin
2023-01-30 20:20 ` [PULL 29/56] tests: acpi: update expected blobs Michael S. Tsirkin
2023-01-30 20:20 ` [PULL 30/56] tests: acpi: whitelist DSDT before refactoring acpi based PCI hotplug machinery Michael S. Tsirkin
2023-01-30 20:20 ` [PULL 24/56] tests: acpi: extend bridge tests with hotplugged bridges Michael S. Tsirkin
2023-01-30 20:20   ` Michael S. Tsirkin
2023-01-30 20:20 ` [PULL 31/56] pcihp: compose PCNT callchain right before its user _GPE._E01 Michael S. Tsirkin
2023-01-30 20:20 ` [PULL 32/56] pcihp: do not put empty PCNT in DSDT Michael S. Tsirkin
2023-01-30 20:20 ` [PULL 25/56] tests: boot_sector_test(): make it multi-shot Michael S. Tsirkin
2023-01-30 20:20 ` [PULL 33/56] tests: acpi: update expected blobs Michael S. Tsirkin
2023-01-30 20:20 ` [PULL 34/56] whitelist DSDT before adding endpoint devices to bridge testcases Michael S. Tsirkin
2023-01-30 20:20 ` [PULL 35/56] tests: acpi: add endpoint devices to bridges Michael S. Tsirkin
2023-01-30 20:20 ` [PULL 36/56] tests: acpi: update expected blobs Michael S. Tsirkin
2023-01-30 20:20 ` [PULL 37/56] x86: pcihp: acpi: prepare slot ignore rule to work with self describing bridges Michael S. Tsirkin
2023-01-30 20:20 ` [PULL 38/56] pci: acpi: wire up AcpiDevAmlIf interface to generic bridge Michael S. Tsirkin
2023-01-30 20:20 ` [PULL 39/56] pcihp: make bridge describe itself using AcpiDevAmlIfClass:build_dev_aml Michael S. Tsirkin
2023-01-30 20:20 ` [PULL 40/56] pci: make sure pci_bus_is_express() won't error out with "discards ‘const’ qualifier" Michael S. Tsirkin
2023-01-30 20:20 ` [PULL 41/56] pcihp: isolate rule whether slot should be described in DSDT Michael S. Tsirkin
2023-01-30 20:21 ` [PULL 42/56] tests: acpi: whitelist DSDT before decoupling PCI hotplug code from basic slots description Michael S. Tsirkin
2023-01-30 20:21 ` [PULL 43/56] pcihp: acpi: decouple hotplug and generic " Michael S. Tsirkin
2023-01-30 20:21 ` [PULL 44/56] tests: acpi: update expected blobs Michael S. Tsirkin
2023-01-30 20:21 ` [PULL 45/56] tests: acpi: whitelist DSDT blobs before removing dynamic _DSM on coldplugged bridges Michael S. Tsirkin
2023-01-30 20:21 ` [PULL 46/56] pcihp: acpi: ignore coldplugged bridges when composing hotpluggable slots Michael S. Tsirkin
2023-01-30 20:21 ` [PULL 47/56] tests: acpi: update expected blobs Michael S. Tsirkin
2023-01-30 20:21 ` [PULL 48/56] tests: acpi: whitelist DSDT before moving non-hotpluggble slots description from hotplug path Michael S. Tsirkin
2023-01-30 20:21 ` [PULL 49/56] pcihp: generate populated non-hotpluggble slot descriptions on non-hotplug path Michael S. Tsirkin
2023-01-30 20:21 ` [PULL 50/56] tests: acpi: update expected blobs Michael S. Tsirkin
2023-01-30 20:21 ` [PULL 51/56] vhost-user: Skip unnecessary duplicated VHOST_USER_ADD/REM_MEM_REG requests Michael S. Tsirkin
2023-01-30 20:21 ` [PULL 52/56] hw: Use TYPE_PCI_BUS definition where appropriate Michael S. Tsirkin
2023-01-30 20:21 ` [PULL 53/56] tests/qtest/bios-tables-test: Make the test less verbose by default Michael S. Tsirkin
2023-01-30 20:21 ` [PULL 54/56] Revert "vhost-user: Monitor slave channel in vhost_user_read()" Michael S. Tsirkin
2023-01-30 20:21 ` [PULL 55/56] Revert "vhost-user: Introduce nested event loop " Michael S. Tsirkin
2023-01-30 20:21 ` [PULL 56/56] docs/pcie.txt: Replace ioh3420 with pcie-root-port Michael S. Tsirkin
2023-02-02 13:42 ` [PULL 00/56] virtio,pc,pci: features, cleanups, fixes Peter Maydell

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230130201810.11518-11-mst@redhat.com \
    --to=mst@redhat.com \
    --cc=Jason@zx2c4.com \
    --cc=bp@alien8.de \
    --cc=ebiggers@google.com \
    --cc=ebiggers@kernel.org \
    --cc=eduardo@habkost.net \
    --cc=hpa@zytor.com \
    --cc=kraxel@redhat.com \
    --cc=marcel.apfelbaum@gmail.com \
    --cc=minipli@grsecurity.net \
    --cc=pbonzini@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=philmd@linaro.org \
    --cc=qemu-devel@nongnu.org \
    --cc=richard.henderson@linaro.org \
    --cc=slp@redhat.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).