[Qemu-devel] QEMU fw_cfg DMA interface

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

* [Qemu-devel] QEMU fw_cfg DMA interface
@ 2015-10-01 12:14 Marc Marí
  2015-10-01 12:16 ` [Qemu-devel] [PATCH v4 0/7] " Marc Marí
  2015-10-01 16:03 ` [Qemu-devel] QEMU " Eric Blake
  0 siblings, 2 replies; 60+ messages in thread
From: Marc Marí @ 2015-10-01 12:14 UTC (permalink / raw)
  To: linux-kernel, qemu-devel, seabios
  Cc: Mark Rutland, Rob Herring, Drew, Arnd Bergmann, devicetree,
	Stefan Hajnoczi, Alexander Graf, Kevin O'Connor,
	Gerd Hoffmann, Marc Marí, Laszlo

Implementation of the FW CFG DMA interface.

When running a Linux guest on top of QEMU, using the -kernel options, this
is the timing improvement for x86:

QEMU commit b2312c6 and SeaBIOS commit 423542e
QEMU startup time: .080
BIOS startup time: .060
Kernel setup time: .586
Total time: .726

QEMU with this patch series and SeaBIOS with this patch series
QEMU startup time: .080
BIOS startup time: .039
Kernel setup time: .005
Total time: .126

QEMU startup time is the time between the start and the first kvm_entry.
BIOS startup time is the time between the first kvm_entry and the start of
function do_boot, in SeaBIOS.
Kernel setup time is the time between the start of the function do_boot in
SeaBIOS and the jump to the Linux kernel.

As you can see, both the BIOS (because of ACPI tables and other configurations)
and the Linux kernel boot (because of the copy to memory) are greatly
improved with this new interface.

Also, this new interface is an addon to the old interface. Both interfaces
are compatible and interchangeable.

Changes from v1:
 - Take into account order of fields in the FWCfgDmaAccess structure
 - Check and change endianness of FWCfgDmaAccess fields
 - Change order of fields in the FWCfgDmaAccess structure
 - Add FW_CFG_DMA_CTL_SKIP feature for control field
 - Split FW_CFG_SIZE in QEMU
 - Make FW_CFG_ID a bitmap of features
 - Add 64 bit address support for the transfer. Trigger when writing the low
   address, and address is 0 by default and at the end of each transfer.
 - Align ports and addresses.
 - Preserve old fw_cfg_comb_valid behaviour in QEMU
 - Update documentation to reflect all these changes

Changes from v2:
 - Make IOports fw_cfg DMA region a different IO region.
 - Reuse everything for MMIO and IOport DMA regions
 - Make transfer status only based on control field
 - Use DMA helpers instead of direct map/unmap
 - Change ARM fw_cfg DMA address space
 - Change Linux boot process to match linuxboot.S
 - Add select capabilities in the FWCfgDmaAccess struct
 - Update documentation to reflect all these changes

Changes from v3:
 - Set properly fw_cfg DMA fields in ARM
 - Set fw_cfg DMA boot process properly (by Laszlo Ersek)
 - Add signature to fw_cfg DMA address field (by Kevin O'Connor)
 - Minor nitpicks

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [Qemu-devel] [PATCH v4 0/7] fw_cfg DMA interface
  2015-10-01 12:14 [Qemu-devel] QEMU fw_cfg DMA interface Marc Marí
@ 2015-10-01 12:16 ` Marc Marí
  2015-10-01 12:16   ` [Qemu-devel] [PATCH v4 1/7] fw_cfg: document fw_cfg_modify_iXX() update functions Marc Marí
                     ` (7 more replies)
  2015-10-01 16:03 ` [Qemu-devel] QEMU " Eric Blake
  1 sibling, 8 replies; 60+ messages in thread
From: Marc Marí @ 2015-10-01 12:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Drew, Stefan Hajnoczi, Gabriel L. Somlo, Kevin O'Connor,
	Gerd Hoffmann, Marc Marí, Laszlo

Implement host-side of the FW CFG DMA interface both for x86 and ARM.

Based on Gerd Hoffman's initial implementation.

Gabriel L. Somlo (1):
  fw_cfg: document fw_cfg_modify_iXX() update functions

Kevin O'Connor (1):
  fw_cfg: Define a static signature to be returned on DMA port reads

Marc Marí (5):
  fw_cfg DMA interface documentation
  Implement fw_cfg DMA interface
  Enable fw_cfg DMA interface for ARM
  Enable fw_cfg DMA interface for x86
  Make the kernel image in the fw_cfg DMA interface bootable

 docs/specs/fw_cfg.txt     |  80 +++++++++++++++-
 hw/arm/virt.c             |   8 +-
 hw/i386/pc.c              |  12 ++-
 hw/nvram/fw_cfg.c         | 240 +++++++++++++++++++++++++++++++++++++++++++---
 include/hw/nvram/fw_cfg.h |  16 +++-
 5 files changed, 329 insertions(+), 27 deletions(-)

-- 
2.4.3

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [Qemu-devel] [PATCH v4 1/7] fw_cfg: document fw_cfg_modify_iXX() update functions
  2015-10-01 12:16 ` [Qemu-devel] [PATCH v4 0/7] " Marc Marí
@ 2015-10-01 12:16   ` Marc Marí
  2015-10-01 12:16   ` [Qemu-devel] [PATCH v4 2/7] fw_cfg DMA interface documentation Marc Marí
                     ` (6 subsequent siblings)
  7 siblings, 0 replies; 60+ messages in thread
From: Marc Marí @ 2015-10-01 12:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Drew, Stefan Hajnoczi, Gabriel L. Somlo, Kevin O'Connor,
	Gerd Hoffmann, Laszlo

From: "Gabriel L. Somlo" <somlo@cmu.edu>

Document the behavior of fw_cfg_modify_iXX() for leak-less updating
of integer-type blobs.

Currently only fw_cfg_modify_i16() is coded, but 32- and 64-bit versions
may be added later if necessary..

Signed-off-by: Gabriel Somlo <somlo@cmu.edu>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
---
 docs/specs/fw_cfg.txt | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/docs/specs/fw_cfg.txt b/docs/specs/fw_cfg.txt
index 74351dd..5bc7b96 100644
--- a/docs/specs/fw_cfg.txt
+++ b/docs/specs/fw_cfg.txt
@@ -159,6 +159,17 @@ will convert a 16-, 32-, or 64-bit integer to little-endian, then add
 a dynamically allocated copy of the appropriately sized item to fw_cfg
 under the given selector key value.
 
+== fw_cfg_modify_iXX() ==
+
+Modify the value of an XX-bit item (where XX may be 16, 32, or 64).
+Similarly to the corresponding fw_cfg_add_iXX() function set, convert
+a 16-, 32-, or 64-bit integer to little endian, create a dynamically
+allocated copy of the required size, and replace the existing item at
+the given selector key value with the newly allocated one. The previous
+item, assumed to have been allocated during an earlier call to
+fw_cfg_add_iXX() or fw_cfg_modify_iXX() (of the same width XX), is freed
+before the function returns.
+
 == fw_cfg_add_file() ==
 
 Given a filename (i.e., fw_cfg item name), starting pointer, and size,
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* [Qemu-devel] [PATCH v4 2/7] fw_cfg DMA interface documentation
  2015-10-01 12:16 ` [Qemu-devel] [PATCH v4 0/7] " Marc Marí
  2015-10-01 12:16   ` [Qemu-devel] [PATCH v4 1/7] fw_cfg: document fw_cfg_modify_iXX() update functions Marc Marí
@ 2015-10-01 12:16   ` Marc Marí
  2015-10-01 14:41     ` Laszlo Ersek
  2015-10-01 12:16   ` [Qemu-devel] [PATCH v4 3/7] Implement fw_cfg DMA interface Marc Marí
                     ` (5 subsequent siblings)
  7 siblings, 1 reply; 60+ messages in thread
From: Marc Marí @ 2015-10-01 12:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Drew, Stefan Hajnoczi, Gabriel L. Somlo, Kevin O'Connor,
	Gerd Hoffmann, Marc Marí, Laszlo

Add fw_cfg DMA interface specification in the documentation.

Based on Gerd Hoffman's initial implementation.

Signed-off-by: Marc Marí <markmb@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
---
 docs/specs/fw_cfg.txt | 65 +++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 61 insertions(+), 4 deletions(-)

diff --git a/docs/specs/fw_cfg.txt b/docs/specs/fw_cfg.txt
index 5bc7b96..2d6b2da 100644
--- a/docs/specs/fw_cfg.txt
+++ b/docs/specs/fw_cfg.txt
@@ -76,6 +76,13 @@ increasing address order, similar to memcpy().
 
 Selector Register IOport: 0x510
 Data Register IOport:     0x511
+DMA Address IOport:       0x514
+
+=== ARM Register Locations ===
+
+Selector Register address: Base + 8 (2 bytes)
+Data Register address:     Base + 0 (8 bytes)
+DMA Address address:       Base + 16 (8 bytes)
 
 == Firmware Configuration Items ==
 
@@ -86,11 +93,12 @@ by selecting the "signature" item using key 0x0000 (FW_CFG_SIGNATURE),
 and reading four bytes from the data register. If the fw_cfg device is
 present, the four bytes read will contain the characters "QEMU".
 
-=== Revision (Key 0x0001, FW_CFG_ID) ===
+=== Revision / feature bitmap (Key 0x0001, FW_CFG_ID) ===
 
-A 32-bit little-endian unsigned int, this item is used as an interface
-revision number, and is currently set to 1 by QEMU when fw_cfg is
-initialized.
+A 32-bit little-endian unsigned int, this item is used to check for enabled
+features.
+ - Bit 0: traditional interface. Always set.
+ - Bit 1: DMA interface.
 
 === File Directory (Key 0x0019, FW_CFG_FILE_DIR) ===
 
@@ -132,6 +140,55 @@ Selector Reg.    Range Usage
 In practice, the number of allowed firmware configuration items is given
 by the value of FW_CFG_MAX_ENTRY (see fw_cfg.h).
 
+= Guest-side DMA Interface =
+
+If bit 1 of the feature bitmap is set, the DMA interface is present. This does
+not replace the existing fw_cfg interface, it is an add-on. This interface
+can be used through the 64-bit wide address register.
+
+The address register is in big-endian format. The value for the register is 0
+at startup and after an operation. A write to the lower half triggers an
+operation. This means that operations with 32-bit addresses can be triggered
+with just one write, whereas operations with 64-bit addresses can be
+triggered with one 64-bit write or two 32-bit writes, starting with the
+higher part.
+
+In this register, the physical address of a FWCfgDmaAccess structure in RAM
+should be written. This is the format of the FWCfgDmaAccess structure:
+
+typedef struct FWCfgDmaAccess {
+    uint32_t control;
+    uint32_t length;
+    uint64_t address;
+} FWCfgDmaAccess;
+
+The fields of the structure are in big endian mode, and the field at the lowest
+address is the "control" field.
+
+The "control" field has the following bits:
+ - Bit 0: Error
+ - Bit 1: Read
+ - Bit 2: Skip
+ - Bit 3: Select. The upper 16 bits are the selected index.
+
+When an operation is triggered, if the "control" field has bit 3 set, the
+upper 16 bits are interpreted as an index of a firmware configuration item.
+This has the same effect as writing the selector register.
+
+If the "control" field has bit 1 set, a read operation will be performed.
+"length" bytes for the current selector and offset will be copied into the
+physical RAM address specified by the "address" field.
+
+If the "control" field has bit 2 set (and not bit 1), a skip operation will be
+performed. The offset for the current selector will be advanced "length" bytes.
+
+To check the result, read the "control" field:
+   error bit set        ->  something went wrong.
+   all bits cleared     ->  transfer finished successfully.
+   otherwise            ->  transfer still in progress (doesn't happen
+                            today due to implementation not being async,
+                            but may in the future).
+
 = Host-side API =
 
 The following functions are available to the QEMU programmer for adding
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH v4 2/7] fw_cfg DMA interface documentation
  2015-10-01 12:16   ` [Qemu-devel] [PATCH v4 2/7] fw_cfg DMA interface documentation Marc Marí
@ 2015-10-01 14:41     ` Laszlo Ersek
  0 siblings, 0 replies; 60+ messages in thread
From: Laszlo Ersek @ 2015-10-01 14:41 UTC (permalink / raw)
  To: Marc Marí, qemu-devel
  Cc: Gabriel L. Somlo, Stefan Hajnoczi, Drew, Kevin O'Connor,
	Gerd Hoffmann

On 10/01/15 14:16, Marc Marí wrote:
> Add fw_cfg DMA interface specification in the documentation.
> 
> Based on Gerd Hoffman's initial implementation.
> 
> Signed-off-by: Marc Marí <markmb@redhat.com>
> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  docs/specs/fw_cfg.txt | 65 +++++++++++++++++++++++++++++++++++++++++++++++----
>  1 file changed, 61 insertions(+), 4 deletions(-)
> 
> diff --git a/docs/specs/fw_cfg.txt b/docs/specs/fw_cfg.txt
> index 5bc7b96..2d6b2da 100644
> --- a/docs/specs/fw_cfg.txt
> +++ b/docs/specs/fw_cfg.txt
> @@ -76,6 +76,13 @@ increasing address order, similar to memcpy().
>  
>  Selector Register IOport: 0x510
>  Data Register IOport:     0x511
> +DMA Address IOport:       0x514
> +
> +=== ARM Register Locations ===
> +
> +Selector Register address: Base + 8 (2 bytes)
> +Data Register address:     Base + 0 (8 bytes)
> +DMA Address address:       Base + 16 (8 bytes)
>  
>  == Firmware Configuration Items ==
>  
> @@ -86,11 +93,12 @@ by selecting the "signature" item using key 0x0000 (FW_CFG_SIGNATURE),
>  and reading four bytes from the data register. If the fw_cfg device is
>  present, the four bytes read will contain the characters "QEMU".
>  
> -=== Revision (Key 0x0001, FW_CFG_ID) ===
> +=== Revision / feature bitmap (Key 0x0001, FW_CFG_ID) ===
>  
> -A 32-bit little-endian unsigned int, this item is used as an interface
> -revision number, and is currently set to 1 by QEMU when fw_cfg is
> -initialized.
> +A 32-bit little-endian unsigned int, this item is used to check for enabled
> +features.
> + - Bit 0: traditional interface. Always set.
> + - Bit 1: DMA interface.
>  
>  === File Directory (Key 0x0019, FW_CFG_FILE_DIR) ===
>  
> @@ -132,6 +140,55 @@ Selector Reg.    Range Usage
>  In practice, the number of allowed firmware configuration items is given
>  by the value of FW_CFG_MAX_ENTRY (see fw_cfg.h).
>  
> += Guest-side DMA Interface =
> +
> +If bit 1 of the feature bitmap is set, the DMA interface is present. This does
> +not replace the existing fw_cfg interface, it is an add-on. This interface
> +can be used through the 64-bit wide address register.
> +
> +The address register is in big-endian format. The value for the register is 0
> +at startup and after an operation. A write to the lower half triggers an

suggest "least significant half (at offset 4)" in place of "lower half"

> +operation. This means that operations with 32-bit addresses can be triggered
> +with just one write, whereas operations with 64-bit addresses can be
> +triggered with one 64-bit write or two 32-bit writes, starting with the
> +higher part.

suggest "most significant half (at offset 0)" in place of "higher part".

With those changes:

Reviewed-by: Laszlo Ersek <lersek@redhat.com>

Thanks!
Laszlo

> +
> +In this register, the physical address of a FWCfgDmaAccess structure in RAM
> +should be written. This is the format of the FWCfgDmaAccess structure:
> +
> +typedef struct FWCfgDmaAccess {
> +    uint32_t control;
> +    uint32_t length;
> +    uint64_t address;
> +} FWCfgDmaAccess;
> +
> +The fields of the structure are in big endian mode, and the field at the lowest
> +address is the "control" field.
> +
> +The "control" field has the following bits:
> + - Bit 0: Error
> + - Bit 1: Read
> + - Bit 2: Skip
> + - Bit 3: Select. The upper 16 bits are the selected index.
> +
> +When an operation is triggered, if the "control" field has bit 3 set, the
> +upper 16 bits are interpreted as an index of a firmware configuration item.
> +This has the same effect as writing the selector register.
> +
> +If the "control" field has bit 1 set, a read operation will be performed.
> +"length" bytes for the current selector and offset will be copied into the
> +physical RAM address specified by the "address" field.
> +
> +If the "control" field has bit 2 set (and not bit 1), a skip operation will be
> +performed. The offset for the current selector will be advanced "length" bytes.
> +
> +To check the result, read the "control" field:
> +   error bit set        ->  something went wrong.
> +   all bits cleared     ->  transfer finished successfully.
> +   otherwise            ->  transfer still in progress (doesn't happen
> +                            today due to implementation not being async,
> +                            but may in the future).
> +
>  = Host-side API =
>  
>  The following functions are available to the QEMU programmer for adding
> 

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [Qemu-devel] [PATCH v4 3/7] Implement fw_cfg DMA interface
  2015-10-01 12:16 ` [Qemu-devel] [PATCH v4 0/7] " Marc Marí
  2015-10-01 12:16   ` [Qemu-devel] [PATCH v4 1/7] fw_cfg: document fw_cfg_modify_iXX() update functions Marc Marí
  2015-10-01 12:16   ` [Qemu-devel] [PATCH v4 2/7] fw_cfg DMA interface documentation Marc Marí
@ 2015-10-01 12:16   ` Marc Marí
  2015-10-01 14:36     ` Laszlo Ersek
  2015-10-06 14:44     ` Stefan Hajnoczi
  2015-10-01 12:16   ` [Qemu-devel] [PATCH v4 4/7] Enable fw_cfg DMA interface for ARM Marc Marí
                     ` (4 subsequent siblings)
  7 siblings, 2 replies; 60+ messages in thread
From: Marc Marí @ 2015-10-01 12:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Drew, Stefan Hajnoczi, Gabriel L. Somlo, Kevin O'Connor,
	Gerd Hoffmann, Marc Marí, Laszlo

Based on the specifications on docs/specs/fw_cfg.txt

This interface is an addon. The old interface can still be used as usual.

Based on Gerd Hoffman's initial implementation.

Signed-off-by: Marc Marí <markmb@redhat.com>
---
 hw/arm/virt.c             |   2 +-
 hw/nvram/fw_cfg.c         | 231 +++++++++++++++++++++++++++++++++++++++++++---
 include/hw/nvram/fw_cfg.h |  16 +++-
 3 files changed, 233 insertions(+), 16 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index d25d6cf..7ae984f 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -683,7 +683,7 @@ static void create_fw_cfg(const VirtBoardInfo *vbi)
     hwaddr size = vbi->memmap[VIRT_FW_CFG].size;
     char *nodename;
 
-    fw_cfg_init_mem_wide(base + 8, base, 8);
+    fw_cfg_init_mem_wide(base + 8, base, 8, 0, NULL);
 
     nodename = g_strdup_printf("/fw-cfg@%" PRIx64, base);
     qemu_fdt_add_subnode(vbi->fdt, nodename);
diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c
index 658f8c4..59933b3 100644
--- a/hw/nvram/fw_cfg.c
+++ b/hw/nvram/fw_cfg.c
@@ -23,6 +23,7 @@
  */
 #include "hw/hw.h"
 #include "sysemu/sysemu.h"
+#include "sysemu/dma.h"
 #include "hw/isa/isa.h"
 #include "hw/nvram/fw_cfg.h"
 #include "hw/sysbus.h"
@@ -30,7 +31,7 @@
 #include "qemu/error-report.h"
 #include "qemu/config-file.h"
 
-#define FW_CFG_SIZE 2
+#define FW_CFG_CTL_SIZE 2
 #define FW_CFG_NAME "fw_cfg"
 #define FW_CFG_PATH "/machine/" FW_CFG_NAME
 
@@ -42,6 +43,16 @@
 #define FW_CFG_IO(obj)  OBJECT_CHECK(FWCfgIoState,  (obj), TYPE_FW_CFG_IO)
 #define FW_CFG_MEM(obj) OBJECT_CHECK(FWCfgMemState, (obj), TYPE_FW_CFG_MEM)
 
+/* FW_CFG_VERSION bits */
+#define FW_CFG_VERSION      0x01
+#define FW_CFG_VERSION_DMA  0x02
+
+/* FW_CFG_DMA_CONTROL bits */
+#define FW_CFG_DMA_CTL_ERROR   0x01
+#define FW_CFG_DMA_CTL_READ    0x02
+#define FW_CFG_DMA_CTL_SKIP    0x04
+#define FW_CFG_DMA_CTL_SELECT  0x08
+
 typedef struct FWCfgEntry {
     uint32_t len;
     uint8_t *data;
@@ -59,6 +70,10 @@ struct FWCfgState {
     uint16_t cur_entry;
     uint32_t cur_offset;
     Notifier machine_ready;
+
+    bool dma_enabled;
+    AddressSpace *dma_as;
+    dma_addr_t dma_addr;
 };
 
 struct FWCfgIoState {
@@ -66,8 +81,8 @@ struct FWCfgIoState {
     FWCfgState parent_obj;
     /*< public >*/
 
-    MemoryRegion comb_iomem;
-    uint32_t iobase;
+    MemoryRegion comb_iomem, dma_iomem;
+    uint32_t iobase, dma_iobase;
 };
 
 struct FWCfgMemState {
@@ -75,7 +90,7 @@ struct FWCfgMemState {
     FWCfgState parent_obj;
     /*< public >*/
 
-    MemoryRegion ctl_iomem, data_iomem;
+    MemoryRegion ctl_iomem, data_iomem, dma_iomem;
     uint32_t data_width;
     MemoryRegionOps wide_data_ops;
 };
@@ -292,6 +307,119 @@ static void fw_cfg_data_mem_write(void *opaque, hwaddr addr,
     } while (i);
 }
 
+static void fw_cfg_dma_transfer(FWCfgState *s)
+{
+    dma_addr_t len;
+    FWCfgDmaAccess dma;
+    int arch;
+    FWCfgEntry *e;
+    int read;
+    dma_addr_t dma_addr;
+
+    /* Reset the address before the next access */
+    dma_addr = s->dma_addr;
+    s->dma_addr = 0;
+
+    dma.address = ldq_be_dma(s->dma_as,
+                            dma_addr + offsetof(FWCfgDmaAccess, address));
+    dma.length = ldl_be_dma(s->dma_as,
+                            dma_addr + offsetof(FWCfgDmaAccess, length));
+    dma.control = ldl_be_dma(s->dma_as,
+                            dma_addr + offsetof(FWCfgDmaAccess, control));
+
+    if (dma.control & FW_CFG_DMA_CTL_SELECT) {
+        fw_cfg_select(s, dma.control >> 16);
+    }
+
+    arch = !!(s->cur_entry & FW_CFG_ARCH_LOCAL);
+    e = &s->entries[arch][s->cur_entry & FW_CFG_ENTRY_MASK];
+
+    if (dma.control & FW_CFG_DMA_CTL_READ) {
+        read = 1;
+    } else if (dma.control & FW_CFG_DMA_CTL_SKIP) {
+        read = 0;
+    } else {
+        dma.length = 0;
+    }
+
+    dma.control = 0;
+
+    while (dma.length > 0 && !(dma.control & FW_CFG_DMA_CTL_ERROR)) {
+        if (s->cur_entry == FW_CFG_INVALID || !e->data ||
+                                s->cur_offset >= e->len) {
+            len = dma.length;
+
+            /* If the access is not a read access, it will be a skip access,
+             * tested before.
+             */
+            if (read) {
+                if (dma_memory_set(s->dma_as, dma.address, 0, len)) {
+                    dma.control |= FW_CFG_DMA_CTL_ERROR;
+                }
+            }
+
+        } else {
+            if (dma.length <= (e->len - s->cur_offset)) {
+                len = dma.length;
+            } else {
+                len = (e->len - s->cur_offset);
+            }
+
+            if (e->read_callback) {
+                e->read_callback(e->callback_opaque, s->cur_offset);
+            }
+
+            /* If the access is not a read access, it will be a skip access,
+             * tested before.
+             */
+            if (read) {
+                if (dma_memory_write(s->dma_as, dma.address,
+                                    &e->data[s->cur_offset], len)) {
+                    dma.control |= FW_CFG_DMA_CTL_ERROR;
+                }
+            }
+
+            s->cur_offset += len;
+        }
+
+        dma.address += len;
+        dma.length  -= len;
+
+    }
+
+    stl_be_dma(s->dma_as, dma_addr + offsetof(FWCfgDmaAccess, control),
+                dma.control);
+
+    trace_fw_cfg_read(s, 0);
+}
+
+static void fw_cfg_dma_mem_write(void *opaque, hwaddr addr,
+                                 uint64_t value, unsigned size)
+{
+    FWCfgState *s = opaque;
+
+    if (size == 4) {
+        if (addr == 0) {
+            /* FWCfgDmaAccess high address */
+            s->dma_addr = value << 32;
+        } else if (addr == 4) {
+            /* FWCfgDmaAccess low address */
+            s->dma_addr |= value;
+            fw_cfg_dma_transfer(s);
+        }
+    } else if (size == 8 && addr == 0) {
+        s->dma_addr = value;
+        fw_cfg_dma_transfer(s);
+    }
+}
+
+static bool fw_cfg_dma_mem_valid(void *opaque, hwaddr addr,
+                                  unsigned size, bool is_write)
+{
+    return is_write && ((size == 4 && (addr == 0 || addr == 4)) ||
+                        (size == 8 && addr == 0));
+}
+
 static bool fw_cfg_data_mem_valid(void *opaque, hwaddr addr,
                                   unsigned size, bool is_write)
 {
@@ -359,6 +487,12 @@ static const MemoryRegionOps fw_cfg_comb_mem_ops = {
     .valid.accepts = fw_cfg_comb_valid,
 };
 
+static const MemoryRegionOps fw_cfg_dma_mem_ops = {
+    .write = fw_cfg_dma_mem_write,
+    .endianness = DEVICE_BIG_ENDIAN,
+    .valid.accepts = fw_cfg_dma_mem_valid,
+};
+
 static void fw_cfg_reset(DeviceState *d)
 {
     FWCfgState *s = FW_CFG(d);
@@ -399,6 +533,22 @@ static bool is_version_1(void *opaque, int version_id)
     return version_id == 1;
 }
 
+static bool fw_cfg_dma_enabled(void *opaque)
+{
+    FWCfgState *s = opaque;
+
+    return s->dma_enabled;
+}
+
+static VMStateDescription vmstate_fw_cfg_dma = {
+    .name = "fw_cfg/dma",
+    .needed = fw_cfg_dma_enabled,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT64(dma_addr, FWCfgState),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
 static const VMStateDescription vmstate_fw_cfg = {
     .name = "fw_cfg",
     .version_id = 2,
@@ -408,6 +558,10 @@ static const VMStateDescription vmstate_fw_cfg = {
         VMSTATE_UINT16_HACK(cur_offset, FWCfgState, is_version_1),
         VMSTATE_UINT32_V(cur_offset, FWCfgState, 2),
         VMSTATE_END_OF_LIST()
+    },
+    .subsections = (const VMStateDescription*[]) {
+        &vmstate_fw_cfg_dma,
+        NULL,
     }
 };
 
@@ -593,7 +747,6 @@ static void fw_cfg_init1(DeviceState *dev)
     qdev_init_nofail(dev);
 
     fw_cfg_add_bytes(s, FW_CFG_SIGNATURE, (char *)"QEMU", 4);
-    fw_cfg_add_i32(s, FW_CFG_ID, 1);
     fw_cfg_add_bytes(s, FW_CFG_UUID, qemu_uuid, 16);
     fw_cfg_add_i16(s, FW_CFG_NOGRAPHIC, (uint16_t)(display_type == DT_NOGRAPHIC));
     fw_cfg_add_i16(s, FW_CFG_NB_CPUS, (uint16_t)smp_cpus);
@@ -605,25 +758,52 @@ static void fw_cfg_init1(DeviceState *dev)
     qemu_add_machine_init_done_notifier(&s->machine_ready);
 }
 
-FWCfgState *fw_cfg_init_io(uint32_t iobase)
+FWCfgState *fw_cfg_init_io_dma(uint32_t iobase, uint32_t dma_iobase,
+                                AddressSpace *dma_as)
 {
     DeviceState *dev;
+    FWCfgState *s;
+    uint32_t version = FW_CFG_VERSION;
 
     dev = qdev_create(NULL, TYPE_FW_CFG_IO);
     qdev_prop_set_uint32(dev, "iobase", iobase);
+    qdev_prop_set_uint32(dev, "dma_iobase", dma_iobase);
+
     fw_cfg_init1(dev);
+    s = FW_CFG(dev);
+
+    if (dma_as) {
+        /* 64 bits for the address field */
+        s->dma_as = dma_as;
+        s->dma_enabled = true;
+        s->dma_addr = 0;
+
+        version |= FW_CFG_VERSION_DMA;
+    }
 
-    return FW_CFG(dev);
+    fw_cfg_add_i32(s, FW_CFG_ID, version);
+
+    return s;
 }
 
-FWCfgState *fw_cfg_init_mem_wide(hwaddr ctl_addr, hwaddr data_addr,
-                                 uint32_t data_width)
+FWCfgState *fw_cfg_init_io(uint32_t iobase)
+{
+    return fw_cfg_init_io_dma(iobase, 0, NULL);
+}
+
+FWCfgState *fw_cfg_init_mem_wide(hwaddr ctl_addr,
+                                 hwaddr data_addr, uint32_t data_width,
+                                 hwaddr dma_addr, AddressSpace *dma_as)
 {
     DeviceState *dev;
     SysBusDevice *sbd;
+    FWCfgState *s;
+    uint32_t version = FW_CFG_VERSION;
+    bool dma_enabled = dma_addr && dma_as;
 
     dev = qdev_create(NULL, TYPE_FW_CFG_MEM);
     qdev_prop_set_uint32(dev, "data_width", data_width);
+    qdev_prop_set_bit(dev, "dma_enabled", dma_enabled);
 
     fw_cfg_init1(dev);
 
@@ -631,13 +811,25 @@ FWCfgState *fw_cfg_init_mem_wide(hwaddr ctl_addr, hwaddr data_addr,
     sysbus_mmio_map(sbd, 0, ctl_addr);
     sysbus_mmio_map(sbd, 1, data_addr);
 
-    return FW_CFG(dev);
+    s = FW_CFG(dev);
+
+    if (dma_enabled) {
+        s->dma_as = dma_as;
+        s->dma_addr = 0;
+        sysbus_mmio_map(sbd, 2, dma_addr);
+        version |= FW_CFG_VERSION_DMA;
+    }
+
+    fw_cfg_add_i32(s, FW_CFG_ID, version);
+
+    return s;
 }
 
 FWCfgState *fw_cfg_init_mem(hwaddr ctl_addr, hwaddr data_addr)
 {
     return fw_cfg_init_mem_wide(ctl_addr, data_addr,
-                                fw_cfg_data_mem_ops.valid.max_access_size);
+                                fw_cfg_data_mem_ops.valid.max_access_size,
+                                0, NULL);
 }
 
 
@@ -664,6 +856,7 @@ static const TypeInfo fw_cfg_info = {
 
 static Property fw_cfg_io_properties[] = {
     DEFINE_PROP_UINT32("iobase", FWCfgIoState, iobase, -1),
+    DEFINE_PROP_UINT32("dma_iobase", FWCfgIoState, dma_iobase, -1),
     DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -673,8 +866,12 @@ static void fw_cfg_io_realize(DeviceState *dev, Error **errp)
     SysBusDevice *sbd = SYS_BUS_DEVICE(dev);
 
     memory_region_init_io(&s->comb_iomem, OBJECT(s), &fw_cfg_comb_mem_ops,
-                          FW_CFG(s), "fwcfg", FW_CFG_SIZE);
+                          FW_CFG(s), "fwcfg", FW_CFG_CTL_SIZE);
     sysbus_add_io(sbd, s->iobase, &s->comb_iomem);
+
+    memory_region_init_io(&s->dma_iomem, OBJECT(s), &fw_cfg_dma_mem_ops,
+                          FW_CFG(s), "fwcfg.dma", sizeof(dma_addr_t));
+    sysbus_add_io(sbd, s->dma_iobase, &s->dma_iomem);
 }
 
 static void fw_cfg_io_class_init(ObjectClass *klass, void *data)
@@ -695,6 +892,8 @@ static const TypeInfo fw_cfg_io_info = {
 
 static Property fw_cfg_mem_properties[] = {
     DEFINE_PROP_UINT32("data_width", FWCfgMemState, data_width, -1),
+    DEFINE_PROP_BOOL("dma_enabled", FWCfgMemState, parent_obj.dma_enabled,
+                     false),
     DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -705,7 +904,7 @@ static void fw_cfg_mem_realize(DeviceState *dev, Error **errp)
     const MemoryRegionOps *data_ops = &fw_cfg_data_mem_ops;
 
     memory_region_init_io(&s->ctl_iomem, OBJECT(s), &fw_cfg_ctl_mem_ops,
-                          FW_CFG(s), "fwcfg.ctl", FW_CFG_SIZE);
+                          FW_CFG(s), "fwcfg.ctl", FW_CFG_CTL_SIZE);
     sysbus_init_mmio(sbd, &s->ctl_iomem);
 
     if (s->data_width > data_ops->valid.max_access_size) {
@@ -723,6 +922,12 @@ static void fw_cfg_mem_realize(DeviceState *dev, Error **errp)
     memory_region_init_io(&s->data_iomem, OBJECT(s), data_ops, FW_CFG(s),
                           "fwcfg.data", data_ops->valid.max_access_size);
     sysbus_init_mmio(sbd, &s->data_iomem);
+
+    if (FW_CFG(s)->dma_enabled) {
+        memory_region_init_io(&s->dma_iomem, OBJECT(s), &fw_cfg_dma_mem_ops,
+                              FW_CFG(s), "fwcfg.dma", sizeof(dma_addr_t));
+        sysbus_init_mmio(sbd, &s->dma_iomem);
+    }
 }
 
 static void fw_cfg_mem_class_init(ObjectClass *klass, void *data)
diff --git a/include/hw/nvram/fw_cfg.h b/include/hw/nvram/fw_cfg.h
index e60d3ca..ee0cd8a 100644
--- a/include/hw/nvram/fw_cfg.h
+++ b/include/hw/nvram/fw_cfg.h
@@ -61,6 +61,15 @@ typedef struct FWCfgFiles {
     FWCfgFile f[];
 } FWCfgFiles;
 
+/* Control as first field allows for different structures selected by this
+ * field, which might be useful in the future
+ */
+typedef struct FWCfgDmaAccess {
+    uint32_t control;
+    uint32_t length;
+    uint64_t address;
+} QEMU_PACKED FWCfgDmaAccess;
+
 typedef void (*FWCfgCallback)(void *opaque, uint8_t *data);
 typedef void (*FWCfgReadCallback)(void *opaque, uint32_t offset);
 
@@ -77,10 +86,13 @@ void fw_cfg_add_file_callback(FWCfgState *s, const char *filename,
                               void *data, size_t len);
 void *fw_cfg_modify_file(FWCfgState *s, const char *filename, void *data,
                          size_t len);
+FWCfgState *fw_cfg_init_io_dma(uint32_t iobase, uint32_t dma_iobase,
+                                AddressSpace *dma_as);
 FWCfgState *fw_cfg_init_io(uint32_t iobase);
 FWCfgState *fw_cfg_init_mem(hwaddr ctl_addr, hwaddr data_addr);
-FWCfgState *fw_cfg_init_mem_wide(hwaddr ctl_addr, hwaddr data_addr,
-                                 uint32_t data_width);
+FWCfgState *fw_cfg_init_mem_wide(hwaddr ctl_addr,
+                                 hwaddr data_addr, uint32_t data_width,
+                                 hwaddr dma_addr, AddressSpace *dma_as);
 
 FWCfgState *fw_cfg_find(void);
 
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH v4 3/7] Implement fw_cfg DMA interface
  2015-10-01 12:16   ` [Qemu-devel] [PATCH v4 3/7] Implement fw_cfg DMA interface Marc Marí
@ 2015-10-01 14:36     ` Laszlo Ersek
  2015-10-01 15:52       ` Marc Marí
  2015-10-01 17:18       ` Peter Maydell
  2015-10-06 14:44     ` Stefan Hajnoczi
  1 sibling, 2 replies; 60+ messages in thread
From: Laszlo Ersek @ 2015-10-01 14:36 UTC (permalink / raw)
  To: Marc Marí, qemu-devel
  Cc: Gabriel L. Somlo, Stefan Hajnoczi, Drew, Kevin O'Connor,
	Gerd Hoffmann

This looks good to me. Thanks for addressing my v3 request.

I have some new remarks here. I feel *really* bad for not finding them
earlier. (If you get tired of working on this series, I could pick it up
and try to shepherd it further.)

On 10/01/15 14:16, Marc Marí wrote:
> Based on the specifications on docs/specs/fw_cfg.txt
> 
> This interface is an addon. The old interface can still be used as usual.
> 
> Based on Gerd Hoffman's initial implementation.
> 
> Signed-off-by: Marc Marí <markmb@redhat.com>
> ---
>  hw/arm/virt.c             |   2 +-
>  hw/nvram/fw_cfg.c         | 231 +++++++++++++++++++++++++++++++++++++++++++---
>  include/hw/nvram/fw_cfg.h |  16 +++-
>  3 files changed, 233 insertions(+), 16 deletions(-)
> 
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index d25d6cf..7ae984f 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -683,7 +683,7 @@ static void create_fw_cfg(const VirtBoardInfo *vbi)
>      hwaddr size = vbi->memmap[VIRT_FW_CFG].size;
>      char *nodename;
>  
> -    fw_cfg_init_mem_wide(base + 8, base, 8);
> +    fw_cfg_init_mem_wide(base + 8, base, 8, 0, NULL);
>  
>      nodename = g_strdup_printf("/fw-cfg@%" PRIx64, base);
>      qemu_fdt_add_subnode(vbi->fdt, nodename);
> diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c
> index 658f8c4..59933b3 100644
> --- a/hw/nvram/fw_cfg.c
> +++ b/hw/nvram/fw_cfg.c
> @@ -23,6 +23,7 @@
>   */
>  #include "hw/hw.h"
>  #include "sysemu/sysemu.h"
> +#include "sysemu/dma.h"
>  #include "hw/isa/isa.h"
>  #include "hw/nvram/fw_cfg.h"
>  #include "hw/sysbus.h"
> @@ -30,7 +31,7 @@
>  #include "qemu/error-report.h"
>  #include "qemu/config-file.h"
>  
> -#define FW_CFG_SIZE 2
> +#define FW_CFG_CTL_SIZE 2
>  #define FW_CFG_NAME "fw_cfg"
>  #define FW_CFG_PATH "/machine/" FW_CFG_NAME
>  
> @@ -42,6 +43,16 @@
>  #define FW_CFG_IO(obj)  OBJECT_CHECK(FWCfgIoState,  (obj), TYPE_FW_CFG_IO)
>  #define FW_CFG_MEM(obj) OBJECT_CHECK(FWCfgMemState, (obj), TYPE_FW_CFG_MEM)
>  
> +/* FW_CFG_VERSION bits */
> +#define FW_CFG_VERSION      0x01
> +#define FW_CFG_VERSION_DMA  0x02
> +
> +/* FW_CFG_DMA_CONTROL bits */
> +#define FW_CFG_DMA_CTL_ERROR   0x01
> +#define FW_CFG_DMA_CTL_READ    0x02
> +#define FW_CFG_DMA_CTL_SKIP    0x04
> +#define FW_CFG_DMA_CTL_SELECT  0x08
> +
>  typedef struct FWCfgEntry {
>      uint32_t len;
>      uint8_t *data;
> @@ -59,6 +70,10 @@ struct FWCfgState {
>      uint16_t cur_entry;
>      uint32_t cur_offset;
>      Notifier machine_ready;
> +
> +    bool dma_enabled;
> +    AddressSpace *dma_as;
> +    dma_addr_t dma_addr;
>  };
>  
>  struct FWCfgIoState {
> @@ -66,8 +81,8 @@ struct FWCfgIoState {
>      FWCfgState parent_obj;
>      /*< public >*/
>  
> -    MemoryRegion comb_iomem;
> -    uint32_t iobase;
> +    MemoryRegion comb_iomem, dma_iomem;
> +    uint32_t iobase, dma_iobase;
>  };
>  
>  struct FWCfgMemState {
> @@ -75,7 +90,7 @@ struct FWCfgMemState {
>      FWCfgState parent_obj;
>      /*< public >*/
>  
> -    MemoryRegion ctl_iomem, data_iomem;
> +    MemoryRegion ctl_iomem, data_iomem, dma_iomem;
>      uint32_t data_width;
>      MemoryRegionOps wide_data_ops;
>  };

(1) I *think* the new "dma_iomem" field, of type MemoryRegion, could be
moved up to the parent struct FWCfgEntry, from both FWCfgMemState and
FWCfgIoState. (And the references in the rest of the code could be updated.)

(

Independently, some loud thinking, mostly for myself: I've always been
surprised by the difference between (a) FWCfgIoState *carrying*
"dma_iobase" as a field -- and a property! --, and (b) FWCfgMemState
*not* carrying the same as a field -- nor as a property.

I think I finally understand this difference now. It is all rooted in
the difference between the internal APIs sysbus_add_io() and
sysbus_init_mmio(). Both of these are called from the device realize
functions, but the first (sysbus_add_io()) wants the IO port address at
once, whereas the second (sysbus_init_mmio()) doesn't want the address
-- the actual mapping (sysbus_mmio_map()) is delayed to board code; the
device code doesn't want to be aware of it.

And this ripples to the top. Because sysbus_add_io() wants the IO port
address, we must pass that address to the device realize function. And
for that, we need a device property -- "dma_iobase". This is not new, it
just follows the example of the preexistent "iobase" field / property.

Whereas, in the sysbus_init_mmio() case, we can keep the MMIO address
private to the board code; the realize function need not know the
address. However, the realize function does need to know the *fact* that
we're going to do DMA. Given that we must maintain this fact (in
"FWCfgState.dma_enabled") anyway, for other -- e.g. migration subsection
-- purposes as well, it makes sense to expose that same field of the
parent struct as a property, so we can set it in the memory mapped case
*before* the realize function looks at it.

I feel better now, thanks for listening.

)

Then,

> @@ -292,6 +307,119 @@ static void fw_cfg_data_mem_write(void *opaque, hwaddr addr,
>      } while (i);
>  }
>  
> +static void fw_cfg_dma_transfer(FWCfgState *s)
> +{
> +    dma_addr_t len;
> +    FWCfgDmaAccess dma;
> +    int arch;
> +    FWCfgEntry *e;
> +    int read;
> +    dma_addr_t dma_addr;
> +
> +    /* Reset the address before the next access */
> +    dma_addr = s->dma_addr;
> +    s->dma_addr = 0;
> +
> +    dma.address = ldq_be_dma(s->dma_as,
> +                            dma_addr + offsetof(FWCfgDmaAccess, address));
> +    dma.length = ldl_be_dma(s->dma_as,
> +                            dma_addr + offsetof(FWCfgDmaAccess, length));
> +    dma.control = ldl_be_dma(s->dma_as,
> +                            dma_addr + offsetof(FWCfgDmaAccess, control));
> +
> +    if (dma.control & FW_CFG_DMA_CTL_SELECT) {
> +        fw_cfg_select(s, dma.control >> 16);
> +    }
> +
> +    arch = !!(s->cur_entry & FW_CFG_ARCH_LOCAL);
> +    e = &s->entries[arch][s->cur_entry & FW_CFG_ENTRY_MASK];
> +
> +    if (dma.control & FW_CFG_DMA_CTL_READ) {
> +        read = 1;
> +    } else if (dma.control & FW_CFG_DMA_CTL_SKIP) {
> +        read = 0;
> +    } else {
> +        dma.length = 0;

I can see you addressed Kevin's comment here.


> +    }
> +
> +    dma.control = 0;
> +
> +    while (dma.length > 0 && !(dma.control & FW_CFG_DMA_CTL_ERROR)) {
> +        if (s->cur_entry == FW_CFG_INVALID || !e->data ||
> +                                s->cur_offset >= e->len) {
> +            len = dma.length;
> +
> +            /* If the access is not a read access, it will be a skip access,
> +             * tested before.
> +             */
> +            if (read) {
> +                if (dma_memory_set(s->dma_as, dma.address, 0, len)) {
> +                    dma.control |= FW_CFG_DMA_CTL_ERROR;
> +                }
> +            }
> +
> +        } else {
> +            if (dma.length <= (e->len - s->cur_offset)) {
> +                len = dma.length;
> +            } else {
> +                len = (e->len - s->cur_offset);
> +            }
> +
> +            if (e->read_callback) {
> +                e->read_callback(e->callback_opaque, s->cur_offset);
> +            }
> +
> +            /* If the access is not a read access, it will be a skip access,
> +             * tested before.
> +             */
> +            if (read) {
> +                if (dma_memory_write(s->dma_as, dma.address,
> +                                    &e->data[s->cur_offset], len)) {
> +                    dma.control |= FW_CFG_DMA_CTL_ERROR;
> +                }
> +            }
> +
> +            s->cur_offset += len;
> +        }
> +
> +        dma.address += len;
> +        dma.length  -= len;
> +
> +    }
> +
> +    stl_be_dma(s->dma_as, dma_addr + offsetof(FWCfgDmaAccess, control),
> +                dma.control);
> +
> +    trace_fw_cfg_read(s, 0);
> +}

Seems OK to me.

> +
> +static void fw_cfg_dma_mem_write(void *opaque, hwaddr addr,
> +                                 uint64_t value, unsigned size)
> +{
> +    FWCfgState *s = opaque;
> +
> +    if (size == 4) {
> +        if (addr == 0) {
> +            /* FWCfgDmaAccess high address */
> +            s->dma_addr = value << 32;
> +        } else if (addr == 4) {
> +            /* FWCfgDmaAccess low address */
> +            s->dma_addr |= value;
> +            fw_cfg_dma_transfer(s);
> +        }
> +    } else if (size == 8 && addr == 0) {
> +        s->dma_addr = value;
> +        fw_cfg_dma_transfer(s);
> +    }
> +}

Seems to match the zeroing of s->dma_addr in fw_cfg_dma_transfer(). Good.

> +
> +static bool fw_cfg_dma_mem_valid(void *opaque, hwaddr addr,
> +                                  unsigned size, bool is_write)
> +{
> +    return is_write && ((size == 4 && (addr == 0 || addr == 4)) ||
> +                        (size == 8 && addr == 0));
> +}
> +
>  static bool fw_cfg_data_mem_valid(void *opaque, hwaddr addr,
>                                    unsigned size, bool is_write)
>  {
> @@ -359,6 +487,12 @@ static const MemoryRegionOps fw_cfg_comb_mem_ops = {
>      .valid.accepts = fw_cfg_comb_valid,
>  };
>  
> +static const MemoryRegionOps fw_cfg_dma_mem_ops = {
> +    .write = fw_cfg_dma_mem_write,
> +    .endianness = DEVICE_BIG_ENDIAN,
> +    .valid.accepts = fw_cfg_dma_mem_valid,
> +};

(2) Okay. This is somewhat important, and *completely* non-intuitive,
unfortunately.

Without setting *both*

    .valid.max_access_size = 8,
    .impl.max_access_size = 8,

here, the memory subsystem will split up all 8-byte wide accesses (from
the guest side) to two 4-byte wide calls to fw_cfg_dma_mem_write()).

Those calls do satisfy the ordering logic in fw_cfg_dma_mem_write(), but
nonetheless, the lack of the above setting makes the following code in
fw_cfg_dma_mem_write() dead:

> +    } else if (size == 8 && addr == 0) {
> +        s->dma_addr = value;
> +        fw_cfg_dma_transfer(s);
> +    }

(I verified this claim with gdb on aarch64.)

So, please initialize both of the above fields to 8.

> +
>  static void fw_cfg_reset(DeviceState *d)
>  {
>      FWCfgState *s = FW_CFG(d);
> @@ -399,6 +533,22 @@ static bool is_version_1(void *opaque, int version_id)
>      return version_id == 1;
>  }
>  
> +static bool fw_cfg_dma_enabled(void *opaque)
> +{
> +    FWCfgState *s = opaque;
> +
> +    return s->dma_enabled;
> +}
> +
> +static VMStateDescription vmstate_fw_cfg_dma = {
> +    .name = "fw_cfg/dma",
> +    .needed = fw_cfg_dma_enabled,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_UINT64(dma_addr, FWCfgState),
> +        VMSTATE_END_OF_LIST()
> +    },
> +};

Looks good to me. All fields that come from the command line (ie.
management layer) need not / must not be part of the migration stream.
And all data that is programmed by the guest, must. Here, "dma_addr" is
the only such item. Okay.

> +
>  static const VMStateDescription vmstate_fw_cfg = {
>      .name = "fw_cfg",
>      .version_id = 2,
> @@ -408,6 +558,10 @@ static const VMStateDescription vmstate_fw_cfg = {
>          VMSTATE_UINT16_HACK(cur_offset, FWCfgState, is_version_1),
>          VMSTATE_UINT32_V(cur_offset, FWCfgState, 2),
>          VMSTATE_END_OF_LIST()
> +    },
> +    .subsections = (const VMStateDescription*[]) {
> +        &vmstate_fw_cfg_dma,
> +        NULL,
>      }
>  };
>  
> @@ -593,7 +747,6 @@ static void fw_cfg_init1(DeviceState *dev)
>      qdev_init_nofail(dev);
>  
>      fw_cfg_add_bytes(s, FW_CFG_SIGNATURE, (char *)"QEMU", 4);
> -    fw_cfg_add_i32(s, FW_CFG_ID, 1);
>      fw_cfg_add_bytes(s, FW_CFG_UUID, qemu_uuid, 16);
>      fw_cfg_add_i16(s, FW_CFG_NOGRAPHIC, (uint16_t)(display_type == DT_NOGRAPHIC));
>      fw_cfg_add_i16(s, FW_CFG_NB_CPUS, (uint16_t)smp_cpus);

This is called from fw_cfg_init_io() and fw_cfg_init_mem_wide().

The former is renamed to fw_cfg_init_io_dma() -- and gets a wrapper
under the original name --, and sets FW_CFG_ID expliticly.

The latter sets FW_CFG_ID expliticly.

Okay.


> @@ -605,25 +758,52 @@ static void fw_cfg_init1(DeviceState *dev)
>      qemu_add_machine_init_done_notifier(&s->machine_ready);
>  }
>  
> -FWCfgState *fw_cfg_init_io(uint32_t iobase)
> +FWCfgState *fw_cfg_init_io_dma(uint32_t iobase, uint32_t dma_iobase,
> +                                AddressSpace *dma_as)
>  {
>      DeviceState *dev;
> +    FWCfgState *s;
> +    uint32_t version = FW_CFG_VERSION;
>  
>      dev = qdev_create(NULL, TYPE_FW_CFG_IO);
>      qdev_prop_set_uint32(dev, "iobase", iobase);
> +    qdev_prop_set_uint32(dev, "dma_iobase", dma_iobase);
> +
>      fw_cfg_init1(dev);
> +    s = FW_CFG(dev);
> +
> +    if (dma_as) {
> +        /* 64 bits for the address field */
> +        s->dma_as = dma_as;
> +        s->dma_enabled = true;
> +        s->dma_addr = 0;
> +
> +        version |= FW_CFG_VERSION_DMA;
> +    }
>  
> -    return FW_CFG(dev);
> +    fw_cfg_add_i32(s, FW_CFG_ID, version);
> +
> +    return s;
>  }
>  
> -FWCfgState *fw_cfg_init_mem_wide(hwaddr ctl_addr, hwaddr data_addr,
> -                                 uint32_t data_width)
> +FWCfgState *fw_cfg_init_io(uint32_t iobase)
> +{
> +    return fw_cfg_init_io_dma(iobase, 0, NULL);
> +}
> +
> +FWCfgState *fw_cfg_init_mem_wide(hwaddr ctl_addr,
> +                                 hwaddr data_addr, uint32_t data_width,
> +                                 hwaddr dma_addr, AddressSpace *dma_as)
>  {
>      DeviceState *dev;
>      SysBusDevice *sbd;
> +    FWCfgState *s;
> +    uint32_t version = FW_CFG_VERSION;
> +    bool dma_enabled = dma_addr && dma_as;
>  
>      dev = qdev_create(NULL, TYPE_FW_CFG_MEM);
>      qdev_prop_set_uint32(dev, "data_width", data_width);
> +    qdev_prop_set_bit(dev, "dma_enabled", dma_enabled);
>  
>      fw_cfg_init1(dev);
>  
> @@ -631,13 +811,25 @@ FWCfgState *fw_cfg_init_mem_wide(hwaddr ctl_addr, hwaddr data_addr,
>      sysbus_mmio_map(sbd, 0, ctl_addr);
>      sysbus_mmio_map(sbd, 1, data_addr);
>  
> -    return FW_CFG(dev);
> +    s = FW_CFG(dev);
> +
> +    if (dma_enabled) {
> +        s->dma_as = dma_as;
> +        s->dma_addr = 0;
> +        sysbus_mmio_map(sbd, 2, dma_addr);
> +        version |= FW_CFG_VERSION_DMA;
> +    }
> +
> +    fw_cfg_add_i32(s, FW_CFG_ID, version);
> +
> +    return s;
>  }
>  
>  FWCfgState *fw_cfg_init_mem(hwaddr ctl_addr, hwaddr data_addr)
>  {
>      return fw_cfg_init_mem_wide(ctl_addr, data_addr,
> -                                fw_cfg_data_mem_ops.valid.max_access_size);
> +                                fw_cfg_data_mem_ops.valid.max_access_size,
> +                                0, NULL);
>  }
>  
>  
> @@ -664,6 +856,7 @@ static const TypeInfo fw_cfg_info = {
>  
>  static Property fw_cfg_io_properties[] = {
>      DEFINE_PROP_UINT32("iobase", FWCfgIoState, iobase, -1),
> +    DEFINE_PROP_UINT32("dma_iobase", FWCfgIoState, dma_iobase, -1),
>      DEFINE_PROP_END_OF_LIST(),
>  };
>  
> @@ -673,8 +866,12 @@ static void fw_cfg_io_realize(DeviceState *dev, Error **errp)
>      SysBusDevice *sbd = SYS_BUS_DEVICE(dev);
>  
>      memory_region_init_io(&s->comb_iomem, OBJECT(s), &fw_cfg_comb_mem_ops,
> -                          FW_CFG(s), "fwcfg", FW_CFG_SIZE);
> +                          FW_CFG(s), "fwcfg", FW_CFG_CTL_SIZE);
>      sysbus_add_io(sbd, s->iobase, &s->comb_iomem);
> +
> +    memory_region_init_io(&s->dma_iomem, OBJECT(s), &fw_cfg_dma_mem_ops,
> +                          FW_CFG(s), "fwcfg.dma", sizeof(dma_addr_t));
> +    sysbus_add_io(sbd, s->dma_iobase, &s->dma_iomem);
>  }

(3) Hmmmm. I think this should be made conditional. sysbus_add_io() maps
the region into IO port space immediately. Callers of fw_cfg_init_io()
should *not* reach sysbus_add_io(); it makes no sense to map the DMA
addr register at IO port 0.

(And then you can omit memory_region_init_io() as well, if dma_iobase is
zero.)

The rest of the code looks fine to me.

Again, I apologize for sucking this much at timely reviews lately. If
you fix (2) and (3) above -- optionally: (1) as well --, then you'll
have my R-b.

If you've lost your patience, I can pick up this series. :)

Thank you
Laszlo


>  
>  static void fw_cfg_io_class_init(ObjectClass *klass, void *data)
> @@ -695,6 +892,8 @@ static const TypeInfo fw_cfg_io_info = {
>  
>  static Property fw_cfg_mem_properties[] = {
>      DEFINE_PROP_UINT32("data_width", FWCfgMemState, data_width, -1),
> +    DEFINE_PROP_BOOL("dma_enabled", FWCfgMemState, parent_obj.dma_enabled,
> +                     false),
>      DEFINE_PROP_END_OF_LIST(),
>  };
>  
> @@ -705,7 +904,7 @@ static void fw_cfg_mem_realize(DeviceState *dev, Error **errp)
>      const MemoryRegionOps *data_ops = &fw_cfg_data_mem_ops;
>  
>      memory_region_init_io(&s->ctl_iomem, OBJECT(s), &fw_cfg_ctl_mem_ops,
> -                          FW_CFG(s), "fwcfg.ctl", FW_CFG_SIZE);
> +                          FW_CFG(s), "fwcfg.ctl", FW_CFG_CTL_SIZE);
>      sysbus_init_mmio(sbd, &s->ctl_iomem);
>  
>      if (s->data_width > data_ops->valid.max_access_size) {
> @@ -723,6 +922,12 @@ static void fw_cfg_mem_realize(DeviceState *dev, Error **errp)
>      memory_region_init_io(&s->data_iomem, OBJECT(s), data_ops, FW_CFG(s),
>                            "fwcfg.data", data_ops->valid.max_access_size);
>      sysbus_init_mmio(sbd, &s->data_iomem);
> +
> +    if (FW_CFG(s)->dma_enabled) {
> +        memory_region_init_io(&s->dma_iomem, OBJECT(s), &fw_cfg_dma_mem_ops,
> +                              FW_CFG(s), "fwcfg.dma", sizeof(dma_addr_t));
> +        sysbus_init_mmio(sbd, &s->dma_iomem);
> +    }
>  }
>  
>  static void fw_cfg_mem_class_init(ObjectClass *klass, void *data)
> diff --git a/include/hw/nvram/fw_cfg.h b/include/hw/nvram/fw_cfg.h
> index e60d3ca..ee0cd8a 100644
> --- a/include/hw/nvram/fw_cfg.h
> +++ b/include/hw/nvram/fw_cfg.h
> @@ -61,6 +61,15 @@ typedef struct FWCfgFiles {
>      FWCfgFile f[];
>  } FWCfgFiles;
>  
> +/* Control as first field allows for different structures selected by this
> + * field, which might be useful in the future
> + */
> +typedef struct FWCfgDmaAccess {
> +    uint32_t control;
> +    uint32_t length;
> +    uint64_t address;
> +} QEMU_PACKED FWCfgDmaAccess;
> +
>  typedef void (*FWCfgCallback)(void *opaque, uint8_t *data);
>  typedef void (*FWCfgReadCallback)(void *opaque, uint32_t offset);
>  
> @@ -77,10 +86,13 @@ void fw_cfg_add_file_callback(FWCfgState *s, const char *filename,
>                                void *data, size_t len);
>  void *fw_cfg_modify_file(FWCfgState *s, const char *filename, void *data,
>                           size_t len);
> +FWCfgState *fw_cfg_init_io_dma(uint32_t iobase, uint32_t dma_iobase,
> +                                AddressSpace *dma_as);
>  FWCfgState *fw_cfg_init_io(uint32_t iobase);
>  FWCfgState *fw_cfg_init_mem(hwaddr ctl_addr, hwaddr data_addr);
> -FWCfgState *fw_cfg_init_mem_wide(hwaddr ctl_addr, hwaddr data_addr,
> -                                 uint32_t data_width);
> +FWCfgState *fw_cfg_init_mem_wide(hwaddr ctl_addr,
> +                                 hwaddr data_addr, uint32_t data_width,
> +                                 hwaddr dma_addr, AddressSpace *dma_as);
>  
>  FWCfgState *fw_cfg_find(void);
>  
> 

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH v4 3/7] Implement fw_cfg DMA interface
  2015-10-01 14:36     ` Laszlo Ersek
@ 2015-10-01 15:52       ` Marc Marí
  2015-10-01 17:18       ` Peter Maydell
  1 sibling, 0 replies; 60+ messages in thread
From: Marc Marí @ 2015-10-01 15:52 UTC (permalink / raw)
  To: Laszlo Ersek
  Cc: Drew, Stefan Hajnoczi, Gabriel L. Somlo, qemu-devel,
	Kevin O'Connor, Gerd Hoffmann

On Thu, 1 Oct 2015 16:36:07 +0200
Laszlo Ersek <lersek@redhat.com> wrote:

> This looks good to me. Thanks for addressing my v3 request.
> 
> I have some new remarks here. I feel *really* bad for not finding them
> earlier. (If you get tired of working on this series, I could pick it
> up and try to shepherd it further.)

I'll continue with it, don't worry. Thanks for your comments.

Marc

> 
> On 10/01/15 14:16, Marc Marí wrote:
> > Based on the specifications on docs/specs/fw_cfg.txt
> > 
> > This interface is an addon. The old interface can still be used as
> > usual.
> > 
> > Based on Gerd Hoffman's initial implementation.
> > 
> > Signed-off-by: Marc Marí <markmb@redhat.com>
> > ---
> >  hw/arm/virt.c             |   2 +-
> >  hw/nvram/fw_cfg.c         | 231
> > +++++++++++++++++++++++++++++++++++++++++++---
> > include/hw/nvram/fw_cfg.h |  16 +++- 3 files changed, 233
> > insertions(+), 16 deletions(-)
> > 
> > diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> > index d25d6cf..7ae984f 100644
> > --- a/hw/arm/virt.c
> > +++ b/hw/arm/virt.c
> > @@ -683,7 +683,7 @@ static void create_fw_cfg(const VirtBoardInfo
> > *vbi) hwaddr size = vbi->memmap[VIRT_FW_CFG].size;
> >      char *nodename;
> >  
> > -    fw_cfg_init_mem_wide(base + 8, base, 8);
> > +    fw_cfg_init_mem_wide(base + 8, base, 8, 0, NULL);
> >  
> >      nodename = g_strdup_printf("/fw-cfg@%" PRIx64, base);
> >      qemu_fdt_add_subnode(vbi->fdt, nodename);
> > diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c
> > index 658f8c4..59933b3 100644
> > --- a/hw/nvram/fw_cfg.c
> > +++ b/hw/nvram/fw_cfg.c
> > @@ -23,6 +23,7 @@
> >   */
> >  #include "hw/hw.h"
> >  #include "sysemu/sysemu.h"
> > +#include "sysemu/dma.h"
> >  #include "hw/isa/isa.h"
> >  #include "hw/nvram/fw_cfg.h"
> >  #include "hw/sysbus.h"
> > @@ -30,7 +31,7 @@
> >  #include "qemu/error-report.h"
> >  #include "qemu/config-file.h"
> >  
> > -#define FW_CFG_SIZE 2
> > +#define FW_CFG_CTL_SIZE 2
> >  #define FW_CFG_NAME "fw_cfg"
> >  #define FW_CFG_PATH "/machine/" FW_CFG_NAME
> >  
> > @@ -42,6 +43,16 @@
> >  #define FW_CFG_IO(obj)  OBJECT_CHECK(FWCfgIoState,  (obj),
> > TYPE_FW_CFG_IO) #define FW_CFG_MEM(obj) OBJECT_CHECK(FWCfgMemState,
> > (obj), TYPE_FW_CFG_MEM) 
> > +/* FW_CFG_VERSION bits */
> > +#define FW_CFG_VERSION      0x01
> > +#define FW_CFG_VERSION_DMA  0x02
> > +
> > +/* FW_CFG_DMA_CONTROL bits */
> > +#define FW_CFG_DMA_CTL_ERROR   0x01
> > +#define FW_CFG_DMA_CTL_READ    0x02
> > +#define FW_CFG_DMA_CTL_SKIP    0x04
> > +#define FW_CFG_DMA_CTL_SELECT  0x08
> > +
> >  typedef struct FWCfgEntry {
> >      uint32_t len;
> >      uint8_t *data;
> > @@ -59,6 +70,10 @@ struct FWCfgState {
> >      uint16_t cur_entry;
> >      uint32_t cur_offset;
> >      Notifier machine_ready;
> > +
> > +    bool dma_enabled;
> > +    AddressSpace *dma_as;
> > +    dma_addr_t dma_addr;
> >  };
> >  
> >  struct FWCfgIoState {
> > @@ -66,8 +81,8 @@ struct FWCfgIoState {
> >      FWCfgState parent_obj;
> >      /*< public >*/
> >  
> > -    MemoryRegion comb_iomem;
> > -    uint32_t iobase;
> > +    MemoryRegion comb_iomem, dma_iomem;
> > +    uint32_t iobase, dma_iobase;
> >  };
> >  
> >  struct FWCfgMemState {
> > @@ -75,7 +90,7 @@ struct FWCfgMemState {
> >      FWCfgState parent_obj;
> >      /*< public >*/
> >  
> > -    MemoryRegion ctl_iomem, data_iomem;
> > +    MemoryRegion ctl_iomem, data_iomem, dma_iomem;
> >      uint32_t data_width;
> >      MemoryRegionOps wide_data_ops;
> >  };
> 
> (1) I *think* the new "dma_iomem" field, of type MemoryRegion, could
> be moved up to the parent struct FWCfgEntry, from both FWCfgMemState
> and FWCfgIoState. (And the references in the rest of the code could
> be updated.)
> 
> (
> 
> Independently, some loud thinking, mostly for myself: I've always been
> surprised by the difference between (a) FWCfgIoState *carrying*
> "dma_iobase" as a field -- and a property! --, and (b) FWCfgMemState
> *not* carrying the same as a field -- nor as a property.
> 
> I think I finally understand this difference now. It is all rooted in
> the difference between the internal APIs sysbus_add_io() and
> sysbus_init_mmio(). Both of these are called from the device realize
> functions, but the first (sysbus_add_io()) wants the IO port address
> at once, whereas the second (sysbus_init_mmio()) doesn't want the
> address -- the actual mapping (sysbus_mmio_map()) is delayed to board
> code; the device code doesn't want to be aware of it.
> 
> And this ripples to the top. Because sysbus_add_io() wants the IO port
> address, we must pass that address to the device realize function. And
> for that, we need a device property -- "dma_iobase". This is not new,
> it just follows the example of the preexistent "iobase" field /
> property.
> 
> Whereas, in the sysbus_init_mmio() case, we can keep the MMIO address
> private to the board code; the realize function need not know the
> address. However, the realize function does need to know the *fact*
> that we're going to do DMA. Given that we must maintain this fact (in
> "FWCfgState.dma_enabled") anyway, for other -- e.g. migration
> subsection -- purposes as well, it makes sense to expose that same
> field of the parent struct as a property, so we can set it in the
> memory mapped case *before* the realize function looks at it.
> 
> I feel better now, thanks for listening.
> 
> )
> 
> Then,
> 
> > @@ -292,6 +307,119 @@ static void fw_cfg_data_mem_write(void
> > *opaque, hwaddr addr, } while (i);
> >  }
> >  
> > +static void fw_cfg_dma_transfer(FWCfgState *s)
> > +{
> > +    dma_addr_t len;
> > +    FWCfgDmaAccess dma;
> > +    int arch;
> > +    FWCfgEntry *e;
> > +    int read;
> > +    dma_addr_t dma_addr;
> > +
> > +    /* Reset the address before the next access */
> > +    dma_addr = s->dma_addr;
> > +    s->dma_addr = 0;
> > +
> > +    dma.address = ldq_be_dma(s->dma_as,
> > +                            dma_addr + offsetof(FWCfgDmaAccess,
> > address));
> > +    dma.length = ldl_be_dma(s->dma_as,
> > +                            dma_addr + offsetof(FWCfgDmaAccess,
> > length));
> > +    dma.control = ldl_be_dma(s->dma_as,
> > +                            dma_addr + offsetof(FWCfgDmaAccess,
> > control)); +
> > +    if (dma.control & FW_CFG_DMA_CTL_SELECT) {
> > +        fw_cfg_select(s, dma.control >> 16);
> > +    }
> > +
> > +    arch = !!(s->cur_entry & FW_CFG_ARCH_LOCAL);
> > +    e = &s->entries[arch][s->cur_entry & FW_CFG_ENTRY_MASK];
> > +
> > +    if (dma.control & FW_CFG_DMA_CTL_READ) {
> > +        read = 1;
> > +    } else if (dma.control & FW_CFG_DMA_CTL_SKIP) {
> > +        read = 0;
> > +    } else {
> > +        dma.length = 0;
> 
> I can see you addressed Kevin's comment here.
> 
> 
> > +    }
> > +
> > +    dma.control = 0;
> > +
> > +    while (dma.length > 0 && !(dma.control &
> > FW_CFG_DMA_CTL_ERROR)) {
> > +        if (s->cur_entry == FW_CFG_INVALID || !e->data ||
> > +                                s->cur_offset >= e->len) {
> > +            len = dma.length;
> > +
> > +            /* If the access is not a read access, it will be a
> > skip access,
> > +             * tested before.
> > +             */
> > +            if (read) {
> > +                if (dma_memory_set(s->dma_as, dma.address, 0,
> > len)) {
> > +                    dma.control |= FW_CFG_DMA_CTL_ERROR;
> > +                }
> > +            }
> > +
> > +        } else {
> > +            if (dma.length <= (e->len - s->cur_offset)) {
> > +                len = dma.length;
> > +            } else {
> > +                len = (e->len - s->cur_offset);
> > +            }
> > +
> > +            if (e->read_callback) {
> > +                e->read_callback(e->callback_opaque,
> > s->cur_offset);
> > +            }
> > +
> > +            /* If the access is not a read access, it will be a
> > skip access,
> > +             * tested before.
> > +             */
> > +            if (read) {
> > +                if (dma_memory_write(s->dma_as, dma.address,
> > +                                    &e->data[s->cur_offset], len))
> > {
> > +                    dma.control |= FW_CFG_DMA_CTL_ERROR;
> > +                }
> > +            }
> > +
> > +            s->cur_offset += len;
> > +        }
> > +
> > +        dma.address += len;
> > +        dma.length  -= len;
> > +
> > +    }
> > +
> > +    stl_be_dma(s->dma_as, dma_addr + offsetof(FWCfgDmaAccess,
> > control),
> > +                dma.control);
> > +
> > +    trace_fw_cfg_read(s, 0);
> > +}
> 
> Seems OK to me.
> 
> > +
> > +static void fw_cfg_dma_mem_write(void *opaque, hwaddr addr,
> > +                                 uint64_t value, unsigned size)
> > +{
> > +    FWCfgState *s = opaque;
> > +
> > +    if (size == 4) {
> > +        if (addr == 0) {
> > +            /* FWCfgDmaAccess high address */
> > +            s->dma_addr = value << 32;
> > +        } else if (addr == 4) {
> > +            /* FWCfgDmaAccess low address */
> > +            s->dma_addr |= value;
> > +            fw_cfg_dma_transfer(s);
> > +        }
> > +    } else if (size == 8 && addr == 0) {
> > +        s->dma_addr = value;
> > +        fw_cfg_dma_transfer(s);
> > +    }
> > +}
> 
> Seems to match the zeroing of s->dma_addr in fw_cfg_dma_transfer().
> Good.
> 
> > +
> > +static bool fw_cfg_dma_mem_valid(void *opaque, hwaddr addr,
> > +                                  unsigned size, bool is_write)
> > +{
> > +    return is_write && ((size == 4 && (addr == 0 || addr == 4)) ||
> > +                        (size == 8 && addr == 0));
> > +}
> > +
> >  static bool fw_cfg_data_mem_valid(void *opaque, hwaddr addr,
> >                                    unsigned size, bool is_write)
> >  {
> > @@ -359,6 +487,12 @@ static const MemoryRegionOps
> > fw_cfg_comb_mem_ops = { .valid.accepts = fw_cfg_comb_valid,
> >  };
> >  
> > +static const MemoryRegionOps fw_cfg_dma_mem_ops = {
> > +    .write = fw_cfg_dma_mem_write,
> > +    .endianness = DEVICE_BIG_ENDIAN,
> > +    .valid.accepts = fw_cfg_dma_mem_valid,
> > +};
> 
> (2) Okay. This is somewhat important, and *completely* non-intuitive,
> unfortunately.
> 
> Without setting *both*
> 
>     .valid.max_access_size = 8,
>     .impl.max_access_size = 8,
> 
> here, the memory subsystem will split up all 8-byte wide accesses
> (from the guest side) to two 4-byte wide calls to
> fw_cfg_dma_mem_write()).
> 
> Those calls do satisfy the ordering logic in fw_cfg_dma_mem_write(),
> but nonetheless, the lack of the above setting makes the following
> code in fw_cfg_dma_mem_write() dead:
> 
> > +    } else if (size == 8 && addr == 0) {
> > +        s->dma_addr = value;
> > +        fw_cfg_dma_transfer(s);
> > +    }
> 
> (I verified this claim with gdb on aarch64.)
> 
> So, please initialize both of the above fields to 8.
> 
> > +
> >  static void fw_cfg_reset(DeviceState *d)
> >  {
> >      FWCfgState *s = FW_CFG(d);
> > @@ -399,6 +533,22 @@ static bool is_version_1(void *opaque, int
> > version_id) return version_id == 1;
> >  }
> >  
> > +static bool fw_cfg_dma_enabled(void *opaque)
> > +{
> > +    FWCfgState *s = opaque;
> > +
> > +    return s->dma_enabled;
> > +}
> > +
> > +static VMStateDescription vmstate_fw_cfg_dma = {
> > +    .name = "fw_cfg/dma",
> > +    .needed = fw_cfg_dma_enabled,
> > +    .fields = (VMStateField[]) {
> > +        VMSTATE_UINT64(dma_addr, FWCfgState),
> > +        VMSTATE_END_OF_LIST()
> > +    },
> > +};
> 
> Looks good to me. All fields that come from the command line (ie.
> management layer) need not / must not be part of the migration stream.
> And all data that is programmed by the guest, must. Here, "dma_addr"
> is the only such item. Okay.
> 
> > +
> >  static const VMStateDescription vmstate_fw_cfg = {
> >      .name = "fw_cfg",
> >      .version_id = 2,
> > @@ -408,6 +558,10 @@ static const VMStateDescription vmstate_fw_cfg
> > = { VMSTATE_UINT16_HACK(cur_offset, FWCfgState, is_version_1),
> >          VMSTATE_UINT32_V(cur_offset, FWCfgState, 2),
> >          VMSTATE_END_OF_LIST()
> > +    },
> > +    .subsections = (const VMStateDescription*[]) {
> > +        &vmstate_fw_cfg_dma,
> > +        NULL,
> >      }
> >  };
> >  
> > @@ -593,7 +747,6 @@ static void fw_cfg_init1(DeviceState *dev)
> >      qdev_init_nofail(dev);
> >  
> >      fw_cfg_add_bytes(s, FW_CFG_SIGNATURE, (char *)"QEMU", 4);
> > -    fw_cfg_add_i32(s, FW_CFG_ID, 1);
> >      fw_cfg_add_bytes(s, FW_CFG_UUID, qemu_uuid, 16);
> >      fw_cfg_add_i16(s, FW_CFG_NOGRAPHIC, (uint16_t)(display_type ==
> > DT_NOGRAPHIC)); fw_cfg_add_i16(s, FW_CFG_NB_CPUS,
> > (uint16_t)smp_cpus);
> 
> This is called from fw_cfg_init_io() and fw_cfg_init_mem_wide().
> 
> The former is renamed to fw_cfg_init_io_dma() -- and gets a wrapper
> under the original name --, and sets FW_CFG_ID expliticly.
> 
> The latter sets FW_CFG_ID expliticly.
> 
> Okay.
> 
> 
> > @@ -605,25 +758,52 @@ static void fw_cfg_init1(DeviceState *dev)
> >      qemu_add_machine_init_done_notifier(&s->machine_ready);
> >  }
> >  
> > -FWCfgState *fw_cfg_init_io(uint32_t iobase)
> > +FWCfgState *fw_cfg_init_io_dma(uint32_t iobase, uint32_t
> > dma_iobase,
> > +                                AddressSpace *dma_as)
> >  {
> >      DeviceState *dev;
> > +    FWCfgState *s;
> > +    uint32_t version = FW_CFG_VERSION;
> >  
> >      dev = qdev_create(NULL, TYPE_FW_CFG_IO);
> >      qdev_prop_set_uint32(dev, "iobase", iobase);
> > +    qdev_prop_set_uint32(dev, "dma_iobase", dma_iobase);
> > +
> >      fw_cfg_init1(dev);
> > +    s = FW_CFG(dev);
> > +
> > +    if (dma_as) {
> > +        /* 64 bits for the address field */
> > +        s->dma_as = dma_as;
> > +        s->dma_enabled = true;
> > +        s->dma_addr = 0;
> > +
> > +        version |= FW_CFG_VERSION_DMA;
> > +    }
> >  
> > -    return FW_CFG(dev);
> > +    fw_cfg_add_i32(s, FW_CFG_ID, version);
> > +
> > +    return s;
> >  }
> >  
> > -FWCfgState *fw_cfg_init_mem_wide(hwaddr ctl_addr, hwaddr data_addr,
> > -                                 uint32_t data_width)
> > +FWCfgState *fw_cfg_init_io(uint32_t iobase)
> > +{
> > +    return fw_cfg_init_io_dma(iobase, 0, NULL);
> > +}
> > +
> > +FWCfgState *fw_cfg_init_mem_wide(hwaddr ctl_addr,
> > +                                 hwaddr data_addr, uint32_t
> > data_width,
> > +                                 hwaddr dma_addr, AddressSpace
> > *dma_as) {
> >      DeviceState *dev;
> >      SysBusDevice *sbd;
> > +    FWCfgState *s;
> > +    uint32_t version = FW_CFG_VERSION;
> > +    bool dma_enabled = dma_addr && dma_as;
> >  
> >      dev = qdev_create(NULL, TYPE_FW_CFG_MEM);
> >      qdev_prop_set_uint32(dev, "data_width", data_width);
> > +    qdev_prop_set_bit(dev, "dma_enabled", dma_enabled);
> >  
> >      fw_cfg_init1(dev);
> >  
> > @@ -631,13 +811,25 @@ FWCfgState *fw_cfg_init_mem_wide(hwaddr
> > ctl_addr, hwaddr data_addr, sysbus_mmio_map(sbd, 0, ctl_addr);
> >      sysbus_mmio_map(sbd, 1, data_addr);
> >  
> > -    return FW_CFG(dev);
> > +    s = FW_CFG(dev);
> > +
> > +    if (dma_enabled) {
> > +        s->dma_as = dma_as;
> > +        s->dma_addr = 0;
> > +        sysbus_mmio_map(sbd, 2, dma_addr);
> > +        version |= FW_CFG_VERSION_DMA;
> > +    }
> > +
> > +    fw_cfg_add_i32(s, FW_CFG_ID, version);
> > +
> > +    return s;
> >  }
> >  
> >  FWCfgState *fw_cfg_init_mem(hwaddr ctl_addr, hwaddr data_addr)
> >  {
> >      return fw_cfg_init_mem_wide(ctl_addr, data_addr,
> > -
> > fw_cfg_data_mem_ops.valid.max_access_size);
> > +
> > fw_cfg_data_mem_ops.valid.max_access_size,
> > +                                0, NULL);
> >  }
> >  
> >  
> > @@ -664,6 +856,7 @@ static const TypeInfo fw_cfg_info = {
> >  
> >  static Property fw_cfg_io_properties[] = {
> >      DEFINE_PROP_UINT32("iobase", FWCfgIoState, iobase, -1),
> > +    DEFINE_PROP_UINT32("dma_iobase", FWCfgIoState, dma_iobase, -1),
> >      DEFINE_PROP_END_OF_LIST(),
> >  };
> >  
> > @@ -673,8 +866,12 @@ static void fw_cfg_io_realize(DeviceState
> > *dev, Error **errp) SysBusDevice *sbd = SYS_BUS_DEVICE(dev);
> >  
> >      memory_region_init_io(&s->comb_iomem, OBJECT(s),
> > &fw_cfg_comb_mem_ops,
> > -                          FW_CFG(s), "fwcfg", FW_CFG_SIZE);
> > +                          FW_CFG(s), "fwcfg", FW_CFG_CTL_SIZE);
> >      sysbus_add_io(sbd, s->iobase, &s->comb_iomem);
> > +
> > +    memory_region_init_io(&s->dma_iomem, OBJECT(s),
> > &fw_cfg_dma_mem_ops,
> > +                          FW_CFG(s), "fwcfg.dma",
> > sizeof(dma_addr_t));
> > +    sysbus_add_io(sbd, s->dma_iobase, &s->dma_iomem);
> >  }
> 
> (3) Hmmmm. I think this should be made conditional. sysbus_add_io()
> maps the region into IO port space immediately. Callers of
> fw_cfg_init_io() should *not* reach sysbus_add_io(); it makes no
> sense to map the DMA addr register at IO port 0.
> 
> (And then you can omit memory_region_init_io() as well, if dma_iobase
> is zero.)
> 
> The rest of the code looks fine to me.
> 
> Again, I apologize for sucking this much at timely reviews lately. If
> you fix (2) and (3) above -- optionally: (1) as well --, then you'll
> have my R-b.
> 
> If you've lost your patience, I can pick up this series. :)
> 
> Thank you
> Laszlo
> 
> 
> >  
> >  static void fw_cfg_io_class_init(ObjectClass *klass, void *data)
> > @@ -695,6 +892,8 @@ static const TypeInfo fw_cfg_io_info = {
> >  
> >  static Property fw_cfg_mem_properties[] = {
> >      DEFINE_PROP_UINT32("data_width", FWCfgMemState, data_width,
> > -1),
> > +    DEFINE_PROP_BOOL("dma_enabled", FWCfgMemState,
> > parent_obj.dma_enabled,
> > +                     false),
> >      DEFINE_PROP_END_OF_LIST(),
> >  };
> >  
> > @@ -705,7 +904,7 @@ static void fw_cfg_mem_realize(DeviceState
> > *dev, Error **errp) const MemoryRegionOps *data_ops =
> > &fw_cfg_data_mem_ops; 
> >      memory_region_init_io(&s->ctl_iomem, OBJECT(s),
> > &fw_cfg_ctl_mem_ops,
> > -                          FW_CFG(s), "fwcfg.ctl", FW_CFG_SIZE);
> > +                          FW_CFG(s), "fwcfg.ctl", FW_CFG_CTL_SIZE);
> >      sysbus_init_mmio(sbd, &s->ctl_iomem);
> >  
> >      if (s->data_width > data_ops->valid.max_access_size) {
> > @@ -723,6 +922,12 @@ static void fw_cfg_mem_realize(DeviceState
> > *dev, Error **errp) memory_region_init_io(&s->data_iomem,
> > OBJECT(s), data_ops, FW_CFG(s), "fwcfg.data",
> > data_ops->valid.max_access_size); sysbus_init_mmio(sbd,
> > &s->data_iomem); +
> > +    if (FW_CFG(s)->dma_enabled) {
> > +        memory_region_init_io(&s->dma_iomem, OBJECT(s),
> > &fw_cfg_dma_mem_ops,
> > +                              FW_CFG(s), "fwcfg.dma",
> > sizeof(dma_addr_t));
> > +        sysbus_init_mmio(sbd, &s->dma_iomem);
> > +    }
> >  }
> >  
> >  static void fw_cfg_mem_class_init(ObjectClass *klass, void *data)
> > diff --git a/include/hw/nvram/fw_cfg.h b/include/hw/nvram/fw_cfg.h
> > index e60d3ca..ee0cd8a 100644
> > --- a/include/hw/nvram/fw_cfg.h
> > +++ b/include/hw/nvram/fw_cfg.h
> > @@ -61,6 +61,15 @@ typedef struct FWCfgFiles {
> >      FWCfgFile f[];
> >  } FWCfgFiles;
> >  
> > +/* Control as first field allows for different structures selected
> > by this
> > + * field, which might be useful in the future
> > + */
> > +typedef struct FWCfgDmaAccess {
> > +    uint32_t control;
> > +    uint32_t length;
> > +    uint64_t address;
> > +} QEMU_PACKED FWCfgDmaAccess;
> > +
> >  typedef void (*FWCfgCallback)(void *opaque, uint8_t *data);
> >  typedef void (*FWCfgReadCallback)(void *opaque, uint32_t offset);
> >  
> > @@ -77,10 +86,13 @@ void fw_cfg_add_file_callback(FWCfgState *s,
> > const char *filename, void *data, size_t len);
> >  void *fw_cfg_modify_file(FWCfgState *s, const char *filename, void
> > *data, size_t len);
> > +FWCfgState *fw_cfg_init_io_dma(uint32_t iobase, uint32_t
> > dma_iobase,
> > +                                AddressSpace *dma_as);
> >  FWCfgState *fw_cfg_init_io(uint32_t iobase);
> >  FWCfgState *fw_cfg_init_mem(hwaddr ctl_addr, hwaddr data_addr);
> > -FWCfgState *fw_cfg_init_mem_wide(hwaddr ctl_addr, hwaddr data_addr,
> > -                                 uint32_t data_width);
> > +FWCfgState *fw_cfg_init_mem_wide(hwaddr ctl_addr,
> > +                                 hwaddr data_addr, uint32_t
> > data_width,
> > +                                 hwaddr dma_addr, AddressSpace
> > *dma_as); 
> >  FWCfgState *fw_cfg_find(void);
> >  
> > 
> 

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH v4 3/7] Implement fw_cfg DMA interface
  2015-10-01 14:36     ` Laszlo Ersek
  2015-10-01 15:52       ` Marc Marí
@ 2015-10-01 17:18       ` Peter Maydell
  2015-10-01 19:20         ` Laszlo Ersek
  1 sibling, 1 reply; 60+ messages in thread
From: Peter Maydell @ 2015-10-01 17:18 UTC (permalink / raw)
  To: Laszlo Ersek
  Cc: Drew, Stefan Hajnoczi, Gabriel L. Somlo, QEMU Developers,
	Kevin O'Connor, Gerd Hoffmann, Marc Marí

On 1 October 2015 at 15:36, Laszlo Ersek <lersek@redhat.com> wrote:
> I think I finally understand this difference now. It is all rooted in
> the difference between the internal APIs sysbus_add_io() and
> sysbus_init_mmio(). Both of these are called from the device realize
> functions, but the first (sysbus_add_io()) wants the IO port address at
> once, whereas the second (sysbus_init_mmio()) doesn't want the address
> -- the actual mapping (sysbus_mmio_map()) is delayed to board code; the
> device code doesn't want to be aware of it.

Yes. The sysbus_add_io() API is firmly wedded to the x86 I/O port
concept and to the idea that devices are at fixed I/O port addresses
which don't depend on what board they're in (because the few
non-x86 systems that set up some kind of "IO port" abstraction
are generally doing it to look more x86-like).

That said, sysbus_add_io() is one of those odd functions which
we use half a dozen times in the whole codebase and which leaves
me wondering if it ought to be refactored to work differently
(eg split into "declare IO ports" and "map IO ports into IO space"
like the mmio functions)...

thanks
-- PMM

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH v4 3/7] Implement fw_cfg DMA interface
  2015-10-01 17:18       ` Peter Maydell
@ 2015-10-01 19:20         ` Laszlo Ersek
  0 siblings, 0 replies; 60+ messages in thread
From: Laszlo Ersek @ 2015-10-01 19:20 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Drew, Stefan Hajnoczi, Gabriel L. Somlo, QEMU Developers,
	Kevin O'Connor, Gerd Hoffmann, Marc Marí

On 10/01/15 19:18, Peter Maydell wrote:
> On 1 October 2015 at 15:36, Laszlo Ersek <lersek@redhat.com> wrote:
>> I think I finally understand this difference now. It is all rooted in
>> the difference between the internal APIs sysbus_add_io() and
>> sysbus_init_mmio(). Both of these are called from the device realize
>> functions, but the first (sysbus_add_io()) wants the IO port address at
>> once, whereas the second (sysbus_init_mmio()) doesn't want the address
>> -- the actual mapping (sysbus_mmio_map()) is delayed to board code; the
>> device code doesn't want to be aware of it.
> 
> Yes. The sysbus_add_io() API is firmly wedded to the x86 I/O port
> concept and to the idea that devices are at fixed I/O port addresses
> which don't depend on what board they're in (because the few
> non-x86 systems that set up some kind of "IO port" abstraction
> are generally doing it to look more x86-like).
> 
> That said, sysbus_add_io() is one of those odd functions which
> we use half a dozen times in the whole codebase and which leaves
> me wondering if it ought to be refactored to work differently
> (eg split into "declare IO ports" and "map IO ports into IO space"
> like the mmio functions)...

I had the same idea looming in the back of my mind, but I wouldn't know
how to attack the refactoring (plus I wouldn't want to delay Marc's work
with it -- after all the function is not being introduced by this
series), so I didn't bring it up. :P

Thanks!
Laszlo

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH v4 3/7] Implement fw_cfg DMA interface
  2015-10-01 12:16   ` [Qemu-devel] [PATCH v4 3/7] Implement fw_cfg DMA interface Marc Marí
  2015-10-01 14:36     ` Laszlo Ersek
@ 2015-10-06 14:44     ` Stefan Hajnoczi
  2015-10-06 14:53       ` Peter Maydell
  2015-10-06 14:54       ` Marc Marí
  1 sibling, 2 replies; 60+ messages in thread
From: Stefan Hajnoczi @ 2015-10-06 14:44 UTC (permalink / raw)
  To: Marc Marí
  Cc: Drew, Gabriel L. Somlo, qemu-devel, Kevin O'Connor,
	Gerd Hoffmann, Laszlo

On Thu, Oct 01, 2015 at 02:16:55PM +0200, Marc Marí wrote:
> @@ -292,6 +307,119 @@ static void fw_cfg_data_mem_write(void *opaque, hwaddr addr,
>      } while (i);
>  }
>  
> +static void fw_cfg_dma_transfer(FWCfgState *s)
> +{
> +    dma_addr_t len;
> +    FWCfgDmaAccess dma;
> +    int arch;
> +    FWCfgEntry *e;
> +    int read;
> +    dma_addr_t dma_addr;
> +
> +    /* Reset the address before the next access */
> +    dma_addr = s->dma_addr;
> +    s->dma_addr = 0;
> +
> +    dma.address = ldq_be_dma(s->dma_as,
> +                            dma_addr + offsetof(FWCfgDmaAccess, address));
> +    dma.length = ldl_be_dma(s->dma_as,
> +                            dma_addr + offsetof(FWCfgDmaAccess, length));
> +    dma.control = ldl_be_dma(s->dma_as,
> +                            dma_addr + offsetof(FWCfgDmaAccess, control));

ldq_be_dma() doesn't report errors.  If dma_addr is invalid the return
value could be anything.  Memory corruption inside the guest is possible
if the address/length/control values happen to cause a memory read
operation!

Instead, please use:

if (dma_memory_read(s->dma_as, dma_addr, &dma, sizeof(dma))) {
    stl_be_dma(s->dma_as, dma_addr + offsetof(FWCfgDmaAccess, control),
               FW_CFG_DMA_CTL_ERROR);
    return;
}

dma.address = be64_to_cpu(dma.address);
dma.length = be32_to_cpu(dma.length);
dma.control = be32_to_cpu(dma.control);

> +
> +    if (dma.control & FW_CFG_DMA_CTL_SELECT) {
> +        fw_cfg_select(s, dma.control >> 16);
> +    }
> +
> +    arch = !!(s->cur_entry & FW_CFG_ARCH_LOCAL);
> +    e = &s->entries[arch][s->cur_entry & FW_CFG_ENTRY_MASK];
> +
> +    if (dma.control & FW_CFG_DMA_CTL_READ) {
> +        read = 1;
> +    } else if (dma.control & FW_CFG_DMA_CTL_SKIP) {
> +        read = 0;
> +    } else {
> +        dma.length = 0;
> +    }
> +
> +    dma.control = 0;
> +
> +    while (dma.length > 0 && !(dma.control & FW_CFG_DMA_CTL_ERROR)) {
> +        if (s->cur_entry == FW_CFG_INVALID || !e->data ||
> +                                s->cur_offset >= e->len) {
> +            len = dma.length;
> +
> +            /* If the access is not a read access, it will be a skip access,
> +             * tested before.
> +             */
> +            if (read) {
> +                if (dma_memory_set(s->dma_as, dma.address, 0, len)) {
> +                    dma.control |= FW_CFG_DMA_CTL_ERROR;
> +                }
> +            }
> +
> +        } else {
> +            if (dma.length <= (e->len - s->cur_offset)) {
> +                len = dma.length;
> +            } else {
> +                len = (e->len - s->cur_offset);
> +            }
> +
> +            if (e->read_callback) {
> +                e->read_callback(e->callback_opaque, s->cur_offset);
> +            }
> +
> +            /* If the access is not a read access, it will be a skip access,
> +             * tested before.
> +             */
> +            if (read) {
> +                if (dma_memory_write(s->dma_as, dma.address,
> +                                    &e->data[s->cur_offset], len)) {
> +                    dma.control |= FW_CFG_DMA_CTL_ERROR;
> +                }
> +            }
> +
> +            s->cur_offset += len;
> +        }
> +
> +        dma.address += len;
> +        dma.length  -= len;

I thought these fields are written back to guest memory?  For example,
so the guest knows how many bytes were read before the error occurred.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH v4 3/7] Implement fw_cfg DMA interface
  2015-10-06 14:44     ` Stefan Hajnoczi
@ 2015-10-06 14:53       ` Peter Maydell
  2015-10-08  9:07         ` Stefan Hajnoczi
  2015-10-06 14:54       ` Marc Marí
  1 sibling, 1 reply; 60+ messages in thread
From: Peter Maydell @ 2015-10-06 14:53 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Drew, Gabriel L. Somlo, QEMU Developers, Kevin O'Connor,
	Gerd Hoffmann, Marc Marí, Laszlo

On 6 October 2015 at 15:44, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> On Thu, Oct 01, 2015 at 02:16:55PM +0200, Marc Marí wrote:
>> @@ -292,6 +307,119 @@ static void fw_cfg_data_mem_write(void *opaque, hwaddr addr,
>>      } while (i);
>>  }
>>
>> +static void fw_cfg_dma_transfer(FWCfgState *s)
>> +{
>> +    dma_addr_t len;
>> +    FWCfgDmaAccess dma;
>> +    int arch;
>> +    FWCfgEntry *e;
>> +    int read;
>> +    dma_addr_t dma_addr;
>> +
>> +    /* Reset the address before the next access */
>> +    dma_addr = s->dma_addr;
>> +    s->dma_addr = 0;
>> +
>> +    dma.address = ldq_be_dma(s->dma_as,
>> +                            dma_addr + offsetof(FWCfgDmaAccess, address));
>> +    dma.length = ldl_be_dma(s->dma_as,
>> +                            dma_addr + offsetof(FWCfgDmaAccess, length));
>> +    dma.control = ldl_be_dma(s->dma_as,
>> +                            dma_addr + offsetof(FWCfgDmaAccess, control));
>
> ldq_be_dma() doesn't report errors.  If dma_addr is invalid the return
> value could be anything.  Memory corruption inside the guest is possible
> if the address/length/control values happen to cause a memory read
> operation!

We discussed this in a previous revision. IMHO if the guest has
passed us a bogus dma_addr it should expect memory corruption.
We only need to be sure we don't allow a VM escape.

> Instead, please use:
>
> if (dma_memory_read(s->dma_as, dma_addr, &dma, sizeof(dma))) {
>     stl_be_dma(s->dma_as, dma_addr + offsetof(FWCfgDmaAccess, control),
>                FW_CFG_DMA_CTL_ERROR);

If the guest handed us a bad dma_addr then this write will also
be bogus and could corrupt the guest's memory.

thanks
-- PMM

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH v4 3/7] Implement fw_cfg DMA interface
  2015-10-06 14:53       ` Peter Maydell
@ 2015-10-08  9:07         ` Stefan Hajnoczi
  2015-10-08 10:01           ` Marc Marí
  0 siblings, 1 reply; 60+ messages in thread
From: Stefan Hajnoczi @ 2015-10-08  9:07 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Drew, Gabriel L. Somlo, QEMU Developers, Kevin O'Connor,
	Gerd Hoffmann, Marc Marí, Laszlo

On Tue, Oct 06, 2015 at 03:53:33PM +0100, Peter Maydell wrote:
> On 6 October 2015 at 15:44, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> > On Thu, Oct 01, 2015 at 02:16:55PM +0200, Marc Marí wrote:
> >> @@ -292,6 +307,119 @@ static void fw_cfg_data_mem_write(void *opaque, hwaddr addr,
> >>      } while (i);
> >>  }
> >>
> >> +static void fw_cfg_dma_transfer(FWCfgState *s)
> >> +{
> >> +    dma_addr_t len;
> >> +    FWCfgDmaAccess dma;
> >> +    int arch;
> >> +    FWCfgEntry *e;
> >> +    int read;
> >> +    dma_addr_t dma_addr;
> >> +
> >> +    /* Reset the address before the next access */
> >> +    dma_addr = s->dma_addr;
> >> +    s->dma_addr = 0;
> >> +
> >> +    dma.address = ldq_be_dma(s->dma_as,
> >> +                            dma_addr + offsetof(FWCfgDmaAccess, address));
> >> +    dma.length = ldl_be_dma(s->dma_as,
> >> +                            dma_addr + offsetof(FWCfgDmaAccess, length));
> >> +    dma.control = ldl_be_dma(s->dma_as,
> >> +                            dma_addr + offsetof(FWCfgDmaAccess, control));
> >
> > ldq_be_dma() doesn't report errors.  If dma_addr is invalid the return
> > value could be anything.  Memory corruption inside the guest is possible
> > if the address/length/control values happen to cause a memory read
> > operation!
> 
> We discussed this in a previous revision. IMHO if the guest has
> passed us a bogus dma_addr it should expect memory corruption.
> We only need to be sure we don't allow a VM escape.

Even if the guest-visible behavior doesn't matter, Valgrind won't like
this.  ldq_be_dma() reads from uninitialized stack memory:

#define DEFINE_LDST_DMA(_lname, _sname, _bits, _end) \
    static inline uint##_bits##_t ld##_lname##_##_end##_dma(AddressSpace *as, \
                                                            dma_addr_t addr) \
    {                                                                   \
        uint##_bits##_t val;                                            \
        dma_memory_read(as, addr, &val, (_bits) / 8);                   \
        return _end##_bits##_to_cpu(val);                               \
    }

Bad QEMU, bad userspace process :).

I think we really need to check the error and at least return early.

> > Instead, please use:
> >
> > if (dma_memory_read(s->dma_as, dma_addr, &dma, sizeof(dma))) {
> >     stl_be_dma(s->dma_as, dma_addr + offsetof(FWCfgDmaAccess, control),
> >                FW_CFG_DMA_CTL_ERROR);
> 
> If the guest handed us a bad dma_addr then this write will also
> be bogus and could corrupt the guest's memory.

That's fine because it's not a random address - it's the address that
the guest gave us.

Stefan

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH v4 3/7] Implement fw_cfg DMA interface
  2015-10-08  9:07         ` Stefan Hajnoczi
@ 2015-10-08 10:01           ` Marc Marí
  0 siblings, 0 replies; 60+ messages in thread
From: Marc Marí @ 2015-10-08 10:01 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Peter Maydell, Drew, Gabriel L. Somlo, QEMU Developers,
	Kevin O'Connor, Gerd Hoffmann, Laszlo

On Thu, 8 Oct 2015 10:07:34 +0100
Stefan Hajnoczi <stefanha@gmail.com> wrote:

> On Tue, Oct 06, 2015 at 03:53:33PM +0100, Peter Maydell wrote:
> > On 6 October 2015 at 15:44, Stefan Hajnoczi <stefanha@gmail.com>
> > wrote:
> > > On Thu, Oct 01, 2015 at 02:16:55PM +0200, Marc Marí wrote:
> > >> @@ -292,6 +307,119 @@ static void fw_cfg_data_mem_write(void
> > >> *opaque, hwaddr addr, } while (i);
> > >>  }
> > >>
> > >> +static void fw_cfg_dma_transfer(FWCfgState *s)
> > >> +{
> > >> +    dma_addr_t len;
> > >> +    FWCfgDmaAccess dma;
> > >> +    int arch;
> > >> +    FWCfgEntry *e;
> > >> +    int read;
> > >> +    dma_addr_t dma_addr;
> > >> +
> > >> +    /* Reset the address before the next access */
> > >> +    dma_addr = s->dma_addr;
> > >> +    s->dma_addr = 0;
> > >> +
> > >> +    dma.address = ldq_be_dma(s->dma_as,
> > >> +                            dma_addr + offsetof(FWCfgDmaAccess,
> > >> address));
> > >> +    dma.length = ldl_be_dma(s->dma_as,
> > >> +                            dma_addr + offsetof(FWCfgDmaAccess,
> > >> length));
> > >> +    dma.control = ldl_be_dma(s->dma_as,
> > >> +                            dma_addr + offsetof(FWCfgDmaAccess,
> > >> control));
> > >
> > > ldq_be_dma() doesn't report errors.  If dma_addr is invalid the
> > > return value could be anything.  Memory corruption inside the
> > > guest is possible if the address/length/control values happen to
> > > cause a memory read operation!
> > 
> > We discussed this in a previous revision. IMHO if the guest has
> > passed us a bogus dma_addr it should expect memory corruption.
> > We only need to be sure we don't allow a VM escape.
> 
> Even if the guest-visible behavior doesn't matter, Valgrind won't like
> this.  ldq_be_dma() reads from uninitialized stack memory:
> 
> #define DEFINE_LDST_DMA(_lname, _sname, _bits, _end) \
>     static inline uint##_bits##_t
> ld##_lname##_##_end##_dma(AddressSpace *as, \ dma_addr_t addr) \
>     {                                                                   \
>         uint##_bits##_t
> val;                                            \ dma_memory_read(as,
> addr, &val, (_bits) / 8);                   \ return
> _end##_bits##_to_cpu(val);                               \ }
> 
> Bad QEMU, bad userspace process :).
> 
> I think we really need to check the error and at least return early.

It doesn't hurt to check the error. I'll add it.

Thanks
Marc

> > > Instead, please use:
> > >
> > > if (dma_memory_read(s->dma_as, dma_addr, &dma, sizeof(dma))) {
> > >     stl_be_dma(s->dma_as, dma_addr + offsetof(FWCfgDmaAccess,
> > > control), FW_CFG_DMA_CTL_ERROR);
> > 
> > If the guest handed us a bad dma_addr then this write will also
> > be bogus and could corrupt the guest's memory.
> 
> That's fine because it's not a random address - it's the address that
> the guest gave us.
> 
> Stefan

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH v4 3/7] Implement fw_cfg DMA interface
  2015-10-06 14:44     ` Stefan Hajnoczi
  2015-10-06 14:53       ` Peter Maydell
@ 2015-10-06 14:54       ` Marc Marí
  1 sibling, 0 replies; 60+ messages in thread
From: Marc Marí @ 2015-10-06 14:54 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Drew, Gabriel L. Somlo, qemu-devel, Kevin O'Connor,
	Gerd Hoffmann, Laszlo

On Tue, 6 Oct 2015 15:44:53 +0100
Stefan Hajnoczi <stefanha@gmail.com> wrote:

> On Thu, Oct 01, 2015 at 02:16:55PM +0200, Marc Marí wrote:
> > @@ -292,6 +307,119 @@ static void fw_cfg_data_mem_write(void
> > *opaque, hwaddr addr, } while (i);
> >  }
> >  
> > +static void fw_cfg_dma_transfer(FWCfgState *s)
> > +{
> > +    dma_addr_t len;
> > +    FWCfgDmaAccess dma;
> > +    int arch;
> > +    FWCfgEntry *e;
> > +    int read;
> > +    dma_addr_t dma_addr;
> > +
> > +    /* Reset the address before the next access */
> > +    dma_addr = s->dma_addr;
> > +    s->dma_addr = 0;
> > +
> > +    dma.address = ldq_be_dma(s->dma_as,
> > +                            dma_addr + offsetof(FWCfgDmaAccess,
> > address));
> > +    dma.length = ldl_be_dma(s->dma_as,
> > +                            dma_addr + offsetof(FWCfgDmaAccess,
> > length));
> > +    dma.control = ldl_be_dma(s->dma_as,
> > +                            dma_addr + offsetof(FWCfgDmaAccess,
> > control));
> 
> ldq_be_dma() doesn't report errors.  If dma_addr is invalid the return
> value could be anything.  Memory corruption inside the guest is
> possible if the address/length/control values happen to cause a
> memory read operation!
> 
> Instead, please use:
> 
> if (dma_memory_read(s->dma_as, dma_addr, &dma, sizeof(dma))) {
>     stl_be_dma(s->dma_as, dma_addr + offsetof(FWCfgDmaAccess,
> control), FW_CFG_DMA_CTL_ERROR);
>     return;
> }
> 
> dma.address = be64_to_cpu(dma.address);
> dma.length = be32_to_cpu(dma.length);
> dma.control = be32_to_cpu(dma.control);
> 
> > +
> > +    if (dma.control & FW_CFG_DMA_CTL_SELECT) {
> > +        fw_cfg_select(s, dma.control >> 16);
> > +    }
> > +
> > +    arch = !!(s->cur_entry & FW_CFG_ARCH_LOCAL);
> > +    e = &s->entries[arch][s->cur_entry & FW_CFG_ENTRY_MASK];
> > +
> > +    if (dma.control & FW_CFG_DMA_CTL_READ) {
> > +        read = 1;
> > +    } else if (dma.control & FW_CFG_DMA_CTL_SKIP) {
> > +        read = 0;
> > +    } else {
> > +        dma.length = 0;
> > +    }
> > +
> > +    dma.control = 0;
> > +
> > +    while (dma.length > 0 && !(dma.control &
> > FW_CFG_DMA_CTL_ERROR)) {
> > +        if (s->cur_entry == FW_CFG_INVALID || !e->data ||
> > +                                s->cur_offset >= e->len) {
> > +            len = dma.length;
> > +
> > +            /* If the access is not a read access, it will be a
> > skip access,
> > +             * tested before.
> > +             */
> > +            if (read) {
> > +                if (dma_memory_set(s->dma_as, dma.address, 0,
> > len)) {
> > +                    dma.control |= FW_CFG_DMA_CTL_ERROR;
> > +                }
> > +            }
> > +
> > +        } else {
> > +            if (dma.length <= (e->len - s->cur_offset)) {
> > +                len = dma.length;
> > +            } else {
> > +                len = (e->len - s->cur_offset);
> > +            }
> > +
> > +            if (e->read_callback) {
> > +                e->read_callback(e->callback_opaque,
> > s->cur_offset);
> > +            }
> > +
> > +            /* If the access is not a read access, it will be a
> > skip access,
> > +             * tested before.
> > +             */
> > +            if (read) {
> > +                if (dma_memory_write(s->dma_as, dma.address,
> > +                                    &e->data[s->cur_offset], len))
> > {
> > +                    dma.control |= FW_CFG_DMA_CTL_ERROR;
> > +                }
> > +            }
> > +
> > +            s->cur_offset += len;
> > +        }
> > +
> > +        dma.address += len;
> > +        dma.length  -= len;
> 
> I thought these fields are written back to guest memory?  For example,
> so the guest knows how many bytes were read before the error occurred.

This was proposed here:

http://lists.gnu.org/archive/html/qemu-devel/2015-08/msg04001.html

I also don't see much benefit into knowing how many bytes were read. If
the guest is trying to read an invalid entry or past the end of that
entry, the memory will be filled with zeros. The only moment when an
error would be reported is when there's some problem in the mapping.
And this problem is bad enough to just abort the whole operation.

Regards
Marc

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [Qemu-devel] [PATCH v4 4/7] Enable fw_cfg DMA interface for ARM
  2015-10-01 12:16 ` [Qemu-devel] [PATCH v4 0/7] " Marc Marí
                     ` (2 preceding siblings ...)
  2015-10-01 12:16   ` [Qemu-devel] [PATCH v4 3/7] Implement fw_cfg DMA interface Marc Marí
@ 2015-10-01 12:16   ` Marc Marí
  2015-10-01 14:42     ` Laszlo Ersek
  2015-10-01 12:16   ` [Qemu-devel] [PATCH v4 5/7] Enable fw_cfg DMA interface for x86 Marc Marí
                     ` (3 subsequent siblings)
  7 siblings, 1 reply; 60+ messages in thread
From: Marc Marí @ 2015-10-01 12:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Drew, Stefan Hajnoczi, Gabriel L. Somlo, Kevin O'Connor,
	Gerd Hoffmann, Marc Marí, Laszlo

Enable the fw_cfg DMA interface for the ARM virt machine.

Based on Gerd Hoffman's initial implementation.

Signed-off-by: Marc Marí <markmb@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
---
 hw/arm/virt.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 7ae984f..0a39087 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -119,7 +119,7 @@ static const MemMapEntry a15memmap[] = {
     [VIRT_GIC_REDIST] =         { 0x080A0000, 0x00F60000 },
     [VIRT_UART] =               { 0x09000000, 0x00001000 },
     [VIRT_RTC] =                { 0x09010000, 0x00001000 },
-    [VIRT_FW_CFG] =             { 0x09020000, 0x0000000a },
+    [VIRT_FW_CFG] =             { 0x09020000, 0x00000018 },
     [VIRT_MMIO] =               { 0x0a000000, 0x00000200 },
     /* ...repeating for a total of NUM_VIRTIO_TRANSPORTS, each of that size */
     [VIRT_PLATFORM_BUS] =       { 0x0c000000, 0x02000000 },
@@ -677,13 +677,13 @@ static void create_flash(const VirtBoardInfo *vbi)
     g_free(nodename);
 }
 
-static void create_fw_cfg(const VirtBoardInfo *vbi)
+static void create_fw_cfg(const VirtBoardInfo *vbi, AddressSpace *as)
 {
     hwaddr base = vbi->memmap[VIRT_FW_CFG].base;
     hwaddr size = vbi->memmap[VIRT_FW_CFG].size;
     char *nodename;
 
-    fw_cfg_init_mem_wide(base + 8, base, 8, 0, NULL);
+    fw_cfg_init_mem_wide(base + 8, base, 8, base + 16, as);
 
     nodename = g_strdup_printf("/fw-cfg@%" PRIx64, base);
     qemu_fdt_add_subnode(vbi->fdt, nodename);
@@ -1026,7 +1026,7 @@ static void machvirt_init(MachineState *machine)
      */
     create_virtio_devices(vbi, pic);
 
-    create_fw_cfg(vbi);
+    create_fw_cfg(vbi, &address_space_memory);
     rom_set_fw(fw_cfg_find());
 
     guest_info->smp_cpus = smp_cpus;
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH v4 4/7] Enable fw_cfg DMA interface for ARM
  2015-10-01 12:16   ` [Qemu-devel] [PATCH v4 4/7] Enable fw_cfg DMA interface for ARM Marc Marí
@ 2015-10-01 14:42     ` Laszlo Ersek
  0 siblings, 0 replies; 60+ messages in thread
From: Laszlo Ersek @ 2015-10-01 14:42 UTC (permalink / raw)
  To: Marc Marí, qemu-devel
  Cc: Gabriel L. Somlo, Stefan Hajnoczi, Drew, Kevin O'Connor,
	Gerd Hoffmann

On 10/01/15 14:16, Marc Marí wrote:
> Enable the fw_cfg DMA interface for the ARM virt machine.
> 
> Based on Gerd Hoffman's initial implementation.
> 
> Signed-off-by: Marc Marí <markmb@redhat.com>
> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
> ---
>  hw/arm/virt.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index 7ae984f..0a39087 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -119,7 +119,7 @@ static const MemMapEntry a15memmap[] = {
>      [VIRT_GIC_REDIST] =         { 0x080A0000, 0x00F60000 },
>      [VIRT_UART] =               { 0x09000000, 0x00001000 },
>      [VIRT_RTC] =                { 0x09010000, 0x00001000 },
> -    [VIRT_FW_CFG] =             { 0x09020000, 0x0000000a },
> +    [VIRT_FW_CFG] =             { 0x09020000, 0x00000018 },
>      [VIRT_MMIO] =               { 0x0a000000, 0x00000200 },
>      /* ...repeating for a total of NUM_VIRTIO_TRANSPORTS, each of that size */
>      [VIRT_PLATFORM_BUS] =       { 0x0c000000, 0x02000000 },
> @@ -677,13 +677,13 @@ static void create_flash(const VirtBoardInfo *vbi)
>      g_free(nodename);
>  }
>  
> -static void create_fw_cfg(const VirtBoardInfo *vbi)
> +static void create_fw_cfg(const VirtBoardInfo *vbi, AddressSpace *as)
>  {
>      hwaddr base = vbi->memmap[VIRT_FW_CFG].base;
>      hwaddr size = vbi->memmap[VIRT_FW_CFG].size;
>      char *nodename;
>  
> -    fw_cfg_init_mem_wide(base + 8, base, 8, 0, NULL);
> +    fw_cfg_init_mem_wide(base + 8, base, 8, base + 16, as);
>  
>      nodename = g_strdup_printf("/fw-cfg@%" PRIx64, base);
>      qemu_fdt_add_subnode(vbi->fdt, nodename);
> @@ -1026,7 +1026,7 @@ static void machvirt_init(MachineState *machine)
>       */
>      create_virtio_devices(vbi, pic);
>  
> -    create_fw_cfg(vbi);
> +    create_fw_cfg(vbi, &address_space_memory);
>      rom_set_fw(fw_cfg_find());
>  
>      guest_info->smp_cpus = smp_cpus;
> 

Reviewed-by: Laszlo Ersek <lersek@redhat.com>

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [Qemu-devel] [PATCH v4 5/7] Enable fw_cfg DMA interface for x86
  2015-10-01 12:16 ` [Qemu-devel] [PATCH v4 0/7] " Marc Marí
                     ` (3 preceding siblings ...)
  2015-10-01 12:16   ` [Qemu-devel] [PATCH v4 4/7] Enable fw_cfg DMA interface for ARM Marc Marí
@ 2015-10-01 12:16   ` Marc Marí
  2015-10-01 14:48     ` Laszlo Ersek
  2015-10-01 12:16   ` [Qemu-devel] [PATCH v4 6/7] Make the kernel image in the fw_cfg DMA interface bootable Marc Marí
                     ` (2 subsequent siblings)
  7 siblings, 1 reply; 60+ messages in thread
From: Marc Marí @ 2015-10-01 12:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Drew, Stefan Hajnoczi, Gabriel L. Somlo, Kevin O'Connor,
	Gerd Hoffmann, Marc Marí, Laszlo

Enable the fw_cfg DMA interface for all the x86 platforms.

Based on Gerd Hoffman's initial implementation.

Signed-off-by: Marc Marí <markmb@redhat.com>
---
 hw/i386/pc.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 461c128..81d93b4 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -753,14 +753,15 @@ static void pc_build_smbios(FWCfgState *fw_cfg)
     }
 }
 
-static FWCfgState *bochs_bios_init(void)
+static FWCfgState *bochs_bios_init(AddressSpace *as)
 {
     FWCfgState *fw_cfg;
     uint64_t *numa_fw_cfg;
     int i, j;
     unsigned int apic_id_limit = pc_apic_id_limit(max_cpus);
 
-    fw_cfg = fw_cfg_init_io(BIOS_CFG_IOPORT);
+    fw_cfg = fw_cfg_init_io_dma(BIOS_CFG_IOPORT, BIOS_CFG_IOPORT + 4, as);
+
     /* FW_CFG_MAX_CPUS is a bit confusing/problematic on x86:
      *
      * SeaBIOS needs FW_CFG_MAX_CPUS for CPU hotplug, but the CPU hotplug
@@ -1407,7 +1408,8 @@ FWCfgState *pc_memory_init(PCMachineState *pcms,
                                         option_rom_mr,
                                         1);
 
-    fw_cfg = bochs_bios_init();
+    fw_cfg = bochs_bios_init(&address_space_memory);
+
     rom_set_fw(fw_cfg);
 
     if (guest_info->has_reserved_memory && pcms->hotplug_memory.base) {
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH v4 5/7] Enable fw_cfg DMA interface for x86
  2015-10-01 12:16   ` [Qemu-devel] [PATCH v4 5/7] Enable fw_cfg DMA interface for x86 Marc Marí
@ 2015-10-01 14:48     ` Laszlo Ersek
  0 siblings, 0 replies; 60+ messages in thread
From: Laszlo Ersek @ 2015-10-01 14:48 UTC (permalink / raw)
  To: Marc Marí, qemu-devel
  Cc: Gabriel L. Somlo, Stefan Hajnoczi, Drew, Kevin O'Connor,
	Gerd Hoffmann

On 10/01/15 14:16, Marc Marí wrote:
> Enable the fw_cfg DMA interface for all the x86 platforms.
> 
> Based on Gerd Hoffman's initial implementation.
> 
> Signed-off-by: Marc Marí <markmb@redhat.com>
> ---
>  hw/i386/pc.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 461c128..81d93b4 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -753,14 +753,15 @@ static void pc_build_smbios(FWCfgState *fw_cfg)
>      }
>  }
>  
> -static FWCfgState *bochs_bios_init(void)
> +static FWCfgState *bochs_bios_init(AddressSpace *as)
>  {
>      FWCfgState *fw_cfg;
>      uint64_t *numa_fw_cfg;
>      int i, j;
>      unsigned int apic_id_limit = pc_apic_id_limit(max_cpus);
>  
> -    fw_cfg = fw_cfg_init_io(BIOS_CFG_IOPORT);
> +    fw_cfg = fw_cfg_init_io_dma(BIOS_CFG_IOPORT, BIOS_CFG_IOPORT + 4, as);
> +
>      /* FW_CFG_MAX_CPUS is a bit confusing/problematic on x86:
>       *
>       * SeaBIOS needs FW_CFG_MAX_CPUS for CPU hotplug, but the CPU hotplug
> @@ -1407,7 +1408,8 @@ FWCfgState *pc_memory_init(PCMachineState *pcms,
>                                          option_rom_mr,
>                                          1);
>  
> -    fw_cfg = bochs_bios_init();
> +    fw_cfg = bochs_bios_init(&address_space_memory);
> +
>      rom_set_fw(fw_cfg);
>  
>      if (guest_info->has_reserved_memory && pcms->hotplug_memory.base) {
> 

Looks like this matches the agreement between Gerd and Peter.

http://thread.gmane.org/gmane.comp.emulators.qemu/363030/focus=363058

Reviewed-by: Laszlo Ersek <lersek@redhat.com>

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [Qemu-devel] [PATCH v4 6/7] Make the kernel image in the fw_cfg DMA interface bootable
  2015-10-01 12:16 ` [Qemu-devel] [PATCH v4 0/7] " Marc Marí
                     ` (4 preceding siblings ...)
  2015-10-01 12:16   ` [Qemu-devel] [PATCH v4 5/7] Enable fw_cfg DMA interface for x86 Marc Marí
@ 2015-10-01 12:16   ` Marc Marí
  2015-10-01 15:25     ` Laszlo Ersek
  2015-10-01 12:16   ` [Qemu-devel] [PATCH v4 7/7] fw_cfg: Define a static signature to be returned on DMA port reads Marc Marí
  2015-10-01 13:19   ` [Qemu-devel] [PATCH v4 0/7] fw_cfg DMA interface Kevin O'Connor
  7 siblings, 1 reply; 60+ messages in thread
From: Marc Marí @ 2015-10-01 12:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Drew, Stefan Hajnoczi, Gabriel L. Somlo, Kevin O'Connor,
	Gerd Hoffmann, Marc Marí, Laszlo

Add an entry to the bootorder file with name "vmlinux".
Give this entry more priority than the romfile.

Signed-off-by: Marc Marí <markmb@redhat.com>
---
 hw/i386/pc.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 81d93b4..c4c51f7 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -1012,8 +1012,10 @@ static void load_linux(PCMachineState *pcms,
     fw_cfg_add_bytes(fw_cfg, FW_CFG_SETUP_DATA, setup, setup_size);
 
     option_rom[nb_option_roms].name = "linuxboot.bin";
-    option_rom[nb_option_roms].bootindex = 0;
+    option_rom[nb_option_roms].bootindex = 1;
     nb_option_roms++;
+
+    add_boot_device_path(0, NULL, "vmlinux");
 }
 
 #define NE2000_NB_MAX 6
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH v4 6/7] Make the kernel image in the fw_cfg DMA interface bootable
  2015-10-01 12:16   ` [Qemu-devel] [PATCH v4 6/7] Make the kernel image in the fw_cfg DMA interface bootable Marc Marí
@ 2015-10-01 15:25     ` Laszlo Ersek
  2015-10-01 16:02       ` Kevin O'Connor
  2015-10-02  8:09       ` Gerd Hoffmann
  0 siblings, 2 replies; 60+ messages in thread
From: Laszlo Ersek @ 2015-10-01 15:25 UTC (permalink / raw)
  To: Marc Marí, qemu-devel
  Cc: Gabriel L. Somlo, Stefan Hajnoczi, Drew, Kevin O'Connor,
	Gerd Hoffmann

On 10/01/15 14:16, Marc Marí wrote:
> Add an entry to the bootorder file with name "vmlinux".
> Give this entry more priority than the romfile.
> 
> Signed-off-by: Marc Marí <markmb@redhat.com>
> ---
>  hw/i386/pc.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 81d93b4..c4c51f7 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -1012,8 +1012,10 @@ static void load_linux(PCMachineState *pcms,
>      fw_cfg_add_bytes(fw_cfg, FW_CFG_SETUP_DATA, setup, setup_size);
>  
>      option_rom[nb_option_roms].name = "linuxboot.bin";
> -    option_rom[nb_option_roms].bootindex = 0;
> +    option_rom[nb_option_roms].bootindex = 1;
>      nb_option_roms++;
> +
> +    add_boot_device_path(0, NULL, "vmlinux");
>  }
>  
>  #define NE2000_NB_MAX 6
> 

Where does this idea come from?

This will yet again break the invariant that the bootorder fw_cfg file
is a list of OpenFirmware device paths.

(The other annoying offender being "HALT", which caused me huge grief in
the OVMF OpenFirmware devpath parser parser, when libvirt decided that
"-boot strict=on" would become default.)

OVMF (and AAVMF) have been able to boot kernels directly from fw_cfg for
quite some time now, without the above change. They look at the fw_cfg
key 0x0008 (FW_CFG_KERNEL_SIZE). Direct kernel boot is being requested
iff the (little endian encoded) uint32 value is nonzero.

In QEMU, this role of FW_CFG_KERNEL_SIZE is true for:

- arm_load_kernel_notify() [hw/arm/boot.c], relied upon by AAVMF,

- load_multiboot() [hw/i386/multiboot.c] and
  load_linux() [hw/i386/pc.c], relied upon by OVMF,

- "hw/ppc/mac_newworld.c", "hw/ppc/mac_oldworld.c", "hw/sparc/sun4m.c",
  and "hw/sparc64/sun4u.c", relied upon by whatever boot firmware they
  have.

Why is this necessary for SeaBIOS?

... I can see the function bootprio_find_vmlinux(), in SeaBIOS patch

  [PATCH v4 2/2] Boot Linux using QEMU fw_cfg DMA interface

Given that direct kernel boot is always expected to take priority over
anything else (which is ensured by this QEMU patch too), can
bootprio_find_vmlinux() in SeaBIOS just look at the same fw_cfg key
(0x0008)?

I checked the QEMU_CFG_* macros in "src/fw/paravirt.c", and I think when
SeaBIOS boots an fw_cfg kernel *now*, it doesn't do it with its own
implementation; it probably launches the "linuxboot.bin" oprom (from
QEMU -- "pc-bios/optionrom/linuxboot.S").

I vaguely recall that this assembly code has been deemed unwieldy for
implementing the DMA interface (and I fully agree), which is why the
above-referenced SeaBIOS patch adds the capability to SeaBIOS itself. I
agree with that too.

But, instead of messing up the "bootorder" fw_cfg file, can
bootprio_find_vmlinux() look at the non-nullity of the
QEMU_CFG_KERNEL_SIZE key? Such as:

- read four bytes from under the fw_cfg selector QEMU_CFG_KERNEL_SIZE
  (0x0008),
- if it is zero,    return -1 --> no kernel boot requested,
- if it is nonzero, return  0 --> which means "top priority".

In other words, I agree with:

> -    option_rom[nb_option_roms].bootindex = 0;
> +    option_rom[nb_option_roms].bootindex = 1;

in this patch, but I disagree with:

> +    add_boot_device_path(0, NULL, "vmlinux");

Thank you
Laszlo

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH v4 6/7] Make the kernel image in the fw_cfg DMA interface bootable
  2015-10-01 15:25     ` Laszlo Ersek
@ 2015-10-01 16:02       ` Kevin O'Connor
  2015-10-01 16:10         ` Laszlo Ersek
                           ` (2 more replies)
  2015-10-02  8:09       ` Gerd Hoffmann
  1 sibling, 3 replies; 60+ messages in thread
From: Kevin O'Connor @ 2015-10-01 16:02 UTC (permalink / raw)
  To: Laszlo Ersek
  Cc: Drew, Stefan Hajnoczi, Gabriel L. Somlo, qemu-devel,
	Gerd Hoffmann, Marc Marí

On Thu, Oct 01, 2015 at 05:25:11PM +0200, Laszlo Ersek wrote:
> On 10/01/15 14:16, Marc Marí wrote:
> > Add an entry to the bootorder file with name "vmlinux".
> > Give this entry more priority than the romfile.
> > 
> > Signed-off-by: Marc Marí <markmb@redhat.com>
> > ---
> >  hw/i386/pc.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> > index 81d93b4..c4c51f7 100644
> > --- a/hw/i386/pc.c
> > +++ b/hw/i386/pc.c
> > @@ -1012,8 +1012,10 @@ static void load_linux(PCMachineState *pcms,
> >      fw_cfg_add_bytes(fw_cfg, FW_CFG_SETUP_DATA, setup, setup_size);
> >  
> >      option_rom[nb_option_roms].name = "linuxboot.bin";
> > -    option_rom[nb_option_roms].bootindex = 0;
> > +    option_rom[nb_option_roms].bootindex = 1;
> >      nb_option_roms++;
> > +
> > +    add_boot_device_path(0, NULL, "vmlinux");
> >  }
> >  
> >  #define NE2000_NB_MAX 6
> > 
> 
> Where does this idea come from?
> 
> This will yet again break the invariant that the bootorder fw_cfg file
> is a list of OpenFirmware device paths.

I believe it came from a discussion between myself and Marc, because I
did not like the way Marc's original SeaBIOS patches overloaded the
meaning of "genroms/linuxboot.bin" in the bootorder file.

[...]
> Given that direct kernel boot is always expected to take priority over
> anything else (which is ensured by this QEMU patch too), can
> bootprio_find_vmlinux() in SeaBIOS just look at the same fw_cfg key
> (0x0008)?

That's fine with me.  Marc - I think qemu_vmlinux_setup() in SeaBIOS
with the following would work:

void qemu_vmlinux_setup(void)
{
    u32 kernel_size;
    qemu_cfg_read_entry(&kernel_size, QEMU_CFG_KERNEL_SIZE, sizeof(kernel_size));
    if (kernel_size)
        boot_add_qemu_vmlinux("QEMU Kernel image", 0);
}

Marc, if you're okay with the above, you don't have to keep respinning
patches - I can fix it up upon commit.

-Kevin

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH v4 6/7] Make the kernel image in the fw_cfg DMA interface bootable
  2015-10-01 16:02       ` Kevin O'Connor
@ 2015-10-01 16:10         ` Laszlo Ersek
  2015-10-01 18:15         ` Marc Marí
  2015-10-02  8:16         ` Gerd Hoffmann
  2 siblings, 0 replies; 60+ messages in thread
From: Laszlo Ersek @ 2015-10-01 16:10 UTC (permalink / raw)
  To: Kevin O'Connor
  Cc: Drew, Stefan Hajnoczi, Gabriel L. Somlo, qemu-devel,
	Gerd Hoffmann, Marc Marí

On 10/01/15 18:02, Kevin O'Connor wrote:
> On Thu, Oct 01, 2015 at 05:25:11PM +0200, Laszlo Ersek wrote:
>> On 10/01/15 14:16, Marc Marí wrote:
>>> Add an entry to the bootorder file with name "vmlinux".
>>> Give this entry more priority than the romfile.
>>>
>>> Signed-off-by: Marc Marí <markmb@redhat.com>
>>> ---
>>>  hw/i386/pc.c | 4 +++-
>>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
>>> index 81d93b4..c4c51f7 100644
>>> --- a/hw/i386/pc.c
>>> +++ b/hw/i386/pc.c
>>> @@ -1012,8 +1012,10 @@ static void load_linux(PCMachineState *pcms,
>>>      fw_cfg_add_bytes(fw_cfg, FW_CFG_SETUP_DATA, setup, setup_size);
>>>  
>>>      option_rom[nb_option_roms].name = "linuxboot.bin";
>>> -    option_rom[nb_option_roms].bootindex = 0;
>>> +    option_rom[nb_option_roms].bootindex = 1;
>>>      nb_option_roms++;
>>> +
>>> +    add_boot_device_path(0, NULL, "vmlinux");
>>>  }
>>>  
>>>  #define NE2000_NB_MAX 6
>>>
>>
>> Where does this idea come from?
>>
>> This will yet again break the invariant that the bootorder fw_cfg file
>> is a list of OpenFirmware device paths.
> 
> I believe it came from a discussion between myself and Marc, because I
> did not like the way Marc's original SeaBIOS patches overloaded the
> meaning of "genroms/linuxboot.bin" in the bootorder file.
> 
> [...]
>> Given that direct kernel boot is always expected to take priority over
>> anything else (which is ensured by this QEMU patch too), can
>> bootprio_find_vmlinux() in SeaBIOS just look at the same fw_cfg key
>> (0x0008)?
> 
> That's fine with me.  Marc - I think qemu_vmlinux_setup() in SeaBIOS
> with the following would work:
> 
> void qemu_vmlinux_setup(void)
> {
>     u32 kernel_size;
>     qemu_cfg_read_entry(&kernel_size, QEMU_CFG_KERNEL_SIZE, sizeof(kernel_size));
>     if (kernel_size)
>         boot_add_qemu_vmlinux("QEMU Kernel image", 0);
> }

Thank you, Kevin; that would be great!
Laszlo

> Marc, if you're okay with the above, you don't have to keep respinning
> patches - I can fix it up upon commit.
> 
> -Kevin
> 

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH v4 6/7] Make the kernel image in the fw_cfg DMA interface bootable
  2015-10-01 16:02       ` Kevin O'Connor
  2015-10-01 16:10         ` Laszlo Ersek
@ 2015-10-01 18:15         ` Marc Marí
  2015-10-02  8:16         ` Gerd Hoffmann
  2 siblings, 0 replies; 60+ messages in thread
From: Marc Marí @ 2015-10-01 18:15 UTC (permalink / raw)
  To: Kevin O'Connor
  Cc: Drew, Stefan Hajnoczi, Gabriel L. Somlo, qemu-devel,
	Gerd Hoffmann, Laszlo Ersek

On Thu, 1 Oct 2015 12:02:42 -0400
"Kevin O'Connor" <kevin@koconnor.net> wrote:

> On Thu, Oct 01, 2015 at 05:25:11PM +0200, Laszlo Ersek wrote:
> > On 10/01/15 14:16, Marc Marí wrote:
> > > Add an entry to the bootorder file with name "vmlinux".
> > > Give this entry more priority than the romfile.
> > > 
> > > Signed-off-by: Marc Marí <markmb@redhat.com>
> > > ---
> > >  hw/i386/pc.c | 4 +++-
> > >  1 file changed, 3 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> > > index 81d93b4..c4c51f7 100644
> > > --- a/hw/i386/pc.c
> > > +++ b/hw/i386/pc.c
> > > @@ -1012,8 +1012,10 @@ static void load_linux(PCMachineState
> > > *pcms, fw_cfg_add_bytes(fw_cfg, FW_CFG_SETUP_DATA, setup,
> > > setup_size); 
> > >      option_rom[nb_option_roms].name = "linuxboot.bin";
> > > -    option_rom[nb_option_roms].bootindex = 0;
> > > +    option_rom[nb_option_roms].bootindex = 1;
> > >      nb_option_roms++;
> > > +
> > > +    add_boot_device_path(0, NULL, "vmlinux");
> > >  }
> > >  
> > >  #define NE2000_NB_MAX 6
> > > 
> > 
> > Where does this idea come from?
> > 
> > This will yet again break the invariant that the bootorder fw_cfg
> > file is a list of OpenFirmware device paths.
> 
> I believe it came from a discussion between myself and Marc, because I
> did not like the way Marc's original SeaBIOS patches overloaded the
> meaning of "genroms/linuxboot.bin" in the bootorder file.

I didn't like it either.

> [...]
> > Given that direct kernel boot is always expected to take priority
> > over anything else (which is ensured by this QEMU patch too), can
> > bootprio_find_vmlinux() in SeaBIOS just look at the same fw_cfg key
> > (0x0008)?
> 
> That's fine with me.  Marc - I think qemu_vmlinux_setup() in SeaBIOS
> with the following would work:
> 
> void qemu_vmlinux_setup(void)
> {
>     u32 kernel_size;
>     qemu_cfg_read_entry(&kernel_size, QEMU_CFG_KERNEL_SIZE,
> sizeof(kernel_size)); if (kernel_size)
>         boot_add_qemu_vmlinux("QEMU Kernel image", 0);
> }
> 
> Marc, if you're okay with the above, you don't have to keep respinning
> patches - I can fix it up upon commit.

I'm ok with it

Thanks
Marc

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH v4 6/7] Make the kernel image in the fw_cfg DMA interface bootable
  2015-10-01 16:02       ` Kevin O'Connor
  2015-10-01 16:10         ` Laszlo Ersek
  2015-10-01 18:15         ` Marc Marí
@ 2015-10-02  8:16         ` Gerd Hoffmann
  2015-10-02  8:24           ` Marc Marí
  2015-10-02 13:38           ` Kevin O'Connor
  2 siblings, 2 replies; 60+ messages in thread
From: Gerd Hoffmann @ 2015-10-02  8:16 UTC (permalink / raw)
  To: Kevin O'Connor
  Cc: Drew, Stefan Hajnoczi, Gabriel L. Somlo, qemu-devel,
	Marc Marí, Laszlo Ersek

  Hi,

> That's fine with me.  Marc - I think qemu_vmlinux_setup() in SeaBIOS
> with the following would work:
> 
> void qemu_vmlinux_setup(void)
> {
>     u32 kernel_size;
>     qemu_cfg_read_entry(&kernel_size, QEMU_CFG_KERNEL_SIZE, sizeof(kernel_size));
>     if (kernel_size)
>         boot_add_qemu_vmlinux("QEMU Kernel image", 0);
> }

It isn't that simple.  We also have support for multiboot kernels (using
multiboot.bin option rom).  So when doing this you need to be prepared
to find a multiboot kernel in fw_cfg.

cheers,
  Gerd

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH v4 6/7] Make the kernel image in the fw_cfg DMA interface bootable
  2015-10-02  8:16         ` Gerd Hoffmann
@ 2015-10-02  8:24           ` Marc Marí
  2015-10-02  9:01             ` Gerd Hoffmann
  2015-10-02 13:38           ` Kevin O'Connor
  1 sibling, 1 reply; 60+ messages in thread
From: Marc Marí @ 2015-10-02  8:24 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Drew, Stefan Hajnoczi, Gabriel L. Somlo, qemu-devel,
	Kevin O'Connor, Laszlo Ersek

On Fri, 02 Oct 2015 10:16:26 +0200
Gerd Hoffmann <kraxel@redhat.com> wrote:

>   Hi,
> 
> > That's fine with me.  Marc - I think qemu_vmlinux_setup() in SeaBIOS
> > with the following would work:
> > 
> > void qemu_vmlinux_setup(void)
> > {
> >     u32 kernel_size;
> >     qemu_cfg_read_entry(&kernel_size, QEMU_CFG_KERNEL_SIZE,
> > sizeof(kernel_size)); if (kernel_size)
> >         boot_add_qemu_vmlinux("QEMU Kernel image", 0);
> > }
> 
> It isn't that simple.  We also have support for multiboot kernels
> (using multiboot.bin option rom).  So when doing this you need to be
> prepared to find a multiboot kernel in fw_cfg.
> 
> cheers,
>   Gerd

A solution that I can see is adding DMA boot capabilities to the
linuxboot.S optionrom. I was trying to avoid this, but it looks like
not doing so creates lots of problems. It may be better than adding a
"nice" shortcut somewhere in QEMU or SeaBIOS.

Who uses this optionrom (and will benefit from this change)?

Thanks
Marc

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH v4 6/7] Make the kernel image in the fw_cfg DMA interface bootable
  2015-10-02  8:24           ` Marc Marí
@ 2015-10-02  9:01             ` Gerd Hoffmann
  2015-10-02 11:47               ` Laszlo Ersek
  0 siblings, 1 reply; 60+ messages in thread
From: Gerd Hoffmann @ 2015-10-02  9:01 UTC (permalink / raw)
  To: Marc Marí
  Cc: Drew, Stefan Hajnoczi, Gabriel L. Somlo, qemu-devel,
	Kevin O'Connor, Laszlo Ersek

  Hi,

> A solution that I can see is adding DMA boot capabilities to the
> linuxboot.S optionrom. I was trying to avoid this, but it looks like
> not doing so creates lots of problems. It may be better than adding a
> "nice" shortcut somewhere in QEMU or SeaBIOS.
> 
> Who uses this optionrom (and will benefit from this change)?

Any -kernel boot on x86 will use either linuxboot.bin or multiboot.bin.

>From a compatibility point of view implementing it in the option rom
certainly has its advantages.  And as the option roms are shipped with
and loaded by qemu you don't even have to do runtime detection if that
simplifies things.  We can just build two versions of the roms and qemu
can use the one or the other depending on whenever the machine type in
question has fw_cfg dma enabled or not.

cheers,
  Gerd

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH v4 6/7] Make the kernel image in the fw_cfg DMA interface bootable
  2015-10-02  9:01             ` Gerd Hoffmann
@ 2015-10-02 11:47               ` Laszlo Ersek
  2015-10-02 12:07                 ` Gerd Hoffmann
  0 siblings, 1 reply; 60+ messages in thread
From: Laszlo Ersek @ 2015-10-02 11:47 UTC (permalink / raw)
  To: Gerd Hoffmann, Marc Marí
  Cc: Stefan Hajnoczi, Drew, Kevin O'Connor, Gabriel L. Somlo,
	qemu-devel

On 10/02/15 11:01, Gerd Hoffmann wrote:
>   Hi,
> 
>> A solution that I can see is adding DMA boot capabilities to the
>> linuxboot.S optionrom. I was trying to avoid this, but it looks like
>> not doing so creates lots of problems. It may be better than adding a
>> "nice" shortcut somewhere in QEMU or SeaBIOS.
>>
>> Who uses this optionrom (and will benefit from this change)?
> 
> Any -kernel boot on x86 will use either linuxboot.bin or multiboot.bin.

(Except when your firmware is OVMF -- OVMF has its own LoadLinuxLib. So,
if you decide to extend linuxboot.bin / multiboot.bin with the DMA
capability, that can't regress OVMF by definition, and you certainly
won't hear me complain.)

Thanks
Laszlo

> From a compatibility point of view implementing it in the option rom
> certainly has its advantages.  And as the option roms are shipped with
> and loaded by qemu you don't even have to do runtime detection if that
> simplifies things.  We can just build two versions of the roms and qemu
> can use the one or the other depending on whenever the machine type in
> question has fw_cfg dma enabled or not.
> 
> cheers,
>   Gerd
> 
> 
> 
> 

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH v4 6/7] Make the kernel image in the fw_cfg DMA interface bootable
  2015-10-02 11:47               ` Laszlo Ersek
@ 2015-10-02 12:07                 ` Gerd Hoffmann
  2015-10-02 13:25                   ` Laszlo Ersek
  0 siblings, 1 reply; 60+ messages in thread
From: Gerd Hoffmann @ 2015-10-02 12:07 UTC (permalink / raw)
  To: Laszlo Ersek
  Cc: Drew, Stefan Hajnoczi, Gabriel L. Somlo, qemu-devel,
	Kevin O'Connor, Marc Marí

  Hi,

> > Any -kernel boot on x86 will use either linuxboot.bin or multiboot.bin.
> 
> (Except when your firmware is OVMF -- OVMF has its own LoadLinuxLib. So,
> if you decide to extend linuxboot.bin / multiboot.bin with the DMA
> capability, that can't regress OVMF by definition, and you certainly
> won't hear me complain.)

What does ovmf expect btw?  linux kernel with efi stub I assume?  Could
you also load efi apps, i.e. something like "qemu -kernel shell.efi"?

cheers,
  Gerd

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH v4 6/7] Make the kernel image in the fw_cfg DMA interface bootable
  2015-10-02 12:07                 ` Gerd Hoffmann
@ 2015-10-02 13:25                   ` Laszlo Ersek
  2015-10-02 13:30                     ` Laszlo Ersek
  2015-10-03  0:05                     ` Jordan Justen
  0 siblings, 2 replies; 60+ messages in thread
From: Laszlo Ersek @ 2015-10-02 13:25 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Drew, Matt Fleming, Stefan Hajnoczi, Gabriel L. Somlo, qemu-devel,
	Kevin O'Connor, Jordan Justen (Intel address), Marc Marí

On 10/02/15 14:07, Gerd Hoffmann wrote:
>   Hi,
> 
>>> Any -kernel boot on x86 will use either linuxboot.bin or multiboot.bin.
>>
>> (Except when your firmware is OVMF -- OVMF has its own LoadLinuxLib. So,
>> if you decide to extend linuxboot.bin / multiboot.bin with the DMA
>> capability, that can't regress OVMF by definition, and you certainly
>> won't hear me complain.)
> 
> What does ovmf expect btw?  linux kernel with efi stub I assume?

That's a hard question for me to answer :) (The library was written /
ported by Jordan, so I'm not responding from personal memory.)

In Dec 2014 - Jan 2015, Matt, Paolo, Jordan & myself had a long
discussion about the different ways to boot an EFI kernel (subject "the
different ways to boot an EFI kernel"). Ultimately Matt wrote an article:

http://www.uefidk.com/blog/linux-efi-boot-stub

To distill the discussion (and I hope Matt will correct me if I'm wrong,
although I'll be heavily stealing from his emails), there are three ways:

(1) Legacy EFI boot where the boot loader does *everything*
(2) EFI boot stub
(3) EFI handover protocol

In (1), the boot loader calls ExitBootServices().

"OvmfPkg/Library/LoadLinuxLib" supports this method, for directly
booting a kernel from fw_cfg, on the path that it calls "Old kernels
without EFI handover protocol".

The problem with (1) is that "all the smarts
of booting a Linux kernel on EFI platforms are in the boot loader",
which includes workarounds for platform bugs, and lock-step development
between kernel and boot loader.

(2) From Matt: "This method makes the Linux kernel appear to be
a PE/COFF executable. This allows us to perform bug workarounds early in
the kernel source because it's the kernel that calls ExitBootServices().
Additional goodness is obtained by the fact that you no longer need to
try as hard to keep kernel and boot loader development in sync because
the boot loader does very little and all smarts are in the kernel."

The general drawback here is that without a boot loader, you can "only"
load the kernel image (and the EFI stub of the kernel can "only" load
the initrd) from filesystems for which the firmware has support (ie.
those that it exposes with EFI_SIMPLE_FILE_SYSTEM_PROTOCOL interfaces).

OvmfPkg/Library/LoadLinuxLib does *not* support this method.

However, ArmVirtPkg/Library/PlatformIntelBdsLib/QemuKernel.c does. (In
fact, for AAVMF, this is the only method supported.) The "drawback"
remark about filesystems does not apply, because the kernel, initrd and
cmdline blobs are retrieved from fw_cfg, and exposed in a synthetic
(memory-only, read-only) EFI_SIMPLE_FILE_SYSTEM_PROTOCOL. Then we launch
the kernel with gBS->LoadImage() and gBS->StartImage(), and the kernel
loads the initrd with the standard EFI_SIMPLE_FILE_SYSTEM_PROTOCOL
member functions.

(3) From Matt again: 'The third option, the EFI handover protocol, is a
happy medium between the first two approaches. You split the "EFI boot
smarts" between the boot loader and EFI boot stub, which allows you to
a) load files from non-FAT file systems via the boot loader and b) leave
all the EFI bug workarounds to the kernel developers because the kernel
is still responsible for calling ExitBootServices().'

Later Matt also mentioned that under (3) you can have extra bells and
whistles (graphics etc) in your boot loader, while the kernel still gets
early firmware access.

OVMF supports this method too.

I'll add that the "FAT file system" restriction that (3) is supposed to
remedy is a bit laxer in general; even without a GRUB-like boot loader,
you can load a kernel from PXE / TFTP, and generally from anything that
looks like an EFI_SIMPLE_FILE_SYSTEM_PROTOCOL.

So the summary is this:
- OVMF can load "legacy" kernels, implementing an appropriate boot
  loader and calling ExitBootServices() (1)

- OVMF can load EFI stubbed kernels, implementing an absolutely minimal
  boot loader (3)

- AAVMF can load EFI stubbed kernels only (see the rationale in
  <https://github.com/tianocore/edk2/commit/23d04b58>), in method (2).
  This was written by me, and I chose this over (1) and (3) because:

  - (1) would have never worked / made any sense for aarch64 -- see
    again the commit msg I referenced,

  - (3) would have been overkill -- all the extras that could have been
    granted by an external boot loader were useless here, and the EFI
    stub dependency of (3) enabled the simpler, direct LoadImage() /
    StartImage() method.

... Paolo asked why OVMF hadn't opted for (2) as well, to which Matt
replied -- if I understand correctly -- that in parallel with the
kernel's facilities being developed for (3), OVMF was supposed to
support / exercise those facilities too.

Another question was if (2) could be enabled in / ported to OVMF -- it
could be, yes, but I'm hard pressed for any reason.

> Could
> you also load efi apps, i.e. something like "qemu -kernel shell.efi"?

I seem to remember that this has been suggested by Jordan as well. The
synthetic file system + LoadImage() + StartImage(), seen in (2), would
be theoretically suitable for this. However, at least three things
aren't a good match now:

- The way the synthetic filesystem is currently populated. One thing we
  need to put in there is the kernel image, to be launched from the
  firmware with LoadImage() + StartImage(). However, the kernel uses
  the filesystem too (it looks for the initrd file there), therefore we
  populate the fs with that file too. Such a filesystem may not a good
  match for other EFI executables (especially not if they expect a
  writeable working directory).

- If you boot the kernel successfully, then StartImage() never returns.
  If it returns, then that's mostly considered an error, and the usual
  boot option processing commences, as if you had never specified
  "-kernel". This is probably not appropriate when your payload was the
  UEFI shell: exiting the shell (any shell) is completely normal for a
  user, and starting to process boot options right after that is
  probably unexpected / unintended.

- When the top level kernel boot function returns (due to a genuine
  kernel boot error, or because "-kernel" wasn't specified,
  or -- fictively -- because the shell exited normally), the synthetic
  filesystem is torn down. This is okay for a kernel payload (because if
  booting it failed for the first time, there's no reason to retry it),
  but a user might want to reenter (and again exit) the shell any
  number of times.

... I think I prefer to keep the shell built into the firmware, and/or
to keep it on "UefiShell.iso". I already consider "-kernel" an abuse of
fw_cfg (one that at this point we can't get rid of any more); let's not
make it worse. I think the *real* goal here is an easy-to-use,
zero-config semihosting solution; ie. accessing a host directory tree
within the guest.

However, that use case is a burning problem for full-blown guest OS-es
as well. Assuming we end up with a simple solution, I think I'd prefer
to implement a UEFI_DRIVER that exposes the same host-side tree as an
EFI_SIMPLE_FILE_SYSTEM_PROTOCOL instance to the firmware, over
bastardizing fw_cfg even more.

I don't yet know how dynamic host-side writes should be shown to guest
firmware, but in the worst case, I could just return EFI_MEDIA_CHANGED:

    If the medium is changed while there are open file handles to the
    volume, all file handles to the volume will return
    EFI_MEDIA_CHANGED. To access the files on the new medium, the
    volume must be reopened with OpenVolume().

Thanks
Laszlo

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH v4 6/7] Make the kernel image in the fw_cfg DMA interface bootable
  2015-10-02 13:25                   ` Laszlo Ersek
@ 2015-10-02 13:30                     ` Laszlo Ersek
  2015-10-03  0:05                     ` Jordan Justen
  1 sibling, 0 replies; 60+ messages in thread
From: Laszlo Ersek @ 2015-10-02 13:30 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Drew, Matt Fleming, Stefan Hajnoczi, Gabriel L. Somlo, qemu-devel,
	Kevin O'Connor, Jordan Justen (Intel address), Marc Marí

On 10/02/15 15:25, Laszlo Ersek wrote:

> In Dec 2014 - Jan 2015, Matt, Paolo, Jordan & myself had a long
> discussion about the different ways to boot an EFI kernel (subject "the
> different ways to boot an EFI kernel"). Ultimately Matt wrote an article:
> 
> http://www.uefidk.com/blog/linux-efi-boot-stub

Correction: that article predates our discussion. The one I wanted to
link is this:

http://lwn.net/Articles/632528/

Sorry
Laszlo

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH v4 6/7] Make the kernel image in the fw_cfg DMA interface bootable
  2015-10-02 13:25                   ` Laszlo Ersek
  2015-10-02 13:30                     ` Laszlo Ersek
@ 2015-10-03  0:05                     ` Jordan Justen
  1 sibling, 0 replies; 60+ messages in thread
From: Jordan Justen @ 2015-10-03  0:05 UTC (permalink / raw)
  To: Laszlo Ersek, Gerd Hoffmann
  Cc: Drew, Matt Fleming, Stefan Hajnoczi, Gabriel L. Somlo, qemu-devel,
	Kevin O'Connor, Marc Marí

On 2015-10-02 06:25:50, Laszlo Ersek wrote:
> On 10/02/15 14:07, Gerd Hoffmann wrote:
> >   Hi,
> > 
> >>> Any -kernel boot on x86 will use either linuxboot.bin or multiboot.bin.
> >>
> >> (Except when your firmware is OVMF -- OVMF has its own LoadLinuxLib. So,
> >> if you decide to extend linuxboot.bin / multiboot.bin with the DMA
> >> capability, that can't regress OVMF by definition, and you certainly
> >> won't hear me complain.)
> > 
> > What does ovmf expect btw?  linux kernel with efi stub I assume?
> 
> That's a hard question for me to answer :) (The library was written /
> ported by Jordan, so I'm not responding from personal memory.)
> 
> In Dec 2014 - Jan 2015, Matt, Paolo, Jordan & myself had a long
> discussion about the different ways to boot an EFI kernel (subject "the
> different ways to boot an EFI kernel"). Ultimately Matt wrote an article:
> 
> http://www.uefidk.com/blog/linux-efi-boot-stub
> 
> To distill the discussion (and I hope Matt will correct me if I'm wrong,
> although I'll be heavily stealing from his emails), there are three ways:
> 
> (1) Legacy EFI boot where the boot loader does *everything*
> (2) EFI boot stub
> (3) EFI handover protocol
> 
> In (1), the boot loader calls ExitBootServices().
> 
> "OvmfPkg/Library/LoadLinuxLib" supports this method, for directly
> booting a kernel from fw_cfg, on the path that it calls "Old kernels
> without EFI handover protocol".
> 
> The problem with (1) is that "all the smarts
> of booting a Linux kernel on EFI platforms are in the boot loader",
> which includes workarounds for platform bugs, and lock-step development
> between kernel and boot loader.
> 
> (2) From Matt: "This method makes the Linux kernel appear to be
> a PE/COFF executable. This allows us to perform bug workarounds early in
> the kernel source because it's the kernel that calls ExitBootServices().
> Additional goodness is obtained by the fact that you no longer need to
> try as hard to keep kernel and boot loader development in sync because
> the boot loader does very little and all smarts are in the kernel."
> 
> The general drawback here is that without a boot loader, you can "only"
> load the kernel image (and the EFI stub of the kernel can "only" load
> the initrd) from filesystems for which the firmware has support (ie.
> those that it exposes with EFI_SIMPLE_FILE_SYSTEM_PROTOCOL interfaces).
> 
> OvmfPkg/Library/LoadLinuxLib does *not* support this method.
> 
> However, ArmVirtPkg/Library/PlatformIntelBdsLib/QemuKernel.c does. (In
> fact, for AAVMF, this is the only method supported.) The "drawback"
> remark about filesystems does not apply, because the kernel, initrd and
> cmdline blobs are retrieved from fw_cfg, and exposed in a synthetic
> (memory-only, read-only) EFI_SIMPLE_FILE_SYSTEM_PROTOCOL. Then we launch
> the kernel with gBS->LoadImage() and gBS->StartImage(), and the kernel
> loads the initrd with the standard EFI_SIMPLE_FILE_SYSTEM_PROTOCOL
> member functions.
> 
> (3) From Matt again: 'The third option, the EFI handover protocol, is a
> happy medium between the first two approaches. You split the "EFI boot
> smarts" between the boot loader and EFI boot stub, which allows you to
> a) load files from non-FAT file systems via the boot loader and b) leave
> all the EFI bug workarounds to the kernel developers because the kernel
> is still responsible for calling ExitBootServices().'
> 
> Later Matt also mentioned that under (3) you can have extra bells and
> whistles (graphics etc) in your boot loader, while the kernel still gets
> early firmware access.
> 
> OVMF supports this method too.
> 
> I'll add that the "FAT file system" restriction that (3) is supposed to
> remedy is a bit laxer in general; even without a GRUB-like boot loader,
> you can load a kernel from PXE / TFTP, and generally from anything that
> looks like an EFI_SIMPLE_FILE_SYSTEM_PROTOCOL.
> 
> So the summary is this:
> - OVMF can load "legacy" kernels, implementing an appropriate boot
>   loader and calling ExitBootServices() (1)
> 
> - OVMF can load EFI stubbed kernels, implementing an absolutely minimal
>   boot loader (3)
> 
> - AAVMF can load EFI stubbed kernels only (see the rationale in
>   <https://github.com/tianocore/edk2/commit/23d04b58>), in method (2).
>   This was written by me, and I chose this over (1) and (3) because:
> 
>   - (1) would have never worked / made any sense for aarch64 -- see
>     again the commit msg I referenced,
> 
>   - (3) would have been overkill -- all the extras that could have been
>     granted by an external boot loader were useless here, and the EFI
>     stub dependency of (3) enabled the simpler, direct LoadImage() /
>     StartImage() method.
> 
> ... Paolo asked why OVMF hadn't opted for (2) as well, to which Matt
> replied -- if I understand correctly -- that in parallel with the
> kernel's facilities being developed for (3), OVMF was supposed to
> support / exercise those facilities too.
> 
> Another question was if (2) could be enabled in / ported to OVMF -- it
> could be, yes, but I'm hard pressed for any reason.

I already did this nearly 3 years ago, but then I dropped the ball.

http://thread.gmane.org/gmane.comp.bios.tianocore.devel/925

I'd still like to add this to OVMF, since it enables any old EFI
application to be run with qemu's -kernel parameter. For example, I
was able to launch the EFI shells with it.

-Jordan

> > Could
> > you also load efi apps, i.e. something like "qemu -kernel shell.efi"?
> 
> I seem to remember that this has been suggested by Jordan as well. The
> synthetic file system + LoadImage() + StartImage(), seen in (2), would
> be theoretically suitable for this. However, at least three things
> aren't a good match now:
> 
> - The way the synthetic filesystem is currently populated. One thing we
>   need to put in there is the kernel image, to be launched from the
>   firmware with LoadImage() + StartImage(). However, the kernel uses
>   the filesystem too (it looks for the initrd file there), therefore we
>   populate the fs with that file too. Such a filesystem may not a good
>   match for other EFI executables (especially not if they expect a
>   writeable working directory).
> 
> - If you boot the kernel successfully, then StartImage() never returns.
>   If it returns, then that's mostly considered an error, and the usual
>   boot option processing commences, as if you had never specified
>   "-kernel". This is probably not appropriate when your payload was the
>   UEFI shell: exiting the shell (any shell) is completely normal for a
>   user, and starting to process boot options right after that is
>   probably unexpected / unintended.
> 
> - When the top level kernel boot function returns (due to a genuine
>   kernel boot error, or because "-kernel" wasn't specified,
>   or -- fictively -- because the shell exited normally), the synthetic
>   filesystem is torn down. This is okay for a kernel payload (because if
>   booting it failed for the first time, there's no reason to retry it),
>   but a user might want to reenter (and again exit) the shell any
>   number of times.
> 
> ... I think I prefer to keep the shell built into the firmware, and/or
> to keep it on "UefiShell.iso". I already consider "-kernel" an abuse of
> fw_cfg (one that at this point we can't get rid of any more); let's not
> make it worse. I think the *real* goal here is an easy-to-use,
> zero-config semihosting solution; ie. accessing a host directory tree
> within the guest.
> 
> However, that use case is a burning problem for full-blown guest OS-es
> as well. Assuming we end up with a simple solution, I think I'd prefer
> to implement a UEFI_DRIVER that exposes the same host-side tree as an
> EFI_SIMPLE_FILE_SYSTEM_PROTOCOL instance to the firmware, over
> bastardizing fw_cfg even more.
> 
> I don't yet know how dynamic host-side writes should be shown to guest
> firmware, but in the worst case, I could just return EFI_MEDIA_CHANGED:
> 
>     If the medium is changed while there are open file handles to the
>     volume, all file handles to the volume will return
>     EFI_MEDIA_CHANGED. To access the files on the new medium, the
>     volume must be reopened with OpenVolume().
> 
> Thanks
> Laszlo
> 

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH v4 6/7] Make the kernel image in the fw_cfg DMA interface bootable
  2015-10-02  8:16         ` Gerd Hoffmann
  2015-10-02  8:24           ` Marc Marí
@ 2015-10-02 13:38           ` Kevin O'Connor
  2015-10-05  9:18             ` Gerd Hoffmann
  1 sibling, 1 reply; 60+ messages in thread
From: Kevin O'Connor @ 2015-10-02 13:38 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Drew, Stefan Hajnoczi, Gabriel L. Somlo, qemu-devel,
	Marc Marí, Laszlo Ersek

On Fri, Oct 02, 2015 at 10:16:26AM +0200, Gerd Hoffmann wrote:
>   Hi,
> 
> > That's fine with me.  Marc - I think qemu_vmlinux_setup() in SeaBIOS
> > with the following would work:
> > 
> > void qemu_vmlinux_setup(void)
> > {
> >     u32 kernel_size;
> >     qemu_cfg_read_entry(&kernel_size, QEMU_CFG_KERNEL_SIZE, sizeof(kernel_size));
> >     if (kernel_size)
> >         boot_add_qemu_vmlinux("QEMU Kernel image", 0);
> > }
> 
> It isn't that simple.  We also have support for multiboot kernels (using
> multiboot.bin option rom).  So when doing this you need to be prepared
> to find a multiboot kernel in fw_cfg.

Is there some way to detect if it's a multiboot kernel?  If so,
seabios can just fall back to using multiboot.bin.

-Kevin

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH v4 6/7] Make the kernel image in the fw_cfg DMA interface bootable
  2015-10-02 13:38           ` Kevin O'Connor
@ 2015-10-05  9:18             ` Gerd Hoffmann
  0 siblings, 0 replies; 60+ messages in thread
From: Gerd Hoffmann @ 2015-10-05  9:18 UTC (permalink / raw)
  To: Kevin O'Connor
  Cc: Drew, Stefan Hajnoczi, Gabriel L. Somlo, qemu-devel,
	Marc Marí, Laszlo Ersek

On Fr, 2015-10-02 at 09:38 -0400, Kevin O'Connor wrote:
> On Fri, Oct 02, 2015 at 10:16:26AM +0200, Gerd Hoffmann wrote:
> >   Hi,
> > 
> > > That's fine with me.  Marc - I think qemu_vmlinux_setup() in SeaBIOS
> > > with the following would work:
> > > 
> > > void qemu_vmlinux_setup(void)
> > > {
> > >     u32 kernel_size;
> > >     qemu_cfg_read_entry(&kernel_size, QEMU_CFG_KERNEL_SIZE, sizeof(kernel_size));
> > >     if (kernel_size)
> > >         boot_add_qemu_vmlinux("QEMU Kernel image", 0);
> > > }
> > 
> > It isn't that simple.  We also have support for multiboot kernels (using
> > multiboot.bin option rom).  So when doing this you need to be prepared
> > to find a multiboot kernel in fw_cfg.
> 
> Is there some way to detect if it's a multiboot kernel?

Yes.  There is a header with magic + checksum in the first 8k, see
hw/i386/multiboot.c (in qemu).

Or check the option rom name in the bootorder file, it's multiboot
instead of linuxboot.

>   If so,
> seabios can just fall back to using multiboot.bin.

Or add multiboot support to seabios.

cheers,
  Gerd

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH v4 6/7] Make the kernel image in the fw_cfg DMA interface bootable
  2015-10-01 15:25     ` Laszlo Ersek
  2015-10-01 16:02       ` Kevin O'Connor
@ 2015-10-02  8:09       ` Gerd Hoffmann
  2015-10-02 13:40         ` Kevin O'Connor
  1 sibling, 1 reply; 60+ messages in thread
From: Gerd Hoffmann @ 2015-10-02  8:09 UTC (permalink / raw)
  To: Laszlo Ersek
  Cc: Drew, Stefan Hajnoczi, Gabriel L. Somlo, qemu-devel,
	Kevin O'Connor, Marc Marí

> - read four bytes from under the fw_cfg selector QEMU_CFG_KERNEL_SIZE
>   (0x0008),
> - if it is zero,    return -1 --> no kernel boot requested,
> - if it is nonzero, return  0 --> which means "top priority".
> 
> In other words, I agree with:
> 
> > -    option_rom[nb_option_roms].bootindex = 0;
> > +    option_rom[nb_option_roms].bootindex = 1;

Hmm.  That makes the boot order undefined for "qemu -kernel foo -device
virtio-blk,drive=bar,bootindex=1" when using an old seabios.  I don't
think this is a good idea.

cheers,
  Gerd

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH v4 6/7] Make the kernel image in the fw_cfg DMA interface bootable
  2015-10-02  8:09       ` Gerd Hoffmann
@ 2015-10-02 13:40         ` Kevin O'Connor
  2015-10-02 13:50           ` Laszlo Ersek
                             ` (2 more replies)
  0 siblings, 3 replies; 60+ messages in thread
From: Kevin O'Connor @ 2015-10-02 13:40 UTC (permalink / raw)
  To: Gerd Hoffmann
  Cc: Drew, Stefan Hajnoczi, Gabriel L. Somlo, qemu-devel,
	Marc Marí, Laszlo Ersek

On Fri, Oct 02, 2015 at 10:09:17AM +0200, Gerd Hoffmann wrote:
> > - read four bytes from under the fw_cfg selector QEMU_CFG_KERNEL_SIZE
> >   (0x0008),
> > - if it is zero,    return -1 --> no kernel boot requested,
> > - if it is nonzero, return  0 --> which means "top priority".
> > 
> > In other words, I agree with:
> > 
> > > -    option_rom[nb_option_roms].bootindex = 0;
> > > +    option_rom[nb_option_roms].bootindex = 1;

The bootindex in QEMU is not visible in the firmware, so if the rest
of patch 6 is dropped then the above should be dropped as well.

> Hmm.  That makes the boot order undefined for "qemu -kernel foo -device
> virtio-blk,drive=bar,bootindex=1" when using an old seabios.  I don't
> think this is a good idea.

Wouldn't that make the bootorder undefined everywhere?  What does it
mean to use -kernel and specify a bootorder?

-Kevin

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH v4 6/7] Make the kernel image in the fw_cfg DMA interface bootable
  2015-10-02 13:40         ` Kevin O'Connor
@ 2015-10-02 13:50           ` Laszlo Ersek
  2015-10-02 15:24           ` Daniel P. Berrange
  2015-10-05  9:26           ` Gerd Hoffmann
  2 siblings, 0 replies; 60+ messages in thread
From: Laszlo Ersek @ 2015-10-02 13:50 UTC (permalink / raw)
  To: Kevin O'Connor, Gerd Hoffmann
  Cc: Gabriel L. Somlo, Stefan Hajnoczi, Marc Marí, Drew,
	qemu-devel

On 10/02/15 15:40, Kevin O'Connor wrote:
> On Fri, Oct 02, 2015 at 10:09:17AM +0200, Gerd Hoffmann wrote:
>>> - read four bytes from under the fw_cfg selector QEMU_CFG_KERNEL_SIZE
>>>   (0x0008),
>>> - if it is zero,    return -1 --> no kernel boot requested,
>>> - if it is nonzero, return  0 --> which means "top priority".
>>>
>>> In other words, I agree with:
>>>
>>>> -    option_rom[nb_option_roms].bootindex = 0;
>>>> +    option_rom[nb_option_roms].bootindex = 1;
> 
> The bootindex in QEMU is not visible in the firmware, so if the rest
> of patch 6 is dropped then the above should be dropped as well.
> 
>> Hmm.  That makes the boot order undefined for "qemu -kernel foo -device
>> virtio-blk,drive=bar,bootindex=1" when using an old seabios.  I don't
>> think this is a good idea.
> 
> Wouldn't that make the bootorder undefined everywhere?  What does it
> mean to use -kernel and specify a bootorder?

OVMF & AAVMF look at QEMU_CFG_KERNEL_SIZE first, and process the
bootorder only after. (The kernel boot can fail due to various reasons,
after which it makes sense to start processing the bootorder fw_cfg file.)

Thanks
Laszlo

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH v4 6/7] Make the kernel image in the fw_cfg DMA interface bootable
  2015-10-02 13:40         ` Kevin O'Connor
  2015-10-02 13:50           ` Laszlo Ersek
@ 2015-10-02 15:24           ` Daniel P. Berrange
  2015-10-05  9:26           ` Gerd Hoffmann
  2 siblings, 0 replies; 60+ messages in thread
From: Daniel P. Berrange @ 2015-10-02 15:24 UTC (permalink / raw)
  To: Kevin O'Connor
  Cc: Drew, Stefan Hajnoczi, Gabriel L. Somlo, qemu-devel,
	Gerd Hoffmann, Marc Marí, Laszlo Ersek

On Fri, Oct 02, 2015 at 09:40:41AM -0400, Kevin O'Connor wrote:
> On Fri, Oct 02, 2015 at 10:09:17AM +0200, Gerd Hoffmann wrote:
> > > - read four bytes from under the fw_cfg selector QEMU_CFG_KERNEL_SIZE
> > >   (0x0008),
> > > - if it is zero,    return -1 --> no kernel boot requested,
> > > - if it is nonzero, return  0 --> which means "top priority".
> > > 
> > > In other words, I agree with:
> > > 
> > > > -    option_rom[nb_option_roms].bootindex = 0;
> > > > +    option_rom[nb_option_roms].bootindex = 1;
> 
> The bootindex in QEMU is not visible in the firmware, so if the rest
> of patch 6 is dropped then the above should be dropped as well.
> 
> > Hmm.  That makes the boot order undefined for "qemu -kernel foo -device
> > virtio-blk,drive=bar,bootindex=1" when using an old seabios.  I don't
> > think this is a good idea.
> 
> Wouldn't that make the bootorder undefined everywhere?  What does it
> mean to use -kernel and specify a bootorder?

I think it is pretty meaningless - I've always considered the use of
-kernel vs -boot / bootindex= to be mutually exclusive. I think that
libvirt leaves out -boot entirely if it adds -kernel. If the kernel
specified via -kernel doesn't boot for some reason (corrupt image
or wrong arch image or something) then I think users would reasonably
expect QEMU to not boot, rather than fallback to non-kernel boot
approach.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH v4 6/7] Make the kernel image in the fw_cfg DMA interface bootable
  2015-10-02 13:40         ` Kevin O'Connor
  2015-10-02 13:50           ` Laszlo Ersek
  2015-10-02 15:24           ` Daniel P. Berrange
@ 2015-10-05  9:26           ` Gerd Hoffmann
  2 siblings, 0 replies; 60+ messages in thread
From: Gerd Hoffmann @ 2015-10-05  9:26 UTC (permalink / raw)
  To: Kevin O'Connor
  Cc: Drew, Stefan Hajnoczi, Gabriel L. Somlo, qemu-devel,
	Marc Marí, Laszlo Ersek

On Fr, 2015-10-02 at 09:40 -0400, Kevin O'Connor wrote:
> On Fri, Oct 02, 2015 at 10:09:17AM +0200, Gerd Hoffmann wrote:
> > > - read four bytes from under the fw_cfg selector QEMU_CFG_KERNEL_SIZE
> > >   (0x0008),
> > > - if it is zero,    return -1 --> no kernel boot requested,
> > > - if it is nonzero, return  0 --> which means "top priority".
> > > 
> > > In other words, I agree with:
> > > 
> > > > -    option_rom[nb_option_roms].bootindex = 0;
> > > > +    option_rom[nb_option_roms].bootindex = 1;
> 
> The bootindex in QEMU is not visible in the firmware, so if the rest
> of patch 6 is dropped then the above should be dropped as well.
> 
> > Hmm.  That makes the boot order undefined for "qemu -kernel foo -device
> > virtio-blk,drive=bar,bootindex=1" when using an old seabios.  I don't
> > think this is a good idea.
> 
> Wouldn't that make the bootorder undefined everywhere? What does it
> mean to use -kernel and specify a bootorder?

Kernel gets priority 0, everything else what you specify (typically
configs start with bootindex=1), so the kernel gets the highest
priority.  So unless the user uses F12 to enter the boot menu and picks
something else the kernel is booted.

cheers,
  Gerd

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [Qemu-devel] [PATCH v4 7/7] fw_cfg: Define a static signature to be returned on DMA port reads
  2015-10-01 12:16 ` [Qemu-devel] [PATCH v4 0/7] " Marc Marí
                     ` (5 preceding siblings ...)
  2015-10-01 12:16   ` [Qemu-devel] [PATCH v4 6/7] Make the kernel image in the fw_cfg DMA interface bootable Marc Marí
@ 2015-10-01 12:16   ` Marc Marí
  2015-10-01 16:07     ` Laszlo Ersek
  2015-10-01 13:19   ` [Qemu-devel] [PATCH v4 0/7] fw_cfg DMA interface Kevin O'Connor
  7 siblings, 1 reply; 60+ messages in thread
From: Marc Marí @ 2015-10-01 12:16 UTC (permalink / raw)
  To: qemu-devel
  Cc: Drew, Stefan Hajnoczi, Gabriel L. Somlo, Kevin O'Connor,
	Gerd Hoffmann, Laszlo

From: Kevin O'Connor <kevin@koconnor.net>

Return a static signature ("QEMU CFG") if the guest does a read to the
DMA address io register.

Signed-off-by: Kevin O'Connor <kevin@koconnor.net>
---
 docs/specs/fw_cfg.txt |  4 ++++
 hw/nvram/fw_cfg.c     | 13 +++++++++++--
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/docs/specs/fw_cfg.txt b/docs/specs/fw_cfg.txt
index 2d6b2da..249b99e 100644
--- a/docs/specs/fw_cfg.txt
+++ b/docs/specs/fw_cfg.txt
@@ -93,6 +93,10 @@ by selecting the "signature" item using key 0x0000 (FW_CFG_SIGNATURE),
 and reading four bytes from the data register. If the fw_cfg device is
 present, the four bytes read will contain the characters "QEMU".
 
+Additionaly, if the DMA interface is available then a read to the DMA
+Address will return 0x51454d5520434647 ("QEMU CFG" in big-endian
+format).
+
 === Revision / feature bitmap (Key 0x0001, FW_CFG_ID) ===
 
 A 32-bit little-endian unsigned int, this item is used to check for enabled
diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c
index 59933b3..c6dcce4 100644
--- a/hw/nvram/fw_cfg.c
+++ b/hw/nvram/fw_cfg.c
@@ -53,6 +53,8 @@
 #define FW_CFG_DMA_CTL_SKIP    0x04
 #define FW_CFG_DMA_CTL_SELECT  0x08
 
+#define FW_CFG_DMA_SIGNATURE 0x51454d5520434647 /* "QEMU CFG" */
+
 typedef struct FWCfgEntry {
     uint32_t len;
     uint8_t *data;
@@ -393,6 +395,12 @@ static void fw_cfg_dma_transfer(FWCfgState *s)
     trace_fw_cfg_read(s, 0);
 }
 
+static uint64_t fw_cfg_dma_mem_read(void *opaque, hwaddr addr,
+                                    unsigned size)
+{
+    return FW_CFG_DMA_SIGNATURE >> ((8 - addr - size) * 8);
+}
+
 static void fw_cfg_dma_mem_write(void *opaque, hwaddr addr,
                                  uint64_t value, unsigned size)
 {
@@ -416,8 +424,8 @@ static void fw_cfg_dma_mem_write(void *opaque, hwaddr addr,
 static bool fw_cfg_dma_mem_valid(void *opaque, hwaddr addr,
                                   unsigned size, bool is_write)
 {
-    return is_write && ((size == 4 && (addr == 0 || addr == 4)) ||
-                        (size == 8 && addr == 0));
+    return !is_write || ((size == 4 && (addr == 0 || addr == 4)) ||
+                         (size == 8 && addr == 0));
 }
 
 static bool fw_cfg_data_mem_valid(void *opaque, hwaddr addr,
@@ -488,6 +496,7 @@ static const MemoryRegionOps fw_cfg_comb_mem_ops = {
 };
 
 static const MemoryRegionOps fw_cfg_dma_mem_ops = {
+    .read = fw_cfg_dma_mem_read,
     .write = fw_cfg_dma_mem_write,
     .endianness = DEVICE_BIG_ENDIAN,
     .valid.accepts = fw_cfg_dma_mem_valid,
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH v4 7/7] fw_cfg: Define a static signature to be returned on DMA port reads
  2015-10-01 12:16   ` [Qemu-devel] [PATCH v4 7/7] fw_cfg: Define a static signature to be returned on DMA port reads Marc Marí
@ 2015-10-01 16:07     ` Laszlo Ersek
  2015-10-01 17:02       ` Kevin O'Connor
  0 siblings, 1 reply; 60+ messages in thread
From: Laszlo Ersek @ 2015-10-01 16:07 UTC (permalink / raw)
  To: Marc Marí, qemu-devel
  Cc: Gabriel L. Somlo, Stefan Hajnoczi, Drew, Kevin O'Connor,
	Gerd Hoffmann

On 10/01/15 14:16, Marc Marí wrote:
> From: Kevin O'Connor <kevin@koconnor.net>
> 
> Return a static signature ("QEMU CFG") if the guest does a read to the
> DMA address io register.
> 
> Signed-off-by: Kevin O'Connor <kevin@koconnor.net>
> ---
>  docs/specs/fw_cfg.txt |  4 ++++
>  hw/nvram/fw_cfg.c     | 13 +++++++++++--
>  2 files changed, 15 insertions(+), 2 deletions(-)
> 
> diff --git a/docs/specs/fw_cfg.txt b/docs/specs/fw_cfg.txt
> index 2d6b2da..249b99e 100644
> --- a/docs/specs/fw_cfg.txt
> +++ b/docs/specs/fw_cfg.txt
> @@ -93,6 +93,10 @@ by selecting the "signature" item using key 0x0000 (FW_CFG_SIGNATURE),
>  and reading four bytes from the data register. If the fw_cfg device is
>  present, the four bytes read will contain the characters "QEMU".
>  
> +Additionaly, if the DMA interface is available then a read to the DMA
> +Address will return 0x51454d5520434647 ("QEMU CFG" in big-endian
> +format).

I suggest:

  Additionaly, if the DMA interface is available, then a read into the
  DMA Address register will return the matching substring of the
  "QEMU CFG" string. Characters that are at lower offsets of the
  substring being read will be stored at lower addresses in the guest.

> +
>  === Revision / feature bitmap (Key 0x0001, FW_CFG_ID) ===
>  
>  A 32-bit little-endian unsigned int, this item is used to check for enabled
> diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c
> index 59933b3..c6dcce4 100644
> --- a/hw/nvram/fw_cfg.c
> +++ b/hw/nvram/fw_cfg.c
> @@ -53,6 +53,8 @@
>  #define FW_CFG_DMA_CTL_SKIP    0x04
>  #define FW_CFG_DMA_CTL_SELECT  0x08
>  
> +#define FW_CFG_DMA_SIGNATURE 0x51454d5520434647 /* "QEMU CFG" */
> +
>  typedef struct FWCfgEntry {
>      uint32_t len;
>      uint8_t *data;
> @@ -393,6 +395,12 @@ static void fw_cfg_dma_transfer(FWCfgState *s)
>      trace_fw_cfg_read(s, 0);
>  }
>  
> +static uint64_t fw_cfg_dma_mem_read(void *opaque, hwaddr addr,
> +                                    unsigned size)
> +{
> +    return FW_CFG_DMA_SIGNATURE >> ((8 - addr - size) * 8);
> +}
> +

After re-reading the message of QEMU commit 36b62ae6a5, I think this is
mostly correct.

(I think *technically* it is fully correct, due to the truncation that
is implicit in the argument passing to bswapXX(), in the
adjust_endianness() function [memory.c].)

However, for the human reader's sake, I'd like to see the
over-significant bits masked off explicitly here, after the shift.

>  static void fw_cfg_dma_mem_write(void *opaque, hwaddr addr,
>                                   uint64_t value, unsigned size)
>  {
> @@ -416,8 +424,8 @@ static void fw_cfg_dma_mem_write(void *opaque, hwaddr addr,
>  static bool fw_cfg_dma_mem_valid(void *opaque, hwaddr addr,
>                                    unsigned size, bool is_write)
>  {
> -    return is_write && ((size == 4 && (addr == 0 || addr == 4)) ||
> -                        (size == 8 && addr == 0));
> +    return !is_write || ((size == 4 && (addr == 0 || addr == 4)) ||
> +                         (size == 8 && addr == 0));
>  }

This is not good enough I think; it's possible for the guest to do an
"inl" at offset 7 into the register, which will cause undefined behavior
in fw_cfg_dma_mem_read(). (The shift count being negative.)

Thus, for reading, please enforce:

  (addr + size <= 8)

(In an uncontrolled environment, I'd protect the addition against
overflow too, but here the individual addends are safe enough.)

>  
>  static bool fw_cfg_data_mem_valid(void *opaque, hwaddr addr,
> @@ -488,6 +496,7 @@ static const MemoryRegionOps fw_cfg_comb_mem_ops = {
>  };
>  
>  static const MemoryRegionOps fw_cfg_dma_mem_ops = {
> +    .read = fw_cfg_dma_mem_read,
>      .write = fw_cfg_dma_mem_write,
>      .endianness = DEVICE_BIG_ENDIAN,
>      .valid.accepts = fw_cfg_dma_mem_valid,
> 

Thanks
Laszlo

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH v4 7/7] fw_cfg: Define a static signature to be returned on DMA port reads
  2015-10-01 16:07     ` Laszlo Ersek
@ 2015-10-01 17:02       ` Kevin O'Connor
  2015-10-01 17:17         ` Laszlo Ersek
  0 siblings, 1 reply; 60+ messages in thread
From: Kevin O'Connor @ 2015-10-01 17:02 UTC (permalink / raw)
  To: Laszlo Ersek
  Cc: Drew, Stefan Hajnoczi, Gabriel L. Somlo, qemu-devel,
	Gerd Hoffmann, Marc Marí

On Thu, Oct 01, 2015 at 06:07:06PM +0200, Laszlo Ersek wrote:
> On 10/01/15 14:16, Marc Marí wrote:
> > From: Kevin O'Connor <kevin@koconnor.net>
> > 
> > Return a static signature ("QEMU CFG") if the guest does a read to the
> > DMA address io register.
> > 
> > Signed-off-by: Kevin O'Connor <kevin@koconnor.net>
> > ---
> >  docs/specs/fw_cfg.txt |  4 ++++
> >  hw/nvram/fw_cfg.c     | 13 +++++++++++--
> >  2 files changed, 15 insertions(+), 2 deletions(-)
> > 
> > diff --git a/docs/specs/fw_cfg.txt b/docs/specs/fw_cfg.txt
> > index 2d6b2da..249b99e 100644
> > --- a/docs/specs/fw_cfg.txt
> > +++ b/docs/specs/fw_cfg.txt
> > @@ -93,6 +93,10 @@ by selecting the "signature" item using key 0x0000 (FW_CFG_SIGNATURE),
> >  and reading four bytes from the data register. If the fw_cfg device is
> >  present, the four bytes read will contain the characters "QEMU".
> >  
> > +Additionaly, if the DMA interface is available then a read to the DMA
> > +Address will return 0x51454d5520434647 ("QEMU CFG" in big-endian
> > +format).
> 
> I suggest:
> 
>   Additionaly, if the DMA interface is available, then a read into the
>   DMA Address register will return the matching substring of the
>   "QEMU CFG" string. Characters that are at lower offsets of the
>   substring being read will be stored at lower addresses in the guest.

The interface only supports 32bit and 64bit writes, so I would suggest
the document not encourage 8bit reads.  The implementation happens to
works with 8bit reads, but that was just a side-effect.

[...]
> > @@ -393,6 +395,12 @@ static void fw_cfg_dma_transfer(FWCfgState *s)
> >      trace_fw_cfg_read(s, 0);
> >  }
> >  
> > +static uint64_t fw_cfg_dma_mem_read(void *opaque, hwaddr addr,
> > +                                    unsigned size)
> > +{
> > +    return FW_CFG_DMA_SIGNATURE >> ((8 - addr - size) * 8);
> > +}
> > +
> 
> After re-reading the message of QEMU commit 36b62ae6a5, I think this is
> mostly correct.
> 
> (I think *technically* it is fully correct, due to the truncation that
> is implicit in the argument passing to bswapXX(), in the
> adjust_endianness() function [memory.c].)
>
> However, for the human reader's sake, I'd like to see the
> over-significant bits masked off explicitly here, after the shift.

The value has to be truncated, as a guest inl() or readl() can't
possibly return more than 32 bits.

I think the code's harder to read with the mask in place.  Are there
any helper macros or functions in QEMU to help with register
read/writes of smaller/larger sizes than expected?  Maybe the
.impl.min_access_size should just be set to 4.

[...]
> > @@ -416,8 +424,8 @@ static void fw_cfg_dma_mem_write(void *opaque, hwaddr addr,
> >  static bool fw_cfg_dma_mem_valid(void *opaque, hwaddr addr,
> >                                    unsigned size, bool is_write)
> >  {
> > -    return is_write && ((size == 4 && (addr == 0 || addr == 4)) ||
> > -                        (size == 8 && addr == 0));
> > +    return !is_write || ((size == 4 && (addr == 0 || addr == 4)) ||
> > +                         (size == 8 && addr == 0));
> >  }
> 
> This is not good enough I think; it's possible for the guest to do an
> "inl" at offset 7 into the register, which will cause undefined behavior
> in fw_cfg_dma_mem_read(). (The shift count being negative.)

The function never gets called on unaligned accesses (.impl.unaligned
!= true), so that can't happen.

-Kevin

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH v4 7/7] fw_cfg: Define a static signature to be returned on DMA port reads
  2015-10-01 17:02       ` Kevin O'Connor
@ 2015-10-01 17:17         ` Laszlo Ersek
  0 siblings, 0 replies; 60+ messages in thread
From: Laszlo Ersek @ 2015-10-01 17:17 UTC (permalink / raw)
  To: Kevin O'Connor
  Cc: Drew, Stefan Hajnoczi, Gabriel L. Somlo, qemu-devel,
	Gerd Hoffmann, Marc Marí

On 10/01/15 19:02, Kevin O'Connor wrote:
> On Thu, Oct 01, 2015 at 06:07:06PM +0200, Laszlo Ersek wrote:
>> On 10/01/15 14:16, Marc Marí wrote:
>>> From: Kevin O'Connor <kevin@koconnor.net>
>>>
>>> Return a static signature ("QEMU CFG") if the guest does a read to the
>>> DMA address io register.
>>>
>>> Signed-off-by: Kevin O'Connor <kevin@koconnor.net>
>>> ---
>>>  docs/specs/fw_cfg.txt |  4 ++++
>>>  hw/nvram/fw_cfg.c     | 13 +++++++++++--
>>>  2 files changed, 15 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/docs/specs/fw_cfg.txt b/docs/specs/fw_cfg.txt
>>> index 2d6b2da..249b99e 100644
>>> --- a/docs/specs/fw_cfg.txt
>>> +++ b/docs/specs/fw_cfg.txt
>>> @@ -93,6 +93,10 @@ by selecting the "signature" item using key 0x0000 (FW_CFG_SIGNATURE),
>>>  and reading four bytes from the data register. If the fw_cfg device is
>>>  present, the four bytes read will contain the characters "QEMU".
>>>  
>>> +Additionaly, if the DMA interface is available then a read to the DMA
>>> +Address will return 0x51454d5520434647 ("QEMU CFG" in big-endian
>>> +format).
>>
>> I suggest:
>>
>>   Additionaly, if the DMA interface is available, then a read into the
>>   DMA Address register will return the matching substring of the
>>   "QEMU CFG" string. Characters that are at lower offsets of the
>>   substring being read will be stored at lower addresses in the guest.
> 
> The interface only supports 32bit and 64bit writes, so I would suggest
> the document not encourage 8bit reads.  The implementation happens to
> works with 8bit reads, but that was just a side-effect.

Okay, I don't insist. :)

> 
> [...]
>>> @@ -393,6 +395,12 @@ static void fw_cfg_dma_transfer(FWCfgState *s)
>>>      trace_fw_cfg_read(s, 0);
>>>  }
>>>  
>>> +static uint64_t fw_cfg_dma_mem_read(void *opaque, hwaddr addr,
>>> +                                    unsigned size)
>>> +{
>>> +    return FW_CFG_DMA_SIGNATURE >> ((8 - addr - size) * 8);
>>> +}
>>> +
>>
>> After re-reading the message of QEMU commit 36b62ae6a5, I think this is
>> mostly correct.
>>
>> (I think *technically* it is fully correct, due to the truncation that
>> is implicit in the argument passing to bswapXX(), in the
>> adjust_endianness() function [memory.c].)
>>
>> However, for the human reader's sake, I'd like to see the
>> over-significant bits masked off explicitly here, after the shift.
> 
> The value has to be truncated, as a guest inl() or readl() can't
> possibly return more than 32 bits.
> 
> I think the code's harder to read with the mask in place.  Are there
> any helper macros or functions in QEMU to help with register
> read/writes of smaller/larger sizes than expected?  Maybe the
> .impl.min_access_size should just be set to 4.

I'd be fine even with just a comment then. :)

> 
> [...]
>>> @@ -416,8 +424,8 @@ static void fw_cfg_dma_mem_write(void *opaque, hwaddr addr,
>>>  static bool fw_cfg_dma_mem_valid(void *opaque, hwaddr addr,
>>>                                    unsigned size, bool is_write)
>>>  {
>>> -    return is_write && ((size == 4 && (addr == 0 || addr == 4)) ||
>>> -                        (size == 8 && addr == 0));
>>> +    return !is_write || ((size == 4 && (addr == 0 || addr == 4)) ||
>>> +                         (size == 8 && addr == 0));
>>>  }
>>
>> This is not good enough I think; it's possible for the guest to do an
>> "inl" at offset 7 into the register, which will cause undefined behavior
>> in fw_cfg_dma_mem_read(). (The shift count being negative.)
> 
> The function never gets called on unaligned accesses (.impl.unaligned
> != true), so that can't happen.

That may be true, but -- for me at least -- that's too much to remember
(or to look up all the time). Plus, I find an explicit check more
robust, should the alignment restriction be lifted at some point.

In any case, I don't insist.

Thanks
Laszlo

> 
> -Kevin
> 

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] [PATCH v4 0/7] fw_cfg DMA interface
  2015-10-01 12:16 ` [Qemu-devel] [PATCH v4 0/7] " Marc Marí
                     ` (6 preceding siblings ...)
  2015-10-01 12:16   ` [Qemu-devel] [PATCH v4 7/7] fw_cfg: Define a static signature to be returned on DMA port reads Marc Marí
@ 2015-10-01 13:19   ` Kevin O'Connor
  7 siblings, 0 replies; 60+ messages in thread
From: Kevin O'Connor @ 2015-10-01 13:19 UTC (permalink / raw)
  To: Marc Marí
  Cc: Drew, Stefan Hajnoczi, Gabriel L. Somlo, qemu-devel,
	Gerd Hoffmann, Laszlo

On Thu, Oct 01, 2015 at 02:16:52PM +0200, Marc Marí wrote:
> Implement host-side of the FW CFG DMA interface both for x86 and ARM.

Thanks for working on this.  The QEMU patches look good to me.

-Kevin

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] QEMU fw_cfg DMA interface
  2015-10-01 12:14 [Qemu-devel] QEMU fw_cfg DMA interface Marc Marí
  2015-10-01 12:16 ` [Qemu-devel] [PATCH v4 0/7] " Marc Marí
@ 2015-10-01 16:03 ` Eric Blake
  2015-10-01 16:11   ` Eric Blake
  2015-10-01 16:17   ` Laszlo Ersek
  1 sibling, 2 replies; 60+ messages in thread
From: Eric Blake @ 2015-10-01 16:03 UTC (permalink / raw)
  To: Marc Marí, linux-kernel, qemu-devel, seabios
  Cc: Mark Rutland, Rob Herring, Drew, Arnd Bergmann, devicetree,
	Stefan Hajnoczi, Alexander Graf, Kevin O'Connor,
	Gerd Hoffmann, Laszlo

[-- Attachment #1: Type: text/plain, Size: 589 bytes --]

[meta-comment]

On 10/01/2015 06:14 AM, Marc Marí wrote:
> Implementation of the FW CFG DMA interface.

The subject line is missing "v4" and "0/7". Also, the cover letter is
missing a diffstat.  That makes it harder to see from the cover letter
what the rest of the series is about.  'git format-patch/send-email
--cover-letter' does what you want; you can even 'git config
format.coverletter=auto' to always include a decent cover letter on any
multi-patch series.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] QEMU fw_cfg DMA interface
  2015-10-01 16:03 ` [Qemu-devel] QEMU " Eric Blake
@ 2015-10-01 16:11   ` Eric Blake
  2015-10-01 16:19     ` Laszlo Ersek
  2015-10-01 16:17   ` Laszlo Ersek
  1 sibling, 1 reply; 60+ messages in thread
From: Eric Blake @ 2015-10-01 16:11 UTC (permalink / raw)
  To: Marc Marí, linux-kernel, qemu-devel, seabios
  Cc: Mark Rutland, Rob Herring, Drew, Arnd Bergmann, devicetree,
	Stefan Hajnoczi, Alexander Graf, Kevin O'Connor,
	Gerd Hoffmann, Laszlo

[-- Attachment #1: Type: text/plain, Size: 1104 bytes --]

On 10/01/2015 10:03 AM, Eric Blake wrote:
> [meta-comment]
> 
> On 10/01/2015 06:14 AM, Marc Marí wrote:
>> Implementation of the FW CFG DMA interface.
> 
> The subject line is missing "v4" and "0/7". Also, the cover letter is
> missing a diffstat.  That makes it harder to see from the cover letter
> what the rest of the series is about.  'git format-patch/send-email
> --cover-letter' does what you want; you can even 'git config
> format.coverletter=auto' to always include a decent cover letter on any
> multi-patch series.

Oh, I see - you sent a meta-cover letter (the one I replied to in this
subthread), and then a patch series including a cover letter (the real
0/7, then 1/7 and friends in-reply-to the 0/7) as a child of the
meta-cover.  It's still a bit awkward for tools that expect the 0/7 as
the start of the thread, and part of my confusion was caused by
out-of-order mail delivery due to the nongnu.org mail server still
recovering from its mail delays.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] QEMU fw_cfg DMA interface
  2015-10-01 16:11   ` Eric Blake
@ 2015-10-01 16:19     ` Laszlo Ersek
  0 siblings, 0 replies; 60+ messages in thread
From: Laszlo Ersek @ 2015-10-01 16:19 UTC (permalink / raw)
  To: Eric Blake, Marc Marí, linux-kernel, qemu-devel, seabios
  Cc: Mark Rutland, Rob Herring, Drew, Arnd Bergmann, devicetree,
	Stefan Hajnoczi, Alexander Graf, Kevin O'Connor,
	Gerd Hoffmann

On 10/01/15 18:11, Eric Blake wrote:
> On 10/01/2015 10:03 AM, Eric Blake wrote:
>> [meta-comment]
>>
>> On 10/01/2015 06:14 AM, Marc Marí wrote:
>>> Implementation of the FW CFG DMA interface.
>>
>> The subject line is missing "v4" and "0/7". Also, the cover letter is
>> missing a diffstat.  That makes it harder to see from the cover letter
>> what the rest of the series is about.  'git format-patch/send-email
>> --cover-letter' does what you want; you can even 'git config
>> format.coverletter=auto' to always include a decent cover letter on any
>> multi-patch series.
> 
> Oh, I see - you sent a meta-cover letter (the one I replied to in this
> subthread), and then a patch series including a cover letter (the real
> 0/7, then 1/7 and friends in-reply-to the 0/7) as a child of the
> meta-cover.  It's still a bit awkward for tools that expect the 0/7 as
> the start of the thread,

Yep, the pattern I just described doesn't consider those tools. Is that
a bad problem? Maybe the pattern is not so clever then. :)

(I'm allowed to say bad things about it, because I "invented" it
"independently". :))

> and part of my confusion was caused by
> out-of-order mail delivery due to the nongnu.org mail server still
> recovering from its mail delays.
> 

Right, those delays are not helping.

Thanks
Laszlo

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] QEMU fw_cfg DMA interface
  2015-10-01 16:03 ` [Qemu-devel] QEMU " Eric Blake
  2015-10-01 16:11   ` Eric Blake
@ 2015-10-01 16:17   ` Laszlo Ersek
  2015-10-01 16:21     ` Eric Blake
  1 sibling, 1 reply; 60+ messages in thread
From: Laszlo Ersek @ 2015-10-01 16:17 UTC (permalink / raw)
  To: Eric Blake, Marc Marí, linux-kernel, qemu-devel, seabios
  Cc: Mark Rutland, Rob Herring, Drew, Arnd Bergmann, devicetree,
	Stefan Hajnoczi, Alexander Graf, Kevin O'Connor,
	Gerd Hoffmann

On 10/01/15 18:03, Eric Blake wrote:
> [meta-comment]
> 
> On 10/01/2015 06:14 AM, Marc Marí wrote:
>> Implementation of the FW CFG DMA interface.
> 
> The subject line is missing "v4" and "0/7". Also, the cover letter is
> missing a diffstat.  That makes it harder to see from the cover letter
> what the rest of the series is about.  'git format-patch/send-email
> --cover-letter' does what you want; you can even 'git config
> format.coverletter=auto' to always include a decent cover letter on any
> multi-patch series.
> 

This posting follows a little bit different pattern, one that I myself
follow when posting patches for two (or more) components that must work
in sync.

Usually, a top-level blurb is manually cross-posted to all relevant
mailing lists. Then, each separate patch series is posted only to the
relevant mailing list, with its own cover letter (as usual with git),
*in response* to the manually posted blurb.

This has the following benefits:

- in mailing list archives that organize messages into threads *across*
  mailing lists (like Gmane does, for example), the top-level manual
  blurb is a good "root" for referencing the entire posting.

- The same is true for personal mailboxes, if a recipient is explicitly
  CC'd on all of the messages.

Because the top level blurb is parent to several patch series, and those
child series can all have different version numbers (due to different
numbers of respinds), it is not always straightforward to assign a
version number to the top blurb.

Thanks
Laszlo

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] QEMU fw_cfg DMA interface
  2015-10-01 16:17   ` Laszlo Ersek
@ 2015-10-01 16:21     ` Eric Blake
  2015-10-01 16:34       ` Laszlo Ersek
  0 siblings, 1 reply; 60+ messages in thread
From: Eric Blake @ 2015-10-01 16:21 UTC (permalink / raw)
  To: Laszlo Ersek, Marc Marí, linux-kernel, qemu-devel, seabios
  Cc: Mark Rutland, Rob Herring, Drew, Arnd Bergmann, devicetree,
	Stefan Hajnoczi, Alexander Graf, Kevin O'Connor,
	Gerd Hoffmann

[-- Attachment #1: Type: text/plain, Size: 1463 bytes --]

On 10/01/2015 10:17 AM, Laszlo Ersek wrote:
> On 10/01/15 18:03, Eric Blake wrote:
>> [meta-comment]
>>
>> On 10/01/2015 06:14 AM, Marc Marí wrote:
>>> Implementation of the FW CFG DMA interface.
>>
>> The subject line is missing "v4" and "0/7". Also, the cover letter is
>> missing a diffstat.  That makes it harder to see from the cover letter
>> what the rest of the series is about.  'git format-patch/send-email
>> --cover-letter' does what you want; you can even 'git config
>> format.coverletter=auto' to always include a decent cover letter on any
>> multi-patch series.
>>
> 
> This posting follows a little bit different pattern, one that I myself
> follow when posting patches for two (or more) components that must work
> in sync.

Ok, makes sense. Maybe the only additional suggestions would be to make
it more obvious in the subject line (put the text 'cross-post'
somewhere?) or have the first paragraph of the meta-cover be more
explicit that there are going to be multiple sub-threads, one per
project, where all subthreads must be applied to their corresponding
project for the overall feature to be complete?

[And maybe I should wait a few minutes for the full thread to appear in
my inbox, rather than immediately replying to the first mail while the
series is still incomplete due to mail delays...]

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] QEMU fw_cfg DMA interface
  2015-10-01 16:21     ` Eric Blake
@ 2015-10-01 16:34       ` Laszlo Ersek
  0 siblings, 0 replies; 60+ messages in thread
From: Laszlo Ersek @ 2015-10-01 16:34 UTC (permalink / raw)
  To: Eric Blake, Marc Marí, linux-kernel, qemu-devel, seabios
  Cc: Mark Rutland, Rob Herring, Drew, Arnd Bergmann, devicetree,
	Stefan Hajnoczi, Alexander Graf, Kevin O'Connor,
	Gerd Hoffmann

On 10/01/15 18:21, Eric Blake wrote:
> On 10/01/2015 10:17 AM, Laszlo Ersek wrote:
>> On 10/01/15 18:03, Eric Blake wrote:
>>> [meta-comment]
>>>
>>> On 10/01/2015 06:14 AM, Marc Marí wrote:
>>>> Implementation of the FW CFG DMA interface.
>>>
>>> The subject line is missing "v4" and "0/7". Also, the cover letter is
>>> missing a diffstat.  That makes it harder to see from the cover letter
>>> what the rest of the series is about.  'git format-patch/send-email
>>> --cover-letter' does what you want; you can even 'git config
>>> format.coverletter=auto' to always include a decent cover letter on any
>>> multi-patch series.
>>>
>>
>> This posting follows a little bit different pattern, one that I myself
>> follow when posting patches for two (or more) components that must work
>> in sync.
> 
> Ok, makes sense. Maybe the only additional suggestions would be to make
> it more obvious in the subject line (put the text 'cross-post'
> somewhere?) or have the first paragraph of the meta-cover be more
> explicit that there are going to be multiple sub-threads, one per
> project, where all subthreads must be applied to their corresponding
> project for the overall feature to be complete?

That's a good idea. I think prefixing the main blurb's subject with
[cross-post], and a "standard" first paragraph based on your above
suggestion, would be helpful.

> [And maybe I should wait a few minutes for the full thread to appear in
> my inbox, rather than immediately replying to the first mail while the
> series is still incomplete due to mail delays...]

I'm not patient; it would be unfair from me to expect others to be... :)

Laszlo

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [Qemu-devel] QEMU fw_cfg DMA interface
@ 2015-09-18  8:58 Marc Marí
  0 siblings, 0 replies; 60+ messages in thread
From: Marc Marí @ 2015-09-18  8:58 UTC (permalink / raw)
  To: linux-kernel, qemu-devel, seabios
  Cc: Mark Rutland, Rob Herring, Drew, Arnd Bergmann, devicetree,
	Stefan Hajnoczi, Alexander Graf, Kevin O'Connor,
	Gerd Hoffmann, Marc Marí, Laszlo

Implementation of the FW CFG DMA interface.

When running a Linux guest on top of QEMU, using the -kernel options, this
is the timing improvement for x86:

QEMU commit 16a1b6e and SeaBIOS commit e4d2b8c
QEMU startup time: .080
BIOS startup time: .060
Kernel setup time: .586
Total time: .726

QEMU with this patch series and SeaBIOS with this patch series
QEMU startup time: .080
BIOS startup time: .039
Kernel setup time: .002
Total time: .121

QEMU startup time is the time between the start and the first kvm_entry.
BIOS startup time is the time between the first kvm_entry and the start of
function do_boot, in SeaBIOS.
Kernel setup time is the time between the start of the function do_boot in
SeaBIOS and the jump to the Linux kernel.

As you can see, both the BIOS (because of ACPI tables and other configurations)
and the Linux kernel boot (because of the copy to memory) are greatly
improved with this new interface.

Also, this new interface is an addon to the old interface. Both interfaces
are compatible and interchangeable.

Changes from v1:
 - Take into account order of fields in the FWCfgDmaAccess structure
 - Check and change endianness of FWCfgDmaAccess fields
 - Change order of fields in the FWCfgDmaAccess structure
 - Add FW_CFG_DMA_CTL_SKIP feature for control field
 - Split FW_CFG_SIZE in QEMU
 - Make FW_CFG_ID a bitmap of features
 - Add 64 bit address support for the transfer. Trigger when writing the low
   address, and address is 0 by default and at the end of each transfer.
 - Align ports and addresses.
 - Preserve old fw_cfg_comb_valid behaviour in QEMU
 - Update documentation to reflect all these changes

Changes from v2:
 - Make IOports fw_cfg DMA region a different IO region.
 - Reuse everything for MMIO and IOport DMA regions
 - Make transfer status only based on control field
 - Use DMA helpers instead of direct map/unmap
 - Change ARM fw_cfg DMA address space
 - Change Linux boot process to match linuxboot.S
 - Add select capabilities in the FWCfgDmaAccess struct
 - Update documentation to reflect all these changes

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [Qemu-devel] QEMU fw_cfg DMA interface
@ 2015-08-31  9:08 Marc Marí
  0 siblings, 0 replies; 60+ messages in thread
From: Marc Marí @ 2015-08-31  9:08 UTC (permalink / raw)
  To: linux-kernel, qemu-devel, seabios
  Cc: Mark Rutland, Rob Herring, Drew, Arnd Bergmann, devicetree,
	Stefan Hajnoczi, Alexander Graf, Kevin O'Connor,
	Gerd Hoffmann, Marc Marí, Laszlo

Implementation of the FW CFG DMA interface.

When running a Linux guest on top of QEMU, using the -kernel options, this
is the timing improvement for x86:

QEMU commit 090d0bf and SeaBIOS commit 2fc20dc
QEMU startup time: .078
BIOS startup time: .060
Kernel setup time: .578
Total time: .716

QEMU with this patch series and SeaBIOS with this patch series
QEMU startup time: .080
BIOS startup time: .039
Kernel setup time: .002
Total time: .121

QEMU startup time is the time between the start and the first kvm_entry.
BIOS startup time is the time between the first kvm_entry and the start of
function do_boot, in SeaBIOS.
Kernel setup time is the time between the start of the function do_boot in
SeaBIOS and the jump to the Linux kernel.

As you can see, both the BIOS (because of ACPI tables and other configurations)
and the Linux kernel boot (because of the copy to memory) are greatly
improved with this new interface.

Also, this new interface is an addon to the old interface. Both interfaces
are compatible and interchangeable.

Changes from v1:
 - Take into account order of fields in the FWCfgDmaAccess structure
 - Check and change endianness of FWCfgDmaAccess fields
 - Change order of fields in the FWCfgDmaAccess structure
 - Add FW_CFG_DMA_CTL_SKIP feature for control field
 - Split FW_CFG_SIZE in QEMU
 - Make FW_CFG_ID a bitmap of features
 - Add 64 bit address support for the transfer. Trigger when writing the low
   address, and address is 0 by default and at the end of each transfer.
 - Align ports and addresses.
 - Preserve old fw_cfg_comb_valid behaviour in QEMU
 - Update documentation to reflect all these changes

^ permalink raw reply	[flat|nested] 60+ messages in thread

* [Qemu-devel] QEMU fw_cfg DMA interface
@ 2015-08-06 11:00 Marc Marí
  2015-08-06 12:27 ` Stefan Hajnoczi
  0 siblings, 1 reply; 60+ messages in thread
From: Marc Marí @ 2015-08-06 11:00 UTC (permalink / raw)
  To: linux-kernel, qemu-devel, seabios
  Cc: Drew, Stefan Hajnoczi, Kevin O'Connor, Gerd Hoffmann,
	Marc Marí, Laszlo

Implementation of the FW CFG DMA interface.

When running a Linux guest on top of QEMU, using the -kernel options, this
is the timing improvement for x86:

QEMU commit 2be4f242b50a8 and SeaBIOS commit 908a58c1d5ff
QEMU startup time: .078
BIOS startup time: .060
Kernel setup time: .578
Total time: .716

QEMU with this patch series and SeaBIOS with this patch series
QEMU startup time: .080
BIOS startup time: .039
Kernel setup time: .002
Total time: .121

QEMU startup time is the time between the start and the first kvm_entry.
BIOS startup time is the time between the first kvm_entry and the start of
function do_boot, in SeaBIOS.
Kernel setup time is the time between the start of the function do_boot in
SeaBIOS and the jump to the Linux kernel.

As you can see, both the BIOS (because of ACPI tables and other configurations)
and the Linux kernel boot (because of the copy to memory) are greatly
improved with this new interface.

Also, this new interface is an addon to the old interface. Both interfaces
are compatible and interchangeable.

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] QEMU fw_cfg DMA interface
  2015-08-06 11:00 Marc Marí
@ 2015-08-06 12:27 ` Stefan Hajnoczi
  2015-08-06 12:37   ` Marc Marí
  0 siblings, 1 reply; 60+ messages in thread
From: Stefan Hajnoczi @ 2015-08-06 12:27 UTC (permalink / raw)
  To: Marc Marí
  Cc: Drew, seabios, linux-kernel, qemu-devel, Kevin O'Connor,
	Gerd Hoffmann, Laszlo

On Thu, Aug 6, 2015 at 12:00 PM, Marc Marí <markmb@redhat.com> wrote:
> When running a Linux guest on top of QEMU, using the -kernel options, this
> is the timing improvement for x86:
>
> QEMU commit 2be4f242b50a8 and SeaBIOS commit 908a58c1d5ff
> QEMU startup time: .078
> BIOS startup time: .060
> Kernel setup time: .578
> Total time: .716
>
> QEMU with this patch series and SeaBIOS with this patch series
> QEMU startup time: .080
> BIOS startup time: .039
> Kernel setup time: .002
> Total time: .121

Impressive results!

Is this a fully-featured QEMU build or did you disable things?

Is this the default SeaBIOS build or did you disable things?

Stefan

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] QEMU fw_cfg DMA interface
  2015-08-06 12:27 ` Stefan Hajnoczi
@ 2015-08-06 12:37   ` Marc Marí
  2015-08-06 12:40     ` Stefan Hajnoczi
  2015-08-06 15:30     ` Kevin O'Connor
  0 siblings, 2 replies; 60+ messages in thread
From: Marc Marí @ 2015-08-06 12:37 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Drew, seabios, linux-kernel, qemu-devel, Kevin O'Connor,
	Gerd Hoffmann, Laszlo

On Thu, 6 Aug 2015 13:27:16 +0100
Stefan Hajnoczi <stefanha@gmail.com> wrote:

> On Thu, Aug 6, 2015 at 12:00 PM, Marc Marí <markmb@redhat.com> wrote:
> > When running a Linux guest on top of QEMU, using the -kernel
> > options, this is the timing improvement for x86:
> >
> > QEMU commit 2be4f242b50a8 and SeaBIOS commit 908a58c1d5ff
> > QEMU startup time: .078
> > BIOS startup time: .060
> > Kernel setup time: .578
> > Total time: .716
> >
> > QEMU with this patch series and SeaBIOS with this patch series
> > QEMU startup time: .080
> > BIOS startup time: .039
> > Kernel setup time: .002
> > Total time: .121
> 
> Impressive results!
> 
> Is this a fully-featured QEMU build or did you disable things?
> 
> Is this the default SeaBIOS build or did you disable things?
> 

This is the default QEMU configuration I get for my system. It's not a
fully-featured QEMU, but it has a lot of things enabled. SeaBIOS
has a default configuration (with debugging disabled).

The QEMU configuration is:
[...]
tcg debug enabled no
gprof enabled     no
sparse enabled    no
strip binaries    yes
profiler          no
static build      no
pixman            system
SDL support       yes
GTK support       yes
GNUTLS support    yes
GNUTLS hash       yes
GNUTLS gcrypt     no
GNUTLS nettle     yes (2.7.1)
VTE support       yes
curses support    yes
curl support      yes
mingw32 support   no
Audio drivers     oss
Block whitelist (rw)
Block whitelist (ro)
VirtFS support    yes
VNC support       yes
VNC TLS support   yes
VNC SASL support  yes
VNC JPEG support  yes
VNC PNG support   yes
xen support       no
brlapi support    yes
bluez  support    yes
Documentation     no
GUEST_BASE        yes
PIE               yes
vde support       yes
netmap support    no
Linux AIO support yes
ATTR/XATTR support yes
Install blobs     yes
KVM support       yes
RDMA support      yes
TCG interpreter   no
fdt support       yes
preadv support    yes
fdatasync         yes
madvise           yes
posix_madvise     yes
sigev_thread_id   yes
uuid support      yes
libcap-ng support yes
vhost-net support yes
vhost-scsi support yes
Trace backends    nop
spice support     yes (0.12.7/0.12.5)
rbd support       yes
xfsctl support    no
nss used          yes
libusb            yes
usb net redir     yes
OpenGL support    no
libiscsi support  yes
libnfs support    no
build guest agent yes
QGA VSS support   no
QGA w32 disk info no
seccomp support   yes
coroutine backend ucontext
coroutine pool    yes
GlusterFS support yes
Archipelago support no
gcov              gcov
gcov enabled      no
TPM support       yes
libssh2 support   yes
TPM passthrough   yes
QOM debugging     yes
vhdx              yes
lzo support       yes
snappy support    yes
bzip2 support     yes
NUMA host support yes
tcmalloc support  no

Marc

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] QEMU fw_cfg DMA interface
  2015-08-06 12:37   ` Marc Marí
@ 2015-08-06 12:40     ` Stefan Hajnoczi
  2015-08-06 15:30     ` Kevin O'Connor
  1 sibling, 0 replies; 60+ messages in thread
From: Stefan Hajnoczi @ 2015-08-06 12:40 UTC (permalink / raw)
  To: Marc Marí
  Cc: Drew, seabios, linux-kernel, qemu-devel, Kevin O'Connor,
	Gerd Hoffmann, Laszlo

On Thu, Aug 6, 2015 at 1:37 PM, Marc Marí <markmb@redhat.com> wrote:
> On Thu, 6 Aug 2015 13:27:16 +0100
> Stefan Hajnoczi <stefanha@gmail.com> wrote:
>
>> On Thu, Aug 6, 2015 at 12:00 PM, Marc Marí <markmb@redhat.com> wrote:
>> > When running a Linux guest on top of QEMU, using the -kernel
>> > options, this is the timing improvement for x86:
>> >
>> > QEMU commit 2be4f242b50a8 and SeaBIOS commit 908a58c1d5ff
>> > QEMU startup time: .078
>> > BIOS startup time: .060
>> > Kernel setup time: .578
>> > Total time: .716
>> >
>> > QEMU with this patch series and SeaBIOS with this patch series
>> > QEMU startup time: .080
>> > BIOS startup time: .039
>> > Kernel setup time: .002
>> > Total time: .121
>>
>> Impressive results!
>>
>> Is this a fully-featured QEMU build or did you disable things?
>>
>> Is this the default SeaBIOS build or did you disable things?
>>
>
> This is the default QEMU configuration I get for my system. It's not a
> fully-featured QEMU, but it has a lot of things enabled. SeaBIOS
> has a default configuration (with debugging disabled).

That's great.

Since SeaBIOS is a default configuration, the remaining BIOS startup
time is amenable to more optimizations in the future.

Stefan

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] QEMU fw_cfg DMA interface
  2015-08-06 12:37   ` Marc Marí
  2015-08-06 12:40     ` Stefan Hajnoczi
@ 2015-08-06 15:30     ` Kevin O'Connor
  2015-08-06 15:53       ` Marc Marí
  1 sibling, 1 reply; 60+ messages in thread
From: Kevin O'Connor @ 2015-08-06 15:30 UTC (permalink / raw)
  To: Marc Marí
  Cc: Drew, Stefan Hajnoczi, seabios, linux-kernel, qemu-devel,
	Gerd Hoffmann, Laszlo

On Thu, Aug 06, 2015 at 02:37:45PM +0200, Marc Marí wrote:
> On Thu, 6 Aug 2015 13:27:16 +0100
> Stefan Hajnoczi <stefanha@gmail.com> wrote:
> 
> > On Thu, Aug 6, 2015 at 12:00 PM, Marc Marí <markmb@redhat.com> wrote:
> > > When running a Linux guest on top of QEMU, using the -kernel
> > > options, this is the timing improvement for x86:
> > >
> > > QEMU commit 2be4f242b50a8 and SeaBIOS commit 908a58c1d5ff
> > > QEMU startup time: .078
> > > BIOS startup time: .060
> > > Kernel setup time: .578
> > > Total time: .716
> > >
> > > QEMU with this patch series and SeaBIOS with this patch series
> > > QEMU startup time: .080
> > > BIOS startup time: .039
> > > Kernel setup time: .002
> > > Total time: .121
> > 
> > Impressive results!
> > 
> > Is this a fully-featured QEMU build or did you disable things?
> > 
> > Is this the default SeaBIOS build or did you disable things?
> > 
> 
> This is the default QEMU configuration I get for my system. It's not a
> fully-featured QEMU, but it has a lot of things enabled. SeaBIOS
> has a default configuration (with debugging disabled).

Thanks!

What qemu command-line did you use during testing?  Also, do you have
a quick primer on how to use the kvm trace stuff to obtain timings?

-Kevin

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] QEMU fw_cfg DMA interface
  2015-08-06 15:30     ` Kevin O'Connor
@ 2015-08-06 15:53       ` Marc Marí
  2015-08-07  4:30         ` Kevin O'Connor
  0 siblings, 1 reply; 60+ messages in thread
From: Marc Marí @ 2015-08-06 15:53 UTC (permalink / raw)
  To: Kevin O'Connor
  Cc: Drew, Stefan Hajnoczi, seabios, linux-kernel, qemu-devel,
	Gerd Hoffmann, Laszlo

[-- Attachment #1: Type: text/plain, Size: 2332 bytes --]

On Thu, 6 Aug 2015 11:30:43 -0400
"Kevin O'Connor" <kevin@koconnor.net> wrote:

> On Thu, Aug 06, 2015 at 02:37:45PM +0200, Marc Marí wrote:
> > On Thu, 6 Aug 2015 13:27:16 +0100
> > Stefan Hajnoczi <stefanha@gmail.com> wrote:
> > 
> > > On Thu, Aug 6, 2015 at 12:00 PM, Marc Marí <markmb@redhat.com>
> > > wrote:
> > > > When running a Linux guest on top of QEMU, using the -kernel
> > > > options, this is the timing improvement for x86:
> > > >
> > > > QEMU commit 2be4f242b50a8 and SeaBIOS commit 908a58c1d5ff
> > > > QEMU startup time: .078
> > > > BIOS startup time: .060
> > > > Kernel setup time: .578
> > > > Total time: .716
> > > >
> > > > QEMU with this patch series and SeaBIOS with this patch series
> > > > QEMU startup time: .080
> > > > BIOS startup time: .039
> > > > Kernel setup time: .002
> > > > Total time: .121
> > > 
> > > Impressive results!
> > > 
> > > Is this a fully-featured QEMU build or did you disable things?
> > > 
> > > Is this the default SeaBIOS build or did you disable things?
> > > 
> > 
> > This is the default QEMU configuration I get for my system. It's
> > not a fully-featured QEMU, but it has a lot of things enabled.
> > SeaBIOS has a default configuration (with debugging disabled).
> 
> Thanks!
> 
> What qemu command-line did you use during testing?  Also, do you have
> a quick primer on how to use the kvm trace stuff to obtain timings?
> 

The command line I used is:

x86_64-softmmu/qemu-system-x86_64 --enable-kvm \
    -kernel /boot/vmlinuz-4.0.7-300.fc22.x86_64 \
    -L pc-bios/optionrom/ \
    -bios roms/seabios/out/bios.bin -nographic

And I used perf (and two out instructions in the BIOS) to measure the
times:

perf record -a -e kvm:\* -e sched:sched_process_exec

And searching for sched:sched_process_exec, kvm:kvm_entry, pio_write at
0xf5 and pio_write at 0xf4. Out at 0xf5 is the one in "do_boot"
function, and out at 0xf4 is the one just before the jump to the Linux
kernel.

I attach the script I've been using. It can be improved, but it works.
It can be run like this:

sudo ../../results/run_test.sh x86_64-softmmu/qemu-system-x86_64 \
    --enable-kvm -kernel /boot/vmlinuz-4.0.8-300.fc22.x86_64 \
    -L pc-bios/optionrom/ \
    -bios roms/seabios/out/bios.bin -nographic

Thanks
Marc

[-- Attachment #2: run_test.sh --]
[-- Type: application/x-shellscript, Size: 1401 bytes --]

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] QEMU fw_cfg DMA interface
  2015-08-06 15:53       ` Marc Marí
@ 2015-08-07  4:30         ` Kevin O'Connor
  2015-08-17 22:08           ` Gerd Hoffmann
  0 siblings, 1 reply; 60+ messages in thread
From: Kevin O'Connor @ 2015-08-07  4:30 UTC (permalink / raw)
  To: Marc Marí
  Cc: Drew, Stefan Hajnoczi, seabios, qemu-devel, Gerd Hoffmann, Laszlo

Trimming CC list.

On Thu, Aug 06, 2015 at 05:53:26PM +0200, Marc Marí wrote:
> On Thu, 6 Aug 2015 11:30:43 -0400
> "Kevin O'Connor" <kevin@koconnor.net> wrote:
> > On Thu, Aug 06, 2015 at 02:37:45PM +0200, Marc Marí wrote:
> > > On Thu, 6 Aug 2015 13:27:16 +0100
> > > Stefan Hajnoczi <stefanha@gmail.com> wrote:
> > > 
> > > > On Thu, Aug 6, 2015 at 12:00 PM, Marc Marí <markmb@redhat.com>
> > > > wrote:
> > > > > When running a Linux guest on top of QEMU, using the -kernel
> > > > > options, this is the timing improvement for x86:
> > > > >
> > > > > QEMU commit 2be4f242b50a8 and SeaBIOS commit 908a58c1d5ff
> > > > > QEMU startup time: .078
> > > > > BIOS startup time: .060
> > > > > Kernel setup time: .578
> > > > > Total time: .716
> > > > >
> > > > > QEMU with this patch series and SeaBIOS with this patch series
> > > > > QEMU startup time: .080
> > > > > BIOS startup time: .039
> > > > > Kernel setup time: .002
> > > > > Total time: .121
> > > > 
> > > > Impressive results!
> > > > 
> > > > Is this a fully-featured QEMU build or did you disable things?
> > > > 
> > > > Is this the default SeaBIOS build or did you disable things?
> > > > 
> > > 
> > > This is the default QEMU configuration I get for my system. It's
> > > not a fully-featured QEMU, but it has a lot of things enabled.
> > > SeaBIOS has a default configuration (with debugging disabled).
> > 
> > Thanks!
> > 
> > What qemu command-line did you use during testing?  Also, do you have
> > a quick primer on how to use the kvm trace stuff to obtain timings?
> > 
> 
> The command line I used is:
> 
> x86_64-softmmu/qemu-system-x86_64 --enable-kvm \
>     -kernel /boot/vmlinuz-4.0.7-300.fc22.x86_64 \
>     -L pc-bios/optionrom/ \
>     -bios roms/seabios/out/bios.bin -nographic

Thanks.  One thing that immediately pops up is that even though
"-nographic" is used, it looks like QEMU still emulates a VGA device
and that device has an option rom that takes several milliseconds to
init the display hardware.  Adding "-device VGA,romfile=" to the qemu
command line will further reduce the boot time by avoiding this
hardware init delay.

-Kevin

^ permalink raw reply	[flat|nested] 60+ messages in thread

* Re: [Qemu-devel] QEMU fw_cfg DMA interface
  2015-08-07  4:30         ` Kevin O'Connor
@ 2015-08-17 22:08           ` Gerd Hoffmann
  0 siblings, 0 replies; 60+ messages in thread
From: Gerd Hoffmann @ 2015-08-17 22:08 UTC (permalink / raw)
  To: Kevin O'Connor
  Cc: Drew, Stefan Hajnoczi, seabios, qemu-devel, Marc Marí,
	Laszlo

  Hi,

> > The command line I used is:
> > 
> > x86_64-softmmu/qemu-system-x86_64 --enable-kvm \
> >     -kernel /boot/vmlinuz-4.0.7-300.fc22.x86_64 \
> >     -L pc-bios/optionrom/ \
> >     -bios roms/seabios/out/bios.bin -nographic
> 
> Thanks.  One thing that immediately pops up is that even though
> "-nographic" is used, it looks like QEMU still emulates a VGA device
> and that device has an option rom that takes several milliseconds to
> init the display hardware.  Adding "-device VGA,romfile=" to the qemu
> command line will further reduce the boot time by avoiding this
> hardware init delay.

Same goes for the NIC btw.

cheers,
  Gerd

^ permalink raw reply	[flat|nested] 60+ messages in thread

end of thread, other threads:[~2015-10-08 10:02 UTC | newest]

Thread overview: 60+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-10-01 12:14 [Qemu-devel] QEMU fw_cfg DMA interface Marc Marí
2015-10-01 12:16 ` [Qemu-devel] [PATCH v4 0/7] " Marc Marí
2015-10-01 12:16   ` [Qemu-devel] [PATCH v4 1/7] fw_cfg: document fw_cfg_modify_iXX() update functions Marc Marí
2015-10-01 12:16   ` [Qemu-devel] [PATCH v4 2/7] fw_cfg DMA interface documentation Marc Marí
2015-10-01 14:41     ` Laszlo Ersek
2015-10-01 12:16   ` [Qemu-devel] [PATCH v4 3/7] Implement fw_cfg DMA interface Marc Marí
2015-10-01 14:36     ` Laszlo Ersek
2015-10-01 15:52       ` Marc Marí
2015-10-01 17:18       ` Peter Maydell
2015-10-01 19:20         ` Laszlo Ersek
2015-10-06 14:44     ` Stefan Hajnoczi
2015-10-06 14:53       ` Peter Maydell
2015-10-08  9:07         ` Stefan Hajnoczi
2015-10-08 10:01           ` Marc Marí
2015-10-06 14:54       ` Marc Marí
2015-10-01 12:16   ` [Qemu-devel] [PATCH v4 4/7] Enable fw_cfg DMA interface for ARM Marc Marí
2015-10-01 14:42     ` Laszlo Ersek
2015-10-01 12:16   ` [Qemu-devel] [PATCH v4 5/7] Enable fw_cfg DMA interface for x86 Marc Marí
2015-10-01 14:48     ` Laszlo Ersek
2015-10-01 12:16   ` [Qemu-devel] [PATCH v4 6/7] Make the kernel image in the fw_cfg DMA interface bootable Marc Marí
2015-10-01 15:25     ` Laszlo Ersek
2015-10-01 16:02       ` Kevin O'Connor
2015-10-01 16:10         ` Laszlo Ersek
2015-10-01 18:15         ` Marc Marí
2015-10-02  8:16         ` Gerd Hoffmann
2015-10-02  8:24           ` Marc Marí
2015-10-02  9:01             ` Gerd Hoffmann
2015-10-02 11:47               ` Laszlo Ersek
2015-10-02 12:07                 ` Gerd Hoffmann
2015-10-02 13:25                   ` Laszlo Ersek
2015-10-02 13:30                     ` Laszlo Ersek
2015-10-03  0:05                     ` Jordan Justen
2015-10-02 13:38           ` Kevin O'Connor
2015-10-05  9:18             ` Gerd Hoffmann
2015-10-02  8:09       ` Gerd Hoffmann
2015-10-02 13:40         ` Kevin O'Connor
2015-10-02 13:50           ` Laszlo Ersek
2015-10-02 15:24           ` Daniel P. Berrange
2015-10-05  9:26           ` Gerd Hoffmann
2015-10-01 12:16   ` [Qemu-devel] [PATCH v4 7/7] fw_cfg: Define a static signature to be returned on DMA port reads Marc Marí
2015-10-01 16:07     ` Laszlo Ersek
2015-10-01 17:02       ` Kevin O'Connor
2015-10-01 17:17         ` Laszlo Ersek
2015-10-01 13:19   ` [Qemu-devel] [PATCH v4 0/7] fw_cfg DMA interface Kevin O'Connor
2015-10-01 16:03 ` [Qemu-devel] QEMU " Eric Blake
2015-10-01 16:11   ` Eric Blake
2015-10-01 16:19     ` Laszlo Ersek
2015-10-01 16:17   ` Laszlo Ersek
2015-10-01 16:21     ` Eric Blake
2015-10-01 16:34       ` Laszlo Ersek
  -- strict thread matches above, loose matches on Subject: below --
2015-09-18  8:58 Marc Marí
2015-08-31  9:08 Marc Marí
2015-08-06 11:00 Marc Marí
2015-08-06 12:27 ` Stefan Hajnoczi
2015-08-06 12:37   ` Marc Marí
2015-08-06 12:40     ` Stefan Hajnoczi
2015-08-06 15:30     ` Kevin O'Connor
2015-08-06 15:53       ` Marc Marí
2015-08-07  4:30         ` Kevin O'Connor
2015-08-17 22:08           ` Gerd Hoffmann

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).