Re: [PATCH v5 05/10] ACPI ERST: support for ACPI ERST feature

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: Eric DeVolder <eric.devolder@oracle.com>
To: Igor Mammedov <imammedo@redhat.com>
Cc: ehabkost@redhat.com, mst@redhat.com, konrad.wilk@oracle.com,
	qemu-devel@nongnu.org, pbonzini@redhat.com,
	boris.ostrovsky@oracle.com, rth@twiddle.net
Subject: Re: [PATCH v5 05/10] ACPI ERST: support for ACPI ERST feature
Date: Wed, 21 Jul 2021 11:07:40 -0500	[thread overview]
Message-ID: <0a09250e-4347-0ad6-d95c-0bfd2094b98b@oracle.com> (raw)
In-Reply-To: <20210720141704.381734cc@redhat.com>



On 7/20/21 7:17 AM, Igor Mammedov wrote:
> On Wed, 30 Jun 2021 15:07:16 -0400
> Eric DeVolder <eric.devolder@oracle.com> wrote:
> 
>> This change implements the support for the ACPI ERST feature.
> Drop this
Done

> 
>>
>> This implements a PCI device for ACPI ERST. This implments the
> s/implments/implements/
Corrected

> 
>> non-NVRAM "mode" of operation for ERST.
> add here why non-NVRAM "mode" is implemented.
How about:
This implements a PCI device for ACPI ERST. This implments the
non-NVRAM "mode" of operation for ERST as it is supported by
Linux and Windows and aligns with ERST support in most BIOS.


> 
> Also even if this non-NVRAM implementation, there is still
> a lot of not necessary data copying (see below) so drop it
> or justify why it's there.
>   
>> This change also includes erst.c in the build of general ACPI support.
> Drop this as well
Done

> 
> 
>> Signed-off-by: Eric DeVolder <eric.devolder@oracle.com>
>> ---
>>   hw/acpi/erst.c      | 704 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>>   hw/acpi/meson.build |   1 +
>>   2 files changed, 705 insertions(+)
>>   create mode 100644 hw/acpi/erst.c
>>
>> diff --git a/hw/acpi/erst.c b/hw/acpi/erst.c
>> new file mode 100644
>> index 0000000..6e9bd2e
>> --- /dev/null
>> +++ b/hw/acpi/erst.c
>> @@ -0,0 +1,704 @@
>> +/*
>> + * ACPI Error Record Serialization Table, ERST, Implementation
>> + *
>> + * Copyright (c) 2021 Oracle and/or its affiliates.
>> + *
>> + * ACPI ERST introduced in ACPI 4.0, June 16, 2009.
>> + * ACPI Platform Error Interfaces : Error Serialization
>> + *
>> + * This library is free software; you can redistribute it and/or
>> + * modify it under the terms of the GNU Lesser General Public
>> + * License as published by the Free Software Foundation;
>> + * version 2 of the License.
>> + *
>> + * This library is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> + * Lesser General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU Lesser General Public
>> + * License along with this library; if not, see <http://www.gnu.org/licenses/>
>> + */
>> +
>> +#include <sys/types.h>
>> +#include <sys/stat.h>
>> +#include <unistd.h>
>> +
>> +#include "qemu/osdep.h"
>> +#include "qapi/error.h"
>> +#include "hw/qdev-core.h"
>> +#include "exec/memory.h"
>> +#include "qom/object.h"
>> +#include "hw/pci/pci.h"
>> +#include "qom/object_interfaces.h"
>> +#include "qemu/error-report.h"
>> +#include "migration/vmstate.h"
>> +#include "hw/qdev-properties.h"
>> +#include "hw/acpi/acpi.h"
>> +#include "hw/acpi/acpi-defs.h"
>> +#include "hw/acpi/aml-build.h"
>> +#include "hw/acpi/bios-linker-loader.h"
>> +#include "exec/address-spaces.h"
>> +#include "sysemu/hostmem.h"
>> +#include "hw/acpi/erst.h"
>> +#include "trace.h"
>> +
>> +/* UEFI 2.1: Append N Common Platform Error Record */
>> +#define UEFI_CPER_RECORD_MIN_SIZE 128U
>> +#define UEFI_CPER_RECORD_LENGTH_OFFSET 20U
>> +#define UEFI_CPER_RECORD_ID_OFFSET 96U
>> +#define IS_UEFI_CPER_RECORD(ptr) \
>> +    (((ptr)[0] == 'C') && \
>> +     ((ptr)[1] == 'P') && \
>> +     ((ptr)[2] == 'E') && \
>> +     ((ptr)[3] == 'R'))
>> +#define THE_UEFI_CPER_RECORD_ID(ptr) \
>> +    (*(uint64_t *)(&(ptr)[UEFI_CPER_RECORD_ID_OFFSET]))
>> +
>> +/*
>> + * This implementation is an ACTION (cmd) and VALUE (data)
>> + * interface consisting of just two 64-bit registers.
>> + */
>> +#define ERST_REG_SIZE (2UL * sizeof(uint64_t))
> 
>> +#define ERST_CSR_ACTION (0UL << 3) /* action (cmd) */
>> +#define ERST_CSR_VALUE  (1UL << 3) /* argument/value (data) */
> what's meaning of CRS?
CSR = control status register
> Looking at patch both should be called ERST_[ACTION|VALUE]_OFFSET
Done
> pls use explicit offset values instead of shifting bit.
Done
> 
> 
>> +/*
>> + * ERST_RECORD_SIZE is the buffer size for exchanging ERST
>> + * record contents. Thus, it defines the maximum record size.
>> + * As this is mapped through a PCI BAR, it must be a power of
>> + * two, and should be at least PAGE_SIZE.
>> + * Records are stored in the backing file in a simple fashion.
>> + * The backing file is essentially divided into fixed size
>> + * "slots", ERST_RECORD_SIZE in length, with each "slot"
>> + * storing a single record. No attempt at optimizing storage
>> + * through compression, compaction, etc is attempted.
>> + * NOTE that any change to this value will make any pre-
>> + * existing backing files, not of the same ERST_RECORD_SIZE,
>> + * unusable to the guest.
>> + */
>> +/* 8KiB records, not too small, not too big */
>> +#define ERST_RECORD_SIZE (2UL * 4096)
>> +
>> +#define ERST_INVALID_RECORD_ID (~0UL)
>> +#define ERST_EXECUTE_OPERATION_MAGIC 0x9CUL
>> +
>> +/*
>> + * Object cast macro
>> + */
>> +#define ACPIERST(obj) \
>> +    OBJECT_CHECK(ERSTDeviceState, (obj), TYPE_ACPI_ERST)
>> +
>> +/*
>> + * Main ERST device state structure
>> + */
>> +typedef struct {
>> +    PCIDevice parent_obj;
>> +
>> +    HostMemoryBackend *hostmem;
>> +    MemoryRegion *hostmem_mr;
>> +
>> +    MemoryRegion iomem; /* programming registes */
>> +    MemoryRegion nvmem; /* exchange buffer */
>> +    uint32_t prop_size;
> s/^^^/storage_size/
Corrected

> 
>> +    hwaddr bar0; /* programming registers */
>> +    hwaddr bar1; /* exchange buffer */
> why do you need to keep this addresses around?
> Suggest to drop these fields and use local variables or pci_get_bar_addr() at call site.
Corrected

> 
>> +
>> +    uint8_t operation;
>> +    uint8_t busy_status;
>> +    uint8_t command_status;
>> +    uint32_t record_offset;
>> +    uint32_t record_count;
>> +    uint64_t reg_action;
>> +    uint64_t reg_value;
>> +    uint64_t record_identifier;
>> +
>> +    unsigned next_record_index;
> 
> 
>> +    uint8_t record[ERST_RECORD_SIZE]; /* read/written directly by guest */
>> +    uint8_t tmp_record[ERST_RECORD_SIZE]; /* intermediate manipulation buffer */
> drop these see [**] below
Corrected

> 
>> +
>> +} ERSTDeviceState;
>> +
>> +/*******************************************************************/
>> +/*******************************************************************/
>> +
>> +static unsigned copy_from_nvram_by_index(ERSTDeviceState *s, unsigned index)
>> +{
>> +    /* Read an nvram entry into tmp_record */
>> +    unsigned rc = ACPI_ERST_STATUS_FAILED;
>> +    off_t offset = (index * ERST_RECORD_SIZE);
>> +
>> +    if ((offset + ERST_RECORD_SIZE) <= s->prop_size) {
>> +        if (s->hostmem_mr) {
>> +            uint8_t *p = (uint8_t *)memory_region_get_ram_ptr(s->hostmem_mr);
>> +            memcpy(s->tmp_record, p + offset, ERST_RECORD_SIZE);
>> +            rc = ACPI_ERST_STATUS_SUCCESS;
>> +        }
>> +    }
>> +    return rc;
>> +}
>> +
>> +static unsigned copy_to_nvram_by_index(ERSTDeviceState *s, unsigned index)
>> +{
>> +    /* Write entry in tmp_record into nvram, and backing file */
>> +    unsigned rc = ACPI_ERST_STATUS_FAILED;
>> +    off_t offset = (index * ERST_RECORD_SIZE);
>> +
>> +    if ((offset + ERST_RECORD_SIZE) <= s->prop_size) {
>> +        if (s->hostmem_mr) {
>> +            uint8_t *p = (uint8_t *)memory_region_get_ram_ptr(s->hostmem_mr);
>> +            memcpy(p + offset, s->tmp_record, ERST_RECORD_SIZE);
>> +            rc = ACPI_ERST_STATUS_SUCCESS;
>> +        }
>> +    }
>> +    return rc;
>> +}
>> +
>> +static int lookup_erst_record_by_identifier(ERSTDeviceState *s,
>> +    uint64_t record_identifier, bool *record_found, bool alloc_for_write)
>> +{
>> +    int rc = -1;
>> +    int empty_index = -1;
>> +    int index = 0;
>> +    unsigned rrc;
>> +
>> +    *record_found = 0;
>> +
>> +    do {
>> +        rrc = copy_from_nvram_by_index(s, (unsigned)index);
> 
> you have direct access to backend memory so there is no need
> whatsoever to copy records from it to an intermediate buffer
> everywhere. Almost all operations with records can be done
> in place modulo EXECUTE_OPERATION action in BEGIN_[READ|WRITE]
> context, where record is moved between backend and guest buffer.
> 
> So please eliminate all not necessary copying.
> (for fun, time operations and set backend size to some huge
> value to see how expensive this code is)

I've corrected this. In our previous exchangs, I thought the reference
to copying was about trying to directly have guest write/read the appropriate
record in the backend storage. After reading this comment I realized that
yes I was doing alot of copying (an artifact of the transition away from
direct file i/o to MemoryBackend). So good find, and I've eliminated the
intermediate copying.

> 
>> +        if (rrc == ACPI_ERST_STATUS_SUCCESS) {
>> +            uint64_t this_identifier;
>> +            this_identifier = THE_UEFI_CPER_RECORD_ID(s->tmp_record);
>> +            if (IS_UEFI_CPER_RECORD(s->tmp_record) &&
>> +                (this_identifier == record_identifier)) {
>> +                rc = index;
>> +                *record_found = 1;
>> +                break;
>> +            }
>> +            if ((this_identifier == ERST_INVALID_RECORD_ID) &&
>> +                (empty_index < 0)) {
>> +                empty_index = index; /* first available for write */
>> +            }
>> +        }
>> +        ++index;
>> +    } while (rrc == ACPI_ERST_STATUS_SUCCESS);
>> +
>> +    /* Record not found, allocate for writing */
>> +    if ((rc < 0) && alloc_for_write) {
>> +        rc = empty_index;
>> +    }
>> +
>> +    return rc;
>> +}
>> +
>> +static unsigned clear_erst_record(ERSTDeviceState *s)
>> +{
>> +    unsigned rc = ACPI_ERST_STATUS_RECORD_NOT_FOUND;
>> +    bool record_found;
>> +    int index;
>> +
>> +    index = lookup_erst_record_by_identifier(s,
>> +        s->record_identifier, &record_found, 0);
>> +    if (record_found) {
>> +        memset(s->tmp_record, 0xFF, ERST_RECORD_SIZE);
>> +        rc = copy_to_nvram_by_index(s, (unsigned)index);
>> +        if (rc == ACPI_ERST_STATUS_SUCCESS) {
>> +            s->record_count -= 1;
>> +        }
>> +    }
>> +
>> +    return rc;
>> +}
>> +
>> +static unsigned write_erst_record(ERSTDeviceState *s)
>> +{
>> +    unsigned rc = ACPI_ERST_STATUS_FAILED;
>> +
>> +    if (s->record_offset < (ERST_RECORD_SIZE - UEFI_CPER_RECORD_MIN_SIZE)) {
>> +        uint64_t record_identifier;
>> +        uint8_t *record = &s->record[s->record_offset];
>> +        bool record_found;
>> +        int index;
>> +
>> +        record_identifier = (s->record_identifier == ERST_INVALID_RECORD_ID)
>> +            ? THE_UEFI_CPER_RECORD_ID(record) : s->record_identifier;
>> +
>> +        index = lookup_erst_record_by_identifier(s,
>> +            record_identifier, &record_found, 1);
>> +        if (index < 0) {
>> +            rc = ACPI_ERST_STATUS_NOT_ENOUGH_SPACE;
>> +        } else {
>> +            if (0 != s->record_offset) {
>> +                memset(&s->tmp_record[ERST_RECORD_SIZE - s->record_offset],
>> +                    0xFF, s->record_offset);
>> +            }
>> +            memcpy(s->tmp_record, record, ERST_RECORD_SIZE - s->record_offset);
>> +            rc = copy_to_nvram_by_index(s, (unsigned)index);
>> +            if (rc == ACPI_ERST_STATUS_SUCCESS) {
>> +                if (!record_found) { /* not overwriting existing record */
>> +                    s->record_count += 1; /* writing new record */
>> +                }
>> +            }
>> +        }
>> +    }
>> +
>> +    return rc;
>> +}
>> +
>> +static unsigned next_erst_record(ERSTDeviceState *s,
>> +    uint64_t *record_identifier)
>> +{
>> +    unsigned rc = ACPI_ERST_STATUS_RECORD_NOT_FOUND;
>> +    unsigned index;
>> +    unsigned rrc;
>> +
>> +    *record_identifier = ERST_INVALID_RECORD_ID;
>> +
>> +    index = s->next_record_index;
>> +    do {
>> +        rrc = copy_from_nvram_by_index(s, (unsigned)index);
>> +        if (rrc == ACPI_ERST_STATUS_SUCCESS) {
>> +            if (IS_UEFI_CPER_RECORD(s->tmp_record)) {
>> +                s->next_record_index = index + 1; /* where to start next time */
>> +                *record_identifier = THE_UEFI_CPER_RECORD_ID(s->tmp_record);
>> +                rc = ACPI_ERST_STATUS_SUCCESS;
>> +                break;
>> +            }
>> +            ++index;
>> +        } else {
>> +            if (s->next_record_index == 0) {
>> +                rc = ACPI_ERST_STATUS_RECORD_STORE_EMPTY;
>> +            }
>> +            s->next_record_index = 0; /* at end, reset */
>> +        }
>> +    } while (rrc == ACPI_ERST_STATUS_SUCCESS);
>> +
>> +    return rc;
>> +}
>> +
>> +static unsigned read_erst_record(ERSTDeviceState *s)
>> +{
>> +    unsigned rc = ACPI_ERST_STATUS_RECORD_NOT_FOUND;
>> +    bool record_found;
>> +    int index;
>> +
>> +    index = lookup_erst_record_by_identifier(s,
>> +        s->record_identifier, &record_found, 0);
>> +    if (record_found) {
>> +        rc = copy_from_nvram_by_index(s, (unsigned)index);
>> +        if (rc == ACPI_ERST_STATUS_SUCCESS) {
>> +            if (s->record_offset < ERST_RECORD_SIZE) {
>> +                memcpy(&s->record[s->record_offset], s->tmp_record,
>> +                    ERST_RECORD_SIZE - s->record_offset);
>> +            }
>> +        }
>> +    }
>> +
>> +    return rc;
>> +}
>> +
>> +static unsigned get_erst_record_count(ERSTDeviceState *s)
>> +{
>> +    /* Compute record_count */
>> +    unsigned index = 0;
>> +
>> +    s->record_count = 0;
>> +    while (copy_from_nvram_by_index(s, index) == ACPI_ERST_STATUS_SUCCESS) {
>> +        uint8_t *ptr = &s->tmp_record[0];
>> +        uint64_t record_identifier = THE_UEFI_CPER_RECORD_ID(ptr);
>> +        if (IS_UEFI_CPER_RECORD(ptr) &&
>> +            (ERST_INVALID_RECORD_ID != record_identifier)) {
>> +            s->record_count += 1;
>> +        }
>> +        ++index;
>> +    }
>> +    return s->record_count;
>> +}
>> +
>> +/*******************************************************************/
>> +
>> +static uint64_t erst_rd_reg64(hwaddr addr,
>> +    uint64_t reg, unsigned size)
>> +{
>> +    uint64_t rdval;
>> +    uint64_t mask;
>> +    unsigned shift;
>> +
>> +    if (size == sizeof(uint64_t)) {
>> +        /* 64b access */
>> +        mask = 0xFFFFFFFFFFFFFFFFUL;
>> +        shift = 0;
>> +    } else {
>> +        /* 32b access */
>> +        mask = 0x00000000FFFFFFFFUL;
>> +        shift = ((addr & 0x4) == 0x4) ? 32 : 0;
>> +    }
>> +
>> +    rdval = reg;
>> +    rdval >>= shift;
>> +    rdval &= mask;
>> +
>> +    return rdval;
>> +}
>> +
>> +static uint64_t erst_wr_reg64(hwaddr addr,
>> +    uint64_t reg, uint64_t val, unsigned size)
>> +{
>> +    uint64_t wrval;
>> +    uint64_t mask;
>> +    unsigned shift;
>> +
>> +    if (size == sizeof(uint64_t)) {
>> +        /* 64b access */
>> +        mask = 0xFFFFFFFFFFFFFFFFUL;
>> +        shift = 0;
>> +    } else {
>> +        /* 32b access */
>> +        mask = 0x00000000FFFFFFFFUL;
>> +        shift = ((addr & 0x4) == 0x4) ? 32 : 0;
>> +    }
>> +
>> +    val &= mask;
>> +    val <<= shift;
>> +    mask <<= shift;
>> +    wrval = reg;
>> +    wrval &= ~mask;
>> +    wrval |= val;
>> +
>> +    return wrval;
>> +}
> (I see in next patch it's us defining access width in the ACPI tables)
> so question is: do we have to have mixed register width access?
> can't all register accesses be 64-bit?

Initially I attempted to just use 64-bit exclusively. The problem is that,
for reasons I don't understand, the OSPM on Linux, even x86_64, breaks a 64b
register access into two. Here's an example of reading the exchange buffer
address, which is coded as a 64b access:

acpi_erst_reg_write addr: 0x0000 <== 0x000000000000000d (size: 4)
acpi_erst_reg_read  addr: 0x0008 ==> 0x00000000c1010000 (size: 4)
acpi_erst_reg_read  addr: 0x000c ==> 0x0000000000000000 (size: 4)

So I went ahead and made ACTION register accesses 32b, else there would
be two reads of 32-bts, of which the second is useless.

> 
>> +static void erst_reg_write(void *opaque, hwaddr addr,
>> +    uint64_t val, unsigned size)
>> +{
>> +    ERSTDeviceState *s = (ERSTDeviceState *)opaque;
>> +
>> +    /*
>> +     * NOTE: All actions/operations/side effects happen on the WRITE,
>> +     * by design. The READs simply return the reg_value contents.
>> +     */
>> +    trace_acpi_erst_reg_write(addr, val, size);
>> +
>> +    switch (addr) {
>> +    case ERST_CSR_VALUE + 0:
>> +    case ERST_CSR_VALUE + 4:
>> +        s->reg_value = erst_wr_reg64(addr, s->reg_value, val, size);
>> +        break;
>> +    case ERST_CSR_ACTION + 0:
>> +/*  case ERST_CSR_ACTION+4: as coded, not really a 64b register */
>> +        switch (val) {
>> +        case ACPI_ERST_ACTION_BEGIN_WRITE_OPERATION:
>> +        case ACPI_ERST_ACTION_BEGIN_READ_OPERATION:
>> +        case ACPI_ERST_ACTION_BEGIN_CLEAR_OPERATION:
>> +        case ACPI_ERST_ACTION_BEGIN_DUMMY_WRITE_OPERATION:
>> +        case ACPI_ERST_ACTION_END_OPERATION:
>> +            s->operation = val;
>> +            break;
>> +        case ACPI_ERST_ACTION_SET_RECORD_OFFSET:
>> +            s->record_offset = s->reg_value;
>> +            break;
>> +        case ACPI_ERST_ACTION_EXECUTE_OPERATION:
>> +            if ((uint8_t)s->reg_value == ERST_EXECUTE_OPERATION_MAGIC) {
>> +                s->busy_status = 1;
>> +                switch (s->operation) {
>> +                case ACPI_ERST_ACTION_BEGIN_WRITE_OPERATION:
>> +                    s->command_status = write_erst_record(s);
>> +                    break;
>> +                case ACPI_ERST_ACTION_BEGIN_READ_OPERATION:
>> +                    s->command_status = read_erst_record(s);
>> +                    break;
>> +                case ACPI_ERST_ACTION_BEGIN_CLEAR_OPERATION:
>> +                    s->command_status = clear_erst_record(s);
>> +                    break;
>> +                case ACPI_ERST_ACTION_BEGIN_DUMMY_WRITE_OPERATION:
>> +                    s->command_status = ACPI_ERST_STATUS_SUCCESS;
>> +                    break;
>> +                case ACPI_ERST_ACTION_END_OPERATION:
>> +                    s->command_status = ACPI_ERST_STATUS_SUCCESS;
>> +                    break;
>> +                default:
>> +                    s->command_status = ACPI_ERST_STATUS_FAILED;
>> +                    break;
>> +                }
>> +                s->record_identifier = ERST_INVALID_RECORD_ID;
>> +                s->busy_status = 0;
>> +            }
>> +            break;
>> +        case ACPI_ERST_ACTION_CHECK_BUSY_STATUS:
>> +            s->reg_value = s->busy_status;
>> +            break;
>> +        case ACPI_ERST_ACTION_GET_COMMAND_STATUS:
>> +            s->reg_value = s->command_status;
>> +            break;
>> +        case ACPI_ERST_ACTION_GET_RECORD_IDENTIFIER:
>> +            s->command_status = next_erst_record(s, &s->reg_value);
>> +            break;
>> +        case ACPI_ERST_ACTION_SET_RECORD_IDENTIFIER:
>> +            s->record_identifier = s->reg_value;
>> +            break;
>> +        case ACPI_ERST_ACTION_GET_RECORD_COUNT:
>> +            s->reg_value = s->record_count;
>> +            break;
>> +        case ACPI_ERST_ACTION_GET_ERROR_LOG_ADDRESS_RANGE:
>> +            s->reg_value = s->bar1;
>> +            break;
>> +        case ACPI_ERST_ACTION_GET_ERROR_LOG_ADDRESS_LENGTH:
>> +            s->reg_value = ERST_RECORD_SIZE;
>> +            break;
>> +        case ACPI_ERST_ACTION_GET_ERROR_LOG_ADDRESS_RANGE_ATTRIBUTES:
>> +            s->reg_value = 0x0; /* intentional, not NVRAM mode */
>> +            break;
>> +        case ACPI_ERST_ACTION_GET_EXECUTE_OPERATION_TIMINGS:
>> +            /*
>> +             * 100UL is max, 10UL is nominal
> 100/10 of what, also add reference to spec/table it comes from
> and explain in comment why theses values were chosen
I've changed the comment and style to be similar to build_amd_iommu().
These are merely sane non-zero max/min times.

> 
>> +             */
>> +            s->reg_value = ((100UL << 32) | (10UL << 0));
>> +            break;
>> +        case ACPI_ERST_ACTION_RESERVED:
> not necessary, it will be handled by 'default:'
Corrected

> 
>> +        default:
>> +            /*
>> +             * Unknown action/command, NOP
>> +             */
>> +            break;
>> +        }
>> +        break;
>> +    default:
>> +        /*
>> +         * This should not happen, but if it does, NOP
>> +         */
>> +        break;
>> +    }
>> +}
>> +
>> +static uint64_t erst_reg_read(void *opaque, hwaddr addr,
>> +                                unsigned size)
>> +{
>> +    ERSTDeviceState *s = (ERSTDeviceState *)opaque;
>> +    uint64_t val = 0;
>> +
>> +    switch (addr) {
>> +    case ERST_CSR_ACTION + 0:
>> +    case ERST_CSR_ACTION + 4:
>> +        val = erst_rd_reg64(addr, s->reg_action, size);
>> +        break;
>> +    case ERST_CSR_VALUE + 0:
>> +    case ERST_CSR_VALUE + 4:
>> +        val = erst_rd_reg64(addr, s->reg_value, size);
>> +        break;
>> +    default:
>> +        break;
>> +    }
>> +    trace_acpi_erst_reg_read(addr, val, size);
>> +    return val;
>> +}
>> +
>> +static const MemoryRegionOps erst_reg_ops = {
>> +    .read = erst_reg_read,
>> +    .write = erst_reg_write,
>> +    .endianness = DEVICE_NATIVE_ENDIAN,
>> +};
>> +
>> +static void erst_mem_write(void *opaque, hwaddr addr,
>> +    uint64_t val, unsigned size)
>> +{
>> +    ERSTDeviceState *s = (ERSTDeviceState *)opaque;
> 
>> +    uint8_t *ptr = &s->record[addr - 0];
>> +    trace_acpi_erst_mem_write(addr, val, size);
>> +    switch (size) {
>> +    default:
>> +    case sizeof(uint8_t):
>> +        *(uint8_t *)ptr = (uint8_t)val;
>> +        break;
>> +    case sizeof(uint16_t):
>> +        *(uint16_t *)ptr = (uint16_t)val;
>> +        break;
>> +    case sizeof(uint32_t):
>> +        *(uint32_t *)ptr = (uint32_t)val;
>> +        break;
>> +    case sizeof(uint64_t):
>> +        *(uint64_t *)ptr = (uint64_t)val;
>> +        break;
>> +    }
>> +}
>> +
>> +static uint64_t erst_mem_read(void *opaque, hwaddr addr,
>> +                                unsigned size)
>> +{
>> +    ERSTDeviceState *s = (ERSTDeviceState *)opaque;
>> +    uint8_t *ptr = &s->record[addr - 0];
>> +    uint64_t val = 0;
>> +    switch (size) {
>> +    default:
>> +    case sizeof(uint8_t):
>> +        val = *(uint8_t *)ptr;
>> +        break;
>> +    case sizeof(uint16_t):
>> +        val = *(uint16_t *)ptr;
>> +        break;
>> +    case sizeof(uint32_t):
>> +        val = *(uint32_t *)ptr;
>> +        break;
>> +    case sizeof(uint64_t):
>> +        val = *(uint64_t *)ptr;
>> +        break;
>> +    }
>> +    trace_acpi_erst_mem_read(addr, val, size);
>> +    return val;
>> +}
>> +
>> +static const MemoryRegionOps erst_mem_ops = {
>> +    .read = erst_mem_read,
>> +    .write = erst_mem_write,
>> +    .endianness = DEVICE_NATIVE_ENDIAN,
>> +};
>> +
>> +/*******************************************************************/
>> +/*******************************************************************/
>> +
>> +static const VMStateDescription erst_vmstate  = {
>> +    .name = "acpi-erst",
>> +    .version_id = 1,
>> +    .minimum_version_id = 1,
>> +    .fields = (VMStateField[]) {
>> +        VMSTATE_UINT8(operation, ERSTDeviceState),
>> +        VMSTATE_UINT8(busy_status, ERSTDeviceState),
>> +        VMSTATE_UINT8(command_status, ERSTDeviceState),
>> +        VMSTATE_UINT32(record_offset, ERSTDeviceState),
>> +        VMSTATE_UINT32(record_count, ERSTDeviceState),
>> +        VMSTATE_UINT64(reg_action, ERSTDeviceState),
>> +        VMSTATE_UINT64(reg_value, ERSTDeviceState),
>> +        VMSTATE_UINT64(record_identifier, ERSTDeviceState),
>> +        VMSTATE_UINT8_ARRAY(record, ERSTDeviceState, ERST_RECORD_SIZE),
>> +        VMSTATE_UINT8_ARRAY(tmp_record, ERSTDeviceState, ERST_RECORD_SIZE),
>> +        VMSTATE_END_OF_LIST()
>> +    }
>> +};
>> +
>> +static void erst_realizefn(PCIDevice *pci_dev, Error **errp)
>> +{
>> +    ERSTDeviceState *s = ACPIERST(pci_dev);
>> +    unsigned index = 0;
>> +    bool share;
>> +
>> +    trace_acpi_erst_realizefn_in();
>> +
>> +    if (!s->hostmem) {
>> +        error_setg(errp, "'" ACPI_ERST_MEMDEV_PROP "' property is not set");
>> +        return;
>> +    } else if (host_memory_backend_is_mapped(s->hostmem)) {
>> +        error_setg(errp, "can't use already busy memdev: %s",
>> +                   object_get_canonical_path_component(OBJECT(s->hostmem)));
>> +        return;
>> +    }
>> +
>> +    share = object_property_get_bool(OBJECT(s->hostmem), "share", &error_fatal);
> s/&error_fatal/errp/
Corrected

> 
>> +    if (!share) {
>> +        error_setg(errp, "ACPI ERST requires hostmem property share=on: %s",
>> +                   object_get_canonical_path_component(OBJECT(s->hostmem)));
>> +    }
> This limits possible to use backends to file|memfd only, so
> I wonder if really need this limitation, what if user doesn't
> care about preserving it across QEMU restarts. (i.e. usecase
> where storage is used as a means to troubleshoot guest crash
> i.e. QEMU is not restarted in between)
> 
> Maybe instead of enforcing we should just document that if user
> wishes to preserve content they should use file|memfd backend with
> share=on option.

I've removed this check. It is documented the way it is intended to be used.

> 
>> +
>> +    s->hostmem_mr = host_memory_backend_get_memory(s->hostmem);
>> +
>> +    /* HostMemoryBackend size will be multiple of PAGE_SIZE */
>> +    s->prop_size = object_property_get_int(OBJECT(s->hostmem), "size", &error_fatal);
> s/&error_fatal/errp/
Corrected

> 
>> +
>> +    /* Convert prop_size to integer multiple of ERST_RECORD_SIZE */
>> +    s->prop_size -= (s->prop_size % ERST_RECORD_SIZE);
> 
> pls, no fixups on behalf of user, if size is not what it should be
> error out with suggestion how to fix it.
Removed

> 
>> +
>> +    /*
>> +     * MemoryBackend initializes contents to zero, but we actually
>> +     * want contents initialized to 0xFF, ERST_INVALID_RECORD_ID.
>> +     */
>> +    if (copy_from_nvram_by_index(s, 0) == ACPI_ERST_STATUS_SUCCESS) {
>> +        if (s->tmp_record[0] == 0x00) {
>> +            memset(s->tmp_record, 0xFF, ERST_RECORD_SIZE);
> this doesn't scale,
> (set backend size to more than host physical RAM, put it on slow storage and have fun.)
of course, which is why i think we need to have an upper bound (my early
submissions did).

> 
> Is it possible to use 0 as invalid record id or change storage format
> so you would not have to rewrite whole file at startup (maybe some sort
no

> of metadata header/records book-keeping table before actual records.
> And initialize file only if header is invalid.)
I have to scan the backend storage anyway in order to initialize the record
count, so I've combined that scan with a test to see if the backend storage
needs to be initialized.

> 
>> +            index = 0;
>> +            while (copy_to_nvram_by_index(s, index) == ACPI_ERST_STATUS_SUCCESS) {
>> +                ++index;
>> +            }
> also back&forth copying here is not really necessary.
corrected

> 
>> +        }
>> +    }
>> +
>> +    /* Initialize record_count */
>> +    get_erst_record_count(s);
> why not put it into reset?
It is initialized once, then subsequent write/clear operations update
the counter as needed.

> 
>> +
>> +    /* BAR 0: Programming registers */
>> +    memory_region_init_io(&s->iomem, OBJECT(pci_dev), &erst_reg_ops, s,
>> +                          TYPE_ACPI_ERST, ERST_REG_SIZE);
>> +    pci_register_bar(pci_dev, 0, PCI_BASE_ADDRESS_SPACE_MEMORY, &s->iomem);
>> +
> 
>> +    /* BAR 1: Exchange buffer memory */
>> +    memory_region_init_io(&s->nvmem, OBJECT(pci_dev), &erst_mem_ops, s,
>> +                          TYPE_ACPI_ERST, ERST_RECORD_SIZE);
>> +    pci_register_bar(pci_dev, 1, PCI_BASE_ADDRESS_SPACE_MEMORY, &s->nvmem);
> 
> **)
> instead of using mmio for buffer where each write causes
> guest exit to QEMU, map memory region directly to guest.
> see ivshmem_bar2, the only difference with ivshmem, you'd
> create memory region manually (for example you can use
> memory_region_init_resizeable_ram)
> 
> this way you can speedup access and drop erst_mem_ops and
> [tmp_]record intermediate buffers.
> 
> Instead of [tmp_]record you can copy record content
> directly between buffer and backend memory regions.

I've changed the exchange buffer into a MemoryBackend object and
eliminated the erst_mem_ops.

> 
>> +    /*
>> +     * The vmstate_register_ram_global() puts the memory in
>> +     * migration stream, where it is written back to the memory
>> +     * upon reaching the destination, which causes the backing
>> +     * file to be updated (with share=on).
>> +     */
>> +    vmstate_register_ram_global(s->hostmem_mr);
>> +
>> +    trace_acpi_erst_realizefn_out(s->prop_size);
>> +}
>> +
>> +static void erst_reset(DeviceState *dev)
>> +{
>> +    ERSTDeviceState *s = ACPIERST(dev);
>> +
>> +    trace_acpi_erst_reset_in(s->record_count);
>> +    s->operation = 0;
>> +    s->busy_status = 0;
>> +    s->command_status = ACPI_ERST_STATUS_SUCCESS;
> 
>> +    /* indicate empty/no-more until further notice */
> pls rephrase, I'm not sure what it's trying to say
Eliminated; I don't know why I was trying to say there either
> 
>> +    s->record_identifier = ERST_INVALID_RECORD_ID;
>> +    s->record_offset = 0;
>> +    s->next_record_index = 0;
> 
>> +    /* NOTE: record_count and nvram are initialized elsewhere */
>> +    trace_acpi_erst_reset_out(s->record_count);
>> +}
>> +
>> +static Property erst_properties[] = {
>> +    DEFINE_PROP_LINK(ACPI_ERST_MEMDEV_PROP, ERSTDeviceState, hostmem,
>> +                     TYPE_MEMORY_BACKEND, HostMemoryBackend *),
>> +    DEFINE_PROP_END_OF_LIST(),
>> +};
>> +
>> +static void erst_class_init(ObjectClass *klass, void *data)
>> +{
>> +    DeviceClass *dc = DEVICE_CLASS(klass);
>> +    PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
>> +
>> +    trace_acpi_erst_class_init_in();
>> +    k->realize = erst_realizefn;
>> +    k->vendor_id = PCI_VENDOR_ID_REDHAT;
>> +    k->device_id = PCI_DEVICE_ID_REDHAT_ACPI_ERST;
>> +    k->revision = 0x00;
>> +    k->class_id = PCI_CLASS_OTHERS;
>> +    dc->reset = erst_reset;
>> +    dc->vmsd = &erst_vmstate;
>> +    dc->user_creatable = true;
>> +    device_class_set_props(dc, erst_properties);
>> +    dc->desc = "ACPI Error Record Serialization Table (ERST) device";
>> +    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
>> +    trace_acpi_erst_class_init_out();
>> +}
>> +
>> +static const TypeInfo erst_type_info = {
>> +    .name          = TYPE_ACPI_ERST,
>> +    .parent        = TYPE_PCI_DEVICE,
>> +    .class_init    = erst_class_init,
>> +    .instance_size = sizeof(ERSTDeviceState),
>> +    .interfaces = (InterfaceInfo[]) {
>> +        { INTERFACE_CONVENTIONAL_PCI_DEVICE },
> what is this for here?
> 
>> +        { }
>> +    }
>> +};
>> +
>> +static void erst_register_types(void)
>> +{
>> +    type_register_static(&erst_type_info);
>> +}
>> +
>> +type_init(erst_register_types)
>> diff --git a/hw/acpi/meson.build b/hw/acpi/meson.build
>> index dd69577..262a8ee 100644
>> --- a/hw/acpi/meson.build
>> +++ b/hw/acpi/meson.build
>> @@ -4,6 +4,7 @@ acpi_ss.add(files(
>>     'aml-build.c',
>>     'bios-linker-loader.c',
>>     'utils.c',
>> +  'erst.c',
>>   ))
>>   acpi_ss.add(when: 'CONFIG_ACPI_CPU_HOTPLUG', if_true: files('cpu.c'))
>>   acpi_ss.add(when: 'CONFIG_ACPI_CPU_HOTPLUG', if_true: files('cpu_hotplug.c'))
>

next prev parent reply	other threads:[~2021-07-21 16:09 UTC|newest]

Thread overview: 58+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-30 19:07 [PATCH v5 00/10] acpi: Error Record Serialization Table, ERST, support for QEMU Eric DeVolder
2021-06-30 19:07 ` [PATCH v5 01/10] ACPI ERST: bios-tables-test.c steps 1 and 2 Eric DeVolder
2021-06-30 19:07 ` [PATCH v5 02/10] ACPI ERST: specification for ERST support Eric DeVolder
2021-06-30 19:26   ` Eric DeVolder
2021-07-19 15:02     ` Igor Mammedov
2021-07-21 15:42       ` Eric DeVolder
2021-07-26 10:06         ` Igor Mammedov
2021-07-26 19:52           ` Eric DeVolder
2021-07-27 11:45             ` Igor Mammedov
2021-07-28 15:16               ` Eric DeVolder
2021-06-30 19:07 ` [PATCH v5 03/10] ACPI ERST: PCI device_id for ERST Eric DeVolder
2021-07-19 15:06   ` Igor Mammedov
2021-07-21 15:42     ` Eric DeVolder
2021-06-30 19:07 ` [PATCH v5 04/10] ACPI ERST: header file " Eric DeVolder
2021-06-30 19:07 ` [PATCH v5 05/10] ACPI ERST: support for ACPI ERST feature Eric DeVolder
2021-07-20 12:17   ` Igor Mammedov
2021-07-21 16:07     ` Eric DeVolder [this message]
2021-07-26 10:42       ` Igor Mammedov
2021-07-26 20:01         ` Eric DeVolder
2021-07-27 12:52           ` Igor Mammedov
2021-08-04 22:13             ` Eric DeVolder
2021-08-05  9:05               ` Igor Mammedov
2021-07-21 17:36     ` Eric DeVolder
2021-07-26 10:13       ` Igor Mammedov
2021-06-30 19:07 ` [PATCH v5 06/10] ACPI ERST: build the ACPI ERST table Eric DeVolder
2021-07-20 13:16   ` Igor Mammedov
2021-07-20 14:59     ` Igor Mammedov
2021-07-21 16:12     ` Eric DeVolder
2021-07-26 11:00       ` Igor Mammedov
2021-07-26 20:02         ` Eric DeVolder
2021-07-27 12:01           ` Igor Mammedov
2021-07-28 15:18             ` Eric DeVolder
2021-06-30 19:07 ` [PATCH v5 07/10] ACPI ERST: trace support Eric DeVolder
2021-07-20 13:15   ` Igor Mammedov
2021-07-21 16:14     ` Eric DeVolder
2021-07-26 11:08       ` Igor Mammedov
2021-07-26 20:03         ` Eric DeVolder
2021-06-30 19:07 ` [PATCH v5 08/10] ACPI ERST: create ACPI ERST table for pc/x86 machines Eric DeVolder
2021-07-20 13:19   ` Igor Mammedov
2021-07-21 16:16     ` Eric DeVolder
2021-07-26 11:30       ` Igor Mammedov
2021-07-26 20:03         ` Eric DeVolder
2021-06-30 19:07 ` [PATCH v5 09/10] ACPI ERST: qtest for ERST Eric DeVolder
2021-07-20 13:38   ` Igor Mammedov
2021-07-21 16:18     ` Eric DeVolder
2021-07-26 11:45       ` Igor Mammedov
2021-07-26 20:06         ` Eric DeVolder
2021-06-30 19:07 ` [PATCH v5 10/10] ACPI ERST: step 6 of bios-tables-test.c Eric DeVolder
2021-07-20 13:24   ` Igor Mammedov
2021-07-21 16:19     ` Eric DeVolder
2021-07-13 20:38 ` [PATCH v5 00/10] acpi: Error Record Serialization Table, ERST, support for QEMU Michael S. Tsirkin
2021-07-21 15:23   ` Eric DeVolder
2021-07-20 14:57 ` Igor Mammedov
2021-07-21 15:26   ` Eric DeVolder
2021-07-23 16:26   ` Eric DeVolder
2021-07-27 12:55   ` Igor Mammedov
2021-07-28 15:19     ` Eric DeVolder
2021-07-29  8:07       ` Igor Mammedov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0a09250e-4347-0ad6-d95c-0bfd2094b98b@oracle.com \
    --to=eric.devolder@oracle.com \
    --cc=boris.ostrovsky@oracle.com \
    --cc=ehabkost@redhat.com \
    --cc=imammedo@redhat.com \
    --cc=konrad.wilk@oracle.com \
    --cc=mst@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=rth@twiddle.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).