From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
To: <shiju.jose@huawei.com>
Cc: <qemu-devel@nongnu.org>, <linux-cxl@vger.kernel.org>,
	<tanxiaofei@huawei.com>, <prime.zeng@hisilicon.com>,
	<linuxarm@huawei.com>
Subject: Re: [PATCH v2 7/7] hw/cxl: Add emulation for memory sparing control feature
Date: Fri, 20 Jun 2025 15:48:13 +0100
Message-ID: <20250620154813.00002bbd@huawei.com>
In-Reply-To: <20250619151619.1695-8-shiju.jose@huawei.com>

On Thu, 19 Jun 2025 16:16:19 +0100
<shiju.jose@huawei.com> wrote:

> From: Shiju Jose <shiju.jose@huawei.com>
> 
> Memory sparing is defined as a repair function that replaces a portion of
> memory with a portion of functional memory at that same DPA. The subclasses
> for this operation vary in terms of the scope of the sparing being
> performed. The Cacheline sparing subclass refers to a sparing action that
> can replace a full cacheline. Row sparing is provided as an alternative to
> PPR sparing functions and its scope is that of a single DDR row. Bank
> sparing allows an entire bank to be replaced. Rank sparing is defined as
> an operation in which an entire DDR rank is replaced.
> 
> Memory sparing maintenance operations may be supported by CXL devices
> that implement CXL.mem protocol. A sparing maintenance operation requests
> the CXL device to perform a repair operation on its media.
> For example, a CXL device with DRAM components that support memory sparing
> features may implement sparing Maintenance operations.
> 
> The host may issue a query command by setting Query Resources flag in the
> Input Payload (CXL Spec 3.2 Table 8-120) to determine availability of
> sparing resources for a given address. In response to a query request,
> the device shall report the resource availability by producing the Memory
> Sparing Event Record (CXL Spec 3.2 Table 8-60) in which the Channel, Rank,
> Nibble Mask, Bank Group, Bank, Row, Column, Sub-Channel fields are a copy
> of the values specified in the request.
> 
> During the execution of a sparing maintenance operation, a CXL memory device:
> - May or may not retain data
> - May or may not be able to process CXL.mem requests correctly.
> These CXL memory device capabilities are specified by restriction flags
> in the memory sparing feature readable attributes.
> 
> When a CXL device identifies an error on a memory component, the device
> may inform the host about the need for a memory sparing maintenance
> operation by using a DRAM event record, in which the 'maintenance needed'
> flag may be set. The event record contains some of the DPA, Channel, Rank,
> Nibble Mask, Bank Group, Bank, Row, Column, Sub-Channel fields that
> should be repaired. The userspace tool requests a maintenance operation
> if the 'maintenance needed' flag is set in the CXL DRAM error record.
> 
> CXL spec 3.2 section 8.2.10.7.2.3 describes the memory sparing feature
> discovery and configuration.
> 
> CXL spec 3.2 section 8.2.10.7.1.4 describes the device's memory sparing
> maintenance operation feature.
> 
> Add emulation for CXL memory device memory sparing control feature
> and memory sparing maintenance operation command.
> 
> TODO: Following are the pending tasks, though not sure how to implement.
>  1. Add emulation for memory sparing maintenance operation.

At most, wipe the data if the device advertises that it won't be retained.
Beyond that there is no need to actually do anything.

>  2. On query, report memory sparing resource availability in a memory sparing
>     event record if required in the future.

I'd go with a per-device, per-type set of counters.
Let's just say we have X of them on a device; once used up, they are gone.
No need to worry too much about what X is.  Just pick some values so we have
something to test against. 4 may be enough for testing?
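
To make the counter idea concrete, here is a rough, untested-in-QEMU sketch.
All names (SparingResources, sparing_resource_consume() and so on) are made
up for illustration and are not from the patch. It assumes the four
sub-classes are numbered 0..3 as in the switch further down, and that the
state would live somewhere in the type 3 device structure.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names throughout; none of these exist in the patch. */
enum {
    SPARING_CACHELINE = 0,
    SPARING_ROW,
    SPARING_BANK,
    SPARING_RANK,
    SPARING_NUM_TYPES,
};

/* Arbitrary value, just enough to exercise the "used up" path in tests. */
#define SPARING_RESOURCES_PER_TYPE 4

typedef struct SparingResources {
    uint8_t remaining[SPARING_NUM_TYPES];
} SparingResources;

static void sparing_resources_init(SparingResources *res)
{
    for (int i = 0; i < SPARING_NUM_TYPES; i++) {
        res->remaining[i] = SPARING_RESOURCES_PER_TYPE;
    }
}

/* Query path: report availability without consuming anything. */
static bool sparing_resource_available(const SparingResources *res,
                                       uint8_t sub_class)
{
    return sub_class < SPARING_NUM_TYPES && res->remaining[sub_class] > 0;
}

/* Perform path: consume one spare; false if none left or bad sub_class. */
static bool sparing_resource_consume(SparingResources *res, uint8_t sub_class)
{
    if (!sparing_resource_available(res, sub_class)) {
        return false;
    }
    res->remaining[sub_class]--;
    return true;
}
```

The query flag would map to sparing_resource_available() and an actual
sparing request to sparing_resource_consume(). How an exhausted counter is
reported back to the host is left open here.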


Some comments on the previous patch carry through to here.  A few more things inline.

Jonathan

> 
> Signed-off-by: Shiju Jose <shiju.jose@huawei.com>
> ---
>  hw/cxl/cxl-mailbox-utils.c  | 295 ++++++++++++++++++++++++++++++++++++
>  hw/mem/cxl_type3.c          |  44 ++++++
>  include/hw/cxl/cxl_device.h |  40 +++++
>  3 files changed, 379 insertions(+)
> 
>  
> +typedef struct CXLMemSparingMaintInPayload {
> +    uint8_t flags;
> +    uint8_t channel;
> +    uint8_t rank;
> +    uint8_t nibble_mask[3];
> +    uint8_t bank_group;
> +    uint8_t bank;
> +    uint8_t row[3];
> +    uint16_t column;
> +    uint8_t sub_channel;
> +} QEMU_PACKED CXLMemSparingMaintInPayload;
> +
> +static CXLRetCode cxl_perform_mem_sparing(CXLType3Dev *ct3d, uint8_t sub_class,
> +                                          void *maint_pi)
> +{
> +     CXLMemSparingMaintInPayload *sparing_maint_pi = (void *)maint_pi;

Odd spacing: five spaces of indent here rather than four.

> +
> +    qemu_log_mask(LOG_UNIMP, "Memory Sparing Maintenance Input Payload...\n");
> +    qemu_log_mask(LOG_UNIMP, "flags = %u\n", sparing_maint_pi->flags);
> +    qemu_log_mask(LOG_UNIMP, "channel= %u\n", sparing_maint_pi->channel);
> +    qemu_log_mask(LOG_UNIMP, "rank = %u\n", sparing_maint_pi->rank);
> +    qemu_log_mask(LOG_UNIMP, "nibble_mask[0] = 0x%x\n",
> +                  sparing_maint_pi->nibble_mask[0]);
> +    qemu_log_mask(LOG_UNIMP, "nibble_mask[1] = 0x%x\n",
> +                  sparing_maint_pi->nibble_mask[1]);
> +    qemu_log_mask(LOG_UNIMP, "nibble_mask[2] = 0x%x\n",
> +                  sparing_maint_pi->nibble_mask[2]);
> +    qemu_log_mask(LOG_UNIMP, "bank_group = %u\n",
> +                  sparing_maint_pi->bank_group);
> +    qemu_log_mask(LOG_UNIMP, "bank = %u\n", sparing_maint_pi->bank);
> +    qemu_log_mask(LOG_UNIMP, "row[0] = 0x%x\n", sparing_maint_pi->row[0]);
> +    qemu_log_mask(LOG_UNIMP, "row[1] = 0x%x\n", sparing_maint_pi->row[1]);
> +    qemu_log_mask(LOG_UNIMP, "row[2] = 0x%x\n", sparing_maint_pi->row[2]);
> +    qemu_log_mask(LOG_UNIMP, "column = %u\n", sparing_maint_pi->column);
> +    qemu_log_mask(LOG_UNIMP, "sub_channel = %u\n",
> +                  sparing_maint_pi->sub_channel);

LOG_UNIMP is a bit odd given there is nothing to do really.
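
If the payload dump stays in some form, the three-byte fields could be
folded into one value each instead of being printed byte by byte. A minimal
sketch, assuming the usual little-endian layout of CXL payload fields; the
helper name le24_to_u32() is hypothetical and not an existing QEMU function:

```c
#include <stdint.h>

/*
 * Hypothetical helper: assemble a 24-bit little-endian field, such as
 * row[3] or nibble_mask[3] in the input payload, into a host integer.
 */
static inline uint32_t le24_to_u32(const uint8_t b[3])
{
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) | ((uint32_t)b[2] << 16);
}
```

With that, row and nibble_mask each become a single 0x%x print.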

> +
> +    switch (sub_class) {
> +    case 0: /* Cacheline Memory Sparing */
> +        qemu_log("Cacheline Memory Sparing\n");
> +        return CXL_MBOX_SUCCESS;
> +    case 1: /* Row Memory Sparing */
> +        qemu_log("Row Memory Sparing\n");
> +        return CXL_MBOX_SUCCESS;
> +    case 2: /* Bank Memory Sparing */
> +        qemu_log("Bank Memory Sparing\n");
> +        return CXL_MBOX_SUCCESS;
> +    case 3: /* Rank Memory Sparing */
> +        qemu_log("Rank Memory Sparing\n");
> +        return CXL_MBOX_SUCCESS;
> +    default:
> +        return CXL_MBOX_UNSUPPORTED;

As previously, I think this should be an invalid parameter return: the
command is supported, just not the sub_class.

> +    }
> +}
> +
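
For the sub_class comment above, roughly what I have in mind is below. This
is a standalone sketch: the enum is a stand-in for QEMU's CXLRetCode so that
it compiles on its own, I am assuming CXL_MBOX_INVALID_INPUT is the
appropriate code for "invalid parameter", and sparing_check_sub_class() is a
made-up name.

```c
#include <stdint.h>

/* Stand-in for QEMU's CXLRetCode so the sketch is self-contained. */
typedef enum {
    CXL_MBOX_SUCCESS = 0x0,
    CXL_MBOX_INVALID_INPUT = 0x2,
    CXL_MBOX_UNSUPPORTED = 0x3,
} CXLRetCode;

/* Hypothetical helper: validate the sparing sub-class of a request. */
static CXLRetCode sparing_check_sub_class(uint8_t sub_class)
{
    switch (sub_class) {
    case 0: /* Cacheline Memory Sparing */
    case 1: /* Row Memory Sparing */
    case 2: /* Bank Memory Sparing */
    case 3: /* Rank Memory Sparing */
        return CXL_MBOX_SUCCESS;
    default:
        /* The command itself is supported; only the sub_class is bad. */
        return CXL_MBOX_INVALID_INPUT;
    }
}
```

CXL_MBOX_UNSUPPORTED would then be kept for the case where the command as a
whole is not implemented.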

