From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
To: Gregory Price <gourry@gourry.net>
Cc: <linux-cxl@vger.kernel.org>, <qemu-devel@nongnu.org>,
<svetly.todorov@memverge.com>, <nifan.cxl@gmail.com>
Subject: Re: [PATCH RFC v3 3/3] mhsld: implement MHSLD device
Date: Thu, 12 Dec 2024 17:40:16 +0000 [thread overview]
Message-ID: <20241212174016.0000002a@huawei.com> (raw)
In-Reply-To: <20241018161252.8896-4-gourry@gourry.net>
On Fri, 18 Oct 2024 12:12:52 -0400
Gregory Price <gourry@gourry.net> wrote:
> From: Svetly Todorov <svetly.todorov@memverge.com>
>
> Using a shared-memory bytemap, validates that DC adds, releases,
> and reclamations happen on extents belonging to the appropriate
> host.
>
> The MHSLD device inherits from the CXL_TYPE3 class and adds the following
> configuration options:
As in previous the -- syntax is somewhat confusing as that's
not how the parameters are applied.
> --mhd-head=<u32>
> --mhd-state_file=<str>
> --mhd-init=<bool>
>
> --mhd-head specifies the head ID of the host on the given device.
>
> --mhd-state_file is the name of the shared-memory-backed file used
> to store the MHD state.
>
> --mhd-init indicates whether this QEMU instance should initialize
> the state_file; if so, the instance will create the file if it does
> not exist, ftruncate it to the appropriate size, and initialize its
> header. It is assumed that the --mhd-init instance is run and allowed
> to completely finish configuration before any other guests access the
> shared state.
>
> The shared state file only needs to be intialized once. Even if a guest
> dies without clearing the ownership bits associated with its head-ID,
> future guests with that ID will clear those bits in cxl_mhsld_realize(),
> regardless of whether mhd_init is true or false.
That sounds like a race condition if not all hosts are brought
up before the first add.
>
> The following command line options create an MHSLD with 4GB of
> backing memory, whose state is tracked in /dev/shm/mhd_metadata.
> --mhd-init=true tells this instance to initialize the state as
> described above.
>
> ./qemu-system_x86-64 \
> [... other options ...] \
> -device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52 \
> -device cxl-rp,id=rp0,bus=cxl.0,chassis=0,port=0,slot=0 \
> -object memory-backend-ram,id=mem0,size=4G \
> -device cxl-mhsld,bus=rp0,num-dc-regions=1,volatile-dc-memdev=mem0,id=cxl-mem0,sn=66667,mhd-head=0,mhd-state_file=mhd_metadata,mhd-init=true \
> -M cxl-fmw.0.targets.0=cxl.0,cxl-fmw.0.size=4G \
> -qmp unix:/tmp/qmp-sock-1,server,nowait
>
> Once this guest completes setup, other guests looking to access the
> device can be booted with the same configuration options, but with
> --mhd-head != 0,
> --mhd-init=false,
> and a different QMP socket.
>
> Signed-off-by: Gregory Price <gourry@gourry.net>
> Signed-off-by: Svetly Todorov <svetly.todorov@memverge.com>
A few trivial things inline.
In general the scheme looks workable but I'm not sure the contraints at setup time
etc are suitable for an upstream solution. Certainly a useful tool to have
for kernel development though so I'll try and find time in next few days to apply
this on my gitlab tree.
Longer term I think we need a more complex external program or a main / proxy
type arrangement so that ordering requirements can be enforce and we can have
richer info. Having to chat to each qmp interface independently works fine is
also a bit more complex than I think we would eventually want.
Having a solution in place though will make it much easier to move towards
an eventual upstreamable solution. This is a great place to start from!
Jonathan
> +static bool cxl_mhsld_release_extent(PCIDevice *d, uint64_t dpa, uint64_t len)
> +{
> + cxl_mhsld_state_clear(CXL_MHSLD(d), dpa / MHSLD_BLOCK_SZ,
> + len / MHSLD_BLOCK_SZ);
Trivial but align after (
> + return true;
> +}
> diff --git a/hw/cxl/mhsld/mhsld.h b/hw/cxl/mhsld/mhsld.h
> new file mode 100644
> index 0000000000..e7ead1f0d2
> --- /dev/null
> +++ b/hw/cxl/mhsld/mhsld.h
> @@ -0,0 +1,75 @@
> +/*
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + *
> + * Copyright (c) 2024 MemVerge Inc.
> + *
> + */
> +
> +#ifndef CXL_MHSLD_H
> +#define CXL_MHSLD_H
> +#include <stdint.h>
> +#include "hw/cxl/cxl.h"
> +#include "hw/cxl/cxl_mailbox.h"
> +#include "hw/cxl/cxl_device.h"
> +#include "qemu/units.h"
> +
> +#define MHSLD_BLOCK_SZ (2 * MiB)
> +
> +/*
> + * We limit the number of heads to prevent the shared state
> + * region from becoming a major memory hog. We need 512MB of
> + * memory space to track 8-host ownership of 4GB of memory in
> + * blocks of 2MB. This can change if the block size is increased.
I'm lost what makes up that size?
> + */
> +#define MHSLD_HEADS (8)
> +
> +/*
> + * The shared state cannot have 2 variable sized regions
> + * so we have to max out the ldmap.
> + */
> +typedef struct MHSLDSharedState {
> + uint8_t nr_heads;
> + uint8_t nr_lds;
> + uint8_t ldmap[MHSLD_HEADS];
> + uint64_t nr_blocks;
> + uint8_t blocks[];
> +} MHSLDSharedState;
> +
> +struct CXLMHSLDState {
> + CXLType3Dev ct3d;
> + bool mhd_init;
> + char *mhd_state_file;
> + int mhd_state_fd;
> + size_t mhd_state_size;
> + uint32_t mhd_head;
> + MHSLDSharedState *mhd_state;
> +};
> +
> +struct CXLMHSLDClass {
> + CXLType3Class parent_class;
> +};
> +
> +enum {
> + MHSLD_MHD = 0x55,
> + #define GET_MHD_INFO 0x0
> +};
> +
> +/*
> + * MHD Get Info Command
> + * Returns information the LD's associated with this head
> + */
> +typedef struct MHDGetInfoInput {
> + uint8_t start_ld;
> + uint8_t ldmap_len;
> +} QEMU_PACKED MHDGetInfoInput;
> +
> +typedef struct MHDGetInfoOutput {
> + uint8_t nr_lds;
> + uint8_t nr_heads;
> + uint16_t resv1;
> + uint8_t start_ld;
> + uint8_t ldmap_len;
> + uint16_t resv2;
> + uint8_t ldmap[];
> +} QEMU_PACKED MHDGetInfoOutput;
> +#endif
next prev parent reply other threads:[~2024-12-12 17:40 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-10-18 16:12 [PATCH RFC v3 0/3] cxl: Multi-headed Single Logical Device (MHSLD) Gregory Price
2024-10-18 16:12 ` [PATCH RFC v3 1/3] cxl-mailbox-utils: move CXLUpdateDCExtentListInPl into header Gregory Price
2024-10-18 16:12 ` [PATCH RFC v3 2/3] cxl_type3: add MHD callbacks Gregory Price
2024-12-12 17:12 ` Jonathan Cameron
2024-10-18 16:12 ` [PATCH RFC v3 3/3] mhsld: implement MHSLD device Gregory Price
2024-12-12 17:40 ` Jonathan Cameron [this message]
2024-12-12 19:52 ` Gregory Price
2024-12-13 16:18 ` Jonathan Cameron
2024-12-13 17:03 ` Gregory Price
2025-01-24 14:12 ` Jonathan Cameron
2025-01-24 15:50 ` Gregory Price
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20241212174016.0000002a@huawei.com \
--to=jonathan.cameron@huawei.com \
--cc=gourry@gourry.net \
--cc=linux-cxl@vger.kernel.org \
--cc=nifan.cxl@gmail.com \
--cc=qemu-devel@nongnu.org \
--cc=svetly.todorov@memverge.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox