All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
To: Gregory Price <gregory.price@memverge.com>
Cc: <nifan.cxl@gmail.com>, <qemu-devel@nongnu.org>,
	<linux-cxl@vger.kernel.org>, <ira.weiny@intel.com>,
	<dan.j.williams@intel.com>, <a.manzanares@samsung.com>,
	<dave@stgolabs.net>, <nmtadam.samsung@gmail.com>,
	<jim.harris@samsung.com>, <Jorgen.Hansen@wdc.com>,
	<wj28.lee@gmail.com>, Fan Ni <fan.ni@samsung.com>
Subject: Re: [PATCH v6 09/12] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents
Date: Fri, 5 Apr 2024 18:44:52 +0100	[thread overview]
Message-ID: <20240405184452.00007986@Huawei.com> (raw)
In-Reply-To: <ZhAh0Qmv2/VTe1wT@memverge.com>

On Fri, 5 Apr 2024 12:07:45 -0400
Gregory Price <gregory.price@memverge.com> wrote:

> On Fri, Apr 05, 2024 at 01:27:19PM +0100, Jonathan Cameron wrote:
> > On Wed, 3 Apr 2024 14:16:25 -0400
> > Gregory Price <gregory.price@memverge.com> wrote:
> > 
> > A few follow up comments.
> >   
> > >   
> > > > +        error_setg(errp, "no valid extents to send to process");
> > > > +        return;
> > > > +    }
> > > > +    
> > > 
> > > I'm looking at adding the MHD extensions around this point, e.g.:
> > > 
> > > /* If MHD cannot allocate requested extents, the cmd fails */
> > > if (type == DC_EVENT_ADD_CAPACITY && dcd->mhd_dcd_extents_allocate &&
> > >     num_extents != dcd->mhd_dcd_extents_allocate(...))
> > > 	return;
> > > 
> > > where mhd_dcd_extents_allocate checks the MHD block bitmap and tags
> > > for correctness (shared // no double-allocations, etc). On success,
> > > it garuantees proper ownership.
> > > 
> > > the release path would then be done in the release response path from
> > > the host, as opposed to the release event injection.  
> > 
> > I think it would be polite to check if the QMP command on release
> > for whether it is asking something plausible - makes for an easier
> > to user QMP interface.  I guess it's not strictly required though.
> > What races are there on release?  
> 
> The only real critical section, barring force-release beign supported,
> is when you clear the bits in the device allowing new requests to swipe
> those blocks. The appropriate place appears to be after the host kernel
> has responded to the release extent request.

Agreed you can't release till then, but you can check if it's going to 
work.  I think that's worth doing for ease of use reasons.

> 
> Also need to handle the case of multiple add-requests contending for the
> same region, but that's just an "oops failed to get all the bits, roll
> back" scenario - easy to handle.
> 
> Could go coarse-grained to just lock access to the bitmap entirely while
> operating on it, or be fancy and use atomics to go lockless. The latter
> code already exists in the Niagara model for reference.

I'm fine either way, though I'd just use a lock in initial version()

> 
> > We aren't support force release
> > for now, and for anything else, it's host specific (unlike add where
> > the extra rules kick in).   AS such I 'think' a check at command
> > time will be valid as long as the host hasn't done an async
> > release of capacity between that and the event record.  That
> > is a race we always have and the host should at most log it and
> > not release capacity twice.
> >  
> 
> Borrowing from the Ira's flow chart, here are the pieces I believe are
> needed to implement MHD support for DCD.
> 
> Orchestrator      FM         Device       Host Kernel    Host User
> 
>     |             |           |            |              |
>     |-- Add ----->|-- Add --->A--- Add --->|              |
>     |  Capacity   |  Extent   |   Extent   |              |
>     |             |           |            |              |
>     |             |<--Accept--B<--Accept --|              |
>     |             |   Extent  |   Extent   |              |
>     |             |           |            |              |
>     |             |     ... snip ...       |              |
>     |             |           |            |              |
>     |-- Remove -->|--Release->C---Release->|              |
>     |  Capacity   |  Extent   |   Extent   |              |
>     |             |           |            |              |
>     |             |<-Release--D<--Release--|              |
>     |             |  Extent   |   Extent   |              |
>     |             |           |            |              |
> 
> 1. (A) Upon Device Receiving Add Capacity Request
>    a. the device sanity checks the request against local mappings
>    b. the mhd hook is called to sanity check against global mappings
>    c. the mhd bitmap is updated, marking the capacity owned by that head
> 
>    function: qmp_cxl_process_dynamic_capacity
> 
> 2. (B) Upon Device Receiving Add Dynamic Capacity Response
>    a. accepted extents are compared to the original request
>    b. not accepted extents are cleared from the bitmap (local and MHD)
>    (Note: My understanding is that for now each request = 1 extent)

Yeah but that is a restriction I think we need to solve soon.

> 
>    function: cmd_dcd_add_dyn_cap_rsp
> 
> 3. (C) Upon Device receiving Release Dynamic Capacity Request
>    a. check for a pending release request. If exists, error.

Not sure that's necessary - can queue as long as the head
can track if the bits are in a pending release state.

>    b. check that the bits in the MHD bitmap are actually set
Good.
> 
>    function: qmp_cxl_process_dynamic_capacity
> 
> 4. (D) Upon Device receiving Release Dynamic Capacity Response
>    a. clear the bits in the mhd bitmap
>    b. remove the pending request from the pending list
> 
>    function: cmd_dcd_release_dyn_cap
> 
> Something to note: The MHD bitmap is essentially the same as the
> existing DCD extent bitmap - except that it is located in a shared
> region of memory (mmap file, shm, whatever - pick one).

I think you will ideally also have a per head one to track head access
to the things offered by the mhd.

> 
> Maybe it's worth abstracting out the bitmap twiddling to make that
> backable by a file mmap'd SHARED and use atomics to twiddle the bits?
> 
> That would be about 90% of the way to MH-DCD.

> 
> Maybe flock() could be used for coarse locking on the a shared bitmap
> in the short term?  This mitigates your concern of using shm.h as
> the coordination piece, though i'm not sure how portable flock() is.
Sounds nice, but you are wondering into that pesky userspace stuff
where I'd have to google a lot to understand :)

Jonathan

> 
> ~Gregory


WARNING: multiple messages have this Message-ID (diff)
From: Jonathan Cameron via <qemu-devel@nongnu.org>
To: Gregory Price <gregory.price@memverge.com>
Cc: <nifan.cxl@gmail.com>, <qemu-devel@nongnu.org>,
	<linux-cxl@vger.kernel.org>, <ira.weiny@intel.com>,
	<dan.j.williams@intel.com>, <a.manzanares@samsung.com>,
	<dave@stgolabs.net>, <nmtadam.samsung@gmail.com>,
	<jim.harris@samsung.com>, <Jorgen.Hansen@wdc.com>,
	<wj28.lee@gmail.com>, Fan Ni <fan.ni@samsung.com>
Subject: Re: [PATCH v6 09/12] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents
Date: Fri, 5 Apr 2024 18:44:52 +0100	[thread overview]
Message-ID: <20240405184452.00007986@Huawei.com> (raw)
In-Reply-To: <ZhAh0Qmv2/VTe1wT@memverge.com>

On Fri, 5 Apr 2024 12:07:45 -0400
Gregory Price <gregory.price@memverge.com> wrote:

> On Fri, Apr 05, 2024 at 01:27:19PM +0100, Jonathan Cameron wrote:
> > On Wed, 3 Apr 2024 14:16:25 -0400
> > Gregory Price <gregory.price@memverge.com> wrote:
> > 
> > A few follow up comments.
> >   
> > >   
> > > > +        error_setg(errp, "no valid extents to send to process");
> > > > +        return;
> > > > +    }
> > > > +    
> > > 
> > > I'm looking at adding the MHD extensions around this point, e.g.:
> > > 
> > > /* If MHD cannot allocate requested extents, the cmd fails */
> > > if (type == DC_EVENT_ADD_CAPACITY && dcd->mhd_dcd_extents_allocate &&
> > >     num_extents != dcd->mhd_dcd_extents_allocate(...))
> > > 	return;
> > > 
> > > where mhd_dcd_extents_allocate checks the MHD block bitmap and tags
> > > for correctness (shared // no double-allocations, etc). On success,
> > > it garuantees proper ownership.
> > > 
> > > the release path would then be done in the release response path from
> > > the host, as opposed to the release event injection.  
> > 
> > I think it would be polite to check if the QMP command on release
> > for whether it is asking something plausible - makes for an easier
> > to user QMP interface.  I guess it's not strictly required though.
> > What races are there on release?  
> 
> The only real critical section, barring force-release beign supported,
> is when you clear the bits in the device allowing new requests to swipe
> those blocks. The appropriate place appears to be after the host kernel
> has responded to the release extent request.

Agreed you can't release till then, but you can check if it's going to 
work.  I think that's worth doing for ease of use reasons.

> 
> Also need to handle the case of multiple add-requests contending for the
> same region, but that's just an "oops failed to get all the bits, roll
> back" scenario - easy to handle.
> 
> Could go coarse-grained to just lock access to the bitmap entirely while
> operating on it, or be fancy and use atomics to go lockless. The latter
> code already exists in the Niagara model for reference.

I'm fine either way, though I'd just use a lock in initial version()

> 
> > We aren't support force release
> > for now, and for anything else, it's host specific (unlike add where
> > the extra rules kick in).   AS such I 'think' a check at command
> > time will be valid as long as the host hasn't done an async
> > release of capacity between that and the event record.  That
> > is a race we always have and the host should at most log it and
> > not release capacity twice.
> >  
> 
> Borrowing from the Ira's flow chart, here are the pieces I believe are
> needed to implement MHD support for DCD.
> 
> Orchestrator      FM         Device       Host Kernel    Host User
> 
>     |             |           |            |              |
>     |-- Add ----->|-- Add --->A--- Add --->|              |
>     |  Capacity   |  Extent   |   Extent   |              |
>     |             |           |            |              |
>     |             |<--Accept--B<--Accept --|              |
>     |             |   Extent  |   Extent   |              |
>     |             |           |            |              |
>     |             |     ... snip ...       |              |
>     |             |           |            |              |
>     |-- Remove -->|--Release->C---Release->|              |
>     |  Capacity   |  Extent   |   Extent   |              |
>     |             |           |            |              |
>     |             |<-Release--D<--Release--|              |
>     |             |  Extent   |   Extent   |              |
>     |             |           |            |              |
> 
> 1. (A) Upon Device Receiving Add Capacity Request
>    a. the device sanity checks the request against local mappings
>    b. the mhd hook is called to sanity check against global mappings
>    c. the mhd bitmap is updated, marking the capacity owned by that head
> 
>    function: qmp_cxl_process_dynamic_capacity
> 
> 2. (B) Upon Device Receiving Add Dynamic Capacity Response
>    a. accepted extents are compared to the original request
>    b. not accepted extents are cleared from the bitmap (local and MHD)
>    (Note: My understanding is that for now each request = 1 extent)

Yeah but that is a restriction I think we need to solve soon.

> 
>    function: cmd_dcd_add_dyn_cap_rsp
> 
> 3. (C) Upon Device receiving Release Dynamic Capacity Request
>    a. check for a pending release request. If exists, error.

Not sure that's necessary - can queue as long as the head
can track if the bits are in a pending release state.

>    b. check that the bits in the MHD bitmap are actually set
Good.
> 
>    function: qmp_cxl_process_dynamic_capacity
> 
> 4. (D) Upon Device receiving Release Dynamic Capacity Response
>    a. clear the bits in the mhd bitmap
>    b. remove the pending request from the pending list
> 
>    function: cmd_dcd_release_dyn_cap
> 
> Something to note: The MHD bitmap is essentially the same as the
> existing DCD extent bitmap - except that it is located in a shared
> region of memory (mmap file, shm, whatever - pick one).

I think you will ideally also have a per head one to track head access
to the things offered by the mhd.

> 
> Maybe it's worth abstracting out the bitmap twiddling to make that
> backable by a file mmap'd SHARED and use atomics to twiddle the bits?
> 
> That would be about 90% of the way to MH-DCD.

> 
> Maybe flock() could be used for coarse locking on the a shared bitmap
> in the short term?  This mitigates your concern of using shm.h as
> the coordination piece, though i'm not sure how portable flock() is.
Sounds nice, but you are wondering into that pesky userspace stuff
where I'd have to google a lot to understand :)

Jonathan

> 
> ~Gregory



  reply	other threads:[~2024-04-05 17:44 UTC|newest]

Thread overview: 65+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-25 19:02 [PATCH v6 00/12] Enabling DCD emulation support in Qemu nifan.cxl
2024-03-25 19:02 ` [PATCH v6 01/12] hw/cxl/cxl-mailbox-utils: Add dc_event_log_size field to output payload of identify memory device command nifan.cxl
2024-03-25 19:02 ` [PATCH v6 02/12] hw/cxl/cxl-mailbox-utils: Add dynamic capacity region representative and mailbox command support nifan.cxl
2024-03-25 19:02 ` [PATCH v6 03/12] include/hw/cxl/cxl_device: Rename mem_size as static_mem_size for type3 memory devices nifan.cxl
2024-03-25 19:02 ` [PATCH v6 04/12] hw/mem/cxl_type3: Add support to create DC regions to " nifan.cxl
2024-03-25 19:02 ` [PATCH v6 05/12] hw/mem/cxl-type3: Refactor ct3_build_cdat_entries_for_mr to take mr size instead of mr as argument nifan.cxl
2024-03-25 19:02 ` [PATCH v6 06/12] hw/mem/cxl_type3: Add host backend and address space handling for DC regions nifan.cxl
2024-04-05 10:58   ` Jonathan Cameron
2024-04-05 10:58     ` Jonathan Cameron via
2024-03-25 19:02 ` [PATCH v6 07/12] hw/mem/cxl_type3: Add DC extent list representative and get DC extent list mailbox support nifan.cxl
2024-04-05 11:08   ` Jonathan Cameron
2024-04-05 11:08     ` Jonathan Cameron via
2024-03-25 19:02 ` [PATCH v6 08/12] hw/cxl/cxl-mailbox-utils: Add mailbox commands to support add/release dynamic capacity response nifan.cxl
2024-04-04 13:32   ` Jørgen Hansen
2024-04-05 11:12     ` Jonathan Cameron
2024-04-05 11:12       ` Jonathan Cameron via
2024-04-09 19:21     ` fan
2024-04-15 17:56     ` fan
2024-04-16 10:02       ` Jørgen Hansen
2024-04-16 16:27         ` fan
2024-04-15 18:00     ` fan
2024-04-05 11:39   ` Jonathan Cameron
2024-04-05 11:39     ` Jonathan Cameron via
2024-03-25 19:02 ` [PATCH v6 09/12] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents nifan.cxl
2024-04-03 18:16   ` Gregory Price
2024-04-05 12:27     ` Jonathan Cameron
2024-04-05 12:27       ` Jonathan Cameron via
2024-04-05 16:07       ` Gregory Price
2024-04-05 17:44         ` Jonathan Cameron [this message]
2024-04-05 17:44           ` Jonathan Cameron via
2024-04-05 18:09           ` Gregory Price
2024-04-09 16:10             ` Jonathan Cameron
2024-04-09 16:10               ` Jonathan Cameron via
2024-04-05 12:18   ` Jonathan Cameron
2024-04-05 12:18     ` Jonathan Cameron via
2024-04-09 21:26     ` fan
2024-04-10 19:49       ` Jonathan Cameron
2024-04-10 19:49         ` Jonathan Cameron via
2024-04-15 20:06         ` fan
2024-04-16 14:58           ` Jonathan Cameron
2024-04-16 14:58             ` Jonathan Cameron via
2024-04-16 16:52             ` fan
2024-04-17 11:50               ` Jonathan Cameron
2024-04-17 11:50                 ` Jonathan Cameron via
2024-04-16 17:14             ` Gregory Price
2024-03-25 19:02 ` [PATCH v6 10/12] hw/mem/cxl_type3: Add dpa range validation for accesses to DC regions nifan.cxl
2024-04-05 12:29   ` Jonathan Cameron
2024-04-05 12:29     ` Jonathan Cameron via
2024-04-12 22:54   ` Gregory Price
2024-04-15 17:37     ` fan
2024-04-16 15:00       ` Jonathan Cameron
2024-04-16 15:00         ` Jonathan Cameron via
2024-04-16 16:37         ` fan
2024-04-17 11:59           ` Jonathan Cameron
2024-04-17 11:59             ` Jonathan Cameron via
2024-04-18 17:58             ` Gregory Price
2024-04-16 17:15         ` Gregory Price
2024-03-25 19:02 ` [PATCH v6 11/12] hw/cxl/cxl-mailbox-utils: Add superset extent release mailbox support nifan.cxl
2024-04-05  9:57   ` Jørgen Hansen
2024-04-15 20:17     ` fan
2024-04-05 12:32   ` Jonathan Cameron
2024-04-05 12:32     ` Jonathan Cameron via
2024-03-25 19:02 ` [PATCH v6 12/12] hw/mem/cxl_type3: Allow to release extent superset in QMP interface nifan.cxl
2024-04-05 12:33   ` Jonathan Cameron
2024-04-05 12:33     ` Jonathan Cameron via

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240405184452.00007986@Huawei.com \
    --to=jonathan.cameron@huawei.com \
    --cc=Jorgen.Hansen@wdc.com \
    --cc=a.manzanares@samsung.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave@stgolabs.net \
    --cc=fan.ni@samsung.com \
    --cc=gregory.price@memverge.com \
    --cc=ira.weiny@intel.com \
    --cc=jim.harris@samsung.com \
    --cc=linux-cxl@vger.kernel.org \
    --cc=nifan.cxl@gmail.com \
    --cc=nmtadam.samsung@gmail.com \
    --cc=qemu-devel@nongnu.org \
    --cc=wj28.lee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.