From: Gregory Price <gregory.price@memverge.com>
To: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Dan Williams <dan.j.williams@intel.com>, linux-cxl@vger.kernel.org
Subject: Re: [GIT preview] for-6.3/cxl-ram-region
Date: Mon, 30 Jan 2023 09:16:54 -0500 [thread overview]
Message-ID: <Y9fRVu+yLza4d5Vt@memverge.com> (raw)
In-Reply-To: <20230126193424.00005034@huawei.com>
On Thu, Jan 26, 2023 at 07:34:24PM +0000, Jonathan Cameron wrote:
> On Thu, 26 Jan 2023 18:50:25 +0000
> Jonathan Cameron <Jonathan.Cameron@huawei.com> wrote:
>
> > On Wed, 25 Jan 2023 22:29:15 -0800
> > Dan Williams <dan.j.williams@intel.com> wrote:
> >
> > > Dan Williams wrote:
> > > > There are still some sharp edges on this patchset, like the missing
> > > > device-dax hookup, but it is likely enough to show the direction and
> > > > unblock other testing. Specifically I want to see how this fares with
> > > > Greg's recent volatile region provisioning in QEMU.
> > > >
> > > > I am hoping to have those last bits ironed out before the end of the
> > > > week. Note that this topic branch will rebase so do not base any
> > > > work beyond proof-of-concept on top of it.
> > > >
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/log/?h=for-6.3/cxl-ram-region
> > >
> > > I also spotted at least one bug-fix that needs to be broken out and
> > > submitted separately, so reviewer-beware at this stage.
> >
> > So far I'm failing to set target0 via echo which is wonderfully getting
> > an error of SUCCESS. So I think you are returning rc == 0 somewhere.
> >
> >
> > Ah. There is a shadowed int rc variable in the else branch of store_targetN().
> >
> > Now to figure out why attach_target() is failing. I probably have
> > the config sequence wrong as I've just bodged an existing one I had
> > for pmem.
> >
> I was trying to add it to a pmem region. Oops.
>
> Can successfully bind the region driver. But not sure how to test
> beyond that.
>
> Basically I got to the helpful 'TODO hook up devdax' as you mention above.
> Looks like decoders are programmed correctly as I can read and write from
> the HPA using devmem2.
>
> So far I've just tested single direct connected device.
>
> This is against http://gitlab.com/jic23/qemu cxl-2023-01-26 which has been
> there for a good 30 seconds. Mostly unrelated changes wrt this work, but
> it includes a few trivial tweaks to Gregory's patches as discussed on list.
>
> Thanks for sharing this early version. It unsticks Gregory's series as far as
> I'm concerned (anything broken in more complex tests won't be related to
> Gregory's stuff, which only affects type 3 devices).
>
> Jonathan
>
I found the same results.
Reference command and config for list readers:
sudo /opt/qemu-cxl/bin/qemu-system-x86_64 \
-drive file=/var/lib/libvirt/images/cxl.qcow2,format=qcow2,index=0,media=disk,id=hd \
-m 2G,slots=4,maxmem=4G \
-smp 4 \
-machine type=q35,accel=kvm,cxl=on \
-enable-kvm \
-nographic \
-device pxb-cxl,id=cxl.0,bus=pcie.0,bus_nr=52 \
-device cxl-rp,id=rp0,bus=cxl.0,chassis=0,port=0,slot=0 \
-object memory-backend-ram,id=mem0,size=1G,share=on \
-device cxl-type3,bus=rp0,volatile-memdev=mem0,id=cxl-mem0 \
-M cxl-fmw.0.targets.0=cxl.0,cxl-fmw.0.size=1G
echo region0 > /sys/bus/cxl/devices/decoder0.0/create_ram_region
echo 1 > /sys/bus/cxl/devices/region0/interleave_ways
echo 256 > /sys/bus/cxl/devices/region0/interleave_granularity
echo 0x40000000 > /sys/bus/cxl/devices/region0/size
echo mem0 > /sys/bus/cxl/devices/region0/target0
Not sure if this is a bug or a missing feature, but after attaching a device
to the target, reading target0 produces no output:
```
[root@fedora ~]# cat /sys/bus/cxl/devices/region0/target0
[root@fedora ~]#
```
For easier topology reporting it would be nice if this either reported the
configured target or added a link to the target into the directory.
But this looks good to me so far. Excited to see the devdax patch; I think
I can whip up a sample DCD device (for command testing and proof of concept)
pretty quickly after this.
One question re: auto-online of the devdax hookup - is the intent for
auto-online to follow /sys/devices/system/memory/auto_online_blocks
settings or should we consider controlling auto-online more granularly?
It's a bit of a catch-22 if we follow auto_online_blocks:
1) for local memory expanders, if off, this is annoying
2) for statically configured remote-pools (remote expanders)
this is annoying for the same reason
3) for early DCDs (multi-headed expander, no switch), the pattern /
expectation I'm seeing is that the device expects hosts to see all
memory blocks when the device is hooked up, and then expects hosts
to "play nice" by only onlining blocks that have been allocated
(there are some device-side exclusion features to enforce security).
Basically early DCDs will look like remote expanders with some
exclusivity controls (configured via the DCD commands).
So with the pattern above, let's say you have a 1TB pool attached to 4
hosts. Each host would produce the following commands:
echo region0 > /sys/bus/cxl/devices/decoder0.0/create_ram_region
echo 1 > /sys/bus/cxl/devices/region0/interleave_ways
echo 256 > /sys/bus/cxl/devices/region0/interleave_granularity
echo 0x10000000000 > /sys/bus/cxl/devices/region0/size
echo mem0 > /sys/bus/cxl/devices/region0/target0
and mem0 would get 4096 memoryN blocks (presumably under region/devdax?)
A provisioning command would be sent via the device interface
ioctl(DCD(N blocks)) -> /sys/bus/cxl/devices/mem0/dev
return: DCD return structure with extents[blocks[a,b,c],...]
Then the final action would be
echo online > /sys/bus/cxl/devices/region0/devdax/memory[a,b,c...]
or online_movable, or possibly some other special zone, to make sure
the memory is not used by the kernel (so it can be released later).
So to me, it feels like we might want more granular auto-online control,
but I don't know how possible that is.
Note: This is me relaying what I've seen/heard from some device vendors
in terms of what they think the control scheme will be, so if something
is wildly off-base, it would be good to address the expectations.
Either way: This is awesome, thank you for sharing the preview Dan.
~Gregory