From: Dave Jiang <dave.jiang@intel.com>
To: alison.schofield@intel.com, Davidlohr Bueso <dave@stgolabs.net>,
Jonathan Cameron <jonathan.cameron@huawei.com>,
Vishal Verma <vishal.l.verma@intel.com>,
Ira Weiny <ira.weiny@intel.com>,
Dan Williams <dan.j.williams@intel.com>
Cc: linux-cxl@vger.kernel.org
Subject: Re: [PATCH] cxl/region: Allow 6 & 12 way regions on 3-way HB interleaves
Date: Mon, 24 Feb 2025 17:41:20 -0700 [thread overview]
Message-ID: <010e16c3-0f90-41b0-8939-4cc8cb26de7a@intel.com> (raw)
In-Reply-To: <20250224235817.2259508-1-alison.schofield@intel.com>
On 2/24/25 4:58 PM, alison.schofield@intel.com wrote:
> From: Alison Schofield <alison.schofield@intel.com>
>
> The CXL driver requires the granularity of a region and it's root
s/it's/its/
> decoder to be the same. This is particularly restrictive for 3-way
> host bridge interleaves where the only spec defined interleave
> configurations for creating 6-way and 12-way regions on a 3-way HB
configuration?
> interleave require mixed granularities.
requries?
>
> CXL 3.1 Specification 9.13.1.1:
May as well go to 3.2
> Legal Interleaving Configurations: 12-way, 6-way, and 3-way
>
> Adding support for these new interleaves touches these areas:
>
> 1) User created regions employing "cxl create-region" fail when the
> ndctl tool gets a mixed granularity request. That is addressed in
s/ndctl/CXL CLI/
> a patch to the ndctl tool.
>
> 2) User created regions employing sysfs directly fail at the sysfs
> store of the non-matching region granularity. That restriction is
> lifted here. Note that the driver immediately allows any reasonable
> 3-way HB granularity, but the region config may still fail later
> if the ways/gran between region and root decoder are incompatible.
>
> 3) Auto created regions, in which the drivers role is to reflect
> the BIOS created region and provide RAS support, basically sneak on
> by. The driver ignores the root decoders granularity and assumes it
> is the same as the regions. The impact being that endpoint devices
> appear out of order and DPA to HPA address translations that depend
> on that order are invalid. Here the driver stops making that same
> granularity assumption and checks for an allowable granularity.
>
> A new helper, interleave_granularity_allow(), is used in both the
> user and auto creation path to allow the newly supported configs.
>
> Another new helper, gran_multiple(), is used in sorting, target
> list searching, and distance calculations, supporting the new
> root granularity to region granularity multiples.
>
> And, as noted in 2), there's an added check to see if the
> selected sysfs ways/gran make sense when considered together.
>
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> ---
> drivers/cxl/core/region.c | 101 +++++++++++++++++++++++++++++++-------
> 1 file changed, 82 insertions(+), 19 deletions(-)
>
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index 3cb91cf0e2dd..9d4716b7fcf8 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -532,6 +532,31 @@ static ssize_t interleave_granularity_show(struct device *dev,
> return rc;
> }
>
> +static bool interleave_granularity_allow(struct cxl_decoder *cxld, u16 ig)
> +{
> + /*
> + * When the host-bridge is interleaved, disallow region granularity
> + * != root granularity with the exception of 3-way HB interleaves.
> + * Allow the CXL Spec defined 3-way HB interleaves that can only be
> + * configured when host-bridge interleave is greater that the
> + * region interleave. (CXL 3.1 Specification 9.13.1.1)
spec 3.2
> + * Allow 2+2+2 interleave where HB gran is 2 * region granularity
> + * 4+4+4 interleave where HB gran is 4 * region granularity
> + *
> + * Regions with a granularity greater than the root interleave result
> + * in invalid DPA translations (invalid to support).
> + */
> + if (cxld->interleave_ways > 1 && ig != cxld->interleave_granularity) {
> + if (cxld->interleave_ways != 3)
> + return false;
> +
> + if (cxld->interleave_granularity % (2 * ig) &&
> + cxld->interleave_granularity % (4 * ig))
> + return false;
Can you please explain how the math works here? I'm not understanding the code here vs the comments above.
DJ
> + }
> + return true;
> +}
> +
> static ssize_t interleave_granularity_store(struct device *dev,
> struct device_attribute *attr,
> const char *buf, size_t len)
> @@ -551,15 +576,7 @@ static ssize_t interleave_granularity_store(struct device *dev,
> if (rc)
> return rc;
>
> - /*
> - * When the host-bridge is interleaved, disallow region granularity !=
> - * root granularity. Regions with a granularity less than the root
> - * interleave result in needing multiple endpoints to support a single
> - * slot in the interleave (possible to support in the future). Regions
> - * with a granularity greater than the root interleave result in invalid
> - * DPA translations (invalid to support).
> - */
> - if (cxld->interleave_ways > 1 && val != cxld->interleave_granularity)
> + if (!interleave_granularity_allow(cxld, val))
> return -EINVAL;
>
> rc = down_write_killable(&cxl_region_rwsem);
> @@ -1290,6 +1307,23 @@ static int check_interleave_cap(struct cxl_decoder *cxld, int iw, int ig)
> return 0;
> }
>
> +static int gran_multiple(struct cxl_region *cxlr)
> +{
> + struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent);
> + int root_decoder_gran = cxlrd->cxlsd.cxld.interleave_granularity;
> + int region_gran = cxlr->params.interleave_granularity;
> +
> + /*
> + * In regions built upon 3-way HB interleaves, the root decoder
> + * granularity can be a multiple of the region granularity. The
> + * multiple value is used in sorting and distance calculations.
> + */
> + if (cxlrd->cxlsd.cxld.interleave_ways != 3 || !root_decoder_gran)
> + return 1;
> +
> + return root_decoder_gran / region_gran;
> +}
> +
> static int cxl_port_setup_targets(struct cxl_port *port,
> struct cxl_region *cxlr,
> struct cxl_endpoint_decoder *cxled)
> @@ -1321,6 +1355,7 @@ static int cxl_port_setup_targets(struct cxl_port *port,
> cxlsd = to_cxl_switch_decoder(&cxld->dev);
> if (cxl_rr->nr_targets_set) {
> int i, distance = 1;
> + int multiple = gran_multiple(cxlr);
> struct cxl_region_ref *cxl_rr_iter;
>
> /*
> @@ -1332,12 +1367,17 @@ static int cxl_port_setup_targets(struct cxl_port *port,
> * always 1 as every index targets a different host-bridge. At
> * each subsequent switch level those ports map every Nth region
> * position where N is the width of the switch == distance.
> + *
> + * With the introduction of mixed granularities in 3-way HB
> + * interleaves, divide N by a multiple that represents the root
> + * decoder to region granularity.
> */
> do {
> cxl_rr_iter = cxl_rr_load(iter, cxlr);
> distance *= cxl_rr_iter->nr_targets;
> iter = to_cxl_port(iter->dev.parent);
> } while (!is_cxl_root(iter));
> + distance /= multiple;
> distance *= cxlrd->cxlsd.cxld.interleave_ways;
>
> for (i = 0; i < cxl_rr->nr_targets_set; i++)
> @@ -1354,12 +1394,9 @@ static int cxl_port_setup_targets(struct cxl_port *port,
> if (is_cxl_root(parent_port)) {
> /*
> * Root decoder IG is always set to value in CFMWS which
> - * may be different than this region's IG. We can use the
> - * region's IG here since interleave_granularity_store()
> - * does not allow interleaved host-bridges with
> - * root IG != region IG.
> + * may be different than this region's IG.
> */
> - parent_ig = p->interleave_granularity;
> + parent_ig = cxlrd->cxlsd.cxld.interleave_granularity;
> parent_iw = cxlrd->cxlsd.cxld.interleave_ways;
> /*
> * For purposes of address bit routing, use power-of-2 math for
> @@ -1431,7 +1468,7 @@ static int cxl_port_setup_targets(struct cxl_port *port,
>
> if (test_bit(CXL_REGION_F_AUTO, &cxlr->flags)) {
> if (cxld->interleave_ways != iw ||
> - cxld->interleave_granularity != ig ||
> + !interleave_granularity_allow(cxld, ig) ||
> cxld->hpa_range.start != p->res->start ||
> cxld->hpa_range.end != p->res->end ||
> ((cxld->flags & CXL_DECODER_F_ENABLE) == 0)) {
> @@ -1661,11 +1698,12 @@ static int cxl_region_attach_position(struct cxl_region *cxlr,
> struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> struct cxl_switch_decoder *cxlsd = &cxlrd->cxlsd;
> struct cxl_decoder *cxld = &cxlsd->cxld;
> + int multiple = gran_multiple(cxlr);
> int iw = cxld->interleave_ways;
> struct cxl_port *iter;
> int rc;
>
> - if (dport != cxlrd->cxlsd.target[pos % iw]) {
> + if (dport != cxlrd->cxlsd.target[pos / multiple % iw]) {
> dev_dbg(&cxlr->dev, "%s:%s invalid target position for %s\n",
> dev_name(&cxlmd->dev), dev_name(&cxled->cxld.dev),
> dev_name(&cxlrd->cxlsd.cxld.dev));
> @@ -1809,7 +1847,8 @@ static int find_pos_and_ways(struct cxl_port *port, struct range *range,
> * Return: position >= 0 on success
> * -ENXIO on failure
> */
> -static int cxl_calc_interleave_pos(struct cxl_endpoint_decoder *cxled)
> +static int cxl_calc_interleave_pos(struct cxl_endpoint_decoder *cxled,
> + int multiple)
> {
> struct cxl_port *iter, *port = cxled_to_port(cxled);
> struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
> @@ -1855,6 +1894,11 @@ static int cxl_calc_interleave_pos(struct cxl_endpoint_decoder *cxled)
> if (rc)
> return rc;
>
> + if (multiple > 1 && is_cxl_root(next_port(iter))) {
> + pos = pos + multiple * parent_pos;
> + break;
> + }
> +
> pos = pos * parent_ways + parent_pos;
> }
>
> @@ -1869,12 +1913,13 @@ static int cxl_calc_interleave_pos(struct cxl_endpoint_decoder *cxled)
> static int cxl_region_sort_targets(struct cxl_region *cxlr)
> {
> struct cxl_region_params *p = &cxlr->params;
> + int multiple = gran_multiple(cxlr);
> int i, rc = 0;
>
> for (i = 0; i < p->nr_targets; i++) {
> struct cxl_endpoint_decoder *cxled = p->targets[i];
>
> - cxled->pos = cxl_calc_interleave_pos(cxled);
> + cxled->pos = cxl_calc_interleave_pos(cxled, multiple);
> /*
> * Record that sorting failed, but still continue to calc
> * cxled->pos so that follow-on code paths can reliably
> @@ -1902,6 +1947,23 @@ static int cxl_region_attach(struct cxl_region *cxlr,
> struct cxl_dport *dport;
> int rc = -ENXIO;
>
> + /*
> + * Protect against improper gran mixes on 3-way HB interleave
> + * Expect decoder gran*ways == region gran*ways
> + */
> + if (cxlrd->cxlsd.cxld.interleave_ways == 3) {
> + if ((cxlrd->cxlsd.cxld.interleave_granularity * 3) !=
> + (p->interleave_ways * p->interleave_granularity)) {
> + dev_dbg(&cxlr->dev,
> + "invalid config region:%d/%d decoder %d/%d\n",
> + p->interleave_ways, p->interleave_granularity,
> + cxlrd->cxlsd.cxld.interleave_ways,
> + cxlrd->cxlsd.cxld.interleave_granularity);
> +
> + return rc;
> + }
> + }
> +
> rc = check_interleave_cap(&cxled->cxld, p->interleave_ways,
> p->interleave_granularity);
> if (rc) {
> @@ -2055,9 +2117,10 @@ static int cxl_region_attach(struct cxl_region *cxlr,
> */
> for (int i = 0; i < p->nr_targets; i++) {
> struct cxl_endpoint_decoder *cxled = p->targets[i];
> + int multiple = gran_multiple(cxlr);
> int test_pos;
>
> - test_pos = cxl_calc_interleave_pos(cxled);
> + test_pos = cxl_calc_interleave_pos(cxled, multiple);
> dev_dbg(&cxled->cxld.dev,
> "Test cxl_calc_interleave_pos(): %s test_pos:%d cxled->pos:%d\n",
> (test_pos == cxled->pos) ? "success" : "fail",
>
> base-commit: 2bb67004903cfd35710750654669a77e7223fcd1
next prev parent reply other threads:[~2025-02-25 0:41 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-02-24 23:58 [PATCH] cxl/region: Allow 6 & 12 way regions on 3-way HB interleaves alison.schofield
2025-02-25 0:41 ` Dave Jiang [this message]
2025-02-25 2:27 ` Alison Schofield
2025-02-25 15:41 ` Dave Jiang
2025-02-25 5:02 ` kernel test robot
2025-02-26 7:12 ` Li Ming
2025-03-06 22:15 ` Alison Schofield
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=010e16c3-0f90-41b0-8939-4cc8cb26de7a@intel.com \
--to=dave.jiang@intel.com \
--cc=alison.schofield@intel.com \
--cc=dan.j.williams@intel.com \
--cc=dave@stgolabs.net \
--cc=ira.weiny@intel.com \
--cc=jonathan.cameron@huawei.com \
--cc=linux-cxl@vger.kernel.org \
--cc=vishal.l.verma@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox