From: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: <linux-cxl@vger.kernel.org>,
Ben Widawsky <ben.widawsky@intel.com>,
Vishal L Verma <vishal.l.verma@intel.com>,
"Weiny, Ira" <ira.weiny@intel.com>,
"Schofield, Alison" <alison.schofield@intel.com>
Subject: Re: CXL type 3 which doesn't have cxl mem enabled.
Date: Tue, 26 Apr 2022 20:38:50 +0100
Message-ID: <20220426203850.00006538@Huawei.com>
In-Reply-To: <CAPcyv4gjn7MAo-kzW40EAc9J6zQaOKh0a-42GyV1X5Efac_W_Q@mail.gmail.com>
On Tue, 26 Apr 2022 12:00:41 -0700
Dan Williams <dan.j.williams@intel.com> wrote:
> On Tue, Apr 26, 2022 at 11:06 AM Jonathan Cameron
> <Jonathan.Cameron@huawei.com> wrote:
> >
> > On Tue, 26 Apr 2022 10:19:55 -0700
> > Dan Williams <dan.j.williams@intel.com> wrote:
> >
> > > On Tue, Apr 26, 2022 at 10:09 AM Jonathan Cameron
> > > <Jonathan.Cameron@huawei.com> wrote:
> > > >
> > > > Hi All,
> > > >
> > > > I ran into this whilst debugging why on the current QEMU code
> > > > we now get a probe failure for CXL mem due to the range 1 size being
> > > > non 0.
> > > >
> > > > The conditions for whether we have legacy ranges programmed don't
> > > > take into account if Mem_Enable = 1. That is if the
> > > > DVSEC CXL Control Mem_Enable bit is set on the type 3 device.
> > > > If it's not then there is no existing user of the CXL memory
> > > > setup by firmware or similar so we can switch over to HDM
> > > > decoders and it doesn't matter what is in the range registers.
> > > >
> > > > Unfortunately the QEMU code was bringing the device up with
> > > > Mem_Enabled already set. So I fixed that. After all default
> > > > value of that bit should be 0.
> > > >
> > > > A few problems then showed up.
> > > >
> > > > 1. Nothing in the Linux code actually sets Mem_Enabled to 1.
> >
> > Sorry - my mistake, that should be Mem_Enable. Though that doesn't
> > actually clarify things much...
> >
> > >
> > > That's because the device is supposed to, I though, set it of its own
> > > accord as a result of link training. It's an RO field in the spec, so
> > > Linux can't set it:
> > >
> > > 8.2.1.3.3 DVSEC Flex Bus Port Status (Offset 0Eh)
> > > "Mem_Enabled: When set, indicates that CXL.mem protocol operation has
> > > been enabled as a result of PCIe alternate protocol negotiation for
> > > Flex Bus."
> >
> > Agreed with that statement.
> >
> > Ah. Nothing like confusing register field names that are very similar....
> > A Mem_Enable is in DVSEC for CXL Device 8.1.3.2 and is RWL.
> > Just for giggles there is also a Mem_Enable in the Flex Bus Port Control
> > but the range registers comment isn't about that one (I hope anyway!).
>
> Not sure whether to laugh or cry at that, sorry for the mix up on my part.
>
> > The kernel currently sets the value of info->mem_enabled using
> > the Mem_Enable field of the DVSEC for CXL Device.
> > https://elixir.bootlin.com/linux/v5.18-rc4/source/drivers/cxl/pci.c#L501
> >
> > So I think wrong name and wrong DVSEC for that particular condition.
>
> Yeah, I don't even see a need to cache that value, so something like
> the attached? Note that the intent was to only have cxl_mem worry
> about MMIO mapped register details and not require the 'struct
> pci_dev' which makes things easier for cxl_test in the near term.
>
Hi Dan,

That fixes this problem (I'll test tomorrow but it looks right), but...

I think we still run into the problem I was debugging in the
first place, which is whether the Device DVSEC Range 1 size is non-zero.
https://elixir.bootlin.com/linux/v5.18-rc4/source/drivers/cxl/pci.c#L541
(ranges ends up > 0 and hence we conclude firmware already programmed the
device and fail the probe).

As far as I can tell a CXL 2.0 type 3 device is allowed to provide
the option to use ranges or HDM decoders (or it can be HDM decoder only).
See the comment at the end of 8.1.3.8.4:
"A CXL.mem capable device that implements CXL HDM Decoder Capability registers
follows the above behavior as long as HDM Decoder Enable bit in CXL HDM Decoder
Global Control register is zero."

As such we can't use the range size alone to check if
the Range registers are in use (it's an RO value, not something previously
configured). We need to perform the full check as described, which
includes checking Mem_Enable (which is in the above behavior that comment
is directing us towards).

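
To make the "full check" concrete, here is a rough sketch of the condition in plain C. It is not kernel code: the helper name is made up, and the bit positions are assumptions that mirror the kernel's CXL_DVSEC_MEM_ENABLE (DVSEC CXL Control, bit 2) and CXL_HDM_DECODER_ENABLE (HDM Decoder Global Control, bit 1), so check them against the spec before reusing.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions (see note above); not taken from this thread. */
#define DVSEC_CTRL_MEM_ENABLE   (1u << 2)
#define HDM_GLOBAL_CTRL_ENABLE  (1u << 1)

/*
 * Hypothetical helper: decide whether the DVSEC range registers are
 * actually decoding, rather than merely reporting a non-zero size.
 */
static bool dvsec_ranges_in_use(uint16_t dvsec_ctrl, uint32_t hdm_global_ctrl,
                                int nonzero_ranges)
{
    /* Mem_Enable clear: nothing can be using CXL.mem via the ranges. */
    if (!(dvsec_ctrl & DVSEC_CTRL_MEM_ENABLE))
        return false;

    /* HDM Decoder Enable set: the HDM decoders have taken over. */
    if (hdm_global_ctrl & HDM_GLOBAL_CTRL_ENABLE)
        return false;

    /* Only now does a non-zero range size mean the ranges are live. */
    return nonzero_ranges > 0;
}
```

With Mem_Enable clear, a non-zero Range 1 size alone gives `false` here, which is the case the QEMU setup hits.
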
As the below patch has already set the Mem_Enable hardware field unconditionally,
we don't have the necessary info by the time we reach the relevant code.
https://elixir.bootlin.com/linux/v5.18-rc4/source/drivers/cxl/mem.c#L107

So you could cache the current value (perhaps with a more meaningful
name than the spec gives it!) of Device DVSEC Mem_Enable
at the point where you have it written in this patch and add a check
on the cached value at the point in the reference above.

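
Roughly what I mean by caching, as a standalone sketch rather than a patch. The struct, the field name "mem_enabled_at_probe" and both helpers are hypothetical, the control word is passed by value instead of going through pci_read_config_word() / pci_write_config_word(), and the bit position is again an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

#define DVSEC_CTRL_MEM_ENABLE (1u << 2)  /* assumed bit position */

/* Hypothetical cached state; the field name is only illustrative. */
struct dvsec_snapshot {
    bool mem_enabled_at_probe;
};

/*
 * Record what was found in the control word, then return the word with
 * Mem_Enable set (stands in for the read/modify/write in the patch).
 */
static uint16_t snapshot_then_enable(uint16_t ctrl, struct dvsec_snapshot *snap)
{
    snap->mem_enabled_at_probe = (ctrl & DVSEC_CTRL_MEM_ENABLE) != 0;
    return ctrl | DVSEC_CTRL_MEM_ENABLE;
}

/*
 * Later check: non-zero ranges only block the probe if Mem_Enable was
 * already set before the driver touched it.
 */
static bool ranges_block_probe(const struct dvsec_snapshot *snap, int ranges)
{
    return snap->mem_enabled_at_probe && ranges > 0;
}
```
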
That's still a little ugly, as ideally we shouldn't transition through
a somewhat invalid state - though it is harmless as no traffic
will be sent by the host (probably - though I suspect hardware folk
would tell me we can't assume it...).

The invalid state being:
- Mem_Enable set
- Range registers in use because global HDM Decoder enable not yet set.
- Range registers were programmed by firmware to something that actually
  works but not enabled for some odd reason. I think they might even be
  technically valid with the defaults even though the base is 0 (imagine
  a very large CXL memory - some of it might overlap with a region being
  routed by the host to the CXL host bridges).

Ideally we wouldn't set that Mem_Enable until we have switched
to the HDM decoders. To avoid that we probably need another callback
from cxl_mem into cxl_pci.

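
A toy model of that ordering, with the callback as a function pointer. Everything here is hypothetical (struct, callback name, the refusal when HDM decoders are not yet enabled) and the two registers are simulated as plain fields; it only illustrates "HDM Decoder Enable first, Mem_Enable second". Bit positions are assumed as before.

```c
#include <stdint.h>

#define DVSEC_CTRL_MEM_ENABLE   (1u << 2)  /* assumed bit positions */
#define HDM_GLOBAL_CTRL_ENABLE  (1u << 1)

/* Hypothetical device state: two simulated registers plus a callback slot. */
struct dev_state {
    uint16_t dvsec_ctrl;
    uint32_t hdm_global_ctrl;
    int (*enable_mem)(struct dev_state *ds);  /* would be supplied by cxl_pci */
};

/* cxl_pci side: the only place that touches the DVSEC control word. */
static int pci_enable_mem(struct dev_state *ds)
{
    /* Refuse if we would pass through the range-register state. */
    if (!(ds->hdm_global_ctrl & HDM_GLOBAL_CTRL_ENABLE))
        return -1;

    ds->dvsec_ctrl |= DVSEC_CTRL_MEM_ENABLE;
    return 0;
}

/* cxl_mem side: switch to HDM decoders first, then ask for Mem_Enable. */
static int mem_switch_to_hdm(struct dev_state *ds)
{
    ds->hdm_global_ctrl |= HDM_GLOBAL_CTRL_ENABLE;
    return ds->enable_mem(ds);
}
```

This keeps the config space access in the PCI driver, so cxl_mem still only needs the MMIO side plus one opaque callback.
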
Agreed on the laughing or crying. I'm off to find a beer.

Jonathan

> diff --git a/drivers/cxl/cxlmem.h b/drivers/cxl/cxlmem.h
> index 7235d2f976e5..ef6950a2a4fd 100644
> --- a/drivers/cxl/cxlmem.h
> +++ b/drivers/cxl/cxlmem.h
> @@ -150,12 +150,10 @@ static inline int cxl_mbox_cmd_rc2errno(struct cxl_mbox_cmd *mbox_cmd)
>
> /**
> * struct cxl_endpoint_dvsec_info - Cached DVSEC info
> - * @mem_enabled: cached value of mem_enabled in the DVSEC, PCIE_DEVICE
> * @ranges: Number of active HDM ranges this device uses.
> * @dvsec_range: cached attributes of the ranges in the DVSEC, PCIE_DEVICE
> */
> struct cxl_endpoint_dvsec_info {
> - bool mem_enabled;
> int ranges;
> struct range dvsec_range[2];
> };
> diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> index 401b0fbe21db..c2d9dadf4a2e 100644
> --- a/drivers/cxl/mem.c
> +++ b/drivers/cxl/mem.c
> @@ -27,12 +27,8 @@
> static int wait_for_media(struct cxl_memdev *cxlmd)
> {
> struct cxl_dev_state *cxlds = cxlmd->cxlds;
> - struct cxl_endpoint_dvsec_info *info = &cxlds->info;
> int rc;
>
> - if (!info->mem_enabled)
> - return -EBUSY;
> -
> rc = cxlds->wait_media_ready(cxlds);
> if (rc)
> return rc;
> diff --git a/drivers/cxl/pci.c b/drivers/cxl/pci.c
> index e7ab9a34d718..5c8f933bbece 100644
> --- a/drivers/cxl/pci.c
> +++ b/drivers/cxl/pci.c
> @@ -463,6 +463,17 @@ static int wait_for_media_ready(struct cxl_dev_state *cxlds)
> return 0;
> }
>
> +static void cxl_disable_mem(void *pdev)
> +{
> + struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
> + int d = cxlds->cxl_dvsec;
> + u16 ctrl;
> +
> + pci_read_config_word(pdev, d + CXL_DVSEC_CTRL_OFFSET, &ctrl);
> + ctrl &= ~CXL_DVSEC_MEM_ENABLE;
> + pci_write_config_word(pdev, d + CXL_DVSEC_CTRL_OFFSET, ctrl);
> +}
> +
> /*
> * Return positive number of non-zero ranges on success and a negative
> * error code on failure. The cxl_mem driver depends on ranges == 0 to
> @@ -486,13 +497,26 @@ static int __cxl_dvsec_ranges(struct cxl_dev_state *cxlds,
> if (rc)
> return rc;
>
> + if (!(cap & CXL_DVSEC_MEM_CAPABLE)) {
> + dev_dbg(dev, "Not MEM Capable\n");
> + return -ENXIO;
> + }
> +
> rc = pci_read_config_word(pdev, d + CXL_DVSEC_CTRL_OFFSET, &ctrl);
> if (rc)
> return rc;
>
> - if (!(cap & CXL_DVSEC_MEM_CAPABLE)) {
> - dev_dbg(dev, "Not MEM Capable\n");
> - return -ENXIO;
> + if (!(ctrl & CXL_DVSEC_MEM_ENABLE)) {
> + ctrl |= CXL_DVSEC_MEM_ENABLE;
> + rc = pci_write_config_word(pdev, d + CXL_DVSEC_CTRL_OFFSET,
> + ctrl);
> + if (rc)
> + return rc;
> +
> + rc = devm_add_action_or_reset(&pdev->dev, cxl_disable_mem,
> + pdev);
> + if (rc)
> + return rc;
> }
>
> /*
> @@ -511,8 +535,6 @@ static int __cxl_dvsec_ranges(struct cxl_dev_state *cxlds,
> return rc;
> }
>
> - info->mem_enabled = FIELD_GET(CXL_DVSEC_MEM_ENABLE, ctrl);
> -
> for (i = 0; i < hdm_count; i++) {
> u64 base, size;
> u32 temp;
> @@ -585,6 +607,7 @@ static int cxl_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> cxlds = cxl_dev_state_create(&pdev->dev);
> if (IS_ERR(cxlds))
> return PTR_ERR(cxlds);
> + pci_set_drvdata(pdev, cxlds);
>
> cxlds->serial = pci_get_dsn(pdev);
> cxlds->cxl_dvsec = pci_find_dvsec_capability(
> diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c
> index b6b726eff3e2..44d01224734a 100644
> --- a/tools/testing/cxl/test/mem.c
> +++ b/tools/testing/cxl/test/mem.c
> @@ -250,10 +250,6 @@ static void label_area_release(void *lsa)
>
> static void mock_validate_dvsec_ranges(struct cxl_dev_state *cxlds)
> {
> - struct cxl_endpoint_dvsec_info *info;
> -
> - info = &cxlds->info;
> - info->mem_enabled = true;
> }
>
> static int cxl_mock_mem_probe(struct platform_device *pdev)