From: Wonjae Lee <wj28.lee@samsung.com>
To: Alison Schofield <alison.schofield@intel.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>,
Jonathan Cameron <jonathan.cameron@huawei.com>,
Dave Jiang <dave.jiang@intel.com>,
Vishal Verma <vishal.l.verma@intel.com>,
Ira Weiny <ira.weiny@intel.com>,
Dan Williams <dan.j.williams@intel.com>,
"linux-cxl@vger.kernel.org" <linux-cxl@vger.kernel.org>,
KyungSan Kim <ks0204.kim@samsung.com>,
Hojin Nam <hj96.nam@samsung.com>
Subject: RE:(2) [RFC PATCH] cxl/region: Allow out of order assembly of autodiscovered regions
Date: Wed, 31 Jan 2024 10:02:24 +0900 [thread overview]
Message-ID: <20240131010224epcms2p859f54c3160adff7b18e57584dcc98aaa@epcms2p8> (raw)
In-Reply-To: <Zbh9BMbpjr/qtVjE@aschofie-mobl2>
On Mon, Jan 29, 2024 at 08:37:24PM -0800, Alison Schofield wrote:
> On Fri, Jan 26, 2024 at 05:51:08PM +0900, Wonjae Lee wrote:
> > Thank you for your quick response and sending the patch!
> >
> > I'm very sorry for the delay in replying to your mail. It took some time to
> > reproduce test and check the issue.
> >
> > On Fri, Jan 12, 2024 at 09:04:21PM -0800, alison.schofield@intel.com wrote:
> > > From: Alison Schofield <alison.schofield@intel.com>
> > >
> > > Wonjae Lee,
> > >
> > > Here is the RFC Patch I mentioned in this thread:
> > > https://lore.kernel.org/linux-cxl/20240112092201epcms2p577b3c979bdc694a370e5952edc091f68@epcms2p5/
> > >
> > > This passes the cxl-test suite, so hoping no regression, but it needs
> > > to be tested with the required config: 2 memdevs connected to the same
> > > port, each memdev belongs to a different auto-region. Repeated attempts
> > > should hit the test case and emit this debug message upon success:
> > > dev_dbg(&cxlr_a->dev, "allow out of order region ref alloc\n");
> > >
> > > Failure will be the HPA order violation message.
> > >
> > > All reviewers & testers welcomed.
> >
> > After applying your patch to base-commit, I ran the same test I mentioned in
> > the previous thread, and during the boot process, a null pointer dereference
> > occurs as shown below:
> >
> > thread: https://lore.kernel.org/linux-cxl/ZaFn0dfaJ8iOi1UW@aschofie-mobl2/T/#t
> > base-commit: 3601311593eb44d34f142b993cb6f38f9a7863b3
> > dmesg log:
> > [] cxl region0: region sort successful
> > [] cxl region0: mem1:endpoint6 decoder6.0 add: mem1:decoder6.0 @ 0 next: none nr_eps: 1 nr_targets: 1
> > [] cxl region0: pci0000:3d:port1 decoder1.0 add: mem1:decoder6.0 @ 0 next: mem1 nr_eps: 1 nr_targets: 1
> > [] BUG: kernel NULL pointer dereference, address: 00000000000002e8
> > ...
> > [] Call Trace:
> > ...
> > [] ? auto_order_ok+0x5f/0xb0 [cxl_core]
> > [] cxl_region_attach_position+0x379/0x860 [cxl_core]
> > [] cxl_region_attach+0x466/0x8b0 [cxl_core]
> >
> >
> > To be specific, cxld_a in auto_order_ok() is NULL:
> > cxld_a = cxl_region_find_decoder(port, cxlr_a); /* cxld_a == NULL */
> >
> > I've noticed that if I pass cxled as a function argument appropriately, and get
> > cxld_a as shown below, the region initialization succeeds with the log "cxl
> > region0: allow out of order region ref alloc" as you mention.
> >
> > @@ -756,8 +756,8 @@ static struct cxl_decoder *cxl_region_find_decoder(struct cxl_port *port,
> > return to_cxl_decoder(dev);
> > }
> > -bool auto_order_ok(struct cxl_port *port, struct cxl_region *cxlr_a,
> > - struct cxl_region *cxlr_b)
> > +bool auto_order_ok(struct cxl_port *port, struct cxl_endpoint_decoder *cxled,
> > + struct cxl_region *cxlr_a, struct cxl_region *cxlr_b)
> > {
> > struct cxl_region_ref *cxl_rr;
> > struct cxl_decoder *cxld_a, *cxld_b;
> > @@ -771,7 +771,10 @@ bool auto_order_ok(struct cxl_port *port, struct cxl_region *cxlr_a,
> > if (!test_bit(CXL_REGION_F_AUTO, &cxlr_a->flags))
> > return false;
> >
> > - cxld_a = cxl_region_find_decoder(port, cxlr_a);
> > + if (port == cxled_to_port(cxled))
> > + cxld_a = &cxled->cxld;
> > + else
> > + cxld_a = cxl_region_find_decoder(port, cxlr_a);
> >
> > I hope this helps.
>
> Yes, this is very helpful! Glad to have you trying it out.
>
> I think at this point we can call alloc_region_ref() with a known cxled
> of this port and simply pass it on. I'm going to skip this if...else pattern,
> and simply get cxld_a = &cxled->cxld;
>
> I think there may be opportunity for refinements in cxl_region_find_decoder()
> and it's descendants, but it's not directly tied to this patch.
>
> Thanks,
> Alison
I'm glad it was useful. I agree to change it in the direction you said.
Looking forward to the patch :)
Thanks,
Wonjae
>
> >
> > Thanks,
> > Wonjae
> >
> > > Begin commit log:
> > > Autodiscovered regions can fail to assemble if they are not discovered
> > > in HPA decode order. The user will see failure messages like:
> > >
> > > [] cxl region0: endpoint5: HPA order violation region1
> > > [] cxl region0: endpoint5: failed to allocate region reference
> > >
> > > The check that is causing the failure helps the CXL driver enforce
> > > a CXL spec mandate that decoders be committed in HPA order. The
> > > check is needless for autodiscovered regions since their decoders
> > > are already programmed. Trying to enforce order in the assembly of
> > > these regions is useless because they are assembled once all their
> > > member endpoints arrive, and there is no guarantee on the order in
> > > which endpoints are discovered during probe.
> > >
> > > Keep the existing check, but for autodiscovered regions, allow the
> > > out of order assembly after a sanity check that the lowered numbered
> > > decoder has the lower HPA starting address.
> > >
> > > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> > > ---
> > > drivers/cxl/core/region.c 34 +++++++++++++++++++++++++++++++++-
> > > 1 file changed, 33 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> > > index 0f05692bfec3..8770ebcae05d 100644
> > > --- a/drivers/cxl/core/region.c
> > > +++ b/drivers/cxl/core/region.c
> > > @@ -753,6 +753,37 @@ static struct cxl_decoder *cxl_region_find_decoder(struct cxl_port *port,
> > > return to_cxl_decoder(dev);
> > > }
> > >
> > > +bool auto_order_ok(struct cxl_port *port, struct cxl_region *cxlr_a,
> > > + struct cxl_region *cxlr_b)
> > > +{
> > > + struct cxl_region_ref *cxl_rr;
> > > + struct cxl_decoder *cxld_a, *cxld_b;
> > > +
> > > + /*
> > > + * Allow the out of order assembly of auto-discovered regions as
> > > + * long as correct decoder programming order can be verified.
> > > + *
> > > + * Per CXL Spec 3.1 8.2.4.20.12 Committing Decoder Programming,
> > > + * software must commit decoders in HPA order. Therefore it is
> > > + * sufficient to sanity check that the lowered number decoder
> > > + * has the lower HPA starting address.
> > > + */
> > > + if (!test_bit(CXL_REGION_F_AUTO, &cxlr_a->flags))
> > > + return false;
> > > +
> > > + cxld_a = cxl_region_find_decoder(port, cxlr_a);
> > > + cxl_rr = cxl_rr_load(port, cxlr_b);
> > > + cxld_b = cxl_rr->decoder;
> > > +
> > > + if (cxld_b->id > cxld_a->id) {
> > > + dev_dbg(&cxlr_a->dev,
> > > + "allow out of order region ref alloc\n");
> > > + return true;
> > > + }
> > > +
> > > + return false;
> > > +}
> > > +
> > > static struct cxl_region_ref *alloc_region_ref(struct cxl_port *port,
> > > struct cxl_region *cxlr)
> > > {
> > > @@ -767,7 +798,8 @@ static struct cxl_region_ref *alloc_region_ref(struct cxl_port *port,
> > > if (!ip->res)
> > > continue;
> > >
> > > - if (ip->res->start > p->res->start) {
> > > + if (ip->res->start > p->res->start &&
> > > + (!auto_order_ok(port, cxlr, iter->region))) {
> > > dev_dbg(&cxlr->dev,
> > > "%s: HPA order violation %s:%pr vs %pr\n",
> > > dev_name(&port->dev),
> > >
> > > base-commit: 3601311593eb44d34f142b993cb6f38f9a7863b3
> > > --
> > > 2.37.3
> > >
>
prev parent reply other threads:[~2024-01-31 1:02 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <CGME20240113050447epcas2p2097a5c49c1f0f9318ec4202843e169b8@epcms2p8>
2024-01-13 5:04 ` [RFC PATCH] cxl/region: Allow out of order assembly of autodiscovered regions alison.schofield
2024-01-18 19:46 ` Dan Williams
2024-02-08 20:52 ` Alison Schofield
2024-02-08 22:57 ` Dan Williams
2024-02-09 0:23 ` Alison Schofield
2024-02-09 5:25 ` Dan Williams
2024-01-26 8:51 ` Wonjae Lee
2024-01-30 4:37 ` Alison Schofield
2024-01-31 1:02 ` Wonjae Lee [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240131010224epcms2p859f54c3160adff7b18e57584dcc98aaa@epcms2p8 \
--to=wj28.lee@samsung.com \
--cc=alison.schofield@intel.com \
--cc=dan.j.williams@intel.com \
--cc=dave.jiang@intel.com \
--cc=dave@stgolabs.net \
--cc=hj96.nam@samsung.com \
--cc=ira.weiny@intel.com \
--cc=jonathan.cameron@huawei.com \
--cc=ks0204.kim@samsung.com \
--cc=linux-cxl@vger.kernel.org \
--cc=vishal.l.verma@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox