Linux CXL
 help / color / mirror / Atom feed
From: Wonjae Lee <wj28.lee@samsung.com>
To: Alison Schofield <alison.schofield@intel.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>,
	Jonathan Cameron <jonathan.cameron@huawei.com>,
	Dave Jiang <dave.jiang@intel.com>,
	Vishal Verma <vishal.l.verma@intel.com>,
	Ira Weiny <ira.weiny@intel.com>,
	Dan Williams <dan.j.williams@intel.com>,
	"linux-cxl@vger.kernel.org" <linux-cxl@vger.kernel.org>,
	KyungSan Kim <ks0204.kim@samsung.com>,
	Hojin Nam <hj96.nam@samsung.com>
Subject: RE:(2) [RFC PATCH] cxl/region: Allow out of order assembly of autodiscovered regions
Date: Wed, 31 Jan 2024 10:02:24 +0900	[thread overview]
Message-ID: <20240131010224epcms2p859f54c3160adff7b18e57584dcc98aaa@epcms2p8> (raw)
In-Reply-To: <Zbh9BMbpjr/qtVjE@aschofie-mobl2>

On Mon, Jan 29, 2024 at 08:37:24PM -0800, Alison Schofield wrote:
> On Fri, Jan 26, 2024 at 05:51:08PM +0900, Wonjae Lee wrote:
> > Thank you for your quick response and sending the patch!
> >
> > I'm very sorry for the delay in replying to your mail. It took some time to
> > reproduce test and check the issue.
> >
> > On Fri, Jan 12, 2024 at 09:04:21PM -0800, alison.schofield@intel.com wrote:
> > > From: Alison Schofield <alison.schofield@intel.com>
> > >
> > > Wonjae Lee,
> > >
> > > Here is the RFC Patch I mentioned in this thread:
> > > https://lore.kernel.org/linux-cxl/20240112092201epcms2p577b3c979bdc694a370e5952edc091f68@epcms2p5/
> > >
> > > This passes the cxl-test suite, so hoping no regression, but it needs
> > > to be tested with the required config: 2 memdevs connected to the same
> > > port, each memdev belongs to a different auto-region. Repeated attempts
> > > should hit the test case and emit this debug message upon success:
> > >  dev_dbg(&cxlr_a->dev, "allow out of order region ref alloc\n");
> > >
> > > Failure will be the HPA order violation message.
> > >
> > > All reviewers & testers welcomed.
> >
> > After applying your patch to base-commit, I ran the same test I mentioned in
> > the previous thread, and during the boot process, a null pointer dereference
> > occurs as shown below:
> >
> > thread: https://lore.kernel.org/linux-cxl/ZaFn0dfaJ8iOi1UW@aschofie-mobl2/T/#t
> > base-commit: 3601311593eb44d34f142b993cb6f38f9a7863b3
> > dmesg log:
> > [] cxl region0: region sort successful
> > [] cxl region0: mem1:endpoint6 decoder6.0 add: mem1:decoder6.0 @ 0 next: none nr_eps: 1 nr_targets: 1
> > [] cxl region0: pci0000:3d:port1 decoder1.0 add: mem1:decoder6.0 @ 0 next: mem1 nr_eps: 1 nr_targets: 1
> > [] BUG: kernel NULL pointer dereference, address: 00000000000002e8
> > ...
> > [] Call Trace:
> > ...
> > []  ? auto_order_ok+0x5f/0xb0 [cxl_core]
> > []  cxl_region_attach_position+0x379/0x860 [cxl_core]
> > []  cxl_region_attach+0x466/0x8b0 [cxl_core]
> >
> >
> > To be specific, cxld_a in auto_order_ok() is NULL:
> >     cxld_a = cxl_region_find_decoder(port, cxlr_a); /* cxld_a == NULL */
> >
> > I've noticed that if I pass cxled as a function argument appropriately, and get
> > cxld_a as shown below, the region initialization succeeds with the log "cxl
> > region0: allow out of order region ref alloc" as you mention.
> >
> > @@ -756,8 +756,8 @@ static struct cxl_decoder *cxl_region_find_decoder(struct cxl_port *port,
> >         return to_cxl_decoder(dev);
> >  }
> > -bool auto_order_ok(struct cxl_port *port, struct cxl_region *cxlr_a,
> > -                  struct cxl_region *cxlr_b)
> > +bool auto_order_ok(struct cxl_port *port, struct cxl_endpoint_decoder *cxled,
> > +                  struct cxl_region *cxlr_a, struct cxl_region *cxlr_b)
> >  {
> >         struct cxl_region_ref *cxl_rr;
> >         struct cxl_decoder *cxld_a, *cxld_b;
> > @@ -771,7 +771,10 @@ bool auto_order_ok(struct cxl_port *port, struct cxl_region *cxlr_a,
> >         if (!test_bit(CXL_REGION_F_AUTO, &cxlr_a->flags))
> >                 return false;
> >
> > -       cxld_a = cxl_region_find_decoder(port, cxlr_a);
> > +       if (port == cxled_to_port(cxled))
> > +               cxld_a = &cxled->cxld;
> > +       else
> > +               cxld_a = cxl_region_find_decoder(port, cxlr_a);
> >
> > I hope this helps.
>
> Yes, this is very helpful! Glad to have you trying it out.
>
> I think at this point we can call alloc_region_ref() with a known cxled
> of this port and simply pass it on. I'm going to skip this if...else pattern,
> and simply get cxld_a = &cxled->cxld;
>
> I think there may be opportunity for refinements in cxl_region_find_decoder()
> and it's descendants, but it's not directly tied to this patch.
>
> Thanks,
> Alison

I'm glad it was useful. I agree to change it in the direction you said.
Looking forward to the patch :)

Thanks,
Wonjae

>
> >
> > Thanks,
> > Wonjae
> >
> > > Begin commit log:
> > > Autodiscovered regions can fail to assemble if they are not discovered
> > > in HPA decode order. The user will see failure messages like:
> > >
> > > [] cxl region0: endpoint5: HPA order violation region1
> > > [] cxl region0: endpoint5: failed to allocate region reference
> > >
> > > The check that is causing the failure helps the CXL driver enforce
> > > a CXL spec mandate that decoders be committed in HPA order. The
> > > check is needless for autodiscovered regions since their decoders
> > > are already programmed. Trying to enforce order in the assembly of
> > > these regions is useless because they are assembled once all their
> > > member endpoints arrive, and there is no guarantee on the order in
> > > which endpoints are discovered during probe.
> > >
> > > Keep the existing check, but for autodiscovered regions, allow the
> > > out of order assembly after a sanity check that the lowered numbered
> > > decoder has the lower HPA starting address.
> > >
> > > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> > > ---
> > >  drivers/cxl/core/region.c 34 +++++++++++++++++++++++++++++++++-
> > >  1 file changed, 33 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> > > index 0f05692bfec3..8770ebcae05d 100644
> > > --- a/drivers/cxl/core/region.c
> > > +++ b/drivers/cxl/core/region.c
> > > @@ -753,6 +753,37 @@ static struct cxl_decoder *cxl_region_find_decoder(struct cxl_port *port,
> > >  return to_cxl_decoder(dev);
> > >  }
> > >
> > > +bool auto_order_ok(struct cxl_port *port, struct cxl_region *cxlr_a,
> > > +        struct cxl_region *cxlr_b)
> > > +{
> > > + struct cxl_region_ref *cxl_rr;
> > > + struct cxl_decoder *cxld_a, *cxld_b;
> > > +
> > > + /*
> > > +  * Allow the out of order assembly of auto-discovered regions as
> > > +  * long as correct decoder programming order can be verified.
> > > +  *
> > > +  * Per CXL Spec 3.1 8.2.4.20.12 Committing Decoder Programming,
> > > +  * software must commit decoders in HPA order. Therefore it is
> > > +  * sufficient to sanity check that the lowered number decoder
> > > +  * has the lower HPA starting address.
> > > +  */
> > > + if (!test_bit(CXL_REGION_F_AUTO, &cxlr_a->flags))
> > > +    return false;
> > > +
> > > + cxld_a = cxl_region_find_decoder(port, cxlr_a);
> > > + cxl_rr = cxl_rr_load(port, cxlr_b);
> > > + cxld_b = cxl_rr->decoder;
> > > +
> > > + if (cxld_b->id > cxld_a->id) {
> > > +    dev_dbg(&cxlr_a->dev,
> > > +        "allow out of order region ref alloc\n");
> > > +    return true;
> > > + }
> > > +
> > > + return false;
> > > +}
> > > +
> > >  static struct cxl_region_ref *alloc_region_ref(struct cxl_port *port,
> > >                          struct cxl_region *cxlr)
> > >  {
> > > @@ -767,7 +798,8 @@ static struct cxl_region_ref *alloc_region_ref(struct cxl_port *port,
> > >      if (!ip->res)
> > >          continue;
> > >
> > > -    if (ip->res->start > p->res->start) {
> > > +    if (ip->res->start > p->res->start &&
> > > +        (!auto_order_ok(port, cxlr, iter->region))) {
> > >          dev_dbg(&cxlr->dev,
> > >              "%s: HPA order violation %s:%pr vs %pr\n",
> > >              dev_name(&port->dev),
> > >
> > > base-commit: 3601311593eb44d34f142b993cb6f38f9a7863b3
> > > --
> > > 2.37.3
> > >
>


      reply	other threads:[~2024-01-31  1:02 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CGME20240113050447epcas2p2097a5c49c1f0f9318ec4202843e169b8@epcms2p8>
2024-01-13  5:04 ` [RFC PATCH] cxl/region: Allow out of order assembly of autodiscovered regions alison.schofield
2024-01-18 19:46   ` Dan Williams
2024-02-08 20:52     ` Alison Schofield
2024-02-08 22:57       ` Dan Williams
2024-02-09  0:23         ` Alison Schofield
2024-02-09  5:25           ` Dan Williams
2024-01-26  8:51   ` Wonjae Lee
2024-01-30  4:37     ` Alison Schofield
2024-01-31  1:02       ` Wonjae Lee [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240131010224epcms2p859f54c3160adff7b18e57584dcc98aaa@epcms2p8 \
    --to=wj28.lee@samsung.com \
    --cc=alison.schofield@intel.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.jiang@intel.com \
    --cc=dave@stgolabs.net \
    --cc=hj96.nam@samsung.com \
    --cc=ira.weiny@intel.com \
    --cc=jonathan.cameron@huawei.com \
    --cc=ks0204.kim@samsung.com \
    --cc=linux-cxl@vger.kernel.org \
    --cc=vishal.l.verma@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox