qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Gregory Price <gregory.price@memverge.com>
To: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Jonathan Cameron via <qemu-devel@nongnu.org>,
	Lukas Wunner <lukas@wunner.de>, Michael Tsirkin <mst@redhat.com>,
	Ben Widawsky <bwidawsk@kernel.org>,
	linux-cxl@vger.kernel.org, linuxarm@huawei.com,
	Ira Weiny <ira.weiny@intel.com>,
	Gregory Price <gourry.memverge@gmail.com>,
	Dan Williams <dan.j.williams@intel.com>
Subject: Re: cxl nvdimm Potential probe ordering issues.
Date: Fri, 20 Jan 2023 00:51:26 -0500	[thread overview]
Message-ID: <Y8or3tSnQm5In17u@memverge.com> (raw)
In-Reply-To: <20230119161711.000057a7@Huawei.com>

On Thu, Jan 19, 2023 at 04:17:11PM +0000, Jonathan Cameron wrote:
> 
> Whilst I still have no idea if this is the same problem, I have identified
> what goes wrong if there is a module probe ordering issue.
> https://elixir.bootlin.com/linux/v6.2-rc4/source/drivers/cxl/core/pmem.c#L306
> 
> 	/*
> 	 * The two actions below arrange for @cxl_nvd to be deleted when either
> 	 * the top-level PMEM bridge goes down, or the endpoint device goes
> 	 * through ->remove().
> 	 */
> 	device_lock(&cxl_nvb->dev);
> 	if (cxl_nvb->dev.driver)
> 		rc = devm_add_action_or_reset(&cxl_nvb->dev, cxl_nvd_unregister,
> 					      cxl_nvd);
> 	else
> // bridge driver not loaded, so we hit this path.
> 		rc = -ENXIO;
> 	device_unlock(&cxl_nvb->dev);
> 
> 	if (rc)
> /// and this one
> 		goto err_alloc;
> 
> 	/* @cxlmd carries a reference on @cxl_nvb until cxlmd_release_nvdimm */
> 	return devm_add_action_or_reset(&cxlmd->dev, cxlmd_release_nvdimm, cxlmd);
> 
> err:
> 	put_device(dev);
> err_alloc:
> 	cxlmd->cxl_nvb = NULL;
> 	cxlmd->cxl_nvd = NULL;
> 	put_device(&cxl_nvb->dev);
> // whilst we scrub the pointers we don't actually get rid of the
> // cxl_nvd that we registered.  Hence later load of the driver tries to
> // attach to that and boom because we've scrubbed these pointers here.
> // A quick hack is to just call device_del(&cxl_nvd->dev) if rc = -ENXIO here.
> // There may well be a races though....
> 	return rc;
> }
> EXPORT_SYMBOL_NS_GPL(devm_cxl_add_nvdimm, CXL);
> 
> 
> Of course this "fix" just stops things blowing up, it doesn't leave things
> in a remotely useful state.  If it's triggered because someone
> is messing with the load order that's fine.  If the same issue
> is occurring for Gregory, not so much. 
> 
> Jonathan
> 

mild hint in the dev_cxl_add_nvdimm_bridge path

driver/cxl/acpi.c

static int cxl_acpi_probe(struct platform_device *pdev)
{
... snip ...
  if (IS_ENABLED(CONFIG_CXL_PMEM))
    rc = device_for_each_child(&root_port->dev, root_port,
             add_root_nvdimm_bridge);
  if (rc < 0)
    return rc;

  /* In case PCI is scanned before ACPI re-trigger memdev attach */
  cxl_bus_rescan();
  return 0;
}


if PCI is presently written in a way that it's expecting nvdimm_bridge
to be present (via acpi_probe), then clearly this would break.

From the other discussion here... that seems to be the issue?  If that's
an issue, I also imagine there are other parts that may be subject to
the same problem.


static int cxl_pmem_region_probe(struct device *dev)
{
  struct nd_mapping_desc mappings[CXL_DECODER_MAX_INTERLEAVE];
  struct cxl_pmem_region *cxlr_pmem = to_cxl_pmem_region(dev);
  struct cxl_region *cxlr = cxlr_pmem->cxlr;
  struct cxl_nvdimm_bridge *cxl_nvb = cxlr->cxl_nvb;


this may be unreachable due to prior stack traces, but you get the
point.

Reiterating my confusion a bit: I don't have an nvdimm, why am i getting
an nvdimm_bridge?  The reason it no longer appears to trigger on my
memexp example is because it doesnt go down this path:

static int cxl_mem_probe(struct device *dev)
{
... snip ...

  // resource size is 0 here due to type3dev->persistent_capacity=0
  if (resource_size(&cxlds->pmem_res) && IS_ENABLED(CONFIG_CXL_PMEM)) {
    rc = devm_cxl_add_nvdimm(cxlmd);
    if (rc == -ENODEV)
      dev_info(dev, "PMEM disabled by platform\n");
    else
      return rc;
  }
... snip ...
}

This seems like more than an ordering issue.


  reply	other threads:[~2023-01-20  5:52 UTC|newest]

Thread overview: 50+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-11 14:24 [PATCH 0/8] hw/cxl: CXL emulation cleanups and minor fixes for upstream Jonathan Cameron via
2023-01-11 14:24 ` [PATCH 1/8] hw/mem/cxl_type3: Improve error handling in realize() Jonathan Cameron via
2023-01-11 17:33   ` Ira Weiny
2023-01-11 14:24 ` [PATCH 2/8] hw/pci-bridge/cxl_downstream: Fix type naming mismatch Jonathan Cameron via
2023-01-11 14:45   ` Philippe Mathieu-Daudé
2023-01-11 17:38   ` Ira Weiny
2023-01-11 14:24 ` [PATCH 3/8] hw/cxl: set cxl-type3 device type to PCI_CLASS_MEMORY_CXL Jonathan Cameron via
2023-01-11 17:41   ` Ira Weiny
2023-01-11 14:24 ` [PATCH 4/8] hw/cxl: Add CXL_CAPACITY_MULTIPLIER definition Jonathan Cameron via
2023-01-11 15:48   ` Philippe Mathieu-Daudé
2023-01-11 14:24 ` [PATCH 5/8] hw/i386/acpi: Drop duplicate _UID entry for CXL root bridge Jonathan Cameron via
2023-01-11 17:48   ` Ira Weiny
2023-01-11 14:24 ` [PATCH 6/8] qemu/bswap: Add const_le64() Jonathan Cameron via
2023-01-11 15:49   ` Philippe Mathieu-Daudé
2023-01-11 16:07     ` Philippe Mathieu-Daudé
2023-01-11 16:33       ` Philippe Mathieu-Daudé
2023-01-11 16:40   ` Philippe Mathieu-Daudé
2023-01-11 16:59     ` Jonathan Cameron via
2023-01-11 14:24 ` [PATCH 7/8] qemu/uuid: Add UUID static initializer Jonathan Cameron via
2023-01-11 14:24 ` [PATCH 8/8] hw/cxl/mailbox: Use new UUID network order define for cel_uuid Jonathan Cameron via
2023-01-11 15:50   ` Philippe Mathieu-Daudé
2023-01-12 15:39 ` [PATCH 0/8] hw/cxl: CXL emulation cleanups and minor fixes for upstream Gregory Price
2023-01-12 17:21   ` Jonathan Cameron via
2023-01-12 22:46     ` Gregory Price
2023-01-13  9:12       ` Jonathan Cameron via
2023-01-13 14:19         ` Gregory Price
2023-01-13 14:40           ` Jonathan Cameron via
2023-01-13 14:45             ` Jonathan Cameron via
2023-01-13 15:12               ` Lukas Wunner
2023-01-13 15:42                 ` Gregory Price
2023-01-18 19:22                 ` Gregory Price
2023-01-18 19:31                   ` Gregory Price
2023-01-19 12:42                     ` Jonathan Cameron via
2023-01-19 15:04                       ` cxl nvdimm Potential probe ordering issues Jonathan Cameron via
2023-01-19 16:17                         ` Jonathan Cameron via
2023-01-20  5:51                           ` Gregory Price [this message]
2023-01-20 17:26                           ` Dan Williams
2023-01-20  4:53                         ` Gregory Price
2023-01-20 10:47                           ` Jonathan Cameron via
2023-01-20 17:38                           ` Dan Williams
2023-01-20 21:54                             ` Gregory Price
2023-01-20 22:41                               ` Dan Williams
2023-01-23  9:44                                 ` Jonathan Cameron via
2023-01-23 18:16                                 ` Gregory Price
2023-01-19 10:19                   ` [PATCH 0/8] hw/cxl: CXL emulation cleanups and minor fixes for upstream Jonathan Cameron via
2023-01-19 11:48                     ` Michael S. Tsirkin
2023-01-19 12:16                       ` Jonathan Cameron via
2023-01-19 14:23                       ` Gregory Price
2023-01-19 14:20                     ` Gregory Price
2023-01-13 14:45             ` Gregory Price

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y8or3tSnQm5In17u@memverge.com \
    --to=gregory.price@memverge.com \
    --cc=Jonathan.Cameron@huawei.com \
    --cc=bwidawsk@kernel.org \
    --cc=dan.j.williams@intel.com \
    --cc=gourry.memverge@gmail.com \
    --cc=ira.weiny@intel.com \
    --cc=linux-cxl@vger.kernel.org \
    --cc=linuxarm@huawei.com \
    --cc=lukas@wunner.de \
    --cc=mst@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).