From: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: <dave.jiang@intel.com>, <ira.weiny@intel.com>,
<stable@vger.kernel.org>,
"Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
Davidlohr Bueso <dave@stgolabs.net>,
Alison Schofield <alison.schofield@intel.com>,
"Zijun Hu" <zijun_hu@icloud.com>, <vishal.l.verma@intel.com>,
<linux-cxl@vger.kernel.org>
Subject: Re: [PATCH 4/5] cxl/port: Fix use-after-free, permit out-of-order decoder shutdown
Date: Tue, 15 Oct 2024 17:47:43 +0100 [thread overview]
Message-ID: <20241015174743.0000180d@Huawei.com> (raw)
In-Reply-To: <172862486548.2150669.3548553804904171839.stgit@dwillia2-xfh.jf.intel.com>
On Thu, 10 Oct 2024 22:34:26 -0700
Dan Williams <dan.j.williams@intel.com> wrote:
> In support of investigating an initialization failure report [1],
> cxl_test was updated to register mock memory-devices after the mock
> root-port/bus device had been registered. That led to cxl_test crashing
> with a use-after-free bug with the following signature:
>
> cxl_port_attach_region: cxl region3: cxl_host_bridge.0:port3 decoder3.0 add: mem0:decoder7.0 @ 0 next: cxl_switch_uport.0 nr_eps: 1 nr_targets: 1
> cxl_port_attach_region: cxl region3: cxl_host_bridge.0:port3 decoder3.0 add: mem4:decoder14.0 @ 1 next: cxl_switch_uport.0 nr_eps: 2 nr_targets: 1
> cxl_port_setup_targets: cxl region3: cxl_switch_uport.0:port6 target[0] = cxl_switch_dport.0 for mem0:decoder7.0 @ 0
> 1) cxl_port_setup_targets: cxl region3: cxl_switch_uport.0:port6 target[1] = cxl_switch_dport.4 for mem4:decoder14.0 @ 1
> [..]
> cxld_unregister: cxl decoder14.0:
> cxl_region_decode_reset: cxl_region region3:
> mock_decoder_reset: cxl_port port3: decoder3.0 reset
> 2) mock_decoder_reset: cxl_port port3: decoder3.0: out of order reset, expected decoder3.1
> cxl_endpoint_decoder_release: cxl decoder14.0:
> [..]
> cxld_unregister: cxl decoder7.0:
> 3) cxl_region_decode_reset: cxl_region region3:
> Oops: general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6bc3: 0000 [#1] PREEMPT SMP PTI
> [..]
> RIP: 0010:to_cxl_port+0x8/0x60 [cxl_core]
> [..]
> Call Trace:
> <TASK>
> cxl_region_decode_reset+0x69/0x190 [cxl_core]
> cxl_region_detach+0xe8/0x210 [cxl_core]
> cxl_decoder_kill_region+0x27/0x40 [cxl_core]
> cxld_unregister+0x5d/0x60 [cxl_core]
>
> At 1) a region has been established with 2 endpoint decoders (7.0 and
> 14.0). Those endpoints share a common switch-decoder in the topology
> (3.0). At teardown, 2), decoder14.0 is the first to be removed and hits
> the "out of order reset case" in the switch decoder. The effect though
> is that region3 cleanup is aborted leaving it in-tact and
> referencing decoder14.0. At 3) the second attempt to teardown region3
> trips over the stale decoder14.0 object which has long since been
> deleted.
>
> The fix here is to recognize that the CXL specification places no
> mandate on in-order shutdown of switch-decoders, the driver enforces
> in-order allocation, and hardware enforces in-order commit. So, rather
> than fail and leave objects dangling, always remove them.
>
> In support of making cxl_region_decode_reset() always succeed,
> cxl_region_invalidate_memregion() failures are turned into warnings.
> Crashing the kernel is ok there since system integrity is at risk if
> caches cannot be managed around physical address mutation events like
> CXL region destruction.
I'm fine with this, but seems like it is worth breaking out as a precursor
where we can discuss merits of that change separate from the complexity
of the rest.
I don't mind that strongly though so if you keep this intact,
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Trivial passing comment inline.
> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> index 3df10517a327..223c273c0cd1 100644
> --- a/drivers/cxl/core/hdm.c
> +++ b/drivers/cxl/core/hdm.c
> @@ -712,7 +712,44 @@ static int cxl_decoder_commit(struct cxl_decoder *cxld)
> return 0;
> }
>
> -static int cxl_decoder_reset(struct cxl_decoder *cxld)
> +static int commit_reap(struct device *dev, const void *data)
> +{
> + struct cxl_port *port = to_cxl_port(dev->parent);
> + struct cxl_decoder *cxld;
> +
> + if (!is_switch_decoder(dev) && !is_endpoint_decoder(dev))
> + return 0;
> +
> + cxld = to_cxl_decoder(dev);
> + if (port->commit_end == cxld->id &&
> + ((cxld->flags & CXL_DECODER_F_ENABLE) == 0)) {
I'd have gone with !(cxld->flags & CXL_DECODER_F_ENABLE) but
this is consistent with exiting form, so fine as is.
> + port->commit_end--;
> + dev_dbg(&port->dev, "reap: %s commit_end: %d\n",
> + dev_name(&cxld->dev), port->commit_end);
> + }
> +
> + return 0;
> +}
> diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c
> index 90d5afd52dd0..c5bbd89b3192 100644
> --- a/tools/testing/cxl/test/cxl.c
> +++ b/tools/testing/cxl/test/cxl.c
> @@ -693,26 +693,22 @@ static int mock_decoder_commit(struct cxl_decoder *cxld)
> return 0;
> }
>
> -static int mock_decoder_reset(struct cxl_decoder *cxld)
> +static void mock_decoder_reset(struct cxl_decoder *cxld)
> {
> struct cxl_port *port = to_cxl_port(cxld->dev.parent);
> int id = cxld->id;
>
> if ((cxld->flags & CXL_DECODER_F_ENABLE) == 0)
> - return 0;
> + return;
>
> dev_dbg(&port->dev, "%s reset\n", dev_name(&cxld->dev));
> - if (port->commit_end != id) {
> + if (port->commit_end == id)
> + cxl_port_commit_reap(cxld);
> + else
> dev_dbg(&port->dev,
> "%s: out of order reset, expected decoder%d.%d\n",
> dev_name(&cxld->dev), port->id, port->commit_end);
> - return -EBUSY;
> - }
> -
> - port->commit_end--;
> cxld->flags &= ~CXL_DECODER_F_ENABLE;
> -
> - return 0;
> }
>
> static void default_mock_decoder(struct cxl_decoder *cxld)
>
>
next prev parent reply other threads:[~2024-10-15 16:47 UTC|newest]
Thread overview: 34+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-10-11 5:33 [PATCH 0/5] cxl: Initialization and shutdown fixes Dan Williams
2024-10-11 5:34 ` [PATCH 1/5] cxl/port: Fix CXL port initialization order when the subsystem is built-in Dan Williams
2024-10-14 11:33 ` Jonathan Cameron
2024-10-11 5:34 ` [PATCH 2/5] cxl/port: Fix cxl_bus_rescan() vs bus_rescan_devices() Dan Williams
2024-10-11 12:27 ` Lk Sii
2024-10-11 17:52 ` Dan Williams
2024-10-15 16:36 ` Jonathan Cameron
2024-10-15 17:57 ` Dan Williams
2024-10-16 14:51 ` Jonathan Cameron
2024-10-23 0:33 ` Dan Williams
2024-10-11 5:34 ` [PATCH 3/5] cxl/acpi: Ensure ports ready at cxl_acpi_probe() return Dan Williams
2024-10-11 5:34 ` [PATCH 4/5] cxl/port: Fix use-after-free, permit out-of-order decoder shutdown Dan Williams
2024-10-11 11:50 ` Zijun Hu
2024-10-11 17:46 ` Dan Williams
2024-10-11 23:40 ` Zijun Hu
2024-10-12 17:56 ` Gregory Price
2024-10-12 22:16 ` Dan Williams
2024-10-14 1:29 ` Zijun Hu
2024-10-14 19:32 ` Dan Williams
2024-10-15 0:02 ` Zijun Hu
2024-10-15 0:10 ` Dan Williams
2024-10-15 16:47 ` Jonathan Cameron [this message]
2024-10-23 0:31 ` Dan Williams
2024-10-11 5:34 ` [PATCH 5/5] cxl/test: Improve init-order fidelity relative to real-world systems Dan Williams
2024-10-11 11:21 ` [PATCH 0/5] cxl: Initialization and shutdown fixes Alejandro Lucero Palau
2024-10-11 17:38 ` Dan Williams
2024-10-12 6:30 ` Alejandro Lucero Palau
2024-10-12 21:57 ` Dan Williams
2024-10-14 15:13 ` Alejandro Lucero Palau
2024-10-14 22:24 ` Dan Williams
2024-10-15 8:45 ` Alejandro Lucero Palau
2024-10-15 16:37 ` Dan Williams
2024-10-16 14:41 ` Alejandro Lucero Palau
2024-10-23 0:46 ` Dan Williams
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20241015174743.0000180d@Huawei.com \
--to=jonathan.cameron@huawei.com \
--cc=alison.schofield@intel.com \
--cc=dan.j.williams@intel.com \
--cc=dave.jiang@intel.com \
--cc=dave@stgolabs.net \
--cc=gregkh@linuxfoundation.org \
--cc=ira.weiny@intel.com \
--cc=linux-cxl@vger.kernel.org \
--cc=stable@vger.kernel.org \
--cc=vishal.l.verma@intel.com \
--cc=zijun_hu@icloud.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.