public inbox for linux-cxl@vger.kernel.org
 help / color / mirror / Atom feed
From: Zijun Hu <zijun_hu@icloud.com>
To: Dan Williams <dan.j.williams@intel.com>,
	dave.jiang@intel.com, ira.weiny@intel.com
Cc: stable@vger.kernel.org,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Davidlohr Bueso <dave@stgolabs.net>,
	Jonathan Cameron <jonathan.cameron@huawei.com>,
	Alison Schofield <alison.schofield@intel.com>,
	vishal.l.verma@intel.com, linux-cxl@vger.kernel.org
Subject: Re: [PATCH 4/5] cxl/port: Fix use-after-free, permit out-of-order decoder shutdown
Date: Fri, 11 Oct 2024 19:50:11 +0800	[thread overview]
Message-ID: <a7b8a007-907a-4fda-9a35-68faed109ed3@icloud.com> (raw)
In-Reply-To: <172862486548.2150669.3548553804904171839.stgit@dwillia2-xfh.jf.intel.com>

On 2024/10/11 13:34, Dan Williams wrote:
> In support of investigating an initialization failure report [1],
> cxl_test was updated to register mock memory-devices after the mock
> root-port/bus device had been registered. That led to cxl_test crashing
> with a use-after-free bug with the following signature:
> 
>     cxl_port_attach_region: cxl region3: cxl_host_bridge.0:port3 decoder3.0 add: mem0:decoder7.0 @ 0 next: cxl_switch_uport.0 nr_eps: 1 nr_targets: 1
>     cxl_port_attach_region: cxl region3: cxl_host_bridge.0:port3 decoder3.0 add: mem4:decoder14.0 @ 1 next: cxl_switch_uport.0 nr_eps: 2 nr_targets: 1
>     cxl_port_setup_targets: cxl region3: cxl_switch_uport.0:port6 target[0] = cxl_switch_dport.0 for mem0:decoder7.0 @ 0
> 1)  cxl_port_setup_targets: cxl region3: cxl_switch_uport.0:port6 target[1] = cxl_switch_dport.4 for mem4:decoder14.0 @ 1
>     [..]
>     cxld_unregister: cxl decoder14.0:
>     cxl_region_decode_reset: cxl_region region3:
>     mock_decoder_reset: cxl_port port3: decoder3.0 reset
> 2)  mock_decoder_reset: cxl_port port3: decoder3.0: out of order reset, expected decoder3.1
>     cxl_endpoint_decoder_release: cxl decoder14.0:
>     [..]
>     cxld_unregister: cxl decoder7.0:
> 3)  cxl_region_decode_reset: cxl_region region3:
>     Oops: general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6bc3: 0000 [#1] PREEMPT SMP PTI
>     [..]
>     RIP: 0010:to_cxl_port+0x8/0x60 [cxl_core]
>     [..]
>     Call Trace:
>      <TASK>
>      cxl_region_decode_reset+0x69/0x190 [cxl_core]
>      cxl_region_detach+0xe8/0x210 [cxl_core]
>      cxl_decoder_kill_region+0x27/0x40 [cxl_core]
>      cxld_unregister+0x5d/0x60 [cxl_core]
> 
> At 1) a region has been established with 2 endpoint decoders (7.0 and
> 14.0). Those endpoints share a common switch-decoder in the topology
> (3.0). At teardown, 2), decoder14.0 is the first to be removed and hits
> the "out of order reset case" in the switch decoder. The effect though
> is that region3 cleanup is aborted leaving it in-tact and
> referencing decoder14.0. At 3) the second attempt to teardown region3
> trips over the stale decoder14.0 object which has long since been
> deleted.
> 
> The fix here is to recognize that the CXL specification places no
> mandate on in-order shutdown of switch-decoders, the driver enforces
> in-order allocation, and hardware enforces in-order commit. So, rather
> than fail and leave objects dangling, always remove them.
> 
> In support of making cxl_region_decode_reset() always succeed,
> cxl_region_invalidate_memregion() failures are turned into warnings.
> Crashing the kernel is ok there since system integrity is at risk if
> caches cannot be managed around physical address mutation events like
> CXL region destruction.
> 
> A new device_for_each_child_reverse_from() is added to cleanup
> port->commit_end after all dependent decoders have been disabled. In
> other words if decoders are allocated 0->1->2 and disabled 1->2->0 then
> port->commit_end only decrements from 2 after 2 has been disabled, and
> it decrements all the way to zero since 1 was disabled previously.
> 
> Link: http://lore.kernel.org/20241004212504.1246-1-gourry@gourry.net [1]
> Cc: <stable@vger.kernel.org>
> Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware")
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Davidlohr Bueso <dave@stgolabs.net>
> Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
> Cc: Dave Jiang <dave.jiang@intel.com>
> Cc: Alison Schofield <alison.schofield@intel.com>
> Cc: Ira Weiny <ira.weiny@intel.com>
> Cc: Zijun Hu <zijun_hu@icloud.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
>  drivers/base/core.c          |   35 +++++++++++++++++++++++++++++
>  drivers/cxl/core/hdm.c       |   50 +++++++++++++++++++++++++++++++++++-------
>  drivers/cxl/core/region.c    |   48 +++++++++++-----------------------------
>  drivers/cxl/cxl.h            |    3 ++-
>  include/linux/device.h       |    3 +++
>  tools/testing/cxl/test/cxl.c |   14 ++++--------
>  6 files changed, 100 insertions(+), 53 deletions(-)
> 
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index a4c853411a6b..e42f1ad73078 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -4037,6 +4037,41 @@ int device_for_each_child_reverse(struct device *parent, void *data,
>  }
>  EXPORT_SYMBOL_GPL(device_for_each_child_reverse);
>  
> +/**
> + * device_for_each_child_reverse_from - device child iterator in reversed order.
> + * @parent: parent struct device.
> + * @from: optional starting point in child list
> + * @fn: function to be called for each device.
> + * @data: data for the callback.
> + *
> + * Iterate over @parent's child devices, starting at @from, and call @fn
> + * for each, passing it @data. This helper is identical to
> + * device_for_each_child_reverse() when @from is NULL.
> + *
> + * @fn is checked each iteration. If it returns anything other than 0,
> + * iteration stop and that value is returned to the caller of
> + * device_for_each_child_reverse_from();
> + */
> +int device_for_each_child_reverse_from(struct device *parent,
> +				       struct device *from, const void *data,
> +				       int (*fn)(struct device *, const void *))
> +{
> +	struct klist_iter i;
> +	struct device *child;
> +	int error = 0;
> +
> +	if (!parent->p)
> +		return 0;
> +
> +	klist_iter_init_node(&parent->p->klist_children, &i,
> +			     (from ? &from->p->knode_parent : NULL));
> +	while ((child = prev_device(&i)) && !error)
> +		error = fn(child, data);
> +	klist_iter_exit(&i);
> +	return error;
> +}
> +EXPORT_SYMBOL_GPL(device_for_each_child_reverse_from);
> +

it does NOT deserve, also does NOT need to introduce a new core driver
API device_for_each_child_reverse_from(). existing
device_for_each_child_reverse() can do what the _from() wants to do.

we can use similar approach as below link shown:
https://lore.kernel.org/all/20240815-const_dfc_prepare-v2-2-8316b87b8ff9@quicinc.com/

>  /**
>   * device_find_child - device iterator for locating a particular device.
>   * @parent: parent struct device
> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> index 3df10517a327..223c273c0cd1 100644
> --- a/drivers/cxl/core/hdm.c
> +++ b/drivers/cxl/core/hdm.c
> @@ -712,7 +712,44 @@ static int cxl_decoder_commit(struct cxl_decoder *cxld)
>  	return 0;
>  }
>  
> -static int cxl_decoder_reset(struct cxl_decoder *cxld)
> +static int commit_reap(struct device *dev, const void *data)
> +{
> +	struct cxl_port *port = to_cxl_port(dev->parent);
> +	struct cxl_decoder *cxld;
> +
> +	if (!is_switch_decoder(dev) && !is_endpoint_decoder(dev))
> +		return 0;
> +
> +	cxld = to_cxl_decoder(dev);
> +	if (port->commit_end == cxld->id &&
> +	    ((cxld->flags & CXL_DECODER_F_ENABLE) == 0)) {
> +		port->commit_end--;
> +		dev_dbg(&port->dev, "reap: %s commit_end: %d\n",
> +			dev_name(&cxld->dev), port->commit_end);
> +	}
> +
> +	return 0;
> +}
> +
> +void cxl_port_commit_reap(struct cxl_decoder *cxld)
> +{
> +	struct cxl_port *port = to_cxl_port(cxld->dev.parent);
> +
> +	lockdep_assert_held_write(&cxl_region_rwsem);
> +
> +	/*
> +	 * Once the highest committed decoder is disabled, free any other
> +	 * decoders that were pinned allocated by out-of-order release.
> +	 */
> +	port->commit_end--;
> +	dev_dbg(&port->dev, "reap: %s commit_end: %d\n", dev_name(&cxld->dev),
> +		port->commit_end);
> +	device_for_each_child_reverse_from(&port->dev, &cxld->dev, NULL,
> +					   commit_reap);
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_port_commit_reap, CXL);
> +
> +static void cxl_decoder_reset(struct cxl_decoder *cxld)
>  {
>  	struct cxl_port *port = to_cxl_port(cxld->dev.parent);
>  	struct cxl_hdm *cxlhdm = dev_get_drvdata(&port->dev);
> @@ -721,14 +758,14 @@ static int cxl_decoder_reset(struct cxl_decoder *cxld)
>  	u32 ctrl;
>  
>  	if ((cxld->flags & CXL_DECODER_F_ENABLE) == 0)
> -		return 0;
> +		return;
>  
> -	if (port->commit_end != id) {
> +	if (port->commit_end == id)
> +		cxl_port_commit_reap(cxld);
> +	else
>  		dev_dbg(&port->dev,
>  			"%s: out of order reset, expected decoder%d.%d\n",
>  			dev_name(&cxld->dev), port->id, port->commit_end);
> -		return -EBUSY;
> -	}
>  
>  	down_read(&cxl_dpa_rwsem);
>  	ctrl = readl(hdm + CXL_HDM_DECODER0_CTRL_OFFSET(id));
> @@ -741,7 +778,6 @@ static int cxl_decoder_reset(struct cxl_decoder *cxld)
>  	writel(0, hdm + CXL_HDM_DECODER0_BASE_LOW_OFFSET(id));
>  	up_read(&cxl_dpa_rwsem);
>  
> -	port->commit_end--;
>  	cxld->flags &= ~CXL_DECODER_F_ENABLE;
>  
>  	/* Userspace is now responsible for reconfiguring this decoder */
> @@ -751,8 +787,6 @@ static int cxl_decoder_reset(struct cxl_decoder *cxld)
>  		cxled = to_cxl_endpoint_decoder(&cxld->dev);
>  		cxled->state = CXL_DECODER_STATE_MANUAL;
>  	}
> -
> -	return 0;
>  }
>  
>  static int cxl_setup_hdm_decoder_from_dvsec(
> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c
> index e701e4b04032..3478d2058303 100644
> --- a/drivers/cxl/core/region.c
> +++ b/drivers/cxl/core/region.c
> @@ -232,8 +232,8 @@ static int cxl_region_invalidate_memregion(struct cxl_region *cxlr)
>  				"Bypassing cpu_cache_invalidate_memregion() for testing!\n");
>  			return 0;
>  		} else {
> -			dev_err(&cxlr->dev,
> -				"Failed to synchronize CPU cache state\n");
> +			dev_WARN(&cxlr->dev,
> +				 "Failed to synchronize CPU cache state\n");
>  			return -ENXIO;
>  		}
>  	}
> @@ -242,19 +242,17 @@ static int cxl_region_invalidate_memregion(struct cxl_region *cxlr)
>  	return 0;
>  }
>  
> -static int cxl_region_decode_reset(struct cxl_region *cxlr, int count)
> +static void cxl_region_decode_reset(struct cxl_region *cxlr, int count)
>  {
>  	struct cxl_region_params *p = &cxlr->params;
> -	int i, rc = 0;
> +	int i;
>  
>  	/*
> -	 * Before region teardown attempt to flush, and if the flush
> -	 * fails cancel the region teardown for data consistency
> -	 * concerns
> +	 * Before region teardown attempt to flush, evict any data cached for
> +	 * this region, or scream loudly about missing arch / platform support
> +	 * for CXL teardown.
>  	 */
> -	rc = cxl_region_invalidate_memregion(cxlr);
> -	if (rc)
> -		return rc;
> +	cxl_region_invalidate_memregion(cxlr);
>  
>  	for (i = count - 1; i >= 0; i--) {
>  		struct cxl_endpoint_decoder *cxled = p->targets[i];
> @@ -277,23 +275,17 @@ static int cxl_region_decode_reset(struct cxl_region *cxlr, int count)
>  			cxl_rr = cxl_rr_load(iter, cxlr);
>  			cxld = cxl_rr->decoder;
>  			if (cxld->reset)
> -				rc = cxld->reset(cxld);
> -			if (rc)
> -				return rc;
> +				cxld->reset(cxld);
>  			set_bit(CXL_REGION_F_NEEDS_RESET, &cxlr->flags);
>  		}
>  
>  endpoint_reset:
> -		rc = cxled->cxld.reset(&cxled->cxld);
> -		if (rc)
> -			return rc;
> +		cxled->cxld.reset(&cxled->cxld);
>  		set_bit(CXL_REGION_F_NEEDS_RESET, &cxlr->flags);
>  	}
>  
>  	/* all decoders associated with this region have been torn down */
>  	clear_bit(CXL_REGION_F_NEEDS_RESET, &cxlr->flags);
> -
> -	return 0;
>  }
>  
>  static int commit_decoder(struct cxl_decoder *cxld)
> @@ -409,16 +401,8 @@ static ssize_t commit_store(struct device *dev, struct device_attribute *attr,
>  		 * still pending.
>  		 */
>  		if (p->state == CXL_CONFIG_RESET_PENDING) {
> -			rc = cxl_region_decode_reset(cxlr, p->interleave_ways);
> -			/*
> -			 * Revert to committed since there may still be active
> -			 * decoders associated with this region, or move forward
> -			 * to active to mark the reset successful
> -			 */
> -			if (rc)
> -				p->state = CXL_CONFIG_COMMIT;
> -			else
> -				p->state = CXL_CONFIG_ACTIVE;
> +			cxl_region_decode_reset(cxlr, p->interleave_ways);
> +			p->state = CXL_CONFIG_ACTIVE;
>  		}
>  	}
>  
> @@ -2054,13 +2038,7 @@ static int cxl_region_detach(struct cxl_endpoint_decoder *cxled)
>  	get_device(&cxlr->dev);
>  
>  	if (p->state > CXL_CONFIG_ACTIVE) {
> -		/*
> -		 * TODO: tear down all impacted regions if a device is
> -		 * removed out of order
> -		 */
> -		rc = cxl_region_decode_reset(cxlr, p->interleave_ways);
> -		if (rc)
> -			goto out;
> +		cxl_region_decode_reset(cxlr, p->interleave_ways);
>  		p->state = CXL_CONFIG_ACTIVE;
>  	}
>  
> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h
> index 0d8b810a51f0..5406e3ab3d4a 100644
> --- a/drivers/cxl/cxl.h
> +++ b/drivers/cxl/cxl.h
> @@ -359,7 +359,7 @@ struct cxl_decoder {
>  	struct cxl_region *region;
>  	unsigned long flags;
>  	int (*commit)(struct cxl_decoder *cxld);
> -	int (*reset)(struct cxl_decoder *cxld);
> +	void (*reset)(struct cxl_decoder *cxld);
>  };
>  
>  /*
> @@ -730,6 +730,7 @@ static inline bool is_cxl_root(struct cxl_port *port)
>  int cxl_num_decoders_committed(struct cxl_port *port);
>  bool is_cxl_port(const struct device *dev);
>  struct cxl_port *to_cxl_port(const struct device *dev);
> +void cxl_port_commit_reap(struct cxl_decoder *cxld);
>  struct pci_bus;
>  int devm_cxl_register_pci_bus(struct device *host, struct device *uport_dev,
>  			      struct pci_bus *bus);
> diff --git a/include/linux/device.h b/include/linux/device.h
> index b4bde8d22697..667cb6db9019 100644
> --- a/include/linux/device.h
> +++ b/include/linux/device.h
> @@ -1078,6 +1078,9 @@ int device_for_each_child(struct device *dev, void *data,
>  			  int (*fn)(struct device *dev, void *data));
>  int device_for_each_child_reverse(struct device *dev, void *data,
>  				  int (*fn)(struct device *dev, void *data));
> +int device_for_each_child_reverse_from(struct device *parent,
> +				       struct device *from, const void *data,
> +				       int (*fn)(struct device *, const void *));
>  struct device *device_find_child(struct device *dev, void *data,
>  				 int (*match)(struct device *dev, void *data));
>  struct device *device_find_child_by_name(struct device *parent,
> diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c
> index 90d5afd52dd0..c5bbd89b3192 100644
> --- a/tools/testing/cxl/test/cxl.c
> +++ b/tools/testing/cxl/test/cxl.c
> @@ -693,26 +693,22 @@ static int mock_decoder_commit(struct cxl_decoder *cxld)
>  	return 0;
>  }
>  
> -static int mock_decoder_reset(struct cxl_decoder *cxld)
> +static void mock_decoder_reset(struct cxl_decoder *cxld)
>  {
>  	struct cxl_port *port = to_cxl_port(cxld->dev.parent);
>  	int id = cxld->id;
>  
>  	if ((cxld->flags & CXL_DECODER_F_ENABLE) == 0)
> -		return 0;
> +		return;
>  
>  	dev_dbg(&port->dev, "%s reset\n", dev_name(&cxld->dev));
> -	if (port->commit_end != id) {
> +	if (port->commit_end == id)
> +		cxl_port_commit_reap(cxld);
> +	else
>  		dev_dbg(&port->dev,
>  			"%s: out of order reset, expected decoder%d.%d\n",
>  			dev_name(&cxld->dev), port->id, port->commit_end);
> -		return -EBUSY;
> -	}
> -
> -	port->commit_end--;
>  	cxld->flags &= ~CXL_DECODER_F_ENABLE;
> -
> -	return 0;
>  }
>  
>  static void default_mock_decoder(struct cxl_decoder *cxld)
> 


  reply	other threads:[~2024-10-11 11:50 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-10-11  5:33 [PATCH 0/5] cxl: Initialization and shutdown fixes Dan Williams
2024-10-11  5:34 ` [PATCH 1/5] cxl/port: Fix CXL port initialization order when the subsystem is built-in Dan Williams
2024-10-14 11:33   ` Jonathan Cameron
2024-10-11  5:34 ` [PATCH 2/5] cxl/port: Fix cxl_bus_rescan() vs bus_rescan_devices() Dan Williams
2024-10-11 12:27   ` Lk Sii
2024-10-11 17:52     ` Dan Williams
2024-10-15 16:36   ` Jonathan Cameron
2024-10-15 17:57     ` Dan Williams
2024-10-16 14:51       ` Jonathan Cameron
2024-10-23  0:33         ` Dan Williams
2024-10-11  5:34 ` [PATCH 3/5] cxl/acpi: Ensure ports ready at cxl_acpi_probe() return Dan Williams
2024-10-11  5:34 ` [PATCH 4/5] cxl/port: Fix use-after-free, permit out-of-order decoder shutdown Dan Williams
2024-10-11 11:50   ` Zijun Hu [this message]
2024-10-11 17:46     ` Dan Williams
2024-10-11 23:40       ` Zijun Hu
2024-10-12 17:56         ` Gregory Price
2024-10-12 22:16         ` Dan Williams
2024-10-14  1:29           ` Zijun Hu
2024-10-14 19:32             ` Dan Williams
2024-10-15  0:02               ` Zijun Hu
2024-10-15  0:10                 ` Dan Williams
2024-10-15 16:47   ` Jonathan Cameron
2024-10-23  0:31     ` Dan Williams
2024-10-11  5:34 ` [PATCH 5/5] cxl/test: Improve init-order fidelity relative to real-world systems Dan Williams
2024-10-11 11:21 ` [PATCH 0/5] cxl: Initialization and shutdown fixes Alejandro Lucero Palau
2024-10-11 17:38   ` Dan Williams
2024-10-12  6:30     ` Alejandro Lucero Palau
2024-10-12 21:57       ` Dan Williams
2024-10-14 15:13         ` Alejandro Lucero Palau
2024-10-14 22:24           ` Dan Williams
2024-10-15  8:45             ` Alejandro Lucero Palau
2024-10-15 16:37               ` Dan Williams
2024-10-16 14:41                 ` Alejandro Lucero Palau
2024-10-23  0:46                   ` Dan Williams

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a7b8a007-907a-4fda-9a35-68faed109ed3@icloud.com \
    --to=zijun_hu@icloud.com \
    --cc=alison.schofield@intel.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.jiang@intel.com \
    --cc=dave@stgolabs.net \
    --cc=gregkh@linuxfoundation.org \
    --cc=ira.weiny@intel.com \
    --cc=jonathan.cameron@huawei.com \
    --cc=linux-cxl@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    --cc=vishal.l.verma@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox