From: fan <nifan.cxl@gmail.com>
To: Dan Williams <dan.j.williams@intel.com>
Cc: linux-cxl@vger.kernel.org, stable@vger.kernel.org,
	Alison Schofield <alison.schofield@intel.com>,
	Jonathan Cameron <Jonathan.Cameron@huawei.com>,
	Dave Jiang <dave.jiang@intel.com>,
	Ira Weiny <ira.weiny@intel.com>
Subject: Re: [PATCH] cxl/hdm: Fix dpa translation locking
Date: Tue, 19 Dec 2023 10:14:19 -0800
Message-ID: <ZYHde6aiBj5mmh7P@debian>
In-Reply-To: <170192142664.461900.3169528633970716889.stgit@dwillia2-xfh.jf.intel.com>

On Wed, Dec 06, 2023 at 07:57:06PM -0800, Dan Williams wrote:
> The helper, cxl_dpa_resource_start(), snapshots the dpa-address of an
> endpoint-decoder after acquiring the cxl_dpa_rwsem. However, it is
> sufficient to assert that cxl_dpa_rwsem is held rather than acquire it
> in the helper. Otherwise, it triggers multiple lockdep reports:
> 
> 1/ Tracing callbacks are in an atomic context that cannot acquire sleeping
> locks:
> 
>     BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1525
>     in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1288, name: bash
>     preempt_count: 2, expected: 0
>     RCU nest depth: 0, expected: 0
>     [..]
>     Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-20230524-3.fc38 05/24/2023
>     Call Trace:
>      <TASK>
>      dump_stack_lvl+0x71/0x90
>      __might_resched+0x1b2/0x2c0
>      down_read+0x1a/0x190
>      cxl_dpa_resource_start+0x15/0x50 [cxl_core]
>      cxl_trace_hpa+0x122/0x300 [cxl_core]
>      trace_event_raw_event_cxl_poison+0x1c9/0x2d0 [cxl_core]
> 
> 2/ The rwsem is already held in the inject poison path:
> 
>     WARNING: possible recursive locking detected
>     6.7.0-rc2+ #12 Tainted: G        W  OE    N
>     --------------------------------------------
>     bash/1288 is trying to acquire lock:
>     ffffffffc05f73d0 (cxl_dpa_rwsem){++++}-{3:3}, at: cxl_dpa_resource_start+0x15/0x50 [cxl_core]
> 
>     but task is already holding lock:
>     ffffffffc05f73d0 (cxl_dpa_rwsem){++++}-{3:3}, at: cxl_inject_poison+0x7d/0x1e0 [cxl_core]
>     [..]
>     Call Trace:
>      <TASK>
>      dump_stack_lvl+0x71/0x90
>      __might_resched+0x1b2/0x2c0
>      down_read+0x1a/0x190
>      cxl_dpa_resource_start+0x15/0x50 [cxl_core]
>      cxl_trace_hpa+0x122/0x300 [cxl_core]
>      trace_event_raw_event_cxl_poison+0x1c9/0x2d0 [cxl_core]
>      __traceiter_cxl_poison+0x5c/0x80 [cxl_core]
>      cxl_inject_poison+0x1bc/0x1e0 [cxl_core]
> 
> This appears to have been an issue since the initial implementation and was
> uncovered by the new cxl-poison.sh test [1]. That test is now passing with
> these changes.
> 
> Fixes: 28a3ae4ff66c ("cxl/trace: Add an HPA to cxl_poison trace events")
> Link: http://lore.kernel.org/r/e4f2716646918135ddbadf4146e92abb659de734.1700615159.git.alison.schofield@intel.com [1]
> Cc: <stable@vger.kernel.org>
> Cc: Alison Schofield <alison.schofield@intel.com>
> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Cc: Dave Jiang <dave.jiang@intel.com>
> Cc: Ira Weiny <ira.weiny@intel.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---

Reviewed-by: Fan Ni <fan.ni@samsung.com>

>  drivers/cxl/core/hdm.c  |    3 +--
>  drivers/cxl/core/port.c |    4 ++--
>  2 files changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> index 529baa8a1759..7d97790b893d 100644
> --- a/drivers/cxl/core/hdm.c
> +++ b/drivers/cxl/core/hdm.c
> @@ -363,10 +363,9 @@ resource_size_t cxl_dpa_resource_start(struct cxl_endpoint_decoder *cxled)
>  {
>  	resource_size_t base = -1;
>  
> -	down_read(&cxl_dpa_rwsem);
> +	lockdep_assert_held(&cxl_dpa_rwsem);
>  	if (cxled->dpa_res)
>  		base = cxled->dpa_res->start;
> -	up_read(&cxl_dpa_rwsem);
>  
>  	return base;
>  }
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index 38441634e4c6..f6e9b2986a9a 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -226,9 +226,9 @@ static ssize_t dpa_resource_show(struct device *dev, struct device_attribute *at
>  			    char *buf)
>  {
>  	struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(dev);
> -	u64 base = cxl_dpa_resource_start(cxled);
>  
> -	return sysfs_emit(buf, "%#llx\n", base);
> +	guard(rwsem_read)(&cxl_dpa_rwsem);
> +	return sysfs_emit(buf, "%#llx\n", cxl_dpa_resource_start(cxled));
>  }
>  static DEVICE_ATTR_RO(dpa_resource);
>  
> 
