* Re: [PATCH] cxl/hdm: Fix dpa translation locking
2023-12-07 3:57 [PATCH] cxl/hdm: Fix dpa translation locking Dan Williams
@ 2023-12-07 15:40 ` kernel test robot
2023-12-07 15:46 ` Dave Jiang
` (3 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: kernel test robot @ 2023-12-07 15:40 UTC (permalink / raw)
To: Dan Williams, linux-cxl
Cc: oe-kbuild-all, stable, Alison Schofield, Jonathan Cameron,
Dave Jiang, Ira Weiny
Hi Dan,
kernel test robot noticed the following build warnings:
[auto build test WARNING on linus/master]
[also build test WARNING on v6.7-rc4 next-20231207]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Dan-Williams/cxl-hdm-Fix-dpa-translation-locking/20231207-115958
base: linus/master
patch link: https://lore.kernel.org/r/170192142664.461900.3169528633970716889.stgit%40dwillia2-xfh.jf.intel.com
patch subject: [PATCH] cxl/hdm: Fix dpa translation locking
config: i386-buildonly-randconfig-003-20231207 (https://download.01.org/0day-ci/archive/20231207/202312072350.QQTkSsY7-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231207/202312072350.QQTkSsY7-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202312072350.QQTkSsY7-lkp@intel.com/
All warnings (new ones prefixed by >>):
drivers/cxl/core/port.c: In function 'dpa_resource_show':
>> drivers/cxl/core/port.c:231:37: warning: format '%llx' expects argument of type 'long long unsigned int', but argument 3 has type 'resource_size_t' {aka 'unsigned int'} [-Wformat=]
231 | return sysfs_emit(buf, "%#llx\n", cxl_dpa_resource_start(cxled));
| ~~~~^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| | |
| | resource_size_t {aka unsigned int}
| long long unsigned int
| %#x
vim +231 drivers/cxl/core/port.c
224
225 static ssize_t dpa_resource_show(struct device *dev, struct device_attribute *attr,
226 char *buf)
227 {
228 struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(dev);
229
230 guard(rwsem_read)(&cxl_dpa_rwsem);
> 231 return sysfs_emit(buf, "%#llx\n", cxl_dpa_resource_start(cxled));
232 }
233 static DEVICE_ATTR_RO(dpa_resource);
234
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 6+ messages in thread* Re: [PATCH] cxl/hdm: Fix dpa translation locking
2023-12-07 3:57 [PATCH] cxl/hdm: Fix dpa translation locking Dan Williams
2023-12-07 15:40 ` kernel test robot
@ 2023-12-07 15:46 ` Dave Jiang
2023-12-07 16:13 ` kernel test robot
` (2 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Dave Jiang @ 2023-12-07 15:46 UTC (permalink / raw)
To: Dan Williams, linux-cxl
Cc: stable, Alison Schofield, Jonathan Cameron, Ira Weiny
On 12/6/23 20:57, Dan Williams wrote:
> The helper, cxl_dpa_resource_start(), snapshots the dpa-address of an
> endpoint-decoder after acquiring the cxl_dpa_rwsem. However, it is
> sufficient to assert that cxl_dpa_rwsem is held rather than acquire it
> in the helper. Otherwise, it triggers multiple lockdep reports:
>
> 1/ Tracing callbacks are in an atomic context that can not acquire sleeping
> locks:
>
> BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1525
> in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1288, name: bash
> preempt_count: 2, expected: 0
> RCU nest depth: 0, expected: 0
> [..]
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-20230524-3.fc38 05/24/2023
> Call Trace:
> <TASK>
> dump_stack_lvl+0x71/0x90
> __might_resched+0x1b2/0x2c0
> down_read+0x1a/0x190
> cxl_dpa_resource_start+0x15/0x50 [cxl_core]
> cxl_trace_hpa+0x122/0x300 [cxl_core]
> trace_event_raw_event_cxl_poison+0x1c9/0x2d0 [cxl_core]
>
> 2/ The rwsem is already held in the inject poison path:
>
> WARNING: possible recursive locking detected
> 6.7.0-rc2+ #12 Tainted: G W OE N
> --------------------------------------------
> bash/1288 is trying to acquire lock:
> ffffffffc05f73d0 (cxl_dpa_rwsem){++++}-{3:3}, at: cxl_dpa_resource_start+0x15/0x50 [cxl_core]
>
> but task is already holding lock:
> ffffffffc05f73d0 (cxl_dpa_rwsem){++++}-{3:3}, at: cxl_inject_poison+0x7d/0x1e0 [cxl_core]
> [..]
> Call Trace:
> <TASK>
> dump_stack_lvl+0x71/0x90
> __might_resched+0x1b2/0x2c0
> down_read+0x1a/0x190
> cxl_dpa_resource_start+0x15/0x50 [cxl_core]
> cxl_trace_hpa+0x122/0x300 [cxl_core]
> trace_event_raw_event_cxl_poison+0x1c9/0x2d0 [cxl_core]
> __traceiter_cxl_poison+0x5c/0x80 [cxl_core]
> cxl_inject_poison+0x1bc/0x1e0 [cxl_core]
>
> This appears to have been an issue since the initial implementation and
> uncovered by the new cxl-poison.sh test [1]. That test is now passing with
> these changes.
>
> Fixes: 28a3ae4ff66c ("cxl/trace: Add an HPA to cxl_poison trace events")
> Link: http://lore.kernel.org/r/e4f2716646918135ddbadf4146e92abb659de734.1700615159.git.alison.schofield@intel.com [1]
> Cc: <stable@vger.kernel.org>
> Cc: Alison Schofield <alison.schofield@intel.com>
> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Cc: Dave Jiang <dave.jiang@intel.com>
> Cc: Ira Weiny <ira.weiny@intel.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> ---
> drivers/cxl/core/hdm.c | 3 +--
> drivers/cxl/core/port.c | 4 ++--
> 2 files changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> index 529baa8a1759..7d97790b893d 100644
> --- a/drivers/cxl/core/hdm.c
> +++ b/drivers/cxl/core/hdm.c
> @@ -363,10 +363,9 @@ resource_size_t cxl_dpa_resource_start(struct cxl_endpoint_decoder *cxled)
> {
> resource_size_t base = -1;
>
> - down_read(&cxl_dpa_rwsem);
> + lockdep_assert_held(&cxl_dpa_rwsem);
> if (cxled->dpa_res)
> base = cxled->dpa_res->start;
> - up_read(&cxl_dpa_rwsem);
>
> return base;
> }
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index 38441634e4c6..f6e9b2986a9a 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -226,9 +226,9 @@ static ssize_t dpa_resource_show(struct device *dev, struct device_attribute *at
> char *buf)
> {
> struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(dev);
> - u64 base = cxl_dpa_resource_start(cxled);
>
> - return sysfs_emit(buf, "%#llx\n", base);
> + guard(rwsem_read)(&cxl_dpa_rwsem);
> + return sysfs_emit(buf, "%#llx\n", cxl_dpa_resource_start(cxled));
> }
> static DEVICE_ATTR_RO(dpa_resource);
>
>
^ permalink raw reply [flat|nested] 6+ messages in thread* Re: [PATCH] cxl/hdm: Fix dpa translation locking
2023-12-07 3:57 [PATCH] cxl/hdm: Fix dpa translation locking Dan Williams
2023-12-07 15:40 ` kernel test robot
2023-12-07 15:46 ` Dave Jiang
@ 2023-12-07 16:13 ` kernel test robot
2023-12-19 16:49 ` Jonathan Cameron
2023-12-19 18:14 ` fan
4 siblings, 0 replies; 6+ messages in thread
From: kernel test robot @ 2023-12-07 16:13 UTC (permalink / raw)
To: Dan Williams, linux-cxl
Cc: llvm, oe-kbuild-all, stable, Alison Schofield, Jonathan Cameron,
Dave Jiang, Ira Weiny
Hi Dan,
kernel test robot noticed the following build warnings:
[auto build test WARNING on linus/master]
[also build test WARNING on v6.7-rc4 next-20231207]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Dan-Williams/cxl-hdm-Fix-dpa-translation-locking/20231207-115958
base: linus/master
patch link: https://lore.kernel.org/r/170192142664.461900.3169528633970716889.stgit%40dwillia2-xfh.jf.intel.com
patch subject: [PATCH] cxl/hdm: Fix dpa translation locking
config: i386-allmodconfig (https://download.01.org/0day-ci/archive/20231208/202312080021.eQEtUpnB-lkp@intel.com/config)
compiler: clang version 16.0.4 (https://github.com/llvm/llvm-project.git ae42196bc493ffe877a7e3dff8be32035dea4d07)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231208/202312080021.eQEtUpnB-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202312080021.eQEtUpnB-lkp@intel.com/
All warnings (new ones prefixed by >>):
>> drivers/cxl/core/port.c:231:36: warning: format specifies type 'unsigned long long' but the argument has type 'resource_size_t' (aka 'unsigned int') [-Wformat]
return sysfs_emit(buf, "%#llx\n", cxl_dpa_resource_start(cxled));
~~~~~ ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
%#x
1 warning generated.
vim +231 drivers/cxl/core/port.c
224
225 static ssize_t dpa_resource_show(struct device *dev, struct device_attribute *attr,
226 char *buf)
227 {
228 struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(dev);
229
230 guard(rwsem_read)(&cxl_dpa_rwsem);
> 231 return sysfs_emit(buf, "%#llx\n", cxl_dpa_resource_start(cxled));
232 }
233 static DEVICE_ATTR_RO(dpa_resource);
234
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 6+ messages in thread* Re: [PATCH] cxl/hdm: Fix dpa translation locking
2023-12-07 3:57 [PATCH] cxl/hdm: Fix dpa translation locking Dan Williams
` (2 preceding siblings ...)
2023-12-07 16:13 ` kernel test robot
@ 2023-12-19 16:49 ` Jonathan Cameron
2023-12-19 18:14 ` fan
4 siblings, 0 replies; 6+ messages in thread
From: Jonathan Cameron @ 2023-12-19 16:49 UTC (permalink / raw)
To: Dan Williams; +Cc: linux-cxl, stable, Alison Schofield, Dave Jiang, Ira Weiny
On Wed, 06 Dec 2023 19:57:06 -0800
Dan Williams <dan.j.williams@intel.com> wrote:
> The helper, cxl_dpa_resource_start(), snapshots the dpa-address of an
> endpoint-decoder after acquiring the cxl_dpa_rwsem. However, it is
> sufficient to assert that cxl_dpa_rwsem is held rather than acquire it
> in the helper. Otherwise, it triggers multiple lockdep reports:
>
> 1/ Tracing callbacks are in an atomic context that can not acquire sleeping
> locks:
>
> BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1525
> in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1288, name: bash
> preempt_count: 2, expected: 0
> RCU nest depth: 0, expected: 0
> [..]
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-20230524-3.fc38 05/24/2023
> Call Trace:
> <TASK>
> dump_stack_lvl+0x71/0x90
> __might_resched+0x1b2/0x2c0
> down_read+0x1a/0x190
> cxl_dpa_resource_start+0x15/0x50 [cxl_core]
> cxl_trace_hpa+0x122/0x300 [cxl_core]
> trace_event_raw_event_cxl_poison+0x1c9/0x2d0 [cxl_core]
>
> 2/ The rwsem is already held in the inject poison path:
>
> WARNING: possible recursive locking detected
> 6.7.0-rc2+ #12 Tainted: G W OE N
> --------------------------------------------
> bash/1288 is trying to acquire lock:
> ffffffffc05f73d0 (cxl_dpa_rwsem){++++}-{3:3}, at: cxl_dpa_resource_start+0x15/0x50 [cxl_core]
>
> but task is already holding lock:
> ffffffffc05f73d0 (cxl_dpa_rwsem){++++}-{3:3}, at: cxl_inject_poison+0x7d/0x1e0 [cxl_core]
> [..]
> Call Trace:
> <TASK>
> dump_stack_lvl+0x71/0x90
> __might_resched+0x1b2/0x2c0
> down_read+0x1a/0x190
> cxl_dpa_resource_start+0x15/0x50 [cxl_core]
> cxl_trace_hpa+0x122/0x300 [cxl_core]
> trace_event_raw_event_cxl_poison+0x1c9/0x2d0 [cxl_core]
> __traceiter_cxl_poison+0x5c/0x80 [cxl_core]
> cxl_inject_poison+0x1bc/0x1e0 [cxl_core]
>
> This appears to have been an issue since the initial implementation and
> uncovered by the new cxl-poison.sh test [1]. That test is now passing with
> these changes.
>
> Fixes: 28a3ae4ff66c ("cxl/trace: Add an HPA to cxl_poison trace events")
> Link: http://lore.kernel.org/r/e4f2716646918135ddbadf4146e92abb659de734.1700615159.git.alison.schofield@intel.com [1]
> Cc: <stable@vger.kernel.org>
> Cc: Alison Schofield <alison.schofield@intel.com>
> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Cc: Dave Jiang <dave.jiang@intel.com>
> Cc: Ira Weiny <ira.weiny@intel.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Seems good other than %#pa being more appropriate for the printk.
I'm guessing you already tidied that up though.
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> ---
> drivers/cxl/core/hdm.c | 3 +--
> drivers/cxl/core/port.c | 4 ++--
> 2 files changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> index 529baa8a1759..7d97790b893d 100644
> --- a/drivers/cxl/core/hdm.c
> +++ b/drivers/cxl/core/hdm.c
> @@ -363,10 +363,9 @@ resource_size_t cxl_dpa_resource_start(struct cxl_endpoint_decoder *cxled)
> {
> resource_size_t base = -1;
>
> - down_read(&cxl_dpa_rwsem);
> + lockdep_assert_held(&cxl_dpa_rwsem);
> if (cxled->dpa_res)
> base = cxled->dpa_res->start;
> - up_read(&cxl_dpa_rwsem);
>
> return base;
> }
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index 38441634e4c6..f6e9b2986a9a 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -226,9 +226,9 @@ static ssize_t dpa_resource_show(struct device *dev, struct device_attribute *at
> char *buf)
> {
> struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(dev);
> - u64 base = cxl_dpa_resource_start(cxled);
>
> - return sysfs_emit(buf, "%#llx\n", base);
> + guard(rwsem_read)(&cxl_dpa_rwsem);
> + return sysfs_emit(buf, "%#llx\n", cxl_dpa_resource_start(cxled));
> }
> static DEVICE_ATTR_RO(dpa_resource);
>
>
^ permalink raw reply [flat|nested] 6+ messages in thread* Re: [PATCH] cxl/hdm: Fix dpa translation locking
2023-12-07 3:57 [PATCH] cxl/hdm: Fix dpa translation locking Dan Williams
` (3 preceding siblings ...)
2023-12-19 16:49 ` Jonathan Cameron
@ 2023-12-19 18:14 ` fan
4 siblings, 0 replies; 6+ messages in thread
From: fan @ 2023-12-19 18:14 UTC (permalink / raw)
To: Dan Williams
Cc: linux-cxl, stable, Alison Schofield, Jonathan Cameron, Dave Jiang,
Ira Weiny
On Wed, Dec 06, 2023 at 07:57:06PM -0800, Dan Williams wrote:
> The helper, cxl_dpa_resource_start(), snapshots the dpa-address of an
> endpoint-decoder after acquiring the cxl_dpa_rwsem. However, it is
> sufficient to assert that cxl_dpa_rwsem is held rather than acquire it
> in the helper. Otherwise, it triggers multiple lockdep reports:
>
> 1/ Tracing callbacks are in an atomic context that can not acquire sleeping
> locks:
>
> BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1525
> in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1288, name: bash
> preempt_count: 2, expected: 0
> RCU nest depth: 0, expected: 0
> [..]
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-20230524-3.fc38 05/24/2023
> Call Trace:
> <TASK>
> dump_stack_lvl+0x71/0x90
> __might_resched+0x1b2/0x2c0
> down_read+0x1a/0x190
> cxl_dpa_resource_start+0x15/0x50 [cxl_core]
> cxl_trace_hpa+0x122/0x300 [cxl_core]
> trace_event_raw_event_cxl_poison+0x1c9/0x2d0 [cxl_core]
>
> 2/ The rwsem is already held in the inject poison path:
>
> WARNING: possible recursive locking detected
> 6.7.0-rc2+ #12 Tainted: G W OE N
> --------------------------------------------
> bash/1288 is trying to acquire lock:
> ffffffffc05f73d0 (cxl_dpa_rwsem){++++}-{3:3}, at: cxl_dpa_resource_start+0x15/0x50 [cxl_core]
>
> but task is already holding lock:
> ffffffffc05f73d0 (cxl_dpa_rwsem){++++}-{3:3}, at: cxl_inject_poison+0x7d/0x1e0 [cxl_core]
> [..]
> Call Trace:
> <TASK>
> dump_stack_lvl+0x71/0x90
> __might_resched+0x1b2/0x2c0
> down_read+0x1a/0x190
> cxl_dpa_resource_start+0x15/0x50 [cxl_core]
> cxl_trace_hpa+0x122/0x300 [cxl_core]
> trace_event_raw_event_cxl_poison+0x1c9/0x2d0 [cxl_core]
> __traceiter_cxl_poison+0x5c/0x80 [cxl_core]
> cxl_inject_poison+0x1bc/0x1e0 [cxl_core]
>
> This appears to have been an issue since the initial implementation and
> uncovered by the new cxl-poison.sh test [1]. That test is now passing with
> these changes.
>
> Fixes: 28a3ae4ff66c ("cxl/trace: Add an HPA to cxl_poison trace events")
> Link: http://lore.kernel.org/r/e4f2716646918135ddbadf4146e92abb659de734.1700615159.git.alison.schofield@intel.com [1]
> Cc: <stable@vger.kernel.org>
> Cc: Alison Schofield <alison.schofield@intel.com>
> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Cc: Dave Jiang <dave.jiang@intel.com>
> Cc: Ira Weiny <ira.weiny@intel.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
Reviewed-by: Fan Ni <fan.ni@samsung.com>
> drivers/cxl/core/hdm.c | 3 +--
> drivers/cxl/core/port.c | 4 ++--
> 2 files changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c
> index 529baa8a1759..7d97790b893d 100644
> --- a/drivers/cxl/core/hdm.c
> +++ b/drivers/cxl/core/hdm.c
> @@ -363,10 +363,9 @@ resource_size_t cxl_dpa_resource_start(struct cxl_endpoint_decoder *cxled)
> {
> resource_size_t base = -1;
>
> - down_read(&cxl_dpa_rwsem);
> + lockdep_assert_held(&cxl_dpa_rwsem);
> if (cxled->dpa_res)
> base = cxled->dpa_res->start;
> - up_read(&cxl_dpa_rwsem);
>
> return base;
> }
> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c
> index 38441634e4c6..f6e9b2986a9a 100644
> --- a/drivers/cxl/core/port.c
> +++ b/drivers/cxl/core/port.c
> @@ -226,9 +226,9 @@ static ssize_t dpa_resource_show(struct device *dev, struct device_attribute *at
> char *buf)
> {
> struct cxl_endpoint_decoder *cxled = to_cxl_endpoint_decoder(dev);
> - u64 base = cxl_dpa_resource_start(cxled);
>
> - return sysfs_emit(buf, "%#llx\n", base);
> + guard(rwsem_read)(&cxl_dpa_rwsem);
> + return sysfs_emit(buf, "%#llx\n", cxl_dpa_resource_start(cxled));
> }
> static DEVICE_ATTR_RO(dpa_resource);
>
>
^ permalink raw reply [flat|nested] 6+ messages in thread