* [RFC PATCH 1/1] cxl/pci: Fake bandwidth for platform devices
From: Li Ming @ 2024-11-22  7:20 UTC
To: linux-cxl, dave.jiang; +Cc: Li Ming

cxl-test environment always hits below call trace with KASAN
enabled

 BUG: KASAN: slab-out-of-bounds in pcie_capability_read_word+0x1df/0x220
 Call Trace:
  <TASK>
  dump_stack_lvl+0x82/0xd0
  print_report+0xcb/0x5d0
  kasan_report+0xbd/0xf0
  pcie_capability_read_word+0x1df/0x220
  pcie_link_speed_mbps+0x6a/0x130
  cxl_pci_get_bandwidth+0x68/0x1c0 [cxl_core]
  cxl_endpoint_gather_bandwidth.constprop.0+0x352/0x780 [cxl_core]
  cxl_region_shared_upstream_bandwidth_update+0x257/0x1640 [cxl_core]
  cxl_region_attach+0x1025/0x1e80 [cxl_core]
  cxl_add_to_region+0x121/0x14c0 [cxl_core]
  discover_region+0xa5/0x150 [cxl_port]

cxl-test environment creates cxl memory devices based on platform devices
rather than PCI devices for testing. but cxl_endpoint_gather_bandwidth()
always assumes the device is a PCI device, it will cause the issue in
cxl_pci_get_bandwidth().

The fixup is that faking a maximun bandwidth in cxl_pci_get_bandwidth()
for a platform device so that the cxl-test environment can be used to
validate the functionality of region bandwidth.

Fixes: a5ab0de0ebaa ("cxl: Calculate region bandwidth of targets with shared upstream link")
Signed-off-by: Li Ming <ming4.li@outlook.com>
---
 drivers/cxl/core/cdat.c |  5 ++---
 drivers/cxl/core/core.h |  2 +-
 drivers/cxl/core/pci.c  | 24 +++++++++++++++++-------
 3 files changed, 20 insertions(+), 11 deletions(-)

diff --git a/drivers/cxl/core/cdat.c b/drivers/cxl/core/cdat.c
index 438869df241a..c67ef0d1b1a6 100644
--- a/drivers/cxl/core/cdat.c
+++ b/drivers/cxl/core/cdat.c
@@ -634,7 +634,6 @@ static int cxl_endpoint_gather_bandwidth(struct cxl_region *cxlr,
 	struct access_coordinate ep_coord[ACCESS_COORDINATE_MAX];
 	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
 	struct cxl_dev_state *cxlds = cxlmd->cxlds;
-	struct pci_dev *pdev = to_pci_dev(cxlds->dev);
 	struct cxl_perf_ctx *perf_ctx;
 	struct cxl_dpa_perf *perf;
 	unsigned long index;
@@ -675,7 +674,7 @@ static int cxl_endpoint_gather_bandwidth(struct cxl_region *cxlr,
 	}

 	/* Direct upstream link from EP bandwidth */
-	rc = cxl_pci_get_bandwidth(pdev, pci_coord);
+	rc = cxl_pci_get_bandwidth(cxlds->dev, pci_coord);
 	if (rc < 0)
 		return rc;

@@ -809,7 +808,7 @@ static struct xarray *cxl_switch_gather_bandwidth(struct cxl_region *cxlr,
 		return ERR_PTR(-EINVAL);

 	/* Retrieve the upstream link bandwidth */
-	rc = cxl_pci_get_bandwidth(to_pci_dev(dev), coords);
+	rc = cxl_pci_get_bandwidth(dev, coords);
 	if (rc)
 		return ERR_PTR(-ENXIO);

diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
index 800466f96a68..1c16c06d201e 100644
--- a/drivers/cxl/core/core.h
+++ b/drivers/cxl/core/core.h
@@ -108,7 +108,7 @@ enum cxl_poison_trace_type {
 };

 long cxl_pci_get_latency(struct pci_dev *pdev);
-int cxl_pci_get_bandwidth(struct pci_dev *pdev, struct access_coordinate *c);
+int cxl_pci_get_bandwidth(struct device *dev, struct access_coordinate *c);
 int cxl_update_hmat_access_coordinates(int nid, struct cxl_region *cxlr,
 				       enum access_coordinate_class access);
 bool cxl_need_node_perf_attrs_update(int nid);
diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
index 5b46bc46aaa9..bd0448c3c9a8 100644
--- a/drivers/cxl/core/pci.c
+++ b/drivers/cxl/core/pci.c
@@ -7,6 +7,7 @@
 #include <linux/pci.h>
 #include <linux/pci-doe.h>
 #include <linux/aer.h>
+#include <linux/platform_device.h>
 #include <cxlpci.h>
 #include <cxlmem.h>
 #include <cxl.h>
@@ -1032,19 +1033,28 @@ bool cxl_endpoint_decoder_reset_detected(struct cxl_port *port)
 }
 EXPORT_SYMBOL_NS_GPL(cxl_endpoint_decoder_reset_detected, CXL);

-int cxl_pci_get_bandwidth(struct pci_dev *pdev, struct access_coordinate *c)
+int cxl_pci_get_bandwidth(struct device *dev, struct access_coordinate *c)
 {
 	int speed, bw;
 	u16 lnksta;
 	u32 width;

-	speed = pcie_link_speed_mbps(pdev);
-	if (speed < 0)
-		return speed;
-	speed /= BITS_PER_BYTE;
+	if (dev_is_platform(dev)) {
+		/* PCIE_SPEED_64_0GT as fake speed for platform device */
+		speed = 64000 / BITS_PER_BYTE;
+		/* PCI_EXP_LNKSTA_NLW_X8 as fake width for platform device */
+		width = FIELD_GET(PCI_EXP_LNKSTA_NLW, PCI_EXP_LNKSTA_NLW_X8);
+	} else {
+		struct pci_dev *pdev = to_pci_dev(dev);
+
+		speed = pcie_link_speed_mbps(pdev);
+		if (speed < 0)
+			return speed;
+		speed /= BITS_PER_BYTE;

-	pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnksta);
-	width = FIELD_GET(PCI_EXP_LNKSTA_NLW, lnksta);
+		pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnksta);
+		width = FIELD_GET(PCI_EXP_LNKSTA_NLW, lnksta);
+	}
 	bw = speed * width;

 	for (int i = 0; i < ACCESS_COORDINATE_MAX; i++) {
-- 
2.34.1
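
For readers following the arithmetic in the platform-device branch of the patch, a minimal userspace sketch is given here. It is not kernel code: the two PCI_EXP_LNKSTA_* values are copied from include/uapi/linux/pci_regs.h so the example builds standalone, and the helpers lnksta_width() and fake_platform_bw() are hypothetical names standing in for FIELD_GET() and the new branch of cxl_pci_get_bandwidth().

```c
#include <stdint.h>

/*
 * Values mirror include/uapi/linux/pci_regs.h. They are redefined here
 * only so that this sketch compiles outside the kernel tree.
 */
#define PCI_EXP_LNKSTA_NLW    0x03f0u /* Negotiated Link Width mask */
#define PCI_EXP_LNKSTA_NLW_X8 0x0080u /* encoded value for an x8 link */
#define NLW_SHIFT             4       /* lowest set bit of the mask */
#define BITS_PER_BYTE         8

/* Hypothetical stand-in for FIELD_GET(PCI_EXP_LNKSTA_NLW, lnksta). */
static unsigned int lnksta_width(uint16_t lnksta)
{
	return (lnksta & PCI_EXP_LNKSTA_NLW) >> NLW_SHIFT;
}

/* Hypothetical helper: the bandwidth the patch would fake, in MB/s. */
static int fake_platform_bw(void)
{
	/* 64000 Mb/s per lane, converted to MB/s per lane */
	int speed = 64000 / BITS_PER_BYTE;
	/* the x8 encoding decodes to a lane count of 8 */
	unsigned int width = lnksta_width(PCI_EXP_LNKSTA_NLW_X8);

	return speed * (int)width;
}
```

With these values the faked link works out to 8000 MB/s per lane times 8 lanes, that is 64000 MB/s, which is the figure the RFC would report for every cxl-test memdev.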
* Re: [RFC PATCH 1/1] cxl/pci: Fake bandwidth for platform devices
From: Jonathan Cameron @ 2024-11-26 15:03 UTC
To: Li Ming; +Cc: linux-cxl, dave.jiang

On Fri, 22 Nov 2024 15:20:01 +0800
Li Ming <ming4.li@outlook.com> wrote:

> cxl-test environment always hits below call trace with KASAN
> enabled
>
>  BUG: KASAN: slab-out-of-bounds in pcie_capability_read_word+0x1df/0x220
>  Call Trace:
>   <TASK>
>   dump_stack_lvl+0x82/0xd0
>   print_report+0xcb/0x5d0
>   kasan_report+0xbd/0xf0
>   pcie_capability_read_word+0x1df/0x220
>   pcie_link_speed_mbps+0x6a/0x130
>   cxl_pci_get_bandwidth+0x68/0x1c0 [cxl_core]
>   cxl_endpoint_gather_bandwidth.constprop.0+0x352/0x780 [cxl_core]
>   cxl_region_shared_upstream_bandwidth_update+0x257/0x1640 [cxl_core]
>   cxl_region_attach+0x1025/0x1e80 [cxl_core]
>   cxl_add_to_region+0x121/0x14c0 [cxl_core]
>   discover_region+0xa5/0x150 [cxl_port]
>
> cxl-test environment creates cxl memory devices based on platform devices
> rather than PCI devices for testing. but cxl_endpoint_gather_bandwidth()
> always assumes the device is a PCI device, it will cause the issue in
> cxl_pci_get_bandwidth().
>
> The fixup is that faking a maximun bandwidth in cxl_pci_get_bandwidth()

maximum

> for a platform device so that the cxl-test environment can be used to
> validate the functionality of region bandwidth.
>
> Fixes: a5ab0de0ebaa ("cxl: Calculate region bandwidth of targets with shared upstream link")
> Signed-off-by: Li Ming <ming4.li@outlook.com>
> ---
>  drivers/cxl/core/cdat.c |  5 ++---
>  drivers/cxl/core/core.h |  2 +-
>  drivers/cxl/core/pci.c  | 24 +++++++++++++++++-------
>  3 files changed, 20 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> index 5b46bc46aaa9..bd0448c3c9a8 100644
> --- a/drivers/cxl/core/pci.c
> +++ b/drivers/cxl/core/pci.c
> @@ -7,6 +7,7 @@
>  #include <linux/pci.h>
>  #include <linux/pci-doe.h>
>  #include <linux/aer.h>
> +#include <linux/platform_device.h>

Having a platform device include in a file called pci.c is a bit nasty..
Can we instead check if the device is not a pci one?

>  #include <cxlpci.h>
>  #include <cxlmem.h>
>  #include <cxl.h>
> @@ -1032,19 +1033,28 @@ bool cxl_endpoint_decoder_reset_detected(struct cxl_port *port)
>  }
>  EXPORT_SYMBOL_NS_GPL(cxl_endpoint_decoder_reset_detected, CXL);
>
> -int cxl_pci_get_bandwidth(struct pci_dev *pdev, struct access_coordinate *c)
> +int cxl_pci_get_bandwidth(struct device *dev, struct access_coordinate *c)
>  {
>  	int speed, bw;
>  	u16 lnksta;
>  	u32 width;
>
> -	speed = pcie_link_speed_mbps(pdev);
> -	if (speed < 0)
> -		return speed;
> -	speed /= BITS_PER_BYTE;
> +	if (dev_is_platform(dev)) {
> +		/* PCIE_SPEED_64_0GT as fake speed for platform device */
> +		speed = 64000 / BITS_PER_BYTE;
> +		/* PCI_EXP_LNKSTA_NLW_X8 as fake width for platform device */
> +		width = FIELD_GET(PCI_EXP_LNKSTA_NLW, PCI_EXP_LNKSTA_NLW_X8);
> +	} else {
> +		struct pci_dev *pdev = to_pci_dev(dev);
> +
> +		speed = pcie_link_speed_mbps(pdev);
> +		if (speed < 0)
> +			return speed;
> +		speed /= BITS_PER_BYTE;
>
> -	pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnksta);
> -	width = FIELD_GET(PCI_EXP_LNKSTA_NLW, lnksta);
> +		pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnksta);
> +		width = FIELD_GET(PCI_EXP_LNKSTA_NLW, lnksta);
> +	}
>  	bw = speed * width;
>
>  	for (int i = 0; i < ACCESS_COORDINATE_MAX; i++) {
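
The review comment in this reply amounts to inverting the test: treat anything that is not a PCI device as the special case, so that pci.c needs no platform header. The kernel offers dev_is_pci() in <linux/pci.h> for that kind of check. What follows is only a userspace mock of that shape, not the v2 patch: struct bus_type, struct device, mock_pci_bus, mock_platform_bus, mock_dev_is_pci() and mock_get_bandwidth() are all hypothetical stand-ins, and the link speed and width are passed in as parameters instead of being read from hardware.

```c
#include <stdbool.h>

/* Userspace stand-ins for kernel types; every name here is a mock. */
struct bus_type { const char *name; };
struct device { const struct bus_type *bus; };

static const struct bus_type mock_pci_bus = { "pci" };
static const struct bus_type mock_platform_bus = { "platform" };

/* Same idea as the kernel's dev_is_pci(): compare the bus pointer. */
static bool mock_dev_is_pci(const struct device *dev)
{
	return dev->bus == &mock_pci_bus;
}

/*
 * Returns bandwidth in MB/s, or a negative error passed through from
 * the (simulated) link speed query. link_speed_mbps and link_width
 * stand in for what a real PCIe link would report.
 */
static int mock_get_bandwidth(const struct device *dev,
			      int link_speed_mbps, int link_width)
{
	int speed, width;

	if (!mock_dev_is_pci(dev)) {
		/* Fake values from the RFC: 64000 Mb/s per lane, x8 */
		speed = 64000 / 8;
		width = 8;
	} else {
		if (link_speed_mbps < 0)
			return link_speed_mbps;
		speed = link_speed_mbps / 8;
		width = link_width;
	}

	return speed * width;
}
```

Testing for "not PCI" also covers any future non-PCI, non-platform device, which a dev_is_platform() check would let fall through to to_pci_dev().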
* Re: [RFC PATCH 1/1] cxl/pci: Fake bandwidth for platform devices
From: Li Ming @ 2024-11-27  2:23 UTC
To: Jonathan Cameron; +Cc: linux-cxl, dave.jiang

On 11/26/2024 11:03 PM, Jonathan Cameron wrote:
> On Fri, 22 Nov 2024 15:20:01 +0800
> Li Ming <ming4.li@outlook.com> wrote:
>
>> cxl-test environment always hits below call trace with KASAN
>> enabled
>>
>>  BUG: KASAN: slab-out-of-bounds in pcie_capability_read_word+0x1df/0x220
>>  Call Trace:
>>   <TASK>
>>   dump_stack_lvl+0x82/0xd0
>>   print_report+0xcb/0x5d0
>>   kasan_report+0xbd/0xf0
>>   pcie_capability_read_word+0x1df/0x220
>>   pcie_link_speed_mbps+0x6a/0x130
>>   cxl_pci_get_bandwidth+0x68/0x1c0 [cxl_core]
>>   cxl_endpoint_gather_bandwidth.constprop.0+0x352/0x780 [cxl_core]
>>   cxl_region_shared_upstream_bandwidth_update+0x257/0x1640 [cxl_core]
>>   cxl_region_attach+0x1025/0x1e80 [cxl_core]
>>   cxl_add_to_region+0x121/0x14c0 [cxl_core]
>>   discover_region+0xa5/0x150 [cxl_port]
>>
>> cxl-test environment creates cxl memory devices based on platform devices
>> rather than PCI devices for testing. but cxl_endpoint_gather_bandwidth()
>> always assumes the device is a PCI device, it will cause the issue in
>> cxl_pci_get_bandwidth().
>>
>> The fixup is that faking a maximun bandwidth in cxl_pci_get_bandwidth()
>
> maximum

Will fix it in v2.

>
>> for a platform device so that the cxl-test environment can be used to
>> validate the functionality of region bandwidth.
>>
>> Fixes: a5ab0de0ebaa ("cxl: Calculate region bandwidth of targets with shared upstream link")
>> Signed-off-by: Li Ming <ming4.li@outlook.com>
>> ---
>>  drivers/cxl/core/cdat.c |  5 ++---
>>  drivers/cxl/core/core.h |  2 +-
>>  drivers/cxl/core/pci.c  | 24 +++++++++++++++++-------
>>  3 files changed, 20 insertions(+), 11 deletions(-)
>>
>
>> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
>> index 5b46bc46aaa9..bd0448c3c9a8 100644
>> --- a/drivers/cxl/core/pci.c
>> +++ b/drivers/cxl/core/pci.c
>> @@ -7,6 +7,7 @@
>>  #include <linux/pci.h>
>>  #include <linux/pci-doe.h>
>>  #include <linux/aer.h>
>> +#include <linux/platform_device.h>
>
> Having a platform device include in a file called pci.c is a bit nasty..
> Can we instead check if the device is not a pci one?
>

Yes, will do it in v2. Thanks for the review.

BTW, I will send v2 from another email address, because Outlook has
problems that prevent the cover letter and patches from being threaded.

Ming
* Re: [RFC PATCH 1/1] cxl/pci: Fake bandwidth for platform devices
From: Zhijian Li (Fujitsu) @ 2024-11-27  3:44 UTC
To: Li Ming, linux-cxl@vger.kernel.org, dave.jiang@intel.com

On 22/11/2024 15:20, Li Ming wrote:
> cxl-test environment always hits below call trace with KASAN
> enabled
>
>  BUG: KASAN: slab-out-of-bounds in pcie_capability_read_word+0x1df/0x220
>  Call Trace:
>   <TASK>
>   dump_stack_lvl+0x82/0xd0
>   print_report+0xcb/0x5d0
>   kasan_report+0xbd/0xf0
>   pcie_capability_read_word+0x1df/0x220
>   pcie_link_speed_mbps+0x6a/0x130
>   cxl_pci_get_bandwidth+0x68/0x1c0 [cxl_core]
>   cxl_endpoint_gather_bandwidth.constprop.0+0x352/0x780 [cxl_core]
>   cxl_region_shared_upstream_bandwidth_update+0x257/0x1640 [cxl_core]
>   cxl_region_attach+0x1025/0x1e80 [cxl_core]
>   cxl_add_to_region+0x121/0x14c0 [cxl_core]
>   discover_region+0xa5/0x150 [cxl_port]

Is this testing on the upstream kernel?
I guessed this have been fixed by
cce3cd647721 cxl/core: Return error when cxl_endpoint_gather_bandwidth() handles a non-PCI device


Thanks
Zhijian

>
> cxl-test environment creates cxl memory devices based on platform devices
> rather than PCI devices for testing. but cxl_endpoint_gather_bandwidth()
> always assumes the device is a PCI device, it will cause the issue in
> cxl_pci_get_bandwidth().
>
> The fixup is that faking a maximun bandwidth in cxl_pci_get_bandwidth()
> for a platform device so that the cxl-test environment can be used to
> validate the functionality of region bandwidth.
>
> Fixes: a5ab0de0ebaa ("cxl: Calculate region bandwidth of targets with shared upstream link")
> Signed-off-by: Li Ming <ming4.li@outlook.com>
> ---
>  drivers/cxl/core/cdat.c |  5 ++---
>  drivers/cxl/core/core.h |  2 +-
>  drivers/cxl/core/pci.c  | 24 +++++++++++++++++-------
>  3 files changed, 20 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/cxl/core/cdat.c b/drivers/cxl/core/cdat.c
> index 438869df241a..c67ef0d1b1a6 100644
> --- a/drivers/cxl/core/cdat.c
> +++ b/drivers/cxl/core/cdat.c
> @@ -634,7 +634,6 @@ static int cxl_endpoint_gather_bandwidth(struct cxl_region *cxlr,
>  	struct access_coordinate ep_coord[ACCESS_COORDINATE_MAX];
>  	struct cxl_memdev *cxlmd = cxled_to_memdev(cxled);
>  	struct cxl_dev_state *cxlds = cxlmd->cxlds;
> -	struct pci_dev *pdev = to_pci_dev(cxlds->dev);
>  	struct cxl_perf_ctx *perf_ctx;
>  	struct cxl_dpa_perf *perf;
>  	unsigned long index;
> @@ -675,7 +674,7 @@ static int cxl_endpoint_gather_bandwidth(struct cxl_region *cxlr,
>  	}
>
>  	/* Direct upstream link from EP bandwidth */
> -	rc = cxl_pci_get_bandwidth(pdev, pci_coord);
> +	rc = cxl_pci_get_bandwidth(cxlds->dev, pci_coord);
>  	if (rc < 0)
>  		return rc;
>
> @@ -809,7 +808,7 @@ static struct xarray *cxl_switch_gather_bandwidth(struct cxl_region *cxlr,
>  		return ERR_PTR(-EINVAL);
>
>  	/* Retrieve the upstream link bandwidth */
> -	rc = cxl_pci_get_bandwidth(to_pci_dev(dev), coords);
> +	rc = cxl_pci_get_bandwidth(dev, coords);
>  	if (rc)
>  		return ERR_PTR(-ENXIO);
>
> diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h
> index 800466f96a68..1c16c06d201e 100644
> --- a/drivers/cxl/core/core.h
> +++ b/drivers/cxl/core/core.h
> @@ -108,7 +108,7 @@ enum cxl_poison_trace_type {
>  };
>
>  long cxl_pci_get_latency(struct pci_dev *pdev);
> -int cxl_pci_get_bandwidth(struct pci_dev *pdev, struct access_coordinate *c);
> +int cxl_pci_get_bandwidth(struct device *dev, struct access_coordinate *c);
>  int cxl_update_hmat_access_coordinates(int nid, struct cxl_region *cxlr,
>  				       enum access_coordinate_class access);
>  bool cxl_need_node_perf_attrs_update(int nid);
> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> index 5b46bc46aaa9..bd0448c3c9a8 100644
> --- a/drivers/cxl/core/pci.c
> +++ b/drivers/cxl/core/pci.c
> @@ -7,6 +7,7 @@
>  #include <linux/pci.h>
>  #include <linux/pci-doe.h>
>  #include <linux/aer.h>
> +#include <linux/platform_device.h>
>  #include <cxlpci.h>
>  #include <cxlmem.h>
>  #include <cxl.h>
> @@ -1032,19 +1033,28 @@ bool cxl_endpoint_decoder_reset_detected(struct cxl_port *port)
>  }
>  EXPORT_SYMBOL_NS_GPL(cxl_endpoint_decoder_reset_detected, CXL);
>
> -int cxl_pci_get_bandwidth(struct pci_dev *pdev, struct access_coordinate *c)
> +int cxl_pci_get_bandwidth(struct device *dev, struct access_coordinate *c)
>  {
>  	int speed, bw;
>  	u16 lnksta;
>  	u32 width;
>
> -	speed = pcie_link_speed_mbps(pdev);
> -	if (speed < 0)
> -		return speed;
> -	speed /= BITS_PER_BYTE;
> +	if (dev_is_platform(dev)) {
> +		/* PCIE_SPEED_64_0GT as fake speed for platform device */
> +		speed = 64000 / BITS_PER_BYTE;
> +		/* PCI_EXP_LNKSTA_NLW_X8 as fake width for platform device */
> +		width = FIELD_GET(PCI_EXP_LNKSTA_NLW, PCI_EXP_LNKSTA_NLW_X8);
> +	} else {
> +		struct pci_dev *pdev = to_pci_dev(dev);
> +
> +		speed = pcie_link_speed_mbps(pdev);
> +		if (speed < 0)
> +			return speed;
> +		speed /= BITS_PER_BYTE;
>
> -	pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnksta);
> -	width = FIELD_GET(PCI_EXP_LNKSTA_NLW, lnksta);
> +		pcie_capability_read_word(pdev, PCI_EXP_LNKSTA, &lnksta);
> +		width = FIELD_GET(PCI_EXP_LNKSTA_NLW, lnksta);
> +	}
>  	bw = speed * width;
>
>  	for (int i = 0; i < ACCESS_COORDINATE_MAX; i++) {
* Re: [RFC PATCH 1/1] cxl/pci: Fake bandwidth for platform devices
From: Li Ming @ 2024-11-27  4:06 UTC
To: Zhijian Li (Fujitsu), linux-cxl@vger.kernel.org, dave.jiang@intel.com

On 11/27/2024 11:44 AM, Zhijian Li (Fujitsu) wrote:
>
>
> On 22/11/2024 15:20, Li Ming wrote:
>> cxl-test environment always hits below call trace with KASAN
>> enabled
>>
>>  BUG: KASAN: slab-out-of-bounds in pcie_capability_read_word+0x1df/0x220
>>  Call Trace:
>>   <TASK>
>>   dump_stack_lvl+0x82/0xd0
>>   print_report+0xcb/0x5d0
>>   kasan_report+0xbd/0xf0
>>   pcie_capability_read_word+0x1df/0x220
>>   pcie_link_speed_mbps+0x6a/0x130
>>   cxl_pci_get_bandwidth+0x68/0x1c0 [cxl_core]
>>   cxl_endpoint_gather_bandwidth.constprop.0+0x352/0x780 [cxl_core]
>>   cxl_region_shared_upstream_bandwidth_update+0x257/0x1640 [cxl_core]
>>   cxl_region_attach+0x1025/0x1e80 [cxl_core]
>>   cxl_add_to_region+0x121/0x14c0 [cxl_core]
>>   discover_region+0xa5/0x150 [cxl_port]
>
> Is this testing on the upstream kernel?
> I guessed this have been fixed by
> cce3cd647721 cxl/core: Return error when cxl_endpoint_gather_bandwidth() handles a non-PCI device
>
>
> Thanks
> Zhijian

Hi Zhijian,

The commit you pointed to should fix the issue as well. I tested with the
cxl-for-6.13 tag, which is based on 6.12-rc5 and does not include that
commit. The commit has been included since 6.12-rc7, which is why I could
reproduce the issue.

If we already have a fix for this, please ignore this patch. Sorry about
that.

Ming
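
The commit referenced in this exchange takes a different route from the RFC: rather than faking a bandwidth, it bails out when the endpoint is not a PCI device. A rough userspace sketch of that early-return shape is given here. The mock types and mock_gather_bandwidth() are hypothetical, and the -ENODEV return value is an assumption; the thread only says that the commit returns an error.

```c
#include <errno.h>
#include <stdbool.h>

/* Userspace stand-ins for kernel types; every name here is a mock. */
struct bus_type { const char *name; };
struct device { const struct bus_type *bus; };

static const struct bus_type mock_pci_bus = { "pci" };
static const struct bus_type mock_platform_bus = { "platform" };

static bool mock_dev_is_pci(const struct device *dev)
{
	return dev->bus == &mock_pci_bus;
}

/*
 * Hypothetical caller-side guard: refuse non-PCI devices before any
 * PCIe capability would be read. On success, *bw_out receives the
 * bandwidth in MB/s computed from the supplied speed and width.
 */
static int mock_gather_bandwidth(const struct device *dev,
				 int link_speed_mbps, int link_width,
				 int *bw_out)
{
	if (!mock_dev_is_pci(dev))
		return -ENODEV; /* assumed error code, not from the thread */

	*bw_out = (link_speed_mbps / 8) * link_width;
	return 0;
}
```

With a guard of this kind the cxl-test platform devices never reach the PCIe capability read at all, which avoids the KASAN report but leaves the region bandwidth path unexercised under cxl-test, the gap the RFC was trying to close.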
end of thread, other threads:[~2024-11-27 4:07 UTC | newest]
Thread overview: 5+ messages
[not found] <20241122072001.94758-1-ming4.li@outlook.com>
2024-11-22 7:20 ` [RFC PATCH 1/1] cxl/pci: Fake bandwidth for platform devices Li Ming
2024-11-26 15:03 ` Jonathan Cameron
2024-11-27 2:23 ` Li Ming
2024-11-27 3:44 ` Zhijian Li (Fujitsu)
2024-11-27 4:06 ` Li Ming