From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from frasgout.his.huawei.com (frasgout.his.huawei.com [185.176.79.56]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5217864CCE; Wed, 14 Feb 2024 17:54:20 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=185.176.79.56 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707933263; cv=none; b=r8R5yVp+KPpmwpMm6kK0fu6a5AFuFgM2K0A0yVQqIMi6vOx8H6DHgH1OHol5ziC1nbmzfzAZWu5v2TgN+xoEDtH29ZcgMq1WmzUmhXFnpTZ2AgQ7+uqBIpvAOP7K2Qmm5ZNgjsfK3pyEwZGVblpk/MooRNiw13PxxUgqFEErbnU= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1707933263; c=relaxed/simple; bh=ghwOtyRX3lifuXRWK5EKF/Nd4F95oq0MXCX4u7qpfKQ=; h=Date:From:To:CC:Subject:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=kAOtW+j92u+pK0c5juKtCX5a1xEzRdjKIQA+jNJTxk+IS8QqxgxBhUFZ1grLUx6w+J8RutC5lhS9vmsfvY/4Hk+b2Hk+gh3keNVREQ5nsHtcBIn5KhzrFFoIVMqbRkXiNPu5UdQWlFxkUigpde26kBQb/9znBKbs2ZALc1hrdYU= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=Huawei.com; spf=pass smtp.mailfrom=huawei.com; arc=none smtp.client-ip=185.176.79.56 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=Huawei.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huawei.com Received: from mail.maildlp.com (unknown [172.18.186.231]) by frasgout.his.huawei.com (SkyGuard) with ESMTP id 4TZm251kGGz67g0Y; Thu, 15 Feb 2024 01:50:37 +0800 (CST) Received: from lhrpeml500005.china.huawei.com (unknown [7.191.163.240]) by mail.maildlp.com (Postfix) with ESMTPS id E2178140D26; Thu, 15 Feb 2024 01:54:18 +0800 (CST) Received: from localhost (10.202.227.76) by lhrpeml500005.china.huawei.com (7.191.163.240) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.35; Wed, 14 Feb 2024 17:54:18 +0000 Date: Wed, 14 Feb 2024 17:54:17 +0000 From: Jonathan Cameron To: Dave Jiang CC: , "linux-cxl@vger.kernel.org" , "linux-acpi@vger.kernel.org" , "dan.j.williams@intel.com" , "ira.weiny@intel.com" , "vishal.l.verma@intel.com" , "alison.schofield@intel.com" , "dave@stgolabs.net" , "brice.goglin@gmail.com" , "nifan.cxl@gmail.com" , "rafael@kernel.org" , "gregkh@linuxfoundation.org" , KyungSan Kim , "Hojin Nam" Subject: Re: [PATCH v4 08/11] cxl/region: Calculate performance data for a region Message-ID: <20240214175417.00004910@Huawei.com> In-Reply-To: References: <170568503239.1008395.2633682569497108646.stgit@djiang5-mobl3> <170568485801.1008395.12244787918793980621.stgit@djiang5-mobl3> <20240131022226epcms2p47d09167da93641ea88461563c494335a@epcms2p4> Organization: Huawei Technologies Research and Development (UK) Ltd. X-Mailer: Claws Mail 4.1.0 (GTK 3.24.33; x86_64-w64-mingw32) Precedence: bulk X-Mailing-List: linux-acpi@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="ISO-8859-1" Content-Transfer-Encoding: quoted-printable X-ClientProxiedBy: lhrpeml100002.china.huawei.com (7.191.160.241) To lhrpeml500005.china.huawei.com (7.191.163.240) On Wed, 31 Jan 2024 08:56:41 -0700 Dave Jiang wrote: > On 1/30/24 19:22, Wonjae Lee wrote: > > On Fri, Jan 19, 2024 at 10:23:52AM -0700, Dave Jiang wrote: =20 > >> Calculate and store the performance data for a CXL region. Find the wo= rst > >> read and write latency for all the included ranges from each of the de= vices > >> that attributes to the region and designate that as the latency data. = Sum > >> all the read and write bandwidth data for each of the device region and > >> that is the total bandwidth for the region. > >> > >> The perf list is expected to be constructed before the endpoint decode= rs > >> are registered and thus there should be no early reading of the entries > >> from the region assemble action. The calling of the region qos calcula= te > >> function is under the protection of cxl_dpa_rwsem and will ensure that > >> all DPA associated work has completed. > >> > >> Reviewed-by: Jonathan Cameron > >> Signed-off-by: Dave Jiang > >> --- > >> v4: > >> - Calculate access classes 0 and 1 by retrieving host bridge coords > >> - Add lockdep assert for cxl_dpa_rwsem (Dan) > >> - Clarify that HMAT code is HMEM_REPORTING code. (Dan) > >> --- > >> =A0 drivers/cxl/core/cdat.c=A0 =A0 74 ++++++++++++++++++++++++++++++++= +++++++++++++ > >> =A0 drivers/cxl/core/region.c =A0 =A0 2 + > >> =A0 drivers/cxl/cxl.h=A0 =A0 =A0 =A0 =A0 =A0 4 ++ > >> =A0 3 files changed, 80 insertions(+) > >> > >> diff --git a/drivers/cxl/core/cdat.c b/drivers/cxl/core/cdat.c > >> index 6e3998723aaa..7acb5837afad 100644 > >> --- a/drivers/cxl/core/cdat.c > >> +++ b/drivers/cxl/core/cdat.c > >> @@ -8,6 +8,7 @@ > >> =A0 #include "cxlpci.h" > >> =A0 #include "cxlmem.h" > >> =A0 #include "cxl.h" > >> +#include "core.h" > >> > >> =A0 struct dsmas_entry { > >> =A0 struct range dpa_range; > >> @@ -546,3 +547,76 @@ void cxl_coordinates_combine(struct access_coordi= nate *out, > >> =A0 EXPORT_SYMBOL_NS_GPL(cxl_coordinates_combine, CXL); > >> > >> =A0 MODULE_IMPORT_NS(CXL); > >> + > >> +void cxl_region_perf_data_calculate(struct cxl_region *cxlr, > >> +=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 struct cxl_endpoint_decoder *cxled) > >> +{ > >> + struct cxl_memdev *cxlmd =3D cxled_to_memdev(cxled); > >> + struct cxl_port *port =3D cxlmd->endpoint; > >> + struct cxl_dev_state *cxlds =3D cxlmd->cxlds; > >> + struct cxl_memdev_state *mds =3D to_cxl_memdev_state(cxlds); > >> + struct access_coordinate hb_coord[ACCESS_COORDINATE_MAX]; > >> + struct access_coordinate coord; > >> + struct range dpa =3D { > >> +=A0 =A0 =A0 =A0 .start =3D cxled->dpa_res->start, > >> +=A0 =A0 =A0 =A0 .end =3D cxled->dpa_res->end, > >> + }; > >> + struct list_head *perf_list; > >> + struct cxl_dpa_perf *perf; > >> + bool found =3D false; > >> + int rc; > >> + > >> + switch (cxlr->mode) { > >> + case CXL_DECODER_RAM: > >> +=A0 =A0 perf_list =3D &mds->ram_perf_list; > >> +=A0 =A0 break; > >> + case CXL_DECODER_PMEM: > >> +=A0 =A0 perf_list =3D &mds->pmem_perf_list; > >> +=A0 =A0 break; > >> + default: > >> +=A0 =A0 return; > >> + } > >> + > >> + lockdep_assert_held(&cxl_dpa_rwsem); > >> + > >> + list_for_each_entry(perf, perf_list, list) { > >> +=A0 =A0 if (range_contains(&perf->dpa_range, &dpa)) { > >> +=A0 =A0 =A0 =A0 found =3D true; > >> +=A0 =A0 =A0 =A0 break; > >> +=A0 =A0 } > >> + } > >> + > >> + if (!found) > >> +=A0 =A0 return; > >> + > >> + rc =3D cxl_hb_get_perf_coordinates(port, hb_coord); > >> + if (rc)=A0 { > >> +=A0 =A0 dev_dbg(&port->dev, "Failed to retrieve hb perf coordinates.\= n"); > >> +=A0 =A0 return; > >> + } > >> + > >> + for (int i =3D 0; i < ACCESS_COORDINATE_MAX; i++) { > >> +=A0 =A0 /* Pickup the host bridge coords */ > >> +=A0 =A0 cxl_coordinates_combine(&coord, &hb_coord[i], &perf->coord); > >> + > >> +=A0 =A0 /* Get total bandwidth and the worst latency for the cxl regi= on */ > >> +=A0 =A0 cxlr->coord[i].read_latency =3D max_t(unsigned int, > >> +=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 cxlr->coord[i].read_l= atency, > >> +=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 coord.read_latency); > >> +=A0 =A0 cxlr->coord[i].write_latency =3D max_t(unsigned int, > >> +=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 cxlr->coord[i].wr= ite_latency, > >> +=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 coord.write_laten= cy); > >> +=A0 =A0 cxlr->coord[i].read_bandwidth +=3D coord.read_bandwidth; > >> +=A0 =A0 cxlr->coord[i].write_bandwidth +=3D coord.write_bandwidth; > >> + > >> +=A0 =A0 /* > >> +=A0 =A0 =A0 * Convert latency to nanosec from picosec to be consistent > >> +=A0 =A0 =A0 * with the resulting latency coordinates computed by the > >> +=A0 =A0 =A0 * HMAT_REPORTING code. > >> +=A0 =A0 =A0 */ > >> +=A0 =A0 cxlr->coord[i].read_latency =3D > >> +=A0 =A0 =A0 =A0 DIV_ROUND_UP(cxlr->coord[i].read_latency, 1000); > >> +=A0 =A0 cxlr->coord[i].write_latency =3D > >> +=A0 =A0 =A0 =A0 DIV_ROUND_UP(cxlr->coord[i].write_latency, 1000); =20 > >=20 > > Hello, > >=20 > > I ran into a bit of confusion and have a question while validating CDAT > > behaviour with physical CXL devices. (I'm not sure if this is the right > > thread to ask this question, sorry if it isn't.) > >=20 > > IIUC, the raw data of latency is in picosec, but the comments on the > > struct access_coordinate say that the latency units are in nanosec: > > * @read_latency: Read latency in nanoseconds > > * @write_latency: Write latency in nanoseconds > >=20 > > This was a bit confusing at first, as the raw data of latency are in > > ps, and the structure that stores the latency expects units of ns. =20 >=20 > Right. The numbers stored with the HMAT_REPORTING code and eventually NUM= A nodes are normalized to nanoseconds, even though the raw data is in picos= econds. For CXL, I left the CDAT and computed numbers as raw numbers (picos= econds) until the final step when I calculate the latency for the entire re= gion. And then it gets converted to nanoseconds in order to write back to t= he memory_target for HMAT_REPORTING. The numbers we retrieve from HMAT_REPO= RTING for the generic target is already in nanoseconds. HMAT had a non backwards compatible change in units between the first and second revs of that table. See ACPI 6.2 for revision 1. With another hat on I'm a bit embarrassed about that :( . >=20 >=20 > >=20 > > I saw that you have already had a discussion with Brice about the > > pico/nanosecond unit conversion. My question is, are there any plans to > > store latency number of cxl port in nanoseconds or change the comments > > of coords structure? =20 >=20 > The numbers for the coords structure will remain in nanoseconds as it alw= ays have been.=20 >=20 > >=20 > > Thanks, > > Wonjae > > =20 > >> + } > >> +} > >> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c > >> index 57a5901d5a60..7f19b533c5ae 100644 > >> --- a/drivers/cxl/core/region.c > >> +++ b/drivers/cxl/core/region.c > >> @@ -1722,6 +1722,8 @@ static int cxl_region_attach(struct cxl_region *= cxlr, > >> =A0 =A0 =A0 return -EINVAL; > >> =A0 } > >> > >> + cxl_region_perf_data_calculate(cxlr, cxled); > >> + > >> =A0 if (test_bit(CXL_REGION_F_AUTO, &cxlr->flags)) { > >> =A0 =A0 =A0 int i; > >> > >> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h > >> index 80e6bd294e18..f6637fa33113 100644 > >> --- a/drivers/cxl/cxl.h > >> +++ b/drivers/cxl/cxl.h > >> @@ -519,6 +519,7 @@ struct cxl_region_params { > >> =A0 * @cxlr_pmem: (for pmem regions) cached copy of the nvdimm bridge > >> =A0 * @flags: Region state flags > >> =A0 * @params: active + config params for the region > >> + * @coord: QoS access coordinates for the region > >> =A0 */ > >> =A0 struct cxl_region { > >> =A0 struct device dev; > >> @@ -529,6 +530,7 @@ struct cxl_region { > >> =A0 struct cxl_pmem_region *cxlr_pmem; > >> =A0 unsigned long flags; > >> =A0 struct cxl_region_params params; > >> + struct access_coordinate coord[ACCESS_COORDINATE_MAX]; > >> =A0 }; > >> > >> =A0 struct cxl_nvdimm_bridge { > >> @@ -880,6 +882,8 @@ int cxl_endpoint_get_perf_coordinates(struct cxl_p= ort *port, > >> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 struct access_coordinate *coor= d); > >> =A0 int cxl_hb_get_perf_coordinates(struct cxl_port *port, > >> =A0 =A0 =A0 =A0 =A0 =A0 =A0 struct access_coordinate *coord); > >> +void cxl_region_perf_data_calculate(struct cxl_region *cxlr, > >> +=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 struct cxl_endpoint_decoder *cxled); > >> > >> =A0 void cxl_coordinates_combine(struct access_coordinate *out, > >> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 struct access_coordinate *c1, > >> > >> > >> =20 >=20