From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.16]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 47C3C303A27; Wed, 15 Oct 2025 18:17:56 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.16 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1760552278; cv=none; b=cCyJPgi1HE+o9gIR5zHtLNXgCSLcb6YKR/SBqeSeNIi+K1pS72QrcMUMAY4VFupPHC3CwcEZvOdMh9taE8nlcOXyO0ix2ArIogxLtGKc4hwJWi0OquJtlLleGK/0nUSezbIP0g0HCAWPYg02T/GEbYnQ+Jlmxd7n6hFGmTV6TDQ= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1760552278; c=relaxed/simple; bh=D/lKt9zo/NiMC6Do0JXtB02ANkMdlGC1mQ3OZ5KTdr4=; h=Message-ID:Date:MIME-Version:Subject:To:Cc:References:From: In-Reply-To:Content-Type; b=FAqJpscOfnC8MdqGN14y9Rn1Ru2LnS1gkrL40DF7BVHlqQrvINhlPlFCjdXaOlly9RhInFK2fUkWDSfO2rZc7mno/yfLC+rgk5RAjY5BKa0XtT4mCRsZkmuRfh+KY1V6fb1uQyUdobEC9dLb4KmwLbLKgS+uT27R7Ur11tj8v1w= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=MuwrO7Tv; arc=none smtp.client-ip=198.175.65.16 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="MuwrO7Tv" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1760552276; x=1792088276; h=message-id:date:mime-version:subject:to:cc:references: from:in-reply-to:content-transfer-encoding; bh=D/lKt9zo/NiMC6Do0JXtB02ANkMdlGC1mQ3OZ5KTdr4=; b=MuwrO7TvXq6xel7v6ipPIgBnKMoEkllj71WREN5Zk0IxFETWdwqZHkfQ /9RNuyqKTKc0NwWizVdPOU0faKTLQGZxMUMQ6blrQuPfriJP6a4cuIcUC rfKfpKmC+mOu+KXAeqYf5CxggDtERPPpDbZfP/mqfcmAdwaBX2WjnTfiz WrsvZSGGEvxw+ULI+E9JjfIR9ioQ2CnBRxKyTor/y91m2bRaGJa3PrYAc zsulOfvMECD2fiklVi6TkwkwcaNWkRRTSIqOleyVbUGNANWPgMb8cPGuG gn7pD4SdeYvBqQFAMemDSGCtt+o5NC9j6mV+Z/t3Two7h0vM008fDXC/L A==; X-CSE-ConnectionGUID: qzxoxl/MQTG7p0bXMhe7Sg== X-CSE-MsgGUID: kR1+lhlAQSaehTuwyd2p2w== X-IronPort-AV: E=McAfee;i="6800,10657,11583"; a="62887104" X-IronPort-AV: E=Sophos;i="6.19,232,1754982000"; d="scan'208";a="62887104" Received: from orviesa004.jf.intel.com ([10.64.159.144]) by orvoesa108.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Oct 2025 11:17:56 -0700 X-CSE-ConnectionGUID: HF8dw2GVTtybHtoS369Cug== X-CSE-MsgGUID: 7k8RrJvtQz6DF+UgsVSoEg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.19,232,1754982000"; d="scan'208";a="186500151" Received: from cmdeoliv-mobl4.amr.corp.intel.com (HELO [10.125.111.221]) ([10.125.111.221]) by orviesa004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 15 Oct 2025 11:17:55 -0700 Message-ID: Date: Wed, 15 Oct 2025 11:17:54 -0700 Precedence: bulk X-Mailing-List: linux-cxl@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v19 11/22] cxl: Define a driver interface for HPA free space enumeration To: alejandro.lucero-palau@amd.com, linux-cxl@vger.kernel.org, netdev@vger.kernel.org, dan.j.williams@intel.com, edward.cree@amd.com, davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com, edumazet@google.com Cc: Alejandro Lucero , Jonathan Cameron References: <20251006100130.2623388-1-alejandro.lucero-palau@amd.com> <20251006100130.2623388-12-alejandro.lucero-palau@amd.com> From: Dave Jiang Content-Language: en-US In-Reply-To: <20251006100130.2623388-12-alejandro.lucero-palau@amd.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit On 10/6/25 3:01 AM, alejandro.lucero-palau@amd.com wrote: > From: Alejandro Lucero > > CXL region creation involves allocating capacity from Device Physical Address > (DPA) and assigning it to decode a given Host Physical Address (HPA). Before > determining how much DPA to allocate the amount of available HPA must be > determined. Also, not all HPA is created equal, some HPA targets RAM, some > targets PMEM, some is prepared for device-memory flows like HDM-D and HDM-DB, > and some is HDM-H (host-only). > > In order to support Type2 CXL devices, wrap all of those concerns into > an API that retrieves a root decoder (platform CXL window) that fits the > specified constraints and the capacity available for a new region. > > Add a complementary function for releasing the reference to such root > decoder. > > Based on https://lore.kernel.org/linux-cxl/168592159290.1948938.13522227102445462976.stgit@dwillia2-xfh.jf.intel.com/ > > Signed-off-by: Alejandro Lucero > Reviewed-by: Jonathan Cameron > --- > drivers/cxl/core/region.c | 162 ++++++++++++++++++++++++++++++++++++++ > drivers/cxl/cxl.h | 3 + > include/cxl/cxl.h | 6 ++ > 3 files changed, 171 insertions(+) > > diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c > index e9bf42d91689..c5b66204ecde 100644 > --- a/drivers/cxl/core/region.c > +++ b/drivers/cxl/core/region.c > @@ -703,6 +703,168 @@ static int free_hpa(struct cxl_region *cxlr) > return 0; > } > > +struct cxlrd_max_context { > + struct device * const *host_bridges; > + int interleave_ways; > + unsigned long flags; > + resource_size_t max_hpa; > + struct cxl_root_decoder *cxlrd; > +}; > + > +static int find_max_hpa(struct device *dev, void *data) > +{ > + struct cxlrd_max_context *ctx = data; > + struct cxl_switch_decoder *cxlsd; > + struct cxl_root_decoder *cxlrd; > + struct resource *res, *prev; > + struct cxl_decoder *cxld; > + resource_size_t max; > + int found = 0; > + > + if (!is_root_decoder(dev)) > + return 0; > + > + cxlrd = to_cxl_root_decoder(dev); > + cxlsd = &cxlrd->cxlsd; > + cxld = &cxlsd->cxld; > + > + if ((cxld->flags & ctx->flags) != ctx->flags) { > + dev_dbg(dev, "flags not matching: %08lx vs %08lx\n", > + cxld->flags, ctx->flags); > + return 0; > + } > + > + for (int i = 0; i < ctx->interleave_ways; i++) { > + for (int j = 0; j < ctx->interleave_ways; j++) { > + if (ctx->host_bridges[i] == cxlsd->target[j]->dport_dev) { > + found++; > + break; > + } > + } > + } > + > + if (found != ctx->interleave_ways) { > + dev_dbg(dev, > + "Not enough host bridges. Found %d for %d interleave ways requested\n", > + found, ctx->interleave_ways); > + return 0; > + } > + > + /* > + * Walk the root decoder resource range relying on cxl_rwsem.region to > + * preclude sibling arrival/departure and find the largest free space > + * gap. > + */ > + lockdep_assert_held_read(&cxl_rwsem.region); > + res = cxlrd->res->child; > + > + /* With no resource child the whole parent resource is available */ > + if (!res) > + max = resource_size(cxlrd->res); > + else > + max = 0; > + > + for (prev = NULL; res; prev = res, res = res->sibling) { > + struct resource *next = res->sibling; > + resource_size_t free = 0; > + > + /* > + * Sanity check for preventing arithmetic problems below as a > + * resource with size 0 could imply using the end field below > + * when set to unsigned zero - 1 or all f in hex. > + */ > + if (prev && !resource_size(prev)) > + continue; > + > + if (!prev && res->start > cxlrd->res->start) { > + free = res->start - cxlrd->res->start; > + max = max(free, max); > + } > + if (prev && res->start > prev->end + 1) { > + free = res->start - prev->end + 1; > + max = max(free, max); > + } > + if (next && res->end + 1 < next->start) { > + free = next->start - res->end + 1; > + max = max(free, max); > + } > + if (!next && res->end + 1 < cxlrd->res->end + 1) { > + free = cxlrd->res->end + 1 - res->end + 1; > + max = max(free, max); > + } > + } > + > + dev_dbg(cxlrd_dev(cxlrd), "found %pa bytes of free space\n", &max); > + if (max > ctx->max_hpa) { > + if (ctx->cxlrd) > + put_device(cxlrd_dev(ctx->cxlrd)); > + get_device(cxlrd_dev(cxlrd)); > + ctx->cxlrd = cxlrd; > + ctx->max_hpa = max; > + } > + return 0; > +} > + > +/** > + * cxl_get_hpa_freespace - find a root decoder with free capacity per constraints > + * @cxlmd: the mem device requiring the HPA > + * @interleave_ways: number of entries in @host_bridges > + * @flags: CXL_DECODER_F flags for selecting RAM vs PMEM, and Type2 device > + * @max_avail_contig: output parameter of max contiguous bytes available in the > + * returned decoder > + * > + * Returns a pointer to a struct cxl_root_decoder > + * > + * The return tuple of a 'struct cxl_root_decoder' and 'bytes available given > + * in (@max_avail_contig))' is a point in time snapshot. If by the time the > + * caller goes to use this decoder and its capacity is reduced then caller needs > + * to loop and retry. > + * > + * The returned root decoder has an elevated reference count that needs to be > + * put with cxl_put_root_decoder(cxlrd). > + */ > +struct cxl_root_decoder *cxl_get_hpa_freespace(struct cxl_memdev *cxlmd, > + int interleave_ways, > + unsigned long flags, > + resource_size_t *max_avail_contig) > +{ > + struct cxl_port *endpoint = cxlmd->endpoint; Do the assignment right before the check. Would help prevent future issues of use before check. > + struct cxlrd_max_context ctx = { > + .flags = flags, > + }; > + struct cxl_port *root_port; > + > + if (!endpoint) { > + dev_dbg(&cxlmd->dev, "endpoint not linked to memdev\n"); > + return ERR_PTR(-ENXIO); > + } > + > + ctx.host_bridges = &endpoint->host_bridge; Would there ever be a scenario where there would be multiple host bridges that requires this to be an array? I'm not seeing that usage in this patch series. DJ > + > + struct cxl_root *root __free(put_cxl_root) = find_cxl_root(endpoint); > + if (!root) { > + dev_dbg(&endpoint->dev, "endpoint is not related to a root port\n"); > + return ERR_PTR(-ENXIO); > + } > + > + root_port = &root->port; > + scoped_guard(rwsem_read, &cxl_rwsem.region) > + device_for_each_child(&root_port->dev, &ctx, find_max_hpa); > + > + if (!ctx.cxlrd) > + return ERR_PTR(-ENOMEM); > + > + *max_avail_contig = ctx.max_hpa; > + return ctx.cxlrd; > +} > +EXPORT_SYMBOL_NS_GPL(cxl_get_hpa_freespace, "CXL"); > + > +void cxl_put_root_decoder(struct cxl_root_decoder *cxlrd) > +{ > + put_device(cxlrd_dev(cxlrd)); > +} > +EXPORT_SYMBOL_NS_GPL(cxl_put_root_decoder, "CXL"); > + > static ssize_t size_store(struct device *dev, struct device_attribute *attr, > const char *buf, size_t len) > { > diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h > index 793d4dfe51a2..076640e91ee0 100644 > --- a/drivers/cxl/cxl.h > +++ b/drivers/cxl/cxl.h > @@ -664,6 +664,9 @@ struct cxl_root_decoder *to_cxl_root_decoder(struct device *dev); > struct cxl_switch_decoder *to_cxl_switch_decoder(struct device *dev); > struct cxl_endpoint_decoder *to_cxl_endpoint_decoder(struct device *dev); > bool is_root_decoder(struct device *dev); > + > +#define cxlrd_dev(cxlrd) (&(cxlrd)->cxlsd.cxld.dev) > + > bool is_switch_decoder(struct device *dev); > bool is_endpoint_decoder(struct device *dev); > struct cxl_root_decoder *cxl_root_decoder_alloc(struct cxl_port *port, > diff --git a/include/cxl/cxl.h b/include/cxl/cxl.h > index 043fc31c764e..2ec514c77021 100644 > --- a/include/cxl/cxl.h > +++ b/include/cxl/cxl.h > @@ -250,4 +250,10 @@ int cxl_set_capacity(struct cxl_dev_state *cxlds, u64 capacity); > struct cxl_memdev *devm_cxl_add_memdev(struct device *host, > struct cxl_dev_state *cxlds, > const struct cxl_memdev_ops *ops); > +struct cxl_port; > +struct cxl_root_decoder *cxl_get_hpa_freespace(struct cxl_memdev *cxlmd, > + int interleave_ways, > + unsigned long flags, > + resource_size_t *max); > +void cxl_put_root_decoder(struct cxl_root_decoder *cxlrd); > #endif /* __CXL_CXL_H__ */