From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.16]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 092AE2010EE for ; Wed, 3 Sep 2025 18:21:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.16 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756923667; cv=none; b=EKun+04EJW9I4jy0jwnObah8NgMWLmvdMjN7CZnh/R5SIx+61ozw3TGo2fGxyuybllb6xGHel1pIW6Qew3HoXWejD7deLe0jNW/0SNJKKHQ2Cmz7L34RXHDvn8750ZCgiy6PjSV11bpQsdUUkonxK+nc5tUnuvlEzgI2GmeqxWA= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1756923667; c=relaxed/simple; bh=LpAL/E2sLZoP7m6mN/mBOBKiK8Yb8gqEaT5KH5Q1jNg=; h=Message-ID:Date:MIME-Version:Subject:To:Cc:References:From: In-Reply-To:Content-Type; b=csUwDQPe+21fKpac10OUv87OjnGXncE6Q0unpWbcxENyWonvBSK+yvSomESN4CsxL4UJn0OddYtjM6KsLFF28hNnvsjf6/fdrHQHyRUJowMTpgTdvvZu1FAC3/GIBzOXSERULfWP7vY4iW9oQL4N3VGLw8FZyZx5kc3KbKPJ4tU= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=gJHe0gDX; arc=none smtp.client-ip=192.198.163.16 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="gJHe0gDX" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1756923665; x=1788459665; h=message-id:date:mime-version:subject:to:cc:references: from:in-reply-to:content-transfer-encoding; bh=LpAL/E2sLZoP7m6mN/mBOBKiK8Yb8gqEaT5KH5Q1jNg=; b=gJHe0gDXtUbeSCzyuGaGnHo2wbSjRcwlAoQCDicthlvsQF15r/wXeUTX njt2NNhajQwD7TnNK3gu5yeQo/trf7pZnKLmupsydV3QW0yhzcz3Byx0K d823Ghb31mYx+QDgVQDQ5I2vH4FbpGBBaJed2qXsU0lxndYlELJocgE2I OIkLSj5rlUZ5wqa5a8Zrl79xumkkBjcf5gsPx2/Zk6myyj+ugKQCnNiOu QOEsf8qpRV2xVBx8ypWUdvDYvYMcXMmS+8JE8YX583H23d+tz3MIfMOyT qCpwP6kBDF2KF/HO5X2GpNnitkzEn17PxbY319bGyPjpuQEN8dPn5Wivq A==; X-CSE-ConnectionGUID: lZMwsl6wRnmURemRtnsqCw== X-CSE-MsgGUID: TITsGGYmQxKp+j5TRT8C4Q== X-IronPort-AV: E=McAfee;i="6800,10657,11542"; a="46821102" X-IronPort-AV: E=Sophos;i="6.18,236,1751266800"; d="scan'208";a="46821102" Received: from orviesa004.jf.intel.com ([10.64.159.144]) by fmvoesa110.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 03 Sep 2025 11:21:04 -0700 X-CSE-ConnectionGUID: f7SU4IVvSZ23c9Yp1iHwwA== X-CSE-MsgGUID: lpkYXoBfTuC6E7Rv9q6pig== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.18,236,1751266800"; d="scan'208";a="176020480" Received: from kcaccard-desk.amr.corp.intel.com (HELO [10.125.109.251]) ([10.125.109.251]) by orviesa004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 03 Sep 2025 11:21:03 -0700 Message-ID: Date: Wed, 3 Sep 2025 11:21:02 -0700 Precedence: bulk X-Mailing-List: linux-cxl@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v8 05/11] cxl: Defer dport allocation for switch ports To: Robert Richter Cc: linux-cxl@vger.kernel.org, dave@stgolabs.net, jonathan.cameron@huawei.com, alison.schofield@intel.com, vishal.l.verma@intel.com, ira.weiny@intel.com, dan.j.williams@intel.com References: <20250814222151.3520500-1-dave.jiang@intel.com> <20250814222151.3520500-6-dave.jiang@intel.com> Content-Language: en-US From: Dave Jiang In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit On 9/1/25 10:29 AM, Robert Richter wrote: > On 27.08.25 14:15:21, Dave Jiang wrote: >> >> >> On 8/20/25 5:41 AM, Robert Richter wrote: >>> Hi Dave, >>> >>> see my comments below. >>> >>> On 14.08.25 15:21:45, Dave Jiang wrote: >>>> The current implementation enumerates the dports during the cxl_port >>>> driver probe. Without an endpoint connected, the dport may not be >>>> active during port probe. This scheme may prevent a valid hardware >>>> dport id to be retrieved and MMIO registers to be read when an endpoint >>>> is hot-plugged. Move the dport allocation and setup to behind memdev >>>> probe so the endpoint is guaranteed to be connected. >>>> >>>> In the original enumeration behavior, there are 3 phases (or 2 if no CXL >>>> switches) for port creation. cxl_acpi() creates a Root Port (RP) from the >>>> ACPI0017.N device. Through that it enumerates downstream ports composed >>>> of ACPI0016.N devices through add_host_bridge_dport(). Once done, it >>>> uses add_host_bridge_uport() to create the ports that enumerate the PCI >>>> RPs as the dports of these ports. Every time a port is created, the port >>>> driver is attached, cxl_switch_porbe_probe() is called and >>>> devm_cxl_port_enumerate_dports() is invoked to enumerate and probe >>>> the dports. >>>> >>>> The second phase is if there are any CXL switches. When the pci endpoint >>>> device driver (cxl_pci) calls probe, it will add a mem device and triggers >>>> the cxl_mem_probe(). cxl_mem_probe() calls devm_cxl_enumerate_ports() >>>> and attempts to discovery and create all the ports represent CXL switches. >>>> During this phase, a port is created per switch and the attached dports >>>> are also enumerated and probed. >>>> >>>> The last phase is creating endpoint port which happens for all endpoint >>>> devices. >>>> >>>> In this commit, the port create and its dport probing in cxl_acpi is not >>>> changed. That will be handled later. The behavior change is only for CXL >>>> switch ports. Only the dport that is part of the path for an endpoint >>>> device to the RP will be probed. This happens naturally by the code >>>> walking up the device hierarchy and identifying the upstream device and >>>> the downstream device. >>>> >>>> The new sequence is instead of creating all possible dports at initial >>>> port creation, defer port instantiation until a memdev beneath that >>>> dport arrives. Introduce devm_cxl_create_or_extend_port() to centralize >>>> the creation and extension of ports with new dports as memory devices >>>> arrive. As part of this rework, switch decoder target list is amended >>>> at runtime as dports show up. >>>> >>>> While the decoders are allocated during the port driver probe, >>>> The decoders must also be updated since previously it's all done when all >>>> the dports are setup and now every time a dport is setup per endpoint, the >>>> switch target listing need to be updated with new dport. A >>>> guard(rwsem_write) is used to update decoder targets. This is similar to >>>> when decoder_populate_target() is called and the decoder programming >>>> must be protected. >>>> >>>> Link: https://lore.kernel.org/linux-cxl/20250305100123.3077031-1-rrichter@amd.com/ >>>> Reviewed-by: Jonathan Cameron >>>> Signed-off-by: Dave Jiang >>>> --- >>>> v8: >>>> - grammar and spelling fixups (Dan) >>>> - Clarify commit log story. (Dan) >>>> - Move register mapping and decoder enumeration to when first dport shows up (Dan) >>>> - Fix kdoc indentation issue with devm_cxl_add_dport_by_dev() >>>> - cxl_port_update_total_dports() -> cxl_probe_possible_dports(). (Dan) >>>> - Remove failure path for possible dports == 0. (Dan, Robert) >>>> - update_switch_decoder() -> update_decoder_targets(). (Dan) >>>> - Remove lock asserts where not needed. (Dan) >>>> - Add support for passthrough decoder init. (Dan) >>>> - Return -ENXIO when no driver attached. (Dan) >>>> - Move guard() from devm-cxl_add_dport_by_uport. (Dan, Robert) >>>> - Add devm_cxl_create_or_extend_port() helper. (Dan) >>>> - Remove shortcut for the port iteration path. Find better way to deal. (Dan, Robert) >>>> - Remove 'new_dport' local var. (Robert) >>>> - Use find_cxl_port_by_uport() instead of find_cxl_port(). (Robert) >>>> - Move port check logic to add_port_attach_ep(). (Robert) >>>> --- >>>> drivers/cxl/core/cdat.c | 2 +- >>>> drivers/cxl/core/core.h | 2 + >>>> drivers/cxl/core/hdm.c | 6 - >>>> drivers/cxl/core/pci.c | 81 +++++++++++ >>>> drivers/cxl/core/port.c | 287 +++++++++++++++++++++++++++++++------- >>>> drivers/cxl/core/region.c | 4 +- >>>> drivers/cxl/cxl.h | 3 + >>>> drivers/cxl/port.c | 29 +--- >>>> 8 files changed, 331 insertions(+), 83 deletions(-) >>>> >>>> diff --git a/drivers/cxl/core/cdat.c b/drivers/cxl/core/cdat.c >>>> index c0af645425f4..b156b81a9b20 100644 >>>> --- a/drivers/cxl/core/cdat.c >>>> +++ b/drivers/cxl/core/cdat.c >>>> @@ -338,7 +338,7 @@ static int match_cxlrd_hb(struct device *dev, void *data) >>>> >>>> guard(rwsem_read)(&cxl_rwsem.region); >>>> for (int i = 0; i < cxlsd->nr_targets; i++) { >>>> - if (host_bridge == cxlsd->target[i]->dport_dev) >>>> + if (cxlsd->target[i] && host_bridge == cxlsd->target[i]->dport_dev) >>>> return 1; >>>> } >>>> >>>> diff --git a/drivers/cxl/core/core.h b/drivers/cxl/core/core.h >>>> index 2669f251d677..2ac71eb459e6 100644 >>>> --- a/drivers/cxl/core/core.h >>>> +++ b/drivers/cxl/core/core.h >>>> @@ -146,6 +146,8 @@ int cxl_port_get_switch_dport_bandwidth(struct cxl_port *port, >>>> int cxl_ras_init(void); >>>> void cxl_ras_exit(void); >>>> int cxl_gpf_port_setup(struct cxl_dport *dport); >>>> +struct cxl_dport *devm_cxl_add_dport_by_dev(struct cxl_port *port, >>>> + struct device *dport_dev); >>>> >>>> #ifdef CONFIG_CXL_FEATURES >>>> struct cxl_feat_entry * >>>> diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c >>>> index cee68bbc7ff6..5263e9eba7d0 100644 >>>> --- a/drivers/cxl/core/hdm.c >>>> +++ b/drivers/cxl/core/hdm.c >>>> @@ -52,8 +52,6 @@ static int add_hdm_decoder(struct cxl_port *port, struct cxl_decoder *cxld) >>>> int devm_cxl_add_passthrough_decoder(struct cxl_port *port) >>>> { >>>> struct cxl_switch_decoder *cxlsd; >>>> - struct cxl_dport *dport = NULL; >>>> - unsigned long index; >>>> struct cxl_hdm *cxlhdm = dev_get_drvdata(&port->dev); >>>> >>>> /* >>>> @@ -69,10 +67,6 @@ int devm_cxl_add_passthrough_decoder(struct cxl_port *port) >>>> >>>> device_lock_assert(&port->dev); >>>> >>>> - xa_for_each(&port->dports, index, dport) >>>> - break; >>>> - cxlsd->cxld.target_map[0] = dport->port_id; >>>> - >>> >>> The change of initialization of cxlsd->cxld.target_map[] could have >>> been a separate patch to reduce size of this patch. >>> >>>> return add_hdm_decoder(port, &cxlsd->cxld); >>>> } >>>> EXPORT_SYMBOL_NS_GPL(devm_cxl_add_passthrough_decoder, "CXL"); >>>> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c >>>> index b50551601c2e..b9d770f1aa7b 100644 >>>> --- a/drivers/cxl/core/pci.c >>>> +++ b/drivers/cxl/core/pci.c >>>> @@ -24,6 +24,44 @@ static unsigned short media_ready_timeout = 60; >>>> module_param(media_ready_timeout, ushort, 0644); >>>> MODULE_PARM_DESC(media_ready_timeout, "seconds to wait for media ready"); >>>> >>>> +/** >>>> + * devm_cxl_add_dport_by_dev - allocate a dport by dport device >>>> + * @port: cxl_port that hosts the dport >>>> + * @dport_dev: 'struct device' of the dport >>>> + * >>>> + * Returns the allocate dport on success or ERR_PTR() of -errno on error >>>> + */ >>>> +struct cxl_dport *devm_cxl_add_dport_by_dev(struct cxl_port *port, >>> >>> This function only determines the port_num. How about only implement >>> this in a function cxl_pci_get_port_num() and call devm_cxl_add_dport >>> directly? >> > >> I can split out the code to get the port_num locally, but we can't >> call devm_cxl_add_dport() directly in core/port.c because we need >> the map.resource and in order to retrieve that cxl_find_regblock() >> requires a pci dev. > > I mean the following: > > In the mock case, there is always a decoder. That is, > devm_cxl_add_passthrough_decoder() will only be used for pci devs. > > Create cxl_port_has_multiple_dports() which contains: > > if (!dev_is_pci(...)) > return false; > /* pci_walk_bus() and inspect dports: */ > ... > > In cxl_switch_port_setup(): > > rc = devm_cxl_enumerate_decoders(...) > if (!rc) > return 0; > if (cxl_port_has_multiple_dports(...)) I think you probably mean cxl_port_has_single_dport()? Otherwise it would be if (!cxl_port_has_multiple_dports(...)) { rc = devm_cxl_add_passthrough_decoder(...); ... } And any error would cause a creation of passthrough decoder. DJ > rc = devm_cxl_add_passthrough_decoder(...); > > You don't need function devm_cxl_add_dport_by_dev() any longer, just > use devm_cxl_add_dport() instead. > >>> >>> That would nicely fit into core/pci.c. >>> >>>> + struct device *dport_dev) >>>> +{ >>>> + struct cxl_register_map map; >>>> + struct pci_dev *pdev; >>>> + u32 lnkcap, port_num; >>>> + int type; >>>> + int rc; >>>> + >>>> + if (!dev_is_pci(dport_dev)) >>>> + return ERR_PTR(-EINVAL); >>>> + >>>> + device_lock_assert(&port->dev); >>>> + >>>> + pdev = to_pci_dev(dport_dev); >>>> + type = pci_pcie_type(pdev); >>>> + if (type != PCI_EXP_TYPE_DOWNSTREAM && type != PCI_EXP_TYPE_ROOT_PORT) >>>> + return ERR_PTR(-EINVAL); >>>> + >>>> + if (pci_read_config_dword(pdev, pci_pcie_cap(pdev) + PCI_EXP_LNKCAP, >>>> + &lnkcap)) >>>> + return ERR_PTR(-ENXIO); >>>> + >>>> + rc = cxl_find_regblock(pdev, CXL_REGLOC_RBI_COMPONENT, &map); >>>> + if (rc) >>>> + dev_dbg(&port->dev, "failed to find component registers\n"); >>>> + >>>> + port_num = FIELD_GET(PCI_EXP_LNKCAP_PN, lnkcap); >>> >>> So, just return port_num instead. >>> >>>> + return devm_cxl_add_dport(port, &pdev->dev, port_num, map.resource); >>>> +} >>>> + >>>> struct cxl_walk_context { >>>> struct pci_bus *bus; >>>> struct cxl_port *port; >>>> @@ -1169,3 +1207,46 @@ int cxl_gpf_port_setup(struct cxl_dport *dport) >>>> >>>> return 0; >>>> } >>>> + >>>> +static int count_dports(struct pci_dev *pdev, void *data) >>>> +{ >>>> + struct cxl_walk_context *ctx = data; >>>> + int type = pci_pcie_type(pdev); >>>> + >>>> + if (pdev->bus != ctx->bus) >>>> + return 0; >>>> + if (!pci_is_pcie(pdev)) >>>> + return 0; >>>> + if (type != ctx->type) >>>> + return 0; >>>> + >>>> + ctx->count++; >>>> + return 0; >>>> +} >>>> + >>>> +int cxl_port_get_possible_dports(struct cxl_port *port) >>>> +{ >>>> + struct pci_bus *bus = cxl_port_to_pci_bus(port); >>>> + struct cxl_walk_context ctx; >>>> + int type; >>>> + >>>> + if (!bus) { >>>> + dev_err(&port->dev, "No PCI bus found for port %s\n", >>>> + dev_name(&port->dev)); >>>> + return -ENXIO; >>>> + } >>>> + >>>> + if (pci_is_root_bus(bus)) >>>> + type = PCI_EXP_TYPE_ROOT_PORT; >>>> + else >>>> + type = PCI_EXP_TYPE_DOWNSTREAM; >>>> + >>>> + ctx = (struct cxl_walk_context) { >>>> + .bus = bus, >>>> + .type = type, >>>> + }; >>>> + pci_walk_bus(bus, count_dports, &ctx); >>> >>> Don't walk the whole bus, just check children of port->uport_dev. >> > >> cxl_port_to_pci_bus() gets the pdev->subordinate of the >> port->uport_dev. So I think that's equivalent of checking the >> children of port->uport_dev and not actually walking the whole pci >> bus no? > > pci_walk_bus() also calls subordinates. So it is equivalent, but > count_dports is called for other devices too that are not children. > And it is not obvious that only direct children are counted. Use > device_for_each_child()? > >>> >>>> + >>>> + return ctx.count; >>>> +} >>>> +EXPORT_SYMBOL_NS_GPL(cxl_port_get_possible_dports, "CXL"); >>> >>> See below for my comment on possible_dports. >>> >>> Since we only check for count > 1 the implemntation could be >>> simplified and renamed to e.g. cxl_port_has_multiple_dports which >>> could easily be used to call devm_cxl_add_passthrough_decoder(). >> > >> This would be possible if the function can return a bool. However, >> it is possible to encounter errors. And errors should not be >> equivalent to a false (0) return value and resulting in a >> passthrough decoder creation. Thus I think we should stay with the >> current function name. > > Ok, but see also above. > >>> >>>> diff --git a/drivers/cxl/core/port.c b/drivers/cxl/core/port.c >>>> index 25209952f469..877f888ee8f5 100644 >>>> --- a/drivers/cxl/core/port.c >>>> +++ b/drivers/cxl/core/port.c >>>> @@ -1367,21 +1367,6 @@ static struct cxl_port *find_cxl_port(struct device *dport_dev, >>>> return port; >>>> } >>>> >>>> -static struct cxl_port *find_cxl_port_at(struct cxl_port *parent_port, >>>> - struct device *dport_dev, >>>> - struct cxl_dport **dport) >>>> -{ >>>> - struct cxl_find_port_ctx ctx = { >>>> - .dport_dev = dport_dev, >>>> - .parent_port = parent_port, >>>> - .dport = dport, >>>> - }; >>>> - struct cxl_port *port; >>>> - >>>> - port = __find_cxl_port(&ctx); >>>> - return port; >>>> -} >>>> - >>>> /* >>>> * All users of grandparent() are using it to walk PCIe-like switch port >>>> * hierarchy. A PCIe switch is comprised of a bridge device representing the >>>> @@ -1557,24 +1542,221 @@ static resource_size_t find_component_registers(struct device *dev) >>>> return map.resource; >>>> } >>>> >>>> +static int match_port_by_uport(struct device *dev, const void *data) >>>> +{ >>>> + const struct device *uport_dev = data; >>>> + struct cxl_port *port; >>>> + >>>> + if (!is_cxl_port(dev)) >>>> + return 0; >>>> + >>>> + port = to_cxl_port(dev); >>>> + return uport_dev == port->uport_dev; >>>> +} >>>> + >>>> +/* >>>> + * Function takes a device reference on the port device. Caller should do a >>>> + * put_device() when done. >>>> + */ >>>> +static struct cxl_port *find_cxl_port_by_uport(struct device *uport_dev) >>>> +{ >>>> + struct device *dev; >>>> + >>>> + dev = bus_find_device(&cxl_bus_type, NULL, uport_dev, match_port_by_uport); >>>> + if (dev) >>>> + return to_cxl_port(dev); >>>> + return NULL; >>>> +} >>>> + >>>> +static int update_decoder_targets(struct device *dev, void *data) >>>> +{ >>>> + struct cxl_dport *dport = data; >>>> + struct cxl_switch_decoder *cxlsd; >>>> + struct cxl_decoder *cxld; >>>> + int i; >>>> + >>>> + if (!is_switch_decoder(dev)) >>>> + return 0; >>>> + >>>> + cxlsd = to_cxl_switch_decoder(dev); >>>> + cxld = &cxlsd->cxld; >>>> + guard(rwsem_write)(&cxl_rwsem.region); >>>> + >>>> + /* Short cut for passthrough decoder */ >>>> + if (cxlsd->nr_targets == 1) { >>> >>> I think we should still check port_id. That is, remove the shortcut. >>> If nr_targets == 1, then interleave_ways should be one too, so you >>> gain nothing. Plus, you also see the dev_dbg(). >> >> ok >> >>> >>>> + cxlsd->target[0] = dport; >>>> + return 0; >>>> + } >>>> + >>>> + for (i = 0; i < cxld->interleave_ways; i++) { >>>> + if (cxld->target_map[i] == dport->port_id) { >>>> + cxlsd->target[i] = dport; >>>> + dev_dbg(dev, "dport%d found in target list, index %d\n", >>>> + dport->port_id, i); >>>> + return 0; >>> >>> Only one target exists, right? Stop the iteration by returning a >>> non-zero here (caller needs to be adjusted then). >> >> ok >> >>> >>>> + } >>>> + } >>>> + >>>> + return 0; >>>> +} >>>> + >>>> +static int cxl_decoders_dport_update(struct cxl_dport *dport) >>>> +{ >>>> + return device_for_each_child(&dport->port->dev, dport, >>>> + update_decoder_targets); >>> >>> Might need changes if update_decoder_targets returns 1 to stop the >>> iterator. >> >> ok >> >>> >>>> +} >>>> + >>>> +static int cxl_switch_port_setup(struct cxl_port *port) >>>> +{ >>> >>> Could you factor out that function in a separate patch? >>> >>> The function only sets up decoders. Name it >>> cxl_switch_port_setup_decoders()? >>> >>>> + struct cxl_hdm *cxlhdm; >>>> + >>>> + cxlhdm = devm_cxl_setup_hdm(port, NULL); >>>> + if (!IS_ERR(cxlhdm)) >>>> + return devm_cxl_enumerate_decoders(cxlhdm, NULL); >>>> + >>>> + if (PTR_ERR(cxlhdm) != -ENODEV) { >>>> + dev_err(&port->dev, "Failed to map HDM decoder capability\n"); >>>> + return PTR_ERR(cxlhdm); >>>> + } >>>> + >>>> + if (port->possible_dports == 1) { >>>> + dev_dbg(&port->dev, "Fallback to passthrough decoder\n"); >>>> + return devm_cxl_add_passthrough_decoder(port); >>> >>> Imo, the possible_dports handling should be removed as it only >>> introduces dead code. mock_cxl_setup_hdm() always returns a valid >>> cxlhdm (unless for -ENOMEM) and the mock case never reaches this code >>> here. >>> >>> So how about moving (the "real") devm_cxl_add_passthrough_decoder() >>> and cxl_port_get_possible_dports() to devm_cxl_enumerate_decoders()? >>> devm_cxl_add_passthrough_decoder() would be static then and >>> cxl_port_get_possible_dports() will be a core.h function only. Then, >>> mock_cxl_add_passthrough_decoder() could be removed too. >>> >>> I really would like to have a clean core module interface that allows >>> an easy implementation of cxl_test and avoid too much impact to the >>> driver code. >> >> So after looking at this a bit, it looks like we need a bigger refactor than just devm_cxl_enumerate_decoders(). I have an attempt in the next rev you can take a look. It reduces from 3-4 mock functions down to 2. >> >>> >>>> + } >>>> + >>>> + dev_err(&port->dev, "HDM decoder capability not found\n"); >>>> + return -ENXIO; >>>> +} >>>> + >>>> +DEFINE_FREE(put_cxl_dport, struct cxl_dport *, if (!IS_ERR_OR_NULL(_T)) reap_dport(_T)) >>>> +static struct cxl_dport *cxl_port_get_or_add_dport(struct cxl_port *port, >>>> + struct device *dport_dev) >>>> +{ >>>> + struct cxl_dport *dport; >>>> + int rc; >>>> + >>>> + guard(device)(&port->dev); >>>> + >>>> + if (!port->dev.driver) >>>> + return ERR_PTR(-ENXIO); >>>> + >>>> + dport = cxl_find_dport_by_dev(port, dport_dev); >>>> + if (dport) >>>> + return dport; >>> >>> What is the case if there is already a dport bound to the port? Since >>> there is a 1:1 mapping downstream, there is only one allocation and I >>> would expect that dport never exists and an -EBUSY should be returned >>> otherwise. >> >> ok >> >>> >>>> + >>>> + struct cxl_dport *new_dport __free(put_cxl_dport) = >>>> + devm_cxl_add_dport_by_dev(port, dport_dev); >>> >>> See my comment on devm_cxl_add_dport_by_dev() above. >>> >>>> + if (IS_ERR(new_dport)) >>>> + return new_dport; >>>> + >>>> + cxl_switch_parse_cdat(port); >>>> + >>>> + /* >>>> + * First instance of dport appearing, need to setup the port, including >>>> + * allocating decoders. >>>> + */ >>>> + if (port->nr_dports == 1) { >>>> + rc = cxl_switch_port_setup(port); >>> >>> Can't this be done with port creation? I don't see a reason doing this >>> late at this point. >>> >>>> + if (rc) >>>> + return ERR_PTR(rc); >>>> + return no_free_ptr(new_dport); >>>> + } >>>> + >>>> + rc = cxl_decoders_dport_update(new_dport); >>>> + if (rc) >>>> + return ERR_PTR(rc); >>> >>> Maybe unfold cxl_decoders_dport_update() here? >> >> ok >> >>> >>>> + >>>> + return no_free_ptr(new_dport); >>>> +} >>>> + >>>> +static struct cxl_dport *devm_cxl_add_dport_by_uport(struct device *uport_dev, >>>> + struct device *dport_dev) >>>> +{ >>>> + struct cxl_port *port __free(put_cxl_port) = >>>> + find_cxl_port_by_uport(uport_dev); >>>> + >>>> + if (!port) >>>> + return ERR_PTR(-ENODEV); >>>> + >>>> + return cxl_port_get_or_add_dport(port, dport_dev); >>>> +} >>> >>> That function can be removed, see below. >> >> ok >> >>> >>>> + >>>> +static struct cxl_dport * >>>> +devm_cxl_create_or_extend_port(struct device *ep_dev, >>>> + struct cxl_port *parent_port, >>>> + struct cxl_dport *parent_dport, >>>> + struct device *uport_dev, >>>> + struct device *dport_dev) >>>> +{ >>>> + resource_size_t component_reg_phys; >>>> + >>>> + guard(device)(&parent_port->dev); >>>> + >>>> + if (!parent_port->dev.driver) { >>>> + dev_warn(ep_dev, >>>> + "port %s:%s disabled, failed to enumerate CXL.mem\n", >>>> + dev_name(&parent_port->dev), dev_name(uport_dev)); >>>> + return ERR_PTR(-ENXIO); >>>> + } >>>> + >>>> + struct cxl_port *port __free(put_cxl_port) = >>>> + find_cxl_port_by_uport(uport_dev); >>>> + >>>> + if (!port) { >>>> + component_reg_phys = find_component_registers(uport_dev); >>>> + port = devm_cxl_add_port(&parent_port->dev, uport_dev, >>>> + component_reg_phys, parent_dport); >>>> + if (IS_ERR(port)) >>>> + return (struct cxl_dport *)port; >>>> + >>>> + /* >>>> + * retry to make sure a port is found. a port device >>>> + * reference is taken. >>>> + */ >>>> + port = find_cxl_port_by_uport(uport_dev); >>>> + if (!port) >>>> + return ERR_PTR(-ENODEV); >>>> + >>>> + dev_dbg(ep_dev, "created port %s:%s\n", >>>> + dev_name(&port->dev), dev_name(port->uport_dev)); >>>> + } >>>> + >>>> + return cxl_port_get_or_add_dport(port, dport_dev); >>>> +} >>>> + >>>> static int add_port_attach_ep(struct cxl_memdev *cxlmd, >>>> struct device *uport_dev, >>>> struct device *dport_dev) >>>> { >>>> struct device *dparent = grandparent(dport_dev); >>>> struct cxl_dport *dport, *parent_dport; >>>> - resource_size_t component_reg_phys; >>>> int rc; >>>> >>>> if (is_cxl_host_bridge(dparent)) { >>>> + struct cxl_port *port __free(put_cxl_port) = >>>> + find_cxl_port_by_uport(uport_dev); >>>> /* >>>> * The iteration reached the topology root without finding the >>>> * CXL-root 'cxl_port' on a previous iteration, fail for now to >>>> * be re-probed after platform driver attaches. >>>> */ >>>> - dev_dbg(&cxlmd->dev, "%s is a root dport\n", >>>> - dev_name(dport_dev)); >>>> - return -ENXIO; >>>> + if (!port) { >>>> + dev_dbg(&cxlmd->dev, "%s is a root dport\n", >>>> + dev_name(dport_dev)); >>>> + return -ENXIO; >>>> + } >>>> + >>>> + /* >>>> + * While the port is found, there may not be a dport associated >>>> + * yet. Try to associate the dport to the port. On return success, >>>> + * the iteration will restart with the dport now attached. >>>> + */ >>>> + dport = devm_cxl_add_dport_by_uport(uport_dev, >>>> + dport_dev); >>> >>> port is known here, use cxl_port_get_or_add_dport(port, dport_dev) >>> instead. Remove devm_cxl_add_dport_by_uport(). >> >> ok >> >>> >>>> + if (IS_ERR(dport)) >>>> + return PTR_ERR(dport); >>>> + >>>> + return 0; >>>> } >>>> >>>> struct cxl_port *parent_port __free(put_cxl_port) = >>>> @@ -1584,36 +1766,12 @@ static int add_port_attach_ep(struct cxl_memdev *cxlmd, >>>> return -EAGAIN; >>>> } >>>> >>>> - /* >>>> - * Definition with __free() here to keep the sequence of >>>> - * dereferencing the device of the port before the parent_port releasing. >>>> - */ >>>> - struct cxl_port *port __free(put_cxl_port) = NULL; >>>> - scoped_guard(device, &parent_port->dev) { >>>> - if (!parent_port->dev.driver) { >>>> - dev_warn(&cxlmd->dev, >>>> - "port %s:%s disabled, failed to enumerate CXL.mem\n", >>>> - dev_name(&parent_port->dev), dev_name(uport_dev)); >>>> - return -ENXIO; >>>> - } >>>> + dport = devm_cxl_create_or_extend_port(&cxlmd->dev, parent_port, >>>> + parent_dport, uport_dev, >>>> + dport_dev); >>> >>> You expand add_port_attach_ep() here. This function was originally >>> called if there is no *port* at all. Now, as the dport_dev is not yet >>> registered, the port may already exist, but it is not found since the >>> dport_dev is not yet registered and add_port_attach_ep() is called now >>> even if the port exists. I think we should move that dport_dev >>> registration a level higher to devm_cxl_enumerate_ports(). That might >>> need a cleanup of the iterator and the removal of >>> add_port_attach_ep(). >> >> Yes. The new rev will move the dport registration a level up. No need to remove add_port_attach_ep(). devm_cxl_create_or_extend_port() will be devm_cxl_create_port(). >> >>> >>>> + if (IS_ERR(dport)) >>>> + return PTR_ERR(dport); >>>> >>>> - port = find_cxl_port_at(parent_port, dport_dev, &dport); >>>> - if (!port) { >>>> - component_reg_phys = find_component_registers(uport_dev); >>>> - port = devm_cxl_add_port(&parent_port->dev, uport_dev, >>>> - component_reg_phys, parent_dport); >>>> - if (IS_ERR(port)) >>>> - return PTR_ERR(port); >>>> - >>>> - /* retry find to pick up the new dport information */ >>>> - port = find_cxl_port_at(parent_port, dport_dev, &dport); >>>> - if (!port) >>>> - return -ENXIO; >>>> - } >>>> - } >>>> - >>>> - dev_dbg(&cxlmd->dev, "add to new port %s:%s\n", >>>> - dev_name(&port->dev), dev_name(port->uport_dev)); >>>> rc = cxl_add_ep(dport, &cxlmd->dev); >>>> if (rc == -EBUSY) { >>>> /* >>>> @@ -1630,6 +1788,7 @@ int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd) >>>> { >>>> struct device *dev = &cxlmd->dev; >>>> struct device *iter; >>>> + int ports_need_create = 0; >>>> int rc; >>>> >>>> /* >>>> @@ -1654,6 +1813,8 @@ int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd) >>>> struct device *uport_dev; >>>> struct cxl_dport *dport; >>>> >>>> + ports_need_create++; >>>> + >>>> if (is_cxl_host_bridge(dport_dev)) >>>> return 0; >>>> >>>> @@ -1688,10 +1849,28 @@ int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd) >>>> >>>> cxl_gpf_port_setup(dport); >>>> >>>> + ports_need_create--; >>>> /* Any more ports to add between this one and the root? */ >>>> if (!dev_is_cxl_root_child(&port->dev)) >>>> continue; >>>> >>>> + /* >>>> + * The 'ports_need_create' variable tracks a port being >>>> + * created as it goes through this iterative loop. It's >>>> + * incremented when it first enters the loop and decremented >>>> + * when the port is found. If at the root of the hierarchy >>>> + * and the variable is not 0, then it's missing a port >>>> + * creation somewhere in the hierarchy and should restart. >>>> + * For example in a setup where there's a PCI root port, a >>>> + * switch, and an endpoint, it is possible to get to the >>>> + * PCI root port and its creation, and the switch port is >>>> + * still missing because the root port didn't exist. This >>>> + * triggers a restart of the loop to create the switch port >>>> + * now with a present root port. >>>> + */ >>>> + if (ports_need_create) >>> >>> Uh, that becomes hard. Isn't the iterator much simpler: >>> >>> * Start the iter = endpoint. >>> >>> * Find first existing parent port up to the root. >>> >>> * If that is the direct parent of the endpoint, attach it to the >>> parent (add dport etc.). Exit loop without errors. >>> >>> * Else, create port and attach it to the found parent port (including >>> dport handling). >>> >>> * Fail on errors or retry otherwise. >>> >>> So, devm_cxl_enumerate_ports() should be reworked better, also address >>> my other comments regarding add_port_attach_ep() and >>> devm_cxl_create_or_extend_port(). >> >> So I reworked this whole path a bit. Maybe not exactly what you are envisioning here but it is a lot cleaner. You can take a look at the next rev. >> >>> >>>> + goto retry; >>>> + >>>> return 0; >>>> } >>>> >>>> @@ -1700,8 +1879,10 @@ int devm_cxl_enumerate_ports(struct cxl_memdev *cxlmd) >>>> if (rc == -EAGAIN) >>>> continue; >>>> /* failed to add ep or port */ >>>> - if (rc) >>>> + if (rc < 0) >>>> return rc; >>>> + >>>> + ports_need_create = 0; >>>> /* port added, new descendants possible, start over */ >>>> goto retry; >>>> } >>>> @@ -1733,14 +1914,16 @@ static int decoder_populate_targets(struct cxl_switch_decoder *cxlsd, >>>> device_lock_assert(&port->dev); >>>> >>>> if (xa_empty(&port->dports)) >>>> - return -EINVAL; >>>> + return 0; >>>> >>>> guard(rwsem_write)(&cxl_rwsem.region); >>>> for (i = 0; i < cxlsd->cxld.interleave_ways; i++) { >>>> struct cxl_dport *dport = find_dport(port, cxld->target_map[i]); >>>> >>>> - if (!dport) >>>> - return -ENXIO; >>>> + if (!dport) { >>>> + /* dport may be activated later */ >>>> + continue; >>>> + } >>>> cxlsd->target[i] = dport; >>>> } >>> >>> Should that be dropped entirely as the target setup is done somewhere >>> else? >>> >> No. This is still needed for root ports. > > Root ports are dport_devs and don't have a target list, host bridges > have. I did not follow the entire code flow, but shouldn't > cxl_decoders_dport_update() handle that? > > -Robert > >> >> DJ >>>> >>>> diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c >>>> index 71cc42d05248..bba62867df90 100644 >>>> --- a/drivers/cxl/core/region.c >>>> +++ b/drivers/cxl/core/region.c >>>> @@ -1510,8 +1510,10 @@ static int cxl_port_setup_targets(struct cxl_port *port, >>>> cxl_rr->nr_targets_set); >>>> return -ENXIO; >>>> } >>>> - } else >>>> + } else { >>>> cxlsd->target[cxl_rr->nr_targets_set] = ep->dport; >>>> + cxlsd->cxld.target_map[cxl_rr->nr_targets_set] = ep->dport->port_id; >>>> + } >>>> inc = 1; >>>> out_target_set: >>>> cxl_rr->nr_targets_set += inc; >>>> diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h >>>> index 87a905db5ffb..df10a01376c6 100644 >>>> --- a/drivers/cxl/cxl.h >>>> +++ b/drivers/cxl/cxl.h >>>> @@ -591,6 +591,7 @@ struct cxl_dax_region { >>>> * @parent_dport: dport that points to this port in the parent >>>> * @decoder_ida: allocator for decoder ids >>>> * @reg_map: component and ras register mapping parameters >>>> + * @possible_dports: Total possible dports reported by hardware >>>> * @nr_dports: number of entries in @dports >>>> * @hdm_end: track last allocated HDM decoder instance for allocation ordering >>>> * @commit_end: cursor to track highest committed decoder for commit ordering >>>> @@ -612,6 +613,7 @@ struct cxl_port { >>>> struct cxl_dport *parent_dport; >>>> struct ida decoder_ida; >>>> struct cxl_register_map reg_map; >>>> + int possible_dports; >>>> int nr_dports; >>>> int hdm_end; >>>> int commit_end; >>>> @@ -911,6 +913,7 @@ void cxl_coordinates_combine(struct access_coordinate *out, >>>> struct access_coordinate *c2); >>>> >>>> bool cxl_endpoint_decoder_reset_detected(struct cxl_port *port); >>>> +int cxl_port_get_possible_dports(struct cxl_port *port); >>>> >>>> /* >>>> * Unit test builds overrides this to __weak, find the 'strong' version >>>> diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c >>>> index cf32dc50b7a6..941a7d7157bd 100644 >>>> --- a/drivers/cxl/port.c >>>> +++ b/drivers/cxl/port.c >>>> @@ -59,34 +59,17 @@ static int discover_region(struct device *dev, void *unused) >>>> >>>> static int cxl_switch_port_probe(struct cxl_port *port) >>>> { >>>> - struct cxl_hdm *cxlhdm; >>>> - int rc; >>>> + int dports; >>>> >>>> /* Cache the data early to ensure is_visible() works */ >>>> read_cdat_data(port); >>>> >>>> - rc = devm_cxl_port_enumerate_dports(port); >>>> - if (rc < 0) >>>> - return rc; >>>> + dports = cxl_port_get_possible_dports(port); >>>> + if (dports < 0) >>>> + return dports; >>>> + port->possible_dports = dports; >>> >>> As said, I think the whole possible_dports part can be removed. >>> >>> Thanks, >>> >>> -Robert >>> >>>> >>>> - cxl_switch_parse_cdat(port); >>>> - >>>> - cxlhdm = devm_cxl_setup_hdm(port, NULL); >>>> - if (!IS_ERR(cxlhdm)) >>>> - return devm_cxl_enumerate_decoders(cxlhdm, NULL); >>>> - >>>> - if (PTR_ERR(cxlhdm) != -ENODEV) { >>>> - dev_err(&port->dev, "Failed to map HDM decoder capability\n"); >>>> - return PTR_ERR(cxlhdm); >>>> - } >>>> - >>>> - if (rc == 1) { >>>> - dev_dbg(&port->dev, "Fallback to passthrough decoder\n"); >>>> - return devm_cxl_add_passthrough_decoder(port); >>>> - } >>>> - >>>> - dev_err(&port->dev, "HDM decoder capability not found\n"); >>>> - return -ENXIO; >>>> + return 0; >>>> } >>>> >>>> static int cxl_endpoint_port_probe(struct cxl_port *port) >>>> -- >>>> 2.50.1 >>>> >>