From: Jakub Kicinski
To: illusion.wang@nebula-matrix.com
Cc: Jakub Kicinski, dimon.zhao@nebula-matrix.com, alvin.wang@nebula-matrix.com,
 sam.chen@nebula-matrix.com, netdev@vger.kernel.org, andrew+netdev@lunn.ch,
 corbet@lwn.net, linux-doc@vger.kernel.org, lorenzo@kernel.org,
 pabeni@redhat.com, horms@kernel.org, vadim.fedorenko@linux.dev,
 lukas.bulwahn@redhat.com, edumazet@google.com, enelsonmoore@gmail.com,
 skhan@linuxfoundation.org, hkallweit1@gmail.com, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v13 net-next 07/11] net/nebula-matrix: add intr resource implementation
Date: Thu, 30 Apr 2026 16:47:46 -0700
Message-ID: <20260430234746.3074632-1-kuba@kernel.org>
In-Reply-To: <20260428114910.2616-8-illusion.wang@nebula-matrix.com>
References: <20260428114910.2616-8-illusion.wang@nebula-matrix.com>

This is an AI-generated review of your patch. The human sending this email
has considered the AI review valid, or at least plausible.
---
net/nebula-matrix: add intr resource implementation

This patch adds an interrupt resource module that tracks MSI-X vector
allocation per function via two bitmaps and a per-function array, and
introduces three resource ops (configure_msix_map, destroy_msix_map,
enable_mailbox_irq) plus the matching hw_ops (configure_msix_map,
configure_msix_info, enable_mailbox_irq) that program the device MSI-X
mapping and info registers.

A few questions below on the new code.

> diff --git a/drivers/net/ethernet/nebula-matrix/nbl/nbl_hw/nbl_hw_leonis/nbl_resource_leonis.c b/drivers/net/ethernet/nebula-matrix/nbl/nbl_hw/nbl_hw_leonis/nbl_resource_leonis.c
> index 1ec1c28d99ff..1cceb0df5e1e 100644
> --- a/drivers/net/ethernet/nebula-matrix/nbl/nbl_hw/nbl_hw_leonis/nbl_resource_leonis.c
> +++ b/drivers/net/ethernet/nebula-matrix/nbl/nbl_hw/nbl_hw_leonis/nbl_resource_leonis.c
> @@ -12,6 +12,7 @@ static struct nbl_resource_ops res_ops = {
>  	.get_eth_id = nbl_res_get_eth_id,
>  };
>  
> +static bool is_ops_inited;

Is this lazy initialization of a file-scope vtable safe against concurrent
probes? nbl_res_setup_ops() does an unlocked check-then-set on is_ops_inited
and then mutates the file-scope res_ops by calling nbl_intr_setup_ops():

	if (!is_ops_inited) {
		ret = nbl_intr_setup_ops(&res_ops);
		if (ret)
			return ERR_PTR(-ENOMEM);
		is_ops_inited = true;
	}

Two probe threads for different functions on the same device could both
observe is_ops_inited == false and race on writing the same pointer fields
into res_ops. Would a static designated initializer in res_ops (keeping
res_ops const and dropping is_ops_inited / nbl_intr_setup_ops entirely)
remove the race and simplify the scaffolding?

Related: nbl_intr_setup_ops() only ever returns 0, so the caller's
"if (ret) return ERR_PTR(-ENOMEM);" is unreachable, and mapping any future
failure to -ENOMEM would also be incorrect. Could this function be void?

[ ... ]

> diff --git a/drivers/net/ethernet/nebula-matrix/nbl/nbl_hw/nbl_interrupt.c b/drivers/net/ethernet/nebula-matrix/nbl/nbl_hw/nbl_interrupt.c
> new file mode 100644
> index 000000000000..92a6fff5523e
> --- /dev/null
> +++ b/drivers/net/ethernet/nebula-matrix/nbl/nbl_hw/nbl_interrupt.c
> @@ -0,0 +1,246 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (c) 2025 Nebula Matrix Limited.
> + */
> +#include
> +#include
> +#include "nbl_interrupt.h"
> +
> +static int nbl_res_intr_destroy_msix_map(struct nbl_resource_mgt *res_mgt,
> +					 u16 func_id)
> +{
> +	struct nbl_interrupt_mgt *intr_mgt = res_mgt->intr_mgt;
> +	struct nbl_hw_ops *hw_ops = res_mgt->hw_ops_tbl->ops;
> +	struct device *dev = res_mgt->common->dev;
> +	struct nbl_msix_map_table *msix_map_table;
> +	u16 *interrupts;
> +	u16 intr_num, i;
> +
> +	/* use ctrl dev bdf */
> +	hw_ops->configure_msix_map(res_mgt->hw_ops_tbl->priv, func_id, false, 0,
> +				   0, 0, 0);
> +
> +	intr_num = intr_mgt->func_intr_res[func_id].num_interrupts;
> +	interrupts = intr_mgt->func_intr_res[func_id].interrupts;

Is func_id bounds-checked anywhere on this path? intr_mgt->func_intr_res[]
is sized NBL_MAX_FUNC (520) but the u16 func_id (0..65535) is indexed
directly, and nbl_res_func_id_to_bdf() (which validates against max_pf) is
not called here. If func_id arrives from a mailbox message in a later patch,
can this index out of bounds?

The same question applies to nbl_res_intr_enable_mailbox_irq() and to the
first use of func_intr_res[func_id] in nbl_res_intr_configure_msix_map()
below, which happens before nbl_res_func_id_to_bdf() is called.
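For concreteness, a minimal sketch of the kind of guard being asked about
(the helper name is illustrative only; it assumes NBL_MAX_FUNC is the size
of func_intr_res[], as described above):

	/* hypothetical helper: reject func_ids beyond the per-function array */
	static bool nbl_res_func_id_valid(u16 func_id)
	{
		return func_id < NBL_MAX_FUNC;
	}

	static int nbl_res_intr_destroy_msix_map(struct nbl_resource_mgt *res_mgt,
						 u16 func_id)
	{
		if (!nbl_res_func_id_valid(func_id))
			return -EINVAL;
		/* ... existing body ... */
	}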
> +
> +	if (!interrupts) {
> +		dev_err(dev, "No interrupts to clr for func_id %u\n", func_id);
> +		return -EINVAL;
> +	}
> +	for (i = 0; i < intr_num; i++) {
> +		if (interrupts[i] >= NBL_MAX_OTHER_INTERRUPT)
> +			clear_bit(interrupts[i] - NBL_MAX_OTHER_INTERRUPT,
> +				  intr_mgt->interrupt_net_bitmap);
> +		else
> +			clear_bit(interrupts[i],
> +				  intr_mgt->interrupt_others_bitmap);

What serializes access to intr_mgt->interrupt_net_bitmap,
intr_mgt->interrupt_others_bitmap and intr_mgt->func_intr_res[func_id]
across concurrent invocations of these three ops? set_bit() / clear_bit()
are individually atomic, but the sequence used in
nbl_res_intr_configure_msix_map():

	intr_index = find_first_zero_bit(intr_mgt->interrupt_net_bitmap,
					 NBL_MAX_NET_INTERRUPT);
	...
	set_bit(intr_index, intr_mgt->interrupt_net_bitmap);

is a non-atomic read-modify-write. Two concurrent calls for different
func_ids could each observe the same zero bit and both claim it, resulting
in two functions being programmed with the same global MSI-X index.

The per-function state has a similar concern: configure, destroy and
enable_mailbox_irq all read/write func_intr_res[func_id].interrupts,
num_interrupts and msix_map_table without any lock held. Concurrent callers
on the same func_id can hit a TOCTOU race between the "if (interrupts)
destroy" check and the subsequent re-allocation, double-kfree() the
interrupts array, or call dma_free_coherent() twice. enable_mailbox_irq
dereferences interrupts[vector_id], which destroy_msix_map has just freed
and set to NULL -- is this a use-after-free? The
WARN_ON(msix_map_table->base_addr) below suggests single-threaded use is
assumed; what enforces that?

[ ... ]

> +	kfree(interrupts);
> +	intr_mgt->func_intr_res[func_id].interrupts = NULL;
> +	intr_mgt->func_intr_res[func_id].num_interrupts = 0;
> +
> +	msix_map_table = &intr_mgt->func_intr_res[func_id].msix_map_table;
> +	dma_free_coherent(dev, msix_map_table->size, msix_map_table->base_addr,
> +			  msix_map_table->dma);

Is there a teardown path that iterates over func_intr_res[] and calls
destroy_msix_map() for each active func_id on module unload / device
remove? The interrupts array (kcalloc) and the msix_map_table
(dma_alloc_coherent) allocated in configure_msix_map are not devm-managed,
and nbl_res_remove_leonis() does not walk func_intr_res[]. Any func_id
still mapped at unload time would leak both allocations and leave a stale
bus address programmed in the hardware MSI-X map register.
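A minimal sketch of the kind of teardown walk meant here (the helper name
and its call site are hypothetical; it assumes NBL_MAX_FUNC sizes
func_intr_res[] and that a non-NULL interrupts pointer marks an active
mapping):

	static void nbl_res_intr_remove_all(struct nbl_resource_mgt *res_mgt)
	{
		struct nbl_interrupt_mgt *intr_mgt = res_mgt->intr_mgt;
		u16 func_id;

		/* release every mapping still programmed at remove time */
		for (func_id = 0; func_id < NBL_MAX_FUNC; func_id++)
			if (intr_mgt->func_intr_res[func_id].interrupts)
				nbl_res_intr_destroy_msix_map(res_mgt, func_id);
	}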
> +	msix_map_table->size = 0;
> +	msix_map_table->base_addr = NULL;
> +	msix_map_table->dma = 0;
> +
> +	return 0;
> +}
> +
> +static int nbl_res_intr_configure_msix_map(struct nbl_resource_mgt *res_mgt,
> +					   u16 func_id, u16 num_net_msix,
> +					   u16 num_others_msix,
> +					   bool net_msix_mask_en)
> +{
> +	struct nbl_interrupt_mgt *intr_mgt = res_mgt->intr_mgt;
> +	struct nbl_hw_ops *hw_ops = res_mgt->hw_ops_tbl->ops;
> +	struct nbl_common_info *common = res_mgt->common;
> +	struct nbl_msix_map_table *msix_map_table;
> +	struct nbl_msix_map *msix_map_entries;
> +	struct device *dev = common->dev;
> +	u16 requested, intr_index;
> +	u8 bus, devid, function;
> +	bool msix_mask_en;
> +	u16 *interrupts;
> +	int ret = 0;
> +	u16 i;
> +
> +	requested = num_net_msix + num_others_msix;
> +	if (requested > NBL_MSIX_MAP_TABLE_MAX_ENTRIES)
> +		return -EINVAL;
> +	if (intr_mgt->func_intr_res[func_id].interrupts)
> +		nbl_res_intr_destroy_msix_map(res_mgt, func_id);

Is this reconfigure ordering intentional? The existing mapping is torn down
(dma_free_coherent, kfree, and hw_ops->configure_msix_map(valid=false))
before the new allocations are attempted. If the subsequent
dma_alloc_coherent or kcalloc fails, the function returns -ENOMEM with the
previously working interrupt setup already gone and no rollback. Would
allocate-then-swap be safer?

> +
> +	ret = nbl_res_func_id_to_bdf(res_mgt, func_id, &bus, &devid, &function);
> +	if (ret == U32_MAX)
> +		return -EINVAL;

nbl_res_func_id_to_bdf() is declared to return int but returns U32_MAX on
error, and the caller stores it in int ret and compares against U32_MAX.
This currently works only because the out-of-range return value wraps to -1
in the int and the unsigned comparison converts it back to 0xFFFFFFFF. If
the callee is later changed to return a normal negative errno (for example
-EINVAL), will this check silently stop detecting failures and let
uninitialised bus/devid/function be programmed into hardware?

> +
> +	msix_map_table = &intr_mgt->func_intr_res[func_id].msix_map_table;
> +	WARN_ON(msix_map_table->base_addr);
> +	msix_map_table->size =
> +		sizeof(struct nbl_msix_map) * NBL_MSIX_MAP_TABLE_MAX_ENTRIES;
> +	msix_map_table->base_addr = dma_alloc_coherent(dev,
> +						       msix_map_table->size,
> +						       &msix_map_table->dma,
> +						       GFP_KERNEL);

[ ... ]

> +	msix_map_entries = msix_map_table->base_addr;
> +	for (i = 0; i < requested; i++) {
> +		msix_map_entries[i].global_msix_index = interrupts[i];
> +		msix_map_entries[i].valid = 1;
> +
> +		if (i < num_net_msix && net_msix_mask_en)
> +			msix_mask_en = 1;
> +		else
> +			msix_mask_en = 0;
> +		hw_ops->configure_msix_info(res_mgt->hw_ops_tbl->priv, func_id,
> +					    true, interrupts[i], bus, devid,
> +					    function, msix_mask_en);
> +	}
> +
> +	/* use ctrl dev bdf */
> +	hw_ops->configure_msix_map(res_mgt->hw_ops_tbl->priv, func_id, true,
> +				   msix_map_table->dma, common->hw_bus,
> +				   common->devid, common->function);

Is a dma_wmb() (or equivalent) needed between the stores into the
dma_alloc_coherent table (msix_map_entries[i] fields above) and the MMIO
writel() in hw_ops->configure_msix_map that arms the device to DMA-read
that table? dma_alloc_coherent guarantees cache coherence but not ordering
between CPU stores into the coherent buffer and subsequent MMIO stores on
weakly-ordered architectures such as arm64 and ppc.

> +
> +	return 0;

[ ... ]

> +static int nbl_res_intr_enable_mailbox_irq(struct nbl_resource_mgt *res_mgt,
> +					   u16 func_id, u16 vector_id,
> +					   bool enable_msix)
> +{
> +	struct nbl_interrupt_mgt *intr_mgt = res_mgt->intr_mgt;
> +	struct nbl_hw_ops *hw_ops = res_mgt->hw_ops_tbl->ops;
> +	u16 global_vec_id;
> +
> +	global_vec_id = intr_mgt->func_intr_res[func_id].interrupts[vector_id];

Is this dereference guarded anywhere? There is no check that
func_intr_res[func_id].interrupts is non-NULL or that vector_id is less
than num_interrupts. After destroy_msix_map() runs, interrupts is NULL; and
before configure_msix_map() has ever run for that func_id it is also NULL,
so this becomes a NULL-pointer read at offset 2 * vector_id. When
interrupts is non-NULL but vector_id is out of range, this is an
out-of-bounds slab read whose value is then programmed into a hardware
MSI-X register via hw_ops->enable_mailbox_irq().
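A minimal sketch of the kind of guard being asked about (illustrative only,
on top of the func_id range check raised earlier; it assumes num_interrupts
always matches the size of the allocated interrupts array):

	static int nbl_res_intr_enable_mailbox_irq(struct nbl_resource_mgt *res_mgt,
						   u16 func_id, u16 vector_id,
						   bool enable_msix)
	{
		struct nbl_interrupt_mgt *intr_mgt = res_mgt->intr_mgt;
		struct nbl_func_interrupt_resource_mng *func_res;

		func_res = &intr_mgt->func_intr_res[func_id];
		/* no mapping configured yet, or vector_id past the end of the array */
		if (!func_res->interrupts || vector_id >= func_res->num_interrupts)
			return -EINVAL;
		/* ... existing body ... */
	}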
> +	hw_ops->enable_mailbox_irq(res_mgt->hw_ops_tbl->priv, func_id,
> +				   enable_msix, global_vec_id);
> +
> +	return 0;
> +}
> +
> +/* NBL_INTR_SET_OPS(ops_name, func)
> + *
> + * Use X Macros to reduce setup and remove codes.
> + */
> +#define NBL_INTR_OPS_TBL					\
> +do {								\
> +	NBL_INTR_SET_OPS(configure_msix_map,			\
> +			 nbl_res_intr_configure_msix_map);	\
> +	NBL_INTR_SET_OPS(destroy_msix_map,			\
> +			 nbl_res_intr_destroy_msix_map);	\
> +	NBL_INTR_SET_OPS(enable_mailbox_irq,			\
> +			 nbl_res_intr_enable_mailbox_irq);	\
> +} while (0)

This isn't a bug, but could this X-macro scheme be replaced with a plain
designated initializer on a const struct nbl_resource_ops? The indirection
through NBL_INTR_OPS_TBL / NBL_INTR_SET_OPS and the NBL_NAME(x) helper
(whose stated purpose is "Used for macros to pass checkpatch") add no type
safety over:

	static const struct nbl_resource_ops res_ops = {
		.configure_msix_map = nbl_res_intr_configure_msix_map,
		.destroy_msix_map = nbl_res_intr_destroy_msix_map,
		.enable_mailbox_irq = nbl_res_intr_enable_mailbox_irq,
		...
	};

[ ... ]

> diff --git a/drivers/net/ethernet/nebula-matrix/nbl/nbl_hw/nbl_resource.h b/drivers/net/ethernet/nebula-matrix/nbl/nbl_hw/nbl_resource.h
> index 90c78aefa498..d4a1374e0dd2 100644
> --- a/drivers/net/ethernet/nebula-matrix/nbl/nbl_hw/nbl_resource.h
> +++ b/drivers/net/ethernet/nebula-matrix/nbl/nbl_hw/nbl_resource.h
> @@ -18,6 +18,37 @@
>  
>  struct nbl_resource_mgt;
>  
> +/* --------- INTERRUPT ---------- */
> +#define NBL_MAX_OTHER_INTERRUPT		1024
> +#define NBL_MAX_NET_INTERRUPT		4096
> +
> +struct nbl_msix_map {
> +	u16 valid:1;
> +	u16 global_msix_index:13;
> +	u16 rsv:2;
> +};

Is it safe to rely on C bitfield layout for a structure that is written
into dma_alloc_coherent memory and then consumed by the device? The bit
ordering of bitfields within a storage unit is implementation-defined, and
GCC reverses bit ordering on big-endian targets. The same question applies
to nbl_mailbox_qinfo_map_table, nbl_host_msix_info,
nbl_pcompleter_host_msix_fid_table and nbl_function_msix_map, which are
assembled with C bitfields and then cast to u32 * to be written into MMIO
registers in nbl_hw_leonis.c. Would explicit FIELD_PREP / shifts-and-masks
or __le* encoding be more portable? (A small encoding sketch follows at the
end of this mail.)

> +
> +struct nbl_msix_map_table {
> +	struct nbl_msix_map *base_addr;
> +	dma_addr_t dma;
> +	size_t size;
> +};
> +
> +struct nbl_func_interrupt_resource_mng {
> +	u16 num_interrupts;
> +	u16 num_net_interrupts;
> +	u16 msix_base;
> +	u16 msix_max;

msix_base and msix_max do not appear to be read or written anywhere in this
patch. Could these be added in the patch that first uses them?

> +	u16 *interrupts;
> +	struct nbl_msix_map_table msix_map_table;
> +};

[ ... ]

One last question on the commit message:

	MSI-X Interrupt Configuration: Dynamically allocate and manage MSI-X
	interrupt vectors, including network interrupts and other types of
	interrupts.
	...
	Interrupt Information Query: Provide interfaces to obtain the
	hardware register addresses and data of interrupts.

The patch programs the device-private MSI-X mapping and info registers and
tracks index bitmaps, but there is no pci_alloc_irq_vectors(), no
request_irq(), no IRQ handler, and no "interrupt information query" getter
exposed through nbl_resource_ops -- only configure_msix_map,
destroy_msix_map and enable_mailbox_irq are added. Could the commit message
be narrowed to what this patch actually implements?
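On the bitfield question above, a minimal encoding sketch of the FIELD_PREP
alternative (the bit positions are assumptions for illustration only; the
real layout has to come from the device documentation):

	#include <linux/bitfield.h>

	/* illustrative layout: valid at bit 0, global index in bits 1..13 */
	#define NBL_MSIX_MAP_VALID	BIT(0)
	#define NBL_MSIX_MAP_INDEX	GENMASK(13, 1)

	static __le16 nbl_msix_map_encode(u16 global_msix_index)
	{
		u16 v = FIELD_PREP(NBL_MSIX_MAP_VALID, 1) |
			FIELD_PREP(NBL_MSIX_MAP_INDEX, global_msix_index);

		/* fixed byte order on the wire, independent of host endianness */
		return cpu_to_le16(v);
	}

The same pattern would also cover the register words that are currently
assembled with bitfields and cast to u32 * before being written to MMIO.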