public inbox for linux-kernel@vger.kernel.org
From: Jakub Kicinski <kuba@kernel.org>
To: illusion.wang@nebula-matrix.com
Cc: Jakub Kicinski <kuba@kernel.org>,
	dimon.zhao@nebula-matrix.com, alvin.wang@nebula-matrix.com,
	sam.chen@nebula-matrix.com, netdev@vger.kernel.org,
	andrew+netdev@lunn.ch, corbet@lwn.net, linux-doc@vger.kernel.org,
	lorenzo@kernel.org, pabeni@redhat.com, horms@kernel.org,
	vadim.fedorenko@linux.dev, lukas.bulwahn@redhat.com,
	edumazet@google.com, enelsonmoore@gmail.com,
	skhan@linuxfoundation.org, hkallweit1@gmail.com,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v13 net-next 07/11] net/nebula-matrix: add intr resource implementation
Date: Thu, 30 Apr 2026 16:47:46 -0700	[thread overview]
Message-ID: <20260430234746.3074632-1-kuba@kernel.org> (raw)
In-Reply-To: <20260428114910.2616-8-illusion.wang@nebula-matrix.com>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
net/nebula-matrix: add intr resource implementation

This patch adds an interrupt resource module that tracks MSI-X vector
allocation per function via two bitmaps and a per-function array, and
introduces three resource ops (configure_msix_map, destroy_msix_map,
enable_mailbox_irq) plus the matching hw_ops (configure_msix_map,
configure_msix_info, enable_mailbox_irq) that program the device MSI-X
mapping and info registers.

A few questions below on the new code.

> diff --git a/drivers/net/ethernet/nebula-matrix/nbl/nbl_hw/nbl_hw_leonis/nbl_resource_leonis.c b/drivers/net/ethernet/nebula-matrix/nbl/nbl_hw/nbl_hw_leonis/nbl_resource_leonis.c
> index 1ec1c28d99ff..1cceb0df5e1e 100644
> --- a/drivers/net/ethernet/nebula-matrix/nbl/nbl_hw/nbl_hw_leonis/nbl_resource_leonis.c
> +++ b/drivers/net/ethernet/nebula-matrix/nbl/nbl_hw/nbl_hw_leonis/nbl_resource_leonis.c
> @@ -12,6 +12,7 @@ static struct nbl_resource_ops res_ops = {
>  	.get_eth_id = nbl_res_get_eth_id,
>  };
>  
> +static bool is_ops_inited;

Is this lazy initialization of a file-scope vtable safe against concurrent
probes?

nbl_res_setup_ops() does an unlocked check-then-set on is_ops_inited and
then mutates the file-scope res_ops by calling nbl_intr_setup_ops():

	if (!is_ops_inited) {
		ret = nbl_intr_setup_ops(&res_ops);
		if (ret)
			return ERR_PTR(-ENOMEM);
		is_ops_inited = true;
	}

Two probe threads for different functions on the same device could both
observe is_ops_inited == false and race on writing the same pointer fields
into res_ops.  Would a static designated initializer in res_ops (keeping
res_ops const and dropping is_ops_inited / nbl_intr_setup_ops entirely)
remove the race and simplify the scaffolding?

Related: nbl_intr_setup_ops() only ever returns 0, so the caller's
"if (ret) return ERR_PTR(-ENOMEM);" is unreachable, and mapping any future
failure to -ENOMEM would also be incorrect.  Could this function be void?

[ ... ]

> diff --git a/drivers/net/ethernet/nebula-matrix/nbl/nbl_hw/nbl_interrupt.c b/drivers/net/ethernet/nebula-matrix/nbl/nbl_hw/nbl_interrupt.c
> new file mode 100644
> index 000000000000..92a6fff5523e
> --- /dev/null
> +++ b/drivers/net/ethernet/nebula-matrix/nbl/nbl_hw/nbl_interrupt.c
> @@ -0,0 +1,246 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (c) 2025 Nebula Matrix Limited.
> + */
> +#include <linux/device.h>
> +#include <linux/dma-mapping.h>
> +#include "nbl_interrupt.h"
> +
> +static int nbl_res_intr_destroy_msix_map(struct nbl_resource_mgt *res_mgt,
> +					 u16 func_id)
> +{
> +	struct nbl_interrupt_mgt *intr_mgt = res_mgt->intr_mgt;
> +	struct nbl_hw_ops *hw_ops = res_mgt->hw_ops_tbl->ops;
> +	struct device *dev = res_mgt->common->dev;
> +	struct nbl_msix_map_table *msix_map_table;
> +	u16 *interrupts;
> +	u16 intr_num, i;
> +
> +	/* use ctrl dev bdf */
> +	hw_ops->configure_msix_map(res_mgt->hw_ops_tbl->priv, func_id, false, 0,
> +				   0, 0, 0);
> +
> +	intr_num = intr_mgt->func_intr_res[func_id].num_interrupts;
> +	interrupts = intr_mgt->func_intr_res[func_id].interrupts;

Is func_id bounds-checked anywhere on this path?  intr_mgt->func_intr_res[]
is sized NBL_MAX_FUNC (520) but the u16 func_id (0..65535) is indexed
directly, and nbl_res_func_id_to_bdf() (which validates against max_pf) is
not called here.  If func_id arrives from a mailbox message in a later
patch, can this index out of bounds?

The same question applies to nbl_res_intr_enable_mailbox_irq() and to the
first use of func_intr_res[func_id] in nbl_res_intr_configure_msix_map()
below, which happens before nbl_res_func_id_to_bdf() is called.
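
If a range check is the answer, something as small as the following
would do (a hypothetical guard; NBL_MAX_FUNC here just mirrors the
array size in the driver headers):

```c
/* Hypothetical guard; NBL_MAX_FUNC mirrors the func_intr_res[] size. */
#define NBL_MAX_FUNC 520

static int nbl_func_id_valid(unsigned int func_id)
{
	/* Reject any index that would run past func_intr_res[]. */
	return func_id < NBL_MAX_FUNC;
}
```

Called at the top of each of the three ops, this would make the
trust boundary explicit even if today's callers are all in-driver.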

> +
> +	if (!interrupts) {
> +		dev_err(dev, "No interrupts to clr for func_id %u\n", func_id);
> +		return -EINVAL;
> +	}
> +	for (i = 0; i < intr_num; i++) {
> +		if (interrupts[i] >= NBL_MAX_OTHER_INTERRUPT)
> +			clear_bit(interrupts[i] - NBL_MAX_OTHER_INTERRUPT,
> +				  intr_mgt->interrupt_net_bitmap);
> +		else
> +			clear_bit(interrupts[i],
> +				  intr_mgt->interrupt_others_bitmap);

What serializes access to intr_mgt->interrupt_net_bitmap,
intr_mgt->interrupt_others_bitmap and intr_mgt->func_intr_res[func_id]
across concurrent invocations of these three ops?

set_bit() / clear_bit() are individually atomic, but the sequence used in
nbl_res_intr_configure_msix_map():

	intr_index = find_first_zero_bit(intr_mgt->interrupt_net_bitmap,
					 NBL_MAX_NET_INTERRUPT);
	...
	set_bit(intr_index, intr_mgt->interrupt_net_bitmap);

is a non-atomic read-modify-write.  Two concurrent calls for different
func_ids could each observe the same zero bit and both claim it, resulting
in two functions being programmed with the same global MSI-X index.

The per-function state has a similar concern: configure, destroy and
enable_mailbox_irq all read and write func_intr_res[func_id].interrupts,
num_interrupts and msix_map_table without any lock held.  Concurrent
callers on the same func_id can race the "if (interrupts) destroy"
check-then-act, double-kfree() the interrupts array, or call
dma_free_coherent() twice.  enable_mailbox_irq can also dereference
interrupts[vector_id] just after destroy_msix_map has freed it and set
it to NULL -- is this a use-after-free?

The WARN_ON(msix_map_table->base_addr) below suggests single-threaded use
is assumed; what enforces that?
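
For the bitmap half of this, either a lock around the whole sequence or
an atomic claim loop would close the window.  A userspace model of the
claim idea, with a C11 atomic word standing in for the kernel bitmap
helpers (the kernel version would loop over find_first_zero_bit() +
test_and_set_bit() instead):

```c
#include <stdatomic.h>

/* Model of a test_and_set_bit()-style claim loop: scan for a zero bit
 * and claim it atomically, so two racing callers can never both win
 * the same index.
 */
static int claim_free_index(atomic_ulong *bitmap, int nbits)
{
	for (int i = 0; i < nbits; i++) {
		unsigned long mask = 1UL << i;

		/* fetch_or returns the prior value: if our bit was
		 * clear there, we are the caller that set it.
		 */
		if (!(atomic_fetch_or(bitmap, mask) & mask))
			return i;
	}
	return -1; /* no free index */
}
```

Note this only fixes the index allocation; the per-function
interrupts/msix_map_table state discussed above would still need a lock.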

[ ... ]

> +	kfree(interrupts);
> +	intr_mgt->func_intr_res[func_id].interrupts = NULL;
> +	intr_mgt->func_intr_res[func_id].num_interrupts = 0;
> +
> +	msix_map_table = &intr_mgt->func_intr_res[func_id].msix_map_table;
> +	dma_free_coherent(dev, msix_map_table->size, msix_map_table->base_addr,
> +			  msix_map_table->dma);

Is there a teardown path that iterates over func_intr_res[] and calls
destroy_msix_map() for each active func_id on module unload / device
remove?

The interrupts array (kcalloc) and the msix_map_table (dma_alloc_coherent)
allocated in configure_msix_map are not devm-managed, and
nbl_res_remove_leonis() does not walk func_intr_res[].  Any func_id still
mapped at unload time would leak both allocations and leave a stale bus
address programmed in the hardware MSI-X map register.

> +	msix_map_table->size = 0;
> +	msix_map_table->base_addr = NULL;
> +	msix_map_table->dma = 0;
> +
> +	return 0;
> +}
> +
> +static int nbl_res_intr_configure_msix_map(struct nbl_resource_mgt *res_mgt,
> +					   u16 func_id, u16 num_net_msix,
> +					   u16 num_others_msix,
> +					   bool net_msix_mask_en)
> +{
> +	struct nbl_interrupt_mgt *intr_mgt = res_mgt->intr_mgt;
> +	struct nbl_hw_ops *hw_ops = res_mgt->hw_ops_tbl->ops;
> +	struct nbl_common_info *common = res_mgt->common;
> +	struct nbl_msix_map_table *msix_map_table;
> +	struct nbl_msix_map *msix_map_entries;
> +	struct device *dev = common->dev;
> +	u16 requested, intr_index;
> +	u8 bus, devid, function;
> +	bool msix_mask_en;
> +	u16 *interrupts;
> +	int ret = 0;
> +	u16 i;
> +
> +	requested = num_net_msix + num_others_msix;
> +	if (requested > NBL_MSIX_MAP_TABLE_MAX_ENTRIES)
> +		return -EINVAL;
> +	if (intr_mgt->func_intr_res[func_id].interrupts)
> +		nbl_res_intr_destroy_msix_map(res_mgt, func_id);

Is this reconfigure ordering intentional?

The existing mapping is torn down (dma_free_coherent, kfree, and
hw_ops->configure_msix_map(valid=false)) before the new allocations are
attempted.  If the subsequent dma_alloc_coherent or kcalloc fails, the
function returns -ENOMEM with the previously working interrupt setup
already gone and no rollback.  Would allocate-then-swap be safer?
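
The shape being suggested, in a generic userspace sketch with calloc
standing in for the kcalloc/dma_alloc_coherent pair:

```c
#include <stdlib.h>

/* Sketch: build the replacement first, install it only on success.
 * On allocation failure the caller's existing state is untouched, so
 * no rollback path is needed.
 */
static int replace_buffer(void **slot, size_t *slot_len, size_t new_len)
{
	void *fresh = calloc(1, new_len);

	if (!fresh)
		return -1;	/* old *slot still valid */

	free(*slot);		/* tear down the old mapping only now */
	*slot = fresh;
	*slot_len = new_len;
	return 0;
}
```

In the driver the "install" step would also cover re-programming the
hardware map register, so the device never sees a torn-down mapping on
a failed reconfigure.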

> +
> +	ret = nbl_res_func_id_to_bdf(res_mgt, func_id, &bus, &devid, &function);
> +	if (ret == U32_MAX)
> +		return -EINVAL;

nbl_res_func_id_to_bdf() is declared to return int but returns U32_MAX on
error, and the caller stores it in int ret and compares against U32_MAX.
This currently works only because -1 sign-extends to 0xFFFFFFFF.  If the
callee is later changed to return a normal negative errno (for example
-EINVAL), will this check silently stop detecting failures and let
uninitialised bus/devid/function be programmed into hardware?
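
The fragility is easy to demonstrate with toy stand-ins for the callee
(names below are illustrative, not the driver's):

```c
#include <limits.h>

#define U32_MAX 0xFFFFFFFFU

/* Today's behavior: U32_MAX returned through an int return type,
 * which comes back as -1 on two's-complement targets.
 */
static int bdf_lookup_today(int fail) { return fail ? (int)U32_MAX : 0; }

/* A later cleanup to a conventional negative errno: */
static int bdf_lookup_later(int fail) { return fail ? -22 /* -EINVAL */ : 0; }

/* The caller's check only matches when the error value is exactly -1,
 * because only -1 converts to 0xFFFFFFFF under unsigned comparison.
 */
static int caller_detects_error(int ret)
{
	return (unsigned int)ret == U32_MAX;
}
```

A `if (ret) return ret;` caller paired with a negative-errno callee
would survive both conventions.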

> +
> +	msix_map_table = &intr_mgt->func_intr_res[func_id].msix_map_table;
> +	WARN_ON(msix_map_table->base_addr);
> +	msix_map_table->size =
> +		sizeof(struct nbl_msix_map) * NBL_MSIX_MAP_TABLE_MAX_ENTRIES;
> +	msix_map_table->base_addr = dma_alloc_coherent(dev,
> +						       msix_map_table->size,
> +						       &msix_map_table->dma,
> +						       GFP_KERNEL);

[ ... ]

> +	msix_map_entries = msix_map_table->base_addr;
> +	for (i = 0; i < requested; i++) {
> +		msix_map_entries[i].global_msix_index = interrupts[i];
> +		msix_map_entries[i].valid = 1;
> +
> +		if (i < num_net_msix && net_msix_mask_en)
> +			msix_mask_en = 1;
> +		else
> +			msix_mask_en = 0;
> +		hw_ops->configure_msix_info(res_mgt->hw_ops_tbl->priv, func_id,
> +					    true, interrupts[i], bus, devid,
> +					    function, msix_mask_en);
> +	}
> +
> +	/* use ctrl dev bdf */
> +	hw_ops->configure_msix_map(res_mgt->hw_ops_tbl->priv, func_id, true,
> +				   msix_map_table->dma, common->hw_bus,
> +				   common->devid, common->function);

Is a dma_wmb() (or equivalent) needed between the stores into the
dma_alloc_coherent table (msix_map_entries[i] fields above) and the MMIO
writel() in hw_ops->configure_msix_map that arms the device to DMA-read
that table?

dma_alloc_coherent guarantees cache coherence but not ordering between
CPU stores into the coherent buffer and subsequent MMIO stores on
weakly-ordered architectures such as arm64 and ppc.
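
The ordering being asked for, modeled in userspace with a C11 release
fence standing in for dma_wmb() (the function and field names are
illustrative only):

```c
#include <stdatomic.h>
#include <stdint.h>

/* Model of the publish ordering: fill the descriptor table, issue a
 * write barrier, then write the "doorbell" that lets the device fetch
 * the table.  The fence here corresponds to a dma_wmb() between the
 * coherent-buffer stores and the writel() in configure_msix_map.
 */
static void publish_table(uint16_t *table, int n, uint16_t *doorbell)
{
	for (int i = 0; i < n; i++)
		table[i] |= 0x0001;	/* mark entry valid */

	atomic_thread_fence(memory_order_release);

	*doorbell = 1;	/* device may now DMA-read the table */
}
```
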

> +
> +	return 0;

[ ... ]

> +static int nbl_res_intr_enable_mailbox_irq(struct nbl_resource_mgt *res_mgt,
> +					   u16 func_id, u16 vector_id,
> +					   bool enable_msix)
> +{
> +	struct nbl_interrupt_mgt *intr_mgt = res_mgt->intr_mgt;
> +	struct nbl_hw_ops *hw_ops = res_mgt->hw_ops_tbl->ops;
> +	u16 global_vec_id;
> +
> +	global_vec_id = intr_mgt->func_intr_res[func_id].interrupts[vector_id];

Is this dereference guarded anywhere?  There is no check that
func_intr_res[func_id].interrupts is non-NULL or that vector_id is less
than num_interrupts.

After destroy_msix_map() runs, interrupts is NULL; and before
configure_msix_map() has ever run for that func_id it is also NULL, so
this becomes a NULL-pointer read at offset 2 * vector_id.  When interrupts
is non-NULL but vector_id is out of range, this is an out-of-bounds slab
read whose value is then programmed into a hardware MSI-X register via
hw_ops->enable_mailbox_irq().
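
A minimal guard before the dereference could look like this (field
names mirror the patch's nbl_func_interrupt_resource_mng; the helper is
hypothetical):

```c
#include <stddef.h>

/* Mirrors the per-function interrupt state in the patch. */
struct func_intr_state {
	unsigned short num_interrupts;
	unsigned short *interrupts;
};

/* Hypothetical guard: refuse to read interrupts[vector_id] when the
 * table is absent (never configured, or just destroyed) or when
 * vector_id runs past the allocation.
 */
static int lookup_global_vec(const struct func_intr_state *res,
			     unsigned short vector_id,
			     unsigned short *global_vec_id)
{
	if (!res->interrupts || vector_id >= res->num_interrupts)
		return -1;	/* -EINVAL in the kernel version */

	*global_vec_id = res->interrupts[vector_id];
	return 0;
}
```
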

> +	hw_ops->enable_mailbox_irq(res_mgt->hw_ops_tbl->priv, func_id,
> +				   enable_msix, global_vec_id);
> +
> +	return 0;
> +}
> +
> +/* NBL_INTR_SET_OPS(ops_name, func)
> + *
> + * Use X Macros to reduce setup and remove codes.
> + */
> +#define NBL_INTR_OPS_TBL						\
> +do {									\
> +	NBL_INTR_SET_OPS(configure_msix_map,				\
> +			 nbl_res_intr_configure_msix_map);		\
> +	NBL_INTR_SET_OPS(destroy_msix_map,				\
> +			 nbl_res_intr_destroy_msix_map);		\
> +	NBL_INTR_SET_OPS(enable_mailbox_irq,				\
> +			 nbl_res_intr_enable_mailbox_irq);		\
> +} while (0)

This isn't a bug, but could this X-macro scheme be replaced with a plain
designated initializer on a const struct nbl_resource_ops?  The indirection
through NBL_INTR_OPS_TBL / NBL_INTR_SET_OPS and the NBL_NAME(x) helper
(whose stated purpose is "Used for macros to pass checkpatch") add no
type safety over:

	static const struct nbl_resource_ops res_ops = {
		.configure_msix_map  = nbl_res_intr_configure_msix_map,
		.destroy_msix_map    = nbl_res_intr_destroy_msix_map,
		.enable_mailbox_irq  = nbl_res_intr_enable_mailbox_irq,
		...
	};

[ ... ]

> diff --git a/drivers/net/ethernet/nebula-matrix/nbl/nbl_hw/nbl_resource.h b/drivers/net/ethernet/nebula-matrix/nbl/nbl_hw/nbl_resource.h
> index 90c78aefa498..d4a1374e0dd2 100644
> --- a/drivers/net/ethernet/nebula-matrix/nbl/nbl_hw/nbl_resource.h
> +++ b/drivers/net/ethernet/nebula-matrix/nbl/nbl_hw/nbl_resource.h
> @@ -18,6 +18,37 @@
>  
>  struct nbl_resource_mgt;
>  
> +/* --------- INTERRUPT ---------- */
> +#define NBL_MAX_OTHER_INTERRUPT			1024
> +#define NBL_MAX_NET_INTERRUPT			4096
> +
> +struct nbl_msix_map {
> +	u16 valid:1;
> +	u16 global_msix_index:13;
> +	u16 rsv:2;
> +};

Is it safe to rely on C bitfield layout for a structure that is written
into dma_alloc_coherent memory and then consumed by the device?  The bit
ordering of bitfields within a storage unit is implementation-defined, and
GCC reverses bit ordering on big-endian targets.

The same question applies to nbl_mailbox_qinfo_map_table,
nbl_host_msix_info, nbl_pcompleter_host_msix_fid_table and
nbl_function_msix_map, which are assembled with C bitfields and then cast
to u32 * to be written into MMIO registers in nbl_hw_leonis.c.  Would
explicit FIELD_PREP / shifts-and-masks or __le* encoding be more portable?
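
For reference, an explicit encoder for this entry would be a few lines
(the bit positions below are the little-endian reading of the patch's
bitfield -- bit 0 valid, bits 13:1 index -- and are an assumption that
would need confirming against the hardware spec):

```c
#include <stdint.h>

/* Explicit layout for the 16-bit msix map entry; positions assumed
 * from the little-endian reading of the patch's bitfield.
 */
#define MSIX_MAP_VALID		0x0001u
#define MSIX_MAP_INDEX_SHIFT	1
#define MSIX_MAP_INDEX_MASK	0x1FFFu

static uint16_t msix_map_encode(uint16_t global_index, int valid)
{
	uint16_t v = (uint16_t)((global_index & MSIX_MAP_INDEX_MASK)
				<< MSIX_MAP_INDEX_SHIFT);

	if (valid)
		v |= MSIX_MAP_VALID;
	/* A cpu_to_le16() conversion would follow here before the
	 * value is stored into the DMA table, fixing the byte order
	 * independently of the host endianness.
	 */
	return v;
}
```

This produces the same bits on every architecture, which the bitfield
struct does not.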

> +
> +struct nbl_msix_map_table {
> +	struct nbl_msix_map *base_addr;
> +	dma_addr_t dma;
> +	size_t size;
> +};
> +
> +struct nbl_func_interrupt_resource_mng {
> +	u16 num_interrupts;
> +	u16 num_net_interrupts;
> +	u16 msix_base;
> +	u16 msix_max;

msix_base and msix_max do not appear to be read or written anywhere in
this patch.  Could these be added in the patch that first uses them?

> +	u16 *interrupts;
> +	struct nbl_msix_map_table msix_map_table;
> +};

[ ... ]

One last question on the commit message:

    MSI-X Interrupt Configuration:
    Dynamically allocate and manage MSI-X interrupt vectors, including
    network interrupts and other types of interrupts.
    ...
    Interrupt Information Query: Provide interfaces to obtain the
    hardware register addresses and data of interrupts.

The patch programs the device-private MSI-X mapping and info registers and
tracks index bitmaps, but there is no pci_alloc_irq_vectors(), no
request_irq(), no IRQ handler, and no "interrupt information query" getter
exposed through nbl_resource_ops -- only configure_msix_map,
destroy_msix_map and enable_mailbox_irq are added.  Could the commit
message be narrowed to what this patch actually implements?


Thread overview: 19+ messages
2026-04-28 11:48 [PATCH v13 net-next 00/11] nbl driver for Nebulamatrix NICs illusion.wang
2026-04-28 11:48 ` [PATCH v13 net-next 01/11] net/nebula-matrix: add minimum nbl build framework illusion.wang
2026-04-28 11:48 ` [PATCH v13 net-next 02/11] net/nebula-matrix: add our driver architecture illusion.wang
2026-04-28 11:48 ` [PATCH v13 net-next 03/11] net/nebula-matrix: add chip related definitions illusion.wang
2026-04-30 10:41   ` Paolo Abeni
2026-04-28 11:48 ` [PATCH v13 net-next 04/11] net/nebula-matrix: channel msg value and msg struct illusion.wang
2026-04-30 23:47   ` Jakub Kicinski
2026-04-28 11:48 ` [PATCH v13 net-next 05/11] net/nebula-matrix: add channel layer illusion.wang
2026-04-30 10:51   ` Paolo Abeni
2026-04-28 11:48 ` [PATCH v13 net-next 06/11] net/nebula-matrix: add common resource implementation illusion.wang
2026-04-28 11:49 ` [PATCH v13 net-next 07/11] net/nebula-matrix: add intr " illusion.wang
2026-04-30 23:47   ` Jakub Kicinski [this message]
2026-04-28 11:49 ` [PATCH v13 net-next 08/11] net/nebula-matrix: add vsi " illusion.wang
2026-04-30 23:47   ` Jakub Kicinski
2026-04-28 11:49 ` [PATCH v13 net-next 09/11] net/nebula-matrix: add Dispatch layer implementation illusion.wang
2026-04-30 23:47   ` Jakub Kicinski
2026-04-28 11:49 ` [PATCH v13 net-next 10/11] net/nebula-matrix: add common/ctrl dev init/reinit operation illusion.wang
2026-04-28 11:49 ` [PATCH v13 net-next 11/11] net/nebula-matrix: add common dev start/stop operation illusion.wang
2026-04-30 23:47   ` Jakub Kicinski
