Re: [net-next v24 2/7] net: mtip: The L2 switch driver for imx287

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Jakub Kicinski <kuba@kernel.org>
To: lukasz.majewski@mailbox.org
Cc: Jakub Kicinski <kuba@kernel.org>,
	andrew+netdev@lunn.ch, davem@davemloft.net, edumazet@google.com,
	pabeni@redhat.com, richardcochran@gmail.com, lgirdwood@gmail.com,
	nathan@kernel.org, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org, krzk+dt@kernel.org, imx@lists.linux.dev,
	linux-arm-kernel@lists.infradead.org, robh@kernel.org
Subject: Re: [net-next v24 2/7] net: mtip: The L2 switch driver for imx287
Date: Thu,  4 Jun 2026 19:18:50 -0700	[thread overview]
Message-ID: <20260605021851.3594517-1-kuba@kernel.org> (raw)
In-Reply-To: <20260601112437.2216043-3-lukasz.majewski@mailbox.org>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
net: mtip: The L2 switch driver for imx287

This patch adds a new driver for the MoreThanIP L2 switch embedded in
the i.MX287 SoC.  The two-port switch sits between uDMA[01] and the
MAC-NET[01] blocks and can be used as an alternative to the existing
FEC driver.

A few questions inline below.

> diff --git a/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c b/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c
> new file mode 100644
> index 000000000000..1d1005a42306
> --- /dev/null
> +++ b/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c

[ ... ]

> +static void mtip_switch_restart(struct net_device *dev, int duplex0,
> +				int duplex1)
> +{
> +	struct mtip_ndev_priv *priv = netdev_priv(dev);
> +	struct switch_enet_private *fep = priv->fep;
> +	int i;
> +
> +	 /* Perform a reset. We should wait for this. */
> +	writel(MCF_ESW_MODE_SW_RST, fep->hwp + ESW_MODE);

[High]
Should this routine hold a lock?  fep->hw_lock is declared and
initialised in mtip_sw_probe() with the comment "/* Lock for HW
configuration */" but neither mtip_switch_restart() nor mtip_adjust_link()
acquire it.

The two ports each get their own phy_device via of_phy_connect(), and
phylib serialises adjust_link callbacks per phy_device, not across them.
If both ports' state machines fire mtip_adjust_link() at the same time,
both CPUs end up inside mtip_switch_restart() in parallel, racing on
ESW_MODE_SW_RST, ESW_BMPC, ESW_ISR, ESW_RDSR/ESW_TDSR, fep->cur_tx /
cur_rx / dirty_tx, and fep->full_duplex[].

> +	/* Reset SKB transmit buffers */
> +	for (i = 0; i <= TX_RING_MOD_MASK; i++) {
> +		if (fep->tx_skbuff[i]) {
> +			dev_kfree_skb_any(fep->tx_skbuff[i]);
> +			fep->tx_skbuff[i] = NULL;
> +		}
> +	}

[High]
Can two concurrent invocations of this loop double-free a tx_skbuff[i]
entry?  Both CPUs read the same non-NULL pointer, both call
dev_kfree_skb_any() on it, and only then race on the NULL store.

> +
> +	fep->full_duplex[0] = duplex0;
> +	fep->full_duplex[1] = duplex1;
> +
> +	mtip_configure_enet_mii(fep, 1);
> +	mtip_configure_enet_mii(fep, 2);
> +
> +	/* And last, enable the transmit and receive processing */
> +	writel(MCF_ESW_RDAR_R_DES_ACTIVE, fep->hwp + ESW_RDAR);
> +
> +	/* Enable interrupts we wish to service */
> +	writel(0xFFFFFFFF, fep->hwp + ESW_ISR);
> +	writel(MCF_ESW_IMR_TXF | MCF_ESW_IMR_RXF,
> +	       fep->hwp + ESW_IMR);
> +
> +	mtip_config_switch(fep);
> +}
> +
> +static irqreturn_t mtip_interrupt(int irq, void *ptr_fep)
> +{
> +	struct switch_enet_private *fep = ptr_fep;
> +	irqreturn_t ret = IRQ_NONE;
> +	u32 int_events, int_imask;
> +
> +	/* Get the interrupt events that caused us to be here */
> +	int_events = readl(fep->hwp + ESW_ISR);
> +	writel(int_events, fep->hwp + ESW_ISR);
> +
> +	if (int_events & (MCF_ESW_ISR_RXF | MCF_ESW_ISR_TXF)) {
> +		ret = IRQ_HANDLED;
> +		/* Disable the RX interrupt */
> +		if (napi_schedule_prep(&fep->napi)) {
> +			int_imask = readl(fep->hwp + ESW_IMR);
> +			int_imask &= ~MCF_ESW_IMR_RXF;
> +			writel(int_imask, fep->hwp + ESW_IMR);
> +			__napi_schedule(&fep->napi);
> +		}
> +	}
> +
> +	return ret;
> +}

[Medium]
Should this also clear MCF_ESW_IMR_TXF here?  The condition above
schedules NAPI on either RXF or TXF, and the NAPI poll re-enables both
bits together:

	writel(MCF_ESW_IMR_RXF | MCF_ESW_IMR_TXF,
	       fep->hwp + ESW_IMR);

With only RXF masked here, every TX completion while NAPI is running
re-asserts the IRQ line; mtip_interrupt() runs again, napi_schedule_prep()
returns false because NAPI is already scheduled, and the handler returns
IRQ_HANDLED having done nothing useful.  Once mtip_switch_tx() is filled
in by the later patch in the series, would this turn into a sustained
IRQ storm on TX-busy workloads?

[ ... ]

> +static int mtip_mii_init(struct switch_enet_private *fep,
> +			 struct platform_device *pdev)
> +{
> +	struct device_node *node;
> +	int err = -ENXIO;
> +
> +	/* Clear MMFR to avoid to generate MII event by writing MSCR.
> +	 * MII event generation condition:
> +	 * - writing MSCR:
> +	 *      - mmfr[31:0]_not_zero & mscr[7:0]_is_zero &
> +	 *        mscr_reg_data_in[7:0] != 0
> +	 * - writing MMFR:
> +	 *      - mscr[7:0]_not_zero
> +	 */
> +	writel(0, fep->hwp + MCF_FEC_MII_DATA);
> +	/* Clear any pending transaction complete indication */
> +	writel(MCF_ENET_MII, fep->enet_addr + MCF_FEC_EIR);

[Medium]
Is the base address for the MMFR clear correct here?

fep->hwp is the switch base (fep->enet_addr + ENET_SWI_PHYS_ADDR_OFFSET,
i.e. enet_addr + 0x8000), while MCF_FEC_MII_DATA (0x040) is an offset
within the MAC register space.  At offset 0x040 from fep->hwp the header
defines:

	#define ESW_MCR             (0x040)

So this writel() goes to the switch's Mirror Configuration Register
rather than to MMFR.  The very next line correctly uses fep->enet_addr +
MCF_FEC_EIR for the EIR clear, and mtip_mdio_read()/mtip_mdio_write()
also use fep->enet_addr + MCF_FEC_MII_DATA for the actual MDIO accesses.
Should this writel() also use fep->enet_addr?

[ ... ]

> +static int mtip_open(struct net_device *dev)
> +{
> +	struct mtip_ndev_priv *priv = netdev_priv(dev);
> +	struct switch_enet_private *fep = priv->fep;
> +	int ret, port_idx = priv->portnum - 1;
> +
> +	if (fep->usage_count == 0) {
> +		ret = clk_enable(fep->clk_ipg);

[ ... ]

> +	ret = mtip_mii_probe(dev);
> +	if (ret)
> +		goto mtip_mii_probe_err;
> +
> +	phy_start(fep->phy_dev[port_idx]);
> +
> +	if (fep->usage_count == 0) {
> +		napi_enable(&fep->napi);
> +		mtip_switch_restart(dev, 1, 1);
> +
> +		netif_start_queue(dev);
> +	}

[High]
Can phy_start() race with the mtip_switch_restart() call below it?
Once phy_start() returns, the phylib state machine workqueue may invoke
mtip_adjust_link() on another CPU, which calls mtip_switch_restart() on
link-up.  That can run concurrently with the mtip_switch_restart(dev,
1, 1) here, and both paths walk fep->tx_skbuff[] freeing entries and
reprogram switch HW registers without synchronisation.

Would it be safer to start the PHY only after the switch HW has been
configured (and to take a lock around mtip_switch_restart())?

[ ... ]

> +static int mtip_parse_of(struct switch_enet_private *fep,
> +			 struct device_node *np)
> +{
> +	struct device_node *p;
> +	unsigned int port_num;
> +	int ret = 0;
> +
> +	p = of_get_child_by_name(np, "ethernet-ports");
> +
> +	for_each_available_child_of_node_scoped(p, port) {
> +		if (of_property_read_u32(port, "reg", &port_num))
> +			continue;

[Medium]
If of_get_child_by_name() returns NULL (no "ethernet-ports" node), the
for_each_available_child_of_node_scoped() loop is skipped entirely, ret
stays 0, and mtip_parse_of() returns success with no ports parsed.
Similarly, if every child fails the "reg" read above, the continue keeps
ret at 0.

After such a "success" return, mtip_ndev_init() calls
alloc_netdev(sizeof(struct mtip_ndev_priv), fep->ndev_name[i],
NET_NAME_USER, ether_setup) with fep->ndev_name[i] still NULL.
alloc_netdev_mqs() then calls strscpy_pad() on a NULL name and oopses.

Should missing "ethernet-ports", missing "reg", or having no parsed
ports be treated as errors here?

[ ... ]

> +		fep->phy_np[port_num - 1] = of_parse_phandle(port,
> +							     "phy-handle", 0);
> +		if (!fep->phy_np[port_num - 1]) {
> +			dev_err(&fep->pdev->dev,
> +				"%s: PHY handle not defined!\n", __func__);
> +			ret = -ENODEV;
> +			goto of_get_err;
> +		}
> +	}
> +
> + of_get_err:
> +	mtip_parse_of_cleanup(fep);
> +	of_node_put(p);
> +
> +	return ret;
> +}

[High]
Does the success path fall through into of_get_err: unconditionally?

There is no return ret; (or success label) before the of_get_err label,
so on a successful parse mtip_parse_of_cleanup(fep) runs and calls
of_node_put() on every fep->phy_np[i] just acquired by
of_parse_phandle().  mtip_parse_of_cleanup() also does not NULL the
slots:

	static void mtip_parse_of_cleanup(struct switch_enet_private *fep)
	{
		int i;

		for (i = 0; i < SWITCH_EPORT_NUMBER; i++)
			if (fep->phy_np[i])
				of_node_put(fep->phy_np[i]);
	}

After probe, mtip_open()->mtip_mii_probe() then passes the same
fep->phy_np[i] pointers to of_phy_connect() — does this dereference an
OF node the driver no longer holds a reference on?

The error and remove paths (the of_free_references label, the explicit
cleanup after devm_request_irq() failure, and mtip_sw_remove() ->
mtip_parse_of_cleanup()) call mtip_parse_of_cleanup() again on the same
non-NULL pointers, which would underflow the OF refcount on each.

Should there be a return ret; before of_get_err: and a NULL-out of the
slot in mtip_parse_of_cleanup() after each of_node_put()?

[ ... ]

> +static int mtip_sw_probe(struct platform_device *pdev)
> +{
[ ... ]
> +	ret = devm_request_irq(&pdev->dev, fep->irq, mtip_interrupt, 0,
> +			       dev_name(&pdev->dev), fep);
> +	if (ret) {
> +		mtip_parse_of_cleanup(fep);
> +		return dev_err_probe(&pdev->dev, ret, "Could not alloc IRQ\n");
> +	}

[Medium]
Can a stale IRQ here dereference an uninitialised napi struct?

netif_napi_add(dev, &fep->napi, mtip_rx_napi) is only called from
mtip_open(), so between this devm_request_irq() and the first ndo_open
fep->napi is zero-initialised (no poll, no dev, no state).  Probe also
doesn't write 0 to ESW_IMR before requesting the IRQ — it relies on the
register's reset value.

If the bootloader or a warm reset left the switch with IMR set and
events pending in ESW_ISR, mtip_interrupt() can fire immediately on
enable, see RXF/TXF, call napi_schedule_prep(&fep->napi) (which sets
NAPIF_STATE_SCHED unconditionally and returns true), and queue the
zero-initialised napi struct.  net_rx_action() would then call
n->poll == NULL.

Would it help to either mask all interrupts (writel(0, fep->hwp +
ESW_IMR)) before request_irq, or defer request_irq until after
netif_napi_add()?
-- 
pw-bot: cr

next prev parent reply	other threads:[~2026-06-05  2:19 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-01 11:24 [net-next v24 0/7] net: mtip: Add support for MTIP imx287 L2 switch driver Lukasz Majewski
2026-06-01 11:24 ` [net-next v24 1/7] dt-bindings: net: Add MTIP L2 switch description Lukasz Majewski
2026-06-02 11:32   ` sashiko-bot
2026-06-01 11:24 ` [net-next v24 2/7] net: mtip: The L2 switch driver for imx287 Lukasz Majewski
2026-06-02 11:32   ` sashiko-bot
2026-06-05  2:18   ` Jakub Kicinski [this message]
2026-06-01 11:24 ` [net-next v24 3/7] net: mtip: Add buffers management functions to the L2 switch driver Lukasz Majewski
2026-06-02 11:32   ` sashiko-bot
2026-06-05  2:18   ` Jakub Kicinski
2026-06-09  8:37     ` Łukasz Majewski
2026-06-01 11:24 ` [net-next v24 4/7] net: mtip: Add net_device_ops " Lukasz Majewski
2026-06-02 11:32   ` sashiko-bot
2026-06-05  2:18   ` Jakub Kicinski
2026-06-01 11:24 ` [net-next v24 5/7] net: mtip: Add mtip_switch_{rx|tx} " Lukasz Majewski
2026-06-02 11:32   ` sashiko-bot
2026-06-05  2:18   ` Jakub Kicinski
2026-06-01 11:24 ` [net-next v24 6/7] net: mtip: Extend the L2 switch driver with management operations Lukasz Majewski
2026-06-02 11:32   ` sashiko-bot
2026-06-05  2:18   ` Jakub Kicinski
2026-06-01 11:24 ` [net-next v24 7/7] net: mtip: Extend the L2 switch driver for imx287 with bridge operations Lukasz Majewski
2026-06-02 11:32   ` sashiko-bot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260605021851.3594517-1-kuba@kernel.org \
    --to=kuba@kernel.org \
    --cc=andrew+netdev@lunn.ch \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=imx@lists.linux.dev \
    --cc=krzk+dt@kernel.org \
    --cc=lgirdwood@gmail.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lukasz.majewski@mailbox.org \
    --cc=nathan@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=richardcochran@gmail.com \
    --cc=robh@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.