Re: [net-next v24 2/7] net: mtip: The L2 switch driver for imx287

Linux-ARM-Kernel Archive on lore.kernel.org
 help / color / mirror / Atom feed

* Re: [net-next v24 2/7] net: mtip: The L2 switch driver for imx287
       [not found] <20260601112437.2216043-3-lukasz.majewski@mailbox.org>
@ 2026-06-05  2:18 ` Jakub Kicinski
  0 siblings, 0 replies; only message in thread
From: Jakub Kicinski @ 2026-06-05  2:18 UTC (permalink / raw)
  To: lukasz.majewski
  Cc: Jakub Kicinski, andrew+netdev, davem, edumazet, pabeni,
	richardcochran, lgirdwood, nathan, linux-kernel, netdev, krzk+dt,
	imx, linux-arm-kernel, robh

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
net: mtip: The L2 switch driver for imx287

This patch adds a new driver for the MoreThanIP L2 switch embedded in
the i.MX287 SoC.  The two-port switch sits between uDMA[01] and the
MAC-NET[01] blocks and can be used as an alternative to the existing
FEC driver.

A few questions inline below.

> diff --git a/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c b/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c
> new file mode 100644
> index 000000000000..1d1005a42306
> --- /dev/null
> +++ b/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c

[ ... ]

> +static void mtip_switch_restart(struct net_device *dev, int duplex0,
> +				int duplex1)
> +{
> +	struct mtip_ndev_priv *priv = netdev_priv(dev);
> +	struct switch_enet_private *fep = priv->fep;
> +	int i;
> +
> +	 /* Perform a reset. We should wait for this. */
> +	writel(MCF_ESW_MODE_SW_RST, fep->hwp + ESW_MODE);

[High]
Should this routine hold a lock?  fep->hw_lock is declared and
initialised in mtip_sw_probe() with the comment "/* Lock for HW
configuration */" but neither mtip_switch_restart() nor mtip_adjust_link()
acquire it.

The two ports each get their own phy_device via of_phy_connect(), and
phylib serialises adjust_link callbacks per phy_device, not across them.
If both ports' state machines fire mtip_adjust_link() at the same time,
both CPUs end up inside mtip_switch_restart() in parallel, racing on
ESW_MODE_SW_RST, ESW_BMPC, ESW_ISR, ESW_RDSR/ESW_TDSR, fep->cur_tx /
cur_rx / dirty_tx, and fep->full_duplex[].

> +	/* Reset SKB transmit buffers */
> +	for (i = 0; i <= TX_RING_MOD_MASK; i++) {
> +		if (fep->tx_skbuff[i]) {
> +			dev_kfree_skb_any(fep->tx_skbuff[i]);
> +			fep->tx_skbuff[i] = NULL;
> +		}
> +	}

[High]
Can two concurrent invocations of this loop double-free a tx_skbuff[i]
entry?  Both CPUs read the same non-NULL pointer, both call
dev_kfree_skb_any() on it, and only then race on the NULL store.

> +
> +	fep->full_duplex[0] = duplex0;
> +	fep->full_duplex[1] = duplex1;
> +
> +	mtip_configure_enet_mii(fep, 1);
> +	mtip_configure_enet_mii(fep, 2);
> +
> +	/* And last, enable the transmit and receive processing */
> +	writel(MCF_ESW_RDAR_R_DES_ACTIVE, fep->hwp + ESW_RDAR);
> +
> +	/* Enable interrupts we wish to service */
> +	writel(0xFFFFFFFF, fep->hwp + ESW_ISR);
> +	writel(MCF_ESW_IMR_TXF | MCF_ESW_IMR_RXF,
> +	       fep->hwp + ESW_IMR);
> +
> +	mtip_config_switch(fep);
> +}
> +
> +static irqreturn_t mtip_interrupt(int irq, void *ptr_fep)
> +{
> +	struct switch_enet_private *fep = ptr_fep;
> +	irqreturn_t ret = IRQ_NONE;
> +	u32 int_events, int_imask;
> +
> +	/* Get the interrupt events that caused us to be here */
> +	int_events = readl(fep->hwp + ESW_ISR);
> +	writel(int_events, fep->hwp + ESW_ISR);
> +
> +	if (int_events & (MCF_ESW_ISR_RXF | MCF_ESW_ISR_TXF)) {
> +		ret = IRQ_HANDLED;
> +		/* Disable the RX interrupt */
> +		if (napi_schedule_prep(&fep->napi)) {
> +			int_imask = readl(fep->hwp + ESW_IMR);
> +			int_imask &= ~MCF_ESW_IMR_RXF;
> +			writel(int_imask, fep->hwp + ESW_IMR);
> +			__napi_schedule(&fep->napi);
> +		}
> +	}
> +
> +	return ret;
> +}

[Medium]
Should this also clear MCF_ESW_IMR_TXF here?  The condition above
schedules NAPI on either RXF or TXF, and the NAPI poll re-enables both
bits together:

	writel(MCF_ESW_IMR_RXF | MCF_ESW_IMR_TXF,
	       fep->hwp + ESW_IMR);

With only RXF masked here, every TX completion while NAPI is running
re-asserts the IRQ line; mtip_interrupt() runs again, napi_schedule_prep()
returns false because NAPI is already scheduled, and the handler returns
IRQ_HANDLED having done nothing useful.  Once mtip_switch_tx() is filled
in by the later patch in the series, would this turn into a sustained
IRQ storm on TX-busy workloads?

[ ... ]

> +static int mtip_mii_init(struct switch_enet_private *fep,
> +			 struct platform_device *pdev)
> +{
> +	struct device_node *node;
> +	int err = -ENXIO;
> +
> +	/* Clear MMFR to avoid to generate MII event by writing MSCR.
> +	 * MII event generation condition:
> +	 * - writing MSCR:
> +	 *      - mmfr[31:0]_not_zero & mscr[7:0]_is_zero &
> +	 *        mscr_reg_data_in[7:0] != 0
> +	 * - writing MMFR:
> +	 *      - mscr[7:0]_not_zero
> +	 */
> +	writel(0, fep->hwp + MCF_FEC_MII_DATA);
> +	/* Clear any pending transaction complete indication */
> +	writel(MCF_ENET_MII, fep->enet_addr + MCF_FEC_EIR);

[Medium]
Is the base address for the MMFR clear correct here?

fep->hwp is the switch base (fep->enet_addr + ENET_SWI_PHYS_ADDR_OFFSET,
i.e. enet_addr + 0x8000), while MCF_FEC_MII_DATA (0x040) is an offset
within the MAC register space.  At offset 0x040 from fep->hwp the header
defines:

	#define ESW_MCR             (0x040)

So this writel() goes to the switch's Mirror Configuration Register
rather than to MMFR.  The very next line correctly uses fep->enet_addr +
MCF_FEC_EIR for the EIR clear, and mtip_mdio_read()/mtip_mdio_write()
also use fep->enet_addr + MCF_FEC_MII_DATA for the actual MDIO accesses.
Should this writel() also use fep->enet_addr?

[ ... ]

> +static int mtip_open(struct net_device *dev)
> +{
> +	struct mtip_ndev_priv *priv = netdev_priv(dev);
> +	struct switch_enet_private *fep = priv->fep;
> +	int ret, port_idx = priv->portnum - 1;
> +
> +	if (fep->usage_count == 0) {
> +		ret = clk_enable(fep->clk_ipg);

[ ... ]

> +	ret = mtip_mii_probe(dev);
> +	if (ret)
> +		goto mtip_mii_probe_err;
> +
> +	phy_start(fep->phy_dev[port_idx]);
> +
> +	if (fep->usage_count == 0) {
> +		napi_enable(&fep->napi);
> +		mtip_switch_restart(dev, 1, 1);
> +
> +		netif_start_queue(dev);
> +	}

[High]
Can phy_start() race with the mtip_switch_restart() call below it?
Once phy_start() returns, the phylib state machine workqueue may invoke
mtip_adjust_link() on another CPU, which calls mtip_switch_restart() on
link-up.  That can run concurrently with the mtip_switch_restart(dev,
1, 1) here, and both paths walk fep->tx_skbuff[] freeing entries and
reprogram switch HW registers without synchronisation.

Would it be safer to start the PHY only after the switch HW has been
configured (and to take a lock around mtip_switch_restart())?

[ ... ]

> +static int mtip_parse_of(struct switch_enet_private *fep,
> +			 struct device_node *np)
> +{
> +	struct device_node *p;
> +	unsigned int port_num;
> +	int ret = 0;
> +
> +	p = of_get_child_by_name(np, "ethernet-ports");
> +
> +	for_each_available_child_of_node_scoped(p, port) {
> +		if (of_property_read_u32(port, "reg", &port_num))
> +			continue;

[Medium]
If of_get_child_by_name() returns NULL (no "ethernet-ports" node), the
for_each_available_child_of_node_scoped() loop is skipped entirely, ret
stays 0, and mtip_parse_of() returns success with no ports parsed.
Similarly, if every child fails the "reg" read above, the continue keeps
ret at 0.

After such a "success" return, mtip_ndev_init() calls
alloc_netdev(sizeof(struct mtip_ndev_priv), fep->ndev_name[i],
NET_NAME_USER, ether_setup) with fep->ndev_name[i] still NULL.
alloc_netdev_mqs() then calls strscpy_pad() on a NULL name and oopses.

Should missing "ethernet-ports", missing "reg", or having no parsed
ports be treated as errors here?

[ ... ]

> +		fep->phy_np[port_num - 1] = of_parse_phandle(port,
> +							     "phy-handle", 0);
> +		if (!fep->phy_np[port_num - 1]) {
> +			dev_err(&fep->pdev->dev,
> +				"%s: PHY handle not defined!\n", __func__);
> +			ret = -ENODEV;
> +			goto of_get_err;
> +		}
> +	}
> +
> + of_get_err:
> +	mtip_parse_of_cleanup(fep);
> +	of_node_put(p);
> +
> +	return ret;
> +}

[High]
Does the success path fall through into of_get_err: unconditionally?

There is no return ret; (or success label) before the of_get_err label,
so on a successful parse mtip_parse_of_cleanup(fep) runs and calls
of_node_put() on every fep->phy_np[i] just acquired by
of_parse_phandle().  mtip_parse_of_cleanup() also does not NULL the
slots:

	static void mtip_parse_of_cleanup(struct switch_enet_private *fep)
	{
		int i;

		for (i = 0; i < SWITCH_EPORT_NUMBER; i++)
			if (fep->phy_np[i])
				of_node_put(fep->phy_np[i]);
	}

After probe, mtip_open()->mtip_mii_probe() then passes the same
fep->phy_np[i] pointers to of_phy_connect() — does this dereference an
OF node the driver no longer holds a reference on?

The error and remove paths (the of_free_references label, the explicit
cleanup after devm_request_irq() failure, and mtip_sw_remove() ->
mtip_parse_of_cleanup()) call mtip_parse_of_cleanup() again on the same
non-NULL pointers, which would underflow the OF refcount on each.

Should there be a return ret; before of_get_err: and a NULL-out of the
slot in mtip_parse_of_cleanup() after each of_node_put()?

[ ... ]

> +static int mtip_sw_probe(struct platform_device *pdev)
> +{
[ ... ]
> +	ret = devm_request_irq(&pdev->dev, fep->irq, mtip_interrupt, 0,
> +			       dev_name(&pdev->dev), fep);
> +	if (ret) {
> +		mtip_parse_of_cleanup(fep);
> +		return dev_err_probe(&pdev->dev, ret, "Could not alloc IRQ\n");
> +	}

[Medium]
Can a stale IRQ here dereference an uninitialised napi struct?

netif_napi_add(dev, &fep->napi, mtip_rx_napi) is only called from
mtip_open(), so between this devm_request_irq() and the first ndo_open
fep->napi is zero-initialised (no poll, no dev, no state).  Probe also
doesn't write 0 to ESW_IMR before requesting the IRQ — it relies on the
register's reset value.

If the bootloader or a warm reset left the switch with IMR set and
events pending in ESW_ISR, mtip_interrupt() can fire immediately on
enable, see RXF/TXF, call napi_schedule_prep(&fep->napi) (which sets
NAPIF_STATE_SCHED unconditionally and returns true), and queue the
zero-initialised napi struct.  net_rx_action() would then call
n->poll == NULL.

Would it help to either mask all interrupts (writel(0, fep->hwp +
ESW_IMR)) before request_irq, or defer request_irq until after
netif_napi_add()?
-- 
pw-bot: cr

^ permalink raw reply	[flat|nested] only message in thread

only message in thread, other threads:[~2026-06-05  2:19 UTC | newest]

Thread overview: (only message) (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20260601112437.2216043-3-lukasz.majewski@mailbox.org>
2026-06-05  2:18 ` [net-next v24 2/7] net: mtip: The L2 switch driver for imx287 Jakub Kicinski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox