From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-alma10-1.taild15c8.ts.net [100.103.45.18])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id D8A843672A2;
	Fri,  5 Jun 2026 02:18:59 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=100.103.45.18
ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1780625946; cv=none; b=HIruOz2f4eW5vKU/JcFWNrWfxNotO2YC+DEH9kD1Xm82cWx1JP9YdeoBfWvZ04jVYnYCKtBhlDzp0vQm4XSdZinU2ul6aEdhIeR99cX2oIZP5gd2BtWyMD+MDaO1jtC2qljBjWxfMY5QSYrHQYE4uhszoDEeAGYMabujLZoY1DI=
ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1780625946; c=relaxed/simple;
	bh=Cpu5rFldQg8OFJ06Ye6HbzrzNMh2nVbLy2tbuF/gavc=;
	h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References:
	 MIME-Version; b=ZYFpf3drl/S7wWAipjhORvp+2SQsq6ktG9h9fCv8eNAEhXSs0KGT7DI5eaW0gNZomOTiruSk+im35ltYg9Og25Hkt+yYwi8suPlnM5aujDCHfGjuxnj/v3BITaNQDfguuwmQK/AlLveDV6IXkEcrox0x40/Cbslgp+vPKoSeMxM=
ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=eSK1dnYs; arc=none smtp.client-ip=100.103.45.18
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="eSK1dnYs"
Received: by smtp.kernel.org (Postfix) with ESMTPSA id ABE901F0089A;
	Fri,  5 Jun 2026 02:18:58 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel.org;
	s=k20260515; t=1780625939;
	bh=FDS6Cjxf+/JaLSZoJgj5+4lXnvtwYjgZb5JthpJtzIg=;
	h=From:To:Cc:Subject:Date:In-Reply-To:References;
	b=eSK1dnYs/i4g7x7knYUHW/Wl5/sSyoXtjWuckn2ea6tzYp5zrTDgrTa+9u4hsm/pO
	 pHc5bJP+iGItN/dbiX7ZSBTHPx8/1Hr5dYgw2D9RYKnE4MXEa5KvB/54oYsCHNDLo/
	 srRuaTwHWpV1uDkh/ypS872D0MAU22VP9P5PxEcwbNffgPpMg3eyTKTdhB2YmwEyLc
	 2M9pTrN2mI7bQwOoog6vxYJCZgcDOvxukPYpG7hriBZQFD1PKD2JRemTDq1Y1xsJmI
	 yLkTCTH5WUiR11orLkqg+CAYuKgK43f9El/wcVnnfNcKi4IQ1nZ/16ttbg/6FC5uJs
	 46bwhGgMXMUSg==
From: Jakub Kicinski <kuba@kernel.org>
To: lukasz.majewski@mailbox.org
Cc: Jakub Kicinski <kuba@kernel.org>,
	andrew+netdev@lunn.ch,
	davem@davemloft.net,
	edumazet@google.com,
	pabeni@redhat.com,
	richardcochran@gmail.com,
	lgirdwood@gmail.com,
	nathan@kernel.org,
	linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org,
	krzk+dt@kernel.org,
	imx@lists.linux.dev,
	linux-arm-kernel@lists.infradead.org,
	robh@kernel.org
Subject: Re: [net-next v24 5/7] net: mtip: Add mtip_switch_{rx|tx} functions to the L2 switch driver
Date: Thu,  4 Jun 2026 19:18:57 -0700
Message-ID: <20260605021857.3594621-1-kuba@kernel.org>
X-Mailer: git-send-email 2.54.0
In-Reply-To: <20260601112437.2216043-6-lukasz.majewski@mailbox.org>
References: <20260601112437.2216043-6-lukasz.majewski@mailbox.org>
Precedence: bulk
X-Mailing-List: netdev@vger.kernel.org
List-Id: <netdev.vger.kernel.org>
List-Subscribe: <mailto:netdev+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:netdev+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
net: mtip: Add mtip_switch_{rx|tx} functions to the L2 switch driver

This patch fills in the previously stubbed mtip_switch_tx() and
mtip_switch_rx() functions for the MTIP L2 switch driver, adds a small
helper mtip_atable_get_entry_port_number() that looks a MAC up in the
hardware address table, and a mtip_update_cbd() helper used to refill RX
buffer descriptors from the page_pool.

> diff --git a/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c b/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c
> index 7dfeda230e3f..212207861e13 100644
> --- a/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c
> +++ b/drivers/net/ethernet/freescale/mtipsw/mtipl2sw.c

[ ... ]

> @@ -836,11 +873,233 @@ static irqreturn_t mtip_interrupt(int irq, void *ptr_fep)
>  
>  static void mtip_switch_tx(struct switch_enet_private *fep)
>  {
> +	struct net_device *dev;
> +	unsigned short status;
> +	struct sk_buff *skb;
> +	struct cbd_t *bdp;
> +	int index;
> +
> +	spin_lock_bh(&fep->hw_lock);
> +	bdp = fep->dirty_tx;
> +
> +	while (((status = bdp->cbd_sc) & BD_ENET_TX_READY) == 0) {
> +		if (bdp == fep->cur_tx &&
> +		    !mtip_netif_queues_stopped(fep))
> +			break;
> +
> +		index = bdp - fep->tx_bd_base;
> +		dma_unmap_single(&fep->pdev->dev, bdp->cbd_bufaddr,
> +				 MTIP_SWITCH_TX_FRSIZE, DMA_TO_DEVICE);
> +		bdp->cbd_bufaddr = 0;
> +		skb = fep->tx_skbuff[index];
> +		dev = skb->dev;

[High]
Can mtip_switch_tx() race with mtip_adjust_link() and dereference a NULL
skb here?

mtip_adjust_link() runs in process context from phylib's state machine
and calls:

    mtip_netif_stop_queues(fep);
    mtip_switch_restart(...);
    if (mtip_netif_queues_stopped(fep))
            mtip_netif_wake_queues(fep);

mtip_switch_restart() does not appear to take fep->hw_lock, does not
disable NAPI, clears tx_skbuff[i] = NULL and resets cur_tx = dirty_tx
= base, and at the end re-enables RX/TX interrupts via ESW_IMR.

While this is in flight, both per-port queues are XOFF so
mtip_netif_queues_stopped(fep) returns true. A concurrent RX interrupt
on another CPU can schedule NAPI, which calls mtip_switch_tx() with
this state:

    bdp = fep->dirty_tx;             /* == base after restart */
    while ((bdp->cbd_sc & BD_ENET_TX_READY) == 0) {
            if (bdp == fep->cur_tx && !mtip_netif_queues_stopped(fep))
                    break;           /* not taken: queues stopped */
            ...
            skb = fep->tx_skbuff[index];   /* NULL */
            dev = skb->dev;                /* NULL deref */

mtip_switch_restart() does not reset cbd_sc, so cbd_sc & BD_ENET_TX_READY
can still be 0 from prior TX activity, which lets the loop body run.
dma_unmap_single() would also be called with the stale cbd_bufaddr.

The mtip_timeout_work() path wraps the same restart in napi_disable() /
netif_tx_lock_bh(); should the adjust_link path do the same?

> +		/* Check for errors */
> +		if (status & (BD_ENET_TX_HB | BD_ENET_TX_LC |
> +				   BD_ENET_TX_RL | BD_ENET_TX_UN |
> +				   BD_ENET_TX_CSL)) {
> +			dev->stats.tx_errors++;
> +			if (status & BD_ENET_TX_HB)  /* No heartbeat */
> +				dev->stats.tx_heartbeat_errors++;
> +			if (status & BD_ENET_TX_LC)  /* Late collision */
> +				dev->stats.tx_window_errors++;
> +			if (status & BD_ENET_TX_RL)  /* Retrans limit */
> +				dev->stats.tx_aborted_errors++;
> +			if (status & BD_ENET_TX_UN)  /* Underrun */
> +				dev->stats.tx_fifo_errors++;
> +			if (status & BD_ENET_TX_CSL) /* Carrier lost */
> +				dev->stats.tx_carrier_errors++;
> +		} else {
> +			dev->stats.tx_packets++;
> +		}
> +
> +		if (status & BD_ENET_TX_READY)
> +			dev_err_ratelimited(&fep->pdev->dev,
> +					    "xmit interrupt and TX_READY.\n");

[Low]
Can this dev_err_ratelimited() ever fire?

The enclosing loop is:

    while (((status = bdp->cbd_sc) & BD_ENET_TX_READY) == 0) {

status is only assigned in the loop header and is not re-read inside the
body, so by construction status & BD_ENET_TX_READY is always 0 when this
diagnostic is reached. Was the intent to re-read bdp->cbd_sc, or is this
leftover scaffolding that should just be removed?

> +
> +		/* Deferred means some collisions occurred during transmit,
> +		 * but we eventually sent the packet OK.
> +		 */
> +		if (status & BD_ENET_TX_DEF)
> +			dev->stats.collisions++;
> +
> +		/* Free the sk buffer associated with this last transmit */
> +		dev_consume_skb_any(skb);
> +		fep->tx_skbuff[index] = NULL;

[ ... ]

> +	}
> +	fep->dirty_tx = bdp;
> +	spin_unlock_bh(&fep->hw_lock);
>  }
>  
> +static int mtip_update_cbd(struct switch_enet_private *fep, struct cbd_t *bdp,
> +			   int index)
> +{

[ ... ]

> +}
> +

[ ... ]

>  static int mtip_switch_rx(struct net_device *dev, int budget)
>  {
> -	return -ENOMEM;
> +	struct mtip_ndev_priv *priv = netdev_priv(dev);

[Low]
After this patch mtip_switch_rx() only ever returns pkt_received, which
is non-negative; the page_pool allocation failure path does
pndev->stats.rx_dropped++ and goto rx_processing_done rather than
returning -ENOMEM.

The earlier patch in the series left mtip_rx_napi() with:

    pkts = mtip_switch_rx(napi->dev, budget);
    if (pkts == -ENOMEM) {
            napi_complete(napi);
            writel(MCF_ESW_IMR_RXF | MCF_ESW_IMR_TXF, fep->hwp + ESW_IMR);
            return 0;
    }

Is that branch now unreachable, and should it be removed so the contract
between mtip_switch_rx() and its caller stays consistent?

[ ... ]

> +
> +		/* Get correct RX page */
> +		index = bdp - fep->rx_bd_base;
> +		page = fep->page[index];
> +		/* Process the incoming frame */
> +		pkt_len = bdp->cbd_datlen;
> +
> +		dma_sync_single_for_cpu(&fep->pdev->dev, bdp->cbd_bufaddr,
> +					pkt_len, DMA_FROM_DEVICE);
> +		net_prefetch(page_address(page));
> +		data = page_address(page);
> +
> +		if (fep->quirks & FEC_QUIRK_SWAP_FRAME)
> +			swap_buffer(data, pkt_len);
> +
> +		eth_hdr = (struct ethhdr *)data;
> +		mtip_atable_get_entry_port_number(fep, eth_hdr->h_source,
> +						  &rx_port);

[ ... ]

> +
> +		if (mtip_update_cbd(fep, bdp, index)) {
> +			pndev->stats.rx_dropped++;
> +			goto rx_processing_done;
> +		}
> +
> +		/* The packet length includes FCS, but we don't want to
> +		 * include that when passing upstream as it messes up
> +		 * bridging applications.
> +		 */
> +		skb = build_skb(page_address(page), PAGE_SIZE);
> +		if (unlikely(!skb)) {
> +			page_pool_recycle_direct(fep->page_pool, page);
> +			pndev->stats.rx_dropped++;
> +
> +			netdev_err_once(pndev, "build_skb failed!\n");
> +			goto rx_processing_done;
> +		}
> +
> +		skb_put(skb, pkt_len);      /* Make room */
> +		skb_mark_for_recycle(skb);
> +		skb->protocol = eth_type_trans(skb, pndev);
> +		skb->offload_fwd_mark = fep->br_offload;
> +		napi_gro_receive(&fep->napi, skb);
> +
> +		pndev->stats.rx_packets++;
> +		pndev->stats.rx_bytes += pkt_len;

[High]
Is the FCS actually being stripped here?

The comment above build_skb() states:

    /* The packet length includes FCS, but we don't want to
     * include that when passing upstream as it messes up
     * bridging applications.
     */

but skb_put() and the rx_bytes accounting both use pkt_len unmodified:

    skb_put(skb, pkt_len);      /* Make room */
    ...
    pndev->stats.rx_bytes += pkt_len;

The equivalent fec_enet_rx_queue_napi() in fec_main.c does
pkt_len - sub_len (with sub_len = 4 + fep->rx_shift). Combined with
mtip_enet_init() programming MCF_FEC_RCR_CRC_FWD in the RCR register so
the CRC is forwarded into the RX buffer, every skb handed to
napi_gro_receive() ends up with 4 trailing FCS bytes and rx_bytes is
overcounted by 4 per frame. Should pkt_len be reduced by 4 (or by
4 + fep->rx_shift) before skb_put() and the stats update?

[ ... ]