public inbox for devicetree@vger.kernel.org
From: Lukasz Majewski <lukma@denx.de>
To: Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>
Cc: Andrew Lunn <andrew+netdev@lunn.ch>,
	davem@davemloft.net, Eric Dumazet <edumazet@google.com>,
	Rob Herring <robh@kernel.org>,
	Krzysztof Kozlowski <krzk+dt@kernel.org>,
	Conor Dooley <conor+dt@kernel.org>,
	Shawn Guo <shawnguo@kernel.org>,
	Sascha Hauer <s.hauer@pengutronix.de>,
	Pengutronix Kernel Team <kernel@pengutronix.de>,
	Fabio Estevam <festevam@gmail.com>,
	Richard Cochran <richardcochran@gmail.com>,
	netdev@vger.kernel.org, devicetree@vger.kernel.org,
	linux-kernel@vger.kernel.org, imx@lists.linux.dev,
	linux-arm-kernel@lists.infradead.org,
	Stefan Wahren <wahrenst@gmx.net>, Simon Horman <horms@kernel.org>
Subject: Re: [net-next v15 06/12] net: mtip: Add net_device_ops functions to the L2 switch driver
Date: Wed, 23 Jul 2025 22:05:17 +0200
Message-ID: <20250723220517.063c204b@wsk>
In-Reply-To: <20250722111639.3a53b450@wsk>


Hi Jakub, Paolo,

Do you have more comments and questions regarding this driver after my
explanation?

Shall I do something more?

Thanks in advance for your feedback.

> Hi Jakub,
> 
> > On Wed, 16 Jul 2025 23:47:25 +0200 Lukasz Majewski wrote:  
> > > +static netdev_tx_t mtip_start_xmit_port(struct sk_buff *skb,
> > > +					struct net_device *dev, int port)
> > > +{
> > > +	struct mtip_ndev_priv *priv = netdev_priv(dev);
> > > +	struct switch_enet_private *fep = priv->fep;
> > > +	unsigned short status;
> > > +	struct cbd_t *bdp;
> > > +	void *bufaddr;
> > > +
> > > +	spin_lock(&fep->hw_lock);    
> > 
> > I see some inconsistencies in how you take this lock.
> > Bunch of bare spin_lock() calls from BH context, but there's also
> > a _irqsave() call in mtip_adjust_link().  
> 
> In the legacy NXP (Freescale) code for this IP block (i.e. MTIP
> switch) the recommended way to re-setup it, when link or duplex
> changes, is to reset and reconfigure it.
> 
> It requires setting up interrupts as well... In that situation, IMHO
> disabling system interrupts is required to avoid some undefined
> behaviour.
> 
> > Please align to the strictest
> > context (not sure if the irqsave is actually needed, at a glance,
> > IOW whether the lock is taken from an IRQ)  
> 
> The spin_lock() for the xmit port is similar to what is done in
> fec_main.c. Since this switch uses a single uDMA for both ports, and
> there is no support (or need) for multiple queues, it can be omitted.
> 
> >   
> > > +	if (!fep->link[0] && !fep->link[1]) {
> > > +		/* Link is down or autonegotiation is in progress. */
> > > +		netif_stop_queue(dev);
> > > +		spin_unlock(&fep->hw_lock);
> > > +		return NETDEV_TX_BUSY;
> > > +	}
> > > +
> > > +	/* Fill in a Tx ring entry */
> > > +	bdp = fep->cur_tx;
> > > +
> > > +	/* Force read memory barrier on the current transmit descriptor */
> > 
> > Barriers are between things. What is this barrier separating, and
> > what write barrier does it pair with? As far as I can tell cur_tx
> > is just a value in memory, and accesses are under ->hw_lock, so
> > there should be no ordering concerns.  
> 
> The bdp is the uDMA descriptor (memory allocated in the coherent DMA
> area). It is used by the uDMA when data is transferred to MTIP switch
> internal buffer.
> 
> The bdp->cbd_sc is a half word, which is modified by uDMA engine, to
> indicate if there are errors or transfer has ended.
> 
> The rmb() shall improve robustness - it assures that the status
> corresponds to what was set by uDMA. On the other hand dma coherent
> allocation shall do this as well.
> 
> The fec_main.c places the rmb() in similar places, so I followed their
> approach.
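
To make the pairing question concrete, below is a minimal user-space
sketch, not the driver code. It assumes C11 release/acquire ordering as
a stand-in for a paired wmb()/rmb(), and it models the status writer as
ordinary CPU code, whereas in the driver cbd_sc is updated by the uDMA
engine. All names (struct desc, DESC_READY, desc_publish(),
desc_consume()) are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DESC_READY 0x8000u	/* hypothetical "descriptor is valid" bit */

struct desc {
	uint16_t datlen;	/* payload field, written before the status */
	_Atomic uint16_t sc;	/* status word, written last */
};

/* Producer: fill the payload, then publish the status with release
 * ordering so the payload write cannot be reordered after it.
 */
static void desc_publish(struct desc *d, uint16_t len)
{
	assert(d != NULL);
	d->datlen = len;
	atomic_store_explicit(&d->sc, DESC_READY, memory_order_release);
}

/* Consumer: read the status with acquire ordering, and only then read
 * the payload. The acquire load pairs with the release store above.
 */
static bool desc_consume(struct desc *d, uint16_t *len)
{
	uint16_t sc;

	assert(d != NULL && len != NULL);
	sc = atomic_load_explicit(&d->sc, memory_order_acquire);
	if (!(sc & DESC_READY))
		return false;
	*len = d->datlen;
	return true;
}
```

The point of the sketch is that the ordering sits between two accesses
(status, then payload) and has a counterpart on the writer side. A
barrier placed before a single load, with nothing ordered against it,
has no such pair.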
> 
> >   
> > > +	rmb();
> > > +	status = bdp->cbd_sc;
> > > +
> > > +	if (status & BD_ENET_TX_READY) {
> > > +		/* All transmit buffers are full. Bail out.
> > > +		 * This should not happen, since dev->tbusy should be set.
> > > +		 */
> > > +		netif_stop_queue(dev);
> > > +		dev_err(&fep->pdev->dev, "%s: tx queue full!.\n", dev->name);
> > 
> > This needs to be rate limited, we don't want to flood the logs in
> > case there's a bug.  
> 
> +1
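
In the kernel the usual fix is to switch to dev_err_ratelimited(). The
idea behind it can be sketched in user space as below. This is only an
illustration of windowed rate limiting, not the kernel implementation;
struct ratelimit and ratelimit_allow() are hypothetical names, and the
caller supplies the current time so the behaviour is deterministic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct ratelimit {
	long interval;		/* window length, in caller-defined ticks */
	int burst;		/* messages allowed per window */
	long window_start;	/* tick at which the current window began */
	int printed;		/* messages already allowed in this window */
};

/* Return true if a message may be printed at time 'now'. */
static bool ratelimit_allow(struct ratelimit *rl, long now)
{
	assert(rl != NULL);

	/* Start a fresh window once the previous one has expired. */
	if (now - rl->window_start >= rl->interval) {
		rl->window_start = now;
		rl->printed = 0;
	}

	if (rl->printed >= rl->burst)
		return false;	/* budget for this window is used up */

	rl->printed++;
	return true;
}
```

With interval = 5 and burst = 2, two messages pass in the first window,
further ones are suppressed, and the budget is restored once five ticks
have elapsed.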
> 
> > 
> > Also at a glance it seems like you have one fep for multiple
> > netdevs.  
> 
> Yes.
> 
> > So stopping one netdev's Tx queue when fep fills up will not stop
> > the other ports from pushing frames, right?  
> 
> This is a bit more complicated...
> 
> Other solutions - like cpsw_new - are conceptually simple; there are
> two DMAs to two separate eth IP blocks.
> During startup two separate devices are created. When one wants to
> enable bridge (i.e. start in-hw offloading) - just a single bit is set
> and ... that's it.
> 
> With vf610 / imx287 and MTIP it is a bit different (imx287 is even
> worse as second ETH interface has incomplete functionality by design).
> 
> When switch is not active - you have two uDMA ports to two ENET IP
> blocks. Full separation. That is what is done with fec_main.c driver.
> 
> When you enable MTIP switch - then you have just a single uDMA0 active
> for "both" ports. In fact you "bridge" two ports into a single one -
> that is why Freescale/NXP driver (for 2.6.y) just had eth0 to "model"
> bridged interfaces. That was "simpler" (PHY management was done in the
> driver as well).
> 
> Now, in this driver, we do have two network devices, which are
> "bridged" (so there is br0). And of course there must be separation
> between lan0/1 when this driver is used, but bridge is not (yet)
> created. This works :-)
> 
> 
> So I do have - 2x netdevs (handled by a single uDMA0) + 2 PHYs + br0 +
> NAPI + switchdev (to avoid broadcast frame storms + {R}STP + FDB -
> WIP).
> 
> 
> Just pure fun :-) to model it all ... and make all maintainers happy
> :-)
> 
> >   
> > > +		spin_unlock(&fep->hw_lock);
> > > +		return NETDEV_TX_BUSY;
> > > +	}
> > > +
> > > +	/* Clear all of the status flags */
> > > +	status &= ~BD_ENET_TX_STATS;
> > > +
> > > +	/* Set buffer length and buffer pointer */
> > > +	bufaddr = skb->data;
> > > +	bdp->cbd_datlen = skb->len;
> > > +
> > > +	/* On some FEC implementations data must be aligned on
> > > +	 * 4-byte boundaries. Use bounce buffers to copy data
> > > +	 * and get it aligned.
> > > +	 */
> > > +	if ((unsigned long)bufaddr & MTIP_ALIGNMENT) {    
> > 
> > I think you should add 
> > 
> > 	if ... ||
> >            fep->quirks & FEC_QUIRK_SWAP_FRAME)
> > 
> > here. You can't modify skb->data without calling skb_cow_data()
> > but you already have buffers allocated so can as well use them.  
> 
> The vf610 doesn't need the frame to be swapped, but has requirements
> for alignment as well.
> 
> I would keep things as they are now - as they just improve
> readability.
> 
> Please keep in mind that this version only supports imx287, but the
> plan is to add vf610 as well (to be more specific - this driver also
> works on vf610, but I plan to add those patches after this one is
> accepted and pulled). 
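
For reference, the alignment check plus bounce-buffer copy discussed
here can be modelled in user space as follows. This is a sketch under
assumptions: TX_ALIGN_MASK is a hypothetical 4-byte mask (the actual
MTIP_ALIGNMENT value is not shown in this thread), and
tx_pick_buffer() is an illustrative helper, not a function from the
driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TX_ALIGN_MASK 0x3u	/* hypothetical: require 4-byte alignment */

/* Return the buffer the hardware should transmit from: the original
 * data if it is already aligned, otherwise a copy placed in 'bounce'.
 * Returns NULL if the frame does not fit into the bounce buffer.
 */
static const void *tx_pick_buffer(const void *data, size_t len,
				  void *bounce, size_t bounce_len)
{
	assert(data != NULL && bounce != NULL);

	if (((uintptr_t)data & TX_ALIGN_MASK) == 0)
		return data;		/* aligned, use in place */

	if (len > bounce_len)
		return NULL;		/* would overflow the bounce buffer */

	memcpy(bounce, data, len);	/* original data stays untouched */
	return bounce;
}
```

Because the copy leaves the original data unmodified, any in-place
transformation (such as a byte swap) can then be applied to the bounce
copy only, which is the reason for the suggestion to take the bounce
path whenever the swap quirk is set.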
> 
> >   
> > > +		unsigned int index;
> > > +
> > > +		index = bdp - fep->tx_bd_base;
> > > +		memcpy(fep->tx_bounce[index],
> > > +		       (void *)skb->data, skb->len);    
> > 
> > this fits on one 80 char line BTW, quite easily:
> > 
> > 		memcpy(fep->tx_bounce[index], (void *)skb->data, skb->len);
> > 
> > Also the cast to void * is not necessary in C.  
> 
> +1
> 
> >   
> > > +		bufaddr = fep->tx_bounce[index];
> > > +	}
> > > +
> > > +	if (fep->quirks & FEC_QUIRK_SWAP_FRAME)
> > > +		swap_buffer(bufaddr, skb->len);
> > > +
> > > +	/* Save skb pointer. */
> > > +	fep->tx_skbuff[fep->skb_cur] = skb;
> > > +
> > > +	fep->skb_cur = (fep->skb_cur + 1) & TX_RING_MOD_MASK;    
> > 
> > Not sure if this is buggy, but maybe delay updating things until the
> > mapping succeeds? Fewer things to unwind.  
> 
> Yes, the skb storage as well as ring buffer modification can be done
> after dma mapping code.
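
A small user-space model of that reordering is shown below. It is a
sketch under assumptions: TX_RING_SIZE is a hypothetical power-of-two
ring size, the 'map' callback stands in for the DMA mapping step, and
map_always_ok() / map_always_fail() are stubs that exist only for
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TX_RING_SIZE 8u			/* hypothetical, power of two */
#define TX_RING_MASK (TX_RING_SIZE - 1u)

struct tx_ring {
	const void *slots[TX_RING_SIZE];	/* saved packet pointers */
	unsigned int cur;			/* next slot to fill */
};

/* Stand-ins for a mapping step that can succeed or fail. */
static bool map_always_ok(const void *pkt)   { (void)pkt; return true; }
static bool map_always_fail(const void *pkt) { (void)pkt; return false; }

/* Queue a packet: run the fallible step first and touch the ring state
 * only after it has succeeded, so the error path has nothing to undo.
 */
static bool tx_ring_queue(struct tx_ring *r, const void *pkt,
			  bool (*map)(const void *))
{
	assert(r != NULL && map != NULL);

	if (!map(pkt))
		return false;	/* ring state is still untouched */

	r->slots[r->cur] = pkt;
	r->cur = (r->cur + 1u) & TX_RING_MASK;	/* wrap with the mask */
	return true;
}
```

After a failed mapping the index and the slot array are unchanged, and
the index wraps back to 0 after TX_RING_SIZE successful calls.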
> 
> >   
> > > +	/* Push the data cache so the CPM does not get stale memory
> > > +	 * data.
> > > +	 */
> > > +	bdp->cbd_bufaddr = dma_map_single(&fep->pdev->dev, bufaddr,
> > > +					  MTIP_SWITCH_TX_FRSIZE,
> > > +					  DMA_TO_DEVICE);
> > > +	if (unlikely(dma_mapping_error(&fep->pdev->dev, bdp->cbd_bufaddr))) {
> > > +		dev_err(&fep->pdev->dev,
> > > +			"Failed to map descriptor tx buffer\n");
> > > +		dev->stats.tx_errors++;
> > > +		dev->stats.tx_dropped++;    
> > 
> > dropped and errors are two different counters
> > I'd stick to dropped  
> 
> Ok.
> 
> >   
> > > +		dev_kfree_skb_any(skb);
> > > +		goto err;
> > > +	}
> > > +
> > > +	/* Send it on its way.  Tell FEC it's ready, interrupt when done,
> > > +	 * it's the last BD of the frame, and to put the CRC on the end.
> > > +	 */
> > > +
> > > +	status |= (BD_ENET_TX_READY | BD_ENET_TX_INTR
> > > +			| BD_ENET_TX_LAST | BD_ENET_TX_TC);    
> > 
> > The | goes at the end of the previous line, start of new line
> > adjusts to the opening brackets..
> >   
> 
> I've refactored it.
> 
> > > +
> > > +	/* Synchronize all descriptor writes */
> > > +	wmb();
> > > +	bdp->cbd_sc = status;
> > > +
> > > +	netif_trans_update(dev);    
> > 
> > Is this call necessary?  
> 
> I've added it when I was forward porting the old driver. It can be
> removed.
> 
> >   
> > > +	skb_tx_timestamp(skb);
> > > +
> > > +	/* Trigger transmission start */
> > > +	writel(MCF_ESW_TDAR_X_DES_ACTIVE, fep->hwp + ESW_TDAR);
> > > +
> > > +	dev->stats.tx_bytes += skb->len;
> > > +	/* If this was the last BD in the ring,
> > > +	 * start at the beginning again.
> > > +	 */
> > > +	if (status & BD_ENET_TX_WRAP)
> > > +		bdp = fep->tx_bd_base;
> > > +	else
> > > +		bdp++;
> > > +
> > > +	if (bdp == fep->dirty_tx) {
> > > +		fep->tx_full = 1;
> > > +		netif_stop_queue(dev);
> > > +	}
> > > +
> > > +	fep->cur_tx = bdp;
> > > + err:
> > > +	spin_unlock(&fep->hw_lock);
> > > +
> > > +	return NETDEV_TX_OK;
> > > +}    
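
The descriptor advance at the end of the quoted function (wrap on the
last descriptor, mark the ring full when the producer catches up with
the oldest unreclaimed descriptor) can be modelled in user space as
below. This is only a sketch: BD_WRAP is a hypothetical flag value, and
struct bd, struct bd_ring and bd_ring_advance() are illustrative names
rather than the driver's own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BD_WRAP 0x2000u		/* hypothetical "last descriptor" flag */

struct bd {
	uint16_t sc;		/* status/control word */
};

struct bd_ring {
	struct bd *base;	/* first descriptor in the ring */
	struct bd *cur;		/* next descriptor to fill */
	struct bd *dirty;	/* oldest descriptor not yet reclaimed */
	bool full;
};

/* Move 'cur' to the next descriptor, wrapping to 'base' when the
 * current one carries the wrap flag, and record when the ring fills.
 */
static void bd_ring_advance(struct bd_ring *r)
{
	struct bd *next;

	assert(r != NULL && r->cur != NULL);
	next = (r->cur->sc & BD_WRAP) ? r->base : r->cur + 1;

	if (next == r->dirty)
		r->full = true;	/* producer caught up with the consumer */

	r->cur = next;
}
```

In this model the ring only reports full once the advance lands on the
dirty pointer, which mirrors the point at which the quoted code stops
the queue.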
> 
> 
> Thanks for the feedback.
> 
> Best regards,
> 
> Lukasz Majewski
> 
> --
> 
> DENX Software Engineering GmbH, Managing Director: Johanna Denk,
> Tabea Lutz HRB 165235 Munich, Office: Kirchenstr.5, D-82194
> Groebenzell, Germany
> Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email:
> lukma@denx.de




Best regards,

Lukasz Majewski

--

DENX Software Engineering GmbH, Managing Director: Johanna Denk,
Tabea Lutz HRB 165235 Munich, Office: Kirchenstr.5, D-82194
Groebenzell, Germany
Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@denx.de

