From: Lukasz Majewski <lukma@denx.de>
To: Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>
Cc: Andrew Lunn <andrew+netdev@lunn.ch>,
davem@davemloft.net, Eric Dumazet <edumazet@google.com>,
Rob Herring <robh@kernel.org>,
Krzysztof Kozlowski <krzk+dt@kernel.org>,
Conor Dooley <conor+dt@kernel.org>,
Shawn Guo <shawnguo@kernel.org>,
Sascha Hauer <s.hauer@pengutronix.de>,
Pengutronix Kernel Team <kernel@pengutronix.de>,
Fabio Estevam <festevam@gmail.com>,
Richard Cochran <richardcochran@gmail.com>,
netdev@vger.kernel.org, devicetree@vger.kernel.org,
linux-kernel@vger.kernel.org, imx@lists.linux.dev,
linux-arm-kernel@lists.infradead.org,
Stefan Wahren <wahrenst@gmx.net>, Simon Horman <horms@kernel.org>
Subject: Re: [net-next v15 06/12] net: mtip: Add net_device_ops functions to the L2 switch driver
Date: Wed, 23 Jul 2025 22:05:17 +0200
Message-ID: <20250723220517.063c204b@wsk>
In-Reply-To: <20250722111639.3a53b450@wsk>
Hi Jakub, Paolo,
Do you have more comments and questions regarding this driver after my
explanation?
Shall I do something more?
Thanks in advance for your feedback.
> Hi Jakub,
>
> > On Wed, 16 Jul 2025 23:47:25 +0200 Lukasz Majewski wrote:
> > > +static netdev_tx_t mtip_start_xmit_port(struct sk_buff *skb,
> > > + struct net_device *dev,
> > > > + int port)
> > > > +{
> > > + struct mtip_ndev_priv *priv = netdev_priv(dev);
> > > + struct switch_enet_private *fep = priv->fep;
> > > + unsigned short status;
> > > + struct cbd_t *bdp;
> > > + void *bufaddr;
> > > +
> > > + spin_lock(&fep->hw_lock);
> >
> > I see some inconsistencies in how you take this lock.
> > Bunch of bare spin_lock() calls from BH context, but there's also
> > a _irqsave() call in mtip_adjust_link().
>
> In the legacy NXP (Freescale) code for this IP block (i.e. the MTIP
> switch), the recommended way to handle a link or duplex change is to
> reset and reconfigure the switch.
>
> It requires setting up interrupts as well... In that situation, IMHO
> disabling system interrupts is required to avoid some undefined
> behaviour.
>
> > Please align to the strictest
> > context (not sure if the irqsave is actually needed, at a glance,
> > IOW whether the lock is taken from an IRQ)
>
> The spin_lock() in the xmit path is similar to what is done in
> fec_main.c. As this switch uses a single uDMA for both ports, and
> there is no support (or need) for multiple queues, it can be omitted.
>
> >
> > > + if (!fep->link[0] && !fep->link[1]) {
> > > > + /* Link is down or autonegotiation is in progress. */
> > > + netif_stop_queue(dev);
> > > + spin_unlock(&fep->hw_lock);
> > > + return NETDEV_TX_BUSY;
> > > + }
> > > +
> > > + /* Fill in a Tx ring entry */
> > > + bdp = fep->cur_tx;
> > > +
> > > > + /* Force read memory barrier on the current transmit description */
> >
> > > Barriers are between things. What is this barrier separating, and
> > what write barrier does it pair with? As far as I can tell cur_tx
> > is just a value in memory, and accesses are under ->hw_lock, so
> > there should be no ordering concerns.
>
> The bdp is the uDMA descriptor (memory allocated in the coherent DMA
> area). It is used by the uDMA when data is transferred to the MTIP
> switch internal buffer.
>
> The bdp->cbd_sc is a half word, which is modified by uDMA engine, to
> indicate if there are errors or transfer has ended.
>
> The rmb() should improve robustness: it ensures that the status read
> corresponds to what was set by the uDMA. On the other hand, the
> coherent DMA allocation should guarantee this as well.
>
> The fec_main.c places the rmb() in similar places, so I followed their
> approach.
>
> >
> > > + rmb();
> > > + status = bdp->cbd_sc;
> > > +
> > > + if (status & BD_ENET_TX_READY) {
> > > + /* All transmit buffers are full. Bail out.
> > > > + * This should not happen, since dev->tbusy should be set.
> > > + */
> > > + netif_stop_queue(dev);
> > > > + dev_err(&fep->pdev->dev, "%s: tx queue full!.\n",
> > > > + dev->name);
> >
> > This needs to be rate limited, we don't want to flood the logs in
> > case there's a bug.
>
> +1
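
For reference, a minimal user-space sketch of the interval/burst idea
behind rate-limited logging. In the driver itself the change would
presumably just be dev_err() -> dev_err_ratelimited(); the model below
is NOT the kernel implementation, and every name in it
(ratelimit_model, ratelimit_model_allow) is hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space model of interval/burst rate limiting.
 * It only illustrates the idea; it is not the kernel's ratelimit code.
 */
struct ratelimit_model {
	uint64_t interval;    /* window length, arbitrary time units */
	unsigned int burst;   /* messages allowed per window */
	uint64_t begin;       /* start of the current window */
	unsigned int printed; /* messages emitted in the current window */
};

/* Returns true if a message may be emitted at time 'now'. */
static bool ratelimit_model_allow(struct ratelimit_model *rl, uint64_t now)
{
	assert(rl->interval > 0 && rl->burst > 0);

	/* Open a new window once the previous one has expired */
	if (now - rl->begin >= rl->interval) {
		rl->begin = now;
		rl->printed = 0;
	}

	if (rl->printed < rl->burst) {
		rl->printed++;
		return true;
	}

	/* Budget for this window is used up: suppress the message */
	return false;
}
```

With such a limit a stuck "tx queue full" condition emits at most
'burst' messages per window instead of one per transmitted frame.
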
>
> >
> > Also at a glance it seems like you have one fep for multiple
> > netdevs.
>
> Yes.
>
> > So stopping one netdev's Tx queue when fep fills up will not stop
> > the other ports from pushing frames, right?
>
> This is a bit more complicated...
>
> Other solutions - like cpsw_new - are conceptually simple; there are
> two DMAs to two separate eth IP blocks.
> During startup two separate devices are created. When one wants to
> enable the bridge (i.e. start in-hw offloading), just a single bit is
> set and ... that's it.
>
> With vf610 / imx287 and MTIP it is a bit different (imx287 is even
> worse as second ETH interface has incomplete functionality by design).
>
> When switch is not active - you have two uDMA ports to two ENET IP
> blocks. Full separation. That is what is done with fec_main.c driver.
>
> When you enable MTIP switch - then you have just a single uDMA0 active
> for "both" ports. In fact you "bridge" two ports into a single one -
> that is why Freescale/NXP driver (for 2.6.y) just had eth0 to "model"
> bridged interfaces. That was "simpler" (PHY management was done in the
> driver as well).
>
> Now, in this driver, we do have two network devices, which are
> "bridged" (so there is br0). And of course there must be separation
> between lan0/1 when this driver is used, but bridge is not (yet)
> created. This works :-)
>
>
> So I do have - 2x netdevs (handled by a single uDMA0) + 2 PHYs + br0 +
> NAPI + switchdev (to avoid broadcast frame storms + {R}STP + FDB -
> WIP).
>
>
> Just pure fun :-) to model it all ... and make happy all maintainers
> :-)
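
To make the shared-ring concern concrete, here is a small user-space
model of two ports fed by one descriptor ring. It shows one possible
policy (stop and wake the queues of all ports together); this is an
assumption for illustration, not a description of what the driver
currently does. All names (model_switch, model_xmit_port,
model_tx_complete) are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NUM_PORTS 2 /* two netdevs sharing one ring */

/* Hypothetical model: one pool of free descriptors, one stopped flag per port */
struct model_switch {
	bool queue_stopped[MODEL_NUM_PORTS];
	unsigned int free_bds;
};

static void model_set_all_queues(struct model_switch *sw, bool stopped)
{
	for (int i = 0; i < MODEL_NUM_PORTS; i++)
		sw->queue_stopped[i] = stopped;
}

/* Returns true if the frame was queued on the shared ring. */
static bool model_xmit_port(struct model_switch *sw, int port)
{
	assert(port >= 0 && port < MODEL_NUM_PORTS);

	if (sw->queue_stopped[port] || sw->free_bds == 0)
		return false;

	sw->free_bds--;

	/* The ring is shared, so a full ring must stop every port */
	if (sw->free_bds == 0)
		model_set_all_queues(sw, true);

	return true;
}

/* Called when 'n' descriptors have been reclaimed after transmission. */
static void model_tx_complete(struct model_switch *sw, unsigned int n)
{
	sw->free_bds += n;
	if (sw->free_bds > 0)
		model_set_all_queues(sw, false);
}
```

Stopping only the queue of the port that happened to fill the ring
would leave the other port free to keep submitting frames, which is
the situation raised in the review.
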
>
> >
> > > + spin_unlock(&fep->hw_lock);
> > > + return NETDEV_TX_BUSY;
> > > + }
> > > +
> > > + /* Clear all of the status flags */
> > > + status &= ~BD_ENET_TX_STATS;
> > > +
> > > + /* Set buffer length and buffer pointer */
> > > + bufaddr = skb->data;
> > > + bdp->cbd_datlen = skb->len;
> > > +
> > > + /* On some FEC implementations data must be aligned on
> > > + * 4-byte boundaries. Use bounce buffers to copy data
> > > > + * and get it aligned.
> > > + */
> > > + if ((unsigned long)bufaddr & MTIP_ALIGNMENT) {
> >
> > I think you should add
> >
> > if ... ||
> > fep->quirks & FEC_QUIRK_SWAP_FRAME)
> >
> > here. You can't modify skb->data without calling skb_cow_data()
> > but you already have buffers allocated so can as well use them.
>
> The vf610 doesn't need the frame to be swapped, but has requirements
> for alignment as well.
>
> I would keep things as they are now - as they just improve
> readability.
>
> Please keep in mind that this version only supports imx287, but the
> plan is to add vf610 as well (to be more specific - this driver also
> works on vf610, but I plan to add those patches after this one is
> accepted and pulled).
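
A user-space sketch of the condition suggested in the review: copy to
the bounce buffer when the data is misaligned OR when the frame has to
be swapped, so that the caller's data is never modified in place. The
mask value 0x3 is an assumption derived from the "4-byte boundaries"
comment in the patch (the real MTIP_ALIGNMENT may differ), and the
function names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: 4-byte alignment, as in the quoted code comment */
#define MODEL_ALIGN_MASK 0x3u

/* True if the data must go through a bounce buffer before DMA */
static bool model_needs_bounce(const void *data, bool swap_quirk)
{
	return ((uintptr_t)data & MODEL_ALIGN_MASK) || swap_quirk;
}

/* Returns the buffer to hand to DMA. If a bounce is needed the data is
 * copied into 'bounce' and 'bounce' is returned; the original data is
 * left untouched either way.
 */
static const void *model_pick_tx_buf(const void *data, size_t len,
				     void *bounce, size_t bounce_len,
				     bool swap_quirk)
{
	assert(len <= bounce_len);

	if (!model_needs_bounce(data, swap_quirk))
		return data;

	memcpy(bounce, data, len);
	return bounce;
}
```

Any in-place byte swapping would then operate on the bounce copy only.
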
>
> >
> > > + unsigned int index;
> > > +
> > > + index = bdp - fep->tx_bd_base;
> > > + memcpy(fep->tx_bounce[index],
> > > + (void *)skb->data, skb->len);
> >
> > this fits on one 80 char line BTW, quite easily:
> >
> > memcpy(fep->tx_bounce[index], (void *)skb->data,
> > skb->len);
> >
> > Also the cast to void * is not necessary in C.
>
> +1
>
> >
> > > + bufaddr = fep->tx_bounce[index];
> > > + }
> > > +
> > > + if (fep->quirks & FEC_QUIRK_SWAP_FRAME)
> > > + swap_buffer(bufaddr, skb->len);
> > > +
> > > + /* Save skb pointer. */
> > > + fep->tx_skbuff[fep->skb_cur] = skb;
> > > +
> > > + fep->skb_cur = (fep->skb_cur + 1) & TX_RING_MOD_MASK;
> >
> > Not sure if this is buggy, but maybe delay updating things until the
> > mapping succeeds? Fewer things to unwind.
>
> Yes, the skb storage as well as ring buffer modification can be done
> after dma mapping code.
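
A user-space sketch of the "map first, commit afterwards" ordering:
ring state is only updated once the mapping has succeeded, so the
error path has nothing to unwind and only bumps the dropped counter.
The ring size, model_map() and its failure rule (NULL buffer) are
hypothetical stand-ins for the real DMA mapping:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_RING_SIZE 16u /* assumption: power of two */
#define MODEL_RING_MASK (MODEL_RING_SIZE - 1u)

struct model_ring {
	const void *slots[MODEL_RING_SIZE]; /* saved buffer pointers */
	unsigned int cur;                   /* next slot to fill */
	unsigned int dropped;               /* frames dropped on error */
};

/* Hypothetical stand-in for a DMA mapping step; fails for NULL */
static bool model_map(const void *buf)
{
	return buf != NULL;
}

/* Returns true if the buffer was queued, false if it was dropped. */
static bool model_xmit(struct model_ring *r, const void *buf)
{
	/* Step 1: the only operation that can fail */
	if (!model_map(buf)) {
		r->dropped++;
		return false; /* ring state untouched, nothing to undo */
	}

	/* Step 2: commit, only after the mapping succeeded */
	r->slots[r->cur] = buf;
	r->cur = (r->cur + 1u) & MODEL_RING_MASK;
	return true;
}
```
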
>
> >
> > > > + /* Push the data cache so the CPM does not get stale memory
> > > > + * data.
> > > > + */
> > > > + bdp->cbd_bufaddr = dma_map_single(&fep->pdev->dev, bufaddr,
> > > > + MTIP_SWITCH_TX_FRSIZE,
> > > > + DMA_TO_DEVICE);
> > > > + if (unlikely(dma_mapping_error(&fep->pdev->dev, bdp->cbd_bufaddr))) {
> > > + dev_err(&fep->pdev->dev,
> > > + "Failed to map descriptor tx buffer\n");
> > > + dev->stats.tx_errors++;
> > > + dev->stats.tx_dropped++;
> >
> > dropped and errors are two different counters
> > I'd stick to dropped
>
> Ok.
>
> >
> > > + dev_kfree_skb_any(skb);
> > > + goto err;
> > > + }
> > > +
> > > > + /* Send it on its way. Tell FEC it's ready, interrupt when done,
> > > > + * it's the last BD of the frame, and to put the CRC on the end.
> > > > + */
> > > + */
> > > +
> > > + status |= (BD_ENET_TX_READY | BD_ENET_TX_INTR
> > > + | BD_ENET_TX_LAST | BD_ENET_TX_TC);
> >
> > The | goes at the end of the previous line, start of new line
> > adjusts to the opening brackets..
> >
>
> I've refactored it.
>
> > > +
> > > + /* Synchronize all descriptor writes */
> > > + wmb();
> > > + bdp->cbd_sc = status;
> > > +
> > > + netif_trans_update(dev);
> >
> > Is this call necessary?
>
> I've added it when I was forward porting the old driver. It can be
> removed.
>
> >
> > > + skb_tx_timestamp(skb);
> > > +
> > > + /* Trigger transmission start */
> > > + writel(MCF_ESW_TDAR_X_DES_ACTIVE, fep->hwp + ESW_TDAR);
> > > +
> > > + dev->stats.tx_bytes += skb->len;
> > > + /* If this was the last BD in the ring,
> > > + * start at the beginning again.
> > > + */
> > > + if (status & BD_ENET_TX_WRAP)
> > > + bdp = fep->tx_bd_base;
> > > + else
> > > + bdp++;
> > > +
> > > + if (bdp == fep->dirty_tx) {
> > > + fep->tx_full = 1;
> > > + netif_stop_queue(dev);
> > > + }
> > > +
> > > + fep->cur_tx = bdp;
> > > + err:
> > > + spin_unlock(&fep->hw_lock);
> > > +
> > > + return NETDEV_TX_OK;
> > > +}
>
>
> Thanks for the feedback.
>
> Best regards,
>
> Lukasz Majewski
>
> --
>
> DENX Software Engineering GmbH, Managing Director: Johanna Denk,
> Tabea Lutz HRB 165235 Munich, Office: Kirchenstr.5, D-82194
> Groebenzell, Germany
> Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email:
> lukma@denx.de
Best regards,
Lukasz Majewski
--
DENX Software Engineering GmbH, Managing Director: Johanna Denk,
Tabea Lutz HRB 165235 Munich, Office: Kirchenstr.5, D-82194
Groebenzell, Germany
Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@denx.de