From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 33343C83F1A for ; Wed, 23 Jul 2025 20:08:07 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender:List-Subscribe:List-Help :List-Post:List-Archive:List-Unsubscribe:List-Id:Content-Type:MIME-Version: References:In-Reply-To:Message-ID:Subject:Cc:To:From:Date:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description:Resent-Date: Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=JRu1eEhqQTWRrM+V9siD8r1BXrfWDxNfwR7uGPU++GM=; b=4rhBL/2d5XxK+6WBMkFeApCSvR JDmDaWoAnhSCMKfAORbQZakN9hZsVqVOOahBzGbkY8c/GHYqupVAgVCXf99cNJIyoeRDN+mh+ZVWG tvKACre/QRaRJaWDd4eyOxL3tWsbK7x6TTMzaZvm6b3k7uGqv3eTjoVDR5oE/CpCJXVqVi63soBXR PUN4al36dm2MYu8VZLNzfU7Yl2VRxNEnaZLHVQvsG0cBOp/HcWgQJra7MbyZkvLhFgewvqZNqbn9S fwiuhmvfzqZIkWxm+0S9klUSLOlWa7xdAlOrJb/2GCR7hIgFBpr8g9Wd/bUnSseJ0szrZFkR6bEVV C7CUOcxg==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.98.2 #2 (Red Hat Linux)) id 1uefks-00000005sDB-2wTi; Wed, 23 Jul 2025 20:07:58 +0000 Received: from mx.denx.de ([2a03:4000:64:cc:545d:19ff:fe05:8172]) by bombadil.infradead.org with esmtps (Exim 4.98.2 #2 (Red Hat Linux)) id 1uefiR-00000005s81-13A9 for linux-arm-kernel@lists.infradead.org; Wed, 23 Jul 2025 20:05:28 +0000 Received: from [127.0.0.1] (localhost [127.0.0.1]) by localhost (Mailerdaemon) with ESMTPSA id 6DE3810272359; Wed, 23 Jul 2025 22:05:18 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=denx.de; s=mx-20241105; t=1753301122; h=from:subject:date:message-id:to:cc:mime-version:content-type: in-reply-to:references; bh=JRu1eEhqQTWRrM+V9siD8r1BXrfWDxNfwR7uGPU++GM=; b=SyBtKbQtFJsJr8BbMMF4N8L0uuZ0w/k99K9Lgwo26BBDMWchLQ3FZDnWWOq3pb6F2H+WtB IH+nJS/DzMJAuwGcepsegNzfGd+UQw22MHZhJhknKZ/tRgAoe1M/xgOHeRLGFKRwdYXD2r 95E75LHIK/XQR7g4vKwLJC+4XmCd5557t59okh64ES41tuBaHxSMNEiUjOH17vuxdAoBRs /dO5I6YGCmlA9RA2dm6eFwf1/wYrat3uHsPR4y4rH7sW2HHPWRh/llRKP00X3Mc/lRYll3 XfBrBSFn3P805Vkju6oKOWmp5MwvuyFOmTq+xOE8v2pQOekag46+iKswvcWOIA== Date: Wed, 23 Jul 2025 22:05:17 +0200 From: Lukasz Majewski To: Jakub Kicinski , Paolo Abeni Cc: Andrew Lunn , davem@davemloft.net, Eric Dumazet , Rob Herring , Krzysztof Kozlowski , Conor Dooley , Shawn Guo , Sascha Hauer , Pengutronix Kernel Team , Fabio Estevam , Richard Cochran , netdev@vger.kernel.org, devicetree@vger.kernel.org, linux-kernel@vger.kernel.org, imx@lists.linux.dev, linux-arm-kernel@lists.infradead.org, Stefan Wahren , Simon Horman Subject: Re: [net-next v15 06/12] net: mtip: Add net_device_ops functions to the L2 switch driver Message-ID: <20250723220517.063c204b@wsk> In-Reply-To: <20250722111639.3a53b450@wsk> References: <20250716214731.3384273-1-lukma@denx.de> <20250716214731.3384273-7-lukma@denx.de> <20250718182840.7ab7e202@kernel.org> <20250722111639.3a53b450@wsk> Organization: denx.de X-Mailer: Claws Mail 3.19.0 (GTK+ 2.24.33; x86_64-pc-linux-gnu) MIME-Version: 1.0 Content-Type: multipart/signed; boundary="Sig_/TD4of5q5UnstYcIEUgvkE=c"; protocol="application/pgp-signature"; micalg=pgp-sha512 X-Last-TLS-Session-Version: TLSv1.3 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20250723_130527_591146_9A756159 X-CRM114-Status: GOOD ( 72.05 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org --Sig_/TD4of5q5UnstYcIEUgvkE=c Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: quoted-printable Hi Jakub, Paolo, Do you have more comments and questions regarding this driver after my explanation? Shall I do something more? Thanks in advance for you feedback. > Hi Jakub, >=20 > > On Wed, 16 Jul 2025 23:47:25 +0200 Lukasz Majewski wrote: =20 > > > +static netdev_tx_t mtip_start_xmit_port(struct sk_buff *skb, > > > + struct net_device *dev, > > > int port) +{ > > > + struct mtip_ndev_priv *priv =3D netdev_priv(dev); > > > + struct switch_enet_private *fep =3D priv->fep; > > > + unsigned short status; > > > + struct cbd_t *bdp; > > > + void *bufaddr; > > > + > > > + spin_lock(&fep->hw_lock); =20 > >=20 > > I see some inconsistencies in how you take this lock. > > Bunch of bare spin_lock() calls from BH context, but there's also > > a _irqsave() call in mtip_adjust_link(). =20 >=20 > In the legacy NXP (Freescale) code for this IP block (i.e. MTIP > switch) the recommended way to re-setup it, when link or duplex > changes, is to reset and reconfigure it. >=20 > It requires setting up interrupts as well... In that situation, IMHO > disabling system interrupts is required to avoid some undefined > behaviour. >=20 > > Please align to the strictest > > context (not sure if the irqsave is actually needed, at a glance, > > IOW whether the lock is taken from an IRQ) =20 >=20 > The spin_lock() for xmit port is similar to what is done for > fec_main.c. As this switch uses single uDMA for both ports as well as > there is no support (and need) for multiple queues it can be omitted. >=20 > > =20 > > > + if (!fep->link[0] && !fep->link[1]) { > > > + /* Link is down or autonegotiation is in > > > progress. */ > > > + netif_stop_queue(dev); > > > + spin_unlock(&fep->hw_lock); > > > + return NETDEV_TX_BUSY; > > > + } > > > + > > > + /* Fill in a Tx ring entry */ > > > + bdp =3D fep->cur_tx; > > > + > > > + /* Force read memory barier on the current transmit > > > description */ =20 > >=20 > > Barrier are between things. What is this barrier separating, and > > what write barrier does it pair with? As far as I can tell cur_tx > > is just a value in memory, and accesses are under ->hw_lock, so > > there should be no ordering concerns. =20 >=20 > The bdp is the uDMA descritptor (memory allocated in the coherent dma > area). It is used by the uDMA when data is transferred to MTIP switch > internal buffer. >=20 > The bdp->cbd_sc is a half word, which is modified by uDMA engine, to > indicate if there are errors or transfer has ended. >=20 > The rmb() shall improve robustness - it assures that the status > corresponds to what was set by uDMA. On the other hand dma coherent > allocation shall do this as well. >=20 > The fec_main.c places the rmb() in similar places, so I followed their > approach. >=20 > > =20 > > > + rmb(); > > > + status =3D bdp->cbd_sc; > > > + > > > + if (status & BD_ENET_TX_READY) { > > > + /* All transmit buffers are full. Bail out. > > > + * This should not happen, since dev->tbusy > > > should be set. > > > + */ > > > + netif_stop_queue(dev); > > > + dev_err(&fep->pdev->dev, "%s: tx queue full!.\n", > > > dev->name); =20 > >=20 > > This needs to be rate limited, we don't want to flood the logs in > > case there's a bug. =20 >=20 > +1 >=20 > >=20 > > Also at a glance it seems like you have one fep for multiple > > netdevs. =20 >=20 > Yes. >=20 > > So stopping one netdev's Tx queue when fep fills up will not stop > > the other ports from pushing frames, right? =20 >=20 > This is a bit more complicated... >=20 > Other solutions - like cpsw_new - are conceptually simple; there are > two DMAs to two separate eth IP blocks. > During startup two separate devices are created. When one wants to > enable bridge (i.e. start in-hw offloading) - just single bit is setup > and ... that's it. >=20 > With vf610 / imx287 and MTIP it is a bit different (imx287 is even > worse as second ETH interface has incomplete functionality by design). >=20 > When switch is not active - you have two uDMA ports to two ENET IP > blocks. Full separation. That is what is done with fec_main.c driver. >=20 > When you enable MTIP switch - then you have just a single uDMA0 active > for "both" ports. In fact you "bridge" two ports into a single one - > that is why Freescale/NXP driver (for 2.6.y) just had eth0 to "model" > bridged interfaces. That was "simpler" (PHY management was done in the > driver as well). >=20 > Now, in this driver, we do have two network devices, which are > "bridged" (so there is br0). And of course there must be separation > between lan0/1 when this driver is used, but bridge is not (yet) > created. This works :-) >=20 >=20 > So I do have - 2x netdevs (handled by single uDMA0) + 2PHYS + br0 + > NAPI + switchdev (to avoid broadcast frame storms + {R}STP + FDB - > WIP). >=20 >=20 > Just pure fun :-) to model it all ... and make happy all maintainers > :-) >=20 > > =20 > > > + spin_unlock(&fep->hw_lock); > > > + return NETDEV_TX_BUSY; > > > + } > > > + > > > + /* Clear all of the status flags */ > > > + status &=3D ~BD_ENET_TX_STATS; > > > + > > > + /* Set buffer length and buffer pointer */ > > > + bufaddr =3D skb->data; > > > + bdp->cbd_datlen =3D skb->len; > > > + > > > + /* On some FEC implementations data must be aligned on > > > + * 4-byte boundaries. Use bounce buffers to copy data > > > + * and get it aligned.spin > > > + */ > > > + if ((unsigned long)bufaddr & MTIP_ALIGNMENT) { =20 > >=20 > > I think you should add=20 > >=20 > > if ... || > > fep->quirks & FEC_QUIRK_SWAP_FRAME) > >=20 > > here. You can't modify skb->data without calling skb_cow_data() > > but you already have buffers allocated so can as well use them. =20 >=20 > The vf610 doesn't need the frame to be swapped, but has requirements > for alignment as well. >=20 > I would keep things as they are now - as they just improve > readability. >=20 > Please keep in mind that this version only supports imx287, but the > plan is to add vf610 as well (to be more specific - this driver also > works on vf610, but I plan to add those patches after this one is > accepted and pulled).=20 >=20 > > =20 > > > + unsigned int index; > > > + > > > + index =3D bdp - fep->tx_bd_base; > > > + memcpy(fep->tx_bounce[index], > > > + (void *)skb->data, skb->len); =20 > >=20 > > this fits on one 80 char line BTW, quite easily: > >=20 > > memcpy(fep->tx_bounce[index], (void *)skb->data, > > skb->len); > >=20 > > Also the cast to void * is not necessary in C. =20 >=20 > +1 >=20 > > =20 > > > + bufaddr =3D fep->tx_bounce[index]; > > > + } > > > + > > > + if (fep->quirks & FEC_QUIRK_SWAP_FRAME) > > > + swap_buffer(bufaddr, skb->len); > > > + > > > + /* Save skb pointer. */ > > > + fep->tx_skbuff[fep->skb_cur] =3D skb; > > > + > > > + fep->skb_cur =3D (fep->skb_cur + 1) & TX_RING_MOD_MASK; =20 > >=20 > > Not sure if this is buggy, but maybe delay updating things until the > > mapping succeeds? Fewer things to unwind. =20 >=20 > Yes, the skb storage as well as ring buffer modification can be done > after dma mapping code. >=20 > > =20 > > > + /* Push the data cache so the CPM does not get stale > > > memory > > > + * data. > > > + */ > > > + bdp->cbd_bufaddr =3D dma_map_single(&fep->pdev->dev, > > > bufaddr, > > > + MTIP_SWITCH_TX_FRSIZE, > > > + DMA_TO_DEVICE); > > > + if (unlikely(dma_mapping_error(&fep->pdev->dev, > > > bdp->cbd_bufaddr))) { > > > + dev_err(&fep->pdev->dev, > > > + "Failed to map descriptor tx buffer\n"); > > > + dev->stats.tx_errors++; > > > + dev->stats.tx_dropped++; =20 > >=20 > > dropped and errors are two different counters > > I'd stick to dropped =20 >=20 > Ok. >=20 > > =20 > > > + dev_kfree_skb_any(skb); > > > + goto err; > > > + } > > > + > > > + /* Send it on its way. Tell FEC it's ready, interrupt > > > when done, > > > + * it's the last BD of the frame, and to put the CRC on > > > the end. > > > + */ > > > + > > > + status |=3D (BD_ENET_TX_READY | BD_ENET_TX_INTR > > > + | BD_ENET_TX_LAST | BD_ENET_TX_TC); =20 > >=20 > > The | goes at the end of the previous line, start of new line > > adjusts to the opening brackets.. > > =20 >=20 > I've refactored it. >=20 > > > + > > > + /* Synchronize all descriptor writes */ > > > + wmb(); > > > + bdp->cbd_sc =3D status; > > > + > > > + netif_trans_update(dev); =20 > >=20 > > Is this call necessary? =20 >=20 > I've added it when I was forward porting the old driver. It can be > removed. >=20 > > =20 > > > + skb_tx_timestamp(skb); > > > + > > > + /* Trigger transmission start */ > > > + writel(MCF_ESW_TDAR_X_DES_ACTIVE, fep->hwp + ESW_TDAR); > > > + > > > + dev->stats.tx_bytes +=3D skb->len; > > > + /* If this was the last BD in the ring, > > > + * start at the beginning again. > > > + */ > > > + if (status & BD_ENET_TX_WRAP) > > > + bdp =3D fep->tx_bd_base; > > > + else > > > + bdp++; > > > + > > > + if (bdp =3D=3D fep->dirty_tx) { > > > + fep->tx_full =3D 1; > > > + netif_stop_queue(dev); > > > + } > > > + > > > + fep->cur_tx =3D bdp; > > > + err: > > > + spin_unlock(&fep->hw_lock); > > > + > > > + return NETDEV_TX_OK; > > > +} =20 >=20 >=20 > Thanks for the feedback. >=20 > Best regards, >=20 > Lukasz Majewski >=20 > -- >=20 > DENX Software Engineering GmbH, Managing Director: Johanna Denk, > Tabea Lutz HRB 165235 Munich, Office: Kirchenstr.5, D-82194 > Groebenzell, Germany > Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: > lukma@denx.de Best regards, Lukasz Majewski -- DENX Software Engineering GmbH, Managing Director: Johanna Denk, Tabea Lutz HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-59 Fax: (+49)-8142-66989-80 Email: lukma@denx.de --Sig_/TD4of5q5UnstYcIEUgvkE=c Content-Type: application/pgp-signature Content-Description: OpenPGP digital signature -----BEGIN PGP SIGNATURE----- iQEzBAEBCgAdFiEEgAyFJ+N6uu6+XupJAR8vZIA0zr0FAmiBQH0ACgkQAR8vZIA0 zr1Utgf6Ard+NJXd0ZSwPbQGeReeiTVtyyTnWcReC90yLPoHeAp0XlMJjBDXKjd4 UB7FYVTELP+NVWQFhAutUrWMlMaiAetWXTwiYgxmCa5O05HuHATTkaN7dRZ3iVQc lvPpoTZfRXf2QPEJ7PaoPsvqoVew1GINeqqKmqodBbDvOz7ykY7ixvjbrw/CLFXM q/3gN2TzXmKS1sTiM0y05r/V7XunIzrV4eZVcEJAy1p7WuE56VqHssJVf33FOCU8 WJHa9x6eli6qVbdsU6EbtX5yI3cJpTJtqi55COBtFAd1h7EahG44/wtJ6Hjfzh7+ zH/lEkH06w5PbB/+xFJag5+KwzvlcQ== =EEYc -----END PGP SIGNATURE----- --Sig_/TD4of5q5UnstYcIEUgvkE=c--