Netdev List


* Re: [PATCH net v2 0/3] net/sched: Fix ct zone matching for invalid conntrack state
From: Marcelo Ricardo Leitner @ 2021-12-09 12:11 UTC (permalink / raw)
  To: Paul Blakey
  Cc: dev, netdev, Saeed Mahameed, Cong Wang, Jamal Hadi Salim,
	Pravin B Shelar, davem, Jiri Pirko, wenxu, Oz Shlomo, Vlad Buslov,
	Roi Dayan
In-Reply-To: <20211209075734.10199-1-paulb@nvidia.com>

On Thu, Dec 09, 2021 at 09:57:31AM +0200, Paul Blakey wrote:
> Changelog:
> 	1->2:
> 	  Cover letter wording
> 	  Added blamed CCs

Thanks.

> 
> Paul Blakey (3):
>   net/sched: Extend qdisc control block with tc control block
>   net/sched: flow_dissector: Fix matching on zone id for invalid conns
>   net: openvswitch: Fix matching zone id for invalid conns arriving from tc

I keep getting surprised by how much metadata we have on CT other than
skb->_nfct. :-)

Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>

^ permalink raw reply

* Re: Bad performance in RX with sfc 40G
From: Íñigo Huguet @ 2021-12-09 12:06 UTC (permalink / raw)
  To: Íñigo Huguet, Edward Cree, netdev, Dinan Gunawardena
In-Reply-To: <20211120083107.z2cm7tkl2rsri2v7@gmail.com>

Hi,

On Sat, Nov 20, 2021 at 9:31 AM Martin Habets <habetsm.xilinx@gmail.com> wrote:
> If you're testing without the IOMMU enabled I suspect the recycle ring
> size may be too small. Can you try the patch below?

Sorry for the very late reply; I have been away from work for many days.

This patch improves performance a lot: RX now reaches the same 30Gbps
as TX. However, it is still a bit erratic and sometimes drops to
15Gbps, especially after a module remove & probe, or from one iperf
call to another. It does not happen every time, and I haven't found a
clear pattern. Anyway, it clearly improves things.

Can this patch be applied as is, or is it just a test?

--
Íñigo Huguet

^ permalink raw reply

* [PATCH v1 bpf 1/1] libbpf: don't force user-supplied ifname string to be of fixed size
From: Emmanuel Deloget @ 2021-12-09 12:03 UTC (permalink / raw)
  To: Björn Töpel, Magnus Karlsson, Jonathan Lemon,
	Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
	Martin KaFai Lau, Song Liu, Yonghong Song, John Fastabend,
	KP Singh
  Cc: Emmanuel Deloget, netdev, bpf, linux-kernel

When calling either xsk_socket__create_shared() or xsk_socket__create()
the user supplies a const char *ifname which is implicitly supposed to
be a pointer to the start of a char[IFNAMSIZ] array. The internal
function xsk_create_ctx() then blindly copies IFNAMSIZ - 1 bytes from
this string into the xsk context.

This is counter-intuitive and error-prone.

For example,

        int r = xsk_socket__create(..., "eth0", ...)

may result in an invalid object because of the blind copy. The "eth0"
string might be followed by random data from the ro data section,
resulting in ctx->ifname being filled with the correct interface name
followed by a bunch of invalid bytes.

The same kind of issue arises when the ifname string is located on the
stack:

        char ifname[] = "eth0";
        int r = xsk_socket__create(..., ifname, ...);

Or when it comes from the command line:

        const char *ifname = argv[n];
        int r = xsk_socket__create(..., ifname, ...);

In both cases we'll fill ctx->ifname with random data from the stack.

In practice, we saw that this issue caused various small errors which,
in the end, prevented us from setting up a valid xsk context that would
have allowed us to capture packets on our interfaces. We fixed this issue
in our code by forcing our char ifname[] to be of size IFNAMSIZ, but that
felt weird and unnecessary.

Fixes: 2f6324a3937f8 (libbpf: Support shared umems between queues and devices)
Signed-off-by: Emmanuel Deloget <emmanuel.deloget@eho.link>
---
 tools/lib/bpf/xsk.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/tools/lib/bpf/xsk.c b/tools/lib/bpf/xsk.c
index 81f8fbc85e70..8dda80bcefcc 100644
--- a/tools/lib/bpf/xsk.c
+++ b/tools/lib/bpf/xsk.c
@@ -944,6 +944,7 @@ static struct xsk_ctx *xsk_create_ctx(struct xsk_socket *xsk,
 {
 	struct xsk_ctx *ctx;
 	int err;
+	size_t ifnamlen;

 	ctx = calloc(1, sizeof(*ctx));
 	if (!ctx)
@@ -965,8 +966,10 @@ static struct xsk_ctx *xsk_create_ctx(struct xsk_socket *xsk,
 	ctx->refcount = 1;
 	ctx->umem = umem;
 	ctx->queue_id = queue_id;
-	memcpy(ctx->ifname, ifname, IFNAMSIZ - 1);
-	ctx->ifname[IFNAMSIZ - 1] = '\0';
+
+	ifnamlen = strnlen(ifname, IFNAMSIZ);
+	memcpy(ctx->ifname, ifname, ifnamlen);
+	ctx->ifname[IFNAMSIZ - 1] = 0;

 	ctx->fill = fill;
 	ctx->comp = comp;
-- 
2.32.0

^ permalink raw reply related
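
A minimal userspace sketch of the bounded copy discussed in the patch
above, assuming a 16-byte name buffer as on Linux. SKETCH_IFNAMSIZ and
sketch_copy_ifname() are hypothetical names used only for illustration;
this is not the libbpf code. The helper never reads past the terminating
NUL of the caller's string and always leaves the destination
NUL-terminated and zero-padded:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for IFNAMSIZ (16 on Linux) so the sketch builds anywhere. */
#define SKETCH_IFNAMSIZ 16

/*
 * Copy an interface name of any length into a fixed-size buffer.
 * At most SKETCH_IFNAMSIZ - 1 name bytes are copied and the rest of
 * dst is zeroed. Returns the number of name bytes copied.
 */
size_t sketch_copy_ifname(char dst[SKETCH_IFNAMSIZ], const char *src)
{
	const char *nul;
	size_t len;

	assert(dst != NULL && src != NULL);

	/* memchr() stops at the first NUL, so short strings are never overread. */
	nul = memchr(src, '\0', SKETCH_IFNAMSIZ - 1);
	len = nul ? (size_t)(nul - src) : SKETCH_IFNAMSIZ - 1;

	memcpy(dst, src, len);
	memset(dst + len, 0, SKETCH_IFNAMSIZ - len);
	return len;
}
```

With this shape, a string literal such as "eth0", a short stack array,
or an argv[] entry can all be passed directly; longer names are
truncated to SKETCH_IFNAMSIZ - 1 bytes.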

* Re: [PATCH v1 1/1] can: mcp251x: Get rid of duplicate of_node assignment
From: Andy Shevchenko @ 2021-12-09 11:58 UTC (permalink / raw)
  To: Marc Kleine-Budde, linux-can, netdev, linux-kernel
  Cc: Wolfgang Grandegger, David S. Miller, Jakub Kicinski
In-Reply-To: <20211202205855.76946-1-andriy.shevchenko@linux.intel.com>

On Thu, Dec 02, 2021 at 10:58:55PM +0200, Andy Shevchenko wrote:
> GPIO library does copy the of_node from the parent device of
> the GPIO chip, there is no need to repeat this in the individual
> drivers. Remove assignment here.
> 
> For the details one may look into the of_gpio_dev_init() implementation.

Marc, what do you think about this change?

-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply

* Re: [PATCH v2 net-next] tcp: Warn if sock_owned_by_user() is true in tcp_child_process().
From: Paolo Abeni @ 2021-12-09 11:59 UTC (permalink / raw)
  To: Kuniyuki Iwashima, edumazet; +Cc: benh, davem, kuba, kuni1840, netdev
In-Reply-To: <20211209110746.91987-1-kuniyu@amazon.co.jp>

On Thu, 2021-12-09 at 20:07 +0900, Kuniyuki Iwashima wrote:
> From:   Eric Dumazet <edumazet@google.com>
> Date:   Thu, 9 Dec 2021 00:00:35 -0800
> > On Wed, Dec 8, 2021 at 5:33 PM Kuniyuki Iwashima <kuniyu@amazon.co.jp> wrote:
> > > 
> > > While creating a child socket from ACK (not TCP Fast Open case), before
> > > v2.3.41, we used to call bh_lock_sock() later than now; it was called just
> > > before tcp_rcv_state_process().  The full socket was put into an accept
> > > queue and exposed to other CPUs before bh_lock_sock() so that process
> > > context might have acquired the lock by then.  Thus, we had to check if any
> > > process context was accessing the socket before tcp_rcv_state_process().
> > > 
> > 
> > I think you misunderstood me.
> > 
> > I think this code is not dead yet, so I would :
> > 
> > Not include a Fixes: tag to avoid unnecessary backports (of a patch
> > and its revert)
> > 
> > If you want to get syzbot coverage for few releases, especially with
> > MPTCP and synflood,
> > you  can then submit a patch like the following.
> 
> Sorry, I got on the same page.
> Let me take a look at MPTCP, then if I still think it is dead code, I will
> submit the patch.

For the record, I think the 'else' branch should be reachable with
MPTCP in some non-trivial scenario, e.g. MPJ subflows 3WHS racing with
setsockopt on the main MPTCP socket. I'm unsure whether syzbot could
catch that, as it needs MPTCP endpoint configuration.

Cheers,

Paolo

^ permalink raw reply

* Re: [PATCH] net: bonding: Add support for IPV6 ns/na
From: kernel test robot @ 2021-12-09 11:57 UTC (permalink / raw)
  To: Sun Shouxin, j.vosburgh, vfalico, andy, davem, kuba
  Cc: llvm, kbuild-all, netdev, linux-kernel, huyd12
In-Reply-To: <1639032622-28098-1-git-send-email-sunshouxin@chinatelecom.cn>

Hi Sun,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v5.16-rc4 next-20211208]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Sun-Shouxin/net-bonding-Add-support-for-IPV6-ns-na/20211209-150108
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 2a987e65025e2b79c6d453b78cb5985ac6e5eb26
config: riscv-randconfig-c006-20211209 (https://download.01.org/0day-ci/archive/20211209/202112091907.6iLel0c9-lkp@intel.com/config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project 097a1cb1d5ebb3a0ec4bcaed8ba3ff6a8e33c00a)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install riscv cross compiling tool for clang build
        # apt-get install binutils-riscv64-linux-gnu
        # https://github.com/0day-ci/linux/commit/ab724c314fcdcaa60e70c590850b2ce57430d7fa
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Sun-Shouxin/net-bonding-Add-support-for-IPV6-ns-na/20211209-150108
        git checkout ab724c314fcdcaa60e70c590850b2ce57430d7fa
        # save the config file to linux build tree
        mkdir build_dir
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=riscv SHELL=/bin/bash drivers/net/bonding/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> drivers/net/bonding/bond_alb.c:1307:26: error: implicit declaration of function 'csum_ipv6_magic' [-Werror,-Wimplicit-function-declaration]
                           icmp6h->icmp6_cksum = csum_ipv6_magic(&ip6hdr->saddr,
                                                 ^
   drivers/net/bonding/bond_alb.c:1307:26: note: did you mean 'csum_tcpudp_magic'?
   include/asm-generic/checksum.h:52:1: note: 'csum_tcpudp_magic' declared here
   csum_tcpudp_magic(__be32 saddr, __be32 daddr, __u32 len,
   ^
   1 error generated.


vim +/csum_ipv6_magic +1307 drivers/net/bonding/bond_alb.c

  1272	
  1273	static void alb_change_nd_option(struct sk_buff *skb, void *data)
  1274	{
  1275		struct nd_msg *msg = (struct nd_msg *)skb_transport_header(skb);
  1276		struct nd_opt_hdr *nd_opt = (struct nd_opt_hdr *)msg->opt;
  1277		struct net_device *dev = skb->dev;
  1278		struct icmp6hdr *icmp6h = icmp6_hdr(skb);
  1279		struct ipv6hdr *ip6hdr = ipv6_hdr(skb);
  1280		u8 *lladdr = NULL;
  1281		u32 ndoptlen = skb_tail_pointer(skb) - (skb_transport_header(skb) +
  1282					offsetof(struct nd_msg, opt));
  1283	
  1284		while (ndoptlen) {
  1285			int l;
  1286	
  1287			switch (nd_opt->nd_opt_type) {
  1288			case ND_OPT_SOURCE_LL_ADDR:
  1289			case ND_OPT_TARGET_LL_ADDR:
  1290			lladdr = ndisc_opt_addr_data(nd_opt, dev);
  1291			break;
  1292	
  1293			default:
  1294			break;
  1295			}
  1296	
  1297			l = nd_opt->nd_opt_len << 3;
  1298	
  1299			if (ndoptlen < l || l == 0)
  1300				return;
  1301	
  1302			if (lladdr) {
  1303				memcpy(lladdr, data, dev->addr_len);
  1304				lladdr = NULL;
  1305				icmp6h->icmp6_cksum = 0;
  1306	
> 1307				icmp6h->icmp6_cksum = csum_ipv6_magic(&ip6hdr->saddr,
  1308								      &ip6hdr->daddr,
  1309							ntohs(ip6hdr->payload_len),
  1310							IPPROTO_ICMPV6,
  1311							csum_partial(icmp6h,
  1312								     ntohs(ip6hdr->payload_len), 0));
  1313				lladdr = NULL;
  1314			}
  1315			ndoptlen -= l;
  1316			nd_opt = ((void *)nd_opt) + l;
  1317		}
  1318	}
  1319	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

^ permalink raw reply
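
csum_ipv6_magic() is declared in <net/ip6_checksum.h>, so a missing
include in bond_alb.c is a likely cause of the error above (an
assumption, since the full patch is not quoted here). For reference, a
userspace sketch of the calculation that the quoted
csum_ipv6_magic()/csum_partial() pair performs for an ICMPv6 message:
the one's-complement sum over the IPv6 pseudo-header and the message
body. The sketch_* names are hypothetical and this is not kernel code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add len bytes at p to a running sum, treating them as big-endian 16-bit words. */
uint32_t sketch_sum16(const uint8_t *p, size_t len, uint32_t sum)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)p[i] << 8) | p[i + 1];
	if (len & 1)
		sum += (uint32_t)p[len - 1] << 8;	/* pad odd tail with zero */
	return sum;
}

/*
 * ICMPv6 checksum over pseudo-header (saddr, daddr, length, next header 58)
 * plus the message. The checksum field inside msg must be zero when
 * generating. The 32-bit accumulator is enough for the small messages
 * this sketch targets. Returns the checksum in host order.
 */
uint16_t sketch_icmp6_csum(const uint8_t saddr[16], const uint8_t daddr[16],
			   const uint8_t *msg, uint32_t len)
{
	const uint8_t tail[8] = {
		(uint8_t)(len >> 24), (uint8_t)(len >> 16),
		(uint8_t)(len >> 8), (uint8_t)len,
		0, 0, 0, 58	/* 58 is the ICMPv6 next-header value */
	};
	uint32_t sum = 0;

	assert(msg != NULL || len == 0);

	sum = sketch_sum16(saddr, 16, sum);
	sum = sketch_sum16(daddr, 16, sum);
	sum = sketch_sum16(tail, sizeof(tail), sum);
	sum = sketch_sum16(msg, len, sum);

	/* Fold the carries back in, then take the one's complement. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}
```

Running the same function over a message that already carries a correct
checksum yields 0, which is a convenient way to verify a rewritten
packet.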

* Re: [PATCH net-next v3 3/6] net: lan966x: add support for interrupts from analyzer
From: Vladimir Oltean @ 2021-12-09 11:47 UTC (permalink / raw)
  To: Horatiu Vultur
  Cc: netdev@vger.kernel.org, devicetree@vger.kernel.org,
	linux-kernel@vger.kernel.org, davem@davemloft.net,
	kuba@kernel.org, robh+dt@kernel.org, UNGLinuxDriver@microchip.com,
	linux@armlinux.org.uk, f.fainelli@gmail.com,
	vivien.didelot@gmail.com, andrew@lunn.ch
In-Reply-To: <20211209094615.329379-4-horatiu.vultur@microchip.com>

On Thu, Dec 09, 2021 at 10:46:12AM +0100, Horatiu Vultur wrote:
> This patch adds support for handling the interrupts generated by the
> analyzer. Currently, only the MAC table generates these interrupts.
> The MAC table will generate an interrupt whenever it learns or forgets
> an entry in the table. It is the SW's responsibility to figure out which
> entries were added/removed.
> 
> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
> ---
>  .../ethernet/microchip/lan966x/lan966x_mac.c  | 244 ++++++++++++++++++
>  .../ethernet/microchip/lan966x/lan966x_main.c |  23 ++
>  .../ethernet/microchip/lan966x/lan966x_main.h |   6 +
>  3 files changed, 273 insertions(+)
> 
> diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_mac.c b/drivers/net/ethernet/microchip/lan966x/lan966x_mac.c
> index f6878b9f57ef..c01ab01bffbf 100644
> --- a/drivers/net/ethernet/microchip/lan966x/lan966x_mac.c
> +++ b/drivers/net/ethernet/microchip/lan966x/lan966x_mac.c
> @@ -1,5 +1,6 @@
>  // SPDX-License-Identifier: GPL-2.0+
>  
> +#include <net/switchdev.h>
>  #include "lan966x_main.h"
>  
>  #define LAN966X_MAC_COLUMNS		4
> @@ -13,6 +14,23 @@
>  #define MACACCESS_CMD_WRITE		7
>  #define MACACCESS_CMD_SYNC_GET_NEXT	8
>  
> +#define LAN966X_MAC_INVALID_ROW		-1
> +
> +struct lan966x_mac_entry {
> +	struct list_head list;
> +	unsigned char mac[ETH_ALEN] __aligned(2);
> +	u16 vid;
> +	u16 port_index;
> +	int row;
> +};
> +
> +struct lan966x_mac_raw_entry {
> +	u32 mach;
> +	u32 macl;
> +	u32 maca;
> +	bool process;
> +};
> +
>  static int lan966x_mac_get_status(struct lan966x *lan966x)
>  {
>  	return lan_rd(lan966x, ANA_MACACCESS);
> @@ -98,4 +116,230 @@ void lan966x_mac_init(struct lan966x *lan966x)
>  	/* Clear the MAC table */
>  	lan_wr(MACACCESS_CMD_INIT, lan966x, ANA_MACACCESS);
>  	lan966x_mac_wait_for_completion(lan966x);
> +
> +	spin_lock_init(&lan966x->mac_lock);
> +	INIT_LIST_HEAD(&lan966x->mac_entries);
> +}
> +
> +static struct lan966x_mac_entry *lan966x_mac_alloc_entry(const unsigned char *mac,
> +							 u16 vid, u16 port_index)
> +{
> +	struct lan966x_mac_entry *mac_entry;
> +
> +	mac_entry = kzalloc(sizeof(*mac_entry), GFP_KERNEL);
> +	if (!mac_entry)
> +		return NULL;
> +
> +	memcpy(mac_entry->mac, mac, ETH_ALEN);
> +	mac_entry->vid = vid;
> +	mac_entry->port_index = port_index;
> +	mac_entry->row = LAN966X_MAC_INVALID_ROW;
> +	return mac_entry;
> +}
> +
> +static void lan966x_fdb_call_notifiers(enum switchdev_notifier_type type,
> +				       const char *mac, u16 vid,
> +				       struct net_device *dev)
> +{
> +	struct switchdev_notifier_fdb_info info = { 0 };
> +
> +	info.addr = mac;
> +	info.vid = vid;
> +	info.offloaded = true;
> +	call_switchdev_notifiers(type, dev, &info.info, NULL);
> +}
> +
> +void lan966x_mac_purge_entries(struct lan966x *lan966x)
> +{
> +	struct lan966x_mac_entry *mac_entry, *tmp;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&lan966x->mac_lock, flags);

I hope I'm not wrong, but you are only using this spinlock to serialize
access to the list, which isn't accessed from hardirq context anywhere
(the irq is threaded). So spin_lock_irqsave could simply be spin_lock.
Unless...

> +	list_for_each_entry_safe(mac_entry, tmp, &lan966x->mac_entries,
> +				 list) {
> +		lan966x_mac_forget(lan966x, mac_entry->mac, mac_entry->vid,
> +				   ENTRYTYPE_LOCKED);

Does this generate a MAC table interrupt?

> +
> +		list_del(&mac_entry->list);
> +		kfree(mac_entry);
> +	}
> +	spin_unlock_irqrestore(&lan966x->mac_lock, flags);
> +}
> +
> +static void lan966x_mac_notifiers(struct lan966x *lan966x,
> +				  enum switchdev_notifier_type type,
> +				  unsigned char *mac, u32 vid,
> +				  struct net_device *dev)
> +{
> +	rtnl_lock();
> +	lan966x_fdb_call_notifiers(type, mac, vid, dev);
> +	rtnl_unlock();
> +}
> +
> +static void lan966x_mac_process_raw_entry(struct lan966x_mac_raw_entry *raw_entry,
> +					  u8 *mac, u16 *vid, u32 *dest_idx)
> +{
> +	mac[0] = (raw_entry->mach >> 8)  & 0xff;
> +	mac[1] = (raw_entry->mach >> 0)  & 0xff;
> +	mac[2] = (raw_entry->macl >> 24) & 0xff;
> +	mac[3] = (raw_entry->macl >> 16) & 0xff;
> +	mac[4] = (raw_entry->macl >> 8)  & 0xff;
> +	mac[5] = (raw_entry->macl >> 0)  & 0xff;
> +
> +	*vid = (raw_entry->mach >> 16) & 0xfff;
> +	*dest_idx = ANA_MACACCESS_DEST_IDX_GET(raw_entry->maca);
> +}
> +
> +static void lan966x_mac_irq_process(struct lan966x *lan966x, u32 row,
> +				    struct lan966x_mac_raw_entry *raw_entries)
> +{
> +	struct lan966x_mac_entry *mac_entry, *tmp;
> +	char mac[ETH_ALEN] __aligned(2);

unsigned char

> +	unsigned long flags;
> +	u32 dest_idx;
> +	u32 column;
> +	u16 vid;
> +
> +	spin_lock_irqsave(&lan966x->mac_lock, flags);
> +	list_for_each_entry_safe(mac_entry, tmp, &lan966x->mac_entries, list) {
> +		bool found = false;
> +
> +		if (mac_entry->row != row)
> +			continue;

When the MAC table gets large, you could consider keeping separate lists
per row. This way you can avoid traversing a list of elements you're
sure you don't care about.
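
As a rough illustration of the per-row idea, here is a plain C sketch
outside the kernel. All names (SKETCH_ROWS, sketch_entry, sketch_table)
are hypothetical and the row count is arbitrary; in the driver this
would more likely be an array of struct list_head protected by the
existing lock:

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_ROWS 8	/* hypothetical table depth, not the hardware value */

struct sketch_entry {
	struct sketch_entry *next;
	unsigned char mac[6];
	unsigned short vid;
};

struct sketch_table {
	struct sketch_entry *rows[SKETCH_ROWS];	/* one list head per row */
};

/* Push e onto the list that belongs to its row. */
void sketch_add(struct sketch_table *t, unsigned int row,
		struct sketch_entry *e)
{
	assert(row < SKETCH_ROWS);
	e->next = t->rows[row];
	t->rows[row] = e;
}

/* Walk only the entries of one row, as an IRQ for that row would. */
size_t sketch_row_len(const struct sketch_table *t, unsigned int row)
{
	const struct sketch_entry *e;
	size_t n = 0;

	assert(row < SKETCH_ROWS);
	for (e = t->rows[row]; e; e = e->next)
		n++;
	return n;
}
```

Handling a change in one row then touches only that row's entries
instead of the whole software list.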

> +
> +		for (column = 0; column < LAN966X_MAC_COLUMNS; ++column) {
> +			/* All the valid entries are at the start of the row,
> +			 * so when get one invalid entry it can just skip the
> +			 * rest of the columns
> +			 */
> +			if (!ANA_MACACCESS_VALID_GET(raw_entries[column].maca))
> +				break;
> +
> +			lan966x_mac_process_raw_entry(&raw_entries[column],
> +						      mac, &vid, &dest_idx);
> +			WARN_ON(dest_idx > lan966x->num_phys_ports);
> +
> +			/* If the entry in SW is found, then there is nothing
> +			 * to do
> +			 */
> +			if (mac_entry->vid == vid &&
> +			    ether_addr_equal(mac_entry->mac, mac) &&
> +			    mac_entry->port_index == dest_idx) {
> +				raw_entries[column].process = true;
> +				found = true;
> +				break;
> +			}
> +		}
> +
> +		if (!found) {
> +			/* Notify the bridge that the entry doesn't exist
> +			 * anymore in the HW and remmove the entry from the SW

s/remmove/remove/

> +			 * list
> +			 */
> +			lan966x_mac_notifiers(lan966x, SWITCHDEV_FDB_DEL_TO_BRIDGE,
> +					      mac_entry->mac, mac_entry->vid,
> +					      lan966x->ports[mac_entry->port_index]->dev);
> +
> +			list_del(&mac_entry->list);
> +			kfree(mac_entry);
> +		}
> +	}
> +	spin_unlock_irqrestore(&lan966x->mac_lock, flags);
> +
> +	/* Now go to the list of columns and see if any entry was not in the SW
> +	 * list, then that means that the entry is new so it needs to notify the
> +	 * bridge.
> +	 */
> +	for (column = 0; column < LAN966X_MAC_COLUMNS; ++column) {
> +		/* All the valid entries are at the start of the row, so when
> +		 * get one invalid entry it can just skip the rest of the columns
> +		 */
> +		if (!ANA_MACACCESS_VALID_GET(raw_entries[column].maca))
> +			break;
> +
> +		/* If the entry already exists then don't do anything */
> +		if (raw_entries[column].process)

s/process/processed/

> +			continue;
> +
> +		lan966x_mac_process_raw_entry(&raw_entries[column],
> +					      mac, &vid, &dest_idx);
> +		WARN_ON(dest_idx > lan966x->num_phys_ports);
> +
> +		mac_entry = lan966x_mac_alloc_entry(mac, vid, dest_idx);
> +		if (!mac_entry)
> +			return;
> +
> +		mac_entry->row = row;
> +
> +		spin_lock_irqsave(&lan966x->mac_lock, flags);
> +		list_add_tail(&mac_entry->list, &lan966x->mac_entries);
> +		spin_unlock_irqrestore(&lan966x->mac_lock, flags);

spin_lock_irqsave shouldn't be necessary from an irq handler.

> +
> +		lan966x_mac_notifiers(lan966x, SWITCHDEV_FDB_ADD_TO_BRIDGE,
> +				      mac, vid, lan966x->ports[dest_idx]->dev);
> +	}
> +}
> +
> +irqreturn_t lan966x_mac_irq_handler(struct lan966x *lan966x)
> +{
> +	struct lan966x_mac_raw_entry entry[LAN966X_MAC_COLUMNS] = { 0 };
> +	u32 index, column;
> +	bool stop = true;
> +	u32 val;
> +
> +	/* Check if the mac table triggered this, if not just bail out */
> +	if (!(ANA_ANAINTR_INTR_GET(lan_rd(lan966x, ANA_ANAINTR))))
> +		return IRQ_NONE;

The interrupt isn't shared, so if we enter this condition, it means the
analyzer block generated it, just not the MAC table portion of it.
If we return IRQ_NONE there will be an IRQ storm because that condition
will never go away. Could we ack the interrupt and return IRQ_HANDLED?

> +
> +	/* Start the scan from 0, 0 */
> +	lan_wr(ANA_MACTINDX_M_INDEX_SET(0) |
> +	       ANA_MACTINDX_BUCKET_SET(0),
> +	       lan966x, ANA_MACTINDX);
> +
> +	while (1) {
> +		lan_rmw(ANA_MACACCESS_MAC_TABLE_CMD_SET(MACACCESS_CMD_SYNC_GET_NEXT),
> +			ANA_MACACCESS_MAC_TABLE_CMD,
> +			lan966x, ANA_MACACCESS);
> +		lan966x_mac_wait_for_completion(lan966x);
> +
> +		val = lan_rd(lan966x, ANA_MACTINDX);
> +		index = ANA_MACTINDX_M_INDEX_GET(val);
> +		column = ANA_MACTINDX_BUCKET_GET(val);
> +
> +		/* The SYNC-GET-NEXT returns all the entries(4) in a row in
> +		 * which is suffered a change. By change it means that new entry
> +		 * was added or an entry was removed because of ageing.
> +		 * It would return all the columns for that row. And after that
> +		 * it would return the next row The stop conditions of the
> +		 * SYNC-GET-NEXT is when it reaches 'directly' to row 0
> +		 * column 3. So if SYNC-GET-NEXT returns row 0 and column 0
> +		 * then it is required to continue to read more even if it
> +		 * reaches row 0 and column 3.
> +		 */
> +		if (index == 0 && column == 0)
> +			stop = false;
> +
> +		if (column == LAN966X_MAC_COLUMNS - 1 &&
> +		    index == 0 && stop)
> +			break;
> +
> +		entry[column].mach = lan_rd(lan966x, ANA_MACHDATA);
> +		entry[column].macl = lan_rd(lan966x, ANA_MACLDATA);
> +		entry[column].maca = lan_rd(lan966x, ANA_MACACCESS);
> +
> +		/* Once all the columns are read process them */
> +		if (column == LAN966X_MAC_COLUMNS - 1) {
> +			lan966x_mac_irq_process(lan966x, index, entry);
> +			/* A row was processed so it is safe to assume that the
> +			 * next row/column can be the stop condition
> +			 */
> +			stop = true;
> +		}
> +	}
> +
> +	lan_rmw(ANA_ANAINTR_INTR_SET(0),
> +		ANA_ANAINTR_INTR,
> +		lan966x, ANA_ANAINTR);
> +
> +	return IRQ_HANDLED;
>  }
> diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_main.c b/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
> index 101c1f005baf..7c6d6293611a 100644
> --- a/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
> +++ b/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
> @@ -527,6 +527,13 @@ static irqreturn_t lan966x_xtr_irq_handler(int irq, void *args)
>  	return IRQ_HANDLED;
>  }
>  
> +static irqreturn_t lan966x_ana_irq_handler(int irq, void *args)
> +{
> +	struct lan966x *lan966x = args;
> +
> +	return lan966x_mac_irq_handler(lan966x);
> +}
> +
>  static void lan966x_cleanup_ports(struct lan966x *lan966x)
>  {
>  	struct lan966x_port *port;
> @@ -554,6 +561,11 @@ static void lan966x_cleanup_ports(struct lan966x *lan966x)
>  
>  	disable_irq(lan966x->xtr_irq);
>  	lan966x->xtr_irq = -ENXIO;
> +
> +	if (lan966x->ana_irq) {
> +		disable_irq(lan966x->ana_irq);
> +		lan966x->ana_irq = -ENXIO;
> +	}
>  }
>  
>  static int lan966x_probe_port(struct lan966x *lan966x, u32 p,
> @@ -870,6 +882,15 @@ static int lan966x_probe(struct platform_device *pdev)
>  		return -ENODEV;
>  	}
>  
> +	lan966x->ana_irq = platform_get_irq_byname(pdev, "ana");
> +	if (lan966x->ana_irq) {
> +		err = devm_request_threaded_irq(&pdev->dev, lan966x->ana_irq, NULL,
> +						lan966x_ana_irq_handler, IRQF_ONESHOT,
> +						"ana irq", lan966x);
> +		if (err)
> +			return dev_err_probe(&pdev->dev, err, "Unable to use ana irq");
> +	}
> +
>  	/* init switch */
>  	lan966x_init(lan966x);
>  	lan966x_stats_init(lan966x);
> @@ -923,6 +944,8 @@ static int lan966x_remove(struct platform_device *pdev)
>  	destroy_workqueue(lan966x->stats_queue);
>  	mutex_destroy(&lan966x->stats_lock);
>  
> +	lan966x_mac_purge_entries(lan966x);
> +
>  	return 0;
>  }
>  
> diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_main.h b/drivers/net/ethernet/microchip/lan966x/lan966x_main.h
> index 7e5a3b6f168d..ba548d65b58a 100644
> --- a/drivers/net/ethernet/microchip/lan966x/lan966x_main.h
> +++ b/drivers/net/ethernet/microchip/lan966x/lan966x_main.h
> @@ -75,6 +75,9 @@ struct lan966x {
>  
>  	u8 base_mac[ETH_ALEN];
>  
> +	struct list_head mac_entries;
> +	spinlock_t mac_lock; /* lock for mac_entries list */
> +
>  	/* stats */
>  	const struct lan966x_stat_layout *stats_layout;
>  	u32 num_stats;
> @@ -87,6 +90,7 @@ struct lan966x {
>  
>  	/* interrupts */
>  	int xtr_irq;
> +	int ana_irq;
>  };
>  
>  struct lan966x_port_config {
> @@ -141,6 +145,8 @@ int lan966x_mac_forget(struct lan966x *lan966x,
>  int lan966x_mac_cpu_learn(struct lan966x *lan966x, const char *addr, u16 vid);
>  int lan966x_mac_cpu_forget(struct lan966x *lan966x, const char *addr, u16 vid);
>  void lan966x_mac_init(struct lan966x *lan966x);
> +void lan966x_mac_purge_entries(struct lan966x *lan966x);
> +irqreturn_t lan966x_mac_irq_handler(struct lan966x *lan966x);
>  
>  static inline void __iomem *lan_addr(void __iomem *base[],
>  				     int id, int tinst, int tcnt,
> -- 
> 2.33.0
>

^ permalink raw reply
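
The register unpacking done by lan966x_mac_process_raw_entry() in the
quoted patch can be tried in userspace with the sketch below. The bit
positions are copied from the patch; sketch_decode_mac() is a
hypothetical name, and the destination index is left out because its
accessor macro is not shown in this thread:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Split the two raw data words into a MAC address and a VLAN ID:
 *   mach[27:16] = VID, mach[15:0] = MAC bytes 0-1, macl = MAC bytes 2-5.
 */
void sketch_decode_mac(uint32_t mach, uint32_t macl,
		       uint8_t mac[6], uint16_t *vid)
{
	assert(mac != NULL && vid != NULL);

	mac[0] = (uint8_t)(mach >> 8);
	mac[1] = (uint8_t)(mach >> 0);
	mac[2] = (uint8_t)(macl >> 24);
	mac[3] = (uint8_t)(macl >> 16);
	mac[4] = (uint8_t)(macl >> 8);
	mac[5] = (uint8_t)(macl >> 0);

	*vid = (uint16_t)((mach >> 16) & 0xfff);
}
```

Using an unsigned byte type for the address, as requested in the review,
keeps values of 0x80 and above from turning negative on platforms where
plain char is signed.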

* Re: [net-next 6/6] can: mcp251xfd: mcp251xfd_regmap_crc_read(): work around broken CRC on TBC register
From: Marc Kleine-Budde @ 2021-12-09 11:27 UTC (permalink / raw)
  To: Sven Schuchmann
  Cc: Thomas.Kopp@microchip.com, pavel.modilaynen@volvocars.com,
	drew@beagleboard.org, linux-can@vger.kernel.org,
	menschel.p@posteo.de, netdev@vger.kernel.org, will@macchina.cc
In-Reply-To: <PA4P190MB1390F869654448440F869BCBD9709@PA4P190MB1390.EURP190.PROD.OUTLOOK.COM>

[-- Attachment #1: Type: text/plain, Size: 1890 bytes --]

On 09.12.2021 11:17:09, Sven Schuchmann wrote:
> we are also seeing the CRC Errors in our setup (rpi4, Kernel 5.10.x)
> from time to time. I just wanted to post here what I am seeing, maybe
> it helps...
> 
> [    6.761711] spi_master spi1: will run message pump with realtime priority
> [    6.778063] mcp251xfd spi1.0 can1: MCP2518FD rev0.0 (-RX_INT -MAB_NO_WARN +CRC_REG +CRC_RX +CRC_TX +ECC -HD c:40.00MHz m:20.00MHz r:17.00MHz e:16.66MHz) successfully initialized.
> 
> [ 4327.107856] mcp251xfd spi1.0 canfd1: CRC read error at address 0x0010 (length=4, data=00 cc 62 c4, CRC=0xa3a0) retrying.
> [ 7770.163335] mcp251xfd spi1.0 canfd1: CRC read error at address 0x0010 (length=4, data=00 bf 16 d5, CRC=0x9d3c) retrying.
> [ 8000.565955] mcp251xfd spi1.0 canfd1: CRC read error at address 0x0010 (length=4, data=00 40 66 fa, CRC=0x31d7) retrying.
> [ 9753.658173] mcp251xfd spi1.0 canfd1: CRC read error at address 0x0010 (length=4, data=80 e9 01 4e, CRC=0xe862) retrying.

You are using a backport of my HW timestamp patches in your v5.10
branch. So the TBC register (address 0x0010) is read every 45 seconds,
and additionally for every CAN error frame.

In the meantime, I've implemented a workaround for the CRC read errors:

| c7eb923c3caf can: mcp251xfd: mcp251xfd_regmap_crc_read(): work around broken CRC on TBC register
| ef7a8c3e7599 can: mcp251xfd: mcp251xfd_regmap_crc_read_one(): Factor out crc check into separate function

It fixes the CRC read error if the first data byte is 0x00 or 0x80.

These messages should disappear if you cherry-pick the above patches.

regards,
Marc

-- 
Pengutronix e.K.                 | Marc Kleine-Budde           |
Embedded Linux                   | https://www.pengutronix.de  |
Vertretung West/Dortmund         | Phone: +49-231-2826-924     |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-5555 |

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: [PATCH] libceph, ceph: potential dereference of null pointer
From: Jeff Layton @ 2021-12-09 11:20 UTC (permalink / raw)
  To: Jiasheng Jiang, idryomov, davem, kuba; +Cc: ceph-devel, netdev, linux-kernel
In-Reply-To: <20211209025038.2028112-1-jiasheng@iscas.ac.cn>

On Thu, 2021-12-09 at 10:50 +0800, Jiasheng Jiang wrote:
> The return value of kzalloc() needs to be checked
> to avoid a NULL pointer dereference if the allocation fails.
> 
> Fixes: 3d14c5d2b6e1 ("ceph: factor out libceph from Ceph file system")
> Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn>
> ---
>  net/ceph/osd_client.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/net/ceph/osd_client.c b/net/ceph/osd_client.c
> index ff8624a7c964..3203e8a34370 100644
> --- a/net/ceph/osd_client.c
> +++ b/net/ceph/osd_client.c
> @@ -1234,6 +1234,8 @@ static struct ceph_osd *create_osd(struct ceph_osd_client *osdc, int onum)
>  	WARN_ON(onum == CEPH_HOMELESS_OSD);
>  
>  	osd = kzalloc(sizeof(*osd), GFP_NOIO | __GFP_NOFAIL);
> +	if (!osd)
> +		return NULL;
>  	osd_init(osd);
>  	osd->o_osdc = osdc;
>  	osd->o_osd = onum;

__GFP_NOFAIL should ensure that it never returns NULL, right?

Also, if you're going to fix this up to handle that error then you
probably also need to fix lookup_create_osd to handle a NULL return from
create_osd as well.
-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply

* Re: [PATCH V6 4/5] scsi: storvsc: Add Isolation VM support for storvsc driver
From: Tianyu Lan @ 2021-12-09 11:17 UTC (permalink / raw)
  To: Long Li, KY Srinivasan, Haiyang Zhang, Stephen Hemminger,
	wei.liu@kernel.org, Dexuan Cui, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, dave.hansen@linux.intel.com,
	x86@kernel.org, hpa@zytor.com, davem@davemloft.net,
	kuba@kernel.org, jejb@linux.ibm.com, martin.petersen@oracle.com,
	arnd@arndb.de, hch@infradead.org, m.szyprowski@samsung.com,
	robin.murphy@arm.com, Tianyu Lan, thomas.lendacky@amd.com,
	Michael Kelley (LINUX)
  Cc: iommu@lists.linux-foundation.org, linux-arch@vger.kernel.org,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-scsi@vger.kernel.org, netdev@vger.kernel.org, vkuznets,
	brijesh.singh@amd.com, konrad.wilk@oracle.com, hch@lst.de,
	joro@8bytes.org, parri.andrea@gmail.com, dave.hansen@intel.com
In-Reply-To: <BY5PR21MB1506535EF9222ED4300C38BBCE709@BY5PR21MB1506.namprd21.prod.outlook.com>



On 12/9/2021 4:00 PM, Long Li wrote:
>> @@ -1848,21 +1851,22 @@ static int storvsc_queuecommand(struct Scsi_Host
>> *host, struct scsi_cmnd *scmnd)
>>   		payload->range.len = length;
>>   		payload->range.offset = offset_in_hvpg;
>>
>> +		sg_count = scsi_dma_map(scmnd);
>> +		if (sg_count < 0)
>> +			return SCSI_MLQUEUE_DEVICE_BUSY;
> Hi Tianyu,
> 
> This patch (and this patch series) unconditionally adds code for dealing with DMA addresses for all VMs, including non-isolation VMs.
> 
> Does this add a performance penalty for VMs that don't require isolation?
> 

Hi Long:
	In a traditional VM, scsi_dma_map() just saves sg->offset to
sg->dma_address and does no data copy, because the swiotlb bounce buffer
code is not active. The data copy only takes place in an Isolation VM,
where swiotlb_force is set. So there is no additional overhead in a
traditional VM.

Thanks.

^ permalink raw reply

* Re: [net-next 6/6] can: mcp251xfd: mcp251xfd_regmap_crc_read(): work around broken CRC on TBC register
From: Sven Schuchmann @ 2021-12-09 11:17 UTC (permalink / raw)
  To: Thomas.Kopp@microchip.com, pavel.modilaynen@volvocars.com,
	mkl@pengutronix.de
  Cc: drew@beagleboard.org, linux-can@vger.kernel.org,
	menschel.p@posteo.de, netdev@vger.kernel.org, will@macchina.cc
In-Reply-To: <DM4PR11MB5390BA1C370A5AF90E666F1EFB709@DM4PR11MB5390.namprd11.prod.outlook.com>

Hi All,

we are also seeing the CRC Errors in our setup (rpi4, Kernel 5.10.x)
from time to time. I just wanted to post here what I am seeing, maybe it helps...

[    6.761711] spi_master spi1: will run message pump with realtime priority
[    6.778063] mcp251xfd spi1.0 can1: MCP2518FD rev0.0 (-RX_INT -MAB_NO_WARN +CRC_REG +CRC_RX +CRC_TX +ECC -HD c:40.00MHz m:20.00MHz r:17.00MHz e:16.66MHz) successfully initialized.

[ 4327.107856] mcp251xfd spi1.0 canfd1: CRC read error at address 0x0010 (length=4, data=00 cc 62 c4, CRC=0xa3a0) retrying.
[ 7770.163335] mcp251xfd spi1.0 canfd1: CRC read error at address 0x0010 (length=4, data=00 bf 16 d5, CRC=0x9d3c) retrying.
[ 8000.565955] mcp251xfd spi1.0 canfd1: CRC read error at address 0x0010 (length=4, data=00 40 66 fa, CRC=0x31d7) retrying.
[ 9753.658173] mcp251xfd spi1.0 canfd1: CRC read error at address 0x0010 (length=4, data=80 e9 01 4e, CRC=0xe862) retrying.


Sven


> -----Original Message-----
> From: Thomas.Kopp@microchip.com <Thomas.Kopp@microchip.com>
> Sent: Thursday, 9 December 2021 11:22
> To: pavel.modilaynen@volvocars.com; mkl@pengutronix.de
> Cc: drew@beagleboard.org; linux-can@vger.kernel.org; menschel.p@posteo.de;
> netdev@vger.kernel.org; will@macchina.cc
> Subject: RE: [net-next 6/6] can: mcp251xfd: mcp251xfd_regmap_crc_read(): work around
> broken CRC on TBC register
> 
> Hi Pavel,
> 
> > We have the similar CRC read errors but
> > the lowest byte is not 0x00 and 0x80, it's actually 0x0x or 0x8x, e.g.
> >
> > mcp251xfd spi0.0 can0: CRC read error at address 0x0010 (length=4,
> > data=82 d1 fa 6c, CRC=0xd9c2) retrying.
> >
> > 0xb0 0x10 0x04 0x82 0xd1 0xfa 0x6c => 0x59FD (not matching)
> >
> > but if I flip the first received bit  (highest bit in the lowest byte):
> > 0xb0 0x10 0x04 0x02 0xd1 0xfa 0x6c => 0xD9C2 (matching!)
> 
> What settings do you have on your setup? Can you please print the dmesg output from the
> init? I'm especially interested in Sysclk and SPI speed.
> 
> Thanks,
> Thomas
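
The bit-flip check quoted above can be reproduced offline with a small
helper. The following is only a sketch: it assumes a CRC-16 with polynomial
0x8005, initial value 0xFFFF, MSB-first processing and no final XOR. Those
parameters and both function names are assumptions made for illustration,
and whether they reproduce the 0xD9C2 value quoted above has not been
verified here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-16: polynomial 0x8005, init 0xFFFF, MSB first, no final XOR.
 * These parameters are assumed for illustration, not taken from the thread.
 */
static uint16_t crc16_msb_first(const uint8_t *buf, size_t len)
{
	uint16_t crc = 0xFFFF;
	size_t i;
	int bit;

	for (i = 0; i < len; i++) {
		crc ^= (uint16_t)((uint16_t)buf[i] << 8);
		for (bit = 0; bit < 8; bit++) {
			if (crc & 0x8000)
				crc = (uint16_t)((crc << 1) ^ 0x8005);
			else
				crc = (uint16_t)(crc << 1);
		}
	}

	return crc;
}

/* Hypothetical helper: returns 1 if flipping the highest bit of buf[pos]
 * makes the computed CRC equal to the expected one.
 */
static int crc_matches_after_msb_flip(const uint8_t *buf, size_t len,
				      size_t pos, uint16_t expected)
{
	uint8_t tmp[16];
	size_t i;

	assert(len <= sizeof(tmp) && pos < len);

	for (i = 0; i < len; i++)
		tmp[i] = buf[i];
	tmp[pos] ^= 0x80;

	return crc16_msb_first(tmp, len) == expected;
}
```

Feeding the command bytes plus the four data bytes from a log line, with
pos pointing at the first data byte, shows whether a single flipped bit
explains the mismatch.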

^ permalink raw reply

* Re: [PATCH net-next v4 4/7] devlink: Add new "event_eq_size" generic device param
From: Jiri Pirko @ 2021-12-09 11:16 UTC (permalink / raw)
  To: Shay Drory
  Cc: David S . Miller, Jakub Kicinski, jiri, saeedm, netdev,
	linux-kernel, Moshe Shemesh
In-Reply-To: <20211209100929.28115-5-shayd@nvidia.com>

Thu, Dec 09, 2021 at 11:09:26AM CET, shayd@nvidia.com wrote:
>Add new device generic parameter to determine the size of the
>asynchronous control events EQ.
>
>For example, to reduce event EQ size to 64, execute:
>$ devlink dev param set pci/0000:06:00.0 \
>              name event_eq_size value 64 cmode driverinit
>$ devlink dev reload pci/0000:06:00.0
>
>Signed-off-by: Shay Drory <shayd@nvidia.com>
>Reviewed-by: Moshe Shemesh <moshe@nvidia.com>

Reviewed-by: Jiri Pirko <jiri@nvidia.com>

^ permalink raw reply

* Re: [PATCH net-next v4 2/7] devlink: Add new "io_eq_size" generic device param
From: Jiri Pirko @ 2021-12-09 11:16 UTC (permalink / raw)
  To: Shay Drory
  Cc: David S . Miller, Jakub Kicinski, jiri, saeedm, netdev,
	linux-kernel, Moshe Shemesh
In-Reply-To: <20211209100929.28115-3-shayd@nvidia.com>

Thu, Dec 09, 2021 at 11:09:24AM CET, shayd@nvidia.com wrote:
>Add new device generic parameter to determine the size of the
>I/O completion EQs.
>
>For example, to reduce I/O EQ size to 64, execute:
>$ devlink dev param set pci/0000:06:00.0 \
>              name io_eq_size value 64 cmode driverinit
>$ devlink dev reload pci/0000:06:00.0
>
>Signed-off-by: Shay Drory <shayd@nvidia.com>
>Reviewed-by: Moshe Shemesh <moshe@nvidia.com>

Reviewed-by: Jiri Pirko <jiri@nvidia.com>

^ permalink raw reply

* Re: [PATCH] net: dsa: felix: Fix memory leak in felix_setup_mmio_filtering
From: José Expósito @ 2021-12-09 11:09 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Vladimir Oltean, Claudiu Manoil, alexandre.belloni@bootlin.com,
	andrew@lunn.ch, vivien.didelot@gmail.com, f.fainelli@gmail.com,
	davem@davemloft.net, linux@armlinux.org.uk,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <20211208151030.1b489fad@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com>

On Wed, Dec 08, 2021 at 03:10:30PM -0800, Jakub Kicinski wrote:
> On Wed, 8 Dec 2021 18:13:32 +0000 Vladimir Oltean wrote:
> > Impossible memory leak, I might add, because DSA will error out much
> > sooner if there isn't any CPU port defined:
> > https://elixir.bootlin.com/linux/v5.15.7/source/net/dsa/dsa2.c#L374
> > I don't think I should have added the "if (cpu < 0)" check at all, but
> > then it would have raised other flags, about BIT(negative number) or
> > things like that. I don't know what's the best way to deal with this?
> > 
> > Anyway, in case we find no better alternative:
> > 
> > Fixes: 8d5f7954b7c8 ("net: dsa: felix: break at first CPU port during init and teardown")
> > Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
> 
> If this is the way to go please repost with the tags added
> and a commit message.

Thanks for the quick review. I just sent v2 in case you decide to keep
the cpu check:

https://lore.kernel.org/netdev/20211209110538.11585-1-jose.exposito89@gmail.com/T/#u

Jose

^ permalink raw reply

* Re: [PATCH v2 net-next] tcp: Warn if sock_owned_by_user() is true in tcp_child_process().
From: Kuniyuki Iwashima @ 2021-12-09 11:07 UTC (permalink / raw)
  To: edumazet; +Cc: benh, davem, kuba, kuni1840, kuniyu, netdev
In-Reply-To: <CANn89iJ12OugQTv4JHwVWKtZp88sbQKXD61PvnQWOo3009tTKQ@mail.gmail.com>

From:   Eric Dumazet <edumazet@google.com>
Date:   Thu, 9 Dec 2021 00:00:35 -0800
> On Wed, Dec 8, 2021 at 5:33 PM Kuniyuki Iwashima <kuniyu@amazon.co.jp> wrote:
>>
>> While creating a child socket from ACK (not TCP Fast Open case), before
>> v2.3.41, we used to call bh_lock_sock() later than now; it was called just
>> before tcp_rcv_state_process().  The full socket was put into an accept
>> queue and exposed to other CPUs before bh_lock_sock() so that process
>> context might have acquired the lock by then.  Thus, we had to check if any
>> process context was accessing the socket before tcp_rcv_state_process().
>>
> 
> I think you misunderstood me.
> 
> I think this code is not dead yet, so I would :
> 
> Not include a Fixes: tag to avoid unnecessary backports (of a patch
> and its revert)
> 
> If you want to get syzbot coverage for few releases, especially with
> MPTCP and synflood,
> you  can then submit a patch like the following.

Sorry, now I am on the same page.
Let me take a look at MPTCP, then if I still think it is dead code, I will
submit the patch.

Thank you.


> 
> diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
> index cf913a66df17..19da6e442fca 100644
> --- a/net/ipv4/tcp_minisocks.c
> +++ b/net/ipv4/tcp_minisocks.c
> @@ -843,6 +843,9 @@ int tcp_child_process(struct sock *parent, struct
> sock *child,
>                  * in main socket hash table and lock on listening
>                  * socket does not protect us more.
>                  */
> +
> +               /* Check if this code path is obsolete ? */
> +               WARN_ON_ONCE(1);
>                 __sk_add_backlog(child, skb);
>         }

^ permalink raw reply

* [PATCH v2] net: dsa: felix: Fix memory leak in felix_setup_mmio_filtering
From: José Expósito @ 2021-12-09 11:05 UTC (permalink / raw)
  To: vladimir.oltean
  Cc: claudiu.manoil, alexandre.belloni, andrew, vivien.didelot,
	f.fainelli, davem, kuba, linux, netdev, linux-kernel,
	José Expósito

Avoid a memory leak if there is not a CPU port defined.

Fixes: 8d5f7954b7c8 ("net: dsa: felix: break at first CPU port during init and teardown")
Addresses-Coverity-ID: 1492897 ("Resource leak")
Addresses-Coverity-ID: 1492899 ("Resource leak")
Signed-off-by: José Expósito <jose.exposito89@gmail.com>
Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>

---

v2: Add Fixes and Reviewed-by tags
---
 drivers/net/dsa/ocelot/felix.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/dsa/ocelot/felix.c b/drivers/net/dsa/ocelot/felix.c
index 327cc4654806..f1a05e7dc818 100644
--- a/drivers/net/dsa/ocelot/felix.c
+++ b/drivers/net/dsa/ocelot/felix.c
@@ -290,8 +290,11 @@ static int felix_setup_mmio_filtering(struct felix *felix)
 		}
 	}
 
-	if (cpu < 0)
+	if (cpu < 0) {
+		kfree(tagging_rule);
+		kfree(redirect_rule);
 		return -EINVAL;
+	}
 
 	tagging_rule->key_type = OCELOT_VCAP_KEY_ETYPE;
 	*(__be16 *)tagging_rule->key.etype.etype.value = htons(ETH_P_1588);
-- 
2.25.1
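
For readers who want to try the cleanup pattern outside the kernel, here is
a minimal userspace sketch of the same idea: two allocations followed by an
early return that must free both. struct rule, setup_filtering() and the
use of calloc()/free() in place of kzalloc()/kfree() are stand-ins chosen
for illustration, not the driver's code.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the VCAP filter structure. */
struct rule {
	int key_type;
};

/* Allocate two rules, then fail if no CPU port was found (cpu < 0).
 * Every early return releases what was allocated before it.
 */
static int setup_filtering(int cpu, struct rule **tagging_out,
			   struct rule **redirect_out)
{
	struct rule *tagging_rule, *redirect_rule;

	assert(tagging_out && redirect_out);

	tagging_rule = calloc(1, sizeof(*tagging_rule));
	if (!tagging_rule)
		return -ENOMEM;

	redirect_rule = calloc(1, sizeof(*redirect_rule));
	if (!redirect_rule) {
		free(tagging_rule);
		return -ENOMEM;
	}

	if (cpu < 0) {
		/* Without these two calls both rules would leak. */
		free(tagging_rule);
		free(redirect_rule);
		return -EINVAL;
	}

	*tagging_out = tagging_rule;
	*redirect_out = redirect_rule;

	return 0;
}
```

On success the caller owns both rules; on any error nothing is handed out
and nothing remains allocated.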


^ permalink raw reply related

* Re: [PATCH net-next v3 2/6] dt-bindings: net: lan966x: Extend with the analyzer interrupt
From: Vladimir Oltean @ 2021-12-09 10:58 UTC (permalink / raw)
  To: Horatiu Vultur
  Cc: netdev@vger.kernel.org, devicetree@vger.kernel.org,
	linux-kernel@vger.kernel.org, davem@davemloft.net,
	kuba@kernel.org, robh+dt@kernel.org, UNGLinuxDriver@microchip.com,
	linux@armlinux.org.uk, f.fainelli@gmail.com,
	vivien.didelot@gmail.com, andrew@lunn.ch
In-Reply-To: <20211209094615.329379-3-horatiu.vultur@microchip.com>

On Thu, Dec 09, 2021 at 10:46:11AM +0100, Horatiu Vultur wrote:
> Extend dt-bindings for lan966x with analyzer interrupt.
> This interrupt can be generated for example when the HW learn/forgets
> an entry in the MAC table.
> 
> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
> ---

Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>

Why don't you describe your hardware in the device tree all at once?
Doing it piece by piece means that every time when you add a new
functionality you need to be compatible with the absence of a certain
reg, interrupt etc.

>  .../devicetree/bindings/net/microchip,lan966x-switch.yaml       | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/net/microchip,lan966x-switch.yaml b/Documentation/devicetree/bindings/net/microchip,lan966x-switch.yaml
> index 5bee665d5fcf..e79e4e166ad8 100644
> --- a/Documentation/devicetree/bindings/net/microchip,lan966x-switch.yaml
> +++ b/Documentation/devicetree/bindings/net/microchip,lan966x-switch.yaml
> @@ -37,12 +37,14 @@ properties:
>      items:
>        - description: register based extraction
>        - description: frame dma based extraction
> +      - description: analyzer interrupt
>  
>    interrupt-names:
>      minItems: 1
>      items:
>        - const: xtr
>        - const: fdma
> +      - const: ana
>  
>    resets:
>      items:
> -- 
> 2.33.0
>

^ permalink raw reply

* Re: [PATCH net-next v7 1/6] stmmac: dwmac-mediatek: add platform level clocks management
From: AngeloGioacchino Del Regno @ 2021-12-09 10:51 UTC (permalink / raw)
  To: Biao Huang, davem, Jakub Kicinski, Rob Herring
  Cc: Matthias Brugger, Giuseppe Cavallaro, Alexandre Torgue,
	Jose Abreu, Maxime Coquelin, netdev, devicetree, linux-kernel,
	linux-arm-kernel, linux-mediatek, linux-stm32, srv_heupstream,
	macpaul.lin, dkirjanov
In-Reply-To: <20211208054716.603-2-biao.huang@mediatek.com>

On 08/12/21 06:47, Biao Huang wrote:
> This patch implements clks_config callback for dwmac-mediatek platform,
> which could support platform level clocks management.
> 
> Signed-off-by: Biao Huang <biao.huang@mediatek.com>

Sorry, I've sent my ack on v6. Sending it on v7.

Acked-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>

^ permalink raw reply

* Re: [PATCH net-next v6 1/6] stmmac: dwmac-mediatek: add platform level clocks management
From: AngeloGioacchino Del Regno @ 2021-12-09 10:50 UTC (permalink / raw)
  To: Biao Huang, davem, Jakub Kicinski, Rob Herring
  Cc: Matthias Brugger, Giuseppe Cavallaro, Alexandre Torgue,
	Jose Abreu, Maxime Coquelin, netdev, devicetree, linux-kernel,
	linux-arm-kernel, linux-mediatek, linux-stm32, srv_heupstream,
	macpaul.lin, dkirjanov
In-Reply-To: <20211208030354.31877-2-biao.huang@mediatek.com>

On 08/12/21 04:03, Biao Huang wrote:
> This patch implements clks_config callback for dwmac-mediatek platform,
> which could support platform level clocks management.
> 
> Signed-off-by: Biao Huang <biao.huang@mediatek.com>

Acked-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>

^ permalink raw reply

* Re: [PATCH net-next] net: prestera: flower template support
From: Volodymyr Mytnyk @ 2021-12-09 10:46 UTC (permalink / raw)
  To: Jamal Hadi Salim, netdev@vger.kernel.org
  Cc: Taras Chornyi, Mickey Rachamim, Serhiy Pshyk, Volodymyr Mytnyk,
	Taras Chornyi, David S. Miller, Jakub Kicinski,
	linux-kernel@vger.kernel.org
In-Reply-To: <c8379f78-01da-cd2f-f4e2-99874a01f995@mojatatu.com>

Hi Jamal,

>
> > Hi Jamal,
> > 
> >>
> >>> From: Volodymyr Mytnyk<vmytnyk@marvell.com>
> >>>
> >>> Add user template explicit support. At this moment, max TCAM rule size
> >>> is utilized for all rules, doesn't matter which and how much flower
> >>> matches are provided by user. It means that some of TCAM space is
> >>> wasted, which impacts the number of filters that can be offloaded.
> >>>
> >>> Introducing the template, allows to have more HW offloaded filters.
> >>>
> >>> Example:
> >>>     tc qd add dev PORT clsact
> >>>     tc chain add dev PORT ingress protocol ip \
> >>>       flower dst_ip 0.0.0.0/16
> >>
> >> "chain" or "filter"?
> > 
> > tc chain add ... flower [template] is the command to explicitly add a chain with a given template
> > 
> 
> I guess you are enforcing the template on chain 0. My brain
> was  expecting chain id to be called out.
> 

chain 0 is the default chain id for "tc chain" & "tc filter" command,
so, that's why I did not mention it in the command line. Please note,
this patch adds only template support. Chains are not supported yet,
and will be added later.

> 
> > tc filter ... is the command to add a filter itself in that chain
> > 
> 
> Got it.
> 
> 
> >> You are not using tc priority? Above will result in two priorities (the 0.0.0.0 entry will be more important) and in classical flower approach two  different tables.
> >> I am wondering how you map the table to the TCAM.
> >> Is the priority sorting entirely based on masks in hardware?
> > 
> > Kernel tc filter priority is used as a priority for HW rule (see flower implementation).
> 
> The TCAM however should be able to accept many masks - is the idea
> here to enforce some mask per chain and then have priority being the
> priorities handle conflict? What happens when you explicitly specify
> priority. If you dont specify it the kernel provides it and essentially
> resolution is based on the order in which the rules are entered..

Rule insertion into and deletion from the TCAM is done by the FW itself.
This means that the FW takes care of the priority and (re)orders the rules
based on the priority provided by the user/kernel. So the kernel driver
just needs to provide the prio to the FW when adding a rule to the HW.
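
To make the ordering idea concrete, here is a small sketch of a table that
keeps rules sorted by priority on insert. It is purely illustrative: the
structure names, the fixed table size and the "lower value wins, equal
values keep insertion order" policy are assumptions, and nothing here
describes how the FW actually lays out the TCAM.

```c
#include <assert.h>
#include <stddef.h>

#define RULE_TABLE_MAX 8 /* hypothetical table size */

struct rule_entry {
	unsigned int prio;
	int id;
};

struct rule_table {
	struct rule_entry e[RULE_TABLE_MAX];
	size_t n;
};

/* Insert a rule so that the table stays sorted by ascending prio.
 * Rules with equal prio keep their insertion order.
 * Returns 0 on success, -1 if the table is full.
 */
static int rule_table_insert(struct rule_table *t, unsigned int prio, int id)
{
	size_t pos, i;

	assert(t);

	if (t->n == RULE_TABLE_MAX)
		return -1;

	/* Find the first entry with a strictly larger prio value. */
	for (pos = 0; pos < t->n; pos++)
		if (t->e[pos].prio > prio)
			break;

	/* Shift the tail up by one slot to make room. */
	for (i = t->n; i > pos; i--)
		t->e[i] = t->e[i - 1];

	t->e[pos].prio = prio;
	t->e[pos].id = id;
	t->n++;

	return 0;
}
```

With such a scheme the caller only passes the prio along with the rule and
never has to pick a slot itself.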

> 
> cheers,
> jamal

Thanks and Regards,
  Volodymyr

^ permalink raw reply

* [PATCH net-next v7 4/4] net: ocelot: add FDMA support
From: Clément Léger @ 2021-12-09 10:43 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Rob Herring, Vladimir Oltean,
	Claudiu Manoil, Alexandre Belloni, UNGLinuxDriver, Andrew Lunn,
	Florian Fainelli, Russell King
  Cc: Clément Léger, netdev, devicetree, linux-kernel,
	Thomas Petazzoni, Denis Kirjanov, Julian Wiedmann
In-Reply-To: <20211209104306.986188-1-clement.leger@bootlin.com>

Ethernet frames can be extracted or injected autonomously to or from
the device’s DDR3/DDR3L memory and/or PCIe memory space. Linked list
data structures in memory are used for injecting or extracting Ethernet
frames. The FDMA generates interrupts when frame extraction or
injection is done and when the linked lists need updating.

The FDMA is shared between all the ethernet ports of the switch and
uses a linked list of descriptors (DCB) to inject and extract packets.
Before adding descriptors, the FDMA channels must be stopped. It would
be inefficient to do that each time a descriptor is added, so the
channels are restarted only once they have stopped.

Both channels use a ring-like structure to feed the DCBs to the FDMA.
head and tail are never touched by hardware and are completely handled
by the driver. On top of that, page recycling has been added and is
mostly taken from gianfar driver.
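
The ring bookkeeping described above can be tried out in userspace with the
sketch below. It mirrors the index arithmetic only; RING_SIZE, struct ring
and the function names are hypothetical and chosen for illustration. One
slot is always left unused so that next_to_use == next_to_clean
unambiguously means "empty".

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE 8 /* hypothetical ring size */

struct ring {
	uint16_t next_to_use;   /* next slot the driver will fill */
	uint16_t next_to_clean; /* next slot the driver will reclaim */
};

/* Advance an index by one, wrapping at the end of the ring. */
static uint16_t ring_idx_next(uint16_t idx, uint16_t ring_sz)
{
	return idx == ring_sz - 1 ? 0 : (uint16_t)(idx + 1);
}

/* Step an index back by one, wrapping at the start of the ring. */
static uint16_t ring_idx_prev(uint16_t idx, uint16_t ring_sz)
{
	return idx == 0 ? (uint16_t)(ring_sz - 1) : (uint16_t)(idx - 1);
}

/* Number of slots that can still be filled. */
static int ring_free(const struct ring *r)
{
	assert(r->next_to_use < RING_SIZE && r->next_to_clean < RING_SIZE);

	if (r->next_to_use >= r->next_to_clean)
		return RING_SIZE - (r->next_to_use - r->next_to_clean) - 1;

	return r->next_to_clean - r->next_to_use - 1;
}

static int ring_empty(const struct ring *r)
{
	return r->next_to_clean == r->next_to_use;
}
```

An empty ring of size 8 therefore reports 7 free slots, not 8.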

Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Co-developed-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
Signed-off-by: Clément Léger <clement.leger@bootlin.com>
---
 drivers/net/ethernet/mscc/Makefile         |   1 +
 drivers/net/ethernet/mscc/ocelot_fdma.c    | 894 +++++++++++++++++++++
 drivers/net/ethernet/mscc/ocelot_fdma.h    | 166 ++++
 drivers/net/ethernet/mscc/ocelot_net.c     |  25 +-
 drivers/net/ethernet/mscc/ocelot_vsc7514.c |  10 +
 include/soc/mscc/ocelot.h                  |   3 +
 6 files changed, 1095 insertions(+), 4 deletions(-)
 create mode 100644 drivers/net/ethernet/mscc/ocelot_fdma.c
 create mode 100644 drivers/net/ethernet/mscc/ocelot_fdma.h

diff --git a/drivers/net/ethernet/mscc/Makefile b/drivers/net/ethernet/mscc/Makefile
index 722c27694b21..d76a9b78b6ca 100644
--- a/drivers/net/ethernet/mscc/Makefile
+++ b/drivers/net/ethernet/mscc/Makefile
@@ -11,5 +11,6 @@ mscc_ocelot_switch_lib-y := \
 mscc_ocelot_switch_lib-$(CONFIG_BRIDGE_MRP) += ocelot_mrp.o
 obj-$(CONFIG_MSCC_OCELOT_SWITCH) += mscc_ocelot.o
 mscc_ocelot-y := \
+	ocelot_fdma.o \
 	ocelot_vsc7514.o \
 	ocelot_net.o
diff --git a/drivers/net/ethernet/mscc/ocelot_fdma.c b/drivers/net/ethernet/mscc/ocelot_fdma.c
new file mode 100644
index 000000000000..350a0b52f021
--- /dev/null
+++ b/drivers/net/ethernet/mscc/ocelot_fdma.c
@@ -0,0 +1,894 @@
+// SPDX-License-Identifier: (GPL-2.0 OR MIT)
+/*
+ * Microsemi SoCs FDMA driver
+ *
+ * Copyright (c) 2021 Microchip
+ *
+ * Page recycling code is mostly taken from gianfar driver.
+ */
+
+#include <linux/align.h>
+#include <linux/bitops.h>
+#include <linux/dmapool.h>
+#include <linux/dsa/ocelot.h>
+#include <linux/netdevice.h>
+#include <linux/of_platform.h>
+#include <linux/skbuff.h>
+
+#include "ocelot_fdma.h"
+#include "ocelot_qs.h"
+
+DEFINE_STATIC_KEY_FALSE(ocelot_fdma_enabled);
+
+static void ocelot_fdma_writel(struct ocelot *ocelot, u32 reg, u32 data)
+{
+	regmap_write(ocelot->targets[FDMA], reg, data);
+}
+
+static u32 ocelot_fdma_readl(struct ocelot *ocelot, u32 reg)
+{
+	u32 retval;
+
+	regmap_read(ocelot->targets[FDMA], reg, &retval);
+
+	return retval;
+}
+
+static dma_addr_t ocelot_fdma_idx_dma(dma_addr_t base, u16 idx)
+{
+	return base + idx * sizeof(struct ocelot_fdma_dcb);
+}
+
+static u16 ocelot_fdma_dma_idx(dma_addr_t base, dma_addr_t dma)
+{
+	return (dma - base) / sizeof(struct ocelot_fdma_dcb);
+}
+
+static u16 ocelot_fdma_idx_next(u16 idx, u16 ring_sz)
+{
+	return unlikely(idx == ring_sz - 1) ? 0 : idx + 1;
+}
+
+static u16 ocelot_fdma_idx_prev(u16 idx, u16 ring_sz)
+{
+	return unlikely(idx == 0) ? ring_sz - 1 : idx - 1;
+}
+
+static int ocelot_fdma_rx_ring_free(struct ocelot_fdma *fdma)
+{
+	struct ocelot_fdma_rx_ring *rx_ring = &fdma->rx_ring;
+
+	if (rx_ring->next_to_use >= rx_ring->next_to_clean)
+		return OCELOT_FDMA_RX_RING_SIZE -
+		       (rx_ring->next_to_use - rx_ring->next_to_clean) - 1;
+	else
+		return rx_ring->next_to_clean - rx_ring->next_to_use - 1;
+}
+
+static int ocelot_fdma_tx_ring_free(struct ocelot_fdma *fdma)
+{
+	struct ocelot_fdma_tx_ring *tx_ring = &fdma->tx_ring;
+
+	if (tx_ring->next_to_use >= tx_ring->next_to_clean)
+		return OCELOT_FDMA_TX_RING_SIZE -
+		       (tx_ring->next_to_use - tx_ring->next_to_clean) - 1;
+	else
+		return tx_ring->next_to_clean - tx_ring->next_to_use - 1;
+}
+
+static bool ocelot_fdma_tx_ring_empty(struct ocelot_fdma *fdma)
+{
+	struct ocelot_fdma_tx_ring *tx_ring = &fdma->tx_ring;
+
+	return tx_ring->next_to_clean == tx_ring->next_to_use;
+}
+
+static void ocelot_fdma_activate_chan(struct ocelot *ocelot, dma_addr_t dma,
+				      int chan)
+{
+	ocelot_fdma_writel(ocelot, MSCC_FDMA_DCB_LLP(chan), dma);
+	/* Barrier to force memory writes to DCB to be completed before starting
+	 * the channel.
+	 */
+	wmb();
+	ocelot_fdma_writel(ocelot, MSCC_FDMA_CH_ACTIVATE, BIT(chan));
+}
+
+static int ocelot_fdma_wait_chan_safe(struct ocelot *ocelot, int chan)
+{
+	unsigned long timeout;
+	u32 safe;
+
+	timeout = jiffies + usecs_to_jiffies(OCELOT_FDMA_CH_SAFE_TIMEOUT_US);
+	do {
+		safe = ocelot_fdma_readl(ocelot, MSCC_FDMA_CH_SAFE);
+		if (safe & BIT(chan))
+			return 0;
+	} while (time_before(jiffies, timeout));
+
+	return -ETIMEDOUT;
+}
+
+static void ocelot_fdma_dcb_set_data(struct ocelot_fdma_dcb *dcb,
+				     dma_addr_t dma_addr,
+				     size_t size)
+{
+	u32 offset = dma_addr & 0x3;
+
+	dcb->llp = 0;
+	dcb->datap = ALIGN_DOWN(dma_addr, 4);
+	dcb->datal = ALIGN_DOWN(size, 4);
+	dcb->stat = MSCC_FDMA_DCB_STAT_BLOCKO(offset);
+}
+
+static bool ocelot_fdma_rx_alloc_page(struct ocelot *ocelot,
+				      struct ocelot_fdma_rx_buf *rxb)
+{
+	dma_addr_t mapping;
+	struct page *page;
+
+	page = dev_alloc_page();
+	if (unlikely(!page))
+		return false;
+
+	mapping = dma_map_page(ocelot->dev, page, 0, PAGE_SIZE,
+			       DMA_FROM_DEVICE);
+	if (unlikely(dma_mapping_error(ocelot->dev, mapping))) {
+		__free_page(page);
+		return false;
+	}
+
+	rxb->page = page;
+	rxb->page_offset = 0;
+	rxb->dma_addr = mapping;
+
+	return true;
+}
+
+static int ocelot_fdma_alloc_rx_buffs(struct ocelot *ocelot, u16 alloc_cnt)
+{
+	struct ocelot_fdma *fdma = ocelot->fdma;
+	struct ocelot_fdma_rx_ring *rx_ring;
+	struct ocelot_fdma_rx_buf *rxb;
+	struct ocelot_fdma_dcb *dcb;
+	dma_addr_t dma_addr;
+	int ret = 0;
+	u16 idx;
+
+	rx_ring = &fdma->rx_ring;
+	idx = rx_ring->next_to_use;
+
+	while (alloc_cnt--) {
+		rxb = &rx_ring->bufs[idx];
+		/* try reuse page */
+		if (unlikely(!rxb->page)) {
+			if (unlikely(!ocelot_fdma_rx_alloc_page(ocelot, rxb))) {
+				dev_err_ratelimited(ocelot->dev,
+						    "Failed to allocate rx\n");
+				ret = -ENOMEM;
+				break;
+			}
+		}
+
+		dcb = &rx_ring->dcbs[idx];
+		dma_addr = rxb->dma_addr + rxb->page_offset;
+		ocelot_fdma_dcb_set_data(dcb, dma_addr, OCELOT_FDMA_RXB_SIZE);
+
+		idx = ocelot_fdma_idx_next(idx, OCELOT_FDMA_RX_RING_SIZE);
+		/* Chain the DCB to the next one */
+		dcb->llp = ocelot_fdma_idx_dma(rx_ring->dcbs_dma, idx);
+	}
+
+	rx_ring->next_to_use = idx;
+	rx_ring->next_to_alloc = idx;
+
+	return ret;
+}
+
+static bool ocelot_fdma_tx_dcb_set_skb(struct ocelot *ocelot,
+				       struct ocelot_fdma_tx_buf *tx_buf,
+				       struct ocelot_fdma_dcb *dcb,
+				       struct sk_buff *skb)
+{
+	dma_addr_t mapping;
+
+	mapping = dma_map_single(ocelot->dev, skb->data, skb->len,
+				 DMA_TO_DEVICE);
+	if (unlikely(dma_mapping_error(ocelot->dev, mapping)))
+		return false;
+
+	dma_unmap_addr_set(tx_buf, dma_addr, mapping);
+
+	ocelot_fdma_dcb_set_data(dcb, mapping, OCELOT_FDMA_RX_SIZE);
+	tx_buf->skb = skb;
+	dcb->stat |= MSCC_FDMA_DCB_STAT_BLOCKL(skb->len);
+	dcb->stat |= MSCC_FDMA_DCB_STAT_SOF | MSCC_FDMA_DCB_STAT_EOF;
+
+	return true;
+}
+
+static bool ocelot_fdma_check_stop_rx(struct ocelot *ocelot)
+{
+	u32 llp;
+
+	/* Check if the FDMA hits the DCB with LLP == NULL */
+	llp = ocelot_fdma_readl(ocelot, MSCC_FDMA_DCB_LLP(MSCC_FDMA_XTR_CHAN));
+	if (unlikely(llp))
+		return false;
+
+	ocelot_fdma_writel(ocelot, MSCC_FDMA_CH_DISABLE,
+			   BIT(MSCC_FDMA_XTR_CHAN));
+
+	return true;
+}
+
+static void ocelot_fdma_rx_set_llp(struct ocelot_fdma_rx_ring *rx_ring)
+{
+	struct ocelot_fdma_dcb *dcb;
+	unsigned int idx;
+
+	idx = ocelot_fdma_idx_prev(rx_ring->next_to_use,
+				   OCELOT_FDMA_RX_RING_SIZE);
+	dcb = &rx_ring->dcbs[idx];
+	dcb->llp = 0;
+}
+
+static void ocelot_fdma_rx_restart(struct ocelot *ocelot)
+{
+	struct ocelot_fdma *fdma = ocelot->fdma;
+	struct ocelot_fdma_rx_ring *rx_ring;
+	const u8 chan = MSCC_FDMA_XTR_CHAN;
+	dma_addr_t new_llp, dma_base;
+	unsigned int idx;
+	u32 llp_prev;
+	int ret;
+
+	rx_ring = &fdma->rx_ring;
+	ret = ocelot_fdma_wait_chan_safe(ocelot, chan);
+	if (ret) {
+		dev_err_ratelimited(ocelot->dev,
+				    "Unable to stop RX channel\n");
+		return;
+	}
+
+	ocelot_fdma_rx_set_llp(rx_ring);
+
+	/* FDMA stopped on the last DCB that contained a NULL LLP. Since
+	 * we processed some DCBs in RX, there is free space, and we must set
+	 * DCB_LLP to point to the next DCB
+	 */
+	llp_prev = ocelot_fdma_readl(ocelot, MSCC_FDMA_DCB_LLP_PREV(chan));
+	dma_base = rx_ring->dcbs_dma;
+
+	/* Get the next DMA addr located after LLP == NULL DCB */
+	idx = ocelot_fdma_dma_idx(dma_base, llp_prev);
+	idx = ocelot_fdma_idx_next(idx, OCELOT_FDMA_RX_RING_SIZE);
+	new_llp = ocelot_fdma_idx_dma(dma_base, idx);
+
+	/* Finally reactivate the channel */
+	ocelot_fdma_activate_chan(ocelot, new_llp, chan);
+}
+
+static bool ocelot_fdma_add_rx_frag(struct ocelot_fdma_rx_buf *rxb, u32 stat,
+				    struct sk_buff *skb, bool first)
+{
+	int size = MSCC_FDMA_DCB_STAT_BLOCKL(stat);
+	struct page *page = rxb->page;
+
+	if (likely(first)) {
+		skb_put(skb, size);
+	} else {
+		skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, page,
+				rxb->page_offset, size, OCELOT_FDMA_RX_SIZE);
+	}
+
+	/* Try to reuse page */
+	if (unlikely(page_ref_count(page) != 1 || page_is_pfmemalloc(page)))
+		return false;
+
+	/* Change offset to the other half */
+	rxb->page_offset ^= OCELOT_FDMA_RX_SIZE;
+
+	page_ref_inc(page);
+
+	return true;
+}
+
+static void ocelot_fdma_reuse_rx_page(struct ocelot *ocelot,
+				      struct ocelot_fdma_rx_buf *old_rxb)
+{
+	struct ocelot_fdma_rx_ring *rx_ring = &ocelot->fdma->rx_ring;
+	struct ocelot_fdma_rx_buf *new_rxb;
+
+	new_rxb = &rx_ring->bufs[rx_ring->next_to_alloc];
+	rx_ring->next_to_alloc = ocelot_fdma_idx_next(rx_ring->next_to_alloc,
+						      OCELOT_FDMA_RX_RING_SIZE);
+
+	/* Copy page reference */
+	*new_rxb = *old_rxb;
+
+	/* Sync for use by the device */
+	dma_sync_single_range_for_device(ocelot->dev, old_rxb->dma_addr,
+					 old_rxb->page_offset,
+					 OCELOT_FDMA_RX_SIZE, DMA_FROM_DEVICE);
+}
+
+static struct sk_buff *ocelot_fdma_get_skb(struct ocelot *ocelot, u32 stat,
+					   struct ocelot_fdma_rx_buf *rxb,
+					   struct sk_buff *skb)
+{
+	bool first = false;
+
+	/* Allocate skb head and data */
+	if (likely(!skb)) {
+		void *buff_addr = page_address(rxb->page) +
+				  rxb->page_offset;
+
+		skb = build_skb(buff_addr, OCELOT_FDMA_SKBFRAG_SIZE);
+		if (unlikely(!skb)) {
+			dev_err_ratelimited(ocelot->dev,
+					    "build_skb failed !\n");
+			return NULL;
+		}
+		first = true;
+	}
+
+	dma_sync_single_range_for_cpu(ocelot->dev, rxb->dma_addr,
+				      rxb->page_offset, OCELOT_FDMA_RX_SIZE,
+				      DMA_FROM_DEVICE);
+
+	if (ocelot_fdma_add_rx_frag(rxb, stat, skb, first)) {
+		/* Reuse the free half of the page for the next_to_alloc DCB */
+		ocelot_fdma_reuse_rx_page(ocelot, rxb);
+	} else {
+		/* page cannot be reused, unmap it */
+		dma_unmap_page(ocelot->dev, rxb->dma_addr, PAGE_SIZE,
+			       DMA_FROM_DEVICE);
+	}
+
+	/* clear rx buff content */
+	rxb->page = NULL;
+
+	return skb;
+}
+
+static bool ocelot_fdma_receive_skb(struct ocelot *ocelot, struct sk_buff *skb)
+{
+	struct net_device *ndev;
+	void *xfh = skb->data;
+	u64 timestamp;
+	u64 src_port;
+
+	skb_pull(skb, OCELOT_TAG_LEN);
+
+	ocelot_xfh_get_src_port(xfh, &src_port);
+	if (unlikely(src_port >= ocelot->num_phys_ports))
+		return false;
+
+	ndev = ocelot_port_to_netdev(ocelot, src_port);
+	if (unlikely(!ndev))
+		return false;
+
+	pskb_trim(skb, skb->len - ETH_FCS_LEN);
+
+	skb->dev = ndev;
+	skb->protocol = eth_type_trans(skb, skb->dev);
+	skb->dev->stats.rx_bytes += skb->len;
+	skb->dev->stats.rx_packets++;
+
+	if (ocelot->ptp) {
+		ocelot_xfh_get_rew_val(xfh, &timestamp);
+		ocelot_ptp_rx_timestamp(ocelot, skb, timestamp);
+	}
+
+	if (likely(!skb_defer_rx_timestamp(skb)))
+		netif_receive_skb(skb);
+
+	return true;
+}
+
+static int ocelot_fdma_rx_get(struct ocelot *ocelot, int budget)
+{
+	struct ocelot_fdma *fdma = ocelot->fdma;
+	struct ocelot_fdma_rx_ring *rx_ring;
+	struct ocelot_fdma_rx_buf *rxb;
+	struct ocelot_fdma_dcb *dcb;
+	struct sk_buff *skb;
+	int work_done = 0;
+	int cleaned_cnt;
+	u32 stat;
+	u16 idx;
+
+	cleaned_cnt = ocelot_fdma_rx_ring_free(fdma);
+	rx_ring = &fdma->rx_ring;
+	skb = rx_ring->skb;
+
+	while (budget--) {
+		idx = rx_ring->next_to_clean;
+		dcb = &rx_ring->dcbs[idx];
+		stat = dcb->stat;
+		if (MSCC_FDMA_DCB_STAT_BLOCKL(stat) == 0)
+			break;
+
+		/* New packet is a start of frame but we already got a skb set,
+		 * we probably lost an EOF packet, free skb
+		 */
+		if (unlikely(skb && (stat & MSCC_FDMA_DCB_STAT_SOF))) {
+			dev_kfree_skb(skb);
+			skb = NULL;
+		}
+
+		rxb = &rx_ring->bufs[idx];
+		/* Fetch next to clean buffer from the rx_ring */
+		skb = ocelot_fdma_get_skb(ocelot, stat, rxb, skb);
+		if (unlikely(!skb))
+			break;
+
+		work_done++;
+		cleaned_cnt++;
+
+		idx = ocelot_fdma_idx_next(idx, OCELOT_FDMA_RX_RING_SIZE);
+		rx_ring->next_to_clean = idx;
+
+		if (unlikely(stat & MSCC_FDMA_DCB_STAT_ABORT ||
+			     stat & MSCC_FDMA_DCB_STAT_PD)) {
+			dev_err_ratelimited(ocelot->dev,
+					    "DCB aborted or pruned\n");
+			dev_kfree_skb(skb);
+			skb = NULL;
+			continue;
+		}
+
+		/* We still need to process the other fragment of the packet
+		 * before delivering it to the network stack
+		 */
+		if (!(stat & MSCC_FDMA_DCB_STAT_EOF))
+			continue;
+
+		if (unlikely(!ocelot_fdma_receive_skb(ocelot, skb)))
+			dev_kfree_skb(skb);
+
+		skb = NULL;
+	}
+
+	rx_ring->skb = skb;
+
+	if (cleaned_cnt)
+		ocelot_fdma_alloc_rx_buffs(ocelot, cleaned_cnt);
+
+	return work_done;
+}
+
+static void ocelot_fdma_wakeup_netdev(struct ocelot *ocelot)
+{
+	struct ocelot_port_private *priv;
+	struct ocelot_port *ocelot_port;
+	struct net_device *dev;
+	int port;
+
+	for (port = 0; port < ocelot->num_phys_ports; port++) {
+		ocelot_port = ocelot->ports[port];
+		if (!ocelot_port)
+			continue;
+		priv = container_of(ocelot_port, struct ocelot_port_private,
+				    port);
+		dev = priv->dev;
+
+		if (unlikely(netif_queue_stopped(dev)))
+			netif_wake_queue(dev);
+	}
+}
+
+static void ocelot_fdma_tx_cleanup(struct ocelot *ocelot, int budget)
+{
+	struct ocelot_fdma *fdma = ocelot->fdma;
+	struct ocelot_fdma_tx_ring *tx_ring;
+	struct ocelot_fdma_tx_buf *buf;
+	unsigned int new_null_llp_idx;
+	struct ocelot_fdma_dcb *dcb;
+	bool end_of_list = false;
+	struct sk_buff *skb;
+	dma_addr_t dma;
+	u32 dcb_llp;
+	u16 ntc;
+	int ret;
+
+	tx_ring = &fdma->tx_ring;
+
+	/* Purge the TX packets that have been sent up to the NULL llp or the
+	 * end of done list.
+	 */
+	while (!ocelot_fdma_tx_ring_empty(fdma)) {
+		ntc = tx_ring->next_to_clean;
+		dcb = &tx_ring->dcbs[ntc];
+		if (!(dcb->stat & MSCC_FDMA_DCB_STAT_PD))
+			break;
+
+		buf = &tx_ring->bufs[ntc];
+		skb = buf->skb;
+		dma_unmap_single(ocelot->dev, dma_unmap_addr(buf, dma_addr),
+				 skb->len, DMA_TO_DEVICE);
+		napi_consume_skb(skb, budget);
+		dcb_llp = dcb->llp;
+
+		/* Only update after accessing all dcb fields */
+		tx_ring->next_to_clean = ocelot_fdma_idx_next(ntc,
+							      OCELOT_FDMA_TX_RING_SIZE);
+
+		/* If we hit the NULL LLP, stop, we might need to reload FDMA */
+		if (dcb_llp == 0) {
+			end_of_list = true;
+			break;
+		}
+	}
+
+	/* No need to try to wake if no TX buffers were cleaned up. */
+	if (ocelot_fdma_tx_ring_free(fdma))
+		ocelot_fdma_wakeup_netdev(ocelot);
+
+	/* If there is still some DCBs to be processed by the FDMA or if the
+	 * pending list is empty, there is no need to restart the FDMA.
+	 */
+	if (!end_of_list || ocelot_fdma_tx_ring_empty(fdma))
+		return;
+
+	ret = ocelot_fdma_wait_chan_safe(ocelot, MSCC_FDMA_INJ_CHAN);
+	if (ret) {
+		dev_warn(ocelot->dev,
+			 "Failed to wait for TX channel to stop\n");
+		return;
+	}
+
+	/* Set NULL LLP to be the last DCB used */
+	new_null_llp_idx = ocelot_fdma_idx_prev(tx_ring->next_to_use,
+						OCELOT_FDMA_TX_RING_SIZE);
+	dcb = &tx_ring->dcbs[new_null_llp_idx];
+	dcb->llp = 0;
+
+	dma = ocelot_fdma_idx_dma(tx_ring->dcbs_dma, tx_ring->next_to_clean);
+	ocelot_fdma_activate_chan(ocelot, dma, MSCC_FDMA_INJ_CHAN);
+}
+
+static int ocelot_fdma_napi_poll(struct napi_struct *napi, int budget)
+{
+	struct ocelot_fdma *fdma = container_of(napi, struct ocelot_fdma, napi);
+	struct ocelot *ocelot = fdma->ocelot;
+	int work_done = 0;
+	bool rx_stopped;
+
+	ocelot_fdma_tx_cleanup(ocelot, budget);
+
+	rx_stopped = ocelot_fdma_check_stop_rx(ocelot);
+
+	work_done = ocelot_fdma_rx_get(ocelot, budget);
+
+	if (rx_stopped)
+		ocelot_fdma_rx_restart(ocelot);
+
+	if (work_done < budget) {
+		napi_complete_done(&fdma->napi, work_done);
+		ocelot_fdma_writel(ocelot, MSCC_FDMA_INTR_ENA,
+				   BIT(MSCC_FDMA_INJ_CHAN) |
+				   BIT(MSCC_FDMA_XTR_CHAN));
+	}
+
+	return work_done;
+}
+
+static irqreturn_t ocelot_fdma_interrupt(int irq, void *dev_id)
+{
+	u32 ident, llp, frm, err, err_code;
+	struct ocelot *ocelot = dev_id;
+
+	ident = ocelot_fdma_readl(ocelot, MSCC_FDMA_INTR_IDENT);
+	frm = ocelot_fdma_readl(ocelot, MSCC_FDMA_INTR_FRM);
+	llp = ocelot_fdma_readl(ocelot, MSCC_FDMA_INTR_LLP);
+
+	ocelot_fdma_writel(ocelot, MSCC_FDMA_INTR_LLP, llp & ident);
+	ocelot_fdma_writel(ocelot, MSCC_FDMA_INTR_FRM, frm & ident);
+	if (frm || llp) {
+		ocelot_fdma_writel(ocelot, MSCC_FDMA_INTR_ENA, 0);
+		napi_schedule(&ocelot->fdma->napi);
+	}
+
+	err = ocelot_fdma_readl(ocelot, MSCC_FDMA_EVT_ERR);
+	if (unlikely(err)) {
+		err_code = ocelot_fdma_readl(ocelot, MSCC_FDMA_EVT_ERR_CODE);
+		dev_err_ratelimited(ocelot->dev,
+				    "Error ! chans mask: %#x, code: %#x\n",
+				    err, err_code);
+
+		ocelot_fdma_writel(ocelot, MSCC_FDMA_EVT_ERR, err);
+		ocelot_fdma_writel(ocelot, MSCC_FDMA_EVT_ERR_CODE, err_code);
+	}
+
+	return IRQ_HANDLED;
+}
+
+static void ocelot_fdma_send_skb(struct ocelot *ocelot,
+				 struct ocelot_fdma *fdma, struct sk_buff *skb)
+{
+	struct ocelot_fdma_tx_ring *tx_ring = &fdma->tx_ring;
+	struct ocelot_fdma_tx_buf *tx_buf;
+	struct ocelot_fdma_dcb *dcb;
+	dma_addr_t dma;
+	u16 next_idx;
+
+	dcb = &tx_ring->dcbs[tx_ring->next_to_use];
+	tx_buf = &tx_ring->bufs[tx_ring->next_to_use];
+	if (!ocelot_fdma_tx_dcb_set_skb(ocelot, tx_buf, dcb, skb)) {
+		dev_kfree_skb_any(skb);
+		return;
+	}
+
+	next_idx = ocelot_fdma_idx_next(tx_ring->next_to_use,
+					OCELOT_FDMA_TX_RING_SIZE);
+	skb_tx_timestamp(skb);
+
+	/* If the FDMA TX chan is empty, then enqueue the DCB directly */
+	if (ocelot_fdma_tx_ring_empty(fdma)) {
+		dma = ocelot_fdma_idx_dma(tx_ring->dcbs_dma,
+					  tx_ring->next_to_use);
+		ocelot_fdma_activate_chan(ocelot, dma, MSCC_FDMA_INJ_CHAN);
+	} else {
+		/* Chain the DCBs */
+		dcb->llp = ocelot_fdma_idx_dma(tx_ring->dcbs_dma, next_idx);
+	}
+
+	tx_ring->next_to_use = next_idx;
+}
+
+static int ocelot_fdma_prepare_skb(struct ocelot *ocelot, int port, u32 rew_op,
+				   struct sk_buff *skb, struct net_device *dev)
+{
+	int needed_headroom = max_t(int, OCELOT_TAG_LEN - skb_headroom(skb), 0);
+	int needed_tailroom = max_t(int, ETH_FCS_LEN - skb_tailroom(skb), 0);
+	void *ifh;
+	int err;
+
+	if (unlikely(needed_headroom || needed_tailroom ||
+		     skb_header_cloned(skb))) {
+		err = pskb_expand_head(skb, needed_headroom, needed_tailroom,
+				       GFP_ATOMIC);
+		if (unlikely(err)) {
+			dev_kfree_skb_any(skb);
+			return 1;
+		}
+	}
+
+	err = skb_linearize(skb);
+	if (err) {
+		net_err_ratelimited("%s: skb_linearize error (%d)!\n",
+				    dev->name, err);
+		dev_kfree_skb_any(skb);
+		return 1;
+	}
+
+	ifh = skb_push(skb, OCELOT_TAG_LEN);
+	skb_put(skb, ETH_FCS_LEN);
+	memset(ifh, 0, OCELOT_TAG_LEN);
+	ocelot_ifh_port_set(ifh, port, rew_op, skb_vlan_tag_get(skb));
+
+	return 0;
+}
+
+int ocelot_fdma_inject_frame(struct ocelot *ocelot, int port, u32 rew_op,
+			     struct sk_buff *skb, struct net_device *dev)
+{
+	struct ocelot_fdma *fdma = ocelot->fdma;
+	int ret = NETDEV_TX_OK;
+
+	spin_lock(&fdma->tx_ring.xmit_lock);
+
+	if (ocelot_fdma_tx_ring_free(fdma) == 0) {
+		netif_stop_queue(dev);
+		ret = NETDEV_TX_BUSY;
+		goto out;
+	}
+
+	if (ocelot_fdma_prepare_skb(ocelot, port, rew_op, skb, dev))
+		goto out;
+
+	ocelot_fdma_send_skb(ocelot, fdma, skb);
+
+out:
+	spin_unlock(&fdma->tx_ring.xmit_lock);
+
+	return ret;
+}
+
+static void ocelot_fdma_free_rx_ring(struct ocelot *ocelot)
+{
+	struct ocelot_fdma *fdma = ocelot->fdma;
+	struct ocelot_fdma_rx_ring *rx_ring;
+	struct ocelot_fdma_rx_buf *rxb;
+	u16 idx;
+
+	rx_ring = &fdma->rx_ring;
+	idx = rx_ring->next_to_clean;
+
+	/* Free the pages held in the RX ring */
+	while (idx != rx_ring->next_to_use) {
+		rxb = &rx_ring->bufs[idx];
+		dma_unmap_page(ocelot->dev, rxb->dma_addr, PAGE_SIZE,
+			       DMA_FROM_DEVICE);
+		__free_page(rxb->page);
+		idx = ocelot_fdma_idx_next(idx, OCELOT_FDMA_RX_RING_SIZE);
+	}
+
+	if (fdma->rx_ring.skb)
+		dev_kfree_skb_any(fdma->rx_ring.skb);
+}
+
+static void ocelot_fdma_free_tx_ring(struct ocelot *ocelot)
+{
+	struct ocelot_fdma *fdma = ocelot->fdma;
+	struct ocelot_fdma_tx_ring *tx_ring;
+	struct ocelot_fdma_tx_buf *txb;
+	struct sk_buff *skb;
+	u16 idx;
+
+	tx_ring = &fdma->tx_ring;
+	idx = tx_ring->next_to_clean;
+
+	while (idx != tx_ring->next_to_use) {
+		txb = &tx_ring->bufs[idx];
+		skb = txb->skb;
+		dma_unmap_single(ocelot->dev, txb->dma_addr, skb->len,
+				 DMA_TO_DEVICE);
+		dev_kfree_skb_any(skb);
+		idx = ocelot_fdma_idx_next(idx, OCELOT_FDMA_TX_RING_SIZE);
+	}
+}
+
+static int ocelot_fdma_rings_alloc(struct ocelot *ocelot)
+{
+	struct ocelot_fdma *fdma = ocelot->fdma;
+	struct ocelot_fdma_dcb *dcbs;
+	unsigned int adjust;
+	dma_addr_t dcbs_dma;
+	int ret;
+
+	/* Create a pool of consistent memory blocks for hardware descriptors */
+	fdma->dcbs_base = dmam_alloc_coherent(ocelot->dev,
+					      OCELOT_DCBS_HW_ALLOC_SIZE,
+					      &fdma->dcbs_dma_base, GFP_KERNEL);
+	if (!fdma->dcbs_base)
+		return -ENOMEM;
+
+	/* DCBs must be aligned on a 32bit boundary */
+	dcbs = fdma->dcbs_base;
+	dcbs_dma = fdma->dcbs_dma_base;
+	if (!IS_ALIGNED(dcbs_dma, 4)) {
+		adjust = ALIGN(dcbs_dma, 4) - dcbs_dma;
+		dcbs_dma = ALIGN(dcbs_dma, 4);
+		dcbs = (void *)dcbs + adjust;
+	}
+
+	/* TX queue */
+	fdma->tx_ring.dcbs = dcbs;
+	fdma->tx_ring.dcbs_dma = dcbs_dma;
+	spin_lock_init(&fdma->tx_ring.xmit_lock);
+
+	/* RX queue */
+	fdma->rx_ring.dcbs = dcbs + OCELOT_FDMA_TX_RING_SIZE;
+	fdma->rx_ring.dcbs_dma = dcbs_dma + OCELOT_FDMA_TX_DCB_SIZE;
+	ret = ocelot_fdma_alloc_rx_buffs(ocelot,
+					 ocelot_fdma_tx_ring_free(fdma));
+	if (ret) {
+		ocelot_fdma_free_rx_ring(ocelot);
+		return ret;
+	}
+
+	/* Set the last DCB LLP to NULL. This is normally done when restarting
+	 * the RX chan, but it must also be done once for the first run.
+	 */
+	ocelot_fdma_rx_set_llp(&fdma->rx_ring);
+
+	return 0;
+}
+
+void ocelot_fdma_netdev_init(struct ocelot *ocelot, struct net_device *dev)
+{
+	struct ocelot_fdma *fdma = ocelot->fdma;
+
+	dev->needed_headroom = OCELOT_TAG_LEN;
+	dev->needed_tailroom = ETH_FCS_LEN;
+
+	if (fdma->ndev)
+		return;
+
+	fdma->ndev = dev;
+	netif_napi_add(dev, &fdma->napi, ocelot_fdma_napi_poll,
+		       OCELOT_FDMA_WEIGHT);
+}
+
+void ocelot_fdma_netdev_deinit(struct ocelot *ocelot, struct net_device *dev)
+{
+	struct ocelot_fdma *fdma = ocelot->fdma;
+
+	if (fdma->ndev == dev) {
+		netif_napi_del(&fdma->napi);
+		fdma->ndev = NULL;
+	}
+}
+
+void ocelot_fdma_init(struct platform_device *pdev, struct ocelot *ocelot)
+{
+	struct device *dev = ocelot->dev;
+	struct ocelot_fdma *fdma;
+	int ret;
+
+	fdma = devm_kzalloc(dev, sizeof(*fdma), GFP_KERNEL);
+	if (!fdma)
+		return;
+
+	ocelot->fdma = fdma;
+	ocelot->dev->coherent_dma_mask = DMA_BIT_MASK(32);
+
+	ocelot_fdma_writel(ocelot, MSCC_FDMA_INTR_ENA, 0);
+
+	fdma->ocelot = ocelot;
+	fdma->irq = platform_get_irq_byname(pdev, "fdma");
+	ret = devm_request_irq(dev, fdma->irq, ocelot_fdma_interrupt, 0,
+			       dev_name(dev), ocelot);
+	if (ret)
+		goto err_free_fdma;
+
+	ret = ocelot_fdma_rings_alloc(ocelot);
+	if (ret)
+		goto err_free_irq;
+
+	static_branch_enable(&ocelot_fdma_enabled);
+
+	return;
+
+err_free_irq:
+	devm_free_irq(dev, fdma->irq, ocelot);
+err_free_fdma:
+	devm_kfree(dev, fdma);
+
+	ocelot->fdma = NULL;
+}
+
+void ocelot_fdma_start(struct ocelot *ocelot)
+{
+	struct ocelot_fdma *fdma = ocelot->fdma;
+
+	/* Reconfigure for extraction and injection using DMA */
+	ocelot_write_rix(ocelot, QS_INJ_GRP_CFG_MODE(2), QS_INJ_GRP_CFG, 0);
+	ocelot_write_rix(ocelot, QS_INJ_CTRL_GAP_SIZE(0), QS_INJ_CTRL, 0);
+
+	ocelot_write_rix(ocelot, QS_XTR_GRP_CFG_MODE(2), QS_XTR_GRP_CFG, 0);
+
+	ocelot_fdma_writel(ocelot, MSCC_FDMA_INTR_LLP, 0xffffffff);
+	ocelot_fdma_writel(ocelot, MSCC_FDMA_INTR_FRM, 0xffffffff);
+
+	ocelot_fdma_writel(ocelot, MSCC_FDMA_INTR_LLP_ENA,
+			   BIT(MSCC_FDMA_INJ_CHAN) | BIT(MSCC_FDMA_XTR_CHAN));
+	ocelot_fdma_writel(ocelot, MSCC_FDMA_INTR_FRM_ENA,
+			   BIT(MSCC_FDMA_XTR_CHAN));
+	ocelot_fdma_writel(ocelot, MSCC_FDMA_INTR_ENA,
+			   BIT(MSCC_FDMA_INJ_CHAN) | BIT(MSCC_FDMA_XTR_CHAN));
+
+	napi_enable(&fdma->napi);
+
+	ocelot_fdma_activate_chan(ocelot, ocelot->fdma->rx_ring.dcbs_dma,
+				  MSCC_FDMA_XTR_CHAN);
+}
+
+void ocelot_fdma_deinit(struct ocelot *ocelot)
+{
+	struct ocelot_fdma *fdma = ocelot->fdma;
+
+	ocelot_fdma_writel(ocelot, MSCC_FDMA_INTR_ENA, 0);
+	ocelot_fdma_writel(ocelot, MSCC_FDMA_CH_FORCEDIS,
+			   BIT(MSCC_FDMA_XTR_CHAN));
+	ocelot_fdma_writel(ocelot, MSCC_FDMA_CH_FORCEDIS,
+			   BIT(MSCC_FDMA_INJ_CHAN));
+	napi_synchronize(&fdma->napi);
+	napi_disable(&fdma->napi);
+
+	ocelot_fdma_free_rx_ring(ocelot);
+	ocelot_fdma_free_tx_ring(ocelot);
+}
diff --git a/drivers/net/ethernet/mscc/ocelot_fdma.h b/drivers/net/ethernet/mscc/ocelot_fdma.h
new file mode 100644
index 000000000000..2fc8e1dd7230
--- /dev/null
+++ b/drivers/net/ethernet/mscc/ocelot_fdma.h
@@ -0,0 +1,166 @@
+/* SPDX-License-Identifier: (GPL-2.0 OR MIT) */
+/*
+ * Microsemi SoCs FDMA driver
+ *
+ * Copyright (c) 2021 Microchip
+ */
+#ifndef _MSCC_OCELOT_FDMA_H_
+#define _MSCC_OCELOT_FDMA_H_
+
+#include "ocelot.h"
+
+#define MSCC_FDMA_DCB_STAT_BLOCKO(x)	(((x) << 20) & GENMASK(31, 20))
+#define MSCC_FDMA_DCB_STAT_BLOCKO_M	GENMASK(31, 20)
+#define MSCC_FDMA_DCB_STAT_BLOCKO_X(x)	(((x) & GENMASK(31, 20)) >> 20)
+#define MSCC_FDMA_DCB_STAT_PD		BIT(19)
+#define MSCC_FDMA_DCB_STAT_ABORT	BIT(18)
+#define MSCC_FDMA_DCB_STAT_EOF		BIT(17)
+#define MSCC_FDMA_DCB_STAT_SOF		BIT(16)
+#define MSCC_FDMA_DCB_STAT_BLOCKL_M	GENMASK(15, 0)
+#define MSCC_FDMA_DCB_STAT_BLOCKL(x)	((x) & GENMASK(15, 0))
+
+#define MSCC_FDMA_DCB_LLP(x)		((x) * 4 + 0x0)
+#define MSCC_FDMA_DCB_LLP_PREV(x)	((x) * 4 + 0xA0)
+#define MSCC_FDMA_CH_SAFE		0xcc
+#define MSCC_FDMA_CH_ACTIVATE		0xd0
+#define MSCC_FDMA_CH_DISABLE		0xd4
+#define MSCC_FDMA_CH_FORCEDIS		0xd8
+#define MSCC_FDMA_EVT_ERR		0x164
+#define MSCC_FDMA_EVT_ERR_CODE		0x168
+#define MSCC_FDMA_INTR_LLP		0x16c
+#define MSCC_FDMA_INTR_LLP_ENA		0x170
+#define MSCC_FDMA_INTR_FRM		0x174
+#define MSCC_FDMA_INTR_FRM_ENA		0x178
+#define MSCC_FDMA_INTR_ENA		0x184
+#define MSCC_FDMA_INTR_IDENT		0x188
+
+#define MSCC_FDMA_INJ_CHAN		2
+#define MSCC_FDMA_XTR_CHAN		0
+
+#define OCELOT_FDMA_WEIGHT		32
+
+#define OCELOT_FDMA_CH_SAFE_TIMEOUT_US	10
+
+#define OCELOT_FDMA_RX_RING_SIZE	512
+#define OCELOT_FDMA_TX_RING_SIZE	128
+
+#define OCELOT_FDMA_RX_DCB_SIZE		(OCELOT_FDMA_RX_RING_SIZE * \
+					 sizeof(struct ocelot_fdma_dcb))
+#define OCELOT_FDMA_TX_DCB_SIZE		(OCELOT_FDMA_TX_RING_SIZE * \
+					 sizeof(struct ocelot_fdma_dcb))
+/* +4 allows for word alignment after allocation */
+#define OCELOT_DCBS_HW_ALLOC_SIZE	(OCELOT_FDMA_RX_DCB_SIZE + \
+					 OCELOT_FDMA_TX_DCB_SIZE + \
+					 4)
+
+#define OCELOT_FDMA_RX_SIZE		(PAGE_SIZE / 2)
+
+#define OCELOT_FDMA_SKBFRAG_OVR		(4 + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
+#define OCELOT_FDMA_RXB_SIZE		ALIGN_DOWN(OCELOT_FDMA_RX_SIZE - OCELOT_FDMA_SKBFRAG_OVR, 4)
+#define OCELOT_FDMA_SKBFRAG_SIZE	(OCELOT_FDMA_RXB_SIZE + OCELOT_FDMA_SKBFRAG_OVR)
+
+DECLARE_STATIC_KEY_FALSE(ocelot_fdma_enabled);
+
+struct ocelot_fdma_dcb {
+	u32 llp;
+	u32 datap;
+	u32 datal;
+	u32 stat;
+} __packed;
+
+/**
+ * struct ocelot_fdma_tx_buf - TX buffer structure
+ * @skb: SKB currently used in the corresponding DCB.
+ * @dma_addr: SKB DMA mapped address.
+ */
+struct ocelot_fdma_tx_buf {
+	struct sk_buff *skb;
+	DEFINE_DMA_UNMAP_ADDR(dma_addr);
+};
+
+/**
+ * struct ocelot_fdma_tx_ring - TX ring description of DCBs
+ *
+ * @dcbs: DCBs allocated for the ring
+ * @dcbs_dma: DMA base address of the DCBs
+ * @bufs: List of TX buffer associated to the DCBs
+ * @xmit_lock: lock for concurrent xmit access
+ * @next_to_clean: Next DCB to be cleaned in tx_cleanup
+ * @next_to_use: Next available DCB to send SKB
+ */
+struct ocelot_fdma_tx_ring {
+	struct ocelot_fdma_dcb *dcbs;
+	dma_addr_t dcbs_dma;
+	struct ocelot_fdma_tx_buf bufs[OCELOT_FDMA_TX_RING_SIZE];
+	/* Protect concurrent xmit calls */
+	spinlock_t xmit_lock;
+	u16 next_to_clean;
+	u16 next_to_use;
+};
+
+/**
+ * struct ocelot_fdma_rx_buf - RX buffer structure
+ * @page: Struct page used in this buffer
+ * @page_offset: Current page offset (either 0 or PAGE_SIZE/2)
+ * @dma_addr: DMA address of the page
+ */
+struct ocelot_fdma_rx_buf {
+	struct page *page;
+	u32 page_offset;
+	dma_addr_t dma_addr;
+};
+
+/**
+ * struct ocelot_fdma_rx_ring - RX ring description of DCBs
+ *
+ * @dcbs: DCBs allocated for the ring
+ * @dcbs_dma: DMA base address of the DCBs
+ * @bufs: List of RX buffers associated with the DCBs
+ * @skb: SKB currently received by the netdev
+ * @next_to_clean: Next DCB to be cleaned in NAPI polling
+ * @next_to_use: Next available DCB to send SKB
+ * @next_to_alloc: Next buffer that needs to be allocated (page reuse or alloc)
+ */
+struct ocelot_fdma_rx_ring {
+	struct ocelot_fdma_dcb *dcbs;
+	dma_addr_t dcbs_dma;
+	struct ocelot_fdma_rx_buf bufs[OCELOT_FDMA_RX_RING_SIZE];
+	struct sk_buff *skb;
+	u16 next_to_clean;
+	u16 next_to_use;
+	u16 next_to_alloc;
+};
+
+/**
+ * struct ocelot_fdma - FDMA context
+ *
+ * @irq: FDMA interrupt
+ * @ndev: Net device used to initialize NAPI
+ * @dcbs_base: Memory coherent DCBs
+ * @dcbs_dma_base: DMA base address of memory coherent DCBs
+ * @tx_ring: Injection ring
+ * @rx_ring: Extraction ring
+ * @napi: NAPI context
+ * @ocelot: Back-pointer to ocelot struct
+ */
+struct ocelot_fdma {
+	int irq;
+	struct net_device *ndev;
+	struct ocelot_fdma_dcb *dcbs_base;
+	dma_addr_t dcbs_dma_base;
+	struct ocelot_fdma_tx_ring tx_ring;
+	struct ocelot_fdma_rx_ring rx_ring;
+	struct napi_struct napi;
+	struct ocelot *ocelot;
+};
+
+void ocelot_fdma_init(struct platform_device *pdev, struct ocelot *ocelot);
+void ocelot_fdma_start(struct ocelot *ocelot);
+void ocelot_fdma_deinit(struct ocelot *ocelot);
+int ocelot_fdma_inject_frame(struct ocelot *ocelot, int port, u32 rew_op,
+			     struct sk_buff *skb, struct net_device *dev);
+void ocelot_fdma_netdev_init(struct ocelot *ocelot, struct net_device *dev);
+void ocelot_fdma_netdev_deinit(struct ocelot *ocelot,
+			       struct net_device *dev);
+
+#endif
diff --git a/drivers/net/ethernet/mscc/ocelot_net.c b/drivers/net/ethernet/mscc/ocelot_net.c
index d83d3ffba3ac..8115c3db252e 100644
--- a/drivers/net/ethernet/mscc/ocelot_net.c
+++ b/drivers/net/ethernet/mscc/ocelot_net.c
@@ -15,6 +15,7 @@
 #include <net/pkt_cls.h>
 #include "ocelot.h"
 #include "ocelot_vcap.h"
+#include "ocelot_fdma.h"
 
 #define OCELOT_MAC_QUIRKS	OCELOT_QUIRK_QSGMII_PORTS_MUST_BE_UP
 
@@ -457,7 +458,8 @@ static netdev_tx_t ocelot_port_xmit(struct sk_buff *skb, struct net_device *dev)
 	int port = priv->chip_port;
 	u32 rew_op = 0;
 
-	if (!ocelot_can_inject(ocelot, 0))
+	if (!static_branch_unlikely(&ocelot_fdma_enabled) &&
+	    !ocelot_can_inject(ocelot, 0))
 		return NETDEV_TX_BUSY;
 
 	/* Check if timestamping is needed */
@@ -475,9 +477,13 @@ static netdev_tx_t ocelot_port_xmit(struct sk_buff *skb, struct net_device *dev)
 		rew_op = ocelot_ptp_rew_op(skb);
 	}
 
-	ocelot_port_inject_frame(ocelot, port, 0, rew_op, skb);
+	if (static_branch_unlikely(&ocelot_fdma_enabled)) {
+		ocelot_fdma_inject_frame(ocelot, port, rew_op, skb, dev);
+	} else {
+		ocelot_port_inject_frame(ocelot, port, 0, rew_op, skb);
 
-	kfree_skb(skb);
+		consume_skb(skb);
+	}
 
 	return NETDEV_TX_OK;
 }
@@ -1702,14 +1708,20 @@ int ocelot_probe_port(struct ocelot *ocelot, int port, struct regmap *target,
 	if (err)
 		goto out;
 
+	if (ocelot->fdma)
+		ocelot_fdma_netdev_init(ocelot, dev);
+
 	err = register_netdev(dev);
 	if (err) {
 		dev_err(ocelot->dev, "register_netdev failed\n");
-		goto out;
+		goto out_fdma_deinit;
 	}
 
 	return 0;
 
+out_fdma_deinit:
+	if (ocelot->fdma)
+		ocelot_fdma_netdev_deinit(ocelot, dev);
 out:
 	ocelot->ports[port] = NULL;
 	free_netdev(dev);
@@ -1722,9 +1734,14 @@ void ocelot_release_port(struct ocelot_port *ocelot_port)
 	struct ocelot_port_private *priv = container_of(ocelot_port,
 						struct ocelot_port_private,
 						port);
+	struct ocelot *ocelot = ocelot_port->ocelot;
+	struct ocelot_fdma *fdma = ocelot->fdma;
 
 	unregister_netdev(priv->dev);
 
+	if (fdma)
+		ocelot_fdma_netdev_deinit(ocelot, priv->dev);
+
 	if (priv->phylink) {
 		rtnl_lock();
 		phylink_disconnect_phy(priv->phylink);
diff --git a/drivers/net/ethernet/mscc/ocelot_vsc7514.c b/drivers/net/ethernet/mscc/ocelot_vsc7514.c
index cd3eb101f159..bd0b87ec2527 100644
--- a/drivers/net/ethernet/mscc/ocelot_vsc7514.c
+++ b/drivers/net/ethernet/mscc/ocelot_vsc7514.c
@@ -18,6 +18,7 @@
 
 #include <soc/mscc/ocelot_vcap.h>
 #include <soc/mscc/ocelot_hsio.h>
+#include "ocelot_fdma.h"
 #include "ocelot.h"
 
 #define VSC7514_VCAP_POLICER_BASE			128
@@ -1048,6 +1049,7 @@ static int mscc_ocelot_probe(struct platform_device *pdev)
 		{ S1, "s1" },
 		{ S2, "s2" },
 		{ PTP, "ptp", 1 },
+		{ FDMA, "fdma", 1 },
 	};
 
 	if (!np && !pdev->dev.platform_data)
@@ -1083,6 +1085,9 @@ static int mscc_ocelot_probe(struct platform_device *pdev)
 		ocelot->targets[io_target[i].id] = target;
 	}
 
+	if (ocelot->targets[FDMA])
+		ocelot_fdma_init(pdev, ocelot);
+
 	hsio = syscon_regmap_lookup_by_compatible("mscc,ocelot-hsio");
 	if (IS_ERR(hsio)) {
 		dev_err(&pdev->dev, "missing hsio syscon\n");
@@ -1146,6 +1151,9 @@ static int mscc_ocelot_probe(struct platform_device *pdev)
 	if (err)
 		goto out_ocelot_devlink_unregister;
 
+	if (ocelot->fdma)
+		ocelot_fdma_start(ocelot);
+
 	err = ocelot_devlink_sb_register(ocelot);
 	if (err)
 		goto out_ocelot_release_ports;
@@ -1186,6 +1194,8 @@ static int mscc_ocelot_remove(struct platform_device *pdev)
 {
 	struct ocelot *ocelot = platform_get_drvdata(pdev);
 
+	if (ocelot->fdma)
+		ocelot_fdma_deinit(ocelot);
 	devlink_unregister(ocelot->devlink);
 	ocelot_deinit_timestamp(ocelot);
 	ocelot_devlink_sb_unregister(ocelot);
diff --git a/include/soc/mscc/ocelot.h b/include/soc/mscc/ocelot.h
index f038062a97a9..3e9454b00562 100644
--- a/include/soc/mscc/ocelot.h
+++ b/include/soc/mscc/ocelot.h
@@ -118,6 +118,7 @@ enum ocelot_target {
 	S2,
 	HSIO,
 	PTP,
+	FDMA,
 	GCB,
 	DEV_GMII,
 	TARGET_MAX,
@@ -732,6 +733,8 @@ struct ocelot {
 	/* Protects the PTP clock */
 	spinlock_t			ptp_clock_lock;
 	struct ptp_pin_desc		ptp_pins[OCELOT_PTP_PINS_NUM];
+
+	struct ocelot_fdma		*fdma;
 };
 
 struct ocelot_policer {
-- 
2.34.1


^ permalink raw reply related

* [PATCH net-next v7 3/4] net: ocelot: add support for ndo_change_mtu
From: Clément Léger @ 2021-12-09 10:43 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Rob Herring, Vladimir Oltean,
	Claudiu Manoil, Alexandre Belloni, UNGLinuxDriver, Andrew Lunn,
	Florian Fainelli, Russell King
  Cc: Clément Léger, netdev, devicetree, linux-kernel,
	Thomas Petazzoni, Denis Kirjanov, Julian Wiedmann
In-Reply-To: <20211209104306.986188-1-clement.leger@bootlin.com>

Add support for changing the MTU on the ocelot register-based
interface. The hardware accepts jumbo frames of up to 25000 bytes, but
the maximum MTU is set to 9000, which is a saner value and allows for
the maximum performance gain with FDMA.

Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Clément Léger <clement.leger@bootlin.com>
---
 drivers/net/ethernet/mscc/ocelot.h     |  2 ++
 drivers/net/ethernet/mscc/ocelot_net.c | 14 ++++++++++++++
 2 files changed, 16 insertions(+)

diff --git a/drivers/net/ethernet/mscc/ocelot.h b/drivers/net/ethernet/mscc/ocelot.h
index 1eb0b5ad51e9..bf4eff6d7086 100644
--- a/drivers/net/ethernet/mscc/ocelot.h
+++ b/drivers/net/ethernet/mscc/ocelot.h
@@ -32,6 +32,8 @@
 
 #define OCELOT_PTP_QUEUE_SZ	128
 
+#define OCELOT_JUMBO_MTU	9000
+
 struct ocelot_port_tc {
 	bool block_shared;
 	unsigned long offload_cnt;
diff --git a/drivers/net/ethernet/mscc/ocelot_net.c b/drivers/net/ethernet/mscc/ocelot_net.c
index d9694dc14a2d..d83d3ffba3ac 100644
--- a/drivers/net/ethernet/mscc/ocelot_net.c
+++ b/drivers/net/ethernet/mscc/ocelot_net.c
@@ -764,10 +764,23 @@ static int ocelot_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
 	return phy_mii_ioctl(dev->phydev, ifr, cmd);
 }
 
+static int ocelot_change_mtu(struct net_device *dev, int new_mtu)
+{
+	struct ocelot_port_private *priv = netdev_priv(dev);
+	struct ocelot_port *ocelot_port = &priv->port;
+	struct ocelot *ocelot = ocelot_port->ocelot;
+
+	ocelot_port_set_maxlen(ocelot, priv->chip_port, new_mtu);
+	WRITE_ONCE(dev->mtu, new_mtu);
+
+	return 0;
+}
+
 static const struct net_device_ops ocelot_port_netdev_ops = {
 	.ndo_open			= ocelot_port_open,
 	.ndo_stop			= ocelot_port_stop,
 	.ndo_start_xmit			= ocelot_port_xmit,
+	.ndo_change_mtu			= ocelot_change_mtu,
 	.ndo_set_rx_mode		= ocelot_set_rx_mode,
 	.ndo_set_mac_address		= ocelot_port_set_mac_address,
 	.ndo_get_stats64		= ocelot_get_stats64,
@@ -1670,6 +1683,7 @@ int ocelot_probe_port(struct ocelot *ocelot, int port, struct regmap *target,
 
 	dev->netdev_ops = &ocelot_port_netdev_ops;
 	dev->ethtool_ops = &ocelot_ethtool_ops;
+	dev->max_mtu = OCELOT_JUMBO_MTU;
 
 	dev->hw_features |= NETIF_F_HW_VLAN_CTAG_FILTER | NETIF_F_RXFCS |
 		NETIF_F_HW_TC;
-- 
2.34.1


^ permalink raw reply related

* [PATCH net-next v7 2/4] net: ocelot: add and export ocelot_ptp_rx_timestamp()
From: Clément Léger @ 2021-12-09 10:43 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Rob Herring, Vladimir Oltean,
	Claudiu Manoil, Alexandre Belloni, UNGLinuxDriver, Andrew Lunn,
	Florian Fainelli, Russell King
  Cc: Clément Léger, netdev, devicetree, linux-kernel,
	Thomas Petazzoni, Denis Kirjanov, Julian Wiedmann
In-Reply-To: <20211209104306.986188-1-clement.leger@bootlin.com>

In order to support PTP in FDMA, PTP handling code is needed. Since
this is the same as for register-based extraction, export it with
a new ocelot_ptp_rx_timestamp() function.
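
For illustration only, here is a minimal userspace sketch of the
timestamp extension that the exported function performs. The helper name
extend_rx_timestamp() and the plain integer types are assumptions made
for this example; the driver itself works on the values returned by
ocelot_ptp_gettime64() and ktime_set().

```c
#include <stdint.h>

/* Hypothetical helper, not part of the driver: extend a 32-bit hardware
 * RX timestamp to 64 bits using the current time of day, both expressed
 * in nanoseconds.
 */
static uint64_t extend_rx_timestamp(uint64_t tod_ns, uint32_t hw_ts)
{
	uint64_t upper = tod_ns >> 32;

	/* If the low half of the current time is smaller than the stamp,
	 * the low 32 bits wrapped after the frame was stamped, so the
	 * frame belongs to the previous 2^32 ns window.
	 */
	if ((uint32_t)tod_ns < hw_ts)
		upper -= 1;

	return (upper << 32) | hw_ts;
}
```

With a current time of 0x500000010 ns and a stamp of 0xfffffff0, the
helper borrows one from the upper half and yields 0x4fffffff0.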

Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Clément Léger <clement.leger@bootlin.com>
---
 drivers/net/ethernet/mscc/ocelot.c | 41 +++++++++++++++++-------------
 include/soc/mscc/ocelot.h          |  2 ++
 2 files changed, 26 insertions(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/mscc/ocelot.c b/drivers/net/ethernet/mscc/ocelot.c
index b5ec8ce7f4dd..876a7ecf86eb 100644
--- a/drivers/net/ethernet/mscc/ocelot.c
+++ b/drivers/net/ethernet/mscc/ocelot.c
@@ -1054,14 +1054,34 @@ static int ocelot_xtr_poll_xfh(struct ocelot *ocelot, int grp, u32 *xfh)
 	return 0;
 }
 
-int ocelot_xtr_poll_frame(struct ocelot *ocelot, int grp, struct sk_buff **nskb)
+void ocelot_ptp_rx_timestamp(struct ocelot *ocelot, struct sk_buff *skb,
+			     u64 timestamp)
 {
 	struct skb_shared_hwtstamps *shhwtstamps;
 	u64 tod_in_ns, full_ts_in_ns;
+	struct timespec64 ts;
+
+	ocelot_ptp_gettime64(&ocelot->ptp_info, &ts);
+
+	tod_in_ns = ktime_set(ts.tv_sec, ts.tv_nsec);
+	if ((tod_in_ns & 0xffffffff) < timestamp)
+		full_ts_in_ns = (((tod_in_ns >> 32) - 1) << 32) |
+				timestamp;
+	else
+		full_ts_in_ns = (tod_in_ns & GENMASK_ULL(63, 32)) |
+				timestamp;
+
+	shhwtstamps = skb_hwtstamps(skb);
+	memset(shhwtstamps, 0, sizeof(struct skb_shared_hwtstamps));
+	shhwtstamps->hwtstamp = full_ts_in_ns;
+}
+EXPORT_SYMBOL(ocelot_ptp_rx_timestamp);
+
+int ocelot_xtr_poll_frame(struct ocelot *ocelot, int grp, struct sk_buff **nskb)
+{
 	u64 timestamp, src_port, len;
 	u32 xfh[OCELOT_TAG_LEN / 4];
 	struct net_device *dev;
-	struct timespec64 ts;
 	struct sk_buff *skb;
 	int sz, buf_len;
 	u32 val, *buf;
@@ -1117,21 +1137,8 @@ int ocelot_xtr_poll_frame(struct ocelot *ocelot, int grp, struct sk_buff **nskb)
 		*buf = val;
 	}
 
-	if (ocelot->ptp) {
-		ocelot_ptp_gettime64(&ocelot->ptp_info, &ts);
-
-		tod_in_ns = ktime_set(ts.tv_sec, ts.tv_nsec);
-		if ((tod_in_ns & 0xffffffff) < timestamp)
-			full_ts_in_ns = (((tod_in_ns >> 32) - 1) << 32) |
-					timestamp;
-		else
-			full_ts_in_ns = (tod_in_ns & GENMASK_ULL(63, 32)) |
-					timestamp;
-
-		shhwtstamps = skb_hwtstamps(skb);
-		memset(shhwtstamps, 0, sizeof(struct skb_shared_hwtstamps));
-		shhwtstamps->hwtstamp = full_ts_in_ns;
-	}
+	if (ocelot->ptp)
+		ocelot_ptp_rx_timestamp(ocelot, skb, timestamp);
 
 	/* Everything we see on an interface that is in the HW bridge
 	 * has already been forwarded.
diff --git a/include/soc/mscc/ocelot.h b/include/soc/mscc/ocelot.h
index 9b99cfd39a59..f038062a97a9 100644
--- a/include/soc/mscc/ocelot.h
+++ b/include/soc/mscc/ocelot.h
@@ -797,6 +797,8 @@ void ocelot_port_inject_frame(struct ocelot *ocelot, int port, int grp,
 void ocelot_ifh_port_set(void *ifh, int port, u32 rew_op, u32 vlan_tag);
 int ocelot_xtr_poll_frame(struct ocelot *ocelot, int grp, struct sk_buff **skb);
 void ocelot_drain_cpu_queue(struct ocelot *ocelot, int grp);
+void ocelot_ptp_rx_timestamp(struct ocelot *ocelot, struct sk_buff *skb,
+			     u64 timestamp);
 
 /* Hardware initialization */
 int ocelot_regfields_init(struct ocelot *ocelot,
-- 
2.34.1


^ permalink raw reply related

* [PATCH net-next v7 1/4] net: ocelot: export ocelot_ifh_port_set() to setup IFH
From: Clément Léger @ 2021-12-09 10:43 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Rob Herring, Vladimir Oltean,
	Claudiu Manoil, Alexandre Belloni, UNGLinuxDriver, Andrew Lunn,
	Florian Fainelli, Russell King
  Cc: Clément Léger, netdev, devicetree, linux-kernel,
	Thomas Petazzoni, Denis Kirjanov, Julian Wiedmann
In-Reply-To: <20211209104306.986188-1-clement.leger@bootlin.com>

FDMA will need this code to prepare the injection frame header when
sending SKBs. Move this code into ocelot_ifh_port_set() and set the VLAN
TCI and REW_OP fields of the IFH only when the values passed in are
non-zero.
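
As a rough sketch of the conditional field handling, assuming a
simplified header layout: struct demo_ifh and demo_ifh_port_set() are
hypothetical names used only here. The real IFH is a packed bit field
filled through the ocelot_ifh_set_*() helpers.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for the injection frame header */
struct demo_ifh {
	uint8_t bypass;
	uint64_t dest;
	uint16_t vlan_tci;
	uint32_t rew_op;
};

static void demo_ifh_port_set(struct demo_ifh *ifh, int port,
			      uint32_t rew_op, uint16_t vlan_tag)
{
	/* Always set: bypass the analyzer and select the egress port */
	ifh->bypass = 1;
	ifh->dest = 1ULL << port;

	/* Only touch the optional fields when a value was passed in, so a
	 * header that was zeroed or pre-filled keeps its contents.
	 */
	if (vlan_tag)
		ifh->vlan_tci = vlan_tag;
	if (rew_op)
		ifh->rew_op = rew_op;
}
```

A caller that zeroes the header first and passes 0 for both optional
values ends up with only the bypass and destination fields set.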

Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Clément Léger <clement.leger@bootlin.com>
---
 drivers/net/ethernet/mscc/ocelot.c | 18 +++++++++++++-----
 include/soc/mscc/ocelot.h          |  1 +
 2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mscc/ocelot.c b/drivers/net/ethernet/mscc/ocelot.c
index b1856d8c944b..b5ec8ce7f4dd 100644
--- a/drivers/net/ethernet/mscc/ocelot.c
+++ b/drivers/net/ethernet/mscc/ocelot.c
@@ -1164,6 +1164,18 @@ bool ocelot_can_inject(struct ocelot *ocelot, int grp)
 }
 EXPORT_SYMBOL(ocelot_can_inject);
 
+void ocelot_ifh_port_set(void *ifh, int port, u32 rew_op, u32 vlan_tag)
+{
+	ocelot_ifh_set_bypass(ifh, 1);
+	ocelot_ifh_set_dest(ifh, BIT_ULL(port));
+	ocelot_ifh_set_tag_type(ifh, IFH_TAG_TYPE_C);
+	if (vlan_tag)
+		ocelot_ifh_set_vlan_tci(ifh, vlan_tag);
+	if (rew_op)
+		ocelot_ifh_set_rew_op(ifh, rew_op);
+}
+EXPORT_SYMBOL(ocelot_ifh_port_set);
+
 void ocelot_port_inject_frame(struct ocelot *ocelot, int port, int grp,
 			      u32 rew_op, struct sk_buff *skb)
 {
@@ -1173,11 +1185,7 @@ void ocelot_port_inject_frame(struct ocelot *ocelot, int port, int grp,
 	ocelot_write_rix(ocelot, QS_INJ_CTRL_GAP_SIZE(1) |
 			 QS_INJ_CTRL_SOF, QS_INJ_CTRL, grp);
 
-	ocelot_ifh_set_bypass(ifh, 1);
-	ocelot_ifh_set_dest(ifh, BIT_ULL(port));
-	ocelot_ifh_set_tag_type(ifh, IFH_TAG_TYPE_C);
-	ocelot_ifh_set_vlan_tci(ifh, skb_vlan_tag_get(skb));
-	ocelot_ifh_set_rew_op(ifh, rew_op);
+	ocelot_ifh_port_set(ifh, port, rew_op, skb_vlan_tag_get(skb));
 
 	for (i = 0; i < OCELOT_TAG_LEN / 4; i++)
 		ocelot_write_rix(ocelot, ifh[i], QS_INJ_WR, grp);
diff --git a/include/soc/mscc/ocelot.h b/include/soc/mscc/ocelot.h
index 33f2e8c9e88b..9b99cfd39a59 100644
--- a/include/soc/mscc/ocelot.h
+++ b/include/soc/mscc/ocelot.h
@@ -794,6 +794,7 @@ void __ocelot_target_write_ix(struct ocelot *ocelot, enum ocelot_target target,
 bool ocelot_can_inject(struct ocelot *ocelot, int grp);
 void ocelot_port_inject_frame(struct ocelot *ocelot, int port, int grp,
 			      u32 rew_op, struct sk_buff *skb);
+void ocelot_ifh_port_set(void *ifh, int port, u32 rew_op, u32 vlan_tag);
 int ocelot_xtr_poll_frame(struct ocelot *ocelot, int grp, struct sk_buff **skb);
 void ocelot_drain_cpu_queue(struct ocelot *ocelot, int grp);
 
-- 
2.34.1


^ permalink raw reply related

* [PATCH net-next v7 0/4] Add FDMA support on ocelot switch driver
From: Clément Léger @ 2021-12-09 10:43 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski, Rob Herring, Vladimir Oltean,
	Claudiu Manoil, Alexandre Belloni, UNGLinuxDriver, Andrew Lunn,
	Florian Fainelli, Russell King
  Cc: Clément Léger, netdev, devicetree, linux-kernel,
	Thomas Petazzoni, Denis Kirjanov, Julian Wiedmann

This series adds support for the Frame DMA present on the VSC7514
switch. The FDMA is able to extract and inject packets on the various
ethernet interfaces present on the switch.

------------------
Changes in V7:
  - Fix kernel doc for fdma struct

Changes in V6:
  - Remove dead code added in ocelot_vsc7514
  - Remove useless include added in mscc/ocelot.h
  - Remove trailing whitespace
  - Move skb_tx_timestamp before sending the skb
  - Fix a few long lines

Changes in V5:
  - Add skb freeing for TX and fix RX ring skb not being freed
  - Fix napi init in case of netdev registration failure
  - Reorganize FDMA register definitions
  - Used regmap targets from ocelot structure to get fdma pointer
  - s/page_count/page_ref_count
  - Move napi back in struct ocelot_fdma

Changes in V4:
  - Use regmap for register access
  - Removed yaml bindings conversion as well as mac address from dt
  - Removed pre-computed IFH for the moment
  - Fixed timestamp reading for PTP in FDMA
  - Fixed wrong exit path for fdma netdev init
  - Removed spinlock from TX cleanup
  - Add asynchronous RX chan stop before refilling
  - Reduce CH_SAFE wait time to 10us
  - Reduce waiting time for channel to be safe
  - Completely rework rx to use page recycling (code from gianfar)
  - Reenable MTU change support since FDMA now supports it transparently
  - Split TX and RX ring size
  - Larger RX size to lower page allocation rate
  - Add static key to check for FDMA to be enabled in fast path

Changes in V3:
  - Add timeouts for hardware register reads
  - Add cleanup path in fdma_init
  - Rework injection and extraction to used ring like structure
  - Added PTP support to FDMA
  - Use pskb_expand_head instead of skb_copy_expand in xmit
  - Drop jumbo support
  - Use of_get_ethdev_address
  - Add ocelot_fdma_netdev_init/deinit

Changes in V2:
  - Read MAC for each port and not as switch base MAC address
  - Add missing static for some functions in ocelot_fdma.c
  - Split change_mtu from fdma commit
  - Add jumbo support for register based xmit
  - Move precomputed header into ocelot_port struct
  - Remove use of QUIRK_ENDIAN_LITTLE due to misconfiguration for tests
  - Remove fragmented packet sending which has not been tested
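
The ring-like structure mentioned in the V3 changes can be pictured with
the index arithmetic below. This is a sketch under assumptions: the
demo_* helpers are hypothetical and only mirror the idea behind the
next_to_use and next_to_clean indices of the TX ring, with one slot kept
unused so that a full ring can be told apart from an empty one.

```c
#include <stdint.h>

/* Hypothetical helpers, written for illustration only */
static uint16_t demo_idx_next(uint16_t idx, uint16_t size)
{
	return (uint16_t)((idx + 1u) % size);
}

static uint16_t demo_idx_prev(uint16_t idx, uint16_t size)
{
	return (uint16_t)((idx + size - 1u) % size);
}

/* Number of entries that can still be queued. The ring is empty when
 * both indices are equal, and full when next_to_use sits right behind
 * next_to_clean.
 */
static uint16_t demo_ring_free(uint16_t next_to_use, uint16_t next_to_clean,
			       uint16_t size)
{
	return (uint16_t)((next_to_clean + size - next_to_use - 1u) % size);
}
```

With a ring of 128 entries, an empty ring reports 127 free slots, and a
ring whose next_to_use is one behind next_to_clean reports 0.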

Clément Léger (4):
  net: ocelot: export ocelot_ifh_port_set() to setup IFH
  net: ocelot: add and export ocelot_ptp_rx_timestamp()
  net: ocelot: add support for ndo_change_mtu
  net: ocelot: add FDMA support

 drivers/net/ethernet/mscc/Makefile         |   1 +
 drivers/net/ethernet/mscc/ocelot.c         |  59 +-
 drivers/net/ethernet/mscc/ocelot.h         |   2 +
 drivers/net/ethernet/mscc/ocelot_fdma.c    | 894 +++++++++++++++++++++
 drivers/net/ethernet/mscc/ocelot_fdma.h    | 166 ++++
 drivers/net/ethernet/mscc/ocelot_net.c     |  39 +-
 drivers/net/ethernet/mscc/ocelot_vsc7514.c |  10 +
 include/soc/mscc/ocelot.h                  |   6 +
 8 files changed, 1151 insertions(+), 26 deletions(-)
 create mode 100644 drivers/net/ethernet/mscc/ocelot_fdma.c
 create mode 100644 drivers/net/ethernet/mscc/ocelot_fdma.h

-- 
2.34.1


^ permalink raw reply
