Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH v4 8/9] MIPS: SGI-IP27: fix readb/writeb addressing
From: Thomas Bogendoerfer @ 2019-08-14 11:40 UTC (permalink / raw)
  To: Philippe Mathieu-Daudé
  Cc: Greg Kroah-Hartman, Andy Shevchenko, Ralf Baechle, Paul Burton,
	James Hogan, Dmitry Torokhov, Lee Jones, David S. Miller,
	Srinivas Kandagatla, Alessandro Zummo, Alexandre Belloni,
	Jiri Slaby, Evgeniy Polyakov, linux-mips,
	Linux Kernel Mailing List, linux-input, netdev,
	open list:REAL TIME CLOCK (RTC) SUBSYSTEM,
	open list:SERIAL DRIVERS
In-Reply-To: <90129235-58c2-aeed-a9d3-96f4a8f45709@amsat.org>

On Tue, 13 Aug 2019 10:47:13 +0200
Philippe Mathieu-Daudé <f4bug@amsat.org> wrote:

> Hi Thomas,
> 
> On 8/11/19 9:29 AM, Greg Kroah-Hartman wrote:
> > On Sat, Aug 10, 2019 at 04:22:23PM +0300, Andy Shevchenko wrote:
> >> On Fri, Aug 9, 2019 at 1:34 PM Thomas Bogendoerfer
> >> <tbogendoerfer@suse.de> wrote:
> >>>
> >>> Our chosen byte swapping, which is what firmware already uses, is to
> >>> do readl/writel by normal lw/sw intructions (data invariance). This
> >>> also means we need to mangle addresses for u8 and u16 accesses. The
> >>> mangling for 16bit has been done aready, but 8bit one was missing.
> >>> Correcting this causes different addresses for accesses to the
> >>> SuperIO and local bus of the IOC3 chip. This is fixed by changing
> >>> byte order in ioc3 and m48rtc_rtc structs.
> >>
> >>>  /* serial port register map */
> >>>  struct ioc3_serialregs {
> >>> -       uint32_t        sscr;
> >>> -       uint32_t        stpir;
> >>> -       uint32_t        stcir;
> >>> -       uint32_t        srpir;
> >>> -       uint32_t        srcir;
> >>> -       uint32_t        srtr;
> >>> -       uint32_t        shadow;
> >>> +       u32     sscr;
> >>> +       u32     stpir;
> >>> +       u32     stcir;
> >>> +       u32     srpir;
> >>> +       u32     srcir;
> >>> +       u32     srtr;
> >>> +       u32     shadow;
> >>>  };
> >>
> >> Isn't it a churn? AFAIU kernel documentation the uint32_t is okay to
> >> use, just be consistent inside one module / driver.
> >> Am I mistaken?
> > 
> > No, but really it uint* shouldn't be used anywhere in the kernel source
> > as it does not make sense.
> 
> If you respin your series, please send this cleanup as a separate patch.

no need for an extra patch. I realized that patch 7 in this series introduces
all of these uint32_t. So i already fixed it there.

Thomas.

-- 
SUSE Linux GmbH
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)

^ permalink raw reply

* [PATCH bpf-next] tools: bpftool: compile with $(EXTRA_WARNINGS)
From: Quentin Monnet @ 2019-08-14 11:37 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann
  Cc: bpf, netdev, oss-drivers, Quentin Monnet

Compile bpftool with $(EXTRA_WARNINGS), as defined in
scripts/Makefile.include, and fix the new warnings produced.

Simply leave -Wswitch-enum out of the warning list, as we have several
switch-case structures where it is not desirable to process all values
of an enum.

Remove -Wshadow from the warnings we manually add to CFLAGS, as it is
handled in $(EXTRA_WARNINGS).

Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 tools/bpf/bpftool/Makefile | 3 ++-
 tools/bpf/bpftool/cgroup.c | 2 +-
 tools/bpf/bpftool/perf.c   | 4 ++++
 3 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/tools/bpf/bpftool/Makefile b/tools/bpf/bpftool/Makefile
index 4c9d1ffc3fc7..f284c207765a 100644
--- a/tools/bpf/bpftool/Makefile
+++ b/tools/bpf/bpftool/Makefile
@@ -37,7 +37,8 @@ prefix ?= /usr/local
 bash_compdir ?= /usr/share/bash-completion/completions
 
 CFLAGS += -O2
-CFLAGS += -W -Wall -Wextra -Wno-unused-parameter -Wshadow -Wno-missing-field-initializers
+CFLAGS += -W -Wall -Wextra -Wno-unused-parameter -Wno-missing-field-initializers
+CFLAGS += $(filter-out -Wswitch-enum,$(EXTRA_WARNINGS))
 CFLAGS += -DPACKAGE='"bpftool"' -D__EXPORTED_HEADERS__ \
 	-I$(srctree)/kernel/bpf/ \
 	-I$(srctree)/tools/include \
diff --git a/tools/bpf/bpftool/cgroup.c b/tools/bpf/bpftool/cgroup.c
index 44352b5aca85..1ef45e55039e 100644
--- a/tools/bpf/bpftool/cgroup.c
+++ b/tools/bpf/bpftool/cgroup.c
@@ -120,8 +120,8 @@ static int count_attached_bpf_progs(int cgroup_fd, enum bpf_attach_type type)
 static int show_attached_bpf_progs(int cgroup_fd, enum bpf_attach_type type,
 				   int level)
 {
+	const char *attach_flags_str;
 	__u32 prog_ids[1024] = {0};
-	char *attach_flags_str;
 	__u32 prog_cnt, iter;
 	__u32 attach_flags;
 	char buf[32];
diff --git a/tools/bpf/bpftool/perf.c b/tools/bpf/bpftool/perf.c
index f2a545e667c4..b2046f33e23f 100644
--- a/tools/bpf/bpftool/perf.c
+++ b/tools/bpf/bpftool/perf.c
@@ -104,6 +104,8 @@ static void print_perf_json(int pid, int fd, __u32 prog_id, __u32 fd_type,
 		jsonw_string_field(json_wtr, "filename", buf);
 		jsonw_lluint_field(json_wtr, "offset", probe_offset);
 		break;
+	default:
+		break;
 	}
 	jsonw_end_object(json_wtr);
 }
@@ -140,6 +142,8 @@ static void print_perf_plain(int pid, int fd, __u32 prog_id, __u32 fd_type,
 		printf("uretprobe  filename %s  offset %llu\n", buf,
 		       probe_offset);
 		break;
+	default:
+		break;
 	}
 }
 
-- 
2.17.1


^ permalink raw reply related

* Re: [PATCH AUTOSEL 4.19 04/42] netfilter: conntrack: always store window size un-scaled
From: Jakub Jankowski @ 2019-08-14 11:17 UTC (permalink / raw)
  To: Reindl Harald
  Cc: Thomas Jarosch, Sasha Levin, linux-kernel, stable,
	Florian Westphal, Jozsef Kadlecsik, Pablo Neira Ayuso,
	netfilter-devel, coreteam, netdev
In-Reply-To: <41ce587d-dfaa-fe6b-66a8-58ba1a3a2872@thelounge.net>

On 2019-08-14, Reindl Harald wrote:

> that's still not in 5.2.8

It will make its way into next 5.2.x release, as it is now in the pending 
queue: 
https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/tree/queue-5.2

Regards,
  Jakub.


>
> without the exception and "nf_conntrack_tcp_timeout_max_retrans = 60" a
> vnc-over-ssh session having the VNC view in the background freezes
> within 60 secods
>
> -----------------------------------------------------------------------------------------------
> IPV4 TABLE MANGLE (STATEFUL PRE-NAT/FILTER)
> -----------------------------------------------------------------------------------------------
> Chain PREROUTING (policy ACCEPT 100 packets, 9437 bytes)
> num   pkts bytes target     prot opt in     out     source
> destination
> 1     6526 3892K ACCEPT     all  --  *      *       0.0.0.0/0
> 0.0.0.0/0            ctstate RELATED,ESTABLISHED
> 2      125  6264 ACCEPT     all  --  lo     *       0.0.0.0/0
> 0.0.0.0/0
> 3       64  4952 ACCEPT     all  --  vmnet8 *       0.0.0.0/0
> 0.0.0.0/0
> 4        1    40 DROP       all  --  *      *       0.0.0.0/0
> 0.0.0.0/0            ctstate INVALID
>
> -------- Weitergeleitete Nachricht --------
> Betreff: [PATCH AUTOSEL 5.2 07/76] netfilter: conntrack: always store
> window size un-scaled
>
> Am 08.08.19 um 11:02 schrieb Thomas Jarosch:
>> Hello together,
>>
>> You wrote on Fri, Aug 02, 2019 at 09:22:24AM -0400:
>>> From: Florian Westphal <fw@strlen.de>
>>>
>>> [ Upstream commit 959b69ef57db00cb33e9c4777400ae7183ebddd3 ]
>>>
>>> Jakub Jankowski reported following oddity:
>>>
>>> After 3 way handshake completes, timeout of new connection is set to
>>> max_retrans (300s) instead of established (5 days).
>>>
>>> shortened excerpt from pcap provided:
>>> 25.070622 IP (flags [DF], proto TCP (6), length 52)
>>> 10.8.5.4.1025 > 10.8.1.2.80: Flags [S], seq 11, win 64240, [wscale 8]
>>> 26.070462 IP (flags [DF], proto TCP (6), length 48)
>>> 10.8.1.2.80 > 10.8.5.4.1025: Flags [S.], seq 82, ack 12, win 65535, [wscale 3]
>>> 27.070449 IP (flags [DF], proto TCP (6), length 40)
>>> 10.8.5.4.1025 > 10.8.1.2.80: Flags [.], ack 83, win 512, length 0
>>>
>>> Turns out the last_win is of u16 type, but we store the scaled value:
>>> 512 << 8 (== 0x20000) becomes 0 window.
>>>
>>> The Fixes tag is not correct, as the bug has existed forever, but
>>> without that change all that this causes might cause is to mistake a
>>> window update (to-nonzero-from-zero) for a retransmit.
>>>
>>> Fixes: fbcd253d2448b8 ("netfilter: conntrack: lower timeout to RETRANS seconds if window is 0")
>>> Reported-by: Jakub Jankowski <shasta@toxcorp.com>
>>> Tested-by: Jakub Jankowski <shasta@toxcorp.com>
>>> Signed-off-by: Florian Westphal <fw@strlen.de>
>>> Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
>>> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
>>> Signed-off-by: Sasha Levin <sashal@kernel.org>
>>
>> Also:
>> Tested-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
>>
>> ;)
>>
>> We've hit the issue with the wrong conntrack timeout at two different sites,
>> long-lived connections to a SAP server over IPSec VPN were constantly dropping.
>>
>> For us this was a regression after updating from kernel 3.14 to 4.19.
>> Yesterday I've applied the patch to kernel 4.19.57 and the problem is fixed.
>>
>> The issue was extra hard to debug as we could just boot the new kernel
>> for twenty minutes in the evening on these productive systems.
>>
>> The stable kernel patch from last Friday came right on time. I was just
>> about the replay the TCP connection with tcpreplay, so this saved
>> me from another week of debugging. Thanks everyone!
>

-- 
Jakub Jankowski|shasta@toxcorp.com|https://toxcorp.com/

^ permalink raw reply

* [PATCH net-next 3/4 v2] net: ethernet: mediatek: Rename NEXT_RX_DESP_IDX to NEXT_DESP_IDX
From: Stefan Roese @ 2019-08-14 11:18 UTC (permalink / raw)
  To: netdev; +Cc: René van Dorst, Daniel Golle, Sean Wang, John Crispin
In-Reply-To: <20190814111825.10855-1-sr@denx.de>

Rename the NEXT_RX_DESP_IDX macro to NEXT_DESP_IDX, so that it better
can be used for TX ops as well. This will be used in the upcoming
MT7628/88 support (same functionality for RX and TX in this macro).

Signed-off-by: Stefan Roese <sr@denx.de>
Cc: René van Dorst <opensource@vdorst.com>
Cc: Daniel Golle <daniel@makrotopia.org>
Cc: Sean Wang <sean.wang@mediatek.com>
Cc: John Crispin <john@phrozen.org>
---
v2:
- New patch

 drivers/net/ethernet/mediatek/mtk_eth_soc.c | 4 ++--
 drivers/net/ethernet/mediatek/mtk_eth_soc.h | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index bee2cdca66e7..d9978174b96a 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -903,7 +903,7 @@ static struct mtk_rx_ring *mtk_get_rx_ring(struct mtk_eth *eth)
 
 	for (i = 0; i < MTK_MAX_RX_RING_NUM; i++) {
 		ring = &eth->rx_ring[i];
-		idx = NEXT_RX_DESP_IDX(ring->calc_idx, ring->dma_size);
+		idx = NEXT_DESP_IDX(ring->calc_idx, ring->dma_size);
 		if (ring->dma[idx].rxd2 & RX_DMA_DONE) {
 			ring->calc_idx_update = true;
 			return ring;
@@ -952,7 +952,7 @@ static int mtk_poll_rx(struct napi_struct *napi, int budget,
 		if (unlikely(!ring))
 			goto rx_done;
 
-		idx = NEXT_RX_DESP_IDX(ring->calc_idx, ring->dma_size);
+		idx = NEXT_DESP_IDX(ring->calc_idx, ring->dma_size);
 		rxd = &ring->dma[idx];
 		data = ring->data[idx];
 
diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.h b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
index 088e2bc621f7..556644f28eae 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.h
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
@@ -39,7 +39,7 @@
 				 NETIF_F_SG | NETIF_F_TSO | \
 				 NETIF_F_TSO6 | \
 				 NETIF_F_IPV6_CSUM)
-#define NEXT_RX_DESP_IDX(X, Y)	(((X) + 1) & ((Y) - 1))
+#define NEXT_DESP_IDX(X, Y)	(((X) + 1) & ((Y) - 1))
 
 #define MTK_MAX_RX_RING_NUM	4
 #define MTK_HW_LRO_DMA_SIZE	8
-- 
2.22.1


^ permalink raw reply related

* [PATCH net-next 4/4 v2] net: ethernet: mediatek: Add MT7628/88 SoC support
From: Stefan Roese @ 2019-08-14 11:18 UTC (permalink / raw)
  To: netdev; +Cc: René van Dorst, Daniel Golle, Sean Wang, John Crispin
In-Reply-To: <20190814111825.10855-1-sr@denx.de>

This patch adds support for the MediaTek MT7628/88 SoCs to the common
MediaTek ethernet driver. Some minor changes are needed for this and
a bigger change, as the MT7628 does not support QDMA (only PDMA).

Signed-off-by: Stefan Roese <sr@denx.de>
Cc: René van Dorst <opensource@vdorst.com>
Cc: Daniel Golle <daniel@makrotopia.org>
Cc: Sean Wang <sean.wang@mediatek.com>
Cc: John Crispin <john@phrozen.org>
---
v2:
- Rebased on net-next (David)
- Used "ralink,rt5350-eth" compatible (Daniel)
- Fixed capability bit usage (Rene)
- Extracted DT bindings description to separate patch
- Introduced MTK_QDMA capability, which is used on all
  currently supported SoCs. Only the newly introcuded MT7628
  uses PDMA and does not have this capability bit set
- Added tx_int_mask_reg/tx_int_status_reg variables to better
  abstract the QDMA vs PDMA usage

 drivers/net/ethernet/mediatek/Kconfig        |   2 +-
 drivers/net/ethernet/mediatek/mtk_eth_path.c |   4 +
 drivers/net/ethernet/mediatek/mtk_eth_soc.c  | 480 ++++++++++++++-----
 drivers/net/ethernet/mediatek/mtk_eth_soc.h  |  51 +-
 4 files changed, 425 insertions(+), 112 deletions(-)

diff --git a/drivers/net/ethernet/mediatek/Kconfig b/drivers/net/ethernet/mediatek/Kconfig
index 1f7fff81f24d..b76cf2e1c9dc 100644
--- a/drivers/net/ethernet/mediatek/Kconfig
+++ b/drivers/net/ethernet/mediatek/Kconfig
@@ -1,7 +1,7 @@
 # SPDX-License-Identifier: GPL-2.0-only
 config NET_VENDOR_MEDIATEK
 	bool "MediaTek ethernet driver"
-	depends on ARCH_MEDIATEK || SOC_MT7621
+	depends on ARCH_MEDIATEK || SOC_MT7621 || SOC_MT7620
 	---help---
 	  If you have a Mediatek SoC with ethernet, say Y.
 
diff --git a/drivers/net/ethernet/mediatek/mtk_eth_path.c b/drivers/net/ethernet/mediatek/mtk_eth_path.c
index 7f05880cf9ef..28960e4c4e43 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_path.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_path.c
@@ -315,6 +315,10 @@ int mtk_setup_hw_path(struct mtk_eth *eth, int mac_id, int phymode)
 {
 	int err;
 
+	/* No mux'ing for MT7628/88 */
+	if (MTK_HAS_CAPS(eth->soc->caps, MTK_SOC_MT7628))
+		return 0;
+
 	switch (phymode) {
 	case PHY_INTERFACE_MODE_TRGMII:
 	case PHY_INTERFACE_MODE_RGMII_TXID:
diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index d9978174b96a..65eaafe6e975 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -323,11 +323,14 @@ static int mtk_phy_connect(struct net_device *dev)
 		goto err_phy;
 	}
 
-	/* put the gmac into the right mode */
-	regmap_read(eth->ethsys, ETHSYS_SYSCFG0, &val);
-	val &= ~SYSCFG0_GE_MODE(SYSCFG0_GE_MASK, mac->id);
-	val |= SYSCFG0_GE_MODE(mac->ge_mode, mac->id);
-	regmap_write(eth->ethsys, ETHSYS_SYSCFG0, val);
+	/* No MT7628/88 support for now */
+	if (!MTK_HAS_CAPS(eth->soc->caps, MTK_SOC_MT7628)) {
+		/* put the gmac into the right mode */
+		regmap_read(eth->ethsys, ETHSYS_SYSCFG0, &val);
+		val &= ~SYSCFG0_GE_MODE(SYSCFG0_GE_MASK, mac->id);
+		val |= SYSCFG0_GE_MODE(mac->ge_mode, mac->id);
+		regmap_write(eth->ethsys, ETHSYS_SYSCFG0, val);
+	}
 
 	/* couple phydev to net_device */
 	if (mtk_phy_connect_node(eth, mac, np))
@@ -395,8 +398,8 @@ static inline void mtk_tx_irq_disable(struct mtk_eth *eth, u32 mask)
 	u32 val;
 
 	spin_lock_irqsave(&eth->tx_irq_lock, flags);
-	val = mtk_r32(eth, MTK_QDMA_INT_MASK);
-	mtk_w32(eth, val & ~mask, MTK_QDMA_INT_MASK);
+	val = mtk_r32(eth, eth->tx_int_mask_reg);
+	mtk_w32(eth, val & ~mask, eth->tx_int_mask_reg);
 	spin_unlock_irqrestore(&eth->tx_irq_lock, flags);
 }
 
@@ -406,8 +409,8 @@ static inline void mtk_tx_irq_enable(struct mtk_eth *eth, u32 mask)
 	u32 val;
 
 	spin_lock_irqsave(&eth->tx_irq_lock, flags);
-	val = mtk_r32(eth, MTK_QDMA_INT_MASK);
-	mtk_w32(eth, val | mask, MTK_QDMA_INT_MASK);
+	val = mtk_r32(eth, eth->tx_int_mask_reg);
+	mtk_w32(eth, val | mask, eth->tx_int_mask_reg);
 	spin_unlock_irqrestore(&eth->tx_irq_lock, flags);
 }
 
@@ -437,6 +440,7 @@ static int mtk_set_mac_address(struct net_device *dev, void *p)
 {
 	int ret = eth_mac_addr(dev, p);
 	struct mtk_mac *mac = netdev_priv(dev);
+	struct mtk_eth *eth = mac->hw;
 	const char *macaddr = dev->dev_addr;
 
 	if (ret)
@@ -446,11 +450,19 @@ static int mtk_set_mac_address(struct net_device *dev, void *p)
 		return -EBUSY;
 
 	spin_lock_bh(&mac->hw->page_lock);
-	mtk_w32(mac->hw, (macaddr[0] << 8) | macaddr[1],
-		MTK_GDMA_MAC_ADRH(mac->id));
-	mtk_w32(mac->hw, (macaddr[2] << 24) | (macaddr[3] << 16) |
-		(macaddr[4] << 8) | macaddr[5],
-		MTK_GDMA_MAC_ADRL(mac->id));
+	if (MTK_HAS_CAPS(eth->soc->caps, MTK_SOC_MT7628)) {
+		mtk_w32(mac->hw, (macaddr[0] << 8) | macaddr[1],
+			MT7628_SDM_MAC_ADRH);
+		mtk_w32(mac->hw, (macaddr[2] << 24) | (macaddr[3] << 16) |
+			(macaddr[4] << 8) | macaddr[5],
+			MT7628_SDM_MAC_ADRL);
+	} else {
+		mtk_w32(mac->hw, (macaddr[0] << 8) | macaddr[1],
+			MTK_GDMA_MAC_ADRH(mac->id));
+		mtk_w32(mac->hw, (macaddr[2] << 24) | (macaddr[3] << 16) |
+			(macaddr[4] << 8) | macaddr[5],
+			MTK_GDMA_MAC_ADRL(mac->id));
+	}
 	spin_unlock_bh(&mac->hw->page_lock);
 
 	return 0;
@@ -626,19 +638,47 @@ static inline struct mtk_tx_buf *mtk_desc_to_tx_buf(struct mtk_tx_ring *ring,
 	return &ring->buf[idx];
 }
 
+static struct mtk_tx_dma *qdma_to_pdma(struct mtk_tx_ring *ring,
+				       struct mtk_tx_dma *dma)
+{
+	return ring->dma_pdma - ring->dma + dma;
+}
+
+static int txd_to_idx(struct mtk_tx_ring *ring, struct mtk_tx_dma *dma)
+{
+	return ((u32)dma - (u32)ring->dma) / sizeof(*dma);
+}
+
 static void mtk_tx_unmap(struct mtk_eth *eth, struct mtk_tx_buf *tx_buf)
 {
-	if (tx_buf->flags & MTK_TX_FLAGS_SINGLE0) {
-		dma_unmap_single(eth->dev,
-				 dma_unmap_addr(tx_buf, dma_addr0),
-				 dma_unmap_len(tx_buf, dma_len0),
-				 DMA_TO_DEVICE);
-	} else if (tx_buf->flags & MTK_TX_FLAGS_PAGE0) {
-		dma_unmap_page(eth->dev,
-			       dma_unmap_addr(tx_buf, dma_addr0),
-			       dma_unmap_len(tx_buf, dma_len0),
-			       DMA_TO_DEVICE);
+	if (MTK_HAS_CAPS(eth->soc->caps, MTK_QDMA)) {
+		if (tx_buf->flags & MTK_TX_FLAGS_SINGLE0) {
+			dma_unmap_single(eth->dev,
+					 dma_unmap_addr(tx_buf, dma_addr0),
+					 dma_unmap_len(tx_buf, dma_len0),
+					 DMA_TO_DEVICE);
+		} else if (tx_buf->flags & MTK_TX_FLAGS_PAGE0) {
+			dma_unmap_page(eth->dev,
+				       dma_unmap_addr(tx_buf, dma_addr0),
+				       dma_unmap_len(tx_buf, dma_len0),
+				       DMA_TO_DEVICE);
+		}
+	} else {
+		if (dma_unmap_len(tx_buf, dma_len0)) {
+			dma_unmap_page(eth->dev,
+				       dma_unmap_addr(tx_buf, dma_addr0),
+				       dma_unmap_len(tx_buf, dma_len0),
+				       DMA_TO_DEVICE);
+		}
+
+		if (dma_unmap_len(tx_buf, dma_len1)) {
+			dma_unmap_page(eth->dev,
+				       dma_unmap_addr(tx_buf, dma_addr1),
+				       dma_unmap_len(tx_buf, dma_len1),
+				       DMA_TO_DEVICE);
+		}
 	}
+
 	tx_buf->flags = 0;
 	if (tx_buf->skb &&
 	    (tx_buf->skb != (struct sk_buff *)MTK_DMA_DUMMY_DESC))
@@ -646,19 +686,45 @@ static void mtk_tx_unmap(struct mtk_eth *eth, struct mtk_tx_buf *tx_buf)
 	tx_buf->skb = NULL;
 }
 
+static void setup_tx_buf(struct mtk_eth *eth, struct mtk_tx_buf *tx_buf,
+			 struct mtk_tx_dma *txd, dma_addr_t mapped_addr,
+			 size_t size, int idx)
+{
+	if (MTK_HAS_CAPS(eth->soc->caps, MTK_QDMA)) {
+		dma_unmap_addr_set(tx_buf, dma_addr0, mapped_addr);
+		dma_unmap_len_set(tx_buf, dma_len0, size);
+	} else {
+		if (idx & 1) {
+			txd->txd3 = mapped_addr;
+			txd->txd2 |= TX_DMA_PLEN1(size);
+			dma_unmap_addr_set(tx_buf, dma_addr1, mapped_addr);
+			dma_unmap_len_set(tx_buf, dma_len1, size);
+		} else {
+			tx_buf->skb = (struct sk_buff *)MTK_DMA_DUMMY_DESC;
+			txd->txd1 = mapped_addr;
+			txd->txd2 = TX_DMA_PLEN0(size);
+			dma_unmap_addr_set(tx_buf, dma_addr0, mapped_addr);
+			dma_unmap_len_set(tx_buf, dma_len0, size);
+		}
+	}
+}
+
 static int mtk_tx_map(struct sk_buff *skb, struct net_device *dev,
 		      int tx_num, struct mtk_tx_ring *ring, bool gso)
 {
 	struct mtk_mac *mac = netdev_priv(dev);
 	struct mtk_eth *eth = mac->hw;
 	struct mtk_tx_dma *itxd, *txd;
+	struct mtk_tx_dma *itxd_pdma, *txd_pdma;
 	struct mtk_tx_buf *itx_buf, *tx_buf;
 	dma_addr_t mapped_addr;
 	unsigned int nr_frags;
 	int i, n_desc = 1;
 	u32 txd4 = 0, fport;
+	int k = 0;
 
 	itxd = ring->next_free;
+	itxd_pdma = qdma_to_pdma(ring, itxd);
 	if (itxd == ring->last_free)
 		return -ENOMEM;
 
@@ -689,12 +755,14 @@ static int mtk_tx_map(struct sk_buff *skb, struct net_device *dev,
 	itx_buf->flags |= MTK_TX_FLAGS_SINGLE0;
 	itx_buf->flags |= (!mac->id) ? MTK_TX_FLAGS_FPORT0 :
 			  MTK_TX_FLAGS_FPORT1;
-	dma_unmap_addr_set(itx_buf, dma_addr0, mapped_addr);
-	dma_unmap_len_set(itx_buf, dma_len0, skb_headlen(skb));
+	setup_tx_buf(eth, itx_buf, itxd_pdma, mapped_addr, skb_headlen(skb),
+		     k++);
 
 	/* TX SG offload */
 	txd = itxd;
+	txd_pdma = qdma_to_pdma(ring, txd);
 	nr_frags = skb_shinfo(skb)->nr_frags;
+
 	for (i = 0; i < nr_frags; i++) {
 		skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
 		unsigned int offset = 0;
@@ -703,12 +771,21 @@ static int mtk_tx_map(struct sk_buff *skb, struct net_device *dev,
 		while (frag_size) {
 			bool last_frag = false;
 			unsigned int frag_map_size;
+			bool new_desc = true;
+
+			if (MTK_HAS_CAPS(eth->soc->caps, MTK_QDMA) ||
+			    (i & 0x1)) {
+				txd = mtk_qdma_phys_to_virt(ring, txd->txd2);
+				txd_pdma = qdma_to_pdma(ring, txd);
+				if (txd == ring->last_free)
+					goto err_dma;
+
+				n_desc++;
+			} else {
+				new_desc = false;
+			}
 
-			txd = mtk_qdma_phys_to_virt(ring, txd->txd2);
-			if (txd == ring->last_free)
-				goto err_dma;
 
-			n_desc++;
 			frag_map_size = min(frag_size, MTK_TX_DMA_BUF_LEN);
 			mapped_addr = skb_frag_dma_map(eth->dev, frag, offset,
 						       frag_map_size,
@@ -727,14 +804,16 @@ static int mtk_tx_map(struct sk_buff *skb, struct net_device *dev,
 			WRITE_ONCE(txd->txd4, fport);
 
 			tx_buf = mtk_desc_to_tx_buf(ring, txd);
-			memset(tx_buf, 0, sizeof(*tx_buf));
+			if (new_desc)
+				memset(tx_buf, 0, sizeof(*tx_buf));
 			tx_buf->skb = (struct sk_buff *)MTK_DMA_DUMMY_DESC;
 			tx_buf->flags |= MTK_TX_FLAGS_PAGE0;
 			tx_buf->flags |= (!mac->id) ? MTK_TX_FLAGS_FPORT0 :
 					 MTK_TX_FLAGS_FPORT1;
 
-			dma_unmap_addr_set(tx_buf, dma_addr0, mapped_addr);
-			dma_unmap_len_set(tx_buf, dma_len0, frag_map_size);
+			setup_tx_buf(eth, tx_buf, txd_pdma, mapped_addr,
+				     frag_map_size, k++);
+
 			frag_size -= frag_map_size;
 			offset += frag_map_size;
 		}
@@ -746,6 +825,12 @@ static int mtk_tx_map(struct sk_buff *skb, struct net_device *dev,
 	WRITE_ONCE(itxd->txd4, txd4);
 	WRITE_ONCE(itxd->txd3, (TX_DMA_SWC | TX_DMA_PLEN0(skb_headlen(skb)) |
 				(!nr_frags * TX_DMA_LS0)));
+	if (!MTK_HAS_CAPS(eth->soc->caps, MTK_QDMA)) {
+		if (k & 0x1)
+			txd_pdma->txd2 |= TX_DMA_LS0;
+		else
+			txd_pdma->txd2 |= TX_DMA_LS1;
+	}
 
 	netdev_sent_queue(dev, skb->len);
 	skb_tx_timestamp(skb);
@@ -758,9 +843,15 @@ static int mtk_tx_map(struct sk_buff *skb, struct net_device *dev,
 	 */
 	wmb();
 
-	if (netif_xmit_stopped(netdev_get_tx_queue(dev, 0)) ||
-	    !netdev_xmit_more())
-		mtk_w32(eth, txd->txd2, MTK_QTX_CTX_PTR);
+	if (MTK_HAS_CAPS(eth->soc->caps, MTK_QDMA)) {
+		if (netif_xmit_stopped(netdev_get_tx_queue(dev, 0)) ||
+		    !netdev_xmit_more())
+			mtk_w32(eth, txd->txd2, MTK_QTX_CTX_PTR);
+	} else {
+		int next_idx = NEXT_DESP_IDX(txd_to_idx(ring, txd),
+					     ring->dma_size);
+		mtk_w32(eth, next_idx, MT7628_TX_CTX_IDX0);
+	}
 
 	return 0;
 
@@ -772,7 +863,11 @@ static int mtk_tx_map(struct sk_buff *skb, struct net_device *dev,
 		mtk_tx_unmap(eth, tx_buf);
 
 		itxd->txd3 = TX_DMA_LS0 | TX_DMA_OWNER_CPU;
+		if (!MTK_HAS_CAPS(eth->soc->caps, MTK_QDMA))
+			itxd_pdma->txd2 = TX_DMA_DESP2_DEF;
+
 		itxd = mtk_qdma_phys_to_virt(ring, itxd->txd2);
+		itxd_pdma = qdma_to_pdma(ring, itxd);
 	} while (itxd != txd);
 
 	return -ENOMEM;
@@ -946,7 +1041,7 @@ static int mtk_poll_rx(struct napi_struct *napi, int budget,
 		struct net_device *netdev;
 		unsigned int pktlen;
 		dma_addr_t dma_addr;
-		int mac = 0;
+		int mac;
 
 		ring = mtk_get_rx_ring(eth);
 		if (unlikely(!ring))
@@ -961,9 +1056,13 @@ static int mtk_poll_rx(struct napi_struct *napi, int budget,
 			break;
 
 		/* find out which mac the packet come from. values start at 1 */
-		mac = (trxd.rxd4 >> RX_DMA_FPORT_SHIFT) &
-		      RX_DMA_FPORT_MASK;
-		mac--;
+		if (MTK_HAS_CAPS(eth->soc->caps, MTK_SOC_MT7628)) {
+			mac = 0;
+		} else {
+			mac = (trxd.rxd4 >> RX_DMA_FPORT_SHIFT) &
+				RX_DMA_FPORT_MASK;
+			mac--;
+		}
 
 		if (unlikely(mac < 0 || mac >= MTK_MAC_COUNT ||
 			     !eth->netdev[mac]))
@@ -981,7 +1080,8 @@ static int mtk_poll_rx(struct napi_struct *napi, int budget,
 			goto release_desc;
 		}
 		dma_addr = dma_map_single(eth->dev,
-					  new_data + NET_SKB_PAD,
+					  new_data + NET_SKB_PAD +
+					  eth->ip_align,
 					  ring->buf_size,
 					  DMA_FROM_DEVICE);
 		if (unlikely(dma_mapping_error(eth->dev, dma_addr))) {
@@ -1004,7 +1104,7 @@ static int mtk_poll_rx(struct napi_struct *napi, int budget,
 		pktlen = RX_DMA_GET_PLEN0(trxd.rxd2);
 		skb->dev = netdev;
 		skb_put(skb, pktlen);
-		if (trxd.rxd4 & RX_DMA_L4_VALID)
+		if (trxd.rxd4 & eth->rx_dma_l4_valid)
 			skb->ip_summed = CHECKSUM_UNNECESSARY;
 		else
 			skb_checksum_none_assert(skb);
@@ -1021,7 +1121,10 @@ static int mtk_poll_rx(struct napi_struct *napi, int budget,
 		rxd->rxd1 = (unsigned int)dma_addr;
 
 release_desc:
-		rxd->rxd2 = RX_DMA_PLEN0(ring->buf_size);
+		if (MTK_HAS_CAPS(eth->soc->caps, MTK_SOC_MT7628))
+			rxd->rxd2 = RX_DMA_LSO;
+		else
+			rxd->rxd2 = RX_DMA_PLEN0(ring->buf_size);
 
 		ring->calc_idx = idx;
 
@@ -1040,19 +1143,14 @@ static int mtk_poll_rx(struct napi_struct *napi, int budget,
 	return done;
 }
 
-static int mtk_poll_tx(struct mtk_eth *eth, int budget)
+static int mtk_poll_tx_qdma(struct mtk_eth *eth, int budget,
+			    unsigned int *done, unsigned int *bytes)
 {
 	struct mtk_tx_ring *ring = &eth->tx_ring;
 	struct mtk_tx_dma *desc;
 	struct sk_buff *skb;
 	struct mtk_tx_buf *tx_buf;
-	unsigned int done[MTK_MAX_DEVS];
-	unsigned int bytes[MTK_MAX_DEVS];
 	u32 cpu, dma;
-	int total = 0, i;
-
-	memset(done, 0, sizeof(done));
-	memset(bytes, 0, sizeof(bytes));
 
 	cpu = mtk_r32(eth, MTK_QTX_CRX_PTR);
 	dma = mtk_r32(eth, MTK_QTX_DRX_PTR);
@@ -1090,6 +1188,62 @@ static int mtk_poll_tx(struct mtk_eth *eth, int budget)
 
 	mtk_w32(eth, cpu, MTK_QTX_CRX_PTR);
 
+	return budget;
+}
+
+static int mtk_poll_tx_pdma(struct mtk_eth *eth, int budget,
+			    unsigned int *done, unsigned int *bytes)
+{
+	struct mtk_tx_ring *ring = &eth->tx_ring;
+	struct mtk_tx_dma *desc;
+	struct sk_buff *skb;
+	struct mtk_tx_buf *tx_buf;
+	u32 cpu, dma;
+
+	cpu = ring->cpu_idx;
+	dma = mtk_r32(eth, MT7628_TX_DTX_IDX0);
+
+	while ((cpu != dma) && budget) {
+		tx_buf = &ring->buf[cpu];
+		skb = tx_buf->skb;
+		if (!skb)
+			break;
+
+		if (skb != (struct sk_buff *)MTK_DMA_DUMMY_DESC) {
+			bytes[0] += skb->len;
+			done[0]++;
+			budget--;
+		}
+
+		mtk_tx_unmap(eth, tx_buf);
+
+		desc = &ring->dma[cpu];
+		ring->last_free = desc;
+		atomic_inc(&ring->free_count);
+
+		cpu = NEXT_DESP_IDX(cpu, ring->dma_size);
+	}
+
+	ring->cpu_idx = cpu;
+
+	return budget;
+}
+
+static int mtk_poll_tx(struct mtk_eth *eth, int budget)
+{
+	struct mtk_tx_ring *ring = &eth->tx_ring;
+	unsigned int done[MTK_MAX_DEVS];
+	unsigned int bytes[MTK_MAX_DEVS];
+	int total = 0, i;
+
+	memset(done, 0, sizeof(done));
+	memset(bytes, 0, sizeof(bytes));
+
+	if (MTK_HAS_CAPS(eth->soc->caps, MTK_QDMA))
+		budget = mtk_poll_tx_qdma(eth, budget, done, bytes);
+	else
+		budget = mtk_poll_tx_pdma(eth, budget, done, bytes);
+
 	for (i = 0; i < MTK_MAC_COUNT; i++) {
 		if (!eth->netdev[i] || !done[i])
 			continue;
@@ -1121,13 +1275,14 @@ static int mtk_napi_tx(struct napi_struct *napi, int budget)
 	u32 status, mask;
 	int tx_done = 0;
 
-	mtk_handle_status_irq(eth);
-	mtk_w32(eth, MTK_TX_DONE_INT, MTK_QDMA_INT_STATUS);
+	if (MTK_HAS_CAPS(eth->soc->caps, MTK_QDMA))
+		mtk_handle_status_irq(eth);
+	mtk_w32(eth, MTK_TX_DONE_INT, eth->tx_int_status_reg);
 	tx_done = mtk_poll_tx(eth, budget);
 
 	if (unlikely(netif_msg_intr(eth))) {
-		status = mtk_r32(eth, MTK_QDMA_INT_STATUS);
-		mask = mtk_r32(eth, MTK_QDMA_INT_MASK);
+		status = mtk_r32(eth, eth->tx_int_status_reg);
+		mask = mtk_r32(eth, eth->tx_int_mask_reg);
 		dev_info(eth->dev,
 			 "done tx %d, intr 0x%08x/0x%x\n",
 			 tx_done, status, mask);
@@ -1136,7 +1291,7 @@ static int mtk_napi_tx(struct napi_struct *napi, int budget)
 	if (tx_done == budget)
 		return budget;
 
-	status = mtk_r32(eth, MTK_QDMA_INT_STATUS);
+	status = mtk_r32(eth, eth->tx_int_status_reg);
 	if (status & MTK_TX_DONE_INT)
 		return budget;
 
@@ -1203,6 +1358,24 @@ static int mtk_tx_alloc(struct mtk_eth *eth)
 		ring->dma[i].txd3 = TX_DMA_LS0 | TX_DMA_OWNER_CPU;
 	}
 
+	/* On MT7688 (PDMA only) this driver uses the ring->dma structs
+	 * only as the framework. The real HW descriptors are the PDMA
+	 * descriptors in ring->dma_pdma.
+	 */
+	if (!MTK_HAS_CAPS(eth->soc->caps, MTK_QDMA)) {
+		ring->dma_pdma = dma_alloc_coherent(eth->dev, MTK_DMA_SIZE * sz,
+						    &ring->phys_pdma,
+						    GFP_ATOMIC);
+		if (!ring->dma_pdma)
+			goto no_tx_mem;
+
+		for (i = 0; i < MTK_DMA_SIZE; i++) {
+			ring->dma_pdma[i].txd2 = TX_DMA_DESP2_DEF;
+			ring->dma_pdma[i].txd4 = 0;
+		}
+	}
+
+	ring->dma_size = MTK_DMA_SIZE;
 	atomic_set(&ring->free_count, MTK_DMA_SIZE - 2);
 	ring->next_free = &ring->dma[0];
 	ring->last_free = &ring->dma[MTK_DMA_SIZE - 1];
@@ -1213,15 +1386,23 @@ static int mtk_tx_alloc(struct mtk_eth *eth)
 	 */
 	wmb();
 
-	mtk_w32(eth, ring->phys, MTK_QTX_CTX_PTR);
-	mtk_w32(eth, ring->phys, MTK_QTX_DTX_PTR);
-	mtk_w32(eth,
-		ring->phys + ((MTK_DMA_SIZE - 1) * sz),
-		MTK_QTX_CRX_PTR);
-	mtk_w32(eth,
-		ring->phys + ((MTK_DMA_SIZE - 1) * sz),
-		MTK_QTX_DRX_PTR);
-	mtk_w32(eth, (QDMA_RES_THRES << 8) | QDMA_RES_THRES, MTK_QTX_CFG(0));
+	if (MTK_HAS_CAPS(eth->soc->caps, MTK_QDMA)) {
+		mtk_w32(eth, ring->phys, MTK_QTX_CTX_PTR);
+		mtk_w32(eth, ring->phys, MTK_QTX_DTX_PTR);
+		mtk_w32(eth,
+			ring->phys + ((MTK_DMA_SIZE - 1) * sz),
+			MTK_QTX_CRX_PTR);
+		mtk_w32(eth,
+			ring->phys + ((MTK_DMA_SIZE - 1) * sz),
+			MTK_QTX_DRX_PTR);
+		mtk_w32(eth, (QDMA_RES_THRES << 8) | QDMA_RES_THRES,
+			MTK_QTX_CFG(0));
+	} else {
+		mtk_w32(eth, ring->phys_pdma, MT7628_TX_BASE_PTR0);
+		mtk_w32(eth, MTK_DMA_SIZE, MT7628_TX_MAX_CNT0);
+		mtk_w32(eth, 0, MT7628_TX_CTX_IDX0);
+		mtk_w32(eth, MT7628_PST_DTX_IDX0, MTK_PDMA_RST_IDX);
+	}
 
 	return 0;
 
@@ -1248,6 +1429,14 @@ static void mtk_tx_clean(struct mtk_eth *eth)
 				  ring->phys);
 		ring->dma = NULL;
 	}
+
+	if (ring->dma_pdma) {
+		dma_free_coherent(eth->dev,
+				  MTK_DMA_SIZE * sizeof(*ring->dma_pdma),
+				  ring->dma_pdma,
+				  ring->phys_pdma);
+		ring->dma_pdma = NULL;
+	}
 }
 
 static int mtk_rx_alloc(struct mtk_eth *eth, int ring_no, int rx_flag)
@@ -1295,14 +1484,17 @@ static int mtk_rx_alloc(struct mtk_eth *eth, int ring_no, int rx_flag)
 
 	for (i = 0; i < rx_dma_size; i++) {
 		dma_addr_t dma_addr = dma_map_single(eth->dev,
-				ring->data[i] + NET_SKB_PAD,
+				ring->data[i] + NET_SKB_PAD + eth->ip_align,
 				ring->buf_size,
 				DMA_FROM_DEVICE);
 		if (unlikely(dma_mapping_error(eth->dev, dma_addr)))
 			return -ENOMEM;
 		ring->dma[i].rxd1 = (unsigned int)dma_addr;
 
-		ring->dma[i].rxd2 = RX_DMA_PLEN0(ring->buf_size);
+		if (MTK_HAS_CAPS(eth->soc->caps, MTK_SOC_MT7628))
+			ring->dma[i].rxd2 = RX_DMA_LSO;
+		else
+			ring->dma[i].rxd2 = RX_DMA_PLEN0(ring->buf_size);
 	}
 	ring->dma_size = rx_dma_size;
 	ring->calc_idx_update = false;
@@ -1618,9 +1810,16 @@ static int mtk_dma_busy_wait(struct mtk_eth *eth)
 	unsigned long t_start = jiffies;
 
 	while (1) {
-		if (!(mtk_r32(eth, MTK_QDMA_GLO_CFG) &
-		      (MTK_RX_DMA_BUSY | MTK_TX_DMA_BUSY)))
-			return 0;
+		if (MTK_HAS_CAPS(eth->soc->caps, MTK_QDMA)) {
+			if (!(mtk_r32(eth, MTK_QDMA_GLO_CFG) &
+			      (MTK_RX_DMA_BUSY | MTK_TX_DMA_BUSY)))
+				return 0;
+		} else {
+			if (!(mtk_r32(eth, MTK_PDMA_GLO_CFG) &
+			      (MTK_RX_DMA_BUSY | MTK_TX_DMA_BUSY)))
+				return 0;
+		}
+
 		if (time_after(jiffies, t_start + MTK_DMA_BUSY_TIMEOUT))
 			break;
 	}
@@ -1637,20 +1836,24 @@ static int mtk_dma_init(struct mtk_eth *eth)
 	if (mtk_dma_busy_wait(eth))
 		return -EBUSY;
 
-	/* QDMA needs scratch memory for internal reordering of the
-	 * descriptors
-	 */
-	err = mtk_init_fq_dma(eth);
-	if (err)
-		return err;
+	if (MTK_HAS_CAPS(eth->soc->caps, MTK_QDMA)) {
+		/* QDMA needs scratch memory for internal reordering of the
+		 * descriptors
+		 */
+		err = mtk_init_fq_dma(eth);
+		if (err)
+			return err;
+	}
 
 	err = mtk_tx_alloc(eth);
 	if (err)
 		return err;
 
-	err = mtk_rx_alloc(eth, 0, MTK_RX_FLAGS_QDMA);
-	if (err)
-		return err;
+	if (MTK_HAS_CAPS(eth->soc->caps, MTK_QDMA)) {
+		err = mtk_rx_alloc(eth, 0, MTK_RX_FLAGS_QDMA);
+		if (err)
+			return err;
+	}
 
 	err = mtk_rx_alloc(eth, 0, MTK_RX_FLAGS_NORMAL);
 	if (err)
@@ -1667,10 +1870,14 @@ static int mtk_dma_init(struct mtk_eth *eth)
 			return err;
 	}
 
-	/* Enable random early drop and set drop threshold automatically */
-	mtk_w32(eth, FC_THRES_DROP_MODE | FC_THRES_DROP_EN | FC_THRES_MIN,
-		MTK_QDMA_FC_THRES);
-	mtk_w32(eth, 0x0, MTK_QDMA_HRED2);
+	if (MTK_HAS_CAPS(eth->soc->caps, MTK_QDMA)) {
+		/* Enable random early drop and set drop threshold
+		 * automatically
+		 */
+		mtk_w32(eth, FC_THRES_DROP_MODE | FC_THRES_DROP_EN |
+			FC_THRES_MIN, MTK_QDMA_FC_THRES);
+		mtk_w32(eth, 0x0, MTK_QDMA_HRED2);
+	}
 
 	return 0;
 }
@@ -1741,13 +1948,15 @@ static irqreturn_t mtk_handle_irq_tx(int irq, void *_eth)
 static irqreturn_t mtk_handle_irq(int irq, void *_eth)
 {
 	struct mtk_eth *eth = _eth;
+	u32 status;
 
+	status = mtk_r32(eth, MTK_PDMA_INT_STATUS);
 	if (mtk_r32(eth, MTK_PDMA_INT_MASK) & MTK_RX_DONE_INT) {
 		if (mtk_r32(eth, MTK_PDMA_INT_STATUS) & MTK_RX_DONE_INT)
 			mtk_handle_irq_rx(irq, _eth);
 	}
-	if (mtk_r32(eth, MTK_QDMA_INT_MASK) & MTK_TX_DONE_INT) {
-		if (mtk_r32(eth, MTK_QDMA_INT_STATUS) & MTK_TX_DONE_INT)
+	if (mtk_r32(eth, eth->tx_int_mask_reg) & MTK_TX_DONE_INT) {
+		if (mtk_r32(eth, eth->tx_int_status_reg) & MTK_TX_DONE_INT)
 			mtk_handle_irq_tx(irq, _eth);
 	}
 
@@ -1779,17 +1988,23 @@ static int mtk_start_dma(struct mtk_eth *eth)
 		return err;
 	}
 
-	mtk_w32(eth,
-		MTK_TX_WB_DDONE | MTK_TX_DMA_EN |
-		MTK_DMA_SIZE_16DWORDS | MTK_NDP_CO_PRO |
-		MTK_RX_DMA_EN | MTK_RX_2B_OFFSET |
-		MTK_RX_BT_32DWORDS,
-		MTK_QDMA_GLO_CFG);
+	if (MTK_HAS_CAPS(eth->soc->caps, MTK_QDMA)) {
+		mtk_w32(eth,
+			MTK_TX_WB_DDONE | MTK_TX_DMA_EN |
+			MTK_DMA_SIZE_16DWORDS | MTK_NDP_CO_PRO |
+			MTK_RX_DMA_EN | MTK_RX_2B_OFFSET |
+			MTK_RX_BT_32DWORDS,
+			MTK_QDMA_GLO_CFG);
 
-	mtk_w32(eth,
-		MTK_RX_DMA_EN | rx_2b_offset |
-		MTK_RX_BT_32DWORDS | MTK_MULTI_EN,
-		MTK_PDMA_GLO_CFG);
+		mtk_w32(eth,
+			MTK_RX_DMA_EN | rx_2b_offset |
+			MTK_RX_BT_32DWORDS | MTK_MULTI_EN,
+			MTK_PDMA_GLO_CFG);
+	} else {
+		mtk_w32(eth, MTK_TX_WB_DDONE | MTK_TX_DMA_EN | MTK_RX_DMA_EN |
+			MTK_MULTI_EN | MTK_PDMA_SIZE_8DWORDS,
+			MTK_PDMA_GLO_CFG);
+	}
 
 	return 0;
 }
@@ -1817,7 +2032,6 @@ static int mtk_open(struct net_device *dev)
 
 	phy_start(dev->phydev);
 	netif_start_queue(dev);
-
 	return 0;
 }
 
@@ -1861,7 +2075,8 @@ static int mtk_stop(struct net_device *dev)
 	napi_disable(&eth->tx_napi);
 	napi_disable(&eth->rx_napi);
 
-	mtk_stop_dma(eth, MTK_QDMA_GLO_CFG);
+	if (MTK_HAS_CAPS(eth->soc->caps, MTK_QDMA))
+		mtk_stop_dma(eth, MTK_QDMA_GLO_CFG);
 	mtk_stop_dma(eth, MTK_PDMA_GLO_CFG);
 
 	mtk_dma_free(eth);
@@ -1923,6 +2138,24 @@ static int mtk_hw_init(struct mtk_eth *eth)
 	if (ret)
 		goto err_disable_pm;
 
+	if (MTK_HAS_CAPS(eth->soc->caps, MTK_SOC_MT7628)) {
+		ret = device_reset(eth->dev);
+		if (ret) {
+			dev_err(eth->dev, "MAC reset failed!\n");
+			goto err_disable_pm;
+		}
+
+		/* enable interrupt delay for RX */
+		mtk_w32(eth, MTK_PDMA_DELAY_RX_DELAY, MTK_PDMA_DELAY_INT);
+
+		/* disable delay and normal interrupt */
+		mtk_tx_irq_disable(eth, ~0);
+		mtk_rx_irq_disable(eth, ~0);
+
+		return 0;
+	}
+
+	/* Non-MT7628 handling... */
 	ethsys_reset(eth, RSTCTRL_FE);
 	ethsys_reset(eth, RSTCTRL_PPE);
 
@@ -2426,13 +2659,13 @@ static int mtk_add_mac(struct mtk_eth *eth, struct device_node *np)
 	eth->netdev[id]->netdev_ops = &mtk_netdev_ops;
 	eth->netdev[id]->base_addr = (unsigned long)eth->base;
 
-	eth->netdev[id]->hw_features = MTK_HW_FEATURES;
+	eth->netdev[id]->hw_features = eth->soc->hw_features;
 	if (eth->hwlro)
 		eth->netdev[id]->hw_features |= NETIF_F_LRO;
 
-	eth->netdev[id]->vlan_features = MTK_HW_FEATURES &
+	eth->netdev[id]->vlan_features = eth->soc->hw_features &
 		~(NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_CTAG_RX);
-	eth->netdev[id]->features |= MTK_HW_FEATURES;
+	eth->netdev[id]->features |= eth->soc->hw_features;
 	eth->netdev[id]->ethtool_ops = &mtk_ethtool_ops;
 
 	eth->netdev[id]->irq = eth->irq[0];
@@ -2463,15 +2696,32 @@ static int mtk_probe(struct platform_device *pdev)
 	if (IS_ERR(eth->base))
 		return PTR_ERR(eth->base);
 
+	if (MTK_HAS_CAPS(eth->soc->caps, MTK_QDMA)) {
+		eth->tx_int_mask_reg = MTK_QDMA_INT_MASK;
+		eth->tx_int_status_reg = MTK_QDMA_INT_STATUS;
+	} else {
+		eth->tx_int_mask_reg = MTK_PDMA_INT_MASK;
+		eth->tx_int_status_reg = MTK_PDMA_INT_STATUS;
+	}
+
+	if (MTK_HAS_CAPS(eth->soc->caps, MTK_SOC_MT7628)) {
+		eth->rx_dma_l4_valid = RX_DMA_L4_VALID_PDMA;
+		eth->ip_align = NET_IP_ALIGN;
+	} else {
+		eth->rx_dma_l4_valid = RX_DMA_L4_VALID;
+	}
+
 	spin_lock_init(&eth->page_lock);
 	spin_lock_init(&eth->tx_irq_lock);
 	spin_lock_init(&eth->rx_irq_lock);
 
-	eth->ethsys = syscon_regmap_lookup_by_phandle(pdev->dev.of_node,
-						      "mediatek,ethsys");
-	if (IS_ERR(eth->ethsys)) {
-		dev_err(&pdev->dev, "no ethsys regmap found\n");
-		return PTR_ERR(eth->ethsys);
+	if (!MTK_HAS_CAPS(eth->soc->caps, MTK_SOC_MT7628)) {
+		eth->ethsys = syscon_regmap_lookup_by_phandle(pdev->dev.of_node,
+							      "mediatek,ethsys");
+		if (IS_ERR(eth->ethsys)) {
+			dev_err(&pdev->dev, "no ethsys regmap found\n");
+			return PTR_ERR(eth->ethsys);
+		}
 	}
 
 	if (MTK_HAS_CAPS(eth->soc->caps, MTK_INFRA)) {
@@ -2572,9 +2822,12 @@ static int mtk_probe(struct platform_device *pdev)
 	if (err)
 		goto err_free_dev;
 
-	err = mtk_mdio_init(eth);
-	if (err)
-		goto err_free_dev;
+	/* No MT7628/88 support yet */
+	if (!MTK_HAS_CAPS(eth->soc->caps, MTK_SOC_MT7628)) {
+		err = mtk_mdio_init(eth);
+		if (err)
+			goto err_free_dev;
+	}
 
 	for (i = 0; i < MTK_MAX_DEVS; i++) {
 		if (!eth->netdev[i])
@@ -2637,12 +2890,14 @@ static int mtk_remove(struct platform_device *pdev)
 
 static const struct mtk_soc_data mt2701_data = {
 	.caps = MT7623_CAPS | MTK_HWLRO,
+	.hw_features = MTK_HW_FEATURES,
 	.required_clks = MT7623_CLKS_BITMAP,
 	.required_pctl = true,
 };
 
 static const struct mtk_soc_data mt7621_data = {
 	.caps = MT7621_CAPS,
+	.hw_features = MTK_HW_FEATURES,
 	.required_clks = MT7621_CLKS_BITMAP,
 	.required_pctl = false,
 };
@@ -2650,12 +2905,14 @@ static const struct mtk_soc_data mt7621_data = {
 static const struct mtk_soc_data mt7622_data = {
 	.ana_rgc3 = 0x2028,
 	.caps = MT7622_CAPS | MTK_HWLRO,
+	.hw_features = MTK_HW_FEATURES,
 	.required_clks = MT7622_CLKS_BITMAP,
 	.required_pctl = false,
 };
 
 static const struct mtk_soc_data mt7623_data = {
 	.caps = MT7623_CAPS | MTK_HWLRO,
+	.hw_features = MTK_HW_FEATURES,
 	.required_clks = MT7623_CLKS_BITMAP,
 	.required_pctl = true,
 };
@@ -2663,16 +2920,25 @@ static const struct mtk_soc_data mt7623_data = {
 static const struct mtk_soc_data mt7629_data = {
 	.ana_rgc3 = 0x128,
 	.caps = MT7629_CAPS | MTK_HWLRO,
+	.hw_features = MTK_HW_FEATURES,
 	.required_clks = MT7629_CLKS_BITMAP,
 	.required_pctl = false,
 };
 
+static const struct mtk_soc_data rt5350_data = {
+	.caps = MT7628_CAPS,
+	.hw_features = MTK_HW_FEATURES_MT7628,
+	.required_clks = MT7628_CLKS_BITMAP,
+	.required_pctl = false,
+};
+
 const struct of_device_id of_mtk_match[] = {
 	{ .compatible = "mediatek,mt2701-eth", .data = &mt2701_data},
 	{ .compatible = "mediatek,mt7621-eth", .data = &mt7621_data},
 	{ .compatible = "mediatek,mt7622-eth", .data = &mt7622_data},
 	{ .compatible = "mediatek,mt7623-eth", .data = &mt7623_data},
 	{ .compatible = "mediatek,mt7629-eth", .data = &mt7629_data},
+	{ .compatible = "ralink,rt5350-eth", .data = &rt5350_data},
 	{},
 };
 MODULE_DEVICE_TABLE(of, of_mtk_match);
diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.h b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
index 556644f28eae..cc1466ae0926 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.h
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
@@ -39,6 +39,7 @@
 				 NETIF_F_SG | NETIF_F_TSO | \
 				 NETIF_F_TSO6 | \
 				 NETIF_F_IPV6_CSUM)
+#define MTK_HW_FEATURES_MT7628	(NETIF_F_SG | NETIF_F_RXCSUM)
 #define NEXT_DESP_IDX(X, Y)	(((X) + 1) & ((Y) - 1))
 
 #define MTK_MAX_RX_RING_NUM	4
@@ -118,6 +119,7 @@
 /* PDMA Global Configuration Register */
 #define MTK_PDMA_GLO_CFG	0xa04
 #define MTK_MULTI_EN		BIT(10)
+#define MTK_PDMA_SIZE_8DWORDS	(1 << 4)
 
 /* PDMA Reset Index Register */
 #define MTK_PDMA_RST_IDX	0xa08
@@ -276,11 +278,18 @@
 #define TX_DMA_OWNER_CPU	BIT(31)
 #define TX_DMA_LS0		BIT(30)
 #define TX_DMA_PLEN0(_x)	(((_x) & MTK_TX_DMA_BUF_LEN) << 16)
+#define TX_DMA_PLEN1(_x)	((_x) & MTK_TX_DMA_BUF_LEN)
 #define TX_DMA_SWC		BIT(14)
 #define TX_DMA_SDL(_x)		(((_x) & 0x3fff) << 16)
 
+/* PDMA on MT7628 */
+#define TX_DMA_DONE		BIT(31)
+#define TX_DMA_LS1		BIT(14)
+#define TX_DMA_DESP2_DEF	(TX_DMA_LS0 | TX_DMA_DONE)
+
 /* QDMA descriptor rxd2 */
 #define RX_DMA_DONE		BIT(31)
+#define RX_DMA_LSO		BIT(30)
 #define RX_DMA_PLEN0(_x)	(((_x) & 0x3fff) << 16)
 #define RX_DMA_GET_PLEN0(_x)	(((_x) >> 16) & 0x3fff)
 
@@ -289,6 +298,7 @@
 
 /* QDMA descriptor rxd4 */
 #define RX_DMA_L4_VALID		BIT(24)
+#define RX_DMA_L4_VALID_PDMA	BIT(30)		/* when PDMA is used */
 #define RX_DMA_FPORT_SHIFT	19
 #define RX_DMA_FPORT_MASK	0x7
 
@@ -412,6 +422,19 @@
 #define CO_QPHY_SEL            BIT(0)
 #define GEPHY_MAC_SEL          BIT(1)
 
+/* MT7628/88 specific stuff */
+#define MT7628_PDMA_OFFSET	0x0800
+#define MT7628_SDM_OFFSET	0x0c00
+
+#define MT7628_TX_BASE_PTR0	(MT7628_PDMA_OFFSET + 0x00)
+#define MT7628_TX_MAX_CNT0	(MT7628_PDMA_OFFSET + 0x04)
+#define MT7628_TX_CTX_IDX0	(MT7628_PDMA_OFFSET + 0x08)
+#define MT7628_TX_DTX_IDX0	(MT7628_PDMA_OFFSET + 0x0c)
+#define MT7628_PST_DTX_IDX0	BIT(0)
+
+#define MT7628_SDM_MAC_ADRL	(MT7628_SDM_OFFSET + 0x0c)
+#define MT7628_SDM_MAC_ADRH	(MT7628_SDM_OFFSET + 0x10)
+
 struct mtk_rx_dma {
 	unsigned int rxd1;
 	unsigned int rxd2;
@@ -509,6 +532,7 @@ enum mtk_clks_map {
 				 BIT(MTK_CLK_SGMII_CK) | \
 				 BIT(MTK_CLK_ETH2PLL))
 #define MT7621_CLKS_BITMAP	(0)
+#define MT7628_CLKS_BITMAP	(0)
 #define MT7629_CLKS_BITMAP	(BIT(MTK_CLK_ETHIF) | BIT(MTK_CLK_ESW) |  \
 				 BIT(MTK_CLK_GP0) | BIT(MTK_CLK_GP1) | \
 				 BIT(MTK_CLK_GP2) | BIT(MTK_CLK_FE) | \
@@ -563,6 +587,10 @@ struct mtk_tx_ring {
 	struct mtk_tx_dma *last_free;
 	u16 thresh;
 	atomic_t free_count;
+	int dma_size;
+	struct mtk_tx_dma *dma_pdma;	/* For MT7628/88 PDMA handling */
+	dma_addr_t phys_pdma;
+	int cpu_idx;
 };
 
 /* PDMA rx ring mode */
@@ -604,6 +632,8 @@ enum mkt_eth_capabilities {
 	MTK_HWLRO_BIT,
 	MTK_SHARED_INT_BIT,
 	MTK_TRGMII_MT7621_CLK_BIT,
+	MTK_QDMA_BIT,
+	MTK_SOC_MT7628_BIT,
 
 	/* MUX BITS*/
 	MTK_ETH_MUX_GDM1_TO_GMAC1_ESW_BIT,
@@ -634,6 +664,8 @@ enum mkt_eth_capabilities {
 #define MTK_HWLRO		BIT(MTK_HWLRO_BIT)
 #define MTK_SHARED_INT		BIT(MTK_SHARED_INT_BIT)
 #define MTK_TRGMII_MT7621_CLK	BIT(MTK_TRGMII_MT7621_CLK_BIT)
+#define MTK_QDMA		BIT(MTK_QDMA_BIT)
+#define MTK_SOC_MT7628		BIT(MTK_SOC_MT7628_BIT)
 
 #define MTK_ETH_MUX_GDM1_TO_GMAC1_ESW		\
 	BIT(MTK_ETH_MUX_GDM1_TO_GMAC1_ESW_BIT)
@@ -687,26 +719,31 @@ enum mkt_eth_capabilities {
 #define MTK_HAS_CAPS(caps, _x)		(((caps) & (_x)) == (_x))
 
 #define MT7621_CAPS  (MTK_GMAC1_RGMII | MTK_GMAC1_TRGMII | \
-		      MTK_GMAC2_RGMII | MTK_SHARED_INT | MTK_TRGMII_MT7621_CLK)
+		      MTK_GMAC2_RGMII | MTK_SHARED_INT | \
+		      MTK_TRGMII_MT7621_CLK | MTK_QDMA)
 
 #define MT7622_CAPS  (MTK_GMAC1_RGMII | MTK_GMAC1_SGMII | MTK_GMAC2_RGMII | \
 		      MTK_GMAC2_SGMII | MTK_GDM1_ESW | \
 		      MTK_MUX_GDM1_TO_GMAC1_ESW | \
-		      MTK_MUX_GMAC1_GMAC2_TO_SGMII_RGMII)
+		      MTK_MUX_GMAC1_GMAC2_TO_SGMII_RGMII | MTK_QDMA)
+
+#define MT7623_CAPS  (MTK_GMAC1_RGMII | MTK_GMAC1_TRGMII | MTK_GMAC2_RGMII | \
+		      MTK_QDMA)
 
-#define MT7623_CAPS  (MTK_GMAC1_RGMII | MTK_GMAC1_TRGMII | MTK_GMAC2_RGMII)
+#define MT7628_CAPS  (MTK_SHARED_INT | MTK_SOC_MT7628)
 
 #define MT7629_CAPS  (MTK_GMAC1_SGMII | MTK_GMAC2_SGMII | MTK_GMAC2_GEPHY | \
 		      MTK_GDM1_ESW | MTK_MUX_GDM1_TO_GMAC1_ESW | \
 		      MTK_MUX_GMAC2_GMAC0_TO_GEPHY | \
 		      MTK_MUX_U3_GMAC2_TO_QPHY | \
-		      MTK_MUX_GMAC12_TO_GEPHY_SGMII)
+		      MTK_MUX_GMAC12_TO_GEPHY_SGMII | MTK_QDMA)
 
 /* struct mtk_eth_data -	This is the structure holding all differences
  *				among various plaforms
  * @ana_rgc3:                   The offset for register ANA_RGC3 related to
  *				sgmiisys syscon
  * @caps			Flags shown the extra capability for the SoC
+ * @hw_features			Flags shown HW features
  * @required_clks		Flags shown the bitmap for required clocks on
  *				the target SoC
  * @required_pctl		A bool value to show whether the SoC requires
@@ -717,6 +754,7 @@ struct mtk_soc_data {
 	u32		caps;
 	u32		required_clks;
 	bool		required_pctl;
+	netdev_features_t hw_features;
 };
 
 /* currently no SoC has more than 2 macs */
@@ -810,6 +848,11 @@ struct mtk_eth {
 	unsigned long			state;
 
 	const struct mtk_soc_data	*soc;
+
+	u32				tx_int_mask_reg;
+	u32				tx_int_status_reg;
+	u32				rx_dma_l4_valid;
+	int				ip_align;
 };
 
 /* struct mtk_mac -	the structure that holds the info about the MACs of the
-- 
2.22.1


^ permalink raw reply related

* [PATCH net-next 2/4 v2] net: ethernet: mediatek: Rename MTK_QMTK_INT_STATUS to MTK_QDMA_INT_STATUS
From: Stefan Roese @ 2019-08-14 11:18 UTC (permalink / raw)
  To: netdev; +Cc: René van Dorst, Daniel Golle, Sean Wang, John Crispin
In-Reply-To: <20190814111825.10855-1-sr@denx.de>

Currently all QDMA registers are named "MTK_QDMA_foo" in this driver
with one exception: MTK_QMTK_INT_STATUS. This patch renames
MTK_QMTK_INT_STATUS to MTK_QDMA_INT_STATUS so that all macros follow
this rule.

Signed-off-by: Stefan Roese <sr@denx.de>
Cc: René van Dorst <opensource@vdorst.com>
Cc: Daniel Golle <daniel@makrotopia.org>
Cc: Sean Wang <sean.wang@mediatek.com>
Cc: John Crispin <john@phrozen.org>
---
v2:
- New patch

 drivers/net/ethernet/mediatek/mtk_eth_soc.c | 8 ++++----
 drivers/net/ethernet/mediatek/mtk_eth_soc.h | 2 +-
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index ddbffeb5701b..bee2cdca66e7 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -1122,11 +1122,11 @@ static int mtk_napi_tx(struct napi_struct *napi, int budget)
 	int tx_done = 0;
 
 	mtk_handle_status_irq(eth);
-	mtk_w32(eth, MTK_TX_DONE_INT, MTK_QMTK_INT_STATUS);
+	mtk_w32(eth, MTK_TX_DONE_INT, MTK_QDMA_INT_STATUS);
 	tx_done = mtk_poll_tx(eth, budget);
 
 	if (unlikely(netif_msg_intr(eth))) {
-		status = mtk_r32(eth, MTK_QMTK_INT_STATUS);
+		status = mtk_r32(eth, MTK_QDMA_INT_STATUS);
 		mask = mtk_r32(eth, MTK_QDMA_INT_MASK);
 		dev_info(eth->dev,
 			 "done tx %d, intr 0x%08x/0x%x\n",
@@ -1136,7 +1136,7 @@ static int mtk_napi_tx(struct napi_struct *napi, int budget)
 	if (tx_done == budget)
 		return budget;
 
-	status = mtk_r32(eth, MTK_QMTK_INT_STATUS);
+	status = mtk_r32(eth, MTK_QDMA_INT_STATUS);
 	if (status & MTK_TX_DONE_INT)
 		return budget;
 
@@ -1747,7 +1747,7 @@ static irqreturn_t mtk_handle_irq(int irq, void *_eth)
 			mtk_handle_irq_rx(irq, _eth);
 	}
 	if (mtk_r32(eth, MTK_QDMA_INT_MASK) & MTK_TX_DONE_INT) {
-		if (mtk_r32(eth, MTK_QMTK_INT_STATUS) & MTK_TX_DONE_INT)
+		if (mtk_r32(eth, MTK_QDMA_INT_STATUS) & MTK_TX_DONE_INT)
 			mtk_handle_irq_tx(irq, _eth);
 	}
 
diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.h b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
index bab94f763e2c..088e2bc621f7 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.h
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
@@ -212,7 +212,7 @@
 #define FC_THRES_MIN		0x4444
 
 /* QDMA Interrupt Status Register */
-#define MTK_QMTK_INT_STATUS	0x1A18
+#define MTK_QDMA_INT_STATUS	0x1A18
 #define MTK_RX_DONE_DLY		BIT(30)
 #define MTK_RX_DONE_INT3	BIT(19)
 #define MTK_RX_DONE_INT2	BIT(18)
-- 
2.22.1


^ permalink raw reply related

* [PATCH net-next 1/4 v2] dt-bindings: net: mediatek: Add support for MediaTek MT7628/88 SoC
From: Stefan Roese @ 2019-08-14 11:18 UTC (permalink / raw)
  To: netdev
  Cc: René van Dorst, Daniel Golle, Sean Wang, John Crispin,
	devicetree, Rob Herring

Add compatible for the ethernet IP core on MT7628/88 SoCs. Its
compatible with the older Ralink Rt5350F SoC. And OpenWrt already
uses this compatible string for the MT76x8.

Signed-off-by: Stefan Roese <sr@denx.de>
Cc: René van Dorst <opensource@vdorst.com>
Cc: Daniel Golle <daniel@makrotopia.org>
Cc: Sean Wang <sean.wang@mediatek.com>
Cc: John Crispin <john@phrozen.org>
Cc: devicetree@vger.kernel.org
Cc: Rob Herring <robh@kernel.org>
---
v2:
- New patch - bindings description moved to separate patch

Documentation/devicetree/bindings/net/mediatek-net.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/devicetree/bindings/net/mediatek-net.txt b/Documentation/devicetree/bindings/net/mediatek-net.txt
index 770ff98d4524..72d03e07cf7c 100644
--- a/Documentation/devicetree/bindings/net/mediatek-net.txt
+++ b/Documentation/devicetree/bindings/net/mediatek-net.txt
@@ -12,6 +12,7 @@ Required properties:
 		"mediatek,mt7623-eth", "mediatek,mt2701-eth": for MT7623 SoC
 		"mediatek,mt7622-eth": for MT7622 SoC
 		"mediatek,mt7629-eth": for MT7629 SoC
+		"ralink,rt5350-eth": for Ralink Rt5350F and MT7628/88 SoC
 - reg: Address and length of the register set for the device
 - interrupts: Should contain the three frame engines interrupts in numeric
 	order. These are fe_int0, fe_int1 and fe_int2.
-- 
2.22.1


^ permalink raw reply related

* Re: [PATCH] net: ethernet: mediatek: Add MT7628/88 SoC support
From: Stefan Roese @ 2019-08-14 10:48 UTC (permalink / raw)
  To: René van Dorst
  Cc: netdev, linux-mediatek, Sean Wang, John Crispin, Daniel Golle
In-Reply-To: <20190814092621.Horde.epvj8zK96-aCiV70YB5Q7II@www.vdorst.com>

Hi Rene,

On 14.08.19 11:26, René van Dorst wrote:
> Hi Stefan,
> 
> Quoting Stefan Roese <sr@denx.de>:
> 
>> Hi Rene,
>>
>> On 17.07.19 14:53, René van Dorst wrote:
>>
>> <snip>
>>
>>>> +++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
>>>> @@ -39,7 +39,8 @@
>>>>   				 NETIF_F_SG | NETIF_F_TSO | \
>>>>   				 NETIF_F_TSO6 | \
>>>>   				 NETIF_F_IPV6_CSUM)
>>>> -#define NEXT_RX_DESP_IDX(X, Y)	(((X) + 1) & ((Y) - 1))
>>>> +#define MTK_HW_FEATURES_MT7628	(NETIF_F_SG | NETIF_F_RXCSUM)
>>>> +#define NEXT_DESP_IDX(X, Y)	(((X) + 1) & ((Y) - 1))
>>>>
>>>>   #define MTK_MAX_RX_RING_NUM	4
>>>>   #define MTK_HW_LRO_DMA_SIZE	8
>>>> @@ -118,6 +119,7 @@
>>>>   /* PDMA Global Configuration Register */
>>>>   #define MTK_PDMA_GLO_CFG	0xa04
>>>>   #define MTK_MULTI_EN		BIT(10)
>>>> +#define MTK_PDMA_SIZE_8DWORDS	(1 << 4)
>>>>
>>>>   /* PDMA Reset Index Register */
>>>>   #define MTK_PDMA_RST_IDX	0xa08
>>>> @@ -276,11 +278,18 @@
>>>>   #define TX_DMA_OWNER_CPU	BIT(31)
>>>>   #define TX_DMA_LS0		BIT(30)
>>>>   #define TX_DMA_PLEN0(_x)	(((_x) & MTK_TX_DMA_BUF_LEN) << 16)
>>>> +#define TX_DMA_PLEN1(_x)	((_x) & MTK_TX_DMA_BUF_LEN)
>>>>   #define TX_DMA_SWC		BIT(14)
>>>>   #define TX_DMA_SDL(_x)		(((_x) & 0x3fff) << 16)
>>>>
>>>> +/* PDMA on MT7628 */
>>>> +#define TX_DMA_DONE		BIT(31)
>>>> +#define TX_DMA_LS1		BIT(14)
>>>> +#define TX_DMA_DESP2_DEF	(TX_DMA_LS0 | TX_DMA_DONE)
>>>> +
>>>>   /* QDMA descriptor rxd2 */
>>>>   #define RX_DMA_DONE		BIT(31)
>>>> +#define RX_DMA_LSO		BIT(30)
>>>>   #define RX_DMA_PLEN0(_x)	(((_x) & 0x3fff) << 16)
>>>>   #define RX_DMA_GET_PLEN0(_x)	(((_x) >> 16) & 0x3fff)
>>>>
>>>> @@ -289,6 +298,7 @@
>>>>
>>>>   /* QDMA descriptor rxd4 */
>>>>   #define RX_DMA_L4_VALID		BIT(24)
>>>> +#define RX_DMA_L4_VALID_PDMA	BIT(30)		/* when PDMA is used */
>>>>   #define RX_DMA_FPORT_SHIFT	19
>>>>   #define RX_DMA_FPORT_MASK	0x7
>>>>
>>>> @@ -412,6 +422,19 @@
>>>>   #define CO_QPHY_SEL            BIT(0)
>>>>   #define GEPHY_MAC_SEL          BIT(1)
>>>>
>>>> +/* MT7628/88 specific stuff */
>>>> +#define MT7628_PDMA_OFFSET	0x0800
>>>> +#define MT7628_SDM_OFFSET	0x0c00
>>>> +
>>>> +#define MT7628_TX_BASE_PTR0	(MT7628_PDMA_OFFSET + 0x00)
>>>> +#define MT7628_TX_MAX_CNT0	(MT7628_PDMA_OFFSET + 0x04)
>>>> +#define MT7628_TX_CTX_IDX0	(MT7628_PDMA_OFFSET + 0x08)
>>>> +#define MT7628_TX_DTX_IDX0	(MT7628_PDMA_OFFSET + 0x0c)
>>>> +#define MT7628_PST_DTX_IDX0	BIT(0)
>>>> +
>>>> +#define MT7628_SDM_MAC_ADRL	(MT7628_SDM_OFFSET + 0x0c)
>>>> +#define MT7628_SDM_MAC_ADRH	(MT7628_SDM_OFFSET + 0x10)
>>>> +
>>>>   struct mtk_rx_dma {
>>>>   	unsigned int rxd1;
>>>>   	unsigned int rxd2;
>>>> @@ -509,6 +532,7 @@ enum mtk_clks_map {
>>>>   				 BIT(MTK_CLK_SGMII_CK) | \
>>>>   				 BIT(MTK_CLK_ETH2PLL))
>>>>   #define MT7621_CLKS_BITMAP	(0)
>>>> +#define MT7628_CLKS_BITMAP	(0)
>>>>   #define MT7629_CLKS_BITMAP	(BIT(MTK_CLK_ETHIF) | BIT(MTK_CLK_ESW) |  \
>>>>   				 BIT(MTK_CLK_GP0) | BIT(MTK_CLK_GP1) | \
>>>>   				 BIT(MTK_CLK_GP2) | BIT(MTK_CLK_FE) | \
>>>> @@ -563,6 +587,10 @@ struct mtk_tx_ring {
>>>>   	struct mtk_tx_dma *last_free;
>>>>   	u16 thresh;
>>>>   	atomic_t free_count;
>>>> +	int dma_size;
>>>> +	struct mtk_tx_dma *dma_pdma;	/* For MT7628/88 PDMA handling */
>>>> +	dma_addr_t phys_pdma;
>>>> +	int cpu_idx;
>>>>   };
>>>>
>>>>   /* PDMA rx ring mode */
>>>> @@ -604,6 +632,7 @@ enum mkt_eth_capabilities {
>>>>   	MTK_HWLRO_BIT,
>>>>   	MTK_SHARED_INT_BIT,
>>>>   	MTK_TRGMII_MT7621_CLK_BIT,
>>>> +	MTK_SOC_MT7628,
>>>
>>> This should be MTK_SOC_MT7628_BIT, this only defines the bit number!
>>>
>>> and futher on #define MTK_SOC_MT7628 BIT(MTK_SOC_MT7628_BIT)
>>
>> Okay, thanks.
>>
>>> Based on this commit [0], MT7621 also needs the PDMA for the RX path.
>>> I know that is not your issue but I think it is better to add a extra
>>> capability bit for the PDMA bits so it can also be used on other socs.
>>
>> Yes, MT7621 also uses PDMA for RX. The code for RX is pretty much
>> shared (re-used), with slight changes for the MT7628/88 to work
>> correctly on this SoC.
>>
>> I'll work on a capability bit for PDMA vs QDMA on TX though. This
>> might make things a little more transparent.
> 
> Great, Thanks for addressing this issue.
> 
> I hope we can collaborate to also support mt76x8 in my PHYLINK patches [0][1].
> I am close to posting V2 of the patches but I am currently waiting on some
> fiber modules to test the changes better.

I do have a "hackish" DSA driver for the integrated switch (ESW) in my
tree. If time permits, I'll work on upstreaming this one as well. And
yes, hopefully we can collaborate on your PHYLINK work too.

Thanks,
Stefan

^ permalink raw reply

* [PATCH net 2/2] rxrpc: Fix read-after-free in rxrpc_queue_local()
From: David Howells @ 2019-08-14 10:48 UTC (permalink / raw)
  To: netdev; +Cc: dhowells, linux-afs, linux-kernel
In-Reply-To: <156577967167.1405.3581547705200268244.stgit@warthog.procyon.org.uk>

rxrpc_queue_local() attempts to queue the local endpoint it is given and
then, if successful, prints a trace line.  The trace line includes the
current usage count - but we're not allowed to look at the local endpoint
at this point as we passed our ref on it to the workqueue.

Fix this by reading the usage count before queuing the work item.

Also fix the reading of local->debug_id for trace lines, which must be done
with the same consideration as reading the usage count.

Fixes: 09d2bf595db4 ("rxrpc: Add a tracepoint to track rxrpc_local refcounting")
Reported-by: syzbot+78e71c5bab4f76a6a719@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
---

 include/trace/events/rxrpc.h |    6 +++---
 net/rxrpc/local_object.c     |   19 ++++++++++---------
 2 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h
index cc1d060cbf13..fa06b528c73c 100644
--- a/include/trace/events/rxrpc.h
+++ b/include/trace/events/rxrpc.h
@@ -498,10 +498,10 @@ rxrpc_tx_points;
 #define E_(a, b)	{ a, b }
 
 TRACE_EVENT(rxrpc_local,
-	    TP_PROTO(struct rxrpc_local *local, enum rxrpc_local_trace op,
+	    TP_PROTO(unsigned int local_debug_id, enum rxrpc_local_trace op,
 		     int usage, const void *where),
 
-	    TP_ARGS(local, op, usage, where),
+	    TP_ARGS(local_debug_id, op, usage, where),
 
 	    TP_STRUCT__entry(
 		    __field(unsigned int,	local		)
@@ -511,7 +511,7 @@ TRACE_EVENT(rxrpc_local,
 			     ),
 
 	    TP_fast_assign(
-		    __entry->local = local->debug_id;
+		    __entry->local = local_debug_id;
 		    __entry->op = op;
 		    __entry->usage = usage;
 		    __entry->where = where;
diff --git a/net/rxrpc/local_object.c b/net/rxrpc/local_object.c
index c45765b7263e..72a6e12a9304 100644
--- a/net/rxrpc/local_object.c
+++ b/net/rxrpc/local_object.c
@@ -93,7 +93,7 @@ static struct rxrpc_local *rxrpc_alloc_local(struct rxrpc_net *rxnet,
 		local->debug_id = atomic_inc_return(&rxrpc_debug_id);
 		memcpy(&local->srx, srx, sizeof(*srx));
 		local->srx.srx_service = 0;
-		trace_rxrpc_local(local, rxrpc_local_new, 1, NULL);
+		trace_rxrpc_local(local->debug_id, rxrpc_local_new, 1, NULL);
 	}
 
 	_leave(" = %p", local);
@@ -321,7 +321,7 @@ struct rxrpc_local *rxrpc_get_local(struct rxrpc_local *local)
 	int n;
 
 	n = atomic_inc_return(&local->usage);
-	trace_rxrpc_local(local, rxrpc_local_got, n, here);
+	trace_rxrpc_local(local->debug_id, rxrpc_local_got, n, here);
 	return local;
 }
 
@@ -335,7 +335,8 @@ struct rxrpc_local *rxrpc_get_local_maybe(struct rxrpc_local *local)
 	if (local) {
 		int n = atomic_fetch_add_unless(&local->usage, 1, 0);
 		if (n > 0)
-			trace_rxrpc_local(local, rxrpc_local_got, n + 1, here);
+			trace_rxrpc_local(local->debug_id, rxrpc_local_got,
+					  n + 1, here);
 		else
 			local = NULL;
 	}
@@ -343,16 +344,16 @@ struct rxrpc_local *rxrpc_get_local_maybe(struct rxrpc_local *local)
 }
 
 /*
- * Queue a local endpoint unless it has become unreferenced and pass the
- * caller's reference to the work item.
+ * Queue a local endpoint and pass the caller's reference to the work item.
  */
 void rxrpc_queue_local(struct rxrpc_local *local)
 {
 	const void *here = __builtin_return_address(0);
+	unsigned int debug_id = local->debug_id;
+	int n = atomic_read(&local->usage);
 
 	if (rxrpc_queue_work(&local->processor))
-		trace_rxrpc_local(local, rxrpc_local_queued,
-				  atomic_read(&local->usage), here);
+		trace_rxrpc_local(debug_id, rxrpc_local_queued, n, here);
 	else
 		rxrpc_put_local(local);
 }
@@ -367,7 +368,7 @@ void rxrpc_put_local(struct rxrpc_local *local)
 
 	if (local) {
 		n = atomic_dec_return(&local->usage);
-		trace_rxrpc_local(local, rxrpc_local_put, n, here);
+		trace_rxrpc_local(local->debug_id, rxrpc_local_put, n, here);
 
 		if (n == 0)
 			call_rcu(&local->rcu, rxrpc_local_rcu);
@@ -456,7 +457,7 @@ static void rxrpc_local_processor(struct work_struct *work)
 		container_of(work, struct rxrpc_local, processor);
 	bool again;
 
-	trace_rxrpc_local(local, rxrpc_local_processing,
+	trace_rxrpc_local(local->debug_id, rxrpc_local_processing,
 			  atomic_read(&local->usage), NULL);
 
 	do {


^ permalink raw reply related

* [PATCH net 1/2] rxrpc: Fix local endpoint replacement
From: David Howells @ 2019-08-14 10:47 UTC (permalink / raw)
  To: netdev; +Cc: dhowells, linux-afs, linux-kernel
In-Reply-To: <156577967167.1405.3581547705200268244.stgit@warthog.procyon.org.uk>

When a local endpoint (struct rxrpc_local) ceases to be in use by any
AF_RXRPC sockets, it starts the process of being destroyed, but this
doesn't cause it to be removed from the namespace endpoint list immediately
as tearing it down isn't trivial and can't be done in softirq context, so
it gets deferred.

If a new socket comes along that wants to bind to the same endpoint, a new
rxrpc_local object will be allocated and rxrpc_lookup_local() will use
list_replace() to substitute the new one for the old.

Then, when the dying object gets to rxrpc_local_destroyer(), it is removed
unconditionally from whatever list it is on by calling list_del_init().

However, list_replace() doesn't reset the pointers in the replaced
list_head and so the list_del_init() will likely corrupt the local
endpoints list.

Fix this by using list_replace_init() instead.

Fixes: 730c5fd42c1e ("rxrpc: Fix local endpoint refcounting")
Reported-by: syzbot+193e29e9387ea5837f1d@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
---

 net/rxrpc/local_object.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/rxrpc/local_object.c b/net/rxrpc/local_object.c
index c9db3e762d8d..c45765b7263e 100644
--- a/net/rxrpc/local_object.c
+++ b/net/rxrpc/local_object.c
@@ -283,7 +283,7 @@ struct rxrpc_local *rxrpc_lookup_local(struct net *net,
 		goto sock_error;

 	if (cursor != &rxnet->local_endpoints)
-		list_replace(cursor, &local->link);
+		list_replace_init(cursor, &local->link);
 	else
 		list_add_tail(&local->link, cursor);
 	age = "new";

^ permalink raw reply related

* [PATCH net 0/2] rxrpc: Fix local endpoint handling
From: David Howells @ 2019-08-14 10:47 UTC (permalink / raw)
  To: netdev; +Cc: dhowells, linux-afs, linux-kernel


Here's a pair of patches that fix two issues in the handling of local
endpoints (rxrpc_local structs):

 (1) Use list_replace_init() rather than list_replace() if we're going to
     unconditionally delete the replaced item later, lest the list get
     corrupted.

 (2) Don't access the rxrpc_local object after passing our ref to the
     workqueue, not even to illuminate tracepoints, as the work function
     may cause the object to be freed.  We have to cache the information
     beforehand.

The patches are tagged here:

	git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git
	rxrpc-fixes-20190814

and can also be found on the following branch:

	http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=rxrpc-fixes

David
---
David Howells (2):
      rxrpc: Fix local endpoint replacement
      rxrpc: Fix read-after-free in rxrpc_queue_local()


 include/trace/events/rxrpc.h |    6 +++---
 net/rxrpc/local_object.c     |   21 +++++++++++----------
 2 files changed, 14 insertions(+), 13 deletions(-)


^ permalink raw reply

* Re: [PATCH bpf-next 1/3] libbpf: add asm/unistd.h to xsk to get __NR_mmap2
From: Ivan Khoronzhuk @ 2019-08-14 10:19 UTC (permalink / raw)
  To: Yonghong Song
  Cc: magnus.karlsson@intel.com, bjorn.topel@intel.com,
	davem@davemloft.net, hawk@kernel.org, john.fastabend@gmail.com,
	jakub.kicinski@netronome.com, daniel@iogearbox.net,
	netdev@vger.kernel.org, bpf@vger.kernel.org,
	xdp-newbies@vger.kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <05e5d15b-5ef9-b4dc-a76c-0d423fb2f15d@fb.com>

On Wed, Aug 14, 2019 at 12:32:41AM +0000, Yonghong Song wrote:

Hi, Yonghong Song

>
>
>On 8/13/19 3:23 AM, Ivan Khoronzhuk wrote:
>> That's needed to get __NR_mmap2 when mmap2 syscall is used.
>
>It seems I did not have this issue on x64 machine e.g., Fedora 29.
>My glibc version is 2.28. gcc 8.2.1.

On 64 there is no the issue.

>
>What is your particular system glibc version?
>So needing kernel asm/unistd.h is because of older glibc on your
>system, or something else? Could you clarify?

It doesn't fix build issues, only runtime one on 32bits.

If no such inclusion -> no __NR_mmap2 definition - just mmap() is used ->
no problems on x64.

Is the inclusion -> no NR_mmap2 or is NR_mmap2 -> no problems on x64


-- 
Regards,
Ivan Khoronzhuk

^ permalink raw reply

* Re: [PATCH AUTOSEL 4.19 04/42] netfilter: conntrack: always store window size un-scaled
From: Reindl Harald @ 2019-08-14 10:19 UTC (permalink / raw)
  To: Thomas Jarosch, Sasha Levin
  Cc: linux-kernel, stable, Florian Westphal, Jakub Jankowski,
	Jozsef Kadlecsik, Pablo Neira Ayuso, netfilter-devel, coreteam,
	netdev
In-Reply-To: <20190808090209.wb63n6ibii4ivvba@intra2net.com>

that's still not in 5.2.8

without the exception and "nf_conntrack_tcp_timeout_max_retrans = 60" a
vnc-over-ssh session having the VNC view in the background freezes
within 60 secods

-----------------------------------------------------------------------------------------------
IPV4 TABLE MANGLE (STATEFUL PRE-NAT/FILTER)
-----------------------------------------------------------------------------------------------
Chain PREROUTING (policy ACCEPT 100 packets, 9437 bytes)
num   pkts bytes target     prot opt in     out     source
 destination
1     6526 3892K ACCEPT     all  --  *      *       0.0.0.0/0
 0.0.0.0/0            ctstate RELATED,ESTABLISHED
2      125  6264 ACCEPT     all  --  lo     *       0.0.0.0/0
 0.0.0.0/0
3       64  4952 ACCEPT     all  --  vmnet8 *       0.0.0.0/0
 0.0.0.0/0
4        1    40 DROP       all  --  *      *       0.0.0.0/0
 0.0.0.0/0            ctstate INVALID

-------- Weitergeleitete Nachricht --------
Betreff: [PATCH AUTOSEL 5.2 07/76] netfilter: conntrack: always store
window size un-scaled

Am 08.08.19 um 11:02 schrieb Thomas Jarosch:
> Hello together,
> 
> You wrote on Fri, Aug 02, 2019 at 09:22:24AM -0400:
>> From: Florian Westphal <fw@strlen.de>
>>
>> [ Upstream commit 959b69ef57db00cb33e9c4777400ae7183ebddd3 ]
>>
>> Jakub Jankowski reported following oddity:
>>
>> After 3 way handshake completes, timeout of new connection is set to
>> max_retrans (300s) instead of established (5 days).
>>
>> shortened excerpt from pcap provided:
>> 25.070622 IP (flags [DF], proto TCP (6), length 52)
>> 10.8.5.4.1025 > 10.8.1.2.80: Flags [S], seq 11, win 64240, [wscale 8]
>> 26.070462 IP (flags [DF], proto TCP (6), length 48)
>> 10.8.1.2.80 > 10.8.5.4.1025: Flags [S.], seq 82, ack 12, win 65535, [wscale 3]
>> 27.070449 IP (flags [DF], proto TCP (6), length 40)
>> 10.8.5.4.1025 > 10.8.1.2.80: Flags [.], ack 83, win 512, length 0
>>
>> Turns out the last_win is of u16 type, but we store the scaled value:
>> 512 << 8 (== 0x20000) becomes 0 window.
>>
>> The Fixes tag is not correct, as the bug has existed forever, but
>> without that change all that this causes might cause is to mistake a
>> window update (to-nonzero-from-zero) for a retransmit.
>>
>> Fixes: fbcd253d2448b8 ("netfilter: conntrack: lower timeout to RETRANS seconds if window is 0")
>> Reported-by: Jakub Jankowski <shasta@toxcorp.com>
>> Tested-by: Jakub Jankowski <shasta@toxcorp.com>
>> Signed-off-by: Florian Westphal <fw@strlen.de>
>> Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
>> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
>> Signed-off-by: Sasha Levin <sashal@kernel.org>
> 
> Also:
> Tested-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
> 
> ;)
> 
> We've hit the issue with the wrong conntrack timeout at two different sites,
> long-lived connections to a SAP server over IPSec VPN were constantly dropping.
> 
> For us this was a regression after updating from kernel 3.14 to 4.19.
> Yesterday I've applied the patch to kernel 4.19.57 and the problem is fixed.
> 
> The issue was extra hard to debug as we could just boot the new kernel
> for twenty minutes in the evening on these productive systems.
> 
> The stable kernel patch from last Friday came right on time. I was just
> about the replay the TCP connection with tcpreplay, so this saved
> me from another week of debugging. Thanks everyone!

^ permalink raw reply

* Re: KMSAN: uninit-value in smsc75xx_bind
From: Oliver Neukum @ 2019-08-14 10:16 UTC (permalink / raw)
  To: Andrey Konovalov
  Cc: David S. Miller, Alexander Potapenko, syzkaller-bugs,
	steve.glendinning, syzbot, LKML, USB list, netdev
In-Reply-To: <CAAeHK+zzj_+Qvm217KO2OHnfBMWbepP0Y-OYTWpOw3ghe5ji=g@mail.gmail.com>

Am Dienstag, den 13.08.2019, 17:08 +0200 schrieb Andrey Konovalov:
> On Tue, Aug 13, 2019 at 2:43 PM Oliver Neukum <oneukum@suse.com> wrote:
> > 
> > 
> > Hi,
> > 
> > this looks like a false positive to me.
> > The offending code is likely this:
> > 
> >         if (size) {
> >                 buf = kmalloc(size, GFP_KERNEL);
> >                 if (!buf)
> >                         goto out;
> >         }
> > 
> >         err = usb_control_msg(dev->udev, usb_rcvctrlpipe(dev->udev, 0),
> >                               cmd, reqtype, value, index, buf, size,
> >                               USB_CTRL_GET_TIMEOUT);
> > 
> > which uses 'buf' uninitialized. But it is used for input.
> > What is happening here?
> 
> AFAICS, the uninitialized use of buf that KMSAN points out is in the
> "if (buf & PMT_CTL_DEV_RDY)"  statement in smsc75xx_wait_ready(). Does
> __smsc75xx_read_reg/usb_control_msg() always initialize buf? Can it
> just initialize the first few bytes for example?
> 

Hi,

you are unfortunately right and this is not the only driver vulnerable
in this way. I am going through them.

	Regards
		Oliver


^ permalink raw reply

* Re: [PATCH] can: rcar_can: Remove unused platform data support
From: Wolfram Sang @ 2019-08-14  9:46 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Wolfgang Grandegger, Marc Kleine-Budde, Sergei Shtylyov,
	David S . Miller, Wolfram Sang, linux-can, linux-renesas-soc,
	netdev
In-Reply-To: <20190814092221.12959-1-geert+renesas@glider.be>

[-- Attachment #1: Type: text/plain, Size: 655 bytes --]

On Wed, Aug 14, 2019 at 11:22:21AM +0200, Geert Uytterhoeven wrote:
> All R-Car platforms use DT for describing CAN controllers.
> R-Car CAN platform data support was never used in any upstream kernel.
> 
> Move the Clock Select Register settings enum into the driver, and remove
> platform data support and the corresponding header file.
> 
> Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>

Not tested on HW, but double-checked there are no users in-tree, there
is no dangling pdata pointer left in the driver, visual review of the
moved block, and build-tested.

Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com>


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* can: tcan4x5x: spi bits_per_word issue on Raspberry PI
From: FIXED-TERM Buecheler Konstantin (ETAS-SEC/ECT-Mu) @ 2019-08-14  9:43 UTC (permalink / raw)
  To: linux-can@vger.kernel.org, netdev@vger.kernel.org; +Cc: Dan Murphy

Hi all,

I am trying to use a tcan4550 together with a Raspberry PI 3 B. I am using the tcan4x5x driver from net-next. 
I always get the following error during startup.
	tcan4x5x spi0.0: Probe failed, err=-22
	tcan4x5x: probe of spi0.0 failed with error -22

I realized that this happens because the Raspberry PI does only support 8/9 bit words. https://elinux.org/index.php?title=RPi_SPI#Supported_bits_per_word
In the driver it is set to 32.
	spi->bits_per_word = 32;

Setting this to 8 does not help of course since the tcan chip expects a multiple of 32 per spi transaction. 
I don't know if this is a Raspberry Pi specific problem or if there are more devices with this hardware limitation. 

Does anyone have a workaround for that? 

If this a common issue it might be a good idea to patch the driver. I will check if I can find proper a way to do so. 

Regards, 
Konstantin 

^ permalink raw reply

* Re: [RFC bpf-next 0/3] tools: bpftool: add subcommand to count map entries
From: Quentin Monnet @ 2019-08-14  9:42 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Alexei Starovoitov, Daniel Borkmann, bpf, netdev, oss-drivers
In-Reply-To: <20190814015149.b4pmubo3s4ou5yek@ast-mbp>

2019-08-13 18:51 UTC-0700 ~ Alexei Starovoitov
<alexei.starovoitov@gmail.com>
> On Tue, Aug 13, 2019 at 02:09:18PM +0100, Quentin Monnet wrote:
>> This series adds a "bpftool map count" subcommand to count the number of
>> entries present in a BPF map. This results from a customer request for a
>> tool to count the number of entries in BPF maps used in production (for
>> example, to know how many free entries are left in a given map).
>>
>> The first two commits actually contain some clean-up in preparation for the
>> new subcommand.
>>
>> The third commit adds the new subcommand. Because what data should count as
>> an entry is not entirely clear for all map types, we actually dump several
>> counters, and leave it to the users to interpret the values.
>>
>> Sending as a RFC because I'm looking for feedback on the approach. Is
>> printing several values the good thing to do? Also, note that some map
>> types such as queue/stack maps do not support any type of counting, this
>> would need to be implemented in the kernel I believe.
>>
>> More generally, we have a use case where (hash) maps are under pressure
>> (many additions/deletions from the BPF program), and counting the entries
>> by iterating other the different keys is not at all reliable. Would that
>> make sense to add a new bpf() subcommand to count the entries on the kernel
>> side instead of cycling over the entries in bpftool? If so, we would need
>> to agree on what makes an entry for each kind of map.
> 
> I don't mind new bpftool sub-command, but against adding kernel interface.
> Can you elaborate what is the actual use case?

Hi Alexei, thanks for your feedback.

The use case is a network processing application (close to a NAT), where
a hash map is used to keep track of flows, many of them being
short-lived. The BPF program spends a good chunk of time adding and
deleting entries to/from the map. The overall size (number of entries)
increases slowly, and when it grows past a certain threshold some action
must be taken (some flows are deleted from user space, possibly copied
to another map or whatever) to ensure we still have some room for new
incoming flows.

> The same can be achieved by 'bpftool map dump|grep key|wc -l', no?

To some extent (with subtleties for some other map types); and we use a
similar command line as a workaround for now. But because of the rate of
inserts/deletes in the map, the process often reports a number higher
than the max number of entries (we observed up to ~750k when max_entries
is 500k), even is the map is only half-full on average during the count.
On the worst case (though not frequent), an entry is deleted just before
we get the next key from it, and iteration starts all over again. This
is not reliable to determine how much space is left in the map.

I cannot see a solution that would provide a more accurate count from
user space, when the map is under pressure?

> 
>> Note that we are also facing similar issues for purging map from their
>> entries (deleting all entries at once). We can iterate on the keys and
>> delete elements one by one, but this is very inefficient when entries are
>> being added/removed in parallel from the BPF program, and having another
>> dedicated command accessible from the bpf() system call might help here as
>> well.
> 
> I think that fits into the batch processing of map commands discussion.
> 

This is also what we do at the moment, but we hit similar limitations
when iterating over the keys.

Thanks,
Quentin

^ permalink raw reply

* Re: [PATCH] net: ethernet: mediatek: Add MT7628/88 SoC support
From: René van Dorst @ 2019-08-14  9:26 UTC (permalink / raw)
  To: Stefan Roese
  Cc: netdev, linux-mediatek, Sean Wang, Felix Fietkau, John Crispin,
	Daniel Golle
In-Reply-To: <a92d7207-80b2-e88d-d869-64c9758ef1da@denx.de>

Hi Stefan,

Quoting Stefan Roese <sr@denx.de>:

> Hi Rene,
>
> On 17.07.19 14:53, René van Dorst wrote:
>
> <snip>
>
>>> +++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
>>> @@ -39,7 +39,8 @@
>>>  				 NETIF_F_SG | NETIF_F_TSO | \
>>>  				 NETIF_F_TSO6 | \
>>>  				 NETIF_F_IPV6_CSUM)
>>> -#define NEXT_RX_DESP_IDX(X, Y)	(((X) + 1) & ((Y) - 1))
>>> +#define MTK_HW_FEATURES_MT7628	(NETIF_F_SG | NETIF_F_RXCSUM)
>>> +#define NEXT_DESP_IDX(X, Y)	(((X) + 1) & ((Y) - 1))
>>>
>>>  #define MTK_MAX_RX_RING_NUM	4
>>>  #define MTK_HW_LRO_DMA_SIZE	8
>>> @@ -118,6 +119,7 @@
>>>  /* PDMA Global Configuration Register */
>>>  #define MTK_PDMA_GLO_CFG	0xa04
>>>  #define MTK_MULTI_EN		BIT(10)
>>> +#define MTK_PDMA_SIZE_8DWORDS	(1 << 4)
>>>
>>>  /* PDMA Reset Index Register */
>>>  #define MTK_PDMA_RST_IDX	0xa08
>>> @@ -276,11 +278,18 @@
>>>  #define TX_DMA_OWNER_CPU	BIT(31)
>>>  #define TX_DMA_LS0		BIT(30)
>>>  #define TX_DMA_PLEN0(_x)	(((_x) & MTK_TX_DMA_BUF_LEN) << 16)
>>> +#define TX_DMA_PLEN1(_x)	((_x) & MTK_TX_DMA_BUF_LEN)
>>>  #define TX_DMA_SWC		BIT(14)
>>>  #define TX_DMA_SDL(_x)		(((_x) & 0x3fff) << 16)
>>>
>>> +/* PDMA on MT7628 */
>>> +#define TX_DMA_DONE		BIT(31)
>>> +#define TX_DMA_LS1		BIT(14)
>>> +#define TX_DMA_DESP2_DEF	(TX_DMA_LS0 | TX_DMA_DONE)
>>> +
>>>  /* QDMA descriptor rxd2 */
>>>  #define RX_DMA_DONE		BIT(31)
>>> +#define RX_DMA_LSO		BIT(30)
>>>  #define RX_DMA_PLEN0(_x)	(((_x) & 0x3fff) << 16)
>>>  #define RX_DMA_GET_PLEN0(_x)	(((_x) >> 16) & 0x3fff)
>>>
>>> @@ -289,6 +298,7 @@
>>>
>>>  /* QDMA descriptor rxd4 */
>>>  #define RX_DMA_L4_VALID		BIT(24)
>>> +#define RX_DMA_L4_VALID_PDMA	BIT(30)		/* when PDMA is used */
>>>  #define RX_DMA_FPORT_SHIFT	19
>>>  #define RX_DMA_FPORT_MASK	0x7
>>>
>>> @@ -412,6 +422,19 @@
>>>  #define CO_QPHY_SEL            BIT(0)
>>>  #define GEPHY_MAC_SEL          BIT(1)
>>>
>>> +/* MT7628/88 specific stuff */
>>> +#define MT7628_PDMA_OFFSET	0x0800
>>> +#define MT7628_SDM_OFFSET	0x0c00
>>> +
>>> +#define MT7628_TX_BASE_PTR0	(MT7628_PDMA_OFFSET + 0x00)
>>> +#define MT7628_TX_MAX_CNT0	(MT7628_PDMA_OFFSET + 0x04)
>>> +#define MT7628_TX_CTX_IDX0	(MT7628_PDMA_OFFSET + 0x08)
>>> +#define MT7628_TX_DTX_IDX0	(MT7628_PDMA_OFFSET + 0x0c)
>>> +#define MT7628_PST_DTX_IDX0	BIT(0)
>>> +
>>> +#define MT7628_SDM_MAC_ADRL	(MT7628_SDM_OFFSET + 0x0c)
>>> +#define MT7628_SDM_MAC_ADRH	(MT7628_SDM_OFFSET + 0x10)
>>> +
>>>  struct mtk_rx_dma {
>>>  	unsigned int rxd1;
>>>  	unsigned int rxd2;
>>> @@ -509,6 +532,7 @@ enum mtk_clks_map {
>>>  				 BIT(MTK_CLK_SGMII_CK) | \
>>>  				 BIT(MTK_CLK_ETH2PLL))
>>>  #define MT7621_CLKS_BITMAP	(0)
>>> +#define MT7628_CLKS_BITMAP	(0)
>>>  #define MT7629_CLKS_BITMAP	(BIT(MTK_CLK_ETHIF) | BIT(MTK_CLK_ESW) |  \
>>>  				 BIT(MTK_CLK_GP0) | BIT(MTK_CLK_GP1) | \
>>>  				 BIT(MTK_CLK_GP2) | BIT(MTK_CLK_FE) | \
>>> @@ -563,6 +587,10 @@ struct mtk_tx_ring {
>>>  	struct mtk_tx_dma *last_free;
>>>  	u16 thresh;
>>>  	atomic_t free_count;
>>> +	int dma_size;
>>> +	struct mtk_tx_dma *dma_pdma;	/* For MT7628/88 PDMA handling */
>>> +	dma_addr_t phys_pdma;
>>> +	int cpu_idx;
>>>  };
>>>
>>>  /* PDMA rx ring mode */
>>> @@ -604,6 +632,7 @@ enum mkt_eth_capabilities {
>>>  	MTK_HWLRO_BIT,
>>>  	MTK_SHARED_INT_BIT,
>>>  	MTK_TRGMII_MT7621_CLK_BIT,
>>> +	MTK_SOC_MT7628,
>>
>> This should be MTK_SOC_MT7628_BIT, this only defines the bit number!
>>
>> and futher on #define MTK_SOC_MT7628 BIT(MTK_SOC_MT7628_BIT)
>
> Okay, thanks.
>
>> Based on this commit [0], MT7621 also needs the PDMA for the RX path.
>> I know that is not your issue but I think it is better to add a extra
>> capability bit for the PDMA bits so it can also be used on other socs.
>
> Yes, MT7621 also uses PDMA for RX. The code for RX is pretty much
> shared (re-used), with slight changes for the MT7628/88 to work
> correctly on this SoC.
>
> I'll work on a capability bit for PDMA vs QDMA on TX though. This
> might make things a little more transparent.

Great, Thanks for addressing this issue.

I hope we can collaborate to also support mt76x8 in my PHYLINK patches [0][1].
I am close to posting V2 of the patches but I am currently waiting on some
fiber modules to test the changes better.

Greats,

René

[0] https://patchwork.ozlabs.org/patch/1136551/
[1] https://patchwork.ozlabs.org/patch/1136519/

>
>> Greats,
>>
>> René
>>
>> [0] https://lkml.org/lkml/2018/3/14/1038
>
> Thanks,
> Stefan




^ permalink raw reply

* tc - mirred ingress not supported at the moment
From: Martin Olsson @ 2019-08-14  9:25 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev
In-Reply-To: <CAAT+qEYG5=5ny+t0VcqiYjDUQLrcj9sBR=2w-fdsE7Jjf4xOkQ@mail.gmail.com>

Hi Cong!

Ah sorry.
Already implemented. Great!

Hmmm. Then why don't the manual at
https://www.linux.org/docs/man8/tc-mirred.html to reflect the changes?
That was the place I checked to see if ingress was still not implemented.
In the commit you point at, the sentence "Currently only egress is
implemented" has been removed.

Question:
Is there any form of performance penalty if I send the mirrored
traffic to the ingress queue of the destination interface rather than
to the egress queue?
I mean, in the kernel there is the possibility to perform far more
actions on the ingress queue than on the egress, but if I leave both
queues at their defaults, will mirrored packets to ingress use more
CPU cycles than to the egress destination, or are they more or less
identical?

Question 2:
Given the commit
https://git.kernel.org/pub/scm/network/iproute2/iproute2.git/commit/?id=5eca0a3701223619a513c7209f7d9335ca1b4cfa,
how can I see in what kernel version it was added?

/Martin

Den tis 13 aug. 2019 kl 18:47 skrev Cong Wang <xiyou.wangcong@gmail.com>:
>
> On Tue, Aug 13, 2019 at 4:05 AM Martin Olsson
> <martin.olsson+netdev@sentorsecurity.com> wrote:
> > Q1: Why was 'ingress' not implemented at the same time as 'egress'?
>
> Because you are using an old iproute2.
>
> ingress support is added by:
> https://git.kernel.org/pub/scm/network/iproute2/iproute2.git/commit/?id=5eca0a3701223619a513c7209f7d9335ca1b4cfa
>
>
> > 2)
> > Ok, so I have to use 'egress':
> > # tc filter add dev eno2 parent ffff: prio 999  protocol all matchall
> > action mirred egress redirect dev mon0
>
>
> So you redirect packets from eno2's ingress to mon0's egress.
>
>
> >
> > Since the mirred action forces me to use 'egress' as the direction on
> > the dest interface, all kinds of network statistics tools show
> > incorrect counters. :-(
> > eno2 is a pure sniffer interface (it is connected to the SPAN dest
> > port of a switch).
> > All packets (matchall) on eno2 are mirrored to mon0.
> >
> > # ip -s link show dev eno2
> >     ...
> >     ...
> >     RX: bytes  packets  errors  dropped overrun mcast
> >     13660757   16329    0       0       0       0
> >     TX: bytes  packets  errors  dropped carrier collsns
> >     0          0        0       0       0       0
> > # ip -s link show dev mon0
> >     ...
> >     ...
> >     RX: bytes  packets  errors  dropped overrun mcast
> >     0          0        0       0       0       0
> >     TX: bytes  packets  errors  dropped carrier collsns
> >     13660757   16329    0       0       0       0
> >
> > eno2 and mon0 should be identical, but they are inverted.
>
> Yes, this behavior is correct. The keyword "egress" in your cmdline
> already says so.
>
> >
> > Q2: So... Can the 'ingress' option please be implemented? (I'm no
> > programmer, so unfortunetly I can't do it myself).
>
> It is completed, you need to update your iproute2 and kernel.
>
> Thanks.

^ permalink raw reply

* [PATCH 4/7] netfilter: nf_flow_table: conntrack picks up expired flows
From: Pablo Neira Ayuso @ 2019-08-14  9:24 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <20190814092440.20087-1-pablo@netfilter.org>

Update conntrack entry to pick up expired flows, otherwise the conntrack
entry gets stuck with the internal offload timeout (one day). The TCP
state also needs to be adjusted to ESTABLISHED state and tracking is set
to liberal mode in order to give conntrack a chance to pick up the
expired flow.

Fixes: ac2a66665e23 ("netfilter: add generic flow table infrastructure")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nf_flow_table_core.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
index e3d797252a98..68a24471ffee 100644
--- a/net/netfilter/nf_flow_table_core.c
+++ b/net/netfilter/nf_flow_table_core.c
@@ -111,7 +111,7 @@ static void flow_offload_fixup_tcp(struct ip_ct_tcp *tcp)
 #define NF_FLOWTABLE_TCP_PICKUP_TIMEOUT	(120 * HZ)
 #define NF_FLOWTABLE_UDP_PICKUP_TIMEOUT	(30 * HZ)
 
-static void flow_offload_fixup_ct_state(struct nf_conn *ct)
+static void flow_offload_fixup_ct(struct nf_conn *ct)
 {
 	const struct nf_conntrack_l4proto *l4proto;
 	unsigned int timeout;
@@ -208,6 +208,11 @@ int flow_offload_add(struct nf_flowtable *flow_table, struct flow_offload *flow)
 }
 EXPORT_SYMBOL_GPL(flow_offload_add);
 
+static inline bool nf_flow_has_expired(const struct flow_offload *flow)
+{
+	return (__s32)(flow->timeout - (u32)jiffies) <= 0;
+}
+
 static void flow_offload_del(struct nf_flowtable *flow_table,
 			     struct flow_offload *flow)
 {
@@ -223,6 +228,9 @@ static void flow_offload_del(struct nf_flowtable *flow_table,
 	e = container_of(flow, struct flow_offload_entry, flow);
 	clear_bit(IPS_OFFLOAD_BIT, &e->ct->status);
 
+	if (nf_flow_has_expired(flow))
+		flow_offload_fixup_ct(e->ct);
+
 	flow_offload_free(flow);
 }
 
@@ -233,7 +241,7 @@ void flow_offload_teardown(struct flow_offload *flow)
 	flow->flags |= FLOW_OFFLOAD_TEARDOWN;
 
 	e = container_of(flow, struct flow_offload_entry, flow);
-	flow_offload_fixup_ct_state(e->ct);
+	flow_offload_fixup_ct(e->ct);
 }
 EXPORT_SYMBOL_GPL(flow_offload_teardown);
 
@@ -298,11 +306,6 @@ nf_flow_table_iterate(struct nf_flowtable *flow_table,
 	return err;
 }
 
-static inline bool nf_flow_has_expired(const struct flow_offload *flow)
-{
-	return (__s32)(flow->timeout - (u32)jiffies) <= 0;
-}
-
 static void nf_flow_offload_gc_step(struct flow_offload *flow, void *data)
 {
 	struct nf_flowtable *flow_table = data;
-- 
2.11.0



^ permalink raw reply related

* [PATCH 5/7] netfilter: nf_flow_table: teardown flow timeout race
From: Pablo Neira Ayuso @ 2019-08-14  9:24 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <20190814092440.20087-1-pablo@netfilter.org>

Flows that are in teardown state (due to RST / FIN TCP packet) still
have their offload flag set on. Hence, the conntrack garbage collector
may race to undo the timeout adjustment that the fixup routine performs,
leaving the conntrack entry in place with the internal offload timeout
(one day).

Update teardown flow state to ESTABLISHED and set tracking to liberal,
then once the offload bit is cleared, adjust timeout if it is more than
the default fixup timeout (conntrack might already have set a lower
timeout from the packet path).

Fixes: da5984e51063 ("netfilter: nf_flow_table: add support for sending flows back to the slow path")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nf_flow_table_core.c | 34 +++++++++++++++++++++++++---------
 1 file changed, 25 insertions(+), 9 deletions(-)

diff --git a/net/netfilter/nf_flow_table_core.c b/net/netfilter/nf_flow_table_core.c
index 68a24471ffee..80a8f9ae4c93 100644
--- a/net/netfilter/nf_flow_table_core.c
+++ b/net/netfilter/nf_flow_table_core.c
@@ -111,15 +111,16 @@ static void flow_offload_fixup_tcp(struct ip_ct_tcp *tcp)
 #define NF_FLOWTABLE_TCP_PICKUP_TIMEOUT	(120 * HZ)
 #define NF_FLOWTABLE_UDP_PICKUP_TIMEOUT	(30 * HZ)
 
-static void flow_offload_fixup_ct(struct nf_conn *ct)
+static inline __s32 nf_flow_timeout_delta(unsigned int timeout)
+{
+	return (__s32)(timeout - (u32)jiffies);
+}
+
+static void flow_offload_fixup_ct_timeout(struct nf_conn *ct)
 {
 	const struct nf_conntrack_l4proto *l4proto;
+	int l4num = nf_ct_protonum(ct);
 	unsigned int timeout;
-	int l4num;
-
-	l4num = nf_ct_protonum(ct);
-	if (l4num == IPPROTO_TCP)
-		flow_offload_fixup_tcp(&ct->proto.tcp);
 
 	l4proto = nf_ct_l4proto_find(l4num);
 	if (!l4proto)
@@ -132,7 +133,20 @@ static void flow_offload_fixup_ct(struct nf_conn *ct)
 	else
 		return;
 
-	ct->timeout = nfct_time_stamp + timeout;
+	if (nf_flow_timeout_delta(ct->timeout) > (__s32)timeout)
+		ct->timeout = nfct_time_stamp + timeout;
+}
+
+static void flow_offload_fixup_ct_state(struct nf_conn *ct)
+{
+	if (nf_ct_protonum(ct) == IPPROTO_TCP)
+		flow_offload_fixup_tcp(&ct->proto.tcp);
+}
+
+static void flow_offload_fixup_ct(struct nf_conn *ct)
+{
+	flow_offload_fixup_ct_state(ct);
+	flow_offload_fixup_ct_timeout(ct);
 }
 
 void flow_offload_free(struct flow_offload *flow)
@@ -210,7 +224,7 @@ EXPORT_SYMBOL_GPL(flow_offload_add);
 
 static inline bool nf_flow_has_expired(const struct flow_offload *flow)
 {
-	return (__s32)(flow->timeout - (u32)jiffies) <= 0;
+	return nf_flow_timeout_delta(flow->timeout) <= 0;
 }
 
 static void flow_offload_del(struct nf_flowtable *flow_table,
@@ -230,6 +244,8 @@ static void flow_offload_del(struct nf_flowtable *flow_table,
 
 	if (nf_flow_has_expired(flow))
 		flow_offload_fixup_ct(e->ct);
+	else if (flow->flags & FLOW_OFFLOAD_TEARDOWN)
+		flow_offload_fixup_ct_timeout(e->ct);
 
 	flow_offload_free(flow);
 }
@@ -241,7 +257,7 @@ void flow_offload_teardown(struct flow_offload *flow)
 	flow->flags |= FLOW_OFFLOAD_TEARDOWN;
 
 	e = container_of(flow, struct flow_offload_entry, flow);
-	flow_offload_fixup_ct(e->ct);
+	flow_offload_fixup_ct_state(e->ct);
 }
 EXPORT_SYMBOL_GPL(flow_offload_teardown);
 
-- 
2.11.0



^ permalink raw reply related

* [PATCH 3/7] netfilter: nf_tables: use-after-free in failing rule with bound set
From: Pablo Neira Ayuso @ 2019-08-14  9:24 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <20190814092440.20087-1-pablo@netfilter.org>

If a rule that has already a bound anonymous set fails to be added, the
preparation phase releases the rule and the bound set. However, the
transaction object from the abort path still has a reference to the set
object that is stale, leading to a use-after-free when checking for the
set->bound field. Add a new field to the transaction that specifies if
the set is bound, so the abort path can skip releasing it since the rule
command owns it and it takes care of releasing it. After this update,
the set->bound field is removed.

[   24.649883] Unable to handle kernel paging request at virtual address 0000000000040434
[   24.657858] Mem abort info:
[   24.660686]   ESR = 0x96000004
[   24.663769]   Exception class = DABT (current EL), IL = 32 bits
[   24.669725]   SET = 0, FnV = 0
[   24.672804]   EA = 0, S1PTW = 0
[   24.675975] Data abort info:
[   24.678880]   ISV = 0, ISS = 0x00000004
[   24.682743]   CM = 0, WnR = 0
[   24.685723] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000428952000
[   24.692207] [0000000000040434] pgd=0000000000000000
[   24.697119] Internal error: Oops: 96000004 [#1] SMP
[...]
[   24.889414] Call trace:
[   24.891870]  __nf_tables_abort+0x3f0/0x7a0
[   24.895984]  nf_tables_abort+0x20/0x40
[   24.899750]  nfnetlink_rcv_batch+0x17c/0x588
[   24.904037]  nfnetlink_rcv+0x13c/0x190
[   24.907803]  netlink_unicast+0x18c/0x208
[   24.911742]  netlink_sendmsg+0x1b0/0x350
[   24.915682]  sock_sendmsg+0x4c/0x68
[   24.919185]  ___sys_sendmsg+0x288/0x2c8
[   24.923037]  __sys_sendmsg+0x7c/0xd0
[   24.926628]  __arm64_sys_sendmsg+0x2c/0x38
[   24.930744]  el0_svc_common.constprop.0+0x94/0x158
[   24.935556]  el0_svc_handler+0x34/0x90
[   24.939322]  el0_svc+0x8/0xc
[   24.942216] Code: 37280300 f9404023 91014262 aa1703e0 (f9401863)
[   24.948336] ---[ end trace cebbb9dcbed3b56f ]---

Fixes: f6ac85858976 ("netfilter: nf_tables: unbind set in rule from commit path")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 include/net/netfilter/nf_tables.h |  9 +++++++--
 net/netfilter/nf_tables_api.c     | 15 ++++++++++-----
 2 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h
index 9b624566b82d..475d6f28ca67 100644
--- a/include/net/netfilter/nf_tables.h
+++ b/include/net/netfilter/nf_tables.h
@@ -421,8 +421,7 @@ struct nft_set {
 	unsigned char			*udata;
 	/* runtime data below here */
 	const struct nft_set_ops	*ops ____cacheline_aligned;
-	u16				flags:13,
-					bound:1,
+	u16				flags:14,
 					genmask:2;
 	u8				klen;
 	u8				dlen;
@@ -1348,12 +1347,15 @@ struct nft_trans_rule {
 struct nft_trans_set {
 	struct nft_set			*set;
 	u32				set_id;
+	bool				bound;
 };
 
 #define nft_trans_set(trans)	\
 	(((struct nft_trans_set *)trans->data)->set)
 #define nft_trans_set_id(trans)	\
 	(((struct nft_trans_set *)trans->data)->set_id)
+#define nft_trans_set_bound(trans)	\
+	(((struct nft_trans_set *)trans->data)->bound)
 
 struct nft_trans_chain {
 	bool				update;
@@ -1384,12 +1386,15 @@ struct nft_trans_table {
 struct nft_trans_elem {
 	struct nft_set			*set;
 	struct nft_set_elem		elem;
+	bool				bound;
 };
 
 #define nft_trans_elem_set(trans)	\
 	(((struct nft_trans_elem *)trans->data)->set)
 #define nft_trans_elem(trans)	\
 	(((struct nft_trans_elem *)trans->data)->elem)
+#define nft_trans_elem_set_bound(trans)	\
+	(((struct nft_trans_elem *)trans->data)->bound)
 
 struct nft_trans_obj {
 	struct nft_object		*obj;
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 605a7cfe7ca7..88abbddf8967 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -138,9 +138,14 @@ static void nft_set_trans_bind(const struct nft_ctx *ctx, struct nft_set *set)
 		return;
 
 	list_for_each_entry_reverse(trans, &net->nft.commit_list, list) {
-		if (trans->msg_type == NFT_MSG_NEWSET &&
-		    nft_trans_set(trans) == set) {
-			set->bound = true;
+		switch (trans->msg_type) {
+		case NFT_MSG_NEWSET:
+			if (nft_trans_set(trans) == set)
+				nft_trans_set_bound(trans) = true;
+			break;
+		case NFT_MSG_NEWSETELEM:
+			if (nft_trans_elem_set(trans) == set)
+				nft_trans_elem_set_bound(trans) = true;
 			break;
 		}
 	}
@@ -6906,7 +6911,7 @@ static int __nf_tables_abort(struct net *net)
 			break;
 		case NFT_MSG_NEWSET:
 			trans->ctx.table->use--;
-			if (nft_trans_set(trans)->bound) {
+			if (nft_trans_set_bound(trans)) {
 				nft_trans_destroy(trans);
 				break;
 			}
@@ -6918,7 +6923,7 @@ static int __nf_tables_abort(struct net *net)
 			nft_trans_destroy(trans);
 			break;
 		case NFT_MSG_NEWSETELEM:
-			if (nft_trans_elem_set(trans)->bound) {
+			if (nft_trans_elem_set_bound(trans)) {
 				nft_trans_destroy(trans);
 				break;
 			}
-- 
2.11.0



^ permalink raw reply related

* [PATCH 6/7] netfilter: conntrack: Use consistent ct id hash calculation
From: Pablo Neira Ayuso @ 2019-08-14  9:24 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <20190814092440.20087-1-pablo@netfilter.org>

From: Dirk Morris <dmorris@metaloft.com>

Change ct id hash calculation to only use invariants.

Currently the ct id hash calculation is based on some fields that can
change in the lifetime on a conntrack entry in some corner cases. The
current hash uses the whole tuple which contains an hlist pointer which
will change when the conntrack is placed on the dying list resulting in
a ct id change.

This patch also removes the reply-side tuple and extension pointer from
the hash calculation so that the ct id will will not change from
initialization until confirmation.

Fixes: 3c79107631db1f7 ("netfilter: ctnetlink: don't use conntrack/expect object addresses as id")
Signed-off-by: Dirk Morris <dmorris@metaloft.com>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nf_conntrack_core.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index a542761e90d1..81a8ef42b88d 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -453,13 +453,12 @@ EXPORT_SYMBOL_GPL(nf_ct_invert_tuple);
  * table location, we assume id gets exposed to userspace.
  *
  * Following nf_conn items do not change throughout lifetime
- * of the nf_conn after it has been committed to main hash table:
+ * of the nf_conn:
  *
  * 1. nf_conn address
- * 2. nf_conn->ext address
- * 3. nf_conn->master address (normally NULL)
- * 4. tuple
- * 5. the associated net namespace
+ * 2. nf_conn->master address (normally NULL)
+ * 3. the associated net namespace
+ * 4. the original direction tuple
  */
 u32 nf_ct_get_id(const struct nf_conn *ct)
 {
@@ -469,9 +468,10 @@ u32 nf_ct_get_id(const struct nf_conn *ct)
 	net_get_random_once(&ct_id_seed, sizeof(ct_id_seed));
 
 	a = (unsigned long)ct;
-	b = (unsigned long)ct->master ^ net_hash_mix(nf_ct_net(ct));
-	c = (unsigned long)ct->ext;
-	d = (unsigned long)siphash(&ct->tuplehash, sizeof(ct->tuplehash),
+	b = (unsigned long)ct->master;
+	c = (unsigned long)nf_ct_net(ct);
+	d = (unsigned long)siphash(&ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple,
+				   sizeof(ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple),
 				   &ct_id_seed);
 #ifdef CONFIG_64BIT
 	return siphash_4u64((u64)a, (u64)b, (u64)c, (u64)d, &ct_id_seed);
-- 
2.11.0



^ permalink raw reply related

* [PATCH 7/7] netfilter: nft_flow_offload: skip tcp rst and fin packets
From: Pablo Neira Ayuso @ 2019-08-14  9:24 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <20190814092440.20087-1-pablo@netfilter.org>

TCP rst and fin packets do not qualify to place a flow into the
flowtable. Most likely there will be no more packets after connection
closure. Without this patch, this flow entry expires and connection
tracking picks up the entry in ESTABLISHED state using the fixup
timeout, which makes this look inconsistent to the user for a connection
that is actually already closed.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nft_flow_offload.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/net/netfilter/nft_flow_offload.c b/net/netfilter/nft_flow_offload.c
index aa5f571d4361..060a4ed46d5e 100644
--- a/net/netfilter/nft_flow_offload.c
+++ b/net/netfilter/nft_flow_offload.c
@@ -72,11 +72,11 @@ static void nft_flow_offload_eval(const struct nft_expr *expr,
 {
 	struct nft_flow_offload *priv = nft_expr_priv(expr);
 	struct nf_flowtable *flowtable = &priv->flowtable->data;
+	struct tcphdr _tcph, *tcph = NULL;
 	enum ip_conntrack_info ctinfo;
 	struct nf_flow_route route;
 	struct flow_offload *flow;
 	enum ip_conntrack_dir dir;
-	bool is_tcp = false;
 	struct nf_conn *ct;
 	int ret;
 
@@ -89,7 +89,10 @@ static void nft_flow_offload_eval(const struct nft_expr *expr,
 
 	switch (ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple.dst.protonum) {
 	case IPPROTO_TCP:
-		is_tcp = true;
+		tcph = skb_header_pointer(pkt->skb, pkt->xt.thoff,
+					  sizeof(_tcph), &_tcph);
+		if (unlikely(!tcph || tcph->fin || tcph->rst))
+			goto out;
 		break;
 	case IPPROTO_UDP:
 		break;
@@ -115,7 +118,7 @@ static void nft_flow_offload_eval(const struct nft_expr *expr,
 	if (!flow)
 		goto err_flow_alloc;
 
-	if (is_tcp) {
+	if (tcph) {
 		ct->proto.tcp.seen[0].flags |= IP_CT_TCP_FLAG_BE_LIBERAL;
 		ct->proto.tcp.seen[1].flags |= IP_CT_TCP_FLAG_BE_LIBERAL;
 	}
-- 
2.11.0



^ permalink raw reply related

* [PATCH 2/7] netfilter: nf_flow_table: fix offload for flows that are subject to xfrm
From: Pablo Neira Ayuso @ 2019-08-14  9:24 UTC (permalink / raw)
  To: netfilter-devel; +Cc: davem, netdev
In-Reply-To: <20190814092440.20087-1-pablo@netfilter.org>

From: Florian Westphal <fw@strlen.de>

This makes the previously added 'encap test' pass.
Because its possible that the xfrm dst entry becomes stale while such
a flow is offloaded, we need to call dst_check() -- the notifier that
handles this for non-tunneled traffic isn't sufficient, because SA or
or policies might have changed.

If dst becomes stale the flow offload entry will be tagged for teardown
and packets will be passed to 'classic' forwarding path.

Removing the entry right away is problematic, as this would
introduce a race condition with the gc worker.

In case flow is long-lived, it could eventually be offloaded again
once the gc worker removes the entry from the flow table.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
---
 net/netfilter/nf_flow_table_ip.c | 43 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/net/netfilter/nf_flow_table_ip.c b/net/netfilter/nf_flow_table_ip.c
index cdfc33517e85..d68c801dd614 100644
--- a/net/netfilter/nf_flow_table_ip.c
+++ b/net/netfilter/nf_flow_table_ip.c
@@ -214,6 +214,25 @@ static bool nf_flow_exceeds_mtu(const struct sk_buff *skb, unsigned int mtu)
 	return true;
 }
 
+static int nf_flow_offload_dst_check(struct dst_entry *dst)
+{
+	if (unlikely(dst_xfrm(dst)))
+		return dst_check(dst, 0) ? 0 : -1;
+
+	return 0;
+}
+
+static unsigned int nf_flow_xmit_xfrm(struct sk_buff *skb,
+				      const struct nf_hook_state *state,
+				      struct dst_entry *dst)
+{
+	skb_orphan(skb);
+	skb_dst_set_noref(skb, dst);
+	skb->tstamp = 0;
+	dst_output(state->net, state->sk, skb);
+	return NF_STOLEN;
+}
+
 unsigned int
 nf_flow_offload_ip_hook(void *priv, struct sk_buff *skb,
 			const struct nf_hook_state *state)
@@ -254,6 +273,11 @@ nf_flow_offload_ip_hook(void *priv, struct sk_buff *skb,
 	if (nf_flow_state_check(flow, ip_hdr(skb)->protocol, skb, thoff))
 		return NF_ACCEPT;
 
+	if (nf_flow_offload_dst_check(&rt->dst)) {
+		flow_offload_teardown(flow);
+		return NF_ACCEPT;
+	}
+
 	if (nf_flow_nat_ip(flow, skb, thoff, dir) < 0)
 		return NF_DROP;
 
@@ -261,6 +285,13 @@ nf_flow_offload_ip_hook(void *priv, struct sk_buff *skb,
 	iph = ip_hdr(skb);
 	ip_decrease_ttl(iph);
 
+	if (unlikely(dst_xfrm(&rt->dst))) {
+		memset(skb->cb, 0, sizeof(struct inet_skb_parm));
+		IPCB(skb)->iif = skb->dev->ifindex;
+		IPCB(skb)->flags = IPSKB_FORWARDED;
+		return nf_flow_xmit_xfrm(skb, state, &rt->dst);
+	}
+
 	skb->dev = outdev;
 	nexthop = rt_nexthop(rt, flow->tuplehash[!dir].tuple.src_v4.s_addr);
 	skb_dst_set_noref(skb, &rt->dst);
@@ -467,6 +498,11 @@ nf_flow_offload_ipv6_hook(void *priv, struct sk_buff *skb,
 				sizeof(*ip6h)))
 		return NF_ACCEPT;
 
+	if (nf_flow_offload_dst_check(&rt->dst)) {
+		flow_offload_teardown(flow);
+		return NF_ACCEPT;
+	}
+
 	if (skb_try_make_writable(skb, sizeof(*ip6h)))
 		return NF_DROP;
 
@@ -477,6 +513,13 @@ nf_flow_offload_ipv6_hook(void *priv, struct sk_buff *skb,
 	ip6h = ipv6_hdr(skb);
 	ip6h->hop_limit--;
 
+	if (unlikely(dst_xfrm(&rt->dst))) {
+		memset(skb->cb, 0, sizeof(struct inet6_skb_parm));
+		IP6CB(skb)->iif = skb->dev->ifindex;
+		IP6CB(skb)->flags = IP6SKB_FORWARDED;
+		return nf_flow_xmit_xfrm(skb, state, &rt->dst);
+	}
+
 	skb->dev = outdev;
 	nexthop = rt6_nexthop(rt, &flow->tuplehash[!dir].tuple.src_v6);
 	skb_dst_set_noref(skb, &rt->dst);
-- 
2.11.0



^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox