Netdev List

Netdev List
 help / color / mirror / Atom feed

* [PATCH AUTOSEL 5.10 06/18] vxlan: fix error return code in vxlan_fdb_append
From: Sasha Levin @ 2022-04-19 18:13 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Hongbin Wang, David S . Miller, Sasha Levin, kuba, pabeni, netdev
In-Reply-To: <20220419181353.485719-1-sashal@kernel.org>

From: Hongbin Wang <wh_bin@126.com>

[ Upstream commit 7cea5560bf656b84f9ed01c0cc829d4eecd0640b ]

When kmalloc and dst_cache_init failed,
should return ENOMEM rather than ENOBUFS.

Signed-off-by: Hongbin Wang <wh_bin@126.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/net/vxlan.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 48fbdce6a70e..72d670667f64 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -710,11 +710,11 @@ static int vxlan_fdb_append(struct vxlan_fdb *f,

 	rd = kmalloc(sizeof(*rd), GFP_ATOMIC);
 	if (rd == NULL)
-		return -ENOBUFS;
+		return -ENOMEM;

 	if (dst_cache_init(&rd->dst_cache, GFP_ATOMIC)) {
 		kfree(rd);
-		return -ENOBUFS;
+		return -ENOMEM;
 	}

 	rd->remote_ip = *ip;
-- 
2.35.1

^ permalink raw reply related

* [PATCH AUTOSEL 5.15 16/27] net: macb: Restart tx only if queue pointer is lagging
From: Sasha Levin @ 2022-04-19 18:12 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Tomas Melin, Claudiu Beznea, Jakub Kicinski, Sasha Levin,
	nicolas.ferre, davem, pabeni, netdev
In-Reply-To: <20220419181242.485308-1-sashal@kernel.org>

From: Tomas Melin <tomas.melin@vaisala.com>

[ Upstream commit 5ad7f18cd82cee8e773d40cc7a1465a526f2615c ]

commit 4298388574da ("net: macb: restart tx after tx used bit read")
added support for restarting transmission. Restarting tx does not work
in case controller asserts TXUBR interrupt and TQBP is already at the end
of the tx queue. In that situation, restarting tx will immediately cause
assertion of another TXUBR interrupt. The driver will end up in an infinite
interrupt loop which it cannot break out of.

For cases where TQBP is at the end of the tx queue, instead
only clear TX_USED interrupt. As more data gets pushed to the queue,
transmission will resume.

This issue was observed on a Xilinx Zynq-7000 based board.
During stress test of the network interface,
driver would get stuck on interrupt loop within seconds or minutes
causing CPU to stall.

Signed-off-by: Tomas Melin <tomas.melin@vaisala.com>
Tested-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Reviewed-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Link: https://lore.kernel.org/r/20220407161659.14532-1-tomas.melin@vaisala.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/net/ethernet/cadence/macb_main.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index 9705c49655ad..217c1a0f8940 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -1689,6 +1689,7 @@ static void macb_tx_restart(struct macb_queue *queue)
 	unsigned int head = queue->tx_head;
 	unsigned int tail = queue->tx_tail;
 	struct macb *bp = queue->bp;
+	unsigned int head_idx, tbqp;
 
 	if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
 		queue_writel(queue, ISR, MACB_BIT(TXUBR));
@@ -1696,6 +1697,13 @@ static void macb_tx_restart(struct macb_queue *queue)
 	if (head == tail)
 		return;
 
+	tbqp = queue_readl(queue, TBQP) / macb_dma_desc_get_size(bp);
+	tbqp = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp, tbqp));
+	head_idx = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp, head));
+
+	if (tbqp == head_idx)
+		return;
+
 	macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(TSTART));
 }
 
-- 
2.35.1


^ permalink raw reply related

* [PATCH AUTOSEL 5.15 13/27] dpaa_eth: Fix missing of_node_put in dpaa_get_ts_info()
From: Sasha Levin @ 2022-04-19 18:12 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Lv Ruyi, Zeal Robot, David S . Miller, Sasha Levin, madalin.bucur,
	kuba, pabeni, netdev
In-Reply-To: <20220419181242.485308-1-sashal@kernel.org>

From: Lv Ruyi <lv.ruyi@zte.com.cn>

[ Upstream commit 1a7eb80d170c28be2928433702256fe2a0bd1e0f ]

Both of of_get_parent() and of_parse_phandle() return node pointer with
refcount incremented, use of_node_put() on it to decrease refcount
when done.

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Lv Ruyi <lv.ruyi@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c b/drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c
index 763d2c7b5fb1..5750f9a56393 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c
@@ -489,11 +489,15 @@ static int dpaa_get_ts_info(struct net_device *net_dev,
 	info->phc_index = -1;
 
 	fman_node = of_get_parent(mac_node);
-	if (fman_node)
+	if (fman_node) {
 		ptp_node = of_parse_phandle(fman_node, "ptimer-handle", 0);
+		of_node_put(fman_node);
+	}
 
-	if (ptp_node)
+	if (ptp_node) {
 		ptp_dev = of_find_device_by_node(ptp_node);
+		of_node_put(ptp_node);
+	}
 
 	if (ptp_dev)
 		ptp = platform_get_drvdata(ptp_dev);
-- 
2.35.1


^ permalink raw reply related

* [PATCH AUTOSEL 5.15 12/27] brcmfmac: sdio: Fix undefined behavior due to shift overflowing the constant
From: Sasha Levin @ 2022-04-19 18:12 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Borislav Petkov, Borislav Petkov, Arend van Spriel, Franky Lin,
	Hante Meuleman, Kalle Valo, David S. Miller, Jakub Kicinski,
	brcm80211-dev-list.pdl, netdev, Arend van Spriel, Sasha Levin,
	pabeni, linus.walleij, mbrugger, tongtiangen, sean.anderson,
	shenyang39, mike.rudenko, angus, linux-wireless,
	SHA-cyfmac-dev-list
In-Reply-To: <20220419181242.485308-1-sashal@kernel.org>

From: Borislav Petkov <bp@alien8.de>

[ Upstream commit 6fb3a5868b2117611f41e421e10e6a8c2a13039a ]

Fix:

  drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c: In function ‘brcmf_sdio_drivestrengthinit’:
  drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c:3798:2: error: case label does not reduce to an integer constant
    case SDIOD_DRVSTR_KEY(BRCM_CC_43143_CHIP_ID, 17):
    ^~~~
  drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c:3809:2: error: case label does not reduce to an integer constant
    case SDIOD_DRVSTR_KEY(BRCM_CC_43362_CHIP_ID, 13):
    ^~~~

See https://lore.kernel.org/r/YkwQ6%2BtIH8GQpuct@zn.tnic for the gory
details as to why it triggers with older gccs only.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Arend van Spriel <aspriel@gmail.com>
Cc: Franky Lin <franky.lin@broadcom.com>
Cc: Hante Meuleman <hante.meuleman@broadcom.com>
Cc: Kalle Valo <kvalo@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: brcm80211-dev-list.pdl@broadcom.com
Cc: netdev@vger.kernel.org
Acked-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/Ykx0iRlvtBnKqtbG@zn.tnic
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
index 5d156e591b35..f7961b22e051 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
@@ -557,7 +557,7 @@ enum brcmf_sdio_frmtype {
 	BRCMF_SDIO_FT_SUB,
 };
 
-#define SDIOD_DRVSTR_KEY(chip, pmu)     (((chip) << 16) | (pmu))
+#define SDIOD_DRVSTR_KEY(chip, pmu)     (((unsigned int)(chip) << 16) | (pmu))
 
 /* SDIO Pad drive strength to select value mappings */
 struct sdiod_drive_str {
-- 
2.35.1


^ permalink raw reply related

* [PATCH AUTOSEL 5.15 11/27] mt76: Fix undefined behavior due to shift overflowing the constant
From: Sasha Levin @ 2022-04-19 18:12 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Borislav Petkov, Felix Fietkau, Lorenzo Bianconi, Ryder Lee,
	Shayne Chen, Sean Wang, Kalle Valo, David S. Miller,
	Jakub Kicinski, linux-wireless, netdev, Sasha Levin, lorenzo,
	pabeni, matthias.bgg, christophe.jaillet, linux-arm-kernel,
	linux-mediatek
In-Reply-To: <20220419181242.485308-1-sashal@kernel.org>

From: Borislav Petkov <bp@suse.de>

[ Upstream commit dbc2b1764734857d68425468ffa8486e97ab89df ]

Fix:

  drivers/net/wireless/mediatek/mt76/mt76x2/pci.c: In function ‘mt76x2e_probe’:
  ././include/linux/compiler_types.h:352:38: error: call to ‘__compiletime_assert_946’ \
	declared with attribute error: FIELD_PREP: mask is not constant
    _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)

See https://lore.kernel.org/r/YkwQ6%2BtIH8GQpuct@zn.tnic for the gory
details as to why it triggers with older gccs only.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Felix Fietkau <nbd@nbd.name>
Cc: Lorenzo Bianconi <lorenzo.bianconi83@gmail.com>
Cc: Ryder Lee <ryder.lee@mediatek.com>
Cc: Shayne Chen <shayne.chen@mediatek.com>
Cc: Sean Wang <sean.wang@mediatek.com>
Cc: Kalle Valo <kvalo@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: linux-wireless@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20220405151517.29753-9-bp@alien8.de
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/net/wireless/mediatek/mt76/mt76x2/pci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/mediatek/mt76/mt76x2/pci.c b/drivers/net/wireless/mediatek/mt76/mt76x2/pci.c
index adf288e50e21..5cd0379d86de 100644
--- a/drivers/net/wireless/mediatek/mt76/mt76x2/pci.c
+++ b/drivers/net/wireless/mediatek/mt76/mt76x2/pci.c
@@ -80,7 +80,7 @@ mt76x2e_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	mt76_rmw_field(dev, 0x15a10, 0x1f << 16, 0x9);
 
 	/* RG_SSUSB_G1_CDR_BIC_LTR = 0xf */
-	mt76_rmw_field(dev, 0x15a0c, 0xf << 28, 0xf);
+	mt76_rmw_field(dev, 0x15a0c, 0xfU << 28, 0xf);
 
 	/* RG_SSUSB_CDR_BR_PE1D = 0x3 */
 	mt76_rmw_field(dev, 0x15c58, 0x3 << 6, 0x3);
-- 
2.35.1


^ permalink raw reply related

* [PATCH AUTOSEL 5.15 10/27] net: atlantic: Avoid out-of-bounds indexing
From: Sasha Levin @ 2022-04-19 18:12 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Kai-Heng Feng, Mario Limonciello, Igor Russkikh, Jakub Kicinski,
	Sasha Levin, davem, pabeni, netdev
In-Reply-To: <20220419181242.485308-1-sashal@kernel.org>

From: Kai-Heng Feng <kai.heng.feng@canonical.com>

[ Upstream commit 8d3a6c37d50d5a0504c126c932cc749e6dd9c78f ]

UBSAN warnings are observed on atlantic driver:
[ 294.432996] UBSAN: array-index-out-of-bounds in /build/linux-Qow4fL/linux-5.15.0/drivers/net/ethernet/aquantia/atlantic/aq_nic.c:484:48
[ 294.433695] index 8 is out of range for type 'aq_vec_s *[8]'

The ring is dereferenced right before breaking out the loop, to prevent
that from happening, only use the index in the loop to fix the issue.

BugLink: https://bugs.launchpad.net/bugs/1958770
Tested-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Reviewed-by: Igor Russkikh <irusskikh@marvell.com>
Link: https://lore.kernel.org/r/20220408022204.16815-1-kai.heng.feng@canonical.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 .../net/ethernet/aquantia/atlantic/aq_nic.c   |  8 +++----
 .../net/ethernet/aquantia/atlantic/aq_vec.c   | 24 +++++++++----------
 2 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_nic.c b/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
index 9de0065f89b9..fbb1e05d5878 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
@@ -480,8 +480,8 @@ int aq_nic_start(struct aq_nic_s *self)
 	if (err < 0)
 		goto err_exit;
 
-	for (i = 0U, aq_vec = self->aq_vec[0];
-		self->aq_vecs > i; ++i, aq_vec = self->aq_vec[i]) {
+	for (i = 0U; self->aq_vecs > i; ++i) {
+		aq_vec = self->aq_vec[i];
 		err = aq_vec_start(aq_vec);
 		if (err < 0)
 			goto err_exit;
@@ -511,8 +511,8 @@ int aq_nic_start(struct aq_nic_s *self)
 		mod_timer(&self->polling_timer, jiffies +
 			  AQ_CFG_POLLING_TIMER_INTERVAL);
 	} else {
-		for (i = 0U, aq_vec = self->aq_vec[0];
-			self->aq_vecs > i; ++i, aq_vec = self->aq_vec[i]) {
+		for (i = 0U; self->aq_vecs > i; ++i) {
+			aq_vec = self->aq_vec[i];
 			err = aq_pci_func_alloc_irq(self, i, self->ndev->name,
 						    aq_vec_isr, aq_vec,
 						    aq_vec_get_affinity_mask(aq_vec));
diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_vec.c b/drivers/net/ethernet/aquantia/atlantic/aq_vec.c
index f4774cf051c9..6ab1f3212d24 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_vec.c
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_vec.c
@@ -43,8 +43,8 @@ static int aq_vec_poll(struct napi_struct *napi, int budget)
 	if (!self) {
 		err = -EINVAL;
 	} else {
-		for (i = 0U, ring = self->ring[0];
-			self->tx_rings > i; ++i, ring = self->ring[i]) {
+		for (i = 0U; self->tx_rings > i; ++i) {
+			ring = self->ring[i];
 			u64_stats_update_begin(&ring[AQ_VEC_RX_ID].stats.rx.syncp);
 			ring[AQ_VEC_RX_ID].stats.rx.polls++;
 			u64_stats_update_end(&ring[AQ_VEC_RX_ID].stats.rx.syncp);
@@ -182,8 +182,8 @@ int aq_vec_init(struct aq_vec_s *self, const struct aq_hw_ops *aq_hw_ops,
 	self->aq_hw_ops = aq_hw_ops;
 	self->aq_hw = aq_hw;
 
-	for (i = 0U, ring = self->ring[0];
-		self->tx_rings > i; ++i, ring = self->ring[i]) {
+	for (i = 0U; self->tx_rings > i; ++i) {
+		ring = self->ring[i];
 		err = aq_ring_init(&ring[AQ_VEC_TX_ID], ATL_RING_TX);
 		if (err < 0)
 			goto err_exit;
@@ -224,8 +224,8 @@ int aq_vec_start(struct aq_vec_s *self)
 	unsigned int i = 0U;
 	int err = 0;
 
-	for (i = 0U, ring = self->ring[0];
-		self->tx_rings > i; ++i, ring = self->ring[i]) {
+	for (i = 0U; self->tx_rings > i; ++i) {
+		ring = self->ring[i];
 		err = self->aq_hw_ops->hw_ring_tx_start(self->aq_hw,
 							&ring[AQ_VEC_TX_ID]);
 		if (err < 0)
@@ -248,8 +248,8 @@ void aq_vec_stop(struct aq_vec_s *self)
 	struct aq_ring_s *ring = NULL;
 	unsigned int i = 0U;
 
-	for (i = 0U, ring = self->ring[0];
-		self->tx_rings > i; ++i, ring = self->ring[i]) {
+	for (i = 0U; self->tx_rings > i; ++i) {
+		ring = self->ring[i];
 		self->aq_hw_ops->hw_ring_tx_stop(self->aq_hw,
 						 &ring[AQ_VEC_TX_ID]);
 
@@ -268,8 +268,8 @@ void aq_vec_deinit(struct aq_vec_s *self)
 	if (!self)
 		goto err_exit;
 
-	for (i = 0U, ring = self->ring[0];
-		self->tx_rings > i; ++i, ring = self->ring[i]) {
+	for (i = 0U; self->tx_rings > i; ++i) {
+		ring = self->ring[i];
 		aq_ring_tx_clean(&ring[AQ_VEC_TX_ID]);
 		aq_ring_rx_deinit(&ring[AQ_VEC_RX_ID]);
 	}
@@ -297,8 +297,8 @@ void aq_vec_ring_free(struct aq_vec_s *self)
 	if (!self)
 		goto err_exit;
 
-	for (i = 0U, ring = self->ring[0];
-		self->tx_rings > i; ++i, ring = self->ring[i]) {
+	for (i = 0U; self->tx_rings > i; ++i) {
+		ring = self->ring[i];
 		aq_ring_free(&ring[AQ_VEC_TX_ID]);
 		if (i < self->rx_rings)
 			aq_ring_free(&ring[AQ_VEC_RX_ID]);
-- 
2.35.1


^ permalink raw reply related

* [PATCH AUTOSEL 5.15 08/27] vxlan: fix error return code in vxlan_fdb_append
From: Sasha Levin @ 2022-04-19 18:12 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Hongbin Wang, David S . Miller, Sasha Levin, kuba, pabeni, netdev
In-Reply-To: <20220419181242.485308-1-sashal@kernel.org>

From: Hongbin Wang <wh_bin@126.com>

[ Upstream commit 7cea5560bf656b84f9ed01c0cc829d4eecd0640b ]

When kmalloc and dst_cache_init failed,
should return ENOMEM rather than ENOBUFS.

Signed-off-by: Hongbin Wang <wh_bin@126.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/net/vxlan.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 141635a35c28..129e270e9a7c 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -711,11 +711,11 @@ static int vxlan_fdb_append(struct vxlan_fdb *f,

 	rd = kmalloc(sizeof(*rd), GFP_ATOMIC);
 	if (rd == NULL)
-		return -ENOBUFS;
+		return -ENOMEM;

 	if (dst_cache_init(&rd->dst_cache, GFP_ATOMIC)) {
 		kfree(rd);
-		return -ENOBUFS;
+		return -ENOMEM;
 	}

 	rd->remote_ip = *ip;
-- 
2.35.1

^ permalink raw reply related

* [PATCH AUTOSEL 5.17 21/34] net: macb: Restart tx only if queue pointer is lagging
From: Sasha Levin @ 2022-04-19 18:10 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Tomas Melin, Claudiu Beznea, Jakub Kicinski, Sasha Levin,
	nicolas.ferre, davem, pabeni, netdev
In-Reply-To: <20220419181104.484667-1-sashal@kernel.org>

From: Tomas Melin <tomas.melin@vaisala.com>

[ Upstream commit 5ad7f18cd82cee8e773d40cc7a1465a526f2615c ]

commit 4298388574da ("net: macb: restart tx after tx used bit read")
added support for restarting transmission. Restarting tx does not work
in case controller asserts TXUBR interrupt and TQBP is already at the end
of the tx queue. In that situation, restarting tx will immediately cause
assertion of another TXUBR interrupt. The driver will end up in an infinite
interrupt loop which it cannot break out of.

For cases where TQBP is at the end of the tx queue, instead
only clear TX_USED interrupt. As more data gets pushed to the queue,
transmission will resume.

This issue was observed on a Xilinx Zynq-7000 based board.
During stress test of the network interface,
driver would get stuck on interrupt loop within seconds or minutes
causing CPU to stall.

Signed-off-by: Tomas Melin <tomas.melin@vaisala.com>
Tested-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Reviewed-by: Claudiu Beznea <claudiu.beznea@microchip.com>
Link: https://lore.kernel.org/r/20220407161659.14532-1-tomas.melin@vaisala.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/net/ethernet/cadence/macb_main.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index d13f06cf0308..c4f4b13ac469 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -1656,6 +1656,7 @@ static void macb_tx_restart(struct macb_queue *queue)
 	unsigned int head = queue->tx_head;
 	unsigned int tail = queue->tx_tail;
 	struct macb *bp = queue->bp;
+	unsigned int head_idx, tbqp;
 
 	if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
 		queue_writel(queue, ISR, MACB_BIT(TXUBR));
@@ -1663,6 +1664,13 @@ static void macb_tx_restart(struct macb_queue *queue)
 	if (head == tail)
 		return;
 
+	tbqp = queue_readl(queue, TBQP) / macb_dma_desc_get_size(bp);
+	tbqp = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp, tbqp));
+	head_idx = macb_adj_dma_desc_idx(bp, macb_tx_ring_wrap(bp, head));
+
+	if (tbqp == head_idx)
+		return;
+
 	macb_writel(bp, NCR, macb_readl(bp, NCR) | MACB_BIT(TSTART));
 }
 
-- 
2.35.1


^ permalink raw reply related

* [PATCH AUTOSEL 5.17 16/34] brcmfmac: sdio: Fix undefined behavior due to shift overflowing the constant
From: Sasha Levin @ 2022-04-19 18:10 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Borislav Petkov, Borislav Petkov, Arend van Spriel, Franky Lin,
	Hante Meuleman, Kalle Valo, David S. Miller, Jakub Kicinski,
	brcm80211-dev-list.pdl, netdev, Arend van Spriel, Sasha Levin,
	pabeni, hdegoede, linus.walleij, mbrugger, wsa+renesas, marcan,
	jiaqing.zhao, shenyang39, angus, mike.rudenko, linux-wireless,
	SHA-cyfmac-dev-list
In-Reply-To: <20220419181104.484667-1-sashal@kernel.org>

From: Borislav Petkov <bp@alien8.de>

[ Upstream commit 6fb3a5868b2117611f41e421e10e6a8c2a13039a ]

Fix:

  drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c: In function ‘brcmf_sdio_drivestrengthinit’:
  drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c:3798:2: error: case label does not reduce to an integer constant
    case SDIOD_DRVSTR_KEY(BRCM_CC_43143_CHIP_ID, 17):
    ^~~~
  drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c:3809:2: error: case label does not reduce to an integer constant
    case SDIOD_DRVSTR_KEY(BRCM_CC_43362_CHIP_ID, 13):
    ^~~~

See https://lore.kernel.org/r/YkwQ6%2BtIH8GQpuct@zn.tnic for the gory
details as to why it triggers with older gccs only.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Arend van Spriel <aspriel@gmail.com>
Cc: Franky Lin <franky.lin@broadcom.com>
Cc: Hante Meuleman <hante.meuleman@broadcom.com>
Cc: Kalle Valo <kvalo@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: brcm80211-dev-list.pdl@broadcom.com
Cc: netdev@vger.kernel.org
Acked-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/Ykx0iRlvtBnKqtbG@zn.tnic
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
index 5d156e591b35..f7961b22e051 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
@@ -557,7 +557,7 @@ enum brcmf_sdio_frmtype {
 	BRCMF_SDIO_FT_SUB,
 };
 
-#define SDIOD_DRVSTR_KEY(chip, pmu)     (((chip) << 16) | (pmu))
+#define SDIOD_DRVSTR_KEY(chip, pmu)     (((unsigned int)(chip) << 16) | (pmu))
 
 /* SDIO Pad drive strength to select value mappings */
 struct sdiod_drive_str {
-- 
2.35.1


^ permalink raw reply related

* [PATCH AUTOSEL 5.17 17/34] dpaa_eth: Fix missing of_node_put in dpaa_get_ts_info()
From: Sasha Levin @ 2022-04-19 18:10 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Lv Ruyi, Zeal Robot, David S . Miller, Sasha Levin, madalin.bucur,
	kuba, pabeni, netdev
In-Reply-To: <20220419181104.484667-1-sashal@kernel.org>

From: Lv Ruyi <lv.ruyi@zte.com.cn>

[ Upstream commit 1a7eb80d170c28be2928433702256fe2a0bd1e0f ]

Both of of_get_parent() and of_parse_phandle() return node pointer with
refcount incremented, use of_node_put() on it to decrease refcount
when done.

Reported-by: Zeal Robot <zealci@zte.com.cn>
Signed-off-by: Lv Ruyi <lv.ruyi@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c b/drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c
index 763d2c7b5fb1..5750f9a56393 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c
@@ -489,11 +489,15 @@ static int dpaa_get_ts_info(struct net_device *net_dev,
 	info->phc_index = -1;
 
 	fman_node = of_get_parent(mac_node);
-	if (fman_node)
+	if (fman_node) {
 		ptp_node = of_parse_phandle(fman_node, "ptimer-handle", 0);
+		of_node_put(fman_node);
+	}
 
-	if (ptp_node)
+	if (ptp_node) {
 		ptp_dev = of_find_device_by_node(ptp_node);
+		of_node_put(ptp_node);
+	}
 
 	if (ptp_dev)
 		ptp = platform_get_drvdata(ptp_dev);
-- 
2.35.1


^ permalink raw reply related

* [PATCH AUTOSEL 5.17 15/34] mt76: Fix undefined behavior due to shift overflowing the constant
From: Sasha Levin @ 2022-04-19 18:10 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Borislav Petkov, Felix Fietkau, Lorenzo Bianconi, Ryder Lee,
	Shayne Chen, Sean Wang, Kalle Valo, David S. Miller,
	Jakub Kicinski, linux-wireless, netdev, Sasha Levin, lorenzo,
	pabeni, matthias.bgg, christophe.jaillet, linux-arm-kernel,
	linux-mediatek
In-Reply-To: <20220419181104.484667-1-sashal@kernel.org>

From: Borislav Petkov <bp@suse.de>

[ Upstream commit dbc2b1764734857d68425468ffa8486e97ab89df ]

Fix:

  drivers/net/wireless/mediatek/mt76/mt76x2/pci.c: In function ‘mt76x2e_probe’:
  ././include/linux/compiler_types.h:352:38: error: call to ‘__compiletime_assert_946’ \
	declared with attribute error: FIELD_PREP: mask is not constant
    _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)

See https://lore.kernel.org/r/YkwQ6%2BtIH8GQpuct@zn.tnic for the gory
details as to why it triggers with older gccs only.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Felix Fietkau <nbd@nbd.name>
Cc: Lorenzo Bianconi <lorenzo.bianconi83@gmail.com>
Cc: Ryder Lee <ryder.lee@mediatek.com>
Cc: Shayne Chen <shayne.chen@mediatek.com>
Cc: Sean Wang <sean.wang@mediatek.com>
Cc: Kalle Valo <kvalo@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: linux-wireless@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: Kalle Valo <kvalo@kernel.org>
Link: https://lore.kernel.org/r/20220405151517.29753-9-bp@alien8.de
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/net/wireless/mediatek/mt76/mt76x2/pci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/mediatek/mt76/mt76x2/pci.c b/drivers/net/wireless/mediatek/mt76/mt76x2/pci.c
index 8a22ee581674..df85ebc6e1df 100644
--- a/drivers/net/wireless/mediatek/mt76/mt76x2/pci.c
+++ b/drivers/net/wireless/mediatek/mt76/mt76x2/pci.c
@@ -80,7 +80,7 @@ mt76x2e_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	mt76_rmw_field(dev, 0x15a10, 0x1f << 16, 0x9);
 
 	/* RG_SSUSB_G1_CDR_BIC_LTR = 0xf */
-	mt76_rmw_field(dev, 0x15a0c, 0xf << 28, 0xf);
+	mt76_rmw_field(dev, 0x15a0c, 0xfU << 28, 0xf);
 
 	/* RG_SSUSB_CDR_BR_PE1D = 0x3 */
 	mt76_rmw_field(dev, 0x15c58, 0x3 << 6, 0x3);
-- 
2.35.1


^ permalink raw reply related

* [PATCH AUTOSEL 5.17 14/34] net: atlantic: Avoid out-of-bounds indexing
From: Sasha Levin @ 2022-04-19 18:10 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Kai-Heng Feng, Mario Limonciello, Igor Russkikh, Jakub Kicinski,
	Sasha Levin, davem, pabeni, netdev
In-Reply-To: <20220419181104.484667-1-sashal@kernel.org>

From: Kai-Heng Feng <kai.heng.feng@canonical.com>

[ Upstream commit 8d3a6c37d50d5a0504c126c932cc749e6dd9c78f ]

UBSAN warnings are observed on atlantic driver:
[ 294.432996] UBSAN: array-index-out-of-bounds in /build/linux-Qow4fL/linux-5.15.0/drivers/net/ethernet/aquantia/atlantic/aq_nic.c:484:48
[ 294.433695] index 8 is out of range for type 'aq_vec_s *[8]'

The ring is dereferenced right before breaking out the loop, to prevent
that from happening, only use the index in the loop to fix the issue.

BugLink: https://bugs.launchpad.net/bugs/1958770
Tested-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Reviewed-by: Igor Russkikh <irusskikh@marvell.com>
Link: https://lore.kernel.org/r/20220408022204.16815-1-kai.heng.feng@canonical.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 .../net/ethernet/aquantia/atlantic/aq_nic.c   |  8 +++----
 .../net/ethernet/aquantia/atlantic/aq_vec.c   | 24 +++++++++----------
 2 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_nic.c b/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
index 33f1a1377588..24d715c28a35 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_nic.c
@@ -486,8 +486,8 @@ int aq_nic_start(struct aq_nic_s *self)
 	if (err < 0)
 		goto err_exit;
 
-	for (i = 0U, aq_vec = self->aq_vec[0];
-		self->aq_vecs > i; ++i, aq_vec = self->aq_vec[i]) {
+	for (i = 0U; self->aq_vecs > i; ++i) {
+		aq_vec = self->aq_vec[i];
 		err = aq_vec_start(aq_vec);
 		if (err < 0)
 			goto err_exit;
@@ -517,8 +517,8 @@ int aq_nic_start(struct aq_nic_s *self)
 		mod_timer(&self->polling_timer, jiffies +
 			  AQ_CFG_POLLING_TIMER_INTERVAL);
 	} else {
-		for (i = 0U, aq_vec = self->aq_vec[0];
-			self->aq_vecs > i; ++i, aq_vec = self->aq_vec[i]) {
+		for (i = 0U; self->aq_vecs > i; ++i) {
+			aq_vec = self->aq_vec[i];
 			err = aq_pci_func_alloc_irq(self, i, self->ndev->name,
 						    aq_vec_isr, aq_vec,
 						    aq_vec_get_affinity_mask(aq_vec));
diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_vec.c b/drivers/net/ethernet/aquantia/atlantic/aq_vec.c
index f4774cf051c9..6ab1f3212d24 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_vec.c
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_vec.c
@@ -43,8 +43,8 @@ static int aq_vec_poll(struct napi_struct *napi, int budget)
 	if (!self) {
 		err = -EINVAL;
 	} else {
-		for (i = 0U, ring = self->ring[0];
-			self->tx_rings > i; ++i, ring = self->ring[i]) {
+		for (i = 0U; self->tx_rings > i; ++i) {
+			ring = self->ring[i];
 			u64_stats_update_begin(&ring[AQ_VEC_RX_ID].stats.rx.syncp);
 			ring[AQ_VEC_RX_ID].stats.rx.polls++;
 			u64_stats_update_end(&ring[AQ_VEC_RX_ID].stats.rx.syncp);
@@ -182,8 +182,8 @@ int aq_vec_init(struct aq_vec_s *self, const struct aq_hw_ops *aq_hw_ops,
 	self->aq_hw_ops = aq_hw_ops;
 	self->aq_hw = aq_hw;
 
-	for (i = 0U, ring = self->ring[0];
-		self->tx_rings > i; ++i, ring = self->ring[i]) {
+	for (i = 0U; self->tx_rings > i; ++i) {
+		ring = self->ring[i];
 		err = aq_ring_init(&ring[AQ_VEC_TX_ID], ATL_RING_TX);
 		if (err < 0)
 			goto err_exit;
@@ -224,8 +224,8 @@ int aq_vec_start(struct aq_vec_s *self)
 	unsigned int i = 0U;
 	int err = 0;
 
-	for (i = 0U, ring = self->ring[0];
-		self->tx_rings > i; ++i, ring = self->ring[i]) {
+	for (i = 0U; self->tx_rings > i; ++i) {
+		ring = self->ring[i];
 		err = self->aq_hw_ops->hw_ring_tx_start(self->aq_hw,
 							&ring[AQ_VEC_TX_ID]);
 		if (err < 0)
@@ -248,8 +248,8 @@ void aq_vec_stop(struct aq_vec_s *self)
 	struct aq_ring_s *ring = NULL;
 	unsigned int i = 0U;
 
-	for (i = 0U, ring = self->ring[0];
-		self->tx_rings > i; ++i, ring = self->ring[i]) {
+	for (i = 0U; self->tx_rings > i; ++i) {
+		ring = self->ring[i];
 		self->aq_hw_ops->hw_ring_tx_stop(self->aq_hw,
 						 &ring[AQ_VEC_TX_ID]);
 
@@ -268,8 +268,8 @@ void aq_vec_deinit(struct aq_vec_s *self)
 	if (!self)
 		goto err_exit;
 
-	for (i = 0U, ring = self->ring[0];
-		self->tx_rings > i; ++i, ring = self->ring[i]) {
+	for (i = 0U; self->tx_rings > i; ++i) {
+		ring = self->ring[i];
 		aq_ring_tx_clean(&ring[AQ_VEC_TX_ID]);
 		aq_ring_rx_deinit(&ring[AQ_VEC_RX_ID]);
 	}
@@ -297,8 +297,8 @@ void aq_vec_ring_free(struct aq_vec_s *self)
 	if (!self)
 		goto err_exit;
 
-	for (i = 0U, ring = self->ring[0];
-		self->tx_rings > i; ++i, ring = self->ring[i]) {
+	for (i = 0U; self->tx_rings > i; ++i) {
+		ring = self->ring[i];
 		aq_ring_free(&ring[AQ_VEC_TX_ID]);
 		if (i < self->rx_rings)
 			aq_ring_free(&ring[AQ_VEC_RX_ID]);
-- 
2.35.1


^ permalink raw reply related

* [PATCH AUTOSEL 5.17 12/34] vxlan: fix error return code in vxlan_fdb_append
From: Sasha Levin @ 2022-04-19 18:10 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Hongbin Wang, David S . Miller, Sasha Levin, kuba, pabeni, netdev
In-Reply-To: <20220419181104.484667-1-sashal@kernel.org>

From: Hongbin Wang <wh_bin@126.com>

[ Upstream commit 7cea5560bf656b84f9ed01c0cc829d4eecd0640b ]

When kmalloc and dst_cache_init failed,
should return ENOMEM rather than ENOBUFS.

Signed-off-by: Hongbin Wang <wh_bin@126.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/net/vxlan.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 359d16780dbb..1bf8f7c35b7d 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -712,11 +712,11 @@ static int vxlan_fdb_append(struct vxlan_fdb *f,

 	rd = kmalloc(sizeof(*rd), GFP_ATOMIC);
 	if (rd == NULL)
-		return -ENOBUFS;
+		return -ENOMEM;

 	if (dst_cache_init(&rd->dst_cache, GFP_ATOMIC)) {
 		kfree(rd);
-		return -ENOBUFS;
+		return -ENOMEM;
 	}

 	rd->remote_ip = *ip;
-- 
2.35.1

^ permalink raw reply related

* Re: [PATCH net] net: sched: act_mirred: Reset ct info when mirror/redirect skb
From: Marcelo Ricardo Leitner @ 2022-04-19 17:26 UTC (permalink / raw)
  To: Eyal Birger
  Cc: Hangbin Liu, netdev, jhs, xiyou.wangcong, jiri, davem, kuba,
	ahleihel, dcaratti, aconole, roid, Shmulik Ladkani
In-Reply-To: <CAHsH6GuZciVLrn7J-DR4S+QU7Xrv422t1kfMyA7r=jADssNw+A@mail.gmail.com>

Hi,

On Tue, Apr 19, 2022 at 07:50:38PM +0300, Eyal Birger wrote:
> Hi,
>
> On Mon, Aug 9, 2021 at 1:29 PM <patchwork-bot+netdevbpf@kernel.org> wrote:
> >
> > Hello:
> >
> > This patch was applied to netdev/net.git (refs/heads/master):
> >
> > On Mon,  9 Aug 2021 15:04:55 +0800 you wrote:
> > > When mirror/redirect a skb to a different port, the ct info should be reset
> > > for reclassification. Or the pkts will match unexpected rules. For example,
> > > with following topology and commands:
> > >
> > >     -----------
> > >               |
> > >        veth0 -+-------
> > >               |
> > >        veth1 -+-------
> > >               |
> > >
> > > [...]
> >
> > Here is the summary with links:
> >   - [net] net: sched: act_mirred: Reset ct info when mirror/redirect skb
> >     https://git.kernel.org/netdev/net/c/d09c548dbf3b
>
> Unfortunately this commit breaks DNAT when performed before going via mirred
> egress->ingress.
>
> The reason is that connection tracking is lost and therefore a new state
> is created on ingress.
>
> This breaks existing setups.
>
> See below a simplified script reproducing this issue.

I guess I can understand why the reproducer triggers it, but I fail to
see the actual use case you have behind it. Can you please elaborate
on it?

>
> Therefore I suggest this commit be reverted and a knob is introduced to mirred
> for clearing ct as needed.
>
> Eyal.
>
> Reproduction script:
>
> #!/bin/bash
>
> ip netns add a
> ip netns add b
>
> ip netns exec a sysctl -w net.ipv4.conf.all.forwarding=1
> ip netns exec a sysctl -w net.ipv4.conf.all.accept_local=1
>
> ip link add veth0 netns a type veth peer name veth0 netns b
> ip -net a link set veth0 up
> ip -net a addr add dev veth0 198.51.100.1/30
>
> ip -net a link add dum0 type dummy
> ip -net a link set dev dum0 up
> ip -net a addr add dev dum0 198.51.100.2/32
>
> ip netns exec a iptables -t nat -I OUTPUT -d 10.0.0.1 -j DNAT
> --to-destination 10.0.0.2
> ip -net a route add default dev dum0
> ip -net a rule add pref 50 iif dum0 lookup 1000
> ip -net a route add table 1000 default dev veth0
>
> ip netns exec a tc qdisc add dev dum0 clsact
> ip netns exec a tc filter add dev dum0 parent ffff:fff3 prio 50 basic
> action mirred ingress redirect dev dum0
>
> ip -net b link set veth0 up
> ip  -net b addr add 10.0.0.2/32 dev veth0
> ip  -net b addr add 198.51.100.3/30 dev veth0
>
> ip netns exec a ping 10.0.0.1
> >
> > You are awesome, thank you!
> > --
> > Deet-doot-dot, I am a bot.
> > https://korg.docs.kernel.org/patchwork/pwbot.html
> >
> >
>


^ permalink raw reply

* Re: [PATCH] brcmfmac: Remove #ifdef guards for PM related functions
From: Paul Cercueil @ 2022-04-19 17:22 UTC (permalink / raw)
  To: Arend van Spriel
  Cc: Arend van Spriel, Franky Lin, Hante Meuleman, Kalle Valo,
	linux-wireless, brcm80211-dev-list.pdl, SHA-cyfmac-dev-list,
	netdev, linux-kernel
In-Reply-To: <afb9ea60-7f93-a646-da82-4f408051c748@broadcom.com>

Hi Arend,

Le lun., avril 18 2022 at 09:09:46 +0200, Arend van Spriel 
<arend.vanspriel@broadcom.com> a écrit :
> On 4/15/2022 10:03 PM, Paul Cercueil wrote:
>> Use the new DEFINE_SIMPLE_DEV_PM_OPS() and pm_sleep_ptr() macros to
>> handle the .suspend/.resume callbacks.
>> 
>> These macros allow the suspend and resume functions to be 
>> automatically
>> dropped by the compiler when CONFIG_SUSPEND is disabled, without 
>> having
>> to use #ifdef guards. The advantage is then that these functions are 
>> not
>> conditionally compiled.
> 
> The advantage stated here may not be obvious to everyone and that is 
> because it only scratches the surface. The code is always compiled 
> independent from the Kconfig options used and because of that the 
> real advantage is that build regressions are easier to catch.

Exactly. I will improve the commit message to make this a bit more 
explicit.

>> Some other functions not directly called by the .suspend/.resume
>> callbacks, but still related to PM were also taken outside #ifdef
>> guards.
> 
> a few comments on this inline...
> 
>> Signed-off-by: Paul Cercueil <paul@crapouillou.net>
>> ---
>>   .../broadcom/brcm80211/brcmfmac/bcmsdh.c      | 44 
>> +++++++------------
>>   .../broadcom/brcm80211/brcmfmac/sdio.c        |  5 +--
>>   .../broadcom/brcm80211/brcmfmac/sdio.h        | 16 -------
>>   3 files changed, 19 insertions(+), 46 deletions(-)
>> 
>> diff --git 
>> a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c 
>> b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c
>> index ac02244a6fdf..a8cf5a570101 100644
>> --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c
>> +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c
> 
> [...]
> 
>> @@ -873,7 +865,8 @@ int brcmf_sdiod_remove(struct brcmf_sdio_dev 
>> *sdiodev)
>>   		sdiodev->bus = NULL;
>>   	}
>>   \x7f-	brcmf_sdiod_freezer_detach(sdiodev);
>> +	if (IS_ENABLED(CONFIG_PM_SLEEP))
>> +		brcmf_sdiod_freezer_detach(sdiodev);
> 
> Please move the if statement inside the function to keep the code 
> flow in the calling function the same as before.
> 
>>   \x7f  	/* Disable Function 2 */
>>   	sdio_claim_host(sdiodev->func2);
>> @@ -949,9 +942,11 @@ int brcmf_sdiod_probe(struct brcmf_sdio_dev 
>> *sdiodev)
>>   		goto out;
>>   	}
>>   \x7f-	ret = brcmf_sdiod_freezer_attach(sdiodev);
>> -	if (ret)
>> -		goto out;
>> +	if (IS_ENABLED(CONFIG_PM_SLEEP)) {
>> +		ret = brcmf_sdiod_freezer_attach(sdiodev);
>> +		if (ret)
>> +			goto out;
>> +	}
> 
> Dito. Move the if statement inside the function.

Sure.

Cheers,
-Paul

> 
>>   \x7f  	/* try to attach to the target device */
>>   	sdiodev->bus = brcmf_sdio_probe(sdiodev);



^ permalink raw reply

* [RFC PATCH 1/2] tcp: allow MPTCP to update the announced window.
From: Paolo Abeni @ 2022-04-19 17:18 UTC (permalink / raw)
  To: netdev; +Cc: Eric Dumazet, mptcp
In-Reply-To: <cover.1650386197.git.pabeni@redhat.com>

The MPTCP RFC requires that the MPTCP-level receive window's
right edge never moves backward. Currently the MPTCP code
enforces such constraint while tracking the right edge, but it
does not reflects the constraint back on the wire, as lacks a
suitable hook to update accordingly the TCP header.

This change modifies the existing mptcp_write_options() hook,
providing the current packet's TCP header up to the MPTCP protocol
level, so that the next patch could implement the above mentioned
constraint.

No functional changes intended.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 include/net/mptcp.h   |  2 +-
 net/ipv4/tcp_output.c | 14 ++++++++------
 net/mptcp/options.c   |  2 +-
 3 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/include/net/mptcp.h b/include/net/mptcp.h
index 0a3b0fb04a3b..ded4359b15be 100644
--- a/include/net/mptcp.h
+++ b/include/net/mptcp.h
@@ -124,7 +124,7 @@ bool mptcp_established_options(struct sock *sk, struct sk_buff *skb,
 			       struct mptcp_out_options *opts);
 bool mptcp_incoming_options(struct sock *sk, struct sk_buff *skb);
 
-void mptcp_write_options(__be32 *ptr, const struct tcp_sock *tp,
+void mptcp_write_options(struct tcphdr *th, __be32 *ptr, struct tcp_sock *tp,
 			 struct mptcp_out_options *opts);
 
 void mptcp_diag_fill_info(struct mptcp_sock *msk, struct mptcp_info *info);
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index c221f3bce975..7d64c3fc5d40 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -444,12 +444,13 @@ struct tcp_out_options {
 	struct mptcp_out_options mptcp;
 };
 
-static void mptcp_options_write(__be32 *ptr, const struct tcp_sock *tp,
+static void mptcp_options_write(struct tcphdr *th, __be32 *ptr,
+				struct tcp_sock *tp,
 				struct tcp_out_options *opts)
 {
 #if IS_ENABLED(CONFIG_MPTCP)
 	if (unlikely(OPTION_MPTCP & opts->options))
-		mptcp_write_options(ptr, tp, &opts->mptcp);
+		mptcp_write_options(th, ptr, tp, &opts->mptcp);
 #endif
 }
 
@@ -605,9 +606,10 @@ static void bpf_skops_write_hdr_opt(struct sock *sk, struct sk_buff *skb,
  * At least SACK_PERM as the first option is known to lead to a disaster
  * (but it may well be that other scenarios fail similarly).
  */
-static void tcp_options_write(__be32 *ptr, struct tcp_sock *tp,
+static void tcp_options_write(struct tcphdr *th, struct tcp_sock *tp,
 			      struct tcp_out_options *opts)
 {
+	__be32 *ptr = (__be32 *)(th + 1);
 	u16 options = opts->options;	/* mungable copy */
 
 	if (unlikely(OPTION_MD5 & options)) {
@@ -701,7 +703,7 @@ static void tcp_options_write(__be32 *ptr, struct tcp_sock *tp,
 
 	smc_options_write(ptr, &options);
 
-	mptcp_options_write(ptr, tp, opts);
+	mptcp_options_write(th, ptr, tp, opts);
 }
 
 static void smc_set_option(const struct tcp_sock *tp,
@@ -1354,7 +1356,7 @@ static int __tcp_transmit_skb(struct sock *sk, struct sk_buff *skb,
 		th->window	= htons(min(tp->rcv_wnd, 65535U));
 	}
 
-	tcp_options_write((__be32 *)(th + 1), tp, &opts);
+	tcp_options_write(th, tp, &opts);
 
 #ifdef CONFIG_TCP_MD5SIG
 	/* Calculate the MD5 hash, as we have all we need now */
@@ -3590,7 +3592,7 @@ struct sk_buff *tcp_make_synack(const struct sock *sk, struct dst_entry *dst,
 
 	/* RFC1323: The window in SYN & SYN/ACK segments is never scaled. */
 	th->window = htons(min(req->rsk_rcv_wnd, 65535U));
-	tcp_options_write((__be32 *)(th + 1), NULL, &opts);
+	tcp_options_write(th, NULL, &opts);
 	th->doff = (tcp_header_size >> 2);
 	__TCP_INC_STATS(sock_net(sk), TCP_MIB_OUTSEGS);
 
diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index 325383646f5c..9d6f14b496df 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -1265,7 +1265,7 @@ static u16 mptcp_make_csum(const struct mptcp_ext *mpext)
 				 ~csum_unfold(mpext->csum));
 }
 
-void mptcp_write_options(__be32 *ptr, const struct tcp_sock *tp,
+void mptcp_write_options(struct tcphdr *th, __be32 *ptr, struct tcp_sock *tp,
 			 struct mptcp_out_options *opts)
 {
 	const struct sock *ssk = (const struct sock *)tp;
-- 
2.35.1


^ permalink raw reply related

* [RFC PATCH 2/2] mptcp: never shrink offered window
From: Paolo Abeni @ 2022-04-19 17:18 UTC (permalink / raw)
  To: netdev; +Cc: Eric Dumazet, mptcp
In-Reply-To: <cover.1650386197.git.pabeni@redhat.com>

As per RFC, the offered MPTCP-level window should never shrink.
While we currently track the right edge, we don't enforce the
above constraint on the wire.
Additionally, concurrent xmit on different subflows can end-up in
erroneous right edge update.
Address the above explicitly updating the announced window and
protecting the update with an additional atomic operation (sic)

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/mptcp/options.c | 52 +++++++++++++++++++++++++++++++++++++++------
 1 file changed, 45 insertions(+), 7 deletions(-)

diff --git a/net/mptcp/options.c b/net/mptcp/options.c
index 9d6f14b496df..86d67ad41266 100644
--- a/net/mptcp/options.c
+++ b/net/mptcp/options.c
@@ -1224,20 +1224,58 @@ bool mptcp_incoming_options(struct sock *sk, struct sk_buff *skb)
 	return true;
 }
 
-static void mptcp_set_rwin(const struct tcp_sock *tp)
+static void mptcp_set_rwin(struct tcp_sock *tp, struct tcphdr *th)
 {
 	const struct sock *ssk = (const struct sock *)tp;
-	const struct mptcp_subflow_context *subflow;
+	struct mptcp_subflow_context *subflow;
+	u64 ack_seq, rcv_wnd_old, rcv_wnd_new;
 	struct mptcp_sock *msk;
-	u64 ack_seq;
+	u32 new_win;
+	u64 win;
 
 	subflow = mptcp_subflow_ctx(ssk);
 	msk = mptcp_sk(subflow->conn);
 
-	ack_seq = READ_ONCE(msk->ack_seq) + tp->rcv_wnd;
+	ack_seq = READ_ONCE(msk->ack_seq);
+	rcv_wnd_new = ack_seq + tp->rcv_wnd;
+
+	rcv_wnd_old = READ_ONCE(msk->rcv_wnd_sent);
+	if (after64(rcv_wnd_new, rcv_wnd_old)) {
+		u64 rcv_wnd;
+
+		for (;;) {
+			rcv_wnd = cmpxchg64(&msk->rcv_wnd_sent, rcv_wnd_old, rcv_wnd_new);
+
+			if (rcv_wnd == rcv_wnd_old)
+				break;
+			if (before64(rcv_wnd_new, rcv_wnd))
+				goto raise_win;
+			rcv_wnd_old = rcv_wnd;
+		};
+		return;
+	}
+
+	if (rcv_wnd_new != rcv_wnd_old) {
+raise_win:
+		win = rcv_wnd_old - ack_seq;
+		tp->rcv_wnd = min_t(u64, win, U32_MAX);
+		new_win = tp->rcv_wnd;
 
-	if (after64(ack_seq, READ_ONCE(msk->rcv_wnd_sent)))
-		WRITE_ONCE(msk->rcv_wnd_sent, ack_seq);
+		/* Make sure we do not exceed the maximum possible
+		 * scaled window.
+		 */
+		if (unlikely(th->syn))
+			new_win = min(new_win, 65535U) << tp->rx_opt.rcv_wscale;
+		if (!tp->rx_opt.rcv_wscale &&
+		    sock_net(ssk)->ipv4.sysctl_tcp_workaround_signed_windows)
+			new_win = min(new_win, MAX_TCP_WINDOW);
+		else
+			new_win = min(new_win, (65535U << tp->rx_opt.rcv_wscale));
+
+		/* RFC1323 scaling applied */
+		new_win >>= tp->rx_opt.rcv_wscale;
+		th->window = htons(new_win);
+	}
 }
 
 u16 __mptcp_make_csum(u64 data_seq, u32 subflow_seq, u16 data_len, __wsum sum)
@@ -1550,7 +1588,7 @@ void mptcp_write_options(struct tcphdr *th, __be32 *ptr, struct tcp_sock *tp,
 	}
 
 	if (tp)
-		mptcp_set_rwin(tp);
+		mptcp_set_rwin(tp, th);
 }
 
 __be32 mptcp_get_reset_option(const struct sk_buff *skb)
-- 
2.35.1


^ permalink raw reply related

* [RFC PATCH 0/2] mptcp: never shrink offered window
From: Paolo Abeni @ 2022-04-19 17:18 UTC (permalink / raw)
  To: netdev; +Cc: Eric Dumazet, mptcp

There patches are part of a somewhat largish series to implement
more strict RFC compliance for the MPTCP-level congestion window
handling, solving some tput instability issues.

The obtain the above we need to modify an existing MPTCP hook,
touching the TCP code (in patch 1/2), ence the RFC.

The later patch demonstrates the actual usage of such hook.

Paolo Abeni (2):
  tcp: allow MPTCP to update the announced window.
  mptcp: never shrink offered window

 include/net/mptcp.h   |  2 +-
 net/ipv4/tcp_output.c | 14 ++++++-----
 net/mptcp/options.c   | 54 ++++++++++++++++++++++++++++++++++++-------
 3 files changed, 55 insertions(+), 15 deletions(-)

-- 
2.35.1

^ permalink raw reply

* [patch iproute2-next] devlink: introduce -h[ex] cmdline option to allow dumping numbers in hex format
From: Jiri Pirko @ 2022-04-19 17:16 UTC (permalink / raw)
  To: netdev; +Cc: sthemmin, dsahern

From: Jiri Pirko <jiri@nvidia.com>

For health reporter dumps it is quite convenient to have the numbers in
hexadecimal format. Introduce a command line option to allow user to
achieve that output.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
---
 devlink/devlink.c  | 19 +++++++++++++------
 man/man8/devlink.8 |  4 ++++
 2 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/devlink/devlink.c b/devlink/devlink.c
index aab739f7f437..bc681737ec8a 100644
--- a/devlink/devlink.c
+++ b/devlink/devlink.c
@@ -367,6 +367,7 @@ struct dl {
 	bool pretty_output;
 	bool verbose;
 	bool stats;
+	bool hex;
 	struct {
 		bool present;
 		char *bus_name;
@@ -8044,6 +8045,8 @@ static int cmd_health_dump_clear(struct dl *dl)
 
 static int fmsg_value_show(struct dl *dl, int type, struct nlattr *nl_data)
 {
+	const char *num_fmt = dl->hex ? "%x" : "%u";
+	const char *num64_fmt = dl->hex ? "%"PRIx64 : "%"PRIu64;
 	uint8_t *data;
 	uint32_t len;
 
@@ -8053,16 +8056,16 @@ static int fmsg_value_show(struct dl *dl, int type, struct nlattr *nl_data)
 		print_bool(PRINT_ANY, NULL, "%s", mnl_attr_get_u8(nl_data));
 		break;
 	case MNL_TYPE_U8:
-		print_uint(PRINT_ANY, NULL, "%u", mnl_attr_get_u8(nl_data));
+		print_uint(PRINT_ANY, NULL, num_fmt, mnl_attr_get_u8(nl_data));
 		break;
 	case MNL_TYPE_U16:
-		print_uint(PRINT_ANY, NULL, "%u", mnl_attr_get_u16(nl_data));
+		print_uint(PRINT_ANY, NULL, num_fmt, mnl_attr_get_u16(nl_data));
 		break;
 	case MNL_TYPE_U32:
-		print_uint(PRINT_ANY, NULL, "%u", mnl_attr_get_u32(nl_data));
+		print_uint(PRINT_ANY, NULL, num_fmt, mnl_attr_get_u32(nl_data));
 		break;
 	case MNL_TYPE_U64:
-		print_u64(PRINT_ANY, NULL, "%"PRIu64, mnl_attr_get_u64(nl_data));
+		print_u64(PRINT_ANY, NULL, num64_fmt, mnl_attr_get_u64(nl_data));
 		break;
 	case MNL_TYPE_NUL_STRING:
 		print_string(PRINT_ANY, NULL, "%s", mnl_attr_get_str(nl_data));
@@ -8939,7 +8942,7 @@ static void help(void)
 	pr_err("Usage: devlink [ OPTIONS ] OBJECT { COMMAND | help }\n"
 	       "       devlink [ -f[orce] ] -b[atch] filename -N[etns] netnsname\n"
 	       "where  OBJECT := { dev | port | sb | monitor | dpipe | resource | region | health | trap }\n"
-	       "       OPTIONS := { -V[ersion] | -n[o-nice-names] | -j[son] | -p[retty] | -v[erbose] -s[tatistics] }\n");
+	       "       OPTIONS := { -V[ersion] | -n[o-nice-names] | -j[son] | -p[retty] | -v[erbose] -s[tatistics] -h[ex] }\n");
 }
 
 static int dl_cmd(struct dl *dl, int argc, char **argv)
@@ -9053,6 +9056,7 @@ int main(int argc, char **argv)
 		{ "statistics",		no_argument,		NULL, 's' },
 		{ "Netns",		required_argument,	NULL, 'N' },
 		{ "iec",		no_argument,		NULL, 'i' },
+		{ "hex",		no_argument,		NULL, 'h' },
 		{ NULL, 0, NULL, 0 }
 	};
 	const char *batch_file = NULL;
@@ -9068,7 +9072,7 @@ int main(int argc, char **argv)
 		return EXIT_FAILURE;
 	}
 
-	while ((opt = getopt_long(argc, argv, "Vfb:njpvsN:i",
+	while ((opt = getopt_long(argc, argv, "Vfb:njpvsN:ih",
 				  long_options, NULL)) >= 0) {
 
 		switch (opt) {
@@ -9106,6 +9110,9 @@ int main(int argc, char **argv)
 		case 'i':
 			use_iec = true;
 			break;
+		case 'h':
+			dl->hex = true;
+			break;
 		default:
 			pr_err("Unknown option.\n");
 			help();
diff --git a/man/man8/devlink.8 b/man/man8/devlink.8
index 840cf44cf97b..3797a27cefc5 100644
--- a/man/man8/devlink.8
+++ b/man/man8/devlink.8
@@ -63,6 +63,10 @@ Switches to the specified network namespace.
 .BR "\-i", " --iec"
 Print human readable rates in IEC units (e.g. 1Ki = 1024).
 
+.TP
+.BR "\-h", " --hex"
+Print dump numbers in hexadecimal format.
+
 .SS
 .I OBJECT
 
-- 
2.35.1


^ permalink raw reply related

* [patch iproute2-main] devlink: fix "devlink health dump" command without arg
From: Jiri Pirko @ 2022-04-19 17:15 UTC (permalink / raw)
  To: netdev; +Cc: sthemmin, dsahern

From: Jiri Pirko <jiri@nvidia.com>

Fix bug when user calls "devlink health dump" without "show" or "clear":
$ devlink health dump
Command "(null)" not found

Put the dump command into a separate helper as it is usual in the rest
of the code. Also, treat no cmd as "show", as it is common for other
devlink objects.

Fixes: 041e6e651a8e ("devlink: Add devlink health dump show command")
Signed-off-by: Jiri Pirko <jiri@nvidia.com>
---
 devlink/devlink.c | 25 ++++++++++++++++++-------
 1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/devlink/devlink.c b/devlink/devlink.c
index da9f97788bcf..aab739f7f437 100644
--- a/devlink/devlink.c
+++ b/devlink/devlink.c
@@ -8526,6 +8526,23 @@ static void cmd_health_help(void)
 	pr_err("                          [ auto_dump    { true | false } ]\n");
 }
 
+static int cmd_health_dump(struct dl *dl)
+{
+	if (dl_argv_match(dl, "help")) {
+		cmd_health_help();
+		return 0;
+	} else if (dl_argv_match(dl, "show") ||
+		   dl_argv_match(dl, "list") || dl_no_arg(dl)) {
+		dl_arg_inc(dl);
+		return cmd_health_dump_show(dl);
+	} else if (dl_argv_match(dl, "clear")) {
+		dl_arg_inc(dl);
+		return cmd_health_dump_clear(dl);
+	}
+	pr_err("Command \"%s\" not found\n", dl_argv(dl));
+	return -ENOENT;
+}
+
 static int cmd_health(struct dl *dl)
 {
 	if (dl_argv_match(dl, "help")) {
@@ -8546,13 +8563,7 @@ static int cmd_health(struct dl *dl)
 		return cmd_health_test(dl);
 	} else if (dl_argv_match(dl, "dump")) {
 		dl_arg_inc(dl);
-		if (dl_argv_match(dl, "show")) {
-			dl_arg_inc(dl);
-			return cmd_health_dump_show(dl);
-		} else if (dl_argv_match(dl, "clear")) {
-			dl_arg_inc(dl);
-			return cmd_health_dump_clear(dl);
-		}
+		return cmd_health_dump(dl);
 	} else if (dl_argv_match(dl, "set")) {
 		dl_arg_inc(dl);
 		return cmd_health_set_params(dl);
-- 
2.35.1


^ permalink raw reply related

* Re: [PATCH bpf-next v2] bpf: move rcu lock management out of BPF_PROG_RUN routines
From: Alexei Starovoitov @ 2022-04-19 17:05 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Network Development, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau
In-Reply-To: <CAKH8qBugYgnqj3wfOi4+974XCTH2hoGbzq43hve1_NjDwNO_eQ@mail.gmail.com>

On Tue, Apr 19, 2022 at 10:01 AM Stanislav Fomichev <sdf@google.com> wrote:
>
> On Tue, Apr 19, 2022 at 9:49 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Tue, Apr 19, 2022 at 9:35 AM Stanislav Fomichev <sdf@google.com> wrote:
> > >
> > > On Tue, Apr 19, 2022 at 9:32 AM Alexei Starovoitov
> > > <alexei.starovoitov@gmail.com> wrote:
> > > >
> > > > On Tue, Apr 19, 2022 at 9:20 AM Alexei Starovoitov
> > > > <alexei.starovoitov@gmail.com> wrote:
> > > > >
> > > > > On Tue, Apr 19, 2022 at 8:42 AM <sdf@google.com> wrote:
> > > > > >
> > > > > > On 04/18, Alexei Starovoitov wrote:
> > > > > > > On Mon, Apr 18, 2022 at 9:50 AM <sdf@google.com> wrote:
> > > > > > > >
> > > > > > > > On 04/16, Alexei Starovoitov wrote:
> > > > > > > > > On Thu, Apr 14, 2022 at 9:12 AM Stanislav Fomichev <sdf@google.com>
> > > > > > > wrote:
> > > > > > > > > > +static int
> > > > > > > > > > +bpf_prog_run_array_cg_flags(const struct cgroup_bpf *cgrp,
> > > > > > > > > > +                           enum cgroup_bpf_attach_type atype,
> > > > > > > > > > +                           const void *ctx, bpf_prog_run_fn
> > > > > > > run_prog,
> > > > > > > > > > +                           int retval, u32 *ret_flags)
> > > > > > > > > > +{
> > > > > > > > > > +       const struct bpf_prog_array_item *item;
> > > > > > > > > > +       const struct bpf_prog *prog;
> > > > > > > > > > +       const struct bpf_prog_array *array;
> > > > > > > > > > +       struct bpf_run_ctx *old_run_ctx;
> > > > > > > > > > +       struct bpf_cg_run_ctx run_ctx;
> > > > > > > > > > +       u32 func_ret;
> > > > > > > > > > +
> > > > > > > > > > +       run_ctx.retval = retval;
> > > > > > > > > > +       migrate_disable();
> > > > > > > > > > +       rcu_read_lock();
> > > > > > > > > > +       array = rcu_dereference(cgrp->effective[atype]);
> > > > > > > > > > +       item = &array->items[0];
> > > > > > > > > > +       old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
> > > > > > > > > > +       while ((prog = READ_ONCE(item->prog))) {
> > > > > > > > > > +               run_ctx.prog_item = item;
> > > > > > > > > > +               func_ret = run_prog(prog, ctx);
> > > > > > > > > ...
> > > > > > > > > > +       ret = bpf_prog_run_array_cg(&cgrp->bpf, CGROUP_GETSOCKOPT,
> > > > > > > > > >                                     &ctx, bpf_prog_run, retval);
> > > > > > > >
> > > > > > > > > Did you check the asm that bpf_prog_run gets inlined
> > > > > > > > > after being passed as a pointer to a function?
> > > > > > > > > Crossing fingers... I suspect not every compiler can do that :(
> > > > > > > > > De-virtualization optimization used to be tricky.
> > > > > > > >
> > > > > > > > No, I didn't, but looking at it right now, both gcc and clang
> > > > > > > > seem to be doing inlining all way up to bpf_dispatcher_nop_func.
> > > > > > > >
> > > > > > > > clang:
> > > > > > > >
> > > > > > > >    0000000000001750 <__cgroup_bpf_run_filter_sock_addr>:
> > > > > > > >    __cgroup_bpf_run_filter_sock_addr():
> > > > > > > >    ./kernel/bpf/cgroup.c:1226
> > > > > > > >    int __cgroup_bpf_run_filter_sock_addr(struct sock *sk,
> > > > > > > >                                       struct sockaddr *uaddr,
> > > > > > > >                                       enum cgroup_bpf_attach_type atype,
> > > > > > > >                                       void *t_ctx,
> > > > > > > >                                       u32 *flags)
> > > > > > > >    {
> > > > > > > >
> > > > > > > >    ...
> > > > > > > >
> > > > > > > >    ./include/linux/filter.h:628
> > > > > > > >                 ret = dfunc(ctx, prog->insnsi, prog->bpf_func);
> > > > > > > >        1980:    49 8d 75 48             lea    0x48(%r13),%rsi
> > > > > > > >    bpf_dispatcher_nop_func():
> > > > > > > >    ./include/linux/bpf.h:804
> > > > > > > >         return bpf_func(ctx, insnsi);
> > > > > > > >        1984:    4c 89 f7                mov    %r14,%rdi
> > > > > > > >        1987:    41 ff 55 30             call   *0x30(%r13)
> > > > > > > >        198b:    89 c3                   mov    %eax,%ebx
> > > > > > > >
> > > > > > > > gcc (w/retpoline):
> > > > > > > >
> > > > > > > >    0000000000001110 <__cgroup_bpf_run_filter_sock_addr>:
> > > > > > > >    __cgroup_bpf_run_filter_sock_addr():
> > > > > > > >    kernel/bpf/cgroup.c:1226
> > > > > > > >    {
> > > > > > > >
> > > > > > > >    ...
> > > > > > > >
> > > > > > > >    ./include/linux/filter.h:628
> > > > > > > >                 ret = dfunc(ctx, prog->insnsi, prog->bpf_func);
> > > > > > > >        11c5:    49 8d 75 48             lea    0x48(%r13),%rsi
> > > > > > > >    bpf_dispatcher_nop_func():
> > > > > > > >    ./include/linux/bpf.h:804
> > > > > > > >        11c9:    48 8d 7c 24 10          lea    0x10(%rsp),%rdi
> > > > > > > >        11ce:    e8 00 00 00 00          call   11d3
> > > > > > > > <__cgroup_bpf_run_filter_sock_addr+0xc3>
> > > > > > > >                         11cf: R_X86_64_PLT32
> > > > > > > __x86_indirect_thunk_rax-0x4
> > > > > > > >        11d3:    89 c3                   mov    %eax,%ebx
> > > > > >
> > > > > > > Hmm. I'm not sure how you've got this asm.
> > > > > > > Here is what I see with gcc 8 and gcc 10:
> > > > > > > bpf_prog_run_array_cg:
> > > > > > > ...
> > > > > > >          movq    %rcx, %r12      # run_prog, run_prog
> > > > > > > ...
> > > > > > > # ../kernel/bpf/cgroup.c:77:            run_ctx.prog_item = item;
> > > > > > >          movq    %rbx, (%rsp)    # item, run_ctx.prog_item
> > > > > > > # ../kernel/bpf/cgroup.c:78:            if (!run_prog(prog, ctx) &&
> > > > > > > !IS_ERR_VALUE((long)run_ctx.retval))
> > > > > > >          movq    %rbp, %rsi      # ctx,
> > > > > > >          call    *%r12   # run_prog
> > > > > >
> > > > > > > __cgroup_bpf_run_filter_sk:
> > > > > > >          movq    $bpf_prog_run, %rcx     #,
> > > > > > > # ../kernel/bpf/cgroup.c:1202:  return
> > > > > > > bpf_prog_run_array_cg(&cgrp->bpf, atype, sk, bpf_prog_run, 0);
> > > > > > >          leaq    1520(%rax), %rdi        #, tmp92
> > > > > > > # ../kernel/bpf/cgroup.c:1202:  return
> > > > > > > bpf_prog_run_array_cg(&cgrp->bpf, atype, sk, bpf_prog_run, 0);
> > > > > > >          jmp     bpf_prog_run_array_cg   #
> > > > > >
> > > > > > > This is without kasan, lockdep and all debug configs are off.
> > > > > >
> > > > > > > So the generated code is pretty bad as I predicted :(
> > > > > >
> > > > > > > So I'm afraid this approach is no go.
> > > > > >
> > > > > > I've retested again and it still unrolls it for me on gcc 11 :-/
> > > > > > Anyway, I guess we have two options:
> > > > > >
> > > > > > 1. Go back to defines.
> > > > > > 2. Don't pass a ptr to func, but pass an enum which indicates whether
> > > > > >     to use bpf_prog_run or __bpf_prog_run_save_cb. Seems like in this
> > > > > >     case the compiler shouldn't have any trouble unwrapping it?
> > > > > >
> > > > > > I'll prototype and send (2). If it won't work out we can always get back
> > > > > > to (1).
> > > > >
> > > > > Going back to defines is probably not necessary.
> > > > > Could you try moving bpf_prog_run_array_cg*() back to .h
> > > > > and use static __always_inline ?
> > > >
> > > > Actually below was enough for gcc 8 and 10:
> > > > -static int
> > > > +static __always_inline int
> > > >  bpf_prog_run_array_cg_flags(const struct cgroup_bpf *cgrp,
> > > >                             enum cgroup_bpf_attach_type atype,
> > > >                             const void *ctx, bpf_prog_run_fn run_prog,
> > > > @@ -55,7 +55,7 @@ bpf_prog_run_array_cg_flags(const struct cgroup_bpf *cgrp,
> > > >         return run_ctx.retval;
> > > >  }
> > > >
> > > > -static int
> > > > +static __always_inline int
> > > >  bpf_prog_run_array_cg(const struct cgroup_bpf *cgrp,
> > > >
> > > > we can keep them in .c and generated code looks good.
> > > >
> > > > I can apply it with the above change.
> > > > wdyt?
> > >
> > > Sure, let's go with that if it works! On my side, I managed to get the
> > > same bad results on gcc-8; moving them to bpf-cgroup.h with
> > > __always_inline seems to fix it. But if we can keep them in .c, that
> > > looks even better.
> >
> > Ok. Applied.
> >
> > As the next step... can we combine bpf_prog_run_array_cg*()
> > into one function?
> > The only difference is:
> > func_ret = run_prog(prog, ctx);
> > if (!(func_ret & 1)
> >   vs
> > if (!run_prog(prog, ctx)
> >
> > afaik we don't have a check on the verifier side for possible
> > return values of cgroup progs,
> > so it might break some progs if we just do the former
> > in both cases?
>
> Seems like we do have return ranges checking for cgroup progs (I'm
> looking at check_return_code). Using bpf_prog_run_array_cg_flags
> everywhere seems possible, I can try and post some patches if it
> works.

Thanks!
Since it's always_inline extra 'if (ret_flags) *ret_flags = ...'
in the critical path is fine, since it will be optimized out
by the compiler when ret_flags are NULL.
Hopefully the compiler can hoist the check out of the while loop too
in case it's not NULL. But no big deal.
That array has typically only one prog in it.

^ permalink raw reply

* RE: [PATCH 2/2] Trigger proper interrupts in igc_xsk_wakeup
From: Vinicius Costa Gomes @ 2022-04-19 17:03 UTC (permalink / raw)
  To: Jeff Evanson, Jeff Evanson, Jesse Brandeburg, Tony Nguyen,
	David S. Miller, Jakub Kicinski, intel-wired-lan@lists.osuosl.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org
  Cc: jeff.evanson@gmail.com
In-Reply-To: <CY4PR16MB002318405355FB6AC87CA0919BF39@CY4PR16MB0023.namprd16.prod.outlook.com>

Hi Jeff,

Jeff Evanson <Jeff.Evanson@qsc.com> writes:

> Hi Vinicius. Thank you for the reply.
>
> The scenario only happens when the transmit queue interrupt is mapped
> to a different interrupt from the receive queue. In the case where
> XDP_WAKEUP_TX is set in the flags argument, the previous code would
> only trigger the interrupt for the receive queue, causing the transmit
> queue's napi_struct to never be scheduled.

This is a good explanation, adding it to the commit message for the v2
would be a good idea.

>
> In the scenario where XDP_WAKEUP_TX and XDP_WAKEUP_RX are both set in
> flags, the receive interrupt is always triggered and the transmit
> interrupt is only triggered when the transmit q_vector differs from
> the receive q_vector.

I missed that condition when I looked at the patch. Sorry about that.
Makes more sense now.

Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>

>
> Regards,
> Jeff Evanson
>
> -----Original Message-----
> From: Vinicius Costa Gomes <vinicius.gomes@intel.com> 
> Sent: Monday, April 18, 2022 11:45 AM
> To: Jeff Evanson <jeff.evanson@gmail.com>; Jesse Brandeburg <jesse.brandeburg@intel.com>; Tony Nguyen <anthony.l.nguyen@intel.com>; David S. Miller <davem@davemloft.net>; Jakub Kicinski <kuba@kernel.org>; intel-wired-lan@lists.osuosl.org; netdev@vger.kernel.org; linux-kernel@vger.kernel.org
> Cc: Jeff Evanson <Jeff.Evanson@qsc.com>; jeff.evanson@gmail.com
> Subject: Re: [PATCH 2/2] Trigger proper interrupts in igc_xsk_wakeup
>
> -External-
>
> Jeff Evanson <jeff.evanson@gmail.com> writes:
>
>> in igc_xsk_wakeup, trigger the proper interrupt based on whether flags 
>> contains XDP_WAKEUP_RX and/or XDP_WAKEUP_TX
>>
>> Signed-off-by: Jeff Evanson <jeff.evanson@qsc.com>
>> ---
>>  drivers/net/ethernet/intel/igc/igc_main.c | 36 
>> +++++++++++++++++------
>>  1 file changed, 27 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/intel/igc/igc_main.c 
>> b/drivers/net/ethernet/intel/igc/igc_main.c
>> index a36a18c84aeb..d706de95dc06 100644
>> --- a/drivers/net/ethernet/intel/igc/igc_main.c
>> +++ b/drivers/net/ethernet/intel/igc/igc_main.c
>> @@ -6073,7 +6073,7 @@ static void igc_trigger_rxtxq_interrupt(struct 
>> igc_adapter *adapter,  int igc_xsk_wakeup(struct net_device *dev, u32 
>> queue_id, u32 flags)  {
>>  	struct igc_adapter *adapter = netdev_priv(dev);
>> -	struct igc_q_vector *q_vector;
>> +	struct igc_q_vector *txq_vector = 0, *rxq_vector = 0;
>>  	struct igc_ring *ring;
>>  
>>  	if (test_bit(__IGC_DOWN, &adapter->state)) @@ -6082,17 +6082,35 @@ 
>> int igc_xsk_wakeup(struct net_device *dev, u32 queue_id, u32 flags)
>>  	if (!igc_xdp_is_enabled(adapter))
>>  		return -ENXIO;
>>  
>> -	if (queue_id >= adapter->num_rx_queues)
>> -		return -EINVAL;
>> +	if (flags & XDP_WAKEUP_RX) {
>> +		if (queue_id >= adapter->num_rx_queues)
>> +			return -EINVAL;
>>  
>> -	ring = adapter->rx_ring[queue_id];
>> +		ring = adapter->rx_ring[queue_id];
>> +		if (!ring->xsk_pool)
>> +			return -ENXIO;
>>  
>> -	if (!ring->xsk_pool)
>> -		return -ENXIO;
>> +		rxq_vector = ring->q_vector;
>> +	}
>> +
>> +	if (flags & XDP_WAKEUP_TX) {
>> +		if (queue_id >= adapter->num_tx_queues)
>> +			return -EINVAL;
>> +
>> +		ring = adapter->tx_ring[queue_id];
>> +		if (!ring->xsk_pool)
>> +			return -ENXIO;
>> +
>> +		txq_vector = ring->q_vector;
>> +	}
>> +
>> +	if (rxq_vector &&
>> +	    !napi_if_scheduled_mark_missed(&rxq_vector->napi))
>> +		igc_trigger_rxtxq_interrupt(adapter, rxq_vector);
>>  
>> -	q_vector = adapter->q_vector[queue_id];
>> -	if (!napi_if_scheduled_mark_missed(&q_vector->napi))
>> -		igc_trigger_rxtxq_interrupt(adapter, q_vector);
>> +	if (txq_vector && txq_vector != rxq_vector &&
>> +	    !napi_if_scheduled_mark_missed(&txq_vector->napi))
>> +		igc_trigger_rxtxq_interrupt(adapter, txq_vector);
>
> Two things:
>  - My imagination was not able to produce a scenario where this commit  would solve any problems. Can you explain better the case where the  current code would cause the wrong interrupt to be generated (or miss  generating an interrupt)? (this should be in the commit message)
>  - I think that with this patch applied, there would cases (both TX and  RX flags set) that we cause two writes into the EICS register. That  could cause unnecessary wakeups.
>
>>  
>>  	return 0;
>>  }
>> --
>> 2.17.1
>>
>
> Cheers,
> -- 
> Vinicius
>

-- 
Vinicius

^ permalink raw reply

* Re: [PATCH bpf-next v2] bpf: move rcu lock management out of BPF_PROG_RUN routines
From: Stanislav Fomichev @ 2022-04-19 17:01 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Network Development, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau
In-Reply-To: <CAADnVQJ5qV54=7QMJfUzdAj1T31xD4aRMSin4npyNmudwP2m+g@mail.gmail.com>

On Tue, Apr 19, 2022 at 9:49 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Tue, Apr 19, 2022 at 9:35 AM Stanislav Fomichev <sdf@google.com> wrote:
> >
> > On Tue, Apr 19, 2022 at 9:32 AM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Tue, Apr 19, 2022 at 9:20 AM Alexei Starovoitov
> > > <alexei.starovoitov@gmail.com> wrote:
> > > >
> > > > On Tue, Apr 19, 2022 at 8:42 AM <sdf@google.com> wrote:
> > > > >
> > > > > On 04/18, Alexei Starovoitov wrote:
> > > > > > On Mon, Apr 18, 2022 at 9:50 AM <sdf@google.com> wrote:
> > > > > > >
> > > > > > > On 04/16, Alexei Starovoitov wrote:
> > > > > > > > On Thu, Apr 14, 2022 at 9:12 AM Stanislav Fomichev <sdf@google.com>
> > > > > > wrote:
> > > > > > > > > +static int
> > > > > > > > > +bpf_prog_run_array_cg_flags(const struct cgroup_bpf *cgrp,
> > > > > > > > > +                           enum cgroup_bpf_attach_type atype,
> > > > > > > > > +                           const void *ctx, bpf_prog_run_fn
> > > > > > run_prog,
> > > > > > > > > +                           int retval, u32 *ret_flags)
> > > > > > > > > +{
> > > > > > > > > +       const struct bpf_prog_array_item *item;
> > > > > > > > > +       const struct bpf_prog *prog;
> > > > > > > > > +       const struct bpf_prog_array *array;
> > > > > > > > > +       struct bpf_run_ctx *old_run_ctx;
> > > > > > > > > +       struct bpf_cg_run_ctx run_ctx;
> > > > > > > > > +       u32 func_ret;
> > > > > > > > > +
> > > > > > > > > +       run_ctx.retval = retval;
> > > > > > > > > +       migrate_disable();
> > > > > > > > > +       rcu_read_lock();
> > > > > > > > > +       array = rcu_dereference(cgrp->effective[atype]);
> > > > > > > > > +       item = &array->items[0];
> > > > > > > > > +       old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
> > > > > > > > > +       while ((prog = READ_ONCE(item->prog))) {
> > > > > > > > > +               run_ctx.prog_item = item;
> > > > > > > > > +               func_ret = run_prog(prog, ctx);
> > > > > > > > ...
> > > > > > > > > +       ret = bpf_prog_run_array_cg(&cgrp->bpf, CGROUP_GETSOCKOPT,
> > > > > > > > >                                     &ctx, bpf_prog_run, retval);
> > > > > > >
> > > > > > > > Did you check the asm that bpf_prog_run gets inlined
> > > > > > > > after being passed as a pointer to a function?
> > > > > > > > Crossing fingers... I suspect not every compiler can do that :(
> > > > > > > > De-virtualization optimization used to be tricky.
> > > > > > >
> > > > > > > No, I didn't, but looking at it right now, both gcc and clang
> > > > > > > seem to be doing inlining all way up to bpf_dispatcher_nop_func.
> > > > > > >
> > > > > > > clang:
> > > > > > >
> > > > > > >    0000000000001750 <__cgroup_bpf_run_filter_sock_addr>:
> > > > > > >    __cgroup_bpf_run_filter_sock_addr():
> > > > > > >    ./kernel/bpf/cgroup.c:1226
> > > > > > >    int __cgroup_bpf_run_filter_sock_addr(struct sock *sk,
> > > > > > >                                       struct sockaddr *uaddr,
> > > > > > >                                       enum cgroup_bpf_attach_type atype,
> > > > > > >                                       void *t_ctx,
> > > > > > >                                       u32 *flags)
> > > > > > >    {
> > > > > > >
> > > > > > >    ...
> > > > > > >
> > > > > > >    ./include/linux/filter.h:628
> > > > > > >                 ret = dfunc(ctx, prog->insnsi, prog->bpf_func);
> > > > > > >        1980:    49 8d 75 48             lea    0x48(%r13),%rsi
> > > > > > >    bpf_dispatcher_nop_func():
> > > > > > >    ./include/linux/bpf.h:804
> > > > > > >         return bpf_func(ctx, insnsi);
> > > > > > >        1984:    4c 89 f7                mov    %r14,%rdi
> > > > > > >        1987:    41 ff 55 30             call   *0x30(%r13)
> > > > > > >        198b:    89 c3                   mov    %eax,%ebx
> > > > > > >
> > > > > > > gcc (w/retpoline):
> > > > > > >
> > > > > > >    0000000000001110 <__cgroup_bpf_run_filter_sock_addr>:
> > > > > > >    __cgroup_bpf_run_filter_sock_addr():
> > > > > > >    kernel/bpf/cgroup.c:1226
> > > > > > >    {
> > > > > > >
> > > > > > >    ...
> > > > > > >
> > > > > > >    ./include/linux/filter.h:628
> > > > > > >                 ret = dfunc(ctx, prog->insnsi, prog->bpf_func);
> > > > > > >        11c5:    49 8d 75 48             lea    0x48(%r13),%rsi
> > > > > > >    bpf_dispatcher_nop_func():
> > > > > > >    ./include/linux/bpf.h:804
> > > > > > >        11c9:    48 8d 7c 24 10          lea    0x10(%rsp),%rdi
> > > > > > >        11ce:    e8 00 00 00 00          call   11d3
> > > > > > > <__cgroup_bpf_run_filter_sock_addr+0xc3>
> > > > > > >                         11cf: R_X86_64_PLT32
> > > > > > __x86_indirect_thunk_rax-0x4
> > > > > > >        11d3:    89 c3                   mov    %eax,%ebx
> > > > >
> > > > > > Hmm. I'm not sure how you've got this asm.
> > > > > > Here is what I see with gcc 8 and gcc 10:
> > > > > > bpf_prog_run_array_cg:
> > > > > > ...
> > > > > >          movq    %rcx, %r12      # run_prog, run_prog
> > > > > > ...
> > > > > > # ../kernel/bpf/cgroup.c:77:            run_ctx.prog_item = item;
> > > > > >          movq    %rbx, (%rsp)    # item, run_ctx.prog_item
> > > > > > # ../kernel/bpf/cgroup.c:78:            if (!run_prog(prog, ctx) &&
> > > > > > !IS_ERR_VALUE((long)run_ctx.retval))
> > > > > >          movq    %rbp, %rsi      # ctx,
> > > > > >          call    *%r12   # run_prog
> > > > >
> > > > > > __cgroup_bpf_run_filter_sk:
> > > > > >          movq    $bpf_prog_run, %rcx     #,
> > > > > > # ../kernel/bpf/cgroup.c:1202:  return
> > > > > > bpf_prog_run_array_cg(&cgrp->bpf, atype, sk, bpf_prog_run, 0);
> > > > > >          leaq    1520(%rax), %rdi        #, tmp92
> > > > > > # ../kernel/bpf/cgroup.c:1202:  return
> > > > > > bpf_prog_run_array_cg(&cgrp->bpf, atype, sk, bpf_prog_run, 0);
> > > > > >          jmp     bpf_prog_run_array_cg   #
> > > > >
> > > > > > This is without kasan, lockdep and all debug configs are off.
> > > > >
> > > > > > So the generated code is pretty bad as I predicted :(
> > > > >
> > > > > > So I'm afraid this approach is no go.
> > > > >
> > > > > I've retested again and it still unrolls it for me on gcc 11 :-/
> > > > > Anyway, I guess we have two options:
> > > > >
> > > > > 1. Go back to defines.
> > > > > 2. Don't pass a ptr to func, but pass an enum which indicates whether
> > > > >     to use bpf_prog_run or __bpf_prog_run_save_cb. Seems like in this
> > > > >     case the compiler shouldn't have any trouble unwrapping it?
> > > > >
> > > > > I'll prototype and send (2). If it won't work out we can always get back
> > > > > to (1).
> > > >
> > > > Going back to defines is probably not necessary.
> > > > Could you try moving bpf_prog_run_array_cg*() back to .h
> > > > and use static __always_inline ?
> > >
> > > Actually below was enough for gcc 8 and 10:
> > > -static int
> > > +static __always_inline int
> > >  bpf_prog_run_array_cg_flags(const struct cgroup_bpf *cgrp,
> > >                             enum cgroup_bpf_attach_type atype,
> > >                             const void *ctx, bpf_prog_run_fn run_prog,
> > > @@ -55,7 +55,7 @@ bpf_prog_run_array_cg_flags(const struct cgroup_bpf *cgrp,
> > >         return run_ctx.retval;
> > >  }
> > >
> > > -static int
> > > +static __always_inline int
> > >  bpf_prog_run_array_cg(const struct cgroup_bpf *cgrp,
> > >
> > > we can keep them in .c and generated code looks good.
> > >
> > > I can apply it with the above change.
> > > wdyt?
> >
> > Sure, let's go with that if it works! On my side, I managed to get the
> > same bad results on gcc-8; moving them to bpf-cgroup.h with
> > __always_inline seems to fix it. But if we can keep them in .c, that
> > looks even better.
>
> Ok. Applied.
>
> As the next step... can we combine bpf_prog_run_array_cg*()
> into one function?
> The only difference is:
> func_ret = run_prog(prog, ctx);
> if (!(func_ret & 1)
>   vs
> if (!run_prog(prog, ctx)
>
> afaik we don't have a check on the verifier side for possible
> return values of cgroup progs,
> so it might break some progs if we just do the former
> in both cases?

Seems like we do have return ranges checking for cgroup progs (I'm
looking at check_return_code). Using bpf_prog_run_array_cg_flags
everywhere seems possible, I can try and post some patches if it
works.

^ permalink raw reply

* Re: [PATCH net] net: sched: act_mirred: Reset ct info when mirror/redirect skb
From: Eyal Birger @ 2022-04-19 16:50 UTC (permalink / raw)
  To: Hangbin Liu
  Cc: netdev, jhs, xiyou.wangcong, jiri, davem, kuba, mleitner,
	ahleihel, dcaratti, aconole, roid, Shmulik Ladkani
In-Reply-To: <162850320655.31628.17692584840907169170.git-patchwork-notify@kernel.org>

Hi,

On Mon, Aug 9, 2021 at 1:29 PM <patchwork-bot+netdevbpf@kernel.org> wrote:
>
> Hello:
>
> This patch was applied to netdev/net.git (refs/heads/master):
>
> On Mon,  9 Aug 2021 15:04:55 +0800 you wrote:
> > When mirror/redirect a skb to a different port, the ct info should be reset
> > for reclassification. Or the pkts will match unexpected rules. For example,
> > with following topology and commands:
> >
> >     -----------
> >               |
> >        veth0 -+-------
> >               |
> >        veth1 -+-------
> >               |
> >
> > [...]
>
> Here is the summary with links:
>   - [net] net: sched: act_mirred: Reset ct info when mirror/redirect skb
>     https://git.kernel.org/netdev/net/c/d09c548dbf3b

Unfortunately this commit breaks DNAT when performed before going via mirred
egress->ingress.

The reason is that connection tracking is lost and therefore a new state
is created on ingress.

This breaks existing setups.

See below a simplified script reproducing this issue.

Therefore I suggest this commit be reverted and a knob is introduced to mirred
for clearing ct as needed.

Eyal.

Reproduction script:

#!/bin/bash

ip netns add a
ip netns add b

ip netns exec a sysctl -w net.ipv4.conf.all.forwarding=1
ip netns exec a sysctl -w net.ipv4.conf.all.accept_local=1

ip link add veth0 netns a type veth peer name veth0 netns b
ip -net a link set veth0 up
ip -net a addr add dev veth0 198.51.100.1/30

ip -net a link add dum0 type dummy
ip -net a link set dev dum0 up
ip -net a addr add dev dum0 198.51.100.2/32

ip netns exec a iptables -t nat -I OUTPUT -d 10.0.0.1 -j DNAT
--to-destination 10.0.0.2
ip -net a route add default dev dum0
ip -net a rule add pref 50 iif dum0 lookup 1000
ip -net a route add table 1000 default dev veth0

ip netns exec a tc qdisc add dev dum0 clsact
ip netns exec a tc filter add dev dum0 parent ffff:fff3 prio 50 basic
action mirred ingress redirect dev dum0

ip -net b link set veth0 up
ip  -net b addr add 10.0.0.2/32 dev veth0
ip  -net b addr add 198.51.100.3/30 dev veth0

ip netns exec a ping 10.0.0.1
>
> You are awesome, thank you!
> --
> Deet-doot-dot, I am a bot.
> https://korg.docs.kernel.org/patchwork/pwbot.html
>
>

^ permalink raw reply

* Re: [PATCH bpf-next v2] bpf: move rcu lock management out of BPF_PROG_RUN routines
From: Alexei Starovoitov @ 2022-04-19 16:48 UTC (permalink / raw)
  To: Stanislav Fomichev
  Cc: Network Development, bpf, Alexei Starovoitov, Daniel Borkmann,
	Andrii Nakryiko, Martin KaFai Lau
In-Reply-To: <CAKH8qBvApjJ5G4bNMtHxT+Fcw6uKOTZh1opkaC96OT6Gq55aJg@mail.gmail.com>

On Tue, Apr 19, 2022 at 9:35 AM Stanislav Fomichev <sdf@google.com> wrote:
>
> On Tue, Apr 19, 2022 at 9:32 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Tue, Apr 19, 2022 at 9:20 AM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Tue, Apr 19, 2022 at 8:42 AM <sdf@google.com> wrote:
> > > >
> > > > On 04/18, Alexei Starovoitov wrote:
> > > > > On Mon, Apr 18, 2022 at 9:50 AM <sdf@google.com> wrote:
> > > > > >
> > > > > > On 04/16, Alexei Starovoitov wrote:
> > > > > > > On Thu, Apr 14, 2022 at 9:12 AM Stanislav Fomichev <sdf@google.com>
> > > > > wrote:
> > > > > > > > +static int
> > > > > > > > +bpf_prog_run_array_cg_flags(const struct cgroup_bpf *cgrp,
> > > > > > > > +                           enum cgroup_bpf_attach_type atype,
> > > > > > > > +                           const void *ctx, bpf_prog_run_fn
> > > > > run_prog,
> > > > > > > > +                           int retval, u32 *ret_flags)
> > > > > > > > +{
> > > > > > > > +       const struct bpf_prog_array_item *item;
> > > > > > > > +       const struct bpf_prog *prog;
> > > > > > > > +       const struct bpf_prog_array *array;
> > > > > > > > +       struct bpf_run_ctx *old_run_ctx;
> > > > > > > > +       struct bpf_cg_run_ctx run_ctx;
> > > > > > > > +       u32 func_ret;
> > > > > > > > +
> > > > > > > > +       run_ctx.retval = retval;
> > > > > > > > +       migrate_disable();
> > > > > > > > +       rcu_read_lock();
> > > > > > > > +       array = rcu_dereference(cgrp->effective[atype]);
> > > > > > > > +       item = &array->items[0];
> > > > > > > > +       old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
> > > > > > > > +       while ((prog = READ_ONCE(item->prog))) {
> > > > > > > > +               run_ctx.prog_item = item;
> > > > > > > > +               func_ret = run_prog(prog, ctx);
> > > > > > > ...
> > > > > > > > +       ret = bpf_prog_run_array_cg(&cgrp->bpf, CGROUP_GETSOCKOPT,
> > > > > > > >                                     &ctx, bpf_prog_run, retval);
> > > > > >
> > > > > > > Did you check the asm that bpf_prog_run gets inlined
> > > > > > > after being passed as a pointer to a function?
> > > > > > > Crossing fingers... I suspect not every compiler can do that :(
> > > > > > > De-virtualization optimization used to be tricky.
> > > > > >
> > > > > > No, I didn't, but looking at it right now, both gcc and clang
> > > > > > seem to be doing inlining all way up to bpf_dispatcher_nop_func.
> > > > > >
> > > > > > clang:
> > > > > >
> > > > > >    0000000000001750 <__cgroup_bpf_run_filter_sock_addr>:
> > > > > >    __cgroup_bpf_run_filter_sock_addr():
> > > > > >    ./kernel/bpf/cgroup.c:1226
> > > > > >    int __cgroup_bpf_run_filter_sock_addr(struct sock *sk,
> > > > > >                                       struct sockaddr *uaddr,
> > > > > >                                       enum cgroup_bpf_attach_type atype,
> > > > > >                                       void *t_ctx,
> > > > > >                                       u32 *flags)
> > > > > >    {
> > > > > >
> > > > > >    ...
> > > > > >
> > > > > >    ./include/linux/filter.h:628
> > > > > >                 ret = dfunc(ctx, prog->insnsi, prog->bpf_func);
> > > > > >        1980:    49 8d 75 48             lea    0x48(%r13),%rsi
> > > > > >    bpf_dispatcher_nop_func():
> > > > > >    ./include/linux/bpf.h:804
> > > > > >         return bpf_func(ctx, insnsi);
> > > > > >        1984:    4c 89 f7                mov    %r14,%rdi
> > > > > >        1987:    41 ff 55 30             call   *0x30(%r13)
> > > > > >        198b:    89 c3                   mov    %eax,%ebx
> > > > > >
> > > > > > gcc (w/retpoline):
> > > > > >
> > > > > >    0000000000001110 <__cgroup_bpf_run_filter_sock_addr>:
> > > > > >    __cgroup_bpf_run_filter_sock_addr():
> > > > > >    kernel/bpf/cgroup.c:1226
> > > > > >    {
> > > > > >
> > > > > >    ...
> > > > > >
> > > > > >    ./include/linux/filter.h:628
> > > > > >                 ret = dfunc(ctx, prog->insnsi, prog->bpf_func);
> > > > > >        11c5:    49 8d 75 48             lea    0x48(%r13),%rsi
> > > > > >    bpf_dispatcher_nop_func():
> > > > > >    ./include/linux/bpf.h:804
> > > > > >        11c9:    48 8d 7c 24 10          lea    0x10(%rsp),%rdi
> > > > > >        11ce:    e8 00 00 00 00          call   11d3
> > > > > > <__cgroup_bpf_run_filter_sock_addr+0xc3>
> > > > > >                         11cf: R_X86_64_PLT32
> > > > > __x86_indirect_thunk_rax-0x4
> > > > > >        11d3:    89 c3                   mov    %eax,%ebx
> > > >
> > > > > Hmm. I'm not sure how you've got this asm.
> > > > > Here is what I see with gcc 8 and gcc 10:
> > > > > bpf_prog_run_array_cg:
> > > > > ...
> > > > >          movq    %rcx, %r12      # run_prog, run_prog
> > > > > ...
> > > > > # ../kernel/bpf/cgroup.c:77:            run_ctx.prog_item = item;
> > > > >          movq    %rbx, (%rsp)    # item, run_ctx.prog_item
> > > > > # ../kernel/bpf/cgroup.c:78:            if (!run_prog(prog, ctx) &&
> > > > > !IS_ERR_VALUE((long)run_ctx.retval))
> > > > >          movq    %rbp, %rsi      # ctx,
> > > > >          call    *%r12   # run_prog
> > > >
> > > > > __cgroup_bpf_run_filter_sk:
> > > > >          movq    $bpf_prog_run, %rcx     #,
> > > > > # ../kernel/bpf/cgroup.c:1202:  return
> > > > > bpf_prog_run_array_cg(&cgrp->bpf, atype, sk, bpf_prog_run, 0);
> > > > >          leaq    1520(%rax), %rdi        #, tmp92
> > > > > # ../kernel/bpf/cgroup.c:1202:  return
> > > > > bpf_prog_run_array_cg(&cgrp->bpf, atype, sk, bpf_prog_run, 0);
> > > > >          jmp     bpf_prog_run_array_cg   #
> > > >
> > > > > This is without kasan, lockdep and all debug configs are off.
> > > >
> > > > > So the generated code is pretty bad as I predicted :(
> > > >
> > > > > So I'm afraid this approach is no go.
> > > >
> > > > I've retested again and it still unrolls it for me on gcc 11 :-/
> > > > Anyway, I guess we have two options:
> > > >
> > > > 1. Go back to defines.
> > > > 2. Don't pass a ptr to func, but pass an enum which indicates whether
> > > >     to use bpf_prog_run or __bpf_prog_run_save_cb. Seems like in this
> > > >     case the compiler shouldn't have any trouble unwrapping it?
> > > >
> > > > I'll prototype and send (2). If it won't work out we can always get back
> > > > to (1).
> > >
> > > Going back to defines is probably not necessary.
> > > Could you try moving bpf_prog_run_array_cg*() back to .h
> > > and use static __always_inline ?
> >
> > Actually below was enough for gcc 8 and 10:
> > -static int
> > +static __always_inline int
> >  bpf_prog_run_array_cg_flags(const struct cgroup_bpf *cgrp,
> >                             enum cgroup_bpf_attach_type atype,
> >                             const void *ctx, bpf_prog_run_fn run_prog,
> > @@ -55,7 +55,7 @@ bpf_prog_run_array_cg_flags(const struct cgroup_bpf *cgrp,
> >         return run_ctx.retval;
> >  }
> >
> > -static int
> > +static __always_inline int
> >  bpf_prog_run_array_cg(const struct cgroup_bpf *cgrp,
> >
> > we can keep them in .c and generated code looks good.
> >
> > I can apply it with the above change.
> > wdyt?
>
> Sure, let's go with that if it works! On my side, I managed to get the
> same bad results on gcc-8; moving them to bpf-cgroup.h with
> __always_inline seems to fix it. But if we can keep them in .c, that
> looks even better.

Ok. Applied.

As the next step... can we combine bpf_prog_run_array_cg*()
into one function?
The only difference is:
func_ret = run_prog(prog, ctx);
if (!(func_ret & 1)
  vs
if (!run_prog(prog, ctx)

afaik we don't have a check on the verifier side for possible
return values of cgroup progs,
so it might break some progs if we just do the former
in both cases?

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox