All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next 0/4] net: make dev_watchdog() less intrusive
@ 2021-11-17  3:29 Eric Dumazet
  2021-11-17  3:29 ` [PATCH net-next 1/4] net: use an atomic_long_t for queue->trans_timeout Eric Dumazet
                   ` (4 more replies)
  0 siblings, 5 replies; 7+ messages in thread
From: Eric Dumazet @ 2021-11-17  3:29 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski; +Cc: netdev, Eric Dumazet, Eric Dumazet

From: Eric Dumazet <edumazet@google.com>

dev_watchdog() is used on many NIC to periodically monitor TX queues
to detect hangs.

Problem is : It stops all queues, then check them, then 'unfreeze' them.

Not only this stops feeding the NIC, it also migrates all qdiscs
to be serviced on the cpu calling netif_tx_unlock(), causing
a potential latency artifact.

With many TX queues, this is becoming more visible.

Eric Dumazet (4):
  net: use an atomic_long_t for queue->trans_timeout
  net: annotate accesses to queue->trans_start
  net: do not inline netif_tx_lock()/netif_tx_unlock()
  net: no longer stop all TX queues in dev_watchdog()

 .../net/ethernet/apm/xgene/xgene_enet_main.c  |  2 +-
 drivers/net/ethernet/atheros/ag71xx.c         |  2 +-
 .../net/ethernet/freescale/dpaa/dpaa_eth.c    |  4 +-
 .../net/ethernet/hisilicon/hns3/hns3_enet.c   |  2 +-
 drivers/net/ethernet/ibm/ibmvnic.c            |  2 +-
 drivers/net/ethernet/intel/igb/igb_main.c     |  4 +-
 .../mellanox/mlx5/core/en/reporter_tx.c       |  2 +-
 .../net/ethernet/stmicro/stmmac/stmmac_main.c |  6 +-
 drivers/net/ethernet/ti/am65-cpsw-nuss.c      |  2 +-
 drivers/net/virtio_net.c                      |  2 +-
 drivers/net/wireless/marvell/mwifiex/init.c   |  2 +-
 drivers/staging/rtl8192e/rtllib_softmac.c     |  2 +-
 include/linux/netdevice.h                     | 57 +++++----------
 net/core/net-sysfs.c                          |  6 +-
 net/sched/sch_generic.c                       | 69 ++++++++++++++++---
 15 files changed, 94 insertions(+), 70 deletions(-)

-- 
2.34.0.rc1.387.gb447b232ab-goog


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH net-next 1/4] net: use an atomic_long_t for queue->trans_timeout
  2021-11-17  3:29 [PATCH net-next 0/4] net: make dev_watchdog() less intrusive Eric Dumazet
@ 2021-11-17  3:29 ` Eric Dumazet
  2021-11-17  3:29 ` [PATCH net-next 2/4] net: annotate accesses to queue->trans_start Eric Dumazet
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: Eric Dumazet @ 2021-11-17  3:29 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: netdev, Eric Dumazet, Eric Dumazet, david decotigny

From: Eric Dumazet <edumazet@google.com>

tx_timeout_show() assumed dev_watchdog() would stop all
the queues, to fetch queue->trans_timeout under protection
of the queue->_xmit_lock.

As we want to no longer disrupt transmits, we use an
atomic_long_t instead.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: david decotigny <david.decotigny@google.com>
---
 include/linux/netdevice.h | 2 +-
 net/core/net-sysfs.c      | 6 +-----
 net/sched/sch_generic.c   | 2 +-
 3 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 31a7e6b2768123021690a3dc8572c5e8cb0e0027..143ac02c7f1cc90cf6704574fb0012e1ba830c70 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -592,7 +592,7 @@ struct netdev_queue {
 	 * Number of TX timeouts for this queue
 	 * (/sys/class/net/DEV/Q/trans_timeout)
 	 */
-	unsigned long		trans_timeout;
+	atomic_long_t		trans_timeout;
 
 	/* Subordinate device that the queue has been assigned to */
 	struct net_device	*sb_dev;
diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
index 9c01c642cf9ef384fe54e56243b102ef838d0a62..addbef5419fbb62ce83f5132ae21c9d2872e95f5 100644
--- a/net/core/net-sysfs.c
+++ b/net/core/net-sysfs.c
@@ -1201,11 +1201,7 @@ static const struct sysfs_ops netdev_queue_sysfs_ops = {
 
 static ssize_t tx_timeout_show(struct netdev_queue *queue, char *buf)
 {
-	unsigned long trans_timeout;
-
-	spin_lock_irq(&queue->_xmit_lock);
-	trans_timeout = queue->trans_timeout;
-	spin_unlock_irq(&queue->_xmit_lock);
+	unsigned long trans_timeout = atomic_long_read(&queue->trans_timeout);
 
 	return sprintf(buf, fmt_ulong, trans_timeout);
 }
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 3b0f620958037eb46e395f172c2315fdd98be914..1b4328bd495d54d44a9d51b53c8e8bc18b9cc294 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -467,7 +467,7 @@ static void dev_watchdog(struct timer_list *t)
 				    time_after(jiffies, (trans_start +
 							 dev->watchdog_timeo))) {
 					some_queue_timedout = 1;
-					txq->trans_timeout++;
+					atomic_long_inc(&txq->trans_timeout);
 					break;
 				}
 			}
-- 
2.34.0.rc1.387.gb447b232ab-goog


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH net-next 2/4] net: annotate accesses to queue->trans_start
  2021-11-17  3:29 [PATCH net-next 0/4] net: make dev_watchdog() less intrusive Eric Dumazet
  2021-11-17  3:29 ` [PATCH net-next 1/4] net: use an atomic_long_t for queue->trans_timeout Eric Dumazet
@ 2021-11-17  3:29 ` Eric Dumazet
  2021-11-18 18:27   ` kernel test robot
  2021-11-17  3:29 ` [PATCH net-next 3/4] net: do not inline netif_tx_lock()/netif_tx_unlock() Eric Dumazet
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 7+ messages in thread
From: Eric Dumazet @ 2021-11-17  3:29 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski; +Cc: netdev, Eric Dumazet, Eric Dumazet

From: Eric Dumazet <edumazet@google.com>

In following patches, dev_watchdog() will no longer stop all queues.
It will read queue->trans_start locklessly.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 drivers/net/ethernet/apm/xgene/xgene_enet_main.c |  2 +-
 drivers/net/ethernet/atheros/ag71xx.c            |  2 +-
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.c   |  4 ++--
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.c  |  2 +-
 drivers/net/ethernet/ibm/ibmvnic.c               |  2 +-
 drivers/net/ethernet/intel/igb/igb_main.c        |  4 ++--
 .../ethernet/mellanox/mlx5/core/en/reporter_tx.c |  2 +-
 .../net/ethernet/stmicro/stmmac/stmmac_main.c    |  6 +++---
 drivers/net/ethernet/ti/am65-cpsw-nuss.c         |  2 +-
 drivers/net/virtio_net.c                         |  2 +-
 drivers/net/wireless/marvell/mwifiex/init.c      |  2 +-
 drivers/staging/rtl8192e/rtllib_softmac.c        |  2 +-
 include/linux/netdevice.h                        | 16 +++++++++++++---
 net/sched/sch_generic.c                          |  8 ++++----
 14 files changed, 33 insertions(+), 23 deletions(-)

diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
index 220dc42af31ae1200ca05441000bb2a1abd0fd89..ff2d099aab218b30783266d5f905e3f0846f0951 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
@@ -869,7 +869,7 @@ static void xgene_enet_timeout(struct net_device *ndev, unsigned int txqueue)
 
 	for (i = 0; i < pdata->txq_cnt; i++) {
 		txq = netdev_get_tx_queue(ndev, i);
-		txq->trans_start = jiffies;
+		txq_trans_cond_update(txq);
 		netif_tx_start_queue(txq);
 	}
 }
diff --git a/drivers/net/ethernet/atheros/ag71xx.c b/drivers/net/ethernet/atheros/ag71xx.c
index 88d2ab7483994a850efcea1228d9718e0f6bc2ae..e4f30bb7498fec02742376a37a22495cba38a9ea 100644
--- a/drivers/net/ethernet/atheros/ag71xx.c
+++ b/drivers/net/ethernet/atheros/ag71xx.c
@@ -766,7 +766,7 @@ static bool ag71xx_check_dma_stuck(struct ag71xx *ag)
 	unsigned long timestamp;
 	u32 rx_sm, tx_sm, rx_fd;
 
-	timestamp = netdev_get_tx_queue(ag->ndev, 0)->trans_start;
+	timestamp = READ_ONCE(netdev_get_tx_queue(ag->ndev, 0)->trans_start);
 	if (likely(time_before(jiffies, timestamp + HZ / 10)))
 		return false;
 
diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
index 6b2927d863e2cc7569dadd4cd1e974dcc347274e..d6871437d9515df18c7819817799f9ade8bcb57e 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
@@ -2325,7 +2325,7 @@ dpaa_start_xmit(struct sk_buff *skb, struct net_device *net_dev)
 	txq = netdev_get_tx_queue(net_dev, queue_mapping);
 
 	/* LLTX requires to do our own update of trans_start */
-	txq->trans_start = jiffies;
+	txq_trans_cond_update(txq);
 
 	if (priv->tx_tstamp && skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP) {
 		fd.cmd |= cpu_to_be32(FM_FD_CMD_UPD);
@@ -2531,7 +2531,7 @@ static int dpaa_xdp_xmit_frame(struct net_device *net_dev,
 
 	/* Bump the trans_start */
 	txq = netdev_get_tx_queue(net_dev, smp_processor_id());
-	txq->trans_start = jiffies;
+	txq_trans_cond_update(txq);
 
 	err = dpaa_xmit(priv, percpu_stats, smp_processor_id(), &fd);
 	if (err) {
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
index 13835a37b3a2fd13d557750fb3943172fc2eb700..d5100179f8d589dc4ea517f5bafca0612a5fcc38 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
@@ -2679,7 +2679,7 @@ static bool hns3_get_tx_timeo_queue_info(struct net_device *ndev)
 		unsigned long trans_start;
 
 		q = netdev_get_tx_queue(ndev, i);
-		trans_start = q->trans_start;
+		trans_start = READ_ONCE(q->trans_start);
 		if (netif_xmit_stopped(q) &&
 		    time_after(jiffies,
 			       (trans_start + ndev->watchdog_timeo))) {
diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index 3cca51735421a7435f4c7f32fa3f5af9003f2d37..c327fc8860da20e16b6801adc071bfb90dd05c36 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -2058,7 +2058,7 @@ static netdev_tx_t ibmvnic_xmit(struct sk_buff *skb, struct net_device *netdev)
 
 	tx_packets++;
 	tx_bytes += skb->len;
-	txq->trans_start = jiffies;
+	txq_trans_cond_update(txq);
 	ret = NETDEV_TX_OK;
 	goto out;
 
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index 836be0d3b29105d48530e2ce6b3f8db13c730e71..18a019a47182218ff85a83a00e75c99005e22a34 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -2927,7 +2927,7 @@ static int igb_xdp_xmit_back(struct igb_adapter *adapter, struct xdp_buff *xdp)
 	nq = txring_txq(tx_ring);
 	__netif_tx_lock(nq, cpu);
 	/* Avoid transmit queue timeout since we share it with the slow path */
-	nq->trans_start = jiffies;
+	txq_trans_cond_update(nq);
 	ret = igb_xmit_xdp_ring(adapter, tx_ring, xdpf);
 	__netif_tx_unlock(nq);
 
@@ -2961,7 +2961,7 @@ static int igb_xdp_xmit(struct net_device *dev, int n,
 	__netif_tx_lock(nq, cpu);
 
 	/* Avoid transmit queue timeout since we share it with the slow path */
-	nq->trans_start = jiffies;
+	txq_trans_cond_update(nq);
 
 	for (i = 0; i < n; i++) {
 		struct xdp_frame *xdpf = frames[i];
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
index 4f4bc8726ec4fa2b20a32edb25780ad177f6860a..86060513328739ecd6702525a6902bd82c7d2db6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c
@@ -565,7 +565,7 @@ int mlx5e_reporter_tx_timeout(struct mlx5e_txqsq *sq)
 	snprintf(err_str, sizeof(err_str),
 		 "TX timeout on queue: %d, SQ: 0x%x, CQ: 0x%x, SQ Cons: 0x%x SQ Prod: 0x%x, usecs since last trans: %u",
 		 sq->ch_ix, sq->sqn, sq->cq.mcq.cqn, sq->cc, sq->pc,
-		 jiffies_to_usecs(jiffies - sq->txq->trans_start));
+		 jiffies_to_usecs(jiffies - READ_ONCE(sq->txq->trans_start)));
 
 	mlx5e_health_report(priv, priv->tx_reporter, err_str, &err_ctx);
 	return to_ctx.status;
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 033c35c09a54876eeb87e30aad5ec8e8613f13b9..389d125310c151e54b428d19616fd07531051f54 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -2356,7 +2356,7 @@ static bool stmmac_xdp_xmit_zc(struct stmmac_priv *priv, u32 queue, u32 budget)
 	bool work_done = true;
 
 	/* Avoids TX time-out as we are sharing with slow path */
-	nq->trans_start = jiffies;
+	txq_trans_cond_update(nq->trans_start);
 
 	budget = min(budget, stmmac_tx_avail(priv, queue));
 
@@ -4657,7 +4657,7 @@ static int stmmac_xdp_xmit_back(struct stmmac_priv *priv,
 
 	__netif_tx_lock(nq, cpu);
 	/* Avoids TX time-out as we are sharing with slow path */
-	nq->trans_start = jiffies;
+	txq_trans_cond_update(nq->trans_start);
 
 	res = stmmac_xdp_xmit_xdpf(priv, queue, xdpf, false);
 	if (res == STMMAC_XDP_TX)
@@ -6293,7 +6293,7 @@ static int stmmac_xdp_xmit(struct net_device *dev, int num_frames,
 
 	__netif_tx_lock(nq, cpu);
 	/* Avoids TX time-out as we are sharing with slow path */
-	nq->trans_start = jiffies;
+	txq_trans_cond_update(nq);
 
 	for (i = 0; i < num_frames; i++) {
 		int res;
diff --git a/drivers/net/ethernet/ti/am65-cpsw-nuss.c b/drivers/net/ethernet/ti/am65-cpsw-nuss.c
index c092cb61416a180a5ce1d0d28bd163e4a1dab302..750cea23e9cd02bba139a58553c4b1753956ad10 100644
--- a/drivers/net/ethernet/ti/am65-cpsw-nuss.c
+++ b/drivers/net/ethernet/ti/am65-cpsw-nuss.c
@@ -345,7 +345,7 @@ static void am65_cpsw_nuss_ndo_host_tx_timeout(struct net_device *ndev,
 
 	netif_txq = netdev_get_tx_queue(ndev, txqueue);
 	tx_chn = &common->tx_chns[txqueue];
-	trans_start = netif_txq->trans_start;
+	trans_start = READ_ONCE(netif_txq->trans_start);
 
 	netdev_err(ndev, "txq:%d DRV_XOFF:%d tmo:%u dql_avail:%d free_desc:%zu\n",
 		   txqueue,
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 1771d6e5224fd834a4dfca4ba578134439d4d201..03e38e38ee4b5a97567eb692cd84a55722a1a8b2 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -2694,7 +2694,7 @@ static void virtnet_tx_timeout(struct net_device *dev, unsigned int txqueue)
 
 	netdev_err(dev, "TX timeout on queue: %u, sq: %s, vq: 0x%x, name: %s, %u usecs ago\n",
 		   txqueue, sq->name, sq->vq->index, sq->vq->name,
-		   jiffies_to_usecs(jiffies - txq->trans_start));
+		   jiffies_to_usecs(jiffies - READ_ONCE(txq->trans_start)));
 }
 
 static const struct net_device_ops virtnet_netdev = {
diff --git a/drivers/net/wireless/marvell/mwifiex/init.c b/drivers/net/wireless/marvell/mwifiex/init.c
index f006a3d72b4046f435739e9218e3be6bf7001adc..88c72d1827a00d608e90e03beb20202228cd8699 100644
--- a/drivers/net/wireless/marvell/mwifiex/init.c
+++ b/drivers/net/wireless/marvell/mwifiex/init.c
@@ -332,7 +332,7 @@ void mwifiex_set_trans_start(struct net_device *dev)
 	int i;
 
 	for (i = 0; i < dev->num_tx_queues; i++)
-		netdev_get_tx_queue(dev, i)->trans_start = jiffies;
+		txq_trans_cond_update(netdev_get_tx_queue(dev, i));
 
 	netif_trans_update(dev);
 }
diff --git a/drivers/staging/rtl8192e/rtllib_softmac.c b/drivers/staging/rtl8192e/rtllib_softmac.c
index d2726d01c7573fe8230e0a7e0f7f811c1ff8cffc..aabbea48223d2f7915285c883e2ae94111bd91b6 100644
--- a/drivers/staging/rtl8192e/rtllib_softmac.c
+++ b/drivers/staging/rtl8192e/rtllib_softmac.c
@@ -2515,7 +2515,7 @@ void rtllib_stop_all_queues(struct rtllib_device *ieee)
 	unsigned int i;
 
 	for (i = 0; i < ieee->dev->num_tx_queues; i++)
-		netdev_get_tx_queue(ieee->dev, i)->trans_start = jiffies;
+		txq_trans_cond_update(netdev_get_tx_queue(ieee->dev, i));
 
 	netif_tx_stop_all_queues(ieee->dev);
 }
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 143ac02c7f1cc90cf6704574fb0012e1ba830c70..83e6204c0ba3491b56eec5c7f94e55eab7159223 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -4095,10 +4095,21 @@ static inline void __netif_tx_unlock_bh(struct netdev_queue *txq)
 	spin_unlock_bh(&txq->_xmit_lock);
 }
 
+/*
+ * txq->trans_start can be read locklessly from dev_watchdog()
+ */
 static inline void txq_trans_update(struct netdev_queue *txq)
 {
 	if (txq->xmit_lock_owner != -1)
-		txq->trans_start = jiffies;
+		WRITE_ONCE(txq->trans_start, jiffies);
+}
+
+static inline void txq_trans_cond_update(struct netdev_queue *txq)
+{
+	unsigned long now = jiffies;
+
+	if (READ_ONCE(txq->trans_start) != now)
+		WRITE_ONCE(txq->trans_start, now);
 }
 
 /* legacy drivers only, netdev_start_xmit() sets txq->trans_start */
@@ -4106,8 +4117,7 @@ static inline void netif_trans_update(struct net_device *dev)
 {
 	struct netdev_queue *txq = netdev_get_tx_queue(dev, 0);
 
-	if (txq->trans_start != jiffies)
-		txq->trans_start = jiffies;
+	txq_trans_cond_update(txq);
 }
 
 /**
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 1b4328bd495d54d44a9d51b53c8e8bc18b9cc294..02c46041f76e85571fd2862e02fb409bfd8e6611 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -434,9 +434,9 @@ unsigned long dev_trans_start(struct net_device *dev)
 		dev = vlan_dev_real_dev(dev);
 	else if (netif_is_macvlan(dev))
 		dev = macvlan_dev_real_dev(dev);
-	res = netdev_get_tx_queue(dev, 0)->trans_start;
+	res = READ_ONCE(netdev_get_tx_queue(dev, 0)->trans_start);
 	for (i = 1; i < dev->num_tx_queues; i++) {
-		val = netdev_get_tx_queue(dev, i)->trans_start;
+		val = READ_ONCE(netdev_get_tx_queue(dev, i)->trans_start);
 		if (val && time_after(val, res))
 			res = val;
 	}
@@ -462,7 +462,7 @@ static void dev_watchdog(struct timer_list *t)
 				struct netdev_queue *txq;
 
 				txq = netdev_get_tx_queue(dev, i);
-				trans_start = txq->trans_start;
+				trans_start = READ_ONCE(txq->trans_start);
 				if (netif_xmit_stopped(txq) &&
 				    time_after(jiffies, (trans_start +
 							 dev->watchdog_timeo))) {
@@ -1148,7 +1148,7 @@ static void transition_one_qdisc(struct net_device *dev,
 
 	rcu_assign_pointer(dev_queue->qdisc, new_qdisc);
 	if (need_watchdog_p) {
-		dev_queue->trans_start = 0;
+		WRITE_ONCE(dev_queue->trans_start, 0);
 		*need_watchdog_p = 1;
 	}
 }
-- 
2.34.0.rc1.387.gb447b232ab-goog


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH net-next 3/4] net: do not inline netif_tx_lock()/netif_tx_unlock()
  2021-11-17  3:29 [PATCH net-next 0/4] net: make dev_watchdog() less intrusive Eric Dumazet
  2021-11-17  3:29 ` [PATCH net-next 1/4] net: use an atomic_long_t for queue->trans_timeout Eric Dumazet
  2021-11-17  3:29 ` [PATCH net-next 2/4] net: annotate accesses to queue->trans_start Eric Dumazet
@ 2021-11-17  3:29 ` Eric Dumazet
  2021-11-17  3:29 ` [PATCH net-next 4/4] net: no longer stop all TX queues in dev_watchdog() Eric Dumazet
  2021-11-17 15:00 ` [PATCH net-next 0/4] net: make dev_watchdog() less intrusive patchwork-bot+netdevbpf
  4 siblings, 0 replies; 7+ messages in thread
From: Eric Dumazet @ 2021-11-17  3:29 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski; +Cc: netdev, Eric Dumazet, Eric Dumazet

From: Eric Dumazet <edumazet@google.com>

These are not fast path, there is no point in inlining them.

Also provide netif_freeze_queues()/netif_unfreeze_queues()
so that we can use them from dev_watchdog() in the following patch.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/linux/netdevice.h | 39 ++----------------------------
 net/sched/sch_generic.c   | 51 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 53 insertions(+), 37 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 83e6204c0ba3491b56eec5c7f94e55eab7159223..28e79ef5ca06f66a788ce3e3f59d158be9150332 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -4126,27 +4126,7 @@ static inline void netif_trans_update(struct net_device *dev)
  *
  * Get network device transmit lock
  */
-static inline void netif_tx_lock(struct net_device *dev)
-{
-	unsigned int i;
-	int cpu;
-
-	spin_lock(&dev->tx_global_lock);
-	cpu = smp_processor_id();
-	for (i = 0; i < dev->num_tx_queues; i++) {
-		struct netdev_queue *txq = netdev_get_tx_queue(dev, i);
-
-		/* We are the only thread of execution doing a
-		 * freeze, but we have to grab the _xmit_lock in
-		 * order to synchronize with threads which are in
-		 * the ->hard_start_xmit() handler and already
-		 * checked the frozen bit.
-		 */
-		__netif_tx_lock(txq, cpu);
-		set_bit(__QUEUE_STATE_FROZEN, &txq->state);
-		__netif_tx_unlock(txq);
-	}
-}
+void netif_tx_lock(struct net_device *dev);
 
 static inline void netif_tx_lock_bh(struct net_device *dev)
 {
@@ -4154,22 +4134,7 @@ static inline void netif_tx_lock_bh(struct net_device *dev)
 	netif_tx_lock(dev);
 }
 
-static inline void netif_tx_unlock(struct net_device *dev)
-{
-	unsigned int i;
-
-	for (i = 0; i < dev->num_tx_queues; i++) {
-		struct netdev_queue *txq = netdev_get_tx_queue(dev, i);
-
-		/* No need to grab the _xmit_lock here.  If the
-		 * queue is not stopped for another reason, we
-		 * force a schedule.
-		 */
-		clear_bit(__QUEUE_STATE_FROZEN, &txq->state);
-		netif_schedule_queue(txq);
-	}
-	spin_unlock(&dev->tx_global_lock);
-}
+void netif_tx_unlock(struct net_device *dev);
 
 static inline void netif_tx_unlock_bh(struct net_device *dev)
 {
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 02c46041f76e85571fd2862e02fb409bfd8e6611..389e0d8fc68d12cf092a975511729a8dae1b29fb 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -445,6 +445,57 @@ unsigned long dev_trans_start(struct net_device *dev)
 }
 EXPORT_SYMBOL(dev_trans_start);
 
+static void netif_freeze_queues(struct net_device *dev)
+{
+	unsigned int i;
+	int cpu;
+
+	cpu = smp_processor_id();
+	for (i = 0; i < dev->num_tx_queues; i++) {
+		struct netdev_queue *txq = netdev_get_tx_queue(dev, i);
+
+		/* We are the only thread of execution doing a
+		 * freeze, but we have to grab the _xmit_lock in
+		 * order to synchronize with threads which are in
+		 * the ->hard_start_xmit() handler and already
+		 * checked the frozen bit.
+		 */
+		__netif_tx_lock(txq, cpu);
+		set_bit(__QUEUE_STATE_FROZEN, &txq->state);
+		__netif_tx_unlock(txq);
+	}
+}
+
+void netif_tx_lock(struct net_device *dev)
+{
+	spin_lock(&dev->tx_global_lock);
+	netif_freeze_queues(dev);
+}
+EXPORT_SYMBOL(netif_tx_lock);
+
+static void netif_unfreeze_queues(struct net_device *dev)
+{
+	unsigned int i;
+
+	for (i = 0; i < dev->num_tx_queues; i++) {
+		struct netdev_queue *txq = netdev_get_tx_queue(dev, i);
+
+		/* No need to grab the _xmit_lock here.  If the
+		 * queue is not stopped for another reason, we
+		 * force a schedule.
+		 */
+		clear_bit(__QUEUE_STATE_FROZEN, &txq->state);
+		netif_schedule_queue(txq);
+	}
+}
+
+void netif_tx_unlock(struct net_device *dev)
+{
+	netif_unfreeze_queues(dev);
+	spin_unlock(&dev->tx_global_lock);
+}
+EXPORT_SYMBOL(netif_tx_unlock);
+
 static void dev_watchdog(struct timer_list *t)
 {
 	struct net_device *dev = from_timer(dev, t, watchdog_timer);
-- 
2.34.0.rc1.387.gb447b232ab-goog


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH net-next 4/4] net: no longer stop all TX queues in dev_watchdog()
  2021-11-17  3:29 [PATCH net-next 0/4] net: make dev_watchdog() less intrusive Eric Dumazet
                   ` (2 preceding siblings ...)
  2021-11-17  3:29 ` [PATCH net-next 3/4] net: do not inline netif_tx_lock()/netif_tx_unlock() Eric Dumazet
@ 2021-11-17  3:29 ` Eric Dumazet
  2021-11-17 15:00 ` [PATCH net-next 0/4] net: make dev_watchdog() less intrusive patchwork-bot+netdevbpf
  4 siblings, 0 replies; 7+ messages in thread
From: Eric Dumazet @ 2021-11-17  3:29 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski; +Cc: netdev, Eric Dumazet, Eric Dumazet

From: Eric Dumazet <edumazet@google.com>

There is no reason for stopping all TX queues from dev_watchdog()

Not only this stops feeding the NIC, it also migrates all qdiscs
to be serviced on the cpu calling netif_tx_unlock(), causing
a potential latency artifact.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/sched/sch_generic.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 389e0d8fc68d12cf092a975511729a8dae1b29fb..d33804d41c5c5a9047c808fd37ba65ae8875fc79 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -500,7 +500,7 @@ static void dev_watchdog(struct timer_list *t)
 {
 	struct net_device *dev = from_timer(dev, t, watchdog_timer);
 
-	netif_tx_lock(dev);
+	spin_lock(&dev->tx_global_lock);
 	if (!qdisc_tx_is_noop(dev)) {
 		if (netif_device_present(dev) &&
 		    netif_running(dev) &&
@@ -523,11 +523,13 @@ static void dev_watchdog(struct timer_list *t)
 				}
 			}
 
-			if (some_queue_timedout) {
+			if (unlikely(some_queue_timedout)) {
 				trace_net_dev_xmit_timeout(dev, i);
 				WARN_ONCE(1, KERN_INFO "NETDEV WATCHDOG: %s (%s): transmit queue %u timed out\n",
 				       dev->name, netdev_drivername(dev), i);
+				netif_freeze_queues(dev);
 				dev->netdev_ops->ndo_tx_timeout(dev, i);
+				netif_unfreeze_queues(dev);
 			}
 			if (!mod_timer(&dev->watchdog_timer,
 				       round_jiffies(jiffies +
@@ -535,7 +537,7 @@ static void dev_watchdog(struct timer_list *t)
 				dev_hold(dev);
 		}
 	}
-	netif_tx_unlock(dev);
+	spin_unlock(&dev->tx_global_lock);
 
 	dev_put(dev);
 }
-- 
2.34.0.rc1.387.gb447b232ab-goog


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH net-next 0/4] net: make dev_watchdog() less intrusive
  2021-11-17  3:29 [PATCH net-next 0/4] net: make dev_watchdog() less intrusive Eric Dumazet
                   ` (3 preceding siblings ...)
  2021-11-17  3:29 ` [PATCH net-next 4/4] net: no longer stop all TX queues in dev_watchdog() Eric Dumazet
@ 2021-11-17 15:00 ` patchwork-bot+netdevbpf
  4 siblings, 0 replies; 7+ messages in thread
From: patchwork-bot+netdevbpf @ 2021-11-17 15:00 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: davem, kuba, netdev, edumazet

Hello:

This series was applied to netdev/net-next.git (master)
by David S. Miller <davem@davemloft.net>:

On Tue, 16 Nov 2021 19:29:20 -0800 you wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> dev_watchdog() is used on many NIC to periodically monitor TX queues
> to detect hangs.
> 
> Problem is : It stops all queues, then check them, then 'unfreeze' them.
> 
> [...]

Here is the summary with links:
  - [net-next,1/4] net: use an atomic_long_t for queue->trans_timeout
    https://git.kernel.org/netdev/net-next/c/8160fb43d55d
  - [net-next,2/4] net: annotate accesses to queue->trans_start
    https://git.kernel.org/netdev/net-next/c/5337824f4dc4
  - [net-next,3/4] net: do not inline netif_tx_lock()/netif_tx_unlock()
    https://git.kernel.org/netdev/net-next/c/dab8fe320726
  - [net-next,4/4] net: no longer stop all TX queues in dev_watchdog()
    https://git.kernel.org/netdev/net-next/c/bec251bc8b6a

You are awesome, thank you!
-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net-next 2/4] net: annotate accesses to queue->trans_start
  2021-11-17  3:29 ` [PATCH net-next 2/4] net: annotate accesses to queue->trans_start Eric Dumazet
@ 2021-11-18 18:27   ` kernel test robot
  0 siblings, 0 replies; 7+ messages in thread
From: kernel test robot @ 2021-11-18 18:27 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 6808 bytes --]

Hi Eric,

I love your patch! Perhaps something to improve:

[auto build test WARNING on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Eric-Dumazet/net-make-dev_watchdog-less-intrusive/20211117-113056
base:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git 62803fec52f80e4dd375de2dd76510c405792928
config: mips-allyesconfig (attached as .config)
compiler: mips-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/461b045f634f4e17925ad5622adadf9cbf474cec
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Eric-Dumazet/net-make-dev_watchdog-less-intrusive/20211117-113056
        git checkout 461b045f634f4e17925ad5622adadf9cbf474cec
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross ARCH=mips 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   drivers/net/ethernet/stmicro/stmmac/stmmac_main.c: In function 'stmmac_xdp_xmit_zc':
>> drivers/net/ethernet/stmicro/stmmac/stmmac_main.c:2359:33: warning: passing argument 1 of 'txq_trans_cond_update' makes pointer from integer without a cast [-Wint-conversion]
    2359 |         txq_trans_cond_update(nq->trans_start);
         |                               ~~^~~~~~~~~~~~~
         |                                 |
         |                                 long unsigned int
   In file included from include/net/sock.h:46,
                    from include/linux/tcp.h:19,
                    from drivers/net/ethernet/stmicro/stmmac/stmmac_main.c:21:
   include/linux/netdevice.h:4107:63: note: expected 'struct netdev_queue *' but argument is of type 'long unsigned int'
    4107 | static inline void txq_trans_cond_update(struct netdev_queue *txq)
         |                                          ~~~~~~~~~~~~~~~~~~~~~^~~
   drivers/net/ethernet/stmicro/stmmac/stmmac_main.c: In function 'stmmac_xdp_xmit_back':
   drivers/net/ethernet/stmicro/stmmac/stmmac_main.c:4660:33: warning: passing argument 1 of 'txq_trans_cond_update' makes pointer from integer without a cast [-Wint-conversion]
    4660 |         txq_trans_cond_update(nq->trans_start);
         |                               ~~^~~~~~~~~~~~~
         |                                 |
         |                                 long unsigned int
   In file included from include/net/sock.h:46,
                    from include/linux/tcp.h:19,
                    from drivers/net/ethernet/stmicro/stmmac/stmmac_main.c:21:
   include/linux/netdevice.h:4107:63: note: expected 'struct netdev_queue *' but argument is of type 'long unsigned int'
    4107 | static inline void txq_trans_cond_update(struct netdev_queue *txq)
         |                                          ~~~~~~~~~~~~~~~~~~~~~^~~


vim +/txq_trans_cond_update +2359 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c

  2347	
  2348	static bool stmmac_xdp_xmit_zc(struct stmmac_priv *priv, u32 queue, u32 budget)
  2349	{
  2350		struct netdev_queue *nq = netdev_get_tx_queue(priv->dev, queue);
  2351		struct stmmac_tx_queue *tx_q = &priv->tx_queue[queue];
  2352		struct xsk_buff_pool *pool = tx_q->xsk_pool;
  2353		unsigned int entry = tx_q->cur_tx;
  2354		struct dma_desc *tx_desc = NULL;
  2355		struct xdp_desc xdp_desc;
  2356		bool work_done = true;
  2357	
  2358		/* Avoids TX time-out as we are sharing with slow path */
> 2359		txq_trans_cond_update(nq->trans_start);
  2360	
  2361		budget = min(budget, stmmac_tx_avail(priv, queue));
  2362	
  2363		while (budget-- > 0) {
  2364			dma_addr_t dma_addr;
  2365			bool set_ic;
  2366	
  2367			/* We are sharing with slow path and stop XSK TX desc submission when
  2368			 * available TX ring is less than threshold.
  2369			 */
  2370			if (unlikely(stmmac_tx_avail(priv, queue) < STMMAC_TX_XSK_AVAIL) ||
  2371			    !netif_carrier_ok(priv->dev)) {
  2372				work_done = false;
  2373				break;
  2374			}
  2375	
  2376			if (!xsk_tx_peek_desc(pool, &xdp_desc))
  2377				break;
  2378	
  2379			if (likely(priv->extend_desc))
  2380				tx_desc = (struct dma_desc *)(tx_q->dma_etx + entry);
  2381			else if (tx_q->tbs & STMMAC_TBS_AVAIL)
  2382				tx_desc = &tx_q->dma_entx[entry].basic;
  2383			else
  2384				tx_desc = tx_q->dma_tx + entry;
  2385	
  2386			dma_addr = xsk_buff_raw_get_dma(pool, xdp_desc.addr);
  2387			xsk_buff_raw_dma_sync_for_device(pool, dma_addr, xdp_desc.len);
  2388	
  2389			tx_q->tx_skbuff_dma[entry].buf_type = STMMAC_TXBUF_T_XSK_TX;
  2390	
  2391			/* To return XDP buffer to XSK pool, we simple call
  2392			 * xsk_tx_completed(), so we don't need to fill up
  2393			 * 'buf' and 'xdpf'.
  2394			 */
  2395			tx_q->tx_skbuff_dma[entry].buf = 0;
  2396			tx_q->xdpf[entry] = NULL;
  2397	
  2398			tx_q->tx_skbuff_dma[entry].map_as_page = false;
  2399			tx_q->tx_skbuff_dma[entry].len = xdp_desc.len;
  2400			tx_q->tx_skbuff_dma[entry].last_segment = true;
  2401			tx_q->tx_skbuff_dma[entry].is_jumbo = false;
  2402	
  2403			stmmac_set_desc_addr(priv, tx_desc, dma_addr);
  2404	
  2405			tx_q->tx_count_frames++;
  2406	
  2407			if (!priv->tx_coal_frames[queue])
  2408				set_ic = false;
  2409			else if (tx_q->tx_count_frames % priv->tx_coal_frames[queue] == 0)
  2410				set_ic = true;
  2411			else
  2412				set_ic = false;
  2413	
  2414			if (set_ic) {
  2415				tx_q->tx_count_frames = 0;
  2416				stmmac_set_tx_ic(priv, tx_desc);
  2417				priv->xstats.tx_set_ic_bit++;
  2418			}
  2419	
  2420			stmmac_prepare_tx_desc(priv, tx_desc, 1, xdp_desc.len,
  2421					       true, priv->mode, true, true,
  2422					       xdp_desc.len);
  2423	
  2424			stmmac_enable_dma_transmission(priv, priv->ioaddr);
  2425	
  2426			tx_q->cur_tx = STMMAC_GET_ENTRY(tx_q->cur_tx, priv->dma_tx_size);
  2427			entry = tx_q->cur_tx;
  2428		}
  2429	
  2430		if (tx_desc) {
  2431			stmmac_flush_tx_descriptors(priv, queue);
  2432			xsk_tx_release(pool);
  2433		}
  2434	
  2435		/* Return true if all of the 3 conditions are met
  2436		 *  a) TX Budget is still available
  2437		 *  b) work_done = true when XSK TX desc peek is empty (no more
  2438		 *     pending XSK TX for transmission)
  2439		 */
  2440		return !!budget && work_done;
  2441	}
  2442	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 72397 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2021-11-18 18:27 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2021-11-17  3:29 [PATCH net-next 0/4] net: make dev_watchdog() less intrusive Eric Dumazet
2021-11-17  3:29 ` [PATCH net-next 1/4] net: use an atomic_long_t for queue->trans_timeout Eric Dumazet
2021-11-17  3:29 ` [PATCH net-next 2/4] net: annotate accesses to queue->trans_start Eric Dumazet
2021-11-18 18:27   ` kernel test robot
2021-11-17  3:29 ` [PATCH net-next 3/4] net: do not inline netif_tx_lock()/netif_tx_unlock() Eric Dumazet
2021-11-17  3:29 ` [PATCH net-next 4/4] net: no longer stop all TX queues in dev_watchdog() Eric Dumazet
2021-11-17 15:00 ` [PATCH net-next 0/4] net: make dev_watchdog() less intrusive patchwork-bot+netdevbpf

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.