* [net-next PATCH v4 1/4] net: introduce napi_is_scheduled helper
2023-10-18 12:35 [net-next PATCH v4 0/4] net: stmmac: improve tx timer logic Christian Marangi
@ 2023-10-18 12:35 ` Christian Marangi
2023-10-18 12:35 ` [net-next PATCH v4 2/4] net: stmmac: improve TX timer arm logic Christian Marangi
` (3 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Christian Marangi @ 2023-10-18 12:35 UTC (permalink / raw)
To: Raju Rangoju, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Alexandre Torgue, Jose Abreu, Maxime Coquelin,
Ping-Ke Shih, Kalle Valo, Simon Horman, Daniel Borkmann,
Jiri Pirko, Hangbin Liu, netdev, linux-kernel, linux-stm32,
linux-arm-kernel, linux-wireless
Cc: Christian Marangi
We currently have napi_if_scheduled_mark_missed, which can be used to
check if a NAPI is scheduled, but it does more than simply check the
state and return a bool. Some drivers already implement a custom function
to check if a NAPI is scheduled.
Drop these custom functions and introduce napi_is_scheduled, which simply
checks atomically if a NAPI is scheduled.
Update any driver and code that implements a similar check to use this
new helper instead.
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
---
drivers/net/ethernet/chelsio/cxgb3/sge.c | 8 --------
drivers/net/wireless/realtek/rtw89/core.c | 2 +-
include/linux/netdevice.h | 23 +++++++++++++++++++++++
net/core/dev.c | 2 +-
4 files changed, 25 insertions(+), 10 deletions(-)
diff --git a/drivers/net/ethernet/chelsio/cxgb3/sge.c b/drivers/net/ethernet/chelsio/cxgb3/sge.c
index 2e9a74fe0970..71fa2dc19034 100644
--- a/drivers/net/ethernet/chelsio/cxgb3/sge.c
+++ b/drivers/net/ethernet/chelsio/cxgb3/sge.c
@@ -2501,14 +2501,6 @@ static int napi_rx_handler(struct napi_struct *napi, int budget)
return work_done;
}
-/*
- * Returns true if the device is already scheduled for polling.
- */
-static inline int napi_is_scheduled(struct napi_struct *napi)
-{
- return test_bit(NAPI_STATE_SCHED, &napi->state);
-}
-
/**
* process_pure_responses - process pure responses from a response queue
* @adap: the adapter
diff --git a/drivers/net/wireless/realtek/rtw89/core.c b/drivers/net/wireless/realtek/rtw89/core.c
index cca18d7ea1dd..6faf4dcf007c 100644
--- a/drivers/net/wireless/realtek/rtw89/core.c
+++ b/drivers/net/wireless/realtek/rtw89/core.c
@@ -1919,7 +1919,7 @@ static void rtw89_core_rx_to_mac80211(struct rtw89_dev *rtwdev,
struct napi_struct *napi = &rtwdev->napi;
/* In low power mode, napi isn't scheduled. Receive it to netif. */
- if (unlikely(!test_bit(NAPI_STATE_SCHED, &napi->state)))
+ if (unlikely(!napi_is_scheduled(napi)))
napi = NULL;
rtw89_core_hw_to_sband_rate(rx_status);
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 1c7681263d30..b8bf669212cc 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -482,6 +482,29 @@ static inline bool napi_prefer_busy_poll(struct napi_struct *n)
return test_bit(NAPI_STATE_PREFER_BUSY_POLL, &n->state);
}
+/**
+ * napi_is_scheduled - test if NAPI is scheduled
+ * @n: NAPI context
+ *
+ * This check is "best-effort". With no locking implemented,
+ * a NAPI can be scheduled or terminate right after this check
+ * and produce not precise results.
+ *
+ * NAPI_STATE_SCHED is an internal state, napi_is_scheduled
+ * should not be used normally and napi_schedule should be
+ * used instead.
+ *
+ * Use only if the driver really needs to check if a NAPI
+ * is scheduled for example in the context of delayed timer
+ * that can be skipped if a NAPI is already scheduled.
+ *
+ * Return True if NAPI is scheduled, False otherwise.
+ */
+static inline bool napi_is_scheduled(struct napi_struct *n)
+{
+ return test_bit(NAPI_STATE_SCHED, &n->state);
+}
+
bool napi_schedule_prep(struct napi_struct *n);
/**
diff --git a/net/core/dev.c b/net/core/dev.c
index 3ca746a5f0ad..8d267fc0b988 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6527,7 +6527,7 @@ static int __napi_poll(struct napi_struct *n, bool *repoll)
* accidentally calling ->poll() when NAPI is not scheduled.
*/
work = 0;
- if (test_bit(NAPI_STATE_SCHED, &n->state)) {
+ if (napi_is_scheduled(n)) {
work = n->poll(n, weight);
trace_napi_poll(n, work, weight);
}
--
2.40.1
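The helper in this patch is a plain bit test on the NAPI state word. As a
rough illustration only, here is a userspace sketch of the same idea using
C11 atomics. Everything prefixed with model_ or MODEL_ is hypothetical and
is not kernel API; the kernel helper itself uses test_bit() on
NAPI_STATE_SCHED as shown in the diff above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct napi_struct: state word only. */
struct model_napi {
	atomic_ulong state;
};

/* Hypothetical bit number standing in for NAPI_STATE_SCHED. */
enum { MODEL_STATE_SCHED = 0 };

/* Set the scheduled bit; returns false if it was already set. */
static inline bool model_napi_schedule(struct model_napi *n)
{
	unsigned long old;

	assert(n);
	old = atomic_fetch_or(&n->state, 1UL << MODEL_STATE_SCHED);
	return !(old & (1UL << MODEL_STATE_SCHED));
}

/* Clear the scheduled bit, as a completed poll would. */
static inline void model_napi_complete(struct model_napi *n)
{
	assert(n);
	atomic_fetch_and(&n->state, ~(1UL << MODEL_STATE_SCHED));
}

/* Best-effort read: the bit may change right after this returns. */
static inline bool model_napi_is_scheduled(struct model_napi *n)
{
	assert(n);
	return atomic_load(&n->state) & (1UL << MODEL_STATE_SCHED);
}
```

The read is atomic but takes no lock, so a caller can only treat the result
as a hint, which matches the "best-effort" wording in the kernel-doc comment.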
* [net-next PATCH v4 2/4] net: stmmac: improve TX timer arm logic
2023-10-18 12:35 [net-next PATCH v4 0/4] net: stmmac: improve tx timer logic Christian Marangi
2023-10-18 12:35 ` [net-next PATCH v4 1/4] net: introduce napi_is_scheduled helper Christian Marangi
@ 2023-10-18 12:35 ` Christian Marangi
2023-10-18 12:35 ` [net-next PATCH v4 3/4] net: stmmac: move TX timer arm after DMA enable Christian Marangi
` (2 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Christian Marangi @ 2023-10-18 12:35 UTC (permalink / raw)
To: Raju Rangoju, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Alexandre Torgue, Jose Abreu, Maxime Coquelin,
Ping-Ke Shih, Kalle Valo, Simon Horman, Daniel Borkmann,
Jiri Pirko, Hangbin Liu, netdev, linux-kernel, linux-stm32,
linux-arm-kernel, linux-wireless
Cc: Christian Marangi
There is currently a problem with the TX timer getting armed many more
times than necessary, causing a big performance regression on some
devices where re-arming an hrtimer is expensive.
The use of the TX timer is an old implementation that predates the napi
implementation and the interrupt enable/disable handling.
Because stmmac is very old code, the TX timer was never re-evaluated
against this newer implementation and was kept as-is, causing the
performance regression. The regression started to appear
with kernel version 4.19 with 8fce33317023 ("net: stmmac: Rework coalesce
timer and fix multi-queue races") where the timer was reduced to 1ms
causing it to be armed 40 times more than before.
Decreasing the timer made the problem more visible and caused a
regression in the order of 600-700mbps on some devices (the regression
was noticed on ipq806x).
The problem is that handling the hrtimer on some targets is
expensive and recent kernels arm the timer much more often.
One proposed solution was to revert the hrtimer change and use
mod_timer, but that would still hide the real problem in the
current implementation.
To fix the regression, apply some additional logic and skip arming the
timer when not needed.
Arm the timer ONLY if a napi is not already scheduled. Running the timer
is redundant since the same function (stmmac_tx_clean) will run in the
napi TX poll. Also try to cancel any timer if a napi is scheduled to
prevent a redundant run of the TX call.
With this new logic the original performance is restored while
still using the hrtimer.
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
---
.../net/ethernet/stmicro/stmmac/stmmac_main.c | 18 +++++++++++++++---
1 file changed, 15 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index bb1dbf4c9f6c..5124ee87286c 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -2996,13 +2996,25 @@ static void stmmac_tx_timer_arm(struct stmmac_priv *priv, u32 queue)
{
struct stmmac_tx_queue *tx_q = &priv->dma_conf.tx_queue[queue];
u32 tx_coal_timer = priv->tx_coal_timer[queue];
+ struct stmmac_channel *ch;
+ struct napi_struct *napi;
if (!tx_coal_timer)
return;
- hrtimer_start(&tx_q->txtimer,
- STMMAC_COAL_TIMER(tx_coal_timer),
- HRTIMER_MODE_REL);
+ ch = &priv->channel[tx_q->queue_index];
+ napi = tx_q->xsk_pool ? &ch->rxtx_napi : &ch->tx_napi;
+
+ /* Arm timer only if napi is not already scheduled.
+ * Try to cancel any timer if napi is scheduled, timer will be armed
+ * again in the next scheduled napi.
+ */
+ if (unlikely(!napi_is_scheduled(napi)))
+ hrtimer_start(&tx_q->txtimer,
+ STMMAC_COAL_TIMER(tx_coal_timer),
+ HRTIMER_MODE_REL);
+ else
+ hrtimer_try_to_cancel(&tx_q->txtimer);
}
/**
--
2.40.1
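The decision made in stmmac_tx_timer_arm after this patch can be sketched
as a small userspace model. This is only an illustration under stated
assumptions: struct model_txq and model_tx_timer_arm are hypothetical, the
hrtimer is reduced to a flag, and napi_is_scheduled() is reduced to a
boolean field.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one TX queue; not the driver's real structures. */
struct model_txq {
	bool napi_scheduled;    /* stands in for napi_is_scheduled(napi) */
	bool timer_armed;       /* stands in for the hrtimer being queued */
	unsigned int coal_us;   /* coalesce period; 0 disables the timer */
	unsigned int arm_count; /* how many times the timer was (re)armed */
};

/* Arm only when no poll is pending; otherwise drop any armed timer. */
static void model_tx_timer_arm(struct model_txq *q)
{
	assert(q);

	/* A zero period means TX coalescing by timer is disabled. */
	if (!q->coal_us)
		return;

	if (!q->napi_scheduled) {
		/* Nobody else will run the TX cleanup: the timer is needed. */
		q->timer_armed = true;
		q->arm_count++;
	} else {
		/* The scheduled poll will do the cleanup: timer is redundant. */
		q->timer_armed = false;
	}
}
```

In this model, calling model_tx_timer_arm repeatedly while napi_scheduled is
true never increments arm_count, which is the saving the patch is after.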
* [net-next PATCH v4 3/4] net: stmmac: move TX timer arm after DMA enable
2023-10-18 12:35 [net-next PATCH v4 0/4] net: stmmac: improve tx timer logic Christian Marangi
2023-10-18 12:35 ` [net-next PATCH v4 1/4] net: introduce napi_is_scheduled helper Christian Marangi
2023-10-18 12:35 ` [net-next PATCH v4 2/4] net: stmmac: improve TX timer arm logic Christian Marangi
@ 2023-10-18 12:35 ` Christian Marangi
2023-10-18 12:35 ` [net-next PATCH v4 4/4] net: stmmac: increase TX coalesce timer to 5ms Christian Marangi
2023-10-19 13:50 ` [net-next PATCH v4 0/4] net: stmmac: improve tx timer logic patchwork-bot+netdevbpf
4 siblings, 0 replies; 6+ messages in thread
From: Christian Marangi @ 2023-10-18 12:35 UTC (permalink / raw)
To: Raju Rangoju, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Alexandre Torgue, Jose Abreu, Maxime Coquelin,
Ping-Ke Shih, Kalle Valo, Simon Horman, Daniel Borkmann,
Jiri Pirko, Hangbin Liu, netdev, linux-kernel, linux-stm32,
linux-arm-kernel, linux-wireless
Cc: Christian Marangi
Move the TX timer arm call after the DMA interrupt is enabled again.
The TX timer arm function changed logic and is now skipped if a napi is
already scheduled. Moving the TX timer arm call after DMA is enabled
lets it be correctly skipped if a DMA interrupt has fired and a napi
has been scheduled again.
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
---
.../net/ethernet/stmicro/stmmac/stmmac_main.c | 22 +++++++++++++++----
1 file changed, 18 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 5124ee87286c..11055e98efcc 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -2543,9 +2543,13 @@ static void stmmac_bump_dma_threshold(struct stmmac_priv *priv, u32 chan)
* @priv: driver private structure
* @budget: napi budget limiting this functions packet handling
* @queue: TX queue index
+ * @pending_packets: signal to arm the TX coal timer
* Description: it reclaims the transmit resources after transmission completes.
+ * If some packets still needs to be handled, due to TX coalesce, set
+ * pending_packets to true to make NAPI arm the TX coal timer.
*/
-static int stmmac_tx_clean(struct stmmac_priv *priv, int budget, u32 queue)
+static int stmmac_tx_clean(struct stmmac_priv *priv, int budget, u32 queue,
+ bool *pending_packets)
{
struct stmmac_tx_queue *tx_q = &priv->dma_conf.tx_queue[queue];
struct stmmac_txq_stats *txq_stats = &priv->xstats.txq_stats[queue];
@@ -2706,7 +2710,7 @@ static int stmmac_tx_clean(struct stmmac_priv *priv, int budget, u32 queue)
/* We still have pending packets, let's call for a new scheduling */
if (tx_q->dirty_tx != tx_q->cur_tx)
- stmmac_tx_timer_arm(priv, queue);
+ *pending_packets = true;
flags = u64_stats_update_begin_irqsave(&txq_stats->syncp);
txq_stats->tx_packets += tx_packets;
@@ -5572,6 +5576,7 @@ static int stmmac_napi_poll_tx(struct napi_struct *napi, int budget)
container_of(napi, struct stmmac_channel, tx_napi);
struct stmmac_priv *priv = ch->priv_data;
struct stmmac_txq_stats *txq_stats;
+ bool pending_packets = false;
u32 chan = ch->index;
unsigned long flags;
int work_done;
@@ -5581,7 +5586,7 @@ static int stmmac_napi_poll_tx(struct napi_struct *napi, int budget)
txq_stats->napi_poll++;
u64_stats_update_end_irqrestore(&txq_stats->syncp, flags);
- work_done = stmmac_tx_clean(priv, budget, chan);
+ work_done = stmmac_tx_clean(priv, budget, chan, &pending_packets);
work_done = min(work_done, budget);
if (work_done < budget && napi_complete_done(napi, work_done)) {
@@ -5592,6 +5597,10 @@ static int stmmac_napi_poll_tx(struct napi_struct *napi, int budget)
spin_unlock_irqrestore(&ch->lock, flags);
}
+ /* TX still have packet to handle, check if we need to arm tx timer */
+ if (pending_packets)
+ stmmac_tx_timer_arm(priv, chan);
+
return work_done;
}
@@ -5600,6 +5609,7 @@ static int stmmac_napi_poll_rxtx(struct napi_struct *napi, int budget)
struct stmmac_channel *ch =
container_of(napi, struct stmmac_channel, rxtx_napi);
struct stmmac_priv *priv = ch->priv_data;
+ bool tx_pending_packets = false;
int rx_done, tx_done, rxtx_done;
struct stmmac_rxq_stats *rxq_stats;
struct stmmac_txq_stats *txq_stats;
@@ -5616,7 +5626,7 @@ static int stmmac_napi_poll_rxtx(struct napi_struct *napi, int budget)
txq_stats->napi_poll++;
u64_stats_update_end_irqrestore(&txq_stats->syncp, flags);
- tx_done = stmmac_tx_clean(priv, budget, chan);
+ tx_done = stmmac_tx_clean(priv, budget, chan, &tx_pending_packets);
tx_done = min(tx_done, budget);
rx_done = stmmac_rx_zc(priv, budget, chan);
@@ -5641,6 +5651,10 @@ static int stmmac_napi_poll_rxtx(struct napi_struct *napi, int budget)
spin_unlock_irqrestore(&ch->lock, flags);
}
+ /* TX still have packet to handle, check if we need to arm tx timer */
+ if (tx_pending_packets)
+ stmmac_tx_timer_arm(priv, chan);
+
return min(rxtx_done, budget - 1);
}
--
2.40.1
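The reordering in this patch can be sketched as a userspace model: the
cleanup routine only reports that packets are still pending through an
out-parameter, and the poll routine arms the timer after it has decided
whether the poll is complete. All names here are hypothetical, the ring
uses plain indices with no wraparound, and completing the poll is reduced
to clearing a flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical TX queue model; indices are plain counters, no wraparound. */
struct model_txq {
	unsigned int dirty;      /* next descriptor to reclaim */
	unsigned int completed;  /* descriptors the hardware has finished */
	unsigned int cur;        /* next descriptor to be filled */
	bool napi_scheduled;     /* poll still pending for this queue */
	unsigned int timer_arms; /* times the coalesce timer was armed */
};

/* Reclaim up to budget descriptors; report leftovers via *pending. */
static int model_tx_clean(struct model_txq *q, int budget, bool *pending)
{
	int count = 0;

	assert(q && pending);
	while (q->dirty != q->completed && count < budget) {
		q->dirty++;
		count++;
	}
	/* Packets remain in flight: ask the caller to consider the timer. */
	if (q->dirty != q->cur)
		*pending = true;
	return count;
}

/* Arm only if no poll is scheduled, mirroring the logic of patch 2/4. */
static void model_tx_timer_arm(struct model_txq *q)
{
	if (!q->napi_scheduled)
		q->timer_arms++;
}

static int model_poll_tx(struct model_txq *q, int budget)
{
	bool pending = false;
	int work;

	q->napi_scheduled = true;
	work = model_tx_clean(q, budget, &pending);

	/* Budget not exhausted: the poll completes and interrupts resume. */
	if (work < budget)
		q->napi_scheduled = false;

	/* Decide about the timer only after the completion step above. */
	if (pending)
		model_tx_timer_arm(q);
	return work;
}
```

When the budget is exhausted in this model, the poll stays scheduled and the
arm request is skipped, since the next poll run will clean the queue anyway.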
* [net-next PATCH v4 4/4] net: stmmac: increase TX coalesce timer to 5ms
2023-10-18 12:35 [net-next PATCH v4 0/4] net: stmmac: improve tx timer logic Christian Marangi
` (2 preceding siblings ...)
2023-10-18 12:35 ` [net-next PATCH v4 3/4] net: stmmac: move TX timer arm after DMA enable Christian Marangi
@ 2023-10-18 12:35 ` Christian Marangi
2023-10-19 13:50 ` [net-next PATCH v4 0/4] net: stmmac: improve tx timer logic patchwork-bot+netdevbpf
4 siblings, 0 replies; 6+ messages in thread
From: Christian Marangi @ 2023-10-18 12:35 UTC (permalink / raw)
To: Raju Rangoju, David S. Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Alexandre Torgue, Jose Abreu, Maxime Coquelin,
Ping-Ke Shih, Kalle Valo, Simon Horman, Daniel Borkmann,
Jiri Pirko, Hangbin Liu, netdev, linux-kernel, linux-stm32,
linux-arm-kernel, linux-wireless
Cc: Christian Marangi
Commit 8fce33317023 ("net: stmmac: Rework coalesce timer and fix
multi-queue races") decreased the TX coalesce timer from 40ms to 1ms.
This caused a performance regression on some targets (the regression was
reported at least on ipq806x) in the order of 600mbps, dropping from
gigabit handling to only 200mbps.
The problem was identified as the TX timer getting armed too many times.
While this was fixed and improved in another commit, performance can be
improved even further by increasing the timer delay a bit, moving from
1ms to 5ms.
The value is a good balance between battery saving, by preventing too
many interrupts from being generated, and permitting good performance for
internet oriented devices.
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
---
drivers/net/ethernet/stmicro/stmmac/common.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/stmicro/stmmac/common.h b/drivers/net/ethernet/stmicro/stmmac/common.h
index 1e996c29043d..e3f650e88f82 100644
--- a/drivers/net/ethernet/stmicro/stmmac/common.h
+++ b/drivers/net/ethernet/stmicro/stmmac/common.h
@@ -293,7 +293,7 @@ struct stmmac_safety_stats {
#define MIN_DMA_RIWT 0x10
#define DEF_DMA_RIWT 0xa0
/* Tx coalesce parameters */
-#define STMMAC_COAL_TX_TIMER 1000
+#define STMMAC_COAL_TX_TIMER 5000
#define STMMAC_MAX_COAL_TX_TICK 100000
#define STMMAC_TX_MAX_FRAMES 256
#define STMMAC_TX_FRAMES 25
--
2.40.1
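The effect of the period on how often the timer can fire is simple
arithmetic. As a rough sketch, assuming the define is expressed in
microseconds (which matches 5000 being described as 5ms in the subject)
and assuming the worst case where the timer is re-armed on every expiry,
the upper bound on expirations per second is one second divided by the
period. The helper name below is hypothetical.

```c
#include <assert.h>

/* Microseconds per second. */
#define MODEL_USEC_PER_SEC 1000000u

/* Upper bound on expirations per second if re-armed on every expiry. */
static unsigned int model_max_expirations_per_sec(unsigned int period_us)
{
	assert(period_us > 0);
	return MODEL_USEC_PER_SEC / period_us;
}
```

With the values named in the commit message, a 40ms period gives at most 25
expirations per second, 1ms gives 1000 (the 40x increase mentioned earlier
in the series), and 5ms gives 200.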
* Re: [net-next PATCH v4 0/4] net: stmmac: improve tx timer logic
2023-10-18 12:35 [net-next PATCH v4 0/4] net: stmmac: improve tx timer logic Christian Marangi
` (3 preceding siblings ...)
2023-10-18 12:35 ` [net-next PATCH v4 4/4] net: stmmac: increase TX coalesce timer to 5ms Christian Marangi
@ 2023-10-19 13:50 ` patchwork-bot+netdevbpf
4 siblings, 0 replies; 6+ messages in thread
From: patchwork-bot+netdevbpf @ 2023-10-19 13:50 UTC (permalink / raw)
To: Christian Marangi
Cc: rajur, davem, edumazet, kuba, pabeni, alexandre.torgue, joabreu,
mcoquelin.stm32, pkshih, kvalo, horms, daniel, jiri, liuhangbin,
netdev, linux-kernel, linux-stm32, linux-arm-kernel,
linux-wireless
Hello:
This series was applied to netdev/net-next.git (main)
by Paolo Abeni <pabeni@redhat.com>:
On Wed, 18 Oct 2023 14:35:46 +0200 you wrote:
> This series comes with the intention of restoring original performance
> of stmmac on some router/device that used the stmmac driver to handle
> gigabit traffic.
>
> More info are present in patch 3. This cover letter is to show results
> and improvements of the following change.
>
> [...]
Here is the summary with links:
- [net-next,v4,1/4] net: introduce napi_is_scheduled helper
https://git.kernel.org/netdev/net-next/c/7f3eb2174512
- [net-next,v4,2/4] net: stmmac: improve TX timer arm logic
https://git.kernel.org/netdev/net-next/c/2d1a42cf7f77
- [net-next,v4,3/4] net: stmmac: move TX timer arm after DMA enable
https://git.kernel.org/netdev/net-next/c/a594166387fe
- [net-next,v4,4/4] net: stmmac: increase TX coalesce timer to 5ms
https://git.kernel.org/netdev/net-next/c/039550960a22
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html