* [next-queue v3 0/4] i40e: Add an i40e_napi_poll tracepoint
@ 2022-10-06 23:43 Joe Damato
2022-10-06 23:43 ` [next-queue v3 1/4] i40e: Store the irq number in i40e_q_vector Joe Damato
` (3 more replies)
0 siblings, 4 replies; 8+ messages in thread
From: Joe Damato @ 2022-10-06 23:43 UTC (permalink / raw)
To: intel-wired-lan
Cc: netdev, kuba, davem, anthony.l.nguyen, jesse.brandeburg,
maciej.fijalkowski, Joe Damato
Greetings:
Welcome to v3.
Debugging and tuning the NAPI and i40e NIC parameters can be a bit tricky
as there are many different options to test.
This change adds a tracepoint to i40e_napi_poll which exposes a lot of
helpful debug information for users who'd like to get a better
understanding of how their NIC is performing as they adjust various
parameters and tuning knobs.
With this series applied, you can use the tracepoint with perf by running:
$ sudo perf trace -e i40e:i40e_napi_poll -a --call-graph=fp --libtraceevent_print
388.258 :0/0 i40e:i40e_napi_poll(i40e_napi_poll on dev eth2 q i40e-eth2-TxRx-9 irq 346 irq_mask 00000000,00000000,00000000,00000000,00000000,00800000 curr_cpu 23 budget 64 bpr 64 rx_cleaned 28 tx_cleaned 0 rx_clean_complete 1 tx_clean_complete 1)
i40e_napi_poll ([i40e])
i40e_napi_poll ([i40e])
__napi_poll ([kernel.kallsyms])
net_rx_action ([kernel.kallsyms])
__do_softirq ([kernel.kallsyms])
common_interrupt ([kernel.kallsyms])
asm_common_interrupt ([kernel.kallsyms])
intel_idle_irq ([kernel.kallsyms])
cpuidle_enter_state ([kernel.kallsyms])
cpuidle_enter ([kernel.kallsyms])
do_idle ([kernel.kallsyms])
cpu_startup_entry ([kernel.kallsyms])
[0x243fd8] ([kernel.kallsyms])
secondary_startup_64_no_verify ([kernel.kallsyms])
The output is verbose, but is helpful when trying to determine the impact of
various turning parameters.
Thanks,
Joe
v2 -> v3:
- Add an rx_clean_complete to the RX patch so that it can be output
in tracepoint instead of the valued of 'clean_complete' which can
be ambiguous (patch 3/4 was updated).
- Update the tracepoint to swap 'clean_complete' with
'rx_clean_complete' (patch 4/4 was updated).
v1 -> v2:
- TX path modified to push an out parameter through the function
call chain instead of modifying control flow.
- RX path modified to also use an out parameter to track the number
of packets processed.
- Naming of tracepoint struct members and format string modified to
be more readable.
Joe Damato (4):
i40e: Store the irq number in i40e_q_vector
i40e: Record number TXes cleaned during NAPI
i40e: Record number of RXes cleaned during NAPI
i40e: Add i40e_napi_poll tracepoint
drivers/net/ethernet/intel/i40e/i40e.h | 1 +
drivers/net/ethernet/intel/i40e/i40e_main.c | 1 +
drivers/net/ethernet/intel/i40e/i40e_trace.h | 49 ++++++++++++++++++++++++++++
drivers/net/ethernet/intel/i40e/i40e_txrx.c | 33 ++++++++++++++-----
drivers/net/ethernet/intel/i40e/i40e_xsk.c | 21 +++++++++---
drivers/net/ethernet/intel/i40e/i40e_xsk.h | 6 ++--
6 files changed, 95 insertions(+), 16 deletions(-)
--
2.7.4
^ permalink raw reply [flat|nested] 8+ messages in thread
* [next-queue v3 1/4] i40e: Store the irq number in i40e_q_vector
2022-10-06 23:43 [next-queue v3 0/4] i40e: Add an i40e_napi_poll tracepoint Joe Damato
@ 2022-10-06 23:43 ` Joe Damato
2022-10-06 23:43 ` [next-queue v3 2/4] i40e: Record number TXes cleaned during NAPI Joe Damato
` (2 subsequent siblings)
3 siblings, 0 replies; 8+ messages in thread
From: Joe Damato @ 2022-10-06 23:43 UTC (permalink / raw)
To: intel-wired-lan
Cc: netdev, kuba, davem, anthony.l.nguyen, jesse.brandeburg,
maciej.fijalkowski, Joe Damato
Make it easy to figure out the IRQ number for a particular i40e_q_vector by
storing the assigned IRQ in the structure itself.
Signed-off-by: Joe Damato <jdamato@fastly.com>
---
drivers/net/ethernet/intel/i40e/i40e.h | 1 +
drivers/net/ethernet/intel/i40e/i40e_main.c | 1 +
2 files changed, 2 insertions(+)
diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index 9926c4e..8e1f395 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -992,6 +992,7 @@ struct i40e_q_vector {
struct rcu_head rcu; /* to avoid race with update stats on free */
char name[I40E_INT_NAME_STR_LEN];
bool arm_wb_state;
+ int irq_num; /* IRQ assigned to this q_vector */
} ____cacheline_internodealigned_in_smp;
/* lan device */
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 6b7535a..6efe130 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -4123,6 +4123,7 @@ static int i40e_vsi_request_irq_msix(struct i40e_vsi *vsi, char *basename)
}
/* register for affinity change notifications */
+ q_vector->irq_num = irq_num;
q_vector->affinity_notify.notify = i40e_irq_affinity_notify;
q_vector->affinity_notify.release = i40e_irq_affinity_release;
irq_set_affinity_notifier(irq_num, &q_vector->affinity_notify);
--
2.7.4
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [next-queue v3 2/4] i40e: Record number TXes cleaned during NAPI
2022-10-06 23:43 [next-queue v3 0/4] i40e: Add an i40e_napi_poll tracepoint Joe Damato
2022-10-06 23:43 ` [next-queue v3 1/4] i40e: Store the irq number in i40e_q_vector Joe Damato
@ 2022-10-06 23:43 ` Joe Damato
2022-10-07 8:10 ` Maciej Fijalkowski
2022-10-06 23:43 ` [next-queue v3 3/4] i40e: Record number of RXes " Joe Damato
2022-10-06 23:43 ` [next-queue v3 4/4] i40e: Add i40e_napi_poll tracepoint Joe Damato
3 siblings, 1 reply; 8+ messages in thread
From: Joe Damato @ 2022-10-06 23:43 UTC (permalink / raw)
To: intel-wired-lan
Cc: netdev, kuba, davem, anthony.l.nguyen, jesse.brandeburg,
maciej.fijalkowski, Joe Damato
Update i40e_clean_tx_irq to take an out parameter (tx_cleaned) which stores
the number TXs cleaned.
Likewise, update i40e_clean_xdp_tx_irq and i40e_xmit_zc to do the same.
Care has been taken to avoid changing the control flow of any functions
involved.
Signed-off-by: Joe Damato <jdamato@fastly.com>
---
drivers/net/ethernet/intel/i40e/i40e_txrx.c | 16 +++++++++++-----
drivers/net/ethernet/intel/i40e/i40e_xsk.c | 15 +++++++++++----
drivers/net/ethernet/intel/i40e/i40e_xsk.h | 3 ++-
3 files changed, 24 insertions(+), 10 deletions(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index b97c95f..a2cc98e 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -923,11 +923,13 @@ void i40e_detect_recover_hung(struct i40e_vsi *vsi)
* @vsi: the VSI we care about
* @tx_ring: Tx ring to clean
* @napi_budget: Used to determine if we are in netpoll
+ * @tx_cleaned: Out parameter set to the number of TXes cleaned
*
* Returns true if there's any budget left (e.g. the clean is finished)
**/
static bool i40e_clean_tx_irq(struct i40e_vsi *vsi,
- struct i40e_ring *tx_ring, int napi_budget)
+ struct i40e_ring *tx_ring, int napi_budget,
+ unsigned int *tx_cleaned)
{
int i = tx_ring->next_to_clean;
struct i40e_tx_buffer *tx_buf;
@@ -1026,7 +1028,7 @@ static bool i40e_clean_tx_irq(struct i40e_vsi *vsi,
i40e_arm_wb(tx_ring, vsi, budget);
if (ring_is_xdp(tx_ring))
- return !!budget;
+ goto out;
/* notify netdev of completed buffers */
netdev_tx_completed_queue(txring_txq(tx_ring),
@@ -1048,6 +1050,8 @@ static bool i40e_clean_tx_irq(struct i40e_vsi *vsi,
}
}
+out:
+ *tx_cleaned = total_packets;
return !!budget;
}
@@ -2689,10 +2693,12 @@ int i40e_napi_poll(struct napi_struct *napi, int budget)
container_of(napi, struct i40e_q_vector, napi);
struct i40e_vsi *vsi = q_vector->vsi;
struct i40e_ring *ring;
+ bool tx_clean_complete = true;
bool clean_complete = true;
bool arm_wb = false;
int budget_per_ring;
int work_done = 0;
+ unsigned int tx_cleaned = 0;
if (test_bit(__I40E_VSI_DOWN, vsi->state)) {
napi_complete(napi);
@@ -2704,11 +2710,11 @@ int i40e_napi_poll(struct napi_struct *napi, int budget)
*/
i40e_for_each_ring(ring, q_vector->tx) {
bool wd = ring->xsk_pool ?
- i40e_clean_xdp_tx_irq(vsi, ring) :
- i40e_clean_tx_irq(vsi, ring, budget);
+ i40e_clean_xdp_tx_irq(vsi, ring, &tx_cleaned) :
+ i40e_clean_tx_irq(vsi, ring, budget, &tx_cleaned);
if (!wd) {
- clean_complete = false;
+ clean_complete = tx_clean_complete = false;
continue;
}
arm_wb |= ring->arm_wb;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
index 790aaeff..f98ce7e4 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
@@ -530,18 +530,22 @@ static void i40e_set_rs_bit(struct i40e_ring *xdp_ring)
* i40e_xmit_zc - Performs zero-copy Tx AF_XDP
* @xdp_ring: XDP Tx ring
* @budget: NAPI budget
+ * @tx_cleaned: Out parameter of the TX packets processed
*
* Returns true if the work is finished.
**/
-static bool i40e_xmit_zc(struct i40e_ring *xdp_ring, unsigned int budget)
+static bool i40e_xmit_zc(struct i40e_ring *xdp_ring, unsigned int budget,
+ unsigned int *tx_cleaned)
{
struct xdp_desc *descs = xdp_ring->xsk_pool->tx_descs;
u32 nb_pkts, nb_processed = 0;
unsigned int total_bytes = 0;
nb_pkts = xsk_tx_peek_release_desc_batch(xdp_ring->xsk_pool, budget);
- if (!nb_pkts)
+ if (!nb_pkts) {
+ *tx_cleaned = 0;
return true;
+ }
if (xdp_ring->next_to_use + nb_pkts >= xdp_ring->count) {
nb_processed = xdp_ring->count - xdp_ring->next_to_use;
@@ -558,6 +562,7 @@ static bool i40e_xmit_zc(struct i40e_ring *xdp_ring, unsigned int budget)
i40e_update_tx_stats(xdp_ring, nb_pkts, total_bytes);
+ *tx_cleaned = nb_pkts;
return nb_pkts < budget;
}
@@ -581,10 +586,12 @@ static void i40e_clean_xdp_tx_buffer(struct i40e_ring *tx_ring,
* i40e_clean_xdp_tx_irq - Completes AF_XDP entries, and cleans XDP entries
* @vsi: Current VSI
* @tx_ring: XDP Tx ring
+ * @tx_cleaned: out parameter of number of TXes cleaned
*
* Returns true if cleanup/tranmission is done.
**/
-bool i40e_clean_xdp_tx_irq(struct i40e_vsi *vsi, struct i40e_ring *tx_ring)
+bool i40e_clean_xdp_tx_irq(struct i40e_vsi *vsi, struct i40e_ring *tx_ring,
+ unsigned int *tx_cleaned)
{
struct xsk_buff_pool *bp = tx_ring->xsk_pool;
u32 i, completed_frames, xsk_frames = 0;
@@ -634,7 +641,7 @@ bool i40e_clean_xdp_tx_irq(struct i40e_vsi *vsi, struct i40e_ring *tx_ring)
if (xsk_uses_need_wakeup(tx_ring->xsk_pool))
xsk_set_tx_need_wakeup(tx_ring->xsk_pool);
- return i40e_xmit_zc(tx_ring, I40E_DESC_UNUSED(tx_ring));
+ return i40e_xmit_zc(tx_ring, I40E_DESC_UNUSED(tx_ring), tx_cleaned);
}
/**
diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.h b/drivers/net/ethernet/intel/i40e/i40e_xsk.h
index 821df24..396ed11 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.h
@@ -30,7 +30,8 @@ int i40e_xsk_pool_setup(struct i40e_vsi *vsi, struct xsk_buff_pool *pool,
bool i40e_alloc_rx_buffers_zc(struct i40e_ring *rx_ring, u16 cleaned_count);
int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget);
-bool i40e_clean_xdp_tx_irq(struct i40e_vsi *vsi, struct i40e_ring *tx_ring);
+bool i40e_clean_xdp_tx_irq(struct i40e_vsi *vsi, struct i40e_ring *tx_ring,
+ unsigned int *tx_cleaned);
int i40e_xsk_wakeup(struct net_device *dev, u32 queue_id, u32 flags);
int i40e_realloc_rx_bi_zc(struct i40e_vsi *vsi, bool zc);
void i40e_clear_rx_bi_zc(struct i40e_ring *rx_ring);
--
2.7.4
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [next-queue v3 3/4] i40e: Record number of RXes cleaned during NAPI
2022-10-06 23:43 [next-queue v3 0/4] i40e: Add an i40e_napi_poll tracepoint Joe Damato
2022-10-06 23:43 ` [next-queue v3 1/4] i40e: Store the irq number in i40e_q_vector Joe Damato
2022-10-06 23:43 ` [next-queue v3 2/4] i40e: Record number TXes cleaned during NAPI Joe Damato
@ 2022-10-06 23:43 ` Joe Damato
2022-10-06 23:43 ` [next-queue v3 4/4] i40e: Add i40e_napi_poll tracepoint Joe Damato
3 siblings, 0 replies; 8+ messages in thread
From: Joe Damato @ 2022-10-06 23:43 UTC (permalink / raw)
To: intel-wired-lan
Cc: netdev, kuba, davem, anthony.l.nguyen, jesse.brandeburg,
maciej.fijalkowski, Joe Damato
Adjust i40e_clean_rx_irq and i40e_clean_rx_irq_zc to accept an out
parameter which records the number of RX packets cleaned.
Care has been taken to avoid any changes in control flow.
Signed-off-by: Joe Damato <jdamato@fastly.com>
---
drivers/net/ethernet/intel/i40e/i40e_txrx.c | 14 ++++++++++----
drivers/net/ethernet/intel/i40e/i40e_xsk.c | 6 +++++-
drivers/net/ethernet/intel/i40e/i40e_xsk.h | 3 ++-
3 files changed, 17 insertions(+), 6 deletions(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index a2cc98e..adf133b 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -2426,6 +2426,7 @@ static void i40e_inc_ntc(struct i40e_ring *rx_ring)
* i40e_clean_rx_irq - Clean completed descriptors from Rx ring - bounce buf
* @rx_ring: rx descriptor ring to transact packets on
* @budget: Total limit on number of packets to process
+ * @rx_cleaned: Out parameter of the number of packets processed
*
* This function provides a "bounce buffer" approach to Rx interrupt
* processing. The advantage to this is that on systems that have
@@ -2434,7 +2435,8 @@ static void i40e_inc_ntc(struct i40e_ring *rx_ring)
*
* Returns amount of work completed
**/
-static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
+static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget,
+ unsigned int *rx_cleaned)
{
unsigned int total_rx_bytes = 0, total_rx_packets = 0, frame_sz = 0;
u16 cleaned_count = I40E_DESC_UNUSED(rx_ring);
@@ -2571,6 +2573,8 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
i40e_update_rx_stats(rx_ring, total_rx_bytes, total_rx_packets);
+ *rx_cleaned = total_rx_packets;
+
/* guarantee a trip back through this routine if there was a failure */
return failure ? budget : (int)total_rx_packets;
}
@@ -2694,11 +2698,13 @@ int i40e_napi_poll(struct napi_struct *napi, int budget)
struct i40e_vsi *vsi = q_vector->vsi;
struct i40e_ring *ring;
bool tx_clean_complete = true;
+ bool rx_clean_complete = true;
bool clean_complete = true;
bool arm_wb = false;
int budget_per_ring;
int work_done = 0;
unsigned int tx_cleaned = 0;
+ unsigned int rx_cleaned = 0;
if (test_bit(__I40E_VSI_DOWN, vsi->state)) {
napi_complete(napi);
@@ -2738,13 +2744,13 @@ int i40e_napi_poll(struct napi_struct *napi, int budget)
i40e_for_each_ring(ring, q_vector->rx) {
int cleaned = ring->xsk_pool ?
- i40e_clean_rx_irq_zc(ring, budget_per_ring) :
- i40e_clean_rx_irq(ring, budget_per_ring);
+ i40e_clean_rx_irq_zc(ring, budget_per_ring, &rx_cleaned) :
+ i40e_clean_rx_irq(ring, budget_per_ring, &rx_cleaned);
work_done += cleaned;
/* if we clean as many as budgeted, we must not be done */
if (cleaned >= budget_per_ring)
- clean_complete = false;
+ clean_complete = rx_clean_complete = false;
}
/* If work not completed, return budget and polling will return */
diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
index f98ce7e4..b1f582a0 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
@@ -378,10 +378,12 @@ static void i40e_handle_xdp_result_zc(struct i40e_ring *rx_ring,
* i40e_clean_rx_irq_zc - Consumes Rx packets from the hardware ring
* @rx_ring: Rx ring
* @budget: NAPI budget
+ * @rx_cleaned: out parameter of the packets processed
*
* Returns amount of work completed
**/
-int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget)
+int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget,
+ unsigned int *rx_cleaned)
{
unsigned int total_rx_bytes = 0, total_rx_packets = 0;
u16 next_to_clean = rx_ring->next_to_clean;
@@ -452,6 +454,8 @@ int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget)
i40e_finalize_xdp_rx(rx_ring, xdp_xmit);
i40e_update_rx_stats(rx_ring, total_rx_bytes, total_rx_packets);
+ *rx_cleaned = total_rx_packets;
+
if (xsk_uses_need_wakeup(rx_ring->xsk_pool)) {
if (failure || next_to_clean == rx_ring->next_to_use)
xsk_set_rx_need_wakeup(rx_ring->xsk_pool);
diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.h b/drivers/net/ethernet/intel/i40e/i40e_xsk.h
index 396ed11..1089cc0 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.h
@@ -28,7 +28,8 @@ int i40e_queue_pair_enable(struct i40e_vsi *vsi, int queue_pair);
int i40e_xsk_pool_setup(struct i40e_vsi *vsi, struct xsk_buff_pool *pool,
u16 qid);
bool i40e_alloc_rx_buffers_zc(struct i40e_ring *rx_ring, u16 cleaned_count);
-int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget);
+int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget,
+ unsigned int *rx_cleaned);
bool i40e_clean_xdp_tx_irq(struct i40e_vsi *vsi, struct i40e_ring *tx_ring,
unsigned int *tx_cleaned);
--
2.7.4
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [next-queue v3 4/4] i40e: Add i40e_napi_poll tracepoint
2022-10-06 23:43 [next-queue v3 0/4] i40e: Add an i40e_napi_poll tracepoint Joe Damato
` (2 preceding siblings ...)
2022-10-06 23:43 ` [next-queue v3 3/4] i40e: Record number of RXes " Joe Damato
@ 2022-10-06 23:43 ` Joe Damato
2022-10-07 8:18 ` Maciej Fijalkowski
3 siblings, 1 reply; 8+ messages in thread
From: Joe Damato @ 2022-10-06 23:43 UTC (permalink / raw)
To: intel-wired-lan
Cc: netdev, kuba, davem, anthony.l.nguyen, jesse.brandeburg,
maciej.fijalkowski, Joe Damato
Add a tracepoint for i40e_napi_poll that allows users to get detailed
information about the amount of work done. This information can help users
better tune the correct NAPI parameters (like weight and budget), as well
as debug NIC settings like rx-usecs and tx-usecs, etc.
An example of the output from this tracepoint:
$ sudo perf trace -e i40e:i40e_napi_poll -a --call-graph=fp --libtraceevent_print
[..snip..]
388.258 :0/0 i40e:i40e_napi_poll(i40e_napi_poll on dev eth2 q i40e-eth2-TxRx-9 irq 346 irq_mask 00000000,00000000,00000000,00000000,00000000,00800000 curr_cpu 23 budget 64 bpr 64 rx_cleaned 28 tx_cleaned 0 rx_clean_complete 1 tx_clean_complete 1)
i40e_napi_poll ([i40e])
i40e_napi_poll ([i40e])
__napi_poll ([kernel.kallsyms])
net_rx_action ([kernel.kallsyms])
__do_softirq ([kernel.kallsyms])
common_interrupt ([kernel.kallsyms])
asm_common_interrupt ([kernel.kallsyms])
intel_idle_irq ([kernel.kallsyms])
cpuidle_enter_state ([kernel.kallsyms])
cpuidle_enter ([kernel.kallsyms])
do_idle ([kernel.kallsyms])
cpu_startup_entry ([kernel.kallsyms])
[0x243fd8] ([kernel.kallsyms])
secondary_startup_64_no_verify ([kernel.kallsyms])
Signed-off-by: Joe Damato <jdamato@fastly.com>
---
drivers/net/ethernet/intel/i40e/i40e_trace.h | 49 ++++++++++++++++++++++++++++
drivers/net/ethernet/intel/i40e/i40e_txrx.c | 3 ++
2 files changed, 52 insertions(+)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_trace.h b/drivers/net/ethernet/intel/i40e/i40e_trace.h
index b5b1229..7d7c161 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_trace.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_trace.h
@@ -55,6 +55,55 @@
* being built from shared code.
*/
+#define NO_DEV "(i40e no_device)"
+
+TRACE_EVENT(i40e_napi_poll,
+
+ TP_PROTO(struct napi_struct *napi, struct i40e_q_vector *q, int budget,
+ int budget_per_ring, unsigned int rx_cleaned, unsigned int tx_cleaned,
+ bool rx_clean_complete, bool tx_clean_complete),
+
+ TP_ARGS(napi, q, budget, budget_per_ring, rx_cleaned, tx_cleaned,
+ rx_clean_complete, tx_clean_complete),
+
+ TP_STRUCT__entry(
+ __field(int, budget)
+ __field(int, budget_per_ring)
+ __field(unsigned int, rx_cleaned)
+ __field(unsigned int, tx_cleaned)
+ __field(int, rx_clean_complete)
+ __field(int, tx_clean_complete)
+ __field(int, irq_num)
+ __field(int, curr_cpu)
+ __string(qname, q->name)
+ __string(dev_name, napi->dev ? napi->dev->name : NO_DEV)
+ __bitmask(irq_affinity, nr_cpumask_bits)
+ ),
+
+ TP_fast_assign(
+ __entry->budget = budget;
+ __entry->budget_per_ring = budget_per_ring;
+ __entry->rx_cleaned = rx_cleaned;
+ __entry->tx_cleaned = tx_cleaned;
+ __entry->rx_clean_complete = rx_clean_complete;
+ __entry->tx_clean_complete = tx_clean_complete;
+ __entry->irq_num = q->irq_num;
+ __entry->curr_cpu = get_cpu();
+ __assign_str(qname, q->name);
+ __assign_str(dev_name, napi->dev ? napi->dev->name : NO_DEV);
+ __assign_bitmask(irq_affinity, cpumask_bits(&q->affinity_mask),
+ nr_cpumask_bits);
+ ),
+
+ TP_printk("i40e_napi_poll on dev %s q %s irq %d irq_mask %s curr_cpu %d "
+ "budget %d bpr %d rx_cleaned %lu tx_cleaned %lu "
+ "rx_clean_complete %d tx_clean_complete %d",
+ __get_str(dev_name), __get_str(qname), __entry->irq_num,
+ __get_bitmask(irq_affinity), __entry->curr_cpu, __entry->budget,
+ __entry->budget_per_ring, __entry->rx_cleaned, __entry->tx_cleaned,
+ __entry->rx_clean_complete, __entry->tx_clean_complete)
+);
+
/* Events related to a vsi & ring */
DECLARE_EVENT_CLASS(
i40e_tx_template,
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index adf133b..fb9add8 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -2753,6 +2753,9 @@ int i40e_napi_poll(struct napi_struct *napi, int budget)
clean_complete = rx_clean_complete = false;
}
+ trace_i40e_napi_poll(napi, q_vector, budget, budget_per_ring, rx_cleaned,
+ tx_cleaned, rx_clean_complete, tx_clean_complete);
+
/* If work not completed, return budget and polling will return */
if (!clean_complete) {
int cpu_id = smp_processor_id();
--
2.7.4
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [next-queue v3 2/4] i40e: Record number TXes cleaned during NAPI
2022-10-06 23:43 ` [next-queue v3 2/4] i40e: Record number TXes cleaned during NAPI Joe Damato
@ 2022-10-07 8:10 ` Maciej Fijalkowski
0 siblings, 0 replies; 8+ messages in thread
From: Maciej Fijalkowski @ 2022-10-07 8:10 UTC (permalink / raw)
To: Joe Damato
Cc: intel-wired-lan, netdev, kuba, davem, anthony.l.nguyen,
jesse.brandeburg
On Thu, Oct 06, 2022 at 04:43:56PM -0700, Joe Damato wrote:
> Update i40e_clean_tx_irq to take an out parameter (tx_cleaned) which stores
> the number TXs cleaned.
>
> Likewise, update i40e_clean_xdp_tx_irq and i40e_xmit_zc to do the same.
>
> Care has been taken to avoid changing the control flow of any functions
> involved.
>
> Signed-off-by: Joe Damato <jdamato@fastly.com>
> ---
> drivers/net/ethernet/intel/i40e/i40e_txrx.c | 16 +++++++++++-----
> drivers/net/ethernet/intel/i40e/i40e_xsk.c | 15 +++++++++++----
> drivers/net/ethernet/intel/i40e/i40e_xsk.h | 3 ++-
> 3 files changed, 24 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> index b97c95f..a2cc98e 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> @@ -923,11 +923,13 @@ void i40e_detect_recover_hung(struct i40e_vsi *vsi)
> * @vsi: the VSI we care about
> * @tx_ring: Tx ring to clean
> * @napi_budget: Used to determine if we are in netpoll
> + * @tx_cleaned: Out parameter set to the number of TXes cleaned
> *
> * Returns true if there's any budget left (e.g. the clean is finished)
> **/
> static bool i40e_clean_tx_irq(struct i40e_vsi *vsi,
> - struct i40e_ring *tx_ring, int napi_budget)
> + struct i40e_ring *tx_ring, int napi_budget,
> + unsigned int *tx_cleaned)
> {
> int i = tx_ring->next_to_clean;
> struct i40e_tx_buffer *tx_buf;
> @@ -1026,7 +1028,7 @@ static bool i40e_clean_tx_irq(struct i40e_vsi *vsi,
> i40e_arm_wb(tx_ring, vsi, budget);
>
> if (ring_is_xdp(tx_ring))
> - return !!budget;
> + goto out;
>
> /* notify netdev of completed buffers */
> netdev_tx_completed_queue(txring_txq(tx_ring),
> @@ -1048,6 +1050,8 @@ static bool i40e_clean_tx_irq(struct i40e_vsi *vsi,
> }
> }
>
> +out:
> + *tx_cleaned = total_packets;
> return !!budget;
> }
>
> @@ -2689,10 +2693,12 @@ int i40e_napi_poll(struct napi_struct *napi, int budget)
> container_of(napi, struct i40e_q_vector, napi);
> struct i40e_vsi *vsi = q_vector->vsi;
> struct i40e_ring *ring;
> + bool tx_clean_complete = true;
> bool clean_complete = true;
> bool arm_wb = false;
> int budget_per_ring;
> int work_done = 0;
> + unsigned int tx_cleaned = 0;
>
> if (test_bit(__I40E_VSI_DOWN, vsi->state)) {
> napi_complete(napi);
> @@ -2704,11 +2710,11 @@ int i40e_napi_poll(struct napi_struct *napi, int budget)
> */
> i40e_for_each_ring(ring, q_vector->tx) {
> bool wd = ring->xsk_pool ?
> - i40e_clean_xdp_tx_irq(vsi, ring) :
> - i40e_clean_tx_irq(vsi, ring, budget);
> + i40e_clean_xdp_tx_irq(vsi, ring, &tx_cleaned) :
> + i40e_clean_tx_irq(vsi, ring, budget, &tx_cleaned);
>
> if (!wd) {
> - clean_complete = false;
> + clean_complete = tx_clean_complete = false;
> continue;
> }
> arm_wb |= ring->arm_wb;
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> index 790aaeff..f98ce7e4 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
> @@ -530,18 +530,22 @@ static void i40e_set_rs_bit(struct i40e_ring *xdp_ring)
> * i40e_xmit_zc - Performs zero-copy Tx AF_XDP
> * @xdp_ring: XDP Tx ring
> * @budget: NAPI budget
> + * @tx_cleaned: Out parameter of the TX packets processed
> *
> * Returns true if the work is finished.
> **/
> -static bool i40e_xmit_zc(struct i40e_ring *xdp_ring, unsigned int budget)
> +static bool i40e_xmit_zc(struct i40e_ring *xdp_ring, unsigned int budget,
> + unsigned int *tx_cleaned)
> {
> struct xdp_desc *descs = xdp_ring->xsk_pool->tx_descs;
> u32 nb_pkts, nb_processed = 0;
> unsigned int total_bytes = 0;
>
> nb_pkts = xsk_tx_peek_release_desc_batch(xdp_ring->xsk_pool, budget);
> - if (!nb_pkts)
> + if (!nb_pkts) {
> + *tx_cleaned = 0;
> return true;
> + }
>
> if (xdp_ring->next_to_use + nb_pkts >= xdp_ring->count) {
> nb_processed = xdp_ring->count - xdp_ring->next_to_use;
> @@ -558,6 +562,7 @@ static bool i40e_xmit_zc(struct i40e_ring *xdp_ring, unsigned int budget)
>
> i40e_update_tx_stats(xdp_ring, nb_pkts, total_bytes);
>
> + *tx_cleaned = nb_pkts;
please either call it tx_processed or just always fill it with 0 (see my
reply to 4/4).
> return nb_pkts < budget;
> }
>
> @@ -581,10 +586,12 @@ static void i40e_clean_xdp_tx_buffer(struct i40e_ring *tx_ring,
> * i40e_clean_xdp_tx_irq - Completes AF_XDP entries, and cleans XDP entries
> * @vsi: Current VSI
> * @tx_ring: XDP Tx ring
> + * @tx_cleaned: out parameter of number of TXes cleaned
> *
> * Returns true if cleanup/tranmission is done.
> **/
> -bool i40e_clean_xdp_tx_irq(struct i40e_vsi *vsi, struct i40e_ring *tx_ring)
> +bool i40e_clean_xdp_tx_irq(struct i40e_vsi *vsi, struct i40e_ring *tx_ring,
> + unsigned int *tx_cleaned)
> {
> struct xsk_buff_pool *bp = tx_ring->xsk_pool;
> u32 i, completed_frames, xsk_frames = 0;
> @@ -634,7 +641,7 @@ bool i40e_clean_xdp_tx_irq(struct i40e_vsi *vsi, struct i40e_ring *tx_ring)
> if (xsk_uses_need_wakeup(tx_ring->xsk_pool))
> xsk_set_tx_need_wakeup(tx_ring->xsk_pool);
>
> - return i40e_xmit_zc(tx_ring, I40E_DESC_UNUSED(tx_ring));
> + return i40e_xmit_zc(tx_ring, I40E_DESC_UNUSED(tx_ring), tx_cleaned);
> }
>
> /**
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.h b/drivers/net/ethernet/intel/i40e/i40e_xsk.h
> index 821df24..396ed11 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_xsk.h
> +++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.h
> @@ -30,7 +30,8 @@ int i40e_xsk_pool_setup(struct i40e_vsi *vsi, struct xsk_buff_pool *pool,
> bool i40e_alloc_rx_buffers_zc(struct i40e_ring *rx_ring, u16 cleaned_count);
> int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget);
>
> -bool i40e_clean_xdp_tx_irq(struct i40e_vsi *vsi, struct i40e_ring *tx_ring);
> +bool i40e_clean_xdp_tx_irq(struct i40e_vsi *vsi, struct i40e_ring *tx_ring,
> + unsigned int *tx_cleaned);
> int i40e_xsk_wakeup(struct net_device *dev, u32 queue_id, u32 flags);
> int i40e_realloc_rx_bi_zc(struct i40e_vsi *vsi, bool zc);
> void i40e_clear_rx_bi_zc(struct i40e_ring *rx_ring);
> --
> 2.7.4
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [next-queue v3 4/4] i40e: Add i40e_napi_poll tracepoint
2022-10-06 23:43 ` [next-queue v3 4/4] i40e: Add i40e_napi_poll tracepoint Joe Damato
@ 2022-10-07 8:18 ` Maciej Fijalkowski
2022-10-07 17:49 ` Joe Damato
0 siblings, 1 reply; 8+ messages in thread
From: Maciej Fijalkowski @ 2022-10-07 8:18 UTC (permalink / raw)
To: Joe Damato
Cc: intel-wired-lan, netdev, kuba, davem, anthony.l.nguyen,
jesse.brandeburg
On Thu, Oct 06, 2022 at 04:43:58PM -0700, Joe Damato wrote:
> Add a tracepoint for i40e_napi_poll that allows users to get detailed
> information about the amount of work done. This information can help users
> better tune the correct NAPI parameters (like weight and budget), as well
> as debug NIC settings like rx-usecs and tx-usecs, etc.
>
> An example of the output from this tracepoint:
>
> $ sudo perf trace -e i40e:i40e_napi_poll -a --call-graph=fp --libtraceevent_print
>
> [..snip..]
>
> 388.258 :0/0 i40e:i40e_napi_poll(i40e_napi_poll on dev eth2 q
> i40e-eth2-TxRx-9 irq 346 irq_mask
> 00000000,00000000,00000000,00000000,00000000,00800000 curr_cpu 23 budget
> 64 bpr 64 rx_cleaned 28 tx_cleaned 0 rx_clean_complete 1
> tx_clean_complete 1)
So from AF_XDP POV I won't be using it as I would need some other
information. As I said, we don't work on NAPI budget but rather with the
free ring space and I don't get it here. tx_cleaned is also quite
incorrect name to me for count of produced descriptors to Tx ring. I feel
like it would be better to stub it for AF_XDP.
As Jesse said previously we probably can followup with AF_XDP specific
tracepoint with tx cleaned/tx transmitted/NAPI budget/AF_XDP budget (free
ring space) if we find the need for it.
That's my 0.02$, I'm not going to hold this set or whatever, I'll leave the
decision to Sridhar & Jesse.
> i40e_napi_poll ([i40e])
> i40e_napi_poll ([i40e])
> __napi_poll ([kernel.kallsyms])
> net_rx_action ([kernel.kallsyms])
> __do_softirq ([kernel.kallsyms])
> common_interrupt ([kernel.kallsyms])
> asm_common_interrupt ([kernel.kallsyms])
> intel_idle_irq ([kernel.kallsyms])
> cpuidle_enter_state ([kernel.kallsyms])
> cpuidle_enter ([kernel.kallsyms])
> do_idle ([kernel.kallsyms])
> cpu_startup_entry ([kernel.kallsyms])
> [0x243fd8] ([kernel.kallsyms])
> secondary_startup_64_no_verify ([kernel.kallsyms])
>
> Signed-off-by: Joe Damato <jdamato@fastly.com>
> ---
> drivers/net/ethernet/intel/i40e/i40e_trace.h | 49 ++++++++++++++++++++++++++++
> drivers/net/ethernet/intel/i40e/i40e_txrx.c | 3 ++
> 2 files changed, 52 insertions(+)
>
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_trace.h b/drivers/net/ethernet/intel/i40e/i40e_trace.h
> index b5b1229..7d7c161 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_trace.h
> +++ b/drivers/net/ethernet/intel/i40e/i40e_trace.h
> @@ -55,6 +55,55 @@
> * being built from shared code.
> */
>
> +#define NO_DEV "(i40e no_device)"
> +
> +TRACE_EVENT(i40e_napi_poll,
> +
> + TP_PROTO(struct napi_struct *napi, struct i40e_q_vector *q, int budget,
> + int budget_per_ring, unsigned int rx_cleaned, unsigned int tx_cleaned,
> + bool rx_clean_complete, bool tx_clean_complete),
> +
> + TP_ARGS(napi, q, budget, budget_per_ring, rx_cleaned, tx_cleaned,
> + rx_clean_complete, tx_clean_complete),
> +
> + TP_STRUCT__entry(
> + __field(int, budget)
> + __field(int, budget_per_ring)
> + __field(unsigned int, rx_cleaned)
> + __field(unsigned int, tx_cleaned)
> + __field(int, rx_clean_complete)
> + __field(int, tx_clean_complete)
> + __field(int, irq_num)
> + __field(int, curr_cpu)
> + __string(qname, q->name)
> + __string(dev_name, napi->dev ? napi->dev->name : NO_DEV)
> + __bitmask(irq_affinity, nr_cpumask_bits)
> + ),
> +
> + TP_fast_assign(
> + __entry->budget = budget;
> + __entry->budget_per_ring = budget_per_ring;
> + __entry->rx_cleaned = rx_cleaned;
> + __entry->tx_cleaned = tx_cleaned;
> + __entry->rx_clean_complete = rx_clean_complete;
> + __entry->tx_clean_complete = tx_clean_complete;
> + __entry->irq_num = q->irq_num;
> + __entry->curr_cpu = get_cpu();
> + __assign_str(qname, q->name);
> + __assign_str(dev_name, napi->dev ? napi->dev->name : NO_DEV);
> + __assign_bitmask(irq_affinity, cpumask_bits(&q->affinity_mask),
> + nr_cpumask_bits);
> + ),
> +
> + TP_printk("i40e_napi_poll on dev %s q %s irq %d irq_mask %s curr_cpu %d "
> + "budget %d bpr %d rx_cleaned %lu tx_cleaned %lu "
> + "rx_clean_complete %d tx_clean_complete %d",
> + __get_str(dev_name), __get_str(qname), __entry->irq_num,
> + __get_bitmask(irq_affinity), __entry->curr_cpu, __entry->budget,
> + __entry->budget_per_ring, __entry->rx_cleaned, __entry->tx_cleaned,
> + __entry->rx_clean_complete, __entry->tx_clean_complete)
> +);
> +
> /* Events related to a vsi & ring */
> DECLARE_EVENT_CLASS(
> i40e_tx_template,
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> index adf133b..fb9add8 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
> @@ -2753,6 +2753,9 @@ int i40e_napi_poll(struct napi_struct *napi, int budget)
> clean_complete = rx_clean_complete = false;
> }
>
> + trace_i40e_napi_poll(napi, q_vector, budget, budget_per_ring, rx_cleaned,
> + tx_cleaned, rx_clean_complete, tx_clean_complete);
> +
> /* If work not completed, return budget and polling will return */
> if (!clean_complete) {
> int cpu_id = smp_processor_id();
> --
> 2.7.4
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [next-queue v3 4/4] i40e: Add i40e_napi_poll tracepoint
2022-10-07 8:18 ` Maciej Fijalkowski
@ 2022-10-07 17:49 ` Joe Damato
0 siblings, 0 replies; 8+ messages in thread
From: Joe Damato @ 2022-10-07 17:49 UTC (permalink / raw)
To: Maciej Fijalkowski
Cc: intel-wired-lan, netdev, kuba, davem, anthony.l.nguyen,
jesse.brandeburg
On Fri, Oct 07, 2022 at 10:18:41AM +0200, Maciej Fijalkowski wrote:
> On Thu, Oct 06, 2022 at 04:43:58PM -0700, Joe Damato wrote:
> > Add a tracepoint for i40e_napi_poll that allows users to get detailed
> > information about the amount of work done. This information can help users
> > better tune the correct NAPI parameters (like weight and budget), as well
> > as debug NIC settings like rx-usecs and tx-usecs, etc.
> >
> > An example of the output from this tracepoint:
> >
> > $ sudo perf trace -e i40e:i40e_napi_poll -a --call-graph=fp --libtraceevent_print
> >
> > [..snip..]
> >
> > 388.258 :0/0 i40e:i40e_napi_poll(i40e_napi_poll on dev eth2 q
> > i40e-eth2-TxRx-9 irq 346 irq_mask
> > 00000000,00000000,00000000,00000000,00000000,00800000 curr_cpu 23 budget
> > 64 bpr 64 rx_cleaned 28 tx_cleaned 0 rx_clean_complete 1
> > tx_clean_complete 1)
>
> So from AF_XDP POV I won't be using it as I would need some other
> information.
>
> As I said, we don't work on NAPI budget but rather with the
> free ring space and I don't get it here. tx_cleaned is also quite
> incorrect name to me for count of produced descriptors to Tx ring. I feel
> like it would be better to stub it for AF_XDP.
>
> As Jesse said previously we probably can followup with AF_XDP specific
> tracepoint with tx cleaned/tx transmitted/NAPI budget/AF_XDP budget (free
> ring space) if we find the need for it.
>
> That's my 0.02$, I'm not going to hold this set or whatever, I'll leave the
> decision to Sridhar & Jesse.
I'll send the other patchset I've written and tested as an RFC which doesn't
touch anything in XDP at all and only adds the tracepoint in i40e_napi_poll only
if xdp is not enabled.
The code in that branch for i40e_napi_poll looks like this:
+ if (!i40e_enabled_xdp_vsi(vsi))
+ trace_i40e_napi_poll(napi, q_vector, budget, budget_per_ring, rx_cleaned,
+ tx_cleaned, rx_clean_complete, tx_clean_complete);
The XDP functions are not modified to take any out parameters in that
branch.
In that case, anyone who cares about XDP (when I think about XDP, I feel
nothing, so that probably won't be me) can go back through and add their
own XDP related tracepoint as the 'else' branch and modify all the xdp
related functions and add (very precisely named) out parameters.
As far as I'm concerned: I submit this code simply because it's been very
useful for me to debug i40e performance and to tune settings and I thought
it might be helpful for others.
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2022-10-07 17:49 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-10-06 23:43 [next-queue v3 0/4] i40e: Add an i40e_napi_poll tracepoint Joe Damato
2022-10-06 23:43 ` [next-queue v3 1/4] i40e: Store the irq number in i40e_q_vector Joe Damato
2022-10-06 23:43 ` [next-queue v3 2/4] i40e: Record number TXes cleaned during NAPI Joe Damato
2022-10-07 8:10 ` Maciej Fijalkowski
2022-10-06 23:43 ` [next-queue v3 3/4] i40e: Record number of RXes " Joe Damato
2022-10-06 23:43 ` [next-queue v3 4/4] i40e: Add i40e_napi_poll tracepoint Joe Damato
2022-10-07 8:18 ` Maciej Fijalkowski
2022-10-07 17:49 ` Joe Damato
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).