* [RFC PATCH net-next v2 0/6] net: xilinx: axienet: Enable adaptive IRQ coalescing with DIM
@ 2024-09-09 23:52 Sean Anderson
2024-09-09 23:52 ` [RFC PATCH net-next v2 1/6] net: xilinx: axienet: Add some symbolic constants for IRQ delay timer Sean Anderson
` (6 more replies)
0 siblings, 7 replies; 11+ messages in thread
From: Sean Anderson @ 2024-09-09 23:52 UTC (permalink / raw)
To: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Radhey Shyam Pandey, netdev
Cc: linux-arm-kernel, Michal Simek, linux-kernel, Sean Anderson,
Heng Qi
To improve performance without sacrificing latency under low load,
enable DIM. While I appreciate not having to write the library myself, I
do think there are many unusual aspects to DIM, as detailed in the last
patch.
This series depends on [1-2] and is therefore marked RFC. This series is
otherwise ready to merge.
[1] https://lore.kernel.org/netdev/20240909230908.1319982-1-sean.anderson@linux.dev/
[2] https://lore.kernel.org/netdev/20240909231904.1322387-1-sean.anderson@linux.dev/
Changes in v2:
- Add some symbolic constants for IRQ delay timer
- Report an error for bad coalesce settings
- Don't use spin_lock_irqsave when we know the context
- Split the CR calculation refactor from runtime coalesce settings
adjustment support for easier review.
- Have axienet_update_coalesce_rx/tx take the cr value/mask instead of
calculating it with axienet_calc_cr. This will make it easier to add
partial updates in the next few commits.
- Get coalesce parameters from driver state
- Don't take the RTNL in axienet_rx_dim_work to avoid deadlock. Instead,
calculate a partial cr update that axienet_update_coalesce_rx can
perform under a spin lock.
- Use READ/WRITE_ONCE when accessing/modifying rx_irqs
Sean Anderson (6):
net: xilinx: axienet: Add some symbolic constants for IRQ delay timer
net: xilinx: axienet: Report an error for bad coalesce settings
net: xilinx: axienet: Combine CR calculation
net: xilinx: axienet: Support adjusting coalesce settings while
running
net: xilinx: axienet: Get coalesce parameters from driver state
net: xilinx: axienet: Enable adaptive IRQ coalescing with DIM
drivers/net/ethernet/xilinx/Kconfig | 1 +
drivers/net/ethernet/xilinx/xilinx_axienet.h | 31 +-
.../net/ethernet/xilinx/xilinx_axienet_main.c | 320 ++++++++++++++----
3 files changed, 273 insertions(+), 79 deletions(-)
--
2.35.1.1320.gc452695387.dirty
^ permalink raw reply [flat|nested] 11+ messages in thread
* [RFC PATCH net-next v2 1/6] net: xilinx: axienet: Add some symbolic constants for IRQ delay timer
2024-09-09 23:52 [RFC PATCH net-next v2 0/6] net: xilinx: axienet: Enable adaptive IRQ coalescing with DIM Sean Anderson
@ 2024-09-09 23:52 ` Sean Anderson
2024-09-09 23:52 ` [RFC PATCH net-next v2 2/6] net: xilinx: axienet: Report an error for bad coalesce settings Sean Anderson
` (5 subsequent siblings)
6 siblings, 0 replies; 11+ messages in thread
From: Sean Anderson @ 2024-09-09 23:52 UTC (permalink / raw)
To: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Radhey Shyam Pandey, netdev
Cc: linux-arm-kernel, Michal Simek, linux-kernel, Sean Anderson
Instead of using literals, add some symbolic constants for the IRQ delay
timer calculation.
Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
---
Changes in v2:
- New
drivers/net/ethernet/xilinx/xilinx_axienet.h | 3 +++
drivers/net/ethernet/xilinx/xilinx_axienet_main.c | 7 ++-----
2 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet.h b/drivers/net/ethernet/xilinx/xilinx_axienet.h
index d70cb62ecf29..5c0a21ef96a4 100644
--- a/drivers/net/ethernet/xilinx/xilinx_axienet.h
+++ b/drivers/net/ethernet/xilinx/xilinx_axienet.h
@@ -119,6 +119,9 @@
#define XAXIDMA_IRQ_ERROR_MASK 0x00004000 /* Error interrupt */
#define XAXIDMA_IRQ_ALL_MASK 0x00007000 /* All interrupts */
+/* Constant to convert delay counts to microseconds */
+#define XAXIDMA_DELAY_SCALE (125ULL * USEC_PER_SEC)
+
/* Default TX/RX Threshold and delay timer values for SGDMA mode */
#define XAXIDMA_DFT_TX_THRESHOLD 24
#define XAXIDMA_DFT_TX_USEC 50
diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
index b065979db196..cf67d455da48 100644
--- a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
+++ b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
@@ -238,11 +238,8 @@ static u32 axienet_usec_to_timer(struct axienet_local *lp, u32 coalesce_usec)
/* 1 Timeout Interval = 125 * (clock period of SG clock) */
result = DIV64_U64_ROUND_CLOSEST((u64)coalesce_usec * clk_rate,
- (u64)125000000);
- if (result > 255)
- result = 255;
-
- return result;
+ XAXIDMA_DELAY_SCALE);
+ return min(result, FIELD_MAX(XAXIDMA_DELAY_MASK));
}
/**
--
2.35.1.1320.gc452695387.dirty
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [RFC PATCH net-next v2 2/6] net: xilinx: axienet: Report an error for bad coalesce settings
2024-09-09 23:52 [RFC PATCH net-next v2 0/6] net: xilinx: axienet: Enable adaptive IRQ coalescing with DIM Sean Anderson
2024-09-09 23:52 ` [RFC PATCH net-next v2 1/6] net: xilinx: axienet: Add some symbolic constants for IRQ delay timer Sean Anderson
@ 2024-09-09 23:52 ` Sean Anderson
2024-09-09 23:52 ` [RFC PATCH net-next v2 3/6] net: xilinx: axienet: Combine CR calculation Sean Anderson
` (4 subsequent siblings)
6 siblings, 0 replies; 11+ messages in thread
From: Sean Anderson @ 2024-09-09 23:52 UTC (permalink / raw)
To: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Radhey Shyam Pandey, netdev
Cc: linux-arm-kernel, Michal Simek, linux-kernel, Sean Anderson
Instead of silently ignoring invalid/unsupported settings, report an
error. Additionally, relax the check for non-zero usecs to apply only
when it will be used (i.e. when frames != 1).
Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
---
Changes in v2:
- New
.../net/ethernet/xilinx/xilinx_axienet_main.c | 27 +++++++++++++------
1 file changed, 19 insertions(+), 8 deletions(-)
diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
index cf67d455da48..bc987f7ca1ea 100644
--- a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
+++ b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
@@ -2048,14 +2048,25 @@ axienet_ethtools_set_coalesce(struct net_device *ndev,
return -EBUSY;
}
- if (ecoalesce->rx_max_coalesced_frames)
- lp->coalesce_count_rx = ecoalesce->rx_max_coalesced_frames;
- if (ecoalesce->rx_coalesce_usecs)
- lp->coalesce_usec_rx = ecoalesce->rx_coalesce_usecs;
- if (ecoalesce->tx_max_coalesced_frames)
- lp->coalesce_count_tx = ecoalesce->tx_max_coalesced_frames;
- if (ecoalesce->tx_coalesce_usecs)
- lp->coalesce_usec_tx = ecoalesce->tx_coalesce_usecs;
+ if (!ecoalesce->rx_max_coalesced_frames ||
+ !ecoalesce->tx_max_coalesced_frames) {
+ NL_SET_ERR_MSG(extack, "frames must be non-zero");
+ return -EINVAL;
+ }
+
+ if ((ecoalesce->rx_max_coalesced_frames > 1 &&
+ !ecoalesce->rx_coalesce_usecs) ||
+ (ecoalesce->tx_max_coalesced_frames > 1 &&
+ !ecoalesce->tx_coalesce_usecs)) {
+ NL_SET_ERR_MSG(extack,
+ "usecs must be non-zero when frames is greater than one");
+ return -EINVAL;
+ }
+
+ lp->coalesce_count_rx = ecoalesce->rx_max_coalesced_frames;
+ lp->coalesce_usec_rx = ecoalesce->rx_coalesce_usecs;
+ lp->coalesce_count_tx = ecoalesce->tx_max_coalesced_frames;
+ lp->coalesce_usec_tx = ecoalesce->tx_coalesce_usecs;
return 0;
}
--
2.35.1.1320.gc452695387.dirty
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [RFC PATCH net-next v2 3/6] net: xilinx: axienet: Combine CR calculation
2024-09-09 23:52 [RFC PATCH net-next v2 0/6] net: xilinx: axienet: Enable adaptive IRQ coalescing with DIM Sean Anderson
2024-09-09 23:52 ` [RFC PATCH net-next v2 1/6] net: xilinx: axienet: Add some symbolic constants for IRQ delay timer Sean Anderson
2024-09-09 23:52 ` [RFC PATCH net-next v2 2/6] net: xilinx: axienet: Report an error for bad coalesce settings Sean Anderson
@ 2024-09-09 23:52 ` Sean Anderson
2024-09-10 1:32 ` Nelson, Shannon
2024-09-09 23:52 ` [RFC PATCH net-next v2 4/6] net: xilinx: axienet: Support adjusting coalesce settings while running Sean Anderson
` (3 subsequent siblings)
6 siblings, 1 reply; 11+ messages in thread
From: Sean Anderson @ 2024-09-09 23:52 UTC (permalink / raw)
To: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Radhey Shyam Pandey, netdev
Cc: linux-arm-kernel, Michal Simek, linux-kernel, Sean Anderson
Combine the common parts of the CR calculations for better code reuse.
While we're at it, simplify the code a bit.
Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
---
Changes in v2:
- Split off from runtime coalesce modification support
drivers/net/ethernet/xilinx/xilinx_axienet.h | 2 -
.../net/ethernet/xilinx/xilinx_axienet_main.c | 69 ++++++++++---------
2 files changed, 35 insertions(+), 36 deletions(-)
diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet.h b/drivers/net/ethernet/xilinx/xilinx_axienet.h
index 5c0a21ef96a4..c43ce8f7590c 100644
--- a/drivers/net/ethernet/xilinx/xilinx_axienet.h
+++ b/drivers/net/ethernet/xilinx/xilinx_axienet.h
@@ -112,8 +112,6 @@
#define XAXIDMA_DELAY_MASK ((u32)0xFF000000) /* Delay timeout counter */
#define XAXIDMA_COALESCE_MASK ((u32)0x00FF0000) /* Coalesce counter */
-#define XAXIDMA_DELAY_SHIFT 24
-
#define XAXIDMA_IRQ_IOC_MASK 0x00001000 /* Completion intr */
#define XAXIDMA_IRQ_DELAY_MASK 0x00002000 /* Delay interrupt */
#define XAXIDMA_IRQ_ERROR_MASK 0x00004000 /* Error interrupt */
diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
index bc987f7ca1ea..bff94d378b9f 100644
--- a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
+++ b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
@@ -224,22 +224,41 @@ static void axienet_dma_bd_release(struct net_device *ndev)
}
/**
- * axienet_usec_to_timer - Calculate IRQ delay timer value
- * @lp: Pointer to the axienet_local structure
- * @coalesce_usec: Microseconds to convert into timer value
+ * axienet_calc_cr() - Calculate control register value
+ * @lp: Device private data
+ * @coalesce_count: Number of completions before an interrupt
+ * @coalesce_usec: Microseconds after the last completion before an interrupt
+ *
+ * Calculate a control register value based on the coalescing settings. The
+ * run/stop bit is not set.
*/
-static u32 axienet_usec_to_timer(struct axienet_local *lp, u32 coalesce_usec)
+static u32 axienet_calc_cr(struct axienet_local *lp, u32 count, u32 usec)
{
- u32 result;
- u64 clk_rate = 125000000; /* arbitrary guess if no clock rate set */
+ u32 cr;
- if (lp->axi_clk)
- clk_rate = clk_get_rate(lp->axi_clk);
+ count = min(count, FIELD_MAX(XAXIDMA_COALESCE_MASK));
+ cr = FIELD_PREP(XAXIDMA_COALESCE_MASK, count) | XAXIDMA_IRQ_IOC_MASK |
+ XAXIDMA_IRQ_ERROR_MASK;
+ /* Only set interrupt delay timer if not generating an interrupt on
+ * the first packet. Otherwise leave at 0 to disable delay interrupt.
+ */
+ if (count > 1) {
+ u64 clk_rate = 125000000; /* arbitrary guess if no clock rate set */
+ u32 timer;
- /* 1 Timeout Interval = 125 * (clock period of SG clock) */
- result = DIV64_U64_ROUND_CLOSEST((u64)coalesce_usec * clk_rate,
- XAXIDMA_DELAY_SCALE);
- return min(result, FIELD_MAX(XAXIDMA_DELAY_MASK));
+ if (lp->axi_clk)
+ clk_rate = clk_get_rate(lp->axi_clk);
+
+ /* 1 Timeout Interval = 125 * (clock period of SG clock) */
+ timer = DIV64_U64_ROUND_CLOSEST((u64)usec * clk_rate,
+ XAXIDMA_DELAY_SCALE);
+
+ timer = min(timer, FIELD_MAX(XAXIDMA_DELAY_MASK));
+ cr |= FIELD_PREP(XAXIDMA_DELAY_MASK, timer) |
+ XAXIDMA_IRQ_DELAY_MASK;
+ }
+
+ return cr;
}
/**
@@ -249,31 +268,13 @@ static u32 axienet_usec_to_timer(struct axienet_local *lp, u32 coalesce_usec)
static void axienet_dma_start(struct axienet_local *lp)
{
/* Start updating the Rx channel control register */
- lp->rx_dma_cr = FIELD_PREP(XAXIDMA_COALESCE_MASK,
- min(lp->coalesce_count_rx,
- FIELD_MAX(XAXIDMA_COALESCE_MASK))) |
- XAXIDMA_IRQ_IOC_MASK | XAXIDMA_IRQ_ERROR_MASK;
- /* Only set interrupt delay timer if not generating an interrupt on
- * the first RX packet. Otherwise leave at 0 to disable delay interrupt.
- */
- if (lp->coalesce_count_rx > 1)
- lp->rx_dma_cr |= (axienet_usec_to_timer(lp, lp->coalesce_usec_rx)
- << XAXIDMA_DELAY_SHIFT) |
- XAXIDMA_IRQ_DELAY_MASK;
+ lp->rx_dma_cr = axienet_calc_cr(lp, lp->coalesce_count_rx,
+ lp->coalesce_usec_rx);
axienet_dma_out32(lp, XAXIDMA_RX_CR_OFFSET, lp->rx_dma_cr);
/* Start updating the Tx channel control register */
- lp->tx_dma_cr = FIELD_PREP(XAXIDMA_COALESCE_MASK,
- min(lp->coalesce_count_tx,
- FIELD_MAX(XAXIDMA_COALESCE_MASK))) |
- XAXIDMA_IRQ_IOC_MASK | XAXIDMA_IRQ_ERROR_MASK;
- /* Only set interrupt delay timer if not generating an interrupt on
- * the first TX packet. Otherwise leave at 0 to disable delay interrupt.
- */
- if (lp->coalesce_count_tx > 1)
- lp->tx_dma_cr |= (axienet_usec_to_timer(lp, lp->coalesce_usec_tx)
- << XAXIDMA_DELAY_SHIFT) |
- XAXIDMA_IRQ_DELAY_MASK;
+ lp->tx_dma_cr = axienet_calc_cr(lp, lp->coalesce_count_tx,
+ lp->coalesce_usec_tx);
axienet_dma_out32(lp, XAXIDMA_TX_CR_OFFSET, lp->tx_dma_cr);
/* Populate the tail pointer and bring the Rx Axi DMA engine out of
--
2.35.1.1320.gc452695387.dirty
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [RFC PATCH net-next v2 4/6] net: xilinx: axienet: Support adjusting coalesce settings while running
2024-09-09 23:52 [RFC PATCH net-next v2 0/6] net: xilinx: axienet: Enable adaptive IRQ coalescing with DIM Sean Anderson
` (2 preceding siblings ...)
2024-09-09 23:52 ` [RFC PATCH net-next v2 3/6] net: xilinx: axienet: Combine CR calculation Sean Anderson
@ 2024-09-09 23:52 ` Sean Anderson
2024-09-09 23:52 ` [RFC PATCH net-next v2 5/6] net: xilinx: axienet: Get coalesce parameters from driver state Sean Anderson
` (2 subsequent siblings)
6 siblings, 0 replies; 11+ messages in thread
From: Sean Anderson @ 2024-09-09 23:52 UTC (permalink / raw)
To: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Radhey Shyam Pandey, netdev
Cc: linux-arm-kernel, Michal Simek, linux-kernel, Sean Anderson
In preparation for adaptive IRQ coalescing, we first need to support
adjusting the settings at runtime. The existing code doesn't require any
locking because
- dma_start is the only function that modifies rx/tx_dma_cr. It is
always called with IRQs and NAPI disabled, so nothing else is touching
the hardware.
- The IRQs don't race with poll, since the latter is a softirq.
- The IRQs don't race with dma_stop since they both just clear the
control registers.
- dma_stop doesn't race with poll since the former is called with NAPI
disabled.
However, once we introduce another function that modifies rx/tx_dma_cr,
we need to have some locking to prevent races. Introduce two locks to
protect these variables and their registers.
The control register values are now generated where the coalescing
settings are set. Converting coalescing settings to control register
values may require sleeping because of clk_get_rate. However, the
read/modify/write of the control registers themselves can't sleep
because it needs to happen in IRQ context. By pre-calculating the
control register values, we avoid introducing an additional mutex.
Since axienet_dma_start writes the control settings when it runs, we
don't bother updating the CR registers when rx/tx_dma_started is false.
This prevents any issues from writing to the control registers in the
middle of a reset sequence.
Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
---
Changes in v2:
- Don't use spin_lock_irqsave when we know the context
- Split the CR calculation refactor from runtime coalesce settings
adjustment support for easier review.
- Have axienet_update_coalesce_rx/tx take the cr value/mask instead of
calculating it with axienet_calc_cr. This will make it easier to add
partial updates in the next few commits.
- Split off CR calculation merging into another patch
drivers/net/ethernet/xilinx/xilinx_axienet.h | 8 ++
.../net/ethernet/xilinx/xilinx_axienet_main.c | 125 +++++++++++++++---
2 files changed, 114 insertions(+), 19 deletions(-)
diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet.h b/drivers/net/ethernet/xilinx/xilinx_axienet.h
index c43ce8f7590c..f0864cb8defe 100644
--- a/drivers/net/ethernet/xilinx/xilinx_axienet.h
+++ b/drivers/net/ethernet/xilinx/xilinx_axienet.h
@@ -484,7 +484,9 @@ struct skbuf_dma_descriptor {
* @regs: Base address for the axienet_local device address space
* @dma_regs: Base address for the axidma device address space
* @napi_rx: NAPI RX control structure
+ * @rx_cr_lock: Lock protecting @rx_dma_cr, its register, and @rx_dma_started
* @rx_dma_cr: Nominal content of RX DMA control register
+ * @rx_dma_started: Set when RX DMA is started
* @rx_bd_v: Virtual address of the RX buffer descriptor ring
* @rx_bd_p: Physical address(start address) of the RX buffer descr. ring
* @rx_bd_num: Size of RX buffer descriptor ring
@@ -494,7 +496,9 @@ struct skbuf_dma_descriptor {
* @rx_bytes: RX byte count for statistics
* @rx_stat_sync: Synchronization object for RX stats
* @napi_tx: NAPI TX control structure
+ * @tx_cr_lock: Lock protecting @tx_dma_cr, its register, and @tx_dma_started
* @tx_dma_cr: Nominal content of TX DMA control register
+ * @tx_dma_started: Set when TX DMA is started
* @tx_bd_v: Virtual address of the TX buffer descriptor ring
* @tx_bd_p: Physical address(start address) of the TX buffer descr. ring
* @tx_bd_num: Size of TX buffer descriptor ring
@@ -566,7 +570,9 @@ struct axienet_local {
void __iomem *dma_regs;
struct napi_struct napi_rx;
+ spinlock_t rx_cr_lock;
u32 rx_dma_cr;
+ bool rx_dma_started;
struct axidma_bd *rx_bd_v;
dma_addr_t rx_bd_p;
u32 rx_bd_num;
@@ -576,7 +582,9 @@ struct axienet_local {
struct u64_stats_sync rx_stat_sync;
struct napi_struct napi_tx;
+ spinlock_t tx_cr_lock;
u32 tx_dma_cr;
+ bool tx_dma_started;
struct axidma_bd *tx_bd_v;
dma_addr_t tx_bd_p;
u32 tx_bd_num;
diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
index bff94d378b9f..6bcb605aa67e 100644
--- a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
+++ b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
@@ -267,16 +267,12 @@ static u32 axienet_calc_cr(struct axienet_local *lp, u32 count, u32 usec)
*/
static void axienet_dma_start(struct axienet_local *lp)
{
+ spin_lock_irq(&lp->rx_cr_lock);
+
/* Start updating the Rx channel control register */
- lp->rx_dma_cr = axienet_calc_cr(lp, lp->coalesce_count_rx,
- lp->coalesce_usec_rx);
+ lp->rx_dma_cr &= ~XAXIDMA_CR_RUNSTOP_MASK;
axienet_dma_out32(lp, XAXIDMA_RX_CR_OFFSET, lp->rx_dma_cr);
- /* Start updating the Tx channel control register */
- lp->tx_dma_cr = axienet_calc_cr(lp, lp->coalesce_count_tx,
- lp->coalesce_usec_tx);
- axienet_dma_out32(lp, XAXIDMA_TX_CR_OFFSET, lp->tx_dma_cr);
-
/* Populate the tail pointer and bring the Rx Axi DMA engine out of
* halted state. This will make the Rx side ready for reception.
*/
@@ -285,6 +281,14 @@ static void axienet_dma_start(struct axienet_local *lp)
axienet_dma_out32(lp, XAXIDMA_RX_CR_OFFSET, lp->rx_dma_cr);
axienet_dma_out_addr(lp, XAXIDMA_RX_TDESC_OFFSET, lp->rx_bd_p +
(sizeof(*lp->rx_bd_v) * (lp->rx_bd_num - 1)));
+ lp->rx_dma_started = true;
+
+ spin_unlock_irq(&lp->rx_cr_lock);
+ spin_lock_irq(&lp->tx_cr_lock);
+
+ /* Start updating the Tx channel control register */
+ lp->tx_dma_cr &= ~XAXIDMA_CR_RUNSTOP_MASK;
+ axienet_dma_out32(lp, XAXIDMA_TX_CR_OFFSET, lp->tx_dma_cr);
/* Write to the RS (Run-stop) bit in the Tx channel control register.
* Tx channel is now ready to run. But only after we write to the
@@ -293,6 +297,9 @@ static void axienet_dma_start(struct axienet_local *lp)
axienet_dma_out_addr(lp, XAXIDMA_TX_CDESC_OFFSET, lp->tx_bd_p);
lp->tx_dma_cr |= XAXIDMA_CR_RUNSTOP_MASK;
axienet_dma_out32(lp, XAXIDMA_TX_CR_OFFSET, lp->tx_dma_cr);
+ lp->tx_dma_started = true;
+
+ spin_unlock_irq(&lp->tx_cr_lock);
}
/**
@@ -628,14 +635,22 @@ static void axienet_dma_stop(struct axienet_local *lp)
int count;
u32 cr, sr;
- cr = axienet_dma_in32(lp, XAXIDMA_RX_CR_OFFSET);
- cr &= ~(XAXIDMA_CR_RUNSTOP_MASK | XAXIDMA_IRQ_ALL_MASK);
+ spin_lock_irq(&lp->rx_cr_lock);
+
+ cr = lp->rx_dma_cr & ~(XAXIDMA_CR_RUNSTOP_MASK | XAXIDMA_IRQ_ALL_MASK);
axienet_dma_out32(lp, XAXIDMA_RX_CR_OFFSET, cr);
+ lp->rx_dma_started = false;
+
+ spin_unlock_irq(&lp->rx_cr_lock);
synchronize_irq(lp->rx_irq);
- cr = axienet_dma_in32(lp, XAXIDMA_TX_CR_OFFSET);
- cr &= ~(XAXIDMA_CR_RUNSTOP_MASK | XAXIDMA_IRQ_ALL_MASK);
+ spin_lock_irq(&lp->tx_cr_lock);
+
+ cr = lp->tx_dma_cr & ~(XAXIDMA_CR_RUNSTOP_MASK | XAXIDMA_IRQ_ALL_MASK);
axienet_dma_out32(lp, XAXIDMA_TX_CR_OFFSET, cr);
+ lp->tx_dma_started = false;
+
+ spin_unlock_irq(&lp->tx_cr_lock);
synchronize_irq(lp->tx_irq);
/* Give DMAs a chance to halt gracefully */
@@ -979,7 +994,9 @@ static int axienet_tx_poll(struct napi_struct *napi, int budget)
* cause an immediate interrupt if any TX packets are
* already pending.
*/
+ spin_lock_irq(&lp->tx_cr_lock);
axienet_dma_out32(lp, XAXIDMA_TX_CR_OFFSET, lp->tx_dma_cr);
+ spin_unlock_irq(&lp->tx_cr_lock);
}
return packets;
}
@@ -1245,7 +1262,9 @@ static int axienet_rx_poll(struct napi_struct *napi, int budget)
* cause an immediate interrupt if any RX packets are
* already pending.
*/
+ spin_lock_irq(&lp->rx_cr_lock);
axienet_dma_out32(lp, XAXIDMA_RX_CR_OFFSET, lp->rx_dma_cr);
+ spin_unlock_irq(&lp->rx_cr_lock);
}
return packets;
}
@@ -1283,13 +1302,16 @@ static irqreturn_t axienet_tx_irq(int irq, void *_ndev)
/* Disable further TX completion interrupts and schedule
* NAPI to handle the completions.
*/
- u32 cr = lp->tx_dma_cr;
+ u32 cr;
+ spin_lock(&lp->tx_cr_lock);
+ cr = lp->tx_dma_cr;
cr &= ~(XAXIDMA_IRQ_IOC_MASK | XAXIDMA_IRQ_DELAY_MASK);
if (napi_schedule_prep(&lp->napi_tx)) {
axienet_dma_out32(lp, XAXIDMA_TX_CR_OFFSET, cr);
__napi_schedule_irqoff(&lp->napi_tx);
}
+ spin_unlock(&lp->tx_cr_lock);
}
return IRQ_HANDLED;
@@ -1328,13 +1350,16 @@ static irqreturn_t axienet_rx_irq(int irq, void *_ndev)
/* Disable further RX completion interrupts and schedule
* NAPI receive.
*/
- u32 cr = lp->rx_dma_cr;
+ u32 cr;
+ spin_lock(&lp->rx_cr_lock);
+ cr = lp->rx_dma_cr;
cr &= ~(XAXIDMA_IRQ_IOC_MASK | XAXIDMA_IRQ_DELAY_MASK);
if (napi_schedule_prep(&lp->napi_rx)) {
axienet_dma_out32(lp, XAXIDMA_RX_CR_OFFSET, cr);
__napi_schedule_irqoff(&lp->napi_rx);
}
+ spin_unlock(&lp->rx_cr_lock);
}
return IRQ_HANDLED;
@@ -1994,6 +2019,62 @@ axienet_ethtools_set_pauseparam(struct net_device *ndev,
return phylink_ethtool_set_pauseparam(lp->phylink, epauseparm);
}
+/**
+ * axienet_set_cr_rx() - Set RX CR
+ * @lp: Device private data
+ * @cr: Value to write to the RX CR
+ * @mask: Bits to set from @cr
+ */
+static void axienet_update_coalesce_rx(struct axienet_local *lp, u32 cr,
+ u32 mask)
+{
+ spin_lock_irq(&lp->rx_cr_lock);
+ lp->rx_dma_cr &= ~mask;
+ lp->rx_dma_cr |= cr;
+ /* If DMA isn't started, then the settings will be applied the next
+ * time dma_start() is called.
+ */
+ if (lp->rx_dma_started) {
+ u32 reg = axienet_dma_in32(lp, XAXIDMA_RX_CR_OFFSET);
+
+ /* Don't enable IRQs if they are disabled by NAPI */
+ if (reg & XAXIDMA_IRQ_ALL_MASK)
+ cr = lp->rx_dma_cr;
+ else
+ cr = lp->rx_dma_cr & ~XAXIDMA_IRQ_ALL_MASK;
+ axienet_dma_out32(lp, XAXIDMA_RX_CR_OFFSET, cr);
+ }
+ spin_unlock_irq(&lp->rx_cr_lock);
+}
+
+/**
+ * axienet_set_cr_tx() - Set TX CR
+ * @lp: Device private data
+ * @cr: Value to write to the TX CR
+ * @mask: Bits to set from @cr
+ */
+static void axienet_update_coalesce_tx(struct axienet_local *lp, u32 cr,
+ u32 mask)
+{
+ spin_lock_irq(&lp->tx_cr_lock);
+ lp->tx_dma_cr &= ~mask;
+ lp->tx_dma_cr |= cr;
+ /* If DMA isn't started, then the settings will be applied the next
+ * time dma_start() is called.
+ */
+ if (lp->tx_dma_started) {
+ u32 reg = axienet_dma_in32(lp, XAXIDMA_TX_CR_OFFSET);
+
+ /* Don't enable IRQs if they are disabled by NAPI */
+ if (reg & XAXIDMA_IRQ_ALL_MASK)
+ cr = lp->tx_dma_cr;
+ else
+ cr = lp->tx_dma_cr & ~XAXIDMA_IRQ_ALL_MASK;
+ axienet_dma_out32(lp, XAXIDMA_TX_CR_OFFSET, cr);
+ }
+ spin_unlock_irq(&lp->tx_cr_lock);
+}
+
/**
* axienet_ethtools_get_coalesce - Get DMA interrupt coalescing count.
* @ndev: Pointer to net_device structure
@@ -2042,12 +2123,7 @@ axienet_ethtools_set_coalesce(struct net_device *ndev,
struct netlink_ext_ack *extack)
{
struct axienet_local *lp = netdev_priv(ndev);
-
- if (netif_running(ndev)) {
- NL_SET_ERR_MSG(extack,
- "Please stop netif before applying configuration");
- return -EBUSY;
- }
+ u32 cr;
if (!ecoalesce->rx_max_coalesced_frames ||
!ecoalesce->tx_max_coalesced_frames) {
@@ -2069,6 +2145,11 @@ axienet_ethtools_set_coalesce(struct net_device *ndev,
lp->coalesce_count_tx = ecoalesce->tx_max_coalesced_frames;
lp->coalesce_usec_tx = ecoalesce->tx_coalesce_usecs;
+ cr = axienet_calc_cr(lp, lp->coalesce_count_rx, lp->coalesce_usec_rx);
+ axienet_update_coalesce_rx(lp, cr, ~XAXIDMA_CR_RUNSTOP_MASK);
+
+ cr = axienet_calc_cr(lp, lp->coalesce_count_tx, lp->coalesce_usec_tx);
+ axienet_update_coalesce_tx(lp, cr, ~XAXIDMA_CR_RUNSTOP_MASK);
return 0;
}
@@ -2853,10 +2934,16 @@ static int axienet_probe(struct platform_device *pdev)
axienet_set_mac_address(ndev, NULL);
}
+ spin_lock_init(&lp->rx_cr_lock);
+ spin_lock_init(&lp->tx_cr_lock);
lp->coalesce_count_rx = XAXIDMA_DFT_RX_THRESHOLD;
lp->coalesce_count_tx = XAXIDMA_DFT_TX_THRESHOLD;
lp->coalesce_usec_rx = XAXIDMA_DFT_RX_USEC;
lp->coalesce_usec_tx = XAXIDMA_DFT_TX_USEC;
+ lp->rx_dma_cr = axienet_calc_cr(lp, lp->coalesce_count_rx,
+ lp->coalesce_usec_rx);
+ lp->tx_dma_cr = axienet_calc_cr(lp, lp->coalesce_count_tx,
+ lp->coalesce_usec_tx);
ret = axienet_mdio_setup(lp);
if (ret)
--
2.35.1.1320.gc452695387.dirty
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [RFC PATCH net-next v2 5/6] net: xilinx: axienet: Get coalesce parameters from driver state
2024-09-09 23:52 [RFC PATCH net-next v2 0/6] net: xilinx: axienet: Enable adaptive IRQ coalescing with DIM Sean Anderson
` (3 preceding siblings ...)
2024-09-09 23:52 ` [RFC PATCH net-next v2 4/6] net: xilinx: axienet: Support adjusting coalesce settings while running Sean Anderson
@ 2024-09-09 23:52 ` Sean Anderson
2024-09-09 23:52 ` [RFC PATCH net-next v2 6/6] net: xilinx: axienet: Enable adaptive IRQ coalescing with DIM Sean Anderson
2024-09-10 1:34 ` [RFC PATCH net-next v2 0/6] " Nelson, Shannon
6 siblings, 0 replies; 11+ messages in thread
From: Sean Anderson @ 2024-09-09 23:52 UTC (permalink / raw)
To: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Radhey Shyam Pandey, netdev
Cc: linux-arm-kernel, Michal Simek, linux-kernel, Sean Anderson
The cr variables now contain the same values as the control registers
themselves. Extract/calculate the values from the variables instead of
saving the user-specified values. This allows us to remove some
bookeeping, and also lets the user know what the actual coalesce
settings are.
Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
---
Changes in v2:
- New
drivers/net/ethernet/xilinx/xilinx_axienet.h | 8 ---
.../net/ethernet/xilinx/xilinx_axienet_main.c | 70 +++++++++++++------
2 files changed, 47 insertions(+), 31 deletions(-)
diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet.h b/drivers/net/ethernet/xilinx/xilinx_axienet.h
index f0864cb8defe..33d05e55567e 100644
--- a/drivers/net/ethernet/xilinx/xilinx_axienet.h
+++ b/drivers/net/ethernet/xilinx/xilinx_axienet.h
@@ -533,10 +533,6 @@ struct skbuf_dma_descriptor {
* @rxmem: Stores rx memory size for jumbo frame handling.
* @csum_offload_on_tx_path: Stores the checksum selection on TX side.
* @csum_offload_on_rx_path: Stores the checksum selection on RX side.
- * @coalesce_count_rx: Store the irq coalesce on RX side.
- * @coalesce_usec_rx: IRQ coalesce delay for RX
- * @coalesce_count_tx: Store the irq coalesce on TX side.
- * @coalesce_usec_tx: IRQ coalesce delay for TX
* @use_dmaengine: flag to check dmaengine framework usage.
* @tx_chan: TX DMA channel.
* @rx_chan: RX DMA channel.
@@ -617,10 +613,6 @@ struct axienet_local {
int csum_offload_on_tx_path;
int csum_offload_on_rx_path;
- u32 coalesce_count_rx;
- u32 coalesce_usec_rx;
- u32 coalesce_count_tx;
- u32 coalesce_usec_tx;
u8 use_dmaengine;
struct dma_chan *tx_chan;
struct dma_chan *rx_chan;
diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
index 6bcb605aa67e..eb9600417d81 100644
--- a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
+++ b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
@@ -223,6 +223,13 @@ static void axienet_dma_bd_release(struct net_device *ndev)
lp->rx_bd_p);
}
+static u64 axienet_dma_rate(struct axienet_local *lp)
+{
+ if (lp->axi_clk)
+ return clk_get_rate(lp->axi_clk);
+ return 125000000; /* arbitrary guess if no clock rate set */
+}
+
/**
* axienet_calc_cr() - Calculate control register value
* @lp: Device private data
@@ -243,12 +250,9 @@ static u32 axienet_calc_cr(struct axienet_local *lp, u32 count, u32 usec)
* the first packet. Otherwise leave at 0 to disable delay interrupt.
*/
if (count > 1) {
- u64 clk_rate = 125000000; /* arbitrary guess if no clock rate set */
+ u64 clk_rate = axienet_dma_rate(lp);
u32 timer;
- if (lp->axi_clk)
- clk_rate = clk_get_rate(lp->axi_clk);
-
/* 1 Timeout Interval = 125 * (clock period of SG clock) */
timer = DIV64_U64_ROUND_CLOSEST((u64)usec * clk_rate,
XAXIDMA_DELAY_SCALE);
@@ -261,6 +265,23 @@ static u32 axienet_calc_cr(struct axienet_local *lp, u32 count, u32 usec)
return cr;
}
+/**
+ * axienet_cr_params() - Extract coalesce parameters from the CR
+ * @lp: Device private data
+ * @cr: The control register to parse
+ * @count: Number of packets before an interrupt
+ * @usec: Idle time (in usec) before an interrupt
+ */
+static void axienet_coalesce_params(struct axienet_local *lp, u32 cr,
+ u32 *count, u32 *usec)
+{
+ u64 clk_rate = axienet_dma_rate(lp);
+ u64 timer = FIELD_GET(XAXIDMA_DELAY_MASK, cr);
+
+ *count = FIELD_GET(XAXIDMA_COALESCE_MASK, cr);
+ *usec = DIV64_U64_ROUND_CLOSEST(timer * XAXIDMA_DELAY_SCALE, clk_rate);
+}
+
/**
* axienet_dma_start - Set up DMA registers and start DMA operation
* @lp: Pointer to the axienet_local structure
@@ -2095,11 +2116,21 @@ axienet_ethtools_get_coalesce(struct net_device *ndev,
struct netlink_ext_ack *extack)
{
struct axienet_local *lp = netdev_priv(ndev);
+ u32 cr;
- ecoalesce->rx_max_coalesced_frames = lp->coalesce_count_rx;
- ecoalesce->rx_coalesce_usecs = lp->coalesce_usec_rx;
- ecoalesce->tx_max_coalesced_frames = lp->coalesce_count_tx;
- ecoalesce->tx_coalesce_usecs = lp->coalesce_usec_tx;
+ spin_lock_irq(&lp->rx_cr_lock);
+ cr = lp->rx_dma_cr;
+ spin_unlock_irq(&lp->rx_cr_lock);
+ axienet_coalesce_params(lp, cr,
+ &ecoalesce->rx_max_coalesced_frames,
+ &ecoalesce->rx_coalesce_usecs);
+
+ spin_lock_irq(&lp->tx_cr_lock);
+ cr = lp->tx_dma_cr;
+ spin_unlock_irq(&lp->tx_cr_lock);
+ axienet_coalesce_params(lp, cr,
+ &ecoalesce->tx_max_coalesced_frames,
+ &ecoalesce->tx_coalesce_usecs);
return 0;
}
@@ -2140,15 +2171,12 @@ axienet_ethtools_set_coalesce(struct net_device *ndev,
return -EINVAL;
}
- lp->coalesce_count_rx = ecoalesce->rx_max_coalesced_frames;
- lp->coalesce_usec_rx = ecoalesce->rx_coalesce_usecs;
- lp->coalesce_count_tx = ecoalesce->tx_max_coalesced_frames;
- lp->coalesce_usec_tx = ecoalesce->tx_coalesce_usecs;
-
- cr = axienet_calc_cr(lp, lp->coalesce_count_rx, lp->coalesce_usec_rx);
+ cr = axienet_calc_cr(lp, ecoalesce->rx_max_coalesced_frames,
+ ecoalesce->rx_coalesce_usecs);
axienet_update_coalesce_rx(lp, cr, ~XAXIDMA_CR_RUNSTOP_MASK);
- cr = axienet_calc_cr(lp, lp->coalesce_count_tx, lp->coalesce_usec_tx);
+ cr = axienet_calc_cr(lp, ecoalesce->tx_max_coalesced_frames,
+ ecoalesce->tx_coalesce_usecs);
axienet_update_coalesce_tx(lp, cr, ~XAXIDMA_CR_RUNSTOP_MASK);
return 0;
}
@@ -2936,14 +2964,10 @@ static int axienet_probe(struct platform_device *pdev)
spin_lock_init(&lp->rx_cr_lock);
spin_lock_init(&lp->tx_cr_lock);
- lp->coalesce_count_rx = XAXIDMA_DFT_RX_THRESHOLD;
- lp->coalesce_count_tx = XAXIDMA_DFT_TX_THRESHOLD;
- lp->coalesce_usec_rx = XAXIDMA_DFT_RX_USEC;
- lp->coalesce_usec_tx = XAXIDMA_DFT_TX_USEC;
- lp->rx_dma_cr = axienet_calc_cr(lp, lp->coalesce_count_rx,
- lp->coalesce_usec_rx);
- lp->tx_dma_cr = axienet_calc_cr(lp, lp->coalesce_count_tx,
- lp->coalesce_usec_tx);
+ lp->rx_dma_cr = axienet_calc_cr(lp, XAXIDMA_DFT_RX_THRESHOLD,
+ XAXIDMA_DFT_RX_USEC);
+ lp->tx_dma_cr = axienet_calc_cr(lp, XAXIDMA_DFT_TX_THRESHOLD,
+ XAXIDMA_DFT_TX_USEC);
ret = axienet_mdio_setup(lp);
if (ret)
--
2.35.1.1320.gc452695387.dirty
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [RFC PATCH net-next v2 6/6] net: xilinx: axienet: Enable adaptive IRQ coalescing with DIM
2024-09-09 23:52 [RFC PATCH net-next v2 0/6] net: xilinx: axienet: Enable adaptive IRQ coalescing with DIM Sean Anderson
` (4 preceding siblings ...)
2024-09-09 23:52 ` [RFC PATCH net-next v2 5/6] net: xilinx: axienet: Get coalesce parameters from driver state Sean Anderson
@ 2024-09-09 23:52 ` Sean Anderson
2024-09-10 1:32 ` Nelson, Shannon
2024-09-10 1:34 ` [RFC PATCH net-next v2 0/6] " Nelson, Shannon
6 siblings, 1 reply; 11+ messages in thread
From: Sean Anderson @ 2024-09-09 23:52 UTC (permalink / raw)
To: David S . Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
Radhey Shyam Pandey, netdev
Cc: linux-arm-kernel, Michal Simek, linux-kernel, Sean Anderson,
Heng Qi
The default RX IRQ coalescing settings of one IRQ per packet can represent
a significant CPU load. However, increasing the coalescing unilaterally
can result in undesirable latency under low load. Adaptive IRQ
coalescing with DIM offers a way to adjust the coalescing settings based
on load.
This device only supports "CQE" mode [1], where each packet resets the
timer. Therefore, an interrupt is fired either when we receive
coalesce_count_rx packets or when the interface is idle for
coalesce_usec_rx. With this in mind, consider the following scenarios:
Link saturated
Here we want to set coalesce_count_rx to a large value, in order to
coalesce more packets and reduce CPU load. coalesce_usec_rx should
be set to at least the time for one packet. Otherwise the link will
be "idle" and we will get an interrupt for each packet anyway.
Bursts of packets
Each burst should be coalesced into a single interrupt, although it
may be prudent to reduce coalesce_count_rx for better latency.
coalesce_usec_rx should be set to at least the time for one packet
so bursts are coalesced. However, additional time beyond the packet
time will just increase latency at the end of a burst.
Sporadic packets
Due to low load, we can set coalesce_count_rx to 1 in order to
reduce latency to the minimum. coalesce_usec_rx does not matter in
this case.
Based on this analysis, I expected the CQE profiles to look something
like
usec = 0, pkts = 1 // Low load
usec = 16, pkts = 4
usec = 16, pkts = 16
usec = 16, pkts = 64
usec = 16, pkts = 256 // High load
Where usec is set to 16 to be a few us greater than the 12.3 us packet
time of a 1500 MTU packet at 1 GBit/s. However, the CQE profile is
instead
usec = 2, pkts = 256 // Low load
usec = 8, pkts = 128
usec = 16, pkts = 64
usec = 32, pkts = 64
usec = 64, pkts = 64 // High load
I found this very surprising. The number of coalesced packets
*decreases* as load increases. But as load increases we have more
opportunities to coalesce packets without affecting latency as much.
Additionally, the profile *increases* the usec as the load increases.
But as load increases, the gaps between packets will tend to become
smaller, making it possible to *decrease* usec for better latency at the
end of a "burst".
I consider the default CQE profile unsuitable for this NIC. Therefore,
we use the first profile outlined in this commit instead.
coalesce_usec_rx is set to 16 by default, but the user can customize it.
This may be necessary if they are using jumbo frames. I think adjusting
the profile times based on the link speed/mtu would be good improvement
for generic DIM.
In addition to the above profile problems, I noticed the following
additional issues with DIM while testing:
- DIM tends to "wander" when at low load, since the performance gradient
is pretty flat. If you only have 10p/ms anyway then adjusting the
coalescing settings will not affect throughput very much.
- DIM takes a long time to adjust back to low indices when load is
decreased following a period of high load. This is because it only
re-evaluates its settings once every 64 interrupts. However, at low
load 64 interrupts can be several seconds.
Finally: performance. This patch increases receive throughput with
iperf3 from 840 Mbits/sec to 938 Mbits/sec, decreases interrupts from
69920/sec to 316/sec, and decreases CPU utilization (4x Cortex-A53) from
43% to 9%.
[1] Who names this stuff?
Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
---
Heng, maybe you have some comments on DIM regarding the above?
Changes in v2:
- Don't take the RTNL in axienet_rx_dim_work to avoid deadlock. Instead,
calculate a partial cr update that axienet_update_coalesce_rx can
perform under a spin lock.
- Use READ/WRITE_ONCE when accessing/modifying rx_irqs
drivers/net/ethernet/xilinx/Kconfig | 1 +
drivers/net/ethernet/xilinx/xilinx_axienet.h | 10 ++-
.../net/ethernet/xilinx/xilinx_axienet_main.c | 80 +++++++++++++++++--
3 files changed, 82 insertions(+), 9 deletions(-)
diff --git a/drivers/net/ethernet/xilinx/Kconfig b/drivers/net/ethernet/xilinx/Kconfig
index 35d96c633a33..7502214cc7d5 100644
--- a/drivers/net/ethernet/xilinx/Kconfig
+++ b/drivers/net/ethernet/xilinx/Kconfig
@@ -28,6 +28,7 @@ config XILINX_AXI_EMAC
depends on HAS_IOMEM
depends on XILINX_DMA
select PHYLINK
+ select DIMLIB
help
This driver supports the 10/100/1000 Ethernet from Xilinx for the
AXI bus interface used in Xilinx Virtex FPGAs and Soc's.
diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet.h b/drivers/net/ethernet/xilinx/xilinx_axienet.h
index 33d05e55567e..b6604e354de7 100644
--- a/drivers/net/ethernet/xilinx/xilinx_axienet.h
+++ b/drivers/net/ethernet/xilinx/xilinx_axienet.h
@@ -9,6 +9,7 @@
#ifndef XILINX_AXIENET_H
#define XILINX_AXIENET_H
+#include <linux/dim.h>
#include <linux/netdevice.h>
#include <linux/spinlock.h>
#include <linux/interrupt.h>
@@ -123,8 +124,7 @@
/* Default TX/RX Threshold and delay timer values for SGDMA mode */
#define XAXIDMA_DFT_TX_THRESHOLD 24
#define XAXIDMA_DFT_TX_USEC 50
-#define XAXIDMA_DFT_RX_THRESHOLD 1
-#define XAXIDMA_DFT_RX_USEC 50
+#define XAXIDMA_DFT_RX_USEC 16
#define XAXIDMA_BD_CTRL_TXSOF_MASK 0x08000000 /* First tx packet */
#define XAXIDMA_BD_CTRL_TXEOF_MASK 0x04000000 /* Last tx packet */
@@ -484,6 +484,9 @@ struct skbuf_dma_descriptor {
* @regs: Base address for the axienet_local device address space
* @dma_regs: Base address for the axidma device address space
* @napi_rx: NAPI RX control structure
+ * @rx_dim: DIM state for the receive queue
+ * @rx_irqs: Number of interrupts
+ * @rx_dim_enabled: Whether DIM is enabled or not
* @rx_cr_lock: Lock protecting @rx_dma_cr, its register, and @rx_dma_started
* @rx_dma_cr: Nominal content of RX DMA control register
* @rx_dma_started: Set when RX DMA is started
@@ -566,6 +569,9 @@ struct axienet_local {
void __iomem *dma_regs;
struct napi_struct napi_rx;
+ struct dim rx_dim;
+ bool rx_dim_enabled;
+ u16 rx_irqs;
spinlock_t rx_cr_lock;
u32 rx_dma_cr;
bool rx_dma_started;
diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
index eb9600417d81..194ae87f534a 100644
--- a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
+++ b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
@@ -1279,6 +1279,18 @@ static int axienet_rx_poll(struct napi_struct *napi, int budget)
axienet_dma_out_addr(lp, XAXIDMA_RX_TDESC_OFFSET, tail_p);
if (packets < budget && napi_complete_done(napi, packets)) {
+ if (READ_ONCE(lp->rx_dim_enabled)) {
+ struct dim_sample sample = {
+ .time = ktime_get(),
+ /* Safe because we are the only writer */
+ .pkt_ctr = u64_stats_read(&lp->rx_packets),
+ .byte_ctr = u64_stats_read(&lp->rx_bytes),
+ .event_ctr = READ_ONCE(lp->rx_irqs),
+ };
+
+ net_dim(&lp->rx_dim, sample);
+ }
+
/* Re-enable RX completion interrupts. This should
* cause an immediate interrupt if any RX packets are
* already pending.
@@ -1373,6 +1385,7 @@ static irqreturn_t axienet_rx_irq(int irq, void *_ndev)
*/
u32 cr;
+ WRITE_ONCE(lp->rx_irqs, READ_ONCE(lp->rx_irqs) + 1);
spin_lock(&lp->rx_cr_lock);
cr = lp->rx_dma_cr;
cr &= ~(XAXIDMA_IRQ_IOC_MASK | XAXIDMA_IRQ_DELAY_MASK);
@@ -1670,6 +1683,7 @@ static int axienet_open(struct net_device *ndev)
if (lp->eth_irq > 0)
free_irq(lp->eth_irq, ndev);
err_phy:
+ cancel_work_sync(&lp->rx_dim.work);
cancel_delayed_work_sync(&lp->stats_work);
phylink_stop(lp->phylink);
phylink_disconnect_phy(lp->phylink);
@@ -1696,6 +1710,7 @@ static int axienet_stop(struct net_device *ndev)
napi_disable(&lp->napi_rx);
}
+ cancel_work_sync(&lp->rx_dim.work);
cancel_delayed_work_sync(&lp->stats_work);
phylink_stop(lp->phylink);
@@ -2068,6 +2083,31 @@ static void axienet_update_coalesce_rx(struct axienet_local *lp, u32 cr,
spin_unlock_irq(&lp->rx_cr_lock);
}
+/**
+ * axienet_dim_coalesce_count_rx() - RX coalesce count for DIM
+ * @lp: Device private data
+ */
+static u32 axienet_dim_coalesce_count_rx(struct axienet_local *lp)
+{
+ return 1 << (lp->rx_dim.profile_ix << 1);
+}
+
+/**
+ * axienet_rx_dim_work() - Adjust RX DIM settings
+ * @work: The work struct
+ */
+static void axienet_rx_dim_work(struct work_struct *work)
+{
+ struct axienet_local *lp =
+ container_of(work, struct axienet_local, rx_dim.work);
+ u32 cr = axienet_calc_cr(lp, axienet_dim_coalesce_count_rx(lp), 0);
+ u32 mask = XAXIDMA_COALESCE_MASK | XAXIDMA_IRQ_IOC_MASK |
+ XAXIDMA_IRQ_ERROR_MASK;
+
+ axienet_update_coalesce_rx(lp, cr, mask);
+ lp->rx_dim.state = DIM_START_MEASURE;
+}
+
/**
* axienet_set_cr_tx() - Set TX CR
* @lp: Device private data
@@ -2118,6 +2158,8 @@ axienet_ethtools_get_coalesce(struct net_device *ndev,
struct axienet_local *lp = netdev_priv(ndev);
u32 cr;
+ ecoalesce->use_adaptive_rx_coalesce = lp->rx_dim_enabled;
+
spin_lock_irq(&lp->rx_cr_lock);
cr = lp->rx_dma_cr;
spin_unlock_irq(&lp->rx_cr_lock);
@@ -2154,7 +2196,9 @@ axienet_ethtools_set_coalesce(struct net_device *ndev,
struct netlink_ext_ack *extack)
{
struct axienet_local *lp = netdev_priv(ndev);
- u32 cr;
+ bool new_dim = ecoalesce->use_adaptive_rx_coalesce;
+ bool old_dim = lp->rx_dim_enabled;
+ u32 cr, mask = ~XAXIDMA_CR_RUNSTOP_MASK;
if (!ecoalesce->rx_max_coalesced_frames ||
!ecoalesce->tx_max_coalesced_frames) {
@@ -2162,7 +2206,7 @@ axienet_ethtools_set_coalesce(struct net_device *ndev,
return -EINVAL;
}
- if ((ecoalesce->rx_max_coalesced_frames > 1 &&
+ if (((ecoalesce->rx_max_coalesced_frames > 1 || new_dim) &&
!ecoalesce->rx_coalesce_usecs) ||
(ecoalesce->tx_max_coalesced_frames > 1 &&
!ecoalesce->tx_coalesce_usecs)) {
@@ -2171,9 +2215,27 @@ axienet_ethtools_set_coalesce(struct net_device *ndev,
return -EINVAL;
}
- cr = axienet_calc_cr(lp, ecoalesce->rx_max_coalesced_frames,
- ecoalesce->rx_coalesce_usecs);
- axienet_update_coalesce_rx(lp, cr, ~XAXIDMA_CR_RUNSTOP_MASK);
+ if (new_dim && !old_dim) {
+ cr = axienet_calc_cr(lp, axienet_dim_coalesce_count_rx(lp),
+ ecoalesce->rx_coalesce_usecs);
+ } else if (!new_dim) {
+ if (old_dim) {
+ WRITE_ONCE(lp->rx_dim_enabled, false);
+ napi_synchronize(&lp->napi_rx);
+ flush_work(&lp->rx_dim.work);
+ }
+
+ cr = axienet_calc_cr(lp, ecoalesce->rx_max_coalesced_frames,
+ ecoalesce->rx_coalesce_usecs);
+ } else {
+ /* Dummy value for count just to calculate timer */
+ cr = axienet_calc_cr(lp, 2, ecoalesce->rx_coalesce_usecs);
+ mask = XAXIDMA_DELAY_MASK | XAXIDMA_IRQ_DELAY_MASK;
+ }
+
+ axienet_update_coalesce_rx(lp, cr, mask);
+ if (new_dim && !old_dim)
+ WRITE_ONCE(lp->rx_dim_enabled, true);
cr = axienet_calc_cr(lp, ecoalesce->tx_max_coalesced_frames,
ecoalesce->tx_coalesce_usecs);
@@ -2415,7 +2477,8 @@ axienet_ethtool_get_rmon_stats(struct net_device *dev,
static const struct ethtool_ops axienet_ethtool_ops = {
.supported_coalesce_params = ETHTOOL_COALESCE_MAX_FRAMES |
- ETHTOOL_COALESCE_USECS,
+ ETHTOOL_COALESCE_USECS |
+ ETHTOOL_COALESCE_USE_ADAPTIVE_RX,
.get_drvinfo = axienet_ethtools_get_drvinfo,
.get_regs_len = axienet_ethtools_get_regs_len,
.get_regs = axienet_ethtools_get_regs,
@@ -2964,7 +3027,10 @@ static int axienet_probe(struct platform_device *pdev)
spin_lock_init(&lp->rx_cr_lock);
spin_lock_init(&lp->tx_cr_lock);
- lp->rx_dma_cr = axienet_calc_cr(lp, XAXIDMA_DFT_RX_THRESHOLD,
+ INIT_WORK(&lp->rx_dim.work, axienet_rx_dim_work);
+ lp->rx_dim_enabled = true;
+ lp->rx_dim.profile_ix = 1;
+ lp->rx_dma_cr = axienet_calc_cr(lp, axienet_dim_coalesce_count_rx(lp),
XAXIDMA_DFT_RX_USEC);
lp->tx_dma_cr = axienet_calc_cr(lp, XAXIDMA_DFT_TX_THRESHOLD,
XAXIDMA_DFT_TX_USEC);
--
2.35.1.1320.gc452695387.dirty
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [RFC PATCH net-next v2 3/6] net: xilinx: axienet: Combine CR calculation
2024-09-09 23:52 ` [RFC PATCH net-next v2 3/6] net: xilinx: axienet: Combine CR calculation Sean Anderson
@ 2024-09-10 1:32 ` Nelson, Shannon
0 siblings, 0 replies; 11+ messages in thread
From: Nelson, Shannon @ 2024-09-10 1:32 UTC (permalink / raw)
To: Sean Anderson, David S . Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Radhey Shyam Pandey, netdev
Cc: linux-arm-kernel, Michal Simek, linux-kernel
On 9/9/2024 4:52 PM, Sean Anderson wrote:
>
> Combine the common parts of the CR calculations for better code reuse.
> While we're at it, simplify the code a bit.
>
> Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
> ---
>
> Changes in v2:
> - Split off from runtime coalesce modification support
>
> drivers/net/ethernet/xilinx/xilinx_axienet.h | 2 -
> .../net/ethernet/xilinx/xilinx_axienet_main.c | 69 ++++++++++---------
> 2 files changed, 35 insertions(+), 36 deletions(-)
>
> diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet.h b/drivers/net/ethernet/xilinx/xilinx_axienet.h
> index 5c0a21ef96a4..c43ce8f7590c 100644
> --- a/drivers/net/ethernet/xilinx/xilinx_axienet.h
> +++ b/drivers/net/ethernet/xilinx/xilinx_axienet.h
> @@ -112,8 +112,6 @@
> #define XAXIDMA_DELAY_MASK ((u32)0xFF000000) /* Delay timeout counter */
> #define XAXIDMA_COALESCE_MASK ((u32)0x00FF0000) /* Coalesce counter */
>
> -#define XAXIDMA_DELAY_SHIFT 24
> -
> #define XAXIDMA_IRQ_IOC_MASK 0x00001000 /* Completion intr */
> #define XAXIDMA_IRQ_DELAY_MASK 0x00002000 /* Delay interrupt */
> #define XAXIDMA_IRQ_ERROR_MASK 0x00004000 /* Error interrupt */
> diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
> index bc987f7ca1ea..bff94d378b9f 100644
> --- a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
> +++ b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
> @@ -224,22 +224,41 @@ static void axienet_dma_bd_release(struct net_device *ndev)
> }
>
> /**
> - * axienet_usec_to_timer - Calculate IRQ delay timer value
> - * @lp: Pointer to the axienet_local structure
> - * @coalesce_usec: Microseconds to convert into timer value
> + * axienet_calc_cr() - Calculate control register value
> + * @lp: Device private data
> + * @coalesce_count: Number of completions before an interrupt
> + * @coalesce_usec: Microseconds after the last completion before an interrupt
nit: The comments should match the actual parameter names
sln
> + *
> + * Calculate a control register value based on the coalescing settings. The
> + * run/stop bit is not set.
> */
> -static u32 axienet_usec_to_timer(struct axienet_local *lp, u32 coalesce_usec)
> +static u32 axienet_calc_cr(struct axienet_local *lp, u32 count, u32 usec)
> {
> - u32 result;
> - u64 clk_rate = 125000000; /* arbitrary guess if no clock rate set */
> + u32 cr;
>
> - if (lp->axi_clk)
> - clk_rate = clk_get_rate(lp->axi_clk);
> + count = min(count, FIELD_MAX(XAXIDMA_COALESCE_MASK));
> + cr = FIELD_PREP(XAXIDMA_COALESCE_MASK, count) | XAXIDMA_IRQ_IOC_MASK |
> + XAXIDMA_IRQ_ERROR_MASK;
> + /* Only set interrupt delay timer if not generating an interrupt on
> + * the first packet. Otherwise leave at 0 to disable delay interrupt.
> + */
> + if (count > 1) {
> + u64 clk_rate = 125000000; /* arbitrary guess if no clock rate set */
> + u32 timer;
>
> - /* 1 Timeout Interval = 125 * (clock period of SG clock) */
> - result = DIV64_U64_ROUND_CLOSEST((u64)coalesce_usec * clk_rate,
> - XAXIDMA_DELAY_SCALE);
> - return min(result, FIELD_MAX(XAXIDMA_DELAY_MASK));
> + if (lp->axi_clk)
> + clk_rate = clk_get_rate(lp->axi_clk);
> +
> + /* 1 Timeout Interval = 125 * (clock period of SG clock) */
> + timer = DIV64_U64_ROUND_CLOSEST((u64)usec * clk_rate,
> + XAXIDMA_DELAY_SCALE);
> +
> + timer = min(timer, FIELD_MAX(XAXIDMA_DELAY_MASK));
> + cr |= FIELD_PREP(XAXIDMA_DELAY_MASK, timer) |
> + XAXIDMA_IRQ_DELAY_MASK;
> + }
> +
> + return cr;
> }
>
> /**
> @@ -249,31 +268,13 @@ static u32 axienet_usec_to_timer(struct axienet_local *lp, u32 coalesce_usec)
> static void axienet_dma_start(struct axienet_local *lp)
> {
> /* Start updating the Rx channel control register */
> - lp->rx_dma_cr = FIELD_PREP(XAXIDMA_COALESCE_MASK,
> - min(lp->coalesce_count_rx,
> - FIELD_MAX(XAXIDMA_COALESCE_MASK))) |
> - XAXIDMA_IRQ_IOC_MASK | XAXIDMA_IRQ_ERROR_MASK;
> - /* Only set interrupt delay timer if not generating an interrupt on
> - * the first RX packet. Otherwise leave at 0 to disable delay interrupt.
> - */
> - if (lp->coalesce_count_rx > 1)
> - lp->rx_dma_cr |= (axienet_usec_to_timer(lp, lp->coalesce_usec_rx)
> - << XAXIDMA_DELAY_SHIFT) |
> - XAXIDMA_IRQ_DELAY_MASK;
> + lp->rx_dma_cr = axienet_calc_cr(lp, lp->coalesce_count_rx,
> + lp->coalesce_usec_rx);
> axienet_dma_out32(lp, XAXIDMA_RX_CR_OFFSET, lp->rx_dma_cr);
>
> /* Start updating the Tx channel control register */
> - lp->tx_dma_cr = FIELD_PREP(XAXIDMA_COALESCE_MASK,
> - min(lp->coalesce_count_tx,
> - FIELD_MAX(XAXIDMA_COALESCE_MASK))) |
> - XAXIDMA_IRQ_IOC_MASK | XAXIDMA_IRQ_ERROR_MASK;
> - /* Only set interrupt delay timer if not generating an interrupt on
> - * the first TX packet. Otherwise leave at 0 to disable delay interrupt.
> - */
> - if (lp->coalesce_count_tx > 1)
> - lp->tx_dma_cr |= (axienet_usec_to_timer(lp, lp->coalesce_usec_tx)
> - << XAXIDMA_DELAY_SHIFT) |
> - XAXIDMA_IRQ_DELAY_MASK;
> + lp->tx_dma_cr = axienet_calc_cr(lp, lp->coalesce_count_tx,
> + lp->coalesce_usec_tx);
> axienet_dma_out32(lp, XAXIDMA_TX_CR_OFFSET, lp->tx_dma_cr);
>
> /* Populate the tail pointer and bring the Rx Axi DMA engine out of
> --
> 2.35.1.1320.gc452695387.dirty
>
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC PATCH net-next v2 6/6] net: xilinx: axienet: Enable adaptive IRQ coalescing with DIM
2024-09-09 23:52 ` [RFC PATCH net-next v2 6/6] net: xilinx: axienet: Enable adaptive IRQ coalescing with DIM Sean Anderson
@ 2024-09-10 1:32 ` Nelson, Shannon
0 siblings, 0 replies; 11+ messages in thread
From: Nelson, Shannon @ 2024-09-10 1:32 UTC (permalink / raw)
To: Sean Anderson, David S . Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Radhey Shyam Pandey, netdev
Cc: linux-arm-kernel, Michal Simek, linux-kernel, Heng Qi
On 9/9/2024 4:52 PM, Sean Anderson wrote:
>
> The default RX IRQ coalescing settings of one IRQ per packet can represent
> a significant CPU load. However, increasing the coalescing unilaterally
> can result in undesirable latency under low load. Adaptive IRQ
> coalescing with DIM offers a way to adjust the coalescing settings based
> on load.
>
> This device only supports "CQE" mode [1], where each packet resets the
> timer. Therefore, an interrupt is fired either when we receive
> coalesce_count_rx packets or when the interface is idle for
> coalesce_usec_rx. With this in mind, consider the following scenarios:
>
> Link saturated
> Here we want to set coalesce_count_rx to a large value, in order to
> coalesce more packets and reduce CPU load. coalesce_usec_rx should
> be set to at least the time for one packet. Otherwise the link will
> be "idle" and we will get an interrupt for each packet anyway.
>
> Bursts of packets
> Each burst should be coalesced into a single interrupt, although it
> may be prudent to reduce coalesce_count_rx for better latency.
> coalesce_usec_rx should be set to at least the time for one packet
> so bursts are coalesced. However, additional time beyond the packet
> time will just increase latency at the end of a burst.
>
> Sporadic packets
> Due to low load, we can set coalesce_count_rx to 1 in order to
> reduce latency to the minimum. coalesce_usec_rx does not matter in
> this case.
>
> Based on this analysis, I expected the CQE profiles to look something
> like
>
> usec = 0, pkts = 1 // Low load
> usec = 16, pkts = 4
> usec = 16, pkts = 16
> usec = 16, pkts = 64
> usec = 16, pkts = 256 // High load
>
> Where usec is set to 16 to be a few us greater than the 12.3 us packet
> time of a 1500 MTU packet at 1 GBit/s. However, the CQE profile is
> instead
>
> usec = 2, pkts = 256 // Low load
> usec = 8, pkts = 128
> usec = 16, pkts = 64
> usec = 32, pkts = 64
> usec = 64, pkts = 64 // High load
>
> I found this very surprising. The number of coalesced packets
> *decreases* as load increases. But as load increases we have more
> opportunities to coalesce packets without affecting latency as much.
> Additionally, the profile *increases* the usec as the load increases.
> But as load increases, the gaps between packets will tend to become
> smaller, making it possible to *decrease* usec for better latency at the
> end of a "burst".
>
> I consider the default CQE profile unsuitable for this NIC. Therefore,
> we use the first profile outlined in this commit instead.
> coalesce_usec_rx is set to 16 by default, but the user can customize it.
> This may be necessary if they are using jumbo frames. I think adjusting
> the profile times based on the link speed/mtu would be good improvement
> for generic DIM.
>
> In addition to the above profile problems, I noticed the following
> additional issues with DIM while testing:
>
> - DIM tends to "wander" when at low load, since the performance gradient
> is pretty flat. If you only have 10p/ms anyway then adjusting the
> coalescing settings will not affect throughput very much.
> - DIM takes a long time to adjust back to low indices when load is
> decreased following a period of high load. This is because it only
> re-evaluates its settings once every 64 interrupts. However, at low
> load 64 interrupts can be several seconds.
>
> Finally: performance. This patch increases receive throughput with
> iperf3 from 840 Mbits/sec to 938 Mbits/sec, decreases interrupts from
> 69920/sec to 316/sec, and decreases CPU utilization (4x Cortex-A53) from
> 43% to 9%.
>
> [1] Who names this stuff?
>
> Signed-off-by: Sean Anderson <sean.anderson@linux.dev>
> ---
> Heng, maybe you have some comments on DIM regarding the above?
>
> Changes in v2:
> - Don't take the RTNL in axienet_rx_dim_work to avoid deadlock. Instead,
> calculate a partial cr update that axienet_update_coalesce_rx can
> perform under a spin lock.
> - Use READ/WRITE_ONCE when accessing/modifying rx_irqs
>
> drivers/net/ethernet/xilinx/Kconfig | 1 +
> drivers/net/ethernet/xilinx/xilinx_axienet.h | 10 ++-
> .../net/ethernet/xilinx/xilinx_axienet_main.c | 80 +++++++++++++++++--
> 3 files changed, 82 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/net/ethernet/xilinx/Kconfig b/drivers/net/ethernet/xilinx/Kconfig
> index 35d96c633a33..7502214cc7d5 100644
> --- a/drivers/net/ethernet/xilinx/Kconfig
> +++ b/drivers/net/ethernet/xilinx/Kconfig
> @@ -28,6 +28,7 @@ config XILINX_AXI_EMAC
> depends on HAS_IOMEM
> depends on XILINX_DMA
> select PHYLINK
> + select DIMLIB
> help
> This driver supports the 10/100/1000 Ethernet from Xilinx for the
> AXI bus interface used in Xilinx Virtex FPGAs and Soc's.
> diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet.h b/drivers/net/ethernet/xilinx/xilinx_axienet.h
> index 33d05e55567e..b6604e354de7 100644
> --- a/drivers/net/ethernet/xilinx/xilinx_axienet.h
> +++ b/drivers/net/ethernet/xilinx/xilinx_axienet.h
> @@ -9,6 +9,7 @@
> #ifndef XILINX_AXIENET_H
> #define XILINX_AXIENET_H
>
> +#include <linux/dim.h>
> #include <linux/netdevice.h>
> #include <linux/spinlock.h>
> #include <linux/interrupt.h>
> @@ -123,8 +124,7 @@
> /* Default TX/RX Threshold and delay timer values for SGDMA mode */
> #define XAXIDMA_DFT_TX_THRESHOLD 24
> #define XAXIDMA_DFT_TX_USEC 50
> -#define XAXIDMA_DFT_RX_THRESHOLD 1
> -#define XAXIDMA_DFT_RX_USEC 50
> +#define XAXIDMA_DFT_RX_USEC 16
>
> #define XAXIDMA_BD_CTRL_TXSOF_MASK 0x08000000 /* First tx packet */
> #define XAXIDMA_BD_CTRL_TXEOF_MASK 0x04000000 /* Last tx packet */
> @@ -484,6 +484,9 @@ struct skbuf_dma_descriptor {
> * @regs: Base address for the axienet_local device address space
> * @dma_regs: Base address for the axidma device address space
> * @napi_rx: NAPI RX control structure
> + * @rx_dim: DIM state for the receive queue
> + * @rx_irqs: Number of interrupts
> + * @rx_dim_enabled: Whether DIM is enabled or not
nit: These should be in the same order as in the actual struct
sln
> * @rx_cr_lock: Lock protecting @rx_dma_cr, its register, and @rx_dma_started
> * @rx_dma_cr: Nominal content of RX DMA control register
> * @rx_dma_started: Set when RX DMA is started
> @@ -566,6 +569,9 @@ struct axienet_local {
> void __iomem *dma_regs;
>
> struct napi_struct napi_rx;
> + struct dim rx_dim;
> + bool rx_dim_enabled;
> + u16 rx_irqs;
> spinlock_t rx_cr_lock;
> u32 rx_dma_cr;
> bool rx_dma_started;
> diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
> index eb9600417d81..194ae87f534a 100644
> --- a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
> +++ b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c
> @@ -1279,6 +1279,18 @@ static int axienet_rx_poll(struct napi_struct *napi, int budget)
> axienet_dma_out_addr(lp, XAXIDMA_RX_TDESC_OFFSET, tail_p);
>
> if (packets < budget && napi_complete_done(napi, packets)) {
> + if (READ_ONCE(lp->rx_dim_enabled)) {
> + struct dim_sample sample = {
> + .time = ktime_get(),
> + /* Safe because we are the only writer */
> + .pkt_ctr = u64_stats_read(&lp->rx_packets),
> + .byte_ctr = u64_stats_read(&lp->rx_bytes),
> + .event_ctr = READ_ONCE(lp->rx_irqs),
> + };
> +
> + net_dim(&lp->rx_dim, sample);
> + }
> +
> /* Re-enable RX completion interrupts. This should
> * cause an immediate interrupt if any RX packets are
> * already pending.
> @@ -1373,6 +1385,7 @@ static irqreturn_t axienet_rx_irq(int irq, void *_ndev)
> */
> u32 cr;
>
> + WRITE_ONCE(lp->rx_irqs, READ_ONCE(lp->rx_irqs) + 1);
> spin_lock(&lp->rx_cr_lock);
> cr = lp->rx_dma_cr;
> cr &= ~(XAXIDMA_IRQ_IOC_MASK | XAXIDMA_IRQ_DELAY_MASK);
> @@ -1670,6 +1683,7 @@ static int axienet_open(struct net_device *ndev)
> if (lp->eth_irq > 0)
> free_irq(lp->eth_irq, ndev);
> err_phy:
> + cancel_work_sync(&lp->rx_dim.work);
> cancel_delayed_work_sync(&lp->stats_work);
> phylink_stop(lp->phylink);
> phylink_disconnect_phy(lp->phylink);
> @@ -1696,6 +1710,7 @@ static int axienet_stop(struct net_device *ndev)
> napi_disable(&lp->napi_rx);
> }
>
> + cancel_work_sync(&lp->rx_dim.work);
> cancel_delayed_work_sync(&lp->stats_work);
>
> phylink_stop(lp->phylink);
> @@ -2068,6 +2083,31 @@ static void axienet_update_coalesce_rx(struct axienet_local *lp, u32 cr,
> spin_unlock_irq(&lp->rx_cr_lock);
> }
>
> +/**
> + * axienet_dim_coalesce_count_rx() - RX coalesce count for DIM
> + * @lp: Device private data
> + */
> +static u32 axienet_dim_coalesce_count_rx(struct axienet_local *lp)
> +{
> + return 1 << (lp->rx_dim.profile_ix << 1);
> +}
> +
> +/**
> + * axienet_rx_dim_work() - Adjust RX DIM settings
> + * @work: The work struct
> + */
> +static void axienet_rx_dim_work(struct work_struct *work)
> +{
> + struct axienet_local *lp =
> + container_of(work, struct axienet_local, rx_dim.work);
> + u32 cr = axienet_calc_cr(lp, axienet_dim_coalesce_count_rx(lp), 0);
> + u32 mask = XAXIDMA_COALESCE_MASK | XAXIDMA_IRQ_IOC_MASK |
> + XAXIDMA_IRQ_ERROR_MASK;
> +
> + axienet_update_coalesce_rx(lp, cr, mask);
> + lp->rx_dim.state = DIM_START_MEASURE;
> +}
> +
> /**
> * axienet_set_cr_tx() - Set TX CR
> * @lp: Device private data
> @@ -2118,6 +2158,8 @@ axienet_ethtools_get_coalesce(struct net_device *ndev,
> struct axienet_local *lp = netdev_priv(ndev);
> u32 cr;
>
> + ecoalesce->use_adaptive_rx_coalesce = lp->rx_dim_enabled;
> +
> spin_lock_irq(&lp->rx_cr_lock);
> cr = lp->rx_dma_cr;
> spin_unlock_irq(&lp->rx_cr_lock);
> @@ -2154,7 +2196,9 @@ axienet_ethtools_set_coalesce(struct net_device *ndev,
> struct netlink_ext_ack *extack)
> {
> struct axienet_local *lp = netdev_priv(ndev);
> - u32 cr;
> + bool new_dim = ecoalesce->use_adaptive_rx_coalesce;
> + bool old_dim = lp->rx_dim_enabled;
> + u32 cr, mask = ~XAXIDMA_CR_RUNSTOP_MASK;
>
> if (!ecoalesce->rx_max_coalesced_frames ||
> !ecoalesce->tx_max_coalesced_frames) {
> @@ -2162,7 +2206,7 @@ axienet_ethtools_set_coalesce(struct net_device *ndev,
> return -EINVAL;
> }
>
> - if ((ecoalesce->rx_max_coalesced_frames > 1 &&
> + if (((ecoalesce->rx_max_coalesced_frames > 1 || new_dim) &&
> !ecoalesce->rx_coalesce_usecs) ||
> (ecoalesce->tx_max_coalesced_frames > 1 &&
> !ecoalesce->tx_coalesce_usecs)) {
> @@ -2171,9 +2215,27 @@ axienet_ethtools_set_coalesce(struct net_device *ndev,
> return -EINVAL;
> }
>
> - cr = axienet_calc_cr(lp, ecoalesce->rx_max_coalesced_frames,
> - ecoalesce->rx_coalesce_usecs);
> - axienet_update_coalesce_rx(lp, cr, ~XAXIDMA_CR_RUNSTOP_MASK);
> + if (new_dim && !old_dim) {
> + cr = axienet_calc_cr(lp, axienet_dim_coalesce_count_rx(lp),
> + ecoalesce->rx_coalesce_usecs);
> + } else if (!new_dim) {
> + if (old_dim) {
> + WRITE_ONCE(lp->rx_dim_enabled, false);
> + napi_synchronize(&lp->napi_rx);
> + flush_work(&lp->rx_dim.work);
> + }
> +
> + cr = axienet_calc_cr(lp, ecoalesce->rx_max_coalesced_frames,
> + ecoalesce->rx_coalesce_usecs);
> + } else {
> + /* Dummy value for count just to calculate timer */
> + cr = axienet_calc_cr(lp, 2, ecoalesce->rx_coalesce_usecs);
> + mask = XAXIDMA_DELAY_MASK | XAXIDMA_IRQ_DELAY_MASK;
> + }
> +
> + axienet_update_coalesce_rx(lp, cr, mask);
> + if (new_dim && !old_dim)
> + WRITE_ONCE(lp->rx_dim_enabled, true);
>
> cr = axienet_calc_cr(lp, ecoalesce->tx_max_coalesced_frames,
> ecoalesce->tx_coalesce_usecs);
> @@ -2415,7 +2477,8 @@ axienet_ethtool_get_rmon_stats(struct net_device *dev,
>
> static const struct ethtool_ops axienet_ethtool_ops = {
> .supported_coalesce_params = ETHTOOL_COALESCE_MAX_FRAMES |
> - ETHTOOL_COALESCE_USECS,
> + ETHTOOL_COALESCE_USECS |
> + ETHTOOL_COALESCE_USE_ADAPTIVE_RX,
> .get_drvinfo = axienet_ethtools_get_drvinfo,
> .get_regs_len = axienet_ethtools_get_regs_len,
> .get_regs = axienet_ethtools_get_regs,
> @@ -2964,7 +3027,10 @@ static int axienet_probe(struct platform_device *pdev)
>
> spin_lock_init(&lp->rx_cr_lock);
> spin_lock_init(&lp->tx_cr_lock);
> - lp->rx_dma_cr = axienet_calc_cr(lp, XAXIDMA_DFT_RX_THRESHOLD,
> + INIT_WORK(&lp->rx_dim.work, axienet_rx_dim_work);
> + lp->rx_dim_enabled = true;
> + lp->rx_dim.profile_ix = 1;
> + lp->rx_dma_cr = axienet_calc_cr(lp, axienet_dim_coalesce_count_rx(lp),
> XAXIDMA_DFT_RX_USEC);
> lp->tx_dma_cr = axienet_calc_cr(lp, XAXIDMA_DFT_TX_THRESHOLD,
> XAXIDMA_DFT_TX_USEC);
> --
> 2.35.1.1320.gc452695387.dirty
>
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC PATCH net-next v2 0/6] net: xilinx: axienet: Enable adaptive IRQ coalescing with DIM
2024-09-09 23:52 [RFC PATCH net-next v2 0/6] net: xilinx: axienet: Enable adaptive IRQ coalescing with DIM Sean Anderson
` (5 preceding siblings ...)
2024-09-09 23:52 ` [RFC PATCH net-next v2 6/6] net: xilinx: axienet: Enable adaptive IRQ coalescing with DIM Sean Anderson
@ 2024-09-10 1:34 ` Nelson, Shannon
2024-09-10 14:30 ` Sean Anderson
6 siblings, 1 reply; 11+ messages in thread
From: Nelson, Shannon @ 2024-09-10 1:34 UTC (permalink / raw)
To: Sean Anderson, David S . Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Radhey Shyam Pandey, netdev
Cc: linux-arm-kernel, Michal Simek, linux-kernel, Heng Qi
On 9/9/2024 4:52 PM, Sean Anderson wrote:
>
> To improve performance without sacrificing latency under low load,
> enable DIM. While I appreciate not having to write the library myself, I
> do think there are many unusual aspects to DIM, as detailed in the last
> patch.
>
> This series depends on [1-2] and is therefore marked RFC. This series is
> otherwise ready to merge.
>
> [1] https://lore.kernel.org/netdev/20240909230908.1319982-1-sean.anderson@linux.dev/
> [2] https://lore.kernel.org/netdev/20240909231904.1322387-1-sean.anderson@linux.dev/
>
> Changes in v2:
> - Add some symbolic constants for IRQ delay timer
> - Report an error for bad coalesce settings
> - Don't use spin_lock_irqsave when we know the context
> - Split the CR calculation refactor from runtime coalesce settings
> adjustment support for easier review.
> - Have axienet_update_coalesce_rx/tx take the cr value/mask instead of
> calculating it with axienet_calc_cr. This will make it easier to add
> partial updates in the next few commits.
> - Get coalesce parameters from driver state
> - Don't take the RTNL in axienet_rx_dim_work to avoid deadlock. Instead,
> calculate a partial cr update that axienet_update_coalesce_rx can
> perform under a spin lock.
> - Use READ/WRITE_ONCE when accessing/modifying rx_irqs
>
> Sean Anderson (6):
> net: xilinx: axienet: Add some symbolic constants for IRQ delay timer
> net: xilinx: axienet: Report an error for bad coalesce settings
> net: xilinx: axienet: Combine CR calculation
> net: xilinx: axienet: Support adjusting coalesce settings while
> running
> net: xilinx: axienet: Get coalesce parameters from driver state
> net: xilinx: axienet: Enable adaptive IRQ coalescing with DIM
>
> drivers/net/ethernet/xilinx/Kconfig | 1 +
> drivers/net/ethernet/xilinx/xilinx_axienet.h | 31 +-
> .../net/ethernet/xilinx/xilinx_axienet_main.c | 320 ++++++++++++++----
> 3 files changed, 273 insertions(+), 79 deletions(-)
>
> --
> 2.35.1.1320.gc452695387.dirty
>
>
Except for a couple of comment nits pointed out in 3/6 and 6/6, this set
seems reasonable.
Reviewed by: Shannon Nelson <shannon.nelson@amd.com>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [RFC PATCH net-next v2 0/6] net: xilinx: axienet: Enable adaptive IRQ coalescing with DIM
2024-09-10 1:34 ` [RFC PATCH net-next v2 0/6] " Nelson, Shannon
@ 2024-09-10 14:30 ` Sean Anderson
0 siblings, 0 replies; 11+ messages in thread
From: Sean Anderson @ 2024-09-10 14:30 UTC (permalink / raw)
To: Nelson, Shannon, David S . Miller, Eric Dumazet, Jakub Kicinski,
Paolo Abeni, Radhey Shyam Pandey, netdev
Cc: linux-arm-kernel, Michal Simek, linux-kernel, Heng Qi
On 9/9/24 21:34, Nelson, Shannon wrote:
> On 9/9/2024 4:52 PM, Sean Anderson wrote:
>>
>> To improve performance without sacrificing latency under low load,
>> enable DIM. While I appreciate not having to write the library myself, I
>> do think there are many unusual aspects to DIM, as detailed in the last
>> patch.
>>
>> This series depends on [1-2] and is therefore marked RFC. This series is
>> otherwise ready to merge.
>>
>> [1] https://lore.kernel.org/netdev/20240909230908.1319982-1-sean.anderson@linux.dev/
>> [2] https://lore.kernel.org/netdev/20240909231904.1322387-1-sean.anderson@linux.dev/
>>
>> Changes in v2:
>> - Add some symbolic constants for IRQ delay timer
>> - Report an error for bad coalesce settings
>> - Don't use spin_lock_irqsave when we know the context
>> - Split the CR calculation refactor from runtime coalesce settings
>> adjustment support for easier review.
>> - Have axienet_update_coalesce_rx/tx take the cr value/mask instead of
>> calculating it with axienet_calc_cr. This will make it easier to add
>> partial updates in the next few commits.
>> - Get coalesce parameters from driver state
>> - Don't take the RTNL in axienet_rx_dim_work to avoid deadlock. Instead,
>> calculate a partial cr update that axienet_update_coalesce_rx can
>> perform under a spin lock.
>> - Use READ/WRITE_ONCE when accessing/modifying rx_irqs
>>
>> Sean Anderson (6):
>> net: xilinx: axienet: Add some symbolic constants for IRQ delay timer
>> net: xilinx: axienet: Report an error for bad coalesce settings
>> net: xilinx: axienet: Combine CR calculation
>> net: xilinx: axienet: Support adjusting coalesce settings while
>> running
>> net: xilinx: axienet: Get coalesce parameters from driver state
>> net: xilinx: axienet: Enable adaptive IRQ coalescing with DIM
>>
>> drivers/net/ethernet/xilinx/Kconfig | 1 +
>> drivers/net/ethernet/xilinx/xilinx_axienet.h | 31 +-
>> .../net/ethernet/xilinx/xilinx_axienet_main.c | 320 ++++++++++++++----
>> 3 files changed, 273 insertions(+), 79 deletions(-)
>>
>> --
>> 2.35.1.1320.gc452695387.dirty
>>
>>
>
> Except for a couple of comment nits pointed out in 3/6 and 6/6, this set seems reasonable.
>
> Reviewed by: Shannon Nelson <shannon.nelson@amd.com>
>
>
Thanks. I'll address these for v3.
--Sean
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2024-09-10 14:31 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-09-09 23:52 [RFC PATCH net-next v2 0/6] net: xilinx: axienet: Enable adaptive IRQ coalescing with DIM Sean Anderson
2024-09-09 23:52 ` [RFC PATCH net-next v2 1/6] net: xilinx: axienet: Add some symbolic constants for IRQ delay timer Sean Anderson
2024-09-09 23:52 ` [RFC PATCH net-next v2 2/6] net: xilinx: axienet: Report an error for bad coalesce settings Sean Anderson
2024-09-09 23:52 ` [RFC PATCH net-next v2 3/6] net: xilinx: axienet: Combine CR calculation Sean Anderson
2024-09-10 1:32 ` Nelson, Shannon
2024-09-09 23:52 ` [RFC PATCH net-next v2 4/6] net: xilinx: axienet: Support adjusting coalesce settings while running Sean Anderson
2024-09-09 23:52 ` [RFC PATCH net-next v2 5/6] net: xilinx: axienet: Get coalesce parameters from driver state Sean Anderson
2024-09-09 23:52 ` [RFC PATCH net-next v2 6/6] net: xilinx: axienet: Enable adaptive IRQ coalescing with DIM Sean Anderson
2024-09-10 1:32 ` Nelson, Shannon
2024-09-10 1:34 ` [RFC PATCH net-next v2 0/6] " Nelson, Shannon
2024-09-10 14:30 ` Sean Anderson
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).