* [PATCH net-next v3 0/4] Cleanup and optimizations to transmit code
@ 2023-10-27 12:16 Shinas Rasheed
2023-10-27 12:16 ` [PATCH net-next v3 1/4] octeon_ep: add padding for small packets Shinas Rasheed
` (4 more replies)
0 siblings, 5 replies; 6+ messages in thread
From: Shinas Rasheed @ 2023-10-27 12:16 UTC
To: netdev, linux-kernel
Cc: hgani, vimleshk, egallen, mschmidt, pabeni, horms, kuba, davem,
wizhao, konguyen, Shinas Rasheed
Pad small packets to ETH_ZLEN before transmit, clean up dma sync calls,
add xmit_more functionality, and remove atomic variable usage in the
Tx data path.
Changes:
V3:
- Stop returning NETDEV_TX_BUSY when ring is full in xmit path.
Change to check early whether the next packet can fit in the ring,
instead of the current packet, and stop the queue if not.
- Add smp_mb() between stopping the tx queue and re-checking whether it
has free entries, in the queue full check function, so that IQ
completions processed on other cpus are reflected.
- Update small packet padding patch changelog to give more info.
V2: https://lore.kernel.org/all/20231024145119.2366588-1-srasheed@marvell.com/
- Added patch for padding small packets to ETH_ZLEN, part of
optimization patches for transmit code missed out in V1
- Updated changelog to provide more details for dma_sync remove patch
- Updated changelog to use imperative tone in add xmit_more patch
V1: https://lore.kernel.org/all/20231023114449.2362147-1-srasheed@marvell.com/
Shinas Rasheed (4):
octeon_ep: add padding for small packets
octeon_ep: remove dma sync in trasmit path
octeon_ep: implement xmit_more in transmit
octeon_ep: remove atomic variable usage in Tx data path
.../ethernet/marvell/octeon_ep/octep_config.h | 3 +-
.../ethernet/marvell/octeon_ep/octep_main.c | 55 +++++++++++--------
.../ethernet/marvell/octeon_ep/octep_main.h | 9 +++
.../net/ethernet/marvell/octeon_ep/octep_tx.c | 5 +-
.../net/ethernet/marvell/octeon_ep/octep_tx.h | 3 -
5 files changed, 45 insertions(+), 30 deletions(-)
--
2.25.1
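A userspace sketch of the stop-then-recheck pattern described in the V3 notes above, assuming C11 atomics as a stand-in for the kernel primitives. Here atomic_thread_fence(memory_order_seq_cst) plays the role of smp_mb(); struct queue_sim, its fields, and WAKE_THRESHOLD are hypothetical names and not driver code. The fence on the completion side is also an assumption of this sketch, not something the series shows.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define WAKE_THRESHOLD 2 /* hypothetical; the driver uses OCTEP_WAKE_QUEUE_THRESHOLD */

struct queue_sim {
	atomic_uint free_entries; /* raised by the completion side */
	atomic_bool stopped;      /* set by the transmit side */
};

/* Transmit side: returns true if the queue is left stopped. */
static bool queue_sim_full_check(struct queue_sim *q)
{
	if (atomic_load_explicit(&q->free_entries, memory_order_relaxed) >
	    WAKE_THRESHOLD)
		return false;

	atomic_store_explicit(&q->stopped, true, memory_order_relaxed);

	/* Full fence: order the store to 'stopped' before the re-read below. */
	atomic_thread_fence(memory_order_seq_cst);

	/* Re-check in case the completion side freed entries meanwhile. */
	if (atomic_load_explicit(&q->free_entries, memory_order_relaxed) >
	    WAKE_THRESHOLD) {
		atomic_store_explicit(&q->stopped, false, memory_order_relaxed);
		return false;
	}
	return true;
}

/* Completion side: free n entries, then wake the queue if it was stopped. */
static void queue_sim_complete(struct queue_sim *q, unsigned int n)
{
	assert(n > 0);
	atomic_fetch_add_explicit(&q->free_entries, n, memory_order_relaxed);

	/* Full fence: order the update before the read of 'stopped'. */
	atomic_thread_fence(memory_order_seq_cst);

	if (atomic_load_explicit(&q->stopped, memory_order_relaxed) &&
	    atomic_load_explicit(&q->free_entries, memory_order_relaxed) >
	    WAKE_THRESHOLD)
		atomic_store_explicit(&q->stopped, false, memory_order_relaxed);
}
```

With a full fence on both sides, at least one of the two paths observes the other's update, so the queue should not stay stopped while entries are free.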
* [PATCH net-next v3 1/4] octeon_ep: add padding for small packets
2023-10-27 12:16 [PATCH net-next v3 0/4] Cleanup and optimizations to transmit code Shinas Rasheed
@ 2023-10-27 12:16 ` Shinas Rasheed
2023-10-27 12:16 ` [PATCH net-next v3 2/4] octeon_ep: remove dma sync in trasmit path Shinas Rasheed
` (3 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Shinas Rasheed @ 2023-10-27 12:16 UTC
To: netdev, linux-kernel
Cc: hgani, vimleshk, egallen, mschmidt, pabeni, horms, kuba, davem,
wizhao, konguyen, Shinas Rasheed, Veerasenareddy Burru,
Sathesh Edara, Eric Dumazet
Pad small packets to ETH_ZLEN before transmit, as hardware
cannot pad and requires software padding to ensure
minimum ethernet frame length.
Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
---
V3:
- Updated changelog to provide more info.
V2: https://lore.kernel.org/all/20231024145119.2366588-2-srasheed@marvell.com/
- Introduced the patch
drivers/net/ethernet/marvell/octeon_ep/octep_main.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/net/ethernet/marvell/octeon_ep/octep_main.c b/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
index 552970c7dec0..2c86b911a380 100644
--- a/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
+++ b/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
@@ -820,6 +820,9 @@ static netdev_tx_t octep_start_xmit(struct sk_buff *skb,
u16 nr_frags, si;
u16 q_no, wi;
+ if (skb_put_padto(skb, ETH_ZLEN))
+ return NETDEV_TX_OK;
+
q_no = skb_get_queue_mapping(skb);
if (q_no >= oct->num_iqs) {
netdev_err(netdev, "Invalid Tx skb->queue_mapping=%d\n", q_no);
--
2.25.1
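A minimal userspace sketch of what padding to the minimum frame length means, assuming a plain byte buffer instead of an skb. pad_frame() and MIN_FRAME_LEN are hypothetical names for illustration; MIN_FRAME_LEN uses the same value as the kernel's ETH_ZLEN (60 bytes, excluding the FCS).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MIN_FRAME_LEN 60 /* same value as ETH_ZLEN: minimum frame without FCS */

/* Zero-pad a frame in place up to MIN_FRAME_LEN.
 * Returns the new length, or 0 if the buffer cannot hold the padding.
 * Hypothetical helper for illustration; not the kernel's skb_put_padto().
 */
static size_t pad_frame(unsigned char *buf, size_t len, size_t cap)
{
	assert(len <= cap);

	if (len >= MIN_FRAME_LEN)
		return len;	/* already long enough, nothing to do */
	if (cap < MIN_FRAME_LEN)
		return 0;	/* no room for the padding */

	/* Pad with zeros so no stale buffer contents go out on the wire. */
	memset(buf + len, 0, MIN_FRAME_LEN - len);
	return MIN_FRAME_LEN;
}
```

In the patch itself, skb_put_padto() frees the skb when padding fails, which is why the transmit function returns NETDEV_TX_OK in that case rather than asking the stack to retry.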
* [PATCH net-next v3 2/4] octeon_ep: remove dma sync in trasmit path
2023-10-27 12:16 [PATCH net-next v3 0/4] Cleanup and optimizations to transmit code Shinas Rasheed
2023-10-27 12:16 ` [PATCH net-next v3 1/4] octeon_ep: add padding for small packets Shinas Rasheed
@ 2023-10-27 12:16 ` Shinas Rasheed
2023-10-27 12:16 ` [PATCH net-next v3 3/4] octeon_ep: implement xmit_more in transmit Shinas Rasheed
` (2 subsequent siblings)
4 siblings, 0 replies; 6+ messages in thread
From: Shinas Rasheed @ 2023-10-27 12:16 UTC
To: netdev, linux-kernel
Cc: hgani, vimleshk, egallen, mschmidt, pabeni, horms, kuba, davem,
wizhao, konguyen, Shinas Rasheed, Veerasenareddy Burru,
Sathesh Edara, Eric Dumazet
Remove dma sync calls for scatter-gather
mappings, since they are coherent allocations
and do not need an explicit sync.
Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
---
V3:
- No changes.
V2: https://lore.kernel.org/all/20231024145119.2366588-3-srasheed@marvell.com/
- Provided more details in changelog
V1: https://lore.kernel.org/all/20231023114449.2362147-2-srasheed@marvell.com/
drivers/net/ethernet/marvell/octeon_ep/octep_main.c | 7 -------
1 file changed, 7 deletions(-)
diff --git a/drivers/net/ethernet/marvell/octeon_ep/octep_main.c b/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
index 2c86b911a380..1c02304677c9 100644
--- a/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
+++ b/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
@@ -872,9 +872,6 @@ static netdev_tx_t octep_start_xmit(struct sk_buff *skb,
if (dma_mapping_error(iq->dev, dma))
goto dma_map_err;
- dma_sync_single_for_cpu(iq->dev, tx_buffer->sglist_dma,
- OCTEP_SGLIST_SIZE_PER_PKT,
- DMA_TO_DEVICE);
memset(sglist, 0, OCTEP_SGLIST_SIZE_PER_PKT);
sglist[0].len[3] = len;
sglist[0].dma_ptr[0] = dma;
@@ -894,10 +891,6 @@ static netdev_tx_t octep_start_xmit(struct sk_buff *skb,
frag++;
si++;
}
- dma_sync_single_for_device(iq->dev, tx_buffer->sglist_dma,
- OCTEP_SGLIST_SIZE_PER_PKT,
- DMA_TO_DEVICE);
-
hw_desc->dptr = tx_buffer->sglist_dma;
}
--
2.25.1
* [PATCH net-next v3 3/4] octeon_ep: implement xmit_more in transmit
2023-10-27 12:16 [PATCH net-next v3 0/4] Cleanup and optimizations to transmit code Shinas Rasheed
2023-10-27 12:16 ` [PATCH net-next v3 1/4] octeon_ep: add padding for small packets Shinas Rasheed
2023-10-27 12:16 ` [PATCH net-next v3 2/4] octeon_ep: remove dma sync in trasmit path Shinas Rasheed
@ 2023-10-27 12:16 ` Shinas Rasheed
2023-10-27 12:16 ` [PATCH net-next v3 4/4] octeon_ep: remove atomic variable usage in Tx data path Shinas Rasheed
2023-10-30 6:31 ` [PATCH net-next v3 0/4] Cleanup and optimizations to transmit code Jakub Kicinski
4 siblings, 0 replies; 6+ messages in thread
From: Shinas Rasheed @ 2023-10-27 12:16 UTC
To: netdev, linux-kernel
Cc: hgani, vimleshk, egallen, mschmidt, pabeni, horms, kuba, davem,
wizhao, konguyen, Shinas Rasheed, Veerasenareddy Burru,
Sathesh Edara, Eric Dumazet
Add xmit_more handling in tx datapath for octeon_ep pf.
Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
---
V3:
- Stop returning NETDEV_TX_BUSY when ring is full in xmit path.
Change to check early whether the next packet can fit in the ring,
instead of the current packet, and stop the queue if not.
- Add smp_mb() between stopping the tx queue and re-checking whether it
has free entries, in the queue full check function, so that IQ
completions processed on other cpus are reflected.
V2: https://lore.kernel.org/all/20231024145119.2366588-4-srasheed@marvell.com/
- Updated changelog to have imperative tone.
V1: https://lore.kernel.org/all/20231023114449.2362147-3-srasheed@marvell.com/
.../ethernet/marvell/octeon_ep/octep_config.h | 2 +-
.../ethernet/marvell/octeon_ep/octep_main.c | 36 ++++++++++++++-----
2 files changed, 28 insertions(+), 10 deletions(-)
diff --git a/drivers/net/ethernet/marvell/octeon_ep/octep_config.h b/drivers/net/ethernet/marvell/octeon_ep/octep_config.h
index 1622a6ebf036..ed8b1ace56b9 100644
--- a/drivers/net/ethernet/marvell/octeon_ep/octep_config.h
+++ b/drivers/net/ethernet/marvell/octeon_ep/octep_config.h
@@ -15,7 +15,7 @@
/* Tx Queue: maximum descriptors per ring */
#define OCTEP_IQ_MAX_DESCRIPTORS 1024
/* Minimum input (Tx) requests to be enqueued to ring doorbell */
-#define OCTEP_DB_MIN 1
+#define OCTEP_DB_MIN 8
/* Packet threshold for Tx queue interrupt */
#define OCTEP_IQ_INTR_THRESHOLD 0x0
diff --git a/drivers/net/ethernet/marvell/octeon_ep/octep_main.c b/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
index 1c02304677c9..2d1bcdc589f3 100644
--- a/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
+++ b/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
@@ -784,6 +784,13 @@ static inline int octep_iq_full_check(struct octep_iq *iq)
/* Stop the queue if unable to send */
netif_stop_subqueue(iq->netdev, iq->q_no);
+ /* Allow for pending updates in write index
+ * from iq_process_completion in other cpus
+ * to reflect, in case queue gets free
+ * entries.
+ */
+ smp_mb();
+
/* check again and restart the queue, in case NAPI has just freed
* enough Tx ring entries.
*/
@@ -818,6 +825,7 @@ static netdev_tx_t octep_start_xmit(struct sk_buff *skb,
struct octep_iq *iq;
skb_frag_t *frag;
u16 nr_frags, si;
+ int xmit_more;
u16 q_no, wi;
if (skb_put_padto(skb, ETH_ZLEN))
@@ -830,10 +838,6 @@ static netdev_tx_t octep_start_xmit(struct sk_buff *skb,
}
iq = oct->iq[q_no];
- if (octep_iq_full_check(iq)) {
- iq->stats.tx_busy++;
- return NETDEV_TX_BUSY;
- }
shinfo = skb_shinfo(skb);
nr_frags = shinfo->nr_frags;
@@ -894,19 +898,33 @@ static netdev_tx_t octep_start_xmit(struct sk_buff *skb,
hw_desc->dptr = tx_buffer->sglist_dma;
}
- netdev_tx_sent_queue(iq->netdev_q, skb->len);
+ xmit_more = netdev_xmit_more();
+
+ __netdev_tx_sent_queue(iq->netdev_q, skb->len, xmit_more);
+
skb_tx_timestamp(skb);
atomic_inc(&iq->instr_pending);
+ iq->fill_cnt++;
wi++;
if (wi == iq->max_count)
wi = 0;
iq->host_write_index = wi;
+
+ /* octep_iq_full_check stops the queue and returns
+ * true if so, in case the queue has become full
+ * by inserting current packet. If so, we can
+ * go ahead and ring doorbell.
+ */
+ if (!octep_iq_full_check(iq) && xmit_more &&
+ iq->fill_cnt < iq->fill_threshold)
+ return NETDEV_TX_OK;
+
/* Flush the hw descriptor before writing to doorbell */
wmb();
-
- /* Ring Doorbell to notify the NIC there is a new packet */
- writel(1, iq->doorbell_reg);
- iq->stats.instr_posted++;
+ /* Ring Doorbell to notify the NIC of new packets */
+ writel(iq->fill_cnt, iq->doorbell_reg);
+ iq->stats.instr_posted += iq->fill_cnt;
+ iq->fill_cnt = 0;
return NETDEV_TX_OK;
dma_map_sg_err:
--
2.25.1
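The doorbell batching in this patch can be sketched in userspace as follows. This is a simulation under assumptions, not driver code: struct txq_sim and txq_sim_xmit() are hypothetical, the MMIO doorbell write is replaced by a counter, and the xmit_more and ring-full conditions are passed in as plain flags.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-queue state touched by the patch. */
struct txq_sim {
	uint32_t fill_cnt;        /* descriptors queued since the last doorbell */
	uint32_t fill_threshold;  /* ring the doorbell once this many are queued */
	uint32_t doorbell_writes; /* number of simulated doorbell writes */
	uint32_t instr_posted;    /* total descriptors announced to the device */
};

/* Queue one descriptor. Returns true if the doorbell was rung. */
static bool txq_sim_xmit(struct txq_sim *q, bool xmit_more, bool ring_full)
{
	assert(q->fill_threshold > 0);
	q->fill_cnt++;

	/* Defer the doorbell only while the ring still has room, the stack
	 * promises more packets, and the batch is below the threshold.
	 */
	if (!ring_full && xmit_more && q->fill_cnt < q->fill_threshold)
		return false;

	/* One doorbell write announces the whole batch. */
	q->doorbell_writes++;
	q->instr_posted += q->fill_cnt;
	q->fill_cnt = 0;
	return true;
}
```

For example, with a threshold of 8, two packets sent with xmit_more set followed by one without it produce a single doorbell write covering three descriptors. Whether fill_threshold is initialized from OCTEP_DB_MIN is not shown in this diff and is assumed here.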
* [PATCH net-next v3 4/4] octeon_ep: remove atomic variable usage in Tx data path
2023-10-27 12:16 [PATCH net-next v3 0/4] Cleanup and optimizations to transmit code Shinas Rasheed
` (2 preceding siblings ...)
2023-10-27 12:16 ` [PATCH net-next v3 3/4] octeon_ep: implement xmit_more in transmit Shinas Rasheed
@ 2023-10-27 12:16 ` Shinas Rasheed
2023-10-30 6:31 ` [PATCH net-next v3 0/4] Cleanup and optimizations to transmit code Jakub Kicinski
4 siblings, 0 replies; 6+ messages in thread
From: Shinas Rasheed @ 2023-10-27 12:16 UTC
To: netdev, linux-kernel
Cc: hgani, vimleshk, egallen, mschmidt, pabeni, horms, kuba, davem,
wizhao, konguyen, Shinas Rasheed, Veerasenareddy Burru,
Sathesh Edara, Eric Dumazet
Replace the atomic variable "instr_pending", which represents the number
of posted tx instructions pending completion, with the host_write_index
and flush_index variables, updated in xmit and completion processing
respectively.
Signed-off-by: Shinas Rasheed <srasheed@marvell.com>
---
V3:
- No changes.
V2: https://lore.kernel.org/all/20231024145119.2366588-5-srasheed@marvell.com/
- No changes.
V1: https://lore.kernel.org/all/20231023114449.2362147-4-srasheed@marvell.com/
drivers/net/ethernet/marvell/octeon_ep/octep_config.h | 1 +
drivers/net/ethernet/marvell/octeon_ep/octep_main.c | 9 +++------
drivers/net/ethernet/marvell/octeon_ep/octep_main.h | 9 +++++++++
drivers/net/ethernet/marvell/octeon_ep/octep_tx.c | 5 +----
drivers/net/ethernet/marvell/octeon_ep/octep_tx.h | 3 ---
5 files changed, 14 insertions(+), 13 deletions(-)
diff --git a/drivers/net/ethernet/marvell/octeon_ep/octep_config.h b/drivers/net/ethernet/marvell/octeon_ep/octep_config.h
index ed8b1ace56b9..91cfa19c65b9 100644
--- a/drivers/net/ethernet/marvell/octeon_ep/octep_config.h
+++ b/drivers/net/ethernet/marvell/octeon_ep/octep_config.h
@@ -13,6 +13,7 @@
#define OCTEP_64BYTE_INSTR 64
/* Tx Queue: maximum descriptors per ring */
+/* This needs to be a power of 2 */
#define OCTEP_IQ_MAX_DESCRIPTORS 1024
/* Minimum input (Tx) requests to be enqueued to ring doorbell */
#define OCTEP_DB_MIN 8
diff --git a/drivers/net/ethernet/marvell/octeon_ep/octep_main.c b/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
index 2d1bcdc589f3..974a34be9ffa 100644
--- a/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
+++ b/drivers/net/ethernet/marvell/octeon_ep/octep_main.c
@@ -777,7 +777,7 @@ static int octep_stop(struct net_device *netdev)
*/
static inline int octep_iq_full_check(struct octep_iq *iq)
{
- if (likely((iq->max_count - atomic_read(&iq->instr_pending)) >=
+ if (likely((IQ_INSTR_SPACE(iq)) >
OCTEP_WAKE_QUEUE_THRESHOLD))
return 0;
@@ -794,7 +794,7 @@ static inline int octep_iq_full_check(struct octep_iq *iq)
/* check again and restart the queue, in case NAPI has just freed
* enough Tx ring entries.
*/
- if (unlikely((iq->max_count - atomic_read(&iq->instr_pending)) >=
+ if (unlikely(IQ_INSTR_SPACE(iq) >
OCTEP_WAKE_QUEUE_THRESHOLD)) {
netif_start_subqueue(iq->netdev, iq->q_no);
iq->stats.restart_cnt++;
@@ -903,12 +903,9 @@ static netdev_tx_t octep_start_xmit(struct sk_buff *skb,
__netdev_tx_sent_queue(iq->netdev_q, skb->len, xmit_more);
skb_tx_timestamp(skb);
- atomic_inc(&iq->instr_pending);
iq->fill_cnt++;
wi++;
- if (wi == iq->max_count)
- wi = 0;
- iq->host_write_index = wi;
+ iq->host_write_index = wi & iq->ring_size_mask;
/* octep_iq_full_check stops the queue and returns
* true if so, in case the queue has become full
diff --git a/drivers/net/ethernet/marvell/octeon_ep/octep_main.h b/drivers/net/ethernet/marvell/octeon_ep/octep_main.h
index 6df902ebb7f3..c33e046b69a4 100644
--- a/drivers/net/ethernet/marvell/octeon_ep/octep_main.h
+++ b/drivers/net/ethernet/marvell/octeon_ep/octep_main.h
@@ -40,6 +40,15 @@
#define OCTEP_OQ_INTR_RESEND_BIT 59
#define OCTEP_MMIO_REGIONS 3
+
+#define IQ_INSTR_PENDING(iq) ({ typeof(iq) iq__ = (iq); \
+ ((iq__)->host_write_index - (iq__)->flush_index) & \
+ (iq__)->ring_size_mask; \
+ })
+#define IQ_INSTR_SPACE(iq) ({ typeof(iq) iq_ = (iq); \
+ (iq_)->max_count - IQ_INSTR_PENDING(iq_); \
+ })
+
/* PCI address space mapping information.
* Each of the 3 address spaces given by BAR0, BAR2 and BAR4 of
* Octeon gets mapped to different physical address spaces in
diff --git a/drivers/net/ethernet/marvell/octeon_ep/octep_tx.c b/drivers/net/ethernet/marvell/octeon_ep/octep_tx.c
index d0adb82d65c3..06851b78aa28 100644
--- a/drivers/net/ethernet/marvell/octeon_ep/octep_tx.c
+++ b/drivers/net/ethernet/marvell/octeon_ep/octep_tx.c
@@ -21,7 +21,6 @@ static void octep_iq_reset_indices(struct octep_iq *iq)
iq->flush_index = 0;
iq->pkts_processed = 0;
iq->pkt_in_done = 0;
- atomic_set(&iq->instr_pending, 0);
}
/**
@@ -82,7 +81,6 @@ int octep_iq_process_completions(struct octep_iq *iq, u16 budget)
}
iq->pkts_processed += compl_pkts;
- atomic_sub(compl_pkts, &iq->instr_pending);
iq->stats.instr_completed += compl_pkts;
iq->stats.bytes_sent += compl_bytes;
iq->stats.sgentry_sent += compl_sg;
@@ -91,7 +89,7 @@ int octep_iq_process_completions(struct octep_iq *iq, u16 budget)
netdev_tx_completed_queue(iq->netdev_q, compl_pkts, compl_bytes);
if (unlikely(__netif_subqueue_stopped(iq->netdev, iq->q_no)) &&
- ((iq->max_count - atomic_read(&iq->instr_pending)) >
+ (IQ_INSTR_SPACE(iq) >
OCTEP_WAKE_QUEUE_THRESHOLD))
netif_wake_subqueue(iq->netdev, iq->q_no);
return !budget;
@@ -144,7 +142,6 @@ static void octep_iq_free_pending(struct octep_iq *iq)
dev_kfree_skb_any(skb);
}
- atomic_set(&iq->instr_pending, 0);
iq->flush_index = fi;
netdev_tx_reset_queue(netdev_get_tx_queue(iq->netdev, iq->q_no));
}
diff --git a/drivers/net/ethernet/marvell/octeon_ep/octep_tx.h b/drivers/net/ethernet/marvell/octeon_ep/octep_tx.h
index 86c98b13fc44..1ba4ff65e54d 100644
--- a/drivers/net/ethernet/marvell/octeon_ep/octep_tx.h
+++ b/drivers/net/ethernet/marvell/octeon_ep/octep_tx.h
@@ -172,9 +172,6 @@ struct octep_iq {
/* Statistics for this input queue. */
struct octep_iq_stats stats;
- /* This field keeps track of the instructions pending in this queue. */
- atomic_t instr_pending;
-
/* Pointer to the Virtual Base addr of the input ring. */
struct octep_tx_desc_hw *desc_ring;
--
2.25.1
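The index arithmetic behind IQ_INSTR_PENDING and IQ_INSTR_SPACE can be sketched in userspace as below. The field names follow the patch, but struct ring_sim, the helper functions, and the integer types are assumptions made for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical ring state; field names follow the patch, types are assumed. */
struct ring_sim {
	uint32_t max_count;        /* number of descriptors, a power of 2 */
	uint32_t ring_size_mask;   /* max_count - 1 */
	uint16_t host_write_index; /* advanced by the transmit path */
	uint16_t flush_index;      /* advanced by completion processing */
};

static void ring_sim_init(struct ring_sim *r, uint32_t max_count)
{
	/* Masking only works when the ring size is a power of 2. */
	assert(max_count != 0 && (max_count & (max_count - 1)) == 0);
	r->max_count = max_count;
	r->ring_size_mask = max_count - 1;
	r->host_write_index = 0;
	r->flush_index = 0;
}

/* Entries posted but not yet completed, modulo the ring size. */
static uint32_t ring_sim_pending(const struct ring_sim *r)
{
	return ((uint32_t)r->host_write_index - r->flush_index) &
	       r->ring_size_mask;
}

/* Free entries left in the ring. */
static uint32_t ring_sim_space(const struct ring_sim *r)
{
	return r->max_count - ring_sim_pending(r);
}

/* Advance the write index by one entry, wrapping with the mask. */
static void ring_sim_post(struct ring_sim *r)
{
	r->host_write_index = (r->host_write_index + 1) & r->ring_size_mask;
}
```

Because the difference is taken modulo the ring size, a completely full ring would read as zero pending entries. In this sketch the caller is expected to stop posting before that point, which appears to be the purpose of comparing the free space against OCTEP_WAKE_QUEUE_THRESHOLD in the patch.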
* Re: [PATCH net-next v3 0/4] Cleanup and optimizations to transmit code
2023-10-27 12:16 [PATCH net-next v3 0/4] Cleanup and optimizations to transmit code Shinas Rasheed
` (3 preceding siblings ...)
2023-10-27 12:16 ` [PATCH net-next v3 4/4] octeon_ep: remove atomic variable usage in Tx data path Shinas Rasheed
@ 2023-10-30 6:31 ` Jakub Kicinski
4 siblings, 0 replies; 6+ messages in thread
From: Jakub Kicinski @ 2023-10-30 6:31 UTC
To: Shinas Rasheed
Cc: netdev, linux-kernel, hgani, vimleshk, egallen, mschmidt, pabeni,
horms, davem, wizhao, konguyen
On Fri, 27 Oct 2023 05:16:35 -0700 Shinas Rasheed wrote:
> Pad small packets to ETH_ZLEN before transmit, clean up dma sync calls,
> add xmit_more functionality, and remove atomic variable usage in the
> Tx data path.
## Form letter - net-next-closed
The merge window for v6.7 has begun and we have already posted our pull
request. Therefore net-next is closed for new drivers, features, code
refactoring and optimizations. We are currently accepting bug fixes only.
Please repost when net-next reopens after Nov 12th.
RFC patches sent for review only are obviously welcome at any time.
See: https://www.kernel.org/doc/html/next/process/maintainer-netdev.html#development-cycle
--
pw-bot: defer