public inbox for dev@dpdk.org
* [RFC 1/2] config: add optimal burst size configuration
@ 2025-11-26  8:24 pbhagavatula
  2025-11-26  8:24 ` [RFC 2/2] examples: use optimal burst size pbhagavatula
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: pbhagavatula @ 2025-11-26  8:24 UTC
  To: mb, jerinj, Wathsala Vithanage, Bruce Richardson; +Cc: dev, Pavan Nikhilesh

From: Pavan Nikhilesh <pbhagavatula@marvell.com>

Add RTE_OPTIMAL_BURST_SIZE to allow platforms to configure the
optimal burst size.

Set the value to 64 for soc_cn10k and to 32 for all other platforms.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
---
On CN10K this improves l2fwd performance by 5%; other examples showed
a negligible difference.

 config/arm/meson.build | 1 +
 config/meson.build     | 1 +
 2 files changed, 2 insertions(+)

diff --git a/config/arm/meson.build b/config/arm/meson.build
index 523b0fc0ed50..fa64c07016b1 100644
--- a/config/arm/meson.build
+++ b/config/arm/meson.build
@@ -481,6 +481,7 @@ soc_cn10k = {
         ['RTE_MAX_LCORE', 24],
         ['RTE_MAX_NUMA_NODES', 1],
         ['RTE_MEMPOOL_ALIGN', 128],
+        ['RTE_OPTIMAL_BURST_SIZE', 64],
     ],
     'part_number': '0xd49',
     'extra_march_features': ['crypto'],
diff --git a/config/meson.build b/config/meson.build
index 0cb074ab95b7..95367ae88e2d 100644
--- a/config/meson.build
+++ b/config/meson.build
@@ -386,6 +386,7 @@ if get_option('mbuf_refcnt_atomic')
     dpdk_conf.set('RTE_MBUF_REFCNT_ATOMIC', true)
 endif
 dpdk_conf.set10('RTE_IOVA_IN_MBUF', get_option('enable_iova_as_pa'))
+dpdk_conf.set('RTE_OPTIMAL_BURST_SIZE', 32)

 compile_time_cpuflags = []
 subdir(arch_subdir)
--
2.50.1 (Apple Git-155)
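To illustrate how an application might consume the new build-time value, here is a minimal C sketch. It assumes that, with this RFC applied, RTE_OPTIMAL_BURST_SIZE reaches C code through the generated build configuration (as other dpdk_conf.set() values do), and it adds a hypothetical fallback of 32, the previous common default, so the sketch also builds where the macro is absent. APP_PKT_BURST, struct app_rx_buf, app_pkt_burst() and the 512 cap are illustrative names and values, not part of the patch; void * stands in for struct rte_mbuf * to keep the sketch self-contained.

```c
#include <assert.h>

/* Assumption: RTE_OPTIMAL_BURST_SIZE comes from the generated build
 * configuration when this RFC is applied. The fallback is hypothetical
 * and only keeps the sketch building without it. */
#ifndef RTE_OPTIMAL_BURST_SIZE
#define RTE_OPTIMAL_BURST_SIZE 32
#endif

/* Application burst size follows the platform-selected value. */
#define APP_PKT_BURST RTE_OPTIMAL_BURST_SIZE

/* Reject platform values this (hypothetical) application cannot handle;
 * 512 is an illustrative cap on its statically sized arrays. */
static_assert(APP_PKT_BURST >= 1 && APP_PKT_BURST <= 512,
	      "RTE_OPTIMAL_BURST_SIZE outside supported range");

/* Per-lcore receive buffer sized at compile time from the burst size
 * (void * stands in for struct rte_mbuf *). */
struct app_rx_buf {
	void *pkts[APP_PKT_BURST];
	unsigned int count;
};

/* Burst size the application would pass to its rx/tx burst calls. */
static inline unsigned int app_pkt_burst(void)
{
	return APP_PKT_BURST;
}
```

Because the value is a compile-time constant, arrays such as pkts[] stay statically sized; only the platform build configuration decides how large they are.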


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC 2/2] examples: use optimal burst size
  2025-11-26  8:24 [RFC 1/2] config: add optimal burst size configuration pbhagavatula
@ 2025-11-26  8:24 ` pbhagavatula
  2025-11-26  9:57 ` [RFC 1/2] config: add optimal burst size configuration Morten Brørup
  2026-02-20 23:07 ` [PATCH 1/2] config: add mbuf " pbhagavatula
  2 siblings, 0 replies; 12+ messages in thread
From: pbhagavatula @ 2025-11-26  8:24 UTC
  To: mb, jerinj, Wisam Jaddo, Aman Singh, Chas Williams,
	Min Hu (Connor), Akhil Goyal, Anoob Joseph, Nicolas Chautru,
	David Hunt, Chengwen Feng, Kevin Laatz, Bruce Richardson,
	Konstantin Ananyev, Radu Nicolau, Tomasz Kantecki, Fan Zhang,
	Sunil Kumar Kori, Pavan Nikhilesh, Anatoly Burakov,
	Sivaprasad Tummala, Jingjing Wu, Volodymyr Fialko,
	Cristian Dumitrescu, John McNamara, Maxime Coquelin, Chenbo Xia
  Cc: dev

From: Pavan Nikhilesh <pbhagavatula@marvell.com>

Replace hardcoded burst sizes with RTE_OPTIMAL_BURST_SIZE to
adapt to platform capabilities.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
---
 app/test-eventdev/evt_options.c            |  2 +-
 app/test-flow-perf/main.c                  |  2 +-
 app/test-pmd/testpmd.h                     |  2 +-
 app/test/test_link_bonding.c               |  2 +-
 app/test/test_link_bonding_mode4.c         |  4 +--
 app/test/test_pmd_perf.c                   |  2 +-
 app/test/test_security_inline_proto.c      |  2 +-
 examples/bbdev_app/main.c                  |  2 +-
 examples/bond/main.c                       |  2 +-
 examples/distributor/main.c                |  4 +--
 examples/dma/dmafwd.c                      |  2 +-
 examples/ethtool/ethtool-app/main.c        |  2 +-
 examples/ip_fragmentation/main.c           |  2 +-
 examples/ip_reassembly/main.c              |  3 +--
 examples/ipsec-secgw/ipsec-secgw.h         |  4 +--
 examples/ipv4_multicast/main.c             |  2 +-
 examples/l2fwd-cat/l2fwd-cat.c             |  2 +-
 examples/l2fwd-crypto/main.c               |  2 +-
 examples/l2fwd-event/l2fwd_common.h        |  2 +-
 examples/l2fwd-jobstats/main.c             |  2 +-
 examples/l2fwd-keepalive/main.c            |  2 +-
 examples/l2fwd-macsec/main.c               |  2 +-
 examples/l2fwd/main.c                      |  2 +-
 examples/l3fwd-power/main.c                |  2 +-
 examples/l3fwd/l3fwd.h                     |  4 +--
 examples/l3fwd/main.c                      | 29 +++++++++++++++-------
 examples/link_status_interrupt/main.c      |  4 +--
 examples/multi_process/symmetric_mp/main.c |  2 +-
 examples/ntb/ntb_fwd.c                     |  4 +--
 examples/packet_ordering/main.c            |  2 +-
 examples/qos_meter/main.c                  |  4 +--
 examples/qos_sched/main.h                  |  4 +--
 examples/rxtx_callbacks/main.c             |  2 +-
 examples/skeleton/basicfwd.c               |  2 +-
 examples/vhost/main.h                      |  2 +-
 examples/vhost_crypto/main.c               |  2 +-
 examples/vm_power_manager/main.c           |  2 +-
 examples/vmdq/main.c                       |  2 +-
 examples/vmdq_dcb/main.c                   |  2 +-
 39 files changed, 66 insertions(+), 56 deletions(-)

diff --git a/app/test-eventdev/evt_options.c b/app/test-eventdev/evt_options.c
index 0e70c971eb2e..55e2b07157d7 100644
--- a/app/test-eventdev/evt_options.c
+++ b/app/test-eventdev/evt_options.c
@@ -37,7 +37,7 @@ evt_options_default(struct evt_options *opt)
 	opt->expiry_nsec = 1E4;   /* 10000ns ~10us */
 	opt->prod_type = EVT_PROD_TYPE_SYNT;
 	opt->eth_queues = 1;
-	opt->vector_size = 64;
+	opt->vector_size = RTE_OPTIMAL_BURST_SIZE;
 	opt->vector_tmo_nsec = 100E3;
 	opt->crypto_op_type = RTE_CRYPTO_OP_TYPE_SYMMETRIC;
 	opt->crypto_cipher_alg = RTE_CRYPTO_CIPHER_NULL;
diff --git a/app/test-flow-perf/main.c b/app/test-flow-perf/main.c
index a8876acf1f90..13c5d6bf02eb 100644
--- a/app/test-flow-perf/main.c
+++ b/app/test-flow-perf/main.c
@@ -100,7 +100,7 @@ static uint8_t max_priority;
 static uint32_t rand_seed;
 static uint64_t meter_profile_values[3]; /* CIR CBS EBS values. */
 
-#define MAX_PKT_BURST    32
+#define MAX_PKT_BURST	  RTE_OPTIMAL_BURST_SIZE
 #define LCORE_MODE_PKT    1
 #define LCORE_MODE_STATS  2
 #define MAX_STREAMS      64
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index 492b5757f113..229146a8d677 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -78,7 +78,7 @@ struct cmdline_file_info {
 #define TX_DESC_MAX    2048
 
 #define MAX_PKT_BURST 512
-#define DEF_PKT_BURST 32
+#define DEF_PKT_BURST RTE_OPTIMAL_BURST_SIZE
 
 #define DEF_MBUF_CACHE 250
 
diff --git a/app/test/test_link_bonding.c b/app/test/test_link_bonding.c
index 19b064771aef..dd1b19104732 100644
--- a/app/test/test_link_bonding.c
+++ b/app/test/test_link_bonding.c
@@ -52,7 +52,7 @@
 #define RX_DESC_MAX	(2048)
 #define TX_DESC_MAX	(2048)
 #define MAX_PKT_BURST			(512)
-#define DEF_PKT_BURST			(16)
+#define DEF_PKT_BURST			(RTE_OPTIMAL_BURST_SIZE)
 
 #define BONDING_DEV_NAME			("net_bonding_ut")
 
diff --git a/app/test/test_link_bonding_mode4.c b/app/test/test_link_bonding_mode4.c
index ff13dbed93f3..ec336f06848c 100644
--- a/app/test/test_link_bonding_mode4.c
+++ b/app/test/test_link_bonding_mode4.c
@@ -41,8 +41,8 @@
 
 #define TEST_RX_DESC_MAX        (2048)
 #define TEST_TX_DESC_MAX        (2048)
-#define MAX_PKT_BURST           (32)
-#define DEF_PKT_BURST           (16)
+#define MAX_PKT_BURST		(RTE_OPTIMAL_BURST_SIZE)
+#define DEF_PKT_BURST		(RTE_OPTIMAL_BURST_SIZE)
 
 #define BONDING_DEV_NAME         ("net_bonding_m4_bond_dev")
 
diff --git a/app/test/test_pmd_perf.c b/app/test/test_pmd_perf.c
index 995b0a6f20c4..be4ebdf4c3ad 100644
--- a/app/test/test_pmd_perf.c
+++ b/app/test/test_pmd_perf.c
@@ -17,7 +17,7 @@
 #define NB_ETHPORTS_USED                (1)
 #define NB_SOCKETS                      (2)
 #define MEMPOOL_CACHE_SIZE 250
-#define MAX_PKT_BURST                   (32)
+#define MAX_PKT_BURST			(RTE_OPTIMAL_BURST_SIZE)
 #define RX_DESC_DEFAULT        (1024)
 #define TX_DESC_DEFAULT        (1024)
 #define RTE_PORT_ALL            (~(uint16_t)0x0)
diff --git a/app/test/test_security_inline_proto.c b/app/test/test_security_inline_proto.c
index 04ecfd02c6a1..40b579107008 100644
--- a/app/test/test_security_inline_proto.c
+++ b/app/test/test_security_inline_proto.c
@@ -44,7 +44,7 @@ test_inline_ipsec_sg(void)
 
 #define NB_ETHPORTS_USED		1
 #define MEMPOOL_CACHE_SIZE		32
-#define MAX_PKT_BURST			32
+#define MAX_PKT_BURST			RTE_OPTIMAL_BURST_SIZE
 #define RX_DESC_DEFAULT	1024
 #define TX_DESC_DEFAULT	1024
 #define RTE_PORT_ALL		(~(uint16_t)0x0)
diff --git a/examples/bbdev_app/main.c b/examples/bbdev_app/main.c
index 03f15f91cc6b..453bab9758f6 100644
--- a/examples/bbdev_app/main.c
+++ b/examples/bbdev_app/main.c
@@ -39,7 +39,7 @@
 #define LLR_1_BIT 0x81
 #define LLR_0_BIT 0x7F
 
-#define MAX_PKT_BURST 32
+#define MAX_PKT_BURST	   RTE_OPTIMAL_BURST_SIZE
 #define NB_MBUF 8191
 #define MEMPOOL_CACHE_SIZE 256
 
diff --git a/examples/bond/main.c b/examples/bond/main.c
index 9f38b63cbbad..36e7e0f4b54d 100644
--- a/examples/bond/main.c
+++ b/examples/bond/main.c
@@ -52,7 +52,7 @@
 
 #define NB_MBUF   (1024*8)
 
-#define MAX_PKT_BURST 32
+#define MAX_PKT_BURST	     RTE_OPTIMAL_BURST_SIZE
 #define BURST_TX_DRAIN_US 100      /* TX drain every ~100us */
 #define BURST_RX_INTERVAL_NS (10) /* RX poll interval ~100ns */
 
diff --git a/examples/distributor/main.c b/examples/distributor/main.c
index ea44939fba04..977b80e03697 100644
--- a/examples/distributor/main.c
+++ b/examples/distributor/main.c
@@ -23,10 +23,10 @@
 #define TX_RING_SIZE 1024
 #define NUM_MBUFS ((64*1024)-1)
 #define MBUF_CACHE_SIZE 128
-#define BURST_SIZE 64
+#define BURST_SIZE	 RTE_OPTIMAL_BURST_SIZE
 #define SCHED_RX_RING_SZ 8192
 #define SCHED_TX_RING_SZ 65536
-#define BURST_SIZE_TX 32
+#define BURST_SIZE_TX	 RTE_OPTIMAL_BURST_SIZE
 
 #define RTE_LOGTYPE_DISTRAPP RTE_LOGTYPE_USER1
 
diff --git a/examples/dma/dmafwd.c b/examples/dma/dmafwd.c
index 5ba0aaa40b21..3f9d934cd1b4 100644
--- a/examples/dma/dmafwd.c
+++ b/examples/dma/dmafwd.c
@@ -15,7 +15,7 @@
 
 /* size of ring used for software copying between rx and tx. */
 #define RTE_LOGTYPE_DMA RTE_LOGTYPE_USER1
-#define MAX_PKT_BURST 32
+#define MAX_PKT_BURST		     RTE_OPTIMAL_BURST_SIZE
 #define MEMPOOL_CACHE_SIZE 512
 #define MIN_POOL_SIZE 65536U
 #define CMD_LINE_OPT_PORTMASK_INDEX 1
diff --git a/examples/ethtool/ethtool-app/main.c b/examples/ethtool/ethtool-app/main.c
index 1f011a932166..183cbd714020 100644
--- a/examples/ethtool/ethtool-app/main.c
+++ b/examples/ethtool/ethtool-app/main.c
@@ -19,7 +19,7 @@
 #include "ethapp.h"
 
 #define MAX_PORTS RTE_MAX_ETHPORTS
-#define MAX_BURST_LENGTH 32
+#define MAX_BURST_LENGTH   RTE_OPTIMAL_BURST_SIZE
 #define PORT_RX_QUEUE_SIZE 1024
 #define PORT_TX_QUEUE_SIZE 1024
 #define PKTPOOL_EXTRA_SIZE 512
diff --git a/examples/ip_fragmentation/main.c b/examples/ip_fragmentation/main.c
index 1f841028442f..57bb0f52cb90 100644
--- a/examples/ip_fragmentation/main.c
+++ b/examples/ip_fragmentation/main.c
@@ -75,7 +75,7 @@
 
 #define NB_MBUF   8192
 
-#define MAX_PKT_BURST	32
+#define MAX_PKT_BURST	  RTE_OPTIMAL_BURST_SIZE
 #define BURST_TX_DRAIN_US 100 /* TX drain every ~100us */
 
 /* Configure how many packets ahead to prefetch, when reading packets */
diff --git a/examples/ip_reassembly/main.c b/examples/ip_reassembly/main.c
index 25b904dbd44d..bf4d30d0ef20 100644
--- a/examples/ip_reassembly/main.c
+++ b/examples/ip_reassembly/main.c
@@ -44,8 +44,7 @@
 
 #include <rte_ip_frag.h>
 
-#define MAX_PKT_BURST 32
-
+#define MAX_PKT_BURST RTE_OPTIMAL_BURST_SIZE
 
 #define RTE_LOGTYPE_IP_RSMBL RTE_LOGTYPE_USER1
 
diff --git a/examples/ipsec-secgw/ipsec-secgw.h b/examples/ipsec-secgw/ipsec-secgw.h
index b4ef4b6d04bc..939159dd32b0 100644
--- a/examples/ipsec-secgw/ipsec-secgw.h
+++ b/examples/ipsec-secgw/ipsec-secgw.h
@@ -11,8 +11,8 @@
 
 #define NB_SOCKETS 4
 
-#define MAX_PKT_BURST 32
-#define MAX_PKT_BURST_VEC 256
+#define MAX_PKT_BURST	  RTE_OPTIMAL_BURST_SIZE
+#define MAX_PKT_BURST_VEC RTE_OPTIMAL_BURST_SIZE
 
 #define MAX_PKTS                                  \
 	((MAX_PKT_BURST_VEC > MAX_PKT_BURST ?     \
diff --git a/examples/ipv4_multicast/main.c b/examples/ipv4_multicast/main.c
index 1eed645d02e0..742151749bbe 100644
--- a/examples/ipv4_multicast/main.c
+++ b/examples/ipv4_multicast/main.c
@@ -54,7 +54,7 @@
 /* allow max jumbo frame 9.5 KB */
 #define	JUMBO_FRAME_MAX_SIZE	0x2600
 
-#define MAX_PKT_BURST 32
+#define MAX_PKT_BURST	  RTE_OPTIMAL_BURST_SIZE
 #define BURST_TX_DRAIN_US 100 /* TX drain every ~100us */
 
 /* Configure how many packets ahead to prefetch, when reading packets */
diff --git a/examples/l2fwd-cat/l2fwd-cat.c b/examples/l2fwd-cat/l2fwd-cat.c
index 6e16705e9931..bac18496a7fb 100644
--- a/examples/l2fwd-cat/l2fwd-cat.c
+++ b/examples/l2fwd-cat/l2fwd-cat.c
@@ -17,7 +17,7 @@
 
 #define NUM_MBUFS 8191
 #define MBUF_CACHE_SIZE 250
-#define BURST_SIZE 32
+#define BURST_SIZE	RTE_OPTIMAL_BURST_SIZE
 
 /* l2fwd-cat.c: CAT enabled, basic DPDK skeleton forwarding example. */
 
diff --git a/examples/l2fwd-crypto/main.c b/examples/l2fwd-crypto/main.c
index a441312f5524..bfe0b662a5ed 100644
--- a/examples/l2fwd-crypto/main.c
+++ b/examples/l2fwd-crypto/main.c
@@ -61,7 +61,7 @@ enum cdev_type {
 #define MAX_KEY_SIZE 128
 #define MAX_IV_SIZE 16
 #define MAX_AAD_SIZE 65535
-#define MAX_PKT_BURST 32
+#define MAX_PKT_BURST		RTE_OPTIMAL_BURST_SIZE
 #define BURST_TX_DRAIN_US 100 /* TX drain every ~100us */
 #define SESSION_POOL_CACHE_SIZE 0
 
diff --git a/examples/l2fwd-event/l2fwd_common.h b/examples/l2fwd-event/l2fwd_common.h
index 8cf91b919cd4..2f271ac06972 100644
--- a/examples/l2fwd-event/l2fwd_common.h
+++ b/examples/l2fwd-event/l2fwd_common.h
@@ -42,7 +42,7 @@
 #include <rte_mbuf.h>
 #include <rte_spinlock.h>
 
-#define MAX_PKT_BURST 32
+#define MAX_PKT_BURST	       RTE_OPTIMAL_BURST_SIZE
 #define MAX_RX_QUEUE_PER_LCORE 16
 #define MAX_TX_QUEUE_PER_PORT 16
 
diff --git a/examples/l2fwd-jobstats/main.c b/examples/l2fwd-jobstats/main.c
index 308b8edd2023..7018d2b7e185 100644
--- a/examples/l2fwd-jobstats/main.c
+++ b/examples/l2fwd-jobstats/main.c
@@ -39,7 +39,7 @@
 
 #define NB_MBUF   8192
 
-#define MAX_PKT_BURST 32
+#define MAX_PKT_BURST	  RTE_OPTIMAL_BURST_SIZE
 #define BURST_TX_DRAIN_US 100 /* TX drain every ~100us */
 
 /*
diff --git a/examples/l2fwd-keepalive/main.c b/examples/l2fwd-keepalive/main.c
index bff2b99531aa..29ce52580678 100644
--- a/examples/l2fwd-keepalive/main.c
+++ b/examples/l2fwd-keepalive/main.c
@@ -43,7 +43,7 @@
 
 #define NB_MBUF_PER_PORT 3000
 
-#define MAX_PKT_BURST 32
+#define MAX_PKT_BURST	  RTE_OPTIMAL_BURST_SIZE
 #define BURST_TX_DRAIN_US 100 /* TX drain every ~100us */
 
 /*
diff --git a/examples/l2fwd-macsec/main.c b/examples/l2fwd-macsec/main.c
index 73e32fc197b6..dcfa97896a7e 100644
--- a/examples/l2fwd-macsec/main.c
+++ b/examples/l2fwd-macsec/main.c
@@ -49,7 +49,7 @@ static int promiscuous_on = 1;
 
 #define RTE_LOGTYPE_L2FWD RTE_LOGTYPE_USER1
 
-#define MAX_PKT_BURST 32
+#define MAX_PKT_BURST		RTE_OPTIMAL_BURST_SIZE
 #define BURST_TX_DRAIN_US 100 /* TX drain every ~100us */
 #define MEMPOOL_CACHE_SIZE 256
 #define SESSION_POOL_CACHE_SIZE 0
diff --git a/examples/l2fwd/main.c b/examples/l2fwd/main.c
index c6fafdd01935..c98350f58fba 100644
--- a/examples/l2fwd/main.c
+++ b/examples/l2fwd/main.c
@@ -48,7 +48,7 @@ static int promiscuous_on;
 
 #define RTE_LOGTYPE_L2FWD RTE_LOGTYPE_USER1
 
-#define MAX_PKT_BURST 32
+#define MAX_PKT_BURST	   RTE_OPTIMAL_BURST_SIZE
 #define BURST_TX_DRAIN_US 100 /* TX drain every ~100us */
 #define MEMPOOL_CACHE_SIZE 256
 
diff --git a/examples/l3fwd-power/main.c b/examples/l3fwd-power/main.c
index ec12d1cc0b73..46dbefaebbb2 100644
--- a/examples/l3fwd-power/main.c
+++ b/examples/l3fwd-power/main.c
@@ -55,7 +55,7 @@
 RTE_LOG_REGISTER(l3fwd_power_logtype, l3fwd.power, INFO);
 #define RTE_LOGTYPE_L3FWD_POWER l3fwd_power_logtype
 
-#define MAX_PKT_BURST 32
+#define MAX_PKT_BURST RTE_OPTIMAL_BURST_SIZE
 
 #define MIN_ZERO_POLL_COUNT 10
 
diff --git a/examples/l3fwd/l3fwd.h b/examples/l3fwd/l3fwd.h
index 471e3b488fe6..0fa166c6d528 100644
--- a/examples/l3fwd/l3fwd.h
+++ b/examples/l3fwd/l3fwd.h
@@ -23,14 +23,14 @@
 #define RX_DESC_DEFAULT 1024
 #define TX_DESC_DEFAULT 1024
 
-#define DEFAULT_PKT_BURST 32
+#define DEFAULT_PKT_BURST RTE_OPTIMAL_BURST_SIZE
 #define MAX_PKT_BURST 512
 #define BURST_TX_DRAIN_US 100 /* TX drain every ~100us */
 
 #define MEMPOOL_CACHE_SIZE RTE_MEMPOOL_CACHE_MAX_SIZE
 #define MAX_RX_QUEUE_PER_LCORE 16
 
-#define VECTOR_SIZE_DEFAULT   MAX_PKT_BURST
+#define VECTOR_SIZE_DEFAULT   RTE_OPTIMAL_BURST_SIZE
 #define VECTOR_TMO_NS_DEFAULT 1E6 /* 1ms */
 
 #define NB_SOCKETS        8
diff --git a/examples/l3fwd/main.c b/examples/l3fwd/main.c
index a5626ff02d13..00375a74fbe7 100644
--- a/examples/l3fwd/main.c
+++ b/examples/l3fwd/main.c
@@ -1074,17 +1074,26 @@ parse_args(int argc, char **argv)
 		return -1;
 	}
 
-	if (evt_rsrc->vector_enabled && !evt_rsrc->vector_size) {
-		evt_rsrc->vector_size = VECTOR_SIZE_DEFAULT;
-		fprintf(stderr, "vector size set to default (%" PRIu16 ")\n",
-			evt_rsrc->vector_size);
+	if (evt_rsrc->vector_enabled) {
+		if (!evt_rsrc->vector_size) {
+			evt_rsrc->vector_size = VECTOR_SIZE_DEFAULT;
+			fprintf(stderr, "vector size set to default (%" PRIu16 ")\n",
+				evt_rsrc->vector_size);
+		} else {
+			fprintf(stderr, "vector size set to (%" PRIu16 ")\n",
+				evt_rsrc->vector_size);
+		}
 	}
 
-	if (evt_rsrc->vector_enabled && !evt_rsrc->vector_tmo_ns) {
-		evt_rsrc->vector_tmo_ns = VECTOR_TMO_NS_DEFAULT;
-		fprintf(stderr,
-			"vector timeout set to default (%" PRIu64 " ns)\n",
-			evt_rsrc->vector_tmo_ns);
+	if (evt_rsrc->vector_enabled) {
+		if (!evt_rsrc->vector_tmo_ns) {
+			evt_rsrc->vector_tmo_ns = VECTOR_TMO_NS_DEFAULT;
+			fprintf(stderr, "vector timeout set to default (%" PRIu64 " ns)\n",
+				evt_rsrc->vector_tmo_ns);
+		} else {
+			fprintf(stderr, "vector timeout set to (%" PRIu64 " ns)\n",
+				evt_rsrc->vector_tmo_ns);
+		}
 	}
 #endif
 
@@ -1687,7 +1696,9 @@ main(int argc, char **argv)
 	if (ret < 0)
 		rte_exit(EXIT_FAILURE, "Invalid L3FWD parameters\n");
 
+#ifndef RTE_LIB_EVENTDEV
 	RTE_LOG(INFO, L3FWD, "Using Rx burst %u Tx burst %u\n", rx_burst_size, tx_burst_size);
+#endif
 
 	/* Setup function pointers for lookup method. */
 	setup_l3fwd_lookup_tables();
diff --git a/examples/link_status_interrupt/main.c b/examples/link_status_interrupt/main.c
index ac9c7f62170e..2cf8e3f67f91 100644
--- a/examples/link_status_interrupt/main.c
+++ b/examples/link_status_interrupt/main.c
@@ -39,7 +39,7 @@
 
 #define NB_MBUF   8192
 
-#define MAX_PKT_BURST 32
+#define MAX_PKT_BURST	  RTE_OPTIMAL_BURST_SIZE
 #define BURST_TX_DRAIN_US 100 /* TX drain every ~100us */
 
 /*
@@ -61,7 +61,7 @@ static unsigned int lsi_rx_queue_per_lcore = 1;
 /* destination port for L2 forwarding */
 static unsigned lsi_dst_ports[RTE_MAX_ETHPORTS] = {0};
 
-#define MAX_PKT_BURST 32
+#define MAX_PKT_BURST RTE_OPTIMAL_BURST_SIZE
 
 #define MAX_RX_QUEUE_PER_LCORE 16
 #define MAX_TX_QUEUE_PER_PORT 16
diff --git a/examples/multi_process/symmetric_mp/main.c b/examples/multi_process/symmetric_mp/main.c
index f7d8439cd4e6..1780e93fd742 100644
--- a/examples/multi_process/symmetric_mp/main.c
+++ b/examples/multi_process/symmetric_mp/main.c
@@ -46,7 +46,7 @@
 
 #define NB_MBUFS 64*1024 /* use 64k mbufs */
 #define MBUF_CACHE_SIZE 256
-#define PKT_BURST 32
+#define PKT_BURST	RTE_OPTIMAL_BURST_SIZE
 #define RX_RING_SIZE 1024
 #define TX_RING_SIZE 1024
 
diff --git a/examples/ntb/ntb_fwd.c b/examples/ntb/ntb_fwd.c
index 33f3c1ef17e4..5e7629753089 100644
--- a/examples/ntb/ntb_fwd.c
+++ b/examples/ntb/ntb_fwd.c
@@ -83,8 +83,8 @@ static uint16_t nb_desc = NTB_DEFAULT_NUM_DESCS;
 
 static uint16_t tx_free_thresh;
 
-#define NTB_MAX_PKT_BURST 32
-#define NTB_DFLT_PKT_BURST 32
+#define NTB_MAX_PKT_BURST  RTE_OPTIMAL_BURST_SIZE
+#define NTB_DFLT_PKT_BURST RTE_OPTIMAL_BURST_SIZE
 static uint16_t pkt_burst = NTB_DFLT_PKT_BURST;
 
 #define BURST_TX_RETRIES 64
diff --git a/examples/packet_ordering/main.c b/examples/packet_ordering/main.c
index 5ffdf72d71ab..a2092930523a 100644
--- a/examples/packet_ordering/main.c
+++ b/examples/packet_ordering/main.c
@@ -21,7 +21,7 @@
 #define RX_DESC_PER_QUEUE 1024
 #define TX_DESC_PER_QUEUE 1024
 
-#define MAX_PKTS_BURST 32
+#define MAX_PKTS_BURST	     RTE_OPTIMAL_BURST_SIZE
 #define REORDER_BUFFER_SIZE 8192
 #define MBUF_PER_POOL 65535
 #define MBUF_POOL_CACHE_SIZE 250
diff --git a/examples/qos_meter/main.c b/examples/qos_meter/main.c
index da1b0b228787..cdfdfde82aec 100644
--- a/examples/qos_meter/main.c
+++ b/examples/qos_meter/main.c
@@ -76,8 +76,8 @@ static struct rte_eth_conf port_conf = {
  * Packet RX/TX
  *
  ***/
-#define RTE_MBUF_F_RX_BURST_MAX                32
-#define RTE_MBUF_F_TX_BURST_MAX                32
+#define RTE_MBUF_F_RX_BURST_MAX		RTE_OPTIMAL_BURST_SIZE
+#define RTE_MBUF_F_TX_BURST_MAX		RTE_OPTIMAL_BURST_SIZE
 #define TIME_TX_DRAIN                   200000ULL
 
 static uint16_t port_rx;
diff --git a/examples/qos_sched/main.h b/examples/qos_sched/main.h
index ea66df0434fb..58abd5b9c4e2 100644
--- a/examples/qos_sched/main.h
+++ b/examples/qos_sched/main.h
@@ -24,10 +24,10 @@ extern "C" {
 #define APP_RING_SIZE (8*1024)
 #define NB_MBUF   (2*1024*1024)
 
-#define MAX_PKT_RX_BURST 64
+#define MAX_PKT_RX_BURST RTE_OPTIMAL_BURST_SIZE
 #define PKT_ENQUEUE 64
 #define PKT_DEQUEUE 63
-#define MAX_PKT_TX_BURST 64
+#define MAX_PKT_TX_BURST RTE_OPTIMAL_BURST_SIZE
 
 #define RX_PTHRESH 8 /**< Default values of RX prefetch threshold reg. */
 #define RX_HTHRESH 8 /**< Default values of RX host threshold reg. */
diff --git a/examples/rxtx_callbacks/main.c b/examples/rxtx_callbacks/main.c
index 4682921285de..8b01248f286d 100644
--- a/examples/rxtx_callbacks/main.c
+++ b/examples/rxtx_callbacks/main.c
@@ -19,7 +19,7 @@
 
 #define NUM_MBUFS 8191
 #define MBUF_CACHE_SIZE 250
-#define BURST_SIZE 32
+#define BURST_SIZE	RTE_OPTIMAL_BURST_SIZE
 
 static int hwts_dynfield_offset = -1;
 
diff --git a/examples/skeleton/basicfwd.c b/examples/skeleton/basicfwd.c
index 133293cf15bb..28ab971f56db 100644
--- a/examples/skeleton/basicfwd.c
+++ b/examples/skeleton/basicfwd.c
@@ -16,7 +16,7 @@
 
 #define NUM_MBUFS 8191
 #define MBUF_CACHE_SIZE 250
-#define BURST_SIZE 32
+#define BURST_SIZE	RTE_OPTIMAL_BURST_SIZE
 
 /* basicfwd.c: Basic DPDK skeleton forwarding example. */
 
diff --git a/examples/vhost/main.h b/examples/vhost/main.h
index c986cbc5a994..b85c251037d0 100644
--- a/examples/vhost/main.h
+++ b/examples/vhost/main.h
@@ -17,7 +17,7 @@
 
 enum {VIRTIO_RXQ, VIRTIO_TXQ, VIRTIO_QNUM};
 
-#define MAX_PKT_BURST 32		/* Max burst size for RX/TX */
+#define MAX_PKT_BURST RTE_OPTIMAL_BURST_SIZE /* Max burst size for RX/TX */
 
 struct device_statistics {
 	uint64_t	tx;
diff --git a/examples/vhost_crypto/main.c b/examples/vhost_crypto/main.c
index 8bdfc40c4b20..d60e17bee7d0 100644
--- a/examples/vhost_crypto/main.c
+++ b/examples/vhost_crypto/main.c
@@ -23,7 +23,7 @@
 #include <cmdline.h>
 
 #define NB_VIRTIO_QUEUES		(1)
-#define MAX_PKT_BURST			(64)
+#define MAX_PKT_BURST			(RTE_OPTIMAL_BURST_SIZE)
 #define MAX_IV_LEN			(32)
 #define NB_MEMPOOL_OBJS			(8192)
 #define NB_CRYPTO_DESCRIPTORS		(4096)
diff --git a/examples/vm_power_manager/main.c b/examples/vm_power_manager/main.c
index c14138202004..839348bb75d6 100644
--- a/examples/vm_power_manager/main.c
+++ b/examples/vm_power_manager/main.c
@@ -45,7 +45,7 @@
 
 #define NUM_MBUFS 8191
 #define MBUF_CACHE_SIZE 250
-#define BURST_SIZE 32
+#define BURST_SIZE	RTE_OPTIMAL_BURST_SIZE
 
 static uint32_t enabled_port_mask;
 static volatile bool force_quit;
diff --git a/examples/vmdq/main.c b/examples/vmdq/main.c
index 4a3ce6884c5c..19af8b052adf 100644
--- a/examples/vmdq/main.c
+++ b/examples/vmdq/main.c
@@ -42,7 +42,7 @@
 						TX_DESC_DEFAULT))
 #define MBUF_CACHE_SIZE 64
 
-#define MAX_PKT_BURST 32
+#define MAX_PKT_BURST RTE_OPTIMAL_BURST_SIZE
 
 /*
  * Configurable number of RX/TX ring descriptors
diff --git a/examples/vmdq_dcb/main.c b/examples/vmdq_dcb/main.c
index 4ccc2fe4b01c..94b077ccbc75 100644
--- a/examples/vmdq_dcb/main.c
+++ b/examples/vmdq_dcb/main.c
@@ -43,7 +43,7 @@
 						TX_DESC_DEFAULT))
 #define MBUF_CACHE_SIZE 64
 
-#define MAX_PKT_BURST 32
+#define MAX_PKT_BURST RTE_OPTIMAL_BURST_SIZE
 
 /*
  * Configurable number of RX/TX ring descriptors
-- 
2.50.1 (Apple Git-155)
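Several files above pair the new platform-defined default with a fixed maximum, e.g. DEF_PKT_BURST against MAX_PKT_BURST 512 in app/test-pmd/testpmd.h and DEFAULT_PKT_BURST against MAX_PKT_BURST 512 in examples/l3fwd/l3fwd.h. A compile-time guard is one way to catch a platform value that would break such a pairing. The sketch below mirrors the testpmd.h macros; the fallback definition and the clamp_burst() helper are hypothetical and not part of the patch.

```c
#include <assert.h>

/* Hypothetical fallback so the sketch builds outside a DPDK tree. */
#ifndef RTE_OPTIMAL_BURST_SIZE
#define RTE_OPTIMAL_BURST_SIZE 32
#endif

/* Mirrors app/test-pmd/testpmd.h after this patch. */
#define MAX_PKT_BURST 512
#define DEF_PKT_BURST RTE_OPTIMAL_BURST_SIZE

/* The default is now chosen per platform, while (in this sketch) arrays
 * are sized by MAX_PKT_BURST: fail the build if a platform exceeds it. */
static_assert(DEF_PKT_BURST <= MAX_PKT_BURST,
	      "DEF_PKT_BURST must not exceed MAX_PKT_BURST");

/* Hypothetical helper: map a user-requested burst (0 = use default)
 * into the range [1, MAX_PKT_BURST]. */
static unsigned int clamp_burst(unsigned int requested)
{
	if (requested == 0)
		return DEF_PKT_BURST;
	if (requested > MAX_PKT_BURST)
		return MAX_PKT_BURST;
	return requested;
}
```

The same pattern would apply to other pairs where a fixed constant previously sat next to a hardcoded default.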



* RE: [RFC 1/2] config: add optimal burst size configuration
  2025-11-26  8:24 [RFC 1/2] config: add optimal burst size configuration pbhagavatula
  2025-11-26  8:24 ` [RFC 2/2] examples: use optimal burst size pbhagavatula
@ 2025-11-26  9:57 ` Morten Brørup
  2025-11-26 10:58   ` Pavan Nikhilesh Bhagavatula
                     ` (2 more replies)
  2026-02-20 23:07 ` [PATCH 1/2] config: add mbuf " pbhagavatula
  2 siblings, 3 replies; 12+ messages in thread
From: Morten Brørup @ 2025-11-26  9:57 UTC
  To: Pavan Nikhilesh, Jerin Jacob, Wathsala Vithanage,
	Bruce Richardson; +Cc: dev

> From: Pavan Nikhilesh <pbhagavatula@marvell.com>
> 
> Add RTE_OPTIMAL_BURST_SIZE to allow platforms to configure the
> optimal burst size.
> 
> Set default value to 64 for soc_cn10k and 32 generally.
> 
> Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
> ---
> This improves performance by 5% on l2fwd, other examples showed
> negligible difference on CN10K.
>

I support the concept of having a recommended mbuf burst size, targeting the majority of generic applications.
Making it CPU dependent seems like a good choice.

It should be named differently.
First of all, "optimal" depends on the use case; if targeting low latency, shorter bursts are better, so "OPTIMAL" should not be part of the name.
Second, I would guess that it only targets mbuf bursts, not also bursts of other operations (e.g. hash lookups), so "MBUF" should be part of the name.

Suggestion:
/* Recommended burst size for generic applications, striking a balance between throughput and latency. */
dpdk_conf.set('RTE_MBUF_BURST_SIZE_MAX' (or _DEFAULT), 64)

<feature creep>
/* Recommended burst size for generic applications targeting low latency. */
dpdk_conf.set('RTE_MBUF_BURST_SIZE_MIN', 4)
</feature creep>

Having these standardized will also allow libraries and drivers to optimize for them; e.g. drivers should support burst sizes all the way down to RTE_MBUF_BURST_SIZE_MIN, and can static_assert() that RTE_MBUF_BURST_SIZE_MIN is not lower than the minimum supported by the driver/hardware.
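A driver-side version of that check might look like the sketch below. RTE_MBUF_BURST_SIZE_MIN is only proposed in this message and is not an existing DPDK macro; XDRV_DESCS_PER_LOOP and xdrv_vec_burst() are hypothetical driver names. The rounding step reflects the common practice in vector Rx paths of processing descriptors in fixed-size groups, which is why a burst smaller than the group size can return fewer packets than requested, or none.

```c
#include <assert.h>

/* Proposed in this thread, not an existing DPDK macro; 4 is the value
 * suggested above. */
#ifndef RTE_MBUF_BURST_SIZE_MIN
#define RTE_MBUF_BURST_SIZE_MIN 4
#endif

/* Hypothetical driver constant: descriptors handled per vector loop
 * iteration (must be a power of two for the mask below). */
#define XDRV_DESCS_PER_LOOP 4

/* Build fails if the recommended minimum burst is too small for this
 * driver's vector Rx path. */
static_assert(RTE_MBUF_BURST_SIZE_MIN >= XDRV_DESCS_PER_LOOP,
	      "RTE_MBUF_BURST_SIZE_MIN below driver minimum");
static_assert((XDRV_DESCS_PER_LOOP & (XDRV_DESCS_PER_LOOP - 1)) == 0,
	      "XDRV_DESCS_PER_LOOP must be a power of two");

/* Round a requested burst down to a whole number of vector iterations. */
static unsigned int xdrv_vec_burst(unsigned int nb_pkts)
{
	return nb_pkts & ~(XDRV_DESCS_PER_LOOP - 1u);
}
```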

<more feature creep>
rte_config.h could have "#define RTE_MBUF_BURST_SIZE RTE_MBUF_BURST_SIZE_MAX", which the application developer can change to RTE_MBUF_BURST_SIZE_MIN for low-latency applications.
This will let the libraries and drivers optimize for the specific burst size used by the application.
</more feature creep>
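The rte_config.h idea could be sketched roughly as follows. All three RTE_MBUF_BURST_SIZE* macros are proposals from this message, not existing DPDK definitions; 64 and 4 are the values suggested above, and nb_bursts() is a hypothetical library-side helper showing code that can rely on a compile-time burst size.

```c
#include <assert.h>

/* Proposed build-config values (hypothetical; not in DPDK today). */
#ifndef RTE_MBUF_BURST_SIZE_MAX
#define RTE_MBUF_BURST_SIZE_MAX 64
#endif
#ifndef RTE_MBUF_BURST_SIZE_MIN
#define RTE_MBUF_BURST_SIZE_MIN 4
#endif

/* Proposed rte_config.h default; a low-latency application would change
 * this to RTE_MBUF_BURST_SIZE_MIN. */
#ifndef RTE_MBUF_BURST_SIZE
#define RTE_MBUF_BURST_SIZE RTE_MBUF_BURST_SIZE_MAX
#endif

static_assert(RTE_MBUF_BURST_SIZE >= RTE_MBUF_BURST_SIZE_MIN &&
	      RTE_MBUF_BURST_SIZE <= RTE_MBUF_BURST_SIZE_MAX,
	      "RTE_MBUF_BURST_SIZE outside [MIN, MAX]");

/* Hypothetical library helper: number of bursts needed to move n packets.
 * The divisor is a compile-time constant, so the compiler can optimize
 * the division (to a shift when it is a power of two). */
static unsigned int nb_bursts(unsigned int n)
{
	return (n + RTE_MBUF_BURST_SIZE - 1) / RTE_MBUF_BURST_SIZE;
}
```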

<rambling>
Intuitively, I would assume that the optimal burst size essentially depends on the CPU's L1D cache size and the number of cache lines the application touches per packet.
Let's say a CPU core has a 32 KiB L1D cache (= 512 cache lines of 64 bytes), and each burst touches 4 cache lines per packet:
2 cache lines for the mbuf
1 cache line for the packet data
1 cache line for some table lookup/forwarding entry

Then the mbuf burst should be at most 512/4 = 128.
But local variables and other data also occupy L1D during processing, so a burst of 64 would leave room for those and more.
</rambling>
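The estimate above can be written out as a small C helper. It only restates the arithmetic: L1D size divided by line size gives the line budget, and dividing that by the lines touched per packet gives the largest burst whose packet working set fits. Halving the result is one hypothetical way to express "leave room for local variables"; it reproduces the 128 and 64 from the example. Function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Largest burst whose per-packet cache footprint fits in L1D:
 * (l1d_bytes / line_bytes) lines, divided by lines touched per packet. */
static size_t burst_fit_l1d(size_t l1d_bytes, size_t line_bytes,
			    size_t lines_per_pkt)
{
	assert(line_bytes > 0 && lines_per_pkt > 0);
	return (l1d_bytes / line_bytes) / lines_per_pkt;
}

/* Assumption: keep roughly half of L1D for stack/locals and other data,
 * i.e. halve the burst that would exactly fit. */
static size_t burst_with_headroom(size_t l1d_bytes, size_t line_bytes,
				  size_t lines_per_pkt)
{
	return burst_fit_l1d(l1d_bytes, line_bytes, lines_per_pkt) / 2;
}
```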




* Re: [RFC 1/2] config: add optimal burst size configuration
  2025-11-26  9:57 ` [RFC 1/2] config: add optimal burst size configuration Morten Brørup
@ 2025-11-26 10:58   ` Pavan Nikhilesh Bhagavatula
  2025-11-26 11:00   ` Pavan Nikhilesh Bhagavatula
  2025-11-27 22:01   ` Stephen Hemminger
  2 siblings, 0 replies; 12+ messages in thread
From: Pavan Nikhilesh Bhagavatula @ 2025-11-26 10:58 UTC
  To: Morten Brørup, Jerin Jacob, Wathsala Vithanage,
	Bruce Richardson
  Cc: dev@dpdk.org

>> From: Pavan Nikhilesh <pbhagavatula@marvell.com>
>>
>> Add RTE_OPTIMAL_BURST_SIZE to allow platforms to configure the
>> optimal burst size.
>>
>> Set default value to 64 for soc_cn10k and 32 generally.
>>
>> Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
>> ---
>> This improves performance by 5% on l2fwd, other examples showed
>> negligible difference on CN10K.
>>
>
>I support the concept of having a recommended mbuf burst size, targeting the majority of generic applications.
>Making it CPU dependent seems like a good choice.
>
>It should be named differently.
>First of all, "optimal" depends on the use case; if targeting low latency, shorter bursts are better, so "OPTIMAL" should not be part of the name.
>Second, I would guess that it only targets mbuf bursts, not also bursts of other operations (e.g. hash lookups), so "MBUF" should be part of the name.
>
>Suggestion:
>/* Recommended burst size for generic applications, striking a balance between throughput and latency. */
>dpdk_conf.set('RTE_MBUF_BURST_SIZE_MAX' (or _DEFAULT), 64)
>

Agreed, would the 

><feature creep>
>/* Recommended burst size for generic applications targeting low latency. */
>dpdk_conf.set('RTE_MBUF_BURST_SIZE_MIN', 4)
></feature creep>
>
>Having these standardized will also allow libraries and drivers to optimize for them; e.g. drivers should support burst sizes all the way down to RTE_MBUF_BURST_SIZE_MIN, and can static_assert() that RTE_MBUF_BURST_SIZE_MIN is not lower than the minimum supported by the driver/hardware.
>
><more feature creep>
>rte_config.h could have "#define RTE_MBUF_BURST_SIZE RTE_MBUF_BURST_SIZE_MAX", for the application developer to change to RTE_MBUF_BURST_SIZE_MIN for low latency applications.
>This will let the libraries and drivers optimize for the specific burst size used by the application.
></more feature creep>
>
><rambling>
>Intuitively, I would assume that the optimal burst size essentially depends on the CPU's L1D cache size and the application's number of non-mbuf cache lines accessed per burst.
>Let's say a CPU core has 32 KiB cache (= 512 cache lines), and each burst touches 4 cache lines per packet:
>2 cache lines for the mbuf
>1 cache line for the packet data
>1 cache line per packet for some table lookup/forwarding entry
>
>Then the mbuf burst should be max 512/4 = 128.
>But local variables also use memory during processing, so using a burst of 64 would leave room for that and some more.
></rambling>




* Re: [RFC 1/2] config: add optimal burst size configuration
  2025-11-26  9:57 ` [RFC 1/2] config: add optimal burst size configuration Morten Brørup
  2025-11-26 10:58   ` Pavan Nikhilesh Bhagavatula
@ 2025-11-26 11:00   ` Pavan Nikhilesh Bhagavatula
  2026-02-02  9:57     ` Pavan Nikhilesh Bhagavatula
  2025-11-27 22:01   ` Stephen Hemminger
  2 siblings, 1 reply; 12+ messages in thread
From: Pavan Nikhilesh Bhagavatula @ 2025-11-26 11:00 UTC (permalink / raw)
  To: Morten Brørup, Jerin Jacob, Wathsala Vithanage,
	Bruce Richardson
  Cc: dev@dpdk.org

>> From: Pavan Nikhilesh <pbhagavatula@marvell.com>
>>
>> Add RTE_OPTIMAL_BURST_SIZE to allow platforms to configure the
>> optimal burst size.
>>
>> Set default value to 64 for soc_cn10k and 32 generally.
>>
>> Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
>> ---
>> This improves performance by 5% on l2fwd, other examples showed
>> negligible difference on CN10K.
>>
>
>I support the concept of having a recommended mbuf burst size, targeting the majority of generic applications.
>Making it CPU dependent seems like a good choice.
>
>It should be named differently.
>First of all, "optimal" depends on the use case; if targeting low latency, shorter bursts are better, so "OPTIMAL" should not be part of the name.
>Second, I would guess that it only targets mbuf bursts, not also bursts of other operations (e.g. hash lookups), so "MBUF" should be part of the name.
>
>Suggestion:
>/* Recommended burst size for generic applications, striking a balance between throughput and latency. */
>dpdk_conf.set('RTE_MBUF_BURST_SIZE_MAX' (or _DEFAULT), 64)
>

Agreed. Would the comment be enough to indicate that it is a recommendation rather than an enforcement, or should that be part of the macro name?
I am sceptical of changing the burst size to 64, since most applications _today_ use 32; it might cause unintended regressions.

RTE_MBUF_BURST_SIZE_(REC)_PERF?

><feature creep>
>/* Recommended burst size for generic applications targeting low latency. */
>dpdk_conf.set('RTE_MBUF_BURST_SIZE_MIN', 4)
></feature creep>

RTE_MBUF_BURST_SIZE_(REC)_LAT?

(I am bad at names)
>
>Having these standardized will also allow libraries and drivers to optimize for them, e.g. drivers should support burst sizes all the way down to RTE_MBUF_BURST_SIZE_MIN, and can static_assert() that the RTE_MBUF_BURST_SIZE_MIN is not lower than supported by the driver/hardware.
>
><more feature creep>
>rte_config.h could have "#define RTE_MBUF_BURST_SIZE RTE_MBUF_BURST_SIZE_MAX", for the application developer to change to RTE_MBUF_BURST_SIZE_MIN for low latency applications.
>This will let the libraries and drivers optimize for the specific burst size used by the application.
></more feature creep>

This is fine with me; we can wrap it in a meson option to avoid manually changing rte_config.h.
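Morten's suggestion that drivers static_assert() against the recommended minimum could look roughly like the sketch below. It is a minimal illustration, assuming the macro names and values from his proposal (RTE_MBUF_BURST_SIZE_MAX = 64, RTE_MBUF_BURST_SIZE_MIN = 4); HYPOTHETICAL_DRV_MIN_BURST and hypothetical_drv_use_vector_rx() are invented for illustration and are not DPDK APIs. The #ifndef fallbacks only stand in for values that would come from the generated build configuration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the proposed build-time values (assumption: in a real
 * build they would come from the meson-generated config header). */
#ifndef RTE_MBUF_BURST_SIZE_MAX
#define RTE_MBUF_BURST_SIZE_MAX 64
#endif
#ifndef RTE_MBUF_BURST_SIZE_MIN
#define RTE_MBUF_BURST_SIZE_MIN 4
#endif

/* Hypothetical driver constraint: its vector Rx path handles packets in
 * groups of this many descriptors. */
#define HYPOTHETICAL_DRV_MIN_BURST 4

/* Break the build if the recommended sizes do not suit this driver. */
static_assert(RTE_MBUF_BURST_SIZE_MIN >= HYPOTHETICAL_DRV_MIN_BURST,
	      "RTE_MBUF_BURST_SIZE_MIN is below the driver's minimum burst");
static_assert(RTE_MBUF_BURST_SIZE_MAX % HYPOTHETICAL_DRV_MIN_BURST == 0,
	      "RTE_MBUF_BURST_SIZE_MAX is not a multiple of the driver's step");

/* Hypothetical dispatch check: the vector path is only usable when the
 * requested burst is a non-zero multiple of the driver's step. */
static inline bool
hypothetical_drv_use_vector_rx(uint16_t nb_pkts)
{
	return nb_pkts >= HYPOTHETICAL_DRV_MIN_BURST &&
	       nb_pkts % HYPOTHETICAL_DRV_MIN_BURST == 0;
}
```

The compile-time checks make a mismatched configuration fail at build time, while the run-time check shows the kind of fast-path selection a driver could specialise once the recommended sizes are known in advance.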

>
><rambling>
>Intuitively, I would assume that the optimal burst size essentially depends on the CPU's L1D cache size and the application's number of non-mbuf cache lines accessed per burst.
>Let's say a CPU core has 32 KiB cache (= 512 cache lines), and each burst touches 4 cache lines per packet:
>2 cache lines for the mbuf
>1 cache line for the packet data
>1 cache line per packet for some table lookup/forwarding entry
>
>Then the mbuf burst should be max 512/4 = 128.
>But local variables also use memory during processing, so using a burst of 64 would leave room for that and some more.
></rambling>

We could probably read `/sys/devices/system/cpu/cpu0/cache/index0/size` in meson and calculate the number of cache lines and the burst size, but I don't think it's
that simple. For example, CN10K has a 64 KiB L1D cache, and anything above a burst size of 64 causes a performance loss.
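For reference, Morten's back-of-the-envelope estimate and the sysfs lookup mentioned above can be sketched as follows. This is a hedged illustration in C rather than meson, not a proposal: parse_cache_size(), burst_from_l1d() and read_l1d_size() are hypothetical helpers, cache index0 is assumed to be the L1 data cache (real code should check the neighbouring `type` file, and could read the line size from `coherency_line_size`), and halving the bound is just one arbitrary way of "leaving room" for other data.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse a sysfs cache size string such as "32K\n" into bytes; 0 on error. */
static unsigned long
parse_cache_size(const char *s)
{
	char *end;
	unsigned long v;

	assert(s != NULL);
	v = strtoul(s, &end, 10);
	if (end == s)
		return 0;
	switch (toupper((unsigned char)*end)) {
	case 'K':
		v *= 1024UL;
		break;
	case 'M':
		v *= 1024UL * 1024UL;
		break;
	default:
		break;
	}
	return v;
}

/*
 * Morten's estimate: L1D lines / lines touched per packet is an upper bound
 * on the burst; halve it to leave room for locals and other data, then round
 * down to a power of two. The halving is an arbitrary choice.
 */
static unsigned int
burst_from_l1d(unsigned long l1d_bytes, unsigned int line_size,
	       unsigned int lines_per_pkt)
{
	unsigned long bound, burst = 1;

	if (line_size == 0 || lines_per_pkt == 0)
		return 0;
	bound = l1d_bytes / line_size / lines_per_pkt / 2;
	if (bound == 0)
		return 0;
	while (burst * 2 <= bound)
		burst *= 2;
	return (unsigned int)burst;
}

/* Read cache index0 of CPU 0; assumed here to be the L1D, which should be
 * confirmed via the neighbouring "type" file. Returns 0 if unavailable. */
static unsigned long
read_l1d_size(void)
{
	char buf[32];
	unsigned long v = 0;
	FILE *f = fopen("/sys/devices/system/cpu/cpu0/cache/index0/size", "r");

	if (f == NULL)
		return 0;
	if (fgets(buf, sizeof(buf), f) != NULL)
		v = parse_cache_size(buf);
	fclose(f);
	return v;
}
```

With 32 KiB, 64-byte lines and 4 lines per packet, the sketch reproduces Morten's figure of 64. Assuming the same 64-byte lines and 4 lines per packet for a 64 KiB L1D, it suggests 128, whereas losses are reported above 64 on CN10K, which illustrates why a cache-size formula alone is not sufficient.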

Thanks,
Pavan



* Re: [RFC 1/2] config: add optimal burst size configuration
  2025-11-26  9:57 ` [RFC 1/2] config: add optimal burst size configuration Morten Brørup
  2025-11-26 10:58   ` Pavan Nikhilesh Bhagavatula
  2025-11-26 11:00   ` Pavan Nikhilesh Bhagavatula
@ 2025-11-27 22:01   ` Stephen Hemminger
  2026-02-02  9:52     ` [EXTERNAL] " Pavan Nikhilesh Bhagavatula
  2 siblings, 1 reply; 12+ messages in thread
From: Stephen Hemminger @ 2025-11-27 22:01 UTC (permalink / raw)
  To: Morten Brørup
  Cc: Pavan Nikhilesh, Jerin Jacob, Wathsala Vithanage,
	Bruce Richardson, dev

On Wed, 26 Nov 2025 10:57:13 +0100
Morten Brørup <mb@smartsharesystems.com> wrote:

> > From: Pavan Nikhilesh <pbhagavatula@marvell.com>
> > 
> > Add RTE_OPTIMAL_BURST_SIZE to allow platforms to configure the
> > optimal burst size.
> > 
> > Set default value to 64 for soc_cn10k and 32 generally.
> > 
> > Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
> > ---
> > This improves performance by 5% on l2fwd, other examples showed
> > negligible difference on CN10K.
> >  
> 
> I support the concept of having a recommended mbuf burst size, targeting the majority of generic applications.
> Making it CPU dependent seems like a good choice.
> 
> It should be named differently.
> First of all, "optimal" depends on the use case; if targeting low latency, shorter bursts are better, so "OPTIMAL" should not be part of the name.
> Second, I would guess that it only targets mbuf bursts, not also bursts of other operations (e.g. hash lookups), so "MBUF" should be part of the name.
> 
> Suggestion:
> /* Recommended burst size for generic applications, striking a balance between throughput and latency. */
> dpdk_conf.set('RTE_MBUF_BURST_SIZE_MAX' (or _DEFAULT), 64)
> 
> <feature creep>
> /* Recommended burst size for generic applications targeting low latency. */
> dpdk_conf.set('RTE_MBUF_BURST_SIZE_MIN', 4)
> </feature creep>
> 
> Having these standardized will also allow libraries and drivers to optimize for them, e.g. drivers should support burst sizes all the way down to RTE_MBUF_BURST_SIZE_MIN, and can static_assert() that the RTE_MBUF_BURST_SIZE_MIN is not lower than supported by the driver/hardware.
> 
> <more feature creep>
> rte_config.h could have "#define RTE_MBUF_BURST_SIZE RTE_MBUF_BURST_SIZE_MAX", for the application developer to change to RTE_MBUF_BURST_SIZE_MIN for low latency applications.
> This will let the libraries and drivers optimize for the specific burst size used by the application.
> </more feature creep>
> 
> <rambling>
> Intuitively, I would assume that the optimal burst size essentially depends on the CPU's L1D cache size and the application's number of non-mbuf cache lines accessed per burst.
> Let's say a CPU core has 32 KiB cache (= 512 cache lines), and each burst touches 4 cache lines per packet:
> 2 cache lines for the mbuf
> 1 cache line for the packet data
> 1 cache line per packet for some table lookup/forwarding entry
> 
> Then the mbuf burst should be max 512/4 = 128.
> But local variables also use memory during processing, so using a burst of 64 would leave room for that and some more.
> </rambling>
> 

I understand the motivation, and it makes sense for a pure embedded system.
But then again, on an embedded system the application can just set its burst size;
this config option only impacts the performance of testpmd and the examples. And the
performance of testpmd is mostly irrelevant; what matters is the real application.

Making it a DPDK config option is a problem for DPDK builds in distros.
The optimal burst size would be driver dependent, etc.

Perhaps it is better off in the existing Rx/Tx descriptor hints.
Most of those device configs really need to be revisited,
since they were inherited from how old Intel drivers worked.
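Ethdev already carries a per-port burst hint: rte_eth_dev_info_get() fills struct rte_eth_dev_info, whose default_rxportconf.burst_size is a driver-provided recommendation that is left at 0 when the driver does not set it. A minimal sketch of combining that runtime hint with a build-time fallback follows; pick_rx_burst() is hypothetical, and the DPDK call appears only in a comment so that the selection logic compiles on its own.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the build-time default discussed in this thread. */
#ifndef RTE_MBUF_BURST_SIZE_DEFAULT
#define RTE_MBUF_BURST_SIZE_DEFAULT 32
#endif

/*
 * Hypothetical helper: prefer the driver's per-port hint, fall back to the
 * build-time default, and never exceed the caller's mbuf array capacity.
 *
 * In a DPDK application the hint would be obtained roughly like this:
 *
 *   struct rte_eth_dev_info info;
 *   uint16_t hint = 0;
 *
 *   if (rte_eth_dev_info_get(port_id, &info) == 0)
 *           hint = info.default_rxportconf.burst_size;
 */
static uint16_t
pick_rx_burst(uint16_t driver_hint, uint16_t array_cap)
{
	uint16_t burst = driver_hint != 0 ? driver_hint
					  : RTE_MBUF_BURST_SIZE_DEFAULT;

	assert(array_cap > 0);
	return burst < array_cap ? burst : array_cap;
}
```

A runtime hint avoids baking a CPU-specific value into distro builds, but it only knows what the driver reports; it does not take the CPU into account.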


* Re: [EXTERNAL] Re: [RFC 1/2] config: add optimal burst size configuration
  2025-11-27 22:01   ` Stephen Hemminger
@ 2026-02-02  9:52     ` Pavan Nikhilesh Bhagavatula
  2026-02-03  9:38       ` Morten Brørup
  0 siblings, 1 reply; 12+ messages in thread
From: Pavan Nikhilesh Bhagavatula @ 2026-02-02  9:52 UTC (permalink / raw)
  To: Stephen Hemminger, Morten Brørup
  Cc: Jerin Jacob, Wathsala Vithanage, Bruce Richardson, dev@dpdk.org

>> > From: Pavan Nikhilesh <pbhagavatula@marvell.com>
>> >
>> > Add RTE_OPTIMAL_BURST_SIZE to allow platforms to configure the
>> > optimal burst size.
>> >
>> > Set default value to 64 for soc_cn10k and 32 generally.
>> >
>> > Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
>> > ---
>> > This improves performance by 5% on l2fwd, other examples showed
>> > negligible difference on CN10K.
>> >
>>
>> I support the concept of having a recommended mbuf burst size, targeting the majority of generic applications.
>> Making it CPU dependent seems like a good choice.
>>
>> It should be named differently.
>> First of all, "optimal" depends on the use case; if targeting low latency, shorter bursts are better, so "OPTIMAL" should not be part of the name.
>> Second, I would guess that it only targets mbuf bursts, not also bursts of other operations (e.g. hash lookups), so "MBUF" should be part of the name.
>>
>> Suggestion:
>> /* Recommended burst size for generic applications, striking a balance between throughput and latency. */
>> dpdk_conf.set('RTE_MBUF_BURST_SIZE_MAX' (or _DEFAULT), 64)
>>
>> <feature creep>
>> /* Recommended burst size for generic applications targeting low latency. */
>> dpdk_conf.set('RTE_MBUF_BURST_SIZE_MIN', 4)
>> </feature creep>
>>
>> Having these standardized will also allow libraries and drivers to optimize for them, e.g. drivers should support burst sizes all the way down to RTE_MBUF_BURST_SIZE_MIN, and can static_assert() that the RTE_MBUF_BURST_SIZE_MIN is not lower than supported by the driver/hardware.
>>
>> <more feature creep>
>> rte_config.h could have "#define RTE_MBUF_BURST_SIZE RTE_MBUF_BURST_SIZE_MAX", for the application developer to change to RTE_MBUF_BURST_SIZE_MIN for low latency applications.
>> This will let the libraries and drivers optimize for the specific burst size used by the application.
>> </more feature creep>
>>
>> <rambling>
>> Intuitively, I would assume that the optimal burst size essentially depends on the CPU's L1D cache size and the application's number of non-mbuf cache lines accessed per burst.
>> Let's say a CPU core has 32 KiB cache (= 512 cache lines), and each burst touches 4 cache lines per packet:
>> 2 cache lines for the mbuf
>> 1 cache line for the packet data
>> 1 cache line per packet for some table lookup/forwarding entry
>>
>> Then the mbuf burst should be max 512/4 = 128.
>> But local variables also use memory during processing, so using a burst of 64 would leave room for that and some more.
>> </rambling>
>>
>
>I understand the motivation, and it makes sense for a pure embedded system.
>But then again, on an embedded system the application can just set its burst size;
>this config option only impacts the performance of testpmd and the examples. And the
>performance of testpmd is mostly irrelevant; what matters is the real application.
>

True, but customer engagements generally start with benchmarking testpmd/l3fwd etc.
before moving to custom apps.
So having better performance numbers helps.

>Making it a DPDK config option is a problem for DPDK builds in distros.
>The optimal burst size would be driver dependent, etc.
>

Since we are not modifying the current default burst size (32), it shouldn't be a problem, and
it can even benefit SoCs.

>Perhaps it is better off in the existing Rx/Tx descriptor hints.
>Most of those device configs really need to be revisited,
>since they were inherited from how old Intel drivers worked.


* Re: [RFC 1/2] config: add optimal burst size configuration
  2025-11-26 11:00   ` Pavan Nikhilesh Bhagavatula
@ 2026-02-02  9:57     ` Pavan Nikhilesh Bhagavatula
  0 siblings, 0 replies; 12+ messages in thread
From: Pavan Nikhilesh Bhagavatula @ 2026-02-02  9:57 UTC (permalink / raw)
  To: Morten Brørup, Jerin Jacob, Wathsala Vithanage,
	Bruce Richardson
  Cc: dev@dpdk.org

>>> From: Pavan Nikhilesh <pbhagavatula@marvell.com>
>>>
>>> Add RTE_OPTIMAL_BURST_SIZE to allow platforms to configure the
>>> optimal burst size.
>>>
>>> Set default value to 64 for soc_cn10k and 32 generally.
>>>
>>> Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
>>> ---
>>> This improves performance by 5% on l2fwd, other examples showed
>>> negligible difference on CN10K.
>>>
>>
>>I support the concept of having a recommended mbuf burst size, targeting the majority of generic applications.
>>Making it CPU dependent seems like a good choice.
>>
>>It should be named differently.
>>First of all, "optimal" depends on the use case; if targeting low latency, shorter bursts are better, so "OPTIMAL" should not be part of the name.
>>Second, I would guess that it only targets mbuf bursts, not also bursts of other operations (e.g. hash lookups), so "MBUF" should be part of the name.
>>
>>Suggestion:
>>/* Recommended burst size for generic applications, striking a balance between throughput and latency. */
>>dpdk_conf.set('RTE_MBUF_BURST_SIZE_MAX' (or _DEFAULT), 64)
>>
>
>Agreed. Would the comment be enough to indicate that it is a recommendation rather than an enforcement, or should that be part of the macro name?
>I am sceptical of changing the burst size to 64, since most applications _today_ use 32; it might cause unintended regressions.
>
>RTE_MBUF_BURST_SIZE_(REC)_PERF?
>
>><feature creep>
>>/* Recommended burst size for generic applications targeting low latency. */
>>dpdk_conf.set('RTE_MBUF_BURST_SIZE_MIN', 4)
>></feature creep>
>
>RTE_MBUF_BURST_SIZE_(REC)_LAT?
>
>(I am bad at names)
>>
>>Having these standardized will also allow libraries and drivers to optimize for them, e.g. drivers should support burst sizes all the way down to RTE_MBUF_BURST_SIZE_MIN, and can static_assert() that the RTE_MBUF_BURST_SIZE_MIN is not lower than supported by the driver/hardware.
>>
>><more feature creep>
>>rte_config.h could have "#define RTE_MBUF_BURST_SIZE RTE_MBUF_BURST_SIZE_MAX", for the application developer to change to RTE_MBUF_BURST_SIZE_MIN for low latency applications.
>>This will let the libraries and drivers optimize for the specific burst size used by the application.
>></more feature creep>
>
>This is fine with me; we can wrap it in a meson option to avoid manually changing rte_config.h.
>
>>
>><rambling>
>>Intuitively, I would assume that the optimal burst size essentially depends on the CPU's L1D cache size and the application's number of non-mbuf cache lines accessed per burst.
>>Let's say a CPU core has 32 KiB cache (= 512 cache lines), and each burst touches 4 cache lines per packet:
>>2 cache lines for the mbuf
>>1 cache line for the packet data
>>1 cache line per packet for some table lookup/forwarding entry
>>
>>Then the mbuf burst should be max 512/4 = 128.
>>But local variables also use memory during processing, so using a burst of 64 would leave room for that and some more.
>></rambling>
>
>We could probably read `/sys/devices/system/cpu/cpu0/cache/index0/size` in meson and calculate the number of cache lines and the burst size, but I don't think it's
>that simple. For example, CN10K has a 64 KiB L1D cache, and anything above a burst size of 64 causes a performance loss.
>
>Thanks,
>Pavan
>

@Morten, any further thoughts? I can spin up a v1.


* RE: [EXTERNAL] Re: [RFC 1/2] config: add optimal burst size configuration
  2026-02-02  9:52     ` [EXTERNAL] " Pavan Nikhilesh Bhagavatula
@ 2026-02-03  9:38       ` Morten Brørup
  0 siblings, 0 replies; 12+ messages in thread
From: Morten Brørup @ 2026-02-03  9:38 UTC (permalink / raw)
  To: Pavan Nikhilesh Bhagavatula, Stephen Hemminger
  Cc: Jerin Jacob, Wathsala Vithanage, Bruce Richardson, dev

> >> > From: Pavan Nikhilesh <pbhagavatula@marvell.com>
> >> >
> >> > Add RTE_OPTIMAL_BURST_SIZE to allow platforms to configure the
> >> > optimal burst size.
> >> >
> >> > Set default value to 64 for soc_cn10k and 32 generally.
> >> >
> >> > Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
> >> > ---
> >> > This improves performance by 5% on l2fwd, other examples showed
> >> > negligible difference on CN10K.
> >> >
> >>
> >> I support the concept of having a recommended mbuf burst size, targeting the majority of generic applications.
> >> Making it CPU dependent seems like a good choice.
> >>
> >> It should be named differently.
> >> First of all, "optimal" depends on the use case; if targeting low latency, shorter bursts are better, so "OPTIMAL" should not be part of the name.
> >> Second, I would guess that it only targets mbuf bursts, not also bursts of other operations (e.g. hash lookups), so "MBUF" should be part of the name.
> >>
> >> Suggestion:
> >> /* Recommended burst size for generic applications, striking a balance between throughput and latency. */
> >> dpdk_conf.set('RTE_MBUF_BURST_SIZE_MAX' (or _DEFAULT), 64)
> >>
> >> <feature creep>
> >> /* Recommended burst size for generic applications targeting low latency. */
> >> dpdk_conf.set('RTE_MBUF_BURST_SIZE_MIN', 4)
> >> </feature creep>
> >>
> >> Having these standardized will also allow libraries and drivers to optimize for them, e.g. drivers should support burst sizes all the way down to RTE_MBUF_BURST_SIZE_MIN, and can static_assert() that the RTE_MBUF_BURST_SIZE_MIN is not lower than supported by the driver/hardware.
> >>
> >> <more feature creep>
> >> rte_config.h could have "#define RTE_MBUF_BURST_SIZE RTE_MBUF_BURST_SIZE_MAX", for the application developer to change to RTE_MBUF_BURST_SIZE_MIN for low latency applications.
> >> This will let the libraries and drivers optimize for the specific burst size used by the application.
> >> </more feature creep>
> >>
> >> <rambling>
> >> Intuitively, I would assume that the optimal burst size essentially depends on the CPU's L1D cache size and the application's number of non-mbuf cache lines accessed per burst.
> >> Let's say a CPU core has 32 KiB cache (= 512 cache lines), and each burst touches 4 cache lines per packet:
> >> 2 cache lines for the mbuf
> >> 1 cache line for the packet data
> >> 1 cache line per packet for some table lookup/forwarding entry
> >>
> >> Then the mbuf burst should be max 512/4 = 128.
> >> But local variables also use memory during processing, so using a burst of 64 would leave room for that and some more.
> >> </rambling>
> >>
> >
> >I understand the motivation, and it makes sense for a pure embedded system.
> >But then again, on an embedded system the application can just set its burst size;
> >this config option only impacts the performance of testpmd and the examples. And the
> >performance of testpmd is mostly irrelevant; what matters is the real application.
> >
> 
> True, but customer engagements generally start with benchmarking testpmd/l3fwd etc.
> before moving to custom apps.
> So having better performance numbers helps.

Good example from real life. :-)

> 
> >Making it a DPDK config option is a problem for DPDK builds in distros.
> >The optimal burst size would be driver dependent, etc.
> >
> 
> Since we are not modifying the current default burst size (32), it shouldn't be a problem, and
> it can even benefit SoCs.
> 
> >Perhaps it is better off in the existing Rx/Tx descriptor hints.
> >Most of those device configs really need to be revisited,
> >since they were inherited from how old Intel drivers worked.

Yes, descriptor hints are very incomplete. E.g. they should also consider the CPU.

Currently, DPDK applications use a hardcoded default of 32, regardless of everything else.
That was probably good for some of the hardware (CPUs and NICs) originally used for DPDK.

I think adapting to other environments makes sense.

I agree that the optimal burst size depends on a combination of multiple parameters, and is highly dependent on the application too.
The current alternative, sticking with 32, is clearly not the best option.
So I agree with the idea of trying to improve that.

If we make it a common build option, drivers and libs can optimize for these values, instead of randomly picking some number to use internally.
E.g. rte_pktmbuf_free_bulk() [1] uses an array of 64 mbufs. It might as well have been 32.
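As an illustration of a library sizing its internal staging array from the common option rather than a private constant, here is a sketch in the spirit of the rte_pktmbuf_free_bulk() example. hypothetical_bulk_release(), the flush callback type and the counting callback are invented for illustration; the real function also groups mbufs by mempool, which is omitted here, and the #ifndef fallback only stands in for the build option.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the common build option (assumption: normally provided by
 * the generated DPDK build configuration). */
#ifndef RTE_MBUF_BURST_SIZE_DEFAULT
#define RTE_MBUF_BURST_SIZE_DEFAULT 32
#endif

/* Hypothetical callback that releases one staged batch. */
typedef void (*hypothetical_flush_fn)(void *const *objs, size_t n, void *ctx);

/* Hypothetical library routine: stage objects in a local array sized by
 * the build option and hand them to flush() one full batch at a time. */
static void
hypothetical_bulk_release(void *const *objs, size_t count,
			  hypothetical_flush_fn flush, void *ctx)
{
	void *pending[RTE_MBUF_BURST_SIZE_DEFAULT];
	size_t n = 0;

	assert(flush != NULL);
	for (size_t i = 0; i < count; i++) {
		pending[n++] = objs[i];
		if (n == RTE_MBUF_BURST_SIZE_DEFAULT) {
			flush(pending, n, ctx);
			n = 0;
		}
	}
	if (n != 0)
		flush(pending, n, ctx); /* final partial batch */
}

/* Demo callback: counts batches and objects instead of freeing anything. */
struct flush_stats {
	size_t batches;
	size_t total;
	size_t last;
};

static void
count_flush(void *const *objs, size_t n, void *ctx)
{
	struct flush_stats *st = ctx;

	(void)objs;
	st->batches++;
	st->total += n;
	st->last = n;
}
```

When the application's burst matches the build option, each full Rx burst maps onto exactly one internal batch, which is the alignment between application, drivers and libraries argued for here.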

Using a common build option also makes it possible to align (regarding burst size) the application with the drivers and libs, when built from scratch.
And it will not harm the distros, which currently use (more or less randomly selected) hardcoded burst sizes.

It may not be the optimal solution, but IMO better than what we have today.

[1]: https://elixir.bootlin.com/dpdk/v25.11/source/lib/mbuf/rte_mbuf.c#L556



* [PATCH 1/2] config: add mbuf burst size configuration
  2025-11-26  8:24 [RFC 1/2] config: add optimal burst size configuration pbhagavatula
  2025-11-26  8:24 ` [RFC 2/2] examples: use optimal burst size pbhagavatula
  2025-11-26  9:57 ` [RFC 1/2] config: add optimal burst size configuration Morten Brørup
@ 2026-02-20 23:07 ` pbhagavatula
  2026-02-20 23:07   ` [PATCH 2/2] examples: use default mbuf burst size pbhagavatula
  2 siblings, 1 reply; 12+ messages in thread
From: pbhagavatula @ 2026-02-20 23:07 UTC (permalink / raw)
  To: mb, jerinj, Wathsala Vithanage, Bruce Richardson; +Cc: dev, Pavan Nikhilesh

From: Pavan Nikhilesh <pbhagavatula@marvell.com>

Add configurable mbuf burst size macros:
- RTE_MBUF_BURST_SIZE_THROUGHPUT: optimized for throughput (default 32)
- RTE_MBUF_BURST_SIZE_LATENCY: optimized for low latency (default 4)
- RTE_MBUF_BURST_SIZE_DEFAULT: references the selected profile

Add meson option 'mbuf_burst_size_default' to select between
'throughput' (default) and 'latency' profiles.

Platform-specific configurations can override
RTE_MBUF_BURST_SIZE_THROUGHPUT; it is set to 64 for CN10K,
which benefits from larger bursts.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
---
v1 Changes:
- Renamed RTE_OPTIMAL_BURST_SIZE to RTE_MBUF_BURST_SIZE_DEFAULT.
- Added RTE_MBUF_BURST_SIZE_THROUGHPUT/RTE_MBUF_BURST_SIZE_LATENCY
  macros, with the meson option mbuf_burst_size_default to select the profile.

 config/arm/meson.build |  1 +
 config/meson.build     | 15 +++++++++++++++
 meson_options.txt      |  2 ++
 3 files changed, 18 insertions(+)

diff --git a/config/arm/meson.build b/config/arm/meson.build
index 523b0fc0ed50..1b2cdbd25774 100644
--- a/config/arm/meson.build
+++ b/config/arm/meson.build
@@ -481,6 +481,7 @@ soc_cn10k = {
         ['RTE_MAX_LCORE', 24],
         ['RTE_MAX_NUMA_NODES', 1],
         ['RTE_MEMPOOL_ALIGN', 128],
+        ['RTE_MBUF_BURST_SIZE_THROUGHPUT', 64],
     ],
     'part_number': '0xd49',
     'extra_march_features': ['crypto'],
diff --git a/config/meson.build b/config/meson.build
index 02e2798ccaf1..578b561a4fd1 100644
--- a/config/meson.build
+++ b/config/meson.build
@@ -393,10 +393,25 @@ if get_option('mbuf_refcnt_atomic')
 endif
 dpdk_conf.set10('RTE_IOVA_IN_MBUF', get_option('enable_iova_as_pa'))

+# Recommended mbuf burst sizes for generic applications.
+# Platform-specific configs may override these values.
+# RTE_MBUF_BURST_SIZE_THROUGHPUT: Burst size optimized for throughput.
+dpdk_conf.set('RTE_MBUF_BURST_SIZE_THROUGHPUT', 32)
+# RTE_MBUF_BURST_SIZE_LATENCY: Burst size optimized for low latency.
+dpdk_conf.set('RTE_MBUF_BURST_SIZE_LATENCY', 4)
+
 compile_time_cpuflags = []
 subdir(arch_subdir)
 dpdk_conf.set('RTE_COMPILE_TIME_CPUFLAGS', ','.join(compile_time_cpuflags))

+# RTE_MBUF_BURST_SIZE_DEFAULT: Default burst size used by examples and testpmd.
+# Controlled by -Dmbuf_burst_size_default option (throughput or latency).
+if get_option('mbuf_burst_size_default') == 'latency'
+    dpdk_conf.set('RTE_MBUF_BURST_SIZE_DEFAULT', 'RTE_MBUF_BURST_SIZE_LATENCY')
+else
+    dpdk_conf.set('RTE_MBUF_BURST_SIZE_DEFAULT', 'RTE_MBUF_BURST_SIZE_THROUGHPUT')
+endif
+
 # apply cross-specific options
 if meson.is_cross_build()
     # configure RTE_MAX_LCORE and RTE_MAX_NUMA_NODES from cross file
diff --git a/meson_options.txt b/meson_options.txt
index e28d24054cf1..2caf0be91d39 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -46,6 +46,8 @@ option('enable_iova_as_pa', type: 'boolean', value: true, description:
        'Support the use of physical addresses for IO addresses, such as used by UIO or VFIO in no-IOMMU mode. When disabled, DPDK can only run with IOMMU support for address mappings, but will have more space available in the mbuf structure.')
 option('mbuf_refcnt_atomic', type: 'boolean', value: true, description:
        'Atomically access the mbuf refcnt.')
+option('mbuf_burst_size_default', type: 'combo', choices: ['throughput', 'latency'], value: 'throughput', description:
+       'Default mbuf burst size profile: throughput-optimized or latency-optimized.')
 option('platform', type: 'string', value: 'native', description:
        'Platform to build, either "native", "generic" or a SoC. Please refer to the Linux build guide for more information.')
 option('pkt_mbuf_headroom', type: 'integer', value: 128, description:
--
2.50.1 (Apple Git-155)
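To make the meson logic above concrete: configuration_data.set() with a string value is emitted unquoted (set_quoted() would add the quotes), so RTE_MBUF_BURST_SIZE_DEFAULT becomes an alias for one of the two profile macros. Because the generic values are set before subdir(arch_subdir), the CN10K entry can override RTE_MBUF_BURST_SIZE_THROUGHPUT. The sketch below hand-writes an approximation of the generated defines for a generic build with the default 'throughput' profile and shows an application sizing its receive array from it; stub_rx_burst() is a hypothetical stand-in for rte_eth_rx_burst() so the code compiles without DPDK.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Approximation of the generated defines for a generic build configured
 * with -Dmbuf_burst_size_default=throughput (hand-written here). */
#define RTE_MBUF_BURST_SIZE_THROUGHPUT 32
#define RTE_MBUF_BURST_SIZE_LATENCY 4
#define RTE_MBUF_BURST_SIZE_DEFAULT RTE_MBUF_BURST_SIZE_THROUGHPUT

/* Application side, as done in the examples patch. */
#define MAX_PKT_BURST RTE_MBUF_BURST_SIZE_DEFAULT

/* Build-time sanity checks an application could add. */
static_assert(RTE_MBUF_BURST_SIZE_LATENCY <= RTE_MBUF_BURST_SIZE_THROUGHPUT,
	      "latency profile should not use larger bursts than throughput");
static_assert(MAX_PKT_BURST > 0 && MAX_PKT_BURST <= UINT16_MAX,
	      "burst must fit the uint16_t nb_pkts argument of rte_eth_rx_burst()");

/* Hypothetical stand-in for rte_eth_rx_burst(): hands out at most nb_pkts
 * of the *avail packets still queued and returns how many it delivered. */
static uint16_t
stub_rx_burst(void **pkts, uint16_t nb_pkts, uint16_t *avail)
{
	uint16_t n = *avail < nb_pkts ? *avail : nb_pkts;

	for (uint16_t i = 0; i < n; i++)
		pkts[i] = NULL; /* a real PMD would store mbuf pointers here */
	*avail -= n;
	return n;
}

/* Poll until the queue is empty; returns the number of non-empty bursts. */
static unsigned int
drain_queue(uint16_t avail)
{
	void *pkts[MAX_PKT_BURST];
	unsigned int bursts = 0;

	while (stub_rx_burst(pkts, MAX_PKT_BURST, &avail) != 0)
		bursts++;
	return bursts;
}
```

With -Dmbuf_burst_size_default=latency, only the alias line would change, and the same application code would poll in bursts of 4.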



* [PATCH 2/2] examples: use default mbuf burst size
  2026-02-20 23:07 ` [PATCH 1/2] config: add mbuf " pbhagavatula
@ 2026-02-20 23:07   ` pbhagavatula
  2026-02-21 16:26     ` Stephen Hemminger
  0 siblings, 1 reply; 12+ messages in thread
From: pbhagavatula @ 2026-02-20 23:07 UTC (permalink / raw)
  To: mb, jerinj, Wisam Jaddo, Aman Singh, Chas Williams,
	Min Hu (Connor), Akhil Goyal, Anoob Joseph, Nicolas Chautru,
	David Hunt, Chengwen Feng, Kevin Laatz, Bruce Richardson,
	Konstantin Ananyev, Radu Nicolau, Tomasz Kantecki, Fan Zhang,
	Sunil Kumar Kori, Pavan Nikhilesh, Anatoly Burakov,
	Sivaprasad Tummala, Jingjing Wu, Volodymyr Fialko,
	Cristian Dumitrescu, John McNamara, Maxime Coquelin, Chenbo Xia
  Cc: dev

From: Pavan Nikhilesh <pbhagavatula@marvell.com>

Replace hardcoded burst sizes with RTE_MBUF_BURST_SIZE_DEFAULT
to adapt to platform-specific optimal burst sizes.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
---
 app/test-eventdev/evt_options.c            |  2 +-
 app/test-flow-perf/main.c                  |  2 +-
 app/test-pmd/testpmd.h                     |  2 +-
 app/test/test_link_bonding.c               |  2 +-
 app/test/test_link_bonding_mode4.c         |  4 ++--
 app/test/test_pmd_perf.c                   |  2 +-
 app/test/test_security_inline_proto.c      |  2 +-
 examples/bbdev_app/main.c                  |  2 +-
 examples/bond/main.c                       |  2 +-
 examples/distributor/main.c                |  4 ++--
 examples/dma/dmafwd.c                      |  2 +-
 examples/ethtool/ethtool-app/main.c        |  2 +-
 examples/ip_fragmentation/main.c           |  2 +-
 examples/ip_reassembly/main.c              |  3 +--
 examples/ipsec-secgw/ipsec-secgw.h         |  4 ++--
 examples/ipv4_multicast/main.c             |  2 +-
 examples/l2fwd-cat/l2fwd-cat.c             |  2 +-
 examples/l2fwd-crypto/main.c               |  2 +-
 examples/l2fwd-event/l2fwd_common.h        |  2 +-
 examples/l2fwd-jobstats/main.c             |  2 +-
 examples/l2fwd-keepalive/main.c            |  2 +-
 examples/l2fwd-macsec/main.c               |  2 +-
 examples/l2fwd/main.c                      |  2 +-
 examples/l3fwd-power/main.c                |  2 +-
 examples/l3fwd/l3fwd.h                     |  4 ++--
 examples/l3fwd/main.c                      | 27 ++++++++++++++--------
 examples/link_status_interrupt/main.c      |  4 ++--
 examples/multi_process/symmetric_mp/main.c |  2 +-
 examples/ntb/ntb_fwd.c                     |  4 ++--
 examples/packet_ordering/main.c            |  2 +-
 examples/qos_meter/main.c                  |  4 ++--
 examples/qos_sched/main.h                  |  4 ++--
 examples/rxtx_callbacks/main.c             |  2 +-
 examples/skeleton/basicfwd.c               |  2 +-
 examples/vhost/main.h                      |  2 +-
 examples/vhost_crypto/main.c               |  2 +-
 examples/vm_power_manager/main.c           |  2 +-
 examples/vmdq/main.c                       |  2 +-
 examples/vmdq_dcb/main.c                   |  2 +-
 39 files changed, 64 insertions(+), 56 deletions(-)

diff --git a/app/test-eventdev/evt_options.c b/app/test-eventdev/evt_options.c
index 0e70c971eb2e..ebdb1eb33478 100644
--- a/app/test-eventdev/evt_options.c
+++ b/app/test-eventdev/evt_options.c
@@ -37,7 +37,7 @@ evt_options_default(struct evt_options *opt)
 	opt->expiry_nsec = 1E4;   /* 10000ns ~10us */
 	opt->prod_type = EVT_PROD_TYPE_SYNT;
 	opt->eth_queues = 1;
-	opt->vector_size = 64;
+	opt->vector_size = RTE_MBUF_BURST_SIZE_DEFAULT;
 	opt->vector_tmo_nsec = 100E3;
 	opt->crypto_op_type = RTE_CRYPTO_OP_TYPE_SYMMETRIC;
 	opt->crypto_cipher_alg = RTE_CRYPTO_CIPHER_NULL;
diff --git a/app/test-flow-perf/main.c b/app/test-flow-perf/main.c
index 6636d1517f48..fc4f70d583ce 100644
--- a/app/test-flow-perf/main.c
+++ b/app/test-flow-perf/main.c
@@ -100,7 +100,7 @@ static uint8_t max_priority;
 static uint32_t rand_seed;
 static uint64_t meter_profile_values[3]; /* CIR CBS EBS values. */
 
-#define MAX_PKT_BURST    32
+#define MAX_PKT_BURST	  RTE_MBUF_BURST_SIZE_DEFAULT
 #define LCORE_MODE_PKT    1
 #define LCORE_MODE_STATS  2
 #define MAX_STREAMS      64
diff --git a/app/test-pmd/testpmd.h b/app/test-pmd/testpmd.h
index f319471c732e..107cc2891a21 100644
--- a/app/test-pmd/testpmd.h
+++ b/app/test-pmd/testpmd.h
@@ -78,7 +78,7 @@ struct cmdline_file_info {
 #define TX_DESC_MAX    2048
 
 #define MAX_PKT_BURST 512
-#define DEF_PKT_BURST 32
+#define DEF_PKT_BURST RTE_MBUF_BURST_SIZE_DEFAULT
 
 #define DEF_MBUF_CACHE 250
 
diff --git a/app/test/test_link_bonding.c b/app/test/test_link_bonding.c
index 19b064771aef..6fe9b40e5f54 100644
--- a/app/test/test_link_bonding.c
+++ b/app/test/test_link_bonding.c
@@ -52,7 +52,7 @@
 #define RX_DESC_MAX	(2048)
 #define TX_DESC_MAX	(2048)
 #define MAX_PKT_BURST			(512)
-#define DEF_PKT_BURST			(16)
+#define DEF_PKT_BURST			(RTE_MBUF_BURST_SIZE_DEFAULT)
 
 #define BONDING_DEV_NAME			("net_bonding_ut")
 
diff --git a/app/test/test_link_bonding_mode4.c b/app/test/test_link_bonding_mode4.c
index ff13dbed93f3..c772108f31d5 100644
--- a/app/test/test_link_bonding_mode4.c
+++ b/app/test/test_link_bonding_mode4.c
@@ -41,8 +41,8 @@
 
 #define TEST_RX_DESC_MAX        (2048)
 #define TEST_TX_DESC_MAX        (2048)
-#define MAX_PKT_BURST           (32)
-#define DEF_PKT_BURST           (16)
+#define MAX_PKT_BURST		(RTE_MBUF_BURST_SIZE_DEFAULT)
+#define DEF_PKT_BURST		(RTE_MBUF_BURST_SIZE_DEFAULT)
 
 #define BONDING_DEV_NAME         ("net_bonding_m4_bond_dev")
 
diff --git a/app/test/test_pmd_perf.c b/app/test/test_pmd_perf.c
index 995b0a6f20c4..b0ec93f89a2a 100644
--- a/app/test/test_pmd_perf.c
+++ b/app/test/test_pmd_perf.c
@@ -17,7 +17,7 @@
 #define NB_ETHPORTS_USED                (1)
 #define NB_SOCKETS                      (2)
 #define MEMPOOL_CACHE_SIZE 250
-#define MAX_PKT_BURST                   (32)
+#define MAX_PKT_BURST			(RTE_MBUF_BURST_SIZE_DEFAULT)
 #define RX_DESC_DEFAULT        (1024)
 #define TX_DESC_DEFAULT        (1024)
 #define RTE_PORT_ALL            (~(uint16_t)0x0)
diff --git a/app/test/test_security_inline_proto.c b/app/test/test_security_inline_proto.c
index 8b88fce3e990..5c86a9a7707a 100644
--- a/app/test/test_security_inline_proto.c
+++ b/app/test/test_security_inline_proto.c
@@ -44,7 +44,7 @@ test_inline_ipsec_sg(void)
 
 #define NB_ETHPORTS_USED		1
 #define MEMPOOL_CACHE_SIZE		32
-#define MAX_PKT_BURST			32
+#define MAX_PKT_BURST			RTE_MBUF_BURST_SIZE_DEFAULT
 #define RX_DESC_DEFAULT	1024
 #define TX_DESC_DEFAULT	1024
 #define RTE_PORT_ALL		(~(uint16_t)0x0)
diff --git a/examples/bbdev_app/main.c b/examples/bbdev_app/main.c
index 03f15f91cc6b..e318025c9095 100644
--- a/examples/bbdev_app/main.c
+++ b/examples/bbdev_app/main.c
@@ -39,7 +39,7 @@
 #define LLR_1_BIT 0x81
 #define LLR_0_BIT 0x7F
 
-#define MAX_PKT_BURST 32
+#define MAX_PKT_BURST	   RTE_MBUF_BURST_SIZE_DEFAULT
 #define NB_MBUF 8191
 #define MEMPOOL_CACHE_SIZE 256
 
diff --git a/examples/bond/main.c b/examples/bond/main.c
index 4e8eeb7a5e1b..0fd968c44b4a 100644
--- a/examples/bond/main.c
+++ b/examples/bond/main.c
@@ -51,7 +51,7 @@
 
 #define NB_MBUF   (1024*8)
 
-#define MAX_PKT_BURST 32
+#define MAX_PKT_BURST	     RTE_MBUF_BURST_SIZE_DEFAULT
 #define BURST_TX_DRAIN_US 100      /* TX drain every ~100us */
 #define BURST_RX_INTERVAL_NS (10) /* RX poll interval ~100ns */
 
diff --git a/examples/distributor/main.c b/examples/distributor/main.c
index ea44939fba04..d60de85a369f 100644
--- a/examples/distributor/main.c
+++ b/examples/distributor/main.c
@@ -23,10 +23,10 @@
 #define TX_RING_SIZE 1024
 #define NUM_MBUFS ((64*1024)-1)
 #define MBUF_CACHE_SIZE 128
-#define BURST_SIZE 64
+#define BURST_SIZE	 RTE_MBUF_BURST_SIZE_DEFAULT
 #define SCHED_RX_RING_SZ 8192
 #define SCHED_TX_RING_SZ 65536
-#define BURST_SIZE_TX 32
+#define BURST_SIZE_TX	 RTE_MBUF_BURST_SIZE_DEFAULT
 
 #define RTE_LOGTYPE_DISTRAPP RTE_LOGTYPE_USER1
 
diff --git a/examples/dma/dmafwd.c b/examples/dma/dmafwd.c
index 5ba0aaa40b21..d282db911dc5 100644
--- a/examples/dma/dmafwd.c
+++ b/examples/dma/dmafwd.c
@@ -15,7 +15,7 @@
 
 /* size of ring used for software copying between rx and tx. */
 #define RTE_LOGTYPE_DMA RTE_LOGTYPE_USER1
-#define MAX_PKT_BURST 32
+#define MAX_PKT_BURST		     RTE_MBUF_BURST_SIZE_DEFAULT
 #define MEMPOOL_CACHE_SIZE 512
 #define MIN_POOL_SIZE 65536U
 #define CMD_LINE_OPT_PORTMASK_INDEX 1
diff --git a/examples/ethtool/ethtool-app/main.c b/examples/ethtool/ethtool-app/main.c
index b6bbae70d29f..cbfb92ce4d94 100644
--- a/examples/ethtool/ethtool-app/main.c
+++ b/examples/ethtool/ethtool-app/main.c
@@ -19,7 +19,7 @@
 #include "ethapp.h"
 
 #define MAX_PORTS RTE_MAX_ETHPORTS
-#define MAX_BURST_LENGTH 32
+#define MAX_BURST_LENGTH   RTE_MBUF_BURST_SIZE_DEFAULT
 #define PORT_RX_QUEUE_SIZE 1024
 #define PORT_TX_QUEUE_SIZE 1024
 #define PKTPOOL_EXTRA_SIZE 512
diff --git a/examples/ip_fragmentation/main.c b/examples/ip_fragmentation/main.c
index 218068237331..aa944a353233 100644
--- a/examples/ip_fragmentation/main.c
+++ b/examples/ip_fragmentation/main.c
@@ -74,7 +74,7 @@
 
 #define NB_MBUF   8192
 
-#define MAX_PKT_BURST	32
+#define MAX_PKT_BURST	  RTE_MBUF_BURST_SIZE_DEFAULT
 #define BURST_TX_DRAIN_US 100 /* TX drain every ~100us */
 
 /* Configure how many packets ahead to prefetch, when reading packets */
diff --git a/examples/ip_reassembly/main.c b/examples/ip_reassembly/main.c
index 520fbea1c2ec..20858643f721 100644
--- a/examples/ip_reassembly/main.c
+++ b/examples/ip_reassembly/main.c
@@ -43,8 +43,7 @@
 
 #include <rte_ip_frag.h>
 
-#define MAX_PKT_BURST 32
-
+#define MAX_PKT_BURST RTE_MBUF_BURST_SIZE_DEFAULT
 
 #define RTE_LOGTYPE_IP_RSMBL RTE_LOGTYPE_USER1
 
diff --git a/examples/ipsec-secgw/ipsec-secgw.h b/examples/ipsec-secgw/ipsec-secgw.h
index b4ef4b6d04bc..191b60c8e2ee 100644
--- a/examples/ipsec-secgw/ipsec-secgw.h
+++ b/examples/ipsec-secgw/ipsec-secgw.h
@@ -11,8 +11,8 @@
 
 #define NB_SOCKETS 4
 
-#define MAX_PKT_BURST 32
-#define MAX_PKT_BURST_VEC 256
+#define MAX_PKT_BURST	  RTE_MBUF_BURST_SIZE_DEFAULT
+#define MAX_PKT_BURST_VEC RTE_MBUF_BURST_SIZE_DEFAULT
 
 #define MAX_PKTS                                  \
 	((MAX_PKT_BURST_VEC > MAX_PKT_BURST ?     \
diff --git a/examples/ipv4_multicast/main.c b/examples/ipv4_multicast/main.c
index bd4c3f335be0..1ea330c5e4a6 100644
--- a/examples/ipv4_multicast/main.c
+++ b/examples/ipv4_multicast/main.c
@@ -53,7 +53,7 @@
 /* allow max jumbo frame 9.5 KB */
 #define	JUMBO_FRAME_MAX_SIZE	0x2600
 
-#define MAX_PKT_BURST 32
+#define MAX_PKT_BURST	  RTE_MBUF_BURST_SIZE_DEFAULT
 #define BURST_TX_DRAIN_US 100 /* TX drain every ~100us */
 
 /* Configure how many packets ahead to prefetch, when reading packets */
diff --git a/examples/l2fwd-cat/l2fwd-cat.c b/examples/l2fwd-cat/l2fwd-cat.c
index 6e16705e9931..d79e9b0a29b3 100644
--- a/examples/l2fwd-cat/l2fwd-cat.c
+++ b/examples/l2fwd-cat/l2fwd-cat.c
@@ -17,7 +17,7 @@
 
 #define NUM_MBUFS 8191
 #define MBUF_CACHE_SIZE 250
-#define BURST_SIZE 32
+#define BURST_SIZE	RTE_MBUF_BURST_SIZE_DEFAULT
 
 /* l2fwd-cat.c: CAT enabled, basic DPDK skeleton forwarding example. */
 
diff --git a/examples/l2fwd-crypto/main.c b/examples/l2fwd-crypto/main.c
index a441312f5524..4c27bb7d78e3 100644
--- a/examples/l2fwd-crypto/main.c
+++ b/examples/l2fwd-crypto/main.c
@@ -61,7 +61,7 @@ enum cdev_type {
 #define MAX_KEY_SIZE 128
 #define MAX_IV_SIZE 16
 #define MAX_AAD_SIZE 65535
-#define MAX_PKT_BURST 32
+#define MAX_PKT_BURST		RTE_MBUF_BURST_SIZE_DEFAULT
 #define BURST_TX_DRAIN_US 100 /* TX drain every ~100us */
 #define SESSION_POOL_CACHE_SIZE 0
 
diff --git a/examples/l2fwd-event/l2fwd_common.h b/examples/l2fwd-event/l2fwd_common.h
index f4f1c45cd16b..f8fee3e45963 100644
--- a/examples/l2fwd-event/l2fwd_common.h
+++ b/examples/l2fwd-event/l2fwd_common.h
@@ -41,7 +41,7 @@
 #include <rte_mbuf.h>
 #include <rte_spinlock.h>
 
-#define MAX_PKT_BURST 32
+#define MAX_PKT_BURST	       RTE_MBUF_BURST_SIZE_DEFAULT
 #define MAX_RX_QUEUE_PER_LCORE 16
 #define MAX_TX_QUEUE_PER_PORT 16
 
diff --git a/examples/l2fwd-jobstats/main.c b/examples/l2fwd-jobstats/main.c
index a7cd5b4840d5..a2531c6a6214 100644
--- a/examples/l2fwd-jobstats/main.c
+++ b/examples/l2fwd-jobstats/main.c
@@ -38,7 +38,7 @@
 
 #define NB_MBUF   8192
 
-#define MAX_PKT_BURST 32
+#define MAX_PKT_BURST	  RTE_MBUF_BURST_SIZE_DEFAULT
 #define BURST_TX_DRAIN_US 100 /* TX drain every ~100us */
 
 /*
diff --git a/examples/l2fwd-keepalive/main.c b/examples/l2fwd-keepalive/main.c
index 993e0bf9dacc..0e384266f510 100644
--- a/examples/l2fwd-keepalive/main.c
+++ b/examples/l2fwd-keepalive/main.c
@@ -42,7 +42,7 @@
 
 #define NB_MBUF_PER_PORT 3000
 
-#define MAX_PKT_BURST 32
+#define MAX_PKT_BURST	  RTE_MBUF_BURST_SIZE_DEFAULT
 #define BURST_TX_DRAIN_US 100 /* TX drain every ~100us */
 
 /*
diff --git a/examples/l2fwd-macsec/main.c b/examples/l2fwd-macsec/main.c
index 98763440bc7a..957c2a900c97 100644
--- a/examples/l2fwd-macsec/main.c
+++ b/examples/l2fwd-macsec/main.c
@@ -48,7 +48,7 @@ static int promiscuous_on = 1;
 
 #define RTE_LOGTYPE_L2FWD RTE_LOGTYPE_USER1
 
-#define MAX_PKT_BURST 32
+#define MAX_PKT_BURST		RTE_MBUF_BURST_SIZE_DEFAULT
 #define BURST_TX_DRAIN_US 100 /* TX drain every ~100us */
 #define MEMPOOL_CACHE_SIZE 256
 #define SESSION_POOL_CACHE_SIZE 0
diff --git a/examples/l2fwd/main.c b/examples/l2fwd/main.c
index 59ea3172aee5..1feda5a1d43a 100644
--- a/examples/l2fwd/main.c
+++ b/examples/l2fwd/main.c
@@ -47,7 +47,7 @@ static int promiscuous_on;
 
 #define RTE_LOGTYPE_L2FWD RTE_LOGTYPE_USER1
 
-#define MAX_PKT_BURST 32
+#define MAX_PKT_BURST	   RTE_MBUF_BURST_SIZE_DEFAULT
 #define BURST_TX_DRAIN_US 100 /* TX drain every ~100us */
 #define MEMPOOL_CACHE_SIZE 256
 
diff --git a/examples/l3fwd-power/main.c b/examples/l3fwd-power/main.c
index 02ec17d79963..8cc099132057 100644
--- a/examples/l3fwd-power/main.c
+++ b/examples/l3fwd-power/main.c
@@ -54,7 +54,7 @@
 RTE_LOG_REGISTER(l3fwd_power_logtype, l3fwd.power, INFO);
 #define RTE_LOGTYPE_L3FWD_POWER l3fwd_power_logtype
 
-#define MAX_PKT_BURST 32
+#define MAX_PKT_BURST RTE_MBUF_BURST_SIZE_DEFAULT
 
 #define MIN_ZERO_POLL_COUNT 10
 
diff --git a/examples/l3fwd/l3fwd.h b/examples/l3fwd/l3fwd.h
index 471e3b488fe6..7979249e726a 100644
--- a/examples/l3fwd/l3fwd.h
+++ b/examples/l3fwd/l3fwd.h
@@ -23,14 +23,14 @@
 #define RX_DESC_DEFAULT 1024
 #define TX_DESC_DEFAULT 1024
 
-#define DEFAULT_PKT_BURST 32
+#define DEFAULT_PKT_BURST RTE_MBUF_BURST_SIZE_DEFAULT
 #define MAX_PKT_BURST 512
 #define BURST_TX_DRAIN_US 100 /* TX drain every ~100us */
 
 #define MEMPOOL_CACHE_SIZE RTE_MEMPOOL_CACHE_MAX_SIZE
 #define MAX_RX_QUEUE_PER_LCORE 16
 
-#define VECTOR_SIZE_DEFAULT   MAX_PKT_BURST
+#define VECTOR_SIZE_DEFAULT   RTE_MBUF_BURST_SIZE_DEFAULT
 #define VECTOR_TMO_NS_DEFAULT 1E6 /* 1ms */
 
 #define NB_SOCKETS        8
diff --git a/examples/l3fwd/main.c b/examples/l3fwd/main.c
index 4c641947949a..5fdd7f2bbc8f 100644
--- a/examples/l3fwd/main.c
+++ b/examples/l3fwd/main.c
@@ -1073,17 +1073,26 @@ parse_args(int argc, char **argv)
 		return -1;
 	}
 
-	if (evt_rsrc->vector_enabled && !evt_rsrc->vector_size) {
-		evt_rsrc->vector_size = VECTOR_SIZE_DEFAULT;
-		fprintf(stderr, "vector size set to default (%" PRIu16 ")\n",
-			evt_rsrc->vector_size);
+	if (evt_rsrc->vector_enabled) {
+		if (!evt_rsrc->vector_size) {
+			evt_rsrc->vector_size = VECTOR_SIZE_DEFAULT;
+			fprintf(stderr, "vector size set to default (%" PRIu16 ")\n",
+				evt_rsrc->vector_size);
+		} else {
+			fprintf(stderr, "vector size set to (%" PRIu16 ")\n",
+				evt_rsrc->vector_size);
+		}
 	}
 
-	if (evt_rsrc->vector_enabled && !evt_rsrc->vector_tmo_ns) {
-		evt_rsrc->vector_tmo_ns = VECTOR_TMO_NS_DEFAULT;
-		fprintf(stderr,
-			"vector timeout set to default (%" PRIu64 " ns)\n",
-			evt_rsrc->vector_tmo_ns);
+	if (evt_rsrc->vector_enabled) {
+		if (!evt_rsrc->vector_tmo_ns) {
+			evt_rsrc->vector_tmo_ns = VECTOR_TMO_NS_DEFAULT;
+			fprintf(stderr, "vector timeout set to default (%" PRIu64 " ns)\n",
+				evt_rsrc->vector_tmo_ns);
+		} else {
+			fprintf(stderr, "vector timeout set to (%" PRIu64 " ns)\n",
+				evt_rsrc->vector_tmo_ns);
+		}
 	}
 #endif
 
diff --git a/examples/link_status_interrupt/main.c b/examples/link_status_interrupt/main.c
index aa33e71d7aa5..0adc93153bd5 100644
--- a/examples/link_status_interrupt/main.c
+++ b/examples/link_status_interrupt/main.c
@@ -38,7 +38,7 @@
 
 #define NB_MBUF   8192
 
-#define MAX_PKT_BURST 32
+#define MAX_PKT_BURST	  RTE_MBUF_BURST_SIZE_DEFAULT
 #define BURST_TX_DRAIN_US 100 /* TX drain every ~100us */
 
 /*
@@ -60,7 +60,7 @@ static unsigned int lsi_rx_queue_per_lcore = 1;
 /* destination port for L2 forwarding */
 static unsigned lsi_dst_ports[RTE_MAX_ETHPORTS] = {0};
 
-#define MAX_PKT_BURST 32
+#define MAX_PKT_BURST RTE_MBUF_BURST_SIZE_DEFAULT
 
 #define MAX_RX_QUEUE_PER_LCORE 16
 #define MAX_TX_QUEUE_PER_PORT 16
diff --git a/examples/multi_process/symmetric_mp/main.c b/examples/multi_process/symmetric_mp/main.c
index 7314a9c6ea83..f10b66963f3d 100644
--- a/examples/multi_process/symmetric_mp/main.c
+++ b/examples/multi_process/symmetric_mp/main.c
@@ -45,7 +45,7 @@
 
 #define NB_MBUFS 64*1024 /* use 64k mbufs */
 #define MBUF_CACHE_SIZE 256
-#define PKT_BURST 32
+#define PKT_BURST	RTE_MBUF_BURST_SIZE_DEFAULT
 #define RX_RING_SIZE 1024
 #define TX_RING_SIZE 1024
 
diff --git a/examples/ntb/ntb_fwd.c b/examples/ntb/ntb_fwd.c
index 33f3c1ef17e4..330693c13b38 100644
--- a/examples/ntb/ntb_fwd.c
+++ b/examples/ntb/ntb_fwd.c
@@ -83,8 +83,8 @@ static uint16_t nb_desc = NTB_DEFAULT_NUM_DESCS;
 
 static uint16_t tx_free_thresh;
 
-#define NTB_MAX_PKT_BURST 32
-#define NTB_DFLT_PKT_BURST 32
+#define NTB_MAX_PKT_BURST  RTE_MBUF_BURST_SIZE_DEFAULT
+#define NTB_DFLT_PKT_BURST RTE_MBUF_BURST_SIZE_DEFAULT
 static uint16_t pkt_burst = NTB_DFLT_PKT_BURST;
 
 #define BURST_TX_RETRIES 64
diff --git a/examples/packet_ordering/main.c b/examples/packet_ordering/main.c
index 5ffdf72d71ab..7bbd8133e570 100644
--- a/examples/packet_ordering/main.c
+++ b/examples/packet_ordering/main.c
@@ -21,7 +21,7 @@
 #define RX_DESC_PER_QUEUE 1024
 #define TX_DESC_PER_QUEUE 1024
 
-#define MAX_PKTS_BURST 32
+#define MAX_PKTS_BURST	     RTE_MBUF_BURST_SIZE_DEFAULT
 #define REORDER_BUFFER_SIZE 8192
 #define MBUF_PER_POOL 65535
 #define MBUF_POOL_CACHE_SIZE 250
diff --git a/examples/qos_meter/main.c b/examples/qos_meter/main.c
index da1b0b228787..9e493ac4e79f 100644
--- a/examples/qos_meter/main.c
+++ b/examples/qos_meter/main.c
@@ -76,8 +76,8 @@ static struct rte_eth_conf port_conf = {
  * Packet RX/TX
  *
  ***/
-#define RTE_MBUF_F_RX_BURST_MAX                32
-#define RTE_MBUF_F_TX_BURST_MAX                32
+#define RTE_MBUF_F_RX_BURST_MAX		RTE_MBUF_BURST_SIZE_DEFAULT
+#define RTE_MBUF_F_TX_BURST_MAX		RTE_MBUF_BURST_SIZE_DEFAULT
 #define TIME_TX_DRAIN                   200000ULL
 
 static uint16_t port_rx;
diff --git a/examples/qos_sched/main.h b/examples/qos_sched/main.h
index ea66df0434fb..3cac415003d4 100644
--- a/examples/qos_sched/main.h
+++ b/examples/qos_sched/main.h
@@ -24,10 +24,10 @@ extern "C" {
 #define APP_RING_SIZE (8*1024)
 #define NB_MBUF   (2*1024*1024)
 
-#define MAX_PKT_RX_BURST 64
+#define MAX_PKT_RX_BURST RTE_MBUF_BURST_SIZE_DEFAULT
 #define PKT_ENQUEUE 64
 #define PKT_DEQUEUE 63
-#define MAX_PKT_TX_BURST 64
+#define MAX_PKT_TX_BURST RTE_MBUF_BURST_SIZE_DEFAULT
 
 #define RX_PTHRESH 8 /**< Default values of RX prefetch threshold reg. */
 #define RX_HTHRESH 8 /**< Default values of RX host threshold reg. */
diff --git a/examples/rxtx_callbacks/main.c b/examples/rxtx_callbacks/main.c
index 4682921285de..774516994e31 100644
--- a/examples/rxtx_callbacks/main.c
+++ b/examples/rxtx_callbacks/main.c
@@ -19,7 +19,7 @@
 
 #define NUM_MBUFS 8191
 #define MBUF_CACHE_SIZE 250
-#define BURST_SIZE 32
+#define BURST_SIZE	RTE_MBUF_BURST_SIZE_DEFAULT
 
 static int hwts_dynfield_offset = -1;
 
diff --git a/examples/skeleton/basicfwd.c b/examples/skeleton/basicfwd.c
index 133293cf15bb..22e2576bee54 100644
--- a/examples/skeleton/basicfwd.c
+++ b/examples/skeleton/basicfwd.c
@@ -16,7 +16,7 @@
 
 #define NUM_MBUFS 8191
 #define MBUF_CACHE_SIZE 250
-#define BURST_SIZE 32
+#define BURST_SIZE	RTE_MBUF_BURST_SIZE_DEFAULT
 
 /* basicfwd.c: Basic DPDK skeleton forwarding example. */
 
diff --git a/examples/vhost/main.h b/examples/vhost/main.h
index c986cbc5a994..e684f8b3ed16 100644
--- a/examples/vhost/main.h
+++ b/examples/vhost/main.h
@@ -17,7 +17,7 @@
 
 enum {VIRTIO_RXQ, VIRTIO_TXQ, VIRTIO_QNUM};
 
-#define MAX_PKT_BURST 32		/* Max burst size for RX/TX */
+#define MAX_PKT_BURST RTE_MBUF_BURST_SIZE_DEFAULT /* Max burst size for RX/TX */
 
 struct device_statistics {
 	uint64_t	tx;
diff --git a/examples/vhost_crypto/main.c b/examples/vhost_crypto/main.c
index 8bdfc40c4b20..37a7b9cc18dc 100644
--- a/examples/vhost_crypto/main.c
+++ b/examples/vhost_crypto/main.c
@@ -23,7 +23,7 @@
 #include <cmdline.h>
 
 #define NB_VIRTIO_QUEUES		(1)
-#define MAX_PKT_BURST			(64)
+#define MAX_PKT_BURST			(RTE_MBUF_BURST_SIZE_DEFAULT)
 #define MAX_IV_LEN			(32)
 #define NB_MEMPOOL_OBJS			(8192)
 #define NB_CRYPTO_DESCRIPTORS		(4096)
diff --git a/examples/vm_power_manager/main.c b/examples/vm_power_manager/main.c
index c14138202004..11a412e41e2d 100644
--- a/examples/vm_power_manager/main.c
+++ b/examples/vm_power_manager/main.c
@@ -45,7 +45,7 @@
 
 #define NUM_MBUFS 8191
 #define MBUF_CACHE_SIZE 250
-#define BURST_SIZE 32
+#define BURST_SIZE	RTE_MBUF_BURST_SIZE_DEFAULT
 
 static uint32_t enabled_port_mask;
 static volatile bool force_quit;
diff --git a/examples/vmdq/main.c b/examples/vmdq/main.c
index 12ef5bffc2e6..eee44ff5c9ec 100644
--- a/examples/vmdq/main.c
+++ b/examples/vmdq/main.c
@@ -41,7 +41,7 @@
 						TX_DESC_DEFAULT))
 #define MBUF_CACHE_SIZE 64
 
-#define MAX_PKT_BURST 32
+#define MAX_PKT_BURST RTE_MBUF_BURST_SIZE_DEFAULT
 
 /*
  * Configurable number of RX/TX ring descriptors
diff --git a/examples/vmdq_dcb/main.c b/examples/vmdq_dcb/main.c
index 6eccee086d82..d60261d535a8 100644
--- a/examples/vmdq_dcb/main.c
+++ b/examples/vmdq_dcb/main.c
@@ -42,7 +42,7 @@
 						TX_DESC_DEFAULT))
 #define MBUF_CACHE_SIZE 64
 
-#define MAX_PKT_BURST 32
+#define MAX_PKT_BURST RTE_MBUF_BURST_SIZE_DEFAULT
 
 /*
  * Configurable number of RX/TX ring descriptors
-- 
2.50.1 (Apple Git-155)


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 2/2] examples: use default mbuf burst size
  2026-02-20 23:07   ` [PATCH 2/2] examples: use default mbuf burst size pbhagavatula
@ 2026-02-21 16:26     ` Stephen Hemminger
  0 siblings, 0 replies; 12+ messages in thread
From: Stephen Hemminger @ 2026-02-21 16:26 UTC (permalink / raw)
  To: pbhagavatula
  Cc: mb, jerinj, Wisam Jaddo, Aman Singh, Chas Williams,
	Min Hu (Connor), Akhil Goyal, Anoob Joseph, Nicolas Chautru,
	David Hunt, Chengwen Feng, Kevin Laatz, Bruce Richardson,
	Konstantin Ananyev, Radu Nicolau, Tomasz Kantecki, Fan Zhang,
	Sunil Kumar Kori, Anatoly Burakov, Sivaprasad Tummala,
	Jingjing Wu, Volodymyr Fialko, Cristian Dumitrescu, John McNamara,
	Maxime Coquelin, Chenbo Xia, dev

On Sat, 21 Feb 2026 04:37:44 +0530
<pbhagavatula@marvell.com> wrote:

> From: Pavan Nikhilesh <pbhagavatula@marvell.com>
> 
> Replace hardcoded burst sizes with RTE_MBUF_BURST_SIZE_DEFAULT
> to adapt to platform-specific optimal burst sizes.
> 
> Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
> ---

This is way too broad. See the deeper AI analysis below for all the places
where this patch oversteps.

It should only touch examples, and never change the MAX values.
It should not change any drivers.
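Below is a minimal sketch of that split. It assumes an example that sizes its arrays with MAX_PKT_BURST. Only the default follows the platform, and MAX stays a fixed array bound. Two parts are illustrative assumptions, not existing DPDK code: the `#ifndef` fallback of 32, which lets the file build where the config macro from patch 1/2 is absent, and the `pkts_burst_sketch` names.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed fallback for builds without the config macro from patch 1/2. */
#ifndef RTE_MBUF_BURST_SIZE_DEFAULT
#define RTE_MBUF_BURST_SIZE_DEFAULT 32
#endif

/* MAX stays a fixed array bound; only the default tracks the platform. */
#define MAX_PKT_BURST 512
#define DEF_PKT_BURST RTE_MBUF_BURST_SIZE_DEFAULT

/* Fail the build, not the run, if a platform default outgrows the arrays. */
static_assert(DEF_PKT_BURST <= MAX_PKT_BURST,
	      "burst default must fit in MAX_PKT_BURST-sized arrays");

/* Hypothetical per-lcore state: capacity from MAX, burst count from DEF. */
struct pkts_burst_sketch {
	void *pkts[MAX_PKT_BURST];
	uint16_t nb_burst;
};

static void
pkts_burst_sketch_init(struct pkts_burst_sketch *b)
{
	b->nb_burst = DEF_PKT_BURST;
}
```

If a platform ever set the default above 512, the static_assert would stop the build rather than let a burst overrun an array at run time.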

-----
Patch 2/2: examples: use default mbuf burst size
Error: Semantic mismatch — evt_options.vector_size set to burst size macro
In app/test-eventdev/evt_options.c:

-	opt->vector_size = 64;
+	opt->vector_size = RTE_MBUF_BURST_SIZE_DEFAULT;

The original hardcoded value of 64 for vector_size is an event vector size, not an mbuf burst size. The two serve different purposes: event vectors aggregate events for batch processing and can legitimately be larger than the Rx/Tx burst size. Replacing 64 with RTE_MBUF_BURST_SIZE_DEFAULT (which defaults to 32 in throughput mode) silently halves the default event vector size on all non-CN10K platforms. This is a functional regression that could degrade event-mode throughput. The original value should remain 64, or a separate RTE_EVENT_VECTOR_SIZE_DEFAULT macro should be introduced.
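The following sketch keeps the two defaults independent. RTE_EVENT_VECTOR_SIZE_DEFAULT is the hypothetical macro named in this finding, not an existing DPDK symbol. `evt_options_sketch` stands in for struct evt_options, and its `burst_size` field is illustrative only. The fallback values (64 and 32) are the ones quoted in this thread.

```c
#include <stdint.h>

/* Hypothetical macro suggested in the review; 64 keeps the old value. */
#ifndef RTE_EVENT_VECTOR_SIZE_DEFAULT
#define RTE_EVENT_VECTOR_SIZE_DEFAULT 64
#endif

/* Assumed fallback matching the generic burst default quoted above. */
#ifndef RTE_MBUF_BURST_SIZE_DEFAULT
#define RTE_MBUF_BURST_SIZE_DEFAULT 32
#endif

/* Stand-in for struct evt_options; burst_size is illustrative only. */
struct evt_options_sketch {
	uint16_t vector_size; /* events aggregated per event vector */
	uint16_t burst_size;  /* mbufs per Rx/Tx burst */
};

static void
evt_options_sketch_default(struct evt_options_sketch *opt)
{
	/* Each knob gets its own default, so tuning one leaves the other. */
	opt->vector_size = RTE_EVENT_VECTOR_SIZE_DEFAULT;
	opt->burst_size = RTE_MBUF_BURST_SIZE_DEFAULT;
}
```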

Error: MAX_PKT_BURST_VEC reduced from 256 to burst default
In examples/ipsec-secgw/ipsec-secgw.h:

-#define MAX_PKT_BURST_VEC 256
+#define MAX_PKT_BURST_VEC RTE_MBUF_BURST_SIZE_DEFAULT

MAX_PKT_BURST_VEC was intentionally set to 256 for vectorized IPsec processing, which handles packets in large batches. Reducing it to 32 (the default throughput burst size) is a significant behavioral change that will likely hurt vectorized IPsec performance. The macro also feeds into the MAX_PKTS calculation immediately below it, so the reduction cascades. Either keep 256 or use a separate vector-specific configuration.
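This sketch decouples the scalar and vector bursts. The hunk above shows only part of the real MAX_PKTS expression. `SKETCH_MAX_PKTS` is therefore a simplified, hypothetical stand-in that sizes buffers for the larger of the two bursts. The static_assert encodes an assumption (vector batches are never smaller than scalar bursts), not a documented rule.

```c
#include <assert.h>

/* Assumed fallback for builds without the new config macro. */
#ifndef RTE_MBUF_BURST_SIZE_DEFAULT
#define RTE_MBUF_BURST_SIZE_DEFAULT 32
#endif

/* Scalar path follows the platform default; vector path keeps 256. */
#define MAX_PKT_BURST     RTE_MBUF_BURST_SIZE_DEFAULT
#define MAX_PKT_BURST_VEC 256

/* Simplified, hypothetical stand-in for MAX_PKTS: hold the larger burst. */
#define SKETCH_MAX_PKTS \
	(MAX_PKT_BURST_VEC > MAX_PKT_BURST ? MAX_PKT_BURST_VEC : MAX_PKT_BURST)

/* Assumption, not a documented rule: vector batches >= scalar bursts. */
static_assert(MAX_PKT_BURST_VEC >= MAX_PKT_BURST,
	      "vector burst should not be smaller than the scalar burst");
```

With the patch as posted, both macros would resolve to 32, so this larger-of-two size would drop from 256 to 32.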

Error: MAX_PKT_BURST and DEF_PKT_BURST collapsed to same value in bonding tests
In app/test/test_link_bonding_mode4.c:

-#define MAX_PKT_BURST           (32)
-#define DEF_PKT_BURST           (16)
+#define MAX_PKT_BURST		(RTE_MBUF_BURST_SIZE_DEFAULT)
+#define DEF_PKT_BURST		(RTE_MBUF_BURST_SIZE_DEFAULT)

Originally, MAX_PKT_BURST (32) and DEF_PKT_BURST (16) were intentionally different: the max is a buffer/array size limit, and the default is the burst count actually used. Setting both to the same value means the test now always bursts at the full array size, with no headroom. Any code path that allocates MAX_PKT_BURST elements and bursts DEF_PKT_BURST loses this distinction. At minimum, DEF_PKT_BURST should remain smaller than MAX_PKT_BURST.

Similarly, in app/test/test_link_bonding.c the original DEF_PKT_BURST was 16 and now becomes RTE_MBUF_BURST_SIZE_DEFAULT (32). MAX_PKT_BURST in that file is 512, so the max/default headroom survives there, but the default burst used by the tests silently doubles.

Warning: NTB_MAX_PKT_BURST and NTB_DFLT_PKT_BURST collapsed
In examples/ntb/ntb_fwd.c:

-#define NTB_MAX_PKT_BURST 32
-#define NTB_DFLT_PKT_BURST 32
+#define NTB_MAX_PKT_BURST  RTE_MBUF_BURST_SIZE_DEFAULT
+#define NTB_DFLT_PKT_BURST RTE_MBUF_BURST_SIZE_DEFAULT

These were already the same value (32), so there is no regression today. However, separate MAX and DEFAULT macros exist so that the two can diverge. Replacing both with the same configurable macro permanently ties them together. Consider using RTE_MBUF_BURST_SIZE_DEFAULT only for the default and keeping MAX as a separate (potentially larger) constant.
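This sketch shows the suggested arrangement. It assumes that NTB_MAX_PKT_BURST bounds the per-queue arrays and that the burst can be overridden at run time. `ntb_pick_burst` is a hypothetical helper, not a function in ntb_fwd.c, and the fallback default of 32 is an assumption.

```c
#include <stdint.h>

/* Assumed fallback for builds without the new config macro. */
#ifndef RTE_MBUF_BURST_SIZE_DEFAULT
#define RTE_MBUF_BURST_SIZE_DEFAULT 32
#endif

/* MAX bounds the arrays and stays fixed (32 mirrors today's value);
 * only the default follows the platform. */
#define NTB_MAX_PKT_BURST  32
#define NTB_DFLT_PKT_BURST RTE_MBUF_BURST_SIZE_DEFAULT

/* Hypothetical helper: 0 selects the default, and anything above the
 * array bound is clamped to it. */
static uint16_t
ntb_pick_burst(uint16_t requested)
{
	if (requested == 0)
		requested = NTB_DFLT_PKT_BURST;
	return requested > NTB_MAX_PKT_BURST ? NTB_MAX_PKT_BURST : requested;
}
```

On a platform whose default is 64, `ntb_pick_burst(0)` would be clamped to 32. That clamp is one reason to let MAX grow independently of the default.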

Warning: RTE_MBUF_F_RX_BURST_MAX / RTE_MBUF_F_TX_BURST_MAX naming confusion
In examples/qos_meter/main.c:

-#define RTE_MBUF_F_RX_BURST_MAX                32
-#define RTE_MBUF_F_TX_BURST_MAX                32
+#define RTE_MBUF_F_RX_BURST_MAX		RTE_MBUF_BURST_SIZE_DEFAULT
+#define RTE_MBUF_F_TX_BURST_MAX		RTE_MBUF_BURST_SIZE_DEFAULT

The RTE_MBUF_F_ prefix is reserved for mbuf flags (e.g., RTE_MBUF_F_RX_RSS_HASH). These macros are local to the example and are not real mbuf flags, but the naming is confusing. This is a pre-existing issue. Since this patch already touches these lines, it is a good opportunity to rename them to something like QOS_RX_BURST_MAX/QOS_TX_BURST_MAX to avoid namespace confusion.
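This sketch shows the rename. QOS_RX_BURST_MAX and QOS_TX_BURST_MAX are the names proposed in this finding. `qos_bufs_sketch` is a hypothetical structure that shows how the renamed macros would size buffers. struct rte_mbuf is only forward-declared here, so the fragment builds without DPDK headers.

```c
/* Assumed fallback for builds without the new config macro. */
#ifndef RTE_MBUF_BURST_SIZE_DEFAULT
#define RTE_MBUF_BURST_SIZE_DEFAULT 32
#endif

/* Example-local names, clear of the RTE_MBUF_F_ flag namespace. */
#define QOS_RX_BURST_MAX RTE_MBUF_BURST_SIZE_DEFAULT
#define QOS_TX_BURST_MAX RTE_MBUF_BURST_SIZE_DEFAULT

/* Opaque here; the real definition comes from rte_mbuf.h. */
struct rte_mbuf;

/* Hypothetical Rx/Tx buffers sized by the renamed macros. */
struct qos_bufs_sketch {
	struct rte_mbuf *rx[QOS_RX_BURST_MAX];
	struct rte_mbuf *tx[QOS_TX_BURST_MAX];
};
```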

Warning: PKT_ENQUEUE and PKT_DEQUEUE not updated in qos_sched
In examples/qos_sched/main.h:

 #define MAX_PKT_RX_BURST RTE_MBUF_BURST_SIZE_DEFAULT
 #define PKT_ENQUEUE 64
 #define PKT_DEQUEUE 63
 #define MAX_PKT_TX_BURST RTE_MBUF_BURST_SIZE_DEFAULT

MAX_PKT_RX_BURST and MAX_PKT_TX_BURST change from 64 to RTE_MBUF_BURST_SIZE_DEFAULT (32), but PKT_ENQUEUE stays hardcoded at 64 and PKT_DEQUEUE at 63. The enqueue size is now twice the Rx burst size. These values appear to be related, because Rx bursts feed an enqueue batch. If so, they should be updated together, or the relationship should be documented.
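This sketch is one way to document the relationship. It assumes, as the finding does, that an enqueue batch is built from whole Rx bursts, and it lets the compiler check that assumption. `RX_BURSTS_PER_ENQUEUE` is a hypothetical name, and the fallback of 32 is an assumption.

```c
#include <assert.h>

/* Assumed fallback for builds without the new config macro. */
#ifndef RTE_MBUF_BURST_SIZE_DEFAULT
#define RTE_MBUF_BURST_SIZE_DEFAULT 32
#endif

#define MAX_PKT_RX_BURST RTE_MBUF_BURST_SIZE_DEFAULT
#define PKT_ENQUEUE 64

/* Assumed relationship, checked at build time so a platform change
 * cannot break it silently: an enqueue batch is whole Rx bursts. */
static_assert(PKT_ENQUEUE % MAX_PKT_RX_BURST == 0,
	      "PKT_ENQUEUE should be a multiple of MAX_PKT_RX_BURST");

/* Hypothetical name for the documented ratio. */
#define RX_BURSTS_PER_ENQUEUE (PKT_ENQUEUE / MAX_PKT_RX_BURST)
```

With a default of 64 the ratio becomes 1. With a default that does not divide 64 (say 48), the build fails instead of mismatching silently.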

Warning: fprintf(stderr, ...) in example code for non-error messages
In examples/l3fwd/main.c, the added else branches print to stderr even when the user explicitly provided a value:

} else {
    fprintf(stderr, "vector size set to (%" PRIu16 ")\n",
        evt_rsrc->vector_size);
}

Using stderr for informational (non-error) messages is unconventional. This is example code, so it is not critical, but printf or RTE_LOG would be more appropriate for confirming a user-supplied setting.
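This sketch folds the duplicated l3fwd branches into one helper that reports on stdout. `resolve_vector_size` is a hypothetical name. printf stands in for RTE_LOG so that the fragment builds without DPDK. In the example itself, RTE_LOG with a suitable log type would be the more idiomatic choice.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed fallback; the patch maps VECTOR_SIZE_DEFAULT to this macro. */
#ifndef RTE_MBUF_BURST_SIZE_DEFAULT
#define RTE_MBUF_BURST_SIZE_DEFAULT 32
#endif
#define VECTOR_SIZE_DEFAULT RTE_MBUF_BURST_SIZE_DEFAULT

/* Hypothetical helper: resolve the vector size once and report the
 * result on stdout, since it is informational rather than an error. */
static uint16_t
resolve_vector_size(uint16_t requested)
{
	if (requested == 0) {
		printf("vector size set to default (%" PRIu16 ")\n",
		       (uint16_t)VECTOR_SIZE_DEFAULT);
		return VECTOR_SIZE_DEFAULT;
	}
	printf("vector size set to (%" PRIu16 ")\n", requested);
	return requested;
}
```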

Warning: Duplicate MAX_PKT_BURST definition
In examples/link_status_interrupt/main.c, MAX_PKT_BURST is defined twice (lines 41 and 63 in the original). The duplicate #define predates this patch, but since the patch already modifies both lines, it is a good opportunity to remove one of them.


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2026-02-21 16:26 UTC | newest]

Thread overview: 12+ messages
2025-11-26  8:24 [RFC 1/2] config: add optimal burst size configuration pbhagavatula
2025-11-26  8:24 ` [RFC 2/2] examples: use optimal burst size pbhagavatula
2025-11-26  9:57 ` [RFC 1/2] config: add optimal burst size configuration Morten Brørup
2025-11-26 10:58   ` Pavan Nikhilesh Bhagavatula
2025-11-26 11:00   ` Pavan Nikhilesh Bhagavatula
2026-02-02  9:57     ` Pavan Nikhilesh Bhagavatula
2025-11-27 22:01   ` Stephen Hemminger
2026-02-02  9:52     ` [EXTERNAL] " Pavan Nikhilesh Bhagavatula
2026-02-03  9:38       ` Morten Brørup
2026-02-20 23:07 ` [PATCH 1/2] config: add mbuf " pbhagavatula
2026-02-20 23:07   ` [PATCH 2/2] examples: use default mbuf burst size pbhagavatula
2026-02-21 16:26     ` Stephen Hemminger

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox