* [PATCH v1 0/5] Make VA reservation limits configurable
@ 2026-03-11 10:58 Anatoly Burakov
2026-03-11 10:58 ` [PATCH v1 1/5] eal/memory: always use one segment per memory type Anatoly Burakov
` (5 more replies)
0 siblings, 6 replies; 14+ messages in thread
From: Anatoly Burakov @ 2026-03-11 10:58 UTC (permalink / raw)
To: dev
Currently, the VA space reservation is governed by a combination of a few
values:
- Total max VA space (512G for most platforms, 1T for some, 2G for 32-bit)
- Max memory per memory type
- Max pages per memory type
A "memory type" is defined as a unique combination of NUMA node and page size.
There are two limits because, for large pages, a high segment-count limit causes
runaway multi-terabyte reservations, while for smaller pages, a high memory
limit creates hundreds of thousands of huge page slots. The total maximum
memory size was originally intended as a safeguard against discontiguous NUMA
nodes, but the EAL API has since gained explicit support for discontiguous
NUMA nodes, so this is no longer a problem.
In addition to that, each memory type was split into multiple segment lists,
with the idea that it should be easier for a secondary process to reserve
multiple smaller chunks at discontiguous addresses than it is to reserve a large
single chunk of memory. It is unknown whether this actually makes a difference,
but what *is* known is that it's a source of additional complexity with memory
reservation, as well as a source of gratuitous memory reservation limits placed
on DPDK.
This patchset attempts to simplify and improve this situation in a few key
areas:
- Get rid of global memory limits
Total memory usage can, and should, scale with NUMA sockets, and so now it does.
- Get rid of multiple segment lists per memory type
This removes two config options, and makes the address space reservations a lot
simpler.
- Allocate all memory segment lists as one big blob of memory
This further simplifies address space reservations.
- Use memory size limits instead of segments limits
Smaller page sizes still need limits on the number of segments, but these are
now translated into memory size limits at init time, so every limit the VA
space reservation code sees is expressed in bytes, not segments. This reduces
complexity in how we manage VA space reservations and work with our limits.
- Do not use config constants directly
These constants are now read only once - at startup, when we discover the
hugepage sizes available on the system. This gives us more flexibility in how
we manage these limits.
- Add EAL command-line option to set per-page size limits
The final piece of the puzzle - the "more flexible in how we manage these
limits" part. This new parameter affords us more flexible VA space management,
including disabling specific page sizes entirely (by specifying 0 as the limit).
This allows increasing/decreasing VA space reservation limits without
recompiling DPDK.
Anatoly Burakov (5):
eal/memory: always use one segment per memory type
eal/memory: allocate all VA space in one go
eal/memory: get rid of global VA space limits
eal/memory: store default segment limits in config
eal/memory: add page size VA limits EAL parameter
app/test/test.c | 1 +
app/test/test_eal_flags.c | 113 ++++++++++++
config/arm/meson.build | 1 -
config/meson.build | 5 -
config/rte_config.h | 2 -
doc/guides/linux_gsg/linux_eal_parameters.rst | 13 ++
.../prog_guide/env_abstraction_layer.rst | 33 +++-
lib/eal/common/eal_common_dynmem.c | 160 ++++++++---------
lib/eal/common/eal_common_memory.c | 29 ++-
lib/eal/common/eal_common_options.c | 120 +++++++++++++
lib/eal/common/eal_internal_cfg.h | 8 +
lib/eal/common/eal_memcfg.h | 6 +
lib/eal/common/eal_option_list.h | 1 +
lib/eal/common/eal_options.h | 1 +
lib/eal/common/eal_private.h | 15 +-
lib/eal/freebsd/eal.c | 6 +
lib/eal/freebsd/eal_memory.c | 98 +++--------
lib/eal/linux/eal.c | 6 +
lib/eal/linux/eal_memalloc.c | 2 +-
lib/eal/linux/eal_memory.c | 165 +++++++++++-------
lib/eal/windows/eal.c | 6 +
21 files changed, 542 insertions(+), 249 deletions(-)
--
2.47.3
^ permalink raw reply [flat|nested] 14+ messages in thread* [PATCH v1 1/5] eal/memory: always use one segment per memory type 2026-03-11 10:58 [PATCH v1 0/5] Make VA reservation limits configurable Anatoly Burakov @ 2026-03-11 10:58 ` Anatoly Burakov 2026-03-11 10:58 ` [PATCH v1 2/5] eal/memory: allocate all VA space in one go Anatoly Burakov ` (4 subsequent siblings) 5 siblings, 0 replies; 14+ messages in thread From: Anatoly Burakov @ 2026-03-11 10:58 UTC (permalink / raw) To: dev, Bruce Richardson Initially, the dynamic memory mode has used multiple segment lists for backing of different memory types, with the motivation being that it should be easier for secondary processes to map many smaller segments than fewer but larger ones, but in practice this does not seem to make any difference. To reduce the amount of complexity in how memory segment lists are handled, collapse the multi-list logic to always use single segment list. Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> --- config/rte_config.h | 2 - .../prog_guide/env_abstraction_layer.rst | 4 - lib/eal/common/eal_common_dynmem.c | 97 +++++-------------- lib/eal/common/eal_common_memory.c | 7 +- lib/eal/common/eal_private.h | 2 +- lib/eal/freebsd/eal_memory.c | 70 ++++--------- lib/eal/linux/eal_memalloc.c | 2 +- lib/eal/linux/eal_memory.c | 87 ++++++----------- 8 files changed, 80 insertions(+), 191 deletions(-) diff --git a/config/rte_config.h b/config/rte_config.h index a2609fa403..0447cdf2ad 100644 --- a/config/rte_config.h +++ b/config/rte_config.h @@ -43,8 +43,6 @@ #define RTE_MAX_HEAPS 32 #define RTE_MAX_LCORE_VAR 131072 #define RTE_MAX_MEMSEG_LISTS 128 -#define RTE_MAX_MEMSEG_PER_LIST 8192 -#define RTE_MAX_MEM_MB_PER_LIST 32768 #define RTE_MAX_MEMSEG_PER_TYPE 32768 #define RTE_MAX_MEM_MB_PER_TYPE 65536 #define RTE_MAX_TAILQ 32 diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst index d716895c1d..04368a3950 100644 --- 
a/doc/guides/prog_guide/env_abstraction_layer.rst +++ b/doc/guides/prog_guide/env_abstraction_layer.rst @@ -204,10 +204,6 @@ of virtual memory being preallocated at startup by editing the following config variables: * ``RTE_MAX_MEMSEG_LISTS`` controls how many segment lists can DPDK have -* ``RTE_MAX_MEM_MB_PER_LIST`` controls how much megabytes of memory each - segment list can address -* ``RTE_MAX_MEMSEG_PER_LIST`` controls how many segments each segment list - can have * ``RTE_MAX_MEMSEG_PER_TYPE`` controls how many segments each memory type can have (where "type" is defined as "page size + NUMA node" combination) * ``RTE_MAX_MEM_MB_PER_TYPE`` controls how much megabytes of memory each diff --git a/lib/eal/common/eal_common_dynmem.c b/lib/eal/common/eal_common_dynmem.c index 8f51d6dd4a..86da2bd80b 100644 --- a/lib/eal/common/eal_common_dynmem.c +++ b/lib/eal/common/eal_common_dynmem.c @@ -28,7 +28,6 @@ eal_dynmem_memseg_lists_init(void) int i, hpi_idx, msl_idx, ret = -1; /* fail unless told to succeed */ struct rte_memseg_list *msl; uint64_t max_mem, max_mem_per_type; - unsigned int max_seglists_per_type; unsigned int n_memtypes, cur_type; struct internal_config *internal_conf = eal_get_internal_configuration(); @@ -45,8 +44,7 @@ eal_dynmem_memseg_lists_init(void) * * deciding amount of memory going towards each memory type is a * balancing act between maximum segments per type, maximum memory per - * type, and number of detected NUMA nodes. the goal is to make sure - * each memory type gets at least one memseg list. + * type, and number of detected NUMA nodes. * * the total amount of memory is limited by RTE_MAX_MEM_MB value. * @@ -57,19 +55,9 @@ eal_dynmem_memseg_lists_init(void) * smaller page sizes, it can take hundreds of thousands of segments to * reach the above specified per-type memory limits. 
* - * additionally, each type may have multiple memseg lists associated - * with it, each limited by either RTE_MAX_MEM_MB_PER_LIST for bigger - * page sizes, or RTE_MAX_MEMSEG_PER_LIST segments for smaller ones. - * - * the number of memseg lists per type is decided based on the above - * limits, and also taking number of detected NUMA nodes, to make sure - * that we don't run out of memseg lists before we populate all NUMA - * nodes with memory. - * - * we do this in three stages. first, we collect the number of types. - * then, we figure out memory constraints and populate the list of - * would-be memseg lists. then, we go ahead and allocate the memseg - * lists. + * each memory type is allotted a single memseg list. the size of that + * list is calculated here to respect the per-type memory and segment + * limits that apply. */ /* create space for mem types */ @@ -109,89 +97,56 @@ eal_dynmem_memseg_lists_init(void) /* number of memtypes could have been lower due to no NUMA support */ n_memtypes = cur_type; + /* can we fit all memtypes into the memseg lists? */ + if (n_memtypes > RTE_MAX_MEMSEG_LISTS) { + EAL_LOG(ERR, "Too many memory types detected: %u. 
Please increase " + "RTE_MAX_MEMSEG_LISTS in configuration.", n_memtypes); + goto out; + } + /* set up limits for types */ max_mem = (uint64_t)RTE_MAX_MEM_MB << 20; max_mem_per_type = RTE_MIN((uint64_t)RTE_MAX_MEM_MB_PER_TYPE << 20, max_mem / n_memtypes); - /* - * limit maximum number of segment lists per type to ensure there's - * space for memseg lists for all NUMA nodes with all page sizes - */ - max_seglists_per_type = RTE_MAX_MEMSEG_LISTS / n_memtypes; - - if (max_seglists_per_type == 0) { - EAL_LOG(ERR, "Cannot accommodate all memory types, please increase RTE_MAX_MEMSEG_LISTS"); - goto out; - } /* go through all mem types and create segment lists */ msl_idx = 0; for (cur_type = 0; cur_type < n_memtypes; cur_type++) { - unsigned int cur_seglist, n_seglists, n_segs; - unsigned int max_segs_per_type, max_segs_per_list; + unsigned int n_segs; + unsigned int max_segs_per_type; struct memtype *type = &memtypes[cur_type]; - uint64_t max_mem_per_list, pagesz; + uint64_t pagesz; int socket_id; pagesz = type->page_sz; socket_id = type->socket_id; /* - * we need to create segment lists for this type. we must take + * we need to create a segment list for this type. we must take * into account the following things: * - * 1. total amount of memory we can use for this memory type - * 2. total amount of memory per memseg list allowed + * 1. total amount of memory to use for this memory type + * 2. total amount of memory allowed per type * 3. number of segments needed to fit the amount of memory * 4. number of segments allowed per type - * 5. number of segments allowed per memseg list - * 6. 
number of memseg lists we are allowed to take up */ - - /* calculate how much segments we will need in total */ max_segs_per_type = max_mem_per_type / pagesz; - /* limit number of segments to maximum allowed per type */ max_segs_per_type = RTE_MIN(max_segs_per_type, (unsigned int)RTE_MAX_MEMSEG_PER_TYPE); - /* limit number of segments to maximum allowed per list */ - max_segs_per_list = RTE_MIN(max_segs_per_type, - (unsigned int)RTE_MAX_MEMSEG_PER_LIST); + n_segs = max_segs_per_type; - /* calculate how much memory we can have per segment list */ - max_mem_per_list = RTE_MIN(max_segs_per_list * pagesz, - (uint64_t)RTE_MAX_MEM_MB_PER_LIST << 20); - - /* calculate how many segments each segment list will have */ - n_segs = RTE_MIN(max_segs_per_list, max_mem_per_list / pagesz); - - /* calculate how many segment lists we can have */ - n_seglists = RTE_MIN(max_segs_per_type / n_segs, - max_mem_per_type / max_mem_per_list); - - /* limit number of segment lists according to our maximum */ - n_seglists = RTE_MIN(n_seglists, max_seglists_per_type); - - EAL_LOG(DEBUG, "Creating %i segment lists: " + EAL_LOG(DEBUG, "Creating segment list: " "n_segs:%i socket_id:%i hugepage_sz:%" PRIu64, - n_seglists, n_segs, socket_id, pagesz); + n_segs, socket_id, pagesz); - /* create all segment lists */ - for (cur_seglist = 0; cur_seglist < n_seglists; cur_seglist++) { - if (msl_idx >= RTE_MAX_MEMSEG_LISTS) { - EAL_LOG(ERR, - "No more space in memseg lists, please increase RTE_MAX_MEMSEG_LISTS"); - goto out; - } - msl = &mcfg->memsegs[msl_idx++]; + msl = &mcfg->memsegs[msl_idx++]; - if (eal_memseg_list_init(msl, pagesz, n_segs, - socket_id, cur_seglist, true)) - goto out; + if (eal_memseg_list_init(msl, pagesz, n_segs, socket_id, true)) + goto out; - if (eal_memseg_list_alloc(msl, 0)) { - EAL_LOG(ERR, "Cannot allocate VA space for memseg list"); - goto out; - } + if (eal_memseg_list_alloc(msl, 0)) { + EAL_LOG(ERR, "Cannot allocate VA space for memseg list"); + goto out; } } /* we're 
successful */ diff --git a/lib/eal/common/eal_common_memory.c b/lib/eal/common/eal_common_memory.c index dccf9406c5..e8e41bb741 100644 --- a/lib/eal/common/eal_common_memory.c +++ b/lib/eal/common/eal_common_memory.c @@ -40,7 +40,7 @@ * which is a multiple of hugepage size. */ -#define MEMSEG_LIST_FMT "memseg-%" PRIu64 "k-%i-%i" +#define MEMSEG_LIST_FMT "memseg-%" PRIu64 "k-%i" static void *next_baseaddr; static uint64_t system_page_sz; @@ -228,12 +228,11 @@ eal_memseg_list_init_named(struct rte_memseg_list *msl, const char *name, int eal_memseg_list_init(struct rte_memseg_list *msl, uint64_t page_sz, - int n_segs, int socket_id, int type_msl_idx, bool heap) + int n_segs, int socket_id, bool heap) { char name[RTE_FBARRAY_NAME_LEN]; - snprintf(name, sizeof(name), MEMSEG_LIST_FMT, page_sz >> 10, socket_id, - type_msl_idx); + snprintf(name, sizeof(name), MEMSEG_LIST_FMT, page_sz >> 10, socket_id); return eal_memseg_list_init_named( msl, name, page_sz, n_segs, socket_id, heap); diff --git a/lib/eal/common/eal_private.h b/lib/eal/common/eal_private.h index e032dd10c9..40408d61b4 100644 --- a/lib/eal/common/eal_private.h +++ b/lib/eal/common/eal_private.h @@ -306,7 +306,7 @@ eal_memseg_list_init_named(struct rte_memseg_list *msl, const char *name, */ int eal_memseg_list_init(struct rte_memseg_list *msl, uint64_t page_sz, - int n_segs, int socket_id, int type_msl_idx, bool heap); + int n_segs, int socket_id, bool heap); /** * Reserve VA space for a memory segment list diff --git a/lib/eal/freebsd/eal_memory.c b/lib/eal/freebsd/eal_memory.c index cd608db9f9..36dcc04ce4 100644 --- a/lib/eal/freebsd/eal_memory.c +++ b/lib/eal/freebsd/eal_memory.c @@ -190,7 +190,7 @@ rte_eal_hugepage_init(void) break; } if (msl_idx == RTE_MAX_MEMSEG_LISTS) { - EAL_LOG(ERR, "Could not find space for memseg. Please increase RTE_MAX_MEMSEG_PER_LIST " + EAL_LOG(ERR, "Could not find space for memseg. 
Please increase " "RTE_MAX_MEMSEG_PER_TYPE and/or RTE_MAX_MEM_MB_PER_TYPE in configuration."); return -1; } @@ -320,23 +320,6 @@ rte_eal_using_phys_addrs(void) return 0; } -static uint64_t -get_mem_amount(uint64_t page_sz, uint64_t max_mem) -{ - uint64_t area_sz, max_pages; - - /* limit to RTE_MAX_MEMSEG_PER_LIST pages or RTE_MAX_MEM_MB_PER_LIST */ - max_pages = RTE_MAX_MEMSEG_PER_LIST; - max_mem = RTE_MIN((uint64_t)RTE_MAX_MEM_MB_PER_LIST << 20, max_mem); - - area_sz = RTE_MIN(page_sz * max_pages, max_mem); - - /* make sure the list isn't smaller than the page size */ - area_sz = RTE_MAX(area_sz, page_sz); - - return RTE_ALIGN(area_sz, page_sz); -} - static int memseg_list_alloc(struct rte_memseg_list *msl) { @@ -380,9 +363,10 @@ memseg_primary_init(void) hpi_idx++) { uint64_t max_type_mem, total_type_mem = 0; uint64_t avail_mem; - int type_msl_idx, max_segs, avail_segs, total_segs = 0; + int max_segs, avail_segs; struct hugepage_info *hpi; uint64_t hugepage_sz; + unsigned int n_segs; hpi = &internal_conf->hugepage_info[hpi_idx]; hugepage_sz = hpi->hugepage_sz; @@ -413,40 +397,28 @@ memseg_primary_init(void) max_type_mem = RTE_MIN(avail_mem, max_type_mem); max_segs = RTE_MIN(avail_segs, max_segs); + n_segs = RTE_MIN(max_type_mem / hugepage_sz, (uint64_t)max_segs); + if (n_segs == 0) + continue; + + if (msl_idx >= RTE_MAX_MEMSEG_LISTS) { + EAL_LOG(ERR, + "No more space in memseg lists, please increase RTE_MAX_MEMSEG_LISTS"); + return -1; + } - type_msl_idx = 0; - while (total_type_mem < max_type_mem && - total_segs < max_segs) { - uint64_t cur_max_mem, cur_mem; - unsigned int n_segs; - - if (msl_idx >= RTE_MAX_MEMSEG_LISTS) { - EAL_LOG(ERR, - "No more space in memseg lists, please increase RTE_MAX_MEMSEG_LISTS"); - return -1; - } - - msl = &mcfg->memsegs[msl_idx++]; - - cur_max_mem = max_type_mem - total_type_mem; - - cur_mem = get_mem_amount(hugepage_sz, - cur_max_mem); - n_segs = cur_mem / hugepage_sz; - - if (eal_memseg_list_init(msl, hugepage_sz, n_segs, - 0, 
type_msl_idx, false)) - return -1; + msl = &mcfg->memsegs[msl_idx++]; - total_segs += msl->memseg_arr.len; - total_type_mem = total_segs * hugepage_sz; - type_msl_idx++; + if (eal_memseg_list_init(msl, hugepage_sz, n_segs, + 0, false)) + return -1; - if (memseg_list_alloc(msl)) { - EAL_LOG(ERR, "Cannot allocate VA space for memseg list"); - return -1; - } + total_type_mem = n_segs * hugepage_sz; + if (memseg_list_alloc(msl)) { + EAL_LOG(ERR, "Cannot allocate VA space for memseg list"); + return -1; } + total_mem += total_type_mem; } return 0; diff --git a/lib/eal/linux/eal_memalloc.c b/lib/eal/linux/eal_memalloc.c index a39bc31c7b..87e40e4465 100644 --- a/lib/eal/linux/eal_memalloc.c +++ b/lib/eal/linux/eal_memalloc.c @@ -283,7 +283,7 @@ get_seg_fd(char *path, int buflen, struct hugepage_info *hi, } else { out_fd = &fd_list[list_idx].fds[seg_idx]; huge_path = eal_get_hugefile_path(path, buflen, hi->hugedir, - list_idx * RTE_MAX_MEMSEG_PER_LIST + seg_idx); + list_idx * RTE_MAX_MEMSEG_PER_TYPE + seg_idx); } if (huge_path == NULL) { EAL_LOG(DEBUG, "%s(): hugefile path truncated: '%s'", diff --git a/lib/eal/linux/eal_memory.c b/lib/eal/linux/eal_memory.c index bf783e3c76..568d5da124 100644 --- a/lib/eal/linux/eal_memory.c +++ b/lib/eal/linux/eal_memory.c @@ -740,7 +740,7 @@ remap_segment(struct hugepage_file *hugepages, int seg_start, int seg_end) break; } if (msl_idx == RTE_MAX_MEMSEG_LISTS) { - EAL_LOG(ERR, "Could not find space for memseg. Please increase RTE_MAX_MEMSEG_PER_LIST " + EAL_LOG(ERR, "Could not find space for memseg. 
Please increase " "RTE_MAX_MEMSEG_PER_TYPE and/or RTE_MAX_MEM_MB_PER_TYPE in configuration."); return -1; } @@ -822,23 +822,6 @@ remap_segment(struct hugepage_file *hugepages, int seg_start, int seg_end) return seg_len; } -static uint64_t -get_mem_amount(uint64_t page_sz, uint64_t max_mem) -{ - uint64_t area_sz, max_pages; - - /* limit to RTE_MAX_MEMSEG_PER_LIST pages or RTE_MAX_MEM_MB_PER_LIST */ - max_pages = RTE_MAX_MEMSEG_PER_LIST; - max_mem = RTE_MIN((uint64_t)RTE_MAX_MEM_MB_PER_LIST << 20, max_mem); - - area_sz = RTE_MIN(page_sz * max_pages, max_mem); - - /* make sure the list isn't smaller than the page size */ - area_sz = RTE_MAX(area_sz, page_sz); - - return RTE_ALIGN(area_sz, page_sz); -} - static int memseg_list_free(struct rte_memseg_list *msl) { @@ -995,7 +978,7 @@ prealloc_segments(struct hugepage_file *hugepages, int n_pages) /* now, allocate fbarray itself */ if (eal_memseg_list_init(msl, page_sz, n_segs, - socket, msl_idx, true) < 0) + socket, true) < 0) return -1; /* finally, allocate VA space */ @@ -1831,7 +1814,7 @@ memseg_primary_init_32(void) uint64_t max_pagesz_mem, cur_pagesz_mem = 0; uint64_t hugepage_sz; struct hugepage_info *hpi; - int type_msl_idx, max_segs, total_segs = 0; + unsigned int n_segs; hpi = &internal_conf->hugepage_info[hpi_idx]; hugepage_sz = hpi->hugepage_sz; @@ -1840,62 +1823,48 @@ memseg_primary_init_32(void) if (hpi->num_pages[socket_id] == 0) continue; - max_segs = RTE_MAX_MEMSEG_PER_TYPE; max_pagesz_mem = max_socket_mem - cur_socket_mem; /* make it multiple of page size */ max_pagesz_mem = RTE_ALIGN_FLOOR(max_pagesz_mem, hugepage_sz); + n_segs = RTE_MIN(max_pagesz_mem / hugepage_sz, + (unsigned int)RTE_MAX_MEMSEG_PER_TYPE); EAL_LOG(DEBUG, "Attempting to preallocate " "%" PRIu64 "M on socket %i", max_pagesz_mem >> 20, socket_id); - type_msl_idx = 0; - while (cur_pagesz_mem < max_pagesz_mem && - total_segs < max_segs) { - uint64_t cur_mem; - unsigned int n_segs; + if (n_segs == 0) + continue; - if (msl_idx >= 
RTE_MAX_MEMSEG_LISTS) { - EAL_LOG(ERR, - "No more space in memseg lists, please increase RTE_MAX_MEMSEG_LISTS"); - return -1; - } + if (msl_idx >= RTE_MAX_MEMSEG_LISTS) { + EAL_LOG(ERR, + "No more space in memseg lists, please increase RTE_MAX_MEMSEG_LISTS"); + return -1; + } - msl = &mcfg->memsegs[msl_idx]; + msl = &mcfg->memsegs[msl_idx]; - cur_mem = get_mem_amount(hugepage_sz, - max_pagesz_mem); - n_segs = cur_mem / hugepage_sz; + if (eal_memseg_list_init(msl, hugepage_sz, + n_segs, socket_id, true)) { + /* failing to allocate a memseg list is a serious error. */ + EAL_LOG(ERR, "Cannot allocate memseg list"); + return -1; + } - if (eal_memseg_list_init(msl, hugepage_sz, - n_segs, socket_id, type_msl_idx, - true)) { - /* failing to allocate a memseg list is - * a serious error. - */ - EAL_LOG(ERR, "Cannot allocate memseg list"); + if (eal_memseg_list_alloc(msl, 0)) { + /* if we couldn't allocate VA space, try smaller page sizes. */ + EAL_LOG(ERR, "Cannot allocate VA space for memseg list, retrying with different page size"); + /* deallocate memseg list */ + if (memseg_list_free(msl)) return -1; - } - - if (eal_memseg_list_alloc(msl, 0)) { - /* if we couldn't allocate VA space, we - * can try with smaller page sizes. - */ - EAL_LOG(ERR, "Cannot allocate VA space for memseg list, retrying with different page size"); - /* deallocate memseg list */ - if (memseg_list_free(msl)) - return -1; - break; - } - - total_segs += msl->memseg_arr.len; - cur_pagesz_mem = total_segs * hugepage_sz; - type_msl_idx++; - msl_idx++; + continue; } + + cur_pagesz_mem = n_segs * hugepage_sz; cur_socket_mem += cur_pagesz_mem; + msl_idx++; } if (cur_socket_mem == 0) { EAL_LOG(ERR, "Cannot allocate VA space on socket %u", -- 2.47.3 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH v1 2/5] eal/memory: allocate all VA space in one go 2026-03-11 10:58 [PATCH v1 0/5] Make VA reservation limits configurable Anatoly Burakov 2026-03-11 10:58 ` [PATCH v1 1/5] eal/memory: always use one segment per memory type Anatoly Burakov @ 2026-03-11 10:58 ` Anatoly Burakov 2026-03-11 10:58 ` [PATCH v1 3/5] eal/memory: get rid of global VA space limits Anatoly Burakov ` (3 subsequent siblings) 5 siblings, 0 replies; 14+ messages in thread From: Anatoly Burakov @ 2026-03-11 10:58 UTC (permalink / raw) To: dev Instead of allocating VA space per memseg list in dynmem mode, allocate it all in one go, and then assign memseg lists portions of that space. In a similar way, for dynmem initialization in secondary processes, also attach all VA space in one go. Legacy/32-bit paths are untouched. Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> --- lib/eal/common/eal_common_dynmem.c | 56 ++++++++++++++++++++----- lib/eal/common/eal_common_memory.c | 22 ++++++++++ lib/eal/common/eal_memcfg.h | 6 +++ lib/eal/common/eal_private.h | 13 ++++++ lib/eal/linux/eal_memory.c | 66 +++++++++++++++++++++++++++++- 5 files changed, 152 insertions(+), 11 deletions(-) diff --git a/lib/eal/common/eal_common_dynmem.c b/lib/eal/common/eal_common_dynmem.c index 86da2bd80b..bfc4de6698 100644 --- a/lib/eal/common/eal_common_dynmem.c +++ b/lib/eal/common/eal_common_dynmem.c @@ -24,11 +24,16 @@ eal_dynmem_memseg_lists_init(void) struct memtype { uint64_t page_sz; int socket_id; + unsigned int n_segs; + size_t mem_sz; + size_t va_offset; } *memtypes = NULL; int i, hpi_idx, msl_idx, ret = -1; /* fail unless told to succeed */ struct rte_memseg_list *msl; uint64_t max_mem, max_mem_per_type; + size_t mem_va_len, mem_va_page_sz; unsigned int n_memtypes, cur_type; + void *mem_va_addr = NULL; struct internal_config *internal_conf = eal_get_internal_configuration(); @@ -108,18 +113,16 @@ eal_dynmem_memseg_lists_init(void) max_mem = (uint64_t)RTE_MAX_MEM_MB << 20; max_mem_per_type = 
RTE_MIN((uint64_t)RTE_MAX_MEM_MB_PER_TYPE << 20, max_mem / n_memtypes); + mem_va_len = 0; + mem_va_page_sz = 0; - /* go through all mem types and create segment lists */ - msl_idx = 0; + /* calculate total VA space and offsets for all mem types */ for (cur_type = 0; cur_type < n_memtypes; cur_type++) { - unsigned int n_segs; unsigned int max_segs_per_type; struct memtype *type = &memtypes[cur_type]; uint64_t pagesz; - int socket_id; pagesz = type->page_sz; - socket_id = type->socket_id; /* * we need to create a segment list for this type. we must take @@ -133,25 +136,58 @@ eal_dynmem_memseg_lists_init(void) max_segs_per_type = max_mem_per_type / pagesz; max_segs_per_type = RTE_MIN(max_segs_per_type, (unsigned int)RTE_MAX_MEMSEG_PER_TYPE); - n_segs = max_segs_per_type; + type->n_segs = max_segs_per_type; + type->mem_sz = (size_t)pagesz * type->n_segs; + mem_va_page_sz = RTE_MAX(mem_va_page_sz, (size_t)pagesz); + mem_va_len = RTE_ALIGN_CEIL(mem_va_len, pagesz); + type->va_offset = mem_va_len; + mem_va_len += type->mem_sz; + } + + mem_va_addr = eal_get_virtual_area(NULL, &mem_va_len, + mem_va_page_sz, 0, 0); + if (mem_va_addr == NULL) { + EAL_LOG(ERR, "Cannot reserve VA space for memseg lists"); + goto out; + } + + /* go through all mem types and create segment lists */ + msl_idx = 0; + for (cur_type = 0; cur_type < n_memtypes; cur_type++) { + struct memtype *type = &memtypes[cur_type]; + uint64_t pagesz; + int socket_id; + + pagesz = type->page_sz; + socket_id = type->socket_id; EAL_LOG(DEBUG, "Creating segment list: " "n_segs:%i socket_id:%i hugepage_sz:%" PRIu64, - n_segs, socket_id, pagesz); + type->n_segs, socket_id, pagesz); msl = &mcfg->memsegs[msl_idx++]; - if (eal_memseg_list_init(msl, pagesz, n_segs, socket_id, true)) + if (eal_memseg_list_init(msl, pagesz, type->n_segs, socket_id, true)) goto out; - if (eal_memseg_list_alloc(msl, 0)) { - EAL_LOG(ERR, "Cannot allocate VA space for memseg list"); + if (eal_memseg_list_assign(msl, + RTE_PTR_ADD(mem_va_addr, 
type->va_offset))) { + EAL_LOG(ERR, "Cannot assign VA space for memseg list"); goto out; } } /* we're successful */ ret = 0; out: + if (ret != 0) { + if (mem_va_addr != NULL) + eal_mem_free(mem_va_addr, mem_va_len); + } else { + /* store the VA space data in shared config */ + mcfg->mem_va_addr = (uintptr_t)mem_va_addr; + mcfg->mem_va_len = mem_va_len; + mcfg->mem_va_page_sz = mem_va_page_sz; + } free(memtypes); return ret; } diff --git a/lib/eal/common/eal_common_memory.c b/lib/eal/common/eal_common_memory.c index e8e41bb741..b91e765cbf 100644 --- a/lib/eal/common/eal_common_memory.c +++ b/lib/eal/common/eal_common_memory.c @@ -271,6 +271,28 @@ eal_memseg_list_alloc(struct rte_memseg_list *msl, int reserve_flags) return 0; } +int +eal_memseg_list_assign(struct rte_memseg_list *msl, void *addr) +{ + size_t page_sz, mem_sz; + + page_sz = msl->page_sz; + mem_sz = page_sz * msl->memseg_arr.len; + + if (addr == NULL || addr != RTE_PTR_ALIGN(addr, page_sz)) { + rte_errno = EINVAL; + return -1; + } + + msl->base_va = addr; + msl->len = mem_sz; + + EAL_LOG(DEBUG, "VA assigned for memseg list at %p, size %zx", + addr, mem_sz); + + return 0; +} + void eal_memseg_list_populate(struct rte_memseg_list *msl, void *addr, int n_segs) { diff --git a/lib/eal/common/eal_memcfg.h b/lib/eal/common/eal_memcfg.h index 60e2089797..2b3b3b62ba 100644 --- a/lib/eal/common/eal_memcfg.h +++ b/lib/eal/common/eal_memcfg.h @@ -49,6 +49,12 @@ struct rte_mem_config { struct rte_memseg_list memsegs[RTE_MAX_MEMSEG_LISTS]; /**< List of dynamic arrays holding memsegs */ + uintptr_t mem_va_addr; + /**< Base VA address reserved for dynamic memory memseg lists. */ + size_t mem_va_len; + /**< Length of VA range reserved for dynamic memory memseg lists. */ + size_t mem_va_page_sz; + /**< Page size alignment used for dynamic memory VA reservation. 
*/ struct rte_tailq_head tailq_head[RTE_MAX_TAILQ]; /**< Tailqs for objects */ diff --git a/lib/eal/common/eal_private.h b/lib/eal/common/eal_private.h index 40408d61b4..b62b71369a 100644 --- a/lib/eal/common/eal_private.h +++ b/lib/eal/common/eal_private.h @@ -322,6 +322,19 @@ eal_memseg_list_init(struct rte_memseg_list *msl, uint64_t page_sz, int eal_memseg_list_alloc(struct rte_memseg_list *msl, int reserve_flags); +/** + * Assign a pre-reserved VA range to a memory segment list. + * + * @param msl + * Initialized memory segment list with page size defined. + * @param addr + * Starting address of list VA range. + * @return + * 0 on success, (-1) on failure and rte_errno is set. + */ +int +eal_memseg_list_assign(struct rte_memseg_list *msl, void *addr); + /** * Populate MSL, each segment is one page long. * diff --git a/lib/eal/linux/eal_memory.c b/lib/eal/linux/eal_memory.c index 568d5da124..3b2afee852 100644 --- a/lib/eal/linux/eal_memory.c +++ b/lib/eal/linux/eal_memory.c @@ -1883,7 +1883,59 @@ memseg_primary_init(void) } static int -memseg_secondary_init(void) +memseg_secondary_init_dynmem(void) +{ + struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; + int msl_idx = 0; + struct rte_memseg_list *msl; + void *mem_va_addr; + size_t mem_va_len; + + if (mcfg->mem_va_addr == 0 || mcfg->mem_va_len == 0 || + mcfg->mem_va_page_sz == 0) { + EAL_LOG(ERR, "Missing shared dynamic memory VA range from primary process"); + return -1; + } + + mem_va_addr = (void *)(uintptr_t)mcfg->mem_va_addr; + mem_va_len = mcfg->mem_va_len; + + if (eal_get_virtual_area(mem_va_addr, &mem_va_len, + mcfg->mem_va_page_sz, 0, 0) == NULL) { + EAL_LOG(ERR, "Cannot reserve VA space for hugepage memory"); + return -1; + } + + for (msl_idx = 0; msl_idx < RTE_MAX_MEMSEG_LISTS; msl_idx++) { + + msl = &mcfg->memsegs[msl_idx]; + + /* skip empty and external memseg lists */ + if (msl->memseg_arr.len == 0 || msl->external) + continue; + + if (rte_fbarray_attach(&msl->memseg_arr)) { + 
EAL_LOG(ERR, "Cannot attach to primary process memseg lists"); + eal_mem_free(mem_va_addr, mem_va_len); + return -1; + } + + if (eal_memseg_list_assign(msl, msl->base_va)) { + EAL_LOG(ERR, "Cannot assign VA space for hugepage memory"); + eal_mem_free(mem_va_addr, mem_va_len); + return -1; + } + + EAL_LOG(DEBUG, "Attaching segment list: " + "n_segs:%u socket_id:%d hugepage_sz:%" PRIu64, + msl->memseg_arr.len, msl->socket_id, msl->page_sz); + } + + return 0; +} + +static int +memseg_secondary_init_legacy(void) { struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; int msl_idx = 0; @@ -1912,6 +1964,18 @@ memseg_secondary_init(void) return 0; } +static int +memseg_secondary_init(void) +{ + const struct internal_config *internal_conf = + eal_get_internal_configuration(); + + if (!internal_conf->legacy_mem) + return memseg_secondary_init_dynmem(); + + return memseg_secondary_init_legacy(); +} + int rte_eal_memseg_init(void) { -- 2.47.3 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH v1 3/5] eal/memory: get rid of global VA space limits 2026-03-11 10:58 [PATCH v1 0/5] Make VA reservation limits configurable Anatoly Burakov 2026-03-11 10:58 ` [PATCH v1 1/5] eal/memory: always use one segment per memory type Anatoly Burakov 2026-03-11 10:58 ` [PATCH v1 2/5] eal/memory: allocate all VA space in one go Anatoly Burakov @ 2026-03-11 10:58 ` Anatoly Burakov 2026-03-11 10:58 ` [PATCH v1 4/5] eal/memory: store default segment limits in config Anatoly Burakov ` (2 subsequent siblings) 5 siblings, 0 replies; 14+ messages in thread From: Anatoly Burakov @ 2026-03-11 10:58 UTC (permalink / raw) To: dev, Wathsala Vithanage, Bruce Richardson Currently, all VA space reservations take into account global memory limit. The original intent was to limit memory allocations to however many NUMA nodes the machine had taking into the account that socket ID's may be discontiguous. Since we have had "socket count" API for while and it gives us correct NUMA node count, taking discontiguousness into account, we can relax the total limits and remove the restrictions, and let VA space usage scale with NUMA nodes. The only place where we actually require a hard limit is in 32-bit code, where we cannot allocate more than 2G of VA space. 
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> --- config/arm/meson.build | 1 - config/meson.build | 5 --- .../prog_guide/env_abstraction_layer.rst | 2 -- lib/eal/common/eal_common_dynmem.c | 13 +++---- lib/eal/freebsd/eal_memory.c | 36 +++---------------- lib/eal/linux/eal_memory.c | 10 +++--- 6 files changed, 14 insertions(+), 53 deletions(-) diff --git a/config/arm/meson.build b/config/arm/meson.build index 523b0fc0ed..3b03f5e31b 100644 --- a/config/arm/meson.build +++ b/config/arm/meson.build @@ -69,7 +69,6 @@ part_number_config_arm = { 'flags': [ ['RTE_MACHINE', '"neoverse-n1"'], ['RTE_ARM_FEATURE_ATOMICS', true], - ['RTE_MAX_MEM_MB', 1048576], ['RTE_MAX_LCORE', 256], ['RTE_MAX_NUMA_NODES', 8] ] diff --git a/config/meson.build b/config/meson.build index 02e2798cca..f68f1f5f53 100644 --- a/config/meson.build +++ b/config/meson.build @@ -383,11 +383,6 @@ dpdk_conf.set('RTE_PKTMBUF_HEADROOM', get_option('pkt_mbuf_headroom')) dpdk_conf.set('RTE_MAX_VFIO_GROUPS', 64) dpdk_conf.set('RTE_DRIVER_MEMPOOL_BUCKET_SIZE_KB', 64) dpdk_conf.set('RTE_LIBRTE_DPAA2_USE_PHYS_IOVA', true) -if dpdk_conf.get('RTE_ARCH_64') - dpdk_conf.set('RTE_MAX_MEM_MB', 524288) -else # for 32-bit we need smaller reserved memory areas - dpdk_conf.set('RTE_MAX_MEM_MB', 2048) -endif if get_option('mbuf_refcnt_atomic') dpdk_conf.set('RTE_MBUF_REFCNT_ATOMIC', true) endif diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst index 04368a3950..63e0568afa 100644 --- a/doc/guides/prog_guide/env_abstraction_layer.rst +++ b/doc/guides/prog_guide/env_abstraction_layer.rst @@ -208,8 +208,6 @@ variables: can have (where "type" is defined as "page size + NUMA node" combination) * ``RTE_MAX_MEM_MB_PER_TYPE`` controls how much megabytes of memory each memory type can address -* ``RTE_MAX_MEM_MB`` places a global maximum on the amount of memory - DPDK can reserve Normally, these options do not need to be changed. 
diff --git a/lib/eal/common/eal_common_dynmem.c b/lib/eal/common/eal_common_dynmem.c index bfc4de6698..640199473e 100644 --- a/lib/eal/common/eal_common_dynmem.c +++ b/lib/eal/common/eal_common_dynmem.c @@ -30,7 +30,7 @@ eal_dynmem_memseg_lists_init(void) } *memtypes = NULL; int i, hpi_idx, msl_idx, ret = -1; /* fail unless told to succeed */ struct rte_memseg_list *msl; - uint64_t max_mem, max_mem_per_type; + uint64_t max_mem_per_type; size_t mem_va_len, mem_va_page_sz; unsigned int n_memtypes, cur_type; void *mem_va_addr = NULL; @@ -51,11 +51,8 @@ eal_dynmem_memseg_lists_init(void) * balancing act between maximum segments per type, maximum memory per * type, and number of detected NUMA nodes. * - * the total amount of memory is limited by RTE_MAX_MEM_MB value. - * - * the total amount of memory per type is limited by either - * RTE_MAX_MEM_MB_PER_TYPE, or by RTE_MAX_MEM_MB divided by the number - * of detected NUMA nodes. additionally, maximum number of segments per + * the total amount of memory per type is limited by + * RTE_MAX_MEM_MB_PER_TYPE. additionally, maximum number of segments per * type is also limited by RTE_MAX_MEMSEG_PER_TYPE. this is because for * smaller page sizes, it can take hundreds of thousands of segments to * reach the above specified per-type memory limits. 
@@ -110,9 +107,7 @@ eal_dynmem_memseg_lists_init(void) } /* set up limits for types */ - max_mem = (uint64_t)RTE_MAX_MEM_MB << 20; - max_mem_per_type = RTE_MIN((uint64_t)RTE_MAX_MEM_MB_PER_TYPE << 20, - max_mem / n_memtypes); + max_mem_per_type = (uint64_t)RTE_MAX_MEM_MB_PER_TYPE << 20; mem_va_len = 0; mem_va_page_sz = 0; diff --git a/lib/eal/freebsd/eal_memory.c b/lib/eal/freebsd/eal_memory.c index 36dcc04ce4..6ae5b22f1b 100644 --- a/lib/eal/freebsd/eal_memory.c +++ b/lib/eal/freebsd/eal_memory.c @@ -337,7 +337,6 @@ memseg_primary_init(void) struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; int hpi_idx, msl_idx = 0; struct rte_memseg_list *msl; - uint64_t max_mem, total_mem; struct internal_config *internal_conf = eal_get_internal_configuration(); @@ -346,24 +345,15 @@ memseg_primary_init(void) return 0; /* FreeBSD has an issue where core dump will dump the entire memory - * contents, including anonymous zero-page memory. Therefore, while we - * will be limiting total amount of memory to RTE_MAX_MEM_MB, we will - * also be further limiting total memory amount to whatever memory is - * available to us through contigmem driver (plus spacing blocks). - * - * so, at each stage, we will be checking how much memory we are - * preallocating, and adjust all the values accordingly. + * contents, including anonymous zero-page memory. To avoid reserving VA + * space we are not going to use, size memseg lists according to + * contigmem-provided page counts. 
*/ - max_mem = (uint64_t)RTE_MAX_MEM_MB << 20; - total_mem = 0; - /* create memseg lists */ for (hpi_idx = 0; hpi_idx < (int) internal_conf->num_hugepage_sizes; hpi_idx++) { - uint64_t max_type_mem, total_type_mem = 0; - uint64_t avail_mem; - int max_segs, avail_segs; + int avail_segs; struct hugepage_info *hpi; uint64_t hugepage_sz; unsigned int n_segs; @@ -373,15 +363,6 @@ memseg_primary_init(void) /* no NUMA support on FreeBSD */ - /* check if we've already exceeded total memory amount */ - if (total_mem >= max_mem) - break; - - /* first, calculate theoretical limits according to config */ - max_type_mem = RTE_MIN(max_mem - total_mem, - (uint64_t)RTE_MAX_MEM_MB_PER_TYPE << 20); - max_segs = RTE_MAX_MEMSEG_PER_TYPE; - /* now, limit all of that to whatever will actually be * available to us, because without dynamic allocation support, * all of that extra memory will be sitting there being useless @@ -393,11 +374,7 @@ memseg_primary_init(void) * that are non-contiguous. */ avail_segs = (hpi->num_pages[0] * 2) - 1; - avail_mem = avail_segs * hugepage_sz; - - max_type_mem = RTE_MIN(avail_mem, max_type_mem); - max_segs = RTE_MIN(avail_segs, max_segs); - n_segs = RTE_MIN(max_type_mem / hugepage_sz, (uint64_t)max_segs); + n_segs = avail_segs; if (n_segs == 0) continue; @@ -413,13 +390,10 @@ memseg_primary_init(void) 0, false)) return -1; - total_type_mem = n_segs * hugepage_sz; if (memseg_list_alloc(msl)) { EAL_LOG(ERR, "Cannot allocate VA space for memseg list"); return -1; } - - total_mem += total_type_mem; } return 0; } diff --git a/lib/eal/linux/eal_memory.c b/lib/eal/linux/eal_memory.c index 3b2afee852..c169895c6f 100644 --- a/lib/eal/linux/eal_memory.c +++ b/lib/eal/linux/eal_memory.c @@ -1695,12 +1695,13 @@ rte_eal_using_phys_addrs(void) static int __rte_unused memseg_primary_init_32(void) { + /* limit total amount of memory on 32-bit */ + const uint64_t mem32_max_mem = 2ULL << 30; struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; int 
active_sockets, hpi_idx, msl_idx = 0; unsigned int socket_id, i; struct rte_memseg_list *msl; uint64_t extra_mem_per_socket, total_extra_mem, total_requested_mem; - uint64_t max_mem; struct internal_config *internal_conf = eal_get_internal_configuration(); @@ -1743,13 +1744,12 @@ memseg_primary_init_32(void) else total_requested_mem = internal_conf->memory; - max_mem = (uint64_t)RTE_MAX_MEM_MB << 20; - if (total_requested_mem > max_mem) { + if (total_requested_mem > mem32_max_mem) { EAL_LOG(ERR, "Invalid parameters: 32-bit process can at most use %uM of memory", - (unsigned int)(max_mem >> 20)); + (unsigned int)(mem32_max_mem >> 20)); return -1; } - total_extra_mem = max_mem - total_requested_mem; + total_extra_mem = mem32_max_mem - total_requested_mem; extra_mem_per_socket = active_sockets == 0 ? total_extra_mem : total_extra_mem / active_sockets; -- 2.47.3 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH v1 4/5] eal/memory: store default segment limits in config
  2026-03-11 10:58 [PATCH v1 0/5] Make VA reservation limits configurable Anatoly Burakov
  ` (2 preceding siblings ...)
  2026-03-11 10:58 ` [PATCH v1 3/5] eal/memory: get rid of global VA space limits Anatoly Burakov
@ 2026-03-11 10:58 ` Anatoly Burakov
  2026-03-11 10:58 ` [PATCH v1 5/5] eal/memory: add page size VA limits EAL parameter Anatoly Burakov
  2026-03-13 16:06 ` [PATCH v2 0/6] Make VA reservation limits configurable Anatoly Burakov
  5 siblings, 0 replies; 14+ messages in thread
From: Anatoly Burakov @ 2026-03-11 10:58 UTC (permalink / raw)
To: dev, Bruce Richardson, Dmitry Kozlyuk

Currently, VA space allocation is regulated by two constants picked up from
config: max memsegs per type, and max memory per type. In preparation for
making these limits dynamic, add a per-page-size limit value to config,
populate that value from these defaults at init time, and adjust the code
to only refer to the memory limits from internal config.
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> --- lib/eal/common/eal_common_dynmem.c | 29 +++++++++++------------------ lib/eal/common/eal_common_options.c | 20 ++++++++++++++++++++ lib/eal/common/eal_internal_cfg.h | 2 ++ lib/eal/common/eal_options.h | 1 + lib/eal/freebsd/eal.c | 6 ++++++ lib/eal/linux/eal.c | 6 ++++++ lib/eal/linux/eal_memory.c | 6 ++++-- lib/eal/windows/eal.c | 6 ++++++ 8 files changed, 56 insertions(+), 20 deletions(-) diff --git a/lib/eal/common/eal_common_dynmem.c b/lib/eal/common/eal_common_dynmem.c index 640199473e..0d5e056239 100644 --- a/lib/eal/common/eal_common_dynmem.c +++ b/lib/eal/common/eal_common_dynmem.c @@ -24,13 +24,13 @@ eal_dynmem_memseg_lists_init(void) struct memtype { uint64_t page_sz; int socket_id; + unsigned int hpi_idx; unsigned int n_segs; size_t mem_sz; size_t va_offset; } *memtypes = NULL; int i, hpi_idx, msl_idx, ret = -1; /* fail unless told to succeed */ struct rte_memseg_list *msl; - uint64_t max_mem_per_type; size_t mem_va_len, mem_va_page_sz; unsigned int n_memtypes, cur_type; void *mem_va_addr = NULL; @@ -51,15 +51,9 @@ eal_dynmem_memseg_lists_init(void) * balancing act between maximum segments per type, maximum memory per * type, and number of detected NUMA nodes. * - * the total amount of memory per type is limited by - * RTE_MAX_MEM_MB_PER_TYPE. additionally, maximum number of segments per - * type is also limited by RTE_MAX_MEMSEG_PER_TYPE. this is because for - * smaller page sizes, it can take hundreds of thousands of segments to - * reach the above specified per-type memory limits. - * - * each memory type is allotted a single memseg list. the size of that - * list is calculated here to respect the per-type memory and segment - * limits that apply. + * the total amount of memory per type is limited by per-page-size + * memory values in internal config. each memory type is allotted one + * memseg list. 
*/ /* create space for mem types */ @@ -90,6 +84,7 @@ eal_dynmem_memseg_lists_init(void) #endif memtypes[cur_type].page_sz = hugepage_sz; memtypes[cur_type].socket_id = socket_id; + memtypes[cur_type].hpi_idx = hpi_idx; EAL_LOG(DEBUG, "Detected memory type: " "socket_id:%u hugepage_sz:%" PRIu64, @@ -106,18 +101,19 @@ eal_dynmem_memseg_lists_init(void) goto out; } - /* set up limits for types */ - max_mem_per_type = (uint64_t)RTE_MAX_MEM_MB_PER_TYPE << 20; mem_va_len = 0; mem_va_page_sz = 0; /* calculate total VA space and offsets for all mem types */ for (cur_type = 0; cur_type < n_memtypes; cur_type++) { - unsigned int max_segs_per_type; + unsigned int n_segs; struct memtype *type = &memtypes[cur_type]; + uint64_t max_mem_per_type; uint64_t pagesz; pagesz = type->page_sz; + max_mem_per_type = + internal_conf->hugepage_mem_sz_limits[type->hpi_idx]; /* * we need to create a segment list for this type. we must take @@ -126,12 +122,9 @@ eal_dynmem_memseg_lists_init(void) * 1. total amount of memory to use for this memory type * 2. total amount of memory allowed per type * 3. number of segments needed to fit the amount of memory - * 4. 
number of segments allowed per type */ - max_segs_per_type = max_mem_per_type / pagesz; - max_segs_per_type = RTE_MIN(max_segs_per_type, - (unsigned int)RTE_MAX_MEMSEG_PER_TYPE); - type->n_segs = max_segs_per_type; + n_segs = max_mem_per_type / pagesz; + type->n_segs = n_segs; type->mem_sz = (size_t)pagesz * type->n_segs; mem_va_page_sz = RTE_MAX(mem_va_page_sz, (size_t)pagesz); mem_va_len = RTE_ALIGN_CEIL(mem_va_len, pagesz); diff --git a/lib/eal/common/eal_common_options.c b/lib/eal/common/eal_common_options.c index aad676a004..806f4d0a2c 100644 --- a/lib/eal/common/eal_common_options.c +++ b/lib/eal/common/eal_common_options.c @@ -510,6 +510,7 @@ eal_reset_internal_config(struct internal_config *internal_cfg) memset(&internal_cfg->hugepage_info[i], 0, sizeof(internal_cfg->hugepage_info[0])); internal_cfg->hugepage_info[i].lock_descriptor = -1; + internal_cfg->hugepage_mem_sz_limits[i] = 0; } internal_cfg->base_virtaddr = 0; @@ -2359,6 +2360,25 @@ eal_adjust_config(struct internal_config *internal_cfg) return 0; } +int +eal_apply_hugepage_mem_sz_limits(struct internal_config *internal_cfg) +{ + unsigned int i, j; + + for (i = 0; i < internal_cfg->num_hugepage_sizes; i++) { + const uint64_t pagesz = internal_cfg->hugepage_info[i].hugepage_sz; + uint64_t limit; + + /* assign default limits */ + limit = RTE_MIN((uint64_t)RTE_MAX_MEM_MB_PER_TYPE << 20, + (uint64_t)RTE_MAX_MEMSEG_PER_TYPE * pagesz); + + internal_cfg->hugepage_mem_sz_limits[i] = limit; + } + + return 0; +} + RTE_EXPORT_SYMBOL(rte_vect_get_max_simd_bitwidth) uint16_t rte_vect_get_max_simd_bitwidth(void) diff --git a/lib/eal/common/eal_internal_cfg.h b/lib/eal/common/eal_internal_cfg.h index 95d327a613..0bf192c6e5 100644 --- a/lib/eal/common/eal_internal_cfg.h +++ b/lib/eal/common/eal_internal_cfg.h @@ -96,6 +96,8 @@ struct internal_config { /**< user defined mbuf pool ops name */ unsigned num_hugepage_sizes; /**< how many sizes on this system */ struct hugepage_info hugepage_info[MAX_HUGEPAGE_SIZES]; + 
uint64_t hugepage_mem_sz_limits[MAX_HUGEPAGE_SIZES]; + /**< default max memory per hugepage size */ enum rte_iova_mode iova_mode ; /**< Set IOVA mode on this system */ rte_cpuset_t ctrl_cpuset; /**< cpuset for ctrl threads */ volatile unsigned int init_complete; diff --git a/lib/eal/common/eal_options.h b/lib/eal/common/eal_options.h index f5e7905609..82cc8be8db 100644 --- a/lib/eal/common/eal_options.h +++ b/lib/eal/common/eal_options.h @@ -12,6 +12,7 @@ struct rte_tel_data; int eal_parse_log_options(void); int eal_parse_args(void); int eal_option_device_parse(void); +int eal_apply_hugepage_mem_sz_limits(struct internal_config *internal_cfg); int eal_adjust_config(struct internal_config *internal_cfg); int eal_cleanup_config(struct internal_config *internal_cfg); enum rte_proc_type_t eal_proc_type_detect(void); diff --git a/lib/eal/freebsd/eal.c b/lib/eal/freebsd/eal.c index 60f5e676a8..8b1ba5b99b 100644 --- a/lib/eal/freebsd/eal.c +++ b/lib/eal/freebsd/eal.c @@ -585,6 +585,12 @@ rte_eal_init(int argc, char **argv) rte_errno = EACCES; goto err_out; } + if (internal_conf->process_type == RTE_PROC_PRIMARY && + eal_apply_hugepage_mem_sz_limits(internal_conf) < 0) { + rte_eal_init_alert("Cannot apply hugepage memory limits."); + rte_errno = EINVAL; + goto err_out; + } } if (internal_conf->memory == 0 && internal_conf->force_numa == 0) { diff --git a/lib/eal/linux/eal.c b/lib/eal/linux/eal.c index d848de03d8..fc2e9b8c0e 100644 --- a/lib/eal/linux/eal.c +++ b/lib/eal/linux/eal.c @@ -748,6 +748,12 @@ rte_eal_init(int argc, char **argv) rte_errno = EACCES; goto err_out; } + if (internal_conf->process_type == RTE_PROC_PRIMARY && + eal_apply_hugepage_mem_sz_limits(internal_conf) < 0) { + rte_eal_init_alert("Cannot apply hugepage memory limits."); + rte_errno = EINVAL; + goto err_out; + } } if (internal_conf->memory == 0 && internal_conf->force_numa == 0) { diff --git a/lib/eal/linux/eal_memory.c b/lib/eal/linux/eal_memory.c index c169895c6f..38934b9a65 100644 --- 
a/lib/eal/linux/eal_memory.c +++ b/lib/eal/linux/eal_memory.c @@ -1813,6 +1813,7 @@ memseg_primary_init_32(void) for (hpi_idx = 0; hpi_idx < hp_sizes; hpi_idx++) { uint64_t max_pagesz_mem, cur_pagesz_mem = 0; uint64_t hugepage_sz; + uint64_t pagesz_mem_limit; struct hugepage_info *hpi; unsigned int n_segs; @@ -1824,12 +1825,13 @@ memseg_primary_init_32(void) continue; max_pagesz_mem = max_socket_mem - cur_socket_mem; + pagesz_mem_limit = internal_conf->hugepage_mem_sz_limits[hpi_idx]; + max_pagesz_mem = RTE_MIN(max_pagesz_mem, pagesz_mem_limit); /* make it multiple of page size */ max_pagesz_mem = RTE_ALIGN_FLOOR(max_pagesz_mem, hugepage_sz); - n_segs = RTE_MIN(max_pagesz_mem / hugepage_sz, - (unsigned int)RTE_MAX_MEMSEG_PER_TYPE); + n_segs = max_pagesz_mem / hugepage_sz; EAL_LOG(DEBUG, "Attempting to preallocate " "%" PRIu64 "M on socket %i", diff --git a/lib/eal/windows/eal.c b/lib/eal/windows/eal.c index f06375a624..6dacae7235 100644 --- a/lib/eal/windows/eal.c +++ b/lib/eal/windows/eal.c @@ -229,6 +229,12 @@ rte_eal_init(int argc, char **argv) rte_errno = EACCES; goto err_out; } + if (!internal_conf->no_hugetlbfs && + eal_apply_hugepage_mem_sz_limits(internal_conf) < 0) { + rte_eal_init_alert("Cannot apply hugepage memory limits"); + rte_errno = EINVAL; + goto err_out; + } if (internal_conf->memory == 0 && !internal_conf->force_numa) { if (internal_conf->no_hugetlbfs) -- 2.47.3 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH v1 5/5] eal/memory: add page size VA limits EAL parameter 2026-03-11 10:58 [PATCH v1 0/5] Make VA reservation limits configurable Anatoly Burakov ` (3 preceding siblings ...) 2026-03-11 10:58 ` [PATCH v1 4/5] eal/memory: store default segment limits in config Anatoly Burakov @ 2026-03-11 10:58 ` Anatoly Burakov 2026-03-13 16:06 ` [PATCH v2 0/6] Make VA reservation limits configurable Anatoly Burakov 5 siblings, 0 replies; 14+ messages in thread From: Anatoly Burakov @ 2026-03-11 10:58 UTC (permalink / raw) To: dev Currently, the VA space limits placed on DPDK memory are only informed by the default configuration coming from `rte_config.h` file. Add an EAL flag to specify per-page size memory limits explicitly, thereby overriding the default VA space reservations. Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> --- app/test/test.c | 1 + app/test/test_eal_flags.c | 113 ++++++++++++++++++ doc/guides/linux_gsg/linux_eal_parameters.rst | 13 ++ .../prog_guide/env_abstraction_layer.rst | 27 ++++- lib/eal/common/eal_common_dynmem.c | 9 ++ lib/eal/common/eal_common_options.c | 100 ++++++++++++++++ lib/eal/common/eal_internal_cfg.h | 6 + lib/eal/common/eal_option_list.h | 1 + 8 files changed, 268 insertions(+), 2 deletions(-) diff --git a/app/test/test.c b/app/test/test.c index 58ef52f312..c610c3588e 100644 --- a/app/test/test.c +++ b/app/test/test.c @@ -80,6 +80,7 @@ do_recursive_call(void) { "test_memory_flags", no_action }, { "test_file_prefix", no_action }, { "test_no_huge_flag", no_action }, + { "test_pagesz_mem_flags", no_action }, { "test_panic", test_panic }, { "test_exit", test_exit }, #ifdef RTE_LIB_TIMER diff --git a/app/test/test_eal_flags.c b/app/test/test_eal_flags.c index b3a8d0ae6f..7939efee41 100644 --- a/app/test/test_eal_flags.c +++ b/app/test/test_eal_flags.c @@ -95,6 +95,14 @@ test_misc_flags(void) return TEST_SKIPPED; } +static int +test_pagesz_mem_flags(void) +{ + printf("pagesz_mem_flags not supported on Windows, skipping test\n"); 
+ return TEST_SKIPPED; +} + + #else #include <libgen.h> @@ -1502,6 +1510,110 @@ populate_socket_mem_param(int num_sockets, const char *mem, offset += written; } +/* + * Tests for correct handling of --pagesz-mem flag + */ +static int +test_pagesz_mem_flags(void) +{ +#ifdef RTE_EXEC_ENV_FREEBSD + /* FreeBSD does not support --pagesz-mem */ + return 0; +#else + const char *in_memory = "--in-memory"; + const char *prefix = file_prefix_arg(); + if (prefix == NULL) { + printf("Error (line %d) - unable to get current prefix!\n", __LINE__); + return -1; + } + + /* invalid: no value */ + static const char * const argv0[] = {prgname, eal_debug_logs, no_pci, + "--file-prefix=" memtest, in_memory, "--pagesz-mem="}; + + /* invalid: no colon (missing limit) */ + static const char * const argv1[] = {prgname, eal_debug_logs, no_pci, + "--file-prefix=" memtest, in_memory, "--pagesz-mem=2M"}; + + /* invalid: colon present but limit is empty */ + static const char * const argv2[] = {prgname, eal_debug_logs, no_pci, + "--file-prefix=" memtest, in_memory, "--pagesz-mem=2M:"}; + + /* invalid: limit not aligned to page size (3M is not a multiple of 2M) */ + static const char * const argv3[] = {prgname, eal_debug_logs, no_pci, + "--file-prefix=" memtest, in_memory, "--pagesz-mem=2M:3M"}; + + /* invalid: garbage value */ + static const char * const argv4[] = {prgname, eal_debug_logs, no_pci, + "--file-prefix=" memtest, in_memory, "--pagesz-mem=garbage"}; + + /* invalid: --pagesz-mem combined with --no-huge */ + static const char * const argv5[] = {prgname, eal_debug_logs, no_pci, + "--file-prefix=" memtest, in_memory, no_huge, "--pagesz-mem=2M:2M"}; + + /* valid: single well-formed aligned pair */ + static const char * const argv6[] = {prgname, eal_debug_logs, no_pci, + "--file-prefix=" memtest, in_memory, "--pagesz-mem=2M:64M"}; + + /* valid: multiple occurrences */ + static const char * const argv7[] = {prgname, eal_debug_logs, no_pci, + "--file-prefix=" memtest, in_memory, + 
"--pagesz-mem=2M:64M", "--pagesz-mem=1K:0"}; + + /* valid: fake page size set to zero (ignored but syntactically valid) */ + static const char * const argv8[] = {prgname, eal_debug_logs, no_pci, + "--file-prefix=" memtest, in_memory, "--pagesz-mem=1K:0"}; + + if (launch_proc(argv0) == 0) { + printf("Error (line %d) - process run ok with empty --pagesz-mem!\n", + __LINE__); + return -1; + } + if (launch_proc(argv1) == 0) { + printf("Error (line %d) - process run ok with --pagesz-mem missing colon!\n", + __LINE__); + return -1; + } + if (launch_proc(argv2) == 0) { + printf("Error (line %d) - process run ok with --pagesz-mem missing limit!\n", + __LINE__); + return -1; + } + if (launch_proc(argv3) == 0) { + printf("Error (line %d) - process run ok with --pagesz-mem unaligned limit!\n", + __LINE__); + return -1; + } + if (launch_proc(argv4) == 0) { + printf("Error (line %d) - process run ok with --pagesz-mem garbage value!\n", + __LINE__); + return -1; + } + if (launch_proc(argv5) == 0) { + printf("Error (line %d) - process run ok with --pagesz-mem and --no-huge!\n", + __LINE__); + return -1; + } + if (launch_proc(argv6) != 0) { + printf("Error (line %d) - process failed with valid --pagesz-mem!\n", + __LINE__); + return -1; + } + if (launch_proc(argv7) != 0) { + printf("Error (line %d) - process failed with multiple valid --pagesz-mem!\n", + __LINE__); + return -1; + } + if (launch_proc(argv8) != 0) { + printf("Error (line %d) - process failed with --pagesz-mem zero limit!\n", + __LINE__); + return -1; + } + + return 0; +#endif /* !RTE_EXEC_ENV_FREEBSD */ +} + /* * Tests for correct handling of -m and --socket-mem flags */ @@ -1683,5 +1795,6 @@ REGISTER_FAST_TEST(eal_flags_b_opt_autotest, NOHUGE_SKIP, ASAN_SKIP, test_invali REGISTER_FAST_TEST(eal_flags_vdev_opt_autotest, NOHUGE_SKIP, ASAN_SKIP, test_invalid_vdev_flag); REGISTER_FAST_TEST(eal_flags_r_opt_autotest, NOHUGE_SKIP, ASAN_SKIP, test_invalid_r_flag); REGISTER_FAST_TEST(eal_flags_mem_autotest, NOHUGE_SKIP, 
ASAN_SKIP, test_memory_flags); +REGISTER_FAST_TEST(eal_flags_pagesz_mem_autotest, NOHUGE_SKIP, ASAN_SKIP, test_pagesz_mem_flags); REGISTER_FAST_TEST(eal_flags_file_prefix_autotest, NOHUGE_SKIP, ASAN_SKIP, test_file_prefix); REGISTER_FAST_TEST(eal_flags_misc_autotest, NOHUGE_SKIP, ASAN_SKIP, test_misc_flags); diff --git a/doc/guides/linux_gsg/linux_eal_parameters.rst b/doc/guides/linux_gsg/linux_eal_parameters.rst index 7c5b26ce26..0507a1bf9e 100644 --- a/doc/guides/linux_gsg/linux_eal_parameters.rst +++ b/doc/guides/linux_gsg/linux_eal_parameters.rst @@ -75,6 +75,19 @@ Memory-related options Place a per-NUMA node upper limit on memory use (non-legacy memory mode only). 0 will disable the limit for a particular NUMA node. +* ``--pagesz-mem <page size:limit>`` + + Set memory limit per hugepage size. + The option accepts one ``<pagesz>:<limit>`` pair per use, + and can be repeated for multiple page sizes. + Both values support K/M/G/T suffixes (for example ``2M:64G``). + + The memory limit must be a multiple of page size. + + For example:: + + --pagesz-mem 2M:32G --pagesz-mem 1G:512G + * ``--single-file-segments`` Create fewer files in hugetlbfs (non-legacy mode only). 
diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst index 63e0568afa..102cec12c5 100644 --- a/doc/guides/prog_guide/env_abstraction_layer.rst +++ b/doc/guides/prog_guide/env_abstraction_layer.rst @@ -204,13 +204,36 @@ of virtual memory being preallocated at startup by editing the following config variables: * ``RTE_MAX_MEMSEG_LISTS`` controls how many segment lists can DPDK have -* ``RTE_MAX_MEMSEG_PER_TYPE`` controls how many segments each memory type +* ``RTE_MAX_MEMSEG_PER_TYPE`` sets the default number of segments each memory type can have (where "type" is defined as "page size + NUMA node" combination) -* ``RTE_MAX_MEM_MB_PER_TYPE`` controls how much megabytes of memory each +* ``RTE_MAX_MEM_MB_PER_TYPE`` sets the default amount of memory each memory type can address Normally, these options do not need to be changed. +Runtime Override of Per-Page-Size Memory Limits +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +By default, DPDK uses compile-time configured limits for memory allocation per page size +(as set by ``RTE_MAX_MEM_MB_PER_TYPE``). +These limits apply uniformly across all NUMA nodes for a given page size. + +It is possible to override these defaults at runtime using the ``--pagesz-mem`` option, +which allows specifying custom memory limits for each page size. This is useful when: + +* The default limits are insufficient (or too big) for your workload +* You want to dedicate more memory to specific page sizes + +The ``--pagesz-mem`` option accepts exactly one ``<pagesz>:<limit>`` pair per +occurrence, where ``pagesz`` is a page size (e.g., ``2M``, ``4M``, ``1G``) +and ``limit`` is the maximum memory to reserve for that page size (e.g., ``64G``, ``512M``). +Both values support standard binary suffixes (K, M, G, T). +Memory limits must be aligned to their corresponding page size. 
+ +Multiple page sizes can be specified by repeating the option:: + + --pagesz-mem 2M:64G --pagesz-mem 1G:512G + .. note:: Preallocated virtual memory is not to be confused with preallocated hugepage diff --git a/lib/eal/common/eal_common_dynmem.c b/lib/eal/common/eal_common_dynmem.c index 0d5e056239..60a25f524e 100644 --- a/lib/eal/common/eal_common_dynmem.c +++ b/lib/eal/common/eal_common_dynmem.c @@ -132,6 +132,11 @@ eal_dynmem_memseg_lists_init(void) mem_va_len += type->mem_sz; } + if (mem_va_len == 0) { + EAL_LOG(ERR, "No virtual memory will be reserved"); + goto out; + } + mem_va_addr = eal_get_virtual_area(NULL, &mem_va_len, mem_va_page_sz, 0, 0); if (mem_va_addr == NULL) { @@ -146,6 +151,10 @@ eal_dynmem_memseg_lists_init(void) uint64_t pagesz; int socket_id; + /* skip page sizes with zero memory limit */ + if (type->n_segs == 0) + continue; + pagesz = type->page_sz; socket_id = type->socket_id; diff --git a/lib/eal/common/eal_common_options.c b/lib/eal/common/eal_common_options.c index 806f4d0a2c..9982e7f2ce 100644 --- a/lib/eal/common/eal_common_options.c +++ b/lib/eal/common/eal_common_options.c @@ -233,6 +233,20 @@ eal_collate_args(int argc, char **argv) EAL_LOG(ERR, "Options allow (-a) and block (-b) can't be used at the same time"); return -1; } +#ifdef RTE_EXEC_ENV_FREEBSD + if (!TAILQ_EMPTY(&args.pagesz_mem)) { + EAL_LOG(ERR, "Option pagesz-mem is not supported on FreeBSD"); + return -1; + } +#endif + if (!TAILQ_EMPTY(&args.pagesz_mem) && args.no_huge) { + EAL_LOG(ERR, "Options pagesz-mem and no-huge can't be used at the same time"); + return -1; + } + if (!TAILQ_EMPTY(&args.pagesz_mem) && args.legacy_mem) { + EAL_LOG(ERR, "Options pagesz-mem and legacy-mem can't be used at the same time"); + return -1; + } /* for non-list args, we can just check for zero/null values using macro */ if (CONFLICTING_OPTIONS(args, coremask, lcores) || @@ -511,7 +525,10 @@ eal_reset_internal_config(struct internal_config *internal_cfg) 
sizeof(internal_cfg->hugepage_info[0])); internal_cfg->hugepage_info[i].lock_descriptor = -1; internal_cfg->hugepage_mem_sz_limits[i] = 0; + internal_cfg->pagesz_mem_overrides[i].pagesz = 0; + internal_cfg->pagesz_mem_overrides[i].limit = 0; } + internal_cfg->num_pagesz_mem_overrides = 0; internal_cfg->base_virtaddr = 0; /* if set to NONE, interrupt mode is determined automatically */ @@ -1867,6 +1884,77 @@ eal_parse_socket_arg(char *strval, volatile uint64_t *socket_arg) return 0; } +static int +eal_parse_pagesz_mem(char *strval, struct internal_config *internal_cfg) +{ + char *pagesz_str, *mem_str; + int len; + uint64_t pagesz, mem_limit; + struct pagesz_mem_override *pmo; + + /* do we have space? */ + if (internal_cfg->num_pagesz_mem_overrides >= MAX_HUGEPAGE_SIZES) { + EAL_LOG(ERR, + "--pagesz-mem: too many page size entries (max %d)", + MAX_HUGEPAGE_SIZES); + return -1; + } + + len = strnlen(strval, 1024); + if (len >= 1024) { + EAL_LOG(ERR, "--pagesz-mem parameter is too long"); + return -1; + } + + /* parse exactly one pagesz:mem pair per --pagesz-mem option */ + pagesz_str = strval; + mem_str = strchr(pagesz_str, ':'); + + if (mem_str == NULL || mem_str == pagesz_str || mem_str[1] == '\0') { + EAL_LOG(ERR, "--pagesz-mem parameter format is invalid, expected <pagesz>:<limit>"); + return -1; + } + + /* reject accidental multiple pairs in one option */ + if (strchr(mem_str + 1, ',') != NULL) { + EAL_LOG(ERR, "--pagesz-mem accepts one <pagesz>:<limit> pair per option"); + return -1; + } + + /* temporarily null-terminate pagesz for parsing */ + *mem_str = '\0'; + mem_str++; + + /* parse page size */ + errno = 0; + pagesz = rte_str_to_size(pagesz_str); + if (pagesz == 0 || errno != 0) { + EAL_LOG(ERR, "invalid page size in --pagesz-mem: '%s'", pagesz_str); + return -1; + } + + /* parse memory limit (0 is valid: disables allocation for this page size) */ + errno = 0; + mem_limit = rte_str_to_size(mem_str); + if (errno != 0) { + EAL_LOG(ERR, "invalid memory limit 
in --pagesz-mem: '%s'", mem_str); + return -1; + } + + /* validate alignment: memory limit must be divisible by page size */ + if (mem_limit % pagesz != 0) { + EAL_LOG(ERR, "--pagesz-mem memory limit must be aligned to page size"); + return -1; + } + + pmo = &internal_cfg->pagesz_mem_overrides[internal_cfg->num_pagesz_mem_overrides]; + pmo->pagesz = pagesz; + pmo->limit = mem_limit; + internal_cfg->num_pagesz_mem_overrides++; + + return 0; +} + static int eal_parse_vfio_intr(const char *mode) { @@ -2172,6 +2260,12 @@ eal_parse_args(void) } int_cfg->force_numa_limits = 1; } + TAILQ_FOREACH(arg, &args.pagesz_mem, next) { + if (eal_parse_pagesz_mem(arg->arg, int_cfg) < 0) { + EAL_LOG(ERR, "invalid pagesz-mem parameter: '%s'", arg->arg); + return -1; + } + } /* tracing settings, not supported on windows */ #ifdef RTE_EXEC_ENV_WINDOWS @@ -2373,6 +2467,12 @@ eal_apply_hugepage_mem_sz_limits(struct internal_config *internal_cfg) limit = RTE_MIN((uint64_t)RTE_MAX_MEM_MB_PER_TYPE << 20, (uint64_t)RTE_MAX_MEMSEG_PER_TYPE * pagesz); + /* override with user value for matching page size */ + for (j = 0; j < (unsigned int)internal_cfg->num_pagesz_mem_overrides; j++) { + if (internal_cfg->pagesz_mem_overrides[j].pagesz == pagesz) + limit = internal_cfg->pagesz_mem_overrides[j].limit; + } + internal_cfg->hugepage_mem_sz_limits[i] = limit; } diff --git a/lib/eal/common/eal_internal_cfg.h b/lib/eal/common/eal_internal_cfg.h index 0bf192c6e5..11fe1cb8f9 100644 --- a/lib/eal/common/eal_internal_cfg.h +++ b/lib/eal/common/eal_internal_cfg.h @@ -98,6 +98,12 @@ struct internal_config { struct hugepage_info hugepage_info[MAX_HUGEPAGE_SIZES]; uint64_t hugepage_mem_sz_limits[MAX_HUGEPAGE_SIZES]; /**< default max memory per hugepage size */ + /** storage for user-specified pagesz-mem overrides */ + struct pagesz_mem_override { + uint64_t pagesz; /**< page size in bytes */ + uint64_t limit; /**< memory limit in bytes */ + } pagesz_mem_overrides[MAX_HUGEPAGE_SIZES]; + int 
num_pagesz_mem_overrides; /**< number of stored overrides */ enum rte_iova_mode iova_mode ; /**< Set IOVA mode on this system */ rte_cpuset_t ctrl_cpuset; /**< cpuset for ctrl threads */ volatile unsigned int init_complete; diff --git a/lib/eal/common/eal_option_list.h b/lib/eal/common/eal_option_list.h index abee16340b..c99d06be7a 100644 --- a/lib/eal/common/eal_option_list.h +++ b/lib/eal/common/eal_option_list.h @@ -51,6 +51,7 @@ STR_ARG("--mbuf-pool-ops-name", NULL, "User defined mbuf default pool ops name", STR_ARG("--memory-channels", "-n", "Number of memory channels per socket", memory_channels) STR_ARG("--memory-ranks", "-r", "Force number of memory ranks (don't detect)", memory_ranks) STR_ARG("--memory-size", "-m", "Total size of memory to allocate initially", memory_size) +LIST_ARG("--pagesz-mem", NULL, "Memory allocation per hugepage size (format: <pagesz>:<limit>, e.g. 2M:32G). Repeat option for multiple page sizes.", pagesz_mem) BOOL_ARG("--no-hpet", NULL, "Disable HPET timer", no_hpet) BOOL_ARG("--no-huge", NULL, "Disable hugetlbfs support", no_huge) BOOL_ARG("--no-pci", NULL, "Disable all PCI devices", no_pci) -- 2.47.3 ^ permalink raw reply related [flat|nested] 14+ messages in thread
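For illustration, an EAL invocation using the new option could look as follows. This is a hypothetical example: the application name and the limit values are made up, and only the `--pagesz-mem` syntax (one `<pagesz>:<limit>` pair per option, `0` to disable a page size) comes from the patch above.

```shell
# cap VA reservation for 2M pages at 32G and disable 1G pages entirely;
# note one --pagesz-mem option per page size, as required by the parser
./dpdk-testpmd -l 0-3 --pagesz-mem 2M:32G --pagesz-mem 1G:0 -- -i
```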
* [PATCH v2 0/6] Make VA reservation limits configurable
2026-03-11 10:58 [PATCH v1 0/5] Make VA reservation limits configurable Anatoly Burakov
` (4 preceding siblings ...)
2026-03-11 10:58 ` [PATCH v1 5/5] eal/memory: add page size VA limits EAL parameter Anatoly Burakov
@ 2026-03-13 16:06 ` Anatoly Burakov
2026-03-13 16:06 ` [PATCH v2 1/6] eal: reject non-numeric input in str to size Anatoly Burakov
` (5 more replies)
5 siblings, 6 replies; 14+ messages in thread
From: Anatoly Burakov @ 2026-03-13 16:06 UTC (permalink / raw)
To: dev
Currently, the VA space reservation is governed by a combination of a few
values:
- Total max VA space (512G for most platforms, 1T for some, 2G for 32-bit)
- Max memory per memory type
- Max pages per memory type
A "memory type" is defined as a unique combination of NUMA node and page size.
The reason there are two limits is that for large pages, a large segment limit
causes runaway multi-terabyte reservations, while for smaller pages, a large
memory limit causes hundreds of thousands of huge page slots. The total maximum
memory size was originally intended as a safeguard against discontiguous NUMA
nodes, but this has since been addressed by the EAL API explicitly supporting
discontiguous NUMA nodes, so it is no longer a problem.
In addition to that, each memory type was split into multiple segment lists, the
idea being that it should be easier for a secondary process to reserve multiple
smaller chunks at discontiguous addresses than a single large chunk of memory.
It is unknown whether this actually makes a difference, but what *is* known is
that it is a source of additional complexity in memory reservation, as well as a
source of gratuitous memory reservation limits placed on DPDK.
This patchset attempts to simplify and improve this situation in a few key
areas:
- Get rid of global memory limits
Total memory usage can, and should, scale with NUMA sockets, and so now it does. 
- Get rid of multiple segment lists per memory type This removes two config options, and makes the address space reservations a lot simpler. - Allocate all memory segment lists as one big blob of memory This further simplifies address space reservations. - Use memory size limits instead of segments limits Despite smaller page sizes still needing limits on number of segments, they are directly translated into memory size limits at init time, so that all limits the VA space reservation ever sees are expressed in bytes, not segments. This reduces complexity in how we manage the VA space reservations and work with our limits. - Do not use config constants directly We switch to only invoking these constants once - at startup, when we are discovering hugepage sizes available to the system. This allows us to be more flexible in how we manage these limits. - Add EAL command-line option to set per-page size limits The final piece of the puzzle - the "more flexible in how we manage these limits" part. This new parameter affords us more flexible VA space management, including disabling specific page sizes entirely (by specifying 0 as the limit). This allows increasing/decreasing VA space reservation limits without recompiling DPDK. 
v1 -> v2: - Fix str_to_size not handling invalid input properly - Move str_to_size autotests to string autotests file - Fix hugepage file segment indexing to not use global constants - More validation around VA reservation Anatoly Burakov (6): eal: reject non-numeric input in str to size eal/memory: remove per-list segment and memory limits eal/memory: allocate all VA space in one go eal/memory: get rid of global VA space limits eal/memory: store default segment limits in config eal/memory: add page size VA limits EAL parameter app/test/test.c | 1 + app/test/test_eal_flags.c | 126 ++++++++++++ app/test/test_malloc.c | 30 --- app/test/test_string_fns.c | 66 ++++++ config/arm/meson.build | 1 - config/meson.build | 5 - config/rte_config.h | 2 - doc/guides/linux_gsg/linux_eal_parameters.rst | 13 ++ .../prog_guide/env_abstraction_layer.rst | 33 ++- lib/eal/common/eal_common_dynmem.c | 192 ++++++++---------- lib/eal/common/eal_common_memory.c | 28 ++- lib/eal/common/eal_common_options.c | 141 +++++++++++++ lib/eal/common/eal_common_string_fns.c | 4 + lib/eal/common/eal_filesystem.h | 13 ++ lib/eal/common/eal_internal_cfg.h | 8 + lib/eal/common/eal_memcfg.h | 6 + lib/eal/common/eal_option_list.h | 1 + lib/eal/common/eal_options.h | 1 + lib/eal/common/eal_private.h | 19 +- lib/eal/freebsd/eal.c | 6 + lib/eal/freebsd/eal_memory.c | 101 +++------ lib/eal/linux/eal.c | 6 + lib/eal/linux/eal_memalloc.c | 4 +- lib/eal/linux/eal_memory.c | 170 ++++++++++------ lib/eal/windows/eal.c | 6 + 25 files changed, 687 insertions(+), 296 deletions(-) -- 2.47.3 ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH v2 1/6] eal: reject non-numeric input in str to size 2026-03-13 16:06 ` [PATCH v2 0/6] Make VA reservation limits configurable Anatoly Burakov @ 2026-03-13 16:06 ` Anatoly Burakov 2026-03-13 16:16 ` Bruce Richardson 2026-03-13 16:06 ` [PATCH v2 2/6] eal/memory: remove per-list segment and memory limits Anatoly Burakov ` (4 subsequent siblings) 5 siblings, 1 reply; 14+ messages in thread From: Anatoly Burakov @ 2026-03-13 16:06 UTC (permalink / raw) To: dev Add a check in rte_str_to_size validating that strtoull() consumed at least one character. If not, set errno to EINVAL and return 0. Also move rte_str_to_size unit coverage from malloc tests to string_autotest, where string utility tests belong, and add a new test to test for handling invalid numerical input. Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> --- app/test/test_malloc.c | 30 ------------ app/test/test_string_fns.c | 66 ++++++++++++++++++++++++++ lib/eal/common/eal_common_string_fns.c | 4 ++ 3 files changed, 70 insertions(+), 30 deletions(-) diff --git a/app/test/test_malloc.c b/app/test/test_malloc.c index 344a730e28..da868c8091 100644 --- a/app/test/test_malloc.c +++ b/app/test/test_malloc.c @@ -271,35 +271,6 @@ test_reordered_free(void) return ret; } -/* test function inside the malloc lib*/ -static int -test_str_to_size(void) -{ - struct { - const char *str; - uint64_t value; - } test_values[] = - {{ "5G", (uint64_t)5 * 1024 * 1024 *1024 }, - {"0x20g", (uint64_t)0x20 * 1024 * 1024 *1024}, - {"10M", 10 * 1024 * 1024}, - {"050m", 050 * 1024 * 1024}, - {"8K", 8 * 1024}, - {"15k", 15 * 1024}, - {"0200", 0200}, - {"0x103", 0x103}, - {"432", 432}, - {"-1", 0}, /* negative values return 0 */ - {" -2", 0}, - {" -3MB", 0}, - {"18446744073709551616", 0} /* ULLONG_MAX + 1 == out of range*/ - }; - unsigned i; - for (i = 0; i < RTE_DIM(test_values); i++) - if (rte_str_to_size(test_values[i].str) != test_values[i].value) - return -1; - return 0; -} - static int 
test_multi_alloc_statistics(void) { @@ -1145,7 +1116,6 @@ test_free_sensitive(void) static struct unit_test_suite test_suite = { .suite_name = "Malloc test suite", .unit_test_cases = { - TEST_CASE(test_str_to_size), TEST_CASE(test_zero_aligned_alloc), TEST_CASE(test_malloc_bad_params), TEST_CASE(test_realloc), diff --git a/app/test/test_string_fns.c b/app/test/test_string_fns.c index 786eda9e49..697cb7ed15 100644 --- a/app/test/test_string_fns.c +++ b/app/test/test_string_fns.c @@ -5,6 +5,7 @@ #include <stdio.h> #include <stdarg.h> #include <stddef.h> +#include <inttypes.h> #include <errno.h> #include <string.h> @@ -314,6 +315,69 @@ test_rte_basename(void) return 0; } +static int +test_rte_str_to_size(void) +{ + struct { + const char *str; + uint64_t value; + } valid_values[] = { + {"5G", (uint64_t)5 * 1024 * 1024 * 1024}, + {"0x20g", (uint64_t)0x20 * 1024 * 1024 * 1024}, + {"10M", 10 * 1024 * 1024}, + {"050m", 050 * 1024 * 1024}, + {"8K", 8 * 1024}, + {"15k", 15 * 1024}, + {"0200", 0200}, + {"0x103", 0x103}, + {"432", 432}, + {"-1", 0}, + {" -2", 0}, + {" -3MB", 0}, + }; + struct { + const char *str; + } invalid_values[] = { + /* we can only check for invalid input at the start of the string */ + {"garbage"}, + {""}, + {" "}, + }; + unsigned int i; + uint64_t value; + + LOG("Checking valid rte_str_to_size inputs\n"); + + for (i = 0; i < RTE_DIM(valid_values); i++) { + errno = 0; + value = rte_str_to_size(valid_values[i].str); + if (value != valid_values[i].value) { + LOG("FAIL: valid input '%s'\n", valid_values[i].str); + return -1; + } + LOG("PASS: valid input '%s' -> %" PRIu64 "\n", + valid_values[i].str, value); + } + + LOG("Checking invalid rte_str_to_size inputs\n"); + + for (i = 0; i < RTE_DIM(invalid_values); i++) { + errno = 0; + (void)rte_str_to_size(invalid_values[i].str); + if (errno == 0) { + LOG("FAIL: invalid input '%s' did not set errno\n", + invalid_values[i].str); + return -1; + } + LOG("PASS: invalid input '%s' set errno=%d\n", + 
invalid_values[i].str, errno); + } + + LOG("%s - PASSED\n", __func__); + + return 0; +} + static int test_string_fns(void) { @@ -325,6 +389,8 @@ test_string_fns(void) return -1; if (test_rte_basename() < 0) return -1; + if (test_rte_str_to_size() < 0) + return -1; return 0; } diff --git a/lib/eal/common/eal_common_string_fns.c b/lib/eal/common/eal_common_string_fns.c index fa87831c3a..e0dc48bd80 100644 --- a/lib/eal/common/eal_common_string_fns.c +++ b/lib/eal/common/eal_common_string_fns.c @@ -85,6 +85,10 @@ rte_str_to_size(const char *str) errno = 0; size = strtoull(str, &endptr, 0); + if (endptr == str) { + errno = EINVAL; + return 0; + } if (errno) return 0; -- 2.47.3 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH v2 1/6] eal: reject non-numeric input in str to size 2026-03-13 16:06 ` [PATCH v2 1/6] eal: reject non-numeric input in str to size Anatoly Burakov @ 2026-03-13 16:16 ` Bruce Richardson 0 siblings, 0 replies; 14+ messages in thread From: Bruce Richardson @ 2026-03-13 16:16 UTC (permalink / raw) To: Anatoly Burakov; +Cc: dev On Fri, Mar 13, 2026 at 04:06:32PM +0000, Anatoly Burakov wrote: > Add a check in rte_str_to_size validating that strtoull() consumed at least > one character. If not, set errno to EINVAL and return 0. > > Also move rte_str_to_size unit coverage from malloc tests to > string_autotest, where string utility tests belong, and add a new test to > test for handling invalid numerical input. > > Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> > --- Acked-by: Bruce Richardson <bruce.richardson@intel.com> ^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH v2 2/6] eal/memory: remove per-list segment and memory limits
2026-03-13 16:06 ` [PATCH v2 0/6] Make VA reservation limits configurable Anatoly Burakov
2026-03-13 16:06 ` [PATCH v2 1/6] eal: reject non-numeric input in str to size Anatoly Burakov
@ 2026-03-13 16:06 ` Anatoly Burakov
2026-03-13 16:06 ` [PATCH v2 3/6] eal/memory: allocate all VA space in one go Anatoly Burakov
` (3 subsequent siblings)
5 siblings, 0 replies; 14+ messages in thread
From: Anatoly Burakov @ 2026-03-13 16:06 UTC (permalink / raw)
To: dev, Bruce Richardson
Initially, the dynamic memory mode used multiple segment lists to back different
memory types, the motivation being that it should be easier for secondary
processes to map many smaller segments than fewer but larger ones. In practice
this does not seem to make any difference on 64-bit platforms, as there is
usually plenty of address space.
To reduce the amount of complexity in how memory segment lists are handled,
collapse the multi-list logic to always use a single segment list. That does not
mean that every memory type will always get exactly one segment list - in some
cases (e.g. 32-bit) we may not be able to allocate enough contiguous VA space to
fit an entire memory type into one list, in which case the number of memseg
lists for that type will be more than one. It is more about lifting the upper
limit on how many segment lists a type can have. If the number of segment lists
ever blows up so much that it exceeds the (very generous) default maximum number
of memseg lists, the user has bigger problems to address. 
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> --- config/rte_config.h | 2 - .../prog_guide/env_abstraction_layer.rst | 4 - lib/eal/common/eal_common_dynmem.c | 110 +++++------------- lib/eal/common/eal_common_memory.c | 6 +- lib/eal/common/eal_filesystem.h | 13 +++ lib/eal/common/eal_private.h | 6 +- lib/eal/freebsd/eal_memory.c | 75 ++++-------- lib/eal/linux/eal_memalloc.c | 4 +- lib/eal/linux/eal_memory.c | 88 ++++++-------- 9 files changed, 107 insertions(+), 201 deletions(-) diff --git a/config/rte_config.h b/config/rte_config.h index a2609fa403..0447cdf2ad 100644 --- a/config/rte_config.h +++ b/config/rte_config.h @@ -43,8 +43,6 @@ #define RTE_MAX_HEAPS 32 #define RTE_MAX_LCORE_VAR 131072 #define RTE_MAX_MEMSEG_LISTS 128 -#define RTE_MAX_MEMSEG_PER_LIST 8192 -#define RTE_MAX_MEM_MB_PER_LIST 32768 #define RTE_MAX_MEMSEG_PER_TYPE 32768 #define RTE_MAX_MEM_MB_PER_TYPE 65536 #define RTE_MAX_TAILQ 32 diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst index d716895c1d..04368a3950 100644 --- a/doc/guides/prog_guide/env_abstraction_layer.rst +++ b/doc/guides/prog_guide/env_abstraction_layer.rst @@ -204,10 +204,6 @@ of virtual memory being preallocated at startup by editing the following config variables: * ``RTE_MAX_MEMSEG_LISTS`` controls how many segment lists can DPDK have -* ``RTE_MAX_MEM_MB_PER_LIST`` controls how much megabytes of memory each - segment list can address -* ``RTE_MAX_MEMSEG_PER_LIST`` controls how many segments each segment list - can have * ``RTE_MAX_MEMSEG_PER_TYPE`` controls how many segments each memory type can have (where "type" is defined as "page size + NUMA node" combination) * ``RTE_MAX_MEM_MB_PER_TYPE`` controls how much megabytes of memory each diff --git a/lib/eal/common/eal_common_dynmem.c b/lib/eal/common/eal_common_dynmem.c index 8f51d6dd4a..ef0270cc30 100644 --- a/lib/eal/common/eal_common_dynmem.c +++ b/lib/eal/common/eal_common_dynmem.c @@ -24,11 +24,10 @@ 
eal_dynmem_memseg_lists_init(void) struct memtype { uint64_t page_sz; int socket_id; - } *memtypes = NULL; + } memtypes[RTE_MAX_MEMSEG_LISTS] = {0}; int i, hpi_idx, msl_idx, ret = -1; /* fail unless told to succeed */ struct rte_memseg_list *msl; uint64_t max_mem, max_mem_per_type; - unsigned int max_seglists_per_type; unsigned int n_memtypes, cur_type; struct internal_config *internal_conf = eal_get_internal_configuration(); @@ -45,8 +44,7 @@ eal_dynmem_memseg_lists_init(void) * * deciding amount of memory going towards each memory type is a * balancing act between maximum segments per type, maximum memory per - * type, and number of detected NUMA nodes. the goal is to make sure - * each memory type gets at least one memseg list. + * type, and number of detected NUMA nodes. * * the total amount of memory is limited by RTE_MAX_MEM_MB value. * @@ -57,26 +55,18 @@ eal_dynmem_memseg_lists_init(void) * smaller page sizes, it can take hundreds of thousands of segments to * reach the above specified per-type memory limits. * - * additionally, each type may have multiple memseg lists associated - * with it, each limited by either RTE_MAX_MEM_MB_PER_LIST for bigger - * page sizes, or RTE_MAX_MEMSEG_PER_LIST segments for smaller ones. - * - * the number of memseg lists per type is decided based on the above - * limits, and also taking number of detected NUMA nodes, to make sure - * that we don't run out of memseg lists before we populate all NUMA - * nodes with memory. - * - * we do this in three stages. first, we collect the number of types. - * then, we figure out memory constraints and populate the list of - * would-be memseg lists. then, we go ahead and allocate the memseg - * lists. + * each memory type is allotted a single memseg list. the size of that + * list is calculated here to respect the per-type memory and segment + * limits that apply. 
*/ - /* create space for mem types */ + /* maximum number of memtypes we're ever going to get */ n_memtypes = internal_conf->num_hugepage_sizes * rte_socket_count(); - memtypes = calloc(n_memtypes, sizeof(*memtypes)); - if (memtypes == NULL) { - EAL_LOG(ERR, "Cannot allocate space for memory types"); + + /* can we fit all memtypes into the memseg lists? */ + if (n_memtypes > RTE_MAX_MEMSEG_LISTS) { + EAL_LOG(ERR, "Too many memory types detected: %u. Please increase " + "RTE_MAX_MEMSEG_LISTS in configuration.", n_memtypes); return -1; } @@ -113,91 +103,49 @@ eal_dynmem_memseg_lists_init(void) max_mem = (uint64_t)RTE_MAX_MEM_MB << 20; max_mem_per_type = RTE_MIN((uint64_t)RTE_MAX_MEM_MB_PER_TYPE << 20, max_mem / n_memtypes); - /* - * limit maximum number of segment lists per type to ensure there's - * space for memseg lists for all NUMA nodes with all page sizes - */ - max_seglists_per_type = RTE_MAX_MEMSEG_LISTS / n_memtypes; - - if (max_seglists_per_type == 0) { - EAL_LOG(ERR, "Cannot accommodate all memory types, please increase RTE_MAX_MEMSEG_LISTS"); - goto out; - } /* go through all mem types and create segment lists */ msl_idx = 0; for (cur_type = 0; cur_type < n_memtypes; cur_type++) { - unsigned int cur_seglist, n_seglists, n_segs; - unsigned int max_segs_per_type, max_segs_per_list; + unsigned int n_segs; struct memtype *type = &memtypes[cur_type]; - uint64_t max_mem_per_list, pagesz; + uint64_t pagesz; int socket_id; pagesz = type->page_sz; socket_id = type->socket_id; /* - * we need to create segment lists for this type. we must take + * we need to create a segment list for this type. we must take * into account the following things: * - * 1. total amount of memory we can use for this memory type - * 2. total amount of memory per memseg list allowed + * 1. total amount of memory to use for this memory type + * 2. total amount of memory allowed per type * 3. number of segments needed to fit the amount of memory * 4. 
number of segments allowed per type - * 5. number of segments allowed per memseg list - * 6. number of memseg lists we are allowed to take up */ + n_segs = max_mem_per_type / pagesz; + n_segs = RTE_MIN(n_segs, (unsigned int)RTE_MAX_MEMSEG_PER_TYPE); - /* calculate how much segments we will need in total */ - max_segs_per_type = max_mem_per_type / pagesz; - /* limit number of segments to maximum allowed per type */ - max_segs_per_type = RTE_MIN(max_segs_per_type, - (unsigned int)RTE_MAX_MEMSEG_PER_TYPE); - /* limit number of segments to maximum allowed per list */ - max_segs_per_list = RTE_MIN(max_segs_per_type, - (unsigned int)RTE_MAX_MEMSEG_PER_LIST); + EAL_LOG(DEBUG, "Creating segment list: " + "n_segs:%u socket_id:%i hugepage_sz:%" PRIu64, + n_segs, socket_id, pagesz); - /* calculate how much memory we can have per segment list */ - max_mem_per_list = RTE_MIN(max_segs_per_list * pagesz, - (uint64_t)RTE_MAX_MEM_MB_PER_LIST << 20); + msl = &mcfg->memsegs[msl_idx]; - /* calculate how many segments each segment list will have */ - n_segs = RTE_MIN(max_segs_per_list, max_mem_per_list / pagesz); + if (eal_memseg_list_init(msl, pagesz, n_segs, socket_id, + msl_idx, true)) + goto out; - /* calculate how many segment lists we can have */ - n_seglists = RTE_MIN(max_segs_per_type / n_segs, - max_mem_per_type / max_mem_per_list); - - /* limit number of segment lists according to our maximum */ - n_seglists = RTE_MIN(n_seglists, max_seglists_per_type); - - EAL_LOG(DEBUG, "Creating %i segment lists: " - "n_segs:%i socket_id:%i hugepage_sz:%" PRIu64, - n_seglists, n_segs, socket_id, pagesz); - - /* create all segment lists */ - for (cur_seglist = 0; cur_seglist < n_seglists; cur_seglist++) { - if (msl_idx >= RTE_MAX_MEMSEG_LISTS) { - EAL_LOG(ERR, - "No more space in memseg lists, please increase RTE_MAX_MEMSEG_LISTS"); - goto out; - } - msl = &mcfg->memsegs[msl_idx++]; - - if (eal_memseg_list_init(msl, pagesz, n_segs, - socket_id, cur_seglist, true)) - goto out; - - if 
(eal_memseg_list_alloc(msl, 0)) { - EAL_LOG(ERR, "Cannot allocate VA space for memseg list"); - goto out; - } + if (eal_memseg_list_alloc(msl, 0)) { + EAL_LOG(ERR, "Cannot allocate VA space for memseg list"); + goto out; } + msl_idx++; } /* we're successful */ ret = 0; out: - free(memtypes); return ret; } diff --git a/lib/eal/common/eal_common_memory.c b/lib/eal/common/eal_common_memory.c index dccf9406c5..b9388021ff 100644 --- a/lib/eal/common/eal_common_memory.c +++ b/lib/eal/common/eal_common_memory.c @@ -228,12 +228,12 @@ eal_memseg_list_init_named(struct rte_memseg_list *msl, const char *name, int eal_memseg_list_init(struct rte_memseg_list *msl, uint64_t page_sz, - int n_segs, int socket_id, int type_msl_idx, bool heap) + int n_segs, int socket_id, int msl_idx, bool heap) { char name[RTE_FBARRAY_NAME_LEN]; - snprintf(name, sizeof(name), MEMSEG_LIST_FMT, page_sz >> 10, socket_id, - type_msl_idx); + snprintf(name, sizeof(name), MEMSEG_LIST_FMT, + page_sz >> 10, socket_id, msl_idx); return eal_memseg_list_init_named( msl, name, page_sz, n_segs, socket_id, heap); diff --git a/lib/eal/common/eal_filesystem.h b/lib/eal/common/eal_filesystem.h index 6b99d22160..2d22b52e76 100644 --- a/lib/eal/common/eal_filesystem.h +++ b/lib/eal/common/eal_filesystem.h @@ -114,6 +114,19 @@ eal_get_hugefile_path(char *buffer, size_t buflen, const char *hugedir, int f_id return buffer; } +#define HUGEFILE_FMT_LIST_SEG "%s/%smap_%u_%u" +static inline __rte_warn_unused_result const char * +eal_get_hugefile_list_seg_path(char *buffer, size_t buflen, + const char *hugedir, unsigned int list_idx, unsigned int seg_idx) +{ + if (snprintf(buffer, buflen, HUGEFILE_FMT_LIST_SEG, + hugedir, eal_get_hugefile_prefix(), list_idx, seg_idx) + >= (int)buflen) + return NULL; + else + return buffer; +} + /** define the default filename prefix for the %s values above */ #define HUGEFILE_PREFIX_DEFAULT "rte" diff --git a/lib/eal/common/eal_private.h b/lib/eal/common/eal_private.h index 
e032dd10c9..70f7b46699 100644 --- a/lib/eal/common/eal_private.h +++ b/lib/eal/common/eal_private.h @@ -299,14 +299,14 @@ eal_memseg_list_init_named(struct rte_memseg_list *msl, const char *name, * Initialize memory segment list and create its backing storage * with a name corresponding to MSL parameters. * - * @param type_msl_idx - * Index of the MSL among other MSLs of the same socket and page size. + * @param msl_idx + * Index of the MSL in memsegs array. * * @see eal_memseg_list_init_named for remaining parameters description. */ int eal_memseg_list_init(struct rte_memseg_list *msl, uint64_t page_sz, - int n_segs, int socket_id, int type_msl_idx, bool heap); + int n_segs, int socket_id, int msl_idx, bool heap); /** * Reserve VA space for a memory segment list diff --git a/lib/eal/freebsd/eal_memory.c b/lib/eal/freebsd/eal_memory.c index cd608db9f9..3eb5d193ec 100644 --- a/lib/eal/freebsd/eal_memory.c +++ b/lib/eal/freebsd/eal_memory.c @@ -190,8 +190,8 @@ rte_eal_hugepage_init(void) break; } if (msl_idx == RTE_MAX_MEMSEG_LISTS) { - EAL_LOG(ERR, "Could not find space for memseg. 
Please increase RTE_MAX_MEMSEG_PER_LIST " - "RTE_MAX_MEMSEG_PER_TYPE and/or RTE_MAX_MEM_MB_PER_TYPE in configuration."); + EAL_LOG(ERR, + "Could not find suitable space for memseg in existing memseg lists"); return -1; } arr = &msl->memseg_arr; @@ -320,23 +320,6 @@ rte_eal_using_phys_addrs(void) return 0; } -static uint64_t -get_mem_amount(uint64_t page_sz, uint64_t max_mem) -{ - uint64_t area_sz, max_pages; - - /* limit to RTE_MAX_MEMSEG_PER_LIST pages or RTE_MAX_MEM_MB_PER_LIST */ - max_pages = RTE_MAX_MEMSEG_PER_LIST; - max_mem = RTE_MIN((uint64_t)RTE_MAX_MEM_MB_PER_LIST << 20, max_mem); - - area_sz = RTE_MIN(page_sz * max_pages, max_mem); - - /* make sure the list isn't smaller than the page size */ - area_sz = RTE_MAX(area_sz, page_sz); - - return RTE_ALIGN(area_sz, page_sz); -} - static int memseg_list_alloc(struct rte_memseg_list *msl) { @@ -380,9 +363,10 @@ memseg_primary_init(void) hpi_idx++) { uint64_t max_type_mem, total_type_mem = 0; uint64_t avail_mem; - int type_msl_idx, max_segs, avail_segs, total_segs = 0; + unsigned int avail_segs; struct hugepage_info *hpi; uint64_t hugepage_sz; + unsigned int n_segs; hpi = &internal_conf->hugepage_info[hpi_idx]; hugepage_sz = hpi->hugepage_sz; @@ -396,7 +380,6 @@ memseg_primary_init(void) /* first, calculate theoretical limits according to config */ max_type_mem = RTE_MIN(max_mem - total_mem, (uint64_t)RTE_MAX_MEM_MB_PER_TYPE << 20); - max_segs = RTE_MAX_MEMSEG_PER_TYPE; /* now, limit all of that to whatever will actually be * available to us, because without dynamic allocation support, @@ -412,42 +395,30 @@ memseg_primary_init(void) avail_mem = avail_segs * hugepage_sz; max_type_mem = RTE_MIN(avail_mem, max_type_mem); - max_segs = RTE_MIN(avail_segs, max_segs); - - type_msl_idx = 0; - while (total_type_mem < max_type_mem && - total_segs < max_segs) { - uint64_t cur_max_mem, cur_mem; - unsigned int n_segs; - - if (msl_idx >= RTE_MAX_MEMSEG_LISTS) { - EAL_LOG(ERR, - "No more space in memseg lists, please increase 
RTE_MAX_MEMSEG_LISTS"); - return -1; - } - - msl = &mcfg->memsegs[msl_idx++]; - - cur_max_mem = max_type_mem - total_type_mem; - - cur_mem = get_mem_amount(hugepage_sz, - cur_max_mem); - n_segs = cur_mem / hugepage_sz; + n_segs = max_type_mem / hugepage_sz; + if (n_segs == 0) + continue; + + if (msl_idx >= RTE_MAX_MEMSEG_LISTS) { + EAL_LOG(ERR, + "No more space in memseg lists, please increase RTE_MAX_MEMSEG_LISTS"); + return -1; + } - if (eal_memseg_list_init(msl, hugepage_sz, n_segs, - 0, type_msl_idx, false)) - return -1; + msl = &mcfg->memsegs[msl_idx]; - total_segs += msl->memseg_arr.len; - total_type_mem = total_segs * hugepage_sz; - type_msl_idx++; + if (eal_memseg_list_init(msl, hugepage_sz, n_segs, + 0, msl_idx, false)) + return -1; - if (memseg_list_alloc(msl)) { - EAL_LOG(ERR, "Cannot allocate VA space for memseg list"); - return -1; - } + total_type_mem = n_segs * hugepage_sz; + if (memseg_list_alloc(msl)) { + EAL_LOG(ERR, "Cannot allocate VA space for memseg list"); + return -1; } + total_mem += total_type_mem; + msl_idx++; } return 0; } diff --git a/lib/eal/linux/eal_memalloc.c b/lib/eal/linux/eal_memalloc.c index a39bc31c7b..2227b1c52b 100644 --- a/lib/eal/linux/eal_memalloc.c +++ b/lib/eal/linux/eal_memalloc.c @@ -282,8 +282,8 @@ get_seg_fd(char *path, int buflen, struct hugepage_info *hi, huge_path = eal_get_hugefile_path(path, buflen, hi->hugedir, list_idx); } else { out_fd = &fd_list[list_idx].fds[seg_idx]; - huge_path = eal_get_hugefile_path(path, buflen, hi->hugedir, - list_idx * RTE_MAX_MEMSEG_PER_LIST + seg_idx); + huge_path = eal_get_hugefile_list_seg_path(path, buflen, + hi->hugedir, list_idx, seg_idx); } if (huge_path == NULL) { EAL_LOG(DEBUG, "%s(): hugefile path truncated: '%s'", diff --git a/lib/eal/linux/eal_memory.c b/lib/eal/linux/eal_memory.c index bf783e3c76..691d8eb3cc 100644 --- a/lib/eal/linux/eal_memory.c +++ b/lib/eal/linux/eal_memory.c @@ -740,8 +740,8 @@ remap_segment(struct hugepage_file *hugepages, int seg_start, int 
seg_end) break; } if (msl_idx == RTE_MAX_MEMSEG_LISTS) { - EAL_LOG(ERR, "Could not find space for memseg. Please increase RTE_MAX_MEMSEG_PER_LIST " - "RTE_MAX_MEMSEG_PER_TYPE and/or RTE_MAX_MEM_MB_PER_TYPE in configuration."); + EAL_LOG(ERR, + "Could not find suitable space for memseg in existing memseg lists"); return -1; } @@ -822,23 +822,6 @@ remap_segment(struct hugepage_file *hugepages, int seg_start, int seg_end) return seg_len; } -static uint64_t -get_mem_amount(uint64_t page_sz, uint64_t max_mem) -{ - uint64_t area_sz, max_pages; - - /* limit to RTE_MAX_MEMSEG_PER_LIST pages or RTE_MAX_MEM_MB_PER_LIST */ - max_pages = RTE_MAX_MEMSEG_PER_LIST; - max_mem = RTE_MIN((uint64_t)RTE_MAX_MEM_MB_PER_LIST << 20, max_mem); - - area_sz = RTE_MIN(page_sz * max_pages, max_mem); - - /* make sure the list isn't smaller than the page size */ - area_sz = RTE_MAX(area_sz, page_sz); - - return RTE_ALIGN(area_sz, page_sz); -} - static int memseg_list_free(struct rte_memseg_list *msl) { @@ -1831,7 +1814,6 @@ memseg_primary_init_32(void) uint64_t max_pagesz_mem, cur_pagesz_mem = 0; uint64_t hugepage_sz; struct hugepage_info *hpi; - int type_msl_idx, max_segs, total_segs = 0; hpi = &internal_conf->hugepage_info[hpi_idx]; hugepage_sz = hpi->hugepage_sz; @@ -1840,62 +1822,60 @@ memseg_primary_init_32(void) if (hpi->num_pages[socket_id] == 0) continue; - max_segs = RTE_MAX_MEMSEG_PER_TYPE; max_pagesz_mem = max_socket_mem - cur_socket_mem; /* make it multiple of page size */ max_pagesz_mem = RTE_ALIGN_FLOOR(max_pagesz_mem, hugepage_sz); + if (max_pagesz_mem == 0) + continue; + EAL_LOG(DEBUG, "Attempting to preallocate " "%" PRIu64 "M on socket %i", max_pagesz_mem >> 20, socket_id); - type_msl_idx = 0; - while (cur_pagesz_mem < max_pagesz_mem && - total_segs < max_segs) { - uint64_t cur_mem; + while (cur_pagesz_mem < max_pagesz_mem) { + uint64_t rem_mem; unsigned int n_segs; - if (msl_idx >= RTE_MAX_MEMSEG_LISTS) { - EAL_LOG(ERR, - "No more space in memseg lists, please increase 
RTE_MAX_MEMSEG_LISTS"); - return -1; - } + rem_mem = max_pagesz_mem - cur_pagesz_mem; + n_segs = rem_mem / hugepage_sz; - msl = &mcfg->memsegs[msl_idx]; + while (n_segs > 0) { + if (msl_idx >= RTE_MAX_MEMSEG_LISTS) { + EAL_LOG(ERR, + "No more space in memseg lists, please increase RTE_MAX_MEMSEG_LISTS"); + return -1; + } - cur_mem = get_mem_amount(hugepage_sz, - max_pagesz_mem); - n_segs = cur_mem / hugepage_sz; + msl = &mcfg->memsegs[msl_idx]; - if (eal_memseg_list_init(msl, hugepage_sz, - n_segs, socket_id, type_msl_idx, - true)) { - /* failing to allocate a memseg list is - * a serious error. - */ - EAL_LOG(ERR, "Cannot allocate memseg list"); - return -1; - } + if (eal_memseg_list_init(msl, hugepage_sz, + n_segs, socket_id, msl_idx, true) < 0) { + /* failing to allocate a memseg list is a serious error. */ + EAL_LOG(ERR, "Cannot allocate memseg list"); + return -1; + } + + if (eal_memseg_list_alloc(msl, 0) == 0) + break; - if (eal_memseg_list_alloc(msl, 0)) { - /* if we couldn't allocate VA space, we - * can try with smaller page sizes. - */ - EAL_LOG(ERR, "Cannot allocate VA space for memseg list, retrying with different page size"); - /* deallocate memseg list */ if (memseg_list_free(msl)) return -1; - break; + + EAL_LOG(DEBUG, + "Cannot allocate VA space for memseg list, retrying with smaller chunk"); + n_segs /= 2; } - total_segs += msl->memseg_arr.len; - cur_pagesz_mem = total_segs * hugepage_sz; - type_msl_idx++; + if (n_segs == 0) + break; + + cur_pagesz_mem += (uint64_t)n_segs * hugepage_sz; + cur_socket_mem += (uint64_t)n_segs * hugepage_sz; msl_idx++; } - cur_socket_mem += cur_pagesz_mem; } if (cur_socket_mem == 0) { EAL_LOG(ERR, "Cannot allocate VA space on socket %u", -- 2.47.3 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH v2 3/6] eal/memory: allocate all VA space in one go 2026-03-13 16:06 ` [PATCH v2 0/6] Make VA reservation limits configurable Anatoly Burakov 2026-03-13 16:06 ` [PATCH v2 1/6] eal: reject non-numeric input in str to size Anatoly Burakov 2026-03-13 16:06 ` [PATCH v2 2/6] eal/memory: remove per-list segment and memory limits Anatoly Burakov @ 2026-03-13 16:06 ` Anatoly Burakov 2026-03-13 16:06 ` [PATCH v2 4/6] eal/memory: get rid of global VA space limits Anatoly Burakov ` (2 subsequent siblings) 5 siblings, 0 replies; 14+ messages in thread From: Anatoly Burakov @ 2026-03-13 16:06 UTC (permalink / raw) To: dev, Bruce Richardson Instead of allocating VA space per memseg list in dynmem mode, allocate it all in one go, and then assign memseg lists portions of that space. In a similar way, for dynmem initialization in secondary processes, also attach all VA space in one go. Legacy/32-bit paths are untouched. Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> --- lib/eal/common/eal_common_dynmem.c | 56 ++++++++++++++++++++---- lib/eal/common/eal_common_memory.c | 22 ++++++++++ lib/eal/common/eal_memcfg.h | 6 +++ lib/eal/common/eal_private.h | 13 ++++++ lib/eal/freebsd/eal_memory.c | 12 ++---- lib/eal/linux/eal_memory.c | 69 +++++++++++++++++++++++++++++- 6 files changed, 159 insertions(+), 19 deletions(-) diff --git a/lib/eal/common/eal_common_dynmem.c b/lib/eal/common/eal_common_dynmem.c index ef0270cc30..78fa349485 100644 --- a/lib/eal/common/eal_common_dynmem.c +++ b/lib/eal/common/eal_common_dynmem.c @@ -24,11 +24,16 @@ eal_dynmem_memseg_lists_init(void) struct memtype { uint64_t page_sz; int socket_id; + unsigned int n_segs; + size_t mem_sz; + size_t va_offset; } memtypes[RTE_MAX_MEMSEG_LISTS] = {0}; int i, hpi_idx, msl_idx, ret = -1; /* fail unless told to succeed */ struct rte_memseg_list *msl; uint64_t max_mem, max_mem_per_type; + size_t mem_va_len, mem_va_page_sz; unsigned int n_memtypes, cur_type; + void *mem_va_addr = NULL; struct 
internal_config *internal_conf = eal_get_internal_configuration(); @@ -103,17 +108,16 @@ eal_dynmem_memseg_lists_init(void) max_mem = (uint64_t)RTE_MAX_MEM_MB << 20; max_mem_per_type = RTE_MIN((uint64_t)RTE_MAX_MEM_MB_PER_TYPE << 20, max_mem / n_memtypes); + mem_va_len = 0; + mem_va_page_sz = 0; - /* go through all mem types and create segment lists */ - msl_idx = 0; + /* calculate total VA space and offsets for all mem types */ for (cur_type = 0; cur_type < n_memtypes; cur_type++) { unsigned int n_segs; struct memtype *type = &memtypes[cur_type]; uint64_t pagesz; - int socket_id; pagesz = type->page_sz; - socket_id = type->socket_id; /* * we need to create a segment list for this type. we must take @@ -126,19 +130,44 @@ eal_dynmem_memseg_lists_init(void) */ n_segs = max_mem_per_type / pagesz; n_segs = RTE_MIN(n_segs, (unsigned int)RTE_MAX_MEMSEG_PER_TYPE); + type->n_segs = n_segs; + type->mem_sz = (size_t)pagesz * type->n_segs; + mem_va_page_sz = RTE_MAX(mem_va_page_sz, (size_t)pagesz); + mem_va_len = RTE_ALIGN_CEIL(mem_va_len, pagesz); + type->va_offset = mem_va_len; + mem_va_len += type->mem_sz; + } + + mem_va_addr = eal_get_virtual_area(NULL, &mem_va_len, + mem_va_page_sz, 0, 0); + if (mem_va_addr == NULL) { + EAL_LOG(ERR, "Cannot reserve VA space for memseg lists"); + goto out; + } + + /* go through all mem types and create segment lists */ + msl_idx = 0; + for (cur_type = 0; cur_type < n_memtypes; cur_type++) { + struct memtype *type = &memtypes[cur_type]; + uint64_t pagesz; + int socket_id; + + pagesz = type->page_sz; + socket_id = type->socket_id; EAL_LOG(DEBUG, "Creating segment list: " "n_segs:%u socket_id:%i hugepage_sz:%" PRIu64, - n_segs, socket_id, pagesz); + type->n_segs, socket_id, pagesz); msl = &mcfg->memsegs[msl_idx]; - if (eal_memseg_list_init(msl, pagesz, n_segs, socket_id, - msl_idx, true)) + if (eal_memseg_list_init(msl, pagesz, type->n_segs, + socket_id, msl_idx, true)) goto out; - if (eal_memseg_list_alloc(msl, 0)) { - EAL_LOG(ERR, "Cannot 
allocate VA space for memseg list"); + if (eal_memseg_list_assign(msl, + RTE_PTR_ADD(mem_va_addr, type->va_offset))) { + EAL_LOG(ERR, "Cannot assign VA space for memseg list"); goto out; } msl_idx++; @@ -146,6 +175,15 @@ eal_dynmem_memseg_lists_init(void) /* we're successful */ ret = 0; out: + if (ret != 0) { + if (mem_va_addr != NULL) + eal_mem_free(mem_va_addr, mem_va_len); + } else { + /* store the VA space data in shared config */ + mcfg->mem_va_addr = (uintptr_t)mem_va_addr; + mcfg->mem_va_len = mem_va_len; + mcfg->mem_va_page_sz = mem_va_page_sz; + } return ret; } diff --git a/lib/eal/common/eal_common_memory.c b/lib/eal/common/eal_common_memory.c index b9388021ff..b590fb2fb5 100644 --- a/lib/eal/common/eal_common_memory.c +++ b/lib/eal/common/eal_common_memory.c @@ -272,6 +272,28 @@ eal_memseg_list_alloc(struct rte_memseg_list *msl, int reserve_flags) return 0; } +int +eal_memseg_list_assign(struct rte_memseg_list *msl, void *addr) +{ + size_t page_sz, mem_sz; + + page_sz = msl->page_sz; + mem_sz = page_sz * msl->memseg_arr.len; + + if (addr == NULL || addr != RTE_PTR_ALIGN(addr, page_sz)) { + rte_errno = EINVAL; + return -1; + } + + msl->base_va = addr; + msl->len = mem_sz; + + EAL_LOG(DEBUG, "VA assigned for memseg list at %p, size %zx", + addr, mem_sz); + + return 0; +} + void eal_memseg_list_populate(struct rte_memseg_list *msl, void *addr, int n_segs) { diff --git a/lib/eal/common/eal_memcfg.h b/lib/eal/common/eal_memcfg.h index 60e2089797..2b3b3b62ba 100644 --- a/lib/eal/common/eal_memcfg.h +++ b/lib/eal/common/eal_memcfg.h @@ -49,6 +49,12 @@ struct rte_mem_config { struct rte_memseg_list memsegs[RTE_MAX_MEMSEG_LISTS]; /**< List of dynamic arrays holding memsegs */ + uintptr_t mem_va_addr; + /**< Base VA address reserved for dynamic memory memseg lists. */ + size_t mem_va_len; + /**< Length of VA range reserved for dynamic memory memseg lists. */ + size_t mem_va_page_sz; + /**< Page size alignment used for dynamic memory VA reservation. 
*/ struct rte_tailq_head tailq_head[RTE_MAX_TAILQ]; /**< Tailqs for objects */ diff --git a/lib/eal/common/eal_private.h b/lib/eal/common/eal_private.h index 70f7b46699..0c0544beaf 100644 --- a/lib/eal/common/eal_private.h +++ b/lib/eal/common/eal_private.h @@ -322,6 +322,19 @@ eal_memseg_list_init(struct rte_memseg_list *msl, uint64_t page_sz, int eal_memseg_list_alloc(struct rte_memseg_list *msl, int reserve_flags); +/** + * Assign a pre-reserved VA range to a memory segment list. + * + * @param msl + * Initialized memory segment list with page size defined. + * @param addr + * Starting address of list VA range. + * @return + * 0 on success, (-1) on failure and rte_errno is set. + */ +int +eal_memseg_list_assign(struct rte_memseg_list *msl, void *addr); + /** * Populate MSL, each segment is one page long. * diff --git a/lib/eal/freebsd/eal_memory.c b/lib/eal/freebsd/eal_memory.c index 3eb5d193ec..09ce9dac10 100644 --- a/lib/eal/freebsd/eal_memory.c +++ b/lib/eal/freebsd/eal_memory.c @@ -362,8 +362,6 @@ memseg_primary_init(void) for (hpi_idx = 0; hpi_idx < (int) internal_conf->num_hugepage_sizes; hpi_idx++) { uint64_t max_type_mem, total_type_mem = 0; - uint64_t avail_mem; - unsigned int avail_segs; struct hugepage_info *hpi; uint64_t hugepage_sz; unsigned int n_segs; @@ -391,11 +389,8 @@ memseg_primary_init(void) * so we will allocate more and put spaces between segments * that are non-contiguous. 
*/ - avail_segs = (hpi->num_pages[0] * 2) - 1; - avail_mem = avail_segs * hugepage_sz; - - max_type_mem = RTE_MIN(avail_mem, max_type_mem); - n_segs = max_type_mem / hugepage_sz; + n_segs = RTE_MIN((hpi->num_pages[0] * 2) - 1, + max_type_mem / hugepage_sz); if (n_segs == 0) continue; @@ -411,12 +406,11 @@ memseg_primary_init(void) 0, msl_idx, false)) return -1; - total_type_mem = n_segs * hugepage_sz; if (memseg_list_alloc(msl)) { EAL_LOG(ERR, "Cannot allocate VA space for memseg list"); return -1; } - + total_type_mem = n_segs * hugepage_sz; total_mem += total_type_mem; msl_idx++; } diff --git a/lib/eal/linux/eal_memory.c b/lib/eal/linux/eal_memory.c index 691d8eb3cc..1bbf771db8 100644 --- a/lib/eal/linux/eal_memory.c +++ b/lib/eal/linux/eal_memory.c @@ -1893,8 +1893,60 @@ memseg_primary_init(void) return eal_dynmem_memseg_lists_init(); } +static int __rte_unused +memseg_secondary_init_dynmem(void) +{ + struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; + int msl_idx = 0; + struct rte_memseg_list *msl; + void *mem_va_addr; + size_t mem_va_len; + + if (mcfg->mem_va_addr == 0 || mcfg->mem_va_len == 0 || + mcfg->mem_va_page_sz == 0) { + EAL_LOG(ERR, "Missing shared dynamic memory VA range from primary process"); + return -1; + } + + mem_va_addr = (void *)(uintptr_t)mcfg->mem_va_addr; + mem_va_len = mcfg->mem_va_len; + + if (eal_get_virtual_area(mem_va_addr, &mem_va_len, + mcfg->mem_va_page_sz, 0, 0) == NULL) { + EAL_LOG(ERR, "Cannot reserve VA space for hugepage memory"); + return -1; + } + + for (msl_idx = 0; msl_idx < RTE_MAX_MEMSEG_LISTS; msl_idx++) { + + msl = &mcfg->memsegs[msl_idx]; + + /* skip empty and external memseg lists */ + if (msl->memseg_arr.len == 0 || msl->external) + continue; + + if (rte_fbarray_attach(&msl->memseg_arr)) { + EAL_LOG(ERR, "Cannot attach to primary process memseg lists"); + eal_mem_free(mem_va_addr, mem_va_len); + return -1; + } + + if (eal_memseg_list_assign(msl, msl->base_va)) { + EAL_LOG(ERR, "Cannot assign VA 
space for hugepage memory"); + eal_mem_free(mem_va_addr, mem_va_len); + return -1; + } + + EAL_LOG(DEBUG, "Attaching segment list: " + "n_segs:%u socket_id:%d hugepage_sz:%" PRIu64, + msl->memseg_arr.len, msl->socket_id, msl->page_sz); + } + + return 0; +} + static int -memseg_secondary_init(void) +memseg_secondary_init_legacy(void) { struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; int msl_idx = 0; @@ -1923,6 +1975,21 @@ memseg_secondary_init(void) return 0; } +static int +memseg_secondary_init(void) +{ + const struct internal_config *internal_conf = + eal_get_internal_configuration(); + + /* for 32-bit dynmem init is same as legacy */ +#ifdef RTE_ARCH_64 + if (!internal_conf->legacy_mem) + return memseg_secondary_init_dynmem(); +#endif + + return memseg_secondary_init_legacy(); +} + int rte_eal_memseg_init(void) { -- 2.47.3 ^ permalink raw reply related [flat|nested] 14+ messages in thread
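The offset calculation in `eal_dynmem_memseg_lists_init()` above (align each type's offset to its page size, then grow the total) can be checked with a small standalone sketch. `memtype_sketch` and `layout_va` are hypothetical names, and `ALIGN_CEIL` mimics `RTE_ALIGN_CEIL` for power-of-two alignments.

```c
#include <stdint.h>
#include <stddef.h>

/* Mimics RTE_ALIGN_CEIL for the power-of-two page sizes used here. */
#define ALIGN_CEIL(v, a) ((((v) + (a) - 1) / (a)) * (a))

struct memtype_sketch {
	uint64_t page_sz;
	size_t mem_sz;    /* page_sz * n_segs */
	size_t va_offset; /* filled in by layout_va() */
};

/* Lay out all memory types inside one contiguous VA reservation, aligning
 * each type's offset to its own page size; returns the total length of VA
 * space that needs to be reserved up front. */
static size_t
layout_va(struct memtype_sketch *types, unsigned int n)
{
	size_t len = 0;
	unsigned int i;

	for (i = 0; i < n; i++) {
		len = ALIGN_CEIL(len, types[i].page_sz);
		types[i].va_offset = len;
		len += types[i].mem_sz;
	}
	return len;
}
```

In the patch itself, the resulting total is then reserved in one `eal_get_virtual_area()` call aligned to the largest page size, and each memseg list is assigned its `va_offset` into that blob.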
* [PATCH v2 4/6] eal/memory: get rid of global VA space limits 2026-03-13 16:06 ` [PATCH v2 0/6] Make VA reservation limits configurable Anatoly Burakov ` (2 preceding siblings ...) 2026-03-13 16:06 ` [PATCH v2 3/6] eal/memory: allocate all VA space in one go Anatoly Burakov @ 2026-03-13 16:06 ` Anatoly Burakov 2026-03-13 16:06 ` [PATCH v2 5/6] eal/memory: store default segment limits in config Anatoly Burakov 2026-03-13 16:06 ` [PATCH v2 6/6] eal/memory: add page size VA limits EAL parameter Anatoly Burakov 5 siblings, 0 replies; 14+ messages in thread From: Anatoly Burakov @ 2026-03-13 16:06 UTC (permalink / raw) To: dev, Wathsala Vithanage, Bruce Richardson Currently, all VA space reservations take into account the global memory limit. The original intent was to limit memory allocations to however many NUMA nodes the machine had, taking into account that socket IDs may be discontiguous. Since we have had the "socket count" API for a while, and it gives us the correct NUMA node count while accounting for discontiguity, we can relax the total limits, remove the restrictions, and let VA space usage scale with NUMA nodes. The only place where we actually require a hard limit is in 32-bit code, where we cannot allocate more than 2G of VA space. 
Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> --- config/arm/meson.build | 1 - config/meson.build | 5 ---- .../prog_guide/env_abstraction_layer.rst | 2 -- lib/eal/common/eal_common_dynmem.c | 13 +++------ lib/eal/freebsd/eal_memory.c | 28 +++---------------- lib/eal/linux/eal_memory.c | 10 +++---- 6 files changed, 13 insertions(+), 46 deletions(-) diff --git a/config/arm/meson.build b/config/arm/meson.build index 523b0fc0ed..3b03f5e31b 100644 --- a/config/arm/meson.build +++ b/config/arm/meson.build @@ -69,7 +69,6 @@ part_number_config_arm = { 'flags': [ ['RTE_MACHINE', '"neoverse-n1"'], ['RTE_ARM_FEATURE_ATOMICS', true], - ['RTE_MAX_MEM_MB', 1048576], ['RTE_MAX_LCORE', 256], ['RTE_MAX_NUMA_NODES', 8] ] diff --git a/config/meson.build b/config/meson.build index 02e2798cca..f68f1f5f53 100644 --- a/config/meson.build +++ b/config/meson.build @@ -383,11 +383,6 @@ dpdk_conf.set('RTE_PKTMBUF_HEADROOM', get_option('pkt_mbuf_headroom')) dpdk_conf.set('RTE_MAX_VFIO_GROUPS', 64) dpdk_conf.set('RTE_DRIVER_MEMPOOL_BUCKET_SIZE_KB', 64) dpdk_conf.set('RTE_LIBRTE_DPAA2_USE_PHYS_IOVA', true) -if dpdk_conf.get('RTE_ARCH_64') - dpdk_conf.set('RTE_MAX_MEM_MB', 524288) -else # for 32-bit we need smaller reserved memory areas - dpdk_conf.set('RTE_MAX_MEM_MB', 2048) -endif if get_option('mbuf_refcnt_atomic') dpdk_conf.set('RTE_MBUF_REFCNT_ATOMIC', true) endif diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst index 04368a3950..63e0568afa 100644 --- a/doc/guides/prog_guide/env_abstraction_layer.rst +++ b/doc/guides/prog_guide/env_abstraction_layer.rst @@ -208,8 +208,6 @@ variables: can have (where "type" is defined as "page size + NUMA node" combination) * ``RTE_MAX_MEM_MB_PER_TYPE`` controls how much megabytes of memory each memory type can address -* ``RTE_MAX_MEM_MB`` places a global maximum on the amount of memory - DPDK can reserve Normally, these options do not need to be changed. 
diff --git a/lib/eal/common/eal_common_dynmem.c b/lib/eal/common/eal_common_dynmem.c index 78fa349485..c163bf4967 100644 --- a/lib/eal/common/eal_common_dynmem.c +++ b/lib/eal/common/eal_common_dynmem.c @@ -30,7 +30,7 @@ eal_dynmem_memseg_lists_init(void) } memtypes[RTE_MAX_MEMSEG_LISTS] = {0}; int i, hpi_idx, msl_idx, ret = -1; /* fail unless told to succeed */ struct rte_memseg_list *msl; - uint64_t max_mem, max_mem_per_type; + uint64_t max_mem_per_type; size_t mem_va_len, mem_va_page_sz; unsigned int n_memtypes, cur_type; void *mem_va_addr = NULL; @@ -51,11 +51,8 @@ eal_dynmem_memseg_lists_init(void) * balancing act between maximum segments per type, maximum memory per * type, and number of detected NUMA nodes. * - * the total amount of memory is limited by RTE_MAX_MEM_MB value. - * - * the total amount of memory per type is limited by either - * RTE_MAX_MEM_MB_PER_TYPE, or by RTE_MAX_MEM_MB divided by the number - * of detected NUMA nodes. additionally, maximum number of segments per + * the total amount of memory per type is limited by + * RTE_MAX_MEM_MB_PER_TYPE. additionally, maximum number of segments per * type is also limited by RTE_MAX_MEMSEG_PER_TYPE. this is because for * smaller page sizes, it can take hundreds of thousands of segments to * reach the above specified per-type memory limits. 
@@ -105,9 +102,7 @@ eal_dynmem_memseg_lists_init(void) n_memtypes = cur_type; /* set up limits for types */ - max_mem = (uint64_t)RTE_MAX_MEM_MB << 20; - max_mem_per_type = RTE_MIN((uint64_t)RTE_MAX_MEM_MB_PER_TYPE << 20, - max_mem / n_memtypes); + max_mem_per_type = (uint64_t)RTE_MAX_MEM_MB_PER_TYPE << 20; mem_va_len = 0; mem_va_page_sz = 0; diff --git a/lib/eal/freebsd/eal_memory.c b/lib/eal/freebsd/eal_memory.c index 09ce9dac10..fd2566cfa3 100644 --- a/lib/eal/freebsd/eal_memory.c +++ b/lib/eal/freebsd/eal_memory.c @@ -337,7 +337,6 @@ memseg_primary_init(void) struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; int hpi_idx, msl_idx = 0; struct rte_memseg_list *msl; - uint64_t max_mem, total_mem; struct internal_config *internal_conf = eal_get_internal_configuration(); @@ -346,22 +345,14 @@ memseg_primary_init(void) return 0; /* FreeBSD has an issue where core dump will dump the entire memory - * contents, including anonymous zero-page memory. Therefore, while we - * will be limiting total amount of memory to RTE_MAX_MEM_MB, we will - * also be further limiting total memory amount to whatever memory is - * available to us through contigmem driver (plus spacing blocks). - * - * so, at each stage, we will be checking how much memory we are - * preallocating, and adjust all the values accordingly. + * contents, including anonymous zero-page memory. To avoid reserving VA + * space we are not going to use, size memseg lists according to + * contigmem-provided page counts. 
*/ - max_mem = (uint64_t)RTE_MAX_MEM_MB << 20; - total_mem = 0; - /* create memseg lists */ for (hpi_idx = 0; hpi_idx < (int) internal_conf->num_hugepage_sizes; hpi_idx++) { - uint64_t max_type_mem, total_type_mem = 0; struct hugepage_info *hpi; uint64_t hugepage_sz; unsigned int n_segs; @@ -371,14 +362,6 @@ memseg_primary_init(void) /* no NUMA support on FreeBSD */ - /* check if we've already exceeded total memory amount */ - if (total_mem >= max_mem) - break; - - /* first, calculate theoretical limits according to config */ - max_type_mem = RTE_MIN(max_mem - total_mem, - (uint64_t)RTE_MAX_MEM_MB_PER_TYPE << 20); - /* now, limit all of that to whatever will actually be * available to us, because without dynamic allocation support, * all of that extra memory will be sitting there being useless @@ -389,8 +372,7 @@ memseg_primary_init(void) * so we will allocate more and put spaces between segments * that are non-contiguous. */ - n_segs = RTE_MIN((hpi->num_pages[0] * 2) - 1, - max_type_mem / hugepage_sz); + n_segs = (hpi->num_pages[0] * 2) - 1; if (n_segs == 0) continue; @@ -410,8 +392,6 @@ memseg_primary_init(void) EAL_LOG(ERR, "Cannot allocate VA space for memseg list"); return -1; } - total_type_mem = n_segs * hugepage_sz; - total_mem += total_type_mem; msl_idx++; } return 0; diff --git a/lib/eal/linux/eal_memory.c b/lib/eal/linux/eal_memory.c index 1bbf771db8..55779badec 100644 --- a/lib/eal/linux/eal_memory.c +++ b/lib/eal/linux/eal_memory.c @@ -1695,12 +1695,13 @@ rte_eal_using_phys_addrs(void) static int __rte_unused memseg_primary_init_32(void) { + /* limit total amount of memory on 32-bit */ + const uint64_t mem32_max_mem = 2ULL << 30; struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config; int active_sockets, hpi_idx, msl_idx = 0; unsigned int socket_id, i; struct rte_memseg_list *msl; uint64_t extra_mem_per_socket, total_extra_mem, total_requested_mem; - uint64_t max_mem; struct internal_config *internal_conf = 
eal_get_internal_configuration(); @@ -1743,13 +1744,12 @@ memseg_primary_init_32(void) else total_requested_mem = internal_conf->memory; - max_mem = (uint64_t)RTE_MAX_MEM_MB << 20; - if (total_requested_mem > max_mem) { + if (total_requested_mem > mem32_max_mem) { EAL_LOG(ERR, "Invalid parameters: 32-bit process can at most use %uM of memory", - (unsigned int)(max_mem >> 20)); + (unsigned int)(mem32_max_mem >> 20)); return -1; } - total_extra_mem = max_mem - total_requested_mem; + total_extra_mem = mem32_max_mem - total_requested_mem; extra_mem_per_socket = active_sockets == 0 ? total_extra_mem : total_extra_mem / active_sockets; -- 2.47.3 ^ permalink raw reply related [flat|nested] 14+ messages in thread
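The 32-bit arithmetic this patch retains (a hard 2 GiB cap, with leftover VA split evenly across active sockets) is easy to sanity-check with a sketch. `split_extra_mem` is a hypothetical stand-in for the logic in `memseg_primary_init_32()`, returning `UINT64_MAX` where the real code errors out.

```c
#include <stdint.h>

/* Hard VA cap retained for 32-bit processes: 2 GiB. */
#define MEM32_MAX_MEM (2ULL << 30)

/* Model of the leftover-VA split: whatever remains after explicit memory
 * requests (-m/--socket-mem) is divided evenly across active sockets.
 * Returns UINT64_MAX when the request cannot fit under the cap at all. */
static uint64_t
split_extra_mem(uint64_t total_requested_mem, unsigned int active_sockets)
{
	uint64_t total_extra_mem;

	if (total_requested_mem > MEM32_MAX_MEM)
		return UINT64_MAX;

	total_extra_mem = MEM32_MAX_MEM - total_requested_mem;
	return active_sockets == 0 ?
			total_extra_mem : total_extra_mem / active_sockets;
}
```

For example, requesting 1G on a two-socket system leaves 512M of extra VA per socket, while requesting 3G fails outright.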
* [PATCH v2 5/6] eal/memory: store default segment limits in config 2026-03-13 16:06 ` [PATCH v2 0/6] Make VA reservation limits configurable Anatoly Burakov ` (3 preceding siblings ...) 2026-03-13 16:06 ` [PATCH v2 4/6] eal/memory: get rid of global VA space limits Anatoly Burakov @ 2026-03-13 16:06 ` Anatoly Burakov 2026-03-13 16:06 ` [PATCH v2 6/6] eal/memory: add page size VA limits EAL parameter Anatoly Burakov 5 siblings, 0 replies; 14+ messages in thread From: Anatoly Burakov @ 2026-03-13 16:06 UTC (permalink / raw) To: dev, Bruce Richardson, Dmitry Kozlyuk Currently, VA space allocation is regulated by two constants picked up from config - max memseg per list, and max memory per list. In preparation for these limits being dynamic, add a per-page-size limit value in config, populate that value from these defaults at init time, and adjust the code to only refer to the mem limits from internal config. Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> --- lib/eal/common/eal_common_dynmem.c | 22 ++++++++-------------- lib/eal/common/eal_common_options.c | 20 ++++++++++++++++++++ lib/eal/common/eal_internal_cfg.h | 2 ++ lib/eal/common/eal_options.h | 1 + lib/eal/freebsd/eal.c | 6 ++++++ lib/eal/linux/eal.c | 6 ++++++ lib/eal/linux/eal_memory.c | 3 +++ lib/eal/windows/eal.c | 6 ++++++ 8 files changed, 52 insertions(+), 14 deletions(-) diff --git a/lib/eal/common/eal_common_dynmem.c b/lib/eal/common/eal_common_dynmem.c index c163bf4967..c33fbdea6d 100644 --- a/lib/eal/common/eal_common_dynmem.c +++ b/lib/eal/common/eal_common_dynmem.c @@ -24,13 +24,13 @@ eal_dynmem_memseg_lists_init(void) struct memtype { uint64_t page_sz; int socket_id; + unsigned int hpi_idx; unsigned int n_segs; size_t mem_sz; size_t va_offset; } memtypes[RTE_MAX_MEMSEG_LISTS] = {0}; int i, hpi_idx, msl_idx, ret = -1; /* fail unless told to succeed */ struct rte_memseg_list *msl; - uint64_t max_mem_per_type; size_t mem_va_len, mem_va_page_sz; unsigned int n_memtypes, cur_type; void 
*mem_va_addr = NULL; @@ -51,15 +51,9 @@ eal_dynmem_memseg_lists_init(void) * balancing act between maximum segments per type, maximum memory per * type, and number of detected NUMA nodes. * - * the total amount of memory per type is limited by - * RTE_MAX_MEM_MB_PER_TYPE. additionally, maximum number of segments per - * type is also limited by RTE_MAX_MEMSEG_PER_TYPE. this is because for - * smaller page sizes, it can take hundreds of thousands of segments to - * reach the above specified per-type memory limits. - * - * each memory type is allotted a single memseg list. the size of that - * list is calculated here to respect the per-type memory and segment - * limits that apply. + * the total amount of memory per type is limited by per-page-size + * memory values in internal config. each memory type is allotted one + * memseg list. */ /* maximum number of memtypes we're ever going to get */ @@ -92,6 +86,7 @@ eal_dynmem_memseg_lists_init(void) #endif memtypes[cur_type].page_sz = hugepage_sz; memtypes[cur_type].socket_id = socket_id; + memtypes[cur_type].hpi_idx = hpi_idx; EAL_LOG(DEBUG, "Detected memory type: " "socket_id:%u hugepage_sz:%" PRIu64, @@ -101,8 +96,6 @@ eal_dynmem_memseg_lists_init(void) /* number of memtypes could have been lower due to no NUMA support */ n_memtypes = cur_type; - /* set up limits for types */ - max_mem_per_type = (uint64_t)RTE_MAX_MEM_MB_PER_TYPE << 20; mem_va_len = 0; mem_va_page_sz = 0; @@ -110,9 +103,12 @@ eal_dynmem_memseg_lists_init(void) for (cur_type = 0; cur_type < n_memtypes; cur_type++) { unsigned int n_segs; struct memtype *type = &memtypes[cur_type]; + uint64_t max_mem_per_type; uint64_t pagesz; pagesz = type->page_sz; + max_mem_per_type = + internal_conf->hugepage_mem_sz_limits[type->hpi_idx]; /* * we need to create a segment list for this type. we must take @@ -121,10 +117,8 @@ eal_dynmem_memseg_lists_init(void) * 1. total amount of memory to use for this memory type * 2. total amount of memory allowed per type * 3. 
number of segments needed to fit the amount of memory - * 4. number of segments allowed per type */ n_segs = max_mem_per_type / pagesz; - n_segs = RTE_MIN(n_segs, (unsigned int)RTE_MAX_MEMSEG_PER_TYPE); type->n_segs = n_segs; type->mem_sz = (size_t)pagesz * type->n_segs; mem_va_page_sz = RTE_MAX(mem_va_page_sz, (size_t)pagesz); diff --git a/lib/eal/common/eal_common_options.c b/lib/eal/common/eal_common_options.c index aad676a004..bbc4427524 100644 --- a/lib/eal/common/eal_common_options.c +++ b/lib/eal/common/eal_common_options.c @@ -510,6 +510,7 @@ eal_reset_internal_config(struct internal_config *internal_cfg) memset(&internal_cfg->hugepage_info[i], 0, sizeof(internal_cfg->hugepage_info[0])); internal_cfg->hugepage_info[i].lock_descriptor = -1; + internal_cfg->hugepage_mem_sz_limits[i] = 0; } internal_cfg->base_virtaddr = 0; @@ -2359,6 +2360,25 @@ eal_adjust_config(struct internal_config *internal_cfg) return 0; } +int +eal_apply_hugepage_mem_sz_limits(struct internal_config *internal_cfg) +{ + unsigned int i; + + for (i = 0; i < internal_cfg->num_hugepage_sizes; i++) { + const uint64_t pagesz = internal_cfg->hugepage_info[i].hugepage_sz; + uint64_t limit; + + /* assign default limits */ + limit = RTE_MIN((uint64_t)RTE_MAX_MEM_MB_PER_TYPE << 20, + (uint64_t)RTE_MAX_MEMSEG_PER_TYPE * pagesz); + + internal_cfg->hugepage_mem_sz_limits[i] = limit; + } + + return 0; +} + RTE_EXPORT_SYMBOL(rte_vect_get_max_simd_bitwidth) uint16_t rte_vect_get_max_simd_bitwidth(void) diff --git a/lib/eal/common/eal_internal_cfg.h b/lib/eal/common/eal_internal_cfg.h index 95d327a613..0bf192c6e5 100644 --- a/lib/eal/common/eal_internal_cfg.h +++ b/lib/eal/common/eal_internal_cfg.h @@ -96,6 +96,8 @@ struct internal_config { /**< user defined mbuf pool ops name */ unsigned num_hugepage_sizes; /**< how many sizes on this system */ struct hugepage_info hugepage_info[MAX_HUGEPAGE_SIZES]; + uint64_t hugepage_mem_sz_limits[MAX_HUGEPAGE_SIZES]; + /**< default max memory per hugepage size */ enum 
rte_iova_mode iova_mode ; /**< Set IOVA mode on this system */ rte_cpuset_t ctrl_cpuset; /**< cpuset for ctrl threads */ volatile unsigned int init_complete; diff --git a/lib/eal/common/eal_options.h b/lib/eal/common/eal_options.h index f5e7905609..82cc8be8db 100644 --- a/lib/eal/common/eal_options.h +++ b/lib/eal/common/eal_options.h @@ -12,6 +12,7 @@ struct rte_tel_data; int eal_parse_log_options(void); int eal_parse_args(void); int eal_option_device_parse(void); +int eal_apply_hugepage_mem_sz_limits(struct internal_config *internal_cfg); int eal_adjust_config(struct internal_config *internal_cfg); int eal_cleanup_config(struct internal_config *internal_cfg); enum rte_proc_type_t eal_proc_type_detect(void); diff --git a/lib/eal/freebsd/eal.c b/lib/eal/freebsd/eal.c index 60f5e676a8..8b1ba5b99b 100644 --- a/lib/eal/freebsd/eal.c +++ b/lib/eal/freebsd/eal.c @@ -585,6 +585,12 @@ rte_eal_init(int argc, char **argv) rte_errno = EACCES; goto err_out; } + if (internal_conf->process_type == RTE_PROC_PRIMARY && + eal_apply_hugepage_mem_sz_limits(internal_conf) < 0) { + rte_eal_init_alert("Cannot apply hugepage memory limits."); + rte_errno = EINVAL; + goto err_out; + } } if (internal_conf->memory == 0 && internal_conf->force_numa == 0) { diff --git a/lib/eal/linux/eal.c b/lib/eal/linux/eal.c index d848de03d8..fc2e9b8c0e 100644 --- a/lib/eal/linux/eal.c +++ b/lib/eal/linux/eal.c @@ -748,6 +748,12 @@ rte_eal_init(int argc, char **argv) rte_errno = EACCES; goto err_out; } + if (internal_conf->process_type == RTE_PROC_PRIMARY && + eal_apply_hugepage_mem_sz_limits(internal_conf) < 0) { + rte_eal_init_alert("Cannot apply hugepage memory limits."); + rte_errno = EINVAL; + goto err_out; + } } if (internal_conf->memory == 0 && internal_conf->force_numa == 0) { diff --git a/lib/eal/linux/eal_memory.c b/lib/eal/linux/eal_memory.c index 55779badec..1ed4f69e3e 100644 --- a/lib/eal/linux/eal_memory.c +++ b/lib/eal/linux/eal_memory.c @@ -1813,6 +1813,7 @@ memseg_primary_init_32(void) 
for (hpi_idx = 0; hpi_idx < hp_sizes; hpi_idx++) { uint64_t max_pagesz_mem, cur_pagesz_mem = 0; uint64_t hugepage_sz; + uint64_t pagesz_mem_limit; struct hugepage_info *hpi; hpi = &internal_conf->hugepage_info[hpi_idx]; @@ -1823,6 +1824,8 @@ memseg_primary_init_32(void) continue; max_pagesz_mem = max_socket_mem - cur_socket_mem; + pagesz_mem_limit = internal_conf->hugepage_mem_sz_limits[hpi_idx]; + max_pagesz_mem = RTE_MIN(max_pagesz_mem, pagesz_mem_limit); /* make it multiple of page size */ max_pagesz_mem = RTE_ALIGN_FLOOR(max_pagesz_mem, diff --git a/lib/eal/windows/eal.c b/lib/eal/windows/eal.c index f06375a624..6dacae7235 100644 --- a/lib/eal/windows/eal.c +++ b/lib/eal/windows/eal.c @@ -229,6 +229,12 @@ rte_eal_init(int argc, char **argv) rte_errno = EACCES; goto err_out; } + if (!internal_conf->no_hugetlbfs && + eal_apply_hugepage_mem_sz_limits(internal_conf) < 0) { + rte_eal_init_alert("Cannot apply hugepage memory limits"); + rte_errno = EINVAL; + goto err_out; + } if (internal_conf->memory == 0 && !internal_conf->force_numa) { if (internal_conf->no_hugetlbfs) -- 2.47.3 ^ permalink raw reply related [flat|nested] 14+ messages in thread
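The default limit computed by `eal_apply_hugepage_mem_sz_limits()` above is a minimum of two products. A sketch with assumed numeric defaults (the real values come from `rte_config.h`; 65536 MB per type and 32768 segments per type are used here purely for illustration):

```c
#include <stdint.h>

#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Assumed defaults for illustration; the actual values are the rte_config.h
 * constants RTE_MAX_MEM_MB_PER_TYPE and RTE_MAX_MEMSEG_PER_TYPE. */
#define MAX_MEM_MB_PER_TYPE 65536ULL  /* megabytes */
#define MAX_MEMSEG_PER_TYPE 32768ULL

/* Default per-page-size limit: the per-type memory cap, further bounded by
 * how much memory the per-type segment budget can address at this page
 * size (for small pages, the segment budget is the tighter bound). */
static uint64_t
default_limit(uint64_t pagesz)
{
	return MIN(MAX_MEM_MB_PER_TYPE << 20, MAX_MEMSEG_PER_TYPE * pagesz);
}
```

With these assumed defaults, 2M pages get the full 64 GiB memory cap, while 4K pages are bounded by the segment budget to 128 MiB.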
* [PATCH v2 6/6] eal/memory: add page size VA limits EAL parameter 2026-03-13 16:06 ` [PATCH v2 0/6] Make VA reservation limits configurable Anatoly Burakov ` (4 preceding siblings ...) 2026-03-13 16:06 ` [PATCH v2 5/6] eal/memory: store default segment limits in config Anatoly Burakov @ 2026-03-13 16:06 ` Anatoly Burakov 5 siblings, 0 replies; 14+ messages in thread From: Anatoly Burakov @ 2026-03-13 16:06 UTC (permalink / raw) To: dev Currently, the VA space limits placed on DPDK memory are only informed by the default configuration coming from `rte_config.h` file. Add an EAL flag to specify per-page size memory limits explicitly, thereby overriding the default VA space reservations. Signed-off-by: Anatoly Burakov <anatoly.burakov@intel.com> --- app/test/test.c | 1 + app/test/test_eal_flags.c | 126 ++++++++++++++++++ doc/guides/linux_gsg/linux_eal_parameters.rst | 13 ++ .../prog_guide/env_abstraction_layer.rst | 27 +++- lib/eal/common/eal_common_dynmem.c | 9 ++ lib/eal/common/eal_common_options.c | 121 +++++++++++++++++ lib/eal/common/eal_internal_cfg.h | 6 + lib/eal/common/eal_option_list.h | 1 + 8 files changed, 302 insertions(+), 2 deletions(-) diff --git a/app/test/test.c b/app/test/test.c index 58ef52f312..c610c3588e 100644 --- a/app/test/test.c +++ b/app/test/test.c @@ -80,6 +80,7 @@ do_recursive_call(void) { "test_memory_flags", no_action }, { "test_file_prefix", no_action }, { "test_no_huge_flag", no_action }, + { "test_pagesz_mem_flags", no_action }, { "test_panic", test_panic }, { "test_exit", test_exit }, #ifdef RTE_LIB_TIMER diff --git a/app/test/test_eal_flags.c b/app/test/test_eal_flags.c index b3a8d0ae6f..4e1038be75 100644 --- a/app/test/test_eal_flags.c +++ b/app/test/test_eal_flags.c @@ -95,6 +95,14 @@ test_misc_flags(void) return TEST_SKIPPED; } +static int +test_pagesz_mem_flags(void) +{ + printf("pagesz_mem_flags not supported on Windows, skipping test\n"); + return TEST_SKIPPED; +} + + #else #include <libgen.h> @@ -1502,6 +1510,123 @@ 
populate_socket_mem_param(int num_sockets, const char *mem,
 	offset += written;
 }
 
+/*
+ * Tests for correct handling of --pagesz-mem flag
+ */
+static int
+test_pagesz_mem_flags(void)
+{
+#ifdef RTE_EXEC_ENV_FREEBSD
+	/* FreeBSD does not support --pagesz-mem */
+	return 0;
+#else
+	const char *in_memory = "--in-memory";
+
+	/* invalid: no value */
+	const char * const argv0[] = {prgname, eal_debug_logs, no_pci,
+			"--file-prefix=" memtest, in_memory, "--pagesz-mem="};
+
+	/* invalid: no colon (missing limit) */
+	const char * const argv1[] = {prgname, eal_debug_logs, no_pci,
+			"--file-prefix=" memtest, in_memory, "--pagesz-mem=2M"};
+
+	/* invalid: colon present but limit is empty */
+	const char * const argv2[] = {prgname, eal_debug_logs, no_pci,
+			"--file-prefix=" memtest, in_memory, "--pagesz-mem=2M:"};
+
+	/* invalid: limit not aligned to page size (3M is not a multiple of 2M) */
+	const char * const argv3[] = {prgname, eal_debug_logs, no_pci,
+			"--file-prefix=" memtest, in_memory, "--pagesz-mem=2M:3M"};
+
+	/* invalid: garbage value */
+	const char * const argv4[] = {prgname, eal_debug_logs, no_pci,
+			"--file-prefix=" memtest, in_memory, "--pagesz-mem=garbage"};
+
+	/* invalid: garbage value */
+	const char * const argv5[] = {prgname, eal_debug_logs, no_pci,
+			"--file-prefix=" memtest, in_memory, "--pagesz-mem=2M:garbage"};
+
+	/* invalid: --pagesz-mem combined with --no-huge */
+	const char * const argv6[] = {prgname, eal_debug_logs, no_pci,
+			"--file-prefix=" memtest, in_memory, no_huge, "--pagesz-mem=2M:2M"};
+
+	/* valid: single well-formed aligned pair */
+	const char * const argv7[] = {prgname, eal_debug_logs, no_pci,
+			"--file-prefix=" memtest, in_memory, "--pagesz-mem=2M:64M"};
+
+	/* valid: multiple occurrences */
+	const char * const argv8[] = {prgname, eal_debug_logs, no_pci,
+			"--file-prefix=" memtest, in_memory,
+			"--pagesz-mem=2M:64M", "--pagesz-mem=1K:8K"};
+
+	/* valid: fake page size set to zero (ignored but syntactically valid) */
+	const char * const argv9[] = {prgname, eal_debug_logs, no_pci,
+			"--file-prefix=" memtest, in_memory, "--pagesz-mem=1K:0"};
+
+	/* invalid: page size must be a power of two */
+	const char * const argv10[] = {prgname, eal_debug_logs, no_pci,
+			"--file-prefix=" memtest, in_memory, "--pagesz-mem=3M:6M"};
+
+	if (launch_proc(argv0) == 0) {
+		printf("Error (line %d) - process run ok with empty --pagesz-mem!\n",
+				__LINE__);
+		return -1;
+	}
+	if (launch_proc(argv1) == 0) {
+		printf("Error (line %d) - process run ok with --pagesz-mem missing colon!\n",
+				__LINE__);
+		return -1;
+	}
+	if (launch_proc(argv2) == 0) {
+		printf("Error (line %d) - process run ok with --pagesz-mem missing limit!\n",
+				__LINE__);
+		return -1;
+	}
+	if (launch_proc(argv3) == 0) {
+		printf("Error (line %d) - process run ok with --pagesz-mem unaligned limit!\n",
+				__LINE__);
+		return -1;
+	}
+	if (launch_proc(argv4) == 0) {
+		printf("Error (line %d) - process run ok with --pagesz-mem garbage value!\n",
+				__LINE__);
+		return -1;
+	}
+	if (launch_proc(argv5) == 0) {
+		printf("Error (line %d) - process run ok with --pagesz-mem garbage value!\n",
+				__LINE__);
+		return -1;
+	}
+	if (launch_proc(argv6) == 0) {
+		printf("Error (line %d) - process run ok with --pagesz-mem and --no-huge!\n",
+				__LINE__);
+		return -1;
+	}
+	if (launch_proc(argv7) != 0) {
+		printf("Error (line %d) - process failed with valid --pagesz-mem!\n",
+				__LINE__);
+		return -1;
+	}
+	if (launch_proc(argv8) != 0) {
+		printf("Error (line %d) - process failed with multiple valid --pagesz-mem!\n",
+				__LINE__);
+		return -1;
+	}
+	if (launch_proc(argv9) != 0) {
+		printf("Error (line %d) - process failed with --pagesz-mem zero limit!\n",
+				__LINE__);
+		return -1;
+	}
+	if (launch_proc(argv10) == 0) {
+		printf("Error (line %d) - process run ok with non-power-of-two pagesz!\n",
+				__LINE__);
+		return -1;
+	}
+
+	return 0;
+#endif /* !RTE_EXEC_ENV_FREEBSD */
+}
+
 /*
  * Tests for correct handling of -m and --socket-mem flags
  */
@@ -1683,5 +1808,6 @@ REGISTER_FAST_TEST(eal_flags_b_opt_autotest, NOHUGE_SKIP, ASAN_SKIP, test_invali
 REGISTER_FAST_TEST(eal_flags_vdev_opt_autotest, NOHUGE_SKIP, ASAN_SKIP, test_invalid_vdev_flag);
 REGISTER_FAST_TEST(eal_flags_r_opt_autotest, NOHUGE_SKIP, ASAN_SKIP, test_invalid_r_flag);
 REGISTER_FAST_TEST(eal_flags_mem_autotest, NOHUGE_SKIP, ASAN_SKIP, test_memory_flags);
+REGISTER_FAST_TEST(eal_flags_pagesz_mem_autotest, NOHUGE_SKIP, ASAN_SKIP, test_pagesz_mem_flags);
 REGISTER_FAST_TEST(eal_flags_file_prefix_autotest, NOHUGE_SKIP, ASAN_SKIP, test_file_prefix);
 REGISTER_FAST_TEST(eal_flags_misc_autotest, NOHUGE_SKIP, ASAN_SKIP, test_misc_flags);
diff --git a/doc/guides/linux_gsg/linux_eal_parameters.rst b/doc/guides/linux_gsg/linux_eal_parameters.rst
index 7c5b26ce26..ce38dd128a 100644
--- a/doc/guides/linux_gsg/linux_eal_parameters.rst
+++ b/doc/guides/linux_gsg/linux_eal_parameters.rst
@@ -75,6 +75,19 @@ Memory-related options
   Place a per-NUMA node upper limit on memory use (non-legacy memory mode only).
   0 will disable the limit for a particular NUMA node.
 
+* ``--pagesz-mem <page size:limit>``
+
+  Set memory limit per hugepage size.
+  Each time the option is used, provide a single ``<pagesz>:<limit>`` pair;
+  repeat the option to specify additional page sizes.
+  Both values support K/M/G/T suffixes (for example ``2M:32G``).
+
+  The memory limit must be a multiple of page size.
+
+  For example::
+
+    --pagesz-mem 2M:32G --pagesz-mem 1G:512G
+
 * ``--single-file-segments``
 
   Create fewer files in hugetlbfs (non-legacy mode only).
diff --git a/doc/guides/prog_guide/env_abstraction_layer.rst b/doc/guides/prog_guide/env_abstraction_layer.rst
index 63e0568afa..e2adf0a184 100644
--- a/doc/guides/prog_guide/env_abstraction_layer.rst
+++ b/doc/guides/prog_guide/env_abstraction_layer.rst
@@ -204,13 +204,36 @@ of virtual memory being preallocated at startup by editing the following config
 variables:
 
 * ``RTE_MAX_MEMSEG_LISTS`` controls how many segment lists can DPDK have
-* ``RTE_MAX_MEMSEG_PER_TYPE`` controls how many segments each memory type
+* ``RTE_MAX_MEMSEG_PER_TYPE`` sets the default number of segments each memory type
   can have (where "type" is defined as "page size + NUMA node" combination)
-* ``RTE_MAX_MEM_MB_PER_TYPE`` controls how much megabytes of memory each
+* ``RTE_MAX_MEM_MB_PER_TYPE`` sets the default amount of memory each
   memory type can address
 
 Normally, these options do not need to be changed.
 
+Runtime Override of Per-Page-Size Memory Limits
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+By default, DPDK uses compile-time configured limits for memory allocation per page size
+(as set by ``RTE_MAX_MEM_MB_PER_TYPE``).
+These limits apply uniformly across all NUMA nodes for a given page size.
+
+It is possible to override these defaults at runtime using the ``--pagesz-mem`` option,
+which allows specifying custom memory limits for each page size. This is useful when:
+
+* The default limits may be insufficient or excessive for your workload
+* You want to dedicate more memory to specific page sizes
+
+The ``--pagesz-mem`` option accepts exactly one ``<pagesz>:<limit>`` pair per
+occurrence, where ``pagesz`` is a page size (e.g., ``2M``, ``4M``, ``1G``)
+and ``limit`` is the maximum memory to reserve for that page size (e.g., ``64G``, ``512M``).
+Both values support standard binary suffixes (K, M, G, T).
+Memory limits must be aligned to their corresponding page size.
+
+Multiple page sizes can be specified by repeating the option::
+
+  --pagesz-mem 2M:64G --pagesz-mem 1G:512G
+
 .. note::
 
     Preallocated virtual memory is not to be confused with preallocated hugepage
diff --git a/lib/eal/common/eal_common_dynmem.c b/lib/eal/common/eal_common_dynmem.c
index c33fbdea6d..7096f46ff3 100644
--- a/lib/eal/common/eal_common_dynmem.c
+++ b/lib/eal/common/eal_common_dynmem.c
@@ -127,6 +127,11 @@ eal_dynmem_memseg_lists_init(void)
 		mem_va_len += type->mem_sz;
 	}
 
+	if (mem_va_len == 0) {
+		EAL_LOG(ERR, "No virtual memory will be reserved");
+		goto out;
+	}
+
 	mem_va_addr = eal_get_virtual_area(NULL, &mem_va_len,
 			mem_va_page_sz, 0, 0);
 	if (mem_va_addr == NULL) {
@@ -141,6 +146,10 @@ eal_dynmem_memseg_lists_init(void)
 		uint64_t pagesz;
 		int socket_id;
 
+		/* skip page sizes with zero memory limit */
+		if (type->n_segs == 0)
+			continue;
+
 		pagesz = type->page_sz;
 		socket_id = type->socket_id;
diff --git a/lib/eal/common/eal_common_options.c b/lib/eal/common/eal_common_options.c
index bbc4427524..0532d27aaa 100644
--- a/lib/eal/common/eal_common_options.c
+++ b/lib/eal/common/eal_common_options.c
@@ -21,6 +21,7 @@
 #endif
 
 #include <rte_string_fns.h>
+#include <rte_common.h>
 #include <rte_eal.h>
 #include <rte_log.h>
 #include <rte_lcore.h>
@@ -233,6 +234,20 @@ eal_collate_args(int argc, char **argv)
 		EAL_LOG(ERR, "Options allow (-a) and block (-b) can't be used at the same time");
 		return -1;
 	}
+#ifdef RTE_EXEC_ENV_FREEBSD
+	if (!TAILQ_EMPTY(&args.pagesz_mem)) {
+		EAL_LOG(ERR, "Option pagesz-mem is not supported on FreeBSD");
+		return -1;
+	}
+#endif
+	if (!TAILQ_EMPTY(&args.pagesz_mem) && args.no_huge) {
+		EAL_LOG(ERR, "Options pagesz-mem and no-huge can't be used at the same time");
+		return -1;
+	}
+	if (!TAILQ_EMPTY(&args.pagesz_mem) && args.legacy_mem) {
+		EAL_LOG(ERR, "Options pagesz-mem and legacy-mem can't be used at the same time");
+		return -1;
+	}
 
 	/* for non-list args, we can just check for zero/null values using macro */
 	if (CONFLICTING_OPTIONS(args, coremask, lcores) ||
@@ -511,7 +526,10 @@ eal_reset_internal_config(struct internal_config *internal_cfg)
 				sizeof(internal_cfg->hugepage_info[0]));
 		internal_cfg->hugepage_info[i].lock_descriptor = -1;
 		internal_cfg->hugepage_mem_sz_limits[i] = 0;
+		internal_cfg->pagesz_mem_overrides[i].pagesz = 0;
+		internal_cfg->pagesz_mem_overrides[i].limit = 0;
 	}
+	internal_cfg->num_pagesz_mem_overrides = 0;
 	internal_cfg->base_virtaddr = 0;
 
 	/* if set to NONE, interrupt mode is determined automatically */
@@ -1867,6 +1885,96 @@ eal_parse_socket_arg(char *strval, volatile uint64_t *socket_arg)
 	return 0;
 }
 
+static int
+eal_parse_pagesz_mem(char *strval, struct internal_config *internal_cfg)
+{
+	char strval_cpy[1024];
+	char *fields[3];
+	char *pagesz_str, *mem_str;
+	int arg_num;
+	int len;
+	unsigned int i;
+	uint64_t pagesz, mem_limit;
+	struct pagesz_mem_override *pmo;
+
+	len = strnlen(strval, 1024);
+	if (len >= 1024) {
+		EAL_LOG(ERR, "--pagesz-mem parameter is too long");
+		return -1;
+	}
+
+	rte_strlcpy(strval_cpy, strval, sizeof(strval_cpy));
+
+	/* parse exactly one pagesz:mem pair per --pagesz-mem option */
+	arg_num = rte_strsplit(strval_cpy, len, fields, RTE_DIM(fields), ':');
+	if (arg_num != 2 || fields[0][0] == '\0' || fields[1][0] == '\0') {
+		EAL_LOG(ERR, "--pagesz-mem parameter format is invalid, expected <pagesz>:<limit>");
+		return -1;
+	}
+	pagesz_str = fields[0];
+	mem_str = fields[1];
+
+	/* reject accidental multiple pairs in one option */
+	if (strchr(mem_str, ',') != NULL) {
+		EAL_LOG(ERR, "--pagesz-mem accepts one <pagesz>:<limit> pair per option");
+		return -1;
+	}
+
+	/* parse page size */
+	errno = 0;
+	pagesz = rte_str_to_size(pagesz_str);
+	if (pagesz == 0 || errno != 0) {
+		EAL_LOG(ERR, "invalid page size in --pagesz-mem: '%s'", pagesz_str);
+		return -1;
+	}
+	if (!rte_is_power_of_2(pagesz)) {
+		EAL_LOG(ERR, "invalid page size in --pagesz-mem: '%s' (must be a power of two)",
+				pagesz_str);
+		return -1;
+	}
+
+	/* parse memory limit (0 is valid: disables allocation for this page size) */
+	errno = 0;
+	mem_limit = rte_str_to_size(mem_str);
+	if (errno != 0) {
+		EAL_LOG(ERR, "invalid memory limit in --pagesz-mem: '%s'", mem_str);
+		return -1;
+	}
+
+	/* validate alignment: memory limit must be divisible by page size */
+	if (mem_limit % pagesz != 0) {
+		EAL_LOG(ERR, "--pagesz-mem memory limit must be aligned to page size");
+		return -1;
+	}
+
+	for (i = 0; i < internal_cfg->num_pagesz_mem_overrides; i++) {
+		pmo = &internal_cfg->pagesz_mem_overrides[i];
+		if (pmo->pagesz != pagesz)
+			continue;
+
+		EAL_LOG(WARNING,
+			"--pagesz-mem specified multiple times for page size '%s'; later limit '%s' will be used",
+			pagesz_str, mem_str);
+		pmo->limit = mem_limit;
+		return 0;
+	}
+
+	/* do we have space? */
+	if (internal_cfg->num_pagesz_mem_overrides >= MAX_HUGEPAGE_SIZES) {
+		EAL_LOG(ERR,
+			"--pagesz-mem: too many page size entries (max %d)",
+			MAX_HUGEPAGE_SIZES);
+		return -1;
+	}
+
+	pmo = &internal_cfg->pagesz_mem_overrides[internal_cfg->num_pagesz_mem_overrides];
+	pmo->pagesz = pagesz;
+	pmo->limit = mem_limit;
+	internal_cfg->num_pagesz_mem_overrides++;
+
+	return 0;
+}
+
 static int
 eal_parse_vfio_intr(const char *mode)
 {
@@ -2172,6 +2280,12 @@ eal_parse_args(void)
 		}
 		int_cfg->force_numa_limits = 1;
 	}
+	TAILQ_FOREACH(arg, &args.pagesz_mem, next) {
+		if (eal_parse_pagesz_mem(arg->arg, int_cfg) < 0) {
+			EAL_LOG(ERR, "invalid pagesz-mem parameter: '%s'", arg->arg);
+			return -1;
+		}
+	}
 
 	/* tracing settings, not supported on windows */
 #ifdef RTE_EXEC_ENV_WINDOWS
@@ -2366,6 +2480,7 @@ eal_apply_hugepage_mem_sz_limits(struct internal_config *internal_cfg)
 	unsigned int i;
 
 	for (i = 0; i < internal_cfg->num_hugepage_sizes; i++) {
+		unsigned int j;
 		const uint64_t pagesz = internal_cfg->hugepage_info[i].hugepage_sz;
 		uint64_t limit;
 
@@ -2373,6 +2488,12 @@ eal_apply_hugepage_mem_sz_limits(struct internal_config *internal_cfg)
 		limit = RTE_MIN((uint64_t)RTE_MAX_MEM_MB_PER_TYPE << 20,
 				(uint64_t)RTE_MAX_MEMSEG_PER_TYPE * pagesz);
 
+		/* override with user value for matching page size */
+		for (j = 0; j < (unsigned int)internal_cfg->num_pagesz_mem_overrides; j++) {
+			if (internal_cfg->pagesz_mem_overrides[j].pagesz == pagesz)
+				limit = internal_cfg->pagesz_mem_overrides[j].limit;
+		}
+
 		internal_cfg->hugepage_mem_sz_limits[i] = limit;
 	}
diff --git a/lib/eal/common/eal_internal_cfg.h b/lib/eal/common/eal_internal_cfg.h
index 0bf192c6e5..8475c87969 100644
--- a/lib/eal/common/eal_internal_cfg.h
+++ b/lib/eal/common/eal_internal_cfg.h
@@ -98,6 +98,12 @@ struct internal_config {
 	struct hugepage_info hugepage_info[MAX_HUGEPAGE_SIZES];
 	uint64_t hugepage_mem_sz_limits[MAX_HUGEPAGE_SIZES];
 	/**< default max memory per hugepage size */
+	/** storage for user-specified pagesz-mem overrides */
+	struct pagesz_mem_override {
+		uint64_t pagesz; /**< page size in bytes */
+		uint64_t limit; /**< memory limit in bytes */
+	} pagesz_mem_overrides[MAX_HUGEPAGE_SIZES];
+	unsigned int num_pagesz_mem_overrides; /**< number of stored overrides */
 	enum rte_iova_mode iova_mode ;    /**< Set IOVA mode on this system  */
 	rte_cpuset_t ctrl_cpuset;         /**< cpuset for ctrl threads */
 	volatile unsigned int init_complete;
diff --git a/lib/eal/common/eal_option_list.h b/lib/eal/common/eal_option_list.h
index abee16340b..164a0b3888 100644
--- a/lib/eal/common/eal_option_list.h
+++ b/lib/eal/common/eal_option_list.h
@@ -56,6 +56,7 @@ BOOL_ARG("--no-huge", NULL, "Disable hugetlbfs support", no_huge)
 BOOL_ARG("--no-pci", NULL, "Disable all PCI devices", no_pci)
 BOOL_ARG("--no-shconf", NULL, "Disable shared config file generation", no_shconf)
 BOOL_ARG("--no-telemetry", NULL, "Disable telemetry", no_telemetry)
+LIST_ARG("--pagesz-mem", NULL, "Memory allocation per hugepage size (format: <pagesz>:<limit>, e.g. 2M:32G). Repeat option for multiple page sizes.", pagesz_mem)
 STR_ARG("--proc-type", NULL, "Type of process (primary|secondary|auto)", proc_type)
 OPT_STR_ARG("--remap-lcore-ids", "-R", "Remap lcore IDs to be contiguous starting from 0, or supplied value", remap_lcore_ids)
 STR_ARG("--service-corelist", "-S", "List of cores to use for service threads", service_corelist)
-- 
2.47.3
Thread overview: 14+ messages

2026-03-11 10:58 [PATCH v1 0/5] Make VA reservation limits configurable Anatoly Burakov
2026-03-11 10:58 ` [PATCH v1 1/5] eal/memory: always use one segment per memory type Anatoly Burakov
2026-03-11 10:58 ` [PATCH v1 2/5] eal/memory: allocate all VA space in one go Anatoly Burakov
2026-03-11 10:58 ` [PATCH v1 3/5] eal/memory: get rid of global VA space limits Anatoly Burakov
2026-03-11 10:58 ` [PATCH v1 4/5] eal/memory: store default segment limits in config Anatoly Burakov
2026-03-11 10:58 ` [PATCH v1 5/5] eal/memory: add page size VA limits EAL parameter Anatoly Burakov
2026-03-13 16:06 ` [PATCH v2 0/6] Make VA reservation limits configurable Anatoly Burakov
2026-03-13 16:06   ` [PATCH v2 1/6] eal: reject non-numeric input in str to size Anatoly Burakov
2026-03-13 16:16     ` Bruce Richardson
2026-03-13 16:06   ` [PATCH v2 2/6] eal/memory: remove per-list segment and memory limits Anatoly Burakov
2026-03-13 16:06   ` [PATCH v2 3/6] eal/memory: allocate all VA space in one go Anatoly Burakov
2026-03-13 16:06   ` [PATCH v2 4/6] eal/memory: get rid of global VA space limits Anatoly Burakov
2026-03-13 16:06   ` [PATCH v2 5/6] eal/memory: store default segment limits in config Anatoly Burakov
2026-03-13 16:06   ` [PATCH v2 6/6] eal/memory: add page size VA limits EAL parameter Anatoly Burakov