From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by smtp.lore.kernel.org (Postfix) with ESMTP id 09103CD4851 for ; Tue, 19 May 2026 10:12:58 +0000 (UTC) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 16BD040296; Tue, 19 May 2026 12:12:58 +0200 (CEST) Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by mails.dpdk.org (Postfix) with ESMTP id C29E24021E for ; Tue, 19 May 2026 12:12:55 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1779185575; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding; bh=qNladmy+N8GJKJZYYfh7XFg+udPW70SEdStn3rLxDm0=; b=NDGT+pXjHwET72l4sVkKcT0I3kHISa9K/Fkc9pSVmc7M33am2A0pWlr1WPfF5s15v4gn/n L5YwOC0OgfPGNt+e7+6DhBsDysHu7f0TKI/CW5ZRnd0nf9rJqgdN4W5BqI9nMdBVhD5TKb tb2V6AImwa58o5dT0bfqPjflJxLLUJ0= Received: from mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com (ec2-35-165-154-97.us-west-2.compute.amazonaws.com [35.165.154.97]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-628--xtQziViPfaHd2ULFdqudQ-1; Tue, 19 May 2026 06:12:51 -0400 X-MC-Unique: -xtQziViPfaHd2ULFdqudQ-1 X-Mimecast-MFC-AGG-ID: -xtQziViPfaHd2ULFdqudQ_1779185569 Received: from mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com (mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com [10.30.177.12]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mx-prod-mc-06.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTPS id 3AE5118005B5; Tue, 19 May 2026 10:12:49 +0000 (UTC) Received: from ringo.redhat.com (unknown [10.44.34.88]) by mx-prod-int-03.mail-002.prod.us-west-2.aws.redhat.com (Postfix) with ESMTP id B723019560A2; Tue, 19 May 2026 10:12:44 +0000 (UTC) From: Robin Jarry To: dev@dpdk.org, Jerin Jacob , Kiran Kumar K , Nithin Dabilpuram , Zhirun Yan Cc: Vladimir Medvedkin , Christophe Fontaine , David Marchand , Konstantin Ananyev , Maxime Leroy Subject: [PATCH dpdk] graph: replace circular buffer with priority-based bitmap scheduling Date: Tue, 19 May 2026 12:12:33 +0200 Message-ID: <20260519101232.541102-2-rjarry@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.0 on 10.30.177.12 X-Mimecast-Spam-Score: 0 X-Mimecast-MFC-PROC-ID: zI083sHVUGV8mOx3rzD7z3Q1UTssiEuLd07oGHy3oqI_1779185569 X-Mimecast-Originator: redhat.com Content-Transfer-Encoding: quoted-printable content-type: text/plain; charset="US-ASCII"; x-default=true X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Replace the FIFO circular buffer used to track pending nodes with a bitmap and a priority-sorted schedule table. Each node can now have a scheduling priority (int16_t, default 0, lower value means visited first). Source nodes are forced to INT16_MIN so they always run first. At graph creation time, nodes are sorted by (priority, id) and assigned a bit position (sched_idx). During the walk, a bitmap tracks which nodes have pending objects. Scanning from the lowest bit naturally visits nodes in priority order. This improves batching in fan-out-then-converge topologies. When eth_input classifies packets to both mpls_input and ipv4_input, the old FIFO order could process ipv4_input before mpls_input, causing ipv4_input to be visited twice (once before and once after the MPLS label is popped). With mpls_input at a higher priority (lower value), it runs first and its output accumulates in ipv4_input which is then visited only once with all packets. The bitmap set operation is idempotent (OR on an already-set bit is a no-op) which removes the need for the idx =3D=3D 0 guards that the circular buffer required to avoid duplicate enqueue. Suggested-by: Vladimir Medvedkin Signed-off-by: Robin Jarry Cc: Christophe Fontaine Cc: David Marchand Cc: Konstantin Ananyev Cc: Maxime Leroy --- doc/guides/prog_guide/graph_lib.rst | 37 +- .../prog_guide/img/graph_mem_layout.svg | 1823 +++++++---------- lib/graph/graph.c | 19 +- lib/graph/graph_debug.c | 12 +- lib/graph/graph_populate.c | 117 +- lib/graph/graph_private.h | 27 +- lib/graph/node.c | 2 + lib/graph/rte_graph.h | 1 + lib/graph/rte_graph_model_mcore_dispatch.h | 34 +- lib/graph/rte_graph_model_rtc.h | 65 +- lib/graph/rte_graph_worker.h | 2 +- lib/graph/rte_graph_worker_common.h | 81 +- 12 files changed, 984 insertions(+), 1236 deletions(-) diff --git a/doc/guides/prog_guide/graph_lib.rst b/doc/guides/prog_guide/gr= aph_lib.rst index 8409e7666e85..9c6d8679b686 100644 --- a/doc/guides/prog_guide/graph_lib.rst +++ b/doc/guides/prog_guide/graph_lib.rst @@ -117,13 +117,22 @@ next_node[]: The dynamic array to store the downstream nodes connected to this node. Do= wnstream node should not be current node itself or a source node. =20 +priority: +^^^^^^^^^ + +The scheduling priority of the node (``int16_t``, default 0). Nodes with l= ower +priority values are visited first during the graph walk. This allows contr= ol +over the order in which pending nodes are processed, which can improve pac= ket +batching in topologies where multiple paths converge on the same node. + Source node: ^^^^^^^^^^^^ =20 Source nodes are static nodes created using ``RTE_NODE_REGISTER`` by passi= ng ``flags`` as ``RTE_NODE_SOURCE_F``. -While performing the graph walk, the ``process()`` function of all the sou= rce -nodes will be called first. So that these nodes can be used as input nodes= for a graph. +Source nodes are automatically assigned the lowest possible priority +(``INT16_MIN``) so that their ``process()`` function is always called firs= t +during the graph walk. This ensures they act as input nodes for a graph. =20 nb_xstats: ^^^^^^^^^^ @@ -396,12 +405,26 @@ Graph object memory layout Understanding the memory layout helps to debug the graph library and improve the performance if needed. =20 -Graph object consists of a header, circular buffer to store the pending st= ream -when walking over the graph, variable-length memory to store the ``rte_nod= e`` objects, -and variable-length memory to store the xstat reported by each ``rte_node`= `. +A graph object consists of a header, a scheduling table mapping bit positi= ons to +node offsets, pending and source bitmaps for tracking which nodes need +processing, variable-length memory to store the ``rte_node`` objects, and +variable-length memory to store the xstat reported by each ``rte_node``. =20 -The graph_nodes_mem_create() creates and populate this memory. The functio= ns -such as ``rte_graph_walk()`` and ``rte_node_enqueue_*`` use this memory +Nodes are sorted by ``(priority, node_id)`` at graph creation time and eac= h +node is assigned a bit position in the pending bitmap. During the graph wa= lk, +the bitmap is scanned from the lowest bit, so nodes with lower priority va= lues +are visited first. Source nodes are always assigned the lowest priority +(``INT16_MIN``) to ensure they run before any processing node. + +This priority-based ordering improves batching in fan-out-then-converge +topologies. For example, if ``eth_input`` classifies packets to both +``mpls_input`` and ``ipv4_input``, giving ``mpls_input`` a lower priority = value +ensures it runs first. Its output accumulates in ``ipv4_input`` which is t= hen +visited only once with all packets, instead of being visited twice (before= and +after MPLS label popping). + +The ``graph_fp_mem_create()`` function creates and populates this memory. = The +functions such as ``rte_graph_walk()`` and ``rte_node_enqueue_*`` use this= memory to enable fastpath services. =20 Graph feature arc diff --git a/doc/guides/prog_guide/img/graph_mem_layout.svg b/doc/guides/pr= og_guide/img/graph_mem_layout.svg index 3f3a4f3a808e..853987907dec 100644 --- a/doc/guides/prog_guide/img/graph_mem_layout.svg +++ b/doc/guides/prog_guide/img/graph_mem_layout.svg @@ -1,65 +1,22 @@ - + + - - - - + id=3D"defs56"> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Node 1 xstats - Node 1 - xstat 1 - xstat 1 - xstat 1 - xstat 2 - xstat 2 - xstat 2 - xstat N - xstat 3 - xstat 3 - xstat 4 - Node 4 - Node 8 - Node N - Node 4 xstats - Node 8 xstats - Node N xstats - - rte_node xstat memory layout - - - - - - - - - - - - - + + + + + + + + rte_graph object memory layout + + + Graph Header + struct = rte_graph + + + Scheduling table + + + Pending bitmap + + + Source nodes bitmap + + + Fence + RTE_GRA= PH_FENCE + + + Node 0 + struct = rte_node + + + Node 1 + struct = rte_node + + + Node 2 + struct = rte_node + + + Node N + struct = rte_node + + ... + + + + + nb_nodes + + + Node 1 xstats + + + + Scheduling table + + + Node 0 offset (prio min) + + Node 1 offset + + Node 2 offset + ... + + Node N offset (prio max) + + + + rte_n= ode object memory layout + + + Fence + + Slowpath area + (incl. sched_idx) + + Context memory + + Fastpath area + + Next node 0 ptr + + Next node 1 ptr + + Next node N ptr + + + + + nb_edges + + + + rte_node xstats memory layout + + + + + + + + + one uint64_t word per node + Seeded to pending on walk start + + + xstat0 + + + Node 7 xstats + + Node N xstats + ... + + + + cacheline + + xstat1 + + xstatN + ... + + ... + + bit set to 1when node has work diff --git a/lib/graph/graph.c b/lib/graph/graph.c index 6911ea8abeed..6dc1402e6bd0 100644 --- a/lib/graph/graph.c +++ b/lib/graph/graph.c @@ -334,20 +334,6 @@ graph_mem_fixup_secondary(struct rte_graph *graph) =09return graph_mem_fixup_node_ctx(graph); } =20 -static bool -graph_src_node_avail(struct graph *graph) -{ -=09struct graph_node *graph_node; - -=09STAILQ_FOREACH(graph_node, &graph->node_list, next) -=09=09if ((graph_node->node->flags & RTE_NODE_SOURCE_F) && -=09=09 (graph_node->node->lcore_id =3D=3D RTE_MAX_LCORE || -=09=09 graph->lcore_id =3D=3D graph_node->node->lcore_id)) -=09=09=09return true; - -=09return false; -} - RTE_EXPORT_SYMBOL(rte_graph_model_mcore_dispatch_core_bind) int rte_graph_model_mcore_dispatch_core_bind(rte_graph_t id, int lcore) @@ -375,9 +361,8 @@ rte_graph_model_mcore_dispatch_core_bind(rte_graph_t id= , int lcore) =09graph->graph->dispatch.lcore_id =3D graph->lcore_id; =09graph->socket =3D rte_lcore_to_socket_id(lcore); =20 -=09/* check the availability of source node */ -=09if (!graph_src_node_avail(graph)) -=09=09graph->graph->head =3D 0; +=09/* Rebuild source bitmap with only source nodes bound to this lcore */ +=09graph_src_bitmap_rebuild(graph); =20 =09return 0; =20 diff --git a/lib/graph/graph_debug.c b/lib/graph/graph_debug.c index e3b8cccdc1f0..8e99fa1b0fb8 100644 --- a/lib/graph/graph_debug.c +++ b/lib/graph/graph_debug.c @@ -15,8 +15,8 @@ graph_dump(FILE *f, struct graph *g) =20 =09fprintf(f, "graph <%s>\n", g->name); =09fprintf(f, " id=3D%" PRIu32 "\n", g->id); -=09fprintf(f, " cir_start=3D%" PRIu32 "\n", g->cir_start); -=09fprintf(f, " cir_mask=3D%" PRIu32 "\n", g->cir_mask); +=09fprintf(f, " sched_table_off=3D%" PRIu32 "\n", g->sched_table_off); +=09fprintf(f, " nb_sched_words=3D%" PRIu16 "\n", g->nb_sched_words); =09fprintf(f, " addr=3D%p\n", g); =09fprintf(f, " graph=3D%p\n", g->graph); =09fprintf(f, " mem_sz=3D%zu\n", g->mem_sz); @@ -63,14 +63,14 @@ rte_graph_obj_dump(FILE *f, struct rte_graph *g, bool a= ll) =20 =09fprintf(f, "graph <%s> @ %p\n", g->name, g); =09fprintf(f, " id=3D%" PRIu32 "\n", g->id); -=09fprintf(f, " head=3D%" PRId32 "\n", (int32_t)g->head); -=09fprintf(f, " tail=3D%" PRId32 "\n", (int32_t)g->tail); -=09fprintf(f, " cir_mask=3D0x%" PRIx32 "\n", g->cir_mask); =09fprintf(f, " nb_nodes=3D%" PRId32 "\n", g->nb_nodes); +=09fprintf(f, " nb_sched_words=3D%" PRIu16 "\n", g->nb_sched_words); =09fprintf(f, " socket=3D%d\n", g->socket); =09fprintf(f, " fence=3D0x%" PRIx64 "\n", g->fence); =09fprintf(f, " nodes_start=3D0x%" PRIx32 "\n", g->nodes_start); -=09fprintf(f, " cir_start=3D%p\n", g->cir_start); +=09fprintf(f, " sched_table=3D%p\n", g->sched_table); +=09fprintf(f, " pending=3D%p\n", g->pending); +=09fprintf(f, " src_pending=3D%p\n", g->src_pending); =20 =09rte_graph_foreach_node(count, off, g, n) { =09=09if (!all && n->idx =3D=3D 0) diff --git a/lib/graph/graph_populate.c b/lib/graph/graph_populate.c index 026daecb2122..45bc7704fede 100644 --- a/lib/graph/graph_populate.c +++ b/lib/graph/graph_populate.c @@ -3,6 +3,8 @@ */ =20 =20 +#include + #include #include #include @@ -15,19 +17,27 @@ static size_t graph_fp_mem_calc_size(struct graph *graph) { =09struct graph_node *graph_node; -=09rte_node_t val; +=09uint16_t nwords; =09size_t sz; =20 =09/* Graph header */ =09sz =3D sizeof(struct rte_graph); -=09/* Source nodes list */ -=09sz +=3D sizeof(rte_graph_off_t) * graph->src_node_count; -=09/* Circular buffer for pending streams of size number of nodes */ -=09val =3D rte_align32pow2(graph->node_count * sizeof(rte_graph_off_t)); -=09sz =3D RTE_ALIGN(sz, val); -=09graph->cir_start =3D sz; -=09graph->cir_mask =3D rte_align32pow2(graph->node_count) - 1; -=09sz +=3D val; + +=09/* Schedule table: node offset indexed by sched_idx */ +=09sz =3D RTE_ALIGN(sz, RTE_CACHE_LINE_SIZE); +=09graph->sched_table_off =3D sz; +=09sz +=3D sizeof(rte_graph_off_t) * graph->node_count; + +=09/* Pending and source pending bitmaps */ +=09nwords =3D (graph->node_count + 63) / 64; +=09graph->nb_sched_words =3D nwords; +=09sz =3D RTE_ALIGN(sz, RTE_CACHE_LINE_SIZE); +=09graph->pending_off =3D sz; +=09sz +=3D sizeof(uint64_t) * nwords; +=09sz =3D RTE_ALIGN(sz, RTE_CACHE_LINE_SIZE); +=09graph->src_pending_off =3D sz; +=09sz +=3D sizeof(uint64_t) * nwords; + =09/* Fence */ =09sz +=3D sizeof(RTE_GRAPH_FENCE); =09sz =3D RTE_ALIGN(sz, RTE_CACHE_LINE_SIZE); @@ -54,20 +64,44 @@ graph_fp_mem_calc_size(struct graph *graph) } =20 static void -graph_header_popluate(struct graph *_graph) +graph_header_populate(struct graph *_graph) { =09struct rte_graph *graph =3D _graph->graph; =20 -=09graph->tail =3D 0; -=09graph->head =3D (int32_t)-_graph->src_node_count; -=09graph->cir_mask =3D _graph->cir_mask; =09graph->nb_nodes =3D _graph->node_count; -=09graph->cir_start =3D RTE_PTR_ADD(graph, _graph->cir_start); +=09graph->nb_sched_words =3D _graph->nb_sched_words; +=09graph->sched_table =3D RTE_PTR_ADD(graph, _graph->sched_table_off); +=09graph->pending =3D RTE_PTR_ADD(graph, _graph->pending_off); +=09graph->src_pending =3D RTE_PTR_ADD(graph, _graph->src_pending_off); =09graph->nodes_start =3D _graph->nodes_start; =09graph->socket =3D _graph->socket; =09graph->id =3D _graph->id; =09memcpy(graph->name, _graph->name, RTE_GRAPH_NAMESIZE); =09graph->fence =3D RTE_GRAPH_FENCE; + +=09memset(graph->pending, 0, sizeof(uint64_t) * _graph->nb_sched_words); +=09memset(graph->src_pending, 0, sizeof(uint64_t) * _graph->nb_sched_words= ); +} + +static int16_t +graph_node_effective_priority(const struct graph_node *gn) +{ +=09if (gn->node->flags & RTE_NODE_SOURCE_F) +=09=09return INT16_MIN; +=09return gn->node->priority; +} + +static int +graph_node_priority_cmp(const void *a, const void *b) +{ +=09const struct graph_node *const *na =3D a; +=09const struct graph_node *const *nb =3D b; +=09int16_t pa =3D graph_node_effective_priority(*na); +=09int16_t pb =3D graph_node_effective_priority(*nb); + +=09if (pa !=3D pb) +=09=09return (int)pa - (int)pb; +=09return (int)(*na)->node->id - (int)(*nb)->node->id; } =20 static void @@ -76,15 +110,26 @@ graph_nodes_populate(struct graph *_graph) =09rte_graph_off_t xstat_off =3D _graph->xstats_start; =09rte_graph_off_t off =3D _graph->nodes_start; =09struct rte_graph *graph =3D _graph->graph; -=09struct graph_node *graph_node; +=09struct graph_node **sorted, *graph_node; =09rte_edge_t count, nb_edges; =09rte_node_t pid; +=09uint32_t n; =20 -=09STAILQ_FOREACH(graph_node, &_graph->node_list, next) { +=09/* Build a sorted array of graph_node pointers by (priority, id) */ +=09sorted =3D calloc(_graph->node_count, sizeof(*sorted)); +=09RTE_VERIFY(sorted !=3D NULL); +=09n =3D 0; +=09STAILQ_FOREACH(graph_node, &_graph->node_list, next) +=09=09sorted[n++] =3D graph_node; +=09qsort(sorted, n, sizeof(*sorted), graph_node_priority_cmp); + +=09for (n =3D 0; n < _graph->node_count; n++) { +=09=09graph_node =3D sorted[n]; =09=09struct rte_node *node =3D RTE_PTR_ADD(graph, off); =09=09memset(node, 0, sizeof(*node)); =09=09node->fence =3D RTE_GRAPH_FENCE; =09=09node->off =3D off; +=09=09node->sched_idx =3D n; =09=09if (graph_pcap_is_enable()) { =09=09=09node->process =3D graph_pcap_dispatch; =09=09=09node->original_process =3D graph_node->node->process; @@ -123,8 +168,14 @@ graph_nodes_populate(struct graph *_graph) =09=09off +=3D sizeof(struct rte_node *) * nb_edges; =09=09off =3D RTE_ALIGN(off, RTE_CACHE_LINE_SIZE); =09=09node->next =3D off; + +=09=09/* Fill the schedule table */ +=09=09graph->sched_table[n] =3D node->off; + =09=09__rte_node_stream_alloc(graph, node); =09} + +=09free(sorted); } =20 struct rte_node * @@ -179,12 +230,11 @@ graph_node_nexts_populate(struct graph *_graph) } =20 static int -graph_src_nodes_offset_populate(struct graph *_graph) +graph_src_bitmap_populate(struct graph *_graph) { =09struct rte_graph *graph =3D _graph->graph; =09struct graph_node *graph_node; =09struct rte_node *node; -=09int32_t head =3D -1; =09const char *name; =20 =09STAILQ_FOREACH(graph_node, &_graph->node_list, next) { @@ -195,7 +245,7 @@ graph_src_nodes_offset_populate(struct graph *_graph) =09=09=09=09SET_ERR_JMP(EINVAL, fail, "%s not found", name); =20 =09=09=09__rte_node_stream_alloc(graph, node); -=09=09=09graph->cir_start[head--] =3D node->off; +=09=09=09__rte_node_pending_set(graph->src_pending, node); =09=09} =09} =20 @@ -204,17 +254,42 @@ graph_src_nodes_offset_populate(struct graph *_graph) =09return -rte_errno; } =20 +void +graph_src_bitmap_rebuild(struct graph *_graph) +{ +=09struct rte_graph *graph =3D _graph->graph; +=09struct graph_node *graph_node; +=09struct rte_node *node; +=09const char *name; + +=09memset(graph->src_pending, 0, +=09 sizeof(uint64_t) * graph->nb_sched_words); + +=09STAILQ_FOREACH(graph_node, &_graph->node_list, next) { +=09=09if (!(graph_node->node->flags & RTE_NODE_SOURCE_F)) +=09=09=09continue; +=09=09if (graph_node->node->lcore_id !=3D RTE_MAX_LCORE && +=09=09 graph_node->node->lcore_id !=3D _graph->lcore_id) +=09=09=09continue; +=09=09name =3D graph_node->node->name; +=09=09node =3D graph_node_name_to_ptr(graph, name); +=09=09if (node =3D=3D NULL) +=09=09=09continue; +=09=09__rte_node_pending_set(graph->src_pending, node); +=09} +} + static int graph_fp_mem_populate(struct graph *graph) { =09int rc; =20 -=09graph_header_popluate(graph); +=09graph_header_populate(graph); =09if (graph_pcap_is_enable()) =09=09graph_pcap_init(graph); =09graph_nodes_populate(graph); =09rc =3D graph_node_nexts_populate(graph); -=09rc |=3D graph_src_nodes_offset_populate(graph); +=09rc |=3D graph_src_bitmap_populate(graph); =20 =09return rc; } diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h index 26cdc6637192..df6f83b20261 100644 --- a/lib/graph/graph_private.h +++ b/lib/graph/graph_private.h @@ -49,6 +49,7 @@ struct node { =09STAILQ_ENTRY(node) next; /**< Next node in the list. */ =09char name[RTE_NODE_NAMESIZE]; /**< Name of the node. */ =09uint64_t flags;=09=09 /**< Node configuration flag. */ +=09int16_t priority;=09 /**< Scheduling priority. */ =09unsigned int lcore_id; =09/**< Node runs on the Lcore ID used for mcore dispatch model. */ =09rte_node_process_t process; /**< Node process function. */ @@ -98,19 +99,23 @@ struct graph { =09const struct rte_memzone *mz; =09/**< Memzone to store graph data. */ =09rte_graph_off_t nodes_start; -=09/**< Node memory start offset in graph reel. */ +=09/**< Node memory start offset in graph memory. */ =09rte_graph_off_t xstats_start; -=09/**< Node xstats memory start offset in graph reel. */ +=09/**< Node xstats memory start offset in graph memory. */ =09rte_node_t src_node_count; =09/**< Number of source nodes in a graph. */ =09struct rte_graph *graph; =09/**< Pointer to graph data. */ =09rte_node_t node_count; =09/**< Total number of nodes. */ -=09uint32_t cir_start; -=09/**< Circular buffer start offset in graph reel. */ -=09uint32_t cir_mask; -=09/**< Circular buffer mask for wrap around. */ +=09uint32_t sched_table_off; +=09/**< Schedule table start offset in graph memory. */ +=09uint32_t pending_off; +=09/**< Pending bitmap start offset in graph memory. */ +=09uint32_t src_pending_off; +=09/**< Source pending bitmap start offset in graph memory. */ +=09uint16_t nb_sched_words; +=09/**< Number of uint64_t words in pending bitmaps. */ =09rte_graph_t id; =09/**< Graph identifier. */ =09rte_graph_t parent_id; @@ -378,6 +383,16 @@ int graph_fp_mem_create(struct graph *graph); */ int graph_fp_mem_destroy(struct graph *graph); =20 +/** + * @internal + * + * Rebuild the source pending bitmap based on lcore affinity. + * + * @param graph + * Pointer to the internal graph object. + */ +void graph_src_bitmap_rebuild(struct graph *graph); + /* Lookup functions */ /** * @internal diff --git a/lib/graph/node.c b/lib/graph/node.c index e3359fe490a5..b5599143b37b 100644 --- a/lib/graph/node.c +++ b/lib/graph/node.c @@ -153,6 +153,7 @@ __rte_node_register(const struct rte_node_register *reg= ) =09if (rte_strscpy(node->name, reg->name, RTE_NODE_NAMESIZE) < 0) =09=09goto free_xstat; =09node->flags =3D reg->flags; +=09node->priority =3D reg->priority; =09node->process =3D reg->process; =09node->init =3D reg->init; =09node->fini =3D reg->fini; @@ -216,6 +217,7 @@ node_clone(struct node *node, const char *name) =20 =09/* Clone the source node */ =09reg->flags =3D node->flags; +=09reg->priority =3D node->priority; =09reg->process =3D node->process; =09reg->init =3D node->init; =09reg->fini =3D node->fini; diff --git a/lib/graph/rte_graph.h b/lib/graph/rte_graph.h index 7e433f466129..6cd32ec22284 100644 --- a/lib/graph/rte_graph.h +++ b/lib/graph/rte_graph.h @@ -496,6 +496,7 @@ struct rte_node_register { =09char name[RTE_NODE_NAMESIZE]; /**< Name of the node. */ =09uint64_t flags;=09=09 /**< Node configuration flag. */ #define RTE_NODE_SOURCE_F (1ULL << 0) /**< Node type is source. */ +=09int16_t priority; /**< Scheduling priority (lower =3D visited first, de= fault 0). */ =09rte_node_process_t process; /**< Node process function. */ =09rte_node_init_t init; /**< Node init function. */ =09rte_node_fini_t fini; /**< Node fini function. */ diff --git a/lib/graph/rte_graph_model_mcore_dispatch.h b/lib/graph/rte_gra= ph_model_mcore_dispatch.h index f9ff3daa88ec..50a473564b56 100644 --- a/lib/graph/rte_graph_model_mcore_dispatch.h +++ b/lib/graph/rte_graph_model_mcore_dispatch.h @@ -77,9 +77,13 @@ int rte_graph_model_mcore_dispatch_node_lcore_affinity_s= et(const char *name, =09=09=09=09=09=09=09 unsigned int lcore_id); =20 /** - * Perform graph walk on the circular buffer and invoke the process functi= on + * Perform graph walk on the pending bitmap and invoke the process functio= n * of the nodes and collect the stats. * + * Nodes are visited in scheduling order (lowest priority value first). + * Source nodes are seeded into the pending bitmap at the start of each wa= lk. + * Nodes with different lcore affinity are dispatched to their target lcor= e. + * * @param graph * Graph pointer returned from rte_graph_lookup function. * @@ -88,20 +92,28 @@ int rte_graph_model_mcore_dispatch_node_lcore_affinity_= set(const char *name, static inline void rte_graph_walk_mcore_dispatch(struct rte_graph *graph) { -=09const rte_graph_off_t *cir_start =3D graph->cir_start; -=09const rte_node_t mask =3D graph->cir_mask; -=09uint32_t head =3D graph->head; +=09const uint16_t nwords =3D graph->nb_sched_words; =09struct rte_node *node; +=09uint16_t word, bit; =20 =09if (graph->dispatch.wq !=3D NULL) =09=09__rte_graph_mcore_dispatch_sched_wq_process(graph); =20 -=09while (likely(head !=3D graph->tail)) { -=09=09node =3D (struct rte_node *)RTE_PTR_ADD(graph, cir_start[(int32_t)he= ad++]); +=09/* Seed pending bitmap with source nodes bound to this lcore */ +=09for (word =3D 0; word < nwords; word++) +=09=09graph->pending[word] |=3D graph->src_pending[word]; =20 -=09=09/* skip the src nodes which not bind with current worker */ -=09=09if ((int32_t)head < 1 && node->dispatch.lcore_id !=3D graph->dispatc= h.lcore_id) -=09=09=09continue; +=09for (;;) { +=09=09/* find first word with any pending bit */ +=09=09for (word =3D 0; word < nwords; word++) +=09=09=09if (graph->pending[word]) +=09=09=09=09break; +=09=09if (word =3D=3D nwords) +=09=09=09break; /* no more pending nodes */ + +=09=09bit =3D rte_ctz64(graph->pending[word]); +=09=09graph->pending[word] &=3D ~(1ULL << bit); +=09=09node =3D __rte_graph_pending_node(graph, word, bit); =20 =09=09/* Schedule the node until all task/objs are done */ =09=09if (node->dispatch.lcore_id !=3D RTE_MAX_LCORE && @@ -111,11 +123,7 @@ rte_graph_walk_mcore_dispatch(struct rte_graph *graph) =09=09=09continue; =20 =09=09__rte_node_process(graph, node); - -=09=09head =3D likely((int32_t)head > 0) ? head & mask : head; =09} - -=09graph->tail =3D 0; } =20 #ifdef __cplusplus diff --git a/lib/graph/rte_graph_model_rtc.h b/lib/graph/rte_graph_model_rt= c.h index 4b6236e301e3..38feb3e1ca88 100644 --- a/lib/graph/rte_graph_model_rtc.h +++ b/lib/graph/rte_graph_model_rtc.h @@ -6,9 +6,12 @@ #include "rte_graph_worker_common.h" =20 /** - * Perform graph walk on the circular buffer and invoke the process functi= on + * Perform graph walk on the pending bitmap and invoke the process functio= n * of the nodes and collect the stats. * + * Nodes are visited in scheduling order (lowest priority value first). + * Source nodes are seeded into the pending bitmap at the start of each wa= lk. + * * @param graph * Graph pointer returned from rte_graph_lookup function. * @@ -17,30 +20,52 @@ static inline void rte_graph_walk_rtc(struct rte_graph *graph) { -=09const rte_graph_off_t *cir_start =3D graph->cir_start; -=09const rte_node_t mask =3D graph->cir_mask; -=09uint32_t head =3D graph->head; +=09const uint16_t nwords =3D graph->nb_sched_words; =09struct rte_node *node; +=09uint16_t word, bit; =20 =09/* -=09 * Walk on the source node(s) ((cir_start - head) -> cir_start) and the= n -=09 * on the pending streams (cir_start -> (cir_start + mask) -> cir_start= ) -=09 * in a circular buffer fashion. +=09 * Nodes are assigned a bit position (sched_idx) sorted by (priority, +=09 * node_id) at graph creation time. Source nodes are forced to INT16_MI= N +=09 * priority so they always come first. =09 * -=09 *=09+-----+ <=3D cir_start - head [number of source nodes] -=09 *=09| | -=09 *=09| ... | <=3D source nodes -=09 *=09| | -=09 *=09+-----+ <=3D cir_start [head =3D 0] [tail =3D 0] -=09 *=09| | -=09 *=09| ... | <=3D pending streams -=09 *=09| | -=09 *=09+-----+ <=3D cir_start + mask +=09 * sched_table[] maps bit positions to node offsets: +=09 * +=09 * pending[] sched_table[] +=09 * +----------+ +------------------+ +=09 * | word 0 | ---> | src_node_0 | bit 0 (prio=3DINT16_MIN) +=09 * | 1100...1 | | src_node_1 | bit 1 (prio=3DINT16_MIN) +=09 * | | | mpls_input | bit 2 (prio=3D-10) +=09 * | | | ipv4_input | bit 3 (prio=3D0) +=09 * | | | ... | +=09 * +----------+ +------------------+ +=09 * | word 1 | ---> | ip4_rewrite | bit 64 (prio=3D10) +=09 * | ... | | ... | +=09 * +----------+ +------------------+ +=09 * +=09 * Walk: for each word, find lowest set bit (rte_ctz64), process that +=09 * node, clear the bit, re-read the word (processing may have set new +=09 * bits), repeat. +=09 * +=09 * After each node is processed, restart scanning from word 0 since +=09 * processing may set bits in any word, including earlier ones. =09 */ -=09while (likely(head !=3D graph->tail)) { -=09=09node =3D (struct rte_node *)RTE_PTR_ADD(graph, cir_start[(int32_t)he= ad++]); + +=09/* Seed pending bitmap with source nodes */ +=09for (word =3D 0; word < nwords; word++) +=09=09graph->pending[word] |=3D graph->src_pending[word]; + +=09for (;;) { +=09=09/* find first word with any pending bit */ +=09=09for (word =3D 0; word < nwords; word++) +=09=09=09if (graph->pending[word]) +=09=09=09=09break; +=09=09if (word =3D=3D nwords) +=09=09=09break; /* no more pending nodes */ + +=09=09bit =3D rte_ctz64(graph->pending[word]); +=09=09graph->pending[word] &=3D ~(1ULL << bit); +=09=09node =3D __rte_graph_pending_node(graph, word, bit); =09=09__rte_node_process(graph, node); -=09=09head =3D likely((int32_t)head > 0) ? head & mask : head; =09} -=09graph->tail =3D 0; } diff --git a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker.h index b0f952a82cc9..e513d7a655d9 100644 --- a/lib/graph/rte_graph_worker.h +++ b/lib/graph/rte_graph_worker.h @@ -14,7 +14,7 @@ extern "C" { #endif =20 /** - * Perform graph walk on the circular buffer and invoke the process functi= on + * Perform graph walk on the pending bitmap and invoke the process functio= n * of the nodes and collect the stats. * * @param graph diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_work= er_common.h index 4ab53a533e4c..e52a37ce5e84 100644 --- a/lib/graph/rte_graph_worker_common.h +++ b/lib/graph/rte_graph_worker_common.h @@ -49,15 +49,14 @@ SLIST_HEAD(rte_graph_rq_head, rte_graph); */ struct __rte_cache_aligned rte_graph { =09/* Fast path area. */ -=09uint32_t tail;=09=09 /**< Tail of circular buffer. */ -=09uint32_t head;=09=09 /**< Head of circular buffer. */ -=09uint32_t cir_mask;=09 /**< Circular buffer wrap around mask. */ =09rte_node_t nb_nodes;=09 /**< Number of nodes in the graph. */ -=09rte_graph_off_t *cir_start; /**< Pointer to circular buffer. */ =09rte_graph_off_t nodes_start; /**< Offset at which node memory starts. *= / +=09rte_graph_off_t *sched_table; /**< Node offset indexed by sched_idx. */ +=09uint64_t *pending;=09 /**< Bitmap of pending nodes. */ +=09uint64_t *src_pending;=09 /**< Bitmap of source nodes (constant). *= / +=09uint16_t nb_sched_words; /**< Number of uint64_t words in pending b= itmaps. */ =09uint8_t model;=09=09 /**< graph model */ -=09uint8_t reserved1;=09 /**< Reserved for future use. */ -=09uint16_t reserved2;=09 /**< Reserved for future use. */ +=09/* 26 bytes padding */ =09union { =09=09/* Fast schedule area for mcore dispatch model */ =09=09struct { @@ -98,6 +97,7 @@ struct __rte_cache_aligned rte_node { =09rte_node_t id;=09=09/**< Node identifier. */ =09rte_node_t parent_id;=09/**< Parent Node identifier. */ =09rte_edge_t nb_edges;=09/**< Number of edges from this node. */ +=09uint16_t sched_idx;=09/**< Bit position in pending bitmap. */ =09uint32_t realloc_count;=09/**< Number of times realloced. */ =20 =09char parent[RTE_NODE_NAMESIZE];=09/**< Parent node name. */ @@ -132,7 +132,7 @@ struct __rte_cache_aligned rte_node { =09=09}; /**< Node Context. */ =09=09uint16_t size;=09=09/**< Total number of objects available. */ =09=09uint16_t idx;=09=09/**< Number of objects used. */ -=09=09rte_graph_off_t off;=09/**< Offset of node in the graph reel. */ +=09=09rte_graph_off_t off;=09/**< Offset of node in the graph memory. */ =09=09uint64_t total_cycles;=09/**< Cycles spent in this node. */ =09=09uint64_t total_calls;=09/**< Calls done to this node. */ =09=09uint64_t total_objs;=09/**< Objects processed by this node. */ @@ -187,12 +187,12 @@ void __rte_node_stream_alloc_size(struct rte_graph *g= raph, /** * @internal * - * Enqueue a given node to the tail of the graph reel. + * Process a node's pending objects and collect stats. * * @param graph * Pointer Graph object. * @param node - * Pointer to node object to be enqueued. + * Pointer to node object to be processed. */ static __rte_always_inline void __rte_node_process(struct rte_graph *graph, struct rte_node *node) @@ -220,21 +220,42 @@ __rte_node_process(struct rte_graph *graph, struct rt= e_node *node) /** * @internal * - * Enqueue a given node to the tail of the graph reel. + * Get a pointer to a node from the scheduling table. * * @param graph * Pointer Graph object. + * @param word + * Offset in the pending bitmap. + * @param bit + * Bit number. + * + * @return + * Pointer to the node. + */ +static __rte_always_inline struct rte_node * +__rte_graph_pending_node(struct rte_graph *graph, uint16_t word, uint16_t = bit) +{ +=09const uint16_t index =3D (word * sizeof(*graph->pending) * CHAR_BIT) + = bit; +=09const rte_graph_off_t node_offset =3D graph->sched_table[index]; +=09return RTE_PTR_ADD(graph, node_offset); +} + +/** + * @internal + * + * Mark a node as pending in the graph scheduling bitmap. + * + * @param bitmap + * Either graph->pending or graph->src_pending. * @param node - * Pointer to node object to be enqueued. + * Pointer to node object to be marked pending. */ static __rte_always_inline void -__rte_node_enqueue_tail_update(struct rte_graph *graph, struct rte_node *n= ode) +__rte_node_pending_set(uint64_t *bitmap, struct rte_node *node) { -=09uint32_t tail; - -=09tail =3D graph->tail; -=09graph->cir_start[tail++] =3D node->off; -=09graph->tail =3D tail & graph->cir_mask; +=09const uint16_t word =3D node->sched_idx / (sizeof(*bitmap) * CHAR_BIT); +=09const uint16_t bit =3D node->sched_idx % (sizeof(*bitmap) * CHAR_BIT); +=09bitmap[word] |=3D 1ULL << bit; } =20 /** @@ -242,8 +263,8 @@ __rte_node_enqueue_tail_update(struct rte_graph *graph,= struct rte_node *node) * * Enqueue sequence prologue function. * - * Updates the node to tail of graph reel and resizes the number of object= s - * available in the stream as needed. + * Marks the node as pending in the scheduling bitmap and resizes the numb= er + * of objects available in the stream as needed. * * @param graph * Pointer to the graph object. @@ -259,9 +280,7 @@ __rte_node_enqueue_prologue(struct rte_graph *graph, st= ruct rte_node *node, =09=09=09 const uint16_t idx, const uint16_t space) { =20 -=09/* Add to the pending stream list if the node is new */ -=09if (idx =3D=3D 0) -=09=09__rte_node_enqueue_tail_update(graph, node); +=09__rte_node_pending_set(graph->pending, node); =20 =09if (unlikely(node->size < (idx + space))) =09=09__rte_node_stream_alloc_size(graph, node, node->size + space); @@ -293,7 +312,7 @@ __rte_node_next_node_get(struct rte_node *node, rte_edg= e_t next) =20 /** * Enqueue the objs to next node for further processing and set - * the next node to pending state in the circular buffer. + * the next node to pending state in the scheduling bitmap. * * @param graph * Graph pointer returned from rte_graph_lookup(). @@ -321,7 +340,7 @@ rte_node_enqueue(struct rte_graph *graph, struct rte_no= de *node, =20 /** * Enqueue only one obj to next node for further processing and - * set the next node to pending state in the circular buffer. + * set the next node to pending state in the scheduling bitmap. * * @param graph * Graph pointer returned from rte_graph_lookup(). @@ -347,7 +366,7 @@ rte_node_enqueue_x1(struct rte_graph *graph, struct rte= _node *node, =20 /** * Enqueue only two objs to next node for further processing and - * set the next node to pending state in the circular buffer. + * set the next node to pending state in the scheduling bitmap. * Same as rte_node_enqueue_x1 but enqueue two objs. * * @param graph @@ -377,7 +396,7 @@ rte_node_enqueue_x2(struct rte_graph *graph, struct rte= _node *node, =20 /** * Enqueue only four objs to next node for further processing and - * set the next node to pending state in the circular buffer. + * set the next node to pending state in the scheduling bitmap. * Same as rte_node_enqueue_x1 but enqueue four objs. * * @param graph @@ -414,7 +433,7 @@ rte_node_enqueue_x4(struct rte_graph *graph, struct rte= _node *node, =20 /** * Enqueue objs to multiple next nodes for further processing and - * set the next nodes to pending state in the circular buffer. + * set the next nodes to pending state in the scheduling bitmap. * objs[i] will be enqueued to nexts[i]. * * @param graph @@ -472,7 +491,7 @@ rte_node_next_stream_get(struct rte_graph *graph, struc= t rte_node *node, } =20 /** - * Put the next stream to pending state in the circular buffer + * Put the next stream to pending state in the scheduling bitmap * for further processing. Should be invoked after rte_node_next_stream_ge= t(). * * @param graph @@ -495,9 +514,7 @@ rte_node_next_stream_put(struct rte_graph *graph, struc= t rte_node *node, =09=09return; =20 =09node =3D __rte_node_next_node_get(node, next); -=09if (node->idx =3D=3D 0) -=09=09__rte_node_enqueue_tail_update(graph, node); - +=09__rte_node_pending_set(graph->pending, node); =09node->idx +=3D idx; } =20 @@ -530,7 +547,7 @@ rte_node_next_stream_move(struct rte_graph *graph, stru= ct rte_node *src, =09=09src->objs =3D dobjs; =09=09src->size =3D dsz; =09=09dst->idx =3D src->idx; -=09=09__rte_node_enqueue_tail_update(graph, dst); +=09=09__rte_node_pending_set(graph->pending, dst); =09} else { /* Move the objects from src node to dst node */ =09=09rte_node_enqueue(graph, src, next, src->objs, src->idx); =09} --=20 2.54.0