* [PATCH 0/6] ip_frag: fix reassembly defects and add test
@ 2026-06-16 21:05 Stephen Hemminger
2026-06-16 21:05 ` [PATCH 1/6] ip_frag: tolerate duplicate fragments Stephen Hemminger
` (5 more replies)
0 siblings, 6 replies; 7+ messages in thread
From: Stephen Hemminger @ 2026-06-16 21:05 UTC (permalink / raw)
To: dev; +Cc: Stephen Hemminger
The IP reassembly library tracks only a running byte total and reserved
slots for the first and last fragments, with no coverage map. As a result
it mishandles duplicate, overlapping, oversized, and misheadered
fragments, and the IPv4 key is missing a field RFC 791 requires. There
was also no functional test to catch any of it.
These came out of reviewing a duplicate-fragment report on the list.
Patches 1 and 2 are interdependent: the overlap discard relies on the
duplicate handling so an exact duplicate is dropped on its own rather
than discarding the whole datagram. The rest are independent.
Patch 6 adds a functional test modeled on the Linux selftest ip_defrag.c.
It passes on this series; with any single fix reverted the matching case
fails.
Stephen Hemminger (6):
ip_frag: tolerate duplicate fragments
ip_frag: discard datagrams with overlapping fragments
ip_frag: include protocol in IPv4 reassembly key
ip_frag: drop IPv6 fragments with unexpected headers
ip_frag: reject oversized reassembled datagrams
app/test: add test for IP reassembly
app/test/meson.build | 1 +
app/test/test_reassembly.c | 644 ++++++++++++++++++++++++++++++
lib/ip_frag/ip_frag_internal.c | 36 +-
lib/ip_frag/rte_ipv4_reassembly.c | 17 +-
lib/ip_frag/rte_ipv6_reassembly.c | 22 +-
5 files changed, 714 insertions(+), 6 deletions(-)
create mode 100644 app/test/test_reassembly.c
--
2.53.0
* [PATCH 1/6] ip_frag: tolerate duplicate fragments
2026-06-16 21:05 [PATCH 0/6] ip_frag: fix reassembly defects and add test Stephen Hemminger
@ 2026-06-16 21:05 ` Stephen Hemminger
2026-06-16 21:05 ` [PATCH 2/6] ip_frag: discard datagrams with overlapping fragments Stephen Hemminger
` (4 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: Stephen Hemminger @ 2026-06-16 21:05 UTC (permalink / raw)
To: dev; +Cc: Stephen Hemminger, stable, Samyak Jain, Konstantin Ananyev
The reassembly code tracked only a running byte total and reserved slots
for the first and last fragments, with no check for a fragment
duplicating data already received. A single duplicate could destroy a
recoverable datagram:
- a duplicate first or last fragment collided with the reserved slot and
sent the whole entry down the error path, freeing every collected
fragment;
- a duplicate intermediate fragment was appended to a new slot, inflating
frag_size past total_size so reassembly never completed.
RFC 791 reassembly tolerates duplicates: a fragment covering bytes
already present carries no new information. Check for an exact duplicate
(stored fragment with the same offset and length) and drop only that
mbuf, before frag_size is updated, leaving the entry's accounting
unchanged.
Overlapping fragments with differing bounds are a separate issue
addressed in the next patch.
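For illustration only (not part of the patch), the exact-duplicate test
described above can be sketched as standalone C. The names struct frag_slot
and is_exact_duplicate() are hypothetical stand-ins chosen for this sketch;
they are not ip_frag symbols, and the real code operates on fp->frags[]
inside ip_frag_process().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one stored fragment; not the ip_frag struct. */
struct frag_slot {
	bool used;     /* true when the slot holds a fragment */
	uint16_t ofs;  /* byte offset of the fragment in the datagram */
	uint16_t len;  /* payload length in bytes */
};

/* Return true when (ofs, len) exactly matches a fragment already stored. */
static bool
is_exact_duplicate(const struct frag_slot *slots, size_t n,
		   uint16_t ofs, uint16_t len)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (slots[i].used && slots[i].ofs == ofs && slots[i].len == len)
			return true;
	}
	return false;
}
```

A caller in this sketch would drop only the incoming buffer when the
function returns true, and leave the byte accounting untouched.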
Fixes: cc8f4d020c0b ("examples/ip_reassembly: initial import")
Cc: stable@dpdk.org
Reported-by: Samyak Jain <samyak.jain@amantyatech.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
lib/ip_frag/ip_frag_internal.c | 18 +++++++++++++++++-
1 file changed, 17 insertions(+), 1 deletion(-)
diff --git a/lib/ip_frag/ip_frag_internal.c b/lib/ip_frag/ip_frag_internal.c
index 382f42d0e1..9a03ef995a 100644
--- a/lib/ip_frag/ip_frag_internal.c
+++ b/lib/ip_frag/ip_frag_internal.c
@@ -89,7 +89,23 @@ struct rte_mbuf *
ip_frag_process(struct ip_frag_pkt *fp, struct rte_ip_frag_death_row *dr,
struct rte_mbuf *mb, uint16_t ofs, uint16_t len, uint16_t more_frags)
{
- uint32_t idx;
+ uint32_t i, idx;
+
+ /*
+ * Discard an exact duplicate fragment. If a previously stored fragment
+ * already covers the same offset and length, this fragment carries no
+ * new data. Reassembly is tolerant of duplicates (RFC 791), so drop
+ * only this mbuf and keep the reassembly entry intact rather than
+ * treating it as an error. Fragments overlapping an existing one with
+ * different bounds are not handled here.
+ */
+ for (i = 0; i != fp->last_idx; i++) {
+ if (fp->frags[i].mb != NULL && fp->frags[i].ofs == ofs &&
+ fp->frags[i].len == len) {
+ IP_FRAG_MBUF2DR(dr, mb);
+ return NULL;
+ }
+ }
fp->frag_size += len;
--
2.53.0
* [PATCH 2/6] ip_frag: discard datagrams with overlapping fragments
2026-06-16 21:05 [PATCH 0/6] ip_frag: fix reassembly defects and add test Stephen Hemminger
2026-06-16 21:05 ` [PATCH 1/6] ip_frag: tolerate duplicate fragments Stephen Hemminger
@ 2026-06-16 21:05 ` Stephen Hemminger
2026-06-16 21:05 ` [PATCH 3/6] ip_frag: include protocol in IPv4 reassembly key Stephen Hemminger
` (3 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: Stephen Hemminger @ 2026-06-16 21:05 UTC (permalink / raw)
To: dev; +Cc: Stephen Hemminger, stable, Konstantin Ananyev
The existing code does not handle overlapping fragments.
RFC 8200 (IPv6) requires that on overlap all reassembly is abandoned
and all received fragments are dropped. RFC 791 (IPv4) originally called
for trimming and rewriting, but Linux discards for IPv4 as well, since
overlap has no legitimate use and is a known attack vector.
Depends on the duplicate-tolerance change so that an exact duplicate is
dropped on its own rather than discarding the whole datagram.
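As a rough standalone sketch (not the patch itself), the overlap condition
is the usual intersection test for two half-open byte ranges. The helper
name ranges_overlap() is hypothetical and exists only for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Half-open byte ranges [a_ofs, a_ofs + a_len) and [b_ofs, b_ofs + b_len)
 * intersect when each one starts before the other ends. Arguments are
 * 32 bits wide so that a 16-bit offset plus a 16-bit length cannot wrap.
 */
static bool
ranges_overlap(uint32_t a_ofs, uint32_t a_len, uint32_t b_ofs, uint32_t b_len)
{
	return a_ofs < b_ofs + b_len && b_ofs < a_ofs + a_len;
}
```

Note that an exact duplicate also satisfies this condition, which is why the
duplicate check has to run first: otherwise a harmless retransmission would
discard the whole datagram. Adjacent fragments such as (0, 64) and (64, 64)
do not overlap under this test.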
Fixes: cc8f4d020c0b ("examples/ip_reassembly: initial import")
Cc: stable@dpdk.org
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
lib/ip_frag/ip_frag_internal.c | 34 ++++++++++++++++++++++++++--------
1 file changed, 26 insertions(+), 8 deletions(-)
diff --git a/lib/ip_frag/ip_frag_internal.c b/lib/ip_frag/ip_frag_internal.c
index 9a03ef995a..2505314a29 100644
--- a/lib/ip_frag/ip_frag_internal.c
+++ b/lib/ip_frag/ip_frag_internal.c
@@ -92,16 +92,34 @@ ip_frag_process(struct ip_frag_pkt *fp, struct rte_ip_frag_death_row *dr,
uint32_t i, idx;
/*
- * Discard an exact duplicate fragment. If a previously stored fragment
- * already covers the same offset and length, this fragment carries no
- * new data. Reassembly is tolerant of duplicates (RFC 791), so drop
- * only this mbuf and keep the reassembly entry intact rather than
- * treating it as an error. Fragments overlapping an existing one with
- * different bounds are not handled here.
+ * Scan the fragments already collected for this datagram before
+ * storing the new one. The stored set is kept free of duplicates and
+ * overlaps, so a single pass is sufficient.
*/
for (i = 0; i != fp->last_idx; i++) {
- if (fp->frags[i].mb != NULL && fp->frags[i].ofs == ofs &&
- fp->frags[i].len == len) {
+ if (fp->frags[i].mb == NULL)
+ continue;
+
+ /*
+ * Exact duplicate: carries no new data. Reassembly tolerates
+ * duplicates (RFC 791), so drop only this mbuf and keep the
+ * entry.
+ */
+ if (fp->frags[i].ofs == ofs && fp->frags[i].len == len) {
+ IP_FRAG_MBUF2DR(dr, mb);
+ return NULL;
+ }
+
+ /*
+ * Overlap with an existing fragment. Per RFC 8200 section 4.5
+ * (and RFC 5722) the datagram must be discarded; the same is
+ * applied to IPv4. Free all collected fragments, drop this one,
+ * and invalidate the entry.
+ */
+ if (ofs < fp->frags[i].ofs + fp->frags[i].len &&
+ fp->frags[i].ofs < ofs + len) {
+ ip_frag_free(fp, dr);
+ ip_frag_key_invalidate(&fp->key);
IP_FRAG_MBUF2DR(dr, mb);
return NULL;
}
--
2.53.0
* [PATCH 3/6] ip_frag: include protocol in IPv4 reassembly key
2026-06-16 21:05 [PATCH 0/6] ip_frag: fix reassembly defects and add test Stephen Hemminger
2026-06-16 21:05 ` [PATCH 1/6] ip_frag: tolerate duplicate fragments Stephen Hemminger
2026-06-16 21:05 ` [PATCH 2/6] ip_frag: discard datagrams with overlapping fragments Stephen Hemminger
@ 2026-06-16 21:05 ` Stephen Hemminger
2026-06-16 21:05 ` [PATCH 4/6] ip_frag: drop IPv6 fragments with unexpected headers Stephen Hemminger
` (2 subsequent siblings)
5 siblings, 0 replies; 7+ messages in thread
From: Stephen Hemminger @ 2026-06-16 21:05 UTC (permalink / raw)
To: dev; +Cc: Stephen Hemminger, stable, Konstantin Ananyev
The DPDK IPv4 reassembly code did not follow RFC 791 section 3.2,
which says:
The internet identification field (ID) is used together with the
source and destination address, and the protocol fields, to identify
datagram fragments for reassembly.
Omitting the protocol means two datagrams between the
same pair of hosts that share an IP id but carry different protocols
(for example UDP and ICMP) are merged into a single reassembly context,
producing a corrupted datagram.
Fold the protocol into the unused upper bits of the 32-bit id field
of the key. The IPv4 identification is 16 bits and occupies the low
half, so the protocol can be carried in the upper bits without changing
the key layout, the key comparison or the hash.
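A minimal sketch of the packing, for illustration. The helper name
pack_v4_key_id() is hypothetical; the patch does this inline when it
assigns key.id, and it uses the packet id exactly as it appears in the
header.

```c
#include <stdint.h>

/*
 * Place the 8-bit protocol in bits 16..23 and the 16-bit identification
 * in bits 0..15 of a 32-bit key field. Bits 24..31 stay zero.
 */
static uint32_t
pack_v4_key_id(uint8_t proto, uint16_t packet_id)
{
	return ((uint32_t)proto << 16) | packet_id;
}
```

With this layout, UDP (17) and ICMP (1) datagrams that share id 0x4242
produce 0x00114242 and 0x00014242, so they no longer land in the same
reassembly context.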
Fixes: cc8f4d020c0b ("examples/ip_reassembly: initial import")
Cc: stable@dpdk.org
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
lib/ip_frag/rte_ipv4_reassembly.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/lib/ip_frag/rte_ipv4_reassembly.c b/lib/ip_frag/rte_ipv4_reassembly.c
index 3c8ae113ba..980f7a3b77 100644
--- a/lib/ip_frag/rte_ipv4_reassembly.c
+++ b/lib/ip_frag/rte_ipv4_reassembly.c
@@ -111,9 +111,15 @@ rte_ipv4_frag_reassemble_packet(struct rte_ip_frag_tbl *tbl,
ip_ofs = (uint16_t)(flag_offset & RTE_IPV4_HDR_OFFSET_MASK);
ip_flag = (uint16_t)(flag_offset & RTE_IPV4_HDR_MF_FLAG);
+ /*
+ * RFC 791 requires using: source, destination, identifier field and protocol
+ */
+
/* use first 8 bytes only */
memcpy(&key.src_dst[0], &ip_hdr->src_addr, 8);
- key.id = ip_hdr->packet_id;
+
+ /* packet_id is 16 bits and proto id is 8 bits */
+ key.id = ((uint32_t) ip_hdr->next_proto_id << 16) | ip_hdr->packet_id;
key.key_len = IPV4_KEYLEN;
ip_ofs *= RTE_IPV4_HDR_OFFSET_UNITS;
--
2.53.0
* [PATCH 4/6] ip_frag: drop IPv6 fragments with unexpected headers
2026-06-16 21:05 [PATCH 0/6] ip_frag: fix reassembly defects and add test Stephen Hemminger
` (2 preceding siblings ...)
2026-06-16 21:05 ` [PATCH 3/6] ip_frag: include protocol in IPv4 reassembly key Stephen Hemminger
@ 2026-06-16 21:05 ` Stephen Hemminger
2026-06-16 21:05 ` [PATCH 5/6] ip_frag: reject oversized reassembled datagrams Stephen Hemminger
2026-06-16 21:05 ` [PATCH 6/6] app/test: add test for IP reassembly Stephen Hemminger
5 siblings, 0 replies; 7+ messages in thread
From: Stephen Hemminger @ 2026-06-16 21:05 UTC (permalink / raw)
To: dev
Cc: Stephen Hemminger, stable, Konstantin Ananyev, Anatoly Burakov,
Thomas Monjalon
The DPDK IPv6 reassembly code only handles a fragment header placed
directly after the IPv6 header. With other extension headers in the
unfragmentable part, ipv6_frag_reassemble() patches the wrong
next-header field, miscomputes the payload length, and shifts the
wrong bytes, corrupting the result.
Drop the fragment when l3_len covers more than the IPv6 and fragment
headers. RFC 8200 allows a receiver to discard packets whose extension
headers are not in the recommended order, and RFC 9099 recommends
dropping non-conforming fragmented IPv6 packets, so dropping here is
permitted rather than a deviation.
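The check amounts to comparing l3_len against the one supported layout. A
small standalone sketch follows; the macro and function names are
hypothetical, and the sizes are the fixed 40-byte IPv6 header and the
8-byte fragment extension header.

```c
#include <stdbool.h>
#include <stdint.h>

#define V6_HDR_LEN      40u /* fixed IPv6 header size in bytes */
#define V6_FRAG_HDR_LEN  8u /* IPv6 fragment extension header size */

/*
 * Return true only when the L3 headers are exactly the IPv6 header
 * followed directly by the fragment header.
 */
static bool
v6_frag_layout_supported(uint16_t l3_len)
{
	return l3_len == V6_HDR_LEN + V6_FRAG_HDR_LEN;
}
```

In this sketch a fragment carrying one extra 8-byte extension header has
l3_len 56 and is rejected, while the plain layout with l3_len 48 is
accepted.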
Fixes: 4f1a8f633862 ("ip_frag: add IPv6 reassembly")
Cc: stable@dpdk.org
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
lib/ip_frag/rte_ipv6_reassembly.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/lib/ip_frag/rte_ipv6_reassembly.c b/lib/ip_frag/rte_ipv6_reassembly.c
index 0e809a01e5..7c1659002b 100644
--- a/lib/ip_frag/rte_ipv6_reassembly.c
+++ b/lib/ip_frag/rte_ipv6_reassembly.c
@@ -180,6 +180,19 @@ rte_ipv6_frag_reassemble_packet(struct rte_ip_frag_tbl *tbl,
return NULL;
}
+ /*
+ * Only a fragment header directly following the IPv6 header is
+ * supported. Other extension headers in the unfragmentable part are
+ * not handled: ipv6_frag_reassemble() assumes l3_len covers exactly
+ * the IPv6 and fragment headers when it patches the next-header field
+ * and removes the fragment header. Drop the fragment rather than
+ * produce a corrupt datagram.
+ */
+ if (mb->l3_len != sizeof(struct rte_ipv6_hdr) + sizeof(*frag_hdr)) {
+ IP_FRAG_MBUF2DR(dr, mb);
+ return NULL;
+ }
+
if (unlikely(trim > 0))
rte_pktmbuf_trim(mb, trim);
--
2.53.0
* [PATCH 5/6] ip_frag: reject oversized reassembled datagrams
2026-06-16 21:05 [PATCH 0/6] ip_frag: fix reassembly defects and add test Stephen Hemminger
` (3 preceding siblings ...)
2026-06-16 21:05 ` [PATCH 4/6] ip_frag: drop IPv6 fragments with unexpected headers Stephen Hemminger
@ 2026-06-16 21:05 ` Stephen Hemminger
2026-06-16 21:05 ` [PATCH 6/6] app/test: add test for IP reassembly Stephen Hemminger
5 siblings, 0 replies; 7+ messages in thread
From: Stephen Hemminger @ 2026-06-16 21:05 UTC (permalink / raw)
To: dev; +Cc: Stephen Hemminger, stable, Konstantin Ananyev
The reassembled total length of a datagram must not exceed 65535 bytes.
A fragment with a high offset could drive the sum of offset and length
past that, causing silent truncation because the IPv4 total_length and
IPv6 payload_len fields are 16 bits.
A valid datagram never exceeds 65535 bytes, so reject any fragment
whose resulting length would exceed that.
Fold the test into the existing zero-length check.
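To make the arithmetic concrete, here is a standalone sketch of the two
bounds. The helper names are hypothetical; the arguments are 32 bits wide
so the sums of 16-bit quantities cannot wrap before the comparison.

```c
#include <stdbool.h>
#include <stdint.h>

/* IPv4: header plus payload up to the end of this fragment must fit. */
static bool
v4_frag_len_ok(uint32_t ofs, uint32_t len, uint32_t l3_len)
{
	return len > 0 && ofs + len + l3_len <= UINT16_MAX;
}

/* IPv6: payload_len excludes the fixed header, so only the payload counts. */
static bool
v6_frag_len_ok(uint32_t ofs, uint32_t len)
{
	return len > 0 && ofs + len <= UINT16_MAX;
}
```

For example, the largest IPv4 fragment offset is 8191 * 8 = 65528 bytes.
An 8-byte fragment there with a 20-byte header ends at 65556, which exceeds
65535 and is rejected, while an ordinary first fragment of 1480 bytes is
accepted.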
Fixes: cc8f4d020c0b ("examples/ip_reassembly: initial import")
Cc: stable@dpdk.org
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
lib/ip_frag/rte_ipv4_reassembly.c | 9 +++++++--
lib/ip_frag/rte_ipv6_reassembly.c | 9 +++++++--
2 files changed, 14 insertions(+), 4 deletions(-)
diff --git a/lib/ip_frag/rte_ipv4_reassembly.c b/lib/ip_frag/rte_ipv4_reassembly.c
index 980f7a3b77..727fc58243 100644
--- a/lib/ip_frag/rte_ipv4_reassembly.c
+++ b/lib/ip_frag/rte_ipv4_reassembly.c
@@ -136,8 +136,13 @@ rte_ipv4_frag_reassemble_packet(struct rte_ip_frag_tbl *tbl,
tbl, tbl->max_cycles, tbl->entry_mask, tbl->max_entries,
tbl->use_entries);
- /* check that fragment length is greater then zero. */
- if (ip_len <= 0) {
+ /*
+ * Drop fragments with no payload, and any fragment whose end would
+ * make the reassembled datagram exceed the maximum IPv4 size. The
+ * total_length field is 16 bits, so otherwise it is silently
+ * truncated while the mbuf still holds the full length.
+ */
+ if (ip_len <= 0 || ip_ofs + ip_len + mb->l3_len > UINT16_MAX) {
IP_FRAG_MBUF2DR(dr, mb);
return NULL;
}
diff --git a/lib/ip_frag/rte_ipv6_reassembly.c b/lib/ip_frag/rte_ipv6_reassembly.c
index 7c1659002b..0b44275b37 100644
--- a/lib/ip_frag/rte_ipv6_reassembly.c
+++ b/lib/ip_frag/rte_ipv6_reassembly.c
@@ -174,8 +174,13 @@ rte_ipv6_frag_reassemble_packet(struct rte_ip_frag_tbl *tbl,
tbl, tbl->max_cycles, tbl->entry_mask, tbl->max_entries,
tbl->use_entries);
- /* check that fragment length is greater then zero. */
- if (ip_len <= 0) {
+ /*
+ * Drop fragments with no payload, and any fragment whose end would
+ * make the reassembled payload exceed 65535 bytes. The payload_len
+ * field is 16 bits, so otherwise it is silently truncated while the
+ * mbuf still holds the full length.
+ */
+ if (ip_len <= 0 || ip_ofs + ip_len > UINT16_MAX) {
IP_FRAG_MBUF2DR(dr, mb);
return NULL;
}
--
2.53.0
* [PATCH 6/6] app/test: add test for IP reassembly
2026-06-16 21:05 [PATCH 0/6] ip_frag: fix reassembly defects and add test Stephen Hemminger
` (4 preceding siblings ...)
2026-06-16 21:05 ` [PATCH 5/6] ip_frag: reject oversized reassembled datagrams Stephen Hemminger
@ 2026-06-16 21:05 ` Stephen Hemminger
5 siblings, 0 replies; 7+ messages in thread
From: Stephen Hemminger @ 2026-06-16 21:05 UTC (permalink / raw)
To: dev; +Cc: Stephen Hemminger
There was no functional test for IPv4/IPv6 reassembly,
only a performance test. Add a new test modeled on the Linux kernel
selftest tools/testing/selftests/net/ip_defrag.c. This is new
code so no license conflict.
The test covers:
- a size and fragment-size sweep across in-order, reverse, odd/even and
block delivery orders, with byte-exact payload validation;
- minimum 8-byte fragments;
- the fragment-count limit;
- timeout of incomplete datagrams;
- the duplicate, overlap, extension-header and oversized-fragment cases
fixed earlier in this series.
The reassembled packet is checked with rte_mbuf_check() to catch
malformed segment chains in addition to wrong content.
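As an illustration of one of the delivery orders, the odd/even order can be
sketched as a standalone helper. The function name odd_even_order() is
chosen for this sketch; the test in the patch builds the same order inside
its make_order() helper.

```c
/*
 * Fill idx[0..n-1] with a delivery order that sends the odd-numbered
 * fragments first (1, 3, ...) and then the even-numbered ones (0, 2, ...).
 * The caller must provide room for n entries.
 */
static void
odd_even_order(int n, int *idx)
{
	int i, k = 0;

	for (i = 1; i < n; i += 2)
		idx[k++] = i;
	for (i = 0; i < n; i += 2)
		idx[k++] = i;
}
```

For five fragments this yields the order 1, 3, 0, 2, 4, so the first
fragment arrives in the middle of the sequence and the last fragment
arrives at the end.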
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
app/test/meson.build | 1 +
app/test/test_reassembly.c | 644 +++++++++++++++++++++++++++++++++++++
2 files changed, 645 insertions(+)
create mode 100644 app/test/test_reassembly.c
diff --git a/app/test/meson.build b/app/test/meson.build
index 61024125a7..b8c2208d0b 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -160,6 +160,7 @@ source_file_deps = {
'test_rawdev.c': ['rawdev', 'bus_vdev', 'raw_skeleton'],
'test_rcu_qsbr.c': ['rcu', 'hash'],
'test_rcu_qsbr_perf.c': ['rcu', 'hash'],
+ 'test_reassembly.c': ['net', 'ip_frag'],
'test_reassembly_perf.c': ['net', 'ip_frag'],
'test_reciprocal_division.c': [],
'test_reciprocal_division_perf.c': [],
diff --git a/app/test/test_reassembly.c b/app/test/test_reassembly.c
new file mode 100644
index 0000000000..9cada5f3b4
--- /dev/null
+++ b/app/test/test_reassembly.c
@@ -0,0 +1,644 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2026
+ *
+ * Functional unit tests for the IP reassembly path of librte_ip_frag.
+ *
+ * Coverage mirrors the Linux selftest tools/testing/selftests/net/ip_defrag.c
+ * adapted to the library API and to DPDK-specific constraints:
+ *
+ * - size / fragment-size sweep, bounded by RTE_LIBRTE_IP_FRAG_MAX_FRAG
+ * - in-order, reverse, odd-then-even, and block-reordered delivery
+ * - byte-exact validation of the reassembled payload (not just length)
+ * - minimum (8-byte) fragments
+ * - fragment-count boundary: exactly MAX reassembles, MAX + 1 fails
+ * - incomplete datagram reaped on timeout
+ * - zero-length fragment rejected
+ * - duplicate fragment tolerated in a reordered set
+ * - overlapping fragments (leading/trailing/contained) discarded
+ * - IPv6 fragment with extension headers in the unfragmentable part dropped
+ * - fragment whose end exceeds the maximum datagram size dropped
+ *
+ * The last four groups depend on the corresponding reassembly fixes
+ * (duplicate tolerance, overlap discard, extension-header drop, oversize
+ * drop); they pass once those are applied and fail on unpatched code. The
+ * remaining cases pass regardless.
+ *
+ * Fragments use l2_len == 0; the library reads the L3 header at offset 0.
+ */
+
+#include "test.h"
+
+#include <string.h>
+
+#include <rte_common.h>
+#include <rte_cycles.h>
+#include <rte_ip.h>
+#include <rte_ip_frag.h>
+#include <rte_log.h>
+#include <rte_mbuf.h>
+#include <rte_mempool.h>
+
+#define NB_MBUF 1024
+#define MBUF_CACHE 0 /* exact accounting for leak checks */
+#define MBUF_DATA 2048
+#define V4_L3_LEN ((uint16_t)sizeof(struct rte_ipv4_hdr))
+#define V6_L3_LEN ((uint16_t)(sizeof(struct rte_ipv6_hdr) + \
+ RTE_IPV6_FRAG_HDR_SIZE))
+#define TEST_ID 0x4242
+
+#ifndef RTE_LIBRTE_IP_FRAG_MAX_FRAG
+#define RTE_LIBRTE_IP_FRAG_MAX_FRAG 8
+#endif
+#define MAX_FRAG RTE_LIBRTE_IP_FRAG_MAX_FRAG
+
+#define MAX_PAYLOAD (MAX_FRAG * 256) /* keeps a fragment in one mbuf */
+
+enum family { V4, V6 };
+enum order { IN_ORDER, REVERSE, ODD_EVEN, BLOCK };
+
+struct frag_desc {
+ uint16_t ofs; /* byte offset into the payload */
+ uint16_t plen; /* payload bytes after L3 */
+ uint8_t mf;
+};
+
+static struct rte_mempool *pkt_pool;
+
+/* position-dependent payload pattern. Note it repeats every 256 bytes, so
+ * swapping two chunks exactly 256 bytes apart would go undetected.
+ */
+static inline uint8_t
+pat(uint32_t k)
+{
+ return (uint8_t)(k * 31u + 7u);
+}
+
+/* ------------------------------- harness -------------------------------- */
+
+static int
+testsuite_setup(void)
+{
+ /* the table create/destroy per case is chatty at INFO level */
+ rte_log_set_level_pattern("lib.ip_frag", RTE_LOG_NOTICE);
+
+ pkt_pool = rte_pktmbuf_pool_create("REASM_POOL", NB_MBUF, MBUF_CACHE,
+ 0, MBUF_DATA, SOCKET_ID_ANY);
+ return pkt_pool == NULL ? TEST_FAILED : TEST_SUCCESS;
+}
+
+static void
+testsuite_teardown(void)
+{
+ rte_mempool_free(pkt_pool);
+ pkt_pool = NULL;
+}
+
+/* Every case must start and end with a full pool, so a leak in one case is
+ * pinpointed here rather than silently masking the next one.
+ */
+static int
+ut_setup(void)
+{
+ if (rte_mempool_avail_count(pkt_pool) != NB_MBUF) {
+ printf("pool not full at case start: %u/%u\n",
+ rte_mempool_avail_count(pkt_pool), NB_MBUF);
+ return TEST_FAILED;
+ }
+ return TEST_SUCCESS;
+}
+
+static struct rte_ip_frag_tbl *
+tbl_new(uint64_t max_cycles)
+{
+ return rte_ip_frag_table_create(16, MAX_FRAG, 16, max_cycles,
+ rte_socket_id());
+}
+
+/* Build one fragment with a position-dependent payload. */
+static struct rte_mbuf *
+build_frag(enum family fam, uint16_t ofs, uint16_t plen, uint8_t mf)
+{
+ struct rte_mbuf *m = rte_pktmbuf_alloc(pkt_pool);
+ uint16_t l3 = (fam == V4) ? V4_L3_LEN : V6_L3_LEN;
+ char *p;
+ uint16_t i;
+
+ if (m == NULL)
+ return NULL;
+ m->data_off = 0;
+
+ if (fam == V4) {
+ struct rte_ipv4_hdr *ip = rte_pktmbuf_mtod(m,
+ struct rte_ipv4_hdr *);
+ uint16_t fo = ofs / RTE_IPV4_HDR_OFFSET_UNITS;
+
+ memset(ip, 0, V4_L3_LEN);
+ if (mf)
+ fo |= RTE_IPV4_HDR_MF_FLAG;
+ ip->version_ihl = 0x45;
+ ip->total_length = rte_cpu_to_be_16(V4_L3_LEN + plen);
+ ip->packet_id = rte_cpu_to_be_16(TEST_ID);
+ ip->fragment_offset = rte_cpu_to_be_16(fo);
+ ip->time_to_live = 64;
+ ip->next_proto_id = IPPROTO_UDP;
+ ip->src_addr = rte_cpu_to_be_32(0x0a000001);
+ ip->dst_addr = rte_cpu_to_be_32(0x0a000002);
+ } else {
+ struct rte_ipv6_hdr *ip = rte_pktmbuf_mtod(m,
+ struct rte_ipv6_hdr *);
+ struct rte_ipv6_fragment_ext *fh =
+ rte_pktmbuf_mtod_offset(m,
+ struct rte_ipv6_fragment_ext *,
+ sizeof(struct rte_ipv6_hdr));
+
+ memset(ip, 0, V6_L3_LEN);
+ ip->vtc_flow = rte_cpu_to_be_32(6u << 28);
+ ip->payload_len = rte_cpu_to_be_16(RTE_IPV6_FRAG_HDR_SIZE + plen);
+ ip->proto = IPPROTO_FRAGMENT;
+ ip->hop_limits = 64;
+ ip->src_addr.a[15] = 1;
+ ip->dst_addr.a[15] = 2;
+ fh->next_header = IPPROTO_UDP;
+ fh->reserved = 0;
+ fh->frag_data = rte_cpu_to_be_16(
+ RTE_IPV6_SET_FRAG_DATA(ofs, mf ? 1 : 0));
+ fh->id = rte_cpu_to_be_32(TEST_ID);
+ }
+
+ p = rte_pktmbuf_mtod_offset(m, char *, l3);
+ for (i = 0; i < plen; i++)
+ p[i] = (char)pat(ofs + i);
+
+ m->data_len = m->pkt_len = l3 + plen;
+ m->l2_len = 0;
+ m->l3_len = l3;
+ return m;
+}
+
+static struct rte_mbuf *
+feed(enum family fam, struct rte_ip_frag_tbl *tbl,
+ struct rte_ip_frag_death_row *dr, const struct frag_desc *d, uint64_t tms)
+{
+ struct rte_mbuf *m = build_frag(fam, d->ofs, d->plen, d->mf);
+
+ if (m == NULL)
+ return NULL;
+ if (fam == V4) {
+ struct rte_ipv4_hdr *ip = rte_pktmbuf_mtod(m, struct rte_ipv4_hdr *);
+ return rte_ipv4_frag_reassemble_packet(tbl, dr, m, tms, ip);
+ } else {
+ struct rte_ipv6_hdr *ip = rte_pktmbuf_mtod(m, struct rte_ipv6_hdr *);
+ struct rte_ipv6_fragment_ext *fh =
+ rte_pktmbuf_mtod_offset(m, struct rte_ipv6_fragment_ext *,
+ sizeof(struct rte_ipv6_hdr));
+ return rte_ipv6_frag_reassemble_packet(tbl, dr, m, tms, ip, fh);
+ }
+}
+
+/* Split a datagram of total_plen into fragments of frag_size (multiple of 8).
+ * Returns the fragment count, or -1 if it would exceed MAX_FRAG.
+ */
+static int
+make_datagram(uint16_t total_plen, uint16_t frag_size, struct frag_desc *out)
+{
+ int n = 0;
+ uint16_t ofs = 0;
+
+ while (ofs < total_plen) {
+ uint16_t rem = total_plen - ofs;
+ uint16_t len = rem <= frag_size ? rem : frag_size;
+
+ if (n >= MAX_FRAG)
+ return -1;
+ out[n].ofs = ofs;
+ out[n].plen = len;
+ out[n].mf = (ofs + len < total_plen);
+ ofs += len;
+ n++;
+ }
+ return n;
+}
+
+/* Produce a delivery order (array of indices into descs). */
+static void
+make_order(enum order ord, int n, int *idx)
+{
+ int i, k = 0;
+
+ switch (ord) {
+ case IN_ORDER:
+ for (i = 0; i < n; i++)
+ idx[i] = i;
+ break;
+ case REVERSE:
+ for (i = 0; i < n; i++)
+ idx[i] = n - 1 - i;
+ break;
+ case ODD_EVEN:
+ for (i = 1; i < n; i += 2)
+ idx[k++] = i;
+ for (i = 0; i < n; i += 2)
+ idx[k++] = i;
+ break;
+ case BLOCK: {
+ int t = n / 3 ? n / 3 : 1;
+
+ for (i = 2 * t; i < n; i++)
+ idx[k++] = i;
+ for (i = t; i < 2 * t && i < n; i++)
+ idx[k++] = i;
+ for (i = 0; i < t && i < n; i++)
+ idx[k++] = i;
+ break;
+ }
+ }
+}
+
+/* Feed descs in the given order; return reassembled mbuf or NULL. */
+static struct rte_mbuf *
+run_ordered(enum family fam, const struct frag_desc *descs, int n,
+ const int *idx)
+{
+ struct rte_ip_frag_death_row dr;
+ struct rte_ip_frag_tbl *tbl;
+ struct rte_mbuf *out = NULL;
+ uint64_t tms = rte_rdtsc();
+ int i;
+
+ memset(&dr, 0, sizeof(dr));
+ tbl = tbl_new(rte_get_tsc_hz());
+ if (tbl == NULL)
+ return NULL;
+ for (i = 0; i < n; i++) {
+ struct rte_mbuf *r = feed(fam, tbl, &dr, &descs[idx[i]], tms);
+
+ if (r != NULL)
+ out = r;
+ }
+ rte_ip_frag_free_death_row(&dr, 0);
+ rte_ip_frag_table_destroy(tbl);
+ return out;
+}
+
+/* Validate length and byte-exact payload, then free. Returns 0 on success.
+ * Note: reassembly strips the IPv6 fragment header, so the reassembled v6
+ * header is sizeof(rte_ipv6_hdr), not the V6_L3_LEN the fragments were built
+ * with. v4 has no fragment header to remove.
+ */
+static int
+validate(struct rte_mbuf *m, enum family fam, uint16_t total_plen)
+{
+ uint16_t l3 = (fam == V4) ? V4_L3_LEN :
+ (uint16_t)sizeof(struct rte_ipv6_hdr);
+ uint8_t buf[MAX_PAYLOAD];
+ const uint8_t *p;
+ const char *reason;
+ uint16_t k;
+ int rc = 0;
+
+ if (m == NULL)
+ return -1;
+ if (rte_mbuf_check(m, 1, &reason) != 0) {
+ printf(" bad mbuf fam=%d total=%u: %s\n", fam, total_plen,
+ reason);
+ rte_pktmbuf_free(m);
+ return -1;
+ }
+ if (m->pkt_len != (uint32_t)(l3 + total_plen)) {
+ rte_pktmbuf_free(m);
+ return -1;
+ }
+ p = rte_pktmbuf_read(m, l3, total_plen, buf);
+ if (p == NULL) {
+ rte_pktmbuf_free(m);
+ return -1;
+ }
+ for (k = 0; k < total_plen; k++) {
+ if (p[k] != pat(k)) {
+ rc = -1;
+ break;
+ }
+ }
+ rte_pktmbuf_free(m);
+ return rc;
+}
+
+/* --------------------------- baseline / sweep --------------------------- */
+
+static int
+sweep_one(enum family fam, uint16_t total_plen, uint16_t frag_size)
+{
+ struct frag_desc descs[MAX_FRAG];
+ int idx[MAX_FRAG];
+ const enum order orders[] = { IN_ORDER, REVERSE, ODD_EVEN, BLOCK };
+ int n = make_datagram(total_plen, frag_size, descs);
+ unsigned int o;
+
+ if (n < 2) /* skip single-fragment / oversized for sweep */
+ return 0;
+
+ for (o = 0; o < RTE_DIM(orders); o++) {
+ make_order(orders[o], n, idx);
+ if (validate(run_ordered(fam, descs, n, idx), fam,
+ total_plen) != 0) {
+ printf(" sweep fail: fam=%d total=%u fs=%u order=%u n=%d\n",
+ fam, total_plen, frag_size, orders[o], n);
+ return -1;
+ }
+ }
+ return 0;
+}
+
+static int
+sweep(enum family fam)
+{
+ const uint16_t fsizes[] = { 8, 16, 64, 256 };
+ unsigned int f;
+
+ for (f = 0; f < RTE_DIM(fsizes); f++) {
+ uint16_t fs = fsizes[f];
+ uint16_t total;
+
+ /* cover 2..MAX_FRAG fragments, last fragment partial */
+ for (total = fs + 8; total <= fs * MAX_FRAG; total += fs) {
+ if (sweep_one(fam, total, fs) != 0)
+ return TEST_FAILED;
+ if (total > fs + 4 &&
+ sweep_one(fam, total - 4, fs) != 0)
+ return TEST_FAILED;
+ }
+ }
+ return TEST_SUCCESS;
+}
+
+static int test_sweep_v4(void) { return sweep(V4); }
+static int test_sweep_v6(void) { return sweep(V6); }
+
+/* Minimum 8-byte fragments. */
+static int
+test_min_fragment(void)
+{
+ struct frag_desc d[3] = {
+ { 0, 8, 1 }, { 8, 8, 1 }, { 16, 8, 0 },
+ };
+ int idx[3];
+
+ make_order(REVERSE, 3, idx);
+ TEST_ASSERT_SUCCESS(validate(run_ordered(V4, d, 3, idx), V4, 24),
+ "min 8-byte fragments not reassembled");
+ make_order(ODD_EVEN, 3, idx);
+ TEST_ASSERT_SUCCESS(validate(run_ordered(V6, d, 3, idx), V6, 24),
+ "min 8-byte fragments not reassembled (v6)");
+ return TEST_SUCCESS;
+}
+
+/* Exactly MAX_FRAG fragments reassembles; MAX_FRAG + 1 fails. */
+static int
+test_cap_boundary(void)
+{
+ struct frag_desc d[MAX_FRAG + 1];
+ int idx[MAX_FRAG + 1];
+ uint16_t fs = 8, total = fs * MAX_FRAG;
+ int n, i;
+
+ n = make_datagram(total, fs, d);
+ TEST_ASSERT_EQUAL(n, MAX_FRAG, "expected MAX_FRAG fragments");
+ make_order(IN_ORDER, n, idx);
+ TEST_ASSERT_SUCCESS(validate(run_ordered(V4, d, n, idx), V4, total),
+ "MAX_FRAG fragments should reassemble");
+
+ /* one more fragment than the table can hold */
+ for (i = 0; i <= MAX_FRAG; i++) {
+ d[i].ofs = i * fs;
+ d[i].plen = fs;
+ d[i].mf = (i < MAX_FRAG);
+ idx[i] = i;
+ }
+ TEST_ASSERT_NULL(run_ordered(V4, d, MAX_FRAG + 1, idx),
+ "MAX_FRAG + 1 fragments should not reassemble");
+ TEST_ASSERT_EQUAL(rte_mempool_avail_count(pkt_pool), NB_MBUF,
+ "overflowing set leaked mbufs");
+ return TEST_SUCCESS;
+}
+
+/* Incomplete datagram: no output, reaped on timeout. */
+static int
+test_incomplete_timeout(void)
+{
+ struct rte_ip_frag_death_row dr;
+ struct rte_ip_frag_tbl *tbl;
+ uint64_t mc = rte_get_tsc_hz(), tms = rte_rdtsc();
+ struct frag_desc d[2] = { { 0, 64, 1 }, { 128, 64, 0 } }; /* gap */
+ struct rte_mbuf *out = NULL;
+ int i;
+
+ memset(&dr, 0, sizeof(dr));
+ tbl = tbl_new(mc);
+ TEST_ASSERT_NOT_NULL(tbl, "table create failed");
+ for (i = 0; i < 2; i++) {
+ struct rte_mbuf *r = feed(V4, tbl, &dr, &d[i], tms);
+
+ if (r != NULL)
+ out = r;
+ }
+ TEST_ASSERT_NULL(out, "incomplete datagram reassembled");
+ rte_ip_frag_table_del_expired_entries(tbl, &dr, tms + mc + 1);
+ rte_ip_frag_free_death_row(&dr, 0);
+ rte_ip_frag_table_destroy(tbl);
+ TEST_ASSERT_EQUAL(rte_mempool_avail_count(pkt_pool), NB_MBUF,
+ "expired fragments not freed");
+ return TEST_SUCCESS;
+}
+
+static int
+test_zero_len(void)
+{
+ struct frag_desc d = { 0, 0, 1 };
+ int idx = 0;
+
+ TEST_ASSERT_NULL(run_ordered(V4, &d, 1, &idx),
+ "zero-length fragment accepted");
+ TEST_ASSERT_EQUAL(rte_mempool_avail_count(pkt_pool), NB_MBUF,
+ "zero-length fragment leaked");
+ return TEST_SUCCESS;
+}
+
+/* --------------------- duplicate / overlap / reject --------------------- */
+
+/* A duplicate anywhere in a reordered set must not break reassembly. */
+static int
+test_dup_tolerated(void)
+{
+ /* offsets 0,64,128,192 with 64B frags; inject a dup of frag 1 */
+ struct frag_desc d[5] = {
+ { 0, 64, 1 }, { 64, 64, 1 }, { 64, 64, 1 }, /* dup */
+ { 128, 64, 1 }, { 192, 64, 0 },
+ };
+ int idx[5] = { 1, 4, 2, 0, 3 }; /* reordered, dup interleaved */
+
+ TEST_ASSERT_SUCCESS(validate(run_ordered(V4, d, 5, idx), V4, 256),
+ "duplicate fragment broke reassembly");
+ return TEST_SUCCESS;
+}
+
+/* Overlap geometries; the datagram must be discarded and every collected
+ * fragment freed. The last fragment is withheld so that on unfixed code the
+ * entry is *retained* (total_size stays UINT32_MAX) rather than torn down by
+ * the frag_size > total_size path: that retention is what we detect. We
+ * capture the mbufs still held in the table after draining the death row,
+ * before destroying the table (destroy frees held mbufs, hiding the leak).
+ */
+static int
+overlap_case(enum family fam, const struct frag_desc *d, int n, const char *what)
+{
+ struct rte_ip_frag_death_row dr;
+ struct rte_ip_frag_tbl *tbl;
+ struct rte_mbuf *out = NULL;
+ uint64_t tms = rte_rdtsc();
+ unsigned int held;
+ int i;
+
+ memset(&dr, 0, sizeof(dr));
+ tbl = tbl_new(rte_get_tsc_hz());
+ if (tbl == NULL)
+ return -1;
+ for (i = 0; i < n; i++) {
+ struct rte_mbuf *r = feed(fam, tbl, &dr, &d[i], tms);
+
+ if (r != NULL)
+ out = r;
+ }
+ rte_ip_frag_free_death_row(&dr, 0);
+ held = NB_MBUF - rte_mempool_avail_count(pkt_pool);
+ rte_ip_frag_table_destroy(tbl);
+
+ if (out != NULL) {
+ rte_pktmbuf_free(out);
+ printf(" overlap reassembled instead of discarded: %s\n", what);
+ return -1;
+ }
+ if (held != 0) {
+ printf(" overlap kept %u fragment(s) instead of discarding: %s\n",
+ held, what);
+ return -1;
+ }
+ return 0;
+}
+
+static int
+test_overlap(void)
+{
+ /* last fragment withheld in every case (all MF=1) */
+
+ /* overlapping fragment arrives second */
+ const struct frag_desc tail[2] = { { 0, 600, 1 }, { 300, 600, 1 } };
+ /* overlapping fragment arrives first */
+ const struct frag_desc head[2] = { { 300, 600, 1 }, { 0, 600, 1 } };
+ /* a fragment fully contained in an existing one */
+ const struct frag_desc cont[2] = { { 0, 600, 1 }, { 200, 200, 1 } };
+
+ TEST_ASSERT_SUCCESS(overlap_case(V6, tail, 2, "v6 overlap second"), "");
+ TEST_ASSERT_SUCCESS(overlap_case(V6, head, 2, "v6 overlap first"), "");
+ TEST_ASSERT_SUCCESS(overlap_case(V6, cont, 2, "v6 contained"), "");
+ TEST_ASSERT_SUCCESS(overlap_case(V4, tail, 2, "v4 overlap second"), "");
+ return TEST_SUCCESS;
+}
+
+/* IPv6 fragment with an unfragmentable extension header is dropped, not
+ * stored. Capture whether the fragment is still held in the table after the
+ * death row is drained but before the table is destroyed.
+ */
+static int
+test_v6_ext_header_drop(void)
+{
+ struct rte_ip_frag_death_row dr;
+ struct rte_ip_frag_tbl *tbl;
+ struct rte_mbuf *m, *r;
+ struct rte_ipv6_hdr *ip;
+ struct rte_ipv6_fragment_ext *fh;
+ unsigned int held;
+
+ memset(&dr, 0, sizeof(dr));
+ tbl = tbl_new(rte_get_tsc_hz());
+ TEST_ASSERT_NOT_NULL(tbl, "table create failed");
+
+ m = build_frag(V6, 0, 64, 1);
+ TEST_ASSERT_NOT_NULL(m, "alloc failed");
+ /* pretend an 8-byte extension header sits before the fragment header */
+ m->l3_len += 8;
+ ip = rte_pktmbuf_mtod(m, struct rte_ipv6_hdr *);
+ fh = rte_pktmbuf_mtod_offset(m, struct rte_ipv6_fragment_ext *,
+ sizeof(struct rte_ipv6_hdr));
+ r = rte_ipv6_frag_reassemble_packet(tbl, &dr, m, rte_rdtsc(), ip, fh);
+ rte_ip_frag_free_death_row(&dr, 0);
+ held = NB_MBUF - rte_mempool_avail_count(pkt_pool);
+ rte_ip_frag_table_destroy(tbl);
+
+ TEST_ASSERT_NULL(r, "fragment with extension header accepted");
+ TEST_ASSERT_EQUAL(held, 0,
+ "extension-header fragment stored instead of dropped");
+ return TEST_SUCCESS;
+}
+
+/* A fragment whose end exceeds the max datagram size is dropped, not stored. */
+static int
+oversize_drop_one(enum family fam)
+{
+ struct rte_ip_frag_death_row dr;
+ struct rte_ip_frag_tbl *tbl;
+ struct frag_desc d = { 0xFFF8, 64, 0 }; /* offset 65528 + 64 > 65535 */
+ struct rte_mbuf *r;
+ unsigned int held;
+
+ memset(&dr, 0, sizeof(dr));
+ tbl = tbl_new(rte_get_tsc_hz());
+ if (tbl == NULL)
+ return -1;
+ r = feed(fam, tbl, &dr, &d, rte_rdtsc());
+ rte_ip_frag_free_death_row(&dr, 0);
+ held = NB_MBUF - rte_mempool_avail_count(pkt_pool);
+ rte_ip_frag_table_destroy(tbl);
+
+ if (r != NULL) {
+ rte_pktmbuf_free(r);
+ return -1;
+ }
+ return held == 0 ? 0 : -1;
+}
+
+static int
+test_oversize_drop(void)
+{
+ TEST_ASSERT_SUCCESS(oversize_drop_one(V4),
+ "oversized v4 fragment stored instead of dropped");
+ TEST_ASSERT_SUCCESS(oversize_drop_one(V6),
+ "oversized v6 fragment stored instead of dropped");
+ return TEST_SUCCESS;
+}
+
+static struct unit_test_suite reassembly_testsuite = {
+ .suite_name = "IP Reassembly Unit Test Suite",
+ .setup = testsuite_setup,
+ .teardown = testsuite_teardown,
+ .unit_test_cases = {
+ TEST_CASE_ST(ut_setup, NULL, test_sweep_v4),
+ TEST_CASE_ST(ut_setup, NULL, test_sweep_v6),
+ TEST_CASE_ST(ut_setup, NULL, test_min_fragment),
+ TEST_CASE_ST(ut_setup, NULL, test_cap_boundary),
+ TEST_CASE_ST(ut_setup, NULL, test_incomplete_timeout),
+ TEST_CASE_ST(ut_setup, NULL, test_zero_len),
+ TEST_CASE_ST(ut_setup, NULL, test_dup_tolerated),
+ TEST_CASE_ST(ut_setup, NULL, test_overlap),
+ TEST_CASE_ST(ut_setup, NULL, test_v6_ext_header_drop),
+ TEST_CASE_ST(ut_setup, NULL, test_oversize_drop),
+ TEST_CASES_END()
+ }
+};
+
+static int
+test_reassembly(void)
+{
+ return unit_test_suite_runner(&reassembly_testsuite);
+}
+
+REGISTER_FAST_TEST(reassembly_autotest, NOHUGE_SKIP, ASAN_OK, test_reassembly);
--
2.53.0
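
For readers following the overlap and oversize cases in the test above, the
geometry being exercised can be sketched in isolation. This is only an
illustration under stated assumptions: frag_range, frag_relation(),
frag_oversized() and MAX_DGRAM_LEN are hypothetical names, not part of
lib/ip_frag, and the 65535 limit simply mirrors the comment in
oversize_drop_one() rather than the exact per-family check in the library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names throughout; none of this is lib/ip_frag API. */
#define MAX_DGRAM_LEN 65535u	/* assumed cap on offset + length */

enum frag_rel { FRAG_DISJOINT, FRAG_DUPLICATE, FRAG_OVERLAP };

struct frag_range {
	uint32_t ofs;	/* byte offset of the payload in the datagram */
	uint32_t len;	/* payload length in bytes, assumed non-zero */
};

/* Classify a newly arrived fragment against one already held. */
static enum frag_rel
frag_relation(struct frag_range held, struct frag_range fresh)
{
	assert(held.len > 0 && fresh.len > 0);

	/* same offset and length: an exact duplicate, safe to drop alone */
	if (held.ofs == fresh.ofs && held.len == fresh.len)
		return FRAG_DUPLICATE;
	/* half-open ranges [ofs, ofs + len) intersect: a true overlap */
	if (fresh.ofs < held.ofs + held.len && held.ofs < fresh.ofs + fresh.len)
		return FRAG_OVERLAP;
	return FRAG_DISJOINT;
}

/* True when the fragment would end past the assumed datagram limit. */
static bool
frag_oversized(struct frag_range f)
{
	return f.ofs + f.len > MAX_DGRAM_LEN;
}
```

With these definitions the tail, head and contained cases from test_overlap()
all classify as FRAG_OVERLAP, the repeated { 64, 64 } entry from
test_dup_tolerated() classifies as FRAG_DUPLICATE, and the { 0xFFF8, 64 }
fragment from oversize_drop_one() is reported as oversized.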
Thread overview: 7+ messages
2026-06-16 21:05 [PATCH 0/6] ip_frag: fix reassembly defects and add test Stephen Hemminger
2026-06-16 21:05 ` [PATCH 1/6] ip_frag: tolerate duplicate fragments Stephen Hemminger
2026-06-16 21:05 ` [PATCH 2/6] ip_frag: discard datagrams with overlapping fragments Stephen Hemminger
2026-06-16 21:05 ` [PATCH 3/6] ip_frag: include protocol in IPv4 reassembly key Stephen Hemminger
2026-06-16 21:05 ` [PATCH 4/6] ip_frag: drop IPv6 fragments with unexpected headers Stephen Hemminger
2026-06-16 21:05 ` [PATCH 5/6] ip_frag: reject oversized reassembled datagrams Stephen Hemminger
2026-06-16 21:05 ` [PATCH 6/6] app/test: add test for IP reassembly Stephen Hemminger