[PATCH] bpf: bpf_dbg: fix off-by-one in cmd_select and pcap_next

public inbox for linux-kernel@vger.kernel.org

* [PATCH] bpf: bpf_dbg: fix off-by-one in cmd_select and pcap_next_pkt
@ 2026-04-28 10:01 Hasan Basbunar
  2026-04-29  8:44 ` [PATCH v2] bpf: bpf_dbg: fix off-by-one in cmd_select Hasan Basbunar
  0 siblings, 1 reply; 4+ messages in thread
From: Hasan Basbunar @ 2026-04-28 10:01 UTC
  To: Daniel Borkmann
  Cc: Alexei Starovoitov, Andrii Nakryiko, bpf, linux-kernel,
	Hasan Basbunar

bpf_dbg's interactive 'select <N>' command, documented in the file
header ("select 3 (run etc will start from the 3rd packet in the pcap)")
to use 1-based packet indexing, advances the pcap cursor one packet too
many. The loop in cmd_select():

	pcap_reset_pkt();         /* cursor on packet 1 */
	for (i = 0; i < which && (have_next = pcap_next_pkt()); i++)
		/* noop */;

calls pcap_next_pkt() N times to reach packet N, but pcap_next_pkt()
validates the packet at the cursor and then advances past it. After
N calls the cursor is on packet N+1, so 'select 3' positions on
packet 4, 'select 4' on packet 5, etc. To land on packet N the loop
must advance the cursor only N-1 times.
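
The counting argument above can be sketched with a small model. This
is an illustration only, not code from bpf_dbg: struct model,
model_next_pkt() and model_select() are hypothetical names, the pcap
is reduced to a packet count, and the '>=' boundary behaviour of the
real pcap_next_pkt() is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: the pcap is only a packet count, and the cursor
 * holds the 1-based number of the packet it currently points at. */
struct model {
	unsigned int npkts;	/* packets in the file */
	unsigned int cursor;	/* 1-based packet under the cursor */
};

/* Models the contract described above: check the packet at the
 * cursor, then step past it. */
static bool model_next_pkt(struct model *m)
{
	if (m->cursor > m->npkts)
		return false;
	m->cursor++;
	return true;
}

/* Runs the select loop with the given initial value of i and returns
 * the packet number the cursor ends up on. */
static unsigned int model_select(struct model *m, unsigned int which,
				 unsigned int init)
{
	unsigned int i;

	assert(which >= 1);
	m->cursor = 1;		/* stands in for pcap_reset_pkt() */
	for (i = init; i < which && model_next_pkt(m); i++)
		continue;
	return m->cursor;
}
```

In this model, on a 5-packet file, starting the loop at i = 0 leaves
the cursor on packet 4 for 'select 3', while starting at i = 1 leaves
it on packet 3.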

A second off-by-one in pcap_next_pkt() rejects the last packet of any
pcap whose mapped size equals the sum of its packets exactly (the
common case, since pcap files have no trailer):

	if (pcap_ptr_va_curr + sizeof(*hdr) + hdr->caplen -
	    pcap_ptr_va_start >= pcap_map_size)
		return false;

When the current packet ends exactly at the mmap boundary, the
expression equals pcap_map_size and the >= check rejects a fully
in-bounds packet. The same off-by-one is present in the earlier
header-fits check in the same function. Both should compare with >.

Combined effect: 'select N' on a pcap of N packets always reports
"no packet #N available!". For a 1-packet pcap, 'select 1' reports
the only packet as unavailable.
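
The arithmetic behind the boundary comparison can be sketched as
follows. The sizes are assumptions taken from the classic pcap layout
(24-byte file header, 16-byte per-packet header), and pkt_end(),
accepts_ge() and accepts_gt() are hypothetical helpers. The sketch
only shows what each comparison evaluates to; it says nothing about
whether relaxing the check is safe for the callers of
pcap_next_pkt().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed classic pcap sizes, not read from bpf_dbg.c. */
enum { FILEHDR_LEN = 24, PKTHDR_LEN = 16 };

/* Offset, from the start of the mapping, of the first byte after a
 * packet that starts at 'off' and carries 'caplen' payload bytes. */
static size_t pkt_end(size_t off, size_t caplen)
{
	assert(off >= FILEHDR_LEN);	/* packets follow the file header */
	return off + PKTHDR_LEN + caplen;
}

/* Accepts the packet under the '>=' form of the check. */
static bool accepts_ge(size_t off, size_t caplen, size_t map_size)
{
	return !(pkt_end(off, caplen) >= map_size);
}

/* Accepts the packet under the '>' form of the check. */
static bool accepts_gt(size_t off, size_t caplen, size_t map_size)
{
	return !(pkt_end(off, caplen) > map_size);
}
```

For a file holding a single 10-byte packet the mapping is 24 + 16 + 10
= 50 bytes and the packet ends at offset 50, so the '>=' form rejects
it and the '>' form accepts it. With one trailing byte (51 bytes
mapped) both forms accept it.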

Reproduction (deterministic, no kernel needed): build bpf_dbg from
the unmodified tree, synthesize a pcap with N>=1 packets each with a
distinct payload byte, and drive 'select K / step 1 / quit'. Before
this fix, 'select 1' shows packet 2's payload; 'select N' shows the
"no packet" error. After this fix, 'select K' shows packet K for
all K in 1..N, and 'select N+1' correctly errors.

Cloudflare's downstream mirror at github.com/cloudflare/bpftools
carries the same defect.

Fixes: fd981e3c321a ("filter: bpf_dbg: add minimal bpf debugger")
Signed-off-by: Hasan Basbunar <basbunarhasan@gmail.com>
---
 tools/bpf/bpf_dbg.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/bpf/bpf_dbg.c b/tools/bpf/bpf_dbg.c
index 00e560a17baf..f21576dc2326 100644
--- a/tools/bpf/bpf_dbg.c
+++ b/tools/bpf/bpf_dbg.c
@@ -923,12 +923,12 @@ static bool pcap_next_pkt(void)
 	struct pcap_pkthdr *hdr = pcap_curr_pkt();

 	if (pcap_ptr_va_curr + sizeof(*hdr) -
-	    pcap_ptr_va_start >= pcap_map_size)
+	    pcap_ptr_va_start > pcap_map_size)
 		return false;
 	if (hdr->caplen == 0 || hdr->len == 0 || hdr->caplen > hdr->len)
 		return false;
 	if (pcap_ptr_va_curr + sizeof(*hdr) + hdr->caplen -
-	    pcap_ptr_va_start >= pcap_map_size)
+	    pcap_ptr_va_start > pcap_map_size)
 		return false;

 	pcap_ptr_va_curr += (sizeof(*hdr) + hdr->caplen);
@@ -1141,7 +1141,7 @@ static int cmd_select(char *num)
 	pcap_reset_pkt();
 	bpf_reset();

-	for (i = 0; i < which && (have_next = pcap_next_pkt()); i++)
+	for (i = 1; i < which && (have_next = pcap_next_pkt()); i++)
 		/* noop */;
 	if (!have_next || pcap_curr_pkt() == NULL) {
 		rl_printf("no packet #%u available!\n", which);
-- 
2.53.0


* [PATCH v2] bpf: bpf_dbg: fix off-by-one in cmd_select
  2026-04-28 10:01 [PATCH] bpf: bpf_dbg: fix off-by-one in cmd_select and pcap_next_pkt Hasan Basbunar
@ 2026-04-29  8:44 ` Hasan Basbunar
  2026-04-29 12:35   ` [PATCH v3] bpf: bpf_dbg: split pcap_next_pkt() validation/advance, " Hasan Basbunar
  0 siblings, 1 reply; 4+ messages in thread
From: Hasan Basbunar @ 2026-04-29  8:44 UTC
  To: Daniel Borkmann
  Cc: Alexei Starovoitov, Andrii Nakryiko, bpf, linux-kernel,
	Hasan Basbunar

bpf_dbg's interactive 'select <N>' command, documented in the file
header ("select 3 (run etc will start from the 3rd packet in the
pcap)") to use 1-based packet indexing, advances the pcap cursor one
packet too many. The loop in cmd_select():

	pcap_reset_pkt();         /* cursor on packet 1 */
	for (i = 0; i < which && (have_next = pcap_next_pkt()); i++)
		/* noop */;

calls pcap_next_pkt() N times to reach packet N, but pcap_next_pkt()
validates the packet at the cursor and then advances past it. After
N calls the cursor is on packet N+1, so 'select 3' positions on
packet 4, 'select 4' on packet 5, etc. To land on packet N the loop
must advance the cursor only N-1 times.

Reproduction (deterministic, no kernel needed): build bpf_dbg from
the unmodified tree, synthesize a pcap with N>=2 packets each with
a distinct payload byte, and drive 'select 1 / step 1 / quit'.
Before this fix, 'select 1' shows packet 2's payload. After this
fix, 'select K' shows packet K for all K in 1..N, and 'select N+1'
correctly errors with "no packet #N+1 available!".

Cloudflare's downstream mirror at github.com/cloudflare/bpftools
carries the same defect.

Fixes: fd981e3c321a ("filter: bpf_dbg: add minimal bpf debugger")
Signed-off-by: Hasan Basbunar <basbunarhasan@gmail.com>
---
Changes in v2:
 - Drop the pcap_next_pkt() boundary change (>= -> >). As correctly
   pointed out by Sashiko AI on the v1 thread, that change was wrong:
   when the last packet body ends exactly at the mmap boundary (the
   common case for pcap files with no trailer), the relaxed check let
   pcap_next_pkt() advance the cursor to pcap_ptr_va_start +
   pcap_map_size and return true. The cmd_run() do/while loop then
   re-entered its body, called pcap_curr_pkt() at end-of-mmap, and
   bpf_run_all() dereferenced hdr->caplen / hdr->len out of bounds.
   The original >= comparison is correct: when the body ends at the
   boundary it returns false without advancing, and the loop exits
   cleanly. The cmd_select() 1-based fix below is sufficient and
   self-contained; pcap_next_pkt() is left untouched.
 - v1: https://lore.kernel.org/bpf/20260428100109.56572-1-basbunarhasan@gmail.com/
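
The interaction described in the first changelog item can be sketched
with a model of a do/while consumer. This is a hedged illustration,
not the cmd_run() source: all names are hypothetical, every packet is
assumed to have the same 4-byte payload, and the header sizes are the
classic pcap ones (24 and 16 bytes).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed sizes; REC_LEN is one packet header plus its payload. */
enum { FILEHDR_LEN = 24, PKTHDR_LEN = 16, CAPLEN = 4,
       REC_LEN = PKTHDR_LEN + CAPLEN };

struct run_stats {
	unsigned int in_bounds;	/* loop bodies that read a mapped header */
	unsigned int past_end;	/* loop bodies that would read past the map */
};

/* Offset-based model of pcap_next_pkt(); 'relaxed' selects the v1
 * '>' comparison instead of the original '>='. */
static bool next_pkt(size_t *cur, size_t map_size, bool relaxed)
{
	size_t hdr_end = *cur + PKTHDR_LEN;
	size_t body_end = *cur + REC_LEN;

	if (relaxed ? hdr_end > map_size : hdr_end >= map_size)
		return false;
	if (relaxed ? body_end > map_size : body_end >= map_size)
		return false;
	*cur += REC_LEN;
	return true;
}

/* Models a do/while loop that uses the header at the cursor first and
 * asks for the next packet afterwards. */
static struct run_stats run_all(unsigned int npkts, bool relaxed)
{
	size_t map_size = FILEHDR_LEN + (size_t)npkts * REC_LEN;
	size_t cur = FILEHDR_LEN;
	struct run_stats st = { 0, 0 };

	assert(npkts >= 1);
	do {
		if (cur + PKTHDR_LEN <= map_size)
			st.in_bounds++;
		else
			st.past_end++;
	} while (next_pkt(&cur, map_size, relaxed));
	return st;
}
```

On a 3-packet file with no trailing bytes, the '>=' variant runs the
body three times, all in bounds. The '>' variant runs it a fourth time
with the cursor at the end of the mapping, which is the out-of-bounds
read described above.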

 tools/bpf/bpf_dbg.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/bpf/bpf_dbg.c b/tools/bpf/bpf_dbg.c
index 00e560a17baf..4895602ab37d 100644
--- a/tools/bpf/bpf_dbg.c
+++ b/tools/bpf/bpf_dbg.c
@@ -1141,7 +1141,7 @@ static int cmd_select(char *num)
 	pcap_reset_pkt();
 	bpf_reset();

-	for (i = 0; i < which && (have_next = pcap_next_pkt()); i++)
+	for (i = 1; i < which && (have_next = pcap_next_pkt()); i++)
 		/* noop */;
 	if (!have_next || pcap_curr_pkt() == NULL) {
 		rl_printf("no packet #%u available!\n", which);
-- 
2.53.0


* [PATCH v3] bpf: bpf_dbg: split pcap_next_pkt() validation/advance, fix off-by-one in cmd_select
  2026-04-29  8:44 ` [PATCH v2] bpf: bpf_dbg: fix off-by-one in cmd_select Hasan Basbunar
@ 2026-04-29 12:35   ` Hasan Basbunar
  2026-04-29 13:13     ` bot+bpf-ci
  0 siblings, 1 reply; 4+ messages in thread
From: Hasan Basbunar @ 2026-04-29 12:35 UTC
  To: Daniel Borkmann
  Cc: Alexei Starovoitov, Andrii Nakryiko, bpf, linux-kernel,
	Hasan Basbunar

bpf_dbg's interactive 'select <N>' command, documented in the file
header ("select 3 (run etc will start from the 3rd packet in the
pcap)") to use 1-based packet indexing, advances the pcap cursor one
packet too many. The loop in cmd_select():

	pcap_reset_pkt();         /* cursor on packet 1 */
	for (i = 0; i < which && (have_next = pcap_next_pkt()); i++)
		/* noop */;

calls pcap_next_pkt() N times to reach packet N, but pcap_next_pkt()
validates the packet at the cursor and then advances past it. After
N calls the cursor is on packet N+1, so 'select 3' positions on
packet 4, 'select 4' on packet 5, etc.

Simply changing the loop init to 'i = 1' (so it advances N-1 times)
fixes the user-visible symptom but leaves the final landed-on packet
unvalidated. Combined with pcap_next_pkt()'s '>=' boundary checks, it
also mishandles the last packet and the position just past it. As
pointed out by the Sashiko AI review on v1 and v2, this surfaces in
two ways:

  1. On a perfect pcap (no trailing bytes after the last packet),
     pcap_next_pkt()'s '>= pcap_map_size' rejects packets whose body
     ends exactly at the file boundary, so 'select N' on an N-packet
     file errors as "no packet #N available" even though the packet
     is fully in-bounds.

  2. On a truncated pcap (filehdr + a few stray bytes that happen to
     pass try_load_pcap()'s 'pcap_map_size > sizeof(filehdr)' guard
     but not enough to contain a full pkthdr), 'select 1' returns
     CMD_OK without ever validating the header, and a subsequent
     'step' or 'run' dereferences pcap_curr_pkt()->caplen past the
     mapped region.

Fix all three issues by splitting pcap_next_pkt() into a pure
validator (pcap_curr_pkt_valid()) and a validate-advance-validate
combinator. The boundary check now uses '>' instead of '>=', so a
packet whose body ends exactly at pcap_map_size is correctly accepted.
pcap_next_pkt() returns true only when both the current packet was
valid and, after advancing, the new cursor position is also valid.
This means the do-while in cmd_run() exits cleanly after the last
packet (no past-end dereference), and cmd_select() can call
pcap_curr_pkt_valid() after the loop to bounds-check the final
packet.
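
The split described above can be sketched with an offset-based model.
It is an assumption-laden illustration rather than the patched code:
struct pcap_model, curr_valid(), next_pkt() and select_pkt() are
hypothetical names, all packets share one 4-byte payload size, the
caplen/len sanity checks are omitted, and the header sizes are the
classic pcap ones (24 and 16 bytes).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed sizes; REC_LEN is one packet header plus its payload. */
enum { FILEHDR_LEN = 24, PKTHDR_LEN = 16, CAPLEN = 4,
       REC_LEN = PKTHDR_LEN + CAPLEN };

struct pcap_model {
	size_t map_size;	/* bytes mapped */
	size_t cur;		/* cursor offset from the map start */
};

/* Pure validator: the packet at the cursor fits in the mapping. A
 * packet ending exactly at map_size is accepted ('>' semantics). */
static bool curr_valid(const struct pcap_model *p)
{
	return p->cur + REC_LEN <= p->map_size;
}

/* Validate the current packet, advance, validate the new position. */
static bool next_pkt(struct pcap_model *p)
{
	if (!curr_valid(p))
		return false;
	p->cur += REC_LEN;
	return curr_valid(p);
}

/* Returns the 1-based packet number selected, or 0 for the
 * "no packet" error path. */
static unsigned int select_pkt(struct pcap_model *p, unsigned int which)
{
	bool have_next = true;
	unsigned int i;

	assert(which >= 1);
	p->cur = FILEHDR_LEN;	/* stands in for pcap_reset_pkt() */
	for (i = 1; i < which && (have_next = next_pkt(p)); i++)
		continue;
	if (!have_next || !curr_valid(p)) {
		p->cur = FILEHDR_LEN;
		return 0;
	}
	return (unsigned int)((p->cur - FILEHDR_LEN) / REC_LEN) + 1;
}
```

With a 5-packet mapping of 24 + 5 * 20 = 124 bytes, the model selects
packet K for every K in 1..5 and takes the error path for K = 6. With
a mapping of 25 bytes (file header plus one stray byte), 'select 1'
takes the error path because the post-loop validation fails.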

Reproduction (deterministic, no kernel needed): build bpf_dbg from
the unmodified tree, synthesize a pcap with N>=2 packets each with a
distinct payload byte, and drive 'select 1 / step 1 / quit'. Before
this fix, 'select 1' shows packet 2's payload. After this fix,
'select K' shows packet K for all K in 1..N, 'select N+1' correctly
errors with "no packet #N+1 available!", and 'select 1' on a pcap
truncated to filehdr + 1 byte also correctly errors.

Cloudflare's downstream mirror at github.com/cloudflare/bpftools
carries the same defect.

Fixes: fd981e3c321a ("filter: bpf_dbg: add minimal bpf debugger")
Signed-off-by: Hasan Basbunar <basbunarhasan@gmail.com>
---
Changes in v3:
 - Split pcap_next_pkt() into pcap_curr_pkt_valid() (pure validator)
   and pcap_next_pkt() (validate-current, advance, validate-new).
 - Boundary check now uses '>' instead of '>='; a packet whose body
   ends exactly at pcap_map_size is correctly accepted.
 - cmd_select() validates the final landed-on packet via
   pcap_curr_pkt_valid() instead of the dead
   `pcap_curr_pkt() == NULL` check.
 - Empirically verified in a clean Debian container (gcc -Wall -O0)
   against:
     * 5-packet pcap, select K for K in 1..6 (5 successes + 1 error
       on K=6, payload byte matches K per the file header docs);
     * 1-packet pcap, select 1 (succeeds), select 2 (errors);
     * truncated pcap (filehdr + 1 byte), select 1 errors cleanly
       without dereferencing past the mapped region;
     * `run` after `select 3` on a 5-packet pcap processes exactly
       3 packets and exits cleanly without past-end deref.
 - Addresses both review concerns raised by Sashiko AI on v1 and v2.
 - v1: https://lore.kernel.org/bpf/20260428100109.56572-1-basbunarhasan@gmail.com/
   v2: https://lore.kernel.org/bpf/20260429084441.22089-1-basbunarhasan@gmail.com/

 tools/bpf/bpf_dbg.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/tools/bpf/bpf_dbg.c b/tools/bpf/bpf_dbg.c
index 4895602ab37d..db12d2f8fb73 100644
--- a/tools/bpf/bpf_dbg.c
+++ b/tools/bpf/bpf_dbg.c
@@ -918,21 +918,30 @@ static struct pcap_pkthdr *pcap_curr_pkt(void)
 	return (void *) pcap_ptr_va_curr;
 }

-static bool pcap_next_pkt(void)
+static bool pcap_curr_pkt_valid(void)
 {
 	struct pcap_pkthdr *hdr = pcap_curr_pkt();

 	if (pcap_ptr_va_curr + sizeof(*hdr) -
-	    pcap_ptr_va_start >= pcap_map_size)
+	    pcap_ptr_va_start > pcap_map_size)
 		return false;
 	if (hdr->caplen == 0 || hdr->len == 0 || hdr->caplen > hdr->len)
 		return false;
 	if (pcap_ptr_va_curr + sizeof(*hdr) + hdr->caplen -
-	    pcap_ptr_va_start >= pcap_map_size)
+	    pcap_ptr_va_start > pcap_map_size)
 		return false;
+	return true;
+}
+
+static bool pcap_next_pkt(void)
+{
+	struct pcap_pkthdr *hdr;

+	if (!pcap_curr_pkt_valid())
+		return false;
+	hdr = pcap_curr_pkt();
 	pcap_ptr_va_curr += (sizeof(*hdr) + hdr->caplen);
-	return true;
+	return pcap_curr_pkt_valid();
 }

 static void pcap_reset_pkt(void)
@@ -1143,7 +1152,7 @@ static int cmd_select(char *num)

 	for (i = 1; i < which && (have_next = pcap_next_pkt()); i++)
 		/* noop */;
-	if (!have_next || pcap_curr_pkt() == NULL) {
+	if (!have_next || !pcap_curr_pkt_valid()) {
 		rl_printf("no packet #%u available!\n", which);
 		pcap_reset_pkt();
 		return CMD_ERR;
-- 
2.53.0


* Re: [PATCH v3] bpf: bpf_dbg: split pcap_next_pkt() validation/advance, fix off-by-one in cmd_select
  2026-04-29 12:35   ` [PATCH v3] bpf: bpf_dbg: split pcap_next_pkt() validation/advance, " Hasan Basbunar
@ 2026-04-29 13:13     ` bot+bpf-ci
  0 siblings, 0 replies; 4+ messages in thread
From: bot+bpf-ci @ 2026-04-29 13:13 UTC
  To: basbunarhasan, daniel
  Cc: ast, andrii, bpf, linux-kernel, basbunarhasan, ast, andrii,
	daniel, martin.lau, eddyz87, yonghong.song, clm, ihor.solodrai


> diff --git a/tools/bpf/bpf_dbg.c b/tools/bpf/bpf_dbg.c
> --- a/tools/bpf/bpf_dbg.c
> +++ b/tools/bpf/bpf_dbg.c
> @@ -918,21 +918,30 @@ static struct pcap_pkthdr *pcap_curr_pkt(void)
>  	return (void *) pcap_ptr_va_curr;
>  }
>
> -static bool pcap_next_pkt(void)
> +static bool pcap_curr_pkt_valid(void)
>  {
>  	struct pcap_pkthdr *hdr = pcap_curr_pkt();
>
>  	if (pcap_ptr_va_curr + sizeof(*hdr) -
> -	    pcap_ptr_va_start >= pcap_map_size)
> +	    pcap_ptr_va_start > pcap_map_size)
>  		return false;
>  	if (hdr->caplen == 0 || hdr->len == 0 || hdr->caplen > hdr->len)
>  		return false;
>  	if (pcap_ptr_va_curr + sizeof(*hdr) + hdr->caplen -
> -	    pcap_ptr_va_start >= pcap_map_size)
> +	    pcap_ptr_va_start > pcap_map_size)
>  		return false;
> +	return true;
> +}
>
> +static bool pcap_next_pkt(void)
> +{
> +	struct pcap_pkthdr *hdr;
> +
> +	if (!pcap_curr_pkt_valid())
> +		return false;
> +	hdr = pcap_curr_pkt();
>  	pcap_ptr_va_curr += (sizeof(*hdr) + hdr->caplen);
> -	return true;
> +	return pcap_curr_pkt_valid();
>  }
>
>  static void pcap_reset_pkt(void)
> @@ -1143,7 +1152,7 @@ static int cmd_select(char *num)
>
>  	for (i = 0; i < which && (have_next = pcap_next_pkt()); i++)
>  		/* noop */;
> -	if (!have_next || pcap_curr_pkt() == NULL) {
> +	if (!have_next || !pcap_curr_pkt_valid()) {

The commit message asserts that after this fix, 'select K shows packet K
for all K in 1..N' and 'select N+1 correctly errors with no packet #N+1
available!'. Does the actual code achieve this behavior given that the
loop still initializes i to 0?

With the new pcap_next_pkt() semantics (validate-current, advance,
validate-new), trace through 'select 1' on a file with >=2 packets:

  Iteration 1 (i=0):
    pcap_next_pkt() validates pkt1 (ok)
    advances cursor to pkt2
    validates pkt2 (ok)
    returns true
    i becomes 1

  Loop exit (i=1, i<1 is false)

  Cursor is now on pkt2
  pcap_curr_pkt_valid() returns true
  Function returns CMD_OK

The user requested packet 1, but subsequent 'step' or 'run' operates on
pkt2. This reproduces the exact defect the commit claims to fix: 'Before
this fix, select 1 shows packet 2's payload.'

Similarly, trace 'select 3' on a file with exactly 3 packets:

  Iteration 1 (i=0): advances pkt1 -> pkt2, validates pkt2 (ok), i=1
  Iteration 2 (i=1): advances pkt2 -> pkt3, validates pkt3 (ok), i=2
  Iteration 3 (i=2): pkt3 valid, advances past pkt3 (cursor at EOF),
                     post-advance pcap_curr_pkt_valid() is false,
                     pcap_next_pkt() returns false, have_next=false

  Loop exits with have_next=false
  Code hits 'no packet #3 available!' even though packet 3 exists

The commit message itself identifies the correct fix: "Simply changing the
loop init to 'i = 1' (so it advances N-1 times) fixes the user-visible
symptom". That change was not applied in the diff. The loop still uses
'i = 0', which causes it to advance N times instead of N-1 times, landing
on packet N+1 when the user requested packet N.

The v3 patch discussion on lore.kernel.org/bpf addressed this by changing
the loop from 'for (i = 0; i < which ...)' to 'for (i = 1; i < which ...)'
to properly implement 1-based indexing. The current code has the
pcap_curr_pkt_valid() refactoring but is missing the loop initialization
fix.

Reference: https://lore.kernel.org/bpf/20260429123543.61559-1-basbunarhasan@gmail.com/
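
The two traces above can be replayed with a small index-based model.
This is a sketch under stated assumptions, not the reviewed code:
idx_valid(), idx_next() and idx_select() are hypothetical names, the
pcap is reduced to a packet count, and 'init' is the initial value of
the loop counter i.

```c
#include <assert.h>
#include <stdbool.h>

/* The cursor is a 1-based packet number; it is valid while it names
 * an existing packet. */
static bool idx_valid(unsigned int cursor, unsigned int npkts)
{
	return cursor >= 1 && cursor <= npkts;
}

/* Validate-current, advance, validate-new, as in the split version. */
static bool idx_next(unsigned int *cursor, unsigned int npkts)
{
	if (!idx_valid(*cursor, npkts))
		return false;
	(*cursor)++;
	return idx_valid(*cursor, npkts);
}

/* Returns the packet the cursor lands on, or 0 for "no packet". */
static unsigned int idx_select(unsigned int npkts, unsigned int which,
			       unsigned int init)
{
	bool have_next = true;
	unsigned int cursor = 1, i;

	assert(init <= 1);	/* only i = 0 and i = 1 are modelled */
	for (i = init; i < which && (have_next = idx_next(&cursor, npkts)); i++)
		continue;
	if (!have_next || !idx_valid(cursor, npkts))
		return 0;
	return cursor;
}
```

In this model, with init = 0, 'select 1' on a 2-packet file lands on
packet 2 and 'select 3' on a 3-packet file takes the error path. With
init = 1 the same calls land on packets 1 and 3.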

>  		rl_printf("no packet #%u available!\n", which);
>  		pcap_reset_pkt();
>  		return CMD_ERR;


---
AI reviewed your patch. Please fix the bug or email reply why it's not a bug.
See: https://github.com/kernel-patches/vmtest/blob/master/ci/claude/README.md

CI run summary: https://github.com/kernel-patches/bpf/actions/runs/25109638113

