Netdev List
* [PATCH net] ipv4: validate ip_forward_options() option fields against skb tail
From: Qi Tang @ 2026-05-28 11:12 UTC
  To: davem, kuba, pabeni, edumazet
  Cc: netdev, dsahern, idosch, horms, lyutoon, stable, Qi Tang

ip_forward_options() re-reads the RR/SRR/TS option length byte
optptr[1] and pointer byte optptr[2] from the skb on the forwarding
path and uses them as indexes for 4-byte writes via
ip_rt_get_source() (and a memcmp walk in the SRR branch).

__ip_options_compile() validates those bytes at parse time but stores
only the option's offset into IPCB(skb)->opt.{rr,srr,ts}.  An nftables
FORWARD-chain payload mutation between parse and consume can rewrite
the bytes, driving the indexed writes out of bounds and overlapping
skb_shared_info.  With optptr[2] mutated the write can land in
skb_shared_info.frag_list; the next time the skb is dropped
kfree_skb_list_reason() walks the forged list and frees an
attacker-controlled pointer, an arbitrary-free primitive (R15 below
is the corrupted frag_list):

  BUG: unable to handle page fault for address: ffffed10195fd757
  Oops: 0000 [#1] SMP KASAN NOPTI
  RIP: 0010:kfree_skb_list_reason+0x167/0x5f0
  RAX: 1ffff110195fd757 RBX: dffffc0000000000
  R15: ffff8880cafebabe
  CR2: ffffed10195fd757
  Call Trace:
   skb_release_data+0x565/0x820
   sk_skb_reason_drop+0xc1/0x350
   ip_rcv_core+0x7a8/0xcd0
   ip_rcv+0x97/0x270
   __netif_receive_skb_one_core+0x161/0x1b0
   process_backlog+0x1c4/0x5b0
   net_rx_action+0x934/0xfa0

Bound optptr[2] within optptr[1] before the RR and TS writes, and
clamp the SRR walk to the bytes actually present in the skb.  Match
the existing error handling in this function: skip the malformed
option in place rather than returning, so the single ip_send_check()
at the end still recomputes the checksum for any option that was
updated earlier.

Cc: stable@vger.kernel.org
Reported-by: Qi Tang <tpluszz77@gmail.com>
Reported-by: Tong Liu <lyutoon@gmail.com>
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Qi Tang <tpluszz77@gmail.com>
---
 net/ipv4/ip_options.c | 27 +++++++++++++++++++--------
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/net/ipv4/ip_options.c b/net/ipv4/ip_options.c
index be8815ce3ac24..36a4e3cc39dd1 100644
--- a/net/ipv4/ip_options.c
+++ b/net/ipv4/ip_options.c
@@ -544,18 +544,26 @@ void ip_forward_options(struct sk_buff *skb)
 
 	if (opt->rr_needaddr) {
 		optptr = (unsigned char *)raw + opt->rr;
-		ip_rt_get_source(&optptr[optptr[2]-5], skb, rt);
-		opt->is_changed = 1;
+		if (optptr + optptr[1] <= skb_tail_pointer(skb) &&
+		    optptr[2] >= 5 && optptr[2] <= optptr[1] + 1) {
+			ip_rt_get_source(&optptr[optptr[2] - 5], skb, rt);
+			opt->is_changed = 1;
+		}
 	}
 	if (opt->srr_is_hit) {
 		int srrptr, srrspace;
 
 		optptr = raw + opt->srr;
 
-		for ( srrptr = optptr[2], srrspace = optptr[1];
-		     srrptr <= srrspace;
-		     srrptr += 4
-		     ) {
+		/* optptr[1] (option length) may have been rewritten after the
+		 * parse-time check; if it now runs past the skb the option is
+		 * malformed, so skip the source-route rewrite below.
+		 */
+		srrspace = optptr[1];
+		if (optptr + srrspace > skb_tail_pointer(skb))
+			srrspace = 0;
+
+		for (srrptr = optptr[2]; srrptr <= srrspace; srrptr += 4) {
 			if (srrptr + 3 > srrspace)
 				break;
 			if (memcmp(&opt->nexthop, &optptr[srrptr-1], 4) == 0)
@@ -572,8 +580,11 @@ void ip_forward_options(struct sk_buff *skb)
 		}
 		if (opt->ts_needaddr) {
 			optptr = raw + opt->ts;
-			ip_rt_get_source(&optptr[optptr[2]-9], skb, rt);
-			opt->is_changed = 1;
+			if (optptr + optptr[1] <= skb_tail_pointer(skb) &&
+			    optptr[2] >= 9 && optptr[2] <= optptr[1] + 5) {
+				ip_rt_get_source(&optptr[optptr[2] - 9], skb, rt);
+				opt->is_changed = 1;
+			}
 		}
 	}
 	if (opt->is_changed) {
-- 
2.47.3
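
The RR and TS checks added above share one shape, which can be sketched in
user space as follows. This is only an illustration under stated assumptions:
opt_slot_in_bounds() and its "back" parameter are hypothetical names that do
not appear in the patch, and the leading readability check on the length and
pointer bytes is an extra that the patch itself does not add.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of the patch: returns true when the 4-byte
 * address slot starting at optptr[ptr - back] lies inside the option and the
 * option lies inside the buffer that ends at tail.  "back" is 5 for RR and
 * 9 for TS, mirroring the two indexes used in ip_forward_options().
 */
static bool opt_slot_in_bounds(const unsigned char *optptr,
			       const unsigned char *tail, int back)
{
	int len, ptr;

	/* Extra to the patch: type, length and pointer bytes must be readable. */
	if (tail - optptr < 3)
		return false;

	len = optptr[1];	/* option length byte */
	ptr = optptr[2];	/* option pointer byte */

	return len <= tail - optptr &&	/* same as optptr + len <= tail */
	       ptr >= back &&		/* index ptr - back is not negative */
	       ptr <= len + back - 4;	/* slot ends within the option */
}
```

With back = 5 the last condition reduces to optptr[2] <= optptr[1] + 1, and
with back = 9 to optptr[2] <= optptr[1] + 5, matching the two hunks. An RR
option with length 7 and pointer 8 passes; the same option with the pointer
byte rewritten to 0xff does not.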



* Re: [PATCH net] ipv4: validate ip_forward_options() option fields against skb tail
From: Jiayuan Chen @ 2026-05-28 13:48 UTC
  To: Qi Tang, davem, kuba, pabeni, edumazet
  Cc: netdev, dsahern, idosch, horms, lyutoon, stable


On 5/28/26 7:12 PM, Qi Tang wrote:
> ip_forward_options() re-reads the RR/SRR/TS option length byte
> optptr[1] and pointer byte optptr[2] from the skb on the forwarding
> path and uses them as indexes for 4-byte writes via
> ip_rt_get_source() (and a memcmp walk in the SRR branch).
>
> __ip_options_compile() validates those bytes at parse time but stores
> only the option's offset into IPCB(skb)->opt.{rr,srr,ts}.  An nftables
> FORWARD-chain payload mutation between parse and consume can rewrite
> the bytes, driving the indexed writes out of bounds and overlapping
> skb_shared_info.  With optptr[2] mutated the write can land in
> skb_shared_info.frag_list; the next time the skb is dropped
> kfree_skb_list_reason() walks the forged list and frees an
> attacker-controlled pointer, an arbitrary-free primitive (R15 below
> is the corrupted frag_list):
>
>    BUG: unable to handle page fault for address: ffffed10195fd757
>    Oops: 0000 [#1] SMP KASAN NOPTI
>    RIP: 0010:kfree_skb_list_reason+0x167/0x5f0
>    RAX: 1ffff110195fd757 RBX: dffffc0000000000
>    R15: ffff8880cafebabe
>    CR2: ffffed10195fd757
>    Call Trace:
>     skb_release_data+0x565/0x820
>     sk_skb_reason_drop+0xc1/0x350
>     ip_rcv_core+0x7a8/0xcd0
>     ip_rcv+0x97/0x270
>     __netif_receive_skb_one_core+0x161/0x1b0
>     process_backlog+0x1c4/0x5b0
>     net_rx_action+0x934/0xfa0


The bug is real, but I'm curious what kernel version and driver you're on.
On my side the skb falls into SKB_SMALL_HEAD_CACHE_SIZE (704), so the linear
area is pretty long, and optptr[2] maxes out at 255, which doesn't look like
it can reach frag_list.

Could the driver be using alloc_skb() to allocate a small linear buffer?




* Re: [PATCH net] ipv4: validate ip_forward_options() option fields against skb tail
From: Qi Tang @ 2026-05-28 16:32 UTC
  To: jiayuan.chen
  Cc: davem, kuba, pabeni, edumazet, netdev, dsahern, idosch, horms,
	lyutoon, stable, Qi Tang

On 5/28/26 9:48 PM, Jiayuan Chen wrote:
> The bug is real, but I'm curious what kernel version and driver you're on.
> On my side the skb falls into SKB_SMALL_HEAD_CACHE_SIZE (704), so the
> linear area is pretty long, and optptr[2] maxes out at 255, which doesn't
> look like it can reach frag_list.
>
> Could the driver be using alloc_skb() to allocate a small linear buffer?

net.git at e1914add2799 (7.1-rc3), x86_64 + KASAN, plain QEMU, no special
driver. You're right that with a normal, small nh_off (the network header's
offset from skb->head) the +250 write stays in the linear area. We get the
reach from a large nh_off instead.

The packet is forwarded over a VXLAN-over-IPv6 tunnel, so after decap the
inner IP packet still has the outer eth/IPv6/UDP/VXLAN/inner-eth in front of
it in the same head (nh_off ~112 here). Inner options are 12 NOPs + RR, so
opt->rr = 32, and nft rewrites the RR pointer byte to 0xff on the forward
hook:

  nft add rule ip filter forward @nh,272,8 set 0xff

so ip_forward_options() does

  write = head + nh_off + opt->rr + (0xff - 5)
        = head + 112 + 32 + 250 = head + 394

With end = 384, that lands at shinfo+10 (394 - 384), inside frag_list.
ip_rt_get_source() writes the route source there, and kfree_skb_list_reason()
walks the corrupted frag_list when the skb is dropped.
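
The arithmetic above can be re-checked with a small user-space sketch. The
helper names are hypothetical and exist only for this illustration; the
inputs (nh_off = 112, opt->rr = 32, pointer byte 0xff, end = 384) are the
values quoted in this message. It assumes the nft raw payload offset is
counted in bits from the network header and that nh_off + opt->rr + pointer
is at least 5.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Bit offset for an nft "@nh,<offset>,8" expression that targets an
 * option's pointer byte (optptr[2]), given the option's byte offset from
 * the network header.
 */
static size_t nft_ptr_byte_bit_offset(size_t opt_off)
{
	return (opt_off + 2) * 8;
}

/*
 * Offset from skb->head of the first byte written for an RR option, i.e.
 * &optptr[optptr[2] - 5] with optptr = head + nh_off + opt_rr.
 * Assumes the sum is at least 5 so the subtraction does not wrap.
 */
static size_t rr_write_offset(size_t nh_off, size_t opt_rr,
			      unsigned char rr_ptr)
{
	return nh_off + opt_rr + rr_ptr - 5;
}

/*
 * True when the write starts at or past "end", i.e. inside
 * skb_shared_info; *shinfo_off then holds the offset into that struct.
 */
static bool write_hits_shinfo(size_t write_off, size_t end,
			      size_t *shinfo_off)
{
	if (write_off < end)
		return false;
	*shinfo_off = write_off - end;
	return true;
}
```

For the VXLAN case, nft_ptr_byte_bit_offset(32) gives 272,
rr_write_offset(112, 32, 0xff) gives 394, and write_hits_shinfo(394, 384, ...)
reports offset 10. With a small nh_off such as 14, the same pointer byte gives
296, which stays below end = 384.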

VXLAN was just convenient. Other paths likely work too: any encap that pushes
the options deeper, or a smaller head like you suggested. Pre-6.3 without
skb_small_head_cache a plain forwarded packet already has end=192. I can send
the PoC off-list if you want to repro.

Thanks,
Qi

