From: Jiayuan Chen <jiayuan.chen@linux.dev>
To: Qi Tang <tpluszz77@gmail.com>,
davem@davemloft.net, kuba@kernel.org, pabeni@redhat.com,
edumazet@google.com
Cc: netdev@vger.kernel.org, dsahern@kernel.org, idosch@nvidia.com,
horms@kernel.org, lyutoon@gmail.com, stable@vger.kernel.org
Subject: Re: [PATCH net] ipv4: validate ip_forward_options() option fields against skb tail
Date: Thu, 28 May 2026 21:48:41 +0800 [thread overview]
Message-ID: <b1447f76-0ca4-49b1-a1ba-2670dbbe5eea@linux.dev> (raw)
In-Reply-To: <20260528111204.482401-1-tpluszz77@gmail.com>
On 5/28/26 7:12 PM, Qi Tang wrote:
> ip_forward_options() re-reads the RR/SRR/TS option length byte
> optptr[1] and pointer byte optptr[2] from the skb on the forwarding
> path and uses them as indexes for 4-byte writes via
> ip_rt_get_source() (and a memcmp walk in the SRR branch).
>
> __ip_options_compile() validates those bytes at parse time but stores
> only the option's offset into IPCB(skb)->opt.{rr,srr,ts}. An nftables
> FORWARD-chain payload mutation between parse and consume can rewrite
> the bytes, driving the indexed writes out of bounds and overlapping
> skb_shared_info. With optptr[2] mutated the write can land in
> skb_shared_info.frag_list; the next time the skb is dropped
> kfree_skb_list_reason() walks the forged list and frees an
> attacker-controlled pointer, an arbitrary-free primitive (R15 below
> is the corrupted frag_list):
>
> BUG: unable to handle page fault for address: ffffed10195fd757
> Oops: 0000 [#1] SMP KASAN NOPTI
> RIP: 0010:kfree_skb_list_reason+0x167/0x5f0
> RAX: 1ffff110195fd757 RBX: dffffc0000000000
> R15: ffff8880cafebabe
> CR2: ffffed10195fd757
> Call Trace:
> skb_release_data+0x565/0x820
> sk_skb_reason_drop+0xc1/0x350
> ip_rcv_core+0x7a8/0xcd0
> ip_rcv+0x97/0x270
> __netif_receive_skb_one_core+0x161/0x1b0
> process_backlog+0x1c4/0x5b0
> net_rx_action+0x934/0xfa0
The bug is real, but I'm curious what kernel version and driver you're on.
On my side the skb falls into SKB_SMALL_HEAD_CACHE_SIZE (704), so the
linear area
is pretty long, and optptr[2] maxes out at 255, which doesn't look like
it can reach frag_list.
May the driver use alloc_skb to allocate small liner buffer?
> Bound optptr[2] within optptr[1] before the RR and TS writes, and
> clamp the SRR walk to the bytes actually present in the skb. Match
> the existing error handling in this function: skip the malformed
> option in place rather than returning, so the single ip_send_check()
> at the end still recomputes the checksum for any option that was
> updated earlier.
>
> Cc: stable@vger.kernel.org
> Reported-by: Qi Tang <tpluszz77@gmail.com>
> Reported-by: Tong Liu <lyutoon@gmail.com>
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> Signed-off-by: Qi Tang <tpluszz77@gmail.com>
> ---
> net/ipv4/ip_options.c | 27 +++++++++++++++++++--------
> 1 file changed, 19 insertions(+), 8 deletions(-)
>
> diff --git a/net/ipv4/ip_options.c b/net/ipv4/ip_options.c
> index be8815ce3ac24..36a4e3cc39dd1 100644
> --- a/net/ipv4/ip_options.c
> +++ b/net/ipv4/ip_options.c
> @@ -544,18 +544,26 @@ void ip_forward_options(struct sk_buff *skb)
>
> if (opt->rr_needaddr) {
> optptr = (unsigned char *)raw + opt->rr;
> - ip_rt_get_source(&optptr[optptr[2]-5], skb, rt);
> - opt->is_changed = 1;
> + if (optptr + optptr[1] <= skb_tail_pointer(skb) &&
> + optptr[2] >= 5 && optptr[2] <= optptr[1] + 1) {
> + ip_rt_get_source(&optptr[optptr[2] - 5], skb, rt);
> + opt->is_changed = 1;
> + }
> }
> if (opt->srr_is_hit) {
> int srrptr, srrspace;
>
> optptr = raw + opt->srr;
>
> - for ( srrptr = optptr[2], srrspace = optptr[1];
> - srrptr <= srrspace;
> - srrptr += 4
> - ) {
> + /* optptr[1] (option length) may have been rewritten after the
> + * parse-time check; if it now runs past the skb the option is
> + * malformed, so skip the source-route rewrite below.
> + */
> + srrspace = optptr[1];
> + if (optptr + srrspace > skb_tail_pointer(skb))
> + srrspace = 0;
> +
> + for (srrptr = optptr[2]; srrptr <= srrspace; srrptr += 4) {
> if (srrptr + 3 > srrspace)
> break;
> if (memcmp(&opt->nexthop, &optptr[srrptr-1], 4) == 0)
> @@ -572,8 +580,11 @@ void ip_forward_options(struct sk_buff *skb)
> }
> if (opt->ts_needaddr) {
> optptr = raw + opt->ts;
> - ip_rt_get_source(&optptr[optptr[2]-9], skb, rt);
> - opt->is_changed = 1;
> + if (optptr + optptr[1] <= skb_tail_pointer(skb) &&
> + optptr[2] >= 9 && optptr[2] <= optptr[1] + 5) {
> + ip_rt_get_source(&optptr[optptr[2] - 9], skb, rt);
> + opt->is_changed = 1;
> + }
> }
> }
> if (opt->is_changed) {
next prev parent reply other threads:[~2026-05-28 13:48 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-05-28 11:12 [PATCH net] ipv4: validate ip_forward_options() option fields against skb tail Qi Tang
2026-05-28 13:48 ` Jiayuan Chen [this message]
2026-05-28 16:32 ` Qi Tang
2026-05-29 2:55 ` Jiayuan Chen
2026-05-29 9:40 ` Florian Westphal
2026-05-29 10:43 ` Qi Tang
2026-05-31 12:17 ` Ido Schimmel
2026-06-04 8:46 ` Paolo Abeni
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=b1447f76-0ca4-49b1-a1ba-2670dbbe5eea@linux.dev \
--to=jiayuan.chen@linux.dev \
--cc=davem@davemloft.net \
--cc=dsahern@kernel.org \
--cc=edumazet@google.com \
--cc=horms@kernel.org \
--cc=idosch@nvidia.com \
--cc=kuba@kernel.org \
--cc=lyutoon@gmail.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=stable@vger.kernel.org \
--cc=tpluszz77@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.