Netdev List
* [PATCH net 0/2] ipv4: harden against ihl < 5 IP_HDRINCL packets
@ 2026-05-12 20:51 Michael Bommarito
  2026-05-12 20:51 ` [PATCH net 1/2] ipv4: raw: reject IP_HDRINCL packets with ihl < 5 Michael Bommarito
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Michael Bommarito @ 2026-05-12 20:51 UTC
  To: Steffen Klassert, Herbert Xu, Eric Dumazet, netdev
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Kuniyuki Iwashima,
	Maciej Zenczykowski, Kees Cook, Jeff Layton,
	Gustavo A . R . Silva, Pablo Neira Ayuso, Florian Westphal,
	netfilter-devel, coreteam, linux-kernel

This series fixes a size_t underflow in net/ipv4/ah4.c:ah_output()
reachable when a raw IP_HDRINCL socket sends a packet with ihl < 5
through an xfrm AH policy.  Originally triaged on security@kernel.org;
moving to netdev at Herbert's suggestion so nftables / netfilter
maintainers can weigh in on a related question (see "Open question"
below).  Herbert also asked for the malformed packet to be rejected
upstream of AH rather than guarded at the AH consumer; that is
patch 1/2.  v1's AH-side guard is kept here as 2/2 defense-in-depth.

Bug
---

In net/ipv4/ah4.c, ah_output_done() and ah_output() copy the IPv4
options area with

    if (top_iph->ihl != 5) {
        memcpy(dst, src, top_iph->ihl * 4 - sizeof(struct iphdr));
    }

The "!= 5" guard correctly excludes the no-options case but does
NOT exclude ihl < 5.  For ihl in [0, 4], top_iph->ihl * 4 is less
than sizeof(struct iphdr) (20).  Because sizeof() yields a size_t,
the int operand is converted to size_t before the subtraction, so
the result wraps around instead of going negative.  The resulting
length is close to SIZE_MAX and memcpy walks off the slab
allocation backing the skb's network header.
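The arithmetic can be reproduced in userspace.  This is an
illustrative sketch only: struct demo_iphdr is a hypothetical
20-byte stand-in for struct iphdr, and options_len() is a
hypothetical helper that mirrors the expression at the memcpy()
sites rather than any kernel function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 20-byte stand-in for struct iphdr (no options). */
struct demo_iphdr {
	uint8_t  ver_ihl;
	uint8_t  tos;
	uint16_t tot_len;
	uint16_t id;
	uint16_t frag_off;
	uint8_t  ttl;
	uint8_t  protocol;
	uint16_t check;
	uint32_t saddr;
	uint32_t daddr;
};
_Static_assert(sizeof(struct demo_iphdr) == 20, "IPv4 base header is 20 bytes");

/*
 * Same shape as the ah4.c expression: ihl * 4 is an int, but sizeof()
 * yields size_t, so the int operand is converted to size_t and the
 * subtraction wraps modulo SIZE_MAX + 1 whenever ihl < 5.
 */
static size_t options_len(unsigned int ihl)
{
	int hdr_bytes = (int)ihl * 4;

	return hdr_bytes - sizeof(struct demo_iphdr);
}
```

With a 64-bit size_t, options_len(0) is 18446744073709551596, the
same read size KASAN reports in the trace below; options_len(5) is
0 and options_len(15) is 40, the maximum options length.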

The malformed packet arrives via raw_send_hdrinc() in net/ipv4/raw.c.
raw_send_hdrinc() validates "iphlen > length" but does not reject
"iphlen < sizeof(struct iphdr)".  An IP_HDRINCL caller with
CAP_NET_RAW (acquirable in an unprivileged user+net namespace on a
distro kernel with CONFIG_USER_NS=y) can therefore craft an ihl < 5
packet; if a matching xfrm AH policy is installed on the outgoing
route, ah_output() runs on the crafted packet and panics the host
kernel.

The guard has been in place since 1da177e4c3f4 ("Linux-2.6.12-rc2",
2005).  I found no prior fix on lore (3-year search window) and no
CVE filed against the file.

Reproduction
------------

x86 + KASAN (QEMU KVM, net-next 7.1.0-rc2):

  BUG: KASAN: out-of-bounds in ah_output+0x696/0x19e0
  Read of size 18446744073709551596 at addr ffff88800bae9824 \
      by task trigger_ah4_ihl/97
  Call Trace:
   __asan_memcpy+0x23/0x60
   ah_output+0x696/0x19e0
   xfrm_output_resume+0xdc8/0x6280
   xfrm4_output+0xfe/0x4c0
   raw_sendmsg+0x2531/0x26f0
   __sys_sendto+0x32b/0x390
   __x64_sys_sendto+0xdf/0x1f0
   do_syscall_64+0xf3/0x6a0
   entry_SYSCALL_64_after_hwframe+0x77/0x7f
  The buggy address belongs to the object at ffff88800bae9800
   which belongs to the cache kmalloc-1k of size 1024
  The buggy address is located 36 bytes inside of
   1024-byte region [ffff88800bae9800, ffff88800bae9c00)

The read size 0xFFFFFFFFFFFFFFEC (SIZE_MAX - 19) is the
underflowed result of (top_iph->ihl * 4 - sizeof(struct iphdr))
for ihl = 0.  Trigger: veth pair (loopback bypasses
xfrm_output), xfrm AH transport-mode policy, IP_HDRINCL
sendto() of a 128-byte packet with iph->ihl in [0, 4].

A container-only variant (CAP_NET_ADMIN container, no
--privileged, no host networking) panics the host kernel on a
stock distro kernel with CONFIG_INET_AH=m and module autoloading.
Repro harness + container Dockerfile + console logs available
privately on request; not attached to this public posting.

Patches
-------

1/2 ipv4: raw: reject IP_HDRINCL packets with ihl < 5

    Upstream-of-AH fix.  An IPv4 header with ihl < 5 is malformed
    by definition (RFC 791) and must not be allowed to continue
    along the in-stack output path.  This is the primary fix.

2/2 ipv4: ah: harden ah_output options-copy guard against ihl < 5

    Defense-in-depth at the three memcpy sites in ah_output() and
    ah_output_done().  Changes "if (top_iph->ihl != 5)" to
    "if (top_iph->ihl > 5)" so a future path delivering an ihl < 5
    packet cannot re-introduce the OOB access.  With patch 1/2 in
    place an IP_HDRINCL-crafted ihl < 5 packet should no longer
    reach ah_output; this patch closes the OOB primitive
    specifically at the AH consumer.

Open question for netfilter / netdev
------------------------------------

After patch 1/2 lands, a caller with CAP_NET_ADMIN can still
deliver an ihl < 5 packet into the post-LOCAL_OUT in-stack path by
attaching an nftables payload-set rule on NF_INET_LOCAL_OUT (or an
NFQUEUE reinject on the same hook) that rewrites byte 0 of the
IPv4 header after the raw_send_hdrinc / __ip_local_out validation
has run.  Construction:

    nft add table ip mangle
    nft add chain ip mangle output { type filter hook output \
                                     priority -150 \; }
    nft add rule ip mangle output ip daddr <victim> \
                                  @nh,0,8 set 0x40

I reproduced this separately with nftables payload-set delivering an
ihl = 0 packet to xfrm4_output() and onward.  Patch 2/2 covers the
AH consumer; other consumers that read iph->ihl after the LOCAL_OUT
hook may be similarly exposed and I have not enumerated them.
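For reference when reading the rule: byte 0 of the IPv4 header packs
the version into the high nibble and ihl into the low nibble, so
writing 0x40 keeps version 4 and sets ihl to 0.  A minimal sketch of
that split; iph_version(), iph_ihl() and iph_hdr_bytes() are
hypothetical helper names, not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Byte 0 of an IPv4 header: version in bits 7..4. */
static unsigned int iph_version(uint8_t byte0)
{
	return byte0 >> 4;
}

/* Byte 0 of an IPv4 header: ihl (in 32-bit words) in bits 3..0. */
static unsigned int iph_ihl(uint8_t byte0)
{
	return byte0 & 0x0f;
}

/* Header length in bytes implied by the ihl nibble. */
static unsigned int iph_hdr_bytes(uint8_t byte0)
{
	return iph_ihl(byte0) * 4;
}
```

0x45 is the usual no-options header (ihl 5, 20 bytes); 0x40 claims a
0-byte header, which is what the payload-set rule above writes.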

Direction question rather than a fix proposal: does basic iphdr
re-sanitization after a header-mangling hook belong in the netfilter
machinery, in each in-stack consumer, or both?
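Purely to make the question concrete, this is the kind of minimal
check either location could apply.  It is a hypothetical userspace
model, loosely following the version/ihl/tot_len checks that the
IPv4 receive path applies to foreign packets; iph_basic_sane() is
not an existing kernel helper, and buf/len stand in for the skb's
linear network header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical check: is buf[0..len) a plausible IPv4 header?
 * Not a kernel API; a model of the checks discussed in this thread.
 */
static bool iph_basic_sane(const uint8_t *buf, size_t len)
{
	unsigned int version, ihl, hdr_bytes, tot_len;

	if (len < 20)			/* room for the fixed header */
		return false;

	version = buf[0] >> 4;
	ihl = buf[0] & 0x0f;
	if (version != 4 || ihl < 5)	/* RFC 791: minimum ihl is 5 */
		return false;

	hdr_bytes = ihl * 4;
	if (hdr_bytes > len)		/* claimed options must be present */
		return false;

	/* tot_len is stored in network byte order at offset 2. */
	tot_len = ((unsigned int)buf[2] << 8) | buf[3];
	if (tot_len < hdr_bytes || tot_len > len)
		return false;

	return true;
}
```

Where such a check would run (after each mangling hook, or inside
each consumer) is exactly the open question; the sketch only shows
what "basic iphdr re-sanitization" would cover.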

Michael Bommarito (2):
  ipv4: raw: reject IP_HDRINCL packets with ihl < 5
  ipv4: ah: harden ah_output options-copy guard against ihl < 5

 net/ipv4/ah4.c | 6 +++---
 net/ipv4/raw.c | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)


base-commit: 73d587ae684d176fac9db94173f77d78a794ea4f
--
2.53.0

* [PATCH net 1/2] ipv4: raw: reject IP_HDRINCL packets with ihl < 5
  2026-05-12 20:51 [PATCH net 0/2] ipv4: harden against ihl < 5 IP_HDRINCL packets Michael Bommarito
@ 2026-05-12 20:51 ` Michael Bommarito
  2026-05-12 20:51 ` [PATCH net 2/2] ipv4: ah: harden ah_output options-copy guard against " Michael Bommarito
  2026-05-12 22:34 ` [PATCH net 0/2] ipv4: harden against ihl < 5 IP_HDRINCL packets Pablo Neira Ayuso
  2 siblings, 0 replies; 5+ messages in thread
From: Michael Bommarito @ 2026-05-12 20:51 UTC
  To: Steffen Klassert, Herbert Xu, Eric Dumazet, netdev
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Kuniyuki Iwashima,
	Maciej Zenczykowski, Kees Cook, Jeff Layton,
	Gustavo A . R . Silva, Pablo Neira Ayuso, Florian Westphal,
	netfilter-devel, coreteam, linux-kernel, stable

raw_send_hdrinc() validates that the caller-supplied IPv4 header
fits within the message length:

    iphlen = iph->ihl * 4;
    err = -EINVAL;
    if (iphlen > length)
        goto error_free;

    if (iphlen >= sizeof(*iph)) {
        /* fix up saddr, tot_len, id, csum, transport_header */
    }

It does not, however, reject ihl < 5.  For such a packet the
"if (iphlen >= sizeof(*iph))" branch is skipped, leaving the
crafted iphdr untouched, but the packet is still handed to
__ip_local_out() and onward.  Downstream consumers that read
iph->ihl assume a sane value: net/ipv4/ah4.c:ah_output() in
particular subtracts sizeof(struct iphdr) from top_iph->ihl * 4
in size_t arithmetic and passes the wrapped-around result to
memcpy(), producing an OOB access of length close to SIZE_MAX and
a host kernel panic.

An IPv4 header with ihl < 5 is malformed by definition (RFC 791:
"Internet Header Length is the length of the internet header in
32 bit words ... Note that the minimum value for a correct header
is 5.").  The kernel should not be willing to inject such a
packet into its own output path.

Reject "iphlen < sizeof(*iph)" alongside the existing
"iphlen > length" check.  This matches the principle that locally
constructed packets that re-enter the IP stack must pass the same
basic sanity tests that a foreign packet would be subjected to.
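A userspace model of the combined check, assuming (as commit
86f4c90a1c5c ensures) that length already covers a full iphdr.
hdrinc_check() and DEMO_IPHDR_LEN are hypothetical names; the
function returns -EINVAL as raw_send_hdrinc() does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define DEMO_IPHDR_LEN 20u	/* sizeof(struct iphdr) without options */

/* Hypothetical model of the patched length check in raw_send_hdrinc(). */
static int hdrinc_check(unsigned int ihl, size_t length)
{
	unsigned int iphlen = ihl * 4;

	/* Before the patch, only "iphlen > length" was rejected. */
	if (iphlen > length || iphlen < DEMO_IPHDR_LEN)
		return -EINVAL;
	return 0;
}
```

For a 128-byte message, ihl 0 through 4 are now rejected while
ihl 5 through 15 pass; an ihl of 15 (60 bytes) still fails against
a 40-byte message, as before.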

Once this lands, the "if (iphlen >= sizeof(*iph))" wrapper around
the fixup branch becomes redundant; left in place to keep the
patch minimal and backport-friendly.  A follow-up can unwrap it.

Note that commit 86f4c90a1c5c ("ipv4, ipv6: ensure raw socket
message is big enough to hold an IP header") ensures the message
buffer is large enough to hold an iphdr, but does not constrain
the self-reported iph->ihl.

Reachability: the malformed packet source is any caller with
CAP_NET_RAW, including an unprivileged process in a user+net
namespace on a kernel with CONFIG_USER_NS=y.  The reproduced AH
crash also requires a matching xfrm AH policy on the outgoing
route; a container granted CAP_NET_ADMIN can install that state
and policy in its netns.  Loopback bypasses xfrm_output, so the
trigger uses a real netdev.

Reproduced on UML + KASAN: kernel-mode fault at addr 0x0 with
memcpy_orig at the crash site.  Same shape reproduces inside a
rootless Docker container with --cap-add NET_ADMIN on a stock
distro kernel.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: stable@vger.kernel.org
Suggested-by: Herbert Xu <herbert@gondor.apana.org.au>
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
---
 net/ipv4/raw.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index 5aaf9c62c8e1..68e88cb3e55c 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -391,7 +391,7 @@ static int raw_send_hdrinc(struct sock *sk, struct flowi4 *fl4,
 	 * in, reject the frame as invalid
 	 */
 	err = -EINVAL;
-	if (iphlen > length)
+	if (iphlen > length || iphlen < sizeof(*iph))
 		goto error_free;
 
 	if (iphlen >= sizeof(*iph)) {
-- 
2.53.0

* [PATCH net 2/2] ipv4: ah: harden ah_output options-copy guard against ihl < 5
  2026-05-12 20:51 [PATCH net 0/2] ipv4: harden against ihl < 5 IP_HDRINCL packets Michael Bommarito
  2026-05-12 20:51 ` [PATCH net 1/2] ipv4: raw: reject IP_HDRINCL packets with ihl < 5 Michael Bommarito
@ 2026-05-12 20:51 ` Michael Bommarito
  2026-05-12 22:34 ` [PATCH net 0/2] ipv4: harden against ihl < 5 IP_HDRINCL packets Pablo Neira Ayuso
  2 siblings, 0 replies; 5+ messages in thread
From: Michael Bommarito @ 2026-05-12 20:51 UTC
  To: Steffen Klassert, Herbert Xu, Eric Dumazet, netdev
  Cc: David S . Miller, Jakub Kicinski, Paolo Abeni, Kuniyuki Iwashima,
	Maciej Zenczykowski, Kees Cook, Jeff Layton,
	Gustavo A . R . Silva, Pablo Neira Ayuso, Florian Westphal,
	netfilter-devel, coreteam, linux-kernel, stable

ah_output() and ah_output_done() copy the IPv4 options area with

    if (top_iph->ihl != 5) {
        memcpy(dst, src, top_iph->ihl * 4 - sizeof(struct iphdr));
    }

The "!= 5" guard correctly excludes the no-options case (ihl == 5)
and allows ihl > 5, where options are present.  It does NOT exclude
ihl < 5.  For ihl in [0, 4], top_iph->ihl * 4 is less than
sizeof(struct iphdr) (20); because sizeof() yields a size_t, the
int operand is converted to size_t and the subtraction wraps around
rather than going negative.  The resulting length is close to
SIZE_MAX and memcpy walks off the slab allocation backing the skb's
network header.

With the preceding patch ("ipv4: raw: reject IP_HDRINCL packets
with ihl < 5") in place, an ihl < 5 packet from a raw IP_HDRINCL
socket is rejected before it reaches the local-output path.
However, post-LOCAL_OUT hook mangling (nftables payload-set,
NFQUEUE reinject) can still rewrite the IPv4 header after the
raw_send_hdrinc validation has run and deliver an ihl < 5 packet
to ah_output().  Reachability of this path requires CAP_NET_ADMIN
in the relevant netns; that is a narrower class of callers than the
original CAP_NET_RAW path, but not an empty one.

Independently of the post-LOCAL_OUT mangling question, the AH
consumer should not contain a memcpy whose size is derived from
an attacker-influenced field without a floor.  Change the guard
to "top_iph->ihl > 5" at all three sites:

  - ah_output_done() (the .complete callback path)
  - ah_output()      (the synchronous options-copy site)
  - ah_output()      (the post-hash restore site)

Behavior for valid packets (ihl in {5, 6, ..., 15}) is unchanged.
For malformed packets with ihl < 5, the options copy is cleanly
skipped; the malformed field no longer becomes a huge memcpy
length.  This is the defense-in-depth half of the series; the
upstream sanity check in the preceding patch is the primary fix.
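The "unchanged for valid packets" claim can be checked exhaustively
over all 16 values of the 4-bit field.  In this sketch,
copies_options_old(), copies_options_new() and guard_disagreements()
are hypothetical names for the two guard conditions; nothing here
calls kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Guard before this patch: anything but the no-options case copies. */
static bool copies_options_old(unsigned int ihl)
{
	return ihl != 5;
}

/* Guard after this patch: copy only when options are actually present. */
static bool copies_options_new(unsigned int ihl)
{
	return ihl > 5;
}

/* Count ihl values (0..15) where the two guards disagree. */
static unsigned int guard_disagreements(void)
{
	unsigned int ihl, n = 0;

	for (ihl = 0; ihl <= 15; ihl++)
		if (copies_options_old(ihl) != copies_options_new(ihl))
			n++;
	return n;
}
```

The five disagreements are exactly ihl 0 through 4, the malformed
values; for ihl 5 through 15 the two guards agree.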

A mirror-pattern audit found no analogous bug in ah_input(),
ip_clear_mutable_options(), or net/ipv6/ah6.c (IPv6 has a
fixed-length header and no IP_HDRINCL equivalent for crafting an
ihl < 5 ipv6hdr).

Reproduced on UML + KASAN: kernel-mode fault at addr 0x0 with
memcpy_orig at the crash site on a pre-fix kernel.  The AH guard
was verified by forcing the same packets through xfrm: the xfrm
state counter incremented and no KASAN splat or panic occurred.
With the preceding patch in this series, the original raw
IP_HDRINCL path is rejected before AH.

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: stable@vger.kernel.org
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
---
 net/ipv4/ah4.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/ah4.c b/net/ipv4/ah4.c
index 4366cbac3f06..8fa31bdf9792 100644
--- a/net/ipv4/ah4.c
+++ b/net/ipv4/ah4.c
@@ -137,7 +137,7 @@ static void ah_output_done(void *data, int err)
 	top_iph->tos = iph->tos;
 	top_iph->ttl = iph->ttl;
 	top_iph->frag_off = iph->frag_off;
-	if (top_iph->ihl != 5) {
+	if (top_iph->ihl > 5) {
 		top_iph->daddr = iph->daddr;
 		memcpy(top_iph+1, iph+1, top_iph->ihl*4 - sizeof(struct iphdr));
 	}
@@ -197,7 +197,7 @@ static int ah_output(struct xfrm_state *x, struct sk_buff *skb)
 	iph->ttl = top_iph->ttl;
 	iph->frag_off = top_iph->frag_off;
 
-	if (top_iph->ihl != 5) {
+	if (top_iph->ihl > 5) {
 		iph->daddr = top_iph->daddr;
 		memcpy(iph+1, top_iph+1, top_iph->ihl*4 - sizeof(struct iphdr));
 		err = ip_clear_mutable_options(top_iph, &top_iph->daddr);
@@ -253,7 +253,7 @@ static int ah_output(struct xfrm_state *x, struct sk_buff *skb)
 	top_iph->tos = iph->tos;
 	top_iph->ttl = iph->ttl;
 	top_iph->frag_off = iph->frag_off;
-	if (top_iph->ihl != 5) {
+	if (top_iph->ihl > 5) {
 		top_iph->daddr = iph->daddr;
 		memcpy(top_iph+1, iph+1, top_iph->ihl*4 - sizeof(struct iphdr));
 	}
-- 
2.53.0

* Re: [PATCH net 0/2] ipv4: harden against ihl < 5 IP_HDRINCL packets
  2026-05-12 20:51 [PATCH net 0/2] ipv4: harden against ihl < 5 IP_HDRINCL packets Michael Bommarito
  2026-05-12 20:51 ` [PATCH net 1/2] ipv4: raw: reject IP_HDRINCL packets with ihl < 5 Michael Bommarito
  2026-05-12 20:51 ` [PATCH net 2/2] ipv4: ah: harden ah_output options-copy guard against " Michael Bommarito
@ 2026-05-12 22:34 ` Pablo Neira Ayuso
  2026-05-12 23:05   ` Michael Bommarito
  2 siblings, 1 reply; 5+ messages in thread
From: Pablo Neira Ayuso @ 2026-05-12 22:34 UTC
  To: Michael Bommarito
  Cc: Steffen Klassert, Herbert Xu, Eric Dumazet, netdev,
	David S . Miller, Jakub Kicinski, Paolo Abeni, Kuniyuki Iwashima,
	Maciej Zenczykowski, Kees Cook, Jeff Layton,
	Gustavo A . R . Silva, Florian Westphal, netfilter-devel,
	coreteam, linux-kernel

On Tue, May 12, 2026 at 04:51:13PM -0400, Michael Bommarito wrote:
[...]
> Open question for netfilter / netdev
> ------------------------------------
> 
> After patch 1/2 lands, a caller with CAP_NET_ADMIN can still
> deliver an ihl < 5 packet into the post-LOCAL_OUT in-stack path by
> attaching an nftables payload-set rule on NF_INET_LOCAL_OUT (or an
> NFQUEUE reinject on the same hook) that rewrites byte 0 of the
> IPv4 header after the raw_send_hdrinc / __ip_local_out validation
> has run.

There are possibly more ways to mangle ihl in the kernel in 2026, not
only NFQUEUE and nft_payload.

> Construction:
> 
>     nft add table ip mangle
>     nft add chain ip mangle output { type filter hook output \
>                                      priority -150 \; }
>     nft add rule ip mangle output ip daddr <victim> \
>                                   @nh,0,8 set 0x40
> 
> I reproduced this separately with nftables payload-set delivering an
> ihl = 0 packet to xfrm4_output() and onward.  Patch 2/2 covers the
> AH consumer; other consumers that read iph->ihl after the LOCAL_OUT
> hook may be similarly exposed and I have not enumerated them.
> 
> Direction question rather than a fix proposal: does basic iphdr
> re-sanitization after a header-mangling hook belong in the netfilter
> machinery, in each in-stack consumer, or both?

Your patches LGTM, are you suggesting more patches?

* Re: [PATCH net 0/2] ipv4: harden against ihl < 5 IP_HDRINCL packets
  2026-05-12 22:34 ` [PATCH net 0/2] ipv4: harden against ihl < 5 IP_HDRINCL packets Pablo Neira Ayuso
@ 2026-05-12 23:05   ` Michael Bommarito
  0 siblings, 0 replies; 5+ messages in thread
From: Michael Bommarito @ 2026-05-12 23:05 UTC
  To: Pablo Neira Ayuso
  Cc: Steffen Klassert, Herbert Xu, Eric Dumazet, netdev,
	David S . Miller, Jakub Kicinski, Paolo Abeni, Kuniyuki Iwashima,
	Maciej Zenczykowski, Kees Cook, Jeff Layton,
	Gustavo A . R . Silva, Florian Westphal, netfilter-devel,
	coreteam, linux-kernel

On Tue, May 12, 2026 at 6:34 PM Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> There are possibly more ways to mangle ihl in the kernel in 2026, not
> only NFQUEUE and nft_payload.

Yes, and there's a peer issue in the BEET IHL wrap that I fixed in 017ccd82092e.

In addition to a few other nft_* paths, my understanding is that tc,
NFQUEUE in userspace, eBPF, OVS, etc. will all be a problem unless we
guard in the IP stack itself.  But then if there are legitimate uses
of this path, we might cause regressions for people with complex rule
sets.  That's why Herbert suggested we should bring the issue here to
get feedback from the list broadly.

> Your patches LGTM, are you suggesting more patches?

I think the answer is yes either way: either A) a smaller patch set
in the IP layer that I can handle, if we go that route, or B) patches
distributed across people who know each of their subsystems better,
if we handle it in each subsystem.

Thanks,
Mike Bommarito
