[PATCH net v2] ipv6: Implement limits on extension header parsing

public inbox for netdev@vger.kernel.org

* [PATCH net v2] ipv6: Implement limits on extension header parsing
@ 2026-04-25  7:55 Daniel Borkmann
  2026-04-25 10:19 ` Justin Iurman
  0 siblings, 1 reply; 6+ messages in thread
From: Daniel Borkmann @ 2026-04-25  7:55 UTC
  To: kuba
  Cc: edumazet, dsahern, tom, willemdebruijn.kernel, idosch,
	justin.iurman, pabeni, netdev

ipv6_{skip_exthdr,find_hdr}() and ip6_{tnl_parse_tlv_enc_lim,
protocol_deliver_rcu}() iterate over IPv6 extension headers until they
find a non-extension-header protocol or run out of packet data. The
loops have no iteration counter, relying solely on the packet length
to bound them. For a crafted packet with 8-byte extension headers
filling a 64KB jumbogram, this means a worst case of up to ~8k
iterations with a skb_header_pointer call each. ipv6_skip_exthdr(),
for example, is used where it parses the inner quoted packet inside
an incoming ICMPv6 error:

  - icmpv6_rcv
    - checksum validation
    - case ICMPV6_DEST_UNREACH
      - icmpv6_notify
        - pskb_may_pull()       <- pull inner IPv6 header
        - ipv6_skip_exthdr()    <- iterates here
        - pskb_may_pull()
        - ipprot->err_handler() <- sk lookup

The per-iteration cost of ipv6_skip_exthdr itself is generally
light, but skb_header_pointer becomes more costly on reassembled
packets: the first ~1232 bytes of the inner packet are in the skb's
linear area, but the remaining ~63KB are in the frag_list where
skb_copy_bits is needed to read data.
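
As a rough illustration of the worst case above, the following is a
user-space model only, not kernel code. The helper walk_exthdrs() and
its parameters are hypothetical; it assumes a packet made up entirely
of minimum-size (8-byte) extension headers and shows how the iteration
count scales with packet length, and how a counter caps it:

```c
#include <stddef.h>

#define EXTHDR_MIN_LEN 8 /* smallest extension header: 8 bytes */

/*
 * Hypothetical user-space model, not the kernel implementation.
 * Walks a packet consisting only of 8-byte extension headers and
 * returns the number of headers visited. A max_hdrs of 0 means
 * "no limit", modelling a loop bounded solely by packet length.
 */
static unsigned int walk_exthdrs(size_t pkt_len, unsigned int max_hdrs)
{
	unsigned int cnt = 0;
	size_t off = 0;

	while (off + EXTHDR_MIN_LEN <= pkt_len) {
		/* With a limit set, stop once the budget is used up. */
		if (max_hdrs && cnt >= max_hdrs)
			break;
		cnt++;
		off += EXTHDR_MIN_LEN;
	}
	return cnt;
}
```

In this model, walk_exthdrs(65536, 0) visits 8192 headers, while
walk_exthdrs(65536, 8) stops after 8.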

Add a configurable limit via a new sysctl net.ipv6.max_ext_hdrs_number
(default 8, minimum 1). All four extension header walking functions
are bound by this limit. The sysctl is in line with commit 47d3d7ac656a
("ipv6: Implement limits on Hop-by-Hop and Destination options").
As documented, init_net is used to derive max_ext_hdrs_number to
be consistent given a net cannot always reliably be retrieved.

Note that the check in ip6_protocol_deliver_rcu() happens right
before the goto resubmit, such that we don't have to have a test
for ipv6_ext_hdr() in the fast-path.

There's an ongoing IETF draft, draft-iurman-6man-eh-occurrences, to
enforce IPv6 extension header ordering and occurrence. The draft also
discusses security implications. As per RFC8200 section 4.1, the
occurrence rules for extension headers provide a practical upper
bound, thus 8 was used as the default.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
---
 v1->v2:
   - Set the default to 8 (Justin)
   - Update IETF references (Justin)
   - Add core path coverage as well (Justin)

 Documentation/networking/ip-sysctl.rst |  7 +++++++
 include/net/dropreason-core.h          |  6 ++++++
 include/net/ipv6.h                     |  2 ++
 include/net/netns/ipv6.h               |  1 +
 net/ipv6/af_inet6.c                    |  1 +
 net/ipv6/exthdrs_core.c                | 11 +++++++++++
 net/ipv6/ip6_input.c                   |  6 ++++++
 net/ipv6/ip6_tunnel.c                  |  5 +++++
 net/ipv6/sysctl_net_ipv6.c             |  8 ++++++++
 9 files changed, 47 insertions(+)

diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst
index 2e3a746fcc6d..f7412f4049d1 100644
--- a/Documentation/networking/ip-sysctl.rst
+++ b/Documentation/networking/ip-sysctl.rst
@@ -2537,6 +2537,13 @@ max_hbh_length - INTEGER
 
 	Default: INT_MAX (unlimited)
 
+max_ext_hdrs_number - INTEGER
+	Maximum number of IPv6 extension headers allowed in a packet.
+	Limits how many extension headers will be traversed. The value
+	is read from the initial netns.
+
+	Default: 8
+
 skip_notify_on_dev_down - BOOLEAN
 	Controls whether an RTM_DELROUTE message is generated for routes
 	removed when a device is taken down or deleted. IPv4 does not
diff --git a/include/net/dropreason-core.h b/include/net/dropreason-core.h
index e0ca3904ff8e..1fd91e59b84e 100644
--- a/include/net/dropreason-core.h
+++ b/include/net/dropreason-core.h
@@ -99,6 +99,7 @@
 	FN(FRAG_TOO_FAR)		\
 	FN(TCP_MINTTL)			\
 	FN(IPV6_BAD_EXTHDR)		\
+	FN(IPV6_TOO_MANY_EXTHDRS)	\
 	FN(IPV6_NDISC_FRAG)		\
 	FN(IPV6_NDISC_HOP_LIMIT)	\
 	FN(IPV6_NDISC_BAD_CODE)		\
@@ -494,6 +495,11 @@ enum skb_drop_reason {
 	SKB_DROP_REASON_TCP_MINTTL,
 	/** @SKB_DROP_REASON_IPV6_BAD_EXTHDR: Bad IPv6 extension header. */
 	SKB_DROP_REASON_IPV6_BAD_EXTHDR,
+	/**
+	 * @SKB_DROP_REASON_IPV6_TOO_MANY_EXTHDRS: Number of IPv6 extension
+	 * headers in the packet exceeds net.ipv6.max_ext_hdrs_number.
+	 */
+	SKB_DROP_REASON_IPV6_TOO_MANY_EXTHDRS,
 	/** @SKB_DROP_REASON_IPV6_NDISC_FRAG: invalid frag (suppress_frag_ndisc). */
 	SKB_DROP_REASON_IPV6_NDISC_FRAG,
 	/** @SKB_DROP_REASON_IPV6_NDISC_HOP_LIMIT: invalid hop limit. */
diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index d042afe7a245..c540b750726e 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -90,6 +90,8 @@ struct ip_tunnel_info;
 #define IP6_DEFAULT_MAX_DST_OPTS_LEN	 INT_MAX /* No limit */
 #define IP6_DEFAULT_MAX_HBH_OPTS_LEN	 INT_MAX /* No limit */
 
+#define IP6_DEFAULT_MAX_EXT_HDRS_CNT	 8
+
 /*
  *	Addr type
  *	
diff --git a/include/net/netns/ipv6.h b/include/net/netns/ipv6.h
index 499e4288170f..2cea457bddb4 100644
--- a/include/net/netns/ipv6.h
+++ b/include/net/netns/ipv6.h
@@ -54,6 +54,7 @@ struct netns_sysctl_ipv6 {
 	int max_hbh_opts_cnt;
 	int max_dst_opts_len;
 	int max_hbh_opts_len;
+	int max_ext_hdrs_cnt;
 	int seg6_flowlabel;
 	u32 ioam6_id;
 	u64 ioam6_id_wide;
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 0a88b376141d..19424c3f2dfc 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -945,6 +945,7 @@ static int __net_init inet6_net_init(struct net *net)
 	net->ipv6.sysctl.flowlabel_state_ranges = 0;
 	net->ipv6.sysctl.max_dst_opts_cnt = IP6_DEFAULT_MAX_DST_OPTS_CNT;
 	net->ipv6.sysctl.max_hbh_opts_cnt = IP6_DEFAULT_MAX_HBH_OPTS_CNT;
+	net->ipv6.sysctl.max_ext_hdrs_cnt = IP6_DEFAULT_MAX_EXT_HDRS_CNT;
 	net->ipv6.sysctl.max_dst_opts_len = IP6_DEFAULT_MAX_DST_OPTS_LEN;
 	net->ipv6.sysctl.max_hbh_opts_len = IP6_DEFAULT_MAX_HBH_OPTS_LEN;
 	net->ipv6.sysctl.fib_notify_on_flag_change = 0;
diff --git a/net/ipv6/exthdrs_core.c b/net/ipv6/exthdrs_core.c
index 49e31e4ae7b7..9df892e7f7fb 100644
--- a/net/ipv6/exthdrs_core.c
+++ b/net/ipv6/exthdrs_core.c
@@ -4,6 +4,8 @@
  * not configured or static.
  */
 #include <linux/export.h>
+
+#include <net/net_namespace.h>
 #include <net/ipv6.h>
 
 /*
@@ -72,7 +74,9 @@ EXPORT_SYMBOL(ipv6_ext_hdr);
 int ipv6_skip_exthdr(const struct sk_buff *skb, int start, u8 *nexthdrp,
 		     __be16 *frag_offp)
 {
+	int exthdr_max = READ_ONCE(init_net.ipv6.sysctl.max_ext_hdrs_cnt);
 	u8 nexthdr = *nexthdrp;
+	int exthdr_cnt = 0;
 
 	*frag_offp = 0;
 
@@ -82,6 +86,8 @@ int ipv6_skip_exthdr(const struct sk_buff *skb, int start, u8 *nexthdrp,
 
 		if (nexthdr == NEXTHDR_NONE)
 			return -1;
+		if (unlikely(exthdr_cnt++ >= exthdr_max))
+			return -1;
 		hp = skb_header_pointer(skb, start, sizeof(_hdr), &_hdr);
 		if (!hp)
 			return -1;
@@ -188,8 +194,10 @@ EXPORT_SYMBOL_GPL(ipv6_find_tlv);
 int ipv6_find_hdr(const struct sk_buff *skb, unsigned int *offset,
 		  int target, unsigned short *fragoff, int *flags)
 {
+	int exthdr_max = READ_ONCE(init_net.ipv6.sysctl.max_ext_hdrs_cnt);
 	unsigned int start = skb_network_offset(skb) + sizeof(struct ipv6hdr);
 	u8 nexthdr = ipv6_hdr(skb)->nexthdr;
+	int exthdr_cnt = 0;
 	bool found;
 
 	if (fragoff)
@@ -216,6 +224,9 @@ int ipv6_find_hdr(const struct sk_buff *skb, unsigned int *offset,
 			return -ENOENT;
 		}
 
+		if (unlikely(exthdr_cnt++ >= exthdr_max))
+			return -EBADMSG;
+
 		hp = skb_header_pointer(skb, start, sizeof(_hdr), &_hdr);
 		if (!hp)
 			return -EBADMSG;
diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
index 967b07aeb683..a5bbbc16e8d7 100644
--- a/net/ipv6/ip6_input.c
+++ b/net/ipv6/ip6_input.c
@@ -403,8 +403,10 @@ INDIRECT_CALLABLE_DECLARE(int tcp_v6_rcv(struct sk_buff *));
 void ip6_protocol_deliver_rcu(struct net *net, struct sk_buff *skb, int nexthdr,
 			      bool have_final)
 {
+	int exthdr_max = READ_ONCE(init_net.ipv6.sysctl.max_ext_hdrs_cnt);
 	const struct inet6_protocol *ipprot;
 	struct inet6_dev *idev;
+	int exthdr_cnt = 0;
 	unsigned int nhoff;
 	SKB_DR(reason);
 	bool raw;
@@ -487,6 +489,10 @@ void ip6_protocol_deliver_rcu(struct net *net, struct sk_buff *skb, int nexthdr,
 				nexthdr = ret;
 				goto resubmit_final;
 			} else {
+				if (unlikely(exthdr_cnt++ >= exthdr_max)) {
+					SKB_DR_SET(reason, IPV6_TOO_MANY_EXTHDRS);
+					goto discard;
+				}
 				goto resubmit;
 			}
 		} else if (ret == 0) {
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index c468c83af0f2..4546a60942ab 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -395,15 +395,20 @@ ip6_tnl_dev_uninit(struct net_device *dev)
 
 __u16 ip6_tnl_parse_tlv_enc_lim(struct sk_buff *skb, __u8 *raw)
 {
+	int exthdr_max = READ_ONCE(init_net.ipv6.sysctl.max_ext_hdrs_cnt);
 	const struct ipv6hdr *ipv6h = (const struct ipv6hdr *)raw;
 	unsigned int nhoff = raw - skb->data;
 	unsigned int off = nhoff + sizeof(*ipv6h);
 	u8 nexthdr = ipv6h->nexthdr;
+	int exthdr_cnt = 0;
 
 	while (ipv6_ext_hdr(nexthdr) && nexthdr != NEXTHDR_NONE) {
 		struct ipv6_opt_hdr *hdr;
 		u16 optlen;
 
+		if (unlikely(exthdr_cnt++ >= exthdr_max))
+			break;
+
 		if (!pskb_may_pull(skb, off + sizeof(*hdr)))
 			break;
 
diff --git a/net/ipv6/sysctl_net_ipv6.c b/net/ipv6/sysctl_net_ipv6.c
index d2cd33e2698d..93f865545a7c 100644
--- a/net/ipv6/sysctl_net_ipv6.c
+++ b/net/ipv6/sysctl_net_ipv6.c
@@ -135,6 +135,14 @@ static struct ctl_table ipv6_table_template[] = {
 		.extra1		= SYSCTL_ZERO,
 		.extra2		= &flowlabel_reflect_max,
 	},
+	{
+		.procname	= "max_ext_hdrs_number",
+		.data		= &init_net.ipv6.sysctl.max_ext_hdrs_cnt,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= SYSCTL_ONE,
+	},
 	{
 		.procname	= "max_dst_opts_number",
 		.data		= &init_net.ipv6.sysctl.max_dst_opts_cnt,
-- 
2.43.0



* Re: [PATCH net v2] ipv6: Implement limits on extension header parsing
  2026-04-25  7:55 [PATCH net v2] ipv6: Implement limits on extension header parsing Daniel Borkmann
@ 2026-04-25 10:19 ` Justin Iurman
  2026-04-26 10:38   ` Daniel Borkmann
  0 siblings, 1 reply; 6+ messages in thread
From: Justin Iurman @ 2026-04-25 10:19 UTC
  To: Daniel Borkmann, kuba
  Cc: edumazet, dsahern, tom, willemdebruijn.kernel, idosch, pabeni,
	netdev

On 4/25/26 09:55, Daniel Borkmann wrote:
> ipv6_{skip_exthdr,find_hdr}() and ip6_{tnl_parse_tlv_enc_lim,
> protocol_deliver_rcu}() iterate over IPv6 extension headers until they
> find a non-extension-header protocol or run out of packet data. The
> loops have no iteration counter, relying solely on the packet length
> to bound them. For a crafted packet with 8-byte extension headers
> filling a 64KB jumbogram, this means a worst case of up to ~8k
> iterations with a skb_header_pointer call each. ipv6_skip_exthdr(),
> for example, is used where it parses the inner quoted packet inside
> an incoming ICMPv6 error:
> 
>    - icmpv6_rcv
>      - checksum validation
>      - case ICMPV6_DEST_UNREACH
>        - icmpv6_notify
>          - pskb_may_pull()       <- pull inner IPv6 header
>          - ipv6_skip_exthdr()    <- iterates here
>          - pskb_may_pull()
>          - ipprot->err_handler() <- sk lookup
> 
> The per-iteration cost of ipv6_skip_exthdr itself is generally
> light, but skb_header_pointer becomes more costly on reassembled
> packets: the first ~1232 bytes of the inner packet are in the skb's
> linear area, but the remaining ~63KB are in the frag_list where
> skb_copy_bits is needed to read data.
> 
> Add a configurable limit via a new sysctl net.ipv6.max_ext_hdrs_number
> (default 8, minimum 1). All four extension header walking functions
> are bound by this limit. The sysctl is in line with commit 47d3d7ac656a
> ("ipv6: Implement limits on Hop-by-Hop and Destination options").
> As documented, init_net is used to derive max_ext_hdrs_number to
> be consistent given a net cannot always reliably be retrieved.
> 
> Note that the check in ip6_protocol_deliver_rcu() happens right
> before the goto resubmit, such that we don't have to have a test
> for ipv6_ext_hdr() in the fast-path.
> 
> There's an ongoing IETF draft-iurman-6man-eh-occurrences to enforce
> IPv6 extension headers ordering and occurrence. The latter also
> discusses security implications. As per RFC8200 section 4.1, the
> occurrence rules for extension headers provide a practical upper
> bound, thus 8 was used as the default.
> 
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> ---
>   v1->v2:
>     - Set the default to 8 (Justin)
>     - Update IETF references (Justin)
>     - Add core path coverage as well (Justin)
> 
>   Documentation/networking/ip-sysctl.rst |  7 +++++++
>   include/net/dropreason-core.h          |  6 ++++++
>   include/net/ipv6.h                     |  2 ++
>   include/net/netns/ipv6.h               |  1 +
>   net/ipv6/af_inet6.c                    |  1 +
>   net/ipv6/exthdrs_core.c                | 11 +++++++++++
>   net/ipv6/ip6_input.c                   |  6 ++++++
>   net/ipv6/ip6_tunnel.c                  |  5 +++++
>   net/ipv6/sysctl_net_ipv6.c             |  8 ++++++++
>   9 files changed, 47 insertions(+)
> 

[snip]

> diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
> index 967b07aeb683..a5bbbc16e8d7 100644
> --- a/net/ipv6/ip6_input.c
> +++ b/net/ipv6/ip6_input.c
> @@ -403,8 +403,10 @@ INDIRECT_CALLABLE_DECLARE(int tcp_v6_rcv(struct sk_buff *));
>   void ip6_protocol_deliver_rcu(struct net *net, struct sk_buff *skb, int nexthdr,
>   			      bool have_final)
>   {
> +	int exthdr_max = READ_ONCE(init_net.ipv6.sysctl.max_ext_hdrs_cnt);
>   	const struct inet6_protocol *ipprot;
>   	struct inet6_dev *idev;
> +	int exthdr_cnt = 0;
>   	unsigned int nhoff;
>   	SKB_DR(reason);
>   	bool raw;
> @@ -487,6 +489,10 @@ void ip6_protocol_deliver_rcu(struct net *net, struct sk_buff *skb, int nexthdr,
>   				nexthdr = ret;
>   				goto resubmit_final;
>   			} else {
> +				if (unlikely(exthdr_cnt++ >= exthdr_max)) {
> +					SKB_DR_SET(reason, IPV6_TOO_MANY_EXTHDRS);
> +					goto discard;
> +				}
>   				goto resubmit;
>   			}
>   		} else if (ret == 0) {

The hop-by-hop options header (if present) is not taken into account 
based on the above. However, the max number of extension headers 
(implicitly 7***, as per RFC 8200 Section 4.1) must include it. I 
suggest adding this at the beginning of ip6_protocol_deliver_rcu():

struct inet6_skb_parm *opt = IP6CB(skb);

if (opt->flags & IP6SKB_HOPBYHOP)
	exthdr_cnt++;

*** FYI, rounding to 8 is fine for this fix
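
To make the effect of the suggestion concrete, here is a small
user-space model of the check the patch adds (exthdr_cnt++ >=
exthdr_max). It is only a sketch: accepted_hdrs() and the hbh_seen
parameter are hypothetical names, with hbh_seen standing in for
IP6SKB_HOPBYHOP being set in IP6CB(skb)->flags.

```c
#include <stdbool.h>

/*
 * Hypothetical model, not kernel code. Returns how many of n_hdrs
 * extension headers pass the post-increment check before the packet
 * would be dropped. When hbh_seen is true, the counter starts at 1
 * to account for a hop-by-hop header parsed before the loop.
 */
static int accepted_hdrs(int n_hdrs, int exthdr_max, bool hbh_seen)
{
	int exthdr_cnt = hbh_seen ? 1 : 0;
	int accepted = 0;

	for (int i = 0; i < n_hdrs; i++) {
		/* Same shape as the check in the patch. */
		if (exthdr_cnt++ >= exthdr_max)
			break;
		accepted++;
	}
	return accepted;
}
```

With a limit of 8, this model accepts 8 headers in the loop when no
hop-by-hop header was seen, and 7 when one was, so the total stays at 8.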

> diff --git a/net/ipv6/sysctl_net_ipv6.c b/net/ipv6/sysctl_net_ipv6.c
> index d2cd33e2698d..93f865545a7c 100644
> --- a/net/ipv6/sysctl_net_ipv6.c
> +++ b/net/ipv6/sysctl_net_ipv6.c
> @@ -135,6 +135,14 @@ static struct ctl_table ipv6_table_template[] = {
>   		.extra1		= SYSCTL_ZERO,
>   		.extra2		= &flowlabel_reflect_max,
>   	},
> +	{
> +		.procname	= "max_ext_hdrs_number",
> +		.data		= &init_net.ipv6.sysctl.max_ext_hdrs_cnt,
> +		.maxlen		= sizeof(int),
> +		.mode		= 0644,
> +		.proc_handler	= proc_dointvec_minmax,
> +		.extra1		= SYSCTL_ONE,
> +	},
>   	{
>   		.procname	= "max_dst_opts_number",
>   		.data		= &init_net.ipv6.sysctl.max_dst_opts_cnt,

I've given it a lot of thought and came to the conclusion that we should
use a hard-coded value here as well (just like we did for 076b8cad77aa,
with the same logic), not a sysctl. IMO, the main reason is that, as is,
it provides a suitable security fix to be backported, i.e., the max
value is the max number of EHs allowed by RFC 8200, Section 4.1. Also,
we remain consistent with draft-iurman-6man-eh-occurrences (I think Tom
is about to send a revision of the series soon for net-next). That
series not only enforces ordering, but also verifies the specific number
of occurrences for each type of Extension Header. This is totally
compatible with what this patch does, i.e., limiting the total number of
Extension Headers (regardless of their types) to 8. I guess what I'm
trying to say is that it seems like a good plan/compromise and that the
aforementioned series would build perfectly on top of this fix.


* Re: [PATCH net v2] ipv6: Implement limits on extension header parsing
  2026-04-25 10:19 ` Justin Iurman
@ 2026-04-26 10:38   ` Daniel Borkmann
  2026-04-26 10:56     ` saeed bishara
  2026-04-26 13:17     ` Ido Schimmel
  0 siblings, 2 replies; 6+ messages in thread
From: Daniel Borkmann @ 2026-04-26 10:38 UTC
  To: Justin Iurman, kuba
  Cc: edumazet, dsahern, tom, willemdebruijn.kernel, idosch, pabeni,
	netdev

Hi Justin,

On 4/25/26 12:19 PM, Justin Iurman wrote:
> On 4/25/26 09:55, Daniel Borkmann wrote:
>> ipv6_{skip_exthdr,find_hdr}() and ip6_{tnl_parse_tlv_enc_lim,
>> protocol_deliver_rcu}() iterate over IPv6 extension headers until they
>> find a non-extension-header protocol or run out of packet data. The
>> loops have no iteration counter, relying solely on the packet length
>> to bound them. For a crafted packet with 8-byte extension headers
>> filling a 64KB jumbogram, this means a worst case of up to ~8k
>> iterations with a skb_header_pointer call each. ipv6_skip_exthdr(),
>> for example, is used where it parses the inner quoted packet inside
>> an incoming ICMPv6 error:
>>
>>    - icmpv6_rcv
>>      - checksum validation
>>      - case ICMPV6_DEST_UNREACH
>>        - icmpv6_notify
>>          - pskb_may_pull()       <- pull inner IPv6 header
>>          - ipv6_skip_exthdr()    <- iterates here
>>          - pskb_may_pull()
>>          - ipprot->err_handler() <- sk lookup
>>
>> The per-iteration cost of ipv6_skip_exthdr itself is generally
>> light, but skb_header_pointer becomes more costly on reassembled
>> packets: the first ~1232 bytes of the inner packet are in the skb's
>> linear area, but the remaining ~63KB are in the frag_list where
>> skb_copy_bits is needed to read data.
>>
>> Add a configurable limit via a new sysctl net.ipv6.max_ext_hdrs_number
>> (default 8, minimum 1). All four extension header walking functions
>> are bound by this limit. The sysctl is in line with commit 47d3d7ac656a
>> ("ipv6: Implement limits on Hop-by-Hop and Destination options").
>> As documented, init_net is used to derive max_ext_hdrs_number to
>> be consistent given a net cannot always reliably be retrieved.
>>
>> Note that the check in ip6_protocol_deliver_rcu() happens right
>> before the goto resubmit, such that we don't have to have a test
>> for ipv6_ext_hdr() in the fast-path.
>>
>> There's an ongoing IETF draft-iurman-6man-eh-occurrences to enforce
>> IPv6 extension headers ordering and occurrence. The latter also
>> discusses security implications. As per RFC8200 section 4.1, the
>> occurrence rules for extension headers provide a practical upper
>> bound, thus 8 was used as the default.
>>
>> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
>> ---
>>   v1->v2:
>>     - Set the default to 8 (Justin)
>>     - Update IETF references (Justin)
>>     - Add core path coverage as well (Justin)
>>
>>   Documentation/networking/ip-sysctl.rst |  7 +++++++
>>   include/net/dropreason-core.h          |  6 ++++++
>>   include/net/ipv6.h                     |  2 ++
>>   include/net/netns/ipv6.h               |  1 +
>>   net/ipv6/af_inet6.c                    |  1 +
>>   net/ipv6/exthdrs_core.c                | 11 +++++++++++
>>   net/ipv6/ip6_input.c                   |  6 ++++++
>>   net/ipv6/ip6_tunnel.c                  |  5 +++++
>>   net/ipv6/sysctl_net_ipv6.c             |  8 ++++++++
>>   9 files changed, 47 insertions(+)
>>
> 
> [snip]
> 
>> diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
>> index 967b07aeb683..a5bbbc16e8d7 100644
>> --- a/net/ipv6/ip6_input.c
>> +++ b/net/ipv6/ip6_input.c
>> @@ -403,8 +403,10 @@ INDIRECT_CALLABLE_DECLARE(int tcp_v6_rcv(struct sk_buff *));
>>   void ip6_protocol_deliver_rcu(struct net *net, struct sk_buff *skb, int nexthdr,
>>                     bool have_final)
>>   {
>> +    int exthdr_max = READ_ONCE(init_net.ipv6.sysctl.max_ext_hdrs_cnt);
>>       const struct inet6_protocol *ipprot;
>>       struct inet6_dev *idev;
>> +    int exthdr_cnt = 0;
>>       unsigned int nhoff;
>>       SKB_DR(reason);
>>       bool raw;
>> @@ -487,6 +489,10 @@ void ip6_protocol_deliver_rcu(struct net *net, struct sk_buff *skb, int nexthdr,
>>                   nexthdr = ret;
>>                   goto resubmit_final;
>>               } else {
>> +                if (unlikely(exthdr_cnt++ >= exthdr_max)) {
>> +                    SKB_DR_SET(reason, IPV6_TOO_MANY_EXTHDRS);
>> +                    goto discard;
>> +                }
>>                   goto resubmit;
>>               }
>>           } else if (ret == 0) {
> 
> The hop-by-hop options header (if present) is not taken into account based on the above. However, the max number of extension headers (implicitly 7***, as per RFC 8200 Section 4.1) must include it. I suggest adding this at the beginning of ip6_protocol_deliver_rcu():
> 
> struct inet6_skb_parm *opt = IP6CB(skb);
> 
> if (opt->flags & IP6SKB_HOPBYHOP)
>      exthdr_cnt++;
> 
> *** FYI, rounding to 8 is fine for this fix

Ok, ack, I'll look into adding that in a v3.

>> diff --git a/net/ipv6/sysctl_net_ipv6.c b/net/ipv6/sysctl_net_ipv6.c
>> index d2cd33e2698d..93f865545a7c 100644
>> --- a/net/ipv6/sysctl_net_ipv6.c
>> +++ b/net/ipv6/sysctl_net_ipv6.c
>> @@ -135,6 +135,14 @@ static struct ctl_table ipv6_table_template[] = {
>>           .extra1        = SYSCTL_ZERO,
>>           .extra2        = &flowlabel_reflect_max,
>>       },
>> +    {
>> +        .procname    = "max_ext_hdrs_number",
>> +        .data        = &init_net.ipv6.sysctl.max_ext_hdrs_cnt,
>> +        .maxlen        = sizeof(int),
>> +        .mode        = 0644,
>> +        .proc_handler    = proc_dointvec_minmax,
>> +        .extra1        = SYSCTL_ONE,
>> +    },
>>       {
>>           .procname    = "max_dst_opts_number",
>>           .data        = &init_net.ipv6.sysctl.max_dst_opts_cnt,
> 
> I've given it a lot of thought. I came to the conclusion that we should use a hard-coded value here as well (just like we did for 076b8cad77aa, with the same logic), not a sysctl. IMO, the main reason is that it provides as is a suitable security fix to be backported, i.e., the max value is the max number of EHs allowed by RFC 8200, Section 4.1. Also, we remain consistent with draft-iurman-6man-eh-occurrences (I think Tom is about to send a revision of the series soon for net-next). What this series does is not only enforcing ordering, but also verifying the specific number of occurrences for each type of Extension Header. Which is totally compatible with what this patch does, i.e., limiting the total number of Extension Headers (regardless of their types) to 8. I guess what I'm trying to say is that it seems like a good plan/compromise and that the aforementioned series would build perfectly on top of this fix.


Initially, I had a hard-coded constant (when it was still 32), but Eric's comment
was to rather go with a sysctl, so that if someone unexpectedly complains, there
is still a chance for that person to fix it up via sysctl without having to
rebuild the kernel. I'm okay either way, but given that we're now being more
"aggressive" by lowering the default to 8 rather than 32, having such a
fallback is probably better.

Thanks,
Daniel


* Re: [PATCH net v2] ipv6: Implement limits on extension header parsing
  2026-04-26 10:38   ` Daniel Borkmann
@ 2026-04-26 10:56     ` saeed bishara
  2026-04-26 13:17     ` Ido Schimmel
  1 sibling, 0 replies; 6+ messages in thread
From: saeed bishara @ 2026-04-26 10:56 UTC
  To: Daniel Borkmann
  Cc: Justin Iurman, kuba, edumazet, dsahern, tom,
	willemdebruijn.kernel, idosch, pabeni, netdev

On Sun, Apr 26, 2026 at 1:38 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> Hi Justin,
>
> On 4/25/26 12:19 PM, Justin Iurman wrote:
> > On 4/25/26 09:55, Daniel Borkmann wrote:
> >> ipv6_{skip_exthdr,find_hdr}() and ip6_{tnl_parse_tlv_enc_lim,
> >> protocol_deliver_rcu}() iterate over IPv6 extension headers until they
> >> find a non-extension-header protocol or run out of packet data. The
> >> loops have no iteration counter, relying solely on the packet length
> >> to bound them. For a crafted packet with 8-byte extension headers
> >> filling a 64KB jumbogram, this means a worst case of up to ~8k
> >> iterations with a skb_header_pointer call each. ipv6_skip_exthdr(),
> >> for example, is used where it parses the inner quoted packet inside
> >> an incoming ICMPv6 error:
> >>
> >>    - icmpv6_rcv
> >>      - checksum validation
> >>      - case ICMPV6_DEST_UNREACH
> >>        - icmpv6_notify
> >>          - pskb_may_pull()       <- pull inner IPv6 header
> >>          - ipv6_skip_exthdr()    <- iterates here
> >>          - pskb_may_pull()
> >>          - ipprot->err_handler() <- sk lookup
> >>
> >> The per-iteration cost of ipv6_skip_exthdr itself is generally
> >> light, but skb_header_pointer becomes more costly on reassembled
> >> packets: the first ~1232 bytes of the inner packet are in the skb's
> >> linear area, but the remaining ~63KB are in the frag_list where
> >> skb_copy_bits is needed to read data.
> >>
> >> Add a configurable limit via a new sysctl net.ipv6.max_ext_hdrs_number
> >> (default 8, minimum 1). All four extension header walking functions
> >> are bound by this limit. The sysctl is in line with commit 47d3d7ac656a
> >> ("ipv6: Implement limits on Hop-by-Hop and Destination options").
> >> As documented, init_net is used to derive max_ext_hdrs_number to
> >> be consistent given a net cannot always reliably be retrieved.
> >>
> >> Note that the check in ip6_protocol_deliver_rcu() happens right
> >> before the goto resubmit, such that we don't have to have a test
> >> for ipv6_ext_hdr() in the fast-path.
> >>
> >> There's an ongoing IETF draft-iurman-6man-eh-occurrences to enforce
> >> IPv6 extension headers ordering and occurrence. The latter also
> >> discusses security implications. As per RFC8200 section 4.1, the
> >> occurrence rules for extension headers provide a practical upper
> >> bound, thus 8 was used as the default.
> >>
> >> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> >> ---
> >>   v1->v2:
> >>     - Set the default to 8 (Justin)
> >>     - Update IETF references (Justin)
> >>     - Add core path coverage as well (Justin)
> >>
> >>   Documentation/networking/ip-sysctl.rst |  7 +++++++
> >>   include/net/dropreason-core.h          |  6 ++++++
> >>   include/net/ipv6.h                     |  2 ++
> >>   include/net/netns/ipv6.h               |  1 +
> >>   net/ipv6/af_inet6.c                    |  1 +
> >>   net/ipv6/exthdrs_core.c                | 11 +++++++++++
> >>   net/ipv6/ip6_input.c                   |  6 ++++++
> >>   net/ipv6/ip6_tunnel.c                  |  5 +++++
> >>   net/ipv6/sysctl_net_ipv6.c             |  8 ++++++++
> >>   9 files changed, 47 insertions(+)
> >>
> >
> > [snip]
> >
> >> diff --git a/net/ipv6/ip6_input.c b/net/ipv6/ip6_input.c
> >> index 967b07aeb683..a5bbbc16e8d7 100644
> >> --- a/net/ipv6/ip6_input.c
> >> +++ b/net/ipv6/ip6_input.c
> >> @@ -403,8 +403,10 @@ INDIRECT_CALLABLE_DECLARE(int tcp_v6_rcv(struct sk_buff *));
> >>   void ip6_protocol_deliver_rcu(struct net *net, struct sk_buff *skb, int nexthdr,
> >>                     bool have_final)
> >>   {
> >> +    int exthdr_max = READ_ONCE(init_net.ipv6.sysctl.max_ext_hdrs_cnt);
> >>       const struct inet6_protocol *ipprot;
> >>       struct inet6_dev *idev;
> >> +    int exthdr_cnt = 0;
> >>       unsigned int nhoff;
> >>       SKB_DR(reason);
> >>       bool raw;
> >> @@ -487,6 +489,10 @@ void ip6_protocol_deliver_rcu(struct net *net, struct sk_buff *skb, int nexthdr,
> >>                   nexthdr = ret;
> >>                   goto resubmit_final;
> >>               } else {
> >> +                if (unlikely(exthdr_cnt++ >= exthdr_max)) {

From a performance perspective, isn't it better to have a single variable
that is initialized to max_ext_hdrs_cnt and then decremented until it
reaches zero? That would take fewer CPU cycles and variables.

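A minimal sketch of that single-variable idea, as a user-space model
rather than kernel code (accepted_hdrs_countdown() and budget are
hypothetical names; whether this actually saves cycles has not been
measured here):

```c
/*
 * Hypothetical model, not kernel code. The budget starts at the
 * configured maximum and is decremented once per extension header.
 * Returns how many of n_hdrs headers are accepted before the
 * budget runs out.
 */
static int accepted_hdrs_countdown(int n_hdrs, int exthdr_max)
{
	int budget = exthdr_max;
	int accepted = 0;

	for (int i = 0; i < n_hdrs; i++) {
		/* Budget exhausted: the packet would be dropped here. */
		if (budget-- <= 0)
			break;
		accepted++;
	}
	return accepted;
}
```

For a limit of 8, this accepts the same 8 headers as the count-up
check in the patch.
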
> >> +                    SKB_DR_SET(reason, IPV6_TOO_MANY_EXTHDRS);
> >> +                    goto discard;
> >> +                }
> >>                   goto resubmit;
> >>               }
> >>           } else if (ret == 0) {
> >
> > The hop-by-hop options header (if present) is not taken into account based on the above. However, the max number of extension headers (implicitly 7***, as per RFC 8200 Section 4.1) must include it. I suggest adding this at the beginning of ip6_protocol_deliver_rcu():
> >
> > struct inet6_skb_parm *opt = IP6CB(skb);
> >
> > if (opt->flags & IP6SKB_HOPBYHOP)
> >      exthdr_cnt++;
> >
> > *** FYI, rounding to 8 is fine for this fix
>
> Ok, ack, I'll look into adding that in a v3.
>
> >> diff --git a/net/ipv6/sysctl_net_ipv6.c b/net/ipv6/sysctl_net_ipv6.c
> >> index d2cd33e2698d..93f865545a7c 100644
> >> --- a/net/ipv6/sysctl_net_ipv6.c
> >> +++ b/net/ipv6/sysctl_net_ipv6.c
> >> @@ -135,6 +135,14 @@ static struct ctl_table ipv6_table_template[] = {
> >>           .extra1        = SYSCTL_ZERO,
> >>           .extra2        = &flowlabel_reflect_max,
> >>       },
> >> +    {
> >> +        .procname    = "max_ext_hdrs_number",
> >> +        .data        = &init_net.ipv6.sysctl.max_ext_hdrs_cnt,
> >> +        .maxlen        = sizeof(int),
> >> +        .mode        = 0644,
> >> +        .proc_handler    = proc_dointvec_minmax,
> >> +        .extra1        = SYSCTL_ONE,
> >> +    },
> >>       {
> >>           .procname    = "max_dst_opts_number",
> >>           .data        = &init_net.ipv6.sysctl.max_dst_opts_cnt,
> >
> > I've given it a lot of thought. I came to the conclusion that we should use a hard-coded value here as well (just like we did for 076b8cad77aa, with the same logic), not a sysctl. IMO, the main reason is that it provides as is a suitable security fix to be backported, i.e., the max value is the max number of EHs allowed by RFC 8200, Section 4.1. Also, we remain consistent with draft-iurman-6man-eh-occurrences (I think Tom is about to send a revision of the series soon for net-next). What this series does is not only enforcing ordering, but also verifying the specific number of occurrences for each type of Extension Header. Which is totally compatible with what this patch does, i.e., limiting the total number of Extension Headers (regardless of their types) to 8. I guess what I'm trying to say is that it seems like a good plan/compromise and that the aforementioned series would build perfectly on top of this fix.
>
>
> Initially, I had a hard-coded constant (when it was still 32), but Eric's comment
> was to rather go with a sysctl, such that if someone unexpectedly complains, then
> there is still a chance for that person to fix it up via sysctl without having to
> rebuild the kernel. I'm okay either way, but presumably given we're now being more
> "aggressive" into lowering the default to 8 rather than 32 then having such a fall-
> back is probably better.
>
> Thanks,
> Daniel
>


* Re: [PATCH net v2] ipv6: Implement limits on extension header parsing
  2026-04-26 10:38   ` Daniel Borkmann
  2026-04-26 10:56     ` saeed bishara
@ 2026-04-26 13:17     ` Ido Schimmel
  2026-04-26 15:47       ` Justin Iurman
  1 sibling, 1 reply; 6+ messages in thread
From: Ido Schimmel @ 2026-04-26 13:17 UTC
  To: Daniel Borkmann, tom, justin.iurman
  Cc: Justin Iurman, kuba, edumazet, dsahern, tom,
	willemdebruijn.kernel, pabeni, netdev

On Sun, Apr 26, 2026 at 12:38:31PM +0200, Daniel Borkmann wrote:
> On 4/25/26 12:19 PM, Justin Iurman wrote:
> > I've given it a lot of thought. I came to the conclusion that we
> > should use a hard-coded value here as well (just like we did for
> > 076b8cad77aa, with the same logic), not a sysctl. IMO, the main
> > reason is that it provides as is a suitable security fix to be
> > backported, i.e., the max value is the max number of EHs allowed by
> > RFC 8200, Section 4.1. Also, we remain consistent with
> > draft-iurman-6man-eh-occurrences (I think Tom is about to send a
> > revision of the series soon for net-next). What this series does is
> > not only enforcing ordering, but also verifying the specific number
> > of occurrences for each type of Extension Header. Which is totally
> > compatible with what this patch does, i.e., limiting the total
> > number of Extension Headers (regardless of their types) to 8. I
> > guess what I'm trying to say is that it seems like a good
> > plan/compromise and that the aforementioned series would build
> > perfectly on top of this fix.
> 
> Initially, I had a hard-coded constant (when it was still 32), but Eric's comment
> was to rather go with a sysctl, such that if someone unexpectedly complains, then
> there is still a chance for that person to fix it up via sysctl without having to
> rebuild the kernel. I'm okay either way, but presumably, given that we're now
> being more "aggressive" in lowering the default to 8 rather than 32, having such
> a fallback is probably better.

I also think that 32 without a sysctl knob is fine (just so that we have
some upper bound), but if we go with a sysctl then let's make sure that
it's compatible with Tom's series [1] (I assume he is going to send a
new version).

AFAICT it's possible to create a conflicting configuration with the two
sysctls (e.g., "enforce_ext_hdr_order" is set to 1 and
"max_ext_hdrs_number" is configured to less than 8). The documentation
should make the relation between the two sysctls clear to users. It can
also mention that "max_ext_hdrs_number" might be useful when users are
forced to turn "enforce_ext_hdr_order" off when dealing with hosts that
send extension headers in an unexpected order. That way, they still have
an upper bound on the maximum number of extension headers.

[1] https://lore.kernel.org/netdev/20260314175124.47010-1-tom@herbertland.com/

* Re: [PATCH net v2] ipv6: Implement limits on extension header parsing
  2026-04-26 13:17     ` Ido Schimmel
@ 2026-04-26 15:47       ` Justin Iurman
  0 siblings, 0 replies; 6+ messages in thread
From: Justin Iurman @ 2026-04-26 15:47 UTC (permalink / raw)
  To: Ido Schimmel, Daniel Borkmann, tom
  Cc: kuba, edumazet, dsahern, willemdebruijn.kernel, pabeni, netdev

On 4/26/26 15:17, Ido Schimmel wrote:
> On Sun, Apr 26, 2026 at 12:38:31PM +0200, Daniel Borkmann wrote:
>> On 4/25/26 12:19 PM, Justin Iurman wrote:
>>> I've given it a lot of thought. I came to the conclusion that we
>>> should use a hard-coded value here as well (just like we did for
>>> 076b8cad77aa, with the same logic), not a sysctl. IMO, the main
>>> reason is that it provides as is a suitable security fix to be
>>> backported, i.e., the max value is the max number of EHs allowed by
>>> RFC 8200, Section 4.1. Also, we remain consistent with
>>> draft-iurman-6man-eh-occurrences (I think Tom is about to send a
>>> revision of the series soon for net-next). What this series does is
>>> not only enforcing ordering, but also verifying the specific number
>>> of occurrences for each type of Extension Header. Which is totally
>>> compatible with what this patch does, i.e., limiting the total
>>> number of Extension Headers (regardless of their types) to 8. I
>>> guess what I'm trying to say is that it seems like a good
>>> plan/compromise and that the aforementioned series would build
>>> perfectly on top of this fix.
>>
>> Initially, I had a hard-coded constant (when it was still 32), but Eric's comment
>> was to rather go with a sysctl, such that if someone unexpectedly complains, then
>> there is still a chance for that person to fix it up via sysctl without having to
>> rebuild the kernel. I'm okay either way, but presumably, given that we're now
>> being more "aggressive" in lowering the default to 8 rather than 32, having such
>> a fallback is probably better.
> 
> I also think that 32 without a sysctl knob is fine (just so that we have
> some upper bound), but if we go with a sysctl then let's make sure that
> it's compatible with Tom's series [1] (I assume he is going to send a
> new version).
> 
> AFAICT it's possible to create a conflicting configuration with the two
> sysctls (e.g., "enforce_ext_hdr_order" is set to 1 and
> "max_ext_hdrs_number" is configured to less than 8). The documentation
> should make the relation between the two sysctls clear to users. It can
> also mention that "max_ext_hdrs_number" might be useful when users are
> forced to turn "enforce_ext_hdr_order" off when dealing with hosts that
> send extension headers in an unexpected order. That way, they still have
> an upper bound on the maximum number of extension headers.
> 
> [1] https://lore.kernel.org/netdev/20260314175124.47010-1-tom@herbertland.com/

Ido, Daniel,

As I said, my vote would definitely go to the solution without a sysctl 
for the very reason Ido mentioned. Note that an upper bound of 32 is 
kind of unrealistic, although super (SUPER!) safe. Sending more than 8 
Extension Headers (assuming a different type for each) is not standard 
behavior and would make you non-RFC-compliant anyway. But I'm happy with 
32, as long as we don't define a sysctl for that.
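
For illustration only, the trade-off between a cap of 8 and a cap of 32
can be sketched in user space as below. This is not the code from the
patch: the names sketch_skip_exthdrs, sketch_is_ext_hdr and
SKETCH_MAX_EXT_HDRS are hypothetical, the walk operates on a flat byte
buffer instead of an skb, and Fragment and AH headers (which use
different length encodings) are left out to keep it short. The cap is
passed as a parameter so the same walk can stand in for either a
hard-coded constant or a tunable.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical cap for this sketch; the thread debates 8 versus 32. */
#define SKETCH_MAX_EXT_HDRS 8

/* IANA protocol numbers used below. */
#define NH_HOPOPTS 0
#define NH_TCP     6
#define NH_ROUTING 43
#define NH_DSTOPTS 60

/* Only headers with the generic (next header, hdr ext len) layout. */
static bool sketch_is_ext_hdr(uint8_t nh)
{
	return nh == NH_HOPOPTS || nh == NH_ROUTING || nh == NH_DSTOPTS;
}

/*
 * Walk the extension header chain in buf, which starts right after the
 * fixed IPv6 header. *nexthdr holds the Next Header value of the fixed
 * header on entry and the upper-layer protocol on success.
 *
 * Returns the offset of the upper-layer header, or -1 if the chain is
 * truncated or contains more than max_hdrs extension headers.
 */
static int sketch_skip_exthdrs(const uint8_t *buf, size_t len,
			       uint8_t *nexthdr, unsigned int max_hdrs)
{
	size_t off = 0;
	unsigned int seen = 0;

	while (sketch_is_ext_hdr(*nexthdr)) {
		size_t hdrlen;

		/* The bound under discussion: stop after max_hdrs headers. */
		if (++seen > max_hdrs)
			return -1;

		/* Every extension header is at least 8 bytes long. */
		if (len - off < 8)
			return -1;

		/* Hdr Ext Len counts 8-byte units beyond the first 8 bytes. */
		hdrlen = ((size_t)buf[off + 1] + 1) * 8;
		if (hdrlen > len - off)
			return -1;

		*nexthdr = buf[off];
		off += hdrlen;
	}

	return (int)off;
}
```

With a chain of nine minimal Destination Options headers, the walk is
rejected when max_hdrs is 8 and accepted when it is 32, which is the
behavioral difference between the two proposed bounds.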

Thread overview: 6+ messages
2026-04-25  7:55 [PATCH net v2] ipv6: Implement limits on extension header parsing Daniel Borkmann
2026-04-25 10:19 ` Justin Iurman
2026-04-26 10:38   ` Daniel Borkmann
2026-04-26 10:56     ` saeed bishara
2026-04-26 13:17     ` Ido Schimmel
2026-04-26 15:47       ` Justin Iurman