* [PATCH net-next v2 0/7] net: ip: add drop reasons to input route
@ 2024-10-07 7:46 Menglong Dong
2024-10-07 7:46 ` [PATCH net-next v2 1/7] net: ip: make fib_validate_source() return drop reason Menglong Dong
` (7 more replies)
0 siblings, 8 replies; 15+ messages in thread
From: Menglong Dong @ 2024-10-07 7:46 UTC (permalink / raw)
To: edumazet, kuba
Cc: davem, pabeni, dsahern, steffen.klassert, herbert, dongml2,
bigeasy, toke, idosch, netdev, linux-kernel, bpf
In this series, we mainly add some skb drop reasons to the input path of
ip routing.
The errno from fib_validate_source() is -EINVAL or -EXDEV, and -EXDEV is
used in ip_rcv_finish_core() to increase the LINUX_MIB_IPRPFILTER. For
this case, we can check it by
"drop_reason == SKB_DROP_REASON_IP_RPFILTER" instead. Therefore, we can
make fib_validate_source() return -reason.
Meanwhile, we make the following functions return drop reasons too:
ip_route_input_mc()
ip_mc_validate_source()
ip_route_input_slow()
ip_route_input_rcu()
ip_route_input_noref()
ip_route_input()
And following new skb drop reasons are added:
SKB_DROP_REASON_IP_LOCAL_SOURCE
SKB_DROP_REASON_IP_INVALID_SOURCE
SKB_DROP_REASON_IP_LOCALNET
SKB_DROP_REASON_IP_INVALID_DEST
Changes since v1:
- make ip_route_input_noref/ip_route_input_rcu/ip_route_input_slow return
drop reasons, instead of passing a local variable to their function
arguments.
Menglong Dong (7):
net: ip: make fib_validate_source() return drop reason
net: ip: make ip_route_input_mc() return drop reason
net: ip: make ip_mc_validate_source() return drop reason
net: ip: make ip_route_input_slow() return drop reasons
net: ip: make ip_route_input_rcu() return drop reasons
net: ip: make ip_route_input_noref() return drop reasons
net: ip: make ip_route_input() return drop reasons
include/net/dropreason-core.h | 19 +++++
include/net/route.h | 27 ++++---
net/bridge/br_netfilter_hooks.c | 11 +--
net/core/lwt_bpf.c | 1 +
net/ipv4/fib_frontend.c | 19 +++--
net/ipv4/icmp.c | 1 +
net/ipv4/ip_fragment.c | 12 +--
net/ipv4/ip_input.c | 11 ++-
net/ipv4/route.c | 131 +++++++++++++++++++-------------
9 files changed, 145 insertions(+), 87 deletions(-)
--
2.39.5
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH net-next v2 1/7] net: ip: make fib_validate_source() return drop reason
2024-10-07 7:46 [PATCH net-next v2 0/7] net: ip: add drop reasons to input route Menglong Dong
@ 2024-10-07 7:46 ` Menglong Dong
2024-10-10 8:25 ` Paolo Abeni
2024-10-07 7:46 ` [PATCH net-next v2 2/7] net: ip: make ip_route_input_mc() " Menglong Dong
` (6 subsequent siblings)
7 siblings, 1 reply; 15+ messages in thread
From: Menglong Dong @ 2024-10-07 7:46 UTC (permalink / raw)
To: edumazet, kuba
Cc: davem, pabeni, dsahern, steffen.klassert, herbert, dongml2,
bigeasy, toke, idosch, netdev, linux-kernel, bpf
In this commit, we make fib_validate_source/__fib_validate_source return
-reason instead of errno on error. As the return value of them can be
-errno, 0, and 1, we can't make it return enum skb_drop_reason directly.
In the origin logic, if __fib_validate_source() return -EXDEV,
LINUX_MIB_IPRPFILTER will be counted. And now, we need to adjust it by
checking "reason == SKB_DROP_REASON_IP_RPFILTER". However, this will take
effect only after the patch "net: ip: make ip_route_input_noref() return
drop reasons", as we can't pass the drop reasons from
fib_validate_source() to ip_rcv_finish_core() in this patch.
We set the errno to -EINVAL when fib_validate_source() is called and the
validation fails, as the errno can be checked in the caller and now its
value is -reason, which can lead misunderstand.
Following new drop reasons are added in this patch:
SKB_DROP_REASON_IP_LOCAL_SOURCE
SKB_DROP_REASON_IP_INVALID_SOURCE
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
---
include/net/dropreason-core.h | 10 ++++++++++
net/ipv4/fib_frontend.c | 19 +++++++++++++------
net/ipv4/ip_input.c | 4 +---
net/ipv4/route.c | 15 +++++++++++----
4 files changed, 35 insertions(+), 13 deletions(-)
diff --git a/include/net/dropreason-core.h b/include/net/dropreason-core.h
index 4748680e8c88..76504e25d581 100644
--- a/include/net/dropreason-core.h
+++ b/include/net/dropreason-core.h
@@ -76,6 +76,8 @@
FN(INVALID_PROTO) \
FN(IP_INADDRERRORS) \
FN(IP_INNOROUTES) \
+ FN(IP_LOCAL_SOURCE) \
+ FN(IP_INVALID_SOURCE) \
FN(PKT_TOO_BIG) \
FN(DUP_FRAG) \
FN(FRAG_REASM_TIMEOUT) \
@@ -365,6 +367,14 @@ enum skb_drop_reason {
* IPSTATS_MIB_INADDRERRORS
*/
SKB_DROP_REASON_IP_INNOROUTES,
+ /** @SKB_DROP_REASON_IP_LOCAL_SOURCE: the source ip is local */
+ SKB_DROP_REASON_IP_LOCAL_SOURCE,
+ /**
+ * @SKB_DROP_REASON_IP_INVALID_SOURCE: the source ip is invalid:
+ * 1) source ip is multicast or limited broadcast
+ * 2) source ip is zero and not IGMP
+ */
+ SKB_DROP_REASON_IP_INVALID_SOURCE,
/**
* @SKB_DROP_REASON_PKT_TOO_BIG: packet size is too big (maybe exceed the
* MTU)
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 793e6781399a..779c90de3a54 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -346,6 +346,7 @@ static int __fib_validate_source(struct sk_buff *skb, __be32 src, __be32 dst,
int rpf, struct in_device *idev, u32 *itag)
{
struct net *net = dev_net(dev);
+ enum skb_drop_reason reason;
struct flow_keys flkeys;
int ret, no_addr;
struct fib_result res;
@@ -377,9 +378,15 @@ static int __fib_validate_source(struct sk_buff *skb, __be32 src, __be32 dst,
if (fib_lookup(net, &fl4, &res, 0))
goto last_resort;
- if (res.type != RTN_UNICAST &&
- (res.type != RTN_LOCAL || !IN_DEV_ACCEPT_LOCAL(idev)))
- goto e_inval;
+ if (res.type != RTN_UNICAST) {
+ if (res.type != RTN_LOCAL) {
+ reason = SKB_DROP_REASON_IP_INVALID_SOURCE;
+ goto e_inval;
+ } else if (!IN_DEV_ACCEPT_LOCAL(idev)) {
+ reason = SKB_DROP_REASON_IP_LOCAL_SOURCE;
+ goto e_inval;
+ }
+ }
fib_combine_itag(itag, &res);
dev_match = fib_info_nh_uses_dev(res.fi, dev);
@@ -412,9 +419,9 @@ static int __fib_validate_source(struct sk_buff *skb, __be32 src, __be32 dst,
return 0;
e_inval:
- return -EINVAL;
+ return -reason;
e_rpf:
- return -EXDEV;
+ return -SKB_DROP_REASON_IP_RPFILTER;
}
/* Ignore rp_filter for packets protected by IPsec. */
@@ -440,7 +447,7 @@ int fib_validate_source(struct sk_buff *skb, __be32 src, __be32 dst,
* and the same host but different containers are not.
*/
if (inet_lookup_ifaddr_rcu(net, src))
- return -EINVAL;
+ return -SKB_DROP_REASON_IP_LOCAL_SOURCE;
ok:
*itag = 0;
diff --git a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c
index c0a2490eb7c1..a6f5bfc274ee 100644
--- a/net/ipv4/ip_input.c
+++ b/net/ipv4/ip_input.c
@@ -425,10 +425,8 @@ static int ip_rcv_finish_core(struct net *net, struct sock *sk,
return NET_RX_DROP;
drop_error:
- if (err == -EXDEV) {
- drop_reason = SKB_DROP_REASON_IP_RPFILTER;
+ if (drop_reason == SKB_DROP_REASON_IP_RPFILTER)
__NET_INC_STATS(net, LINUX_MIB_IPRPFILTER);
- }
goto drop;
}
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 6e1cd0065b87..e49b4ce1804a 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1690,7 +1690,7 @@ int ip_mc_validate_source(struct sk_buff *skb, __be32 daddr, __be32 saddr,
err = fib_validate_source(skb, saddr, 0, tos, 0, dev,
in_dev, itag);
if (err < 0)
- return err;
+ return -EINVAL;
}
return 0;
}
@@ -1788,6 +1788,7 @@ static int __mkroute_input(struct sk_buff *skb,
err = fib_validate_source(skb, saddr, daddr, tos, FIB_RES_OIF(*res),
in_dev->dev, in_dev, &itag);
if (err < 0) {
+ err = -EINVAL;
ip_handle_martian_source(in_dev->dev, in_dev, skb, daddr,
saddr);
@@ -2162,8 +2163,10 @@ int ip_route_use_hint(struct sk_buff *skb, __be32 daddr, __be32 saddr,
tos &= INET_DSCP_MASK;
err = fib_validate_source(skb, saddr, daddr, tos, 0, dev, in_dev, &tag);
- if (err < 0)
+ if (err < 0) {
+ err = -EINVAL;
goto martian_source;
+ }
skip_validate_source:
skb_dst_copy(skb, hint);
@@ -2302,8 +2305,10 @@ static int ip_route_input_slow(struct sk_buff *skb, __be32 daddr, __be32 saddr,
err = fib_validate_source(skb, saddr, daddr,
inet_dscp_to_dsfield(dscp), 0, dev,
in_dev, &itag);
- if (err < 0)
+ if (err < 0) {
+ err = -EINVAL;
goto martian_source;
+ }
goto local_input;
}
@@ -2327,8 +2332,10 @@ out: return err;
err = fib_validate_source(skb, saddr, 0,
inet_dscp_to_dsfield(dscp), 0, dev,
in_dev, &itag);
- if (err < 0)
+ if (err < 0) {
+ err = -EINVAL;
goto martian_source;
+ }
}
flags |= RTCF_BROADCAST;
res->type = RTN_BROADCAST;
--
2.39.5
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH net-next v2 2/7] net: ip: make ip_route_input_mc() return drop reason
2024-10-07 7:46 [PATCH net-next v2 0/7] net: ip: add drop reasons to input route Menglong Dong
2024-10-07 7:46 ` [PATCH net-next v2 1/7] net: ip: make fib_validate_source() return drop reason Menglong Dong
@ 2024-10-07 7:46 ` Menglong Dong
2024-10-07 7:46 ` [PATCH net-next v2 3/7] net: ip: make ip_mc_validate_source() " Menglong Dong
` (5 subsequent siblings)
7 siblings, 0 replies; 15+ messages in thread
From: Menglong Dong @ 2024-10-07 7:46 UTC (permalink / raw)
To: edumazet, kuba
Cc: davem, pabeni, dsahern, steffen.klassert, herbert, dongml2,
bigeasy, toke, idosch, netdev, linux-kernel, bpf
Make ip_route_input_mc() return drop reason, and adjust the call of it
in ip_route_input_rcu().
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
---
net/ipv4/route.c | 23 ++++++++++++-----------
1 file changed, 12 insertions(+), 11 deletions(-)
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index e49b4ce1804a..76940ca7c178 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1696,8 +1696,9 @@ int ip_mc_validate_source(struct sk_buff *skb, __be32 daddr, __be32 saddr,
}
/* called in rcu_read_lock() section */
-static int ip_route_input_mc(struct sk_buff *skb, __be32 daddr, __be32 saddr,
- u8 tos, struct net_device *dev, int our)
+static enum skb_drop_reason
+ip_route_input_mc(struct sk_buff *skb, __be32 daddr, __be32 saddr,
+ u8 tos, struct net_device *dev, int our)
{
struct in_device *in_dev = __in_dev_get_rcu(dev);
unsigned int flags = RTCF_MULTICAST;
@@ -1707,7 +1708,7 @@ static int ip_route_input_mc(struct sk_buff *skb, __be32 daddr, __be32 saddr,
err = ip_mc_validate_source(skb, daddr, saddr, tos, dev, in_dev, &itag);
if (err)
- return err;
+ return SKB_DROP_REASON_NOT_SPECIFIED;
if (our)
flags |= RTCF_LOCAL;
@@ -1718,7 +1719,7 @@ static int ip_route_input_mc(struct sk_buff *skb, __be32 daddr, __be32 saddr,
rth = rt_dst_alloc(dev_net(dev)->loopback_dev, flags, RTN_MULTICAST,
false);
if (!rth)
- return -ENOBUFS;
+ return SKB_DROP_REASON_NOMEM;
#ifdef CONFIG_IP_ROUTE_CLASSID
rth->dst.tclassid = itag;
@@ -1734,7 +1735,7 @@ static int ip_route_input_mc(struct sk_buff *skb, __be32 daddr, __be32 saddr,
skb_dst_drop(skb);
skb_dst_set(skb, &rth->dst);
- return 0;
+ return SKB_NOT_DROPPED_YET;
}
@@ -2440,12 +2441,12 @@ static int ip_route_input_rcu(struct sk_buff *skb, __be32 daddr, __be32 saddr,
* route cache entry is created eventually.
*/
if (ipv4_is_multicast(daddr)) {
+ enum skb_drop_reason reason = SKB_DROP_REASON_NOT_SPECIFIED;
struct in_device *in_dev = __in_dev_get_rcu(dev);
int our = 0;
- int err = -EINVAL;
if (!in_dev)
- return err;
+ return -EINVAL;
our = ip_check_mc_rcu(in_dev, daddr, saddr,
ip_hdr(skb)->protocol);
@@ -2466,11 +2467,11 @@ static int ip_route_input_rcu(struct sk_buff *skb, __be32 daddr, __be32 saddr,
IN_DEV_MFORWARD(in_dev))
#endif
) {
- err = ip_route_input_mc(skb, daddr, saddr,
- inet_dscp_to_dsfield(dscp),
- dev, our);
+ reason = ip_route_input_mc(skb, daddr, saddr,
+ inet_dscp_to_dsfield(dscp),
+ dev, our);
}
- return err;
+ return reason ? -EINVAL : 0;
}
return ip_route_input_slow(skb, daddr, saddr, dscp, dev, res);
--
2.39.5
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH net-next v2 3/7] net: ip: make ip_mc_validate_source() return drop reason
2024-10-07 7:46 [PATCH net-next v2 0/7] net: ip: add drop reasons to input route Menglong Dong
2024-10-07 7:46 ` [PATCH net-next v2 1/7] net: ip: make fib_validate_source() return drop reason Menglong Dong
2024-10-07 7:46 ` [PATCH net-next v2 2/7] net: ip: make ip_route_input_mc() " Menglong Dong
@ 2024-10-07 7:46 ` Menglong Dong
2024-10-07 7:46 ` [PATCH net-next v2 4/7] net: ip: make ip_route_input_slow() return drop reasons Menglong Dong
` (4 subsequent siblings)
7 siblings, 0 replies; 15+ messages in thread
From: Menglong Dong @ 2024-10-07 7:46 UTC (permalink / raw)
To: edumazet, kuba
Cc: davem, pabeni, dsahern, steffen.klassert, herbert, dongml2,
bigeasy, toke, idosch, netdev, linux-kernel, bpf
Make ip_mc_validate_source() return drop reason, and adjust the call of
it in ip_route_input_mc().
Another caller of it is ip_rcv_finish_core->udp_v4_early_demux, and the
errno is not checked in detail, so we don't do more adjustment for it.
The drop reason "SKB_DROP_REASON_IP_LOCALNET" is added in this commit.
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
---
include/net/dropreason-core.h | 3 +++
include/net/route.h | 7 ++++---
net/ipv4/route.c | 33 ++++++++++++++++++---------------
3 files changed, 25 insertions(+), 18 deletions(-)
diff --git a/include/net/dropreason-core.h b/include/net/dropreason-core.h
index 76504e25d581..32d9fcb54af9 100644
--- a/include/net/dropreason-core.h
+++ b/include/net/dropreason-core.h
@@ -78,6 +78,7 @@
FN(IP_INNOROUTES) \
FN(IP_LOCAL_SOURCE) \
FN(IP_INVALID_SOURCE) \
+ FN(IP_LOCALNET) \
FN(PKT_TOO_BIG) \
FN(DUP_FRAG) \
FN(FRAG_REASM_TIMEOUT) \
@@ -375,6 +376,8 @@ enum skb_drop_reason {
* 2) source ip is zero and not IGMP
*/
SKB_DROP_REASON_IP_INVALID_SOURCE,
+ /** @SKB_DROP_REASON_IP_LOCALNET: source or dest ip is local net */
+ SKB_DROP_REASON_IP_LOCALNET,
/**
* @SKB_DROP_REASON_PKT_TOO_BIG: packet size is too big (maybe exceed the
* MTU)
diff --git a/include/net/route.h b/include/net/route.h
index 5e4374d66927..35bc12146960 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -198,9 +198,10 @@ static inline struct rtable *ip_route_output_gre(struct net *net, struct flowi4
fl4->fl4_gre_key = gre_key;
return ip_route_output_key(net, fl4);
}
-int ip_mc_validate_source(struct sk_buff *skb, __be32 daddr, __be32 saddr,
- u8 tos, struct net_device *dev,
- struct in_device *in_dev, u32 *itag);
+enum skb_drop_reason
+ip_mc_validate_source(struct sk_buff *skb, __be32 daddr, __be32 saddr,
+ u8 tos, struct net_device *dev,
+ struct in_device *in_dev, u32 *itag);
int ip_route_input_noref(struct sk_buff *skb, __be32 daddr, __be32 saddr,
dscp_t dscp, struct net_device *dev);
int ip_route_use_hint(struct sk_buff *skb, __be32 dst, __be32 src,
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 76940ca7c178..b41bb9be18e2 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1665,34 +1665,37 @@ struct rtable *rt_dst_clone(struct net_device *dev, struct rtable *rt)
EXPORT_SYMBOL(rt_dst_clone);
/* called in rcu_read_lock() section */
-int ip_mc_validate_source(struct sk_buff *skb, __be32 daddr, __be32 saddr,
- u8 tos, struct net_device *dev,
- struct in_device *in_dev, u32 *itag)
+enum skb_drop_reason
+ip_mc_validate_source(struct sk_buff *skb, __be32 daddr, __be32 saddr,
+ u8 tos, struct net_device *dev,
+ struct in_device *in_dev, u32 *itag)
{
int err;
/* Primary sanity checks. */
if (!in_dev)
- return -EINVAL;
+ return SKB_DROP_REASON_NOT_SPECIFIED;
- if (ipv4_is_multicast(saddr) || ipv4_is_lbcast(saddr) ||
- skb->protocol != htons(ETH_P_IP))
- return -EINVAL;
+ if (ipv4_is_multicast(saddr) || ipv4_is_lbcast(saddr))
+ return SKB_DROP_REASON_IP_INVALID_SOURCE;
+
+ if (skb->protocol != htons(ETH_P_IP))
+ return SKB_DROP_REASON_INVALID_PROTO;
if (ipv4_is_loopback(saddr) && !IN_DEV_ROUTE_LOCALNET(in_dev))
- return -EINVAL;
+ return SKB_DROP_REASON_IP_LOCALNET;
if (ipv4_is_zeronet(saddr)) {
if (!ipv4_is_local_multicast(daddr) &&
ip_hdr(skb)->protocol != IPPROTO_IGMP)
- return -EINVAL;
+ return SKB_DROP_REASON_IP_INVALID_SOURCE;
} else {
err = fib_validate_source(skb, saddr, 0, tos, 0, dev,
in_dev, itag);
if (err < 0)
- return -EINVAL;
+ return -err;
}
- return 0;
+ return SKB_NOT_DROPPED_YET;
}
/* called in rcu_read_lock() section */
@@ -1702,13 +1705,13 @@ ip_route_input_mc(struct sk_buff *skb, __be32 daddr, __be32 saddr,
{
struct in_device *in_dev = __in_dev_get_rcu(dev);
unsigned int flags = RTCF_MULTICAST;
+ enum skb_drop_reason reason;
struct rtable *rth;
u32 itag = 0;
- int err;
- err = ip_mc_validate_source(skb, daddr, saddr, tos, dev, in_dev, &itag);
- if (err)
- return SKB_DROP_REASON_NOT_SPECIFIED;
+ reason = ip_mc_validate_source(skb, daddr, saddr, tos, dev, in_dev, &itag);
+ if (reason)
+ return reason;
if (our)
flags |= RTCF_LOCAL;
--
2.39.5
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH net-next v2 4/7] net: ip: make ip_route_input_slow() return drop reasons
2024-10-07 7:46 [PATCH net-next v2 0/7] net: ip: add drop reasons to input route Menglong Dong
` (2 preceding siblings ...)
2024-10-07 7:46 ` [PATCH net-next v2 3/7] net: ip: make ip_mc_validate_source() " Menglong Dong
@ 2024-10-07 7:46 ` Menglong Dong
2024-10-07 7:47 ` [PATCH net-next v2 5/7] net: ip: make ip_route_input_rcu() " Menglong Dong
` (3 subsequent siblings)
7 siblings, 0 replies; 15+ messages in thread
From: Menglong Dong @ 2024-10-07 7:46 UTC (permalink / raw)
To: edumazet, kuba
Cc: davem, pabeni, dsahern, steffen.klassert, herbert, dongml2,
bigeasy, toke, idosch, netdev, linux-kernel, bpf
In this commit, we make ip_route_input_slow() return skb drop reasons,
and following new skb drop reasons are added:
SKB_DROP_REASON_IP_INVALID_DEST
The only caller of ip_route_input_slow() is ip_route_input_rcu(), and we
adjust it by making it return -EINVAL on error.
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
---
include/net/dropreason-core.h | 6 +++++
net/ipv4/route.c | 51 +++++++++++++++++++++--------------
2 files changed, 37 insertions(+), 20 deletions(-)
diff --git a/include/net/dropreason-core.h b/include/net/dropreason-core.h
index 32d9fcb54af9..e53f3d944e04 100644
--- a/include/net/dropreason-core.h
+++ b/include/net/dropreason-core.h
@@ -79,6 +79,7 @@
FN(IP_LOCAL_SOURCE) \
FN(IP_INVALID_SOURCE) \
FN(IP_LOCALNET) \
+ FN(IP_INVALID_DEST) \
FN(PKT_TOO_BIG) \
FN(DUP_FRAG) \
FN(FRAG_REASM_TIMEOUT) \
@@ -378,6 +379,11 @@ enum skb_drop_reason {
SKB_DROP_REASON_IP_INVALID_SOURCE,
/** @SKB_DROP_REASON_IP_LOCALNET: source or dest ip is local net */
SKB_DROP_REASON_IP_LOCALNET,
+ /**
+ * @SKB_DROP_REASON_IP_INVALID_DEST: the dest ip is invalid:
+ * 1) dest ip is 0
+ */
+ SKB_DROP_REASON_IP_INVALID_DEST,
/**
* @SKB_DROP_REASON_PKT_TOO_BIG: packet size is too big (maybe exceed the
* MTU)
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index b41bb9be18e2..9b3f7bebcd86 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2207,10 +2207,12 @@ static struct net_device *ip_rt_get_dev(struct net *net,
* called with rcu_read_lock()
*/
-static int ip_route_input_slow(struct sk_buff *skb, __be32 daddr, __be32 saddr,
- dscp_t dscp, struct net_device *dev,
- struct fib_result *res)
+static enum skb_drop_reason
+ip_route_input_slow(struct sk_buff *skb, __be32 daddr, __be32 saddr,
+ dscp_t dscp, struct net_device *dev,
+ struct fib_result *res)
{
+ enum skb_drop_reason reason = SKB_DROP_REASON_NOT_SPECIFIED;
struct in_device *in_dev = __in_dev_get_rcu(dev);
struct flow_keys *flkeys = NULL, _flkeys;
struct net *net = dev_net(dev);
@@ -2238,8 +2240,10 @@ static int ip_route_input_slow(struct sk_buff *skb, __be32 daddr, __be32 saddr,
fl4.flowi4_tun_key.tun_id = 0;
skb_dst_drop(skb);
- if (ipv4_is_multicast(saddr) || ipv4_is_lbcast(saddr))
+ if (ipv4_is_multicast(saddr) || ipv4_is_lbcast(saddr)) {
+ reason = SKB_DROP_REASON_IP_INVALID_SOURCE;
goto martian_source;
+ }
res->fi = NULL;
res->table = NULL;
@@ -2249,21 +2253,29 @@ static int ip_route_input_slow(struct sk_buff *skb, __be32 daddr, __be32 saddr,
/* Accept zero addresses only to limited broadcast;
* I even do not know to fix it or not. Waiting for complains :-)
*/
- if (ipv4_is_zeronet(saddr))
+ if (ipv4_is_zeronet(saddr)) {
+ reason = SKB_DROP_REASON_IP_INVALID_SOURCE;
goto martian_source;
+ }
- if (ipv4_is_zeronet(daddr))
+ if (ipv4_is_zeronet(daddr)) {
+ reason = SKB_DROP_REASON_IP_INVALID_DEST;
goto martian_destination;
+ }
/* Following code try to avoid calling IN_DEV_NET_ROUTE_LOCALNET(),
* and call it once if daddr or/and saddr are loopback addresses
*/
if (ipv4_is_loopback(daddr)) {
- if (!IN_DEV_NET_ROUTE_LOCALNET(in_dev, net))
+ if (!IN_DEV_NET_ROUTE_LOCALNET(in_dev, net)) {
+ reason = SKB_DROP_REASON_IP_LOCALNET;
goto martian_destination;
+ }
} else if (ipv4_is_loopback(saddr)) {
- if (!IN_DEV_NET_ROUTE_LOCALNET(in_dev, net))
+ if (!IN_DEV_NET_ROUTE_LOCALNET(in_dev, net)) {
+ reason = SKB_DROP_REASON_IP_LOCALNET;
goto martian_source;
+ }
}
/*
@@ -2310,7 +2322,7 @@ static int ip_route_input_slow(struct sk_buff *skb, __be32 daddr, __be32 saddr,
inet_dscp_to_dsfield(dscp), 0, dev,
in_dev, &itag);
if (err < 0) {
- err = -EINVAL;
+ reason = -err;
goto martian_source;
}
goto local_input;
@@ -2326,18 +2338,21 @@ static int ip_route_input_slow(struct sk_buff *skb, __be32 daddr, __be32 saddr,
make_route:
err = ip_mkroute_input(skb, res, in_dev, daddr, saddr,
inet_dscp_to_dsfield(dscp), flkeys);
-out: return err;
+ if (!err)
+ reason = SKB_NOT_DROPPED_YET;
+
+out: return reason;
brd_input:
if (skb->protocol != htons(ETH_P_IP))
- goto e_inval;
+ goto out;
if (!ipv4_is_zeronet(saddr)) {
err = fib_validate_source(skb, saddr, 0,
inet_dscp_to_dsfield(dscp), 0, dev,
in_dev, &itag);
if (err < 0) {
- err = -EINVAL;
+ reason = -err;
goto martian_source;
}
}
@@ -2356,7 +2371,7 @@ out: return err;
rth = rcu_dereference(nhc->nhc_rth_input);
if (rt_cache_valid(rth)) {
skb_dst_set_noref(skb, &rth->dst);
- err = 0;
+ reason = SKB_NOT_DROPPED_YET;
goto out;
}
}
@@ -2393,7 +2408,7 @@ out: return err;
rt_add_uncached_list(rth);
}
skb_dst_set(skb, &rth->dst);
- err = 0;
+ reason = SKB_NOT_DROPPED_YET;
goto out;
no_route:
@@ -2414,12 +2429,8 @@ out: return err;
&daddr, &saddr, dev->name);
#endif
-e_inval:
- err = -EINVAL;
- goto out;
-
e_nobufs:
- err = -ENOBUFS;
+ reason = SKB_DROP_REASON_NOMEM;
goto out;
martian_source:
@@ -2477,7 +2488,7 @@ static int ip_route_input_rcu(struct sk_buff *skb, __be32 daddr, __be32 saddr,
return reason ? -EINVAL : 0;
}
- return ip_route_input_slow(skb, daddr, saddr, dscp, dev, res);
+ return ip_route_input_slow(skb, daddr, saddr, dscp, dev, res) ? -EINVAL : 0;
}
int ip_route_input_noref(struct sk_buff *skb, __be32 daddr, __be32 saddr,
--
2.39.5
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH net-next v2 5/7] net: ip: make ip_route_input_rcu() return drop reasons
2024-10-07 7:46 [PATCH net-next v2 0/7] net: ip: add drop reasons to input route Menglong Dong
` (3 preceding siblings ...)
2024-10-07 7:46 ` [PATCH net-next v2 4/7] net: ip: make ip_route_input_slow() return drop reasons Menglong Dong
@ 2024-10-07 7:47 ` Menglong Dong
2024-10-07 7:47 ` [PATCH net-next v2 6/7] net: ip: make ip_route_input_noref() " Menglong Dong
` (2 subsequent siblings)
7 siblings, 0 replies; 15+ messages in thread
From: Menglong Dong @ 2024-10-07 7:47 UTC (permalink / raw)
To: edumazet, kuba
Cc: davem, pabeni, dsahern, steffen.klassert, herbert, dongml2,
bigeasy, toke, idosch, netdev, linux-kernel, bpf
In this commit, we make ip_route_input_rcu() return drop reasons, which
come from ip_route_input_mc() and ip_route_input_slow().
The only caller of ip_route_input_rcu() is ip_route_input_noref(). We
adjust it by making it return -EINVAL on error and ignore the reasons that
ip_route_input_rcu() returns. In the following patch, we will make
ip_route_input_noref() returns the drop reasons.
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
---
net/ipv4/route.c | 18 ++++++++++--------
1 file changed, 10 insertions(+), 8 deletions(-)
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 9b3f7bebcd86..56a1ebddde24 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2439,9 +2439,10 @@ out: return reason;
}
/* called with rcu_read_lock held */
-static int ip_route_input_rcu(struct sk_buff *skb, __be32 daddr, __be32 saddr,
- dscp_t dscp, struct net_device *dev,
- struct fib_result *res)
+static enum skb_drop_reason
+ip_route_input_rcu(struct sk_buff *skb, __be32 daddr, __be32 saddr,
+ dscp_t dscp, struct net_device *dev,
+ struct fib_result *res)
{
/* Multicast recognition logic is moved from route cache to here.
* The problem was that too many Ethernet cards have broken/missing
@@ -2485,23 +2486,23 @@ static int ip_route_input_rcu(struct sk_buff *skb, __be32 daddr, __be32 saddr,
inet_dscp_to_dsfield(dscp),
dev, our);
}
- return reason ? -EINVAL : 0;
+ return reason;
}
- return ip_route_input_slow(skb, daddr, saddr, dscp, dev, res) ? -EINVAL : 0;
+ return ip_route_input_slow(skb, daddr, saddr, dscp, dev, res);
}
int ip_route_input_noref(struct sk_buff *skb, __be32 daddr, __be32 saddr,
dscp_t dscp, struct net_device *dev)
{
+ enum skb_drop_reason reason;
struct fib_result res;
- int err;
rcu_read_lock();
- err = ip_route_input_rcu(skb, daddr, saddr, dscp, dev, &res);
+ reason = ip_route_input_rcu(skb, daddr, saddr, dscp, dev, &res);
rcu_read_unlock();
- return err;
+ return reason ? -EINVAL : 0;
}
EXPORT_SYMBOL(ip_route_input_noref);
@@ -3314,6 +3315,7 @@ static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh,
err = ip_route_input_rcu(skb, dst, src,
inet_dsfield_to_dscp(rtm->rtm_tos),
dev, &res);
+ err = err ? -EINVAL : 0;
rt = skb_rtable(skb);
if (err == 0 && rt->dst.error)
--
2.39.5
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH net-next v2 6/7] net: ip: make ip_route_input_noref() return drop reasons
2024-10-07 7:46 [PATCH net-next v2 0/7] net: ip: add drop reasons to input route Menglong Dong
` (4 preceding siblings ...)
2024-10-07 7:47 ` [PATCH net-next v2 5/7] net: ip: make ip_route_input_rcu() " Menglong Dong
@ 2024-10-07 7:47 ` Menglong Dong
2024-10-07 7:47 ` [PATCH net-next v2 7/7] net: ip: make ip_route_input() " Menglong Dong
2024-10-10 8:30 ` [PATCH net-next v2 0/7] net: ip: add drop reasons to input route Paolo Abeni
7 siblings, 0 replies; 15+ messages in thread
From: Menglong Dong @ 2024-10-07 7:47 UTC (permalink / raw)
To: edumazet, kuba
Cc: davem, pabeni, dsahern, steffen.klassert, herbert, dongml2,
bigeasy, toke, idosch, netdev, linux-kernel, bpf
In this commit, we make ip_route_input_noref() return drop reasons, which
come from ip_route_input_rcu().
We need adjust the callers of ip_route_input_noref() to make sure the
return value of ip_route_input_noref() is used properly.
The errno that ip_route_input_noref() returns in the origin logic is
returned by ip_route_input and bpf_lwt_input_reroute, and we make them
return -EINVAL on error instead. In the following patch, we will make
ip_route_input() returns drop reasons too.
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
---
include/net/route.h | 15 ++++++++-------
net/core/lwt_bpf.c | 1 +
net/ipv4/ip_fragment.c | 12 +++++++-----
net/ipv4/ip_input.c | 7 ++++---
net/ipv4/route.c | 7 ++++---
5 files changed, 24 insertions(+), 18 deletions(-)
diff --git a/include/net/route.h b/include/net/route.h
index 35bc12146960..c0b1b5fb9b59 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -202,8 +202,9 @@ enum skb_drop_reason
ip_mc_validate_source(struct sk_buff *skb, __be32 daddr, __be32 saddr,
u8 tos, struct net_device *dev,
struct in_device *in_dev, u32 *itag);
-int ip_route_input_noref(struct sk_buff *skb, __be32 daddr, __be32 saddr,
- dscp_t dscp, struct net_device *dev);
+enum skb_drop_reason
+ip_route_input_noref(struct sk_buff *skb, __be32 daddr, __be32 saddr,
+ dscp_t dscp, struct net_device *dev);
int ip_route_use_hint(struct sk_buff *skb, __be32 dst, __be32 src,
u8 tos, struct net_device *devin,
const struct sk_buff *hint);
@@ -211,18 +212,18 @@ int ip_route_use_hint(struct sk_buff *skb, __be32 dst, __be32 src,
static inline int ip_route_input(struct sk_buff *skb, __be32 dst, __be32 src,
dscp_t dscp, struct net_device *devin)
{
- int err;
+ enum skb_drop_reason reason;
rcu_read_lock();
- err = ip_route_input_noref(skb, dst, src, dscp, devin);
- if (!err) {
+ reason = ip_route_input_noref(skb, dst, src, dscp, devin);
+ if (!reason) {
skb_dst_force(skb);
if (!skb_dst(skb))
- err = -EINVAL;
+ reason = SKB_DROP_REASON_NOT_SPECIFIED;
}
rcu_read_unlock();
- return err;
+ return reason ? -EINVAL : 0;
}
void ipv4_update_pmtu(struct sk_buff *skb, struct net *net, u32 mtu, int oif,
diff --git a/net/core/lwt_bpf.c b/net/core/lwt_bpf.c
index e0ca24a58810..a4652f2a103a 100644
--- a/net/core/lwt_bpf.c
+++ b/net/core/lwt_bpf.c
@@ -98,6 +98,7 @@ static int bpf_lwt_input_reroute(struct sk_buff *skb)
skb_dst_drop(skb);
err = ip_route_input_noref(skb, iph->daddr, iph->saddr,
ip4h_dscp(iph), dev);
+ err = err ? -EINVAL : 0;
dev_put(dev);
} else if (skb->protocol == htons(ETH_P_IPV6)) {
skb_dst_drop(skb);
diff --git a/net/ipv4/ip_fragment.c b/net/ipv4/ip_fragment.c
index 48e2810f1f27..52b991e976ba 100644
--- a/net/ipv4/ip_fragment.c
+++ b/net/ipv4/ip_fragment.c
@@ -132,12 +132,12 @@ static bool frag_expire_skip_icmp(u32 user)
*/
static void ip_expire(struct timer_list *t)
{
+ enum skb_drop_reason reason = SKB_DROP_REASON_FRAG_REASM_TIMEOUT;
struct inet_frag_queue *frag = from_timer(frag, t, timer);
const struct iphdr *iph;
struct sk_buff *head = NULL;
struct net *net;
struct ipq *qp;
- int err;
qp = container_of(frag, struct ipq, q);
net = qp->q.fqdir->net;
@@ -175,10 +175,12 @@ static void ip_expire(struct timer_list *t)
/* skb has no dst, perform route lookup again */
iph = ip_hdr(head);
- err = ip_route_input_noref(head, iph->daddr, iph->saddr, ip4h_dscp(iph),
- head->dev);
- if (err)
+ reason = ip_route_input_noref(head, iph->daddr, iph->saddr,
+ ip4h_dscp(iph), head->dev);
+ if (reason)
goto out;
+ else
+ reason = SKB_DROP_REASON_FRAG_REASM_TIMEOUT;
/* Only an end host needs to send an ICMP
* "Fragment Reassembly Timeout" message, per RFC792.
@@ -195,7 +197,7 @@ static void ip_expire(struct timer_list *t)
spin_unlock(&qp->q.lock);
out_rcu_unlock:
rcu_read_unlock();
- kfree_skb_reason(head, SKB_DROP_REASON_FRAG_REASM_TIMEOUT);
+ kfree_skb_reason(head, reason);
ipq_put(qp);
}
diff --git a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c
index a6f5bfc274ee..aeb71675052c 100644
--- a/net/ipv4/ip_input.c
+++ b/net/ipv4/ip_input.c
@@ -362,10 +362,11 @@ static int ip_rcv_finish_core(struct net *net, struct sock *sk,
* how the packet travels inside Linux networking.
*/
if (!skb_valid_dst(skb)) {
- err = ip_route_input_noref(skb, iph->daddr, iph->saddr,
- ip4h_dscp(iph), dev);
- if (unlikely(err))
+ drop_reason = ip_route_input_noref(skb, iph->daddr, iph->saddr,
+ ip4h_dscp(iph), dev);
+ if (unlikely(drop_reason))
goto drop_error;
+ drop_reason = SKB_DROP_REASON_NOT_SPECIFIED;
} else {
struct in_device *in_dev = __in_dev_get_rcu(dev);
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 56a1ebddde24..6baaaf0bcb3e 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2492,8 +2492,9 @@ ip_route_input_rcu(struct sk_buff *skb, __be32 daddr, __be32 saddr,
return ip_route_input_slow(skb, daddr, saddr, dscp, dev, res);
}
-int ip_route_input_noref(struct sk_buff *skb, __be32 daddr, __be32 saddr,
- dscp_t dscp, struct net_device *dev)
+enum skb_drop_reason ip_route_input_noref(struct sk_buff *skb, __be32 daddr,
+ __be32 saddr, dscp_t dscp,
+ struct net_device *dev)
{
enum skb_drop_reason reason;
struct fib_result res;
@@ -2502,7 +2503,7 @@ int ip_route_input_noref(struct sk_buff *skb, __be32 daddr, __be32 saddr,
reason = ip_route_input_rcu(skb, daddr, saddr, dscp, dev, &res);
rcu_read_unlock();
- return reason ? -EINVAL : 0;
+ return reason;
}
EXPORT_SYMBOL(ip_route_input_noref);
--
2.39.5
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH net-next v2 7/7] net: ip: make ip_route_input() return drop reasons
2024-10-07 7:46 [PATCH net-next v2 0/7] net: ip: add drop reasons to input route Menglong Dong
` (5 preceding siblings ...)
2024-10-07 7:47 ` [PATCH net-next v2 6/7] net: ip: make ip_route_input_noref() " Menglong Dong
@ 2024-10-07 7:47 ` Menglong Dong
2024-10-10 8:30 ` [PATCH net-next v2 0/7] net: ip: add drop reasons to input route Paolo Abeni
7 siblings, 0 replies; 15+ messages in thread
From: Menglong Dong @ 2024-10-07 7:47 UTC (permalink / raw)
To: edumazet, kuba
Cc: davem, pabeni, dsahern, steffen.klassert, herbert, dongml2,
bigeasy, toke, idosch, netdev, linux-kernel, bpf
In this commit, we make ip_route_input() return skb drop reasons that come
from ip_route_input_noref().
Meanwhile, adjust all the call to it.
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
---
include/net/route.h | 7 ++++---
net/bridge/br_netfilter_hooks.c | 11 ++++++-----
net/ipv4/icmp.c | 1 +
3 files changed, 11 insertions(+), 8 deletions(-)
diff --git a/include/net/route.h b/include/net/route.h
index c0b1b5fb9b59..87d2c103616e 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -209,8 +209,9 @@ int ip_route_use_hint(struct sk_buff *skb, __be32 dst, __be32 src,
u8 tos, struct net_device *devin,
const struct sk_buff *hint);
-static inline int ip_route_input(struct sk_buff *skb, __be32 dst, __be32 src,
- dscp_t dscp, struct net_device *devin)
+static inline enum skb_drop_reason
+ip_route_input(struct sk_buff *skb, __be32 dst, __be32 src, dscp_t dscp,
+ struct net_device *devin)
{
enum skb_drop_reason reason;
@@ -223,7 +224,7 @@ static inline int ip_route_input(struct sk_buff *skb, __be32 dst, __be32 src,
}
rcu_read_unlock();
- return reason ? -EINVAL : 0;
+ return reason;
}
void ipv4_update_pmtu(struct sk_buff *skb, struct net *net, u32 mtu, int oif,
diff --git a/net/bridge/br_netfilter_hooks.c b/net/bridge/br_netfilter_hooks.c
index c6bab2b5e834..ab4b9c6ae34b 100644
--- a/net/bridge/br_netfilter_hooks.c
+++ b/net/bridge/br_netfilter_hooks.c
@@ -372,8 +372,8 @@ static int br_nf_pre_routing_finish(struct net *net, struct sock *sk, struct sk_
struct nf_bridge_info *nf_bridge = nf_bridge_info_get(skb);
struct net_device *dev = skb->dev, *br_indev;
const struct iphdr *iph = ip_hdr(skb);
+ enum skb_drop_reason reason;
struct rtable *rt;
- int err;
br_indev = nf_bridge_get_physindev(skb, net);
if (!br_indev) {
@@ -389,9 +389,9 @@ static int br_nf_pre_routing_finish(struct net *net, struct sock *sk, struct sk_
}
nf_bridge->in_prerouting = 0;
if (br_nf_ipv4_daddr_was_changed(skb, nf_bridge)) {
- err = ip_route_input(skb, iph->daddr, iph->saddr,
- ip4h_dscp(iph), dev);
- if (err) {
+ reason = ip_route_input(skb, iph->daddr, iph->saddr,
+ ip4h_dscp(iph), dev);
+ if (reason) {
struct in_device *in_dev = __in_dev_get_rcu(dev);
/* If err equals -EHOSTUNREACH the error is due to a
@@ -401,7 +401,8 @@ static int br_nf_pre_routing_finish(struct net *net, struct sock *sk, struct sk_
* martian destinations: loopback destinations and destination
* 0.0.0.0. In both cases the packet will be dropped because the
* destination is the loopback device and not the bridge. */
- if (err != -EHOSTUNREACH || !in_dev || IN_DEV_FORWARD(in_dev))
+ if (reason != SKB_DROP_REASON_IP_INADDRERRORS || !in_dev ||
+ IN_DEV_FORWARD(in_dev))
goto free_skb;
rt = ip_route_output(net, iph->daddr, 0,
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 23664434922e..c3bafff093e0 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -546,6 +546,7 @@ static struct rtable *icmp_route_lookup(struct net *net, struct flowi4 *fl4,
skb_dst_set(skb_in, NULL);
err = ip_route_input(skb_in, fl4_dec.daddr, fl4_dec.saddr,
dscp, rt2->dst.dev);
+ err = err ? -EINVAL : 0;
dst_release(&rt2->dst);
rt2 = skb_rtable(skb_in);
--
2.39.5
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH net-next v2 1/7] net: ip: make fib_validate_source() return drop reason
2024-10-07 7:46 ` [PATCH net-next v2 1/7] net: ip: make fib_validate_source() return drop reason Menglong Dong
@ 2024-10-10 8:25 ` Paolo Abeni
2024-10-10 9:18 ` Menglong Dong
0 siblings, 1 reply; 15+ messages in thread
From: Paolo Abeni @ 2024-10-10 8:25 UTC (permalink / raw)
To: Menglong Dong, edumazet, kuba
Cc: davem, dsahern, steffen.klassert, herbert, dongml2, bigeasy, toke,
idosch, netdev, linux-kernel, bpf
On 10/7/24 09:46, Menglong Dong wrote:
> In this commit, we make fib_validate_source/__fib_validate_source return
> -reason instead of errno on error. As the return value of them can be
> -errno, 0, and 1, we can't make it return enum skb_drop_reason directly.
>
> In the origin logic, if __fib_validate_source() return -EXDEV,
> LINUX_MIB_IPRPFILTER will be counted. And now, we need to adjust it by
> checking "reason == SKB_DROP_REASON_IP_RPFILTER". However, this will take
> effect only after the patch "net: ip: make ip_route_input_noref() return
> drop reasons", as we can't pass the drop reasons from
> fib_validate_source() to ip_rcv_finish_core() in this patch.
>
> We set the errno to -EINVAL when fib_validate_source() is called and the
> validation fails, as the errno can be checked in the caller and now its
> value is -reason, which can lead misunderstand.
>
> Following new drop reasons are added in this patch:
>
> SKB_DROP_REASON_IP_LOCAL_SOURCE
> SKB_DROP_REASON_IP_INVALID_SOURCE
>
> Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Looking at the next patches, I'm under the impression that the overall
code will be simpler if you let __fib_validate_source() return directly
a drop reason, and fib_validate_source(), too. Hard to be sure without
actually do the attempt... did you try such patch by any chance?
Thanks!
Paolo
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH net-next v2 0/7] net: ip: add drop reasons to input route
2024-10-07 7:46 [PATCH net-next v2 0/7] net: ip: add drop reasons to input route Menglong Dong
` (6 preceding siblings ...)
2024-10-07 7:47 ` [PATCH net-next v2 7/7] net: ip: make ip_route_input() " Menglong Dong
@ 2024-10-10 8:30 ` Paolo Abeni
2024-10-10 10:32 ` Menglong Dong
7 siblings, 1 reply; 15+ messages in thread
From: Paolo Abeni @ 2024-10-10 8:30 UTC (permalink / raw)
To: Menglong Dong, edumazet, kuba
Cc: davem, dsahern, steffen.klassert, herbert, dongml2, bigeasy, toke,
idosch, netdev, linux-kernel, bpf
On 10/7/24 09:46, Menglong Dong wrote:
> In this series, we mainly add some skb drop reasons to the input path of
> ip routing.
>
> The errno from fib_validate_source() is -EINVAL or -EXDEV, and -EXDEV is
> used in ip_rcv_finish_core() to increase the LINUX_MIB_IPRPFILTER. For
> this case, we can check it by
> "drop_reason == SKB_DROP_REASON_IP_RPFILTER" instead. Therefore, we can
> make fib_validate_source() return -reason.
>
> Meanwhile, we make the following functions return drop reasons too:
>
> ip_route_input_mc()
> ip_mc_validate_source()
> ip_route_input_slow()
> ip_route_input_rcu()
> ip_route_input_noref()
> ip_route_input()
A few other functions are excluded, so that the ip input path coverage
is not completed - i.e. ip_route_use_hint(), is that intentional?
In any case does not apply cleanly anymore.
Please answer to the above question and question on patch 1 before
submitting a new revision. At very least the new revision should include
a comment explaining the reasoning for the current choice.
Please, include in each patch the detailed changelog after the '---'
separator.
Thanks,
Paolo
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH net-next v2 1/7] net: ip: make fib_validate_source() return drop reason
2024-10-10 8:25 ` Paolo Abeni
@ 2024-10-10 9:18 ` Menglong Dong
2024-10-11 6:42 ` Menglong Dong
0 siblings, 1 reply; 15+ messages in thread
From: Menglong Dong @ 2024-10-10 9:18 UTC (permalink / raw)
To: Paolo Abeni
Cc: edumazet, kuba, davem, dsahern, steffen.klassert, herbert,
dongml2, bigeasy, toke, idosch, netdev, linux-kernel, bpf
On Thu, Oct 10, 2024 at 4:25 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
>
>
> On 10/7/24 09:46, Menglong Dong wrote:
> > In this commit, we make fib_validate_source/__fib_validate_source return
> > -reason instead of errno on error. As the return value of them can be
> > -errno, 0, and 1, we can't make it return enum skb_drop_reason directly.
> >
> > In the origin logic, if __fib_validate_source() return -EXDEV,
> > LINUX_MIB_IPRPFILTER will be counted. And now, we need to adjust it by
> > checking "reason == SKB_DROP_REASON_IP_RPFILTER". However, this will take
> > effect only after the patch "net: ip: make ip_route_input_noref() return
> > drop reasons", as we can't pass the drop reasons from
> > fib_validate_source() to ip_rcv_finish_core() in this patch.
> >
> > We set the errno to -EINVAL when fib_validate_source() is called and the
> > validation fails, as the errno can be checked in the caller and now its
> > value is -reason, which can lead misunderstand.
> >
> > Following new drop reasons are added in this patch:
> >
> > SKB_DROP_REASON_IP_LOCAL_SOURCE
> > SKB_DROP_REASON_IP_INVALID_SOURCE
> >
> > Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
>
> Looking at the next patches, I'm under the impression that the overall
> code will be simpler if you let __fib_validate_source() return directly
> a drop reason, and fib_validate_source(), too. Hard to be sure without
> actually do the attempt... did you try such patch by any chance?
>
I analysed the usages of fib_validate_source() before. The
return value of fib_validate_source() can be -errno, "0", and "1".
And the value "1" can be used by the caller, such as
__mkroute_input(). Making it return drop reasons can't cover this
case.
It seems that __mkroute_input() is the only case that uses the
positive returning value of fib_validate_source(). Let me think
about it more in this case.
Thanks!
Menglong Dong
> Thanks!
>
> Paolo
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH net-next v2 0/7] net: ip: add drop reasons to input route
2024-10-10 8:30 ` [PATCH net-next v2 0/7] net: ip: add drop reasons to input route Paolo Abeni
@ 2024-10-10 10:32 ` Menglong Dong
0 siblings, 0 replies; 15+ messages in thread
From: Menglong Dong @ 2024-10-10 10:32 UTC (permalink / raw)
To: Paolo Abeni
Cc: edumazet, kuba, davem, dsahern, steffen.klassert, herbert,
dongml2, bigeasy, toke, idosch, netdev, linux-kernel, bpf
On Thu, Oct 10, 2024 at 4:30 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On 10/7/24 09:46, Menglong Dong wrote:
> > In this series, we mainly add some skb drop reasons to the input path of
> > ip routing.
> >
> > The errno from fib_validate_source() is -EINVAL or -EXDEV, and -EXDEV is
> > used in ip_rcv_finish_core() to increase the LINUX_MIB_IPRPFILTER. For
> > this case, we can check it by
> > "drop_reason == SKB_DROP_REASON_IP_RPFILTER" instead. Therefore, we can
> > make fib_validate_source() return -reason.
> >
> > Meanwhile, we make the following functions return drop reasons too:
> >
> > ip_route_input_mc()
> > ip_mc_validate_source()
> > ip_route_input_slow()
> > ip_route_input_rcu()
> > ip_route_input_noref()
> > ip_route_input()
>
> A few other functions are excluded, so that the ip input path coverage
> is not completed - i.e. ip_route_use_hint(), is that intentional?
>
Hello,
That's not intentional, I just missed them. At the beginning, I
wanted to organize the drop reasons in ip_route_input_noref(),
and things become complex when I do it. Let me have a check
and make the coverage complete.
> In any case does not apply cleanly anymore.
>
> Please answer to the above question and question on patch 1 before
> submitting a new revision. At very least the new revision should include
> a comment explaining the reasoning for the current choice.
>
> Please, include in each patch the detailed changelog after the '---'
> separator.
>
Sorry about that. I thought the patches for ip_route_input_noref,
ip_route_input_rcu, ip_route_input_slow are completely new one,
and abandoned the changelogs in the patches. I'll complete the
changelogs in the next version.
Thanks!
Menglong Dong
> Thanks,
>
> Paolo
>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH net-next v2 1/7] net: ip: make fib_validate_source() return drop reason
2024-10-10 9:18 ` Menglong Dong
@ 2024-10-11 6:42 ` Menglong Dong
2024-10-11 8:49 ` Paolo Abeni
0 siblings, 1 reply; 15+ messages in thread
From: Menglong Dong @ 2024-10-11 6:42 UTC (permalink / raw)
To: Paolo Abeni
Cc: edumazet, kuba, davem, dsahern, steffen.klassert, herbert,
dongml2, bigeasy, toke, idosch, netdev, linux-kernel, bpf
On Thu, Oct 10, 2024 at 5:18 PM Menglong Dong <menglong8.dong@gmail.com> wrote:
>
> On Thu, Oct 10, 2024 at 4:25 PM Paolo Abeni <pabeni@redhat.com> wrote:
> >
> >
> >
> > On 10/7/24 09:46, Menglong Dong wrote:
> > > In this commit, we make fib_validate_source/__fib_validate_source return
> > > -reason instead of errno on error. As the return value of them can be
> > > -errno, 0, and 1, we can't make it return enum skb_drop_reason directly.
> > >
> > > In the origin logic, if __fib_validate_source() return -EXDEV,
> > > LINUX_MIB_IPRPFILTER will be counted. And now, we need to adjust it by
> > > checking "reason == SKB_DROP_REASON_IP_RPFILTER". However, this will take
> > > effect only after the patch "net: ip: make ip_route_input_noref() return
> > > drop reasons", as we can't pass the drop reasons from
> > > fib_validate_source() to ip_rcv_finish_core() in this patch.
> > >
> > > We set the errno to -EINVAL when fib_validate_source() is called and the
> > > validation fails, as the errno can be checked in the caller and now its
> > > value is -reason, which can lead misunderstand.
> > >
> > > Following new drop reasons are added in this patch:
> > >
> > > SKB_DROP_REASON_IP_LOCAL_SOURCE
> > > SKB_DROP_REASON_IP_INVALID_SOURCE
> > >
> > > Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
> >
> > Looking at the next patches, I'm under the impression that the overall
> > code will be simpler if you let __fib_validate_source() return directly
> > a drop reason, and fib_validate_source(), too. Hard to be sure without
> > actually do the attempt... did you try such patch by any chance?
> >
>
> I analysed the usages of fib_validate_source() before. The
> return value of fib_validate_source() can be -errno, "0", and "1".
> And the value "1" can be used by the caller, such as
> __mkroute_input(). Making it return drop reasons can't cover this
> case.
>
> It seems that __mkroute_input() is the only case that uses the
> positive returning value of fib_validate_source(). Let me think
> about it more in this case.
Hello,
After digging into the code of __fib_validate_source() and __mkroute_input(),
I think it's hard to make __fib_validate_source() return drop reasons
directly.
The __fib_validate_source() will return 1 if the scope of the
source(revert) route is HOST. And the __mkroute_input()
will mark the skb with IPSKB_DOREDIRECT in this
case (combine with some other conditions). And then, a REDIRECT
ICMP will be sent in ip_forward() if this flag exists.
I don't find a way to pass this information to __mkroute_input
if we make __fib_validate_source() return drop reasons. Can we?
An option is to add a wrapper for fib_validate_source(), such as
fib_validate_source_reason(), which returns drop reasons. And in
__mkroute_input(), we still call fib_validate_source().
What do you think?
Thanks!
Menglong Dong
>
> > Thanks!
> >
> > Paolo
> >
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH net-next v2 1/7] net: ip: make fib_validate_source() return drop reason
2024-10-11 6:42 ` Menglong Dong
@ 2024-10-11 8:49 ` Paolo Abeni
2024-10-11 9:17 ` Menglong Dong
0 siblings, 1 reply; 15+ messages in thread
From: Paolo Abeni @ 2024-10-11 8:49 UTC (permalink / raw)
To: Menglong Dong
Cc: edumazet, kuba, davem, dsahern, steffen.klassert, herbert,
dongml2, bigeasy, toke, idosch, netdev, linux-kernel, bpf
On 10/11/24 08:42, Menglong Dong wrote:
> On Thu, Oct 10, 2024 at 5:18 PM Menglong Dong <menglong8.dong@gmail.com> wrote:
>> On Thu, Oct 10, 2024 at 4:25 PM Paolo Abeni <pabeni@redhat.com> wrote:
>>> On 10/7/24 09:46, Menglong Dong wrote:
>>>> In this commit, we make fib_validate_source/__fib_validate_source return
>>>> -reason instead of errno on error. As the return value of them can be
>>>> -errno, 0, and 1, we can't make it return enum skb_drop_reason directly.
>>>>
>>>> In the origin logic, if __fib_validate_source() return -EXDEV,
>>>> LINUX_MIB_IPRPFILTER will be counted. And now, we need to adjust it by
>>>> checking "reason == SKB_DROP_REASON_IP_RPFILTER". However, this will take
>>>> effect only after the patch "net: ip: make ip_route_input_noref() return
>>>> drop reasons", as we can't pass the drop reasons from
>>>> fib_validate_source() to ip_rcv_finish_core() in this patch.
>>>>
>>>> We set the errno to -EINVAL when fib_validate_source() is called and the
>>>> validation fails, as the errno can be checked in the caller and now its
>>>> value is -reason, which can lead misunderstand.
>>>>
>>>> Following new drop reasons are added in this patch:
>>>>
>>>> SKB_DROP_REASON_IP_LOCAL_SOURCE
>>>> SKB_DROP_REASON_IP_INVALID_SOURCE
>>>>
>>>> Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
>>>
>>> Looking at the next patches, I'm under the impression that the overall
>>> code will be simpler if you let __fib_validate_source() return directly
>>> a drop reason, and fib_validate_source(), too. Hard to be sure without
>>> actually do the attempt... did you try such patch by any chance?
>>>
>>
>> I analysed the usages of fib_validate_source() before. The
>> return value of fib_validate_source() can be -errno, "0", and "1".
>> And the value "1" can be used by the caller, such as
>> __mkroute_input(). Making it return drop reasons can't cover this
>> case.
>>
>> It seems that __mkroute_input() is the only case that uses the
>> positive returning value of fib_validate_source(). Let me think
>> about it more in this case.
>
> Hello,
>
> After digging into the code of __fib_validate_source() and __mkroute_input(),
> I think it's hard to make __fib_validate_source() return drop reasons
> directly.
>
> The __fib_validate_source() will return 1 if the scope of the
> source(revert) route is HOST. And the __mkroute_input()
> will mark the skb with IPSKB_DOREDIRECT in this
> case (combine with some other conditions). And then, a REDIRECT
> ICMP will be sent in ip_forward() if this flag exists.
>
> I don't find a way to pass this information to __mkroute_input
> if we make __fib_validate_source() return drop reasons. Can we?
>
> An option is to add a wrapper for fib_validate_source(), such as
> fib_validate_source_reason(), which returns drop reasons. And in
> __mkroute_input(), we still call fib_validate_source().
>
> What do you think?
Thanks for the investigation. I see that let __fib_validate_source()
returning drop reasons does not look like a good design.
I think the additional helper will not help much, so I guess you can
retain the current implementation here, but please expand the commit
message with the above information.
Thanks!
Paolo
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH net-next v2 1/7] net: ip: make fib_validate_source() return drop reason
2024-10-11 8:49 ` Paolo Abeni
@ 2024-10-11 9:17 ` Menglong Dong
0 siblings, 0 replies; 15+ messages in thread
From: Menglong Dong @ 2024-10-11 9:17 UTC (permalink / raw)
To: Paolo Abeni
Cc: edumazet, kuba, davem, dsahern, steffen.klassert, herbert,
dongml2, bigeasy, toke, idosch, netdev, linux-kernel, bpf
On Fri, Oct 11, 2024 at 4:49 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On 10/11/24 08:42, Menglong Dong wrote:
> > On Thu, Oct 10, 2024 at 5:18 PM Menglong Dong <menglong8.dong@gmail.com> wrote:
> >> On Thu, Oct 10, 2024 at 4:25 PM Paolo Abeni <pabeni@redhat.com> wrote:
> >>> On 10/7/24 09:46, Menglong Dong wrote:
> >>>> In this commit, we make fib_validate_source/__fib_validate_source return
> >>>> -reason instead of errno on error. As the return value of them can be
> >>>> -errno, 0, and 1, we can't make it return enum skb_drop_reason directly.
> >>>>
> >>>> In the origin logic, if __fib_validate_source() return -EXDEV,
> >>>> LINUX_MIB_IPRPFILTER will be counted. And now, we need to adjust it by
> >>>> checking "reason == SKB_DROP_REASON_IP_RPFILTER". However, this will take
> >>>> effect only after the patch "net: ip: make ip_route_input_noref() return
> >>>> drop reasons", as we can't pass the drop reasons from
> >>>> fib_validate_source() to ip_rcv_finish_core() in this patch.
> >>>>
> >>>> We set the errno to -EINVAL when fib_validate_source() is called and the
> >>>> validation fails, as the errno can be checked in the caller and now its
> >>>> value is -reason, which can lead misunderstand.
> >>>>
> >>>> Following new drop reasons are added in this patch:
> >>>>
> >>>> SKB_DROP_REASON_IP_LOCAL_SOURCE
> >>>> SKB_DROP_REASON_IP_INVALID_SOURCE
> >>>>
> >>>> Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
> >>>
> >>> Looking at the next patches, I'm under the impression that the overall
> >>> code will be simpler if you let __fib_validate_source() return directly
> >>> a drop reason, and fib_validate_source(), too. Hard to be sure without
> >>> actually do the attempt... did you try such patch by any chance?
> >>>
> >>
> >> I analysed the usages of fib_validate_source() before. The
> >> return value of fib_validate_source() can be -errno, "0", and "1".
> >> And the value "1" can be used by the caller, such as
> >> __mkroute_input(). Making it return drop reasons can't cover this
> >> case.
> >>
> >> It seems that __mkroute_input() is the only case that uses the
> >> positive returning value of fib_validate_source(). Let me think
> >> about it more in this case.
> >
> > Hello,
> >
> > After digging into the code of __fib_validate_source() and __mkroute_input(),
> > I think it's hard to make __fib_validate_source() return drop reasons
> > directly.
> >
> > The __fib_validate_source() will return 1 if the scope of the
> > source(revert) route is HOST. And the __mkroute_input()
> > will mark the skb with IPSKB_DOREDIRECT in this
> > case (combine with some other conditions). And then, a REDIRECT
> > ICMP will be sent in ip_forward() if this flag exists.
> >
> > I don't find a way to pass this information to __mkroute_input
> > if we make __fib_validate_source() return drop reasons. Can we?
> >
> > An option is to add a wrapper for fib_validate_source(), such as
> > fib_validate_source_reason(), which returns drop reasons. And in
> > __mkroute_input(), we still call fib_validate_source().
> >
> > What do you think?
>
> Thanks for the investigation. I see that let __fib_validate_source()
> returning drop reasons does not look like a good design.
>
> I think the additional helper will not help much, so I guess you can
> retain the current implementation here, but please expand the commit
> message with the above information.
Hello,
I have implemented a new version just now like this:
The only caller of __fib_validate_source() is fib_validate_source(), so
we can combine fib_validate_source() into __fib_validate_source(), and
make fib_validate_source() an inline call to __fib_validate_source().
Then, we can make fib_validate_source() return drop reasons. And
we call __fib_validate_source() in __mkroute_input(), which makes
the logic here remains unchanged.
What do you think? Or do we retain the current implementation here?
Following is the part patch that refactor
fib_validate_source/__fib_validate_source:
diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index 06130933542d..ea51cae24fad 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -448,9 +448,18 @@ int fib_gw_from_via(struct fib_config *cfg,
struct nlattr *nla,
struct netlink_ext_ack *extack);
__be32 fib_compute_spec_dst(struct sk_buff *skb);
bool fib_info_nh_uses_dev(struct fib_info *fi, const struct net_device *dev);
-int fib_validate_source(struct sk_buff *skb, __be32 src, __be32 dst,
- dscp_t dscp, int oif, struct net_device *dev,
- struct in_device *idev, u32 *itag);
+int __fib_validate_source(struct sk_buff *skb, __be32 src, __be32 dst,
+ dscp_t dscp, int oif, struct net_device *dev,
+ struct in_device *idev, u32 *itag);
+
+static inline int
+fib_validate_source(struct sk_buff *skb, __be32 src, __be32 dst,
+ dscp_t dscp, int oif, struct net_device *dev,
+ struct in_device *idev, u32 *itag)
+{
+ return __fib_validate_source(skb, src, dst, dscp, oif, dev, idev,
+ itag);
+}
#ifdef CONFIG_IP_ROUTE_CLASSID
static inline int fib_num_tclassid_users(struct net *net)
diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
index 8353518b110a..f74138f4d748 100644
--- a/net/ipv4/fib_frontend.c
+++ b/net/ipv4/fib_frontend.c
@@ -341,10 +341,11 @@ EXPORT_SYMBOL_GPL(fib_info_nh_uses_dev);
* - check, that packet arrived from expected physical interface.
* called with rcu_read_lock()
*/
-static int __fib_validate_source(struct sk_buff *skb, __be32 src, __be32 dst,
- dscp_t dscp, int oif, struct net_device *dev,
- int rpf, struct in_device *idev, u32 *itag)
+int __fib_validate_source(struct sk_buff *skb, __be32 src, __be32 dst,
+ dscp_t dscp, int oif, struct net_device *dev,
+ struct in_device *idev, u32 *itag)
{
+ int rpf = secpath_exists(skb) ? 0 : IN_DEV_RPFILTER(idev);
struct net *net = dev_net(dev);
struct flow_keys flkeys;
int ret, no_addr;
@@ -352,6 +353,28 @@ static int __fib_validate_source(struct sk_buff
*skb, __be32 src, __be32 dst,
struct flowi4 fl4;
bool dev_match;
+ /* Ignore rp_filter for packets protected by IPsec. */
+ if (!rpf && !fib_num_tclassid_users(net) &&
+ (dev->ifindex != oif || !IN_DEV_TX_REDIRECTS(idev))) {
+ if (IN_DEV_ACCEPT_LOCAL(idev))
+ goto last_resort;
+ /* with custom local routes in place, checking local addresses
+ * only will be too optimistic, with custom rules, checking
+ * local addresses only can be too strict, e.g. due to vrf
+ */
+ if (net->ipv4.fib_has_custom_local_routes ||
+ fib4_has_custom_rules(net))
+ goto full_check;
+ /* Within the same container, it is regarded as a martian source,
+ * and the same host but different containers are not.
+ */
+ if (inet_lookup_ifaddr_rcu(net, src))
+ return -EINVAL;
+
+ goto last_resort;
+ }
+
+full_check:
fl4.flowi4_oif = 0;
fl4.flowi4_l3mdev = l3mdev_master_ifindex_rcu(dev);
fl4.flowi4_iif = oif ? : LOOPBACK_IFINDEX;
Thanks!
Menglong Dong
>
> Thanks!
>
> Paolo
>
^ permalink raw reply related [flat|nested] 15+ messages in thread
end of thread, other threads:[~2024-10-11 9:17 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-07 7:46 [PATCH net-next v2 0/7] net: ip: add drop reasons to input route Menglong Dong
2024-10-07 7:46 ` [PATCH net-next v2 1/7] net: ip: make fib_validate_source() return drop reason Menglong Dong
2024-10-10 8:25 ` Paolo Abeni
2024-10-10 9:18 ` Menglong Dong
2024-10-11 6:42 ` Menglong Dong
2024-10-11 8:49 ` Paolo Abeni
2024-10-11 9:17 ` Menglong Dong
2024-10-07 7:46 ` [PATCH net-next v2 2/7] net: ip: make ip_route_input_mc() " Menglong Dong
2024-10-07 7:46 ` [PATCH net-next v2 3/7] net: ip: make ip_mc_validate_source() " Menglong Dong
2024-10-07 7:46 ` [PATCH net-next v2 4/7] net: ip: make ip_route_input_slow() return drop reasons Menglong Dong
2024-10-07 7:47 ` [PATCH net-next v2 5/7] net: ip: make ip_route_input_rcu() " Menglong Dong
2024-10-07 7:47 ` [PATCH net-next v2 6/7] net: ip: make ip_route_input_noref() " Menglong Dong
2024-10-07 7:47 ` [PATCH net-next v2 7/7] net: ip: make ip_route_input() " Menglong Dong
2024-10-10 8:30 ` [PATCH net-next v2 0/7] net: ip: add drop reasons to input route Paolo Abeni
2024-10-10 10:32 ` Menglong Dong
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox