* [PATCH net 0/2] udp: fix early demux for mcast packets
@ 2017-09-28 13:51 Paolo Abeni
2017-09-28 13:51 ` [PATCH net 1/2] IPv4: early demux can return an error code Paolo Abeni
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: Paolo Abeni @ 2017-09-28 13:51 UTC (permalink / raw)
To: netdev; +Cc: David S. Miller
Currently the early demux callbacks do not perform source address validation.
This is not an issue for TCP or UDP unicast, where the early demux
is only allowed for connected sockets and the source address is validated
for the first packet and never change.
The UDP protocol currently allows early demux also for unconnected multicast
sockets, and we are not currently doing any validation for them, after that
the first packet lands on the socket: beyond ignoring the rp_filter - if
enabled - any kind of martian sources are also allowed.
This series addresses the issue allowing the early demux callback to return an
error code, and performing the proper checks for unconnected UDP multicast
sockets before leveraging the rx dst cache.
Alternatively we could disable the early demux for unconnected mcast sockets,
but that would cause relevant performance regression - around 50% - while with
this series, with full rp_filter in place, we keep the regression to a more
moderate level.
Paolo Abeni (2):
IPv4: early demux can return an error code
udp: perform source validation for mcast early demux
include/net/protocol.h | 4 ++--
include/net/route.h | 4 +++-
include/net/tcp.h | 2 +-
include/net/udp.h | 2 +-
net/ipv4/ip_input.c | 25 +++++++++++++++----------
net/ipv4/route.c | 46 ++++++++++++++++++++++++++--------------------
net/ipv4/tcp_ipv4.c | 9 +++++----
net/ipv4/udp.c | 24 ++++++++++++++++++------
8 files changed, 71 insertions(+), 45 deletions(-)
--
2.13.5
^ permalink raw reply [flat|nested] 4+ messages in thread
* [PATCH net 1/2] IPv4: early demux can return an error code
2017-09-28 13:51 [PATCH net 0/2] udp: fix early demux for mcast packets Paolo Abeni
@ 2017-09-28 13:51 ` Paolo Abeni
2017-09-28 13:51 ` [PATCH net 2/2] udp: perform source validation for mcast early demux Paolo Abeni
2017-10-01 2:56 ` [PATCH net 0/2] udp: fix early demux for mcast packets David Miller
2 siblings, 0 replies; 4+ messages in thread
From: Paolo Abeni @ 2017-09-28 13:51 UTC (permalink / raw)
To: netdev; +Cc: David S. Miller
Currently no error is emitted, but this infrastructure will
used by the next patch to allow source address validation
for mcast sockets.
Since early demux can do a route lookup and an ipv4 route
lookup can return an error code this is consistent with the
current ipv4 route infrastructure.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
This same patch is also present in the RFC series
"udp: full early demux for unconnected sockets", I'll drop
it from such series on an eventual next re-submission.
---
include/net/protocol.h | 4 ++--
include/net/tcp.h | 2 +-
include/net/udp.h | 2 +-
net/ipv4/ip_input.c | 25 +++++++++++++++----------
net/ipv4/tcp_ipv4.c | 9 +++++----
net/ipv4/udp.c | 11 ++++++-----
6 files changed, 30 insertions(+), 23 deletions(-)
diff --git a/include/net/protocol.h b/include/net/protocol.h
index 65ba335b0e7e..4fc75f7ae23b 100644
--- a/include/net/protocol.h
+++ b/include/net/protocol.h
@@ -39,8 +39,8 @@
/* This is used to register protocols. */
struct net_protocol {
- void (*early_demux)(struct sk_buff *skb);
- void (*early_demux_handler)(struct sk_buff *skb);
+ int (*early_demux)(struct sk_buff *skb);
+ int (*early_demux_handler)(struct sk_buff *skb);
int (*handler)(struct sk_buff *skb);
void (*err_handler)(struct sk_buff *skb, u32 info);
unsigned int no_policy:1,
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 3bc910a9bfc6..89974c5286d8 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -345,7 +345,7 @@ void tcp_v4_err(struct sk_buff *skb, u32);
void tcp_shutdown(struct sock *sk, int how);
-void tcp_v4_early_demux(struct sk_buff *skb);
+int tcp_v4_early_demux(struct sk_buff *skb);
int tcp_v4_rcv(struct sk_buff *skb);
int tcp_v4_tw_remember_stamp(struct inet_timewait_sock *tw);
diff --git a/include/net/udp.h b/include/net/udp.h
index 12dfbfe2e2d7..6c759c8594e2 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -259,7 +259,7 @@ static inline struct sk_buff *skb_recv_udp(struct sock *sk, unsigned int flags,
return __skb_recv_udp(sk, flags, noblock, &peeked, &off, err);
}
-void udp_v4_early_demux(struct sk_buff *skb);
+int udp_v4_early_demux(struct sk_buff *skb);
bool udp_sk_rx_dst_set(struct sock *sk, struct dst_entry *dst);
int udp_get_port(struct sock *sk, unsigned short snum,
int (*saddr_cmp)(const struct sock *,
diff --git a/net/ipv4/ip_input.c b/net/ipv4/ip_input.c
index fa2dc8f692c6..57fc13c6ab2b 100644
--- a/net/ipv4/ip_input.c
+++ b/net/ipv4/ip_input.c
@@ -311,9 +311,10 @@ static inline bool ip_rcv_options(struct sk_buff *skb)
static int ip_rcv_finish(struct net *net, struct sock *sk, struct sk_buff *skb)
{
const struct iphdr *iph = ip_hdr(skb);
- struct rtable *rt;
+ int (*edemux)(struct sk_buff *skb);
struct net_device *dev = skb->dev;
- void (*edemux)(struct sk_buff *skb);
+ struct rtable *rt;
+ int err;
/* if ingress device is enslaved to an L3 master device pass the
* skb to its handler for processing
@@ -331,7 +332,9 @@ static int ip_rcv_finish(struct net *net, struct sock *sk, struct sk_buff *skb)
ipprot = rcu_dereference(inet_protos[protocol]);
if (ipprot && (edemux = READ_ONCE(ipprot->early_demux))) {
- edemux(skb);
+ err = edemux(skb);
+ if (unlikely(err))
+ goto drop_error;
/* must reload iph, skb->head might have changed */
iph = ip_hdr(skb);
}
@@ -342,13 +345,10 @@ static int ip_rcv_finish(struct net *net, struct sock *sk, struct sk_buff *skb)
* how the packet travels inside Linux networking.
*/
if (!skb_valid_dst(skb)) {
- int err = ip_route_input_noref(skb, iph->daddr, iph->saddr,
- iph->tos, dev);
- if (unlikely(err)) {
- if (err == -EXDEV)
- __NET_INC_STATS(net, LINUX_MIB_IPRPFILTER);
- goto drop;
- }
+ err = ip_route_input_noref(skb, iph->daddr, iph->saddr,
+ iph->tos, dev);
+ if (unlikely(err))
+ goto drop_error;
}
#ifdef CONFIG_IP_ROUTE_CLASSID
@@ -399,6 +399,11 @@ static int ip_rcv_finish(struct net *net, struct sock *sk, struct sk_buff *skb)
drop:
kfree_skb(skb);
return NET_RX_DROP;
+
+drop_error:
+ if (err == -EXDEV)
+ __NET_INC_STATS(net, LINUX_MIB_IPRPFILTER);
+ goto drop;
}
/*
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index d9416b5162bc..85164d4d3e53 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1503,23 +1503,23 @@ int tcp_v4_do_rcv(struct sock *sk, struct sk_buff *skb)
}
EXPORT_SYMBOL(tcp_v4_do_rcv);
-void tcp_v4_early_demux(struct sk_buff *skb)
+int tcp_v4_early_demux(struct sk_buff *skb)
{
const struct iphdr *iph;
const struct tcphdr *th;
struct sock *sk;
if (skb->pkt_type != PACKET_HOST)
- return;
+ return 0;
if (!pskb_may_pull(skb, skb_transport_offset(skb) + sizeof(struct tcphdr)))
- return;
+ return 0;
iph = ip_hdr(skb);
th = tcp_hdr(skb);
if (th->doff < sizeof(struct tcphdr) / 4)
- return;
+ return 0;
sk = __inet_lookup_established(dev_net(skb->dev), &tcp_hashinfo,
iph->saddr, th->source,
@@ -1538,6 +1538,7 @@ void tcp_v4_early_demux(struct sk_buff *skb)
skb_dst_set_noref(skb, dst);
}
}
+ return 0;
}
bool tcp_add_backlog(struct sock *sk, struct sk_buff *skb)
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index ef29df8648e4..9b30f821fe96 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -2221,7 +2221,7 @@ static struct sock *__udp4_lib_demux_lookup(struct net *net,
return NULL;
}
-void udp_v4_early_demux(struct sk_buff *skb)
+int udp_v4_early_demux(struct sk_buff *skb)
{
struct net *net = dev_net(skb->dev);
const struct iphdr *iph;
@@ -2234,7 +2234,7 @@ void udp_v4_early_demux(struct sk_buff *skb)
/* validate the packet */
if (!pskb_may_pull(skb, skb_transport_offset(skb) + sizeof(struct udphdr)))
- return;
+ return 0;
iph = ip_hdr(skb);
uh = udp_hdr(skb);
@@ -2244,14 +2244,14 @@ void udp_v4_early_demux(struct sk_buff *skb)
struct in_device *in_dev = __in_dev_get_rcu(skb->dev);
if (!in_dev)
- return;
+ return 0;
/* we are supposed to accept bcast packets */
if (skb->pkt_type == PACKET_MULTICAST) {
ours = ip_check_mc_rcu(in_dev, iph->daddr, iph->saddr,
iph->protocol);
if (!ours)
- return;
+ return 0;
}
sk = __udp4_lib_mcast_demux_lookup(net, uh->dest, iph->daddr,
@@ -2263,7 +2263,7 @@ void udp_v4_early_demux(struct sk_buff *skb)
}
if (!sk || !refcount_inc_not_zero(&sk->sk_refcnt))
- return;
+ return 0;
skb->sk = sk;
skb->destructor = sock_efree;
@@ -2278,6 +2278,7 @@ void udp_v4_early_demux(struct sk_buff *skb)
*/
skb_dst_set_noref(skb, dst);
}
+ return 0;
}
int udp_rcv(struct sk_buff *skb)
--
2.13.5
^ permalink raw reply related [flat|nested] 4+ messages in thread
* [PATCH net 2/2] udp: perform source validation for mcast early demux
2017-09-28 13:51 [PATCH net 0/2] udp: fix early demux for mcast packets Paolo Abeni
2017-09-28 13:51 ` [PATCH net 1/2] IPv4: early demux can return an error code Paolo Abeni
@ 2017-09-28 13:51 ` Paolo Abeni
2017-10-01 2:56 ` [PATCH net 0/2] udp: fix early demux for mcast packets David Miller
2 siblings, 0 replies; 4+ messages in thread
From: Paolo Abeni @ 2017-09-28 13:51 UTC (permalink / raw)
To: netdev; +Cc: David S. Miller
The UDP early demux can leverate the rx dst cache even for
multicast unconnected sockets.
In such scenario the ipv4 source address is validated only on
the first packet in the given flow. After that, when we fetch
the dst entry from the socket rx cache, we stop enforcing
the rp_filter and we even start accepting any kind of martian
addresses.
Disabling the dst cache for unconnected multicast socket will
cause large performace regression, nearly reducing by half the
max ingress tput.
Instead we factor out a route helper to completely validate an
skb source address for multicast packets and we call it from
the UDP early demux for mcast packets landing on unconnected
sockets, after successful fetching the related cached dst entry.
This still gives a measurable, but limited performance
regression:
rp_filter = 0 rp_filter = 1
edmux disabled: 1182 Kpps 1127 Kpps
edmux before: 2238 Kpps 2238 Kpps
edmux after: 2037 Kpps 2019 Kpps
The above figures are on top of current net tree.
Applying the net-next commit 6e617de84e87 ("net: avoid a full
fib lookup when rp_filter is disabled.") the delta with
rp_filter == 0 will decrease even more.
Fixes: 421b3885bf6d ("udp: ipv4: Add udp early demux")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
include/net/route.h | 4 +++-
net/ipv4/route.c | 46 ++++++++++++++++++++++++++--------------------
net/ipv4/udp.c | 13 ++++++++++++-
3 files changed, 41 insertions(+), 22 deletions(-)
diff --git a/include/net/route.h b/include/net/route.h
index 57dfc6850d37..d538e6db1afe 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -175,7 +175,9 @@ static inline struct rtable *ip_route_output_gre(struct net *net, struct flowi4
fl4->fl4_gre_key = gre_key;
return ip_route_output_key(net, fl4);
}
-
+int ip_mc_validate_source(struct sk_buff *skb, __be32 daddr, __be32 saddr,
+ u8 tos, struct net_device *dev,
+ struct in_device *in_dev, u32 *itag);
int ip_route_input_noref(struct sk_buff *skb, __be32 dst, __be32 src,
u8 tos, struct net_device *devin);
int ip_route_input_rcu(struct sk_buff *skb, __be32 dst, __be32 src,
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 94d4cd2d5ea4..ac6fde5d45f1 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1520,43 +1520,56 @@ struct rtable *rt_dst_alloc(struct net_device *dev,
EXPORT_SYMBOL(rt_dst_alloc);
/* called in rcu_read_lock() section */
-static int ip_route_input_mc(struct sk_buff *skb, __be32 daddr, __be32 saddr,
- u8 tos, struct net_device *dev, int our)
+int ip_mc_validate_source(struct sk_buff *skb, __be32 daddr, __be32 saddr,
+ u8 tos, struct net_device *dev,
+ struct in_device *in_dev, u32 *itag)
{
- struct rtable *rth;
- struct in_device *in_dev = __in_dev_get_rcu(dev);
- unsigned int flags = RTCF_MULTICAST;
- u32 itag = 0;
int err;
/* Primary sanity checks. */
-
if (!in_dev)
return -EINVAL;
if (ipv4_is_multicast(saddr) || ipv4_is_lbcast(saddr) ||
skb->protocol != htons(ETH_P_IP))
- goto e_inval;
+ return -EINVAL;
if (ipv4_is_loopback(saddr) && !IN_DEV_ROUTE_LOCALNET(in_dev))
- goto e_inval;
+ return -EINVAL;
if (ipv4_is_zeronet(saddr)) {
if (!ipv4_is_local_multicast(daddr))
- goto e_inval;
+ return -EINVAL;
} else {
err = fib_validate_source(skb, saddr, 0, tos, 0, dev,
- in_dev, &itag);
+ in_dev, itag);
if (err < 0)
- goto e_err;
+ return err;
}
+ return 0;
+}
+
+/* called in rcu_read_lock() section */
+static int ip_route_input_mc(struct sk_buff *skb, __be32 daddr, __be32 saddr,
+ u8 tos, struct net_device *dev, int our)
+{
+ struct in_device *in_dev = __in_dev_get_rcu(dev);
+ unsigned int flags = RTCF_MULTICAST;
+ struct rtable *rth;
+ u32 itag = 0;
+ int err;
+
+ err = ip_mc_validate_source(skb, daddr, saddr, tos, dev, in_dev, &itag);
+ if (err)
+ return err;
+
if (our)
flags |= RTCF_LOCAL;
rth = rt_dst_alloc(dev_net(dev)->loopback_dev, flags, RTN_MULTICAST,
IN_DEV_CONF_GET(in_dev, NOPOLICY), false, false);
if (!rth)
- goto e_nobufs;
+ return -ENOBUFS;
#ifdef CONFIG_IP_ROUTE_CLASSID
rth->dst.tclassid = itag;
@@ -1572,13 +1585,6 @@ static int ip_route_input_mc(struct sk_buff *skb, __be32 daddr, __be32 saddr,
skb_dst_set(skb, &rth->dst);
return 0;
-
-e_nobufs:
- return -ENOBUFS;
-e_inval:
- return -EINVAL;
-e_err:
- return err;
}
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 9b30f821fe96..5676237d2b0f 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -2224,6 +2224,7 @@ static struct sock *__udp4_lib_demux_lookup(struct net *net,
int udp_v4_early_demux(struct sk_buff *skb)
{
struct net *net = dev_net(skb->dev);
+ struct in_device *in_dev = NULL;
const struct iphdr *iph;
const struct udphdr *uh;
struct sock *sk = NULL;
@@ -2241,7 +2242,7 @@ int udp_v4_early_demux(struct sk_buff *skb)
if (skb->pkt_type == PACKET_BROADCAST ||
skb->pkt_type == PACKET_MULTICAST) {
- struct in_device *in_dev = __in_dev_get_rcu(skb->dev);
+ in_dev = __in_dev_get_rcu(skb->dev);
if (!in_dev)
return 0;
@@ -2272,11 +2273,21 @@ int udp_v4_early_demux(struct sk_buff *skb)
if (dst)
dst = dst_check(dst, 0);
if (dst) {
+ u32 itag = 0;
+
/* set noref for now.
* any place which wants to hold dst has to call
* dst_hold_safe()
*/
skb_dst_set_noref(skb, dst);
+
+ /* for unconnected multicast sockets we need to validate
+ * the source on each packet
+ */
+ if (!inet_sk(sk)->inet_daddr && in_dev)
+ return ip_mc_validate_source(skb, iph->daddr,
+ iph->saddr, iph->tos,
+ skb->dev, in_dev, &itag);
}
return 0;
}
--
2.13.5
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH net 0/2] udp: fix early demux for mcast packets
2017-09-28 13:51 [PATCH net 0/2] udp: fix early demux for mcast packets Paolo Abeni
2017-09-28 13:51 ` [PATCH net 1/2] IPv4: early demux can return an error code Paolo Abeni
2017-09-28 13:51 ` [PATCH net 2/2] udp: perform source validation for mcast early demux Paolo Abeni
@ 2017-10-01 2:56 ` David Miller
2 siblings, 0 replies; 4+ messages in thread
From: David Miller @ 2017-10-01 2:56 UTC (permalink / raw)
To: pabeni; +Cc: netdev
From: Paolo Abeni <pabeni@redhat.com>
Date: Thu, 28 Sep 2017 15:51:35 +0200
> Currently the early demux callbacks do not perform source address validation.
> This is not an issue for TCP or UDP unicast, where the early demux
> is only allowed for connected sockets and the source address is validated
> for the first packet and never change.
>
> The UDP protocol currently allows early demux also for unconnected multicast
> sockets, and we are not currently doing any validation for them, after that
> the first packet lands on the socket: beyond ignoring the rp_filter - if
> enabled - any kind of martian sources are also allowed.
>
> This series addresses the issue allowing the early demux callback to return an
> error code, and performing the proper checks for unconnected UDP multicast
> sockets before leveraging the rx dst cache.
>
> Alternatively we could disable the early demux for unconnected mcast sockets,
> but that would cause relevant performance regression - around 50% - while with
> this series, with full rp_filter in place, we keep the regression to a more
> moderate level.
Series applied and queued up for -stable, thanks.
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2017-10-01 2:56 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-09-28 13:51 [PATCH net 0/2] udp: fix early demux for mcast packets Paolo Abeni
2017-09-28 13:51 ` [PATCH net 1/2] IPv4: early demux can return an error code Paolo Abeni
2017-09-28 13:51 ` [PATCH net 2/2] udp: perform source validation for mcast early demux Paolo Abeni
2017-10-01 2:56 ` [PATCH net 0/2] udp: fix early demux for mcast packets David Miller
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).