* [PATCH net-next 0/3] icmp: avoid possible side-channels attacks
@ 2024-08-28 19:39 Eric Dumazet
2024-08-28 19:39 ` [PATCH net-next 1/3] icmp: change the order of rate limits Eric Dumazet
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Eric Dumazet @ 2024-08-28 19:39 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: David Ahern, Willy Tarreau, Keyu Man, Jesper Dangaard Brouer,
netdev, eric.dumazet, Eric Dumazet
Keyu Man reminded us that linux ICMP rate limiting was still allowing
side-channels attacks.
Quoting the fine document [1]:
4.4 Private Source Port Scan Method
...
We can then use the same global ICMP rate limit as a side
channel to infer if such an ICMP message has been triggered. At
first glance, this method can work but at a low speed of one port
per second, due to the per-IP rate limit on ICMP messages.
Surprisingly, after we analyze the source code of the ICMP rate
limit implementation, we find that the global rate limit is checked
prior to the per-IP rate limit. This means that even if the per-IP
rate limit may eventually determine that no ICMP reply should be
sent, a packet is still subjected to the global rate limit check and one
token is deducted. Ironically, such a decision is consciously made
by Linux developers to avoid invoking the expensive check of the
per-IP rate limit [ 22], involving a search process to locate the per-IP
data structure.
This effectively means that the per-IP rate limit can be disre-
garded for the purpose of our side channel based scan, as it only
determines if the final ICMP reply is generated but has nothing to
do with the global rate limit counter decrement. As a result, we can
continue to use roughly the same scan method as efficient as before,
achieving 1,000 ports per second
...
This series :
1) Changes the order of the two rate limiters to fix the issue.
2-3) Make the 'host-wide' rate limiter a per-netns one.
[1]
Link: https://dl.acm.org/doi/pdf/10.1145/3372297.3417280
Eric Dumazet (3):
icmp: change the order of rate limits
icmp: move icmp_global.credit and icmp_global.stamp to per netns
storage
icmp: icmp_msgs_per_sec and icmp_msgs_burst sysctls become per netns
include/net/ip.h | 5 +-
include/net/netns/ipv4.h | 5 +-
net/ipv4/icmp.c | 111 +++++++++++++++++++------------------
net/ipv4/sysctl_net_ipv4.c | 32 +++++------
net/ipv6/icmp.c | 28 ++++++----
5 files changed, 97 insertions(+), 84 deletions(-)
--
2.46.0.295.g3b9ea8a38a-goog
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH net-next 1/3] icmp: change the order of rate limits
2024-08-28 19:39 [PATCH net-next 0/3] icmp: avoid possible side-channels attacks Eric Dumazet
@ 2024-08-28 19:39 ` Eric Dumazet
2024-08-29 4:29 ` David Ahern
2024-08-28 19:39 ` [PATCH net-next 2/3] icmp: move icmp_global.credit and icmp_global.stamp to per netns storage Eric Dumazet
2024-08-28 19:39 ` [PATCH net-next 3/3] icmp: icmp_msgs_per_sec and icmp_msgs_burst sysctls become per netns Eric Dumazet
2 siblings, 1 reply; 9+ messages in thread
From: Eric Dumazet @ 2024-08-28 19:39 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: David Ahern, Willy Tarreau, Keyu Man, Jesper Dangaard Brouer,
netdev, eric.dumazet, Eric Dumazet, stable
ICMP messages are ratelimited :
After the blamed commits, the two rate limiters are applied in this order:
1) host wide ratelimit (icmp_global_allow())
2) Per destination ratelimit (inetpeer based)
In order to avoid side-channels attacks, we need to apply
the per destination check first.
This patch makes the following change :
1) icmp_global_allow() checks if the host wide limit is reached.
But credits are not yet consumed. This is deferred to 3)
2) The per destination limit is checked/updated.
This might add a new node in inetpeer tree.
3) icmp_global_consume() consumes tokens if prior operations succeeded.
This means that host wide ratelimit is still effective
in keeping inetpeer tree small even under DDOS.
As a bonus, I removed icmp_global.lock as the fast path
can use a lock-free operation.
Fixes: c0303efeab73 ("net: reduce cycles spend on ICMP replies that gets rate limited")
Fixes: 4cdf507d5452 ("icmp: add a global rate limitation")
Reported-by: Keyu Man <keyu.man@email.ucr.edu>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jesper Dangaard Brouer <hawk@kernel.org>
Cc: stable@vger.kernel.org
---
include/net/ip.h | 2 +
net/ipv4/icmp.c | 103 ++++++++++++++++++++++++++---------------------
net/ipv6/icmp.c | 28 ++++++++-----
3 files changed, 76 insertions(+), 57 deletions(-)
diff --git a/include/net/ip.h b/include/net/ip.h
index c5606cadb1a552f3e282a5e1e721fd47b07432b2..82248813619e3f21e09d52976accbdc76c7668c2 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -795,6 +795,8 @@ static inline void ip_cmsg_recv(struct msghdr *msg, struct sk_buff *skb)
}
bool icmp_global_allow(void);
+void icmp_global_consume(void);
+
extern int sysctl_icmp_msgs_per_sec;
extern int sysctl_icmp_msgs_burst;
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index b8f56d03fcbb62970a828e20dd9f05fcede2d552..0078e8fb2e86d0552ef85eb5bf5bef947b0f1c3d 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -224,57 +224,59 @@ int sysctl_icmp_msgs_per_sec __read_mostly = 1000;
int sysctl_icmp_msgs_burst __read_mostly = 50;
static struct {
- spinlock_t lock;
- u32 credit;
+ atomic_t credit;
u32 stamp;
-} icmp_global = {
- .lock = __SPIN_LOCK_UNLOCKED(icmp_global.lock),
-};
+} icmp_global;
/**
* icmp_global_allow - Are we allowed to send one more ICMP message ?
*
* Uses a token bucket to limit our ICMP messages to ~sysctl_icmp_msgs_per_sec.
* Returns false if we reached the limit and can not send another packet.
- * Note: called with BH disabled
+ * Works in tandem with icmp_global_consume().
*/
bool icmp_global_allow(void)
{
- u32 credit, delta, incr = 0, now = (u32)jiffies;
- bool rc = false;
+ u32 delta, now, oldstamp;
+ int incr, new, old;
- /* Check if token bucket is empty and cannot be refilled
- * without taking the spinlock. The READ_ONCE() are paired
- * with the following WRITE_ONCE() in this same function.
+ /* Note: many cpus could find this condition true.
+ * Then later icmp_global_consume() could consume more credits,
+ * this is an acceptable race.
*/
- if (!READ_ONCE(icmp_global.credit)) {
- delta = min_t(u32, now - READ_ONCE(icmp_global.stamp), HZ);
- if (delta < HZ / 50)
- return false;
- }
+ if (atomic_read(&icmp_global.credit) > 0)
+ return true;
- spin_lock(&icmp_global.lock);
- delta = min_t(u32, now - icmp_global.stamp, HZ);
- if (delta >= HZ / 50) {
- incr = READ_ONCE(sysctl_icmp_msgs_per_sec) * delta / HZ;
- if (incr)
- WRITE_ONCE(icmp_global.stamp, now);
- }
- credit = min_t(u32, icmp_global.credit + incr,
- READ_ONCE(sysctl_icmp_msgs_burst));
- if (credit) {
- /* We want to use a credit of one in average, but need to randomize
- * it for security reasons.
- */
- credit = max_t(int, credit - get_random_u32_below(3), 0);
- rc = true;
+ now = jiffies;
+ oldstamp = READ_ONCE(icmp_global.stamp);
+ delta = min_t(u32, now - oldstamp, HZ);
+ if (delta < HZ / 50)
+ return false;
+
+ incr = READ_ONCE(sysctl_icmp_msgs_per_sec) * delta / HZ;
+ if (!incr)
+ return false;
+
+ if (cmpxchg(&icmp_global.stamp, oldstamp, now) == oldstamp) {
+ old = atomic_read(&icmp_global.credit);
+ do {
+ new = min(old + incr, READ_ONCE(sysctl_icmp_msgs_burst));
+ } while (!atomic_try_cmpxchg(&icmp_global.credit, &old, new));
}
- WRITE_ONCE(icmp_global.credit, credit);
- spin_unlock(&icmp_global.lock);
- return rc;
+ return true;
}
EXPORT_SYMBOL(icmp_global_allow);
+void icmp_global_consume(void)
+{
+ int credits = get_random_u32_below(3);
+
+ /* Note: this might make icmp_global.credit negative. */
+ if (credits)
+ atomic_sub(credits, &icmp_global.credit);
+}
+EXPORT_SYMBOL(icmp_global_consume);
+
static bool icmpv4_mask_allow(struct net *net, int type, int code)
{
if (type > NR_ICMP_TYPES)
@@ -291,14 +293,16 @@ static bool icmpv4_mask_allow(struct net *net, int type, int code)
return false;
}
-static bool icmpv4_global_allow(struct net *net, int type, int code)
+static bool icmpv4_global_allow(struct net *net, int type, int code,
+ bool *apply_ratelimit)
{
if (icmpv4_mask_allow(net, type, code))
return true;
- if (icmp_global_allow())
+ if (icmp_global_allow()) {
+ *apply_ratelimit = true;
return true;
-
+ }
__ICMP_INC_STATS(net, ICMP_MIB_RATELIMITGLOBAL);
return false;
}
@@ -308,15 +312,16 @@ static bool icmpv4_global_allow(struct net *net, int type, int code)
*/
static bool icmpv4_xrlim_allow(struct net *net, struct rtable *rt,
- struct flowi4 *fl4, int type, int code)
+ struct flowi4 *fl4, int type, int code,
+ bool apply_ratelimit)
{
struct dst_entry *dst = &rt->dst;
struct inet_peer *peer;
bool rc = true;
int vif;
- if (icmpv4_mask_allow(net, type, code))
- goto out;
+ if (!apply_ratelimit)
+ return true;
/* No rate limit on loopback */
if (dst->dev && (dst->dev->flags&IFF_LOOPBACK))
@@ -331,6 +336,8 @@ static bool icmpv4_xrlim_allow(struct net *net, struct rtable *rt,
out:
if (!rc)
__ICMP_INC_STATS(net, ICMP_MIB_RATELIMITHOST);
+ else
+ icmp_global_consume();
return rc;
}
@@ -402,6 +409,7 @@ static void icmp_reply(struct icmp_bxm *icmp_param, struct sk_buff *skb)
struct ipcm_cookie ipc;
struct rtable *rt = skb_rtable(skb);
struct net *net = dev_net(rt->dst.dev);
+ bool apply_ratelimit = false;
struct flowi4 fl4;
struct sock *sk;
struct inet_sock *inet;
@@ -413,11 +421,11 @@ static void icmp_reply(struct icmp_bxm *icmp_param, struct sk_buff *skb)
if (ip_options_echo(net, &icmp_param->replyopts.opt.opt, skb))
return;
- /* Needed by both icmp_global_allow and icmp_xmit_lock */
+ /* Needed by both icmpv4_global_allow and icmp_xmit_lock */
local_bh_disable();
- /* global icmp_msgs_per_sec */
- if (!icmpv4_global_allow(net, type, code))
+ /* is global icmp_msgs_per_sec exhausted ? */
+ if (!icmpv4_global_allow(net, type, code, &apply_ratelimit))
goto out_bh_enable;
sk = icmp_xmit_lock(net);
@@ -450,7 +458,7 @@ static void icmp_reply(struct icmp_bxm *icmp_param, struct sk_buff *skb)
rt = ip_route_output_key(net, &fl4);
if (IS_ERR(rt))
goto out_unlock;
- if (icmpv4_xrlim_allow(net, rt, &fl4, type, code))
+ if (icmpv4_xrlim_allow(net, rt, &fl4, type, code, apply_ratelimit))
icmp_push_reply(sk, icmp_param, &fl4, &ipc, &rt);
ip_rt_put(rt);
out_unlock:
@@ -596,6 +604,7 @@ void __icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info,
int room;
struct icmp_bxm icmp_param;
struct rtable *rt = skb_rtable(skb_in);
+ bool apply_ratelimit = false;
struct ipcm_cookie ipc;
struct flowi4 fl4;
__be32 saddr;
@@ -677,7 +686,7 @@ void __icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info,
}
}
- /* Needed by both icmp_global_allow and icmp_xmit_lock */
+ /* Needed by both icmpv4_global_allow and icmp_xmit_lock */
local_bh_disable();
/* Check global sysctl_icmp_msgs_per_sec ratelimit, unless
@@ -685,7 +694,7 @@ void __icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info,
* loopback, then peer ratelimit still work (in icmpv4_xrlim_allow)
*/
if (!(skb_in->dev && (skb_in->dev->flags&IFF_LOOPBACK)) &&
- !icmpv4_global_allow(net, type, code))
+ !icmpv4_global_allow(net, type, code, &apply_ratelimit))
goto out_bh_enable;
sk = icmp_xmit_lock(net);
@@ -744,7 +753,7 @@ void __icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info,
goto out_unlock;
/* peer icmp_ratelimit */
- if (!icmpv4_xrlim_allow(net, rt, &fl4, type, code))
+ if (!icmpv4_xrlim_allow(net, rt, &fl4, type, code, apply_ratelimit))
goto ende;
/* RFC says return as much as we can without exceeding 576 bytes. */
diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c
index 7b31674644efc338ec458d92dfe495480825b0fd..46f70e4a835139ef7d8925c49440865355048193 100644
--- a/net/ipv6/icmp.c
+++ b/net/ipv6/icmp.c
@@ -175,14 +175,16 @@ static bool icmpv6_mask_allow(struct net *net, int type)
return false;
}
-static bool icmpv6_global_allow(struct net *net, int type)
+static bool icmpv6_global_allow(struct net *net, int type,
+ bool *apply_ratelimit)
{
if (icmpv6_mask_allow(net, type))
return true;
- if (icmp_global_allow())
+ if (icmp_global_allow()) {
+ *apply_ratelimit = true;
return true;
-
+ }
__ICMP_INC_STATS(net, ICMP_MIB_RATELIMITGLOBAL);
return false;
}
@@ -191,13 +193,13 @@ static bool icmpv6_global_allow(struct net *net, int type)
* Check the ICMP output rate limit
*/
static bool icmpv6_xrlim_allow(struct sock *sk, u8 type,
- struct flowi6 *fl6)
+ struct flowi6 *fl6, bool apply_ratelimit)
{
struct net *net = sock_net(sk);
struct dst_entry *dst;
bool res = false;
- if (icmpv6_mask_allow(net, type))
+ if (!apply_ratelimit)
return true;
/*
@@ -228,6 +230,8 @@ static bool icmpv6_xrlim_allow(struct sock *sk, u8 type,
if (!res)
__ICMP6_INC_STATS(net, ip6_dst_idev(dst),
ICMP6_MIB_RATELIMITHOST);
+ else
+ icmp_global_consume();
dst_release(dst);
return res;
}
@@ -452,6 +456,7 @@ void icmp6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info,
struct net *net;
struct ipv6_pinfo *np;
const struct in6_addr *saddr = NULL;
+ bool apply_ratelimit = false;
struct dst_entry *dst;
struct icmp6hdr tmp_hdr;
struct flowi6 fl6;
@@ -533,11 +538,12 @@ void icmp6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info,
return;
}
- /* Needed by both icmp_global_allow and icmpv6_xmit_lock */
+ /* Needed by both icmpv6_global_allow and icmpv6_xmit_lock */
local_bh_disable();
/* Check global sysctl_icmp_msgs_per_sec ratelimit */
- if (!(skb->dev->flags & IFF_LOOPBACK) && !icmpv6_global_allow(net, type))
+ if (!(skb->dev->flags & IFF_LOOPBACK) &&
+ !icmpv6_global_allow(net, type, &apply_ratelimit))
goto out_bh_enable;
mip6_addr_swap(skb, parm);
@@ -575,7 +581,7 @@ void icmp6_send(struct sk_buff *skb, u8 type, u8 code, __u32 info,
np = inet6_sk(sk);
- if (!icmpv6_xrlim_allow(sk, type, &fl6))
+ if (!icmpv6_xrlim_allow(sk, type, &fl6, apply_ratelimit))
goto out;
tmp_hdr.icmp6_type = type;
@@ -717,6 +723,7 @@ static enum skb_drop_reason icmpv6_echo_reply(struct sk_buff *skb)
struct ipv6_pinfo *np;
const struct in6_addr *saddr = NULL;
struct icmp6hdr *icmph = icmp6_hdr(skb);
+ bool apply_ratelimit = false;
struct icmp6hdr tmp_hdr;
struct flowi6 fl6;
struct icmpv6_msg msg;
@@ -781,8 +788,9 @@ static enum skb_drop_reason icmpv6_echo_reply(struct sk_buff *skb)
goto out;
/* Check the ratelimit */
- if ((!(skb->dev->flags & IFF_LOOPBACK) && !icmpv6_global_allow(net, ICMPV6_ECHO_REPLY)) ||
- !icmpv6_xrlim_allow(sk, ICMPV6_ECHO_REPLY, &fl6))
+ if ((!(skb->dev->flags & IFF_LOOPBACK) &&
+ !icmpv6_global_allow(net, ICMPV6_ECHO_REPLY, &apply_ratelimit)) ||
+ !icmpv6_xrlim_allow(sk, ICMPV6_ECHO_REPLY, &fl6, apply_ratelimit))
goto out_dst_release;
idev = __in6_dev_get(skb->dev);
--
2.46.0.295.g3b9ea8a38a-goog
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH net-next 2/3] icmp: move icmp_global.credit and icmp_global.stamp to per netns storage
2024-08-28 19:39 [PATCH net-next 0/3] icmp: avoid possible side-channels attacks Eric Dumazet
2024-08-28 19:39 ` [PATCH net-next 1/3] icmp: change the order of rate limits Eric Dumazet
@ 2024-08-28 19:39 ` Eric Dumazet
2024-08-29 4:30 ` David Ahern
2024-08-29 13:33 ` Simon Horman
2024-08-28 19:39 ` [PATCH net-next 3/3] icmp: icmp_msgs_per_sec and icmp_msgs_burst sysctls become per netns Eric Dumazet
2 siblings, 2 replies; 9+ messages in thread
From: Eric Dumazet @ 2024-08-28 19:39 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: David Ahern, Willy Tarreau, Keyu Man, Jesper Dangaard Brouer,
netdev, eric.dumazet, Eric Dumazet
Host wide ICMP ratelimiter should be per netns, to provide better isolation.
Following patch in this series makes the sysctl per netns.
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
include/net/ip.h | 4 ++--
include/net/netns/ipv4.h | 3 ++-
net/ipv4/icmp.c | 25 ++++++++++---------------
net/ipv6/icmp.c | 4 ++--
4 files changed, 16 insertions(+), 20 deletions(-)
diff --git a/include/net/ip.h b/include/net/ip.h
index 82248813619e3f21e09d52976accbdc76c7668c2..d3bca4e83979f681c4931e9ff62db5941a059c11 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -794,8 +794,8 @@ static inline void ip_cmsg_recv(struct msghdr *msg, struct sk_buff *skb)
ip_cmsg_recv_offset(msg, skb->sk, skb, 0, 0);
}
-bool icmp_global_allow(void);
-void icmp_global_consume(void);
+bool icmp_global_allow(struct net *net);
+void icmp_global_consume(struct net *net);
extern int sysctl_icmp_msgs_per_sec;
extern int sysctl_icmp_msgs_burst;
diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index 5fcd61ada62289253844be9cbe25387dd92385a5..54fe7c079fffb285b7a8a069f3d57f9440a6655a 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -122,7 +122,8 @@ struct netns_ipv4 {
u8 sysctl_icmp_errors_use_inbound_ifaddr;
int sysctl_icmp_ratelimit;
int sysctl_icmp_ratemask;
-
+ atomic_t icmp_global_credit;
+ u32 icmp_global_stamp;
u32 ip_rt_min_pmtu;
int ip_rt_mtu_expires;
int ip_rt_min_advmss;
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 0078e8fb2e86d0552ef85eb5bf5bef947b0f1c3d..8ad3139a00fb8c5cb8f28f92d125ef83d9e840c3 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -223,11 +223,6 @@ static inline void icmp_xmit_unlock(struct sock *sk)
int sysctl_icmp_msgs_per_sec __read_mostly = 1000;
int sysctl_icmp_msgs_burst __read_mostly = 50;
-static struct {
- atomic_t credit;
- u32 stamp;
-} icmp_global;
-
/**
* icmp_global_allow - Are we allowed to send one more ICMP message ?
*
@@ -235,7 +230,7 @@ static struct {
* Returns false if we reached the limit and can not send another packet.
* Works in tandem with icmp_global_consume().
*/
-bool icmp_global_allow(void)
+bool icmp_global_allow(struct net *net)
{
u32 delta, now, oldstamp;
int incr, new, old;
@@ -244,11 +239,11 @@ bool icmp_global_allow(void)
* Then later icmp_global_consume() could consume more credits,
* this is an acceptable race.
*/
- if (atomic_read(&icmp_global.credit) > 0)
+ if (atomic_read(&net->ipv4.icmp_global_credit) > 0)
return true;
now = jiffies;
- oldstamp = READ_ONCE(icmp_global.stamp);
+ oldstamp = READ_ONCE(net->ipv4.icmp_global_stamp);
delta = min_t(u32, now - oldstamp, HZ);
if (delta < HZ / 50)
return false;
@@ -257,23 +252,23 @@ bool icmp_global_allow(void)
if (!incr)
return false;
- if (cmpxchg(&icmp_global.stamp, oldstamp, now) == oldstamp) {
- old = atomic_read(&icmp_global.credit);
+ if (cmpxchg(&net->ipv4.icmp_global_stamp, oldstamp, now) == oldstamp) {
+ old = atomic_read(&net->ipv4.icmp_global_credit);
do {
new = min(old + incr, READ_ONCE(sysctl_icmp_msgs_burst));
- } while (!atomic_try_cmpxchg(&icmp_global.credit, &old, new));
+ } while (!atomic_try_cmpxchg(&net->ipv4.icmp_global_credit, &old, new));
}
return true;
}
EXPORT_SYMBOL(icmp_global_allow);
-void icmp_global_consume(void)
+void icmp_global_consume(struct net *net)
{
int credits = get_random_u32_below(3);
/* Note: this might make icmp_global.credit negative. */
if (credits)
- atomic_sub(credits, &icmp_global.credit);
+ atomic_sub(credits, &net->ipv4.icmp_global_credit);
}
EXPORT_SYMBOL(icmp_global_consume);
@@ -299,7 +294,7 @@ static bool icmpv4_global_allow(struct net *net, int type, int code,
if (icmpv4_mask_allow(net, type, code))
return true;
- if (icmp_global_allow()) {
+ if (icmp_global_allow(net)) {
*apply_ratelimit = true;
return true;
}
@@ -337,7 +332,7 @@ static bool icmpv4_xrlim_allow(struct net *net, struct rtable *rt,
if (!rc)
__ICMP_INC_STATS(net, ICMP_MIB_RATELIMITHOST);
else
- icmp_global_consume();
+ icmp_global_consume(net);
return rc;
}
diff --git a/net/ipv6/icmp.c b/net/ipv6/icmp.c
index 46f70e4a835139ef7d8925c49440865355048193..071b0bc1179d81b18c340ce415cef21e02a30cd7 100644
--- a/net/ipv6/icmp.c
+++ b/net/ipv6/icmp.c
@@ -181,7 +181,7 @@ static bool icmpv6_global_allow(struct net *net, int type,
if (icmpv6_mask_allow(net, type))
return true;
- if (icmp_global_allow()) {
+ if (icmp_global_allow(net)) {
*apply_ratelimit = true;
return true;
}
@@ -231,7 +231,7 @@ static bool icmpv6_xrlim_allow(struct sock *sk, u8 type,
__ICMP6_INC_STATS(net, ip6_dst_idev(dst),
ICMP6_MIB_RATELIMITHOST);
else
- icmp_global_consume();
+ icmp_global_consume(net);
dst_release(dst);
return res;
}
--
2.46.0.295.g3b9ea8a38a-goog
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [PATCH net-next 3/3] icmp: icmp_msgs_per_sec and icmp_msgs_burst sysctls become per netns
2024-08-28 19:39 [PATCH net-next 0/3] icmp: avoid possible side-channels attacks Eric Dumazet
2024-08-28 19:39 ` [PATCH net-next 1/3] icmp: change the order of rate limits Eric Dumazet
2024-08-28 19:39 ` [PATCH net-next 2/3] icmp: move icmp_global.credit and icmp_global.stamp to per netns storage Eric Dumazet
@ 2024-08-28 19:39 ` Eric Dumazet
2024-08-29 4:31 ` David Ahern
2 siblings, 1 reply; 9+ messages in thread
From: Eric Dumazet @ 2024-08-28 19:39 UTC (permalink / raw)
To: David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: David Ahern, Willy Tarreau, Keyu Man, Jesper Dangaard Brouer,
netdev, eric.dumazet, Eric Dumazet
Previous patch made ICMP rate limits per netns, it makes sense
to allow each netns to change the associated sysctl.
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
include/net/ip.h | 3 ---
include/net/netns/ipv4.h | 2 ++
net/ipv4/icmp.c | 9 ++++-----
net/ipv4/sysctl_net_ipv4.c | 32 ++++++++++++++++----------------
4 files changed, 22 insertions(+), 24 deletions(-)
diff --git a/include/net/ip.h b/include/net/ip.h
index d3bca4e83979f681c4931e9ff62db5941a059c11..1ee472fa8b373e85907146f9a3f29ecc98e2e55b 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -797,9 +797,6 @@ static inline void ip_cmsg_recv(struct msghdr *msg, struct sk_buff *skb)
bool icmp_global_allow(struct net *net);
void icmp_global_consume(struct net *net);
-extern int sysctl_icmp_msgs_per_sec;
-extern int sysctl_icmp_msgs_burst;
-
#ifdef CONFIG_PROC_FS
int ip_misc_proc_init(void);
#endif
diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index 54fe7c079fffb285b7a8a069f3d57f9440a6655a..276f622f3516871c438be27bafe61c039445b335 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -122,6 +122,8 @@ struct netns_ipv4 {
u8 sysctl_icmp_errors_use_inbound_ifaddr;
int sysctl_icmp_ratelimit;
int sysctl_icmp_ratemask;
+ int sysctl_icmp_msgs_per_sec;
+ int sysctl_icmp_msgs_burst;
atomic_t icmp_global_credit;
u32 icmp_global_stamp;
u32 ip_rt_min_pmtu;
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 8ad3139a00fb8c5cb8f28f92d125ef83d9e840c3..1f5d6f01ee9ef7a1b32a6619837176de16bd4389 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -220,9 +220,6 @@ static inline void icmp_xmit_unlock(struct sock *sk)
spin_unlock(&sk->sk_lock.slock);
}
-int sysctl_icmp_msgs_per_sec __read_mostly = 1000;
-int sysctl_icmp_msgs_burst __read_mostly = 50;
-
/**
* icmp_global_allow - Are we allowed to send one more ICMP message ?
*
@@ -248,14 +245,14 @@ bool icmp_global_allow(struct net *net)
if (delta < HZ / 50)
return false;
- incr = READ_ONCE(sysctl_icmp_msgs_per_sec) * delta / HZ;
+ incr = READ_ONCE(net->ipv4.sysctl_icmp_msgs_per_sec) * delta / HZ;
if (!incr)
return false;
if (cmpxchg(&net->ipv4.icmp_global_stamp, oldstamp, now) == oldstamp) {
old = atomic_read(&net->ipv4.icmp_global_credit);
do {
- new = min(old + incr, READ_ONCE(sysctl_icmp_msgs_burst));
+ new = min(old + incr, READ_ONCE(net->ipv4.sysctl_icmp_msgs_burst));
} while (!atomic_try_cmpxchg(&net->ipv4.icmp_global_credit, &old, new));
}
return true;
@@ -1491,6 +1488,8 @@ static int __net_init icmp_sk_init(struct net *net)
net->ipv4.sysctl_icmp_ratelimit = 1 * HZ;
net->ipv4.sysctl_icmp_ratemask = 0x1818;
net->ipv4.sysctl_icmp_errors_use_inbound_ifaddr = 0;
+ net->ipv4.sysctl_icmp_msgs_per_sec = 1000;
+ net->ipv4.sysctl_icmp_msgs_burst = 50;
return 0;
}
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 4af0c234d8d763f430608d60f38eff8a6d9935b4..a79b2a52ce01e6c1a1257ba31c17ac2f51ba19ec 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -600,22 +600,6 @@ static struct ctl_table ipv4_table[] = {
.mode = 0444,
.proc_handler = proc_tcp_available_ulp,
},
- {
- .procname = "icmp_msgs_per_sec",
- .data = &sysctl_icmp_msgs_per_sec,
- .maxlen = sizeof(int),
- .mode = 0644,
- .proc_handler = proc_dointvec_minmax,
- .extra1 = SYSCTL_ZERO,
- },
- {
- .procname = "icmp_msgs_burst",
- .data = &sysctl_icmp_msgs_burst,
- .maxlen = sizeof(int),
- .mode = 0644,
- .proc_handler = proc_dointvec_minmax,
- .extra1 = SYSCTL_ZERO,
- },
{
.procname = "udp_mem",
.data = &sysctl_udp_mem,
@@ -701,6 +685,22 @@ static struct ctl_table ipv4_net_table[] = {
.mode = 0644,
.proc_handler = proc_dointvec
},
+ {
+ .procname = "icmp_msgs_per_sec",
+ .data = &init_net.ipv4.sysctl_icmp_msgs_per_sec,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = SYSCTL_ZERO,
+ },
+ {
+ .procname = "icmp_msgs_burst",
+ .data = &init_net.ipv4.sysctl_icmp_msgs_burst,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = SYSCTL_ZERO,
+ },
{
.procname = "ping_group_range",
.data = &init_net.ipv4.ping_group_range.range,
--
2.46.0.295.g3b9ea8a38a-goog
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH net-next 1/3] icmp: change the order of rate limits
2024-08-28 19:39 ` [PATCH net-next 1/3] icmp: change the order of rate limits Eric Dumazet
@ 2024-08-29 4:29 ` David Ahern
0 siblings, 0 replies; 9+ messages in thread
From: David Ahern @ 2024-08-29 4:29 UTC (permalink / raw)
To: Eric Dumazet, David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Willy Tarreau, Keyu Man, Jesper Dangaard Brouer, netdev,
eric.dumazet, stable
On 8/28/24 1:39 PM, Eric Dumazet wrote:
> ICMP messages are ratelimited :
>
> After the blamed commits, the two rate limiters are applied in this order:
>
> 1) host wide ratelimit (icmp_global_allow())
>
> 2) Per destination ratelimit (inetpeer based)
>
> In order to avoid side-channels attacks, we need to apply
> the per destination check first.
>
> This patch makes the following change :
>
> 1) icmp_global_allow() checks if the host wide limit is reached.
> But credits are not yet consumed. This is deferred to 3)
>
> 2) The per destination limit is checked/updated.
> This might add a new node in inetpeer tree.
>
> 3) icmp_global_consume() consumes tokens if prior operations succeeded.
>
> This means that host wide ratelimit is still effective
> in keeping inetpeer tree small even under DDOS.
>
> As a bonus, I removed icmp_global.lock as the fast path
> can use a lock-free operation.
>
> Fixes: c0303efeab73 ("net: reduce cycles spend on ICMP replies that gets rate limited")
> Fixes: 4cdf507d5452 ("icmp: add a global rate limitation")
> Reported-by: Keyu Man <keyu.man@email.ucr.edu>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Jesper Dangaard Brouer <hawk@kernel.org>
> Cc: stable@vger.kernel.org
> ---
> include/net/ip.h | 2 +
> net/ipv4/icmp.c | 103 ++++++++++++++++++++++++++---------------------
> net/ipv6/icmp.c | 28 ++++++++-----
> 3 files changed, 76 insertions(+), 57 deletions(-)
>
Reviewed-by: David Ahern <dsahern@kernel.org>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net-next 2/3] icmp: move icmp_global.credit and icmp_global.stamp to per netns storage
2024-08-28 19:39 ` [PATCH net-next 2/3] icmp: move icmp_global.credit and icmp_global.stamp to per netns storage Eric Dumazet
@ 2024-08-29 4:30 ` David Ahern
2024-08-29 13:33 ` Simon Horman
1 sibling, 0 replies; 9+ messages in thread
From: David Ahern @ 2024-08-29 4:30 UTC (permalink / raw)
To: Eric Dumazet, David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Willy Tarreau, Keyu Man, Jesper Dangaard Brouer, netdev,
eric.dumazet
On 8/28/24 1:39 PM, Eric Dumazet wrote:
> Host wide ICMP ratelimiter should be per netns, to provide better isolation.
>
> Following patch in this series makes the sysctl per netns.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
> include/net/ip.h | 4 ++--
> include/net/netns/ipv4.h | 3 ++-
> net/ipv4/icmp.c | 25 ++++++++++---------------
> net/ipv6/icmp.c | 4 ++--
> 4 files changed, 16 insertions(+), 20 deletions(-)
>
Reviewed-by: David Ahern <dsahern@kernel.org>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net-next 3/3] icmp: icmp_msgs_per_sec and icmp_msgs_burst sysctls become per netns
2024-08-28 19:39 ` [PATCH net-next 3/3] icmp: icmp_msgs_per_sec and icmp_msgs_burst sysctls become per netns Eric Dumazet
@ 2024-08-29 4:31 ` David Ahern
0 siblings, 0 replies; 9+ messages in thread
From: David Ahern @ 2024-08-29 4:31 UTC (permalink / raw)
To: Eric Dumazet, David S . Miller, Jakub Kicinski, Paolo Abeni
Cc: Willy Tarreau, Keyu Man, Jesper Dangaard Brouer, netdev,
eric.dumazet
On 8/28/24 1:39 PM, Eric Dumazet wrote:
> Previous patch made ICMP rate limits per netns, it makes sense
> to allow each netns to change the associated sysctl.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
> include/net/ip.h | 3 ---
> include/net/netns/ipv4.h | 2 ++
> net/ipv4/icmp.c | 9 ++++-----
> net/ipv4/sysctl_net_ipv4.c | 32 ++++++++++++++++----------------
> 4 files changed, 22 insertions(+), 24 deletions(-)
>
Reviewed-by: David Ahern <dsahern@kernel.org>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net-next 2/3] icmp: move icmp_global.credit and icmp_global.stamp to per netns storage
2024-08-28 19:39 ` [PATCH net-next 2/3] icmp: move icmp_global.credit and icmp_global.stamp to per netns storage Eric Dumazet
2024-08-29 4:30 ` David Ahern
@ 2024-08-29 13:33 ` Simon Horman
2024-08-29 13:54 ` Eric Dumazet
1 sibling, 1 reply; 9+ messages in thread
From: Simon Horman @ 2024-08-29 13:33 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, David Ahern,
Willy Tarreau, Keyu Man, Jesper Dangaard Brouer, netdev,
eric.dumazet
On Wed, Aug 28, 2024 at 07:39:47PM +0000, Eric Dumazet wrote:
> Host wide ICMP ratelimiter should be per netns, to provide better isolation.
>
> Following patch in this series makes the sysctl per netns.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
...
> diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
...
> @@ -235,7 +230,7 @@ static struct {
> * Returns false if we reached the limit and can not send another packet.
> * Works in tandem with icmp_global_consume().
> */
Hi Eric,
nit: This could be handled in a follow-up, and I'm happy to prepare it
myself, but net should be added to the Kernel doc above.
> -bool icmp_global_allow(void)
> +bool icmp_global_allow(struct net *net)
> {
> u32 delta, now, oldstamp;
> int incr, new, old;
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH net-next 2/3] icmp: move icmp_global.credit and icmp_global.stamp to per netns storage
2024-08-29 13:33 ` Simon Horman
@ 2024-08-29 13:54 ` Eric Dumazet
0 siblings, 0 replies; 9+ messages in thread
From: Eric Dumazet @ 2024-08-29 13:54 UTC (permalink / raw)
To: Simon Horman
Cc: David S. Miller, Jakub Kicinski, Paolo Abeni, David Ahern,
Willy Tarreau, Keyu Man, Jesper Dangaard Brouer, netdev,
eric.dumazet
On Thu, Aug 29, 2024 at 3:34 PM Simon Horman <horms@kernel.org> wrote:
>
> On Wed, Aug 28, 2024 at 07:39:47PM +0000, Eric Dumazet wrote:
> > Host wide ICMP ratelimiter should be per netns, to provide better isolation.
> >
> > Following patch in this series makes the sysctl per netns.
> >
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
>
> ...
>
> > diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
>
> ...
>
> > @@ -235,7 +230,7 @@ static struct {
> > * Returns false if we reached the limit and can not send another packet.
> > * Works in tandem with icmp_global_consume().
> > */
>
> Hi Eric,
>
> nit: This could be handled in a follow-up, and I'm happy to prepare it
> myself, but net should be added to the Kernel doc above.
Thanks, good point, let me send a V2 real quick.
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2024-08-29 13:54 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-28 19:39 [PATCH net-next 0/3] icmp: avoid possible side-channels attacks Eric Dumazet
2024-08-28 19:39 ` [PATCH net-next 1/3] icmp: change the order of rate limits Eric Dumazet
2024-08-29 4:29 ` David Ahern
2024-08-28 19:39 ` [PATCH net-next 2/3] icmp: move icmp_global.credit and icmp_global.stamp to per netns storage Eric Dumazet
2024-08-29 4:30 ` David Ahern
2024-08-29 13:33 ` Simon Horman
2024-08-29 13:54 ` Eric Dumazet
2024-08-28 19:39 ` [PATCH net-next 3/3] icmp: icmp_msgs_per_sec and icmp_msgs_burst sysctls become per netns Eric Dumazet
2024-08-29 4:31 ` David Ahern
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).