From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, Willem de Bruijn <willemb@google.com>,
Benjamin Herrenschmidt <benh@amazon.com>,
Kuniyuki Iwashima <kuniyu@amazon.co.jp>,
"David S. Miller" <davem@davemloft.net>
Subject: [PATCH 5.7 17/20] udp: Improve load balancing for SO_REUSEPORT.
Date: Thu, 30 Jul 2020 10:04:07 +0200
Message-ID: <20200730074421.342711211@linuxfoundation.org>
In-Reply-To: <20200730074420.533211699@linuxfoundation.org>
From: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
[ Upstream commit efc6b6f6c3113e8b203b9debfb72d81e0f3dcace ]
Currently, SO_REUSEPORT does not work well if connected sockets are in a
UDP reuseport group.

In that case, reuseport_has_conns() returns true and the result of
reuseport_select_sock() is discarded. Also, unconnected sockets all have
the same score, so the first unconnected socket in udp_hslot always
receives all packets sent to unconnected sockets.

So, the result of reuseport_select_sock() should be used for load
balancing.

Note that unconnected sockets placed after connected sockets in
sock_reuseport.socks will receive more packets than the others because
of the algorithm in reuseport_select_sock().
    index | connected | reciprocal_scale | result
    ---------------------------------------------
    0     |        no |              20% |    40%
    1     |        no |              20% |    20%
    2     |       yes |              20% |     0%
    3     |        no |              20% |    40%
    4     |       yes |              20% |     0%
If most of the sockets are connected, this skew can be a problem, but it
is still an improvement over the current behavior.
Fixes: acdcecc61285 ("udp: correct reuseport selection with connected sockets")
CC: Willem de Bruijn <willemb@google.com>
Reviewed-by: Benjamin Herrenschmidt <benh@amazon.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
net/ipv4/udp.c | 15 +++++++++------
net/ipv6/udp.c | 15 +++++++++------
2 files changed, 18 insertions(+), 12 deletions(-)
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -413,7 +413,7 @@ static struct sock *udp4_lib_lookup2(str
 					     struct udp_hslot *hslot2,
 					     struct sk_buff *skb)
 {
-	struct sock *sk, *result;
+	struct sock *sk, *result, *reuseport_result;
 	int score, badness;
 	u32 hash = 0;
@@ -423,17 +423,20 @@ static struct sock *udp4_lib_lookup2(str
 		score = compute_score(sk, net, saddr, sport,
 				      daddr, hnum, dif, sdif);
 		if (score > badness) {
+			reuseport_result = NULL;
+
 			if (sk->sk_reuseport &&
 			    sk->sk_state != TCP_ESTABLISHED) {
 				hash = udp_ehashfn(net, daddr, hnum,
 						   saddr, sport);
-				result = reuseport_select_sock(sk, hash, skb,
-							sizeof(struct udphdr));
-				if (result && !reuseport_has_conns(sk, false))
-					return result;
+				reuseport_result = reuseport_select_sock(sk, hash, skb,
+									 sizeof(struct udphdr));
+				if (reuseport_result && !reuseport_has_conns(sk, false))
+					return reuseport_result;
 			}
+
+			result = reuseport_result ? : sk;
 			badness = score;
-			result = sk;
 		}
 	}
 	return result;
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -148,7 +148,7 @@ static struct sock *udp6_lib_lookup2(str
 		int dif, int sdif, struct udp_hslot *hslot2,
 		struct sk_buff *skb)
 {
-	struct sock *sk, *result;
+	struct sock *sk, *result, *reuseport_result;
 	int score, badness;
 	u32 hash = 0;
@@ -158,17 +158,20 @@ static struct sock *udp6_lib_lookup2(str
 		score = compute_score(sk, net, saddr, sport,
 				      daddr, hnum, dif, sdif);
 		if (score > badness) {
+			reuseport_result = NULL;
+
 			if (sk->sk_reuseport &&
 			    sk->sk_state != TCP_ESTABLISHED) {
 				hash = udp6_ehashfn(net, daddr, hnum,
 						    saddr, sport);
-				result = reuseport_select_sock(sk, hash, skb,
-							sizeof(struct udphdr));
-				if (result && !reuseport_has_conns(sk, false))
-					return result;
+				reuseport_result = reuseport_select_sock(sk, hash, skb,
+									 sizeof(struct udphdr));
+				if (reuseport_result && !reuseport_has_conns(sk, false))
+					return reuseport_result;
 			}
-			result = sk;
+
+			result = reuseport_result ? : sk;
 			badness = score;
 		}
 	}