From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
stable@vger.kernel.org, Willem de Bruijn <willemb@google.com>,
Benjamin Herrenschmidt <benh@amazon.com>,
Kuniyuki Iwashima <kuniyu@amazon.co.jp>,
"David S. Miller" <davem@davemloft.net>
Subject: [PATCH 4.19 15/17] udp: Improve load balancing for SO_REUSEPORT.
Date: Thu, 30 Jul 2020 10:04:41 +0200
Message-ID: <20200730074421.210362558@linuxfoundation.org>
In-Reply-To: <20200730074420.449233408@linuxfoundation.org>
From: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
[ Upstream commit efc6b6f6c3113e8b203b9debfb72d81e0f3dcace ]
Currently, SO_REUSEPORT does not work well if connected sockets are in a
UDP reuseport group.
In that case, reuseport_has_conns() returns true and the result of
reuseport_select_sock() is discarded. Also, all unconnected sockets have
the same score, so the first unconnected socket in udp_hslot always
receives all packets sent to unconnected sockets.
So, the result of reuseport_select_sock() should be used for load
balancing.
The noteworthy point is that the unconnected sockets placed after
connected sockets in sock_reuseport.socks will receive more packets than
others because of the algorithm in reuseport_select_sock().
    index | connected | reciprocal_scale | result
    ---------------------------------------------
    0     |    no     |       20%        |  40%
    1     |    no     |       20%        |  20%
    2     |    yes    |       20%        |   0%
    3     |    no     |       20%        |  40%
    4     |    yes    |       20%        |   0%
If most of the sockets are connected, this can be a problem, but it still
works better than the current behavior.
Fixes: acdcecc61285 ("udp: correct reuseport selection with connected sockets")
CC: Willem de Bruijn <willemb@google.com>
Reviewed-by: Benjamin Herrenschmidt <benh@amazon.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
net/ipv4/udp.c | 15 +++++++++------
net/ipv6/udp.c | 15 +++++++++------
2 files changed, 18 insertions(+), 12 deletions(-)
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -433,7 +433,7 @@ static struct sock *udp4_lib_lookup2(str
 				     struct udp_hslot *hslot2,
 				     struct sk_buff *skb)
 {
-	struct sock *sk, *result;
+	struct sock *sk, *result, *reuseport_result;
 	int score, badness;
 	u32 hash = 0;
@@ -443,17 +443,20 @@ static struct sock *udp4_lib_lookup2(str
 		score = compute_score(sk, net, saddr, sport,
 				      daddr, hnum, dif, sdif, exact_dif);
 		if (score > badness) {
+			reuseport_result = NULL;
+
 			if (sk->sk_reuseport &&
 			    sk->sk_state != TCP_ESTABLISHED) {
 				hash = udp_ehashfn(net, daddr, hnum,
 						   saddr, sport);
-				result = reuseport_select_sock(sk, hash, skb,
-							sizeof(struct udphdr));
-				if (result && !reuseport_has_conns(sk, false))
-					return result;
+				reuseport_result = reuseport_select_sock(sk, hash, skb,
+									 sizeof(struct udphdr));
+				if (reuseport_result && !reuseport_has_conns(sk, false))
+					return reuseport_result;
 			}
+
+			result = reuseport_result ? : sk;
 			badness = score;
-			result = sk;
 		}
 	}
 	return result;
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -167,7 +167,7 @@ static struct sock *udp6_lib_lookup2(str
 		int dif, int sdif, bool exact_dif,
 		struct udp_hslot *hslot2, struct sk_buff *skb)
 {
-	struct sock *sk, *result;
+	struct sock *sk, *result, *reuseport_result;
 	int score, badness;
 	u32 hash = 0;
@@ -177,17 +177,20 @@ static struct sock *udp6_lib_lookup2(str
 		score = compute_score(sk, net, saddr, sport,
 				      daddr, hnum, dif, sdif, exact_dif);
 		if (score > badness) {
+			reuseport_result = NULL;
+
 			if (sk->sk_reuseport &&
 			    sk->sk_state != TCP_ESTABLISHED) {
 				hash = udp6_ehashfn(net, daddr, hnum,
 						    saddr, sport);
-				result = reuseport_select_sock(sk, hash, skb,
-							sizeof(struct udphdr));
-				if (result && !reuseport_has_conns(sk, false))
-					return result;
+				reuseport_result = reuseport_select_sock(sk, hash, skb,
+									 sizeof(struct udphdr));
+				if (reuseport_result && !reuseport_has_conns(sk, false))
+					return reuseport_result;
 			}
-			result = sk;
+
+			result = reuseport_result ? : sk;
 			badness = score;
 		}
 	}