From: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
To: "David S . Miller" <davem@davemloft.net>,
Jakub Kicinski <kuba@kernel.org>,
Eric Dumazet <edumazet@google.com>,
Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>
Cc: Benjamin Herrenschmidt <benh@amazon.com>,
Kuniyuki Iwashima <kuniyu@amazon.co.jp>,
Kuniyuki Iwashima <kuni1840@gmail.com>, <bpf@vger.kernel.org>,
<netdev@vger.kernel.org>, <linux-kernel@vger.kernel.org>
Subject: [RFC PATCH bpf-next 1/8] net: Introduce net.ipv4.tcp_migrate_req.
Date: Tue, 17 Nov 2020 18:40:16 +0900 [thread overview]
Message-ID: <20201117094023.3685-2-kuniyu@amazon.co.jp> (raw)
In-Reply-To: <20201117094023.3685-1-kuniyu@amazon.co.jp>
This commit adds a new sysctl option: net.ipv4.tcp_migrate_req. If this
option is enabled, and then we call listen() for SO_REUSEPORT enabled
sockets and close one, we will be able to migrate its child sockets to
another listener.
Reviewed-by: Benjamin Herrenschmidt <benh@amazon.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
---
Documentation/networking/ip-sysctl.rst | 15 +++++++++++++++
include/net/netns/ipv4.h | 1 +
net/ipv4/sysctl_net_ipv4.c | 9 +++++++++
3 files changed, 25 insertions(+)
diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst
index dd2b12a32b73..4116771bf5ef 100644
--- a/Documentation/networking/ip-sysctl.rst
+++ b/Documentation/networking/ip-sysctl.rst
@@ -712,6 +712,21 @@ tcp_syncookies - INTEGER
network connections you can set this knob to 2 to enable
unconditionally generation of syncookies.
+tcp_migrate_req - INTEGER
+ By default, when a listening socket is closed, child sockets are also
+ closed. If it has SO_REUSEPORT enabled, the dropped connections should
+ have been accepted by other listeners on the same port. This option
+ makes it possible to migrate child sockets to another listener when
+ calling close() or shutdown().
+
+ Default: 0
+
+ Note that the source and destination listeners _must_ have the same
+ settings at the socket API level. If there are different kinds of
+ sockets on the port, disable this option or use
+ BPF_PROG_TYPE_SK_REUSEPORT program to select the correct socket by
+ bpf_sk_select_reuseport() or to cancel migration by returning SK_DROP.
+
tcp_fastopen - INTEGER
Enable TCP Fast Open (RFC7413) to send and accept data in the opening
SYN packet.
diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h
index 8e4fcac4df72..a3edc30d6a63 100644
--- a/include/net/netns/ipv4.h
+++ b/include/net/netns/ipv4.h
@@ -132,6 +132,7 @@ struct netns_ipv4 {
int sysctl_tcp_syn_retries;
int sysctl_tcp_synack_retries;
int sysctl_tcp_syncookies;
+ int sysctl_tcp_migrate_req;
int sysctl_tcp_reordering;
int sysctl_tcp_retries1;
int sysctl_tcp_retries2;
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 3e5f4f2e705e..6b76298fa271 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -933,6 +933,15 @@ static struct ctl_table ipv4_net_table[] = {
.proc_handler = proc_dointvec
},
#endif
+ {
+ .procname = "tcp_migrate_req",
+ .data = &init_net.ipv4.sysctl_tcp_migrate_req,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = SYSCTL_ZERO,
+ .extra2 = SYSCTL_ONE
+ },
{
.procname = "tcp_reordering",
.data = &init_net.ipv4.sysctl_tcp_reordering,
--
2.17.2 (Apple Git-113)
next prev parent reply other threads:[~2020-11-17 9:41 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-11-17 9:40 [RFC PATCH bpf-next 0/8] Socket migration for SO_REUSEPORT Kuniyuki Iwashima
2020-11-17 9:40 ` Kuniyuki Iwashima [this message]
2020-11-17 9:40 ` [RFC PATCH bpf-next 2/8] tcp: Keep TCP_CLOSE sockets in the reuseport group Kuniyuki Iwashima
2020-11-17 9:40 ` [RFC PATCH bpf-next 3/8] tcp: Migrate TCP_ESTABLISHED/TCP_SYN_RECV sockets in accept queues Kuniyuki Iwashima
2020-11-18 23:50 ` Martin KaFai Lau
2020-11-19 22:09 ` Kuniyuki Iwashima
2020-11-20 1:53 ` Martin KaFai Lau
2020-11-21 10:13 ` Kuniyuki Iwashima
2020-11-23 0:40 ` Martin KaFai Lau
2020-11-24 9:24 ` Kuniyuki Iwashima
2020-11-17 9:40 ` [RFC PATCH bpf-next 4/8] tcp: Migrate TFO requests causing RST during TCP_SYN_RECV Kuniyuki Iwashima
2020-11-17 9:40 ` [RFC PATCH bpf-next 5/8] tcp: Migrate TCP_NEW_SYN_RECV requests Kuniyuki Iwashima
2020-11-17 9:40 ` [RFC PATCH bpf-next 6/8] bpf: Add cookie in sk_reuseport_md Kuniyuki Iwashima
2020-11-19 0:11 ` Martin KaFai Lau
2020-11-19 22:10 ` Kuniyuki Iwashima
2020-11-17 9:40 ` [RFC PATCH bpf-next 7/8] bpf: Call bpf_run_sk_reuseport() for socket migration Kuniyuki Iwashima
2020-11-19 1:00 ` Martin KaFai Lau
2020-11-19 22:13 ` Kuniyuki Iwashima
2020-11-17 9:40 ` [RFC PATCH bpf-next 8/8] bpf: Test BPF_PROG_TYPE_SK_REUSEPORT " Kuniyuki Iwashima
2020-11-18 9:18 ` [RFC PATCH bpf-next 0/8] Socket migration for SO_REUSEPORT David Laight
2020-11-19 22:01 ` Kuniyuki Iwashima
2020-11-18 16:25 ` Eric Dumazet
2020-11-19 22:05 ` Kuniyuki Iwashima
2020-11-19 1:49 ` Martin KaFai Lau
2020-11-19 22:17 ` Kuniyuki Iwashima
2020-11-20 2:31 ` Martin KaFai Lau
2020-11-21 10:16 ` Kuniyuki Iwashima
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20201117094023.3685-2-kuniyu@amazon.co.jp \
--to=kuniyu@amazon.co.jp \
--cc=ast@kernel.org \
--cc=benh@amazon.com \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=kuba@kernel.org \
--cc=kuni1840@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=netdev@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox