* [PATCH net-next 0/3] sctp: add support for sk_reuseport
@ 2018-10-21 4:43 Xin Long
2018-10-21 4:43 ` [PATCH net-next 1/3] sctp: do reuseport_select_sock in __sctp_rcv_lookup_endpoint Xin Long
` (3 more replies)
0 siblings, 4 replies; 11+ messages in thread
From: Xin Long @ 2018-10-21 4:43 UTC (permalink / raw)
To: network dev, linux-sctp; +Cc: Marcelo Ricardo Leitner, Neil Horman, davem
sctp sk_reuseport allows multiple socks to listen on the same port and
addresses, as long as these socks have the same uid. This works pretty
much as TCP/UDP does; the only difference is that sctp is multi-homed,
so all the bind_addrs in these socks have to match completely,
otherwise listen() will return an error.
Below, 5 sockets are listening on 172.16.254.254:6400 on a server and
26 sockets on a client connect to 172.16.254.254:6400. Each connection
may be handled by a different socket on the server, which is selected
by hash(lport, pport, paddr) in reuseport_select_sock():
# ss --sctp -nn
State Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0 10 172.16.254.254:6400 *:*
`- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.2.1:1234
`- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.2.4:1234
`- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.3.3:1234
`- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.3.4:1234
`- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.5.2:1234
`- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.5.3:1234
LISTEN 0 10 172.16.254.254:6400 *:*
`- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.1.3:1234
`- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.1.4:1234
`- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.3.2:1234
`- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.4.1:1234
`- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.4.2:1234
`- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.4.3:1234
`- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.4.4:1234
LISTEN 0 10 172.16.254.254:6400 *:*
`- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.1.2:1234
`- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.3.5:1234
`- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.4.5:1234
`- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.253.253:1234
LISTEN 0 10 172.16.254.254:6400 *:*
`- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.2.2:1234
`- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.2.3:1234
`- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.5.4:1234
`- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.5.5:1234
LISTEN 0 10 172.16.254.254:6400 *:*
`- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.1.1:1234
`- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.1.5:1234
`- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.2.5:1234
`- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.3.1:1234
`- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.5.1:1234
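The spread above follows from mapping a per-connection hash onto the number
of listeners. A userspace sketch of that idea, under stated assumptions:
toy_hash() is a stand-in mixing function (the kernel uses jhash, this is not
jhash), and pick_listener() uses the multiply-and-shift scaling that, as far
as I can tell, reuseport_select_sock() applies via reciprocal_scale() when no
BPF program is attached. Both names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in mixing function over (lport, pport, paddr). NOT jhash. */
uint32_t toy_hash(uint16_t lport, uint16_t pport, uint32_t paddr)
{
	uint32_t h = paddr ^ (((uint32_t)pport << 16) | lport);

	h ^= h >> 16;
	h *= 0x45d9f3bU;
	h ^= h >> 16;
	return h;
}

/* Map a 32-bit hash onto [0, nsocks) without a division. */
unsigned int pick_listener(uint32_t hash, unsigned int nsocks)
{
	assert(nsocks > 0);
	return (unsigned int)(((uint64_t)hash * nsocks) >> 32);
}
```

The same (lport, pport, paddr) tuple always lands on the same listener, so
retransmitted INITs from one peer are not scattered across sockets.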
Xin Long (3):
sctp: do reuseport_select_sock in __sctp_rcv_lookup_endpoint
sctp: add sock_reuseport for the sock in __sctp_hash_endpoint
sctp: process sk_reuseport in sctp_get_port_local
include/net/sctp/sctp.h | 2 +-
include/net/sctp/structs.h | 6 ++-
net/core/sock_reuseport.c | 1 +
net/sctp/bind_addr.c | 28 ++++++++++
net/sctp/input.c | 129 ++++++++++++++++++++++++++++++++-------------
net/sctp/socket.c | 49 +++++++++++------
6 files changed, 162 insertions(+), 53 deletions(-)
--
2.1.0
* [PATCH net-next 1/3] sctp: do reuseport_select_sock in __sctp_rcv_lookup_endpoint
2018-10-21 4:43 [PATCH net-next 0/3] sctp: add support for sk_reuseport Xin Long
@ 2018-10-21 4:43 ` Xin Long
2018-10-21 4:43 ` [PATCH net-next 2/3] sctp: add sock_reuseport for the sock in __sctp_hash_endpoint Xin Long
2018-10-22 14:17 ` [PATCH net-next 1/3] sctp: do reuseport_select_sock in __sctp_rcv_lookup_endpoint Marcelo Ricardo Leitner
2018-10-21 6:58 ` [PATCH net-next 0/3] sctp: add support for sk_reuseport Xin Long
` (2 subsequent siblings)
3 siblings, 2 replies; 11+ messages in thread
From: Xin Long @ 2018-10-21 4:43 UTC (permalink / raw)
To: network dev, linux-sctp; +Cc: Marcelo Ricardo Leitner, Neil Horman, davem
This is a part of sk_reuseport support for sctp, and it selects a
sock by the hash of lport, paddr and the peer port (pport) by default.
It will not take effect until sk_reuseport support is added in
sctp_get_port_local() later in this series.
Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
net/sctp/input.c | 69 +++++++++++++++++++++++++++++++++-----------------------
1 file changed, 41 insertions(+), 28 deletions(-)
diff --git a/net/sctp/input.c b/net/sctp/input.c
index 5c36a99..60ede89 100644
--- a/net/sctp/input.c
+++ b/net/sctp/input.c
@@ -57,6 +57,7 @@
#include <net/sctp/checksum.h>
#include <net/net_namespace.h>
#include <linux/rhashtable.h>
+#include <net/sock_reuseport.h>
/* Forward declarations for internal helpers. */
static int sctp_rcv_ootb(struct sk_buff *);
@@ -65,8 +66,10 @@ static struct sctp_association *__sctp_rcv_lookup(struct net *net,
const union sctp_addr *paddr,
const union sctp_addr *laddr,
struct sctp_transport **transportp);
-static struct sctp_endpoint *__sctp_rcv_lookup_endpoint(struct net *net,
- const union sctp_addr *laddr);
+static struct sctp_endpoint *__sctp_rcv_lookup_endpoint(
+ struct net *net, struct sk_buff *skb,
+ const union sctp_addr *laddr,
+ const union sctp_addr *daddr);
static struct sctp_association *__sctp_lookup_association(
struct net *net,
const union sctp_addr *local,
@@ -171,7 +174,7 @@ int sctp_rcv(struct sk_buff *skb)
asoc = __sctp_rcv_lookup(net, skb, &src, &dest, &transport);
if (!asoc)
- ep = __sctp_rcv_lookup_endpoint(net, &dest);
+ ep = __sctp_rcv_lookup_endpoint(net, skb, &dest, &src);
/* Retrieve the common input handling substructure. */
rcvr = asoc ? &asoc->base : &ep->base;
@@ -770,16 +773,35 @@ void sctp_unhash_endpoint(struct sctp_endpoint *ep)
local_bh_enable();
}
+static inline __u32 sctp_hashfn(const struct net *net, __be16 lport,
+ const union sctp_addr *paddr, __u32 seed)
+{
+ __u32 addr;
+
+ if (paddr->sa.sa_family == AF_INET6)
+ addr = jhash(&paddr->v6.sin6_addr, 16, seed);
+ else
+ addr = (__force __u32)paddr->v4.sin_addr.s_addr;
+
+ return jhash_3words(addr, ((__force __u32)paddr->v4.sin_port) << 16 |
+ (__force __u32)lport, net_hash_mix(net), seed);
+}
+
/* Look up an endpoint. */
-static struct sctp_endpoint *__sctp_rcv_lookup_endpoint(struct net *net,
- const union sctp_addr *laddr)
+static struct sctp_endpoint *__sctp_rcv_lookup_endpoint(
+ struct net *net, struct sk_buff *skb,
+ const union sctp_addr *laddr,
+ const union sctp_addr *paddr)
{
struct sctp_hashbucket *head;
struct sctp_ep_common *epb;
struct sctp_endpoint *ep;
+ struct sock *sk;
+ __be16 lport;
int hash;
- hash = sctp_ep_hashfn(net, ntohs(laddr->v4.sin_port));
+ lport = laddr->v4.sin_port;
+ hash = sctp_ep_hashfn(net, ntohs(lport));
head = &sctp_ep_hashtable[hash];
read_lock(&head->lock);
sctp_for_each_hentry(epb, &head->chain) {
@@ -791,6 +813,15 @@ static struct sctp_endpoint *__sctp_rcv_lookup_endpoint(struct net *net,
ep = sctp_sk(net->sctp.ctl_sock)->ep;
hit:
+ sk = ep->base.sk;
+ if (sk->sk_reuseport) {
+ __u32 phash = sctp_hashfn(net, lport, paddr, 0);
+
+ sk = reuseport_select_sock(sk, phash, skb,
+ sizeof(struct sctphdr));
+ if (sk)
+ ep = sctp_sk(sk)->ep;
+ }
sctp_endpoint_hold(ep);
read_unlock(&head->lock);
return ep;
@@ -829,35 +860,17 @@ static inline int sctp_hash_cmp(struct rhashtable_compare_arg *arg,
static inline __u32 sctp_hash_obj(const void *data, u32 len, u32 seed)
{
const struct sctp_transport *t = data;
- const union sctp_addr *paddr = &t->ipaddr;
- const struct net *net = sock_net(t->asoc->base.sk);
- __be16 lport = htons(t->asoc->base.bind_addr.port);
- __u32 addr;
-
- if (paddr->sa.sa_family == AF_INET6)
- addr = jhash(&paddr->v6.sin6_addr, 16, seed);
- else
- addr = (__force __u32)paddr->v4.sin_addr.s_addr;
- return jhash_3words(addr, ((__force __u32)paddr->v4.sin_port) << 16 |
- (__force __u32)lport, net_hash_mix(net), seed);
+ return sctp_hashfn(sock_net(t->asoc->base.sk),
+ htons(t->asoc->base.bind_addr.port),
+ &t->ipaddr, seed);
}
static inline __u32 sctp_hash_key(const void *data, u32 len, u32 seed)
{
const struct sctp_hash_cmp_arg *x = data;
- const union sctp_addr *paddr = x->paddr;
- const struct net *net = x->net;
- __be16 lport = x->lport;
- __u32 addr;
-
- if (paddr->sa.sa_family == AF_INET6)
- addr = jhash(&paddr->v6.sin6_addr, 16, seed);
- else
- addr = (__force __u32)paddr->v4.sin_addr.s_addr;
- return jhash_3words(addr, ((__force __u32)paddr->v4.sin_port) << 16 |
- (__force __u32)lport, net_hash_mix(net), seed);
+ return sctp_hashfn(x->net, x->lport, x->paddr, seed);
}
static const struct rhashtable_params sctp_hash_params = {
--
2.1.0
* [PATCH net-next 2/3] sctp: add sock_reuseport for the sock in __sctp_hash_endpoint
2018-10-21 4:43 ` [PATCH net-next 1/3] sctp: do reuseport_select_sock in __sctp_rcv_lookup_endpoint Xin Long
@ 2018-10-21 4:43 ` Xin Long
2018-10-21 4:43 ` [PATCH net-next 3/3] sctp: process sk_reuseport in sctp_get_port_local Xin Long
2018-10-22 14:15 ` [PATCH net-next 2/3] sctp: add sock_reuseport for the sock in __sctp_hash_endpoint Marcelo Ricardo Leitner
2018-10-22 14:17 ` [PATCH net-next 1/3] sctp: do reuseport_select_sock in __sctp_rcv_lookup_endpoint Marcelo Ricardo Leitner
1 sibling, 2 replies; 11+ messages in thread
From: Xin Long @ 2018-10-21 4:43 UTC (permalink / raw)
To: network dev, linux-sctp; +Cc: Marcelo Ricardo Leitner, Neil Horman, davem
This is a part of sk_reuseport support for sctp. It defines a helper
sctp_bind_addrs_check() to check whether the bind_addrs in two socks
match. The sock is added to an existing sock_reuseport if the bind_addrs
match completely, an error is returned if they match only partly, and a
new sock_reuseport is allocated if no other sock matches at all.
It will not take effect until sk_reuseport support is added in
sctp_get_port_local() in the next patch.
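The three outcomes described above can be modelled in userspace as a plain
set comparison. This is only a sketch of the intended semantics, not the
loop from the patch: addresses are reduced to uint32_t values, duplicates
within one set are assumed not to occur, and addrs_check() is a hypothetical
name:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Classify how address set a[] (na entries) relates to b[] (nb entries):
 *   0        both sets hold exactly the same addresses (full match)
 *   -EEXIST  at least one address is shared, but the sets differ
 *   1        no address is shared
 */
int addrs_check(const uint32_t *a, size_t na, const uint32_t *b, size_t nb)
{
	size_t matched = 0;

	assert(na == 0 || a != NULL);
	assert(nb == 0 || b != NULL);

	for (size_t i = 0; i < na; i++) {
		for (size_t j = 0; j < nb; j++) {
			if (a[i] == b[j]) {
				matched++;
				break;
			}
		}
	}

	if (matched == na && na == nb)
		return 0;
	return matched ? -EEXIST : 1;
}
```

A caller would join the existing group on 0, fail listen() on a negative
value, and start a new group on 1.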
Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
include/net/sctp/sctp.h | 2 +-
include/net/sctp/structs.h | 2 ++
net/core/sock_reuseport.c | 1 +
net/sctp/bind_addr.c | 28 ++++++++++++++++++++++
net/sctp/input.c | 60 +++++++++++++++++++++++++++++++++++++++-------
net/sctp/socket.c | 3 +--
6 files changed, 85 insertions(+), 11 deletions(-)
diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
index 8c2caa3..b8cd58d 100644
--- a/include/net/sctp/sctp.h
+++ b/include/net/sctp/sctp.h
@@ -152,7 +152,7 @@ int sctp_primitive_RECONF(struct net *net, struct sctp_association *asoc,
*/
int sctp_rcv(struct sk_buff *skb);
void sctp_v4_err(struct sk_buff *skb, u32 info);
-void sctp_hash_endpoint(struct sctp_endpoint *);
+int sctp_hash_endpoint(struct sctp_endpoint *ep);
void sctp_unhash_endpoint(struct sctp_endpoint *);
struct sock *sctp_err_lookup(struct net *net, int family, struct sk_buff *,
struct sctphdr *, struct sctp_association **,
diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index a11f937..15d017f 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -1190,6 +1190,8 @@ int sctp_bind_addr_conflict(struct sctp_bind_addr *, const union sctp_addr *,
struct sctp_sock *, struct sctp_sock *);
int sctp_bind_addr_state(const struct sctp_bind_addr *bp,
const union sctp_addr *addr);
+int sctp_bind_addrs_check(struct sctp_sock *sp,
+ struct sctp_sock *sp2, int cnt2);
union sctp_addr *sctp_find_unmatch_addr(struct sctp_bind_addr *bp,
const union sctp_addr *addrs,
int addrcnt,
diff --git a/net/core/sock_reuseport.c b/net/core/sock_reuseport.c
index ba5cba5..d8fe3e5 100644
--- a/net/core/sock_reuseport.c
+++ b/net/core/sock_reuseport.c
@@ -187,6 +187,7 @@ int reuseport_add_sock(struct sock *sk, struct sock *sk2, bool bind_inany)
call_rcu(&old_reuse->rcu, reuseport_free_rcu);
return 0;
}
+EXPORT_SYMBOL(reuseport_add_sock);
void reuseport_detach_sock(struct sock *sk)
{
diff --git a/net/sctp/bind_addr.c b/net/sctp/bind_addr.c
index 7df3704..78d0d93 100644
--- a/net/sctp/bind_addr.c
+++ b/net/sctp/bind_addr.c
@@ -337,6 +337,34 @@ int sctp_bind_addr_match(struct sctp_bind_addr *bp,
return match;
}
+int sctp_bind_addrs_check(struct sctp_sock *sp,
+ struct sctp_sock *sp2, int cnt2)
+{
+ struct sctp_bind_addr *bp2 = &sp2->ep->base.bind_addr;
+ struct sctp_bind_addr *bp = &sp->ep->base.bind_addr;
+ struct sctp_sockaddr_entry *laddr, *laddr2;
+ bool exist = false;
+ int cnt = 0;
+
+ rcu_read_lock();
+ list_for_each_entry_rcu(laddr, &bp->address_list, list) {
+ list_for_each_entry_rcu(laddr2, &bp2->address_list, list) {
+ if (sp->pf->af->cmp_addr(&laddr->a, &laddr2->a) &&
+ laddr->valid == laddr2->valid) {
+ exist = true;
+ goto next;
+ }
+ }
+ cnt = 0;
+ break;
+next:
+ cnt++;
+ }
+ rcu_read_unlock();
+
+ return (cnt == cnt2) ? 0 : (exist ? -EEXIST : 1);
+}
+
/* Does the address 'addr' conflict with any addresses in
* the bp.
*/
diff --git a/net/sctp/input.c b/net/sctp/input.c
index 60ede89..6bfeb10 100644
--- a/net/sctp/input.c
+++ b/net/sctp/input.c
@@ -723,43 +723,87 @@ static int sctp_rcv_ootb(struct sk_buff *skb)
}
/* Insert endpoint into the hash table. */
-static void __sctp_hash_endpoint(struct sctp_endpoint *ep)
+static int __sctp_hash_endpoint(struct sctp_endpoint *ep)
{
- struct net *net = sock_net(ep->base.sk);
- struct sctp_ep_common *epb;
+ struct sock *sk = ep->base.sk;
+ struct net *net = sock_net(sk);
struct sctp_hashbucket *head;
+ struct sctp_ep_common *epb;
epb = &ep->base;
-
epb->hashent = sctp_ep_hashfn(net, epb->bind_addr.port);
head = &sctp_ep_hashtable[epb->hashent];
+ if (sk->sk_reuseport) {
+ bool any = sctp_is_ep_boundall(sk);
+ struct sctp_ep_common *epb2;
+ struct list_head *list;
+ int cnt = 0, err = 1;
+
+ list_for_each(list, &ep->base.bind_addr.address_list)
+ cnt++;
+
+ sctp_for_each_hentry(epb2, &head->chain) {
+ struct sock *sk2 = epb2->sk;
+
+ if (!net_eq(sock_net(sk2), net) || sk2 == sk ||
+ !uid_eq(sock_i_uid(sk2), sock_i_uid(sk)) ||
+ !sk2->sk_reuseport)
+ continue;
+
+ err = sctp_bind_addrs_check(sctp_sk(sk2),
+ sctp_sk(sk), cnt);
+ if (!err) {
+ err = reuseport_add_sock(sk, sk2, any);
+ if (err)
+ return err;
+ break;
+ } else if (err < 0) {
+ return err;
+ }
+ }
+
+ if (err) {
+ err = reuseport_alloc(sk, any);
+ if (err)
+ return err;
+ }
+ }
+
write_lock(&head->lock);
hlist_add_head(&epb->node, &head->chain);
write_unlock(&head->lock);
+ return 0;
}
/* Add an endpoint to the hash. Local BH-safe. */
-void sctp_hash_endpoint(struct sctp_endpoint *ep)
+int sctp_hash_endpoint(struct sctp_endpoint *ep)
{
+ int err;
+
local_bh_disable();
- __sctp_hash_endpoint(ep);
+ err = __sctp_hash_endpoint(ep);
local_bh_enable();
+
+ return err;
}
/* Remove endpoint from the hash table. */
static void __sctp_unhash_endpoint(struct sctp_endpoint *ep)
{
- struct net *net = sock_net(ep->base.sk);
+ struct sock *sk = ep->base.sk;
struct sctp_hashbucket *head;
struct sctp_ep_common *epb;
epb = &ep->base;
- epb->hashent = sctp_ep_hashfn(net, epb->bind_addr.port);
+ epb->hashent = sctp_ep_hashfn(sock_net(sk), epb->bind_addr.port);
head = &sctp_ep_hashtable[epb->hashent];
+ if (rcu_access_pointer(sk->sk_reuseport_cb))
+ reuseport_detach_sock(sk);
+
write_lock(&head->lock);
hlist_del_init(&epb->node);
write_unlock(&head->lock);
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index fc0386e..44e7d8c 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -7850,8 +7850,7 @@ static int sctp_listen_start(struct sock *sk, int backlog)
}
sk->sk_max_ack_backlog = backlog;
- sctp_hash_endpoint(ep);
- return 0;
+ return sctp_hash_endpoint(ep);
}
/*
--
2.1.0
* [PATCH net-next 3/3] sctp: process sk_reuseport in sctp_get_port_local
2018-10-21 4:43 ` [PATCH net-next 2/3] sctp: add sock_reuseport for the sock in __sctp_hash_endpoint Xin Long
@ 2018-10-21 4:43 ` Xin Long
2018-10-22 14:15 ` [PATCH net-next 2/3] sctp: add sock_reuseport for the sock in __sctp_hash_endpoint Marcelo Ricardo Leitner
1 sibling, 0 replies; 11+ messages in thread
From: Xin Long @ 2018-10-21 4:43 UTC (permalink / raw)
To: network dev, linux-sctp; +Cc: Marcelo Ricardo Leitner, Neil Horman, davem
When sk_reuseport is set on socks, they are allowed to bind to the same
port and address, provided they have the same uid.
Note that, unlike sk_reuse, this also allows multiple socks to listen
on the same port and address.
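The per-bucket caching this relies on can be sketched in userspace as below.
It is a toy model under stated assumptions: struct bucket and struct binder
are simplified stand-ins for sctp_bind_bucket and the binding sock, uids are
plain integers rather than kuid_t, and fast_ok() and bucket_update() are
hypothetical names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct bucket {			/* toy stand-in for sctp_bind_bucket */
	bool in_use;		/* at least one sock already bound */
	bool fastreuseport;	/* every owner so far had reuseport + same uid */
	unsigned int fastuid;
};

struct binder {			/* toy stand-in for the binding sock */
	bool reuseport;
	unsigned int uid;
};

/* True when the bind may succeed without walking the owner list. */
bool fast_ok(const struct bucket *pp, const struct binder *sk)
{
	assert(pp != NULL && sk != NULL);
	return pp->in_use && pp->fastreuseport && sk->reuseport &&
	       pp->fastuid == sk->uid;
}

/* Update the cached state after a bind that took the slow path. */
void bucket_update(struct bucket *pp, const struct binder *sk)
{
	assert(pp != NULL && sk != NULL);
	if (!pp->in_use) {
		pp->in_use = true;
		pp->fastreuseport = sk->reuseport;
		pp->fastuid = sk->uid;
	} else if (pp->fastreuseport &&
		   (!sk->reuseport || pp->fastuid != sk->uid)) {
		pp->fastreuseport = false;
	}
}
```

Once a sock without reuseport, or with a different uid, joins the bucket,
the shortcut is cleared and later binds have to go through the full
conflict check again.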
Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
include/net/sctp/structs.h | 4 +++-
net/sctp/socket.c | 46 +++++++++++++++++++++++++++++++++-------------
2 files changed, 36 insertions(+), 14 deletions(-)
diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index 15d017f..af9d494 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -96,7 +96,9 @@ struct sctp_stream;
struct sctp_bind_bucket {
unsigned short port;
- unsigned short fastreuse;
+ signed char fastreuse;
+ signed char fastreuseport;
+ kuid_t fastuid;
struct hlist_node node;
struct hlist_head owner;
struct net *net;
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 44e7d8c..8605705 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -7642,8 +7642,10 @@ static struct sctp_bind_bucket *sctp_bucket_create(
static long sctp_get_port_local(struct sock *sk, union sctp_addr *addr)
{
- bool reuse = (sk->sk_reuse || sctp_sk(sk)->reuse);
+ struct sctp_sock *sp = sctp_sk(sk);
+ bool reuse = (sk->sk_reuse || sp->reuse);
struct sctp_bind_hashbucket *head; /* hash list */
+ kuid_t uid = sock_i_uid(sk);
struct sctp_bind_bucket *pp;
unsigned short snum;
int ret;
@@ -7719,7 +7721,10 @@ static long sctp_get_port_local(struct sock *sk, union sctp_addr *addr)
pr_debug("%s: found a possible match\n", __func__);
- if (pp->fastreuse && reuse && sk->sk_state != SCTP_SS_LISTENING)
+ if ((pp->fastreuse && reuse &&
+ sk->sk_state != SCTP_SS_LISTENING) ||
+ (pp->fastreuseport && sk->sk_reuseport &&
+ uid_eq(pp->fastuid, uid)))
goto success;
/* Run through the list of sockets bound to the port
@@ -7733,16 +7738,18 @@ static long sctp_get_port_local(struct sock *sk, union sctp_addr *addr)
* in an endpoint.
*/
sk_for_each_bound(sk2, &pp->owner) {
- struct sctp_endpoint *ep2;
- ep2 = sctp_sk(sk2)->ep;
+ struct sctp_sock *sp2 = sctp_sk(sk2);
+ struct sctp_endpoint *ep2 = sp2->ep;
if (sk == sk2 ||
- (reuse && (sk2->sk_reuse || sctp_sk(sk2)->reuse) &&
- sk2->sk_state != SCTP_SS_LISTENING))
+ (reuse && (sk2->sk_reuse || sp2->reuse) &&
+ sk2->sk_state != SCTP_SS_LISTENING) ||
+ (sk->sk_reuseport && sk2->sk_reuseport &&
+ uid_eq(uid, sock_i_uid(sk2))))
continue;
- if (sctp_bind_addr_conflict(&ep2->base.bind_addr, addr,
- sctp_sk(sk2), sctp_sk(sk))) {
+ if (sctp_bind_addr_conflict(&ep2->base.bind_addr,
+ addr, sp2, sp)) {
ret = (long)sk2;
goto fail_unlock;
}
@@ -7765,19 +7772,32 @@ static long sctp_get_port_local(struct sock *sk, union sctp_addr *addr)
pp->fastreuse = 1;
else
pp->fastreuse = 0;
- } else if (pp->fastreuse &&
- (!reuse || sk->sk_state == SCTP_SS_LISTENING))
- pp->fastreuse = 0;
+
+ if (sk->sk_reuseport) {
+ pp->fastreuseport = 1;
+ pp->fastuid = uid;
+ } else {
+ pp->fastreuseport = 0;
+ }
+ } else {
+ if (pp->fastreuse &&
+ (!reuse || sk->sk_state == SCTP_SS_LISTENING))
+ pp->fastreuse = 0;
+
+ if (pp->fastreuseport &&
+ (!sk->sk_reuseport || !uid_eq(pp->fastuid, uid)))
+ pp->fastreuseport = 0;
+ }
/* We are set, so fill up all the data in the hash table
* entry, tie the socket list information with the rest of the
* sockets FIXME: Blurry, NPI (ipg).
*/
success:
- if (!sctp_sk(sk)->bind_hash) {
+ if (!sp->bind_hash) {
inet_sk(sk)->inet_num = snum;
sk_add_bind_node(sk, &pp->owner);
- sctp_sk(sk)->bind_hash = pp;
+ sp->bind_hash = pp;
}
ret = 0;
--
2.1.0
* Re: [PATCH net-next 0/3] sctp: add support for sk_reuseport
2018-10-21 4:43 [PATCH net-next 0/3] sctp: add support for sk_reuseport Xin Long
2018-10-21 4:43 ` [PATCH net-next 1/3] sctp: do reuseport_select_sock in __sctp_rcv_lookup_endpoint Xin Long
@ 2018-10-21 6:58 ` Xin Long
2018-10-22 11:40 ` Neil Horman
2018-10-22 14:20 ` Marcelo Ricardo Leitner
3 siblings, 0 replies; 11+ messages in thread
From: Xin Long @ 2018-10-21 6:58 UTC (permalink / raw)
To: network dev, linux-sctp; +Cc: Marcelo Ricardo Leitner, Neil Horman, davem
[-- Attachment #1: Type: text/plain, Size: 3812 bytes --]
On Sun, Oct 21, 2018 at 1:43 PM Xin Long <lucien.xin@gmail.com> wrote:
>
> sctp sk_reuseport allows multiple socks to listen on the same port and
> addresses, as long as these socks have the same uid. This works pretty
> much as TCP/UDP does, the only difference is that sctp is multi-homing
> and all the bind_addrs in these socks will have to completely matched,
> otherwise listen() will return err.
>
> The below is when 5 sockets are listening on 172.16.254.254:6400 on a
> server, 26 sockets on a client connect to 172.16.254.254:6400 and each
> may be processed by a different socket on the server which is selected
> by hash(lport, pport, paddr) in reuseport_select_sock():
>
> # ss --sctp -nn
> State Recv-Q Send-Q Local Address:Port Peer Address:Port
> LISTEN 0 10 172.16.254.254:6400 *:*
> `- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.2.1:1234
> `- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.2.4:1234
> `- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.3.3:1234
> `- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.3.4:1234
> `- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.5.2:1234
> `- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.5.3:1234
> LISTEN 0 10 172.16.254.254:6400 *:*
> `- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.1.3:1234
> `- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.1.4:1234
> `- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.3.2:1234
> `- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.4.1:1234
> `- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.4.2:1234
> `- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.4.3:1234
> `- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.4.4:1234
> LISTEN 0 10 172.16.254.254:6400 *:*
> `- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.1.2:1234
> `- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.3.5:1234
> `- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.4.5:1234
> `- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.253.253:1234
> LISTEN 0 10 172.16.254.254:6400 *:*
> `- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.2.2:1234
> `- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.2.3:1234
> `- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.5.4:1234
> `- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.5.5:1234
> LISTEN 0 10 172.16.254.254:6400 *:*
> `- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.1.1:1234
> `- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.1.5:1234
> `- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.2.5:1234
> `- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.3.1:1234
> `- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.5.1:1234
Attached is the testcase based on sctp-tests.git.
>
> Xin Long (3):
> sctp: do reuseport_select_sock in __sctp_rcv_lookup_endpoint
> sctp: add sock_reuseport for the sock in __sctp_hash_endpoint
> sctp: process sk_reuseport in sctp_get_port_local
>
> include/net/sctp/sctp.h | 2 +-
> include/net/sctp/structs.h | 6 ++-
> net/core/sock_reuseport.c | 1 +
> net/sctp/bind_addr.c | 28 ++++++++++
> net/sctp/input.c | 129 ++++++++++++++++++++++++++++++++-------------
> net/sctp/socket.c | 49 +++++++++++------
> 6 files changed, 162 insertions(+), 53 deletions(-)
>
> --
> 2.1.0
>
[-- Attachment #2: reuseport.tar.gz --]
[-- Type: application/x-gzip, Size: 2501 bytes --]
* Re: [PATCH net-next 0/3] sctp: add support for sk_reuseport
2018-10-21 4:43 [PATCH net-next 0/3] sctp: add support for sk_reuseport Xin Long
2018-10-21 4:43 ` [PATCH net-next 1/3] sctp: do reuseport_select_sock in __sctp_rcv_lookup_endpoint Xin Long
2018-10-21 6:58 ` [PATCH net-next 0/3] sctp: add support for sk_reuseport Xin Long
@ 2018-10-22 11:40 ` Neil Horman
2018-10-22 14:20 ` Marcelo Ricardo Leitner
3 siblings, 0 replies; 11+ messages in thread
From: Neil Horman @ 2018-10-22 11:40 UTC (permalink / raw)
To: Xin Long; +Cc: network dev, linux-sctp, Marcelo Ricardo Leitner, davem
On Sun, Oct 21, 2018 at 12:43:35PM +0800, Xin Long wrote:
> sctp sk_reuseport allows multiple socks to listen on the same port and
> addresses, as long as these socks have the same uid. This works pretty
> much as TCP/UDP does, the only difference is that sctp is multi-homing
> and all the bind_addrs in these socks will have to completely matched,
> otherwise listen() will return err.
>
> The below is when 5 sockets are listening on 172.16.254.254:6400 on a
> server, 26 sockets on a client connect to 172.16.254.254:6400 and each
> may be processed by a different socket on the server which is selected
> by hash(lport, pport, paddr) in reuseport_select_sock():
>
> # ss --sctp -nn
> State Recv-Q Send-Q Local Address:Port Peer Address:Port
> LISTEN 0 10 172.16.254.254:6400 *:*
> `- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.2.1:1234
> `- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.2.4:1234
> `- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.3.3:1234
> `- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.3.4:1234
> `- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.5.2:1234
> `- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.5.3:1234
> LISTEN 0 10 172.16.254.254:6400 *:*
> `- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.1.3:1234
> `- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.1.4:1234
> `- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.3.2:1234
> `- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.4.1:1234
> `- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.4.2:1234
> `- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.4.3:1234
> `- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.4.4:1234
> LISTEN 0 10 172.16.254.254:6400 *:*
> `- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.1.2:1234
> `- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.3.5:1234
> `- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.4.5:1234
> `- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.253.253:1234
> LISTEN 0 10 172.16.254.254:6400 *:*
> `- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.2.2:1234
> `- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.2.3:1234
> `- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.5.4:1234
> `- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.5.5:1234
> LISTEN 0 10 172.16.254.254:6400 *:*
> `- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.1.1:1234
> `- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.1.5:1234
> `- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.2.5:1234
> `- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.3.1:1234
> `- ESTAB 0 0 172.16.254.254%eth1:6400 172.16.5.1:1234
>
> Xin Long (3):
> sctp: do reuseport_select_sock in __sctp_rcv_lookup_endpoint
> sctp: add sock_reuseport for the sock in __sctp_hash_endpoint
> sctp: process sk_reuseport in sctp_get_port_local
>
> include/net/sctp/sctp.h | 2 +-
> include/net/sctp/structs.h | 6 ++-
> net/core/sock_reuseport.c | 1 +
> net/sctp/bind_addr.c | 28 ++++++++++
> net/sctp/input.c | 129 ++++++++++++++++++++++++++++++++-------------
> net/sctp/socket.c | 49 +++++++++++------
> 6 files changed, 162 insertions(+), 53 deletions(-)
>
> --
> 2.1.0
>
>
Series
Acked-by: Neil Horman <nhorman@tuxdriver.com>
* Re: [PATCH net-next 2/3] sctp: add sock_reuseport for the sock in __sctp_hash_endpoint
2018-10-21 4:43 ` [PATCH net-next 2/3] sctp: add sock_reuseport for the sock in __sctp_hash_endpoint Xin Long
2018-10-21 4:43 ` [PATCH net-next 3/3] sctp: process sk_reuseport in sctp_get_port_local Xin Long
@ 2018-10-22 14:15 ` Marcelo Ricardo Leitner
2018-11-12 9:58 ` Xin Long
1 sibling, 1 reply; 11+ messages in thread
From: Marcelo Ricardo Leitner @ 2018-10-22 14:15 UTC (permalink / raw)
To: Xin Long; +Cc: network dev, linux-sctp, Neil Horman, davem
On Sun, Oct 21, 2018 at 12:43:37PM +0800, Xin Long wrote:
> This is a part of sk_reuseport support for sctp. It defines a helper
> sctp_bind_addrs_check() to check if the bind_addrs in two socks are
> matched. It will add sock_reuseport if they are completely matched,
> and return err if they are partly matched, and alloc sock_reuseport
> if all socks are not matched at all.
>
> It will work until sk_reuseport support is added in
> sctp_get_port_local() in the next patch.
>
> Signed-off-by: Xin Long <lucien.xin@gmail.com>
> ---
> include/net/sctp/sctp.h | 2 +-
> include/net/sctp/structs.h | 2 ++
> net/core/sock_reuseport.c | 1 +
> net/sctp/bind_addr.c | 28 ++++++++++++++++++++++
> net/sctp/input.c | 60 +++++++++++++++++++++++++++++++++++++++-------
> net/sctp/socket.c | 3 +--
> 6 files changed, 85 insertions(+), 11 deletions(-)
>
> diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
> index 8c2caa3..b8cd58d 100644
> --- a/include/net/sctp/sctp.h
> +++ b/include/net/sctp/sctp.h
> @@ -152,7 +152,7 @@ int sctp_primitive_RECONF(struct net *net, struct sctp_association *asoc,
> */
> int sctp_rcv(struct sk_buff *skb);
> void sctp_v4_err(struct sk_buff *skb, u32 info);
> -void sctp_hash_endpoint(struct sctp_endpoint *);
> +int sctp_hash_endpoint(struct sctp_endpoint *ep);
> void sctp_unhash_endpoint(struct sctp_endpoint *);
> struct sock *sctp_err_lookup(struct net *net, int family, struct sk_buff *,
> struct sctphdr *, struct sctp_association **,
> diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
> index a11f937..15d017f 100644
> --- a/include/net/sctp/structs.h
> +++ b/include/net/sctp/structs.h
> @@ -1190,6 +1190,8 @@ int sctp_bind_addr_conflict(struct sctp_bind_addr *, const union sctp_addr *,
> struct sctp_sock *, struct sctp_sock *);
> int sctp_bind_addr_state(const struct sctp_bind_addr *bp,
> const union sctp_addr *addr);
> +int sctp_bind_addrs_check(struct sctp_sock *sp,
> + struct sctp_sock *sp2, int cnt2);
> union sctp_addr *sctp_find_unmatch_addr(struct sctp_bind_addr *bp,
> const union sctp_addr *addrs,
> int addrcnt,
> diff --git a/net/core/sock_reuseport.c b/net/core/sock_reuseport.c
> index ba5cba5..d8fe3e5 100644
> --- a/net/core/sock_reuseport.c
> +++ b/net/core/sock_reuseport.c
> @@ -187,6 +187,7 @@ int reuseport_add_sock(struct sock *sk, struct sock *sk2, bool bind_inany)
> call_rcu(&old_reuse->rcu, reuseport_free_rcu);
> return 0;
> }
> +EXPORT_SYMBOL(reuseport_add_sock);
>
> void reuseport_detach_sock(struct sock *sk)
> {
> diff --git a/net/sctp/bind_addr.c b/net/sctp/bind_addr.c
> index 7df3704..78d0d93 100644
> --- a/net/sctp/bind_addr.c
> +++ b/net/sctp/bind_addr.c
> @@ -337,6 +337,34 @@ int sctp_bind_addr_match(struct sctp_bind_addr *bp,
> return match;
> }
>
> +int sctp_bind_addrs_check(struct sctp_sock *sp,
> + struct sctp_sock *sp2, int cnt2)
> +{
> + struct sctp_bind_addr *bp2 = &sp2->ep->base.bind_addr;
> + struct sctp_bind_addr *bp = &sp->ep->base.bind_addr;
> + struct sctp_sockaddr_entry *laddr, *laddr2;
> + bool exist = false;
> + int cnt = 0;
> +
> + rcu_read_lock();
> + list_for_each_entry_rcu(laddr, &bp->address_list, list) {
> + list_for_each_entry_rcu(laddr2, &bp2->address_list, list) {
> + if (sp->pf->af->cmp_addr(&laddr->a, &laddr2->a) &&
> + laddr->valid == laddr2->valid) {
I think in the normal run laddr2->valid will always be true here,
but as written it gives the impression that it accepts 0 == 0 too, which
would be bad. Maybe a fast BINDX_REM/BINDX_ADD could trigger
laddr2->valid = 0 in there, not sure.
Anyway, maybe '... laddr->valid && laddr2->valid' instead, or do you
really want to allow the 0 == 0 case?
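To make the difference concrete, a small userspace sketch. struct entry and
both helper names are made up for illustration and only mimic the addr and
valid fields used in the hunk:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct entry {			/* toy stand-in for sctp_sockaddr_entry */
	uint32_t addr;
	int valid;
};

/* As posted: equal flags pass, including the 0 == 0 case. */
bool match_as_posted(const struct entry *a, const struct entry *b)
{
	assert(a != NULL && b != NULL);
	return a->addr == b->addr && a->valid == b->valid;
}

/* Suggested: both entries must currently be valid. */
bool match_both_valid(const struct entry *a, const struct entry *b)
{
	assert(a != NULL && b != NULL);
	return a->addr == b->addr && a->valid && b->valid;
}
```

The two only disagree when both entries are invalid, which is exactly the
case in question.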
> + exist = true;
> + goto next;
> + }
> + }
> + cnt = 0;
> + break;
> +next:
> + cnt++;
> + }
> + rcu_read_unlock();
> +
> + return (cnt == cnt2) ? 0 : (exist ? -EEXIST : 1);
> +}
> +
> /* Does the address 'addr' conflict with any addresses in
> * the bp.
> */
> diff --git a/net/sctp/input.c b/net/sctp/input.c
> index 60ede89..6bfeb10 100644
> --- a/net/sctp/input.c
> +++ b/net/sctp/input.c
> @@ -723,43 +723,87 @@ static int sctp_rcv_ootb(struct sk_buff *skb)
> }
>
> /* Insert endpoint into the hash table. */
> -static void __sctp_hash_endpoint(struct sctp_endpoint *ep)
> +static int __sctp_hash_endpoint(struct sctp_endpoint *ep)
> {
> - struct net *net = sock_net(ep->base.sk);
> - struct sctp_ep_common *epb;
> + struct sock *sk = ep->base.sk;
> + struct net *net = sock_net(sk);
> struct sctp_hashbucket *head;
> + struct sctp_ep_common *epb;
>
> epb = &ep->base;
> -
> epb->hashent = sctp_ep_hashfn(net, epb->bind_addr.port);
> head = &sctp_ep_hashtable[epb->hashent];
>
> + if (sk->sk_reuseport) {
> + bool any = sctp_is_ep_boundall(sk);
> + struct sctp_ep_common *epb2;
> + struct list_head *list;
> + int cnt = 0, err = 1;
> +
> + list_for_each(list, &ep->base.bind_addr.address_list)
> + cnt++;
> +
> + sctp_for_each_hentry(epb2, &head->chain) {
> + struct sock *sk2 = epb2->sk;
> +
> + if (!net_eq(sock_net(sk2), net) || sk2 == sk ||
> + !uid_eq(sock_i_uid(sk2), sock_i_uid(sk)) ||
> + !sk2->sk_reuseport)
> + continue;
> +
> + err = sctp_bind_addrs_check(sctp_sk(sk2),
> + sctp_sk(sk), cnt);
> + if (!err) {
> + err = reuseport_add_sock(sk, sk2, any);
> + if (err)
> + return err;
> + break;
> + } else if (err < 0) {
> + return err;
> + }
> + }
> +
> + if (err) {
> + err = reuseport_alloc(sk, any);
> + if (err)
> + return err;
> + }
> + }
> +
> write_lock(&head->lock);
> hlist_add_head(&epb->node, &head->chain);
> write_unlock(&head->lock);
> + return 0;
> }
>
> /* Add an endpoint to the hash. Local BH-safe. */
> -void sctp_hash_endpoint(struct sctp_endpoint *ep)
> +int sctp_hash_endpoint(struct sctp_endpoint *ep)
> {
> + int err;
> +
> local_bh_disable();
> - __sctp_hash_endpoint(ep);
> + err = __sctp_hash_endpoint(ep);
> local_bh_enable();
> +
> + return err;
> }
>
> /* Remove endpoint from the hash table. */
> static void __sctp_unhash_endpoint(struct sctp_endpoint *ep)
> {
> - struct net *net = sock_net(ep->base.sk);
> + struct sock *sk = ep->base.sk;
> struct sctp_hashbucket *head;
> struct sctp_ep_common *epb;
>
> epb = &ep->base;
>
> - epb->hashent = sctp_ep_hashfn(net, epb->bind_addr.port);
> + epb->hashent = sctp_ep_hashfn(sock_net(sk), epb->bind_addr.port);
>
> head = &sctp_ep_hashtable[epb->hashent];
>
> + if (rcu_access_pointer(sk->sk_reuseport_cb))
> + reuseport_detach_sock(sk);
> +
> write_lock(&head->lock);
> hlist_del_init(&epb->node);
> write_unlock(&head->lock);
> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> index fc0386e..44e7d8c 100644
> --- a/net/sctp/socket.c
> +++ b/net/sctp/socket.c
> @@ -7850,8 +7850,7 @@ static int sctp_listen_start(struct sock *sk, int backlog)
> }
>
> sk->sk_max_ack_backlog = backlog;
> - sctp_hash_endpoint(ep);
> - return 0;
> + return sctp_hash_endpoint(ep);
> }
>
> /*
> --
> 2.1.0
>
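The review comment above on 'laddr->valid == laddr2->valid' can be shown with a small userspace sketch. It is only a model: struct entry_model and both helper names are hypothetical, and the address is reduced to an int rather than a union sctp_addr. It compares the predicate as posted with the one suggested in the review.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the fields of struct sctp_sockaddr_entry
 * that the comparison looks at; not the kernel structure. */
struct entry_model {
	int addr;	/* simplified address */
	bool valid;	/* models the entry's 'valid' flag */
};

/* Predicate as posted: the two flags only have to be equal. */
static bool match_as_posted(const struct entry_model *a,
			    const struct entry_model *b)
{
	return a->addr == b->addr && a->valid == b->valid;
}

/* Predicate suggested in the review: both entries must be valid. */
static bool match_suggested(const struct entry_model *a,
			    const struct entry_model *b)
{
	return a->addr == b->addr && a->valid && b->valid;
}

/* Usage: two invalid entries with the same address. */
void demo_valid_check(void)
{
	struct entry_model x = { 1, false }, y = { 1, false };

	assert(match_as_posted(&x, &y));	/* the 0 == 0 case matches */
	assert(!match_suggested(&x, &y));	/* rejected by the && form */
}
```

The two forms differ only when both flags are clear, which is the case the review asks about.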
* Re: [PATCH net-next 1/3] sctp: do reuseport_select_sock in __sctp_rcv_lookup_endpoint
2018-10-21 4:43 ` [PATCH net-next 1/3] sctp: do reuseport_select_sock in __sctp_rcv_lookup_endpoint Xin Long
2018-10-21 4:43 ` [PATCH net-next 2/3] sctp: add sock_reuseport for the sock in __sctp_hash_endpoint Xin Long
@ 2018-10-22 14:17 ` Marcelo Ricardo Leitner
2018-11-12 9:56 ` Xin Long
1 sibling, 1 reply; 11+ messages in thread
From: Marcelo Ricardo Leitner @ 2018-10-22 14:17 UTC (permalink / raw)
To: Xin Long; +Cc: network dev, linux-sctp, Neil Horman, davem
On Sun, Oct 21, 2018 at 12:43:36PM +0800, Xin Long wrote:
> This is a part of sk_reuseport support for sctp, and it selects a
> sock by the hashkey of lport, paddr and dport by default. It will
> not work until sk_reuseport support is added in sctp_get_port_local()
> in the next patch.
>
> Signed-off-by: Xin Long <lucien.xin@gmail.com>
> ---
> net/sctp/input.c | 69 +++++++++++++++++++++++++++++++++-----------------------
> 1 file changed, 41 insertions(+), 28 deletions(-)
>
> diff --git a/net/sctp/input.c b/net/sctp/input.c
> index 5c36a99..60ede89 100644
> --- a/net/sctp/input.c
> +++ b/net/sctp/input.c
> @@ -57,6 +57,7 @@
> #include <net/sctp/checksum.h>
> #include <net/net_namespace.h>
> #include <linux/rhashtable.h>
> +#include <net/sock_reuseport.h>
>
> /* Forward declarations for internal helpers. */
> static int sctp_rcv_ootb(struct sk_buff *);
> @@ -65,8 +66,10 @@ static struct sctp_association *__sctp_rcv_lookup(struct net *net,
> const union sctp_addr *paddr,
> const union sctp_addr *laddr,
> struct sctp_transport **transportp);
> -static struct sctp_endpoint *__sctp_rcv_lookup_endpoint(struct net *net,
> - const union sctp_addr *laddr);
> +static struct sctp_endpoint *__sctp_rcv_lookup_endpoint(
> + struct net *net, struct sk_buff *skb,
> + const union sctp_addr *laddr,
> + const union sctp_addr *daddr);
> static struct sctp_association *__sctp_lookup_association(
> struct net *net,
> const union sctp_addr *local,
> @@ -171,7 +174,7 @@ int sctp_rcv(struct sk_buff *skb)
> asoc = __sctp_rcv_lookup(net, skb, &src, &dest, &transport);
>
> if (!asoc)
> - ep = __sctp_rcv_lookup_endpoint(net, &dest);
> + ep = __sctp_rcv_lookup_endpoint(net, skb, &dest, &src);
>
> /* Retrieve the common input handling substructure. */
> rcvr = asoc ? &asoc->base : &ep->base;
> @@ -770,16 +773,35 @@ void sctp_unhash_endpoint(struct sctp_endpoint *ep)
> local_bh_enable();
> }
>
> +static inline __u32 sctp_hashfn(const struct net *net, __be16 lport,
> + const union sctp_addr *paddr, __u32 seed)
> +{
> + __u32 addr;
> +
> + if (paddr->sa.sa_family == AF_INET6)
> + addr = jhash(&paddr->v6.sin6_addr, 16, seed);
> + else
> + addr = (__force __u32)paddr->v4.sin_addr.s_addr;
> +
> + return jhash_3words(addr, ((__force __u32)paddr->v4.sin_port) << 16 |
> + (__force __u32)lport, net_hash_mix(net), seed);
> +}
> +
> /* Look up an endpoint. */
> -static struct sctp_endpoint *__sctp_rcv_lookup_endpoint(struct net *net,
> - const union sctp_addr *laddr)
> +static struct sctp_endpoint *__sctp_rcv_lookup_endpoint(
> + struct net *net, struct sk_buff *skb,
> + const union sctp_addr *laddr,
> + const union sctp_addr *paddr)
> {
> struct sctp_hashbucket *head;
> struct sctp_ep_common *epb;
> struct sctp_endpoint *ep;
> + struct sock *sk;
> + __be32 lport;
This could be a __be16, since sin_port is a 16-bit field.
> int hash;
>
> - hash = sctp_ep_hashfn(net, ntohs(laddr->v4.sin_port));
> + lport = laddr->v4.sin_port;
> + hash = sctp_ep_hashfn(net, ntohs(lport));
> head = &sctp_ep_hashtable[hash];
> read_lock(&head->lock);
> sctp_for_each_hentry(epb, &head->chain) {
> @@ -791,6 +813,15 @@ static struct sctp_endpoint *__sctp_rcv_lookup_endpoint(struct net *net,
> ep = sctp_sk(net->sctp.ctl_sock)->ep;
>
> hit:
> + sk = ep->base.sk;
> + if (sk->sk_reuseport) {
> + __u32 phash = sctp_hashfn(net, lport, paddr, 0);
> +
> + sk = reuseport_select_sock(sk, phash, skb,
> + sizeof(struct sctphdr));
> + if (sk)
> + ep = sctp_sk(sk)->ep;
> + }
> sctp_endpoint_hold(ep);
> read_unlock(&head->lock);
> return ep;
> @@ -829,35 +860,17 @@ static inline int sctp_hash_cmp(struct rhashtable_compare_arg *arg,
> static inline __u32 sctp_hash_obj(const void *data, u32 len, u32 seed)
> {
> const struct sctp_transport *t = data;
> - const union sctp_addr *paddr = &t->ipaddr;
> - const struct net *net = sock_net(t->asoc->base.sk);
> - __be16 lport = htons(t->asoc->base.bind_addr.port);
> - __u32 addr;
> -
> - if (paddr->sa.sa_family == AF_INET6)
> - addr = jhash(&paddr->v6.sin6_addr, 16, seed);
> - else
> - addr = (__force __u32)paddr->v4.sin_addr.s_addr;
>
> - return jhash_3words(addr, ((__force __u32)paddr->v4.sin_port) << 16 |
> - (__force __u32)lport, net_hash_mix(net), seed);
> + return sctp_hashfn(sock_net(t->asoc->base.sk),
> + htons(t->asoc->base.bind_addr.port),
> + &t->ipaddr, seed);
> }
>
> static inline __u32 sctp_hash_key(const void *data, u32 len, u32 seed)
> {
> const struct sctp_hash_cmp_arg *x = data;
> - const union sctp_addr *paddr = x->paddr;
> - const struct net *net = x->net;
> - __be16 lport = x->lport;
> - __u32 addr;
> -
> - if (paddr->sa.sa_family == AF_INET6)
> - addr = jhash(&paddr->v6.sin6_addr, 16, seed);
> - else
> - addr = (__force __u32)paddr->v4.sin_addr.s_addr;
>
> - return jhash_3words(addr, ((__force __u32)paddr->v4.sin_port) << 16 |
> - (__force __u32)lport, net_hash_mix(net), seed);
> + return sctp_hashfn(x->net, x->lport, x->paddr, seed);
> }
>
> static const struct rhashtable_params sctp_hash_params = {
> --
> 2.1.0
>
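As a rough illustration of the port word that sctp_hashfn() passes to jhash_3words() in the patch above, here is a userspace sketch of the packing. The function name pack_ports is hypothetical, and byte order is ignored: the kernel code works on network-order values, while this model uses plain host integers.

```c
#include <assert.h>
#include <stdint.h>

/* Pack the peer port into the upper 16 bits and the local port into
 * the lower 16 bits of one 32-bit word. Both inputs are 16 bits wide,
 * so the two halves cannot overlap. */
static uint32_t pack_ports(uint16_t pport, uint16_t lport)
{
	return ((uint32_t)pport << 16) | (uint32_t)lport;
}

/* Usage: the ports from the cover letter example (peer 1234, local 6400). */
void demo_pack_ports(void)
{
	uint32_t word = pack_ports(1234, 6400);

	assert((word >> 16) == 1234);
	assert((word & 0xffffu) == 6400);
}
```

Because each port occupies exactly half of the word, a 16-bit type is enough for the local port variable, which is the point of the __be16 remark.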
* Re: [PATCH net-next 0/3] sctp: add support for sk_reuseport
2018-10-21 4:43 [PATCH net-next 0/3] sctp: add support for sk_reuseport Xin Long
` (2 preceding siblings ...)
2018-10-22 11:40 ` Neil Horman
@ 2018-10-22 14:20 ` Marcelo Ricardo Leitner
3 siblings, 0 replies; 11+ messages in thread
From: Marcelo Ricardo Leitner @ 2018-10-22 14:20 UTC (permalink / raw)
To: Xin Long; +Cc: network dev, linux-sctp, Neil Horman, davem
On Sun, Oct 21, 2018 at 12:43:35PM +0800, Xin Long wrote:
> sctp sk_reuseport allows multiple socks to listen on the same port and
> addresses, as long as these socks have the same uid. This works pretty
> much as TCP/UDP does; the only difference is that sctp is multi-homed,
> so all the bind_addrs in these socks have to match completely,
> otherwise listen() will return an error.
>
FWIW, I won't be able to review this patchset thoroughly. The 2 small
comments that I sent are all I have.
Thanks,
Marcelo
* Re: [PATCH net-next 1/3] sctp: do reuseport_select_sock in __sctp_rcv_lookup_endpoint
2018-10-22 14:17 ` [PATCH net-next 1/3] sctp: do reuseport_select_sock in __sctp_rcv_lookup_endpoint Marcelo Ricardo Leitner
@ 2018-11-12 9:56 ` Xin Long
0 siblings, 0 replies; 11+ messages in thread
From: Xin Long @ 2018-11-12 9:56 UTC (permalink / raw)
To: Marcelo Ricardo Leitner; +Cc: network dev, linux-sctp, Neil Horman, davem
On Mon, Oct 22, 2018 at 11:18 PM Marcelo Ricardo Leitner
<marcelo.leitner@gmail.com> wrote:
>
> On Sun, Oct 21, 2018 at 12:43:36PM +0800, Xin Long wrote:
> > This is a part of sk_reuseport support for sctp, and it selects a
> > sock by the hashkey of lport, paddr and dport by default. It will
> > not work until sk_reuseport support is added in sctp_get_port_local()
> > in the next patch.
> >
> > Signed-off-by: Xin Long <lucien.xin@gmail.com>
> > ---
> > net/sctp/input.c | 69 +++++++++++++++++++++++++++++++++-----------------------
> > 1 file changed, 41 insertions(+), 28 deletions(-)
> >
> > diff --git a/net/sctp/input.c b/net/sctp/input.c
> > index 5c36a99..60ede89 100644
> > --- a/net/sctp/input.c
> > +++ b/net/sctp/input.c
> > @@ -57,6 +57,7 @@
> > #include <net/sctp/checksum.h>
> > #include <net/net_namespace.h>
> > #include <linux/rhashtable.h>
> > +#include <net/sock_reuseport.h>
> >
> > /* Forward declarations for internal helpers. */
> > static int sctp_rcv_ootb(struct sk_buff *);
> > @@ -65,8 +66,10 @@ static struct sctp_association *__sctp_rcv_lookup(struct net *net,
> > const union sctp_addr *paddr,
> > const union sctp_addr *laddr,
> > struct sctp_transport **transportp);
> > -static struct sctp_endpoint *__sctp_rcv_lookup_endpoint(struct net *net,
> > - const union sctp_addr *laddr);
> > +static struct sctp_endpoint *__sctp_rcv_lookup_endpoint(
> > + struct net *net, struct sk_buff *skb,
> > + const union sctp_addr *laddr,
> > + const union sctp_addr *daddr);
> > static struct sctp_association *__sctp_lookup_association(
> > struct net *net,
> > const union sctp_addr *local,
> > @@ -171,7 +174,7 @@ int sctp_rcv(struct sk_buff *skb)
> > asoc = __sctp_rcv_lookup(net, skb, &src, &dest, &transport);
> >
> > if (!asoc)
> > - ep = __sctp_rcv_lookup_endpoint(net, &dest);
> > + ep = __sctp_rcv_lookup_endpoint(net, skb, &dest, &src);
> >
> > /* Retrieve the common input handling substructure. */
> > rcvr = asoc ? &asoc->base : &ep->base;
> > @@ -770,16 +773,35 @@ void sctp_unhash_endpoint(struct sctp_endpoint *ep)
> > local_bh_enable();
> > }
> >
> > +static inline __u32 sctp_hashfn(const struct net *net, __be16 lport,
> > + const union sctp_addr *paddr, __u32 seed)
> > +{
> > + __u32 addr;
> > +
> > + if (paddr->sa.sa_family == AF_INET6)
> > + addr = jhash(&paddr->v6.sin6_addr, 16, seed);
> > + else
> > + addr = (__force __u32)paddr->v4.sin_addr.s_addr;
> > +
> > + return jhash_3words(addr, ((__force __u32)paddr->v4.sin_port) << 16 |
> > + (__force __u32)lport, net_hash_mix(net), seed);
> > +}
> > +
> > /* Look up an endpoint. */
> > -static struct sctp_endpoint *__sctp_rcv_lookup_endpoint(struct net *net,
> > - const union sctp_addr *laddr)
> > +static struct sctp_endpoint *__sctp_rcv_lookup_endpoint(
> > + struct net *net, struct sk_buff *skb,
> > + const union sctp_addr *laddr,
> > + const union sctp_addr *paddr)
> > {
> > struct sctp_hashbucket *head;
> > struct sctp_ep_common *epb;
> > struct sctp_endpoint *ep;
> > + struct sock *sk;
> > + __be32 lport;
>
> This could be a __be16, since sin_port is a 16-bit field.
Right, will correct it in v2.
>
> > int hash;
> >
> > - hash = sctp_ep_hashfn(net, ntohs(laddr->v4.sin_port));
> > + lport = laddr->v4.sin_port;
> > + hash = sctp_ep_hashfn(net, ntohs(lport));
> > head = &sctp_ep_hashtable[hash];
> > read_lock(&head->lock);
> > sctp_for_each_hentry(epb, &head->chain) {
> > @@ -791,6 +813,15 @@ static struct sctp_endpoint *__sctp_rcv_lookup_endpoint(struct net *net,
> > ep = sctp_sk(net->sctp.ctl_sock)->ep;
> >
> > hit:
> > + sk = ep->base.sk;
> > + if (sk->sk_reuseport) {
> > + __u32 phash = sctp_hashfn(net, lport, paddr, 0);
> > +
> > + sk = reuseport_select_sock(sk, phash, skb,
> > + sizeof(struct sctphdr));
> > + if (sk)
> > + ep = sctp_sk(sk)->ep;
> > + }
> > sctp_endpoint_hold(ep);
> > read_unlock(&head->lock);
> > return ep;
> > @@ -829,35 +860,17 @@ static inline int sctp_hash_cmp(struct rhashtable_compare_arg *arg,
> > static inline __u32 sctp_hash_obj(const void *data, u32 len, u32 seed)
> > {
> > const struct sctp_transport *t = data;
> > - const union sctp_addr *paddr = &t->ipaddr;
> > - const struct net *net = sock_net(t->asoc->base.sk);
> > - __be16 lport = htons(t->asoc->base.bind_addr.port);
> > - __u32 addr;
> > -
> > - if (paddr->sa.sa_family == AF_INET6)
> > - addr = jhash(&paddr->v6.sin6_addr, 16, seed);
> > - else
> > - addr = (__force __u32)paddr->v4.sin_addr.s_addr;
> >
> > - return jhash_3words(addr, ((__force __u32)paddr->v4.sin_port) << 16 |
> > - (__force __u32)lport, net_hash_mix(net), seed);
> > + return sctp_hashfn(sock_net(t->asoc->base.sk),
> > + htons(t->asoc->base.bind_addr.port),
> > + &t->ipaddr, seed);
> > }
> >
> > static inline __u32 sctp_hash_key(const void *data, u32 len, u32 seed)
> > {
> > const struct sctp_hash_cmp_arg *x = data;
> > - const union sctp_addr *paddr = x->paddr;
> > - const struct net *net = x->net;
> > - __be16 lport = x->lport;
> > - __u32 addr;
> > -
> > - if (paddr->sa.sa_family == AF_INET6)
> > - addr = jhash(&paddr->v6.sin6_addr, 16, seed);
> > - else
> > - addr = (__force __u32)paddr->v4.sin_addr.s_addr;
> >
> > - return jhash_3words(addr, ((__force __u32)paddr->v4.sin_port) << 16 |
> > - (__force __u32)lport, net_hash_mix(net), seed);
> > + return sctp_hashfn(x->net, x->lport, x->paddr, seed);
> > }
> >
> > static const struct rhashtable_params sctp_hash_params = {
> > --
> > 2.1.0
> >
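The hash computed at the 'hit:' label is handed to reuseport_select_sock(), which picks one socket of the group. A minimal sketch of how a 32-bit hash can be mapped onto a group of n sockets follows. It assumes the multiply-and-shift scaling used by the kernel's reciprocal_scale() helper; whether the reuseport core applies exactly this step is an assumption here, and scale_hash is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Map a 32-bit hash onto the range [0, n) with a multiply and a shift
 * instead of a modulo. Assumed to mirror reciprocal_scale(). */
static uint32_t scale_hash(uint32_t hash, uint32_t n)
{
	return (uint32_t)(((uint64_t)hash * n) >> 32);
}

/* Usage: five listeners, as in the cover letter example. */
void demo_scale_hash(void)
{
	assert(scale_hash(0x00000000u, 5) == 0);
	assert(scale_hash(0x80000000u, 5) == 2);
	assert(scale_hash(0xffffffffu, 5) == 4);
}
```

The same hash always yields the same index, so packets from one peer address and port keep landing on the same listener while the group membership is unchanged.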
* Re: [PATCH net-next 2/3] sctp: add sock_reuseport for the sock in __sctp_hash_endpoint
2018-10-22 14:15 ` [PATCH net-next 2/3] sctp: add sock_reuseport for the sock in __sctp_hash_endpoint Marcelo Ricardo Leitner
@ 2018-11-12 9:58 ` Xin Long
0 siblings, 0 replies; 11+ messages in thread
From: Xin Long @ 2018-11-12 9:58 UTC (permalink / raw)
To: Marcelo Ricardo Leitner; +Cc: network dev, linux-sctp, Neil Horman, davem
On Mon, Oct 22, 2018 at 11:15 PM Marcelo Ricardo Leitner
<marcelo.leitner@gmail.com> wrote:
>
> On Sun, Oct 21, 2018 at 12:43:37PM +0800, Xin Long wrote:
> > This is a part of sk_reuseport support for sctp. It defines a helper
> > sctp_bind_addrs_check() to check whether the bind_addrs in two socks
> > match. It will add the sock to the sock_reuseport if they match
> > completely, return err if they match only partly, and alloc a new
> > sock_reuseport if no sock matches at all.
> >
> > It will not work until sk_reuseport support is added in
> > sctp_get_port_local() in the next patch.
> >
> > Signed-off-by: Xin Long <lucien.xin@gmail.com>
> > ---
> > include/net/sctp/sctp.h | 2 +-
> > include/net/sctp/structs.h | 2 ++
> > net/core/sock_reuseport.c | 1 +
> > net/sctp/bind_addr.c | 28 ++++++++++++++++++++++
> > net/sctp/input.c | 60 +++++++++++++++++++++++++++++++++++++++-------
> > net/sctp/socket.c | 3 +--
> > 6 files changed, 85 insertions(+), 11 deletions(-)
> >
> > diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
> > index 8c2caa3..b8cd58d 100644
> > --- a/include/net/sctp/sctp.h
> > +++ b/include/net/sctp/sctp.h
> > @@ -152,7 +152,7 @@ int sctp_primitive_RECONF(struct net *net, struct sctp_association *asoc,
> > */
> > int sctp_rcv(struct sk_buff *skb);
> > void sctp_v4_err(struct sk_buff *skb, u32 info);
> > -void sctp_hash_endpoint(struct sctp_endpoint *);
> > +int sctp_hash_endpoint(struct sctp_endpoint *ep);
> > void sctp_unhash_endpoint(struct sctp_endpoint *);
> > struct sock *sctp_err_lookup(struct net *net, int family, struct sk_buff *,
> > struct sctphdr *, struct sctp_association **,
> > diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
> > index a11f937..15d017f 100644
> > --- a/include/net/sctp/structs.h
> > +++ b/include/net/sctp/structs.h
> > @@ -1190,6 +1190,8 @@ int sctp_bind_addr_conflict(struct sctp_bind_addr *, const union sctp_addr *,
> > struct sctp_sock *, struct sctp_sock *);
> > int sctp_bind_addr_state(const struct sctp_bind_addr *bp,
> > const union sctp_addr *addr);
> > +int sctp_bind_addrs_check(struct sctp_sock *sp,
> > + struct sctp_sock *sp2, int cnt2);
> > union sctp_addr *sctp_find_unmatch_addr(struct sctp_bind_addr *bp,
> > const union sctp_addr *addrs,
> > int addrcnt,
> > diff --git a/net/core/sock_reuseport.c b/net/core/sock_reuseport.c
> > index ba5cba5..d8fe3e5 100644
> > --- a/net/core/sock_reuseport.c
> > +++ b/net/core/sock_reuseport.c
> > @@ -187,6 +187,7 @@ int reuseport_add_sock(struct sock *sk, struct sock *sk2, bool bind_inany)
> > call_rcu(&old_reuse->rcu, reuseport_free_rcu);
> > return 0;
> > }
> > +EXPORT_SYMBOL(reuseport_add_sock);
> >
> > void reuseport_detach_sock(struct sock *sk)
> > {
> > diff --git a/net/sctp/bind_addr.c b/net/sctp/bind_addr.c
> > index 7df3704..78d0d93 100644
> > --- a/net/sctp/bind_addr.c
> > +++ b/net/sctp/bind_addr.c
> > @@ -337,6 +337,34 @@ int sctp_bind_addr_match(struct sctp_bind_addr *bp,
> > return match;
> > }
> >
> > +int sctp_bind_addrs_check(struct sctp_sock *sp,
> > + struct sctp_sock *sp2, int cnt2)
> > +{
> > + struct sctp_bind_addr *bp2 = &sp2->ep->base.bind_addr;
> > + struct sctp_bind_addr *bp = &sp->ep->base.bind_addr;
> > + struct sctp_sockaddr_entry *laddr, *laddr2;
> > + bool exist = false;
> > + int cnt = 0;
> > +
> > + rcu_read_lock();
> > + list_for_each_entry_rcu(laddr, &bp->address_list, list) {
> > + list_for_each_entry_rcu(laddr2, &bp2->address_list, list) {
> > + if (sp->pf->af->cmp_addr(&laddr->a, &laddr2->a) &&
> > + laddr->valid == laddr2->valid) {
>
> I think that by this point, in a normal run, laddr2->valid will always be
> true, but as written it gives the impression that it accepts 0 == 0 too,
> which would be bad. Maybe a fast BINDX_REM/BINDX_ADD could leave
> laddr2->valid = 0 here; I'm not sure.
>
> Anyway, maybe use '... laddr->valid && laddr2->valid' instead, or do you
> really want to allow the 0 == 0 case?
>
Will improve it in v2. Thanks.
> > + exist = true;
> > + goto next;
> > + }
> > + }
> > + cnt = 0;
> > + break;
> > +next:
> > + cnt++;
> > + }
> > + rcu_read_unlock();
> > +
> > + return (cnt == cnt2) ? 0 : (exist ? -EEXIST : 1);
> > +}
> > +
> > /* Does the address 'addr' conflict with any addresses in
> > * the bp.
> > */
> > diff --git a/net/sctp/input.c b/net/sctp/input.c
> > index 60ede89..6bfeb10 100644
> > --- a/net/sctp/input.c
> > +++ b/net/sctp/input.c
> > @@ -723,43 +723,87 @@ static int sctp_rcv_ootb(struct sk_buff *skb)
> > }
> >
> > /* Insert endpoint into the hash table. */
> > -static void __sctp_hash_endpoint(struct sctp_endpoint *ep)
> > +static int __sctp_hash_endpoint(struct sctp_endpoint *ep)
> > {
> > - struct net *net = sock_net(ep->base.sk);
> > - struct sctp_ep_common *epb;
> > + struct sock *sk = ep->base.sk;
> > + struct net *net = sock_net(sk);
> > struct sctp_hashbucket *head;
> > + struct sctp_ep_common *epb;
> >
> > epb = &ep->base;
> > -
> > epb->hashent = sctp_ep_hashfn(net, epb->bind_addr.port);
> > head = &sctp_ep_hashtable[epb->hashent];
> >
> > + if (sk->sk_reuseport) {
> > + bool any = sctp_is_ep_boundall(sk);
> > + struct sctp_ep_common *epb2;
> > + struct list_head *list;
> > + int cnt = 0, err = 1;
> > +
> > + list_for_each(list, &ep->base.bind_addr.address_list)
> > + cnt++;
> > +
> > + sctp_for_each_hentry(epb2, &head->chain) {
> > + struct sock *sk2 = epb2->sk;
> > +
> > + if (!net_eq(sock_net(sk2), net) || sk2 == sk ||
> > + !uid_eq(sock_i_uid(sk2), sock_i_uid(sk)) ||
> > + !sk2->sk_reuseport)
> > + continue;
> > +
> > + err = sctp_bind_addrs_check(sctp_sk(sk2),
> > + sctp_sk(sk), cnt);
> > + if (!err) {
> > + err = reuseport_add_sock(sk, sk2, any);
> > + if (err)
> > + return err;
> > + break;
> > + } else if (err < 0) {
> > + return err;
> > + }
> > + }
> > +
> > + if (err) {
> > + err = reuseport_alloc(sk, any);
> > + if (err)
> > + return err;
> > + }
> > + }
> > +
> > write_lock(&head->lock);
> > hlist_add_head(&epb->node, &head->chain);
> > write_unlock(&head->lock);
> > + return 0;
> > }
> >
> > /* Add an endpoint to the hash. Local BH-safe. */
> > -void sctp_hash_endpoint(struct sctp_endpoint *ep)
> > +int sctp_hash_endpoint(struct sctp_endpoint *ep)
> > {
> > + int err;
> > +
> > local_bh_disable();
> > - __sctp_hash_endpoint(ep);
> > + err = __sctp_hash_endpoint(ep);
> > local_bh_enable();
> > +
> > + return err;
> > }
> >
> > /* Remove endpoint from the hash table. */
> > static void __sctp_unhash_endpoint(struct sctp_endpoint *ep)
> > {
> > - struct net *net = sock_net(ep->base.sk);
> > + struct sock *sk = ep->base.sk;
> > struct sctp_hashbucket *head;
> > struct sctp_ep_common *epb;
> >
> > epb = &ep->base;
> >
> > - epb->hashent = sctp_ep_hashfn(net, epb->bind_addr.port);
> > + epb->hashent = sctp_ep_hashfn(sock_net(sk), epb->bind_addr.port);
> >
> > head = &sctp_ep_hashtable[epb->hashent];
> >
> > + if (rcu_access_pointer(sk->sk_reuseport_cb))
> > + reuseport_detach_sock(sk);
> > +
> > write_lock(&head->lock);
> > hlist_del_init(&epb->node);
> > write_unlock(&head->lock);
> > diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> > index fc0386e..44e7d8c 100644
> > --- a/net/sctp/socket.c
> > +++ b/net/sctp/socket.c
> > @@ -7850,8 +7850,7 @@ static int sctp_listen_start(struct sock *sk, int backlog)
> > }
> >
> > sk->sk_max_ack_backlog = backlog;
> > - sctp_hash_endpoint(ep);
> > - return 0;
> > + return sctp_hash_endpoint(ep);
> > }
> >
> > /*
> > --
> > 2.1.0
> >
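The three outcomes described in the commit message (complete match, partial match, no match) can be modelled in userspace as below. This is a sketch that follows the loop of sctp_bind_addrs_check() as posted, with addresses reduced to ints; the name addrs_check_model and the array-based interface are hypothetical, and RCU and the 'valid' flag are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* 'a' models the address list of a socket already in the hash bucket,
 * 'b' the list of the socket calling listen(). Returns 0 on a complete
 * match, -EEXIST on a partial match, and 1 when the first address of
 * 'a' is not found in 'b'. */
static int addrs_check_model(const int *a, size_t na,
			     const int *b, size_t nb)
{
	bool exist = false;
	size_t cnt = 0;

	for (size_t i = 0; i < na; i++) {
		bool found = false;

		for (size_t j = 0; j < nb; j++) {
			if (a[i] == b[j]) {
				found = true;
				break;
			}
		}
		if (!found) {
			/* same as 'cnt = 0; break;' in the patch */
			cnt = 0;
			break;
		}
		exist = true;
		cnt++;
	}

	if (cnt == nb)
		return 0;
	return exist ? -EEXIST : 1;
}

/* Usage: one case per outcome. */
void demo_addrs_check(void)
{
	const int a[] = { 1, 2 };
	const int same[] = { 2, 1 };
	const int partial[] = { 1, 3 };
	const int other[] = { 3, 4 };

	assert(addrs_check_model(a, 2, same, 2) == 0);
	assert(addrs_check_model(a, 2, partial, 2) == -EEXIST);
	assert(addrs_check_model(a, 2, other, 2) == 1);
}
```

In __sctp_hash_endpoint() these results map to reuseport_add_sock() for 0, an immediate error return for a negative value, and reuseport_alloc() when no socket in the bucket produced a match.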
end of thread, other threads:[~2018-11-12 19:51 UTC | newest]
Thread overview: 11+ messages
2018-10-21 4:43 [PATCH net-next 0/3] sctp: add support for sk_reuseport Xin Long
2018-10-21 4:43 ` [PATCH net-next 1/3] sctp: do reuseport_select_sock in __sctp_rcv_lookup_endpoint Xin Long
2018-10-21 4:43 ` [PATCH net-next 2/3] sctp: add sock_reuseport for the sock in __sctp_hash_endpoint Xin Long
2018-10-21 4:43 ` [PATCH net-next 3/3] sctp: process sk_reuseport in sctp_get_port_local Xin Long
2018-10-22 14:15 ` [PATCH net-next 2/3] sctp: add sock_reuseport for the sock in __sctp_hash_endpoint Marcelo Ricardo Leitner
2018-11-12 9:58 ` Xin Long
2018-10-22 14:17 ` [PATCH net-next 1/3] sctp: do reuseport_select_sock in __sctp_rcv_lookup_endpoint Marcelo Ricardo Leitner
2018-11-12 9:56 ` Xin Long
2018-10-21 6:58 ` [PATCH net-next 0/3] sctp: add support for sk_reuseport Xin Long
2018-10-22 11:40 ` Neil Horman
2018-10-22 14:20 ` Marcelo Ricardo Leitner