* Re: [PATCHv2 net-next] sctp: add support for SCTP_REUSE_PORT sockopt
From: Marcelo Ricardo Leitner @ 2018-06-25 13:30 UTC
To: Xin Long; +Cc: network dev, linux-sctp, Neil Horman, Michael Tuexen, davem
In-Reply-To: <2ad8be46a8656655e4c9e05aad3df6faf34fed19.1529892406.git.lucien.xin@gmail.com>
On Mon, Jun 25, 2018 at 10:06:46AM +0800, Xin Long wrote:
> This feature is actually already supported by sk->sk_reuse which can be
> set by socket level opt SO_REUSEADDR. But it's not working exactly as
> RFC6458 demands in section 8.1.27, like:
>
> - This option only supports one-to-one style SCTP sockets
> - This socket option must not be used after calling bind()
> or sctp_bindx().
>
> Besides, the SCTP_REUSE_PORT sockopt should be provided for user programs.
> Otherwise, programs that use SCTP_REUSE_PORT on other systems will not
> work on Linux.
>
> To separate it from the socket level version, this patch adds 'reuse' in
> sctp_sock and it works pretty much as sk->sk_reuse, but with some extra
> setup limitations that are needed when it is being enabled.
>
> "It should be noted that the behavior of the socket-level socket option
> to reuse ports and/or addresses for SCTP sockets is unspecified", so it
> leaves SO_REUSEADDR as is for compatibility.
>
> Note that the name SCTP_REUSE_PORT is kind of confusing, it is identical
> to SO_REUSEADDR with some extra restriction, so here it uses 'reuse' in
> sctp_sock instead of 'reuseport'. As for sk->sk_reuseport support for
> SCTP, it will be added in another patch.
To help changelog readers later, please update to something like:
"""\
Note that the name SCTP_REUSE_PORT is somewhat confusing, as its
functionality is nearly identical to SO_REUSEADDR, but with some
extra restrictions. Here it uses 'reuse' in sctp_sock instead of
'reuseport'. As for sk->sk_reuseport support for SCTP, it will be
added in another patch.
"""
Makes sense, can you note the difference?
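A minimal userspace sketch of the setup restrictions that sctp_setsockopt_reuse_port() in the patch below enforces, in the order it checks them: one-to-one (TCP-style) sockets only, not after the socket is bound, and an int-sized optval. The struct and function names (model_sctp_sock, model_set_reuse_port) are hypothetical stand-ins for struct sock/struct sctp_sock, not kernel API.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins: TCP style is one-to-one, UDP style is one-to-many. */
enum model_style { MODEL_STYLE_TCP, MODEL_STYLE_UDP };

struct model_sctp_sock {
	enum model_style style;
	unsigned short bound_port;	/* 0 until bind()/sctp_bindx() */
	bool reuse;			/* mirrors sctp_sk(sk)->reuse */
};

/* Same check order as the patch: style, then "already bound", then
 * the option length. Returns 0 on success or a negative errno. */
static int model_set_reuse_port(struct model_sctp_sock *sp,
				const void *optval, size_t optlen)
{
	int val;

	if (sp->style != MODEL_STYLE_TCP)
		return -EOPNOTSUPP;
	if (sp->bound_port)
		return -EFAULT;		/* the patch returns -EFAULT here */
	if (optlen < sizeof(int))
		return -EINVAL;

	memcpy(&val, optval, sizeof(val));	/* stands in for get_user() */
	sp->reuse = !!val;
	return 0;
}
```

In an application this corresponds to calling setsockopt(fd, IPPROTO_SCTP, SCTP_REUSE_PORT, &one, sizeof(one)) on a SOCK_STREAM/IPPROTO_SCTP socket before bind() or sctp_bindx(); with this version of the patch, doing it afterwards fails with EFAULT.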
>
> Thanks to Neil for making this clear.
>
> v1->v2:
> - add sctp_sk->reuse to separate it from the socket level version.
>
> Acked-by: Neil Horman <nhorman@tuxdriver.com>
> Signed-off-by: Xin Long <lucien.xin@gmail.com>
> ---
> include/net/sctp/structs.h | 1 +
> include/uapi/linux/sctp.h | 1 +
> net/sctp/socket.c | 62 ++++++++++++++++++++++++++++++++++++++++------
> 3 files changed, 57 insertions(+), 7 deletions(-)
>
> diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
> index e0f962d..701a517 100644
> --- a/include/net/sctp/structs.h
> +++ b/include/net/sctp/structs.h
> @@ -220,6 +220,7 @@ struct sctp_sock {
> __u32 adaptation_ind;
> __u32 pd_point;
> __u16 nodelay:1,
> + reuse:1,
> disable_fragments:1,
> v4mapped:1,
> frag_interleave:1,
> diff --git a/include/uapi/linux/sctp.h b/include/uapi/linux/sctp.h
> index b64d583..c02986a 100644
> --- a/include/uapi/linux/sctp.h
> +++ b/include/uapi/linux/sctp.h
> @@ -100,6 +100,7 @@ typedef __s32 sctp_assoc_t;
> #define SCTP_RECVNXTINFO 33
> #define SCTP_DEFAULT_SNDINFO 34
> #define SCTP_AUTH_DEACTIVATE_KEY 35
> +#define SCTP_REUSE_PORT 36
>
> /* Internal Socket Options. Some of the sctp library functions are
> * implemented using these socket options.
> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> index 0e91e83..bf11f9c 100644
> --- a/net/sctp/socket.c
> +++ b/net/sctp/socket.c
> @@ -4170,6 +4170,28 @@ static int sctp_setsockopt_interleaving_supported(struct sock *sk,
> return retval;
> }
>
> +static int sctp_setsockopt_reuse_port(struct sock *sk, char __user *optval,
> + unsigned int optlen)
> +{
> + int val;
> +
> + if (!sctp_style(sk, TCP))
> + return -EOPNOTSUPP;
> +
> + if (sctp_sk(sk)->ep->base.bind_addr.port)
> + return -EFAULT;
> +
> + if (optlen < sizeof(int))
> + return -EINVAL;
> +
> + if (get_user(val, (int __user *)optval))
> + return -EFAULT;
> +
> + sctp_sk(sk)->reuse = !!val;
> +
> + return 0;
> +}
> +
> /* API 6.2 setsockopt(), getsockopt()
> *
> * Applications use setsockopt() and getsockopt() to set or retrieve
> @@ -4364,6 +4386,9 @@ static int sctp_setsockopt(struct sock *sk, int level, int optname,
> retval = sctp_setsockopt_interleaving_supported(sk, optval,
> optlen);
> break;
> + case SCTP_REUSE_PORT:
> + retval = sctp_setsockopt_reuse_port(sk, optval, optlen);
> + break;
> default:
> retval = -ENOPROTOOPT;
> break;
> @@ -7197,6 +7222,26 @@ static int sctp_getsockopt_interleaving_supported(struct sock *sk, int len,
> return retval;
> }
>
> +static int sctp_getsockopt_reuse_port(struct sock *sk, int len,
> + char __user *optval,
> + int __user *optlen)
> +{
> + int val;
> +
> + if (len < sizeof(int))
> + return -EINVAL;
> +
> + len = sizeof(int);
> + val = sctp_sk(sk)->reuse;
> + if (put_user(len, optlen))
> + return -EFAULT;
> +
> + if (copy_to_user(optval, &val, len))
> + return -EFAULT;
> +
> + return 0;
> +}
> +
> static int sctp_getsockopt(struct sock *sk, int level, int optname,
> char __user *optval, int __user *optlen)
> {
> @@ -7392,6 +7437,9 @@ static int sctp_getsockopt(struct sock *sk, int level, int optname,
> retval = sctp_getsockopt_interleaving_supported(sk, len, optval,
> optlen);
> break;
> + case SCTP_REUSE_PORT:
> + retval = sctp_getsockopt_reuse_port(sk, len, optval, optlen);
> + break;
> default:
> retval = -ENOPROTOOPT;
> break;
> @@ -7429,6 +7477,7 @@ static struct sctp_bind_bucket *sctp_bucket_create(
>
> static long sctp_get_port_local(struct sock *sk, union sctp_addr *addr)
> {
> + bool reuse = (sk->sk_reuse || sctp_sk(sk)->reuse);
> struct sctp_bind_hashbucket *head; /* hash list */
> struct sctp_bind_bucket *pp;
> unsigned short snum;
> @@ -7501,13 +7550,11 @@ static long sctp_get_port_local(struct sock *sk, union sctp_addr *addr)
> * used by other socket (pp->owner not empty); that other
> * socket is going to be sk2.
> */
> - int reuse = sk->sk_reuse;
> struct sock *sk2;
>
> pr_debug("%s: found a possible match\n", __func__);
>
> - if (pp->fastreuse && sk->sk_reuse &&
> - sk->sk_state != SCTP_SS_LISTENING)
> + if (pp->fastreuse && reuse && sk->sk_state != SCTP_SS_LISTENING)
> goto success;
>
> /* Run through the list of sockets bound to the port
> @@ -7525,7 +7572,7 @@ static long sctp_get_port_local(struct sock *sk, union sctp_addr *addr)
> ep2 = sctp_sk(sk2)->ep;
>
> if (sk == sk2 ||
> - (reuse && sk2->sk_reuse &&
> + (reuse && (sk2->sk_reuse || sctp_sk(sk2)->reuse) &&
> sk2->sk_state != SCTP_SS_LISTENING))
> continue;
>
> @@ -7549,12 +7596,12 @@ static long sctp_get_port_local(struct sock *sk, union sctp_addr *addr)
> * SO_REUSEADDR on this socket -sk-).
> */
> if (hlist_empty(&pp->owner)) {
> - if (sk->sk_reuse && sk->sk_state != SCTP_SS_LISTENING)
> + if (reuse && sk->sk_state != SCTP_SS_LISTENING)
> pp->fastreuse = 1;
> else
> pp->fastreuse = 0;
> } else if (pp->fastreuse &&
> - (!sk->sk_reuse || sk->sk_state == SCTP_SS_LISTENING))
> + (!reuse || sk->sk_state == SCTP_SS_LISTENING))
> pp->fastreuse = 0;
>
> /* We are set, so fill up all the data in the hash table
> @@ -7685,7 +7732,7 @@ int sctp_inet_listen(struct socket *sock, int backlog)
> err = 0;
> sctp_unhash_endpoint(ep);
> sk->sk_state = SCTP_SS_CLOSED;
> - if (sk->sk_reuse)
> + if (sk->sk_reuse || sctp_sk(sk)->reuse)
> sctp_sk(sk)->bind_hash->fastreuse = 1;
> goto out;
> }
> @@ -8550,6 +8597,7 @@ void sctp_copy_sock(struct sock *newsk, struct sock *sk,
> newsk->sk_no_check_tx = sk->sk_no_check_tx;
> newsk->sk_no_check_rx = sk->sk_no_check_rx;
> newsk->sk_reuse = sk->sk_reuse;
> + sctp_sk(newsk)->reuse = sp->reuse;
>
> newsk->sk_shutdown = sk->sk_shutdown;
> newsk->sk_destruct = sctp_destruct_sock;
> --
> 2.1.0
>
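The effect of the sctp_get_port_local() hunks can be summarised with a small model: after the patch, either SO_REUSEADDR (sk->sk_reuse) or SCTP_REUSE_PORT (sctp_sk(sk)->reuse) marks a socket as willing to share, and an existing owner of the port is skipped only when both sides are willing and that owner is not listening. This is a hedged sketch, not kernel code; model_sock and the helper names are hypothetical, and the address comparison the kernel performs for owners that are not skipped is left out.

```c
#include <stdbool.h>

/* Hypothetical model of the per-socket state sctp_get_port_local() reads. */
struct model_sock {
	bool sk_reuse;		/* SO_REUSEADDR */
	bool sctp_reuse;	/* SCTP_REUSE_PORT, i.e. sctp_sk(sk)->reuse */
	bool listening;		/* sk_state == SCTP_SS_LISTENING */
};

/* After the patch, either option makes a socket willing to share. */
static bool model_wants_reuse(const struct model_sock *s)
{
	return s->sk_reuse || s->sctp_reuse;
}

/* True when owner sk2 is skipped by the reuse shortcut in the owner loop.
 * When false, the kernel goes on to compare bound addresses (omitted). */
static bool model_owner_skipped(const struct model_sock *sk,
				const struct model_sock *sk2)
{
	return sk == sk2 ||
	       (model_wants_reuse(sk) && model_wants_reuse(sk2) &&
		!sk2->listening);
}

/* Bucket fastreuse update: the first owner sets it from its own state,
 * later owners can only clear it. */
static bool model_update_fastreuse(bool fastreuse, bool bucket_empty,
				   const struct model_sock *sk)
{
	bool ok = model_wants_reuse(sk) && !sk->listening;

	if (bucket_empty)
		return ok;
	return fastreuse && ok;
}
```

Note that a socket with only SCTP_REUSE_PORT set can share with one that has only SO_REUSEADDR set, because both feed the same "reuse" decision.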
* Re: [PATCH net-next 1/3] rds: Changing IP address internal representation to struct in6_addr
From: kbuild test robot @ 2018-06-25 13:39 UTC
To: Ka-Cheong Poon; +Cc: kbuild-all, netdev, santosh.shilimkar, davem, rds-devel
In-Reply-To: <3b53ed6f18e697fe8e414907cbbba81834515a2a.1529922794.git.ka-cheong.poon@oracle.com>
Hi Ka-Cheong,
Thank you for the patch! Perhaps something to improve:
[auto build test WARNING on net-next/master]
url: https://github.com/0day-ci/linux/commits/Ka-Cheong-Poon/rds-IPv6-support/20180625-190047
reproduce:
# apt-get install sparse
make ARCH=x86_64 allmodconfig
make C=1 CF=-D__CHECK_ENDIAN__
sparse warnings: (new ones prefixed by >>)
>> net/rds/rdma_transport.c:42:5: sparse: symbol 'rds_rdma_cm_event_handler_cmn' was not declared. Should it be static?
--
net/rds/ib_cm.c:205:17: sparse: expression using sizeof(void)
net/rds/ib_cm.c:205:17: sparse: expression using sizeof(void)
net/rds/ib_cm.c:207:17: sparse: expression using sizeof(void)
net/rds/ib_cm.c:207:17: sparse: expression using sizeof(void)
net/rds/ib_cm.c:208:35: sparse: expression using sizeof(void)
include/linux/overflow.h:220:13: sparse: undefined identifier '__builtin_mul_overflow'
include/linux/overflow.h:220:13: sparse: incorrect type in conditional
include/linux/overflow.h:220:13: got void
include/linux/overflow.h:220:13: sparse: not a function <noident>
include/linux/overflow.h:220:13: sparse: incorrect type in conditional
include/linux/overflow.h:220:13: got void
>> net/rds/ib_cm.c:908:31: sparse: incorrect type in assignment (different base types) @@ expected restricted __be16 [usertype] sin_port @@ got unsignedrestricted __be16 [usertype] sin_port @@
net/rds/ib_cm.c:908:31: expected restricted __be16 [usertype] sin_port
net/rds/ib_cm.c:908:31: got unsigned short [unsigned] [usertype] <noident>
net/rds/ib_cm.c:913:31: sparse: incorrect type in assignment (different base types) @@ expected restricted __be16 [usertype] sin_port @@ got unsignedrestricted __be16 [usertype] sin_port @@
net/rds/ib_cm.c:913:31: expected restricted __be16 [usertype] sin_port
net/rds/ib_cm.c:913:31: got unsigned short [unsigned] [usertype] <noident>
>> net/rds/ib_cm.c:920:33: sparse: incorrect type in assignment (different base types) @@ expected restricted __be16 [usertype] sin6_port @@ got unsignedrestricted __be16 [usertype] sin6_port @@
net/rds/ib_cm.c:920:33: expected restricted __be16 [usertype] sin6_port
net/rds/ib_cm.c:920:33: got unsigned short [unsigned] [usertype] <noident>
net/rds/ib_cm.c:926:33: sparse: incorrect type in assignment (different base types) @@ expected restricted __be16 [usertype] sin6_port @@ got unsignedrestricted __be16 [usertype] sin6_port @@
net/rds/ib_cm.c:926:33: expected restricted __be16 [usertype] sin6_port
net/rds/ib_cm.c:926:33: got unsigned short [unsigned] [usertype] <noident>
include/linux/overflow.h:220:13: sparse: call with no type!
--
>> net/rds/tcp.c:294:9: sparse: context imbalance in 'rds_tcp_laddr_check' - different lock contexts for basic block
--
net/rds/tcp_connect.c:119:29: sparse: incorrect type in assignment (different base types) @@ expected restricted __be32 [assigned] [usertype] s_addr @@ got icted __be32 [assigned] [usertype] s_addr @@
net/rds/tcp_connect.c:119:29: expected restricted __be32 [assigned] [usertype] s_addr
net/rds/tcp_connect.c:119:29: got unsigned int [unsigned] [usertype] <noident>
net/rds/tcp_connect.c:120:22: sparse: incorrect type in assignment (different base types) @@ expected restricted __be16 [assigned] [usertype] sin_port @@ got tricted __be16 [assigned] [usertype] sin_port @@
net/rds/tcp_connect.c:120:22: expected restricted __be16 [assigned] [usertype] sin_port
net/rds/tcp_connect.c:120:22: got unsigned short [unsigned] [usertype] <noident>
>> net/rds/tcp_connect.c:132:29: sparse: incorrect type in assignment (different base types) @@ expected restricted __be32 [addressable] [assigned] [usertype] s_addr @@ got addressable] [assigned] [usertype] s_addr @@
net/rds/tcp_connect.c:132:29: expected restricted __be32 [addressable] [assigned] [usertype] s_addr
net/rds/tcp_connect.c:132:29: got unsigned int [unsigned] [usertype] <noident>
>> net/rds/tcp_connect.c:133:22: sparse: incorrect type in assignment (different base types) @@ expected restricted __be16 [addressable] [assigned] [usertype] sin_port @@ got [addressable] [assigned] [usertype] sin_port @@
net/rds/tcp_connect.c:133:22: expected restricted __be16 [addressable] [assigned] [usertype] sin_port
net/rds/tcp_connect.c:133:22: got unsigned short [unsigned] [usertype] <noident>
--
net/rds/af_rds.c:226:22: sparse: invalid assignment: |=
net/rds/af_rds.c:226:22: left side has type restricted __poll_t
net/rds/af_rds.c:226:22: right side has type int
>> net/rds/af_rds.c:499:34: sparse: restricted __be32 degrades to integer
net/rds/af_rds.c:504:34: sparse: restricted __be32 degrades to integer
--
net/rds/bind.c:105:25: sparse: expression using sizeof(void)
>> net/rds/bind.c:177:34: sparse: restricted __be32 degrades to integer
--
net/rds/send.c:363:39: sparse: expression using sizeof(void)
net/rds/send.c:363:39: sparse: expression using sizeof(void)
net/rds/send.c:372:39: sparse: expression using sizeof(void)
net/rds/send.c:372:39: sparse: expression using sizeof(void)
net/rds/send.c:1015:24: sparse: incorrect type in argument 1 (different base types) @@ expected unsigned int [unsigned] [usertype] a @@ got ed int [unsigned] [usertype] a @@
net/rds/send.c:1015:24: expected unsigned int [unsigned] [usertype] a
net/rds/send.c:1015:24: got restricted __be16 [usertype] sin6_port
net/rds/send.c:1017:24: sparse: incorrect type in argument 1 (different base types) @@ expected unsigned int [unsigned] [usertype] a @@ got ed int [unsigned] [usertype] a @@
net/rds/send.c:1017:24: expected unsigned int [unsigned] [usertype] a
net/rds/send.c:1017:24: got restricted __be16 [usertype] sin6_port
>> net/rds/send.c:1097:43: sparse: restricted __be32 degrades to integer
net/rds/send.c:1098:43: sparse: restricted __be32 degrades to integer
net/rds/send.c:1149:13: sparse: expression using sizeof(void)
net/rds/send.c:1149:13: sparse: expression using sizeof(void)
net/rds/send.c:1352:30: sparse: incorrect type in initializer (different base types) @@ expected unsigned short [unsigned] [usertype] npaths @@ got short [unsigned] [usertype] npaths @@
net/rds/send.c:1352:30: expected unsigned short [unsigned] [usertype] npaths
net/rds/send.c:1352:30: got restricted __be16 [usertype] <noident>
net/rds/send.c:1353:34: sparse: incorrect type in initializer (different base types) @@ expected unsigned int [unsigned] [usertype] my_gen_num @@ got ed int [unsigned] [usertype] my_gen_num @@
net/rds/send.c:1353:34: expected unsigned int [unsigned] [usertype] my_gen_num
net/rds/send.c:1353:34: got restricted __be32 [usertype] <noident>
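Most of the "incorrect type in assignment" warnings above involve storing a host-order value into a network-order field (sparse treats sin_port and sin6_port as __be16 and s_addr as __be32). The flagged source lines are not shown here, so this is only a generic userspace sketch of the conversion; in kernel code the usual helpers are htons()/cpu_to_be16() and htonl()/cpu_to_be32(). model_set_ports() is a hypothetical name.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Fill IPv4 and IPv6 loopback addresses from a host-order port.
 * The port fields are network byte order, so convert with htons()
 * rather than assigning host_port directly. */
static void model_set_ports(struct sockaddr_in *sin,
			    struct sockaddr_in6 *sin6, uint16_t host_port)
{
	memset(sin, 0, sizeof(*sin));
	sin->sin_family = AF_INET;
	sin->sin_port = htons(host_port);
	sin->sin_addr.s_addr = htonl(INADDR_LOOPBACK);

	memset(sin6, 0, sizeof(*sin6));
	sin6->sin6_family = AF_INET6;
	sin6->sin6_port = htons(host_port);
	sin6->sin6_addr = in6addr_loopback;
}
```

The "restricted __be32 degrades to integer" warnings are the reverse case: a network-order value used where a plain integer is expected, which usually calls for ntohl()/be32_to_cpu() or a comparison between two values of the same byte order.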
* [RFC PATCH] rds: rds_rdma_cm_event_handler_cmn() can be static
From: kbuild test robot @ 2018-06-25 13:39 UTC
To: Ka-Cheong Poon; +Cc: kbuild-all, netdev, santosh.shilimkar, davem, rds-devel
In-Reply-To: <3b53ed6f18e697fe8e414907cbbba81834515a2a.1529922794.git.ka-cheong.poon@oracle.com>
Fixes: f58dbee41d61 ("rds: Changing IP address internal representation to struct in6_addr")
Signed-off-by: kbuild test robot <fengguang.wu@intel.com>
---
rdma_transport.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/net/rds/rdma_transport.c b/net/rds/rdma_transport.c
index d7da115..3634ed5 100644
--- a/net/rds/rdma_transport.c
+++ b/net/rds/rdma_transport.c
@@ -39,9 +39,9 @@
static struct rdma_cm_id *rds_rdma_listen_id;
-int rds_rdma_cm_event_handler_cmn(struct rdma_cm_id *cm_id,
- struct rdma_cm_event *event,
- bool isv6)
+static int rds_rdma_cm_event_handler_cmn(struct rdma_cm_id *cm_id,
+ struct rdma_cm_event *event,
+ bool isv6)
{
/* this can be null in the listening path */
struct rds_connection *conn = cm_id->context;
* RE: [PATCH 1/5] net: emaclite: Use __func__ instead of hardcoded name
From: Radhey Shyam Pandey @ 2018-06-25 13:47 UTC
To: Joe Perches, Andy Shevchenko
Cc: David S. Miller, Andrew Lunn, Michal Simek, netdev,
linux-arm Mailing List, Linux Kernel Mailing List
In-Reply-To: <3013aae1803c096645887864d1b161efbfba7c77.camel@perches.com>
> -----Original Message-----
> From: Joe Perches [mailto:joe@perches.com]
> Sent: Wednesday, June 20, 2018 4:08 AM
> To: Andy Shevchenko <andy.shevchenko@gmail.com>;
>     Radhey Shyam Pandey <radheys@xilinx.com>
> Cc: David S. Miller <davem@davemloft.net>; Andrew Lunn <andrew@lunn.ch>;
>     Michal Simek <michals@xilinx.com>; netdev <netdev@vger.kernel.org>;
>     linux-arm Mailing List <linux-arm-kernel@lists.infradead.org>;
>     Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
> Subject: Re: [PATCH 1/5] net: emaclite: Use __func__ instead of hardcoded name
>
> On Wed, 2018-06-20 at 00:36 +0300, Andy Shevchenko wrote:
> > On Mon, Jun 18, 2018 at 2:08 PM, Radhey Shyam Pandey
> > <radhey.shyam.pandey@xilinx.com> wrote:
> > > Replace the hardcoded function name with a reference to __func__,
> > > making the code more maintainable. This addresses the checkpatch
> > > warnings below:
> > >
> > > WARNING: Prefer using '"%s...", __func__' to using 'xemaclite_mdio_read',
> > > this function's name, in a string
> > > + "xemaclite_mdio_read(phy_id=%i, reg=%x) == %x\n",
> > >
> > > WARNING: Prefer using '"%s...", __func__' to using 'xemaclite_mdio_write',
> > > this function's name, in a string
> > > + "xemaclite_mdio_write(phy_id=%i, reg=%x, val=%x)\n",
> > >
> >
> > For dev_dbg() the __func__ should be completely dropped away.
>
> Not really the same.
>
> dev_dbg without CONFIG_DYNAMIC_DEBUG does not have
> the ability to prefix __func__.
Yes. If it's acceptable, I'd prefer to use __func__ so that all
configurations are supported.
>
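To illustrate the point about __func__: a hypothetical helper that formats the same text as the xemaclite_mdio_read() debug message but takes the function name from __func__, so the prefix stays correct if the function is renamed and does not rely on CONFIG_DYNAMIC_DEBUG to add it. This is a userspace sketch using snprintf(), not the driver's dev_dbg() call.

```c
#include <stdio.h>

/* Hypothetical helper: the "%s" prefix is filled from __func__, which
 * always names the enclosing function, here xemaclite_mdio_read_msg. */
static int xemaclite_mdio_read_msg(char *buf, size_t len,
				   int phy_id, int reg, int val)
{
	return snprintf(buf, len, "%s(phy_id=%i, reg=%x) == %x\n",
			__func__, phy_id, reg, val);
}
```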
* [PATCH net-next 0/7] l2tp: trivial cleanups
From: Guillaume Nault @ 2018-06-25 14:07 UTC
To: netdev; +Cc: James Chapman
Just a set of unrelated trivial cleanups (remove unused code, make
local functions static, etc.).
Guillaume Nault (7):
l2tp: remove pppol2tp_session_close()
l2tp: remove .show from struct l2tp_tunnel
l2tp: remove l2tp_tunnel_priv()
l2tp: don't export l2tp_session_queue_purge()
l2tp: don't export l2tp_tunnel_closeall()
l2tp: avoid duplicate l2tp_pernet() calls
l2tp: make l2tp_xmit_core() return void
net/l2tp/l2tp_core.c | 15 +++++----------
net/l2tp/l2tp_core.h | 12 ------------
net/l2tp/l2tp_debugfs.c | 3 ---
net/l2tp/l2tp_ppp.c | 7 -------
4 files changed, 5 insertions(+), 32 deletions(-)
--
2.18.0
* [PATCH net-next 2/7] l2tp: remove .show from struct l2tp_tunnel
From: Guillaume Nault @ 2018-06-25 14:07 UTC
To: netdev; +Cc: James Chapman
In-Reply-To: <cover.1529935024.git.g.nault@alphalink.fr>
This callback has never been implemented.
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
---
net/l2tp/l2tp_core.h | 3 ---
net/l2tp/l2tp_debugfs.c | 3 ---
2 files changed, 6 deletions(-)
diff --git a/net/l2tp/l2tp_core.h b/net/l2tp/l2tp_core.h
index c199020f8a8a..b21c20a4e08f 100644
--- a/net/l2tp/l2tp_core.h
+++ b/net/l2tp/l2tp_core.h
@@ -180,9 +180,6 @@ struct l2tp_tunnel {
struct net *l2tp_net; /* the net we belong to */
refcount_t ref_count;
-#ifdef CONFIG_DEBUG_FS
- void (*show)(struct seq_file *m, void *arg);
-#endif
int (*recv_payload_hook)(struct sk_buff *skb);
void (*old_sk_destruct)(struct sock *);
struct sock *sock; /* Parent socket */
diff --git a/net/l2tp/l2tp_debugfs.c b/net/l2tp/l2tp_debugfs.c
index e87686f7d63c..b5d7dde003ef 100644
--- a/net/l2tp/l2tp_debugfs.c
+++ b/net/l2tp/l2tp_debugfs.c
@@ -177,9 +177,6 @@ static void l2tp_dfs_seq_tunnel_show(struct seq_file *m, void *v)
atomic_long_read(&tunnel->stats.rx_packets),
atomic_long_read(&tunnel->stats.rx_bytes),
atomic_long_read(&tunnel->stats.rx_errors));
-
- if (tunnel->show != NULL)
- tunnel->show(m, tunnel);
}
static void l2tp_dfs_seq_session_show(struct seq_file *m, void *v)
--
2.18.0
* [PATCH net-next 4/7] l2tp: don't export l2tp_session_queue_purge()
From: Guillaume Nault @ 2018-06-25 14:07 UTC
To: netdev; +Cc: James Chapman
In-Reply-To: <cover.1529935024.git.g.nault@alphalink.fr>
This function is only used in l2tp_core.c.
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
---
net/l2tp/l2tp_core.c | 3 +--
net/l2tp/l2tp_core.h | 1 -
2 files changed, 1 insertion(+), 3 deletions(-)
diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
index 40261cb68e83..3adef4c35a3a 100644
--- a/net/l2tp/l2tp_core.c
+++ b/net/l2tp/l2tp_core.c
@@ -783,7 +783,7 @@ EXPORT_SYMBOL(l2tp_recv_common);
/* Drop skbs from the session's reorder_q
*/
-int l2tp_session_queue_purge(struct l2tp_session *session)
+static int l2tp_session_queue_purge(struct l2tp_session *session)
{
struct sk_buff *skb = NULL;
BUG_ON(!session);
@@ -794,7 +794,6 @@ int l2tp_session_queue_purge(struct l2tp_session *session)
}
return 0;
}
-EXPORT_SYMBOL_GPL(l2tp_session_queue_purge);
/* Internal UDP receive frame. Do the real work of receiving an L2TP data frame
* here. The skb is not on a list when we get here.
diff --git a/net/l2tp/l2tp_core.h b/net/l2tp/l2tp_core.h
index 15e1171ecf7b..0a6e582f84d3 100644
--- a/net/l2tp/l2tp_core.h
+++ b/net/l2tp/l2tp_core.h
@@ -234,7 +234,6 @@ void l2tp_session_free(struct l2tp_session *session);
void l2tp_recv_common(struct l2tp_session *session, struct sk_buff *skb,
unsigned char *ptr, unsigned char *optr, u16 hdrflags,
int length, int (*payload_hook)(struct sk_buff *skb));
-int l2tp_session_queue_purge(struct l2tp_session *session);
int l2tp_udp_encap_recv(struct sock *sk, struct sk_buff *skb);
void l2tp_session_set_header_len(struct l2tp_session *session, int version);
--
2.18.0
* [PATCH net-next 3/7] l2tp: remove l2tp_tunnel_priv()
From: Guillaume Nault @ 2018-06-25 14:07 UTC
To: netdev; +Cc: James Chapman
In-Reply-To: <cover.1529935024.git.g.nault@alphalink.fr>
This function, and the associated .priv field, are unused.
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
---
net/l2tp/l2tp_core.h | 7 -------
1 file changed, 7 deletions(-)
diff --git a/net/l2tp/l2tp_core.h b/net/l2tp/l2tp_core.h
index b21c20a4e08f..15e1171ecf7b 100644
--- a/net/l2tp/l2tp_core.h
+++ b/net/l2tp/l2tp_core.h
@@ -187,8 +187,6 @@ struct l2tp_tunnel {
* was created by userspace */
struct work_struct del_work;
-
- uint8_t priv[0]; /* private data */
};
struct l2tp_nl_cmd_ops {
@@ -198,11 +196,6 @@ struct l2tp_nl_cmd_ops {
int (*session_delete)(struct l2tp_session *session);
};
-static inline void *l2tp_tunnel_priv(struct l2tp_tunnel *tunnel)
-{
- return &tunnel->priv[0];
-}
-
static inline void *l2tp_session_priv(struct l2tp_session *session)
{
return &session->priv[0];
--
2.18.0
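For readers unfamiliar with the idiom this patch removes from struct l2tp_tunnel (and which struct l2tp_session keeps via l2tp_session_priv()): a variable-size private area trails the fixed struct and is reached through an accessor. A minimal userspace sketch, using a C99 flexible array member instead of the GNU priv[0] spelling; model_session and its helpers are hypothetical. The sketch only stores unsigned int values in the area, which is suitably aligned for that type here; types with stricter alignment would need more care.

```c
#include <stdlib.h>

/* Hypothetical stand-in: fixed header followed by a private area. */
struct model_session {
	unsigned int session_id;
	unsigned char priv[];	/* C99 flexible array member */
};

/* Same idea as l2tp_session_priv(): return the start of the area. */
static void *model_session_priv(struct model_session *s)
{
	return &s->priv[0];
}

/* One allocation covers both the header and priv_size extra bytes. */
static struct model_session *model_session_create(unsigned int id,
						  size_t priv_size)
{
	struct model_session *s = calloc(1, sizeof(*s) + priv_size);

	if (s)
		s->session_id = id;
	return s;
}
```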
* [PATCH net-next 5/7] l2tp: don't export l2tp_tunnel_closeall()
From: Guillaume Nault @ 2018-06-25 14:07 UTC
To: netdev; +Cc: James Chapman
In-Reply-To: <cover.1529935024.git.g.nault@alphalink.fr>
This function is only used in l2tp_core.c.
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
---
net/l2tp/l2tp_core.c | 3 +--
net/l2tp/l2tp_core.h | 1 -
2 files changed, 1 insertion(+), 3 deletions(-)
diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
index 3adef4c35a3a..96e31f2ae7cd 100644
--- a/net/l2tp/l2tp_core.c
+++ b/net/l2tp/l2tp_core.c
@@ -1192,7 +1192,7 @@ static void l2tp_tunnel_destruct(struct sock *sk)
/* When the tunnel is closed, all the attached sessions need to go too.
*/
-void l2tp_tunnel_closeall(struct l2tp_tunnel *tunnel)
+static void l2tp_tunnel_closeall(struct l2tp_tunnel *tunnel)
{
int hash;
struct hlist_node *walk;
@@ -1241,7 +1241,6 @@ void l2tp_tunnel_closeall(struct l2tp_tunnel *tunnel)
}
write_unlock_bh(&tunnel->hlist_lock);
}
-EXPORT_SYMBOL_GPL(l2tp_tunnel_closeall);
/* Tunnel socket destroy hook for UDP encapsulation */
static void l2tp_udp_encap_destroy(struct sock *sk)
diff --git a/net/l2tp/l2tp_core.h b/net/l2tp/l2tp_core.h
index 0a6e582f84d3..a5c09d3a5698 100644
--- a/net/l2tp/l2tp_core.h
+++ b/net/l2tp/l2tp_core.h
@@ -219,7 +219,6 @@ int l2tp_tunnel_create(struct net *net, int fd, int version, u32 tunnel_id,
int l2tp_tunnel_register(struct l2tp_tunnel *tunnel, struct net *net,
struct l2tp_tunnel_cfg *cfg);
-void l2tp_tunnel_closeall(struct l2tp_tunnel *tunnel);
void l2tp_tunnel_delete(struct l2tp_tunnel *tunnel);
struct l2tp_session *l2tp_session_create(int priv_size,
struct l2tp_tunnel *tunnel,
--
2.18.0
* [PATCH net-next 6/7] l2tp: avoid duplicate l2tp_pernet() calls
From: Guillaume Nault @ 2018-06-25 14:07 UTC
To: netdev; +Cc: James Chapman
In-Reply-To: <cover.1529935024.git.g.nault@alphalink.fr>
Replace 'l2tp_pernet(tunnel->l2tp_net)' with 'pn', which has been set
on the preceding line.
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
---
net/l2tp/l2tp_core.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
index 96e31f2ae7cd..88c3001531b4 100644
--- a/net/l2tp/l2tp_core.c
+++ b/net/l2tp/l2tp_core.c
@@ -322,8 +322,7 @@ int l2tp_session_register(struct l2tp_session *session,
if (tunnel->version == L2TP_HDR_VER_3) {
pn = l2tp_pernet(tunnel->l2tp_net);
- g_head = l2tp_session_id_hash_2(l2tp_pernet(tunnel->l2tp_net),
- session->session_id);
+ g_head = l2tp_session_id_hash_2(pn, session->session_id);
spin_lock_bh(&pn->l2tp_session_hlist_lock);
--
2.18.0
* [PATCH net-next 7/7] l2tp: make l2tp_xmit_core() return void
From: Guillaume Nault @ 2018-06-25 14:07 UTC
To: netdev; +Cc: James Chapman
In-Reply-To: <cover.1529935024.git.g.nault@alphalink.fr>
It always returns 0, and nobody reads the return value anyway.
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
---
net/l2tp/l2tp_core.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
index 88c3001531b4..1ea285bad84b 100644
--- a/net/l2tp/l2tp_core.c
+++ b/net/l2tp/l2tp_core.c
@@ -1007,8 +1007,8 @@ static int l2tp_build_l2tpv3_header(struct l2tp_session *session, void *buf)
return bufp - optr;
}
-static int l2tp_xmit_core(struct l2tp_session *session, struct sk_buff *skb,
- struct flowi *fl, size_t data_len)
+static void l2tp_xmit_core(struct l2tp_session *session, struct sk_buff *skb,
+ struct flowi *fl, size_t data_len)
{
struct l2tp_tunnel *tunnel = session->tunnel;
unsigned int len = skb->len;
@@ -1050,8 +1050,6 @@ static int l2tp_xmit_core(struct l2tp_session *session, struct sk_buff *skb,
atomic_long_inc(&tunnel->stats.tx_errors);
atomic_long_inc(&session->stats.tx_errors);
}
-
- return 0;
}
/* If caller requires the skb to have a ppp header, the header must be
--
2.18.0
* [PATCH net-next 1/7] l2tp: remove pppol2tp_session_close()
From: Guillaume Nault @ 2018-06-25 14:07 UTC
To: netdev; +Cc: James Chapman
In-Reply-To: <cover.1529935024.git.g.nault@alphalink.fr>
l2tp_core.c verifies that ->session_close() is defined before calling
it. There's no need for a stub.
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
---
net/l2tp/l2tp_ppp.c | 7 -------
1 file changed, 7 deletions(-)
diff --git a/net/l2tp/l2tp_ppp.c b/net/l2tp/l2tp_ppp.c
index 55188382845c..eea5d7844473 100644
--- a/net/l2tp/l2tp_ppp.c
+++ b/net/l2tp/l2tp_ppp.c
@@ -424,12 +424,6 @@ static void pppol2tp_put_sk(struct rcu_head *head)
sock_put(ps->__sk);
}
-/* Called by l2tp_core when a session socket is being closed.
- */
-static void pppol2tp_session_close(struct l2tp_session *session)
-{
-}
-
/* Really kill the session socket. (Called from sock_put() if
* refcnt == 0.)
*/
@@ -573,7 +567,6 @@ static void pppol2tp_session_init(struct l2tp_session *session)
struct dst_entry *dst;
session->recv_skb = pppol2tp_recv;
- session->session_close = pppol2tp_session_close;
#if IS_ENABLED(CONFIG_L2TP_DEBUGFS)
session->show = pppol2tp_show;
#endif
--
2.18.0
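The reasoning in this patch is that an optional callback guarded by a NULL check at the call site needs no empty stub. A minimal sketch of that pattern; model_hook_session and the helpers are hypothetical, and the caller is written from the commit message's description rather than copied from l2tp_core.c.

```c
#include <stddef.h>

/* Hypothetical session with an optional close hook. */
struct model_hook_session {
	int closed_calls;
	void (*session_close)(struct model_hook_session *s);
};

/* Caller-side guard: a NULL hook simply means there is nothing to do,
 * so an empty stub such as pppol2tp_session_close() adds nothing. */
static void model_session_closing(struct model_hook_session *s)
{
	if (s->session_close)
		s->session_close(s);
}

/* Example hook that just records that it ran. */
static void count_close(struct model_hook_session *s)
{
	s->closed_calls++;
}
```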
* [PATCH net-next] rds: clean up loopback rds_connections on netns deletion
From: Sowmini Varadhan @ 2018-06-25 13:41 UTC
To: netdev, sowmini.varadhan
Cc: davem, rds-devel, sowmini.varadhan, santosh.shilimkar
The RDS core module creates rds_connections based on callbacks
from rds_loop_transport when sending/receiving packets to local
addresses.
These connections need to be cleaned up when they were created
from a netns other than init_net and that netns is deleted.
Add changes to rds_loop_transport along the lines of those made to
the TCP transport in commit ebeeb1ad9b8a ("rds: tcp: use
rds_destroy_pending() to synchronize netns/module teardown and rds
connection/workq management").
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
---
net/rds/connection.c | 11 +++++++++-
net/rds/loop.c | 56 ++++++++++++++++++++++++++++++++++++++++++++++++++
net/rds/loop.h | 2 +
3 files changed, 68 insertions(+), 1 deletions(-)
diff --git a/net/rds/connection.c b/net/rds/connection.c
index abef75d..cfb0595 100644
--- a/net/rds/connection.c
+++ b/net/rds/connection.c
@@ -659,11 +659,19 @@ static void rds_conn_info(struct socket *sock, unsigned int len,
int rds_conn_init(void)
{
+ int ret;
+
+ ret = rds_loop_net_init(); /* register pernet callback */
+ if (ret)
+ return ret;
+
rds_conn_slab = kmem_cache_create("rds_connection",
sizeof(struct rds_connection),
0, 0, NULL);
- if (!rds_conn_slab)
+ if (!rds_conn_slab) {
+ rds_loop_net_exit();
return -ENOMEM;
+ }
rds_info_register_func(RDS_INFO_CONNECTIONS, rds_conn_info);
rds_info_register_func(RDS_INFO_SEND_MESSAGES,
@@ -676,6 +684,7 @@ int rds_conn_init(void)
void rds_conn_exit(void)
{
+ rds_loop_net_exit(); /* unregister pernet callback */
rds_loop_exit();
WARN_ON(!hlist_empty(rds_conn_hash));
diff --git a/net/rds/loop.c b/net/rds/loop.c
index dac6218..feea1f9 100644
--- a/net/rds/loop.c
+++ b/net/rds/loop.c
@@ -33,6 +33,8 @@
#include <linux/kernel.h>
#include <linux/slab.h>
#include <linux/in.h>
+#include <net/net_namespace.h>
+#include <net/netns/generic.h>
#include "rds_single_path.h"
#include "rds.h"
@@ -40,6 +42,17 @@
static DEFINE_SPINLOCK(loop_conns_lock);
static LIST_HEAD(loop_conns);
+static atomic_t rds_loop_unloading = ATOMIC_INIT(0);
+
+static void rds_loop_set_unloading(void)
+{
+ atomic_set(&rds_loop_unloading, 1);
+}
+
+static bool rds_loop_is_unloading(struct rds_connection *conn)
+{
+ return atomic_read(&rds_loop_unloading) != 0;
+}
/*
* This 'loopback' transport is a special case for flows that originate
@@ -165,6 +178,8 @@ void rds_loop_exit(void)
struct rds_loop_connection *lc, *_lc;
LIST_HEAD(tmp_list);
+ rds_loop_set_unloading();
+ synchronize_rcu();
/* avoid calling conn_destroy with irqs off */
spin_lock_irq(&loop_conns_lock);
list_splice(&loop_conns, &tmp_list);
@@ -177,6 +192,46 @@ void rds_loop_exit(void)
}
}
+static void rds_loop_kill_conns(struct net *net)
+{
+ struct rds_loop_connection *lc, *_lc;
+ LIST_HEAD(tmp_list);
+
+ spin_lock_irq(&loop_conns_lock);
+ list_for_each_entry_safe(lc, _lc, &loop_conns, loop_node) {
+ struct net *c_net = read_pnet(&lc->conn->c_net);
+
+ if (net != c_net)
+ continue;
+ list_move_tail(&lc->loop_node, &tmp_list);
+ }
+ spin_unlock_irq(&loop_conns_lock);
+
+ list_for_each_entry_safe(lc, _lc, &tmp_list, loop_node) {
+ WARN_ON(lc->conn->c_passive);
+ rds_conn_destroy(lc->conn);
+ }
+}
+
+static void __net_exit rds_loop_exit_net(struct net *net)
+{
+ rds_loop_kill_conns(net);
+}
+
+static struct pernet_operations rds_loop_net_ops = {
+ .exit = rds_loop_exit_net,
+};
+
+int rds_loop_net_init(void)
+{
+ return register_pernet_device(&rds_loop_net_ops);
+}
+
+void rds_loop_net_exit(void)
+{
+ unregister_pernet_device(&rds_loop_net_ops);
+}
+
/*
* This is missing .xmit_* because loop doesn't go through generic
* rds_send_xmit() and doesn't call rds_recv_incoming(). .listen_stop and
@@ -194,4 +249,5 @@ struct rds_transport rds_loop_transport = {
.inc_free = rds_loop_inc_free,
.t_name = "loopback",
.t_type = RDS_TRANS_LOOP,
+ .t_unloading = rds_loop_is_unloading,
};
diff --git a/net/rds/loop.h b/net/rds/loop.h
index 469fa4b..bbc8cdd 100644
--- a/net/rds/loop.h
+++ b/net/rds/loop.h
@@ -5,6 +5,8 @@
/* loop.c */
extern struct rds_transport rds_loop_transport;
+int rds_loop_net_init(void);
+void rds_loop_net_exit(void);
void rds_loop_exit(void);
#endif
--
1.7.1
* [PATCH 2/4] net: lan78xx: Add support for VLAN filtering.
From: Dave Stevenson @ 2018-06-25 14:07 UTC (permalink / raw)
To: woojung.huh, UNGLinuxDriver, davem, netdev; +Cc: Dave Stevenson
In-Reply-To: <cover.1529935234.git.dave.stevenson@raspberrypi.org>
HW_VLAN_CTAG_FILTER was partially implemented, but not advertised
to Linux.
Complete the implementation of this.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
---
drivers/net/usb/lan78xx.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c
index 2f793d4..afe7fa3 100644
--- a/drivers/net/usb/lan78xx.c
+++ b/drivers/net/usb/lan78xx.c
@@ -2363,7 +2363,7 @@ static int lan78xx_set_features(struct net_device *netdev,
pdata->rfe_ctl &= ~(RFE_CTL_ICMP_COE_ | RFE_CTL_IGMP_COE_);
}
- if (features & NETIF_F_HW_VLAN_CTAG_RX)
+ if (features & NETIF_F_HW_VLAN_CTAG_FILTER)
pdata->rfe_ctl |= RFE_CTL_VLAN_FILTER_;
else
pdata->rfe_ctl &= ~RFE_CTL_VLAN_FILTER_;
@@ -2976,6 +2976,9 @@ static int lan78xx_bind(struct lan78xx_net *dev, struct usb_interface *intf)
if (DEFAULT_TSO_CSUM_ENABLE)
dev->net->features |= NETIF_F_TSO | NETIF_F_TSO6 | NETIF_F_SG;
+ if (DEFAULT_VLAN_FILTER_ENABLE)
+ dev->net->features |= NETIF_F_HW_VLAN_CTAG_FILTER;
+
dev->net->hw_features = dev->net->features;
ret = lan78xx_setup_irq_domain(dev);
--
2.7.4
* [bpf-next PATCH 0/2] xdp/bpf: extend XDP samples/bpf xdp_rxq_info
From: Jesper Dangaard Brouer @ 2018-06-25 14:27 UTC (permalink / raw)
To: netdev, Jesper Dangaard Brouer
Cc: Daniel Borkmann, Toke Høiland-Jørgensen,
Alexei Starovoitov
While writing an article about XDP, the samples/bpf xdp_rxq_info
program was extended to cover some more use-cases.
---
Jesper Dangaard Brouer (2):
samples/bpf: extend xdp_rxq_info to read packet payload
samples/bpf: xdp_rxq_info action XDP_TX must adjust MAC-addrs
samples/bpf/xdp_rxq_info_kern.c | 43 +++++++++++++++++++++++++++++++++++++
samples/bpf/xdp_rxq_info_user.c | 45 ++++++++++++++++++++++++++++++++++-----
2 files changed, 82 insertions(+), 6 deletions(-)
* [bpf-next PATCH 1/2] samples/bpf: extend xdp_rxq_info to read packet payload
From: Jesper Dangaard Brouer @ 2018-06-25 14:27 UTC (permalink / raw)
To: netdev, Jesper Dangaard Brouer
Cc: Daniel Borkmann, Toke Høiland-Jørgensen,
Alexei Starovoitov
In-Reply-To: <152993682254.8835.8864318933370018087.stgit@firesoul>
There is a cost associated with reading the packet data payload
that this test ignored. Add option --readmem to allow enabling
reading part of the payload.
This sample/tool helps us analyse an issue observed with an mlx5
NIC (ConnectX-5 Ex) and an Intel(R) Xeon(R) CPU E5-1650 v4.
With no_touch of data:
Running XDP on dev:mlx5p1 (ifindex:8) action:XDP_DROP options:no_touch
XDP stats CPU pps issue-pps
XDP-RX CPU 0 14,465,157 0
XDP-RX CPU 1 14,464,728 0
XDP-RX CPU 2 14,465,283 0
XDP-RX CPU 3 14,465,282 0
XDP-RX CPU 4 14,464,159 0
XDP-RX CPU 5 14,465,379 0
XDP-RX CPU total 86,789,992
When not touching data, we observe that the CPUs have idle cycles.
When reading data, the CPUs are 100% busy in softirq.
With reading data:
Running XDP on dev:mlx5p1 (ifindex:8) action:XDP_DROP options:read
XDP stats CPU pps issue-pps
XDP-RX CPU 0 9,620,639 0
XDP-RX CPU 1 9,489,843 0
XDP-RX CPU 2 9,407,854 0
XDP-RX CPU 3 9,422,289 0
XDP-RX CPU 4 9,321,959 0
XDP-RX CPU 5 9,395,242 0
XDP-RX CPU total 56,657,828
The effect seen above is a result of cache-misses occurring when
more RXQs are being used. Based on perf-event observations, our
conclusion is that the CPU's DDIO (Direct Data I/O) chooses to
deliver packets into main memory instead of L3 cache. We also
found that this can be mitigated either by using fewer RXQs or by
reducing the NIC's RX-ring size.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
---
samples/bpf/xdp_rxq_info_kern.c | 19 +++++++++++++++++++
samples/bpf/xdp_rxq_info_user.c | 34 ++++++++++++++++++++++++++++------
2 files changed, 47 insertions(+), 6 deletions(-)
diff --git a/samples/bpf/xdp_rxq_info_kern.c b/samples/bpf/xdp_rxq_info_kern.c
index 3fd209291653..61af6210df2f 100644
--- a/samples/bpf/xdp_rxq_info_kern.c
+++ b/samples/bpf/xdp_rxq_info_kern.c
@@ -4,6 +4,8 @@
* Example howto extract XDP RX-queue info
*/
#include <uapi/linux/bpf.h>
+#include <uapi/linux/if_ether.h>
+#include <uapi/linux/in.h>
#include "bpf_helpers.h"
/* Config setup from with userspace
@@ -14,6 +16,11 @@
struct config {
__u32 action;
int ifindex;
+ __u32 options;
+};
+enum cfg_options_flags {
+ NO_TOUCH = 0x0U,
+ READ_MEM = 0x1U,
};
struct bpf_map_def SEC("maps") config_map = {
.type = BPF_MAP_TYPE_ARRAY,
@@ -90,6 +97,18 @@ int xdp_prognum0(struct xdp_md *ctx)
if (key == MAX_RXQs)
rxq_rec->issue++;
+ /* Default: Don't touch packet data, only count packets */
+ if (unlikely(config->options & READ_MEM)) {
+ struct ethhdr *eth = data;
+
+ if (eth + 1 > data_end)
+ return XDP_ABORTED;
+
+ /* Avoid compiler removing this: Drop non 802.3 Ethertypes */
+ if (ntohs(eth->h_proto) < ETH_P_802_3_MIN)
+ return XDP_ABORTED;
+ }
+
return config->action;
}
diff --git a/samples/bpf/xdp_rxq_info_user.c b/samples/bpf/xdp_rxq_info_user.c
index e4e9ba52bff0..435485d4f49e 100644
--- a/samples/bpf/xdp_rxq_info_user.c
+++ b/samples/bpf/xdp_rxq_info_user.c
@@ -50,6 +50,7 @@ static const struct option long_options[] = {
{"sec", required_argument, NULL, 's' },
{"no-separators", no_argument, NULL, 'z' },
{"action", required_argument, NULL, 'a' },
+ {"readmem", no_argument, NULL, 'r' },
{0, 0, NULL, 0 }
};
@@ -66,6 +67,11 @@ static void int_exit(int sig)
struct config {
__u32 action;
int ifindex;
+ __u32 options;
+};
+enum cfg_options_flags {
+ NO_TOUCH = 0x0U,
+ READ_MEM = 0x1U,
};
#define XDP_ACTION_MAX (XDP_TX + 1)
#define XDP_ACTION_MAX_STRLEN 11
@@ -109,6 +115,16 @@ static void list_xdp_actions(void)
printf("\n");
}
+static char* options2str(enum cfg_options_flags flag)
+{
+ if (flag == NO_TOUCH)
+ return "no_touch";
+ if (flag & READ_MEM)
+ return "read";
+ fprintf(stderr, "ERR: Unknown config option flags");
+ exit(EXIT_FAIL);
+}
+
static void usage(char *argv[])
{
int i;
@@ -305,7 +321,7 @@ static __u64 calc_errs_pps(struct datarec *r,
static void stats_print(struct stats_record *stats_rec,
struct stats_record *stats_prev,
- int action)
+ int action, __u32 cfg_opt)
{
unsigned int nr_rxqs = bpf_map__def(rx_queue_index_map)->max_entries;
unsigned int nr_cpus = bpf_num_possible_cpus();
@@ -316,8 +332,8 @@ static void stats_print(struct stats_record *stats_rec,
int i;
/* Header */
- printf("\nRunning XDP on dev:%s (ifindex:%d) action:%s\n",
- ifname, ifindex, action2str(action));
+ printf("\nRunning XDP on dev:%s (ifindex:%d) action:%s options:%s\n",
+ ifname, ifindex, action2str(action), options2str(cfg_opt));
/* stats_global_map */
{
@@ -399,7 +415,7 @@ static inline void swap(struct stats_record **a, struct stats_record **b)
*b = tmp;
}
-static void stats_poll(int interval, int action)
+static void stats_poll(int interval, int action, __u32 cfg_opt)
{
struct stats_record *record, *prev;
@@ -410,7 +426,7 @@ static void stats_poll(int interval, int action)
while (1) {
swap(&prev, &record);
stats_collect(record);
- stats_print(record, prev, action);
+ stats_print(record, prev, action, cfg_opt);
sleep(interval);
}
@@ -421,6 +437,7 @@ static void stats_poll(int interval, int action)
int main(int argc, char **argv)
{
+ __u32 cfg_options= NO_TOUCH ; /* Default: Don't touch packet memory */
struct rlimit r = {10 * 1024 * 1024, RLIM_INFINITY};
struct bpf_prog_load_attr prog_load_attr = {
.prog_type = BPF_PROG_TYPE_XDP,
@@ -435,6 +452,7 @@ int main(int argc, char **argv)
int interval = 2;
__u32 key = 0;
+
char action_str_buf[XDP_ACTION_MAX_STRLEN + 1 /* for \0 */] = { 0 };
int action = XDP_PASS; /* Default action */
char *action_str = NULL;
@@ -496,6 +514,9 @@ int main(int argc, char **argv)
action_str = (char *)&action_str_buf;
strncpy(action_str, optarg, XDP_ACTION_MAX_STRLEN);
break;
+ case 'r':
+ cfg_options |= READ_MEM;
+ break;
case 'h':
error:
default:
@@ -522,6 +543,7 @@ int main(int argc, char **argv)
}
}
cfg.action = action;
+ cfg.options = cfg_options;
/* Trick to pretty printf with thousands separators use %' */
if (use_separators)
@@ -542,6 +564,6 @@ int main(int argc, char **argv)
return EXIT_FAIL_XDP;
}
- stats_poll(interval, action);
+ stats_poll(interval, action, cfg_options);
return EXIT_OK;
}
* [bpf-next PATCH 2/2] samples/bpf: xdp_rxq_info action XDP_TX must adjust MAC-addrs
From: Jesper Dangaard Brouer @ 2018-06-25 14:27 UTC (permalink / raw)
To: netdev, Jesper Dangaard Brouer
Cc: Daniel Borkmann, Toke Høiland-Jørgensen,
Alexei Starovoitov
In-Reply-To: <152993682254.8835.8864318933370018087.stgit@firesoul>
XDP_TX also requires swapping the MAC addresses, otherwise some
hardware may drop the TX packet before it reaches the wire. This
was observed with the mlx5 driver.
If xdp_rxq_info is run with --action XDP_TX, the swapmac
functionality is activated automatically. It can also be enabled
manually via the command-line option --swapmac. This is practical
for measuring the overhead of writing/updating the payload for
other action types.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
---
samples/bpf/xdp_rxq_info_kern.c | 26 +++++++++++++++++++++++++-
samples/bpf/xdp_rxq_info_user.c | 11 +++++++++++
2 files changed, 36 insertions(+), 1 deletion(-)
diff --git a/samples/bpf/xdp_rxq_info_kern.c b/samples/bpf/xdp_rxq_info_kern.c
index 61af6210df2f..222a83eed1cb 100644
--- a/samples/bpf/xdp_rxq_info_kern.c
+++ b/samples/bpf/xdp_rxq_info_kern.c
@@ -21,6 +21,7 @@ struct config {
enum cfg_options_flags {
NO_TOUCH = 0x0U,
READ_MEM = 0x1U,
+ SWAP_MAC = 0x2U,
};
struct bpf_map_def SEC("maps") config_map = {
.type = BPF_MAP_TYPE_ARRAY,
@@ -52,6 +53,23 @@ struct bpf_map_def SEC("maps") rx_queue_index_map = {
.max_entries = MAX_RXQs + 1,
};
+static __always_inline
+void swap_src_dst_mac(void *data)
+{
+ unsigned short *p = data;
+ unsigned short dst[3];
+
+ dst[0] = p[0];
+ dst[1] = p[1];
+ dst[2] = p[2];
+ p[0] = p[3];
+ p[1] = p[4];
+ p[2] = p[5];
+ p[3] = dst[0];
+ p[4] = dst[1];
+ p[5] = dst[2];
+}
+
SEC("xdp_prog0")
int xdp_prognum0(struct xdp_md *ctx)
{
@@ -98,7 +116,7 @@ int xdp_prognum0(struct xdp_md *ctx)
rxq_rec->issue++;
/* Default: Don't touch packet data, only count packets */
- if (unlikely(config->options & READ_MEM)) {
+ if (unlikely(config->options & (READ_MEM|SWAP_MAC))) {
struct ethhdr *eth = data;
if (eth + 1 > data_end)
@@ -107,6 +125,12 @@ int xdp_prognum0(struct xdp_md *ctx)
/* Avoid compiler removing this: Drop non 802.3 Ethertypes */
if (ntohs(eth->h_proto) < ETH_P_802_3_MIN)
return XDP_ABORTED;
+
+ /* XDP_TX requires changing MAC-addrs, else HW may drop.
+ * Can also be enabled with --swapmac (for test purposes)
+ */
+ if (unlikely(config->options & SWAP_MAC))
+ swap_src_dst_mac(data);
}
return config->action;
diff --git a/samples/bpf/xdp_rxq_info_user.c b/samples/bpf/xdp_rxq_info_user.c
index 435485d4f49e..248a7eab9531 100644
--- a/samples/bpf/xdp_rxq_info_user.c
+++ b/samples/bpf/xdp_rxq_info_user.c
@@ -51,6 +51,7 @@ static const struct option long_options[] = {
{"no-separators", no_argument, NULL, 'z' },
{"action", required_argument, NULL, 'a' },
{"readmem", no_argument, NULL, 'r' },
+ {"swapmac", no_argument, NULL, 'm' },
{0, 0, NULL, 0 }
};
@@ -72,6 +73,7 @@ struct config {
enum cfg_options_flags {
NO_TOUCH = 0x0U,
READ_MEM = 0x1U,
+ SWAP_MAC = 0x2U,
};
#define XDP_ACTION_MAX (XDP_TX + 1)
#define XDP_ACTION_MAX_STRLEN 11
@@ -119,6 +121,8 @@ static char* options2str(enum cfg_options_flags flag)
{
if (flag == NO_TOUCH)
return "no_touch";
+ if (flag & SWAP_MAC)
+ return "swapmac";
if (flag & READ_MEM)
return "read";
fprintf(stderr, "ERR: Unknown config option flags");
@@ -517,6 +521,9 @@ int main(int argc, char **argv)
case 'r':
cfg_options |= READ_MEM;
break;
+ case 'm':
+ cfg_options |= SWAP_MAC;
+ break;
case 'h':
error:
default:
@@ -543,6 +550,10 @@ int main(int argc, char **argv)
}
}
cfg.action = action;
+
+ /* XDP_TX requires changing MAC-addrs, else HW may drop */
+ if (action == XDP_TX)
+ cfg_options |= SWAP_MAC;
cfg.options = cfg_options;
/* Trick to pretty printf with thousands separators use %' */
* Re: [PATCH RFC ipsec-next] xfrm: Extend the output_mark to support input direction and masking.
From: Steffen Klassert @ 2018-06-25 14:31 UTC (permalink / raw)
To: netdev; +Cc: Tobias Brunner, Eyal Birger, Lorenzo Colitti
In-Reply-To: <20180615065514.bmy6tamr4fqivpyp@gauss3.secunet.de>
On Fri, Jun 15, 2018 at 08:55:14AM +0200, Steffen Klassert wrote:
> We already support setting an output mark at the xfrm_state,
> unfortunately this does not support the input direction and
> masking the marks that will be applied to the skb. This change
> adds support for applying a masked value in both directions.
>
> The existing XFRMA_OUTPUT_MARK number is reused for this purpose
> and as it is now bi-directional, it is renamed to XFRMA_SET_MARK.
>
> An additional XFRMA_SET_MARK_MASK attribute is added for setting the
> mask. If the mask attribute is not provided, it is set to 0xffffffff,
> keeping the existing 'full mask' semantics of XFRMA_OUTPUT_MARK.
>
> Co-developed-by: Tobias Brunner <tobias@strongswan.org>
> Co-developed-by: Eyal Birger <eyal.birger@gmail.com>
> Co-developed-by: Lorenzo Colitti <lorenzo@google.com>
> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
> Signed-off-by: Tobias Brunner <tobias@strongswan.org>
> Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
> Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
This is now applied to ipsec-next.
* Re: [PATCH RFC v2 ipsec-next 0/3] Virtual xfrm interfaces
From: Steffen Klassert @ 2018-06-25 14:34 UTC (permalink / raw)
To: netdev, David Miller
Cc: Eyal Birger, Antony Antony, Benedict Wong, Lorenzo Colitti,
Shannon Nelson
In-Reply-To: <20180612075610.2000-1-steffen.klassert@secunet.com>
On Tue, Jun 12, 2018 at 09:56:07AM +0200, Steffen Klassert wrote:
> This patchset introduces new virtual xfrm interfaces.
> The design of virtual xfrm interfaces was
> discussed at the Linux IPsec workshop 2018. This patchset
> implements these interfaces as the IPsec userspace and
> kernel developers agreed. The purpose of these interfaces
> is to overcome the design limitations that the existing
> VTI devices have.
>
> The main limitations that we see with the current VTI are the
> following:
>
> - VTI interfaces are L3 tunnels with configurable endpoints.
> For xfrm, the tunnel endpoints are already determined by the SA.
> So the VTI tunnel endpoints must be either the same as on the
> SA or wildcards. In case VTI tunnel endpoints are same as on
> the SA, we get a one to one correlation between the SA and
> the tunnel. So each SA needs its own tunnel interface.
>
> On the other hand, we can have only one VTI tunnel with
> wildcard src/dst tunnel endpoints in the system because the
> lookup is based on the tunnel endpoints. The existing tunnel
> lookup won't work with multiple tunnels with wildcard
> tunnel endpoints. Some use cases require more than one
> VTI tunnel of this type, for example if somebody has multiple
> namespaces and every namespace requires such a VTI.
>
> - VTI needs separate interfaces for IPv4 and IPv6 tunnels.
> So when routing to a VTI, we have to know to which address
> family this traffic class is going to be encapsulated.
> This is a limitation because it makes routing more complex
> and it is not always possible to know what happens behind the
> VTI, e.g. when the VTI is moved to some namespace.
>
> - VTI works just with tunnel mode SAs. We need generic interfaces
> that ensure transformation, regardless of the xfrm mode and
> the encapsulated address family.
>
> - VTI is configured with a combination of GRE keys and xfrm marks.
> With this we have to deal with some extra cases in the generic
> tunnel lookup because the GRE keys on the VTI are actually
> not GRE keys, the GRE keys were just reused for something else.
> All extensions to the VTI interfaces would require adding
> even more complexity to the generic tunnel lookup.
>
> To overcome this, we started with the following design goals:
>
> - It should be possible to tunnel IPv4 and IPv6 through the same
> interface.
>
> - No limitation on xfrm mode (tunnel, transport and beet).
>
> - Should be a generic virtual interface that ensures IPsec
> transformation, no need to know what happens behind the
> interface.
>
> - Interfaces should be configured with a new key that must match a
> new policy/SA lookup key.
>
> - The lookup logic should stay in the xfrm codebase, no need to
> change or extend generic routing and tunnel lookups.
>
> - Should be possible to use IPsec hardware offloads of the underlying
> interface.
>
> Changes from v1:
>
> - Document the limitations of VTI interfaces and the design of
> the new xfrm interfaces more explicitly in the commit messages.
>
> - No code changes.
I have not got any further comments, so applied to ipsec-next.
* [PATCH 4/4] net: lan78xx: Use s/w csum check on VLANs without tag stripping
From: Dave Stevenson @ 2018-06-25 14:07 UTC (permalink / raw)
To: woojung.huh, UNGLinuxDriver, davem, netdev; +Cc: Dave Stevenson
In-Reply-To: <cover.1529935234.git.dave.stevenson@raspberrypi.org>
VLAN packets were observed being dropped due to invalid checksums
when VLAN tag receive offload (tag stripping) is disabled.
With VLAN tag stripping enabled, no issue is observed.
Fall back to s/w checksums if VLAN offload is disabled.
Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
---
drivers/net/usb/lan78xx.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c
index f72a8f5..6f2ea84 100644
--- a/drivers/net/usb/lan78xx.c
+++ b/drivers/net/usb/lan78xx.c
@@ -3052,8 +3052,13 @@ static void lan78xx_rx_csum_offload(struct lan78xx_net *dev,
struct sk_buff *skb,
u32 rx_cmd_a, u32 rx_cmd_b)
{
+ /* HW Checksum offload appears to be flawed if used when not stripping
+ * VLAN headers. Drop back to S/W checksums under these conditions.
+ */
if (!(dev->net->features & NETIF_F_RXCSUM) ||
- unlikely(rx_cmd_a & RX_CMD_A_ICSM_)) {
+ unlikely(rx_cmd_a & RX_CMD_A_ICSM_) ||
+ ((rx_cmd_a & RX_CMD_A_FVTG_) &&
+ !(dev->net->features & NETIF_F_HW_VLAN_CTAG_RX))) {
skb->ip_summed = CHECKSUM_NONE;
} else {
skb->csum = ntohs((u16)(rx_cmd_b >> RX_CMD_B_CSUM_SHIFT_));
--
2.7.4
* Re: [PATCH ipsec-next] xfrm: policy: remove pcpu policy cache
From: Steffen Klassert @ 2018-06-25 14:42 UTC (permalink / raw)
To: Florian Westphal; +Cc: netdev
In-Reply-To: <20180625115753.13161-1-fw@strlen.de>
On Mon, Jun 25, 2018 at 01:57:53PM +0200, Florian Westphal wrote:
> Kristian Evensen says:
> In a project I am involved in, we are running ipsec (Strongswan) on
> different mt7621-based routers. Each router is configured as an
> initiator and has around 30 tunnels to different responders (running
> on misc. devices). Before the flow cache was removed (kernel 4.9), we
> got a combined throughput of around 70Mbit/s for all tunnels on one
> router. However, we recently switched to kernel 4.14 (4.14.48), and
> the total throughput is somewhere around 57Mbit/s (best-case). I.e., a
> drop of around 20%. Reverting the flow cache removal restores, as
> expected, performance levels to that of kernel 4.9.
>
> When pcpu xdst exists, it has to be validated first before it can be
> used.
>
> A negative hit thus increases cost vs. no-cache.
>
> As the number of tunnels increases, the hit rate decreases, so this pcpu
> caching isn't a viable strategy.
>
> Furthermore, the xdst cache also needs to run with BH off, so when
> removing this the bh disable/enable pairs can be removed too.
>
> Kristian tested a 4.14.y backport of this change and reported
> increased performance:
>
> In our tests, the throughput loss shrank from around -20%
> to -5%. We also see that the overall throughput is independent of the
> number of tunnels, while before the throughput was reduced as the number
> of tunnels increased.
>
> Reported-by: Kristian Evensen <kristian.evensen@gmail.com>
> Signed-off-by: Florian Westphal <fw@strlen.de>
Can you please rebase this to ipsec-next current?
It does not apply cleanly after the merge of the
xfrm interface patches.
Thanks!
* [PATCH net-next v2] selftests: net: Test headroom handling of ip6_gre devices
From: Petr Machata @ 2018-06-25 14:43 UTC (permalink / raw)
To: netdev, linux-kselftest; +Cc: davem, shuah, u9012063
Commit 5691484df961 ("net: ip6_gre: Fix headroom request in
ip6erspan_tunnel_xmit()") and commit 01b8d064d58b ("net: ip6_gre:
Request headroom in __gre6_xmit()") fix problems in reserving headroom
in the packets tunneled through ip6gre/tap and ip6erspan netdevices.
These two patches included snippets that reproduced the issues. This
patch elevates the snippets to a full-fledged test case.
Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: Petr Machata <petrm@mellanox.com>
---
Notes:
Changes between v1 and v2:
- Move tunnel construction to setup() and destruction to cleanup().
tools/testing/selftests/net/ip6_gre_headroom.sh | 65 +++++++++++++++++++++++++
1 file changed, 65 insertions(+)
create mode 100755 tools/testing/selftests/net/ip6_gre_headroom.sh
diff --git a/tools/testing/selftests/net/ip6_gre_headroom.sh b/tools/testing/selftests/net/ip6_gre_headroom.sh
new file mode 100755
index 000000000000..5b41e8bb6e2d
--- /dev/null
+++ b/tools/testing/selftests/net/ip6_gre_headroom.sh
@@ -0,0 +1,65 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Test that enough headroom is reserved for the first packet passing through an
+# IPv6 GRE-like netdevice.
+
+setup_prepare()
+{
+ ip link add h1 type veth peer name swp1
+ ip link add h3 type veth peer name swp3
+
+ ip link set dev h1 up
+ ip address add 192.0.2.1/28 dev h1
+
+ ip link add dev vh3 type vrf table 20
+ ip link set dev h3 master vh3
+ ip link set dev vh3 up
+ ip link set dev h3 up
+
+ ip link set dev swp3 up
+ ip address add dev swp3 2001:db8:2::1/64
+ ip address add dev swp3 2001:db8:2::3/64
+
+ ip link set dev swp1 up
+ tc qdisc add dev swp1 clsact
+
+ ip link add name er6 type ip6erspan \
+ local 2001:db8:2::1 remote 2001:db8:2::2 oseq okey 123
+ ip link set dev er6 up
+
+ ip link add name gt6 type ip6gretap \
+ local 2001:db8:2::3 remote 2001:db8:2::4
+ ip link set dev gt6 up
+
+ sleep 1
+}
+
+cleanup()
+{
+ ip link del dev gt6
+ ip link del dev er6
+ ip link del dev swp1
+ ip link del dev swp3
+ ip link del dev vh3
+}
+
+test_headroom()
+{
+ local type=$1; shift
+ local tundev=$1; shift
+
+ tc filter add dev swp1 ingress pref 1000 matchall skip_hw \
+ action mirred egress mirror dev $tundev
+ ping -I h1 192.0.2.2 -c 1 -w 2 &> /dev/null
+ tc filter del dev swp1 ingress pref 1000
+
+ # If it doesn't panic, it passes.
+ printf "TEST: %-60s [PASS]\n" "$type headroom"
+}
+
+trap cleanup EXIT
+
+setup_prepare
+test_headroom ip6gretap gt6
+test_headroom ip6erspan er6
--
2.4.11
* Re: [PATCH ipsec] xfrm: free skb if nlsk pointer is NULL
From: Steffen Klassert @ 2018-06-25 14:45 UTC (permalink / raw)
To: Florian Westphal; +Cc: netdev
In-Reply-To: <20180625120007.13345-1-fw@strlen.de>
On Mon, Jun 25, 2018 at 02:00:07PM +0200, Florian Westphal wrote:
> nlmsg_multicast() always frees the skb, so in case we cannot call
> it we must do that ourselves.
>
> Fixes: 21ee543edc0dea ("xfrm: fix race between netns cleanup and state expire notification")
> Signed-off-by: Florian Westphal <fw@strlen.de>
Applied, thanks Florian!
* Re: [PATCH net-next] rds: clean up loopback rds_connections on netns deletion
From: Sowmini Varadhan @ 2018-06-25 14:49 UTC (permalink / raw)
To: davem, rds-devel, santosh.shilimkar, netdev; +Cc: syzkaller-bugs
In-Reply-To: <1529934085-181126-1-git-send-email-sowmini.varadhan@oracle.com>
On (06/25/18 06:41), Sowmini Varadhan wrote:
:
> Add the changes aligned with the changes from
> commit ebeeb1ad9b8a ("rds: tcp: use rds_destroy_pending() to synchronize
> netns/module teardown and rds connection/workq management") for
> rds_loop_transport
FWIW, I am optimistic that this will take care of a number
of the use-after-free panics reported by syzbot (I have not
marked the patch with the recommended syzkaller Reported-by
tags because I was not able to reproduce each original issue,
but inspection of the traces suggests this missing patch may
be behind the races that cause the reports).
--Sowmini
* Re: [PATCH] ipv6: avoid copy_from_user() via ipv6_renew_options_kern()
From: Paul Moore @ 2018-06-25 14:49 UTC (permalink / raw)
To: davem; +Cc: viro, Paul Moore, netdev, selinux, linux-security-module
In-Reply-To: <20180624.164837.37612664745856114.davem@davemloft.net>
On Sun, Jun 24, 2018 at 3:48 AM David Miller <davem@davemloft.net> wrote:
>
> From: Al Viro <viro@ZenIV.linux.org.uk>
> Date: Sat, 23 Jun 2018 23:21:07 +0100
>
> > BTW, I wonder if the life would be simpler with do_ipv6_setsockopt() doing
> > the copy-in and verifying ipv6_optlen(*hdr) <= newoptlen; that would've
> > simplified ipv6_renew_option{,s}() quite a bit and completely eliminated
> > ipv6_renew_options_kern()...
>
> I agree that this makes things a lot simpler.
I had looked at moving the userspace copy up, but feared it was a bit
too invasive. It sounds like you are open to the idea so I'll code
something up.
> One thing that drives me crazy though is this inherit stuff:
>
> > + ipv6_renew_option(newtype == IPV6_HOPOPTS ? newopt :
> > + opt ? opt->hopopt : NULL,
>
> Why don't we pass the type into ipv6_renew_option() and have it
> do this pointer dance instead?
>
> That's going to definitely be easier to read.
I agree, that struck me as a little odd. I'll rework that too. I'll
send you guys something this week to take a look at.
Thanks.
> I don't know enough about this code to give feedback about the
> option length handling wrt. copies, sorry.
--
paul moore
www.paul-moore.com