Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCHv2 net-next] sctp: add support for SCTP_REUSE_PORT sockopt
From: Marcelo Ricardo Leitner @ 2018-06-25 13:30 UTC (permalink / raw)
  To: Xin Long; +Cc: network dev, linux-sctp, Neil Horman, Michael Tuexen, davem
In-Reply-To: <2ad8be46a8656655e4c9e05aad3df6faf34fed19.1529892406.git.lucien.xin@gmail.com>

On Mon, Jun 25, 2018 at 10:06:46AM +0800, Xin Long wrote:
> This feature is actually already supported by sk->sk_reuse which can be
> set by socket level opt SO_REUSEADDR. But it's not working exactly as
> RFC6458 demands in section 8.1.27, like:
> 
>   - This option only supports one-to-one style SCTP sockets
>   - This socket option must not be used after calling bind()
>     or sctp_bindx().
> 
> Besides, SCTP_REUSE_PORT sockopt should be provided for user's programs.
> Otherwise, the programs with SCTP_REUSE_PORT from other systems will not
> work in linux.
> 
> To separate it from the socket level version, this patch adds 'reuse' in
> sctp_sock and it works pretty much as sk->sk_reuse, but with some extra
> setup limitations that are needed when it is being enabled.
> 
> "It should be noted that the behavior of the socket-level socket option
> to reuse ports and/or addresses for SCTP sockets is unspecified", so it
> leaves SO_REUSEADDR as is for the compatibility.
> 
> Note that the name SCTP_REUSE_PORT is kind of confusing, it is identical
> to SO_REUSEADDR with some extra restriction, so here it uses 'reuse' in
> sctp_sock instead of 'reuseport'. As for sk->sk_reuseport support for
> SCTP, it will be added in another patch.

To help changelog readers later, please update to something like:

"""\
Note that the name SCTP_REUSE_PORT is somewhat confusing, as its
functionality is nearly identical to SO_REUSEADDR, but with some
extra restrictions. Here it uses 'reuse' in sctp_sock instead of
'reuseport'. As for sk->sk_reuseport support for SCTP, it will be
added in another patch.
"""

Makes sense, can you note the difference?

> 
> Thanks to Neil to make this clear.
> 
> v1->v2:
>   - add sctp_sk->reuse to separate it from the socket level version.
> 
> Acked-by: Neil Horman <nhorman@tuxdriver.com>
> Signed-off-by: Xin Long <lucien.xin@gmail.com>
> ---
>  include/net/sctp/structs.h |  1 +
>  include/uapi/linux/sctp.h  |  1 +
>  net/sctp/socket.c          | 62 ++++++++++++++++++++++++++++++++++++++++------
>  3 files changed, 57 insertions(+), 7 deletions(-)
> 
> diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
> index e0f962d..701a517 100644
> --- a/include/net/sctp/structs.h
> +++ b/include/net/sctp/structs.h
> @@ -220,6 +220,7 @@ struct sctp_sock {
>  	__u32 adaptation_ind;
>  	__u32 pd_point;
>  	__u16	nodelay:1,
> +		reuse:1,
>  		disable_fragments:1,
>  		v4mapped:1,
>  		frag_interleave:1,
> diff --git a/include/uapi/linux/sctp.h b/include/uapi/linux/sctp.h
> index b64d583..c02986a 100644
> --- a/include/uapi/linux/sctp.h
> +++ b/include/uapi/linux/sctp.h
> @@ -100,6 +100,7 @@ typedef __s32 sctp_assoc_t;
>  #define SCTP_RECVNXTINFO	33
>  #define SCTP_DEFAULT_SNDINFO	34
>  #define SCTP_AUTH_DEACTIVATE_KEY	35
> +#define SCTP_REUSE_PORT		36
>  
>  /* Internal Socket Options. Some of the sctp library functions are
>   * implemented using these socket options.
> diff --git a/net/sctp/socket.c b/net/sctp/socket.c
> index 0e91e83..bf11f9c 100644
> --- a/net/sctp/socket.c
> +++ b/net/sctp/socket.c
> @@ -4170,6 +4170,28 @@ static int sctp_setsockopt_interleaving_supported(struct sock *sk,
>  	return retval;
>  }
>  
> +static int sctp_setsockopt_reuse_port(struct sock *sk, char __user *optval,
> +				      unsigned int optlen)
> +{
> +	int val;
> +
> +	if (!sctp_style(sk, TCP))
> +		return -EOPNOTSUPP;
> +
> +	if (sctp_sk(sk)->ep->base.bind_addr.port)
> +		return -EFAULT;
> +
> +	if (optlen < sizeof(int))
> +		return -EINVAL;
> +
> +	if (get_user(val, (int __user *)optval))
> +		return -EFAULT;
> +
> +	sctp_sk(sk)->reuse = !!val;
> +
> +	return 0;
> +}
> +
>  /* API 6.2 setsockopt(), getsockopt()
>   *
>   * Applications use setsockopt() and getsockopt() to set or retrieve
> @@ -4364,6 +4386,9 @@ static int sctp_setsockopt(struct sock *sk, int level, int optname,
>  		retval = sctp_setsockopt_interleaving_supported(sk, optval,
>  								optlen);
>  		break;
> +	case SCTP_REUSE_PORT:
> +		retval = sctp_setsockopt_reuse_port(sk, optval, optlen);
> +		break;
>  	default:
>  		retval = -ENOPROTOOPT;
>  		break;
> @@ -7197,6 +7222,26 @@ static int sctp_getsockopt_interleaving_supported(struct sock *sk, int len,
>  	return retval;
>  }
>  
> +static int sctp_getsockopt_reuse_port(struct sock *sk, int len,
> +				      char __user *optval,
> +				      int __user *optlen)
> +{
> +	int val;
> +
> +	if (len < sizeof(int))
> +		return -EINVAL;
> +
> +	len = sizeof(int);
> +	val = sctp_sk(sk)->reuse;
> +	if (put_user(len, optlen))
> +		return -EFAULT;
> +
> +	if (copy_to_user(optval, &val, len))
> +		return -EFAULT;
> +
> +	return 0;
> +}
> +
>  static int sctp_getsockopt(struct sock *sk, int level, int optname,
>  			   char __user *optval, int __user *optlen)
>  {
> @@ -7392,6 +7437,9 @@ static int sctp_getsockopt(struct sock *sk, int level, int optname,
>  		retval = sctp_getsockopt_interleaving_supported(sk, len, optval,
>  								optlen);
>  		break;
> +	case SCTP_REUSE_PORT:
> +		retval = sctp_getsockopt_reuse_port(sk, len, optval, optlen);
> +		break;
>  	default:
>  		retval = -ENOPROTOOPT;
>  		break;
> @@ -7429,6 +7477,7 @@ static struct sctp_bind_bucket *sctp_bucket_create(
>  
>  static long sctp_get_port_local(struct sock *sk, union sctp_addr *addr)
>  {
> +	bool reuse = (sk->sk_reuse || sctp_sk(sk)->reuse);
>  	struct sctp_bind_hashbucket *head; /* hash list */
>  	struct sctp_bind_bucket *pp;
>  	unsigned short snum;
> @@ -7501,13 +7550,11 @@ static long sctp_get_port_local(struct sock *sk, union sctp_addr *addr)
>  		 * used by other socket (pp->owner not empty); that other
>  		 * socket is going to be sk2.
>  		 */
> -		int reuse = sk->sk_reuse;
>  		struct sock *sk2;
>  
>  		pr_debug("%s: found a possible match\n", __func__);
>  
> -		if (pp->fastreuse && sk->sk_reuse &&
> -			sk->sk_state != SCTP_SS_LISTENING)
> +		if (pp->fastreuse && reuse && sk->sk_state != SCTP_SS_LISTENING)
>  			goto success;
>  
>  		/* Run through the list of sockets bound to the port
> @@ -7525,7 +7572,7 @@ static long sctp_get_port_local(struct sock *sk, union sctp_addr *addr)
>  			ep2 = sctp_sk(sk2)->ep;
>  
>  			if (sk == sk2 ||
> -			    (reuse && sk2->sk_reuse &&
> +			    (reuse && (sk2->sk_reuse || sctp_sk(sk2)->reuse) &&
>  			     sk2->sk_state != SCTP_SS_LISTENING))
>  				continue;
>  
> @@ -7549,12 +7596,12 @@ static long sctp_get_port_local(struct sock *sk, union sctp_addr *addr)
>  	 * SO_REUSEADDR on this socket -sk-).
>  	 */
>  	if (hlist_empty(&pp->owner)) {
> -		if (sk->sk_reuse && sk->sk_state != SCTP_SS_LISTENING)
> +		if (reuse && sk->sk_state != SCTP_SS_LISTENING)
>  			pp->fastreuse = 1;
>  		else
>  			pp->fastreuse = 0;
>  	} else if (pp->fastreuse &&
> -		(!sk->sk_reuse || sk->sk_state == SCTP_SS_LISTENING))
> +		   (!reuse || sk->sk_state == SCTP_SS_LISTENING))
>  		pp->fastreuse = 0;
>  
>  	/* We are set, so fill up all the data in the hash table
> @@ -7685,7 +7732,7 @@ int sctp_inet_listen(struct socket *sock, int backlog)
>  		err = 0;
>  		sctp_unhash_endpoint(ep);
>  		sk->sk_state = SCTP_SS_CLOSED;
> -		if (sk->sk_reuse)
> +		if (sk->sk_reuse || sctp_sk(sk)->reuse)
>  			sctp_sk(sk)->bind_hash->fastreuse = 1;
>  		goto out;
>  	}
> @@ -8550,6 +8597,7 @@ void sctp_copy_sock(struct sock *newsk, struct sock *sk,
>  	newsk->sk_no_check_tx = sk->sk_no_check_tx;
>  	newsk->sk_no_check_rx = sk->sk_no_check_rx;
>  	newsk->sk_reuse = sk->sk_reuse;
> +	sctp_sk(newsk)->reuse = sp->reuse;
>  
>  	newsk->sk_shutdown = sk->sk_shutdown;
>  	newsk->sk_destruct = sctp_destruct_sock;
> -- 
> 2.1.0
> 

^ permalink raw reply

* Re: [PATCH net-next 1/3] rds: Changing IP address internal representation to struct in6_addr
From: kbuild test robot @ 2018-06-25 13:39 UTC (permalink / raw)
  To: Ka-Cheong Poon; +Cc: kbuild-all, netdev, santosh.shilimkar, davem, rds-devel
In-Reply-To: <3b53ed6f18e697fe8e414907cbbba81834515a2a.1529922794.git.ka-cheong.poon@oracle.com>

Hi Ka-Cheong,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Ka-Cheong-Poon/rds-IPv6-support/20180625-190047
reproduce:
        # apt-get install sparse
        make ARCH=x86_64 allmodconfig
        make C=1 CF=-D__CHECK_ENDIAN__


sparse warnings: (new ones prefixed by >>)

>> net/rds/rdma_transport.c:42:5: sparse: symbol 'rds_rdma_cm_event_handler_cmn' was not declared. Should it be static?
--
   net/rds/ib_cm.c:205:17: sparse: expression using sizeof(void)
   net/rds/ib_cm.c:205:17: sparse: expression using sizeof(void)
   net/rds/ib_cm.c:207:17: sparse: expression using sizeof(void)
   net/rds/ib_cm.c:207:17: sparse: expression using sizeof(void)
   net/rds/ib_cm.c:208:35: sparse: expression using sizeof(void)
   include/linux/overflow.h:220:13: sparse: undefined identifier '__builtin_mul_overflow'
   include/linux/overflow.h:220:13: sparse: incorrect type in conditional
   include/linux/overflow.h:220:13:    got void
   include/linux/overflow.h:220:13: sparse: not a function <noident>
   include/linux/overflow.h:220:13: sparse: incorrect type in conditional
   include/linux/overflow.h:220:13:    got void
>> net/rds/ib_cm.c:908:31: sparse: incorrect type in assignment (different base types) @@    expected restricted __be16 [usertype] sin_port @@    got unsignedrestricted __be16 [usertype] sin_port @@
   net/rds/ib_cm.c:908:31:    expected restricted __be16 [usertype] sin_port
   net/rds/ib_cm.c:908:31:    got unsigned short [unsigned] [usertype] <noident>
   net/rds/ib_cm.c:913:31: sparse: incorrect type in assignment (different base types) @@    expected restricted __be16 [usertype] sin_port @@    got unsignedrestricted __be16 [usertype] sin_port @@
   net/rds/ib_cm.c:913:31:    expected restricted __be16 [usertype] sin_port
   net/rds/ib_cm.c:913:31:    got unsigned short [unsigned] [usertype] <noident>
>> net/rds/ib_cm.c:920:33: sparse: incorrect type in assignment (different base types) @@    expected restricted __be16 [usertype] sin6_port @@    got unsignedrestricted __be16 [usertype] sin6_port @@
   net/rds/ib_cm.c:920:33:    expected restricted __be16 [usertype] sin6_port
   net/rds/ib_cm.c:920:33:    got unsigned short [unsigned] [usertype] <noident>
   net/rds/ib_cm.c:926:33: sparse: incorrect type in assignment (different base types) @@    expected restricted __be16 [usertype] sin6_port @@    got unsignedrestricted __be16 [usertype] sin6_port @@
   net/rds/ib_cm.c:926:33:    expected restricted __be16 [usertype] sin6_port
   net/rds/ib_cm.c:926:33:    got unsigned short [unsigned] [usertype] <noident>
   include/linux/overflow.h:220:13: sparse: call with no type!
--
>> net/rds/tcp.c:294:9: sparse: context imbalance in 'rds_tcp_laddr_check' - different lock contexts for basic block
--
   net/rds/tcp_connect.c:119:29: sparse: incorrect type in assignment (different base types) @@    expected restricted __be32 [assigned] [usertype] s_addr @@    got icted __be32 [assigned] [usertype] s_addr @@
   net/rds/tcp_connect.c:119:29:    expected restricted __be32 [assigned] [usertype] s_addr
   net/rds/tcp_connect.c:119:29:    got unsigned int [unsigned] [usertype] <noident>
   net/rds/tcp_connect.c:120:22: sparse: incorrect type in assignment (different base types) @@    expected restricted __be16 [assigned] [usertype] sin_port @@    got tricted __be16 [assigned] [usertype] sin_port @@
   net/rds/tcp_connect.c:120:22:    expected restricted __be16 [assigned] [usertype] sin_port
   net/rds/tcp_connect.c:120:22:    got unsigned short [unsigned] [usertype] <noident>
>> net/rds/tcp_connect.c:132:29: sparse: incorrect type in assignment (different base types) @@    expected restricted __be32 [addressable] [assigned] [usertype] s_addr @@    got addressable] [assigned] [usertype] s_addr @@
   net/rds/tcp_connect.c:132:29:    expected restricted __be32 [addressable] [assigned] [usertype] s_addr
   net/rds/tcp_connect.c:132:29:    got unsigned int [unsigned] [usertype] <noident>
>> net/rds/tcp_connect.c:133:22: sparse: incorrect type in assignment (different base types) @@    expected restricted __be16 [addressable] [assigned] [usertype] sin_port @@    got  [addressable] [assigned] [usertype] sin_port @@
   net/rds/tcp_connect.c:133:22:    expected restricted __be16 [addressable] [assigned] [usertype] sin_port
   net/rds/tcp_connect.c:133:22:    got unsigned short [unsigned] [usertype] <noident>
--
   net/rds/af_rds.c:226:22: sparse: invalid assignment: |=
   net/rds/af_rds.c:226:22:    left side has type restricted __poll_t
   net/rds/af_rds.c:226:22:    right side has type int
>> net/rds/af_rds.c:499:34: sparse: restricted __be32 degrades to integer
   net/rds/af_rds.c:504:34: sparse: restricted __be32 degrades to integer
--
   net/rds/bind.c:105:25: sparse: expression using sizeof(void)
>> net/rds/bind.c:177:34: sparse: restricted __be32 degrades to integer
--
   net/rds/send.c:363:39: sparse: expression using sizeof(void)
   net/rds/send.c:363:39: sparse: expression using sizeof(void)
   net/rds/send.c:372:39: sparse: expression using sizeof(void)
   net/rds/send.c:372:39: sparse: expression using sizeof(void)
   net/rds/send.c:1015:24: sparse: incorrect type in argument 1 (different base types) @@    expected unsigned int [unsigned] [usertype] a @@    got ed int [unsigned] [usertype] a @@
   net/rds/send.c:1015:24:    expected unsigned int [unsigned] [usertype] a
   net/rds/send.c:1015:24:    got restricted __be16 [usertype] sin6_port
   net/rds/send.c:1017:24: sparse: incorrect type in argument 1 (different base types) @@    expected unsigned int [unsigned] [usertype] a @@    got ed int [unsigned] [usertype] a @@
   net/rds/send.c:1017:24:    expected unsigned int [unsigned] [usertype] a
   net/rds/send.c:1017:24:    got restricted __be16 [usertype] sin6_port
>> net/rds/send.c:1097:43: sparse: restricted __be32 degrades to integer
   net/rds/send.c:1098:43: sparse: restricted __be32 degrades to integer
   net/rds/send.c:1149:13: sparse: expression using sizeof(void)
   net/rds/send.c:1149:13: sparse: expression using sizeof(void)
   net/rds/send.c:1352:30: sparse: incorrect type in initializer (different base types) @@    expected unsigned short [unsigned] [usertype] npaths @@    got  short [unsigned] [usertype] npaths @@
   net/rds/send.c:1352:30:    expected unsigned short [unsigned] [usertype] npaths
   net/rds/send.c:1352:30:    got restricted __be16 [usertype] <noident>
   net/rds/send.c:1353:34: sparse: incorrect type in initializer (different base types) @@    expected unsigned int [unsigned] [usertype] my_gen_num @@    got ed int [unsigned] [usertype] my_gen_num @@
   net/rds/send.c:1353:34:    expected unsigned int [unsigned] [usertype] my_gen_num
   net/rds/send.c:1353:34:    got restricted __be32 [usertype] <noident>

^ permalink raw reply

* [RFC PATCH] rds: rds_rdma_cm_event_handler_cmn() can be static
From: kbuild test robot @ 2018-06-25 13:39 UTC (permalink / raw)
  To: Ka-Cheong Poon; +Cc: kbuild-all, netdev, santosh.shilimkar, davem, rds-devel
In-Reply-To: <3b53ed6f18e697fe8e414907cbbba81834515a2a.1529922794.git.ka-cheong.poon@oracle.com>


Fixes: f58dbee41d61 ("rds: Changing IP address internal representation to struct in6_addr")
Signed-off-by: kbuild test robot <fengguang.wu@intel.com>
---
 rdma_transport.c |    6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/rds/rdma_transport.c b/net/rds/rdma_transport.c
index d7da115..3634ed5 100644
--- a/net/rds/rdma_transport.c
+++ b/net/rds/rdma_transport.c
@@ -39,9 +39,9 @@
 
 static struct rdma_cm_id *rds_rdma_listen_id;
 
-int rds_rdma_cm_event_handler_cmn(struct rdma_cm_id *cm_id,
-				  struct rdma_cm_event *event,
-				  bool isv6)
+static int rds_rdma_cm_event_handler_cmn(struct rdma_cm_id *cm_id,
+					 struct rdma_cm_event *event,
+					 bool isv6)
 {
 	/* this can be null in the listening path */
 	struct rds_connection *conn = cm_id->context;

^ permalink raw reply related

* RE: [PATCH 1/5] net: emaclite: Use __func__ instead of hardcoded name
From: Radhey Shyam Pandey @ 2018-06-25 13:47 UTC (permalink / raw)
  To: Joe Perches, Andy Shevchenko
  Cc: David S. Miller, Andrew Lunn, Michal Simek, netdev,
	linux-arm Mailing List, Linux Kernel Mailing List
In-Reply-To: <3013aae1803c096645887864d1b161efbfba7c77.camel@perches.com>

> -----Original Message-----
> From: Joe Perches [mailto:joe@perches.com]
> Sent: Wednesday, June 20, 2018 4:08 AM
> To: Andy Shevchenko <andy.shevchenko@gmail.com>; Radhey Shyam
> Pandey <radheys@xilinx.com>
> Cc: David S. Miller <davem@davemloft.net>; Andrew Lunn
> <andrew@lunn.ch>; Michal Simek <michals@xilinx.com>; netdev
> <netdev@vger.kernel.org>; linux-arm Mailing List <linux-arm-
> kernel@lists.infradead.org>; Linux Kernel Mailing List <linux-
> kernel@vger.kernel.org>
> Subject: Re: [PATCH 1/5] net: emaclite: Use __func__ instead of hardcoded
> name
> 
> On Wed, 2018-06-20 at 00:36 +0300, Andy Shevchenko wrote:
> > On Mon, Jun 18, 2018 at 2:08 PM, Radhey Shyam Pandey
> > <radhey.shyam.pandey@xilinx.com> wrote:
> > > Switch hardcoded function name with a reference to __func__ making
> > > the code more maintainable. Address below checkpatch warning:
> > >
> > > WARNING: Prefer using '"%s...", __func__' to using
> 'xemaclite_mdio_read',
> > > this function's name, in a string
> > > +               "xemaclite_mdio_read(phy_id=%i, reg=%x) == %x\n",
> > >
> > > WARNING: Prefer using '"%s...", __func__' to using
> 'xemaclite_mdio_write',
> > > this function's name, in a string
> > > +               "xemaclite_mdio_write(phy_id=%i, reg=%x, val=%x)\n",
> > >
> >
> > For dev_dbg() the __func__ should be completely dropped away.
> 
> Not really the same.
> 
> dev_dbg without CONFIG_DYNAMIC_DEBUG does not have
> the ability to prefix __func__.
Yes.  If it's acceptable,  prefer to use __func__  to support all 
configurations

> 

^ permalink raw reply

* [PATCH net-next 0/7] l2tp: trivial cleanups
From: Guillaume Nault @ 2018-06-25 14:07 UTC (permalink / raw)
  To: netdev; +Cc: James Chapman

Just a set of unrelated trivial cleanups (remove unused code, make
local functions static, etc.).

Guillaume Nault (7):
  l2tp: remove pppol2tp_session_close()
  l2tp: remove .show from struct l2tp_tunnel
  l2tp: remove l2tp_tunnel_priv()
  l2tp: don't export l2tp_session_queue_purge()
  l2tp: don't export l2tp_tunnel_closeall()
  l2tp: avoid duplicate l2tp_pernet() calls
  l2tp: make l2tp_xmit_core() return void

 net/l2tp/l2tp_core.c    | 15 +++++----------
 net/l2tp/l2tp_core.h    | 12 ------------
 net/l2tp/l2tp_debugfs.c |  3 ---
 net/l2tp/l2tp_ppp.c     |  7 -------
 4 files changed, 5 insertions(+), 32 deletions(-)

-- 
2.18.0

^ permalink raw reply

* [PATCH net-next 2/7] l2tp: remove .show from struct l2tp_tunnel
From: Guillaume Nault @ 2018-06-25 14:07 UTC (permalink / raw)
  To: netdev; +Cc: James Chapman
In-Reply-To: <cover.1529935024.git.g.nault@alphalink.fr>

This callback has never been implemented.

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
---
 net/l2tp/l2tp_core.h    | 3 ---
 net/l2tp/l2tp_debugfs.c | 3 ---
 2 files changed, 6 deletions(-)

diff --git a/net/l2tp/l2tp_core.h b/net/l2tp/l2tp_core.h
index c199020f8a8a..b21c20a4e08f 100644
--- a/net/l2tp/l2tp_core.h
+++ b/net/l2tp/l2tp_core.h
@@ -180,9 +180,6 @@ struct l2tp_tunnel {
 	struct net		*l2tp_net;	/* the net we belong to */
 
 	refcount_t		ref_count;
-#ifdef CONFIG_DEBUG_FS
-	void (*show)(struct seq_file *m, void *arg);
-#endif
 	int (*recv_payload_hook)(struct sk_buff *skb);
 	void (*old_sk_destruct)(struct sock *);
 	struct sock		*sock;		/* Parent socket */
diff --git a/net/l2tp/l2tp_debugfs.c b/net/l2tp/l2tp_debugfs.c
index e87686f7d63c..b5d7dde003ef 100644
--- a/net/l2tp/l2tp_debugfs.c
+++ b/net/l2tp/l2tp_debugfs.c
@@ -177,9 +177,6 @@ static void l2tp_dfs_seq_tunnel_show(struct seq_file *m, void *v)
 		   atomic_long_read(&tunnel->stats.rx_packets),
 		   atomic_long_read(&tunnel->stats.rx_bytes),
 		   atomic_long_read(&tunnel->stats.rx_errors));
-
-	if (tunnel->show != NULL)
-		tunnel->show(m, tunnel);
 }
 
 static void l2tp_dfs_seq_session_show(struct seq_file *m, void *v)
-- 
2.18.0

^ permalink raw reply related

* [PATCH net-next 4/7] l2tp: don't export l2tp_session_queue_purge()
From: Guillaume Nault @ 2018-06-25 14:07 UTC (permalink / raw)
  To: netdev; +Cc: James Chapman
In-Reply-To: <cover.1529935024.git.g.nault@alphalink.fr>

This function is only used in l2tp_core.c.

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
---
 net/l2tp/l2tp_core.c | 3 +--
 net/l2tp/l2tp_core.h | 1 -
 2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
index 40261cb68e83..3adef4c35a3a 100644
--- a/net/l2tp/l2tp_core.c
+++ b/net/l2tp/l2tp_core.c
@@ -783,7 +783,7 @@ EXPORT_SYMBOL(l2tp_recv_common);
 
 /* Drop skbs from the session's reorder_q
  */
-int l2tp_session_queue_purge(struct l2tp_session *session)
+static int l2tp_session_queue_purge(struct l2tp_session *session)
 {
 	struct sk_buff *skb = NULL;
 	BUG_ON(!session);
@@ -794,7 +794,6 @@ int l2tp_session_queue_purge(struct l2tp_session *session)
 	}
 	return 0;
 }
-EXPORT_SYMBOL_GPL(l2tp_session_queue_purge);
 
 /* Internal UDP receive frame. Do the real work of receiving an L2TP data frame
  * here. The skb is not on a list when we get here.
diff --git a/net/l2tp/l2tp_core.h b/net/l2tp/l2tp_core.h
index 15e1171ecf7b..0a6e582f84d3 100644
--- a/net/l2tp/l2tp_core.h
+++ b/net/l2tp/l2tp_core.h
@@ -234,7 +234,6 @@ void l2tp_session_free(struct l2tp_session *session);
 void l2tp_recv_common(struct l2tp_session *session, struct sk_buff *skb,
 		      unsigned char *ptr, unsigned char *optr, u16 hdrflags,
 		      int length, int (*payload_hook)(struct sk_buff *skb));
-int l2tp_session_queue_purge(struct l2tp_session *session);
 int l2tp_udp_encap_recv(struct sock *sk, struct sk_buff *skb);
 void l2tp_session_set_header_len(struct l2tp_session *session, int version);
 
-- 
2.18.0

^ permalink raw reply related

* [PATCH net-next 3/7] l2tp: remove l2tp_tunnel_priv()
From: Guillaume Nault @ 2018-06-25 14:07 UTC (permalink / raw)
  To: netdev; +Cc: James Chapman
In-Reply-To: <cover.1529935024.git.g.nault@alphalink.fr>

This function, and the associated .priv field, are unused.

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
---
 net/l2tp/l2tp_core.h | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/net/l2tp/l2tp_core.h b/net/l2tp/l2tp_core.h
index b21c20a4e08f..15e1171ecf7b 100644
--- a/net/l2tp/l2tp_core.h
+++ b/net/l2tp/l2tp_core.h
@@ -187,8 +187,6 @@ struct l2tp_tunnel {
 						 * was created by userspace */
 
 	struct work_struct	del_work;
-
-	uint8_t			priv[0];	/* private data */
 };
 
 struct l2tp_nl_cmd_ops {
@@ -198,11 +196,6 @@ struct l2tp_nl_cmd_ops {
 	int (*session_delete)(struct l2tp_session *session);
 };
 
-static inline void *l2tp_tunnel_priv(struct l2tp_tunnel *tunnel)
-{
-	return &tunnel->priv[0];
-}
-
 static inline void *l2tp_session_priv(struct l2tp_session *session)
 {
 	return &session->priv[0];
-- 
2.18.0

^ permalink raw reply related

* [PATCH net-next 5/7] l2tp: don't export l2tp_tunnel_closeall()
From: Guillaume Nault @ 2018-06-25 14:07 UTC (permalink / raw)
  To: netdev; +Cc: James Chapman
In-Reply-To: <cover.1529935024.git.g.nault@alphalink.fr>

This function is only used in l2tp_core.c.

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
---
 net/l2tp/l2tp_core.c | 3 +--
 net/l2tp/l2tp_core.h | 1 -
 2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
index 3adef4c35a3a..96e31f2ae7cd 100644
--- a/net/l2tp/l2tp_core.c
+++ b/net/l2tp/l2tp_core.c
@@ -1192,7 +1192,7 @@ static void l2tp_tunnel_destruct(struct sock *sk)
 
 /* When the tunnel is closed, all the attached sessions need to go too.
  */
-void l2tp_tunnel_closeall(struct l2tp_tunnel *tunnel)
+static void l2tp_tunnel_closeall(struct l2tp_tunnel *tunnel)
 {
 	int hash;
 	struct hlist_node *walk;
@@ -1241,7 +1241,6 @@ void l2tp_tunnel_closeall(struct l2tp_tunnel *tunnel)
 	}
 	write_unlock_bh(&tunnel->hlist_lock);
 }
-EXPORT_SYMBOL_GPL(l2tp_tunnel_closeall);
 
 /* Tunnel socket destroy hook for UDP encapsulation */
 static void l2tp_udp_encap_destroy(struct sock *sk)
diff --git a/net/l2tp/l2tp_core.h b/net/l2tp/l2tp_core.h
index 0a6e582f84d3..a5c09d3a5698 100644
--- a/net/l2tp/l2tp_core.h
+++ b/net/l2tp/l2tp_core.h
@@ -219,7 +219,6 @@ int l2tp_tunnel_create(struct net *net, int fd, int version, u32 tunnel_id,
 int l2tp_tunnel_register(struct l2tp_tunnel *tunnel, struct net *net,
 			 struct l2tp_tunnel_cfg *cfg);
 
-void l2tp_tunnel_closeall(struct l2tp_tunnel *tunnel);
 void l2tp_tunnel_delete(struct l2tp_tunnel *tunnel);
 struct l2tp_session *l2tp_session_create(int priv_size,
 					 struct l2tp_tunnel *tunnel,
-- 
2.18.0

^ permalink raw reply related

* [PATCH net-next 6/7] l2tp: avoid duplicate l2tp_pernet() calls
From: Guillaume Nault @ 2018-06-25 14:07 UTC (permalink / raw)
  To: netdev; +Cc: James Chapman
In-Reply-To: <cover.1529935024.git.g.nault@alphalink.fr>

Replace 'l2tp_pernet(tunnel->l2tp_net)' with 'pn', which has been set
on the preceding line.

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
---
 net/l2tp/l2tp_core.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
index 96e31f2ae7cd..88c3001531b4 100644
--- a/net/l2tp/l2tp_core.c
+++ b/net/l2tp/l2tp_core.c
@@ -322,8 +322,7 @@ int l2tp_session_register(struct l2tp_session *session,
 
 	if (tunnel->version == L2TP_HDR_VER_3) {
 		pn = l2tp_pernet(tunnel->l2tp_net);
-		g_head = l2tp_session_id_hash_2(l2tp_pernet(tunnel->l2tp_net),
-						session->session_id);
+		g_head = l2tp_session_id_hash_2(pn, session->session_id);
 
 		spin_lock_bh(&pn->l2tp_session_hlist_lock);
 
-- 
2.18.0

^ permalink raw reply related

* [PATCH net-next 7/7] l2tp: make l2tp_xmit_core() return void
From: Guillaume Nault @ 2018-06-25 14:07 UTC (permalink / raw)
  To: netdev; +Cc: James Chapman
In-Reply-To: <cover.1529935024.git.g.nault@alphalink.fr>

It always returns 0, and nobody reads the return value anyway.

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
---
 net/l2tp/l2tp_core.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
index 88c3001531b4..1ea285bad84b 100644
--- a/net/l2tp/l2tp_core.c
+++ b/net/l2tp/l2tp_core.c
@@ -1007,8 +1007,8 @@ static int l2tp_build_l2tpv3_header(struct l2tp_session *session, void *buf)
 	return bufp - optr;
 }
 
-static int l2tp_xmit_core(struct l2tp_session *session, struct sk_buff *skb,
-			  struct flowi *fl, size_t data_len)
+static void l2tp_xmit_core(struct l2tp_session *session, struct sk_buff *skb,
+			   struct flowi *fl, size_t data_len)
 {
 	struct l2tp_tunnel *tunnel = session->tunnel;
 	unsigned int len = skb->len;
@@ -1050,8 +1050,6 @@ static int l2tp_xmit_core(struct l2tp_session *session, struct sk_buff *skb,
 		atomic_long_inc(&tunnel->stats.tx_errors);
 		atomic_long_inc(&session->stats.tx_errors);
 	}
-
-	return 0;
 }
 
 /* If caller requires the skb to have a ppp header, the header must be
-- 
2.18.0

^ permalink raw reply related

* [PATCH net-next 1/7] l2tp: remove pppol2tp_session_close()
From: Guillaume Nault @ 2018-06-25 14:07 UTC (permalink / raw)
  To: netdev; +Cc: James Chapman
In-Reply-To: <cover.1529935024.git.g.nault@alphalink.fr>

l2tp_core.c verifies that ->session_close() is defined before calling
it. There's no need for a stub.

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
---
 net/l2tp/l2tp_ppp.c | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/net/l2tp/l2tp_ppp.c b/net/l2tp/l2tp_ppp.c
index 55188382845c..eea5d7844473 100644
--- a/net/l2tp/l2tp_ppp.c
+++ b/net/l2tp/l2tp_ppp.c
@@ -424,12 +424,6 @@ static void pppol2tp_put_sk(struct rcu_head *head)
 	sock_put(ps->__sk);
 }
 
-/* Called by l2tp_core when a session socket is being closed.
- */
-static void pppol2tp_session_close(struct l2tp_session *session)
-{
-}
-
 /* Really kill the session socket. (Called from sock_put() if
  * refcnt == 0.)
  */
@@ -573,7 +567,6 @@ static void pppol2tp_session_init(struct l2tp_session *session)
 	struct dst_entry *dst;
 
 	session->recv_skb = pppol2tp_recv;
-	session->session_close = pppol2tp_session_close;
 #if IS_ENABLED(CONFIG_L2TP_DEBUGFS)
 	session->show = pppol2tp_show;
 #endif
-- 
2.18.0

^ permalink raw reply related

* [PATCH net-next] rds: clean up loopback rds_connections on netns deletion
From: Sowmini Varadhan @ 2018-06-25 13:41 UTC (permalink / raw)
  To: netdev, sowmini.varadhan
  Cc: davem, rds-devel, sowmini.varadhan, santosh.shilimkar

The RDS core module creates rds_connections based on callbacks
from rds_loop_transport when sending/receiving packets to local
addresses.

These connections will need to be cleaned up when they are
created from a netns that is not init_net, and that netns is deleted.

Add the changes aligned with the changes from
commit ebeeb1ad9b8a ("rds: tcp: use rds_destroy_pending() to synchronize
netns/module teardown and rds connection/workq management") for
rds_loop_transport

Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
---
 net/rds/connection.c |   11 +++++++++-
 net/rds/loop.c       |   56 ++++++++++++++++++++++++++++++++++++++++++++++++++
 net/rds/loop.h       |    2 +
 3 files changed, 68 insertions(+), 1 deletions(-)

diff --git a/net/rds/connection.c b/net/rds/connection.c
index abef75d..cfb0595 100644
--- a/net/rds/connection.c
+++ b/net/rds/connection.c
@@ -659,11 +659,19 @@ static void rds_conn_info(struct socket *sock, unsigned int len,
 
 int rds_conn_init(void)
 {
+	int ret;
+
+	ret = rds_loop_net_init(); /* register pernet callback */
+	if (ret)
+		return ret;
+
 	rds_conn_slab = kmem_cache_create("rds_connection",
 					  sizeof(struct rds_connection),
 					  0, 0, NULL);
-	if (!rds_conn_slab)
+	if (!rds_conn_slab) {
+		rds_loop_net_exit();
 		return -ENOMEM;
+	}
 
 	rds_info_register_func(RDS_INFO_CONNECTIONS, rds_conn_info);
 	rds_info_register_func(RDS_INFO_SEND_MESSAGES,
@@ -676,6 +684,7 @@ int rds_conn_init(void)
 
 void rds_conn_exit(void)
 {
+	rds_loop_net_exit(); /* unregister pernet callback */
 	rds_loop_exit();
 
 	WARN_ON(!hlist_empty(rds_conn_hash));
diff --git a/net/rds/loop.c b/net/rds/loop.c
index dac6218..feea1f9 100644
--- a/net/rds/loop.c
+++ b/net/rds/loop.c
@@ -33,6 +33,8 @@
 #include <linux/kernel.h>
 #include <linux/slab.h>
 #include <linux/in.h>
+#include <net/net_namespace.h>
+#include <net/netns/generic.h>
 
 #include "rds_single_path.h"
 #include "rds.h"
@@ -40,6 +42,17 @@
 
 static DEFINE_SPINLOCK(loop_conns_lock);
 static LIST_HEAD(loop_conns);
+static atomic_t rds_loop_unloading = ATOMIC_INIT(0);
+
+static void rds_loop_set_unloading(void)
+{
+	atomic_set(&rds_loop_unloading, 1);
+}
+
+static bool rds_loop_is_unloading(struct rds_connection *conn)
+{
+	return atomic_read(&rds_loop_unloading) != 0;
+}
 
 /*
  * This 'loopback' transport is a special case for flows that originate
@@ -165,6 +178,8 @@ void rds_loop_exit(void)
 	struct rds_loop_connection *lc, *_lc;
 	LIST_HEAD(tmp_list);
 
+	rds_loop_set_unloading();
+	synchronize_rcu();
 	/* avoid calling conn_destroy with irqs off */
 	spin_lock_irq(&loop_conns_lock);
 	list_splice(&loop_conns, &tmp_list);
@@ -177,6 +192,46 @@ void rds_loop_exit(void)
 	}
 }
 
+static void rds_loop_kill_conns(struct net *net)
+{
+	struct rds_loop_connection *lc, *_lc;
+	LIST_HEAD(tmp_list);
+
+	spin_lock_irq(&loop_conns_lock);
+	list_for_each_entry_safe(lc, _lc, &loop_conns, loop_node)  {
+		struct net *c_net = read_pnet(&lc->conn->c_net);
+
+		if (net != c_net)
+			continue;
+		list_move_tail(&lc->loop_node, &tmp_list);
+	}
+	spin_unlock_irq(&loop_conns_lock);
+
+	list_for_each_entry_safe(lc, _lc, &tmp_list, loop_node) {
+		WARN_ON(lc->conn->c_passive);
+		rds_conn_destroy(lc->conn);
+	}
+}
+
+static void __net_exit rds_loop_exit_net(struct net *net)
+{
+	rds_loop_kill_conns(net);
+}
+
+static struct pernet_operations rds_loop_net_ops = {
+	.exit = rds_loop_exit_net,
+};
+
+int rds_loop_net_init(void)
+{
+	return register_pernet_device(&rds_loop_net_ops);
+}
+
+void rds_loop_net_exit(void)
+{
+	unregister_pernet_device(&rds_loop_net_ops);
+}
+
 /*
  * This is missing .xmit_* because loop doesn't go through generic
  * rds_send_xmit() and doesn't call rds_recv_incoming().  .listen_stop and
@@ -194,4 +249,5 @@ struct rds_transport rds_loop_transport = {
 	.inc_free		= rds_loop_inc_free,
 	.t_name			= "loopback",
 	.t_type			= RDS_TRANS_LOOP,
+	.t_unloading		= rds_loop_is_unloading,
 };
diff --git a/net/rds/loop.h b/net/rds/loop.h
index 469fa4b..bbc8cdd 100644
--- a/net/rds/loop.h
+++ b/net/rds/loop.h
@@ -5,6 +5,8 @@
 /* loop.c */
 extern struct rds_transport rds_loop_transport;
 
+int rds_loop_net_init(void);
+void rds_loop_net_exit(void);
 void rds_loop_exit(void);
 
 #endif
-- 
1.7.1

^ permalink raw reply related

* [PATCH 2/4] net: lan78xx: Add support for VLAN filtering.
From: Dave Stevenson @ 2018-06-25 14:07 UTC (permalink / raw)
  To: woojung.huh, UNGLinuxDriver, davem, netdev; +Cc: Dave Stevenson
In-Reply-To: <cover.1529935234.git.dave.stevenson@raspberrypi.org>

HW_VLAN_CTAG_FILTER was partially implemented, but not advertised
to Linux.

Complete the implementation of this.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
---
 drivers/net/usb/lan78xx.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c
index 2f793d4..afe7fa3 100644
--- a/drivers/net/usb/lan78xx.c
+++ b/drivers/net/usb/lan78xx.c
@@ -2363,7 +2363,7 @@ static int lan78xx_set_features(struct net_device *netdev,
 		pdata->rfe_ctl &= ~(RFE_CTL_ICMP_COE_ | RFE_CTL_IGMP_COE_);
 	}
 
-	if (features & NETIF_F_HW_VLAN_CTAG_RX)
+	if (features & NETIF_F_HW_VLAN_CTAG_FILTER)
 		pdata->rfe_ctl |= RFE_CTL_VLAN_FILTER_;
 	else
 		pdata->rfe_ctl &= ~RFE_CTL_VLAN_FILTER_;
@@ -2976,6 +2976,9 @@ static int lan78xx_bind(struct lan78xx_net *dev, struct usb_interface *intf)
 	if (DEFAULT_TSO_CSUM_ENABLE)
 		dev->net->features |= NETIF_F_TSO | NETIF_F_TSO6 | NETIF_F_SG;
 
+	if (DEFAULT_VLAN_FILTER_ENABLE)
+		dev->net->features |= NETIF_F_HW_VLAN_CTAG_FILTER;
+
 	dev->net->hw_features = dev->net->features;
 
 	ret = lan78xx_setup_irq_domain(dev);
-- 
2.7.4

^ permalink raw reply related

* [bpf-next PATCH 0/2] xdp/bpf: extend XDP samples/bpf xdp_rxq_info
From: Jesper Dangaard Brouer @ 2018-06-25 14:27 UTC (permalink / raw)
  To: netdev, Jesper Dangaard Brouer
  Cc: Daniel Borkmann, Toke Høiland-Jørgensen,
	Alexei Starovoitov

While writing an article about XDP, the samples/bpf xdp_rxq_info
program were extended to cover some more use-cases.

---

Jesper Dangaard Brouer (2):
      samples/bpf: extend xdp_rxq_info to read packet payload
      samples/bpf: xdp_rxq_info action XDP_TX must adjust MAC-addrs


 samples/bpf/xdp_rxq_info_kern.c |   43 +++++++++++++++++++++++++++++++++++++
 samples/bpf/xdp_rxq_info_user.c |   45 ++++++++++++++++++++++++++++++++++-----
 2 files changed, 82 insertions(+), 6 deletions(-)

^ permalink raw reply

* [bpf-next PATCH 1/2] samples/bpf: extend xdp_rxq_info to read packet payload
From: Jesper Dangaard Brouer @ 2018-06-25 14:27 UTC (permalink / raw)
  To: netdev, Jesper Dangaard Brouer
  Cc: Daniel Borkmann, Toke Høiland-Jørgensen,
	Alexei Starovoitov
In-Reply-To: <152993682254.8835.8864318933370018087.stgit@firesoul>

There is a cost associated with reading the packet data payload
that this test ignored.  Add option --read to allow enabling
reading part of the payload.

This sample/tool helps us analyse an issue observed with a NIC
mlx5 (ConnectX-5 Ex) and an Intel(R) Xeon(R) CPU E5-1650 v4.

With no_touch of data:

Running XDP on dev:mlx5p1 (ifindex:8) action:XDP_DROP options:no_touch
XDP stats       CPU     pps         issue-pps
XDP-RX CPU      0       14,465,157  0
XDP-RX CPU      1       14,464,728  0
XDP-RX CPU      2       14,465,283  0
XDP-RX CPU      3       14,465,282  0
XDP-RX CPU      4       14,464,159  0
XDP-RX CPU      5       14,465,379  0
XDP-RX CPU      total   86,789,992

When not touching data, we observe that the CPUs have idle cycles.
When reading data the CPUs are 100% busy in softirq.

With reading data:

Running XDP on dev:mlx5p1 (ifindex:8) action:XDP_DROP options:read
XDP stats       CPU     pps         issue-pps
XDP-RX CPU      0       9,620,639   0
XDP-RX CPU      1       9,489,843   0
XDP-RX CPU      2       9,407,854   0
XDP-RX CPU      3       9,422,289   0
XDP-RX CPU      4       9,321,959   0
XDP-RX CPU      5       9,395,242   0
XDP-RX CPU      total   56,657,828

The effect seen above is a result of cache-misses occuring when
more RXQs are being used.  Based on perf-event observations, our
conclusion is that the CPUs DDIO (Direct Data I/O) choose to
deliver packet into main memory, instead of L3-cache.  We also
found, that this can be mitigated by either using less RXQs or by
reducing NICs the RX-ring size.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
---
 samples/bpf/xdp_rxq_info_kern.c |   19 +++++++++++++++++++
 samples/bpf/xdp_rxq_info_user.c |   34 ++++++++++++++++++++++++++++------
 2 files changed, 47 insertions(+), 6 deletions(-)

diff --git a/samples/bpf/xdp_rxq_info_kern.c b/samples/bpf/xdp_rxq_info_kern.c
index 3fd209291653..61af6210df2f 100644
--- a/samples/bpf/xdp_rxq_info_kern.c
+++ b/samples/bpf/xdp_rxq_info_kern.c
@@ -4,6 +4,8 @@
  *  Example howto extract XDP RX-queue info
  */
 #include <uapi/linux/bpf.h>
+#include <uapi/linux/if_ether.h>
+#include <uapi/linux/in.h>
 #include "bpf_helpers.h"
 
 /* Config setup from with userspace
@@ -14,6 +16,11 @@
 struct config {
 	__u32 action;
 	int ifindex;
+	__u32 options;
+};
+enum cfg_options_flags {
+	NO_TOUCH = 0x0U,
+	READ_MEM = 0x1U,
 };
 struct bpf_map_def SEC("maps") config_map = {
 	.type		= BPF_MAP_TYPE_ARRAY,
@@ -90,6 +97,18 @@ int  xdp_prognum0(struct xdp_md *ctx)
 	if (key == MAX_RXQs)
 		rxq_rec->issue++;
 
+	/* Default: Don't touch packet data, only count packets */
+	if (unlikely(config->options & READ_MEM)) {
+		struct ethhdr *eth = data;
+
+		if (eth + 1 > data_end)
+			return XDP_ABORTED;
+
+		/* Avoid compiler removing this: Drop non 802.3 Ethertypes */
+		if (ntohs(eth->h_proto) < ETH_P_802_3_MIN)
+			return XDP_ABORTED;
+	}
+
 	return config->action;
 }
 
diff --git a/samples/bpf/xdp_rxq_info_user.c b/samples/bpf/xdp_rxq_info_user.c
index e4e9ba52bff0..435485d4f49e 100644
--- a/samples/bpf/xdp_rxq_info_user.c
+++ b/samples/bpf/xdp_rxq_info_user.c
@@ -50,6 +50,7 @@ static const struct option long_options[] = {
 	{"sec",		required_argument,	NULL, 's' },
 	{"no-separators", no_argument,		NULL, 'z' },
 	{"action",	required_argument,	NULL, 'a' },
+	{"readmem", 	no_argument,		NULL, 'r' },
 	{0, 0, NULL,  0 }
 };
 
@@ -66,6 +67,11 @@ static void int_exit(int sig)
 struct config {
 	__u32 action;
 	int ifindex;
+	__u32 options;
+};
+enum cfg_options_flags {
+	NO_TOUCH = 0x0U,
+	READ_MEM = 0x1U,
 };
 #define XDP_ACTION_MAX (XDP_TX + 1)
 #define XDP_ACTION_MAX_STRLEN 11
@@ -109,6 +115,16 @@ static void list_xdp_actions(void)
 	printf("\n");
 }
 
+static char* options2str(enum cfg_options_flags flag)
+{
+	if (flag == NO_TOUCH)
+		return "no_touch";
+	if (flag & READ_MEM)
+		return "read";
+	fprintf(stderr, "ERR: Unknown config option flags");
+	exit(EXIT_FAIL);
+}
+
 static void usage(char *argv[])
 {
 	int i;
@@ -305,7 +321,7 @@ static __u64 calc_errs_pps(struct datarec *r,
 
 static void stats_print(struct stats_record *stats_rec,
 			struct stats_record *stats_prev,
-			int action)
+			int action, __u32 cfg_opt)
 {
 	unsigned int nr_rxqs = bpf_map__def(rx_queue_index_map)->max_entries;
 	unsigned int nr_cpus = bpf_num_possible_cpus();
@@ -316,8 +332,8 @@ static void stats_print(struct stats_record *stats_rec,
 	int i;
 
 	/* Header */
-	printf("\nRunning XDP on dev:%s (ifindex:%d) action:%s\n",
-	       ifname, ifindex, action2str(action));
+	printf("\nRunning XDP on dev:%s (ifindex:%d) action:%s options:%s\n",
+	       ifname, ifindex, action2str(action), options2str(cfg_opt));
 
 	/* stats_global_map */
 	{
@@ -399,7 +415,7 @@ static inline void swap(struct stats_record **a, struct stats_record **b)
 	*b = tmp;
 }
 
-static void stats_poll(int interval, int action)
+static void stats_poll(int interval, int action, __u32 cfg_opt)
 {
 	struct stats_record *record, *prev;
 
@@ -410,7 +426,7 @@ static void stats_poll(int interval, int action)
 	while (1) {
 		swap(&prev, &record);
 		stats_collect(record);
-		stats_print(record, prev, action);
+		stats_print(record, prev, action, cfg_opt);
 		sleep(interval);
 	}
 
@@ -421,6 +437,7 @@ static void stats_poll(int interval, int action)
 
 int main(int argc, char **argv)
 {
+	__u32 cfg_options= NO_TOUCH ; /* Default: Don't touch packet memory */
 	struct rlimit r = {10 * 1024 * 1024, RLIM_INFINITY};
 	struct bpf_prog_load_attr prog_load_attr = {
 		.prog_type	= BPF_PROG_TYPE_XDP,
@@ -435,6 +452,7 @@ int main(int argc, char **argv)
 	int interval = 2;
 	__u32 key = 0;
 
+
 	char action_str_buf[XDP_ACTION_MAX_STRLEN + 1 /* for \0 */] = { 0 };
 	int action = XDP_PASS; /* Default action */
 	char *action_str = NULL;
@@ -496,6 +514,9 @@ int main(int argc, char **argv)
 			action_str = (char *)&action_str_buf;
 			strncpy(action_str, optarg, XDP_ACTION_MAX_STRLEN);
 			break;
+		case 'r':
+			cfg_options |= READ_MEM;
+			break;
 		case 'h':
 		error:
 		default:
@@ -522,6 +543,7 @@ int main(int argc, char **argv)
 		}
 	}
 	cfg.action = action;
+	cfg.options = cfg_options;
 
 	/* Trick to pretty printf with thousands separators use %' */
 	if (use_separators)
@@ -542,6 +564,6 @@ int main(int argc, char **argv)
 		return EXIT_FAIL_XDP;
 	}
 
-	stats_poll(interval, action);
+	stats_poll(interval, action, cfg_options);
 	return EXIT_OK;
 }

^ permalink raw reply related

* [bpf-next PATCH 2/2] samples/bpf: xdp_rxq_info action XDP_TX must adjust MAC-addrs
From: Jesper Dangaard Brouer @ 2018-06-25 14:27 UTC (permalink / raw)
  To: netdev, Jesper Dangaard Brouer
  Cc: Daniel Borkmann, Toke Høiland-Jørgensen,
	Alexei Starovoitov
In-Reply-To: <152993682254.8835.8864318933370018087.stgit@firesoul>

XDP_TX requires also changing the MAC-addrs, else some hardware
may drop the TX packet before reaching the wire.  This was
observed with driver mlx5.

If xdp_rxq_info select --action XDP_TX the swapmac functionality
is activated.  It is also possible to manually enable via cmdline
option --swapmac.  This is practical if wanting to measure the
overhead of writing/updating payload for other action types.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
---
 samples/bpf/xdp_rxq_info_kern.c |   26 +++++++++++++++++++++++++-
 samples/bpf/xdp_rxq_info_user.c |   11 +++++++++++
 2 files changed, 36 insertions(+), 1 deletion(-)

diff --git a/samples/bpf/xdp_rxq_info_kern.c b/samples/bpf/xdp_rxq_info_kern.c
index 61af6210df2f..222a83eed1cb 100644
--- a/samples/bpf/xdp_rxq_info_kern.c
+++ b/samples/bpf/xdp_rxq_info_kern.c
@@ -21,6 +21,7 @@ struct config {
 enum cfg_options_flags {
 	NO_TOUCH = 0x0U,
 	READ_MEM = 0x1U,
+	SWAP_MAC = 0x2U,
 };
 struct bpf_map_def SEC("maps") config_map = {
 	.type		= BPF_MAP_TYPE_ARRAY,
@@ -52,6 +53,23 @@ struct bpf_map_def SEC("maps") rx_queue_index_map = {
 	.max_entries	= MAX_RXQs + 1,
 };
 
+static __always_inline
+void swap_src_dst_mac(void *data)
+{
+	unsigned short *p = data;
+	unsigned short dst[3];
+
+	dst[0] = p[0];
+	dst[1] = p[1];
+	dst[2] = p[2];
+	p[0] = p[3];
+	p[1] = p[4];
+	p[2] = p[5];
+	p[3] = dst[0];
+	p[4] = dst[1];
+	p[5] = dst[2];
+}
+
 SEC("xdp_prog0")
 int  xdp_prognum0(struct xdp_md *ctx)
 {
@@ -98,7 +116,7 @@ int  xdp_prognum0(struct xdp_md *ctx)
 		rxq_rec->issue++;
 
 	/* Default: Don't touch packet data, only count packets */
-	if (unlikely(config->options & READ_MEM)) {
+	if (unlikely(config->options & (READ_MEM|SWAP_MAC))) {
 		struct ethhdr *eth = data;
 
 		if (eth + 1 > data_end)
@@ -107,6 +125,12 @@ int  xdp_prognum0(struct xdp_md *ctx)
 		/* Avoid compiler removing this: Drop non 802.3 Ethertypes */
 		if (ntohs(eth->h_proto) < ETH_P_802_3_MIN)
 			return XDP_ABORTED;
+
+		/* XDP_TX requires changing MAC-addrs, else HW may drop.
+		 * Can also be enabled with --swapmac (for test purposes)
+		 */
+		if (unlikely(config->options & SWAP_MAC))
+			swap_src_dst_mac(data);
 	}
 
 	return config->action;
diff --git a/samples/bpf/xdp_rxq_info_user.c b/samples/bpf/xdp_rxq_info_user.c
index 435485d4f49e..248a7eab9531 100644
--- a/samples/bpf/xdp_rxq_info_user.c
+++ b/samples/bpf/xdp_rxq_info_user.c
@@ -51,6 +51,7 @@ static const struct option long_options[] = {
 	{"no-separators", no_argument,		NULL, 'z' },
 	{"action",	required_argument,	NULL, 'a' },
 	{"readmem", 	no_argument,		NULL, 'r' },
+	{"swapmac", 	no_argument,		NULL, 'm' },
 	{0, 0, NULL,  0 }
 };
 
@@ -72,6 +73,7 @@ struct config {
 enum cfg_options_flags {
 	NO_TOUCH = 0x0U,
 	READ_MEM = 0x1U,
+	SWAP_MAC = 0x2U,
 };
 #define XDP_ACTION_MAX (XDP_TX + 1)
 #define XDP_ACTION_MAX_STRLEN 11
@@ -119,6 +121,8 @@ static char* options2str(enum cfg_options_flags flag)
 {
 	if (flag == NO_TOUCH)
 		return "no_touch";
+	if (flag & SWAP_MAC)
+		return "swapmac";
 	if (flag & READ_MEM)
 		return "read";
 	fprintf(stderr, "ERR: Unknown config option flags");
@@ -517,6 +521,9 @@ int main(int argc, char **argv)
 		case 'r':
 			cfg_options |= READ_MEM;
 			break;
+		case 'm':
+			cfg_options |= SWAP_MAC;
+			break;
 		case 'h':
 		error:
 		default:
@@ -543,6 +550,10 @@ int main(int argc, char **argv)
 		}
 	}
 	cfg.action = action;
+
+	/* XDP_TX requires changing MAC-addrs, else HW may drop */
+	if (action == XDP_TX)
+		cfg_options |= SWAP_MAC;
 	cfg.options = cfg_options;
 
 	/* Trick to pretty printf with thousands separators use %' */

^ permalink raw reply related

* Re: [PATCH RFC ipsec-next] xfrm: Extend the output_mark to support input direction and masking.
From: Steffen Klassert @ 2018-06-25 14:31 UTC (permalink / raw)
  To: netdev; +Cc: Tobias Brunner, Eyal Birger, Lorenzo Colitti
In-Reply-To: <20180615065514.bmy6tamr4fqivpyp@gauss3.secunet.de>

On Fri, Jun 15, 2018 at 08:55:14AM +0200, Steffen Klassert wrote:
> We already support setting an output mark at the xfrm_state,
> unfortunately this does not support the input direction and
> masking the marks that will be applied to the skb. This change
> adds support applying a masked value in both directions.
> 
> The existing XFRMA_OUTPUT_MARK number is reused for this purpose
> and as it is now bi-directional, it is renamed to XFRMA_SET_MARK.
> 
> An additional XFRMA_SET_MARK_MASK attribute is added for setting the
> mask. If the attribute mask not provided, it is set to 0xffffffff,
> keeping the XFRMA_OUTPUT_MARK existing 'full mask' semantics.
> 
> Co-developed-by: Tobias Brunner <tobias@strongswan.org>
> Co-developed-by: Eyal Birger <eyal.birger@gmail.com>
> Co-developed-by: Lorenzo Colitti <lorenzo@google.com>
> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
> Signed-off-by: Tobias Brunner <tobias@strongswan.org>
> Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
> Signed-off-by: Lorenzo Colitti <lorenzo@google.com>

This is now applied to ipsec-next.

^ permalink raw reply

* Re: [PATCH RFC v2 ipsec-next 0/3] Virtual xfrm interfaces
From: Steffen Klassert @ 2018-06-25 14:34 UTC (permalink / raw)
  To: netdev, David Miller
  Cc: Eyal Birger, Antony Antony, Benedict Wong, Lorenzo Colitti,
	Shannon Nelson
In-Reply-To: <20180612075610.2000-1-steffen.klassert@secunet.com>

On Tue, Jun 12, 2018 at 09:56:07AM +0200, Steffen Klassert wrote:
> This patchset introduces new virtual xfrm interfaces.
> The design of virtual xfrm interfaces interfaces was
> discussed at the Linux IPsec workshop 2018. This patchset
> implements these interfaces as the IPsec userspace and
> kernel developers agreed. The purpose of these interfaces
> is to overcome the design limitations that the existing
> VTI devices have.
> 
> The main limitations that we see with the current VTI are the
> following:
> 
> - VTI interfaces are L3 tunnels with configurable endpoints.
>   For xfrm, the tunnel endpoint are already determined by the SA.
>   So the VTI tunnel endpoints must be either the same as on the
>   SA or wildcards. In case VTI tunnel endpoints are same as on
>   the SA, we get a one to one correlation between the SA and
>   the tunnel. So each SA needs its own tunnel interface.
> 
>   On the other hand, we can have only one VTI tunnel with
>   wildcard src/dst tunnel endpoints in the system because the
>   lookup is based on the tunnel endpoints. The existing tunnel
>   lookup won't work with multiple tunnels with wildcard
>   tunnel endpoints. Some usecases require more than on
>   VTI tunnel of this type, for example if somebody has multiple
>   namespaces and every namespace requires such a VTI.
> 
> - VTI needs separate interfaces for IPv4 and IPv6 tunnels.
>   So when routing to a VTI, we have to know to which address
>   family this traffic class is going to be encapsulated.
>   This is a lmitation because it makes routing more complex
>   and it is not always possible to know what happens behind the
>   VTI, e.g. when the VTI is move to some namespace.
> 
> - VTI works just with tunnel mode SAs. We need generic interfaces
>   that ensures transfomation, regardless of the xfrm mode and
>   the encapsulated address family.
> 
> - VTI is configured with a combination GRE keys and xfrm marks.
>   With this we have to deal with some extra cases in the generic
>   tunnel lookup because the GRE keys on the VTI are actually
>   not GRE keys, the GRE keys were just reused for something else.
>   All extensions to the VTI interfaces would require to add
>   even more complexity to the generic tunnel lookup.
> 
> To overcome this, we started with the following design goal:
> 
> - It should be possible to tunnel IPv4 and IPv6 through the same
>   interface.
> 
> - No limitation on xfrm mode (tunnel, transport and beet).
> 
> - Should be a generic virtual interface that ensures IPsec
>   transformation, no need to know what happens behind the
>   interface.
> 
> - Interfaces should be configured with a new key that must match a
>   new policy/SA lookup key.
> 
> - The lookup logic should stay in the xfrm codebase, no need to
>   change or extend generic routing and tunnel lookups.
> 
> - Should be possible to use IPsec hardware offloads of the underlying
>   interface.
> 
> Changes from v1:
> 
> - Document the limitations of VTI interfaces and the design of
>   the new xfrm interfaces more explicit in the commit messages.
> 
> - No code changes.

I have not got any further comments, so applied to ipsec-next.

^ permalink raw reply

* [PATCH 4/4] net: lan78xx: Use s/w csum check on VLANs without tag stripping
From: Dave Stevenson @ 2018-06-25 14:07 UTC (permalink / raw)
  To: woojung.huh, UNGLinuxDriver, davem, netdev; +Cc: Dave Stevenson
In-Reply-To: <cover.1529935234.git.dave.stevenson@raspberrypi.org>

Observations of VLANs dropping packets due to invalid
checksums when not offloading VLAN tag receive.
With VLAN tag stripping enabled no issue is observed.

Drop back to s/w checksums if VLAN offload is disabled.

Signed-off-by: Dave Stevenson <dave.stevenson@raspberrypi.org>
---
 drivers/net/usb/lan78xx.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c
index f72a8f5..6f2ea84 100644
--- a/drivers/net/usb/lan78xx.c
+++ b/drivers/net/usb/lan78xx.c
@@ -3052,8 +3052,13 @@ static void lan78xx_rx_csum_offload(struct lan78xx_net *dev,
 				    struct sk_buff *skb,
 				    u32 rx_cmd_a, u32 rx_cmd_b)
 {
+	/* HW Checksum offload appears to be flawed if used when not stripping
+	 * VLAN headers. Drop back to S/W checksums under these conditions.
+	 */
 	if (!(dev->net->features & NETIF_F_RXCSUM) ||
-	    unlikely(rx_cmd_a & RX_CMD_A_ICSM_)) {
+	    unlikely(rx_cmd_a & RX_CMD_A_ICSM_) ||
+	    ((rx_cmd_a & RX_CMD_A_FVTG_) &&
+	     !(dev->net->features & NETIF_F_HW_VLAN_CTAG_RX))) {
 		skb->ip_summed = CHECKSUM_NONE;
 	} else {
 		skb->csum = ntohs((u16)(rx_cmd_b >> RX_CMD_B_CSUM_SHIFT_));
-- 
2.7.4

^ permalink raw reply related

* Re: [PATCH ipsec-next] xfrm: policy: remove pcpu policy cache
From: Steffen Klassert @ 2018-06-25 14:42 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netdev
In-Reply-To: <20180625115753.13161-1-fw@strlen.de>

On Mon, Jun 25, 2018 at 01:57:53PM +0200, Florian Westphal wrote:
> Kristian Evensen says:
>   In a project I am involved in, we are running ipsec (Strongswan) on
>   different mt7621-based routers. Each router is configured as an
>   initiator and has around ~30 tunnels to different responders (running
>   on misc. devices). Before the flow cache was removed (kernel 4.9), we
>   got a combined throughput of around 70Mbit/s for all tunnels on one
>   router. However, we recently switched to kernel 4.14 (4.14.48), and
>   the total throughput is somewhere around 57Mbit/s (best-case). I.e., a
>   drop of around 20%. Reverting the flow cache removal restores, as
>   expected, performance levels to that of kernel 4.9.
> 
> When pcpu xdst exists, it has to be validated first before it can be
> used.
> 
> A negative hit thus increases cost vs. no-cache.
> 
> As number of tunnels increases, hit rate decreases so this pcpu caching
> isn't a viable strategy.
> 
> Furthermore, the xdst cache also needs to run with BH off, so when
> removing this the bh disable/enable pairs can be removed too.
> 
> Kristian tested a 4.14.y backport of this change and reported
> increased performance:
> 
>   In our tests, the throughput reduction has been reduced from around -20%
>   to -5%. We also see that the overall throughput is independent of the
>   number of tunnels, while before the throughput was reduced as the number
>   of tunnels increased.
> 
> Reported-by: Kristian Evensen <kristian.evensen@gmail.com>
> Signed-off-by: Florian Westphal <fw@strlen.de>

Can you please rebase this to ipsec-next current?

It does not apply cleanly after the merge of the
xfrm interface patches.

Thanks!

^ permalink raw reply

* [PATCH net-next v2] selftests: net: Test headroom handling of ip6_gre devices
From: Petr Machata @ 2018-06-25 14:43 UTC (permalink / raw)
  To: netdev, linux-kselftest; +Cc: davem, shuah, u9012063

Commit 5691484df961 ("net: ip6_gre: Fix headroom request in
ip6erspan_tunnel_xmit()") and commit 01b8d064d58b ("net: ip6_gre:
Request headroom in __gre6_xmit()") fix problems in reserving headroom
in the packets tunneled through ip6gre/tap and ip6erspan netdevices.

These two patches included snippets that reproduced the issues. This
patch elevates the snippets to a full-fledged test case.

Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: Petr Machata <petrm@mellanox.com>
---

Notes:
    Changes between v1 and v2:
    
    - Move tunnel construction to setup() and destruction to cleanup().

 tools/testing/selftests/net/ip6_gre_headroom.sh | 65 +++++++++++++++++++++++++
 1 file changed, 65 insertions(+)
 create mode 100755 tools/testing/selftests/net/ip6_gre_headroom.sh

diff --git a/tools/testing/selftests/net/ip6_gre_headroom.sh b/tools/testing/selftests/net/ip6_gre_headroom.sh
new file mode 100755
index 000000000000..5b41e8bb6e2d
--- /dev/null
+++ b/tools/testing/selftests/net/ip6_gre_headroom.sh
@@ -0,0 +1,65 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Test that enough headroom is reserved for the first packet passing through an
+# IPv6 GRE-like netdevice.
+
+setup_prepare()
+{
+	ip link add h1 type veth peer name swp1
+	ip link add h3 type veth peer name swp3
+
+	ip link set dev h1 up
+	ip address add 192.0.2.1/28 dev h1
+
+	ip link add dev vh3 type vrf table 20
+	ip link set dev h3 master vh3
+	ip link set dev vh3 up
+	ip link set dev h3 up
+
+	ip link set dev swp3 up
+	ip address add dev swp3 2001:db8:2::1/64
+	ip address add dev swp3 2001:db8:2::3/64
+
+	ip link set dev swp1 up
+	tc qdisc add dev swp1 clsact
+
+	ip link add name er6 type ip6erspan \
+	   local 2001:db8:2::1 remote 2001:db8:2::2 oseq okey 123
+	ip link set dev er6 up
+
+	ip link add name gt6 type ip6gretap \
+	   local 2001:db8:2::3 remote 2001:db8:2::4
+	ip link set dev gt6 up
+
+	sleep 1
+}
+
+cleanup()
+{
+	ip link del dev gt6
+	ip link del dev er6
+	ip link del dev swp1
+	ip link del dev swp3
+	ip link del dev vh3
+}
+
+test_headroom()
+{
+	local type=$1; shift
+	local tundev=$1; shift
+
+	tc filter add dev swp1 ingress pref 1000 matchall skip_hw \
+		action mirred egress mirror dev $tundev
+	ping -I h1 192.0.2.2 -c 1 -w 2 &> /dev/null
+	tc filter del dev swp1 ingress pref 1000
+
+	# If it doesn't panic, it passes.
+	printf "TEST: %-60s  [PASS]\n" "$type headroom"
+}
+
+trap cleanup EXIT
+
+setup_prepare
+test_headroom ip6gretap gt6
+test_headroom ip6erspan er6
-- 
2.4.11

^ permalink raw reply related

* Re: [PATCH ipsec] xfrm: free skb if nlsk pointer is NULL
From: Steffen Klassert @ 2018-06-25 14:45 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netdev
In-Reply-To: <20180625120007.13345-1-fw@strlen.de>

On Mon, Jun 25, 2018 at 02:00:07PM +0200, Florian Westphal wrote:
> nlmsg_multicast() always frees the skb, so in case we cannot call
> it we must do that ourselves.
> 
> Fixes: 21ee543edc0dea ("xfrm: fix race between netns cleanup and state expire notification")
> Signed-off-by: Florian Westphal <fw@strlen.de>

Applied, thanks Florian!

^ permalink raw reply

* Re: [PATCH net-next] rds: clean up loopback rds_connections on netns deletion
From: Sowmini Varadhan @ 2018-06-25 14:49 UTC (permalink / raw)
  To: davem, rds-devel, santosh.shilimkar, netdev; +Cc: syzkaller-bugs
In-Reply-To: <1529934085-181126-1-git-send-email-sowmini.varadhan@oracle.com>

On (06/25/18 06:41), Sowmini Varadhan wrote:
  :
> Add the changes aligned with the changes from
> commit ebeeb1ad9b8a ("rds: tcp: use rds_destroy_pending() to synchronize
> netns/module teardown and rds connection/workq management") for
> rds_loop_transport

FWIW, I am optimistic that this will take care of a number
of the use-after-free panics reported by syzbot (I have not
marked the patch with the recommended syzkaller Reported-by 
tags because I was not able to reproduce each original issue, 
but inspection of the traces suggests this missing patch may 
be behind the races that cause the reports).

--Sowmini

^ permalink raw reply

* Re: [PATCH] ipv6: avoid copy_from_user() via ipv6_renew_options_kern()
From: Paul Moore @ 2018-06-25 14:49 UTC (permalink / raw)
  To: davem; +Cc: viro, Paul Moore, netdev, selinux, linux-security-module
In-Reply-To: <20180624.164837.37612664745856114.davem@davemloft.net>

On Sun, Jun 24, 2018 at 3:48 AM David Miller <davem@davemloft.net> wrote:
>
> From: Al Viro <viro@ZenIV.linux.org.uk>
> Date: Sat, 23 Jun 2018 23:21:07 +0100
>
> > BTW, I wonder if the life would be simpler with do_ipv6_setsockopt() doing
> > the copy-in and verifying ipv6_optlen(*hdr) <= newoptlen; that would've
> > simplified ipv6_renew_option{,s}() quite a bit and completely eliminated
> > ipv6_renew_options_kern()...
>
> I agree that this makes things a lot simpler.

I had looked at moving the userspace copy up, but feared it was a bit
too invasive.  It sounds like you are open to the idea so I'll code
something up.

> One thing that drives me crazy though is this inherit stuff:
>
> > +     ipv6_renew_option(newtype == IPV6_HOPOPTS ? newopt :
> > +                             opt ? opt->hopopt : NULL,
>
> Why don't we pass the type into ipv6_renew_option() and have it
> do this pointer dance instead?
>
> That's going to definitely be easier to read.

I agree, that struck me as a little odd.  I'll rework that too.  I'll
send you guys something this week to take a look at.

Thanks.

> I don't know enough about this code to give feedback about the
> option length handling wrt. copies, sorry.

-- 
paul moore
www.paul-moore.com

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox