Netdev List

* [PATCH net-next 3/3] RDS: TCP: Force every connection to be initiated by numerically smaller IP address
From: Sowmini Varadhan @ 2016-11-16 21:29 UTC (permalink / raw)
  To: netdev; +Cc: santosh.shilimkar, sowmini.varadhan, davem, rds-devel
In-Reply-To: <cover.1478876910.git.sowmini.varadhan@oracle.com>

When 2 RDS peers initiate an RDS-TCP connection simultaneously,
there is a potential for "duelling syns" on either/both sides.
See commit 241b271952eb ("RDS-TCP: Reset tcp callbacks if re-using an
outgoing socket in rds_tcp_accept_one()") for a description of this
condition, and the arbitration logic which ensures that the
numerically larger IP address in the TCP connection is bound to the
RDS_TCP_PORT ("canonical ordering").

The rds_connection should not be marked as RDS_CONN_UP until the
arbitration logic has converged, for the following reason. The sender
may start transmitting RDS datagrams as soon as RDS_CONN_UP is set,
and it removes datagrams from the rds_connection's cp_retrans queue
based on TCP acks. If the TCP ack was sent from a tcp socket that got
reset as part of duel arbitration (but before data was delivered to
the receiver's RDS socket layer), the sender may end up prematurely
freeing the datagram, and the datagram is no longer reliably
deliverable.

This patch remedies that condition by making sure that, upon
receipt of 3WH completion state change notification of TCP_ESTABLISHED
in rds_tcp_state_change, we mark the rds_connection as RDS_CONN_UP
if, and only if, the IP addresses and ports for the connection are
canonically ordered. In all other cases, rds_tcp_state_change will
force an rds_conn_path_drop(), and rds_queue_reconnect() on
both peers will restart the connection to ensure canonical ordering.
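
The decision described above can be sketched in isolation. This is a
simplified model, not the patch itself: the enum and function names are
hypothetical, the addresses are assumed to be IPv4 values already in host
byte order, and the extra rds_conn_path_transition() check that the real
code performs before dropping the path is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical outcome names; the patch itself calls
 * rds_connect_path_complete() or rds_conn_path_drop().
 */
enum established_action {
	MARK_CONN_UP,		/* smaller address initiated: canonical */
	DROP_AND_RECONNECT,	/* larger address initiated: start over */
};

/* laddr and faddr are assumed to be host-byte-order IPv4 addresses,
 * so ">" is a plain numeric comparison.
 */
enum established_action on_tcp_established(uint32_t laddr, uint32_t faddr)
{
	return laddr > faddr ? DROP_AND_RECONNECT : MARK_CONN_UP;
}

/* Usage example: 10.0.0.1 and 10.0.0.2 both see TCP_ESTABLISHED on a
 * socket they opened themselves.
 */
void established_example(void)
{
	uint32_t a = 0x0a000001u;	/* 10.0.0.1 */
	uint32_t b = 0x0a000002u;	/* 10.0.0.2 */

	assert(on_tcp_established(a, b) == MARK_CONN_UP);
	assert(on_tcp_established(b, a) == DROP_AND_RECONNECT);
}
```

Only the node with the smaller address ever reaches the "up" outcome on a
socket it initiated, which is what makes the ordering canonical.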

A side-effect of enforcing this condition in rds_tcp_state_change()
is that rds_tcp_accept_one_path() can now be refactored for simplicity.
It is also no longer possible to encounter an RDS_CONN_UP connection in
the arbitration logic in rds_tcp_accept_one().

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
---
 net/rds/connection.c  |    1 +
 net/rds/tcp_connect.c |   14 +++++++++++++-
 net/rds/tcp_listen.c  |   29 ++++++++++++-----------------
 3 files changed, 26 insertions(+), 18 deletions(-)

diff --git a/net/rds/connection.c b/net/rds/connection.c
index b86e188..fe9d31c 100644
--- a/net/rds/connection.c
+++ b/net/rds/connection.c
@@ -683,6 +683,7 @@ void rds_conn_path_connect_if_down(struct rds_conn_path *cp)
 	    !test_and_set_bit(RDS_RECONNECT_PENDING, &cp->cp_flags))
 		queue_delayed_work(rds_wq, &cp->cp_conn_w, 0);
 }
+EXPORT_SYMBOL_GPL(rds_conn_path_connect_if_down);

 void rds_conn_connect_if_down(struct rds_connection *conn)
 {
diff --git a/net/rds/tcp_connect.c b/net/rds/tcp_connect.c
index 05f61c5..d6839d9 100644
--- a/net/rds/tcp_connect.c
+++ b/net/rds/tcp_connect.c
@@ -60,7 +60,19 @@ void rds_tcp_state_change(struct sock *sk)
 	case TCP_SYN_RECV:
 		break;
 	case TCP_ESTABLISHED:
-		rds_connect_path_complete(cp, RDS_CONN_CONNECTING);
+		/* Force the peer to reconnect so that we have the
+		 * TCP ports going from <smaller-ip>.<transient> to
+		 * <larger-ip>.<RDS_TCP_PORT>. We avoid marking the
+		 * RDS connection as RDS_CONN_UP until the reconnect,
+		 * to avoid RDS datagram loss.
+		 */
+		if (cp->cp_conn->c_laddr > cp->cp_conn->c_faddr &&
+		    rds_conn_path_transition(cp, RDS_CONN_CONNECTING,
+					     RDS_CONN_ERROR)) {
+			rds_conn_path_drop(cp);
+		} else {
+			rds_connect_path_complete(cp, RDS_CONN_CONNECTING);
+		}
 		break;
 	case TCP_CLOSE_WAIT:
 	case TCP_CLOSE:
diff --git a/net/rds/tcp_listen.c b/net/rds/tcp_listen.c
index c9c4968..f74bab3 100644
--- a/net/rds/tcp_listen.c
+++ b/net/rds/tcp_listen.c
@@ -83,25 +83,20 @@ struct rds_tcp_connection *rds_tcp_accept_one_path(struct rds_connection *conn)
 {
 	int i;
 	bool peer_is_smaller = (conn->c_faddr < conn->c_laddr);
-	int npaths = conn->c_npaths;
-
-	if (npaths <= 1) {
-		struct rds_conn_path *cp = &conn->c_path[0];
-		int ret;
-
-		ret = rds_conn_path_transition(cp, RDS_CONN_DOWN,
-					       RDS_CONN_CONNECTING);
-		if (!ret)
-			rds_conn_path_transition(cp, RDS_CONN_ERROR,
-						 RDS_CONN_CONNECTING);
-		return cp->cp_transport_data;
-	}
+	int npaths = max_t(int, 1, conn->c_npaths);

-	/* for mprds, paths with cp_index > 0 MUST be initiated by the peer
+	/* for mprds, all paths MUST be initiated by the peer
 	 * with the smaller address.
 	 */
-	if (!peer_is_smaller)
+	if (!peer_is_smaller) {
+		/* Make sure we initiate at least one path if this
+		 * has not already been done; rds_start_mprds() will
+		 * take care of additional paths, if necessary.
+		 */
+		if (npaths == 1)
+			rds_conn_path_connect_if_down(&conn->c_path[0]);
 		return NULL;
+	}

 	for (i = 0; i < npaths; i++) {
 		struct rds_conn_path *cp = &conn->c_path[i];
@@ -171,8 +166,8 @@ int rds_tcp_accept_one(struct socket *sock)
 	mutex_lock(&rs_tcp->t_conn_path_lock);
 	cp = rs_tcp->t_cpath;
 	conn_state = rds_conn_path_state(cp);
-	if (conn_state != RDS_CONN_CONNECTING && conn_state != RDS_CONN_UP &&
-	    conn_state != RDS_CONN_ERROR)
+	WARN_ON(conn_state == RDS_CONN_UP);
+	if (conn_state != RDS_CONN_CONNECTING && conn_state != RDS_CONN_ERROR)
 		goto rst_nsk;
 	if (rs_tcp->t_sock) {
 		/* Need to resolve a duelling SYN between peers.
-- 
1.7.1

* [PATCH net-next 1/3] RDS: TCP: set RDS_FLAG_RETRANSMITTED in cp_retrans list
From: Sowmini Varadhan @ 2016-11-16 21:29 UTC (permalink / raw)
  To: netdev; +Cc: santosh.shilimkar, sowmini.varadhan, davem, rds-devel
In-Reply-To: <cover.1478876910.git.sowmini.varadhan@oracle.com>

As noted in rds_recv_incoming(), sequence numbers on data packets
can decrease for the failover case, and the Rx path is equipped
to recover from this if RDS_FLAG_RETRANSMITTED is set
on the rds header of an incoming message with a suspect sequence
number.

The RDS_FLAG_RETRANSMITTED header flag is predicated on the
RDS_MSG_RETRANSMITTED flag in the rds_message, so make sure the
header flag is set on messages
queued for retransmission.
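
As a rough illustration of that mapping, the sketch below copies an
internal "retransmitted" bit into a header flag byte. The struct is a
hypothetical stand-in for rds_message, and the DEMO_ constants are
illustrative values chosen to resemble the bit index and header mask used
in net/rds/rds.h.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_MSG_RETRANSMITTED	5	/* bit index in m_flags */
#define DEMO_FLAG_RETRANSMITTED	0x04	/* mask in the header flags byte */

struct demo_msg {		/* hypothetical stand-in for rds_message */
	unsigned long m_flags;	/* internal bookkeeping bits */
	uint8_t h_flags;	/* flags byte carried in the wire header */
};

/* Copy the internal "this is a retransmit" state into the header so
 * that the receiver can recognise a suspect sequence number.
 */
void demo_prepare_xmit(struct demo_msg *rm)
{
	if (rm->m_flags & (1UL << DEMO_MSG_RETRANSMITTED))
		rm->h_flags |= DEMO_FLAG_RETRANSMITTED;
}

/* Usage example: a message marked for retransmission gets the header
 * flag, a fresh message does not.
 */
void demo_xmit_example(void)
{
	struct demo_msg fresh = { 0, 0 };
	struct demo_msg again = { 1UL << DEMO_MSG_RETRANSMITTED, 0 };

	demo_prepare_xmit(&fresh);
	demo_prepare_xmit(&again);
	assert(fresh.h_flags == 0);
	assert(again.h_flags == DEMO_FLAG_RETRANSMITTED);
}
```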

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
---
 net/rds/tcp_send.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/net/rds/tcp_send.c b/net/rds/tcp_send.c
index 89d09b4..dcf4742 100644
--- a/net/rds/tcp_send.c
+++ b/net/rds/tcp_send.c
@@ -100,6 +100,9 @@ int rds_tcp_xmit(struct rds_connection *conn, struct rds_message *rm,
 		set_bit(RDS_MSG_HAS_ACK_SEQ, &rm->m_flags);
 		tc->t_last_expected_una = rm->m_ack_seq + 1;
 
+		if (test_bit(RDS_MSG_RETRANSMITTED, &rm->m_flags))
+			rm->m_inc.i_hdr.h_flags |= RDS_FLAG_RETRANSMITTED;
+
 		rdsdebug("rm %p tcp nxt %u ack_seq %llu\n",
 			 rm, rds_tcp_snd_nxt(tc),
 			 (unsigned long long)rm->m_ack_seq);
-- 
1.7.1

* [PATCH net-next 2/3] RDS: TCP: Track peer's connection generation number
From: Sowmini Varadhan @ 2016-11-16 21:29 UTC (permalink / raw)
  To: netdev; +Cc: santosh.shilimkar, sowmini.varadhan, davem, rds-devel
In-Reply-To: <cover.1478876910.git.sowmini.varadhan@oracle.com>

The RDS transport has to be able to distinguish between
two types of failure events:
(a) when the transport fails (e.g., TCP connection reset)
    but the RDS socket/connection layer on both sides stays
    the same
(b) when the peer's RDS layer itself resets (e.g., due to module
    reload or machine reboot at the peer)
In case (a) both sides must reconnect and continue the RDS messaging
without any message loss or disruption to the message sequence numbers,
and this is achieved by rds_send_path_reset().

In case (b) we should reset all rds_connection state to the
new incarnation of the peer. Examples of state that needs to
be reset are next expected rx sequence number from, or messages to be
retransmitted to, the new incarnation of the peer.

To achieve this, the RDS handshake probe added as part of
commit 5916e2c1554f ("RDS: TCP: Enable multipath RDS for TCP")
is enhanced so that sender and receiver of the RDS ping-probe
will add a generation number as part of the RDS_EXTHDR_GEN_NUM
extension header. Each peer stores local and remote generation
numbers as part of each rds_connection. Changes in generation
number will be detected via incoming handshake probe ping
request or response and will allow the receiver to reset rds_connection
state.
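
The detection rule can be modelled on its own. The sketch below is a
single-path simplification with hypothetical names; it assumes, as the
patch does, that a generation number of 0 means "none learned yet", and
it only resets the sequence counters (the real code also marks queued
retransmissions for flushing).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct demo_conn {		/* hypothetical single-path stand-in */
	uint32_t peer_gen_num;	/* 0 means "not learned yet" */
	uint64_t next_tx_seq;
	uint64_t next_rx_seq;
};

/* Returns true when a new incarnation of the peer was detected and the
 * sequence state was reset.
 */
bool demo_peer_gen_update(struct demo_conn *c, uint32_t peer_gen_num)
{
	bool reset = false;

	if (peer_gen_num == 0)	/* probe carried no generation number */
		return false;

	if (c->peer_gen_num != 0 && c->peer_gen_num != peer_gen_num) {
		c->next_tx_seq = 1;
		c->next_rx_seq = 0;
		reset = true;
	}
	c->peer_gen_num = peer_gen_num;
	return reset;
}

/* Usage example: learning the first generation number is not a reset,
 * seeing a different one later is.
 */
void demo_gen_example(void)
{
	struct demo_conn c = { 0, 10, 7 };

	assert(!demo_peer_gen_update(&c, 100));	/* first contact */
	assert(c.next_tx_seq == 10);
	assert(demo_peer_gen_update(&c, 200));	/* peer restarted */
	assert(c.next_tx_seq == 1 && c.next_rx_seq == 0);
}
```

A plain transport failure (case (a)) never changes the peer's generation
number, so it does not take the reset branch.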

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
---
 net/rds/af_rds.c     |    4 ++++
 net/rds/connection.c |    2 ++
 net/rds/message.c    |    1 +
 net/rds/rds.h        |    8 +++++++-
 net/rds/recv.c       |   36 ++++++++++++++++++++++++++++++++++++
 net/rds/send.c       |    9 +++++++--
 6 files changed, 57 insertions(+), 3 deletions(-)

diff --git a/net/rds/af_rds.c b/net/rds/af_rds.c
index 6beaeb1..2ac1e61 100644
--- a/net/rds/af_rds.c
+++ b/net/rds/af_rds.c
@@ -605,10 +605,14 @@ static void rds_exit(void)
 }
 module_exit(rds_exit);
 
+u32 rds_gen_num;
+
 static int rds_init(void)
 {
 	int ret;
 
+	net_get_random_once(&rds_gen_num, sizeof(rds_gen_num));
+
 	ret = rds_bind_lock_init();
 	if (ret)
 		goto out;
diff --git a/net/rds/connection.c b/net/rds/connection.c
index 13f459d..b86e188 100644
--- a/net/rds/connection.c
+++ b/net/rds/connection.c
@@ -269,6 +269,8 @@ static void __rds_conn_path_init(struct rds_connection *conn,
 			kmem_cache_free(rds_conn_slab, conn);
 			conn = found;
 		} else {
+			conn->c_my_gen_num = rds_gen_num;
+			conn->c_peer_gen_num = 0;
 			hlist_add_head_rcu(&conn->c_hash_node, head);
 			rds_cong_add_conn(conn);
 			rds_conn_count++;
diff --git a/net/rds/message.c b/net/rds/message.c
index 6cb9106..49bfb51 100644
--- a/net/rds/message.c
+++ b/net/rds/message.c
@@ -42,6 +42,7 @@
 [RDS_EXTHDR_RDMA]	= sizeof(struct rds_ext_header_rdma),
 [RDS_EXTHDR_RDMA_DEST]	= sizeof(struct rds_ext_header_rdma_dest),
 [RDS_EXTHDR_NPATHS]	= sizeof(u16),
+[RDS_EXTHDR_GEN_NUM]	= sizeof(u32),
 };
 
 
diff --git a/net/rds/rds.h b/net/rds/rds.h
index 4121e18..ebbf909 100644
--- a/net/rds/rds.h
+++ b/net/rds/rds.h
@@ -151,6 +151,9 @@ struct rds_connection {
 
 	struct rds_conn_path	c_path[RDS_MPATH_WORKERS];
 	wait_queue_head_t	c_hs_waitq; /* handshake waitq */
+
+	u32			c_my_gen_num;
+	u32			c_peer_gen_num;
 };
 
 static inline
@@ -243,7 +246,8 @@ struct rds_ext_header_rdma_dest {
 /* Extension header announcing number of paths.
  * Implicit length = 2 bytes.
  */
-#define RDS_EXTHDR_NPATHS	4
+#define RDS_EXTHDR_NPATHS	5
+#define RDS_EXTHDR_GEN_NUM	6
 
 #define __RDS_EXTHDR_MAX	16 /* for now */
 
@@ -338,6 +342,7 @@ static inline u32 rds_rdma_cookie_offset(rds_rdma_cookie_t cookie)
 #define RDS_MSG_RETRANSMITTED	5
 #define RDS_MSG_MAPPED		6
 #define RDS_MSG_PAGEVEC		7
+#define RDS_MSG_FLUSH		8
 
 struct rds_message {
 	atomic_t		m_refcount;
@@ -664,6 +669,7 @@ static inline void __rds_wake_sk_sleep(struct sock *sk)
 struct rds_message *rds_cong_update_alloc(struct rds_connection *conn);
 
 /* conn.c */
+extern u32 rds_gen_num;
 int rds_conn_init(void);
 void rds_conn_exit(void);
 struct rds_connection *rds_conn_create(struct net *net,
diff --git a/net/rds/recv.c b/net/rds/recv.c
index cbfabdf..9d0666e 100644
--- a/net/rds/recv.c
+++ b/net/rds/recv.c
@@ -120,6 +120,36 @@ static void rds_recv_rcvbuf_delta(struct rds_sock *rs, struct sock *sk,
 	/* do nothing if no change in cong state */
 }
 
+static void rds_conn_peer_gen_update(struct rds_connection *conn,
+				     u32 peer_gen_num)
+{
+	int i;
+	struct rds_message *rm, *tmp;
+	unsigned long flags;
+
+	WARN_ON(conn->c_trans->t_type != RDS_TRANS_TCP);
+	if (peer_gen_num != 0) {
+		if (conn->c_peer_gen_num != 0 &&
+		    peer_gen_num != conn->c_peer_gen_num) {
+			for (i = 0; i < RDS_MPATH_WORKERS; i++) {
+				struct rds_conn_path *cp;
+
+				cp = &conn->c_path[i];
+				spin_lock_irqsave(&cp->cp_lock, flags);
+				cp->cp_next_tx_seq = 1;
+				cp->cp_next_rx_seq = 0;
+				list_for_each_entry_safe(rm, tmp,
+							 &cp->cp_retrans,
+							 m_conn_item) {
+					set_bit(RDS_MSG_FLUSH, &rm->m_flags);
+				}
+				spin_unlock_irqrestore(&cp->cp_lock, flags);
+			}
+		}
+		conn->c_peer_gen_num = peer_gen_num;
+	}
+}
+
 /*
  * Process all extension headers that come with this message.
  */
@@ -163,7 +193,9 @@ static void rds_recv_hs_exthdrs(struct rds_header *hdr,
 	union {
 		struct rds_ext_header_version version;
 		u16 rds_npaths;
+		u32 rds_gen_num;
 	} buffer;
+	u32 new_peer_gen_num = 0;
 
 	while (1) {
 		len = sizeof(buffer);
@@ -176,6 +208,9 @@ static void rds_recv_hs_exthdrs(struct rds_header *hdr,
 			conn->c_npaths = min_t(int, RDS_MPATH_WORKERS,
 					       buffer.rds_npaths);
 			break;
+		case RDS_EXTHDR_GEN_NUM:
+			new_peer_gen_num = buffer.rds_gen_num;
+			break;
 		default:
 			pr_warn_ratelimited("ignoring unknown exthdr type "
 					     "0x%x\n", type);
@@ -183,6 +218,7 @@ static void rds_recv_hs_exthdrs(struct rds_header *hdr,
 	}
 	/* if RDS_EXTHDR_NPATHS was not found, default to a single-path */
 	conn->c_npaths = max_t(int, conn->c_npaths, 1);
+	rds_conn_peer_gen_update(conn, new_peer_gen_num);
 }
 
 /* rds_start_mprds() will synchronously start multiple paths when appropriate.
diff --git a/net/rds/send.c b/net/rds/send.c
index 896626b..77c8c6e 100644
--- a/net/rds/send.c
+++ b/net/rds/send.c
@@ -259,8 +259,9 @@ int rds_send_xmit(struct rds_conn_path *cp)
 			 * connection.
 			 * Therefore, we never retransmit messages with RDMA ops.
 			 */
-			if (rm->rdma.op_active &&
-			    test_bit(RDS_MSG_RETRANSMITTED, &rm->m_flags)) {
+			if (test_bit(RDS_MSG_FLUSH, &rm->m_flags) ||
+			    (rm->rdma.op_active &&
+			    test_bit(RDS_MSG_RETRANSMITTED, &rm->m_flags))) {
 				spin_lock_irqsave(&cp->cp_lock, flags);
 				if (test_and_clear_bit(RDS_MSG_ON_CONN, &rm->m_flags))
 					list_move(&rm->m_conn_item, &to_be_dropped);
@@ -1209,6 +1210,10 @@ int rds_sendmsg(struct socket *sock, struct msghdr *msg, size_t payload_len)
 		rds_message_add_extension(&rm->m_inc.i_hdr,
 					  RDS_EXTHDR_NPATHS, &npaths,
 					  sizeof(npaths));
+		rds_message_add_extension(&rm->m_inc.i_hdr,
+					  RDS_EXTHDR_GEN_NUM,
+					  &cp->cp_conn->c_my_gen_num,
+					  sizeof(u32));
 	}
 	spin_unlock_irqrestore(&cp->cp_lock, flags);
 
-- 
1.7.1

* [PATCH net-next 0/3] RDS: TCP: HA/Failover fixes
From: Sowmini Varadhan @ 2016-11-16 21:29 UTC (permalink / raw)
  To: netdev; +Cc: santosh.shilimkar, sowmini.varadhan, davem, rds-devel

This series contains a set of fixes for bugs exposed when
we ran the following in a loop between a test machine pair:

 while (1); do
   # modprobe rds-tcp on test nodes
   # run rds-stress in bi-dir mode between test machine pair 
   # modprobe -r rds-tcp on test nodes
 done

rds-stress in bi-dir mode will cause both nodes to initiate
RDS-TCP connections at almost the same instant, exposing the 
bugs fixed in this series. 

Without the fixes, rds-stress reports sporadic packet drops,
and packets arriving out of sequence. After the fixes, we have
been able to run the test overnight without any issues.

Each patch has a detailed description of the root-cause fixed
by the patch.

Sowmini Varadhan (3):
  RDS: TCP: set RDS_FLAG_RETRANSMITTED in cp_retrans list
  RDS: TCP: Track peer's connection generation number
  RDS: TCP: Force every connection to be initiated by numerically
    smaller IP address

 net/rds/af_rds.c      |    4 ++++
 net/rds/connection.c  |    3 +++
 net/rds/message.c     |    1 +
 net/rds/rds.h         |    8 +++++++-
 net/rds/recv.c        |   36 ++++++++++++++++++++++++++++++++++++
 net/rds/send.c        |    9 +++++++--
 net/rds/tcp_connect.c |   14 +++++++++++++-
 net/rds/tcp_listen.c  |   29 ++++++++++++-----------------
 net/rds/tcp_send.c    |    3 +++
 9 files changed, 86 insertions(+), 21 deletions(-)

* Re: [PATCH net-next] lwtunnel: subtract tunnel headroom from mtu on output redirect
From: David Miller @ 2016-11-16 22:01 UTC (permalink / raw)
  To: david.lebrun; +Cc: netdev, roopa
In-Reply-To: <1479287146-25766-1-git-send-email-david.lebrun@uclouvain.be>

From: David Lebrun <david.lebrun@uclouvain.be>
Date: Wed, 16 Nov 2016 10:05:46 +0100

> This patch changes the lwtunnel_headroom() function which is called
> in ipv4_mtu() and ip6_mtu(), to also return the correct headroom
> value when the lwtunnel state is OUTPUT_REDIRECT.
> 
> This patch enables e.g. SR-IPv6 encapsulations to work without
> manually setting the route mtu.
> 
> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
> Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>

Applied, thanks David.

* Re: [PATCH net-next] sfc: remove napi_hash_del() call
From: David Miller @ 2016-11-16 22:05 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, ecree, bkenward
In-Reply-To: <1479304907.8455.171.camel@edumazet-glaptop3.roam.corp.google.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 16 Nov 2016 06:01:47 -0800

> From: Eric Dumazet <edumazet@google.com>
> 
> Calling napi_hash_del() after netif_napi_del() is pointless.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied.

* Re: [PATCH] netronome: don't access real_num_rx_queues directly
From: David Miller @ 2016-11-16 22:06 UTC (permalink / raw)
  To: arnd; +Cc: jakub.kicinski, rolf.neugebauer, oss-drivers, netdev,
	linux-kernel
In-Reply-To: <20161116141118.1893244-1-arnd@arndb.de>

From: Arnd Bergmann <arnd@arndb.de>
Date: Wed, 16 Nov 2016 15:10:49 +0100

> The netdev->real_num_rx_queues setting is only available if CONFIG_SYSFS
> is enabled, so we now get a build failure when that is turned off:
> 
> netronome/nfp/nfp_net_common.c: In function 'nfp_net_ring_swap_enable':
> netronome/nfp/nfp_net_common.c:2489:18: error: 'struct net_device' has no member named 'real_num_rx_queues'; did you mean 'real_num_tx_queues'?
> 
> As far as I can tell, the check here is only used as an optimization that
> we can skip in order to fix the compilation. If sysfs is disabled,
> the following netif_set_real_num_rx_queues() has no effect.
> 
> Fixes: 164d1e9e5d52 ("nfp: add support for ethtool .set_channels")
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Applied, thanks.

* Re: [PATCH net] be2net: do not call napi_hash_del()
From: David Miller @ 2016-11-16 22:07 UTC (permalink / raw)
  To: eric.dumazet
  Cc: netdev, sathya.perla, ajit.khaparde, sriharsha.basavapatna,
	somnath.kotur
In-Reply-To: <1479305562.8455.176.camel@edumazet-glaptop3.roam.corp.google.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 16 Nov 2016 06:12:42 -0800

> From: Eric Dumazet <edumazet@google.com>
> 
> Calling napi_hash_del() before netif_napi_del() is dangerous
> if a synchronize_rcu() is not enforced before NAPI struct freeing.
> 
> Lets leave this detail to core networking stack and feel
> more comfortable.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied.

* Re: [PATCH net] cxgb4: do not call napi_hash_del()
From: David Miller @ 2016-11-16 22:07 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, hariprasad
In-Reply-To: <1479305942.8455.179.camel@edumazet-glaptop3.roam.corp.google.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 16 Nov 2016 06:19:02 -0800

> From: Eric Dumazet <edumazet@google.com>
> 
> Calling napi_hash_del() before netif_napi_del() is dangerous
> if a synchronize_rcu() is not enforced before NAPI struct freeing.
> 
> Lets leave this detail to core networking stack and feel
> more comfortable.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied.

* [PATCH net 1/1] net sched filters: pass netlink message flags in event notification
From: Roman Mashak @ 2016-11-16 22:16 UTC (permalink / raw)
  To: davem; +Cc: netdev, jhs, Roman Mashak

A userland client should be able to read an event and reflect it back to
the kernel; it therefore needs to extract the complete set of netlink flags.

For example, this will allow "tc monitor" to distinguish Add and Replace
operations.
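
One way a listener might interpret the flags is sketched below. It
assumes the convention that NLM_F_CREATE together with NLM_F_EXCL means
"add" and NLM_F_CREATE alone means "replace". The DEMO_ constants mirror
the values of NLM_F_EXCL and NLM_F_CREATE in <linux/netlink.h>, and the
function name is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Local copies keep the sketch self-contained; the values mirror
 * NLM_F_EXCL and NLM_F_CREATE from <linux/netlink.h>.
 */
#define DEMO_NLM_F_EXCL		0x200
#define DEMO_NLM_F_CREATE	0x400

/* Map the flags of a "new filter" event to a word for display */
const char *demo_newtfilter_verb(unsigned int nlmsg_flags)
{
	if (!(nlmsg_flags & DEMO_NLM_F_CREATE))
		return "";	/* neither case applies */
	return (nlmsg_flags & DEMO_NLM_F_EXCL) ? "added " : "replaced ";
}

/* Usage example: with the flags zeroed, as before the patch, both
 * operations look the same to the listener.
 */
void demo_verb_example(void)
{
	unsigned int add = DEMO_NLM_F_CREATE | DEMO_NLM_F_EXCL;
	unsigned int replace = DEMO_NLM_F_CREATE;

	assert(strcmp(demo_newtfilter_verb(add), "added ") == 0);
	assert(strcmp(demo_newtfilter_verb(replace), "replaced ") == 0);
	assert(strcmp(demo_newtfilter_verb(0), "") == 0);
}
```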

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
---
 net/sched/cls_api.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 2b2a797..8e93d4a 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -112,7 +112,7 @@ static void tfilter_notify_chain(struct net *net, struct sk_buff *oskb,
 
 	for (it_chain = chain; (tp = rtnl_dereference(*it_chain)) != NULL;
 	     it_chain = &tp->next)
-		tfilter_notify(net, oskb, n, tp, 0, event, false);
+		tfilter_notify(net, oskb, n, tp, n->nlmsg_flags, event, false);
 }
 
 /* Select new prio value from the range, managed by kernel. */
@@ -430,7 +430,8 @@ static int tfilter_notify(struct net *net, struct sk_buff *oskb,
 	if (!skb)
 		return -ENOBUFS;
 
-	if (tcf_fill_node(net, skb, tp, fh, portid, n->nlmsg_seq, 0, event) <= 0) {
+	if (tcf_fill_node(net, skb, tp, fh, portid, n->nlmsg_seq,
+			  n->nlmsg_flags, event) <= 0) {
 		kfree_skb(skb);
 		return -EINVAL;
 	}
-- 
1.9.1

* [PATCH iproute2 1/1] tc: distinguish Add/Replace filter operations
From: Roman Mashak @ 2016-11-16 22:30 UTC (permalink / raw)
  To: stephen; +Cc: netdev, jhs, Roman Mashak

Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
---
 tc/tc_filter.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/tc/tc_filter.c b/tc/tc_filter.c
index 932677a..ff8713b 100644
--- a/tc/tc_filter.c
+++ b/tc/tc_filter.c
@@ -226,6 +226,16 @@ int print_filter(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
 	if (n->nlmsg_type == RTM_DELTFILTER)
 		fprintf(fp, "deleted ");
 
+	if (n->nlmsg_type == RTM_NEWTFILTER &&
+			(n->nlmsg_flags & NLM_F_CREATE) &&
+			!(n->nlmsg_flags & NLM_F_EXCL))
+		fprintf(fp, "replaced ");
+
+	if (n->nlmsg_type == RTM_NEWTFILTER &&
+			(n->nlmsg_flags & NLM_F_CREATE) &&
+			(n->nlmsg_flags & NLM_F_EXCL))
+		fprintf(fp, "added ");
+
 	fprintf(fp, "filter ");
 	if (!filter_ifindex || filter_ifindex != t->tcm_ifindex)
 		fprintf(fp, "dev %s ", ll_index_to_name(t->tcm_ifindex));
-- 
1.9.1

* Re: Netperf UDP issue with connected sockets
From: Jesper Dangaard Brouer @ 2016-11-16 22:40 UTC (permalink / raw)
  To: Rick Jones; +Cc: netdev@vger.kernel.org, Eric Dumazet, brouer
In-Reply-To: <7c4b43a4-74bf-1ee2-6f0d-17783b5d8fcb@hpe.com>

On Wed, 16 Nov 2016 09:46:37 -0800
Rick Jones <rick.jones2@hpe.com> wrote:

> On 11/16/2016 04:16 AM, Jesper Dangaard Brouer wrote:
> > [1] Subj: High perf top ip_idents_reserve doing netperf UDP_STREAM
> >  - https://www.spinics.net/lists/netdev/msg294752.html
> >
> > Not fixed in version 2.7.0.
> >  - ftp://ftp.netperf.org/netperf/netperf-2.7.0.tar.gz
> >
> > Used extra netperf configure compile options:
> >  ./configure  --enable-histogram --enable-demo
> >
> > It seems like some fix attempts exists in the SVN repository::
> >
> >  svn checkout http://www.netperf.org/svn/netperf2/trunk/ netperf2-svn
> >  svn log -r709
> >  # A quick stab at getting remote connect going for UDP_STREAM
> >  svn diff -r708:709
> >
> > Testing with SVN version, still show __ip_select_ident() in top#1.  
> 
> Indeed, there was a fix for getting the remote side connect()ed. 
> Looking at what I have for the top of trunk I do though see a connect() 
> call being made at the local end:
> 
> socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP) = 4
> getsockopt(4, SOL_SOCKET, SO_SNDBUF, [212992], [4]) = 0
> getsockopt(4, SOL_SOCKET, SO_RCVBUF, [212992], [4]) = 0
> setsockopt(4, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
> bind(4, {sa_family=AF_INET, sin_port=htons(0), 
> sin_addr=inet_addr("0.0.0.0")}, 16) = 0
> setsockopt(4, SOL_SOCKET, SO_DONTROUTE, [1], 4) = 0
> setsockopt(4, SOL_IP, IP_RECVERR, [1], 4) = 0
> brk(0xe53000)                           = 0xe53000
> getsockname(4, {sa_family=AF_INET, sin_port=htons(59758), 
> sin_addr=inet_addr("0.0.0.0")}, [16]) = 0
> sendto(3, 
> "\0\0\0a\377\377\377\377\377\377\377\377\377\377\377\377\0\0\0\10\0\0\0\0\0\0\0\321\377\377\377\377"..., 
> 656, 0, NULL, 0) = 656
> select(1024, [3], NULL, NULL, {120, 0}) = 1 (in [3], left {119, 995630})
> recvfrom(3, 
> "\0\0\0b\0\0\0\0\0\3@\0\0\3@\0\0\0\0\2\0\3@\0\377\377\377\377\0\0\0\321"..., 
> 656, 0, NULL, NULL) = 656
> write(1, "need to connect is 1\n", 21)  = 21
> rt_sigaction(SIGALRM, {0x402ea6, [ALRM], SA_RESTORER|SA_INTERRUPT, 
> 0x7f2824eb2cb0}, NULL, 8) = 0
> rt_sigaction(SIGINT, {0x402ea6, [INT], SA_RESTORER|SA_INTERRUPT, 
> 0x7f2824eb2cb0}, NULL, 8) = 0
> alarm(1)                                = 0
> connect(4, {sa_family=AF_INET, sin_port=htons(34832), 
> sin_addr=inet_addr("127.0.0.1")}, 16) = 0
> sendto(4, "netperf\0netperf\0netperf\0netperf\0"..., 1024, 0, NULL, 0) = 
> 1024
> sendto(4, "netperf\0netperf\0netperf\0netperf\0"..., 1024, 0, NULL, 0) = 
> 1024
> sendto(4, "netperf\0netperf\0netperf\0netperf\0"..., 1024, 0, NULL, 0) = 
> 1024
> 
> the only difference there with top of trunk is that "need to connect" 
> write/printf I just put in the code to be a nice marker in the system 
> call trace.
> 
> It is a wild guess, but does setting SO_DONTROUTE affect whether or not 
> a connect() would have the desired effect?  That is there to protect 
> people from themselves (long story about people using UDP_STREAM to 
> stress improperly air-gapped systems during link up/down testing....) 
> It can be disabled with a test-specific -R 1 option, so your netperf 
> command would become:
> 
> netperf -H 198.18.50.1 -t UDP_STREAM -l 120 -- -m 1472 -n -N -R 1

Using -R 1 does not seem to help remove __ip_select_ident()

Samples: 56K of event 'cycles', Event count (approx.): 78628132661
  Overhead  Command        Shared Object        Symbol
+    9.11%  netperf        [kernel.vmlinux]     [k] __ip_select_ident
+    6.98%  netperf        [kernel.vmlinux]     [k] _raw_spin_lock
+    6.21%  swapper        [mlx5_core]          [k] mlx5e_poll_tx_cq
+    5.03%  netperf        [kernel.vmlinux]     [k] copy_user_enhanced_fast_string
+    4.69%  netperf        [kernel.vmlinux]     [k] __ip_make_skb
+    4.63%  netperf        [kernel.vmlinux]     [k] skb_set_owner_w
+    4.15%  swapper        [kernel.vmlinux]     [k] __slab_free
+    3.80%  netperf        [mlx5_core]          [k] mlx5e_sq_xmit
+    2.00%  swapper        [kernel.vmlinux]     [k] sock_wfree
+    1.94%  netperf        netperf              [.] send_data
+    1.92%  netperf        netperf              [.] send_omni_inner


> >
> > (p.s. is netperf ever going to be converted from SVN to git?)
> >  
> 
> Well....  my git-fu could use some work (gentle, offline taps with a 
> clueful tutorial bat would be welcome), and at least in the past, going 
> to git was held back because there were a bunch of netperf users on 
> Windows and there wasn't (at the time) support for git under Windows.
> 
> But I am not against the idea in principle.

Once you have learned git, you will never go back to SVN. Just do it! :-)

There are even nice writeups on how to convert and preserve history:
 http://john.albin.net/git/convert-subversion-to-git

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

* [PATCH iproute2 v2 4/9] l2tp: fix L2TP_ATTR_{RECV,SEND}_SEQ handling
From: Asbjørn Sloth Tønnesen @ 2016-11-16 22:45 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: James Chapman, netdev
In-Reply-To: <20161116224526.32343-1-asbjorn@asbjorn.st>

L2TP_ATTR_RECV_SEQ and L2TP_ATTR_SEND_SEQ are declared as NLA_U8
attributes in the kernel, so let's treat them accordingly.

Signed-off-by: Asbjørn Sloth Tønnesen <asbjorn@asbjorn.st>
---
 ip/ipl2tp.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/ip/ipl2tp.c b/ip/ipl2tp.c
index 2e0e9c7..a7cbd66 100644
--- a/ip/ipl2tp.c
+++ b/ip/ipl2tp.c
@@ -160,8 +160,8 @@ static int create_session(struct l2tp_parm *p)
 	addattr8(&req.n, 1024, L2TP_ATTR_L2SPEC_LEN, p->l2spec_len);
 
 	if (p->mtu)		addattr16(&req.n, 1024, L2TP_ATTR_MTU, p->mtu);
-	if (p->recv_seq)	addattr(&req.n, 1024, L2TP_ATTR_RECV_SEQ);
-	if (p->send_seq)	addattr(&req.n, 1024, L2TP_ATTR_SEND_SEQ);
+	if (p->recv_seq)	addattr8(&req.n, 1024, L2TP_ATTR_RECV_SEQ, 1);
+	if (p->send_seq)	addattr8(&req.n, 1024, L2TP_ATTR_SEND_SEQ, 1);
 	if (p->lns_mode)	addattr(&req.n, 1024, L2TP_ATTR_LNS_MODE);
 	if (p->data_seq)	addattr8(&req.n, 1024, L2TP_ATTR_DATA_SEQ, p->data_seq);
 	if (p->reorder_timeout) addattr64(&req.n, 1024, L2TP_ATTR_RECV_TIMEOUT,
@@ -304,8 +304,10 @@ static int get_response(struct nlmsghdr *n, void *arg)
 		memcpy(p->peer_cookie, RTA_DATA(attrs[L2TP_ATTR_PEER_COOKIE]),
 		       p->peer_cookie_len = RTA_PAYLOAD(attrs[L2TP_ATTR_PEER_COOKIE]));
 
-	p->recv_seq = !!attrs[L2TP_ATTR_RECV_SEQ];
-	p->send_seq = !!attrs[L2TP_ATTR_SEND_SEQ];
+	if (attrs[L2TP_ATTR_RECV_SEQ])
+		p->recv_seq = !!rta_getattr_u8(attrs[L2TP_ATTR_RECV_SEQ]);
+	if (attrs[L2TP_ATTR_SEND_SEQ])
+		p->send_seq = !!rta_getattr_u8(attrs[L2TP_ATTR_SEND_SEQ]);
 
 	if (attrs[L2TP_ATTR_RECV_TIMEOUT])
 		p->reorder_timeout = rta_getattr_u64(attrs[L2TP_ATTR_RECV_TIMEOUT]);
-- 
2.10.2

* [PATCH iproute2 v2 5/9] l2tp: fix L2TP_ATTR_UDP_CSUM handling
From: Asbjørn Sloth Tønnesen @ 2016-11-16 22:45 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: James Chapman, netdev
In-Reply-To: <20161116224526.32343-1-asbjorn@asbjorn.st>

L2TP_ATTR_UDP_CSUM is read by the kernel as a NLA_FLAG value,
but is validated as a NLA_U8, so we will write it as a u8,
but the value isn't actually being read by the kernel.

It is written by the kernel as a NLA_U8, so we will read as
such.
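
The difference between the two readings can be shown with a toy
attribute. The struct and helpers below are hypothetical simplifications,
not the libnetlink API: a flag-style reader only looks at whether the
attribute is present, while a u8-style reader looks at the byte it
carries. The payload length check is an addition made for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct demo_attr {		/* hypothetical, simplified attribute */
	size_t payload_len;
	uint8_t payload[4];
};

/* Flag-style reading: any attribute, even one carrying 0, means "on" */
bool demo_read_as_flag(const struct demo_attr *a)
{
	return a != NULL;
}

/* u8-style reading: an absent attribute keeps the caller's current
 * value, a present one is judged by the byte it carries.
 */
bool demo_read_as_u8(const struct demo_attr *a, bool current)
{
	if (a == NULL || a->payload_len < 1)
		return current;
	return a->payload[0] != 0;
}

/* Usage example: a sender that reports "off" as a u8 carrying 0 is
 * misread by the flag-style reader.
 */
void demo_attr_example(void)
{
	struct demo_attr off = { 1, { 0 } };

	assert(demo_read_as_flag(&off));		/* wrongly "on" */
	assert(!demo_read_as_u8(&off, false));	/* correctly "off" */
}
```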

Signed-off-by: Asbjørn Sloth Tønnesen <asbjorn@asbjorn.st>
---
 ip/ipl2tp.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/ip/ipl2tp.c b/ip/ipl2tp.c
index a7cbd66..03ca0cc 100644
--- a/ip/ipl2tp.c
+++ b/ip/ipl2tp.c
@@ -120,7 +120,7 @@ static int create_tunnel(struct l2tp_parm *p)
 		addattr16(&req.n, 1024, L2TP_ATTR_UDP_SPORT, p->local_udp_port);
 		addattr16(&req.n, 1024, L2TP_ATTR_UDP_DPORT, p->peer_udp_port);
 		if (p->udp_csum)
-			addattr(&req.n, 1024, L2TP_ATTR_UDP_CSUM);
+			addattr8(&req.n, 1024, L2TP_ATTR_UDP_CSUM, 1);
 		if (!p->udp6_csum_tx)
 			addattr(&req.n, 1024, L2TP_ATTR_UDP_ZERO_CSUM6_TX);
 		if (!p->udp6_csum_rx)
@@ -289,7 +289,9 @@ static int get_response(struct nlmsghdr *n, void *arg)
 	if (attrs[L2TP_ATTR_L2SPEC_LEN])
 		p->l2spec_len = rta_getattr_u8(attrs[L2TP_ATTR_L2SPEC_LEN]);
 
-	p->udp_csum = !!attrs[L2TP_ATTR_UDP_CSUM];
+	if (attrs[L2TP_ATTR_UDP_CSUM])
+		p->udp_csum = !!rta_getattr_u8(attrs[L2TP_ATTR_UDP_CSUM]);
+
 	/*
 	 * Not fetching from L2TP_ATTR_UDP_ZERO_CSUM6_{T,R}X because the
 	 * kernel doesn't send it so just leave it as default value.
-- 
2.10.2

* [PATCH iproute2 v2 3/9] l2tp: fix integers with too few significant bits
From: Asbjørn Sloth Tønnesen @ 2016-11-16 22:45 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: James Chapman, netdev
In-Reply-To: <20161116224526.32343-1-asbjorn@asbjorn.st>

udp6_csum_{tx,rx}, tunnel and session are the only ones
currently used.

recv_seq, send_seq, lns_mode and data_seq are partially
implemented in a useless way.
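
For readers unfamiliar with the underlying problem: a signed one-bit
field has room only for a sign bit, so it can represent 0 and -1 but not
1. The sketch below shows the effect with hypothetical struct names. The
result of storing 1 into the signed field is implementation-defined in C;
the comments describe what GCC and Clang do.

```c
#include <assert.h>
#include <stdbool.h>

struct signed_flags   { signed int on:1; };	/* holds 0 or -1 */
struct unsigned_flags { unsigned int on:1; };	/* holds 0 or 1 */

/* Store v in a signed 1-bit field and check whether it reads back as 1 */
bool signed_flag_equals_one(int v)
{
	struct signed_flags f;

	f.on = v;	/* 1 becomes -1 with GCC and Clang */
	return f.on == 1;
}

/* Same check with an unsigned 1-bit field */
bool unsigned_flag_equals_one(int v)
{
	struct unsigned_flags f;

	f.on = v;
	return f.on == 1;
}

/* Usage example: only the unsigned field round-trips the value 1 */
void bitfield_example(void)
{
	assert(!signed_flag_equals_one(1));	/* GCC and Clang behaviour */
	assert(unsigned_flag_equals_one(1));
}
```

A plain truth test such as "if (p->tunnel)" still works with the signed
field, which is why code that only tests for non-zero keeps working,
while comparing against 1 or printing the value does not.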

Signed-off-by: Asbjørn Sloth Tønnesen <asbjorn@asbjorn.st>
---
 ip/ipl2tp.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/ip/ipl2tp.c b/ip/ipl2tp.c
index d3338ac..2e0e9c7 100644
--- a/ip/ipl2tp.c
+++ b/ip/ipl2tp.c
@@ -56,15 +56,15 @@ struct l2tp_parm {
 
 	uint16_t pw_type;
 	uint16_t mtu;
-	int udp6_csum_tx:1;
-	int udp6_csum_rx:1;
-	int udp_csum:1;
-	int recv_seq:1;
-	int send_seq:1;
-	int lns_mode:1;
-	int data_seq:2;
-	int tunnel:1;
-	int session:1;
+	unsigned int udp6_csum_tx:1;
+	unsigned int udp6_csum_rx:1;
+	unsigned int udp_csum:1;
+	unsigned int recv_seq:1;
+	unsigned int send_seq:1;
+	unsigned int lns_mode:1;
+	unsigned int data_seq:2;
+	unsigned int tunnel:1;
+	unsigned int session:1;
 	int reorder_timeout;
 	const char *ifname;
 	uint8_t l2spec_type;
-- 
2.10.2

^ permalink raw reply related
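
Why a one-bit signed bitfield has too few significant bits can be shown with
a short sketch. The struct and function names below are hypothetical and are
not taken from ipl2tp.c. Note the assumptions: whether a plain "int" bitfield
is signed is implementation-defined, so the sketch spells out "signed int",
and the result of storing 1 into it is also implementation-defined (GCC and
Clang typically yield -1).

```c
#include <assert.h>

struct signed_flags {
	signed int flag:1;	/* the single bit is the sign bit */
};

struct unsigned_flags {
	unsigned int flag:1;	/* the single bit holds 0 or 1 */
};

/* Store v in a signed 1-bit field and read it back. */
static int store_signed(int v)
{
	struct signed_flags f = { 0 };

	f.flag = v;
	return f.flag;
}

/* Store v in an unsigned 1-bit field and read it back. */
static int store_unsigned(int v)
{
	struct unsigned_flags f = { 0 };

	f.flag = v;
	return f.flag;
}

/* The unsigned field round-trips 1; the signed field cannot represent it. */
static void demo_bitfield_self_check(void)
{
	assert(store_unsigned(1) == 1);
	assert(store_signed(1) != 1);
}
```

A truthiness test such as "if (p->tunnel)" still works with the signed field,
but any comparison against 1 or any print of the value would not, which is
presumably why the fields are switched to unsigned int.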

* [PATCH iproute2 v2 9/9] man: ip-l2tp.8: document UDP checksum options
From: Asbjørn Sloth Tønnesen @ 2016-11-16 22:45 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: James Chapman, netdev
In-Reply-To: <20161116224526.32343-1-asbjorn@asbjorn.st>

Signed-off-by: Asbjørn Sloth Tønnesen <asbjorn@asbjorn.st>
---
 man/man8/ip-l2tp.8 | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/man/man8/ip-l2tp.8 b/man/man8/ip-l2tp.8
index d4e7270..8ce630a 100644
--- a/man/man8/ip-l2tp.8
+++ b/man/man8/ip-l2tp.8
@@ -30,6 +30,12 @@ ip-l2tp - L2TPv3 static unmanaged tunnel configuration
 .IR PORT
 .RB " ]"
 .br
+.RB "[ " udp_csum " { " on " | " off " } ]"
+.br
+.RB "[ " udp6_csum_tx " { " on " | " off " } ]"
+.br
+.RB "[ " udp6_csum_rx " { " on " | " off " } ]"
+.br
 .ti -8
 .BR "ip l2tp add session"
 .RB "[ " name
@@ -190,6 +196,33 @@ selected.
 set the UDP destination port to be used for the tunnel. Must be
 present when udp encapsulation is selected. Ignored when ip
 encapsulation is selected.
+.TP
+.BI udp_csum " STATE"
+(IPv4 only) control if IPv4 UDP checksums should be calculated and checked for the
+encapsulating UDP packets, when UDP encapsulating is selected.
+Default is
+.BR off "."
+.br
+Valid values are:
+.BR on ", " off "."
+.TP
+.BI udp6_csum_tx " STATE"
+(IPv6 only) control if IPv6 UDP checksums should be calculated for encapsulating
+UDP packets, when UDP encapsulating is selected.
+Default is
+.BR on "."
+.br
+Valid values are:
+.BR on ", " off "."
+.TP
+.BI udp6_csum_rx " STATE"
+(IPv6 only) control if IPv6 UDP checksums should be checked for the encapsulating
+UDP packets, when UDP encapsulating is selected.
+Default is
+.BR on "."
+.br
+Valid values are:
+.BR on ", " off "."
 .SS ip l2tp del tunnel - destroy a tunnel
 .TP
 .BI tunnel_id " ID"
-- 
2.10.2

^ permalink raw reply related

* [PATCH iproute2 v2 6/9] l2tp: read IPv6 UDP checksum attributes from kernel
From: Asbjørn Sloth Tønnesen @ 2016-11-16 22:45 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: James Chapman, netdev
In-Reply-To: <20161116224526.32343-1-asbjorn@asbjorn.st>

In case of an older kernel that doesn't set L2TP_ATTR_UDP_ZERO_CSUM6_{RX,TX}
the old hard-coded value is being preserved, since the attribute flag will be
missing.

Signed-off-by: Asbjørn Sloth Tønnesen <asbjorn@asbjorn.st>
---
 ip/ipl2tp.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/ip/ipl2tp.c b/ip/ipl2tp.c
index 03ca0cc..f5d4113 100644
--- a/ip/ipl2tp.c
+++ b/ip/ipl2tp.c
@@ -292,12 +292,9 @@ static int get_response(struct nlmsghdr *n, void *arg)
 	if (attrs[L2TP_ATTR_UDP_CSUM])
 		p->udp_csum = !!rta_getattr_u8(attrs[L2TP_ATTR_UDP_CSUM]);
 
-	/*
-	 * Not fetching from L2TP_ATTR_UDP_ZERO_CSUM6_{T,R}X because the
-	 * kernel doesn't send it so just leave it as default value.
-	 */
-	p->udp6_csum_tx = 1;
-	p->udp6_csum_rx = 1;
+	p->udp6_csum_tx = !attrs[L2TP_ATTR_UDP_ZERO_CSUM6_TX];
+	p->udp6_csum_rx = !attrs[L2TP_ATTR_UDP_ZERO_CSUM6_RX];
+
 	if (attrs[L2TP_ATTR_COOKIE])
 		memcpy(p->cookie, RTA_DATA(attrs[L2TP_ATTR_COOKIE]),
 		       p->cookie_len = RTA_PAYLOAD(attrs[L2TP_ATTR_COOKIE]));
-- 
2.10.2

^ permalink raw reply related

* [PATCH iproute2 v2 7/9] l2tp: support sequence numbering
From: Asbjørn Sloth Tønnesen @ 2016-11-16 22:45 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: James Chapman, netdev
In-Reply-To: <20161116224526.32343-1-asbjorn@asbjorn.st>

This patch implements and documents the user interface for
sequence numbering.

Signed-off-by: Asbjørn Sloth Tønnesen <asbjorn@asbjorn.st>
---
 ip/ipl2tp.c        | 23 +++++++++++++++++++++++
 man/man8/ip-l2tp.8 | 15 +++++++++++++++
 2 files changed, 38 insertions(+)

diff --git a/ip/ipl2tp.c b/ip/ipl2tp.c
index f5d4113..ab35023 100644
--- a/ip/ipl2tp.c
+++ b/ip/ipl2tp.c
@@ -246,6 +246,12 @@ static void print_session(struct l2tp_data *data)
 		printf("  reorder timeout: %u\n", p->reorder_timeout);
 	else
 		printf("\n");
+	if (p->send_seq || p->recv_seq) {
+		printf("  sequence numbering:");
+		if (p->send_seq) printf(" send");
+		if (p->recv_seq) printf(" recv");
+		printf("\n");
+	}
 }
 
 static int get_response(struct nlmsghdr *n, void *arg)
@@ -482,6 +488,7 @@ static void usage(void)
 	fprintf(stderr, "          session_id ID peer_session_id ID\n");
 	fprintf(stderr, "          [ cookie HEXSTR ] [ peer_cookie HEXSTR ]\n");
 	fprintf(stderr, "          [ offset OFFSET ] [ peer_offset OFFSET ]\n");
+	fprintf(stderr, "          [ seq { none | send | recv | both } ]\n");
 	fprintf(stderr, "          [ l2spec_type L2SPEC ]\n");
 	fprintf(stderr, "       ip l2tp del tunnel tunnel_id ID\n");
 	fprintf(stderr, "       ip l2tp del session tunnel_id ID session_id ID\n");
@@ -652,6 +659,22 @@ static int parse_args(int argc, char **argv, int cmd, struct l2tp_parm *p)
 				fprintf(stderr, "Unknown layer2specific header type \"%s\"\n", *argv);
 				exit(-1);
 			}
+		} else if (strcmp(*argv, "seq") == 0) {
+			NEXT_ARG();
+			if (strcasecmp(*argv, "both") == 0) {
+				p->recv_seq = 1;
+				p->send_seq = 1;
+			} else if (strcasecmp(*argv, "recv") == 0) {
+				p->recv_seq = 1;
+			} else if (strcasecmp(*argv, "send") == 0) {
+				p->send_seq = 1;
+			} else if (strcasecmp(*argv, "none") == 0) {
+				p->recv_seq = 0;
+				p->send_seq = 0;
+			} else {
+				fprintf(stderr, "Unknown seq value \"%s\"\n", *argv);
+				exit(-1);
+			}
 		} else if (strcmp(*argv, "tunnel") == 0) {
 			p->tunnel = 1;
 		} else if (strcmp(*argv, "session") == 0) {
diff --git a/man/man8/ip-l2tp.8 b/man/man8/ip-l2tp.8
index 991d097..d4e7270 100644
--- a/man/man8/ip-l2tp.8
+++ b/man/man8/ip-l2tp.8
@@ -51,6 +51,8 @@ ip-l2tp - L2TPv3 static unmanaged tunnel configuration
 .br
 .RB "[ " l2spec_type " { " none " | " default " } ]"
 .br
+.RB "[ " seq " { " none " | " send " | " recv " | " both " } ]"
+.br
 .RB "[ " offset
 .IR OFFSET
 .RB " ] [ " peer_offset
@@ -238,6 +240,19 @@ set the layer2specific header type of the session.
 Valid values are:
 .BR none ", " default "."
 .TP
+.BI seq " SEQ"
+controls sequence numbering to prevent or detect out of order packets.
+.B send
+puts a sequence number in the default layer2specific header of each
+outgoing packet.
+.B recv
+reorder packets if they are received out of order.
+Default is
+.BR none "."
+.br
+Valid values are:
+.BR none ", " send ", " recv ", " both "."
+.TP
 .BI offset " OFFSET"
 sets the byte offset from the L2TP header where user data starts in
 transmitted L2TP data packets. This is hardly ever used. If set, the
-- 
2.10.2

^ permalink raw reply related
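
The keyword handling added to parse_args() above can be sketched as a
standalone function. This is an illustration only: struct seq_flags and
parse_seq() are hypothetical names, and the sketch returns -1 instead of
calling exit() so that the caller decides how to report the error. It
mirrors the patch in that "send" and "recv" only set their own flag and do
not clear the other one.

```c
#include <assert.h>
#include <strings.h>	/* strcasecmp() (POSIX) */

struct seq_flags {
	unsigned int send_seq:1;
	unsigned int recv_seq:1;
};

/* Returns 0 on success, -1 if the keyword is not recognised. */
static int parse_seq(const char *arg, struct seq_flags *f)
{
	if (strcasecmp(arg, "both") == 0) {
		f->recv_seq = 1;
		f->send_seq = 1;
	} else if (strcasecmp(arg, "recv") == 0) {
		f->recv_seq = 1;
	} else if (strcasecmp(arg, "send") == 0) {
		f->send_seq = 1;
	} else if (strcasecmp(arg, "none") == 0) {
		f->recv_seq = 0;
		f->send_seq = 0;
	} else {
		return -1;
	}
	return 0;
}

/* Exercise each keyword once, including a case-insensitive match. */
static void demo_seq_self_check(void)
{
	struct seq_flags f = { 0, 0 };

	assert(parse_seq("both", &f) == 0 && f.send_seq && f.recv_seq);
	assert(parse_seq("none", &f) == 0 && !f.send_seq && !f.recv_seq);
	assert(parse_seq("RECV", &f) == 0 && f.recv_seq && !f.send_seq);
	assert(parse_seq("bogus", &f) == -1);
}
```

Because the single-direction keywords are additive, giving "seq send"
followed by "seq recv" on one command line would end up equivalent to
"seq both" in this sketch.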

* [PATCH iproute2 v2 1/9] man: ip-l2tp.8: fix l2spec_type documentation
From: Asbjørn Sloth Tønnesen @ 2016-11-16 22:45 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: James Chapman, netdev

Signed-off-by: Asbjørn Sloth Tønnesen <asbjorn@asbjorn.st>
---
 man/man8/ip-l2tp.8 | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/man/man8/ip-l2tp.8 b/man/man8/ip-l2tp.8
index 5b7041f..4a3bb20 100644
--- a/man/man8/ip-l2tp.8
+++ b/man/man8/ip-l2tp.8
@@ -239,7 +239,7 @@ find in received L2TP packets. Default is to use no cookie.
 set the layer2specific header type of the session.
 .br
 Valid values are:
-.BR none ", " udp "."
+.BR none ", " default "."
 .TP
 .BI offset " OFFSET"
 sets the byte offset from the L2TP header where user data starts in
-- 
2.10.2

^ permalink raw reply related

* [PATCH iproute2 v2 2/9] man: ip-l2tp.8: remove non-existent tunnel parameter name
From: Asbjørn Sloth Tønnesen @ 2016-11-16 22:45 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: James Chapman, netdev
In-Reply-To: <20161116224526.32343-1-asbjorn@asbjorn.st>

The name parameter is only valid for sessions, not tunnels.

Signed-off-by: Asbjørn Sloth Tønnesen <asbjorn@asbjorn.st>
---
 man/man8/ip-l2tp.8 | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/man/man8/ip-l2tp.8 b/man/man8/ip-l2tp.8
index 4a3bb20..991d097 100644
--- a/man/man8/ip-l2tp.8
+++ b/man/man8/ip-l2tp.8
@@ -154,9 +154,6 @@ tunnels and sessions to be established and provides for detecting and
 acting upon network failures.
 .SS ip l2tp add tunnel - add a new tunnel
 .TP
-.BI name " NAME "
-sets the session network interface name. Default is l2tpethN.
-.TP
 .BI tunnel_id " ID"
 set the tunnel id, which is a 32-bit integer value. Uniquely
 identifies the tunnel. The value used must match the peer_tunnel_id
-- 
2.10.2

^ permalink raw reply related

* [PATCH iproute2 v2 8/9] l2tp: show tunnel: expose UDP checksum state
From: Asbjørn Sloth Tønnesen @ 2016-11-16 22:45 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: James Chapman, netdev
In-Reply-To: <20161116224526.32343-1-asbjorn@asbjorn.st>

Signed-off-by: Asbjørn Sloth Tønnesen <asbjorn@asbjorn.st>
---
 ip/ipl2tp.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/ip/ipl2tp.c b/ip/ipl2tp.c
index ab35023..f2bbc0c 100644
--- a/ip/ipl2tp.c
+++ b/ip/ipl2tp.c
@@ -218,9 +218,24 @@ static void print_tunnel(const struct l2tp_data *data)
 	printf("  Peer tunnel %u\n",
 	       p->peer_tunnel_id);
 
-	if (p->encap == L2TP_ENCAPTYPE_UDP)
+	if (p->encap == L2TP_ENCAPTYPE_UDP) {
 		printf("  UDP source / dest ports: %hu/%hu\n",
 		       p->local_udp_port, p->peer_udp_port);
+
+		switch (p->local_ip.family) {
+		case AF_INET:
+			printf("  UDP checksum: %s\n",
+			       p->udp_csum ? "enabled" : "disabled");
+			break;
+		case AF_INET6:
+			printf("  UDP checksum: %s%s%s%s\n",
+			       p->udp6_csum_tx && p->udp6_csum_rx ? "enabled" : "",
+			       p->udp6_csum_tx && !p->udp6_csum_rx ? "tx" : "",
+			       !p->udp6_csum_tx && p->udp6_csum_rx ? "rx" : "",
+			       !p->udp6_csum_tx && !p->udp6_csum_rx ? "disabled" : "");
+			break;
+		}
+	}
 }
 
 static void print_session(struct l2tp_data *data)
-- 
2.10.2

^ permalink raw reply related
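
The four-way "%s%s%s%s" format used for the IPv6 case above selects exactly
one label from the two flags. As a hedged sketch, the same mapping can be
written as a helper that returns the label directly; udp6_csum_label() is a
hypothetical name and is not part of the patch.

```c
#include <assert.h>
#include <string.h>

/* Map the tx/rx checksum flags to the label print_tunnel() would show. */
static const char *udp6_csum_label(int tx, int rx)
{
	if (tx && rx)
		return "enabled";
	if (tx)
		return "tx";
	if (rx)
		return "rx";
	return "disabled";
}

/* Each flag combination yields exactly one of the four labels. */
static void demo_label_self_check(void)
{
	assert(strcmp(udp6_csum_label(1, 1), "enabled") == 0);
	assert(strcmp(udp6_csum_label(1, 0), "tx") == 0);
	assert(strcmp(udp6_csum_label(0, 1), "rx") == 0);
	assert(strcmp(udp6_csum_label(0, 0), "disabled") == 0);
}
```

The output is the same as in the patch; the helper only makes the mutual
exclusivity of the four cases explicit.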

* [PATCH] net: bcm63xx_enet: fix build failure
From: Sudip Mukherjee @ 2016-11-16 22:50 UTC (permalink / raw)
  To: Florian Fainelli, David S. Miller
  Cc: linux-kernel, bcm-kernel-feedback-list, netdev, linux-arm-kernel,
	Sudip Mukherjee

The build of mips bcm63xx_defconfig was failing with the error:
drivers/net/ethernet/broadcom/bcm63xx_enet.c:1440:2:
	error: expected expression before 'return'

The return statement should be terminated with ';' and not ','.

Fixes: 42469bf5d9bb ("net: bcm63xx_enet: Utilize phy_ethtool_nway_reset")
Signed-off-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk>
---

build log is at:
https://travis-ci.org/sudipm-mukherjee/parport/jobs/176269457

 drivers/net/ethernet/broadcom/bcm63xx_enet.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/bcm63xx_enet.c b/drivers/net/ethernet/broadcom/bcm63xx_enet.c
index a43ab90..3b14d51 100644
--- a/drivers/net/ethernet/broadcom/bcm63xx_enet.c
+++ b/drivers/net/ethernet/broadcom/bcm63xx_enet.c
@@ -1435,7 +1435,7 @@ static int bcm_enet_nway_reset(struct net_device *dev)
 
 	priv = netdev_priv(dev);
 	if (priv->has_phy)
-		return phy_ethtool_nway_reset(dev),
+		return phy_ethtool_nway_reset(dev);
 
 	return -EOPNOTSUPP;
 }
-- 
1.9.1

^ permalink raw reply related

* Re: Netperf UDP issue with connected sockets
From: Rick Jones @ 2016-11-16 22:50 UTC (permalink / raw)
  To: Jesper Dangaard Brouer; +Cc: netdev@vger.kernel.org, Eric Dumazet
In-Reply-To: <20161116234022.2bad179b@redhat.com>

On 11/16/2016 02:40 PM, Jesper Dangaard Brouer wrote:
> On Wed, 16 Nov 2016 09:46:37 -0800
> Rick Jones <rick.jones2@hpe.com> wrote:
>> It is a wild guess, but does setting SO_DONTROUTE affect whether or not
>> a connect() would have the desired effect?  That is there to protect
>> people from themselves (long story about people using UDP_STREAM to
>> stress improperly air-gapped systems during link up/down testing....)
>> It can be disabled with a test-specific -R 1 option, so your netperf
>> command would become:
>>
>> netperf -H 198.18.50.1 -t UDP_STREAM -l 120 -- -m 1472 -n -N -R 1
>
> Using -R 1 does not seem to help remove __ip_select_ident()

Bummer.  It was a wild guess anyway, since I was seeing a connect() call 
on the data socket.

> Samples: 56K of event 'cycles', Event count (approx.): 78628132661
>   Overhead  Command        Shared Object        Symbol
> +    9.11%  netperf        [kernel.vmlinux]     [k] __ip_select_ident
> +    6.98%  netperf        [kernel.vmlinux]     [k] _raw_spin_lock
> +    6.21%  swapper        [mlx5_core]          [k] mlx5e_poll_tx_cq
> +    5.03%  netperf        [kernel.vmlinux]     [k] copy_user_enhanced_fast_string
> +    4.69%  netperf        [kernel.vmlinux]     [k] __ip_make_skb
> +    4.63%  netperf        [kernel.vmlinux]     [k] skb_set_owner_w
> +    4.15%  swapper        [kernel.vmlinux]     [k] __slab_free
> +    3.80%  netperf        [mlx5_core]          [k] mlx5e_sq_xmit
> +    2.00%  swapper        [kernel.vmlinux]     [k] sock_wfree
> +    1.94%  netperf        netperf              [.] send_data
> +    1.92%  netperf        netperf              [.] send_omni_inner

Well, the next step I suppose is to have you try a quick netperf 
UDP_STREAM under strace to see if your netperf binary does what mine did:

strace -v -o /tmp/netperf.strace netperf -H 198.18.50.1 -t UDP_STREAM -l 1 -- -m 1472 -n -N -R 1

Then check whether you see the connect() I saw. (Note that I set the runtime to 1 second.)

rick

^ permalink raw reply

* Re: [PATCH] net: bcm63xx_enet: fix build failure
From: David Miller @ 2016-11-16 22:51 UTC (permalink / raw)
  To: sudipm.mukherjee
  Cc: f.fainelli, linux-kernel, bcm-kernel-feedback-list, netdev,
	linux-arm-kernel
In-Reply-To: <1479336616-26500-1-git-send-email-sudipm.mukherjee@gmail.com>

From: Sudip Mukherjee <sudipm.mukherjee@gmail.com>
Date: Wed, 16 Nov 2016 22:50:16 +0000

> The build of mips bcm63xx_defconfig was failing with the error:
> drivers/net/ethernet/broadcom/bcm63xx_enet.c:1440:2:
> 	error: expected expression before 'return'
> 
> The return statement should be terminated with ';' and not ','.
> 
> Fixes: 42469bf5d9bb ("net: bcm63xx_enet: Utilize phy_ethtool_nway_reset")
> Signed-off-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk>

Already fixed in the current net-next tree.

^ permalink raw reply

* Re: [PATCH v2] net/phy/vitesse: Configure RGMII skew on VSC8601, if needed
From: David Miller @ 2016-11-16 22:54 UTC (permalink / raw)
  To: alex.g; +Cc: f.fainelli, netdev, linux-kernel
In-Reply-To: <1479286953-11481-1-git-send-email-alex.g@adaptrum.com>

From: Alexandru Gagniuc <alex.g@adaptrum.com>
Date: Wed, 16 Nov 2016 01:02:33 -0800

> With RGMII, we need a 1.5 to 2ns skew between clock and data lines. The
> VSC8601 can handle this internally. While the VSC8601 can set more
> fine-grained delays, the standard skew settings work out of the box.
> The same heuristic is used to determine when this skew should be enabled
> as in vsc824x_config_init().
> 
> Tested on custom board with AM3352 SOC and VSC8601 PHY.
> 
> Signed-off-by: Alexandru Gagniuc <alex.g@adaptrum.com>
> ---
> Changes since v1:
>  * Added comment detailing applicability to different RGMII interfaces.

Applied.

^ permalink raw reply
