Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH] can: enable multi-queue for SocketCAN devices
From: Marc Kleine-Budde @ 2018-03-14 11:11 UTC (permalink / raw)
  To: Mark Jonas, Wolfgang Grandegger
  Cc: linux-can, netdev, linux-kernel, hs, Zhu Yi
In-Reply-To: <1521023596-30108-1-git-send-email-mark.jonas@de.bosch.com>


[-- Attachment #1.1: Type: text/plain, Size: 3841 bytes --]

On 03/14/2018 11:33 AM, Mark Jonas wrote:
> From: Zhu Yi <yi.zhu5@cn.bosch.com>
> 
> The existing SocketCAN implementation provides alloc_candev() to
> allocate a CAN device using a single Tx and Rx queue. This can lead to
> priority inversion in case the single Tx queue is already full with low
> priority messages and a high priority message needs to be sent while the
> bus is fully loaded with medium priority messages.
> 
> This problem can be solved by using the existing multi-queue support of
> the network subsystem. The commit makes it possible to use multi-queue in
> the CAN subsystem in the same way it is used in the Ethernet subsystem
> by adding an alloc_candev_mqs() call and accompanying macros. With this
> support a CAN device can use multi-queue qdisc (e.g. mqprio) to avoid
> the aforementioned priority inversion.
> 
> The existing functionality of alloc_candev() is the same as before.
> 
> CAN devices need to have prioritized multiple hardware queues or are
> able to abort waiting for arbitration to make sensible use of
> multi-queues.
> 
> Signed-off-by: Zhu Yi <yi.zhu5@cn.bosch.com>
> Signed-off-by: Mark Jonas <mark.jonas@de.bosch.com>
> Reviewed-by: Heiko Schocher <hs@denx.de>

Do you have a driver or a patch to make a driver mq aware?

> ---
>  drivers/net/can/dev.c   | 8 +++++---
>  include/linux/can/dev.h | 7 ++++++-
>  2 files changed, 11 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/can/dev.c b/drivers/net/can/dev.c
> index b177956..636f853 100644
> --- a/drivers/net/can/dev.c
> +++ b/drivers/net/can/dev.c
> @@ -703,7 +703,8 @@ EXPORT_SYMBOL_GPL(alloc_can_err_skb);
>  /*
>   * Allocate and setup space for the CAN network device
>   */
> -struct net_device *alloc_candev(int sizeof_priv, unsigned int echo_skb_max)
> +struct net_device *alloc_candev_mqs(int sizeof_priv, unsigned int echo_skb_max,
> +				    unsigned int txqs, unsigned int rxqs)
>  {
>  	struct net_device *dev;
>  	struct can_priv *priv;
> @@ -715,7 +716,8 @@ struct net_device *alloc_candev(int sizeof_priv, unsigned int echo_skb_max)
>  	else
>  		size = sizeof_priv;
>  
> -	dev = alloc_netdev(size, "can%d", NET_NAME_UNKNOWN, can_setup);
> +	dev = alloc_netdev_mqs(size, "can%d", NET_NAME_UNKNOWN, can_setup,
> +			       txqs, rxqs);
>  	if (!dev)
>  		return NULL;
>  
> @@ -734,7 +736,7 @@ struct net_device *alloc_candev(int sizeof_priv, unsigned int echo_skb_max)
>  
>  	return dev;
>  }
> -EXPORT_SYMBOL_GPL(alloc_candev);
> +EXPORT_SYMBOL_GPL(alloc_candev_mqs);
>  
>  /*
>   * Free space of the CAN network device
> diff --git a/include/linux/can/dev.h b/include/linux/can/dev.h
> index 055aaf5..a83e1f6 100644
> --- a/include/linux/can/dev.h
> +++ b/include/linux/can/dev.h
> @@ -143,7 +143,12 @@ u8 can_dlc2len(u8 can_dlc);
>  /* map the sanitized data length to an appropriate data length code */
>  u8 can_len2dlc(u8 len);
>  
> -struct net_device *alloc_candev(int sizeof_priv, unsigned int echo_skb_max);
> +struct net_device *alloc_candev_mqs(int sizeof_priv, unsigned int echo_skb_max,
> +				    unsigned int txqs, unsigned int rxqs);
> +#define alloc_candev(sizeof_priv, echo_skb_max) \
> +	alloc_candev_mqs(sizeof_priv, echo_skb_max, 1, 1)
> +#define alloc_candev_mq(sizeof_priv, echo_skb_max, count) \
> +	alloc_candev_mqs(sizeof_priv, echo_skb_max, count, count)

Can you make this a static inline function.

>  void free_candev(struct net_device *dev);
>  
>  /* a candev safe wrapper around netdev_priv */
> 

Marc

-- 
Pengutronix e.K.                  | Marc Kleine-Budde           |
Industrial Linux Solutions        | Phone: +49-231-2826-924     |
Vertretung West/Dortmund          | Fax:   +49-5121-206917-5555 |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* [PATCH net-next 5/5] sctp: add SCTP_AUTH_NO_AUTH type for AUTHENTICATION_EVENT
From: Xin Long @ 2018-03-14 11:05 UTC (permalink / raw)
  To: network dev, linux-sctp; +Cc: Marcelo Ricardo Leitner, Neil Horman, davem
In-Reply-To: <cover.1521025473.git.lucien.xin@gmail.com>

This patch is to add SCTP_AUTH_NO_AUTH type for AUTHENTICATION_EVENT,
as described in section 6.1.8 of RFC6458.

      SCTP_AUTH_NO_AUTH:  This report indicates that the peer does not
         support SCTP authentication as defined in [RFC4895].

Note that the implementation is quite similar as that of
SCTP_ADAPTATION_INDICATION.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 include/net/sctp/command.h |  1 +
 include/uapi/linux/sctp.h  |  1 +
 net/sctp/sm_sideeffect.c   | 13 +++++++++++++
 net/sctp/sm_statefuns.c    | 43 +++++++++++++++++++++++++++++++++++++++++--
 4 files changed, 56 insertions(+), 2 deletions(-)

diff --git a/include/net/sctp/command.h b/include/net/sctp/command.h
index b55c6a4..6640f84 100644
--- a/include/net/sctp/command.h
+++ b/include/net/sctp/command.h
@@ -100,6 +100,7 @@ enum sctp_verb {
 	SCTP_CMD_SET_SK_ERR,	 /* Set sk_err */
 	SCTP_CMD_ASSOC_CHANGE,	 /* generate and send assoc_change event */
 	SCTP_CMD_ADAPTATION_IND, /* generate and send adaptation event */
+	SCTP_CMD_PEER_NO_AUTH,   /* generate and send authentication event */
 	SCTP_CMD_ASSOC_SHKEY,    /* generate the association shared keys */
 	SCTP_CMD_T1_RETRAN,	 /* Mark for retransmission after T1 timeout  */
 	SCTP_CMD_UPDATE_INITTAG, /* Update peer inittag */
diff --git a/include/uapi/linux/sctp.h b/include/uapi/linux/sctp.h
index 18ebbfe..afd4346 100644
--- a/include/uapi/linux/sctp.h
+++ b/include/uapi/linux/sctp.h
@@ -522,6 +522,7 @@ enum {
 	SCTP_AUTH_NEW_KEY,
 #define	SCTP_AUTH_NEWKEY	SCTP_AUTH_NEW_KEY /* compatible with before */
 	SCTP_AUTH_FREE_KEY,
+	SCTP_AUTH_NO_AUTH,
 };
 
 /*
diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
index b71e7fb..298112c 100644
--- a/net/sctp/sm_sideeffect.c
+++ b/net/sctp/sm_sideeffect.c
@@ -1049,6 +1049,16 @@ static void sctp_cmd_assoc_change(struct sctp_cmd_seq *commands,
 		asoc->stream.si->enqueue_event(&asoc->ulpq, ev);
 }
 
+static void sctp_cmd_peer_no_auth(struct sctp_cmd_seq *commands,
+				  struct sctp_association *asoc)
+{
+	struct sctp_ulpevent *ev;
+
+	ev = sctp_ulpevent_make_authkey(asoc, 0, SCTP_AUTH_NO_AUTH, GFP_ATOMIC);
+	if (ev)
+		asoc->stream.si->enqueue_event(&asoc->ulpq, ev);
+}
+
 /* Helper function to generate an adaptation indication event */
 static void sctp_cmd_adaptation_ind(struct sctp_cmd_seq *commands,
 				    struct sctp_association *asoc)
@@ -1755,6 +1765,9 @@ static int sctp_cmd_interpreter(enum sctp_event event_type,
 		case SCTP_CMD_ADAPTATION_IND:
 			sctp_cmd_adaptation_ind(commands, asoc);
 			break;
+		case SCTP_CMD_PEER_NO_AUTH:
+			sctp_cmd_peer_no_auth(commands, asoc);
+			break;
 
 		case SCTP_CMD_ASSOC_SHKEY:
 			error = sctp_auth_asoc_init_active_key(asoc,
diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c
index 1e41dee..cc56a67 100644
--- a/net/sctp/sm_statefuns.c
+++ b/net/sctp/sm_statefuns.c
@@ -659,7 +659,7 @@ enum sctp_disposition sctp_sf_do_5_1D_ce(struct net *net,
 					 void *arg,
 					 struct sctp_cmd_seq *commands)
 {
-	struct sctp_ulpevent *ev, *ai_ev = NULL;
+	struct sctp_ulpevent *ev, *ai_ev = NULL, *auth_ev = NULL;
 	struct sctp_association *new_asoc;
 	struct sctp_init_chunk *peer_init;
 	struct sctp_chunk *chunk = arg;
@@ -820,6 +820,14 @@ enum sctp_disposition sctp_sf_do_5_1D_ce(struct net *net,
 			goto nomem_aiev;
 	}
 
+	if (!new_asoc->peer.auth_capable) {
+		auth_ev = sctp_ulpevent_make_authkey(new_asoc, 0,
+						     SCTP_AUTH_NO_AUTH,
+						     GFP_ATOMIC);
+		if (!auth_ev)
+			goto nomem_authev;
+	}
+
 	/* Add all the state machine commands now since we've created
 	 * everything.  This way we don't introduce memory corruptions
 	 * during side-effect processing and correclty count established
@@ -847,8 +855,14 @@ enum sctp_disposition sctp_sf_do_5_1D_ce(struct net *net,
 		sctp_add_cmd_sf(commands, SCTP_CMD_EVENT_ULP,
 				SCTP_ULPEVENT(ai_ev));
 
+	if (auth_ev)
+		sctp_add_cmd_sf(commands, SCTP_CMD_EVENT_ULP,
+				SCTP_ULPEVENT(auth_ev));
+
 	return SCTP_DISPOSITION_CONSUME;
 
+nomem_authev:
+	sctp_ulpevent_free(ai_ev);
 nomem_aiev:
 	sctp_ulpevent_free(ev);
 nomem_ev:
@@ -953,6 +967,15 @@ enum sctp_disposition sctp_sf_do_5_1E_ca(struct net *net,
 				SCTP_ULPEVENT(ev));
 	}
 
+	if (!asoc->peer.auth_capable) {
+		ev = sctp_ulpevent_make_authkey(asoc, 0, SCTP_AUTH_NO_AUTH,
+						GFP_ATOMIC);
+		if (!ev)
+			goto nomem;
+		sctp_add_cmd_sf(commands, SCTP_CMD_EVENT_ULP,
+				SCTP_ULPEVENT(ev));
+	}
+
 	return SCTP_DISPOSITION_CONSUME;
 nomem:
 	return SCTP_DISPOSITION_NOMEM;
@@ -1908,6 +1931,9 @@ static enum sctp_disposition sctp_sf_do_dupcook_b(
 	if (asoc->peer.adaptation_ind)
 		sctp_add_cmd_sf(commands, SCTP_CMD_ADAPTATION_IND, SCTP_NULL());
 
+	if (!asoc->peer.auth_capable)
+		sctp_add_cmd_sf(commands, SCTP_CMD_PEER_NO_AUTH, SCTP_NULL());
+
 	return SCTP_DISPOSITION_CONSUME;
 
 nomem:
@@ -1954,7 +1980,7 @@ static enum sctp_disposition sctp_sf_do_dupcook_d(
 					struct sctp_cmd_seq *commands,
 					struct sctp_association *new_asoc)
 {
-	struct sctp_ulpevent *ev = NULL, *ai_ev = NULL;
+	struct sctp_ulpevent *ev = NULL, *ai_ev = NULL, *auth_ev = NULL;
 	struct sctp_chunk *repl;
 
 	/* Clarification from Implementor's Guide:
@@ -2001,6 +2027,14 @@ static enum sctp_disposition sctp_sf_do_dupcook_d(
 				goto nomem;
 
 		}
+
+		if (!asoc->peer.auth_capable) {
+			auth_ev = sctp_ulpevent_make_authkey(asoc, 0,
+							     SCTP_AUTH_NO_AUTH,
+							     GFP_ATOMIC);
+			if (!auth_ev)
+				goto nomem;
+		}
 	}
 
 	repl = sctp_make_cookie_ack(new_asoc, chunk);
@@ -2015,10 +2049,15 @@ static enum sctp_disposition sctp_sf_do_dupcook_d(
 	if (ai_ev)
 		sctp_add_cmd_sf(commands, SCTP_CMD_EVENT_ULP,
 					SCTP_ULPEVENT(ai_ev));
+	if (auth_ev)
+		sctp_add_cmd_sf(commands, SCTP_CMD_EVENT_ULP,
+				SCTP_ULPEVENT(auth_ev));
 
 	return SCTP_DISPOSITION_CONSUME;
 
 nomem:
+	if (auth_ev)
+		sctp_ulpevent_free(auth_ev);
 	if (ai_ev)
 		sctp_ulpevent_free(ai_ev);
 	if (ev)
-- 
2.1.0

^ permalink raw reply related

* [PATCH net-next 4/5] sctp: add SCTP_AUTH_FREE_KEY type for AUTHENTICATION_EVENT
From: Xin Long @ 2018-03-14 11:05 UTC (permalink / raw)
  To: network dev, linux-sctp; +Cc: Marcelo Ricardo Leitner, Neil Horman, davem
In-Reply-To: <cover.1521025473.git.lucien.xin@gmail.com>

This patch is to add SCTP_AUTH_FREE_KEY type for AUTHENTICATION_EVENT,
as described in section 6.1.8 of RFC6458.

      SCTP_AUTH_FREE_KEY:  This report indicates that the SCTP
         implementation will no longer use the key identifier specified
         in auth_keynumber.

After deactivating a key, it would never be used again, which means
it's refcnt can't be held/increased by new chunks. But there may be
some chunks in out queue still using it. So only when refcnt is 1,
which means no chunk in outqueue is using/holding this key either,
this EVENT would be sent.

When users receive this notification, they could do DEL_KEY sockopt to
remove this shkey, and also tell the peer that this key won't be used
in any chunk thoroughly from now on, then the peer can remove it as
well safely.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 include/uapi/linux/sctp.h |  6 +++++-
 net/sctp/auth.c           | 14 ++++++++++++++
 net/sctp/sm_make_chunk.c  | 20 +++++++++++++++++++-
 net/sctp/sm_statefuns.c   |  2 +-
 net/sctp/socket.c         | 19 ++++++++++++++++++-
 5 files changed, 57 insertions(+), 4 deletions(-)

diff --git a/include/uapi/linux/sctp.h b/include/uapi/linux/sctp.h
index 08fc313..18ebbfe 100644
--- a/include/uapi/linux/sctp.h
+++ b/include/uapi/linux/sctp.h
@@ -518,7 +518,11 @@ struct sctp_authkey_event {
 	sctp_assoc_t auth_assoc_id;
 };
 
-enum { SCTP_AUTH_NEWKEY = 0, };
+enum {
+	SCTP_AUTH_NEW_KEY,
+#define	SCTP_AUTH_NEWKEY	SCTP_AUTH_NEW_KEY /* compatible with before */
+	SCTP_AUTH_FREE_KEY,
+};
 
 /*
  * 6.1.9. SCTP_SENDER_DRY_EVENT
diff --git a/net/sctp/auth.c b/net/sctp/auth.c
index a073123..e64630c 100644
--- a/net/sctp/auth.c
+++ b/net/sctp/auth.c
@@ -992,6 +992,20 @@ int sctp_auth_deact_key_id(struct sctp_endpoint *ep,
 	if (!found)
 		return -EINVAL;
 
+	/* refcnt == 1 and !list_empty mean it's not being used anywhere
+	 * and deactivated will be set, so it's time to notify userland
+	 * that this shkey can be freed.
+	 */
+	if (asoc && !list_empty(&key->key_list) &&
+	    refcount_read(&key->refcnt) == 1) {
+		struct sctp_ulpevent *ev;
+
+		ev = sctp_ulpevent_make_authkey(asoc, key->key_id,
+						SCTP_AUTH_FREE_KEY, GFP_KERNEL);
+		if (ev)
+			asoc->stream.si->enqueue_event(&asoc->ulpq, ev);
+	}
+
 	key->deactivated = 1;
 
 	return 0;
diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
index 10f071c..cc20bc3 100644
--- a/net/sctp/sm_make_chunk.c
+++ b/net/sctp/sm_make_chunk.c
@@ -89,8 +89,26 @@ static void sctp_control_release_owner(struct sk_buff *skb)
 {
 	struct sctp_chunk *chunk = skb_shinfo(skb)->destructor_arg;
 
-	if (chunk->shkey)
+	if (chunk->shkey) {
+		struct sctp_shared_key *shkey = chunk->shkey;
+		struct sctp_association *asoc = chunk->asoc;
+
+		/* refcnt == 2 and !list_empty mean after this release, it's
+		 * not being used anywhere, and it's time to notify userland
+		 * that this shkey can be freed if it's been deactivated.
+		 */
+		if (shkey->deactivated && !list_empty(&shkey->key_list) &&
+		    refcount_read(&shkey->refcnt) == 2) {
+			struct sctp_ulpevent *ev;
+
+			ev = sctp_ulpevent_make_authkey(asoc, shkey->key_id,
+							SCTP_AUTH_FREE_KEY,
+							GFP_KERNEL);
+			if (ev)
+				asoc->stream.si->enqueue_event(&asoc->ulpq, ev);
+		}
 		sctp_auth_shkey_release(chunk->shkey);
+	}
 }
 
 static void sctp_control_set_owner_w(struct sctp_chunk *chunk)
diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c
index 792e0e2..1e41dee 100644
--- a/net/sctp/sm_statefuns.c
+++ b/net/sctp/sm_statefuns.c
@@ -4246,7 +4246,7 @@ enum sctp_disposition sctp_sf_eat_auth(struct net *net,
 		struct sctp_ulpevent *ev;
 
 		ev = sctp_ulpevent_make_authkey(asoc, ntohs(auth_hdr->shkey_id),
-				    SCTP_AUTH_NEWKEY, GFP_ATOMIC);
+				    SCTP_AUTH_NEW_KEY, GFP_ATOMIC);
 
 		if (!ev)
 			return -ENOMEM;
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 65cc354..aeecdd6 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -8166,8 +8166,25 @@ static void sctp_wfree(struct sk_buff *skb)
 	sk->sk_wmem_queued   -= skb->truesize;
 	sk_mem_uncharge(sk, skb->truesize);
 
-	if (chunk->shkey)
+	if (chunk->shkey) {
+		struct sctp_shared_key *shkey = chunk->shkey;
+
+		/* refcnt == 2 and !list_empty mean after this release, it's
+		 * not being used anywhere, and it's time to notify userland
+		 * that this shkey can be freed if it's been deactivated.
+		 */
+		if (shkey->deactivated && !list_empty(&shkey->key_list) &&
+		    refcount_read(&shkey->refcnt) == 2) {
+			struct sctp_ulpevent *ev;
+
+			ev = sctp_ulpevent_make_authkey(asoc, shkey->key_id,
+							SCTP_AUTH_FREE_KEY,
+							GFP_KERNEL);
+			if (ev)
+				asoc->stream.si->enqueue_event(&asoc->ulpq, ev);
+		}
 		sctp_auth_shkey_release(chunk->shkey);
+	}
 
 	sock_wfree(skb);
 	sctp_wake_up_waiters(sk, asoc);
-- 
2.1.0

^ permalink raw reply related

* [PATCH net-next 3/5] sctp: add sockopt SCTP_AUTH_DEACTIVATE_KEY
From: Xin Long @ 2018-03-14 11:05 UTC (permalink / raw)
  To: network dev, linux-sctp; +Cc: Marcelo Ricardo Leitner, Neil Horman, davem
In-Reply-To: <cover.1521025473.git.lucien.xin@gmail.com>

This patch is to add sockopt SCTP_AUTH_DEACTIVATE_KEY, as described in
section 8.3.4 of RFC6458.

This set option indicates that the application will no longer send user
messages using the indicated key identifier.

Note that RFC requires that only deactivated keys that are no longer used
by an association can be deleted, but for the backward compatibility, it
is not to check deactivated when deleting or replacing one sh_key.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 include/net/sctp/auth.h   | 12 ++++++------
 include/uapi/linux/sctp.h |  1 +
 net/sctp/auth.c           | 46 +++++++++++++++++++++++++++++++++++++++++++---
 net/sctp/socket.c         | 31 +++++++++++++++++++++++++++++++
 4 files changed, 81 insertions(+), 9 deletions(-)

diff --git a/include/net/sctp/auth.h b/include/net/sctp/auth.h
index 017c1aa..687e7f8 100644
--- a/include/net/sctp/auth.h
+++ b/include/net/sctp/auth.h
@@ -65,6 +65,7 @@ struct sctp_shared_key {
 	struct sctp_auth_bytes *key;
 	refcount_t refcnt;
 	__u16 key_id;
+	__u8 deactivated;
 };
 
 #define key_for_each(__key, __list_head) \
@@ -113,14 +114,13 @@ void sctp_auth_shkey_hold(struct sctp_shared_key *sh_key);
 int sctp_auth_ep_add_chunkid(struct sctp_endpoint *ep, __u8 chunk_id);
 int sctp_auth_ep_set_hmacs(struct sctp_endpoint *ep,
 			    struct sctp_hmacalgo *hmacs);
-int sctp_auth_set_key(struct sctp_endpoint *ep,
-		      struct sctp_association *asoc,
+int sctp_auth_set_key(struct sctp_endpoint *ep, struct sctp_association *asoc,
 		      struct sctp_authkey *auth_key);
 int sctp_auth_set_active_key(struct sctp_endpoint *ep,
-		      struct sctp_association *asoc,
-		      __u16 key_id);
+			     struct sctp_association *asoc, __u16 key_id);
 int sctp_auth_del_key_id(struct sctp_endpoint *ep,
-		      struct sctp_association *asoc,
-		      __u16 key_id);
+			 struct sctp_association *asoc, __u16 key_id);
+int sctp_auth_deact_key_id(struct sctp_endpoint *ep,
+			   struct sctp_association *asoc, __u16 key_id);
 
 #endif
diff --git a/include/uapi/linux/sctp.h b/include/uapi/linux/sctp.h
index 47e781e..08fc313 100644
--- a/include/uapi/linux/sctp.h
+++ b/include/uapi/linux/sctp.h
@@ -99,6 +99,7 @@ typedef __s32 sctp_assoc_t;
 #define SCTP_RECVRCVINFO	32
 #define SCTP_RECVNXTINFO	33
 #define SCTP_DEFAULT_SNDINFO	34
+#define SCTP_AUTH_DEACTIVATE_KEY	35
 
 /* Internal Socket Options. Some of the sctp library functions are
  * implemented using these socket options.
diff --git a/net/sctp/auth.c b/net/sctp/auth.c
index e5214fd..a073123 100644
--- a/net/sctp/auth.c
+++ b/net/sctp/auth.c
@@ -449,8 +449,11 @@ struct sctp_shared_key *sctp_auth_get_shkey(
 
 	/* First search associations set of endpoint pair shared keys */
 	key_for_each(key, &asoc->endpoint_shared_keys) {
-		if (key->key_id == key_id)
-			return key;
+		if (key->key_id == key_id) {
+			if (!key->deactivated)
+				return key;
+			break;
+		}
 	}
 
 	return NULL;
@@ -905,7 +908,7 @@ int sctp_auth_set_active_key(struct sctp_endpoint *ep,
 		}
 	}
 
-	if (!found)
+	if (!found || key->deactivated)
 		return -EINVAL;
 
 	if (asoc) {
@@ -956,3 +959,40 @@ int sctp_auth_del_key_id(struct sctp_endpoint *ep,
 
 	return 0;
 }
+
+int sctp_auth_deact_key_id(struct sctp_endpoint *ep,
+			   struct sctp_association *asoc, __u16  key_id)
+{
+	struct sctp_shared_key *key;
+	struct list_head *sh_keys;
+	int found = 0;
+
+	/* The key identifier MUST NOT be the current active key
+	 * The key identifier MUST correst to an existing key
+	 */
+	if (asoc) {
+		if (asoc->active_key_id == key_id)
+			return -EINVAL;
+
+		sh_keys = &asoc->endpoint_shared_keys;
+	} else {
+		if (ep->active_key_id == key_id)
+			return -EINVAL;
+
+		sh_keys = &ep->endpoint_shared_keys;
+	}
+
+	key_for_each(key, sh_keys) {
+		if (key->key_id == key_id) {
+			found = 1;
+			break;
+		}
+	}
+
+	if (!found)
+		return -EINVAL;
+
+	key->deactivated = 1;
+
+	return 0;
+}
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 9ffdecb..65cc354 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -3647,6 +3647,33 @@ static int sctp_setsockopt_del_key(struct sock *sk,
 }
 
 /*
+ * 8.3.4  Deactivate a Shared Key (SCTP_AUTH_DEACTIVATE_KEY)
+ *
+ * This set option will deactivate a shared secret key.
+ */
+static int sctp_setsockopt_deactivate_key(struct sock *sk, char __user *optval,
+					  unsigned int optlen)
+{
+	struct sctp_endpoint *ep = sctp_sk(sk)->ep;
+	struct sctp_authkeyid val;
+	struct sctp_association *asoc;
+
+	if (!ep->auth_enable)
+		return -EACCES;
+
+	if (optlen != sizeof(struct sctp_authkeyid))
+		return -EINVAL;
+	if (copy_from_user(&val, optval, optlen))
+		return -EFAULT;
+
+	asoc = sctp_id2assoc(sk, val.scact_assoc_id);
+	if (!asoc && val.scact_assoc_id && sctp_style(sk, UDP))
+		return -EINVAL;
+
+	return sctp_auth_deact_key_id(ep, asoc, val.scact_keynumber);
+}
+
+/*
  * 8.1.23 SCTP_AUTO_ASCONF
  *
  * This option will enable or disable the use of the automatic generation of
@@ -4238,6 +4265,9 @@ static int sctp_setsockopt(struct sock *sk, int level, int optname,
 	case SCTP_AUTH_DELETE_KEY:
 		retval = sctp_setsockopt_del_key(sk, optval, optlen);
 		break;
+	case SCTP_AUTH_DEACTIVATE_KEY:
+		retval = sctp_setsockopt_deactivate_key(sk, optval, optlen);
+		break;
 	case SCTP_AUTO_ASCONF:
 		retval = sctp_setsockopt_auto_asconf(sk, optval, optlen);
 		break;
@@ -7212,6 +7242,7 @@ static int sctp_getsockopt(struct sock *sk, int level, int optname,
 	case SCTP_AUTH_KEY:
 	case SCTP_AUTH_CHUNK:
 	case SCTP_AUTH_DELETE_KEY:
+	case SCTP_AUTH_DEACTIVATE_KEY:
 		retval = -EOPNOTSUPP;
 		break;
 	case SCTP_HMAC_IDENT:
-- 
2.1.0

^ permalink raw reply related

* [PATCH net-next 2/5] sctp: add support for SCTP AUTH Information for sendmsg
From: Xin Long @ 2018-03-14 11:05 UTC (permalink / raw)
  To: network dev, linux-sctp; +Cc: Marcelo Ricardo Leitner, Neil Horman, davem
In-Reply-To: <cover.1521025473.git.lucien.xin@gmail.com>

This patch is to add support for SCTP AUTH Information for sendmsg,
as described in section 5.3.8 of RFC6458.

With this option, you can provide shared key identifier used for
sending the user message.

It's also a necessary send info for sctp_sendv.

Note that it reuses sinfo->sinfo_tsn to indicate if this option is
set and sinfo->sinfo_ssn to save the shkey ID which can be 0.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 include/net/sctp/structs.h |  1 +
 include/uapi/linux/sctp.h  | 14 +++++++++++++-
 net/sctp/chunk.c           | 11 ++++++++++-
 net/sctp/socket.c          | 23 +++++++++++++++++++++++
 4 files changed, 47 insertions(+), 2 deletions(-)

diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index 49ad67b..012fb3e 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -2118,6 +2118,7 @@ struct sctp_cmsgs {
 	struct sctp_sndrcvinfo *srinfo;
 	struct sctp_sndinfo *sinfo;
 	struct sctp_prinfo *prinfo;
+	struct sctp_authinfo *authinfo;
 	struct msghdr *addrs_msg;
 };
 
diff --git a/include/uapi/linux/sctp.h b/include/uapi/linux/sctp.h
index e94b6d2..47e781e 100644
--- a/include/uapi/linux/sctp.h
+++ b/include/uapi/linux/sctp.h
@@ -273,6 +273,18 @@ struct sctp_prinfo {
 	__u32 pr_value;
 };
 
+/* 5.3.8 SCTP AUTH Information Structure (SCTP_AUTHINFO)
+ *
+ *   This cmsghdr structure specifies SCTP options for sendmsg().
+ *
+ *   cmsg_level    cmsg_type      cmsg_data[]
+ *   ------------  ------------   -------------------
+ *   IPPROTO_SCTP  SCTP_AUTHINFO  struct sctp_authinfo
+ */
+struct sctp_authinfo {
+	__u16 auth_keynumber;
+};
+
 /*
  *  sinfo_flags: 16 bits (unsigned integer)
  *
@@ -310,7 +322,7 @@ typedef enum sctp_cmsg_type {
 #define SCTP_NXTINFO	SCTP_NXTINFO
 	SCTP_PRINFO,		/* 5.3.7 SCTP PR-SCTP Information Structure */
 #define SCTP_PRINFO	SCTP_PRINFO
-	SCTP_AUTHINFO,		/* 5.3.8 SCTP AUTH Information Structure (RESERVED) */
+	SCTP_AUTHINFO,		/* 5.3.8 SCTP AUTH Information Structure */
 #define SCTP_AUTHINFO	SCTP_AUTHINFO
 	SCTP_DSTADDRV4,		/* 5.3.9 SCTP Destination IPv4 Address Structure */
 #define SCTP_DSTADDRV4	SCTP_DSTADDRV4
diff --git a/net/sctp/chunk.c b/net/sctp/chunk.c
index 9f28a9a..f889a84 100644
--- a/net/sctp/chunk.c
+++ b/net/sctp/chunk.c
@@ -206,7 +206,16 @@ struct sctp_datamsg *sctp_datamsg_from_user(struct sctp_association *asoc,
 			max_data -= SCTP_PAD4(sizeof(struct sctp_auth_chunk) +
 					      hmac_desc->hmac_len);
 
-		shkey = asoc->shkey;
+		if (sinfo->sinfo_tsn &&
+		    sinfo->sinfo_ssn != asoc->active_key_id) {
+			shkey = sctp_auth_get_shkey(asoc, sinfo->sinfo_ssn);
+			if (!shkey) {
+				err = -EINVAL;
+				goto errout;
+			}
+		} else {
+			shkey = asoc->shkey;
+		}
 	}
 
 	/* Check what's our max considering the above */
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 003a4ad..9ffdecb 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -1987,6 +1987,14 @@ static void sctp_sendmsg_update_sinfo(struct sctp_association *asoc,
 
 	if (!cmsgs->srinfo && !cmsgs->prinfo)
 		sinfo->sinfo_timetolive = asoc->default_timetolive;
+
+	if (cmsgs->authinfo) {
+		/* Reuse sinfo_tsn to indicate that authinfo was set and
+		 * sinfo_ssn to save the keyid on tx path.
+		 */
+		sinfo->sinfo_tsn = 1;
+		sinfo->sinfo_ssn = cmsgs->authinfo->auth_keynumber;
+	}
 }
 
 static int sctp_sendmsg(struct sock *sk, struct msghdr *msg, size_t msg_len)
@@ -7874,6 +7882,21 @@ static int sctp_msghdr_parse(const struct msghdr *msg, struct sctp_cmsgs *cmsgs)
 			if (cmsgs->prinfo->pr_policy == SCTP_PR_SCTP_NONE)
 				cmsgs->prinfo->pr_value = 0;
 			break;
+		case SCTP_AUTHINFO:
+			/* SCTP Socket API Extension
+			 * 5.3.8 SCTP AUTH Information Structure (SCTP_AUTHINFO)
+			 *
+			 * This cmsghdr structure specifies SCTP options for sendmsg().
+			 *
+			 * cmsg_level    cmsg_type      cmsg_data[]
+			 * ------------  ------------   ---------------------
+			 * IPPROTO_SCTP  SCTP_AUTHINFO  struct sctp_authinfo
+			 */
+			if (cmsg->cmsg_len != CMSG_LEN(sizeof(struct sctp_authinfo)))
+				return -EINVAL;
+
+			cmsgs->authinfo = CMSG_DATA(cmsg);
+			break;
 		case SCTP_DSTADDRV4:
 		case SCTP_DSTADDRV6:
 			/* SCTP Socket API Extension
-- 
2.1.0

^ permalink raw reply related

* [PATCH net-next 1/5] sctp: add refcnt support for sh_key
From: Xin Long @ 2018-03-14 11:05 UTC (permalink / raw)
  To: network dev, linux-sctp; +Cc: Marcelo Ricardo Leitner, Neil Horman, davem
In-Reply-To: <cover.1521025473.git.lucien.xin@gmail.com>

With refcnt support for sh_key, chunks auth sh_keys can be decided
before enqueuing it. Changing the active key later will not affect
the chunks already enqueued.

Furthermore, this is necessary when adding the support for authinfo
for sendmsg in next patch.

Note that struct sctp_chunk can't be grown due to that performance
drop issue on slow cpu, so it just reuses head_skb memory for shkey
in sctp_chunk.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
---
 include/net/sctp/auth.h    |  9 +++--
 include/net/sctp/sm.h      |  3 +-
 include/net/sctp/structs.h |  9 +++--
 net/sctp/auth.c            | 86 +++++++++++++++++++++++-----------------------
 net/sctp/chunk.c           |  5 +++
 net/sctp/output.c          | 18 ++++++++--
 net/sctp/sm_make_chunk.c   | 15 ++++++--
 net/sctp/sm_statefuns.c    | 11 +++---
 net/sctp/socket.c          |  6 ++++
 9 files changed, 104 insertions(+), 58 deletions(-)

diff --git a/include/net/sctp/auth.h b/include/net/sctp/auth.h
index e5c57d0..017c1aa 100644
--- a/include/net/sctp/auth.h
+++ b/include/net/sctp/auth.h
@@ -62,8 +62,9 @@ struct sctp_auth_bytes {
 /* Definition for a shared key, weather endpoint or association */
 struct sctp_shared_key {
 	struct list_head key_list;
-	__u16 key_id;
 	struct sctp_auth_bytes *key;
+	refcount_t refcnt;
+	__u16 key_id;
 };
 
 #define key_for_each(__key, __list_head) \
@@ -103,8 +104,10 @@ int sctp_auth_send_cid(enum sctp_cid chunk,
 int sctp_auth_recv_cid(enum sctp_cid chunk,
 		       const struct sctp_association *asoc);
 void sctp_auth_calculate_hmac(const struct sctp_association *asoc,
-			    struct sk_buff *skb,
-			    struct sctp_auth_chunk *auth, gfp_t gfp);
+			      struct sk_buff *skb, struct sctp_auth_chunk *auth,
+			      struct sctp_shared_key *ep_key, gfp_t gfp);
+void sctp_auth_shkey_release(struct sctp_shared_key *sh_key);
+void sctp_auth_shkey_hold(struct sctp_shared_key *sh_key);
 
 /* API Helpers */
 int sctp_auth_ep_add_chunkid(struct sctp_endpoint *ep, __u8 chunk_id);
diff --git a/include/net/sctp/sm.h b/include/net/sctp/sm.h
index 2883c43..2d0e782 100644
--- a/include/net/sctp/sm.h
+++ b/include/net/sctp/sm.h
@@ -263,7 +263,8 @@ int sctp_process_asconf_ack(struct sctp_association *asoc,
 struct sctp_chunk *sctp_make_fwdtsn(const struct sctp_association *asoc,
 				    __u32 new_cum_tsn, size_t nstreams,
 				    struct sctp_fwdtsn_skip *skiplist);
-struct sctp_chunk *sctp_make_auth(const struct sctp_association *asoc);
+struct sctp_chunk *sctp_make_auth(const struct sctp_association *asoc,
+				  __u16 key_id);
 struct sctp_chunk *sctp_make_strreset_req(const struct sctp_association *asoc,
 					  __u16 stream_num, __be16 *stream_list,
 					  bool out, bool in);
diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index ec6e46b..49ad67b 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -577,8 +577,12 @@ struct sctp_chunk {
 	/* This points to the sk_buff containing the actual data.  */
 	struct sk_buff *skb;
 
-	/* In case of GSO packets, this will store the head one */
-	struct sk_buff *head_skb;
+	union {
+		/* In case of GSO packets, this will store the head one */
+		struct sk_buff *head_skb;
+		/* In case of auth enabled, this will point to the shkey */
+		struct sctp_shared_key *shkey;
+	};
 
 	/* These are the SCTP headers by reverse order in a packet.
 	 * Note that some of these may happen more than once.  In that
@@ -1995,6 +1999,7 @@ struct sctp_association {
 	 * The current generated assocaition shared key (secret)
 	 */
 	struct sctp_auth_bytes *asoc_shared_key;
+	struct sctp_shared_key *shkey;
 
 	/* SCTP AUTH: hmac id of the first peer requested algorithm
 	 * that we support.
diff --git a/net/sctp/auth.c b/net/sctp/auth.c
index 00667c5..e5214fd 100644
--- a/net/sctp/auth.c
+++ b/net/sctp/auth.c
@@ -101,13 +101,14 @@ struct sctp_shared_key *sctp_auth_shkey_create(__u16 key_id, gfp_t gfp)
 		return NULL;
 
 	INIT_LIST_HEAD(&new->key_list);
+	refcount_set(&new->refcnt, 1);
 	new->key_id = key_id;
 
 	return new;
 }
 
 /* Free the shared key structure */
-static void sctp_auth_shkey_free(struct sctp_shared_key *sh_key)
+static void sctp_auth_shkey_destroy(struct sctp_shared_key *sh_key)
 {
 	BUG_ON(!list_empty(&sh_key->key_list));
 	sctp_auth_key_put(sh_key->key);
@@ -115,6 +116,17 @@ static void sctp_auth_shkey_free(struct sctp_shared_key *sh_key)
 	kfree(sh_key);
 }
 
+void sctp_auth_shkey_release(struct sctp_shared_key *sh_key)
+{
+	if (refcount_dec_and_test(&sh_key->refcnt))
+		sctp_auth_shkey_destroy(sh_key);
+}
+
+void sctp_auth_shkey_hold(struct sctp_shared_key *sh_key)
+{
+	refcount_inc(&sh_key->refcnt);
+}
+
 /* Destroy the entire key list.  This is done during the
  * associon and endpoint free process.
  */
@@ -128,7 +140,7 @@ void sctp_auth_destroy_keys(struct list_head *keys)
 
 	key_for_each_safe(ep_key, tmp, keys) {
 		list_del_init(&ep_key->key_list);
-		sctp_auth_shkey_free(ep_key);
+		sctp_auth_shkey_release(ep_key);
 	}
 }
 
@@ -409,13 +421,19 @@ int sctp_auth_asoc_init_active_key(struct sctp_association *asoc, gfp_t gfp)
 
 	sctp_auth_key_put(asoc->asoc_shared_key);
 	asoc->asoc_shared_key = secret;
+	asoc->shkey = ep_key;
 
 	/* Update send queue in case any chunk already in there now
 	 * needs authenticating
 	 */
 	list_for_each_entry(chunk, &asoc->outqueue.out_chunk_list, list) {
-		if (sctp_auth_send_cid(chunk->chunk_hdr->type, asoc))
+		if (sctp_auth_send_cid(chunk->chunk_hdr->type, asoc)) {
 			chunk->auth = 1;
+			if (!chunk->shkey) {
+				chunk->shkey = asoc->shkey;
+				sctp_auth_shkey_hold(chunk->shkey);
+			}
+		}
 	}
 
 	return 0;
@@ -703,16 +721,15 @@ int sctp_auth_recv_cid(enum sctp_cid chunk, const struct sctp_association *asoc)
  *    after the AUTH chunk in the SCTP packet.
  */
 void sctp_auth_calculate_hmac(const struct sctp_association *asoc,
-			      struct sk_buff *skb,
-			      struct sctp_auth_chunk *auth,
-			      gfp_t gfp)
+			      struct sk_buff *skb, struct sctp_auth_chunk *auth,
+			      struct sctp_shared_key *ep_key, gfp_t gfp)
 {
-	struct crypto_shash *tfm;
 	struct sctp_auth_bytes *asoc_key;
+	struct crypto_shash *tfm;
 	__u16 key_id, hmac_id;
-	__u8 *digest;
 	unsigned char *end;
 	int free_key = 0;
+	__u8 *digest;
 
 	/* Extract the info we need:
 	 * - hmac id
@@ -724,12 +741,7 @@ void sctp_auth_calculate_hmac(const struct sctp_association *asoc,
 	if (key_id == asoc->active_key_id)
 		asoc_key = asoc->asoc_shared_key;
 	else {
-		struct sctp_shared_key *ep_key;
-
-		ep_key = sctp_auth_get_shkey(asoc, key_id);
-		if (!ep_key)
-			return;
-
+		/* ep_key can't be NULL here */
 		asoc_key = sctp_auth_asoc_create_secret(asoc, ep_key, gfp);
 		if (!asoc_key)
 			return;
@@ -829,7 +841,7 @@ int sctp_auth_set_key(struct sctp_endpoint *ep,
 		      struct sctp_association *asoc,
 		      struct sctp_authkey *auth_key)
 {
-	struct sctp_shared_key *cur_key = NULL;
+	struct sctp_shared_key *cur_key, *shkey;
 	struct sctp_auth_bytes *key;
 	struct list_head *sh_keys;
 	int replace = 0;
@@ -842,46 +854,34 @@ int sctp_auth_set_key(struct sctp_endpoint *ep,
 	else
 		sh_keys = &ep->endpoint_shared_keys;
 
-	key_for_each(cur_key, sh_keys) {
-		if (cur_key->key_id == auth_key->sca_keynumber) {
+	key_for_each(shkey, sh_keys) {
+		if (shkey->key_id == auth_key->sca_keynumber) {
 			replace = 1;
 			break;
 		}
 	}
 
-	/* If we are not replacing a key id, we need to allocate
-	 * a shared key.
-	 */
-	if (!replace) {
-		cur_key = sctp_auth_shkey_create(auth_key->sca_keynumber,
-						 GFP_KERNEL);
-		if (!cur_key)
-			return -ENOMEM;
-	}
+	cur_key = sctp_auth_shkey_create(auth_key->sca_keynumber, GFP_KERNEL);
+	if (!cur_key)
+		return -ENOMEM;
 
 	/* Create a new key data based on the info passed in */
 	key = sctp_auth_create_key(auth_key->sca_keylength, GFP_KERNEL);
-	if (!key)
-		goto nomem;
+	if (!key) {
+		kfree(cur_key);
+		return -ENOMEM;
+	}
 
 	memcpy(key->data, &auth_key->sca_key[0], auth_key->sca_keylength);
+	cur_key->key = key;
 
-	/* If we are replacing, remove the old keys data from the
-	 * key id.  If we are adding new key id, add it to the
-	 * list.
-	 */
-	if (replace)
-		sctp_auth_key_put(cur_key->key);
-	else
-		list_add(&cur_key->key_list, sh_keys);
+	if (replace) {
+		list_del_init(&shkey->key_list);
+		sctp_auth_shkey_release(shkey);
+	}
+	list_add(&cur_key->key_list, sh_keys);
 
-	cur_key->key = key;
 	return 0;
-nomem:
-	if (!replace)
-		sctp_auth_shkey_free(cur_key);
-
-	return -ENOMEM;
 }
 
 int sctp_auth_set_active_key(struct sctp_endpoint *ep,
@@ -952,7 +952,7 @@ int sctp_auth_del_key_id(struct sctp_endpoint *ep,
 
 	/* Delete the shared key */
 	list_del_init(&key->key_list);
-	sctp_auth_shkey_free(key);
+	sctp_auth_shkey_release(key);
 
 	return 0;
 }
diff --git a/net/sctp/chunk.c b/net/sctp/chunk.c
index 991a530..9f28a9a 100644
--- a/net/sctp/chunk.c
+++ b/net/sctp/chunk.c
@@ -168,6 +168,7 @@ struct sctp_datamsg *sctp_datamsg_from_user(struct sctp_association *asoc,
 {
 	size_t len, first_len, max_data, remaining;
 	size_t msg_len = iov_iter_count(from);
+	struct sctp_shared_key *shkey = NULL;
 	struct list_head *pos, *temp;
 	struct sctp_chunk *chunk;
 	struct sctp_datamsg *msg;
@@ -204,6 +205,8 @@ struct sctp_datamsg *sctp_datamsg_from_user(struct sctp_association *asoc,
 		if (hmac_desc)
 			max_data -= SCTP_PAD4(sizeof(struct sctp_auth_chunk) +
 					      hmac_desc->hmac_len);
+
+		shkey = asoc->shkey;
 	}
 
 	/* Check what's our max considering the above */
@@ -275,6 +278,8 @@ struct sctp_datamsg *sctp_datamsg_from_user(struct sctp_association *asoc,
 		if (err < 0)
 			goto errout_chunk_free;
 
+		chunk->shkey = shkey;
+
 		/* Put the chunk->skb back into the form expected by send.  */
 		__skb_pull(chunk->skb, (__u8 *)chunk->chunk_hdr -
 				       chunk->skb->data);
diff --git a/net/sctp/output.c b/net/sctp/output.c
index 01a26ee0..d6e1c90 100644
--- a/net/sctp/output.c
+++ b/net/sctp/output.c
@@ -241,10 +241,13 @@ static enum sctp_xmit sctp_packet_bundle_auth(struct sctp_packet *pkt,
 	if (!chunk->auth)
 		return retval;
 
-	auth = sctp_make_auth(asoc);
+	auth = sctp_make_auth(asoc, chunk->shkey->key_id);
 	if (!auth)
 		return retval;
 
+	auth->shkey = chunk->shkey;
+	sctp_auth_shkey_hold(auth->shkey);
+
 	retval = __sctp_packet_append_chunk(pkt, auth);
 
 	if (retval != SCTP_XMIT_OK)
@@ -490,7 +493,8 @@ static int sctp_packet_pack(struct sctp_packet *packet,
 		}
 
 		if (auth) {
-			sctp_auth_calculate_hmac(tp->asoc, nskb, auth, gfp);
+			sctp_auth_calculate_hmac(tp->asoc, nskb, auth,
+						 packet->auth->shkey, gfp);
 			/* free auth if no more chunks, or add it back */
 			if (list_empty(&packet->chunk_list))
 				sctp_chunk_free(packet->auth);
@@ -770,6 +774,16 @@ static enum sctp_xmit sctp_packet_will_fit(struct sctp_packet *packet,
 	enum sctp_xmit retval = SCTP_XMIT_OK;
 	size_t psize, pmtu, maxsize;
 
+	/* Don't bundle in this packet if this chunk's auth key doesn't
+	 * match other chunks already enqueued on this packet. Also,
+	 * don't bundle the chunk with auth key if other chunks in this
+	 * packet don't have auth key.
+	 */
+	if ((packet->auth && chunk->shkey != packet->auth->shkey) ||
+	    (!packet->auth && chunk->shkey &&
+	     chunk->chunk_hdr->type != SCTP_CID_AUTH))
+		return SCTP_XMIT_PMTU_FULL;
+
 	psize = packet->size;
 	if (packet->transport->asoc)
 		pmtu = packet->transport->asoc->pathmtu;
diff --git a/net/sctp/sm_make_chunk.c b/net/sctp/sm_make_chunk.c
index d01475f..10f071c 100644
--- a/net/sctp/sm_make_chunk.c
+++ b/net/sctp/sm_make_chunk.c
@@ -87,7 +87,10 @@ static void  *sctp_addto_chunk_fixed(struct sctp_chunk *, int len,
 /* Control chunk destructor */
 static void sctp_control_release_owner(struct sk_buff *skb)
 {
-	/*TODO: do memory release */
+	struct sctp_chunk *chunk = skb_shinfo(skb)->destructor_arg;
+
+	if (chunk->shkey)
+		sctp_auth_shkey_release(chunk->shkey);
 }
 
 static void sctp_control_set_owner_w(struct sctp_chunk *chunk)
@@ -102,7 +105,12 @@ static void sctp_control_set_owner_w(struct sctp_chunk *chunk)
 	 *
 	 *  For now don't do anything for now.
 	 */
+	if (chunk->auth) {
+		chunk->shkey = asoc->shkey;
+		sctp_auth_shkey_hold(chunk->shkey);
+	}
 	skb->sk = asoc ? asoc->base.sk : NULL;
+	skb_shinfo(skb)->destructor_arg = chunk;
 	skb->destructor = sctp_control_release_owner;
 }
 
@@ -1271,7 +1279,8 @@ struct sctp_chunk *sctp_make_op_error(const struct sctp_association *asoc,
 	return retval;
 }
 
-struct sctp_chunk *sctp_make_auth(const struct sctp_association *asoc)
+struct sctp_chunk *sctp_make_auth(const struct sctp_association *asoc,
+				  __u16 key_id)
 {
 	struct sctp_authhdr auth_hdr;
 	struct sctp_hmac *hmac_desc;
@@ -1289,7 +1298,7 @@ struct sctp_chunk *sctp_make_auth(const struct sctp_association *asoc)
 		return NULL;
 
 	auth_hdr.hmac_id = htons(hmac_desc->hmac_id);
-	auth_hdr.shkey_id = htons(asoc->active_key_id);
+	auth_hdr.shkey_id = htons(key_id);
 
 	retval->subh.auth_hdr = sctp_addto_chunk(retval, sizeof(auth_hdr),
 						 &auth_hdr);
diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c
index eb7905f..792e0e2 100644
--- a/net/sctp/sm_statefuns.c
+++ b/net/sctp/sm_statefuns.c
@@ -4114,6 +4114,7 @@ static enum sctp_ierror sctp_sf_authenticate(
 					const union sctp_subtype type,
 					struct sctp_chunk *chunk)
 {
+	struct sctp_shared_key *sh_key = NULL;
 	struct sctp_authhdr *auth_hdr;
 	__u8 *save_digest, *digest;
 	struct sctp_hmac *hmac;
@@ -4135,9 +4136,11 @@ static enum sctp_ierror sctp_sf_authenticate(
 	 * configured
 	 */
 	key_id = ntohs(auth_hdr->shkey_id);
-	if (key_id != asoc->active_key_id && !sctp_auth_get_shkey(asoc, key_id))
-		return SCTP_IERROR_AUTH_BAD_KEYID;
-
+	if (key_id != asoc->active_key_id) {
+		sh_key = sctp_auth_get_shkey(asoc, key_id);
+		if (!sh_key)
+			return SCTP_IERROR_AUTH_BAD_KEYID;
+	}
 
 	/* Make sure that the length of the signature matches what
 	 * we expect.
@@ -4166,7 +4169,7 @@ static enum sctp_ierror sctp_sf_authenticate(
 
 	sctp_auth_calculate_hmac(asoc, chunk->skb,
 				 (struct sctp_auth_chunk *)chunk->chunk_hdr,
-				 GFP_ATOMIC);
+				 sh_key, GFP_ATOMIC);
 
 	/* Discard the packet if the digests do not match */
 	if (memcmp(save_digest, digest, sig_len)) {
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index af5cf29..003a4ad 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -156,6 +156,9 @@ static inline void sctp_set_owner_w(struct sctp_chunk *chunk)
 	/* The sndbuf space is tracked per association.  */
 	sctp_association_hold(asoc);
 
+	if (chunk->shkey)
+		sctp_auth_shkey_hold(chunk->shkey);
+
 	skb_set_owner_w(chunk->skb, sk);
 
 	chunk->skb->destructor = sctp_wfree;
@@ -8109,6 +8112,9 @@ static void sctp_wfree(struct sk_buff *skb)
 	sk->sk_wmem_queued   -= skb->truesize;
 	sk_mem_uncharge(sk, skb->truesize);
 
+	if (chunk->shkey)
+		sctp_auth_shkey_release(chunk->shkey);
+
 	sock_wfree(skb);
 	sctp_wake_up_waiters(sk, asoc);
 
-- 
2.1.0

^ permalink raw reply related

* [PATCH net-next 0/5] sctp: add support for some sctp auth APIs from RFC6458
From: Xin Long @ 2018-03-14 11:05 UTC (permalink / raw)
  To: network dev, linux-sctp; +Cc: Marcelo Ricardo Leitner, Neil Horman, davem

This patchset mainly adds support for SCTP AUTH Information for sendmsg,
described in RFC6458:

    5.3.8.  SCTP AUTH Information Structure (SCTP_AUTHINFO)

and also adds a sockopt described in RFC6458:

    8.3.4.  Deactivate a Shared Key (SCTP_AUTH_DEACTIVATE_KEY)

and two types of events for AUTHENTICATION_EVENT described in RFC6458:

    6.1.8.  SCTP_AUTHENTICATION_EVENT:
             - SCTP_AUTH_NO_AUTH
             - SCTP_AUTH_FREE_KEY

After this patchset, we have fully support for sctp_sendv in kernel.

Note that this patchset won't touch that sctp options merge conflict.

Xin Long (5):
  sctp: add refcnt support for sh_key
  sctp: add support for SCTP AUTH Information for sendmsg
  sctp: add sockopt SCTP_AUTH_DEACTIVATE_KEY
  sctp: add SCTP_AUTH_FREE_KEY type for AUTHENTICATION_EVENT
  sctp: add SCTP_AUTH_NO_AUTH type for AUTHENTICATION_EVENT

 include/net/sctp/auth.h    |  21 ++++---
 include/net/sctp/command.h |   1 +
 include/net/sctp/sm.h      |   3 +-
 include/net/sctp/structs.h |  10 +++-
 include/uapi/linux/sctp.h  |  22 ++++++-
 net/sctp/auth.c            | 146 +++++++++++++++++++++++++++++++--------------
 net/sctp/chunk.c           |  14 +++++
 net/sctp/output.c          |  18 +++++-
 net/sctp/sm_make_chunk.c   |  33 +++++++++-
 net/sctp/sm_sideeffect.c   |  13 ++++
 net/sctp/sm_statefuns.c    |  56 ++++++++++++++---
 net/sctp/socket.c          |  77 ++++++++++++++++++++++++
 12 files changed, 342 insertions(+), 72 deletions(-)

-- 
2.1.0

^ permalink raw reply

* [PATCH] brcmfmac: drop Inter-Access Point Protocol packets by default
From: Rafał Miłecki @ 2018-03-14 11:01 UTC (permalink / raw)
  To: Kalle Valo
  Cc: Arend van Spriel, Franky Lin, Hante Meuleman, Chi-Hsien Lin,
	Wright Feng, Pieter-Paul Giesberts, James Hughes,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	brcm80211-dev-list.pdl-dY08KVG/lbpWk0Htik3J/w,
	brcm80211-dev-list-+wT8y+m8/X5BDgjK7y7TUQ,
	netdev-u79uwXL29TY76Z2rM5mHXA, Linus Lüssing, Felix Fietkau,
	bridge-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Rafał Miłecki

From: Rafał Miłecki <rafal-g1n6cQUeyibVItvQsEIGlw@public.gmane.org>

Testing brcmfmac with more recent firmwares resulted in AP interfaces
not working in some specific setups. Debugging resulted in discovering
support for IAPP in Broadcom's firmwares. This is an obsoleted standard
and its implementation is something that:
1) Most people don't need / want to use
2) Can allow local DoS attacks
3) Breaks AP interfaces in some specific bridge setups

To solve issues it can cause this commit modifies brcmfmac to drop IAPP
packets. If affects:
1) Rx path: driver won't be sending these unwanted packets up.
2) Tx path: driver will reject packets that would trigger STA
   disassociation perfromed by a firmware (possible local DoS attack).

It appears there are some Broadcom's clients/users who care about this
feature despite the drawbacks. They can switch it on by a newly added
Kconfig option.

Signed-off-by: Rafał Miłecki <rafal-g1n6cQUeyibVItvQsEIGlw@public.gmane.org>
---
 drivers/net/wireless/broadcom/brcm80211/Kconfig    | 20 +++++++++++
 .../wireless/broadcom/brcm80211/brcmfmac/core.c    | 39 ++++++++++++++++++++++
 2 files changed, 59 insertions(+)

diff --git a/drivers/net/wireless/broadcom/brcm80211/Kconfig b/drivers/net/wireless/broadcom/brcm80211/Kconfig
index 9d99eb42d917..876787ef991a 100644
--- a/drivers/net/wireless/broadcom/brcm80211/Kconfig
+++ b/drivers/net/wireless/broadcom/brcm80211/Kconfig
@@ -68,6 +68,26 @@ config BRCMFMAC_PCIE
 	  IEEE802.11ac embedded FullMAC WLAN driver. Say Y if you want to
 	  use the driver for an PCIE wireless card.
 
+config BRCMFMAC_IAPP
+	bool "Partial support for obsoleted Inter-Access Point Protocol"
+	depends on BRCMFMAC
+	---help---
+	  Most of Broadcom's firmwares can send 802.11f ADD frame every
+	  time new STA connects to the AP interface. Some recent ones
+	  can also disassociate STA when they receive such a frame.
+
+	  It's important to understand this behavior can lead to a local
+	  DoS security issue. Attacker may trigger disassociation of any
+	  STA by sending a proper Ethernet frame to the wireless
+	  interface.
+
+	  Moreover this feature may break AP interfaces in some specific
+	  setups. This applies e.g. to the bridge with hairpin mode
+	  enabled and IFLA_BRPORT_MCAST_TO_UCAST set. IAPP packet
+	  generated by a firmware will get passed back to the wireless
+	  interface and cause immediate disassociation of just-connected
+	  STA.
+
 config BRCM_TRACING
 	bool "Broadcom device tracing"
 	depends on BRCMSMAC || BRCMFMAC
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
index 19048526b4af..db6987015fb1 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
@@ -230,6 +230,34 @@ static void brcmf_netdev_set_multicast_list(struct net_device *ndev)
 	schedule_work(&ifp->multicast_work);
 }
 
+/**
+ * brcmf_skb_is_iapp - checks if skb is an IAPP packet
+ *
+ * @skb: skb to check
+ */
+static bool brcmf_skb_is_iapp(struct sk_buff *skb)
+{
+	const u8 iapp_l2_update_packet[6] __aligned(2) = {
+		0x00, 0x01, 0xaf, 0x81, 0x01, 0x00,
+	};
+	unsigned char *eth_data = skb_mac_header(skb) + ETH_HLEN;
+#if !defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)
+	const u16 *a = (const u16 *)eth_data;
+	const u16 *b = (const u16 *)iapp_l2_update_packet;
+#endif
+
+	if (skb->len - skb->mac_len != 6 ||
+	    !is_multicast_ether_addr(eth_hdr(skb)->h_dest))
+		return false;
+
+#if defined(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)
+	return !(((*(const u32 *)eth_data) ^ (*(const u32 *)iapp_l2_update_packet)) |
+		 ((*(const u16 *)(eth_data + 4)) ^ (*(const u16 *)(iapp_l2_update_packet + 4))));
+#else
+	return !((a[0] ^ b[0]) | (a[1] ^ b[1]) | (a[2] ^ b[2]));
+#endif
+}
+
 static netdev_tx_t brcmf_netdev_start_xmit(struct sk_buff *skb,
 					   struct net_device *ndev)
 {
@@ -250,6 +278,12 @@ static netdev_tx_t brcmf_netdev_start_xmit(struct sk_buff *skb,
 		goto done;
 	}
 
+	if (!IS_ENABLED(CONFIG_BRCMFMAC_IAPP) && brcmf_skb_is_iapp(skb)) {
+		dev_kfree_skb(skb);
+		ret = -EINVAL;
+		goto done;
+	}
+
 	/* Make sure there's enough writeable headroom */
 	if (skb_headroom(skb) < drvr->hdrlen || skb_header_cloned(skb)) {
 		head_delta = max_t(int, drvr->hdrlen - skb_headroom(skb), 0);
@@ -325,6 +359,11 @@ void brcmf_txflowblock_if(struct brcmf_if *ifp,
 
 void brcmf_netif_rx(struct brcmf_if *ifp, struct sk_buff *skb)
 {
+	if (!IS_ENABLED(CONFIG_BRCMFMAC_IAPP) && brcmf_skb_is_iapp(skb)) {
+		brcmu_pkt_buf_free_skb(skb);
+		return;
+	}
+
 	if (skb->pkt_type == PACKET_MULTICAST)
 		ifp->ndev->stats.multicast++;
 
-- 
2.11.0

^ permalink raw reply related

* [PATCH] can: enable multi-queue for SocketCAN devices
From: Mark Jonas @ 2018-03-14 10:33 UTC (permalink / raw)
  To: Wolfgang Grandegger, Marc Kleine-Budde
  Cc: linux-can, netdev, linux-kernel, hs, Zhu Yi, Mark Jonas

From: Zhu Yi <yi.zhu5@cn.bosch.com>

The existing SocketCAN implementation provides alloc_candev() to
allocate a CAN device using a single Tx and Rx queue. This can lead to
priority inversion in case the single Tx queue is already full with low
priority messages and a high priority message needs to be sent while the
bus is fully loaded with medium priority messages.

This problem can be solved by using the existing multi-queue support of
the network subsystem. The commit makes it possible to use multi-queue in
the CAN subsystem in the same way it is used in the Ethernet subsystem
by adding an alloc_candev_mqs() call and accompanying macros. With this
support a CAN device can use multi-queue qdisc (e.g. mqprio) to avoid
the aforementioned priority inversion.

The existing functionality of alloc_candev() is the same as before.

CAN devices need to have prioritized multiple hardware queues or are
able to abort waiting for arbitration to make sensible use of
multi-queues.

Signed-off-by: Zhu Yi <yi.zhu5@cn.bosch.com>
Signed-off-by: Mark Jonas <mark.jonas@de.bosch.com>
Reviewed-by: Heiko Schocher <hs@denx.de>
---
 drivers/net/can/dev.c   | 8 +++++---
 include/linux/can/dev.h | 7 ++++++-
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/net/can/dev.c b/drivers/net/can/dev.c
index b177956..636f853 100644
--- a/drivers/net/can/dev.c
+++ b/drivers/net/can/dev.c
@@ -703,7 +703,8 @@ EXPORT_SYMBOL_GPL(alloc_can_err_skb);
 /*
  * Allocate and setup space for the CAN network device
  */
-struct net_device *alloc_candev(int sizeof_priv, unsigned int echo_skb_max)
+struct net_device *alloc_candev_mqs(int sizeof_priv, unsigned int echo_skb_max,
+				    unsigned int txqs, unsigned int rxqs)
 {
 	struct net_device *dev;
 	struct can_priv *priv;
@@ -715,7 +716,8 @@ struct net_device *alloc_candev(int sizeof_priv, unsigned int echo_skb_max)
 	else
 		size = sizeof_priv;
 
-	dev = alloc_netdev(size, "can%d", NET_NAME_UNKNOWN, can_setup);
+	dev = alloc_netdev_mqs(size, "can%d", NET_NAME_UNKNOWN, can_setup,
+			       txqs, rxqs);
 	if (!dev)
 		return NULL;
 
@@ -734,7 +736,7 @@ struct net_device *alloc_candev(int sizeof_priv, unsigned int echo_skb_max)
 
 	return dev;
 }
-EXPORT_SYMBOL_GPL(alloc_candev);
+EXPORT_SYMBOL_GPL(alloc_candev_mqs);
 
 /*
  * Free space of the CAN network device
diff --git a/include/linux/can/dev.h b/include/linux/can/dev.h
index 055aaf5..a83e1f6 100644
--- a/include/linux/can/dev.h
+++ b/include/linux/can/dev.h
@@ -143,7 +143,12 @@ u8 can_dlc2len(u8 can_dlc);
 /* map the sanitized data length to an appropriate data length code */
 u8 can_len2dlc(u8 len);
 
-struct net_device *alloc_candev(int sizeof_priv, unsigned int echo_skb_max);
+struct net_device *alloc_candev_mqs(int sizeof_priv, unsigned int echo_skb_max,
+				    unsigned int txqs, unsigned int rxqs);
+#define alloc_candev(sizeof_priv, echo_skb_max) \
+	alloc_candev_mqs(sizeof_priv, echo_skb_max, 1, 1)
+#define alloc_candev_mq(sizeof_priv, echo_skb_max, count) \
+	alloc_candev_mqs(sizeof_priv, echo_skb_max, count, count)
 void free_candev(struct net_device *dev);
 
 /* a candev safe wrapper around netdev_priv */
-- 
2.7.4

^ permalink raw reply related

* Re: [bug, bisected] pfifo_fast causes packet reordering
From: Jakob Unterwurzacher @ 2018-03-14 10:09 UTC (permalink / raw)
  To: John Fastabend, Dave Taht
  Cc: netdev, linux-kernel, David S. Miller, linux-can@vger.kernel.org,
	Martin Elshuber
In-Reply-To: <95844480-d020-9000-53ef-0da8b965ce6e@gmail.com>

On 14.03.18 05:03, John Fastabend wrote:
>>> During stress-testing our "ucan" USB/CAN adapter SocketCAN driver on Linux
>>> v4.16-rc4-383-ged58d66f60b3 we observed that a small fraction of packets are
>>> delivered out-of-order.
>>>
> 
> Is the stress-testing tool available somewhere? What type of packets
> are being sent?

Not public, no, the problem is that you'd need a CAN adapter as well.

The test is simple, sending CAN frames with an increasing counter and a 
random length payload:

[  tx thread                      rx thread    ]
        |                              ^
        v                              |
[ interface 0 ] ---- cable ----> [ interface 1 ]

I'll see if I can come up with a UDP testcase that works with normal 
ethernet interfaces.

> Is this a single queue device or a multiqueue device? Running
> 'tc -s qdisc show dev foo' would help some.

Here you go:

root@rk3399-q7:~# tc -s qdisc show dev can0
qdisc pfifo_fast 0: root refcnt 2 bands 3 priomap  1 2 2 2 1 2 0 0 1 1 1 
1 1 1 1 1
  Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
  backlog 0b 0p requeues 0

> If we introduced a OOO edge case somewhere that was not
> intended so I'll take a look into it. But, if you can provide
> a bit more details on how stress testing is done to cause the
> issue that would help.

Will do.

Thanks,
Jakob

^ permalink raw reply

* [PATCH net-next 3/3] net/smc: schedule free_work when link group is terminated
From: Ursula Braun @ 2018-03-14 10:01 UTC (permalink / raw)
  To: davem
  Cc: netdev, linux-s390, linux-rdma, schwidefsky, heiko.carstens,
	raspl, ubraun
In-Reply-To: <20180314100102.40474-1-ubraun@linux.vnet.ibm.com>

From: Karsten Graul <kgraul@linux.vnet.ibm.com>

The free_work worker must be scheduled when the link group is
abnormally terminated.

Signed-off-by: Karsten Graul <kgraul@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
---
 net/smc/smc_core.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
index ec6189fe2a48..f44f6803f7ff 100644
--- a/net/smc/smc_core.c
+++ b/net/smc/smc_core.c
@@ -32,6 +32,17 @@
 
 static u32 smc_lgr_num;			/* unique link group number */
 
+static void smc_lgr_schedule_free_work(struct smc_link_group *lgr)
+{
+	/* client link group creation always follows the server link group
+	 * creation. For client use a somewhat higher removal delay time,
+	 * otherwise there is a risk of out-of-sync link groups.
+	 */
+	mod_delayed_work(system_wq, &lgr->free_work,
+			 lgr->role == SMC_CLNT ? SMC_LGR_FREE_DELAY_CLNT :
+						 SMC_LGR_FREE_DELAY_SERV);
+}
+
 /* Register connection's alert token in our lookup structure.
  * To use rbtrees we have to implement our own insert core.
  * Requires @conns_lock
@@ -111,13 +122,7 @@ static void smc_lgr_unregister_conn(struct smc_connection *conn)
 	write_unlock_bh(&lgr->conns_lock);
 	if (!reduced || lgr->conns_num)
 		return;
-	/* client link group creation always follows the server link group
-	 * creation. For client use a somewhat higher removal delay time,
-	 * otherwise there is a risk of out-of-sync link groups.
-	 */
-	mod_delayed_work(system_wq, &lgr->free_work,
-			 lgr->role == SMC_CLNT ? SMC_LGR_FREE_DELAY_CLNT :
-						 SMC_LGR_FREE_DELAY_SERV);
+	smc_lgr_schedule_free_work(lgr);
 }
 
 static void smc_lgr_free_work(struct work_struct *work)
@@ -344,6 +349,7 @@ void smc_lgr_terminate(struct smc_link_group *lgr)
 	}
 	write_unlock_bh(&lgr->conns_lock);
 	wake_up(&lgr->lnk[SMC_SINGLE_LINK].wr_reg_wait);
+	smc_lgr_schedule_free_work(lgr);
 }
 
 /* Determine vlan of internal TCP socket.
-- 
2.13.5

^ permalink raw reply related

* [PATCH net-next 2/3] net/smc: free link group without pending free_work only
From: Ursula Braun @ 2018-03-14 10:01 UTC (permalink / raw)
  To: davem
  Cc: netdev, linux-s390, linux-rdma, schwidefsky, heiko.carstens,
	raspl, ubraun
In-Reply-To: <20180314100102.40474-1-ubraun@linux.vnet.ibm.com>

Make sure there is no pending or running free_work worker for the link
group when freeing the link group.

Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
---
 net/smc/af_smc.c   | 1 +
 net/smc/smc_core.c | 3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 2c6f4e0a9f3d..649489f825a5 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -1477,6 +1477,7 @@ static void __exit smc_exit(void)
 	spin_unlock_bh(&smc_lgr_list.lock);
 	list_for_each_entry_safe(lgr, lg, &lgr_freeing_list, list) {
 		list_del_init(&lgr->list);
+		cancel_delayed_work_sync(&lgr->free_work);
 		smc_lgr_free(lgr); /* free link group */
 	}
 	static_branch_disable(&tcp_have_smc);
diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
index f76f60e463cb..ec6189fe2a48 100644
--- a/net/smc/smc_core.c
+++ b/net/smc/smc_core.c
@@ -140,7 +140,8 @@ static void smc_lgr_free_work(struct work_struct *work)
 	list_del_init(&lgr->list); /* remove from smc_lgr_list */
 free:
 	spin_unlock_bh(&smc_lgr_list.lock);
-	smc_lgr_free(lgr);
+	if (!delayed_work_pending(&lgr->free_work))
+		smc_lgr_free(lgr);
 }
 
 /* create a new SMC link group */
-- 
2.13.5

^ permalink raw reply related

* [PATCH net-next 1/3] net/smc: pay attention to MAX_ORDER for CQ entries
From: Ursula Braun @ 2018-03-14 10:01 UTC (permalink / raw)
  To: davem
  Cc: netdev, linux-s390, linux-rdma, schwidefsky, heiko.carstens,
	raspl, ubraun
In-Reply-To: <20180314100102.40474-1-ubraun@linux.vnet.ibm.com>

smc allocates a certain number of CQ entries for used RoCE devices. For
mlx5 devices the chosen constant number results in a large allocation
causing this warning:

[13355.124656] WARNING: CPU: 3 PID: 16535 at mm/page_alloc.c:3883 __alloc_pages_nodemask+0x2be/0x10c0
[13355.124657] Modules linked in: smc_diag(O) smc(O) xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp bridge stp llc ip6table_filter ip6_tables iptable_filter mlx5_ib ib_core sunrpc mlx5_core s390_trng rng_core ghash_s390 prng aes_s390 des_s390 des_generic sha512_s390 sha256_s390 sha1_s390 sha_common ptp pps_core eadm_sch dm_multipath dm_mod vhost_net tun vhost tap sch_fq_codel kvm ip_tables x_tables autofs4 [last unloaded: smc]
[13355.124672] CPU: 3 PID: 16535 Comm: kworker/3:0 Tainted: G           O    4.14.0uschi #1
[13355.124673] Hardware name: IBM 3906 M04 704 (LPAR)
[13355.124675] Workqueue: events smc_listen_work [smc]
[13355.124677] task: 00000000e2f22100 task.stack: 0000000084720000
[13355.124678] Krnl PSW : 0704c00180000000 000000000029da76 (__alloc_pages_nodemask+0x2be/0x10c0)
[13355.124681]            R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
[13355.124682] Krnl GPRS: 0000000000000000 00550e00014080c0 0000000000000000 0000000000000001
[13355.124684]            000000000029d8b6 00000000f3bfd710 0000000000000000 00000000014080c0
[13355.124685]            0000000000000009 00000000ec277a00 0000000000200000 0000000000000000
[13355.124686]            0000000000000000 00000000000001ff 000000000029d8b6 0000000084723720
[13355.124708] Krnl Code: 000000000029da6a: a7110200		tmll	%r1,512
                          000000000029da6e: a774ff29		brc	7,29d8c0
                         #000000000029da72: a7f40001		brc	15,29da74
                         >000000000029da76: a7f4ff25		brc	15,29d8c0
                          000000000029da7a: a7380000		lhi	%r3,0
                          000000000029da7e: a7f4fef1		brc	15,29d860
                          000000000029da82: 5820f0c4		l	%r2,196(%r15)
                          000000000029da86: a53e0048		llilh	%r3,72
[13355.124720] Call Trace:
[13355.124722] ([<000000000029d8b6>] __alloc_pages_nodemask+0xfe/0x10c0)
[13355.124724]  [<000000000013bd1e>] s390_dma_alloc+0x6e/0x148
[13355.124733]  [<000003ff802eeba6>] mlx5_dma_zalloc_coherent_node+0x8e/0xe0 [mlx5_core]
[13355.124740]  [<000003ff802eee18>] mlx5_buf_alloc_node+0x70/0x108 [mlx5_core]
[13355.124744]  [<000003ff804eb410>] mlx5_ib_create_cq+0x558/0x898 [mlx5_ib]
[13355.124749]  [<000003ff80407d40>] ib_create_cq+0x48/0x88 [ib_core]
[13355.124751]  [<000003ff80109fba>] smc_ib_setup_per_ibdev+0x52/0x118 [smc]
[13355.124753]  [<000003ff8010bcb6>] smc_conn_create+0x65e/0x728 [smc]
[13355.124755]  [<000003ff801081a2>] smc_listen_work+0x2d2/0x540 [smc]
[13355.124756]  [<0000000000162c66>] process_one_work+0x1be/0x440
[13355.124758]  [<0000000000162f40>] worker_thread+0x58/0x458
[13355.124759]  [<0000000000169e7e>] kthread+0x14e/0x168
[13355.124760]  [<00000000009ce8be>] kernel_thread_starter+0x6/0xc
[13355.124762]  [<00000000009ce8b8>] kernel_thread_starter+0x0/0xc
[13355.124762] Last Breaking-Event-Address:
[13355.124764]  [<000000000029da72>] __alloc_pages_nodemask+0x2ba/0x10c0
[13355.124764] ---[ end trace 34be38b581c0b585 ]---

This patch reduces the smc constant for the maximum number of allocated
completion queue entries SMC_MAX_CQE by 2 to avoid high round up values
in the mlx5 code, and reduces the number of allocated completion queue
entries even more, if the final allocation for an mlx5 device hits the
MAX_ORDER limit.

Reported-by: Ihnken Menssen <menssen@de.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
---
 net/smc/smc_ib.c | 10 +++++++++-
 net/smc/smc_wr.h |  1 -
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/net/smc/smc_ib.c b/net/smc/smc_ib.c
index 2a8957bd6d38..26df554f7588 100644
--- a/net/smc/smc_ib.c
+++ b/net/smc/smc_ib.c
@@ -23,6 +23,8 @@
 #include "smc_wr.h"
 #include "smc.h"
 
+#define SMC_MAX_CQE 32766	/* max. # of completion queue elements */
+
 #define SMC_QP_MIN_RNR_TIMER		5
 #define SMC_QP_TIMEOUT			15 /* 4096 * 2 ** timeout usec */
 #define SMC_QP_RETRY_CNT			7 /* 7: infinite */
@@ -438,9 +440,15 @@ int smc_ib_remember_port_attr(struct smc_ib_device *smcibdev, u8 ibport)
 long smc_ib_setup_per_ibdev(struct smc_ib_device *smcibdev)
 {
 	struct ib_cq_init_attr cqattr =	{
-		.cqe = SMC_WR_MAX_CQE, .comp_vector = 0 };
+		.cqe = SMC_MAX_CQE, .comp_vector = 0 };
+	int cqe_size_order, smc_order;
 	long rc;
 
+	/* the calculated number of cq entries fits to mlx5 cq allocation */
+	cqe_size_order = cache_line_size() == 128 ? 7 : 6;
+	smc_order = MAX_ORDER - cqe_size_order - 1;
+	if (SMC_MAX_CQE + 2 > (0x00000001 << smc_order) * PAGE_SIZE)
+		cqattr.cqe = (0x00000001 << smc_order) * PAGE_SIZE - 2;
 	smcibdev->roce_cq_send = ib_create_cq(smcibdev->ibdev,
 					      smc_wr_tx_cq_handler, NULL,
 					      smcibdev, &cqattr);
diff --git a/net/smc/smc_wr.h b/net/smc/smc_wr.h
index ef0c3494c9cb..210bec3c3ebe 100644
--- a/net/smc/smc_wr.h
+++ b/net/smc/smc_wr.h
@@ -19,7 +19,6 @@
 #include "smc.h"
 #include "smc_core.h"
 
-#define SMC_WR_MAX_CQE 32768	/* max. # of completion queue elements */
 #define SMC_WR_BUF_CNT 16	/* # of ctrl buffers per link */
 
 #define SMC_WR_TX_WAIT_FREE_SLOT_TIME	(10 * HZ)
-- 
2.13.5

^ permalink raw reply related

* [PATCH net-next 0/3] net/smc: fixes 2018-03-14
From: Ursula Braun @ 2018-03-14 10:00 UTC (permalink / raw)
  To: davem
  Cc: netdev, linux-s390, linux-rdma, schwidefsky, heiko.carstens,
	raspl, ubraun

here are smc changes for the net-next tree.
The first patch enables SMC to work with mlx5-RoCE-devices.
Patches 2 and 3 deal with link group freeing.

Thanks, Ursula

Karsten Graul (1):
  net/smc: schedule free_work when link group is terminated

Ursula Braun (2):
  net/smc: pay attention to MAX_ORDER for CQ entries
  net/smc: free link group without pending free_work only

 net/smc/af_smc.c   |  1 +
 net/smc/smc_core.c | 23 +++++++++++++++--------
 net/smc/smc_ib.c   | 10 +++++++++-
 net/smc/smc_wr.h   |  1 -
 4 files changed, 25 insertions(+), 10 deletions(-)

-- 
2.13.5

^ permalink raw reply

* Re: [RFC net-next 2/6] driver: net: bonding: allow registration of tc offload callbacks in bond
From: Jiri Pirko @ 2018-03-14  9:50 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Jiri Pirko, Rabie Loulou, John Hurley, Jakub Kicinski,
	Simon Horman, Linux Netdev List, ASAP_Direct_Dev, mlxsw
In-Reply-To: <CAJ3xEMicamC40mt6tkXyob8yHwmUT_T2SRoig7iTPu1ud8Arzw@mail.gmail.com>

Tue, Mar 13, 2018 at 04:51:02PM CET, gerlitz.or@gmail.com wrote:
>On Wed, Mar 7, 2018 at 12:57 PM, Jiri Pirko <jiri@resnulli.us> wrote:
>> Mon, Mar 05, 2018 at 02:28:30PM CET, john.hurley@netronome.com wrote:
>>>Allow drivers to register netdev callbacks for tc offload in linux bonds.
>>>If a netdev has registered and is a slave of a given bond, then any tc
>>>rules offloaded to the bond will be relayed to it if both the bond and the
>>>slave permit hw offload.
>
>>>Because the bond itself is not offloaded, just the rules, we don't care
>>>about whether the bond ports are on the same device or whether some of
>>>slaves are representor ports and some are not.
>
>John, I think we must design here for the case where the bond IS offloaded.
>E.g some sort of HW LAG. For example, the mlxsw driver supports
>LAG offload and support tcflower offload, we need to see how these
>two live together, mlx5 supports tcflower offload and we are working on
>bond offload, etc.
>
>>>+EXPORT_SYMBOL_GPL(tc_setup_cb_bond_register);
>>
>> Please, no "bond" specific calls from drivers. That would be wrong.
>> The idea behing block callbacks was that anyone who is interested could
>> register to receive those. In this case, slave device is interested.
>> So it should register to receive block callbacks in the same way as if
>> the block was directly on top of the slave device. The only thing you
>> need to handle is to propagate block bind/unbind from master down to the
>> slaves.
>
>Jiri,
>
>This sounds nice for the case where one install ingress tc rules on
>the bond (lets
>call them type 1, see next)
>
>One obstacle pointed by my colleague, Rabie, is that when the upper layer
>issues stat call on the filter, they will get two replies, this can confuse them
>and lead to wrong decisions (aging). I wonder if/how we can set a knob

The bonding itself would not do anything on stats update
command (TC_CLSFLOWER_STATS for example). Only the slaves would do
update. So there will be only reply from slaves.

Bond/team is just going to probagare block bind/unbind down. Nothing
else.



>somewhere that unifies the stats (add packet/bytes, use the latest lastuse).
>
>Also, lets see what other rules have to be offloaded in that scheme
>(call them type 2/3/4)
>where one bonded two HW ports
>
>2. bond being egress port of a rule
>
>TC rules for overlay networks scheme, e.g in NIC SRIOV
>scheme where one bonds the two uplink representors
>
>Starting with type 2, in our current NIC HW APIs we have to duplicate
>these rules
>into two rules set to HW:
>
>2.1 VF rep --> uplink 0
>2.2 VF rep --> uplink 1
>
>and we do that in the driver (add/del two HW rules, combine the stat
>results, etc)

That is up to the driver. If the driver can share block between 2
devices, he can do that. If he cannot share, it will just report stats
for every device separatelly (2 block cbs registered) and tc will see
them both together. No need to do anything in driver.


>
>3. ingress rule on VF rep port with shared tunnel device being the
>egress (encap)
>and where the routing of the underlay (tunnel) goes through LAG.
>
>in our case, this is like 2.1/2.2 above, offload two rules, combine stats
>

Same as "2."


>4. ingress rule shared tunnel device being the ingress and VF rep port
>being the egress (decap)

I don't follow :(

>
>this uses the egdev facility to be offloaded into the our driver, and
>then in the driver
>we will treat it like type 1, two rules need to be installed into HW,
>but now, we can't delegate them
>from the vxlan device b/c it has no direct connection with the bond.

I see another thing we need to sanitize: vxlan rule ingress match action
mirred redirect to lag


>
>All to all, for the mlx5 use case, seems we have elegant solution only
>for type 1.
>
>I think we should do the elegant solution for the case where it applicable.
>
>In parallel if/when newer HW APIs are there such that type 2 and 3 can be set
>using one HW rule whose dest is the bond, we are good. As for type 4,
>need to see
>if/how it can be nicer.
>
>Or.

^ permalink raw reply

* Re: [PATCH] hv_netvsc: Make sure out channel is fully opened on send
From: Mohammed Gamal @ 2018-03-14  9:22 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: otubo, sthemmin, netdev, linux-kernel, devel, vkuznets, davem
In-Reply-To: <20180313123521.5b486da1@xeon-e3>

On Tue, 2018-03-13 at 12:35 -0700, Stephen Hemminger wrote:
> On Tue, 13 Mar 2018 20:06:50 +0100
> Mohammed Gamal <mgamal@redhat.com> wrote:
> 
> > Dring high network traffic changes to network interface parameters
> > such as number of channels or MTU can cause a kernel panic with a
> > NULL
> > pointer dereference. This is due to netvsc_device_remove() being
> > called and deallocating the channel ring buffers, which can then be
> > accessed by netvsc_send_pkt() before they're allocated on calling
> > netvsc_device_add()
> > 
> > The patch fixes this problem by checking the channel state and
> > returning
> > ENODEV if not yet opened. We also move the call to
> > hv_ringbuf_avail_percent()
> > which may access the uninitialized ring buffer.
> > 
> > Signed-off-by: Mohammed Gamal <mgamal@redhat.com>
> > ---
> >  drivers/net/hyperv/netvsc.c | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/net/hyperv/netvsc.c
> > b/drivers/net/hyperv/netvsc.c
> > index 0265d70..44a8358 100644
> > --- a/drivers/net/hyperv/netvsc.c
> > +++ b/drivers/net/hyperv/netvsc.c
> > @@ -757,7 +757,7 @@ static inline int netvsc_send_pkt(
> >  	struct netdev_queue *txq = netdev_get_tx_queue(ndev,
> > packet->q_idx);
> >  	u64 req_id;
> >  	int ret;
> > -	u32 ring_avail = hv_ringbuf_avail_percent(&out_channel-
> > >outbound);
> > +	u32 ring_avail;
> >  
> >  	nvmsg.hdr.msg_type = NVSP_MSG1_TYPE_SEND_RNDIS_PKT;
> >  	if (skb)
> > @@ -773,7 +773,7 @@ static inline int netvsc_send_pkt(
> >  
> >  	req_id = (ulong)skb;
> >  
> > -	if (out_channel->rescind)
> > +	if (out_channel->rescind || out_channel->state !=
> > CHANNEL_OPENED_STATE)
> >  		return -ENODEV;
> >  
> >  	if (packet->page_buf_cnt) {
> > @@ -791,6 +791,7 @@ static inline int netvsc_send_pkt(
> >  				       VMBUS_DATA_PACKET_FLAG_COMP
> > LETION_REQUESTED);
> >  	}
> >  
> > +	ring_avail = hv_ringbuf_avail_percent(&out_channel-
> > >outbound);
> >  	if (ret == 0) {
> >  		atomic_inc_return(&nvchan->queue_sends);
> >  
> 
> Thanks for your patch. Yes there are races with the current update
> logic. The root cause goes higher up in the flow; the send queues
> should
> be stopped before netvsc_device_remove is called. Solving it where
> you tried
> to is racy and not going to work reliably.
> 
> Network patches should go to netdev@vger.kernel.org
> 
> You can't move the ring_avail check until after the vmbus_sendpacket
> because
> that will break the flow control logic.
> 
Why? I don't see ring_avail being used before that point.

> Instead, you should just move the avail_read check until just after
> the existing rescind
> check.
> 
> Also, you shouldn't need to check for OPENED_STATE, just rescind is
> enough.

That rarely mitigated the race. channel->rescind flag is set on vmbus
exit - called on module unload - and when a rescind offer is received
from the host, which AFAICT doesn't happen on every call to
netvsc_device_remove, so it's quite possible that the ringbuffer is
accessed before it's allocated again on channel open and hence the
check for OPENED_STAT - which is only set after all vmbus data is
initialized.

> 
> 
> 
> 
_______________________________________________
devel mailing list
devel@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

^ permalink raw reply

* [PATCH net v2] ipv4: lock mtu in fnhe when received PMTU < net.ipv4.route.min_pmtu
From: Sabrina Dubroca @ 2018-03-14  9:21 UTC (permalink / raw)
  To: netdev; +Cc: sbrivio, Sabrina Dubroca

Prior to the rework of PMTU information storage in commit
2c8cec5c10bc ("ipv4: Cache learned PMTU information in inetpeer."),
when a PMTU event advertising a PMTU smaller than
net.ipv4.route.min_pmtu was received, we would disable setting the DF
flag on packets by locking the MTU metric, and set the PMTU to
net.ipv4.route.min_pmtu.

Since then, we don't disable DF, and set PMTU to
net.ipv4.route.min_pmtu, so the intermediate router that has this link
with a small MTU will have to drop the packets.

This patch reestablishes pre-2.6.39 behavior by splitting
rtable->rt_pmtu into a bitfield with rt_mtu_locked and rt_pmtu.
rt_mtu_locked indicates that we shouldn't set the DF bit on that path,
and is checked in ip_dont_fragment().

One possible workaround is to set net.ipv4.route.min_pmtu to a value low
enough to accommodate the lowest MTU encountered.

Fixes: 2c8cec5c10bc ("ipv4: Cache learned PMTU information in inetpeer.")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
---
v2: make rt_pmtu a bitfield
    fix missing initializations of rt_mtu_locked

 include/net/ip.h        | 11 +++++++++--
 include/net/ip_fib.h    |  1 +
 include/net/route.h     |  3 ++-
 net/ipv4/route.c        | 26 +++++++++++++++++++-------
 net/ipv4/xfrm4_policy.c |  1 +
 5 files changed, 32 insertions(+), 10 deletions(-)

diff --git a/include/net/ip.h b/include/net/ip.h
index 746abff9ce51..f49b3a576bec 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -328,6 +328,13 @@ int ip_decrease_ttl(struct iphdr *iph)
 	return --iph->ttl;
 }
 
+static inline int ip_mtu_locked(const struct dst_entry *dst)
+{
+	const struct rtable *rt = (const struct rtable *)dst;
+
+	return rt->rt_mtu_locked || dst_metric_locked(dst, RTAX_MTU);
+}
+
 static inline
 int ip_dont_fragment(const struct sock *sk, const struct dst_entry *dst)
 {
@@ -335,7 +342,7 @@ int ip_dont_fragment(const struct sock *sk, const struct dst_entry *dst)
 
 	return  pmtudisc == IP_PMTUDISC_DO ||
 		(pmtudisc == IP_PMTUDISC_WANT &&
-		 !(dst_metric_locked(dst, RTAX_MTU)));
+		 !ip_mtu_locked(dst));
 }
 
 static inline bool ip_sk_accept_pmtu(const struct sock *sk)
@@ -361,7 +368,7 @@ static inline unsigned int ip_dst_mtu_maybe_forward(const struct dst_entry *dst,
 	struct net *net = dev_net(dst->dev);
 
 	if (net->ipv4.sysctl_ip_fwd_use_pmtu ||
-	    dst_metric_locked(dst, RTAX_MTU) ||
+	    ip_mtu_locked(dst) ||
 	    !forwarding)
 		return dst_mtu(dst);
 
diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
index f80524396c06..77d0a78cf7d2 100644
--- a/include/net/ip_fib.h
+++ b/include/net/ip_fib.h
@@ -59,6 +59,7 @@ struct fib_nh_exception {
 	int				fnhe_genid;
 	__be32				fnhe_daddr;
 	u32				fnhe_pmtu;
+	bool				fnhe_mtu_locked;
 	__be32				fnhe_gw;
 	unsigned long			fnhe_expires;
 	struct rtable __rcu		*fnhe_rth_input;
diff --git a/include/net/route.h b/include/net/route.h
index 1eb9ce470e25..3d21e3686838 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -63,7 +63,8 @@ struct rtable {
 	__be32			rt_gateway;
 
 	/* Miscellaneous cached information */
-	u32			rt_pmtu;
+	u32			rt_mtu_locked:1,
+				rt_pmtu:31;
 
 	u32			rt_table_id;
 
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 860b3fd2f54b..dc6909448f41 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -634,6 +634,7 @@ static inline u32 fnhe_hashfun(__be32 daddr)
 static void fill_route_from_fnhe(struct rtable *rt, struct fib_nh_exception *fnhe)
 {
 	rt->rt_pmtu = fnhe->fnhe_pmtu;
+	rt->rt_mtu_locked = fnhe->fnhe_mtu_locked;
 	rt->dst.expires = fnhe->fnhe_expires;
 
 	if (fnhe->fnhe_gw) {
@@ -644,7 +645,7 @@ static void fill_route_from_fnhe(struct rtable *rt, struct fib_nh_exception *fnh
 }
 
 static void update_or_create_fnhe(struct fib_nh *nh, __be32 daddr, __be32 gw,
-				  u32 pmtu, unsigned long expires)
+				  u32 pmtu, bool lock, unsigned long expires)
 {
 	struct fnhe_hash_bucket *hash;
 	struct fib_nh_exception *fnhe;
@@ -681,8 +682,10 @@ static void update_or_create_fnhe(struct fib_nh *nh, __be32 daddr, __be32 gw,
 			fnhe->fnhe_genid = genid;
 		if (gw)
 			fnhe->fnhe_gw = gw;
-		if (pmtu)
+		if (pmtu) {
 			fnhe->fnhe_pmtu = pmtu;
+			fnhe->fnhe_mtu_locked = lock;
+		}
 		fnhe->fnhe_expires = max(1UL, expires);
 		/* Update all cached dsts too */
 		rt = rcu_dereference(fnhe->fnhe_rth_input);
@@ -706,6 +709,7 @@ static void update_or_create_fnhe(struct fib_nh *nh, __be32 daddr, __be32 gw,
 		fnhe->fnhe_daddr = daddr;
 		fnhe->fnhe_gw = gw;
 		fnhe->fnhe_pmtu = pmtu;
+		fnhe->fnhe_mtu_locked = lock;
 		fnhe->fnhe_expires = expires;
 
 		/* Exception created; mark the cached routes for the nexthop
@@ -787,7 +791,8 @@ static void __ip_do_redirect(struct rtable *rt, struct sk_buff *skb, struct flow
 				struct fib_nh *nh = &FIB_RES_NH(res);
 
 				update_or_create_fnhe(nh, fl4->daddr, new_gw,
-						0, jiffies + ip_rt_gc_timeout);
+						0, false,
+						jiffies + ip_rt_gc_timeout);
 			}
 			if (kill_route)
 				rt->dst.obsolete = DST_OBSOLETE_KILL;
@@ -1009,15 +1014,18 @@ static void __ip_rt_update_pmtu(struct rtable *rt, struct flowi4 *fl4, u32 mtu)
 {
 	struct dst_entry *dst = &rt->dst;
 	struct fib_result res;
+	bool lock = false;
 
-	if (dst_metric_locked(dst, RTAX_MTU))
+	if (ip_mtu_locked(dst))
 		return;
 
 	if (ipv4_mtu(dst) < mtu)
 		return;
 
-	if (mtu < ip_rt_min_pmtu)
+	if (mtu < ip_rt_min_pmtu) {
+		lock = true;
 		mtu = ip_rt_min_pmtu;
+	}
 
 	if (rt->rt_pmtu == mtu &&
 	    time_before(jiffies, dst->expires - ip_rt_mtu_expires / 2))
@@ -1027,7 +1035,7 @@ static void __ip_rt_update_pmtu(struct rtable *rt, struct flowi4 *fl4, u32 mtu)
 	if (fib_lookup(dev_net(dst->dev), fl4, &res, 0) == 0) {
 		struct fib_nh *nh = &FIB_RES_NH(res);
 
-		update_or_create_fnhe(nh, fl4->daddr, 0, mtu,
+		update_or_create_fnhe(nh, fl4->daddr, 0, mtu, lock,
 				      jiffies + ip_rt_mtu_expires);
 	}
 	rcu_read_unlock();
@@ -1280,7 +1288,7 @@ static unsigned int ipv4_mtu(const struct dst_entry *dst)
 
 	mtu = READ_ONCE(dst->dev->mtu);
 
-	if (unlikely(dst_metric_locked(dst, RTAX_MTU))) {
+	if (unlikely(ip_mtu_locked(dst))) {
 		if (rt->rt_uses_gateway && mtu > 576)
 			mtu = 576;
 	}
@@ -1516,6 +1524,7 @@ struct rtable *rt_dst_alloc(struct net_device *dev,
 		rt->rt_is_input = 0;
 		rt->rt_iif = 0;
 		rt->rt_pmtu = 0;
+		rt->rt_mtu_locked = 0;
 		rt->rt_gateway = 0;
 		rt->rt_uses_gateway = 0;
 		rt->rt_table_id = 0;
@@ -2541,6 +2550,7 @@ struct dst_entry *ipv4_blackhole_route(struct net *net, struct dst_entry *dst_or
 		rt->rt_is_input = ort->rt_is_input;
 		rt->rt_iif = ort->rt_iif;
 		rt->rt_pmtu = ort->rt_pmtu;
+		rt->rt_mtu_locked = ort->rt_mtu_locked;
 
 		rt->rt_genid = rt_genid_ipv4(net);
 		rt->rt_flags = ort->rt_flags;
@@ -2643,6 +2653,8 @@ static int rt_fill_info(struct net *net,  __be32 dst, __be32 src, u32 table_id,
 	memcpy(metrics, dst_metrics_ptr(&rt->dst), sizeof(metrics));
 	if (rt->rt_pmtu && expires)
 		metrics[RTAX_MTU - 1] = rt->rt_pmtu;
+	if (rt->rt_mtu_locked && expires)
+		metrics[RTAX_LOCK - 1] |= BIT(RTAX_MTU);
 	if (rtnetlink_put_metrics(skb, metrics) < 0)
 		goto nla_put_failure;
 
diff --git a/net/ipv4/xfrm4_policy.c b/net/ipv4/xfrm4_policy.c
index 05017e2c849c..4b586e7d5637 100644
--- a/net/ipv4/xfrm4_policy.c
+++ b/net/ipv4/xfrm4_policy.c
@@ -100,6 +100,7 @@ static int xfrm4_fill_dst(struct xfrm_dst *xdst, struct net_device *dev,
 	xdst->u.rt.rt_gateway = rt->rt_gateway;
 	xdst->u.rt.rt_uses_gateway = rt->rt_uses_gateway;
 	xdst->u.rt.rt_pmtu = rt->rt_pmtu;
+	xdst->u.rt.rt_mtu_locked = rt->rt_mtu_locked;
 	xdst->u.rt.rt_table_id = rt->rt_table_id;
 	INIT_LIST_HEAD(&xdst->u.rt.rt_uncached);
 
-- 
2.16.2

^ permalink raw reply related

* Re: [RESEND] rsi: Remove stack VLA usage
From: Kalle Valo @ 2018-03-14  9:19 UTC (permalink / raw)
  To: Tobin C. Harding
  Cc: Kees Cook, Andy Shevchenko, Kernel Hardening,
	Linux Kernel Mailing List, netdev, open list:TI WILINK WIRELES...,
	Tycho Andersen, Konstantin Ryabitsev
In-Reply-To: <20180314034303.GB11666@eros>

"Tobin C. Harding" <me@tobin.cc> writes:

> Added Konstantin in case he is in charge of administering patchwork.kernel.org?
>
> On Tue, Mar 13, 2018 at 07:53:34PM -0700, Kees Cook wrote:
>> On Tue, Mar 13, 2018 at 7:11 PM, Tobin C. Harding <me@tobin.cc> wrote:
>> > On Tue, Mar 13, 2018 at 11:00:47PM +0200, Andy Shevchenko wrote:
>> >> On Tue, Mar 13, 2018 at 10:17 PM, tcharding <me@tobin.cc> wrote:
>> >> > On Mon, Mar 12, 2018 at 09:46:06AM +0000, Kalle Valo wrote:
>> >> >> tcharding <me@tobin.cc> wrote:
>> >>
>> >> I'm pretty much sure it depends on the original email headers, like
>> >> above ^^^ — no name.
>> >> Perhaps git config on your side should be done.
>> >
>> > Thanks for the suggestion Andy but the 'tcharding' as the name was
>> > munged by either Kalle or patchwork.  I'm guessing patchwork.
>> 
>> Something you're sending from is using "tcharding" (see the email Andy
>> quotes). I see the headers as:
>> 
>> Date: Wed, 14 Mar 2018 07:17:57 +1100
>> From: tcharding <me@tobin.cc>
>> ...
>> Message-ID: <20180313201757.GK8631@eros>
>> X-Mailer: Mutt 1.5.24 (2015-08-30)
>> User-Agent: Mutt/1.5.24 (2015-08-30)
>> 
>> Your most recently email shows "Tobin C. Harding" though, and also
>> sent with Mutt...
>>
>> Do you have multiple Mutt configurations? Is something lacking a
>> "From" header insertion and your MTA is filling it in for you from
>> your username?
>
> Thanks for taking the time to respond Kees (and Tycho).  I have mutt
> configured to reply from whichever email address I receive from so if
> patchwork sent an email to 'tcharding <me@tobin.cc>' (which is the
> details it has) and I hit reply it would have come from 'tcharding',
> hence Andy's reply.  I wouldn't bet my life on it but I'm kinda
> confident that I cannot initiate an email from 'tcharding' with my
> current set up.
>
> Super bad form to blame someone (or something else) but I think this is
> a problem with how my patchwork account is configured.  Either way, that
> is still my fault I should have added my real name to patchwork when I
> signed up (not just username 'tcharding').
>
> Is patchwork.kernel.org administered by Konstantin Ryabitsev? Added
> Konstantin to CC's.

Like I said earlier, just send a request to helpdesk@kernel.org and
admins should fix your name.

-- 
Kalle Valo

^ permalink raw reply

* Re: [RESEND] rsi: Remove stack VLA usage
From: Kalle Valo @ 2018-03-14  9:11 UTC (permalink / raw)
  To: Tobin C. Harding
  Cc: Andy Shevchenko, kernel-hardening, Linux Kernel Mailing List,
	netdev, linux-wireless, Tycho Andersen, Kees Cook
In-Reply-To: <20180314021151.GA11666@eros>

"Tobin C. Harding" <me@tobin.cc> writes:

> On Tue, Mar 13, 2018 at 11:00:47PM +0200, Andy Shevchenko wrote:
>> On Tue, Mar 13, 2018 at 10:17 PM, tcharding <me@tobin.cc> wrote:
>> > On Mon, Mar 12, 2018 at 09:46:06AM +0000, Kalle Valo wrote:
>> >> tcharding <me@tobin.cc> wrote:
>> 
>> I'm pretty much sure it depends on the original email headers, like
>> above ^^^ — no name.
>> Perhaps git config on your side should be done.
>
> Thanks for the suggestion Andy but the 'tcharding' as the name was
> munged by either Kalle or patchwork.  I'm guessing patchwork.

You guessed corretly, patchwork is here to blame. I sent my "please
rebase" mail earlier in this thread using my custom patchwork client
script (pwcli) which takes the name and address from patchwork.

Andy, this is definitely a bug in patchwork and I have seen this issue
multiple times already. I have understood that it has been fixed in a
recent version but patchwork.kernel.org is still running an old version.
In the original mail Tobin did have the correct From header which can be
checked from the headers in patch page[1]:

From: "Tobin C. Harding" <me@tobin.cc>

[1] https://patchwork.kernel.org/patch/10274983/

-- 
Kalle Valo

^ permalink raw reply

* Re: [RESEND] rsi: Remove stack VLA usage
From: Kalle Valo @ 2018-03-14  9:05 UTC (permalink / raw)
  To: tcharding
  Cc: kernel-hardening, linux-kernel, netdev, linux-wireless,
	Tycho Andersen, Kees Cook
In-Reply-To: <20180313201757.GK8631@eros>

tcharding <me@tobin.cc> writes:

> On Mon, Mar 12, 2018 at 09:46:06AM +0000, Kalle Valo wrote:
>> tcharding <me@tobin.cc> wrote:
>> 
>> > The kernel would like to have all stack VLA usage removed[1].  rsi uses
>> > a VLA based on 'blksize'.  Elsewhere in the SDIO code maximum block size
>> > is defined using a magic number.  We can use a pre-processor defined
>> > constant and declare the array to maximum size.  We add a check before
>> > accessing the array in case of programmer error.
>> > 
>> > [1]: https://lkml.org/lkml/2018/3/7/621
>> > 
>> > Signed-off-by: Tobin C. Harding <me@tobin.cc>
>> 
>> Tobin, your name in patchwork.kernel.org is just "tcharding" then it should be
>> "Tobin C. Harding". Patchwork is braindead in a way as it takes the name from
>> it's database instead of the From header of the patch in question.
>> 
>> I can fix that manually but it would be helpful if you could register to
>> patchwork and fix your name during registration. You have only one chance to
>> fix your name (another braindead feature!) so be careful :)
>
> Hi Kalle,
>
> I logged into my patchwork account but I don't see any way to set the
> name.  Within 'profile' there is only 'change password' and 'link
> email'.  I thought I could unregister then re-register but I can't see
> how to do that either.

Ok, maybe you have registered (=logged on for the first time) already
earlier so it's not possible to change the name anymore.

> Is there a maintainer of patchwork.kernel.org who I can email to
> manually remove me from the system?

helpdesk@kernel.org should be able to fix your name in patchwork, at
least they have done it in the past. This is not the first time this has
happened.

-- 
Kalle Valo

^ permalink raw reply

* Re: [PATCH iproute2] Revert "iproute: "list/flush/save default" selected all of the routes"
From: Alexander Zubkov @ 2018-03-14  8:59 UTC (permalink / raw)
  To: Serhey Popovych, Luca Boccassi, Stephen Hemminger; +Cc: netdev@vger.kernel.org
In-Reply-To: <3ebb9805-fb69-c67d-6222-c6d1d344f4ec@msu.ru>

Hello,

There was a series of patches by Serhey and specifically this one:
https://git.kernel.org/pub/scm/network/iproute2/iproute2.git/commit/?id=93fa12418dc6f5943692250244be303bb162175b

It drops handling of special prefix names in get_prefix_1(), and in get_addr_1() they always receive family and bytelen. But as I unerstand for any case it was important to keep it with unspecified family for further filtering. As I do not know what is the global idea, I want to discuss it. Because there are options depending on how and where we want to handle those special names. Like keep unspecified family or change filtering logic.

I have added Serhey Popovych in the recepients, so he can give some ideas on what his aim is and help choose better solution.

13.03.2018, 21:12, "Alexander Zubkov" <green@msu.ru>:
> Hi,
>
> I just realized that you need patch for v4.15.0, which is easier to do.
> I'll send it as separate message now. I will make patch for the master
> branch, but later.
>
> On 13.03.2018 13:02, Luca Boccassi wrote:
>>  On Tue, 2018-03-13 at 12:05 +0100, Alexander Zubkov wrote:
>>>  Hello again,
>>>
>>>  The fun thing is that before the commit "ip route ls all" showed all
>>>  routes, but "ip -[4|6] route ls all" showed only default. So it was
>>>  broken too, but in other way.
>>>  I see parsing of prefix was changed since my patch. So I need several
>>>  days to propose fix. I think if "ip route ls [all|any]" shows all
>>>  routes and "ip route ls default" shows only default, everybody will
>>>  be happy with that?
>>
>>  Hi,
>>
>>  My only concern is that behaviour of existing commands that have been
>>  in releases is not changed, otherwise I get bugs raised :-)
>>
>>  Thank you for your work!
>>
>>>  13.03.2018, 09:46, "Alexander Zubkov" <green@msu.ru>:
>>>>  Hello.
>>>>
>>>>  May be the better way would be to change how "all"/"any" argument
>>>>  behaves? My original concern was about "default" only. I agree too,
>>>>  that "all" or "any" should work for all routes. But not for the
>>>>  default.
>>>>
>>>>  12.03.2018, 22:37, "Luca Boccassi" <bluca@debian.org>:
>>>>>    On Mon, 2018-03-12 at 14:03 -0700, Stephen Hemminger wrote:
>>>>>>     This reverts commit 9135c4d6037ff9f1818507bac0049fc44db8c3d2.
>>>>>>
>>>>>>     Debian maintainer found that basic command:
>>>>>>             # ip route flush all
>>>>>>     No longer worked as expected which breaks user scripts and
>>>>>>     expectations. It no longer flushed all IPv4 routes.
>>>>>>
>>>>>>     Reported-by: Luca Boccassi <bluca@debian.org>
>>>>>>     Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
>>>>>>     ---
>>>>>>      ip/iproute.c | 65 ++++++++++++++++++----------------------
>>>>>>  --------
>>>>>>     ------------
>>>>>>      lib/utils.c  | 13 ++++++++++++
>>>>>>      2 files changed, 32 insertions(+), 46 deletions(-)
>>>>>
>>>>>    Tested-by: Luca Boccassi <bluca@debian.org>
>>>>>
>>>>>    Thanks, solves the problem. I'll backport it to Debian.
>>>>>
>>>>>    Alexander, reproducing the issue is quite simple - before that
>>>>>  commit,
>>>>>    ip route ls all showed all routes, but with the change it
>>>>>  started
>>>>>    showing only the default table. Same for ip route flush.
>>>>>
>>>>>    --
>>>>>    Kind regards,
>>>>>    Luca Boccassi

^ permalink raw reply

* Re: [pci PATCH v6 5/5] pci-pf-stub: Add PF driver stub for PFs that function only to enable VFs
From: Christoph Hellwig @ 2018-03-14  8:56 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: bhelgaas, alexander.h.duyck, linux-pci, virtio-dev, kvm, netdev,
	dan.daly, linux-kernel, linux-nvme, keith.busch, netanel, ddutile,
	mheyne, liang-min.wang, mark.d.rustad, dwmw2, hch, dwmw
In-Reply-To: <20180313213054.3553.89366.stgit@localhost.localdomain>

> +
> +/**
> + * pci_pf_stub_white_list - White list of devices to bind pci-pf-stub onto
> + *
> + * This table provides the list of IDs this driver is supposed to bind
> + * onto. You could think of this as a list of "quirked" devices where we
> + * are adding support for SR-IOV here since there are no other drivers
> + * that they would be running under.
> + *
> + * Layout of the table below is as follows:
> + * { Vendor ID, Device ID,
> + *   SubVendor ID, SubDevice ID,
> + *   Class, Class Mask,
> + *   private data (not used) }
> + */

No need to document the PCI device table format in a random driver.

Otherwise looks fine:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply

* Re: [pci PATCH v6 4/5] nvme: Migrate over to unmanaged SR-IOV support
From: Christoph Hellwig @ 2018-03-14  8:54 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: bhelgaas, alexander.h.duyck, linux-pci, virtio-dev, kvm, netdev,
	dan.daly, linux-kernel, linux-nvme, keith.busch, netanel, ddutile,
	mheyne, liang-min.wang, mark.d.rustad, dwmw2, hch, dwmw
In-Reply-To: <20180313213034.3553.47677.stgit@localhost.localdomain>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply

* Re: [pci PATCH v6 2/5] virtio_pci: Add support for unmanaged SR-IOV on virtio_pci devices
From: Christoph Hellwig @ 2018-03-14  8:54 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: bhelgaas, alexander.h.duyck, linux-pci, virtio-dev, kvm, netdev,
	dan.daly, linux-kernel, linux-nvme, keith.busch, netanel, ddutile,
	mheyne, liang-min.wang, mark.d.rustad, dwmw2, hch, dwmw
In-Reply-To: <20180313212855.3553.97762.stgit@localhost.localdomain>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply

* Re: [pci PATCH v6 1/5] pci: Add pci_sriov_configure_simple for PFs that don't manage VF resources
From: Christoph Hellwig @ 2018-03-14  8:54 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: bhelgaas, alexander.h.duyck, linux-pci, virtio-dev, kvm, netdev,
	dan.daly, linux-kernel, linux-nvme, keith.busch, netanel, ddutile,
	mheyne, liang-min.wang, mark.d.rustad, dwmw2, hch, dwmw
In-Reply-To: <20180313212754.3553.72176.stgit@localhost.localdomain>

On Tue, Mar 13, 2018 at 02:28:49PM -0700, Alexander Duyck wrote:
> From: Alexander Duyck <alexander.h.duyck@intel.com>
> 
> This patch adds a common configuration function called
> pci_sriov_configure_simple that will allow for managing VFs on devices
> where the PF is not capable of managing VF resources.
> 
> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
> ---
> 
> v5: New patch replacing pci_sriov_configure_unmanaged with
>       pci_sriov_configure_simple
>     Dropped bits related to autoprobe changes
> v6: Defined pci_sriov_configure_simple as NULL if IOV is disabled
> 
>  drivers/pci/iov.c   |   32 ++++++++++++++++++++++++++++++++
>  include/linux/pci.h |    3 +++
>  2 files changed, 35 insertions(+)
> 
> diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
> index 677924ae0350..bd7021491fdb 100644
> --- a/drivers/pci/iov.c
> +++ b/drivers/pci/iov.c
> @@ -807,3 +807,35 @@ int pci_sriov_get_totalvfs(struct pci_dev *dev)
>  	return dev->sriov->total_VFs;
>  }
>  EXPORT_SYMBOL_GPL(pci_sriov_get_totalvfs);
> +
> +/**
> + * pci_sriov_configure_simple - helper to configure unmanaged SR-IOV
> + * @dev: the PCI device
> + * @nr_virtfn: number of virtual functions to enable, 0 to disable
> + *
> + * Used to provide generic enable/disable SR-IOV option for devices
> + * that do not manage the VFs generated by their driver
> + */
> +int pci_sriov_configure_simple(struct pci_dev *dev, int nr_virtfn)
> +{
> +	int err = -EINVAL;

This assignment seems like it is never used..

> +
> +	might_sleep();
> +
> +	if (!dev->is_physfn)
> +		return -ENODEV;
> +
> +	if (pci_vfs_assigned(dev)) {
> +		pci_warn(dev,
> +			 "Cannot modify SR-IOV while VFs are assigned\n");
> +		err = -EPERM;

Why not:

	if (pci_vfs_assigned(dev)) {
		pci_warn(dev,
			 "Cannot modify SR-IOV while VFs are assigned\n");
		return -EPERM;
	}

Otherwise looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox