Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH 05/14] net: sched: always take reference to action
From: Vlad Buslov @ 2018-05-14 18:49 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, jhs, xiyou.wangcong, pablo, kadlec, fw, ast,
	daniel, edumazet, keescook, linux-kernel, netfilter-devel,
	coreteam, kliteyn
In-Reply-To: <20180514162301.GC2134@nanopsycho.orion>


On Mon 14 May 2018 at 16:23, Jiri Pirko <jiri@resnulli.us> wrote:
> Mon, May 14, 2018 at 04:27:06PM CEST, vladbu@mellanox.com wrote:
>>Without rtnl lock protection it is no longer safe to use pointer to tc
>>action without holding reference to it. (it can be destroyed concurrently)
>>
>>Remove unsafe action idr lookup function. Instead of it, implement safe tcf
>>idr check function that atomically looks up action in idr and increments
>>its reference and bind counters.
>>
>>Implement both action search and check using new safe function.
>>
>>Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
>>---
>> net/sched/act_api.c | 38 ++++++++++++++++----------------------
>> 1 file changed, 16 insertions(+), 22 deletions(-)
>>
>>diff --git a/net/sched/act_api.c b/net/sched/act_api.c
>>index 1331beb..9459cce 100644
>>--- a/net/sched/act_api.c
>>+++ b/net/sched/act_api.c
>>@@ -284,44 +284,38 @@ int tcf_generic_walker(struct tc_action_net *tn, struct sk_buff *skb,
>> }
>> EXPORT_SYMBOL(tcf_generic_walker);
>> 
>>-static struct tc_action *tcf_idr_lookup(u32 index, struct tcf_idrinfo *idrinfo)
>>+bool __tcf_idr_check(struct tc_action_net *tn, u32 index, struct tc_action **a,
>>+		     int bind)
>> {
>>-	struct tc_action *p = NULL;
>>+	struct tcf_idrinfo *idrinfo = tn->idrinfo;
>>+	struct tc_action *p;
>> 
>> 	spin_lock_bh(&idrinfo->lock);
>
> Why "_bh" variant is necessary here?

It is not my code.

>
>> 	p = idr_find(&idrinfo->action_idr, index);
>>+	if (p) {
>>+		refcount_inc(&p->tcfa_refcnt);
>>+		if (bind)
>>+			atomic_inc(&p->tcfa_bindcnt);
>>+	}
>> 	spin_unlock_bh(&idrinfo->lock);
>
> [...]

^ permalink raw reply

* [PATCH net-next v2 4/4] bonding: allow carrier and link status to determine link state
From: Debabrata Banerjee @ 2018-05-14 18:48 UTC (permalink / raw)
  To: David S . Miller, netdev, Jay Vosburgh
  Cc: Veaceslav Falico, Andy Gospodarek, dbanerje
In-Reply-To: <20180514184810.15506-1-dbanerje@akamai.com>

In a mixed environment it may be difficult to tell if your hardware
support carrier, if it does not it can always report true. With a new
use_carrier option of 2, we can check both carrier and link status
sequentially, instead of one or the other

Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com>
---
 Documentation/networking/bonding.txt |  4 ++--
 drivers/net/bonding/bond_main.c      | 12 ++++++++----
 drivers/net/bonding/bond_options.c   |  7 ++++---
 3 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/Documentation/networking/bonding.txt b/Documentation/networking/bonding.txt
index 9ba04c0bab8d..f063730e7e73 100644
--- a/Documentation/networking/bonding.txt
+++ b/Documentation/networking/bonding.txt
@@ -828,8 +828,8 @@ use_carrier
 	MII / ETHTOOL ioctl method to determine the link state.
 
 	A value of 1 enables the use of netif_carrier_ok(), a value of
-	0 will use the deprecated MII / ETHTOOL ioctls.  The default
-	value is 1.
+	0 will use the deprecated MII / ETHTOOL ioctls. A value of 2
+	will check both.  The default value is 1.
 
 xmit_hash_policy
 
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index f7f8a49cb32b..7e9652c4b35c 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -132,7 +132,7 @@ MODULE_PARM_DESC(downdelay, "Delay before considering link down, "
 			    "in milliseconds");
 module_param(use_carrier, int, 0);
 MODULE_PARM_DESC(use_carrier, "Use netif_carrier_ok (vs MII ioctls) in miimon; "
-			      "0 for off, 1 for on (default)");
+			      "0 for off, 1 for on (default), 2 for carrier then legacy checks");
 module_param(mode, charp, 0);
 MODULE_PARM_DESC(mode, "Mode of operation; 0 for balance-rr, "
 		       "1 for active-backup, 2 for balance-xor, "
@@ -434,12 +434,16 @@ static int bond_check_dev_link(struct bonding *bond,
 	int (*ioctl)(struct net_device *, struct ifreq *, int);
 	struct ifreq ifr;
 	struct mii_ioctl_data *mii;
+	bool carrier = true;
 
 	if (!reporting && !netif_running(slave_dev))
 		return 0;
 
 	if (bond->params.use_carrier)
-		return netif_carrier_ok(slave_dev) ? BMSR_LSTATUS : 0;
+		carrier = netif_carrier_ok(slave_dev) ? BMSR_LSTATUS : 0;
+
+	if (!carrier)
+		return carrier;
 
 	/* Try to get link status using Ethtool first. */
 	if (slave_dev->ethtool_ops->get_link)
@@ -4399,8 +4403,8 @@ static int bond_check_params(struct bond_params *params)
 		downdelay = 0;
 	}
 
-	if ((use_carrier != 0) && (use_carrier != 1)) {
-		pr_warn("Warning: use_carrier module parameter (%d), not of valid value (0/1), so it was set to 1\n",
+	if (use_carrier < 0 || use_carrier > 2) {
+		pr_warn("Warning: use_carrier module parameter (%d), not of valid value (0-2), so it was set to 1\n",
 			use_carrier);
 		use_carrier = 1;
 	}
diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
index 8a945c9341d6..dba6cef05134 100644
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -164,9 +164,10 @@ static const struct bond_opt_value bond_primary_reselect_tbl[] = {
 };
 
 static const struct bond_opt_value bond_use_carrier_tbl[] = {
-	{ "off", 0,  0},
-	{ "on",  1,  BOND_VALFLAG_DEFAULT},
-	{ NULL,  -1, 0}
+	{ "off",  0,  0},
+	{ "on",   1,  BOND_VALFLAG_DEFAULT},
+	{ "both", 2,  0},
+	{ NULL,  -1,  0}
 };
 
 static const struct bond_opt_value bond_all_slaves_active_tbl[] = {
-- 
2.17.0

^ permalink raw reply related

* [PATCH net-next v2 2/4] bonding: use common mac addr checks
From: Debabrata Banerjee @ 2018-05-14 18:48 UTC (permalink / raw)
  To: David S . Miller, netdev, Jay Vosburgh
  Cc: Veaceslav Falico, Andy Gospodarek, dbanerje
In-Reply-To: <20180514184810.15506-1-dbanerje@akamai.com>

Replace homegrown mac addr checks with faster defs from etherdevice.h

Note that this will also prevent any rlb arp updates for multicast
addresses, however this should have been forbidden anyway.

Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com>
---
 drivers/net/bonding/bond_alb.c | 28 +++++++++-------------------
 1 file changed, 9 insertions(+), 19 deletions(-)

diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
index c2f6c58e4e6a..180e50f7806f 100644
--- a/drivers/net/bonding/bond_alb.c
+++ b/drivers/net/bonding/bond_alb.c
@@ -40,11 +40,6 @@
 #include <net/bonding.h>
 #include <net/bond_alb.h>
 
-
-
-static const u8 mac_bcast[ETH_ALEN + 2] __long_aligned = {
-	0xff, 0xff, 0xff, 0xff, 0xff, 0xff
-};
 static const u8 mac_v6_allmcast[ETH_ALEN + 2] __long_aligned = {
 	0x33, 0x33, 0x00, 0x00, 0x00, 0x01
 };
@@ -420,9 +415,7 @@ static void rlb_clear_slave(struct bonding *bond, struct slave *slave)
 
 			if (assigned_slave) {
 				rx_hash_table[index].slave = assigned_slave;
-				if (!ether_addr_equal_64bits(rx_hash_table[index].mac_dst,
-							     mac_bcast) &&
-				    !is_zero_ether_addr(rx_hash_table[index].mac_dst)) {
+				if (is_valid_ether_addr(rx_hash_table[index].mac_dst)) {
 					bond_info->rx_hashtbl[index].ntt = 1;
 					bond_info->rx_ntt = 1;
 					/* A slave has been removed from the
@@ -525,8 +518,7 @@ static void rlb_req_update_slave_clients(struct bonding *bond, struct slave *sla
 		client_info = &(bond_info->rx_hashtbl[hash_index]);
 
 		if ((client_info->slave == slave) &&
-		    !ether_addr_equal_64bits(client_info->mac_dst, mac_bcast) &&
-		    !is_zero_ether_addr(client_info->mac_dst)) {
+		    is_valid_ether_addr(client_info->mac_dst)) {
 			client_info->ntt = 1;
 			ntt = 1;
 		}
@@ -567,8 +559,7 @@ static void rlb_req_update_subnet_clients(struct bonding *bond, __be32 src_ip)
 		if ((client_info->ip_src == src_ip) &&
 		    !ether_addr_equal_64bits(client_info->slave->dev->dev_addr,
 					     bond->dev->dev_addr) &&
-		    !ether_addr_equal_64bits(client_info->mac_dst, mac_bcast) &&
-		    !is_zero_ether_addr(client_info->mac_dst)) {
+		    is_valid_ether_addr(client_info->mac_dst)) {
 			client_info->ntt = 1;
 			bond_info->rx_ntt = 1;
 		}
@@ -596,7 +587,7 @@ static struct slave *rlb_choose_channel(struct sk_buff *skb, struct bonding *bon
 		if ((client_info->ip_src == arp->ip_src) &&
 		    (client_info->ip_dst == arp->ip_dst)) {
 			/* the entry is already assigned to this client */
-			if (!ether_addr_equal_64bits(arp->mac_dst, mac_bcast)) {
+			if (!is_broadcast_ether_addr(arp->mac_dst)) {
 				/* update mac address from arp */
 				ether_addr_copy(client_info->mac_dst, arp->mac_dst);
 			}
@@ -644,8 +635,7 @@ static struct slave *rlb_choose_channel(struct sk_buff *skb, struct bonding *bon
 		ether_addr_copy(client_info->mac_src, arp->mac_src);
 		client_info->slave = assigned_slave;
 
-		if (!ether_addr_equal_64bits(client_info->mac_dst, mac_bcast) &&
-		    !is_zero_ether_addr(client_info->mac_dst)) {
+		if (is_valid_ether_addr(client_info->mac_dst)) {
 			client_info->ntt = 1;
 			bond->alb_info.rx_ntt = 1;
 		} else {
@@ -1418,9 +1408,9 @@ int bond_alb_xmit(struct sk_buff *skb, struct net_device *bond_dev)
 	case ETH_P_IP: {
 		const struct iphdr *iph = ip_hdr(skb);
 
-		if (ether_addr_equal_64bits(eth_data->h_dest, mac_bcast) ||
-		    (iph->daddr == ip_bcast) ||
-		    (iph->protocol == IPPROTO_IGMP)) {
+		if (is_broadcast_ether_addr(eth_data->h_dest) ||
+		    iph->daddr == ip_bcast ||
+		    iph->protocol == IPPROTO_IGMP) {
 			do_tx_balance = false;
 			break;
 		}
@@ -1432,7 +1422,7 @@ int bond_alb_xmit(struct sk_buff *skb, struct net_device *bond_dev)
 		/* IPv6 doesn't really use broadcast mac address, but leave
 		 * that here just in case.
 		 */
-		if (ether_addr_equal_64bits(eth_data->h_dest, mac_bcast)) {
+		if (is_broadcast_ether_addr(eth_data->h_dest)) {
 			do_tx_balance = false;
 			break;
 		}
-- 
2.17.0

^ permalink raw reply related

* [PATCH net-next v2 3/4] bonding: allow use of tx hashing in balance-alb
From: Debabrata Banerjee @ 2018-05-14 18:48 UTC (permalink / raw)
  To: David S . Miller, netdev, Jay Vosburgh
  Cc: Veaceslav Falico, Andy Gospodarek, dbanerje
In-Reply-To: <20180514184810.15506-1-dbanerje@akamai.com>

The rx load balancing provided by balance-alb is not mutually
exclusive with using hashing for tx selection, and should provide a decent
speed increase because this eliminates spinlocks and cache contention.

Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com>
---
 drivers/net/bonding/bond_alb.c     | 20 ++++++++++++++++++--
 drivers/net/bonding/bond_main.c    | 25 +++++++++++++++----------
 drivers/net/bonding/bond_options.c |  2 +-
 include/net/bonding.h              | 11 +++++++++--
 4 files changed, 43 insertions(+), 15 deletions(-)

diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
index 180e50f7806f..6228635880d5 100644
--- a/drivers/net/bonding/bond_alb.c
+++ b/drivers/net/bonding/bond_alb.c
@@ -1478,8 +1478,24 @@ int bond_alb_xmit(struct sk_buff *skb, struct net_device *bond_dev)
 	}
 
 	if (do_tx_balance) {
-		hash_index = _simple_hash(hash_start, hash_size);
-		tx_slave = tlb_choose_channel(bond, hash_index, skb->len);
+		if (bond->params.tlb_dynamic_lb) {
+			hash_index = _simple_hash(hash_start, hash_size);
+			tx_slave = tlb_choose_channel(bond, hash_index, skb->len);
+		} else {
+			/*
+			 * do_tx_balance means we are free to select the tx_slave
+			 * So we do exactly what tlb would do for hash selection
+			 */
+
+			struct bond_up_slave *slaves;
+			unsigned int count;
+
+			slaves = rcu_dereference(bond->slave_arr);
+			count = slaves ? READ_ONCE(slaves->count) : 0;
+			if (likely(count))
+				tx_slave = slaves->arr[bond_xmit_hash(bond, skb) %
+						       count];
+		}
 	}
 
 	return bond_do_alb_xmit(skb, bond, tx_slave);
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 1f1e97b26f95..f7f8a49cb32b 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -159,7 +159,7 @@ module_param(min_links, int, 0);
 MODULE_PARM_DESC(min_links, "Minimum number of available links before turning on carrier");
 
 module_param(xmit_hash_policy, charp, 0);
-MODULE_PARM_DESC(xmit_hash_policy, "balance-xor and 802.3ad hashing method; "
+MODULE_PARM_DESC(xmit_hash_policy, "balance-alb, balance-tlb, balance-xor, 802.3ad hashing method; "
 				   "0 for layer 2 (default), 1 for layer 3+4, "
 				   "2 for layer 2+3, 3 for encap layer 2+3, "
 				   "4 for encap layer 3+4");
@@ -1735,7 +1735,7 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev,
 		unblock_netpoll_tx();
 	}
 
-	if (bond_mode_uses_xmit_hash(bond))
+	if (bond_mode_can_use_xmit_hash(bond))
 		bond_update_slave_arr(bond, NULL);
 
 	bond->nest_level = dev_get_nest_level(bond_dev);
@@ -1870,7 +1870,7 @@ static int __bond_release_one(struct net_device *bond_dev,
 	if (BOND_MODE(bond) == BOND_MODE_8023AD)
 		bond_3ad_unbind_slave(slave);
 
-	if (bond_mode_uses_xmit_hash(bond))
+	if (bond_mode_can_use_xmit_hash(bond))
 		bond_update_slave_arr(bond, slave);
 
 	netdev_info(bond_dev, "Releasing %s interface %s\n",
@@ -3102,7 +3102,7 @@ static int bond_slave_netdev_event(unsigned long event,
 		 * events. If these (miimon/arpmon) parameters are configured
 		 * then array gets refreshed twice and that should be fine!
 		 */
-		if (bond_mode_uses_xmit_hash(bond))
+		if (bond_mode_can_use_xmit_hash(bond))
 			bond_update_slave_arr(bond, NULL);
 		break;
 	case NETDEV_CHANGEMTU:
@@ -3322,7 +3322,7 @@ static int bond_open(struct net_device *bond_dev)
 		 */
 		if (bond_alb_initialize(bond, (BOND_MODE(bond) == BOND_MODE_ALB)))
 			return -ENOMEM;
-		if (bond->params.tlb_dynamic_lb)
+		if (bond->params.tlb_dynamic_lb || BOND_MODE(bond) == BOND_MODE_ALB)
 			queue_delayed_work(bond->wq, &bond->alb_work, 0);
 	}
 
@@ -3341,7 +3341,7 @@ static int bond_open(struct net_device *bond_dev)
 		bond_3ad_initiate_agg_selection(bond, 1);
 	}
 
-	if (bond_mode_uses_xmit_hash(bond))
+	if (bond_mode_can_use_xmit_hash(bond))
 		bond_update_slave_arr(bond, NULL);
 
 	return 0;
@@ -3892,7 +3892,7 @@ static void bond_slave_arr_handler(struct work_struct *work)
  * to determine the slave interface -
  * (a) BOND_MODE_8023AD
  * (b) BOND_MODE_XOR
- * (c) BOND_MODE_TLB && tlb_dynamic_lb == 0
+ * (c) (BOND_MODE_TLB || BOND_MODE_ALB) && tlb_dynamic_lb == 0
  *
  * The caller is expected to hold RTNL only and NO other lock!
  */
@@ -3945,6 +3945,11 @@ int bond_update_slave_arr(struct bonding *bond, struct slave *skipslave)
 			continue;
 		if (skipslave == slave)
 			continue;
+
+		netdev_dbg(bond->dev,
+			   "Adding slave dev %s to tx hash array[%d]\n",
+			   slave->dev->name, new_arr->count);
+
 		new_arr->arr[new_arr->count++] = slave;
 	}
 
@@ -4320,9 +4325,9 @@ static int bond_check_params(struct bond_params *params)
 	}
 
 	if (xmit_hash_policy) {
-		if ((bond_mode != BOND_MODE_XOR) &&
-		    (bond_mode != BOND_MODE_8023AD) &&
-		    (bond_mode != BOND_MODE_TLB)) {
+		if (bond_mode == BOND_MODE_ROUNDROBIN ||
+		    bond_mode == BOND_MODE_ACTIVEBACKUP ||
+		    bond_mode == BOND_MODE_BROADCAST) {
 			pr_info("xmit_hash_policy param is irrelevant in mode %s\n",
 				bond_mode_name(bond_mode));
 		} else {
diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
index 58c705f24f96..8a945c9341d6 100644
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -395,7 +395,7 @@ static const struct bond_option bond_opts[BOND_OPT_LAST] = {
 		.id = BOND_OPT_TLB_DYNAMIC_LB,
 		.name = "tlb_dynamic_lb",
 		.desc = "Enable dynamic flow shuffling",
-		.unsuppmodes = BOND_MODE_ALL_EX(BIT(BOND_MODE_TLB)),
+		.unsuppmodes = BOND_MODE_ALL_EX(BIT(BOND_MODE_TLB) | BIT(BOND_MODE_ALB)),
 		.values = bond_tlb_dynamic_lb_tbl,
 		.flags = BOND_OPTFLAG_IFDOWN,
 		.set = bond_option_tlb_dynamic_lb_set,
diff --git a/include/net/bonding.h b/include/net/bonding.h
index b52235158836..808f1d167349 100644
--- a/include/net/bonding.h
+++ b/include/net/bonding.h
@@ -285,8 +285,15 @@ static inline bool bond_needs_speed_duplex(const struct bonding *bond)
 
 static inline bool bond_is_nondyn_tlb(const struct bonding *bond)
 {
-	return (BOND_MODE(bond) == BOND_MODE_TLB)  &&
-	       (bond->params.tlb_dynamic_lb == 0);
+	return (bond_is_lb(bond) && bond->params.tlb_dynamic_lb == 0);
+}
+
+static inline bool bond_mode_can_use_xmit_hash(const struct bonding *bond)
+{
+	return (BOND_MODE(bond) == BOND_MODE_8023AD ||
+		BOND_MODE(bond) == BOND_MODE_XOR ||
+		BOND_MODE(bond) == BOND_MODE_TLB ||
+		BOND_MODE(bond) == BOND_MODE_ALB);
 }
 
 static inline bool bond_mode_uses_xmit_hash(const struct bonding *bond)
-- 
2.17.0

^ permalink raw reply related

* [PATCH net-next v2 1/4] bonding: don't queue up extraneous rlb updates
From: Debabrata Banerjee @ 2018-05-14 18:48 UTC (permalink / raw)
  To: David S . Miller, netdev, Jay Vosburgh
  Cc: Veaceslav Falico, Andy Gospodarek, dbanerje
In-Reply-To: <20180514184810.15506-1-dbanerje@akamai.com>

arps for incomplete entries can't be sent anyway.

Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com>
---
 drivers/net/bonding/bond_alb.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/drivers/net/bonding/bond_alb.c b/drivers/net/bonding/bond_alb.c
index 5eb0df2e5464..c2f6c58e4e6a 100644
--- a/drivers/net/bonding/bond_alb.c
+++ b/drivers/net/bonding/bond_alb.c
@@ -421,7 +421,8 @@ static void rlb_clear_slave(struct bonding *bond, struct slave *slave)
 			if (assigned_slave) {
 				rx_hash_table[index].slave = assigned_slave;
 				if (!ether_addr_equal_64bits(rx_hash_table[index].mac_dst,
-							     mac_bcast)) {
+							     mac_bcast) &&
+				    !is_zero_ether_addr(rx_hash_table[index].mac_dst)) {
 					bond_info->rx_hashtbl[index].ntt = 1;
 					bond_info->rx_ntt = 1;
 					/* A slave has been removed from the
@@ -524,7 +525,8 @@ static void rlb_req_update_slave_clients(struct bonding *bond, struct slave *sla
 		client_info = &(bond_info->rx_hashtbl[hash_index]);
 
 		if ((client_info->slave == slave) &&
-		    !ether_addr_equal_64bits(client_info->mac_dst, mac_bcast)) {
+		    !ether_addr_equal_64bits(client_info->mac_dst, mac_bcast) &&
+		    !is_zero_ether_addr(client_info->mac_dst)) {
 			client_info->ntt = 1;
 			ntt = 1;
 		}
@@ -565,7 +567,8 @@ static void rlb_req_update_subnet_clients(struct bonding *bond, __be32 src_ip)
 		if ((client_info->ip_src == src_ip) &&
 		    !ether_addr_equal_64bits(client_info->slave->dev->dev_addr,
 					     bond->dev->dev_addr) &&
-		    !ether_addr_equal_64bits(client_info->mac_dst, mac_bcast)) {
+		    !ether_addr_equal_64bits(client_info->mac_dst, mac_bcast) &&
+		    !is_zero_ether_addr(client_info->mac_dst)) {
 			client_info->ntt = 1;
 			bond_info->rx_ntt = 1;
 		}
@@ -641,7 +644,8 @@ static struct slave *rlb_choose_channel(struct sk_buff *skb, struct bonding *bon
 		ether_addr_copy(client_info->mac_src, arp->mac_src);
 		client_info->slave = assigned_slave;
 
-		if (!ether_addr_equal_64bits(client_info->mac_dst, mac_bcast)) {
+		if (!ether_addr_equal_64bits(client_info->mac_dst, mac_bcast) &&
+		    !is_zero_ether_addr(client_info->mac_dst)) {
 			client_info->ntt = 1;
 			bond->alb_info.rx_ntt = 1;
 		} else {
@@ -733,8 +737,10 @@ static void rlb_rebalance(struct bonding *bond)
 		assigned_slave = __rlb_next_rx_slave(bond);
 		if (assigned_slave && (client_info->slave != assigned_slave)) {
 			client_info->slave = assigned_slave;
-			client_info->ntt = 1;
-			ntt = 1;
+			if (!is_zero_ether_addr(client_info->mac_dst)) {
+				client_info->ntt = 1;
+				ntt = 1;
+			}
 		}
 	}
 
-- 
2.17.0

^ permalink raw reply related

* [PATCH net-next v2 0/4] bonding: performance and reliability
From: Debabrata Banerjee @ 2018-05-14 18:48 UTC (permalink / raw)
  To: David S . Miller, netdev, Jay Vosburgh
  Cc: Veaceslav Falico, Andy Gospodarek, dbanerje

Series of fixes to how rlb updates are handled, code cleanup, allowing
higher performance tx hashing in balance-alb mode, and reliability of
link up/down monitoring.

v2: refactor bond_is_nondyn_tlb with inline fn, update log comment to
point out that multicast addresses will not get rlb updates.

Debabrata Banerjee (4):
  bonding: don't queue up extraneous rlb updates
  bonding: use common mac addr checks
  bonding: allow use of tx hashing in balance-alb
  bonding: allow carrier and link status to determine link state

 Documentation/networking/bonding.txt |  4 +--
 drivers/net/bonding/bond_alb.c       | 50 +++++++++++++++++-----------
 drivers/net/bonding/bond_main.c      | 37 ++++++++++++--------
 drivers/net/bonding/bond_options.c   |  9 ++---
 include/net/bonding.h                | 11 ++++--
 5 files changed, 70 insertions(+), 41 deletions(-)

-- 
2.17.0

^ permalink raw reply

* Re: [PATCH net V2] tun: fix use after free for ptr_ring
From: David Miller @ 2018-05-14 18:48 UTC (permalink / raw)
  To: jasowang; +Cc: netdev, linux-kernel, xiyou.wangcong, eric.dumazet, mst
In-Reply-To: <1526006965-9124-1-git-send-email-jasowang@redhat.com>

From: Jason Wang <jasowang@redhat.com>
Date: Fri, 11 May 2018 10:49:25 +0800

> We used to initialize ptr_ring during TUNSETIFF, this is because its
> size depends on the tx_queue_len of netdevice. And we try to clean it
> up when socket were detached from netdevice. A race were spotted when
> trying to do uninit during a read which will lead a use after free for
> pointer ring. Solving this by always initialize a zero size ptr_ring
> in open() and do resizing during TUNSETIFF, and then we can safely do
> cleanup during close(). With this, there's no need for the workaround
> that was introduced by commit 4df0bfc79904 ("tun: fix a memory leak
> for tfile->tx_array").
> 
> Reported-by: syzbot+e8b902c3c3fadf0a9dba@syzkaller.appspotmail.com
> Cc: Eric Dumazet <eric.dumazet@gmail.com>
> Cc: Cong Wang <xiyou.wangcong@gmail.com>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Fixes: 1576d9860599 ("tun: switch to use skb array for tx")
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
> Changes from v1:
> - free ptr_ring during close()
> - use tun_ptr_free() during resie for safety

Applied and queued up for -stable, thanks Jason.

^ permalink raw reply

* Re: [bpf-next PATCH 5/5] bpf, doc: howto use/run the BPF selftests
From: Silvan Jegen @ 2018-05-14 18:46 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: borkmann, alexei.starovoitov, netdev, quentin.monnet, linux-man,
	linux-doc
In-Reply-To: <20180514180514.4fd0ba66@redhat.com>

On Mon, May 14, 2018 at 06:05:14PM +0200, Jesper Dangaard Brouer wrote:
> On Mon, 14 May 2018 17:15:54 +0200
> Silvan Jegen <s.jegen@gmail.com> wrote:
> 
> > Hi
> > 
> > Some typo fixes below.
> > 
> > On Mon, May 14, 2018 at 3:43 PM Jesper Dangaard Brouer <brouer@redhat.com>
> > wrote:
> > > I always forget howto run the BPF selftests. Thus, lets add that info
> > > to the QA document.  
> >
> > [...] 
> >
> > I also think the underscore at the end of this line is misplaced (or it
> > should be a dash instead).
> 
> This is also part of the RST formatting.  This is a link. 
> 
> 
> > > +document for further documentation.
> > > +
> > >   Q: Which BPF kernel selftests version should I run my kernel against?
> > >   ---------------------------------------------------------------------
> > >   A: If you run a kernel ``xyz``, then always run the BPF kernel selftests
> > > @@ -607,5 +634,7 @@ when:
> > >   .. _netdev FAQ: ../networking/netdev-FAQ.txt
> > >   .. _samples/bpf/: ../../samples/bpf/
> > >   .. _selftests: ../../tools/testing/selftests/bpf/
> > > +.. _Documentation/dev-tools/kselftest.rst:
> > > +   https://www.kernel.org/doc/html/latest/dev-tools/kselftest.html  
> 
> The link is defined above/here.

Ah, thanks I didn't know.


Cheers,

Silvan

 
> -- 
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* Re: [PATCH v3] selftests: add headers_install to lib.mk
From: Mike Rapoport @ 2018-05-14 18:44 UTC (permalink / raw)
  To: Anders Roxell
  Cc: yamada.masahiro, michal.lkml, shuah, bamv2005, brgl, pbonzini,
	akpm, aarcange, linux-kbuild, linux-kernel, linux-kselftest,
	netdev
In-Reply-To: <20180514115809.9803-1-anders.roxell@linaro.org>

On Mon, May 14, 2018 at 01:58:09PM +0200, Anders Roxell wrote:
> If the kernel headers aren't installed we can't build all the tests.
> Add a new make target rule 'khdr' in the file lib.mk to generate the
> kernel headers and that gets include for every test-dir Makefile that
> includes lib.mk If the testdir in turn have its own sub-dirs the
> top_srcdir needs to be set to the linux-rootdir to be able to generate
> the kernel headers.
> 
> Signed-off-by: Anders Roxell <anders.roxell@linaro.org>
> Reviewed-by: Fathi Boudra <fathi.boudra@linaro.org>

Ack for the userfaultfd part.

> ---
>  Makefile                                          | 14 +-------------
>  scripts/subarch.include                           | 13 +++++++++++++
>  tools/testing/selftests/android/Makefile          |  2 +-
>  tools/testing/selftests/android/ion/Makefile      |  1 +
>  tools/testing/selftests/bpf/Makefile              |  5 ++---
>  tools/testing/selftests/futex/functional/Makefile |  1 +
>  tools/testing/selftests/gpio/Makefile             |  7 ++-----
>  tools/testing/selftests/kvm/Makefile              |  7 ++-----
>  tools/testing/selftests/lib.mk                    | 10 ++++++++++
>  tools/testing/selftests/vm/Makefile               |  4 ----
>  10 files changed, 33 insertions(+), 31 deletions(-)
>  create mode 100644 scripts/subarch.include
> 
> diff --git a/Makefile b/Makefile
> index 6395c79ffe40..bad0a8f6e874 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -286,19 +286,7 @@ KERNELRELEASE = $(shell cat include/config/kernel.release 2> /dev/null)
>  KERNELVERSION = $(VERSION)$(if $(PATCHLEVEL),.$(PATCHLEVEL)$(if $(SUBLEVEL),.$(SUBLEVEL)))$(EXTRAVERSION)
>  export VERSION PATCHLEVEL SUBLEVEL KERNELRELEASE KERNELVERSION
> 
> -# SUBARCH tells the usermode build what the underlying arch is.  That is set
> -# first, and if a usermode build is happening, the "ARCH=um" on the command
> -# line overrides the setting of ARCH below.  If a native build is happening,
> -# then ARCH is assigned, getting whatever value it gets normally, and
> -# SUBARCH is subsequently ignored.
> -
> -SUBARCH := $(shell uname -m | sed -e s/i.86/x86/ -e s/x86_64/x86/ \
> -				  -e s/sun4u/sparc64/ \
> -				  -e s/arm.*/arm/ -e s/sa110/arm/ \
> -				  -e s/s390x/s390/ -e s/parisc64/parisc/ \
> -				  -e s/ppc.*/powerpc/ -e s/mips.*/mips/ \
> -				  -e s/sh[234].*/sh/ -e s/aarch64.*/arm64/ \
> -				  -e s/riscv.*/riscv/)
> +include scripts/subarch.include
> 
>  # Cross compiling and selecting different set of gcc/bin-utils
>  # ---------------------------------------------------------------------------
> diff --git a/scripts/subarch.include b/scripts/subarch.include
> new file mode 100644
> index 000000000000..650682821126
> --- /dev/null
> +++ b/scripts/subarch.include
> @@ -0,0 +1,13 @@
> +# SUBARCH tells the usermode build what the underlying arch is.  That is set
> +# first, and if a usermode build is happening, the "ARCH=um" on the command
> +# line overrides the setting of ARCH below.  If a native build is happening,
> +# then ARCH is assigned, getting whatever value it gets normally, and
> +# SUBARCH is subsequently ignored.
> +
> +SUBARCH := $(shell uname -m | sed -e s/i.86/x86/ -e s/x86_64/x86/ \
> +				  -e s/sun4u/sparc64/ \
> +				  -e s/arm.*/arm/ -e s/sa110/arm/ \
> +				  -e s/s390x/s390/ -e s/parisc64/parisc/ \
> +				  -e s/ppc.*/powerpc/ -e s/mips.*/mips/ \
> +				  -e s/sh[234].*/sh/ -e s/aarch64.*/arm64/ \
> +				  -e s/riscv.*/riscv/)
> diff --git a/tools/testing/selftests/android/Makefile b/tools/testing/selftests/android/Makefile
> index 72c25a3cb658..d9a725478375 100644
> --- a/tools/testing/selftests/android/Makefile
> +++ b/tools/testing/selftests/android/Makefile
> @@ -6,7 +6,7 @@ TEST_PROGS := run.sh
> 
>  include ../lib.mk
> 
> -all:
> +all: khdr
>  	@for DIR in $(SUBDIRS); do		\
>  		BUILD_TARGET=$(OUTPUT)/$$DIR;	\
>  		mkdir $$BUILD_TARGET  -p;	\
> diff --git a/tools/testing/selftests/android/ion/Makefile b/tools/testing/selftests/android/ion/Makefile
> index e03695287f76..f3be7dcc060a 100644
> --- a/tools/testing/selftests/android/ion/Makefile
> +++ b/tools/testing/selftests/android/ion/Makefile
> @@ -10,6 +10,7 @@ $(TEST_GEN_FILES): ipcsocket.c ionutils.c
> 
>  TEST_PROGS := ion_test.sh
> 
> +top_srcdir = ../../../../..
>  include ../../lib.mk
> 
>  $(OUTPUT)/ionapp_export: ionapp_export.c ipcsocket.c ionutils.c
> diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
> index 438d4f93875b..9741609a0eb1 100644
> --- a/tools/testing/selftests/bpf/Makefile
> +++ b/tools/testing/selftests/bpf/Makefile
> @@ -16,9 +16,8 @@ LDLIBS += -lcap -lelf -lrt -lpthread
>  TEST_CUSTOM_PROGS = $(OUTPUT)/urandom_read
>  all: $(TEST_CUSTOM_PROGS)
> 
> -$(TEST_CUSTOM_PROGS): urandom_read
> -
> -urandom_read: urandom_read.c
> +$(TEST_CUSTOM_PROGS):| khdr
> +$(TEST_CUSTOM_PROGS): urandom_read.c
>  	$(CC) -o $(TEST_CUSTOM_PROGS) -static $<
> 
>  # Order correspond to 'make run_tests' order
> diff --git a/tools/testing/selftests/futex/functional/Makefile b/tools/testing/selftests/futex/functional/Makefile
> index ff8feca49746..ad1eeb14fda7 100644
> --- a/tools/testing/selftests/futex/functional/Makefile
> +++ b/tools/testing/selftests/futex/functional/Makefile
> @@ -18,6 +18,7 @@ TEST_GEN_FILES := \
> 
>  TEST_PROGS := run.sh
> 
> +top_srcdir = ../../../../..
>  include ../../lib.mk
> 
>  $(TEST_GEN_FILES): $(HEADERS)
> diff --git a/tools/testing/selftests/gpio/Makefile b/tools/testing/selftests/gpio/Makefile
> index 1bbb47565c55..4665cdbf1a8d 100644
> --- a/tools/testing/selftests/gpio/Makefile
> +++ b/tools/testing/selftests/gpio/Makefile
> @@ -21,11 +21,8 @@ endef
>  CFLAGS += -O2 -g -std=gnu99 -Wall -I../../../../usr/include/
>  LDLIBS += -lmount -I/usr/include/libmount
> 
> -$(BINARIES): ../../../gpio/gpio-utils.o ../../../../usr/include/linux/gpio.h
> +$(BINARIES):| khdr
> +$(BINARIES): ../../../gpio/gpio-utils.o
> 
>  ../../../gpio/gpio-utils.o:
>  	make ARCH=$(ARCH) CROSS_COMPILE=$(CROSS_COMPILE) -C ../../../gpio
> -
> -../../../../usr/include/linux/gpio.h:
> -	make -C ../../../.. headers_install INSTALL_HDR_PATH=$(shell pwd)/../../../../usr/
> -
> diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
> index d9d00319b07c..bcb69380bbab 100644
> --- a/tools/testing/selftests/kvm/Makefile
> +++ b/tools/testing/selftests/kvm/Makefile
> @@ -32,9 +32,6 @@ $(LIBKVM_OBJ): $(OUTPUT)/%.o: %.c
>  $(OUTPUT)/libkvm.a: $(LIBKVM_OBJ)
>  	$(AR) crs $@ $^
> 
> -$(LINUX_HDR_PATH):
> -	make -C $(top_srcdir) headers_install
> -
> -all: $(STATIC_LIBS) $(LINUX_HDR_PATH)
> +all: $(STATIC_LIBS)
>  $(TEST_GEN_PROGS): $(STATIC_LIBS)
> -$(TEST_GEN_PROGS) $(LIBKVM_OBJ): | $(LINUX_HDR_PATH)
> +$(STATIC_LIBS):| khdr
> diff --git a/tools/testing/selftests/lib.mk b/tools/testing/selftests/lib.mk
> index 6466294366dc..9515af872e16 100644
> --- a/tools/testing/selftests/lib.mk
> +++ b/tools/testing/selftests/lib.mk
> @@ -16,8 +16,18 @@ TEST_GEN_PROGS := $(patsubst %,$(OUTPUT)/%,$(TEST_GEN_PROGS))
>  TEST_GEN_PROGS_EXTENDED := $(patsubst %,$(OUTPUT)/%,$(TEST_GEN_PROGS_EXTENDED))
>  TEST_GEN_FILES := $(patsubst %,$(OUTPUT)/%,$(TEST_GEN_FILES))
> 
> +top_srcdir ?= ../../../..
> +include $(top_srcdir)/scripts/subarch.include
> +ARCH		?= $(SUBARCH)
> +
>  all: $(TEST_GEN_PROGS) $(TEST_GEN_PROGS_EXTENDED) $(TEST_GEN_FILES)
> 
> +.PHONY: khdr
> +khdr:
> +	make ARCH=$(ARCH) -C $(top_srcdir) headers_install
> +
> +$(TEST_GEN_PROGS) $(TEST_GEN_PROGS_EXTENDED) $(TEST_GEN_FILES):| khdr
> +
>  .ONESHELL:
>  define RUN_TEST_PRINT_RESULT
>  	TEST_HDR_MSG="selftests: "`basename $$PWD`:" $$BASENAME_TEST";	\
> diff --git a/tools/testing/selftests/vm/Makefile b/tools/testing/selftests/vm/Makefile
> index fdefa2295ddc..58759454b1d0 100644
> --- a/tools/testing/selftests/vm/Makefile
> +++ b/tools/testing/selftests/vm/Makefile
> @@ -25,10 +25,6 @@ TEST_PROGS := run_vmtests
> 
>  include ../lib.mk
> 
> -$(OUTPUT)/userfaultfd: ../../../../usr/include/linux/kernel.h
>  $(OUTPUT)/userfaultfd: LDLIBS += -lpthread
> 
>  $(OUTPUT)/mlock-random-test: LDLIBS += -lcap
> -
> -../../../../usr/include/linux/kernel.h:
> -	make -C ../../../.. headers_install
> -- 
> 2.17.0
> 

-- 
Sincerely yours,
Mike.

^ permalink raw reply

* Re: iproute2 - modifying routes in place
From: Ryan Whelan @ 2018-05-14 18:40 UTC (permalink / raw)
  To: dsahern; +Cc: netdev
In-Reply-To: <07d96b60-68e5-8c20-8505-3df9178f9f78@gmail.com>

Same behavior:


root@rwhelan-linux ~
# ip -6 route
::1 dev lo proto kernel metric 256 pref medium
fd9b:caee:ff93:ceef:3431:3831:3930:3031 dev internal0 proto kernel metric
256 pref medium
fd9b:caee:ff93:ceef:3431:3831:3930:3032 dev internal0 src
fd9b:caee:ff93:ceef:3431:3831:3930:3031 metric 1024 pref medium
fe80::/64 dev enp0s3 proto kernel metric 256 pref medium
fe80::/64 dev enp0s8 proto kernel metric 256 pref medium
fe80::/64 dev internal0 proto kernel metric 256 pref medium

root@rwhelan-linux ~
# ip -6 route change fd9b:caee:ff93:ceef:3431:3831:3930:3032 dev internal0
src fd9b:caee:ff93:ceef:3431:3831:3930:3031 metric 10
RTNETLINK answers: No such file or directory

root@rwhelan-linux ~
# ip -6 route replace fd9b:caee:ff93:ceef:3431:3831:3930:3032 dev internal0
src fd9b:caee:ff93:ceef:3431:3831:3930:3031 metric 10

root@rwhelan-linux ~
# ip -6 route
::1 dev lo proto kernel metric 256 pref medium
fd9b:caee:ff93:ceef:3431:3831:3930:3031 dev internal0 proto kernel metric
256 pref medium
fd9b:caee:ff93:ceef:3431:3831:3930:3032 dev internal0 src
fd9b:caee:ff93:ceef:3431:3831:3930:3031 metric 10 pref medium
fd9b:caee:ff93:ceef:3431:3831:3930:3032 dev internal0 src
fd9b:caee:ff93:ceef:3431:3831:3930:3031 metric 1024 pref medium
fe80::/64 dev enp0s3 proto kernel metric 256 pref medium
fe80::/64 dev enp0s8 proto kernel metric 256 pref medium
fe80::/64 dev internal0 proto kernel metric 256 pref medium

root@rwhelan-linux ~
# uname -a
Linux rwhelan-linux 4.17.0-rc3-ipv6-route-bugs+ #2 SMP Mon May 14 11:30:38
EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
On Sat, May 12, 2018 at 1:01 PM David Ahern <dsahern@gmail.com> wrote:

> On 5/11/18 4:42 AM, Ryan Whelan wrote:
> > `ip route` has 2 subcommands that don't seem to work as expected and i'm
> > not sure if its a bug, or if i'm misunderstanding the semantics.

> Can you try with ipv6/route-bugs branch in
> https://github.com/dsahern/linux

^ permalink raw reply

* Re: [PATCH net-next v9 1/7] sched: Add Common Applications Kept Enhanced (cake) qdisc
From: Toke Høiland-Jørgensen @ 2018-05-14 18:31 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, cake
In-Reply-To: <20180514.110558.392362320880673775.davem@davemloft.net>

David Miller <davem@davemloft.net> writes:

> From: Toke Høiland-Jørgensen <toke@toke.dk>
> Date: Mon, 14 May 2018 11:08:28 +0200
>
>> David Miller <davem@davemloft.net> writes:
>> 
>>> From: Toke Høiland-Jørgensen <toke@toke.dk>
>>> Date: Tue, 08 May 2018 16:34:19 +0200
>>>
>>>> +struct cake_flow {
>>>> +	/* this stuff is all needed per-flow at dequeue time */
>>>> +	struct sk_buff	  *head;
>>>> +	struct sk_buff	  *tail;
>>>
>>> Please do not invent your own SKB list handling mechanism.
>> 
>> We didn't invent it, we inherited it from fq_codel. I was actually about
>> to fix that, but then I noticed it was still around in fq_codel, and so
>> let it be. I can certainly fix it anyway, but, erm, why is it acceptable
>> in fq_codel but not in cake? struct sk_buff_head is not that new, is it?
>
> I guess one argument has to do with the amount of memory consumed by this
> per-flow or per-queue information, right?  Because the skb queue head has
> a qlen and a spinlock regardless of whether they are used or not.
>
> Furthermore, if you use the __skb_insert() et al. helpers, even though it
> won't use the lock it will adjust the qlen counter.  And that's useless
> work since you have no use for the qlen value.

I think the useless work issue is larger than the memory usage. When
running this (or FQ-CoDel) on small memory-constrained routers, we've
mostly had issues with OOM because of the packet data, which dwarfs the
per-queue overhead.

> Taken together, it seems that what you and fq_codel are doing is not
> such a bad idea after all.  So please leave it alone.

OK. I'll just resend with prettier Christmas trees, then :)

> On-and-off again, I've looked into converting skbs to using list_head
> but it's a non-trivial set of work. All over the tree the different
> layers use the next/prev pointers in different ways. Some use it for a
> doubly linked list. Some use it for a singly linked list. Some encode
> state in the prev pointer. You name it, it's out there.
>
> I'll try to get back to that task because obviously it'll be useful to
> have code like cake and fq_codel use common helpers instead of custom
> stuff.

Yup, I agree. From a code readability point of view, I also prefer the
helpers.

-Toke

^ permalink raw reply

* Re: [net-next 00/11][pull request] 40GbE Intel Wired LAN Driver Updates 2018-05-14
From: David Miller @ 2018-05-14 18:29 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, nhorman, sassmann, jogreene
In-Reply-To: <20180514152747.23154-1-jeffrey.t.kirsher@intel.com>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Mon, 14 May 2018 08:27:36 -0700

> This series contains updates to virtchnl, i40e and i40evf.
> 
> Bruce cleans up whitespace and unnecessary parentheses in virtchnl.
> 
> Jake does a number of stat cleanups in the i40e driver, including
> cleanup of code indentation, whitespace issues, remove duplicate stats,
> fix grammar in code comment and general spring cleaning of the
> statistics code.
> 
> Patryk fixes an issue where we recalculate vectors left and vectors
> wanted but do not take into account the reduced number of queue pairs
> per VSI.
> 
> Harshitha adds tx_busy stat to ethtool stats to track the number of
> times we return NETDEV_TX_BUSY to the stack during transmit.
> 
> Paweł fixes a potential system crash when unloading the VF driver after
> a hardware reset.
> 
> The following are changes since commit 289e1f4e9e4a09c73a1c0152bb93855ea351ccda:
>   net: ipv4: ipconfig: fix unused variable
> and are available in the git repository at:
>   git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue 40GbE

Looks good, pulled, thanks Jeff.

^ permalink raw reply

* Re: Kernel panic on kernel-3.10.0-693.21.1.el7 in ndisc.h
From: Roman Makhov @ 2018-05-14 18:29 UTC (permalink / raw)
  To: Alexander Aring; +Cc: linux-wpan, netdev
In-Reply-To: <20180514154002.ehab2ss25a45l5p6@x220t>

Hi Alexander,

Thank you for the answer.
Unfortunately CentOS goes with these dinosaurs.
So we will try to debug the problem in the current one and try to
reproduce on the latest kernel.

Thanks,
Roman.

2018-05-14 18:40 GMT+03:00 Alexander Aring <aring@mojatatu.com>:
> Hi,
>
> On Sun, May 13, 2018 at 02:35:07PM +0300, Roman Makhov wrote:
>> Hello,
>>
>> We have a problem with Kernel panic after upgrade from CentOS 7.3
>> (kernel-3.10.0-514.el7) to CentOS 7.4 (kernel-3.10.0-693.21.1.el7).
>> It occurs when we have the incoming traffic from other nodes and we
>> are performing the re-configuration of IPv6 interfaces.
>>
>> It is high-availability system without 802.15.4 support.
>>
>> The log of crash:
>> =========================================================
>> #10 [ffff88043fc03cf0] async_page_fault at ffffffff816b7798
>>     [exception RIP: ndisc_send_rs+238]
>>     RIP: ffffffff8166575e  RSP: ffff88043fc03da8  RFLAGS: 00010202
>>     RAX: 0000000000000002  RBX: ffff88042caa9000  RCX: 0000000000000001
>>     RDX: 0000000000000000  RSI: 0000000000000200  RDI: ffffffff816534f7
>>     RBP: ffff88043fc03dd0   R8: 0000000000000000   R9: ffffffff81e9f1c0
>>     R10: 0000000000000002  R11: ffff88043fc03da8  R12: 0000000000000008
>>     R13: 0000000000000006  R14: ffff88043fc03de0  R15: ffffffff81772410
>>     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
>> #11 [ffff88043fc03da0] ndisc_send_rs at ffffffff81665704
>> =========================================================
>>
>> I see that crash points on ndisc.h, it is ndisc_ops_opt_addr_space()
>> in function:
>> =========================================================
>> crash> kmem ffffffff8166575e
>> ffffffff8166575e (T) ndisc_send_rs+238
>> /usr/src/debug/kernel-3.10.0-693.21.1.el7/linux-3.10.0-693.21.1.el7.x86_64/include/net/ndisc.h:
>> 251
>>
>>       PAGE        PHYSICAL      MAPPING       INDEX CNT FLAGS
>> ffffea0000059940   1665000                0        0  1 1fffff00000400 reserved
>> crash>
>> =========================================================
>>
>> I checked the difference between 514 and 693 kernels is in the patch
>> https://patchwork.kernel.org/patch/9179229/ .
>>
>> Any suggesions about what I am doing wrong are welcome.
>>
>
> Me as original author of this patch,
>
> I cannot help you with such a dinosaurs kernel. Please try it with the
> latest one and check if the problem still exists.
>
> - Alex

^ permalink raw reply

* Re: INFO: rcu detected stall in kfree_skbmem
From: Xin Long @ 2018-05-14 18:04 UTC (permalink / raw)
  To: Neil Horman, Marcelo Ricardo Leitner
  Cc: Dmitry Vyukov, syzbot, Vladislav Yasevich, linux-sctp,
	Andrei Vagin, David Miller, Kirill Tkhai, LKML, netdev,
	syzkaller-bugs
In-Reply-To: <20180514133439.GA6743@hmswarspite.think-freely.org>

On Mon, May 14, 2018 at 9:34 PM, Neil Horman <nhorman@tuxdriver.com> wrote:
> On Fri, May 11, 2018 at 12:00:38PM +0200, Dmitry Vyukov wrote:
>> On Mon, Apr 30, 2018 at 8:09 PM, syzbot
>> <syzbot+fc78715ba3b3257caf6a@syzkaller.appspotmail.com> wrote:
>> > Hello,
>> >
>> > syzbot found the following crash on:
>> >
>> > HEAD commit:    5d1365940a68 Merge
>> > git://git.kernel.org/pub/scm/linux/kerne...
>> > git tree:       net-next
>> > console output: https://syzkaller.appspot.com/x/log.txt?id=5667997129637888
>> > kernel config:
>> > https://syzkaller.appspot.com/x/.config?id=-5947642240294114534
>> > dashboard link: https://syzkaller.appspot.com/bug?extid=fc78715ba3b3257caf6a
>> > compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
>> >
>> > Unfortunately, I don't have any reproducer for this crash yet.
>>
>> This looks sctp-related, +sctp maintainers.
>>
> Looking at the entire trace, it appears that we are getting caught in the
> kfree_skb that is getting triggered in enqueue_to_backlog which occurs when our
> rx backlog list grows over netdev_max_backlog packets.  That suggests to me that
It might be a long skb->frag_list that made kfree_skb slow when packing
lots of small chunks to go through lo device?

> whatever test(s) is/are causing this trace are queuing up a large number of
> frames to be sent over the loopback interface, and are never/rarely getting
> received.  Looking up higher in the stack, in the sctp_generate_heartbeat_event
> function, we (in addition to the rcu_read_lock in sctp_v6_xmit) we also hold the
> socket lock during the entirety of the xmit operaion.  Is it possible that we
> are just enqueuing so many frames for xmit that we are blocking progress of
> other threads using the same socket that we cross the RCU self detected stall
> boundary?  While its not a fix per se, it might be a worthwhile test to limit
> the number of frames we flush in a single pass.
>
> Neil
>
>> > IMPORTANT: if you fix the bug, please add the following tag to the commit:
>> > Reported-by: syzbot+fc78715ba3b3257caf6a@syzkaller.appspotmail.com
>> >
>> > INFO: rcu_sched self-detected stall on CPU
>> >         1-...!: (1 GPs behind) idle=a3e/1/4611686018427387908
>> > softirq=71980/71983 fqs=33
>> >          (t=125000 jiffies g=39438 c=39437 q=958)
>> > rcu_sched kthread starved for 124829 jiffies! g39438 c39437 f0x0
>> > RCU_GP_WAIT_FQS(3) ->state=0x0 ->cpu=0
>> > RCU grace-period kthread stack dump:
>> > rcu_sched       R  running task    23768     9      2 0x80000000
>> > Call Trace:
>> >  context_switch kernel/sched/core.c:2848 [inline]
>> >  __schedule+0x801/0x1e30 kernel/sched/core.c:3490
>> >  schedule+0xef/0x430 kernel/sched/core.c:3549
>> >  schedule_timeout+0x138/0x240 kernel/time/timer.c:1801
>> >  rcu_gp_kthread+0x6b5/0x1940 kernel/rcu/tree.c:2231
>> >  kthread+0x345/0x410 kernel/kthread.c:238
>> >  ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:411
>> > NMI backtrace for cpu 1
>> > CPU: 1 PID: 20560 Comm: syz-executor4 Not tainted 4.16.0+ #1
>> > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
>> > Google 01/01/2011
>> > Call Trace:
>> >  <IRQ>
>> >  __dump_stack lib/dump_stack.c:77 [inline]
>> >  dump_stack+0x1b9/0x294 lib/dump_stack.c:113
>> >  nmi_cpu_backtrace.cold.4+0x19/0xce lib/nmi_backtrace.c:103
>> >  nmi_trigger_cpumask_backtrace+0x151/0x192 lib/nmi_backtrace.c:62
>> >  arch_trigger_cpumask_backtrace+0x14/0x20 arch/x86/kernel/apic/hw_nmi.c:38
>> >  trigger_single_cpu_backtrace include/linux/nmi.h:156 [inline]
>> >  rcu_dump_cpu_stacks+0x175/0x1c2 kernel/rcu/tree.c:1376
>> >  print_cpu_stall kernel/rcu/tree.c:1525 [inline]
>> >  check_cpu_stall.isra.61.cold.80+0x36c/0x59a kernel/rcu/tree.c:1593
>> >  __rcu_pending kernel/rcu/tree.c:3356 [inline]
>> >  rcu_pending kernel/rcu/tree.c:3401 [inline]
>> >  rcu_check_callbacks+0x21b/0xad0 kernel/rcu/tree.c:2763
>> >  update_process_times+0x2d/0x70 kernel/time/timer.c:1636
>> >  tick_sched_handle+0x9f/0x180 kernel/time/tick-sched.c:173
>> >  tick_sched_timer+0x45/0x130 kernel/time/tick-sched.c:1283
>> >  __run_hrtimer kernel/time/hrtimer.c:1386 [inline]
>> >  __hrtimer_run_queues+0x3e3/0x10a0 kernel/time/hrtimer.c:1448
>> >  hrtimer_interrupt+0x286/0x650 kernel/time/hrtimer.c:1506
>> >  local_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1025 [inline]
>> >  smp_apic_timer_interrupt+0x15d/0x710 arch/x86/kernel/apic/apic.c:1050
>> >  apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:862
>> > RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:783
>> > [inline]
>> > RIP: 0010:kmem_cache_free+0xb3/0x2d0 mm/slab.c:3757
>> > RSP: 0018:ffff8801db105228 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13
>> > RAX: 0000000000000007 RBX: ffff8800b055c940 RCX: 1ffff1003b2345a5
>> > RDX: 0000000000000000 RSI: ffff8801d91a2d80 RDI: 0000000000000282
>> > RBP: ffff8801db105248 R08: ffff8801d91a2cb8 R09: 0000000000000002
>> > R10: ffff8801d91a2480 R11: 0000000000000000 R12: ffff8801d9848e40
>> > R13: 0000000000000282 R14: ffffffff85b7f27c R15: 0000000000000000
>> >  kfree_skbmem+0x13c/0x210 net/core/skbuff.c:582
>> >  __kfree_skb net/core/skbuff.c:642 [inline]
>> >  kfree_skb+0x19d/0x560 net/core/skbuff.c:659
>> >  enqueue_to_backlog+0x2fc/0xc90 net/core/dev.c:3968
>> >  netif_rx_internal+0x14d/0xae0 net/core/dev.c:4181
>> >  netif_rx+0xba/0x400 net/core/dev.c:4206
>> >  loopback_xmit+0x283/0x741 drivers/net/loopback.c:91
>> >  __netdev_start_xmit include/linux/netdevice.h:4087 [inline]
>> >  netdev_start_xmit include/linux/netdevice.h:4096 [inline]
>> >  xmit_one net/core/dev.c:3053 [inline]
>> >  dev_hard_start_xmit+0x264/0xc10 net/core/dev.c:3069
>> >  __dev_queue_xmit+0x2724/0x34c0 net/core/dev.c:3584
>> >  dev_queue_xmit+0x17/0x20 net/core/dev.c:3617
>> >  neigh_hh_output include/net/neighbour.h:472 [inline]
>> >  neigh_output include/net/neighbour.h:480 [inline]
>> >  ip6_finish_output2+0x134e/0x2810 net/ipv6/ip6_output.c:120
>> >  ip6_finish_output+0x5fe/0xbc0 net/ipv6/ip6_output.c:154
>> >  NF_HOOK_COND include/linux/netfilter.h:277 [inline]
>> >  ip6_output+0x227/0x9b0 net/ipv6/ip6_output.c:171
>> >  dst_output include/net/dst.h:444 [inline]
>> >  NF_HOOK include/linux/netfilter.h:288 [inline]
>> >  ip6_xmit+0xf51/0x23f0 net/ipv6/ip6_output.c:277
>> >  sctp_v6_xmit+0x4a5/0x6b0 net/sctp/ipv6.c:225
>> >  sctp_packet_transmit+0x26f6/0x3ba0 net/sctp/output.c:650
>> >  sctp_outq_flush+0x1373/0x4370 net/sctp/outqueue.c:1197
>> >  sctp_outq_uncork+0x6a/0x80 net/sctp/outqueue.c:776
>> >  sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1820 [inline]
>> >  sctp_side_effects net/sctp/sm_sideeffect.c:1220 [inline]
>> >  sctp_do_sm+0x596/0x7160 net/sctp/sm_sideeffect.c:1191
>> >  sctp_generate_heartbeat_event+0x218/0x450 net/sctp/sm_sideeffect.c:406
>> >  call_timer_fn+0x230/0x940 kernel/time/timer.c:1326
>> >  expire_timers kernel/time/timer.c:1363 [inline]
>> >  __run_timers+0x79e/0xc50 kernel/time/timer.c:1666
>> >  run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692
>> >  __do_softirq+0x2e0/0xaf5 kernel/softirq.c:285
>> >  invoke_softirq kernel/softirq.c:365 [inline]
>> >  irq_exit+0x1d1/0x200 kernel/softirq.c:405
>> >  exiting_irq arch/x86/include/asm/apic.h:525 [inline]
>> >  smp_apic_timer_interrupt+0x17e/0x710 arch/x86/kernel/apic/apic.c:1052
>> >  apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:862
>> >  </IRQ>
>> > RIP: 0010:arch_local_irq_restore arch/x86/include/asm/paravirt.h:783
>> > [inline]
>> > RIP: 0010:lock_release+0x4d4/0xa10 kernel/locking/lockdep.c:3942
>> > RSP: 0018:ffff8801971ce7b0 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13
>> > RAX: dffffc0000000000 RBX: 1ffff10032e39cfb RCX: 1ffff1003b234595
>> > RDX: 1ffffffff11630ed RSI: 0000000000000002 RDI: 0000000000000282
>> > RBP: ffff8801971ce8e0 R08: 1ffff10032e39cff R09: ffffed003b6246c2
>> > R10: 0000000000000003 R11: 0000000000000001 R12: ffff8801d91a2480
>> > R13: ffffffff88b8df60 R14: ffff8801d91a2480 R15: ffff8801971ce7f8
>> >  rcu_lock_release include/linux/rcupdate.h:251 [inline]
>> >  rcu_read_unlock include/linux/rcupdate.h:688 [inline]
>> >  __unlock_page_memcg+0x72/0x100 mm/memcontrol.c:1654
>> >  unlock_page_memcg+0x2c/0x40 mm/memcontrol.c:1663
>> >  page_remove_file_rmap mm/rmap.c:1248 [inline]
>> >  page_remove_rmap+0x6f2/0x1250 mm/rmap.c:1299
>> >  zap_pte_range mm/memory.c:1337 [inline]
>> >  zap_pmd_range mm/memory.c:1441 [inline]
>> >  zap_pud_range mm/memory.c:1470 [inline]
>> >  zap_p4d_range mm/memory.c:1491 [inline]
>> >  unmap_page_range+0xeb4/0x2200 mm/memory.c:1512
>> >  unmap_single_vma+0x1a0/0x310 mm/memory.c:1557
>> >  unmap_vmas+0x120/0x1f0 mm/memory.c:1587
>> >  exit_mmap+0x265/0x570 mm/mmap.c:3038
>> >  __mmput kernel/fork.c:962 [inline]
>> >  mmput+0x251/0x610 kernel/fork.c:983
>> >  exit_mm kernel/exit.c:544 [inline]
>> >  do_exit+0xe98/0x2730 kernel/exit.c:852
>> >  do_group_exit+0x16f/0x430 kernel/exit.c:968
>> >  get_signal+0x886/0x1960 kernel/signal.c:2469
>> >  do_signal+0x98/0x2040 arch/x86/kernel/signal.c:810
>> >  exit_to_usermode_loop+0x28a/0x310 arch/x86/entry/common.c:162
>> >  prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
>> >  syscall_return_slowpath arch/x86/entry/common.c:265 [inline]
>> >  do_syscall_64+0x792/0x9d0 arch/x86/entry/common.c:292
>> >  entry_SYSCALL_64_after_hwframe+0x42/0xb7
>> > RIP: 0033:0x455319
>> > RSP: 002b:00007fa346e81ce8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
>> > RAX: fffffffffffffe00 RBX: 000000000072bf80 RCX: 0000000000455319
>> > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000072bf80
>> > RBP: 000000000072bf80 R08: 0000000000000000 R09: 000000000072bf58
>> > R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
>> > R13: 0000000000a3e81f R14: 00007fa346e829c0 R15: 0000000000000001
>> >
>> >
>> > ---
>> > This bug is generated by a bot. It may contain errors.
>> > See https://goo.gl/tpsmEJ for more information about syzbot.
>> > syzbot engineers can be reached at syzkaller@googlegroups.com.
>> >
>> > syzbot will keep track of this bug report.
>> > If you forgot to add the Reported-by tag, once the fix for this bug is
>> > merged
>> > into any tree, please reply to this email with:
>> > #syz fix: exact-commit-title
>> > To mark this as a duplicate of another syzbot report, please reply with:
>> > #syz dup: exact-subject-of-another-report
>> > If it's a one-off invalid bug report, please reply with:
>> > #syz invalid
>> > Note: if the crash happens again, it will cause creation of a new bug
>> > report.
>> > Note: all commands must start from beginning of the line in the email body.
>> >
>> > --
>> > You received this message because you are subscribed to the Google Groups
>> > "syzkaller-bugs" group.
>> > To unsubscribe from this group and stop receiving emails from it, send an
>> > email to syzkaller-bugs+unsubscribe@googlegroups.com.
>> > To view this discussion on the web visit
>> > https://groups.google.com/d/msgid/syzkaller-bugs/000000000000a9b0e3056b14bfb2%40google.com.
>> > For more options, visit https://groups.google.com/d/optout.
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH 00/14] Modify action API for implementing lockless actions
From: Jamal Hadi Salim @ 2018-05-14 18:03 UTC (permalink / raw)
  To: Vlad Buslov, netdev
  Cc: davem, xiyou.wangcong, jiri, pablo, kadlec, fw, ast, daniel,
	edumazet, keescook, linux-kernel, netfilter-devel, coreteam,
	kliteyn
In-Reply-To: <1526308035-12484-1-git-send-email-vladbu@mellanox.com>

On 14/05/18 10:27 AM, Vlad Buslov wrote:
> Currently, all netlink protocol handlers for updating rules, actions and
> qdiscs are protected with single global rtnl lock which removes any
> possibility for parallelism. This patch set is a first step to remove
> rtnl lock dependency from TC rules update path. It updates act API to
> use atomic operations, rcu and spinlocks for fine-grained locking. It
> also extend API with functions that are needed to update existing
> actions for parallel execution.
> 
> Outline of changes:
> - Change tc action to use atomic reference and bind counters, rcu
>    mechanism for cookie update.
> - Extend action ops API with 'delete' function and 'unlocked' flag.
> - Change action API to work with actions in lockless manner based on
>    primitives implemented in previous patches.
> - Extend action API with new functions necessary to implement unlocked
>    actions.

Please run all the tdc tests with these changes. This area has almost
good test coverage at this point. If you need help just ping me.

cheers,
jamal

^ permalink raw reply

* Re: possible deadlock in sk_diag_fill
From: Andrei Vagin @ 2018-05-14 18:00 UTC (permalink / raw)
  To: Dmitry Vyukov; +Cc: syzbot, avagin, David Miller, LKML, netdev, syzkaller-bugs
In-Reply-To: <CACT4Y+bt-HwBNA5REeZwoKAymCc-Da5rEU=yeeHodg7zByStQg@mail.gmail.com>

On Sat, May 12, 2018 at 09:46:25AM +0200, Dmitry Vyukov wrote:
> On Fri, May 11, 2018 at 8:33 PM, Andrei Vagin <avagin@virtuozzo.com> wrote:
> > On Sat, May 05, 2018 at 10:59:02AM -0700, syzbot wrote:
> >> Hello,
> >>
> >> syzbot found the following crash on:
> >>
> >> HEAD commit:    c1c07416cdd4 Merge tag 'kbuild-fixes-v4.17' of git://git.k..
> >> git tree:       upstream
> >> console output: https://syzkaller.appspot.com/x/log.txt?x=12164c97800000
> >> kernel config:  https://syzkaller.appspot.com/x/.config?x=5a1dc06635c10d27
> >> dashboard link: https://syzkaller.appspot.com/bug?extid=c1872be62e587eae9669
> >> compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
> >> userspace arch: i386
> >>
> >> Unfortunately, I don't have any reproducer for this crash yet.
> >>
> >> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> >> Reported-by: syzbot+c1872be62e587eae9669@syzkaller.appspotmail.com
> >>
> >>
> >> ======================================================
> >> WARNING: possible circular locking dependency detected
> >> 4.17.0-rc3+ #59 Not tainted
> >> ------------------------------------------------------
> >> syz-executor1/25282 is trying to acquire lock:
> >> 000000004fddf743 (&(&u->lock)->rlock/1){+.+.}, at: sk_diag_dump_icons
> >> net/unix/diag.c:82 [inline]
> >> 000000004fddf743 (&(&u->lock)->rlock/1){+.+.}, at:
> >> sk_diag_fill.isra.5+0xa43/0x10d0 net/unix/diag.c:144
> >>
> >> but task is already holding lock:
> >> 00000000b6895645 (rlock-AF_UNIX){+.+.}, at: spin_lock
> >> include/linux/spinlock.h:310 [inline]
> >> 00000000b6895645 (rlock-AF_UNIX){+.+.}, at: sk_diag_dump_icons
> >> net/unix/diag.c:64 [inline]
> >> 00000000b6895645 (rlock-AF_UNIX){+.+.}, at: sk_diag_fill.isra.5+0x94e/0x10d0
> >> net/unix/diag.c:144
> >>
> >> which lock already depends on the new lock.
> >
> > In the code, we have a comment which explains why it is safe to take this lock
> >
> > /*
> >  * The state lock is outer for the same sk's
> >  * queue lock. With the other's queue locked it's
> >  * OK to lock the state.
> >  */
> > unix_state_lock_nested(req);
> >
> > It is a question how to explain this to lockdep.
> 
> Do I understand it correctly that (&u->lock)->rlock associated with
> AF_UNIX is locked under rlock-AF_UNIX, and then rlock-AF_UNIX is
> locked under (&u->lock)->rlock associated with AF_NETLINK? If so, I
> think we need to split (&u->lock)->rlock by family too, so that we
> have u->lock-AF_UNIX and u->lock-AF_NETLINK.

I think here is another problem. lockdep woried about
sk->sk_receive_queue vs unix_sk(s)->lock.

sk_diag_dump_icons() takes sk->sk_receive_queue and then
unix_sk(s)->lock.

unix_dgram_sendmsg takes unix_sk(sk)->lock and then sk->sk_receive_queue.

sk_diag_dump_icons() takes locks for two different sockets, but
unix_dgram_sendmsg() takes locks for one socket.

sk_diag_dump_icons
        if (sk->sk_state == TCP_LISTEN) {
                spin_lock(&sk->sk_receive_queue.lock);
                skb_queue_walk(&sk->sk_receive_queue, skb) {
			unix_state_lock_nested(req);
				spin_lock_nested(&unix_sk(s)->lock,


unix_dgram_sendmsg
	unix_state_lock(other)
		spin_lock(&unix_sk(s)->lock)
        skb_queue_tail(&other->sk_receive_queue, skb);
	        spin_lock_irqsave(&list->lock, flags);

> 
> 
> 
> >> the existing dependency chain (in reverse order) is:
> >>
> >> -> #1 (rlock-AF_UNIX){+.+.}:
> >>        __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
> >>        _raw_spin_lock_irqsave+0x96/0xc0 kernel/locking/spinlock.c:152
> >>        skb_queue_tail+0x26/0x150 net/core/skbuff.c:2900
> >>        unix_dgram_sendmsg+0xf77/0x1730 net/unix/af_unix.c:1797
> >>        sock_sendmsg_nosec net/socket.c:629 [inline]
> >>        sock_sendmsg+0xd5/0x120 net/socket.c:639
> >>        ___sys_sendmsg+0x525/0x940 net/socket.c:2117
> >>        __sys_sendmmsg+0x3bb/0x6f0 net/socket.c:2205
> >>        __compat_sys_sendmmsg net/compat.c:770 [inline]
> >>        __do_compat_sys_sendmmsg net/compat.c:777 [inline]
> >>        __se_compat_sys_sendmmsg net/compat.c:774 [inline]
> >>        __ia32_compat_sys_sendmmsg+0x9f/0x100 net/compat.c:774
> >>        do_syscall_32_irqs_on arch/x86/entry/common.c:323 [inline]
> >>        do_fast_syscall_32+0x345/0xf9b arch/x86/entry/common.c:394
> >>        entry_SYSENTER_compat+0x70/0x7f arch/x86/entry/entry_64_compat.S:139
> >>
> >> -> #0 (&(&u->lock)->rlock/1){+.+.}:
> >>        lock_acquire+0x1dc/0x520 kernel/locking/lockdep.c:3920
> >>        _raw_spin_lock_nested+0x28/0x40 kernel/locking/spinlock.c:354
> >>        sk_diag_dump_icons net/unix/diag.c:82 [inline]
> >>        sk_diag_fill.isra.5+0xa43/0x10d0 net/unix/diag.c:144
> >>        sk_diag_dump net/unix/diag.c:178 [inline]
> >>        unix_diag_dump+0x35f/0x550 net/unix/diag.c:206
> >>        netlink_dump+0x507/0xd20 net/netlink/af_netlink.c:2226
> >>        __netlink_dump_start+0x51a/0x780 net/netlink/af_netlink.c:2323
> >>        netlink_dump_start include/linux/netlink.h:214 [inline]
> >>        unix_diag_handler_dump+0x3f4/0x7b0 net/unix/diag.c:307
> >>        __sock_diag_cmd net/core/sock_diag.c:230 [inline]
> >>        sock_diag_rcv_msg+0x2e0/0x3d0 net/core/sock_diag.c:261
> >>        netlink_rcv_skb+0x172/0x440 net/netlink/af_netlink.c:2448
> >>        sock_diag_rcv+0x2a/0x40 net/core/sock_diag.c:272
> >>        netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline]
> >>        netlink_unicast+0x58b/0x740 net/netlink/af_netlink.c:1336
> >>        netlink_sendmsg+0x9f0/0xfa0 net/netlink/af_netlink.c:1901
> >>        sock_sendmsg_nosec net/socket.c:629 [inline]
> >>        sock_sendmsg+0xd5/0x120 net/socket.c:639
> >>        sock_write_iter+0x35a/0x5a0 net/socket.c:908
> >>        call_write_iter include/linux/fs.h:1784 [inline]
> >>        new_sync_write fs/read_write.c:474 [inline]
> >>        __vfs_write+0x64d/0x960 fs/read_write.c:487
> >>        vfs_write+0x1f8/0x560 fs/read_write.c:549
> >>        ksys_write+0xf9/0x250 fs/read_write.c:598
> >>        __do_sys_write fs/read_write.c:610 [inline]
> >>        __se_sys_write fs/read_write.c:607 [inline]
> >>        __ia32_sys_write+0x71/0xb0 fs/read_write.c:607
> >>        do_syscall_32_irqs_on arch/x86/entry/common.c:323 [inline]
> >>        do_fast_syscall_32+0x345/0xf9b arch/x86/entry/common.c:394
> >>        entry_SYSENTER_compat+0x70/0x7f arch/x86/entry/entry_64_compat.S:139
> >>
> >> other info that might help us debug this:
> >>
> >>  Possible unsafe locking scenario:
> >>
> >>        CPU0                    CPU1
> >>        ----                    ----
> >>   lock(rlock-AF_UNIX);
> >>                                lock(&(&u->lock)->rlock/1);
> >>                                lock(rlock-AF_UNIX);
> >>   lock(&(&u->lock)->rlock/1);
> >>
> >>  *** DEADLOCK ***
> >>
> >> 5 locks held by syz-executor1/25282:
> >>  #0: 000000003919e1bd (sock_diag_mutex){+.+.}, at: sock_diag_rcv+0x1b/0x40
> >> net/core/sock_diag.c:271
> >>  #1: 000000004f328d3e (sock_diag_table_mutex){+.+.}, at: __sock_diag_cmd
> >> net/core/sock_diag.c:225 [inline]
> >>  #1: 000000004f328d3e (sock_diag_table_mutex){+.+.}, at:
> >> sock_diag_rcv_msg+0x169/0x3d0 net/core/sock_diag.c:261
> >>  #2: 000000004cc04dbb (nlk_cb_mutex-SOCK_DIAG){+.+.}, at:
> >> netlink_dump+0x98/0xd20 net/netlink/af_netlink.c:2182
> >>  #3: 00000000accdef41 (unix_table_lock){+.+.}, at: spin_lock
> >> include/linux/spinlock.h:310 [inline]
> >>  #3: 00000000accdef41 (unix_table_lock){+.+.}, at:
> >> unix_diag_dump+0x10a/0x550 net/unix/diag.c:192
> >>  #4: 00000000b6895645 (rlock-AF_UNIX){+.+.}, at: spin_lock
> >> include/linux/spinlock.h:310 [inline]
> >>  #4: 00000000b6895645 (rlock-AF_UNIX){+.+.}, at: sk_diag_dump_icons
> >> net/unix/diag.c:64 [inline]
> >>  #4: 00000000b6895645 (rlock-AF_UNIX){+.+.}, at:
> >> sk_diag_fill.isra.5+0x94e/0x10d0 net/unix/diag.c:144
> >>
> >> stack backtrace:
> >> CPU: 1 PID: 25282 Comm: syz-executor1 Not tainted 4.17.0-rc3+ #59
> >> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> >> Google 01/01/2011
> >> Call Trace:
> >>  __dump_stack lib/dump_stack.c:77 [inline]
> >>  dump_stack+0x1b9/0x294 lib/dump_stack.c:113
> >>  print_circular_bug.isra.36.cold.54+0x1bd/0x27d
> >> kernel/locking/lockdep.c:1223
> >>  check_prev_add kernel/locking/lockdep.c:1863 [inline]
> >>  check_prevs_add kernel/locking/lockdep.c:1976 [inline]
> >>  validate_chain kernel/locking/lockdep.c:2417 [inline]
> >>  __lock_acquire+0x343e/0x5140 kernel/locking/lockdep.c:3431
> >>  lock_acquire+0x1dc/0x520 kernel/locking/lockdep.c:3920
> >>  _raw_spin_lock_nested+0x28/0x40 kernel/locking/spinlock.c:354
> >>  sk_diag_dump_icons net/unix/diag.c:82 [inline]
> >>  sk_diag_fill.isra.5+0xa43/0x10d0 net/unix/diag.c:144
> >>  sk_diag_dump net/unix/diag.c:178 [inline]
> >>  unix_diag_dump+0x35f/0x550 net/unix/diag.c:206
> >>  netlink_dump+0x507/0xd20 net/netlink/af_netlink.c:2226
> >>  __netlink_dump_start+0x51a/0x780 net/netlink/af_netlink.c:2323
> >>  netlink_dump_start include/linux/netlink.h:214 [inline]
> >>  unix_diag_handler_dump+0x3f4/0x7b0 net/unix/diag.c:307
> >>  __sock_diag_cmd net/core/sock_diag.c:230 [inline]
> >>  sock_diag_rcv_msg+0x2e0/0x3d0 net/core/sock_diag.c:261
> >>  netlink_rcv_skb+0x172/0x440 net/netlink/af_netlink.c:2448
> >>  sock_diag_rcv+0x2a/0x40 net/core/sock_diag.c:272
> >>  netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline]
> >>  netlink_unicast+0x58b/0x740 net/netlink/af_netlink.c:1336
> >>  netlink_sendmsg+0x9f0/0xfa0 net/netlink/af_netlink.c:1901
> >>  sock_sendmsg_nosec net/socket.c:629 [inline]
> >>  sock_sendmsg+0xd5/0x120 net/socket.c:639
> >>  sock_write_iter+0x35a/0x5a0 net/socket.c:908
> >>  call_write_iter include/linux/fs.h:1784 [inline]
> >>  new_sync_write fs/read_write.c:474 [inline]
> >>  __vfs_write+0x64d/0x960 fs/read_write.c:487
> >>  vfs_write+0x1f8/0x560 fs/read_write.c:549
> >>  ksys_write+0xf9/0x250 fs/read_write.c:598
> >>  __do_sys_write fs/read_write.c:610 [inline]
> >>  __se_sys_write fs/read_write.c:607 [inline]
> >>  __ia32_sys_write+0x71/0xb0 fs/read_write.c:607
> >>  do_syscall_32_irqs_on arch/x86/entry/common.c:323 [inline]
> >>  do_fast_syscall_32+0x345/0xf9b arch/x86/entry/common.c:394
> >>  entry_SYSENTER_compat+0x70/0x7f arch/x86/entry/entry_64_compat.S:139
> >> RIP: 0023:0xf7f8ccb9
> >> RSP: 002b:00000000f5f880ac EFLAGS: 00000282 ORIG_RAX: 0000000000000004
> >> RAX: ffffffffffffffda RBX: 0000000000000017 RCX: 000000002058bfe4
> >> RDX: 0000000000000029 RSI: 0000000000000000 RDI: 0000000000000000
> >> RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> >> R10: 0000000000000000 R11: 0000000000000296 R12: 0000000000000000
> >> R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> >>
> >>
> >> ---
> >> This bug is generated by a bot. It may contain errors.
> >> See https://goo.gl/tpsmEJ for more information about syzbot.
> >> syzbot engineers can be reached at syzkaller@googlegroups.com.
> >>
> >> syzbot will keep track of this bug report.
> >> If you forgot to add the Reported-by tag, once the fix for this bug is
> >> merged
> >> into any tree, please reply to this email with:
> >> #syz fix: exact-commit-title
> >> To mark this as a duplicate of another syzbot report, please reply with:
> >> #syz dup: exact-subject-of-another-report
> >> If it's a one-off invalid bug report, please reply with:
> >> #syz invalid
> >> Note: if the crash happens again, it will cause creation of a new bug
> >> report.
> >> Note: all commands must start from beginning of the line in the email body.
> >
> > --
> > You received this message because you are subscribed to the Google Groups "syzkaller-bugs" group.
> > To unsubscribe from this group and stop receiving emails from it, send an email to syzkaller-bugs+unsubscribe@googlegroups.com.
> > To view this discussion on the web visit https://groups.google.com/d/msgid/syzkaller-bugs/20180511183358.GA1492%40outlook.office365.com.
> > For more options, visit https://groups.google.com/d/optout.

^ permalink raw reply

* Re: [PATCH net-next v8 0/3] kernel: add support to collect hardware logs in crash recovery kernel
From: David Miller @ 2018-05-14 17:49 UTC (permalink / raw)
  To: ebiederm
  Cc: rahul.lakkireddy, netdev, kexec, linux-fsdevel, linux-kernel,
	viro, stephen, akpm, torvalds, ganeshgr, nirranjan, indranil
In-Reply-To: <87a7t2mlfn.fsf@xmission.com>

From: ebiederm@xmission.com (Eric W. Biederman)
Date: Mon, 14 May 2018 08:11:24 -0500

> David Miller <davem@davemloft.net> writes:
> 
>> I'm deferring this patch series.
>>
>> If we can't get a reasonable review from an interested party in 10+
>> days, that is not reasonable.
>>
>> Resubmit this once someone reviews it properly.
> 
> David I am out on vacation this week and last (the reason for the delay).
> 
> The last version of this that I looked at I gave my ack.  All of my ABI
> concerns had been addressed. The only outstanding change I believe was
> the Eric Dumazet's asking about something being reviewed.
> 
> I just glanced over it again and I don't see any new issues introduced
> by the last round of changes.
> 
> From 10,000 feet flyover design perspectie and from an ABI perspective
> this patchset seems fine.
> 
> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>

Ok, thanks for reviewing Eric.

Series applied.

^ permalink raw reply

* Re: [PATCH bpf-next 3/6] bpf, x64: clean up retpoline emission slightly
From: Y Song @ 2018-05-14 17:42 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: Alexei Starovoitov, netdev
In-Reply-To: <20180514002612.12083-4-daniel@iogearbox.net>

On Sun, May 13, 2018 at 5:26 PM, Daniel Borkmann <daniel@iogearbox.net> wrote:
> Make the RETPOLINE_{RA,ED}X_BPF_JIT() a bit more readable by
> cleaning up the macro, aligning comments and spacing.
>
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> ---
>  arch/x86/include/asm/nospec-branch.h | 29 ++++++++++++++---------------
>  1 file changed, 14 insertions(+), 15 deletions(-)
>
> diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
> index 2cd344d..2f700a1d 100644
> --- a/arch/x86/include/asm/nospec-branch.h
> +++ b/arch/x86/include/asm/nospec-branch.h
> @@ -301,9 +301,9 @@ do {                                                                        \
>   *    jmp *%edx for x86_32
>   */
>  #ifdef CONFIG_RETPOLINE
> -#ifdef CONFIG_X86_64
> -# define RETPOLINE_RAX_BPF_JIT_SIZE    17
> -# define RETPOLINE_RAX_BPF_JIT()                               \
> +# ifdef CONFIG_X86_64
> +#  define RETPOLINE_RAX_BPF_JIT_SIZE   17
> +#  define RETPOLINE_RAX_BPF_JIT()                              \
>  do {                                                           \
>         EMIT1_off32(0xE8, 7);    /* callq do_rop */             \
>         /* spec_trap: */                                        \
> @@ -314,8 +314,8 @@ do {                                                                \
>         EMIT4(0x48, 0x89, 0x04, 0x24); /* mov %rax,(%rsp) */    \
>         EMIT1(0xC3);             /* retq */                     \
>  } while (0)
> -#else
> -# define RETPOLINE_EDX_BPF_JIT()                               \
> +# else /* !CONFIG_X86_64 */
> +#  define RETPOLINE_EDX_BPF_JIT()                              \
>  do {                                                           \
>         EMIT1_off32(0xE8, 7);    /* call do_rop */              \
>         /* spec_trap: */                                        \
> @@ -326,17 +326,16 @@ do {                                                              \
>         EMIT3(0x89, 0x14, 0x24); /* mov %edx,(%esp) */          \
>         EMIT1(0xC3);             /* ret */                      \
>  } while (0)
> -#endif
> +# endif
>  #else /* !CONFIG_RETPOLINE */
> -
> -#ifdef CONFIG_X86_64
> -# define RETPOLINE_RAX_BPF_JIT_SIZE    2
> -# define RETPOLINE_RAX_BPF_JIT()                               \
> -       EMIT2(0xFF, 0xE0);       /* jmp *%rax */
> -#else
> -# define RETPOLINE_EDX_BPF_JIT()                               \
> -       EMIT2(0xFF, 0xE2) /* jmp *%edx */
> -#endif
> +# ifdef CONFIG_X86_64
> +#  define RETPOLINE_RAX_BPF_JIT_SIZE   2
> +#  define RETPOLINE_RAX_BPF_JIT()                              \
> +       EMIT2(0xFF, 0xE0);       /* jmp *%rax */
> +# else /* !CONFIG_X86_64 */
> +#  define RETPOLINE_EDX_BPF_JIT()                              \
> +       EMIT2(0xFF, 0xE2)        /* jmp *%edx */
> +# endif
>  #endif
>
>  #endif /* _ASM_X86_NOSPEC_BRANCH_H_ */
> --
> 2.9.5
>

Acked-by: Yonghong Song <yhs@fb.com>

^ permalink raw reply

* RE: [PATCH net-next 4/4] bonding: allow carrier and link status to determine link state
From: Banerjee, Debabrata @ 2018-05-14 17:39 UTC (permalink / raw)
  To: 'Jay Vosburgh'
  Cc: David S . Miller, netdev@vger.kernel.org, Veaceslav Falico,
	Andy Gospodarek
In-Reply-To: <6612.1526076265@famine>

> From: Jay Vosburgh [mailto:jay.vosburgh@canonical.com]
> Debabrata Banerjee <dbanerje@akamai.com> wrote:
> 
> >In a mixed environment it may be difficult to tell if your hardware
> >support carrier, if it does not it can always report true. With a new
> >use_carrier option of 2, we can check both carrier and link status
> >sequentially, instead of one or the other
> 
> 	What do you mean by "mixed environment," and under what
> circumstances are you seeing an actual benefit from doing the MII / ethtool
> test in addition to the standard netif_carrier_ok test?
> 
> 	The use_carrier option was meant for backwards compatibility with
> old-in-2005 device drivers, so this seem counterintuitive to me.  I don't recall
> seeing any devices lacking netif_carrier support for some time.  At this point,
> I would tend to argue that a new device driver that does not implement
> netif_carrier support should be fixed, and not have another hack added to
> bonding to work around it.

Mixed environment = 100k machines with dozen NIC flavors. I have logs of a bnx2x machine that was reporting carrier OK with speed and duplex as unknown. Led to some interesting behavior of flooding on the switch due to the local destination mac not being known by the switch. Of course these situations get rectified manually, and if they don't happen again it's hard for me to confirm details of how it broke. However, while we continue to have both checks, it would make sense to be able to use both. If it's reliable enough to use carrier only, then it can go away. We could carry this patch internally for the time being, but it seemed useful for everyone. See below:

Bonding Mode: adaptive load balancing
Primary Slave: None
Currently Active Slave: ith1
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: ith0
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 1
Permanent HW addr: 00:e0:81:e5:0e:16
Slave queue ID: 0

Slave Interface: ith1
MII Status: up
Speed: Unknown
Duplex: Unknown
Link Failure Count: 0
Permanent HW addr: 00:e0:81:e5:0e:18
Slave queue ID: 0

^ permalink raw reply

* [PATCH net-next v3 1/3] sctp: add sctp_flush_ctx, a context struct on outq_flush routines
From: Marcelo Ricardo Leitner @ 2018-05-14 17:35 UTC (permalink / raw)
  To: netdev; +Cc: linux-sctp, Neil Horman, Xin Long, Vlad Yasevich
In-Reply-To: <cover.1526319083.git.marcelo.leitner@gmail.com>

With this struct we avoid passing lots of variables around and taking care
of updating the current transport/packet.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
---
 net/sctp/outqueue.c | 182 +++++++++++++++++++++++++---------------------------
 1 file changed, 88 insertions(+), 94 deletions(-)

diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
index e9c22b3db11cd702afc52f8233fb7f39d917698a..db94a2513dd874149aa77c4936f68537e97f8855 100644
--- a/net/sctp/outqueue.c
+++ b/net/sctp/outqueue.c
@@ -791,13 +791,22 @@ static int sctp_packet_singleton(struct sctp_transport *transport,
 	return sctp_packet_transmit(&singleton, gfp);
 }
 
-static bool sctp_outq_select_transport(struct sctp_chunk *chunk,
-				       struct sctp_association *asoc,
-				       struct sctp_transport **transport,
-				       struct list_head *transport_list)
+/* Struct to hold the context during sctp outq flush */
+struct sctp_flush_ctx {
+	struct sctp_outq *q;
+	/* Current transport being used. It's NOT the same as curr active one */
+	struct sctp_transport *transport;
+	/* These transports have chunks to send. */
+	struct list_head transport_list;
+	gfp_t gfp;
+};
+
+/* transport: current transport */
+static bool sctp_outq_select_transport(struct sctp_flush_ctx *ctx,
+				       struct sctp_chunk *chunk)
 {
 	struct sctp_transport *new_transport = chunk->transport;
-	struct sctp_transport *curr = *transport;
+	struct sctp_association *asoc = ctx->q->asoc;
 	bool changed = false;
 
 	if (!new_transport) {
@@ -812,9 +821,9 @@ static bool sctp_outq_select_transport(struct sctp_chunk *chunk,
 			 * after processing ASCONFs, we may have new
 			 * transports created.
 			 */
-			if (curr && sctp_cmp_addr_exact(&chunk->dest,
-							&curr->ipaddr))
-				new_transport = curr;
+			if (ctx->transport && sctp_cmp_addr_exact(&chunk->dest,
+							&ctx->transport->ipaddr))
+				new_transport = ctx->transport;
 			else
 				new_transport = sctp_assoc_lookup_paddr(asoc,
 								  &chunk->dest);
@@ -857,37 +866,33 @@ static bool sctp_outq_select_transport(struct sctp_chunk *chunk,
 	}
 
 	/* Are we switching transports? Take care of transport locks. */
-	if (new_transport != curr) {
+	if (new_transport != ctx->transport) {
 		changed = true;
-		curr = new_transport;
-		*transport = curr;
-		if (list_empty(&curr->send_ready))
-			list_add_tail(&curr->send_ready, transport_list);
+		ctx->transport = new_transport;
+		if (list_empty(&ctx->transport->send_ready))
+			list_add_tail(&ctx->transport->send_ready,
+				      &ctx->transport_list);
 
-		sctp_packet_config(&curr->packet, asoc->peer.i.init_tag,
+		sctp_packet_config(&ctx->transport->packet, asoc->peer.i.init_tag,
 				   asoc->peer.ecn_capable);
 		/* We've switched transports, so apply the
 		 * Burst limit to the new transport.
 		 */
-		sctp_transport_burst_limited(curr);
+		sctp_transport_burst_limited(ctx->transport);
 	}
 
 	return changed;
 }
 
-static void sctp_outq_flush_ctrl(struct sctp_outq *q,
-				 struct sctp_transport **_transport,
-				 struct list_head *transport_list,
-				 gfp_t gfp)
+static void sctp_outq_flush_ctrl(struct sctp_flush_ctx *ctx)
 {
-	struct sctp_transport *transport = *_transport;
-	struct sctp_association *asoc = q->asoc;
+	struct sctp_association *asoc = ctx->q->asoc;
 	struct sctp_packet *packet = NULL;
 	struct sctp_chunk *chunk, *tmp;
 	enum sctp_xmit status;
 	int one_packet, error;
 
-	list_for_each_entry_safe(chunk, tmp, &q->control_chunk_list, list) {
+	list_for_each_entry_safe(chunk, tmp, &ctx->q->control_chunk_list, list) {
 		one_packet = 0;
 
 		/* RFC 5061, 5.3
@@ -905,11 +910,8 @@ static void sctp_outq_flush_ctrl(struct sctp_outq *q,
 		/* Pick the right transport to use. Should always be true for
 		 * the first chunk as we don't have a transport by then.
 		 */
-		if (sctp_outq_select_transport(chunk, asoc, _transport,
-					       transport_list)) {
-			transport = *_transport;
-			packet = &transport->packet;
-		}
+		if (sctp_outq_select_transport(ctx, chunk))
+			packet = &ctx->transport->packet;
 
 		switch (chunk->chunk_hdr->type) {
 		/*
@@ -921,7 +923,8 @@ static void sctp_outq_flush_ctrl(struct sctp_outq *q,
 		case SCTP_CID_INIT:
 		case SCTP_CID_INIT_ACK:
 		case SCTP_CID_SHUTDOWN_COMPLETE:
-			error = sctp_packet_singleton(transport, chunk, gfp);
+			error = sctp_packet_singleton(ctx->transport, chunk,
+						      ctx->gfp);
 			if (error < 0) {
 				asoc->base.sk->sk_err = -error;
 				return;
@@ -957,10 +960,10 @@ static void sctp_outq_flush_ctrl(struct sctp_outq *q,
 		case SCTP_CID_I_FWD_TSN:
 		case SCTP_CID_RECONF:
 			status = sctp_packet_transmit_chunk(packet, chunk,
-							    one_packet, gfp);
+							    one_packet, ctx->gfp);
 			if (status != SCTP_XMIT_OK) {
 				/* put the chunk back */
-				list_add(&chunk->list, &q->control_chunk_list);
+				list_add(&chunk->list, &ctx->q->control_chunk_list);
 				break;
 			}
 
@@ -971,12 +974,12 @@ static void sctp_outq_flush_ctrl(struct sctp_outq *q,
 			 */
 			if (chunk->chunk_hdr->type == SCTP_CID_FWD_TSN ||
 			    chunk->chunk_hdr->type == SCTP_CID_I_FWD_TSN) {
-				sctp_transport_reset_t3_rtx(transport);
-				transport->last_time_sent = jiffies;
+				sctp_transport_reset_t3_rtx(ctx->transport);
+				ctx->transport->last_time_sent = jiffies;
 			}
 
 			if (chunk == asoc->strreset_chunk)
-				sctp_transport_reset_reconf_timer(transport);
+				sctp_transport_reset_reconf_timer(ctx->transport);
 
 			break;
 
@@ -988,41 +991,38 @@ static void sctp_outq_flush_ctrl(struct sctp_outq *q,
 }
 
 /* Returns false if new data shouldn't be sent */
-static bool sctp_outq_flush_rtx(struct sctp_outq *q,
-				struct sctp_transport **_transport,
-				struct list_head *transport_list,
-				int rtx_timeout, gfp_t gfp)
+static bool sctp_outq_flush_rtx(struct sctp_flush_ctx *ctx,
+				int rtx_timeout)
 {
-	struct sctp_transport *transport = *_transport;
-	struct sctp_packet *packet = transport ? &transport->packet : NULL;
-	struct sctp_association *asoc = q->asoc;
+	struct sctp_packet *packet = ctx->transport ? &ctx->transport->packet :
+				     NULL;
+	struct sctp_association *asoc = ctx->q->asoc;
 	int error, start_timer = 0;
 
 	if (asoc->peer.retran_path->state == SCTP_UNCONFIRMED)
 		return false;
 
-	if (transport != asoc->peer.retran_path) {
+	if (ctx->transport != asoc->peer.retran_path) {
 		/* Switch transports & prepare the packet.  */
-		transport = asoc->peer.retran_path;
-		*_transport = transport;
+		ctx->transport = asoc->peer.retran_path;
 
-		if (list_empty(&transport->send_ready))
-			list_add_tail(&transport->send_ready,
-				      transport_list);
+		if (list_empty(&ctx->transport->send_ready))
+			list_add_tail(&ctx->transport->send_ready,
+				      &ctx->transport_list);
 
-		packet = &transport->packet;
+		packet = &ctx->transport->packet;
 		sctp_packet_config(packet, asoc->peer.i.init_tag,
 				   asoc->peer.ecn_capable);
 	}
 
-	error = __sctp_outq_flush_rtx(q, packet, rtx_timeout, &start_timer,
-				      gfp);
+	error = __sctp_outq_flush_rtx(ctx->q, packet, rtx_timeout, &start_timer,
+				      ctx->gfp);
 	if (error < 0)
 		asoc->base.sk->sk_err = -error;
 
 	if (start_timer) {
-		sctp_transport_reset_t3_rtx(transport);
-		transport->last_time_sent = jiffies;
+		sctp_transport_reset_t3_rtx(ctx->transport);
+		ctx->transport->last_time_sent = jiffies;
 	}
 
 	/* This can happen on COOKIE-ECHO resend.  Only
@@ -1034,20 +1034,18 @@ static bool sctp_outq_flush_rtx(struct sctp_outq *q,
 	/* Don't send new data if there is still data
 	 * waiting to retransmit.
 	 */
-	if (!list_empty(&q->retransmit))
+	if (!list_empty(&ctx->q->retransmit))
 		return false;
 
 	return true;
 }
 
-static void sctp_outq_flush_data(struct sctp_outq *q,
-				 struct sctp_transport **_transport,
-				 struct list_head *transport_list,
-				 int rtx_timeout, gfp_t gfp)
+static void sctp_outq_flush_data(struct sctp_flush_ctx *ctx,
+				 int rtx_timeout)
 {
-	struct sctp_transport *transport = *_transport;
-	struct sctp_packet *packet = transport ? &transport->packet : NULL;
-	struct sctp_association *asoc = q->asoc;
+	struct sctp_packet *packet = ctx->transport ? &ctx->transport->packet :
+				     NULL;
+	struct sctp_association *asoc = ctx->q->asoc;
 	struct sctp_chunk *chunk;
 	enum sctp_xmit status;
 
@@ -1080,13 +1078,11 @@ static void sctp_outq_flush_data(struct sctp_outq *q,
 	 * are marked for retransmission (limited by the
 	 * current cwnd).
 	 */
-	if (!list_empty(&q->retransmit)) {
-		if (!sctp_outq_flush_rtx(q, _transport, transport_list,
-					 rtx_timeout, gfp))
+	if (!list_empty(&ctx->q->retransmit)) {
+		if (!sctp_outq_flush_rtx(ctx, rtx_timeout))
 			return;
 		/* We may have switched current transport */
-		transport = *_transport;
-		packet = &transport->packet;
+		packet = &ctx->transport->packet;
 	}
 
 	/* Apply Max.Burst limitation to the current transport in
@@ -1094,42 +1090,39 @@ static void sctp_outq_flush_data(struct sctp_outq *q,
 	 * rest it before we return, but we want to apply the limit
 	 * to the currently queued data.
 	 */
-	if (transport)
-		sctp_transport_burst_limited(transport);
+	if (ctx->transport)
+		sctp_transport_burst_limited(ctx->transport);
 
 	/* Finally, transmit new packets.  */
-	while ((chunk = sctp_outq_dequeue_data(q)) != NULL) {
+	while ((chunk = sctp_outq_dequeue_data(ctx->q)) != NULL) {
 		__u32 sid = ntohs(chunk->subh.data_hdr->stream);
 
 		/* Has this chunk expired? */
 		if (sctp_chunk_abandoned(chunk)) {
-			sctp_sched_dequeue_done(q, chunk);
+			sctp_sched_dequeue_done(ctx->q, chunk);
 			sctp_chunk_fail(chunk, 0);
 			sctp_chunk_free(chunk);
 			continue;
 		}
 
 		if (asoc->stream.out[sid].state == SCTP_STREAM_CLOSED) {
-			sctp_outq_head_data(q, chunk);
+			sctp_outq_head_data(ctx->q, chunk);
 			break;
 		}
 
-		if (sctp_outq_select_transport(chunk, asoc, _transport,
-					       transport_list)) {
-			transport = *_transport;
-			packet = &transport->packet;
-		}
+		if (sctp_outq_select_transport(ctx, chunk))
+			packet = &ctx->transport->packet;
 
 		pr_debug("%s: outq:%p, chunk:%p[%s], tx-tsn:0x%x skb->head:%p "
 			 "skb->users:%d\n",
-			 __func__, q, chunk, chunk && chunk->chunk_hdr ?
+			 __func__, ctx->q, chunk, chunk && chunk->chunk_hdr ?
 			 sctp_cname(SCTP_ST_CHUNK(chunk->chunk_hdr->type)) :
 			 "illegal chunk", ntohl(chunk->subh.data_hdr->tsn),
 			 chunk->skb ? chunk->skb->head : NULL, chunk->skb ?
 			 refcount_read(&chunk->skb->users) : -1);
 
 		/* Add the chunk to the packet.  */
-		status = sctp_packet_transmit_chunk(packet, chunk, 0, gfp);
+		status = sctp_packet_transmit_chunk(packet, chunk, 0, ctx->gfp);
 		if (status != SCTP_XMIT_OK) {
 			/* We could not append this chunk, so put
 			 * the chunk back on the output queue.
@@ -1138,7 +1131,7 @@ static void sctp_outq_flush_data(struct sctp_outq *q,
 				 __func__, ntohl(chunk->subh.data_hdr->tsn),
 				 status);
 
-			sctp_outq_head_data(q, chunk);
+			sctp_outq_head_data(ctx->q, chunk);
 			break;
 		}
 
@@ -1156,13 +1149,13 @@ static void sctp_outq_flush_data(struct sctp_outq *q,
 		/* Only now it's safe to consider this
 		 * chunk as sent, sched-wise.
 		 */
-		sctp_sched_dequeue_done(q, chunk);
+		sctp_sched_dequeue_done(ctx->q, chunk);
 
 		list_add_tail(&chunk->transmitted_list,
-			      &transport->transmitted);
+			      &ctx->transport->transmitted);
 
-		sctp_transport_reset_t3_rtx(transport);
-		transport->last_time_sent = jiffies;
+		sctp_transport_reset_t3_rtx(ctx->transport);
+		ctx->transport->last_time_sent = jiffies;
 
 		/* Only let one DATA chunk get bundled with a
 		 * COOKIE-ECHO chunk.
@@ -1172,22 +1165,20 @@ static void sctp_outq_flush_data(struct sctp_outq *q,
 	}
 }
 
-static void sctp_outq_flush_transports(struct sctp_outq *q,
-				       struct list_head *transport_list,
-				       gfp_t gfp)
+static void sctp_outq_flush_transports(struct sctp_flush_ctx *ctx)
 {
 	struct list_head *ltransport;
 	struct sctp_packet *packet;
 	struct sctp_transport *t;
 	int error = 0;
 
-	while ((ltransport = sctp_list_dequeue(transport_list)) != NULL) {
+	while ((ltransport = sctp_list_dequeue(&ctx->transport_list)) != NULL) {
 		t = list_entry(ltransport, struct sctp_transport, send_ready);
 		packet = &t->packet;
 		if (!sctp_packet_empty(packet)) {
-			error = sctp_packet_transmit(packet, gfp);
+			error = sctp_packet_transmit(packet, ctx->gfp);
 			if (error < 0)
-				q->asoc->base.sk->sk_err = -error;
+				ctx->q->asoc->base.sk->sk_err = -error;
 		}
 
 		/* Clear the burst limited state, if any */
@@ -1204,12 +1195,15 @@ static void sctp_outq_flush_transports(struct sctp_outq *q,
  * locking concerns must be made.  Today we use the sock lock to protect
  * this function.
  */
+
 static void sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp)
 {
-	/* Current transport being used. It's NOT the same as curr active one */
-	struct sctp_transport *transport = NULL;
-	/* These transports have chunks to send. */
-	LIST_HEAD(transport_list);
+	struct sctp_flush_ctx ctx = {
+		.q = q,
+		.transport = NULL,
+		.transport_list = LIST_HEAD_INIT(ctx.transport_list),
+		.gfp = gfp,
+	};
 
 	/*
 	 * 6.10 Bundling
@@ -1221,16 +1215,16 @@ static void sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp)
 	 *   ...
 	 */
 
-	sctp_outq_flush_ctrl(q, &transport, &transport_list, gfp);
+	sctp_outq_flush_ctrl(&ctx);
 
 	if (q->asoc->src_out_of_asoc_ok)
 		goto sctp_flush_out;
 
-	sctp_outq_flush_data(q, &transport, &transport_list, rtx_timeout, gfp);
+	sctp_outq_flush_data(&ctx, rtx_timeout);
 
 sctp_flush_out:
 
-	sctp_outq_flush_transports(q, &transport_list, gfp);
+	sctp_outq_flush_transports(&ctx);
 }
 
 /* Update unack_data based on the incoming SACK chunk */
-- 
2.14.3

^ permalink raw reply related

* [PATCH net-next v3 2/3] sctp: add asoc and packet to sctp_flush_ctx
From: Marcelo Ricardo Leitner @ 2018-05-14 17:35 UTC (permalink / raw)
  To: netdev; +Cc: linux-sctp, Neil Horman, Xin Long, Vlad Yasevich
In-Reply-To: <cover.1526319083.git.marcelo.leitner@gmail.com>

Pre-compute these so the compiler won't reload them (due to
no-strict-aliasing).

Changes since v2:
- Do not replace a return with a break in sctp_outq_flush_data

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
---
 net/sctp/outqueue.c | 97 ++++++++++++++++++++++++-----------------------------
 1 file changed, 44 insertions(+), 53 deletions(-)

diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
index db94a2513dd874149aa77c4936f68537e97f8855..68b7baea3feaacf6738a2346501b2ad673b67503 100644
--- a/net/sctp/outqueue.c
+++ b/net/sctp/outqueue.c
@@ -798,16 +798,17 @@ struct sctp_flush_ctx {
 	struct sctp_transport *transport;
 	/* These transports have chunks to send. */
 	struct list_head transport_list;
+	struct sctp_association *asoc;
+	/* Packet on the current transport above */
+	struct sctp_packet *packet;
 	gfp_t gfp;
 };

 /* transport: current transport */
-static bool sctp_outq_select_transport(struct sctp_flush_ctx *ctx,
+static void sctp_outq_select_transport(struct sctp_flush_ctx *ctx,
 				       struct sctp_chunk *chunk)
 {
 	struct sctp_transport *new_transport = chunk->transport;
-	struct sctp_association *asoc = ctx->q->asoc;
-	bool changed = false;

 	if (!new_transport) {
 		if (!sctp_chunk_is_data(chunk)) {
@@ -825,7 +826,7 @@ static bool sctp_outq_select_transport(struct sctp_flush_ctx *ctx,
 							&ctx->transport->ipaddr))
 				new_transport = ctx->transport;
 			else
-				new_transport = sctp_assoc_lookup_paddr(asoc,
+				new_transport = sctp_assoc_lookup_paddr(ctx->asoc,
 								  &chunk->dest);
 		}

@@ -833,7 +834,7 @@ static bool sctp_outq_select_transport(struct sctp_flush_ctx *ctx,
 		 * use the current active path.
 		 */
 		if (!new_transport)
-			new_transport = asoc->peer.active_path;
+			new_transport = ctx->asoc->peer.active_path;
 	} else {
 		__u8 type;

@@ -858,7 +859,7 @@ static bool sctp_outq_select_transport(struct sctp_flush_ctx *ctx,
 			if (type != SCTP_CID_HEARTBEAT &&
 			    type != SCTP_CID_HEARTBEAT_ACK &&
 			    type != SCTP_CID_ASCONF_ACK)
-				new_transport = asoc->peer.active_path;
+				new_transport = ctx->asoc->peer.active_path;
 			break;
 		default:
 			break;
@@ -867,27 +868,25 @@ static bool sctp_outq_select_transport(struct sctp_flush_ctx *ctx,

 	/* Are we switching transports? Take care of transport locks. */
 	if (new_transport != ctx->transport) {
-		changed = true;
 		ctx->transport = new_transport;
+		ctx->packet = &ctx->transport->packet;
+
 		if (list_empty(&ctx->transport->send_ready))
 			list_add_tail(&ctx->transport->send_ready,
 				      &ctx->transport_list);

-		sctp_packet_config(&ctx->transport->packet, asoc->peer.i.init_tag,
-				   asoc->peer.ecn_capable);
+		sctp_packet_config(ctx->packet,
+				   ctx->asoc->peer.i.init_tag,
+				   ctx->asoc->peer.ecn_capable);
 		/* We've switched transports, so apply the
 		 * Burst limit to the new transport.
 		 */
 		sctp_transport_burst_limited(ctx->transport);
 	}
-
-	return changed;
 }

 static void sctp_outq_flush_ctrl(struct sctp_flush_ctx *ctx)
 {
-	struct sctp_association *asoc = ctx->q->asoc;
-	struct sctp_packet *packet = NULL;
 	struct sctp_chunk *chunk, *tmp;
 	enum sctp_xmit status;
 	int one_packet, error;
@@ -901,7 +900,7 @@ static void sctp_outq_flush_ctrl(struct sctp_flush_ctx *ctx)
 		 * NOT use the new IP address as a source for ANY SCTP
 		 * packet except on carrying an ASCONF Chunk.
 		 */
-		if (asoc->src_out_of_asoc_ok &&
+		if (ctx->asoc->src_out_of_asoc_ok &&
 		    chunk->chunk_hdr->type != SCTP_CID_ASCONF)
 			continue;

@@ -910,8 +909,7 @@ static void sctp_outq_flush_ctrl(struct sctp_flush_ctx *ctx)
 		/* Pick the right transport to use. Should always be true for
 		 * the first chunk as we don't have a transport by then.
 		 */
-		if (sctp_outq_select_transport(ctx, chunk))
-			packet = &ctx->transport->packet;
+		sctp_outq_select_transport(ctx, chunk);

 		switch (chunk->chunk_hdr->type) {
 		/*
@@ -926,14 +924,14 @@ static void sctp_outq_flush_ctrl(struct sctp_flush_ctx *ctx)
 			error = sctp_packet_singleton(ctx->transport, chunk,
 						      ctx->gfp);
 			if (error < 0) {
-				asoc->base.sk->sk_err = -error;
+				ctx->asoc->base.sk->sk_err = -error;
 				return;
 			}
 			break;

 		case SCTP_CID_ABORT:
 			if (sctp_test_T_bit(chunk))
-				packet->vtag = asoc->c.my_vtag;
+				ctx->packet->vtag = ctx->asoc->c.my_vtag;
 			/* fallthru */

 		/* The following chunks are "response" chunks, i.e.
@@ -959,7 +957,7 @@ static void sctp_outq_flush_ctrl(struct sctp_flush_ctx *ctx)
 		case SCTP_CID_FWD_TSN:
 		case SCTP_CID_I_FWD_TSN:
 		case SCTP_CID_RECONF:
-			status = sctp_packet_transmit_chunk(packet, chunk,
+			status = sctp_packet_transmit_chunk(ctx->packet, chunk,
 							    one_packet, ctx->gfp);
 			if (status != SCTP_XMIT_OK) {
 				/* put the chunk back */
@@ -967,7 +965,7 @@ static void sctp_outq_flush_ctrl(struct sctp_flush_ctx *ctx)
 				break;
 			}

-			asoc->stats.octrlchunks++;
+			ctx->asoc->stats.octrlchunks++;
 			/* PR-SCTP C5) If a FORWARD TSN is sent, the
 			 * sender MUST assure that at least one T3-rtx
 			 * timer is running.
@@ -978,7 +976,7 @@ static void sctp_outq_flush_ctrl(struct sctp_flush_ctx *ctx)
 				ctx->transport->last_time_sent = jiffies;
 			}

-			if (chunk == asoc->strreset_chunk)
+			if (chunk == ctx->asoc->strreset_chunk)
 				sctp_transport_reset_reconf_timer(ctx->transport);

 			break;
@@ -994,31 +992,28 @@ static void sctp_outq_flush_ctrl(struct sctp_flush_ctx *ctx)
 static bool sctp_outq_flush_rtx(struct sctp_flush_ctx *ctx,
 				int rtx_timeout)
 {
-	struct sctp_packet *packet = ctx->transport ? &ctx->transport->packet :
-				     NULL;
-	struct sctp_association *asoc = ctx->q->asoc;
 	int error, start_timer = 0;

-	if (asoc->peer.retran_path->state == SCTP_UNCONFIRMED)
+	if (ctx->asoc->peer.retran_path->state == SCTP_UNCONFIRMED)
 		return false;

-	if (ctx->transport != asoc->peer.retran_path) {
+	if (ctx->transport != ctx->asoc->peer.retran_path) {
 		/* Switch transports & prepare the packet.  */
-		ctx->transport = asoc->peer.retran_path;
+		ctx->transport = ctx->asoc->peer.retran_path;
+		ctx->packet = &ctx->transport->packet;

 		if (list_empty(&ctx->transport->send_ready))
 			list_add_tail(&ctx->transport->send_ready,
 				      &ctx->transport_list);

-		packet = &ctx->transport->packet;
-		sctp_packet_config(packet, asoc->peer.i.init_tag,
-				   asoc->peer.ecn_capable);
+		sctp_packet_config(ctx->packet, ctx->asoc->peer.i.init_tag,
+				   ctx->asoc->peer.ecn_capable);
 	}

-	error = __sctp_outq_flush_rtx(ctx->q, packet, rtx_timeout, &start_timer,
-				      ctx->gfp);
+	error = __sctp_outq_flush_rtx(ctx->q, ctx->packet, rtx_timeout,
+				      &start_timer, ctx->gfp);
 	if (error < 0)
-		asoc->base.sk->sk_err = -error;
+		ctx->asoc->base.sk->sk_err = -error;

 	if (start_timer) {
 		sctp_transport_reset_t3_rtx(ctx->transport);
@@ -1028,7 +1023,7 @@ static bool sctp_outq_flush_rtx(struct sctp_flush_ctx *ctx,
 	/* This can happen on COOKIE-ECHO resend.  Only
 	 * one chunk can get bundled with a COOKIE-ECHO.
 	 */
-	if (packet->has_cookie_echo)
+	if (ctx->packet->has_cookie_echo)
 		return false;

 	/* Don't send new data if there is still data
@@ -1043,19 +1038,16 @@ static bool sctp_outq_flush_rtx(struct sctp_flush_ctx *ctx,
 static void sctp_outq_flush_data(struct sctp_flush_ctx *ctx,
 				 int rtx_timeout)
 {
-	struct sctp_packet *packet = ctx->transport ? &ctx->transport->packet :
-				     NULL;
-	struct sctp_association *asoc = ctx->q->asoc;
 	struct sctp_chunk *chunk;
 	enum sctp_xmit status;

 	/* Is it OK to send data chunks?  */
-	switch (asoc->state) {
+	switch (ctx->asoc->state) {
 	case SCTP_STATE_COOKIE_ECHOED:
 		/* Only allow bundling when this packet has a COOKIE-ECHO
 		 * chunk.
 		 */
-		if (!packet || !packet->has_cookie_echo)
+		if (!ctx->packet || !ctx->packet->has_cookie_echo)
 			return;

 		/* fallthru */
@@ -1078,12 +1070,9 @@ static void sctp_outq_flush_data(struct sctp_flush_ctx *ctx,
 	 * are marked for retransmission (limited by the
 	 * current cwnd).
 	 */
-	if (!list_empty(&ctx->q->retransmit)) {
-		if (!sctp_outq_flush_rtx(ctx, rtx_timeout))
-			return;
-		/* We may have switched current transport */
-		packet = &ctx->transport->packet;
-	}
+	if (!list_empty(&ctx->q->retransmit) &&
+	    !sctp_outq_flush_rtx(ctx, rtx_timeout))
+		return;

 	/* Apply Max.Burst limitation to the current transport in
 	 * case it will be used for new data.  We are going to
@@ -1105,13 +1094,12 @@ static void sctp_outq_flush_data(struct sctp_flush_ctx *ctx,
 			continue;
 		}

-		if (asoc->stream.out[sid].state == SCTP_STREAM_CLOSED) {
+		if (ctx->asoc->stream.out[sid].state == SCTP_STREAM_CLOSED) {
 			sctp_outq_head_data(ctx->q, chunk);
 			break;
 		}

-		if (sctp_outq_select_transport(ctx, chunk))
-			packet = &ctx->transport->packet;
+		sctp_outq_select_transport(ctx, chunk);

 		pr_debug("%s: outq:%p, chunk:%p[%s], tx-tsn:0x%x skb->head:%p "
 			 "skb->users:%d\n",
@@ -1122,7 +1110,8 @@ static void sctp_outq_flush_data(struct sctp_flush_ctx *ctx,
 			 refcount_read(&chunk->skb->users) : -1);

 		/* Add the chunk to the packet.  */
-		status = sctp_packet_transmit_chunk(packet, chunk, 0, ctx->gfp);
+		status = sctp_packet_transmit_chunk(ctx->packet, chunk, 0,
+						    ctx->gfp);
 		if (status != SCTP_XMIT_OK) {
 			/* We could not append this chunk, so put
 			 * the chunk back on the output queue.
@@ -1139,12 +1128,12 @@ static void sctp_outq_flush_data(struct sctp_flush_ctx *ctx,
 		 * The sender MAY set the I-bit in the DATA
 		 * chunk header.
 		 */
-		if (asoc->state == SCTP_STATE_SHUTDOWN_PENDING)
+		if (ctx->asoc->state == SCTP_STATE_SHUTDOWN_PENDING)
 			chunk->chunk_hdr->flags |= SCTP_DATA_SACK_IMM;
 		if (chunk->chunk_hdr->flags & SCTP_DATA_UNORDERED)
-			asoc->stats.ouodchunks++;
+			ctx->asoc->stats.ouodchunks++;
 		else
-			asoc->stats.oodchunks++;
+			ctx->asoc->stats.oodchunks++;

 		/* Only now it's safe to consider this
 		 * chunk as sent, sched-wise.
@@ -1160,7 +1149,7 @@ static void sctp_outq_flush_data(struct sctp_flush_ctx *ctx,
 		/* Only let one DATA chunk get bundled with a
 		 * COOKIE-ECHO chunk.
 		 */
-		if (packet->has_cookie_echo)
+		if (ctx->packet->has_cookie_echo)
 			break;
 	}
 }
@@ -1202,6 +1191,8 @@ static void sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp)
 		.q = q,
 		.transport = NULL,
 		.transport_list = LIST_HEAD_INIT(ctx.transport_list),
+		.asoc = q->asoc,
+		.packet = NULL,
 		.gfp = gfp,
 	};

^ permalink raw reply related

* [PATCH net-next v3 0/3] sctp: Introduce sctp_flush_ctx
From: Marcelo Ricardo Leitner @ 2018-05-14 17:35 UTC (permalink / raw)
  To: netdev; +Cc: linux-sctp, Neil Horman, Xin Long, Vlad Yasevich

This struct will hold all the context used during the outq flush, so we
don't have to pass lots of pointers all around.

Checked on x86_64, the compiler inlines all these functions and there is no
derreference added because of the struct.

This patchset depends on 'sctp: refactor sctp_outq_flush'

Changes since v1:
- updated to build on top of v2 of 'sctp: refactor sctp_outq_flush'

Changes since v2:
- fixed a rebase issue which reverted a change in patch 2.
- rebased on v3 of 'sctp: refactor sctp_outq_flush'

Marcelo Ricardo Leitner (3):
  sctp: add sctp_flush_ctx, a context struct on outq_flush routines
  sctp: add asoc and packet to sctp_flush_ctx
  sctp: checkpatch fixups

 net/sctp/outqueue.c | 257 ++++++++++++++++++++++++----------------------------
 1 file changed, 118 insertions(+), 139 deletions(-)

^ permalink raw reply

* [PATCH net-next v3 3/3] sctp: checkpatch fixups
From: Marcelo Ricardo Leitner @ 2018-05-14 17:35 UTC (permalink / raw)
  To: netdev; +Cc: linux-sctp, Neil Horman, Xin Long, Vlad Yasevich
In-Reply-To: <cover.1526319083.git.marcelo.leitner@gmail.com>

A collection of fixups from previous patches, left for later to not
introduce unnecessary changes while moving code around.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
---
 net/sctp/outqueue.c | 20 +++++++-------------
 1 file changed, 7 insertions(+), 13 deletions(-)

diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
index 68b7baea3feaacf6738a2346501b2ad673b67503..d68aa33485a94e87858fed9b655f00a1b9748998 100644
--- a/net/sctp/outqueue.c
+++ b/net/sctp/outqueue.c
@@ -812,8 +812,7 @@ static void sctp_outq_select_transport(struct sctp_flush_ctx *ctx,
 
 	if (!new_transport) {
 		if (!sctp_chunk_is_data(chunk)) {
-			/*
-			 * If we have a prior transport pointer, see if
+			/* If we have a prior transport pointer, see if
 			 * the destination address of the chunk
 			 * matches the destination address of the
 			 * current transport.  If not a match, then
@@ -912,8 +911,7 @@ static void sctp_outq_flush_ctrl(struct sctp_flush_ctx *ctx)
 		sctp_outq_select_transport(ctx, chunk);
 
 		switch (chunk->chunk_hdr->type) {
-		/*
-		 * 6.10 Bundling
+		/* 6.10 Bundling
 		 *   ...
 		 *   An endpoint MUST NOT bundle INIT, INIT ACK or SHUTDOWN
 		 *   COMPLETE with any other chunks.  [Send them immediately.]
@@ -1061,8 +1059,7 @@ static void sctp_outq_flush_data(struct sctp_flush_ctx *ctx,
 		return;
 	}
 
-	/*
-	 * RFC 2960 6.1  Transmission of DATA Chunks
+	/* RFC 2960 6.1  Transmission of DATA Chunks
 	 *
 	 * C) When the time comes for the sender to transmit,
 	 * before sending new DATA chunks, the sender MUST
@@ -1101,8 +1098,7 @@ static void sctp_outq_flush_data(struct sctp_flush_ctx *ctx,
 
 		sctp_outq_select_transport(ctx, chunk);
 
-		pr_debug("%s: outq:%p, chunk:%p[%s], tx-tsn:0x%x skb->head:%p "
-			 "skb->users:%d\n",
+		pr_debug("%s: outq:%p, chunk:%p[%s], tx-tsn:0x%x skb->head:%p skb->users:%d\n",
 			 __func__, ctx->q, chunk, chunk && chunk->chunk_hdr ?
 			 sctp_cname(SCTP_ST_CHUNK(chunk->chunk_hdr->type)) :
 			 "illegal chunk", ntohl(chunk->subh.data_hdr->tsn),
@@ -1175,8 +1171,7 @@ static void sctp_outq_flush_transports(struct sctp_flush_ctx *ctx)
 	}
 }
 
-/*
- * Try to flush an outqueue.
+/* Try to flush an outqueue.
  *
  * Description: Send everything in q which we legally can, subject to
  * congestion limitations.
@@ -1196,8 +1191,7 @@ static void sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp)
 		.gfp = gfp,
 	};
 
-	/*
-	 * 6.10 Bundling
+	/* 6.10 Bundling
 	 *   ...
 	 *   When bundling control chunks with DATA chunks, an
 	 *   endpoint MUST place control chunks first in the outbound
@@ -1768,7 +1762,7 @@ static int sctp_acked(struct sctp_sackhdr *sack, __u32 tsn)
 	if (TSN_lte(tsn, ctsn))
 		goto pass;
 
-	/* 3.3.4 Selective Acknowledgement (SACK) (3):
+	/* 3.3.4 Selective Acknowledgment (SACK) (3):
 	 *
 	 * Gap Ack Blocks:
 	 *  These fields contain the Gap Ack Blocks. They are repeated
-- 
2.14.3

^ permalink raw reply related

* [PATCH net-next v3 5/8] sctp: move flushing of data chunks out of sctp_outq_flush
From: Marcelo Ricardo Leitner @ 2018-05-14 17:34 UTC (permalink / raw)
  To: netdev; +Cc: linux-sctp, Neil Horman, Xin Long, Vlad Yasevich
In-Reply-To: <cover.1526318522.git.marcelo.leitner@gmail.com>

To the new sctp_outq_flush_data. Again, smaller functions and with well
defined objectives.

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
---
 net/sctp/outqueue.c | 149 ++++++++++++++++++++++++++--------------------------
 1 file changed, 75 insertions(+), 74 deletions(-)

diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
index 49e80bf2ade73914f3be275016df0a6e7f06f606..bfa2e43dfd31d115862b65d1650858fcaa6f203b 100644
--- a/net/sctp/outqueue.c
+++ b/net/sctp/outqueue.c
@@ -1038,46 +1038,17 @@ static bool sctp_outq_flush_rtx(struct sctp_outq *q,
 
 	return true;
 }
-/*
- * Try to flush an outqueue.
- *
- * Description: Send everything in q which we legally can, subject to
- * congestion limitations.
- * * Note: This function can be called from multiple contexts so appropriate
- * locking concerns must be made.  Today we use the sock lock to protect
- * this function.
- */
-static void sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp)
+
+static void sctp_outq_flush_data(struct sctp_outq *q,
+				 struct sctp_transport **_transport,
+				 struct list_head *transport_list,
+				 int rtx_timeout, gfp_t gfp)
 {
-	struct sctp_packet *packet;
+	struct sctp_transport *transport = *_transport;
+	struct sctp_packet *packet = transport ? &transport->packet : NULL;
 	struct sctp_association *asoc = q->asoc;
-	struct sctp_transport *transport = NULL;
 	struct sctp_chunk *chunk;
 	enum sctp_xmit status;
-	int error = 0;
-
-	/* These transports have chunks to send. */
-	struct list_head transport_list;
-	struct list_head *ltransport;
-
-	INIT_LIST_HEAD(&transport_list);
-	packet = NULL;
-
-	/*
-	 * 6.10 Bundling
-	 *   ...
-	 *   When bundling control chunks with DATA chunks, an
-	 *   endpoint MUST place control chunks first in the outbound
-	 *   SCTP packet.  The transmitter MUST transmit DATA chunks
-	 *   within a SCTP packet in increasing order of TSN.
-	 *   ...
-	 */
-
-	sctp_outq_flush_ctrl(q, &transport, &transport_list, gfp);
-	packet = &transport->packet;
-
-	if (q->asoc->src_out_of_asoc_ok)
-		goto sctp_flush_out;
 
 	/* Is it OK to send data chunks?  */
 	switch (asoc->state) {
@@ -1102,10 +1073,11 @@ static void sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp)
 		 * current cwnd).
 		 */
 		if (!list_empty(&q->retransmit)) {
-			if (!sctp_outq_flush_rtx(q, &transport, &transport_list,
+			if (!sctp_outq_flush_rtx(q, _transport, transport_list,
 						 rtx_timeout))
 				break;
 			/* We may have switched current transport */
+			transport = *_transport;
 			packet = &transport->packet;
 		}
 
@@ -1131,12 +1103,14 @@ static void sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp)
 
 			if (asoc->stream.out[sid].state == SCTP_STREAM_CLOSED) {
 				sctp_outq_head_data(q, chunk);
-				goto sctp_flush_out;
+				break;
 			}
 
-			if (sctp_outq_select_transport(chunk, asoc, &transport,
-						       &transport_list))
+			if (sctp_outq_select_transport(chunk, asoc, _transport,
+						       transport_list)) {
+				transport = *_transport;
 				packet = &transport->packet;
+			}
 
 			pr_debug("%s: outq:%p, chunk:%p[%s], tx-tsn:0x%x skb->head:%p "
 				 "skb->users:%d\n",
@@ -1148,8 +1122,10 @@ static void sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp)
 
 			/* Add the chunk to the packet.  */
 			status = sctp_packet_transmit_chunk(packet, chunk, 0, gfp);
-
 			switch (status) {
+			case SCTP_XMIT_OK:
+				break;
+
 			case SCTP_XMIT_PMTU_FULL:
 			case SCTP_XMIT_RWND_FULL:
 			case SCTP_XMIT_DELAY:
@@ -1161,41 +1137,25 @@ static void sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp)
 					 status);
 
 				sctp_outq_head_data(q, chunk);
-				goto sctp_flush_out;
-
-			case SCTP_XMIT_OK:
-				/* The sender is in the SHUTDOWN-PENDING state,
-				 * The sender MAY set the I-bit in the DATA
-				 * chunk header.
-				 */
-				if (asoc->state == SCTP_STATE_SHUTDOWN_PENDING)
-					chunk->chunk_hdr->flags |= SCTP_DATA_SACK_IMM;
-				if (chunk->chunk_hdr->flags & SCTP_DATA_UNORDERED)
-					asoc->stats.ouodchunks++;
-				else
-					asoc->stats.oodchunks++;
-
-				/* Only now it's safe to consider this
-				 * chunk as sent, sched-wise.
-				 */
-				sctp_sched_dequeue_done(q, chunk);
-
-				break;
-
-			default:
-				BUG();
+				return;
 			}
 
-			/* BUG: We assume that the sctp_packet_transmit()
-			 * call below will succeed all the time and add the
-			 * chunk to the transmitted list and restart the
-			 * timers.
-			 * It is possible that the call can fail under OOM
-			 * conditions.
-			 *
-			 * Is this really a problem?  Won't this behave
-			 * like a lost TSN?
+			/* The sender is in the SHUTDOWN-PENDING state,
+			 * The sender MAY set the I-bit in the DATA
+			 * chunk header.
 			 */
+			if (asoc->state == SCTP_STATE_SHUTDOWN_PENDING)
+				chunk->chunk_hdr->flags |= SCTP_DATA_SACK_IMM;
+			if (chunk->chunk_hdr->flags & SCTP_DATA_UNORDERED)
+				asoc->stats.ouodchunks++;
+			else
+				asoc->stats.oodchunks++;
+
+			/* Only now it's safe to consider this
+			 * chunk as sent, sched-wise.
+			 */
+			sctp_sched_dequeue_done(q, chunk);
+
 			list_add_tail(&chunk->transmitted_list,
 				      &transport->transmitted);
 
@@ -1206,7 +1166,7 @@ static void sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp)
 			 * COOKIE-ECHO chunk.
 			 */
 			if (packet->has_cookie_echo)
-				goto sctp_flush_out;
+				break;
 		}
 		break;
 
@@ -1214,6 +1174,47 @@ static void sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp)
 		/* Do nothing.  */
 		break;
 	}
+}
+
+/*
+ * Try to flush an outqueue.
+ *
+ * Description: Send everything in q which we legally can, subject to
+ * congestion limitations.
+ * * Note: This function can be called from multiple contexts so appropriate
+ * locking concerns must be made.  Today we use the sock lock to protect
+ * this function.
+ */
+static void sctp_outq_flush(struct sctp_outq *q, int rtx_timeout, gfp_t gfp)
+{
+	struct sctp_packet *packet;
+	struct sctp_association *asoc = q->asoc;
+	struct sctp_transport *transport = NULL;
+	int error = 0;
+
+	/* These transports have chunks to send. */
+	struct list_head transport_list;
+	struct list_head *ltransport;
+
+	INIT_LIST_HEAD(&transport_list);
+	packet = NULL;
+
+	/*
+	 * 6.10 Bundling
+	 *   ...
+	 *   When bundling control chunks with DATA chunks, an
+	 *   endpoint MUST place control chunks first in the outbound
+	 *   SCTP packet.  The transmitter MUST transmit DATA chunks
+	 *   within a SCTP packet in increasing order of TSN.
+	 *   ...
+	 */
+
+	sctp_outq_flush_ctrl(q, &transport, &transport_list, gfp);
+
+	if (q->asoc->src_out_of_asoc_ok)
+		goto sctp_flush_out;
+
+	sctp_outq_flush_data(q, &transport, &transport_list, rtx_timeout, gfp);
 
 sctp_flush_out:
 
-- 
2.14.3

^ permalink raw reply related

* [PATCH net-next v3 8/8] sctp: rework switch cases in sctp_outq_flush_data
From: Marcelo Ricardo Leitner @ 2018-05-14 17:34 UTC (permalink / raw)
  To: netdev; +Cc: linux-sctp, Neil Horman, Xin Long, Vlad Yasevich
In-Reply-To: <cover.1526318522.git.marcelo.leitner@gmail.com>

Remove an inner one, which tended to be error prone due to the cascading
and it can be replaced by a simple if ().

Rework the outer one so that the actual flush code is not inside it. Now
we first validate if we can or cannot send data, return if not, and then
the flush code.

Suggested-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
---
 net/sctp/outqueue.c | 191 +++++++++++++++++++++++++---------------------------
 1 file changed, 93 insertions(+), 98 deletions(-)

diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
index e1632b8e2900f88070aebc82bbe4eca61f424e87..e9c22b3db11cd702afc52f8233fb7f39d917698a 100644
--- a/net/sctp/outqueue.c
+++ b/net/sctp/outqueue.c
@@ -1058,122 +1058,117 @@ static void sctp_outq_flush_data(struct sctp_outq *q,
 		 * chunk.
 		 */
 		if (!packet || !packet->has_cookie_echo)
-			break;
+			return;
 
 		/* fallthru */
 	case SCTP_STATE_ESTABLISHED:
 	case SCTP_STATE_SHUTDOWN_PENDING:
 	case SCTP_STATE_SHUTDOWN_RECEIVED:
-		/*
-		 * RFC 2960 6.1  Transmission of DATA Chunks
-		 *
-		 * C) When the time comes for the sender to transmit,
-		 * before sending new DATA chunks, the sender MUST
-		 * first transmit any outstanding DATA chunks which
-		 * are marked for retransmission (limited by the
-		 * current cwnd).
-		 */
-		if (!list_empty(&q->retransmit)) {
-			if (!sctp_outq_flush_rtx(q, _transport, transport_list,
-						 rtx_timeout, gfp))
-				break;
-			/* We may have switched current transport */
-			transport = *_transport;
-			packet = &transport->packet;
-		}
+		break;
 
-		/* Apply Max.Burst limitation to the current transport in
-		 * case it will be used for new data.  We are going to
-		 * rest it before we return, but we want to apply the limit
-		 * to the currently queued data.
-		 */
-		if (transport)
-			sctp_transport_burst_limited(transport);
-
-		/* Finally, transmit new packets.  */
-		while ((chunk = sctp_outq_dequeue_data(q)) != NULL) {
-			__u32 sid = ntohs(chunk->subh.data_hdr->stream);
-
-			/* Has this chunk expired? */
-			if (sctp_chunk_abandoned(chunk)) {
-				sctp_sched_dequeue_done(q, chunk);
-				sctp_chunk_fail(chunk, 0);
-				sctp_chunk_free(chunk);
-				continue;
-			}
+	default:
+		/* Do nothing. */
+		return;
+	}
 
-			if (asoc->stream.out[sid].state == SCTP_STREAM_CLOSED) {
-				sctp_outq_head_data(q, chunk);
-				break;
-			}
+	/*
+	 * RFC 2960 6.1  Transmission of DATA Chunks
+	 *
+	 * C) When the time comes for the sender to transmit,
+	 * before sending new DATA chunks, the sender MUST
+	 * first transmit any outstanding DATA chunks which
+	 * are marked for retransmission (limited by the
+	 * current cwnd).
+	 */
+	if (!list_empty(&q->retransmit)) {
+		if (!sctp_outq_flush_rtx(q, _transport, transport_list,
+					 rtx_timeout, gfp))
+			return;
+		/* We may have switched current transport */
+		transport = *_transport;
+		packet = &transport->packet;
+	}
 
-			if (sctp_outq_select_transport(chunk, asoc, _transport,
-						       transport_list)) {
-				transport = *_transport;
-				packet = &transport->packet;
-			}
+	/* Apply Max.Burst limitation to the current transport in
+	 * case it will be used for new data.  We are going to
+	 * rest it before we return, but we want to apply the limit
+	 * to the currently queued data.
+	 */
+	if (transport)
+		sctp_transport_burst_limited(transport);
 
-			pr_debug("%s: outq:%p, chunk:%p[%s], tx-tsn:0x%x skb->head:%p "
-				 "skb->users:%d\n",
-				 __func__, q, chunk, chunk && chunk->chunk_hdr ?
-				 sctp_cname(SCTP_ST_CHUNK(chunk->chunk_hdr->type)) :
-				 "illegal chunk", ntohl(chunk->subh.data_hdr->tsn),
-				 chunk->skb ? chunk->skb->head : NULL, chunk->skb ?
-				 refcount_read(&chunk->skb->users) : -1);
-
-			/* Add the chunk to the packet.  */
-			status = sctp_packet_transmit_chunk(packet, chunk, 0, gfp);
-			switch (status) {
-			case SCTP_XMIT_OK:
-				break;
+	/* Finally, transmit new packets.  */
+	while ((chunk = sctp_outq_dequeue_data(q)) != NULL) {
+		__u32 sid = ntohs(chunk->subh.data_hdr->stream);
 
-			case SCTP_XMIT_PMTU_FULL:
-			case SCTP_XMIT_RWND_FULL:
-			case SCTP_XMIT_DELAY:
-				/* We could not append this chunk, so put
-				 * the chunk back on the output queue.
-				 */
-				pr_debug("%s: could not transmit tsn:0x%x, status:%d\n",
-					 __func__, ntohl(chunk->subh.data_hdr->tsn),
-					 status);
+		/* Has this chunk expired? */
+		if (sctp_chunk_abandoned(chunk)) {
+			sctp_sched_dequeue_done(q, chunk);
+			sctp_chunk_fail(chunk, 0);
+			sctp_chunk_free(chunk);
+			continue;
+		}
 
-				sctp_outq_head_data(q, chunk);
-				return;
-			}
+		if (asoc->stream.out[sid].state == SCTP_STREAM_CLOSED) {
+			sctp_outq_head_data(q, chunk);
+			break;
+		}
 
-			/* The sender is in the SHUTDOWN-PENDING state,
-			 * The sender MAY set the I-bit in the DATA
-			 * chunk header.
-			 */
-			if (asoc->state == SCTP_STATE_SHUTDOWN_PENDING)
-				chunk->chunk_hdr->flags |= SCTP_DATA_SACK_IMM;
-			if (chunk->chunk_hdr->flags & SCTP_DATA_UNORDERED)
-				asoc->stats.ouodchunks++;
-			else
-				asoc->stats.oodchunks++;
+		if (sctp_outq_select_transport(chunk, asoc, _transport,
+					       transport_list)) {
+			transport = *_transport;
+			packet = &transport->packet;
+		}
 
-			/* Only now it's safe to consider this
-			 * chunk as sent, sched-wise.
+		pr_debug("%s: outq:%p, chunk:%p[%s], tx-tsn:0x%x skb->head:%p "
+			 "skb->users:%d\n",
+			 __func__, q, chunk, chunk && chunk->chunk_hdr ?
+			 sctp_cname(SCTP_ST_CHUNK(chunk->chunk_hdr->type)) :
+			 "illegal chunk", ntohl(chunk->subh.data_hdr->tsn),
+			 chunk->skb ? chunk->skb->head : NULL, chunk->skb ?
+			 refcount_read(&chunk->skb->users) : -1);
+
+		/* Add the chunk to the packet.  */
+		status = sctp_packet_transmit_chunk(packet, chunk, 0, gfp);
+		if (status != SCTP_XMIT_OK) {
+			/* We could not append this chunk, so put
+			 * the chunk back on the output queue.
 			 */
-			sctp_sched_dequeue_done(q, chunk);
+			pr_debug("%s: could not transmit tsn:0x%x, status:%d\n",
+				 __func__, ntohl(chunk->subh.data_hdr->tsn),
+				 status);
 
-			list_add_tail(&chunk->transmitted_list,
-				      &transport->transmitted);
+			sctp_outq_head_data(q, chunk);
+			break;
+		}
 
-			sctp_transport_reset_t3_rtx(transport);
-			transport->last_time_sent = jiffies;
+		/* The sender is in the SHUTDOWN-PENDING state,
+		 * The sender MAY set the I-bit in the DATA
+		 * chunk header.
+		 */
+		if (asoc->state == SCTP_STATE_SHUTDOWN_PENDING)
+			chunk->chunk_hdr->flags |= SCTP_DATA_SACK_IMM;
+		if (chunk->chunk_hdr->flags & SCTP_DATA_UNORDERED)
+			asoc->stats.ouodchunks++;
+		else
+			asoc->stats.oodchunks++;
 
-			/* Only let one DATA chunk get bundled with a
-			 * COOKIE-ECHO chunk.
-			 */
-			if (packet->has_cookie_echo)
-				break;
-		}
-		break;
+		/* Only now it's safe to consider this
+		 * chunk as sent, sched-wise.
+		 */
+		sctp_sched_dequeue_done(q, chunk);
 
-	default:
-		/* Do nothing.  */
-		break;
+		list_add_tail(&chunk->transmitted_list,
+			      &transport->transmitted);
+
+		sctp_transport_reset_t3_rtx(transport);
+		transport->last_time_sent = jiffies;
+
+		/* Only let one DATA chunk get bundled with a
+		 * COOKIE-ECHO chunk.
+		 */
+		if (packet->has_cookie_echo)
+			break;
 	}
 }
 
-- 
2.14.3

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox