Netdev List

* [net-next 09/16] ice: report link down for VF when PF's queues are not enabled
From: Jeff Kirsher @ 2019-09-05 20:33 UTC (permalink / raw)
  To: davem
  Cc: Lukasz Czapnik, netdev, nhorman, sassmann, Tony Nguyen,
	Andrew Bowers, Jeff Kirsher
In-Reply-To: <20190905203406.4152-1-jeffrey.t.kirsher@intel.com>

From: Lukasz Czapnik <lukasz.czapnik@intel.com>

This is a port of a fix from i40e commit 2ad1274fa35a ("i40e: don't
report link up for a VF who hasn't enabled queues").

Older VF drivers do not respond well to receiving a link up
notification before their queues are enabled. Their state machine
concludes that it is safe to send traffic, which results in a Tx
hang on the VF.

Record whether the PF has actually enabled queues for the VF. When
reporting link status, always report link down if the queues aren't
enabled. In this way, the VF driver will never receive a link up
notification until after its queues are enabled.

Signed-off-by: Lukasz Czapnik <lukasz.czapnik@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
index 3ba6613048ef..1ec2a037a369 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
@@ -129,7 +129,10 @@ static void ice_vc_notify_vf_link_state(struct ice_vf *vf)
 	pfe.event = VIRTCHNL_EVENT_LINK_CHANGE;
 	pfe.severity = PF_EVENT_SEVERITY_INFO;
 
-	if (vf->link_forced)
+	/* Always report link is down if the VF queues aren't enabled */
+	if (!vf->num_qs_ena)
+		ice_set_pfe_link(vf, &pfe, ICE_AQ_LINK_SPEED_UNKNOWN, false);
+	else if (vf->link_forced)
 		ice_set_pfe_link_forced(vf, &pfe, vf->link_up);
 	else
 		ice_set_pfe_link(vf, &pfe, ls->link_speed, ls->link_info &
-- 
2.21.0
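
The reporting order introduced by this patch can be sketched outside
the driver as follows. This is only an illustration: struct
sketch_vf and sketch_reported_link_up are hypothetical stand-ins, not
driver code, and the real function fills in a virtchnl event rather
than returning a boolean.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the per-VF state in the PF. */
struct sketch_vf {
	unsigned int num_qs_ena; /* queues the PF has enabled for the VF */
	bool link_forced;        /* link state was forced administratively */
	bool link_up;            /* forced state, valid if link_forced */
};

/* Return the link state that would be reported to the VF. */
bool sketch_reported_link_up(const struct sketch_vf *vf, bool phys_link_up)
{
	/* Always report link down if the VF queues aren't enabled */
	if (!vf->num_qs_ena)
		return false;

	/* Otherwise a forced state takes precedence over the real one */
	if (vf->link_forced)
		return vf->link_up;

	return phys_link_up;
}
```

The queue check comes first so that neither a forced link-up nor a
physical link-up can reach a VF that has no enabled queues.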



* [net-next 16/16] ice: Rework around device/function capabilities
From: Jeff Kirsher @ 2019-09-05 20:34 UTC (permalink / raw)
  To: davem
  Cc: Anirudh Venkataramanan, netdev, nhorman, sassmann, Tony Nguyen,
	Andrew Bowers, Jeff Kirsher
In-Reply-To: <20190905203406.4152-1-jeffrey.t.kirsher@intel.com>

From: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>

ice_parse_caps prints each capability under a label that differs from
the name of the structure field holding it. This makes it difficult to
search the debug logs for the right strings. Update the print strings
so that they exactly match the field names in the structure.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_common.c | 40 ++++++++++-----------
 1 file changed, 20 insertions(+), 20 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_common.c b/drivers/net/ethernet/intel/ice/ice_common.c
index e8397e5b6267..8b2c46615834 100644
--- a/drivers/net/ethernet/intel/ice/ice_common.c
+++ b/drivers/net/ethernet/intel/ice/ice_common.c
@@ -1551,29 +1551,29 @@ ice_parse_caps(struct ice_hw *hw, void *buf, u32 cap_count,
 		case ICE_AQC_CAPS_VALID_FUNCTIONS:
 			caps->valid_functions = number;
 			ice_debug(hw, ICE_DBG_INIT,
-				  "%s: valid functions = %d\n", prefix,
+				  "%s: valid_functions (bitmap) = %d\n", prefix,
 				  caps->valid_functions);
 			break;
 		case ICE_AQC_CAPS_SRIOV:
 			caps->sr_iov_1_1 = (number == 1);
 			ice_debug(hw, ICE_DBG_INIT,
-				  "%s: SR-IOV = %d\n", prefix,
+				  "%s: sr_iov_1_1 = %d\n", prefix,
 				  caps->sr_iov_1_1);
 			break;
 		case ICE_AQC_CAPS_VF:
 			if (dev_p) {
 				dev_p->num_vfs_exposed = number;
 				ice_debug(hw, ICE_DBG_INIT,
-					  "%s: VFs exposed = %d\n", prefix,
+					  "%s: num_vfs_exposed = %d\n", prefix,
 					  dev_p->num_vfs_exposed);
 			} else if (func_p) {
 				func_p->num_allocd_vfs = number;
 				func_p->vf_base_id = logical_id;
 				ice_debug(hw, ICE_DBG_INIT,
-					  "%s: VFs allocated = %d\n", prefix,
+					  "%s: num_allocd_vfs = %d\n", prefix,
 					  func_p->num_allocd_vfs);
 				ice_debug(hw, ICE_DBG_INIT,
-					  "%s: VF base_id = %d\n", prefix,
+					  "%s: vf_base_id = %d\n", prefix,
 					  func_p->vf_base_id);
 			}
 			break;
@@ -1581,17 +1581,17 @@ ice_parse_caps(struct ice_hw *hw, void *buf, u32 cap_count,
 			if (dev_p) {
 				dev_p->num_vsi_allocd_to_host = number;
 				ice_debug(hw, ICE_DBG_INIT,
-					  "%s: num VSI alloc to host = %d\n",
+					  "%s: num_vsi_allocd_to_host = %d\n",
 					  prefix,
 					  dev_p->num_vsi_allocd_to_host);
 			} else if (func_p) {
 				func_p->guar_num_vsi =
 					ice_get_num_per_func(hw, ICE_MAX_VSI);
 				ice_debug(hw, ICE_DBG_INIT,
-					  "%s: num guaranteed VSI (fw) = %d\n",
+					  "%s: guar_num_vsi (fw) = %d\n",
 					  prefix, number);
 				ice_debug(hw, ICE_DBG_INIT,
-					  "%s: num guaranteed VSI = %d\n",
+					  "%s: guar_num_vsi = %d\n",
 					  prefix, func_p->guar_num_vsi);
 			}
 			break;
@@ -1600,56 +1600,56 @@ ice_parse_caps(struct ice_hw *hw, void *buf, u32 cap_count,
 			caps->active_tc_bitmap = logical_id;
 			caps->maxtc = phys_id;
 			ice_debug(hw, ICE_DBG_INIT,
-				  "%s: DCB = %d\n", prefix, caps->dcb);
+				  "%s: dcb = %d\n", prefix, caps->dcb);
 			ice_debug(hw, ICE_DBG_INIT,
-				  "%s: active TC bitmap = %d\n", prefix,
+				  "%s: active_tc_bitmap = %d\n", prefix,
 				  caps->active_tc_bitmap);
 			ice_debug(hw, ICE_DBG_INIT,
-				  "%s: TC max = %d\n", prefix, caps->maxtc);
+				  "%s: maxtc = %d\n", prefix, caps->maxtc);
 			break;
 		case ICE_AQC_CAPS_RSS:
 			caps->rss_table_size = number;
 			caps->rss_table_entry_width = logical_id;
 			ice_debug(hw, ICE_DBG_INIT,
-				  "%s: RSS table size = %d\n", prefix,
+				  "%s: rss_table_size = %d\n", prefix,
 				  caps->rss_table_size);
 			ice_debug(hw, ICE_DBG_INIT,
-				  "%s: RSS table width = %d\n", prefix,
+				  "%s: rss_table_entry_width = %d\n", prefix,
 				  caps->rss_table_entry_width);
 			break;
 		case ICE_AQC_CAPS_RXQS:
 			caps->num_rxq = number;
 			caps->rxq_first_id = phys_id;
 			ice_debug(hw, ICE_DBG_INIT,
-				  "%s: num Rx queues = %d\n", prefix,
+				  "%s: num_rxq = %d\n", prefix,
 				  caps->num_rxq);
 			ice_debug(hw, ICE_DBG_INIT,
-				  "%s: Rx first queue ID = %d\n", prefix,
+				  "%s: rxq_first_id = %d\n", prefix,
 				  caps->rxq_first_id);
 			break;
 		case ICE_AQC_CAPS_TXQS:
 			caps->num_txq = number;
 			caps->txq_first_id = phys_id;
 			ice_debug(hw, ICE_DBG_INIT,
-				  "%s: num Tx queues = %d\n", prefix,
+				  "%s: num_txq = %d\n", prefix,
 				  caps->num_txq);
 			ice_debug(hw, ICE_DBG_INIT,
-				  "%s: Tx first queue ID = %d\n", prefix,
+				  "%s: txq_first_id = %d\n", prefix,
 				  caps->txq_first_id);
 			break;
 		case ICE_AQC_CAPS_MSIX:
 			caps->num_msix_vectors = number;
 			caps->msix_vector_first_id = phys_id;
 			ice_debug(hw, ICE_DBG_INIT,
-				  "%s: MSIX vector count = %d\n", prefix,
+				  "%s: num_msix_vectors = %d\n", prefix,
 				  caps->num_msix_vectors);
 			ice_debug(hw, ICE_DBG_INIT,
-				  "%s: MSIX first vector index = %d\n", prefix,
+				  "%s: msix_vector_first_id = %d\n", prefix,
 				  caps->msix_vector_first_id);
 			break;
 		case ICE_AQC_CAPS_MAX_MTU:
 			caps->max_mtu = number;
-			ice_debug(hw, ICE_DBG_INIT, "%s: max MTU = %d\n",
+			ice_debug(hw, ICE_DBG_INIT, "%s: max_mtu = %d\n",
 				  prefix, caps->max_mtu);
 			break;
 		default:
-- 
2.21.0



* [net-next 13/16] ice: Allow for delayed LLDP MIB change registration
From: Jeff Kirsher @ 2019-09-05 20:34 UTC (permalink / raw)
  To: davem
  Cc: Dave Ertman, netdev, nhorman, sassmann, Tony Nguyen,
	Andrew Bowers, Jeff Kirsher
In-Reply-To: <20190905203406.4152-1-jeffrey.t.kirsher@intel.com>

From: Dave Ertman <david.m.ertman@intel.com>

Add a boolean parameter to the ice_init_dcb function that controls
whether it registers for LLDP MIB change events.  Also add a new
function, ice_cfg_lldp_mib_change, which makes it possible to
register for LLDP MIB change events after ice_init_dcb has been
called.  The net effect of these two changes is to allow delayed
registration for MIB change events, so that the driver does not
accept events before it is ready for them.

Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_dcb.c     | 39 ++++++++++++++++++--
 drivers/net/ethernet/intel/ice/ice_dcb.h     | 11 ++----
 drivers/net/ethernet/intel/ice/ice_dcb_lib.c |  4 +-
 drivers/net/ethernet/intel/ice/ice_ethtool.c | 10 ++++-
 drivers/net/ethernet/intel/ice/ice_main.c    |  2 +
 5 files changed, 51 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_dcb.c b/drivers/net/ethernet/intel/ice/ice_dcb.c
index c5ee8d930611..dd7efff121bd 100644
--- a/drivers/net/ethernet/intel/ice/ice_dcb.c
+++ b/drivers/net/ethernet/intel/ice/ice_dcb.c
@@ -60,7 +60,7 @@ ice_aq_get_lldp_mib(struct ice_hw *hw, u8 bridge_type, u8 mib_type, void *buf,
  * Enable or Disable posting of an event on ARQ when LLDP MIB
  * associated with the interface changes (0x0A01)
  */
-enum ice_status
+static enum ice_status
 ice_aq_cfg_lldp_mib_change(struct ice_hw *hw, bool ena_update,
 			   struct ice_sq_cd *cd)
 {
@@ -943,10 +943,11 @@ enum ice_status ice_get_dcb_cfg(struct ice_port_info *pi)
 /**
  * ice_init_dcb
  * @hw: pointer to the HW struct
+ * @enable_mib_change: enable MIB change event
  *
  * Update DCB configuration from the Firmware
  */
-enum ice_status ice_init_dcb(struct ice_hw *hw)
+enum ice_status ice_init_dcb(struct ice_hw *hw, bool enable_mib_change)
 {
 	struct ice_port_info *pi = hw->port_info;
 	enum ice_status ret = 0;
@@ -972,9 +973,39 @@ enum ice_status ice_init_dcb(struct ice_hw *hw)
 	}
 
 	/* Configure the LLDP MIB change event */
-	ret = ice_aq_cfg_lldp_mib_change(hw, true, NULL);
+	if (enable_mib_change) {
+		ret = ice_aq_cfg_lldp_mib_change(hw, true, NULL);
+		if (!ret)
+			pi->is_sw_lldp = false;
+	}
+
+	return ret;
+}
+
+/**
+ * ice_cfg_lldp_mib_change
+ * @hw: pointer to the HW struct
+ * @ena_mib: enable/disable MIB change event
+ *
+ * Configure (disable/enable) MIB
+ */
+enum ice_status ice_cfg_lldp_mib_change(struct ice_hw *hw, bool ena_mib)
+{
+	struct ice_port_info *pi = hw->port_info;
+	enum ice_status ret;
+
+	if (!hw->func_caps.common_cap.dcb)
+		return ICE_ERR_NOT_SUPPORTED;
+
+	/* Get DCBX status */
+	pi->dcbx_status = ice_get_dcbx_status(hw);
+
+	if (pi->dcbx_status == ICE_DCBX_STATUS_DIS)
+		return ICE_ERR_NOT_READY;
+
+	ret = ice_aq_cfg_lldp_mib_change(hw, ena_mib, NULL);
 	if (!ret)
-		pi->is_sw_lldp = false;
+		pi->is_sw_lldp = !ena_mib;
 
 	return ret;
 }
diff --git a/drivers/net/ethernet/intel/ice/ice_dcb.h b/drivers/net/ethernet/intel/ice/ice_dcb.h
index 522e1452abe2..ee138f9bdc7c 100644
--- a/drivers/net/ethernet/intel/ice/ice_dcb.h
+++ b/drivers/net/ethernet/intel/ice/ice_dcb.h
@@ -125,7 +125,7 @@ ice_aq_get_dcb_cfg(struct ice_hw *hw, u8 mib_type, u8 bridgetype,
 		   struct ice_dcbx_cfg *dcbcfg);
 enum ice_status ice_get_dcb_cfg(struct ice_port_info *pi);
 enum ice_status ice_set_dcb_cfg(struct ice_port_info *pi);
-enum ice_status ice_init_dcb(struct ice_hw *hw);
+enum ice_status ice_init_dcb(struct ice_hw *hw, bool enable_mib_change);
 enum ice_status
 ice_query_port_ets(struct ice_port_info *pi,
 		   struct ice_aqc_port_ets_elem *buf, u16 buf_size,
@@ -139,9 +139,7 @@ ice_aq_start_lldp(struct ice_hw *hw, bool persist, struct ice_sq_cd *cd);
 enum ice_status
 ice_aq_start_stop_dcbx(struct ice_hw *hw, bool start_dcbx_agent,
 		       bool *dcbx_agent_status, struct ice_sq_cd *cd);
-enum ice_status
-ice_aq_cfg_lldp_mib_change(struct ice_hw *hw, bool ena_update,
-			   struct ice_sq_cd *cd);
+enum ice_status ice_cfg_lldp_mib_change(struct ice_hw *hw, bool ena_mib);
 #else /* CONFIG_DCB */
 static inline enum ice_status
 ice_aq_stop_lldp(struct ice_hw __always_unused *hw,
@@ -172,9 +170,8 @@ ice_aq_start_stop_dcbx(struct ice_hw __always_unused *hw,
 }
 
 static inline enum ice_status
-ice_aq_cfg_lldp_mib_change(struct ice_hw __always_unused *hw,
-			   bool __always_unused ena_update,
-			   struct ice_sq_cd __always_unused *cd)
+ice_cfg_lldp_mib_change(struct ice_hw __always_unused *hw,
+			bool __always_unused ena_mib)
 {
 	return 0;
 }
diff --git a/drivers/net/ethernet/intel/ice/ice_dcb_lib.c b/drivers/net/ethernet/intel/ice/ice_dcb_lib.c
index 20f440a64650..97c22d4aae1d 100644
--- a/drivers/net/ethernet/intel/ice/ice_dcb_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_dcb_lib.c
@@ -318,7 +318,7 @@ void ice_dcb_rebuild(struct ice_pf *pf)
 		goto dcb_error;
 	}
 
-	ice_init_dcb(&pf->hw);
+	ice_init_dcb(&pf->hw, true);
 	if (pf->hw.port_info->dcbx_status == ICE_DCBX_STATUS_DIS)
 		pf->hw.port_info->is_sw_lldp = true;
 	else
@@ -451,7 +451,7 @@ int ice_init_pf_dcb(struct ice_pf *pf, bool locked)
 
 	port_info = hw->port_info;
 
-	err = ice_init_dcb(hw);
+	err = ice_init_dcb(hw, false);
 	if (err && !port_info->is_sw_lldp) {
 		dev_err(&pf->pdev->dev, "Error initializing DCB %d\n", err);
 		goto dcb_init_err;
diff --git a/drivers/net/ethernet/intel/ice/ice_ethtool.c b/drivers/net/ethernet/intel/ice/ice_ethtool.c
index ae9921b7de7b..d5db1426d484 100644
--- a/drivers/net/ethernet/intel/ice/ice_ethtool.c
+++ b/drivers/net/ethernet/intel/ice/ice_ethtool.c
@@ -1206,8 +1206,8 @@ static int ice_set_priv_flags(struct net_device *netdev, u32 flags)
 			enum ice_status status;
 
 			/* Disable FW LLDP engine */
-			status = ice_aq_cfg_lldp_mib_change(&pf->hw, false,
-							    NULL);
+			status = ice_cfg_lldp_mib_change(&pf->hw, false);
+
 			/* If unregistering for LLDP events fails, this is
 			 * not an error state, as there shouldn't be any
 			 * events to respond to.
@@ -1273,6 +1273,12 @@ static int ice_set_priv_flags(struct net_device *netdev, u32 flags)
 			 * The FW LLDP engine will now be consuming them.
 			 */
 			ice_cfg_sw_lldp(vsi, false, false);
+
+			/* Register for MIB change events */
+			status = ice_cfg_lldp_mib_change(&pf->hw, true);
+			if (status)
+				dev_dbg(&pf->pdev->dev,
+					"Fail to enable MIB change events\n");
 		}
 	}
 	clear_bit(ICE_FLAG_ETHTOOL_CTXT, pf->flags);
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 8bb3b81876a9..2d92d8591a8a 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -2536,6 +2536,8 @@ ice_probe(struct pci_dev *pdev, const struct pci_device_id __always_unused *ent)
 		if (ice_init_pf_dcb(pf, false)) {
 			clear_bit(ICE_FLAG_DCB_CAPABLE, pf->flags);
 			clear_bit(ICE_FLAG_DCB_ENA, pf->flags);
+		} else {
+			ice_cfg_lldp_mib_change(&pf->hw, true);
 		}
 	}
 
-- 
2.21.0
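
A rough model of the delayed registration, for readers who want the
control flow without the admin queue details. All names here
(sketch_port, sketch_init_dcb, sketch_cfg_mib_change and the status
values) are hypothetical, and the firmware command is replaced by a
flag, so treat this as a sketch under those assumptions rather than
the driver's implementation.

```c
#include <stdbool.h>

enum sketch_status {
	SKETCH_OK = 0,
	SKETCH_ERR_NOT_SUPPORTED = -1,
	SKETCH_ERR_NOT_READY = -2,
};

/* Hypothetical stand-in for the port and capability state. */
struct sketch_port {
	bool dcb_capable;   /* function reports the DCB capability */
	bool dcbx_disabled; /* DCBX agent is disabled in firmware */
	bool mib_events_on; /* registered for MIB change events */
	bool is_sw_lldp;    /* LLDP handled in software */
};

/* Register or unregister for MIB change events at any later time. */
enum sketch_status sketch_cfg_mib_change(struct sketch_port *p, bool ena)
{
	if (!p->dcb_capable)
		return SKETCH_ERR_NOT_SUPPORTED;
	if (p->dcbx_disabled)
		return SKETCH_ERR_NOT_READY;

	p->mib_events_on = ena; /* stands in for the firmware command */
	p->is_sw_lldp = !ena;
	return SKETCH_OK;
}

/* Initialise DCB; registering for events is now optional. */
enum sketch_status sketch_init_dcb(struct sketch_port *p,
				   bool enable_mib_change)
{
	/* reading the DCB configuration from firmware is omitted */
	if (enable_mib_change)
		return sketch_cfg_mib_change(p, true);
	return SKETCH_OK;
}
```

In the probe path the patch calls the init function with false and
registers afterwards, once DCB initialisation has succeeded; the
rebuild path keeps the old behaviour by passing true.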



* [net-next 10/16] ice: Check for DCB capability before initializing DCB
From: Jeff Kirsher @ 2019-09-05 20:34 UTC (permalink / raw)
  To: davem
  Cc: Anirudh Venkataramanan, netdev, nhorman, sassmann, Andrew Bowers,
	Jeff Kirsher
In-Reply-To: <20190905203406.4152-1-jeffrey.t.kirsher@intel.com>

From: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>

Set ICE_FLAG_DCB_CAPABLE in ice_init_pf based on the function's DCB
capability, and check the flag before calling ice_init_pf_dcb.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_dcb_lib.c |  3 ---
 drivers/net/ethernet/intel/ice/ice_main.c    | 15 ++++++++-------
 2 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_dcb_lib.c b/drivers/net/ethernet/intel/ice/ice_dcb_lib.c
index e922adf1fa15..20f440a64650 100644
--- a/drivers/net/ethernet/intel/ice/ice_dcb_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_dcb_lib.c
@@ -474,7 +474,6 @@ int ice_init_pf_dcb(struct ice_pf *pf, bool locked)
 		}
 
 		pf->dcbx_cap = DCB_CAP_DCBX_HOST | DCB_CAP_DCBX_VER_IEEE;
-		set_bit(ICE_FLAG_DCB_CAPABLE, pf->flags);
 		return 0;
 	}
 
@@ -483,8 +482,6 @@ int ice_init_pf_dcb(struct ice_pf *pf, bool locked)
 	/* DCBX in FW and LLDP enabled in FW */
 	pf->dcbx_cap = DCB_CAP_DCBX_LLD_MANAGED | DCB_CAP_DCBX_VER_IEEE;
 
-	set_bit(ICE_FLAG_DCB_CAPABLE, pf->flags);
-
 	err = ice_dcb_init_cfg(pf, locked);
 	if (err)
 		goto dcb_init_err;
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 703fc7bf2b31..8bb3b81876a9 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -2252,6 +2252,8 @@ static void ice_deinit_pf(struct ice_pf *pf)
 static int ice_init_pf(struct ice_pf *pf)
 {
 	bitmap_zero(pf->flags, ICE_PF_FLAGS_NBITS);
+	if (pf->hw.func_caps.common_cap.dcb)
+		set_bit(ICE_FLAG_DCB_CAPABLE, pf->flags);
 #ifdef CONFIG_PCI_IOV
 	if (pf->hw.func_caps.common_cap.sr_iov_1_1) {
 		struct ice_hw *hw = &pf->hw;
@@ -2529,13 +2531,12 @@ ice_probe(struct pci_dev *pdev, const struct pci_device_id __always_unused *ent)
 		goto err_init_pf_unroll;
 	}
 
-	err = ice_init_pf_dcb(pf, false);
-	if (err) {
-		clear_bit(ICE_FLAG_DCB_CAPABLE, pf->flags);
-		clear_bit(ICE_FLAG_DCB_ENA, pf->flags);
-
-		/* do not fail overall init if DCB init fails */
-		err = 0;
+	if (test_bit(ICE_FLAG_DCB_CAPABLE, pf->flags)) {
+		/* Note: DCB init failure is non-fatal to load */
+		if (ice_init_pf_dcb(pf, false)) {
+			clear_bit(ICE_FLAG_DCB_CAPABLE, pf->flags);
+			clear_bit(ICE_FLAG_DCB_ENA, pf->flags);
+		}
 	}
 
 	ice_determine_q_usage(pf);
-- 
2.21.0
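
The resulting probe flow can be approximated as below. The structure
and function names are hypothetical, and DCB initialisation is
reduced to a caller-supplied success or failure, so this is a sketch
of the ordering only.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the PF flags touched by this patch. */
struct sketch_pf {
	bool fw_reports_dcb; /* capability reported for the function */
	bool dcb_capable;    /* models ICE_FLAG_DCB_CAPABLE */
	bool dcb_ena;        /* models ICE_FLAG_DCB_ENA */
	int dcb_init_calls;  /* how often DCB init was attempted */
};

/* The capable flag is now derived from the capability at PF init. */
void sketch_init_pf(struct sketch_pf *pf)
{
	pf->dcb_capable = pf->fw_reports_dcb;
	pf->dcb_ena = false;
	pf->dcb_init_calls = 0;
}

/* Stand-in for DCB init; 'fail' forces an error return. */
int sketch_init_pf_dcb(struct sketch_pf *pf, bool fail)
{
	pf->dcb_init_calls++;
	if (fail)
		return -1;
	pf->dcb_ena = true;
	return 0;
}

/* Probe only attempts DCB init on DCB-capable functions. */
int sketch_probe(struct sketch_pf *pf, bool dcb_init_fails)
{
	sketch_init_pf(pf);

	if (pf->dcb_capable) {
		/* DCB init failure is non-fatal to load */
		if (sketch_init_pf_dcb(pf, dcb_init_fails)) {
			pf->dcb_capable = false;
			pf->dcb_ena = false;
		}
	}
	return 0;
}
```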



* [net-next 02/16] ice: Add ice_get_main_vsi to get PF/main VSI
From: Jeff Kirsher @ 2019-09-05 20:33 UTC (permalink / raw)
  To: davem
  Cc: Anirudh Venkataramanan, netdev, nhorman, sassmann, Tony Nguyen,
	Andrew Bowers, Jeff Kirsher
In-Reply-To: <20190905203406.4152-1-jeffrey.t.kirsher@intel.com>

From: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>

There are multiple places where we currently use ice_find_vsi_by_type
to get the PF (a.k.a. main) VSI. The PF VSI by definition is always
the first element in the pf->vsi array (i.e. pf->vsi[0]). So instead
add and use a new helper function ice_get_main_vsi, which just returns
pf->vsi[0].

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ice/ice.h      | 20 +++++++-------------
 drivers/net/ethernet/intel/ice/ice_main.c |  6 +++---
 2 files changed, 10 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index fb2bc836b20a..bbb3c290a0bf 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -425,21 +425,15 @@ ice_irq_dynamic_ena(struct ice_hw *hw, struct ice_vsi *vsi,
 }
 
 /**
- * ice_find_vsi_by_type - Find and return VSI of a given type
- * @pf: PF to search for VSI
- * @type: Value indicating type of VSI we are looking for
+ * ice_get_main_vsi - Get the PF VSI
+ * @pf: PF instance
+ *
+ * returns pf->vsi[0], which by definition is the PF VSI
  */
-static inline struct ice_vsi *
-ice_find_vsi_by_type(struct ice_pf *pf, enum ice_vsi_type type)
+static inline struct ice_vsi *ice_get_main_vsi(struct ice_pf *pf)
 {
-	int i;
-
-	for (i = 0; i < pf->num_alloc_vsi; i++) {
-		struct ice_vsi *vsi = pf->vsi[i];
-
-		if (vsi && vsi->type == type)
-			return vsi;
-	}
+	if (pf->vsi)
+		return pf->vsi[0];
 
 	return NULL;
 }
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 50a17a0337be..703fc7bf2b31 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -120,7 +120,7 @@ static int ice_init_mac_fltr(struct ice_pf *pf)
 	u8 broadcast[ETH_ALEN];
 	struct ice_vsi *vsi;
 
-	vsi = ice_find_vsi_by_type(pf, ICE_VSI_PF);
+	vsi = ice_get_main_vsi(pf);
 	if (!vsi)
 		return -EINVAL;
 
@@ -826,7 +826,7 @@ ice_link_event(struct ice_pf *pf, struct ice_port_info *pi, bool link_up,
 	if (link_up == old_link && link_speed == old_link_speed)
 		return result;
 
-	vsi = ice_find_vsi_by_type(pf, ICE_VSI_PF);
+	vsi = ice_get_main_vsi(pf);
 	if (!vsi || !vsi->port_info)
 		return -EINVAL;
 
@@ -1439,7 +1439,7 @@ static void ice_check_media_subtask(struct ice_pf *pf)
 	struct ice_vsi *vsi;
 	int err;
 
-	vsi = ice_find_vsi_by_type(pf, ICE_VSI_PF);
+	vsi = ice_get_main_vsi(pf);
 	if (!vsi)
 		return;
 
-- 
2.21.0
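
To make the equivalence concrete, here is a standalone sketch of both
lookups. The sketch_ types are hypothetical, and the sketch assumes,
as the commit message states, that the PF VSI is always stored in
slot 0 of the array.

```c
#include <stddef.h>

enum sketch_vsi_type { SKETCH_VSI_PF, SKETCH_VSI_VF };

struct sketch_vsi {
	enum sketch_vsi_type type;
};

struct sketch_pf {
	struct sketch_vsi **vsi; /* array of VSI pointers, may be NULL */
	int num_alloc_vsi;       /* number of slots in the array */
};

/* Old approach: linear search for the first VSI of a given type. */
struct sketch_vsi *sketch_find_vsi_by_type(const struct sketch_pf *pf,
					   enum sketch_vsi_type type)
{
	int i;

	for (i = 0; i < pf->num_alloc_vsi; i++) {
		struct sketch_vsi *vsi = pf->vsi[i];

		if (vsi && vsi->type == type)
			return vsi;
	}
	return NULL;
}

/* New approach: the main VSI is by definition the first element. */
struct sketch_vsi *sketch_get_main_vsi(const struct sketch_pf *pf)
{
	if (pf->vsi)
		return pf->vsi[0];
	return NULL;
}
```

Both functions return the same pointer as long as slot 0 holds the PF
VSI; the helper simply skips the search.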



* [net-next 04/16] ice: clean up arguments
From: Jeff Kirsher @ 2019-09-05 20:33 UTC (permalink / raw)
  To: davem
  Cc: Jesse Brandeburg, netdev, nhorman, sassmann, Tony Nguyen,
	Andrew Bowers, Jeff Kirsher
In-Reply-To: <20190905203406.4152-1-jeffrey.t.kirsher@intel.com>

From: Jesse Brandeburg <jesse.brandeburg@intel.com>

A couple of functions take two arguments even though the second
argument already gives access to the pointer passed as the first.

Remove the unnecessary arguments.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_txrx.c | 43 +++++++++++------------
 1 file changed, 21 insertions(+), 22 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c
index 5bf5c179a738..4fe1b332e67e 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
@@ -95,17 +95,16 @@ void ice_free_tx_ring(struct ice_ring *tx_ring)
 
 /**
  * ice_clean_tx_irq - Reclaim resources after transmit completes
- * @vsi: the VSI we care about
  * @tx_ring: Tx ring to clean
  * @napi_budget: Used to determine if we are in netpoll
  *
  * Returns true if there's any budget left (e.g. the clean is finished)
  */
-static bool
-ice_clean_tx_irq(struct ice_vsi *vsi, struct ice_ring *tx_ring, int napi_budget)
+static bool ice_clean_tx_irq(struct ice_ring *tx_ring, int napi_budget)
 {
 	unsigned int total_bytes = 0, total_pkts = 0;
-	unsigned int budget = vsi->work_lmt;
+	unsigned int budget = ICE_DFLT_IRQ_WORK;
+	struct ice_vsi *vsi = tx_ring->vsi;
 	s16 i = tx_ring->next_to_clean;
 	struct ice_tx_desc *tx_desc;
 	struct ice_tx_buf *tx_buf;
@@ -114,6 +113,8 @@ ice_clean_tx_irq(struct ice_vsi *vsi, struct ice_ring *tx_ring, int napi_budget)
 	tx_desc = ICE_TX_DESC(tx_ring, i);
 	i -= tx_ring->count;
 
+	prefetch(&vsi->state);
+
 	do {
 		struct ice_tx_desc *eop_desc = tx_buf->next_to_watch;
 
@@ -206,7 +207,7 @@ ice_clean_tx_irq(struct ice_vsi *vsi, struct ice_ring *tx_ring, int napi_budget)
 		smp_mb();
 		if (__netif_subqueue_stopped(tx_ring->netdev,
 					     tx_ring->q_index) &&
-		   !test_bit(__ICE_DOWN, vsi->state)) {
+		    !test_bit(__ICE_DOWN, vsi->state)) {
 			netif_wake_subqueue(tx_ring->netdev,
 					    tx_ring->q_index);
 			++tx_ring->tx_stats.restart_q;
@@ -879,7 +880,7 @@ ice_rx_hash(struct ice_ring *rx_ring, union ice_32b_rx_flex_desc *rx_desc,
 
 /**
  * ice_rx_csum - Indicate in skb if checksum is good
- * @vsi: the VSI we care about
+ * @ring: the ring we care about
  * @skb: skb currently being received and modified
  * @rx_desc: the receive descriptor
  * @ptype: the packet type decoded by hardware
@@ -887,7 +888,7 @@ ice_rx_hash(struct ice_ring *rx_ring, union ice_32b_rx_flex_desc *rx_desc,
  * skb->protocol must be set before this function is called
  */
 static void
-ice_rx_csum(struct ice_vsi *vsi, struct sk_buff *skb,
+ice_rx_csum(struct ice_ring *ring, struct sk_buff *skb,
 	    union ice_32b_rx_flex_desc *rx_desc, u8 ptype)
 {
 	struct ice_rx_ptype_decoded decoded;
@@ -904,7 +905,7 @@ ice_rx_csum(struct ice_vsi *vsi, struct sk_buff *skb,
 	skb_checksum_none_assert(skb);
 
 	/* check if Rx checksum is enabled */
-	if (!(vsi->netdev->features & NETIF_F_RXCSUM))
+	if (!(ring->netdev->features & NETIF_F_RXCSUM))
 		return;
 
 	/* check if HW has decoded the packet and checksum */
@@ -944,7 +945,7 @@ ice_rx_csum(struct ice_vsi *vsi, struct sk_buff *skb,
 	return;
 
 checksum_fail:
-	vsi->back->hw_csum_rx_error++;
+	ring->vsi->back->hw_csum_rx_error++;
 }
 
 /**
@@ -968,7 +969,7 @@ ice_process_skb_fields(struct ice_ring *rx_ring,
 	/* modifies the skb - consumes the enet header */
 	skb->protocol = eth_type_trans(skb, rx_ring->netdev);
 
-	ice_rx_csum(rx_ring->vsi, skb, rx_desc, ptype);
+	ice_rx_csum(rx_ring, skb, rx_desc, ptype);
 }
 
 /**
@@ -1354,14 +1355,13 @@ static u32 ice_buildreg_itr(u16 itr_idx, u16 itr)
 
 /**
  * ice_update_ena_itr - Update ITR and re-enable MSIX interrupt
- * @vsi: the VSI associated with the q_vector
  * @q_vector: q_vector for which ITR is being updated and interrupt enabled
  */
-static void
-ice_update_ena_itr(struct ice_vsi *vsi, struct ice_q_vector *q_vector)
+static void ice_update_ena_itr(struct ice_q_vector *q_vector)
 {
 	struct ice_ring_container *tx = &q_vector->tx;
 	struct ice_ring_container *rx = &q_vector->rx;
+	struct ice_vsi *vsi = q_vector->vsi;
 	u32 itr_val;
 
 	/* when exiting WB_ON_ITR lets set a low ITR value and trigger
@@ -1419,15 +1419,14 @@ ice_update_ena_itr(struct ice_vsi *vsi, struct ice_q_vector *q_vector)
 			q_vector->itr_countdown--;
 	}
 
-	if (!test_bit(__ICE_DOWN, vsi->state))
-		wr32(&vsi->back->hw,
+	if (!test_bit(__ICE_DOWN, q_vector->vsi->state))
+		wr32(&q_vector->vsi->back->hw,
 		     GLINT_DYN_CTL(q_vector->reg_idx),
 		     itr_val);
 }
 
 /**
  * ice_set_wb_on_itr - set WB_ON_ITR for this q_vector
- * @vsi: pointer to the VSI structure
  * @q_vector: q_vector to set WB_ON_ITR on
  *
  * We need to tell hardware to write-back completed descriptors even when
@@ -1440,9 +1439,10 @@ ice_update_ena_itr(struct ice_vsi *vsi, struct ice_q_vector *q_vector)
  * value that's not 0 due to ITR granularity. Also, set the INTENA_MSK bit to
  * make sure hardware knows we aren't meddling with the INTENA_M bit.
  */
-static void
-ice_set_wb_on_itr(struct ice_vsi *vsi, struct ice_q_vector *q_vector)
+static void ice_set_wb_on_itr(struct ice_q_vector *q_vector)
 {
+	struct ice_vsi *vsi = q_vector->vsi;
+
 	/* already in WB_ON_ITR mode no need to change it */
 	if (q_vector->itr_countdown == ICE_IN_WB_ON_ITR_MODE)
 		return;
@@ -1473,7 +1473,6 @@ int ice_napi_poll(struct napi_struct *napi, int budget)
 {
 	struct ice_q_vector *q_vector =
 				container_of(napi, struct ice_q_vector, napi);
-	struct ice_vsi *vsi = q_vector->vsi;
 	bool clean_complete = true;
 	struct ice_ring *ring;
 	int budget_per_ring;
@@ -1483,7 +1482,7 @@ int ice_napi_poll(struct napi_struct *napi, int budget)
 	 * budget and be more aggressive about cleaning up the Tx descriptors.
 	 */
 	ice_for_each_ring(ring, q_vector->tx)
-		if (!ice_clean_tx_irq(vsi, ring, budget))
+		if (!ice_clean_tx_irq(ring, budget))
 			clean_complete = false;
 
 	/* Handle case where we are called by netpoll with a budget of 0 */
@@ -1519,9 +1518,9 @@ int ice_napi_poll(struct napi_struct *napi, int budget)
 	 * poll us due to busy-polling
 	 */
 	if (likely(napi_complete_done(napi, work_done)))
-		ice_update_ena_itr(vsi, q_vector);
+		ice_update_ena_itr(q_vector);
 	else
-		ice_set_wb_on_itr(vsi, q_vector);
+		ice_set_wb_on_itr(q_vector);
 
 	return min_t(int, work_done, budget - 1);
 }
-- 
2.21.0
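
The pattern behind this cleanup, reduced to a standalone sketch with
hypothetical types: the ring carries a back-pointer to its VSI, so a
function that receives the ring does not also need the VSI as a
parameter.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the VSI and ring structures. */
struct sketch_vsi {
	bool down; /* models the __ICE_DOWN state bit */
};

struct sketch_ring {
	struct sketch_vsi *vsi; /* back-pointer to the owning VSI */
	bool queue_stopped;     /* Tx queue is currently stopped */
};

/*
 * Decide whether a stopped Tx queue may be woken. The VSI is reached
 * through the ring instead of being passed as a separate argument.
 */
bool sketch_ring_may_wake_queue(const struct sketch_ring *ring)
{
	const struct sketch_vsi *vsi = ring->vsi;

	return ring->queue_stopped && !vsi->down;
}
```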



* [net-next 00/16][pull request] 100GbE Intel Wired LAN Driver Updates 2019-09-05
From: Jeff Kirsher @ 2019-09-05 20:33 UTC (permalink / raw)
  To: davem; +Cc: Jeff Kirsher, netdev, nhorman, sassmann

This series contains updates to the ice driver.

Brett fixes the setting of num_q_vectors by using the larger of the
allocated transmit and receive queue counts.

Anirudh simplifies the code to use a helper function to return the main
VSI, which is the first element in the pf->vsi array.  Adds a pointer
check to prevent a NULL pointer dereference.  Adds a check to ensure we
do not initialize DCB on devices that are not DCB capable.  Does some
housekeeping on the code to remove unnecessary indirection and reduce
the PF structure by removing elements that are not needed, since the
values they stored can be readily obtained from the
ice_get_avail_*_count() functions.  Updates the printed strings to make
it easier to search the logs for driver capabilities.

Jesse cleans up unnecessary function arguments.  Updates the code to use
prefetch() to avoid a cache miss.  Does some housekeeping on the code to
remove the transmit work limit that was configurable via ethtool, which
ended up creating performance overhead.  Makes an additional performance
enhancement by changing the default number of descriptors to 2048, so
that the driver starts out with a reasonable number.

Mitch fixes the reset logic for VFs by clearing VF_MBX_ARQLEN register
when the source of the reset is not PFR.

Lukasz ports a fix from the i40e driver that reports link down for VFs
when the PF has not enabled their queues.

Akeem updates the driver to report the VF link status once we get VF
resources so that we can reflect the link status similarly to how the PF
reports link speed.

Ashish updates the transmit context structure based on recent changes to
the hardware specification.

Dave updates the DCB logic to allow a delayed registration for MIB
change events so that the driver is not accepting events before it is
ready for them.

The following are changes since commit 0e5b36bc4c1fccfc18dd851d960781589c16dae8:
  r8152: adjust the settings of ups flags
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue 100GbE

Akeem G Abodunrin (1):
  ice: Report VF link status with opcode to get resources

Anirudh Venkataramanan (5):
  ice: Add ice_get_main_vsi to get PF/main VSI
  ice: Check root pointer for validity
  ice: Check for DCB capability before initializing DCB
  ice: Minor refactor in queue management
  ice: Rework around device/function capabilities

Ashish Shah (1):
  ice: update Tx context struct

Brett Creeley (1):
  ice: Update fields in ice_vsi_set_num_qs when reconfiguring

Dave Ertman (1):
  ice: Allow for delayed LLDP MIB change registration

Jesse Brandeburg (5):
  ice: clean up arguments
  ice: move code closer together
  ice: small efficiency fixes
  ice: change work limit to a constant
  ice: change default number of receive descriptors

Lukasz Czapnik (1):
  ice: report link down for VF when PF's queues are not enabled

Mitch Williams (1):
  ice: Reliably reset VFs

 drivers/net/ethernet/intel/ice/ice.h          | 46 +++---------
 drivers/net/ethernet/intel/ice/ice_common.c   | 43 +++++------
 drivers/net/ethernet/intel/ice/ice_dcb.c      | 39 +++++++++-
 drivers/net/ethernet/intel/ice/ice_dcb.h      | 11 +--
 drivers/net/ethernet/intel/ice/ice_dcb_lib.c  |  7 +-
 drivers/net/ethernet/intel/ice/ice_ethtool.c  | 24 +++---
 .../net/ethernet/intel/ice/ice_lan_tx_rx.h    |  1 +
 drivers/net/ethernet/intel/ice/ice_lib.c      | 29 ++++----
 drivers/net/ethernet/intel/ice/ice_main.c     | 73 +++++++++++--------
 drivers/net/ethernet/intel/ice/ice_sched.c    |  2 +-
 drivers/net/ethernet/intel/ice/ice_txrx.c     | 53 +++++++-------
 .../net/ethernet/intel/ice/ice_virtchnl_pf.c  | 36 +++++----
 12 files changed, 195 insertions(+), 169 deletions(-)

-- 
2.21.0

^ permalink raw reply

* [net-next 03/16] ice: Check root pointer for validity
From: Jeff Kirsher @ 2019-09-05 20:33 UTC (permalink / raw)
  To: davem
  Cc: Anirudh Venkataramanan, netdev, nhorman, sassmann, Andrew Bowers,
	Jeff Kirsher
In-Reply-To: <20190905203406.4152-1-jeffrey.t.kirsher@intel.com>

From: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>

ice_sched_get_tc_node uses pi->root without checking for NULL. Add a
check to prevent NULL pointer dereference.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_sched.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_sched.c b/drivers/net/ethernet/intel/ice/ice_sched.c
index 79d64f9ed609..fc624b73d05d 100644
--- a/drivers/net/ethernet/intel/ice/ice_sched.c
+++ b/drivers/net/ethernet/intel/ice/ice_sched.c
@@ -284,7 +284,7 @@ struct ice_sched_node *ice_sched_get_tc_node(struct ice_port_info *pi, u8 tc)
 {
 	u8 i;
 
-	if (!pi)
+	if (!pi || !pi->root)
 		return NULL;
 	for (i = 0; i < pi->root->num_children; i++)
 		if (pi->root->children[i]->tc_num == tc)
-- 
2.21.0
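
A minimal userspace sketch of the guard this patch adds, for readers without the driver source at hand. The struct layouts and the name get_tc_node() are simplified stand-ins assumed for illustration; they are not the driver's real definitions.

```c
#include <stddef.h>

/* Simplified stand-ins for the scheduler structures (assumed layout). */
struct sched_node {
	struct sched_node **children;
	unsigned char num_children;
	unsigned char tc_num;
};

struct port_info {
	struct sched_node *root;
};

/* Return the child of the root whose tc_num matches, or NULL if none. */
struct sched_node *get_tc_node(struct port_info *pi, unsigned char tc)
{
	unsigned char i;

	/* Both the port and its root must exist before root is dereferenced. */
	if (!pi || !pi->root)
		return NULL;
	for (i = 0; i < pi->root->num_children; i++)
		if (pi->root->children[i]->tc_num == tc)
			return pi->root->children[i];
	return NULL;
}
```

With only the original "if (!pi)" test, a port whose root has not been set up yet would fault in the loop header; with the extra test the lookup simply reports "not found".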


^ permalink raw reply related

* RE: [Intel-wired-lan] [PATCH] iavf: fix MAC address setting for VFs when filter is rejected
From: Bowers, AndrewX @ 2019-09-05 20:32 UTC (permalink / raw)
  To: intel-wired-lan@lists.osuosl.org; +Cc: netdev@vger.kernel.org
In-Reply-To: <20190905063422.28743-1-sassmann@kpanic.de>

> -----Original Message-----
> From: Intel-wired-lan [mailto:intel-wired-lan-bounces@osuosl.org] On
> Behalf Of Stefan Assmann
> Sent: Wednesday, September 4, 2019 11:34 PM
> To: intel-wired-lan@lists.osuosl.org
> Cc: netdev@vger.kernel.org; davem@davemloft.net; sassmann@kpanic.de
> Subject: [Intel-wired-lan] [PATCH] iavf: fix MAC address setting for VFs when
> filter is rejected
> 
> Currently iavf unconditionally applies MAC address change requests. This
> brings the VF into a state where it is no longer able to pass traffic if the PF
> rejects a MAC filter change for the VF.
> A typical scenario for a rejected MAC filter is for an untrusted VF to request
> to change the MAC address when an administratively set MAC is present.
> 
> To keep iavf working in this scenario the MAC filter handling in iavf needs to
> act on the PF reply regarding the MAC filter change. In the case of an ack the
> new MAC address gets set, whereas in the case of a nack the previous MAC
> address needs to stay in place.
> 
> Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
> ---
>  drivers/net/ethernet/intel/iavf/iavf_main.c     | 1 -
>  drivers/net/ethernet/intel/iavf/iavf_virtchnl.c | 7 +++++++
>  2 files changed, 7 insertions(+), 1 deletion(-)

Tested-by: Andrew Bowers <andrewx.bowers@intel.com>



^ permalink raw reply

* Re: [PATCH net] tcp: ulp: fix possible crash in tcp_diag_get_aux_size()
From: Eric Dumazet @ 2019-09-05 20:21 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Eric Dumazet, Luke Hsiao, Neal Cardwell, Davide Caratti
In-Reply-To: <20190905202041.138085-1-edumazet@google.com>

On Thu, Sep 5, 2019 at 10:20 PM Eric Dumazet <edumazet@google.com> wrote:
>
> tcp_diag_get_aux_size() can be called with sockets in any state.
>
> icsk_ulp_ops is only present for full sockets.
>
> For SYN_RECV or TIME_WAIT ones we would access garbage.
>
> Fixes: 61723b393292 ("tcp: ulp: add functions to dump ulp-specific information")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: Luke Hsiao <lukehsiao@google.com>
> Reported-by: Neal Cardwell <ncardwell@google.com>
> Cc: Davide Caratti <dcaratti@redhat.com>
> ---

Sorry for the 'net' tag. This patch targets net-next tree only.

^ permalink raw reply

* [PATCH net] tcp: ulp: fix possible crash in tcp_diag_get_aux_size()
From: Eric Dumazet @ 2019-09-05 20:20 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Eric Dumazet, Eric Dumazet, Luke Hsiao, Neal Cardwell,
	Davide Caratti

tcp_diag_get_aux_size() can be called with sockets in any state.

icsk_ulp_ops is only present for full sockets.

For SYN_RECV or TIME_WAIT ones we would access garbage.

Fixes: 61723b393292 ("tcp: ulp: add functions to dump ulp-specific information")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Luke Hsiao <lukehsiao@google.com>
Reported-by: Neal Cardwell <ncardwell@google.com>
Cc: Davide Caratti <dcaratti@redhat.com>
---
 net/ipv4/tcp_diag.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_diag.c b/net/ipv4/tcp_diag.c
index babc156deabb573f11bba344215d2c3712c4a3cd..81a8221d650a94be53d17354c60ddd0c655eaccf 100644
--- a/net/ipv4/tcp_diag.c
+++ b/net/ipv4/tcp_diag.c
@@ -163,7 +163,7 @@ static size_t tcp_diag_get_aux_size(struct sock *sk, bool net_admin)
 	}
 #endif
 
-	if (net_admin) {
+	if (net_admin && sk_fullsock(sk)) {
 		const struct tcp_ulp_ops *ulp_ops;
 
 		ulp_ops = icsk->icsk_ulp_ops;
-- 
2.23.0.187.g17f5b7556c-goog
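
The idea behind the one-line change can be sketched in plain C. Everything below (the state enum, struct mini_sock, is_fullsock(), aux_size(), demo_info_size()) is a hypothetical model chosen for illustration, not the kernel's types; it only shows why the ULP pointer must not be read for sockets that are not full sockets.

```c
#include <stdbool.h>
#include <stddef.h>

/* Reduced set of states; the two "mini socket" states carry no ULP data. */
enum mini_state { ST_ESTABLISHED, ST_NEW_SYN_RECV, ST_TIME_WAIT };

struct ulp_ops {
	size_t (*get_info_size)(void);
};

struct mini_sock {
	enum mini_state state;
	const struct ulp_ops *ulp_ops;	/* only valid for full sockets */
};

/* Model of the "is this a full socket" test. */
static bool is_fullsock(const struct mini_sock *sk)
{
	return sk->state != ST_NEW_SYN_RECV && sk->state != ST_TIME_WAIT;
}

/* Demo callback so the sketch has something to call. */
size_t demo_info_size(void)
{
	return 16;
}

/* Size of the ULP-specific dump, or 0 when it must not be consulted. */
size_t aux_size(const struct mini_sock *sk, bool net_admin)
{
	size_t size = 0;

	if (net_admin && is_fullsock(sk)) {
		const struct ulp_ops *ops = sk->ulp_ops;

		if (ops && ops->get_info_size)
			size += ops->get_info_size();
	}
	return size;
}
```

In the model, a TIME_WAIT socket may hold any garbage in ulp_ops and aux_size() still never touches it, which is the property the patch restores.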


^ permalink raw reply related

* general protection fault in dev_map_hash_update_elem
From: syzbot @ 2019-09-05 20:08 UTC (permalink / raw)
  To: ast, bpf, daniel, davem, hawk, jakub.kicinski, john.fastabend,
	kafai, linux-kernel, netdev, songliubraving, syzkaller-bugs, yhs

Hello,

syzbot found the following crash on:

HEAD commit:    6d028043 Add linux-next specific files for 20190830
git tree:       linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=135c1a92600000
kernel config:  https://syzkaller.appspot.com/x/.config?x=82a6bec43ab0cb69
dashboard link: https://syzkaller.appspot.com/bug?extid=4e7a85b1432052e8d6f8
compiler:       gcc (GCC) 9.0.0 20181231 (experimental)
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=109124e1600000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+4e7a85b1432052e8d6f8@syzkaller.appspotmail.com

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 1 PID: 10235 Comm: syz-executor.0 Not tainted 5.3.0-rc6-next-20190830 #75
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:__write_once_size include/linux/compiler.h:203 [inline]
RIP: 0010:__hlist_del include/linux/list.h:795 [inline]
RIP: 0010:hlist_del_rcu include/linux/rculist.h:475 [inline]
RIP: 0010:__dev_map_hash_update_elem kernel/bpf/devmap.c:668 [inline]
RIP: 0010:dev_map_hash_update_elem+0x3c8/0x6e0 kernel/bpf/devmap.c:691
Code: 48 89 f1 48 89 75 c8 48 c1 e9 03 80 3c 11 00 0f 85 d3 02 00 00 48 b9 00 00 00 00 00 fc ff df 48 8b 53 10 48 89 d6 48 c1 ee 03 <80> 3c 0e 00 0f 85 97 02 00 00 48 85 c0 48 89 02 74 38 48 89 55 b8
RSP: 0018:ffff88808d607c30 EFLAGS: 00010046
RAX: 0000000000000000 RBX: ffff8880a7f14580 RCX: dffffc0000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8880a7f14588
RBP: ffff88808d607c78 R08: 0000000000000004 R09: ffffed1011ac0f73
R10: ffffed1011ac0f72 R11: 0000000000000003 R12: ffff88809f4e9400
R13: ffff88809b06ba00 R14: 0000000000000000 R15: ffff88809f4e9528
FS:  00007f3a3d50c700(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007feb3fcd0000 CR3: 00000000986b9000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
  map_update_elem+0xc82/0x10b0 kernel/bpf/syscall.c:966
  __do_sys_bpf+0x8b5/0x3350 kernel/bpf/syscall.c:2854
  __se_sys_bpf kernel/bpf/syscall.c:2825 [inline]
  __x64_sys_bpf+0x73/0xb0 kernel/bpf/syscall.c:2825
  do_syscall_64+0xfa/0x760 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x459879
Code: fd b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b7 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f3a3d50bc78 EFLAGS: 00000246 ORIG_RAX: 0000000000000141
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000459879
RDX: 0000000000000020 RSI: 0000000020000040 RDI: 0000000000000002
RBP: 000000000075bf20 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00007f3a3d50c6d4
R13: 00000000004bfc86 R14: 00000000004d1960 R15: 00000000ffffffff
Modules linked in:
---[ end trace 083223e21dbd0ae5 ]---
RIP: 0010:__write_once_size include/linux/compiler.h:203 [inline]
RIP: 0010:__hlist_del include/linux/list.h:795 [inline]
RIP: 0010:hlist_del_rcu include/linux/rculist.h:475 [inline]
RIP: 0010:__dev_map_hash_update_elem kernel/bpf/devmap.c:668 [inline]
RIP: 0010:dev_map_hash_update_elem+0x3c8/0x6e0 kernel/bpf/devmap.c:691
Code: 48 89 f1 48 89 75 c8 48 c1 e9 03 80 3c 11 00 0f 85 d3 02 00 00 48 b9 00 00 00 00 00 fc ff df 48 8b 53 10 48 89 d6 48 c1 ee 03 <80> 3c 0e 00 0f 85 97 02 00 00 48 85 c0 48 89 02 74 38 48 89 55 b8
RSP: 0018:ffff88808d607c30 EFLAGS: 00010046
RAX: 0000000000000000 RBX: ffff8880a7f14580 RCX: dffffc0000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8880a7f14588
RBP: ffff88808d607c78 R08: 0000000000000004 R09: ffffed1011ac0f73
R10: ffffed1011ac0f72 R11: 0000000000000003 R12: ffff88809f4e9400
R13: ffff88809b06ba00 R14: 0000000000000000 R15: ffff88809f4e9528
FS:  00007f3a3d50c700(0000) GS:ffff8880ae900000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007feb3fcd0000 CR3: 00000000986b9000 CR4: 00000000001406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches

^ permalink raw reply

* Re: [PATCH net-next] net/mlx5: DR, Fix error return code in dr_domain_init_resources()
From: Saeed Mahameed @ 2019-09-05 19:37 UTC (permalink / raw)
  To: Erez Shitrit, weiyongjun1@huawei.com, Mark Bloch, leon@kernel.org,
	Alex Vesker
  Cc: kernel-janitors@vger.kernel.org, netdev@vger.kernel.org,
	linux-rdma@vger.kernel.org
In-Reply-To: <20190905095600.127371-1-weiyongjun1@huawei.com>

On Thu, 2019-09-05 at 09:56 +0000, Wei Yongjun wrote:
> Fix to return negative error code -ENOMEM from the error handling
> case instead of 0, as done elsewhere in this function.
> 
> Fixes: 4ec9e7b02697 ("net/mlx5: DR, Expose steering domain
> functionality")
> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
> 

Applied to net-next-mlx5.
Thanks !
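
For context, the class of bug fixed here can be sketched as follows. The names (struct domain, domain_init(), alloc_fail_second()) and the injected allocator are assumptions made so the failure path can be exercised in userspace; this is not the mlx5 code.

```c
#include <errno.h>
#include <stdlib.h>

struct domain {
	void *pool;
	void *table;
};

/* Test helper: succeeds on every call except the second one. */
static int alloc_calls;
void *alloc_fail_second(size_t n)
{
	return ++alloc_calls == 2 ? NULL : malloc(n);
}

int domain_init(struct domain *d, void *(*alloc)(size_t))
{
	int ret = 0;

	d->table = NULL;
	d->pool = alloc(64);
	if (!d->pool)
		return -ENOMEM;

	d->table = alloc(64);
	if (!d->table) {
		/* Without this assignment the error path would return 0. */
		ret = -ENOMEM;
		goto free_pool;
	}
	return 0;

free_pool:
	free(d->pool);
	d->pool = NULL;
	return ret;
}
```

A caller that only checks the return value would otherwise carry on with a half-initialised domain, so the shared error label has to be reached with ret already set.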


^ permalink raw reply

* Re: [PATCH net-next] net/mlx5: DR, Remove useless set memory to zero use memset()
From: Saeed Mahameed @ 2019-09-05 19:37 UTC (permalink / raw)
  To: Erez Shitrit, weiyongjun1@huawei.com, Mark Bloch, leon@kernel.org,
	Alex Vesker
  Cc: kernel-janitors@vger.kernel.org, netdev@vger.kernel.org,
	linux-rdma@vger.kernel.org
In-Reply-To: <20190905095326.127277-1-weiyongjun1@huawei.com>

On Thu, 2019-09-05 at 09:53 +0000, Wei Yongjun wrote:
> The memory returned by kzalloc() has already been set to zero, so
> remove the useless memset(0).
> 
> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>

Applied to net-next-mlx5.
Thanks !

^ permalink raw reply

* Hello! Are you interested in client databases?
From: netdev @ 2019-09-05 17:56 UTC (permalink / raw)
  To: netdev

Hello! Are you interested in client databases?

^ permalink raw reply

* Re: linux-next: build failure after merge of the net-next tree
From: Andrii Nakryiko @ 2019-09-05 19:26 UTC (permalink / raw)
  To: Masahiro Yamada
  Cc: Stephen Rothwell, David Miller, Networking,
	Linux Next Mailing List, Linux Kernel Mailing List,
	Andrii Nakryiko, Daniel Borkmann, Alexei Starovoitov
In-Reply-To: <CAK7LNAQEU6uu-Z=VeR2KNa8ezCLA7VHtpvM2tvAKsWtUTi6Eug@mail.gmail.com>

On Tue, Sep 3, 2019 at 11:20 PM Masahiro Yamada
<yamada.masahiro@socionext.com> wrote:
>
> On Wed, Sep 4, 2019 at 3:00 PM Stephen Rothwell <sfr@canb.auug.org.au> wrote:
> >
> > Hi all,
> >
> > After merging the net-next tree, today's linux-next build (arm
> > multi_v7_defconfig) failed like this:
> >
> > scripts/link-vmlinux.sh: 74: Bad substitution
> >
> > Caused by commit
> >
> >   341dfcf8d78e ("btf: expose BTF info through sysfs")
> >
> > interacting with commit
> >
> >   1267f9d3047d ("kbuild: add $(BASH) to run scripts with bash-extension")
> >
> > from the kbuild tree.
>
>
> I knew that they were using bash-extension
> in the #!/bin/sh script.  :-D
>
> In fact, I wrote my patch in order to break their code
> and  make btf people realize that they were doing wrong.

Was there a specific reason to wait until this would break during
Stephen's merge, instead of giving me a heads up (or just replying on
original patch) and letting me fix it and save everyone's time and
efforts?

Either way, I've fixed the issue in
https://patchwork.ozlabs.org/patch/1158620/ and will pay way more
attention to BASH-specific features going forward (I found it pretty
hard to verify stuff like this, unfortunately). But again, code review
process is the best place to catch this and I really hope in the
future we can keep this process productive. Thanks!

>
>
>
> > The change in the net-next tree turned link-vmlinux.sh into a bash script
> > (I think).
> >
> > I have applied the following patch for today:
>
>
> But, this is a temporary fix only for linux-next.
>
> scripts/link-vmlinux.sh does not need to use the
> bash-extension ${@:2} in the first place.
>
> I hope btf people will write the correct code.

I replaced ${@:2} with shift and ${@}, I hope that's a correct fix,
but if you think it's not, please reply on the patch and let me know.


>
> Thanks.
>
>
>
>
> > From: Stephen Rothwell <sfr@canb.auug.org.au>
> > Date: Wed, 4 Sep 2019 15:43:41 +1000
> > Subject: [PATCH] link-vmlinux.sh is now a bash script
> >
> > Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
> > ---
> >  Makefile                | 4 ++--
> >  scripts/link-vmlinux.sh | 2 +-
> >  2 files changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/Makefile b/Makefile
> > index ac97fb282d99..523d12c5cebe 100644
> > --- a/Makefile
> > +++ b/Makefile
> > @@ -1087,7 +1087,7 @@ ARCH_POSTLINK := $(wildcard $(srctree)/arch/$(SRCARCH)/Makefile.postlink)
> >
> >  # Final link of vmlinux with optional arch pass after final link
> >  cmd_link-vmlinux =                                                 \
> > -       $(CONFIG_SHELL) $< $(LD) $(KBUILD_LDFLAGS) $(LDFLAGS_vmlinux) ;    \
> > +       $(BASH) $< $(LD) $(KBUILD_LDFLAGS) $(LDFLAGS_vmlinux) ;    \
> >         $(if $(ARCH_POSTLINK), $(MAKE) -f $(ARCH_POSTLINK) $@, true)
> >
> >  vmlinux: scripts/link-vmlinux.sh autoksyms_recursive $(vmlinux-deps) FORCE
> > @@ -1403,7 +1403,7 @@ clean: rm-files := $(CLEAN_FILES)
> >  PHONY += archclean vmlinuxclean
> >
> >  vmlinuxclean:
> > -       $(Q)$(CONFIG_SHELL) $(srctree)/scripts/link-vmlinux.sh clean
> > +       $(Q)$(BASH) $(srctree)/scripts/link-vmlinux.sh clean
> >         $(Q)$(if $(ARCH_POSTLINK), $(MAKE) -f $(ARCH_POSTLINK) clean)
> >
> >  clean: archclean vmlinuxclean
> > diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
> > index f7edb75f9806..ea1f8673869d 100755
> > --- a/scripts/link-vmlinux.sh
> > +++ b/scripts/link-vmlinux.sh
> > @@ -1,4 +1,4 @@
> > -#!/bin/sh
> > +#!/bin/bash
> >  # SPDX-License-Identifier: GPL-2.0
> >  #
> >  # link vmlinux
> > --
> > 2.23.0.rc1
> >
> > --
> > Cheers,
> > Stephen Rothwell
>
>
>
> --
> Best Regards
> Masahiro Yamada

^ permalink raw reply

* Re: [PATCH v2 00/10] Add definition for the number of standard PCI BARs
From: Denis Efremov @ 2019-09-05 19:02 UTC (permalink / raw)
  To: Andrew Murray
  Cc: Bjorn Helgaas, linux-kernel, linux-pci, Sebastian Ott,
	Gerald Schaefer, H. Peter Anvin, Giuseppe Cavallaro,
	Alexandre Torgue, Matt Porter, Alexandre Bounine, Peter Jones,
	Bartlomiej Zolnierkiewicz, Cornelia Huck, Alex Williamson,
	Jose Abreu, kvm, linux-fbdev, netdev, x86, linux-s390
In-Reply-To: <20190816105128.GD14111@e119886-lin.cambridge.arm.com>



On 16.08.2019 13:51, Andrew Murray wrote:
> On Fri, Aug 16, 2019 at 12:24:27PM +0300, Denis Efremov wrote:
>> Code that iterates over all standard PCI BARs typically uses
>> PCI_STD_RESOURCE_END, but this is error-prone because it requires
>> "i <= PCI_STD_RESOURCE_END" rather than something like
>> "i < PCI_STD_NUM_BARS". We could add such a definition and use it the same
>> way PCI_SRIOV_NUM_BARS is used. There is already the definition
>> PCI_BAR_COUNT for s390 only. Thus, this patchset introduces it globally.
>>
>> Changes in v2:
>>   - Reverse checks in pci_iomap_range,pci_iomap_wc_range.
>>   - Refactor loops in vfio_pci to keep PCI_STD_RESOURCES.
>>   - Add 2 new patches to replace the magic constant with new define.
>>   - Split net patch in v1 to separate stmmac and dwc-xlgmac patches.
>>
>> Denis Efremov (10):
>>   PCI: Add define for the number of standard PCI BARs
>>   s390/pci: Loop using PCI_STD_NUM_BARS
>>   x86/PCI: Loop using PCI_STD_NUM_BARS
>>   stmmac: pci: Loop using PCI_STD_NUM_BARS
>>   net: dwc-xlgmac: Loop using PCI_STD_NUM_BARS
>>   rapidio/tsi721: Loop using PCI_STD_NUM_BARS
>>   efifb: Loop using PCI_STD_NUM_BARS
>>   vfio_pci: Loop using PCI_STD_NUM_BARS
>>   PCI: hv: Use PCI_STD_NUM_BARS
>>   PCI: Use PCI_STD_NUM_BARS
>>
>>  arch/s390/include/asm/pci.h                      |  5 +----
>>  arch/s390/include/asm/pci_clp.h                  |  6 +++---
>>  arch/s390/pci/pci.c                              | 16 ++++++++--------
>>  arch/s390/pci/pci_clp.c                          |  6 +++---
>>  arch/x86/pci/common.c                            |  2 +-
>>  drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c |  4 ++--
>>  drivers/net/ethernet/synopsys/dwc-xlgmac-pci.c   |  2 +-
>>  drivers/pci/controller/pci-hyperv.c              | 10 +++++-----
>>  drivers/pci/pci.c                                | 11 ++++++-----
>>  drivers/pci/quirks.c                             |  4 ++--
>>  drivers/rapidio/devices/tsi721.c                 |  2 +-
>>  drivers/vfio/pci/vfio_pci.c                      | 11 +++++++----
>>  drivers/vfio/pci/vfio_pci_config.c               | 10 ++++++----
>>  drivers/vfio/pci/vfio_pci_private.h              |  4 ++--
>>  drivers/video/fbdev/efifb.c                      |  2 +-
>>  include/linux/pci.h                              |  2 +-
>>  include/uapi/linux/pci_regs.h                    |  1 +
>>  17 files changed, 51 insertions(+), 47 deletions(-)
> 
> I've come across a few more places where this change can be made. There
> may be multiple instances in the same file, but only the first is shown
> below:
> 
> drivers/misc/pci_endpoint_test.c:       for (bar = BAR_0; bar <= BAR_5; bar++) {
> drivers/net/ethernet/intel/e1000/e1000_main.c:          for (i = BAR_1; i <= BAR_5; i++) {
> drivers/net/ethernet/intel/ixgb/ixgb_main.c:    for (i = BAR_1; i <= BAR_5; i++) {
> drivers/pci/controller/dwc/pci-dra7xx.c:        for (bar = BAR_0; bar <= BAR_5; bar++)
> drivers/pci/controller/dwc/pci-layerscape-ep.c: for (bar = BAR_0; bar <= BAR_5; bar++)
> drivers/pci/controller/dwc/pcie-artpec6.c:      for (bar = BAR_0; bar <= BAR_5; bar++)
> drivers/pci/controller/dwc/pcie-designware-plat.c:      for (bar = BAR_0; bar <= BAR_5; bar++)
> drivers/pci/endpoint/functions/pci-epf-test.c:  for (bar = BAR_0; bar <= BAR_5; bar++) {
> include/linux/pci-epc.h:        u64     bar_fixed_size[BAR_5 + 1];
> drivers/scsi/pm8001/pm8001_hwi.c:       for (bar = 0; bar < 6; bar++) {
> drivers/scsi/pm8001/pm8001_init.c:      for (bar = 0; bar < 6; bar++) {
> drivers/ata/sata_nv.c:  for (bar = 0; bar < 6; bar++)
> drivers/video/fbdev/core/fbmem.c:       for (idx = 0, bar = 0; bar < PCI_ROM_RESOURCE; bar++) {
> drivers/staging/gasket/gasket_core.c:   for (i = 0; i < GASKET_NUM_BARS; i++) {
> drivers/tty/serial/8250/8250_pci.c:     for (i = 0; i < PCI_NUM_BAR_RESOURCES; i++) { <-----------
> 
> It looks like BARs are often iterated with PCI_NUM_BAR_RESOURCES, there
> are a load of these too found with:
> 
> git grep PCI_ROM_RESOURCE | grep "< "
> 
> I'm happy to share patches if preferred.
> 

I'm not sure about lib/devres.c
265:#define PCIM_IOMAP_MAX      PCI_ROM_RESOURCE
268:    void __iomem *table[PCIM_IOMAP_MAX];
277:    for (i = 0; i < PCIM_IOMAP_MAX; i++)
324:    BUG_ON(bar >= PCIM_IOMAP_MAX);
352:    for (i = 0; i < PCIM_IOMAP_MAX; i++)
455:    for (i = 0; i < PCIM_IOMAP_MAX; i++) {

Is it worth changing?
#define PCIM_IOMAP_MAX  PCI_STD_NUM_BARS

Thanks,
Denis
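
To make the off-by-one risk from the cover letter concrete, here is a small standalone sketch. The three constants are reproduced with the values I believe the kernel headers use at the time of this series (0, 5 and 6); treat them as assumptions if you are reading against a different tree. The count_*() helpers are hypothetical.

```c
/* Assumed values, copied here so the sketch stands alone. */
#define PCI_STD_RESOURCES	0
#define PCI_STD_RESOURCE_END	5
#define PCI_STD_NUM_BARS	6

/* Correct, but relies on remembering the inclusive "<=" bound. */
int count_with_resource_end(void)
{
	int i, n = 0;

	for (i = PCI_STD_RESOURCES; i <= PCI_STD_RESOURCE_END; i++)
		n++;
	return n;
}

/* The easy mistake: "<" with the END constant skips the last BAR. */
int count_with_resource_end_buggy(void)
{
	int i, n = 0;

	for (i = PCI_STD_RESOURCES; i < PCI_STD_RESOURCE_END; i++)
		n++;
	return n;
}

/* The form the series proposes: the usual half-open loop. */
int count_with_num_bars(void)
{
	int i, n = 0;

	for (i = 0; i < PCI_STD_NUM_BARS; i++)
		n++;
	return n;
}
```

The first and third loops visit all six BARs; the second silently visits five, which is the kind of slip the new define is meant to rule out.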

^ permalink raw reply

* Re: [PATCH v2] net-ipv6: fix excessive RTF_ADDRCONF flag on ::1/128 local route (and others)
From: Eric Dumazet @ 2019-09-05 18:49 UTC (permalink / raw)
  To: Maciej Żenczykowski, Maciej Żenczykowski,
	David S . Miller
  Cc: netdev, David Ahern, Lorenzo Colitti
In-Reply-To: <20190902162336.240405-1-zenczykowski@gmail.com>



On 9/2/19 6:23 PM, Maciej Żenczykowski wrote:
> From: Maciej Żenczykowski <maze@google.com>
> 
> There is a subtle change in behaviour introduced by:
>   commit c7a1ce397adacaf5d4bb2eab0a738b5f80dc3e43
>   'ipv6: Change addrconf_f6i_alloc to use ip6_route_info_create'
> 
> Before that patch /proc/net/ipv6_route includes:
> 00000000000000000000000000000001 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000003 00000000 80200001 lo
> 
> Afterwards /proc/net/ipv6_route includes:
> 00000000000000000000000000000001 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000002 00000000 80240001 lo
> 
> ie. the above commit causes the ::1/128 local (automatic) route to be flagged with RTF_ADDRCONF (0x040000).
> 
> AFAICT, this is incorrect since these routes are *not* coming from RA's.
> 
> As such, this patch restores the old behaviour.
> 
> Fixes: c7a1ce397adacaf5d4bb2eab0a738b5f80dc3e43
> Cc: David Ahern <dsahern@gmail.com>
> Cc: Lorenzo Colitti <lorenzo@google.com>
> Signed-off-by: Maciej Żenczykowski <maze@google.com>
> ---
>  net/ipv6/route.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index 558c6c68855f..516b2e568dae 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -4365,13 +4365,14 @@ struct fib6_info *addrconf_f6i_alloc(struct net *net,
>  	struct fib6_config cfg = {
>  		.fc_table = l3mdev_fib_table(idev->dev) ? : RT6_TABLE_LOCAL,
>  		.fc_ifindex = idev->dev->ifindex,
> -		.fc_flags = RTF_UP | RTF_ADDRCONF | RTF_NONEXTHOP,
> +		.fc_flags = RTF_UP | RTF_NONEXTHOP,
>  		.fc_dst = *addr,
>  		.fc_dst_len = 128,
>  		.fc_protocol = RTPROT_KERNEL,
>  		.fc_nlinfo.nl_net = net,
>  		.fc_ignore_dev_down = true,
>  	};
> +	struct fib6_info *f6i;
>  
>  	if (anycast) {
>  		cfg.fc_type = RTN_ANYCAST;
> @@ -4381,7 +4382,10 @@ struct fib6_info *addrconf_f6i_alloc(struct net *net,
>  		cfg.fc_flags |= RTF_LOCAL;
>  	}
>  
> -	return ip6_route_info_create(&cfg, gfp_flags, NULL);
> +	f6i = ip6_route_info_create(&cfg, gfp_flags, NULL);
> +	if (f6i)
> +		f6i->dst_nocount = true;

Shouldn't it use 

	if (!IS_ERR(f6i))
		f6i->dst_nocount = true;

???
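
A userspace sketch of why the distinction matters, assuming (as the question above implies) that the create function reports failure through an error pointer rather than NULL. err_ptr(), is_err(), ptr_err(), route_create() and route_alloc() are hypothetical models of the kernel convention, not the kernel helpers themselves.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Encode a negative errno in the top page of the address space. */
static inline void *err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static inline bool is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

static inline long ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

struct route_info {
	bool dst_nocount;
};

/* Hypothetical creator: never returns NULL, only a valid or error pointer. */
struct route_info *route_create(bool fail)
{
	struct route_info *ri;

	if (fail)
		return err_ptr(-ENOMEM);
	ri = calloc(1, sizeof(*ri));
	return ri ? ri : err_ptr(-ENOMEM);
}

struct route_info *route_alloc(bool fail)
{
	struct route_info *ri = route_create(fail);

	/* "if (ri)" is true for an error pointer and would write through it. */
	if (!is_err(ri))
		ri->dst_nocount = true;
	return ri;
}
```

An error pointer is non-NULL by construction, so a plain NULL test passes on the failure path and the store lands on an invalid address.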


> +	return f6i;
>  }
>  
>  /* remove deleted ip from prefsrc entries */
> 
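
The two /proc/net/ipv6_route flag words quoted in the commit message can be compared bit by bit. The RTF_* values below are the ones I understand the uapi headers to define; only RTF_ADDRCONF (0x040000) is stated in the message itself, so treat the others as assumptions. extra_flags() is a hypothetical helper.

```c
#include <stdint.h>

#define RTF_UP		0x00000001u
#define RTF_ADDRCONF	0x00040000u
#define RTF_NONEXTHOP	0x00200000u
#define RTF_LOCAL	0x80000000u

/* Flags present in "after" that were not present in "before". */
uint32_t extra_flags(uint32_t before, uint32_t after)
{
	return after & ~before;
}
```

Under those values, 80200001 decodes to RTF_LOCAL | RTF_NONEXTHOP | RTF_UP, and the only bit added in 80240001 is RTF_ADDRCONF, matching the description in the patch.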

^ permalink raw reply

* [PATCH net] net: gso: Fix skb_segment splat when splitting gso_size mangled skb having linear-headed frag_list
From: Shmulik Ladkani @ 2019-09-05 18:36 UTC (permalink / raw)
  To: Daniel Borkmann, Eric Dumazet, Willem de Bruijn
  Cc: eyal, netdev, Shmulik Ladkani, Alexander Duyck

Historically, support for frag_list packets entering skb_segment() was
limited to frag_list members terminating on exact same gso_size
boundaries. This is verified with a BUG_ON since commit 89319d3801d1
("net: Add frag_list support to skb_segment"), quote:

    As such we require all frag_list members terminate on exact MSS
    boundaries.  This is checked using BUG_ON.
    As there should only be one producer in the kernel of such packets,
    namely GRO, this requirement should not be difficult to maintain.

However, since commit 6578171a7ff0 ("bpf: add bpf_skb_change_proto helper"),
the "exact MSS boundaries" assumption no longer holds:
An eBPF program using bpf_skb_change_proto() DOES modify 'gso_size', but
leaves the frag_list members as originally merged by GRO with the
original 'gso_size'. Examples of such programs are bpf-based NAT46 or
NAT64.

This led to a kernel BUG_ON for flows involving:
 - GRO generating a frag_list skb
 - bpf program performing bpf_skb_change_proto() or bpf_skb_adjust_room()
 - skb_segment() of the skb

See example BUG_ON reports in [0].

In commit 13acc94eff12 ("net: permit skb_segment on head_frag frag_list skb"),
skb_segment() was modified to support the "gso_size mangling" case of
a frag_list GRO'ed skb, but *only* for frag_list members having
head_frag==true (having a page-fragment head).

Alas, GRO packets having frag_list members with a linear kmalloced head
(head_frag==false) still hit the BUG_ON.

This commit adds support to skb_segment() for a 'head_skb' packet having
a frag_list whose members are *non* head_frag, with gso_size mangled, by
disabling SG and thus falling-back to copying the data from the given
'head_skb' into the generated segmented skbs - as suggested by Willem de
Bruijn [1].

Since this approach involves the penalty of skb_copy_and_csum_bits()
when building the segments, care was taken in order to enable this
solution only when required:
 - untrusted gso_size, by testing SKB_GSO_DODGY is set
   (SKB_GSO_DODGY is set by any gso_size mangling functions in
    net/core/filter.c)
 - the frag_list is non empty, its item is a non head_frag, *and* the
   headlen of the given 'head_skb' does not match the gso_size.

[0]
https://lore.kernel.org/netdev/20190826170724.25ff616f@pixies/
https://lore.kernel.org/netdev/9265b93f-253d-6b8c-f2b8-4b54eff1835c@fb.com/

[1]
https://lore.kernel.org/netdev/CA+FuTSfVsgNDi7c=GUU8nMg2hWxF2SjCNLXetHeVPdnxAW5K-w@mail.gmail.com/

Fixes: 6578171a7ff0 ("bpf: add bpf_skb_change_proto helper")
Suggested-by: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
---
 net/core/skbuff.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index ea8e8d332d85..c4bd1881acff 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3678,6 +3678,24 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
 	sg = !!(features & NETIF_F_SG);
 	csum = !!can_checksum_protocol(features, proto);

+	if (mss != GSO_BY_FRAGS &&
+	    (skb_shinfo(head_skb)->gso_type & SKB_GSO_DODGY)) {
+		/* gso_size is untrusted.
+		 *
+		 * If head_skb has a frag_list with a linear non head_frag
+		 * item, and head_skb's headlen does not fit requested
+		 * gso_size, fall back to copying the skbs - by disabling sg.
+		 *
+		 * We assume checking the first frag suffices, i.e if either of
+		 * the frags have non head_frag data, then the first frag is
+		 * too.
+		 */
+		if (list_skb && skb_headlen(list_skb) && !list_skb->head_frag &&
+		    (mss != skb_headlen(head_skb) - doffset)) {
+			sg = false;
+		}
+	}
+
 	if (sg && csum && (mss != GSO_BY_FRAGS))  {
 		if (!(features & NETIF_F_GSO_PARTIAL)) {
 			struct sk_buff *iter;
-- 
2.19.1
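
The condition added by this patch can be read as a pure predicate. The sketch below restates it over a hypothetical struct mini_skb; must_copy() and the field names are illustrative assumptions, and GSO_BY_FRAGS is given the value I believe the kernel uses (0xFFFF).

```c
#include <stdbool.h>
#include <stddef.h>

#define GSO_BY_FRAGS 0xFFFF	/* assumed value, for the sketch only */

struct mini_skb {
	unsigned int headlen;	/* bytes in the linear area */
	bool head_frag;		/* true if the head is a page fragment */
};

/*
 * Return true when scatter-gather should be disabled so that segments are
 * built by copying: gso_size is untrusted, the first frag_list member has
 * a linear head that is not a page fragment, and the head skb's payload
 * does not line up with the requested mss.
 */
bool must_copy(unsigned int mss, bool gso_dodgy,
	       const struct mini_skb *head_skb,
	       const struct mini_skb *list_skb,
	       unsigned int doffset)
{
	if (mss == GSO_BY_FRAGS || !gso_dodgy)
		return false;
	return list_skb && list_skb->headlen && !list_skb->head_frag &&
	       mss != head_skb->headlen - doffset;
}
```

Every clause has to hold before the copy penalty is paid, so trusted GRO packets and frag_list members with a page-fragment head keep the zero-copy path.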

^ permalink raw reply related

* Re: rtnl_lock() question
From: Rustad, Mark D @ 2019-09-05 18:07 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: jonathan.lemon@gmail.com, eric.dumazet@gmail.com,
	netdev@vger.kernel.org
In-Reply-To: <867cf373f204715aec3b2e04ef9f65454cf25a2e.camel@mellanox.com>

[-- Attachment #1: Type: text/plain, Size: 566 bytes --]

On Sep 4, 2019, at 4:23 PM, Saeed Mahameed <saeedm@mellanox.com> wrote:

> some allocations require parameters that should remain valid and
> constant across the whole reconfiguration procedure such
> params.num_channels, so they must be done inside the lock.

You could always check if those parameters have changed once under the lock  
and, if they did, drop the lock, reallocate and try again. Since such  
changes should be very infrequent, this is something that really should not  
loop multiple times.
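
A rough userspace sketch of that pattern, with a pthread mutex standing in for the lock under discussion. struct mini_dev, mini_dev_reconfigure() and CHANNEL_BYTES are hypothetical names; the point is only the shape of "allocate unlocked, validate locked, retry if stale".

```c
#include <pthread.h>
#include <stdlib.h>

#define CHANNEL_BYTES 64	/* arbitrary per-channel size for the sketch */

struct mini_dev {
	pthread_mutex_t lock;
	unsigned int num_channels;	/* may change while the lock is not held */
	void *channels;			/* buffer sized for nr_allocated channels */
	unsigned int nr_allocated;
};

int mini_dev_reconfigure(struct mini_dev *dev)
{
	for (;;) {
		unsigned int want;
		void *buf;

		/* Snapshot the parameter, then allocate without the lock. */
		pthread_mutex_lock(&dev->lock);
		want = dev->num_channels;
		pthread_mutex_unlock(&dev->lock);

		buf = calloc(want ? want : 1, CHANNEL_BYTES);
		if (!buf)
			return -1;

		/* Commit only if the snapshot is still current. */
		pthread_mutex_lock(&dev->lock);
		if (dev->num_channels == want) {
			free(dev->channels);
			dev->channels = buf;
			dev->nr_allocated = want;
			pthread_mutex_unlock(&dev->lock);
			return 0;
		}
		pthread_mutex_unlock(&dev->lock);

		/* Parameter changed underneath us: discard and try again. */
		free(buf);
	}
}
```

The lock is held only for the snapshot and the final swap, so the potentially slow allocation never blocks other lock users; the retry branch is expected to be rare.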

--
Mark Rustad, Networking Division, Intel Corporation

[-- Attachment #2: Message signed with OpenPGP --]
[-- Type: application/pgp-signature, Size: 873 bytes --]

^ permalink raw reply

* [patch net-next] net: fib_notifier: move fib_notifier_ops from struct net into per-net struct
From: Jiri Pirko @ 2019-09-05 18:06 UTC (permalink / raw)
  To: netdev; +Cc: davem, idosch, dsahern, mlxsw

From: Jiri Pirko <jiri@mellanox.com>

No need for fib_notifier_ops to be in struct net. It is used only by
fib_notifier as private data. Use net_generic to introduce per-net
fib_notifier struct and move fib_notifier_ops there.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/net_namespace.h |  3 ---
 net/core/fib_notifier.c     | 29 +++++++++++++++++++++++------
 2 files changed, 23 insertions(+), 9 deletions(-)

diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
index ab40d7afdc54..64bcb589a610 100644
--- a/include/net/net_namespace.h
+++ b/include/net/net_namespace.h
@@ -103,9 +103,6 @@ struct net {
 	/* core fib_rules */
 	struct list_head	rules_ops;
 
-	struct list_head	fib_notifier_ops;  /* Populated by
-						    * register_pernet_subsys()
-						    */
 	struct net_device       *loopback_dev;          /* The loopback */
 	struct netns_core	core;
 	struct netns_mib	mib;
diff --git a/net/core/fib_notifier.c b/net/core/fib_notifier.c
index 13a40b831d6d..470a606d5e8d 100644
--- a/net/core/fib_notifier.c
+++ b/net/core/fib_notifier.c
@@ -5,8 +5,15 @@
 #include <linux/module.h>
 #include <linux/init.h>
 #include <net/net_namespace.h>
+#include <net/netns/generic.h>
 #include <net/fib_notifier.h>
 
+static unsigned int fib_notifier_net_id;
+
+struct fib_notifier_net {
+	struct list_head fib_notifier_ops;
+};
+
 static ATOMIC_NOTIFIER_HEAD(fib_chain);
 
 int call_fib_notifier(struct notifier_block *nb, struct net *net,
@@ -34,6 +41,7 @@ EXPORT_SYMBOL(call_fib_notifiers);
 
 static unsigned int fib_seq_sum(void)
 {
+	struct fib_notifier_net *fn_net;
 	struct fib_notifier_ops *ops;
 	unsigned int fib_seq = 0;
 	struct net *net;
@@ -41,8 +49,9 @@ static unsigned int fib_seq_sum(void)
 	rtnl_lock();
 	down_read(&net_rwsem);
 	for_each_net(net) {
+		fn_net = net_generic(net, fib_notifier_net_id);
 		rcu_read_lock();
-		list_for_each_entry_rcu(ops, &net->fib_notifier_ops, list) {
+		list_for_each_entry_rcu(ops, &fn_net->fib_notifier_ops, list) {
 			if (!try_module_get(ops->owner))
 				continue;
 			fib_seq += ops->fib_seq_read(net);
@@ -58,9 +67,10 @@ static unsigned int fib_seq_sum(void)
 
 static int fib_net_dump(struct net *net, struct notifier_block *nb)
 {
+	struct fib_notifier_net *fn_net = net_generic(net, fib_notifier_net_id);
 	struct fib_notifier_ops *ops;
 
-	list_for_each_entry_rcu(ops, &net->fib_notifier_ops, list) {
+	list_for_each_entry_rcu(ops, &fn_net->fib_notifier_ops, list) {
 		int err;
 
 		if (!try_module_get(ops->owner))
@@ -127,12 +137,13 @@ EXPORT_SYMBOL(unregister_fib_notifier);
 static int __fib_notifier_ops_register(struct fib_notifier_ops *ops,
 				       struct net *net)
 {
+	struct fib_notifier_net *fn_net = net_generic(net, fib_notifier_net_id);
 	struct fib_notifier_ops *o;
 
-	list_for_each_entry(o, &net->fib_notifier_ops, list)
+	list_for_each_entry(o, &fn_net->fib_notifier_ops, list)
 		if (ops->family == o->family)
 			return -EEXIST;
-	list_add_tail_rcu(&ops->list, &net->fib_notifier_ops);
+	list_add_tail_rcu(&ops->list, &fn_net->fib_notifier_ops);
 	return 0;
 }
 
@@ -167,18 +178,24 @@ EXPORT_SYMBOL(fib_notifier_ops_unregister);
 
 static int __net_init fib_notifier_net_init(struct net *net)
 {
-	INIT_LIST_HEAD(&net->fib_notifier_ops);
+	struct fib_notifier_net *fn_net = net_generic(net, fib_notifier_net_id);
+
+	INIT_LIST_HEAD(&fn_net->fib_notifier_ops);
 	return 0;
 }
 
 static void __net_exit fib_notifier_net_exit(struct net *net)
 {
-	WARN_ON_ONCE(!list_empty(&net->fib_notifier_ops));
+	struct fib_notifier_net *fn_net = net_generic(net, fib_notifier_net_id);
+
+	WARN_ON_ONCE(!list_empty(&fn_net->fib_notifier_ops));
 }
 
 static struct pernet_operations fib_notifier_net_ops = {
 	.init = fib_notifier_net_init,
 	.exit = fib_notifier_net_exit,
+	.id = &fib_notifier_net_id,
+	.size = sizeof(struct fib_notifier_net),
 };
 
 static int __init fib_notifier_init(void)
-- 
2.21.0



* [PATCH bpf-next] kbuild: replace BASH-specific ${@:2} with shift and ${@}
From: Andrii Nakryiko @ 2019-09-05 17:59 UTC (permalink / raw)
  To: bpf, netdev, ast, daniel
  Cc: andrii.nakryiko, kernel-team, Andrii Nakryiko, Stephen Rothwell,
	Masahiro Yamada

${@:2} is a BASH-specific extension, which makes link-vmlinux.sh depend
on BASH. Use shift and ${@} instead to fix this issue.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: 341dfcf8d78e ("btf: expose BTF info through sysfs")
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
---
 scripts/link-vmlinux.sh | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
index 0d8f41db8cd6..8c59970a09dc 100755
--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -57,12 +57,16 @@ modpost_link()
 
 # Link of vmlinux
 # ${1} - output file
-# ${@:2} - optional extra .o files
+# ${2}, ${3}, ... - optional extra .o files
 vmlinux_link()
 {
 	local lds="${objtree}/${KBUILD_LDS}"
+	local output=${1}
 	local objects
 
+	# skip output file argument
+	shift
+
 	if [ "${SRCARCH}" != "um" ]; then
 		objects="--whole-archive			\
 			${KBUILD_VMLINUX_OBJS}			\
@@ -70,9 +74,10 @@ vmlinux_link()
 			--start-group				\
 			${KBUILD_VMLINUX_LIBS}			\
 			--end-group				\
-			${@:2}"
+			${@}"
 
-		${LD} ${KBUILD_LDFLAGS} ${LDFLAGS_vmlinux} -o ${1}	\
+		${LD} ${KBUILD_LDFLAGS} ${LDFLAGS_vmlinux}	\
+			-o ${output}				\
 			-T ${lds} ${objects}
 	else
 		objects="-Wl,--whole-archive			\
@@ -81,9 +86,10 @@ vmlinux_link()
 			-Wl,--start-group			\
 			${KBUILD_VMLINUX_LIBS}			\
 			-Wl,--end-group				\
-			${@:2}"
+			${@}"
 
-		${CC} ${CFLAGS_vmlinux} -o ${1}			\
+		${CC} ${CFLAGS_vmlinux}				\
+			-o ${output}				\
 			-Wl,-T,${lds}				\
 			${objects}				\
 			-lutil -lrt -lpthread
-- 
2.21.0



* Applied "spi: Use an abbreviated pointer to ctlr->cur_msg in __spi_pump_messages" to the spi tree
From: Mark Brown @ 2019-09-05 17:39 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: andrew, broonie, f.fainelli, h.feurstein, linux-spi, Mark Brown,
	mlichvar, netdev, richardcochran
In-Reply-To: <20190905010114.26718-2-olteanv@gmail.com>

The patch

   spi: Use an abbreviated pointer to ctlr->cur_msg in __spi_pump_messages

has been applied to the spi tree at

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git for-5.4

All being well, this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix). However, if
problems are discovered, the patch may be dropped or reverted.

You may get further e-mails resulting from automated or manual testing
and review of the tree. Please engage with people reporting problems and
send follow-up patches addressing any issues that are reported, if needed.

If any updates are required or you are submitting further changes, they
should be sent as incremental updates against current git; existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark

From d1c44c9342c17e3314371325d9272684a075b65c Mon Sep 17 00:00:00 2001
From: Vladimir Oltean <olteanv@gmail.com>
Date: Thu, 5 Sep 2019 04:01:11 +0300
Subject: [PATCH] spi: Use an abbreviated pointer to ctlr->cur_msg in
 __spi_pump_messages

This helps a bit with line fitting now (the list_first_entry call), as
well as in the next patch, which needs to iterate through all
transfers of ctlr->cur_msg in order to timestamp them.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://lore.kernel.org/r/20190905010114.26718-2-olteanv@gmail.com
Signed-off-by: Mark Brown <broonie@kernel.org>
---
 drivers/spi/spi.c | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
index aef55acb5ccd..b2890923d256 100644
--- a/drivers/spi/spi.c
+++ b/drivers/spi/spi.c
@@ -1265,8 +1265,9 @@ EXPORT_SYMBOL_GPL(spi_finalize_current_transfer);
  */
 static void __spi_pump_messages(struct spi_controller *ctlr, bool in_kthread)
 {
-	unsigned long flags;
+	struct spi_message *msg;
 	bool was_busy = false;
+	unsigned long flags;
 	int ret;
 
 	/* Lock queue */
@@ -1325,10 +1326,10 @@ static void __spi_pump_messages(struct spi_controller *ctlr, bool in_kthread)
 	}
 
 	/* Extract head of queue */
-	ctlr->cur_msg =
-		list_first_entry(&ctlr->queue, struct spi_message, queue);
+	msg = list_first_entry(&ctlr->queue, struct spi_message, queue);
+	ctlr->cur_msg = msg;
 
-	list_del_init(&ctlr->cur_msg->queue);
+	list_del_init(&msg->queue);
 	if (ctlr->busy)
 		was_busy = true;
 	else
@@ -1361,7 +1362,7 @@ static void __spi_pump_messages(struct spi_controller *ctlr, bool in_kthread)
 			if (ctlr->auto_runtime_pm)
 				pm_runtime_put(ctlr->dev.parent);
 
-			ctlr->cur_msg->status = ret;
+			msg->status = ret;
 			spi_finalize_current_message(ctlr);
 
 			mutex_unlock(&ctlr->io_mutex);
@@ -1369,28 +1370,28 @@ static void __spi_pump_messages(struct spi_controller *ctlr, bool in_kthread)
 		}
 	}
 
-	trace_spi_message_start(ctlr->cur_msg);
+	trace_spi_message_start(msg);
 
 	if (ctlr->prepare_message) {
-		ret = ctlr->prepare_message(ctlr, ctlr->cur_msg);
+		ret = ctlr->prepare_message(ctlr, msg);
 		if (ret) {
 			dev_err(&ctlr->dev, "failed to prepare message: %d\n",
 				ret);
-			ctlr->cur_msg->status = ret;
+			msg->status = ret;
 			spi_finalize_current_message(ctlr);
 			goto out;
 		}
 		ctlr->cur_msg_prepared = true;
 	}
 
-	ret = spi_map_msg(ctlr, ctlr->cur_msg);
+	ret = spi_map_msg(ctlr, msg);
 	if (ret) {
-		ctlr->cur_msg->status = ret;
+		msg->status = ret;
 		spi_finalize_current_message(ctlr);
 		goto out;
 	}
 
-	ret = ctlr->transfer_one_message(ctlr, ctlr->cur_msg);
+	ret = ctlr->transfer_one_message(ctlr, msg);
 	if (ret) {
 		dev_err(&ctlr->dev,
 			"failed to transfer one message from queue\n");
-- 
2.20.1



* Re: [PATCH v2 1/2] ethtool: implement Energy Detect Powerdown support via phy-tunable
From: Florian Fainelli @ 2019-09-05 17:23 UTC (permalink / raw)
  To: Ardelean, Alexandru, andrew@lunn.ch
  Cc: davem@davemloft.net, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, hkallweit1@gmail.com
In-Reply-To: <361eb94a4da73d1fa21893e8e294639f0fc0bcd2.camel@analog.com>

On 9/4/19 11:25 PM, Ardelean, Alexandru wrote:
> On Wed, 2019-09-04 at 21:53 +0200, Andrew Lunn wrote:
>> [External]
>>
>> On Wed, Sep 04, 2019 at 07:23:21PM +0300, Alexandru Ardelean wrote:
>>
>> Hi Alexandru
>>
>> Somewhere we need a comment stating what EDPD means. Here would be a
>> good place.
> 
> ack
> 
>>
>>> +#define ETHTOOL_PHY_EDPD_DFLT_TX_INTERVAL	0x7fff
>>> +#define ETHTOOL_PHY_EDPD_NO_TX			0x8000
>>> +#define ETHTOOL_PHY_EDPD_DISABLE		0
>>
>> I think you are passing a u16. So why not 0xfffe and 0xffff?  We also
>> need to make it clear what the units are for interval. This file
> 
> I initially thought about keeping this u8 and going with 0xff & 0xfe.
> But 254 or 253 could be too small to specify the value of an interval.
> 
> Also (maybe due to all the coding patterns that I have seen over the course of some time), I feel that I should add a
> flag somewhere.
> 
> Bottom line is: 0xfffe and 0xffff also work from my side, if it is acceptable (by the community).
> 
> Another approach I considered was to have this EDPD just do enable & disable (which is sufficient for the `adin`
> PHY & `micrel` as well).
> That would mean that if we ever wanted to configure the TX interval (in the future), we would need an extra
> PHY-tunable parameter just for that, because changing the enable/disable behavior would be dangerous.
> Also, deferring the TX-interval configuration does not sound like good design, since it can lead to tons
> of PHY-tunable parameters for every little knob.

It seems to me that the interval is a better way to deal with that: if
you specify a non-zero interval, you enable EDPD, even if your PHY can
only act on an enable/disable bit. For PHYs that do support setting a TX
interval, the non-zero interval can be translated into whatever unit is
appropriate. In all cases, a 0 interval means disable.
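
As a rough sketch of that convention (all names below are hypothetical
and not taken from the patch under review, and the millisecond unit is
only an assumption, since the units have not been settled in this
thread), the translation could look something like:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for illustration only; not part of the patch. */
#define EDPD_DISABLE 0u

struct edpd_cfg {
	bool enabled;       /* EDPD on/off bit */
	uint16_t tx_ticks;  /* TX interval in device ticks; 0 if unsupported */
};

/*
 * Translate a requested interval into a PHY configuration.
 *
 * interval_ms: requested TX interval (assumed unit: milliseconds),
 *              where 0 means "disable EDPD".
 * hw_unit_ms:  milliseconds per hardware tick, or 0 if the PHY only
 *              has an enable/disable bit.
 */
static struct edpd_cfg edpd_from_interval(uint16_t interval_ms,
					  uint16_t hw_unit_ms)
{
	struct edpd_cfg cfg = { .enabled = false, .tx_ticks = 0 };

	/* A zero interval always means disable. */
	if (interval_ms == EDPD_DISABLE)
		return cfg;

	/* Any non-zero interval enables EDPD, even on enable/disable-only PHYs. */
	cfg.enabled = true;

	/* Round up so a small non-zero request never becomes 0 ticks. */
	if (hw_unit_ms)
		cfg.tx_ticks = (uint16_t)((interval_ms + hw_unit_ms - 1) /
					  hw_unit_ms);

	return cfg;
}
```

A PHY with only an enable bit would pass hw_unit_ms = 0 and ignore
tx_ticks, so the same tunable value works for both kinds of hardware.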

Andrew, does that work for you?
-- 
Florian


* Re: [PATCH] net/skbuff: silence warnings under memory pressure
From: Steven Rostedt @ 2019-09-05 17:23 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Qian Cai, Petr Mladek, Sergey Senozhatsky, Michal Hocko,
	Eric Dumazet, davem, netdev, linux-mm, linux-kernel
In-Reply-To: <20190905113208.GA521@jagdpanzerIV>

On Thu, 5 Sep 2019 20:32:08 +0900
Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> wrote:

> I think we can queue significantly fewer irq_work-s from printk().
> 
> Petr, Steven, what do you think?

What if we just rate limit the wake ups of klogd? I mean, really, do we
need to keep calling wake up if it probably never even executed?
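
To make the idea concrete, here is a minimal user-space sketch of
interval-based rate limiting. This is not the printk code, and
wake_limiter and wake_allowed are made-up names; it only illustrates
skipping a wake-up when the previous one was issued too recently:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limiter state; names are illustrative only. */
struct wake_limiter {
	uint64_t min_gap;  /* minimum ticks between permitted wake-ups */
	uint64_t last;     /* tick of the last permitted wake-up */
	bool armed;        /* false until the first wake-up is issued */
};

/*
 * Return true if a wake-up may be issued at time 'now', and record it.
 * Return false if the previous wake-up was less than min_gap ticks ago.
 */
static bool wake_allowed(struct wake_limiter *wl, uint64_t now)
{
	if (wl->armed && now - wl->last < wl->min_gap)
		return false;

	wl->armed = true;
	wl->last = now;
	return true;
}
```

The caller would only queue the wake-up when wake_allowed() returns
true. Whether a dropped wake-up needs to be retried later is left open
here.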

-- Steve

