Netdev List
 help / color / mirror / Atom feed
* [net-next 11/16] ice: Add stats for Rx drops at the port level
From: Jeff Kirsher @ 2019-07-31 20:41 UTC (permalink / raw)
  To: davem
  Cc: Brett Creeley, netdev, nhorman, sassmann, Tony Nguyen,
	Andrew Bowers, Jeff Kirsher
In-Reply-To: <20190731204147.8582-1-jeffrey.t.kirsher@intel.com>

From: Brett Creeley <brett.creeley@intel.com>

Currently we are not reporting dropped counts at the port level to
ethtool or netlink. This was found when debugging Rx dropped issues
and the total packets sent did not equal the total packets received
minus the rx_dropped, which was very confusing. To determine dropped
counts at the port level we need to read the PRTRPB_RDPC register.
To fix reporting we will store the dropped counts in the PF's
rx_discards. This will be reported to netlink by storing it in the
PF VSI's rx_missed_errors signaling that the receiver missed the
packet. Also, we will report this to ethtool in the rx_dropped.nic
field.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_hw_autogen.h | 1 +
 drivers/net/ethernet/intel/ice/ice_main.c       | 6 ++++++
 2 files changed, 7 insertions(+)

diff --git a/drivers/net/ethernet/intel/ice/ice_hw_autogen.h b/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
index 3250dfc00002..87652d722a30 100644
--- a/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
+++ b/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
@@ -337,5 +337,6 @@
 #define VSIQF_HLUT_MAX_INDEX			15
 #define VFINT_DYN_CTLN(_i)			(0x00003800 + ((_i) * 4))
 #define VFINT_DYN_CTLN_CLEARPBA_M		BIT(1)
+#define PRTRPB_RDPC				0x000AC260
 
 #endif /* _ICE_HW_AUTOGEN_H_ */
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index d13f56803658..e4be9aa79337 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -3297,6 +3297,8 @@ static void ice_update_vsi_stats(struct ice_vsi *vsi)
 		cur_ns->rx_errors = pf->stats.crc_errors +
 				    pf->stats.illegal_bytes;
 		cur_ns->rx_length_errors = pf->stats.rx_len_errors;
+		/* record drops from the port level */
+		cur_ns->rx_missed_errors = pf->stats.eth.rx_discards;
 	}
 }
 
@@ -3330,6 +3332,10 @@ static void ice_update_pf_stats(struct ice_pf *pf)
 			  &prev_ps->eth.rx_broadcast,
 			  &cur_ps->eth.rx_broadcast);
 
+	ice_stat_update32(hw, PRTRPB_RDPC, pf->stat_prev_loaded,
+			  &prev_ps->eth.rx_discards,
+			  &cur_ps->eth.rx_discards);
+
 	ice_stat_update40(hw, GLPRT_GOTCL(pf_id), pf->stat_prev_loaded,
 			  &prev_ps->eth.tx_bytes,
 			  &cur_ps->eth.tx_bytes);
-- 
2.21.0


^ permalink raw reply related

* [net-next 16/16] ice: Bump version number
From: Jeff Kirsher @ 2019-07-31 20:41 UTC (permalink / raw)
  To: davem; +Cc: Tony Nguyen, netdev, nhorman, sassmann, Andrew Bowers,
	Jeff Kirsher
In-Reply-To: <20190731204147.8582-1-jeffrey.t.kirsher@intel.com>

From: Tony Nguyen <anthony.l.nguyen@intel.com>

Update driver version to 0.7.5

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 8a8f9170e219..c26e6a102dac 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -9,7 +9,7 @@
 #include "ice_lib.h"
 #include "ice_dcb_lib.h"
 
-#define DRV_VERSION	"0.7.4-k"
+#define DRV_VERSION	"0.7.5-k"
 #define DRV_SUMMARY	"Intel(R) Ethernet Connection E800 Series Linux Driver"
 const char ice_drv_ver[] = DRV_VERSION;
 static const char ice_driver_string[] = DRV_SUMMARY;
-- 
2.21.0


^ permalink raw reply related

* [net-next 13/16] ice: Don't return error for disabling LAN Tx queue that does exist
From: Jeff Kirsher @ 2019-07-31 20:41 UTC (permalink / raw)
  To: davem
  Cc: Akeem G Abodunrin, netdev, nhorman, sassmann, Andrew Bowers,
	Jeff Kirsher
In-Reply-To: <20190731204147.8582-1-jeffrey.t.kirsher@intel.com>

From: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>

Since Tx rings are being managed by FW/NVM, Tx rings might have not been
set up or driver had already wiped them off - In that case, call to
disable LAN Tx queue is being returned as not in existence. This patch
makes sure we don't return unnecessary error for such scenario.

Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_lib.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
index e28478215810..2add10e02280 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_lib.c
@@ -2148,6 +2148,9 @@ ice_vsi_stop_tx_rings(struct ice_vsi *vsi, enum ice_disq_rst_src rst_src,
 		if (status == ICE_ERR_RESET_ONGOING) {
 			dev_dbg(&pf->pdev->dev,
 				"Reset in progress. LAN Tx queues already disabled\n");
+		} else if (status == ICE_ERR_DOES_NOT_EXIST) {
+			dev_dbg(&pf->pdev->dev,
+				"LAN Tx queues does not exist, nothing to disabled\n");
 		} else if (status) {
 			dev_err(&pf->pdev->dev,
 				"Failed to disable LAN Tx queues, error: %d\n",
-- 
2.21.0


^ permalink raw reply related

* [net-next 14/16] ice: Remove unnecessary flag ICE_FLAG_MSIX_ENA
From: Jeff Kirsher @ 2019-07-31 20:41 UTC (permalink / raw)
  To: davem; +Cc: Brett Creeley, netdev, nhorman, sassmann, Andrew Bowers,
	Jeff Kirsher
In-Reply-To: <20190731204147.8582-1-jeffrey.t.kirsher@intel.com>

From: Brett Creeley <brett.creeley@intel.com>

This flag is not needed and is called every time we re-enable interrupts
in the hotpath so remove it. Also remove ice_vsi_req_irq() because it
was a wrapper function for ice_vsi_req_irq_msix() whose sole purpose was
checking the ICE_FLAG_MSIX_ENA flag.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ice/ice.h      |  1 -
 drivers/net/ethernet/intel/ice/ice_lib.c  | 71 +++++++++-------------
 drivers/net/ethernet/intel/ice/ice_main.c | 74 ++++++-----------------
 drivers/net/ethernet/intel/ice/ice_txrx.c |  4 +-
 4 files changed, 48 insertions(+), 102 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index 596b09a905aa..794d97460fc7 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -329,7 +329,6 @@ struct ice_q_vector {
 } ____cacheline_internodealigned_in_smp;
 
 enum ice_pf_flags {
-	ICE_FLAG_MSIX_ENA,
 	ICE_FLAG_FLTR_SYNC,
 	ICE_FLAG_RSS_ENA,
 	ICE_FLAG_SRIOV_ENA,
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
index 2add10e02280..6e34c40e7840 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_lib.c
@@ -1129,12 +1129,7 @@ static int ice_vsi_alloc_q_vectors(struct ice_vsi *vsi)
 		return -EEXIST;
 	}
 
-	if (test_bit(ICE_FLAG_MSIX_ENA, pf->flags)) {
-		num_q_vectors = vsi->num_q_vectors;
-	} else {
-		err = -EINVAL;
-		goto err_out;
-	}
+	num_q_vectors = vsi->num_q_vectors;
 
 	for (v_idx = 0; v_idx < num_q_vectors; v_idx++) {
 		err = ice_vsi_alloc_q_vector(vsi, v_idx);
@@ -1180,9 +1175,6 @@ static int ice_vsi_setup_vector_base(struct ice_vsi *vsi)
 		return -EEXIST;
 	}
 
-	if (!test_bit(ICE_FLAG_MSIX_ENA, pf->flags))
-		return -ENOENT;
-
 	num_q_vectors = vsi->num_q_vectors;
 	/* reserve slots from OS requested IRQs */
 	vsi->base_vector = ice_get_res(pf, pf->irq_tracker, num_q_vectors,
@@ -2605,39 +2597,36 @@ void ice_vsi_free_irq(struct ice_vsi *vsi)
 {
 	struct ice_pf *pf = vsi->back;
 	int base = vsi->base_vector;
+	int i;
 
-	if (test_bit(ICE_FLAG_MSIX_ENA, pf->flags)) {
-		int i;
-
-		if (!vsi->q_vectors || !vsi->irqs_ready)
-			return;
+	if (!vsi->q_vectors || !vsi->irqs_ready)
+		return;
 
-		ice_vsi_release_msix(vsi);
-		if (vsi->type == ICE_VSI_VF)
-			return;
+	ice_vsi_release_msix(vsi);
+	if (vsi->type == ICE_VSI_VF)
+		return;
 
-		vsi->irqs_ready = false;
-		ice_for_each_q_vector(vsi, i) {
-			u16 vector = i + base;
-			int irq_num;
+	vsi->irqs_ready = false;
+	ice_for_each_q_vector(vsi, i) {
+		u16 vector = i + base;
+		int irq_num;
 
-			irq_num = pf->msix_entries[vector].vector;
+		irq_num = pf->msix_entries[vector].vector;
 
-			/* free only the irqs that were actually requested */
-			if (!vsi->q_vectors[i] ||
-			    !(vsi->q_vectors[i]->num_ring_tx ||
-			      vsi->q_vectors[i]->num_ring_rx))
-				continue;
+		/* free only the irqs that were actually requested */
+		if (!vsi->q_vectors[i] ||
+		    !(vsi->q_vectors[i]->num_ring_tx ||
+		      vsi->q_vectors[i]->num_ring_rx))
+			continue;
 
-			/* clear the affinity notifier in the IRQ descriptor */
-			irq_set_affinity_notifier(irq_num, NULL);
+		/* clear the affinity notifier in the IRQ descriptor */
+		irq_set_affinity_notifier(irq_num, NULL);
 
-			/* clear the affinity_mask in the IRQ descriptor */
-			irq_set_affinity_hint(irq_num, NULL);
-			synchronize_irq(irq_num);
-			devm_free_irq(&pf->pdev->dev, irq_num,
-				      vsi->q_vectors[i]);
-		}
+		/* clear the affinity_mask in the IRQ descriptor */
+		irq_set_affinity_hint(irq_num, NULL);
+		synchronize_irq(irq_num);
+		devm_free_irq(&pf->pdev->dev, irq_num,
+			      vsi->q_vectors[i]);
 	}
 }
 
@@ -2816,15 +2805,13 @@ void ice_vsi_dis_irq(struct ice_vsi *vsi)
 	}
 
 	/* disable each interrupt */
-	if (test_bit(ICE_FLAG_MSIX_ENA, pf->flags)) {
-		ice_for_each_q_vector(vsi, i)
-			wr32(hw, GLINT_DYN_CTL(vsi->q_vectors[i]->reg_idx), 0);
+	ice_for_each_q_vector(vsi, i)
+		wr32(hw, GLINT_DYN_CTL(vsi->q_vectors[i]->reg_idx), 0);
 
-		ice_flush(hw);
+	ice_flush(hw);
 
-		ice_for_each_q_vector(vsi, i)
-			synchronize_irq(pf->msix_entries[i + base].vector);
-	}
+	ice_for_each_q_vector(vsi, i)
+		synchronize_irq(pf->msix_entries[i + base].vector);
 }
 
 /**
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index e4be9aa79337..8a8f9170e219 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -1547,15 +1547,11 @@ static void ice_irq_affinity_release(struct kref __always_unused *ref) {}
  */
 static int ice_vsi_ena_irq(struct ice_vsi *vsi)
 {
-	struct ice_pf *pf = vsi->back;
-	struct ice_hw *hw = &pf->hw;
-
-	if (test_bit(ICE_FLAG_MSIX_ENA, pf->flags)) {
-		int i;
+	struct ice_hw *hw = &vsi->back->hw;
+	int i;
 
-		ice_for_each_q_vector(vsi, i)
-			ice_irq_dynamic_ena(hw, vsi, vsi->q_vectors[i]);
-	}
+	ice_for_each_q_vector(vsi, i)
+		ice_irq_dynamic_ena(hw, vsi, vsi->q_vectors[i]);
 
 	ice_flush(hw);
 	return 0;
@@ -1803,7 +1799,7 @@ static void ice_free_irq_msix_misc(struct ice_pf *pf)
 	wr32(hw, PFINT_OICR_ENA, 0);
 	ice_flush(hw);
 
-	if (test_bit(ICE_FLAG_MSIX_ENA, pf->flags) && pf->msix_entries) {
+	if (pf->msix_entries) {
 		synchronize_irq(pf->msix_entries[pf->oicr_idx].vector);
 		devm_free_irq(&pf->pdev->dev,
 			      pf->msix_entries[pf->oicr_idx].vector, pf);
@@ -2229,7 +2225,6 @@ static void ice_deinit_pf(struct ice_pf *pf)
 static void ice_init_pf(struct ice_pf *pf)
 {
 	bitmap_zero(pf->flags, ICE_PF_FLAGS_NBITS);
-	set_bit(ICE_FLAG_MSIX_ENA, pf->flags);
 #ifdef CONFIG_PCI_IOV
 	if (pf->hw.func_caps.common_cap.sr_iov_1_1) {
 		struct ice_hw *hw = &pf->hw;
@@ -2329,7 +2324,6 @@ static int ice_ena_msix_range(struct ice_pf *pf)
 
 exit_err:
 	pf->num_lan_msix = 0;
-	clear_bit(ICE_FLAG_MSIX_ENA, pf->flags);
 	return err;
 }
 
@@ -2342,7 +2336,6 @@ static void ice_dis_msix(struct ice_pf *pf)
 	pci_disable_msix(pf->pdev);
 	devm_kfree(&pf->pdev->dev, pf->msix_entries);
 	pf->msix_entries = NULL;
-	clear_bit(ICE_FLAG_MSIX_ENA, pf->flags);
 }
 
 /**
@@ -2351,8 +2344,7 @@ static void ice_dis_msix(struct ice_pf *pf)
  */
 static void ice_clear_interrupt_scheme(struct ice_pf *pf)
 {
-	if (test_bit(ICE_FLAG_MSIX_ENA, pf->flags))
-		ice_dis_msix(pf);
+	ice_dis_msix(pf);
 
 	if (pf->irq_tracker) {
 		devm_kfree(&pf->pdev->dev, pf->irq_tracker);
@@ -2368,10 +2360,7 @@ static int ice_init_interrupt_scheme(struct ice_pf *pf)
 {
 	int vectors;
 
-	if (test_bit(ICE_FLAG_MSIX_ENA, pf->flags))
-		vectors = ice_ena_msix_range(pf);
-	else
-		return -ENODEV;
+	vectors = ice_ena_msix_range(pf);
 
 	if (vectors < 0)
 		return vectors;
@@ -2528,12 +2517,10 @@ ice_probe(struct pci_dev *pdev, const struct pci_device_id __always_unused *ent)
 	 * the misc functionality and queue processing is combined in
 	 * the same vector and that gets setup at open.
 	 */
-	if (test_bit(ICE_FLAG_MSIX_ENA, pf->flags)) {
-		err = ice_req_irq_msix_misc(pf);
-		if (err) {
-			dev_err(dev, "setup of misc vector failed: %d\n", err);
-			goto err_init_interrupt_unroll;
-		}
+	err = ice_req_irq_msix_misc(pf);
+	if (err) {
+		dev_err(dev, "setup of misc vector failed: %d\n", err);
+		goto err_init_interrupt_unroll;
 	}
 
 	/* create switch struct for the switch element created by FW on boot */
@@ -3146,10 +3133,7 @@ static int ice_up_complete(struct ice_vsi *vsi)
 	struct ice_pf *pf = vsi->back;
 	int err;
 
-	if (test_bit(ICE_FLAG_MSIX_ENA, pf->flags))
-		ice_vsi_cfg_msix(vsi);
-	else
-		return -ENOTSUPP;
+	ice_vsi_cfg_msix(vsi);
 
 	/* Enable only Rx rings, Tx rings were enabled by the FW when the
 	 * Tx queue group list was configured and the context bits were
@@ -3609,24 +3593,6 @@ int ice_vsi_setup_rx_rings(struct ice_vsi *vsi)
 	return err;
 }
 
-/**
- * ice_vsi_req_irq - Request IRQ from the OS
- * @vsi: The VSI IRQ is being requested for
- * @basename: name for the vector
- *
- * Return 0 on success and a negative value on error
- */
-static int ice_vsi_req_irq(struct ice_vsi *vsi, char *basename)
-{
-	struct ice_pf *pf = vsi->back;
-	int err = -EINVAL;
-
-	if (test_bit(ICE_FLAG_MSIX_ENA, pf->flags))
-		err = ice_vsi_req_irq_msix(vsi, basename);
-
-	return err;
-}
-
 /**
  * ice_vsi_open - Called when a network interface is made active
  * @vsi: the VSI to open
@@ -3656,7 +3622,7 @@ static int ice_vsi_open(struct ice_vsi *vsi)
 
 	snprintf(int_name, sizeof(int_name) - 1, "%s-%s",
 		 dev_driver_string(&pf->pdev->dev), vsi->netdev->name);
-	err = ice_vsi_req_irq(vsi, int_name);
+	err = ice_vsi_req_irq_msix(vsi, int_name);
 	if (err)
 		goto err_setup_rx;
 
@@ -3893,12 +3859,10 @@ static void ice_rebuild(struct ice_pf *pf)
 	}
 
 	/* start misc vector */
-	if (test_bit(ICE_FLAG_MSIX_ENA, pf->flags)) {
-		err = ice_req_irq_msix_misc(pf);
-		if (err) {
-			dev_err(dev, "misc vector setup failed: %d\n", err);
-			goto err_vsi_rebuild;
-		}
+	err = ice_req_irq_msix_misc(pf);
+	if (err) {
+		dev_err(dev, "misc vector setup failed: %d\n", err);
+		goto err_vsi_rebuild;
 	}
 
 	/* restart the VSIs that were rebuilt and running before the reset */
@@ -4295,9 +4259,7 @@ static void ice_tx_timeout(struct net_device *netdev)
 		head = (rd32(hw, QTX_COMM_HEAD(vsi->txq_map[hung_queue])) &
 			QTX_COMM_HEAD_HEAD_M) >> QTX_COMM_HEAD_HEAD_S;
 		/* Read interrupt register */
-		if (test_bit(ICE_FLAG_MSIX_ENA, pf->flags))
-			val = rd32(hw,
-				   GLINT_DYN_CTL(tx_ring->q_vector->reg_idx));
+		val = rd32(hw, GLINT_DYN_CTL(tx_ring->q_vector->reg_idx));
 
 		netdev_info(netdev, "tx_timeout: VSI_num: %d, Q %d, NTC: 0x%x, HW_HEAD: 0x%x, NTU: 0x%x, INT: 0x%x\n",
 			    vsi->vsi_num, hung_queue, tx_ring->next_to_clean,
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c
index 020dac283f07..9234fd203929 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
@@ -1413,7 +1413,6 @@ int ice_napi_poll(struct napi_struct *napi, int budget)
 	struct ice_q_vector *q_vector =
 				container_of(napi, struct ice_q_vector, napi);
 	struct ice_vsi *vsi = q_vector->vsi;
-	struct ice_pf *pf = vsi->back;
 	bool clean_complete = true;
 	int budget_per_ring = 0;
 	struct ice_ring *ring;
@@ -1454,8 +1453,7 @@ int ice_napi_poll(struct napi_struct *napi, int budget)
 	 * poll us due to busy-polling
 	 */
 	if (likely(napi_complete_done(napi, work_done)))
-		if (test_bit(ICE_FLAG_MSIX_ENA, pf->flags))
-			ice_update_ena_itr(vsi, q_vector);
+		ice_update_ena_itr(vsi, q_vector);
 
 	return min_t(int, work_done, budget - 1);
 }
-- 
2.21.0


^ permalink raw reply related

* [net-next 15/16] ice: Remove flag to track VF interrupt status
From: Jeff Kirsher @ 2019-07-31 20:41 UTC (permalink / raw)
  To: davem
  Cc: Akeem G Abodunrin, netdev, nhorman, sassmann, Andrew Bowers,
	Jeff Kirsher
In-Reply-To: <20190731204147.8582-1-jeffrey.t.kirsher@intel.com>

From: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>

As a result of refactoring of VF VSIs interrupts code, there is no
need to track its configuration status again with ICE_VF_STATE_CFG_INTR
flag - In fact, it is not being checked anywhere in the code right now, so
this patch removes the dead code as applicable to the flag.

Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c | 11 -----------
 drivers/net/ethernet/intel/ice/ice_virtchnl_pf.h |  5 -----
 2 files changed, 16 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
index 00aa6364124a..ce01cbe70ea4 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
@@ -297,13 +297,6 @@ void ice_free_vfs(struct ice_pf *pf)
 		if (test_bit(ICE_VF_STATE_INIT, pf->vf[i].vf_states)) {
 			/* disable VF qp mappings */
 			ice_dis_vf_mappings(&pf->vf[i]);
-
-			/* Set this state so that assigned VF vectors can be
-			 * reclaimed by PF for reuse in ice_vsi_release(). No
-			 * need to clear this bit since pf->vf array is being
-			 * freed anyways after this for loop
-			 */
-			set_bit(ICE_VF_STATE_CFG_INTR, pf->vf[i].vf_states);
 			ice_free_vf_res(&pf->vf[i]);
 		}
 	}
@@ -551,7 +544,6 @@ static int ice_alloc_vsi_res(struct ice_vf *vf)
 	 * expect vector assignment to be changed unless there is a request for
 	 * more vectors.
 	 */
-	clear_bit(ICE_VF_STATE_CFG_INTR, vf->vf_states);
 ice_alloc_vsi_res_exit:
 	ice_free_fltr_list(&pf->pdev->dev, &tmp_add_list);
 	return status;
@@ -1283,9 +1275,6 @@ static int ice_alloc_vfs(struct ice_pf *pf, u16 num_alloc_vfs)
 		/* assign default capabilities */
 		set_bit(ICE_VIRTCHNL_VF_CAP_L2, &vfs[i].vf_caps);
 		vfs[i].spoofchk = true;
-
-		/* Set this state so that PF driver does VF vector assignment */
-		set_bit(ICE_VF_STATE_CFG_INTR, vfs[i].vf_states);
 	}
 	pf->num_alloc_vfs = num_alloc_vfs;
 
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.h b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.h
index c3ca522c245a..ada69120ff38 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.h
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.h
@@ -30,11 +30,6 @@ enum ice_vf_states {
 	ICE_VF_STATE_DIS,
 	ICE_VF_STATE_MC_PROMISC,
 	ICE_VF_STATE_UC_PROMISC,
-	/* state to indicate if PF needs to do vector assignment for VF.
-	 * This needs to be set during first time VF initialization or later
-	 * when VF asks for more Vectors through virtchnl OP.
-	 */
-	ICE_VF_STATE_CFG_INTR,
 	ICE_VF_STATES_NBITS
 };
 
-- 
2.21.0


^ permalink raw reply related

* [net-next 12/16] ice: Remove duplicate code in ice_alloc_rx_bufs
From: Jeff Kirsher @ 2019-07-31 20:41 UTC (permalink / raw)
  To: davem; +Cc: Brett Creeley, netdev, nhorman, sassmann, Andrew Bowers,
	Jeff Kirsher
In-Reply-To: <20190731204147.8582-1-jeffrey.t.kirsher@intel.com>

From: Brett Creeley <brett.creeley@intel.com>

Currently if the call to ice_alloc_mapped_page() fails we jump to the
no_buf label, possibly call ice_release_rx_desc(), and return true
indicating that there is more work to do. In the success case we just
fall out of the while loop, possibly call ice_alloc_mapped_page(), and
return false saying we exhausted cleaned_count. This flow can be
improved by breaking if ice_alloc_mapped_page() fails and then the flow
outside of the while loop is the same for the failure and success case.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_txrx.c | 14 +++-----------
 1 file changed, 3 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c
index 0c459305c12f..020dac283f07 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
@@ -478,8 +478,9 @@ bool ice_alloc_rx_bufs(struct ice_ring *rx_ring, u16 cleaned_count)
 	bi = &rx_ring->rx_buf[ntu];
 
 	do {
+		/* if we fail here, we have work remaining */
 		if (!ice_alloc_mapped_page(rx_ring, bi))
-			goto no_bufs;
+			break;
 
 		/* sync the buffer for use by the device */
 		dma_sync_single_range_for_device(rx_ring->dev, bi->dma,
@@ -510,16 +511,7 @@ bool ice_alloc_rx_bufs(struct ice_ring *rx_ring, u16 cleaned_count)
 	if (rx_ring->next_to_use != ntu)
 		ice_release_rx_desc(rx_ring, ntu);
 
-	return false;
-
-no_bufs:
-	if (rx_ring->next_to_use != ntu)
-		ice_release_rx_desc(rx_ring, ntu);
-
-	/* make sure to come back via polling to try again after
-	 * allocation failure
-	 */
-	return true;
+	return !!cleaned_count;
 }
 
 /**
-- 
2.21.0


^ permalink raw reply related

* [net-next 09/16] ice: Set up Tx scheduling tree based on alloc VSI Tx queues
From: Jeff Kirsher @ 2019-07-31 20:41 UTC (permalink / raw)
  To: davem
  Cc: Akeem G Abodunrin, netdev, nhorman, sassmann, Tony Nguyen,
	Andrew Bowers, Jeff Kirsher
In-Reply-To: <20190731204147.8582-1-jeffrey.t.kirsher@intel.com>

From: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>

This patch uses allocated number of Tx queues per VSI to set up its
scheduling tree instead of using total number of available Tx queues.
Only PF VSIs have total number of allocated Tx queues equal to number
of available Tx queues, other VSIs have different number of queues
configured.

Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_lib.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
index 01f38abd4277..e28478215810 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_lib.c
@@ -2511,7 +2511,7 @@ ice_vsi_setup(struct ice_pf *pf, struct ice_port_info *pi,
 
 	/* configure VSI nodes based on number of queues and TC's */
 	for (i = 0; i < vsi->tc_cfg.numtc; i++)
-		max_txqs[i] = pf->num_lan_tx;
+		max_txqs[i] = vsi->alloc_txq;
 
 	status = ice_cfg_vsi_lan(vsi->port_info, vsi->idx, vsi->tc_cfg.ena_tc,
 				 max_txqs);
@@ -3020,7 +3020,7 @@ int ice_vsi_rebuild(struct ice_vsi *vsi)
 
 	/* configure VSI nodes based on number of queues and TC's */
 	for (i = 0; i < vsi->tc_cfg.numtc; i++)
-		max_txqs[i] = pf->num_lan_tx;
+		max_txqs[i] = vsi->alloc_txq;
 
 	status = ice_cfg_vsi_lan(vsi->port_info, vsi->idx, vsi->tc_cfg.ena_tc,
 				 max_txqs);
@@ -3137,7 +3137,7 @@ int ice_vsi_cfg_tc(struct ice_vsi *vsi, u8 ena_tc)
 		if (ena_tc & BIT(i))
 			num_tc++;
 		/* populate max_txqs per TC */
-		max_txqs[i] = pf->num_lan_tx;
+		max_txqs[i] = vsi->alloc_txq;
 	}
 
 	vsi->tc_cfg.ena_tc = ena_tc;
-- 
2.21.0


^ permalink raw reply related

* [net-next 03/16] ice: Move vector base setup to PF VSI
From: Jeff Kirsher @ 2019-07-31 20:41 UTC (permalink / raw)
  To: davem; +Cc: Tony Nguyen, netdev, nhorman, sassmann, Andrew Bowers,
	Jeff Kirsher
In-Reply-To: <20190731204147.8582-1-jeffrey.t.kirsher@intel.com>

From: Tony Nguyen <anthony.l.nguyen@intel.com>

When interrupt tracking was refactored, during rebuild, the call to
ice_vsi_setup_vector_base() was inadvertently removed from the PF VSI
instead of being removed from the VF VSI. During reset, the failure to
properly setup the vector base generates a call trace. Correct this so
that resets/rebuilds properly complete.

Fixes: cbe66bfee6a0 ("ice: Refactor interrupt tracking")
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_lib.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
index e9e8340b1ab7..01f38abd4277 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_lib.c
@@ -2978,6 +2978,10 @@ int ice_vsi_rebuild(struct ice_vsi *vsi)
 		if (ret)
 			goto err_rings;
 
+		ret = ice_vsi_setup_vector_base(vsi);
+		if (ret)
+			goto err_vectors;
+
 		ret = ice_vsi_set_q_vectors_reg_idx(vsi);
 		if (ret)
 			goto err_vectors;
@@ -2999,10 +3003,6 @@ int ice_vsi_rebuild(struct ice_vsi *vsi)
 		if (ret)
 			goto err_rings;
 
-		ret = ice_vsi_setup_vector_base(vsi);
-		if (ret)
-			goto err_vectors;
-
 		ret = ice_vsi_set_q_vectors_reg_idx(vsi);
 		if (ret)
 			goto err_vectors;
-- 
2.21.0


^ permalink raw reply related

* [net-next 08/16] ice: Only bump Rx tail and release buffers once per napi_poll
From: Jeff Kirsher @ 2019-07-31 20:41 UTC (permalink / raw)
  To: davem; +Cc: Brett Creeley, netdev, nhorman, sassmann, Andrew Bowers,
	Jeff Kirsher
In-Reply-To: <20190731204147.8582-1-jeffrey.t.kirsher@intel.com>

From: Brett Creeley <brett.creeley@intel.com>

Currently we bump the Rx tail and release/give buffers to hardware every
16 descriptors. This causes us to bump Rx tail up to 4 times per
napi_poll call. Also we are always bumping tail on an odd index and this
is a problem because hardware ignores the lower 3 bits in the QRX_TAIL
register. This is making it so hardware sees tail bumps only every 8
descriptors. Instead lets only bump Rx tail once per napi_poll if
the value aligns with hardware's expectations of the lower 3 bits being
cleared. Also only release/give Rx buffers once per napi_poll call.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_txrx.c | 42 +++++++++++++++--------
 1 file changed, 27 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c
index dd7392f293bf..0c459305c12f 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
@@ -377,18 +377,28 @@ int ice_setup_rx_ring(struct ice_ring *rx_ring)
  */
 static void ice_release_rx_desc(struct ice_ring *rx_ring, u32 val)
 {
+	u16 prev_ntu = rx_ring->next_to_use;
+
 	rx_ring->next_to_use = val;
 
 	/* update next to alloc since we have filled the ring */
 	rx_ring->next_to_alloc = val;
 
-	/* Force memory writes to complete before letting h/w
-	 * know there are new descriptors to fetch. (Only
-	 * applicable for weak-ordered memory model archs,
-	 * such as IA-64).
+	/* QRX_TAIL will be updated with any tail value, but hardware ignores
+	 * the lower 3 bits. This makes it so we only bump tail on meaningful
+	 * boundaries. Also, this allows us to bump tail on intervals of 8 up to
+	 * the budget depending on the current traffic load.
 	 */
-	wmb();
-	writel(val, rx_ring->tail);
+	val &= ~0x7;
+	if (prev_ntu != val) {
+		/* Force memory writes to complete before letting h/w
+		 * know there are new descriptors to fetch. (Only
+		 * applicable for weak-ordered memory model archs,
+		 * such as IA-64).
+		 */
+		wmb();
+		writel(val, rx_ring->tail);
+	}
 }
 
 /**
@@ -445,7 +455,13 @@ ice_alloc_mapped_page(struct ice_ring *rx_ring, struct ice_rx_buf *bi)
  * @rx_ring: ring to place buffers on
  * @cleaned_count: number of buffers to replace
  *
- * Returns false if all allocations were successful, true if any fail
+ * Returns false if all allocations were successful, true if any fail. Returning
+ * true signals to the caller that we didn't replace cleaned_count buffers and
+ * there is more work to do.
+ *
+ * First, try to clean "cleaned_count" Rx buffers. Then refill the cleaned Rx
+ * buffers. Then bump tail at most one time. Grouping like this lets us avoid
+ * multiple tail writes per call.
  */
 bool ice_alloc_rx_bufs(struct ice_ring *rx_ring, u16 cleaned_count)
 {
@@ -990,7 +1006,7 @@ static int ice_clean_rx_irq(struct ice_ring *rx_ring, int budget)
 {
 	unsigned int total_rx_bytes = 0, total_rx_pkts = 0;
 	u16 cleaned_count = ICE_DESC_UNUSED(rx_ring);
-	bool failure = false;
+	bool failure;
 
 	/* start the loop to process Rx packets bounded by 'budget' */
 	while (likely(total_rx_pkts < (unsigned int)budget)) {
@@ -1002,13 +1018,6 @@ static int ice_clean_rx_irq(struct ice_ring *rx_ring, int budget)
 		u16 vlan_tag = 0;
 		u8 rx_ptype;
 
-		/* return some buffers to hardware, one at a time is too slow */
-		if (cleaned_count >= ICE_RX_BUF_WRITE) {
-			failure = failure ||
-				  ice_alloc_rx_bufs(rx_ring, cleaned_count);
-			cleaned_count = 0;
-		}
-
 		/* get the Rx desc from Rx ring based on 'next_to_clean' */
 		rx_desc = ICE_RX_DESC(rx_ring, rx_ring->next_to_clean);
 
@@ -1085,6 +1094,9 @@ static int ice_clean_rx_irq(struct ice_ring *rx_ring, int budget)
 		total_rx_pkts++;
 	}
 
+	/* return up to cleaned_count buffers to hardware */
+	failure = ice_alloc_rx_bufs(rx_ring, cleaned_count);
+
 	/* update queue and vector specific stats */
 	u64_stats_update_begin(&rx_ring->syncp);
 	rx_ring->stats.pkts += total_rx_pkts;
-- 
2.21.0


^ permalink raw reply related

* [net-next 04/16] ice: Always set prefena when configuring an Rx queue
From: Jeff Kirsher @ 2019-07-31 20:41 UTC (permalink / raw)
  To: davem; +Cc: Brett Creeley, netdev, nhorman, sassmann, Andrew Bowers,
	Jeff Kirsher
In-Reply-To: <20190731204147.8582-1-jeffrey.t.kirsher@intel.com>

From: Brett Creeley <brett.creeley@intel.com>

Currently we are always setting prefena to 0. This is causing the
hardware to only fetch descriptors when there are none free in the cache
for a received packet instead of prefetching when it has used the last
descriptor regardless of incoming packets. Fix this by allowing the
hardware to prefetch Rx descriptors.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_common.c    | 9 ++++++++-
 drivers/net/ethernet/intel/ice/ice_lan_tx_rx.h | 1 +
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_common.c b/drivers/net/ethernet/intel/ice/ice_common.c
index 4be3559de207..01e5ecaaa322 100644
--- a/drivers/net/ethernet/intel/ice/ice_common.c
+++ b/drivers/net/ethernet/intel/ice/ice_common.c
@@ -1078,6 +1078,7 @@ static const struct ice_ctx_ele ice_rlan_ctx_info[] = {
 	ICE_CTX_STORE(ice_rlan_ctx, tphdata_ena,	1,	195),
 	ICE_CTX_STORE(ice_rlan_ctx, tphhead_ena,	1,	196),
 	ICE_CTX_STORE(ice_rlan_ctx, lrxqthresh,		3,	198),
+	ICE_CTX_STORE(ice_rlan_ctx, prefena,		1,	201),
 	{ 0 }
 };
 
@@ -1088,7 +1089,8 @@ static const struct ice_ctx_ele ice_rlan_ctx_info[] = {
  * @rxq_index: the index of the Rx queue
  *
  * Converts rxq context from sparse to dense structure and then writes
- * it to HW register space
+ * it to HW register space and enables the hardware to prefetch descriptors
+ * instead of only fetching them on demand
  */
 enum ice_status
 ice_write_rxq_ctx(struct ice_hw *hw, struct ice_rlan_ctx *rlan_ctx,
@@ -1096,6 +1098,11 @@ ice_write_rxq_ctx(struct ice_hw *hw, struct ice_rlan_ctx *rlan_ctx,
 {
 	u8 ctx_buf[ICE_RXQ_CTX_SZ] = { 0 };
 
+	if (!rlan_ctx)
+		return ICE_ERR_BAD_PTR;
+
+	rlan_ctx->prefena = 1;
+
 	ice_set_ctx((u8 *)rlan_ctx, ctx_buf, ice_rlan_ctx_info);
 	return ice_copy_rxq_ctx_to_hw(hw, ctx_buf, rxq_index);
 }
diff --git a/drivers/net/ethernet/intel/ice/ice_lan_tx_rx.h b/drivers/net/ethernet/intel/ice/ice_lan_tx_rx.h
index 510a8c900e61..57ea6811fe2c 100644
--- a/drivers/net/ethernet/intel/ice/ice_lan_tx_rx.h
+++ b/drivers/net/ethernet/intel/ice/ice_lan_tx_rx.h
@@ -290,6 +290,7 @@ struct ice_rlan_ctx {
 	u8 tphdata_ena;
 	u8 tphhead_ena;
 	u16 lrxqthresh; /* bigger than needed, see above for reason */
+	u8 prefena;	/* NOTE: normally must be set to 1 at init */
 };
 
 struct ice_ctx_ele {
-- 
2.21.0


^ permalink raw reply related

* [net-next 06/16] ice: Do not configure port with no media
From: Jeff Kirsher @ 2019-07-31 20:41 UTC (permalink / raw)
  To: davem; +Cc: Tony Nguyen, netdev, nhorman, sassmann, Andrew Bowers,
	Jeff Kirsher
In-Reply-To: <20190731204147.8582-1-jeffrey.t.kirsher@intel.com>

From: Tony Nguyen <anthony.l.nguyen@intel.com>

The firmware reports an error when trying to configure a port with no
media. Instead of always configuring the port, check for media before
attempting to configure it. In the absence of media, turn off link and
poll for media to become available before re-enabling link.

Move ice_force_phys_link_state() up to avoid forward declaration.

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ice/ice.h      |   1 +
 drivers/net/ethernet/intel/ice/ice_main.c | 239 ++++++++++++++--------
 2 files changed, 158 insertions(+), 82 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index 9ee6b55553c0..596b09a905aa 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -337,6 +337,7 @@ enum ice_pf_flags {
 	ICE_FLAG_DCB_CAPABLE,
 	ICE_FLAG_DCB_ENA,
 	ICE_FLAG_LINK_DOWN_ON_CLOSE_ENA,
+	ICE_FLAG_NO_MEDIA,
 	ICE_FLAG_ENABLE_FW_LLDP,
 	ICE_FLAG_ETHTOOL_CTXT,		/* set when ethtool holds RTNL lock */
 	ICE_PF_FLAGS_NBITS		/* must be last */
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index f490e65c64bc..91334d1e83ed 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -810,6 +810,20 @@ ice_link_event(struct ice_pf *pf, struct ice_port_info *pi, bool link_up,
 	if (!vsi || !vsi->port_info)
 		return -EINVAL;
 
+	/* turn off PHY if media was removed */
+	if (!test_bit(ICE_FLAG_NO_MEDIA, pf->flags) &&
+	    !(pi->phy.link_info.link_info & ICE_AQ_MEDIA_AVAILABLE)) {
+		set_bit(ICE_FLAG_NO_MEDIA, pf->flags);
+
+		result = ice_aq_set_link_restart_an(pi, false, NULL);
+		if (result) {
+			dev_dbg(&pf->pdev->dev,
+				"Failed to set link down, VSI %d error %d\n",
+				vsi->vsi_num, result);
+			return result;
+		}
+	}
+
 	ice_vsi_link_event(vsi, link_up);
 	ice_print_link_msg(vsi, link_up);
 
@@ -1314,6 +1328,124 @@ static void ice_handle_mdd_event(struct ice_pf *pf)
 	}
 }
 
+/**
+ * ice_force_phys_link_state - Force the physical link state
+ * @vsi: VSI to force the physical link state to up/down
+ * @link_up: true/false indicates to set the physical link to up/down
+ *
+ * Force the physical link state by getting the current PHY capabilities from
+ * hardware and setting the PHY config based on the determined capabilities. If
+ * link changes a link event will be triggered because both the Enable Automatic
+ * Link Update and LESM Enable bits are set when setting the PHY capabilities.
+ *
+ * Returns 0 on success, negative on failure
+ */
+static int ice_force_phys_link_state(struct ice_vsi *vsi, bool link_up)
+{
+	struct ice_aqc_get_phy_caps_data *pcaps;
+	struct ice_aqc_set_phy_cfg_data *cfg;
+	struct ice_port_info *pi;
+	struct device *dev;
+	int retcode;
+
+	if (!vsi || !vsi->port_info || !vsi->back)
+		return -EINVAL;
+	if (vsi->type != ICE_VSI_PF)
+		return 0;
+
+	dev = &vsi->back->pdev->dev;
+
+	pi = vsi->port_info;
+
+	pcaps = devm_kzalloc(dev, sizeof(*pcaps), GFP_KERNEL);
+	if (!pcaps)
+		return -ENOMEM;
+
+	retcode = ice_aq_get_phy_caps(pi, false, ICE_AQC_REPORT_SW_CFG, pcaps,
+				      NULL);
+	if (retcode) {
+		dev_err(dev,
+			"Failed to get phy capabilities, VSI %d error %d\n",
+			vsi->vsi_num, retcode);
+		retcode = -EIO;
+		goto out;
+	}
+
+	/* No change in link */
+	if (link_up == !!(pcaps->caps & ICE_AQC_PHY_EN_LINK) &&
+	    link_up == !!(pi->phy.link_info.link_info & ICE_AQ_LINK_UP))
+		goto out;
+
+	cfg = devm_kzalloc(dev, sizeof(*cfg), GFP_KERNEL);
+	if (!cfg) {
+		retcode = -ENOMEM;
+		goto out;
+	}
+
+	cfg->phy_type_low = pcaps->phy_type_low;
+	cfg->phy_type_high = pcaps->phy_type_high;
+	cfg->caps = pcaps->caps | ICE_AQ_PHY_ENA_AUTO_LINK_UPDT;
+	cfg->low_power_ctrl = pcaps->low_power_ctrl;
+	cfg->eee_cap = pcaps->eee_cap;
+	cfg->eeer_value = pcaps->eeer_value;
+	cfg->link_fec_opt = pcaps->link_fec_options;
+	if (link_up)
+		cfg->caps |= ICE_AQ_PHY_ENA_LINK;
+	else
+		cfg->caps &= ~ICE_AQ_PHY_ENA_LINK;
+
+	retcode = ice_aq_set_phy_cfg(&vsi->back->hw, pi->lport, cfg, NULL);
+	if (retcode) {
+		dev_err(dev, "Failed to set phy config, VSI %d error %d\n",
+			vsi->vsi_num, retcode);
+		retcode = -EIO;
+	}
+
+	devm_kfree(dev, cfg);
+out:
+	devm_kfree(dev, pcaps);
+	return retcode;
+}
+
+/**
+ * ice_check_media_subtask - Check for media; bring link up if detected.
+ * @pf: pointer to PF struct
+ */
+static void ice_check_media_subtask(struct ice_pf *pf)
+{
+	struct ice_port_info *pi;
+	struct ice_vsi *vsi;
+	int err;
+
+	vsi = ice_find_vsi_by_type(pf, ICE_VSI_PF);
+	if (!vsi)
+		return;
+
+	/* No need to check for media if it's already present or the interface
+	 * is down
+	 */
+	if (!test_bit(ICE_FLAG_NO_MEDIA, pf->flags) ||
+	    test_bit(__ICE_DOWN, vsi->state))
+		return;
+
+	/* Refresh link info and check if media is present */
+	pi = vsi->port_info;
+	err = ice_update_link_info(pi);
+	if (err)
+		return;
+
+	if (pi->phy.link_info.link_info & ICE_AQ_MEDIA_AVAILABLE) {
+		err = ice_force_phys_link_state(vsi, true);
+		if (err)
+			return;
+		clear_bit(ICE_FLAG_NO_MEDIA, pf->flags);
+
+		/* A Link Status Event will be generated; the event handler
+		 * will complete bringing the interface up
+		 */
+	}
+}
+
 /**
  * ice_service_task - manage and run subtasks
  * @work: pointer to work_struct contained by the PF struct
@@ -1336,6 +1468,7 @@ static void ice_service_task(struct work_struct *work)
 		return;
 	}
 
+	ice_check_media_subtask(pf);
 	ice_check_for_hang_subtask(pf);
 	ice_sync_fltr_subtask(pf);
 	ice_handle_mdd_event(pf);
@@ -3357,85 +3490,6 @@ static void ice_napi_disable_all(struct ice_vsi *vsi)
 	}
 }
 
-/**
- * ice_force_phys_link_state - Force the physical link state
- * @vsi: VSI to force the physical link state to up/down
- * @link_up: true/false indicates to set the physical link to up/down
- *
- * Force the physical link state by getting the current PHY capabilities from
- * hardware and setting the PHY config based on the determined capabilities. If
- * link changes a link event will be triggered because both the Enable Automatic
- * Link Update and LESM Enable bits are set when setting the PHY capabilities.
- *
- * Returns 0 on success, negative on failure
- */
-static int ice_force_phys_link_state(struct ice_vsi *vsi, bool link_up)
-{
-	struct ice_aqc_get_phy_caps_data *pcaps;
-	struct ice_aqc_set_phy_cfg_data *cfg;
-	struct ice_port_info *pi;
-	struct device *dev;
-	int retcode;
-
-	if (!vsi || !vsi->port_info || !vsi->back)
-		return -EINVAL;
-	if (vsi->type != ICE_VSI_PF)
-		return 0;
-
-	dev = &vsi->back->pdev->dev;
-
-	pi = vsi->port_info;
-
-	pcaps = devm_kzalloc(dev, sizeof(*pcaps), GFP_KERNEL);
-	if (!pcaps)
-		return -ENOMEM;
-
-	retcode = ice_aq_get_phy_caps(pi, false, ICE_AQC_REPORT_SW_CFG, pcaps,
-				      NULL);
-	if (retcode) {
-		dev_err(dev,
-			"Failed to get phy capabilities, VSI %d error %d\n",
-			vsi->vsi_num, retcode);
-		retcode = -EIO;
-		goto out;
-	}
-
-	/* No change in link */
-	if (link_up == !!(pcaps->caps & ICE_AQC_PHY_EN_LINK) &&
-	    link_up == !!(pi->phy.link_info.link_info & ICE_AQ_LINK_UP))
-		goto out;
-
-	cfg = devm_kzalloc(dev, sizeof(*cfg), GFP_KERNEL);
-	if (!cfg) {
-		retcode = -ENOMEM;
-		goto out;
-	}
-
-	cfg->phy_type_low = pcaps->phy_type_low;
-	cfg->phy_type_high = pcaps->phy_type_high;
-	cfg->caps = pcaps->caps | ICE_AQ_PHY_ENA_AUTO_LINK_UPDT;
-	cfg->low_power_ctrl = pcaps->low_power_ctrl;
-	cfg->eee_cap = pcaps->eee_cap;
-	cfg->eeer_value = pcaps->eeer_value;
-	cfg->link_fec_opt = pcaps->link_fec_options;
-	if (link_up)
-		cfg->caps |= ICE_AQ_PHY_ENA_LINK;
-	else
-		cfg->caps &= ~ICE_AQ_PHY_ENA_LINK;
-
-	retcode = ice_aq_set_phy_cfg(&vsi->back->hw, pi->lport, cfg, NULL);
-	if (retcode) {
-		dev_err(dev, "Failed to set phy config, VSI %d error %d\n",
-			vsi->vsi_num, retcode);
-		retcode = -EIO;
-	}
-
-	devm_kfree(dev, cfg);
-out:
-	devm_kfree(dev, pcaps);
-	return retcode;
-}
-
 /**
  * ice_down - Shutdown the connection
  * @vsi: The VSI being stopped
@@ -4281,6 +4335,7 @@ int ice_open(struct net_device *netdev)
 {
 	struct ice_netdev_priv *np = netdev_priv(netdev);
 	struct ice_vsi *vsi = np->vsi;
+	struct ice_port_info *pi;
 	int err;
 
 	if (test_bit(__ICE_NEEDS_RESTART, vsi->back->state)) {
@@ -4290,13 +4345,33 @@ int ice_open(struct net_device *netdev)
 
 	netif_carrier_off(netdev);
 
-	err = ice_force_phys_link_state(vsi, true);
+	pi = vsi->port_info;
+	err = ice_update_link_info(pi);
 	if (err) {
-		netdev_err(netdev,
-			   "Failed to set physical link up, error %d\n", err);
+		netdev_err(netdev, "Failed to get link info, error %d\n",
+			   err);
 		return err;
 	}
 
+	/* Set PHY if there is media, otherwise, turn off PHY */
+	if (pi->phy.link_info.link_info & ICE_AQ_MEDIA_AVAILABLE) {
+		err = ice_force_phys_link_state(vsi, true);
+		if (err) {
+			netdev_err(netdev,
+				   "Failed to set physical link up, error %d\n",
+				   err);
+			return err;
+		}
+	} else {
+		err = ice_aq_set_link_restart_an(pi, false, NULL);
+		if (err) {
+			netdev_err(netdev, "Failed to set PHY state, VSI %d error %d\n",
+				   vsi->vsi_num, err);
+			return err;
+		}
+		set_bit(ICE_FLAG_NO_MEDIA, vsi->back->flags);
+	}
+
 	err = ice_vsi_open(vsi);
 	if (err)
 		netdev_err(netdev, "Failed to open VSI 0x%04X on switch 0x%04X\n",
-- 
2.21.0


^ permalink raw reply related

* Re: [PATCH] net: sctp: Rename fallthrough label to unhandled
From: Neil Horman @ 2019-07-31 20:58 UTC (permalink / raw)
  To: Joe Perches
  Cc: Vlad Yasevich, Marcelo Ricardo Leitner, David S. Miller,
	linux-sctp, netdev, linux-kernel
In-Reply-To: <b93bbb17b407e27bb1dc196af84e4f289d9dfd93.camel@perches.com>

On Wed, Jul 31, 2019 at 09:35:31AM -0700, Joe Perches wrote:
> On Wed, 2019-07-31 at 08:16 -0400, Neil Horman wrote:
> > On Wed, Jul 31, 2019 at 04:32:43AM -0700, Joe Perches wrote:
> > > On Wed, 2019-07-31 at 07:19 -0400, Neil Horman wrote:
> > > > On Tue, Jul 30, 2019 at 10:04:37PM -0700, Joe Perches wrote:
> > > > > fallthrough may become a pseudo reserved keyword so this only use of
> > > > > fallthrough is better renamed to allow it.
> > > > > 
> > > > > Signed-off-by: Joe Perches <joe@perches.com>
> > > > Are you referring to the __attribute__((fallthrough)) statement that gcc
> > > > supports?  If so the compiler should by all rights be able to differentiate
> > > > between a null statement attribute and a explicit goto and label without the
> > > > need for renaming here.  Or are you referring to something else?
> > > 
> > > Hi.
> > > 
> > > I sent after this a patch that adds
> > > 
> > > # define fallthrough                    __attribute__((__fallthrough__))
> > > 
> > > https://lore.kernel.org/patchwork/patch/1108577/
> > > 
> > > So this rename is a prerequisite to adding this #define.
> > > 
> > why not just define __fallthrough instead, like we do for all the other
> > attributes we alias (i.e. __read_mostly, __protected_by, __unused, __exception,
> > etc)
> 
> Because it's not as intelligible when used as a statement.
I think thats somewhat debatable.  __fallthrough to me looks like an internal
macro, whereas fallthrough looks like a comment someone forgot to /* */

Neil

> 
> 
> 
> 

^ permalink raw reply

* [PATCH net] net: phy: fix race in genphy_update_link
From: Heiner Kallweit @ 2019-07-31 21:05 UTC (permalink / raw)
  To: Andrew Lunn, Florian Fainelli, David Miller
  Cc: netdev@vger.kernel.org, liuyonglong

In phy_start_aneg() autoneg is started, and immediately after that
link and autoneg status are read. As reported in [0] it can happen that
at time of this read the PHY has reset the "aneg complete" bit but not
yet the "link up" bit, what can result in a false link-up detection.
To fix this don't report link as up if we're in aneg mode and PHY
doesn't signal "aneg complete".

[0] https://marc.info/?t=156413509900003&r=1&w=2

Fixes: 4950c2ba49cc ("net: phy: fix autoneg mismatch case in genphy_read_status")
Reported-by: liuyonglong <liuyonglong@huawei.com>
Tested-by: liuyonglong <liuyonglong@huawei.com>
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
 drivers/net/phy/phy_device.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 6b5cb87f3..7ddd91df9 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -1774,6 +1774,12 @@ int genphy_update_link(struct phy_device *phydev)
 	phydev->link = status & BMSR_LSTATUS ? 1 : 0;
 	phydev->autoneg_complete = status & BMSR_ANEGCOMPLETE ? 1 : 0;
 
+	/* Consider the case that autoneg was started and "aneg complete"
+	 * bit has been reset, but "link up" bit not yet.
+	 */
+	if (phydev->autoneg == AUTONEG_ENABLE && !phydev->autoneg_complete)
+		phydev->link = 0;
+
 	return 0;
 }
 EXPORT_SYMBOL(genphy_update_link);
-- 
2.22.0


^ permalink raw reply related

* Re: KASAN: use-after-free Read in nr_rx_frame (2)
From: syzbot @ 2019-07-31 21:32 UTC (permalink / raw)
  To: davem, dvyukov, linux-hams, linux-kernel, netdev, ralf,
	syzkaller-bugs, xiyou.wangcong
In-Reply-To: <000000000000e42667058e554371@google.com>

syzbot has bisected this bug to:

commit c8c8218ec5af5d2598381883acbefbf604e56b5e
Author: Cong Wang <xiyou.wangcong@gmail.com>
Date:   Thu Jun 27 21:30:58 2019 +0000

     netrom: fix a memory leak in nr_rx_frame()

bisection log:  https://syzkaller.appspot.com/x/bisect.txt?x=122ddaec600000
start commit:   629f8205 Merge tag 'for-linus-20190730' of git://git.kerne..
git tree:       upstream
final crash:    https://syzkaller.appspot.com/x/report.txt?x=112ddaec600000
console output: https://syzkaller.appspot.com/x/log.txt?x=162ddaec600000
kernel config:  https://syzkaller.appspot.com/x/.config?x=e397351d2615e10
dashboard link: https://syzkaller.appspot.com/bug?extid=701728447042217b67c1
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=14a6e008600000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=11937d92600000

Reported-by: syzbot+701728447042217b67c1@syzkaller.appspotmail.com
Fixes: c8c8218ec5af ("netrom: fix a memory leak in nr_rx_frame()")

For information about bisection process see: https://goo.gl/tpsmEJ#bisection

^ permalink raw reply

* Re: [PATCH net v3] net: ipv6: Fix a bug in ndisc_send_ns when netdev only has a global address
From: David Ahern @ 2019-07-31 21:42 UTC (permalink / raw)
  To: Su Yanjun, davem, kuznet, yoshfuji; +Cc: netdev, linux-kernel
In-Reply-To: <1564537972-76503-1-git-send-email-suyj.fnst@cn.fujitsu.com>

On 7/30/19 7:52 PM, Su Yanjun wrote:
> When the egress interface does not have a link local address, it can
> not communicate with other hosts.
> 
> In RFC4861, 7.2.2 says
> "If the source address of the packet prompting the solicitation is the
> same as one of the addresses assigned to the outgoing interface, that
> address SHOULD be placed in the IP Source Address of the outgoing
> solicitation.  Otherwise, any one of the addresses assigned to the
> interface should be used."
> 
> In this patch we try get a global address if we get ll address failed.
> 
> Signed-off-by: Su Yanjun <suyj.fnst@cn.fujitsu.com>
> ---
> Changes since V2:
> 	- Let banned_flags under the scope of its use.
> ---
>  include/net/addrconf.h |  2 ++
>  net/ipv6/addrconf.c    | 34 ++++++++++++++++++++++++++++++++++
>  net/ipv6/ndisc.c       | 10 +++++++---
>  3 files changed, 43 insertions(+), 3 deletions(-)
> 


This change looks fine to me given the RFC reference, so for that part:
Reviewed-by: David Ahern <dsahern@gmail.com>

Bigger picture is the issue Mark raised that a different RFC says all
links should have an LLA, so use of IN6_ADDR_GEN_MODE_NONE means
userspace is expected to create and add the LLA. Lack of an LLA is a
misconfigured system. If that is enforced via some to be developed
patch, then this patch would not be needed.

^ permalink raw reply

* Re: [patch net-next 0/3] net: devlink: Finish network namespace support
From: David Ahern @ 2019-07-31 21:50 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: netdev, davem, jakub.kicinski, sthemmin, mlxsw
In-Reply-To: <20190730060817.GD2312@nanopsycho.orion>

On 7/30/19 12:08 AM, Jiri Pirko wrote:
> Mon, Jul 29, 2019 at 10:17:25PM CEST, dsahern@gmail.com wrote:
>> On 7/27/19 3:44 AM, Jiri Pirko wrote:
>>> From: Jiri Pirko <jiri@mellanox.com>
>>>
>>> Devlink from the beginning counts with network namespaces, but the
>>> instances has been fixed to init_net. The first patch allows user
>>> to move existing devlink instances into namespaces:
>>>
>>
>> so you intend for an asic, for example, to have multiple devlink
>> instances where each instance governs a set of related ports (e.g.,
>> ports that share a set of hardware resources) and those instances can be
>> managed from distinct network namespaces?
> 
> No, no multiple devlink instances for asic intended.

So it should be allowed for an asic to have resources split across
network namespaces. e.g., something like this:

   namespace 1 |  namespace 2  | ... | namespace N
               |               |     |
  { ports 1 }  |   { ports 2 } | ... | { ports N }
               |               |     |
   devlink 1   |    devlink 2  | ... |  devlink N
  =================================================
                   driver


^ permalink raw reply

* Re: [PATCH] net: mdio-octeon: Fix build error and Kconfig warning
From: Randy Dunlap @ 2019-07-31 21:55 UTC (permalink / raw)
  To: Nathan Chancellor, davem
  Cc: andrew, broonie, devel, f.fainelli, gregkh, hkallweit1,
	kernel-build-reports, linux-arm-kernel, linux-next, netdev, willy,
	kbuild test robot
In-Reply-To: <20190731185023.20954-1-natechancellor@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 3444 bytes --]

On 7/31/19 11:50 AM, Nathan Chancellor wrote:
> arm allyesconfig warns:
> 
> WARNING: unmet direct dependencies detected for MDIO_OCTEON
>   Depends on [n]: NETDEVICES [=y] && MDIO_DEVICE [=y] && MDIO_BUS [=y]
> && 64BIT && HAS_IOMEM [=y] && OF_MDIO [=y]
>   Selected by [y]:
>   - OCTEON_ETHERNET [=y] && STAGING [=y] && (CAVIUM_OCTEON_SOC &&
> NETDEVICES [=y] || COMPILE_TEST [=y])
> 
> and errors:
> 
> In file included from ../drivers/net/phy/mdio-octeon.c:14:
> ../drivers/net/phy/mdio-octeon.c: In function 'octeon_mdiobus_probe':
> ../drivers/net/phy/mdio-cavium.h:111:36: error: implicit declaration of
> function 'writeq'; did you mean 'writeb'?
> [-Werror=implicit-function-declaration]
>   111 | #define oct_mdio_writeq(val, addr) writeq(val, (void *)addr)
>       |                                    ^~~~~~
> ../drivers/net/phy/mdio-octeon.c:56:2: note: in expansion of macro
> 'oct_mdio_writeq'
>    56 |  oct_mdio_writeq(smi_en.u64, bus->register_base + SMI_EN);
>       |  ^~~~~~~~~~~~~~~
> cc1: some warnings being treated as errors
> 
> This allows MDIO_OCTEON to be built with COMPILE_TEST as well and
> includes the proper header for readq/writeq. This does not address
> the several -Wint-to-pointer-cast and -Wpointer-to-int-cast warnings
> that appeared as a result of commit 171a9bae68c7 ("staging/octeon:
> Allow test build on !MIPS") in these files.
> 
> Fixes: 171a9bae68c7 ("staging/octeon: Allow test build on !MIPS")
> Reported-by: kbuild test robot <lkp@intel.com>
> Reported-by: Mark Brown <broonie@kernel.org>
> Reported-by: Randy Dunlap <rdunlap@infradead.org>
> Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>


With today's linux-next (20190731), I am still seeing a Kconfig warning and
build errors (building for i386):

and applying Greg's "depends on NETDEVICES" patch and this patch:

WARNING: unmet direct dependencies detected for MDIO_OCTEON
  Depends on [n]: NETDEVICES [=y] && MDIO_DEVICE [=m] && MDIO_BUS [=m] && (64BIT [=n] || COMPILE_TEST [=y]) && HAS_IOMEM [=y] && OF_MDIO [=n]
  Selected by [m]:
  - OCTEON_ETHERNET [=m] && STAGING [=y] && (CAVIUM_OCTEON_SOC || COMPILE_TEST [=y]) && NETDEVICES [=y]

ERROR: "cavium_mdiobus_write" [drivers/net/phy/mdio-octeon.ko] undefined!
ERROR: "cavium_mdiobus_read" [drivers/net/phy/mdio-octeon.ko] undefined!


kernel .config file is attached.

Am I missing another patch?

thanks.

> ---
>  drivers/net/phy/Kconfig       | 2 +-
>  drivers/net/phy/mdio-cavium.h | 2 ++
>  2 files changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
> index 20f14c5fbb7e..ed2edf4b5b0e 100644
> --- a/drivers/net/phy/Kconfig
> +++ b/drivers/net/phy/Kconfig
> @@ -159,7 +159,7 @@ config MDIO_MSCC_MIIM
>  
>  config MDIO_OCTEON
>  	tristate "Octeon and some ThunderX SOCs MDIO buses"
> -	depends on 64BIT
> +	depends on 64BIT || COMPILE_TEST
>  	depends on HAS_IOMEM && OF_MDIO
>  	select MDIO_CAVIUM
>  	help
> diff --git a/drivers/net/phy/mdio-cavium.h b/drivers/net/phy/mdio-cavium.h
> index ed5f9bb5448d..b7f89ad27465 100644
> --- a/drivers/net/phy/mdio-cavium.h
> +++ b/drivers/net/phy/mdio-cavium.h
> @@ -108,6 +108,8 @@ static inline u64 oct_mdio_readq(u64 addr)
>  	return cvmx_read_csr(addr);
>  }
>  #else
> +#include <linux/io-64-nonatomic-lo-hi.h>
> +
>  #define oct_mdio_writeq(val, addr)	writeq(val, (void *)addr)
>  #define oct_mdio_readq(addr)		readq((void *)addr)
>  #endif
> 


-- 
~Randy

[-- Attachment #2: cavium.i386.config --]
[-- Type: application/x-config, Size: 107345 bytes --]

^ permalink raw reply

* Re: [PATCH 0/2] tools: bpftool: add net (un)load command to load XDP
From: David Ahern @ 2019-07-31 21:59 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Alexei Starovoitov, Daniel T. Lee, Stephen Hemminger,
	Jesper Dangaard Brouer, John Fastabend, Daniel Borkmann,
	Alexei Starovoitov, David Miller, netdev
In-Reply-To: <20190730182144.1355bf50@cakuba.netronome.com>

On 7/30/19 7:21 PM, Jakub Kicinski wrote:
> 
>>>> If bpftool was taught to do equivalent of 'ip link' that would be
>>>> very different story and I would be opposed to that.  
>>> Yes, that'd be pretty clear cut, only the XDP stuff is a bit more 
>>> of a judgement call.  
>> bpftool must be able to introspect every aspect of bpf programming.
>> That includes detaching and attaching anywhere.
>> Anyone doing 'bpftool p s' should be able to switch off particular
>> prog id without learning ten different other tools.
> I think the fact that we already have an implementation in iproute2,
> which is at the risk of bit rot is more important to me that the
> hypothetical scenario where everyone knows to just use bpftool (for
> XDP, for TC it's still iproute2 unless there's someone crazy enough 
> to reimplement the TC functionality :))

apparently the iproute2 version has bit rot which is a shame.

> 
> I'm not sure we can settle our differences over email :)
> I have tremendous respect for all the maintainers I CCed here, 
> if nobody steps up to agree with me I'll concede the bpftool net
> battle entirely :)

bpftool started as an introspection tool and has turned into a one stop
shop for all things ebpf. I am mixed on whether that is a good thing or
a bad thing.

^ permalink raw reply

* Re: [patch net-next 0/3] net: devlink: Finish network namespace support
From: Jakub Kicinski @ 2019-07-31 22:02 UTC (permalink / raw)
  To: David Ahern; +Cc: Jiri Pirko, netdev, davem, sthemmin, mlxsw
In-Reply-To: <8f5afc58-1cbc-9e9a-aa15-94d1bafcda22@gmail.com>

On Wed, 31 Jul 2019 15:50:26 -0600, David Ahern wrote:
> On 7/30/19 12:08 AM, Jiri Pirko wrote:
> > Mon, Jul 29, 2019 at 10:17:25PM CEST, dsahern@gmail.com wrote:  
> >> On 7/27/19 3:44 AM, Jiri Pirko wrote:  
> >>> From: Jiri Pirko <jiri@mellanox.com>
> >>>
> >>> Devlink from the beginning counts with network namespaces, but the
> >>> instances has been fixed to init_net. The first patch allows user
> >>> to move existing devlink instances into namespaces:
> >>>  
> >>
> >> so you intend for an asic, for example, to have multiple devlink
> >> instances where each instance governs a set of related ports (e.g.,
> >> ports that share a set of hardware resources) and those instances can be
> >> managed from distinct network namespaces?  
> > 
> > No, no multiple devlink instances for asic intended.  
> 
> So it should be allowed for an asic to have resources split across
> network namespaces. e.g., something like this:
> 
>    namespace 1 |  namespace 2  | ... | namespace N
>                |               |     |
>   { ports 1 }  |   { ports 2 } | ... | { ports N }
>                |               |     |
>    devlink 1   |    devlink 2  | ... |  devlink N
>   =================================================
>                    driver

Can you elaborate further? Ports for most purposes are represented by
netdevices. Devlink port instances expose global topological view of
the ports which is primarily relevant if you can see the entire ASIC.
I think the global configuration and global view of resources is still
the most relevant need, so in your diagram you must account for some
"all-seeing" instance, e.g.:

   namespace 1 |  namespace 2  | ... | namespace N
               |               |     |
  { ports 1 }  |   { ports 2 } | ... | { ports N }
               |               |     |
subdevlink 1   | subdevlink 2  | ... |  subdevlink N
         \______      |              _______/
                 master ASIC devlink
  =================================================
                   driver

No?

^ permalink raw reply

* Re: [patch net-next 0/3] net: devlink: Finish network namespace support
From: David Ahern @ 2019-07-31 22:07 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Jiri Pirko, netdev, davem, sthemmin, mlxsw
In-Reply-To: <20190731150233.432d3c86@cakuba.netronome.com>

On 7/31/19 4:02 PM, Jakub Kicinski wrote:
> On Wed, 31 Jul 2019 15:50:26 -0600, David Ahern wrote:
>> On 7/30/19 12:08 AM, Jiri Pirko wrote:
>>> Mon, Jul 29, 2019 at 10:17:25PM CEST, dsahern@gmail.com wrote:  
>>>> On 7/27/19 3:44 AM, Jiri Pirko wrote:  
>>>>> From: Jiri Pirko <jiri@mellanox.com>
>>>>>
>>>>> Devlink from the beginning counts with network namespaces, but the
>>>>> instances has been fixed to init_net. The first patch allows user
>>>>> to move existing devlink instances into namespaces:
>>>>>  
>>>>
>>>> so you intend for an asic, for example, to have multiple devlink
>>>> instances where each instance governs a set of related ports (e.g.,
>>>> ports that share a set of hardware resources) and those instances can be
>>>> managed from distinct network namespaces?  
>>>
>>> No, no multiple devlink instances for asic intended.  
>>
>> So it should be allowed for an asic to have resources split across
>> network namespaces. e.g., something like this:
>>
>>    namespace 1 |  namespace 2  | ... | namespace N
>>                |               |     |
>>   { ports 1 }  |   { ports 2 } | ... | { ports N }
>>                |               |     |
>>    devlink 1   |    devlink 2  | ... |  devlink N
>>   =================================================
>>                    driver
> 
> Can you elaborate further? Ports for most purposes are represented by
> netdevices. Devlink port instances expose global topological view of
> the ports which is primarily relevant if you can see the entire ASIC.
> I think the global configuration and global view of resources is still
> the most relevant need, so in your diagram you must account for some
> "all-seeing" instance, e.g.:
> 
>    namespace 1 |  namespace 2  | ... | namespace N
>                |               |     |
>   { ports 1 }  |   { ports 2 } | ... | { ports N }
>                |               |     |
> subdevlink 1   | subdevlink 2  | ... |  subdevlink N
>          \______      |              _______/
>                  master ASIC devlink
>   =================================================
>                    driver
> 
> No?
> 

sure, there could be a master devlink visible to the user if that makes
sense or the driver can account for it behind the scenes as the sum of
the devlink instances.

The goal is to allow ports within an asic [1] to be divided across
network namespace where each namespace sees a subset of the ports. This
allows creating multiple logical switches from a single physical asic.

[1] within constraints imposed by the driver/hardware - for example to
account for resources shared by a set of ports. e.g., front panel ports
1 - 4 have shared resources and must always be in the same devlink instance.

^ permalink raw reply

* [PATCH bpf v2] libbpf : make libbpf_num_possible_cpus function thread safe
From: Takshak Chahande @ 2019-07-31 22:10 UTC (permalink / raw)
  To: netdev; +Cc: ast, daniel, rdna, ctakshak, kernel-team, hechaol, jakub.kicinski

Having static variable `cpus` in libbpf_num_possible_cpus function
without guarding it with mutex makes this function thread-unsafe.

If multiple threads accessing this function, in the current form; it
leads to incrementing the static variable value `cpus` in the multiple
of total available CPUs.

Used local stack variable to calculate the number of possible CPUs and
then updated the static variable using WRITE_ONCE().

Changes since v1:
 * added stack variable to calculate cpus
 * serialized static variable update using WRITE_ONCE()
 * fixed Fixes tag

Fixes: 6446b3155521 ("bpf: add a new API libbpf_num_possible_cpus()")
Signed-off-by: Takshak Chahande <ctakshak@fb.com>
---
 tools/lib/bpf/libbpf.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 6718d0b90130..2e84fa5b8479 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -4995,13 +4995,15 @@ int libbpf_num_possible_cpus(void)
 	static const char *fcpu = "/sys/devices/system/cpu/possible";
 	int len = 0, n = 0, il = 0, ir = 0;
 	unsigned int start = 0, end = 0;
+	int tmp_cpus = 0;
 	static int cpus;
 	char buf[128];
 	int error = 0;
 	int fd = -1;
 
-	if (cpus > 0)
-		return cpus;
+	tmp_cpus = READ_ONCE(cpus);
+	if (tmp_cpus > 0)
+		return tmp_cpus;
 
 	fd = open(fcpu, O_RDONLY);
 	if (fd < 0) {
@@ -5024,7 +5026,7 @@ int libbpf_num_possible_cpus(void)
 	}
 	buf[len] = '\0';
 
-	for (ir = 0, cpus = 0; ir <= len; ir++) {
+	for (ir = 0, tmp_cpus = 0; ir <= len; ir++) {
 		/* Each sub string separated by ',' has format \d+-\d+ or \d+ */
 		if (buf[ir] == ',' || buf[ir] == '\0') {
 			buf[ir] = '\0';
@@ -5036,13 +5038,15 @@ int libbpf_num_possible_cpus(void)
 			} else if (n == 1) {
 				end = start;
 			}
-			cpus += end - start + 1;
+			tmp_cpus += end - start + 1;
 			il = ir + 1;
 		}
 	}
-	if (cpus <= 0) {
-		pr_warning("Invalid #CPUs %d from %s\n", cpus, fcpu);
+	if (tmp_cpus <= 0) {
+		pr_warning("Invalid #CPUs %d from %s\n", tmp_cpus, fcpu);
 		return -EINVAL;
 	}
-	return cpus;
+
+	WRITE_ONCE(cpus, tmp_cpus);
+	return tmp_cpus;
 }
-- 
2.17.1


^ permalink raw reply related

* [PATCH V37 23/29] bpf: Restrict bpf when kernel lockdown is in confidentiality mode
From: Matthew Garrett @ 2019-07-31 22:16 UTC (permalink / raw)
  To: jmorris
  Cc: linux-security-module, linux-kernel, linux-api, David Howells,
	Alexei Starovoitov, Matthew Garrett, Kees Cook, netdev,
	Chun-Yi Lee, Daniel Borkmann
In-Reply-To: <20190731221617.234725-1-matthewgarrett@google.com>

From: David Howells <dhowells@redhat.com>

bpf_read() and bpf_read_str() could potentially be abused to (eg) allow
private keys in kernel memory to be leaked. Disable them if the kernel
has been locked down in confidentiality mode.

Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: Matthew Garrett <mjg59@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
cc: netdev@vger.kernel.org
cc: Chun-Yi Lee <jlee@suse.com>
cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
---
 include/linux/security.h     |  1 +
 kernel/trace/bpf_trace.c     | 10 ++++++++++
 security/lockdown/lockdown.c |  1 +
 3 files changed, 12 insertions(+)

diff --git a/include/linux/security.h b/include/linux/security.h
index 987d8427f091..8dd1741a52cd 100644
--- a/include/linux/security.h
+++ b/include/linux/security.h
@@ -118,6 +118,7 @@ enum lockdown_reason {
 	LOCKDOWN_INTEGRITY_MAX,
 	LOCKDOWN_KCORE,
 	LOCKDOWN_KPROBES,
+	LOCKDOWN_BPF_READ,
 	LOCKDOWN_CONFIDENTIALITY_MAX,
 };
 
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index ca1255d14576..492a8bfaae98 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -142,8 +142,13 @@ BPF_CALL_3(bpf_probe_read, void *, dst, u32, size, const void *, unsafe_ptr)
 {
 	int ret;
 
+	ret = security_locked_down(LOCKDOWN_BPF_READ);
+	if (ret < 0)
+		goto out;
+
 	ret = probe_kernel_read(dst, unsafe_ptr, size);
 	if (unlikely(ret < 0))
+out:
 		memset(dst, 0, size);
 
 	return ret;
@@ -569,6 +574,10 @@ BPF_CALL_3(bpf_probe_read_str, void *, dst, u32, size,
 {
 	int ret;
 
+	ret = security_locked_down(LOCKDOWN_BPF_READ);
+	if (ret < 0)
+		goto out;
+
 	/*
 	 * The strncpy_from_unsafe() call will likely not fill the entire
 	 * buffer, but that's okay in this circumstance as we're probing
@@ -580,6 +589,7 @@ BPF_CALL_3(bpf_probe_read_str, void *, dst, u32, size,
 	 */
 	ret = strncpy_from_unsafe(dst, unsafe_ptr, size);
 	if (unlikely(ret < 0))
+out:
 		memset(dst, 0, size);
 
 	return ret;
diff --git a/security/lockdown/lockdown.c b/security/lockdown/lockdown.c
index 6b123cbf3748..1b89d3e8e54d 100644
--- a/security/lockdown/lockdown.c
+++ b/security/lockdown/lockdown.c
@@ -33,6 +33,7 @@ static char *lockdown_reasons[LOCKDOWN_CONFIDENTIALITY_MAX+1] = {
 	[LOCKDOWN_INTEGRITY_MAX] = "integrity",
 	[LOCKDOWN_KCORE] = "/proc/kcore access",
 	[LOCKDOWN_KPROBES] = "use of kprobes",
+	[LOCKDOWN_BPF_READ] = "use of bpf to read kernel RAM",
 	[LOCKDOWN_CONFIDENTIALITY_MAX] = "confidentiality",
 };
 
-- 
2.22.0.770.g0f2c4a37fd-goog


^ permalink raw reply related

* Re: [Potential Spoof] [PATCH bpf v2] libbpf : make libbpf_num_possible_cpus function thread safe
From: Andrey Ignatov @ 2019-07-31 22:17 UTC (permalink / raw)
  To: Takshak Chahande
  Cc: netdev@vger.kernel.org, ast@kernel.org, daniel@iogearbox.net,
	Kernel Team, Hechao Li, jakub.kicinski@netronome.com
In-Reply-To: <20190731221055.1478201-1-ctakshak@fb.com>

Takshak Chahande <ctakshak@fb.com> [Wed, 2019-07-31 15:11 -0700]:
> Having static variable `cpus` in libbpf_num_possible_cpus function
> without guarding it with mutex makes this function thread-unsafe.
> 
> If multiple threads accessing this function, in the current form; it
> leads to incrementing the static variable value `cpus` in the multiple
> of total available CPUs.
> 
> Used local stack variable to calculate the number of possible CPUs and
> then updated the static variable using WRITE_ONCE().
> 
> Changes since v1:
>  * added stack variable to calculate cpus
>  * serialized static variable update using WRITE_ONCE()
>  * fixed Fixes tag

This "Changes" section should be after "---" line not to be included in
the final commit message.

Not sure if resubmit is needed because of it, but other than this looks
good to me.

Acked-by: Andrey Ignatov <rdna@fb.com>


> Fixes: 6446b3155521 ("bpf: add a new API libbpf_num_possible_cpus()")
> Signed-off-by: Takshak Chahande <ctakshak@fb.com>
> ---
>  tools/lib/bpf/libbpf.c | 18 +++++++++++-------
>  1 file changed, 11 insertions(+), 7 deletions(-)
> 
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index 6718d0b90130..2e84fa5b8479 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -4995,13 +4995,15 @@ int libbpf_num_possible_cpus(void)
>  	static const char *fcpu = "/sys/devices/system/cpu/possible";
>  	int len = 0, n = 0, il = 0, ir = 0;
>  	unsigned int start = 0, end = 0;
> +	int tmp_cpus = 0;
>  	static int cpus;
>  	char buf[128];
>  	int error = 0;
>  	int fd = -1;
>  
> -	if (cpus > 0)
> -		return cpus;
> +	tmp_cpus = READ_ONCE(cpus);
> +	if (tmp_cpus > 0)
> +		return tmp_cpus;
>  
>  	fd = open(fcpu, O_RDONLY);
>  	if (fd < 0) {
> @@ -5024,7 +5026,7 @@ int libbpf_num_possible_cpus(void)
>  	}
>  	buf[len] = '\0';
>  
> -	for (ir = 0, cpus = 0; ir <= len; ir++) {
> +	for (ir = 0, tmp_cpus = 0; ir <= len; ir++) {
>  		/* Each sub string separated by ',' has format \d+-\d+ or \d+ */
>  		if (buf[ir] == ',' || buf[ir] == '\0') {
>  			buf[ir] = '\0';
> @@ -5036,13 +5038,15 @@ int libbpf_num_possible_cpus(void)
>  			} else if (n == 1) {
>  				end = start;
>  			}
> -			cpus += end - start + 1;
> +			tmp_cpus += end - start + 1;
>  			il = ir + 1;
>  		}
>  	}
> -	if (cpus <= 0) {
> -		pr_warning("Invalid #CPUs %d from %s\n", cpus, fcpu);
> +	if (tmp_cpus <= 0) {
> +		pr_warning("Invalid #CPUs %d from %s\n", tmp_cpus, fcpu);
>  		return -EINVAL;
>  	}
> -	return cpus;
> +
> +	WRITE_ONCE(cpus, tmp_cpus);
> +	return tmp_cpus;
>  }
> -- 
> 2.17.1
> 

-- 
Andrey Ignatov

^ permalink raw reply

* Re: [PATCH] net: sctp: Rename fallthrough label to unhandled
From: Joe Perches @ 2019-07-31 22:23 UTC (permalink / raw)
  To: Neil Horman
  Cc: Vlad Yasevich, Marcelo Ricardo Leitner, David S. Miller,
	linux-sctp, netdev, linux-kernel
In-Reply-To: <20190731205804.GE9823@hmswarspite.think-freely.org>

On Wed, 2019-07-31 at 16:58 -0400, Neil Horman wrote:
> On Wed, Jul 31, 2019 at 09:35:31AM -0700, Joe Perches wrote:
> > On Wed, 2019-07-31 at 08:16 -0400, Neil Horman wrote:
> > > On Wed, Jul 31, 2019 at 04:32:43AM -0700, Joe Perches wrote:
> > > > On Wed, 2019-07-31 at 07:19 -0400, Neil Horman wrote:
> > > > > On Tue, Jul 30, 2019 at 10:04:37PM -0700, Joe Perches wrote:
> > > > > > fallthrough may become a pseudo reserved keyword so this only use of
> > > > > > fallthrough is better renamed to allow it.
> > > > > > 
> > > > > > Signed-off-by: Joe Perches <joe@perches.com>
> > > > > Are you referring to the __attribute__((fallthrough)) statement that gcc
> > > > > supports?  If so the compiler should by all rights be able to differentiate
> > > > > between a null statement attribute and a explicit goto and label without the
> > > > > need for renaming here.  Or are you referring to something else?
> > > > 
> > > > Hi.
> > > > 
> > > > I sent after this a patch that adds
> > > > 
> > > > # define fallthrough                    __attribute__((__fallthrough__))
> > > > 
> > > > https://lore.kernel.org/patchwork/patch/1108577/
> > > > 
> > > > So this rename is a prerequisite to adding this #define.
> > > > 
> > > why not just define __fallthrough instead, like we do for all the other
> > > attributes we alias (i.e. __read_mostly, __protected_by, __unused, __exception,
> > > etc)
> > 
> > Because it's not as intelligible when used as a statement.
> I think thats somewhat debatable.  __fallthrough to me looks like an internal
> macro, whereas fallthrough looks like a comment someone forgot to /* */


I'd rather see:

	switch (foo) {
	case FOO:
		bar |= baz;
		fallthrough;
	case BAR:
		bar |= qux;
		break;
	default:
		error();
	}

than

	switch (foo) {
	case FOO:
		bar |= baz;
		__fallthrough;
	case BAR:
		bar |= qux;
		break;
	default:
		error();
	}

or esoecially

	switch (foo) {
	case FOO:
		bar |= baz;
		/* fallthrough
*/;
	case BAR:
		bar |= qux;
		break;
	default:
		error();
	}

but <shrug>, bikeshed ahoy!...



^ permalink raw reply

* Re: [patch net-next 0/3] net: devlink: Finish network namespace support
From: Jakub Kicinski @ 2019-07-31 22:28 UTC (permalink / raw)
  To: David Ahern; +Cc: Jiri Pirko, netdev, davem, sthemmin, mlxsw
In-Reply-To: <45803ed3-0328-9409-4351-6c26ba8af3cd@gmail.com>

On Wed, 31 Jul 2019 16:07:31 -0600, David Ahern wrote:
> On 7/31/19 4:02 PM, Jakub Kicinski wrote:
> > Can you elaborate further? Ports for most purposes are represented by
> > netdevices. Devlink port instances expose global topological view of
> > the ports which is primarily relevant if you can see the entire ASIC.
> > I think the global configuration and global view of resources is still
> > the most relevant need, so in your diagram you must account for some
> > "all-seeing" instance, e.g.:
> > 
> >    namespace 1 |  namespace 2  | ... | namespace N
> >                |               |     |
> >   { ports 1 }  |   { ports 2 } | ... | { ports N }
> >                |               |     |
> > subdevlink 1   | subdevlink 2  | ... |  subdevlink N
> >          \______      |              _______/
> >                  master ASIC devlink
> >   =================================================
> >                    driver
> > 
> > No?
> 
> sure, there could be a master devlink visible to the user if that makes
> sense or the driver can account for it behind the scenes as the sum of
> the devlink instances.
> 
> The goal is to allow ports within an asic [1] to be divided across
> network namespace where each namespace sees a subset of the ports. This
> allows creating multiple logical switches from a single physical asic.
> 
> [1] within constraints imposed by the driver/hardware - for example to
> account for resources shared by a set of ports. e.g., front panel ports
> 1 - 4 have shared resources and must always be in the same devlink instance.

So the ASIC would start out all partitioned? Presumably some would
still like to use it non-partitioned? What follows there must be a top
level instance to decide on partitioning, and moving resources between
sub-instances.

Right now I don't think there is much info in devlink ports which would
be relevant without full view of the ASIC..

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox