Netdev List

Netdev List
 help / color / mirror / Atom feed

* [net 05/13] ice: Clean control queues only when they are initialized
From: Jeff Kirsher @ 2018-08-23 19:14 UTC (permalink / raw)
  To: davem
  Cc: Preethi Banala, netdev, nhorman, sassmann, jogreene,
	Anirudh Venkataramanan, Jeff Kirsher
In-Reply-To: <20180823191503.15804-1-jeffrey.t.kirsher@intel.com>

From: Preethi Banala <preethi.banala@intel.com>

Clean control queues only when they are initialized. One of the ways to
validate if the basic initialization is done is by checking value of
cq->sq.head and cq->rq.head variables that specify the register address.
This patch adds a check to avoid NULL pointer dereference crash when tried
to shutdown uninitialized control queue.

Signed-off-by: Preethi Banala <preethi.banala@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_controlq.c | 24 ++++++++++++-------
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_controlq.c b/drivers/net/ethernet/intel/ice/ice_controlq.c
index 7c511f144ed6..c064416080e7 100644
--- a/drivers/net/ethernet/intel/ice/ice_controlq.c
+++ b/drivers/net/ethernet/intel/ice/ice_controlq.c
@@ -597,10 +597,14 @@ static enum ice_status ice_init_check_adminq(struct ice_hw *hw)
 	return 0;
 
 init_ctrlq_free_rq:
-	ice_shutdown_rq(hw, cq);
-	ice_shutdown_sq(hw, cq);
-	mutex_destroy(&cq->sq_lock);
-	mutex_destroy(&cq->rq_lock);
+	if (cq->rq.head) {
+		ice_shutdown_rq(hw, cq);
+		mutex_destroy(&cq->rq_lock);
+	}
+	if (cq->sq.head) {
+		ice_shutdown_sq(hw, cq);
+		mutex_destroy(&cq->sq_lock);
+	}
 	return status;
 }
 
@@ -706,10 +710,14 @@ static void ice_shutdown_ctrlq(struct ice_hw *hw, enum ice_ctl_q q_type)
 		return;
 	}
 
-	ice_shutdown_sq(hw, cq);
-	ice_shutdown_rq(hw, cq);
-	mutex_destroy(&cq->sq_lock);
-	mutex_destroy(&cq->rq_lock);
+	if (cq->sq.head) {
+		ice_shutdown_sq(hw, cq);
+		mutex_destroy(&cq->sq_lock);
+	}
+	if (cq->rq.head) {
+		ice_shutdown_rq(hw, cq);
+		mutex_destroy(&cq->rq_lock);
+	}
 }
 
 /**
-- 
2.17.1

^ permalink raw reply related

* [net 06/13] ice: Fix bugs in control queue processing
From: Jeff Kirsher @ 2018-08-23 19:14 UTC (permalink / raw)
  To: davem
  Cc: Anirudh Venkataramanan, netdev, nhorman, sassmann, jogreene,
	Jeff Kirsher
In-Reply-To: <20180823191503.15804-1-jeffrey.t.kirsher@intel.com>

From: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>

This patch is a consolidation of multiple bug fixes for control queue
processing.

1)  In ice_clean_adminq_subtask() remove unnecessary reads/writes to
    registers. The bits PFINT_FW_CTL, PFINT_MBX_CTL and PFINT_SB_CTL
    are not set when an interrupt arrives, which means that clearing them
    again can be omitted.

2)  Get an accurate value in "pending" by re-reading the control queue
    head register from the hardware.

3)  Fix a corner case involving lost control queue messages by checking
    for new control messages (using ice_ctrlq_pending) before exiting the
    cleanup routine.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_controlq.c |  5 +++-
 drivers/net/ethernet/intel/ice/ice_main.c     | 26 ++++++++++++++++---
 2 files changed, 26 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_controlq.c b/drivers/net/ethernet/intel/ice/ice_controlq.c
index c064416080e7..62be72fdc8f3 100644
--- a/drivers/net/ethernet/intel/ice/ice_controlq.c
+++ b/drivers/net/ethernet/intel/ice/ice_controlq.c
@@ -1065,8 +1065,11 @@ ice_clean_rq_elem(struct ice_hw *hw, struct ice_ctl_q_info *cq,
 
 clean_rq_elem_out:
 	/* Set pending if needed, unlock and return */
-	if (pending)
+	if (pending) {
+		/* re-read HW head to calculate actual pending messages */
+		ntu = (u16)(rd32(hw, cq->rq.head) & cq->rq.head_mask);
 		*pending = (u16)((ntc > ntu ? cq->rq.count : 0) + (ntu - ntc));
+	}
 clean_rq_elem_err:
 	mutex_unlock(&cq->rq_lock);
 
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 7d65e0ed3588..f3ba4f76b6cb 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -916,6 +916,21 @@ static int __ice_clean_ctrlq(struct ice_pf *pf, enum ice_ctl_q q_type)
 	return pending && (i == ICE_DFLT_IRQ_WORK);
 }
 
+/**
+ * ice_ctrlq_pending - check if there is a difference between ntc and ntu
+ * @hw: pointer to hardware info
+ * @cq: control queue information
+ *
+ * returns true if there are pending messages in a queue, false if there aren't
+ */
+static bool ice_ctrlq_pending(struct ice_hw *hw, struct ice_ctl_q_info *cq)
+{
+	u16 ntu;
+
+	ntu = (u16)(rd32(hw, cq->rq.head) & cq->rq.head_mask);
+	return cq->rq.next_to_clean != ntu;
+}
+
 /**
  * ice_clean_adminq_subtask - clean the AdminQ rings
  * @pf: board private structure
@@ -923,7 +938,6 @@ static int __ice_clean_ctrlq(struct ice_pf *pf, enum ice_ctl_q q_type)
 static void ice_clean_adminq_subtask(struct ice_pf *pf)
 {
 	struct ice_hw *hw = &pf->hw;
-	u32 val;
 
 	if (!test_bit(__ICE_ADMINQ_EVENT_PENDING, pf->state))
 		return;
@@ -933,9 +947,13 @@ static void ice_clean_adminq_subtask(struct ice_pf *pf)
 
 	clear_bit(__ICE_ADMINQ_EVENT_PENDING, pf->state);
 
-	/* re-enable Admin queue interrupt causes */
-	val = rd32(hw, PFINT_FW_CTL);
-	wr32(hw, PFINT_FW_CTL, (val | PFINT_FW_CTL_CAUSE_ENA_M));
+	/* There might be a situation where new messages arrive to a control
+	 * queue between processing the last message and clearing the
+	 * EVENT_PENDING bit. So before exiting, check queue head again (using
+	 * ice_ctrlq_pending) and process new messages if any.
+	 */
+	if (ice_ctrlq_pending(hw, &hw->adminq))
+		__ice_clean_ctrlq(pf, ICE_CTL_Q_ADMIN);
 
 	ice_flush(hw);
 }
-- 
2.17.1

^ permalink raw reply related

* [net 09/13] ice: Update to interrupts enabled in OICR
From: Jeff Kirsher @ 2018-08-23 19:14 UTC (permalink / raw)
  To: davem
  Cc: Bruce Allan, netdev, nhorman, sassmann, jogreene,
	Anirudh Venkataramanan, Jeff Kirsher
In-Reply-To: <20180823191503.15804-1-jeffrey.t.kirsher@intel.com>

From: Bruce Allan <bruce.w.allan@intel.com>

Remove the following interrupt causes that are not applicable or not
handled:
- PFINT_OICR_HLP_RDY_M
- PFINT_OICR_CPM_RDY_M
- PFINT_OICR_GPIO_M
- PFINT_OICR_STORM_DETECT_M

Add the following interrupt cause that's actually handled in ice_misc_intr:
- PFINT_OICR_PE_CRITERR_M

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_hw_autogen.h | 8 --------
 drivers/net/ethernet/intel/ice/ice_main.c       | 9 +++------
 2 files changed, 3 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_hw_autogen.h b/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
index 499904874b3f..6076fc87df9d 100644
--- a/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
+++ b/drivers/net/ethernet/intel/ice/ice_hw_autogen.h
@@ -121,10 +121,6 @@
 #define PFINT_FW_CTL_CAUSE_ENA_S	30
 #define PFINT_FW_CTL_CAUSE_ENA_M	BIT(PFINT_FW_CTL_CAUSE_ENA_S)
 #define PFINT_OICR			0x0016CA00
-#define PFINT_OICR_HLP_RDY_S		14
-#define PFINT_OICR_HLP_RDY_M		BIT(PFINT_OICR_HLP_RDY_S)
-#define PFINT_OICR_CPM_RDY_S		15
-#define PFINT_OICR_CPM_RDY_M		BIT(PFINT_OICR_CPM_RDY_S)
 #define PFINT_OICR_ECC_ERR_S		16
 #define PFINT_OICR_ECC_ERR_M		BIT(PFINT_OICR_ECC_ERR_S)
 #define PFINT_OICR_MAL_DETECT_S		19
@@ -133,10 +129,6 @@
 #define PFINT_OICR_GRST_M		BIT(PFINT_OICR_GRST_S)
 #define PFINT_OICR_PCI_EXCEPTION_S	21
 #define PFINT_OICR_PCI_EXCEPTION_M	BIT(PFINT_OICR_PCI_EXCEPTION_S)
-#define PFINT_OICR_GPIO_S		22
-#define PFINT_OICR_GPIO_M		BIT(PFINT_OICR_GPIO_S)
-#define PFINT_OICR_STORM_DETECT_S	24
-#define PFINT_OICR_STORM_DETECT_M	BIT(PFINT_OICR_STORM_DETECT_S)
 #define PFINT_OICR_HMC_ERR_S		26
 #define PFINT_OICR_HMC_ERR_M		BIT(PFINT_OICR_HMC_ERR_S)
 #define PFINT_OICR_PE_CRITERR_S		28
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 68003fad33d1..34be94a30a60 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -1704,15 +1704,12 @@ static void ice_ena_misc_vector(struct ice_pf *pf)
 	wr32(hw, PFINT_OICR_ENA, 0);	/* disable all */
 	rd32(hw, PFINT_OICR);		/* read to clear */
 
-	val = (PFINT_OICR_HLP_RDY_M |
-	       PFINT_OICR_CPM_RDY_M |
-	       PFINT_OICR_ECC_ERR_M |
+	val = (PFINT_OICR_ECC_ERR_M |
 	       PFINT_OICR_MAL_DETECT_M |
 	       PFINT_OICR_GRST_M |
 	       PFINT_OICR_PCI_EXCEPTION_M |
-	       PFINT_OICR_GPIO_M |
-	       PFINT_OICR_STORM_DETECT_M |
-	       PFINT_OICR_HMC_ERR_M);
+	       PFINT_OICR_HMC_ERR_M |
+	       PFINT_OICR_PE_CRITERR_M);
 
 	wr32(hw, PFINT_OICR_ENA, val);
 
-- 
2.17.1

^ permalink raw reply related

* [net 07/13] ice: Use order_base_2 to calculate higher power of 2
From: Jeff Kirsher @ 2018-08-23 19:14 UTC (permalink / raw)
  To: davem
  Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene,
	Anirudh Venkataramanan, Jeff Kirsher
In-Reply-To: <20180823191503.15804-1-jeffrey.t.kirsher@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>

Currently, we use a combination of ilog2 and is_power_of_2() to
calculate the next power of 2 for the qcount. This appears to be causing
a warning on some combinations of GCC and the Linux kernel:

MODPOST 1 modules
WARNING: "____ilog2_NaN" [ice.ko] undefined!

This appears to because because GCC realizes that qcount could be zero
in some circumstances and thus attempts to link against the
intentionally undefined ___ilog2_NaN function.

The order_base_2 function is intentionally defined to return 0 when
passed 0 as an argument, and thus will be safe to use here.

This not only fixes the warning but makes the resulting code slightly
cleaner, and is really what we should have used originally.

Also update the comment to make it more clear that we are rounding up,
not just incrementing the ilog2 of qcount unconditionally.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_main.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index f3ba4f76b6cb..3eff1d2d1543 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -1313,11 +1313,8 @@ static void ice_vsi_setup_q_map(struct ice_vsi *vsi, struct ice_vsi_ctx *ctxt)
 		qcount = numq_tc;
 	}

-	/* find higher power-of-2 of qcount */
-	pow = ilog2(qcount);
-
-	if (!is_power_of_2(qcount))
-		pow++;
+	/* find the (rounded up) power-of-2 of qcount */
+	pow = order_base_2(qcount);

 	for (i = 0; i < ICE_MAX_TRAFFIC_CLASS; i++) {
 		if (!(vsi->tc_cfg.ena_tc & BIT(i))) {
-- 
2.17.1

^ permalink raw reply related

* [net 04/13] ice: Report stats for allocated queues via ethtool stats
From: Jeff Kirsher @ 2018-08-23 19:14 UTC (permalink / raw)
  To: davem
  Cc: Jacob Keller, netdev, nhorman, sassmann, jogreene,
	Anirudh Venkataramanan, Jeff Kirsher
In-Reply-To: <20180823191503.15804-1-jeffrey.t.kirsher@intel.com>

From: Jacob Keller <jacob.e.keller@intel.com>

It is not safe to have the string table for statistics change order or
size over the lifetime of a given netdevice. This is because of the
nature of the 3-step process for obtaining stats. First, user space
performs a request for the size of the strings table. Second it performs
a separate request for the strings themselves, after allocating space
for the table. Third, it requests the stats themselves, also allocating
space for the table.

If the size decreased, there is potential to see garbage data or stats
values. In the worst case, we could potentially see stats values become
mis-aligned with their strings, so that it looks like a statistic is
being reported differently than it actually is.

Even worse, if the size increased, there is potential that the strings
table or stats table was not allocated large enough and the stats code
could access and write to memory it should not, potentially resulting in
undefined behavior and system crashes.

It isn't even safe if the size always changes under the RTNL lock. This
is because the calls take place over multiple user space commands, so it
is not possible to hold the RTNL lock for the entire duration of
obtaining strings and stats. Further, not all consumers of the ethtool
API are the user space ethtool program, and it is possible that one
assumes the strings will not change (valid under the current contract),
and thus only requests the stats values when requesting stats in a loop.

Finally, it's not possible in the general case to detect when the size
changes, because it is quite possible that one value which could impact
the stat size increased, while another decreased. This would result in
the same total number of stats, but reordering them so that stats no
longer line up with the strings they belong to. Since only size changes
aren't enough, we would need some sort of hash or token to determine
when the strings no longer match. This would require extending the
ethtool stats commands, but there is no more space in the relevant
structures.

The real solution to resolve this would be to add a completely new API
for stats, probably over netlink.

In the ice driver, the only thing impacting the stats that is not
constant is the number of queues. Instead of reporting stats for each
used queue, report stats for each allocated queue. We do not change the
number of queues allocated for a given netdevice, as we pass this into
the alloc_etherdev_mq() function to set the num_tx_queues and
num_rx_queues.

This resolves the potential bugs at the slight cost of displaying many
queue statistics which will not be activated.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ice/ice.h         |  7 +++
 drivers/net/ethernet/intel/ice/ice_ethtool.c | 52 +++++++++++++++-----
 2 files changed, 46 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index d8b5fff581e7..ed071ea75f20 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -89,6 +89,13 @@ extern const char ice_drv_ver[];
 #define ice_for_each_rxq(vsi, i) \
 	for ((i) = 0; (i) < (vsi)->num_rxq; (i)++)

+/* Macros for each allocated tx/rx ring whether used or not in a VSI */
+#define ice_for_each_alloc_txq(vsi, i) \
+	for ((i) = 0; (i) < (vsi)->alloc_txq; (i)++)
+
+#define ice_for_each_alloc_rxq(vsi, i) \
+	for ((i) = 0; (i) < (vsi)->alloc_rxq; (i)++)
+
 struct ice_tc_info {
 	u16 qoffset;
 	u16 qcount;
diff --git a/drivers/net/ethernet/intel/ice/ice_ethtool.c b/drivers/net/ethernet/intel/ice/ice_ethtool.c
index 1db304c01d10..c71a9b528d6d 100644
--- a/drivers/net/ethernet/intel/ice/ice_ethtool.c
+++ b/drivers/net/ethernet/intel/ice/ice_ethtool.c
@@ -26,7 +26,7 @@ static int ice_q_stats_len(struct net_device *netdev)
 {
 	struct ice_netdev_priv *np = netdev_priv(netdev);

-	return ((np->vsi->num_txq + np->vsi->num_rxq) *
+	return ((np->vsi->alloc_txq + np->vsi->alloc_rxq) *
 		(sizeof(struct ice_q_stats) / sizeof(u64)));
 }

@@ -218,7 +218,7 @@ static void ice_get_strings(struct net_device *netdev, u32 stringset, u8 *data)
 			p += ETH_GSTRING_LEN;
 		}

-		ice_for_each_txq(vsi, i) {
+		ice_for_each_alloc_txq(vsi, i) {
 			snprintf(p, ETH_GSTRING_LEN,
 				 "tx-queue-%u.tx_packets", i);
 			p += ETH_GSTRING_LEN;
@@ -226,7 +226,7 @@ static void ice_get_strings(struct net_device *netdev, u32 stringset, u8 *data)
 			p += ETH_GSTRING_LEN;
 		}

-		ice_for_each_rxq(vsi, i) {
+		ice_for_each_alloc_rxq(vsi, i) {
 			snprintf(p, ETH_GSTRING_LEN,
 				 "rx-queue-%u.rx_packets", i);
 			p += ETH_GSTRING_LEN;
@@ -253,6 +253,24 @@ static int ice_get_sset_count(struct net_device *netdev, int sset)
 {
 	switch (sset) {
 	case ETH_SS_STATS:
+		/* The number (and order) of strings reported *must* remain
+		 * constant for a given netdevice. This function must not
+		 * report a different number based on run time parameters
+		 * (such as the number of queues in use, or the setting of
+		 * a private ethtool flag). This is due to the nature of the
+		 * ethtool stats API.
+		 *
+		 * User space programs such as ethtool must make 3 separate
+		 * ioctl requests, one for size, one for the strings, and
+		 * finally one for the stats. Since these cross into
+		 * user space, changes to the number or size could result in
+		 * undefined memory access or incorrect string<->value
+		 * correlations for statistics.
+		 *
+		 * Even if it appears to be safe, changes to the size or
+		 * order of strings will suffer from race conditions and are
+		 * not safe.
+		 */
 		return ICE_ALL_STATS_LEN(netdev);
 	default:
 		return -EOPNOTSUPP;
@@ -280,18 +298,26 @@ ice_get_ethtool_stats(struct net_device *netdev,
 	/* populate per queue stats */
 	rcu_read_lock();

-	ice_for_each_txq(vsi, j) {
+	ice_for_each_alloc_txq(vsi, j) {
 		ring = READ_ONCE(vsi->tx_rings[j]);
-		if (!ring)
-			continue;
-		data[i++] = ring->stats.pkts;
-		data[i++] = ring->stats.bytes;
+		if (ring) {
+			data[i++] = ring->stats.pkts;
+			data[i++] = ring->stats.bytes;
+		} else {
+			data[i++] = 0;
+			data[i++] = 0;
+		}
 	}

-	ice_for_each_rxq(vsi, j) {
+	ice_for_each_alloc_rxq(vsi, j) {
 		ring = READ_ONCE(vsi->rx_rings[j]);
-		data[i++] = ring->stats.pkts;
-		data[i++] = ring->stats.bytes;
+		if (ring) {
+			data[i++] = ring->stats.pkts;
+			data[i++] = ring->stats.bytes;
+		} else {
+			data[i++] = 0;
+			data[i++] = 0;
+		}
 	}

 	rcu_read_unlock();
@@ -519,7 +545,7 @@ ice_set_ringparam(struct net_device *netdev, struct ethtool_ringparam *ring)
 		goto done;
 	}

-	for (i = 0; i < vsi->num_txq; i++) {
+	for (i = 0; i < vsi->alloc_txq; i++) {
 		/* clone ring and setup updated count */
 		tx_rings[i] = *vsi->tx_rings[i];
 		tx_rings[i].count = new_tx_cnt;
@@ -551,7 +577,7 @@ ice_set_ringparam(struct net_device *netdev, struct ethtool_ringparam *ring)
 		goto done;
 	}

-	for (i = 0; i < vsi->num_rxq; i++) {
+	for (i = 0; i < vsi->alloc_rxq; i++) {
 		/* clone ring and setup updated count */
 		rx_rings[i] = *vsi->rx_rings[i];
 		rx_rings[i].count = new_rx_cnt;
-- 
2.17.1

^ permalink raw reply related

* [net 02/13] ice: Remove unnecessary node owner check
From: Jeff Kirsher @ 2018-08-23 19:14 UTC (permalink / raw)
  To: davem
  Cc: Bruce Allan, netdev, nhorman, sassmann, jogreene,
	Anirudh Venkataramanan, Jeff Kirsher
In-Reply-To: <20180823191503.15804-1-jeffrey.t.kirsher@intel.com>

From: Bruce Allan <bruce.w.allan@intel.com>

There is already a check for owner == ICE_SCHED_NODE_OWNER_LAN at the
beginning of ice_sched_update_vsi_child_nodes. Remove the additional
check to address the static analysis tool smatch issue "warn: we tested
'owner' before and it was 'false'".

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_sched.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_sched.c b/drivers/net/ethernet/intel/ice/ice_sched.c
index 2e6c1d92cc88..eeae199469b6 100644
--- a/drivers/net/ethernet/intel/ice/ice_sched.c
+++ b/drivers/net/ethernet/intel/ice/ice_sched.c
@@ -1576,8 +1576,7 @@ ice_sched_update_vsi_child_nodes(struct ice_port_info *pi, u16 vsi_id, u8 tc,
 			return status;
 	}
 
-	if (owner == ICE_SCHED_NODE_OWNER_LAN)
-		vsi->max_lanq[tc] = new_numqs;
+	vsi->max_lanq[tc] = new_numqs;
 
 	return status;
 }
-- 
2.17.1

^ permalink raw reply related

* [net 01/13] ice: Fix multiple static analyser warnings
From: Jeff Kirsher @ 2018-08-23 19:14 UTC (permalink / raw)
  To: davem
  Cc: Anirudh Venkataramanan, netdev, nhorman, sassmann, jogreene,
	Jeff Kirsher
In-Reply-To: <20180823191503.15804-1-jeffrey.t.kirsher@intel.com>

From: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>

This patch fixes the following smatch errors:

1) Fix "odd binop '0x0 & 0xc'" when performing the bitwise-and with a
   constant value of zero (ICE_AQC_GSET_RSS_LUT_TABLE_SIZE_128_FLAG).
   Remove a similar bitwise-and with 0 in ice_add_marker_act() and use the
   right mask ICE_LG_ACT_GENERIC_OFFSET_M in the expression.

2) Fix a similar issue "odd binop '0x0 & 0x1800' in ice_req_irq_msix_misc.

3) Fix "odd binop '0x380000 & 0x7fff8'" in ice_add_marker_act(). Also, use
   a new define ICE_LG_ACT_GENERIC_OFF_RX_DESC_PROF_IDX instead of magic
   number '7'.

4) Fix warn: odd binop '0x0 & 0x18' in ice_set_dflt_vsi_ctx() by removing
   unnecessary logic to explicitly unset bits 3 and 4 in port_vlan_bits.
   These bits are unset already by the memset on ctxt->info.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 .../net/ethernet/intel/ice/ice_adminq_cmd.h   |  1 +
 drivers/net/ethernet/intel/ice/ice_common.c   | 25 +++++++++++--------
 drivers/net/ethernet/intel/ice/ice_main.c     | 19 ++++++--------
 drivers/net/ethernet/intel/ice/ice_switch.c   |  4 +--
 4 files changed, 25 insertions(+), 24 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
index 7541ec2270b3..6d3e11659ba5 100644
--- a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
@@ -594,6 +594,7 @@ struct ice_sw_rule_lg_act {
 #define ICE_LG_ACT_GENERIC_OFFSET_M	(0x7 << ICE_LG_ACT_GENERIC_OFFSET_S)
 #define ICE_LG_ACT_GENERIC_PRIORITY_S	22
 #define ICE_LG_ACT_GENERIC_PRIORITY_M	(0x7 << ICE_LG_ACT_GENERIC_PRIORITY_S)
+#define ICE_LG_ACT_GENERIC_OFF_RX_DESC_PROF_IDX	7
 
 	/* Action = 7 - Set Stat count */
 #define ICE_LG_ACT_STAT_COUNT		0x7
diff --git a/drivers/net/ethernet/intel/ice/ice_common.c b/drivers/net/ethernet/intel/ice/ice_common.c
index 71d032cc5fa7..d5300b606d5a 100644
--- a/drivers/net/ethernet/intel/ice/ice_common.c
+++ b/drivers/net/ethernet/intel/ice/ice_common.c
@@ -1619,20 +1619,23 @@ __ice_aq_get_set_rss_lut(struct ice_hw *hw, u16 vsi_id, u8 lut_type, u8 *lut,
 	}
 
 	/* LUT size is only valid for Global and PF table types */
-	if (lut_size == ICE_AQC_GSET_RSS_LUT_TABLE_SIZE_128) {
-		flags |= (ICE_AQC_GSET_RSS_LUT_TABLE_SIZE_128_FLAG <<
-			  ICE_AQC_GSET_RSS_LUT_TABLE_SIZE_S) &
-			 ICE_AQC_GSET_RSS_LUT_TABLE_SIZE_M;
-	} else if (lut_size == ICE_AQC_GSET_RSS_LUT_TABLE_SIZE_512) {
+	switch (lut_size) {
+	case ICE_AQC_GSET_RSS_LUT_TABLE_SIZE_128:
+		break;
+	case ICE_AQC_GSET_RSS_LUT_TABLE_SIZE_512:
 		flags |= (ICE_AQC_GSET_RSS_LUT_TABLE_SIZE_512_FLAG <<
 			  ICE_AQC_GSET_RSS_LUT_TABLE_SIZE_S) &
 			 ICE_AQC_GSET_RSS_LUT_TABLE_SIZE_M;
-	} else if ((lut_size == ICE_AQC_GSET_RSS_LUT_TABLE_SIZE_2K) &&
-		   (lut_type == ICE_AQC_GSET_RSS_LUT_TABLE_TYPE_PF)) {
-		flags |= (ICE_AQC_GSET_RSS_LUT_TABLE_SIZE_2K_FLAG <<
-			  ICE_AQC_GSET_RSS_LUT_TABLE_SIZE_S) &
-			 ICE_AQC_GSET_RSS_LUT_TABLE_SIZE_M;
-	} else {
+		break;
+	case ICE_AQC_GSET_RSS_LUT_TABLE_SIZE_2K:
+		if (lut_type == ICE_AQC_GSET_RSS_LUT_TABLE_TYPE_PF) {
+			flags |= (ICE_AQC_GSET_RSS_LUT_TABLE_SIZE_2K_FLAG <<
+				  ICE_AQC_GSET_RSS_LUT_TABLE_SIZE_S) &
+				 ICE_AQC_GSET_RSS_LUT_TABLE_SIZE_M;
+			break;
+		}
+		/* fall-through */
+	default:
 		status = ICE_ERR_PARAM;
 		goto ice_aq_get_set_rss_lut_exit;
 	}
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 5299caf55a7f..186e764a469a 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -1352,14 +1352,13 @@ static void ice_set_dflt_vsi_ctx(struct ice_vsi_ctx *ctxt)
 	ctxt->info.sw_flags = ICE_AQ_VSI_SW_FLAG_SRC_PRUNE;
 	/* Traffic from VSI can be sent to LAN */
 	ctxt->info.sw_flags2 = ICE_AQ_VSI_SW_FLAG_LAN_ENA;
-	/* Allow all packets untagged/tagged */
+	/* By default bits 3 and 4 in port_vlan_flags are 0's which results in
+	 * legacy behavior (show VLAN, DEI, and UP) in descriptor. Also, allow
+	 * all packets untagged/tagged.
+	 */
 	ctxt->info.port_vlan_flags = ((ICE_AQ_VSI_PVLAN_MODE_ALL &
 				       ICE_AQ_VSI_PVLAN_MODE_M) >>
 				      ICE_AQ_VSI_PVLAN_MODE_S);
-	/* Show VLAN/UP from packets in Rx descriptors */
-	ctxt->info.port_vlan_flags |= ((ICE_AQ_VSI_PVLAN_EMOD_STR_BOTH &
-					ICE_AQ_VSI_PVLAN_EMOD_M) >>
-				       ICE_AQ_VSI_PVLAN_EMOD_S);
 	/* Have 1:1 UP mapping for both ingress/egress tables */
 	table |= ICE_UP_TABLE_TRANSLATE(0, 0);
 	table |= ICE_UP_TABLE_TRANSLATE(1, 1);
@@ -2058,15 +2057,13 @@ static int ice_req_irq_msix_misc(struct ice_pf *pf)
 skip_req_irq:
 	ice_ena_misc_vector(pf);
 
-	val = (pf->oicr_idx & PFINT_OICR_CTL_MSIX_INDX_M) |
-	      (ICE_RX_ITR & PFINT_OICR_CTL_ITR_INDX_M) |
-	      PFINT_OICR_CTL_CAUSE_ENA_M;
+	val = ((pf->oicr_idx & PFINT_OICR_CTL_MSIX_INDX_M) |
+	       PFINT_OICR_CTL_CAUSE_ENA_M);
 	wr32(hw, PFINT_OICR_CTL, val);
 
 	/* This enables Admin queue Interrupt causes */
-	val = (pf->oicr_idx & PFINT_FW_CTL_MSIX_INDX_M) |
-	      (ICE_RX_ITR & PFINT_FW_CTL_ITR_INDX_M) |
-	      PFINT_FW_CTL_CAUSE_ENA_M;
+	val = ((pf->oicr_idx & PFINT_FW_CTL_MSIX_INDX_M) |
+	       PFINT_FW_CTL_CAUSE_ENA_M);
 	wr32(hw, PFINT_FW_CTL, val);
 
 	itr_gran = hw->itr_gran_200;
diff --git a/drivers/net/ethernet/intel/ice/ice_switch.c b/drivers/net/ethernet/intel/ice/ice_switch.c
index 723d15f1e90b..6b7ec2ae5ad6 100644
--- a/drivers/net/ethernet/intel/ice/ice_switch.c
+++ b/drivers/net/ethernet/intel/ice/ice_switch.c
@@ -645,14 +645,14 @@ ice_add_marker_act(struct ice_hw *hw, struct ice_fltr_mgmt_list_entry *m_ent,
 	act |= (1 << ICE_LG_ACT_GENERIC_VALUE_S) & ICE_LG_ACT_GENERIC_VALUE_M;
 	lg_act->pdata.lg_act.act[1] = cpu_to_le32(act);
 
-	act = (7 << ICE_LG_ACT_GENERIC_OFFSET_S) & ICE_LG_ACT_GENERIC_VALUE_M;
+	act = (ICE_LG_ACT_GENERIC_OFF_RX_DESC_PROF_IDX <<
+	       ICE_LG_ACT_GENERIC_OFFSET_S) & ICE_LG_ACT_GENERIC_OFFSET_M;
 
 	/* Third action Marker value */
 	act |= ICE_LG_ACT_GENERIC;
 	act |= (sw_marker << ICE_LG_ACT_GENERIC_VALUE_S) &
 		ICE_LG_ACT_GENERIC_VALUE_M;
 
-	act |= (0 << ICE_LG_ACT_GENERIC_OFFSET_S) & ICE_LG_ACT_GENERIC_VALUE_M;
 	lg_act->pdata.lg_act.act[2] = cpu_to_le32(act);
 
 	/* call the fill switch rule to fill the lookup tx rx structure */
-- 
2.17.1

^ permalink raw reply related

* [net 03/13] ice: Cleanup magic number
From: Jeff Kirsher @ 2018-08-23 19:14 UTC (permalink / raw)
  To: davem
  Cc: Anirudh Venkataramanan, netdev, nhorman, sassmann, jogreene,
	Jeff Kirsher
In-Reply-To: <20180823191503.15804-1-jeffrey.t.kirsher@intel.com>

From: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>

Use define for the unit size shift of the Rx LAN context descriptor base
address instead of the magic number 7.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/ice/ice_lan_tx_rx.h | 1 +
 drivers/net/ethernet/intel/ice/ice_main.c      | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_lan_tx_rx.h b/drivers/net/ethernet/intel/ice/ice_lan_tx_rx.h
index d23a91665b46..068dbc740b76 100644
--- a/drivers/net/ethernet/intel/ice/ice_lan_tx_rx.h
+++ b/drivers/net/ethernet/intel/ice/ice_lan_tx_rx.h
@@ -265,6 +265,7 @@ enum ice_rx_flex_desc_status_error_0_bits {
 struct ice_rlan_ctx {
 	u16 head;
 	u16 cpuid; /* bigger than needed, see above for reason */
+#define ICE_RLAN_BASE_S 7
 	u64 base;
 	u16 qlen;
 #define ICE_RLAN_CTX_DBUF_S 7
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 186e764a469a..7d65e0ed3588 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -3983,7 +3983,7 @@ static int ice_setup_rx_ctx(struct ice_ring *ring)
 	/* clear the context structure first */
 	memset(&rlan_ctx, 0, sizeof(rlan_ctx));
 
-	rlan_ctx.base = ring->dma >> 7;
+	rlan_ctx.base = ring->dma >> ICE_RLAN_BASE_S;
 
 	rlan_ctx.qlen = ring->count;
 
-- 
2.17.1

^ permalink raw reply related

* [net 00/13][pull request] Intel Wired LAN Driver Fixes 2018-08-23
From: Jeff Kirsher @ 2018-08-23 19:14 UTC (permalink / raw)
  To: davem; +Cc: Jeff Kirsher, netdev, nhorman, sassmann, jogreene

This series contains bug fixes to the ice driver.

Anirudh provides several fixes, starting with static analysis fixes by
replacing a bitwise-and with a constant value and replace "magic"
numbers with defines.  Fixed the control queue processing by removing
unnecessary read/writes to registers, as well as getting a accurate
value for "pending".  Added additional checks to avoid NULL pointer
dereferences.  Fixed up code formatting issues, by cleaning up code
comments and coding style.

Bruce cleans up a duplicate check for owner, within the same function.
Also cleans up interrupt causes that are not handled or applicable.
Fix checkpatch warning about the use of bool in structures due to the
wasted space and size of bool, so convert struct members to u8 instead.

Jake fixes a number of potential bugs in the reporting of stats via
ethtool, by simply reporting all the queue statistics, even for the
queues that are not activated.  Fixed a compiler warning, as well as
make the code a bit cleaner but just using order_base_2() for
calculating the power-of-2.

Preethi adds a check to avoid a NULL pointer dereference crash during
initialization.

Brett clarifies the code when it comes to port VLANs and regular VLANs,
by renaming defines and field values to match their intended use and
purpose.

Jesse initializes a variable to avoid garbage values being returned to
the caller.

The following are changes since commit 0d092f06faa46b95a8e07b9bb5737b7c0f1176ee:
  Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue 100GbE

Anirudh Venkataramanan (5):
  ice: Fix multiple static analyser warnings
  ice: Cleanup magic number
  ice: Fix bugs in control queue processing
  ice: Fix a few null pointer dereference issues
  ice: Trivial formatting fixes

Brett Creeley (1):
  ice: Set VLAN flags correctly

Bruce Allan (3):
  ice: Remove unnecessary node owner check
  ice: Update to interrupts enabled in OICR
  ice: Change struct members from bool to u8

Jacob Keller (2):
  ice: Report stats for allocated queues via ethtool stats
  ice: Use order_base_2 to calculate higher power of 2

Jesse Brandeburg (1):
  ice: Fix potential return of uninitialized value

Preethi Banala (1):
  ice: Clean control queues only when they are initialized

 drivers/net/ethernet/intel/ice/ice.h          |  15 ++-
 .../net/ethernet/intel/ice/ice_adminq_cmd.h   |  25 ++--
 drivers/net/ethernet/intel/ice/ice_common.c   |  30 +++--
 drivers/net/ethernet/intel/ice/ice_controlq.c |  29 +++--
 drivers/net/ethernet/intel/ice/ice_ethtool.c  |  52 ++++++--
 .../net/ethernet/intel/ice/ice_hw_autogen.h   |   8 --
 .../net/ethernet/intel/ice/ice_lan_tx_rx.h    |   1 +
 drivers/net/ethernet/intel/ice/ice_main.c     | 115 ++++++++++--------
 drivers/net/ethernet/intel/ice/ice_nvm.c      |   5 +-
 drivers/net/ethernet/intel/ice/ice_sched.c    |   3 +-
 drivers/net/ethernet/intel/ice/ice_switch.c   |   4 +-
 drivers/net/ethernet/intel/ice/ice_switch.h   |   6 +-
 drivers/net/ethernet/intel/ice/ice_txrx.h     |   2 +-
 drivers/net/ethernet/intel/ice/ice_type.h     |  16 +--
 14 files changed, 185 insertions(+), 126 deletions(-)

-- 
2.17.1

^ permalink raw reply

* Re: KASAN: use-after-free Read in ceph_destroy_options
From: Ilya Dryomov @ 2018-08-23 19:06 UTC (permalink / raw)
  To: syzbot+8ab6f1042021b4eed062
  Cc: Ceph Development, David S. Miller, linux-kernel, netdev,
	Sage Weil, syzkaller-bugs, Yan, Zheng
In-Reply-To: <0000000000002a37a205741b2ae1@google.com>

On Thu, Aug 23, 2018 at 4:35 PM syzbot
<syzbot+8ab6f1042021b4eed062@syzkaller.appspotmail.com> wrote:
>
> Hello,
>
> syzbot found the following crash on:
>
> HEAD commit:    cc3d190b12b3 Add linux-next specific files for 20180822
> git tree:       linux-next
> console output: https://syzkaller.appspot.com/x/log.txt?x=107d322e400000
> kernel config:  https://syzkaller.appspot.com/x/.config?x=ce7f661707bc9904
> dashboard link: https://syzkaller.appspot.com/bug?extid=8ab6f1042021b4eed062
> compiler:       gcc (GCC) 8.0.1 20180413 (experimental)
>
> Unfortunately, I don't have any reproducer for this crash yet.
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+8ab6f1042021b4eed062@syzkaller.appspotmail.com
>
> ==================================================================
> BUG: KASAN: use-after-free in ceph_destroy_options+0xe0/0x110
> net/ceph/ceph_common.c:289
> Read of size 8 at addr ffff8801d8f4dd50 by task syz-executor2/7977
>
> CPU: 0 PID: 7977 Comm: syz-executor2 Not tainted 4.18.0-next-20180822+ #45
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> Google 01/01/2011
> Call Trace:
>   __dump_stack lib/dump_stack.c:77 [inline]
>   dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
>   print_address_description+0x6c/0x20b mm/kasan/report.c:256
>   kasan_report_error mm/kasan/report.c:354 [inline]
>   kasan_report.cold.7+0x242/0x30d mm/kasan/report.c:412
>   __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
>   ceph_destroy_options+0xe0/0x110 net/ceph/ceph_common.c:289
>   ceph_mount+0xeab/0x1cc0 fs/ceph/super.c:1047
>   legacy_get_tree+0x131/0x460 fs/fs_context.c:732
>   vfs_get_tree+0x1cb/0x5c0 fs/super.c:1746
>   do_new_mount fs/namespace.c:2627 [inline]
>   do_mount+0x6f9/0x1e30 fs/namespace.c:2951
>   ksys_mount+0x12d/0x140 fs/namespace.c:3167
>   __do_sys_mount fs/namespace.c:3181 [inline]
>   __se_sys_mount fs/namespace.c:3178 [inline]
>   __x64_sys_mount+0xbe/0x150 fs/namespace.c:3178
>   do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x457089
> Code: fd b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7
> 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
> ff 0f 83 cb b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00
> RSP: 002b:00007fae7dd45c78 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
> RAX: ffffffffffffffda RBX: 00007fae7dd466d4 RCX: 0000000000457089
> RDX: 0000000020000080 RSI: 0000000020000040 RDI: 0000000020000000
> RBP: 00000000009300a0 R08: 00000000200000c0 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
> R13: 00000000004d2750 R14: 00000000004c7bdf R15: 0000000000000000
>
> Allocated by task 7977:
>   save_stack+0x43/0xd0 mm/kasan/kasan.c:448
>   set_track mm/kasan/kasan.c:460 [inline]
>   kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
>   kmem_cache_alloc_trace+0x152/0x730 mm/slab.c:3620
>   kmalloc include/linux/slab.h:513 [inline]
>   kzalloc include/linux/slab.h:707 [inline]
>   ceph_parse_options+0xfe/0x1230 net/ceph/ceph_common.c:355
>   parse_mount_options fs/ceph/super.c:491 [inline]
>   ceph_mount+0x4b9/0x1cc0 fs/ceph/super.c:1036
>   legacy_get_tree+0x131/0x460 fs/fs_context.c:732
>   vfs_get_tree+0x1cb/0x5c0 fs/super.c:1746
>   do_new_mount fs/namespace.c:2627 [inline]
>   do_mount+0x6f9/0x1e30 fs/namespace.c:2951
>   ksys_mount+0x12d/0x140 fs/namespace.c:3167
>   __do_sys_mount fs/namespace.c:3181 [inline]
>   __se_sys_mount fs/namespace.c:3178 [inline]
>   __x64_sys_mount+0xbe/0x150 fs/namespace.c:3178
>   do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> Freed by task 7977:
>   save_stack+0x43/0xd0 mm/kasan/kasan.c:448
>   set_track mm/kasan/kasan.c:460 [inline]
>   __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
>   kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
>   __cache_free mm/slab.c:3498 [inline]
>   kfree+0xd9/0x210 mm/slab.c:3813
>   ceph_destroy_options+0xd4/0x110 net/ceph/ceph_common.c:295
>   ceph_destroy_client+0x139/0x1a0 net/ceph/ceph_common.c:680
>   create_fs_client fs/ceph/super.c:677 [inline]
>   ceph_mount+0xf6e/0x1cc0 fs/ceph/super.c:1043
>   legacy_get_tree+0x131/0x460 fs/fs_context.c:732
>   vfs_get_tree+0x1cb/0x5c0 fs/super.c:1746
>   do_new_mount fs/namespace.c:2627 [inline]
>   do_mount+0x6f9/0x1e30 fs/namespace.c:2951
>   ksys_mount+0x12d/0x140 fs/namespace.c:3167
>   __do_sys_mount fs/namespace.c:3181 [inline]
>   __se_sys_mount fs/namespace.c:3178 [inline]
>   __x64_sys_mount+0xbe/0x150 fs/namespace.c:3178
>   do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
> The buggy address belongs to the object at ffff8801d8f4dc80
>   which belongs to the cache kmalloc-256 of size 256
> The buggy address is located 208 bytes inside of
>   256-byte region [ffff8801d8f4dc80, ffff8801d8f4dd80)
> The buggy address belongs to the page:
> page:ffffea000763d340 count:1 mapcount:0 mapping:ffff8801dac007c0 index:0x0
> flags: 0x2fffc0000000100(slab)
> raw: 02fffc0000000100 ffffea0006c0e448 ffffea00072ca688 ffff8801dac007c0
> raw: 0000000000000000 ffff8801d8f4d000 000000010000000c 0000000000000000
> page dumped because: kasan: bad access detected
>
> Memory state around the buggy address:
>   ffff8801d8f4dc00: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
>   ffff8801d8f4dc80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > ffff8801d8f4dd00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>                                                   ^
>   ffff8801d8f4dd80: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
>   ffff8801d8f4de00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ==================================================================

Thanks for the report, I'll prepare a patch.

                Ilya

^ permalink raw reply

* Re: [v3, net-next, 02/12] net: stmmac: Do not keep rearming the coalesce timer in stmmac_xmit
From: Martin Blumenstingl @ 2018-08-23 18:42 UTC (permalink / raw)
  To: jbrunet, Jose.Abreu, peppe.cavallaro, Joao.Pinto,
	alexandre.torgue, Vitor.Soares
  Cc: netdev, linux-amlogic, khilman, bgolaszewski, davem
In-Reply-To: <7c15f9477adbf69c6eb57a2a89273f7afc51574e.camel@baylibre.com>

Hi,

On Fri, Aug 17, 2018 at 9:32 AM Jerome Brunet <jbrunet@baylibre.com> wrote:
>
> On Fri, 2018-05-18 at 14:55 +0100, Jose Abreu wrote:
> > This is cutting down performance. Once the timer is armed it should run
> > after the time expires for the first packet sent and not the last one.
> >
> > After this change, running iperf, the performance gain is +/- 24%.
>
> Hi Guys,
>
> Since v4.18, we are getting a serious regression on Amlogic based SoCs.
> I have tested this on amlogic's:
> * gxbb S905 p200 (Micrel KSZ9031 - 1GBps)
> * axg A113 s400 (Realtek RTL8211F - 1GBps)
>
> Both SoCs use the synopsys gmac with stmmac driver.
I can confirm this on Odroid-C1 (Meson8b SoC with RTL8211F RGMII PHY) as well

> I first noticed that running NFS root filesystem became unstable but I could not
> understand why. Then, running a download as simple test with iperf3 (from an
> initramfs) will break the 'network' in matter of seconds.
I didn't run iperf, simply downloading the latest rootfs package
updates (on Arch Linux ARM) caused the network to break

> I don't know exactly what breaks but bisect clearly assign the blame to this
> change. Reverting the change solve this problem.
>
> I'll be happy to make more tests to help understand what is happening here.
if some latency is fine then I can also help testing

here's a bootlog excerpt with the info from the dwmac-meson8b driver
(used on all platforms listed above):
meson8b-dwmac c9410000.ethernet: PTP uses main clock
meson8b-dwmac c9410000.ethernet: User ID: 0x10, Synopsys ID: 0x37
meson8b-dwmac c9410000.ethernet: DWMAC1000
meson8b-dwmac c9410000.ethernet: DMA HW capability register supported
meson8b-dwmac c9410000.ethernet: RX Checksum Offload Engine supported
meson8b-dwmac c9410000.ethernet: COE Type 2
meson8b-dwmac c9410000.ethernet: TX Checksum insertion supported
meson8b-dwmac c9410000.ethernet: Wake-Up On Lan supported
meson8b-dwmac c9410000.ethernet: Normal descriptors
meson8b-dwmac c9410000.ethernet: Ring mode enabled
meson8b-dwmac c9410000.ethernet: Enable RX Mitigation via HW Watchdog Timer
...
meson8b-dwmac c9410000.ethernet eth0: device MAC address [...random
mac address...]
RTL8211F Gigabit Ethernet stmmac-0:00: attached PHY driver [RTL8211F
Gigabit Ethernet] (mii_bus:phy_addr=stmmac-0:00, irq=27)
...
meson8b-dwmac c9410000.ethernet eth0: No Safety Features support found
meson8b-dwmac c9410000.ethernet eth0: PTP not supported by HW
IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready

Regards
Martin

^ permalink raw reply

* Re: [PATCH bpf] tools: bpftool: return from do_event_pipe() on bad arguments
From: Daniel Borkmann @ 2018-08-23 18:35 UTC (permalink / raw)
  To: Quentin Monnet, Sergei Shtylyov, Alexei Starovoitov
  Cc: Jakub Kicinski, netdev, oss-drivers
In-Reply-To: <e00a163d-4574-1908-54ec-2b7c5d69508b@netronome.com>

On 08/23/2018 07:48 PM, Quentin Monnet wrote:
> 2018-08-23 20:35 UTC+0300 ~ Sergei Shtylyov
> <sergei.shtylyov@cogentembedded.com>
>> Hello!
>>
>> On 08/23/2018 07:46 PM, Quentin Monnet wrote:
>>
>>> When command line parsing fails in the while loop in do_event_pipe()
>>> because the number of arguments is incorrect or because the keyword is
>>> unknown, an error message is displayed, but bpfool
>>
>>    bp-who? ;-)
>>
>>> remains stucked in
>>
>>    Stuck.
>>
>>> the loop. Make sure we exit the loop upon failure.
>>>
>>> Fixes: f412eed9dfde ("tools: bpftool: add simple perf event output reader")
>>> Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
>>> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
>> [...]
>>
>> MBR, Sergei
> 
> Thanks Sergei! The patch has been applied so I cannot fix these, but
> I'll make sure to give an additional pass to my future commit logs…

I fixed the two commit log typos up, thanks.

^ permalink raw reply

* Re: [PATCH bpf] xsk: fix return value of xdp_umem_assign_dev()
From: Jakub Kicinski @ 2018-08-23 18:29 UTC (permalink / raw)
  To: Prashant Bhole
  Cc: Alexei Starovoitov, Daniel Borkmann, Björn Töpel,
	Magnus Karlsson, David S . Miller, netdev
In-Reply-To: <20180820005425.2604-1-bhole_prashant_q7@lab.ntt.co.jp>

On Mon, 20 Aug 2018 09:54:25 +0900, Prashant Bhole wrote:
> s/ENOTSUPP/EOPNOTSUPP/ in function umem_assign_dev().
> This function's return value is directly returned by xsk_bind().
> EOPNOTSUPP is bind()'s possible return value.
> 
> Fixes: f734607e819b ("xsk: refactor xdp_umem_assign_dev()")
> Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>

FWIW the refactoring commit just cleaned up the code, is it worth
submitting a patch to stable to correct 4.18 as well?

^ permalink raw reply

* Re: [PATCH] netfilter: conntrack: remove duplicated include from nf_conntrack_proto_udp.c
From: Pablo Neira Ayuso @ 2018-08-23 18:26 UTC (permalink / raw)
  To: Yue Haibing
  Cc: Jozsef Kadlecsik, Florian Westphal, David S. Miller,
	netfilter-devel, coreteam, netdev, kernel-janitors
In-Reply-To: <1534860184-170985-1-git-send-email-yuehaibing@huawei.com>

On Tue, Aug 21, 2018 at 02:03:04PM +0000, Yue Haibing wrote:
> Remove duplicated include.

Applied, thanks.

^ permalink raw reply

* Re: [PATCH 0/2] net/sched: Add hardware specific counters to TC actions
From: Jakub Kicinski @ 2018-08-23 18:14 UTC (permalink / raw)
  To: Eelco Chaudron
  Cc: David Miller, netdev, jhs, xiyou.wangcong, jiri, simon.horman,
	Marcelo Ricardo Leitner, louis.peens
In-Reply-To: <229BA7FA-916B-47EA-8FD4-3F0B8BDDD145@redhat.com>

On Mon, 20 Aug 2018 16:03:40 +0200, Eelco Chaudron wrote:
> On 17 Aug 2018, at 13:27, Jakub Kicinski wrote:
> > On Thu, 16 Aug 2018 14:02:44 +0200, Eelco Chaudron wrote:  
> >> On 11 Aug 2018, at 21:06, David Miller wrote:
> >>  
> >>> From: Jakub Kicinski <jakub.kicinski@netronome.com>
> >>> Date: Thu, 9 Aug 2018 20:26:08 -0700
> >>>  
> >>>> It is not immediately clear why this is needed.  The memory and
> >>>> updating two sets of counters won't come for free, so perhaps a
> >>>> stronger justification than troubleshooting is due? :S
> >>>>
> >>>> Netdev has counters for fallback vs forwarded traffic, so you'd 
> >>>> know
> >>>> that traffic hits the SW datapath, plus the rules which are in_hw
> >>>> will
> >>>> most likely not match as of today for flower (assuming 
> >>>> correctness).  
> >>
> >> I strongly believe that these counters are a requirement for a mixed
> >> software/hardware (flow) based forwarding environment. The global
> >> counters will not help much here as you might have chosen to have
> >> certain traffic forwarded by software.
> >>
> >> These counters are probably the only option you have to figure out 
> >> why
> >> forwarding is not as fast as expected, and you want to blame the TC
> >> offload NIC.  
> >
> > The suggested debugging flow would be:
> >  (1) check the global counter for fallback are incrementing;
> >  (2) find a flow with high stats but no in_hw flag set.
> >
> > The in_hw indication should be sufficient in most cases (unless there
> > are shared blocks between netdevs of different ASICs...).  
> 
> I guess the aim is to find miss behaving hardware, i.e. having the in_hw 
> flag set, but flows still coming to the kernel.

For misbehaving hardware in_hw will not work indeed.  Whether we need
these extra always-on stats for such use case could be debated :)
 
> >>>> I'm slightly concerned about potential performance impact, would 
> >>>> you
> >>>> be able to share some numbers for non-trivial number of flows (100k
> >>>> active?)?  
> >>>
> >>> Agreed, features used for diagnostics cannot have a harmful penalty
> >>> for fast path performance.  
> >>
> >> Fast path performance is not affected as these counters are not
> >> incremented there. They are only incremented by the nic driver when 
> >> they
> >> gather their statistics from hardware.  
> >
> > Not by much, you are adding state to performance-critical structures,
> > though, for what is effectively debugging purposes.
> >
> > I was mostly talking about the HW offload stat updates (sorry for not
> > being clear).
> >
> > We can have some hundreds of thousands active offloaded flows, each of
> > them can have multiple actions, and stats have to be updated multiple
> > times per second and dumped probably around once a second, too.  On a
> > busy system the stats will get evicted from cache between each round.
> >
> > But I'm speculating let's see if I can get some numbers on it (if you
> > could get some too, that would be great!).  
> 
> I’ll try to measure some of this later this week/early next week.

I asked Louis to run some tests while I'm travelling, and he reports
that my worry about reporting the extra stats was unfounded.  Update
function does not show up in traces at all.  It seems under stress
(generated with stress-ng) the thread dumping the stats in userspace
(in OvS it would be the revalidator) actually consumes less CPU in
__gnet_stats_copy_basic (0.4% less for ~2.0% total).

Would this match with your results?  I'm not sure why dumping would be
faster with your change..

> >> However, the flow creation is effected, as this is where the extra
> >> memory gets allocated. I had done some 40K flow tests before and did 
> >> not
> >> see any noticeable change in flow insertion performance. As requested 
> >> by
> >> Jakub I did it again for 100K (and threw a Netronome blade in the mix
> >> ;). I used Marcelo’s test tool,
> >> https://github.com/marceloleitner/perf-flower.git.
> >>
> >> Here are the numbers (time in seconds) for 10 runs in sorted order:
> >>
> >> +-------------+----------------+
> >> | Base_kernel | Change_applied |
> >> +-------------+----------------+
> >> |    5.684019 |       5.656388 |
> >> |    5.699658 |       5.674974 |
> >> |    5.725220 |       5.722107 |
> >> |    5.739285 |       5.839855 |
> >> |    5.748088 |       5.865238 |
> >> |    5.766231 |       5.873913 |
> >> |    5.842264 |       5.909259 |
> >> |    5.902202 |       5.912685 |
> >> |    5.905391 |       5.947138 |
> >> |    6.032997 |       5.997779 |
> >> +-------------+----------------+
> >>
> >> I guess the deviation is in the userspace part, which is where in 
> >> real
> >> life flows get added anyway.
> >>
> >> Let me know if more is unclear.  

^ permalink raw reply

* Re: [PATCH bpf] tools: bpftool: return from do_event_pipe() on bad arguments
From: Quentin Monnet @ 2018-08-23 17:48 UTC (permalink / raw)
  To: Sergei Shtylyov, Daniel Borkmann, Alexei Starovoitov
  Cc: Jakub Kicinski, netdev, oss-drivers
In-Reply-To: <7d223b75-59ae-2b8f-ff09-7f800c6228bf@cogentembedded.com>

2018-08-23 20:35 UTC+0300 ~ Sergei Shtylyov
<sergei.shtylyov@cogentembedded.com>
> Hello!
> 
> On 08/23/2018 07:46 PM, Quentin Monnet wrote:
> 
>> When command line parsing fails in the while loop in do_event_pipe()
>> because the number of arguments is incorrect or because the keyword is
>> unknown, an error message is displayed, but bpfool
> 
>    bp-who? ;-)
> 
>> remains stucked in
> 
>    Stuck.
> 
>> the loop. Make sure we exit the loop upon failure.
>>
>> Fixes: f412eed9dfde ("tools: bpftool: add simple perf event output reader")
>> Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
>> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> [...]
> 
> MBR, Sergei

Thanks Sergei! The patch has been applied so I cannot fix these, but
I'll make sure to give an additional pass to my future commit logs…

Best,
Quentin

^ permalink raw reply

* Re: [RFC 1/3 net] lorawan: Add LoRaWAN class module
From: Randy Dunlap @ 2018-08-23 17:43 UTC (permalink / raw)
  To: Jian-Hong Pan, Andreas Färber, netdev, linux-arm-kernel,
	linux-kernel, Jiri Pirko, Marcel Holtmann, David S. Miller,
	Matthias Brugger, Janus Piwek, Michael Röder, Dollar Chen,
	Ken Yu, Konstantin Böhm, Jan Jongboom, Jon Ortego,
	contact@snootlab.com, Ben Whitten, Brian Ray, lora
In-Reply-To: <fc737f3940bbe91341fb15d85ac11931eb56d1fc.1535039998.git.starnight@g.ncu.edu.tw>

Hi,

On 08/23/2018 10:15 AM, Jian-Hong Pan wrote:
> diff --git a/net/maclorawan/Kconfig b/net/maclorawan/Kconfig
> new file mode 100644
> index 000000000000..741992b059c6
> --- /dev/null
> +++ b/net/maclorawan/Kconfig
> @@ -0,0 +1,14 @@
> +config LORAWAN
> +	tristate "MAC layer of LoRaWAN Network support"
> +	select CRYPTO
> +	select CRYPTO_CMAC
> +	select CRYPTO_CBC
> +	select CRYPTO_AES
> +	---help---
> +	  LoRaWAN defines a low data rate, low power and long range wireless
> +	  wide area networks. It was designed to organise networks of sensors,
> +	  switches and actuators, etc automation devices.  It could operate as
> +	  multiple kilometers wide.

There are several things that I would change.  How about this instead?


	  LoRaWAN defines low data rate, low power and long range wireless
	  wide area networks. It was designed to organise networks of automation
	  devices, such as sensors, switches and actuators. It can operate
	  multiple kilometers wide.

> +
> +	  Say Y here to compile LoRaWAN support into the kernel or say M to
> +	  compile it as modules.


-- 
~Randy

^ permalink raw reply

* Re: [PATCH bpf] tools: bpftool: return from do_event_pipe() on bad arguments
From: Sergei Shtylyov @ 2018-08-23 17:35 UTC (permalink / raw)
  To: Quentin Monnet, Daniel Borkmann, Alexei Starovoitov
  Cc: Jakub Kicinski, netdev, oss-drivers
In-Reply-To: <20180823164625.5410-1-quentin.monnet@netronome.com>

Hello!

On 08/23/2018 07:46 PM, Quentin Monnet wrote:

> When command line parsing fails in the while loop in do_event_pipe()
> because the number of arguments is incorrect or because the keyword is
> unknown, an error message is displayed, but bpfool

   bp-who? ;-)

> remains stucked in

   Stuck.

> the loop. Make sure we exit the loop upon failure.
> 
> Fixes: f412eed9dfde ("tools: bpftool: add simple perf event output reader")
> Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
[...]

MBR, Sergei

^ permalink raw reply

* RE: [Intel-wired-lan] [PATCH] net: intel: igb: Replace mdelay() with msleep() in igb_integrated_phy_loopback()
From: Brown, Aaron F @ 2018-08-23 21:05 UTC (permalink / raw)
  To: Jia-Ju Bai, Kirsher, Jeffrey T
  Cc: netdev@vger.kernel.org, intel-wired-lan@lists.osuosl.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <20180727080738.3301-1-baijiaju1990@gmail.com>

> From: Intel-wired-lan [mailto:intel-wired-lan-bounces@osuosl.org] On
> Behalf Of Jia-Ju Bai
> Sent: Friday, July 27, 2018 1:08 AM
> To: Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>
> Cc: netdev@vger.kernel.org; Jia-Ju Bai <baijiaju1990@gmail.com>; intel-
> wired-lan@lists.osuosl.org; linux-kernel@vger.kernel.org
> Subject: [Intel-wired-lan] [PATCH] net: intel: igb: Replace mdelay() with
> msleep() in igb_integrated_phy_loopback()
> 
> igb_integrated_phy_loopback() is never called in atomic context.
> It calls mdelay() to busily wait, which is not necessary.
> mdelay() can be replaced with msleep().
> 
> This is found by a static analysis tool named DCNS written by myself.
> 
> Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
> ---
>  drivers/net/ethernet/intel/igb/igb_ethtool.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 

Tested-by: Aaron Brown <aaron.f.brown@intel.com>

^ permalink raw reply

* RE: [Intel-wired-lan] [PATCH 1/2] net: intel: igb: Replace GFP_ATOMIC with GFP_KERNEL in igb_sw_init()
From: Brown, Aaron F @ 2018-08-23 21:04 UTC (permalink / raw)
  To: Jia-Ju Bai, Kirsher, Jeffrey T
  Cc: netdev@vger.kernel.org, intel-wired-lan@lists.osuosl.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <20180727080453.3198-1-baijiaju1990@gmail.com>

> From: Intel-wired-lan [mailto:intel-wired-lan-bounces@osuosl.org] On
> Behalf Of Jia-Ju Bai
> Sent: Friday, July 27, 2018 1:05 AM
> To: Kirsher, Jeffrey T <jeffrey.t.kirsher@intel.com>
> Cc: netdev@vger.kernel.org; Jia-Ju Bai <baijiaju1990@gmail.com>; intel-
> wired-lan@lists.osuosl.org; linux-kernel@vger.kernel.org
> Subject: [Intel-wired-lan] [PATCH 1/2] net: intel: igb: Replace GFP_ATOMIC
> with GFP_KERNEL in igb_sw_init()
> 
> igb_sw_init() is never called in atomic context.
> It calls kzalloc() and kcalloc() with GFP_ATOMIC, which is not necessary.
> GFP_ATOMIC can be replaced with GFP_KERNEL.
> 
> This is found by a static analysis tool named DCNS written by myself.
> 
> Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
> ---
>  drivers/net/ethernet/intel/igb/igb_main.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 

Tested-by: Aaron Brown <aaron.f.brown@intel.com>

^ permalink raw reply

* Re: [PATCH bpf] tools: bpftool: return from do_event_pipe() on bad arguments
From: Daniel Borkmann @ 2018-08-23 16:56 UTC (permalink / raw)
  To: Quentin Monnet, Alexei Starovoitov; +Cc: Jakub Kicinski, netdev, oss-drivers
In-Reply-To: <20180823164625.5410-1-quentin.monnet@netronome.com>

On 08/23/2018 06:46 PM, Quentin Monnet wrote:
> When command line parsing fails in the while loop in do_event_pipe()
> because the number of arguments is incorrect or because the keyword is
> unknown, an error message is displayed, but bpfool remains stucked in
> the loop. Make sure we exit the loop upon failure.
> 
> Fixes: f412eed9dfde ("tools: bpftool: add simple perf event output reader")
> Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>

Applied to bpf, thanks Quentin!

^ permalink raw reply

* Re: [PATCH bpf] bpf: use per htab salt for bucket hash
From: Eduardo Valentin @ 2018-08-23 16:43 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: alexei.starovoitov, netdev
In-Reply-To: <3d91bc248da2fb1d2a88bcaa75d1b2d2da936986.1534974407.git.daniel@iogearbox.net>

On Wed, Aug 22, 2018 at 11:49:37PM +0200, Daniel Borkmann wrote:
> All BPF hash and LRU maps currently have a known and global seed
> we feed into jhash() which is 0. This is suboptimal, thus fix it
> by generating a random seed upon hashtab setup time which we can
> later on feed into jhash() on lookup, update and deletions.
> 
> Fixes: 0f8e4bd8a1fc8 ("bpf: add hashtable type of eBPF maps")
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> Acked-by: Alexei Starovoitov <ast@kernel.org>

Reviewed-by: Eduardo Valentin <eduval@amazon.com>


> ---
>  kernel/bpf/hashtab.c | 23 +++++++++++++----------
>  1 file changed, 13 insertions(+), 10 deletions(-)
> 
> diff --git a/kernel/bpf/hashtab.c b/kernel/bpf/hashtab.c
> index 04b8eda..03cc59e 100644
> --- a/kernel/bpf/hashtab.c
> +++ b/kernel/bpf/hashtab.c
> @@ -15,6 +15,7 @@
>  #include <linux/jhash.h>
>  #include <linux/filter.h>
>  #include <linux/rculist_nulls.h>
> +#include <linux/random.h>
>  #include <uapi/linux/btf.h>
>  #include "percpu_freelist.h"
>  #include "bpf_lru_list.h"
> @@ -41,6 +42,7 @@ struct bpf_htab {
>  	atomic_t count;	/* number of elements in this hashtable */
>  	u32 n_buckets;	/* number of hash buckets */
>  	u32 elem_size;	/* size of each element in bytes */
> +	u32 hashrnd;
>  };
>  
>  /* each htab element is struct htab_elem + key + value */
> @@ -371,6 +373,7 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
>  	if (!htab->buckets)
>  		goto free_htab;
>  
> +	htab->hashrnd = get_random_int();
>  	for (i = 0; i < htab->n_buckets; i++) {
>  		INIT_HLIST_NULLS_HEAD(&htab->buckets[i].head, i);
>  		raw_spin_lock_init(&htab->buckets[i].lock);
> @@ -402,9 +405,9 @@ static struct bpf_map *htab_map_alloc(union bpf_attr *attr)
>  	return ERR_PTR(err);
>  }
>  
> -static inline u32 htab_map_hash(const void *key, u32 key_len)
> +static inline u32 htab_map_hash(const void *key, u32 key_len, u32 hashrnd)
>  {
> -	return jhash(key, key_len, 0);
> +	return jhash(key, key_len, hashrnd);
>  }
>  
>  static inline struct bucket *__select_bucket(struct bpf_htab *htab, u32 hash)
> @@ -470,7 +473,7 @@ static void *__htab_map_lookup_elem(struct bpf_map *map, void *key)
>  
>  	key_size = map->key_size;
>  
> -	hash = htab_map_hash(key, key_size);
> +	hash = htab_map_hash(key, key_size, htab->hashrnd);
>  
>  	head = select_bucket(htab, hash);
>  
> @@ -597,7 +600,7 @@ static int htab_map_get_next_key(struct bpf_map *map, void *key, void *next_key)
>  	if (!key)
>  		goto find_first_elem;
>  
> -	hash = htab_map_hash(key, key_size);
> +	hash = htab_map_hash(key, key_size, htab->hashrnd);
>  
>  	head = select_bucket(htab, hash);
>  
> @@ -824,7 +827,7 @@ static int htab_map_update_elem(struct bpf_map *map, void *key, void *value,
>  
>  	key_size = map->key_size;
>  
> -	hash = htab_map_hash(key, key_size);
> +	hash = htab_map_hash(key, key_size, htab->hashrnd);
>  
>  	b = __select_bucket(htab, hash);
>  	head = &b->head;
> @@ -880,7 +883,7 @@ static int htab_lru_map_update_elem(struct bpf_map *map, void *key, void *value,
>  
>  	key_size = map->key_size;
>  
> -	hash = htab_map_hash(key, key_size);
> +	hash = htab_map_hash(key, key_size, htab->hashrnd);
>  
>  	b = __select_bucket(htab, hash);
>  	head = &b->head;
> @@ -945,7 +948,7 @@ static int __htab_percpu_map_update_elem(struct bpf_map *map, void *key,
>  
>  	key_size = map->key_size;
>  
> -	hash = htab_map_hash(key, key_size);
> +	hash = htab_map_hash(key, key_size, htab->hashrnd);
>  
>  	b = __select_bucket(htab, hash);
>  	head = &b->head;
> @@ -998,7 +1001,7 @@ static int __htab_lru_percpu_map_update_elem(struct bpf_map *map, void *key,
>  
>  	key_size = map->key_size;
>  
> -	hash = htab_map_hash(key, key_size);
> +	hash = htab_map_hash(key, key_size, htab->hashrnd);
>  
>  	b = __select_bucket(htab, hash);
>  	head = &b->head;
> @@ -1071,7 +1074,7 @@ static int htab_map_delete_elem(struct bpf_map *map, void *key)
>  
>  	key_size = map->key_size;
>  
> -	hash = htab_map_hash(key, key_size);
> +	hash = htab_map_hash(key, key_size, htab->hashrnd);
>  	b = __select_bucket(htab, hash);
>  	head = &b->head;
>  
> @@ -1103,7 +1106,7 @@ static int htab_lru_map_delete_elem(struct bpf_map *map, void *key)
>  
>  	key_size = map->key_size;
>  
> -	hash = htab_map_hash(key, key_size);
> +	hash = htab_map_hash(key, key_size, htab->hashrnd);
>  	b = __select_bucket(htab, hash);
>  	head = &b->head;
>  
> -- 
> 2.9.5
> 
> 

-- 
All the best,
Eduardo Valentin

^ permalink raw reply

* [PATCH bpf] tools: bpftool: return from do_event_pipe() on bad arguments
From: Quentin Monnet @ 2018-08-23 16:46 UTC (permalink / raw)
  To: Daniel Borkmann, Alexei Starovoitov
  Cc: Jakub Kicinski, netdev, oss-drivers, Quentin Monnet

When command line parsing fails in the while loop in do_event_pipe()
because the number of arguments is incorrect or because the keyword is
unknown, an error message is displayed, but bpfool remains stucked in
the loop. Make sure we exit the loop upon failure.

Fixes: f412eed9dfde ("tools: bpftool: add simple perf event output reader")
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 tools/bpf/bpftool/map_perf_ring.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/tools/bpf/bpftool/map_perf_ring.c b/tools/bpf/bpftool/map_perf_ring.c
index 1832100d1b27..6d41323be291 100644
--- a/tools/bpf/bpftool/map_perf_ring.c
+++ b/tools/bpf/bpftool/map_perf_ring.c
@@ -194,8 +194,10 @@ int do_event_pipe(int argc, char **argv)
 	}
 
 	while (argc) {
-		if (argc < 2)
+		if (argc < 2) {
 			BAD_ARG();
+			goto err_close_map;
+		}
 
 		if (is_prefix(*argv, "cpu")) {
 			char *endptr;
@@ -221,6 +223,7 @@ int do_event_pipe(int argc, char **argv)
 			NEXT_ARG();
 		} else {
 			BAD_ARG();
+			goto err_close_map;
 		}
 
 		do_all = false;
-- 
2.14.1

^ permalink raw reply related

* [PATCH net] vti6: remove !skb->ignore_df check from vti6_xmit()
From: Alexey Kodanev @ 2018-08-23 16:49 UTC (permalink / raw)
  To: netdev; +Cc: Steffen Klassert, David Miller, Alexey Kodanev

Before the commit d6990976af7c ("vti6: fix PMTU caching and reporting
on xmit") '!skb->ignore_df' check was always true because the function
skb_scrub_packet() was called before it, resetting ignore_df to zero.

In the commit, skb_scrub_packet() was moved below, and now this check
can be false for the packet, e.g. when sending it in the two fragments,
this prevents successful PMTU updates in such case. The next attempts
to send the packet lead to the same tx error. Moreover, vti6 initial
MTU value relies on PMTU adjustments.

This issue can be reproduced with the following LTP test script:
    udp_ipsec_vti.sh -6 -p ah -m tunnel -s 2000

Fixes: ccd740cbc6e0 ("vti6: Add pmtu handling to vti6_xmit.")
Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
---
Not sure about xfrmi_xmit2(), it has a similar check for ignore_df...

 net/ipv6/ip6_vti.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c
index 38dec9d..f48d196 100644
--- a/net/ipv6/ip6_vti.c
+++ b/net/ipv6/ip6_vti.c
@@ -481,7 +481,7 @@ static bool vti6_state_check(const struct xfrm_state *x,
 	}

 	mtu = dst_mtu(dst);
-	if (!skb->ignore_df && skb->len > mtu) {
+	if (skb->len > mtu) {
 		skb_dst_update_pmtu(skb, mtu);

 		if (skb->protocol == htons(ETH_P_IPV6)) {
-- 
1.8.3.1

^ permalink raw reply related

* Re: [PATCH] netlink: add nl_set_extack_cookie_u64()
From: Johannes Berg @ 2018-08-23 16:39 UTC (permalink / raw)
  To: David Miller; +Cc: linux-wireless, netdev
In-Reply-To: <20180823.084510.864066255490001088.davem@davemloft.net>

On Thu, 2018-08-23 at 08:45 -0700, David Miller wrote:
> From: Johannes Berg <johannes@sipsolutions.net>
> Date: Thu, 23 Aug 2018 10:48:13 +0200
> 
> > From: Johannes Berg <johannes.berg@intel.com>
> > 
> > Add a helper function nl_set_extack_cookie_u64() to use a u64 as
> > the netlink extended ACK cookie, to avoid having to open-code it
> > in any users of the cookie.
> > 
> > A u64 should be sufficient for most subsystems though we allow
> > for up to 20 bytes right now. This also matches the cookies in
> > nl80211 where I intend to use this.
> > 
> > Signed-off-by: Johannes Berg <johannes.berg@intel.com>
> 
> Feel free to merge this in your wireless tree since the first
> user will be there.
> 
> Acked-by: David S. Miller <davem@davemloft.net>

Will do, thanks.

johannes

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox