* [net-next 03/14] ice: Don't call synchronize_irq() for VF's from the host
From: Jeff Kirsher @ 2019-08-23 23:37 UTC (permalink / raw)
To: davem; +Cc: Brett Creeley, netdev, nhorman, sassmann, Andrew Bowers,
Jeff Kirsher
In-Reply-To: <20190823233750.7997-1-jeffrey.t.kirsher@intel.com>
From: Brett Creeley <brett.creeley@intel.com>
Currently we will call synchronize_irq() from the host for VF's. This is
not correct, so don't allow it.
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/ice/ice_lib.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
index d6279dfe029e..c067ef6be7f4 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_lib.c
@@ -2817,6 +2817,10 @@ void ice_vsi_dis_irq(struct ice_vsi *vsi)
ice_flush(hw);
+ /* don't call synchronize_irq() for VF's from the host */
+ if (vsi->type == ICE_VSI_VF)
+ return;
+
ice_for_each_q_vector(vsi, i)
synchronize_irq(pf->msix_entries[i + base].vector);
}
--
2.21.0
^ permalink raw reply related
* [net-next 10/14] ice: Don't clear auto_fec bit in ice_cfg_phy_fec()
From: Jeff Kirsher @ 2019-08-23 23:37 UTC (permalink / raw)
To: davem; +Cc: Chinh T Cao, netdev, nhorman, sassmann, Andrew Bowers,
Jeff Kirsher
In-Reply-To: <20190823233750.7997-1-jeffrey.t.kirsher@intel.com>
From: Chinh T Cao <chinh.t.cao@intel.com>
The driver should never clear the auto_fec_enable bit.
Signed-off-by: Chinh T Cao <chinh.t.cao@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/ice/ice_common.c | 9 +++------
1 file changed, 3 insertions(+), 6 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_common.c b/drivers/net/ethernet/intel/ice/ice_common.c
index 15648d4a8bab..4b43e6de847b 100644
--- a/drivers/net/ethernet/intel/ice/ice_common.c
+++ b/drivers/net/ethernet/intel/ice/ice_common.c
@@ -2181,27 +2181,24 @@ ice_cfg_phy_fec(struct ice_aqc_set_phy_cfg_data *cfg, enum ice_fec_mode fec)
{
switch (fec) {
case ICE_FEC_BASER:
- /* Clear auto FEC and RS bits, and AND BASE-R ability
+ /* Clear RS bits, and AND BASE-R ability
* bits and OR request bits.
*/
- cfg->caps &= ~ICE_AQC_PHY_EN_AUTO_FEC;
cfg->link_fec_opt &= ICE_AQC_PHY_FEC_10G_KR_40G_KR4_EN |
ICE_AQC_PHY_FEC_25G_KR_CLAUSE74_EN;
cfg->link_fec_opt |= ICE_AQC_PHY_FEC_10G_KR_40G_KR4_REQ |
ICE_AQC_PHY_FEC_25G_KR_REQ;
break;
case ICE_FEC_RS:
- /* Clear auto FEC and BASE-R bits, and AND RS ability
+ /* Clear BASE-R bits, and AND RS ability
* bits and OR request bits.
*/
- cfg->caps &= ~ICE_AQC_PHY_EN_AUTO_FEC;
cfg->link_fec_opt &= ICE_AQC_PHY_FEC_25G_RS_CLAUSE91_EN;
cfg->link_fec_opt |= ICE_AQC_PHY_FEC_25G_RS_528_REQ |
ICE_AQC_PHY_FEC_25G_RS_544_REQ;
break;
case ICE_FEC_NONE:
- /* Clear auto FEC and all FEC option bits. */
- cfg->caps &= ~ICE_AQC_PHY_EN_AUTO_FEC;
+ /* Clear all FEC option bits. */
cfg->link_fec_opt &= ~ICE_AQC_PHY_FEC_MASK;
break;
case ICE_FEC_AUTO:
--
2.21.0
^ permalink raw reply related
* [net-next 13/14] ice: Fix issues updating VSI MAC filters
From: Jeff Kirsher @ 2019-08-23 23:37 UTC (permalink / raw)
To: davem
Cc: Akeem G Abodunrin, netdev, nhorman, sassmann, Tony Nguyen,
Andrew Bowers, Jeff Kirsher
In-Reply-To: <20190823233750.7997-1-jeffrey.t.kirsher@intel.com>
From: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
VSI, especially VF could request to add or remove filter for another VSI,
driver should really guide such request and disallow it.
However, instead of returning error for such malicious request, driver
can simply return success.
In addition, we are not tracking number of MAC filters configured per
VF correctly - and this leads to issue updating VF MAC filters whenever
they were removed and re-configured via bringing VF interface down and
up. Also, since VF could send request to update multiple MAC filters at
once, driver should program those filters individually in the switch, in
order to determine which action resulted to error, and communicate
accordingly to the VF.
So, with this changes, we now track number of filters added right from
when VF resources allocation is done, and could properly add filters for
both trusted and non_trusted VFs, without MAC filters mis-match issue in
the switch...
Also refactor code, so that driver can use new function to add or remove
MAC filters.
Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/ice/ice_lib.c | 30 +++++++++
drivers/net/ethernet/intel/ice/ice_lib.h | 4 ++
drivers/net/ethernet/intel/ice/ice_main.c | 64 +++++--------------
.../net/ethernet/intel/ice/ice_virtchnl_pf.c | 42 ++++++------
4 files changed, 73 insertions(+), 67 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
index 343d0c305423..8d5d6635a123 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_lib.c
@@ -3181,3 +3181,33 @@ int ice_vsi_cfg_tc(struct ice_vsi *vsi, u8 ena_tc)
return ret;
}
#endif /* CONFIG_DCB */
+
+/**
+ * ice_vsi_cfg_mac_fltr - Add or remove a MAC address filter for a VSI
+ * @vsi: the VSI being configured MAC filter
+ * @macaddr: the MAC address to be added.
+ * @set: Add or delete a MAC filter
+ *
+ * Adds or removes MAC address filter entry for VF VSI
+ */
+enum ice_status
+ice_vsi_cfg_mac_fltr(struct ice_vsi *vsi, const u8 *macaddr, bool set)
+{
+ LIST_HEAD(tmp_add_list);
+ enum ice_status status;
+
+ /* Update MAC filter list to be added or removed for a VSI */
+ if (ice_add_mac_to_list(vsi, &tmp_add_list, macaddr)) {
+ status = ICE_ERR_NO_MEMORY;
+ goto cfg_mac_fltr_exit;
+ }
+
+ if (set)
+ status = ice_add_mac(&vsi->back->hw, &tmp_add_list);
+ else
+ status = ice_remove_mac(&vsi->back->hw, &tmp_add_list);
+
+cfg_mac_fltr_exit:
+ ice_free_fltr_list(&vsi->back->pdev->dev, &tmp_add_list);
+ return status;
+}
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.h b/drivers/net/ethernet/intel/ice/ice_lib.h
index 6e43ef03bfc3..969ba27cba95 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.h
+++ b/drivers/net/ethernet/intel/ice/ice_lib.h
@@ -95,4 +95,8 @@ void ice_vsi_free_tx_rings(struct ice_vsi *vsi);
int ice_vsi_manage_rss_lut(struct ice_vsi *vsi, bool ena);
u32 ice_intrl_usec_to_reg(u8 intrl, u8 gran);
+
+enum ice_status
+ice_vsi_cfg_mac_fltr(struct ice_vsi *vsi, const u8 *macaddr, bool set);
+
#endif /* !_ICE_LIB_H_ */
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 6dd806b763ea..f3923dec32b7 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -116,10 +116,9 @@ static void ice_check_for_hang_subtask(struct ice_pf *pf)
*/
static int ice_init_mac_fltr(struct ice_pf *pf)
{
- LIST_HEAD(tmp_add_list);
+ enum ice_status status;
u8 broadcast[ETH_ALEN];
struct ice_vsi *vsi;
- int status;
vsi = ice_find_vsi_by_type(pf, ICE_VSI_PF);
if (!vsi)
@@ -130,8 +129,7 @@ static int ice_init_mac_fltr(struct ice_pf *pf)
*/
/* Add a unicast MAC filter so the VSI can get its packets */
- status = ice_add_mac_to_list(vsi, &tmp_add_list,
- vsi->port_info->mac.perm_addr);
+ status = ice_vsi_cfg_mac_fltr(vsi, vsi->port_info->mac.perm_addr, true);
if (status)
goto unregister;
@@ -139,18 +137,11 @@ static int ice_init_mac_fltr(struct ice_pf *pf)
* MAC address to the list as well.
*/
eth_broadcast_addr(broadcast);
- status = ice_add_mac_to_list(vsi, &tmp_add_list, broadcast);
- if (status)
- goto free_mac_list;
-
- /* Program MAC filters for entries in tmp_add_list */
- status = ice_add_mac(&pf->hw, &tmp_add_list);
+ status = ice_vsi_cfg_mac_fltr(vsi, broadcast, true);
if (status)
- status = -ENOMEM;
-
-free_mac_list:
- ice_free_fltr_list(&pf->pdev->dev, &tmp_add_list);
+ goto unregister;
+ return 0;
unregister:
/* We aren't useful with no MAC filters, so unregister if we
* had an error
@@ -164,7 +155,7 @@ static int ice_init_mac_fltr(struct ice_pf *pf)
vsi->netdev = NULL;
}
- return status;
+ return -EIO;
}
/**
@@ -2834,10 +2825,8 @@ static int ice_set_mac_address(struct net_device *netdev, void *pi)
struct ice_hw *hw = &pf->hw;
struct sockaddr *addr = pi;
enum ice_status status;
- LIST_HEAD(a_mac_list);
- LIST_HEAD(r_mac_list);
u8 flags = 0;
- int err;
+ int err = 0;
u8 *mac;
mac = (u8 *)addr->sa_data;
@@ -2860,42 +2849,23 @@ static int ice_set_mac_address(struct net_device *netdev, void *pi)
/* When we change the MAC address we also have to change the MAC address
* based filter rules that were created previously for the old MAC
* address. So first, we remove the old filter rule using ice_remove_mac
- * and then create a new filter rule using ice_add_mac. Note that for
- * both these operations, we first need to form a "list" of MAC
- * addresses (even though in this case, we have only 1 MAC address to be
- * added/removed) and this done using ice_add_mac_to_list. Depending on
- * the ensuing operation this "list" of MAC addresses is either to be
- * added or removed from the filter.
+ * and then create a new filter rule using ice_add_mac via
+ * ice_vsi_cfg_mac_fltr function call for both add and/or remove
+ * filters.
*/
- err = ice_add_mac_to_list(vsi, &r_mac_list, netdev->dev_addr);
- if (err) {
- err = -EADDRNOTAVAIL;
- goto free_lists;
- }
-
- status = ice_remove_mac(hw, &r_mac_list);
+ status = ice_vsi_cfg_mac_fltr(vsi, netdev->dev_addr, false);
if (status) {
err = -EADDRNOTAVAIL;
- goto free_lists;
- }
-
- err = ice_add_mac_to_list(vsi, &a_mac_list, mac);
- if (err) {
- err = -EADDRNOTAVAIL;
- goto free_lists;
+ goto err_update_filters;
}
- status = ice_add_mac(hw, &a_mac_list);
+ status = ice_vsi_cfg_mac_fltr(vsi, mac, true);
if (status) {
err = -EADDRNOTAVAIL;
- goto free_lists;
+ goto err_update_filters;
}
-free_lists:
- /* free list entries */
- ice_free_fltr_list(&pf->pdev->dev, &r_mac_list);
- ice_free_fltr_list(&pf->pdev->dev, &a_mac_list);
-
+err_update_filters:
if (err) {
netdev_err(netdev, "can't set MAC %pM. filter update failed\n",
mac);
@@ -2911,8 +2881,8 @@ static int ice_set_mac_address(struct net_device *netdev, void *pi)
flags = ICE_AQC_MAN_MAC_UPDATE_LAA_WOL;
status = ice_aq_manage_mac_write(hw, mac, flags, NULL);
if (status) {
- netdev_err(netdev, "can't set MAC %pM. write to firmware failed.\n",
- mac);
+ netdev_err(netdev, "can't set MAC %pM. write to firmware failed error %d\n",
+ mac, status);
}
return 0;
}
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
index de0a1ef54e83..86637d99ee77 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
@@ -540,7 +540,10 @@ static int ice_alloc_vsi_res(struct ice_vf *vf)
status = ice_add_mac(&pf->hw, &tmp_add_list);
if (status)
- dev_err(&pf->pdev->dev, "could not add mac filters\n");
+ dev_err(&pf->pdev->dev,
+ "could not add mac filters error %d\n", status);
+ else
+ vf->num_mac = 1;
/* Clear this bit after VF initialization since we shouldn't reclaim
* and reassign interrupts for synchronous or asynchronous VFR events.
@@ -2208,7 +2211,7 @@ ice_vc_handle_mac_addr_msg(struct ice_vf *vf, u8 *msg, bool set)
(struct virtchnl_ether_addr_list *)msg;
struct ice_pf *pf = vf->pf;
enum virtchnl_ops vc_op;
- LIST_HEAD(mac_list);
+ enum ice_status status;
struct ice_vsi *vsi;
int mac_count = 0;
int i;
@@ -2282,33 +2285,32 @@ ice_vc_handle_mac_addr_msg(struct ice_vf *vf, u8 *msg, bool set)
goto handle_mac_exit;
}
- /* get here if maddr is multicast or if VF can change MAC */
- if (ice_add_mac_to_list(vsi, &mac_list, al->list[i].addr)) {
- v_ret = VIRTCHNL_STATUS_ERR_NO_MEMORY;
+ /* program the updated filter list */
+ status = ice_vsi_cfg_mac_fltr(vsi, maddr, set);
+ if (status == ICE_ERR_DOES_NOT_EXIST ||
+ status == ICE_ERR_ALREADY_EXISTS) {
+ dev_info(&pf->pdev->dev,
+ "can't %s MAC filters %pM for VF %d, error %d\n",
+ set ? "add" : "remove", maddr, vf->vf_id,
+ status);
+ } else if (status) {
+ dev_err(&pf->pdev->dev,
+ "can't %s MAC filters for VF %d, error %d\n",
+ set ? "add" : "remove", vf->vf_id, status);
+ v_ret = ice_err_to_virt_err(status);
goto handle_mac_exit;
}
+
mac_count++;
}
- /* program the updated filter list */
+ /* Track number of MAC filters programmed for the VF VSI */
if (set)
- v_ret = ice_err_to_virt_err(ice_add_mac(&pf->hw, &mac_list));
+ vf->num_mac += mac_count;
else
- v_ret = ice_err_to_virt_err(ice_remove_mac(&pf->hw, &mac_list));
-
- if (v_ret) {
- dev_err(&pf->pdev->dev,
- "can't %s MAC filters for VF %d, error %d\n",
- set ? "add" : "remove", vf->vf_id, v_ret);
- } else {
- if (set)
- vf->num_mac += mac_count;
- else
- vf->num_mac -= mac_count;
- }
+ vf->num_mac -= mac_count;
handle_mac_exit:
- ice_free_fltr_list(&pf->pdev->dev, &mac_list);
/* send the response to the VF */
return ice_vc_send_msg_to_vf(vf, vc_op, v_ret, NULL, 0);
}
--
2.21.0
^ permalink raw reply related
* [net-next 07/14] ice: Rename ethtool private flag for lldp
From: Jeff Kirsher @ 2019-08-23 23:37 UTC (permalink / raw)
To: davem; +Cc: Dave Ertman, netdev, nhorman, sassmann, Andrew Bowers,
Jeff Kirsher
In-Reply-To: <20190823233750.7997-1-jeffrey.t.kirsher@intel.com>
From: Dave Ertman <david.m.ertman@intel.com>
The current flag name of "enable-fw-lldp" is a bit cumbersome.
Change priv-flag name to "fw-lldp-agent" with a value of on or
off. This is more straight-forward in meaning.
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/ice/ice.h | 2 +-
drivers/net/ethernet/intel/ice/ice_dcb_lib.c | 4 ++--
drivers/net/ethernet/intel/ice/ice_ethtool.c | 6 +++---
drivers/net/ethernet/intel/ice/ice_lib.c | 4 ++--
4 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index 9f9c30d29eb5..99e0febd8e50 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -329,7 +329,7 @@ enum ice_pf_flags {
ICE_FLAG_DCB_ENA,
ICE_FLAG_LINK_DOWN_ON_CLOSE_ENA,
ICE_FLAG_NO_MEDIA,
- ICE_FLAG_ENABLE_FW_LLDP,
+ ICE_FLAG_FW_LLDP_AGENT,
ICE_FLAG_ETHTOOL_CTXT, /* set when ethtool holds RTNL lock */
ICE_PF_FLAGS_NBITS /* must be last */
};
diff --git a/drivers/net/ethernet/intel/ice/ice_dcb_lib.c b/drivers/net/ethernet/intel/ice/ice_dcb_lib.c
index 4fc9faf5bc71..734cef8eed9e 100644
--- a/drivers/net/ethernet/intel/ice/ice_dcb_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_dcb_lib.c
@@ -457,7 +457,7 @@ int ice_init_pf_dcb(struct ice_pf *pf, bool locked)
dev_info(&pf->pdev->dev,
"FW LLDP is disabled, DCBx/LLDP in SW mode.\n");
port_info->is_sw_lldp = true;
- clear_bit(ICE_FLAG_ENABLE_FW_LLDP, pf->flags);
+ clear_bit(ICE_FLAG_FW_LLDP_AGENT, pf->flags);
err = ice_dcb_sw_dflt_cfg(pf, locked);
if (err) {
dev_err(&pf->pdev->dev,
@@ -473,7 +473,7 @@ int ice_init_pf_dcb(struct ice_pf *pf, bool locked)
}
port_info->is_sw_lldp = false;
- set_bit(ICE_FLAG_ENABLE_FW_LLDP, pf->flags);
+ set_bit(ICE_FLAG_FW_LLDP_AGENT, pf->flags);
/* DCBX in FW and LLDP enabled in FW */
pf->dcbx_cap = DCB_CAP_DCBX_LLD_MANAGED | DCB_CAP_DCBX_VER_IEEE;
diff --git a/drivers/net/ethernet/intel/ice/ice_ethtool.c b/drivers/net/ethernet/intel/ice/ice_ethtool.c
index 6a97ddbbda76..948a33716290 100644
--- a/drivers/net/ethernet/intel/ice/ice_ethtool.c
+++ b/drivers/net/ethernet/intel/ice/ice_ethtool.c
@@ -155,7 +155,7 @@ struct ice_priv_flag {
static const struct ice_priv_flag ice_gstrings_priv_flags[] = {
ICE_PRIV_FLAG("link-down-on-close", ICE_FLAG_LINK_DOWN_ON_CLOSE_ENA),
- ICE_PRIV_FLAG("enable-fw-lldp", ICE_FLAG_ENABLE_FW_LLDP),
+ ICE_PRIV_FLAG("fw-lldp-agent", ICE_FLAG_FW_LLDP_AGENT),
};
#define ICE_PRIV_FLAG_ARRAY_SIZE ARRAY_SIZE(ice_gstrings_priv_flags)
@@ -1201,8 +1201,8 @@ static int ice_set_priv_flags(struct net_device *netdev, u32 flags)
bitmap_xor(change_flags, pf->flags, orig_flags, ICE_PF_FLAGS_NBITS);
- if (test_bit(ICE_FLAG_ENABLE_FW_LLDP, change_flags)) {
- if (!test_bit(ICE_FLAG_ENABLE_FW_LLDP, pf->flags)) {
+ if (test_bit(ICE_FLAG_FW_LLDP_AGENT, change_flags)) {
+ if (!test_bit(ICE_FLAG_FW_LLDP_AGENT, pf->flags)) {
enum ice_status status;
/* Disable FW LLDP engine */
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
index c067ef6be7f4..343d0c305423 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_lib.c
@@ -2541,7 +2541,7 @@ ice_vsi_setup(struct ice_pf *pf, struct ice_port_info *pi,
ice_cfg_sw_lldp(vsi, true, true);
/* Rx LLDP packets */
- if (!test_bit(ICE_FLAG_ENABLE_FW_LLDP, pf->flags))
+ if (!test_bit(ICE_FLAG_FW_LLDP_AGENT, pf->flags))
ice_cfg_sw_lldp(vsi, false, true);
}
@@ -2888,7 +2888,7 @@ int ice_vsi_release(struct ice_vsi *vsi)
/* The Rx rule will only exist to remove if the LLDP FW
* engine is currently stopped
*/
- if (!test_bit(ICE_FLAG_ENABLE_FW_LLDP, pf->flags))
+ if (!test_bit(ICE_FLAG_FW_LLDP_AGENT, pf->flags))
ice_cfg_sw_lldp(vsi, false, false);
}
--
2.21.0
^ permalink raw reply related
* [net-next 06/14] ice: reject VF attempts to enable head writeback
From: Jeff Kirsher @ 2019-08-23 23:37 UTC (permalink / raw)
To: davem; +Cc: Jacob Keller, netdev, nhorman, sassmann, Andrew Bowers,
Jeff Kirsher
In-Reply-To: <20190823233750.7997-1-jeffrey.t.kirsher@intel.com>
From: Jacob Keller <jacob.e.keller@intel.com>
The virtchnl interface provides a mechanism for a VF driver to request
head writeback support. This feature is deprecated as of AVF 1.0, but
older versions of a VF driver may still attempt to request the mode.
Since the ice hardware does not support head writeback, we should not
accept Tx queue configuration which attempts to enable it.
Currently, the driver simply assumes that the headwb_enabled bit will
never be set.
If a VF driver does request head writeback, the configuration will
return successfully, even though head writeback is not enabled. This
leaves the VF driver in a non functional state since it is assuming to
be operating in head writeback mode.
Fix the PF driver to reject any attempt to setup headwb_enabled.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
index 1b1d1ea0c8f9..73ab6222d29b 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
@@ -2109,6 +2109,7 @@ static int ice_vc_cfg_qs_msg(struct ice_vf *vf, u8 *msg)
if (qpi->txq.vsi_id != qci->vsi_id ||
qpi->rxq.vsi_id != qci->vsi_id ||
qpi->rxq.queue_id != qpi->txq.queue_id ||
+ qpi->txq.headwb_enabled ||
!ice_vc_isvalid_q_id(vf, qci->vsi_id, qpi->txq.queue_id)) {
v_ret = VIRTCHNL_STATUS_ERR_PARAM;
goto error_param;
--
2.21.0
^ permalink raw reply related
* [net-next 08/14] ice: silence some bogus error messages
From: Jeff Kirsher @ 2019-08-23 23:37 UTC (permalink / raw)
To: davem
Cc: Mitch Williams, netdev, nhorman, sassmann, Tony Nguyen,
Andrew Bowers, Jeff Kirsher
In-Reply-To: <20190823233750.7997-1-jeffrey.t.kirsher@intel.com>
From: Mitch Williams <mitch.a.williams@intel.com>
In some circumstances, VF devices can be deactivated while a message is
in-flight. In that case, a series of scary error message will be
printed in the log. Since these are actually harmless, check for this
case and suppress them. No harm, no foul.
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/ice/ice_adminq_cmd.h | 1 +
drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c | 6 +++---
2 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
index 765e3c2ed045..bf9aa533a7c6 100644
--- a/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/ice/ice_adminq_cmd.h
@@ -1610,6 +1610,7 @@ enum ice_aq_err {
ICE_AQ_RC_EBUSY = 12, /* Device or resource busy */
ICE_AQ_RC_EEXIST = 13, /* Object already exists */
ICE_AQ_RC_ENOSPC = 16, /* No space left or allocation failure */
+ ICE_AQ_RC_ENOSYS = 17, /* Function not implemented */
};
/* Admin Queue command opcodes */
diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
index 73ab6222d29b..83e58e71081e 100644
--- a/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c
@@ -1512,10 +1512,10 @@ ice_vc_send_msg_to_vf(struct ice_vf *vf, u32 v_opcode,
aq_ret = ice_aq_send_msg_to_vf(&pf->hw, vf->vf_id, v_opcode, v_retval,
msg, msglen, NULL);
- if (aq_ret) {
+ if (aq_ret && pf->hw.mailboxq.sq_last_status != ICE_AQ_RC_ENOSYS) {
dev_info(&pf->pdev->dev,
- "Unable to send the message to VF %d aq_err %d\n",
- vf->vf_id, pf->hw.mailboxq.sq_last_status);
+ "Unable to send the message to VF %d ret %d aq_err %d\n",
+ vf->vf_id, aq_ret, pf->hw.mailboxq.sq_last_status);
return -EIO;
}
--
2.21.0
^ permalink raw reply related
* [net PATCH] net: route dump netlink NLM_F_MULTI flag missing
From: John Fastabend @ 2019-08-24 0:11 UTC (permalink / raw)
To: sbrivio, davem, dsahern; +Cc: netdev, john.fastabend
An excerpt from netlink(7) man page,
In multipart messages (multiple nlmsghdr headers with associated payload
in one byte stream) the first and all following headers have the
NLM_F_MULTI flag set, except for the last header which has the type
NLMSG_DONE.
but, after (ee28906) there is a missing NLM_F_MULTI flag in the middle of a
FIB dump. The result is user space applications following above man page
excerpt may get confused and may stop parsing msg believing something went
wrong.
In the golang netlink lib [0] the library logic stops parsing believing the
message is not a multipart message. Found this running Cilium[1] against
net-next while adding a feature to auto-detect routes. I noticed with
multiple route tables we no longer could detect the default routes on net
tree kernels because the library logic was not returning them.
Fix this by handling the fib_dump_info_fnhe() case the same way the
fib_dump_info() handles it by passing the flags argument through the
call chain and adding a flags argument to rt_fill_info().
Tested with Cilium stack and auto-detection of routes works again. Also
annotated libs to dump netlink msgs and inspected NLM_F_MULTI and
NLMSG_DONE flags look correct after this.
Note: In inet_rtm_getroute() pass rt_fill_info() '0' for flags the same
as is done for fib_dump_info() so this looks correct to me.
[0] https://github.com/vishvananda/netlink/
[1] https://github.com/cilium/
Fixes: ee28906fd7a14 ("ipv4: Dump route exceptions if requested")
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
include/net/route.h | 2 +-
net/ipv4/fib_trie.c | 2 +-
net/ipv4/route.c | 17 ++++++++++-------
3 files changed, 12 insertions(+), 9 deletions(-)
diff --git a/include/net/route.h b/include/net/route.h
index 630a0493f1f3..dfce19c9fa96 100644
--- a/include/net/route.h
+++ b/include/net/route.h
@@ -233,7 +233,7 @@ void rt_del_uncached_list(struct rtable *rt);
int fib_dump_info_fnhe(struct sk_buff *skb, struct netlink_callback *cb,
u32 table_id, struct fib_info *fi,
- int *fa_index, int fa_start);
+ int *fa_index, int fa_start, unsigned int flags);
static inline void ip_rt_put(struct rtable *rt)
{
diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 2b2b3d291ab0..1ab2fb6bb37d 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -2145,7 +2145,7 @@ static int fn_trie_dump_leaf(struct key_vector *l, struct fib_table *tb,
if (filter->dump_exceptions) {
err = fib_dump_info_fnhe(skb, cb, tb->tb_id, fi,
- &i_fa, s_fa);
+ &i_fa, s_fa, flags);
if (err < 0)
goto stop;
}
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 517300d587a7..b6a6f18c3dd1 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -2728,7 +2728,8 @@ EXPORT_SYMBOL_GPL(ip_route_output_flow);
/* called with rcu_read_lock held */
static int rt_fill_info(struct net *net, __be32 dst, __be32 src,
struct rtable *rt, u32 table_id, struct flowi4 *fl4,
- struct sk_buff *skb, u32 portid, u32 seq)
+ struct sk_buff *skb, u32 portid, u32 seq,
+ unsigned int flags)
{
struct rtmsg *r;
struct nlmsghdr *nlh;
@@ -2736,7 +2737,7 @@ static int rt_fill_info(struct net *net, __be32 dst, __be32 src,
u32 error;
u32 metrics[RTAX_MAX];
- nlh = nlmsg_put(skb, portid, seq, RTM_NEWROUTE, sizeof(*r), 0);
+ nlh = nlmsg_put(skb, portid, seq, RTM_NEWROUTE, sizeof(*r), flags);
if (!nlh)
return -EMSGSIZE;
@@ -2860,7 +2861,7 @@ static int rt_fill_info(struct net *net, __be32 dst, __be32 src,
static int fnhe_dump_bucket(struct net *net, struct sk_buff *skb,
struct netlink_callback *cb, u32 table_id,
struct fnhe_hash_bucket *bucket, int genid,
- int *fa_index, int fa_start)
+ int *fa_index, int fa_start, unsigned int flags)
{
int i;
@@ -2891,7 +2892,7 @@ static int fnhe_dump_bucket(struct net *net, struct sk_buff *skb,
err = rt_fill_info(net, fnhe->fnhe_daddr, 0, rt,
table_id, NULL, skb,
NETLINK_CB(cb->skb).portid,
- cb->nlh->nlmsg_seq);
+ cb->nlh->nlmsg_seq, flags);
if (err)
return err;
next:
@@ -2904,7 +2905,7 @@ static int fnhe_dump_bucket(struct net *net, struct sk_buff *skb,
int fib_dump_info_fnhe(struct sk_buff *skb, struct netlink_callback *cb,
u32 table_id, struct fib_info *fi,
- int *fa_index, int fa_start)
+ int *fa_index, int fa_start, unsigned int flags)
{
struct net *net = sock_net(cb->skb->sk);
int nhsel, genid = fnhe_genid(net);
@@ -2922,7 +2923,8 @@ int fib_dump_info_fnhe(struct sk_buff *skb, struct netlink_callback *cb,
err = 0;
if (bucket)
err = fnhe_dump_bucket(net, skb, cb, table_id, bucket,
- genid, fa_index, fa_start);
+ genid, fa_index, fa_start,
+ flags);
rcu_read_unlock();
if (err)
return err;
@@ -3183,7 +3185,8 @@ static int inet_rtm_getroute(struct sk_buff *in_skb, struct nlmsghdr *nlh,
fl4.flowi4_tos, res.fi, 0);
} else {
err = rt_fill_info(net, dst, src, rt, table_id, &fl4, skb,
- NETLINK_CB(in_skb).portid, nlh->nlmsg_seq);
+ NETLINK_CB(in_skb).portid,
+ nlh->nlmsg_seq, 0);
}
if (err < 0)
goto errout_rcu;
^ permalink raw reply related
* pull-request: bpf 2019-08-24
From: Daniel Borkmann @ 2019-08-24 0:11 UTC (permalink / raw)
To: davem, jakub.kicinski; +Cc: daniel, ast, andrii.nakryiko, netdev, bpf
Hi David,
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Fix verifier precision tracking with BPF-to-BPF calls, from Alexei.
2) Fix a use-after-free in prog symbol exposure, from Daniel.
3) Several s390x JIT fixes plus BE related fixes in BPF kselftests, from Ilya.
4) Fix memory leak by unpinning XDP umem pages in error path, from Ivan.
5) Fix a potential use-after-free on flow dissector detach, from Jakub.
6) Fix bpftool to close prog fd after showing metadata, from Quentin.
7) BPF kselftest config and TEST_PROGS_EXTENDED fixes, from Anders.
Please consider pulling these changes from:
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git
Thanks a lot!
----------------------------------------------------------------
The following changes since commit 125b7e0949d4e72b15c2b1a1590f8cece985a918:
net: tc35815: Explicitly check NET_IP_ALIGN is not zero in tc35815_rx (2019-08-11 21:41:48 -0700)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git
for you to fetch changes up to 2c238177bd7f4b14bdf7447cc1cd9bb791f147e6:
bpf: allow narrow loads of some sk_reuseport_md fields with offset > 0 (2019-08-24 01:25:41 +0200)
----------------------------------------------------------------
Alexei Starovoitov (1):
bpf: fix precision tracking in presence of bpf2bpf calls
Anders Roxell (2):
selftests/bpf: add config fragment BPF_JIT
selftests/bpf: install files test_xdp_vlan.sh
Daniel Borkmann (1):
bpf: fix use after free in prog symbol exposure
Ilya Leoshkevich (6):
s390/bpf: fix lcgr instruction encoding
s390/bpf: use 32-bit index for tail calls
selftests/bpf: fix "bind{4, 6} deny specific IP & port" on s390
selftests/bpf: fix test_cgroup_storage on s390
selftests/bpf: fix test_btf_dump with O=
bpf: allow narrow loads of some sk_reuseport_md fields with offset > 0
Ivan Khoronzhuk (1):
xdp: unpin xdp umem pages in error path
Jakub Sitnicki (1):
flow_dissector: Fix potential use-after-free on BPF_PROG_DETACH
Quentin Monnet (1):
tools: bpftool: close prog FD before exit on showing a single program
arch/s390/net/bpf_jit_comp.c | 12 +++++----
kernel/bpf/syscall.c | 30 ++++++++++++++---------
kernel/bpf/verifier.c | 9 ++++---
net/core/filter.c | 8 +++---
net/core/flow_dissector.c | 2 +-
net/xdp/xdp_umem.c | 4 ++-
tools/bpf/bpftool/prog.c | 4 ++-
tools/testing/selftests/bpf/Makefile | 6 ++++-
tools/testing/selftests/bpf/config | 1 +
tools/testing/selftests/bpf/test_btf_dump.c | 7 ++++++
tools/testing/selftests/bpf/test_cgroup_storage.c | 6 ++---
tools/testing/selftests/bpf/test_sock.c | 7 ++++--
12 files changed, 62 insertions(+), 34 deletions(-)
^ permalink raw reply
* Re: [PATCHv4 net 1/2] ipv4/icmp: fix rt dst dev null pointer dereference
From: Jonathan Lemon @ 2019-08-23 22:32 UTC (permalink / raw)
To: Hangbin Liu
Cc: netdev, Stefano Brivio, wenxu, Alexei Starovoitov,
David S . Miller, Eric Dumazet, Julian Anastasov
In-Reply-To: <20190822141949.29561-2-liuhangbin@gmail.com>
On 22 Aug 2019, at 7:19, Hangbin Liu wrote:
> In __icmp_send() there is a possibility that the rt->dst.dev is NULL,
> e,g, with tunnel collect_md mode, which will cause kernel crash.
> Here is what the code path looks like, for GRE:
>
> - ip6gre_tunnel_xmit
> - ip6gre_xmit_ipv4
> - __gre6_xmit
> - ip6_tnl_xmit
> - if skb->len - t->tun_hlen - eth_hlen > mtu; return -EMSGSIZE
> - icmp_send
> - net = dev_net(rt->dst.dev); <-- here
>
> The reason is __metadata_dst_init() init dst->dev to NULL by default.
> We could not fix it in __metadata_dst_init() as there is no dev supplied.
> On the other hand, the reason we need rt->dst.dev is to get the net.
> So we can just try get it from skb->dev when rt->dst.dev is NULL.
>
> v4: Julian Anastasov remind skb->dev also could be NULL. We'd better
> still use dst.dev and do a check to avoid crash.
>
> v3: No changes.
>
> v2: fix the issue in __icmp_send() instead of updating shared dst dev
> in {ip_md, ip6}_tunnel_xmit.
>
> Fixes: c8b34e680a09 ("ip_tunnel: Add tnl_update_pmtu in ip_md_tunnel_xmit")
> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
^ permalink raw reply
* Re: [PATCH net-next v2 03/10] net: sched: refactor block offloads counter usage
From: Jakub Kicinski @ 2019-08-24 0:26 UTC (permalink / raw)
To: Vlad Buslov; +Cc: netdev, jhs, xiyou.wangcong, jiri, davem, pablo
In-Reply-To: <20190823185056.12536-4-vladbu@mellanox.com>
On Fri, 23 Aug 2019 21:50:49 +0300, Vlad Buslov wrote:
> @@ -1201,14 +1199,11 @@ static int u32_reoffload_knode(struct tcf_proto *tp, struct tc_u_knode *n,
> cls_u32.knode.link_handle = ht->handle;
> }
>
> - err = cb(TC_SETUP_CLSU32, &cls_u32, cb_priv);
> - if (err) {
> - if (add && tc_skip_sw(n->flags))
> - return err;
> - return 0;
> - }
> -
> - tc_cls_offload_cnt_update(block, &n->in_hw_count, &n->flags, add);
> + err = tc_setup_cb_reoffload(block, tp, add, cb, TC_SETUP_CLSU32,
> + &cls_u32, cb_priv, &n->flags,
> + &n->in_hw_count);
> + if (err && add && tc_skip_sw(n->flags))
> + return err;
Could this be further simplified by adding something along the lines of:
if (!add || !tc_skip_sw(*flags))
err = 0;
to tc_setup_cb_reoffload() ?
>
> return 0;
> }
^ permalink raw reply
* Re: [PATCH v2] riscv: add support for SECCOMP and SECCOMP_FILTER
From: Paul Walmsley @ 2019-08-24 0:30 UTC (permalink / raw)
To: David Abdurachmanov, Tycho Andersen
Cc: Palmer Dabbelt, Albert Ou, Oleg Nesterov, Kees Cook,
Andy Lutomirski, Will Drewry, Shuah Khan, Alexei Starovoitov,
Daniel Borkmann, Martin KaFai Lau, Song Liu, Yonghong Song,
David Abdurachmanov, Thomas Gleixner, Allison Randal,
Alexios Zavras, Anup Patel, Vincent Chen, Alan Kao, linux-riscv,
linux-kernel, linux-kselftest, netdev, bpf, me
In-Reply-To: <20190822205533.4877-1-david.abdurachmanov@sifive.com>
On Thu, 22 Aug 2019, David Abdurachmanov wrote:
> There is one failing kernel selftest: global.user_notification_signal
Is this the only failing test? Or are the rest of the selftests skipped
when this test fails, and no further tests are run, as seems to be shown
here:
https://lore.kernel.org/linux-riscv/CADnnUqcmDMRe1f+3jG8SPR6jRrnBsY8VVD70VbKEm0NqYeoicA@mail.gmail.com/
For example, looking at the source, I'd naively expect to see the
user_notification_closed_listener test result -- which follows right
after the failing test in the selftest source. But there aren't any
results?
Also - could you follow up with the author of this failing test to see if
we can get some more clarity about what might be going wrong here? It
appears that the failing test was added in commit 6a21cc50f0c7f ("seccomp:
add a return code to trap to userspace") by Tycho Andersen
<tycho@tycho.ws>.
- Paul
^ permalink raw reply
* Re: pull-request: bpf 2019-08-24
From: David Miller @ 2019-08-24 0:34 UTC (permalink / raw)
To: daniel; +Cc: jakub.kicinski, ast, andrii.nakryiko, netdev, bpf
In-Reply-To: <20190824001157.16043-1-daniel@iogearbox.net>
From: Daniel Borkmann <daniel@iogearbox.net>
Date: Sat, 24 Aug 2019 02:11:57 +0200
> The following pull-request contains BPF updates for your *net* tree.
>
> The main changes are:
>
> 1) Fix verifier precision tracking with BPF-to-BPF calls, from Alexei.
>
> 2) Fix a use-after-free in prog symbol exposure, from Daniel.
>
> 3) Several s390x JIT fixes plus BE related fixes in BPF kselftests, from Ilya.
>
> 4) Fix memory leak by unpinning XDP umem pages in error path, from Ivan.
>
> 5) Fix a potential use-after-free on flow dissector detach, from Jakub.
>
> 6) Fix bpftool to close prog fd after showing metadata, from Quentin.
>
> 7) BPF kselftest config and TEST_PROGS_EXTENDED fixes, from Anders.
>
> Please consider pulling these changes from:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git
Pulled, thanks Daniel.
^ permalink raw reply
* [PATCHv2 1/1] net: rds: add service level support in rds-info
From: Zhu Yanjun @ 2019-08-24 1:04 UTC (permalink / raw)
To: santosh.shilimkar, davem, netdev, linux-rdma, rds-devel,
gerd.rausch
From IB specific 7.6.5 SERVICE LEVEL, Service Level (SL)
is used to identify different flows within an IBA subnet.
It is carried in the local route header of the packet.
Before this commit, run "rds-info -I". The outputs are as
below:
"
RDS IB Connections:
LocalAddr RemoteAddr Tos SL LocalDev RemoteDev
192.2.95.3 192.2.95.1 2 0 fe80::21:28:1a:39 fe80::21:28:10:b9
192.2.95.3 192.2.95.1 1 0 fe80::21:28:1a:39 fe80::21:28:10:b9
192.2.95.3 192.2.95.1 0 0 fe80::21:28:1a:39 fe80::21:28:10:b9
"
After this commit, the output is as below:
"
RDS IB Connections:
LocalAddr RemoteAddr Tos SL LocalDev RemoteDev
192.2.95.3 192.2.95.1 2 2 fe80::21:28:1a:39 fe80::21:28:10:b9
192.2.95.3 192.2.95.1 1 1 fe80::21:28:1a:39 fe80::21:28:10:b9
192.2.95.3 192.2.95.1 0 0 fe80::21:28:1a:39 fe80::21:28:10:b9
"
The commit fe3475af3bdf ("net: rds: add per rds connection cache
statistics") adds cache_allocs in struct rds_info_rdma_connection
as below:
struct rds_info_rdma_connection {
...
__u32 rdma_mr_max;
__u32 rdma_mr_size;
__u8 tos;
__u32 cache_allocs;
};
The peer struct in rds-tools of struct rds_info_rdma_connection is as
below:
struct rds_info_rdma_connection {
...
uint32_t rdma_mr_max;
uint32_t rdma_mr_size;
uint8_t tos;
uint8_t sl;
uint32_t cache_allocs;
};
The difference between userspace and kernel is the member variable sl.
In the kernel struct, the member variable sl is missing. This will
introduce risks. So it is necessary to use this commit to avoid this risk.
Fixes: fe3475af3bdf ("net: rds: add per rds connection cache statistics")
CC: Joe Jin <joe.jin@oracle.com>
CC: JUNXIAO_BI <junxiao.bi@oracle.com>
Suggested-by: Gerd Rausch <gerd.rausch@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
---
V1->V2: fix typos in commit logs.
---
include/uapi/linux/rds.h | 2 ++
net/rds/ib.c | 16 ++++++++++------
net/rds/ib.h | 1 +
net/rds/ib_cm.c | 3 +++
net/rds/rdma_transport.c | 10 ++++++++--
5 files changed, 24 insertions(+), 8 deletions(-)
diff --git a/include/uapi/linux/rds.h b/include/uapi/linux/rds.h
index fd6b5f6..cba368e 100644
--- a/include/uapi/linux/rds.h
+++ b/include/uapi/linux/rds.h
@@ -250,6 +250,7 @@ struct rds_info_rdma_connection {
__u32 rdma_mr_max;
__u32 rdma_mr_size;
__u8 tos;
+ __u8 sl;
__u32 cache_allocs;
};
@@ -265,6 +266,7 @@ struct rds6_info_rdma_connection {
__u32 rdma_mr_max;
__u32 rdma_mr_size;
__u8 tos;
+ __u8 sl;
__u32 cache_allocs;
};
diff --git a/net/rds/ib.c b/net/rds/ib.c
index ec05d91..45acab2 100644
--- a/net/rds/ib.c
+++ b/net/rds/ib.c
@@ -291,7 +291,7 @@ static int rds_ib_conn_info_visitor(struct rds_connection *conn,
void *buffer)
{
struct rds_info_rdma_connection *iinfo = buffer;
- struct rds_ib_connection *ic;
+ struct rds_ib_connection *ic = conn->c_transport_data;
/* We will only ever look at IB transports */
if (conn->c_trans != &rds_ib_transport)
@@ -301,15 +301,16 @@ static int rds_ib_conn_info_visitor(struct rds_connection *conn,
iinfo->src_addr = conn->c_laddr.s6_addr32[3];
iinfo->dst_addr = conn->c_faddr.s6_addr32[3];
- iinfo->tos = conn->c_tos;
+ if (ic) {
+ iinfo->tos = conn->c_tos;
+ iinfo->sl = ic->i_sl;
+ }
memset(&iinfo->src_gid, 0, sizeof(iinfo->src_gid));
memset(&iinfo->dst_gid, 0, sizeof(iinfo->dst_gid));
if (rds_conn_state(conn) == RDS_CONN_UP) {
struct rds_ib_device *rds_ibdev;
- ic = conn->c_transport_data;
-
rdma_read_gids(ic->i_cm_id, (union ib_gid *)&iinfo->src_gid,
(union ib_gid *)&iinfo->dst_gid);
@@ -329,7 +330,7 @@ static int rds6_ib_conn_info_visitor(struct rds_connection *conn,
void *buffer)
{
struct rds6_info_rdma_connection *iinfo6 = buffer;
- struct rds_ib_connection *ic;
+ struct rds_ib_connection *ic = conn->c_transport_data;
/* We will only ever look at IB transports */
if (conn->c_trans != &rds_ib_transport)
@@ -337,6 +338,10 @@ static int rds6_ib_conn_info_visitor(struct rds_connection *conn,
iinfo6->src_addr = conn->c_laddr;
iinfo6->dst_addr = conn->c_faddr;
+ if (ic) {
+ iinfo6->tos = conn->c_tos;
+ iinfo6->sl = ic->i_sl;
+ }
memset(&iinfo6->src_gid, 0, sizeof(iinfo6->src_gid));
memset(&iinfo6->dst_gid, 0, sizeof(iinfo6->dst_gid));
@@ -344,7 +349,6 @@ static int rds6_ib_conn_info_visitor(struct rds_connection *conn,
if (rds_conn_state(conn) == RDS_CONN_UP) {
struct rds_ib_device *rds_ibdev;
- ic = conn->c_transport_data;
rdma_read_gids(ic->i_cm_id, (union ib_gid *)&iinfo6->src_gid,
(union ib_gid *)&iinfo6->dst_gid);
rds_ibdev = ic->rds_ibdev;
diff --git a/net/rds/ib.h b/net/rds/ib.h
index 303c6ee..f2b558e 100644
--- a/net/rds/ib.h
+++ b/net/rds/ib.h
@@ -220,6 +220,7 @@ struct rds_ib_connection {
/* Send/Recv vectors */
int i_scq_vector;
int i_rcq_vector;
+ u8 i_sl;
};
/* This assumes that atomic_t is at least 32 bits */
diff --git a/net/rds/ib_cm.c b/net/rds/ib_cm.c
index fddaa09..233f136 100644
--- a/net/rds/ib_cm.c
+++ b/net/rds/ib_cm.c
@@ -152,6 +152,9 @@ void rds_ib_cm_connect_complete(struct rds_connection *conn, struct rdma_cm_even
RDS_PROTOCOL_MINOR(conn->c_version),
ic->i_flowctl ? ", flow control" : "");
+ /* receive sl from the peer */
+ ic->i_sl = ic->i_cm_id->route.path_rec->sl;
+
atomic_set(&ic->i_cq_quiesce, 0);
/* Init rings and fill recv. this needs to wait until protocol
diff --git a/net/rds/rdma_transport.c b/net/rds/rdma_transport.c
index ff74c4b..28668ad 100644
--- a/net/rds/rdma_transport.c
+++ b/net/rds/rdma_transport.c
@@ -43,6 +43,9 @@
static struct rdma_cm_id *rds6_rdma_listen_id;
#endif
+/* Per IB specification 7.7.3, service level is a 4-bit field. */
+#define TOS_TO_SL(tos) ((tos) & 0xF)
+
static int rds_rdma_cm_event_handler_cmn(struct rdma_cm_id *cm_id,
struct rdma_cm_event *event,
bool isv6)
@@ -97,10 +100,13 @@ static int rds_rdma_cm_event_handler_cmn(struct rdma_cm_id *cm_id,
struct rds_ib_connection *ibic;
ibic = conn->c_transport_data;
- if (ibic && ibic->i_cm_id == cm_id)
+ if (ibic && ibic->i_cm_id == cm_id) {
+ cm_id->route.path_rec[0].sl =
+ TOS_TO_SL(conn->c_tos);
ret = trans->cm_initiate_connect(cm_id, isv6);
- else
+ } else {
rds_conn_drop(conn);
+ }
}
break;
--
1.7.1
^ permalink raw reply related
* Re: [PATCH v2] riscv: add support for SECCOMP and SECCOMP_FILTER
From: David Abdurachmanov @ 2019-08-24 1:04 UTC (permalink / raw)
To: Paul Walmsley
Cc: Tycho Andersen, Palmer Dabbelt, Albert Ou, Oleg Nesterov,
Kees Cook, Andy Lutomirski, Will Drewry, Shuah Khan,
Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
Yonghong Song, David Abdurachmanov, Thomas Gleixner,
Allison Randal, Alexios Zavras, Anup Patel, Vincent Chen,
Alan Kao, linux-riscv, linux-kernel, linux-kselftest, netdev, bpf,
me
In-Reply-To: <alpine.DEB.2.21.9999.1908231717550.25649@viisi.sifive.com>
On Fri, Aug 23, 2019 at 5:30 PM Paul Walmsley <paul.walmsley@sifive.com> wrote:
>
> On Thu, 22 Aug 2019, David Abdurachmanov wrote:
>
> > There is one failing kernel selftest: global.user_notification_signal
>
> Is this the only failing test? Or are the rest of the selftests skipped
> when this test fails, and no further tests are run, as seems to be shown
> here:
>
> https://lore.kernel.org/linux-riscv/CADnnUqcmDMRe1f+3jG8SPR6jRrnBsY8VVD70VbKEm0NqYeoicA@mail.gmail.com/
Yes, it's a single test failing. After removing global.user_notification_signal
test everything else pass and you get the results printed.
>
> For example, looking at the source, I'd naively expect to see the
> user_notification_closed_listener test result -- which follows right
> after the failing test in the selftest source. But there aren't any
> results?
Yes, it hangs at this point. You have to manually terminate it.
>
> Also - could you follow up with the author of this failing test to see if
> we can get some more clarity about what might be going wrong here? It
> appears that the failing test was added in commit 6a21cc50f0c7f ("seccomp:
> add a return code to trap to userspace") by Tycho Andersen
> <tycho@tycho.ws>.
Well the code states ".. and hope that it doesn't break when there
is actually a signal :)". Maybe we are just unlucky. I don't have results
from other architectures to compare.
I found that Linaro is running selftests, but SECCOMP is disabled
and thus it's failing. Is there another CI which tracks selftests?
https://qa-reports.linaro.org/lkft/linux-next-oe/tests/kselftest/seccomp_seccomp_bpf?top=next-20190823
>
>
> - Paul
^ permalink raw reply
* Re: [net PATCH] net: route dump netlink NLM_F_MULTI flag missing
From: Stefano Brivio @ 2019-08-24 1:06 UTC (permalink / raw)
To: John Fastabend; +Cc: davem, dsahern, netdev
In-Reply-To: <156660549861.5753.7912871726096518275.stgit@john-XPS-13-9370>
On Fri, 23 Aug 2019 17:11:38 -0700
John Fastabend <john.fastabend@gmail.com> wrote:
> An excerpt from netlink(7) man page,
>
> In multipart messages (multiple nlmsghdr headers with associated payload
> in one byte stream) the first and all following headers have the
> NLM_F_MULTI flag set, except for the last header which has the type
> NLMSG_DONE.
>
> but, after (ee28906) there is a missing NLM_F_MULTI flag in the middle of a
> FIB dump.
In your case (see below), it can be zero or more, depending on how many
exception routes you have.
> The result is user space applications following above man page
> excerpt may get confused and may stop parsing msg believing something went
> wrong.
Worse yet, also RFC 3459 says:
[...] For multipart
messages, the first and all following headers have the NLM_F_MULTI
Netlink header flag set, except for the last header which has the
Netlink header type NLMSG_DONE.
But iproute2 doesn't check for this, so the selftests I added didn't
notice. Thanks for fixing this!
> In the golang netlink lib [0] the library logic stops parsing believing the
> message is not a multipart message. Found this running Cilium[1] against
> net-next while adding a feature to auto-detect routes. I noticed with
> multiple route tables we no longer could detect the default routes on net
> tree kernels because the library logic was not returning them.
However, note that if strict netlink checking is requested (I think the
library should be updated), and RTM_F_CLONED is not set (which should
be the case if you are just looking for "regular" routes), you won't
hit this.
> Fix this by handling the fib_dump_info_fnhe() case the same way the
> fib_dump_info() handles it by passing the flags argument through the
> call chain and adding a flags argument to rt_fill_info().
>
> Tested with Cilium stack and auto-detection of routes works again. Also
> annotated libs to dump netlink msgs and inspected NLM_F_MULTI and
> NLMSG_DONE flags look correct after this.
>
> Note: In inet_rtm_getroute() pass rt_fill_info() '0' for flags the same
> as is done for fib_dump_info() so this looks correct to me.
Yes, that's correct, because if the buffer is too small for a single
route dumped by a single rt_fill_info() call, we'll just fail, so that
will never be a multipart message.
> [0] https://github.com/vishvananda/netlink/
> [1] https://github.com/cilium/
>
> Fixes: ee28906fd7a14 ("ipv4: Dump route exceptions if requested")
> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
--
Stefano
^ permalink raw reply
* Re: [PATCH v2] riscv: add support for SECCOMP and SECCOMP_FILTER
From: David Abdurachmanov @ 2019-08-24 1:10 UTC (permalink / raw)
To: Paul Walmsley
Cc: Tycho Andersen, Palmer Dabbelt, Albert Ou, Oleg Nesterov,
Kees Cook, Andy Lutomirski, Will Drewry, Shuah Khan,
Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
Yonghong Song, David Abdurachmanov, Thomas Gleixner,
Allison Randal, Alexios Zavras, Anup Patel, Vincent Chen,
Alan Kao, linux-riscv, linux-kernel, linux-kselftest, netdev, bpf,
me
In-Reply-To: <CAEn-LTp=ss0Dfv6J00=rCAy+N78U2AmhqJNjfqjr2FDpPYjxEQ@mail.gmail.com>
On Fri, Aug 23, 2019 at 6:04 PM David Abdurachmanov
<david.abdurachmanov@gmail.com> wrote:
>
> On Fri, Aug 23, 2019 at 5:30 PM Paul Walmsley <paul.walmsley@sifive.com> wrote:
> >
> > On Thu, 22 Aug 2019, David Abdurachmanov wrote:
> >
> > > There is one failing kernel selftest: global.user_notification_signal
> >
> > Is this the only failing test? Or are the rest of the selftests skipped
> > when this test fails, and no further tests are run, as seems to be shown
> > here:
> >
> > https://lore.kernel.org/linux-riscv/CADnnUqcmDMRe1f+3jG8SPR6jRrnBsY8VVD70VbKEm0NqYeoicA@mail.gmail.com/
>
> Yes, it's a single test failing. After removing global.user_notification_signal
> test everything else pass and you get the results printed.
>
> >
> > For example, looking at the source, I'd naively expect to see the
> > user_notification_closed_listener test result -- which follows right
> > after the failing test in the selftest source. But there aren't any
> > results?
>
> Yes, it hangs at this point. You have to manually terminate it.
>
> >
> > Also - could you follow up with the author of this failing test to see if
> > we can get some more clarity about what might be going wrong here? It
> > appears that the failing test was added in commit 6a21cc50f0c7f ("seccomp:
> > add a return code to trap to userspace") by Tycho Andersen
> > <tycho@tycho.ws>.
>
> Well the code states ".. and hope that it doesn't break when there
> is actually a signal :)". Maybe we are just unlucky. I don't have results
> from other architectures to compare.
>
> I found that Linaro is running selftests, but SECCOMP is disabled
> and thus it's failing. Is there another CI which tracks selftests?
>
> https://qa-reports.linaro.org/lkft/linux-next-oe/tests/kselftest/seccomp_seccomp_bpf?top=next-20190823
Actually it seems that seccomp is enabled in kernel, but not in
systemd, and somehow seccomp_bpf is missing on all arches thus
causing automatic failure.
> >
> >
> > - Paul
^ permalink raw reply
* Re: [PATCH v2] riscv: add support for SECCOMP and SECCOMP_FILTER
From: Paul Walmsley @ 2019-08-24 1:18 UTC (permalink / raw)
To: David Abdurachmanov
Cc: Tycho Andersen, Palmer Dabbelt, Albert Ou, Oleg Nesterov,
Kees Cook, Andy Lutomirski, Will Drewry, Shuah Khan,
Alexei Starovoitov, Daniel Borkmann, Martin KaFai Lau, Song Liu,
Yonghong Song, David Abdurachmanov, Thomas Gleixner,
Allison Randal, Alexios Zavras, Anup Patel, Vincent Chen,
Alan Kao, linux-riscv, linux-kernel, linux-kselftest, netdev, bpf,
me
In-Reply-To: <CAEn-LTp=ss0Dfv6J00=rCAy+N78U2AmhqJNjfqjr2FDpPYjxEQ@mail.gmail.com>
On Fri, 23 Aug 2019, David Abdurachmanov wrote:
> On Fri, Aug 23, 2019 at 5:30 PM Paul Walmsley <paul.walmsley@sifive.com> wrote:
> >
> > On Thu, 22 Aug 2019, David Abdurachmanov wrote:
> >
> > > There is one failing kernel selftest: global.user_notification_signal
> >
> > Is this the only failing test? Or are the rest of the selftests skipped
> > when this test fails, and no further tests are run, as seems to be shown
> > here:
> >
> > https://lore.kernel.org/linux-riscv/CADnnUqcmDMRe1f+3jG8SPR6jRrnBsY8VVD70VbKEm0NqYeoicA@mail.gmail.com/
>
> Yes, it's a single test failing. After removing global.user_notification_signal
> test everything else pass and you get the results printed.
OK.
> Well the code states ".. and hope that it doesn't break when there
> is actually a signal :)". Maybe we are just unlucky. I don't have results
> from other architectures to compare.
>
> I found that Linaro is running selftests, but SECCOMP is disabled
> and thus it's failing. Is there another CI which tracks selftests?
0day runs the kselftests, and at least on some architectures/Kconfigs,
it's succeeding:
https://lore.kernel.org/lkml/20190726083740.GG22106@shao2-debian/
https://lore.kernel.org/lkml/20190712064850.GC20848@shao2-debian/
https://lore.kernel.org/lkml/20190311074115.GC10839@shao2-debian/
etc.
- Paul
^ permalink raw reply
* Re: [PATCHv2 1/1] net: rds: add service level support in rds-info
From: santosh.shilimkar @ 2019-08-24 1:25 UTC (permalink / raw)
To: Zhu Yanjun, davem, netdev, linux-rdma, rds-devel, gerd.rausch
In-Reply-To: <1566608656-30836-1-git-send-email-yanjun.zhu@oracle.com>
On 8/23/19 6:04 PM, Zhu Yanjun wrote:
> From IB specific 7.6.5 SERVICE LEVEL, Service Level (SL)
> is used to identify different flows within an IBA subnet.
> It is carried in the local route header of the packet.
>
> Before this commit, run "rds-info -I". The outputs are as
> below:
> "
> RDS IB Connections:
> LocalAddr RemoteAddr Tos SL LocalDev RemoteDev
> 192.2.95.3 192.2.95.1 2 0 fe80::21:28:1a:39 fe80::21:28:10:b9
> 192.2.95.3 192.2.95.1 1 0 fe80::21:28:1a:39 fe80::21:28:10:b9
> 192.2.95.3 192.2.95.1 0 0 fe80::21:28:1a:39 fe80::21:28:10:b9
> "
> After this commit, the output is as below:
> "
> RDS IB Connections:
> LocalAddr RemoteAddr Tos SL LocalDev RemoteDev
> 192.2.95.3 192.2.95.1 2 2 fe80::21:28:1a:39 fe80::21:28:10:b9
> 192.2.95.3 192.2.95.1 1 1 fe80::21:28:1a:39 fe80::21:28:10:b9
> 192.2.95.3 192.2.95.1 0 0 fe80::21:28:1a:39 fe80::21:28:10:b9
> "
>
> The commit fe3475af3bdf ("net: rds: add per rds connection cache
> statistics") adds cache_allocs in struct rds_info_rdma_connection
> as below:
> struct rds_info_rdma_connection {
> ...
> __u32 rdma_mr_max;
> __u32 rdma_mr_size;
> __u8 tos;
> __u32 cache_allocs;
> };
> The peer struct in rds-tools of struct rds_info_rdma_connection is as
> below:
> struct rds_info_rdma_connection {
> ...
> uint32_t rdma_mr_max;
> uint32_t rdma_mr_size;
> uint8_t tos;
> uint8_t sl;
> uint32_t cache_allocs;
> };
> The difference between userspace and kernel is the member variable sl.
> In the kernel struct, the member variable sl is missing. This will
> introduce risks. So it is necessary to use this commit to avoid this risk.
>
> Fixes: fe3475af3bdf ("net: rds: add per rds connection cache statistics")
> CC: Joe Jin <joe.jin@oracle.com>
> CC: JUNXIAO_BI <junxiao.bi@oracle.com>
> Suggested-by: Gerd Rausch <gerd.rausch@oracle.com>
> Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
> ---
> V1->V2: fix typos in commit logs.
> ---
I did ask you when ypu posted the patch about whether you did
backward compatibility tests for which you said, you did all the
tests and said "So do not worry about backward compatibility. This
commit will work well with older rds-tools2.0.5 and 2.0.6."
https://www.spinics.net/lists/netdev/msg574691.html
I was worried about exactly such issue as described in commit.
Anyways thanks for the fixup patch. Should be applied to stable
as well.
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Regards,
Santosh
^ permalink raw reply
* Re: [net-next 07/14] ice: Rename ethtool private flag for lldp
From: Jakub Kicinski @ 2019-08-24 1:31 UTC (permalink / raw)
To: Jeff Kirsher; +Cc: davem, Dave Ertman, netdev, nhorman, sassmann, Andrew Bowers
In-Reply-To: <20190823233750.7997-8-jeffrey.t.kirsher@intel.com>
On Fri, 23 Aug 2019 16:37:43 -0700, Jeff Kirsher wrote:
> From: Dave Ertman <david.m.ertman@intel.com>
>
> The current flag name of "enable-fw-lldp" is a bit cumbersome.
>
> Change priv-flag name to "fw-lldp-agent" with a value of on or
> off. This is more straight-forward in meaning.
>
> Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
> Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Just flagging this for Dave, it was introduced in v5.2 by:
commit 3a257a1404f8bf751a258ab92262dcb2cce39eef
Author: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Date: Thu Feb 28 15:24:31 2019 -0800
ice: Add code to control FW LLDP and DCBX
This patch adds code to start or stop LLDP and DCBX in firmware through
use of ethtool private flags.
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
And changed once already in v5.3-rc by:
commit 31eafa403b9945997cf5b321ae3560f072b74efe
Author: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Date: Tue Apr 16 10:24:25 2019 -0700
ice: Implement LLDP persistence
Implement LLDP persistence across reboots, start and stop of LLDP agent.
Add additional parameter to ice_aq_start_lldp and ice_aq_stop_lldp.
Also change the ethtool private flag from "disable-fw-lldp" to
"enable-fw-lldp". This change will flip the boolean logic of the
functionality of the flag (on = enable, off = disable). The change
in name and functionality is to differentiate between the
pre-persistence and post-persistence states.
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Perhaps the rename should target net? IDK how much driver flag renaming
is okay otherwise, I guess this will only affect Intel users.
^ permalink raw reply
* Re: [PATCHv2 1/1] net: rds: add service level support in rds-info
From: Zhu Yanjun @ 2019-08-24 1:36 UTC (permalink / raw)
To: santosh.shilimkar, davem, netdev, linux-rdma, rds-devel,
gerd.rausch
In-Reply-To: <6e5bc371-d613-e8f7-7b57-0b1bc2e10e9d@oracle.com>
On 2019/8/24 9:25, santosh.shilimkar@oracle.com wrote:
> On 8/23/19 6:04 PM, Zhu Yanjun wrote:
>> From IB specific 7.6.5 SERVICE LEVEL, Service Level (SL)
>> is used to identify different flows within an IBA subnet.
>> It is carried in the local route header of the packet.
>>
>> Before this commit, run "rds-info -I". The outputs are as
>> below:
>> "
>> RDS IB Connections:
>> LocalAddr RemoteAddr Tos SL LocalDev RemoteDev
>> 192.2.95.3 192.2.95.1 2 0 fe80::21:28:1a:39 fe80::21:28:10:b9
>> 192.2.95.3 192.2.95.1 1 0 fe80::21:28:1a:39 fe80::21:28:10:b9
>> 192.2.95.3 192.2.95.1 0 0 fe80::21:28:1a:39 fe80::21:28:10:b9
>> "
>> After this commit, the output is as below:
>> "
>> RDS IB Connections:
>> LocalAddr RemoteAddr Tos SL LocalDev RemoteDev
>> 192.2.95.3 192.2.95.1 2 2 fe80::21:28:1a:39 fe80::21:28:10:b9
>> 192.2.95.3 192.2.95.1 1 1 fe80::21:28:1a:39 fe80::21:28:10:b9
>> 192.2.95.3 192.2.95.1 0 0 fe80::21:28:1a:39 fe80::21:28:10:b9
>> "
>>
>> The commit fe3475af3bdf ("net: rds: add per rds connection cache
>> statistics") adds cache_allocs in struct rds_info_rdma_connection
>> as below:
>> struct rds_info_rdma_connection {
>> ...
>> __u32 rdma_mr_max;
>> __u32 rdma_mr_size;
>> __u8 tos;
>> __u32 cache_allocs;
>> };
>> The peer struct in rds-tools of struct rds_info_rdma_connection is as
>> below:
>> struct rds_info_rdma_connection {
>> ...
>> uint32_t rdma_mr_max;
>> uint32_t rdma_mr_size;
>> uint8_t tos;
>> uint8_t sl;
>> uint32_t cache_allocs;
>> };
>> The difference between userspace and kernel is the member variable sl.
>> In the kernel struct, the member variable sl is missing. This will
>> introduce risks. So it is necessary to use this commit to avoid this
>> risk.
>>
>> Fixes: fe3475af3bdf ("net: rds: add per rds connection cache
>> statistics")
>> CC: Joe Jin <joe.jin@oracle.com>
>> CC: JUNXIAO_BI <junxiao.bi@oracle.com>
>> Suggested-by: Gerd Rausch <gerd.rausch@oracle.com>
>> Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
>> ---
>> V1->V2: fix typos in commit logs.
>> ---
> I did ask you when ypu posted the patch about whether you did
> backward compatibility tests for which you said, you did all the
> tests and said "So do not worry about backward compatibility. This
> commit will work well with older rds-tools2.0.5 and 2.0.6."
>
> https://www.spinics.net/lists/netdev/msg574691.html
>
> I was worried about exactly such issue as described in commit.
Sorry. My bad. I will make more work to let rds robust.
Thanks a lot for your Ack.
Zhu Yanjun
>
> Anyways thanks for the fixup patch. Should be applied to stable
> as well.
>
> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
>
> Regards,
> Santosh
>
>
^ permalink raw reply
* [PATCH bpf] nfp: bpf: fix latency bug when updating stack index register
From: Jakub Kicinski @ 2019-08-24 2:00 UTC (permalink / raw)
To: alexei.starovoitov, daniel
Cc: bpf, netdev, oss-drivers, Jiong Wang, Jakub Kicinski
From: Jiong Wang <jiong.wang@netronome.com>
NFP is using Local Memory to model stack. LM_addr could be used as base of
a 16 32-bit word region of Local Memory. Then, if the stack offset is
beyond the current region, the local index needs to be updated. The update
needs at least three cycles to take effect, therefore the sequence normally
looks like:
local_csr_wr[ActLMAddr3, gprB_5]
nop
nop
nop
If the local index switch happens on a narrow loads, then the instruction
preparing value to zero high 32-bit of the destination register could be
counted as one cycle, the sequence then could be something like:
local_csr_wr[ActLMAddr3, gprB_5]
nop
nop
immed[gprB_5, 0]
However, we have zero extension optimization that zeroing high 32-bit could
be eliminated, therefore above IMMED insn won't be available for which case
the first sequence needs to be generated.
Fixes: 0b4de1ff19bf ("nfp: bpf: eliminate zero extension code-gen")
Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
drivers/net/ethernet/netronome/nfp/bpf/jit.c | 17 +++++++++++++----
1 file changed, 13 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/netronome/nfp/bpf/jit.c b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
index 4054b70d7719..5afcb3c4c2ef 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/jit.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
@@ -1163,7 +1163,7 @@ mem_op_stack(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
bool clr_gpr, lmem_step step)
{
s32 off = nfp_prog->stack_frame_depth + meta->insn.off + ptr_off;
- bool first = true, last;
+ bool first = true, narrow_ld, last;
bool needs_inc = false;
swreg stack_off_reg;
u8 prev_gpr = 255;
@@ -1209,13 +1209,22 @@ mem_op_stack(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
needs_inc = true;
}
+
+ narrow_ld = clr_gpr && size < 8;
+
if (lm3) {
+ unsigned int nop_cnt;
+
emit_csr_wr(nfp_prog, imm_b(nfp_prog), NFP_CSR_ACT_LM_ADDR3);
- /* For size < 4 one slot will be filled by zeroing of upper. */
- wrp_nops(nfp_prog, clr_gpr && size < 8 ? 2 : 3);
+ /* For size < 4 one slot will be filled by zeroing of upper,
+ * but be careful, that zeroing could be eliminated by zext
+ * optimization.
+ */
+ nop_cnt = narrow_ld && meta->flags & FLAG_INSN_DO_ZEXT ? 2 : 3;
+ wrp_nops(nfp_prog, nop_cnt);
}
- if (clr_gpr && size < 8)
+ if (narrow_ld)
wrp_zext(nfp_prog, meta, gpr);
while (size) {
--
2.21.0
^ permalink raw reply related
* [PATCH RFC net-next 0/3] Multi-CPU DSA support
From: Marek Behún @ 2019-08-24 2:42 UTC (permalink / raw)
To: netdev
Cc: Andrew Lunn, Vivien Didelot, Florian Fainelli, David Ahern,
Stephen Hemminger, Marek Behún
Hi,
this is my attempt to solve the multi-CPU port issue for DSA.
Patch 1 adds code for handling multiple CPU ports in a DSA switch tree.
If more than one CPU port is found in a tree, the code assigns CPU ports
to user/DSA ports in a round robin way. So for the simplest case where
we have one switch with N ports, 2 of them of type CPU connected to eth0
and eth1, and the other ports labels being lan1, lan2, ..., the code
assigns them to CPU ports this way:
lan1 <-> eth0
lan2 <-> eth1
lan3 <-> eth0
lan4 <-> eth1
lan5 <-> eth0
...
Patch 2 adds a new operation to the net device operations structure.
Currently we use the iflink property of a net device to report to which
CPU port a given switch port si connected to. The ip link utility from
iproute2 reports this as "lan1@eth0". We add a new net device operation,
ndo_set_iflink, which can be used to set this property. We call this
function from the netlink handlers.
Patch 3 implements this new ndo_set_iflink operation for DSA slave
device. Thus the userspace can request a change of CPU port of a given
port.
I am also sending patch for iproute2-next, to add support for setting
this iflink value.
Marek
Marek Behún (3):
net: dsa: allow for multiple CPU ports
net: add ndo for setting the iflink property
net: dsa: implement ndo_set_netlink for chaning port's CPU port
include/linux/netdevice.h | 5 +++
include/net/dsa.h | 11 ++++-
net/core/dev.c | 15 +++++++
net/core/rtnetlink.c | 7 ++++
net/dsa/dsa2.c | 84 +++++++++++++++++++++++++--------------
net/dsa/slave.c | 35 ++++++++++++++++
6 files changed, 126 insertions(+), 31 deletions(-)
--
2.21.0
^ permalink raw reply
* [PATCH RFC net-next 2/3] net: add ndo for setting the iflink property
From: Marek Behún @ 2019-08-24 2:42 UTC (permalink / raw)
To: netdev
Cc: Andrew Lunn, Vivien Didelot, Florian Fainelli, David Ahern,
Stephen Hemminger, Marek Behún
In-Reply-To: <20190824024251.4542-1-marek.behun@nic.cz>
In DSA the iflink value is used to report to which CPU port a given
switch port is connected to. Since we want to support multi-CPU DSA, we
want the user to be able to change this value.
Add ndo_set_iflink method into the ndo strucutre to be a pair to
ndo_get_iflink. Also create dev_set_iflink and call this from the
netlink code, so that userspace can change the iflink value.
Signed-off-by: Marek Behún <marek.behun@nic.cz>
---
include/linux/netdevice.h | 5 +++++
net/core/dev.c | 15 +++++++++++++++
net/core/rtnetlink.c | 7 +++++++
3 files changed, 27 insertions(+)
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 55ac223553f8..45eeb6da8583 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1201,6 +1201,8 @@ struct tlsdev_ops;
* TX queue.
* int (*ndo_get_iflink)(const struct net_device *dev);
* Called to get the iflink value of this device.
+ * int (*ndo_set_iflink)(struct net_device *dev, int iflink);
+ * Called to set the iflink value of this device.
* void (*ndo_change_proto_down)(struct net_device *dev,
* bool proto_down);
* This function is used to pass protocol port error state information
@@ -1415,6 +1417,8 @@ struct net_device_ops {
int queue_index,
u32 maxrate);
int (*ndo_get_iflink)(const struct net_device *dev);
+ int (*ndo_set_iflink)(struct net_device *dev,
+ int iflink);
int (*ndo_change_proto_down)(struct net_device *dev,
bool proto_down);
int (*ndo_fill_metadata_dst)(struct net_device *dev,
@@ -2606,6 +2610,7 @@ void dev_add_offload(struct packet_offload *po);
void dev_remove_offload(struct packet_offload *po);
int dev_get_iflink(const struct net_device *dev);
+int dev_set_iflink(struct net_device *dev, int iflink);
int dev_fill_metadata_dst(struct net_device *dev, struct sk_buff *skb);
struct net_device *__dev_get_by_flags(struct net *net, unsigned short flags,
unsigned short mask);
diff --git a/net/core/dev.c b/net/core/dev.c
index 49589ed2018d..966bab196694 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -693,6 +693,21 @@ int dev_get_iflink(const struct net_device *dev)
}
EXPORT_SYMBOL(dev_get_iflink);
+/**
+ * dev_set_iflink - set 'iflink' value of an interface
+ * @dev: target interface
+ * @iflink: new value
+ *
+ * Change the interface to which this interface is linked to.
+ */
+int dev_set_iflink(struct net_device *dev, int iflink)
+{
+ if (dev->netdev_ops && dev->netdev_ops->ndo_set_iflink)
+ return dev->netdev_ops->ndo_set_iflink(dev, iflink);
+
+ return -EOPNOTSUPP;
+}
+
/**
* dev_fill_metadata_dst - Retrieve tunnel egress information.
* @dev: targeted interface
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 1ee6460f8275..106d5e23ae6f 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -2507,6 +2507,13 @@ static int do_setlink(const struct sk_buff *skb,
status |= DO_SETLINK_MODIFIED;
}
+ if (tb[IFLA_LINK]) {
+ err = dev_set_iflink(dev, nla_get_u32(tb[IFLA_LINK]));
+ if (err)
+ goto errout;
+ status |= DO_SETLINK_MODIFIED;
+ }
+
if (tb[IFLA_CARRIER]) {
err = dev_change_carrier(dev, nla_get_u8(tb[IFLA_CARRIER]));
if (err)
--
2.21.0
^ permalink raw reply related
* [PATCH RFC net-next 1/3] net: dsa: allow for multiple CPU ports
From: Marek Behún @ 2019-08-24 2:42 UTC (permalink / raw)
To: netdev
Cc: Andrew Lunn, Vivien Didelot, Florian Fainelli, David Ahern,
Stephen Hemminger, Marek Behún
In-Reply-To: <20190824024251.4542-1-marek.behun@nic.cz>
Allow for multiple CPU ports in a DSA switch tree. By default assign the
CPU ports to user ports in a round robin way, ie. if there are two CPU
ports connected to eth0 and eth1, and five user ports (lan1..lan5),
assign them as:
lan1 <-> eth0
lan2 <-> eth1
lan3 <-> eth0
lan4 <-> eth1
lan5 <-> eth0
Signed-off-by: Marek Behún <marek.behun@nic.cz>
---
include/net/dsa.h | 5 +--
net/dsa/dsa2.c | 84 +++++++++++++++++++++++++++++++----------------
2 files changed, 58 insertions(+), 31 deletions(-)
diff --git a/include/net/dsa.h b/include/net/dsa.h
index 147b757ef8ea..64bd70608f2f 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -123,9 +123,10 @@ struct dsa_switch_tree {
struct dsa_platform_data *pd;
/*
- * The switch port to which the CPU is attached.
+ * The switch ports to which the CPU is attached.
*/
- struct dsa_port *cpu_dp;
+ size_t num_cpu_dps;
+ struct dsa_port *cpu_dps[DSA_MAX_PORTS];
/*
* Data for the individual switch chips.
diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c
index 8c4eccb0cfe6..c5af89079a6b 100644
--- a/net/dsa/dsa2.c
+++ b/net/dsa/dsa2.c
@@ -194,11 +194,12 @@ static bool dsa_tree_setup_routing_table(struct dsa_switch_tree *dst)
return complete;
}
-static struct dsa_port *dsa_tree_find_first_cpu(struct dsa_switch_tree *dst)
+static void dsa_tree_fill_cpu_ports(struct dsa_switch_tree *dst)
{
struct dsa_switch *ds;
struct dsa_port *dp;
int device, port;
+ int count = 0;
for (device = 0; device < DSA_MAX_SWITCHES; device++) {
ds = dst->ds[device];
@@ -208,28 +209,38 @@ static struct dsa_port *dsa_tree_find_first_cpu(struct dsa_switch_tree *dst)
for (port = 0; port < ds->num_ports; port++) {
dp = &ds->ports[port];
- if (dsa_port_is_cpu(dp))
- return dp;
+ if (dsa_port_is_cpu(dp)) {
+ if (count == ARRAY_SIZE(dst->cpu_dps)) {
+ pr_warn("Tree has too many CPU ports\n");
+ dst->num_cpu_dps = count;
+ return;
+ }
+
+ dst->cpu_dps[count++] = dp;
+ }
}
}
- return NULL;
+ dst->num_cpu_dps = count;
}
-static int dsa_tree_setup_default_cpu(struct dsa_switch_tree *dst)
+static int dsa_tree_setup_default_cpus(struct dsa_switch_tree *dst)
{
struct dsa_switch *ds;
struct dsa_port *dp;
- int device, port;
+ int device, port, i;
- /* DSA currently only supports a single CPU port */
- dst->cpu_dp = dsa_tree_find_first_cpu(dst);
- if (!dst->cpu_dp) {
+ dsa_tree_fill_cpu_ports(dst);
+ if (!dst->num_cpu_dps) {
pr_warn("Tree has no master device\n");
return -EINVAL;
}
- /* Assign the default CPU port to all ports of the fabric */
+ /* Assign the default CPU port to all ports of the fabric in a round
+ * robin way. This should work nicely for all sane switch tree designs.
+ */
+ i = 0;
+
for (device = 0; device < DSA_MAX_SWITCHES; device++) {
ds = dst->ds[device];
if (!ds)
@@ -238,18 +249,20 @@ static int dsa_tree_setup_default_cpu(struct dsa_switch_tree *dst)
for (port = 0; port < ds->num_ports; port++) {
dp = &ds->ports[port];
- if (dsa_port_is_user(dp) || dsa_port_is_dsa(dp))
- dp->cpu_dp = dst->cpu_dp;
+ if (dsa_port_is_user(dp) || dsa_port_is_dsa(dp)) {
+ dp->cpu_dp = dst->cpu_dps[i++];
+ if (i == dst->num_cpu_dps)
+ i = 0;
+ }
}
}
return 0;
}
-static void dsa_tree_teardown_default_cpu(struct dsa_switch_tree *dst)
+static void dsa_tree_teardown_default_cpus(struct dsa_switch_tree *dst)
{
- /* DSA currently only supports a single CPU port */
- dst->cpu_dp = NULL;
+ dst->num_cpu_dps = 0;
}
static int dsa_port_setup(struct dsa_port *dp)
@@ -493,21 +506,34 @@ static void dsa_tree_teardown_switches(struct dsa_switch_tree *dst)
}
}
-static int dsa_tree_setup_master(struct dsa_switch_tree *dst)
+static int dsa_tree_setup_masters(struct dsa_switch_tree *dst)
{
- struct dsa_port *cpu_dp = dst->cpu_dp;
- struct net_device *master = cpu_dp->master;
+ int i;
+ int err;
- /* DSA currently supports a single pair of CPU port and master device */
- return dsa_master_setup(master, cpu_dp);
+ for (i = 0; i < dst->num_cpu_dps; ++i) {
+ struct dsa_port *cpu_dp = dst->cpu_dps[i];
+ struct net_device *master = cpu_dp->master;
+
+ err = dsa_master_setup(master, cpu_dp);
+ if (err)
+ goto teardown;
+ }
+
+ return 0;
+teardown:
+ for (--i; i >= 0; --i)
+ dsa_master_teardown(dst->cpu_dps[i]->master);
+
+ return err;
}
-static void dsa_tree_teardown_master(struct dsa_switch_tree *dst)
+static void dsa_tree_teardown_masters(struct dsa_switch_tree *dst)
{
- struct dsa_port *cpu_dp = dst->cpu_dp;
- struct net_device *master = cpu_dp->master;
+ int i;
- return dsa_master_teardown(master);
+ for (i = 0; i < dst->num_cpu_dps; ++i)
+ dsa_master_teardown(dst->cpu_dps[i]->master);
}
static int dsa_tree_setup(struct dsa_switch_tree *dst)
@@ -525,7 +551,7 @@ static int dsa_tree_setup(struct dsa_switch_tree *dst)
if (!complete)
return 0;
- err = dsa_tree_setup_default_cpu(dst);
+ err = dsa_tree_setup_default_cpus(dst);
if (err)
return err;
@@ -533,7 +559,7 @@ static int dsa_tree_setup(struct dsa_switch_tree *dst)
if (err)
goto teardown_default_cpu;
- err = dsa_tree_setup_master(dst);
+ err = dsa_tree_setup_masters(dst);
if (err)
goto teardown_switches;
@@ -546,7 +572,7 @@ static int dsa_tree_setup(struct dsa_switch_tree *dst)
teardown_switches:
dsa_tree_teardown_switches(dst);
teardown_default_cpu:
- dsa_tree_teardown_default_cpu(dst);
+ dsa_tree_teardown_default_cpus(dst);
return err;
}
@@ -556,11 +582,11 @@ static void dsa_tree_teardown(struct dsa_switch_tree *dst)
if (!dst->setup)
return;
- dsa_tree_teardown_master(dst);
+ dsa_tree_teardown_masters(dst);
dsa_tree_teardown_switches(dst);
- dsa_tree_teardown_default_cpu(dst);
+ dsa_tree_teardown_default_cpus(dst);
pr_info("DSA: tree %d torn down\n", dst->index);
--
2.21.0
^ permalink raw reply related
* [PATCH RFC iproute2-next] iplink: allow to change iplink value
From: Marek Behún @ 2019-08-24 2:42 UTC (permalink / raw)
To: netdev
Cc: Andrew Lunn, Vivien Didelot, Florian Fainelli, David Ahern,
Stephen Hemminger, Marek Behún
In-Reply-To: <20190824024251.4542-1-marek.behun@nic.cz>
Allow to change the interface to which a given interface is linked to.
This is useful in the case of multi-CPU port DSA, for changing the CPU
port of a given user port.
Signed-off-by: Marek Behún <marek.behun@nic.cz>
Cc: David Ahern <dsahern@gmail.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
---
ip/iplink.c | 16 +++++-----------
man/man8/ip-link.8.in | 7 +++++++
2 files changed, 12 insertions(+), 11 deletions(-)
diff --git a/ip/iplink.c b/ip/iplink.c
index 212a0885..d52c0aaf 100644
--- a/ip/iplink.c
+++ b/ip/iplink.c
@@ -579,7 +579,6 @@ int iplink_parse(int argc, char **argv, struct iplink_req *req, char **type)
{
char *name = NULL;
char *dev = NULL;
- char *link = NULL;
int ret, len;
char abuf[32];
int qlen = -1;
@@ -590,6 +589,7 @@ int iplink_parse(int argc, char **argv, struct iplink_req *req, char **type)
int numrxqueues = -1;
int link_netnsid = -1;
int index = 0;
+ int link = -1;
int group = -1;
int addr_len = 0;
@@ -620,7 +620,10 @@ int iplink_parse(int argc, char **argv, struct iplink_req *req, char **type)
invarg("Invalid \"index\" value", *argv);
} else if (matches(*argv, "link") == 0) {
NEXT_ARG();
- link = *argv;
+ link = ll_name_to_index(*argv);
+ if (!link)
+ return nodev(*argv);
+ addattr32(&req->n, sizeof(*req), IFLA_LINK, link);
} else if (matches(*argv, "address") == 0) {
NEXT_ARG();
addr_len = ll_addr_a2n(abuf, sizeof(abuf), *argv);
@@ -1004,15 +1007,6 @@ int iplink_parse(int argc, char **argv, struct iplink_req *req, char **type)
exit(-1);
}
- if (link) {
- int ifindex;
-
- ifindex = ll_name_to_index(link);
- if (!ifindex)
- return nodev(link);
- addattr32(&req->n, sizeof(*req), IFLA_LINK, ifindex);
- }
-
req->i.ifi_index = index;
}
diff --git a/man/man8/ip-link.8.in b/man/man8/ip-link.8.in
index a8ae72d2..800aed05 100644
--- a/man/man8/ip-link.8.in
+++ b/man/man8/ip-link.8.in
@@ -149,6 +149,9 @@ ip-link \- network device configuration
.br
.RB "[ " nomaster " ]"
.br
+.RB "[ " link
+.IR DEVICE " ]"
+.br
.RB "[ " vrf
.IR NAME " ]"
.br
@@ -2131,6 +2134,10 @@ set master device of the device (enslave device).
.BI nomaster
unset master device of the device (release device).
+.TP
+.BI link " DEVICE"
+set device to which this device is linked to.
+
.TP
.BI addrgenmode " eui64|none|stable_secret|random"
set the IPv6 address generation mode
--
2.21.0
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox