* [PATCH net-next] ifb: fix packets checksum
From: Jon Maxwell @ 2018-05-24 21:38 UTC (permalink / raw)
To: davem
Cc: dsahern, mschiffer, zhangshengju, ktkhai, netdev, linux-kernel,
jmaxwell
Fix up the checksum for CHECKSUM_COMPLETE when pulling skbs on the RX path.
Otherwise we get splats when tc mirred is used to redirect packets to ifb.
Before fix:
nic: hw csum failure
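As a rough illustration of why the pull has to adjust the checksum, here is a
minimal userspace sketch (not kernel code). It assumes a 16-bit
one's-complement sum stands in for the CHECKSUM_COMPLETE value; the helper
names ones_sum, ones_sub and pull_rcsum are hypothetical and only mimic the
idea behind skb_pull_rcsum, not its implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 16-bit one's-complement sum over len bytes (big-endian words,
 * an odd trailing byte is padded with zero).
 */
static uint16_t ones_sum(const uint8_t *p, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)p[i] << 8 | p[i + 1];
	if (len & 1)
		sum += (uint32_t)p[len - 1] << 8;
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* a - b in one's-complement arithmetic: add the complement of b. */
static uint16_t ones_sub(uint16_t a, uint16_t b)
{
	uint32_t sum = (uint32_t)a + (uint16_t)~b;

	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Checksum that remains valid after dropping the first hdr_len bytes.
 * Assumption of this sketch: hdr_len is even, so word alignment of the
 * remaining data is unchanged.
 */
static uint16_t pull_rcsum(uint16_t full, const uint8_t *data, size_t hdr_len)
{
	assert((hdr_len & 1) == 0);
	return ones_sub(full, ones_sum(data, hdr_len));
}
```

If the header bytes are dropped without this subtraction, the stored sum still
covers them and no longer matches the data that the receive path sees, which
is the mismatch reported as "hw csum failure".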
Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
---
drivers/net/ifb.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ifb.c b/drivers/net/ifb.c
index 5f2897ec0edc..d345c61d476c 100644
--- a/drivers/net/ifb.c
+++ b/drivers/net/ifb.c
@@ -102,7 +102,7 @@ static void ifb_ri_tasklet(unsigned long _txp)
if (!skb->tc_from_ingress) {
dev_queue_xmit(skb);
} else {
- skb_pull(skb, skb->mac_len);
+ skb_pull_rcsum(skb, skb->mac_len);
netif_receive_skb(skb);
}
}
--
2.13.6
* [net-next V2 1/6] net/dcb: Add dcbnl buffer attribute
From: Saeed Mahameed @ 2018-05-24 21:38 UTC (permalink / raw)
To: David S. Miller
Cc: netdev, Huy Nguyen, Ido Schimmel, Jakub Kicinski, Jiri Pirko,
Or Gerlitz, Parav Pandit, Aron Silverton, Saeed Mahameed
In-Reply-To: <20180524213820.5910-1-saeedm@mellanox.com>
From: Huy Nguyen <huyn@mellanox.com>
This patch adds a dcbnl buffer attribute that lets the user change
the NIC's buffer configuration, such as the priority to buffer
mapping and the size of each individual buffer.
Combined with the pfc attribute, this attribute allows an advanced user
to fine tune the QoS settings for a specific priority queue. For example,
the user can dedicate a buffer to one or more priorities, or give a
larger buffer to certain priorities.
The dcb buffer configuration will be controlled by lldptool.
lldptool -T -i eth2 -V BUFFER prio 0,2,5,7,1,2,3,6
maps priorities 0,1,2,3,4,5,6,7 to receive buffers 0,2,5,7,1,2,3,6 respectively
lldptool -T -i eth2 -V BUFFER size 87296,87296,0,87296,0,0,0,0
sets receive buffer size for buffer 0,1,2,3,4,5,6,7 respectively
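The two argument lists above could be turned into a configuration roughly as
in the following userspace sketch. It is not lldptool or kernel code: the
struct demo_buffer_cfg only mirrors the prio2buffer and buffer_size fields of
struct dcbnl_buffer from this patch, and the parser names and the strict
"exactly eight comma-separated decimal values" rule are assumptions made for
illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_MAX_PRIORITIES 8
#define DEMO_MAX_BUFFERS 8

/* Hypothetical userspace mirror of two fields of struct dcbnl_buffer. */
struct demo_buffer_cfg {
	uint8_t prio2buffer[DEMO_MAX_PRIORITIES];
	uint32_t buffer_size[DEMO_MAX_BUFFERS];
};

/* Parse exactly eight comma-separated decimal values, each <= max.
 * Returns 0 on success and -1 on any malformed or out-of-range input.
 */
static int demo_parse_list8(const char *arg, uint32_t out[8], uint32_t max)
{
	int i;

	assert(arg && out);
	for (i = 0; i < 8; i++) {
		char *end;
		unsigned long v;

		if (*arg < '0' || *arg > '9')
			return -1;
		errno = 0;
		v = strtoul(arg, &end, 10);
		if (errno == ERANGE || v > max)
			return -1;
		/* a comma must follow every value except the last one */
		if (*end != (i < 7 ? ',' : '\0'))
			return -1;
		out[i] = (uint32_t)v;
		if (i < 7)
			arg = end + 1;
	}
	return 0;
}

/* "prio" list: entry i is the receive buffer used by priority i. */
static int demo_parse_prio_map(const char *arg, struct demo_buffer_cfg *cfg)
{
	uint32_t tmp[DEMO_MAX_PRIORITIES];
	int i;

	if (demo_parse_list8(arg, tmp, DEMO_MAX_BUFFERS - 1))
		return -1;
	for (i = 0; i < DEMO_MAX_PRIORITIES; i++)
		cfg->prio2buffer[i] = (uint8_t)tmp[i];
	return 0;
}

/* "size" list: entry i is the size in bytes of receive buffer i. */
static int demo_parse_sizes(const char *arg, struct demo_buffer_cfg *cfg)
{
	return demo_parse_list8(arg, cfg->buffer_size, UINT32_MAX);
}
```

With the first example, demo_parse_prio_map("0,2,5,7,1,2,3,6", &cfg) leaves
cfg.prio2buffer[1] == 2, that is, priority 1 uses receive buffer 2.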
After discussion on the mailing list with Jakub, Jiri, Ido and John, we agreed
to choose dcbnl over the devlink interface, since this feature is intended to
set port attributes which are governed by the netdev instance of that port,
whereas the devlink API is more suitable for global ASIC configurations.
We present a use case where the dcbnl buffer attribute, configured by an
advanced user, helps reduce the latency of messages of different sizes.
Scenario description:
On ConnectX-5, we run latency sensitive traffic with
small/medium message sizes ranging from 64B to 256KB and bandwidth sensitive
traffic with large message sizes of 512KB and 1MB. We group small, medium,
and large message sizes into their own pfc enabled priorities as follows.
Priorities 1 & 2 (64B, 256B and 1KB)
Priorities 3 & 4 (4KB, 8KB, 16KB, 64KB, 128KB and 256KB)
Priorities 5 & 6 (512KB and 1MB)
By default, ConnectX-5 maps all pfc enabled priorities to a single
lossless buffer with a fixed size of 50% of the total available buffer
space. The other 50% is assigned to the lossy buffer. Using the dcbnl buffer
attribute, we create three equal size lossless buffers. Each buffer has 25%
of the total available buffer space. Thus, the lossy buffer size is reduced
to 25%. The priority to lossless buffer mappings are set as follows.
Priorities 1 & 2 on lossless buffer #1
Priorities 3 & 4 on lossless buffer #2
Priorities 5 & 6 on lossless buffer #3
We observe improvements in latency for small and medium message sizes
as follows. Please note that the bandwidth of the large message sizes is
reduced, but the total bandwidth remains the same.
256B message size (42% latency reduction)
4K message size (21% latency reduction)
64K message size (16% latency reduction)
CC: Ido Schimmel <idosch@idosch.org>
CC: Jakub Kicinski <jakub.kicinski@netronome.com>
CC: Jiri Pirko <jiri@resnulli.us>
CC: Or Gerlitz <gerlitz.or@gmail.com>
CC: Parav Pandit <parav@mellanox.com>
CC: Aron Silverton <aron.silverton@oracle.com>
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
include/net/dcbnl.h | 4 ++++
include/uapi/linux/dcbnl.h | 11 +++++++++++
net/dcb/dcbnl.c | 20 ++++++++++++++++++++
3 files changed, 35 insertions(+)
diff --git a/include/net/dcbnl.h b/include/net/dcbnl.h
index 207d9ba1f92c..0e5e91be2d30 100644
--- a/include/net/dcbnl.h
+++ b/include/net/dcbnl.h
@@ -101,6 +101,10 @@ struct dcbnl_rtnl_ops {
/* CEE peer */
int (*cee_peer_getpg) (struct net_device *, struct cee_pg *);
int (*cee_peer_getpfc) (struct net_device *, struct cee_pfc *);
+
+ /* buffer settings */
+ int (*dcbnl_getbuffer)(struct net_device *, struct dcbnl_buffer *);
+ int (*dcbnl_setbuffer)(struct net_device *, struct dcbnl_buffer *);
};
#endif /* __NET_DCBNL_H__ */
diff --git a/include/uapi/linux/dcbnl.h b/include/uapi/linux/dcbnl.h
index 2c0c6453c3f4..60aa2e446698 100644
--- a/include/uapi/linux/dcbnl.h
+++ b/include/uapi/linux/dcbnl.h
@@ -163,6 +163,16 @@ struct ieee_pfc {
__u64 indications[IEEE_8021QAZ_MAX_TCS];
};
+#define IEEE_8021Q_MAX_PRIORITIES 8
+#define DCBX_MAX_BUFFERS 8
+struct dcbnl_buffer {
+ /* priority to buffer mapping */
+ __u8 prio2buffer[IEEE_8021Q_MAX_PRIORITIES];
+ /* buffer size in Bytes */
+ __u32 buffer_size[DCBX_MAX_BUFFERS];
+ __u32 total_size;
+};
+
/* CEE DCBX std supported values */
#define CEE_DCBX_MAX_PGS 8
#define CEE_DCBX_MAX_PRIO 8
@@ -406,6 +416,7 @@ enum ieee_attrs {
DCB_ATTR_IEEE_MAXRATE,
DCB_ATTR_IEEE_QCN,
DCB_ATTR_IEEE_QCN_STATS,
+ DCB_ATTR_DCB_BUFFER,
__DCB_ATTR_IEEE_MAX
};
#define DCB_ATTR_IEEE_MAX (__DCB_ATTR_IEEE_MAX - 1)
diff --git a/net/dcb/dcbnl.c b/net/dcb/dcbnl.c
index bae7d78aa068..d2f4e0c1faaf 100644
--- a/net/dcb/dcbnl.c
+++ b/net/dcb/dcbnl.c
@@ -176,6 +176,7 @@ static const struct nla_policy dcbnl_ieee_policy[DCB_ATTR_IEEE_MAX + 1] = {
[DCB_ATTR_IEEE_MAXRATE] = {.len = sizeof(struct ieee_maxrate)},
[DCB_ATTR_IEEE_QCN] = {.len = sizeof(struct ieee_qcn)},
[DCB_ATTR_IEEE_QCN_STATS] = {.len = sizeof(struct ieee_qcn_stats)},
+ [DCB_ATTR_DCB_BUFFER] = {.len = sizeof(struct dcbnl_buffer)},
};
/* DCB number of traffic classes nested attributes. */
@@ -1094,6 +1095,16 @@ static int dcbnl_ieee_fill(struct sk_buff *skb, struct net_device *netdev)
return -EMSGSIZE;
}
+ if (ops->dcbnl_getbuffer) {
+ struct dcbnl_buffer buffer;
+
+ memset(&buffer, 0, sizeof(buffer));
+ err = ops->dcbnl_getbuffer(netdev, &buffer);
+ if (!err &&
+ nla_put(skb, DCB_ATTR_DCB_BUFFER, sizeof(buffer), &buffer))
+ return -EMSGSIZE;
+ }
+
app = nla_nest_start(skb, DCB_ATTR_IEEE_APP_TABLE);
if (!app)
return -EMSGSIZE;
@@ -1453,6 +1464,15 @@ static int dcbnl_ieee_set(struct net_device *netdev, struct nlmsghdr *nlh,
goto err;
}
+ if (ieee[DCB_ATTR_DCB_BUFFER] && ops->dcbnl_setbuffer) {
+ struct dcbnl_buffer *buffer =
+ nla_data(ieee[DCB_ATTR_DCB_BUFFER]);
+
+ err = ops->dcbnl_setbuffer(netdev, buffer);
+ if (err)
+ goto err;
+ }
+
if (ieee[DCB_ATTR_IEEE_APP_TABLE]) {
struct nlattr *attr;
int rem;
--
2.17.0
* [net-next V2 2/6] net/mlx5e: Move port speed code from en_ethtool.c to en/port.c
From: Saeed Mahameed @ 2018-05-24 21:38 UTC (permalink / raw)
To: David S. Miller; +Cc: netdev, Huy Nguyen, Saeed Mahameed
In-Reply-To: <20180524213820.5910-1-saeedm@mellanox.com>
From: Huy Nguyen <huyn@mellanox.com>
Move the four functions below from en_ethtool.c to en/port.c. These
functions are used by both en_ethtool.c and en_main.c. Future code
can use these functions without depending on the ethtool link modes.
u32 mlx5e_port_ptys2speed(u32 eth_proto_oper);
int mlx5e_port_linkspeed(struct mlx5_core_dev *mdev, u32 *speed);
int mlx5e_port_max_linkspeed(struct mlx5_core_dev *mdev, u32 *speed);
u32 mlx5e_port_speed2linkmodes(u32 speed);
Delete the speed field from the table built by
mlx5e_build_ptys2ethtool_map. That table now only keeps the mapping
between the mlx5e link mode and the ethtool link mode. Add a new table,
mlx5e_link_speed, for translation from mlx5e link mode to actual speed.
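The translation scheme can be sketched in userspace as follows. This is only
an illustration of the table-driven approach: the demo_ names, the four link
modes and their bit positions are hypothetical and do not match the real
MLX5E_* enum values.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical link modes; bit positions are illustrative only. */
enum demo_link_mode {
	DEMO_1000BASE_KX,
	DEMO_10GBASE_KR,
	DEMO_25GBASE_CR,
	DEMO_25GBASE_SR,
	DEMO_LINK_MODES_NUMBER,
};

#define DEMO_PROT_MASK(mode) (1u << (mode))

/* speed in units of 1Mb, indexed by link mode */
static const uint32_t demo_link_speed[DEMO_LINK_MODES_NUMBER] = {
	[DEMO_1000BASE_KX] = 1000,
	[DEMO_10GBASE_KR]  = 10000,
	[DEMO_25GBASE_CR]  = 25000,
	[DEMO_25GBASE_SR]  = 25000,
};

/* Speed of the lowest set link-mode bit, or 0 if no known bit is set. */
static uint32_t demo_ptys2speed(uint32_t proto_oper)
{
	int i;

	for (i = 0; i < DEMO_LINK_MODES_NUMBER; i++)
		if (proto_oper & DEMO_PROT_MASK(i))
			return demo_link_speed[i];
	return 0;
}

/* Highest speed among all link modes set in the capability mask. */
static uint32_t demo_max_linkspeed(uint32_t proto_cap)
{
	uint32_t max_speed = 0;
	int i;

	for (i = 0; i < DEMO_LINK_MODES_NUMBER; i++)
		if ((proto_cap & DEMO_PROT_MASK(i)) &&
		    demo_link_speed[i] > max_speed)
			max_speed = demo_link_speed[i];
	return max_speed;
}

/* Mask of every link mode that runs at exactly the given speed. */
static uint32_t demo_speed2linkmodes(uint32_t speed)
{
	uint32_t link_modes = 0;
	int i;

	/* 0 marks an unknown mode, so it is never a valid request here */
	assert(speed != 0);
	for (i = 0; i < DEMO_LINK_MODES_NUMBER; i++)
		if (demo_link_speed[i] == speed)
			link_modes |= DEMO_PROT_MASK(i);
	return link_modes;
}
```

Keeping the speeds in their own table means callers such as en_main.c need
only a link-mode bitmask and no ethtool structures.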
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
.../net/ethernet/mellanox/mlx5/core/Makefile | 2 +-
drivers/net/ethernet/mellanox/mlx5/core/en.h | 2 -
.../ethernet/mellanox/mlx5/core/en/Makefile | 1 +
.../net/ethernet/mellanox/mlx5/core/en/port.c | 129 ++++++++++++++++++
.../net/ethernet/mellanox/mlx5/core/en/port.h | 43 ++++++
.../ethernet/mellanox/mlx5/core/en_ethtool.c | 102 +++++---------
.../net/ethernet/mellanox/mlx5/core/en_main.c | 3 +-
.../net/ethernet/mellanox/mlx5/core/en_tc.c | 3 +-
8 files changed, 213 insertions(+), 72 deletions(-)
create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/Makefile
create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/port.c
create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/port.h
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index a7135f5d5cf6..651cf3640420 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -15,7 +15,7 @@ mlx5_core-$(CONFIG_MLX5_FPGA) += fpga/cmd.o fpga/core.o fpga/conn.o fpga/sdk.o \
mlx5_core-$(CONFIG_MLX5_CORE_EN) += en_main.o en_common.o en_fs.o en_ethtool.o \
en_tx.o en_rx.o en_dim.o en_txrx.o en_stats.o vxlan.o \
- en_arfs.o en_fs_ethtool.o en_selftest.o
+ en_arfs.o en_fs_ethtool.o en_selftest.o en/port.o
mlx5_core-$(CONFIG_MLX5_MPFS) += lib/mpfs.o
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index bc91a7335c93..d13a86a1d702 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -932,8 +932,6 @@ void mlx5e_deactivate_priv_channels(struct mlx5e_priv *priv);
void mlx5e_build_default_indir_rqt(u32 *indirection_rqt, int len,
int num_channels);
-int mlx5e_get_max_linkspeed(struct mlx5_core_dev *mdev, u32 *speed);
-
void mlx5e_set_tx_cq_mode_params(struct mlx5e_params *params,
u8 cq_period_mode);
void mlx5e_set_rx_cq_mode_params(struct mlx5e_params *params,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/en/Makefile
new file mode 100644
index 000000000000..d8e17110f25d
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/Makefile
@@ -0,0 +1 @@
+subdir-ccflags-y += -I$(src)/..
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/port.c b/drivers/net/ethernet/mellanox/mlx5/core/en/port.c
new file mode 100644
index 000000000000..9f04542f3661
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/port.c
@@ -0,0 +1,129 @@
+/*
+ * Copyright (c) 2018, Mellanox Technologies. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include "port.h"
+
+/* speed in units of 1Mb */
+static const u32 mlx5e_link_speed[MLX5E_LINK_MODES_NUMBER] = {
+ [MLX5E_1000BASE_CX_SGMII] = 1000,
+ [MLX5E_1000BASE_KX] = 1000,
+ [MLX5E_10GBASE_CX4] = 10000,
+ [MLX5E_10GBASE_KX4] = 10000,
+ [MLX5E_10GBASE_KR] = 10000,
+ [MLX5E_20GBASE_KR2] = 20000,
+ [MLX5E_40GBASE_CR4] = 40000,
+ [MLX5E_40GBASE_KR4] = 40000,
+ [MLX5E_56GBASE_R4] = 56000,
+ [MLX5E_10GBASE_CR] = 10000,
+ [MLX5E_10GBASE_SR] = 10000,
+ [MLX5E_10GBASE_ER] = 10000,
+ [MLX5E_40GBASE_SR4] = 40000,
+ [MLX5E_40GBASE_LR4] = 40000,
+ [MLX5E_50GBASE_SR2] = 50000,
+ [MLX5E_100GBASE_CR4] = 100000,
+ [MLX5E_100GBASE_SR4] = 100000,
+ [MLX5E_100GBASE_KR4] = 100000,
+ [MLX5E_100GBASE_LR4] = 100000,
+ [MLX5E_100BASE_TX] = 100,
+ [MLX5E_1000BASE_T] = 1000,
+ [MLX5E_10GBASE_T] = 10000,
+ [MLX5E_25GBASE_CR] = 25000,
+ [MLX5E_25GBASE_KR] = 25000,
+ [MLX5E_25GBASE_SR] = 25000,
+ [MLX5E_50GBASE_CR2] = 50000,
+ [MLX5E_50GBASE_KR2] = 50000,
+};
+
+u32 mlx5e_port_ptys2speed(u32 eth_proto_oper)
+{
+ unsigned long temp = eth_proto_oper;
+ u32 speed = 0;
+ int i;
+
+ i = find_first_bit(&temp, MLX5E_LINK_MODES_NUMBER);
+ if (i < MLX5E_LINK_MODES_NUMBER)
+ speed = mlx5e_link_speed[i];
+
+ return speed;
+}
+
+int mlx5e_port_linkspeed(struct mlx5_core_dev *mdev, u32 *speed)
+{
+ u32 out[MLX5_ST_SZ_DW(ptys_reg)] = {};
+ u32 eth_proto_oper;
+ int err;
+
+ err = mlx5_query_port_ptys(mdev, out, sizeof(out), MLX5_PTYS_EN, 1);
+ if (err)
+ return err;
+
+ eth_proto_oper = MLX5_GET(ptys_reg, out, eth_proto_oper);
+ *speed = mlx5e_port_ptys2speed(eth_proto_oper);
+ if (!(*speed)) {
+ mlx5_core_warn(mdev, "cannot get port speed\n");
+ err = -EINVAL;
+ }
+
+ return err;
+}
+
+int mlx5e_port_max_linkspeed(struct mlx5_core_dev *mdev, u32 *speed)
+{
+ u32 max_speed = 0;
+ u32 proto_cap;
+ int err;
+ int i;
+
+ err = mlx5_query_port_proto_cap(mdev, &proto_cap, MLX5_PTYS_EN);
+ if (err)
+ return err;
+
+ for (i = 0; i < MLX5E_LINK_MODES_NUMBER; ++i)
+ if (proto_cap & MLX5E_PROT_MASK(i))
+ max_speed = max(max_speed, mlx5e_link_speed[i]);
+
+ *speed = max_speed;
+ return 0;
+}
+
+u32 mlx5e_port_speed2linkmodes(u32 speed)
+{
+ u32 link_modes = 0;
+ int i;
+
+ for (i = 0; i < MLX5E_LINK_MODES_NUMBER; ++i) {
+ if (mlx5e_link_speed[i] == speed)
+ link_modes |= MLX5E_PROT_MASK(i);
+ }
+
+ return link_modes;
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/port.h b/drivers/net/ethernet/mellanox/mlx5/core/en/port.h
new file mode 100644
index 000000000000..7aae38e98a65
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/port.h
@@ -0,0 +1,43 @@
+/*
+ * Copyright (c) 2018, Mellanox Technologies. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef __MLX5E_EN_PORT_H
+#define __MLX5E_EN_PORT_H
+
+#include <linux/mlx5/driver.h>
+#include "en.h"
+
+u32 mlx5e_port_ptys2speed(u32 eth_proto_oper);
+int mlx5e_port_linkspeed(struct mlx5_core_dev *mdev, u32 *speed);
+int mlx5e_port_max_linkspeed(struct mlx5_core_dev *mdev, u32 *speed);
+u32 mlx5e_port_speed2linkmodes(u32 speed);
+#endif
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
index 2b786c4d3dab..42bd256e680d 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
@@ -31,6 +31,7 @@
*/
#include "en.h"
+#include "en/port.h"
void mlx5e_ethtool_get_drvinfo(struct mlx5e_priv *priv,
struct ethtool_drvinfo *drvinfo)
@@ -59,18 +60,16 @@ static void mlx5e_get_drvinfo(struct net_device *dev,
struct ptys2ethtool_config {
__ETHTOOL_DECLARE_LINK_MODE_MASK(supported);
__ETHTOOL_DECLARE_LINK_MODE_MASK(advertised);
- u32 speed;
};
static struct ptys2ethtool_config ptys2ethtool_table[MLX5E_LINK_MODES_NUMBER];
-#define MLX5_BUILD_PTYS2ETHTOOL_CONFIG(reg_, speed_, ...) \
+#define MLX5_BUILD_PTYS2ETHTOOL_CONFIG(reg_, ...) \
({ \
struct ptys2ethtool_config *cfg; \
const unsigned int modes[] = { __VA_ARGS__ }; \
unsigned int i; \
cfg = &ptys2ethtool_table[reg_]; \
- cfg->speed = speed_; \
bitmap_zero(cfg->supported, \
__ETHTOOL_LINK_MODE_MASK_NBITS); \
bitmap_zero(cfg->advertised, \
@@ -83,55 +82,55 @@ static struct ptys2ethtool_config ptys2ethtool_table[MLX5E_LINK_MODES_NUMBER];
void mlx5e_build_ptys2ethtool_map(void)
{
- MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_1000BASE_CX_SGMII, SPEED_1000,
+ MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_1000BASE_CX_SGMII,
ETHTOOL_LINK_MODE_1000baseKX_Full_BIT);
- MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_1000BASE_KX, SPEED_1000,
+ MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_1000BASE_KX,
ETHTOOL_LINK_MODE_1000baseKX_Full_BIT);
- MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_10GBASE_CX4, SPEED_10000,
+ MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_10GBASE_CX4,
ETHTOOL_LINK_MODE_10000baseKX4_Full_BIT);
- MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_10GBASE_KX4, SPEED_10000,
+ MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_10GBASE_KX4,
ETHTOOL_LINK_MODE_10000baseKX4_Full_BIT);
- MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_10GBASE_KR, SPEED_10000,
+ MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_10GBASE_KR,
ETHTOOL_LINK_MODE_10000baseKR_Full_BIT);
- MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_20GBASE_KR2, SPEED_20000,
+ MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_20GBASE_KR2,
ETHTOOL_LINK_MODE_20000baseKR2_Full_BIT);
- MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_40GBASE_CR4, SPEED_40000,
+ MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_40GBASE_CR4,
ETHTOOL_LINK_MODE_40000baseCR4_Full_BIT);
- MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_40GBASE_KR4, SPEED_40000,
+ MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_40GBASE_KR4,
ETHTOOL_LINK_MODE_40000baseKR4_Full_BIT);
- MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_56GBASE_R4, SPEED_56000,
+ MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_56GBASE_R4,
ETHTOOL_LINK_MODE_56000baseKR4_Full_BIT);
- MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_10GBASE_CR, SPEED_10000,
+ MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_10GBASE_CR,
ETHTOOL_LINK_MODE_10000baseKR_Full_BIT);
- MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_10GBASE_SR, SPEED_10000,
+ MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_10GBASE_SR,
ETHTOOL_LINK_MODE_10000baseKR_Full_BIT);
- MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_10GBASE_ER, SPEED_10000,
+ MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_10GBASE_ER,
ETHTOOL_LINK_MODE_10000baseKR_Full_BIT);
- MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_40GBASE_SR4, SPEED_40000,
+ MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_40GBASE_SR4,
ETHTOOL_LINK_MODE_40000baseSR4_Full_BIT);
- MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_40GBASE_LR4, SPEED_40000,
+ MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_40GBASE_LR4,
ETHTOOL_LINK_MODE_40000baseLR4_Full_BIT);
- MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_50GBASE_SR2, SPEED_50000,
+ MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_50GBASE_SR2,
ETHTOOL_LINK_MODE_50000baseSR2_Full_BIT);
- MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_100GBASE_CR4, SPEED_100000,
+ MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_100GBASE_CR4,
ETHTOOL_LINK_MODE_100000baseCR4_Full_BIT);
- MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_100GBASE_SR4, SPEED_100000,
+ MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_100GBASE_SR4,
ETHTOOL_LINK_MODE_100000baseSR4_Full_BIT);
- MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_100GBASE_KR4, SPEED_100000,
+ MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_100GBASE_KR4,
ETHTOOL_LINK_MODE_100000baseKR4_Full_BIT);
- MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_100GBASE_LR4, SPEED_100000,
+ MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_100GBASE_LR4,
ETHTOOL_LINK_MODE_100000baseLR4_ER4_Full_BIT);
- MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_10GBASE_T, SPEED_10000,
+ MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_10GBASE_T,
ETHTOOL_LINK_MODE_10000baseT_Full_BIT);
- MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_25GBASE_CR, SPEED_25000,
+ MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_25GBASE_CR,
ETHTOOL_LINK_MODE_25000baseCR_Full_BIT);
- MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_25GBASE_KR, SPEED_25000,
+ MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_25GBASE_KR,
ETHTOOL_LINK_MODE_25000baseKR_Full_BIT);
- MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_25GBASE_SR, SPEED_25000,
+ MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_25GBASE_SR,
ETHTOOL_LINK_MODE_25000baseSR_Full_BIT);
- MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_50GBASE_CR2, SPEED_50000,
+ MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_50GBASE_CR2,
ETHTOOL_LINK_MODE_50000baseCR2_Full_BIT);
- MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_50GBASE_KR2, SPEED_50000,
+ MLX5_BUILD_PTYS2ETHTOOL_CONFIG(MLX5E_50GBASE_KR2,
ETHTOOL_LINK_MODE_50000baseKR2_Full_BIT);
}
@@ -617,43 +616,24 @@ static void ptys2ethtool_supported_advertised_port(struct ethtool_link_ksettings
}
}
-int mlx5e_get_max_linkspeed(struct mlx5_core_dev *mdev, u32 *speed)
-{
- u32 max_speed = 0;
- u32 proto_cap;
- int err;
- int i;
-
- err = mlx5_query_port_proto_cap(mdev, &proto_cap, MLX5_PTYS_EN);
- if (err)
- return err;
-
- for (i = 0; i < MLX5E_LINK_MODES_NUMBER; ++i)
- if (proto_cap & MLX5E_PROT_MASK(i))
- max_speed = max(max_speed, ptys2ethtool_table[i].speed);
-
- *speed = max_speed;
- return 0;
-}
-
static void get_speed_duplex(struct net_device *netdev,
u32 eth_proto_oper,
struct ethtool_link_ksettings *link_ksettings)
{
- int i;
u32 speed = SPEED_UNKNOWN;
u8 duplex = DUPLEX_UNKNOWN;
if (!netif_carrier_ok(netdev))
goto out;
- for (i = 0; i < MLX5E_LINK_MODES_NUMBER; ++i) {
- if (eth_proto_oper & MLX5E_PROT_MASK(i)) {
- speed = ptys2ethtool_table[i].speed;
- duplex = DUPLEX_FULL;
- break;
- }
+ speed = mlx5e_port_ptys2speed(eth_proto_oper);
+ if (!speed) {
+ speed = SPEED_UNKNOWN;
+ goto out;
}
+
+ duplex = DUPLEX_FULL;
+
out:
link_ksettings->base.speed = speed;
link_ksettings->base.duplex = duplex;
@@ -811,18 +791,6 @@ static u32 mlx5e_ethtool2ptys_adver_link(const unsigned long *link_modes)
return ptys_modes;
}
-static u32 mlx5e_ethtool2ptys_speed_link(u32 speed)
-{
- u32 i, speed_links = 0;
-
- for (i = 0; i < MLX5E_LINK_MODES_NUMBER; ++i) {
- if (ptys2ethtool_table[i].speed == speed)
- speed_links |= MLX5E_PROT_MASK(i);
- }
-
- return speed_links;
-}
-
static int mlx5e_set_link_ksettings(struct net_device *netdev,
const struct ethtool_link_ksettings *link_ksettings)
{
@@ -842,7 +810,7 @@ static int mlx5e_set_link_ksettings(struct net_device *netdev,
link_modes = link_ksettings->base.autoneg == AUTONEG_ENABLE ?
mlx5e_ethtool2ptys_adver_link(link_ksettings->link_modes.advertising) :
- mlx5e_ethtool2ptys_speed_link(speed);
+ mlx5e_port_speed2linkmodes(speed);
err = mlx5_query_port_proto_cap(mdev, &eth_proto_cap, MLX5_PTYS_EN);
if (err) {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index b5a7580b12fe..cee44c21766c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -46,6 +46,7 @@
#include "accel/ipsec.h"
#include "accel/tls.h"
#include "vxlan.h"
+#include "en/port.h"
struct mlx5e_rq_param {
u32 rqc[MLX5_ST_SZ_DW(rqc)];
@@ -4082,7 +4083,7 @@ static bool slow_pci_heuristic(struct mlx5_core_dev *mdev)
u32 link_speed = 0;
u32 pci_bw = 0;
- mlx5e_get_max_linkspeed(mdev, &link_speed);
+ mlx5e_port_max_linkspeed(mdev, &link_speed);
pci_bw = pcie_bandwidth_available(mdev->pdev, NULL, NULL, NULL);
mlx5_core_dbg_once(mdev, "Max link speed = %d, PCI BW = %d\n",
link_speed, pci_bw);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 674f1d7d2737..a9c96fe8e4fe 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -52,6 +52,7 @@
#include "eswitch.h"
#include "vxlan.h"
#include "fs_core.h"
+#include "en/port.h"
struct mlx5_nic_flow_attr {
u32 action;
@@ -613,7 +614,7 @@ static int mlx5e_hairpin_flow_add(struct mlx5e_priv *priv,
params.q_counter = priv->q_counter;
/* set hairpin pair per each 50Gbs share of the link */
- mlx5e_get_max_linkspeed(priv->mdev, &link_speed);
+ mlx5e_port_max_linkspeed(priv->mdev, &link_speed);
link_speed = max_t(u32, link_speed, 50000);
link_speed64 = link_speed;
do_div(link_speed64, 50000);
--
2.17.0
* [net-next V2 3/6] net/mlx5: Add pbmc and pptb in the port_access_reg_cap_mask
From: Saeed Mahameed @ 2018-05-24 21:38 UTC (permalink / raw)
To: David S. Miller; +Cc: netdev, Huy Nguyen, Saeed Mahameed
In-Reply-To: <20180524213820.5910-1-saeedm@mellanox.com>
From: Huy Nguyen <huyn@mellanox.com>
Add pbmc and pptb to the port_access_reg_cap_mask. These two
bits indicate whether the device supports receive buffer configuration.
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
include/linux/mlx5/device.h | 3 +++
include/linux/mlx5/mlx5_ifc.h | 12 ++++++++++++
2 files changed, 15 insertions(+)
diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h
index 2bc27f8c5b87..db0332a6d23c 100644
--- a/include/linux/mlx5/device.h
+++ b/include/linux/mlx5/device.h
@@ -1152,6 +1152,9 @@ enum mlx5_qcam_feature_groups {
#define MLX5_CAP_PCAM_FEATURE(mdev, fld) \
MLX5_GET(pcam_reg, (mdev)->caps.pcam, feature_cap_mask.enhanced_features.fld)
+#define MLX5_CAP_PCAM_REG(mdev, reg) \
+ MLX5_GET(pcam_reg, (mdev)->caps.pcam, port_access_reg_cap_mask.regs_5000_to_507f.reg)
+
#define MLX5_CAP_MCAM_REG(mdev, reg) \
MLX5_GET(mcam_reg, (mdev)->caps.mcam, mng_access_reg_cap_mask.access_regs.reg)
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index b4ea8a9914c4..f687989d336b 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -8003,6 +8003,17 @@ struct mlx5_ifc_pcam_enhanced_features_bits {
u8 ppcnt_statistical_group[0x1];
};
+struct mlx5_ifc_pcam_regs_5000_to_507f_bits {
+ u8 port_access_reg_cap_mask_127_to_96[0x20];
+ u8 port_access_reg_cap_mask_95_to_64[0x20];
+ u8 port_access_reg_cap_mask_63_to_32[0x20];
+
+ u8 port_access_reg_cap_mask_31_to_13[0x13];
+ u8 pbmc[0x1];
+ u8 pptb[0x1];
+ u8 port_access_reg_cap_mask_10_to_0[0xb];
+};
+
struct mlx5_ifc_pcam_reg_bits {
u8 reserved_at_0[0x8];
u8 feature_group[0x8];
@@ -8012,6 +8023,7 @@ struct mlx5_ifc_pcam_reg_bits {
u8 reserved_at_20[0x20];
union {
+ struct mlx5_ifc_pcam_regs_5000_to_507f_bits regs_5000_to_507f;
u8 reserved_at_0[0x80];
} port_access_reg_cap_mask;
--
2.17.0
* [net-next V2 4/6] net/mlx5: PPTB and PBMC register firmware command support
From: Saeed Mahameed @ 2018-05-24 21:38 UTC (permalink / raw)
To: David S. Miller; +Cc: netdev, Huy Nguyen, Saeed Mahameed
In-Reply-To: <20180524213820.5910-1-saeedm@mellanox.com>
From: Huy Nguyen <huyn@mellanox.com>
Add a firmware command interface to read and write the PPTB and PBMC
registers.
The PPTB register enables mapping a priority to a specific receive buffer.
The PBMC register enables changing the receive buffer's configuration, such
as buffer size, xon/xoff thresholds, the buffer's lossy property and
the buffer's shared property.
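The priority to buffer mapping is carried in the 32-bit prio_x_buff field,
four bits per priority. A minimal userspace sketch of that packing follows;
the function names are hypothetical, and the explicit cast to a 32-bit
unsigned type before shifting is a choice of this sketch, not something taken
from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NUM_PRIO 8

/* Pack buffer[prio] (0..15) into nibble number prio of a 32-bit word. */
static uint32_t demo_pack_prio_x_buff(const uint8_t buffer[DEMO_NUM_PRIO])
{
	uint32_t reg = 0;
	int prio;

	for (prio = 0; prio < DEMO_NUM_PRIO; prio++) {
		/* each priority only has a 4-bit slot */
		assert(buffer[prio] <= 0xF);
		/* widen before shifting so bit 31 is never reached in a signed int */
		reg |= (uint32_t)buffer[prio] << (4 * prio);
	}
	return reg;
}

/* Inverse operation: buffer[prio] receives nibble number prio. */
static void demo_unpack_prio_x_buff(uint32_t reg, uint8_t buffer[DEMO_NUM_PRIO])
{
	int prio;

	for (prio = 0; prio < DEMO_NUM_PRIO; prio++)
		buffer[prio] = (uint8_t)((reg >> (4 * prio)) & 0xF);
}
```

For the mapping 0,2,5,7,1,2,3,6 this yields 0x63217520, with priority 0 in
the lowest nibble.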
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
.../net/ethernet/mellanox/mlx5/core/en/port.c | 108 ++++++++++++++++++
.../net/ethernet/mellanox/mlx5/core/en/port.h | 5 +
include/linux/mlx5/driver.h | 2 +
include/linux/mlx5/mlx5_ifc.h | 35 ++++++
4 files changed, 150 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/port.c b/drivers/net/ethernet/mellanox/mlx5/core/en/port.c
index 9f04542f3661..24e3b564964f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/port.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/port.c
@@ -127,3 +127,111 @@ u32 mlx5e_port_speed2linkmodes(u32 speed)
return link_modes;
}
+
+int mlx5e_port_query_pbmc(struct mlx5_core_dev *mdev, void *out)
+{
+ int sz = MLX5_ST_SZ_BYTES(pbmc_reg);
+ void *in;
+ int err;
+
+ in = kzalloc(sz, GFP_KERNEL);
+ if (!in)
+ return -ENOMEM;
+
+ MLX5_SET(pbmc_reg, in, local_port, 1);
+ err = mlx5_core_access_reg(mdev, in, sz, out, sz, MLX5_REG_PBMC, 0, 0);
+
+ kfree(in);
+ return err;
+}
+
+int mlx5e_port_set_pbmc(struct mlx5_core_dev *mdev, void *in)
+{
+ int sz = MLX5_ST_SZ_BYTES(pbmc_reg);
+ void *out;
+ int err;
+
+ out = kzalloc(sz, GFP_KERNEL);
+ if (!out)
+ return -ENOMEM;
+
+ MLX5_SET(pbmc_reg, in, local_port, 1);
+ err = mlx5_core_access_reg(mdev, in, sz, out, sz, MLX5_REG_PBMC, 0, 1);
+
+ kfree(out);
+ return err;
+}
+
+/* buffer[i]: buffer that priority i mapped to */
+int mlx5e_port_query_priority2buffer(struct mlx5_core_dev *mdev, u8 *buffer)
+{
+ int sz = MLX5_ST_SZ_BYTES(pptb_reg);
+ u32 prio_x_buff;
+ void *out;
+ void *in;
+ int prio;
+ int err;
+
+ in = kzalloc(sz, GFP_KERNEL);
+ out = kzalloc(sz, GFP_KERNEL);
+ if (!in || !out) {
+ err = -ENOMEM;
+ goto out;
+ }
+
+ MLX5_SET(pptb_reg, in, local_port, 1);
+ err = mlx5_core_access_reg(mdev, in, sz, out, sz, MLX5_REG_PPTB, 0, 0);
+ if (err)
+ goto out;
+
+ prio_x_buff = MLX5_GET(pptb_reg, out, prio_x_buff);
+ for (prio = 0; prio < 8; prio++) {
+ buffer[prio] = (u8)(prio_x_buff >> (4 * prio)) & 0xF;
+ mlx5_core_dbg(mdev, "prio %d, buffer %d\n", prio, buffer[prio]);
+ }
+out:
+ kfree(in);
+ kfree(out);
+ return err;
+}
+
+int mlx5e_port_set_priority2buffer(struct mlx5_core_dev *mdev, u8 *buffer)
+{
+ int sz = MLX5_ST_SZ_BYTES(pptb_reg);
+ u32 prio_x_buff;
+ void *out;
+ void *in;
+ int prio;
+ int err;
+
+ in = kzalloc(sz, GFP_KERNEL);
+ out = kzalloc(sz, GFP_KERNEL);
+ if (!in || !out) {
+ err = -ENOMEM;
+ goto out;
+ }
+
+ /* First query the pptb register */
+ MLX5_SET(pptb_reg, in, local_port, 1);
+ err = mlx5_core_access_reg(mdev, in, sz, out, sz, MLX5_REG_PPTB, 0, 0);
+ if (err)
+ goto out;
+
+ memcpy(in, out, sz);
+ MLX5_SET(pptb_reg, in, local_port, 1);
+
+ /* Update the pm and prio_x_buff */
+ MLX5_SET(pptb_reg, in, pm, 0xFF);
+
+ prio_x_buff = 0;
+ for (prio = 0; prio < 8; prio++)
+ prio_x_buff |= (buffer[prio] << (4 * prio));
+ MLX5_SET(pptb_reg, in, prio_x_buff, prio_x_buff);
+
+ err = mlx5_core_access_reg(mdev, in, sz, out, sz, MLX5_REG_PPTB, 0, 1);
+
+out:
+ kfree(in);
+ kfree(out);
+ return err;
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/port.h b/drivers/net/ethernet/mellanox/mlx5/core/en/port.h
index 7aae38e98a65..f8cbd8194179 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/port.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/port.h
@@ -40,4 +40,9 @@ u32 mlx5e_port_ptys2speed(u32 eth_proto_oper);
int mlx5e_port_linkspeed(struct mlx5_core_dev *mdev, u32 *speed);
int mlx5e_port_max_linkspeed(struct mlx5_core_dev *mdev, u32 *speed);
u32 mlx5e_port_speed2linkmodes(u32 speed);
+
+int mlx5e_port_query_pbmc(struct mlx5_core_dev *mdev, void *out);
+int mlx5e_port_set_pbmc(struct mlx5_core_dev *mdev, void *in);
+int mlx5e_port_query_priority2buffer(struct mlx5_core_dev *mdev, u8 *buffer);
+int mlx5e_port_set_priority2buffer(struct mlx5_core_dev *mdev, u8 *buffer);
#endif
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index d703774982ca..92d292454351 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -124,6 +124,8 @@ enum {
MLX5_REG_PAOS = 0x5006,
MLX5_REG_PFCC = 0x5007,
MLX5_REG_PPCNT = 0x5008,
+ MLX5_REG_PPTB = 0x500b,
+ MLX5_REG_PBMC = 0x500c,
MLX5_REG_PMAOS = 0x5012,
MLX5_REG_PUDE = 0x5009,
MLX5_REG_PMPE = 0x5010,
diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index f687989d336b..edbddeaacc88 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -8788,6 +8788,41 @@ struct mlx5_ifc_qpts_reg_bits {
u8 trust_state[0x3];
};
+struct mlx5_ifc_pptb_reg_bits {
+ u8 reserved_at_0[0x2];
+ u8 mm[0x2];
+ u8 reserved_at_4[0x4];
+ u8 local_port[0x8];
+ u8 reserved_at_10[0x6];
+ u8 cm[0x1];
+ u8 um[0x1];
+ u8 pm[0x8];
+
+ u8 prio_x_buff[0x20];
+
+ u8 pm_msb[0x8];
+ u8 reserved_at_48[0x10];
+ u8 ctrl_buff[0x4];
+ u8 untagged_buff[0x4];
+};
+
+struct mlx5_ifc_pbmc_reg_bits {
+ u8 reserved_at_0[0x8];
+ u8 local_port[0x8];
+ u8 reserved_at_10[0x10];
+
+ u8 xoff_timer_value[0x10];
+ u8 xoff_refresh[0x10];
+
+ u8 reserved_at_40[0x9];
+ u8 fullness_threshold[0x7];
+ u8 port_buffer_size[0x10];
+
+ struct mlx5_ifc_bufferx_reg_bits buffer[10];
+
+ u8 reserved_at_2e0[0x40];
+};
+
struct mlx5_ifc_qtct_reg_bits {
u8 reserved_at_0[0x8];
u8 port_number[0x8];
--
2.17.0
^ permalink raw reply related
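Editorial sketch: the PPTB helpers in the patch above keep the priority to buffer mapping in one 32-bit field, four bits per priority. The snippet below is a minimal userspace illustration of that packing, not the driver code. The names pack_prio2buffer, unpack_prio2buffer and NUM_PRIO are made up for the example, and the explicit uint32_t cast and 0xF mask are additions that the patch itself does not have.

```c
#include <stdint.h>

#define NUM_PRIO 8 /* eight priorities, as in the patch; name is hypothetical */

/* Pack eight buffer indices into one word, one 4-bit nibble per priority.
 * Priority 0 lands in the lowest nibble, priority 7 in the highest. */
static uint32_t pack_prio2buffer(const uint8_t buffer[NUM_PRIO])
{
	uint32_t prio_x_buff = 0;
	int prio;

	for (prio = 0; prio < NUM_PRIO; prio++)
		prio_x_buff |= ((uint32_t)buffer[prio] & 0xF) << (4 * prio);

	return prio_x_buff;
}

/* Reverse operation: extract the nibble that belongs to each priority. */
static void unpack_prio2buffer(uint32_t prio_x_buff, uint8_t buffer[NUM_PRIO])
{
	int prio;

	for (prio = 0; prio < NUM_PRIO; prio++)
		buffer[prio] = (prio_x_buff >> (4 * prio)) & 0xF;
}
```

With the mapping from the cover example (priorities 0..7 mapped to buffers 0,2,5,7,1,2,3,6), the packed word is 0x63217520.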
* [net-next V2 5/6] net/mlx5e: Receive buffer configuration
From: Saeed Mahameed @ 2018-05-24 21:38 UTC (permalink / raw)
To: David S. Miller; +Cc: netdev, Huy Nguyen, Saeed Mahameed
In-Reply-To: <20180524213820.5910-1-saeedm@mellanox.com>
From: Huy Nguyen <huyn@mellanox.com>
Add APIs for buffer configuration based on the changes in
pfc configuration, cable len, buffer size configuration,
and priority to buffer mapping.
Note that the xoff formula is as follows:
xoff = (301 + 2.16 * len [m]) * speed [Gbps] + 2.72 * MTU [B]
xoff_threshold = buffer_size - xoff
xon_threshold = xoff_threshold - MTU
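Editorial sketch: the formula above can be tried out in userspace with the same integer arithmetic that calculate_xoff() uses in this patch. This is only an illustration under assumptions: calc_xoff, calc_thresholds and struct thresholds are hypothetical names, the link speed is assumed to be given in Mbps (hence the division by 1000), and 128 is the buffer cell size implied by MLX5E_BUFFER_CELL_SHIFT.

```c
#include <stdint.h>

/* Integer form of xoff = (301 + 2.16 * len[m]) * speed[Gbps] + 2.72 * MTU[B].
 * The 2.16 * len term is truncated before it is multiplied by the speed,
 * exactly as in the patch. */
static uint32_t calc_xoff(uint32_t cable_len_m, uint32_t speed_mbps,
			  uint32_t mtu)
{
	return (301 + 216 * cable_len_m / 100) * speed_mbps / 1000 +
	       272 * mtu / 100;
}

struct thresholds {
	uint32_t xoff; /* xoff_threshold = buffer_size - xoff */
	uint32_t xon;  /* xon_threshold = xoff_threshold - MTU */
};

/* Returns 0 on success, -1 if the buffer cannot hold xoff + MTU + one cell. */
static int calc_thresholds(uint32_t buffer_size, uint32_t xoff, uint32_t mtu,
			   struct thresholds *t)
{
	if (buffer_size < xoff + mtu + 128)
		return -1;

	t->xoff = buffer_size - xoff;
	t->xon = t->xoff - mtu;
	return 0;
}
```

For a 7 m cable, a 100 Gbps link and an MTU of 1500 this gives xoff = 35680, so an 87296 byte buffer gets xoff_threshold = 51616 and xon_threshold = 50116.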
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
.../net/ethernet/mellanox/mlx5/core/Makefile | 2 +-
drivers/net/ethernet/mellanox/mlx5/core/en.h | 5 +
.../mellanox/mlx5/core/en/port_buffer.c | 327 ++++++++++++++++++
.../mellanox/mlx5/core/en/port_buffer.h | 75 ++++
4 files changed, 408 insertions(+), 1 deletion(-)
create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.c
create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.h
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index 651cf3640420..9efbf193ad5a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -21,7 +21,7 @@ mlx5_core-$(CONFIG_MLX5_MPFS) += lib/mpfs.o
mlx5_core-$(CONFIG_MLX5_ESWITCH) += eswitch.o eswitch_offloads.o en_rep.o en_tc.o
-mlx5_core-$(CONFIG_MLX5_CORE_EN_DCB) += en_dcbnl.o
+mlx5_core-$(CONFIG_MLX5_CORE_EN_DCB) += en_dcbnl.o en/port_buffer.o
mlx5_core-$(CONFIG_MLX5_CORE_IPOIB) += ipoib/ipoib.o ipoib/ethtool.o ipoib/ipoib_vlan.o
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index d13a86a1d702..9ab7158a7ce7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -65,6 +65,7 @@ struct page_pool;
#define MLX5E_HW2SW_MTU(params, hwmtu) ((hwmtu) - ((params)->hard_mtu))
#define MLX5E_SW2HW_MTU(params, swmtu) ((swmtu) + ((params)->hard_mtu))
+#define MLX5E_MAX_PRIORITY 8
#define MLX5E_MAX_DSCP 64
#define MLX5E_MAX_NUM_TC 8
@@ -275,6 +276,10 @@ struct mlx5e_dcbx {
/* The only setting that cannot be read from FW */
u8 tc_tsa[IEEE_8021QAZ_MAX_TCS];
u8 cap;
+
+ /* Buffer configuration */
+ u32 cable_len;
+ u32 xoff;
};
struct mlx5e_dcbx_dp {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.c b/drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.c
new file mode 100644
index 000000000000..c047da8752da
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.c
@@ -0,0 +1,327 @@
+/*
+ * Copyright (c) 2018, Mellanox Technologies. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+#include "port_buffer.h"
+
+int mlx5e_port_query_buffer(struct mlx5e_priv *priv,
+ struct mlx5e_port_buffer *port_buffer)
+{
+ struct mlx5_core_dev *mdev = priv->mdev;
+ int sz = MLX5_ST_SZ_BYTES(pbmc_reg);
+ u32 total_used = 0;
+ void *buffer;
+ void *out;
+ int err;
+ int i;
+
+ out = kzalloc(sz, GFP_KERNEL);
+ if (!out)
+ return -ENOMEM;
+
+ err = mlx5e_port_query_pbmc(mdev, out);
+ if (err)
+ goto out;
+
+ for (i = 0; i < MLX5E_MAX_BUFFER; i++) {
+ buffer = MLX5_ADDR_OF(pbmc_reg, out, buffer[i]);
+ port_buffer->buffer[i].lossy =
+ MLX5_GET(bufferx_reg, buffer, lossy);
+ port_buffer->buffer[i].epsb =
+ MLX5_GET(bufferx_reg, buffer, epsb);
+ port_buffer->buffer[i].size =
+ MLX5_GET(bufferx_reg, buffer, size) << MLX5E_BUFFER_CELL_SHIFT;
+ port_buffer->buffer[i].xon =
+ MLX5_GET(bufferx_reg, buffer, xon_threshold) << MLX5E_BUFFER_CELL_SHIFT;
+ port_buffer->buffer[i].xoff =
+ MLX5_GET(bufferx_reg, buffer, xoff_threshold) << MLX5E_BUFFER_CELL_SHIFT;
+ total_used += port_buffer->buffer[i].size;
+
+ mlx5e_dbg(HW, priv, "buffer %d: size=%d, xon=%d, xoff=%d, epsb=%d, lossy=%d\n", i,
+ port_buffer->buffer[i].size,
+ port_buffer->buffer[i].xon,
+ port_buffer->buffer[i].xoff,
+ port_buffer->buffer[i].epsb,
+ port_buffer->buffer[i].lossy);
+ }
+
+ port_buffer->port_buffer_size =
+ MLX5_GET(pbmc_reg, out, port_buffer_size) << MLX5E_BUFFER_CELL_SHIFT;
+ port_buffer->spare_buffer_size =
+ port_buffer->port_buffer_size - total_used;
+
+ mlx5e_dbg(HW, priv, "total buffer size=%d, spare buffer size=%d\n",
+ port_buffer->port_buffer_size,
+ port_buffer->spare_buffer_size);
+out:
+ kfree(out);
+ return err;
+}
+
+static int port_set_buffer(struct mlx5e_priv *priv,
+ struct mlx5e_port_buffer *port_buffer)
+{
+ struct mlx5_core_dev *mdev = priv->mdev;
+ int sz = MLX5_ST_SZ_BYTES(pbmc_reg);
+ void *buffer;
+ void *in;
+ int err;
+ int i;
+
+ in = kzalloc(sz, GFP_KERNEL);
+ if (!in)
+ return -ENOMEM;
+
+ err = mlx5e_port_query_pbmc(mdev, in);
+ if (err)
+ goto out;
+
+ for (i = 0; i < MLX5E_MAX_BUFFER; i++) {
+ buffer = MLX5_ADDR_OF(pbmc_reg, in, buffer[i]);
+
+ MLX5_SET(bufferx_reg, buffer, size,
+ port_buffer->buffer[i].size >> MLX5E_BUFFER_CELL_SHIFT);
+ MLX5_SET(bufferx_reg, buffer, lossy,
+ port_buffer->buffer[i].lossy);
+ MLX5_SET(bufferx_reg, buffer, xoff_threshold,
+ port_buffer->buffer[i].xoff >> MLX5E_BUFFER_CELL_SHIFT);
+ MLX5_SET(bufferx_reg, buffer, xon_threshold,
+ port_buffer->buffer[i].xon >> MLX5E_BUFFER_CELL_SHIFT);
+ }
+
+ err = mlx5e_port_set_pbmc(mdev, in);
+out:
+ kfree(in);
+ return err;
+}
+
+/* xoff = ((301+2.16 * len [m]) * speed [Gbps] + 2.72 MTU [B]) */
+static u32 calculate_xoff(struct mlx5e_priv *priv, unsigned int mtu)
+{
+ u32 speed;
+ u32 xoff;
+ int err;
+
+ err = mlx5e_port_linkspeed(priv->mdev, &speed);
+ if (err)
+ return 0;
+
+ xoff = (301 + 216 * priv->dcbx.cable_len / 100) * speed / 1000 + 272 * mtu / 100;
+
+ mlx5e_dbg(HW, priv, "%s: xoff=%d\n", __func__, xoff);
+ return xoff;
+}
+
+static int update_xoff_threshold(struct mlx5e_port_buffer *port_buffer,
+ u32 xoff, unsigned int mtu)
+{
+ int i;
+
+ for (i = 0; i < MLX5E_MAX_BUFFER; i++) {
+ if (port_buffer->buffer[i].lossy) {
+ port_buffer->buffer[i].xoff = 0;
+ port_buffer->buffer[i].xon = 0;
+ continue;
+ }
+
+ if (port_buffer->buffer[i].size <
+ (xoff + mtu + (1 << MLX5E_BUFFER_CELL_SHIFT)))
+ return -ENOMEM;
+
+ port_buffer->buffer[i].xoff = port_buffer->buffer[i].size - xoff;
+ port_buffer->buffer[i].xon = port_buffer->buffer[i].xoff - mtu;
+ }
+
+ return 0;
+}
+
+/**
+ * update_buffer_lossy()
+ * mtu: device's MTU
+ * pfc_en: <input> current pfc configuration
+ * buffer: <input> current prio to buffer mapping
+ * xoff: <input> xoff value
+ * port_buffer: <output> port receive buffer configuration
+ * change: <output>
+ *
+ * Update buffer configuration based on pfc configuration and priority
+ * to buffer mapping.
+ * Buffer's lossy bit is changed to:
+ * lossless if there is at least one PFC enabled priority mapped to this buffer
+ * lossy if all priorities mapped to this buffer are PFC disabled
+ *
+ * Return:
+ * Return 0 if no error.
+ * Set change to true if buffer configuration is modified.
+ */
+static int update_buffer_lossy(unsigned int mtu,
+ u8 pfc_en, u8 *buffer, u32 xoff,
+ struct mlx5e_port_buffer *port_buffer,
+ bool *change)
+{
+ bool changed = false;
+ u8 lossy_count;
+ u8 prio_count;
+ u8 lossy;
+ int prio;
+ int err;
+ int i;
+
+ for (i = 0; i < MLX5E_MAX_BUFFER; i++) {
+ prio_count = 0;
+ lossy_count = 0;
+
+ for (prio = 0; prio < MLX5E_MAX_PRIORITY; prio++) {
+ if (buffer[prio] != i)
+ continue;
+
+ prio_count++;
+ lossy_count += !(pfc_en & (1 << prio));
+ }
+
+ if (lossy_count == prio_count)
+ lossy = 1;
+ else /* lossy_count < prio_count */
+ lossy = 0;
+
+ if (lossy != port_buffer->buffer[i].lossy) {
+ port_buffer->buffer[i].lossy = lossy;
+ changed = true;
+ }
+ }
+
+ if (changed) {
+ err = update_xoff_threshold(port_buffer, xoff, mtu);
+ if (err)
+ return err;
+
+ *change = true;
+ }
+
+ return 0;
+}
+
+int mlx5e_port_manual_buffer_config(struct mlx5e_priv *priv,
+ u32 change, unsigned int mtu,
+ struct ieee_pfc *pfc,
+ u32 *buffer_size,
+ u8 *prio2buffer)
+{
+ struct mlx5e_port_buffer port_buffer;
+ u32 xoff = calculate_xoff(priv, mtu);
+ bool update_prio2buffer = false;
+ u8 buffer[MLX5E_MAX_PRIORITY];
+ bool update_buffer = false;
+ u32 total_used = 0;
+ u8 curr_pfc_en;
+ int err;
+ int i;
+
+ mlx5e_dbg(HW, priv, "%s: change=%x\n", __func__, change);
+
+ err = mlx5e_port_query_buffer(priv, &port_buffer);
+ if (err)
+ return err;
+
+ if (change & MLX5E_PORT_BUFFER_CABLE_LEN) {
+ update_buffer = true;
+ err = update_xoff_threshold(&port_buffer, xoff, mtu);
+ if (err)
+ return err;
+ }
+
+ if (change & MLX5E_PORT_BUFFER_PFC) {
+ err = mlx5e_port_query_priority2buffer(priv->mdev, buffer);
+ if (err)
+ return err;
+
+ err = update_buffer_lossy(mtu, pfc->pfc_en, buffer, xoff,
+ &port_buffer, &update_buffer);
+ if (err)
+ return err;
+ }
+
+ if (change & MLX5E_PORT_BUFFER_PRIO2BUFFER) {
+ update_prio2buffer = true;
+ err = mlx5_query_port_pfc(priv->mdev, &curr_pfc_en, NULL);
+ if (err)
+ return err;
+
+ err = update_buffer_lossy(mtu, curr_pfc_en, prio2buffer, xoff,
+ &port_buffer, &update_buffer);
+ if (err)
+ return err;
+ }
+
+ if (change & MLX5E_PORT_BUFFER_SIZE) {
+ for (i = 0; i < MLX5E_MAX_BUFFER; i++) {
+ mlx5e_dbg(HW, priv, "%s: buffer[%d]=%d\n", __func__, i, buffer_size[i]);
+ if (!port_buffer.buffer[i].lossy && !buffer_size[i]) {
+ mlx5e_dbg(HW, priv, "%s: lossless buffer[%d] size cannot be zero\n",
+ __func__, i);
+ return -EINVAL;
+ }
+
+ port_buffer.buffer[i].size = buffer_size[i];
+ total_used += buffer_size[i];
+ }
+
+ mlx5e_dbg(HW, priv, "%s: total buffer requested=%d\n", __func__, total_used);
+
+ if (total_used > port_buffer.port_buffer_size)
+ return -EINVAL;
+
+ update_buffer = true;
+ err = update_xoff_threshold(&port_buffer, xoff, mtu);
+ if (err)
+ return err;
+ }
+
+ /* Need to update buffer configuration if xoff value is changed */
+ if (!update_buffer && xoff != priv->dcbx.xoff) {
+ update_buffer = true;
+ err = update_xoff_threshold(&port_buffer, xoff, mtu);
+ if (err)
+ return err;
+ }
+ priv->dcbx.xoff = xoff;
+
+ /* Apply the settings */
+ if (update_buffer) {
+ err = port_set_buffer(priv, &port_buffer);
+ if (err)
+ return err;
+ }
+
+ if (update_prio2buffer)
+ err = mlx5e_port_set_priority2buffer(priv->mdev, prio2buffer);
+
+ return err;
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.h b/drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.h
new file mode 100644
index 000000000000..34f55b81a0de
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/port_buffer.h
@@ -0,0 +1,75 @@
+/*
+ * Copyright (c) 2018, Mellanox Technologies. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+#ifndef __MLX5_EN_PORT_BUFFER_H__
+#define __MLX5_EN_PORT_BUFFER_H__
+
+#include "en.h"
+#include "port.h"
+
+#define MLX5E_MAX_BUFFER 8
+#define MLX5E_BUFFER_CELL_SHIFT 7
+#define MLX5E_DEFAULT_CABLE_LEN 7 /* 7 meters */
+
+#define MLX5_BUFFER_SUPPORTED(mdev) (MLX5_CAP_GEN(mdev, pcam_reg) && \
+ MLX5_CAP_PCAM_REG(mdev, pbmc) && \
+ MLX5_CAP_PCAM_REG(mdev, pptb))
+
+enum {
+ MLX5E_PORT_BUFFER_CABLE_LEN = BIT(0),
+ MLX5E_PORT_BUFFER_PFC = BIT(1),
+ MLX5E_PORT_BUFFER_PRIO2BUFFER = BIT(2),
+ MLX5E_PORT_BUFFER_SIZE = BIT(3),
+};
+
+struct mlx5e_bufferx_reg {
+ u8 lossy;
+ u8 epsb;
+ u32 size;
+ u32 xoff;
+ u32 xon;
+};
+
+struct mlx5e_port_buffer {
+ u32 port_buffer_size;
+ u32 spare_buffer_size;
+ struct mlx5e_bufferx_reg buffer[MLX5E_MAX_BUFFER];
+};
+
+int mlx5e_port_manual_buffer_config(struct mlx5e_priv *priv,
+ u32 change, unsigned int mtu,
+ struct ieee_pfc *pfc,
+ u32 *buffer_size,
+ u8 *prio2buffer);
+
+int mlx5e_port_query_buffer(struct mlx5e_priv *priv,
+ struct mlx5e_port_buffer *port_buffer);
+#endif
--
2.17.0
^ permalink raw reply related
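Editorial sketch: the lossy/lossless rule documented for update_buffer_lossy() in the patch above can be restated as a single predicate. The code below is a simplified userspace equivalent, not the driver function. buffer_is_lossy and NUM_PRIO are hypothetical names. Note that a buffer with no priority mapped to it comes out as lossy, which matches the patch (lossy_count == prio_count == 0).

```c
#include <stdint.h>

#define NUM_PRIO 8 /* eight priorities, as in the patch; name is hypothetical */

/* A buffer is lossless if at least one PFC-enabled priority maps to it,
 * and lossy otherwise. Returns 1 for lossy, 0 for lossless. */
static int buffer_is_lossy(uint8_t pfc_en, const uint8_t prio2buffer[NUM_PRIO],
			   int buf)
{
	int prio;

	for (prio = 0; prio < NUM_PRIO; prio++) {
		if (prio2buffer[prio] != buf)
			continue;
		if (pfc_en & (1u << prio))
			return 0; /* one PFC-enabled priority is enough */
	}

	return 1;
}
```

For example, with PFC enabled only on priority 3 and that priority mapped to buffer 1, buffer 1 is lossless while buffer 0 and all unused buffers stay lossy.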
* [net-next V2 6/6] net/mlx5e: Receive buffer support for DCBX
From: Saeed Mahameed @ 2018-05-24 21:38 UTC (permalink / raw)
To: David S. Miller; +Cc: netdev, Huy Nguyen, Saeed Mahameed
In-Reply-To: <20180524213820.5910-1-saeedm@mellanox.com>
From: Huy Nguyen <huyn@mellanox.com>
Add dcbnl's set/get buffer configuration callbacks that allow the user to
set/get the buffer size configuration and the priority to buffer mapping.
By default, firmware controls the receive buffer configuration and the
priority to buffer mapping based on changes in the pfc settings. When the
set buffer callback is triggered, the buffer configuration changes to
manual mode. In manual mode, the mlx5 driver adjusts the buffer
configuration based on changes in the pfc settings.
ConnectX buffer stride is 128 bytes. If the buffer size is not a multiple
of 128, it is rounded down to the nearest multiple of
128.
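Editorial sketch: the rounding described above follows from converting bytes to 128 byte cells with a right shift and back with a left shift, as the series does with MLX5E_BUFFER_CELL_SHIFT. The helper names below are hypothetical and the snippet is only a userspace illustration.

```c
#include <stdint.h>

#define CELL_SHIFT 7 /* 128-byte cells; mirrors MLX5E_BUFFER_CELL_SHIFT */

/* Bytes to cells: the low seven bits are dropped, so this rounds down. */
static uint32_t bytes_to_cells(uint32_t bytes)
{
	return bytes >> CELL_SHIFT;
}

/* Cells back to bytes. */
static uint32_t cells_to_bytes(uint32_t cells)
{
	return cells << CELL_SHIFT;
}

/* The size that is effectively programmed for a requested byte count. */
static uint32_t round_down_to_cell(uint32_t bytes)
{
	return cells_to_bytes(bytes_to_cells(bytes));
}
```

A request for 87300 bytes therefore ends up as 87296 bytes (682 cells), and anything below 128 bytes becomes zero.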
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
drivers/net/ethernet/mellanox/mlx5/core/en.h | 1 +
.../ethernet/mellanox/mlx5/core/en_dcbnl.c | 132 +++++++++++++++++-
2 files changed, 126 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 9ab7158a7ce7..c5c7a6d687ff 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -278,6 +278,7 @@ struct mlx5e_dcbx {
u8 cap;
/* Buffer configuration */
+ bool manual_buffer;
u32 cable_len;
u32 xoff;
};
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
index c641d5656b2d..0a52f31fef37 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_dcbnl.c
@@ -32,8 +32,8 @@
#include <linux/device.h>
#include <linux/netdevice.h>
#include "en.h"
-
-#define MLX5E_MAX_PRIORITY 8
+#include "en/port.h"
+#include "en/port_buffer.h"
#define MLX5E_100MB (100000)
#define MLX5E_1GB (1000000)
@@ -41,6 +41,9 @@
#define MLX5E_CEE_STATE_UP 1
#define MLX5E_CEE_STATE_DOWN 0
+/* Max supported cable length is 1000 meters */
+#define MLX5E_MAX_CABLE_LENGTH 1000
+
enum {
MLX5E_VENDOR_TC_GROUP_NUM = 7,
MLX5E_LOWEST_PRIO_GROUP = 0,
@@ -338,6 +341,9 @@ static int mlx5e_dcbnl_ieee_getpfc(struct net_device *dev,
pfc->indications[i] = PPORT_PER_PRIO_GET(pstats, i, rx_pause);
}
+ if (MLX5_BUFFER_SUPPORTED(mdev))
+ pfc->delay = priv->dcbx.cable_len;
+
return mlx5_query_port_pfc(mdev, &pfc->pfc_en, NULL);
}
@@ -346,16 +352,39 @@ static int mlx5e_dcbnl_ieee_setpfc(struct net_device *dev,
{
struct mlx5e_priv *priv = netdev_priv(dev);
struct mlx5_core_dev *mdev = priv->mdev;
+ u32 old_cable_len = priv->dcbx.cable_len;
+ struct ieee_pfc pfc_new;
+ u32 changed = 0;
u8 curr_pfc_en;
- int ret;
+ int ret = 0;
+ /* pfc_en */
mlx5_query_port_pfc(mdev, &curr_pfc_en, NULL);
+ if (pfc->pfc_en != curr_pfc_en) {
+ ret = mlx5_set_port_pfc(mdev, pfc->pfc_en, pfc->pfc_en);
+ if (ret)
+ return ret;
+ mlx5_toggle_port_link(mdev);
+ changed |= MLX5E_PORT_BUFFER_PFC;
+ }
- if (pfc->pfc_en == curr_pfc_en)
- return 0;
+ if (pfc->delay &&
+ pfc->delay < MLX5E_MAX_CABLE_LENGTH &&
+ pfc->delay != priv->dcbx.cable_len) {
+ priv->dcbx.cable_len = pfc->delay;
+ changed |= MLX5E_PORT_BUFFER_CABLE_LEN;
+ }
- ret = mlx5_set_port_pfc(mdev, pfc->pfc_en, pfc->pfc_en);
- mlx5_toggle_port_link(mdev);
+ if (MLX5_BUFFER_SUPPORTED(mdev)) {
+ pfc_new.pfc_en = (changed & MLX5E_PORT_BUFFER_PFC) ? pfc->pfc_en : curr_pfc_en;
+ if (priv->dcbx.manual_buffer)
+ ret = mlx5e_port_manual_buffer_config(priv, changed,
+ dev->mtu, &pfc_new,
+ NULL, NULL);
+
+ if (ret && (changed & MLX5E_PORT_BUFFER_CABLE_LEN))
+ priv->dcbx.cable_len = old_cable_len;
+ }
if (!ret) {
mlx5e_dbg(HW, priv,
@@ -873,6 +902,90 @@ static void mlx5e_dcbnl_setpfcstate(struct net_device *netdev, u8 state)
cee_cfg->pfc_enable = state;
}
+static int mlx5e_dcbnl_getbuffer(struct net_device *dev,
+ struct dcbnl_buffer *dcb_buffer)
+{
+ struct mlx5e_priv *priv = netdev_priv(dev);
+ struct mlx5_core_dev *mdev = priv->mdev;
+ struct mlx5e_port_buffer port_buffer;
+ u8 buffer[MLX5E_MAX_PRIORITY];
+ int i, err;
+
+ if (!MLX5_BUFFER_SUPPORTED(mdev))
+ return -EOPNOTSUPP;
+
+ err = mlx5e_port_query_priority2buffer(mdev, buffer);
+ if (err)
+ return err;
+
+ for (i = 0; i < MLX5E_MAX_PRIORITY; i++)
+ dcb_buffer->prio2buffer[i] = buffer[i];
+
+ err = mlx5e_port_query_buffer(priv, &port_buffer);
+ if (err)
+ return err;
+
+ for (i = 0; i < MLX5E_MAX_BUFFER; i++)
+ dcb_buffer->buffer_size[i] = port_buffer.buffer[i].size;
+ dcb_buffer->total_size = port_buffer.port_buffer_size;
+
+ return 0;
+}
+
+static int mlx5e_dcbnl_setbuffer(struct net_device *dev,
+ struct dcbnl_buffer *dcb_buffer)
+{
+ struct mlx5e_priv *priv = netdev_priv(dev);
+ struct mlx5_core_dev *mdev = priv->mdev;
+ struct mlx5e_port_buffer port_buffer;
+ u8 old_prio2buffer[MLX5E_MAX_PRIORITY];
+ u32 *buffer_size = NULL;
+ u8 *prio2buffer = NULL;
+ u32 changed = 0;
+ int i, err;
+
+ if (!MLX5_BUFFER_SUPPORTED(mdev))
+ return -EOPNOTSUPP;
+
+ for (i = 0; i < DCBX_MAX_BUFFERS; i++)
+ mlx5_core_dbg(mdev, "buffer[%d]=%d\n", i, dcb_buffer->buffer_size[i]);
+
+ for (i = 0; i < MLX5E_MAX_PRIORITY; i++)
+ mlx5_core_dbg(mdev, "priority %d buffer%d\n", i, dcb_buffer->prio2buffer[i]);
+
+ err = mlx5e_port_query_priority2buffer(mdev, old_prio2buffer);
+ if (err)
+ return err;
+
+ for (i = 0; i < MLX5E_MAX_PRIORITY; i++) {
+ if (dcb_buffer->prio2buffer[i] != old_prio2buffer[i]) {
+ changed |= MLX5E_PORT_BUFFER_PRIO2BUFFER;
+ prio2buffer = dcb_buffer->prio2buffer;
+ break;
+ }
+ }
+
+ err = mlx5e_port_query_buffer(priv, &port_buffer);
+ if (err)
+ return err;
+
+ for (i = 0; i < MLX5E_MAX_BUFFER; i++) {
+ if (port_buffer.buffer[i].size != dcb_buffer->buffer_size[i]) {
+ changed |= MLX5E_PORT_BUFFER_SIZE;
+ buffer_size = dcb_buffer->buffer_size;
+ break;
+ }
+ }
+
+ if (!changed)
+ return 0;
+
+ priv->dcbx.manual_buffer = true;
+ err = mlx5e_port_manual_buffer_config(priv, changed, dev->mtu, NULL,
+ buffer_size, prio2buffer);
+ return err;
+}
+
const struct dcbnl_rtnl_ops mlx5e_dcbnl_ops = {
.ieee_getets = mlx5e_dcbnl_ieee_getets,
.ieee_setets = mlx5e_dcbnl_ieee_setets,
@@ -884,6 +997,8 @@ const struct dcbnl_rtnl_ops mlx5e_dcbnl_ops = {
.ieee_delapp = mlx5e_dcbnl_ieee_delapp,
.getdcbx = mlx5e_dcbnl_getdcbx,
.setdcbx = mlx5e_dcbnl_setdcbx,
+ .dcbnl_getbuffer = mlx5e_dcbnl_getbuffer,
+ .dcbnl_setbuffer = mlx5e_dcbnl_setbuffer,
/* CEE interfaces */
.setall = mlx5e_dcbnl_setall,
@@ -1091,5 +1206,8 @@ void mlx5e_dcbnl_initialize(struct mlx5e_priv *priv)
if (priv->dcbx.mode == MLX5E_DCBX_PARAM_VER_OPER_HOST)
priv->dcbx.cap |= DCB_CAP_DCBX_HOST;
+ priv->dcbx.manual_buffer = false;
+ priv->dcbx.cable_len = MLX5E_DEFAULT_CABLE_LEN;
+
mlx5e_ets_init(priv);
}
--
2.17.0
^ permalink raw reply related
* Re: [PATCH net-next] net: phy: realtek: add suspend/resume callbacks for RTL8211B
From: Heiner Kallweit @ 2018-05-24 21:42 UTC (permalink / raw)
To: Andrew Lunn
Cc: David Miller, Realtek linux nic maintainers, Hau,
Florian Fainelli, netdev@vger.kernel.org, Kevin Hao
In-Reply-To: <20180524205302.GB6762@lunn.ch>
On 24.05.2018 at 22:53, Andrew Lunn wrote:
> On Thu, May 24, 2018 at 10:40:12PM +0200, Heiner Kallweit wrote:
>> Add RTL8211B suspend / resume callbacks.
>>
>> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
>> ---
>> This patch is based on my knowledge of the r8169 driver, and on some
>> guessing. Therefore I'd appreciate a confirmation from Realtek.
>>
>> The integrated PHY in some chips supported by the r8169 driver uses
>> a special sequence for power-down/-up. I have a board with a RTL8168D
>> network chip (one of the chips using the special sequence) and there
>> the PHY identifies as RTL8211B. So my guess is that this applies also
>> to external RTL8211B PHY's.
>>
>> A hint for RTL8211B requiring a special sequence is that no suspend/
>> resume callbacks are defined yet in the Realtek PHY driver.
>> Last but not least the non-standard usage of register MII_MMD_DATA
>> is in line with the description of patch 0231b1a074c6.
>> ("net: phy: realtek: Use the dummy stubs for MMD register access for rtl8211b")
>> ---
>> drivers/net/phy/realtek.c | 16 ++++++++++++++++
>> 1 file changed, 16 insertions(+)
>>
>> diff --git a/drivers/net/phy/realtek.c b/drivers/net/phy/realtek.c
>> index 9f48ecf9c..082fb40c6 100644
>> --- a/drivers/net/phy/realtek.c
>> +++ b/drivers/net/phy/realtek.c
>> @@ -145,6 +145,20 @@ static int rtl8211f_config_init(struct phy_device *phydev)
>> return phy_modify_paged(phydev, 0xd08, 0x11, RTL8211F_TX_DELAY, val);
>> }
>>
>> +static int rtl8211b_suspend(struct phy_device *phydev)
>> +{
>> + phy_write(phydev, MII_MMD_DATA, BIT(9));
>
> Hi Heiner
>
> Using it like this suggests it is not actually MMD_DATA, it is
> something else which just happens to use the same address as the
> optional MMD_DATA. To make this clearer, it would be good to add
> #defines for both the register address and this BIT(9). Is there any
> vendor code you know of which might give you a clue for appropriate
> names?
>
Vendor code (Realtek r8168 driver) just writes value 0x0200 to
register 0x0E. I also would have preferred to assign proper
names to register and bit. Maybe the Realtek people on cc can
provide some information.
> I guess this device also does not support EEE? Does phy_init_eee()
> correctly figure this out? Is there a chance calling phy_init_eee()
> might trigger a suspend?
>
Right, EEE isn't supported. phy_init_eee() figures this out correctly
because read_mmd callback is set to new genphy_read_mmd_unsupported.
Heiner
> Andrew
>
^ permalink raw reply
* [pull request][net 0/2] Mellanox, mlx5 fixes 2018-05-24
From: Saeed Mahameed @ 2018-05-24 21:53 UTC (permalink / raw)
To: David S. Miller; +Cc: netdev, Saeed Mahameed
Hi Dave,
This series includes two mlx5 fixes.
1) Add FCS data to checksum complete when required, from Eran Ben
Elisha.
2) Fix a race in IPSec sandbox QP commands, from Yossi Kuperman.
Please pull and let me know if there's any problem.
for -stable v4.15
("net/mlx5e: When RXFCS is set, add FCS data into checksum calculation")
Thanks,
Saeed.
---
git format-pullreq $NTAG "for-next" $BASE $TARGET $NTAG
The following changes since commit d546b67cda015fb92bfee93d5dc0ceadb91deaee:
net/mlx4: Fix irq-unsafe spinlock usage (2018-05-23 15:48:58 -0400)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git tags/mlx5-fixes-2018-05-24
for you to fetch changes up to 1dcbc01f73f9abc4779f71eae5e6dc61bee37229:
net/mlx5: IPSec, Fix a race between concurrent sandbox QP commands (2018-05-24 14:40:40 -0700)
----------------------------------------------------------------
mlx5-fixes-2018-05-24
----------------------------------------------------------------
Eran Ben Elisha (1):
net/mlx5e: When RXFCS is set, add FCS data into checksum calculation
Yossi Kuperman (1):
net/mlx5: IPSec, Fix a race between concurrent sandbox QP commands
drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 42 ++++++++++++++++++++++
.../net/ethernet/mellanox/mlx5/core/fpga/ipsec.c | 12 +++----
2 files changed, 47 insertions(+), 7 deletions(-)
^ permalink raw reply
* [net 1/2] net/mlx5e: When RXFCS is set, add FCS data into checksum calculation
From: Saeed Mahameed @ 2018-05-24 21:53 UTC (permalink / raw)
To: David S. Miller; +Cc: netdev, Eran Ben Elisha, Saeed Mahameed
In-Reply-To: <20180524215313.7605-1-saeedm@mellanox.com>
From: Eran Ben Elisha <eranbe@mellanox.com>
When the RXFCS feature is enabled, the HW does not strip the FCS data;
however, the FCS is not included in the checksum calculated by the HW.
Fix that by manually calculating the FCS checksum and adding it to the SKB
checksum field.
Add a helper function to find the FCS data for all SKB forms (linear,
one fragment or more).
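Editorial sketch: the interesting case for such a helper is an FCS that straddles two memory areas, with the last area holding fewer than four bytes. The snippet below illustrates that gathering step on plain byte arrays. It is not the driver helper: gather_fcs and FCS_LEN are hypothetical names, and the caller is assumed to guarantee that the two areas together hold at least four bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FCS_LEN 4 /* Ethernet FCS length (ETH_FCS_LEN in the kernel) */

/* Copy the last FCS_LEN bytes of a frame whose tail may be split between
 * the end of 'prev' and the start of 'last'.
 * Assumes prev_len + last_len >= FCS_LEN. */
static void gather_fcs(const uint8_t *prev, size_t prev_len,
		       const uint8_t *last, size_t last_len,
		       uint8_t fcs[FCS_LEN])
{
	size_t bytes_in_prev;

	/* Whole FCS sits in the last area. */
	if (last_len >= FCS_LEN) {
		memcpy(fcs, last + last_len - FCS_LEN, FCS_LEN);
		return;
	}

	/* Leading part comes from the tail of the previous area... */
	bytes_in_prev = FCS_LEN - last_len;
	memcpy(fcs, prev + prev_len - bytes_in_prev, bytes_in_prev);
	/* ...and the remainder from the start of the last area. */
	memcpy(fcs + bytes_in_prev, last, last_len);
}
```

The bytes are kept in wire order, which is why the patch can treat the result as a big-endian 32-bit value.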
Fixes: 102722fc6832 ("net/mlx5e: Add support for RXFCS feature flag")
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
.../net/ethernet/mellanox/mlx5/core/en_rx.c | 42 +++++++++++++++++++
1 file changed, 42 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 176645762e49..1ff0b0e93804 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -615,6 +615,45 @@ static inline bool is_last_ethertype_ip(struct sk_buff *skb, int *network_depth)
return (ethertype == htons(ETH_P_IP) || ethertype == htons(ETH_P_IPV6));
}
+static __be32 mlx5e_get_fcs(struct sk_buff *skb)
+{
+ int last_frag_sz, bytes_in_prev, nr_frags;
+ u8 *fcs_p1, *fcs_p2;
+ skb_frag_t *last_frag;
+ __be32 fcs_bytes;
+
+ if (!skb_is_nonlinear(skb))
+ return *(__be32 *)(skb->data + skb->len - ETH_FCS_LEN);
+
+ nr_frags = skb_shinfo(skb)->nr_frags;
+ last_frag = &skb_shinfo(skb)->frags[nr_frags - 1];
+ last_frag_sz = skb_frag_size(last_frag);
+
+ /* If all FCS data is in last frag */
+ if (last_frag_sz >= ETH_FCS_LEN)
+ return *(__be32 *)(skb_frag_address(last_frag) +
+ last_frag_sz - ETH_FCS_LEN);
+
+ fcs_p2 = (u8 *)skb_frag_address(last_frag);
+ bytes_in_prev = ETH_FCS_LEN - last_frag_sz;
+
+ /* Find where the other part of the FCS is - Linear or another frag */
+ if (nr_frags == 1) {
+ fcs_p1 = skb_tail_pointer(skb);
+ } else {
+ skb_frag_t *prev_frag = &skb_shinfo(skb)->frags[nr_frags - 2];
+
+ fcs_p1 = skb_frag_address(prev_frag) +
+ skb_frag_size(prev_frag);
+ }
+ fcs_p1 -= bytes_in_prev;
+
+ memcpy(&fcs_bytes, fcs_p1, bytes_in_prev);
+ memcpy(((u8 *)&fcs_bytes) + bytes_in_prev, fcs_p2, last_frag_sz);
+
+ return fcs_bytes;
+}
+
static inline void mlx5e_handle_csum(struct net_device *netdev,
struct mlx5_cqe64 *cqe,
struct mlx5e_rq *rq,
@@ -643,6 +682,9 @@ static inline void mlx5e_handle_csum(struct net_device *netdev,
skb->csum = csum_partial(skb->data + ETH_HLEN,
network_depth - ETH_HLEN,
skb->csum);
+ if (unlikely(netdev->features & NETIF_F_RXFCS))
+ skb->csum = csum_add(skb->csum,
+ (__force __wsum)mlx5e_get_fcs(skb));
rq->stats.csum_complete++;
return;
}
--
2.17.0
^ permalink raw reply related
* [net 2/2] net/mlx5: IPSec, Fix a race between concurrent sandbox QP commands
From: Saeed Mahameed @ 2018-05-24 21:53 UTC (permalink / raw)
To: David S. Miller; +Cc: netdev, Yossi Kuperman, Adi Nissim, Saeed Mahameed
In-Reply-To: <20180524215313.7605-1-saeedm@mellanox.com>
From: Yossi Kuperman <yossiku@mellanox.com>
Sandbox QP Commands are retired in the order they are sent. Outstanding
commands are stored in a linked-list in the order they appear. Once a
response is received and the callback gets called, we pull the first
element off the pending list, assuming they correspond.
Sending a message and adding it to the pending list is not done atomically,
hence there is an opportunity for a race between concurrent requests.
Bind both send and add under a critical section.
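Editorial sketch: the ordering requirement above can be illustrated in userspace with a mutex standing in for the driver's spinlock. Everything here is hypothetical (struct cmd, struct cmd_queue, cmd_exec and the send_ok/send_fail stand-ins); the point is only that sending and queueing happen inside one critical section, and that a command whose send fails is never queued.

```c
#include <pthread.h>
#include <stddef.h>

struct cmd {
	struct cmd *next;
	int id;
};

struct cmd_queue {
	pthread_mutex_t lock;            /* stand-in for pending_cmds_lock */
	struct cmd *head;                /* oldest outstanding command */
	struct cmd *tail;                /* newest outstanding command */
	int (*send)(const struct cmd *); /* hypothetical transport hook */
};

/* Stand-ins for a transport that succeeds or fails (0 means success). */
static int send_ok(const struct cmd *c)   { (void)c; return 0; }
static int send_fail(const struct cmd *c) { (void)c; return -1; }

/* Send and enqueue under one lock, so the pending list order always
 * matches the order in which commands were actually sent. */
static int cmd_exec(struct cmd_queue *q, struct cmd *c)
{
	int res;

	pthread_mutex_lock(&q->lock);
	res = q->send(c);
	if (!res) {
		c->next = NULL;
		if (q->tail)
			q->tail->next = c;
		else
			q->head = c;
		q->tail = c;
	}
	pthread_mutex_unlock(&q->lock);

	return res;
}
```

With the lock held across both steps, two concurrent callers can no longer send in one order and enqueue in the other, so popping the head of the list on each response stays correct.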
Fixes: bebb23e6cb02 ("net/mlx5: Accel, Add IPSec acceleration interface")
Signed-off-by: Yossi Kuperman <yossiku@mellanox.com>
Signed-off-by: Adi Nissim <adin@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
drivers/net/ethernet/mellanox/mlx5/core/fpga/ipsec.c | 12 +++++-------
1 file changed, 5 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fpga/ipsec.c b/drivers/net/ethernet/mellanox/mlx5/core/fpga/ipsec.c
index 0f5da499a223..fad8c2e3804e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fpga/ipsec.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fpga/ipsec.c
@@ -237,19 +237,17 @@ static void *mlx5_fpga_ipsec_cmd_exec(struct mlx5_core_dev *mdev,
context->buf.sg[0].data = &context->command;
spin_lock_irqsave(&fdev->ipsec->pending_cmds_lock, flags);
- list_add_tail(&context->list, &fdev->ipsec->pending_cmds);
+ res = mlx5_fpga_sbu_conn_sendmsg(fdev->ipsec->conn, &context->buf);
+ if (!res)
+ list_add_tail(&context->list, &fdev->ipsec->pending_cmds);
spin_unlock_irqrestore(&fdev->ipsec->pending_cmds_lock, flags);
- res = mlx5_fpga_sbu_conn_sendmsg(fdev->ipsec->conn, &context->buf);
if (res) {
- mlx5_fpga_warn(fdev, "Failure sending IPSec command: %d\n",
- res);
- spin_lock_irqsave(&fdev->ipsec->pending_cmds_lock, flags);
- list_del(&context->list);
- spin_unlock_irqrestore(&fdev->ipsec->pending_cmds_lock, flags);
+ mlx5_fpga_warn(fdev, "Failed to send IPSec command: %d\n", res);
kfree(context);
return ERR_PTR(res);
}
+
/* Context will be freed by wait func after completion */
return context;
}
--
2.17.0
* Re: [PATCH net-next 0/8] nfp: offload LAG for tc flower egress
From: Jakub Kicinski @ 2018-05-24 22:01 UTC (permalink / raw)
To: Or Gerlitz
Cc: David Miller, Linux Netdev List, oss-drivers, Jiri Pirko,
Jay Vosburgh, Veaceslav Falico, Andy Gospodarek
In-Reply-To: <CAJ3xEMhJckJq6HDFm_QTtDP_SG1jPJ55q1b-_Vg0WoC_UqO_Wg@mail.gmail.com>
On Thu, 24 May 2018 22:26:03 +0300, Or Gerlitz wrote:
> On Thu, May 24, 2018 at 9:49 PM, Jakub Kicinski
> <jakub.kicinski@netronome.com> wrote:
> > On Thu, 24 May 2018 20:04:56 +0300, Or Gerlitz wrote:
>
> >> Does this apply also to non-uplink representors? if yes, what is the use case?
> >>
> >> We are looking at supporting uplink lag in sriov switchdev scheme - we refer to
> >> it as "vf lag" -- b/c the netdev and rdma devices seen by the VF are actually
> >> subject to HA and/or LAG - I wasn't sure if/how you limit this series
> >> to uplink reprs
> >
> > I don't think we have a limitation on the output port within the LAG.
> > But keep in mind in our devices all ports belong to the same eswitch/PF
> > so bonding uplink ports is generally sufficient, I'm not sure VF
> > bonding adds much HA. IOW AFAIK we support VF bonding because HW can do
> > it easily, not because we have a strong use case for it.
>
> To make it clear, vf lag is code name for uplink lag, I think we want
> to say that we provide the VM a lagged VF, anyway, again, the lag is
> done on the uplink reps not on the vf reps.
Ah, ack, same use case here!
> Unlike the uplink port which is physical one, the vf vport is virtual
> one, what could be the benefit to bond two vports?
I'm not sure what it could be :) We can also bond an uplink and a VF!
All outputs on the nfp work the same way, so why limit ourselves if we
can do it? :)
* [PATCH net] packet: fix reserve calculation
From: Willem de Bruijn @ 2018-05-24 22:10 UTC (permalink / raw)
To: netdev; +Cc: davem, Willem de Bruijn
From: Willem de Bruijn <willemb@google.com>
Commit b84bbaf7a6c8 ("packet: in packet_snd start writing at link
layer allocation") ensures that packet_snd always starts writing
the link layer header in reserved headroom allocated for this
purpose.
This is needed because packets may be shorter than hard_header_len,
in which case the space up to hard_header_len may be zeroed. But
that necessary padding is not accounted for in skb->len.
The fix, however, is buggy. It calls skb_push, which grows skb->len
when moving skb->data back. But in this case packet length should not
change.
Instead, call skb_reserve, which moves both skb->data and skb->tail
back, without changing length.
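The difference between the two calls can be sketched with a toy buffer model. This is a user-space illustration, not kernel code: struct toy_skb and the toy_*() helpers are hypothetical and only mirror the pointer and length arithmetic of skb_push, skb_reserve and skb_put.

```c
#include <stddef.h>

/* Minimal model of the sk_buff fields that matter here. */
struct toy_skb {
	unsigned char *head; /* start of the allocated buffer */
	unsigned char *data; /* start of the packet bytes */
	unsigned char *tail; /* end of the packet bytes */
	unsigned int len;    /* packet length */
};

/* Fresh buffer with `headroom` bytes reserved in front of the packet. */
void toy_init(struct toy_skb *skb, unsigned char *buf, unsigned int headroom)
{
	skb->head = buf;
	skb->data = buf + headroom;
	skb->tail = buf + headroom;
	skb->len = 0;
}

/* Like skb_push: move data back AND grow the packet length. */
unsigned char *toy_push(struct toy_skb *skb, unsigned int n)
{
	skb->data -= n;
	skb->len += n;
	return skb->data;
}

/* Like skb_reserve: shift data and tail together; len is untouched.
 * A negative n moves both pointers back toward head. */
void toy_reserve(struct toy_skb *skb, int n)
{
	skb->data += n;
	skb->tail += n;
}

/* Like skb_put: append n bytes at the tail. */
unsigned char *toy_put(struct toy_skb *skb, unsigned int n)
{
	unsigned char *old_tail = skb->tail;

	skb->tail += n;
	skb->len += n;
	return old_tail;
}
```

Taking 14 bytes of headroom and a 60-byte payload as example numbers: toy_push(14) followed by toy_put(60) ends with len == 74, so the reserved bytes are counted twice, while toy_reserve(-14) followed by toy_put(60) ends with len == 60 and data at the start of the buffer.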
Fixes: b84bbaf7a6c8 ("packet: in packet_snd start writing at link layer allocation")
Reported-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
---
net/packet/af_packet.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index e9422fe45179..acb7b86574cd 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2911,7 +2911,7 @@ static int packet_snd(struct socket *sock, struct msghdr *msg, size_t len)
if (unlikely(offset < 0))
goto out_free;
} else if (reserve) {
- skb_push(skb, reserve);
+ skb_reserve(skb, -reserve);
}
/* Returns -EFAULT on error */
--
2.17.0.921.gf22659ad46-goog
* Re: [PATCH net] packet: in packet_snd start writing at link layer allocation
From: Willem de Bruijn @ 2018-05-24 22:13 UTC (permalink / raw)
To: Tariq Toukan
Cc: David Miller, Network Development, Eric Dumazet, Willem de Bruijn,
Maor Gottlieb
In-Reply-To: <CAF=yD-JapgdzDxtt+noXEm2Zj4dy=9N1_ALYBsz-TXA5CwtTkQ@mail.gmail.com>
On Thu, May 24, 2018 at 1:01 PM, Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
> On Thu, May 24, 2018 at 11:17 AM, Willem de Bruijn
> <willemdebruijn.kernel@gmail.com> wrote:
>> On Thu, May 24, 2018 at 11:07 AM, Tariq Toukan <tariqt@mellanox.com> wrote:
>>>
>>>
>>> On 14/05/2018 3:20 AM, David Miller wrote:
>>>>
>>>> From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
>>>> Date: Fri, 11 May 2018 13:24:25 -0400
>>>>
>>>>> From: Willem de Bruijn <willemb@google.com>
>>>>>
>>>>> Packet sockets allow construction of packets shorter than
>>>>> dev->hard_header_len to accommodate protocols with variable length
>>>>> link layer headers. These packets are padded to dev->hard_header_len,
>>>>> because some device drivers interpret that as a minimum packet size.
>>>>>
>>>>> packet_snd reserves dev->hard_header_len bytes on allocation.
>>>>> SOCK_DGRAM sockets call skb_push in dev_hard_header() to ensure that
>>>>> link layer headers are stored in the reserved range. SOCK_RAW sockets
>>>>> do the same in tpacket_snd, but not in packet_snd.
>>>>>
>>>>> Syzbot was able to send a zero byte packet to a device with massive
>>>>> 116B link layer header, causing padding to cross over into skb_shinfo.
>>>>> Fix this by writing from the start of the llheader reserved range also
>>>>> in the case of packet_snd/SOCK_RAW.
>>>>>
>>>>> Update skb_set_network_header to the new offset. This also corrects
>>>>> it for SOCK_DGRAM, where it incorrectly double counted reserve due to
>>>>> the skb_push in dev_hard_header.
>>>>>
>>>>> Fixes: 9ed988cd5915 ("packet: validate variable length ll headers")
>>>>> Reported-by: syzbot+71d74a5406d02057d559@syzkaller.appspotmail.com
>>>>> Signed-off-by: Willem de Bruijn <willemb@google.com>
>>>>
>>>>
>>>> Applied and queued up for -stable, thanks Willem.
>>>>
>>>
>>> Hi,
>>>
>>> One of our regression tests started failing. Once this patch is reverted,
>>> test passes.
>>>
>>> The tests add flow steering rules on the receiver side, and on the sender
>>> side they send packets with some RAW socket applications. The receiver
>>> side then gets a completion with error.
>>>
>>> Our verification team compared the packets between the stable and the broken
>>> version, in the broken version we have some extra bytes at the end of the
>>> packet.
>>>
>>> It looks like some bad push to the SKB, maybe the conditional reserved
>>> addition should be more strict?
>>>
>>> Any idea?
>>
>> Thanks for reporting, sorry for the breakage.
>>
>> I think I might. This skb_push moves back the start of skb->data in the
>> same way that tpacket_snd does. But it does not reduce the length
>> passed to skb_put, so this might double count hard_header_len.
>>
>> Let me construct a test.
>
> Indeed.
>
> Still verifying, but this almost certainly has to be
>
> @@ -2911,7 +2912,7 @@ static int packet_snd(struct socket *sock,
> struct msghdr *msg, size_t len)
> if (unlikely(offset < 0))
> goto out_free;
> } else if (reserve) {
> - skb_push(skb, reserve);
> + skb_reserve(skb, -reserve);
> }
>
> to move the start of the packet without changing its length.
I sent http://patchwork.ozlabs.org/patch/920126/
Again, thanks a lot for reporting this, Tariq. I'm working on some
packet socket boundary condition tests for tools/testing/selftests/net,
so that I cannot push such a mistake again.
* Re: 4.16 issue with mbim modem and ping with size > 14552 bytes
From: Daniele Palmas @ 2018-05-24 22:54 UTC (permalink / raw)
To: Greg KH; +Cc: netdev, linux-usb
In-Reply-To: <20180524155334.GA28874@kroah.com>
Hi Greg,
2018-05-24 17:53 GMT+02:00 Greg KH <gregkh@linuxfoundation.org>:
> On Thu, May 24, 2018 at 05:04:49PM +0200, Daniele Palmas wrote:
>> Hello,
>>
>> I have an issue with a USB MBIM modem when trying to send more than
>> 14552 bytes with ping: it looks to me like a kernel issue, but not at
>> the cdc_mbim or cdc_ncm level. I'm not sure, though, so I'm reporting
>> the issue.
>>
>> My kernel is 4.16. The device is the following:
>
> Do older kernels work, or is this something that has always been
> there?
>
Not tested yet; I'm going to.
> I ask, as my mobile provider does horrible things to large packet sizes.
> So much so that I have to set the mtu to 1280 just to get things to work
> properly when tethering my phone through to my laptop. So this might be
> a network provider issue :)
>
Yeah, I thought the same, so I tried the same scenario with Windows 10,
and there it works fine.
Thanks,
Daniele
> thanks,
>
> greg k-h
* [PATCH] ath10k: htt_tx: mark expected switch fall-throughs
From: Gustavo A. R. Silva @ 2018-05-24 22:59 UTC (permalink / raw)
To: Kalle Valo, David S. Miller
Cc: ath10k, linux-wireless, netdev, linux-kernel, Gustavo A. R. Silva
In preparation for enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.
Notice that in this particular case, I replaced "pass through" with
a proper "fall through" comment, which is what GCC is expecting
to find.
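As a minimal sketch of the marking convention: GCC's -Wimplicit-fallthrough accepts only certain comment spellings immediately before the next case label; "fall through" is one of them, while "pass through" is not. The enum, the flag values and tx_flags() below are hypothetical and only loosely modelled on the switch in the patch.

```c
/* Hypothetical modes and flag bits for illustration. */
enum txrx_mode {
	MODE_RAW,
	MODE_NATIVE_WIFI,
	MODE_ETHERNET,
	MODE_OTHER
};

#define FLAG_MAC_HDR_PRESENT 0x1u
#define FLAG_FRAG_DESC       0x2u

unsigned int tx_flags(enum txrx_mode mode)
{
	unsigned int flags = 0;

	switch (mode) {
	case MODE_RAW:
	case MODE_NATIVE_WIFI:
		flags |= FLAG_MAC_HDR_PRESENT;
		/* fall through */
	case MODE_ETHERNET:
		/* Reached directly and via the two cases above. */
		flags |= FLAG_FRAG_DESC;
		break;
	default:
		break;
	}
	return flags;
}
```

Adjacent case labels with no statements between them (MODE_RAW and MODE_NATIVE_WIFI here) need no comment; only a case that executes statements and then continues into the next label does.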
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
---
drivers/net/wireless/ath/ath10k/htt_tx.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/wireless/ath/ath10k/htt_tx.c b/drivers/net/wireless/ath/ath10k/htt_tx.c
index 5d8b97a..89157c5 100644
--- a/drivers/net/wireless/ath/ath10k/htt_tx.c
+++ b/drivers/net/wireless/ath/ath10k/htt_tx.c
@@ -1202,7 +1202,7 @@ static int ath10k_htt_tx_32(struct ath10k_htt *htt,
case ATH10K_HW_TXRX_RAW:
case ATH10K_HW_TXRX_NATIVE_WIFI:
flags0 |= HTT_DATA_TX_DESC_FLAGS0_MAC_HDR_PRESENT;
- /* pass through */
+ /* fall through */
case ATH10K_HW_TXRX_ETHERNET:
if (ar->hw_params.continuous_frag_desc) {
ext_desc_t = htt->frag_desc.vaddr_desc_32;
@@ -1404,7 +1404,7 @@ static int ath10k_htt_tx_64(struct ath10k_htt *htt,
case ATH10K_HW_TXRX_RAW:
case ATH10K_HW_TXRX_NATIVE_WIFI:
flags0 |= HTT_DATA_TX_DESC_FLAGS0_MAC_HDR_PRESENT;
- /* pass through */
+ /* fall through */
case ATH10K_HW_TXRX_ETHERNET:
if (ar->hw_params.continuous_frag_desc) {
ext_desc_t = htt->frag_desc.vaddr_desc_64;
--
2.7.4
* Re: [PATCH net-next] net: phy: convert further flags in struct phy_device to bit-field
From: Florian Fainelli @ 2018-05-24 23:03 UTC (permalink / raw)
To: Heiner Kallweit, Andrew Lunn, David Miller; +Cc: netdev@vger.kernel.org
In-Reply-To: <d148e574-2e29-a52f-7da0-13ef1ead927a@gmail.com>
On 05/24/2018 01:15 PM, Heiner Kallweit wrote:
> This patch is a follow-up to 87e5808d52b6 ("net: phy: replace bool
> members in struct phy_device with bit-fields") and converts further
> flags to bit-fields.
This looks fine, but then you would also have to clean up all code that
does phydev->asym_pause = 1 and phydev->pause = 1 to use true/false
instead. I am not sure there is much value in doing that for these
fields, considering that they are exposed to drivers, so there is a risk
of breakage.
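One way such a conversion can change behaviour for existing users, shown purely as an illustration and not taken from any driver: a 1-bit bit-field keeps only the low bit of whatever is stored in it, whereas an int member keeps the full value. The struct and function names below are hypothetical.

```c
/* The two layouts being compared; names are hypothetical. */
struct flags_int {
	int pause;
	int asym_pause;
};

struct flags_bits {
	unsigned pause:1;
	unsigned asym_pause:1;
};

/* An int member stores any value unchanged. */
int int_pause_after_store(int v)
{
	struct flags_int f = { 0, 0 };

	f.pause = v;
	return f.pause;
}

/* A 1-bit unsigned bit-field stores the value modulo 2. */
unsigned int bit_pause_after_store(int v)
{
	struct flags_bits f = { 0, 0 };

	f.pause = (unsigned int)v;
	return f.pause;
}
```

Storing 0 or 1 behaves identically in both layouts; storing 2 reads back as 2 from the int member and as 0 from the bit-field. Code that takes the address of the member would also stop compiling, since a bit-field has no address.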
Thanks!
>
> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
> ---
> include/linux/phy.h | 17 ++++++++---------
> 1 file changed, 8 insertions(+), 9 deletions(-)
>
> diff --git a/include/linux/phy.h b/include/linux/phy.h
> index 6cd090984..cc66f2834 100644
> --- a/include/linux/phy.h
> +++ b/include/linux/phy.h
> @@ -418,21 +418,20 @@ struct phy_device {
> /* The most recently read link state */
> unsigned link:1;
>
> + /* forced speed & duplex (no autoneg)
> + * partner speed & duplex & pause (autoneg)
> + */
> + unsigned pause:1;
> + unsigned asym_pause:1;
> + int speed;
> + int duplex;
> +
> enum phy_state state;
>
> u32 dev_flags;
>
> phy_interface_t interface;
>
> - /*
> - * forced speed & duplex (no autoneg)
> - * partner speed & duplex & pause (autoneg)
> - */
> - int speed;
> - int duplex;
> - int pause;
> - int asym_pause;
> -
> /* Enabled Interrupts */
> u32 interrupts;
>
>
--
Florian
* [PATCH] ath5k: mark expected switch fall-through
From: Gustavo A. R. Silva @ 2018-05-24 23:07 UTC (permalink / raw)
To: Jiri Slaby, Nick Kossifidis, Luis R. Rodriguez, Kalle Valo,
David S. Miller
Cc: linux-wireless, netdev, linux-kernel, Gustavo A. R. Silva
In preparation for enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
---
drivers/net/wireless/ath/ath5k/pcu.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/wireless/ath/ath5k/pcu.c b/drivers/net/wireless/ath/ath5k/pcu.c
index f23c851..05140d8 100644
--- a/drivers/net/wireless/ath/ath5k/pcu.c
+++ b/drivers/net/wireless/ath/ath5k/pcu.c
@@ -670,6 +670,7 @@ ath5k_hw_init_beacon_timers(struct ath5k_hw *ah, u32 next_beacon, u32 interval)
break;
case NL80211_IFTYPE_ADHOC:
AR5K_REG_ENABLE_BITS(ah, AR5K_TXCFG, AR5K_TXCFG_ADHOC_BCN_ATIM);
+ /* fall through */
default:
/* On non-STA modes timer1 is used as next DMA
* beacon alert (DBA) timer and timer2 as next
--
2.7.4
* [PATCH] ath6kl: mark expected switch fall-throughs
From: Gustavo A. R. Silva @ 2018-05-24 23:13 UTC (permalink / raw)
To: Kalle Valo, David S. Miller
Cc: linux-wireless, netdev, linux-kernel, Gustavo A. R. Silva
In preparation for enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
---
drivers/net/wireless/ath/ath6kl/cfg80211.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/net/wireless/ath/ath6kl/cfg80211.c b/drivers/net/wireless/ath/ath6kl/cfg80211.c
index 2ba8cf3..29e32cd 100644
--- a/drivers/net/wireless/ath/ath6kl/cfg80211.c
+++ b/drivers/net/wireless/ath/ath6kl/cfg80211.c
@@ -3898,17 +3898,17 @@ int ath6kl_cfg80211_init(struct ath6kl *ar)
wiphy->max_scan_ie_len = 1000; /* FIX: what is correct limit? */
switch (ar->hw.cap) {
case WMI_11AN_CAP:
- ht = true;
+ ht = true; /* fall through */
case WMI_11A_CAP:
band_5gig = true;
break;
case WMI_11GN_CAP:
- ht = true;
+ ht = true; /* fall through */
case WMI_11G_CAP:
band_2gig = true;
break;
case WMI_11AGN_CAP:
- ht = true;
+ ht = true; /* fall through */
case WMI_11AG_CAP:
band_2gig = true;
band_5gig = true;
--
2.7.4
* Re: [v8, bpf-next, 4/9] net/wireless/iwlwifi: fix iwlwifi_dev_ucode_error tracepoint
From: Alexei Starovoitov @ 2018-05-24 23:28 UTC (permalink / raw)
To: Johannes Berg
Cc: Alexei Starovoitov, davem, daniel, torvalds, peterz, rostedt,
mathieu.desnoyers, netdev, kernel-team, linux-api, linux-wireless
In-Reply-To: <1527073388.3759.21.camel-cdvu00un1VgdHxzADdlk8Q@public.gmane.org>
On Wed, May 23, 2018 at 01:03:08PM +0200, Johannes Berg wrote:
> On Wed, 2018-03-28 at 12:05 -0700, Alexei Starovoitov wrote:
> > fix iwlwifi_dev_ucode_error tracepoint to pass pointer to a table
> > instead of all 17 arguments by value.
> > dvm/main.c and mvm/utils.c have 'struct iwl_error_event_table'
> > defined with very similar yet subtly different fields and offsets.
> > tracepoint is still common and using definition of 'struct iwl_error_event_table'
> > from dvm/commands.h while copying fields.
> > Long term this tracepoint probably should be split into two.
>
> It would've been nice to CC the wireless list for wireless related
> patches ...
Ohh. I didn't realize that networking wireless doesn't fall under netdev.
I thought wireless folks are silent because they are embarrassed
by a function with 17 arguments.
> > --- a/drivers/net/wireless/intel/iwlwifi/iwl-devtrace.c
> > +++ b/drivers/net/wireless/intel/iwlwifi/iwl-devtrace.c
> > @@ -30,6 +30,7 @@
> > #ifndef __CHECKER__
> > #include "iwl-trans.h"
> >
> > +#include "dvm/commands.h"
>
> In particular, this breaks the whole driver abstraction.
>
> > +++ b/drivers/net/wireless/intel/iwlwifi/mvm/utils.c
> > @@ -549,12 +549,7 @@ static void iwl_mvm_dump_lmac_error_log(struct iwl_mvm *mvm, u32 base)
> >
> > IWL_ERR(mvm, "Loaded firmware version: %s\n", mvm->fw->fw_version);
> >
> > - trace_iwlwifi_dev_ucode_error(trans->dev, table.error_id, table.tsf_low,
> > - table.data1, table.data2, table.data3,
> > - table.blink2, table.ilink1,
> > - table.ilink2, table.bcon_time, table.gp1,
> > - table.gp2, table.fw_rev_type, table.major,
> > - table.minor, table.hw_ver, table.brd_ver);
> > + trace_iwlwifi_dev_ucode_error(trans->dev, &table, table.hw_ver, table.brd_ver);
>
> This is also utterly wrong because mvm has - for better or worse - a
> different type "struct iwl_error_event_table" in this file ...
As I was trying to explain in the commit log, a single struct
is used in both places, but the differences between the two
"struct iwl_error_event_table" definitions are carefully matched
field by field. For two extra fields that was not
possible, and they are passed separately, as you can see above.
I still believe that the tracepoint output is exactly
the same before and after the patch.
I guess you see the breakage because new fields got
added to one "struct iwl_error_event_table"
but were not added to its evil twin "struct iwl_error_event_table"
with the same name after the patch landed?
IMO wireless folks need to avoid such naming conflicts.
I suggest isolating the common fields into a separate base struct and
giving the two child structs different names.
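A rough sketch of that layout, as an assumption about what such a split could look like rather than actual iwlwifi code: the struct names, the dvm_only placeholder field and trace_error_id() are hypothetical, and only a few of the fields mentioned in the thread are shown.

```c
#include <stdint.h>

/* Fields shared by both firmware flavours (subset shown). */
struct err_table_common {
	uint32_t error_id;
	uint32_t tsf_low;
	uint32_t data1;
};

/* Each flavour embeds the common part and gets its own type name. */
struct dvm_err_table {
	struct err_table_common common;
	uint32_t dvm_only; /* placeholder for dvm-specific fields */
};

struct mvm_err_table {
	struct err_table_common common;
	uint32_t hw_ver;
	uint32_t brd_ver;
};

/* A shared consumer (e.g. a tracepoint) takes only the common part,
 * so it cannot silently depend on flavour-specific offsets. */
uint32_t trace_error_id(const struct err_table_common *t)
{
	return t->error_id;
}
```

A caller would pass &table.common from either flavour, and any flavour-specific values would be passed as separate arguments.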
* Re: [PATCH bpf-next v2 0/3] bpf: add boot parameters for sysctl knobs
From: Alexei Starovoitov @ 2018-05-24 23:34 UTC (permalink / raw)
To: Jesper Dangaard Brouer
Cc: Eugene Syromiatnikov, netdev, linux-kernel, linux-doc, Kees Cook,
Kai-Heng Feng, Daniel Borkmann, Alexei Starovoitov,
Jonathan Corbet, Jiri Olsa
In-Reply-To: <20180524094108.066d885a@redhat.com>
On Thu, May 24, 2018 at 09:41:08AM +0200, Jesper Dangaard Brouer wrote:
> On Wed, 23 May 2018 15:02:45 -0700
> Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>
> > On Wed, May 23, 2018 at 02:18:19PM +0200, Eugene Syromiatnikov wrote:
> > > Some BPF sysctl knobs affect the loading of BPF programs, and during
> > > system boot/init stages these sysctls are not yet configured.
> > > A concrete example is systemd, that has implemented loading of BPF
> > > programs.
> > >
> > > Thus, to allow controlling these setting at early boot, this patch set
> > > adds the ability to change the default setting of these sysctl knobs
> > > as well as option to override them via a boot-time kernel parameter
> > > (in order to avoid rebuilding kernel each time a need of changing these
> > > defaults arises).
> > >
> > > The sysctl knobs in question are kernel.unprivileged_bpf_disable,
> > > net.core.bpf_jit_harden, and net.core.bpf_jit_kallsyms.
> >
> > - systemd is root. today it only uses cgroup-bpf progs which require root,
> > so disabling unpriv during boot time makes no difference to systemd.
> > what is the actual reason to present time?
> >
> > - say in the future systemd wants to use so_reuseport+bpf for faster
> > networking. With unpriv disable during boot, it will force systemd
> > to do such networking from root, which will lower its security barrier.
> >   How does that make sense?
> >
> > - bpf_jit_kallsyms sysctl has immediate effect on loaded programs.
> > Flipping it during the boot or right after or any time after
> > is the same thing. Why add such boot flag then?
> >
> > - jit_harden can be turned on by systemd. so turning it during the boot
> > will make systemd progs to be constant blinded.
> > Constant blinding protects kernel from unprivileged JIT spraying.
> > Are you worried that systemd will attack the kernel with JIT spraying?
>
>
> I think you are missing that we want the ability to change these
> defaults in order to avoid depending on /etc/sysctl.conf settings, and
> that these sysctl.conf settings happen too late.
What does 'happens too late' mean?
Too late for what?
sysctl.conf has plenty of system critical knobs like
kernel.perf_event_paranoid, kernel.core_pattern, etc
The behavior of the host is drastically different after sysctl config
is applied.
> For example with jit_harden, there will be a difference between the
> loaded BPF program that got loaded at boot-time with systemd (no
> constant blinding) and when someone reloads that systemd service after
> /etc/sysctl.conf have been evaluated and setting bpf_jit_harden (now
> slower due to constant blinding). This is inconsistent behavior.
net.core.bpf_jit_harden can be flipped back and forth at run-time,
so bpf progs before and after will be either blinded or not.
I don't see any inconsistency.
In general I think bootparams should be used only for things
like kpti=on/off that cannot be set by sysctl.
* Re: [PATCH 00/14] Modify action API for implementing lockless actions
From: Cong Wang @ 2018-05-24 23:34 UTC (permalink / raw)
To: Vlad Buslov
Cc: Linux Kernel Network Developers, David Miller, Jamal Hadi Salim,
Jiri Pirko, Pablo Neira Ayuso, Jozsef Kadlecsik, Florian Westphal,
Alexei Starovoitov, Daniel Borkmann, Eric Dumazet, Kees Cook,
LKML, NetFilter, coreteam, kliteyn
In-Reply-To: <1526308035-12484-1-git-send-email-vladbu@mellanox.com>
On Mon, May 14, 2018 at 7:27 AM, Vlad Buslov <vladbu@mellanox.com> wrote:
> Currently, all netlink protocol handlers for updating rules, actions and
> qdiscs are protected with single global rtnl lock which removes any
> possibility for parallelism. This patch set is a first step to remove
> rtnl lock dependency from TC rules update path. It updates act API to
> use atomic operations, rcu and spinlocks for fine-grained locking. It
> also extends the API with functions that are needed to update existing
> actions for parallel execution.
Can you give a summary here for what and how it is achieved?
You said this is the first step, what do you want to achieve in this
very first step? And how do you achieve it? Do you break the RTNL
lock down to, for a quick example, a per-device lock? Or perhaps you
completely remove it, and if so, for what reason?
I went through all the descriptions of your 14 patches (but not any code), and
I still have no clue how you successfully avoid RTNL. Please don't
make me read your code to understand that; there must be some
high-level justification of how it works. Without it, I don't even want
to read the code.
Thanks.
* Re: [v8, bpf-next, 4/9] net/wireless/iwlwifi: fix iwlwifi_dev_ucode_error tracepoint
From: Steven Rostedt @ 2018-05-24 23:39 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Johannes Berg, Alexei Starovoitov, davem, daniel, torvalds,
peterz, mathieu.desnoyers, netdev, kernel-team, linux-api,
linux-wireless
In-Reply-To: <20180524232837.24jvdsdiohkpj7fs@ast-mbp>
On Thu, 24 May 2018 16:28:39 -0700
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> Ohh. I didn't realize that networking wireless doesn't fall under netdev.
> I thought wireless folks are silent because they are embarrassed
> by a function with 17 arguments.
Please let's refrain from demeaning comments.
I agree with your argument, but not the tone.
-- Steve
* Re: [PATCH] PCI: allow drivers to limit the number of VFs to 0
From: Bjorn Helgaas @ 2018-05-24 23:57 UTC (permalink / raw)
To: Jakub Kicinski
Cc: Bjorn Helgaas, linux-pci, netdev, Sathya Perla, Felix Manlunas,
alexander.duyck, john.fastabend, Jacob Keller, Donald Dutile,
oss-drivers, Christoph Hellwig
In-Reply-To: <20180402224652.4058-1-jakub.kicinski@netronome.com>
Hi Jakub,
On Mon, Apr 02, 2018 at 03:46:52PM -0700, Jakub Kicinski wrote:
> Some user space depends on enabling sriov_totalvfs number of VFs
> to not fail, e.g.:
>
> $ cat .../sriov_totalvfs > .../sriov_numvfs
>
> For devices whose VF support depends on the loaded FW we have the
> pci_sriov_{g,s}et_totalvfs() API. However, this API uses 0 as
> a special "unset" value, meaning drivers can't limit sriov_totalvfs
> to 0. Remove the special values completely and simply initialize
> driver_max_VFs to total_VFs. Then always use driver_max_VFs.
> Add a helper for drivers to reset the VF limit back to total.
I still can't really make sense out of the changelog.
I think part of the reason it's confusing is because there are two
things going on:
1) You want this:
pci_sriov_set_totalvfs(dev, 0);
x = pci_sriov_get_totalvfs(dev)
to return 0 instead of total_VFs. That seems to connect with
your subject line. It means "sriov_totalvfs" in sysfs could be
0, but I don't know how that is useful (I'm sure it is; just
educate me :))
2) You're adding the pci_sriov_reset_totalvfs() interface. I'm not
sure what you intend for this. Is *every* driver supposed to
call it in .remove()? Could/should this be done in the core
somehow instead of depending on every driver?
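The two points above can be modelled in user space. This is a sketch only: struct toy_sriov and the functions below are hypothetical names, and the old/new logic is transcribed from the pci_sriov_get_totalvfs() hunk and the pci_sriov_reset_totalvfs() body in the quoted patch, leaving out the is_physfn and VF-enable checks.

```c
/* Hypothetical model of the two counters in struct pci_sriov. */
struct toy_sriov {
	unsigned short total_VFs;      /* from the SR-IOV capability */
	unsigned short driver_max_VFs; /* limit set by the PF driver */
};

/* Before the patch: 0 means "unset", so a limit of 0 is invisible. */
int get_totalvfs_old(const struct toy_sriov *iov)
{
	if (iov->driver_max_VFs)
		return iov->driver_max_VFs;
	return iov->total_VFs;
}

/* After the patch: driver_max_VFs is always authoritative. */
int get_totalvfs_new(const struct toy_sriov *iov)
{
	return iov->driver_max_VFs;
}

/* After the patch: explicit way to drop the driver's limit again. */
void reset_totalvfs_new(struct toy_sriov *iov)
{
	iov->driver_max_VFs = iov->total_VFs;
}
```

With total_VFs == 64 and a driver limit of 0, the old getter reports 64 and the new one reports 0; after reset_totalvfs_new() the new getter reports 64 again.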
I'm also having a hard time connecting your user-space command example
with the rest of this. Maybe it will make more sense to me tomorrow
after some coffee.
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> ---
> drivers/net/ethernet/netronome/nfp/nfp_main.c | 6 +++---
> drivers/pci/iov.c | 27 +++++++++++++++++++++------
> include/linux/pci.h | 2 ++
> 3 files changed, 26 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/net/ethernet/netronome/nfp/nfp_main.c b/drivers/net/ethernet/netronome/nfp/nfp_main.c
> index c4b1f344b4da..a76d177e40dd 100644
> --- a/drivers/net/ethernet/netronome/nfp/nfp_main.c
> +++ b/drivers/net/ethernet/netronome/nfp/nfp_main.c
> @@ -123,7 +123,7 @@ static int nfp_pcie_sriov_read_nfd_limit(struct nfp_pf *pf)
> return pci_sriov_set_totalvfs(pf->pdev, pf->limit_vfs);
>
> pf->limit_vfs = ~0;
> - pci_sriov_set_totalvfs(pf->pdev, 0); /* 0 is unset */
> + pci_sriov_reset_totalvfs(pf->pdev);
> /* Allow any setting for backwards compatibility if symbol not found */
> if (err == -ENOENT)
> return 0;
> @@ -537,7 +537,7 @@ static int nfp_pci_probe(struct pci_dev *pdev,
> err_net_remove:
> nfp_net_pci_remove(pf);
> err_sriov_unlimit:
> - pci_sriov_set_totalvfs(pf->pdev, 0);
> + pci_sriov_reset_totalvfs(pf->pdev);
> err_fw_unload:
> kfree(pf->rtbl);
> nfp_mip_close(pf->mip);
> @@ -570,7 +570,7 @@ static void nfp_pci_remove(struct pci_dev *pdev)
> nfp_hwmon_unregister(pf);
>
> nfp_pcie_sriov_disable(pdev);
> - pci_sriov_set_totalvfs(pf->pdev, 0);
> + pci_sriov_reset_totalvfs(pf->pdev);
>
> nfp_net_pci_remove(pf);
>
> diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
> index 677924ae0350..c63ea870d8be 100644
> --- a/drivers/pci/iov.c
> +++ b/drivers/pci/iov.c
> @@ -443,6 +443,7 @@ static int sriov_init(struct pci_dev *dev, int pos)
> iov->nres = nres;
> iov->ctrl = ctrl;
> iov->total_VFs = total;
> + iov->driver_max_VFs = total;
> pci_read_config_word(dev, pos + PCI_SRIOV_VF_DID, &iov->vf_device);
> iov->pgsz = pgsz;
> iov->self = dev;
> @@ -788,12 +789,29 @@ int pci_sriov_set_totalvfs(struct pci_dev *dev, u16 numvfs)
> }
> EXPORT_SYMBOL_GPL(pci_sriov_set_totalvfs);
>
> +/**
> + * pci_sriov_reset_totalvfs -- return the TotalVFs value to the default
> + * @dev: the PCI PF device
> + *
> + * Should be called from PF driver's remove routine with
> + * device's mutex held.
> + */
> +void pci_sriov_reset_totalvfs(struct pci_dev *dev)
> +{
> + /* Shouldn't change if VFs already enabled */
> + if (!dev->is_physfn || dev->sriov->ctrl & PCI_SRIOV_CTRL_VFE)
> + return;
> +
> + dev->sriov->driver_max_VFs = dev->sriov->total_VFs;
> +}
> +EXPORT_SYMBOL_GPL(pci_sriov_reset_totalvfs);
> +
> /**
> * pci_sriov_get_totalvfs -- get total VFs supported on this device
> * @dev: the PCI PF device
> *
> - * For a PCIe device with SRIOV support, return the PCIe
> - * SRIOV capability value of TotalVFs or the value of driver_max_VFs
> + * For a PCIe device with SRIOV support, return the value of driver_max_VFs
> + * which can be equal to the PCIe SRIOV capability value of TotalVFs or lower
> * if the driver reduced it. Otherwise 0.
> */
> int pci_sriov_get_totalvfs(struct pci_dev *dev)
> @@ -801,9 +819,6 @@ int pci_sriov_get_totalvfs(struct pci_dev *dev)
> if (!dev->is_physfn)
> return 0;
>
> - if (dev->sriov->driver_max_VFs)
> - return dev->sriov->driver_max_VFs;
> -
> - return dev->sriov->total_VFs;
> + return dev->sriov->driver_max_VFs;
> }
> EXPORT_SYMBOL_GPL(pci_sriov_get_totalvfs);
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 024a1beda008..95fde8850393 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -1952,6 +1952,7 @@ void pci_iov_remove_virtfn(struct pci_dev *dev, int id);
> int pci_num_vf(struct pci_dev *dev);
> int pci_vfs_assigned(struct pci_dev *dev);
> int pci_sriov_set_totalvfs(struct pci_dev *dev, u16 numvfs);
> +void pci_sriov_reset_totalvfs(struct pci_dev *dev);
> int pci_sriov_get_totalvfs(struct pci_dev *dev);
> resource_size_t pci_iov_resource_size(struct pci_dev *dev, int resno);
> void pci_vf_drivers_autoprobe(struct pci_dev *dev, bool probe);
> @@ -1978,6 +1979,7 @@ static inline int pci_vfs_assigned(struct pci_dev *dev)
> { return 0; }
> static inline int pci_sriov_set_totalvfs(struct pci_dev *dev, u16 numvfs)
> { return 0; }
> +static inline void pci_sriov_reset_totalvfs(struct pci_dev *dev) { }
> static inline int pci_sriov_get_totalvfs(struct pci_dev *dev)
> { return 0; }
> static inline resource_size_t pci_iov_resource_size(struct pci_dev *dev, int resno)
> --
> 2.16.2
>
* Re: [PATCH bpf-next v5 0/7] bpf: implement BPF_TASK_FD_QUERY
From: Daniel Borkmann @ 2018-05-25 0:27 UTC (permalink / raw)
To: Yonghong Song, peterz, ast, netdev; +Cc: kernel-team
In-Reply-To: <20180524182111.454612-1-yhs@fb.com>
On 05/24/2018 08:21 PM, Yonghong Song wrote:
> Currently, suppose a userspace application has loaded a bpf program
> and attached it to a tracepoint/kprobe/uprobe, and a bpf
> introspection tool, e.g., bpftool, wants to show which bpf program
> is attached to which tracepoint/kprobe/uprobe. Such attachment
> information will be really useful to understand the overall bpf
> deployment in the system.
>
> There is a name field (16 bytes) for each program, which could
> be used to encode the attachment point. There are some drawbacks
> to this approach. First, a bpftool user (e.g., an admin) may not
> really understand the association between the name and the
> attachment point. Second, if one program is attached to multiple
> places, encoding a proper name which can imply all these
> attachments becomes difficult.
>
> This patch introduces a new bpf subcommand BPF_TASK_FD_QUERY.
> Given a pid and fd, this command will return bpf related information
> to user space. Right now it only supports tracepoint/kprobe/uprobe
> perf event fd's. For such a fd, BPF_TASK_FD_QUERY will return
> . prog_id
> . tracepoint name, or
> . k[ret]probe funcname + offset or kernel addr, or
> . u[ret]probe filename + offset
> to the userspace.
> The user can use "bpftool prog" to find more information about
> bpf program itself with prog_id.
>
> Patch #1 adds function perf_get_event() in kernel/events/core.c.
> Patch #2 implements the bpf subcommand BPF_TASK_FD_QUERY.
> Patch #3 syncs tools bpf.h header and also adds bpf_task_fd_query()
>          in the libbpf library for samples/selftests/bpftool to use.
> Patch #4 adds ksym_get_addr() utility function.
> Patch #5 adds a test in samples/bpf for querying k[ret]probes and
>          u[ret]probes.
> Patch #6 adds a test in tools/testing/selftests/bpf for querying
>          raw_tracepoint and tracepoint.
> Patch #7 adds a new subcommand "perf" to bpftool.
>
> Changelogs:
> v4 -> v5:
> . return strlen(buf) instead of strlen(buf) + 1
> in the attr.buf_len. As long as user provides
>       non-empty buffer, it will be filled with empty
> string, truncated string, or full string
> based on the buffer size and the length of
> to-be-copied string.
> v3 -> v4:
> . made attr buf_len input/output. The length of
>       actual buffer is written to buf_len so user space knows
> what is actually needed. If user provides a buffer
> with length >= 1 but less than required, do partial
> copy and return -ENOSPC.
> . code simplification with put_user.
> . changed query result attach_info to fd_type.
> . add tests at selftests/bpf to test zero len, null buf and
> insufficient buf.
> v2 -> v3:
> . made perf_get_event() return a const perf_event pointer.
> this was to ensure that event fields are not modified.
> . detect whether the new BPF_TASK_FD_QUERY is supported or
> not in "bpftool perf" and warn users if it is not.
> v1 -> v2:
> . changed bpf subcommand name from BPF_PERF_EVENT_QUERY
> to BPF_TASK_FD_QUERY.
> . fixed various "bpftool perf" issues and added documentation
> and auto-completion.
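
The buf/buf_len contract described in the v3 -> v4 and v4 -> v5 notes can
be sketched in plain user-space C. This is only a model of the documented
behaviour, not the kernel code from this series: the helper name
model_copy_name() is hypothetical, and returning 0 when no usable buffer
is supplied is an assumption the changelog does not spell out.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical model of the buf/buf_len semantics in the changelog.
 * On entry *buf_len is the size of ubuf; on return it holds strlen(src),
 * i.e. the length without the trailing NUL (the v4 -> v5 change).
 */
static int model_copy_name(const char *src, char *ubuf, unsigned int *buf_len)
{
	size_t len = strlen(src);
	unsigned int input_len = *buf_len;
	int err = 0;

	if (ubuf && input_len) {
		if (input_len >= len + 1) {
			/* Buffer holds the full string plus its terminator. */
			memcpy(ubuf, src, len + 1);
		} else {
			/* Too small: partial copy, still NUL-terminated. */
			memcpy(ubuf, src, input_len - 1);
			ubuf[input_len - 1] = '\0';
			err = -ENOSPC;
		}
	}
	/* Assumption: with no usable buffer, only the length is reported. */

	*buf_len = (unsigned int)len;
	return err;
}
```

A caller that gets -ENOSPC can allocate *buf_len + 1 bytes and retry.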
>
> Yonghong Song (7):
> perf/core: add perf_get_event() to return perf_event given a struct
> file
> bpf: introduce bpf subcommand BPF_TASK_FD_QUERY
> tools/bpf: sync kernel header bpf.h and add bpf_task_fd_query in
> libbpf
> tools/bpf: add ksym_get_addr() in trace_helpers
> samples/bpf: add a samples/bpf test for BPF_TASK_FD_QUERY
> tools/bpf: add two BPF_TASK_FD_QUERY tests in test_progs
> tools/bpftool: add perf subcommand
>
> include/linux/perf_event.h | 5 +
> include/linux/trace_events.h | 17 +
> include/uapi/linux/bpf.h | 26 ++
> kernel/bpf/syscall.c | 131 ++++++++
> kernel/events/core.c | 8 +
> kernel/trace/bpf_trace.c | 48 +++
> kernel/trace/trace_kprobe.c | 29 ++
> kernel/trace/trace_uprobe.c | 22 ++
> samples/bpf/Makefile | 4 +
> samples/bpf/task_fd_query_kern.c | 19 ++
> samples/bpf/task_fd_query_user.c | 382 +++++++++++++++++++++++
> tools/bpf/bpftool/Documentation/bpftool-perf.rst | 81 +++++
> tools/bpf/bpftool/Documentation/bpftool.rst | 5 +-
> tools/bpf/bpftool/bash-completion/bpftool | 9 +
> tools/bpf/bpftool/main.c | 3 +-
> tools/bpf/bpftool/main.h | 1 +
> tools/bpf/bpftool/perf.c | 246 +++++++++++++++
> tools/include/uapi/linux/bpf.h | 26 ++
> tools/lib/bpf/bpf.c | 23 ++
> tools/lib/bpf/bpf.h | 3 +
> tools/testing/selftests/bpf/test_progs.c | 158 ++++++++++
> tools/testing/selftests/bpf/trace_helpers.c | 12 +
> tools/testing/selftests/bpf/trace_helpers.h | 1 +
> 23 files changed, 1257 insertions(+), 2 deletions(-)
> create mode 100644 samples/bpf/task_fd_query_kern.c
> create mode 100644 samples/bpf/task_fd_query_user.c
> create mode 100644 tools/bpf/bpftool/Documentation/bpftool-perf.rst
> create mode 100644 tools/bpf/bpftool/perf.c
LGTM, series:
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
^ permalink raw reply