* Re: [RFC PATCH v2 1/5] bpf: add PHYS_DEV prog type for early driver filter
From: Thomas Graf @ 2016-04-10 7:55 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Jamal Hadi Salim, Tom Herbert, Jesper Dangaard Brouer,
Brenden Blanco, David S. Miller, Linux Kernel Network Developers,
Or Gerlitz, Daniel Borkmann, Eric Dumazet, Edward Cree,
john fastabend, Johannes Berg, eranlinuxmellanox, Lorenzo Colitti,
linux-mm
In-Reply-To: <20160409172634.GA55330@ast-mbp.thefacebook.com>
On 04/09/16 at 10:26am, Alexei Starovoitov wrote:
> On Sat, Apr 09, 2016 at 11:29:18AM -0400, Jamal Hadi Salim wrote:
> > If this is _forwarding only_ it maybe useful to look at
> > Alexey's old code in particular the DMA bits;
> > he built his own lookup algorithm but sounds like bpf is
> > a much better fit today.
>
> a link to these old bits?
>
> Just to be clear: this rfc is not the only thing we're considering.
> In particular huawei guys did a monster effort to improve performance
> in this area as well. We'll try to blend all the code together and
> pick what's the best.
What's the plan on opening the discussion on this? Can we get a peek?
Is it an alternative to XDP and the driver hook? Different architecture
or just different implementation? I understood it as another pseudo
skb model with a path on converting to real skbs for stack processing.
I really like the current proposal by Brenden for its simplicity and
targeted compatibility with cls_bpf.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply
* Re: [PATCH net-next v2 1/2] rtnetlink: add new RTM_GETSTATS message to dump link stats
From: Thomas Graf @ 2016-04-10 8:16 UTC (permalink / raw)
To: Roopa Prabhu; +Cc: netdev, jhs, davem
In-Reply-To: <1460183892-57286-2-git-send-email-roopa@cumulusnetworks.com>
On 04/08/16 at 11:38pm, Roopa Prabhu wrote:
> From: Roopa Prabhu <roopa@cumulusnetworks.com>
>
> This patch adds a new RTM_GETSTATS message to query link stats via netlink
> from the kernel. RTM_NEWLINK also dumps stats today, but RTM_NEWLINK
> returns a lot more than just stats and is expensive in some cases when
> frequent polling for stats from userspace is a common operation.
>
> RTM_GETSTATS is an attempt to provide a light weight netlink message
> to explicity query only link stats from the kernel on an interface.
> The idea is to also keep it extensible so that new kinds of stats can be
> added to it in the future.
>
> This patch adds the following attribute for NETDEV stats:
> struct nla_policy ifla_stats_policy[IFLA_STATS_MAX + 1] = {
> [IFLA_STATS_LINK64] = { .len = sizeof(struct rtnl_link_stats64) },
> };
>
> This patch also allows for af family stats (an example af stats for IPV6
> is available with the second patch in the series).
>
> Like any other rtnetlink message, RTM_GETSTATS can be used to get stats of
> a single interface or all interfaces with NLM_F_DUMP.
Awesome stuff Roopa.
This currently ties everything to a net_device with a selector to
include certain bits of that net_device. How about we take it half a
step further and allow for non net_device stats such as IP, TCP,
routing or ipsec stats to be retrieved as well?
A simple array of nested attributes replacing IFLA_STATS_* would
allow for that, e.g.
1. {.type = ST_IPSTATS, value = { ...} }
2. {.type = ST_LINK, .value = {
{.type = ST_LINK_NAME, .value = "eth0"},
{.type = ST_LINK_Q, .value = 10}
}}
3. ...
So for your initial patch, you'd simply add a new top level attribute
which ties all the net_device specific statistics to a top level
attribute and we can add the non net_device specific stats later on.
We can't do that later on though without breaking ABI so that would
have to go in with the first iteration.
We can preserve all the existing attribute formats, we simply have
to introduce new attribute types which don't overlap and document
which statistic identifier maps to which existing attribute id.
^ permalink raw reply
* [PATCH net-next 0/4] qed*: [mostly] Ethtool RSS configuration
From: Yuval Mintz @ 2016-04-10 9:42 UTC (permalink / raw)
To: davem, netdev; +Cc: Yuval Mintz
Most of the content [code-wise] in this series is for allowing various
RSS-related configuration via ethtool.
In addition, this also removed an unnecessary versioning scheme between
the drivers and bump the driver version.
Dave,
Please consider applying this to `net-next'.
Thanks,
Yuval
^ permalink raw reply
* [PATCH net-next 1/4] qed*: remove version dependency
From: Yuval Mintz @ 2016-04-10 9:42 UTC (permalink / raw)
To: davem, netdev; +Cc: Rahul Verma, Yuval Mintz
In-Reply-To: <1460281382-13656-1-git-send-email-Yuval.Mintz@qlogic.com>
From: Rahul Verma <rahul.verma@qlogic.com>
Inbox drivers don't need versioning scheme in order to guarantee
compatibility, as both qed and qede are compiled from same codebase.
Signed-off-by: Rahul Verma <rahul.verma@qlogic.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
---
drivers/net/ethernet/qlogic/qed/qed.h | 2 --
drivers/net/ethernet/qlogic/qed/qed_l2.c | 8 +-------
drivers/net/ethernet/qlogic/qed/qed_main.c | 11 -----------
drivers/net/ethernet/qlogic/qede/qede.h | 2 --
drivers/net/ethernet/qlogic/qede/qede_main.c | 11 +----------
include/linux/qed/qed_eth_if.h | 2 +-
include/linux/qed/qed_if.h | 9 ---------
7 files changed, 3 insertions(+), 42 deletions(-)
diff --git a/drivers/net/ethernet/qlogic/qed/qed.h b/drivers/net/ethernet/qlogic/qed/qed.h
index fcb8e9b..a3ee9df 100644
--- a/drivers/net/ethernet/qlogic/qed/qed.h
+++ b/drivers/net/ethernet/qlogic/qed/qed.h
@@ -507,6 +507,4 @@ u32 qed_unzip_data(struct qed_hwfn *p_hwfn,
int qed_slowpath_irq_req(struct qed_hwfn *hwfn);
-#define QED_ETH_INTERFACE_VERSION 300
-
#endif /* _QED_H */
diff --git a/drivers/net/ethernet/qlogic/qed/qed_l2.c b/drivers/net/ethernet/qlogic/qed/qed_l2.c
index 3f35c6c..e848d5a 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_l2.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_l2.c
@@ -2043,14 +2043,8 @@ static const struct qed_eth_ops qed_eth_ops_pass = {
.get_vport_stats = &qed_get_vport_stats,
};
-const struct qed_eth_ops *qed_get_eth_ops(u32 version)
+const struct qed_eth_ops *qed_get_eth_ops(void)
{
- if (version != QED_ETH_INTERFACE_VERSION) {
- pr_notice("Cannot supply ethtool operations [%08x != %08x]\n",
- version, QED_ETH_INTERFACE_VERSION);
- return NULL;
- }
-
return &qed_eth_ops_pass;
}
EXPORT_SYMBOL(qed_get_eth_ops);
diff --git a/drivers/net/ethernet/qlogic/qed/qed_main.c b/drivers/net/ethernet/qlogic/qed/qed_main.c
index 26d40db..c31d485 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_main.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_main.c
@@ -1172,14 +1172,3 @@ const struct qed_common_ops qed_common_ops_pass = {
.chain_free = &qed_chain_free,
.set_led = &qed_set_led,
};
-
-u32 qed_get_protocol_version(enum qed_protocol protocol)
-{
- switch (protocol) {
- case QED_PROTOCOL_ETH:
- return QED_ETH_INTERFACE_VERSION;
- default:
- return 0;
- }
-}
-EXPORT_SYMBOL(qed_get_protocol_version);
diff --git a/drivers/net/ethernet/qlogic/qede/qede.h b/drivers/net/ethernet/qlogic/qede/qede.h
index d023251..e0a696a 100644
--- a/drivers/net/ethernet/qlogic/qede/qede.h
+++ b/drivers/net/ethernet/qlogic/qede/qede.h
@@ -32,8 +32,6 @@
__stringify(QEDE_REVISION_VERSION) "." \
__stringify(QEDE_ENGINEERING_VERSION)
-#define QEDE_ETH_INTERFACE_VERSION 300
-
#define DRV_MODULE_SYM qede
struct qede_stats {
diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c
index 518af32..a55d93e 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_main.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_main.c
@@ -141,19 +141,10 @@ static
int __init qede_init(void)
{
int ret;
- u32 qed_ver;
pr_notice("qede_init: %s\n", version);
- qed_ver = qed_get_protocol_version(QED_PROTOCOL_ETH);
- if (qed_ver != QEDE_ETH_INTERFACE_VERSION) {
- pr_notice("Version mismatch [%08x != %08x]\n",
- qed_ver,
- QEDE_ETH_INTERFACE_VERSION);
- return -EINVAL;
- }
-
- qed_ops = qed_get_eth_ops(QEDE_ETH_INTERFACE_VERSION);
+ qed_ops = qed_get_eth_ops();
if (!qed_ops) {
pr_notice("Failed to get qed ethtool operations\n");
return -EINVAL;
diff --git a/include/linux/qed/qed_eth_if.h b/include/linux/qed/qed_eth_if.h
index e1d6983..e00c8db 100644
--- a/include/linux/qed/qed_eth_if.h
+++ b/include/linux/qed/qed_eth_if.h
@@ -167,7 +167,7 @@ struct qed_eth_ops {
struct qed_eth_stats *stats);
};
-const struct qed_eth_ops *qed_get_eth_ops(u32 version);
+const struct qed_eth_ops *qed_get_eth_ops(void);
void qed_put_eth_ops(void);
#endif
diff --git a/include/linux/qed/qed_if.h b/include/linux/qed/qed_if.h
index 1f7599c7..b007011 100644
--- a/include/linux/qed/qed_if.h
+++ b/include/linux/qed/qed_if.h
@@ -271,15 +271,6 @@ struct qed_common_ops {
enum qed_led_mode mode);
};
-/**
- * @brief qed_get_protocol_version
- *
- * @param protocol
- *
- * @return version supported by qed for given protocol driver
- */
-u32 qed_get_protocol_version(enum qed_protocol protocol);
-
#define MASK_FIELD(_name, _value) \
((_value) &= (_name ## _MASK))
--
1.9.3
^ permalink raw reply related
* [PATCH net-next 2/4] qed: add Rx flow hash/indirection support.
From: Yuval Mintz @ 2016-04-10 9:43 UTC (permalink / raw)
To: davem, netdev; +Cc: Sudarsana Reddy Kalluru, Yuval Mintz
In-Reply-To: <1460281382-13656-1-git-send-email-Yuval.Mintz@qlogic.com>
From: Sudarsana Reddy Kalluru <sudarsana.kalluru@qlogic.com>
Adds the required API for passing RSS-related configuration from qede.
Signed-off-by: Sudarsana Reddy Kalluru <sudarsana.kalluru@qlogic.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
---
drivers/net/ethernet/qlogic/qed/qed_l2.c | 17 +----------------
include/linux/qed/qed_eth_if.h | 1 +
include/linux/qed/qed_if.h | 11 +++++++++++
3 files changed, 13 insertions(+), 16 deletions(-)
diff --git a/drivers/net/ethernet/qlogic/qed/qed_l2.c b/drivers/net/ethernet/qlogic/qed/qed_l2.c
index e848d5a..5005497 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_l2.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_l2.c
@@ -35,19 +35,6 @@
#include "qed_reg_addr.h"
#include "qed_sp.h"
-enum qed_rss_caps {
- QED_RSS_IPV4 = 0x1,
- QED_RSS_IPV6 = 0x2,
- QED_RSS_IPV4_TCP = 0x4,
- QED_RSS_IPV6_TCP = 0x8,
- QED_RSS_IPV4_UDP = 0x10,
- QED_RSS_IPV6_UDP = 0x20,
-};
-
-/* Should be the same as ETH_RSS_IND_TABLE_ENTRIES_NUM */
-#define QED_RSS_IND_TABLE_SIZE 128
-#define QED_RSS_KEY_SIZE 10 /* size in 32b chunks */
-
struct qed_rss_params {
u8 update_rss_config;
u8 rss_enable;
@@ -1744,9 +1731,7 @@ static int qed_update_vport(struct qed_dev *cdev,
sp_rss_params.update_rss_capabilities = 1;
sp_rss_params.update_rss_ind_table = 1;
sp_rss_params.update_rss_key = 1;
- sp_rss_params.rss_caps = QED_RSS_IPV4 |
- QED_RSS_IPV6 |
- QED_RSS_IPV4_TCP | QED_RSS_IPV6_TCP;
+ sp_rss_params.rss_caps = params->rss_params.rss_caps;
sp_rss_params.rss_table_size_log = 7; /* 2^7 = 128 */
memcpy(sp_rss_params.rss_ind_table,
params->rss_params.rss_ind_table,
diff --git a/include/linux/qed/qed_eth_if.h b/include/linux/qed/qed_eth_if.h
index e00c8db..795c990 100644
--- a/include/linux/qed/qed_eth_if.h
+++ b/include/linux/qed/qed_eth_if.h
@@ -27,6 +27,7 @@ struct qed_dev_eth_info {
struct qed_update_vport_rss_params {
u16 rss_ind_table[128];
u32 rss_key[10];
+ u8 rss_caps;
};
struct qed_update_vport_params {
diff --git a/include/linux/qed/qed_if.h b/include/linux/qed/qed_if.h
index b007011..67e8c20 100644
--- a/include/linux/qed/qed_if.h
+++ b/include/linux/qed/qed_if.h
@@ -515,4 +515,15 @@ static inline void internal_ram_wr(void __iomem *addr,
__internal_ram_wr(NULL, addr, size, data);
}
+enum qed_rss_caps {
+ QED_RSS_IPV4 = 0x1,
+ QED_RSS_IPV6 = 0x2,
+ QED_RSS_IPV4_TCP = 0x4,
+ QED_RSS_IPV6_TCP = 0x8,
+ QED_RSS_IPV4_UDP = 0x10,
+ QED_RSS_IPV6_UDP = 0x20,
+};
+
+#define QED_RSS_IND_TABLE_SIZE 128
+#define QED_RSS_KEY_SIZE 10 /* size in 32b chunks */
#endif
--
1.9.3
^ permalink raw reply related
* [PATCH 4/4] qed* - bump driver versions to 8.7.1.20
From: Yuval Mintz @ 2016-04-10 9:43 UTC (permalink / raw)
To: davem, netdev; +Cc: Yuval Mintz
In-Reply-To: <1460281382-13656-1-git-send-email-Yuval.Mintz@qlogic.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
---
drivers/net/ethernet/qlogic/qed/qed.h | 2 +-
drivers/net/ethernet/qlogic/qede/qede.h | 4 ++--
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/qlogic/qed/qed.h b/drivers/net/ethernet/qlogic/qed/qed.h
index a3ee9df..0f0d2d1 100644
--- a/drivers/net/ethernet/qlogic/qed/qed.h
+++ b/drivers/net/ethernet/qlogic/qed/qed.h
@@ -26,7 +26,7 @@
#include "qed_hsi.h"
extern const struct qed_common_ops qed_common_ops_pass;
-#define DRV_MODULE_VERSION "8.7.0.0"
+#define DRV_MODULE_VERSION "8.7.1.20"
#define MAX_HWFNS_PER_DEVICE (4)
#define NAME_SIZE 16
diff --git a/drivers/net/ethernet/qlogic/qede/qede.h b/drivers/net/ethernet/qlogic/qede/qede.h
index 80dbb73..41c4189 100644
--- a/drivers/net/ethernet/qlogic/qede/qede.h
+++ b/drivers/net/ethernet/qlogic/qede/qede.h
@@ -25,8 +25,8 @@
#define QEDE_MAJOR_VERSION 8
#define QEDE_MINOR_VERSION 7
-#define QEDE_REVISION_VERSION 0
-#define QEDE_ENGINEERING_VERSION 0
+#define QEDE_REVISION_VERSION 1
+#define QEDE_ENGINEERING_VERSION 20
#define DRV_MODULE_VERSION __stringify(QEDE_MAJOR_VERSION) "." \
__stringify(QEDE_MINOR_VERSION) "." \
__stringify(QEDE_REVISION_VERSION) "." \
--
1.9.3
^ permalink raw reply related
* [PATCH 3/4] qede: add Rx flow hash/indirection support.
From: Yuval Mintz @ 2016-04-10 9:43 UTC (permalink / raw)
To: davem, netdev; +Cc: Sudarsana Reddy Kalluru, Yuval Mintz
In-Reply-To: <1460281382-13656-1-git-send-email-Yuval.Mintz@qlogic.com>
From: Sudarsana Reddy Kalluru <sudarsana.kalluru@qlogic.com>
Adds support for the following via ethtool:
- UDP configuration of RSS based on 2-tuple/4-tuple.
- RSS hash key.
- RSS indirection table.
Signed-off-by: Sudarsana Reddy Kalluru <sudarsana.kalluru@qlogic.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
---
drivers/net/ethernet/qlogic/qede/qede.h | 4 +
drivers/net/ethernet/qlogic/qede/qede_ethtool.c | 237 +++++++++++++++++++++++-
drivers/net/ethernet/qlogic/qede/qede_main.c | 52 +++++-
3 files changed, 283 insertions(+), 10 deletions(-)
diff --git a/drivers/net/ethernet/qlogic/qede/qede.h b/drivers/net/ethernet/qlogic/qede/qede.h
index e0a696a..80dbb73 100644
--- a/drivers/net/ethernet/qlogic/qede/qede.h
+++ b/drivers/net/ethernet/qlogic/qede/qede.h
@@ -154,6 +154,10 @@ struct qede_dev {
SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
struct qede_stats stats;
+#define QEDE_RSS_INDIR_INITED BIT(0)
+#define QEDE_RSS_KEY_INITED BIT(1)
+#define QEDE_RSS_CAPS_INITED BIT(2)
+ u32 rss_params_inited; /* bit-field to track initialized rss params */
struct qed_update_vport_rss_params rss_params;
u16 q_num_rx_buffers; /* Must be a power of two */
u16 q_num_tx_buffers; /* Must be a power of two */
diff --git a/drivers/net/ethernet/qlogic/qede/qede_ethtool.c b/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
index c49dc10..f0982f1 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_ethtool.c
@@ -569,6 +569,236 @@ static int qede_set_phys_id(struct net_device *dev,
return 0;
}
+static int qede_get_rss_flags(struct qede_dev *edev, struct ethtool_rxnfc *info)
+{
+ info->data = RXH_IP_SRC | RXH_IP_DST;
+
+ switch (info->flow_type) {
+ case TCP_V4_FLOW:
+ case TCP_V6_FLOW:
+ info->data |= RXH_L4_B_0_1 | RXH_L4_B_2_3;
+ break;
+ case UDP_V4_FLOW:
+ if (edev->rss_params.rss_caps & QED_RSS_IPV4_UDP)
+ info->data |= RXH_L4_B_0_1 | RXH_L4_B_2_3;
+ break;
+ case UDP_V6_FLOW:
+ if (edev->rss_params.rss_caps & QED_RSS_IPV6_UDP)
+ info->data |= RXH_L4_B_0_1 | RXH_L4_B_2_3;
+ break;
+ case IPV4_FLOW:
+ case IPV6_FLOW:
+ break;
+ default:
+ info->data = 0;
+ break;
+ }
+
+ return 0;
+}
+
+static int qede_get_rxnfc(struct net_device *dev, struct ethtool_rxnfc *info,
+ u32 *rules __always_unused)
+{
+ struct qede_dev *edev = netdev_priv(dev);
+
+ switch (info->cmd) {
+ case ETHTOOL_GRXRINGS:
+ info->data = edev->num_rss;
+ return 0;
+ case ETHTOOL_GRXFH:
+ return qede_get_rss_flags(edev, info);
+ default:
+ DP_ERR(edev, "Command parameters not supported\n");
+ return -EOPNOTSUPP;
+ }
+}
+
+static int qede_set_rss_flags(struct qede_dev *edev, struct ethtool_rxnfc *info)
+{
+ struct qed_update_vport_params vport_update_params;
+ u8 set_caps = 0, clr_caps = 0;
+
+ DP_VERBOSE(edev, QED_MSG_DEBUG,
+ "Set rss flags command parameters: flow type = %d, data = %llu\n",
+ info->flow_type, info->data);
+
+ switch (info->flow_type) {
+ case TCP_V4_FLOW:
+ case TCP_V6_FLOW:
+ /* For TCP only 4-tuple hash is supported */
+ if (info->data ^ (RXH_IP_SRC | RXH_IP_DST |
+ RXH_L4_B_0_1 | RXH_L4_B_2_3)) {
+ DP_INFO(edev, "Command parameters not supported\n");
+ return -EINVAL;
+ }
+ return 0;
+ case UDP_V4_FLOW:
+ /* For UDP either 2-tuple hash or 4-tuple hash is supported */
+ if (info->data == (RXH_IP_SRC | RXH_IP_DST |
+ RXH_L4_B_0_1 | RXH_L4_B_2_3)) {
+ set_caps = QED_RSS_IPV4_UDP;
+ DP_VERBOSE(edev, QED_MSG_DEBUG,
+ "UDP 4-tuple enabled\n");
+ } else if (info->data == (RXH_IP_SRC | RXH_IP_DST)) {
+ clr_caps = QED_RSS_IPV4_UDP;
+ DP_VERBOSE(edev, QED_MSG_DEBUG,
+ "UDP 4-tuple disabled\n");
+ } else {
+ return -EINVAL;
+ }
+ break;
+ case UDP_V6_FLOW:
+ /* For UDP either 2-tuple hash or 4-tuple hash is supported */
+ if (info->data == (RXH_IP_SRC | RXH_IP_DST |
+ RXH_L4_B_0_1 | RXH_L4_B_2_3)) {
+ set_caps = QED_RSS_IPV6_UDP;
+ DP_VERBOSE(edev, QED_MSG_DEBUG,
+ "UDP 4-tuple enabled\n");
+ } else if (info->data == (RXH_IP_SRC | RXH_IP_DST)) {
+ clr_caps = QED_RSS_IPV6_UDP;
+ DP_VERBOSE(edev, QED_MSG_DEBUG,
+ "UDP 4-tuple disabled\n");
+ } else {
+ return -EINVAL;
+ }
+ break;
+ case IPV4_FLOW:
+ case IPV6_FLOW:
+ /* For IP only 2-tuple hash is supported */
+ if (info->data ^ (RXH_IP_SRC | RXH_IP_DST)) {
+ DP_INFO(edev, "Command parameters not supported\n");
+ return -EINVAL;
+ }
+ return 0;
+ case SCTP_V4_FLOW:
+ case AH_ESP_V4_FLOW:
+ case AH_V4_FLOW:
+ case ESP_V4_FLOW:
+ case SCTP_V6_FLOW:
+ case AH_ESP_V6_FLOW:
+ case AH_V6_FLOW:
+ case ESP_V6_FLOW:
+ case IP_USER_FLOW:
+ case ETHER_FLOW:
+ /* RSS is not supported for these protocols */
+ if (info->data) {
+ DP_INFO(edev, "Command parameters not supported\n");
+ return -EINVAL;
+ }
+ return 0;
+ default:
+ return -EINVAL;
+ }
+
+ /* No action is needed if there is no change in the rss capability */
+ if (edev->rss_params.rss_caps == ((edev->rss_params.rss_caps &
+ ~clr_caps) | set_caps))
+ return 0;
+
+ /* Update internal configuration */
+ edev->rss_params.rss_caps = (edev->rss_params.rss_caps & ~clr_caps) |
+ set_caps;
+ edev->rss_params_inited |= QEDE_RSS_CAPS_INITED;
+
+ /* Re-configure if possible */
+ if (netif_running(edev->ndev)) {
+ memset(&vport_update_params, 0, sizeof(vport_update_params));
+ vport_update_params.update_rss_flg = 1;
+ vport_update_params.vport_id = 0;
+ memcpy(&vport_update_params.rss_params, &edev->rss_params,
+ sizeof(vport_update_params.rss_params));
+ return edev->ops->vport_update(edev->cdev,
+ &vport_update_params);
+ }
+
+ return 0;
+}
+
+static int qede_set_rxnfc(struct net_device *dev, struct ethtool_rxnfc *info)
+{
+ struct qede_dev *edev = netdev_priv(dev);
+
+ switch (info->cmd) {
+ case ETHTOOL_SRXFH:
+ return qede_set_rss_flags(edev, info);
+ default:
+ DP_INFO(edev, "Command parameters not supported\n");
+ return -EOPNOTSUPP;
+ }
+}
+
+static u32 qede_get_rxfh_indir_size(struct net_device *dev)
+{
+ return QED_RSS_IND_TABLE_SIZE;
+}
+
+static u32 qede_get_rxfh_key_size(struct net_device *dev)
+{
+ struct qede_dev *edev = netdev_priv(dev);
+
+ return sizeof(edev->rss_params.rss_key);
+}
+
+static int qede_get_rxfh(struct net_device *dev, u32 *indir, u8 *key, u8 *hfunc)
+{
+ struct qede_dev *edev = netdev_priv(dev);
+ int i;
+
+ if (hfunc)
+ *hfunc = ETH_RSS_HASH_TOP;
+
+ if (!indir)
+ return 0;
+
+ for (i = 0; i < QED_RSS_IND_TABLE_SIZE; i++)
+ indir[i] = edev->rss_params.rss_ind_table[i];
+
+ if (key)
+ memcpy(key, edev->rss_params.rss_key,
+ qede_get_rxfh_key_size(dev));
+
+ return 0;
+}
+
+static int qede_set_rxfh(struct net_device *dev, const u32 *indir,
+ const u8 *key, const u8 hfunc)
+{
+ struct qed_update_vport_params vport_update_params;
+ struct qede_dev *edev = netdev_priv(dev);
+ int i;
+
+ if (hfunc != ETH_RSS_HASH_NO_CHANGE && hfunc != ETH_RSS_HASH_TOP)
+ return -EOPNOTSUPP;
+
+ if (!indir && !key)
+ return 0;
+
+ if (indir) {
+ for (i = 0; i < QED_RSS_IND_TABLE_SIZE; i++)
+ edev->rss_params.rss_ind_table[i] = indir[i];
+ edev->rss_params_inited |= QEDE_RSS_INDIR_INITED;
+ }
+
+ if (key) {
+ memcpy(&edev->rss_params.rss_key, key,
+ qede_get_rxfh_key_size(dev));
+ edev->rss_params_inited |= QEDE_RSS_KEY_INITED;
+ }
+
+ if (netif_running(edev->ndev)) {
+ memset(&vport_update_params, 0, sizeof(vport_update_params));
+ vport_update_params.update_rss_flg = 1;
+ vport_update_params.vport_id = 0;
+ memcpy(&vport_update_params.rss_params, &edev->rss_params,
+ sizeof(vport_update_params.rss_params));
+ return edev->ops->vport_update(edev->cdev,
+ &vport_update_params);
+ }
+
+ return 0;
+}
+
static const struct ethtool_ops qede_ethtool_ops = {
.get_settings = qede_get_settings,
.set_settings = qede_set_settings,
@@ -585,7 +815,12 @@ static const struct ethtool_ops qede_ethtool_ops = {
.set_phys_id = qede_set_phys_id,
.get_ethtool_stats = qede_get_ethtool_stats,
.get_sset_count = qede_get_sset_count,
-
+ .get_rxnfc = qede_get_rxnfc,
+ .set_rxnfc = qede_set_rxnfc,
+ .get_rxfh_indir_size = qede_get_rxfh_indir_size,
+ .get_rxfh_key_size = qede_get_rxfh_key_size,
+ .get_rxfh = qede_get_rxfh,
+ .set_rxfh = qede_set_rxfh,
.get_channels = qede_get_channels,
.set_channels = qede_set_channels,
};
diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c
index a55d93e..457caad 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_main.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_main.c
@@ -2826,10 +2826,10 @@ static int qede_start_queues(struct qede_dev *edev)
int rc, tc, i;
int vlan_removal_en = 1;
struct qed_dev *cdev = edev->cdev;
- struct qed_update_vport_rss_params *rss_params = &edev->rss_params;
struct qed_update_vport_params vport_update_params;
struct qed_queue_start_common_params q_params;
struct qed_start_vport_params start = {0};
+ bool reset_rss_indir = false;
if (!edev->num_rss) {
DP_ERR(edev,
@@ -2924,16 +2924,50 @@ static int qede_start_queues(struct qede_dev *edev)
/* Fill struct with RSS params */
if (QEDE_RSS_CNT(edev) > 1) {
vport_update_params.update_rss_flg = 1;
- for (i = 0; i < 128; i++)
- rss_params->rss_ind_table[i] =
- ethtool_rxfh_indir_default(i, QEDE_RSS_CNT(edev));
- netdev_rss_key_fill(rss_params->rss_key,
- sizeof(rss_params->rss_key));
+
+ /* Need to validate current RSS config uses valid entries */
+ for (i = 0; i < QED_RSS_IND_TABLE_SIZE; i++) {
+ if (edev->rss_params.rss_ind_table[i] >=
+ edev->num_rss) {
+ reset_rss_indir = true;
+ break;
+ }
+ }
+
+ if (!(edev->rss_params_inited & QEDE_RSS_INDIR_INITED) ||
+ reset_rss_indir) {
+ u16 val;
+
+ for (i = 0; i < QED_RSS_IND_TABLE_SIZE; i++) {
+ u16 indir_val;
+
+ val = QEDE_RSS_CNT(edev);
+ indir_val = ethtool_rxfh_indir_default(i, val);
+ edev->rss_params.rss_ind_table[i] = indir_val;
+ }
+ edev->rss_params_inited |= QEDE_RSS_INDIR_INITED;
+ }
+
+ if (!(edev->rss_params_inited & QEDE_RSS_KEY_INITED)) {
+ netdev_rss_key_fill(edev->rss_params.rss_key,
+ sizeof(edev->rss_params.rss_key));
+ edev->rss_params_inited |= QEDE_RSS_KEY_INITED;
+ }
+
+ if (!(edev->rss_params_inited & QEDE_RSS_CAPS_INITED)) {
+ edev->rss_params.rss_caps = QED_RSS_IPV4 |
+ QED_RSS_IPV6 |
+ QED_RSS_IPV4_TCP |
+ QED_RSS_IPV6_TCP;
+ edev->rss_params_inited |= QEDE_RSS_CAPS_INITED;
+ }
+
+ memcpy(&vport_update_params.rss_params, &edev->rss_params,
+ sizeof(vport_update_params.rss_params));
} else {
- memset(rss_params, 0, sizeof(*rss_params));
+ memset(&vport_update_params.rss_params, 0,
+ sizeof(vport_update_params.rss_params));
}
- memcpy(&vport_update_params.rss_params, rss_params,
- sizeof(*rss_params));
rc = edev->ops->vport_update(cdev, &vport_update_params);
if (rc) {
--
1.9.3
^ permalink raw reply related
* [PATCH net] packet: fix heap info leak in PACKET_DIAG_MCLIST sock_diag interface
From: Mathias Krause @ 2016-04-10 10:52 UTC (permalink / raw)
To: David S. Miller; +Cc: netdev, Eric W. Biederman, Pavel Emelyanov
Because we miss to wipe the remainder of i->addr[] in packet_mc_add(),
pdiag_put_mclist() leaks uninitialized heap bytes via the
PACKET_DIAG_MCLIST netlink attribute.
Fix this by explicitly memset(0)ing the remaining bytes in i->addr[].
Fixes: eea68e2f1a00 ("packet: Report socket mclist info via diag module")
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
---
The bug itself precedes commit eea68e2f1a00 but the list wasn't exposed
to userland before the introduction of the packet_diag interface.
Therefore the "Fixes:" line on that commit.
net/packet/af_packet.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 992396aa635c..86a408cf38d5 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -3441,6 +3441,7 @@ static int packet_mc_add(struct sock *sk, struct packet_mreq_max *mreq)
i->ifindex = mreq->mr_ifindex;
i->alen = mreq->mr_alen;
memcpy(i->addr, mreq->mr_address, i->alen);
+ memset(i->addr + i->alen, 0, sizeof(i->addr) - i->alen);
i->count = 1;
i->next = po->mclist;
po->mclist = i;
--
1.7.10.4
^ permalink raw reply related
* [PATCH] ath9k: remove duplicate assignment of variable ah
From: Colin King @ 2016-04-10 11:25 UTC (permalink / raw)
To: QCA ath9k Development, Kalle Valo, linux-wireless, ath9k-devel,
netdev
Cc: linux-kernel
From: Colin Ian King <colin.king@canonical.com>
ah is written twice with the same value, remove one of the
redundant assignments to ah.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
drivers/net/wireless/ath/ath9k/init.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/wireless/ath/ath9k/init.c b/drivers/net/wireless/ath/ath9k/init.c
index 77ace8d..fb702c4 100644
--- a/drivers/net/wireless/ath/ath9k/init.c
+++ b/drivers/net/wireless/ath/ath9k/init.c
@@ -477,7 +477,7 @@ static void ath9k_eeprom_request_cb(const struct firmware *eeprom_blob,
static int ath9k_eeprom_request(struct ath_softc *sc, const char *name)
{
struct ath9k_eeprom_ctx ec;
- struct ath_hw *ah = ah = sc->sc_ah;
+ struct ath_hw *ah = sc->sc_ah;
int err;
/* try to load the EEPROM content asynchronously */
--
2.7.4
^ permalink raw reply related
* Re: AP firmware for TI wl1251 wifi chip (wl1251-fw-ap.bin)
From: Pavel Machek @ 2016-04-10 11:51 UTC (permalink / raw)
To: Machani, Yaniv
Cc: Pali Rohár, Luciano Coelho, Balbi, Felipe, Chalmers, Kevin,
Shahar Levi, Kalle Valo, Davis, Andrew, Mishol, Guy, Arik Nemtsov,
Gery Kahn, Felipe Balbi, Luciano Coelho, David Woodhouse,
Aaro Koskinen, Ben Hutchings, David Gnedt, Ivaylo Dimitrov,
Sebastian Reichel, Tony Lindgren, Menon, Nishanth,
linux-wireless-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
In-Reply-To: <AE1C82FB3D0EC64DB1F752C81CBD1101390F685E-1tpBd5JUCm6IQmiDNMet8wC/G2K4zDHf@public.gmane.org>
Hi!
> > > wl1251 does not support AP mode, so there is no firmware for it in
> > > the tree.
> > >
> > > Regards,
> > > Yaniv
> >
> > Hi Yaniv! I read on some TI whitepaper, that wl1251 hardware supports
> > some Soft-AP mode. So I expect that either special FW is needed for it
> > or somehow it is possible to use current released. Do you have any information about it?
>
> Hi Pali,
> This must be some typo, the device does not support Soft-AP.
> More than that, wl1251 family is not officially supported via the mainline Linux.
> For Soft-AP, and other new features based on Linux you should use WiLink8 chip family.
Too late for that, we already have the devices, and they were manufactured quite long
time ago.
Is it "hardware can't do AP", "firmware can't do AP" or "current drivers
do not support AP"?
Best regards,
Pavel
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCHv2 net-next 1/6] sctp: add sctp_info dump api for sctp_diag
From: Jamal Hadi Salim @ 2016-04-10 13:02 UTC (permalink / raw)
To: Eric Dumazet
Cc: Xin Long, network dev, linux-sctp, Marcelo Ricardo Leitner,
Vlad Yasevich, daniel, davem
In-Reply-To: <1460222507.6473.493.camel@edumazet-glaptop3.roam.corp.google.com>
On 16-04-09 01:21 PM, Eric Dumazet wrote:
> Well, once a hole is there, nothing we can do really, because of
> compatibility with old kernels / old binaries.
>
>
> But when a _new_ structure is defined, this is the time where we can ask
> for doing sensible things ;)
>
This one is fixable. sizeof() already includes the accounting of
the pad. something like:
diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
index fe95446..52542eb 100644
--- a/include/uapi/linux/tcp.h
+++ b/include/uapi/linux/tcp.h
@@ -158,6 +158,7 @@ struct tcp_info {
__u8 tcpi_options;
__u8 tcpi_snd_wscale : 4, tcpi_rcv_wscale : 4;
+ __u8 pad; /*reuse this space if you need 8bits for something*/
__u32 tcpi_rto;
__u32 tcpi_ato;
__u32 tcpi_snd_mss;
cheers,
jamal
^ permalink raw reply related
* Re: [RFC PATCH v2 1/5] bpf: add PHYS_DEV prog type for early driver filter
From: Jamal Hadi Salim @ 2016-04-10 13:07 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Tom Herbert, Jesper Dangaard Brouer, Brenden Blanco,
David S. Miller, Linux Kernel Network Developers, Or Gerlitz,
Daniel Borkmann, Eric Dumazet, Edward Cree, john fastabend,
Thomas Graf, Johannes Berg, eranlinuxmellanox, Lorenzo Colitti,
linux-mm, Robert Olsson, kuznet
In-Reply-To: <20160409172634.GA55330@ast-mbp.thefacebook.com>
On 16-04-09 01:26 PM, Alexei Starovoitov wrote:
>
> yeah, no stack, no queues in bpf.
Thanks.
>
>> If this is _forwarding only_ it maybe useful to look at
>> Alexey's old code in particular the DMA bits;
>> he built his own lookup algorithm but sounds like bpf is
>> a much better fit today.
>
> a link to these old bits?
>
Dang. Trying to remember exact name (I think it has been gone for at
least 10 years now). I know it is not CONFIG_NET_FASTROUTE although
it could have been that depending on the driver (tulip had some
nice DMA properties - which by todays standards would be considered
primitive ;->).
+Cc Robert and Alexey (Trying to figure out name of driver based
routing code that DMAed from ingress to egress port)
> Just to be clear: this rfc is not the only thing we're considering.
> In particular huawei guys did a monster effort to improve performance
> in this area as well. We'll try to blend all the code together and
> pick what's the best.
>
Sounds very interesting.
cheers,
jamal
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply
* Re: [PATCH net-next v2 1/2] rtnetlink: add new RTM_GETSTATS message to dump link stats
From: Jamal Hadi Salim @ 2016-04-10 13:48 UTC (permalink / raw)
To: roopa; +Cc: netdev, davem
In-Reply-To: <57094326.9040009@cumulusnetworks.com>
On 16-04-09 02:00 PM, roopa wrote:
> This EXTENDED_HW_STATS is for ethtool like extended hw stats. This is keeping in
> mind that we want to also move ethtool to netlink in the future and with switchdev
> it becomes more necessary that we provide all stats closer to the other netdev stats.
> So far hw extended stats have always been available through this separate ethtool
> channel. The intent here is to unify the api for extended hw and software only stats.
Ok, so these are _not_ the stats which are broken down by packet size
ranges which are quiet common; but rather "proprietary" per port
type h/w stats? I browsed a couple of users of ethtool_stats and i see
they return proprietary looking 64 bit counters (batman for example
has a very strange meaning to those stats).
What i meant is a lot of ASICS have counters for byte ranges
[64bytes-128bytes], [128bytes-256bytes], etc - sorry i cant pin a name
to those but i am sure you have seen them and i thought those at minimal
need their own TLV since they are always fixed.
> XSTATS is per netdev can be included as a nested attribute inside IFLA_EXTENDED_STATS
> which are per netdev. bridge vlan stats will also fall here.
>
And you are going to distinguish which come from hardware and which are
software derived?
> And this api provides netdev specific stats. We have always mapped all
> asic stats to the switch port netdev stats. and this api does not cover the non-netdev specific stats.
> If you are for example asking for stats for a hardware offloaded bridge, then yes, they will fall here
> and be available on the bridge netdev. For asic stats that don't map to any netdev, devlink will be an
> appropriate infrastructure IMO.
>
> I am not sure if I answered your question :).
>
It is useful to have this discussion; unfortunately these user APIS once
are in will never be allowed to change. The answers could come
in time.
>> Should such a command then not be rejected with an error code?
> Dumps with no data are not rejected with an error code AFAIK. ie they don't return
> -ENODATA. This is consistent with all other dumps (unless i missed it).
> But, if there is a need for an error code, i can certainly check again.
>
It is mostly because you chose a whitelist filter i.e you list what
is allowed to be sent back. And if such a list is missing there
needs to be an opposite default (which is a deny all).
>>> +/* STATS section */
>>> +
>>> +struct if_stats_msg {
>>> + __u8 family;
>>> + __u32 ifindex;
>>> + __u32 filter_mask;
>>> +};
>>
>> Needs to be 32 bit aligned.
>> Do you need 32 bits for the filter mask?
> yes, i think we should keep it minimum 32 bits.
Ok, that is fine then. I thought it wont exceed
3-4 per port type but i could be wrong. 32 bits
should be safer.
>> Perhaps a 16bit mask and an 8bit pad for future use.
>>
>> struct if_stats_msg {
>> __u32 ifindex;
>> __u16 filter_mask;
>> __u8 family;
>> __u8 pad; /* future use */
>> };
>>
>> Or you could reverse those (from smallest to largest).
>
> The __u8 family needs to be the first field in the structure and at the first byte in the header data.
> hence family is first and i added the others after that. It follows the format for existing such structs (for other message types).
>
True, but it must be 32 bit aligned. So something like:
struct if_stats_msg {
__u8 family;
__u8 pad1;
__u16 pad2;
__u32 ifindex;
__u32 filter_mask;
}
> Yeah i remember :). But deferring it for a later incremental feature. That needs some more thought.
NP ;->
> Right now there is an urgent need to get the basic get stats api for a bunch of other stats: mpls, bridge vlan etc.
> Because it is not clean to extend the current stats infra part of the link message for this. So trying to get this in first.
>
Agreed.
The only thing i strongly feel about is the if_stats_msg - please fix
that and lets get at least the basics in. We can resolve other things
with further discussions.
> And this patchset only adds a handler for RTM_NEWSTATS dump and get stats. Your stats events request should be part of the RTM_NEWSTATS handler and can include other attributes (like timeout) in the future.
>
Ok.
cheers,
jamal
> Thanks,
> Roopa
>
^ permalink raw reply
* Re: [RFC PATCH v2 1/5] bpf: add PHYS_DEV prog type for early driver filter
From: Tom Herbert @ 2016-04-10 16:53 UTC (permalink / raw)
To: Thomas Graf
Cc: Alexei Starovoitov, Jamal Hadi Salim, Jesper Dangaard Brouer,
Brenden Blanco, David S. Miller, Linux Kernel Network Developers,
Or Gerlitz, Daniel Borkmann, Eric Dumazet, Edward Cree,
john fastabend, Johannes Berg, eranlinuxmellanox, Lorenzo Colitti,
linux-mm
In-Reply-To: <20160410075550.GA22873@pox.localdomain>
On Sun, Apr 10, 2016 at 12:55 AM, Thomas Graf <tgraf@suug.ch> wrote:
> On 04/09/16 at 10:26am, Alexei Starovoitov wrote:
>> On Sat, Apr 09, 2016 at 11:29:18AM -0400, Jamal Hadi Salim wrote:
>> > If this is _forwarding only_ it maybe useful to look at
>> > Alexey's old code in particular the DMA bits;
>> > he built his own lookup algorithm but sounds like bpf is
>> > a much better fit today.
>>
>> a link to these old bits?
>>
>> Just to be clear: this rfc is not the only thing we're considering.
>> In particular huawei guys did a monster effort to improve performance
>> in this area as well. We'll try to blend all the code together and
>> pick what's the best.
>
> What's the plan on opening the discussion on this? Can we get a peek?
> Is it an alternative to XDP and the driver hook? Different architecture
> or just different implementation? I understood it as another pseudo
> skb model with a path on converting to real skbs for stack processing.
>
We started discussions about this in IOvisor. The Huawei project is
called ceth (Common Ethernet). It is essentially a layer called
directly from drivers intended for fast path forwarding and network
virtualization. They have put quite a bit of effort into buffer
management and other parts of the infrastructure, much of which we
would like to leverage in XDP. The code is currently in github, will
ask them to make it generally accessible.
The programmability part, essentially BPF, should be part of a common
solution. We can define the necessary interfaces independently of the
underlying infrastructure which is really the only way we can do this
if we want the BPF programs to be portable across different
platforms-- in Linux, userspace, HW, etc.
Tom
> I really like the current proposal by Brenden for its simplicity and
> targeted compatibility with cls_bpf.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply
* Re: [RFC PATCH v2 1/5] bpf: add PHYS_DEV prog type for early driver filter
From: Jamal Hadi Salim @ 2016-04-10 18:09 UTC (permalink / raw)
To: Tom Herbert, Thomas Graf
Cc: Alexei Starovoitov, Jesper Dangaard Brouer, Brenden Blanco,
David S. Miller, Linux Kernel Network Developers, Or Gerlitz,
Daniel Borkmann, Eric Dumazet, Edward Cree, john fastabend,
Johannes Berg, eranlinuxmellanox, Lorenzo Colitti, linux-mm
In-Reply-To: <CALx6S37gzDE0-S4JjS3m+FyuA2xypyp+O+XYdeK3ciAnvmmkyQ@mail.gmail.com>
On 16-04-10 12:53 PM, Tom Herbert wrote:
> We started discussions about this in IOvisor. The Huawei project is
> called ceth (Common Ethernet). It is essentially a layer called
> directly from drivers intended for fast path forwarding and network
> virtualization. They have put quite a bit of effort into buffer
> management and other parts of the infrastructure, much of which we
> would like to leverage in XDP. The code is currently in github, will
> ask them to make it generally accessible.
>
Cant seem to find any info on it on the googles.
If it is forwarding then it should hopefully at least make use of Linux
control APIs I hope.
cheers,
jamal
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply
* Re: [PATCH net-next v2 1/2] rtnetlink: add new RTM_GETSTATS message to dump link stats
From: roopa @ 2016-04-10 18:28 UTC (permalink / raw)
To: Thomas Graf; +Cc: netdev, jhs, davem
In-Reply-To: <20160410081650.GB22873@pox.localdomain>
On 4/10/16, 1:16 AM, Thomas Graf wrote:
> On 04/08/16 at 11:38pm, Roopa Prabhu wrote:
>> From: Roopa Prabhu <roopa@cumulusnetworks.com>
>>
>> This patch adds a new RTM_GETSTATS message to query link stats via netlink
>> from the kernel. RTM_NEWLINK also dumps stats today, but RTM_NEWLINK
>> returns a lot more than just stats and is expensive in some cases when
>> frequent polling for stats from userspace is a common operation.
>>
>> RTM_GETSTATS is an attempt to provide a light weight netlink message
>> to explicity query only link stats from the kernel on an interface.
>> The idea is to also keep it extensible so that new kinds of stats can be
>> added to it in the future.
>>
>> This patch adds the following attribute for NETDEV stats:
>> struct nla_policy ifla_stats_policy[IFLA_STATS_MAX + 1] = {
>> [IFLA_STATS_LINK64] = { .len = sizeof(struct rtnl_link_stats64) },
>> };
>>
>> This patch also allows for af family stats (an example af stats for IPV6
>> is available with the second patch in the series).
>>
>> Like any other rtnetlink message, RTM_GETSTATS can be used to get stats of
>> a single interface or all interfaces with NLM_F_DUMP.
> Awesome stuff Roopa.
>
> This currently ties everything to a net_device with a selector to
> include certain bits of that net_device. How about we take it half a
> step further and allow for non net_device stats such as IP, TCP,
> routing or ipsec stats to be retrieved as well?
yes, absolutely. and that is also the goal.
> A simple array of nested attributes replacing IFLA_STATS_* would
> allow for that, e.g.
>
> 1. {.type = ST_IPSTATS, value = { ...} }
>
> 2. {.type = ST_LINK, .value = {
> {.type = ST_LINK_NAME, .value = "eth0"},
> {.type = ST_LINK_Q, .value = 10}
> }}
>
> 3. ...
One thing though, Its unclear to me if we absolutely need the additional nest.
Every stats netlink msg has an ifindex in the header (if_stats_msg) if the scope
of the stats is a netdev. If the msg does not have an ifindex in the if_stats_msg,
it represents a global stat. ie Cant a dump, include other stats netlink msgs after
all the netdev msgs are done when the filter has global stat filters ?.
same will apply to RTM_GETSTATS (without NLM_F_DUMP).
Since the msg may potentially have more nest levels
in the IFLA_EXT_STATS categories, just trying to see if i can avoid adding another
top-level nest. We can sure add it if there is no other way to include global
stats in the same dump.
> So for your initial patch, you'd simply add a new top level attribute
> which ties all the net_device specific statistics to a top level
> attribute and we can add the non net_device specific stats later on.
> We can't do that later on though without breaking ABI so that would
> have to go in with the first iteration.
agreed
>
> We can preserve all the existing attribute formats, we simply have
> to introduce new attribute types which don't overlap and document
> which statistic identifier maps to which existing attribute id.
sure,
thanks Thomas.
^ permalink raw reply
* [PATCH 4.5 155/238] ARC: [plat-axs10x] add Ethernet PHY description in .dts
From: Greg Kroah-Hartman @ 2016-04-10 18:35 UTC (permalink / raw)
To: linux-kernel
Cc: Greg Kroah-Hartman, stable, Alexey Brodkin, Rob Herring,
Phil Reid, David S. Miller, netdev, Sergei Shtylyov, Vineet Gupta
In-Reply-To: <20160410183456.398741366@linuxfoundation.org>
4.5-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexey Brodkin <Alexey.Brodkin@synopsys.com>
commit 667a490bdb6e27db0887d2ca515b907d6aa87118 upstream.
Commit e34d65696d2e ("stmmac: create of compatible mdio bus for stmmac
driver") broke DW GMAC functionality on ARC AXS10x boards:
That's what happens on eth0 up:
--------------------------->8------------------------
| libphy: PHY stmmac-0:ffffffff not found
| eth0: Could not attach to PHY
| stmmac_open: Cannot attach to PHY (error: -19)
--------------------------->8------------------------
Simplest solution is to add PHY description in board's .dts.
And so we do here.
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Phil Reid <preid@electromag.com.au>
Cc: David S. Miller <davem@davemloft.net>
Cc: linux-kernel@vger.kernel.org
Cc: netdev@vger.kernel.org
Reviewed-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
arch/arc/boot/dts/axs10x_mb.dtsi | 8 ++++++++
1 file changed, 8 insertions(+)
--- a/arch/arc/boot/dts/axs10x_mb.dtsi
+++ b/arch/arc/boot/dts/axs10x_mb.dtsi
@@ -47,6 +47,14 @@
clocks = <&apbclk>;
clock-names = "stmmaceth";
max-speed = <100>;
+ mdio0 {
+ #address-cells = <1>;
+ #size-cells = <0>;
+ compatible = "snps,dwmac-mdio";
+ phy1: ethernet-phy@1 {
+ reg = <1>;
+ };
+ };
};
ehci@0x40000 {
^ permalink raw reply
* Re: [RFC PATCH v2 5/5] Add sample for adding simple drop program to link
From: Brenden Blanco @ 2016-04-10 18:38 UTC (permalink / raw)
To: Jamal Hadi Salim
Cc: davem, netdev, tom, alexei.starovoitov, ogerlitz, daniel, brouer,
eric.dumazet, ecree, john.fastabend, tgraf, johannes,
eranlinuxmellanox, lorenzo
In-Reply-To: <57093B67.1080604@mojatatu.com>
On Sat, Apr 09, 2016 at 01:27:03PM -0400, Jamal Hadi Salim wrote:
> On 16-04-09 12:43 PM, Brenden Blanco wrote:
> >On Sat, Apr 09, 2016 at 10:48:05AM -0400, Jamal Hadi Salim wrote:
>
>
> >>Ok, sorry - should have looked this far before sending earlier email.
> >>So when you run concurently you see about 5Mpps per core but if you
> >>shoot all traffic at a single core you see 20Mpps?
> >No, only sender is multiple, receiver is still single core. The flow is
> >the same in all 4 of the send threads. Note that only ksoftirqd/6 is
> >active.
>
> Got it.
> The sender was limited to the 20Mpps and you are able to keep up
> if i understand correctly.
Perhaps, though I can't say 100%. The sender is able to do about 21/22
Mpps when pause frames are disabled. The sender is likely CPU limited as
it is an older Xeon.
>
>
> >>
> >>Devil's advocate question:
> >>If the bottleneck is the driver - is there an advantage in adding the
> >>bpf code at all in the driver?
> >Only by adding this hook into the driver has it become the bottleneck.
> >
> >Prior to this, the bottleneck was later in the codepath, primarily in
> >allocations.
> >
>
> Maybe useful in your commit log to show the prior and after.
I can add this, sure.
> Looking at both your and Daniel's profile you show in this email
> mlx4_en_process_rx_cq() seems to be where the action is on both, no?
I don't draw this conclusion. With the phys_dev drop,
mlx4_en_process_rx_cq is the majority time consumer. In the perf output
showing drop in tc, the functions such as dev_gro_receive,
kmem_cache_free, napi_gro_frags, inet_gro_receive, __build_skb, etc
combined add up to 60% of the time spent. None of these are called when
early drop occurs. Just because mlx4_en_process_rx_cq is at the top of
the list doesn't mean it is the lowest hanging fruit.
>
> >If a packet is to be dropped, and a determination can be made with fewer
> >cpu cycles spent, then there is more time for the goodput.
> >
>
> Agreed.
>
> >Beyond that, even if the skb allocation gets 10x or 100x or whatever
> >improvement, there is still a non-zero cost associated, and dropping bad
> >packets with minimal time spent has value. The same argument holds for
> >physical nic forwarding decisions.
> >
>
> I always go for the lowest hanging fruit.
Which to me is the 60% time spent above the driver level as shown above.
> It seemed it was the driver path in your case. When we removed
> the driver overhead (as demoed at the tc workshop in netdev11) we saw
> __netif_receive_skb_core() at the top of the profile.
> So in this case seems it was mlx4_en_process_rx_cq() - thats why i
> was saying the bottleneck is the driver.
I wouldn't call it a bottleneck when the time spent is additive,
aka run-to-completion.
> Having said that: I agree that early drop is useful if not for anything
> else to avoid the longer code path (but was worried after reading on
> thread this was going to get into a messy stack-in-the-driver and i am
> not sure it is avoidable either given a new ops interface is showing
> up).
>
> >>I am curious than before to see the comparison for the same bpf code
> >>running at tc level vs in the driver..
> >Here is a perf report for drop in the clsact qdisc with direct-action,
> >which Daniel earlier showed to have the best performance to-date. On my
> >machine, this gets about 6.5Mpps drop single core. Drop due to failed
> >IP lookup (not shown here) is worse @4.5Mpps.
> >
>
> Nice.
> However, still for this to be orange/orange comparison you have to
> run it on the _same receiver machine_ as opposed to Daniel doing
> it on his for the one case. And two different kernels booted up
> one patched with your changes and another virgin without them.
Of course the second perf report is on the same machine as the commit
message. That was generated fresh for this email thread. All of the
numbers I've quoted come from the same single-sender/single-receiver
setup. I did also revert the change the in mlx4 driver and there was no
change in the tc numbers.
>
> cheers,
> jamal
^ permalink raw reply
* Re: [Lsf] [Lsf-pc] [LSF/MM TOPIC] Generic page-pool recycle facility?
From: Sagi Grimberg @ 2016-04-10 18:45 UTC (permalink / raw)
To: Bart Van Assche, Christoph Hellwig, Jesper Dangaard Brouer
Cc: lsf@lists.linux-foundation.org, Tom Herbert, Brenden Blanco,
James Bottomley, linux-mm, netdev@vger.kernel.org,
lsf-pc@lists.linux-foundation.org, Alexei Starovoitov
In-Reply-To: <570678B7.7010802@sandisk.com>
>> This is also very interesting for storage targets, which face the same
>> issue. SCST has a mode where it caches some fully constructed SGLs,
>> which is probably very similar to what NICs want to do.
>
> I think a cached allocator for page sets + the scatterlists that
> describe these page sets would not only be useful for SCSI target
> implementations but also for the Linux SCSI initiator. Today the scsi-mq
> code reserves space in each scsi_cmnd for a scatterlist of
> SCSI_MAX_SG_SEGMENTS. If scatterlists would be cached together with page
> sets less memory would be needed per scsi_cmnd.
If we go down this road how about also attaching some driver opaques
to the page sets?
I know of some drivers that can make good use of those ;)
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply
* Re: [PATCH net-next v2 1/2] rtnetlink: add new RTM_GETSTATS message to dump link stats
From: roopa @ 2016-04-10 19:17 UTC (permalink / raw)
To: Jamal Hadi Salim; +Cc: netdev, davem
In-Reply-To: <570A59B4.30007@mojatatu.com>
On 4/10/16, 6:48 AM, Jamal Hadi Salim wrote:
> On 16-04-09 02:00 PM, roopa wrote:
>
>> This EXTENDED_HW_STATS is for ethtool like extended hw stats. This is keeping in
>> mind that we want to also move ethtool to netlink in the future and with switchdev
>> it becomes more necessary that we provide all stats closer to the other netdev stats.
>> So far hw extended stats have always been available through this separate ethtool
>> channel. The intent here is to unify the api for extended hw and software only stats.
>
> Ok, so these are _not_ the stats which are broken down by packet size
> ranges which are quiet common; but rather "proprietary" per port
> type h/w stats? I browsed a couple of users of ethtool_stats and i see
> they return proprietary looking 64 bit counters (batman for example
> has a very strange meaning to those stats).
> What i meant is a lot of ASICS have counters for byte ranges
> [64bytes-128bytes], [128bytes-256bytes], etc - sorry i cant pin a name
> to those but i am sure you have seen them and i thought those at minimal
> need their own TLV since they are always fixed.
yes, I think i have seen this. But, these if presented by the driver or hardware,
can be part of the extended hw stats (just like how ethtool would do).
So, there is the base netdev stats IFLA_STATS_LINK64 which is nothing but
rtnl_link_stats64 (which this patch adds). And this patch does not add but lists
examples for future additions IFLA_STATS_EXTENDED (for example: bridge
(can include igmp, stp, vlan stats here). bond can includelacp, and other stats here). All these will be TLV based.
and note that this patch does notrestrict or define the design for these yet.
>
>> XSTATS is per netdev can be included as a nested attribute inside IFLA_EXTENDED_STATS
>> which are per netdev. bridge vlan stats will also fall here.
>>
>
> And you are going to distinguish which come from hardware and which are
> software derived?
a) IFLA_EXTENDED_STATS - these are all software. ( for example: bridge (can include
igmp, stp, vlan stats here).
bond can include lacp, and other stats here).
b) IFLA_HW_EXTENDED_STATS - for hardware stats. for a switch port these will be
similar to ethtool
physical port stats today. For logical devices, this can include specific hw
stats that switch asics
provide and those which cannot be added to the software stats.
And since this patch does not define the format for these stats yet, we can
potentially design it when we see the first case of such stats.
>
>> And this api provides netdev specific stats. We have always mapped all
>> asic stats to the switch port netdev stats. and this api does not cover the non-netdev specific stats.
>> If you are for example asking for stats for a hardware offloaded bridge, then yes, they will fall here
>> and be available on the bridge netdev. For asic stats that don't map to any netdev, devlink will be an
>> appropriate infrastructure IMO.
>>
>> I am not sure if I answered your question :).
>>
>
> It is useful to have this discussion; unfortunately these user APIS once are in will never be allowed to change. The answers could come
> in time.
yep, agreed.
>
>>> Should such a command then not be rejected with an error code?
>> Dumps with no data are not rejected with an error code AFAIK. ie they don't return
>> -ENODATA. This is consistent with all other dumps (unless i missed it).
>> But, if there is a need for an error code, i can certainly check again.
>>
>
> It is mostly because you chose a whitelist filter i.e you list what
> is allowed to be sent back. And if such a list is missing there
> needs to be an opposite default (which is a deny all).
ok, by default no-filter = no-data is what we decided. And that
keeps the api simple. will think some more about if we should return
an err in case of no-filter.
>
>
[snip]
> True, but it must be 32 bit aligned. So something like:
> struct if_stats_msg {
> __u8 family;
> __u8 pad1;
> __u16 pad2;
> __u32 ifindex;
> __u32 filter_mask;
> }
>
ok ack
>
>> Yeah i remember :). But deferring it for a later incremental feature. That needs some more thought.
>
> NP ;->
>
>> Right now there is an urgent need to get the basic get stats api for a bunch of other stats: mpls, bridge vlan etc.
>> Because it is not clean to extend the current stats infra part of the link message for this. So trying to get this in first.
>>
>
> Agreed.
> The only thing i strongly feel about is the if_stats_msg - please fix
> that and lets get at least the basics in. We can resolve other things
> with further discussions.
will do. Thanks for the review.
^ permalink raw reply
* [PATCH] ravb: make ravb_ptp_interrupt() *void*
From: Sergei Shtylyov @ 2016-04-10 20:55 UTC (permalink / raw)
To: netdev; +Cc: linux-renesas-soc
In-Reply-To: <2504091.pf4fyK11eg@wasted.cogentembedded.com>
When we have the ISS.CGIS bit set, we already know that gPTP interrupt has
happened, so an extra GIS register check at the end of ravb_ptp_interrupt()
seems superfluous. We can model the gPTP interrupt handler like all other
dedicated interrupt handlers in the driver and make it *void*.
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
---
The patch is against the Dave Miller's 'net-next.git' repo.
drivers/net/ethernet/renesas/ravb.h | 2 +-
drivers/net/ethernet/renesas/ravb_main.c | 8 ++++++--
drivers/net/ethernet/renesas/ravb_ptp.c | 9 ++-------
3 files changed, 9 insertions(+), 10 deletions(-)
Index: net-next/drivers/net/ethernet/renesas/ravb.h
===================================================================
--- net-next.orig/drivers/net/ethernet/renesas/ravb.h
+++ net-next/drivers/net/ethernet/renesas/ravb.h
@@ -1045,7 +1045,7 @@ void ravb_modify(struct net_device *ndev
u32 set);
int ravb_wait(struct net_device *ndev, enum ravb_reg reg, u32 mask, u32 value);
-irqreturn_t ravb_ptp_interrupt(struct net_device *ndev);
+void ravb_ptp_interrupt(struct net_device *ndev);
void ravb_ptp_init(struct net_device *ndev, struct platform_device *pdev);
void ravb_ptp_stop(struct net_device *ndev);
Index: net-next/drivers/net/ethernet/renesas/ravb_main.c
===================================================================
--- net-next.orig/drivers/net/ethernet/renesas/ravb_main.c
+++ net-next/drivers/net/ethernet/renesas/ravb_main.c
@@ -807,8 +807,10 @@ static irqreturn_t ravb_interrupt(int ir
}
/* gPTP interrupt status summary */
- if ((iss & ISS_CGIS) && ravb_ptp_interrupt(ndev) == IRQ_HANDLED)
+ if (iss & ISS_CGIS) {
+ ravb_ptp_interrupt(ndev);
result = IRQ_HANDLED;
+ }
mmiowb();
spin_unlock(&priv->lock);
@@ -838,8 +840,10 @@ static irqreturn_t ravb_multi_interrupt(
}
/* gPTP interrupt status summary */
- if ((iss & ISS_CGIS) && ravb_ptp_interrupt(ndev) == IRQ_HANDLED)
+ if (iss & ISS_CGIS) {
+ ravb_ptp_interrupt(ndev);
result = IRQ_HANDLED;
+ }
mmiowb();
spin_unlock(&priv->lock);
Index: net-next/drivers/net/ethernet/renesas/ravb_ptp.c
===================================================================
--- net-next.orig/drivers/net/ethernet/renesas/ravb_ptp.c
+++ net-next/drivers/net/ethernet/renesas/ravb_ptp.c
@@ -296,7 +296,7 @@ static const struct ptp_clock_info ravb_
};
/* Caller must hold the lock */
-irqreturn_t ravb_ptp_interrupt(struct net_device *ndev)
+void ravb_ptp_interrupt(struct net_device *ndev)
{
struct ravb_private *priv = netdev_priv(ndev);
u32 gis = ravb_read(ndev, GIS);
@@ -319,12 +319,7 @@ irqreturn_t ravb_ptp_interrupt(struct ne
}
}
- if (gis) {
- ravb_write(ndev, ~gis, GIS);
- return IRQ_HANDLED;
- }
-
- return IRQ_NONE;
+ ravb_write(ndev, ~gis, GIS);
}
void ravb_ptp_init(struct net_device *ndev, struct platform_device *pdev)
^ permalink raw reply
* Re: optimizations to sk_buff handling in rds_tcp_data_ready
From: Eric Dumazet @ 2016-04-10 23:54 UTC (permalink / raw)
To: Sowmini Varadhan; +Cc: netdev
In-Reply-To: <20160410001853.GA30364@oracle.com>
On Sat, 2016-04-09 at 20:18 -0400, Sowmini Varadhan wrote:
> On (04/07/16 07:16), Eric Dumazet wrote:
> > Use skb split like TCP in output path ?
> > Really, pskb_expand_head() is not supposed to copy payload ;)
>
> Question- how come skb_split doesnt have to deal with frag_list
> and do a skb_walk_frags()? Couldn't the split-line be somewhere
> in the frag_list? Also even for the skb_split_inside_header,
> dont we have to set
> skb_shinfo(skb1)->frag_list = skb_shinfo(skb)->frag_list;
> and cut loose the skb_shinfo(skb)->frag_list?
>
> As I try to mimic skb_split in some new set of "skb_carve"
> funtions, I'm running into all the various frag_list cases. I'm
> afraid I might end up needing most of the stuff under the "Pure
> masohism" (sic) comment in __pskb_pull_tail().
>
The helper was written for TCP I guess. TCP in output path never uses
skb_shinfo(skb)->frag_list : It is guaranteed to be NULL for all skbs.
^ permalink raw reply
* Вы сэкономите средства на кофе и чай, Вы сэкономите средства на кофе и чай
From: Галина (кофейная компания) @ 2016-04-08 11:33 UTC (permalink / raw)
To: midiza, losb9111, n.oborina00, maestrowinever, piratk0, olgavm84,
pincer, phoenix-fire, pylatik, mileninagv, notthatkind,
maystrov2000, nariste-1993, m.n.makaroff, master-union, ovt69,
netdev, podruga.lp, oukca, olgak12071979
Добрый день.
Поставляем Вашему офису кофе-машину абсолютно бесплатно(действительно бесплатно)!
Какая выгода для нас - Ваши сотрудники будут покупать кофе в нашей кофе-машине.
В чем выгода у Вас:
- Кофе разбудит сонных работников;
- Вы сэкономите деньги на бесплатные чай и кофе, которые они потребляли за счет фирмы.
- Кофе-аппарат порадует еще и Ваших покупателей, посетивших офис.
Данный кофе-аппарат работает в в автономном режиме.
Преимущества нашего кофе-аппарата:
- Современный внешний вид;
- Высокая скорость кофе - от 15 секунд;
- Бесплатный сервис;
- Аппарат готовит: эспрессо, американо, капучино, латте, мокачино, горячий шоколад, молочный шоколад, ванильный капучино.
Если принципиальный интерес у Вас есть - я вышлю Вам фотографии наших аппаратов и их характеристики.
Напишите пожалуйста - сколько человек в Вашем офисе, чтобы я могла сориентироваться - какого объема аппараты Вам можно предложить.
С уважением, Галина.
Буду ждать ответ от Вас.
P.S. Предложение действительно для фирм со штатом от 50 человек находящихся в Москве.
^ permalink raw reply
* Re: [PATCH] ath9k: remove duplicate assignment of variable ah
From: Julian Calaby @ 2016-04-11 1:39 UTC (permalink / raw)
To: Colin King
Cc: QCA ath9k Development, Kalle Valo, linux-wireless, ath9k-devel,
netdev, linux-kernel@vger.kernel.org
In-Reply-To: <1460287531-17385-1-git-send-email-colin.king@canonical.com>
Hi All,
On Sun, Apr 10, 2016 at 9:25 PM, Colin King <colin.king@canonical.com> wrote:
> From: Colin Ian King <colin.king@canonical.com>
>
> ah is written twice with the same value, remove one of the
> redundant assignments to ah.
>
> Signed-off-by: Colin Ian King <colin.king@canonical.com>
Looks right to me.
Signed-off-by: Julian Calaby <julian.calaby@gmail.com>
Thanks,
> ---
> drivers/net/wireless/ath/ath9k/init.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/wireless/ath/ath9k/init.c b/drivers/net/wireless/ath/ath9k/init.c
> index 77ace8d..fb702c4 100644
> --- a/drivers/net/wireless/ath/ath9k/init.c
> +++ b/drivers/net/wireless/ath/ath9k/init.c
> @@ -477,7 +477,7 @@ static void ath9k_eeprom_request_cb(const struct firmware *eeprom_blob,
> static int ath9k_eeprom_request(struct ath_softc *sc, const char *name)
> {
> struct ath9k_eeprom_ctx ec;
> - struct ath_hw *ah = ah = sc->sc_ah;
> + struct ath_hw *ah = sc->sc_ah;
> int err;
>
> /* try to load the EEPROM content asynchronously */
> --
> 2.7.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Julian Calaby
Email: julian.calaby@gmail.com
Profile: http://www.google.com/profiles/julian.calaby/
^ permalink raw reply
* [net-next PATCH v2 0/5] GRO Fixed IPv4 ID support and GSO partial support
From: Alexander Duyck @ 2016-04-11 1:44 UTC (permalink / raw)
To: herbert, tom, jesse, alexander.duyck, edumazet, netdev, davem
This patch series sets up a few different things.
First it adds support for GRO of frames with a fixed IP ID value. This
will allow us to perform GRO for frames that go through things like an IPv6
to IPv4 header translation.
The second item we add is support for segmenting frames that are generated
this way. Most devices only support an incrementing IP ID value, and in
the case of TCP the IP ID can be ignored in many cases since the DF bit
should be set. So we can technically segment these frames using existing
TSO if we are willing to allow the IP ID to be mangled. As such I have
added a matching feature for the new form of GRO/GSO called TCP IPv4 ID
mangling. With this enabled we can assemble and disassemble a frame with
the sequence number fixed and the only ill effect will be that the IPv4 ID
will be altered which may or may not have any noticeable effect. As such I
have defaulted the feature to disabled.
The third item this patch series adds is support for partial GSO
segmentation. Partial GSO segmentation allows us to split a large frame
into two pieces. The first piece will have an even multiple of MSS worth
of data and the headers before the one pointed to by csum_start will have
been updated so that they are correct for if the data payload had already
been segmented. By doing this we can do things such as precompute the
outer header checksums for a frame to be segmented allowing us to perform
TSO on devices that don't support tunneling, or tunneling with outer header
checksums.
This patch set is based on the net-next tree, but I included "net: remove
netdevice gso_min_segs" in my tree as I assume it is likely to be applied
before this patch set will and I wanted to avoid a merge conflict.
v2: Fixed items reported by Jesse Gross
fixed missing GSO flag in MPLS check
adding DF check for MANGLEID
Moved extra GSO feature checks into gso_features_check
Rebased batches to account for "net: remove netdevice gso_min_segs"
Driver patches from the first patch set should still be compatible. However
I do have a few changes in them so I will submit a v2 of those to Jeff
Kirsher once these patches are accepted into net-next.
Example driver patches for i40e, ixgbe, and igb:
https://patchwork.ozlabs.org/patch/608221/
https://patchwork.ozlabs.org/patch/608224/
https://patchwork.ozlabs.org/patch/608225/
---
Alexander Duyck (5):
ethtool: Add support for toggling any of the GSO offloads
GSO: Add GSO type for fixed IPv4 ID
GRO: Add support for TCP with fixed IPv4 ID field, limit tunnel IP ID values
GSO: Support partial segmentation offload
Documentation: Add documentation for TSO and GSO features
Documentation/networking/segmentation-offloads.txt | 130 ++++++++++++++++++++
include/linux/netdev_features.h | 8 +
include/linux/netdevice.h | 8 +
include/linux/skbuff.h | 27 +++-
net/core/dev.c | 67 +++++++++-
net/core/ethtool.c | 4 +
net/core/skbuff.c | 29 ++++
net/ipv4/af_inet.c | 70 ++++++++---
net/ipv4/gre_offload.c | 27 +++-
net/ipv4/tcp_offload.c | 30 ++++-
net/ipv4/udp_offload.c | 27 +++-
net/ipv6/ip6_offload.c | 21 +++
net/mpls/mpls_gso.c | 1
13 files changed, 395 insertions(+), 54 deletions(-)
create mode 100644 Documentation/networking/segmentation-offloads.txt
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox