* [PATCH net 1/2] hns: Fix string set validation in ethtool string operations
From: Ben Hutchings @ 2018-03-13 23:39 UTC (permalink / raw)
To: Yisen Zhuang, Salil Mehta; +Cc: netdev
[-- Attachment #1: Type: text/plain, Size: 3201 bytes --]
The hns driver used to assume that it would only ever be asked
about ETH_SS_TEST or ETH_SS_STATS, and for any other stringset
it would claim a count of 0 but if asked for names would copy
over all the statistic names. This resulted in a potential
heap buffer overflow.
This was "fixed" some time ago by making it report the number of
statistics when asked about the ETH_SS_PRIV_FLAGS stringset. This
doesn't make any sense, and it left the bug still exploitable with
other stringset numbers.
As a minimal but complete fix, stop calling the driver-internal
string operations for any stringset other than ETH_SS_STATS.
Fixes: 511e6bc071db ("net: add Hisilicon Network Subsystem DSAF support")
Fixes: 412b65d15a7f ("net: hns: fix ethtool_get_strings overflow ...")
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
drivers/net/ethernet/hisilicon/hns/hns_ethtool.c | 17 +++++------------
1 file changed, 5 insertions(+), 12 deletions(-)
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_ethtool.c b/drivers/net/ethernet/hisilicon/hns/hns_ethtool.c
index 7ea7f8a4aa2a..b836742f7ab6 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_ethtool.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_ethtool.c
@@ -889,11 +889,6 @@ void hns_get_strings(struct net_device *netdev, u32 stringset, u8 *data)
struct hnae_handle *h = priv->ae_handle;
char *buff = (char *)data;
- if (!h->dev->ops->get_strings) {
- netdev_err(netdev, "h->dev->ops->get_strings is null!\n");
- return;
- }
-
if (stringset == ETH_SS_TEST) {
if (priv->ae_handle->phy_if != PHY_INTERFACE_MODE_XGMII) {
memcpy(buff, hns_nic_test_strs[MAC_INTERNALLOOP_MAC],
@@ -907,7 +902,7 @@ void hns_get_strings(struct net_device *netdev, u32 stringset, u8 *data)
memcpy(buff, hns_nic_test_strs[MAC_INTERNALLOOP_PHY],
ETH_GSTRING_LEN);
- } else {
+ } else if (stringset == ETH_SS_STATS && h->dev->ops->get_strings) {
snprintf(buff, ETH_GSTRING_LEN, "rx_packets");
buff = buff + ETH_GSTRING_LEN;
snprintf(buff, ETH_GSTRING_LEN, "tx_packets");
@@ -969,7 +964,7 @@ void hns_get_strings(struct net_device *netdev, u32 stringset, u8 *data)
/**
* nic_get_sset_count - get string set count witch returned by nic_get_strings.
* @dev: net device
- * @stringset: string set index, 0: self test string; 1: statistics string.
+ * @stringset: string set index
*
* Return string set count.
*/
@@ -979,10 +974,6 @@ int hns_get_sset_count(struct net_device *netdev, int stringset)
struct hnae_handle *h = priv->ae_handle;
struct hnae_ae_ops *ops = h->dev->ops;
- if (!ops->get_sset_count) {
- netdev_err(netdev, "get_sset_count is null!\n");
- return -EOPNOTSUPP;
- }
if (stringset == ETH_SS_TEST) {
u32 cnt = (sizeof(hns_nic_test_strs) / ETH_GSTRING_LEN);
@@ -993,8 +984,10 @@ int hns_get_sset_count(struct net_device *netdev, int stringset)
cnt--;
return cnt;
+ } else if (stringset == ETH_SS_STATS && ops->get_sset_count) {
+ return HNS_NET_STATS_CNT + ops->get_sset_count(h, stringset);
} else {
- return (HNS_NET_STATS_CNT + ops->get_sset_count(h, stringset));
+ return -EOPNOTSUPP;
}
}
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 811 bytes --]
^ permalink raw reply related
* [PATCH 2/2] hns: Clean up string operations
From: Ben Hutchings @ 2018-03-13 23:41 UTC (permalink / raw)
To: Yisen Zhuang, Salil Mehta; +Cc: netdev
In-Reply-To: <20180313233948.GG8564@decadent.org.uk>
[-- Attachment #1: Type: text/plain, Size: 20972 bytes --]
The driver-internal string operations are only ever used for
statistics, so remove the stringset parameters and rename them
accordingly.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
drivers/net/ethernet/hisilicon/hns/hnae.h | 13 ++++----
drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c | 37 +++++++++++-----------
drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c | 16 +++-------
drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c | 9 +++---
drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.h | 10 +++---
drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c | 28 ++++++----------
drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.h | 6 ++--
drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c | 11 +++----
drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.h | 4 +--
drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c | 20 ++++--------
drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.h | 4 +--
.../net/ethernet/hisilicon/hns/hns_dsaf_xgmac.c | 24 +++++---------
drivers/net/ethernet/hisilicon/hns/hns_ethtool.c | 9 +++---
13 files changed, 78 insertions(+), 113 deletions(-)
diff --git a/drivers/net/ethernet/hisilicon/hns/hnae.h b/drivers/net/ethernet/hisilicon/hns/hnae.h
index 3e62692af011..480fa2b49be7 100644
--- a/drivers/net/ethernet/hisilicon/hns/hnae.h
+++ b/drivers/net/ethernet/hisilicon/hns/hnae.h
@@ -457,10 +457,10 @@ enum hnae_media_type {
* update Old network device statistics
* get_ethtool_stats()
* get ethtool network device statistics
- * get_strings()
- * get a set of strings that describe the requested objects
- * get_sset_count()
- * get number of strings that @get_strings will write
+ * get_stats_strings()
+ * get a set of strings that describe the statistics
+ * get_stats_count()
+ * get number of strings that @get_stats_strings will write
* update_led_status()
* update the led status
* set_led_id()
@@ -522,9 +522,8 @@ struct hnae_ae_ops {
void (*update_stats)(struct hnae_handle *handle,
struct net_device_stats *net_stats);
void (*get_stats)(struct hnae_handle *handle, u64 *data);
- void (*get_strings)(struct hnae_handle *handle,
- u32 stringset, u8 *data);
- int (*get_sset_count)(struct hnae_handle *handle, int stringset);
+ void (*get_stats_strings)(struct hnae_handle *handle, u8 *data);
+ int (*get_stats_count)(struct hnae_handle *handle);
void (*update_led_status)(struct hnae_handle *handle);
int (*set_led_id)(struct hnae_handle *handle,
enum hnae_led_state status);
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c b/drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c
index bd68379d2bea..638476aa6311 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c
@@ -679,21 +679,20 @@ void hns_ae_get_stats(struct hnae_handle *handle, u64 *data)
for (idx = 0; idx < handle->q_num; idx++) {
hns_rcb_get_stats(handle->qs[idx], p);
- p += hns_rcb_get_ring_sset_count((int)ETH_SS_STATS);
+ p += hns_rcb_get_ring_stats_count();
}
hns_ppe_get_stats(ppe_cb, p);
- p += hns_ppe_get_sset_count((int)ETH_SS_STATS);
+ p += hns_ppe_get_stats_count();
hns_mac_get_stats(mac_cb, p);
- p += hns_mac_get_sset_count(mac_cb, (int)ETH_SS_STATS);
+ p += hns_mac_get_stats_count(mac_cb);
if (mac_cb->mac_type == HNAE_PORT_SERVICE)
hns_dsaf_get_stats(vf_cb->dsaf_dev, p, vf_cb->port_index);
}
-void hns_ae_get_strings(struct hnae_handle *handle,
- u32 stringset, u8 *data)
+void hns_ae_get_stats_strings(struct hnae_handle *handle, u8 *data)
{
int port;
int idx;
@@ -711,21 +710,21 @@ void hns_ae_get_strings(struct hnae_handle *handle,
ppe_cb = hns_get_ppe_cb(handle);
for (idx = 0; idx < handle->q_num; idx++) {
- hns_rcb_get_strings(stringset, p, idx);
- p += ETH_GSTRING_LEN * hns_rcb_get_ring_sset_count(stringset);
+ hns_rcb_get_stats_strings(p, idx);
+ p += ETH_GSTRING_LEN * hns_rcb_get_ring_stats_count();
}
- hns_ppe_get_strings(ppe_cb, stringset, p);
- p += ETH_GSTRING_LEN * hns_ppe_get_sset_count(stringset);
+ hns_ppe_get_stats_strings(ppe_cb, p);
+ p += ETH_GSTRING_LEN * hns_ppe_get_stats_count();
- hns_mac_get_strings(mac_cb, stringset, p);
- p += ETH_GSTRING_LEN * hns_mac_get_sset_count(mac_cb, stringset);
+ hns_mac_get_stats_strings(mac_cb, p);
+ p += ETH_GSTRING_LEN * hns_mac_get_stats_count(mac_cb);
if (mac_cb->mac_type == HNAE_PORT_SERVICE)
- hns_dsaf_get_strings(stringset, p, port, dsaf_dev);
+ hns_dsaf_get_stats_strings(p, port, dsaf_dev);
}
-int hns_ae_get_sset_count(struct hnae_handle *handle, int stringset)
+int hns_ae_get_stats_count(struct hnae_handle *handle)
{
u32 sset_count = 0;
struct hns_mac_cb *mac_cb;
@@ -735,12 +734,12 @@ int hns_ae_get_sset_count(struct hnae_handle *handle, int stringset)
mac_cb = hns_get_mac_cb(handle);
- sset_count += hns_rcb_get_ring_sset_count(stringset) * handle->q_num;
- sset_count += hns_ppe_get_sset_count(stringset);
- sset_count += hns_mac_get_sset_count(mac_cb, stringset);
+ sset_count += hns_rcb_get_ring_stats_count() * handle->q_num;
+ sset_count += hns_ppe_get_stats_count();
+ sset_count += hns_mac_get_stats_count(mac_cb);
if (mac_cb->mac_type == HNAE_PORT_SERVICE)
- sset_count += hns_dsaf_get_sset_count(dsaf_dev, stringset);
+ sset_count += hns_dsaf_get_stats_count(dsaf_dev);
return sset_count;
}
@@ -923,8 +922,8 @@ static struct hnae_ae_ops hns_dsaf_ops = {
.update_stats = hns_ae_update_stats,
.set_tso_stats = hns_ae_set_tso_stats,
.get_stats = hns_ae_get_stats,
- .get_strings = hns_ae_get_strings,
- .get_sset_count = hns_ae_get_sset_count,
+ .get_stats_strings = hns_ae_get_stats_strings,
+ .get_stats_count = hns_ae_get_stats_count,
.update_led_status = hns_ae_update_led_status,
.set_led_id = hns_ae_cpld_set_led_id,
.get_regs = hns_ae_get_regs,
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c
index 86944bc3b273..7525b03fd3d8 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_gmac.c
@@ -649,14 +649,11 @@ static void hns_gmac_get_stats(void *mac_drv, u64 *data)
}
}
-static void hns_gmac_get_strings(u32 stringset, u8 *data)
+static void hns_gmac_get_stats_strings(u8 *data)
{
char *buff = (char *)data;
u32 i;
- if (stringset != ETH_SS_STATS)
- return;
-
for (i = 0; i < ARRAY_SIZE(g_gmac_stats_string); i++) {
snprintf(buff, ETH_GSTRING_LEN, "%s",
g_gmac_stats_string[i].desc);
@@ -664,12 +661,9 @@ static void hns_gmac_get_strings(u32 stringset, u8 *data)
}
}
-static int hns_gmac_get_sset_count(int stringset)
+static int hns_gmac_get_stats_count(void)
{
- if (stringset == ETH_SS_STATS || stringset == ETH_SS_PRIV_FLAGS)
- return ARRAY_SIZE(g_gmac_stats_string);
-
- return 0;
+ return ARRAY_SIZE(g_gmac_stats_string);
}
static int hns_gmac_get_regs_count(void)
@@ -713,8 +707,8 @@ void *hns_gmac_config(struct hns_mac_cb *mac_cb, struct mac_params *mac_param)
mac_drv->get_regs = hns_gmac_get_regs;
mac_drv->get_regs_count = hns_gmac_get_regs_count;
mac_drv->get_ethtool_stats = hns_gmac_get_stats;
- mac_drv->get_sset_count = hns_gmac_get_sset_count;
- mac_drv->get_strings = hns_gmac_get_strings;
+ mac_drv->get_stats_count = hns_gmac_get_stats_count;
+ mac_drv->get_stats_strings = hns_gmac_get_stats_strings;
mac_drv->update_stats = hns_gmac_update_stats;
mac_drv->set_promiscuous = hns_gmac_set_promisc;
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c
index cac86e9ae0dd..094476b27c00 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.c
@@ -1109,19 +1109,18 @@ void hns_mac_get_stats(struct hns_mac_cb *mac_cb, u64 *data)
mac_ctrl_drv->get_ethtool_stats(mac_ctrl_drv, data);
}
-void hns_mac_get_strings(struct hns_mac_cb *mac_cb,
- int stringset, u8 *data)
+void hns_mac_get_stats_strings(struct hns_mac_cb *mac_cb, u8 *data)
{
struct mac_driver *mac_ctrl_drv = hns_mac_get_drv(mac_cb);
- mac_ctrl_drv->get_strings(stringset, data);
+ mac_ctrl_drv->get_stats_strings(data);
}
-int hns_mac_get_sset_count(struct hns_mac_cb *mac_cb, int stringset)
+int hns_mac_get_stats_count(struct hns_mac_cb *mac_cb)
{
struct mac_driver *mac_ctrl_drv = hns_mac_get_drv(mac_cb);
- return mac_ctrl_drv->get_sset_count(stringset);
+ return mac_ctrl_drv->get_stats_count();
}
void hns_mac_set_promisc(struct hns_mac_cb *mac_cb, u8 en)
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.h b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.h
index bbc0a98e7ca3..495462d38cb2 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.h
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_mac.h
@@ -383,9 +383,9 @@ struct mac_driver {
void (*get_regs)(void *mac_drv, void *data);
int (*get_regs_count)(void);
/* get strings name for ethtool statistic */
- void (*get_strings)(u32 stringset, u8 *data);
- /* get the number of strings*/
- int (*get_sset_count)(int stringset);
+ void (*get_stats_strings)(u8 *data);
+ /* get the number of statistics */
+ int (*get_stats_count)(void);
/* get the statistic by ethtools*/
void (*get_ethtool_stats)(void *mac_drv, u64 *data);
@@ -448,8 +448,8 @@ int hns_mac_config_mac_loopback(struct hns_mac_cb *mac_cb,
enum hnae_loop loop, int en);
void hns_mac_update_stats(struct hns_mac_cb *mac_cb);
void hns_mac_get_stats(struct hns_mac_cb *mac_cb, u64 *data);
-void hns_mac_get_strings(struct hns_mac_cb *mac_cb, int stringset, u8 *data);
-int hns_mac_get_sset_count(struct hns_mac_cb *mac_cb, int stringset);
+void hns_mac_get_stats_strings(struct hns_mac_cb *mac_cb, u8 *data);
+int hns_mac_get_stats_count(struct hns_mac_cb *mac_cb);
void hns_mac_get_regs(struct hns_mac_cb *mac_cb, void *data);
int hns_mac_get_regs_count(struct hns_mac_cb *mac_cb);
void hns_set_led_opt(struct hns_mac_cb *mac_cb);
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c
index e0bc79ea3d88..0ba28a29b738 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c
@@ -2617,38 +2617,30 @@ void hns_dsaf_get_stats(struct dsaf_device *ddev, u64 *data, int port)
}
/**
- *hns_dsaf_get_sset_count - get dsaf string set count
- *@stringset: type of values in data
- *return dsaf string name count
+ *hns_dsaf_get_stats_count - get dsaf statistic count
+ *return dsaf statistic count
*/
-int hns_dsaf_get_sset_count(struct dsaf_device *dsaf_dev, int stringset)
+int hns_dsaf_get_stats_count(struct dsaf_device *dsaf_dev)
{
bool is_ver1 = AE_IS_VER1(dsaf_dev->dsaf_ver);
- if (stringset == ETH_SS_STATS) {
- if (is_ver1)
- return DSAF_STATIC_NUM;
- else
- return DSAF_V2_STATIC_NUM;
- }
- return 0;
+ if (is_ver1)
+ return DSAF_STATIC_NUM;
+ else
+ return DSAF_V2_STATIC_NUM;
}
/**
- *hns_dsaf_get_strings - get dsaf string set
- *@stringset:srting set index
+ *hns_dsaf_get_stats_strings - get dsaf statistic strings
*@data:strings name value
*@port:port index
*/
-void hns_dsaf_get_strings(int stringset, u8 *data, int port,
- struct dsaf_device *dsaf_dev)
+void hns_dsaf_get_stats_strings(u8 *data, int port,
+ struct dsaf_device *dsaf_dev)
{
char *buff = (char *)data;
int node = port;
- if (stringset != ETH_SS_STATS)
- return;
-
/* for ge/xge node info */
buff = hns_dsaf_get_node_stats_strings(buff, node, dsaf_dev);
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.h b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.h
index 4507e8222683..275dec595dc7 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.h
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.h
@@ -442,10 +442,10 @@ void hns_dsaf_ae_uninit(struct dsaf_device *dsaf_dev);
void hns_dsaf_update_stats(struct dsaf_device *dsaf_dev, u32 inode_num);
-int hns_dsaf_get_sset_count(struct dsaf_device *dsaf_dev, int stringset);
+int hns_dsaf_get_stats_count(struct dsaf_device *dsaf_dev);
void hns_dsaf_get_stats(struct dsaf_device *ddev, u64 *data, int port);
-void hns_dsaf_get_strings(int stringset, u8 *data, int port,
- struct dsaf_device *dsaf_dev);
+void hns_dsaf_get_stats_strings(u8 *data, int port,
+ struct dsaf_device *dsaf_dev);
void hns_dsaf_get_regs(struct dsaf_device *ddev, u32 port, void *data);
int hns_dsaf_get_regs_count(void);
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c
index b62816c1574e..a08471441285 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.c
@@ -420,11 +420,9 @@ void hns_ppe_update_stats(struct hns_ppe_cb *ppe_cb)
+= dsaf_read_dev(ppe_cb, PPE_HIS_TX_PKT_CS_FAIL_CNT_REG);
}
-int hns_ppe_get_sset_count(int stringset)
+int hns_ppe_get_stats_count(void)
{
- if (stringset == ETH_SS_STATS || stringset == ETH_SS_PRIV_FLAGS)
- return ETH_PPE_STATIC_NUM;
- return 0;
+ return ETH_PPE_STATIC_NUM;
}
int hns_ppe_get_regs_count(void)
@@ -433,12 +431,11 @@ int hns_ppe_get_regs_count(void)
}
/**
- * ppe_get_strings - get ppe srting
+ * ppe_get_strings - get ppe statistic strings
* @ppe_device: ppe device
- * @stringset: string set type
* @data: output string
*/
-void hns_ppe_get_strings(struct hns_ppe_cb *ppe_cb, int stringset, u8 *data)
+void hns_ppe_get_stats_strings(struct hns_ppe_cb *ppe_cb, u8 *data)
{
char *buff = (char *)data;
int index = ppe_cb->index;
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.h b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.h
index 9d8e643e8aa6..1baf04caf4bf 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.h
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_ppe.h
@@ -108,11 +108,11 @@ void hns_ppe_reset_common(struct dsaf_device *dsaf_dev, u8 ppe_common_index);
void hns_ppe_update_stats(struct hns_ppe_cb *ppe_cb);
-int hns_ppe_get_sset_count(int stringset);
+int hns_ppe_get_stats_count(void);
int hns_ppe_get_regs_count(void);
void hns_ppe_get_regs(struct hns_ppe_cb *ppe_cb, void *data);
-void hns_ppe_get_strings(struct hns_ppe_cb *ppe_cb, int stringset, u8 *data);
+void hns_ppe_get_stats_strings(struct hns_ppe_cb *ppe_cb, u8 *data);
void hns_ppe_get_stats(struct hns_ppe_cb *ppe_cb, u64 *data);
void hns_ppe_set_tso_enable(struct hns_ppe_cb *ppe_cb, u32 value);
void hns_ppe_set_rss_key(struct hns_ppe_cb *ppe_cb,
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c
index 6f3570cfb501..7d4ecd8c8d6e 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.c
@@ -870,16 +870,12 @@ void hns_rcb_get_stats(struct hnae_queue *queue, u64 *data)
}
/**
- *hns_rcb_get_ring_sset_count - rcb string set count
- *@stringset:ethtool cmd
- *return rcb ring string set count
+ *hns_rcb_get_ring_stats_count - rcb statistic count
+ *return rcb ring statistic count
*/
-int hns_rcb_get_ring_sset_count(int stringset)
+int hns_rcb_get_ring_stats_count(void)
{
- if (stringset == ETH_SS_STATS || stringset == ETH_SS_PRIV_FLAGS)
- return HNS_RING_STATIC_REG_NUM;
-
- return 0;
+ return HNS_RING_STATIC_REG_NUM;
}
/**
@@ -901,18 +897,14 @@ int hns_rcb_get_ring_regs_count(void)
}
/**
- *hns_rcb_get_strings - get rcb string set
- *@stringset:string set index
+ *hns_rcb_get_strings - get rcb statistic strings
*@data:strings name value
*@index:queue index
*/
-void hns_rcb_get_strings(int stringset, u8 *data, int index)
+void hns_rcb_get_stats_strings(u8 *data, int index)
{
char *buff = (char *)data;
- if (stringset != ETH_SS_STATS)
- return;
-
snprintf(buff, ETH_GSTRING_LEN, "tx_ring%d_rcb_pkt_num", index);
buff = buff + ETH_GSTRING_LEN;
snprintf(buff, ETH_GSTRING_LEN, "tx_ring%d_ppe_tx_pkt_num", index);
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.h b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.h
index 602816498c8d..95cd0111e802 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.h
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_rcb.h
@@ -154,13 +154,13 @@ void hns_rcb_get_stats(struct hnae_queue *queue, u64 *data);
void hns_rcb_get_common_regs(struct rcb_common_cb *rcb_common, void *data);
-int hns_rcb_get_ring_sset_count(int stringset);
+int hns_rcb_get_ring_stats_count(void);
int hns_rcb_get_common_regs_count(void);
int hns_rcb_get_ring_regs_count(void);
void hns_rcb_get_ring_regs(struct hnae_queue *queue, void *data);
-void hns_rcb_get_strings(int stringset, u8 *data, int index);
+void hns_rcb_get_stats_strings(u8 *data, int index);
void hns_rcb_set_rx_ring_bs(struct hnae_queue *q, u32 buf_size);
void hns_rcb_set_tx_ring_bs(struct hnae_queue *q, u32 buf_size);
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_xgmac.c b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_xgmac.c
index 51e7e9f5af49..0e8fd285bfd7 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_dsaf_xgmac.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_dsaf_xgmac.c
@@ -756,18 +756,14 @@ static void hns_xgmac_get_stats(void *mac_drv, u64 *data)
}
/**
- *hns_xgmac_get_strings - get xgmac strings name
- *@stringset: type of values in data
+ *hns_xgmac_get_stats_strings - get xgmac statistic strings
*@data:data for value of string name
*/
-static void hns_xgmac_get_strings(u32 stringset, u8 *data)
+static void hns_xgmac_get_stats_strings(u8 *data)
{
char *buff = (char *)data;
u32 i;
- if (stringset != ETH_SS_STATS)
- return;
-
for (i = 0; i < ARRAY_SIZE(g_xgmac_stats_string); i++) {
snprintf(buff, ETH_GSTRING_LEN, g_xgmac_stats_string[i].desc);
buff = buff + ETH_GSTRING_LEN;
@@ -775,16 +771,12 @@ static void hns_xgmac_get_strings(u32 stringset, u8 *data)
}
/**
- *hns_xgmac_get_sset_count - get xgmac string set count
- *@stringset: type of values in data
- *return xgmac string set count
+ *hns_xgmac_get_stats_count - get xgmac statistic count
+ *return xgmac statistic count
*/
-static int hns_xgmac_get_sset_count(int stringset)
+static int hns_xgmac_get_stats_count(void)
{
- if (stringset == ETH_SS_STATS || stringset == ETH_SS_PRIV_FLAGS)
- return ARRAY_SIZE(g_xgmac_stats_string);
-
- return 0;
+ return ARRAY_SIZE(g_xgmac_stats_string);
}
/**
@@ -832,9 +824,9 @@ void *hns_xgmac_config(struct hns_mac_cb *mac_cb, struct mac_params *mac_param)
mac_drv->get_link_status = hns_xgmac_get_link_status;
mac_drv->get_regs = hns_xgmac_get_regs;
mac_drv->get_ethtool_stats = hns_xgmac_get_stats;
- mac_drv->get_sset_count = hns_xgmac_get_sset_count;
+ mac_drv->get_stats_count = hns_xgmac_get_stats_count;
mac_drv->get_regs_count = hns_xgmac_get_regs_count;
- mac_drv->get_strings = hns_xgmac_get_strings;
+ mac_drv->get_stats_strings = hns_xgmac_get_stats_strings;
mac_drv->update_stats = hns_xgmac_update_stats;
return (void *)mac_drv;
diff --git a/drivers/net/ethernet/hisilicon/hns/hns_ethtool.c b/drivers/net/ethernet/hisilicon/hns/hns_ethtool.c
index b836742f7ab6..71075f69122c 100644
--- a/drivers/net/ethernet/hisilicon/hns/hns_ethtool.c
+++ b/drivers/net/ethernet/hisilicon/hns/hns_ethtool.c
@@ -902,7 +902,8 @@ void hns_get_strings(struct net_device *netdev, u32 stringset, u8 *data)
memcpy(buff, hns_nic_test_strs[MAC_INTERNALLOOP_PHY],
ETH_GSTRING_LEN);
- } else if (stringset == ETH_SS_STATS && h->dev->ops->get_strings) {
+ } else if (stringset == ETH_SS_STATS &&
+ h->dev->ops->get_stats_strings) {
snprintf(buff, ETH_GSTRING_LEN, "rx_packets");
buff = buff + ETH_GSTRING_LEN;
snprintf(buff, ETH_GSTRING_LEN, "tx_packets");
@@ -957,7 +958,7 @@ void hns_get_strings(struct net_device *netdev, u32 stringset, u8 *data)
snprintf(buff, ETH_GSTRING_LEN, "netdev_tx_timeout");
buff = buff + ETH_GSTRING_LEN;
- h->dev->ops->get_strings(h, stringset, (u8 *)buff);
+ h->dev->ops->get_stats_strings(h, (u8 *)buff);
}
}
@@ -984,8 +985,8 @@ int hns_get_sset_count(struct net_device *netdev, int stringset)
cnt--;
return cnt;
- } else if (stringset == ETH_SS_STATS && ops->get_sset_count) {
- return HNS_NET_STATS_CNT + ops->get_sset_count(h, stringset);
+ } else if (stringset == ETH_SS_STATS && ops->get_stats_count) {
+ return HNS_NET_STATS_CNT + ops->get_stats_count(h);
} else {
return -EOPNOTSUPP;
}
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 811 bytes --]
^ permalink raw reply related
* Re: [PATCH] bpf: whitelist syscalls for error injection
From: Omar Sandoval @ 2018-03-13 23:45 UTC (permalink / raw)
To: Howard McLauchlan
Cc: linux-kernel, linux-api, Al Viro, Thomas Gleixner, Yonghong Song,
David S . Miller, Thomas Garnier, kernel-team, Steven Rostedt,
Ingo Molnar, Josef Bacik, Alexei Starovoitov, netdev
In-Reply-To: <20180313231627.1247-1-hmclauchlan@fb.com>
On Tue, Mar 13, 2018 at 04:16:27PM -0700, Howard McLauchlan wrote:
> Error injection is a useful mechanism to fail arbitrary kernel
> functions. However, it is often hard to guarantee an error propagates
> appropriately to user space programs. By injecting into syscalls, we can
> return arbitrary values to user space directly; this increases
> flexibility and robustness in testing, allowing us to test user space
> error paths effectively.
>
> The following script, for example, fails calls to sys_open() from a
> given pid:
>
> from bcc import BPF
> from sys import argv
>
> pid = argv[1]
>
> prog = r"""
>
> int kprobe__SyS_open(struct pt_regs *ctx, const char *pathname, int flags)
> {
> u32 pid = bpf_get_current_pid_tgid();
> if (pid == %s)
> bpf_override_return(ctx, -ENOENT);
> return 0;
> }
> """ % pid
>
> b = BPF(text = prog)
> while 1:
> b.perf_buffer_poll()
>
> This patch whitelists all syscalls defined with SYSCALL_DEFINE for error
> injection.
>
> Signed-off-by: Howard McLauchlan <hmclauchlan@fb.com>
> ---
> based on 4.16-rc5
> include/linux/syscalls.h | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
> index a78186d826d7..e8c6d63ace78 100644
> --- a/include/linux/syscalls.h
> +++ b/include/linux/syscalls.h
> @@ -191,6 +191,8 @@ static inline int is_syscall_trace_event(struct trace_event_call *tp_event)
>
> #define SYSCALL_DEFINE0(sname) \
> SYSCALL_METADATA(_##sname, 0); \
> + asmlinkage long sys_##sname(void); \
> + ALLOW_ERROR_INJECTION(sys_##sname, ERRNO); \
> asmlinkage long sys_##sname(void)
>
> #define SYSCALL_DEFINE1(name, ...) SYSCALL_DEFINEx(1, _##name, __VA_ARGS__)
> @@ -210,6 +212,7 @@ static inline int is_syscall_trace_event(struct trace_event_call *tp_event)
> #define __SYSCALL_DEFINEx(x, name, ...) \
> asmlinkage long sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)) \
> __attribute__((alias(__stringify(SyS##name)))); \
> + ALLOW_ERROR_INJECTION(sys##name, ERRNO); \
> static inline long SYSC##name(__MAP(x,__SC_DECL,__VA_ARGS__)); \
> asmlinkage long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__)); \
> asmlinkage long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__)) \
> --
> 2.14.1
>
Adding a few more people to Cc
^ permalink raw reply
* Re: [PATCH] bpf: whitelist syscalls for error injection
From: Yonghong Song @ 2018-03-13 23:49 UTC (permalink / raw)
To: Omar Sandoval, Howard McLauchlan
Cc: linux-kernel, linux-api, Al Viro, Thomas Gleixner,
David S . Miller, Thomas Garnier, kernel-team, Steven Rostedt,
Ingo Molnar, Josef Bacik, Alexei Starovoitov, netdev
In-Reply-To: <20180313234509.GA4981@vader>
On 3/13/18 4:45 PM, Omar Sandoval wrote:
> On Tue, Mar 13, 2018 at 04:16:27PM -0700, Howard McLauchlan wrote:
>> Error injection is a useful mechanism to fail arbitrary kernel
>> functions. However, it is often hard to guarantee an error propagates
>> appropriately to user space programs. By injecting into syscalls, we can
>> return arbitrary values to user space directly; this increases
>> flexibility and robustness in testing, allowing us to test user space
>> error paths effectively.
>>
>> The following script, for example, fails calls to sys_open() from a
>> given pid:
>>
>> from bcc import BPF
>> from sys import argv
>>
>> pid = argv[1]
>>
>> prog = r"""
>>
>> int kprobe__SyS_open(struct pt_regs *ctx, const char *pathname, int flags)
>> {
>> u32 pid = bpf_get_current_pid_tgid();
>> if (pid == %s)
>> bpf_override_return(ctx, -ENOENT);
>> return 0;
>> }
>> """ % pid
>>
>> b = BPF(text = prog)
>> while 1:
>> b.perf_buffer_poll()
>>
>> This patch whitelists all syscalls defined with SYSCALL_DEFINE for error
>> injection.
>>
>> Signed-off-by: Howard McLauchlan <hmclauchlan@fb.com>
>> ---
>> based on 4.16-rc5
>> include/linux/syscalls.h | 3 +++
>> 1 file changed, 3 insertions(+)
>>
>> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
>> index a78186d826d7..e8c6d63ace78 100644
>> --- a/include/linux/syscalls.h
>> +++ b/include/linux/syscalls.h
>> @@ -191,6 +191,8 @@ static inline int is_syscall_trace_event(struct trace_event_call *tp_event)
>>
>> #define SYSCALL_DEFINE0(sname) \
>> SYSCALL_METADATA(_##sname, 0); \
>> + asmlinkage long sys_##sname(void); \
>> + ALLOW_ERROR_INJECTION(sys_##sname, ERRNO); \
>> asmlinkage long sys_##sname(void)
duplication of asmlinkage in the above?
>>
>> #define SYSCALL_DEFINE1(name, ...) SYSCALL_DEFINEx(1, _##name, __VA_ARGS__)
>> @@ -210,6 +212,7 @@ static inline int is_syscall_trace_event(struct trace_event_call *tp_event)
>> #define __SYSCALL_DEFINEx(x, name, ...) \
>> asmlinkage long sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)) \
>> __attribute__((alias(__stringify(SyS##name)))); \
>> + ALLOW_ERROR_INJECTION(sys##name, ERRNO); \
>> static inline long SYSC##name(__MAP(x,__SC_DECL,__VA_ARGS__)); \
>> asmlinkage long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__)); \
>> asmlinkage long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__)) \
>> --
>> 2.14.1
>>
>
> Adding a few more people to Cc
>
^ permalink raw reply
* Re: [PATCH] bpf: whitelist syscalls for error injection
From: Howard McLauchlan @ 2018-03-14 0:00 UTC (permalink / raw)
To: Yonghong Song, Omar Sandoval
Cc: linux-kernel, linux-api, Al Viro, Thomas Gleixner,
David S . Miller, Thomas Garnier, kernel-team, Steven Rostedt,
Ingo Molnar, Josef Bacik, Alexei Starovoitov, netdev
In-Reply-To: <67172932-df1d-5585-601d-a7ce70a2c86b@fb.com>
On 03/13/2018 04:49 PM, Yonghong Song wrote:
>
>
> On 3/13/18 4:45 PM, Omar Sandoval wrote:
>> On Tue, Mar 13, 2018 at 04:16:27PM -0700, Howard McLauchlan wrote:
>>> Error injection is a useful mechanism to fail arbitrary kernel
>>> functions. However, it is often hard to guarantee an error propagates
>>> appropriately to user space programs. By injecting into syscalls, we can
>>> return arbitrary values to user space directly; this increases
>>> flexibility and robustness in testing, allowing us to test user space
>>> error paths effectively.
>>>
>>> The following script, for example, fails calls to sys_open() from a
>>> given pid:
>>>
>>> from bcc import BPF
>>> from sys import argv
>>>
>>> pid = argv[1]
>>>
>>> prog = r"""
>>>
>>> int kprobe__SyS_open(struct pt_regs *ctx, const char *pathname, int flags)
>>> {
>>> u32 pid = bpf_get_current_pid_tgid();
>>> if (pid == %s)
>>> bpf_override_return(ctx, -ENOENT);
>>> return 0;
>>> }
>>> """ % pid
>>>
>>> b = BPF(text = prog)
>>> while 1:
>>> b.perf_buffer_poll()
>>>
>>> This patch whitelists all syscalls defined with SYSCALL_DEFINE for error
>>> injection.
>>>
>>> Signed-off-by: Howard McLauchlan <hmclauchlan@fb.com>
>>> ---
>>> based on 4.16-rc5
>>> include/linux/syscalls.h | 3 +++
>>> 1 file changed, 3 insertions(+)
>>>
>>> diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
>>> index a78186d826d7..e8c6d63ace78 100644
>>> --- a/include/linux/syscalls.h
>>> +++ b/include/linux/syscalls.h
>>> @@ -191,6 +191,8 @@ static inline int is_syscall_trace_event(struct trace_event_call *tp_event)
>>> #define SYSCALL_DEFINE0(sname) \
>>> SYSCALL_METADATA(_##sname, 0); \
>>> + asmlinkage long sys_##sname(void); \
>>> + ALLOW_ERROR_INJECTION(sys_##sname, ERRNO); \
>>> asmlinkage long sys_##sname(void)
>
> duplication of asmlinkage in the above?
>
The pre-declaration is necessary to ensure ALLOW_ERROR_INJECTION works appropriately. There can be syscalls
that are not pre-declared elsewhere which will fail compilation if not declared in this block.
>>> #define SYSCALL_DEFINE1(name, ...) SYSCALL_DEFINEx(1, _##name, __VA_ARGS__)
>>> @@ -210,6 +212,7 @@ static inline int is_syscall_trace_event(struct trace_event_call *tp_event)
>>> #define __SYSCALL_DEFINEx(x, name, ...) \
>>> asmlinkage long sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)) \
>>> __attribute__((alias(__stringify(SyS##name)))); \
>>> + ALLOW_ERROR_INJECTION(sys##name, ERRNO); \
>>> static inline long SYSC##name(__MAP(x,__SC_DECL,__VA_ARGS__)); \
>>> asmlinkage long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__)); \
>>> asmlinkage long SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__)) \
>>> --
>>> 2.14.1
>>>
>>
>> Adding a few more people to Cc
>>
^ permalink raw reply
* Re: BUG_ON triggered in skb_segment
From: Alexei Starovoitov @ 2018-03-14 0:04 UTC (permalink / raw)
To: Eric Dumazet, Yonghong Song, Steffen Klassert
Cc: Daniel Borkmann, netdev, Martin Lau, saeedm, diptanu
In-Reply-To: <39a56b2c-6ea8-02f7-9e9c-968b16b9aa7a@gmail.com>
On 3/13/18 4:27 PM, Eric Dumazet wrote:
>
>
> On 03/13/2018 04:09 PM, Alexei Starovoitov wrote:
>
>> we have bpf_skb_proto_6_to_4() that was used by cilium for long time.
>> It's not clear why it's not crashing there, but we cannot just
>> reject changing proto in bpf programs now.
>> We have to fix whatever needs to be fixed in skb_segment
>> (if bug is there) or fix whatever necessary on mlx5 side.
>> In bpf helper we mark it as SKB_GSO_DODGY just like packets coming
>> through virtio would do, so if skb_segment() needs to do something
>> special with skb the SKB_GSO_DODGY flag is already there.
>
> 'Fixing' skb_segment(), I did that a long time ago and Herbert Xu was
> not happy with the fix and provided something else.
any link to your old patches and discussion?
I think since mlx4 can do tso on them and the packets came out
correct on the wire, there is nothing fundamentally wrong with
changing gso_size. Just tricky to teach skb_segment.
> GSO_DODGY has nothing to do with the problem really.
>
> Changing gso_size is breaking GRO since it ends up changing the number
> of segments on the wire. TCP is not going to be happy, so you'll also
> have to fix TCP eventually.
>
>
^ permalink raw reply
* Re: [PATCH net-next v2 2/4] net: qualcomm: rmnet: Update copyright year to 2018
From: Subash Abhinov Kasiviswanathan @ 2018-03-14 0:15 UTC (permalink / raw)
To: Joe Perches; +Cc: davem, netdev
In-Reply-To: <1520983965.2049.64.camel@perches.com>
> Did any work actually occur on all these files in 2018?
> $ git log --name-only --since=01-01-2018
> drivers/net/ethernet/qualcomm/rmnet/ | \
> grep "^drivers" | sort | uniq
> drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c
> drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h
> drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c
> drivers/net/ethernet/qualcomm/rmnet/rmnet_map_command.c
> drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c
> drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h
> drivers/net/ethernet/qualcomm/rmnet/rmnet_private.h
> drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c
Hi
There were changes to all these files as you pointed out
except for -
drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.h
drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.h
--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
^ permalink raw reply
* Re: [PATCH net-next v2 2/4] net: qualcomm: rmnet: Update copyright year to 2018
From: Joe Perches @ 2018-03-14 0:20 UTC (permalink / raw)
To: Subash Abhinov Kasiviswanathan; +Cc: davem, netdev
In-Reply-To: <8408f67435612b036439f8fc91b60e37@codeaurora.org>
On Tue, 2018-03-13 at 18:15 -0600, Subash Abhinov Kasiviswanathan wrote:
> > Did any work actually occur on all these files in 2018?
> > $ git log --name-only --since=01-01-2018
> > drivers/net/ethernet/qualcomm/rmnet/ | \
> > grep "^drivers" | sort | uniq
> > drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c
> > drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h
> > drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c
> > drivers/net/ethernet/qualcomm/rmnet/rmnet_map_command.c
> > drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c
> > drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h
> > drivers/net/ethernet/qualcomm/rmnet/rmnet_private.h
> > drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c
>
> Hi
>
> There were changes to all these files as you pointed out
> except for -
> drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.h
> drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.h
Yup. That's why I sent that list.
Files without modifications should generally not
have updated copyrights.
cheers, Joe
^ permalink raw reply
* Re: [PATCH] pktgen: use dynamic allocation for debug print buffer
From: David Miller @ 2018-03-14 0:25 UTC (permalink / raw)
To: arnd; +Cc: gustavo, dima, johannes.berg, edumazet, netdev, linux-kernel
In-Reply-To: <20180313205856.3309750-1-arnd@arndb.de>
From: Arnd Bergmann <arnd@arndb.de>
Date: Tue, 13 Mar 2018 21:58:39 +0100
> After the removal of the VLA, we get a harmless warning about a large
> stack frame:
>
> net/core/pktgen.c: In function 'pktgen_if_write':
> net/core/pktgen.c:1710:1: error: the frame size of 1076 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
>
> The function was previously shown to be safe despite hitting
> the 1024 bye warning level. To get rid of the annoyging warning,
> while keeping it readable, this changes it to use strndup_user().
>
> Obviously this is not a fast path, so the kmalloc() overhead
> can be disregarded.
>
> Fixes: 35951393bbff ("pktgen: Remove VLA usage")
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Applied, thanks.
^ permalink raw reply
* Re: BUG_ON triggered in skb_segment
From: Eric Dumazet @ 2018-03-14 0:26 UTC (permalink / raw)
To: Alexei Starovoitov, Yonghong Song, Steffen Klassert
Cc: Daniel Borkmann, netdev, Martin Lau, saeedm, diptanu
In-Reply-To: <acba905c-0d57-630c-00ac-493653e7b9a9@fb.com>
On 03/13/2018 05:04 PM, Alexei Starovoitov wrote:
> On 3/13/18 4:27 PM, Eric Dumazet wrote:
>>
>>
>> On 03/13/2018 04:09 PM, Alexei Starovoitov wrote:
>>
>>> we have bpf_skb_proto_6_to_4() that was used by cilium for long time.
>>> It's not clear why it's not crashing there, but we cannot just
>>> reject changing proto in bpf programs now.
>>> We have to fix whatever needs to be fixed in skb_segment
>>> (if bug is there) or fix whatever necessary on mlx5 side.
>>> In bpf helper we mark it as SKB_GSO_DODGY just like packets coming
>>> through virtio would do, so if skb_segment() needs to do something
>>> special with skb the SKB_GSO_DODGY flag is already there.
>>
>> 'Fixing' skb_segment(), I did that a long time ago and Herbert Xu was
>> not happy with the fix and provided something else.
>
> any link to your old patches and discussion?
>
> I think since mlx4 can do tso on them and the packets came out
> correct on the wire, there is nothing fundamentally wrong with
> changing gso_size. Just tricky to teach skb_segment.
>
The world is not mlx4 only. Some NIC will ask skb_segment() fallback
segmentation for various reasons (like skb->len above a given limit like
16KB)
Maybe https://www.spinics.net/lists/netdev/msg255549.html
^ permalink raw reply
* Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available
From: Samudrala, Sridhar @ 2018-03-14 0:28 UTC (permalink / raw)
To: Siwei Liu
Cc: Michael S. Tsirkin, Stephen Hemminger, David Miller, Netdev,
Jiri Pirko, virtio-dev, Brandeburg, Jesse, Alexander Duyck,
Jakub Kicinski
In-Reply-To: <CADGSJ23unj7P_d_vpitAp9JZ7F76i-Y8msaHU4_pOV_4-UUinw@mail.gmail.com>
On 3/12/2018 3:44 PM, Siwei Liu wrote:
> Apologies, still some comments going. Please see inline.
>
> On Thu, Mar 1, 2018 at 12:08 PM, Sridhar Samudrala
> <sridhar.samudrala@intel.com> wrote:
>> This patch enables virtio_net to switch over to a VF datapath when a VF
>> netdev is present with the same MAC address. It allows live migration
>> of a VM with a direct attached VF without the need to setup a bond/team
>> between a VF and virtio net device in the guest.
>>
>> The hypervisor needs to enable only one datapath at any time so that
>> packets don't get looped back to the VM over the other datapath. When a VF
>> is plugged, the virtio datapath link state can be marked as down. The
>> hypervisor needs to unplug the VF device from the guest on the source host
>> and reset the MAC filter of the VF to initiate failover of datapath to
>> virtio before starting the migration. After the migration is completed,
>> the destination hypervisor sets the MAC filter on the VF and plugs it back
>> to the guest to switch over to VF datapath.
>>
>> When BACKUP feature is enabled, an additional netdev(bypass netdev) is
>> created that acts as a master device and tracks the state of the 2 lower
>> netdevs. The original virtio_net netdev is marked as 'backup' netdev and a
>> passthru device with the same MAC is registered as 'active' netdev.
>>
>> This patch is based on the discussion initiated by Jesse on this thread.
>> https://marc.info/?l=linux-virtualization&m=151189725224231&w=2
>>
>> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
>> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
>> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
>> ---
>> drivers/net/virtio_net.c | 683 ++++++++++++++++++++++++++++++++++++++++++++++-
>> 1 file changed, 682 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index bcd13fe906ca..f2860d86c952 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -30,6 +30,8 @@
>> #include <linux/cpu.h>
>> #include <linux/average.h>
>> #include <linux/filter.h>
>> +#include <linux/netdevice.h>
>> +#include <linux/pci.h>
>> #include <net/route.h>
>> #include <net/xdp.h>
>>
>> @@ -206,6 +208,9 @@ struct virtnet_info {
>> u32 speed;
>>
>> unsigned long guest_offloads;
>> +
>> + /* upper netdev created when BACKUP feature enabled */
>> + struct net_device *bypass_netdev;
>> };
>>
>> struct padded_vnet_hdr {
>> @@ -2236,6 +2241,22 @@ static int virtnet_xdp(struct net_device *dev, struct netdev_bpf *xdp)
>> }
>> }
>>
>> +static int virtnet_get_phys_port_name(struct net_device *dev, char *buf,
>> + size_t len)
>> +{
>> + struct virtnet_info *vi = netdev_priv(dev);
>> + int ret;
>> +
>> + if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_BACKUP))
>> + return -EOPNOTSUPP;
>> +
>> + ret = snprintf(buf, len, "_bkup");
>> + if (ret >= len)
>> + return -EOPNOTSUPP;
>> +
>> + return 0;
>> +}
>> +
>> static const struct net_device_ops virtnet_netdev = {
>> .ndo_open = virtnet_open,
>> .ndo_stop = virtnet_close,
>> @@ -2253,6 +2274,7 @@ static const struct net_device_ops virtnet_netdev = {
>> .ndo_xdp_xmit = virtnet_xdp_xmit,
>> .ndo_xdp_flush = virtnet_xdp_flush,
>> .ndo_features_check = passthru_features_check,
>> + .ndo_get_phys_port_name = virtnet_get_phys_port_name,
>> };
>>
>> static void virtnet_config_changed_work(struct work_struct *work)
>> @@ -2647,6 +2669,653 @@ static int virtnet_validate(struct virtio_device *vdev)
>> return 0;
>> }
>>
>> +/* START of functions supporting VIRTIO_NET_F_BACKUP feature.
>> + * When BACKUP feature is enabled, an additional netdev(bypass netdev)
>> + * is created that acts as a master device and tracks the state of the
>> + * 2 lower netdevs. The original virtio_net netdev is registered as
>> + * 'backup' netdev and a passthru device with the same MAC is registered
>> + * as 'active' netdev.
>> + */
>> +
>> +/* bypass state maintained when BACKUP feature is enabled */
>> +struct virtnet_bypass_info {
>> + /* passthru netdev with same MAC */
>> + struct net_device __rcu *active_netdev;
>> +
>> + /* virtio_net netdev */
>> + struct net_device __rcu *backup_netdev;
>> +
>> + /* active netdev stats */
>> + struct rtnl_link_stats64 active_stats;
>> +
>> + /* backup netdev stats */
>> + struct rtnl_link_stats64 backup_stats;
>> +
>> + /* aggregated stats */
>> + struct rtnl_link_stats64 bypass_stats;
>> +
>> + /* spinlock while updating stats */
>> + spinlock_t stats_lock;
>> +};
>> +
>> +static void virtnet_bypass_child_open(struct net_device *dev,
>> + struct net_device *child_netdev)
>> +{
>> + int err = dev_open(child_netdev);
>> +
>> + if (err)
>> + netdev_warn(dev, "unable to open slave: %s: %d\n",
>> + child_netdev->name, err);
>> +}
> Why ignoring the error?? i.e. virtnet_bypass_child_open should have
> return value. I don't believe the caller is in a safe context to
> guarantee the dev_open always succeeds.
OK. Will handle this in the next revision.
>
>> +
>> +static int virtnet_bypass_open(struct net_device *dev)
>> +{
>> + struct virtnet_bypass_info *vbi = netdev_priv(dev);
>> + struct net_device *child_netdev;
>> +
>> + netif_carrier_off(dev);
>> + netif_tx_wake_all_queues(dev);
>> +
>> + child_netdev = rtnl_dereference(vbi->active_netdev);
>> + if (child_netdev)
>> + virtnet_bypass_child_open(dev, child_netdev);
> Handle the error?
>
>> +
>> + child_netdev = rtnl_dereference(vbi->backup_netdev);
>> + if (child_netdev)
>> + virtnet_bypass_child_open(dev, child_netdev);
> Handle the error and unwind?
Sure.
<snip>
> +
> +static int virtnet_bypass_register_child(struct net_device *child_netdev)
> +{
> + struct virtnet_bypass_info *vbi;
> + struct net_device *dev;
> + bool backup;
> + int ret;
> +
> + if (child_netdev->addr_len != ETH_ALEN)
> + return NOTIFY_DONE;
> +
> + /* We will use the MAC address to locate the virtnet_bypass netdev
> + * to associate with the child netdev. If we don't find a matching
> + * bypass netdev, move on.
> + */
> + dev = get_virtnet_bypass_bymac(dev_net(child_netdev),
> + child_netdev->perm_addr);
> + if (!dev)
> + return NOTIFY_DONE;
> +
> + vbi = netdev_priv(dev);
> + backup = (child_netdev->dev.parent == dev->dev.parent);
> + if (backup ? rtnl_dereference(vbi->backup_netdev) :
> + rtnl_dereference(vbi->active_netdev)) {
> + netdev_info(dev,
> + "%s attempting to join bypass dev when %s already present\n",
> + child_netdev->name, backup ? "backup" : "active");
> + return NOTIFY_DONE;
> + }
> +
> + /* Avoid non pci devices as active netdev */
> + if (!backup && (!child_netdev->dev.parent ||
> + !dev_is_pci(child_netdev->dev.parent)))
> + return NOTIFY_DONE;
> +
> There's a problem here in terms of error (particularly on the active
> slave, e.g. VF), see below:
>
>> + ret = netdev_rx_handler_register(child_netdev,
>> + virtnet_bypass_handle_frame, dev);
>> + if (ret != 0) {
>> + netdev_err(child_netdev,
>> + "can not register bypass receive handler (err = %d)\n",
>> + ret);
>> + goto rx_handler_failed;
>> + }
>> +
>> + ret = netdev_upper_dev_link(child_netdev, dev, NULL);
>> + if (ret != 0) {
>> + netdev_err(child_netdev,
>> + "can not set master device %s (err = %d)\n",
>> + dev->name, ret);
>> + goto upper_link_failed;
>> + }
>> +
>> + child_netdev->flags |= IFF_SLAVE;
>> +
>> + if (netif_running(dev)) {
>> + ret = dev_open(child_netdev);
>> + if (ret && (ret != -EBUSY)) {
>> + netdev_err(dev, "Opening child %s failed ret:%d\n",
>> + child_netdev->name, ret);
>> + goto err_interface_up;
>> + }
>> + }
>> +
>> + /* Align MTU of child with master */
>> + ret = dev_set_mtu(child_netdev, dev->mtu);
>> + if (ret) {
>> + netdev_err(dev,
>> + "unable to change mtu of %s to %u register failed\n",
>> + child_netdev->name, dev->mtu);
>> + goto err_set_mtu;
>> + }
> If any of the above calls returns non-zero, the current code steps
> back to undo what's being done on that spefic slave previously. For
> instance, if the netdev_rx_handler_register returns non-zero because
> of busy rx handler, this register_child function would give up
> enslaving the VF and leave the upper virtio_bypass interface behind
> once it returns.
virtio_bypass interface is the upper master netdev and it is always present when
BACKUP feature is enabled.
If there is failure with enslaving VF, there will be 2 netdevs,
master virtio_bypass and slave virtio_net and the VM can be migrated.
> I am not sure if it's a good idea to leave the
> virtio_bypass around if running into failure: the guest is not
> migratable as the VF doesn't have a backup path,
Are you talking about a failure when registering backup netdev? This should not
happen, but i guess we can improve error handling in such scenario.
> And perhaps the worse
> part is that, it now has two interfaces with identical MAC address but
> one of them is invalid (user cannot use the virtio interface as it has
> a dampened datapath). IMHO the virtio_bypass and its lower netdev
> should be destroyed at all when it fails to bind the VF, and
> technically, there should be some way to propogate the failure status
> to the hypervisor/backend, indicating that the VM is not migratable
> because of guest software errors (maybe by clearing out the backup
> feature from the guest virtio driver so host can see/learn it).
In BACKUP mode, user can only use the upper virtio_bypass netdev and that will
always be there. Any failure to enslave VF netdev is not fatal, but i will see
if we can improve the error handling of failure to enslave backup netdev.
Also, i don't think the BACKUP feature bit is negotiable with the host.
Thanks
Sridhar
^ permalink raw reply
* [PATCH net-next 1/6] ibmvnic: Generalize TX pool structure
From: Thomas Falcon @ 2018-03-14 0:34 UTC (permalink / raw)
To: netdev; +Cc: jallen, nfont, Thomas Falcon
In-Reply-To: <1520987663-26056-1-git-send-email-tlfalcon@linux.vnet.ibm.com>
Remove some unused fields in the structure and include values
describing the individual buffer size and number of buffers in
a TX pool. This allows us to use these fields for TX pool buffer
accounting as opposed to using hard coded values. Finally, split
TSO buffers out and provide an additional TX pool array for TSO.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
---
drivers/net/ethernet/ibm/ibmvnic.h | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/ibm/ibmvnic.h b/drivers/net/ethernet/ibm/ibmvnic.h
index 099c89d..a2e21b3 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.h
+++ b/drivers/net/ethernet/ibm/ibmvnic.h
@@ -917,11 +917,9 @@ struct ibmvnic_tx_pool {
int *free_map;
int consumer_index;
int producer_index;
- wait_queue_head_t ibmvnic_tx_comp_q;
- struct task_struct *work_thread;
struct ibmvnic_long_term_buff long_term_buff;
- struct ibmvnic_long_term_buff tso_ltb;
- int tso_index;
+ int num_buffers;
+ int buf_size;
};
struct ibmvnic_rx_buff {
@@ -1044,6 +1042,7 @@ struct ibmvnic_adapter {
u64 promisc;
struct ibmvnic_tx_pool *tx_pool;
+ struct ibmvnic_tx_pool *tso_pool;
struct completion init_done;
int init_done_rc;
--
1.8.3.1
^ permalink raw reply related
* [PATCH net-next 0/6] ibmvnic: Update TX pool and TX routines
From: Thomas Falcon @ 2018-03-14 0:34 UTC (permalink / raw)
To: netdev; +Cc: jallen, nfont, Thomas Falcon
This patch restructures the TX pool data structure and provides a
separate TX pool array for TSO transmissions. This is already used
in some way due to our unique DMA situation, namely that we cannot
use single DMA mappings for packet data. Previously, both buffer
arrays used the same pool entry. This restructuring allows for
some additional cleanup in the driver code, especially in some
places in the device transmit routine.
In addition, it allows us to more easily track the consumer
and producer indexes of a particular pool. This has been
further improved by better tracking of in-use buffers to
prevent possible data corruption in case an invalid buffer
entry is used.
Thomas Falcon (6):
ibmvnic: Generalize TX pool structure
ibmvnic: Update and clean up reset TX pool routine
ibmvnic: Update release RX pool routine
ibmvnic: Update TX pool initialization routine
ibmvnic: Update TX and TX completion routines
ibmvnic: Improve TX buffer accounting
drivers/net/ethernet/ibm/ibmvnic.c | 235 +++++++++++++++++++++----------------
drivers/net/ethernet/ibm/ibmvnic.h | 8 +-
2 files changed, 136 insertions(+), 107 deletions(-)
--
1.8.3.1
^ permalink raw reply
* [PATCH net-next 3/6] ibmvnic: Update release RX pool routine
From: Thomas Falcon @ 2018-03-14 0:34 UTC (permalink / raw)
To: netdev; +Cc: jallen, nfont, Thomas Falcon
In-Reply-To: <1520987663-26056-1-git-send-email-tlfalcon@linux.vnet.ibm.com>
Introduce function that frees one TX pool. Use that to release
each pool in both the standard TX pool and TSO pool arrays.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
---
drivers/net/ethernet/ibm/ibmvnic.c | 19 ++++++++++++-------
1 file changed, 12 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index 81a4cd9..e6e934e 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -608,25 +608,30 @@ static void release_vpd_data(struct ibmvnic_adapter *adapter)
adapter->vpd = NULL;
}
+static void release_one_tx_pool(struct ibmvnic_adapter *adapter,
+ struct ibmvnic_tx_pool *tx_pool)
+{
+ kfree(tx_pool->tx_buff);
+ kfree(tx_pool->free_map);
+ free_long_term_buff(adapter, &tx_pool->long_term_buff);
+}
+
static void release_tx_pools(struct ibmvnic_adapter *adapter)
{
- struct ibmvnic_tx_pool *tx_pool;
int i;
if (!adapter->tx_pool)
return;
for (i = 0; i < adapter->num_active_tx_pools; i++) {
- netdev_dbg(adapter->netdev, "Releasing tx_pool[%d]\n", i);
- tx_pool = &adapter->tx_pool[i];
- kfree(tx_pool->tx_buff);
- free_long_term_buff(adapter, &tx_pool->long_term_buff);
- free_long_term_buff(adapter, &tx_pool->tso_ltb);
- kfree(tx_pool->free_map);
+ release_one_tx_pool(adapter, &adapter->tx_pool[i]);
+ release_one_tx_pool(adapter, &adapter->tso_pool[i]);
}
kfree(adapter->tx_pool);
adapter->tx_pool = NULL;
+ kfree(adapter->tso_pool);
+ adapter->tso_pool = NULL;
adapter->num_active_tx_pools = 0;
}
--
1.8.3.1
^ permalink raw reply related
* [PATCH net-next 5/6] ibmvnic: Update TX and TX completion routines
From: Thomas Falcon @ 2018-03-14 0:34 UTC (permalink / raw)
To: netdev; +Cc: jallen, nfont, Thomas Falcon
In-Reply-To: <1520987663-26056-1-git-send-email-tlfalcon@linux.vnet.ibm.com>
Update TX and TX completion routines to account for TX pool
restructuring. TX routine first chooses the pool depending
on whether a packet is GSO or not, then uses it accordingly.
For the completion routine to know which pool it needs to use,
set the most significant bit of the correlator index to one
if the packet uses the TSO pool. On completion, unset the bit
and use the correlator index to release the buffer pool entry.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
---
drivers/net/ethernet/ibm/ibmvnic.c | 55 +++++++++++++++++++-------------------
drivers/net/ethernet/ibm/ibmvnic.h | 1 +
2 files changed, 29 insertions(+), 27 deletions(-)
diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index cda5bf0..3b752e3 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -1414,8 +1414,11 @@ static int ibmvnic_xmit(struct sk_buff *skb, struct net_device *netdev)
ret = NETDEV_TX_OK;
goto out;
}
+ if (skb_is_gso(skb))
+ tx_pool = &adapter->tso_pool[queue_num];
+ else
+ tx_pool = &adapter->tx_pool[queue_num];
- tx_pool = &adapter->tx_pool[queue_num];
tx_scrq = adapter->tx_scrq[queue_num];
txq = netdev_get_tx_queue(netdev, skb_get_queue_mapping(skb));
handle_array = (u64 *)((u8 *)(adapter->login_rsp_buf) +
@@ -1423,20 +1426,10 @@ static int ibmvnic_xmit(struct sk_buff *skb, struct net_device *netdev)
index = tx_pool->free_map[tx_pool->consumer_index];
- if (skb_is_gso(skb)) {
- offset = tx_pool->tso_index * IBMVNIC_TSO_BUF_SZ;
- dst = tx_pool->tso_ltb.buff + offset;
- memset(dst, 0, IBMVNIC_TSO_BUF_SZ);
- data_dma_addr = tx_pool->tso_ltb.addr + offset;
- tx_pool->tso_index++;
- if (tx_pool->tso_index == IBMVNIC_TSO_BUFS)
- tx_pool->tso_index = 0;
- } else {
- offset = index * (adapter->req_mtu + VLAN_HLEN);
- dst = tx_pool->long_term_buff.buff + offset;
- memset(dst, 0, adapter->req_mtu + VLAN_HLEN);
- data_dma_addr = tx_pool->long_term_buff.addr + offset;
- }
+ offset = index * tx_pool->buf_size;
+ dst = tx_pool->long_term_buff.buff + offset;
+ memset(dst, 0, tx_pool->buf_size);
+ data_dma_addr = tx_pool->long_term_buff.addr + offset;
if (skb_shinfo(skb)->nr_frags) {
int cur, i;
@@ -1459,8 +1452,7 @@ static int ibmvnic_xmit(struct sk_buff *skb, struct net_device *netdev)
}
tx_pool->consumer_index =
- (tx_pool->consumer_index + 1) %
- adapter->req_tx_entries_per_subcrq;
+ (tx_pool->consumer_index + 1) % tx_pool->num_buffers;
tx_buff = &tx_pool->tx_buff[index];
tx_buff->skb = skb;
@@ -1476,11 +1468,13 @@ static int ibmvnic_xmit(struct sk_buff *skb, struct net_device *netdev)
tx_crq.v1.n_crq_elem = 1;
tx_crq.v1.n_sge = 1;
tx_crq.v1.flags1 = IBMVNIC_TX_COMP_NEEDED;
- tx_crq.v1.correlator = cpu_to_be32(index);
+
if (skb_is_gso(skb))
- tx_crq.v1.dma_reg = cpu_to_be16(tx_pool->tso_ltb.map_id);
+ tx_crq.v1.correlator =
+ cpu_to_be32(index | IBMVNIC_TSO_POOL_MASK);
else
- tx_crq.v1.dma_reg = cpu_to_be16(tx_pool->long_term_buff.map_id);
+ tx_crq.v1.correlator = cpu_to_be32(index);
+ tx_crq.v1.dma_reg = cpu_to_be16(tx_pool->long_term_buff.map_id);
tx_crq.v1.sge_len = cpu_to_be32(skb->len);
tx_crq.v1.ioba = cpu_to_be64(data_dma_addr);
@@ -1543,7 +1537,7 @@ static int ibmvnic_xmit(struct sk_buff *skb, struct net_device *netdev)
if (tx_pool->consumer_index == 0)
tx_pool->consumer_index =
- adapter->req_tx_entries_per_subcrq - 1;
+ tx_pool->num_buffers - 1;
else
tx_pool->consumer_index--;
@@ -2545,6 +2539,7 @@ static int ibmvnic_complete_tx(struct ibmvnic_adapter *adapter,
struct ibmvnic_sub_crq_queue *scrq)
{
struct device *dev = &adapter->vdev->dev;
+ struct ibmvnic_tx_pool *tx_pool;
struct ibmvnic_tx_buff *txbuff;
union sub_crq *next;
int index;
@@ -2564,7 +2559,14 @@ static int ibmvnic_complete_tx(struct ibmvnic_adapter *adapter,
continue;
}
index = be32_to_cpu(next->tx_comp.correlators[i]);
- txbuff = &adapter->tx_pool[pool].tx_buff[index];
+ if (index & IBMVNIC_TSO_POOL_MASK) {
+ tx_pool = &adapter->tso_pool[pool];
+ index &= ~IBMVNIC_TSO_POOL_MASK;
+ } else {
+ tx_pool = &adapter->tx_pool[pool];
+ }
+
+ txbuff = &tx_pool->tx_buff[index];
for (j = 0; j < IBMVNIC_MAX_FRAGS_PER_CRQ; j++) {
if (!txbuff->data_dma[j])
@@ -2587,11 +2589,10 @@ static int ibmvnic_complete_tx(struct ibmvnic_adapter *adapter,
num_entries += txbuff->num_entries;
- adapter->tx_pool[pool].free_map[adapter->tx_pool[pool].
- producer_index] = index;
- adapter->tx_pool[pool].producer_index =
- (adapter->tx_pool[pool].producer_index + 1) %
- adapter->req_tx_entries_per_subcrq;
+ tx_pool->free_map[tx_pool->producer_index] = index;
+ tx_pool->producer_index =
+ (tx_pool->producer_index + 1) %
+ tx_pool->num_buffers;
}
/* remove tx_comp scrq*/
next->tx_comp.first = 0;
diff --git a/drivers/net/ethernet/ibm/ibmvnic.h b/drivers/net/ethernet/ibm/ibmvnic.h
index a2e21b3..89efe70 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.h
+++ b/drivers/net/ethernet/ibm/ibmvnic.h
@@ -43,6 +43,7 @@
#define IBMVNIC_TSO_BUF_SZ 65536
#define IBMVNIC_TSO_BUFS 64
+#define IBMVNIC_TSO_POOL_MASK 0x80000000
#define IBMVNIC_MAX_LTB_SIZE ((1 << (MAX_ORDER - 1)) * PAGE_SIZE)
#define IBMVNIC_BUFFER_HLEN 500
--
1.8.3.1
^ permalink raw reply related
* [PATCH net-next 2/6] ibmvnic: Update and clean up reset TX pool routine
From: Thomas Falcon @ 2018-03-14 0:34 UTC (permalink / raw)
To: netdev; +Cc: jallen, nfont, Thomas Falcon
In-Reply-To: <1520987663-26056-1-git-send-email-tlfalcon@linux.vnet.ibm.com>
Update TX pool reset routine to accommodate new TSO pool array. Introduce
a function that resets one TX pool, and use that function to initialize
each pool in both pool arrays.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
---
drivers/net/ethernet/ibm/ibmvnic.c | 45 +++++++++++++++++++++-----------------
1 file changed, 25 insertions(+), 20 deletions(-)
diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index 6ff43d7..81a4cd9 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -557,36 +557,41 @@ static int init_rx_pools(struct net_device *netdev)
return 0;
}
+static int reset_one_tx_pool(struct ibmvnic_adapter *adapter,
+ struct ibmvnic_tx_pool *tx_pool)
+{
+ int rc, i;
+
+ rc = reset_long_term_buff(adapter, &tx_pool->long_term_buff);
+ if (rc)
+ return rc;
+
+ memset(tx_pool->tx_buff, 0,
+ tx_pool->num_buffers *
+ sizeof(struct ibmvnic_tx_buff));
+
+ for (i = 0; i < tx_pool->num_buffers; i++)
+ tx_pool->free_map[i] = i;
+
+ tx_pool->consumer_index = 0;
+ tx_pool->producer_index = 0;
+
+ return 0;
+}
+
static int reset_tx_pools(struct ibmvnic_adapter *adapter)
{
- struct ibmvnic_tx_pool *tx_pool;
int tx_scrqs;
- int i, j, rc;
+ int i, rc;
tx_scrqs = be32_to_cpu(adapter->login_rsp_buf->num_txsubm_subcrqs);
for (i = 0; i < tx_scrqs; i++) {
- netdev_dbg(adapter->netdev, "Re-setting tx_pool[%d]\n", i);
-
- tx_pool = &adapter->tx_pool[i];
-
- rc = reset_long_term_buff(adapter, &tx_pool->long_term_buff);
+ rc = reset_one_tx_pool(adapter, &adapter->tso_pool[i]);
if (rc)
return rc;
-
- rc = reset_long_term_buff(adapter, &tx_pool->tso_ltb);
+ rc = reset_one_tx_pool(adapter, &adapter->tx_pool[i]);
if (rc)
return rc;
-
- memset(tx_pool->tx_buff, 0,
- adapter->req_tx_entries_per_subcrq *
- sizeof(struct ibmvnic_tx_buff));
-
- for (j = 0; j < adapter->req_tx_entries_per_subcrq; j++)
- tx_pool->free_map[j] = j;
-
- tx_pool->consumer_index = 0;
- tx_pool->producer_index = 0;
- tx_pool->tso_index = 0;
}
return 0;
--
1.8.3.1
^ permalink raw reply related
* [PATCH net-next 6/6] ibmvnic: Improve TX buffer accounting
From: Thomas Falcon @ 2018-03-14 0:34 UTC (permalink / raw)
To: netdev; +Cc: jallen, nfont, Thomas Falcon
In-Reply-To: <1520987663-26056-1-git-send-email-tlfalcon@linux.vnet.ibm.com>
Improve TX pool buffer accounting to prevent the producer
index from overruning the consumer. First, set the next free
index to an invalid value if it is in use. If next buffer
to be consumed is in use, drop the packet.
Finally, if the transmit fails for some other reason, roll
back the consumer index and set the free map entry to its original
value. This should also be done if the DMA map fails.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
---
drivers/net/ethernet/ibm/ibmvnic.c | 30 +++++++++++++++++++++---------
1 file changed, 21 insertions(+), 9 deletions(-)
diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index 3b752e3..f599dad 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -1426,6 +1426,16 @@ static int ibmvnic_xmit(struct sk_buff *skb, struct net_device *netdev)
index = tx_pool->free_map[tx_pool->consumer_index];
+ if (index == IBMVNIC_INVALID_MAP) {
+ dev_kfree_skb_any(skb);
+ tx_send_failed++;
+ tx_dropped++;
+ ret = NETDEV_TX_OK;
+ goto out;
+ }
+
+ tx_pool->free_map[tx_pool->consumer_index] = IBMVNIC_INVALID_MAP;
+
offset = index * tx_pool->buf_size;
dst = tx_pool->long_term_buff.buff + offset;
memset(dst, 0, tx_pool->buf_size);
@@ -1522,7 +1532,7 @@ static int ibmvnic_xmit(struct sk_buff *skb, struct net_device *netdev)
tx_map_failed++;
tx_dropped++;
ret = NETDEV_TX_OK;
- goto out;
+ goto tx_err_out;
}
lpar_rc = send_subcrq_indirect(adapter, handle_array[queue_num],
(u64)tx_buff->indir_dma,
@@ -1534,13 +1544,6 @@ static int ibmvnic_xmit(struct sk_buff *skb, struct net_device *netdev)
}
if (lpar_rc != H_SUCCESS) {
dev_err(dev, "tx failed with code %ld\n", lpar_rc);
-
- if (tx_pool->consumer_index == 0)
- tx_pool->consumer_index =
- tx_pool->num_buffers - 1;
- else
- tx_pool->consumer_index--;
-
dev_kfree_skb_any(skb);
tx_buff->skb = NULL;
@@ -1556,7 +1559,7 @@ static int ibmvnic_xmit(struct sk_buff *skb, struct net_device *netdev)
tx_send_failed++;
tx_dropped++;
ret = NETDEV_TX_OK;
- goto out;
+ goto tx_err_out;
}
if (atomic_add_return(num_entries, &tx_scrq->used)
@@ -1569,7 +1572,16 @@ static int ibmvnic_xmit(struct sk_buff *skb, struct net_device *netdev)
tx_bytes += skb->len;
txq->trans_start = jiffies;
ret = NETDEV_TX_OK;
+ goto out;
+tx_err_out:
+ /* roll back consumer index and map array*/
+ if (tx_pool->consumer_index == 0)
+ tx_pool->consumer_index =
+ tx_pool->num_buffers - 1;
+ else
+ tx_pool->consumer_index--;
+ tx_pool->free_map[tx_pool->consumer_index] = index;
out:
netdev->stats.tx_dropped += tx_dropped;
netdev->stats.tx_bytes += tx_bytes;
--
1.8.3.1
^ permalink raw reply related
* [PATCH net-next 4/6] ibmvnic: Update TX pool initialization routine
From: Thomas Falcon @ 2018-03-14 0:34 UTC (permalink / raw)
To: netdev; +Cc: jallen, nfont, Thomas Falcon
In-Reply-To: <1520987663-26056-1-git-send-email-tlfalcon@linux.vnet.ibm.com>
Introduce function that initializes one TX pool. Use that to
create each pool entry in both the standard TX pool and TSO
pool arrays.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
---
drivers/net/ethernet/ibm/ibmvnic.c | 90 ++++++++++++++++++++------------------
1 file changed, 48 insertions(+), 42 deletions(-)
diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index e6e934e..cda5bf0 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -635,13 +635,43 @@ static void release_tx_pools(struct ibmvnic_adapter *adapter)
adapter->num_active_tx_pools = 0;
}
+static int init_one_tx_pool(struct net_device *netdev,
+ struct ibmvnic_tx_pool *tx_pool,
+ int num_entries, int buf_size)
+{
+ struct ibmvnic_adapter *adapter = netdev_priv(netdev);
+ int i;
+
+ tx_pool->tx_buff = kcalloc(num_entries,
+ sizeof(struct ibmvnic_tx_buff),
+ GFP_KERNEL);
+ if (!tx_pool->tx_buff)
+ return -1;
+
+ if (alloc_long_term_buff(adapter, &tx_pool->long_term_buff,
+ num_entries * buf_size))
+ return -1;
+
+ tx_pool->free_map = kcalloc(num_entries, sizeof(int), GFP_KERNEL);
+ if (!tx_pool->free_map)
+ return -1;
+
+ for (i = 0; i < num_entries; i++)
+ tx_pool->free_map[i] = i;
+
+ tx_pool->consumer_index = 0;
+ tx_pool->producer_index = 0;
+ tx_pool->num_buffers = num_entries;
+ tx_pool->buf_size = buf_size;
+
+ return 0;
+}
+
static int init_tx_pools(struct net_device *netdev)
{
struct ibmvnic_adapter *adapter = netdev_priv(netdev);
- struct device *dev = &adapter->vdev->dev;
- struct ibmvnic_tx_pool *tx_pool;
int tx_subcrqs;
- int i, j;
+ int i, rc;
tx_subcrqs = be32_to_cpu(adapter->login_rsp_buf->num_txsubm_subcrqs);
adapter->tx_pool = kcalloc(tx_subcrqs,
@@ -649,53 +679,29 @@ static int init_tx_pools(struct net_device *netdev)
if (!adapter->tx_pool)
return -1;
+ adapter->tso_pool = kcalloc(tx_subcrqs,
+ sizeof(struct ibmvnic_tx_pool), GFP_KERNEL);
+ if (!adapter->tso_pool)
+ return -1;
+
adapter->num_active_tx_pools = tx_subcrqs;
for (i = 0; i < tx_subcrqs; i++) {
- tx_pool = &adapter->tx_pool[i];
-
- netdev_dbg(adapter->netdev,
- "Initializing tx_pool[%d], %lld buffs\n",
- i, adapter->req_tx_entries_per_subcrq);
-
- tx_pool->tx_buff = kcalloc(adapter->req_tx_entries_per_subcrq,
- sizeof(struct ibmvnic_tx_buff),
- GFP_KERNEL);
- if (!tx_pool->tx_buff) {
- dev_err(dev, "tx pool buffer allocation failed\n");
- release_tx_pools(adapter);
- return -1;
- }
-
- if (alloc_long_term_buff(adapter, &tx_pool->long_term_buff,
- adapter->req_tx_entries_per_subcrq *
- (adapter->req_mtu + VLAN_HLEN))) {
- release_tx_pools(adapter);
- return -1;
- }
-
- /* alloc TSO ltb */
- if (alloc_long_term_buff(adapter, &tx_pool->tso_ltb,
- IBMVNIC_TSO_BUFS *
- IBMVNIC_TSO_BUF_SZ)) {
+ rc = init_one_tx_pool(netdev, &adapter->tx_pool[i],
+ adapter->req_tx_entries_per_subcrq,
+ adapter->req_mtu + VLAN_HLEN);
+ if (rc) {
release_tx_pools(adapter);
- return -1;
+ return rc;
}
- tx_pool->tso_index = 0;
-
- tx_pool->free_map = kcalloc(adapter->req_tx_entries_per_subcrq,
- sizeof(int), GFP_KERNEL);
- if (!tx_pool->free_map) {
+ init_one_tx_pool(netdev, &adapter->tso_pool[i],
+ IBMVNIC_TSO_BUFS,
+ IBMVNIC_TSO_BUF_SZ);
+ if (rc) {
release_tx_pools(adapter);
- return -1;
+ return rc;
}
-
- for (j = 0; j < adapter->req_tx_entries_per_subcrq; j++)
- tx_pool->free_map[j] = j;
-
- tx_pool->consumer_index = 0;
- tx_pool->producer_index = 0;
}
return 0;
--
1.8.3.1
^ permalink raw reply related
* Re: BUG_ON triggered in skb_segment
From: Eric Dumazet @ 2018-03-14 0:35 UTC (permalink / raw)
To: Alexei Starovoitov, Yonghong Song, Steffen Klassert
Cc: Daniel Borkmann, netdev, Martin Lau, saeedm, diptanu
In-Reply-To: <50906fd5-9834-d41a-c132-86f64be38606@gmail.com>
On 03/13/2018 05:26 PM, Eric Dumazet wrote:
>
>
> On 03/13/2018 05:04 PM, Alexei Starovoitov wrote:
>> On 3/13/18 4:27 PM, Eric Dumazet wrote:
>>>
>>>
>>> On 03/13/2018 04:09 PM, Alexei Starovoitov wrote:
>>>
>>>> we have bpf_skb_proto_6_to_4() that was used by cilium for long time.
>>>> It's not clear why it's not crashing there, but we cannot just
>>>> reject changing proto in bpf programs now.
>>>> We have to fix whatever needs to be fixed in skb_segment
>>>> (if bug is there) or fix whatever necessary on mlx5 side.
>>>> In bpf helper we mark it as SKB_GSO_DODGY just like packets coming
>>>> through virtio would do, so if skb_segment() needs to do something
>>>> special with skb the SKB_GSO_DODGY flag is already there.
>>>
>>> 'Fixing' skb_segment(), I did that a long time ago and Herbert Xu was
>>> not happy with the fix and provided something else.
>>
>> any link to your old patches and discussion?
>>
>> I think since mlx4 can do tso on them and the packets came out
>> correct on the wire, there is nothing fundamentally wrong with
>> changing gso_size. Just tricky to teach skb_segment.
>>
>
> The world is not mlx4 only. Some NIC will ask skb_segment() fallback
> segmentation for various reasons (like skb->len above a given limit like
> 16KB)
>
> Maybe https://www.spinics.net/lists/netdev/msg255549.html
Herbert patch :
commit 9d8506cc2d7ea1f911c72c100193a3677f6668c3
Author: Herbert Xu <herbert@gondor.apana.org.au>
Date: Thu Nov 21 11:10:04 2013 -0800
gso: handle new frag_list of frags GRO packets
^ permalink raw reply
* Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available
From: Samudrala, Sridhar @ 2018-03-14 0:36 UTC (permalink / raw)
To: Jiri Pirko
Cc: mst, stephen, davem, netdev, virtio-dev, jesse.brandeburg,
alexander.h.duyck, kubakici
In-Reply-To: <20180312210812.GB17283@nanopsycho>
On 3/12/2018 2:08 PM, Jiri Pirko wrote:
> Mon, Mar 12, 2018 at 09:58:06PM CET, sridhar.samudrala@intel.com wrote:
>>
>> On 3/12/2018 1:12 PM, Jiri Pirko wrote:
>>> Thu, Mar 01, 2018 at 09:08:43PM CET, sridhar.samudrala@intel.com wrote:
>>>> This patch enables virtio_net to switch over to a VF datapath when a VF
>>>> netdev is present with the same MAC address. It allows live migration
>>>> of a VM with a direct attached VF without the need to setup a bond/team
>>>> between a VF and virtio net device in the guest.
>>>>
>>>> The hypervisor needs to enable only one datapath at any time so that
>>>> packets don't get looped back to the VM over the other datapath. When a VF
>>>> is plugged, the virtio datapath link state can be marked as down. The
>>>> hypervisor needs to unplug the VF device from the guest on the source host
>>>> and reset the MAC filter of the VF to initiate failover of datapath to
>>>> virtio before starting the migration. After the migration is completed,
>>>> the destination hypervisor sets the MAC filter on the VF and plugs it back
>>>> to the guest to switch over to VF datapath.
>>>>
>>>> When BACKUP feature is enabled, an additional netdev(bypass netdev) is
>>>> created that acts as a master device and tracks the state of the 2 lower
>>>> netdevs. The original virtio_net netdev is marked as 'backup' netdev and a
>>>> passthru device with the same MAC is registered as 'active' netdev.
>>>>
>>>> This patch is based on the discussion initiated by Jesse on this thread.
>>>> https://marc.info/?l=linux-virtualization&m=151189725224231&w=2
>>>>
>>>> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
>>>> Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
>>>> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
>>> [...]
>>>
>>>
>>>> +static int virtnet_bypass_register_child(struct net_device *child_netdev)
>>>> +{
>>>> + struct virtnet_bypass_info *vbi;
>>>> + struct net_device *dev;
>>>> + bool backup;
>>>> + int ret;
>>>> +
>>>> + if (child_netdev->addr_len != ETH_ALEN)
>>>> + return NOTIFY_DONE;
>>>> +
>>>> + /* We will use the MAC address to locate the virtnet_bypass netdev
>>>> + * to associate with the child netdev. If we don't find a matching
>>>> + * bypass netdev, move on.
>>>> + */
>>>> + dev = get_virtnet_bypass_bymac(dev_net(child_netdev),
>>>> + child_netdev->perm_addr);
>>>> + if (!dev)
>>>> + return NOTIFY_DONE;
>>>> +
>>>> + vbi = netdev_priv(dev);
>>>> + backup = (child_netdev->dev.parent == dev->dev.parent);
>>>> + if (backup ? rtnl_dereference(vbi->backup_netdev) :
>>>> + rtnl_dereference(vbi->active_netdev)) {
>>>> + netdev_info(dev,
>>>> + "%s attempting to join bypass dev when %s already present\n",
>>>> + child_netdev->name, backup ? "backup" : "active");
>>>> + return NOTIFY_DONE;
>>>> + }
>>>> +
>>>> + /* Avoid non pci devices as active netdev */
>>>> + if (!backup && (!child_netdev->dev.parent ||
>>>> + !dev_is_pci(child_netdev->dev.parent)))
>>>> + return NOTIFY_DONE;
>>>> +
>>>> + ret = netdev_rx_handler_register(child_netdev,
>>>> + virtnet_bypass_handle_frame, dev);
>>>> + if (ret != 0) {
>>>> + netdev_err(child_netdev,
>>>> + "can not register bypass receive handler (err = %d)\n",
>>>> + ret);
>>>> + goto rx_handler_failed;
>>>> + }
>>>> +
>>>> + ret = netdev_upper_dev_link(child_netdev, dev, NULL);
>>>> + if (ret != 0) {
>>>> + netdev_err(child_netdev,
>>>> + "can not set master device %s (err = %d)\n",
>>>> + dev->name, ret);
>>>> + goto upper_link_failed;
>>>> + }
>>>> +
>>>> + child_netdev->flags |= IFF_SLAVE;
>>>> +
>>>> + if (netif_running(dev)) {
>>>> + ret = dev_open(child_netdev);
>>>> + if (ret && (ret != -EBUSY)) {
>>>> + netdev_err(dev, "Opening child %s failed ret:%d\n",
>>>> + child_netdev->name, ret);
>>>> + goto err_interface_up;
>>>> + }
>>>> + }
>>> Much of this function is copy of netvsc_vf_join, should be shared with
>>> netvsc.
>> Yes. This is based on netvsc_register_vf() and netvsc_vf_join(). But modified
>> to handle registration of 2 child netdevs(backup virtio and active vf). In case
>> of netvsc, they only register VF. There is no backup netdev.
>> I think trying to make this code shareable with netvsc will introduce additional
>> checks and make it convoluted.
> No problem.
Not clear what you meant here?
Want to confirm that you are agreeing that it is OK to not share this function
with netvsc.
>
>
>>
>>>
>>>> +
>>>> + /* Align MTU of child with master */
>>>> + ret = dev_set_mtu(child_netdev, dev->mtu);
>>>> + if (ret) {
>>>> + netdev_err(dev,
>>>> + "unable to change mtu of %s to %u register failed\n",
>>>> + child_netdev->name, dev->mtu);
>>>> + goto err_set_mtu;
>>>> + }
>>>> +
>>>> + call_netdevice_notifiers(NETDEV_JOIN, child_netdev);
>>>> +
>>>> + netdev_info(dev, "registering %s\n", child_netdev->name);
>>>> +
>>>> + dev_hold(child_netdev);
>>>> + if (backup) {
>>>> + rcu_assign_pointer(vbi->backup_netdev, child_netdev);
>>>> + dev_get_stats(vbi->backup_netdev, &vbi->backup_stats);
>>>> + } else {
>>>> + rcu_assign_pointer(vbi->active_netdev, child_netdev);
>>>> + dev_get_stats(vbi->active_netdev, &vbi->active_stats);
>>>> + dev->min_mtu = child_netdev->min_mtu;
>>>> + dev->max_mtu = child_netdev->max_mtu;
>>>> + }
>>>> +
>>>> + return NOTIFY_OK;
>>>> +
>>>> +err_set_mtu:
>>>> + dev_close(child_netdev);
>>>> +err_interface_up:
>>>> + netdev_upper_dev_unlink(child_netdev, dev);
>>>> + child_netdev->flags &= ~IFF_SLAVE;
>>>> +upper_link_failed:
>>>> + netdev_rx_handler_unregister(child_netdev);
>>>> +rx_handler_failed:
>>>> + return NOTIFY_DONE;
>>>> +}
>>>> +
>>>> +static int virtnet_bypass_unregister_child(struct net_device *child_netdev)
>>>> +{
>>>> + struct virtnet_bypass_info *vbi;
>>>> + struct net_device *dev, *backup;
>>>> +
>>>> + dev = get_virtnet_bypass_byref(child_netdev);
>>>> + if (!dev)
>>>> + return NOTIFY_DONE;
>>>> +
>>>> + vbi = netdev_priv(dev);
>>>> +
>>>> + netdev_info(dev, "unregistering %s\n", child_netdev->name);
>>>> +
>>>> + netdev_rx_handler_unregister(child_netdev);
>>>> + netdev_upper_dev_unlink(child_netdev, dev);
>>>> + child_netdev->flags &= ~IFF_SLAVE;
>>>> +
>>>> + if (child_netdev->dev.parent == dev->dev.parent) {
>>>> + RCU_INIT_POINTER(vbi->backup_netdev, NULL);
>>>> + } else {
>>>> + RCU_INIT_POINTER(vbi->active_netdev, NULL);
>>>> + backup = rtnl_dereference(vbi->backup_netdev);
>>>> + if (backup) {
>>>> + dev->min_mtu = backup->min_mtu;
>>>> + dev->max_mtu = backup->max_mtu;
>>>> + }
>>>> + }
>>>> +
>>>> + dev_put(child_netdev);
>>>> +
>>>> + return NOTIFY_OK;
>>>> +}
>>>> +
>>>> +static int virtnet_bypass_update_link(struct net_device *child_netdev)
>>>> +{
>>>> + struct net_device *dev, *active, *backup;
>>>> + struct virtnet_bypass_info *vbi;
>>>> +
>>>> + dev = get_virtnet_bypass_byref(child_netdev);
>>>> + if (!dev || !netif_running(dev))
>>>> + return NOTIFY_DONE;
>>>> +
>>>> + vbi = netdev_priv(dev);
>>>> +
>>>> + active = rtnl_dereference(vbi->active_netdev);
>>>> + backup = rtnl_dereference(vbi->backup_netdev);
>>>> +
>>>> + if ((active && virtnet_bypass_xmit_ready(active)) ||
>>>> + (backup && virtnet_bypass_xmit_ready(backup))) {
>>>> + netif_carrier_on(dev);
>>>> + netif_tx_wake_all_queues(dev);
>>>> + } else {
>>>> + netif_carrier_off(dev);
>>>> + netif_tx_stop_all_queues(dev);
>>>> + }
>>>> +
>>>> + return NOTIFY_OK;
>>>> +}
>>>> +
>>>> +static int virtnet_bypass_event(struct notifier_block *this,
>>>> + unsigned long event, void *ptr)
>>>> +{
>>>> + struct net_device *event_dev = netdev_notifier_info_to_dev(ptr);
>>>> +
>>>> + /* Skip our own events */
>>>> + if (event_dev->netdev_ops == &virtnet_bypass_netdev_ops)
>>>> + return NOTIFY_DONE;
>>>> +
>>>> + /* Avoid non-Ethernet type devices */
>>>> + if (event_dev->type != ARPHRD_ETHER)
>>>> + return NOTIFY_DONE;
>>>> +
>>>> + /* Avoid Vlan dev with same MAC registering as child dev */
>>>> + if (is_vlan_dev(event_dev))
>>>> + return NOTIFY_DONE;
>>>> +
>>>> + /* Avoid Bonding master dev with same MAC registering as child dev */
>>>> + if ((event_dev->priv_flags & IFF_BONDING) &&
>>>> + (event_dev->flags & IFF_MASTER))
>>>> + return NOTIFY_DONE;
>>>> +
>>>> + switch (event) {
>>>> + case NETDEV_REGISTER:
>>>> + return virtnet_bypass_register_child(event_dev);
>>>> + case NETDEV_UNREGISTER:
>>>> + return virtnet_bypass_unregister_child(event_dev);
>>>> + case NETDEV_UP:
>>>> + case NETDEV_DOWN:
>>>> + case NETDEV_CHANGE:
>>>> + return virtnet_bypass_update_link(event_dev);
>>>> + default:
>>>> + return NOTIFY_DONE;
>>>> + }
>>>> +}
>>> For example this function is 1:1 copy of netvsc, even with comments
>>> and bugs (like IFF_BODING check).
>>>
>>> This is also something that should be shared with netvsc.
>> The problem is with the calls that are made from this function.
>> Sharing would have been possible if both virtio and netvsc went with
>> same netdev model.
> ops? You can still share.
OK. looks like you want to see atleast some code shared between netvsc and virtio_net
even when they are using 2 different netdev models.
Will try to add a new file under net/core and move some functions that can be shared
between netvsc and virtio_net in BACKUP mode.
>
>
>>>
>>>> +
>>>> +static struct notifier_block virtnet_bypass_notifier = {
>>>> + .notifier_call = virtnet_bypass_event,
>>>> +};
>>>> +
>>>> +static int virtnet_bypass_create(struct virtnet_info *vi)
>>>> +{
>>>> + struct net_device *backup_netdev = vi->dev;
>>>> + struct device *dev = &vi->vdev->dev;
>>>> + struct net_device *bypass_netdev;
>>>> + int res;
>>>> +
>>>> + /* Alloc at least 2 queues, for now we are going with 16 assuming
>>>> + * that most devices being bonded won't have too many queues.
>>>> + */
>>>> + bypass_netdev = alloc_etherdev_mq(sizeof(struct virtnet_bypass_info),
>>>> + 16);
>>>> + if (!bypass_netdev) {
>>>> + dev_err(dev, "Unable to allocate bypass_netdev!\n");
>>>> + return -ENOMEM;
>>>> + }
>>>> +
>>>> + dev_net_set(bypass_netdev, dev_net(backup_netdev));
>>>> + SET_NETDEV_DEV(bypass_netdev, dev);
>>>> +
>>>> + bypass_netdev->netdev_ops = &virtnet_bypass_netdev_ops;
>>>> + bypass_netdev->ethtool_ops = &virtnet_bypass_ethtool_ops;
>>>> +
>>>> + /* Initialize the device options */
>>>> + bypass_netdev->flags |= IFF_MASTER;
>>>> + bypass_netdev->priv_flags |= IFF_BONDING | IFF_UNICAST_FLT |
>>> No clue why you set IFF_BONDING here...
>> This is to indicate to the userspace that this is a master bond device.
> This is not a master bond device! If you say this, it makes me really
> worried....
>
Will remove IFF_BONDING in the next revision.
>
>> May be it is sufficient to just set IFF_MASTER.
>>
>>>
>>>
>>>
>>>
^ permalink raw reply
* Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available
From: Michael S. Tsirkin @ 2018-03-14 0:44 UTC (permalink / raw)
To: Samudrala, Sridhar
Cc: Siwei Liu, Stephen Hemminger, David Miller, Netdev, Jiri Pirko,
virtio-dev, Brandeburg, Jesse, Alexander Duyck, Jakub Kicinski
In-Reply-To: <eac018bf-58fe-188d-0dad-f454c9affebb@intel.com>
On Tue, Mar 13, 2018 at 05:28:07PM -0700, Samudrala, Sridhar wrote:
> > I am not sure if it's a good idea to leave the
> > virtio_bypass around if running into failure: the guest is not
> > migratable as the VF doesn't have a backup path,
>
> Are you talking about a failure when registering backup netdev? This should not
> happen, but i guess we can improve error handling in such scenario.
A nice way to do this would be to clear the backup feature bit.
>
> > And perhaps the worse
> > part is that, it now has two interfaces with identical MAC address but
> > one of them is invalid (user cannot use the virtio interface as it has
> > a dampened datapath). IMHO the virtio_bypass and its lower netdev
> > should be destroyed at all when it fails to bind the VF, and
> > technically, there should be some way to propogate the failure status
> > to the hypervisor/backend, indicating that the VM is not migratable
> > because of guest software errors (maybe by clearing out the backup
> > feature from the guest virtio driver so host can see/learn it).
>
> In BACKUP mode, user can only use the upper virtio_bypass netdev and that will
> always be there. Any failure to enslave VF netdev is not fatal, but i will see
> if we can improve the error handling of failure to enslave backup netdev.
> Also, i don't think the BACKUP feature bit is negotiable with the host.
>
> Thanks
> Sridhar
All bits are negotiable. It's up to the host whether to support
a device with this bit clear, or to fail negotiation.
--
MST
^ permalink raw reply
* Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available
From: Stephen Hemminger @ 2018-03-14 0:54 UTC (permalink / raw)
To: Samudrala, Sridhar
Cc: Jiri Pirko, mst, davem, netdev, virtio-dev, jesse.brandeburg,
alexander.h.duyck, kubakici
In-Reply-To: <45ca2aec-6475-06e6-7a38-be6b8aeb362b@intel.com>
On Tue, 13 Mar 2018 17:36:23 -0700
"Samudrala, Sridhar" <sridhar.samudrala@intel.com> wrote:
> OK. looks like you want to see atleast some code shared between netvsc and virtio_net
> even when they are using 2 different netdev models.
> Will try to add a new file under net/core and move some functions that can be shared
> between netvsc and virtio_net in BACKUP mode.
Make it work first (on both) then workout how to share.
That method worked quite well for tunnels and other netdev like objects.
^ permalink raw reply
* Re: [PATCH iproute2/net-next] tc: f_flower: Add support for matching first frag packets
From: David Ahern @ 2018-03-14 1:05 UTC (permalink / raw)
To: Simon Horman, Stephen Hemminger
Cc: Jiri Pirko, Jamal Hadi Salim, Cong Wang, netdev, oss-drivers,
Pieter Jansen van Vuuren
In-Reply-To: <20180309100722.21627-1-simon.horman@netronome.com>
On 3/9/18 2:07 AM, Simon Horman wrote:
> From: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
>
> Add matching support for distinguishing between first and later fragmented
> packets.
>
> # tc filter add dev eth0 protocol ip parent ffff: \
> flower indev eth0 \
> ip_flags firstfrag \
> ip_proto udp \
> action mirred egress redirect dev eth1
>
> # tc filter add dev eth0 protocol ip parent ffff: \
> flower indev eth0 \
> ip_flags nofirstfrag \
> ip_proto udp \
> action mirred egress redirect dev eth1
>
> Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
> Signed-off-by: Simon Horman <simon.horman@netronome.com>
> ---
> man/man8/tc-flower.8 | 8 ++++++--
> tc/f_flower.c | 1 +
> 2 files changed, 7 insertions(+), 2 deletions(-)
applied to iproute2-next.
^ permalink raw reply
* Re: [PATCH iproute2-next 1/2] tc: Add missing documentation for codel and fq_codel parameters
From: David Ahern @ 2018-03-14 1:07 UTC (permalink / raw)
To: Toke Høiland-Jørgensen, netdev
In-Reply-To: <20180308223137.7366-1-toke@toke.dk>
On 3/8/18 2:31 PM, Toke Høiland-Jørgensen wrote:
> Add missing documentation of the memory_limit fq_codel parameter and the
> ce_threshold codel and fq_codel parameters.
>
> Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
> ---
> man/man8/tc-codel.8 | 10 +++++++++-
> man/man8/tc-fq_codel.8 | 18 +++++++++++++++++-
> tc/q_fq_codel.c | 1 +
> 3 files changed, 27 insertions(+), 2 deletions(-)
applied to iproute2-next.
^ permalink raw reply
* Re: [PATCH iproute2-next 2/2] tc: Add JSON output of fq_codel stats
From: David Ahern @ 2018-03-14 1:07 UTC (permalink / raw)
To: Toke Høiland-Jørgensen, netdev
In-Reply-To: <20180308223137.7366-2-toke@toke.dk>
On 3/8/18 2:31 PM, Toke Høiland-Jørgensen wrote:
> Enable proper JSON output support for fq_codel in `tc -s qdisc` output.
>
> Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk>
> ---
> tc/q_fq_codel.c | 49 ++++++++++++++++++++++++++++++++-----------------
> 1 file changed, 32 insertions(+), 17 deletions(-)
>
applied to iproute2-next.
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox