* [PATCH net-next 00/11] Mellanox driver updates 2013-06-20
@ 2013-06-20 19:40 Amir Vadai
2013-06-20 19:40 ` [PATCH net-next 01/11] net/mlx4_en: Suppress page allocation failure warnings Amir Vadai
` (10 more replies)
0 siblings, 11 replies; 25+ messages in thread
From: Amir Vadai @ 2013-06-20 19:40 UTC (permalink / raw)
To: David S. Miller; +Cc: netdev, Or Gerlitz, Amir Vadai
Hi Dave,
Sorry for sending you the patchset twice - I didn't add netdev to the CC's
before.
This patchset contains some small fixes to mlx4_core and mlx4_en.
The patchset was applied and tested on top of commit fedaf4f ("ndisc:
Convert use of typedef ctl_table to struct ctl_table").
Thanks,
Amir
Dotan Barak (4):
net/mlx4_en: Fix resource leak in error flow
net/mlx4_en: Remove an unnecessary test
net/mlx4_core: Replace sscanf() with kstrtoint()
net/mlx4_core: Add warning in case of command timeouts
Eugenia Emantayev (3):
net/mlx4_en: Move register_netdev() to the end of initialization
function
net/mlx4_en: Change log level from error to debug for vlan related
messages
net/mlx4_en: Fix a race between napi poll function and RX ring cleanup
Jack Morgenstein (2):
net/mlx4_en: Do not query stats when device port is down
net/mlx4_core: Fail device init if num_vfs is negative
Yevgeny Petrilin (2):
net/mlx4_en: Suppress page allocation failure warnings
net/mlx4_en: Add prints when TX timeout occurs
drivers/net/ethernet/mellanox/mlx4/cmd.c | 6 +++++
drivers/net/ethernet/mellanox/mlx4/en_dcb_nl.c | 3 ---
drivers/net/ethernet/mellanox/mlx4/en_main.c | 2 +-
drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 35 ++++++++++++++++++--------
drivers/net/ethernet/mellanox/mlx4/en_rx.c | 4 +--
drivers/net/ethernet/mellanox/mlx4/main.c | 11 +++++---
6 files changed, 41 insertions(+), 20 deletions(-)
--
1.8.3.251.g1462b67
^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH net-next 01/11] net/mlx4_en: Suppress page allocation failure warnings
2013-06-20 19:40 [PATCH net-next 00/11] Mellanox driver updates 2013-06-20 Amir Vadai
@ 2013-06-20 19:40 ` Amir Vadai
2013-06-20 21:28 ` Eric Dumazet
2013-06-20 19:40 ` [PATCH net-next 02/11] net/mlx4_en: Fix resource leak in error flow Amir Vadai
` (9 subsequent siblings)
10 siblings, 1 reply; 25+ messages in thread
From: Amir Vadai @ 2013-06-20 19:40 UTC (permalink / raw)
To: David S. Miller
Cc: netdev, Or Gerlitz, Amir Vadai, Yevgeny Petrilin,
Jack Morgenstein
From: Yevgeny Petrilin <yevgenyp@mellanox.com>
When the system is low on resources, these warnings can hang the host.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
---
drivers/net/ethernet/mellanox/mlx4/en_rx.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 9c57581..2b564ac 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -57,8 +57,8 @@ static int mlx4_en_alloc_frags(struct mlx4_en_priv *priv,
for (i = 0; i < priv->num_frags; i++) {
frag_info = &priv->frag_info[i];
if (ring_alloc[i].offset == frag_info->last_offset) {
- page = alloc_pages(GFP_ATOMIC | __GFP_COMP,
- MLX4_EN_ALLOC_ORDER);
+ page = alloc_pages(GFP_ATOMIC | __GFP_COMP |
+ __GFP_NOWARN, MLX4_EN_ALLOC_ORDER);
if (!page)
goto out;
dma = dma_map_page(priv->ddev, page, 0,
--
1.8.3.251.g1462b67
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH net-next 02/11] net/mlx4_en: Fix resource leak in error flow
2013-06-20 19:40 [PATCH net-next 00/11] Mellanox driver updates 2013-06-20 Amir Vadai
2013-06-20 19:40 ` [PATCH net-next 01/11] net/mlx4_en: Suppress page allocation failure warnings Amir Vadai
@ 2013-06-20 19:40 ` Amir Vadai
2013-06-20 19:40 ` [PATCH net-next 03/11] net/mlx4_en: Do not query stats when device port is down Amir Vadai
` (8 subsequent siblings)
10 siblings, 0 replies; 25+ messages in thread
From: Amir Vadai @ 2013-06-20 19:40 UTC (permalink / raw)
To: David S. Miller; +Cc: netdev, Or Gerlitz, Amir Vadai, Dotan Barak
From: Dotan Barak <dotanb@dev.mellanox.com>
The wrong (inverted) condition was used when calling iounmap(), so the mapping leaked in the error flow.
Signed-off-by: Dotan Barak <dotanb@dev.mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
---
drivers/net/ethernet/mellanox/mlx4/en_main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_main.c b/drivers/net/ethernet/mellanox/mlx4/en_main.c
index a5c9df07..a071cda 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_main.c
@@ -310,7 +310,7 @@ static void *mlx4_en_add(struct mlx4_dev *dev)
err_mr:
(void) mlx4_mr_free(dev, &mdev->mr);
err_map:
- if (!mdev->uar_map)
+ if (mdev->uar_map)
iounmap(mdev->uar_map);
err_uar:
mlx4_uar_free(dev, &mdev->priv_uar);
--
1.8.3.251.g1462b67
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH net-next 03/11] net/mlx4_en: Do not query stats when device port is down
2013-06-20 19:40 [PATCH net-next 00/11] Mellanox driver updates 2013-06-20 Amir Vadai
2013-06-20 19:40 ` [PATCH net-next 01/11] net/mlx4_en: Suppress page allocation failure warnings Amir Vadai
2013-06-20 19:40 ` [PATCH net-next 02/11] net/mlx4_en: Fix resource leak in error flow Amir Vadai
@ 2013-06-20 19:40 ` Amir Vadai
2013-06-20 19:40 ` [PATCH net-next 04/11] net/mlx4_en: Move register_netdev() to the end of initialization function Amir Vadai
` (7 subsequent siblings)
10 siblings, 0 replies; 25+ messages in thread
From: Amir Vadai @ 2013-06-20 19:40 UTC (permalink / raw)
To: David S. Miller; +Cc: netdev, Or Gerlitz, Amir Vadai, Jack Morgenstein
From: Jack Morgenstein <jackm@dev.mellanox.com>
There are no counters allocated to the eth device when the port is down, so
this query is meaningless at that time.
It also leads to querying incorrect counters (since the counter_index is not
valid when the device port is down).
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
---
drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index 7299ada..c0b02d7 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -1375,12 +1375,13 @@ static void mlx4_en_do_get_stats(struct work_struct *work)
mutex_lock(&mdev->state_lock);
if (mdev->device_up) {
- err = mlx4_en_DUMP_ETH_STATS(mdev, priv->port, 0);
- if (err)
- en_dbg(HW, priv, "Could not update stats\n");
+ if (priv->port_up) {
+ err = mlx4_en_DUMP_ETH_STATS(mdev, priv->port, 0);
+ if (err)
+ en_dbg(HW, priv, "Could not update stats\n");
- if (priv->port_up)
mlx4_en_auto_moderation(priv);
+ }
queue_delayed_work(mdev->workqueue, &priv->stats_task, STATS_DELAY);
}
--
1.8.3.251.g1462b67
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH net-next 04/11] net/mlx4_en: Move register_netdev() to the end of initialization function
2013-06-20 19:40 [PATCH net-next 00/11] Mellanox driver updates 2013-06-20 Amir Vadai
` (2 preceding siblings ...)
2013-06-20 19:40 ` [PATCH net-next 03/11] net/mlx4_en: Do not query stats when device port is down Amir Vadai
@ 2013-06-20 19:40 ` Amir Vadai
2013-06-20 19:40 ` [PATCH net-next 05/11] net/mlx4_en: Change log level from error to debug for vlan related messages Amir Vadai
` (6 subsequent siblings)
10 siblings, 0 replies; 25+ messages in thread
From: Amir Vadai @ 2013-06-20 19:40 UTC (permalink / raw)
To: David S. Miller; +Cc: netdev, Or Gerlitz, Amir Vadai, Eugenia Emantayev
From: Eugenia Emantayev <eugenia@mellanox.com>
To avoid a race between the open function and everything that happens after
register_netdev(), move it to be the last operation called.
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
---
drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index c0b02d7..1f0f817 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -2323,6 +2323,8 @@ int mlx4_en_init_netdev(struct mlx4_en_dev *mdev, int port,
mdev->pndev[port] = dev;
netif_carrier_off(dev);
+ mlx4_en_set_default_moderation(priv);
+
err = register_netdev(dev);
if (err) {
en_err(priv, "Netdev registration failed for port %d\n", port);
@@ -2354,7 +2356,6 @@ int mlx4_en_init_netdev(struct mlx4_en_dev *mdev, int port,
en_err(priv, "Failed Initializing port\n");
goto out;
}
- mlx4_en_set_default_moderation(priv);
queue_delayed_work(mdev->workqueue, &priv->stats_task, STATS_DELAY);
if (mdev->dev->caps.flags2 & MLX4_DEV_CAP_FLAG2_TS)
--
1.8.3.251.g1462b67
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH net-next 05/11] net/mlx4_en: Change log level from error to debug for vlan related messages
2013-06-20 19:40 [PATCH net-next 00/11] Mellanox driver updates 2013-06-20 Amir Vadai
` (3 preceding siblings ...)
2013-06-20 19:40 ` [PATCH net-next 04/11] net/mlx4_en: Move register_netdev() to the end of initialization function Amir Vadai
@ 2013-06-20 19:40 ` Amir Vadai
2013-06-20 19:40 ` [PATCH net-next 06/11] net/mlx4_en: Fix a race between napi poll function and RX ring cleanup Amir Vadai
` (5 subsequent siblings)
10 siblings, 0 replies; 25+ messages in thread
From: Amir Vadai @ 2013-06-20 19:40 UTC (permalink / raw)
To: David S. Miller
Cc: netdev, Or Gerlitz, Amir Vadai, Eugenia Emantayev, Aviad Yehezkel
From: Eugenia Emantayev <eugenia@mellanox.com>
The port VLAN table size is 126 entries (the table is also used for IBoE), so
once it is full further registrations fail. The user needs to see this only as
a debug print, not as an error.
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Reviewed-by: Yevgeny Petrilin <yevgenyp@mellanox.com>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
---
drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index 1f0f817..f256a73 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -405,7 +405,7 @@ static int mlx4_en_vlan_rx_add_vid(struct net_device *dev,
en_err(priv, "Failed configuring VLAN filter\n");
}
if (mlx4_register_vlan(mdev->dev, priv->port, vid, &idx))
- en_err(priv, "failed adding vlan %d\n", vid);
+ en_dbg(HW, priv, "failed adding vlan %d\n", vid);
mutex_unlock(&mdev->state_lock);
return 0;
@@ -428,7 +428,7 @@ static int mlx4_en_vlan_rx_kill_vid(struct net_device *dev,
if (!mlx4_find_cached_vlan(mdev->dev, priv->port, vid, &idx))
mlx4_unregister_vlan(mdev->dev, priv->port, idx);
else
- en_err(priv, "could not find vid %d in cache\n", vid);
+ en_dbg(HW, priv, "could not find vid %d in cache\n", vid);
if (mdev->device_up && priv->port_up) {
err = mlx4_SET_VLAN_FLTR(mdev->dev, priv);
--
1.8.3.251.g1462b67
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH net-next 06/11] net/mlx4_en: Fix a race between napi poll function and RX ring cleanup
2013-06-20 19:40 [PATCH net-next 00/11] Mellanox driver updates 2013-06-20 Amir Vadai
` (4 preceding siblings ...)
2013-06-20 19:40 ` [PATCH net-next 05/11] net/mlx4_en: Change log level from error to debug for vlan related messages Amir Vadai
@ 2013-06-20 19:40 ` Amir Vadai
2013-06-20 19:40 ` [PATCH net-next 07/11] net/mlx4_en: Add prints when TX timeout occurs Amir Vadai
` (4 subsequent siblings)
10 siblings, 0 replies; 25+ messages in thread
From: Amir Vadai @ 2013-06-20 19:40 UTC (permalink / raw)
To: David S. Miller; +Cc: netdev, Or Gerlitz, Amir Vadai, Eugenia Emantayev
From: Eugenia Emantayev <eugenia@mellanox.com>
The RX rings were cleaned up while RX traffic completion handling was still
possible.
Change the sequence of events so that the port is closed and the QPs are
stopped before the RX cleanup.
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
---
drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index f256a73..f1dcddc 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -1635,6 +1635,9 @@ void mlx4_en_stop_port(struct net_device *dev, int detach)
return;
}
+ /* close port*/
+ mlx4_CLOSE_PORT(mdev->dev, priv->port);
+
/* Synchronize with tx routine */
netif_tx_lock_bh(dev);
if (detach)
@@ -1735,14 +1738,11 @@ void mlx4_en_stop_port(struct net_device *dev, int detach)
}
local_bh_enable();
- mlx4_en_deactivate_rx_ring(priv, &priv->rx_ring[i]);
while (test_bit(NAPI_STATE_SCHED, &cq->napi.state))
msleep(1);
+ mlx4_en_deactivate_rx_ring(priv, &priv->rx_ring[i]);
mlx4_en_deactivate_cq(priv, cq);
}
-
- /* close port*/
- mlx4_CLOSE_PORT(mdev->dev, priv->port);
}
static void mlx4_en_restart(struct work_struct *work)
--
1.8.3.251.g1462b67
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH net-next 07/11] net/mlx4_en: Add prints when TX timeout occurs
2013-06-20 19:40 [PATCH net-next 00/11] Mellanox driver updates 2013-06-20 Amir Vadai
` (5 preceding siblings ...)
2013-06-20 19:40 ` [PATCH net-next 06/11] net/mlx4_en: Fix a race between napi poll function and RX ring cleanup Amir Vadai
@ 2013-06-20 19:40 ` Amir Vadai
2013-06-20 19:55 ` Joe Perches
2013-06-20 19:40 ` [PATCH net-next 08/11] net/mlx4_en: Remove an unnecessary test Amir Vadai
` (3 subsequent siblings)
10 siblings, 1 reply; 25+ messages in thread
From: Amir Vadai @ 2013-06-20 19:40 UTC (permalink / raw)
To: David S. Miller
Cc: netdev, Or Gerlitz, Amir Vadai, Yevgeny Petrilin,
Eugenia Emantayev
From: Yevgeny Petrilin <yevgenyp@mellanox.com>
Add debug prints when a TX timeout is detected.
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.com>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
---
drivers/net/ethernet/mellanox/mlx4/en_netdev.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index f1dcddc..b91d577 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -1236,10 +1236,21 @@ static void mlx4_en_tx_timeout(struct net_device *dev)
{
struct mlx4_en_priv *priv = netdev_priv(dev);
struct mlx4_en_dev *mdev = priv->mdev;
+ int i;
if (netif_msg_timer(priv))
en_warn(priv, "Tx timeout called on port:%d\n", priv->port);
+ for (i = 0; i < priv->tx_ring_num; i++) {
+ if (!netif_tx_queue_stopped(netdev_get_tx_queue(dev, i)))
+ continue;
+ en_info(priv, "TX timeout detected on queue: %d,\n"
+ "QP: 0x%x, CQ: 0x%x,\n"
+ "Cons index: 0x%x, Prod index: 0x%x\n", i,
+ priv->tx_ring[i].qpn, priv->tx_ring[i].cqn,
+ priv->tx_ring[i].cons, priv->tx_ring[i].prod);
+ }
+
priv->port_stats.tx_timeout++;
en_dbg(DRV, priv, "Scheduling watchdog\n");
queue_work(mdev->workqueue, &priv->watchdog_task);
--
1.8.3.251.g1462b67
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH net-next 08/11] net/mlx4_en: Remove an unnecessary test
2013-06-20 19:40 [PATCH net-next 00/11] Mellanox driver updates 2013-06-20 Amir Vadai
` (6 preceding siblings ...)
2013-06-20 19:40 ` [PATCH net-next 07/11] net/mlx4_en: Add prints when TX timeout occurs Amir Vadai
@ 2013-06-20 19:40 ` Amir Vadai
2013-06-20 19:40 ` [PATCH net-next 09/11] net/mlx4_core: Replace sscanf() with kstrtoint() Amir Vadai
` (2 subsequent siblings)
10 siblings, 0 replies; 25+ messages in thread
From: Amir Vadai @ 2013-06-20 19:40 UTC (permalink / raw)
To: David S. Miller; +Cc: netdev, Or Gerlitz, Amir Vadai, Dotan Barak
From: Dotan Barak <dotanb@dev.mellanox.com>
Since this variable is now part of a structure and no longer allocated
dynamically, the NULL test is irrelevant.
Signed-off-by: Dotan Barak <dotanb@dev.mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
---
drivers/net/ethernet/mellanox/mlx4/en_dcb_nl.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_dcb_nl.c b/drivers/net/ethernet/mellanox/mlx4/en_dcb_nl.c
index 0f91222..9d4a1ea 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_dcb_nl.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_dcb_nl.c
@@ -207,9 +207,6 @@ static int mlx4_en_dcbnl_ieee_getmaxrate(struct net_device *dev,
struct mlx4_en_priv *priv = netdev_priv(dev);
int i;
- if (!priv->maxrate)
- return -EINVAL;
-
for (i = 0; i < IEEE_8021QAZ_MAX_TCS; i++)
maxrate->tc_maxrate[i] =
priv->maxrate[i] * MLX4_RATELIMIT_UNITS_IN_KB;
--
1.8.3.251.g1462b67
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH net-next 09/11] net/mlx4_core: Replace sscanf() with kstrtoint()
2013-06-20 19:40 [PATCH net-next 00/11] Mellanox driver updates 2013-06-20 Amir Vadai
` (7 preceding siblings ...)
2013-06-20 19:40 ` [PATCH net-next 08/11] net/mlx4_en: Remove an unnecessary test Amir Vadai
@ 2013-06-20 19:40 ` Amir Vadai
2013-06-20 19:40 ` [PATCH net-next 10/11] net/mlx4_core: Add warning in case of command timeouts Amir Vadai
2013-06-20 19:40 ` [PATCH net-next 11/11] net/mlx4_core: Fail device init if num_vfs is negative Amir Vadai
10 siblings, 0 replies; 25+ messages in thread
From: Amir Vadai @ 2013-06-20 19:40 UTC (permalink / raw)
To: David S. Miller
Cc: netdev, Or Gerlitz, Amir Vadai, Dotan Barak, Vladimir Sokolovsky
From: Dotan Barak <dotanb@dev.mellanox.com>
It is not safe to use sscanf(): it accepts trailing garbage and does not detect overflow, while kstrtoint() rejects both.
Signed-off-by: Dotan Barak <dotanb@dev.mellanox.com>
Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
---
drivers/net/ethernet/mellanox/mlx4/main.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index 2f4a260..81e4529 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -839,11 +839,11 @@ static ssize_t set_port_ib_mtu(struct device *dev,
return -EINVAL;
}
- err = sscanf(buf, "%d", &mtu);
- if (err > 0)
+ err = kstrtoint(buf, 0, &mtu);
+ if (!err)
ibta_mtu = int_to_ibta_mtu(mtu);
- if (err <= 0 || ibta_mtu < 0) {
+ if (err || ibta_mtu < 0) {
mlx4_err(mdev, "%s is invalid IBTA mtu\n", buf);
return -EINVAL;
}
--
1.8.3.251.g1462b67
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH net-next 10/11] net/mlx4_core: Add warning in case of command timeouts
2013-06-20 19:40 [PATCH net-next 00/11] Mellanox driver updates 2013-06-20 Amir Vadai
` (8 preceding siblings ...)
2013-06-20 19:40 ` [PATCH net-next 09/11] net/mlx4_core: Replace sscanf() with kstrtoint() Amir Vadai
@ 2013-06-20 19:40 ` Amir Vadai
2013-06-20 19:40 ` [PATCH net-next 11/11] net/mlx4_core: Fail device init if num_vfs is negative Amir Vadai
10 siblings, 0 replies; 25+ messages in thread
From: Amir Vadai @ 2013-06-20 19:40 UTC (permalink / raw)
To: David S. Miller; +Cc: netdev, Or Gerlitz, Amir Vadai, Dotan Barak
From: Dotan Barak <dotanb@dev.mellanox.com>
Print a warning when a command times out, to help debug future failures.
Signed-off-by: Dotan Barak <dotanb@dev.mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
---
drivers/net/ethernet/mellanox/mlx4/cmd.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlx4/cmd.c b/drivers/net/ethernet/mellanox/mlx4/cmd.c
index ea1e038..df04c82 100644
--- a/drivers/net/ethernet/mellanox/mlx4/cmd.c
+++ b/drivers/net/ethernet/mellanox/mlx4/cmd.c
@@ -257,6 +257,8 @@ static int mlx4_comm_cmd_wait(struct mlx4_dev *dev, u8 op,
if (!wait_for_completion_timeout(&context->done,
msecs_to_jiffies(timeout))) {
+ mlx4_warn(dev, "communication channel command 0x%x timed out\n",
+ op);
err = -EBUSY;
goto out;
}
@@ -486,6 +488,8 @@ static int mlx4_cmd_poll(struct mlx4_dev *dev, u64 in_param, u64 *out_param,
}
if (cmd_pending(dev)) {
+ mlx4_warn(dev, "command 0x%x timed out (go bit not cleared)\n",
+ op);
err = -ETIMEDOUT;
goto out;
}
@@ -549,6 +553,8 @@ static int mlx4_cmd_wait(struct mlx4_dev *dev, u64 in_param, u64 *out_param,
if (!wait_for_completion_timeout(&context->done,
msecs_to_jiffies(timeout))) {
+ mlx4_warn(dev, "command 0x%x timed out (go bit not cleared)\n",
+ op);
err = -EBUSY;
goto out;
}
--
1.8.3.251.g1462b67
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [PATCH net-next 11/11] net/mlx4_core: Fail device init if num_vfs is negative
2013-06-20 19:40 [PATCH net-next 00/11] Mellanox driver updates 2013-06-20 Amir Vadai
` (9 preceding siblings ...)
2013-06-20 19:40 ` [PATCH net-next 10/11] net/mlx4_core: Add warning in case of command timeouts Amir Vadai
@ 2013-06-20 19:40 ` Amir Vadai
10 siblings, 0 replies; 25+ messages in thread
From: Amir Vadai @ 2013-06-20 19:40 UTC (permalink / raw)
To: David S. Miller
Cc: netdev, Or Gerlitz, Amir Vadai, Jack Morgenstein,
Vladimir Sokolovsky
From: Jack Morgenstein <jackm@dev.mellanox.com>
Do not allow a negative num_vfs.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.com>
Signed-off-by: Vladimir Sokolovsky <vlad@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
---
drivers/net/ethernet/mellanox/mlx4/main.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index 81e4529..56160a2 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -2077,6 +2077,11 @@ static int __mlx4_init_one(struct pci_dev *pdev, int pci_dev_data)
num_vfs, MLX4_MAX_NUM_VF);
return -EINVAL;
}
+
+ if (num_vfs < 0) {
+ pr_err("num_vfs module parameter cannot be negative\n");
+ return -EINVAL;
+ }
/*
* Check for BARs.
*/
--
1.8.3.251.g1462b67
^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [PATCH net-next 07/11] net/mlx4_en: Add prints when TX timeout occurs
2013-06-20 19:40 ` [PATCH net-next 07/11] net/mlx4_en: Add prints when TX timeout occurs Amir Vadai
@ 2013-06-20 19:55 ` Joe Perches
2013-06-21 5:31 ` Amir Vadai
0 siblings, 1 reply; 25+ messages in thread
From: Joe Perches @ 2013-06-20 19:55 UTC (permalink / raw)
To: Amir Vadai
Cc: David S. Miller, netdev, Or Gerlitz, Yevgeny Petrilin,
Eugenia Emantayev
On Thu, 2013-06-20 at 22:40 +0300, Amir Vadai wrote:
> From: Yevgeny Petrilin <yevgenyp@mellanox.com>
>
> Debug prints when a TX timeout is detected
[]
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
[]
> @@ -1236,10 +1236,21 @@ static void mlx4_en_tx_timeout(struct net_device *dev)
[]
> + for (i = 0; i < priv->tx_ring_num; i++) {
> + if (!netif_tx_queue_stopped(netdev_get_tx_queue(dev, i)))
> + continue;
> + en_info(priv, "TX timeout detected on queue: %d,\n"
> + "QP: 0x%x, CQ: 0x%x,\n"
> + "Cons index: 0x%x, Prod index: 0x%x\n", i,
Not at dbg level and probably better on a single line.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH net-next 01/11] net/mlx4_en: Suppress page allocation failure warnings
2013-06-20 19:40 ` [PATCH net-next 01/11] net/mlx4_en: Suppress page allocation failure warnings Amir Vadai
@ 2013-06-20 21:28 ` Eric Dumazet
2013-06-21 5:35 ` Amir Vadai
2013-06-23 8:46 ` Amir Vadai
0 siblings, 2 replies; 25+ messages in thread
From: Eric Dumazet @ 2013-06-20 21:28 UTC (permalink / raw)
To: Amir Vadai
Cc: David S. Miller, netdev, Or Gerlitz, Yevgeny Petrilin,
Jack Morgenstein
On Thu, 2013-06-20 at 22:40 +0300, Amir Vadai wrote:
> From: Yevgeny Petrilin <yevgenyp@mellanox.com>
>
> When system is low on resources, those warnings hang the host.
>
> Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.com>
> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.com>
> Signed-off-by: Amir Vadai <amirv@mellanox.com>
> ---
> drivers/net/ethernet/mellanox/mlx4/en_rx.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> index 9c57581..2b564ac 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> @@ -57,8 +57,8 @@ static int mlx4_en_alloc_frags(struct mlx4_en_priv *priv,
> for (i = 0; i < priv->num_frags; i++) {
> frag_info = &priv->frag_info[i];
> if (ring_alloc[i].offset == frag_info->last_offset) {
> - page = alloc_pages(GFP_ATOMIC | __GFP_COMP,
> - MLX4_EN_ALLOC_ORDER);
> + page = alloc_pages(GFP_ATOMIC | __GFP_COMP |
> + __GFP_NOWARN, MLX4_EN_ALLOC_ORDER);
> if (!page)
> goto out;
> dma = dma_map_page(priv->ddev, page, 0,
That's IMHO a lazy patch...
What about mlx4_en_init_allocator() ?
I think I did a patch doing fallback to order-1 and order-0 allocations
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH net-next 07/11] net/mlx4_en: Add prints when TX timeout occurs
2013-06-20 19:55 ` Joe Perches
@ 2013-06-21 5:31 ` Amir Vadai
0 siblings, 0 replies; 25+ messages in thread
From: Amir Vadai @ 2013-06-21 5:31 UTC (permalink / raw)
To: Joe Perches
Cc: Amir Vadai, David S. Miller, netdev, Or Gerlitz, Yevgeny Petrilin,
Eugenia Emantayev
On 20/06/2013 22:55, Joe Perches wrote:
> On Thu, 2013-06-20 at 22:40 +0300, Amir Vadai wrote:
>> From: Yevgeny Petrilin <yevgenyp@mellanox.com>
>>
>> Debug prints when a TX timeout is detected
> []
>> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> []
>> @@ -1236,10 +1236,21 @@ static void mlx4_en_tx_timeout(struct net_device *dev)
> []
>> + for (i = 0; i < priv->tx_ring_num; i++) {
>> + if (!netif_tx_queue_stopped(netdev_get_tx_queue(dev, i)))
>> + continue;
>> + en_info(priv, "TX timeout detected on queue: %d,\n"
>> + "QP: 0x%x, CQ: 0x%x,\n"
>> + "Cons index: 0x%x, Prod index: 0x%x\n", i,
>
> Not at dbg level and probably better on a single line.
>
Will be fixed in v1
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH net-next 01/11] net/mlx4_en: Suppress page allocation failure warnings
2013-06-20 21:28 ` Eric Dumazet
@ 2013-06-21 5:35 ` Amir Vadai
2013-06-23 8:46 ` Amir Vadai
1 sibling, 0 replies; 25+ messages in thread
From: Amir Vadai @ 2013-06-21 5:35 UTC (permalink / raw)
To: Eric Dumazet
Cc: Amir Vadai, David S. Miller, netdev, Or Gerlitz, Yevgeny Petrilin,
Jack Morgenstein
On 21/06/2013 00:28, Eric Dumazet wrote:
> On Thu, 2013-06-20 at 22:40 +0300, Amir Vadai wrote:
>> From: Yevgeny Petrilin <yevgenyp@mellanox.com>
>>
>> When system is low on resources, those warnings hang the host.
>>
>> Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.com>
>> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.com>
>> Signed-off-by: Amir Vadai <amirv@mellanox.com>
>> ---
>> drivers/net/ethernet/mellanox/mlx4/en_rx.c | 4 ++--
>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
>> index 9c57581..2b564ac 100644
>> --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
>> +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
>> @@ -57,8 +57,8 @@ static int mlx4_en_alloc_frags(struct mlx4_en_priv *priv,
>> for (i = 0; i < priv->num_frags; i++) {
>> frag_info = &priv->frag_info[i];
>> if (ring_alloc[i].offset == frag_info->last_offset) {
>> - page = alloc_pages(GFP_ATOMIC | __GFP_COMP,
>> - MLX4_EN_ALLOC_ORDER);
>> + page = alloc_pages(GFP_ATOMIC | __GFP_COMP |
>> + __GFP_NOWARN, MLX4_EN_ALLOC_ORDER);
>> if (!page)
>> goto out;
>> dma = dma_map_page(priv->ddev, page, 0,
>
>
> Thats IMHO a lazy patch...
>
> What about mlx4_en_init_allocator() ?
>
> I think I did a patch doing fallback to order-1 and order-0 allocations
>
I will go over the patch, and fix it for v1.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH net-next 01/11] net/mlx4_en: Suppress page allocation failure warnings
2013-06-20 21:28 ` Eric Dumazet
2013-06-21 5:35 ` Amir Vadai
@ 2013-06-23 8:46 ` Amir Vadai
2013-06-23 15:14 ` Eric Dumazet
2013-06-23 15:17 ` [PATCH net-next] mlx4: allow order-0 memory allocations in RX path Eric Dumazet
1 sibling, 2 replies; 25+ messages in thread
From: Amir Vadai @ 2013-06-23 8:46 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S. Miller, netdev, Or Gerlitz, Yevgeny Petrilin,
Jack Morgenstein
On 21/06/2013 00:28, Eric Dumazet wrote:
> On Thu, 2013-06-20 at 22:40 +0300, Amir Vadai wrote:
>> From: Yevgeny Petrilin <yevgenyp@mellanox.com>
>>
>> When system is low on resources, those warnings hang the host.
>>
>> Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.com>
>> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.com>
>> Signed-off-by: Amir Vadai <amirv@mellanox.com>
>> ---
>> drivers/net/ethernet/mellanox/mlx4/en_rx.c | 4 ++--
>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
>> index 9c57581..2b564ac 100644
>> --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
>> +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
>> @@ -57,8 +57,8 @@ static int mlx4_en_alloc_frags(struct mlx4_en_priv *priv,
>> for (i = 0; i < priv->num_frags; i++) {
>> frag_info = &priv->frag_info[i];
>> if (ring_alloc[i].offset == frag_info->last_offset) {
>> - page = alloc_pages(GFP_ATOMIC | __GFP_COMP,
>> - MLX4_EN_ALLOC_ORDER);
>> + page = alloc_pages(GFP_ATOMIC | __GFP_COMP |
>> + __GFP_NOWARN, MLX4_EN_ALLOC_ORDER);
>> if (!page)
>> goto out;
>> dma = dma_map_page(priv->ddev, page, 0,
>
>
> Thats IMHO a lazy patch...
>
> What about mlx4_en_init_allocator() ?
mlx4_en_init_allocator() is called only on driver initialization - I
don't care if it warns when there is no memory.
But mlx4_en_alloc_frags() is on the data path, and many warnings there
when the system is already stressed and out of memory is bad.
Besides the warnings, this error flow is handled, and
mlx4_en_fill_rx_buffers() will handle the ENOMEM.
>
> I think I did a patch doing fallback to order-1 and order-0 allocations
Current code has, as I said above, a fallback to reduce the RX ring size
when memory is stressed. Do you suggest using smaller fragments instead
(or in addition)?
Can you send me a link to the patch?
Thanks,
Amir
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [PATCH net-next 01/11] net/mlx4_en: Suppress page allocation failure warnings
2013-06-23 8:46 ` Amir Vadai
@ 2013-06-23 15:14 ` Eric Dumazet
2013-06-23 15:17 ` [PATCH net-next] mlx4: allow order-0 memory allocations in RX path Eric Dumazet
1 sibling, 0 replies; 25+ messages in thread
From: Eric Dumazet @ 2013-06-23 15:14 UTC (permalink / raw)
To: Amir Vadai
Cc: David S. Miller, netdev, Or Gerlitz, Yevgeny Petrilin,
Jack Morgenstein
On Sun, 2013-06-23 at 11:46 +0300, Amir Vadai wrote:
> But mlx4_en_alloc_frags() is on the data path, and many warnings there
> when the system is already stressed and without memory is bad.
> Besides the warnings, this error flow is handled, and
> mlx4_en_en_fill_rx_buffers() will handle the ENOMEM.
>
> >
> > I think I did a patch doing fallback to order-1 and order-0 allocations
> Current code has, as I said above, a fallback to reduce the rx ring size
> when memory is stressed.
Well, it should not do that, and instead use GFP_KERNEL.
> Do you suggest to use smaller fragments instead
> (or in addition)?
> Can you send me a link to the patch?
>
Sure, I'll respin it.
^ permalink raw reply [flat|nested] 25+ messages in thread
* [PATCH net-next] mlx4: allow order-0 memory allocations in RX path
2013-06-23 8:46 ` Amir Vadai
2013-06-23 15:14 ` Eric Dumazet
@ 2013-06-23 15:17 ` Eric Dumazet
2013-06-23 20:17 ` Or Gerlitz
` (2 more replies)
1 sibling, 3 replies; 25+ messages in thread
From: Eric Dumazet @ 2013-06-23 15:17 UTC (permalink / raw)
To: Amir Vadai; +Cc: David S. Miller, netdev, Or Gerlitz, Yevgeny Petrilin
Signed-off-by: Eric Dumazet <edumazet@google.com>
mlx4 exclusively uses order-2 allocations in RX path, which are
likely to fail under memory pressure.
We therefore drop more frames than needed.
This patch tries order-3, order-2, order-1 and finally order-0
allocations to keep good performance, yet allow allocations if/when
memory gets fragmented.
By using larger pages, and avoiding unnecessary get_page()/put_page()
on compound pages, this patch improves performance as well, lowering
false sharing on struct page.
Also use GFP_KERNEL allocations in initialization path, as allocating 12
MB (390 order-3 pages) can easily fail with GFP_ATOMIC.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Amir Vadai <amirv@mellanox.com>
---
drivers/net/ethernet/mellanox/mlx4/en_rx.c | 169 ++++++++---------
drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 12 -
2 files changed, 95 insertions(+), 86 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 9c57581..76997b9 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -43,40 +43,64 @@
#include "mlx4_en.h"
+static int mlx4_alloc_pages(struct mlx4_en_priv *priv,
+ struct mlx4_en_rx_alloc *page_alloc,
+ const struct mlx4_en_frag_info *frag_info,
+ gfp_t _gfp)
+{
+ int order;
+ struct page *page;
+ dma_addr_t dma;
+
+ for (order = MLX4_EN_ALLOC_PREFER_ORDER; ;) {
+ gfp_t gfp = _gfp;
+
+ if (order)
+ gfp |= __GFP_COMP | __GFP_NOWARN;
+ page = alloc_pages(gfp, order);
+ if (likely(page))
+ break;
+ if (--order < 0 ||
+ ((PAGE_SIZE << order) < frag_info->frag_size))
+ return -ENOMEM;
+ }
+ dma = dma_map_page(priv->ddev, page, 0, PAGE_SIZE << order,
+ PCI_DMA_FROMDEVICE);
+ if (dma_mapping_error(priv->ddev, dma)) {
+ put_page(page);
+ return -ENOMEM;
+ }
+ page_alloc->size = PAGE_SIZE << order;
+ page_alloc->page = page;
+ page_alloc->dma = dma;
+ page_alloc->offset = frag_info->frag_align;
+ /* Not doing get_page() for each frag is a big win
+ * on asymetric workloads.
+ */
+ atomic_set(&page->_count, page_alloc->size / frag_info->frag_stride);
+ return 0;
+}
+
static int mlx4_en_alloc_frags(struct mlx4_en_priv *priv,
struct mlx4_en_rx_desc *rx_desc,
struct mlx4_en_rx_alloc *frags,
- struct mlx4_en_rx_alloc *ring_alloc)
+ struct mlx4_en_rx_alloc *ring_alloc,
+ gfp_t gfp)
{
struct mlx4_en_rx_alloc page_alloc[MLX4_EN_MAX_RX_FRAGS];
- struct mlx4_en_frag_info *frag_info;
+ const struct mlx4_en_frag_info *frag_info;
struct page *page;
dma_addr_t dma;
int i;
for (i = 0; i < priv->num_frags; i++) {
frag_info = &priv->frag_info[i];
- if (ring_alloc[i].offset == frag_info->last_offset) {
- page = alloc_pages(GFP_ATOMIC | __GFP_COMP,
- MLX4_EN_ALLOC_ORDER);
- if (!page)
- goto out;
- dma = dma_map_page(priv->ddev, page, 0,
- MLX4_EN_ALLOC_SIZE, PCI_DMA_FROMDEVICE);
- if (dma_mapping_error(priv->ddev, dma)) {
- put_page(page);
- goto out;
- }
- page_alloc[i].page = page;
- page_alloc[i].dma = dma;
- page_alloc[i].offset = frag_info->frag_align;
- } else {
- page_alloc[i].page = ring_alloc[i].page;
- get_page(ring_alloc[i].page);
- page_alloc[i].dma = ring_alloc[i].dma;
- page_alloc[i].offset = ring_alloc[i].offset +
- frag_info->frag_stride;
- }
+ page_alloc[i] = ring_alloc[i];
+ page_alloc[i].offset += frag_info->frag_stride;
+ if (page_alloc[i].offset + frag_info->frag_stride <= ring_alloc[i].size)
+ continue;
+ if (mlx4_alloc_pages(priv, &page_alloc[i], frag_info, gfp))
+ goto out;
}
for (i = 0; i < priv->num_frags; i++) {
@@ -88,14 +112,16 @@ static int mlx4_en_alloc_frags(struct mlx4_en_priv *priv,
return 0;
-
out:
while (i--) {
frag_info = &priv->frag_info[i];
- if (ring_alloc[i].offset == frag_info->last_offset)
+ if (page_alloc[i].page != ring_alloc[i].page) {
dma_unmap_page(priv->ddev, page_alloc[i].dma,
- MLX4_EN_ALLOC_SIZE, PCI_DMA_FROMDEVICE);
- put_page(page_alloc[i].page);
+ page_alloc[i].size, PCI_DMA_FROMDEVICE);
+ page = page_alloc[i].page;
+ atomic_set(&page->_count, 1);
+ put_page(page);
+ }
}
return -ENOMEM;
}
@@ -104,12 +130,12 @@ static void mlx4_en_free_frag(struct mlx4_en_priv *priv,
struct mlx4_en_rx_alloc *frags,
int i)
{
- struct mlx4_en_frag_info *frag_info = &priv->frag_info[i];
+ const struct mlx4_en_frag_info *frag_info = &priv->frag_info[i];
- if (frags[i].offset == frag_info->last_offset) {
- dma_unmap_page(priv->ddev, frags[i].dma, MLX4_EN_ALLOC_SIZE,
+ if (frags[i].offset + frag_info->frag_stride > frags[i].size)
+ dma_unmap_page(priv->ddev, frags[i].dma, frags[i].size,
PCI_DMA_FROMDEVICE);
- }
+
if (frags[i].page)
put_page(frags[i].page);
}
@@ -117,35 +143,28 @@ static void mlx4_en_free_frag(struct mlx4_en_priv *priv,
static int mlx4_en_init_allocator(struct mlx4_en_priv *priv,
struct mlx4_en_rx_ring *ring)
{
- struct mlx4_en_rx_alloc *page_alloc;
int i;
+ struct mlx4_en_rx_alloc *page_alloc;
for (i = 0; i < priv->num_frags; i++) {
- page_alloc = &ring->page_alloc[i];
- page_alloc->page = alloc_pages(GFP_ATOMIC | __GFP_COMP,
- MLX4_EN_ALLOC_ORDER);
- if (!page_alloc->page)
- goto out;
+ const struct mlx4_en_frag_info *frag_info = &priv->frag_info[i];
- page_alloc->dma = dma_map_page(priv->ddev, page_alloc->page, 0,
- MLX4_EN_ALLOC_SIZE, PCI_DMA_FROMDEVICE);
- if (dma_mapping_error(priv->ddev, page_alloc->dma)) {
- put_page(page_alloc->page);
- page_alloc->page = NULL;
+ if (mlx4_alloc_pages(priv, &ring->page_alloc[i],
+ frag_info, GFP_KERNEL))
goto out;
- }
- page_alloc->offset = priv->frag_info[i].frag_align;
- en_dbg(DRV, priv, "Initialized allocator:%d with page:%p\n",
- i, page_alloc->page);
}
return 0;
out:
while (i--) {
+ struct page *page;
+
page_alloc = &ring->page_alloc[i];
dma_unmap_page(priv->ddev, page_alloc->dma,
- MLX4_EN_ALLOC_SIZE, PCI_DMA_FROMDEVICE);
- put_page(page_alloc->page);
+ page_alloc->size, PCI_DMA_FROMDEVICE);
+ page = page_alloc->page;
+ atomic_set(&page->_count, 1);
+ put_page(page);
page_alloc->page = NULL;
}
return -ENOMEM;
@@ -158,13 +177,18 @@ static void mlx4_en_destroy_allocator(struct mlx4_en_priv *priv,
int i;
for (i = 0; i < priv->num_frags; i++) {
+ const struct mlx4_en_frag_info *frag_info = &priv->frag_info[i];
+
page_alloc = &ring->page_alloc[i];
en_dbg(DRV, priv, "Freeing allocator:%d count:%d\n",
i, page_count(page_alloc->page));
dma_unmap_page(priv->ddev, page_alloc->dma,
- MLX4_EN_ALLOC_SIZE, PCI_DMA_FROMDEVICE);
- put_page(page_alloc->page);
+ page_alloc->size, PCI_DMA_FROMDEVICE);
+ while (page_alloc->offset + frag_info->frag_stride < page_alloc->size) {
+ put_page(page_alloc->page);
+ page_alloc->offset += frag_info->frag_stride;
+ }
page_alloc->page = NULL;
}
}
@@ -195,13 +219,14 @@ static void mlx4_en_init_rx_desc(struct mlx4_en_priv *priv,
}
static int mlx4_en_prepare_rx_desc(struct mlx4_en_priv *priv,
- struct mlx4_en_rx_ring *ring, int index)
+ struct mlx4_en_rx_ring *ring, int index,
+ gfp_t gfp)
{
struct mlx4_en_rx_desc *rx_desc = ring->buf + (index * ring->stride);
struct mlx4_en_rx_alloc *frags = ring->rx_info +
(index << priv->log_rx_info);
- return mlx4_en_alloc_frags(priv, rx_desc, frags, ring->page_alloc);
+ return mlx4_en_alloc_frags(priv, rx_desc, frags, ring->page_alloc, gfp);
}
static inline void mlx4_en_update_rx_prod_db(struct mlx4_en_rx_ring *ring)
@@ -235,7 +260,8 @@ static int mlx4_en_fill_rx_buffers(struct mlx4_en_priv *priv)
ring = &priv->rx_ring[ring_ind];
if (mlx4_en_prepare_rx_desc(priv, ring,
- ring->actual_size)) {
+ ring->actual_size,
+ GFP_KERNEL)) {
if (ring->actual_size < MLX4_EN_MIN_RX_SIZE) {
en_err(priv, "Failed to allocate "
"enough rx buffers\n");
@@ -450,11 +476,11 @@ static int mlx4_en_complete_rx_desc(struct mlx4_en_priv *priv,
DMA_FROM_DEVICE);
/* Save page reference in skb */
- get_page(frags[nr].page);
__skb_frag_set_page(&skb_frags_rx[nr], frags[nr].page);
skb_frag_size_set(&skb_frags_rx[nr], frag_info->frag_size);
skb_frags_rx[nr].page_offset = frags[nr].offset;
skb->truesize += frag_info->frag_stride;
+ frags[nr].page = NULL;
}
/* Adjust size of last fragment to match actual length */
if (nr > 0)
@@ -547,7 +573,7 @@ static void mlx4_en_refill_rx_buffers(struct mlx4_en_priv *priv,
int index = ring->prod & ring->size_mask;
while ((u32) (ring->prod - ring->cons) < ring->actual_size) {
- if (mlx4_en_prepare_rx_desc(priv, ring, index))
+ if (mlx4_en_prepare_rx_desc(priv, ring, index, GFP_ATOMIC))
break;
ring->prod++;
index = ring->prod & ring->size_mask;
@@ -805,21 +831,7 @@ int mlx4_en_poll_rx_cq(struct napi_struct *napi, int budget)
return done;
}
-
-/* Calculate the last offset position that accommodates a full fragment
- * (assuming fagment size = stride-align) */
-static int mlx4_en_last_alloc_offset(struct mlx4_en_priv *priv, u16 stride, u16 align)
-{
- u16 res = MLX4_EN_ALLOC_SIZE % stride;
- u16 offset = MLX4_EN_ALLOC_SIZE - stride - res + align;
-
- en_dbg(DRV, priv, "Calculated last offset for stride:%d align:%d "
- "res:%d offset:%d\n", stride, align, res, offset);
- return offset;
-}
-
-
-static int frag_sizes[] = {
+static const int frag_sizes[] = {
FRAG_SZ0,
FRAG_SZ1,
FRAG_SZ2,
@@ -847,9 +859,6 @@ void mlx4_en_calc_rx_buf(struct net_device *dev)
priv->frag_info[i].frag_stride =
ALIGN(frag_sizes[i], SMP_CACHE_BYTES);
}
- priv->frag_info[i].last_offset = mlx4_en_last_alloc_offset(
- priv, priv->frag_info[i].frag_stride,
- priv->frag_info[i].frag_align);
buf_size += priv->frag_info[i].frag_size;
i++;
}
@@ -861,13 +870,13 @@ void mlx4_en_calc_rx_buf(struct net_device *dev)
en_dbg(DRV, priv, "Rx buffer scatter-list (effective-mtu:%d "
"num_frags:%d):\n", eff_mtu, priv->num_frags);
for (i = 0; i < priv->num_frags; i++) {
- en_dbg(DRV, priv, " frag:%d - size:%d prefix:%d align:%d "
- "stride:%d last_offset:%d\n", i,
- priv->frag_info[i].frag_size,
- priv->frag_info[i].frag_prefix_size,
- priv->frag_info[i].frag_align,
- priv->frag_info[i].frag_stride,
- priv->frag_info[i].last_offset);
+ en_err(priv,
+ " frag:%d - size:%d prefix:%d align:%d stride:%d\n",
+ i,
+ priv->frag_info[i].frag_size,
+ priv->frag_info[i].frag_prefix_size,
+ priv->frag_info[i].frag_align,
+ priv->frag_info[i].frag_stride);
}
}
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index 57192a8..35fb60e 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -96,7 +96,8 @@
/* Use the maximum between 16384 and a single page */
#define MLX4_EN_ALLOC_SIZE PAGE_ALIGN(16384)
-#define MLX4_EN_ALLOC_ORDER get_order(MLX4_EN_ALLOC_SIZE)
+
+#define MLX4_EN_ALLOC_PREFER_ORDER PAGE_ALLOC_COSTLY_ORDER
/* Receive fragment sizes; we use at most 3 fragments (for 9600 byte MTU
* and 4K allocations) */
@@ -234,9 +235,10 @@ struct mlx4_en_tx_desc {
#define MLX4_EN_CX3_HIGH_ID 0x1005
struct mlx4_en_rx_alloc {
- struct page *page;
- dma_addr_t dma;
- u16 offset;
+ struct page *page;
+ dma_addr_t dma;
+ u32 offset;
+ u32 size;
};
struct mlx4_en_tx_ring {
@@ -439,8 +441,6 @@ struct mlx4_en_frag_info {
u16 frag_prefix_size;
u16 frag_stride;
u16 frag_align;
- u16 last_offset;
-
};
#ifdef CONFIG_MLX4_EN_DCB
* Re: [PATCH net-next] mlx4: allow order-0 memory allocations in RX path
2013-06-23 15:17 ` [PATCH net-next] mlx4: allow order-0 memory allocations in RX path Eric Dumazet
@ 2013-06-23 20:17 ` Or Gerlitz
2013-06-23 21:13 ` Eric Dumazet
2013-06-24 14:09 ` Or Gerlitz
2013-06-25 8:53 ` Or Gerlitz
2 siblings, 1 reply; 25+ messages in thread
From: Or Gerlitz @ 2013-06-23 20:17 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S. Miller, netdev, Or Gerlitz, Eugenia Emantayev,
Saeed Mahameed
On Sun, Jun 23, 2013 at 6:17 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Signed-off-by: Eric Dumazet <edumazet@google.com>
>
> mlx4 exclusively uses order-2 allocations in RX path, which are
> likely to fail under memory pressure.
>
> We therefore drop more frames than needed.
>
> This patch tries order-3, order-2, order-1 and finally order-0
> allocations to keep good performance, yet allow allocations if/when
> memory gets fragmented.
>
> By using larger pages, and avoiding unnecessary get_page()/put_page()
> on compound pages, this patch improves performance as well, lowering
> false sharing on struct page.
Hi Eric, thanks for the patch. Both Amir and Yevgeny are OOO, so it
will take us a bit more time to conduct the review... but let's start:
could you explain a little further what exactly you refer to by
"false sharing" in this context?
Also, I am not fully sure, but I think the current driver code doesn't
support splice, and this somehow relates to how RX skbs are spread over
pages. In that respect, I wonder whether this patch goes in a direction
that would allow supporting splice, or maybe takes us a step back,
given the move to order-3 allocations?
You've mentioned a performance improvement - could you be more specific?
What was the scheme under which you saw the improvement, and what was
the improvement?
Last, as Amir wrote you, we're looking into re-using skbs on the RX
path to avoid severe performance hits when IOMMU is enabled. The team
has not provided me the patch yet, but basically, if you look at the
ixgbe patch that was made largely for that very same purpose
(improving perf under IOMMU), f800326dca7bc158f4c886aa92f222de37993c80
"ixgbe: Replace standard receive path with a page based receive",
it uses order-0 or order-1 allocations there, but not order-2 or
order-3. Here too I have some more catching up to do, so we'll
see...
Or.
* Re: [PATCH net-next] mlx4: allow order-0 memory allocations in RX path
2013-06-23 20:17 ` Or Gerlitz
@ 2013-06-23 21:13 ` Eric Dumazet
2013-06-24 14:10 ` Or Gerlitz
0 siblings, 1 reply; 25+ messages in thread
From: Eric Dumazet @ 2013-06-23 21:13 UTC (permalink / raw)
To: Or Gerlitz
Cc: David S. Miller, netdev, Or Gerlitz, Eugenia Emantayev,
Saeed Mahameed
On Sun, 2013-06-23 at 23:17 +0300, Or Gerlitz wrote:
> On Sun, Jun 23, 2013 at 6:17 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
> >
> > mlx4 exclusively uses order-2 allocations in RX path, which are
> > likely to fail under memory pressure.
> >
> > We therefore drop more frames than needed.
> >
> > This patch tries order-3, order-2, order-1 and finally order-0
> > allocations to keep good performance, yet allow allocations if/when
> > memory gets fragmented.
> >
> > By using larger pages, and avoiding unnecessary get_page()/put_page()
> > on compound pages, this patch improves performance as well, lowering
> > false sharing on struct page.
>
> Hi Eric, thanks for the patch. Both Amir and Yevgeny are OOO, so it
> will take us a bit more time to conduct the review... but let's start:
> could you explain a little further what exactly you refer to by
> "false sharing" in this context?
Every time mlx4 prepared a page frag into a skb, it did :
- a get_page() in mlx4_en_alloc_frags()
- a get_page() in mlx4_en_complete_rx_desc()
- a put_page() in mlx4_en_free_frag()
-> lot of changes of page->_count
When this skb is consumed, frag is freed -> put_page()
-> decrement of page->_count
If the consumer is on a different cpu, this adds false sharing on
"struct page"
After my patch, mlx4 driver touches this "struct page" only once,
and the consumers will do their get_page() without being slowed down by
mlx4 driver/cpu. This reduces latencies.
>
> Also, I am not fully sure, but I think the current driver code doesn't
> support splice, and this somehow relates to how RX skbs are spread over
> pages. In that respect, I wonder whether this patch goes in a direction
> that would allow supporting splice, or maybe takes us a step back,
> given the move to order-3 allocations?
splice is supported by core networking, no worries ;)
It doesn't depend on order-whatever allocations.
BTW, splice() works well for TCP over loopback, and TX already uses
fragments in order-3 pages.
>
> You've mentioned a performance improvement - could you be more specific?
> What was the scheme under which you saw the improvement, and what was
> the improvement?
A cpu might be fully dedicated to softirq handling, with skbs consumed on
other cpus.
My patch removes ~60 atomic operations per allocated page
(21 frags, and for each frag, two get_page() and one put_page())
>
> Last, as Amir wrote you, we're looking into re-using skbs on the RX
> path to avoid severe performance hits when IOMMU is enabled. The team
> has not provided me the patch yet, but basically, if you look at the
> ixgbe patch that was made largely for that very same purpose
> (improving perf under IOMMU), f800326dca7bc158f4c886aa92f222de37993c80
> "ixgbe: Replace standard receive path with a page based receive",
> it uses order-0 or order-1 allocations there, but not order-2 or
> order-3. Here too I have some more catching up to do, so we'll
> see...
ixgbe does not support a frag_size of 1536 bytes, but 2048 or 4096 bytes.
So using order-3 pages is not a win for it.
But for mlx4, we gain 5% occupancy using order-3 pages (21 frags per
32K) over order-2 pages (10 frags per 16K), and 30% over order-0 pages
(2 frags per 4K).
I don't know - the current mlx4 driver is barely usable as-is, unless you
make sure the host has enough memory, with plenty of order-2 pages.
And unless you have really specialized applications, there is never
enough memory.
* Re: [PATCH net-next] mlx4: allow order-0 memory allocations in RX path
2013-06-23 15:17 ` [PATCH net-next] mlx4: allow order-0 memory allocations in RX path Eric Dumazet
2013-06-23 20:17 ` Or Gerlitz
@ 2013-06-24 14:09 ` Or Gerlitz
2013-06-25 8:53 ` Or Gerlitz
2 siblings, 0 replies; 25+ messages in thread
From: Or Gerlitz @ 2013-06-24 14:09 UTC (permalink / raw)
To: Eric Dumazet; +Cc: David S. Miller, netdev, Or Gerlitz
On Sun, Jun 23, 2013 at 6:17 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Signed-off-by: Eric Dumazet <edumazet@google.com>
>
> mlx4 exclusively uses order-2 allocations in RX path, which are
> likely to fail under memory pressure.
>
> We therefore drop more frames than needed.
>
> This patch tries order-3, order-2, order-1 and finally order-0
> allocations to keep good performance, yet allow allocations if/when
> memory gets fragmented.
>
> By using larger pages, and avoiding unnecessary get_page()/put_page()
> on compound pages, this patch improves performance as well, lowering
> false sharing on struct page.
>
> Also use GFP_KERNEL allocations in initialization path, as allocating 12
> MB (390 order-3 pages) can easily fail with GFP_ATOMIC.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Amir Vadai <amirv@mellanox.com>
Eric, this looks brilliant. I am OOO today, so I will give it a try on
my systems tomorrow and ack. Thanks for the good work!
Or.
* Re: [PATCH net-next] mlx4: allow order-0 memory allocations in RX path
2013-06-23 21:13 ` Eric Dumazet
@ 2013-06-24 14:10 ` Or Gerlitz
0 siblings, 0 replies; 25+ messages in thread
From: Or Gerlitz @ 2013-06-24 14:10 UTC (permalink / raw)
To: Eric Dumazet
Cc: David S. Miller, netdev, Or Gerlitz, Eugenia Emantayev,
Saeed Mahameed
On Mon, Jun 24, 2013 at 12:13 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Sun, 2013-06-23 at 23:17 +0300, Or Gerlitz wrote:
>> On Sun, Jun 23, 2013 at 6:17 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> > Signed-off-by: Eric Dumazet <edumazet@google.com>
>> >
>> > mlx4 exclusively uses order-2 allocations in RX path, which are
>> > likely to fail under memory pressure.
>> >
>> > We therefore drop more frames than needed.
>> >
>> > This patch tries order-3, order-2, order-1 and finally order-0
>> > allocations to keep good performance, yet allow allocations if/when
>> > memory gets fragmented.
>> >
>> > By using larger pages, and avoiding unnecessary get_page()/put_page()
>> > on compound pages, this patch improves performance as well, lowering
>> > false sharing on struct page.
>>
>> Hi Eric, thanks for the patch. Both Amir and Yevgeny are OOO, so it
>> will take us a bit more time to conduct the review... but let's start:
>> could you explain a little further what exactly you refer to by
>> "false sharing" in this context?
>
> Every time mlx4 prepared a page frag into a skb, it did :
> - a get_page() in mlx4_en_alloc_frags()
> - a get_page() in mlx4_en_complete_rx_desc()
> - a put_page() in mlx4_en_free_frag()
>
> -> lot of changes of page->_count
>
> When this skb is consumed, frag is freed -> put_page()
>
> -> decrement of page->_count
>
> If the consumer is on a different cpu, this adds false sharing on
> "struct page"
>
> After my patch, mlx4 driver touches this "struct page" only once,
> and the consumers will do their get_page() without being slowed down by
> mlx4 driver/cpu. This reduces latencies.
>
>>
>> Also, I am not fully sure, but I think the current driver code doesn't
>> support splice, and this somehow relates to how RX skbs are spread over
>> pages. In that respect, I wonder whether this patch goes in a direction
>> that would allow supporting splice, or maybe takes us a step back,
>> given the move to order-3 allocations?
>
> splice is supported by core networking, no worries ;)
>
> It doesn't depend on order-whatever allocations.
>
> BTW, splice() works well for TCP over loopback, and TX already uses
> fragments in order-3 pages.
Yep, we've tried fio with splice today on mlx4_en NICs running
net-next and it works. I am not sure what that past problem was, nor
does it matter much now that things are working...
Or.
* Re: [PATCH net-next] mlx4: allow order-0 memory allocations in RX path
2013-06-23 15:17 ` [PATCH net-next] mlx4: allow order-0 memory allocations in RX path Eric Dumazet
2013-06-23 20:17 ` Or Gerlitz
2013-06-24 14:09 ` Or Gerlitz
@ 2013-06-25 8:53 ` Or Gerlitz
2013-06-25 23:19 ` David Miller
2 siblings, 1 reply; 25+ messages in thread
From: Or Gerlitz @ 2013-06-25 8:53 UTC (permalink / raw)
To: Eric Dumazet
Cc: Amir Vadai, David S. Miller, netdev, Or Gerlitz, Yevgeny Petrilin
On Sun, Jun 23, 2013 at 6:17 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Signed-off-by: Eric Dumazet <edumazet@google.com>
>
> mlx4 exclusively uses order-2 allocations in RX path, which are
> likely to fail under memory pressure.
>
> We therefore drop more frames than needed.
>
> This patch tries order-3, order-2, order-1 and finally order-0
> allocations to keep good performance, yet allow allocations if/when
> memory gets fragmented.
>
> By using larger pages, and avoiding unnecessary get_page()/put_page()
> on compound pages, this patch improves performance as well, lowering
> false sharing on struct page.
>
> Also use GFP_KERNEL allocations in initialization path, as allocating 12
> MB (390 order-3 pages) can easily fail with GFP_ATOMIC.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Amir Vadai <amirv@mellanox.com>
> ---
> drivers/net/ethernet/mellanox/mlx4/en_rx.c | 169 ++++++++---------
> drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 12 -
> 2 files changed, 95 insertions(+), 86 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> index 9c57581..76997b9 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> @@ -43,40 +43,64 @@
>
> #include "mlx4_en.h"
>
> +static int mlx4_alloc_pages(struct mlx4_en_priv *priv,
> + struct mlx4_en_rx_alloc *page_alloc,
> + const struct mlx4_en_frag_info *frag_info,
> + gfp_t _gfp)
> +{
> + int order;
> + struct page *page;
> + dma_addr_t dma;
> +
> + for (order = MLX4_EN_ALLOC_PREFER_ORDER; ;) {
> + gfp_t gfp = _gfp;
> +
> + if (order)
> + gfp |= __GFP_COMP | __GFP_NOWARN;
> + page = alloc_pages(gfp, order);
> + if (likely(page))
> + break;
> + if (--order < 0 ||
> + ((PAGE_SIZE << order) < frag_info->frag_size))
> + return -ENOMEM;
> + }
> + dma = dma_map_page(priv->ddev, page, 0, PAGE_SIZE << order,
> + PCI_DMA_FROMDEVICE);
> + if (dma_mapping_error(priv->ddev, dma)) {
> + put_page(page);
> + return -ENOMEM;
> + }
> + page_alloc->size = PAGE_SIZE << order;
> + page_alloc->page = page;
> + page_alloc->dma = dma;
> + page_alloc->offset = frag_info->frag_align;
> + /* Not doing get_page() for each frag is a big win
> + * on asymetric workloads.
> + */
> + atomic_set(&page->_count, page_alloc->size / frag_info->frag_stride);
> + return 0;
> +}
> +
> static int mlx4_en_alloc_frags(struct mlx4_en_priv *priv,
> struct mlx4_en_rx_desc *rx_desc,
> struct mlx4_en_rx_alloc *frags,
> - struct mlx4_en_rx_alloc *ring_alloc)
> + struct mlx4_en_rx_alloc *ring_alloc,
> + gfp_t gfp)
> {
> struct mlx4_en_rx_alloc page_alloc[MLX4_EN_MAX_RX_FRAGS];
> - struct mlx4_en_frag_info *frag_info;
> + const struct mlx4_en_frag_info *frag_info;
> struct page *page;
> dma_addr_t dma;
> int i;
>
> for (i = 0; i < priv->num_frags; i++) {
> frag_info = &priv->frag_info[i];
> - if (ring_alloc[i].offset == frag_info->last_offset) {
> - page = alloc_pages(GFP_ATOMIC | __GFP_COMP,
> - MLX4_EN_ALLOC_ORDER);
> - if (!page)
> - goto out;
> - dma = dma_map_page(priv->ddev, page, 0,
> - MLX4_EN_ALLOC_SIZE, PCI_DMA_FROMDEVICE);
> - if (dma_mapping_error(priv->ddev, dma)) {
> - put_page(page);
> - goto out;
> - }
> - page_alloc[i].page = page;
> - page_alloc[i].dma = dma;
> - page_alloc[i].offset = frag_info->frag_align;
> - } else {
> - page_alloc[i].page = ring_alloc[i].page;
> - get_page(ring_alloc[i].page);
> - page_alloc[i].dma = ring_alloc[i].dma;
> - page_alloc[i].offset = ring_alloc[i].offset +
> - frag_info->frag_stride;
> - }
> + page_alloc[i] = ring_alloc[i];
> + page_alloc[i].offset += frag_info->frag_stride;
> + if (page_alloc[i].offset + frag_info->frag_stride <= ring_alloc[i].size)
> + continue;
> + if (mlx4_alloc_pages(priv, &page_alloc[i], frag_info, gfp))
> + goto out;
> }
>
> for (i = 0; i < priv->num_frags; i++) {
> @@ -88,14 +112,16 @@ static int mlx4_en_alloc_frags(struct mlx4_en_priv *priv,
>
> return 0;
>
> -
> out:
> while (i--) {
> frag_info = &priv->frag_info[i];
> - if (ring_alloc[i].offset == frag_info->last_offset)
> + if (page_alloc[i].page != ring_alloc[i].page) {
> dma_unmap_page(priv->ddev, page_alloc[i].dma,
> - MLX4_EN_ALLOC_SIZE, PCI_DMA_FROMDEVICE);
> - put_page(page_alloc[i].page);
> + page_alloc[i].size, PCI_DMA_FROMDEVICE);
> + page = page_alloc[i].page;
> + atomic_set(&page->_count, 1);
> + put_page(page);
> + }
> }
> return -ENOMEM;
> }
> @@ -104,12 +130,12 @@ static void mlx4_en_free_frag(struct mlx4_en_priv *priv,
> struct mlx4_en_rx_alloc *frags,
> int i)
> {
> - struct mlx4_en_frag_info *frag_info = &priv->frag_info[i];
> + const struct mlx4_en_frag_info *frag_info = &priv->frag_info[i];
>
> - if (frags[i].offset == frag_info->last_offset) {
> - dma_unmap_page(priv->ddev, frags[i].dma, MLX4_EN_ALLOC_SIZE,
> + if (frags[i].offset + frag_info->frag_stride > frags[i].size)
> + dma_unmap_page(priv->ddev, frags[i].dma, frags[i].size,
> PCI_DMA_FROMDEVICE);
> - }
> +
> if (frags[i].page)
> put_page(frags[i].page);
> }
> @@ -117,35 +143,28 @@ static void mlx4_en_free_frag(struct mlx4_en_priv *priv,
> static int mlx4_en_init_allocator(struct mlx4_en_priv *priv,
> struct mlx4_en_rx_ring *ring)
> {
> - struct mlx4_en_rx_alloc *page_alloc;
> int i;
> + struct mlx4_en_rx_alloc *page_alloc;
>
> for (i = 0; i < priv->num_frags; i++) {
> - page_alloc = &ring->page_alloc[i];
> - page_alloc->page = alloc_pages(GFP_ATOMIC | __GFP_COMP,
> - MLX4_EN_ALLOC_ORDER);
> - if (!page_alloc->page)
> - goto out;
> + const struct mlx4_en_frag_info *frag_info = &priv->frag_info[i];
>
> - page_alloc->dma = dma_map_page(priv->ddev, page_alloc->page, 0,
> - MLX4_EN_ALLOC_SIZE, PCI_DMA_FROMDEVICE);
> - if (dma_mapping_error(priv->ddev, page_alloc->dma)) {
> - put_page(page_alloc->page);
> - page_alloc->page = NULL;
> + if (mlx4_alloc_pages(priv, &ring->page_alloc[i],
> + frag_info, GFP_KERNEL))
> goto out;
> - }
> - page_alloc->offset = priv->frag_info[i].frag_align;
> - en_dbg(DRV, priv, "Initialized allocator:%d with page:%p\n",
> - i, page_alloc->page);
> }
> return 0;
>
> out:
> while (i--) {
> + struct page *page;
> +
> page_alloc = &ring->page_alloc[i];
> dma_unmap_page(priv->ddev, page_alloc->dma,
> - MLX4_EN_ALLOC_SIZE, PCI_DMA_FROMDEVICE);
> - put_page(page_alloc->page);
> + page_alloc->size, PCI_DMA_FROMDEVICE);
> + page = page_alloc->page;
> + atomic_set(&page->_count, 1);
> + put_page(page);
> page_alloc->page = NULL;
> }
> return -ENOMEM;
> @@ -158,13 +177,18 @@ static void mlx4_en_destroy_allocator(struct mlx4_en_priv *priv,
> int i;
>
> for (i = 0; i < priv->num_frags; i++) {
> + const struct mlx4_en_frag_info *frag_info = &priv->frag_info[i];
> +
> page_alloc = &ring->page_alloc[i];
> en_dbg(DRV, priv, "Freeing allocator:%d count:%d\n",
> i, page_count(page_alloc->page));
>
> dma_unmap_page(priv->ddev, page_alloc->dma,
> - MLX4_EN_ALLOC_SIZE, PCI_DMA_FROMDEVICE);
> - put_page(page_alloc->page);
> + page_alloc->size, PCI_DMA_FROMDEVICE);
> + while (page_alloc->offset + frag_info->frag_stride < page_alloc->size) {
> + put_page(page_alloc->page);
> + page_alloc->offset += frag_info->frag_stride;
> + }
> page_alloc->page = NULL;
> }
> }
> @@ -195,13 +219,14 @@ static void mlx4_en_init_rx_desc(struct mlx4_en_priv *priv,
> }
>
> static int mlx4_en_prepare_rx_desc(struct mlx4_en_priv *priv,
> - struct mlx4_en_rx_ring *ring, int index)
> + struct mlx4_en_rx_ring *ring, int index,
> + gfp_t gfp)
> {
> struct mlx4_en_rx_desc *rx_desc = ring->buf + (index * ring->stride);
> struct mlx4_en_rx_alloc *frags = ring->rx_info +
> (index << priv->log_rx_info);
>
> - return mlx4_en_alloc_frags(priv, rx_desc, frags, ring->page_alloc);
> + return mlx4_en_alloc_frags(priv, rx_desc, frags, ring->page_alloc, gfp);
> }
>
> static inline void mlx4_en_update_rx_prod_db(struct mlx4_en_rx_ring *ring)
> @@ -235,7 +260,8 @@ static int mlx4_en_fill_rx_buffers(struct mlx4_en_priv *priv)
> ring = &priv->rx_ring[ring_ind];
>
> if (mlx4_en_prepare_rx_desc(priv, ring,
> - ring->actual_size)) {
> + ring->actual_size,
> + GFP_KERNEL)) {
> if (ring->actual_size < MLX4_EN_MIN_RX_SIZE) {
> en_err(priv, "Failed to allocate "
> "enough rx buffers\n");
> @@ -450,11 +476,11 @@ static int mlx4_en_complete_rx_desc(struct mlx4_en_priv *priv,
> DMA_FROM_DEVICE);
>
> /* Save page reference in skb */
> - get_page(frags[nr].page);
> __skb_frag_set_page(&skb_frags_rx[nr], frags[nr].page);
> skb_frag_size_set(&skb_frags_rx[nr], frag_info->frag_size);
> skb_frags_rx[nr].page_offset = frags[nr].offset;
> skb->truesize += frag_info->frag_stride;
> + frags[nr].page = NULL;
> }
> /* Adjust size of last fragment to match actual length */
> if (nr > 0)
> @@ -547,7 +573,7 @@ static void mlx4_en_refill_rx_buffers(struct mlx4_en_priv *priv,
> int index = ring->prod & ring->size_mask;
>
> while ((u32) (ring->prod - ring->cons) < ring->actual_size) {
> - if (mlx4_en_prepare_rx_desc(priv, ring, index))
> + if (mlx4_en_prepare_rx_desc(priv, ring, index, GFP_ATOMIC))
> break;
> ring->prod++;
> index = ring->prod & ring->size_mask;
> @@ -805,21 +831,7 @@ int mlx4_en_poll_rx_cq(struct napi_struct *napi, int budget)
> return done;
> }
>
> -
> -/* Calculate the last offset position that accommodates a full fragment
> - * (assuming fagment size = stride-align) */
> -static int mlx4_en_last_alloc_offset(struct mlx4_en_priv *priv, u16 stride, u16 align)
> -{
> - u16 res = MLX4_EN_ALLOC_SIZE % stride;
> - u16 offset = MLX4_EN_ALLOC_SIZE - stride - res + align;
> -
> - en_dbg(DRV, priv, "Calculated last offset for stride:%d align:%d "
> - "res:%d offset:%d\n", stride, align, res, offset);
> - return offset;
> -}
> -
> -
> -static int frag_sizes[] = {
> +static const int frag_sizes[] = {
> FRAG_SZ0,
> FRAG_SZ1,
> FRAG_SZ2,
> @@ -847,9 +859,6 @@ void mlx4_en_calc_rx_buf(struct net_device *dev)
> priv->frag_info[i].frag_stride =
> ALIGN(frag_sizes[i], SMP_CACHE_BYTES);
> }
> - priv->frag_info[i].last_offset = mlx4_en_last_alloc_offset(
> - priv, priv->frag_info[i].frag_stride,
> - priv->frag_info[i].frag_align);
> buf_size += priv->frag_info[i].frag_size;
> i++;
> }
> @@ -861,13 +870,13 @@ void mlx4_en_calc_rx_buf(struct net_device *dev)
> en_dbg(DRV, priv, "Rx buffer scatter-list (effective-mtu:%d "
> "num_frags:%d):\n", eff_mtu, priv->num_frags);
> for (i = 0; i < priv->num_frags; i++) {
> - en_dbg(DRV, priv, " frag:%d - size:%d prefix:%d align:%d "
> - "stride:%d last_offset:%d\n", i,
> - priv->frag_info[i].frag_size,
> - priv->frag_info[i].frag_prefix_size,
> - priv->frag_info[i].frag_align,
> - priv->frag_info[i].frag_stride,
> - priv->frag_info[i].last_offset);
> + en_err(priv,
> + " frag:%d - size:%d prefix:%d align:%d stride:%d\n",
> + i,
> + priv->frag_info[i].frag_size,
> + priv->frag_info[i].frag_prefix_size,
> + priv->frag_info[i].frag_align,
> + priv->frag_info[i].frag_stride);
> }
> }
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
> index 57192a8..35fb60e 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
> +++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
> @@ -96,7 +96,8 @@
>
> /* Use the maximum between 16384 and a single page */
> #define MLX4_EN_ALLOC_SIZE PAGE_ALIGN(16384)
> -#define MLX4_EN_ALLOC_ORDER get_order(MLX4_EN_ALLOC_SIZE)
> +
> +#define MLX4_EN_ALLOC_PREFER_ORDER PAGE_ALLOC_COSTLY_ORDER
>
> /* Receive fragment sizes; we use at most 3 fragments (for 9600 byte MTU
> * and 4K allocations) */
> @@ -234,9 +235,10 @@ struct mlx4_en_tx_desc {
> #define MLX4_EN_CX3_HIGH_ID 0x1005
>
> struct mlx4_en_rx_alloc {
> - struct page *page;
> - dma_addr_t dma;
> - u16 offset;
> + struct page *page;
> + dma_addr_t dma;
> + u32 offset;
> + u32 size;
> };
>
> struct mlx4_en_tx_ring {
> @@ -439,8 +441,6 @@ struct mlx4_en_frag_info {
> u16 frag_prefix_size;
> u16 frag_stride;
> u16 frag_align;
> - u16 last_offset;
> -
> };
>
> #ifdef CONFIG_MLX4_EN_DCB
Amir is OOO and I am covering for him, so:
Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
* Re: [PATCH net-next] mlx4: allow order-0 memory allocations in RX path
2013-06-25 8:53 ` Or Gerlitz
@ 2013-06-25 23:19 ` David Miller
0 siblings, 0 replies; 25+ messages in thread
From: David Miller @ 2013-06-25 23:19 UTC (permalink / raw)
To: or.gerlitz; +Cc: eric.dumazet, amirv, netdev, ogerlitz, yevgenyp
From: Or Gerlitz <or.gerlitz@gmail.com>
Date: Tue, 25 Jun 2013 11:53:50 +0300
> On Sun, Jun 23, 2013 at 6:17 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> Signed-off-by: Eric Dumazet <edumazet@google.com>
>>
>> mlx4 exclusively uses order-2 allocations in RX path, which are
>> likely to fail under memory pressure.
>>
>> We therefore drop frames more than needed.
>>
>> This patch tries order-3, order-2, order-1 and finally order-0
>> allocations to keep good performance, yet allow allocations if/when
>> memory gets fragmented.
>>
>> By using larger pages, and avoiding unnecessary get_page()/put_page()
>> on compound pages, this patch improves performance as well, lowering
>> false sharing on struct page.
>>
>> Also use GFP_KERNEL allocations in initialization path, as allocating 12
>> MB (390 order-3 pages) can easily fail with GFP_ATOMIC.
>>
>> Signed-off-by: Eric Dumazet <edumazet@google.com>
>
> Amir is OOO and I am covering for him, so:
>
> Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
>
Applied.
Thread overview: 25+ messages in thread
2013-06-20 19:40 [PATCH net-next 00/11] Mellanox driver updates 2013-06-20 Amir Vadai
2013-06-20 19:40 ` [PATCH net-next 01/11] net/mlx4_en: Suppress page allocation failure warnings Amir Vadai
2013-06-20 21:28 ` Eric Dumazet
2013-06-21 5:35 ` Amir Vadai
2013-06-23 8:46 ` Amir Vadai
2013-06-23 15:14 ` Eric Dumazet
2013-06-23 15:17 ` [PATCH net-next] mlx4: allow order-0 memory allocations in RX path Eric Dumazet
2013-06-23 20:17 ` Or Gerlitz
2013-06-23 21:13 ` Eric Dumazet
2013-06-24 14:10 ` Or Gerlitz
2013-06-24 14:09 ` Or Gerlitz
2013-06-25 8:53 ` Or Gerlitz
2013-06-25 23:19 ` David Miller
2013-06-20 19:40 ` [PATCH net-next 02/11] net/mlx4_en: Fix resource leak in error flow Amir Vadai
2013-06-20 19:40 ` [PATCH net-next 03/11] net/mlx4_en: Do not query stats when device port is down Amir Vadai
2013-06-20 19:40 ` [PATCH net-next 04/11] net/mlx4_en: Move register_netdev() to the end of initialization function Amir Vadai
2013-06-20 19:40 ` [PATCH net-next 05/11] net/mlx4_en: Change log level from error to debug for vlan related messages Amir Vadai
2013-06-20 19:40 ` [PATCH net-next 06/11] net/mlx4_en: Fix a race between napi poll function and RX ring cleanup Amir Vadai
2013-06-20 19:40 ` [PATCH net-next 07/11] net/mlx4_en: Add prints when TX timeout occurs Amir Vadai
2013-06-20 19:55 ` Joe Perches
2013-06-21 5:31 ` Amir Vadai
2013-06-20 19:40 ` [PATCH net-next 08/11] net/mlx4_en: Remove an unnecessary test Amir Vadai
2013-06-20 19:40 ` [PATCH net-next 09/11] net/mlx4_core: Replace sscanf() with kstrtoint() Amir Vadai
2013-06-20 19:40 ` [PATCH net-next 10/11] net/mlx4_core: Add warning in case of command timeouts Amir Vadai
2013-06-20 19:40 ` [PATCH net-next 11/11] net/mlx4_core: Fail device init if num_vfs is negative Amir Vadai