* Re: [PATCH v2 net-next] net/mlx5e: remove napi_hash_del() calls
From: Saeed Mahameed @ 2016-11-16 14:34 UTC (permalink / raw)
To: Eric Dumazet; +Cc: David Miller, Saeed Mahameed, netdev
In-Reply-To: <1479306094.8455.181.camel@edumazet-glaptop3.roam.corp.google.com>
On Wed, Nov 16, 2016 at 4:21 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> Calling napi_hash_del() after netif_napi_del() is pointless.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Saeed Mahameed <saeedm@mellanox.com>
^ permalink raw reply
* [PATCH net 3/7] net: ethernet: ti: cpsw: fix deferred probe
From: Johan Hovold @ 2016-11-16 14:35 UTC (permalink / raw)
To: Mugunthan V N
Cc: Grygorii Strashko, linux-omap, netdev, linux-kernel, Johan Hovold
In-Reply-To: <1479306916-27673-1-git-send-email-johan@kernel.org>
Make sure to deregister all child devices also on probe errors to avoid
leaks and to fix probe deferral.
cpsw 4a100000.ethernet: omap_device: omap_device_enable() called from invalid state 1
cpsw 4a100000.ethernet: use pm_runtime_put_sync_suspend() in driver?
cpsw: probe of 4a100000.ethernet failed with error -22
Add generic helper to undo the effects of cpsw_probe_dt(), which will
also be used in a follow-on patch to fix further leaks that have been
introduced more recently.
Fixes: 1fb19aa730e4 ("net: cpsw: Add parent<->child relation support
between cpsw and mdio")
Signed-off-by: Johan Hovold <johan@kernel.org>
---
drivers/net/ethernet/ti/cpsw.c | 26 ++++++++++++++++----------
1 file changed, 16 insertions(+), 10 deletions(-)
diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index ee1fae914abc..4374ba05610b 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -2441,6 +2441,11 @@ static int cpsw_probe_dt(struct cpsw_platform_data *data,
return 0;
}
+static void cpsw_remove_dt(struct platform_device *pdev)
+{
+ of_platform_depopulate(&pdev->dev);
+}
+
static int cpsw_probe_dual_emac(struct cpsw_priv *priv)
{
struct cpsw_common *cpsw = priv->cpsw;
@@ -2588,7 +2593,7 @@ static int cpsw_probe(struct platform_device *pdev)
if (cpsw_probe_dt(&cpsw->data, pdev)) {
dev_err(&pdev->dev, "cpsw: platform data missing\n");
ret = -ENODEV;
- goto clean_runtime_disable_ret;
+ goto clean_dt_ret;
}
data = &cpsw->data;
cpsw->rx_ch_num = 1;
@@ -2609,7 +2614,7 @@ static int cpsw_probe(struct platform_device *pdev)
GFP_KERNEL);
if (!cpsw->slaves) {
ret = -ENOMEM;
- goto clean_runtime_disable_ret;
+ goto clean_dt_ret;
}
for (i = 0; i < data->slaves; i++)
cpsw->slaves[i].slave_num = i;
@@ -2621,7 +2626,7 @@ static int cpsw_probe(struct platform_device *pdev)
if (IS_ERR(clk)) {
dev_err(priv->dev, "fck is not found\n");
ret = -ENODEV;
- goto clean_runtime_disable_ret;
+ goto clean_dt_ret;
}
cpsw->bus_freq_mhz = clk_get_rate(clk) / 1000000;
@@ -2629,7 +2634,7 @@ static int cpsw_probe(struct platform_device *pdev)
ss_regs = devm_ioremap_resource(&pdev->dev, ss_res);
if (IS_ERR(ss_regs)) {
ret = PTR_ERR(ss_regs);
- goto clean_runtime_disable_ret;
+ goto clean_dt_ret;
}
cpsw->regs = ss_regs;
@@ -2639,7 +2644,7 @@ static int cpsw_probe(struct platform_device *pdev)
ret = pm_runtime_get_sync(&pdev->dev);
if (ret < 0) {
pm_runtime_put_noidle(&pdev->dev);
- goto clean_runtime_disable_ret;
+ goto clean_dt_ret;
}
cpsw->version = readl(&cpsw->regs->id_ver);
pm_runtime_put_sync(&pdev->dev);
@@ -2648,7 +2653,7 @@ static int cpsw_probe(struct platform_device *pdev)
cpsw->wr_regs = devm_ioremap_resource(&pdev->dev, res);
if (IS_ERR(cpsw->wr_regs)) {
ret = PTR_ERR(cpsw->wr_regs);
- goto clean_runtime_disable_ret;
+ goto clean_dt_ret;
}
memset(&dma_params, 0, sizeof(dma_params));
@@ -2685,7 +2690,7 @@ static int cpsw_probe(struct platform_device *pdev)
default:
dev_err(priv->dev, "unknown version 0x%08x\n", cpsw->version);
ret = -ENODEV;
- goto clean_runtime_disable_ret;
+ goto clean_dt_ret;
}
for (i = 0; i < cpsw->data.slaves; i++) {
struct cpsw_slave *slave = &cpsw->slaves[i];
@@ -2714,7 +2719,7 @@ static int cpsw_probe(struct platform_device *pdev)
if (!cpsw->dma) {
dev_err(priv->dev, "error initializing dma\n");
ret = -ENOMEM;
- goto clean_runtime_disable_ret;
+ goto clean_dt_ret;
}
cpsw->txch[0] = cpdma_chan_create(cpsw->dma, 0, cpsw_tx_handler, 0);
@@ -2827,7 +2832,8 @@ static int cpsw_probe(struct platform_device *pdev)
}
clean_dma_ret:
cpdma_ctlr_destroy(cpsw->dma);
-clean_runtime_disable_ret:
+clean_dt_ret:
+ cpsw_remove_dt(pdev);
pm_runtime_disable(&pdev->dev);
clean_ndev_ret:
free_netdev(priv->ndev);
@@ -2852,7 +2858,7 @@ static int cpsw_remove(struct platform_device *pdev)
cpsw_ale_destroy(cpsw->ale);
cpdma_ctlr_destroy(cpsw->dma);
- of_platform_depopulate(&pdev->dev);
+ cpsw_remove_dt(pdev);
pm_runtime_put_sync(&pdev->dev);
pm_runtime_disable(&pdev->dev);
if (cpsw->data.dual_emac)
--
2.7.3
^ permalink raw reply related
* [PATCH net 4/7] net: ethernet: ti: cpsw: fix of_node and phydev leaks
From: Johan Hovold @ 2016-11-16 14:35 UTC (permalink / raw)
To: Mugunthan V N
Cc: Grygorii Strashko, linux-omap, netdev, linux-kernel, Johan Hovold
In-Reply-To: <1479306916-27673-1-git-send-email-johan@kernel.org>
Make sure to drop references taken and deregister devices registered
during probe on probe errors (including deferred probe) and driver
unbind.
Specifically, PHY of-node references were never released and fixed-link
PHY devices were never deregistered.
Fixes: 9e42f715264f ("drivers: net: cpsw: add phy-handle parsing")
Fixes: 1f71e8c96fc6 ("drivers: net: cpsw: Add support for fixed-link
PHY")
Signed-off-by: Johan Hovold <johan@kernel.org>
---
drivers/net/ethernet/ti/cpsw.c | 35 +++++++++++++++++++++++++++++++++++
1 file changed, 35 insertions(+)
diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 4374ba05610b..4bd96732291c 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -2443,6 +2443,41 @@ static int cpsw_probe_dt(struct cpsw_platform_data *data,
static void cpsw_remove_dt(struct platform_device *pdev)
{
+ struct net_device *ndev = platform_get_drvdata(pdev);
+ struct cpsw_common *cpsw = ndev_to_cpsw(ndev);
+ struct cpsw_platform_data *data = &cpsw->data;
+ struct device_node *node = pdev->dev.of_node;
+ struct device_node *slave_node;
+ int i = 0;
+
+ for_each_available_child_of_node(node, slave_node) {
+ struct cpsw_slave_data *slave_data = &data->slave_data[i];
+
+ if (strcmp(slave_node->name, "slave"))
+ continue;
+
+ if (of_phy_is_fixed_link(slave_node)) {
+ struct phy_device *phydev;
+
+ phydev = of_phy_find_device(slave_node);
+ if (phydev) {
+ fixed_phy_unregister(phydev);
+ /* Put references taken by
+ * of_phy_find_device() and
+ * of_phy_register_fixed_link().
+ */
+ phy_device_free(phydev);
+ phy_device_free(phydev);
+ }
+ }
+
+ of_node_put(slave_data->phy_node);
+
+ i++;
+ if (i == data->slaves)
+ break;
+ }
+
of_platform_depopulate(&pdev->dev);
}
--
2.7.3
^ permalink raw reply related
* [PATCH net 5/7] net: ethernet: ti: cpsw: fix secondary-emac probe error path
From: Johan Hovold @ 2016-11-16 14:35 UTC (permalink / raw)
To: Mugunthan V N
Cc: Grygorii Strashko, linux-omap, netdev, linux-kernel, Johan Hovold
In-Reply-To: <1479306916-27673-1-git-send-email-johan@kernel.org>
Make sure to deregister the primary device in case the secondary emac
fails to probe.
kernel BUG at /home/johan/work/omicron/src/linux/net/core/dev.c:7743!
...
[<c05b3dec>] (free_netdev) from [<c04fe6c0>] (cpsw_probe+0x9cc/0xe50)
[<c04fe6c0>] (cpsw_probe) from [<c047b28c>] (platform_drv_probe+0x5c/0xc0)
Fixes: d9ba8f9e6298 ("driver: net: ethernet: cpsw: dual emac interface
implementation")
Signed-off-by: Johan Hovold <johan@kernel.org>
---
drivers/net/ethernet/ti/cpsw.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 4bd96732291c..421da97a5be6 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -2852,12 +2852,14 @@ static int cpsw_probe(struct platform_device *pdev)
ret = cpsw_probe_dual_emac(priv);
if (ret) {
cpsw_err(priv, probe, "error probe slave 2 emac interface\n");
- goto clean_ale_ret;
+ goto clean_unregister_netdev_ret;
}
}
return 0;
+clean_unregister_netdev_ret:
+ unregister_netdev(ndev);
clean_ale_ret:
if (pm_runtime_get_sync(&pdev->dev) < 0) {
pm_runtime_put_noidle(&pdev->dev);
--
2.7.3
^ permalink raw reply related
* [PATCH net 6/7] net: ethernet: ti: cpsw: add missing sanity check
From: Johan Hovold @ 2016-11-16 14:35 UTC (permalink / raw)
To: Mugunthan V N
Cc: Grygorii Strashko, linux-omap, netdev, linux-kernel, Johan Hovold
In-Reply-To: <1479306916-27673-1-git-send-email-johan@kernel.org>
Make sure to check for allocation failures before dereferencing a
NULL-pointer during probe.
Fixes: 649a1688c960 ("net: ethernet: ti: cpsw: create common struct to
hold shared driver data")
Signed-off-by: Johan Hovold <johan@kernel.org>
---
drivers/net/ethernet/ti/cpsw.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 421da97a5be6..411be060fee6 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -2588,6 +2588,9 @@ static int cpsw_probe(struct platform_device *pdev)
int irq;
cpsw = devm_kzalloc(&pdev->dev, sizeof(struct cpsw_common), GFP_KERNEL);
+ if (!cpsw)
+ return -ENOMEM;
+
cpsw->dev = &pdev->dev;
ndev = alloc_etherdev_mq(sizeof(struct cpsw_priv), CPSW_MAX_QUEUES);
--
2.7.3
^ permalink raw reply related
* [PATCH net 7/7] net: ethernet: ti: cpsw: fix fixed-link phy probe deferral
From: Johan Hovold @ 2016-11-16 14:35 UTC (permalink / raw)
To: Mugunthan V N
Cc: Grygorii Strashko, linux-omap, netdev, linux-kernel, Johan Hovold
In-Reply-To: <1479306916-27673-1-git-send-email-johan@kernel.org>
Make sure to propagate errors from of_phy_register_fixed_link() which
can fail with -EPROBE_DEFER.
Fixes: 1f71e8c96fc6 ("drivers: net: cpsw: Add support for fixed-link
PHY")
Signed-off-by: Johan Hovold <johan@kernel.org>
---
drivers/net/ethernet/ti/cpsw.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 411be060fee6..cf8f1afa46bf 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -2375,8 +2375,11 @@ static int cpsw_probe_dt(struct cpsw_platform_data *data,
* to the PHY is the Ethernet MAC DT node.
*/
ret = of_phy_register_fixed_link(slave_node);
- if (ret)
+ if (ret) {
+ if (ret != -EPROBE_DEFER)
+ dev_err(&pdev->dev, "failed to register fixed-link phy: %d\n", ret);
return ret;
+ }
slave_data->phy_node = of_node_get(slave_node);
} else if (parp) {
u32 phyid;
@@ -2628,11 +2631,10 @@ static int cpsw_probe(struct platform_device *pdev)
/* Select default pin state */
pinctrl_pm_select_default_state(&pdev->dev);
- if (cpsw_probe_dt(&cpsw->data, pdev)) {
- dev_err(&pdev->dev, "cpsw: platform data missing\n");
- ret = -ENODEV;
+ ret = cpsw_probe_dt(&cpsw->data, pdev);
+ if (ret)
goto clean_dt_ret;
- }
+
data = &cpsw->data;
cpsw->rx_ch_num = 1;
cpsw->tx_ch_num = 1;
--
2.7.3
^ permalink raw reply related
* [PATCH net 1/7] net: ethernet: ti: cpsw: fix bad register access in probe error path
From: Johan Hovold @ 2016-11-16 14:35 UTC (permalink / raw)
To: Mugunthan V N
Cc: Grygorii Strashko, linux-omap, netdev, linux-kernel, Johan Hovold
In-Reply-To: <1479306916-27673-1-git-send-email-johan@kernel.org>
Make sure to resume the platform device to enable clocks before
accessing the CPSW registers in the probe error path (e.g. for deferred
probe).
Unhandled fault: external abort on non-linefetch (0x1008) at 0xd0872d08
...
[<c04fabcc>] (cpsw_ale_control_set) from [<c04fb8b4>] (cpsw_ale_destroy+0x2c/0x44)
[<c04fb8b4>] (cpsw_ale_destroy) from [<c04fea58>] (cpsw_probe+0xbd0/0x10c4)
[<c04fea58>] (cpsw_probe) from [<c047b2a0>] (platform_drv_probe+0x5c/0xc0)
Note that in the unlikely event of a runtime-resume failure, we'll leak
the ale struct.
Fixes: df828598a755 ("netdev: driver: ethernet: Add TI CPSW driver")
Signed-off-by: Johan Hovold <johan@kernel.org>
---
drivers/net/ethernet/ti/cpsw.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index c6cff3d2ff05..5bc5e6189661 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -2818,7 +2818,12 @@ static int cpsw_probe(struct platform_device *pdev)
return 0;
clean_ale_ret:
- cpsw_ale_destroy(cpsw->ale);
+ if (pm_runtime_get_sync(&pdev->dev) < 0) {
+ pm_runtime_put_noidle(&pdev->dev);
+ } else {
+ cpsw_ale_destroy(cpsw->ale);
+ pm_runtime_put_sync(&pdev->dev);
+ }
clean_dma_ret:
cpdma_ctlr_destroy(cpsw->dma);
clean_runtime_disable_ret:
--
2.7.3
^ permalink raw reply related
* Re: [PATCH] net: ethernet: faraday: To support device tree usage.
From: Andrew Lunn @ 2016-11-16 14:37 UTC (permalink / raw)
To: Greentime Hu; +Cc: netdev, devicetree
In-Reply-To: <CAEbi=3esyRi4cnUNf4uPKCQtyRekBRo81Frd=BSJy8qf=3uc+A@mail.gmail.com>
On Wed, Nov 16, 2016 at 10:26:52PM +0800, Greentime Hu wrote:
> On Wed, Nov 16, 2016 at 9:47 PM, Andrew Lunn <andrew@lunn.ch> wrote:
> > On Wed, Nov 16, 2016 at 04:43:15PM +0800, Greentime Hu wrote:
> >> To support device tree usage for ftmac100.
> >>
> >> Signed-off-by: Greentime Hu <green.hu@gmail.com>
> >> ---
> >> drivers/net/ethernet/faraday/ftmac100.c | 7 +++++++
> >> 1 file changed, 7 insertions(+)
> >>
> >> diff --git a/drivers/net/ethernet/faraday/ftmac100.c b/drivers/net/ethernet/faraday/ftmac100.c
> >> index dce5f7b..81dd9e1 100644
> >> --- a/drivers/net/ethernet/faraday/ftmac100.c
> >> +++ b/drivers/net/ethernet/faraday/ftmac100.c
> >> @@ -1172,11 +1172,17 @@ static int __exit ftmac100_remove(struct platform_device *pdev)
> >> return 0;
> >> }
> >>
> >> +static const struct of_device_id mac_of_ids[] = {
> >> + { .compatible = "andestech,atmac100" },
> >> + { }
> >
> > andestech is not in
> > Documentation/devicetree/bindings/vendor-prefixes.txt Please provide a
> > separate patch adding it.
> OK. I will provide another patch to add andestech.
>
> > Humm, why andestech? Why not something based around faraday
> > technology?
> It is because we use the same ftmac100 IP provided from faraday
> technology but I am now using it in andestech SoC.
Please make sure you get an acked-by: from the device tree
maintainers. They might want you to use faraday, since that is the
original IP provider. For example, all Synopsys licensed IP uses
"snps,XXX", not the SoC vendor with the license.
Andrew
^ permalink raw reply
* [PATCH] bpf: fix possible uninitialized access in inactive rotation
From: Arnd Bergmann @ 2016-11-16 14:38 UTC (permalink / raw)
To: Alexei Starovoitov
Cc: Arnd Bergmann, Martin KaFai Lau, David S. Miller, netdev,
linux-kernel
This newly added code causes a build warning:
kernel/bpf/bpf_lru_list.c: In function '__bpf_lru_list_rotate_inactive':
kernel/bpf/bpf_lru_list.c:201:28: error: 'next' may be used uninitialized in this function [-Werror=maybe-uninitialized]
The warning is plausible from looking at the code, though there might
be non-obvious external constraints that ensure it always works.
Moving the assignment of ->next_inactive_rotation inside of the
loop makes it obvious to the reader and the compiler when we
actually want to update ->next.
Fixes: 3a08c2fd7634 ("bpf: LRU List")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
kernel/bpf/bpf_lru_list.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/kernel/bpf/bpf_lru_list.c b/kernel/bpf/bpf_lru_list.c
index bfebff010ba9..f462e5f09703 100644
--- a/kernel/bpf/bpf_lru_list.c
+++ b/kernel/bpf/bpf_lru_list.c
@@ -192,13 +192,14 @@ static void __bpf_lru_list_rotate_inactive(struct bpf_lru *lru,
next = cur->prev;
if (bpf_lru_node_is_ref(node))
__bpf_lru_node_move(l, node, BPF_LRU_LIST_T_ACTIVE);
- if (cur == last)
+ if (cur == last) {
+ l->next_inactive_rotation = next;
break;
+ }
cur = next;
i++;
}
- l->next_inactive_rotation = next;
}
/* Shrink the inactive list. It starts from the tail of the
--
2.9.0
^ permalink raw reply related
* [PATCH] net: dsa: fix fixed-link-phy device leaks
From: Johan Hovold @ 2016-11-16 14:47 UTC (permalink / raw)
To: Andrew Lunn
Cc: Vivien Didelot, Florian Fainelli, David S. Miller, netdev,
linux-kernel, Johan Hovold
Make sure to drop the reference taken by of_phy_find_device() when
registering and deregistering the fixed-link PHY-device.
Note that we need to put both references held at deregistration.
Fixes: 39b0c705195e ("net: dsa: Allow configuration of CPU & DSA port
speeds/duplex")
Signed-off-by: Johan Hovold <johan@kernel.org>
---
Hi,
This is one has been compile tested only, but fixes a couple of leaks
similar to one that was found in the cpsw driver for which I just posted
a patch.
It turns out all drivers but DSA fail to deregister the fixed-link PHYs
registered by of_phy_register_fixed_link(). Due to the way this
interface was designed, deregistering such a PHY is a bit cumbersome and
looks like it would benefit from a common helper.
However, perhaps the interface should instead be changed so that the PHY
device is returned so that drivers do not need to use
of_phy_find_device() when they need to access properties of the fixed
link (e.g. as in dsu_cpu_dsa_setup below).
Thoughts?
Thanks,
Johan
net/dsa/dsa.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index a6902c1e2f28..798a6a776a5f 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -233,6 +233,8 @@ int dsa_cpu_dsa_setup(struct dsa_switch *ds, struct device *dev,
genphy_read_status(phydev);
if (ds->ops->adjust_link)
ds->ops->adjust_link(ds, port, phydev);
+
+ phy_device_free(phydev); /* of_phy_find_device */
}
return 0;
@@ -509,8 +511,12 @@ void dsa_cpu_dsa_destroy(struct device_node *port_dn)
if (of_phy_is_fixed_link(port_dn)) {
phydev = of_phy_find_device(port_dn);
if (phydev) {
- phy_device_free(phydev);
fixed_phy_unregister(phydev);
+ /* Put references taken by of_phy_find_device() and
+ * of_phy_register_fixed_link().
+ */
+ phy_device_free(phydev);
+ phy_device_free(phydev);
}
}
}
--
2.7.3
^ permalink raw reply related
* Re: [patch net-next 6/8] ipv4: fib: Add an API to request a FIB dump
From: Hannes Frederic Sowa @ 2016-11-16 14:51 UTC (permalink / raw)
To: Jiri Pirko, netdev
Cc: davem, idosch, eladr, yotamg, nogahf, arkadis, ogerlitz, roopa,
dsa, nikolay, andy, vivien.didelot, andrew, f.fainelli,
alexander.h.duyck, kuznet, jmorris, yoshfuji, kaber
In-Reply-To: <1479305343-13167-7-git-send-email-jiri@resnulli.us>
On 16.11.2016 15:09, Jiri Pirko wrote:
> From: Ido Schimmel <idosch@mellanox.com>
>
> Commit b90eb7549499 ("fib: introduce FIB notification infrastructure")
> introduced a new notification chain to notify listeners (f.e., switchdev
> drivers) about addition and deletion of routes.
>
> However, upon registration to the chain the FIB tables can already be
> populated, which means potential listeners will have an incomplete view
> of the tables.
>
> Solve that by adding an API to request a FIB dump. The dump itself it
> done using RCU in order not to starve consumers that need RTNL to make
> progress.
>
> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Have you looked at potential inconsistencies resulting of RCU walking
the table and having concurrent inserts?
I don't see a way around doing a journal like in filesystems somehow,
but maybe the effects are not that severe and it is not a problem after all.
Bye,
Hannes
^ permalink raw reply
* Re: [PATCH net 1/3] net: phy: realtek: add eee advertisement disable options
From: Jerome Brunet @ 2016-11-16 14:51 UTC (permalink / raw)
To: Andrew Lunn
Cc: Florian Fainelli, netdev-u79uwXL29TY76Z2rM5mHXA,
devicetree-u79uwXL29TY76Z2rM5mHXA, Carlo Caione, Kevin Hilman,
Giuseppe Cavallaro, Alexandre TORGUE, Martin Blumenstingl,
Andre Roth, Neil Armstrong,
linux-amlogic-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20161116132337.GD19962-g2DYL2Zd6BY@public.gmane.org>
On Wed, 2016-11-16 at 14:23 +0100, Andrew Lunn wrote:
> >
> > There two kind of PHYs supporting eee, the one advertising eee by
> > default (like realtek) and the one not advertising it (like
> > micrel).
This is just the default register value.
>
> I don't know too much about EEE. So maybe a dumb question. Does the
> MAC need to be involved? Or is it just the PHY?
>
> If the MAC needs to be involved, the PHY should not be advertising
> EEE
> unless the MAC asks for it by calling phy_init_eee(). If this is
> true,
> maybe we need to change the realtek driver, and others in that class.
As far I understand, the advertised capabilities are exchanged during
the auto-negotiation.
At this stage, if the advertisement is disabled (regarless of the
actual support) on either side of the link, there will be no low power
idle state on the Tx nor the Rx path.
If the advertisement is enabled on both side but we don't call
phy_init_eee, I suppose Tx won't enter LPI, but Rx could.
>
> Andrew
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH net 1/3] net: phy: realtek: add eee advertisement disable options
From: Andrew Lunn @ 2016-11-16 15:06 UTC (permalink / raw)
To: Jerome Brunet
Cc: Florian Fainelli, netdev-u79uwXL29TY76Z2rM5mHXA,
devicetree-u79uwXL29TY76Z2rM5mHXA, Carlo Caione, Kevin Hilman,
Giuseppe Cavallaro, Alexandre TORGUE, Martin Blumenstingl,
Andre Roth, Neil Armstrong,
linux-amlogic-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1479307890.17538.40.camel-rdvid1DuHRBWk0Htik3J/w@public.gmane.org>
On Wed, Nov 16, 2016 at 03:51:30PM +0100, Jerome Brunet wrote:
> On Wed, 2016-11-16 at 14:23 +0100, Andrew Lunn wrote:
> > >
> > > There two kind of PHYs supporting eee, the one advertising eee by
> > > default (like realtek) and the one not advertising it (like
> > > micrel).
>
> This is just the default register value.
>
> >
> > I don't know too much about EEE. So maybe a dumb question. Does the
> > MAC need to be involved? Or is it just the PHY?
> >
> > If the MAC needs to be involved, the PHY should not be advertising
> > EEE
> > unless the MAC asks for it by calling phy_init_eee(). If this is
> > true,
> > maybe we need to change the realtek driver, and others in that class.
>
> As far I understand, the advertised capabilities are exchanged during
> the auto-negotiation.
>
> At this stage, if the advertisement is disabled (regarless of the
> actual support) on either side of the link, there will be no low power
> idle state on the Tx nor the Rx path.
>
> If the advertisement is enabled on both side but we don't call
> phy_init_eee, I suppose Tx won't enter LPI, but Rx could.
What i was trying to find out is, if the MAC needs to support EEE as
well as the PHY, what happens when the MAC does not support EEE, but
the PHYs do negotiate EEE? Does it break?
Andrew
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: [PATCH net 2/3] dt-bindings: net: add DT bindings for realtek phys
From: Rob Herring @ 2016-11-16 15:11 UTC (permalink / raw)
To: Jerome Brunet
Cc: netdev-u79uwXL29TY76Z2rM5mHXA, devicetree-u79uwXL29TY76Z2rM5mHXA,
Florian Fainelli, Carlo Caione, Kevin Hilman, Giuseppe Cavallaro,
Alexandre TORGUE, Martin Blumenstingl, Andre Roth, Neil Armstrong,
linux-amlogic-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <1479220154-25851-3-git-send-email-jbrunet-rdvid1DuHRBWk0Htik3J/w@public.gmane.org>
On Tue, Nov 15, 2016 at 03:29:13PM +0100, Jerome Brunet wrote:
> Signed-off-by: Jerome Brunet <jbrunet-rdvid1DuHRBWk0Htik3J/w@public.gmane.org>
> Signed-off-by: Neil Armstrong <narmstrong-rdvid1DuHRBWk0Htik3J/w@public.gmane.org>
> ---
> .../devicetree/bindings/net/realtek-phy.txt | 20 ++++++++++++++++++++
> 1 file changed, 20 insertions(+)
> create mode 100644 Documentation/devicetree/bindings/net/realtek-phy.txt
>
> diff --git a/Documentation/devicetree/bindings/net/realtek-phy.txt b/Documentation/devicetree/bindings/net/realtek-phy.txt
> new file mode 100644
> index 000000000000..dc2845a6b387
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/net/realtek-phy.txt
> @@ -0,0 +1,20 @@
> +Realtek Ethernet PHY
> +
> +Some boards require special tuning values of the phy.
> +
> +Optional properties:
> +
> +realtek,disable-eee-1000t:
> +realtek,disable-eee-100tx:
Make these generic/common.
> + If set, respectively disable 1000-BaseT and 100-BaseTx energy efficient
> + ethernet capabilty advertisement
> + default: Leave the phy default settings unchanged (capabilities advertised)
> +
> +Example:
> +
> +&mdio0 {
> + ethernetphy0: ethernet-phy@0 {
> + reg = <0>;
> + realtek,disable-eee-1000t;
> + };
> +};
> --
> 2.7.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe devicetree" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Re: linux kernel 4.6 and 4.7 from backports have bug in congestion module refcount
From: Ben Hutchings @ 2016-11-16 15:12 UTC (permalink / raw)
To: netdev; +Cc: Panagiotis Malakoudis, Debian kernel maintainers
In-Reply-To: <CAMig2e6C0Ga2oHiVGRrmQ_8Uw2AabWj7EyYyDDbHhxnMJSOWag@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 6726 bytes --]
[Moving to netdev and debian-kernel lists]
On Wed, 2016-11-16 at 16:12 +0200, Panagiotis Malakoudis wrote:
> I am using Debian Jessie with backports and with the last kernels,
> 4.6.0-0.bpo.1 and 4.7.0-bpo.1 I experience the following WARNING after a
> few hours of heavy load tcp traffic.
>
> Nov 15 01:40:00 archive kernel: WARNING: CPU: 8 PID: 0 at /build/linux-lVEVrl/linux-4.7.8/kernel/module.c:1107 module_put+0x8d/0xa0
> Nov 15 01:40:00 archive kernel: Modules linked in: seqiv xfrm6_mode_tunnel xfrm4_mode_tunnel twofish_generic twofish_avx_x86_64 twofish_x86_64_3way twofish_x86_64 twofish_common serpent_avx2 serpent_avx_x86_64 serpent_sse2_x86_64 serpent_generic blowfish_generic blowfish_x86_64 blowfish_common cast5_avx_x86_64 cast5_generic cast_common ctr ecb des_generic cbc algif_skcipher camellia_generic camellia_aesni_avx2 camellia_aesni_avx_x86_64 camellia_x86_64 xts xcbc sha512_ssse3 sha512_generic md4 algif_hash af_alg xfrm_user xfrm4_tunnel ipcomp xfrm_ipcomp esp4 ah4 af_key xfrm_algo nfsd auth_rpcgss nfs_acl nfs lockd grace fscache sunrpc ipip tunnel4 ip_tunnel xt_TCPMSS xt_tcpmss nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables xt_recent xt_tcpudp xt_policy nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack
> Nov 15 01:40:00 archive kernel: iptable_filter ip_tables x_tables xfs libcrc32c crc32c_generic intel_rapl sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel iTCO_wdt iTCO_vendor_support jitterentropy_rng hmac drbg ansi_cprng aesni_intel ast aes_x86_64 lrw ttm gf128mul glue_helper ablk_helper cryptd drm_kms_helper pcspkr drm lpc_ich mfd_core i2c_i801 sg mei_me joydev evdev ipmi_si mei ioatdma shpchp wmi button ipmi_msghandler tpm_tis acpi_pad tpm acpi_power_meter tcp_htcp autofs4 ext4 crc16 jbd2 mbcache dm_mod md_mod hid_generic usbhid hid sd_mod crc32c_intel xhci_pci ahci xhci_hcd ehci_pci libahci ehci_hcd libata ixgbe mpt3sas igb usbcore raid_class scsi_transport_sas i2c_algo_bit usb_common dca scsi_mod ptp pps_core mdio fjes
> Nov 15 01:40:00 archive kernel: CPU: 8 PID: 0 Comm: swapper/8 Tainted: G W 4.7.0-0.bpo.1-amd64 #1 Debian 4.7.8-1~bpo8+1
> Nov 15 01:40:00 archive kernel: Hardware name: Supermicro Super Server/X10SRH-CF, BIOS 1.0c 09/14/2015
> Nov 15 01:40:00 archive kernel: 0000000000000286 c4c1e09c90bbaf17 ffffffffa3d1c805 0000000000000000
> Nov 15 01:40:00 archive kernel: 0000000000000000 ffffffffa3a7c9c4 ffff881024255000 ffffffffc032a0c0
> Nov 15 01:40:00 archive kernel: ffff881024255130 0000000000000000 ffff881024255000 ffff881024255000
> Nov 15 01:40:00 archive kernel: Call Trace:
> Nov 15 01:40:00 archive kernel: <IRQ> [<ffffffffa3d1c805>] ? dump_stack+0x5c/0x77
> Nov 15 01:40:00 archive kernel: [<ffffffffa3a7c9c4>] ? __warn+0xc4/0xe0
> Nov 15 01:40:00 archive kernel: [<ffffffffa3afe6dd>] ? module_put+0x8d/0xa0
> Nov 15 01:40:00 archive kernel: [<ffffffffa3f41ed0>] ? tcp_v4_destroy_sock+0x20/0x290
> Nov 15 01:40:00 archive kernel: [<ffffffffa3f292e7>] ? inet_csk_destroy_sock+0x47/0x160
> Nov 15 01:40:00 archive kernel: [<ffffffffa3f38c37>] ? tcp_rcv_state_process+0xcc7/0xcd0
> Nov 15 01:40:00 archive kernel: [<ffffffffa3fd14e0>] ? tcp_v4_inbound_md5_hash+0x67/0x177
> Nov 15 01:40:00 archive kernel: [<ffffffffa3f41d20>] ? tcp_v4_do_rcv+0x70/0x200
> Nov 15 01:40:00 archive kernel: [<ffffffffa3f435e2>] ? tcp_v4_rcv+0x8a2/0xa90
> Nov 15 01:40:00 archive kernel: [<ffffffffc06fac01>] ? ipv4_confirm+0x61/0xf0 [nf_conntrack_ipv4]
> Nov 15 01:40:00 archive kernel: [<ffffffffa3f1d86b>] ? ip_local_deliver_finish+0x8b/0x1c0
> Nov 15 01:40:00 archive kernel: [<ffffffffa3f1db3b>] ? ip_local_deliver+0x6b/0xf0
> Nov 15 01:40:00 archive kernel: [<ffffffffa3f1d7e0>] ? ip_rcv_finish+0x3e0/0x3e0
> Nov 15 01:40:00 archive kernel: [<ffffffffa3f1de41>] ? ip_rcv+0x281/0x3b0
> Nov 15 01:40:00 archive kernel: [<ffffffffa3f1d400>] ? inet_del_offload+0x40/0x40
> Nov 15 01:40:00 archive kernel: [<ffffffffa3edf11e>] ? __netif_receive_skb_core+0x2be/0xa50
> Nov 15 01:40:00 archive kernel: [<ffffffffa3f58865>] ? inet_gro_receive+0x265/0x2a0
> Nov 15 01:40:00 archive kernel: [<ffffffffa3edf93f>] ? netif_receive_skb_internal+0x2f/0xa0
> Nov 15 01:40:00 archive kernel: [<ffffffffa3ee088b>] ? napi_gro_receive+0xbb/0x110
> Nov 15 01:40:00 archive kernel: [<ffffffffc0332522>] ? ixgbe_clean_rx_irq+0x542/0xaf0 [ixgbe]
> Nov 15 01:40:00 archive kernel: [<ffffffffc033381c>] ? ixgbe_poll+0x44c/0x7a0 [ixgbe]
> Nov 15 01:40:00 archive kernel: [<ffffffffa3ee00e8>] ? net_rx_action+0x238/0x370
> Nov 15 01:40:00 archive kernel: [<ffffffffa3fe20b6>] ? __do_softirq+0x106/0x294
> Nov 15 01:40:00 archive kernel: [<ffffffffa3a82306>] ? irq_exit+0x86/0x90
> Nov 15 01:40:00 archive kernel: [<ffffffffa3fe1dff>] ? do_IRQ+0x4f/0xd0
> Nov 15 01:40:00 archive kernel: [<ffffffffa3fdff42>] ? common_interrupt+0x82/0x82
> Nov 15 01:40:00 archive kernel: <EOI> [<ffffffffa3ea38e2>] ? cpuidle_enter_state+0x112/0x260
> Nov 15 01:40:00 archive kernel: [<ffffffffa3abe31e>] ? cpu_startup_entry+0x2be/0x360
> Nov 15 01:40:00 archive kernel: [<ffffffffa3a4e1c1>] ? start_secondary+0x151/0x190
> Nov 15 01:40:00 archive kernel: ---[ end trace cdbbaec80305a0cc ]---
>
> Tracing the code I found that the issue comes from the usage of a tcp
> congestion module.
> I use sysctl -w net.ipv4.tcp_congestion_control=htcp, which is compiled as
> module. Default congestion algo, cubic, is compiled inside kernel, not
> module.
>
> When the warning is triggered, /sys/module/tcp_htcp/refcnt is -1, which is
> something that should not happen. It is triggered from inside
> tcp_v4_destroy_sock, when calling tcp_cleanup_congestion_control, which
> then calls module_put.
>
> This doesn't happen in 4.5.0-0.bpo.2 (2016-05-13) so I guess there is a bug
> introduced from 4.5 to 4.6 which still lives in 4.7 and decrements
> reference count of htcp congestion module when it shouldn't. The issue is
> only triggered under heavy load (the above mentioned server is a 10g SFP+
> server with more than 4 Gbps traffic at certain times of day).
I didn't see any bug fixes relating to congestion control refcounts
after 4.7, so I'm guessing this bug is still present upstream.
Ben.
> I will test with a different congestion module (tcp_highspeed) to see if
> the issue is only htcp related or it is general issue of using congestion
> tcp modules.
>
> Panagiotis Malakoudis
--
Ben Hutchings
Time is nature's way of making sure that everything doesn't happen at
once.
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 801 bytes --]
^ permalink raw reply
* [PATCH net] ixgbevf: fix invalid use of napi_hash_del()
From: Eric Dumazet @ 2016-11-16 15:15 UTC (permalink / raw)
To: David Miller; +Cc: netdev, Jeff Kirsher
From: Eric Dumazet <edumazet@google.com>
Calling napi_hash_del() before netif_napi_del() is dangerous
if a synchronize_rcu() is not enforced before NAPI struct freeing.
Lets leave this detail to core networking stack to get it right.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 7eaac3234049..30a26e624e5a 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -2511,9 +2511,6 @@ static int ixgbevf_alloc_q_vectors(struct ixgbevf_adapter *adapter)
while (q_idx) {
q_idx--;
q_vector = adapter->q_vector[q_idx];
-#ifdef CONFIG_NET_RX_BUSY_POLL
- napi_hash_del(&q_vector->napi);
-#endif
netif_napi_del(&q_vector->napi);
kfree(q_vector);
adapter->q_vector[q_idx] = NULL;
^ permalink raw reply related
* Re: [patch net-next 6/8] ipv4: fib: Add an API to request a FIB dump
From: Ido Schimmel @ 2016-11-16 15:18 UTC (permalink / raw)
To: Hannes Frederic Sowa
Cc: Jiri Pirko, netdev, davem, idosch, eladr, yotamg, nogahf, arkadis,
ogerlitz, roopa, dsa, nikolay, andy, vivien.didelot, andrew,
f.fainelli, alexander.h.duyck, kuznet, jmorris, yoshfuji, kaber
In-Reply-To: <e795e6e0-a680-a2c3-7d5e-afd966104a4a@stressinduktion.org>
Hi Hannes,
On Wed, Nov 16, 2016 at 03:51:01PM +0100, Hannes Frederic Sowa wrote:
> On 16.11.2016 15:09, Jiri Pirko wrote:
> > From: Ido Schimmel <idosch@mellanox.com>
> >
> > Commit b90eb7549499 ("fib: introduce FIB notification infrastructure")
> > introduced a new notification chain to notify listeners (f.e., switchdev
> > drivers) about addition and deletion of routes.
> >
> > However, upon registration to the chain the FIB tables can already be
> > populated, which means potential listeners will have an incomplete view
> > of the tables.
> >
> > Solve that by adding an API to request a FIB dump. The dump itself it
> > done using RCU in order not to starve consumers that need RTNL to make
> > progress.
> >
> > Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> > Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>
> Have you looked at potential inconsistencies resulting of RCU walking
> the table and having concurrent inserts?
Yes. I did try to think about situations in which this approach will
fail, but I could only find problems with concurrent removals, which I
addressed in 5/8. In case of concurrent insertions, even if you missed
the node, you would still get the ENTRY_ADD event to your listener.
^ permalink raw reply
* Re: [PATCH net 2/3] dt-bindings: net: add DT bindings for realtek phys
From: Jerome Brunet @ 2016-11-16 15:20 UTC (permalink / raw)
To: Rob Herring
Cc: netdev-u79uwXL29TY76Z2rM5mHXA, devicetree-u79uwXL29TY76Z2rM5mHXA,
Florian Fainelli, Carlo Caione, Kevin Hilman, Giuseppe Cavallaro,
Alexandre TORGUE, Martin Blumenstingl, Andre Roth, Neil Armstrong,
linux-amlogic-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20161116151104.5sfcqjyvrzre5pkn@rob-hp-laptop>
On Wed, 2016-11-16 at 09:11 -0600, Rob Herring wrote:
> On Tue, Nov 15, 2016 at 03:29:13PM +0100, Jerome Brunet wrote:
> >
> > Signed-off-by: Jerome Brunet <jbrunet-rdvid1DuHRBWk0Htik3J/w@public.gmane.org>
> > Signed-off-by: Neil Armstrong <narmstrong-rdvid1DuHRBWk0Htik3J/w@public.gmane.org>
> > ---
> > .../devicetree/bindings/net/realtek-phy.txt | 20
> > ++++++++++++++++++++
> > 1 file changed, 20 insertions(+)
> > create mode 100644 Documentation/devicetree/bindings/net/realtek-
> > phy.txt
> >
> > diff --git a/Documentation/devicetree/bindings/net/realtek-phy.txt
> > b/Documentation/devicetree/bindings/net/realtek-phy.txt
> > new file mode 100644
> > index 000000000000..dc2845a6b387
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/net/realtek-phy.txt
> > @@ -0,0 +1,20 @@
> > +Realtek Ethernet PHY
> > +
> > +Some boards require special tuning values of the phy.
> > +
> > +Optional properties:
> > +
> > +realtek,disable-eee-1000t:
> > +realtek,disable-eee-100tx:
>
> Make these generic/common.
Same feedback from the net folks. Will do.
Thx Rob
>
> >
> > + If set, respectively disable 1000-BaseT and 100-BaseTx energy
> > efficient
> > + ethernet capabilty advertisement
> > + default: Leave the phy default settings unchanged (capabilities
> > advertised)
> > +
> > +Example:
> > +
> > +&mdio0 {
> > + ethernetphy0: ethernet-phy@0 {
> > + reg = <0>;
> > + realtek,disable-eee-1000t;
> > + };
> > +};
> > --
> > 2.7.4
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe
> > devicetree" in
> > the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* Cannot set IPv6 address
From: Saeed Mahameed @ 2016-11-16 15:22 UTC (permalink / raw)
To: david.lebrun; +Cc: Linux Netdev List, Doron Tsur, Majd Dibbiny
Hi David,
The following commit introduced a new issue when setting IPv6 address
via the following command:
/sbin/ip -6 addr add 2001:0db8:0:f112::1/64 dev enp2s2
RTNETLINK answers: Operation not supported
Offending commit:
commit 6c8702c60b88651072460f3f4026c7dfe2521d12
Author: David Lebrun <david.lebrun@uclouvain.be>
Date: Tue Nov 8 14:57:41 2016 +0100
ipv6: sr: add support for SRH encapsulation and injection with lwtunnels
This patch creates a new type of interfaceless lightweight tunnel (SEG6),
enabling the encapsulation and injection of SRH within locally emitted
packets and forwarded packets.
>From a configuration viewpoint, a seg6 tunnel would be configured
as follows:
ip -6 ro ad fc00::1/128 encap seg6 mode encap segs
fc42::1,fc42::2,fc42::3 dev eth0
Any packet whose destination address is fc00::1 would thus be encapsulated
within an outer IPv6 header containing the SRH with three
segments, and would
actually be routed to the first segment of the list. If `mode inline' was
specified instead of `mode encap', then the SRH would be directly inserted
after the IPv6 header without outer encapsulation.
The inline mode is only available if CONFIG_IPV6_SEG6_INLINE is
enabled. This
feature was made configurable because direct header insertion may break
several mechanisms such as PMTUD or IPSec AH.
Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
Can you check ? Are we missing something here ?
Thanks,
Saeed.
^ permalink raw reply
* Re: [PATCH net] ixgbevf: fix invalid use of napi_hash_del()
From: Eric Dumazet @ 2016-11-16 15:25 UTC (permalink / raw)
To: David Miller; +Cc: netdev, Jeff Kirsher
In-Reply-To: <1479309315.8455.189.camel@edumazet-glaptop3.roam.corp.google.com>
On Wed, 2016-11-16 at 07:15 -0800, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> Calling napi_hash_del() before netif_napi_del() is dangerous
> if a synchronize_rcu() is not enforced before NAPI struct freeing.
>
> Lets leave this detail to core networking stack to get it right.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> ---
> drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 3 ---
> 1 file changed, 3 deletions(-)
Will send a v2
Same stuff is needed in ixgbevf_free_q_vectors()
^ permalink raw reply
* Re: [PATCH net-next v2 2/4] bpf, mlx5: fix various refcount issues in mlx5e_xdp_set
From: Daniel Borkmann @ 2016-11-16 15:26 UTC (permalink / raw)
To: Saeed Mahameed
Cc: David S. Miller, Alexei Starovoitov, Brenden Blanco, zhiyisun,
Rana Shahout, Saeed Mahameed, Linux Netdev List
In-Reply-To: <CALzJLG8jYhp0EZS=oOTv-Oy6fJFTax8FSA6hh4EXXS1E5g-+uQ@mail.gmail.com>
On 11/16/2016 03:30 PM, Saeed Mahameed wrote:
> On Wed, Nov 16, 2016 at 3:54 PM, Daniel Borkmann <daniel@iogearbox.net> wrote:
>> On 11/16/2016 01:25 PM, Saeed Mahameed wrote:
>>> On Wed, Nov 16, 2016 at 2:04 AM, Daniel Borkmann <daniel@iogearbox.net>
>>> wrote:
>>>>
>>>> There are multiple issues in mlx5e_xdp_set():
>>>>
>>>> 1) The batched bpf_prog_add() is currently not checked for errors! When
>>>> doing so, it should be done at an earlier point in time to makes sure
>>>> that we cannot fail anymore at the time we want to set the program
>>>> for
>>>> each channel. This only means that we have to undo the bpf_prog_add()
>>>> in case we return due to required reset.
>>>>
>>>> 2) When swapping the priv->xdp_prog, then no extra reference count must
>>>> be
>>>> taken since we got that from call path via dev_change_xdp_fd()
>>>> already.
>>>> Otherwise, we'd never be able to free the program. Also,
>>>> bpf_prog_add()
>>>> without checking the return code could fail.
>>>>
>>>> Fixes: 86994156c736 ("net/mlx5e: XDP fast RX drop bpf programs support")
>>>> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
>>>> ---
>>>> drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 25
>>>> ++++++++++++++++++-----
>>>> 1 file changed, 20 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>>> b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>>> index ab0c336..cf26672 100644
>>>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>>> @@ -3142,6 +3142,17 @@ static int mlx5e_xdp_set(struct net_device
>>>> *netdev, struct bpf_prog *prog)
>>>> goto unlock;
>>>> }
>>>>
>>>> + if (prog) {
>>>> + /* num_channels is invariant here, so we can take the
>>>> + * batched reference right upfront.
>>>> + */
>>>> + prog = bpf_prog_add(prog, priv->params.num_channels);
>>>> + if (IS_ERR(prog)) {
>>>> + err = PTR_ERR(prog);
>>>> + goto unlock;
>>>> + }
>>>> + }
>>>> +
>>>
>>> With this approach you will end up taking a ref count twice per RQ! on
>>> the first time xdp_set is called i.e (old_prog = NULL, prog != NULL).
>>> One ref will be taken per RQ/Channel from the above code, and since
>>> reset will be TRUE mlx5e_open_locked will be called and another ref
>>> count will be taken on mlx5e_create_rq.
>>>
>>> The issue here is that we have two places for ref count accounting,
>>> xdp_set and mlx5e_create_rq. Having ref-count updates in
>>> mlx5e_create_rq is essential for num_channels configuration changes
>>> (mlx5e_set_ringparam).
>>>
>>> Our previous approach made sure that only one path will do the ref
>>> counting (mlx5e_open_locked vs. mlx5e_xdp_set batch ref update when
>>> reset == false).
>>
>> That is correct, for a short time bpf_prog_add() was charged also when
>> we reset. When we reset, we will then jump to unlock_put and safely undo
>> this since we took ref from mlx5e_create_rq() already in that case, and
>> return successfully. That was intentional, so that eventually we end up
>> just taking a /single/ ref per RQ when we exit mlx5e_xdp_set(), but more
>> below ...
>>
>> [...]
>>>
>>> 2. Keep the current approach and make sure to not update the ref count
>>> twice, you can batch update only if (!reset && was_open) otherwise you
>>> can rely on mlx5e_open_locked to take the num_channels ref count for
>>> you.
>>>
>>> Personally I prefer option 2, since we will keep the current logic
>>> which will allow configuration updates such as (mlx5e_set_ringparam)
>>> to not worry about ref counting since it will be done in the reset
>>> flow.
>>
>> ... agree on keeping current approach. I actually like the idea, so we'd
>> end up with this simpler version for the batched ref then.
>
> Great :).
>
> So let's do the batched update only if we are not going to reset (we
> already know that in advance), instead of the current patch where you
> batch update unconditionally and then
> unlock_put in case reset was performed (which is just redundant and confusing).
>
> Please note that if open_locked fails you need to goto unlock_put.
Sorry, I'm not quite sure I follow you here; are you /now/ commenting on
the original patch or on the already updated diff I did from my very last
email, that is, http://patchwork.ozlabs.org/patch/695564/ ?
>> Note, your "bpf_prog_add(prog, 1) // one ref for the device." is not needed
>> since this we already got that one through dev_change_xdp_fd() as mentioned.
>
> If it is not needed then why we need bpf_prog_put on mlx5e_nic_cleanup
> in your next patch? this doesn't look symmetric (right) !
> you shouldn't release a reference you didn't take.
> Overall with this series the driver can take num_channels refs and
> will release num_channels refs on mlx5e_close. we shouldn't release
> one extra ref on NIC cleanup.
I already explained this in the commit description; when dev_change_xdp_fd()
is called and sees a valid fd, it does bpf_prog_get_type(), which calls the
__bpf_prog_get() taking a ref on the program (bpf_prog_inc()). That is then
passed down to ops->ndo_xdp(). Only in case of error from the ->ndo_xdp()
callback, we bpf_prog_put() this reference from dev_change_xdp_fd() side.
For drivers that implement against this ndo, it means that you need N-1 refs
in addition. Have a look at the other two in-tree users, which do it correctly,
that is mlx4_xdp_set() and nfp_net_xdp_setup().
It's documented here (include/linux/netdevice.h) with ndo_xdp referring to it:
/* These structures hold the attributes of xdp state that are being passed
* to the netdevice through the xdp op.
*/
enum xdp_netdev_command {
/* Set or clear a bpf program used in the earliest stages of packet
* rx. The prog will have been loaded as BPF_PROG_TYPE_XDP. The callee
* is responsible for calling bpf_prog_put on any old progs that are
* stored. In case of error, the callee need not release the new prog
* reference, but on success it takes ownership and must bpf_prog_put
* when it is no longer used.
*/
XDP_SETUP_PROG,
[...]
};
I think reason why this is rather confusing would be that initially, it was
just a single prog for all queues, but it was requested along the way to move
prog pointer down to queues instead and let them have their own ref, so that
some time in future individual progs from a subset of the queues can be
exchanged.
Since the way xdp in mlx5 is implemented, you currently have the priv->xdp_prog
as the control prog. That's okay right now, but requires to drop the last ref
on priv->xdp_prog via bpf_prog_put() when netdev is dismantled and priv->xdp_prog
still present.
^ permalink raw reply
* [PATCH v2 net] ixgbevf: fix invalid uses of napi_hash_del()
From: Eric Dumazet @ 2016-11-16 15:26 UTC (permalink / raw)
To: David Miller; +Cc: netdev, Jeff Kirsher
In-Reply-To: <1479309315.8455.189.camel@edumazet-glaptop3.roam.corp.google.com>
From: Eric Dumazet <edumazet@google.com>
Calling napi_hash_del() before netif_napi_del() is dangerous
if a synchronize_rcu() is not enforced before NAPI struct freeing.
Lets leave this detail to core networking stack to get it right.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 6 ------
1 file changed, 6 deletions(-)
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 7eaac3234049..bf4d7efc7dbd 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -2511,9 +2511,6 @@ static int ixgbevf_alloc_q_vectors(struct ixgbevf_adapter *adapter)
while (q_idx) {
q_idx--;
q_vector = adapter->q_vector[q_idx];
-#ifdef CONFIG_NET_RX_BUSY_POLL
- napi_hash_del(&q_vector->napi);
-#endif
netif_napi_del(&q_vector->napi);
kfree(q_vector);
adapter->q_vector[q_idx] = NULL;
@@ -2537,9 +2534,6 @@ static void ixgbevf_free_q_vectors(struct ixgbevf_adapter *adapter)
struct ixgbevf_q_vector *q_vector = adapter->q_vector[q_idx];
adapter->q_vector[q_idx] = NULL;
-#ifdef CONFIG_NET_RX_BUSY_POLL
- napi_hash_del(&q_vector->napi);
-#endif
netif_napi_del(&q_vector->napi);
kfree(q_vector);
}
^ permalink raw reply related
* [PATCH net] ip6_tunnel: disable caching when the traffic class is inherited
From: Paolo Abeni @ 2016-11-16 15:26 UTC (permalink / raw)
To: netdev; +Cc: David S. Miller, Liam McBirnie, Hannes Frederic Sowa
If an ip6 tunnel is configured to inherit the traffic class from
the inner header, the dst_cache must be disabled or it will foul
the policy routing.
The issue is apprently there since at leat Linux-2.6.12-rc2.
Reported-by: Liam McBirnie <liam.mcbirnie@boeing.com>
Cc: Liam McBirnie <liam.mcbirnie@boeing.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
net/ipv6/ip6_tunnel.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 8778456..0a4759b 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -1034,6 +1034,7 @@ int ip6_tnl_xmit(struct sk_buff *skb, struct net_device *dev, __u8 dsfield,
int mtu;
unsigned int psh_hlen = sizeof(struct ipv6hdr) + t->encap_hlen;
unsigned int max_headroom = psh_hlen;
+ bool use_cache = false;
u8 hop_limit;
int err = -1;
@@ -1066,7 +1067,15 @@ int ip6_tnl_xmit(struct sk_buff *skb, struct net_device *dev, __u8 dsfield,
memcpy(&fl6->daddr, addr6, sizeof(fl6->daddr));
neigh_release(neigh);
- } else if (!fl6->flowi6_mark)
+ } else if (!(t->parms.flags &
+ (IP6_TNL_F_USE_ORIG_TCLASS | IP6_TNL_F_USE_ORIG_FWMARK))) {
+ /* enable the cache only only if the routing decision does
+ * not depend on the current inner header value
+ */
+ use_cache = true;
+ }
+
+ if (use_cache)
dst = dst_cache_get(&t->dst_cache);
if (!ip6_tnl_xmit_ctl(t, &fl6->saddr, &fl6->daddr))
@@ -1150,7 +1159,7 @@ int ip6_tnl_xmit(struct sk_buff *skb, struct net_device *dev, __u8 dsfield,
if (t->encap.type != TUNNEL_ENCAP_NONE)
goto tx_err_dst_release;
} else {
- if (!fl6->flowi6_mark && ndst)
+ if (use_cache && ndst)
dst_cache_set_ip6(&t->dst_cache, ndst, &fl6->saddr);
}
skb_dst_set(skb, dst);
--
1.8.3.1
^ permalink raw reply related
* Re: Cannot set IPv6 address
From: Eric Dumazet @ 2016-11-16 15:29 UTC (permalink / raw)
To: Saeed Mahameed; +Cc: david.lebrun, Linux Netdev List, Doron Tsur, Majd Dibbiny
In-Reply-To: <CALzJLG9B52CbMEiWgroaihq4Ar2Jc9H3BZdA_xjF37CZESbZ8Q@mail.gmail.com>
On Wed, 2016-11-16 at 17:22 +0200, Saeed Mahameed wrote:
> Hi David,
>
> The following commit introduced a new issue when setting IPv6 address
> via the following command:
>
> /sbin/ip -6 addr add 2001:0db8:0:f112::1/64 dev enp2s2
> RTNETLINK answers: Operation not supported
>
> Offending commit:
>
> commit 6c8702c60b88651072460f3f4026c7dfe2521d12
> Author: David Lebrun <david.lebrun@uclouvain.be>
> Date: Tue Nov 8 14:57:41 2016 +0100
>
> ipv6: sr: add support for SRH encapsulation and injection with lwtunnels
>
> This patch creates a new type of interfaceless lightweight tunnel (SEG6),
> enabling the encapsulation and injection of SRH within locally emitted
> packets and forwarded packets.
>
> >From a configuration viewpoint, a seg6 tunnel would be configured
> as follows:
>
> ip -6 ro ad fc00::1/128 encap seg6 mode encap segs
> fc42::1,fc42::2,fc42::3 dev eth0
>
> Any packet whose destination address is fc00::1 would thus be encapsulated
> within an outer IPv6 header containing the SRH with three
> segments, and would
> actually be routed to the first segment of the list. If `mode inline' was
> specified instead of `mode encap', then the SRH would be directly inserted
> after the IPv6 header without outer encapsulation.
>
> The inline mode is only available if CONFIG_IPV6_SEG6_INLINE is
> enabled. This
> feature was made configurable because direct header insertion may break
> several mechanisms such as PMTUD or IPSec AH.
>
> Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
> Signed-off-by: David S. Miller <davem@davemloft.net>
>
>
> Can you check ? Are we missing something here ?
Sure, patch is under review. Please look at netdev archives and/or
ozlabs
https://patchwork.ozlabs.org/patch/695060/
^ permalink raw reply
* RE: [PATCH] lan78xx: relocate mdix setting to phy driver
From: Woojung.Huh @ 2016-11-16 15:31 UTC (permalink / raw)
To: f.fainelli, davem; +Cc: andrew, netdev, UNGLinuxDriver
In-Reply-To: <c894a7bd-4062-cb63-6b70-02bb86c874c1@gmail.com>
> static void lan88xx_set_mdix(struct phy_device *phydev)
> {
> int buf;
> int mask_val;
>
> switch (phydev->mdix) {
> case ETH_TP_MDI:
> mask_val = LAN88XX_EXT_MODE_CTRL_MDI_;
> break;
> case ETH_TP_MDI_X:
> mask_val = LAN88XX_EXT_MODE_CTRL_MDI_X_;
> break;
> case ETH_TP_MDI_AUTO:
> mask_val = LAN88XX_EXT_MODE_CTRL_AUTO_MDIX_:
> break:
> default:
> return;
> }
>
> phy_write(phydev, LAN88XX_EXT_PAGE_ACCESS,
> LAN88XX_EXT_PAGE_SPACE_1);
> buf = phy_read(phydev, LAN88XX_EXT_MODE_CTRL);
> buf &= ~LAN88XX_EXT_MODE_CTRL_MDIX_MASK_;
> buf |= mask_val;
> phy_write(phydev, LAN88XX_EXT_MODE_CTRL, buf);
> phy_write(phydev, LAN88XX_EXT_PAGE_ACCESS,
> LAN88XX_EXT_PAGE_SPACE_0);
> }
Florian,
Looks simpler to me too. Will submit new patch.
Thanks.
- Woojung
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox