Netdev List
 help / color / mirror / Atom feed
* [PATCH net v2 6/7] net: ethernet: ti: cpsw: add missing sanity check
From: Johan Hovold @ 2016-11-17 16:40 UTC (permalink / raw)
  To: Mugunthan V N
  Cc: Grygorii Strashko, linux-omap, netdev, linux-kernel, Johan Hovold
In-Reply-To: <1479400804-9847-1-git-send-email-johan@kernel.org>

Make sure to check for allocation failures before dereferencing a
NULL-pointer during probe.

Fixes: 649a1688c960 ("net: ethernet: ti: cpsw: create common struct to
hold shared driver data")
Signed-off-by: Johan Hovold <johan@kernel.org>
---
 drivers/net/ethernet/ti/cpsw.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 11b2daef3158..1387299030e4 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -2588,6 +2588,9 @@ static int cpsw_probe(struct platform_device *pdev)
 	int irq;
 
 	cpsw = devm_kzalloc(&pdev->dev, sizeof(struct cpsw_common), GFP_KERNEL);
+	if (!cpsw)
+		return -ENOMEM;
+
 	cpsw->dev = &pdev->dev;
 
 	ndev = alloc_etherdev_mq(sizeof(struct cpsw_priv), CPSW_MAX_QUEUES);
-- 
2.7.3

^ permalink raw reply related

* [PATCH net v2 7/7] net: ethernet: ti: cpsw: fix fixed-link phy probe deferral
From: Johan Hovold @ 2016-11-17 16:40 UTC (permalink / raw)
  To: Mugunthan V N
  Cc: Grygorii Strashko, linux-omap, netdev, linux-kernel, Johan Hovold
In-Reply-To: <1479400804-9847-1-git-send-email-johan@kernel.org>

Make sure to propagate errors from of_phy_register_fixed_link() which
can fail with -EPROBE_DEFER.

Fixes: 1f71e8c96fc6 ("drivers: net: cpsw: Add support for fixed-link
PHY")
Signed-off-by: Johan Hovold <johan@kernel.org>
---
 drivers/net/ethernet/ti/cpsw.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 1387299030e4..58947aae31c7 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -2375,8 +2375,11 @@ static int cpsw_probe_dt(struct cpsw_platform_data *data,
 			 * to the PHY is the Ethernet MAC DT node.
 			 */
 			ret = of_phy_register_fixed_link(slave_node);
-			if (ret)
+			if (ret) {
+				if (ret != -EPROBE_DEFER)
+					dev_err(&pdev->dev, "failed to register fixed-link phy: %d\n", ret);
 				return ret;
+			}
 			slave_data->phy_node = of_node_get(slave_node);
 		} else if (parp) {
 			u32 phyid;
@@ -2637,11 +2640,10 @@ static int cpsw_probe(struct platform_device *pdev)
 		goto clean_runtime_disable_ret;
 	}
 
-	if (cpsw_probe_dt(&cpsw->data, pdev)) {
-		dev_err(&pdev->dev, "cpsw: platform data missing\n");
-		ret = -ENODEV;
+	ret = cpsw_probe_dt(&cpsw->data, pdev);
+	if (ret)
 		goto clean_dt_ret;
-	}
+
 	data = &cpsw->data;
 	cpsw->rx_ch_num = 1;
 	cpsw->tx_ch_num = 1;
-- 
2.7.3

^ permalink raw reply related

* Re: [Patch net] net: check dead netns for peernet2id_alloc()
From: David Miller @ 2016-11-17 16:20 UTC (permalink / raw)
  To: xiyou.wangcong; +Cc: netdev, avagin, nicolas.dichtel
In-Reply-To: <1479320822-17483-1-git-send-email-xiyou.wangcong@gmail.com>

From: Cong Wang <xiyou.wangcong@gmail.com>
Date: Wed, 16 Nov 2016 10:27:02 -0800

> Andrei reports we still allocate netns ID from idr after we destroy
> it in cleanup_net().
> 
> cleanup_net():
>   ...
>   idr_destroy(&net->netns_ids);
>   ...
>   list_for_each_entry_reverse(ops, &pernet_list, list)
>     ops_exit_list(ops, &net_exit_list);
>       -> rollback_registered_many()
>         -> rtmsg_ifinfo_build_skb()
>          -> rtnl_fill_ifinfo()
>            -> peernet2id_alloc()
> 
> After that point we should not even access net->netns_ids, we
> should check the death of the current netns as early as we can in
> peernet2id_alloc().
> 
> For net-next we can consider to avoid sending rtmsg totally,
> it is a good optimization for netns teardown path.
> 
> Fixes: 0c7aecd4bde4 ("netns: add rtnl cmd to add and get peer netns ids")
> Reported-by: Andrei Vagin <avagin@gmail.com>
> Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>

Applied and queued up for -stable, thanks.

^ permalink raw reply

* Re: [PATCH net-next] net/mlx4_en: remove napi_hash_del() call
From: David Miller @ 2016-11-17 16:17 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, tariqt
In-Reply-To: <1479304162.8455.163.camel@edumazet-glaptop3.roam.corp.google.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 16 Nov 2016 05:49:22 -0800

> From: Eric Dumazet <edumazet@google.com>
> 
> There is no need calling napi_hash_del()+synchronize_rcu() before
> calling netif_napi_del()
> 
> netif_napi_del() does this already.
> 
> Using napi_hash_del() in a driver is useful only when dealing with
> a batch of NAPI structures, so that a single synchronize_rcu() can
> be used. mlx4_en_deactivate_cq() is deactivating a single NAPI.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied, thanks Eric.

^ permalink raw reply

* Re: [PATCH] net: dsa: fix fixed-link-phy device leaks
From: Johan Hovold @ 2016-11-17 16:47 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Johan Hovold, Andrew Lunn, Vivien Didelot, David S. Miller,
	netdev, linux-kernel
In-Reply-To: <20161117095225.GC21237@localhost>

On Thu, Nov 17, 2016 at 10:52:25AM +0100, Johan Hovold wrote:
> On Wed, Nov 16, 2016 at 10:14:10AM -0800, Florian Fainelli wrote:
> > On 11/16/2016 09:11 AM, Johan Hovold wrote:
> > > On Wed, Nov 16, 2016 at 09:06:26AM -0800, Florian Fainelli wrote:
> > >> On 11/16/2016 06:47 AM, Johan Hovold wrote:
> > >>> Make sure to drop the reference taken by of_phy_find_device() when
> > >>> registering and deregistering the fixed-link PHY-device.
> > >>>
> > >>> Note that we need to put both references held at deregistration.
> > >>>
> > >>> Fixes: 39b0c705195e ("net: dsa: Allow configuration of CPU & DSA port
> > >>> speeds/duplex")
> > >>> Signed-off-by: Johan Hovold <johan@kernel.org>
> > >>> ---
> > >>>
> > >>> Hi,
> > >>>
> > >>> This is one has been compile tested only, but fixes a couple of leaks
> > >>> similar to one that was found in the cpsw driver for which I just posted
> > >>> a patch.
> > >>>
> > >>> It turns out all drivers but DSA fail to deregister the fixed-link PHYs
> > >>> registered by of_phy_register_fixed_link(). Due to the way this
> > >>> interface was designed, deregistering such a PHY is a bit cumbersome and
> > >>> looks like it would benefit from a common helper.
> > >>>
> > >>> However, perhaps the interface should instead be changed so that the PHY
> > >>> device is returned so that drivers do not need to use
> > >>> of_phy_find_device() when they need to access properties of the fixed
> > >>> link (e.g. as in dsu_cpu_dsa_setup below).
> > >>>
> > >>> Thoughts?
> > >>>
> > >>> Thanks,
> > >>> Johan
> > >>>
> > >>>
> > >>>  net/dsa/dsa.c | 8 +++++++-
> > >>>  1 file changed, 7 insertions(+), 1 deletion(-)
> > >>>
> > >>> diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
> > >>> index a6902c1e2f28..798a6a776a5f 100644
> > >>> --- a/net/dsa/dsa.c
> > >>> +++ b/net/dsa/dsa.c
> > >>> @@ -233,6 +233,8 @@ int dsa_cpu_dsa_setup(struct dsa_switch *ds, struct device *dev,
> > >>>  		genphy_read_status(phydev);
> > >>>  		if (ds->ops->adjust_link)
> > >>>  			ds->ops->adjust_link(ds, port, phydev);
> > >>> +
> > >>> +		phy_device_free(phydev);	/* of_phy_find_device */
> > >>>  	}
> > >>>  
> > >>>  	return 0;
> > >>> @@ -509,8 +511,12 @@ void dsa_cpu_dsa_destroy(struct device_node *port_dn)
> > >>>  	if (of_phy_is_fixed_link(port_dn)) {
> > >>>  		phydev = of_phy_find_device(port_dn);
> > >>>  		if (phydev) {
> > >>> -			phy_device_free(phydev);
> > >>>  			fixed_phy_unregister(phydev);
> > >>> +			/* Put references taken by of_phy_find_device() and
> > >>> +			 * of_phy_register_fixed_link().
> > >>> +			 */
> > >>> +			phy_device_free(phydev);
> > >>> +			phy_device_free(phydev);
> > >>
> > >> Double free, this looks bogus here. Actually would not this mean a
> > >> triple free since you already free in dsa_cpu_dsa_setup() which is
> > >> paired with dsa_cpu_dsa_destroy()?
> > > 
> > > The naming of phy_device_free() is unfortunate when it's really a put():
> > > 
> > > void phy_device_free(struct phy_device *phydev)
> > > {
> > > 	put_device(&phydev->mdio.dev);
> > > }
> > 
> > Indeed, should have looked a little harder.
> > 
> > > 
> > > which may need to be called multiple times, specifically after a call to
> > > of_phy_find_device() which takes another reference.
> > > 
> > > With this patch the refcounts are properly balanced.
> > 
> > The intent of your patch is good, but it still feels like having to
> > double imbalance the refcount is symptomatic of a larger issue here, it
> > does not seem like having several refcounts are necessary, so we may
> > really want to rework the API.
> 
> I'll cook something up for -next, but how about using
> 
> 	put_device(&phydev->mdio.dev)
> 
> for the reference taken by of_phy_find_device() (as some driver
> already do) to fix the immediate leaks?
> 
> Then deregistration would look like:
> 
> 	phydev = of_phy_find_device(port_dn);
> 	if (phydev) {
> -			phy_device_free(phydev);
> 			fixed_phy_unregister(phydev);
> +			put_device(&phydev->mdio.dev)
> +			phy_device_free(phydev);

On second thought, I suggest just renaming the phy_device_free() as
phy_device_put() to reflect what it really does these days. There can
never be a guarantee that the phy is freed on a call to
phy_device_free() anyway.

The OF interface can still be modified to return a pointer later.

Johan

^ permalink raw reply

* Re: [patch net-next 6/8] ipv4: fib: Add an API to request a FIB dump
From: David Miller @ 2016-11-17 16:45 UTC (permalink / raw)
  To: hannes
  Cc: idosch, jiri, netdev, idosch, eladr, yotamg, nogahf, arkadis,
	ogerlitz, roopa, dsa, nikolay, andy, vivien.didelot, andrew,
	f.fainelli, alexander.h.duyck, kuznet, jmorris, yoshfuji, kaber
In-Reply-To: <e2e0eaaf-30f0-e825-29c7-920115742301@stressinduktion.org>

From: Hannes Frederic Sowa <hannes@stressinduktion.org>
Date: Thu, 17 Nov 2016 15:36:48 +0100

> The other way is the journal idea I had, which uses an rb-tree with
> timestamps as keys (can be lamport timestamps). You insert into the tree
> until the dump is finished and use it as queue later to shuffle stuff
> into the hardware.

If you have this "place" where pending inserts are stored, you have
a policy decision to make.

First of all what do other lookups see when there are pending entires?

If, once inserted into the pending queue, you return success to the
inserting entity, then you must make those pending entries visible
to lookups.

If you block the inserting entity, well that doesn't make a lot of
sense.  If blocking is a workable solution, then we can just block the
entire insert while this FIB dump to the device driver is happening.

But I am pretty sure the idea of blocking modifications for so long
was considered undesirable.

^ permalink raw reply

* Re: [PATCH net-next v2 3/3] net: marvell: Allow drivers to be built with COMPILE_TEST
From: Florian Fainelli @ 2016-11-17 16:52 UTC (permalink / raw)
  To: Arnd Bergmann, kbuild test robot
  Cc: kbuild-all, netdev, davem, mw, gregory.clement, Shaohui.Xie
In-Reply-To: <19888267.TNEXR1Cft3@wuerfel>

On 11/17/2016 01:12 AM, Arnd Bergmann wrote:
> On Thursday, November 17, 2016 3:10:10 PM CET kbuild test robot wrote:
>> url:    https://github.com/0day-ci/linux/commits/Florian-Fainelli/net-fsl-Allow-most-drivers-to-be-built-with-COMPILE_TEST/20161116-094841
>> config: mips-allyesconfig (attached as .config)
>> compiler: mips-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705
>> reproduce:
>>         wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
>>         chmod +x ~/bin/make.cross
>>         # save the attached .config to linux build tree
>>         make.cross ARCH=mips 
>>
>> All errors (new ones prefixed by >>):
>>
>>    drivers/built-in.o: In function `mvneta_bm_pool_create':
>>>> mvneta_bm.c:(.text+0x772f0c): undefined reference to `mvebu_mbus_get_dram_win_info'
> 
> We should fix this, I'd suggest either adding an inline wrapper for
> mvebu_mbus_get_dram_win_info() when CONFIG_MVEBU_MBUS is not set,
> or allowing CONFIG_MVEBU_MBUS to be enabled for COMPILE_TEST as well.

Yes, I have a patch prepared for that, thanks!
-- 
Florian

^ permalink raw reply

* Re: [PATCH net v2 7/7] net: ethernet: ti: cpsw: fix fixed-link phy probe deferral
From: David Miller @ 2016-11-17 17:04 UTC (permalink / raw)
  To: johan; +Cc: mugunthanvnm, grygorii.strashko, linux-omap, netdev, linux-kernel
In-Reply-To: <1479400804-9847-8-git-send-email-johan@kernel.org>

From: Johan Hovold <johan@kernel.org>
Date: Thu, 17 Nov 2016 17:40:04 +0100

> Make sure to propagate errors from of_phy_register_fixed_link() which
> can fail with -EPROBE_DEFER.
> 
> Fixes: 1f71e8c96fc6 ("drivers: net: cpsw: Add support for fixed-link
> PHY")
> Signed-off-by: Johan Hovold <johan@kernel.org>

Johan, when you update a patch within a series you must post the
entire series freshly to the lists, cover posting and all.

Thank you.

^ permalink raw reply

* Re: [PATCH net 0/3] net: phy: fix of_node and device leaks
From: David Miller @ 2016-11-17 17:05 UTC (permalink / raw)
  To: johan; +Cc: f.fainelli, robh+dt, frowand.list, netdev, linux-kernel
In-Reply-To: <1479306038-27211-1-git-send-email-johan@kernel.org>

From: Johan Hovold <johan@kernel.org>
Date: Wed, 16 Nov 2016 15:20:35 +0100

> These patches fix a couple of of_node leaks in the fixed-link code and a
> device reference leak in a phy helper.

Series applied, thanks.

^ permalink raw reply

* Re: [patch net-next 6/8] ipv4: fib: Add an API to request a FIB dump
From: Ido Schimmel @ 2016-11-17 13:10 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Jiri Pirko, netdev, davem, idosch, eladr, yotamg, nogahf, arkadis,
	ogerlitz, roopa, dsa, nikolay, andy, vivien.didelot, andrew,
	f.fainelli, alexander.h.duyck, kuznet, jmorris, yoshfuji, kaber
In-Reply-To: <56d2179d-00ff-bcc5-e365-845179cbe672@stressinduktion.org>

Hi Hannes,

On Wed, Nov 16, 2016 at 08:43:25PM +0100, Hannes Frederic Sowa wrote:
> On 16.11.2016 19:51, Ido Schimmel wrote:
> > On Wed, Nov 16, 2016 at 06:35:45PM +0100, Hannes Frederic Sowa wrote:
> >> On 16.11.2016 16:18, Ido Schimmel wrote:
> >>> On Wed, Nov 16, 2016 at 03:51:01PM +0100, Hannes Frederic Sowa wrote:
> >>>> On 16.11.2016 15:09, Jiri Pirko wrote:
> >>>>> From: Ido Schimmel <idosch@mellanox.com>
> >>>>>
> >>>>> Commit b90eb7549499 ("fib: introduce FIB notification infrastructure")
> >>>>> introduced a new notification chain to notify listeners (f.e., switchdev
> >>>>> drivers) about addition and deletion of routes.
> >>>>>
> >>>>> However, upon registration to the chain the FIB tables can already be
> >>>>> populated, which means potential listeners will have an incomplete view
> >>>>> of the tables.
> >>>>>
> >>>>> Solve that by adding an API to request a FIB dump. The dump itself it
> >>>>> done using RCU in order not to starve consumers that need RTNL to make
> >>>>> progress.
> >>>>>
> >>>>> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> >>>>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
> >>>>
> >>>> Have you looked at potential inconsistencies resulting of RCU walking
> >>>> the table and having concurrent inserts?
> >>>
> >>> Yes. I did try to think about situations in which this approach will
> >>> fail, but I could only find problems with concurrent removals, which I
> >>> addressed in 5/8. In case of concurrent insertions, even if you missed
> >>> the node, you would still get the ENTRY_ADD event to your listener.
> >>
> >> Theoretically a node could still be installed while the deletion event
> >> fired before registering the notifier. E.g. a synchronize_net before
> >> dumping could help here?
> > 
> > If the deletion event fired for some fib alias, then by 5/8 we are
> > guaranteed that it was already unlinked from the fib alias list in the
> > leaf in which it was contained. So, while it's possible we didn't
> > register our listener in time for the deletion event, we won't traverse
> > this fib alias while dumping the trie anyway. Did I understand you
> > correctly?
> > 
> 
> Theoretically we can have the same problem for insertion:
> 
> You receive a delete event from the notifier that is queued up first but
> the dump will still see the entry in the fib due to being managed by RCU
> (the notifier running on another CPU).
> 
> The problem is that the fib_remove_alias->hlist_del_rcu->WRITE_ONCE is
> still not strongly ordered against the local fib dump trie walk.

OK, so I believe my analysis in 5/8 was wrong. Despite CPU A invoking
fib_remove_alias() in t0 for fa1, it's possible for CPU B doing the fib
dump to see fa1 in t1, which will lead to fa1 being permanently present
in the listener's table.

Given the above, I think Dave's suggestion is the only applicable
solution. Do you agree? Any other suggestions?

Thanks

^ permalink raw reply

* Re: [PATCH net-next v3 2/3] net: fsl: Allow most drivers to be built with COMPILE_TEST
From: David Miller @ 2016-11-17 16:27 UTC (permalink / raw)
  To: arnd; +Cc: lkp, f.fainelli, kbuild-all, netdev, mw, gregory.clement,
	Shaohui.Xie
In-Reply-To: <2379973.pinmoxev18@wuerfel>

From: Arnd Bergmann <arnd@arndb.de>
Date: Thu, 17 Nov 2016 11:47:02 +0100

> On Thursday, November 17, 2016 9:38:23 AM CET Fengguang Wu wrote:
>> include/asm-generic/io.h conditionally defines iounmap() to be an
>> empty inline function, which may explain the warning on sh4.
>> 
>> General speaking, it's a false warning. The solution could be to teach
>> the robot to ignore such 'unused variable' warnings in non-x86 archs.
> 
> I think we should change the iounmap stub macro into an inline function.

Agreed.

^ permalink raw reply

* Re: [PATCH v2 net-next] net/mlx5e: remove napi_hash_del() calls
From: David Miller @ 2016-11-17 17:06 UTC (permalink / raw)
  To: eric.dumazet; +Cc: saeedm, netdev
In-Reply-To: <1479306094.8455.181.camel@edumazet-glaptop3.roam.corp.google.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 16 Nov 2016 06:21:34 -0800

> From: Eric Dumazet <edumazet@google.com>
> 
> Calling napi_hash_del() after netif_napi_del() is pointless.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied, thanks Eric.

^ permalink raw reply

* Re: Netperf UDP issue with connected sockets
From: Eric Dumazet @ 2016-11-17 16:21 UTC (permalink / raw)
  To: Jesper Dangaard Brouer; +Cc: Rick Jones, netdev
In-Reply-To: <20161117155753.17b76f5a@redhat.com>

On Thu, 2016-11-17 at 15:57 +0100, Jesper Dangaard Brouer wrote:
> On Thu, 17 Nov 2016 06:17:38 -0800
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
> > On Thu, 2016-11-17 at 14:42 +0100, Jesper Dangaard Brouer wrote:
> > 
> > > I can see that qdisc layer does not activate xmit_more in this case.
> > >   
> > 
> > Sure. Not enough pressure from the sender(s).
> > 
> > The bottleneck is not the NIC or qdisc in your case, meaning that BQL
> > limit is kept at a small value.
> > 
> > (BTW not all NIC have expensive doorbells)
> 
> I believe this NIC mlx5 (50G edition) does.
> 
> I'm seeing UDP TX of 1656017.55 pps, which is per packet:
> 2414 cycles(tsc) 603.86 ns
> 
> Perf top shows (with my own udp_flood, that avoids __ip_select_ident):
> 
>  Samples: 56K of event 'cycles', Event count (approx.): 51613832267
>    Overhead  Command        Shared Object        Symbol
>  +    8.92%  udp_flood      [kernel.vmlinux]     [k] _raw_spin_lock
>    - _raw_spin_lock
>       + 90.78% __dev_queue_xmit
>       + 7.83% dev_queue_xmit
>       + 1.30% ___slab_alloc
>  +    5.59%  udp_flood      [kernel.vmlinux]     [k] skb_set_owner_w

Does TX completion happens on same cpu ?

>  +    4.77%  udp_flood      [mlx5_core]          [k] mlx5e_sq_xmit
>  +    4.09%  udp_flood      [kernel.vmlinux]     [k] fib_table_lookup

Why fib_table_lookup() is used with connected UDP flow ???

>  +    4.00%  swapper        [mlx5_core]          [k] mlx5e_poll_tx_cq
>  +    3.11%  udp_flood      [kernel.vmlinux]     [k] __ip_route_output_key_hash

Same here, this is suspect.

>  +    2.49%  swapper        [kernel.vmlinux]     [k] __slab_free
> 
> In this setup the spinlock in __dev_queue_xmit should be uncongested.
> An uncongested spin_lock+unlock cost 32 cycles(tsc) 8.198 ns on this system.
> 
> But 8.92% of the time is spend on it, which corresponds to a cost of 215
> cycles (2414*0.0892).  This cost is too high, thus something else is
> going on... I claim this mysterious extra cost is the tailptr/doorbell.

Well, with no pressure, doorbell is triggered for each packet.

Since we can not predict that your application is going to send yet
another packet one usec later, we can not avoid this.

Note that with the patches I am working on (busypolling extentions),
we could decide to let the busypoller doing the doorbells, say one every
10 usec. (or after ~16 packets were queued)

But mlx4 uses two different NAPI for TX and RX, maybe mlx5 has the same
strategy .

^ permalink raw reply

* [PATCH] VSOCK: add loopback to virtio_transport
From: Stefan Hajnoczi @ 2016-11-17 15:49 UTC (permalink / raw)
  To: netdev
  Cc: cavery, Claudio Imbrenda, Jorgen Hansen, David S . Miller,
	Stefan Hajnoczi

The VMware VMCI transport supports loopback inside virtual machines.
This patch implements loopback for virtio-vsock.

Flow control is handled by the virtio-vsock protocol as usual.  The
sending process stops transmitting on a connection when the peer's
receive buffer space is exhausted.

Cathy Avery <cavery@redhat.com> noticed this difference between VMCI and
virtio-vsock when a test case using loopback failed.  Although loopback
isn't the main point of AF_VSOCK, it is useful for testing and
virtio-vsock must match VMCI semantics so that userspace programs run
regardless of the underlying transport.

My understanding is that loopback is not supported on the host side with
VMCI.  Follow that by implementing it only in the guest driver, not the
vhost host driver.

Cc: Jorgen Hansen <jhansen@vmware.com>
Reported-by: Cathy Avery <cavery@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 net/vmw_vsock/virtio_transport.c | 57 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 57 insertions(+)

diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index 936d7ee..f2c4071 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -44,6 +44,10 @@ struct virtio_vsock {
 	spinlock_t send_pkt_list_lock;
 	struct list_head send_pkt_list;
 
+	struct work_struct loopback_work;
+	spinlock_t loopback_list_lock;
+	struct list_head loopback_list;
+
 	atomic_t queued_replies;
 
 	/* The following fields are protected by rx_lock.  vqs[VSOCK_VQ_RX]
@@ -74,6 +78,42 @@ static u32 virtio_transport_get_local_cid(void)
 	return vsock->guest_cid;
 }
 
+static void virtio_transport_loopback_work(struct work_struct *work)
+{
+	struct virtio_vsock *vsock =
+		container_of(work, struct virtio_vsock, loopback_work);
+	LIST_HEAD(pkts);
+
+	spin_lock_bh(&vsock->loopback_list_lock);
+	list_splice_init(&vsock->loopback_list, &pkts);
+	spin_unlock_bh(&vsock->loopback_list_lock);
+
+	mutex_lock(&vsock->rx_lock);
+	while (!list_empty(&pkts)) {
+		struct virtio_vsock_pkt *pkt;
+
+		pkt = list_first_entry(&pkts, struct virtio_vsock_pkt, list);
+		list_del_init(&pkt->list);
+
+		virtio_transport_recv_pkt(pkt);
+	}
+	mutex_unlock(&vsock->rx_lock);
+}
+
+static int virtio_transport_send_pkt_loopback(struct virtio_vsock *vsock,
+					      struct virtio_vsock_pkt *pkt)
+{
+	int len = pkt->len;
+
+	spin_lock_bh(&vsock->loopback_list_lock);
+	list_add_tail(&pkt->list, &vsock->loopback_list);
+	spin_unlock_bh(&vsock->loopback_list_lock);
+
+	queue_work(virtio_vsock_workqueue, &vsock->loopback_work);
+
+	return len;
+}
+
 static void
 virtio_transport_send_pkt_work(struct work_struct *work)
 {
@@ -159,6 +199,10 @@ virtio_transport_send_pkt(struct virtio_vsock_pkt *pkt)
 		return -ENODEV;
 	}
 
+	if (le32_to_cpu(pkt->hdr.dst_cid) == vsock->guest_cid) {
+		return virtio_transport_send_pkt_loopback(vsock, pkt);
+	}
+
 	if (pkt->reply)
 		atomic_inc(&vsock->queued_replies);
 
@@ -510,10 +554,13 @@ static int virtio_vsock_probe(struct virtio_device *vdev)
 	mutex_init(&vsock->event_lock);
 	spin_lock_init(&vsock->send_pkt_list_lock);
 	INIT_LIST_HEAD(&vsock->send_pkt_list);
+	spin_lock_init(&vsock->loopback_list_lock);
+	INIT_LIST_HEAD(&vsock->loopback_list);
 	INIT_WORK(&vsock->rx_work, virtio_transport_rx_work);
 	INIT_WORK(&vsock->tx_work, virtio_transport_tx_work);
 	INIT_WORK(&vsock->event_work, virtio_transport_event_work);
 	INIT_WORK(&vsock->send_pkt_work, virtio_transport_send_pkt_work);
+	INIT_WORK(&vsock->loopback_work, virtio_transport_loopback_work);
 
 	mutex_lock(&vsock->rx_lock);
 	virtio_vsock_rx_fill(vsock);
@@ -539,6 +586,7 @@ static void virtio_vsock_remove(struct virtio_device *vdev)
 	struct virtio_vsock *vsock = vdev->priv;
 	struct virtio_vsock_pkt *pkt;
 
+	flush_work(&vsock->loopback_work);
 	flush_work(&vsock->rx_work);
 	flush_work(&vsock->tx_work);
 	flush_work(&vsock->event_work);
@@ -565,6 +613,15 @@ static void virtio_vsock_remove(struct virtio_device *vdev)
 	}
 	spin_unlock_bh(&vsock->send_pkt_list_lock);
 
+	spin_lock_bh(&vsock->loopback_list_lock);
+	while (!list_empty(&vsock->loopback_list)) {
+		pkt = list_first_entry(&vsock->loopback_list,
+				       struct virtio_vsock_pkt, list);
+		list_del(&pkt->list);
+		virtio_transport_free_pkt(pkt);
+	}
+	spin_unlock_bh(&vsock->loopback_list_lock);
+
 	mutex_lock(&the_virtio_vsock_mutex);
 	the_virtio_vsock = NULL;
 	vsock_core_exit();
-- 
2.7.4

^ permalink raw reply related

* Re: [PATCH net] ip6_tunnel: disable caching when the traffic class is inherited
From: David Miller @ 2016-11-17 17:09 UTC (permalink / raw)
  To: pabeni; +Cc: netdev, liam.mcbirnie, hannes
In-Reply-To: <67662ec875375c86e95666a7e47fc61c3a869d86.1479309929.git.pabeni@redhat.com>

From: Paolo Abeni <pabeni@redhat.com>
Date: Wed, 16 Nov 2016 16:26:46 +0100

> If an ip6 tunnel is configured to inherit the traffic class from
> the inner header, the dst_cache must be disabled or it will foul
> the policy routing.
> 
> The issue is apprently there since at leat Linux-2.6.12-rc2.
> 
> Reported-by: Liam McBirnie <liam.mcbirnie@boeing.com>
> Cc: Liam McBirnie <liam.mcbirnie@boeing.com>
> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Applied and queued up for -stable, thank you.

^ permalink raw reply

* [PATCH net 3/3] net: Check netdevice presence on dev_get_phys_port_name
From: Tariq Toukan @ 2016-11-17 15:40 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Eran Ben Elisha, Saeed Mahameed, Eugenia Emantayev,
	Tariq Toukan, Jiri Pirko
In-Reply-To: <1479397251-6932-1-git-send-email-tariqt@mellanox.com>

From: Eugenia Emantayev <eugenia@mellanox.com>

Check presence of network device before calling device driver
ndo_get_phys_port_name callback. Otherwise callback may access
non-existing data structures and cause kernel panic.

Fixes: db24a9044ee1 ('net: add support for phys_port_name')
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reported-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reported-by: Steve Wise <swise@opengridcomputing.com>
Cc: Jiri Pirko <jiri@mellanox.com>
---
 net/core/dev.c       | 2 ++
 net/core/rtnetlink.c | 2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 7de0d000a9f8..b21d77405863 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6623,6 +6623,8 @@ int dev_get_phys_port_name(struct net_device *dev,
 
 	if (!ops->ndo_get_phys_port_name)
 		return -EOPNOTSUPP;
+	if (!netif_device_present(dev))
+		return -ENODEV;
 	return ops->ndo_get_phys_port_name(dev, name, len);
 }
 EXPORT_SYMBOL(dev_get_phys_port_name);
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index b3dd81f82e70..40cd41376566 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1054,7 +1054,7 @@ static int rtnl_phys_port_name_fill(struct sk_buff *skb, struct net_device *dev)
 
 	err = dev_get_phys_port_name(dev, name, sizeof(name));
 	if (err) {
-		if (err == -EOPNOTSUPP)
+		if (err == -EOPNOTSUPP || err == -ENODEV)
 			return 0;
 		return err;
 	}
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH net 0/3] mlx4 fix for shutdown flow
From: Tariq Toukan @ 2016-11-17 15:40 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Eran Ben Elisha, Saeed Mahameed, Eugenia Emantayev,
	Tariq Toukan

This patchset fixes an invalid reference to mdev in mlx4 shutdown flow.

In patch 1, we make sure netif_device_detach() is called from shutdown flow only,
since we want to keep it present during a simple configuration change.

In patches 2 and 3, we add checks that were missing in:
* dev_get_phys_port_id
* dev_get_phys_port_name
We check the presence of the network device before calling the driver's
callbacks. This already exists for all other ndo's.

Series generated against net commit:
e5f6f564fd19 bnxt: add a missing rcu synchronization

Thanks,
Tariq.

Eugenia Emantayev (3):
  net/mlx4_en: Remove netif_device_detach from stop port flow
  net: Check netdevice presence on dev_get_phys_port_id
  net: Check netdevice presence on dev_get_phys_port_name

 drivers/net/ethernet/mellanox/mlx4/en_ethtool.c |  8 ++++----
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c  | 22 ++++++++--------------
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h    |  2 +-
 net/core/dev.c                                  |  4 ++++
 net/core/rtnetlink.c                            |  4 ++--
 5 files changed, 19 insertions(+), 21 deletions(-)

-- 
1.8.3.1

^ permalink raw reply

* [PATCH net 2/3] net: Check netdevice presence on dev_get_phys_port_id
From: Tariq Toukan @ 2016-11-17 15:40 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Eran Ben Elisha, Saeed Mahameed, Eugenia Emantayev,
	Tariq Toukan, Jiri Pirko
In-Reply-To: <1479397251-6932-1-git-send-email-tariqt@mellanox.com>

From: Eugenia Emantayev <eugenia@mellanox.com>

Check presence of network device before calling device driver
ndo_get_phys_port_id callback. Otherwise callback may access
non-existing data structures and cause kernel panic.

Fixes: 66b52b0dc82c ("net: add ndo to get id of physical port of the device")
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reported-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reported-by: Steve Wise <swise@opengridcomputing.com>
Cc: Jiri Pirko <jiri@mellanox.com>
---
 net/core/dev.c       | 2 ++
 net/core/rtnetlink.c | 2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 6666b28b6815..7de0d000a9f8 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6602,6 +6602,8 @@ int dev_get_phys_port_id(struct net_device *dev,
 
 	if (!ops->ndo_get_phys_port_id)
 		return -EOPNOTSUPP;
+	if (!netif_device_present(dev))
+		return -ENODEV;
 	return ops->ndo_get_phys_port_id(dev, ppid);
 }
 EXPORT_SYMBOL(dev_get_phys_port_id);
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index a6529c55ffb7..b3dd81f82e70 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1036,7 +1036,7 @@ static int rtnl_phys_port_id_fill(struct sk_buff *skb, struct net_device *dev)
 
 	err = dev_get_phys_port_id(dev, &ppid);
 	if (err) {
-		if (err == -EOPNOTSUPP)
+		if (err == -EOPNOTSUPP || err == -ENODEV)
 			return 0;
 		return err;
 	}
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH net 1/3] net/mlx4_en: Remove netif_device_detach from stop port flow
From: Tariq Toukan @ 2016-11-17 15:40 UTC (permalink / raw)
  To: David S. Miller
  Cc: netdev, Eran Ben Elisha, Saeed Mahameed, Eugenia Emantayev,
	Tariq Toukan
In-Reply-To: <1479397251-6932-1-git-send-email-tariqt@mellanox.com>

From: Eugenia Emantayev <eugenia@mellanox.com>

netif_device_detach() should be called from shutdown flow only,
in any other scenario netif_device_detach is not needed and may
result in -ENODEV error in certain cases.
In order to prevent TX timeout issue during heavy CPU load
netif_carrier_off will be called.

Fixes: 3484aac16149 ("net/mlx4_en: Fix transmit timeout when driver restarts port")
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_ethtool.c |  8 ++++----
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c  | 22 ++++++++--------------
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h    |  2 +-
 3 files changed, 13 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
index bdda17d2ea0f..3bb30f66aa07 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
@@ -932,7 +932,7 @@ static __be32 speed_set_ptys_admin(struct mlx4_en_priv *priv, u32 speed,
 	mutex_lock(&priv->mdev->state_lock);
 	if (priv->port_up) {
 		en_warn(priv, "Port link mode changed, restarting port...\n");
-		mlx4_en_stop_port(dev, 1);
+		mlx4_en_stop_port(dev);
 		if (mlx4_en_start_port(dev))
 			en_err(priv, "Failed restarting port %d\n", priv->port);
 	}
@@ -1077,7 +1077,7 @@ static int mlx4_en_set_ringparam(struct net_device *dev,
 
 	if (priv->port_up) {
 		port_up = 1;
-		mlx4_en_stop_port(dev, 1);
+		mlx4_en_stop_port(dev);
 	}
 
 	mlx4_en_safe_replace_resources(priv, tmp);
@@ -1205,7 +1205,7 @@ static int mlx4_en_set_rxfh(struct net_device *dev, const u32 *ring_index,
 	mutex_lock(&mdev->state_lock);
 	if (priv->port_up) {
 		port_up = 1;
-		mlx4_en_stop_port(dev, 1);
+		mlx4_en_stop_port(dev);
 	}
 
 	if (ring_index)
@@ -1751,7 +1751,7 @@ static int mlx4_en_set_channels(struct net_device *dev,
 
 	if (priv->port_up) {
 		port_up = 1;
-		mlx4_en_stop_port(dev, 1);
+		mlx4_en_stop_port(dev);
 	}
 
 	mlx4_en_safe_replace_resources(priv, tmp);
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index 3a47e83d3e07..01a680b66177 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -1741,8 +1741,7 @@ int mlx4_en_start_port(struct net_device *dev)
 		napi_schedule(&priv->rx_cq[i]->napi);
 
 	netif_tx_start_all_queues(dev);
-	netif_device_attach(dev);
-
+	netif_carrier_on(dev);
 	return 0;
 
 tx_err:
@@ -1767,7 +1766,7 @@ int mlx4_en_start_port(struct net_device *dev)
 }
 
 
-void mlx4_en_stop_port(struct net_device *dev, int detach)
+void mlx4_en_stop_port(struct net_device *dev)
 {
 	struct mlx4_en_priv *priv = netdev_priv(dev);
 	struct mlx4_en_dev *mdev = priv->mdev;
@@ -1785,12 +1784,7 @@ void mlx4_en_stop_port(struct net_device *dev, int detach)
 	mlx4_CLOSE_PORT(mdev->dev, priv->port);
 
 	/* Synchronize with tx routine */
-	netif_tx_lock_bh(dev);
-	if (detach)
-		netif_device_detach(dev);
-	netif_tx_stop_all_queues(dev);
-	netif_tx_unlock_bh(dev);
-
+	netif_carrier_off(dev);
 	netif_tx_disable(dev);
 
 	/* Set port as not active */
@@ -1903,7 +1897,7 @@ static void mlx4_en_restart(struct work_struct *work)
 	rtnl_lock();
 	mutex_lock(&mdev->state_lock);
 	if (priv->port_up) {
-		mlx4_en_stop_port(dev, 1);
+		mlx4_en_stop_port(dev);
 		if (mlx4_en_start_port(dev))
 			en_err(priv, "Failed restarting port %d\n", priv->port);
 	}
@@ -1987,7 +1981,7 @@ static int mlx4_en_close(struct net_device *dev)
 
 	mutex_lock(&mdev->state_lock);
 
-	mlx4_en_stop_port(dev, 0);
+	mlx4_en_stop_port(dev);
 	netif_carrier_off(dev);
 
 	mutex_unlock(&mdev->state_lock);
@@ -2231,7 +2225,7 @@ static int mlx4_en_change_mtu(struct net_device *dev, int new_mtu)
 			 * the port */
 			en_dbg(DRV, priv, "Change MTU called with card down!?\n");
 		} else {
-			mlx4_en_stop_port(dev, 1);
+			mlx4_en_stop_port(dev);
 			err = mlx4_en_start_port(dev);
 			if (err) {
 				en_err(priv, "Failed restarting port:%d\n",
@@ -2687,7 +2681,7 @@ static int mlx4_xdp_set(struct net_device *dev, struct bpf_prog *prog)
 	mutex_lock(&mdev->state_lock);
 	if (priv->port_up) {
 		port_up = 1;
-		mlx4_en_stop_port(dev, 1);
+		mlx4_en_stop_port(dev);
 	}
 
 	priv->xdp_ring_num = xdp_ring_num;
@@ -3406,7 +3400,7 @@ int mlx4_en_reset_config(struct net_device *dev,
 
 	if (priv->port_up) {
 		port_up = 1;
-		mlx4_en_stop_port(dev, 1);
+		mlx4_en_stop_port(dev);
 	}
 
 	en_warn(priv, "Changing device configuration rx filter(%x) rx vlan(%x)\n",
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index a3528dd1e72e..fb17acdfe528 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -659,7 +659,7 @@ int mlx4_en_init_netdev(struct mlx4_en_dev *mdev, int port,
 			struct mlx4_en_port_profile *prof);
 
 int mlx4_en_start_port(struct net_device *dev);
-void mlx4_en_stop_port(struct net_device *dev, int detach);
+void mlx4_en_stop_port(struct net_device *dev);
 
 void mlx4_en_set_stats_bitmap(struct mlx4_dev *dev,
 			      struct mlx4_en_stats_bitmap *stats_bitmap,
-- 
1.8.3.1

^ permalink raw reply related

* Re: [patch] amd-xgbe: Signedness bug in xgbe_phy_link_status()
From: Tom Lendacky @ 2016-11-17 14:40 UTC (permalink / raw)
  To: Dan Carpenter; +Cc: netdev, kernel-janitors
In-Reply-To: <20161117105932.GA32143@mwanda>

On 11/17/2016 4:59 AM, Dan Carpenter wrote:
> "ret" needs to be signed for the error handling to work.
> 
> Fixes: abf0a1c2b26a ("amd-xgbe: Add support for SFP+ modules")
> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
> 

Hi Dan,

This was already identified and patched:
8c5385cbb036 ("amd-xgbe: Fix up some coccinelle identified warnings")

Thanks,
Tom

> diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c b/drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c
> index 4ba4332..a2559c2 100644
> --- a/drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c
> +++ b/drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c
> @@ -2346,7 +2346,8 @@ static bool xgbe_phy_valid_speed(struct xgbe_prv_data *pdata, int speed)
>  static int xgbe_phy_link_status(struct xgbe_prv_data *pdata, int *an_restart)
>  {
>  	struct xgbe_phy_data *phy_data = pdata->phy_data;
> -	unsigned int ret, reg;
> +	unsigned int reg;
> +	int ret;
>  
>  	*an_restart = 0;
>  
> 

^ permalink raw reply

* Re: Netperf UDP issue with connected sockets
From: Eric Dumazet @ 2016-11-17 13:20 UTC (permalink / raw)
  To: Jesper Dangaard Brouer; +Cc: Rick Jones, netdev
In-Reply-To: <20161117091638.5fab8494@redhat.com>

On Thu, 2016-11-17 at 09:16 +0100, Jesper Dangaard Brouer wrote:

> 
> I noticed there is a Send-Q, and the perf-top2 is _raw_spin_lock, which
> looks like it comes from __dev_queue_xmit(), but we know from
> experience that this stall is actually caused by writing the
> tailptr/doorbell in the HW.  Thus, this could benefit a lot from
> bulk/xmit_more into the qdisc layer.

The Send-Q is there because of TX-completions being delayed a bit,
because of IRQ mitigation.

(ethtool -c eth0)

It happens even if you do not have a qdisc in the first place.

And we do have xmit_more in the qdisc layer already.

^ permalink raw reply

* Re: [PATCH net v2] net: nsid cannot be allocated for a dead netns
From: Cong Wang @ 2016-11-17 16:50 UTC (permalink / raw)
  To: Nicolas Dichtel
  Cc: David Miller, Andrey Wagin, Linux Kernel Network Developers
In-Reply-To: <0fb84877-d43d-736a-c5a8-1ca954f90925@6wind.com>

On Thu, Nov 17, 2016 at 12:41 AM, Nicolas Dichtel
<nicolas.dichtel@6wind.com> wrote:
> Le 17/11/2016 à 07:32, Cong Wang a écrit :
> [snip]
>> since Nicolas' patch doesn't even compile...
> It's always surprising how agressive you are, really :(
> Can you show me the compilation error of this patch (we are talking about the v2
> patch here)?

Sorry about that, I didn't event look at your v2, because your patch
(no matter v2 or v1) is already wrong to me, the idr_for_each() before
alloc_netid() is clearly a use-after-destroy.

Based on the same reason, my patch is not your v3, we are patching
different places.

^ permalink raw reply

* Re: [patch net-next 6/8] ipv4: fib: Add an API to request a FIB dump
From: Hannes Frederic Sowa @ 2016-11-17 14:36 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: Jiri Pirko, netdev, davem, idosch, eladr, yotamg, nogahf, arkadis,
	ogerlitz, roopa, dsa, nikolay, andy, vivien.didelot, andrew,
	f.fainelli, alexander.h.duyck, kuznet, jmorris, yoshfuji, kaber
In-Reply-To: <20161117131049.pevkla5os75h4zlc@splinter.mtl.com>

On 17.11.2016 14:10, Ido Schimmel wrote:
> Hi Hannes,
> 
> On Wed, Nov 16, 2016 at 08:43:25PM +0100, Hannes Frederic Sowa wrote:
>> On 16.11.2016 19:51, Ido Schimmel wrote:
>>> On Wed, Nov 16, 2016 at 06:35:45PM +0100, Hannes Frederic Sowa wrote:
>>>> On 16.11.2016 16:18, Ido Schimmel wrote:
>>>>> On Wed, Nov 16, 2016 at 03:51:01PM +0100, Hannes Frederic Sowa wrote:
>>>>>> On 16.11.2016 15:09, Jiri Pirko wrote:
>>>>>>> From: Ido Schimmel <idosch@mellanox.com>
>>>>>>>
>>>>>>> Commit b90eb7549499 ("fib: introduce FIB notification infrastructure")
>>>>>>> introduced a new notification chain to notify listeners (f.e., switchdev
>>>>>>> drivers) about addition and deletion of routes.
>>>>>>>
>>>>>>> However, upon registration to the chain the FIB tables can already be
>>>>>>> populated, which means potential listeners will have an incomplete view
>>>>>>> of the tables.
>>>>>>>
>>>>>>> Solve that by adding an API to request a FIB dump. The dump itself it
>>>>>>> done using RCU in order not to starve consumers that need RTNL to make
>>>>>>> progress.
>>>>>>>
>>>>>>> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
>>>>>>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>>>>>>
>>>>>> Have you looked at potential inconsistencies resulting of RCU walking
>>>>>> the table and having concurrent inserts?
>>>>>
>>>>> Yes. I did try to think about situations in which this approach will
>>>>> fail, but I could only find problems with concurrent removals, which I
>>>>> addressed in 5/8. In case of concurrent insertions, even if you missed
>>>>> the node, you would still get the ENTRY_ADD event to your listener.
>>>>
>>>> Theoretically a node could still be installed while the deletion event
>>>> fired before registering the notifier. E.g. a synchronize_net before
>>>> dumping could help here?
>>>
>>> If the deletion event fired for some fib alias, then by 5/8 we are
>>> guaranteed that it was already unlinked from the fib alias list in the
>>> leaf in which it was contained. So, while it's possible we didn't
>>> register our listener in time for the deletion event, we won't traverse
>>> this fib alias while dumping the trie anyway. Did I understand you
>>> correctly?
>>>
>>
>> Theoretically we can have the same problem for insertion:
>>
>> You receive a delete event from the notifier that is queued up first but
>> the dump will still see the entry in the fib due to being managed by RCU
>> (the notifier running on another CPU).
>>
>> The problem is that the fib_remove_alias->hlist_del_rcu->WRITE_ONCE is
>> still not strongly ordered against the local fib dump trie walk.
> 
> OK, so I believe my analysis in 5/8 was wrong. Despite CPU A invoking
> fib_remove_alias() in t0 for fa1, it's possible for CPU B doing the fib
> dump to see fa1 in t1, which will lead to fa1 being permanently present
> in the listener's table.

Yep. :(

Also the delayed work queue is only partial ordered (only on one CPU),
so you don't know about when an event is processed from the dump and the
notifier. I think you need to create your own workqueue for doing so.

> Given the above, I think Dave's suggestion is the only applicable
> solution. Do you agree? Any other suggestions?

As I wrote before the problem with seqcounter is, that with a quagga
running on top and pushing updates you end up never getting a stable
view, so maybe some logic to abort in case it cannot be loaded after a
few tries would be fine.

The other way is the journal idea I had, which uses an rb-tree with
timestamps as keys (can be lamport timestamps). You insert into the tree
until the dump is finished and use it as queue later to shuffle stuff
into the hardware.

Because you should be able to hold refs to the fa_info or other structs
directly, I don't expect gigantic memory overhead, just for a cell to
store timetamp and pointer (rb_node).

Bye,
Hannes

^ permalink raw reply

* [PATCH net-next] amd-xgbe: Update connection validation for backplane mode
From: Tom Lendacky @ 2016-11-17 14:43 UTC (permalink / raw)
  To: netdev; +Cc: David Miller

Update the connection type enumeration for backplane mode and return
an error when there is a mismatch between the mode and the connection
type.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c b/drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c
index 348cc8c..9d8c9530 100644
--- a/drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-phy-v2.c
@@ -164,6 +164,7 @@ enum xgbe_conn_type {
 	XGBE_CONN_TYPE_NONE = 0,
 	XGBE_CONN_TYPE_SFP,
 	XGBE_CONN_TYPE_MDIO,
+	XGBE_CONN_TYPE_RSVD1,
 	XGBE_CONN_TYPE_BACKPLANE,
 	XGBE_CONN_TYPE_MAX,
 };
@@ -2831,6 +2832,7 @@ static int xgbe_phy_init(struct xgbe_prv_data *pdata)
 	if (xgbe_phy_conn_type_mismatch(pdata)) {
 		dev_err(pdata->dev, "phy mode/connection mismatch (%#x/%#x)\n",
 			phy_data->port_mode, phy_data->conn_type);
+		return -EINVAL;
 	}
 
 	/* Validate the mode requested */

^ permalink raw reply related

* Re: Netperf UDP issue with connected sockets
From: Jesper Dangaard Brouer @ 2016-11-17 13:42 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Rick Jones, netdev, brouer
In-Reply-To: <1479388850.8455.240.camel@edumazet-glaptop3.roam.corp.google.com>

On Thu, 17 Nov 2016 05:20:50 -0800
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> On Thu, 2016-11-17 at 09:16 +0100, Jesper Dangaard Brouer wrote:
> 
> > 
> > I noticed there is a Send-Q, and the perf-top2 is _raw_spin_lock, which
> > looks like it comes from __dev_queue_xmit(), but we know from
> > experience that this stall is actually caused by writing the
> > tailptr/doorbell in the HW.  Thus, this could benefit a lot from
> > bulk/xmit_more into the qdisc layer.  
> 
> The Send-Q is there because of TX-completions being delayed a bit,
> because of IRQ mitigation.
> 
> (ethtool -c eth0)
> 
> It happens even if you do not have a qdisc in the first place.
> 
> And we do have xmit_more in the qdisc layer already.

I can see that qdisc layer does not activate xmit_more in this case.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer


$ ethtool -c mlx5p4
Coalesce parameters for mlx5p4:
Adaptive RX: on  TX: off
stats-block-usecs: 0
sample-interval: 0
pkt-rate-low: 0
pkt-rate-high: 0

rx-usecs: 3
rx-frames: 32
rx-usecs-irq: 0
rx-frames-irq: 0

tx-usecs: 16
tx-frames: 32
tx-usecs-irq: 0
tx-frames-irq: 0

rx-usecs-low: 0
rx-frame-low: 0
tx-usecs-low: 0
tx-frame-low: 0

rx-usecs-high: 0
rx-frame-high: 0
tx-usecs-high: 0
tx-frame-high: 0

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox