Netdev List

* Re: [PATCH net-next 0/6] Fixes for the MV88e6xxx interrupt code
From: David Miller @ 2016-11-16 19:29 UTC
  To: andrew; +Cc: vivien.didelot, netdev
In-Reply-To: <20161116.142102.1586244273660874282.davem@davemloft.net>

From: David Miller <davem@davemloft.net>
Date: Wed, 16 Nov 2016 14:21:02 -0500 (EST)

> From: Andrew Lunn <andrew@lunn.ch>
> Date: Wed, 16 Nov 2016 01:56:50 +0100
> 
>> The interrupt code was never tested with a board whose probing
>> resulted in an -EPROBE_DEFER. So the clean up paths never got
>> tested. I now do have -EPROBE_DEFER, and things break badly during
>> cleanup. These are the fixes.
>> 
>> This is fixing code in net-next.
> 
> Series applied, thanks Andrew.

Actually, I reverted, there is a bug.

Take a look at how the 'device_irq' local variable is used in
mv88e6xxx_g2_irq_setup.  You assign it to 'err' in an error
path before it is ever set to anything.

I think you meant to use the structure's 'device_irq' member
instead.
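
The pattern described above can be sketched in a few lines of standalone C. This is a minimal illustration, not the driver's code: `struct chip_sketch`, `find_irq()` and `g2_irq_setup_sketch()` are hypothetical names, and `-ENODEV` stands in for whatever error the real lookup returns. The point is that the error path reads the structure member that was just assigned, rather than a same-named local that was never written.

```c
#include <errno.h>

/* Hypothetical stand-in for the driver's chip structure. */
struct chip_sketch {
	int device_irq;	/* IRQ number, or a negative errno on failure */
};

/* Hypothetical lookup: stores its result in the structure member. */
static void find_irq(struct chip_sketch *chip, int fail)
{
	chip->device_irq = fail ? -ENODEV : 42;
}

/* Returns 0 on success, or the negative errno held in the member. */
int g2_irq_setup_sketch(struct chip_sketch *chip, int fail)
{
	int err;

	find_irq(chip, fail);
	if (chip->device_irq < 0) {
		/* Read the member that was just set. A local variable
		 * of the same name that is declared but never assigned
		 * would hold an indeterminate value here.
		 */
		err = chip->device_irq;
		goto out;
	}
	err = 0;
out:
	return err;
}
```

Building with `-Wall` (which enables `-Wuninitialized` in GCC) usually flags the buggy variant of this pattern.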

* Re: [PATCH] net: dsa: mv88e6xxx: Respect SPEED_UNFORCED, don't set force bit
From: David Miller @ 2016-11-16 19:34 UTC
  To: andrew; +Cc: vivien.didelot, netdev
In-Reply-To: <1479266808-10957-1-git-send-email-andrew@lunn.ch>

From: Andrew Lunn <andrew@lunn.ch>
Date: Wed, 16 Nov 2016 04:26:48 +0100

> SPEED_UNFORCED indicates the MAC & PHY should perform
> auto-negotiation to determine a speed which works. If this is called
> for, don't set the force bit. If it is set, the MAC actually does
> 10Gbps, which the internal PHYs don't support.
> 
> Signed-off-by: Andrew Lunn <andrew@lunn.ch>

What tree is this for?  This is a fix but the patch doesn't apply to
'net'.
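
The behaviour the patch description asks for can be sketched as follows. This is only an illustration under stated assumptions: the register layout, the bit positions and every `SKETCH_`/`sketch_` name are hypothetical and are not taken from the mv88e6xxx driver or datasheet.

```c
#include <stdint.h>

/* Hypothetical names and bit positions, chosen only for this sketch. */
#define SKETCH_SPEED_UNFORCED	(-2)
#define SKETCH_SPEED_FIELD	0x0003u	/* 2-bit speed selector */
#define SKETCH_FORCE_SPEED	0x0004u	/* overrides auto-negotiation */

/* Returns the new control value for a speed selector (0..3), or for
 * SKETCH_SPEED_UNFORCED.
 */
uint16_t sketch_port_set_speed(uint16_t reg, int speed)
{
	/* Start from a value with both the selector and force bit clear. */
	reg &= (uint16_t)~(SKETCH_SPEED_FIELD | SKETCH_FORCE_SPEED);

	if (speed == SKETCH_SPEED_UNFORCED)
		return reg;	/* force bit stays clear: auto-negotiate */

	reg |= (uint16_t)((unsigned int)speed & SKETCH_SPEED_FIELD);
	reg |= SKETCH_FORCE_SPEED;
	return reg;
}
```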

* Re: [patch net-next 6/8] ipv4: fib: Add an API to request a FIB dump
From: Hannes Frederic Sowa @ 2016-11-16 19:43 UTC
  To: Ido Schimmel
  Cc: Jiri Pirko, netdev, davem, idosch, eladr, yotamg, nogahf, arkadis,
	ogerlitz, roopa, dsa, nikolay, andy, vivien.didelot, andrew,
	f.fainelli, alexander.h.duyck, kuznet, jmorris, yoshfuji, kaber
In-Reply-To: <20161116185103.h3hio4pyrlk2xeol@splinter>

On 16.11.2016 19:51, Ido Schimmel wrote:
> Hi,
> 
> On Wed, Nov 16, 2016 at 06:35:45PM +0100, Hannes Frederic Sowa wrote:
>> On 16.11.2016 16:18, Ido Schimmel wrote:
>>> On Wed, Nov 16, 2016 at 03:51:01PM +0100, Hannes Frederic Sowa wrote:
>>>> On 16.11.2016 15:09, Jiri Pirko wrote:
>>>>> From: Ido Schimmel <idosch@mellanox.com>
>>>>>
>>>>> Commit b90eb7549499 ("fib: introduce FIB notification infrastructure")
>>>>> introduced a new notification chain to notify listeners (f.e., switchdev
>>>>> drivers) about addition and deletion of routes.
>>>>>
>>>>> However, upon registration to the chain the FIB tables can already be
>>>>> populated, which means potential listeners will have an incomplete view
>>>>> of the tables.
>>>>>
>>>>> Solve that by adding an API to request a FIB dump. The dump itself is
>>>>> done using RCU in order not to starve consumers that need RTNL to make
>>>>> progress.
>>>>>
>>>>> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
>>>>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>>>>
>>>> Have you looked at potential inconsistencies resulting from RCU walking
>>>> the table and having concurrent inserts?
>>>
>>> Yes. I did try to think about situations in which this approach will
>>> fail, but I could only find problems with concurrent removals, which I
>>> addressed in 5/8. In case of concurrent insertions, even if you missed
>>> the node, you would still get the ENTRY_ADD event to your listener.
>>
>> Theoretically a node could still be installed while the deletion event
>> fired before registering the notifier. E.g. a synchronize_net before
>> dumping could help here?
> 
> If the deletion event fired for some fib alias, then by 5/8 we are
> guaranteed that it was already unlinked from the fib alias list in the
> leaf in which it was contained. So, while it's possible we didn't
> register our listener in time for the deletion event, we won't traverse
> this fib alias while dumping the trie anyway. Did I understand you
> correctly?
> 

Theoretically we can have the same problem for insertion:

You receive a delete event from the notifier that is queued up first but
the dump will still see the entry in the fib due to being managed by RCU
(the notifier running on another CPU).

The problem is that the fib_remove_alias->hlist_del_rcu->WRITE_ONCE is
still not strongly ordered against the local fib dump trie walk.

>> I don't know how you prepare the data structures for inserting it into
>> the hardware, but if ordering matters, the notifier for a delete event
>> can be called before the dump installed the fib entry?
> 
> Right. It's possible for the listener to receive a deletion event for a
> fib entry it doesn't have, in which case it should just ignore it (as
> current listeners do).

Yep, for this specific case.

Bye,
Hannes
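
The one case both sides agree on, a listener that ignores a deletion event for an entry it never had, can be sketched like this. It is a minimal illustration with hypothetical names (`sketch_table`, `sketch_listener()`), not the switchdev listener code. It deliberately does not address the ordering concern raised above: if the delete event is processed first and the dump then replays the entry as an add, this listener ends up holding a stale route.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_ROUTES 8

enum sketch_event { SKETCH_ENTRY_ADD, SKETCH_ENTRY_DEL };

/* Hypothetical listener-side shadow of the FIB, keyed by prefix. */
struct sketch_table {
	uint32_t prefix[SKETCH_MAX_ROUTES];
	bool used[SKETCH_MAX_ROUTES];
};

/* Returns the slot holding prefix, or -1 if it is not present. */
int sketch_find(const struct sketch_table *t, uint32_t prefix)
{
	for (int i = 0; i < SKETCH_MAX_ROUTES; i++)
		if (t->used[i] && t->prefix[i] == prefix)
			return i;
	return -1;
}

/* Returns 0 if the event was applied or safely ignored, -1 if full. */
int sketch_listener(struct sketch_table *t, enum sketch_event ev,
		    uint32_t prefix)
{
	int i = sketch_find(t, prefix);

	if (ev == SKETCH_ENTRY_DEL) {
		if (i >= 0)
			t->used[i] = false;
		return 0;	/* unknown entry: nothing to undo */
	}
	if (i >= 0)
		return 0;	/* already known, e.g. seen by dump and event */
	for (i = 0; i < SKETCH_MAX_ROUTES; i++) {
		if (!t->used[i]) {
			t->used[i] = true;
			t->prefix[i] = prefix;
			return 0;
		}
	}
	return -1;
}
```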

* Re: [PATCH net-next 0/6] Fixes for the MV88e6xxx interrupt code
From: Andrew Lunn @ 2016-11-16 19:45 UTC
  To: David Miller; +Cc: vivien.didelot, netdev
In-Reply-To: <20161116.142916.1432429487379929901.davem@davemloft.net>

> Take a look at how the 'device_irq' local variable is used in
> mv88e6xxx_g2_irq_setup.  You assign it to 'err' in an error
> path before it is ever set to anything.
> 
> I think you meant to use the structure's 'device_irq' member
> instead.

Hi David

Agreed. Thanks for the review.

	Andrew

* Re: [PATCH net-next v3 2/3] net: fsl: Allow most drivers to be built with COMPILE_TEST
From: Florian Fainelli @ 2016-11-16 19:52 UTC
  To: kbuild test robot
  Cc: kbuild-all, netdev, davem, mw, arnd, gregory.clement, Shaohui.Xie
In-Reply-To: <201611161135.ksuIHp17%fengguang.wu@intel.com>

On 11/15/2016 07:23 PM, kbuild test robot wrote:
> Hi Florian,
> 
> [auto build test WARNING on net-next/master]
> 
> url:    https://github.com/0day-ci/linux/commits/Florian-Fainelli/net-gianfar_ptp-Rename-FS-bit-to-FIPERST/20161116-095805
> config: sh-allmodconfig (attached as .config)
> compiler: sh4-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705
> reproduce:
>         wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
>         chmod +x ~/bin/make.cross
>         # save the attached .config to linux build tree
>         make.cross ARCH=sh 
> 
> All warnings (new ones prefixed by >>):
> 
>    drivers/net/ethernet/freescale/fsl_pq_mdio.c: In function 'fsl_pq_mdio_remove':
>>> drivers/net/ethernet/freescale/fsl_pq_mdio.c:498:27: warning: unused variable 'priv' [-Wunused-variable]
>      struct fsl_pq_mdio_priv *priv = bus->priv;

Hmm, this looks bogus; the variable is used, see below:

>                               ^~~~
> 
> vim +/priv +498 drivers/net/ethernet/freescale/fsl_pq_mdio.c
> 
> 1577ecef drivers/net/fsl_pq_mdio.c                    Andy Fleming    2009-02-04  482  	return 0;
> 1577ecef drivers/net/fsl_pq_mdio.c                    Andy Fleming    2009-02-04  483  
> dd3b8a32 drivers/net/ethernet/freescale/fsl_pq_mdio.c Timur Tabi      2012-08-29  484  error:
> dd3b8a32 drivers/net/ethernet/freescale/fsl_pq_mdio.c Timur Tabi      2012-08-29  485  	if (priv->map)
> b3319b10 drivers/net/fsl_pq_mdio.c                    Anton Vorontsov 2009-12-30  486  		iounmap(priv->map);
> dd3b8a32 drivers/net/ethernet/freescale/fsl_pq_mdio.c Timur Tabi      2012-08-29  487  
> 1577ecef drivers/net/fsl_pq_mdio.c                    Andy Fleming    2009-02-04  488  	kfree(new_bus);
> dd3b8a32 drivers/net/ethernet/freescale/fsl_pq_mdio.c Timur Tabi      2012-08-29  489  
> 1577ecef drivers/net/fsl_pq_mdio.c                    Andy Fleming    2009-02-04  490  	return err;
> 1577ecef drivers/net/fsl_pq_mdio.c                    Andy Fleming    2009-02-04  491  }
> 1577ecef drivers/net/fsl_pq_mdio.c                    Andy Fleming    2009-02-04  492  
> 1577ecef drivers/net/fsl_pq_mdio.c                    Andy Fleming    2009-02-04  493  
> 5078ac79 drivers/net/ethernet/freescale/fsl_pq_mdio.c Timur Tabi      2012-08-29  494  static int fsl_pq_mdio_remove(struct platform_device *pdev)
> 1577ecef drivers/net/fsl_pq_mdio.c                    Andy Fleming    2009-02-04  495  {
> 5078ac79 drivers/net/ethernet/freescale/fsl_pq_mdio.c Timur Tabi      2012-08-29  496  	struct device *device = &pdev->dev;
> 1577ecef drivers/net/fsl_pq_mdio.c                    Andy Fleming    2009-02-04  497  	struct mii_bus *bus = dev_get_drvdata(device);
> b3319b10 drivers/net/fsl_pq_mdio.c                    Anton Vorontsov 2009-12-30 @498  	struct fsl_pq_mdio_priv *priv = bus->priv;
> 1577ecef drivers/net/fsl_pq_mdio.c                    Andy Fleming    2009-02-04  499  
> 1577ecef drivers/net/fsl_pq_mdio.c                    Andy Fleming    2009-02-04  500  	mdiobus_unregister(bus);
> 1577ecef drivers/net/fsl_pq_mdio.c                    Andy Fleming    2009-02-04  501  
> b3319b10 drivers/net/fsl_pq_mdio.c                    Anton Vorontsov 2009-12-30  502  	iounmap(priv->map);

Right here.

What compiler version is this?
-- 
Florian

* Re: [PATCH net-next v3 3/3] net: marvell: Allow drivers to be built with COMPILE_TEST
From: Florian Fainelli @ 2016-11-16 20:06 UTC
  To: kbuild test robot, gregory.clement, thomas.petazzoni, mw
  Cc: kbuild-all, netdev, davem, mw, arnd, Shaohui.Xie
In-Reply-To: <201611170244.EQJKm0tn%fengguang.wu@intel.com>

On 11/16/2016 11:04 AM, kbuild test robot wrote:
> Hi Florian,
> 
> [auto build test WARNING on net-next/master]
> 
> url:    https://github.com/0day-ci/linux/commits/Florian-Fainelli/net-gianfar_ptp-Rename-FS-bit-to-FIPERST/20161116-095805
> config: s390-allyesconfig (attached as .config)
> compiler: s390x-linux-gnu-gcc (Debian 6.1.1-9) 6.1.1 20160705
> reproduce:
>         wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
>         chmod +x ~/bin/make.cross
>         # save the attached .config to linux build tree
>         make.cross ARCH=s390 
> 
> All warnings (new ones prefixed by >>):

While we could fix some of these warnings for 64-bit architectures, the
mvneta and mvpp2 drivers would not work there anyway since they assume
physical addresses will always be 32-bit wide and cast such addresses
accordingly.

Should we still silence these warnings?
-- 
Florian
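
Why a 32-bit cast of a physical address is a functional problem on 64-bit systems, and not just a warning, can be shown with a small standalone check. `fits_in_u32()` is a hypothetical helper written for this sketch; it is not part of the mvneta or mvpp2 drivers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true when a physical address survives a round trip
 * through a 32-bit variable.
 */
bool fits_in_u32(uint64_t phys_addr)
{
	uint32_t narrowed = (uint32_t)phys_addr;	/* drops bits 32..63 */

	return (uint64_t)narrowed == phys_addr;
}
```

An address just below 4 GiB such as `0xfffff000` passes, while `0x100001000` would be silently truncated to `0x1000`.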

* Re: [PATCH net] virtio-net: add a missing synchronize_net()
From: David Miller @ 2016-11-16 20:12 UTC
  To: eric.dumazet; +Cc: netdev, jasowang, mst
In-Reply-To: <1479277452.8455.156.camel@edumazet-glaptop3.roam.corp.google.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 15 Nov 2016 22:24:12 -0800

> From: Eric Dumazet <edumazet@google.com>
> 
> It seems many drivers do not respect napi_hash_del() contract.
> 
> When napi_hash_del() is used before netif_napi_del(), an RCU grace
> period is needed before freeing NAPI object.
> 
> Fixes: 91815639d880 ("virtio-net: rx busy polling support")
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied and queued up for -stable, thanks Eric.

* Re: [BUG] X86: Removing inline decl on arch/x86/include/asm/desc.h.
From: Eric Dumazet @ 2016-11-16 20:15 UTC
  To: Thomas Gleixner
  Cc: netdev, Ingo Molnar, H. Peter Anvin,
	Realtek linux nic maintainers, x86, linux-kernel
In-Reply-To: <alpine.DEB.2.20.1611162005320.3697@nanos>

On Wed, 2016-11-16 at 20:16 +0100, Thomas Gleixner wrote:
> On Tue, 15 Nov 2016, Corcodel Marian wrote:
> >  Inline declarations suppress warning messages from the compiler, but
> >  most of these functions were declared static and are not used locally in the file.
> 
> Huch? This is a header file and the functions are marked inline on purpose.
> 
> Can you please explain what you are trying to achieve and why you think
> that this is a good idea?

Corcodel Marian is a bot.
Do not bother, Thomas.
Total waste of time.

https://www.spinics.net/lists/netdev/msg370788.html

https://www.mail-archive.com/netdev@vger.kernel.org/msg103775.html

* Re: [patch net-next] mlxsw: spectrum_router: Adjust placement of FIB abort warning
From: David Miller @ 2016-11-16 20:18 UTC
  To: jiri; +Cc: netdev, idosch, eladr, yotamg, nogahf, arkadis, ogerlitz
In-Reply-To: <1479286318-6115-1-git-send-email-jiri@resnulli.us>

From: Jiri Pirko <jiri@resnulli.us>
Date: Wed, 16 Nov 2016 09:51:58 +0100

> From: Ido Schimmel <idosch@mellanox.com>
> 
> The recent merge commit bb598c1b8c9b ("Merge
> git://git.kernel.org/pub/scm/linux/kernel/git/davem/net") would cause
> the FIB abort warning to fire whenever we flush the FIB tables - either
> during module removal or actual abort.
> 
> Move it back to its rightful location in the FIB abort function.
> 
> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> Signed-off-by: Jiri Pirko <jiri@mellanox.com>

Applied, thanks for fixing this up.

* Re: [PATCH net][v2] bpf: fix range arithmetic for bpf map access
From: Josef Bacik @ 2016-11-16 20:25 UTC
  To: Jann Horn
  Cc: Alexei Starovoitov, Alexei Starovoitov, Daniel Borkmann,
	David S. Miller, netdev
In-Reply-To: <CAG48ez2XnOQAc440ErNBATfL8uN1VOmV4M3BJKDW9s9PvWFOtg@mail.gmail.com>

On 11/16/2016 01:41 PM, Jann Horn wrote:
> On Tue, Nov 15, 2016 at 3:20 PM, Josef Bacik <jbacik@fb.com> wrote:
>> On 11/15/2016 08:47 AM, Jann Horn wrote:
>>> In states_equal():
>>> if (rold->type == NOT_INIT ||
>>>    (rold->type == UNKNOWN_VALUE && rcur->type != NOT_INIT))
>>> <------------
>>> continue;
>>>
>>> I think this is broken in code like the following:
>>>
>>> int value;
>>> if (condition) {
>>>   value = 1; // visited first by verifier
>>> } else {
>>>   value = 1000000; // visited second by verifier
>>> }
>>> int dummy = 1; // states seem to converge here, but actually don't
>>> map[value] = 1234;
>>>
>>> `value` would be an UNKNOWN_VALUE for both paths, right? So
>>> states_equal() would decide that the states converge after the
>>> conditionally executed code?
>>>
>>
>> Value would be CONST_IMM for both paths, and wouldn't match so they wouldn't
>> converge.  I think I understood your question right, let me know if I'm
>> addressing the wrong part of it.
>
> Okay, true, but what if you load the values from a map and bounds-check them
> instead of hardcoding them? Then they will be of type UNKNOWN_VALUE, right?
> Like this:
>
> int value = map[0];
> if (condition) {
>   value &= 0x1; // visited first by verifier
> } else {
>   // nothing; visited second by verifier
> }
> int dummy = 1; // states seem to converge here, but actually don't
> map[value] = 1234;
>
> And then `rold->type == UNKNOWN_VALUE && rcur->type != NOT_INIT` will be
> true in the `dummy = 1` line, and the states converge. Am I missing something?
>

Ah ok, yeah, I see it now; you are right.  This is slightly different from this
particular problem, so I'll send a second patch to address this. Sound
reasonable?  Thanks,

Josef
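
The pruning problem discussed in this exchange can be modelled outside the kernel. The sketch below is an assumption-laden illustration: `struct sketch_reg` and both comparison functions are hypothetical, the first only mirrors the quoted `states_equal()` condition, and the second is one possible tightening, not the patch that was later sent.

```c
#include <stdbool.h>
#include <stdint.h>

enum sketch_reg_type { SKETCH_NOT_INIT, SKETCH_UNKNOWN_VALUE };

/* Hypothetical register state: a type plus the tracked value range. */
struct sketch_reg {
	enum sketch_reg_type type;
	uint64_t min_value;
	uint64_t max_value;
};

/* Mirrors the quoted check: an old UNKNOWN_VALUE register matches any
 * initialised current register, whatever its range.
 */
bool sketch_type_only_equal(const struct sketch_reg *rold,
			    const struct sketch_reg *rcur)
{
	return rold->type == SKETCH_NOT_INIT ||
	       (rold->type == SKETCH_UNKNOWN_VALUE &&
		rcur->type != SKETCH_NOT_INIT);
}

/* One possible tightening: the current range must lie inside the
 * range that was already verified on the old path.
 */
bool sketch_range_aware_equal(const struct sketch_reg *rold,
			      const struct sketch_reg *rcur)
{
	if (rold->type == SKETCH_NOT_INIT)
		return true;
	if (rcur->type == SKETCH_NOT_INIT)
		return false;
	return rcur->min_value >= rold->min_value &&
	       rcur->max_value <= rold->max_value;
}
```

With the old state bounded to [0, 1] (the `value &= 0x1` path, visited first) and the current state unbounded (the empty `else` path), the type-only check reports the states as equal, so the second path would be pruned before `map[value]` is checked against the unbounded range. The range-aware variant rejects the match.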

* Re: [PATCH net][v2] bpf: fix range arithmetic for bpf map access
From: Jann Horn @ 2016-11-16 20:26 UTC
  To: Josef Bacik
  Cc: Alexei Starovoitov, Alexei Starovoitov, Daniel Borkmann,
	David S. Miller, netdev
In-Reply-To: <935e538a-1400-cad0-c933-d4a200e5e0ef@fb.com>

On Wed, Nov 16, 2016 at 9:25 PM, Josef Bacik <jbacik@fb.com> wrote:
> On 11/16/2016 01:41 PM, Jann Horn wrote:
>>
>> On Tue, Nov 15, 2016 at 3:20 PM, Josef Bacik <jbacik@fb.com> wrote:
>>>
>>> On 11/15/2016 08:47 AM, Jann Horn wrote:
>>>>
>>>> In states_equal():
>>>> if (rold->type == NOT_INIT ||
>>>>    (rold->type == UNKNOWN_VALUE && rcur->type != NOT_INIT))
>>>> <------------
>>>> continue;
>>>>
>>>> I think this is broken in code like the following:
>>>>
>>>> int value;
>>>> if (condition) {
>>>>   value = 1; // visited first by verifier
>>>> } else {
>>>>   value = 1000000; // visited second by verifier
>>>> }
>>>> int dummy = 1; // states seem to converge here, but actually don't
>>>> map[value] = 1234;
>>>>
>>>> `value` would be an UNKNOWN_VALUE for both paths, right? So
>>>> states_equal() would decide that the states converge after the
>>>> conditionally executed code?
>>>>
>>>
>>> Value would be CONST_IMM for both paths, and wouldn't match so they
>>> wouldn't
>>> converge.  I think I understood your question right, let me know if I'm
>>> addressing the wrong part of it.
>>
>>
>> Okay, true, but what if you load the values from a map and bounds-check
>> them
>> instead of hardcoding them? Then they will be of type UNKNOWN_VALUE,
>> right?
>> Like this:
>>
>> int value = map[0];
>> if (condition) {
>>   value &= 0x1; // visited first by verifier
>> } else {
>>   // nothing; visited second by verifier
>> }
>> int dummy = 1; // states seem to converge here, but actually don't
>> map[value] = 1234;
>>
>> And then `rold->type == UNKNOWN_VALUE && rcur->type != NOT_INIT` will be
>> true in the `dummy = 1` line, and the states converge. Am I missing
>> something?
>>
>
> Ah ok, yeah, I see it now; you are right.  This is slightly different from this
> particular problem, so I'll send a second patch to address this. Sound
> reasonable?  Thanks,

Sure, makes sense.

* Re: [PATCH net 1/7] net: ethernet: ti: cpsw: fix bad register access in probe error path
From: Grygorii Strashko @ 2016-11-16 20:33 UTC
  To: Johan Hovold, Mugunthan V N; +Cc: linux-omap, netdev, linux-kernel
In-Reply-To: <1479306916-27673-2-git-send-email-johan@kernel.org>



On 11/16/2016 08:35 AM, Johan Hovold wrote:
> Make sure to resume the platform device to enable clocks before
> accessing the CPSW registers in the probe error path (e.g. for deferred
> probe).
> 
> Unhandled fault: external abort on non-linefetch (0x1008) at 0xd0872d08
> ...
> [<c04fabcc>] (cpsw_ale_control_set) from [<c04fb8b4>] (cpsw_ale_destroy+0x2c/0x44)
> [<c04fb8b4>] (cpsw_ale_destroy) from [<c04fea58>] (cpsw_probe+0xbd0/0x10c4)
> [<c04fea58>] (cpsw_probe) from [<c047b2a0>] (platform_drv_probe+0x5c/0xc0)
> 
> Note that in the unlikely event of a runtime-resume failure, we'll leak
> the ale struct.
> 
> Fixes: df828598a755 ("netdev: driver: ethernet: Add TI CPSW driver")
> Signed-off-by: Johan Hovold <johan@kernel.org>
> ---
>  drivers/net/ethernet/ti/cpsw.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
> index c6cff3d2ff05..5bc5e6189661 100644
> --- a/drivers/net/ethernet/ti/cpsw.c
> +++ b/drivers/net/ethernet/ti/cpsw.c
> @@ -2818,7 +2818,12 @@ static int cpsw_probe(struct platform_device *pdev)
>  	return 0;
>  
>  clean_ale_ret:
> -	cpsw_ale_destroy(cpsw->ale);
> +	if (pm_runtime_get_sync(&pdev->dev) < 0) {
> +		pm_runtime_put_noidle(&pdev->dev);
> +	} else {
> +		cpsw_ale_destroy(cpsw->ale);
> +		pm_runtime_put_sync(&pdev->dev);
> +	}
>  clean_dma_ret:
>  	cpdma_ctlr_destroy(cpsw->dma);
>  clean_runtime_disable_ret:
> 

Wouldn't it be logically simpler to just keep CPSW PM runtime enabled during probe?
Like in below diff (not tested):

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 0548e56..deaac1b 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -2657,13 +2657,12 @@ static int cpsw_probe(struct platform_device *pdev)
                goto clean_runtime_disable_ret;
        }
        cpsw->version = readl(&cpsw->regs->id_ver);
-       pm_runtime_put_sync(&pdev->dev);
 
        res = platform_get_resource(pdev, IORESOURCE_MEM, 1);
        cpsw->wr_regs = devm_ioremap_resource(&pdev->dev, res);
        if (IS_ERR(cpsw->wr_regs)) {
                ret = PTR_ERR(cpsw->wr_regs);
-               goto clean_runtime_disable_ret;
+               goto clean_runtime_put_ret;
        }
 
        memset(&dma_params, 0, sizeof(dma_params));
@@ -2700,7 +2699,7 @@ static int cpsw_probe(struct platform_device *pdev)
        default:
                dev_err(priv->dev, "unknown version 0x%08x\n", cpsw->version);
                ret = -ENODEV;
-               goto clean_runtime_disable_ret;
+               goto clean_runtime_put_ret;
        }
        for (i = 0; i < cpsw->data.slaves; i++) {
                struct cpsw_slave *slave = &cpsw->slaves[i];
@@ -2729,7 +2728,7 @@ static int cpsw_probe(struct platform_device *pdev)
        if (!cpsw->dma) {
                dev_err(priv->dev, "error initializing dma\n");
                ret = -ENOMEM;
-               goto clean_runtime_disable_ret;
+               goto clean_runtime_put_ret;
        }
 
        cpsw->txch[0] = cpdma_chan_create(cpsw->dma, 0, cpsw_tx_handler, 0);
@@ -2831,12 +2830,16 @@ static int cpsw_probe(struct platform_device *pdev)
                }
        }
 
+       pm_runtime_put(&pdev->dev);
+
        return 0;
 
 clean_ale_ret:
        cpsw_ale_destroy(cpsw->ale);
 clean_dma_ret:
        cpdma_ctlr_destroy(cpsw->dma);
+clean_runtime_put_ret:
+       pm_runtime_put_sync(&pdev->dev);
 clean_runtime_disable_ret:
        pm_runtime_disable(&pdev->dev);
 clean_ndev_ret:

-- 
regards,
-grygorii
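
The shape of the suggestion above, holding the runtime PM reference for the whole of probe and dropping it last, can be modelled in plain C. Everything here is hypothetical scaffolding for illustration: `struct sketch_dev`, the `sketch_pm_*()` helpers and `sketch_probe()` only imitate a usage counter and a goto-based unwind, and do not call the kernel's runtime PM API.

```c
#include <errno.h>

/* Hypothetical device model for the sketch. */
struct sketch_dev {
	int usage_count;	/* models the runtime PM usage counter */
	int ale_created;
	int ale_destroyed_while_active;
};

static void sketch_pm_get(struct sketch_dev *d) { d->usage_count++; }
static void sketch_pm_put(struct sketch_dev *d) { d->usage_count--; }

static void sketch_ale_destroy(struct sketch_dev *d)
{
	/* Register access is only safe while the device is resumed. */
	d->ale_destroyed_while_active = d->usage_count > 0;
	d->ale_created = 0;
}

/* Returns 0 on success or a negative errno after unwinding. */
int sketch_probe(struct sketch_dev *d, int fail_late)
{
	int ret;

	sketch_pm_get(d);	/* keep the device active for all of probe */
	d->ale_created = 1;
	if (fail_late) {
		ret = -ENODEV;
		goto clean_ale_ret;
	}
	sketch_pm_put(d);	/* success: drop the reference last */
	return 0;

clean_ale_ret:
	sketch_ale_destroy(d);	/* device is still resumed here */
	sketch_pm_put(d);
	return ret;
}
```

Because the reference is dropped only after the last cleanup step, the error path needs no extra resume call of its own.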

* Re: [patch net-next 6/8] ipv4: fib: Add an API to request a FIB dump
From: Ido Schimmel @ 2016-11-16 21:06 UTC
  To: Hannes Frederic Sowa
  Cc: Jiri Pirko, netdev, davem, idosch, eladr, yotamg, nogahf, arkadis,
	ogerlitz, roopa, dsa, nikolay, andy, vivien.didelot, andrew,
	f.fainelli, alexander.h.duyck, kuznet, jmorris, yoshfuji, kaber
In-Reply-To: <56d2179d-00ff-bcc5-e365-845179cbe672@stressinduktion.org>

On Wed, Nov 16, 2016 at 08:43:25PM +0100, Hannes Frederic Sowa wrote:
> On 16.11.2016 19:51, Ido Schimmel wrote:
> > On Wed, Nov 16, 2016 at 06:35:45PM +0100, Hannes Frederic Sowa wrote:
> >> On 16.11.2016 16:18, Ido Schimmel wrote:
> >>> On Wed, Nov 16, 2016 at 03:51:01PM +0100, Hannes Frederic Sowa wrote:
> >>>> On 16.11.2016 15:09, Jiri Pirko wrote:
> >>>>> From: Ido Schimmel <idosch@mellanox.com>
> >>>>>
> >>>>> Commit b90eb7549499 ("fib: introduce FIB notification infrastructure")
> >>>>> introduced a new notification chain to notify listeners (f.e., switchdev
> >>>>> drivers) about addition and deletion of routes.
> >>>>>
> >>>>> However, upon registration to the chain the FIB tables can already be
> >>>>> populated, which means potential listeners will have an incomplete view
> >>>>> of the tables.
> >>>>>
> >>>>> Solve that by adding an API to request a FIB dump. The dump itself is
> >>>>> done using RCU in order not to starve consumers that need RTNL to make
> >>>>> progress.
> >>>>>
> >>>>> Signed-off-by: Ido Schimmel <idosch@mellanox.com>
> >>>>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
> >>>>
> >>>> Have you looked at potential inconsistencies resulting from RCU walking
> >>>> the table and having concurrent inserts?
> >>>
> >>> Yes. I did try to think about situations in which this approach will
> >>> fail, but I could only find problems with concurrent removals, which I
> >>> addressed in 5/8. In case of concurrent insertions, even if you missed
> >>> the node, you would still get the ENTRY_ADD event to your listener.
> >>
> >> Theoretically a node could still be installed while the deletion event
> >> fired before registering the notifier. E.g. a synchronize_net before
> >> dumping could help here?
> > 
> > If the deletion event fired for some fib alias, then by 5/8 we are
> > guaranteed that it was already unlinked from the fib alias list in the
> > leaf in which it was contained. So, while it's possible we didn't
> > register our listener in time for the deletion event, we won't traverse
> > this fib alias while dumping the trie anyway. Did I understand you
> > correctly?
> > 
> 
> Theoretically we can have the same problem for insertion:
> 
> You receive a delete event from the notifier that is queued up first but
> the dump will still see the entry in the fib due to being managed by RCU
> (the notifier running on another CPU).
> 
> The problem is that the fib_remove_alias->hlist_del_rcu->WRITE_ONCE is
> still not strongly ordered against the local fib dump trie walk.

It's pretty late here so I would have to check this out tomorrow
morning. If this is indeed the case (not saying you're wrong, just want
to verify for myself), then I guess 5/8 can be dropped and instead we
should go with Dave's suggestion? I don't see any other way given the
constraints...

Thanks a lot Hannes!

* Re: net/l2tp:BUG: KASAN: use-after-free in l2tp_ip6_close
From: Guillaume Nault @ 2016-11-16 21:07 UTC
  To: Cong Wang; +Cc: Baozeng Ding, Linux Kernel Network Developers
In-Reply-To: <CAM_iQpUxNYpkRetxX88z=iFZiZ1beQqBt+9qFsDTnAYwWCoHfA@mail.gmail.com>

On Wed, Nov 16, 2016 at 11:08:23AM -0800, Cong Wang wrote:
> On Wed, Nov 16, 2016 at 8:30 AM, Guillaume Nault <g.nault@alphalink.fr> wrote:
> > diff --git a/net/l2tp/l2tp_ip.c b/net/l2tp/l2tp_ip.c
> > index fce25af..982f6c4 100644
> > --- a/net/l2tp/l2tp_ip.c
> > +++ b/net/l2tp/l2tp_ip.c
> > @@ -251,8 +251,6 @@ static int l2tp_ip_bind(struct sock *sk, struct sockaddr *uaddr, int addr_len)
> >         int ret;
> >         int chk_addr_ret;
> >
> > -       if (!sock_flag(sk, SOCK_ZAPPED))
> > -               return -EINVAL;
> >         if (addr_len < sizeof(struct sockaddr_l2tpip))
> >                 return -EINVAL;
> >         if (addr->l2tp_family != AF_INET)
> > @@ -267,6 +265,9 @@ static int l2tp_ip_bind(struct sock *sk, struct sockaddr *uaddr, int addr_len)
> >         read_unlock_bh(&l2tp_ip_lock);
> >
> >         lock_sock(sk);
> > +       if (!sock_flag(sk, SOCK_ZAPPED))
> > +               goto out;
> > +
> >         if (sk->sk_state != TCP_CLOSE || addr_len < sizeof(struct sockaddr_l2tpip))
> >                 goto out;
> >
> > diff --git a/net/l2tp/l2tp_ip6.c b/net/l2tp/l2tp_ip6.c
> > index ad3468c..9978d01 100644
> > --- a/net/l2tp/l2tp_ip6.c
> > +++ b/net/l2tp/l2tp_ip6.c
> > @@ -269,8 +269,6 @@ static int l2tp_ip6_bind(struct sock *sk, struct sockaddr *uaddr, int addr_len)
> >         int addr_type;
> >         int err;
> >
> > -       if (!sock_flag(sk, SOCK_ZAPPED))
> > -               return -EINVAL;
> >         if (addr->l2tp_family != AF_INET6)
> >                 return -EINVAL;
> >         if (addr_len < sizeof(*addr))
> > @@ -296,6 +294,9 @@ static int l2tp_ip6_bind(struct sock *sk, struct sockaddr *uaddr, int addr_len)
> >         lock_sock(sk);
> >
> >         err = -EINVAL;
> > +       if (!sock_flag(sk, SOCK_ZAPPED))
> > +               goto out_unlock;
> > +
> >         if (sk->sk_state != TCP_CLOSE)
> >                 goto out_unlock;
> 
> 
> Makes sense, it should prevent a concurrent caller adding the socket
> into bind table
> twice after passing __l2tp_ip_bind_lookup() check.

Yes, and the __l2tp_ip_bind_lookup() call is also racy. But, by
properly checking the SOCK_ZAPPED flag, we probably can remove this
call entirely.

For now, I only wanted to make sure the issue was well identified. I'll
submit a more complete patch for net (with protected SOCK_ZAPPED check
in l2tp_ip_connect() too).
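
The effect of moving the SOCK_ZAPPED test under the socket lock can be sketched in userspace. This is an illustration only: `struct sketch_sock` and `sketch_bind()` are hypothetical, and a pthread mutex stands in for `lock_sock()`. The flag is tested and cleared inside the same critical section, so a second caller cannot pass the test after the first one has started binding.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical socket model for the sketch. */
struct sketch_sock {
	pthread_mutex_t lock;	/* stands in for lock_sock() */
	bool zapped;		/* true until the socket has been bound */
	int bind_count;		/* how many times it entered the bind table */
};

/* Returns 0 on the first successful bind, -EINVAL afterwards. */
int sketch_bind(struct sketch_sock *sk)
{
	int err = -EINVAL;

	pthread_mutex_lock(&sk->lock);
	if (!sk->zapped)	/* checked while holding the lock */
		goto out;
	sk->bind_count++;
	sk->zapped = false;
	err = 0;
out:
	pthread_mutex_unlock(&sk->lock);
	return err;
}
```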

* Re: [PATCH] rtl8xxxu: Fix for agressive power saving by rtl8723bu wireless IC
From: Jes Sorensen @ 2016-11-16 21:29 UTC
  To: John Heenan
  Cc: Barry Day, Rafał Miłecki, Kalle Valo, linux-wireless,
	netdev, linux-kernel
In-Reply-To: <CAAye0QP0FQetaaa-RE23teVgZ335ELE8O5qnzMA7CwQ4T8wc_Q@mail.gmail.com>

John Heenan <john@zgus.com> writes:
> Barry Day has submitted real world reports for the 8192eu and 8192cu.
> This needs to be acknowledged. I have submitted real world reports for
> the 8723bu.

Let's get this a little more clear: first of all, I have asked you to
investigate which part resolves the problem, rather than 'I randomly
moved something around and it happens to work for me'.

> When it comes down to it, it looks like the kernel code changes are
> really going to be very trivial to fix this problem and we need to
> take the focus off dramatic outbursts over style issues to a strategy
> for getting usable results from real world testing.
>
> Addressing style issues in a dramatic manner to me looks like a mean
> sport for maintainers who line up to easy target first time
> contributors. This mean attitude comes from the top with a well known
> comment about "publicly making fun of people". The polite comments
> over style from Joe Perches and Rafał Miłecki are welcomed.

Once bad code is in place, it is way harder to get rid of it again. It
is *normal* for maintainers to ask contributors to do things
correctly. In addition you have been asked repeatedly by multiple people
to respect coding style, but every patch you posted violated it again in
a different way, instead of spending the little time it would take for
you to get it right.

> An effective strategy would be to insert some printk statements to
> trace what init steps vendor derived drivers do each time
> wpa_supplicant is called and ask real world testers to report their
> results. This is a lot more productive and less error prone than
> laboriously poring over vendor source code. Alternative drivers that
> use vendor code from Realtek are enormously complicated and a huge pain
> to make sense of.
>
> Joe Sorensen's driver code is far easier to make sense of and it is a
> shame Realtek don't come to the party. Joe Sorensen's code takes
> advantage of the excellent work of kernel contributors to the mac80211
> driver.

Now you are pissing on my name - do you really want to be taken
seriously here?

> Previous comments I made about enable_rf, rtl8xxxu_start,
> rtl8xxxu_init_device etc should be clarified. I will leave it for the
> moment as it currently serves no direct useful purpose.

I have made it very clear I want this issue resolved, but I want it
done right.

Jes

* [PATCH net-next 3/3] RDS: TCP: Force every connection to be initiated by numerically smaller IP address
From: Sowmini Varadhan @ 2016-11-16 21:29 UTC
  To: netdev; +Cc: santosh.shilimkar, sowmini.varadhan, davem, rds-devel
In-Reply-To: <cover.1478876910.git.sowmini.varadhan@oracle.com>

When 2 RDS peers initiate an RDS-TCP connection simultaneously,
there is a potential for "duelling syns" on either/both sides.
See commit 241b271952eb ("RDS-TCP: Reset tcp callbacks if re-using an
outgoing socket in rds_tcp_accept_one()") for a description of this
condition, and the arbitration logic which ensures that the
numerically larger IP address in the TCP connection is bound to the
RDS_TCP_PORT ("canonical ordering").

The rds_connection should not be marked as RDS_CONN_UP until the
arbitration logic has converged, for the following reason. The sender
may start transmitting RDS datagrams as soon as RDS_CONN_UP is set,
and the sender removes all datagrams from the rds_connection's
cp_retrans queue based on TCP acks. If the TCP ack was sent from
a tcp socket that got reset as part of duel arbitration (but
before data was delivered to the receiver's RDS socket layer),
the sender may end up prematurely freeing the datagram, and
the datagram is no longer reliably deliverable.

This patch remedies that condition by making sure that, upon
receipt of 3WH completion state change notification of TCP_ESTABLISHED
in rds_tcp_state_change, we mark the rds_connection as RDS_CONN_UP
if, and only if, the IP addresses and ports for the connection are
canonically ordered. In all other cases, rds_tcp_state_change will
force an rds_conn_path_drop(), and rds_queue_reconnect() on
both peers will restart the connection to ensure canonical ordering.
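
The decision described above can be sketched in isolation as follows.
This is a simplified userspace model, not the kernel code: the
sketch_* names and the three-state enum are hypothetical, addresses
are compared as plain integers, and the state transition is reduced
to a direct assignment instead of rds_conn_path_transition().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced set of path states for illustration. */
enum sketch_state { SKETCH_CONNECTING, SKETCH_UP, SKETCH_ERROR };

struct sketch_path {
	uint32_t laddr;          /* local address, as an integer */
	uint32_t faddr;          /* peer (foreign) address */
	enum sketch_state state;
};

/*
 * Model of the TCP_ESTABLISHED case: only the side with the smaller
 * address may bring the path up. The larger side moves the path to
 * the error state so that the connection is re-initiated in the
 * canonical direction. Returns true if the path was marked up.
 */
static bool sketch_on_established(struct sketch_path *cp)
{
	assert(cp != NULL);

	if (cp->laddr > cp->faddr && cp->state == SKETCH_CONNECTING) {
		cp->state = SKETCH_ERROR;   /* drop and reconnect */
		return false;
	}
	if (cp->state == SKETCH_CONNECTING)
		cp->state = SKETCH_UP;      /* canonical ordering holds */
	return cp->state == SKETCH_UP;
}
```

With this model, a path whose local address is the larger one never
reaches the up state from an outgoing connect, which is the property
the patch relies on to avoid acking data on a socket that is about
to be reset.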

A side-effect of enforcing this condition in rds_tcp_state_change()
is that rds_tcp_accept_one_path() can now be refactored for simplicity.
It is also no longer possible to encounter an RDS_CONN_UP connection in
the arbitration logic in rds_tcp_accept_one().

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
---
 net/rds/connection.c  |    1 +
 net/rds/tcp_connect.c |   14 +++++++++++++-
 net/rds/tcp_listen.c  |   29 ++++++++++++-----------------
 3 files changed, 26 insertions(+), 18 deletions(-)

diff --git a/net/rds/connection.c b/net/rds/connection.c
index b86e188..fe9d31c 100644
--- a/net/rds/connection.c
+++ b/net/rds/connection.c
@@ -683,6 +683,7 @@ void rds_conn_path_connect_if_down(struct rds_conn_path *cp)
 	    !test_and_set_bit(RDS_RECONNECT_PENDING, &cp->cp_flags))
 		queue_delayed_work(rds_wq, &cp->cp_conn_w, 0);
 }
+EXPORT_SYMBOL_GPL(rds_conn_path_connect_if_down);

 void rds_conn_connect_if_down(struct rds_connection *conn)
 {
diff --git a/net/rds/tcp_connect.c b/net/rds/tcp_connect.c
index 05f61c5..d6839d9 100644
--- a/net/rds/tcp_connect.c
+++ b/net/rds/tcp_connect.c
@@ -60,7 +60,19 @@ void rds_tcp_state_change(struct sock *sk)
 	case TCP_SYN_RECV:
 		break;
 	case TCP_ESTABLISHED:
-		rds_connect_path_complete(cp, RDS_CONN_CONNECTING);
+		/* Force the peer to reconnect so that we have the
+		 * TCP ports going from <smaller-ip>.<transient> to
+		 * <larger-ip>.<RDS_TCP_PORT>. We avoid marking the
+		 * RDS connection as RDS_CONN_UP until the reconnect,
+		 * to avoid RDS datagram loss.
+		 */
+		if (cp->cp_conn->c_laddr > cp->cp_conn->c_faddr &&
+		    rds_conn_path_transition(cp, RDS_CONN_CONNECTING,
+					     RDS_CONN_ERROR)) {
+			rds_conn_path_drop(cp);
+		} else {
+			rds_connect_path_complete(cp, RDS_CONN_CONNECTING);
+		}
 		break;
 	case TCP_CLOSE_WAIT:
 	case TCP_CLOSE:
diff --git a/net/rds/tcp_listen.c b/net/rds/tcp_listen.c
index c9c4968..f74bab3 100644
--- a/net/rds/tcp_listen.c
+++ b/net/rds/tcp_listen.c
@@ -83,25 +83,20 @@ struct rds_tcp_connection *rds_tcp_accept_one_path(struct rds_connection *conn)
 {
 	int i;
 	bool peer_is_smaller = (conn->c_faddr < conn->c_laddr);
-	int npaths = conn->c_npaths;
-
-	if (npaths <= 1) {
-		struct rds_conn_path *cp = &conn->c_path[0];
-		int ret;
-
-		ret = rds_conn_path_transition(cp, RDS_CONN_DOWN,
-					       RDS_CONN_CONNECTING);
-		if (!ret)
-			rds_conn_path_transition(cp, RDS_CONN_ERROR,
-						 RDS_CONN_CONNECTING);
-		return cp->cp_transport_data;
-	}
+	int npaths = max_t(int, 1, conn->c_npaths);

-	/* for mprds, paths with cp_index > 0 MUST be initiated by the peer
+	/* for mprds, all paths MUST be initiated by the peer
 	 * with the smaller address.
 	 */
-	if (!peer_is_smaller)
+	if (!peer_is_smaller) {
+		/* Make sure we initiate at least one path if this
+		 * has not already been done; rds_start_mprds() will
+		 * take care of additional paths, if necessary.
+		 */
+		if (npaths == 1)
+			rds_conn_path_connect_if_down(&conn->c_path[0]);
 		return NULL;
+	}

 	for (i = 0; i < npaths; i++) {
 		struct rds_conn_path *cp = &conn->c_path[i];
@@ -171,8 +166,8 @@ int rds_tcp_accept_one(struct socket *sock)
 	mutex_lock(&rs_tcp->t_conn_path_lock);
 	cp = rs_tcp->t_cpath;
 	conn_state = rds_conn_path_state(cp);
-	if (conn_state != RDS_CONN_CONNECTING && conn_state != RDS_CONN_UP &&
-	    conn_state != RDS_CONN_ERROR)
+	WARN_ON(conn_state == RDS_CONN_UP);
+	if (conn_state != RDS_CONN_CONNECTING && conn_state != RDS_CONN_ERROR)
 		goto rst_nsk;
 	if (rs_tcp->t_sock) {
 		/* Need to resolve a duelling SYN between peers.
-- 
1.7.1

^ permalink raw reply related

* [PATCH net-next 1/3] RDS: TCP: set RDS_FLAG_RETRANSMITTED in cp_retrans list
From: Sowmini Varadhan @ 2016-11-16 21:29 UTC (permalink / raw)
  To: netdev; +Cc: santosh.shilimkar, sowmini.varadhan, davem, rds-devel
In-Reply-To: <cover.1478876910.git.sowmini.varadhan@oracle.com>

As noted in rds_recv_incoming(), sequence numbers on data packets
can decrease for the failover case, and the Rx path is equipped
to recover from this if RDS_FLAG_RETRANSMITTED is set
on the rds header of an incoming message with a suspect sequence
number.

The RDS_FLAG_RETRANSMITTED header flag is predicated on the
RDS_MSG_RETRANSMITTED flag in the rds_message, so make sure the
header flag is set on messages queued for retransmission.
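
A minimal userspace sketch of that propagation, assuming a cut-down
message structure: the sketch_* names are hypothetical, the bit
number mirrors RDS_MSG_RETRANSMITTED as it appears in rds.h, and the
header flag value is an assumption chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MSG_RETRANSMITTED_BIT	5	/* bit number in m_flags */
#define SKETCH_FLAG_RETRANSMITTED	0x04	/* header flag, assumed value */

struct sketch_msg {
	unsigned long m_flags;	/* per-message state bits (sender side) */
	uint8_t h_flags;	/* flags carried in the on-wire header */
};

/*
 * Before handing a message to the transport, copy the sender-side
 * "retransmitted" state into the header so that the receiver can
 * tolerate a sequence number lower than the one it expects.
 */
static void sketch_prepare_xmit(struct sketch_msg *rm)
{
	assert(rm != NULL);

	if (rm->m_flags & (1UL << SKETCH_MSG_RETRANSMITTED_BIT))
		rm->h_flags |= SKETCH_FLAG_RETRANSMITTED;
}
```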

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
---
 net/rds/tcp_send.c |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/net/rds/tcp_send.c b/net/rds/tcp_send.c
index 89d09b4..dcf4742 100644
--- a/net/rds/tcp_send.c
+++ b/net/rds/tcp_send.c
@@ -100,6 +100,9 @@ int rds_tcp_xmit(struct rds_connection *conn, struct rds_message *rm,
 		set_bit(RDS_MSG_HAS_ACK_SEQ, &rm->m_flags);
 		tc->t_last_expected_una = rm->m_ack_seq + 1;
 
+		if (test_bit(RDS_MSG_RETRANSMITTED, &rm->m_flags))
+			rm->m_inc.i_hdr.h_flags |= RDS_FLAG_RETRANSMITTED;
+
 		rdsdebug("rm %p tcp nxt %u ack_seq %llu\n",
 			 rm, rds_tcp_snd_nxt(tc),
 			 (unsigned long long)rm->m_ack_seq);
-- 
1.7.1

^ permalink raw reply related

* [PATCH net-next 2/3] RDS: TCP: Track peer's connection generation number
From: Sowmini Varadhan @ 2016-11-16 21:29 UTC (permalink / raw)
  To: netdev; +Cc: santosh.shilimkar, sowmini.varadhan, davem, rds-devel
In-Reply-To: <cover.1478876910.git.sowmini.varadhan@oracle.com>

The RDS transport has to be able to distinguish between
two types of failure events:
(a) when the transport fails (e.g., TCP connection reset)
    but the RDS socket/connection layer on both sides stays
    the same
(b) when the peer's RDS layer itself resets (e.g., due to module
    reload or machine reboot at the peer)
In case (a) both sides must reconnect and continue the RDS messaging
without any message loss or disruption to the message sequence numbers,
and this is achieved by rds_send_path_reset().

In case (b) we should reset all rds_connection state to the
new incarnation of the peer. Examples of state that needs to
be reset are next expected rx sequence number from, or messages to be
retransmitted to, the new incarnation of the peer.

To achieve this, the RDS handshake probe added as part of
commit 5916e2c1554f ("RDS: TCP: Enable multipath RDS for TCP")
is enhanced so that sender and receiver of the RDS ping-probe
will add a generation number as part of the RDS_EXTHDR_GEN_NUM
extension header. Each peer stores local and remote generation
numbers as part of each rds_connection. Changes in generation
number will be detected via incoming handshake probe ping
request or response and will allow the receiver to reset rds_connection
state.
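
The generation-number check can be modelled outside the kernel
roughly as below. This is a sketch under simplifying assumptions:
the sketch_* names are hypothetical, a single path is tracked instead
of RDS_MPATH_WORKERS paths, locking is omitted, and flushing of the
retransmit queue is reduced to a boolean result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_conn {
	uint32_t my_gen;	/* generation picked at module init */
	uint32_t peer_gen;	/* 0 means "not learned yet" */
	uint64_t next_tx_seq;
	uint64_t next_rx_seq;
};

/*
 * Record the generation number seen in a handshake probe. Returns
 * true when the peer turned out to be a new incarnation, in which
 * case the sequence state is reset and the caller would also flush
 * messages queued for retransmission to the old incarnation.
 */
static bool sketch_peer_gen_update(struct sketch_conn *conn, uint32_t peer_gen)
{
	bool reset = false;

	assert(conn != NULL);

	if (peer_gen == 0)
		return false;	/* probe carried no generation number */

	if (conn->peer_gen != 0 && conn->peer_gen != peer_gen) {
		conn->next_tx_seq = 1;
		conn->next_rx_seq = 0;
		reset = true;
	}
	conn->peer_gen = peer_gen;
	return reset;
}
```

The first probe from a peer only records its generation number; a
reset happens only when a later probe carries a different non-zero
value, which corresponds to case (b) above.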

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
---
 net/rds/af_rds.c     |    4 ++++
 net/rds/connection.c |    2 ++
 net/rds/message.c    |    1 +
 net/rds/rds.h        |    8 +++++++-
 net/rds/recv.c       |   36 ++++++++++++++++++++++++++++++++++++
 net/rds/send.c       |    9 +++++++--
 6 files changed, 57 insertions(+), 3 deletions(-)

diff --git a/net/rds/af_rds.c b/net/rds/af_rds.c
index 6beaeb1..2ac1e61 100644
--- a/net/rds/af_rds.c
+++ b/net/rds/af_rds.c
@@ -605,10 +605,14 @@ static void rds_exit(void)
 }
 module_exit(rds_exit);
 
+u32 rds_gen_num;
+
 static int rds_init(void)
 {
 	int ret;
 
+	net_get_random_once(&rds_gen_num, sizeof(rds_gen_num));
+
 	ret = rds_bind_lock_init();
 	if (ret)
 		goto out;
diff --git a/net/rds/connection.c b/net/rds/connection.c
index 13f459d..b86e188 100644
--- a/net/rds/connection.c
+++ b/net/rds/connection.c
@@ -269,6 +269,8 @@ static void __rds_conn_path_init(struct rds_connection *conn,
 			kmem_cache_free(rds_conn_slab, conn);
 			conn = found;
 		} else {
+			conn->c_my_gen_num = rds_gen_num;
+			conn->c_peer_gen_num = 0;
 			hlist_add_head_rcu(&conn->c_hash_node, head);
 			rds_cong_add_conn(conn);
 			rds_conn_count++;
diff --git a/net/rds/message.c b/net/rds/message.c
index 6cb9106..49bfb51 100644
--- a/net/rds/message.c
+++ b/net/rds/message.c
@@ -42,6 +42,7 @@
 [RDS_EXTHDR_RDMA]	= sizeof(struct rds_ext_header_rdma),
 [RDS_EXTHDR_RDMA_DEST]	= sizeof(struct rds_ext_header_rdma_dest),
 [RDS_EXTHDR_NPATHS]	= sizeof(u16),
+[RDS_EXTHDR_GEN_NUM]	= sizeof(u32),
 };
 
 
diff --git a/net/rds/rds.h b/net/rds/rds.h
index 4121e18..ebbf909 100644
--- a/net/rds/rds.h
+++ b/net/rds/rds.h
@@ -151,6 +151,9 @@ struct rds_connection {
 
 	struct rds_conn_path	c_path[RDS_MPATH_WORKERS];
 	wait_queue_head_t	c_hs_waitq; /* handshake waitq */
+
+	u32			c_my_gen_num;
+	u32			c_peer_gen_num;
 };
 
 static inline
@@ -243,7 +246,8 @@ struct rds_ext_header_rdma_dest {
 /* Extension header announcing number of paths.
  * Implicit length = 2 bytes.
  */
-#define RDS_EXTHDR_NPATHS	4
+#define RDS_EXTHDR_NPATHS	5
+#define RDS_EXTHDR_GEN_NUM	6
 
 #define __RDS_EXTHDR_MAX	16 /* for now */
 
@@ -338,6 +342,7 @@ static inline u32 rds_rdma_cookie_offset(rds_rdma_cookie_t cookie)
 #define RDS_MSG_RETRANSMITTED	5
 #define RDS_MSG_MAPPED		6
 #define RDS_MSG_PAGEVEC		7
+#define RDS_MSG_FLUSH		8
 
 struct rds_message {
 	atomic_t		m_refcount;
@@ -664,6 +669,7 @@ static inline void __rds_wake_sk_sleep(struct sock *sk)
 struct rds_message *rds_cong_update_alloc(struct rds_connection *conn);
 
 /* conn.c */
+extern u32 rds_gen_num;
 int rds_conn_init(void);
 void rds_conn_exit(void);
 struct rds_connection *rds_conn_create(struct net *net,
diff --git a/net/rds/recv.c b/net/rds/recv.c
index cbfabdf..9d0666e 100644
--- a/net/rds/recv.c
+++ b/net/rds/recv.c
@@ -120,6 +120,36 @@ static void rds_recv_rcvbuf_delta(struct rds_sock *rs, struct sock *sk,
 	/* do nothing if no change in cong state */
 }
 
+static void rds_conn_peer_gen_update(struct rds_connection *conn,
+				     u32 peer_gen_num)
+{
+	int i;
+	struct rds_message *rm, *tmp;
+	unsigned long flags;
+
+	WARN_ON(conn->c_trans->t_type != RDS_TRANS_TCP);
+	if (peer_gen_num != 0) {
+		if (conn->c_peer_gen_num != 0 &&
+		    peer_gen_num != conn->c_peer_gen_num) {
+			for (i = 0; i < RDS_MPATH_WORKERS; i++) {
+				struct rds_conn_path *cp;
+
+				cp = &conn->c_path[i];
+				spin_lock_irqsave(&cp->cp_lock, flags);
+				cp->cp_next_tx_seq = 1;
+				cp->cp_next_rx_seq = 0;
+				list_for_each_entry_safe(rm, tmp,
+							 &cp->cp_retrans,
+							 m_conn_item) {
+					set_bit(RDS_MSG_FLUSH, &rm->m_flags);
+				}
+				spin_unlock_irqrestore(&cp->cp_lock, flags);
+			}
+		}
+		conn->c_peer_gen_num = peer_gen_num;
+	}
+}
+
 /*
  * Process all extension headers that come with this message.
  */
@@ -163,7 +193,9 @@ static void rds_recv_hs_exthdrs(struct rds_header *hdr,
 	union {
 		struct rds_ext_header_version version;
 		u16 rds_npaths;
+		u32 rds_gen_num;
 	} buffer;
+	u32 new_peer_gen_num = 0;
 
 	while (1) {
 		len = sizeof(buffer);
@@ -176,6 +208,9 @@ static void rds_recv_hs_exthdrs(struct rds_header *hdr,
 			conn->c_npaths = min_t(int, RDS_MPATH_WORKERS,
 					       buffer.rds_npaths);
 			break;
+		case RDS_EXTHDR_GEN_NUM:
+			new_peer_gen_num = buffer.rds_gen_num;
+			break;
 		default:
 			pr_warn_ratelimited("ignoring unknown exthdr type "
 					     "0x%x\n", type);
@@ -183,6 +218,7 @@ static void rds_recv_hs_exthdrs(struct rds_header *hdr,
 	}
 	/* if RDS_EXTHDR_NPATHS was not found, default to a single-path */
 	conn->c_npaths = max_t(int, conn->c_npaths, 1);
+	rds_conn_peer_gen_update(conn, new_peer_gen_num);
 }
 
 /* rds_start_mprds() will synchronously start multiple paths when appropriate.
diff --git a/net/rds/send.c b/net/rds/send.c
index 896626b..77c8c6e 100644
--- a/net/rds/send.c
+++ b/net/rds/send.c
@@ -259,8 +259,9 @@ int rds_send_xmit(struct rds_conn_path *cp)
 			 * connection.
 			 * Therefore, we never retransmit messages with RDMA ops.
 			 */
-			if (rm->rdma.op_active &&
-			    test_bit(RDS_MSG_RETRANSMITTED, &rm->m_flags)) {
+			if (test_bit(RDS_MSG_FLUSH, &rm->m_flags) ||
+			    (rm->rdma.op_active &&
+			    test_bit(RDS_MSG_RETRANSMITTED, &rm->m_flags))) {
 				spin_lock_irqsave(&cp->cp_lock, flags);
 				if (test_and_clear_bit(RDS_MSG_ON_CONN, &rm->m_flags))
 					list_move(&rm->m_conn_item, &to_be_dropped);
@@ -1209,6 +1210,10 @@ int rds_sendmsg(struct socket *sock, struct msghdr *msg, size_t payload_len)
 		rds_message_add_extension(&rm->m_inc.i_hdr,
 					  RDS_EXTHDR_NPATHS, &npaths,
 					  sizeof(npaths));
+		rds_message_add_extension(&rm->m_inc.i_hdr,
+					  RDS_EXTHDR_GEN_NUM,
+					  &cp->cp_conn->c_my_gen_num,
+					  sizeof(u32));
 	}
 	spin_unlock_irqrestore(&cp->cp_lock, flags);
 
-- 
1.7.1

^ permalink raw reply related

* [PATCH net-next 0/3] RDS: TCP: HA/Failover fixes
From: Sowmini Varadhan @ 2016-11-16 21:29 UTC (permalink / raw)
  To: netdev; +Cc: santosh.shilimkar, sowmini.varadhan, davem, rds-devel

This series contains a set of fixes for bugs exposed when
we ran the following in a loop between a test machine pair:

 while true; do
   # modprobe rds-tcp on test nodes
   # run rds-stress in bi-dir mode between test machine pair
   # modprobe -r rds-tcp on test nodes
 done

rds-stress in bi-dir mode will cause both nodes to initiate
RDS-TCP connections at almost the same instant, exposing the
bugs fixed in this series.

Without the fixes, rds-stress reports sporadic packet drops,
and packets arriving out of sequence. After the fixes, we have
been able to run the test overnight without any issues.

Each patch has a detailed description of the root-cause fixed
by the patch.

Sowmini Varadhan (3):
  RDS: TCP: set RDS_FLAG_RETRANSMITTED in cp_retrans list
  RDS: TCP: Track peer's connection generation number
  RDS: TCP: Force every connection to be initiated by numerically
    smaller IP address

 net/rds/af_rds.c      |    4 ++++
 net/rds/connection.c  |    3 +++
 net/rds/message.c     |    1 +
 net/rds/rds.h         |    8 +++++++-
 net/rds/recv.c        |   36 ++++++++++++++++++++++++++++++++++++
 net/rds/send.c        |    9 +++++++--
 net/rds/tcp_connect.c |   14 +++++++++++++-
 net/rds/tcp_listen.c  |   29 ++++++++++++-----------------
 net/rds/tcp_send.c    |    3 +++
 9 files changed, 86 insertions(+), 21 deletions(-)

^ permalink raw reply

* Re: [PATCH net-next] lwtunnel: subtract tunnel headroom from mtu on output redirect
From: David Miller @ 2016-11-16 22:01 UTC (permalink / raw)
  To: david.lebrun; +Cc: netdev, roopa
In-Reply-To: <1479287146-25766-1-git-send-email-david.lebrun@uclouvain.be>

From: David Lebrun <david.lebrun@uclouvain.be>
Date: Wed, 16 Nov 2016 10:05:46 +0100

> This patch changes the lwtunnel_headroom() function which is called
> in ipv4_mtu() and ip6_mtu(), to also return the correct headroom
> value when the lwtunnel state is OUTPUT_REDIRECT.
> 
> This patch enables e.g. SR-IPv6 encapsulations to work without
> manually setting the route mtu.
> 
> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
> Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>

Applied, thanks David.

^ permalink raw reply

* Re: [PATCH net-next] sfc: remove napi_hash_del() call
From: David Miller @ 2016-11-16 22:05 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, ecree, bkenward
In-Reply-To: <1479304907.8455.171.camel@edumazet-glaptop3.roam.corp.google.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 16 Nov 2016 06:01:47 -0800

> From: Eric Dumazet <edumazet@google.com>
> 
> Calling napi_hash_del() after netif_napi_del() is pointless.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied.

^ permalink raw reply

* Re: [PATCH] netronome: don't access real_num_rx_queues directly
From: David Miller @ 2016-11-16 22:06 UTC (permalink / raw)
  To: arnd; +Cc: jakub.kicinski, rolf.neugebauer, oss-drivers, netdev,
	linux-kernel
In-Reply-To: <20161116141118.1893244-1-arnd@arndb.de>

From: Arnd Bergmann <arnd@arndb.de>
Date: Wed, 16 Nov 2016 15:10:49 +0100

> The netdev->real_num_rx_queues setting is only available if CONFIG_SYSFS
> is enabled, so we now get a build failure when that is turned off:
> 
> netronome/nfp/nfp_net_common.c: In function 'nfp_net_ring_swap_enable':
> netronome/nfp/nfp_net_common.c:2489:18: error: 'struct net_device' has no member named 'real_num_rx_queues'; did you mean 'real_num_tx_queues'?
> 
> As far as I can tell, the check here is only used as an optimization that
> we can skip in order to fix the compilation. If sysfs is disabled,
> the following netif_set_real_num_rx_queues() has no effect.
> 
> Fixes: 164d1e9e5d52 ("nfp: add support for ethtool .set_channels")
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH net] be2net: do not call napi_hash_del()
From: David Miller @ 2016-11-16 22:07 UTC (permalink / raw)
  To: eric.dumazet
  Cc: netdev, sathya.perla, ajit.khaparde, sriharsha.basavapatna,
	somnath.kotur
In-Reply-To: <1479305562.8455.176.camel@edumazet-glaptop3.roam.corp.google.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 16 Nov 2016 06:12:42 -0800

> From: Eric Dumazet <edumazet@google.com>
> 
> Calling napi_hash_del() before netif_napi_del() is dangerous
> if a synchronize_rcu() is not enforced before NAPI struct freeing.
> 
> Let's leave this detail to core networking stack and feel
> more comfortable.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied.

^ permalink raw reply

* Re: [PATCH net] cxgb4: do not call napi_hash_del()
From: David Miller @ 2016-11-16 22:07 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, hariprasad
In-Reply-To: <1479305942.8455.179.camel@edumazet-glaptop3.roam.corp.google.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 16 Nov 2016 06:19:02 -0800

> From: Eric Dumazet <edumazet@google.com>
> 
> Calling napi_hash_del() before netif_napi_del() is dangerous
> if a synchronize_rcu() is not enforced before NAPI struct freeing.
> 
> Let's leave this detail to core networking stack and feel
> more comfortable.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied.

^ permalink raw reply
