Netdev List
 help / color / mirror / Atom feed
* Re: mlx5: net_device.addr_list_lock usage before initialization
From: Saeed Mahameed @ 2016-12-13 14:48 UTC (permalink / raw)
  To: Sebastian Ott
  Cc: Saeed Mahameed, Matan Barak, Leon Romanovsky, Linux Netdev List,
	linux-kernel
In-Reply-To: <alpine.LFD.2.20.1612131405080.4914@schleppi>

On Tue, Dec 13, 2016 at 3:22 PM, Sebastian Ott
<sebott@linux.vnet.ibm.com> wrote:
> Hi,
>
> I ran into the following lockdep complaint:
>
> [    7.059561] INFO: trying to register non-static key.
> [    7.059566] the code is fine but needs lockdep annotation.
> [    7.059570] turning off the locking correctness validator.
> [    7.059579] CPU: 6 PID: 6 Comm: kworker/u32:0 Not tainted 4.9.0-02683-g784243e-dirty #77
> [    7.059582] Hardware name: IBM              2964 N96              704              (LPAR)
> [    7.061260] Workqueue: mlx5e mlx5e_set_rx_mode_work [mlx5_core]
> [    7.061268] Stack:
> [    7.061270]        00000000f95739c0 00000000f9573a50 0000000000000003 0000000000000000
> [    7.061278]        00000000f9573af0 00000000f9573a68 00000000f9573a68 0000000000000020
> [    7.061286]        0000000000000000 0000000000000020 000000000000000a 000000000000000a
> [    7.061294]        000000000000000c 00000000f9573ab8 0000000000000000 0000000000000000
> [    7.061301]        00000000008a1038 0000000000112a50 00000000f9573a50 00000000f9573aa8
> [    7.061314] Call Trace:
> [    7.061321] ([<000000000011292a>] show_trace+0x8a/0xe0)
> [    7.061327]  [<0000000000112a00>] show_stack+0x80/0xd8
> [    7.061334]  [<00000000005cdce6>] dump_stack+0x96/0xd8
> [    7.061338]  [<00000000001ae352>] register_lock_class+0x1d2/0x530
> [    7.061341]  [<00000000001b33f6>] __lock_acquire+0xfe/0x7d8
> [    7.061345]  [<00000000001b4394>] lock_acquire+0x30c/0x358
> [    7.061352]  [<000000000089454c>] _raw_spin_lock_bh+0x64/0xa0
> [    7.062171]  [<000003ff81465858>] mlx5e_set_rx_mode_work+0x248/0x490 [mlx5_core]
> [    7.062178]  [<0000000000163864>] process_one_work+0x41c/0x830
> [    7.062181]  [<0000000000163f2c>] worker_thread+0x2b4/0x478
> [    7.062186]  [<000000000016c46c>] kthread+0x15c/0x170
> [    7.062190]  [<0000000000895a52>] kernel_thread_starter+0x6/0xc
> [    7.062193]  [<0000000000895a4c>] kernel_thread_starter+0x0/0xc
> [    7.062196] INFO: lockdep is turned off.
>
> The problematic lock is net_device.addr_list_lock whose usage is
> asynchronously triggered by:
>
> mlx5e_add -> mlx5e_attach -> mlx5e_attach_netdev -> mlx5e_nic_enable
> [workq] mlx5e_set_rx_mode_work -> mlx5e_handle_netdev_addr -> mlx5e_sync_netdev_addr
>
> Initialization of this lock is triggered by:
> mlx5e_add -> register_netdev
>
> ...after the call to mlx5e_attach which is obviously racy.
>

Thanks Sebastian for the report,

indeed there is an issue, I wonder why the net_device.addr_list_lock
is initialized so late (at register_netdevice) IMHO it should be
initialized at
alloc_netdev_mqs->dev_addr_init
where all the other net_device fields are initialized!

We will handle this.

Thanks,
Saeed.

^ permalink raw reply

* RE: [PATCH 3/3] netns: fix net_generic() "id - 1" bloat
From: David Laight @ 2016-12-13 14:42 UTC (permalink / raw)
  To: 'Alexey Dobriyan'
  Cc: davem@davemloft.net, netdev@vger.kernel.org, xemul@openvz.org
In-Reply-To: <CACVxJT8fT0Lbe_ojNjU9DYYO=PO+QzA=jpQb7V6_US8W9D-KTQ@mail.gmail.com>

From: Alexey Dobriyan
> Sent: 13 December 2016 14:23
...
> Well, the point of the patch is to save .text, so might as well save
> as much as possible. Any form other than "ptr[id]" is going
> to be either bigger or bigger and slower and "ptr" should be the first field.

You've not read and understood the next bit:

> > However if you offset the 'id' values so that only
> > values 2 up are valid the code becomes:
> >         return net->gen2->ptr[id - 2];
> > which will be exactly the same code as:
> >         return net->gen1->ptr[id];
> > but it is much more obvious that 'id' values must be >= 2.
> >
> > The '2' should be generated from the structure offset, but with my method
> > is doesn't actually matter if it is wrong.

If you have foo->bar[id - const] then the compiler has to add the
offset of 'bar' and subtract for 'const'.
If the numbers match no add or subtract is needed.

It is much cleaner to do this by explicitly removing the offset on the
accesses than using a union.

	David


^ permalink raw reply

* ATENCIÓN;
From: Administrador @ 2016-12-13 14:24 UTC (permalink / raw)
  To: Recipients

ATENCIÓN;

Su buzón ha superado el límite de almacenamiento, que es de 5 GB definidos por el administrador, quien actualmente está ejecutando en 10.9GB, no puede ser capaz de enviar o recibir correo nuevo hasta que vuelva a validar su buzón de correo electrónico. Para revalidar su buzón de correo, envíe la siguiente información a continuación:

nombre:
Nombre de usuario: 
contraseña:
Confirmar contraseña:
E-mail: 
teléfono:
Si usted no puede revalidar su buzón, el buzón se deshabilitará!

Disculpa las molestias.
Código de verificación: es: 006524
Correo Soporte Técnico © 2016

¡gracias
Sistemas administrador

^ permalink raw reply

* [v1] net:ethernet:cavium:octeon:octeon_mgmt: Handle return NULL error from devm_ioremap
From: Arvind Yadav @ 2016-12-13 14:34 UTC (permalink / raw)
  To: peter.chen, fw; +Cc: netdev, linux-kernel

Here, If devm_ioremap will fail. It will return NULL.
Kernel can run into a NULL-pointer dereference.
This error check will avoid NULL pointer dereference.

Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
---
 drivers/net/ethernet/cavium/octeon/octeon_mgmt.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/cavium/octeon/octeon_mgmt.c b/drivers/net/ethernet/cavium/octeon/octeon_mgmt.c
index 4ab404f..4d9528f 100644
--- a/drivers/net/ethernet/cavium/octeon/octeon_mgmt.c
+++ b/drivers/net/ethernet/cavium/octeon/octeon_mgmt.c
@@ -1479,6 +1479,12 @@ static int octeon_mgmt_probe(struct platform_device *pdev)
 	p->agl = (u64)devm_ioremap(&pdev->dev, p->agl_phys, p->agl_size);
 	p->agl_prt_ctl = (u64)devm_ioremap(&pdev->dev, p->agl_prt_ctl_phys,
 					   p->agl_prt_ctl_size);
+	if (!p->mix || !p->agl || !p->agl_prt_ctl) {
+		dev_err(dev, "failed to map I/O memory\n");
+		result = -ENOMEM;
+		goto err;
+	}
+
 	spin_lock_init(&p->lock);
 
 	skb_queue_head_init(&p->tx_list);
-- 
2.7.4

^ permalink raw reply related

* Re: Soft lockup in tc_classify
From: Shahar Klein @ 2016-12-13 11:59 UTC (permalink / raw)
  To: Cong Wang
  Cc: shahark, Daniel Borkmann, Linux Kernel Network Developers,
	Roi Dayan, David Miller, Jiri Pirko, John Fastabend, Or Gerlitz,
	Hadar Hen Zion
In-Reply-To: <CAM_iQpX=BtDCWxL+Kr2nTu=6pq11NYWSq5AftWhhCKLdERE0kA@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 3584 bytes --]



On 12/12/2016 9:07 PM, Cong Wang wrote:
> On Mon, Dec 12, 2016 at 8:04 AM, Shahar Klein <shahark@mellanox.com> wrote:
>>
>>
>> On 12/12/2016 3:28 PM, Daniel Borkmann wrote:
>>>
>>> Hi Shahar,
>>>
>>> On 12/12/2016 10:43 AM, Shahar Klein wrote:
>>>>
>>>> Hi All,
>>>>
>>>> sorry for the spam, the first time was sent with html part and was
>>>> rejected.
>>>>
>>>> We observed an issue where a classifier instance next member is
>>>> pointing back to itself, causing a CPU soft lockup.
>>>> We found it by running traffic on many udp connections and then adding
>>>> a new flower rule using tc.
>>>>
>>>> We added a quick workaround to verify it:
>>>>
>>>> In tc_classify:
>>>>
>>>>          for (; tp; tp = rcu_dereference_bh(tp->next)) {
>>>>                  int err;
>>>> +               if (tp == tp->next)
>>>> +                     RCU_INIT_POINTER(tp->next, NULL);
>>>>
>>>>
>>>> We also had a print here showing tp->next is pointing to tp. With this
>>>> workaround we are not hitting the issue anymore.
>>>> We are not sure we fully understand the mechanism here - with the rtnl
>>>> and rcu locks.
>>>> We'll appreciate your help solving this issue.
>>>
>>>
>>> Note that there's still the RCU fix missing for the deletion race that
>>> Cong will still send out, but you say that the only thing you do is to
>>> add a single rule, but no other operation in involved during that test?
>
> Hmm, I thought RCU_INIT_POINTER() respects readers, but seems no?
> If so, that could be the cause since we play with the next pointer and
> there is only one filter in this case, but I don't see why we could have
> a loop here.
>
>>>
>>> Do you have a script and kernel .config for reproducing this?
>>
>>
>> I'm using a user space socket app(https://github.com/shahar-klein/noodle)on
>> a vm to push udp packets from ~2000 different udp src ports ramping up at
>> ~100 per second towards another vm on the same Hypervisor. Once the traffic
>> starts I'm pushing ingress flower tc udp rules(even_udp_src_port->mirred,
>> odd->drop) on the relevant representor in the Hypervisor.
>
> Do you mind to share your `tc filter show dev...` output? Also, since you
> mentioned you only add one flower filter, just want to make sure you never
> delete any filter before/when the bug happens? How reproducible is this?


The bridge between the two vms is based on ovs and representors.
We have a dpif in the ovs creating tc rules from ovs rules.
We set up 5000 open flow rules looks like this:
cook......, udp,dl_dst=24:8a:07:38:a2:b2,tp_src=7000 actions=drop
cook......, udp,dl_dst=24:8a:07:38:a2:b2,tp_src=7002 actions=drop
cook......, udp,dl_dst=24:8a:07:38:a2:b2,tp_src=7004 actions=drop
.
.
.

and then fire up 2000 udp flows starting at udp src 7000 and ramping up 
at 100 flows per second so after 20 seconds we suppose to have 2000 
active udp flows and half of them are dropped at the tc level.

The first packet of any such match hits the miss rule in the ovs 
datapath and pushed up to the user space ovs which consult the open 
flows rules above and translate the ovs rule to tc rule and push the 
rule back to the kernel via netlink.
I'm not sure I understand what happens to the second packet of the same 
match or all the following packets in the same match till the tc 
datapath is 'ready' for them.

The soft lockup is easily reproducible using this scenario but it won't 
happen if we use a much more easy traffic scheme first, say 100 udp 
flows at 3 per second.

I added a print and a panic when hitting the loop(output attached)

Also attached our .config


>
> Thanks!
>

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 34095 bytes --]

[-- Attachment #3: tc_classify_panic.gz --]
[-- Type: application/gzip, Size: 1161 bytes --]

^ permalink raw reply

* Re: [PATCH 3/3] netns: fix net_generic() "id - 1" bloat
From: Alexey Dobriyan @ 2016-12-13 14:23 UTC (permalink / raw)
  To: David Laight
  Cc: davem@davemloft.net, netdev@vger.kernel.org, xemul@openvz.org
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6DB0236F41@AcuExch.aculab.com>

On Wed, Dec 7, 2016 at 1:49 PM, David Laight <David.Laight@aculab.com> wrote:
> From: Alexey Dobriyan
>> Sent: 05 December 2016 14:48
>> On Mon, Dec 5, 2016 at 3:49 PM, David Laight <David.Laight@aculab.com> wrote:
>> > From: Alexey Dobriyan
>> >> Sent: 02 December 2016 01:22
>> >> net_generic() function is both a) inline and b) used ~600 times.
>> >>
>> >> It has the following code inside
>> >>
>> >>               ...
>> >>       ptr = ng->ptr[id - 1];
>> >>               ...
>> >>
>> >> "id" is never compile time constant so compiler is forced to subtract 1.
>> >> And those decrements or LEA [r32 - 1] instructions add up.
>> >>
>> >> We also start id'ing from 1 to catch bugs where pernet sybsystem id
>> >> is not initialized and 0. This is quite pointless idea (nothing will
>> >> work or immediate interference with first registered subsystem) in
>> >> general but it hints what needs to be done for code size reduction.
>> >>
>> >> Namely, overlaying allocation of pointer array and fixed part of
>> >> structure in the beginning and using usual base-0 addressing.
>> >>
>> >> Ids are just cookies, their exact values do not matter, so lets start
>> >> with 3 on x86_64.
>> > ...
>> >>  struct net_generic {
>> >> -     struct {
>> >> -             unsigned int len;
>> >> -             struct rcu_head rcu;
>> >> -     } s;
>> >> -
>> >> -     void *ptr[0];
>> >> +     union {
>> >> +             struct {
>> >> +                     unsigned int len;
>> >> +                     struct rcu_head rcu;
>> >> +             } s;
>> >> +
>> >> +             void *ptr[0];
>> >> +     };
>> >>  };
>> >
>> > That union is an accident waiting to happen.
>>
>> I kind of disagree. Module authors should not be given matches,
>> but it is hard to screw up if net_generic() is all you're given.
>>
>> > What might work is to offset the Ids by
>> > (offsetof(struct net_generic, ptr)/sizeof (void *)) instead of by 1.
>> > The subtract from the offset will then counter the structure offset
>> > - which is what you are trying to achieve.
>>
>> If you suggest this layout
>>
>> struct net_generic {
>>         struct {
>>         } s;
>>         void *ptr[0];
>> };
>>
>> then is it not optimal because offset of "ptr" needs to be somewhere in code
>> either in some LEA or imm8 of the final MOV which is 1 byte more bloaty.
>>
>> Here is test program
>>
>> struct ng1 {
>>         union {
>>                 struct {
>>                         unsigned int len;
>>                 } s;
>>                 void *ptr[0];
>>         };
>> };
>> struct ng2 {
>>         struct {
>>                 unsigned int len;
>>         } s;
>>         void *ptr[0];
>> };
>> struct net {
>>         int x;
>>         struct ng1 *gen1;
>>         struct ng2 *gen2;
>> };
>> void *ng1(const struct net *net, unsigned int id)
>> {
>>         return net->gen1->ptr[id];
>> }
>> void *ng2(const struct net *net, unsigned int id)
>> {
>>         return net->gen2->ptr[id];
>> }
>>
>>
>> 0000000000000000 <ng1>:
>>    0:   48 8b 47 08             mov    rax,QWORD PTR [rdi+0x8]
>>    4:   89 f6                   mov    esi,esi
>>    6:   48 8b 04 f0             mov    rax,QWORD PTR [rax+rsi*8]
>>    a:   c3                      ret
>>
>>
>> 0000000000000010 <ng2>:
>>   10:   48 8b 47 10             mov    rax,QWORD PTR [rdi+0x10]
>>   14:   89 f6                   mov    esi,esi
>>   16:   48 8b 44 f0 [[[08]]]          mov    rax,QWORD PTR [rax+rsi*8+0x8]
>>   1b:   c3                      ret
>
> On x86 that will make ~0 difference since the offset (in that sequence)
> doesn't require an extra instruction.

Well, the point of the patch is to save .text, so might as well save
as much as possible. Any form other than "ptr[id]" is going
to be either bigger or bigger and slower and "ptr" should be the first field.

> However if you offset the 'id' values so that only
> values 2 up are valid the code becomes:
>         return net->gen2->ptr[id - 2];
> which will be exactly the same code as:
>         return net->gen1->ptr[id];
> but it is much more obvious that 'id' values must be >= 2.
>
> The '2' should be generated from the structure offset, but with my method
> is doesn't actually matter if it is wrong.

^ permalink raw reply

* Re: [PATCH net] virtio-net: correctly enable multiqueue
From: Michael S. Tsirkin @ 2016-12-13 13:56 UTC (permalink / raw)
  To: Jason Wang; +Cc: netdev, virtualization, tytso, Neil Horman
In-Reply-To: <1481610185-12183-1-git-send-email-jasowang@redhat.com>

On Tue, Dec 13, 2016 at 02:23:05PM +0800, Jason Wang wrote:
> Commit 4490001029012539937ff02778fe6180613fa949 ("virtio-net: enable
> multiqueue by default") blindly set the affinity instead of queues
> during probe which can cause a mismatch of #queues between guest and
> host. This patch fixes it by setting queues.
> 
> Reported-by: Theodore Ts'o <tytso@mit.edu>
> Tested-by: Theodore Ts'o <tytso@mit.edu>
> Cc: Neil Horman <nhorman@tuxdriver.com>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Fixes: 49000102901 ("virtio-net: enable multiqueue by default")
> Signed-off-by: Jason Wang <jasowang@redhat.com>

Acked-by: Michael S. Tsirkin <mst@redhat.com>

> ---
>  drivers/net/virtio_net.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index b425fa1..fe9f772 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1930,7 +1930,9 @@ static int virtnet_probe(struct virtio_device *vdev)
>  		goto free_unregister_netdev;
>  	}
>  
> -	virtnet_set_affinity(vi);
> +	rtnl_lock();
> +	virtnet_set_queues(vi, vi->curr_queue_pairs);
> +	rtnl_unlock();
>  
>  	/* Assume link up if device can't report link status,
>  	   otherwise get link status from config. */

I note that virtnet_set_channels also plays with affinity
directly. Can this be changed to rely on cpu notifiers
somehow?


> -- 
> 2.7.4

^ permalink raw reply

* mlx5: net_device.addr_list_lock usage before initialization
From: Sebastian Ott @ 2016-12-13 13:22 UTC (permalink / raw)
  To: Saeed Mahameed, Matan Barak, Leon Romanovsky; +Cc: netdev, linux-kernel

Hi,

I ran into the following lockdep complaint:

[    7.059561] INFO: trying to register non-static key.
[    7.059566] the code is fine but needs lockdep annotation.
[    7.059570] turning off the locking correctness validator.
[    7.059579] CPU: 6 PID: 6 Comm: kworker/u32:0 Not tainted 4.9.0-02683-g784243e-dirty #77
[    7.059582] Hardware name: IBM              2964 N96              704              (LPAR)
[    7.061260] Workqueue: mlx5e mlx5e_set_rx_mode_work [mlx5_core]
[    7.061268] Stack:
[    7.061270]        00000000f95739c0 00000000f9573a50 0000000000000003 0000000000000000
[    7.061278]        00000000f9573af0 00000000f9573a68 00000000f9573a68 0000000000000020
[    7.061286]        0000000000000000 0000000000000020 000000000000000a 000000000000000a
[    7.061294]        000000000000000c 00000000f9573ab8 0000000000000000 0000000000000000
[    7.061301]        00000000008a1038 0000000000112a50 00000000f9573a50 00000000f9573aa8
[    7.061314] Call Trace:
[    7.061321] ([<000000000011292a>] show_trace+0x8a/0xe0)
[    7.061327]  [<0000000000112a00>] show_stack+0x80/0xd8
[    7.061334]  [<00000000005cdce6>] dump_stack+0x96/0xd8
[    7.061338]  [<00000000001ae352>] register_lock_class+0x1d2/0x530
[    7.061341]  [<00000000001b33f6>] __lock_acquire+0xfe/0x7d8
[    7.061345]  [<00000000001b4394>] lock_acquire+0x30c/0x358
[    7.061352]  [<000000000089454c>] _raw_spin_lock_bh+0x64/0xa0
[    7.062171]  [<000003ff81465858>] mlx5e_set_rx_mode_work+0x248/0x490 [mlx5_core]
[    7.062178]  [<0000000000163864>] process_one_work+0x41c/0x830
[    7.062181]  [<0000000000163f2c>] worker_thread+0x2b4/0x478
[    7.062186]  [<000000000016c46c>] kthread+0x15c/0x170
[    7.062190]  [<0000000000895a52>] kernel_thread_starter+0x6/0xc
[    7.062193]  [<0000000000895a4c>] kernel_thread_starter+0x0/0xc
[    7.062196] INFO: lockdep is turned off.

The problematic lock is net_device.addr_list_lock whose usage is
asynchronously triggered by:

mlx5e_add -> mlx5e_attach -> mlx5e_attach_netdev -> mlx5e_nic_enable 
[workq] mlx5e_set_rx_mode_work -> mlx5e_handle_netdev_addr -> mlx5e_sync_netdev_addr

Initialization of this lock is triggered by:
mlx5e_add -> register_netdev

...after the call to mlx5e_attach which is obviously racy.

Regards,
Sebastian

^ permalink raw reply

* Re: [PATCHv2 1/5] sh_eth: add generic wake-on-lan support via magic packet
From: Geert Uytterhoeven @ 2016-12-13 13:03 UTC (permalink / raw)
  To: Niklas Söderlund
  Cc: Sergei Shtylyov, Simon Horman, netdev@vger.kernel.org,
	Linux-Renesas
In-Reply-To: <20161212160931.6478-2-niklas.soderlund+renesas@ragnatech.se>

CC linux-pm

On Mon, Dec 12, 2016 at 5:09 PM, Niklas Söderlund
<niklas.soderlund+renesas@ragnatech.se> wrote:
> Add generic functionality to support Wake-on-Lan using MagicPacket which
> are supported by at least a few versions of sh_eth. Only add
> functionality for WoL, no specific sh_eth version are marked to support
> WoL yet.
>
> WoL is enabled in the suspend callback by setting MagicPacket detection
> and disabling all interrupts expect MagicPacket. In the resume path the
> driver needs to reset the hardware to rearm the WoL logic, this prevents
> the driver from simply restoring the registers and to take advantage of
> that sh_eth was not suspended to reduce resume time. To reset the
> hardware the driver close and reopens the device just like it would do
> in a normal suspend/resume scenario without WoL enabled, but it both
> close and open the device in the resume callback since the device needs
> to be open for WoL to work.
>
> One quirk needed for WoL is that the module clock needs to be prevented
> from being switched off by Runtime PM. To keep the clock alive the
> suspend callback need to call clk_enable() directly to increase the
> usage count of the clock. Then when Runtime PM decreases the clock usage
> count it won't reach 0 and be switched off.
>
> Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>

Thanks for the update!

I've verified WoL is working on r8a7791/koelsch and r8a7740/armadillo.

However, if I look at /sys/kernel/debug/wakeup_sources, "active_count" and
"event_count" for the Ethernet device do not increase when using WoL, while
they do for the GPIO keyboard when using the keyboard for wake up.
So something seems to be missing from a bookkeeping point of view.

Interestingly, "wakeup_count" does not increase for both?
Has this been broken recently?

> ---
>  drivers/net/ethernet/renesas/sh_eth.c | 112 +++++++++++++++++++++++++++++++---
>  drivers/net/ethernet/renesas/sh_eth.h |   3 +
>  2 files changed, 107 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/net/ethernet/renesas/sh_eth.c b/drivers/net/ethernet/renesas/sh_eth.c
> index 05b0dc5..87640b9 100644
> --- a/drivers/net/ethernet/renesas/sh_eth.c
> +++ b/drivers/net/ethernet/renesas/sh_eth.c
> @@ -2199,6 +2199,33 @@ static int sh_eth_set_ringparam(struct net_device *ndev,
>         return 0;
>  }
>
> +static void sh_eth_get_wol(struct net_device *ndev, struct ethtool_wolinfo *wol)
> +{
> +       struct sh_eth_private *mdp = netdev_priv(ndev);
> +
> +       wol->supported = 0;
> +       wol->wolopts = 0;
> +
> +       if (mdp->cd->magic && mdp->clk) {
> +               wol->supported = WAKE_MAGIC;
> +               wol->wolopts = mdp->wol_enabled ? WAKE_MAGIC : 0;
> +       }
> +}
> +
> +static int sh_eth_set_wol(struct net_device *ndev, struct ethtool_wolinfo *wol)
> +{
> +       struct sh_eth_private *mdp = netdev_priv(ndev);
> +
> +       if (!mdp->cd->magic || !mdp->clk || wol->wolopts & ~WAKE_MAGIC)
> +               return -EOPNOTSUPP;
> +
> +       mdp->wol_enabled = !!(wol->wolopts & WAKE_MAGIC);
> +
> +       device_set_wakeup_enable(&mdp->pdev->dev, mdp->wol_enabled);
> +
> +       return 0;
> +}
> +
>  static const struct ethtool_ops sh_eth_ethtool_ops = {
>         .get_regs_len   = sh_eth_get_regs_len,
>         .get_regs       = sh_eth_get_regs,
> @@ -2213,6 +2240,8 @@ static const struct ethtool_ops sh_eth_ethtool_ops = {
>         .set_ringparam  = sh_eth_set_ringparam,
>         .get_link_ksettings = sh_eth_get_link_ksettings,
>         .set_link_ksettings = sh_eth_set_link_ksettings,
> +       .get_wol        = sh_eth_get_wol,
> +       .set_wol        = sh_eth_set_wol,
>  };
>
>  /* network device open function */
> @@ -3017,6 +3046,11 @@ static int sh_eth_drv_probe(struct platform_device *pdev)
>                 goto out_release;
>         }
>
> +       /* Get clock, if not found that's OK but Wake-On-Lan is unavailable */
> +       mdp->clk = devm_clk_get(&pdev->dev, NULL);
> +       if (IS_ERR(mdp->clk))
> +               mdp->clk = NULL;
> +
>         ndev->base_addr = res->start;
>
>         spin_lock_init(&mdp->lock);
> @@ -3111,6 +3145,9 @@ static int sh_eth_drv_probe(struct platform_device *pdev)
>         if (ret)
>                 goto out_napi_del;
>
> +       if (mdp->cd->magic && mdp->clk)
> +               device_set_wakeup_capable(&pdev->dev, 1);
> +
>         /* print device information */
>         netdev_info(ndev, "Base address at 0x%x, %pM, IRQ %d.\n",
>                     (u32)ndev->base_addr, ndev->dev_addr, ndev->irq);
> @@ -3150,15 +3187,67 @@ static int sh_eth_drv_remove(struct platform_device *pdev)
>
>  #ifdef CONFIG_PM
>  #ifdef CONFIG_PM_SLEEP
> +static int sh_eth_wol_setup(struct net_device *ndev)
> +{
> +       struct sh_eth_private *mdp = netdev_priv(ndev);
> +
> +       /* Only allow ECI interrupts */
> +       synchronize_irq(ndev->irq);
> +       napi_disable(&mdp->napi);
> +       sh_eth_write(ndev, DMAC_M_ECI, EESIPR);
> +
> +       /* Enable MagicPacket */
> +       sh_eth_modify(ndev, ECMR, 0, ECMR_PMDE);
> +
> +       /* Increased clock usage so device won't be suspended */
> +       clk_enable(mdp->clk);
> +
> +       return enable_irq_wake(ndev->irq);
> +}
> +
> +static int sh_eth_wol_restore(struct net_device *ndev)
> +{
> +       struct sh_eth_private *mdp = netdev_priv(ndev);
> +       int ret;
> +
> +       napi_enable(&mdp->napi);
> +
> +       /* Disable MagicPacket */
> +       sh_eth_modify(ndev, ECMR, ECMR_PMDE, 0);
> +
> +       /* The device need to be reset to restore MagicPacket logic
> +        * for next wakeup. If we close and open the device it will
> +        * both be reset and all registers restored. This is what
> +        * happens during suspend and resume without WoL enabled.
> +        */
> +       ret = sh_eth_close(ndev);
> +       if (ret < 0)
> +               return ret;
> +       ret = sh_eth_open(ndev);
> +       if (ret < 0)
> +               return ret;
> +
> +       /* Restore clock usage count */
> +       clk_disable(mdp->clk);
> +
> +       return disable_irq_wake(ndev->irq);
> +}
> +
>  static int sh_eth_suspend(struct device *dev)
>  {
>         struct net_device *ndev = dev_get_drvdata(dev);
> +       struct sh_eth_private *mdp = netdev_priv(ndev);
>         int ret = 0;
>
> -       if (netif_running(ndev)) {
> -               netif_device_detach(ndev);
> +       if (!netif_running(ndev))
> +               return 0;
> +
> +       netif_device_detach(ndev);
> +
> +       if (mdp->wol_enabled)
> +               ret = sh_eth_wol_setup(ndev);
> +       else
>                 ret = sh_eth_close(ndev);
> -       }
>
>         return ret;
>  }
> @@ -3166,14 +3255,21 @@ static int sh_eth_suspend(struct device *dev)
>  static int sh_eth_resume(struct device *dev)
>  {
>         struct net_device *ndev = dev_get_drvdata(dev);
> +       struct sh_eth_private *mdp = netdev_priv(ndev);
>         int ret = 0;
>
> -       if (netif_running(ndev)) {
> +       if (!netif_running(ndev))
> +               return 0;
> +
> +       if (mdp->wol_enabled)
> +               ret = sh_eth_wol_restore(ndev);
> +       else
>                 ret = sh_eth_open(ndev);
> -               if (ret < 0)
> -                       return ret;
> -               netif_device_attach(ndev);
> -       }
> +
> +       if (ret < 0)
> +               return ret;
> +
> +       netif_device_attach(ndev);
>
>         return ret;
>  }
> diff --git a/drivers/net/ethernet/renesas/sh_eth.h b/drivers/net/ethernet/renesas/sh_eth.h
> index d050f37..4ceed00 100644
> --- a/drivers/net/ethernet/renesas/sh_eth.h
> +++ b/drivers/net/ethernet/renesas/sh_eth.h
> @@ -493,6 +493,7 @@ struct sh_eth_cpu_data {
>         unsigned shift_rd0:1;   /* shift Rx descriptor word 0 right by 16 */
>         unsigned rmiimode:1;    /* EtherC has RMIIMODE register */
>         unsigned rtrate:1;      /* EtherC has RTRATE register */
> +       unsigned magic:1;       /* EtherC has ECMR.PMDE and ECSR.MPD */
>  };
>
>  struct sh_eth_private {
> @@ -501,6 +502,7 @@ struct sh_eth_private {
>         const u16 *reg_offset;
>         void __iomem *addr;
>         void __iomem *tsu_addr;
> +       struct clk *clk;
>         u32 num_rx_ring;
>         u32 num_tx_ring;
>         dma_addr_t rx_desc_dma;
> @@ -529,6 +531,7 @@ struct sh_eth_private {
>         unsigned no_ether_link:1;
>         unsigned ether_link_active_low:1;
>         unsigned is_opened:1;
> +       unsigned wol_enabled:1;
>  };
>
>  static inline void sh_eth_soft_swap(char *src, int len)
> --
> 2.10.2

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply

* Re: [PATCH net-next 13/27] bridge: use __vlan_hwaccel helpers
From: Sergei Shtylyov @ 2016-12-13 12:59 UTC (permalink / raw)
  To: Michał Mirosław, netdev
  Cc: Pablo Neira Ayuso (supporter:NETFILTER ({IP ,IP6,ARP,EB,NF}TABLES)),
	Patrick McHardy (supporter:NETFILTER ({IP,IP6,ARP ,EB,NF}TABLES)),
	Jozsef Kadlecsik (supporter:NETFILTER ({IP,IP6,ARP,EB,NF}TABLES)),
	Stephen Hemminger (maintainer:ETHERNET BRIDGE),
	open list:NETFILTER ({IP,IP6,ARP,EB ,NF}TABLES),
	moderated list:ETHERNET BRIDGE
In-Reply-To: <e6e09a2f745a652382ac7859770ec16c75d8e149.1481586602.git.mirq-linux@rere.qmqm.pl>

Hello!

On 12/13/2016 3:12 AM, Michał Mirosław wrote:

> This removes assumption than vlan_tci != 0 when tag is present.
>
> Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
> ---
>  net/bridge/br_netfilter_hooks.c | 14 ++++++++------
>  net/bridge/br_private.h         |  2 +-
>  net/bridge/br_vlan.c            |  6 +++---
>  3 files changed, 12 insertions(+), 10 deletions(-)
>
> diff --git a/net/bridge/br_netfilter_hooks.c b/net/bridge/br_netfilter_hooks.c
> index b12501a..2cc0747 100644
> --- a/net/bridge/br_netfilter_hooks.c
> +++ b/net/bridge/br_netfilter_hooks.c
[...]
> @@ -749,8 +747,12 @@ static int br_nf_dev_queue_xmit(struct net *net, struct sock *sk, struct sk_buff
>
>  		data = this_cpu_ptr(&brnf_frag_data_storage);
>
> -		data->vlan_tci = skb->vlan_tci;
> -		data->vlan_proto = skb->vlan_proto;
> +		if (skb_vlan_tag_present(skb)) {
> +			data->vlan_tci = skb->vlan_tci;
> +			data->vlan_proto = skb->vlan_proto;
> +		} else
> +			data->vlan_proto = 0;

    CodingStyle: should use {} in all branches of *if* if at least one branch 
has {}.

[...]
> diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
> index b6de4f4..ef94664 100644
> --- a/net/bridge/br_vlan.c
> +++ b/net/bridge/br_vlan.c

> @@ -444,8 +444,8 @@ static bool __allowed_ingress(const struct net_bridge *br,
>  			__vlan_hwaccel_put_tag(skb, br->vlan_proto, pvid);
>  		else
>  			/* Priority-tagged Frame.
> -			 * At this point, We know that skb->vlan_tci had
> -			 * VLAN_TAG_PRESENT bit and its VID field was 0x000.
> +			 * At this point, We know that skb->vlan_tci VID

    s/We/we/.

> +			 * field was 0x000.

    Simply 0, maybe?

[...]

MBR, Sergei


^ permalink raw reply

* Re: Synopsys Ethernet QoS
From: Joao Pinto @ 2016-12-13 12:56 UTC (permalink / raw)
  To: Lars Persson, Niklas Cassel
  Cc: Joao Pinto, Florian Fainelli, Andy Shevchenko, David Miller,
	Giuseppe CAVALLARO, Rabin Vincent, netdev,
	CARLOS.PALMINHA@synopsys.com, Jie.Deng1@synopsys.com,
	Stephen Warren, pavel@ucw.cz
In-Reply-To: <46E178AA-6100-4244-BB8D-B5EA58E96E72@axis.com>

Às 12:50 PM de 12/13/2016, Lars Persson escreveu:
> 
>> 13 dec. 2016 kl. 13:31 skrev Niklas Cassel <Niklas.Cassel@axis.com>:
>>
>>
>>
>>> On 12/13/2016 12:49 PM, Joao Pinto wrote:
>>> Hi Niklas,
>>>
>>> Às 4:25 PM de 12/12/2016, Niklas Cassel escreveu:
>>>>
>>>>> On 12/12/2016 11:19 AM, Joao Pinto wrote:
>>>>> Hi,
>>>>>
>>>>> Às 1:44 AM de 12/10/2016, Florian Fainelli escreveu:
>>>>>>> Le 12/09/16 à 16:16, Andy Shevchenko a écrit :
>>>>>>> On Sat, Dec 10, 2016 at 12:52 AM, Florian Fainelli <f.fainelli@gmail.com> wrote:
>>> (snip...)
>>>
>>>
>>>>> @Rabin Vincent: Hi Rabin. Since Axis is more familiar with the synopsys/*qos*
>>>>> driver would it be possible for you to make an initial analysis of what has to
>>>>> be merged into Stmmac? This way the development would speed-up.
>>>> I can answer that question.
>>>>
>>>> I've sent out 12 patches to the stmmac driver
>>>> (all patches are included in the current net-next tree),
>>>> with these patches the stmmac driver works properly on Axis hardware
>>>> (we use Synopsys GMAC 4.10a synthesized with multiple TX queues).
>>>> stmmac's DT binding has also been extended with properties that
>>>> existed in DWC EQoS's DT binding, such as no-pbl-x8, txpbl, rxpbl.
>>>>
>>>> Since we have no problem updating the DTB together with the kernel,
>>>> we will simply move to using the start using the stmmac driver,
>>>> with stmmac's DT binding.
>>>>
>>>> However, I've noticed that NVIDIA has extended the DWC EQoS DT binding,
>>>> I don't how easy it would be for them to switch to stmmac's DT binding.
>>>> (Adding Stephen Warren to CC.)
>>>>
>>>> The reset sequence that Lars Persson was worried about is not an issue
>>>> with the stmmac driver.
>>> Great! So you saying that stmmac works great with AXIS hardware and no need to
>>> merge specific code from AXIS' *qos* driver?
>>
>> Yes. From Axis point of view, we are done.
>> stmmac works and we will move to that driver + DT binding.
>>
>> However, it seems like Stephen Warren will NAK if you try to remove
>> synopsys/dwc_eth_qos.c before
>> Documentation/devicetree/bindings/net/stmmac.txt
>> is compatible with
>> Documentation/devicetree/bindings/net/snps,dwc-qos-ethernet.txt
>>
>> You might want to sync with him. I have no idea, but perhaps they are
>> only using a subset of all the available properties. Perhaps,
>> only implementing what they are using might be enough, I don't know.
>> I couldn't find their DTS in arch/arm/dts.
>> I guess you might want to know David Miller's opinion,
>> since he's the one who decides in the end.
> 
> I will also NACK removal of dwc_eth_qos.c until the binding is implemented elsewhere.

Totally agree.
@Niklas: David Miller is informed of what we are planning to do. Can you
estimate the effort of merging the axis driver device tree bindings? If there
was anyone from axis to do the merge would be better since you are familiar with
it. What do you think?

> 
> 
>>
>>>>
>>>>
>>>>
>>>> There are some performance problems with the stmmac driver though:
>>>>
>>>> When running iperf3 with 3 streams:
>>>> iperf3 -c 192.168.0.90 -P 3 -t 30
>>>> iperf3 -c 192.168.0.90 -P 3 -t 30 -R
>>>>
>>>> I get really bad fairness between the streams.
>>>>
>>>> This appears to be an issue with how TX IRQ coalescing is implemented in stmmac.
>>>> Disabling TX IRQ coalescing in the stmmac driver makes the problem go away.
>>>> We have a local patch that implements TX IRQ coalescing in the dwceqos driver,
>>>> and we don't see the same problem.
>>>>
>>>> Also netperf TCP_RR and UDP_RR gives really bad results compared to the
>>>> dwceqos driver (without IRQ coalescing).
>>>> 2000 transactions/sec vs 9000 transactions/sec.
>>>> Turning TX IRQ coalescing off and RX interrupt watchdog off in stmmac
>>>> gives the same performance. I guess it's a trade off, low CPU usage
>>>> vs low latency, so I don't know how important TCP_RR/UDP_RR really is.
>>>>
>>>> The best thing would be to get a good working TX IRQ coalesce
>>>> implementation with HR timers in stmmac.
>>>> Perhaps it should also be investigated if the RX interrupt watchdog
>>>> timeout should have a lower default value.
>>>>
>>>>
>>>>
>>>>> Thanks to all.
>>>>>
>>>>> Joao
>>

^ permalink raw reply

* Re: [PATCHv2 2/5] sh_eth: enable wake-on-lan for Gen2 devices
From: Geert Uytterhoeven @ 2016-12-13 12:52 UTC (permalink / raw)
  To: Niklas Söderlund
  Cc: Sergei Shtylyov, Simon Horman, netdev@vger.kernel.org,
	Linux-Renesas
In-Reply-To: <20161212160931.6478-3-niklas.soderlund+renesas@ragnatech.se>

On Mon, Dec 12, 2016 at 5:09 PM, Niklas Söderlund
<niklas.soderlund+renesas@ragnatech.se> wrote:
> Tested on Gen2 r8a7791/Koelsch.
>
> Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>

Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply

* Re: [PATCHv2 3/5] sh_eth: enable wake-on-lan for r8a7740/armadillo
From: Geert Uytterhoeven @ 2016-12-13 12:51 UTC (permalink / raw)
  To: Niklas Söderlund
  Cc: Sergei Shtylyov, Simon Horman, netdev@vger.kernel.org,
	Linux-Renesas
In-Reply-To: <20161212160931.6478-4-niklas.soderlund+renesas@ragnatech.se>

On Mon, Dec 12, 2016 at 5:09 PM, Niklas Söderlund
<niklas.soderlund+renesas@ragnatech.se> wrote:
> Geert Uytterhoeven reported WoL worked on his Armadillo board.
>
> Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>

Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply

* Re: Synopsys Ethernet QoS
From: Lars Persson @ 2016-12-13 12:50 UTC (permalink / raw)
  To: Niklas Cassel
  Cc: Joao Pinto, Florian Fainelli, Andy Shevchenko, David Miller,
	Giuseppe CAVALLARO, Rabin Vincent, netdev,
	CARLOS.PALMINHA@synopsys.com, Jie.Deng1@synopsys.com,
	Stephen Warren, pavel@ucw.cz
In-Reply-To: <b6cbe6e6-c998-302a-36d3-2183acb076d9@axis.com>


> 13 dec. 2016 kl. 13:31 skrev Niklas Cassel <Niklas.Cassel@axis.com>:
> 
> 
> 
>> On 12/13/2016 12:49 PM, Joao Pinto wrote:
>> Hi Niklas,
>> 
>> Às 4:25 PM de 12/12/2016, Niklas Cassel escreveu:
>>> 
>>>> On 12/12/2016 11:19 AM, Joao Pinto wrote:
>>>> Hi,
>>>> 
>>>> Às 1:44 AM de 12/10/2016, Florian Fainelli escreveu:
>>>>>> Le 12/09/16 à 16:16, Andy Shevchenko a écrit :
>>>>>> On Sat, Dec 10, 2016 at 12:52 AM, Florian Fainelli <f.fainelli@gmail.com> wrote:
>> (snip...)
>> 
>> 
>>>> @Rabin Vincent: Hi Rabin. Since Axis is more familiar with the synopsys/*qos*
>>>> driver would it be possible for you to make an initial analysis of what has to
>>>> be merged into Stmmac? This way the development would speed-up.
>>> I can answer that question.
>>> 
>>> I've sent out 12 patches to the stmmac driver
>>> (all patches are included in the current net-next tree),
>>> with these patches the stmmac driver works properly on Axis hardware
>>> (we use Synopsys GMAC 4.10a synthesized with multiple TX queues).
>>> stmmac's DT binding has also been extended with properties that
>>> existed in DWC EQoS's DT binding, such as no-pbl-x8, txpbl, rxpbl.
>>> 
>>> Since we have no problem updating the DTB together with the kernel,
>>> we will simply move to using the start using the stmmac driver,
>>> with stmmac's DT binding.
>>> 
>>> However, I've noticed that NVIDIA has extended the DWC EQoS DT binding,
>>> I don't how easy it would be for them to switch to stmmac's DT binding.
>>> (Adding Stephen Warren to CC.)
>>> 
>>> The reset sequence that Lars Persson was worried about is not an issue
>>> with the stmmac driver.
>> Great! So you saying that stmmac works great with AXIS hardware and no need to
>> merge specific code from AXIS' *qos* driver?
> 
> Yes. From Axis point of view, we are done.
> stmmac works and we will move to that driver + DT binding.
> 
> However, it seems like Stephen Warren will NAK if you try to remove
> synopsys/dwc_eth_qos.c before
> Documentation/devicetree/bindings/net/stmmac.txt
> is compatible with
> Documentation/devicetree/bindings/net/snps,dwc-qos-ethernet.txt
> 
> You might want to sync with him. I have no idea, but perhaps they are
> only using a subset of all the available properties. Perhaps,
> only implementing what they are using might be enough, I don't know.
> I couldn't find their DTS in arch/arm/dts.
> I guess you might want to know David Miller's opinion,
> since he's the one who decides in the end.

I will also NACK removal of dwc_eth_qos.c until the binding is implemented elsewhere.


> 
>>> 
>>> 
>>> 
>>> There are some performance problems with the stmmac driver though:
>>> 
>>> When running iperf3 with 3 streams:
>>> iperf3 -c 192.168.0.90 -P 3 -t 30
>>> iperf3 -c 192.168.0.90 -P 3 -t 30 -R
>>> 
>>> I get really bad fairness between the streams.
>>> 
>>> This appears to be an issue with how TX IRQ coalescing is implemented in stmmac.
>>> Disabling TX IRQ coalescing in the stmmac driver makes the problem go away.
>>> We have a local patch that implements TX IRQ coalescing in the dwceqos driver,
>>> and we don't see the same problem.
>>> 
>>> Also netperf TCP_RR and UDP_RR gives really bad results compared to the
>>> dwceqos driver (without IRQ coalescing).
>>> 2000 transactions/sec vs 9000 transactions/sec.
>>> Turning TX IRQ coalescing off and RX interrupt watchdog off in stmmac
>>> gives the same performance. I guess it's a trade off, low CPU usage
>>> vs low latency, so I don't know how important TCP_RR/UDP_RR really is.
>>> 
>>> The best thing would be to get a good working TX IRQ coalesce
>>> implementation with HR timers in stmmac.
>>> Perhaps it should also be investigated if the RX interrupt watchdog
>>> timeout should have a lower default value.
>>> 
>>> 
>>> 
>>>> Thanks to all.
>>>> 
>>>> Joao
> 

^ permalink raw reply

* [PATCH iproute2 V2 2/2] tc: tunnel_key: Add tc-tunnel_key man page to Makefile
From: Roi Dayan @ 2016-12-13 12:39 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, Amir Vadai, Hadar Hen Zion, Roi Dayan
In-Reply-To: <1481632742-18020-1-git-send-email-roid@mellanox.com>

To be installed with the other man pages.

Fixes: d57639a475a9 ("tc/act_tunnel: Introduce ip tunnel action")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Amir Vadai <amir@vadai.me>
---
 man/man8/Makefile | 1 +
 1 file changed, 1 insertion(+)

diff --git a/man/man8/Makefile b/man/man8/Makefile
index de6f249..d4cb01a 100644
--- a/man/man8/Makefile
+++ b/man/man8/Makefile
@@ -17,6 +17,7 @@ MAN8PAGES = $(TARGETS) ip.8 arpd.8 lnstat.8 routel.8 rtacct.8 rtmon.8 rtpr.8 ss.
 	tc-tcindex.8 tc-u32.8 tc-matchall.8 \
 	tc-connmark.8 tc-csum.8 tc-mirred.8 tc-nat.8 tc-pedit.8 tc-police.8 \
 	tc-simple.8 tc-skbedit.8 tc-vlan.8 tc-xt.8  tc-ife.8 tc-skbmod.8 \
+	tc-tunnel_key.8 \
 	devlink.8 devlink-dev.8 devlink-monitor.8 devlink-port.8 devlink-sb.8
 
 all: $(TARGETS)
-- 
2.7.4

^ permalink raw reply related

* [PATCH iproute2 V2 0/2] Man pages fixes
From: Roi Dayan @ 2016-12-13 12:39 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, Amir Vadai, Hadar Hen Zion, Roi Dayan

Hi,

The 2 patches are man page related only.
First fixes a typo and second adding missing man page to the Makefile.

Thanks,
Roi

v2:
- style fix in flower man page

Roi Dayan (2):
  tc: flower: Fix typo and style in flower man page
  tc: tunnel_key: Add tc-tunnel_key man page to Makefile

 man/man8/Makefile    |  1 +
 man/man8/tc-flower.8 | 10 +++++-----
 2 files changed, 6 insertions(+), 5 deletions(-)

-- 
2.7.4

^ permalink raw reply

* [PATCH iproute2 V2 1/2] tc: flower: Fix typo and style in flower man page
From: Roi Dayan @ 2016-12-13 12:39 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev, Amir Vadai, Hadar Hen Zion, Roi Dayan
In-Reply-To: <1481632742-18020-1-git-send-email-roid@mellanox.com>

Replace vlan_eth_type with vlan_ethtype.

Fixes: 745d91726006 ("tc: flower: Introduce vlan support")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com>
---
 man/man8/tc-flower.8 | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/man/man8/tc-flower.8 b/man/man8/tc-flower.8
index 90fdfba..15bc32f 100644
--- a/man/man8/tc-flower.8
+++ b/man/man8/tc-flower.8
@@ -27,7 +27,7 @@ flower \- flow based traffic control filter
 .IR VID " | "
 .B vlan_prio
 .IR PRIORITY " | "
-.BR vlan_eth_type " { " ipv4 " | " ipv6 " | "
+.BR vlan_ethtype " { " ipv4 " | " ipv6 " | "
 .IR ETH_TYPE " } | "
 .BR ip_proto " { " tcp " | " udp " | " sctp " | " icmp " | " icmpv6 " | "
 .IR IP_PROTO " } | { "
@@ -82,16 +82,16 @@ Match on vlan tag id.
 .I VID
 is an unsigned 12bit value in decimal format.
 .TP
-.BI vlan_prio " priority"
+.BI vlan_prio " PRIORITY"
 Match on vlan tag priority.
 .I PRIORITY
 is an unsigned 3bit value in decimal format.
 .TP
-.BI vlan_eth_type " VLAN_ETH_TYPE"
+.BI vlan_ethtype " VLAN_ETH_TYPE"
 Match on layer three protocol.
-.I ETH_TYPE
+.I VLAN_ETH_TYPE
 may be either
-.BR ipv4 , ipv6
+.BR ipv4 ", " ipv6
 or an unsigned 16bit value in hexadecimal format.
 .TP
 .BI ip_proto " IP_PROTO"
-- 
2.7.4

^ permalink raw reply related

* Re: Synopsys Ethernet QoS
From: Niklas Cassel @ 2016-12-13 12:31 UTC (permalink / raw)
  To: Joao Pinto, Florian Fainelli, Andy Shevchenko
  Cc: David Miller, Giuseppe CAVALLARO, larper, rabinv, netdev,
	CARLOS.PALMINHA, Jie.Deng1, Stephen Warren, pavel
In-Reply-To: <9bedb947-e27b-6bd8-ec11-6389afd189ec@synopsys.com>



On 12/13/2016 12:49 PM, Joao Pinto wrote:
> Hi Niklas,
>
> Às 4:25 PM de 12/12/2016, Niklas Cassel escreveu:
>>
>> On 12/12/2016 11:19 AM, Joao Pinto wrote:
>>> Hi,
>>>
>>> Às 1:44 AM de 12/10/2016, Florian Fainelli escreveu:
>>>> Le 12/09/16 à 16:16, Andy Shevchenko a écrit :
>>>>> On Sat, Dec 10, 2016 at 12:52 AM, Florian Fainelli <f.fainelli@gmail.com> wrote:
> (snip...)
>
>
>>> @Rabin Vincent: Hi Rabin. Since Axis is more familiar with the synopsys/*qos*
>>> driver would it be possible for you to make an initial analysis of what has to
>>> be merged into Stmmac? This way the development would speed-up.
>> I can answer that question.
>>
>> I've sent out 12 patches to the stmmac driver
>> (all patches are included in the current net-next tree),
>> with these patches the stmmac driver works properly on Axis hardware
>> (we use Synopsys GMAC 4.10a synthesized with multiple TX queues).
>> stmmac's DT binding has also been extended with properties that
>> existed in DWC EQoS's DT binding, such as no-pbl-x8, txpbl, rxpbl.
>>
>> Since we have no problem updating the DTB together with the kernel,
>> we will simply move to using the start using the stmmac driver,
>> with stmmac's DT binding.
>>
>> However, I've noticed that NVIDIA has extended the DWC EQoS DT binding,
>> I don't how easy it would be for them to switch to stmmac's DT binding.
>> (Adding Stephen Warren to CC.)
>>
>> The reset sequence that Lars Persson was worried about is not an issue
>> with the stmmac driver.
> Great! So you saying that stmmac works great with AXIS hardware and no need to
> merge specific code from AXIS' *qos* driver?

Yes. From Axis point of view, we are done.
stmmac works and we will move to that driver + DT binding.

However, it seems like Stephen Warren will NAK if you try to remove
synopsys/dwc_eth_qos.c before
Documentation/devicetree/bindings/net/stmmac.txt
is compatible with
Documentation/devicetree/bindings/net/snps,dwc-qos-ethernet.txt

You might want to sync with him. I have no idea, but perhaps they are
only using a subset of all the available properties. Perhaps,
only implementing what they are using might be enough, I don't know.
I couldn't find their DTS in arch/arm/dts.
I guess you might want to know David Miller's opinion,
since he's the one who decides in the end.

>>
>>
>>
>> There are some performance problems with the stmmac driver though:
>>
>> When running iperf3 with 3 streams:
>> iperf3 -c 192.168.0.90 -P 3 -t 30
>> iperf3 -c 192.168.0.90 -P 3 -t 30 -R
>>
>> I get really bad fairness between the streams.
>>
>> This appears to be an issue with how TX IRQ coalescing is implemented in stmmac.
>> Disabling TX IRQ coalescing in the stmmac driver makes the problem go away.
>> We have a local patch that implements TX IRQ coalescing in the dwceqos driver,
>> and we don't see the same problem.
>>
>> Also netperf TCP_RR and UDP_RR gives really bad results compared to the
>> dwceqos driver (without IRQ coalescing).
>> 2000 transactions/sec vs 9000 transactions/sec.
>> Turning TX IRQ coalescing off and RX interrupt watchdog off in stmmac
>> gives the same performance. I guess it's a trade off, low CPU usage
>> vs low latency, so I don't know how important TCP_RR/UDP_RR really is.
>>
>> The best thing would be to get a good working TX IRQ coalesce
>> implementation with HR timers in stmmac.
>> Perhaps it should also be investigated if the RX interrupt watchdog
>> timeout should have a lower default value.
>>
>>
>>
>>> Thanks to all.
>>>
>>> Joao

^ permalink raw reply

* Re: Synopsys Ethernet QoS
From: Joao Pinto @ 2016-12-13 11:49 UTC (permalink / raw)
  To: Niklas Cassel, Joao Pinto, Florian Fainelli, Andy Shevchenko
  Cc: David Miller, Giuseppe CAVALLARO, larper, rabinv, netdev,
	CARLOS.PALMINHA, Jie.Deng1, Stephen Warren, pavel
In-Reply-To: <1d445ec1-deb8-6e36-39c4-6813c446095f@axis.com>


Hi Niklas,

Às 4:25 PM de 12/12/2016, Niklas Cassel escreveu:
> 
> 
> On 12/12/2016 11:19 AM, Joao Pinto wrote:
>> Hi,
>>
>> Às 1:44 AM de 12/10/2016, Florian Fainelli escreveu:
>>> Le 12/09/16 à 16:16, Andy Shevchenko a écrit :
>>>> On Sat, Dec 10, 2016 at 12:52 AM, Florian Fainelli <f.fainelli@gmail.com> wrote:

(snip...)


>>
>> @Rabin Vincent: Hi Rabin. Since Axis is more familiar with the synopsys/*qos*
>> driver would it be possible for you to make an initial analysis of what has to
>> be merged into Stmmac? This way the development would speed-up.
> 
> I can answer that question.
> 
> I've sent out 12 patches to the stmmac driver
> (all patches are included in the current net-next tree),
> with these patches the stmmac driver works properly on Axis hardware
> (we use Synopsys GMAC 4.10a synthesized with multiple TX queues).
> stmmac's DT binding has also been extended with properties that
> existed in DWC EQoS's DT binding, such as no-pbl-x8, txpbl, rxpbl.
> 
> Since we have no problem updating the DTB together with the kernel,
> we will simply move to using the start using the stmmac driver,
> with stmmac's DT binding.
> 
> However, I've noticed that NVIDIA has extended the DWC EQoS DT binding,
> I don't how easy it would be for them to switch to stmmac's DT binding.
> (Adding Stephen Warren to CC.)
> 
> The reset sequence that Lars Persson was worried about is not an issue
> with the stmmac driver.

Great! So you saying that stmmac works great with AXIS hardware and no need to
merge specific code from AXIS' *qos* driver?

> 
> 
> 
> 
> There are some performance problems with the stmmac driver though:
> 
> When running iperf3 with 3 streams:
> iperf3 -c 192.168.0.90 -P 3 -t 30
> iperf3 -c 192.168.0.90 -P 3 -t 30 -R
> 
> I get really bad fairness between the streams.
> 
> This appears to be an issue with how TX IRQ coalescing is implemented in stmmac.
> Disabling TX IRQ coalescing in the stmmac driver makes the problem go away.
> We have a local patch that implements TX IRQ coalescing in the dwceqos driver,
> and we don't see the same problem.
> 
> Also netperf TCP_RR and UDP_RR gives really bad results compared to the
> dwceqos driver (without IRQ coalescing).
> 2000 transactions/sec vs 9000 transactions/sec.
> Turning TX IRQ coalescing off and RX interrupt watchdog off in stmmac
> gives the same performance. I guess it's a trade off, low CPU usage
> vs low latency, so I don't know how important TCP_RR/UDP_RR really is.
> 
> The best thing would be to get a good working TX IRQ coalesce
> implementation with HR timers in stmmac.
> Perhaps it should also be investigated if the RX interrupt watchdog
> timeout should have a lower default value.
> 
> 
> 
>>
>> Thanks to all.
>>
>> Joao
> 

^ permalink raw reply

* Re: [PATCHv2 4/5] sh_eth: enable wake-on-lan for sh7743
From: Geert Uytterhoeven @ 2016-12-13 10:59 UTC (permalink / raw)
  To: Niklas Söderlund
  Cc: Sergei Shtylyov, Simon Horman, netdev@vger.kernel.org,
	Linux-Renesas
In-Reply-To: <20161212160931.6478-5-niklas.soderlund+renesas@ragnatech.se>

On Mon, Dec 12, 2016 at 5:09 PM, Niklas Söderlund
<niklas.soderlund+renesas@ragnatech.se> wrote:
> This is based on public datasheet for sh7743 which shows it have the

sh7734 (also in the one-line summary)
it has

> same behavior and registers for WoL as other versions of sh_eth.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply

* Re: netlink: GPF in sock_sndtimeo
From: Richard Guy Briggs @ 2016-12-13 10:52 UTC (permalink / raw)
  To: Cong Wang
  Cc: linux-audit, Paul Moore, Dmitry Vyukov, David Miller,
	Johannes Berg, Florian Westphal, Eric Dumazet, Herbert Xu, netdev,
	LKML, syzkaller
In-Reply-To: <CAM_iQpUejChnsMWx3Z59GTuxZMeBjgrs2rhd7gsQAKoxE8YuDg@mail.gmail.com>

On 2016-12-12 16:10, Cong Wang wrote:
> On Mon, Dec 12, 2016 at 2:02 AM, Richard Guy Briggs <rgb@redhat.com> wrote:
> > On 2016-12-09 20:13, Cong Wang wrote:
> >> Netlink notifier can safely be converted to blocking one, I will send
> >> a patch.
> >
> > I had a quick look at how that might happen.  The netlink notifier chain
> > is atomic.  Would the registered callback funciton need to spawn a
> > one-time thread to avoid blocking?
> 
> It is already non-atomic now:
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=efa172f42836477bf1ac3c9a3053140df764699c

Ok, that is recent...  It is still less attractive as you point out due
to the overhead, but still worth considering if we can't find another
way.

> > I had a look at your patch.  It looks attractively simple.  The audit
> > next tree has patches queued that add an audit_reset function that will
> > require more work.  I still see some potential gaps.
> >
> > - If the process messes up (or the sock lookup messes up) it is reset
> >   in the kauditd thread under the audit_cmd_mutex.
> >
> > - If the process exits normally or is replaced due to an audit_replace
> >   error, it is reset from audit_receive_skb under the audit_cmd_mutex.
> >
> > - If the process dies before the kauditd thread notices, either reap it
> >   via notifier callback or it needs a check on net exit to reset.  This
> >   last one appears necessary to decrement the sock refcount so the sock
> >   can be released in netlink_kernel_release().
> >
> > If we want to be proactive and use the netlink notifier, we assume the
> > overhead of adding to the netlink notifier chain and eliminate all the
> > other reset calls under the kauditd thread.  If we are ok being
> > reactionary, then we'll at least need the net exit check on audit_sock.
> 
> I don't see why we need to check it in net exit if we use refcnt,
> because we have two different users of audit_sock: kauditd and
> netns, if both take care of refcnt properly, we don't need to worry
> about who is the last, no matter what failures occur in what order.

It is actually the audit_pid and audit_nlk_portid that I care about
more.  The audit daemon could vanish or close the socket while the
kernel sock to which it was attached is still quite valid.  Accessing
the set of three atomically is the urge.  I wonder if it makes more
sense to test for the presence of auditd using audit_sock rather than
audit_pid, but still keep audit_pid for our reporting and replacement
strategy.  Another idea would be to put the three in one struct.

Can someone explain how they think the original test was able to trigger
this GPF?  Network namespace shutdown while something pretended to set
up a new auditd?  That's impressive for a fuzzer if that's the case...
Is there an strace?  I guess it is all in test().

- RGB

--
Richard Guy Briggs <rgb@redhat.com>
Kernel Security Engineering, Base Operating Systems, Red Hat
Remote, Ottawa, Canada
Voice: +1.647.777.2635, Internal: (81) 32635

^ permalink raw reply

* Re: [PATCH net-next 17/27] OVS: remove assumptions about VLAN_TAG_PRESENT bit
From: Jiri Benc @ 2016-12-13 10:40 UTC (permalink / raw)
  To: Michał Mirosław
  Cc: open list:OPENVSWITCH, netdev-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <e44219bc56d3e44aa0711c83c626adabf4c4ecd8.1481586602.git.mirq-linux-CoA6ZxLDdyEEUmgCuDUIdw@public.gmane.org>

On Tue, 13 Dec 2016 01:12:38 +0100 (CET), Michał Mirosław wrote:
> @@ -850,20 +848,11 @@ static int validate_vlan_from_nlattrs(const struct sw_flow_match *match,
>  		return -EINVAL;
>  	}
>  
> -	if (a[OVS_KEY_ATTR_VLAN])
> -		tci = nla_get_be16(a[OVS_KEY_ATTR_VLAN]);
> -
> -	if (!(tci & htons(VLAN_TAG_PRESENT))) {
> -		if (tci) {
> -			OVS_NLERR(log, "%s TCI does not have VLAN_TAG_PRESENT bit set.",
> -				  (inner) ? "C-VLAN" : "VLAN");
> -			return -EINVAL;
> -		} else if (nla_len(a[OVS_KEY_ATTR_ENCAP])) {
> -			/* Corner case for truncated VLAN header. */
> -			OVS_NLERR(log, "Truncated %s header has non-zero encap attribute.",
> -				  (inner) ? "C-VLAN" : "VLAN");
> -			return -EINVAL;
> -		}
> +	if (!a[OVS_KEY_ATTR_VLAN] && nla_len(a[OVS_KEY_ATTR_ENCAP])) {
> +		/* Corner case for truncated VLAN header. */
> +		OVS_NLERR(log, "Truncated %s header has non-zero encap attribute.",
> +			(inner) ? "C-VLAN" : "VLAN");
> +		return -EINVAL;

This looks wrong. The zero value of nla_get_be16(a[OVS_KEY_ATTR_VLAN])
together with empty a[OVS_KEY_ATTR_ENCAP] means truncated VLAN header
and this needs to be preserved.

In other words, you need to check also !nla_get_be16(a[OVS_KEY_ATTR_VLAN])
here.


Overall, this whole patch looks very suspicious from the very
beginning. The xors used are strong indication that something is not
right.

And indeed, you're breaking uAPI compatibility. Previously, the
VLAG_TAG_PRESENT bit set in OVS_KEY_ATTR_VLAN caused the flow to match
on packets with VLAN tag present. After this patch, it causes the flow
to match only on those packets that have a certain value of
VLAN_CFI_MASK in their VLAN tag (I haven't bother deciphering what
value is checked after all these xors, it's as well possible that it
will also match on packets _without_ any VLAN header).

You have to preserve the old meaning of VLAN_TAG_PRESENT in
OVS_KEY_ATTR_VLAN. When doing flow lookups, VLAN_TAG_PRESENT must match
on and only on packets that have VLAN tag present, irrespective of their
VLAN_CFI_MASK.

If you want to introduce support for lookups on VLAN_CFI_MASK, you'll
have to do that by introducing a new netlink attribute.

 Jiri
_______________________________________________
dev mailing list
dev@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

^ permalink raw reply

* Re: bloat-o-meter +32832 (was: Re: bpf: add prog_digest and expose it via fdinfo/netlink)
From: Daniel Borkmann @ 2016-12-13 10:06 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Alexei Starovoitov, Linux Kernel Mailing List,
	netdev@vger.kernel.org
In-Reply-To: <CAMuHMdUoadaRw9dAOkpzn9MKV=3AQhmtfMu_zzoT6_VmFaePQg@mail.gmail.com>

On 12/13/2016 10:29 AM, Geert Uytterhoeven wrote:
> Hi Daniel,
>
> On Mon, Dec 12, 2016 at 6:08 PM, Linux Kernel Mailing List
> <linux-kernel@vger.kernel.org> wrote:
>> Web:        https://git.kernel.org/torvalds/c/7bd509e311f408f7a5132fcdde2069af65fa05ae
>> Commit:     7bd509e311f408f7a5132fcdde2069af65fa05ae
>> Parent:     8d829bdb97dc3a0c9c8090b9b168ca46ea99c8d8
>> Refname:    refs/heads/master
>> Author:     Daniel Borkmann <daniel@iogearbox.net>
>> AuthorDate: Sun Dec 4 23:19:41 2016 +0100
>> Committer:  David S. Miller <davem@davemloft.net>
>> CommitDate: Mon Dec 5 15:33:11 2016 -0500
>>
>>      bpf: add prog_digest and expose it via fdinfo/netlink
>>
>>      When loading a BPF program via bpf(2), calculate the digest over
>>      the program's instruction stream and store it in struct bpf_prog's
>>      digest member. This is done at a point in time before any instructions
>>      are rewritten by the verifier. Any unstable map file descriptor
>>      number part of the imm field will be zeroed for the hash.
>>
>>      fdinfo example output for progs:
>>
>>        # cat /proc/1590/fdinfo/5
>>        pos:          0
>>        flags:        02000002
>>        mnt_id:       11
>>        prog_type:    1
>>        prog_jited:   1
>>        prog_digest:  b27e8b06da22707513aa97363dfb11c7c3675d28
>>        memlock:      4096
>>
>>      When programs are pinned and retrieved by an ELF loader, the loader
>>      can check the program's digest through fdinfo and compare it against
>>      one that was generated over the ELF file's program section to see
>>      if the program needs to be reloaded. Furthermore, this can also be
>>      exposed through other means such as netlink in case of a tc cls/act
>>      dump (or xdp in future), but also through tracepoints or other
>>      facilities to identify the program. Other than that, the digest can
>>      also serve as a base name for the work in progress kallsyms support
>>      of programs. The digest doesn't depend/select the crypto layer, since
>>      we need to keep dependencies to a minimum. iproute2 will get support
>>      for this facility.
>>
>>      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
>>      Acked-by: Alexei Starovoitov <ast@kernel.org>
>>      Signed-off-by: David S. Miller <davem@davemloft.net>
>
>> --- a/kernel/bpf/core.c
>> +++ b/kernel/bpf/core.c
>> @@ -136,6 +136,71 @@ void __bpf_prog_free(struct bpf_prog *fp)
>>          vfree(fp);
>>   }
>>
>> +#define SHA_BPF_RAW_SIZE                                               \
>> +       round_up(MAX_BPF_SIZE + sizeof(__be64) + 1, SHA_MESSAGE_BYTES)
>> +
>> +/* Called under verifier mutex. */
>> +void bpf_prog_calc_digest(struct bpf_prog *fp)
>> +{
>> +       const u32 bits_offset = SHA_MESSAGE_BYTES - sizeof(__be64);
>> +       static u32 ws[SHA_WORKSPACE_WORDS];
>> +       static u8 raw[SHA_BPF_RAW_SIZE];
>
> function                                     old     new   delta
> raw                                            -   32832  +32832
>
> Congratulations!
> We've found the first nominee for the v4.10-rc1 bloat-o-meter contest! :-(
>
> Can this please be allocated dynamically? Thanks!

Sure, should be okay since mostly slow-path. Will send a patch.

^ permalink raw reply

* Re: [PATCH v4 2/4] vhost-vsock: add pkt cancel capability
From: Jorgen S. Hansen @ 2016-12-13 10:04 UTC (permalink / raw)
  To: Peng Tao
  Cc: netdev@vger.kernel.org, Stefan Hajnoczi, David Miller,
	kvm@vger.kernel.org, virtualization@lists.linux-foundation.org
In-Reply-To: <1481545269-18104-3-git-send-email-bergwolf@gmail.com>


> On Dec 12, 2016, at 1:21 PM, Peng Tao <bergwolf@gmail.com> wrote:
> 
> To allow canceling all packets of a connection.
> 
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Peng Tao <bergwolf@gmail.com>
> ---
> drivers/vhost/vsock.c  | 41 +++++++++++++++++++++++++++++++++++++++++
> include/net/af_vsock.h |  3 +++
> 2 files changed, 44 insertions(+)
> 
> diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
> index a504e2e0..fef8808 100644
> --- a/drivers/vhost/vsock.c
> +++ b/drivers/vhost/vsock.c
> @@ -218,6 +218,46 @@ vhost_transport_send_pkt(struct virtio_vsock_pkt *pkt)
> 	return len;
> }
> 
> +static int
> +vhost_transport_cancel_pkt(struct vsock_sock *vsk)
> +{
> +	struct vhost_vsock *vsock;
> +	struct virtio_vsock_pkt *pkt, *n;
> +	int cnt = 0;
> +	LIST_HEAD(freeme);
> +
> +	/* Find the vhost_vsock according to guest context id  */
> +	vsock = vhost_vsock_get(vsk->remote_addr.svm_cid);
> +	if (!vsock)
> +		return -ENODEV;
> +
> +	spin_lock_bh(&vsock->send_pkt_list_lock);
> +	list_for_each_entry_safe(pkt, n, &vsock->send_pkt_list, list) {
> +		if (pkt->cancel_token != vsk)
> +			continue;
> +		list_move(&pkt->list, &freeme);
> +	}
> +	spin_unlock_bh(&vsock->send_pkt_list_lock);
> +
> +	list_for_each_entry_safe(pkt, n, &freeme, list) {
> +		if (pkt->reply)
> +			cnt++;
> +		list_del(&pkt->list);
> +		virtio_transport_free_pkt(pkt);
> +	}
> +
> +	if (cnt) {
> +		struct vhost_virtqueue *tx_vq = &vsock->vqs[VSOCK_VQ_TX];
> +		int new_cnt;
> +
> +		new_cnt = atomic_sub_return(cnt, &vsock->queued_replies);
> +		if (new_cnt + cnt >= tx_vq->num && new_cnt < tx_vq->num)
> +			vhost_poll_queue(&tx_vq->poll);
> +	}
> +
> +	return 0;
> +}
> +
> static struct virtio_vsock_pkt *
> vhost_vsock_alloc_pkt(struct vhost_virtqueue *vq,
> 		      unsigned int out, unsigned int in)
> @@ -664,6 +704,7 @@ static struct virtio_transport vhost_transport = {
> 		.release                  = virtio_transport_release,
> 		.connect                  = virtio_transport_connect,
> 		.shutdown                 = virtio_transport_shutdown,
> +		.cancel_pkt               = vhost_transport_cancel_pkt,
> 
> 		.dgram_enqueue            = virtio_transport_dgram_enqueue,
> 		.dgram_dequeue            = virtio_transport_dgram_dequeue,
> diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h
> index f275896..f32ed9a 100644
> --- a/include/net/af_vsock.h
> +++ b/include/net/af_vsock.h
> @@ -100,6 +100,9 @@ struct vsock_transport {
> 	void (*destruct)(struct vsock_sock *);
> 	void (*release)(struct vsock_sock *);
> 
> +	/* Cancel all pending packets sent on vsock. */
> +	int (*cancel_pkt)(struct vsock_sock *vsk);
> +
> 	/* Connections. */
> 	int (*connect)(struct vsock_sock *);
> 
> -- 
> 2.7.4
> 

Reviewed-by: Jorgen Hansen <jhansen@vmware.com>

^ permalink raw reply

* Re: Synopsys Ethernet QoS
From: Giuseppe CAVALLARO @ 2016-12-13  9:51 UTC (permalink / raw)
  To: Niklas Cassel, Joao Pinto, Florian Fainelli, Andy Shevchenko
  Cc: David Miller, larper, rabinv, netdev, CARLOS.PALMINHA, Jie.Deng1,
	Stephen Warren, pavel
In-Reply-To: <99424968-ad8f-fec6-ebcf-ab7b19ee5486@axis.com>

Hello Niklas

On 12/13/2016 10:38 AM, Niklas Cassel wrote:
> On 12/13/2016 08:22 AM, Giuseppe CAVALLARO wrote:
>> On 12/12/2016 5:25 PM, Niklas Cassel wrote:
>>>
>>>
>>> On 12/12/2016 11:19 AM, Joao Pinto wrote:
>>>> Hi,
>>>>
>>>> Às 1:44 AM de 12/10/2016, Florian Fainelli escreveu:
>>>>> Le 12/09/16 à 16:16, Andy Shevchenko a écrit :
>>>>>> On Sat, Dec 10, 2016 at 12:52 AM, Florian Fainelli <f.fainelli@gmail.com> wrote:
>>>>>>
>>>>>>> It's kind of sad that customers of that IP (stmmac, amd-xgbe, sxgbe)
>>>>>>> did
>>>>>>> actually pioneer the upstreaming effort, but it is good to see people
>>>>>>> from Synopsys willing to fix that in the future.
>>>>>> Wait, you would like to tell that we have more than 2 drivers for the
>>>>>> same (okay, same vendor) IP?!
>>>>>> It's better to unify them earlier, than have n+ copies.
>>>>> Unfortunately that is the case, see this email:
>>>>>
>>>>> https://www.mail-archive.com/netdev@vger.kernel.org/msg142796.html
>>>>>
>>>>> dwc_eth_qos and stmmac have some overlap. There seems to be work
>>>>> underway to unify these two to begin with.
>>>>>
>>>>>> P.S. Though, I don't see how sxgbe got in the list. First glance on
>>>>>> the code doesn't show similarities.
>>>>> Well samsung/sxgbe looks potentially similar to amd/xgbe, but that's
>>>>> just my cursory look at the code, it may very well be something entirely
>>>>> different. The descriptor formats just look suspiciously similar.
>>>>>
>>>> Thank you for your inputs! Renaming seems to be a hotspot. I agree that maybe
>>>> instead of renaming (breaking retro-compatibility as David and Florian
>>>> mentioned), the best is to move stmmac to synopsys/ after merging *qos* and
>>>> removing it. As Florian mentioned, git is capable of detecting folder restructured.
>>>>
>>>> @Rabin Vincent: Hi Rabin. Since Axis is more familiar with the synopsys/*qos*
>>>> driver would it be possible for you to make an initial analysis of what has to
>>>> be merged into Stmmac? This way the development would speed-up.
>>>
>>> I can answer that question.
>>>
>>> I've sent out 12 patches to the stmmac driver
>>> (all patches are included in the current net-next tree),
>>
>> ok I have seen these patches applied, I had just a minor concern about
>> the  failure when DMA configuration is missing.
>> In these years, I have noticed that, for this kind of HW, default DMA
>> configuration is usually good to have a driver working. AHB, AXI
>> parameters can be provided to have a best tuning or to fix know issues
>> on some platforms. So IMO, we should relax the check with a warning.
>> Please, consider that, the stmmac also supports very old MAC10/100
>> versions where the DMA configuration was often never passed.
>>
>
> This might be a bit off-topic, but:

yes indeed ;-)

> The patch should not affect any existing code.
> All glue drivers uses stmmac_probe_config_dt,
> which sets a default value if the property is missing or zero.
> The PCI glue driver sets the property explicitly.
> Hence, all current code should not be affected.
> The error check was added as a sanity check, just in case some
> new code is added, which bypasses stmmac_probe_config_dt,
> lets say support for GMAC4 in the PCI glue driver.
>

ok

>>
>>>
>>> There are some performance problems with the stmmac driver though:
>>>
>>> When running iperf3 with 3 streams:
>>> iperf3 -c 192.168.0.90 -P 3 -t 30
>>> iperf3 -c 192.168.0.90 -P 3 -t 30 -R
>>>
>>> I get really bad fairness between the streams.
>>
>> Can you confirm you are using the 4.xxa version?
>
> Yes, 4.10a.
> Reading the MAC_Version register gives:
> 0x00004041
> Where SNPSVER is bit 7:0, so 0x41.

ok

>
>
>>
>> This doesn't match with Alex's experiments on ARM platforms.
>>
>
> We are also using an ARM platform.
> There is no problem with throughput.

ok

> The problem is fairness between the streams,
> when using for example 3 streams in iperf3.

hmm I let Alex to reply.

AFAIK, other people played with stream tests and
no issue but, if you get problems so we have to
investigate.

> Did Alex test this? I could not find Alex mail in the netdev archives,
> could you link to it?
> The problem goes away when disabling TX IRQ coalescing,
> which I guess makes sense, since like David Miller said:
>
> "the driver is using non-highres timers
> to implement the TX coalescing.  [...]
> 1 HZ, which is the lowest granularity of non-highres timers in the
> kernel, is variable as well as already too large of a delay for
> effective TX coalescing."
>
> We are using CONFIG_HZ=100, not CONFIG_HZ=1000.

1000 was a default on our platforms

>
> So if there is a long time before handling interrupts,
> I guess that it makes sense that one stream could
> get an advantage in the net scheduler.

ok

>
> If I find the time, and if no one beats me to it, I will try to replace
> the normal timers with HR timers + a smaller default timeout.

that's great.

>>
>>> We have a local patch that implements TX IRQ coalescing in the dwceqos driver,
>>> and we don't see the same problem.
>>
>> please, if you have new patch add me on CC and we will review all
>> together.
>
> I think you misunderstood me, we have a local patch for the synopsys/dwc_eth_qos.c
> version, not for stmmac.

oops, sorry

>
> I have been using get_maintainer.pl, so you should have been added to all my patches,
> but if I send any further patches, I will make sure that you are not excluded.

thanks a lot

Regards
Peppe

>
>

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox