Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH v2] Revert "alx: remove WoL support"
From: Andrew Lunn @ 2018-05-30 13:58 UTC (permalink / raw)
  To: AceLan Kao
  Cc: Jay Cliburn, Chris Snook, David S . Miller, Rakesh Pandit, netdev,
	Emily Chien, Johannes Berg, Johannes Stezenbach, linux-kernel
In-Reply-To: <20180530021008.15080-1-acelan.kao@canonical.com>

On Wed, May 30, 2018 at 10:10:08AM +0800, AceLan Kao wrote:
> This reverts commit bc2bebe8de8ed4ba6482c9cc370b0dd72ffe8cd2.
> 
> The WoL feature is a must to pass Energy Star 6.1 and above,
> the power consumption will be measured during S3 with WoL is enabled.
> 
> Reverting "alx: remove WoL support", and will try to fix the unintentional
> wake up issue when WoL is enabled.

Hi AceLan

I find this change log entry rather odd.

If i remember correctly, you first argued that you did not want to
have to distribute out of tree patches.

It was suggested that you might be able to justify the revert using
the argument that the cure is worse than the decease. You ignored
that, and when with this Energy Star argument. That got shot down by
DaveM, and told to actually try to find the problem.

So you then come back and said you think the problem is fixed, but
don't know exactly what fixed it. So DaveM said try again.

Now you are back to Energy Star.

I don't get this. It was the fact you said it was probably fixed that
made DaveM reconsider. That is the argument you should be using in the
change log. We want to know what testing you have done. See a
tested-by: from somebody who had the issue which caused the revert,
and now says the issue is fixed.

Ideally we would like to know which change actually fixed the issue,
so it can be added to stable. But that requires somebody to do a long
git bisect.

    Andrew

^ permalink raw reply

* Re: [PATCH net] mlx4_core: restore optimal ICM memory allocation
From: Tariq Toukan @ 2018-05-30 13:49 UTC (permalink / raw)
  To: Eric Dumazet, David S . Miller
  Cc: netdev, Eric Dumazet, John Sperbeck, Tarick Bedeir, Qing Huang,
	Daniel Jurgens, Zhu Yanjun, Tariq Toukan
In-Reply-To: <20180530041152.113393-1-edumazet@google.com>



On 30/05/2018 7:11 AM, Eric Dumazet wrote:
> Commit 1383cb8103bb ("mlx4_core: allocate ICM memory in page size chunks")
> brought a regression caught in our regression suite, thanks to KASAN.
> 
> Note that mlx4_alloc_icm() is already able to try high order allocations
> and fallback to low-order allocations under high memory pressure.
> 
> We only have to tweak gfp_mask a bit, to help falling back faster,
> without risking OOM killings.
> 
> BUG: KASAN: slab-out-of-bounds in to_rdma_ah_attr+0x808/0x9e0 [mlx4_ib]
> Read of size 4 at addr ffff8817df584f68 by task qp_listing_test/92585
> 
> CPU: 38 PID: 92585 Comm: qp_listing_test Tainted: G           O
> Call Trace:
>   [<ffffffffba80d7bb>] dump_stack+0x4d/0x72
>   [<ffffffffb951dc5f>] print_address_description+0x6f/0x260
>   [<ffffffffb951e1c7>] kasan_report+0x257/0x370
>   [<ffffffffb951e339>] __asan_report_load4_noabort+0x19/0x20
>   [<ffffffffc0256d28>] to_rdma_ah_attr+0x808/0x9e0 [mlx4_ib]
>   [<ffffffffc02785b3>] mlx4_ib_query_qp+0x1213/0x1660 [mlx4_ib]
>   [<ffffffffc02dbfdb>] qpstat_print_qp+0x13b/0x500 [ib_uverbs]
>   [<ffffffffc02dc3ea>] qpstat_seq_show+0x4a/0xb0 [ib_uverbs]
>   [<ffffffffb95f125c>] seq_read+0xa9c/0x1230
>   [<ffffffffb96e0821>] proc_reg_read+0xc1/0x180
>   [<ffffffffb9577918>] __vfs_read+0xe8/0x730
>   [<ffffffffb9578057>] vfs_read+0xf7/0x300
>   [<ffffffffb95794d2>] SyS_read+0xd2/0x1b0
>   [<ffffffffb8e06b16>] do_syscall_64+0x186/0x420
>   [<ffffffffbaa00071>] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> RIP: 0033:0x7f851a7bb30d
> RSP: 002b:00007ffd09a758c0 EFLAGS: 00000293 ORIG_RAX: 0000000000000000
> RAX: ffffffffffffffda RBX: 00007f84ff959440 RCX: 00007f851a7bb30d
> RDX: 000000000003fc00 RSI: 00007f84ff60a000 RDI: 000000000000000b
> RBP: 00007ffd09a75900 R08: 00000000ffffffff R09: 0000000000000000
> R10: 0000000000000022 R11: 0000000000000293 R12: 0000000000000000
> R13: 000000000003ffff R14: 000000000003ffff R15: 00007f84ff60a000
> 
> Allocated by task 4488:
>   save_stack+0x46/0xd0
>   kasan_kmalloc+0xad/0xe0
>   __kmalloc+0x101/0x5e0
>   ib_register_device+0xc03/0x1250 [ib_core]
>   mlx4_ib_add+0x27d6/0x4dd0 [mlx4_ib]
>   mlx4_add_device+0xa9/0x340 [mlx4_core]
>   mlx4_register_interface+0x16e/0x390 [mlx4_core]
>   xhci_pci_remove+0x7a/0x180 [xhci_pci]
>   do_one_initcall+0xa0/0x230
>   do_init_module+0x1b9/0x5a4
>   load_module+0x63e6/0x94c0
>   SYSC_init_module+0x1a4/0x1c0
>   SyS_init_module+0xe/0x10
>   do_syscall_64+0x186/0x420
>   entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> 
> Freed by task 0:
> (stack is not available)
> 
> The buggy address belongs to the object at ffff8817df584f40
>   which belongs to the cache kmalloc-32 of size 32
> The buggy address is located 8 bytes to the right of
>   32-byte region [ffff8817df584f40, ffff8817df584f60)
> The buggy address belongs to the page:
> page:ffffea005f7d6100 count:1 mapcount:0 mapping:ffff8817df584000 index:0xffff8817df584fc1
> flags: 0x880000000000100(slab)
> raw: 0880000000000100 ffff8817df584000 ffff8817df584fc1 000000010000003f
> raw: ffffea005f3ac0a0 ffffea005c476760 ffff8817fec00900 ffff883ff78d26c0
> page dumped because: kasan: bad access detected
> page->mem_cgroup:ffff883ff78d26c0
> 
> Memory state around the buggy address:
>   ffff8817df584e00: 00 03 fc fc fc fc fc fc 00 03 fc fc fc fc fc fc
>   ffff8817df584e80: 00 00 00 04 fc fc fc fc 00 00 00 fc fc fc fc fc
>> ffff8817df584f00: fb fb fb fb fc fc fc fc 00 00 00 00 fc fc fc fc
>                                                            ^
>   ffff8817df584f80: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc
>   ffff8817df585000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> 
> Fixes: 1383cb8103bb ("mlx4_core: allocate ICM memory in page size chunks")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: John Sperbeck <jsperbeck@google.com>
> Cc: Tarick Bedeir <tarick@google.com>
> Cc: Qing Huang <qing.huang@oracle.com>
> Cc: Daniel Jurgens <danielj@mellanox.com>
> Cc: Zhu Yanjun <yanjun.zhu@oracle.com>
> Cc: Tariq Toukan <tariqt@mellanox.com>
> ---
>   drivers/net/ethernet/mellanox/mlx4/icm.c | 17 +++++++++++------
>   1 file changed, 11 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx4/icm.c b/drivers/net/ethernet/mellanox/mlx4/icm.c
> index 685337d58276fc91baeeb64387c52985e1bc6dda..cae33d5c7dbd9ba7929adcf2127b104f6796fa5a 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/icm.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/icm.c
> @@ -43,12 +43,13 @@
>   #include "fw.h"
>   
>   /*
> - * We allocate in page size (default 4KB on many archs) chunks to avoid high
> - * order memory allocations in fragmented/high usage memory situation.
> + * We allocate in as big chunks as we can, up to a maximum of 256 KB
> + * per chunk. Note that the chunks are not necessarily in contiguous
> + * physical memory.
>    */
>   enum {
> -	MLX4_ICM_ALLOC_SIZE	= PAGE_SIZE,
> -	MLX4_TABLE_CHUNK_SIZE	= PAGE_SIZE,
> +	MLX4_ICM_ALLOC_SIZE	= 1 << 18,
> +	MLX4_TABLE_CHUNK_SIZE	= 1 << 18,
>   };
>   
>   static void mlx4_free_icm_pages(struct mlx4_dev *dev, struct mlx4_icm_chunk *chunk)
> @@ -135,6 +136,7 @@ struct mlx4_icm *mlx4_alloc_icm(struct mlx4_dev *dev, int npages,
>   	struct mlx4_icm *icm;
>   	struct mlx4_icm_chunk *chunk = NULL;
>   	int cur_order;
> +	gfp_t mask;
>   	int ret;
>   
>   	/* We use sg_set_buf for coherent allocs, which assumes low memory */
> @@ -178,13 +180,16 @@ struct mlx4_icm *mlx4_alloc_icm(struct mlx4_dev *dev, int npages,
>   		while (1 << cur_order > npages)
>   			--cur_order;
>   
> +		mask = gfp_mask;
> +		if (cur_order)
> +			mask = (mask & ~__GFP_DIRECT_RECLAIM) | __GFP_NORETRY;
>   		if (coherent)
>   			ret = mlx4_alloc_icm_coherent(&dev->persist->pdev->dev,
>   						      &chunk->mem[chunk->npages],
> -						      cur_order, gfp_mask);
> +						      cur_order, mask);
>   		else
>   			ret = mlx4_alloc_icm_pages(&chunk->mem[chunk->npages],
> -						   cur_order, gfp_mask,
> +						   cur_order, mask,
>   						   dev->numa_node);
>   
>   		if (ret) {
> 

Thanks Eric.

I think it preserves the original intention of commit 1383cb8103bb 
("mlx4_core: allocate ICM memory in page size chunks").

Looks good to me.

Acked-by: Tariq Toukan <tariqt@mellanox.com>

^ permalink raw reply

* Re: [PATCH bpf v3 0/5] fix test_sockmap
From: John Fastabend @ 2018-05-30 13:32 UTC (permalink / raw)
  To: Prashant Bhole, Alexei Starovoitov, Daniel Borkmann
  Cc: David S . Miller, Shuah Khan, netdev, linux-kselftest
In-Reply-To: <20180530055611.10216-1-bhole_prashant_q7@lab.ntt.co.jp>

On 05/29/2018 10:56 PM, Prashant Bhole wrote:
> This series fixes error handling, timeout and data verification in
> test_sockmap. Previously it was not able to detect failure/timeout in
> RX/TX thread because error was not notified to the main thread.
> 
> Also slightly improved test output by printing parameter values (cork,
> apply, start, end) so that parameters for all tests are displayed.
> 
> Changes in v3:
>   - Skipped error checking for corked tests
> 
> Prashant Bhole (5):
>   selftests/bpf: test_sockmap, check test failure
>   selftests/bpf: test_sockmap, join cgroup in selftest mode
>   selftests/bpf: test_sockmap, fix test timeout
>   selftests/bpf: test_sockmap, fix data verification
>   selftests/bpf: test_sockmap, print additional test options
> 
>  tools/testing/selftests/bpf/test_sockmap.c | 76 +++++++++++++++++-----
>  1 file changed, 58 insertions(+), 18 deletions(-)
> 

Looks good thanks. We may want to tune the running time a bit but
lets get this applied first. A lot of nice improvements!

.John

^ permalink raw reply

* Re: [PATCH bpf v3 3/5] selftests/bpf: test_sockmap, fix test timeout
From: John Fastabend @ 2018-05-30 13:31 UTC (permalink / raw)
  To: Prashant Bhole, Alexei Starovoitov, Daniel Borkmann
  Cc: David S . Miller, Shuah Khan, netdev, linux-kselftest
In-Reply-To: <20180530055611.10216-4-bhole_prashant_q7@lab.ntt.co.jp>

On 05/29/2018 10:56 PM, Prashant Bhole wrote:
> In order to reduce runtime of tests, recently timout for select() call
> was reduced from 1sec to 10usec. This was causing many tests failures.
> It was caught with failure handling commits in this series.
> 
> Restoring the timeout from 10usec to 1sec
> 
> Fixes: a18fda1a62c3 ("bpf: reduce runtime of test_sockmap tests")
> Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
> ---

Quick question, how long does it take to run now with the time increase?
If its taking too long we may need to remove some tests. I have a longer
running test_sockmap script that I run as part of Cilium[1] project
where I put longer running stress tests.

Acked-by: John Fastabend <john.fastabend@gmail.com>

[1] cilium.io

^ permalink raw reply

* Re: [PATCH bpf v3 1/5] selftests/bpf: test_sockmap, check test failure
From: John Fastabend @ 2018-05-30 13:26 UTC (permalink / raw)
  To: Prashant Bhole, Alexei Starovoitov, Daniel Borkmann
  Cc: David S . Miller, Shuah Khan, netdev, linux-kselftest
In-Reply-To: <20180530055611.10216-2-bhole_prashant_q7@lab.ntt.co.jp>

On 05/29/2018 10:56 PM, Prashant Bhole wrote:
> Test failures are not identified because exit code of RX/TX threads
> is not checked. Also threads are not returning correct exit code.
> 
> - Return exit code from threads depending on test execution status
> - In main thread, check the exit code of RX/TX threads
> - Skip error checking for corked tests as they are expected to timeout
> 
> Fixes: 16962b2404ac ("bpf: sockmap, add selftests")
> Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp>
> ---
>  tools/testing/selftests/bpf/test_sockmap.c | 25 ++++++++++++++++------
>  1 file changed, 19 insertions(+), 6 deletions(-)
> 

Looks good. Thanks.

Acked-by: John Fastabend <john.fastabend@gmail.com>

^ permalink raw reply

* Re: [RFC PATCH ghak32 V2 00/13] audit: implement container id
From: Steve Grubb @ 2018-05-30 13:20 UTC (permalink / raw)
  To: linux-audit-H+wXaHxf7aLQT0dZR+AlfA
  Cc: simo-H+wXaHxf7aLQT0dZR+AlfA, jlayton-H+wXaHxf7aLQT0dZR+AlfA,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, LKML,
	eparis-FjpueFixGhCM4zKIHC2jIg, dhowells-H+wXaHxf7aLQT0dZR+AlfA,
	carlos-H+wXaHxf7aLQT0dZR+AlfA, ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	luto-DgEjT+Ai2ygdnm+yROfE0A, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn
In-Reply-To: <cover.1521179281.git.rgb-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

On Friday, March 16, 2018 5:00:27 AM EDT Richard Guy Briggs wrote:
> Implement audit kernel container ID.
> 
> This patchset is a second RFC based on the proposal document (V3)
> posted:
> 	https://www.redhat.com/archives/linux-audit/2018-January/msg00014.html

So, if you work on a container orchestrator, how exactly is this set of 
interfaces to be used and in what order?

Thanks,
-Steve

> The first patch implements the proc fs write to set the audit container
> ID of a process, emitting an AUDIT_CONTAINER record to announce the
> registration of that container ID on that process.  This patch requires
> userspace support for record acceptance and proper type display.
> 
> The second checks for children or co-threads and refuses to set the
> container ID if either are present.  (This policy could be changed to
> set both with the same container ID provided they meet the rest of the
> requirements.)
> 
> The third implements the auxiliary record AUDIT_CONTAINER_INFO if a
> container ID is identifiable with an event.  This patch requires
> userspace support for proper type display.
> 
> The fourth adds container ID filtering to the exit, exclude and user
> lists.  This patch requires auditctil userspace support for the
> --containerid option.
> 
> The 5th adds signal and ptrace support.
> 
> The 6th creates a local audit context to be able to bind a standalone
> record with a locally created auxiliary record.
> 
> The 7th, 8th, 9th, 10th patches add container ID records to standalone
> records.  Some of these may end up being syscall auxiliary records and
> won't need this specific support since they'll be supported via
> syscalls.
> 
> The 11th adds network namespace container ID labelling based on member
> tasks' container ID labels.
> 
> The 12th adds container ID support to standalone netfilter records that
> don't have a task context and lists each container to which that net
> namespace belongs.
> 
> The 13th implements reading the container ID from the proc filesystem
> for debugging.  This patch isn't planned for upstream inclusion.
> 
> Feedback please!
> 
> Example: Set a container ID of 123456 to the "sleep" task:
> 	sleep 2&
> 	child=$!
> 	echo 123456 > /proc/$child/containerid; echo $?
> 	ausearch -ts recent -m container
> 	echo child:$child contid:$( cat /proc/$child/containerid)
> This should produce a record such as:
> 	type=CONTAINER msg=audit(1521122590.315:222): op=set pid=689 uid=0
> subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 auid=0 tty=pts0
> ses=3 opid=707 old-contid=18446744073709551615 contid=123456 res=1
> 
> Example: Set a filter on a container ID 123459 on /tmp/tmpcontainerid:
> 	containerid=123459
> 	key=tmpcontainerid
> 	auditctl -a exit,always -F dir=/tmp -F perm=wa -F containerid=$containerid
> -F key=$key perl -e "sleep 1; open(my \$tmpfile, '>', \"/tmp/$key\");
> close(\$tmpfile);" & child=$!
> 	echo $containerid > /proc/$child/containerid
> 	sleep 2
> 	ausearch -i -ts recent -k $key
> 	auditctl -d exit,always -F dir=/tmp -F perm=wa -F containerid=$containerid
> -F key=$key rm -f /tmp/$key
> This should produce an event such as:
> 	type=CONTAINER_INFO msg=audit(1521122591.614:227): op=task contid=123459
> 	type=PROCTITLE msg=audit(1521122591.614:227):
> proctitle=7065726C002D6500736C65657020313B206F70656E286D792024746D7066696C
> 652C20273E272C20222F746D702F746D70636F6E7461696E6572696422293B20636C6F73652
> 824746D7066696C65293B type=PATH msg=audit(1521122591.614:227): item=1
> name="/tmp/tmpcontainerid" inode=18427 dev=00:26 mode=0100644 ouid=0
> ogid=0 rdev=00:00 obj=unconfined_u:object_r:user_tmp_t:s0 nametype=CREATE
> cap_fp=0000000000000000 cap_fi=0000000000000000 cap_fe=0 cap_fver=0
> type=PATH msg=audit(1521122591.614:227): item=0 name="/tmp/" inode=13513
> dev=00:26 mode=041777 ouid=0 ogid=0 rdev=00:00
> obj=system_u:object_r:tmp_t:s0 nametype=PARENT cap_fp=0000000000000000
> cap_fi=0000000000000000 cap_fe=0 cap_fver=0 type=CWD
> msg=audit(1521122591.614:227): cwd="/root"
> 	type=SYSCALL msg=audit(1521122591.614:227): arch=c000003e syscall=257
> success=yes exit=3 a0=ffffffffffffff9c a1=55db90a28900 a2=241 a3=1b6
> items=2 ppid=689 pid=724 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0
> sgid=0 fsgid=0 tty=pts0 ses=3 comm="perl" exe="/usr/bin/perl"
> subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
> key="tmpcontainerid"
> 
> See:
> 	https://github.com/linux-audit/audit-kernel/issues/32
> 	https://github.com/linux-audit/audit-userspace/issues/40
> 	https://github.com/linux-audit/audit-testsuite/issues/64
> 
> Richard Guy Briggs (13):
>   audit: add container id
>   audit: check children and threading before allowing containerid
>   audit: log container info of syscalls
>   audit: add containerid filtering
>   audit: add containerid support for ptrace and signals
>   audit: add support for non-syscall auxiliary records
>   audit: add container aux record to watch/tree/mark
>   audit: add containerid support for tty_audit
>   audit: add containerid support for config/feature/user records
>   audit: add containerid support for seccomp and anom_abend records
>   audit: add support for containerid to network namespaces
>   audit: NETFILTER_PKT: record each container ID associated with a netNS
>   debug audit: read container ID of a process
> 
>  drivers/tty/tty_audit.c     |   5 +-
>  fs/proc/base.c              |  53 ++++++++++++++++
>  include/linux/audit.h       |  43 +++++++++++++
>  include/linux/init_task.h   |   4 +-
>  include/linux/sched.h       |   1 +
>  include/net/net_namespace.h |  12 ++++
>  include/uapi/linux/audit.h  |   8 ++-
>  kernel/audit.c              |  75 ++++++++++++++++++++---
>  kernel/audit.h              |   3 +
>  kernel/audit_fsnotify.c     |   5 +-
>  kernel/audit_tree.c         |   5 +-
>  kernel/audit_watch.c        |  33 +++++-----
>  kernel/auditfilter.c        |  52 +++++++++++++++-
>  kernel/auditsc.c            | 145
> ++++++++++++++++++++++++++++++++++++++++++-- kernel/nsproxy.c            |
>   6 ++
>  net/core/net_namespace.c    |  45 ++++++++++++++
>  net/netfilter/xt_AUDIT.c    |  15 ++++-
>  17 files changed, 473 insertions(+), 37 deletions(-)

^ permalink raw reply

* [PATCH net-next] qed: Add srq core support for RoCE and iWARP
From: Yuval Bason @ 2018-05-30 13:11 UTC (permalink / raw)
  To: yuval.bason, davem
  Cc: netdev, jgg, dledford, linux-rdma, Michal Kalderon, Ariel Elior

This patch adds support for configuring SRQ and provides the necessary
APIs for rdma upper layer driver (qedr) to enable the SRQ feature.

Signed-off-by: Michal Kalderon <michal.kalderon@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: Yuval Bason <yuval.bason@cavium.com>
---
 drivers/net/ethernet/qlogic/qed/qed_cxt.c   |   5 +-
 drivers/net/ethernet/qlogic/qed/qed_cxt.h   |   1 +
 drivers/net/ethernet/qlogic/qed/qed_hsi.h   |   2 +
 drivers/net/ethernet/qlogic/qed/qed_iwarp.c |  23 ++++
 drivers/net/ethernet/qlogic/qed/qed_main.c  |   2 +
 drivers/net/ethernet/qlogic/qed/qed_rdma.c  | 179 +++++++++++++++++++++++++++-
 drivers/net/ethernet/qlogic/qed/qed_rdma.h  |   2 +
 drivers/net/ethernet/qlogic/qed/qed_roce.c  |  17 ++-
 include/linux/qed/qed_rdma_if.h             |  12 +-
 9 files changed, 235 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qed/qed_cxt.c b/drivers/net/ethernet/qlogic/qed/qed_cxt.c
index 820b226..7ed6aa0 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_cxt.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_cxt.c
@@ -47,6 +47,7 @@
 #include "qed_hsi.h"
 #include "qed_hw.h"
 #include "qed_init_ops.h"
+#include "qed_rdma.h"
 #include "qed_reg_addr.h"
 #include "qed_sriov.h"
 
@@ -426,7 +427,7 @@ static void qed_cxt_set_srq_count(struct qed_hwfn *p_hwfn, u32 num_srqs)
 	p_mgr->srq_count = num_srqs;
 }
 
-static u32 qed_cxt_get_srq_count(struct qed_hwfn *p_hwfn)
+u32 qed_cxt_get_srq_count(struct qed_hwfn *p_hwfn)
 {
 	struct qed_cxt_mngr *p_mgr = p_hwfn->p_cxt_mngr;
 
@@ -2071,7 +2072,7 @@ static void qed_rdma_set_pf_params(struct qed_hwfn *p_hwfn,
 	u32 num_cons, num_qps, num_srqs;
 	enum protocol_type proto;
 
-	num_srqs = min_t(u32, 32 * 1024, p_params->num_srqs);
+	num_srqs = min_t(u32, QED_RDMA_MAX_SRQS, p_params->num_srqs);
 
 	if (p_hwfn->mcp_info->func_info.protocol == QED_PCI_ETH_RDMA) {
 		DP_NOTICE(p_hwfn,
diff --git a/drivers/net/ethernet/qlogic/qed/qed_cxt.h b/drivers/net/ethernet/qlogic/qed/qed_cxt.h
index a4e9586..758a8b4 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_cxt.h
+++ b/drivers/net/ethernet/qlogic/qed/qed_cxt.h
@@ -235,6 +235,7 @@ u32 qed_cxt_get_proto_tid_count(struct qed_hwfn *p_hwfn,
 				enum protocol_type type);
 u32 qed_cxt_get_proto_cid_start(struct qed_hwfn *p_hwfn,
 				enum protocol_type type);
+u32 qed_cxt_get_srq_count(struct qed_hwfn *p_hwfn);
 int qed_cxt_free_proto_ilt(struct qed_hwfn *p_hwfn, enum protocol_type proto);
 
 #define QED_CTX_WORKING_MEM 0
diff --git a/drivers/net/ethernet/qlogic/qed/qed_hsi.h b/drivers/net/ethernet/qlogic/qed/qed_hsi.h
index 8e1e6e1..82ce401 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_hsi.h
+++ b/drivers/net/ethernet/qlogic/qed/qed_hsi.h
@@ -9725,6 +9725,8 @@ enum iwarp_eqe_async_opcode {
 	IWARP_EVENT_TYPE_ASYNC_EXCEPTION_DETECTED,
 	IWARP_EVENT_TYPE_ASYNC_QP_IN_ERROR_STATE,
 	IWARP_EVENT_TYPE_ASYNC_CQ_OVERFLOW,
+	IWARP_EVENT_TYPE_ASYNC_SRQ_EMPTY,
+	IWARP_EVENT_TYPE_ASYNC_SRQ_LIMIT,
 	MAX_IWARP_EQE_ASYNC_OPCODE
 };
 
diff --git a/drivers/net/ethernet/qlogic/qed/qed_iwarp.c b/drivers/net/ethernet/qlogic/qed/qed_iwarp.c
index 2a2b101..474e6cf 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_iwarp.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_iwarp.c
@@ -271,6 +271,8 @@ int qed_iwarp_create_qp(struct qed_hwfn *p_hwfn,
 	p_ramrod->sq_num_pages = qp->sq_num_pages;
 	p_ramrod->rq_num_pages = qp->rq_num_pages;
 
+	p_ramrod->srq_id.srq_idx = cpu_to_le16(qp->srq_id);
+	p_ramrod->srq_id.opaque_fid = cpu_to_le16(p_hwfn->hw_info.opaque_fid);
 	p_ramrod->qp_handle_for_cqe.hi = cpu_to_le32(qp->qp_handle.hi);
 	p_ramrod->qp_handle_for_cqe.lo = cpu_to_le32(qp->qp_handle.lo);
 
@@ -3004,8 +3006,11 @@ static int qed_iwarp_async_event(struct qed_hwfn *p_hwfn,
 				 union event_ring_data *data,
 				 u8 fw_return_code)
 {
+	struct qed_rdma_events events = p_hwfn->p_rdma_info->events;
 	struct regpair *fw_handle = &data->rdma_data.async_handle;
 	struct qed_iwarp_ep *ep = NULL;
+	u16 srq_offset;
+	u16 srq_id;
 	u16 cid;
 
 	ep = (struct qed_iwarp_ep *)(uintptr_t)HILO_64(fw_handle->hi,
@@ -3067,6 +3072,24 @@ static int qed_iwarp_async_event(struct qed_hwfn *p_hwfn,
 		qed_iwarp_cid_cleaned(p_hwfn, cid);
 
 		break;
+	case IWARP_EVENT_TYPE_ASYNC_SRQ_EMPTY:
+		DP_NOTICE(p_hwfn, "IWARP_EVENT_TYPE_ASYNC_SRQ_EMPTY\n");
+		srq_offset = p_hwfn->p_rdma_info->srq_id_offset;
+		/* FW assigns value that is no greater than u16 */
+		srq_id = ((u16)le32_to_cpu(fw_handle->lo)) - srq_offset;
+		events.affiliated_event(events.context,
+					QED_IWARP_EVENT_SRQ_EMPTY,
+					&srq_id);
+		break;
+	case IWARP_EVENT_TYPE_ASYNC_SRQ_LIMIT:
+		DP_NOTICE(p_hwfn, "IWARP_EVENT_TYPE_ASYNC_SRQ_LIMIT\n");
+		srq_offset = p_hwfn->p_rdma_info->srq_id_offset;
+		/* FW assigns value that is no greater than u16 */
+		srq_id = ((u16)le32_to_cpu(fw_handle->lo)) - srq_offset;
+		events.affiliated_event(events.context,
+					QED_IWARP_EVENT_SRQ_LIMIT,
+					&srq_id);
+		break;
 	case IWARP_EVENT_TYPE_ASYNC_CQ_OVERFLOW:
 		DP_NOTICE(p_hwfn, "IWARP_EVENT_TYPE_ASYNC_CQ_OVERFLOW\n");
 
diff --git a/drivers/net/ethernet/qlogic/qed/qed_main.c b/drivers/net/ethernet/qlogic/qed/qed_main.c
index 68c4399..b04d57c 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_main.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_main.c
@@ -64,6 +64,7 @@
 
 #define QED_ROCE_QPS			(8192)
 #define QED_ROCE_DPIS			(8)
+#define QED_RDMA_SRQS                   QED_ROCE_QPS
 
 static char version[] =
 	"QLogic FastLinQ 4xxxx Core Module qed " DRV_MODULE_VERSION "\n";
@@ -922,6 +923,7 @@ static void qed_update_pf_params(struct qed_dev *cdev,
 	if (IS_ENABLED(CONFIG_QED_RDMA)) {
 		params->rdma_pf_params.num_qps = QED_ROCE_QPS;
 		params->rdma_pf_params.min_dpis = QED_ROCE_DPIS;
+		params->rdma_pf_params.num_srqs = QED_RDMA_SRQS;
 		/* divide by 3 the MRs to avoid MF ILT overflow */
 		params->rdma_pf_params.gl_pi = QED_ROCE_PROTOCOL_INDEX;
 	}
diff --git a/drivers/net/ethernet/qlogic/qed/qed_rdma.c b/drivers/net/ethernet/qlogic/qed/qed_rdma.c
index a411f9c..bd23659 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_rdma.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_rdma.c
@@ -259,15 +259,29 @@ static int qed_rdma_alloc(struct qed_hwfn *p_hwfn,
 		goto free_cid_map;
 	}
 
+	/* Allocate bitmap for srqs */
+	p_rdma_info->num_srqs = qed_cxt_get_srq_count(p_hwfn);
+	rc = qed_rdma_bmap_alloc(p_hwfn, &p_rdma_info->srq_map,
+				 p_rdma_info->num_srqs, "SRQ");
+	if (rc) {
+		DP_VERBOSE(p_hwfn, QED_MSG_RDMA,
+			   "Failed to allocate srq bitmap, rc = %d\n", rc);
+		goto free_real_cid_map;
+	}
+
 	if (QED_IS_IWARP_PERSONALITY(p_hwfn))
 		rc = qed_iwarp_alloc(p_hwfn);
 
 	if (rc)
-		goto free_cid_map;
+		goto free_srq_map;
 
 	DP_VERBOSE(p_hwfn, QED_MSG_RDMA, "Allocation successful\n");
 	return 0;
 
+free_srq_map:
+	kfree(p_rdma_info->srq_map.bitmap);
+free_real_cid_map:
+	kfree(p_rdma_info->real_cid_map.bitmap);
 free_cid_map:
 	kfree(p_rdma_info->cid_map.bitmap);
 free_tid_map:
@@ -351,6 +365,8 @@ static void qed_rdma_resc_free(struct qed_hwfn *p_hwfn)
 	qed_rdma_bmap_free(p_hwfn, &p_hwfn->p_rdma_info->cq_map, 1);
 	qed_rdma_bmap_free(p_hwfn, &p_hwfn->p_rdma_info->toggle_bits, 0);
 	qed_rdma_bmap_free(p_hwfn, &p_hwfn->p_rdma_info->tid_map, 1);
+	qed_rdma_bmap_free(p_hwfn, &p_hwfn->p_rdma_info->srq_map, 1);
+	qed_rdma_bmap_free(p_hwfn, &p_hwfn->p_rdma_info->real_cid_map, 1);
 
 	kfree(p_rdma_info->port);
 	kfree(p_rdma_info->dev);
@@ -431,6 +447,12 @@ static void qed_rdma_init_devinfo(struct qed_hwfn *p_hwfn,
 	if (cdev->rdma_max_sge)
 		dev->max_sge = min_t(u32, cdev->rdma_max_sge, dev->max_sge);
 
+	dev->max_srq_sge = QED_RDMA_MAX_SGE_PER_SRQ_WQE;
+	if (p_hwfn->cdev->rdma_max_srq_sge) {
+		dev->max_srq_sge = min_t(u32,
+					 p_hwfn->cdev->rdma_max_srq_sge,
+					 dev->max_srq_sge);
+	}
 	dev->max_inline = ROCE_REQ_MAX_INLINE_DATA_SIZE;
 
 	dev->max_inline = (cdev->rdma_max_inline) ?
@@ -474,6 +496,8 @@ static void qed_rdma_init_devinfo(struct qed_hwfn *p_hwfn,
 	dev->max_mr_mw_fmr_size = dev->max_mr_mw_fmr_pbl * PAGE_SIZE;
 	dev->max_pkey = QED_RDMA_MAX_P_KEY;
 
+	dev->max_srq = p_hwfn->p_rdma_info->num_srqs;
+	dev->max_srq_wr = QED_RDMA_MAX_SRQ_WQE_ELEM;
 	dev->max_qp_resp_rd_atomic_resc = RDMA_RING_PAGE_SIZE /
 					  (RDMA_RESP_RD_ATOMIC_ELM_SIZE * 2);
 	dev->max_qp_req_rd_atomic_resc = RDMA_RING_PAGE_SIZE /
@@ -1628,6 +1652,156 @@ static void *qed_rdma_get_rdma_ctx(struct qed_dev *cdev)
 	return QED_LEADING_HWFN(cdev);
 }
 
+int qed_rdma_modify_srq(void *rdma_cxt,
+			struct qed_rdma_modify_srq_in_params *in_params)
+{
+	struct rdma_srq_modify_ramrod_data *p_ramrod;
+	struct qed_hwfn *p_hwfn = rdma_cxt;
+	struct qed_sp_init_data init_data;
+	struct qed_spq_entry *p_ent;
+	u16 opaque_fid;
+	int rc;
+
+	memset(&init_data, 0, sizeof(init_data));
+	init_data.opaque_fid = p_hwfn->hw_info.opaque_fid;
+	init_data.comp_mode = QED_SPQ_MODE_EBLOCK;
+
+	rc = qed_sp_init_request(p_hwfn, &p_ent,
+				 RDMA_RAMROD_MODIFY_SRQ,
+				 p_hwfn->p_rdma_info->proto, &init_data);
+	if (rc)
+		return rc;
+
+	p_ramrod = &p_ent->ramrod.rdma_modify_srq;
+	p_ramrod->srq_id.srq_idx = cpu_to_le16(in_params->srq_id);
+	opaque_fid = p_hwfn->hw_info.opaque_fid;
+	p_ramrod->srq_id.opaque_fid = cpu_to_le16(opaque_fid);
+	p_ramrod->wqe_limit = cpu_to_le16(in_params->wqe_limit);
+
+	rc = qed_spq_post(p_hwfn, p_ent, NULL);
+	if (rc)
+		return rc;
+
+	DP_VERBOSE(p_hwfn, QED_MSG_RDMA, "modified SRQ id = %x",
+		   in_params->srq_id);
+
+	return rc;
+}
+
+int qed_rdma_destroy_srq(void *rdma_cxt,
+			 struct qed_rdma_destroy_srq_in_params *in_params)
+{
+	struct rdma_srq_destroy_ramrod_data *p_ramrod;
+	struct qed_hwfn *p_hwfn = rdma_cxt;
+	struct qed_sp_init_data init_data;
+	struct qed_spq_entry *p_ent;
+	struct qed_bmap *bmap;
+	u16 opaque_fid;
+	int rc;
+
+	opaque_fid = p_hwfn->hw_info.opaque_fid;
+
+	memset(&init_data, 0, sizeof(init_data));
+	init_data.opaque_fid = opaque_fid;
+	init_data.comp_mode = QED_SPQ_MODE_EBLOCK;
+
+	rc = qed_sp_init_request(p_hwfn, &p_ent,
+				 RDMA_RAMROD_DESTROY_SRQ,
+				 p_hwfn->p_rdma_info->proto, &init_data);
+	if (rc)
+		return rc;
+
+	p_ramrod = &p_ent->ramrod.rdma_destroy_srq;
+	p_ramrod->srq_id.srq_idx = cpu_to_le16(in_params->srq_id);
+	p_ramrod->srq_id.opaque_fid = cpu_to_le16(opaque_fid);
+
+	rc = qed_spq_post(p_hwfn, p_ent, NULL);
+	if (rc)
+		return rc;
+
+	bmap = &p_hwfn->p_rdma_info->srq_map;
+
+	spin_lock_bh(&p_hwfn->p_rdma_info->lock);
+	qed_bmap_release_id(p_hwfn, bmap, in_params->srq_id);
+	spin_unlock_bh(&p_hwfn->p_rdma_info->lock);
+
+	DP_VERBOSE(p_hwfn, QED_MSG_RDMA, "SRQ destroyed Id = %x",
+		   in_params->srq_id);
+
+	return rc;
+}
+
+int qed_rdma_create_srq(void *rdma_cxt,
+			struct qed_rdma_create_srq_in_params *in_params,
+			struct qed_rdma_create_srq_out_params *out_params)
+{
+	struct rdma_srq_create_ramrod_data *p_ramrod;
+	struct qed_hwfn *p_hwfn = rdma_cxt;
+	struct qed_sp_init_data init_data;
+	enum qed_cxt_elem_type elem_type;
+	struct qed_spq_entry *p_ent;
+	u16 opaque_fid, srq_id;
+	struct qed_bmap *bmap;
+	u32 returned_id;
+	int rc;
+
+	bmap = &p_hwfn->p_rdma_info->srq_map;
+	spin_lock_bh(&p_hwfn->p_rdma_info->lock);
+	rc = qed_rdma_bmap_alloc_id(p_hwfn, bmap, &returned_id);
+	spin_unlock_bh(&p_hwfn->p_rdma_info->lock);
+
+	if (rc) {
+		DP_NOTICE(p_hwfn, "failed to allocate srq id\n");
+		return rc;
+	}
+
+	elem_type = QED_ELEM_SRQ;
+	rc = qed_cxt_dynamic_ilt_alloc(p_hwfn, elem_type, returned_id);
+	if (rc)
+		goto err;
+	/* returned id is no greater than u16 */
+	srq_id = (u16)returned_id;
+	opaque_fid = p_hwfn->hw_info.opaque_fid;
+
+	memset(&init_data, 0, sizeof(init_data));
+	opaque_fid = p_hwfn->hw_info.opaque_fid;
+	init_data.opaque_fid = opaque_fid;
+	init_data.comp_mode = QED_SPQ_MODE_EBLOCK;
+
+	rc = qed_sp_init_request(p_hwfn, &p_ent,
+				 RDMA_RAMROD_CREATE_SRQ,
+				 p_hwfn->p_rdma_info->proto, &init_data);
+	if (rc)
+		goto err;
+
+	p_ramrod = &p_ent->ramrod.rdma_create_srq;
+	DMA_REGPAIR_LE(p_ramrod->pbl_base_addr, in_params->pbl_base_addr);
+	p_ramrod->pages_in_srq_pbl = cpu_to_le16(in_params->num_pages);
+	p_ramrod->pd_id = cpu_to_le16(in_params->pd_id);
+	p_ramrod->srq_id.srq_idx = cpu_to_le16(srq_id);
+	p_ramrod->srq_id.opaque_fid = cpu_to_le16(opaque_fid);
+	p_ramrod->page_size = cpu_to_le16(in_params->page_size);
+	DMA_REGPAIR_LE(p_ramrod->producers_addr, in_params->prod_pair_addr);
+
+	rc = qed_spq_post(p_hwfn, p_ent, NULL);
+	if (rc)
+		goto err;
+
+	out_params->srq_id = srq_id;
+
+	DP_VERBOSE(p_hwfn, QED_MSG_RDMA,
+		   "SRQ created Id = %x\n", out_params->srq_id);
+
+	return rc;
+
+err:
+	spin_lock_bh(&p_hwfn->p_rdma_info->lock);
+	qed_bmap_release_id(p_hwfn, bmap, returned_id);
+	spin_unlock_bh(&p_hwfn->p_rdma_info->lock);
+
+	return rc;
+}
+
 bool qed_rdma_allocated_qps(struct qed_hwfn *p_hwfn)
 {
 	bool result;
@@ -1773,6 +1947,9 @@ static int qed_roce_ll2_set_mac_filter(struct qed_dev *cdev,
 	.rdma_free_tid = &qed_rdma_free_tid,
 	.rdma_register_tid = &qed_rdma_register_tid,
 	.rdma_deregister_tid = &qed_rdma_deregister_tid,
+	.rdma_create_srq = &qed_rdma_create_srq,
+	.rdma_modify_srq = &qed_rdma_modify_srq,
+	.rdma_destroy_srq = &qed_rdma_destroy_srq,
 	.ll2_acquire_connection = &qed_ll2_acquire_connection,
 	.ll2_establish_connection = &qed_ll2_establish_connection,
 	.ll2_terminate_connection = &qed_ll2_terminate_connection,
diff --git a/drivers/net/ethernet/qlogic/qed/qed_rdma.h b/drivers/net/ethernet/qlogic/qed/qed_rdma.h
index 18ec9cb..6f722ee 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_rdma.h
+++ b/drivers/net/ethernet/qlogic/qed/qed_rdma.h
@@ -96,6 +96,8 @@ struct qed_rdma_info {
 	u8 num_cnqs;
 	u32 num_qps;
 	u32 num_mrs;
+	u32 num_srqs;
+	u16 srq_id_offset;
 	u16 queue_zone_base;
 	u16 max_queue_zones;
 	enum protocol_type proto;
diff --git a/drivers/net/ethernet/qlogic/qed/qed_roce.c b/drivers/net/ethernet/qlogic/qed/qed_roce.c
index 6acfd43..ee57fcd 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_roce.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_roce.c
@@ -65,6 +65,8 @@
 		     u8 fw_event_code,
 		     u16 echo, union event_ring_data *data, u8 fw_return_code)
 {
+	struct qed_rdma_events events = p_hwfn->p_rdma_info->events;
+
 	if (fw_event_code == ROCE_ASYNC_EVENT_DESTROY_QP_DONE) {
 		u16 icid =
 		    (u16)le32_to_cpu(data->rdma_data.rdma_destroy_qp_data.cid);
@@ -75,11 +77,18 @@
 		 */
 		qed_roce_free_real_icid(p_hwfn, icid);
 	} else {
-		struct qed_rdma_events *events = &p_hwfn->p_rdma_info->events;
+		if (fw_event_code == ROCE_ASYNC_EVENT_SRQ_EMPTY ||
+		    fw_event_code == ROCE_ASYNC_EVENT_SRQ_LIMIT) {
+			u16 srq_id = (u16)data->rdma_data.async_handle.lo;
+
+			events.affiliated_event(events.context, fw_event_code,
+						&srq_id);
+		} else {
+			union rdma_eqe_data rdata = data->rdma_data;
 
-		events->affiliated_event(p_hwfn->p_rdma_info->events.context,
-					 fw_event_code,
-				     (void *)&data->rdma_data.async_handle);
+			events.affiliated_event(events.context, fw_event_code,
+						(void *)&rdata.async_handle);
+		}
 	}
 
 	return 0;
diff --git a/include/linux/qed/qed_rdma_if.h b/include/linux/qed/qed_rdma_if.h
index 4dd72ba..e05e320 100644
--- a/include/linux/qed/qed_rdma_if.h
+++ b/include/linux/qed/qed_rdma_if.h
@@ -485,7 +485,9 @@ enum qed_iwarp_event_type {
 	QED_IWARP_EVENT_ACTIVE_MPA_REPLY,
 	QED_IWARP_EVENT_LOCAL_ACCESS_ERROR,
 	QED_IWARP_EVENT_REMOTE_OPERATION_ERROR,
-	QED_IWARP_EVENT_TERMINATE_RECEIVED
+	QED_IWARP_EVENT_TERMINATE_RECEIVED,
+	QED_IWARP_EVENT_SRQ_LIMIT,
+	QED_IWARP_EVENT_SRQ_EMPTY,
 };
 
 enum qed_tcp_ip_version {
@@ -646,6 +648,14 @@ struct qed_rdma_ops {
 	int (*rdma_alloc_tid)(void *rdma_cxt, u32 *itid);
 	void (*rdma_free_tid)(void *rdma_cxt, u32 itid);
 
+	int (*rdma_create_srq)(void *rdma_cxt,
+			       struct qed_rdma_create_srq_in_params *iparams,
+			       struct qed_rdma_create_srq_out_params *oparams);
+	int (*rdma_destroy_srq)(void *rdma_cxt,
+				struct qed_rdma_destroy_srq_in_params *iparams);
+	int (*rdma_modify_srq)(void *rdma_cxt,
+			       struct qed_rdma_modify_srq_in_params *iparams);
+
 	int (*ll2_acquire_connection)(void *rdma_cxt,
 				      struct qed_ll2_acquire_data *data);
 
-- 
1.8.3.1

^ permalink raw reply related

* Re: [PATCH] rtnetlink: Add more well known protocol values
From: Donald Sharp @ 2018-05-30 12:32 UTC (permalink / raw)
  To: netdev, David Ahern
In-Reply-To: <20180530122732.3688-1-sharpd@cumulusnetworks.com>

This patch is intended for net-next.

thanks!

donald

On Wed, May 30, 2018 at 8:27 AM, Donald Sharp
<sharpd@cumulusnetworks.com> wrote:
> FRRouting installs routes into the kernel associated with
> the originating protocol.  Add these values to the well
> known values in rtnetlink.h.
>
> Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
> ---
> v2: Fixed whitespace issues
>  include/uapi/linux/rtnetlink.h | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
> index cabb210c93af..7d8502313c99 100644
> --- a/include/uapi/linux/rtnetlink.h
> +++ b/include/uapi/linux/rtnetlink.h
> @@ -254,6 +254,11 @@ enum {
>  #define RTPROT_DHCP    16      /* DHCP client */
>  #define RTPROT_MROUTED 17      /* Multicast daemon */
>  #define RTPROT_BABEL   42      /* Babel daemon */
> +#define RTPROT_BGP     186     /* BGP Routes */
> +#define RTPROT_ISIS    187     /* ISIS Routes */
> +#define RTPROT_OSPF    188     /* OSPF Routes */
> +#define RTPROT_RIP     189     /* RIP Routes */
> +#define RTPROT_EIGRP   192     /* EIGRP Routes */
>
>  /* rtm_scope
>
> --
> 2.14.3
>

^ permalink raw reply

* Re: [PATCH mlx5-next v2 11/13] IB/mlx5: Add flow counters binding support
From: Yishai Hadas @ 2018-05-30 12:31 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, Doug Ledford, Leon Romanovsky, RDMA mailing list,
	Boris Pismenny, Matan Barak, Raed Salem, Yishai Hadas,
	Saeed Mahameed, linux-netdev, Alex Rosenbaum
In-Reply-To: <20180529195627.GA31423@ziepe.ca>

On 5/29/2018 10:56 PM, Jason Gunthorpe wrote:
> On Tue, May 29, 2018 at 04:09:15PM +0300, Leon Romanovsky wrote:
>> diff --git a/include/uapi/rdma/mlx5-abi.h b/include/uapi/rdma/mlx5-abi.h
>> index 508ea8c82da7..ef3f430a7050 100644
>> +++ b/include/uapi/rdma/mlx5-abi.h
>> @@ -443,4 +443,18 @@ enum {
>>   enum {
>>   	MLX5_IB_CLOCK_INFO_V1              = 0,
>>   };
>> +
>> +struct mlx5_ib_flow_counters_data {
>> +	__aligned_u64   counters_data;
>> +	__u32   ncounters;
>> +	__u32   reserved;
>> +};
>> +
>> +struct mlx5_ib_create_flow {
>> +	__u32   ncounters_data;
>> +	__u32   reserved;
>> +	/* Following are counters data based on ncounters_data */
>> +	struct mlx5_ib_flow_counters_data data[];
>> +};
>> +
>>   #endif /* MLX5_ABI_USER_H */
> 
> This uapi thing still needs to be fixed as I pointed out before.

In V3 we can go with below, no change in memory layout but it can 
clarify the code/usage.

struct mlx5_ib_flow_counters_desc {
         __u32   description;
         __u32   index;
};

struct mlx5_ib_flow_counters_data {
         RDMA_UAPI_PTR(struct mlx5_ib_flow_counters_desc *, counters_data);
         __u32   ncounters;
         __u32   reserved;
};

struct mlx5_ib_create_flow {
         __u32   ncounters_data;
         __u32   reserved;
         /* Following are counters data based on ncounters_data */
         struct mlx5_ib_flow_counters_data data[];


> I still can't figure out why this should be a 2d array.

This comes to support the future case of multiple counters objects/specs 
passed with the same flow. There is a need to differentiate mapping data 
for each counters object and that is done via the 'ncounters_data' field 
and the 2d array.

  I think it
> should be written simply as:
> 
> struct mlx5_ib_flow_counter_desc {
>          __u32 description;
>          __u32 index;
> };
> 
> struct mlx5_ib_create_flow {
> 	RDMA_UAPI_PTR(struct mlx5_ib_flow_counter_desc, counters_data);
> 	__u32   ncounters;
> 	__u32   reserved;
> };
> 
> With the corresponding changes elsewhere.
> 

This doesn't support the above use case.

> A flex array at the end of a struct means that the struct can never be
> extended again which seems like a terrible idea,

The header [1] has a fixed size and will always exist even if there will 
be no counters. Future extensions [2] will be added in the memory post 
the flex array which its size depends on 'ncounters_data'. This pattern 
is used also in other extended APIs. [3]

struct mlx5_ib_create_flow {
         __u32   ncounters_data;
         __u32   reserved;
[1] /* Header is above ********

         /* Following are counters data based on ncounters_data */
         struct mlx5_ib_flow_counters_data data[];

[2] Future fields.

[3] 
https://elixir.bootlin.com/linux/latest/source/include/uapi/rdma/ib_user_verbs.h#L1145

^ permalink raw reply

* [PATCH] rtnetlink: Add more well known protocol values
From: Donald Sharp @ 2018-05-30 12:27 UTC (permalink / raw)
  To: netdev, dsahern

FRRouting installs routes into the kernel associated with
the originating protocol.  Add these values to the well
known values in rtnetlink.h.

Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
---
v2: Fixed whitespace issues
 include/uapi/linux/rtnetlink.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index cabb210c93af..7d8502313c99 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -254,6 +254,11 @@ enum {
 #define RTPROT_DHCP	16      /* DHCP client */
 #define RTPROT_MROUTED	17      /* Multicast daemon */
 #define RTPROT_BABEL	42      /* Babel daemon */
+#define RTPROT_BGP	186     /* BGP Routes */
+#define RTPROT_ISIS	187     /* ISIS Routes */
+#define RTPROT_OSPF	188     /* OSPF Routes */
+#define RTPROT_RIP	189     /* RIP Routes */
+#define RTPROT_EIGRP	192     /* EIGRP Routes */
 
 /* rtm_scope
 
-- 
2.14.3

^ permalink raw reply related

* Re: [PATCH net-next] net: qcom/emac: fix unused variable
From: Timur Tabi @ 2018-05-30 12:10 UTC (permalink / raw)
  To: YueHaibing, davem; +Cc: netdev, linux-kernel
In-Reply-To: <20180529104343.19448-1-yuehaibing@huawei.com>

On 5/29/18 5:43 AM, YueHaibing wrote:
> When CONFIG_ACPI isn't set, variable qdf2400_ops/qdf2432_ops isn't used.
> drivers/net/ethernet/qualcomm/emac/emac-sgmii.c:284:25: warning: ‘qdf2400_ops’ defined but not used [-Wunused-variable]
>   static struct sgmii_ops qdf2400_ops = {
>                           ^~~~~~~~~~~
> drivers/net/ethernet/qualcomm/emac/emac-sgmii.c:276:25: warning: ‘qdf2432_ops’ defined but not used [-Wunused-variable]
>   static struct sgmii_ops qdf2432_ops = {
>                           ^~~~~~~~~~~
> 
> Move the declaration and functions inside the CONFIG_ACPI ifdef
> to fix the warning.
> Signed-off-by: YueHaibing<yuehaibing@huawei.com>

I already fixed this with:

https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=d377df784178bf5b0a39e75dc8b1ee86e1abb3f6

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc.  Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply

* [PATCH net-next] cxgb4: Add FORCE_PAUSE bit to 32 bit port caps
From: Ganesh Goudar @ 2018-05-30 11:45 UTC (permalink / raw)
  To: netdev, davem
  Cc: nirranjan, indranil, Ganesh Goudar, Santosh Rastapur,
	Casey Leedom

Add FORCE_PAUSE bit to force local pause settings instead
of using auto negotiated values.

Signed-off-by: Santosh Rastapur <santosh@chelsio.com>
Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
---
 drivers/net/ethernet/chelsio/cxgb4/t4_hw.c    | 10 +++++++++-
 drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h |  5 +++--
 2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
index 39da7e3..974a868 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
@@ -3941,6 +3941,7 @@ static fw_port_cap32_t fwcaps16_to_caps32(fw_port_cap16_t caps16)
 	CAP16_TO_CAP32(FC_RX);
 	CAP16_TO_CAP32(FC_TX);
 	CAP16_TO_CAP32(ANEG);
+	CAP16_TO_CAP32(FORCE_PAUSE);
 	CAP16_TO_CAP32(MDIAUTO);
 	CAP16_TO_CAP32(MDISTRAIGHT);
 	CAP16_TO_CAP32(FEC_RS);
@@ -3982,6 +3983,7 @@ static fw_port_cap16_t fwcaps32_to_caps16(fw_port_cap32_t caps32)
 	CAP32_TO_CAP16(802_3_PAUSE);
 	CAP32_TO_CAP16(802_3_ASM_DIR);
 	CAP32_TO_CAP16(ANEG);
+	CAP32_TO_CAP16(FORCE_PAUSE);
 	CAP32_TO_CAP16(MDIAUTO);
 	CAP32_TO_CAP16(MDISTRAIGHT);
 	CAP32_TO_CAP16(FEC_RS);
@@ -4014,6 +4016,8 @@ static inline fw_port_cap32_t cc_to_fwcap_pause(enum cc_pause cc_pause)
 		fw_pause |= FW_PORT_CAP32_FC_RX;
 	if (cc_pause & PAUSE_TX)
 		fw_pause |= FW_PORT_CAP32_FC_TX;
+	if (!(cc_pause & PAUSE_AUTONEG))
+		fw_pause |= FW_PORT_CAP32_FORCE_PAUSE;
 
 	return fw_pause;
 }
@@ -4101,7 +4105,11 @@ int t4_link_l1cfg_core(struct adapter *adapter, unsigned int mbox,
 		rcap = lc->acaps | fw_fc | fw_fec | fw_mdi;
 	}
 
-	if (rcap & ~lc->pcaps) {
+	/* Note that older Firmware doesn't have FW_PORT_CAP32_FORCE_PAUSE, so
+	 * we need to exclude this from this check in order to maintain
+	 * compatibility ...
+	 */
+	if ((rcap & ~lc->pcaps) & ~FW_PORT_CAP32_FORCE_PAUSE) {
 		dev_err(adapter->pdev_dev,
 			"Requested Port Capabilities %#x exceed Physical Port Capabilities %#x\n",
 			rcap, lc->pcaps);
diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h b/drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h
index 2d91480..f1967cf 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h
@@ -2475,7 +2475,7 @@ enum fw_port_cap {
 	FW_PORT_CAP_MDISTRAIGHT		= 0x0400,
 	FW_PORT_CAP_FEC_RS		= 0x0800,
 	FW_PORT_CAP_FEC_BASER_RS	= 0x1000,
-	FW_PORT_CAP_FEC_RESERVED	= 0x2000,
+	FW_PORT_CAP_FORCE_PAUSE		= 0x2000,
 	FW_PORT_CAP_802_3_PAUSE		= 0x4000,
 	FW_PORT_CAP_802_3_ASM_DIR	= 0x8000,
 };
@@ -2522,7 +2522,8 @@ enum fw_port_mdi {
 #define	FW_PORT_CAP32_FEC_RESERVED1	0x02000000UL
 #define	FW_PORT_CAP32_FEC_RESERVED2	0x04000000UL
 #define	FW_PORT_CAP32_FEC_RESERVED3	0x08000000UL
-#define	FW_PORT_CAP32_RESERVED2		0xf0000000UL
+#define FW_PORT_CAP32_FORCE_PAUSE	0x10000000UL
+#define FW_PORT_CAP32_RESERVED2		0xe0000000UL
 
 #define FW_PORT_CAP32_SPEED_S	0
 #define FW_PORT_CAP32_SPEED_M	0xfff
-- 
2.1.0

^ permalink raw reply related

* Re: [RFC V5 PATCH 8/8] vhost: event suppression for packed ring
From: Wei Xu @ 2018-05-30 11:42 UTC (permalink / raw)
  To: Jason Wang; +Cc: kvm, mst, netdev, linux-kernel, virtualization
In-Reply-To: <1527559830-8133-9-git-send-email-jasowang@redhat.com>

On Tue, May 29, 2018 at 10:10:30AM +0800, Jason Wang wrote:
> This patch introduces basic support for event suppression aka driver
> and device area.
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  drivers/vhost/vhost.c            | 191 ++++++++++++++++++++++++++++++++++++---
>  drivers/vhost/vhost.h            |  10 +-
>  include/uapi/linux/virtio_ring.h |  19 ++++
>  3 files changed, 204 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index a36e5ad2..112f680 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -1112,10 +1112,15 @@ static int vq_access_ok_packed(struct vhost_virtqueue *vq, unsigned int num,
>  			       struct vring_used __user *used)
>  {
>  	struct vring_desc_packed *packed = (struct vring_desc_packed *)desc;
> +	struct vring_packed_desc_event *driver_event =
> +		(struct vring_packed_desc_event *)avail;
> +	struct vring_packed_desc_event *device_event =
> +		(struct vring_packed_desc_event *)used;
>  
> -	/* FIXME: check device area and driver area */
>  	return access_ok(VERIFY_READ, packed, num * sizeof(*packed)) &&
> -	       access_ok(VERIFY_WRITE, packed, num * sizeof(*packed));
> +	       access_ok(VERIFY_WRITE, packed, num * sizeof(*packed)) &&
> +	       access_ok(VERIFY_READ, driver_event, sizeof(*driver_event)) &&
> +	       access_ok(VERIFY_WRITE, device_event, sizeof(*device_event));
>  }
>  
>  static int vq_access_ok_split(struct vhost_virtqueue *vq, unsigned int num,
> @@ -1190,14 +1195,27 @@ static bool iotlb_access_ok(struct vhost_virtqueue *vq,
>  	return true;
>  }
>  
> -int vq_iotlb_prefetch(struct vhost_virtqueue *vq)
> +int vq_iotlb_prefetch_packed(struct vhost_virtqueue *vq)
> +{
> +	int num = vq->num;
> +
> +	return iotlb_access_ok(vq, VHOST_ACCESS_RO, (u64)(uintptr_t)vq->desc,
> +			       num * sizeof(*vq->desc), VHOST_ADDR_DESC) &&
> +	       iotlb_access_ok(vq, VHOST_ACCESS_WO, (u64)(uintptr_t)vq->desc,
> +			       num * sizeof(*vq->desc), VHOST_ADDR_DESC) &&
> +	       iotlb_access_ok(vq, VHOST_ACCESS_RO,
> +			       (u64)(uintptr_t)vq->driver_event,
> +			       sizeof(*vq->driver_event), VHOST_ADDR_AVAIL) &&
> +	       iotlb_access_ok(vq, VHOST_ACCESS_WO,
> +			       (u64)(uintptr_t)vq->device_event,
> +			       sizeof(*vq->device_event), VHOST_ADDR_USED);
> +}
> +
> +int vq_iotlb_prefetch_split(struct vhost_virtqueue *vq)
>  {
>  	size_t s = vhost_has_feature(vq, VIRTIO_RING_F_EVENT_IDX) ? 2 : 0;
>  	unsigned int num = vq->num;
>  
> -	if (!vq->iotlb)
> -		return 1;
> -
>  	return iotlb_access_ok(vq, VHOST_ACCESS_RO, (u64)(uintptr_t)vq->desc,
>  			       num * sizeof(*vq->desc), VHOST_ADDR_DESC) &&
>  	       iotlb_access_ok(vq, VHOST_ACCESS_RO, (u64)(uintptr_t)vq->avail,
> @@ -1209,6 +1227,17 @@ int vq_iotlb_prefetch(struct vhost_virtqueue *vq)
>  			       num * sizeof(*vq->used->ring) + s,
>  			       VHOST_ADDR_USED);
>  }
> +
> +int vq_iotlb_prefetch(struct vhost_virtqueue *vq)
> +{
> +	if (!vq->iotlb)
> +		return 1;
> +
> +	if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
> +		return vq_iotlb_prefetch_packed(vq);
> +	else
> +		return vq_iotlb_prefetch_split(vq);
> +}
>  EXPORT_SYMBOL_GPL(vq_iotlb_prefetch);
>  
>  /* Can we log writes? */
> @@ -1730,6 +1759,50 @@ static int vhost_update_used_flags(struct vhost_virtqueue *vq)
>  	return 0;
>  }
>  
> +static int vhost_update_device_flags(struct vhost_virtqueue *vq,
> +				     __virtio16 device_flags)
> +{
> +	void __user *flags;
> +
> +	if (vhost_put_user(vq, device_flags, &vq->device_event->flags,
> +			   VHOST_ADDR_USED) < 0)
> +		return -EFAULT;
> +	if (unlikely(vq->log_used)) {
> +		/* Make sure the flag is seen before log. */
> +		smp_wmb();
> +		/* Log used flag write. */
> +		flags = &vq->device_event->flags;
> +		log_write(vq->log_base, vq->log_addr +
> +			  (flags - (void __user *)vq->device_event),
> +			  sizeof(vq->device_event->flags));
> +		if (vq->log_ctx)
> +			eventfd_signal(vq->log_ctx, 1);
> +	}
> +	return 0;
> +}
> +
> +static int vhost_update_device_off_wrap(struct vhost_virtqueue *vq,
> +					__virtio16 device_off_wrap)
> +{
> +	void __user *off_wrap;
> +
> +	if (vhost_put_user(vq, device_off_wrap, &vq->device_event->off_wrap,
> +			   VHOST_ADDR_USED) < 0)
> +		return -EFAULT;
> +	if (unlikely(vq->log_used)) {
> +		/* Make sure the flag is seen before log. */
> +		smp_wmb();
> +		/* Log used flag write. */
> +		off_wrap = &vq->device_event->off_wrap;
> +		log_write(vq->log_base, vq->log_addr +
> +			  (off_wrap - (void __user *)vq->device_event),
> +			  sizeof(vq->device_event->off_wrap));
> +		if (vq->log_ctx)
> +			eventfd_signal(vq->log_ctx, 1);
> +	}
> +	return 0;
> +}
> +
>  static int vhost_update_avail_event(struct vhost_virtqueue *vq, u16 avail_event)
>  {
>  	if (vhost_put_user(vq, cpu_to_vhost16(vq, vq->avail_idx),
> @@ -2683,16 +2756,13 @@ int vhost_add_used_n(struct vhost_virtqueue *vq,
>  }
>  EXPORT_SYMBOL_GPL(vhost_add_used_n);
>  
> -static bool vhost_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
> +static bool vhost_notify_split(struct vhost_dev *dev,
> +			       struct vhost_virtqueue *vq)
>  {
>  	__u16 old, new;
>  	__virtio16 event;
>  	bool v;
>  
> -	/* FIXME: check driver area */
> -	if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
> -		return true;
> -
>  	/* Flush out used index updates. This is paired
>  	 * with the barrier that the Guest executes when enabling
>  	 * interrupts. */
> @@ -2725,6 +2795,64 @@ static bool vhost_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
>  	return vring_need_event(vhost16_to_cpu(vq, event), new, old);
>  }
>  
> +static bool vhost_notify_packed(struct vhost_dev *dev,
> +				struct vhost_virtqueue *vq)
> +{
> +	__virtio16 event_off_wrap, event_flags;
> +	__u16 old, new, off_wrap;
> +	bool v;
> +
> +	/* Flush out used descriptors updates. This is paired
> +	 * with the barrier that the Guest executes when enabling
> +	 * interrupts.
> +	 */
> +	smp_mb();
> +
> +	if (vhost_get_avail(vq, event_flags,
> +			   &vq->driver_event->flags) < 0) {
> +		vq_err(vq, "Failed to get driver desc_event_flags");
> +		return true;
> +	}
> +
> +	if (!vhost_has_feature(vq, VIRTIO_RING_F_EVENT_IDX))
> +		return event_flags !=
> +		       cpu_to_vhost16(vq, RING_EVENT_FLAGS_DISABLE);
> +
> +	old = vq->signalled_used;
> +	v = vq->signalled_used_valid;
> +	new = vq->signalled_used = vq->last_used_idx;
> +	vq->signalled_used_valid = true;
> +
> +	if (event_flags != cpu_to_vhost16(vq, RING_EVENT_FLAGS_DESC))
> +		return event_flags !=
> +		       cpu_to_vhost16(vq, RING_EVENT_FLAGS_DISABLE);
> +
> +	/* Read desc event flags before event_off and event_wrap */
> +	smp_rmb();
> +
> +	if (vhost_get_avail(vq, event_off_wrap,
> +			    &vq->driver_event->off_wrap) < 0) {
> +		vq_err(vq, "Failed to get driver desc_event_off/wrap");
> +		return true;
> +	}
> +
> +	off_wrap = vhost16_to_cpu(vq, event_off_wrap);
> +
> +	if (unlikely(!v))
> +		return true;
> +
> +	return vhost_vring_packed_need_event(vq, vq->used_wrap_counter,
> +					     off_wrap, new, old);
> +}
> +
> +static bool vhost_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
> +{
> +	if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
> +		return vhost_notify_packed(dev, vq);
> +	else
> +		return vhost_notify_split(dev, vq);
> +}
> +
>  /* This actually signals the guest, using eventfd. */
>  void vhost_signal(struct vhost_dev *dev, struct vhost_virtqueue *vq)
>  {
> @@ -2802,10 +2930,34 @@ static bool vhost_enable_notify_packed(struct vhost_dev *dev,
>  				       struct vhost_virtqueue *vq)
>  {
>  	struct vring_desc_packed *d = vq->desc_packed + vq->avail_idx;
> -	__virtio16 flags;
> +	__virtio16 flags = RING_EVENT_FLAGS_ENABLE;
>  	int ret;
>  
> -	/* FIXME: disable notification through device area */
> +	if (!(vq->used_flags & VRING_USED_F_NO_NOTIFY))
> +		return false;
> +	vq->used_flags &= ~VRING_USED_F_NO_NOTIFY;

'used_flags' was originally designed for 1.0, why should we pay attetion to it here?

Wei
> +
> +	if (vhost_has_feature(vq, VIRTIO_RING_F_EVENT_IDX)) {
> +		__virtio16 off_wrap = cpu_to_vhost16(vq, vq->avail_idx |
> +				      vq->avail_wrap_counter << 15);
> +
> +		ret = vhost_update_device_off_wrap(vq, off_wrap);
> +		if (ret) {
> +			vq_err(vq, "Failed to write to off warp at %p: %d\n",
> +			       &vq->device_event->off_wrap, ret);
> +			return false;
> +		}
> +		/* Make sure off_wrap is wrote before flags */
> +		smp_wmb();
> +		flags = RING_EVENT_FLAGS_DESC;
> +	}
> +
> +	ret = vhost_update_device_flags(vq, flags);
> +	if (ret) {
> +		vq_err(vq, "Failed to enable notification at %p: %d\n",
> +			&vq->device_event->flags, ret);
> +		return false;
> +	}
>  
>  	/* They could have slipped one in as we were doing that: make
>  	 * sure it's written, then check again. */
> @@ -2871,7 +3023,18 @@ EXPORT_SYMBOL_GPL(vhost_enable_notify);
>  static void vhost_disable_notify_packed(struct vhost_dev *dev,
>  					struct vhost_virtqueue *vq)
>  {
> -	/* FIXME: disable notification through device area */
> +	__virtio16 flags;
> +	int r;
> +
> +	if (vq->used_flags & VRING_USED_F_NO_NOTIFY)
> +		return;
> +	vq->used_flags |= VRING_USED_F_NO_NOTIFY;
> +
> +	flags = cpu_to_vhost16(vq, RING_EVENT_FLAGS_DISABLE);
> +	r = vhost_update_device_flags(vq, flags);
> +	if (r)
> +		vq_err(vq, "Failed to enable notification at %p: %d\n",
> +		       &vq->device_event->flags, r);
>  }
>  
>  static void vhost_disable_notify_split(struct vhost_dev *dev,
> diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
> index 7543a46..b920582 100644
> --- a/drivers/vhost/vhost.h
> +++ b/drivers/vhost/vhost.h
> @@ -96,8 +96,14 @@ struct vhost_virtqueue {
>  		struct vring_desc __user *desc;
>  		struct vring_desc_packed __user *desc_packed;
>  	};
> -	struct vring_avail __user *avail;
> -	struct vring_used __user *used;
> +	union {
> +		struct vring_avail __user *avail;
> +		struct vring_packed_desc_event __user *driver_event;
> +	};
> +	union {
> +		struct vring_used __user *used;
> +		struct vring_packed_desc_event __user *device_event;
> +	};
>  	const struct vhost_umem_node *meta_iotlb[VHOST_NUM_ADDRS];
>  	struct file *kick;
>  	struct eventfd_ctx *call_ctx;
> diff --git a/include/uapi/linux/virtio_ring.h b/include/uapi/linux/virtio_ring.h
> index e297580..71c7a46 100644
> --- a/include/uapi/linux/virtio_ring.h
> +++ b/include/uapi/linux/virtio_ring.h
> @@ -75,6 +75,25 @@ struct vring_desc_packed {
>  	__virtio16 flags;
>  };
>  
> +/* Enable events */
> +#define RING_EVENT_FLAGS_ENABLE 0x0
> +/* Disable events */
> +#define RING_EVENT_FLAGS_DISABLE 0x1
> +/*
> + * Enable events for a specific descriptor
> + * (as specified by Descriptor Ring Change Event Offset/Wrap Counter).
> + * Only valid if VIRTIO_F_RING_EVENT_IDX has been negotiated.
> + */
> +#define RING_EVENT_FLAGS_DESC 0x2
> +/* The value 0x3 is reserved */
> +
> +struct vring_packed_desc_event {
> +	/* Descriptor Ring Change Event Offset and Wrap Counter */
> +	__virtio16 off_wrap;
> +	/* Descriptor Ring Change Event Flags */
> +	__virtio16 flags;
> +};
> +
>  /* Virtio ring descriptors: 16 bytes.  These can chain together via "next". */
>  struct vring_desc {
>  	/* Address (guest-physical). */
> -- 
> 2.7.4
> 

^ permalink raw reply

* Re: [PATCH bpf-next v7 3/6] bpf: Add IPv6 Segment Routing helpers
From: Daniel Borkmann @ 2018-05-30 11:00 UTC (permalink / raw)
  To: Mathieu Xhonneux, netdev; +Cc: dlebrun, alexei.starovoitov
In-Reply-To: <d6833d31-4481-9595-ce26-d93ff35f411a@iogearbox.net>

On 05/24/2018 12:18 PM, Daniel Borkmann wrote:
> On 05/20/2018 03:58 PM, Mathieu Xhonneux wrote:
>> The BPF seg6local hook should be powerful enough to enable users to
>> implement most of the use-cases one could think of. After some thinking,
>> we figured out that the following actions should be possible on a SRv6
>> packet, requiring 3 specific helpers :
>>     - bpf_lwt_seg6_store_bytes: Modify non-sensitive fields of the SRH
>>     - bpf_lwt_seg6_adjust_srh: Allow to grow or shrink a SRH
>>                                (to add/delete TLVs)
>>     - bpf_lwt_seg6_action: Apply some SRv6 network programming actions
>>                            (specifically End.X, End.T, End.B6 and
>>                             End.B6.Encap)
>>
>> The specifications of these helpers are provided in the patch (see
>> include/uapi/linux/bpf.h).
>>
>> The non-sensitive fields of the SRH are the following : flags, tag and
>> TLVs. The other fields can not be modified, to maintain the SRH
>> integrity. Flags, tag and TLVs can easily be modified as their validity
>> can be checked afterwards via seg6_validate_srh. It is not allowed to
>> modify the segments directly. If one wants to add segments on the path,
>> he should stack a new SRH using the End.B6 action via
>> bpf_lwt_seg6_action.
>>
>> Growing, shrinking or editing TLVs via the helpers will flag the SRH as
>> invalid, and it will have to be re-validated before re-entering the IPv6
>> layer. This flag is stored in a per-CPU buffer, along with the current
>> header length in bytes.
>>
>> Storing the SRH len in bytes in the control block is mandatory when using
>> bpf_lwt_seg6_adjust_srh. The Header Ext. Length field contains the SRH
>> len rounded to 8 bytes (a padding TLV can be inserted to ensure the 8-bytes
>> boundary). When adding/deleting TLVs within the BPF program, the SRH may
>> temporary be in an invalid state where its length cannot be rounded to 8
>> bytes without remainder, hence the need to store the length in bytes
>> separately. The caller of the BPF program can then ensure that the SRH's
>> final length is valid using this value. Again, a final SRH modified by a
>> BPF program which doesn’t respect the 8-bytes boundary will be discarded
>> as it will be considered as invalid.
>>
>> Finally, a fourth helper is provided, bpf_lwt_push_encap, which is
>> available from the LWT BPF IN hook, but not from the seg6local BPF one.
>> This helper allows to encapsulate a Segment Routing Header (either with
>> a new outer IPv6 header, or by inlining it directly in the existing IPv6
>> header) into a non-SRv6 packet. This helper is required if we want to
>> offer the possibility to dynamically encapsulate a SRH for non-SRv6 packet,
>> as the BPF seg6local hook only works on traffic already containing a SRH.
>> This is the BPF equivalent of the seg6 LWT infrastructure, which achieves
>> the same purpose but with a static SRH per route.
>>
>> These helpers require CONFIG_IPV6=y (and not =m).
>>
>> Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com>
>> Acked-by: David Lebrun <dlebrun@google.com>
> 
> One minor comments for follow-ups in here below.
> 
>> +BPF_CALL_4(bpf_lwt_seg6_store_bytes, struct sk_buff *, skb, u32, offset,
>> +	   const void *, from, u32, len)
>> +{
>> +#if IS_ENABLED(CONFIG_IPV6_SEG6_BPF)
>> +	struct seg6_bpf_srh_state *srh_state =
>> +		this_cpu_ptr(&seg6_bpf_srh_states);
>> +	void *srh_tlvs, *srh_end, *ptr;
>> +	struct ipv6_sr_hdr *srh;
>> +	int srhoff = 0;
>> +
>> +	if (ipv6_find_hdr(skb, &srhoff, IPPROTO_ROUTING, NULL, NULL) < 0)
>> +		return -EINVAL;
>> +
>> +	srh = (struct ipv6_sr_hdr *)(skb->data + srhoff);
>> +	srh_tlvs = (void *)((char *)srh + ((srh->first_segment + 1) << 4));
>> +	srh_end = (void *)((char *)srh + sizeof(*srh) + srh_state->hdrlen);
>> +
>> +	ptr = skb->data + offset;
>> +	if (ptr >= srh_tlvs && ptr + len <= srh_end)
>> +		srh_state->valid = 0;
>> +	else if (ptr < (void *)&srh->flags ||
>> +		 ptr + len > (void *)&srh->segments)
>> +		return -EFAULT;
>> +
>> +	if (unlikely(bpf_try_make_writable(skb, offset + len)))
>> +		return -EFAULT;
>> +
>> +	memcpy(skb->data + offset, from, len);
>> +	return 0;
>> +#else /* CONFIG_IPV6_SEG6_BPF */
>> +	return -EOPNOTSUPP;
>> +#endif
>> +}
> 
> Instead of doing this inside the helper you can reject the program already
> in the lwt_*_func_proto() by returning NULL when !CONFIG_IPV6_SEG6_BPF. That
> way programs get rejected at verification time instead of runtime, so the
> user can probe availability more easily.

Mathieu, before this gets lost in archives, plan to follow-up on this one?

^ permalink raw reply

* Re: [PATCH v5 0/3] IR decoding using BPF
From: Daniel Borkmann @ 2018-05-30 10:57 UTC (permalink / raw)
  To: Sean Young, linux-media, linux-kernel, Alexei Starovoitov,
	Mauro Carvalho Chehab, netdev, Matthias Reichl, Devin Heitmueller,
	Y Song, Quentin Monnet
In-Reply-To: <cover.1527419762.git.sean@mess.org>

On 05/27/2018 01:24 PM, Sean Young wrote:
> The kernel IR decoders (drivers/media/rc/ir-*-decoder.c) support the most
> widely used IR protocols, but there are many protocols which are not
> supported[1]. For example, the lirc-remotes[2] repo has over 2700 remotes,
> many of which are not supported by rc-core. There is a "long tail" of
> unsupported IR protocols, for which lircd is need to decode the IR .
> 
> IR encoding is done in such a way that some simple circuit can decode it;
> therefore, bpf is ideal.
> 
> In order to support all these protocols, here we have bpf based IR decoding.
> The idea is that user-space can define a decoder in bpf, attach it to
> the rc device through the lirc chardev.
> 
> Separate work is underway to extend ir-keytable to have an extensive library
> of bpf-based decoders, and a much expanded library of rc keymaps.
> 
> Another future application would be to compile IRP[3] to a IR BPF program, and
> so support virtually every remote without having to write a decoder for each.
> It might also be possible to support non-button devices such as analog
> directional pads or air conditioning remote controls and decode the target
> temperature in bpf, and pass that to an input device.
> 
> Thanks,
> 
> Sean Young
> 
> [1] http://www.hifi-remote.com/wiki/index.php?title=DecodeIR
> [2] https://sourceforge.net/p/lirc-remotes/code/ci/master/tree/remotes/
> [3] http://www.hifi-remote.com/wiki/index.php?title=IRP_Notation
> 
> Changes since v4:
>  - Renamed rc_dev_bpf_{attach,detach,query} to lirc_bpf_{attach,detach,query}
>  - Fixed error path in lirc_bpf_query
>  - Rebased on bpf-next
> 
> Changes since v3:
>  - Implemented review comments from Quentin Monnet and Y Song (thanks!)
>  - More helpful and better formatted bpf helper documentation
>  - Changed back to bpf_prog_array rather than open-coded implementation
>  - scancodes can be 64 bit
>  - bpf gets passed values in microseconds, not nanoseconds.
>    microseconds is more than than enough (IR receivers support carriers upto
>    70kHz, at which point a single period is already 14 microseconds). Also,
>    this makes it much more consistent with lirc mode2.
>  - Since it looks much more like lirc mode2, rename the program type to
>    BPF_PROG_TYPE_LIRC_MODE2.
>  - Rebased on bpf-next
> 
> Changes since v2:
>  - Fixed locking issues
>  - Improved self-test to cover more cases
>  - Rebased on bpf-next again
> 
> Changes since v1:
>  - Code review comments from Y Song <ys114321@gmail.com> and
>    Randy Dunlap <rdunlap@infradead.org>
>  - Re-wrote sample bpf to be selftest
>  - Renamed RAWIR_DECODER -> RAWIR_EVENT (Kconfig, context, bpf prog type)
>  - Rebase on bpf-next
>  - Introduced bpf_rawir_event context structure with simpler access checking

Applied to bpf-next, thanks Sean!

^ permalink raw reply

* Re: [PATCH v4 net-next 00/19] inet: frags: bring rhashtables to IP defrag
From: Eric Dumazet @ 2018-05-30 10:56 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Eric Dumazet, aring, Tariq Toukan, David Miller, netdev,
	Florian Westphal, Herbert Xu, Thomas Graf, Alexander Aring,
	Stefan Schmidt, Kirill Tkhai, moshe, Eran Ben Elisha, Rick Jones
In-Reply-To: <CANn89iL73WTtF7P477tJOZcbDsg3U7Py7ykA9xdipcahtJKNNA@mail.gmail.com>

On Wed, May 30, 2018 at 6:36 AM Eric Dumazet <edumazet@google.com> wrote:


> Here are the good ones, using latest David Miller net tree. ( plus
> https://patchwork.ozlabs.org/patch/922528/  but that should not matter
here)

> llpaa23:/export/hda3/google/edumazet# ./netperf -H 2607:f8b0:8099:e18:: -t
> UDP_STREAM
> MIGRATED UDP STREAM TEST from ::0 (::) port 0 AF_INET6 to
> 2607:f8b0:8099:e18:: () port 0 AF_INET6
> Socket  Message  Elapsed      Messages
> Size    Size     Time         Okay Errors   Throughput
> bytes   bytes    secs            #      #   10^6bits/sec

> 212992   65507   10.00      216236      0    11331.89
> 212992           10.00      215068           11270.68


> There are few drops because of the too small
> /proc/sys/net/core/rmem_default  ( 212992 as seen in netperf output) for
> these kind of stress.
> ( each 64KB datagram actually consumes half the budget ...)


Once rmem_default is set to 1,000,000 and  mtu set back to 1500 (instead of
5102 on my testbed)
results are indeed better.

lpaa23:/export/hda3/google/edumazet# ./netperf -H 2607:f8b0:8099:e18:: -t
UDP_STREAM -l 10
MIGRATED UDP STREAM TEST from ::0 (::) port 0 AF_INET6 to
2607:f8b0:8099:e18:: () port 0 AF_INET6
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

212992   65507   10.00      231457      0    12129.56
1000000           10.00      231457           12129.56

^ permalink raw reply

* Re: [PATCH bpf-next] bpftool: Support sendmsg{4,6} attach types
From: Daniel Borkmann @ 2018-05-30 10:56 UTC (permalink / raw)
  To: Song Liu, Jakub Kicinski
  Cc: Andrey Ignatov, Networking, Alexei Starovoitov, Quentin Monnet,
	kernel-team
In-Reply-To: <CAPhsuW6oyRbgnXoyNtA0XM03063qQJGok6bPpO_Z4QBVgmi7=w@mail.gmail.com>

On 05/30/2018 02:12 AM, Song Liu wrote:
> On Tue, May 29, 2018 at 2:20 PM, Jakub Kicinski <kubakici@wp.pl> wrote:
>> On Tue, 29 May 2018 13:29:31 -0700, Andrey Ignatov wrote:
>>> Add support for recently added BPF_CGROUP_UDP4_SENDMSG and
>>> BPF_CGROUP_UDP6_SENDMSG attach types to bpftool, update documentation
>>> and bash completion.
>>>
>>> Signed-off-by: Andrey Ignatov <rdna@fb.com>
>>
>> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
>>
>>> I'm not sure about "since 4.18" in Documentation part. I can follow-up when
>>> the next kernel version is known.
>>
>> IMHO it's fine, we can follow up if Linus decides to call it something
>> else :)
>>
>> Thanks!
> 
> Acked-by: Song Liu <songliubraving@fb.com>

Applied to bpf-next, thanks guys!

^ permalink raw reply

* Re: [PATCH v4 net-next 00/19] inet: frags: bring rhashtables to IP defrag
From: Eric Dumazet @ 2018-05-30 10:36 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Eric Dumazet, aring, Tariq Toukan, David Miller, netdev,
	Florian Westphal, Herbert Xu, Thomas Graf, Alexander Aring,
	Stefan Schmidt, Kirill Tkhai, moshe, Eran Ben Elisha, Rick Jones
In-Reply-To: <20180530112022.2b793051@redhat.com>

On Wed, May 30, 2018 at 5:20 AM Jesper Dangaard Brouer <brouer@redhat.com>
wrote:

> On Mon, 28 May 2018 09:09:17 -0700
> Eric Dumazet <eric.dumazet@gmail.com> wrote:

> > Tariq, here are my test results : No drops for me.
> >
> > # ./netperf -H 2607:f8b0:8099:e18:: -t UDP_STREAM
> > MIGRATED UDP STREAM TEST from ::0 (::) port 0 AF_INET6 to
2607:f8b0:8099:e18:: () port 0 AF_INET6
> > Socket  Message  Elapsed      Messages
> > Size    Size     Time         Okay Errors   Throughput
> > bytes   bytes    secs            #      #   10^6bits/sec
> >
> > 212992   65507   10.00      202117      0    10592.00
> > 212992           10.00           0              0.00

> Hmm... Eric the above result show that ALL your UDP packets were dropped!
> You have 0 okay messages and 0.00 Mbit/s throughput.

> It needs to look like below (test on i40e NIC):

> $ netperf -t UDP_STREAM -H fee0:cafe::1
> MIGRATED UDP STREAM TEST from ::0 (::) port 0 AF_INET6 to fee0:cafe::1 ()
port 0 AF_INET6 : histogram : demo
> Socket  Message  Elapsed      Messages
> Size    Size     Time         Okay Errors   Throughput
> bytes   bytes    secs            #      #   10^6bits/sec

> 212992   65507   10.00      186385      0    9767.08
> 212992           10.00      186385           9767.08


> If I manually instruct ip6tables to drop all UDP packets, then I get
> what you see... so, something on your test system are likely dropping
> your UDP packets, but letting regular netperf (TCP) control
> communication through.

> # ip6tables -I INPUT -p udp -j DROP

> $ netperf -t UDP_STREAM -H fee0:cafe::1
> MIGRATED UDP STREAM TEST from ::0 (::) port 0 AF_INET6 to fee0:cafe::1 ()
port 0 AF_INET6 : histogram : demo
> Socket  Message  Elapsed      Messages
> Size    Size     Time         Okay Errors   Throughput
> bytes   bytes    secs            #      #   10^6bits/sec

> 212992   65507   10.00      182095      0    9542.41
> 212992           10.00           0              0.00



Right you are, for some reason I copied/pasted wrong results,
after _specifically_ filling up the frags to the memory limits,
when trying to reproduce 'bad numbers '

Here are the good ones, using latest David Miller net tree. ( plus
https://patchwork.ozlabs.org/patch/922528/  but that should not matter here)

llpaa23:/export/hda3/google/edumazet# ./netperf -H 2607:f8b0:8099:e18:: -t
UDP_STREAM
MIGRATED UDP STREAM TEST from ::0 (::) port 0 AF_INET6 to
2607:f8b0:8099:e18:: () port 0 AF_INET6
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

212992   65507   10.00      216236      0    11331.89
212992           10.00      215068           11270.68


There are few drops because of the too small
/proc/sys/net/core/rmem_default  ( 212992 as seen in netperf output) for
these kind of stress.
( each 64KB datagram actually consumes half the budget ...)

^ permalink raw reply

* Re: [PATCH bpf 2/2] bpf: enforce usage of __aligned_u64 in the UAPI header
From: Eugene Syromiatnikov @ 2018-05-30 10:03 UTC (permalink / raw)
  To: Song Liu
  Cc: netdev, open list, Martin KaFai Lau, Daniel Borkmann,
	Alexei Starovoitov, David S. Miller, Jiri Olsa, Ingo Molnar,
	Lawrence Brakmo, Andrey Ignatov, Jakub Kicinski, John Fastabend,
	Dmitry V. Levin
In-Reply-To: <CAPhsuW5fVamngrqEWcsPKyr3Njjz4K5vO3o51BuWXAMw_nf9KA@mail.gmail.com>

On Tue, May 29, 2018 at 10:35:09AM -0700, Song Liu wrote:
> I think these changes are not necessary. Is it a general guidance to
> only use 64-bit aligned
> variables in UAPI headers?

Not really, but it allows avoiding most alignment issues like the one
mentioned in the previous patch and in the referenced RDMA patch.

^ permalink raw reply

* Feature Request : iface may be allowed as datatype in all ipset
From: Akshat Kakkar @ 2018-05-30 10:03 UTC (permalink / raw)
  To: netdev

Is there a reason why iface is allowed to be paired only with net to
create an ipset?

I think with feature of skbinfo in every ipset, it should be allowed
to add iface in all ipset. As skbinfo can store tc classes, it might
make more sense if I can pin point on which outgoing interface this
class should be applied.

One direct way of doing could be adding iface in skbinfo itself, but I
dont think its a good suggestion.

So, other thing left is to have ipset storing interface too. Besides,
when I create a tc class, I create it on a known interface, so I know
beforehand on which interface this class is created. So I can easily
specify while adding entry in ipset.

^ permalink raw reply

* Re: [PATCH] netfilter: nfnetlink: Remove VLA usage
From: Pablo Neira Ayuso @ 2018-05-30  9:52 UTC (permalink / raw)
  To: Kees Cook
  Cc: Jozsef Kadlecsik, Florian Westphal, David S. Miller,
	netfilter-devel, coreteam, netdev, linux-kernel
In-Reply-To: <20180530003525.GA18642@beast>

On Tue, May 29, 2018 at 05:35:25PM -0700, Kees Cook wrote:
> In the quest to remove all stack VLA usage from the kernel[1], this
> allocates the maximum size expected for all possible attrs and adds
> a sanity-check to make sure nothing gets out of sync.
> 
> [1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com
> 
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---
>  net/netfilter/nfnetlink.c | 22 ++++++++++++++++++++--
>  1 file changed, 20 insertions(+), 2 deletions(-)
> 
> diff --git a/net/netfilter/nfnetlink.c b/net/netfilter/nfnetlink.c
> index 03ead8a9e90c..0cb395f9627e 100644
> --- a/net/netfilter/nfnetlink.c
> +++ b/net/netfilter/nfnetlink.c
> @@ -28,6 +28,7 @@
>  
>  #include <net/netlink.h>
>  #include <linux/netfilter/nfnetlink.h>
> +#include <linux/netfilter/nf_tables.h>
>  
>  MODULE_LICENSE("GPL");
>  MODULE_AUTHOR("Harald Welte <laforge@netfilter.org>");
> @@ -37,6 +38,11 @@ MODULE_ALIAS_NET_PF_PROTO(PF_NETLINK, NETLINK_NETFILTER);
>  	rcu_dereference_protected(table[(id)].subsys, \
>  				  lockdep_nfnl_is_held((id)))
>  
> +#define NFTA_MAX_ATTR	max(max(max(NFTA_CHAIN_MAX, NFTA_FLOWTABLE_MAX),\
> +				max(NFTA_OBJ_MAX, NFTA_RULE_MAX)),	\
> +			    max(NFTA_TABLE_MAX,				\
> +				max(NFTA_SET_ELEM_LIST_MAX, NFTA_SET_MAX)))

This is very specific of nftables, there are other nf subsystems using
nfnetlink that may go over this maximum attribute value (grep from
"struct nfnetlink_subsystem").

To remove the VLA, I think we need an artificial maximum attribute
that reasonably large enough.

^ permalink raw reply

* Re: [PATCH net-next 0/8] nfp: offload LAG for tc flower egress
From: John Hurley @ 2018-05-30  9:26 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Jakub Kicinski, David Miller, Linux Netdev List, oss-drivers,
	Jay Vosburgh, Veaceslav Falico, Andy Gospodarek
In-Reply-To: <20180529220947.GC2367@nanopsycho>

On Tue, May 29, 2018 at 11:09 PM, Jiri Pirko <jiri@resnulli.us> wrote:
> Tue, May 29, 2018 at 04:08:48PM CEST, john.hurley@netronome.com wrote:
>>On Sat, May 26, 2018 at 3:47 AM, Jakub Kicinski
>><jakub.kicinski@netronome.com> wrote:
>>> On Fri, 25 May 2018 08:48:09 +0200, Jiri Pirko wrote:
>>>> Thu, May 24, 2018 at 04:22:47AM CEST, jakub.kicinski@netronome.com wrote:
>>>> >Hi!
>>>> >
>>>> >This series from John adds bond offload to the nfp driver.  Patch 5
>>>> >exposes the hash type for NETDEV_LAG_TX_TYPE_HASH to make sure nfp
>>>> >hashing matches that of the software LAG.  This may be unnecessarily
>>>> >conservative, let's see what LAG maintainers think :)
>>>>
>>>> So you need to restrict offload to only certain hash algo? In mlxsw, we
>>>> just ignore the lag setting and do some hw default hashing. Would not be
>>>> enough? Note that there's a good reason for it, as you see, in team, the
>>>> hashing is done in a BPF function and could be totally arbitrary.
>>>> Your patchset effectively disables team offload for nfp.
>>>
>>> My understanding is that the project requirements only called for L3/L4
>>> hash algorithm offload, hence the temptation to err on the side of
>>> caution and not offload all the bond configurations.  John can provide
>>> more details.  Not being able to offload team is unfortunate indeed.
>>
>>Hi Jiri,
>>Yes, as Jakub mentions, we restrict ourselves to L3/L4 hash algorithm
>>as this is currently what is supported in fw.
>
> In mlxsw, a default l3/l4 is used always, no matter what the
> bonding/team sets. It is not correct, but it works with team as well.
> Perhaps we can have NETDEV_LAG_HASH_UNKNOWN to indicate to the driver to
> do some default? That would make the "team" offload functional.
>

yes, I would agree with that.
Thanks

>>Hopefully this will change as fw features are expanded.
>>I understand the issue this presents with offloading team.
>>Perhaps resorting to a default hw hash for team is acceptable.
>>John

^ permalink raw reply

* [PATCH] ath9k: debug: fix spelling mistake "WATHDOG" -> "WATCHDOG"
From: Colin King @ 2018-05-30  9:25 UTC (permalink / raw)
  To: QCA ath9k Development, Kalle Valo, David S . Miller,
	linux-wireless, netdev
  Cc: kernel-janitors, linux-kernel

From: Colin Ian King <colin.king@canonical.com>

Trivial fix to spelling mistake in PR_IS message text.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
 drivers/net/wireless/ath/ath9k/debug.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/ath/ath9k/debug.c b/drivers/net/wireless/ath/ath9k/debug.c
index f685843a2ff3..0a6eb8a8c1ed 100644
--- a/drivers/net/wireless/ath/ath9k/debug.c
+++ b/drivers/net/wireless/ath/ath9k/debug.c
@@ -538,7 +538,7 @@ static int read_file_interrupt(struct seq_file *file, void *data)
 	if (sc->sc_ah->caps.hw_caps & ATH9K_HW_CAP_EDMA) {
 		PR_IS("RXLP", rxlp);
 		PR_IS("RXHP", rxhp);
-		PR_IS("WATHDOG", bb_watchdog);
+		PR_IS("WATCHDOG", bb_watchdog);
 	} else {
 		PR_IS("RX", rxok);
 	}
-- 
2.17.0

^ permalink raw reply related

* Re: [PATCH v4 net-next 00/19] inet: frags: bring rhashtables to IP defrag
From: Jesper Dangaard Brouer @ 2018-05-30  9:20 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Alexander Aring, Tariq Toukan, David Miller, edumazet, netdev, fw,
	herbert, tgraf, alex.aring, stefan, ktkhai, Moshe Shemesh,
	Eran Ben Elisha, brouer, Rick Jones
In-Reply-To: <13bf3889-4426-b17a-d8d7-e843038a2a82@gmail.com>

On Mon, 28 May 2018 09:09:17 -0700
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> Tariq, here are my test results : No drops for me.
> 
> # ./netperf -H 2607:f8b0:8099:e18:: -t UDP_STREAM
> MIGRATED UDP STREAM TEST from ::0 (::) port 0 AF_INET6 to 2607:f8b0:8099:e18:: () port 0 AF_INET6
> Socket  Message  Elapsed      Messages                
> Size    Size     Time         Okay Errors   Throughput
> bytes   bytes    secs            #      #   10^6bits/sec
> 
> 212992   65507   10.00      202117      0    10592.00
> 212992           10.00           0              0.00

Hmm... Eric the above result show that ALL your UDP packets were dropped!
You have 0 okay messages and 0.00 Mbit/s throughput.

It needs to look like below (test on i40e NIC):

$ netperf -t UDP_STREAM -H fee0:cafe::1
MIGRATED UDP STREAM TEST from ::0 (::) port 0 AF_INET6 to fee0:cafe::1 () port 0 AF_INET6 : histogram : demo
Socket  Message  Elapsed      Messages                
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

212992   65507   10.00      186385      0    9767.08
212992           10.00      186385           9767.08


If I manually instruct ip6tables to drop all UDP packets, then I get
what you see... so, something on your test system are likely dropping
your UDP packets, but letting regular netperf (TCP) control
communication through.

# ip6tables -I INPUT -p udp -j DROP

$ netperf -t UDP_STREAM -H fee0:cafe::1
MIGRATED UDP STREAM TEST from ::0 (::) port 0 AF_INET6 to fee0:cafe::1 () port 0 AF_INET6 : histogram : demo
Socket  Message  Elapsed      Messages                
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

212992   65507   10.00      182095      0    9542.41
212992           10.00           0              0.00


-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* Re: [PATCH net] VSOCK: check sk state before receive
From: Stefan Hajnoczi @ 2018-05-30  9:17 UTC (permalink / raw)
  To: Hangbin Liu; +Cc: netdev, Jorgen Hansen, David S. Miller
In-Reply-To: <20180527152945.GQ8958@leo.usersys.redhat.com>

[-- Attachment #1: Type: text/plain, Size: 4964 bytes --]

On Sun, May 27, 2018 at 11:29:45PM +0800, Hangbin Liu wrote:
> Hmm...Although I won't reproduce this bug with my reproducer after
> apply my patch. I could still get a similiar issue with syzkaller sock vnet test.
> 
> It looks this patch is not complete. Here is the KASAN call trace with my patch.
> I can also reproduce it without my patch.

Seems like a race between vmci_datagram_destroy_handle() and the
delayed callback, vmci_transport_recv_dgram_cb().

I don't know the VMCI transport well so I'll leave this to Jorgen.

> ==================================================================
> BUG: KASAN: use-after-free in vmci_transport_allow_dgram.part.7+0x155/0x1a0 [vmw_vsock_vmci_transport]
> Read of size 4 at addr ffff880026a3a914 by task kworker/0:2/96
> 
> CPU: 0 PID: 96 Comm: kworker/0:2 Not tainted 4.17.0-rc6.vsock+ #28
> Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
> Workqueue: events dg_delayed_dispatch [vmw_vmci]
> Call Trace:
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0xdd/0x18e lib/dump_stack.c:113
>  print_address_description+0x7a/0x3e0 mm/kasan/report.c:256
>  kasan_report_error mm/kasan/report.c:354 [inline]
>  kasan_report+0x1dd/0x460 mm/kasan/report.c:412
>  vmci_transport_allow_dgram.part.7+0x155/0x1a0 [vmw_vsock_vmci_transport]
>  vmci_transport_recv_dgram_cb+0x5d/0x200 [vmw_vsock_vmci_transport]
>  dg_delayed_dispatch+0x99/0x1b0 [vmw_vmci]
>  process_one_work+0xa4e/0x1720 kernel/workqueue.c:2145
>  worker_thread+0x1df/0x1400 kernel/workqueue.c:2279
>  kthread+0x343/0x4b0 kernel/kthread.c:240
>  ret_from_fork+0x35/0x40 arch/x86/entry/entry_64.S:412
> 
> Allocated by task 2684:
>  set_track mm/kasan/kasan.c:460 [inline]
>  kasan_kmalloc+0xa0/0xd0 mm/kasan/kasan.c:553
>  slab_post_alloc_hook mm/slab.h:444 [inline]
>  slab_alloc_node mm/slub.c:2741 [inline]
>  slab_alloc mm/slub.c:2749 [inline]
>  kmem_cache_alloc+0x105/0x330 mm/slub.c:2754
>  sk_prot_alloc+0x6a/0x2c0 net/core/sock.c:1468
>  sk_alloc+0xc9/0xbb0 net/core/sock.c:1528
>  __vsock_create+0xc8/0x9b0 [vsock]
>  vsock_create+0xfd/0x1a0 [vsock]
>  __sock_create+0x310/0x690 net/socket.c:1285
>  sock_create net/socket.c:1325 [inline]
>  __sys_socket+0x101/0x240 net/socket.c:1355
>  __do_sys_socket net/socket.c:1364 [inline]
>  __se_sys_socket net/socket.c:1362 [inline]
>  __x64_sys_socket+0x7d/0xd0 net/socket.c:1362
>  do_syscall_64+0x175/0x630 arch/x86/entry/common.c:287
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> Freed by task 2684:
>  set_track mm/kasan/kasan.c:460 [inline]
>  __kasan_slab_free+0x130/0x180 mm/kasan/kasan.c:521
>  slab_free_hook mm/slub.c:1388 [inline]
>  slab_free_freelist_hook mm/slub.c:1415 [inline]
>  slab_free mm/slub.c:2988 [inline]
>  kmem_cache_free+0xce/0x410 mm/slub.c:3004
>  sk_prot_free net/core/sock.c:1509 [inline]
>  __sk_destruct+0x629/0x940 net/core/sock.c:1593
>  sk_destruct+0x4e/0x90 net/core/sock.c:1601
>  __sk_free+0xd3/0x320 net/core/sock.c:1612
>  sk_free+0x2a/0x30 net/core/sock.c:1623
>  __vsock_release+0x431/0x610 [vsock]
>  vsock_release+0x3c/0xc0 [vsock]
>  sock_release+0x91/0x200 net/socket.c:594
>  sock_close+0x17/0x20 net/socket.c:1149
>  __fput+0x368/0xa20 fs/file_table.c:209
>  task_work_run+0x1c5/0x2a0 kernel/task_work.c:113
>  exit_task_work include/linux/task_work.h:22 [inline]
>  do_exit+0x1876/0x26c0 kernel/exit.c:865
>  do_group_exit+0x159/0x3e0 kernel/exit.c:968
>  get_signal+0x65a/0x1780 kernel/signal.c:2482
>  do_signal+0xa4/0x1fe0 arch/x86/kernel/signal.c:810
>  exit_to_usermode_loop+0x1b8/0x260 arch/x86/entry/common.c:162
>  prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
>  syscall_return_slowpath arch/x86/entry/common.c:265 [inline]
>  do_syscall_64+0x505/0x630 arch/x86/entry/common.c:290
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> The buggy address belongs to the object at ffff880026a3a600
>  which belongs to the cache AF_VSOCK of size 1056
> The buggy address is located 788 bytes inside of
>  1056-byte region [ffff880026a3a600, ffff880026a3aa20)
> The buggy address belongs to the page:
> page:ffffea00009a8e00 count:1 mapcount:0 mapping:0000000000000000 index:0x0 compound_mapcount: 0
> flags: 0xfffffc0008100(slab|head)
> raw: 000fffffc0008100 0000000000000000 0000000000000000 00000001000d000d
> raw: dead000000000100 dead000000000200 ffff880034471a40 0000000000000000
> page dumped because: kasan: bad access detected
> 
> Memory state around the buggy address:
>  ffff880026a3a800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>  ffff880026a3a880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> >ffff880026a3a900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>                          ^
>  ffff880026a3a980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>  ffff880026a3aa00: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc
> ==================================================================

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox