Netdev List


* [PATCH net 0/2] ip[6] tunnels: fix mtu calculations
From: Nicolas Dichtel @ 2018-05-30  8:28 UTC (permalink / raw)
  To: davem, petrm, idosch; +Cc: netdev

The first patch restores the possibility to bind an ip4 tunnel to an
interface with a large mtu.
The second patch was spotted after the first fix. I also target it to net
because it fixes the max mtu value that can be used for ipv6 tunnels.

 net/ipv4/ip_tunnel.c  |  6 +++---
 net/ipv6/ip6_tunnel.c | 11 ++++++++---
 net/ipv6/sit.c        |  5 +++--
 3 files changed, 14 insertions(+), 8 deletions(-)

Comments are welcome,
Regards,
Nicolas

^ permalink raw reply

* [PATCH net 1/2] ip_tunnel: restore binding to ifaces with a large mtu
From: Nicolas Dichtel @ 2018-05-30  8:28 UTC (permalink / raw)
  To: davem, petrm, idosch; +Cc: netdev, Nicolas Dichtel
In-Reply-To: <20180530082843.6076-1-nicolas.dichtel@6wind.com>

After commit f6cc9c054e77, the following conf is broken (note that the
default loopback mtu is 65536, ie IP_MAX_MTU + 1):

$ ip tunnel add gre1 mode gre local 10.125.0.1 remote 10.125.0.2 dev lo
add tunnel "gre0" failed: Invalid argument
$ ip l a type dummy
$ ip l s dummy1 up
$ ip l s dummy1 mtu 65535
$ ip tunnel add gre1 mode gre local 10.125.0.1 remote 10.125.0.2 dev dummy1
add tunnel "gre0" failed: Invalid argument

dev_set_mtu() refuses an mtu that is too large.
First, let's cap the mtu returned by ip_tunnel_bind_dev(). Second, remove
the magic value 0xFFF8 and use IP_MAX_MTU instead.
0xFFF8 seems to have been there for ages; I don't know why this value was used.

With a recent kernel, it's also possible to set a mtu > IP_MAX_MTU:
$ ip l s dummy1 mtu 66000
After that patch, it's also possible to bind an ip tunnel on that kind of
interface.
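
The capping described above can be sketched in plain userspace C. This is
only an illustration, not the kernel code: cap_bound_mtu() and
tunnel_max_mtu() are hypothetical helper names, and IP_MAX_MTU is assumed to
be 0xFFFF, which matches the remark that the loopback mtu of 65536 is
IP_MAX_MTU + 1.

```c
#include <assert.h>

/* Assumption: mirrors the kernel constant, i.e. 65535. */
#define IP_MAX_MTU 0xFFFFU

/* Hypothetical helper: mtu inherited from the underlying device, capped
 * so that a later mtu change on the tunnel is not rejected. */
static inline unsigned int cap_bound_mtu(unsigned int tdev_mtu)
{
	return tdev_mtu < IP_MAX_MTU ? tdev_mtu : IP_MAX_MTU;
}

/* Hypothetical helper: upper bound for the tunnel mtu once the link-layer
 * header and the tunnel headers (t_hlen) are subtracted. */
static inline unsigned int tunnel_max_mtu(unsigned int hard_header_len,
					  unsigned int t_hlen)
{
	assert(hard_header_len + t_hlen <= IP_MAX_MTU);
	return IP_MAX_MTU - hard_header_len - t_hlen;
}
```

With these assumptions, an underlying mtu of 65536 (loopback) or 66000 is
capped to 65535, while an ordinary 1500 is left alone.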

CC: Petr Machata <petrm@mellanox.com>
CC: Ido Schimmel <idosch@mellanox.com>
Link: https://git.kernel.org/pub/scm/linux/kernel/git/davem/netdev-vger-cvs.git/commit/?id=e5afd356a411a
Fixes: f6cc9c054e77 ("ip_tunnel: Emit events for post-register MTU changes")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 net/ipv4/ip_tunnel.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index 6b0e362cc99b..3b39c72a1029 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -328,7 +328,7 @@ static int ip_tunnel_bind_dev(struct net_device *dev)
 
 	if (tdev) {
 		hlen = tdev->hard_header_len + tdev->needed_headroom;
-		mtu = tdev->mtu;
+		mtu = min(tdev->mtu, IP_MAX_MTU);
 	}
 
 	dev->needed_headroom = t_hlen + hlen;
@@ -362,7 +362,7 @@ static struct ip_tunnel *ip_tunnel_create(struct net *net,
 	nt = netdev_priv(dev);
 	t_hlen = nt->hlen + sizeof(struct iphdr);
 	dev->min_mtu = ETH_MIN_MTU;
-	dev->max_mtu = 0xFFF8 - dev->hard_header_len - t_hlen;
+	dev->max_mtu = IP_MAX_MTU - dev->hard_header_len - t_hlen;
 	ip_tunnel_add(itn, nt);
 	return nt;
 
@@ -930,7 +930,7 @@ int __ip_tunnel_change_mtu(struct net_device *dev, int new_mtu, bool strict)
 {
 	struct ip_tunnel *tunnel = netdev_priv(dev);
 	int t_hlen = tunnel->hlen + sizeof(struct iphdr);
-	int max_mtu = 0xFFF8 - dev->hard_header_len - t_hlen;
+	int max_mtu = IP_MAX_MTU - dev->hard_header_len - t_hlen;
 
 	if (new_mtu < ETH_MIN_MTU)
 		return -EINVAL;
-- 
2.15.1

^ permalink raw reply related

* [PATCH net 2/2] ip6_tunnel: remove magic mtu value 0xFFF8
From: Nicolas Dichtel @ 2018-05-30  8:28 UTC (permalink / raw)
  To: davem, petrm, idosch; +Cc: netdev, Nicolas Dichtel
In-Reply-To: <20180530082843.6076-1-nicolas.dichtel@6wind.com>

I don't know where this value comes from (probably a copy and paste and
paste and paste ...).
Let's use standard values which are a bit greater.
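
As a rough userspace sketch of the new upper bound in ip6_tnl_change_mtu():
the function name ip6_tnl_mtu_too_large() is hypothetical, and the constant
definitions are assumptions made for illustration (IP6_MAX_MTU taken as
0xFFFF plus the 40-byte IPv6 header, IP_MAX_MTU as 0xFFFF).

```c
#include <assert.h>
#include <stdbool.h>

#define IP_MAX_MTU	0xFFFFU			/* assumed: 65535 */
#define IPV6_HDR_LEN	40U			/* fixed IPv6 header size */
#define IP6_MAX_MTU	(0xFFFFU + IPV6_HDR_LEN)	/* assumed: 65575 */

/* Protocol numbers: 0 means "any" for the tunnel, 41 is IPv6-in-IPv6. */
enum { PROTO_ANY = 0, PROTO_IPV6 = 41 };

/* Hypothetical helper: true when new_mtu exceeds the limit for this
 * tunnel protocol, false when it is acceptable. */
static inline bool ip6_tnl_mtu_too_large(int proto,
					 unsigned int hard_header_len,
					 unsigned int new_mtu)
{
	unsigned int limit = (proto == PROTO_IPV6 || proto == PROTO_ANY) ?
			     IP6_MAX_MTU : IP_MAX_MTU;

	assert(hard_header_len <= limit);
	return new_mtu > limit - hard_header_len;
}
```

Both limits are larger than the old 0xFFF8 (65528), which is the "a bit
greater" part of the description above.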

Link: https://git.kernel.org/pub/scm/linux/kernel/git/davem/netdev-vger-cvs.git/commit/?id=e5afd356a411a
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 net/ipv6/ip6_tunnel.c | 11 ++++++++---
 net/ipv6/sit.c        |  5 +++--
 2 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index da66aaac51ce..00e138a44cbb 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -1692,8 +1692,13 @@ int ip6_tnl_change_mtu(struct net_device *dev, int new_mtu)
 		if (new_mtu < ETH_MIN_MTU)
 			return -EINVAL;
 	}
-	if (new_mtu > 0xFFF8 - dev->hard_header_len)
-		return -EINVAL;
+	if (tnl->parms.proto == IPPROTO_IPV6 || tnl->parms.proto == 0) {
+		if (new_mtu > IP6_MAX_MTU - dev->hard_header_len)
+			return -EINVAL;
+	} else {
+		if (new_mtu > IP_MAX_MTU - dev->hard_header_len)
+			return -EINVAL;
+	}
 	dev->mtu = new_mtu;
 	return 0;
 }
@@ -1841,7 +1846,7 @@ ip6_tnl_dev_init_gen(struct net_device *dev)
 	if (!(t->parms.flags & IP6_TNL_F_IGN_ENCAP_LIMIT))
 		dev->mtu -= 8;
 	dev->min_mtu = ETH_MIN_MTU;
-	dev->max_mtu = 0xFFF8 - dev->hard_header_len;
+	dev->max_mtu = IP6_MAX_MTU - dev->hard_header_len;
 
 	return 0;
 
diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index 2afce37a7177..e9400ffa7875 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -1371,7 +1371,7 @@ static void ipip6_tunnel_setup(struct net_device *dev)
 	dev->hard_header_len	= LL_MAX_HEADER + t_hlen;
 	dev->mtu		= ETH_DATA_LEN - t_hlen;
 	dev->min_mtu		= IPV6_MIN_MTU;
-	dev->max_mtu		= 0xFFF8 - t_hlen;
+	dev->max_mtu		= IP6_MAX_MTU - t_hlen;
 	dev->flags		= IFF_NOARP;
 	netif_keep_dst(dev);
 	dev->addr_len		= 4;
@@ -1583,7 +1583,8 @@ static int ipip6_newlink(struct net *src_net, struct net_device *dev,
 	if (tb[IFLA_MTU]) {
 		u32 mtu = nla_get_u32(tb[IFLA_MTU]);
 
-		if (mtu >= IPV6_MIN_MTU && mtu <= 0xFFF8 - dev->hard_header_len)
+		if (mtu >= IPV6_MIN_MTU &&
+		    mtu <= IP6_MAX_MTU - dev->hard_header_len)
 			dev->mtu = mtu;
 	}
 
-- 
2.15.1

^ permalink raw reply related

* Re: [PATCH net] net/sonic: Use dma_mapping_error()
From: Tom Bogendoerfer @ 2018-05-30  9:01 UTC (permalink / raw)
  To: Finn Thain; +Cc: David S. Miller, netdev, linux-kernel
In-Reply-To: <cba8175deaf9d631ae000088aea1ccf1c444909b.1527649393.git.fthain@telegraphics.com.au>

On Wed, May 30, 2018 at 01:03:51PM +1000, Finn Thain wrote:
> With CONFIG_DMA_API_DEBUG=y, calling sonic_open() produces the
> message, "DMA-API: device driver failed to check map error".
> Add the missing dma_mapping_error() call.
> 
> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
> Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
> ---
>  drivers/net/ethernet/natsemi/sonic.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/natsemi/sonic.c b/drivers/net/ethernet/natsemi/sonic.c
> index 7ed08486ae23..c805dcbebd02 100644
> --- a/drivers/net/ethernet/natsemi/sonic.c
> +++ b/drivers/net/ethernet/natsemi/sonic.c
> @@ -84,7 +84,7 @@ static int sonic_open(struct net_device *dev)
>  	for (i = 0; i < SONIC_NUM_RRS; i++) {
>  		dma_addr_t laddr = dma_map_single(lp->device, skb_put(lp->rx_skb[i], SONIC_RBSIZE),
>  		                                  SONIC_RBSIZE, DMA_FROM_DEVICE);
> -		if (!laddr) {
> +		if (dma_mapping_error(lp->device, laddr)) {
>  			while(i > 0) { /* free any that were mapped successfully */
>  				i--;
>  				dma_unmap_single(lp->device, lp->rx_laddr[i], SONIC_RBSIZE, DMA_FROM_DEVICE);
> -- 
> 2.16.1

looks good

Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>

Thomas.

-- 
Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
good idea.                                                [ RFC1925, 2.3 ]

^ permalink raw reply

* [PATCH V2] brcmfmac: stop watchdog before detach and free everything
From: Michael Trimarchi @ 2018-05-30  9:06 UTC (permalink / raw)
  To: Arend van Spriel
  Cc: Franky Lin, Hante Meuleman, Chi-Hsien Lin, Wright Feng,
	Kalle Valo, David S. Miller, Pieter-Paul Giesberts, Ian Molton,
	linux-wireless, brcm80211-dev-list.pdl, brcm80211-dev-list,
	netdev, linux-kernel
In-Reply-To: <5B0D1C9E.4000800@broadcom.com>

Building the driver into the kernel image without the firmware available
in the filesystem or in the kernel image can lead to a kernel NULL pointer
dereference. The watchdog needs to be stopped in brcmf_sdio_remove().

The system is going down NOW!
[ 1348.110759] Unable to handle kernel NULL pointer dereference at virtual address 000002f8
Sent SIGTERM to all processes
[ 1348.121412] Mem abort info:
[ 1348.126962]   ESR = 0x96000004
[ 1348.130023]   Exception class = DABT (current EL), IL = 32 bits
[ 1348.135948]   SET = 0, FnV = 0
[ 1348.138997]   EA = 0, S1PTW = 0
[ 1348.142154] Data abort info:
[ 1348.145045]   ISV = 0, ISS = 0x00000004
[ 1348.148884]   CM = 0, WnR = 0
[ 1348.151861] user pgtable: 4k pages, 48-bit VAs, pgdp = (____ptrval____)
[ 1348.158475] [00000000000002f8] pgd=0000000000000000
[ 1348.163364] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[ 1348.168927] Modules linked in: ipv6
[ 1348.172421] CPU: 3 PID: 1421 Comm: brcmf_wdog/mmc0 Not tainted 4.17.0-rc5-next-20180517 #18
[ 1348.180757] Hardware name: Amarula A64-Relic (DT)
[ 1348.185455] pstate: 60000005 (nZCv daif -PAN -UAO)
[ 1348.190251] pc : brcmf_sdiod_freezer_count+0x0/0x20
[ 1348.195124] lr : brcmf_sdio_watchdog_thread+0x64/0x290
[ 1348.200253] sp : ffff00000b85be30
[ 1348.203561] x29: ffff00000b85be30 x28: 0000000000000000
[ 1348.208868] x27: ffff00000b6cb918 x26: ffff80003b990638
[ 1348.214176] x25: ffff0000087b1a20 x24: ffff80003b94f800
[ 1348.219483] x23: ffff000008e620c8 x22: ffff000008f0b660
[ 1348.224790] x21: ffff000008c6a858 x20: 00000000fffffe00
[ 1348.230097] x19: ffff80003b94f800 x18: 0000000000000001
[ 1348.235404] x17: 0000ffffab2e8a74 x16: ffff0000080d7de8
[ 1348.240711] x15: 0000000000000000 x14: 0000000000000400
[ 1348.246018] x13: 0000000000000400 x12: 0000000000000001
[ 1348.251324] x11: 00000000000002c4 x10: 0000000000000a10
[ 1348.256631] x9 : ffff00000b85bc40 x8 : ffff80003be11870
[ 1348.261937] x7 : ffff80003dfc7308 x6 : 000000078ff08b55
[ 1348.267243] x5 : 00000139e1058400 x4 : 0000000000000000
[ 1348.272550] x3 : dead000000000100 x2 : 958f2788d6618100
[ 1348.277856] x1 : 00000000fffffe00 x0 : 0000000000000000

Signed-off-by: Michael Trimarchi <michael@amarulasolutions.com>
---
 drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
index 412a05b..061f69d 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
@@ -4294,6 +4294,13 @@ void brcmf_sdio_remove(struct brcmf_sdio *bus)
 	brcmf_dbg(TRACE, "Enter\n");
 
 	if (bus) {
+		/* Stop watchdog task */
+		if (bus->watchdog_tsk) {
+			send_sig(SIGTERM, bus->watchdog_tsk, 1);
+			kthread_stop(bus->watchdog_tsk);
+			bus->watchdog_tsk = NULL;
+		}
+
 		/* De-register interrupt handler */
 		brcmf_sdiod_intr_unregister(bus->sdiodev);
 
-- 
2.7.4

^ permalink raw reply related

* Re: [PATCH net] VSOCK: check sk state before receive
From: Stefan Hajnoczi @ 2018-05-30  9:17 UTC (permalink / raw)
  To: Hangbin Liu; +Cc: netdev, Jorgen Hansen, David S. Miller
In-Reply-To: <20180527152945.GQ8958@leo.usersys.redhat.com>

On Sun, May 27, 2018 at 11:29:45PM +0800, Hangbin Liu wrote:
> Hmm... Although I can no longer reproduce this bug with my reproducer after
> applying my patch, I can still hit a similar issue with the syzkaller sock vnet test.
> 
> It looks like this patch is not complete. Here is the KASAN call trace with my patch.
> I can also reproduce it without my patch.

Seems like a race between vmci_datagram_destroy_handle() and the
delayed callback, vmci_transport_recv_dgram_cb().

I don't know the VMCI transport well so I'll leave this to Jorgen.

> ==================================================================
> BUG: KASAN: use-after-free in vmci_transport_allow_dgram.part.7+0x155/0x1a0 [vmw_vsock_vmci_transport]
> Read of size 4 at addr ffff880026a3a914 by task kworker/0:2/96
> 
> CPU: 0 PID: 96 Comm: kworker/0:2 Not tainted 4.17.0-rc6.vsock+ #28
> Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
> Workqueue: events dg_delayed_dispatch [vmw_vmci]
> Call Trace:
>  __dump_stack lib/dump_stack.c:77 [inline]
>  dump_stack+0xdd/0x18e lib/dump_stack.c:113
>  print_address_description+0x7a/0x3e0 mm/kasan/report.c:256
>  kasan_report_error mm/kasan/report.c:354 [inline]
>  kasan_report+0x1dd/0x460 mm/kasan/report.c:412
>  vmci_transport_allow_dgram.part.7+0x155/0x1a0 [vmw_vsock_vmci_transport]
>  vmci_transport_recv_dgram_cb+0x5d/0x200 [vmw_vsock_vmci_transport]
>  dg_delayed_dispatch+0x99/0x1b0 [vmw_vmci]
>  process_one_work+0xa4e/0x1720 kernel/workqueue.c:2145
>  worker_thread+0x1df/0x1400 kernel/workqueue.c:2279
>  kthread+0x343/0x4b0 kernel/kthread.c:240
>  ret_from_fork+0x35/0x40 arch/x86/entry/entry_64.S:412
> 
> Allocated by task 2684:
>  set_track mm/kasan/kasan.c:460 [inline]
>  kasan_kmalloc+0xa0/0xd0 mm/kasan/kasan.c:553
>  slab_post_alloc_hook mm/slab.h:444 [inline]
>  slab_alloc_node mm/slub.c:2741 [inline]
>  slab_alloc mm/slub.c:2749 [inline]
>  kmem_cache_alloc+0x105/0x330 mm/slub.c:2754
>  sk_prot_alloc+0x6a/0x2c0 net/core/sock.c:1468
>  sk_alloc+0xc9/0xbb0 net/core/sock.c:1528
>  __vsock_create+0xc8/0x9b0 [vsock]
>  vsock_create+0xfd/0x1a0 [vsock]
>  __sock_create+0x310/0x690 net/socket.c:1285
>  sock_create net/socket.c:1325 [inline]
>  __sys_socket+0x101/0x240 net/socket.c:1355
>  __do_sys_socket net/socket.c:1364 [inline]
>  __se_sys_socket net/socket.c:1362 [inline]
>  __x64_sys_socket+0x7d/0xd0 net/socket.c:1362
>  do_syscall_64+0x175/0x630 arch/x86/entry/common.c:287
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> Freed by task 2684:
>  set_track mm/kasan/kasan.c:460 [inline]
>  __kasan_slab_free+0x130/0x180 mm/kasan/kasan.c:521
>  slab_free_hook mm/slub.c:1388 [inline]
>  slab_free_freelist_hook mm/slub.c:1415 [inline]
>  slab_free mm/slub.c:2988 [inline]
>  kmem_cache_free+0xce/0x410 mm/slub.c:3004
>  sk_prot_free net/core/sock.c:1509 [inline]
>  __sk_destruct+0x629/0x940 net/core/sock.c:1593
>  sk_destruct+0x4e/0x90 net/core/sock.c:1601
>  __sk_free+0xd3/0x320 net/core/sock.c:1612
>  sk_free+0x2a/0x30 net/core/sock.c:1623
>  __vsock_release+0x431/0x610 [vsock]
>  vsock_release+0x3c/0xc0 [vsock]
>  sock_release+0x91/0x200 net/socket.c:594
>  sock_close+0x17/0x20 net/socket.c:1149
>  __fput+0x368/0xa20 fs/file_table.c:209
>  task_work_run+0x1c5/0x2a0 kernel/task_work.c:113
>  exit_task_work include/linux/task_work.h:22 [inline]
>  do_exit+0x1876/0x26c0 kernel/exit.c:865
>  do_group_exit+0x159/0x3e0 kernel/exit.c:968
>  get_signal+0x65a/0x1780 kernel/signal.c:2482
>  do_signal+0xa4/0x1fe0 arch/x86/kernel/signal.c:810
>  exit_to_usermode_loop+0x1b8/0x260 arch/x86/entry/common.c:162
>  prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
>  syscall_return_slowpath arch/x86/entry/common.c:265 [inline]
>  do_syscall_64+0x505/0x630 arch/x86/entry/common.c:290
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> The buggy address belongs to the object at ffff880026a3a600
>  which belongs to the cache AF_VSOCK of size 1056
> The buggy address is located 788 bytes inside of
>  1056-byte region [ffff880026a3a600, ffff880026a3aa20)
> The buggy address belongs to the page:
> page:ffffea00009a8e00 count:1 mapcount:0 mapping:0000000000000000 index:0x0 compound_mapcount: 0
> flags: 0xfffffc0008100(slab|head)
> raw: 000fffffc0008100 0000000000000000 0000000000000000 00000001000d000d
> raw: dead000000000100 dead000000000200 ffff880034471a40 0000000000000000
> page dumped because: kasan: bad access detected
> 
> Memory state around the buggy address:
>  ffff880026a3a800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>  ffff880026a3a880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> >ffff880026a3a900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>                          ^
>  ffff880026a3a980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>  ffff880026a3aa00: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc
> ==================================================================

^ permalink raw reply

* Re: [PATCH v4 net-next 00/19] inet: frags: bring rhashtables to IP defrag
From: Jesper Dangaard Brouer @ 2018-05-30  9:20 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Alexander Aring, Tariq Toukan, David Miller, edumazet, netdev, fw,
	herbert, tgraf, alex.aring, stefan, ktkhai, Moshe Shemesh,
	Eran Ben Elisha, brouer, Rick Jones
In-Reply-To: <13bf3889-4426-b17a-d8d7-e843038a2a82@gmail.com>

On Mon, 28 May 2018 09:09:17 -0700
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> Tariq, here are my test results : No drops for me.
> 
> # ./netperf -H 2607:f8b0:8099:e18:: -t UDP_STREAM
> MIGRATED UDP STREAM TEST from ::0 (::) port 0 AF_INET6 to 2607:f8b0:8099:e18:: () port 0 AF_INET6
> Socket  Message  Elapsed      Messages                
> Size    Size     Time         Okay Errors   Throughput
> bytes   bytes    secs            #      #   10^6bits/sec
> 
> 212992   65507   10.00      202117      0    10592.00
> 212992           10.00           0              0.00

Hmm... Eric, the above result shows that ALL your UDP packets were dropped!
You have 0 okay messages and 0.00 Mbit/s throughput.

It needs to look like below (test on i40e NIC):

$ netperf -t UDP_STREAM -H fee0:cafe::1
MIGRATED UDP STREAM TEST from ::0 (::) port 0 AF_INET6 to fee0:cafe::1 () port 0 AF_INET6 : histogram : demo
Socket  Message  Elapsed      Messages                
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

212992   65507   10.00      186385      0    9767.08
212992           10.00      186385           9767.08


If I manually instruct ip6tables to drop all UDP packets, then I get
what you see... so, something on your test system is likely dropping
your UDP packets, but letting regular netperf (TCP) control
communication through.

# ip6tables -I INPUT -p udp -j DROP

$ netperf -t UDP_STREAM -H fee0:cafe::1
MIGRATED UDP STREAM TEST from ::0 (::) port 0 AF_INET6 to fee0:cafe::1 () port 0 AF_INET6 : histogram : demo
Socket  Message  Elapsed      Messages                
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

212992   65507   10.00      182095      0    9542.41
212992           10.00           0              0.00


-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* [PATCH] ath9k: debug: fix spelling mistake "WATHDOG" -> "WATCHDOG"
From: Colin King @ 2018-05-30  9:25 UTC (permalink / raw)
  To: QCA ath9k Development, Kalle Valo, David S . Miller,
	linux-wireless, netdev
  Cc: kernel-janitors, linux-kernel

From: Colin Ian King <colin.king@canonical.com>

Trivial fix to spelling mistake in PR_IS message text.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
 drivers/net/wireless/ath/ath9k/debug.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/ath/ath9k/debug.c b/drivers/net/wireless/ath/ath9k/debug.c
index f685843a2ff3..0a6eb8a8c1ed 100644
--- a/drivers/net/wireless/ath/ath9k/debug.c
+++ b/drivers/net/wireless/ath/ath9k/debug.c
@@ -538,7 +538,7 @@ static int read_file_interrupt(struct seq_file *file, void *data)
 	if (sc->sc_ah->caps.hw_caps & ATH9K_HW_CAP_EDMA) {
 		PR_IS("RXLP", rxlp);
 		PR_IS("RXHP", rxhp);
-		PR_IS("WATHDOG", bb_watchdog);
+		PR_IS("WATCHDOG", bb_watchdog);
 	} else {
 		PR_IS("RX", rxok);
 	}
-- 
2.17.0

^ permalink raw reply related

* Re: [PATCH net-next 0/8] nfp: offload LAG for tc flower egress
From: John Hurley @ 2018-05-30  9:26 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Jakub Kicinski, David Miller, Linux Netdev List, oss-drivers,
	Jay Vosburgh, Veaceslav Falico, Andy Gospodarek
In-Reply-To: <20180529220947.GC2367@nanopsycho>

On Tue, May 29, 2018 at 11:09 PM, Jiri Pirko <jiri@resnulli.us> wrote:
> Tue, May 29, 2018 at 04:08:48PM CEST, john.hurley@netronome.com wrote:
>>On Sat, May 26, 2018 at 3:47 AM, Jakub Kicinski
>><jakub.kicinski@netronome.com> wrote:
>>> On Fri, 25 May 2018 08:48:09 +0200, Jiri Pirko wrote:
>>>> Thu, May 24, 2018 at 04:22:47AM CEST, jakub.kicinski@netronome.com wrote:
>>>> >Hi!
>>>> >
>>>> >This series from John adds bond offload to the nfp driver.  Patch 5
>>>> >exposes the hash type for NETDEV_LAG_TX_TYPE_HASH to make sure nfp
>>>> >hashing matches that of the software LAG.  This may be unnecessarily
>>>> >conservative, let's see what LAG maintainers think :)
>>>>
>>>> So you need to restrict offload to only certain hash algo? In mlxsw, we
>>>> just ignore the lag setting and do some hw default hashing. Would not be
>>>> enough? Note that there's a good reason for it, as you see, in team, the
>>>> hashing is done in a BPF function and could be totally arbitrary.
>>>> Your patchset effectively disables team offload for nfp.
>>>
>>> My understanding is that the project requirements only called for L3/L4
>>> hash algorithm offload, hence the temptation to err on the side of
>>> caution and not offload all the bond configurations.  John can provide
>>> more details.  Not being able to offload team is unfortunate indeed.
>>
>>Hi Jiri,
>>Yes, as Jakub mentions, we restrict ourselves to L3/L4 hash algorithm
>>as this is currently what is supported in fw.
>
> In mlxsw, a default l3/l4 is used always, no matter what the
> bonding/team sets. It is not correct, but it works with team as well.
> Perhaps we can have NETDEV_LAG_HASH_UNKNOWN to indicate to the driver to
> do some default? That would make the "team" offload functional.
>

yes, I would agree with that.
Thanks

>>Hopefully this will change as fw features are expanded.
>>I understand the issue this presents with offloading team.
>>Perhaps resorting to a default hw hash for team is acceptable.
>>John

^ permalink raw reply

* Re: [PATCH] netfilter: nfnetlink: Remove VLA usage
From: Pablo Neira Ayuso @ 2018-05-30  9:52 UTC (permalink / raw)
  To: Kees Cook
  Cc: Jozsef Kadlecsik, Florian Westphal, David S. Miller,
	netfilter-devel, coreteam, netdev, linux-kernel
In-Reply-To: <20180530003525.GA18642@beast>

On Tue, May 29, 2018 at 05:35:25PM -0700, Kees Cook wrote:
> In the quest to remove all stack VLA usage from the kernel[1], this
> allocates the maximum size expected for all possible attrs and adds
> a sanity-check to make sure nothing gets out of sync.
> 
> [1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com
> 
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---
>  net/netfilter/nfnetlink.c | 22 ++++++++++++++++++++--
>  1 file changed, 20 insertions(+), 2 deletions(-)
> 
> diff --git a/net/netfilter/nfnetlink.c b/net/netfilter/nfnetlink.c
> index 03ead8a9e90c..0cb395f9627e 100644
> --- a/net/netfilter/nfnetlink.c
> +++ b/net/netfilter/nfnetlink.c
> @@ -28,6 +28,7 @@
>  
>  #include <net/netlink.h>
>  #include <linux/netfilter/nfnetlink.h>
> +#include <linux/netfilter/nf_tables.h>
>  
>  MODULE_LICENSE("GPL");
>  MODULE_AUTHOR("Harald Welte <laforge@netfilter.org>");
> @@ -37,6 +38,11 @@ MODULE_ALIAS_NET_PF_PROTO(PF_NETLINK, NETLINK_NETFILTER);
>  	rcu_dereference_protected(table[(id)].subsys, \
>  				  lockdep_nfnl_is_held((id)))
>  
> +#define NFTA_MAX_ATTR	max(max(max(NFTA_CHAIN_MAX, NFTA_FLOWTABLE_MAX),\
> +				max(NFTA_OBJ_MAX, NFTA_RULE_MAX)),	\
> +			    max(NFTA_TABLE_MAX,				\
> +				max(NFTA_SET_ELEM_LIST_MAX, NFTA_SET_MAX)))

This is very specific to nftables; there are other nf subsystems using
nfnetlink that may go over this maximum attribute value (grep for
"struct nfnetlink_subsystem").

To remove the VLA, I think we need an artificial maximum attribute
value that is reasonably large.
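
The shape of such an artificial maximum can be sketched in userspace C.
Everything named here is hypothetical and not taken from the kernel tree:
NFNL_MAX_ATTR_COUNT, its value of 32, and count_known_attrs() only show a
fixed-size table plus a sanity check standing in for the VLA.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical artificial maximum, shared by all subsystems. */
#define NFNL_MAX_ATTR_COUNT 32

/* Hypothetical helper: counts the distinct known attribute types in
 * types[0..n-1]. A fixed-size table replaces a VLA sized by max_attr. */
static inline int count_known_attrs(unsigned int max_attr,
				    const unsigned int *types, size_t n)
{
	unsigned char seen[NFNL_MAX_ATTR_COUNT + 1];
	int count = 0;
	size_t i;

	assert(types != NULL || n == 0);

	/* Sanity check: refuse a subsystem that would overflow the table. */
	if (max_attr > NFNL_MAX_ATTR_COUNT)
		return -EINVAL;

	memset(seen, 0, sizeof(seen));
	for (i = 0; i < n; i++) {
		if (types[i] > max_attr)
			continue;	/* unknown attribute type: skip it */
		if (!seen[types[i]]) {
			seen[types[i]] = 1;
			count++;
		}
	}
	return count;
}
```

The trade-off is a slightly larger fixed stack footprint in exchange for a
bound that does not depend on any single subsystem's attribute list.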

^ permalink raw reply

* Feature Request : iface may be allowed as datatype in all ipset
From: Akshat Kakkar @ 2018-05-30 10:03 UTC (permalink / raw)
  To: netdev

Is there a reason why iface may only be paired with net to create an
ipset?

With skbinfo available in every ipset, I think it should be possible to
add iface to all ipset types. Since skbinfo can store tc classes, it
would make more sense if I could pinpoint the outgoing interface on
which a class should be applied.

One direct way of doing this would be to add iface to skbinfo itself,
but I don't think that is a good approach.

So the remaining option is to have the ipset store the interface too.
Besides, when I create a tc class, I create it on a known interface, so
I know beforehand on which interface the class exists and can easily
specify it when adding an entry to the ipset.

^ permalink raw reply

* Re: [PATCH bpf 2/2] bpf: enforce usage of __aligned_u64 in the UAPI header
From: Eugene Syromiatnikov @ 2018-05-30 10:03 UTC (permalink / raw)
  To: Song Liu
  Cc: netdev, open list, Martin KaFai Lau, Daniel Borkmann,
	Alexei Starovoitov, David S. Miller, Jiri Olsa, Ingo Molnar,
	Lawrence Brakmo, Andrey Ignatov, Jakub Kicinski, John Fastabend,
	Dmitry V. Levin
In-Reply-To: <CAPhsuW5fVamngrqEWcsPKyr3Njjz4K5vO3o51BuWXAMw_nf9KA@mail.gmail.com>

On Tue, May 29, 2018 at 10:35:09AM -0700, Song Liu wrote:
> I think these changes are not necessary. Is it a general guidance to
> only use 64-bit aligned
> variables in UAPI headers?

Not really, but it allows avoiding most alignment issues like the one
mentioned in the previous patch and in the referenced RDMA patch.
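
A small userspace sketch of the kind of alignment issue meant here. The
typedef aligned_u64 stands in for the UAPI __aligned_u64, and struct
example_attr is a hypothetical layout, not one from the bpf header. It
relies on the GCC/Clang aligned attribute. Without that attribute, a 64-bit
member is only 4-byte aligned on some 32-bit ABIs (i386 for instance), so
the same struct can have different layouts for a 32-bit process and a 64-bit
kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for __aligned_u64: always 8-byte aligned. */
typedef uint64_t aligned_u64 __attribute__((aligned(8)));

/* Hypothetical UAPI-style struct. */
struct example_attr {
	uint32_t flags;
	aligned_u64 ptr;	/* forced to offset 8 on every ABI */
};

/* Checks that the layout is the one a 64-bit kernel expects. */
static inline void check_example_attr_layout(void)
{
	assert(offsetof(struct example_attr, ptr) == 8);
	assert(sizeof(struct example_attr) == 16);
}
```

With a plain uint64_t member instead, the offset of ptr would be 4 and the
size 12 on such a 32-bit ABI, which is the mismatch the type avoids.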

^ permalink raw reply

* Re: [PATCH v4 net-next 00/19] inet: frags: bring rhashtables to IP defrag
From: Eric Dumazet @ 2018-05-30 10:36 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Eric Dumazet, aring, Tariq Toukan, David Miller, netdev,
	Florian Westphal, Herbert Xu, Thomas Graf, Alexander Aring,
	Stefan Schmidt, Kirill Tkhai, moshe, Eran Ben Elisha, Rick Jones
In-Reply-To: <20180530112022.2b793051@redhat.com>

On Wed, May 30, 2018 at 5:20 AM Jesper Dangaard Brouer <brouer@redhat.com>
wrote:

> On Mon, 28 May 2018 09:09:17 -0700
> Eric Dumazet <eric.dumazet@gmail.com> wrote:

> > Tariq, here are my test results : No drops for me.
> >
> > # ./netperf -H 2607:f8b0:8099:e18:: -t UDP_STREAM
> > MIGRATED UDP STREAM TEST from ::0 (::) port 0 AF_INET6 to 2607:f8b0:8099:e18:: () port 0 AF_INET6
> > Socket  Message  Elapsed      Messages
> > Size    Size     Time         Okay Errors   Throughput
> > bytes   bytes    secs            #      #   10^6bits/sec
> >
> > 212992   65507   10.00      202117      0    10592.00
> > 212992           10.00           0              0.00

> Hmm... Eric, the above result shows that ALL your UDP packets were dropped!
> You have 0 okay messages and 0.00 Mbit/s throughput.

> It needs to look like below (test on i40e NIC):

> $ netperf -t UDP_STREAM -H fee0:cafe::1
> MIGRATED UDP STREAM TEST from ::0 (::) port 0 AF_INET6 to fee0:cafe::1 () port 0 AF_INET6 : histogram : demo
> Socket  Message  Elapsed      Messages
> Size    Size     Time         Okay Errors   Throughput
> bytes   bytes    secs            #      #   10^6bits/sec

> 212992   65507   10.00      186385      0    9767.08
> 212992           10.00      186385           9767.08


> If I manually instruct ip6tables to drop all UDP packets, then I get
> what you see... so, something on your test system is likely dropping
> your UDP packets, but letting regular netperf (TCP) control
> communication through.

> # ip6tables -I INPUT -p udp -j DROP

> $ netperf -t UDP_STREAM -H fee0:cafe::1
> MIGRATED UDP STREAM TEST from ::0 (::) port 0 AF_INET6 to fee0:cafe::1 () port 0 AF_INET6 : histogram : demo
> Socket  Message  Elapsed      Messages
> Size    Size     Time         Okay Errors   Throughput
> bytes   bytes    secs            #      #   10^6bits/sec

> 212992   65507   10.00      182095      0    9542.41
> 212992           10.00           0              0.00



Right you are, for some reason I copied/pasted wrong results,
after _specifically_ filling up the frags to the memory limits,
when trying to reproduce 'bad numbers '

Here are the good ones, using latest David Miller net tree. ( plus
https://patchwork.ozlabs.org/patch/922528/  but that should not matter here)

llpaa23:/export/hda3/google/edumazet# ./netperf -H 2607:f8b0:8099:e18:: -t
UDP_STREAM
MIGRATED UDP STREAM TEST from ::0 (::) port 0 AF_INET6 to
2607:f8b0:8099:e18:: () port 0 AF_INET6
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

212992   65507   10.00      216236      0    11331.89
212992           10.00      215068           11270.68


There are a few drops because /proc/sys/net/core/rmem_default (212992, as
seen in the netperf output) is too small for this kind of stress.
(each 64KB datagram actually consumes half the budget ...)
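
For reading the two result lines, a minimal sketch in C. The struct and
function names are hypothetical. It assumes the first line reports what the
sender sent and the second what the receiver got, as discussed earlier in
this thread, and that throughput is messages * bytes * 8 / seconds.

```c
#include <assert.h>

/* Hypothetical container for one netperf UDP_STREAM run. */
struct udp_stream_result {
	unsigned long sent_ok;	/* "Messages Okay", first (sender) line */
	unsigned long recv_ok;	/* "Messages Okay", second (receiver) line */
	unsigned int msg_bytes;	/* "Message Size" */
	double elapsed_secs;	/* "Elapsed Time" */
};

/* Messages sent but never delivered to the receiving socket. */
static inline unsigned long dropped_messages(const struct udp_stream_result *r)
{
	assert(r->recv_ok <= r->sent_ok);
	return r->sent_ok - r->recv_ok;
}

/* Throughput in 10^6 bits/sec for a given message count. */
static inline double throughput_mbps(unsigned long messages,
				     unsigned int msg_bytes, double secs)
{
	return (double)messages * msg_bytes * 8.0 / secs / 1e6;
}
```

For the run above (216236 sent, 215068 received), this gives 1168 dropped
messages, roughly 0.5% of what was sent.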

^ permalink raw reply

* Re: [PATCH bpf-next] bpftool: Support sendmsg{4,6} attach types
From: Daniel Borkmann @ 2018-05-30 10:56 UTC (permalink / raw)
  To: Song Liu, Jakub Kicinski
  Cc: Andrey Ignatov, Networking, Alexei Starovoitov, Quentin Monnet,
	kernel-team
In-Reply-To: <CAPhsuW6oyRbgnXoyNtA0XM03063qQJGok6bPpO_Z4QBVgmi7=w@mail.gmail.com>

On 05/30/2018 02:12 AM, Song Liu wrote:
> On Tue, May 29, 2018 at 2:20 PM, Jakub Kicinski <kubakici@wp.pl> wrote:
>> On Tue, 29 May 2018 13:29:31 -0700, Andrey Ignatov wrote:
>>> Add support for recently added BPF_CGROUP_UDP4_SENDMSG and
>>> BPF_CGROUP_UDP6_SENDMSG attach types to bpftool, update documentation
>>> and bash completion.
>>>
>>> Signed-off-by: Andrey Ignatov <rdna@fb.com>
>>
>> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
>>
>>> I'm not sure about "since 4.18" in Documentation part. I can follow-up when
>>> the next kernel version is known.
>>
>> IMHO it's fine, we can follow up if Linus decides to call it something
>> else :)
>>
>> Thanks!
> 
> Acked-by: Song Liu <songliubraving@fb.com>

Applied to bpf-next, thanks guys!

^ permalink raw reply

* Re: [PATCH v4 net-next 00/19] inet: frags: bring rhashtables to IP defrag
From: Eric Dumazet @ 2018-05-30 10:56 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Eric Dumazet, aring, Tariq Toukan, David Miller, netdev,
	Florian Westphal, Herbert Xu, Thomas Graf, Alexander Aring,
	Stefan Schmidt, Kirill Tkhai, moshe, Eran Ben Elisha, Rick Jones
In-Reply-To: <CANn89iL73WTtF7P477tJOZcbDsg3U7Py7ykA9xdipcahtJKNNA@mail.gmail.com>

On Wed, May 30, 2018 at 6:36 AM Eric Dumazet <edumazet@google.com> wrote:


> Here are the good ones, using latest David Miller net tree. ( plus
> https://patchwork.ozlabs.org/patch/922528/  but that should not matter here)

> llpaa23:/export/hda3/google/edumazet# ./netperf -H 2607:f8b0:8099:e18:: -t
> UDP_STREAM
> MIGRATED UDP STREAM TEST from ::0 (::) port 0 AF_INET6 to
> 2607:f8b0:8099:e18:: () port 0 AF_INET6
> Socket  Message  Elapsed      Messages
> Size    Size     Time         Okay Errors   Throughput
> bytes   bytes    secs            #      #   10^6bits/sec

> 212992   65507   10.00      216236      0    11331.89
> 212992           10.00      215068           11270.68


> There are few drops because of the too small
> /proc/sys/net/core/rmem_default  ( 212992 as seen in netperf output) for
> these kind of stress.
> ( each 64KB datagram actually consumes half the budget ...)


Once rmem_default is set to 1,000,000 and  mtu set back to 1500 (instead of
5102 on my testbed)
results are indeed better.

lpaa23:/export/hda3/google/edumazet# ./netperf -H 2607:f8b0:8099:e18:: -t
UDP_STREAM -l 10
MIGRATED UDP STREAM TEST from ::0 (::) port 0 AF_INET6 to
2607:f8b0:8099:e18:: () port 0 AF_INET6
Socket  Message  Elapsed      Messages
Size    Size     Time         Okay Errors   Throughput
bytes   bytes    secs            #      #   10^6bits/sec

212992   65507   10.00      231457      0    12129.56
1000000           10.00      231457           12129.56

^ permalink raw reply

* Re: [PATCH v5 0/3] IR decoding using BPF
From: Daniel Borkmann @ 2018-05-30 10:57 UTC (permalink / raw)
  To: Sean Young, linux-media, linux-kernel, Alexei Starovoitov,
	Mauro Carvalho Chehab, netdev, Matthias Reichl, Devin Heitmueller,
	Y Song, Quentin Monnet
In-Reply-To: <cover.1527419762.git.sean@mess.org>

On 05/27/2018 01:24 PM, Sean Young wrote:
> The kernel IR decoders (drivers/media/rc/ir-*-decoder.c) support the most
> widely used IR protocols, but there are many protocols which are not
> supported[1]. For example, the lirc-remotes[2] repo has over 2700 remotes,
> many of which are not supported by rc-core. There is a "long tail" of
> unsupported IR protocols, for which lircd is needed to decode the IR.
> 
> IR encoding is done in such a way that some simple circuit can decode it;
> therefore, bpf is ideal.
> 
> In order to support all these protocols, here we have bpf based IR decoding.
> The idea is that user-space can define a decoder in bpf, attach it to
> the rc device through the lirc chardev.
> 
> Separate work is underway to extend ir-keytable to have an extensive library
> of bpf-based decoders, and a much expanded library of rc keymaps.
> 
> Another future application would be to compile IRP[3] to a IR BPF program, and
> so support virtually every remote without having to write a decoder for each.
> It might also be possible to support non-button devices such as analog
> directional pads or air conditioning remote controls and decode the target
> temperature in bpf, and pass that to an input device.
> 
> Thanks,
> 
> Sean Young
> 
> [1] http://www.hifi-remote.com/wiki/index.php?title=DecodeIR
> [2] https://sourceforge.net/p/lirc-remotes/code/ci/master/tree/remotes/
> [3] http://www.hifi-remote.com/wiki/index.php?title=IRP_Notation
> 
> Changes since v4:
>  - Renamed rc_dev_bpf_{attach,detach,query} to lirc_bpf_{attach,detach,query}
>  - Fixed error path in lirc_bpf_query
>  - Rebased on bpf-next
> 
> Changes since v3:
>  - Implemented review comments from Quentin Monnet and Y Song (thanks!)
>  - More helpful and better formatted bpf helper documentation
>  - Changed back to bpf_prog_array rather than open-coded implementation
>  - scancodes can be 64 bit
>  - bpf gets passed values in microseconds, not nanoseconds.
>    microseconds is more than enough (IR receivers support carriers up to
>    70kHz, at which point a single period is already 14 microseconds). Also,
>    this makes it much more consistent with lirc mode2.
>  - Since it looks much more like lirc mode2, rename the program type to
>    BPF_PROG_TYPE_LIRC_MODE2.
>  - Rebased on bpf-next
> 
> Changes since v2:
>  - Fixed locking issues
>  - Improved self-test to cover more cases
>  - Rebased on bpf-next again
> 
> Changes since v1:
>  - Code review comments from Y Song <ys114321@gmail.com> and
>    Randy Dunlap <rdunlap@infradead.org>
>  - Re-wrote sample bpf to be selftest
>  - Renamed RAWIR_DECODER -> RAWIR_EVENT (Kconfig, context, bpf prog type)
>  - Rebase on bpf-next
>  - Introduced bpf_rawir_event context structure with simpler access checking

Applied to bpf-next, thanks Sean!

^ permalink raw reply

* Re: [PATCH bpf-next v7 3/6] bpf: Add IPv6 Segment Routing helpers
From: Daniel Borkmann @ 2018-05-30 11:00 UTC (permalink / raw)
  To: Mathieu Xhonneux, netdev; +Cc: dlebrun, alexei.starovoitov
In-Reply-To: <d6833d31-4481-9595-ce26-d93ff35f411a@iogearbox.net>

On 05/24/2018 12:18 PM, Daniel Borkmann wrote:
> On 05/20/2018 03:58 PM, Mathieu Xhonneux wrote:
>> The BPF seg6local hook should be powerful enough to enable users to
>> implement most of the use-cases one could think of. After some thinking,
>> we figured out that the following actions should be possible on a SRv6
>> packet, requiring 3 specific helpers :
>>     - bpf_lwt_seg6_store_bytes: Modify non-sensitive fields of the SRH
>>     - bpf_lwt_seg6_adjust_srh: Allow to grow or shrink a SRH
>>                                (to add/delete TLVs)
>>     - bpf_lwt_seg6_action: Apply some SRv6 network programming actions
>>                            (specifically End.X, End.T, End.B6 and
>>                             End.B6.Encap)
>>
>> The specifications of these helpers are provided in the patch (see
>> include/uapi/linux/bpf.h).
>>
>> The non-sensitive fields of the SRH are the following : flags, tag and
>> TLVs. The other fields can not be modified, to maintain the SRH
>> integrity. Flags, tag and TLVs can easily be modified as their validity
>> can be checked afterwards via seg6_validate_srh. It is not allowed to
>> modify the segments directly. If one wants to add segments on the path,
>> he should stack a new SRH using the End.B6 action via
>> bpf_lwt_seg6_action.
>>
>> Growing, shrinking or editing TLVs via the helpers will flag the SRH as
>> invalid, and it will have to be re-validated before re-entering the IPv6
>> layer. This flag is stored in a per-CPU buffer, along with the current
>> header length in bytes.
>>
>> Storing the SRH len in bytes in the control block is mandatory when using
>> bpf_lwt_seg6_adjust_srh. The Header Ext. Length field contains the SRH
>> len rounded to 8 bytes (a padding TLV can be inserted to ensure the 8-bytes
>> boundary). When adding/deleting TLVs within the BPF program, the SRH may
>> temporarily be in an invalid state where its length cannot be rounded to 8
>> bytes without remainder, hence the need to store the length in bytes
>> separately. The caller of the BPF program can then ensure that the SRH's
>> final length is valid using this value. Again, a final SRH modified by a
>> BPF program which doesn’t respect the 8-bytes boundary will be discarded
>> as it will be considered as invalid.
>>
>> Finally, a fourth helper is provided, bpf_lwt_push_encap, which is
>> available from the LWT BPF IN hook, but not from the seg6local BPF one.
>> This helper allows to encapsulate a Segment Routing Header (either with
>> a new outer IPv6 header, or by inlining it directly in the existing IPv6
>> header) into a non-SRv6 packet. This helper is required if we want to
>> offer the possibility to dynamically encapsulate a SRH for non-SRv6 packet,
>> as the BPF seg6local hook only works on traffic already containing a SRH.
>> This is the BPF equivalent of the seg6 LWT infrastructure, which achieves
>> the same purpose but with a static SRH per route.
>>
>> These helpers require CONFIG_IPV6=y (and not =m).
>>
>> Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com>
>> Acked-by: David Lebrun <dlebrun@google.com>
> 
> One minor comment for follow-ups here below.
> 
>> +BPF_CALL_4(bpf_lwt_seg6_store_bytes, struct sk_buff *, skb, u32, offset,
>> +	   const void *, from, u32, len)
>> +{
>> +#if IS_ENABLED(CONFIG_IPV6_SEG6_BPF)
>> +	struct seg6_bpf_srh_state *srh_state =
>> +		this_cpu_ptr(&seg6_bpf_srh_states);
>> +	void *srh_tlvs, *srh_end, *ptr;
>> +	struct ipv6_sr_hdr *srh;
>> +	int srhoff = 0;
>> +
>> +	if (ipv6_find_hdr(skb, &srhoff, IPPROTO_ROUTING, NULL, NULL) < 0)
>> +		return -EINVAL;
>> +
>> +	srh = (struct ipv6_sr_hdr *)(skb->data + srhoff);
>> +	srh_tlvs = (void *)((char *)srh + ((srh->first_segment + 1) << 4));
>> +	srh_end = (void *)((char *)srh + sizeof(*srh) + srh_state->hdrlen);
>> +
>> +	ptr = skb->data + offset;
>> +	if (ptr >= srh_tlvs && ptr + len <= srh_end)
>> +		srh_state->valid = 0;
>> +	else if (ptr < (void *)&srh->flags ||
>> +		 ptr + len > (void *)&srh->segments)
>> +		return -EFAULT;
>> +
>> +	if (unlikely(bpf_try_make_writable(skb, offset + len)))
>> +		return -EFAULT;
>> +
>> +	memcpy(skb->data + offset, from, len);
>> +	return 0;
>> +#else /* CONFIG_IPV6_SEG6_BPF */
>> +	return -EOPNOTSUPP;
>> +#endif
>> +}
> 
> Instead of doing this inside the helper you can reject the program already
> in the lwt_*_func_proto() by returning NULL when !CONFIG_IPV6_SEG6_BPF. That
> way programs get rejected at verification time instead of runtime, so the
> user can probe availability more easily.

Mathieu, before this gets lost in archives, plan to follow up on this one?
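
The suggestion quoted above could be sketched roughly as follows. This is
a user-space mock, not the kernel code: struct func_proto,
lookup_func_proto(), the helper_id enum and the HAVE_SEG6_BPF switch are
all hypothetical stand-ins for bpf_func_proto, the lwt_*_func_proto()
callbacks and CONFIG_IPV6_SEG6_BPF. The point is only that the lookup
returns NULL when the feature is compiled out, so the helper is never
handed to the program in the first place.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct bpf_func_proto. */
struct func_proto {
	const char *name;
};

/* Hypothetical build-time switch standing in for CONFIG_IPV6_SEG6_BPF. */
#ifndef HAVE_SEG6_BPF
#define HAVE_SEG6_BPF 0
#endif

enum helper_id {
	HELPER_SEG6_STORE_BYTES,
	HELPER_OTHER,
};

#if HAVE_SEG6_BPF
static const struct func_proto seg6_store_bytes_proto = {
	"seg6_store_bytes"
};
#endif

static const struct func_proto other_proto = { "other" };

/*
 * Return the prototype for a helper, or NULL if the helper is not
 * available in this build. A NULL result models the load-time
 * rejection, replacing a runtime -EOPNOTSUPP from the helper body.
 */
const struct func_proto *lookup_func_proto(enum helper_id id)
{
	switch (id) {
	case HELPER_SEG6_STORE_BYTES:
#if HAVE_SEG6_BPF
		return &seg6_store_bytes_proto;
#else
		return NULL;
#endif
	case HELPER_OTHER:
		return &other_proto;
	default:
		return NULL;
	}
}
```

With this shape the #if/#else inside the helper body goes away, and a
loader can probe for the helper simply by trying to load a program that
uses it.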

^ permalink raw reply

* Re: [RFC V5 PATCH 8/8] vhost: event suppression for packed ring
From: Wei Xu @ 2018-05-30 11:42 UTC (permalink / raw)
  To: Jason Wang; +Cc: kvm, mst, netdev, linux-kernel, virtualization
In-Reply-To: <1527559830-8133-9-git-send-email-jasowang@redhat.com>

On Tue, May 29, 2018 at 10:10:30AM +0800, Jason Wang wrote:
> This patch introduces basic support for event suppression aka driver
> and device area.
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  drivers/vhost/vhost.c            | 191 ++++++++++++++++++++++++++++++++++++---
>  drivers/vhost/vhost.h            |  10 +-
>  include/uapi/linux/virtio_ring.h |  19 ++++
>  3 files changed, 204 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index a36e5ad2..112f680 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -1112,10 +1112,15 @@ static int vq_access_ok_packed(struct vhost_virtqueue *vq, unsigned int num,
>  			       struct vring_used __user *used)
>  {
>  	struct vring_desc_packed *packed = (struct vring_desc_packed *)desc;
> +	struct vring_packed_desc_event *driver_event =
> +		(struct vring_packed_desc_event *)avail;
> +	struct vring_packed_desc_event *device_event =
> +		(struct vring_packed_desc_event *)used;
>  
> -	/* FIXME: check device area and driver area */
>  	return access_ok(VERIFY_READ, packed, num * sizeof(*packed)) &&
> -	       access_ok(VERIFY_WRITE, packed, num * sizeof(*packed));
> +	       access_ok(VERIFY_WRITE, packed, num * sizeof(*packed)) &&
> +	       access_ok(VERIFY_READ, driver_event, sizeof(*driver_event)) &&
> +	       access_ok(VERIFY_WRITE, device_event, sizeof(*device_event));
>  }
>  
>  static int vq_access_ok_split(struct vhost_virtqueue *vq, unsigned int num,
> @@ -1190,14 +1195,27 @@ static bool iotlb_access_ok(struct vhost_virtqueue *vq,
>  	return true;
>  }
>  
> -int vq_iotlb_prefetch(struct vhost_virtqueue *vq)
> +int vq_iotlb_prefetch_packed(struct vhost_virtqueue *vq)
> +{
> +	int num = vq->num;
> +
> +	return iotlb_access_ok(vq, VHOST_ACCESS_RO, (u64)(uintptr_t)vq->desc,
> +			       num * sizeof(*vq->desc), VHOST_ADDR_DESC) &&
> +	       iotlb_access_ok(vq, VHOST_ACCESS_WO, (u64)(uintptr_t)vq->desc,
> +			       num * sizeof(*vq->desc), VHOST_ADDR_DESC) &&
> +	       iotlb_access_ok(vq, VHOST_ACCESS_RO,
> +			       (u64)(uintptr_t)vq->driver_event,
> +			       sizeof(*vq->driver_event), VHOST_ADDR_AVAIL) &&
> +	       iotlb_access_ok(vq, VHOST_ACCESS_WO,
> +			       (u64)(uintptr_t)vq->device_event,
> +			       sizeof(*vq->device_event), VHOST_ADDR_USED);
> +}
> +
> +int vq_iotlb_prefetch_split(struct vhost_virtqueue *vq)
>  {
>  	size_t s = vhost_has_feature(vq, VIRTIO_RING_F_EVENT_IDX) ? 2 : 0;
>  	unsigned int num = vq->num;
>  
> -	if (!vq->iotlb)
> -		return 1;
> -
>  	return iotlb_access_ok(vq, VHOST_ACCESS_RO, (u64)(uintptr_t)vq->desc,
>  			       num * sizeof(*vq->desc), VHOST_ADDR_DESC) &&
>  	       iotlb_access_ok(vq, VHOST_ACCESS_RO, (u64)(uintptr_t)vq->avail,
> @@ -1209,6 +1227,17 @@ int vq_iotlb_prefetch(struct vhost_virtqueue *vq)
>  			       num * sizeof(*vq->used->ring) + s,
>  			       VHOST_ADDR_USED);
>  }
> +
> +int vq_iotlb_prefetch(struct vhost_virtqueue *vq)
> +{
> +	if (!vq->iotlb)
> +		return 1;
> +
> +	if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
> +		return vq_iotlb_prefetch_packed(vq);
> +	else
> +		return vq_iotlb_prefetch_split(vq);
> +}
>  EXPORT_SYMBOL_GPL(vq_iotlb_prefetch);
>  
>  /* Can we log writes? */
> @@ -1730,6 +1759,50 @@ static int vhost_update_used_flags(struct vhost_virtqueue *vq)
>  	return 0;
>  }
>  
> +static int vhost_update_device_flags(struct vhost_virtqueue *vq,
> +				     __virtio16 device_flags)
> +{
> +	void __user *flags;
> +
> +	if (vhost_put_user(vq, device_flags, &vq->device_event->flags,
> +			   VHOST_ADDR_USED) < 0)
> +		return -EFAULT;
> +	if (unlikely(vq->log_used)) {
> +		/* Make sure the flag is seen before log. */
> +		smp_wmb();
> +		/* Log used flag write. */
> +		flags = &vq->device_event->flags;
> +		log_write(vq->log_base, vq->log_addr +
> +			  (flags - (void __user *)vq->device_event),
> +			  sizeof(vq->device_event->flags));
> +		if (vq->log_ctx)
> +			eventfd_signal(vq->log_ctx, 1);
> +	}
> +	return 0;
> +}
> +
> +static int vhost_update_device_off_wrap(struct vhost_virtqueue *vq,
> +					__virtio16 device_off_wrap)
> +{
> +	void __user *off_wrap;
> +
> +	if (vhost_put_user(vq, device_off_wrap, &vq->device_event->off_wrap,
> +			   VHOST_ADDR_USED) < 0)
> +		return -EFAULT;
> +	if (unlikely(vq->log_used)) {
> +		/* Make sure the flag is seen before log. */
> +		smp_wmb();
> +		/* Log used flag write. */
> +		off_wrap = &vq->device_event->off_wrap;
> +		log_write(vq->log_base, vq->log_addr +
> +			  (off_wrap - (void __user *)vq->device_event),
> +			  sizeof(vq->device_event->off_wrap));
> +		if (vq->log_ctx)
> +			eventfd_signal(vq->log_ctx, 1);
> +	}
> +	return 0;
> +}
> +
>  static int vhost_update_avail_event(struct vhost_virtqueue *vq, u16 avail_event)
>  {
>  	if (vhost_put_user(vq, cpu_to_vhost16(vq, vq->avail_idx),
> @@ -2683,16 +2756,13 @@ int vhost_add_used_n(struct vhost_virtqueue *vq,
>  }
>  EXPORT_SYMBOL_GPL(vhost_add_used_n);
>  
> -static bool vhost_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
> +static bool vhost_notify_split(struct vhost_dev *dev,
> +			       struct vhost_virtqueue *vq)
>  {
>  	__u16 old, new;
>  	__virtio16 event;
>  	bool v;
>  
> -	/* FIXME: check driver area */
> -	if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
> -		return true;
> -
>  	/* Flush out used index updates. This is paired
>  	 * with the barrier that the Guest executes when enabling
>  	 * interrupts. */
> @@ -2725,6 +2795,64 @@ static bool vhost_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
>  	return vring_need_event(vhost16_to_cpu(vq, event), new, old);
>  }
>  
> +static bool vhost_notify_packed(struct vhost_dev *dev,
> +				struct vhost_virtqueue *vq)
> +{
> +	__virtio16 event_off_wrap, event_flags;
> +	__u16 old, new, off_wrap;
> +	bool v;
> +
> +	/* Flush out used descriptors updates. This is paired
> +	 * with the barrier that the Guest executes when enabling
> +	 * interrupts.
> +	 */
> +	smp_mb();
> +
> +	if (vhost_get_avail(vq, event_flags,
> +			   &vq->driver_event->flags) < 0) {
> +		vq_err(vq, "Failed to get driver desc_event_flags");
> +		return true;
> +	}
> +
> +	if (!vhost_has_feature(vq, VIRTIO_RING_F_EVENT_IDX))
> +		return event_flags !=
> +		       cpu_to_vhost16(vq, RING_EVENT_FLAGS_DISABLE);
> +
> +	old = vq->signalled_used;
> +	v = vq->signalled_used_valid;
> +	new = vq->signalled_used = vq->last_used_idx;
> +	vq->signalled_used_valid = true;
> +
> +	if (event_flags != cpu_to_vhost16(vq, RING_EVENT_FLAGS_DESC))
> +		return event_flags !=
> +		       cpu_to_vhost16(vq, RING_EVENT_FLAGS_DISABLE);
> +
> +	/* Read desc event flags before event_off and event_wrap */
> +	smp_rmb();
> +
> +	if (vhost_get_avail(vq, event_off_wrap,
> +			    &vq->driver_event->off_wrap) < 0) {
> +		vq_err(vq, "Failed to get driver desc_event_off/wrap");
> +		return true;
> +	}
> +
> +	off_wrap = vhost16_to_cpu(vq, event_off_wrap);
> +
> +	if (unlikely(!v))
> +		return true;
> +
> +	return vhost_vring_packed_need_event(vq, vq->used_wrap_counter,
> +					     off_wrap, new, old);
> +}
> +
> +static bool vhost_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
> +{
> +	if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
> +		return vhost_notify_packed(dev, vq);
> +	else
> +		return vhost_notify_split(dev, vq);
> +}
> +
>  /* This actually signals the guest, using eventfd. */
>  void vhost_signal(struct vhost_dev *dev, struct vhost_virtqueue *vq)
>  {
> @@ -2802,10 +2930,34 @@ static bool vhost_enable_notify_packed(struct vhost_dev *dev,
>  				       struct vhost_virtqueue *vq)
>  {
>  	struct vring_desc_packed *d = vq->desc_packed + vq->avail_idx;
> -	__virtio16 flags;
> +	__virtio16 flags = RING_EVENT_FLAGS_ENABLE;
>  	int ret;
>  
> -	/* FIXME: disable notification through device area */
> +	if (!(vq->used_flags & VRING_USED_F_NO_NOTIFY))
> +		return false;
> +	vq->used_flags &= ~VRING_USED_F_NO_NOTIFY;

'used_flags' was originally designed for 1.0, why should we pay attention to it here?

Wei
> +
> +	if (vhost_has_feature(vq, VIRTIO_RING_F_EVENT_IDX)) {
> +		__virtio16 off_wrap = cpu_to_vhost16(vq, vq->avail_idx |
> +				      vq->avail_wrap_counter << 15);
> +
> +		ret = vhost_update_device_off_wrap(vq, off_wrap);
> +		if (ret) {
> +			vq_err(vq, "Failed to write to off warp at %p: %d\n",
> +			       &vq->device_event->off_wrap, ret);
> +			return false;
> +		}
> +		/* Make sure off_wrap is wrote before flags */
> +		smp_wmb();
> +		flags = RING_EVENT_FLAGS_DESC;
> +	}
> +
> +	ret = vhost_update_device_flags(vq, flags);
> +	if (ret) {
> +		vq_err(vq, "Failed to enable notification at %p: %d\n",
> +			&vq->device_event->flags, ret);
> +		return false;
> +	}
>  
>  	/* They could have slipped one in as we were doing that: make
>  	 * sure it's written, then check again. */
> @@ -2871,7 +3023,18 @@ EXPORT_SYMBOL_GPL(vhost_enable_notify);
>  static void vhost_disable_notify_packed(struct vhost_dev *dev,
>  					struct vhost_virtqueue *vq)
>  {
> -	/* FIXME: disable notification through device area */
> +	__virtio16 flags;
> +	int r;
> +
> +	if (vq->used_flags & VRING_USED_F_NO_NOTIFY)
> +		return;
> +	vq->used_flags |= VRING_USED_F_NO_NOTIFY;
> +
> +	flags = cpu_to_vhost16(vq, RING_EVENT_FLAGS_DISABLE);
> +	r = vhost_update_device_flags(vq, flags);
> +	if (r)
> +		vq_err(vq, "Failed to enable notification at %p: %d\n",
> +		       &vq->device_event->flags, r);
>  }
>  
>  static void vhost_disable_notify_split(struct vhost_dev *dev,
> diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
> index 7543a46..b920582 100644
> --- a/drivers/vhost/vhost.h
> +++ b/drivers/vhost/vhost.h
> @@ -96,8 +96,14 @@ struct vhost_virtqueue {
>  		struct vring_desc __user *desc;
>  		struct vring_desc_packed __user *desc_packed;
>  	};
> -	struct vring_avail __user *avail;
> -	struct vring_used __user *used;
> +	union {
> +		struct vring_avail __user *avail;
> +		struct vring_packed_desc_event __user *driver_event;
> +	};
> +	union {
> +		struct vring_used __user *used;
> +		struct vring_packed_desc_event __user *device_event;
> +	};
>  	const struct vhost_umem_node *meta_iotlb[VHOST_NUM_ADDRS];
>  	struct file *kick;
>  	struct eventfd_ctx *call_ctx;
> diff --git a/include/uapi/linux/virtio_ring.h b/include/uapi/linux/virtio_ring.h
> index e297580..71c7a46 100644
> --- a/include/uapi/linux/virtio_ring.h
> +++ b/include/uapi/linux/virtio_ring.h
> @@ -75,6 +75,25 @@ struct vring_desc_packed {
>  	__virtio16 flags;
>  };
>  
> +/* Enable events */
> +#define RING_EVENT_FLAGS_ENABLE 0x0
> +/* Disable events */
> +#define RING_EVENT_FLAGS_DISABLE 0x1
> +/*
> + * Enable events for a specific descriptor
> + * (as specified by Descriptor Ring Change Event Offset/Wrap Counter).
> + * Only valid if VIRTIO_F_RING_EVENT_IDX has been negotiated.
> + */
> +#define RING_EVENT_FLAGS_DESC 0x2
> +/* The value 0x3 is reserved */
> +
> +struct vring_packed_desc_event {
> +	/* Descriptor Ring Change Event Offset and Wrap Counter */
> +	__virtio16 off_wrap;
> +	/* Descriptor Ring Change Event Flags */
> +	__virtio16 flags;
> +};
> +
>  /* Virtio ring descriptors: 16 bytes.  These can chain together via "next". */
>  struct vring_desc {
>  	/* Address (guest-physical). */
> -- 
> 2.7.4
> 
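
For reference, the off_wrap value built in vhost_enable_notify_packed()
above combines a ring offset and a wrap counter in one 16-bit word, as in
the expression `vq->avail_idx | vq->avail_wrap_counter << 15`. The sketch
below shows that packing in isolation. The helper names are made up for
illustration, and the explicit masking of the offset to 15 bits is an
assumption added here; the quoted patch does not mask.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helpers illustrating the off_wrap layout:
 * bits 0-14 carry the ring offset, bit 15 carries the wrap counter.
 */
uint16_t pack_off_wrap(uint16_t off, bool wrap)
{
	/* Masking to 15 bits is an assumption, not taken from the patch. */
	return (uint16_t)((off & 0x7fff) | ((unsigned int)wrap << 15));
}

/* Extract the ring offset (low 15 bits). */
uint16_t off_wrap_offset(uint16_t off_wrap)
{
	return off_wrap & 0x7fff;
}

/* Extract the wrap counter (top bit). */
bool off_wrap_counter(uint16_t off_wrap)
{
	return (off_wrap >> 15) & 1;
}
```

For example, offset 5 with the wrap counter set packs to 0x8005, and the
two accessors recover 5 and true from that value.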

^ permalink raw reply

* [PATCH net-next] cxgb4: Add FORCE_PAUSE bit to 32 bit port caps
From: Ganesh Goudar @ 2018-05-30 11:45 UTC (permalink / raw)
  To: netdev, davem
  Cc: nirranjan, indranil, Ganesh Goudar, Santosh Rastapur,
	Casey Leedom

Add FORCE_PAUSE bit to force local pause settings instead
of using auto negotiated values.

Signed-off-by: Santosh Rastapur <santosh@chelsio.com>
Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
---
 drivers/net/ethernet/chelsio/cxgb4/t4_hw.c    | 10 +++++++++-
 drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h |  5 +++--
 2 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
index 39da7e3..974a868 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
@@ -3941,6 +3941,7 @@ static fw_port_cap32_t fwcaps16_to_caps32(fw_port_cap16_t caps16)
 	CAP16_TO_CAP32(FC_RX);
 	CAP16_TO_CAP32(FC_TX);
 	CAP16_TO_CAP32(ANEG);
+	CAP16_TO_CAP32(FORCE_PAUSE);
 	CAP16_TO_CAP32(MDIAUTO);
 	CAP16_TO_CAP32(MDISTRAIGHT);
 	CAP16_TO_CAP32(FEC_RS);
@@ -3982,6 +3983,7 @@ static fw_port_cap16_t fwcaps32_to_caps16(fw_port_cap32_t caps32)
 	CAP32_TO_CAP16(802_3_PAUSE);
 	CAP32_TO_CAP16(802_3_ASM_DIR);
 	CAP32_TO_CAP16(ANEG);
+	CAP32_TO_CAP16(FORCE_PAUSE);
 	CAP32_TO_CAP16(MDIAUTO);
 	CAP32_TO_CAP16(MDISTRAIGHT);
 	CAP32_TO_CAP16(FEC_RS);
@@ -4014,6 +4016,8 @@ static inline fw_port_cap32_t cc_to_fwcap_pause(enum cc_pause cc_pause)
 		fw_pause |= FW_PORT_CAP32_FC_RX;
 	if (cc_pause & PAUSE_TX)
 		fw_pause |= FW_PORT_CAP32_FC_TX;
+	if (!(cc_pause & PAUSE_AUTONEG))
+		fw_pause |= FW_PORT_CAP32_FORCE_PAUSE;
 
 	return fw_pause;
 }
@@ -4101,7 +4105,11 @@ int t4_link_l1cfg_core(struct adapter *adapter, unsigned int mbox,
 		rcap = lc->acaps | fw_fc | fw_fec | fw_mdi;
 	}
 
-	if (rcap & ~lc->pcaps) {
+	/* Note that older Firmware doesn't have FW_PORT_CAP32_FORCE_PAUSE, so
+	 * we need to exclude this from this check in order to maintain
+	 * compatibility ...
+	 */
+	if ((rcap & ~lc->pcaps) & ~FW_PORT_CAP32_FORCE_PAUSE) {
 		dev_err(adapter->pdev_dev,
 			"Requested Port Capabilities %#x exceed Physical Port Capabilities %#x\n",
 			rcap, lc->pcaps);
diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h b/drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h
index 2d91480..f1967cf 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h
@@ -2475,7 +2475,7 @@ enum fw_port_cap {
 	FW_PORT_CAP_MDISTRAIGHT		= 0x0400,
 	FW_PORT_CAP_FEC_RS		= 0x0800,
 	FW_PORT_CAP_FEC_BASER_RS	= 0x1000,
-	FW_PORT_CAP_FEC_RESERVED	= 0x2000,
+	FW_PORT_CAP_FORCE_PAUSE		= 0x2000,
 	FW_PORT_CAP_802_3_PAUSE		= 0x4000,
 	FW_PORT_CAP_802_3_ASM_DIR	= 0x8000,
 };
@@ -2522,7 +2522,8 @@ enum fw_port_mdi {
 #define	FW_PORT_CAP32_FEC_RESERVED1	0x02000000UL
 #define	FW_PORT_CAP32_FEC_RESERVED2	0x04000000UL
 #define	FW_PORT_CAP32_FEC_RESERVED3	0x08000000UL
-#define	FW_PORT_CAP32_RESERVED2		0xf0000000UL
+#define FW_PORT_CAP32_FORCE_PAUSE	0x10000000UL
+#define FW_PORT_CAP32_RESERVED2		0xe0000000UL
 
 #define FW_PORT_CAP32_SPEED_S	0
 #define FW_PORT_CAP32_SPEED_M	0xfff
-- 
2.1.0
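
The compatibility check added to t4_link_l1cfg_core() can be looked at in
isolation. The sketch below is a stand-alone illustration, not driver
code: caps_exceed_physical() is a made-up name, and only the
FW_PORT_CAP32_FORCE_PAUSE value is taken from the patch. It shows that a
requested FORCE_PAUSE bit is ignored by the check, while any other bit
missing from the physical capabilities still triggers it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value taken from the patch above. */
#define FW_PORT_CAP32_FORCE_PAUSE	0x10000000UL

/*
 * Hypothetical helper mirroring the patched condition: report whether
 * the requested capabilities contain a bit that the physical port
 * lacks, not counting FORCE_PAUSE (older firmware never advertises it).
 */
bool caps_exceed_physical(uint32_t rcap, uint32_t pcaps)
{
	return ((rcap & ~pcaps) & ~(uint32_t)FW_PORT_CAP32_FORCE_PAUSE) != 0;
}
```

With pcaps = 0x3, a request of 0x1 | FW_PORT_CAP32_FORCE_PAUSE passes the
check, whereas a request of 0x4 is still reported as exceeding the port
(the small bit values here are arbitrary, chosen only for the example).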

^ permalink raw reply related

* Re: [PATCH net-next] net: qcom/emac: fix unused variable
From: Timur Tabi @ 2018-05-30 12:10 UTC (permalink / raw)
  To: YueHaibing, davem; +Cc: netdev, linux-kernel
In-Reply-To: <20180529104343.19448-1-yuehaibing@huawei.com>

On 5/29/18 5:43 AM, YueHaibing wrote:
> When CONFIG_ACPI isn't set, variable qdf2400_ops/qdf2432_ops isn't used.
> drivers/net/ethernet/qualcomm/emac/emac-sgmii.c:284:25: warning: ‘qdf2400_ops’ defined but not used [-Wunused-variable]
>   static struct sgmii_ops qdf2400_ops = {
>                           ^~~~~~~~~~~
> drivers/net/ethernet/qualcomm/emac/emac-sgmii.c:276:25: warning: ‘qdf2432_ops’ defined but not used [-Wunused-variable]
>   static struct sgmii_ops qdf2432_ops = {
>                           ^~~~~~~~~~~
> 
> Move the declaration and functions inside the CONFIG_ACPI ifdef
> to fix the warning.
> Signed-off-by: YueHaibing<yuehaibing@huawei.com>

I already fixed this with:

https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=d377df784178bf5b0a39e75dc8b1ee86e1abb3f6

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc.  Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply

* [PATCH] rtnetlink: Add more well known protocol values
From: Donald Sharp @ 2018-05-30 12:27 UTC (permalink / raw)
  To: netdev, dsahern

FRRouting installs routes into the kernel associated with
the originating protocol.  Add these values to the well
known values in rtnetlink.h.

Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
---
v2: Fixed whitespace issues
 include/uapi/linux/rtnetlink.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index cabb210c93af..7d8502313c99 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -254,6 +254,11 @@ enum {
 #define RTPROT_DHCP	16      /* DHCP client */
 #define RTPROT_MROUTED	17      /* Multicast daemon */
 #define RTPROT_BABEL	42      /* Babel daemon */
+#define RTPROT_BGP	186     /* BGP Routes */
+#define RTPROT_ISIS	187     /* ISIS Routes */
+#define RTPROT_OSPF	188     /* OSPF Routes */
+#define RTPROT_RIP	189     /* RIP Routes */
+#define RTPROT_EIGRP	192     /* EIGRP Routes */
 
 /* rtm_scope
 
-- 
2.14.3
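
As an illustration of how user space might consume these values, here is
a minimal sketch that maps the new protocol numbers to names. The
function rtprot_name() is hypothetical and not part of the patch or of
any library; the defines are repeated locally so the snippet builds
without the updated header.

```c
#include <stddef.h>

/* Values as proposed in the patch above, repeated for a stand-alone build. */
#define RTPROT_BGP	186
#define RTPROT_ISIS	187
#define RTPROT_OSPF	188
#define RTPROT_RIP	189
#define RTPROT_EIGRP	192

/*
 * Hypothetical helper: return a short name for the rtm_protocol values
 * added by the patch, or NULL for any other value.
 */
const char *rtprot_name(unsigned char proto)
{
	switch (proto) {
	case RTPROT_BGP:
		return "bgp";
	case RTPROT_ISIS:
		return "isis";
	case RTPROT_OSPF:
		return "ospf";
	case RTPROT_RIP:
		return "rip";
	case RTPROT_EIGRP:
		return "eigrp";
	default:
		return NULL;
	}
}
```

A route dumper could call rtprot_name(rtm->rtm_protocol) and fall back to
printing the raw number when it gets NULL.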

^ permalink raw reply related

* Re: [PATCH mlx5-next v2 11/13] IB/mlx5: Add flow counters binding support
From: Yishai Hadas @ 2018-05-30 12:31 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Leon Romanovsky, Doug Ledford, Leon Romanovsky, RDMA mailing list,
	Boris Pismenny, Matan Barak, Raed Salem, Yishai Hadas,
	Saeed Mahameed, linux-netdev, Alex Rosenbaum
In-Reply-To: <20180529195627.GA31423@ziepe.ca>

On 5/29/2018 10:56 PM, Jason Gunthorpe wrote:
> On Tue, May 29, 2018 at 04:09:15PM +0300, Leon Romanovsky wrote:
>> diff --git a/include/uapi/rdma/mlx5-abi.h b/include/uapi/rdma/mlx5-abi.h
>> index 508ea8c82da7..ef3f430a7050 100644
>> +++ b/include/uapi/rdma/mlx5-abi.h
>> @@ -443,4 +443,18 @@ enum {
>>   enum {
>>   	MLX5_IB_CLOCK_INFO_V1              = 0,
>>   };
>> +
>> +struct mlx5_ib_flow_counters_data {
>> +	__aligned_u64   counters_data;
>> +	__u32   ncounters;
>> +	__u32   reserved;
>> +};
>> +
>> +struct mlx5_ib_create_flow {
>> +	__u32   ncounters_data;
>> +	__u32   reserved;
>> +	/* Following are counters data based on ncounters_data */
>> +	struct mlx5_ib_flow_counters_data data[];
>> +};
>> +
>>   #endif /* MLX5_ABI_USER_H */
> 
> This uapi thing still needs to be fixed as I pointed out before.

In V3 we can go with the below; there is no change in memory layout, but it
can clarify the code/usage.

struct mlx5_ib_flow_counters_desc {
         __u32   description;
         __u32   index;
};

struct mlx5_ib_flow_counters_data {
         RDMA_UAPI_PTR(struct mlx5_ib_flow_counters_desc *, counters_data);
         __u32   ncounters;
         __u32   reserved;
};

struct mlx5_ib_create_flow {
         __u32   ncounters_data;
         __u32   reserved;
         /* Following are counters data based on ncounters_data */
         struct mlx5_ib_flow_counters_data data[];
};


> I still can't figure out why this should be a 2d array.

This comes to support the future case of multiple counters objects/specs 
passed with the same flow. There is a need to differentiate mapping data 
for each counters object and that is done via the 'ncounters_data' field 
and the 2d array.

> I think it
> should be written simply as:
> 
> struct mlx5_ib_flow_counter_desc {
>          __u32 description;
>          __u32 index;
> };
> 
> struct mlx5_ib_create_flow {
> 	RDMA_UAPI_PTR(struct mlx5_ib_flow_counter_desc, counters_data);
> 	__u32   ncounters;
> 	__u32   reserved;
> };
> 
> With the corresponding changes elsewhere.
> 

This doesn't support the above use case.

> A flex array at the end of a struct means that the struct can never be
> extended again which seems like a terrible idea,

The header [1] has a fixed size and will always exist even if there are
no counters. Future extensions [2] will be added in the memory after
the flex array, whose size depends on 'ncounters_data'. This pattern
is also used in other extended APIs. [3]

struct mlx5_ib_create_flow {
         __u32   ncounters_data;
         __u32   reserved;
[1] /* Header is above ********

         /* Following are counters data based on ncounters_data */
         struct mlx5_ib_flow_counters_data data[];

[2] Future fields.

[3] 
https://elixir.bootlin.com/linux/latest/source/include/uapi/rdma/ib_user_verbs.h#L1145
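
To make the layout argument concrete, here is a small sketch of how the
start of the "future fields" area could be computed from
'ncounters_data'. The structs mirror the ones above with stdint types in
place of __u32/__aligned_u64 so that the snippet builds in user space;
the shortened struct names and future_fields_offset() are made up for
illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* User-space mirror of struct mlx5_ib_flow_counters_data. */
struct flow_counters_data {
	uint64_t counters_data;
	uint32_t ncounters;
	uint32_t reserved;
};

/* User-space mirror of struct mlx5_ib_create_flow. */
struct create_flow {
	uint32_t ncounters_data;
	uint32_t reserved;
	/* Following are counters data based on ncounters_data */
	struct flow_counters_data data[];
};

/*
 * Hypothetical helper: byte offset at which fields placed after the
 * flex array would start, given the number of array entries.
 */
size_t future_fields_offset(uint32_t ncounters_data)
{
	return offsetof(struct create_flow, data) +
	       (size_t)ncounters_data * sizeof(struct flow_counters_data);
}
```

The fixed header is 8 bytes and each array entry is 16 bytes, so with no
counters the extension area starts at offset 8, and with two entries it
starts at offset 40.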

^ permalink raw reply

* Re: [PATCH] rtnetlink: Add more well known protocol values
From: Donald Sharp @ 2018-05-30 12:32 UTC (permalink / raw)
  To: netdev, David Ahern
In-Reply-To: <20180530122732.3688-1-sharpd@cumulusnetworks.com>

This patch is intended for net-next.

thanks!

donald

On Wed, May 30, 2018 at 8:27 AM, Donald Sharp
<sharpd@cumulusnetworks.com> wrote:
> FRRouting installs routes into the kernel associated with
> the originating protocol.  Add these values to the well
> known values in rtnetlink.h.
>
> Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
> ---
> v2: Fixed whitespace issues
>  include/uapi/linux/rtnetlink.h | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
> index cabb210c93af..7d8502313c99 100644
> --- a/include/uapi/linux/rtnetlink.h
> +++ b/include/uapi/linux/rtnetlink.h
> @@ -254,6 +254,11 @@ enum {
>  #define RTPROT_DHCP    16      /* DHCP client */
>  #define RTPROT_MROUTED 17      /* Multicast daemon */
>  #define RTPROT_BABEL   42      /* Babel daemon */
> +#define RTPROT_BGP     186     /* BGP Routes */
> +#define RTPROT_ISIS    187     /* ISIS Routes */
> +#define RTPROT_OSPF    188     /* OSPF Routes */
> +#define RTPROT_RIP     189     /* RIP Routes */
> +#define RTPROT_EIGRP   192     /* EIGRP Routes */
>
>  /* rtm_scope
>
> --
> 2.14.3
>

^ permalink raw reply

* [PATCH net-next] qed: Add srq core support for RoCE and iWARP
From: Yuval Bason @ 2018-05-30 13:11 UTC (permalink / raw)
  To: yuval.bason, davem
  Cc: netdev, jgg, dledford, linux-rdma, Michal Kalderon, Ariel Elior

This patch adds support for configuring SRQ and provides the necessary
APIs for rdma upper layer driver (qedr) to enable the SRQ feature.

Signed-off-by: Michal Kalderon <michal.kalderon@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: Yuval Bason <yuval.bason@cavium.com>
---
 drivers/net/ethernet/qlogic/qed/qed_cxt.c   |   5 +-
 drivers/net/ethernet/qlogic/qed/qed_cxt.h   |   1 +
 drivers/net/ethernet/qlogic/qed/qed_hsi.h   |   2 +
 drivers/net/ethernet/qlogic/qed/qed_iwarp.c |  23 ++++
 drivers/net/ethernet/qlogic/qed/qed_main.c  |   2 +
 drivers/net/ethernet/qlogic/qed/qed_rdma.c  | 179 +++++++++++++++++++++++++++-
 drivers/net/ethernet/qlogic/qed/qed_rdma.h  |   2 +
 drivers/net/ethernet/qlogic/qed/qed_roce.c  |  17 ++-
 include/linux/qed/qed_rdma_if.h             |  12 +-
 9 files changed, 235 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qed/qed_cxt.c b/drivers/net/ethernet/qlogic/qed/qed_cxt.c
index 820b226..7ed6aa0 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_cxt.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_cxt.c
@@ -47,6 +47,7 @@
 #include "qed_hsi.h"
 #include "qed_hw.h"
 #include "qed_init_ops.h"
+#include "qed_rdma.h"
 #include "qed_reg_addr.h"
 #include "qed_sriov.h"
 
@@ -426,7 +427,7 @@ static void qed_cxt_set_srq_count(struct qed_hwfn *p_hwfn, u32 num_srqs)
 	p_mgr->srq_count = num_srqs;
 }
 
-static u32 qed_cxt_get_srq_count(struct qed_hwfn *p_hwfn)
+u32 qed_cxt_get_srq_count(struct qed_hwfn *p_hwfn)
 {
 	struct qed_cxt_mngr *p_mgr = p_hwfn->p_cxt_mngr;
 
@@ -2071,7 +2072,7 @@ static void qed_rdma_set_pf_params(struct qed_hwfn *p_hwfn,
 	u32 num_cons, num_qps, num_srqs;
 	enum protocol_type proto;
 
-	num_srqs = min_t(u32, 32 * 1024, p_params->num_srqs);
+	num_srqs = min_t(u32, QED_RDMA_MAX_SRQS, p_params->num_srqs);
 
 	if (p_hwfn->mcp_info->func_info.protocol == QED_PCI_ETH_RDMA) {
 		DP_NOTICE(p_hwfn,
diff --git a/drivers/net/ethernet/qlogic/qed/qed_cxt.h b/drivers/net/ethernet/qlogic/qed/qed_cxt.h
index a4e9586..758a8b4 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_cxt.h
+++ b/drivers/net/ethernet/qlogic/qed/qed_cxt.h
@@ -235,6 +235,7 @@ u32 qed_cxt_get_proto_tid_count(struct qed_hwfn *p_hwfn,
 				enum protocol_type type);
 u32 qed_cxt_get_proto_cid_start(struct qed_hwfn *p_hwfn,
 				enum protocol_type type);
+u32 qed_cxt_get_srq_count(struct qed_hwfn *p_hwfn);
 int qed_cxt_free_proto_ilt(struct qed_hwfn *p_hwfn, enum protocol_type proto);
 
 #define QED_CTX_WORKING_MEM 0
diff --git a/drivers/net/ethernet/qlogic/qed/qed_hsi.h b/drivers/net/ethernet/qlogic/qed/qed_hsi.h
index 8e1e6e1..82ce401 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_hsi.h
+++ b/drivers/net/ethernet/qlogic/qed/qed_hsi.h
@@ -9725,6 +9725,8 @@ enum iwarp_eqe_async_opcode {
 	IWARP_EVENT_TYPE_ASYNC_EXCEPTION_DETECTED,
 	IWARP_EVENT_TYPE_ASYNC_QP_IN_ERROR_STATE,
 	IWARP_EVENT_TYPE_ASYNC_CQ_OVERFLOW,
+	IWARP_EVENT_TYPE_ASYNC_SRQ_EMPTY,
+	IWARP_EVENT_TYPE_ASYNC_SRQ_LIMIT,
 	MAX_IWARP_EQE_ASYNC_OPCODE
 };
 
diff --git a/drivers/net/ethernet/qlogic/qed/qed_iwarp.c b/drivers/net/ethernet/qlogic/qed/qed_iwarp.c
index 2a2b101..474e6cf 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_iwarp.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_iwarp.c
@@ -271,6 +271,8 @@ int qed_iwarp_create_qp(struct qed_hwfn *p_hwfn,
 	p_ramrod->sq_num_pages = qp->sq_num_pages;
 	p_ramrod->rq_num_pages = qp->rq_num_pages;
 
+	p_ramrod->srq_id.srq_idx = cpu_to_le16(qp->srq_id);
+	p_ramrod->srq_id.opaque_fid = cpu_to_le16(p_hwfn->hw_info.opaque_fid);
 	p_ramrod->qp_handle_for_cqe.hi = cpu_to_le32(qp->qp_handle.hi);
 	p_ramrod->qp_handle_for_cqe.lo = cpu_to_le32(qp->qp_handle.lo);
 
@@ -3004,8 +3006,11 @@ static int qed_iwarp_async_event(struct qed_hwfn *p_hwfn,
 				 union event_ring_data *data,
 				 u8 fw_return_code)
 {
+	struct qed_rdma_events events = p_hwfn->p_rdma_info->events;
 	struct regpair *fw_handle = &data->rdma_data.async_handle;
 	struct qed_iwarp_ep *ep = NULL;
+	u16 srq_offset;
+	u16 srq_id;
 	u16 cid;
 
 	ep = (struct qed_iwarp_ep *)(uintptr_t)HILO_64(fw_handle->hi,
@@ -3067,6 +3072,24 @@ static int qed_iwarp_async_event(struct qed_hwfn *p_hwfn,
 		qed_iwarp_cid_cleaned(p_hwfn, cid);
 
 		break;
+	case IWARP_EVENT_TYPE_ASYNC_SRQ_EMPTY:
+		DP_NOTICE(p_hwfn, "IWARP_EVENT_TYPE_ASYNC_SRQ_EMPTY\n");
+		srq_offset = p_hwfn->p_rdma_info->srq_id_offset;
+		/* FW assigns value that is no greater than u16 */
+		srq_id = ((u16)le32_to_cpu(fw_handle->lo)) - srq_offset;
+		events.affiliated_event(events.context,
+					QED_IWARP_EVENT_SRQ_EMPTY,
+					&srq_id);
+		break;
+	case IWARP_EVENT_TYPE_ASYNC_SRQ_LIMIT:
+		DP_NOTICE(p_hwfn, "IWARP_EVENT_TYPE_ASYNC_SRQ_LIMIT\n");
+		srq_offset = p_hwfn->p_rdma_info->srq_id_offset;
+		/* FW assigns value that is no greater than u16 */
+		srq_id = ((u16)le32_to_cpu(fw_handle->lo)) - srq_offset;
+		events.affiliated_event(events.context,
+					QED_IWARP_EVENT_SRQ_LIMIT,
+					&srq_id);
+		break;
 	case IWARP_EVENT_TYPE_ASYNC_CQ_OVERFLOW:
 		DP_NOTICE(p_hwfn, "IWARP_EVENT_TYPE_ASYNC_CQ_OVERFLOW\n");
 
diff --git a/drivers/net/ethernet/qlogic/qed/qed_main.c b/drivers/net/ethernet/qlogic/qed/qed_main.c
index 68c4399..b04d57c 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_main.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_main.c
@@ -64,6 +64,7 @@
 
 #define QED_ROCE_QPS			(8192)
 #define QED_ROCE_DPIS			(8)
+#define QED_RDMA_SRQS                   QED_ROCE_QPS
 
 static char version[] =
 	"QLogic FastLinQ 4xxxx Core Module qed " DRV_MODULE_VERSION "\n";
@@ -922,6 +923,7 @@ static void qed_update_pf_params(struct qed_dev *cdev,
 	if (IS_ENABLED(CONFIG_QED_RDMA)) {
 		params->rdma_pf_params.num_qps = QED_ROCE_QPS;
 		params->rdma_pf_params.min_dpis = QED_ROCE_DPIS;
+		params->rdma_pf_params.num_srqs = QED_RDMA_SRQS;
 		/* divide by 3 the MRs to avoid MF ILT overflow */
 		params->rdma_pf_params.gl_pi = QED_ROCE_PROTOCOL_INDEX;
 	}
diff --git a/drivers/net/ethernet/qlogic/qed/qed_rdma.c b/drivers/net/ethernet/qlogic/qed/qed_rdma.c
index a411f9c..bd23659 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_rdma.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_rdma.c
@@ -259,15 +259,29 @@ static int qed_rdma_alloc(struct qed_hwfn *p_hwfn,
 		goto free_cid_map;
 	}
 
+	/* Allocate bitmap for srqs */
+	p_rdma_info->num_srqs = qed_cxt_get_srq_count(p_hwfn);
+	rc = qed_rdma_bmap_alloc(p_hwfn, &p_rdma_info->srq_map,
+				 p_rdma_info->num_srqs, "SRQ");
+	if (rc) {
+		DP_VERBOSE(p_hwfn, QED_MSG_RDMA,
+			   "Failed to allocate srq bitmap, rc = %d\n", rc);
+		goto free_real_cid_map;
+	}
+
 	if (QED_IS_IWARP_PERSONALITY(p_hwfn))
 		rc = qed_iwarp_alloc(p_hwfn);
 
 	if (rc)
-		goto free_cid_map;
+		goto free_srq_map;
 
 	DP_VERBOSE(p_hwfn, QED_MSG_RDMA, "Allocation successful\n");
 	return 0;
 
+free_srq_map:
+	kfree(p_rdma_info->srq_map.bitmap);
+free_real_cid_map:
+	kfree(p_rdma_info->real_cid_map.bitmap);
 free_cid_map:
 	kfree(p_rdma_info->cid_map.bitmap);
 free_tid_map:
@@ -351,6 +365,8 @@ static void qed_rdma_resc_free(struct qed_hwfn *p_hwfn)
 	qed_rdma_bmap_free(p_hwfn, &p_hwfn->p_rdma_info->cq_map, 1);
 	qed_rdma_bmap_free(p_hwfn, &p_hwfn->p_rdma_info->toggle_bits, 0);
 	qed_rdma_bmap_free(p_hwfn, &p_hwfn->p_rdma_info->tid_map, 1);
+	qed_rdma_bmap_free(p_hwfn, &p_hwfn->p_rdma_info->srq_map, 1);
+	qed_rdma_bmap_free(p_hwfn, &p_hwfn->p_rdma_info->real_cid_map, 1);
 
 	kfree(p_rdma_info->port);
 	kfree(p_rdma_info->dev);
@@ -431,6 +447,12 @@ static void qed_rdma_init_devinfo(struct qed_hwfn *p_hwfn,
 	if (cdev->rdma_max_sge)
 		dev->max_sge = min_t(u32, cdev->rdma_max_sge, dev->max_sge);
 
+	dev->max_srq_sge = QED_RDMA_MAX_SGE_PER_SRQ_WQE;
+	if (p_hwfn->cdev->rdma_max_srq_sge) {
+		dev->max_srq_sge = min_t(u32,
+					 p_hwfn->cdev->rdma_max_srq_sge,
+					 dev->max_srq_sge);
+	}
 	dev->max_inline = ROCE_REQ_MAX_INLINE_DATA_SIZE;
 
 	dev->max_inline = (cdev->rdma_max_inline) ?
@@ -474,6 +496,8 @@ static void qed_rdma_init_devinfo(struct qed_hwfn *p_hwfn,
 	dev->max_mr_mw_fmr_size = dev->max_mr_mw_fmr_pbl * PAGE_SIZE;
 	dev->max_pkey = QED_RDMA_MAX_P_KEY;
 
+	dev->max_srq = p_hwfn->p_rdma_info->num_srqs;
+	dev->max_srq_wr = QED_RDMA_MAX_SRQ_WQE_ELEM;
 	dev->max_qp_resp_rd_atomic_resc = RDMA_RING_PAGE_SIZE /
 					  (RDMA_RESP_RD_ATOMIC_ELM_SIZE * 2);
 	dev->max_qp_req_rd_atomic_resc = RDMA_RING_PAGE_SIZE /
@@ -1628,6 +1652,156 @@ static void *qed_rdma_get_rdma_ctx(struct qed_dev *cdev)
 	return QED_LEADING_HWFN(cdev);
 }
 
+int qed_rdma_modify_srq(void *rdma_cxt,
+			struct qed_rdma_modify_srq_in_params *in_params)
+{
+	struct rdma_srq_modify_ramrod_data *p_ramrod;
+	struct qed_hwfn *p_hwfn = rdma_cxt;
+	struct qed_sp_init_data init_data;
+	struct qed_spq_entry *p_ent;
+	u16 opaque_fid;
+	int rc;
+
+	memset(&init_data, 0, sizeof(init_data));
+	init_data.opaque_fid = p_hwfn->hw_info.opaque_fid;
+	init_data.comp_mode = QED_SPQ_MODE_EBLOCK;
+
+	rc = qed_sp_init_request(p_hwfn, &p_ent,
+				 RDMA_RAMROD_MODIFY_SRQ,
+				 p_hwfn->p_rdma_info->proto, &init_data);
+	if (rc)
+		return rc;
+
+	p_ramrod = &p_ent->ramrod.rdma_modify_srq;
+	p_ramrod->srq_id.srq_idx = cpu_to_le16(in_params->srq_id);
+	opaque_fid = p_hwfn->hw_info.opaque_fid;
+	p_ramrod->srq_id.opaque_fid = cpu_to_le16(opaque_fid);
+	p_ramrod->wqe_limit = cpu_to_le16(in_params->wqe_limit);
+
+	rc = qed_spq_post(p_hwfn, p_ent, NULL);
+	if (rc)
+		return rc;
+
+	DP_VERBOSE(p_hwfn, QED_MSG_RDMA, "modified SRQ id = %x\n",
+		   in_params->srq_id);
+
+	return rc;
+}
+
+int qed_rdma_destroy_srq(void *rdma_cxt,
+			 struct qed_rdma_destroy_srq_in_params *in_params)
+{
+	struct rdma_srq_destroy_ramrod_data *p_ramrod;
+	struct qed_hwfn *p_hwfn = rdma_cxt;
+	struct qed_sp_init_data init_data;
+	struct qed_spq_entry *p_ent;
+	struct qed_bmap *bmap;
+	u16 opaque_fid;
+	int rc;
+
+	opaque_fid = p_hwfn->hw_info.opaque_fid;
+
+	memset(&init_data, 0, sizeof(init_data));
+	init_data.opaque_fid = opaque_fid;
+	init_data.comp_mode = QED_SPQ_MODE_EBLOCK;
+
+	rc = qed_sp_init_request(p_hwfn, &p_ent,
+				 RDMA_RAMROD_DESTROY_SRQ,
+				 p_hwfn->p_rdma_info->proto, &init_data);
+	if (rc)
+		return rc;
+
+	p_ramrod = &p_ent->ramrod.rdma_destroy_srq;
+	p_ramrod->srq_id.srq_idx = cpu_to_le16(in_params->srq_id);
+	p_ramrod->srq_id.opaque_fid = cpu_to_le16(opaque_fid);
+
+	rc = qed_spq_post(p_hwfn, p_ent, NULL);
+	if (rc)
+		return rc;
+
+	bmap = &p_hwfn->p_rdma_info->srq_map;
+
+	spin_lock_bh(&p_hwfn->p_rdma_info->lock);
+	qed_bmap_release_id(p_hwfn, bmap, in_params->srq_id);
+	spin_unlock_bh(&p_hwfn->p_rdma_info->lock);
+
+	DP_VERBOSE(p_hwfn, QED_MSG_RDMA, "SRQ destroyed Id = %x\n",
+		   in_params->srq_id);
+
+	return rc;
+}
+
+int qed_rdma_create_srq(void *rdma_cxt,
+			struct qed_rdma_create_srq_in_params *in_params,
+			struct qed_rdma_create_srq_out_params *out_params)
+{
+	struct rdma_srq_create_ramrod_data *p_ramrod;
+	struct qed_hwfn *p_hwfn = rdma_cxt;
+	struct qed_sp_init_data init_data;
+	enum qed_cxt_elem_type elem_type;
+	struct qed_spq_entry *p_ent;
+	u16 opaque_fid, srq_id;
+	struct qed_bmap *bmap;
+	u32 returned_id;
+	int rc;
+
+	bmap = &p_hwfn->p_rdma_info->srq_map;
+	spin_lock_bh(&p_hwfn->p_rdma_info->lock);
+	rc = qed_rdma_bmap_alloc_id(p_hwfn, bmap, &returned_id);
+	spin_unlock_bh(&p_hwfn->p_rdma_info->lock);
+
+	if (rc) {
+		DP_NOTICE(p_hwfn, "failed to allocate srq id\n");
+		return rc;
+	}
+
+	elem_type = QED_ELEM_SRQ;
+	rc = qed_cxt_dynamic_ilt_alloc(p_hwfn, elem_type, returned_id);
+	if (rc)
+		goto err;
+	/* returned id is no greater than u16 */
+	srq_id = (u16)returned_id;
+	opaque_fid = p_hwfn->hw_info.opaque_fid;
+
+	memset(&init_data, 0, sizeof(init_data));
+	opaque_fid = p_hwfn->hw_info.opaque_fid;
+	init_data.opaque_fid = opaque_fid;
+	init_data.comp_mode = QED_SPQ_MODE_EBLOCK;
+
+	rc = qed_sp_init_request(p_hwfn, &p_ent,
+				 RDMA_RAMROD_CREATE_SRQ,
+				 p_hwfn->p_rdma_info->proto, &init_data);
+	if (rc)
+		goto err;
+
+	p_ramrod = &p_ent->ramrod.rdma_create_srq;
+	DMA_REGPAIR_LE(p_ramrod->pbl_base_addr, in_params->pbl_base_addr);
+	p_ramrod->pages_in_srq_pbl = cpu_to_le16(in_params->num_pages);
+	p_ramrod->pd_id = cpu_to_le16(in_params->pd_id);
+	p_ramrod->srq_id.srq_idx = cpu_to_le16(srq_id);
+	p_ramrod->srq_id.opaque_fid = cpu_to_le16(opaque_fid);
+	p_ramrod->page_size = cpu_to_le16(in_params->page_size);
+	DMA_REGPAIR_LE(p_ramrod->producers_addr, in_params->prod_pair_addr);
+
+	rc = qed_spq_post(p_hwfn, p_ent, NULL);
+	if (rc)
+		goto err;
+
+	out_params->srq_id = srq_id;
+
+	DP_VERBOSE(p_hwfn, QED_MSG_RDMA,
+		   "SRQ created Id = %x\n", out_params->srq_id);
+
+	return rc;
+
+err:
+	spin_lock_bh(&p_hwfn->p_rdma_info->lock);
+	qed_bmap_release_id(p_hwfn, bmap, returned_id);
+	spin_unlock_bh(&p_hwfn->p_rdma_info->lock);
+
+	return rc;
+}
+
 bool qed_rdma_allocated_qps(struct qed_hwfn *p_hwfn)
 {
 	bool result;
@@ -1773,6 +1947,9 @@ static int qed_roce_ll2_set_mac_filter(struct qed_dev *cdev,
 	.rdma_free_tid = &qed_rdma_free_tid,
 	.rdma_register_tid = &qed_rdma_register_tid,
 	.rdma_deregister_tid = &qed_rdma_deregister_tid,
+	.rdma_create_srq = &qed_rdma_create_srq,
+	.rdma_modify_srq = &qed_rdma_modify_srq,
+	.rdma_destroy_srq = &qed_rdma_destroy_srq,
 	.ll2_acquire_connection = &qed_ll2_acquire_connection,
 	.ll2_establish_connection = &qed_ll2_establish_connection,
 	.ll2_terminate_connection = &qed_ll2_terminate_connection,
diff --git a/drivers/net/ethernet/qlogic/qed/qed_rdma.h b/drivers/net/ethernet/qlogic/qed/qed_rdma.h
index 18ec9cb..6f722ee 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_rdma.h
+++ b/drivers/net/ethernet/qlogic/qed/qed_rdma.h
@@ -96,6 +96,8 @@ struct qed_rdma_info {
 	u8 num_cnqs;
 	u32 num_qps;
 	u32 num_mrs;
+	u32 num_srqs;
+	u16 srq_id_offset;
 	u16 queue_zone_base;
 	u16 max_queue_zones;
 	enum protocol_type proto;
diff --git a/drivers/net/ethernet/qlogic/qed/qed_roce.c b/drivers/net/ethernet/qlogic/qed/qed_roce.c
index 6acfd43..ee57fcd 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_roce.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_roce.c
@@ -65,6 +65,8 @@
 		     u8 fw_event_code,
 		     u16 echo, union event_ring_data *data, u8 fw_return_code)
 {
+	struct qed_rdma_events events = p_hwfn->p_rdma_info->events;
+
 	if (fw_event_code == ROCE_ASYNC_EVENT_DESTROY_QP_DONE) {
 		u16 icid =
 		    (u16)le32_to_cpu(data->rdma_data.rdma_destroy_qp_data.cid);
@@ -75,11 +77,18 @@
 		 */
 		qed_roce_free_real_icid(p_hwfn, icid);
 	} else {
-		struct qed_rdma_events *events = &p_hwfn->p_rdma_info->events;
+		if (fw_event_code == ROCE_ASYNC_EVENT_SRQ_EMPTY ||
+		    fw_event_code == ROCE_ASYNC_EVENT_SRQ_LIMIT) {
+			u16 srq_id = (u16)data->rdma_data.async_handle.lo;
+
+			events.affiliated_event(events.context, fw_event_code,
+						&srq_id);
+		} else {
+			union rdma_eqe_data rdata = data->rdma_data;
 
-		events->affiliated_event(p_hwfn->p_rdma_info->events.context,
-					 fw_event_code,
-				     (void *)&data->rdma_data.async_handle);
+			events.affiliated_event(events.context, fw_event_code,
+						(void *)&rdata.async_handle);
+		}
 	}
 
 	return 0;
diff --git a/include/linux/qed/qed_rdma_if.h b/include/linux/qed/qed_rdma_if.h
index 4dd72ba..e05e320 100644
--- a/include/linux/qed/qed_rdma_if.h
+++ b/include/linux/qed/qed_rdma_if.h
@@ -485,7 +485,9 @@ enum qed_iwarp_event_type {
 	QED_IWARP_EVENT_ACTIVE_MPA_REPLY,
 	QED_IWARP_EVENT_LOCAL_ACCESS_ERROR,
 	QED_IWARP_EVENT_REMOTE_OPERATION_ERROR,
-	QED_IWARP_EVENT_TERMINATE_RECEIVED
+	QED_IWARP_EVENT_TERMINATE_RECEIVED,
+	QED_IWARP_EVENT_SRQ_LIMIT,
+	QED_IWARP_EVENT_SRQ_EMPTY,
 };
 
 enum qed_tcp_ip_version {
@@ -646,6 +648,14 @@ struct qed_rdma_ops {
 	int (*rdma_alloc_tid)(void *rdma_cxt, u32 *itid);
 	void (*rdma_free_tid)(void *rdma_cxt, u32 itid);
 
+	int (*rdma_create_srq)(void *rdma_cxt,
+			       struct qed_rdma_create_srq_in_params *iparams,
+			       struct qed_rdma_create_srq_out_params *oparams);
+	int (*rdma_destroy_srq)(void *rdma_cxt,
+				struct qed_rdma_destroy_srq_in_params *iparams);
+	int (*rdma_modify_srq)(void *rdma_cxt,
+			       struct qed_rdma_modify_srq_in_params *iparams);
+
 	int (*ll2_acquire_connection)(void *rdma_cxt,
 				      struct qed_ll2_acquire_data *data);
 
-- 
1.8.3.1
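
The create path in qed_rdma_create_srq() above reserves an id from the SRQ
bitmap first and releases it again if a later step fails. A minimal
user-space sketch of that allocate-then-roll-back pattern follows. Every
sketch_* name, the fixed table size, and the -EINVAL return are hypothetical
stand-ins chosen for illustration; the sketch does not use the qed helpers,
and it omits the spin_lock_bh() that the driver takes around bitmap access.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's SRQ id bitmap. */
#define SKETCH_MAX_SRQS 8u

struct sketch_bmap {
	bool used[SKETCH_MAX_SRQS];
};

/* Claim the lowest free id; the error code is this sketch's own choice. */
static int sketch_bmap_alloc_id(struct sketch_bmap *bmap, uint32_t *id)
{
	for (uint32_t i = 0; i < SKETCH_MAX_SRQS; i++) {
		if (!bmap->used[i]) {
			bmap->used[i] = true;
			*id = i;
			return 0;
		}
	}
	return -EINVAL;
}

/* Return an id to the pool; an out-of-range id is a caller bug here. */
static void sketch_bmap_release_id(struct sketch_bmap *bmap, uint32_t id)
{
	assert(id < SKETCH_MAX_SRQS);
	bmap->used[id] = false;
}

/*
 * Create path: reserve an id, run the step that may fail (step_rc stands
 * in for the ILT allocation and the ramrod post), and release the id on
 * failure so it can be handed out again.
 */
static int sketch_create_srq(struct sketch_bmap *bmap, int step_rc,
			     uint16_t *srq_id)
{
	uint32_t returned_id;
	int rc;

	rc = sketch_bmap_alloc_id(bmap, &returned_id);
	if (rc)
		return rc;

	rc = step_rc;
	if (rc)
		goto err;

	/* The table is small, so the id always fits in 16 bits. */
	*srq_id = (uint16_t)returned_id;
	return 0;

err:
	sketch_bmap_release_id(bmap, returned_id);
	return rc;
}
```

Calling sketch_create_srq() with a non-zero step_rc leaves the bitmap
unchanged, so the next successful call is handed the same id that the failed
attempt had reserved.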

^ permalink raw reply related

* Re: [RFC PATCH ghak32 V2 00/13] audit: implement container id
From: Steve Grubb @ 2018-05-30 13:20 UTC (permalink / raw)
  To: linux-audit-H+wXaHxf7aLQT0dZR+AlfA
  Cc: simo-H+wXaHxf7aLQT0dZR+AlfA, jlayton-H+wXaHxf7aLQT0dZR+AlfA,
	linux-api-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, LKML,
	eparis-FjpueFixGhCM4zKIHC2jIg, dhowells-H+wXaHxf7aLQT0dZR+AlfA,
	carlos-H+wXaHxf7aLQT0dZR+AlfA, ebiederm-aS9lmoZGLiVWk0Htik3J/w,
	luto-DgEjT+Ai2ygdnm+yROfE0A, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	cgroups-u79uwXL29TY76Z2rM5mHXA,
	viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn
In-Reply-To: <cover.1521179281.git.rgb-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

On Friday, March 16, 2018 5:00:27 AM EDT Richard Guy Briggs wrote:
> Implement audit kernel container ID.
> 
> This patchset is a second RFC based on the proposal document (V3)
> posted:
> 	https://www.redhat.com/archives/linux-audit/2018-January/msg00014.html

So, if you work on a container orchestrator, how exactly is this set of 
interfaces to be used and in what order?

Thanks,
-Steve

> The first patch implements the proc fs write to set the audit container
> ID of a process, emitting an AUDIT_CONTAINER record to announce the
> registration of that container ID on that process.  This patch requires
> userspace support for record acceptance and proper type display.
> 
> The second checks for children or co-threads and refuses to set the
> container ID if either are present.  (This policy could be changed to
> set both with the same container ID provided they meet the rest of the
> requirements.)
> 
> The third implements the auxiliary record AUDIT_CONTAINER_INFO if a
> container ID is identifiable with an event.  This patch requires
> userspace support for proper type display.
> 
> The fourth adds container ID filtering to the exit, exclude and user
> lists.  This patch requires auditctl userspace support for the
> --containerid option.
> 
> The 5th adds signal and ptrace support.
> 
> The 6th creates a local audit context to be able to bind a standalone
> record with a locally created auxiliary record.
> 
> The 7th, 8th, 9th, 10th patches add container ID records to standalone
> records.  Some of these may end up being syscall auxiliary records and
> won't need this specific support since they'll be supported via
> syscalls.
> 
> The 11th adds network namespace container ID labelling based on member
> tasks' container ID labels.
> 
> The 12th adds container ID support to standalone netfilter records that
> don't have a task context and lists each container to which that net
> namespace belongs.
> 
> The 13th implements reading the container ID from the proc filesystem
> for debugging.  This patch isn't planned for upstream inclusion.
> 
> Feedback please!
> 
> Example: Set a container ID of 123456 to the "sleep" task:
> 	sleep 2&
> 	child=$!
> 	echo 123456 > /proc/$child/containerid; echo $?
> 	ausearch -ts recent -m container
> 	echo child:$child contid:$( cat /proc/$child/containerid)
> This should produce a record such as:
> 	type=CONTAINER msg=audit(1521122590.315:222): op=set pid=689 uid=0
> subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 auid=0 tty=pts0
> ses=3 opid=707 old-contid=18446744073709551615 contid=123456 res=1
> 
> Example: Set a filter on a container ID 123459 on /tmp/tmpcontainerid:
> 	containerid=123459
> 	key=tmpcontainerid
> 	auditctl -a exit,always -F dir=/tmp -F perm=wa -F containerid=$containerid -F key=$key
> 	perl -e "sleep 1; open(my \$tmpfile, '>', \"/tmp/$key\"); close(\$tmpfile);" &
> 	child=$!
> 	echo $containerid > /proc/$child/containerid
> 	sleep 2
> 	ausearch -i -ts recent -k $key
> 	auditctl -d exit,always -F dir=/tmp -F perm=wa -F containerid=$containerid -F key=$key
> 	rm -f /tmp/$key
> This should produce an event such as:
> 	type=CONTAINER_INFO msg=audit(1521122591.614:227): op=task contid=123459
> 	type=PROCTITLE msg=audit(1521122591.614:227):
> proctitle=7065726C002D6500736C65657020313B206F70656E286D792024746D7066696C
> 652C20273E272C20222F746D702F746D70636F6E7461696E6572696422293B20636C6F73652
> 824746D7066696C65293B
> 	type=PATH msg=audit(1521122591.614:227): item=1
> name="/tmp/tmpcontainerid" inode=18427 dev=00:26 mode=0100644 ouid=0
> ogid=0 rdev=00:00 obj=unconfined_u:object_r:user_tmp_t:s0 nametype=CREATE
> cap_fp=0000000000000000 cap_fi=0000000000000000 cap_fe=0 cap_fver=0
> 	type=PATH msg=audit(1521122591.614:227): item=0 name="/tmp/" inode=13513
> dev=00:26 mode=041777 ouid=0 ogid=0 rdev=00:00
> obj=system_u:object_r:tmp_t:s0 nametype=PARENT cap_fp=0000000000000000
> cap_fi=0000000000000000 cap_fe=0 cap_fver=0
> 	type=CWD msg=audit(1521122591.614:227): cwd="/root"
> 	type=SYSCALL msg=audit(1521122591.614:227): arch=c000003e syscall=257
> success=yes exit=3 a0=ffffffffffffff9c a1=55db90a28900 a2=241 a3=1b6
> items=2 ppid=689 pid=724 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0
> sgid=0 fsgid=0 tty=pts0 ses=3 comm="perl" exe="/usr/bin/perl"
> subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
> key="tmpcontainerid"
> 
> See:
> 	https://github.com/linux-audit/audit-kernel/issues/32
> 	https://github.com/linux-audit/audit-userspace/issues/40
> 	https://github.com/linux-audit/audit-testsuite/issues/64
> 
> Richard Guy Briggs (13):
>   audit: add container id
>   audit: check children and threading before allowing containerid
>   audit: log container info of syscalls
>   audit: add containerid filtering
>   audit: add containerid support for ptrace and signals
>   audit: add support for non-syscall auxiliary records
>   audit: add container aux record to watch/tree/mark
>   audit: add containerid support for tty_audit
>   audit: add containerid support for config/feature/user records
>   audit: add containerid support for seccomp and anom_abend records
>   audit: add support for containerid to network namespaces
>   audit: NETFILTER_PKT: record each container ID associated with a netNS
>   debug audit: read container ID of a process
> 
>  drivers/tty/tty_audit.c     |   5 +-
>  fs/proc/base.c              |  53 ++++++++++++++++
>  include/linux/audit.h       |  43 +++++++++++++
>  include/linux/init_task.h   |   4 +-
>  include/linux/sched.h       |   1 +
>  include/net/net_namespace.h |  12 ++++
>  include/uapi/linux/audit.h  |   8 ++-
>  kernel/audit.c              |  75 ++++++++++++++++++++---
>  kernel/audit.h              |   3 +
>  kernel/audit_fsnotify.c     |   5 +-
>  kernel/audit_tree.c         |   5 +-
>  kernel/audit_watch.c        |  33 +++++-----
>  kernel/auditfilter.c        |  52 +++++++++++++++-
>  kernel/auditsc.c            | 145 ++++++++++++++++++++++++++++++++++++++++++--
>  kernel/nsproxy.c            |   6 ++
>  net/core/net_namespace.c    |  45 ++++++++++++++
>  net/netfilter/xt_AUDIT.c    |  15 ++++-
>  17 files changed, 473 insertions(+), 37 deletions(-)

^ permalink raw reply
