* [PATCH net 0/2] ip[6] tunnels: fix mtu calculations
From: Nicolas Dichtel @ 2018-05-30 8:28 UTC (permalink / raw)
To: davem, petrm, idosch; +Cc: netdev
The first patch restores the possibility to bind an ip4 tunnel to an
interface with a large mtu.
The second patch was spotted after the first fix. I also target it to net
because it fixes the max mtu value that can be used for ipv6 tunnels.
net/ipv4/ip_tunnel.c | 6 +++---
net/ipv6/ip6_tunnel.c | 11 ++++++++---
net/ipv6/sit.c | 5 +++--
3 files changed, 14 insertions(+), 8 deletions(-)
Comments are welcome,
Regards,
Nicolas
^ permalink raw reply
* [PATCH net 1/2] ip_tunnel: restore binding to ifaces with a large mtu
From: Nicolas Dichtel @ 2018-05-30 8:28 UTC (permalink / raw)
To: davem, petrm, idosch; +Cc: netdev, Nicolas Dichtel
In-Reply-To: <20180530082843.6076-1-nicolas.dichtel@6wind.com>
After commit f6cc9c054e77, the following conf is broken (note that the
default loopback mtu is 65536, ie IP_MAX_MTU + 1):
$ ip tunnel add gre1 mode gre local 10.125.0.1 remote 10.125.0.2 dev lo
add tunnel "gre0" failed: Invalid argument
$ ip l a type dummy
$ ip l s dummy1 up
$ ip l s dummy1 mtu 65535
$ ip tunnel add gre1 mode gre local 10.125.0.1 remote 10.125.0.2 dev dummy1
add tunnel "gre0" failed: Invalid argument
dev_set_mtu() doesn't allow setting an mtu which is too large.
First, let's cap the mtu returned by ip_tunnel_bind_dev(). Second, remove
the magic value 0xFFF8 and use IP_MAX_MTU instead.
0xFFF8 seems to have been there for ages; I don't know why this value was used.
With a recent kernel, it's also possible to set a mtu > IP_MAX_MTU:
$ ip l s dummy1 mtu 66000
After that patch, it's also possible to bind an ip tunnel on that kind of
interface.
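
As an illustration of the two changes, here is a minimal userspace sketch, not
the kernel code itself. The helper names bound_mtu() and tunnel_max_mtu() are
made up for this example, and IP_MAX_MTU is assumed to be 0xFFFF, which matches
the loopback mtu of 65536 being IP_MAX_MTU + 1.

```c
/* Assumed value: 65535, so that the loopback mtu (65536) is IP_MAX_MTU + 1. */
#define IP_MAX_MTU 0xFFFFU

/* Hypothetical helper: cap the mtu inherited from the underlying device. */
static unsigned int bound_mtu(unsigned int lower_mtu)
{
	return lower_mtu < IP_MAX_MTU ? lower_mtu : IP_MAX_MTU;
}

/* Hypothetical helper: largest mtu the tunnel device may be given. */
static unsigned int tunnel_max_mtu(unsigned int hard_header_len,
				   unsigned int t_hlen)
{
	return IP_MAX_MTU - hard_header_len - t_hlen;
}
```

With these definitions, bound_mtu(65536) and bound_mtu(66000) both give 65535.
For a 24-byte tunnel header (20-byte IPv4 header plus a 4-byte GRE header) and
a hard_header_len of 0, tunnel_max_mtu() gives 65511, where the old 0xFFF8
constant would have given 65504.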
CC: Petr Machata <petrm@mellanox.com>
CC: Ido Schimmel <idosch@mellanox.com>
Link: https://git.kernel.org/pub/scm/linux/kernel/git/davem/netdev-vger-cvs.git/commit/?id=e5afd356a411a
Fixes: f6cc9c054e77 ("ip_tunnel: Emit events for post-register MTU changes")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
net/ipv4/ip_tunnel.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index 6b0e362cc99b..3b39c72a1029 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -328,7 +328,7 @@ static int ip_tunnel_bind_dev(struct net_device *dev)
if (tdev) {
hlen = tdev->hard_header_len + tdev->needed_headroom;
- mtu = tdev->mtu;
+ mtu = min(tdev->mtu, IP_MAX_MTU);
}
dev->needed_headroom = t_hlen + hlen;
@@ -362,7 +362,7 @@ static struct ip_tunnel *ip_tunnel_create(struct net *net,
nt = netdev_priv(dev);
t_hlen = nt->hlen + sizeof(struct iphdr);
dev->min_mtu = ETH_MIN_MTU;
- dev->max_mtu = 0xFFF8 - dev->hard_header_len - t_hlen;
+ dev->max_mtu = IP_MAX_MTU - dev->hard_header_len - t_hlen;
ip_tunnel_add(itn, nt);
return nt;
@@ -930,7 +930,7 @@ int __ip_tunnel_change_mtu(struct net_device *dev, int new_mtu, bool strict)
{
struct ip_tunnel *tunnel = netdev_priv(dev);
int t_hlen = tunnel->hlen + sizeof(struct iphdr);
- int max_mtu = 0xFFF8 - dev->hard_header_len - t_hlen;
+ int max_mtu = IP_MAX_MTU - dev->hard_header_len - t_hlen;
if (new_mtu < ETH_MIN_MTU)
return -EINVAL;
--
2.15.1
^ permalink raw reply related
* [PATCH net 2/2] ip6_tunnel: remove magic mtu value 0xFFF8
From: Nicolas Dichtel @ 2018-05-30 8:28 UTC (permalink / raw)
To: davem, petrm, idosch; +Cc: netdev, Nicolas Dichtel
In-Reply-To: <20180530082843.6076-1-nicolas.dichtel@6wind.com>
I don't know where this value comes from (probably a copy and paste and
paste and paste ...).
Let's use standard values which are a bit greater.
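
A rough userspace sketch of the new upper-bound check in ip6_tnl_change_mtu(),
for illustration only. The helper name tnl_mtu_within_limit() is made up, and
the constants are assumptions: IP_MAX_MTU as 0xFFFF and IP6_MAX_MTU as 0xFFFF
plus the 40-byte IPv6 header.

```c
#include <stdbool.h>

#define IP_MAX_MTU    0xFFFFU                  /* assumed: 65535 */
#define IPV6_HDR_LEN  40U                      /* fixed IPv6 header size */
#define IP6_MAX_MTU   (0xFFFFU + IPV6_HDR_LEN) /* assumed: 65575 */
#define PROTO_IPV6    41                       /* value of IPPROTO_IPV6 */

/* Hypothetical helper mirroring the upper-bound check in the patch. */
static bool tnl_mtu_within_limit(int proto, unsigned int new_mtu,
				 unsigned int hard_header_len)
{
	/* IPv6 (or "any") payload gets the IPv6 ceiling, the rest the IPv4 one. */
	unsigned int ceiling = (proto == PROTO_IPV6 || proto == 0) ?
			       IP6_MAX_MTU : IP_MAX_MTU;

	return new_mtu <= ceiling - hard_header_len;
}
```

Under these assumptions, an IPv6 payload tunnel with no hard header accepts an
mtu up to 65575, while an IPv4 payload tunnel stops at 65535.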
Link: https://git.kernel.org/pub/scm/linux/kernel/git/davem/netdev-vger-cvs.git/commit/?id=e5afd356a411a
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
net/ipv6/ip6_tunnel.c | 11 ++++++++---
net/ipv6/sit.c | 5 +++--
2 files changed, 11 insertions(+), 5 deletions(-)
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index da66aaac51ce..00e138a44cbb 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -1692,8 +1692,13 @@ int ip6_tnl_change_mtu(struct net_device *dev, int new_mtu)
if (new_mtu < ETH_MIN_MTU)
return -EINVAL;
}
- if (new_mtu > 0xFFF8 - dev->hard_header_len)
- return -EINVAL;
+ if (tnl->parms.proto == IPPROTO_IPV6 || tnl->parms.proto == 0) {
+ if (new_mtu > IP6_MAX_MTU - dev->hard_header_len)
+ return -EINVAL;
+ } else {
+ if (new_mtu > IP_MAX_MTU - dev->hard_header_len)
+ return -EINVAL;
+ }
dev->mtu = new_mtu;
return 0;
}
@@ -1841,7 +1846,7 @@ ip6_tnl_dev_init_gen(struct net_device *dev)
if (!(t->parms.flags & IP6_TNL_F_IGN_ENCAP_LIMIT))
dev->mtu -= 8;
dev->min_mtu = ETH_MIN_MTU;
- dev->max_mtu = 0xFFF8 - dev->hard_header_len;
+ dev->max_mtu = IP6_MAX_MTU - dev->hard_header_len;
return 0;
diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index 2afce37a7177..e9400ffa7875 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -1371,7 +1371,7 @@ static void ipip6_tunnel_setup(struct net_device *dev)
dev->hard_header_len = LL_MAX_HEADER + t_hlen;
dev->mtu = ETH_DATA_LEN - t_hlen;
dev->min_mtu = IPV6_MIN_MTU;
- dev->max_mtu = 0xFFF8 - t_hlen;
+ dev->max_mtu = IP6_MAX_MTU - t_hlen;
dev->flags = IFF_NOARP;
netif_keep_dst(dev);
dev->addr_len = 4;
@@ -1583,7 +1583,8 @@ static int ipip6_newlink(struct net *src_net, struct net_device *dev,
if (tb[IFLA_MTU]) {
u32 mtu = nla_get_u32(tb[IFLA_MTU]);
- if (mtu >= IPV6_MIN_MTU && mtu <= 0xFFF8 - dev->hard_header_len)
+ if (mtu >= IPV6_MIN_MTU &&
+ mtu <= IP6_MAX_MTU - dev->hard_header_len)
dev->mtu = mtu;
}
--
2.15.1
^ permalink raw reply related
* Re: [PATCH net] net/sonic: Use dma_mapping_error()
From: Tom Bogendoerfer @ 2018-05-30 9:01 UTC (permalink / raw)
To: Finn Thain; +Cc: David S. Miller, netdev, linux-kernel
In-Reply-To: <cba8175deaf9d631ae000088aea1ccf1c444909b.1527649393.git.fthain@telegraphics.com.au>
On Wed, May 30, 2018 at 01:03:51PM +1000, Finn Thain wrote:
> With CONFIG_DMA_API_DEBUG=y, calling sonic_open() produces the
> message, "DMA-API: device driver failed to check map error".
> Add the missing dma_mapping_error() call.
>
> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
> Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
> ---
> drivers/net/ethernet/natsemi/sonic.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/natsemi/sonic.c b/drivers/net/ethernet/natsemi/sonic.c
> index 7ed08486ae23..c805dcbebd02 100644
> --- a/drivers/net/ethernet/natsemi/sonic.c
> +++ b/drivers/net/ethernet/natsemi/sonic.c
> @@ -84,7 +84,7 @@ static int sonic_open(struct net_device *dev)
> for (i = 0; i < SONIC_NUM_RRS; i++) {
> dma_addr_t laddr = dma_map_single(lp->device, skb_put(lp->rx_skb[i], SONIC_RBSIZE),
> SONIC_RBSIZE, DMA_FROM_DEVICE);
> - if (!laddr) {
> + if (dma_mapping_error(lp->device, laddr)) {
> while(i > 0) { /* free any that were mapped successfully */
> i--;
> dma_unmap_single(lp->device, lp->rx_laddr[i], SONIC_RBSIZE, DMA_FROM_DEVICE);
> --
> 2.16.1
looks good
Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Thomas.
--
Crap can work. Given enough thrust pigs will fly, but it's not necessarily a
good idea. [ RFC1925, 2.3 ]
^ permalink raw reply
* [PATCH V2] brcmfmac: stop watchdog before detach and free everything
From: Michael Trimarchi @ 2018-05-30 9:06 UTC (permalink / raw)
To: Arend van Spriel
Cc: Franky Lin, Hante Meuleman, Chi-Hsien Lin, Wright Feng,
Kalle Valo, David S. Miller, Pieter-Paul Giesberts, Ian Molton,
linux-wireless, brcm80211-dev-list.pdl, brcm80211-dev-list,
netdev, linux-kernel
In-Reply-To: <5B0D1C9E.4000800@broadcom.com>
Using the driver built into the kernel image without firmware in the filesystem
or in the kernel image can lead to a kernel NULL pointer dereference.
The watchdog needs to be stopped in brcmf_sdio_remove.
The system is going down NOW!
[ 1348.110759] Unable to handle kernel NULL pointer dereference at virtual address 000002f8
Sent SIGTERM to all processes
[ 1348.121412] Mem abort info:
[ 1348.126962] ESR = 0x96000004
[ 1348.130023] Exception class = DABT (current EL), IL = 32 bits
[ 1348.135948] SET = 0, FnV = 0
[ 1348.138997] EA = 0, S1PTW = 0
[ 1348.142154] Data abort info:
[ 1348.145045] ISV = 0, ISS = 0x00000004
[ 1348.148884] CM = 0, WnR = 0
[ 1348.151861] user pgtable: 4k pages, 48-bit VAs, pgdp = (____ptrval____)
[ 1348.158475] [00000000000002f8] pgd=0000000000000000
[ 1348.163364] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[ 1348.168927] Modules linked in: ipv6
[ 1348.172421] CPU: 3 PID: 1421 Comm: brcmf_wdog/mmc0 Not tainted 4.17.0-rc5-next-20180517 #18
[ 1348.180757] Hardware name: Amarula A64-Relic (DT)
[ 1348.185455] pstate: 60000005 (nZCv daif -PAN -UAO)
[ 1348.190251] pc : brcmf_sdiod_freezer_count+0x0/0x20
[ 1348.195124] lr : brcmf_sdio_watchdog_thread+0x64/0x290
[ 1348.200253] sp : ffff00000b85be30
[ 1348.203561] x29: ffff00000b85be30 x28: 0000000000000000
[ 1348.208868] x27: ffff00000b6cb918 x26: ffff80003b990638
[ 1348.214176] x25: ffff0000087b1a20 x24: ffff80003b94f800
[ 1348.219483] x23: ffff000008e620c8 x22: ffff000008f0b660
[ 1348.224790] x21: ffff000008c6a858 x20: 00000000fffffe00
[ 1348.230097] x19: ffff80003b94f800 x18: 0000000000000001
[ 1348.235404] x17: 0000ffffab2e8a74 x16: ffff0000080d7de8
[ 1348.240711] x15: 0000000000000000 x14: 0000000000000400
[ 1348.246018] x13: 0000000000000400 x12: 0000000000000001
[ 1348.251324] x11: 00000000000002c4 x10: 0000000000000a10
[ 1348.256631] x9 : ffff00000b85bc40 x8 : ffff80003be11870
[ 1348.261937] x7 : ffff80003dfc7308 x6 : 000000078ff08b55
[ 1348.267243] x5 : 00000139e1058400 x4 : 0000000000000000
[ 1348.272550] x3 : dead000000000100 x2 : 958f2788d6618100
[ 1348.277856] x1 : 00000000fffffe00 x0 : 0000000000000000
Signed-off-by: Michael Trimarchi <michael@amarulasolutions.com>
---
drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
index 412a05b..061f69d 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c
@@ -4294,6 +4294,13 @@ void brcmf_sdio_remove(struct brcmf_sdio *bus)
brcmf_dbg(TRACE, "Enter\n");
if (bus) {
+ /* Stop watchdog task */
+ if (bus->watchdog_tsk) {
+ send_sig(SIGTERM, bus->watchdog_tsk, 1);
+ kthread_stop(bus->watchdog_tsk);
+ bus->watchdog_tsk = NULL;
+ }
+
/* De-register interrupt handler */
brcmf_sdiod_intr_unregister(bus->sdiodev);
--
2.7.4
^ permalink raw reply related
* Re: [PATCH net] VSOCK: check sk state before receive
From: Stefan Hajnoczi @ 2018-05-30 9:17 UTC (permalink / raw)
To: Hangbin Liu; +Cc: netdev, Jorgen Hansen, David S. Miller
In-Reply-To: <20180527152945.GQ8958@leo.usersys.redhat.com>
On Sun, May 27, 2018 at 11:29:45PM +0800, Hangbin Liu wrote:
> Hmm...Although I won't reproduce this bug with my reproducer after
> apply my patch. I could still get a similiar issue with syzkaller sock vnet test.
>
> It looks this patch is not complete. Here is the KASAN call trace with my patch.
> I can also reproduce it without my patch.
Seems like a race between vmci_datagram_destroy_handle() and the
delayed callback, vmci_transport_recv_dgram_cb().
I don't know the VMCI transport well so I'll leave this to Jorgen.
> ==================================================================
> BUG: KASAN: use-after-free in vmci_transport_allow_dgram.part.7+0x155/0x1a0 [vmw_vsock_vmci_transport]
> Read of size 4 at addr ffff880026a3a914 by task kworker/0:2/96
>
> CPU: 0 PID: 96 Comm: kworker/0:2 Not tainted 4.17.0-rc6.vsock+ #28
> Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
> Workqueue: events dg_delayed_dispatch [vmw_vmci]
> Call Trace:
> __dump_stack lib/dump_stack.c:77 [inline]
> dump_stack+0xdd/0x18e lib/dump_stack.c:113
> print_address_description+0x7a/0x3e0 mm/kasan/report.c:256
> kasan_report_error mm/kasan/report.c:354 [inline]
> kasan_report+0x1dd/0x460 mm/kasan/report.c:412
> vmci_transport_allow_dgram.part.7+0x155/0x1a0 [vmw_vsock_vmci_transport]
> vmci_transport_recv_dgram_cb+0x5d/0x200 [vmw_vsock_vmci_transport]
> dg_delayed_dispatch+0x99/0x1b0 [vmw_vmci]
> process_one_work+0xa4e/0x1720 kernel/workqueue.c:2145
> worker_thread+0x1df/0x1400 kernel/workqueue.c:2279
> kthread+0x343/0x4b0 kernel/kthread.c:240
> ret_from_fork+0x35/0x40 arch/x86/entry/entry_64.S:412
>
> Allocated by task 2684:
> set_track mm/kasan/kasan.c:460 [inline]
> kasan_kmalloc+0xa0/0xd0 mm/kasan/kasan.c:553
> slab_post_alloc_hook mm/slab.h:444 [inline]
> slab_alloc_node mm/slub.c:2741 [inline]
> slab_alloc mm/slub.c:2749 [inline]
> kmem_cache_alloc+0x105/0x330 mm/slub.c:2754
> sk_prot_alloc+0x6a/0x2c0 net/core/sock.c:1468
> sk_alloc+0xc9/0xbb0 net/core/sock.c:1528
> __vsock_create+0xc8/0x9b0 [vsock]
> vsock_create+0xfd/0x1a0 [vsock]
> __sock_create+0x310/0x690 net/socket.c:1285
> sock_create net/socket.c:1325 [inline]
> __sys_socket+0x101/0x240 net/socket.c:1355
> __do_sys_socket net/socket.c:1364 [inline]
> __se_sys_socket net/socket.c:1362 [inline]
> __x64_sys_socket+0x7d/0xd0 net/socket.c:1362
> do_syscall_64+0x175/0x630 arch/x86/entry/common.c:287
> entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> Freed by task 2684:
> set_track mm/kasan/kasan.c:460 [inline]
> __kasan_slab_free+0x130/0x180 mm/kasan/kasan.c:521
> slab_free_hook mm/slub.c:1388 [inline]
> slab_free_freelist_hook mm/slub.c:1415 [inline]
> slab_free mm/slub.c:2988 [inline]
> kmem_cache_free+0xce/0x410 mm/slub.c:3004
> sk_prot_free net/core/sock.c:1509 [inline]
> __sk_destruct+0x629/0x940 net/core/sock.c:1593
> sk_destruct+0x4e/0x90 net/core/sock.c:1601
> __sk_free+0xd3/0x320 net/core/sock.c:1612
> sk_free+0x2a/0x30 net/core/sock.c:1623
> __vsock_release+0x431/0x610 [vsock]
> vsock_release+0x3c/0xc0 [vsock]
> sock_release+0x91/0x200 net/socket.c:594
> sock_close+0x17/0x20 net/socket.c:1149
> __fput+0x368/0xa20 fs/file_table.c:209
> task_work_run+0x1c5/0x2a0 kernel/task_work.c:113
> exit_task_work include/linux/task_work.h:22 [inline]
> do_exit+0x1876/0x26c0 kernel/exit.c:865
> do_group_exit+0x159/0x3e0 kernel/exit.c:968
> get_signal+0x65a/0x1780 kernel/signal.c:2482
> do_signal+0xa4/0x1fe0 arch/x86/kernel/signal.c:810
> exit_to_usermode_loop+0x1b8/0x260 arch/x86/entry/common.c:162
> prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
> syscall_return_slowpath arch/x86/entry/common.c:265 [inline]
> do_syscall_64+0x505/0x630 arch/x86/entry/common.c:290
> entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> The buggy address belongs to the object at ffff880026a3a600
> which belongs to the cache AF_VSOCK of size 1056
> The buggy address is located 788 bytes inside of
> 1056-byte region [ffff880026a3a600, ffff880026a3aa20)
> The buggy address belongs to the page:
> page:ffffea00009a8e00 count:1 mapcount:0 mapping:0000000000000000 index:0x0 compound_mapcount: 0
> flags: 0xfffffc0008100(slab|head)
> raw: 000fffffc0008100 0000000000000000 0000000000000000 00000001000d000d
> raw: dead000000000100 dead000000000200 ffff880034471a40 0000000000000000
> page dumped because: kasan: bad access detected
>
> Memory state around the buggy address:
> ffff880026a3a800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ffff880026a3a880: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> >ffff880026a3a900: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ^
> ffff880026a3a980: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> ffff880026a3aa00: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc
> ==================================================================
^ permalink raw reply
* Re: [PATCH v4 net-next 00/19] inet: frags: bring rhashtables to IP defrag
From: Jesper Dangaard Brouer @ 2018-05-30 9:20 UTC (permalink / raw)
To: Eric Dumazet
Cc: Alexander Aring, Tariq Toukan, David Miller, edumazet, netdev, fw,
herbert, tgraf, alex.aring, stefan, ktkhai, Moshe Shemesh,
Eran Ben Elisha, brouer, Rick Jones
In-Reply-To: <13bf3889-4426-b17a-d8d7-e843038a2a82@gmail.com>
On Mon, 28 May 2018 09:09:17 -0700
Eric Dumazet <eric.dumazet@gmail.com> wrote:
> Tariq, here are my test results : No drops for me.
>
> # ./netperf -H 2607:f8b0:8099:e18:: -t UDP_STREAM
> MIGRATED UDP STREAM TEST from ::0 (::) port 0 AF_INET6 to 2607:f8b0:8099:e18:: () port 0 AF_INET6
> Socket Message Elapsed Messages
> Size Size Time Okay Errors Throughput
> bytes bytes secs # # 10^6bits/sec
>
> 212992 65507 10.00 202117 0 10592.00
> 212992 10.00 0 0.00
Hmm... Eric, the above result shows that ALL your UDP packets were dropped!
You have 0 okay messages and 0.00 Mbit/s throughput.
It needs to look like below (test on i40e NIC):
$ netperf -t UDP_STREAM -H fee0:cafe::1
MIGRATED UDP STREAM TEST from ::0 (::) port 0 AF_INET6 to fee0:cafe::1 () port 0 AF_INET6 : histogram : demo
Socket Message Elapsed Messages
Size Size Time Okay Errors Throughput
bytes bytes secs # # 10^6bits/sec
212992 65507 10.00 186385 0 9767.08
212992 10.00 186385 9767.08
If I manually instruct ip6tables to drop all UDP packets, then I get
what you see... so, something on your test system is likely dropping
your UDP packets, but letting regular netperf (TCP) control
communication through.
# ip6tables -I INPUT -p udp -j DROP
$ netperf -t UDP_STREAM -H fee0:cafe::1
MIGRATED UDP STREAM TEST from ::0 (::) port 0 AF_INET6 to fee0:cafe::1 () port 0 AF_INET6 : histogram : demo
Socket Message Elapsed Messages
Size Size Time Okay Errors Throughput
bytes bytes secs # # 10^6bits/sec
212992 65507 10.00 182095 0 9542.41
212992 10.00 0 0.00
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer
^ permalink raw reply
* [PATCH] ath9k: debug: fix spelling mistake "WATHDOG" -> "WATCHDOG"
From: Colin King @ 2018-05-30 9:25 UTC (permalink / raw)
To: QCA ath9k Development, Kalle Valo, David S . Miller,
linux-wireless, netdev
Cc: kernel-janitors, linux-kernel
From: Colin Ian King <colin.king@canonical.com>
Trivial fix to spelling mistake in PR_IS message text.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
drivers/net/wireless/ath/ath9k/debug.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/wireless/ath/ath9k/debug.c b/drivers/net/wireless/ath/ath9k/debug.c
index f685843a2ff3..0a6eb8a8c1ed 100644
--- a/drivers/net/wireless/ath/ath9k/debug.c
+++ b/drivers/net/wireless/ath/ath9k/debug.c
@@ -538,7 +538,7 @@ static int read_file_interrupt(struct seq_file *file, void *data)
if (sc->sc_ah->caps.hw_caps & ATH9K_HW_CAP_EDMA) {
PR_IS("RXLP", rxlp);
PR_IS("RXHP", rxhp);
- PR_IS("WATHDOG", bb_watchdog);
+ PR_IS("WATCHDOG", bb_watchdog);
} else {
PR_IS("RX", rxok);
}
--
2.17.0
^ permalink raw reply related
* Re: [PATCH net-next 0/8] nfp: offload LAG for tc flower egress
From: John Hurley @ 2018-05-30 9:26 UTC (permalink / raw)
To: Jiri Pirko
Cc: Jakub Kicinski, David Miller, Linux Netdev List, oss-drivers,
Jay Vosburgh, Veaceslav Falico, Andy Gospodarek
In-Reply-To: <20180529220947.GC2367@nanopsycho>
On Tue, May 29, 2018 at 11:09 PM, Jiri Pirko <jiri@resnulli.us> wrote:
> Tue, May 29, 2018 at 04:08:48PM CEST, john.hurley@netronome.com wrote:
>>On Sat, May 26, 2018 at 3:47 AM, Jakub Kicinski
>><jakub.kicinski@netronome.com> wrote:
>>> On Fri, 25 May 2018 08:48:09 +0200, Jiri Pirko wrote:
>>>> Thu, May 24, 2018 at 04:22:47AM CEST, jakub.kicinski@netronome.com wrote:
>>>> >Hi!
>>>> >
>>>> >This series from John adds bond offload to the nfp driver. Patch 5
>>>> >exposes the hash type for NETDEV_LAG_TX_TYPE_HASH to make sure nfp
>>>> >hashing matches that of the software LAG. This may be unnecessarily
>>>> >conservative, let's see what LAG maintainers think :)
>>>>
>>>> So you need to restrict offload to only certain hash algo? In mlxsw, we
>>>> just ignore the lag setting and do some hw default hashing. Would not be
>>>> enough? Note that there's a good reason for it, as you see, in team, the
>>>> hashing is done in a BPF function and could be totally arbitrary.
>>>> Your patchset effectively disables team offload for nfp.
>>>
>>> My understanding is that the project requirements only called for L3/L4
>>> hash algorithm offload, hence the temptation to err on the side of
>>> caution and not offload all the bond configurations. John can provide
>>> more details. Not being able to offload team is unfortunate indeed.
>>
>>Hi Jiri,
>>Yes, as Jakub mentions, we restrict ourselves to L3/L4 hash algorithm
>>as this is currently what is supported in fw.
>
> In mlxsw, a default l3/l4 is used always, no matter what the
> bonding/team sets. It is not correct, but it works with team as well.
> Perhaps we can have NETDEV_LAG_HASH_UNKNOWN to indicate to the driver to
> do some default? That would make the "team" offload functional.
>
yes, I would agree with that.
Thanks
>>Hopefully this will change as fw features are expanded.
>>I understand the issue this presents with offloading team.
>>Perhaps resorting to a default hw hash for team is acceptable.
>>John
^ permalink raw reply
* Re: [PATCH] netfilter: nfnetlink: Remove VLA usage
From: Pablo Neira Ayuso @ 2018-05-30 9:52 UTC (permalink / raw)
To: Kees Cook
Cc: Jozsef Kadlecsik, Florian Westphal, David S. Miller,
netfilter-devel, coreteam, netdev, linux-kernel
In-Reply-To: <20180530003525.GA18642@beast>
On Tue, May 29, 2018 at 05:35:25PM -0700, Kees Cook wrote:
> In the quest to remove all stack VLA usage from the kernel[1], this
> allocates the maximum size expected for all possible attrs and adds
> a sanity-check to make sure nothing gets out of sync.
>
> [1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.com
>
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---
> net/netfilter/nfnetlink.c | 22 ++++++++++++++++++++--
> 1 file changed, 20 insertions(+), 2 deletions(-)
>
> diff --git a/net/netfilter/nfnetlink.c b/net/netfilter/nfnetlink.c
> index 03ead8a9e90c..0cb395f9627e 100644
> --- a/net/netfilter/nfnetlink.c
> +++ b/net/netfilter/nfnetlink.c
> @@ -28,6 +28,7 @@
>
> #include <net/netlink.h>
> #include <linux/netfilter/nfnetlink.h>
> +#include <linux/netfilter/nf_tables.h>
>
> MODULE_LICENSE("GPL");
> MODULE_AUTHOR("Harald Welte <laforge@netfilter.org>");
> @@ -37,6 +38,11 @@ MODULE_ALIAS_NET_PF_PROTO(PF_NETLINK, NETLINK_NETFILTER);
> rcu_dereference_protected(table[(id)].subsys, \
> lockdep_nfnl_is_held((id)))
>
> +#define NFTA_MAX_ATTR max(max(max(NFTA_CHAIN_MAX, NFTA_FLOWTABLE_MAX),\
> + max(NFTA_OBJ_MAX, NFTA_RULE_MAX)), \
> + max(NFTA_TABLE_MAX, \
> + max(NFTA_SET_ELEM_LIST_MAX, NFTA_SET_MAX)))
This is very specific to nftables; there are other nf subsystems using
nfnetlink that may go over this maximum attribute value (grep for
"struct nfnetlink_subsystem").
To remove the VLA, I think we need an artificial maximum attribute
that is reasonably large.
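
A minimal userspace sketch of that idea, not a proposed patch: a fixed-size
table sized by an artificial ceiling, with a sanity check on the count. The
names SKETCH_MAX_ATTR_COUNT and sketch_prepare_attrs() and the value 50 are
made up for illustration; the real ceiling would have to cover every
nfnetlink subsystem.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical ceiling; the real value would have to cover every subsystem. */
#define SKETCH_MAX_ATTR_COUNT 50

/*
 * Use a fixed-size attribute table instead of a VLA. Returns 0 on success,
 * -EINVAL if the subsystem asks for more attributes than the ceiling allows.
 */
static int sketch_prepare_attrs(unsigned int attr_count)
{
	const void *cda[SKETCH_MAX_ATTR_COUNT + 1];

	if (attr_count > SKETCH_MAX_ATTR_COUNT)
		return -EINVAL;

	/* Only the first attr_count + 1 slots are cleared for this caller. */
	memset(cda, 0, (attr_count + 1) * sizeof(cda[0]));
	return 0;
}
```

The stack cost is then constant whatever the subsystem, and a subsystem that
outgrows the ceiling fails loudly instead of overrunning the table.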
^ permalink raw reply
* Feature Request : iface may be allowed as datatype in all ipset
From: Akshat Kakkar @ 2018-05-30 10:03 UTC (permalink / raw)
To: netdev
Is there a reason why iface is allowed to be paired only with net to
create an ipset?
I think that, with the skbinfo feature available in every ipset, it should
be allowed to add iface to all ipset types. As skbinfo can store tc classes,
it would make more sense if I could pinpoint the outgoing interface on which
this class should be applied.
One direct way of doing this could be adding iface to skbinfo itself, but I
don't think it's a good suggestion.
So the other option is to have the ipset store the interface too. Besides,
when I create a tc class, I create it on a known interface, so I know
beforehand on which interface this class is created. So I can easily
specify it while adding an entry to the ipset.
^ permalink raw reply
* Re: [PATCH bpf 2/2] bpf: enforce usage of __aligned_u64 in the UAPI header
From: Eugene Syromiatnikov @ 2018-05-30 10:03 UTC (permalink / raw)
To: Song Liu
Cc: netdev, open list, Martin KaFai Lau, Daniel Borkmann,
Alexei Starovoitov, David S. Miller, Jiri Olsa, Ingo Molnar,
Lawrence Brakmo, Andrey Ignatov, Jakub Kicinski, John Fastabend,
Dmitry V. Levin
In-Reply-To: <CAPhsuW5fVamngrqEWcsPKyr3Njjz4K5vO3o51BuWXAMw_nf9KA@mail.gmail.com>
On Tue, May 29, 2018 at 10:35:09AM -0700, Song Liu wrote:
> I think these changes are not necessary. Is it a general guidance to
> only use 64-bit aligned
> variables in UAPI headers?
Not really, but it allows avoiding most alignment issues like the one
mentioned in the previous patch and in the referenced RDMA patch.
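
A small sketch of the kind of layout difference in question. The type and
struct names are made up, sketch_aligned_u64 is only a stand-in for
__aligned_u64, and the aligned attribute assumes GCC or Clang.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the kernel's __aligned_u64 (an 8-byte-aligned 64-bit type). */
typedef uint64_t sketch_aligned_u64 __attribute__((aligned(8)));

struct sketch_plain {
	uint32_t flags;
	uint64_t ptr;           /* typically offset 4 on i386, 8 on x86_64 */
};

struct sketch_aligned {
	uint32_t flags;
	sketch_aligned_u64 ptr; /* offset 8 on both */
};
```

With the plain u64, a 32-bit process and a 64-bit kernel can disagree about
where ptr lives; with the aligned variant, offsetof(struct sketch_aligned, ptr)
is 8 and the struct size is 16 on both.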
^ permalink raw reply
* Re: [PATCH v4 net-next 00/19] inet: frags: bring rhashtables to IP defrag
From: Eric Dumazet @ 2018-05-30 10:36 UTC (permalink / raw)
To: Jesper Dangaard Brouer
Cc: Eric Dumazet, aring, Tariq Toukan, David Miller, netdev,
Florian Westphal, Herbert Xu, Thomas Graf, Alexander Aring,
Stefan Schmidt, Kirill Tkhai, moshe, Eran Ben Elisha, Rick Jones
In-Reply-To: <20180530112022.2b793051@redhat.com>
On Wed, May 30, 2018 at 5:20 AM Jesper Dangaard Brouer <brouer@redhat.com>
wrote:
> On Mon, 28 May 2018 09:09:17 -0700
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > Tariq, here are my test results : No drops for me.
> >
> > # ./netperf -H 2607:f8b0:8099:e18:: -t UDP_STREAM
> > MIGRATED UDP STREAM TEST from ::0 (::) port 0 AF_INET6 to
2607:f8b0:8099:e18:: () port 0 AF_INET6
> > Socket Message Elapsed Messages
> > Size Size Time Okay Errors Throughput
> > bytes bytes secs # # 10^6bits/sec
> >
> > 212992 65507 10.00 202117 0 10592.00
> > 212992 10.00 0 0.00
> Hmm... Eric the above result show that ALL your UDP packets were dropped!
> You have 0 okay messages and 0.00 Mbit/s throughput.
> It needs to look like below (test on i40e NIC):
> $ netperf -t UDP_STREAM -H fee0:cafe::1
> MIGRATED UDP STREAM TEST from ::0 (::) port 0 AF_INET6 to fee0:cafe::1 ()
port 0 AF_INET6 : histogram : demo
> Socket Message Elapsed Messages
> Size Size Time Okay Errors Throughput
> bytes bytes secs # # 10^6bits/sec
> 212992 65507 10.00 186385 0 9767.08
> 212992 10.00 186385 9767.08
> If I manually instruct ip6tables to drop all UDP packets, then I get
> what you see... so, something on your test system are likely dropping
> your UDP packets, but letting regular netperf (TCP) control
> communication through.
> # ip6tables -I INPUT -p udp -j DROP
> $ netperf -t UDP_STREAM -H fee0:cafe::1
> MIGRATED UDP STREAM TEST from ::0 (::) port 0 AF_INET6 to fee0:cafe::1 ()
port 0 AF_INET6 : histogram : demo
> Socket Message Elapsed Messages
> Size Size Time Okay Errors Throughput
> bytes bytes secs # # 10^6bits/sec
> 212992 65507 10.00 182095 0 9542.41
> 212992 10.00 0 0.00
Right you are, for some reason I copied/pasted the wrong results,
after _specifically_ filling up the frags to the memory limits,
when trying to reproduce 'bad numbers'.
Here are the good ones, using latest David Miller net tree. ( plus
https://patchwork.ozlabs.org/patch/922528/ but that should not matter here)
llpaa23:/export/hda3/google/edumazet# ./netperf -H 2607:f8b0:8099:e18:: -t
UDP_STREAM
MIGRATED UDP STREAM TEST from ::0 (::) port 0 AF_INET6 to
2607:f8b0:8099:e18:: () port 0 AF_INET6
Socket Message Elapsed Messages
Size Size Time Okay Errors Throughput
bytes bytes secs # # 10^6bits/sec
212992 65507 10.00 216236 0 11331.89
212992 10.00 215068 11270.68
There are a few drops because /proc/sys/net/core/rmem_default (212992, as
seen in the netperf output) is too small for this kind of stress.
(each 64KB datagram actually consumes half the budget ...)
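
For reference, the drop count can be worked out from the two netperf lines,
taking the first as messages sent and the second as messages received. This is
a throwaway sketch; the helper names are made up.

```c
/* Hypothetical helper: datagrams sent but not counted as received. */
static unsigned long dropped_messages(unsigned long sent, unsigned long okay)
{
	return sent - okay;
}

/* Hypothetical helper: loss in parts per ten thousand (integer math). */
static unsigned long loss_per_10k(unsigned long sent, unsigned long okay)
{
	return sent ? (sent - okay) * 10000UL / sent : 0;
}
```

For the run above, 216236 sent and 215068 received gives 1168 dropped
datagrams, about 0.54%. The earlier pasted run (202117 sent, 0 received) is a
100% loss.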
^ permalink raw reply
* Re: [PATCH bpf-next] bpftool: Support sendmsg{4,6} attach types
From: Daniel Borkmann @ 2018-05-30 10:56 UTC (permalink / raw)
To: Song Liu, Jakub Kicinski
Cc: Andrey Ignatov, Networking, Alexei Starovoitov, Quentin Monnet,
kernel-team
In-Reply-To: <CAPhsuW6oyRbgnXoyNtA0XM03063qQJGok6bPpO_Z4QBVgmi7=w@mail.gmail.com>
On 05/30/2018 02:12 AM, Song Liu wrote:
> On Tue, May 29, 2018 at 2:20 PM, Jakub Kicinski <kubakici@wp.pl> wrote:
>> On Tue, 29 May 2018 13:29:31 -0700, Andrey Ignatov wrote:
>>> Add support for recently added BPF_CGROUP_UDP4_SENDMSG and
>>> BPF_CGROUP_UDP6_SENDMSG attach types to bpftool, update documentation
>>> and bash completion.
>>>
>>> Signed-off-by: Andrey Ignatov <rdna@fb.com>
>>
>> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
>>
>>> I'm not sure about "since 4.18" in Documentation part. I can follow-up when
>>> the next kernel version is known.
>>
>> IMHO it's fine, we can follow up if Linus decides to call it something
>> else :)
>>
>> Thanks!
>
> Acked-by: Song Liu <songliubraving@fb.com>
Applied to bpf-next, thanks guys!
^ permalink raw reply
* Re: [PATCH v4 net-next 00/19] inet: frags: bring rhashtables to IP defrag
From: Eric Dumazet @ 2018-05-30 10:56 UTC (permalink / raw)
To: Jesper Dangaard Brouer
Cc: Eric Dumazet, aring, Tariq Toukan, David Miller, netdev,
Florian Westphal, Herbert Xu, Thomas Graf, Alexander Aring,
Stefan Schmidt, Kirill Tkhai, moshe, Eran Ben Elisha, Rick Jones
In-Reply-To: <CANn89iL73WTtF7P477tJOZcbDsg3U7Py7ykA9xdipcahtJKNNA@mail.gmail.com>
On Wed, May 30, 2018 at 6:36 AM Eric Dumazet <edumazet@google.com> wrote:
> Here are the good ones, using latest David Miller net tree. ( plus
> https://patchwork.ozlabs.org/patch/922528/ but that should not matter
here)
> llpaa23:/export/hda3/google/edumazet# ./netperf -H 2607:f8b0:8099:e18:: -t
> UDP_STREAM
> MIGRATED UDP STREAM TEST from ::0 (::) port 0 AF_INET6 to
> 2607:f8b0:8099:e18:: () port 0 AF_INET6
> Socket Message Elapsed Messages
> Size Size Time Okay Errors Throughput
> bytes bytes secs # # 10^6bits/sec
> 212992 65507 10.00 216236 0 11331.89
> 212992 10.00 215068 11270.68
> There are few drops because of the too small
> /proc/sys/net/core/rmem_default ( 212992 as seen in netperf output) for
> these kind of stress.
> ( each 64KB datagram actually consumes half the budget ...)
Once rmem_default is set to 1,000,000 and mtu set back to 1500 (instead of
5102 on my testbed)
results are indeed better.
lpaa23:/export/hda3/google/edumazet# ./netperf -H 2607:f8b0:8099:e18:: -t
UDP_STREAM -l 10
MIGRATED UDP STREAM TEST from ::0 (::) port 0 AF_INET6 to
2607:f8b0:8099:e18:: () port 0 AF_INET6
Socket Message Elapsed Messages
Size Size Time Okay Errors Throughput
bytes bytes secs # # 10^6bits/sec
212992 65507 10.00 231457 0 12129.56
1000000 10.00 231457 12129.56
^ permalink raw reply
* Re: [PATCH v5 0/3] IR decoding using BPF
From: Daniel Borkmann @ 2018-05-30 10:57 UTC (permalink / raw)
To: Sean Young, linux-media, linux-kernel, Alexei Starovoitov,
Mauro Carvalho Chehab, netdev, Matthias Reichl, Devin Heitmueller,
Y Song, Quentin Monnet
In-Reply-To: <cover.1527419762.git.sean@mess.org>
On 05/27/2018 01:24 PM, Sean Young wrote:
> The kernel IR decoders (drivers/media/rc/ir-*-decoder.c) support the most
> widely used IR protocols, but there are many protocols which are not
> supported[1]. For example, the lirc-remotes[2] repo has over 2700 remotes,
> many of which are not supported by rc-core. There is a "long tail" of
> unsupported IR protocols, for which lircd is need to decode the IR .
>
> IR encoding is done in such a way that some simple circuit can decode it;
> therefore, bpf is ideal.
>
> In order to support all these protocols, here we have bpf based IR decoding.
> The idea is that user-space can define a decoder in bpf, attach it to
> the rc device through the lirc chardev.
>
> Separate work is underway to extend ir-keytable to have an extensive library
> of bpf-based decoders, and a much expanded library of rc keymaps.
>
> Another future application would be to compile IRP[3] to a IR BPF program, and
> so support virtually every remote without having to write a decoder for each.
> It might also be possible to support non-button devices such as analog
> directional pads or air conditioning remote controls and decode the target
> temperature in bpf, and pass that to an input device.
>
> Thanks,
>
> Sean Young
>
> [1] http://www.hifi-remote.com/wiki/index.php?title=DecodeIR
> [2] https://sourceforge.net/p/lirc-remotes/code/ci/master/tree/remotes/
> [3] http://www.hifi-remote.com/wiki/index.php?title=IRP_Notation
>
> Changes since v4:
> - Renamed rc_dev_bpf_{attach,detach,query} to lirc_bpf_{attach,detach,query}
> - Fixed error path in lirc_bpf_query
> - Rebased on bpf-next
>
> Changes since v3:
> - Implemented review comments from Quentin Monnet and Y Song (thanks!)
> - More helpful and better formatted bpf helper documentation
> - Changed back to bpf_prog_array rather than open-coded implementation
> - scancodes can be 64 bit
> - bpf gets passed values in microseconds, not nanoseconds.
> microseconds is more than enough (IR receivers support carriers up to
> 70kHz, at which point a single period is already 14 microseconds). Also,
> this makes it much more consistent with lirc mode2.
> - Since it looks much more like lirc mode2, rename the program type to
> BPF_PROG_TYPE_LIRC_MODE2.
> - Rebased on bpf-next
>
> Changes since v2:
> - Fixed locking issues
> - Improved self-test to cover more cases
> - Rebased on bpf-next again
>
> Changes since v1:
> - Code review comments from Y Song <ys114321@gmail.com> and
> Randy Dunlap <rdunlap@infradead.org>
> - Re-wrote sample bpf to be selftest
> - Renamed RAWIR_DECODER -> RAWIR_EVENT (Kconfig, context, bpf prog type)
> - Rebase on bpf-next
> - Introduced bpf_rawir_event context structure with simpler access checking
Applied to bpf-next, thanks Sean!
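The carrier arithmetic in the quoted v3 changelog can be checked with a short sketch. The function name is hypothetical and this is not code from the series; it only computes the period of one carrier cycle with integer division, which is why a 70kHz carrier comes out at 14 whole microseconds.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch set: period of one carrier
 * cycle in nanoseconds. Integer division truncates. The caller must pass
 * a non-zero frequency.
 */
static uint32_t carrier_period_ns(uint32_t carrier_hz)
{
	return 1000000000u / carrier_hz;
}
```

Dividing the result by 1000 gives the period in whole microseconds, the unit the changelog says is passed to the bpf program.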
^ permalink raw reply
* Re: [PATCH bpf-next v7 3/6] bpf: Add IPv6 Segment Routing helpers
From: Daniel Borkmann @ 2018-05-30 11:00 UTC (permalink / raw)
To: Mathieu Xhonneux, netdev; +Cc: dlebrun, alexei.starovoitov
In-Reply-To: <d6833d31-4481-9595-ce26-d93ff35f411a@iogearbox.net>
On 05/24/2018 12:18 PM, Daniel Borkmann wrote:
> On 05/20/2018 03:58 PM, Mathieu Xhonneux wrote:
>> The BPF seg6local hook should be powerful enough to enable users to
>> implement most of the use-cases one could think of. After some thinking,
>> we figured out that the following actions should be possible on a SRv6
>> packet, requiring 3 specific helpers :
>> - bpf_lwt_seg6_store_bytes: Modify non-sensitive fields of the SRH
>> - bpf_lwt_seg6_adjust_srh: Allow to grow or shrink a SRH
>> (to add/delete TLVs)
>> - bpf_lwt_seg6_action: Apply some SRv6 network programming actions
>> (specifically End.X, End.T, End.B6 and
>> End.B6.Encap)
>>
>> The specifications of these helpers are provided in the patch (see
>> include/uapi/linux/bpf.h).
>>
>> The non-sensitive fields of the SRH are the following : flags, tag and
>> TLVs. The other fields can not be modified, to maintain the SRH
>> integrity. Flags, tag and TLVs can easily be modified as their validity
>> can be checked afterwards via seg6_validate_srh. It is not allowed to
>> modify the segments directly. If one wants to add segments on the path,
>> he should stack a new SRH using the End.B6 action via
>> bpf_lwt_seg6_action.
>>
>> Growing, shrinking or editing TLVs via the helpers will flag the SRH as
>> invalid, and it will have to be re-validated before re-entering the IPv6
>> layer. This flag is stored in a per-CPU buffer, along with the current
>> header length in bytes.
>>
>> Storing the SRH len in bytes in the control block is mandatory when using
>> bpf_lwt_seg6_adjust_srh. The Header Ext. Length field contains the SRH
>> len rounded to 8 bytes (a padding TLV can be inserted to ensure the 8-bytes
>> boundary). When adding/deleting TLVs within the BPF program, the SRH may
>> temporarily be in an invalid state where its length cannot be rounded to 8
>> bytes without remainder, hence the need to store the length in bytes
>> separately. The caller of the BPF program can then ensure that the SRH's
>> final length is valid using this value. Again, a final SRH modified by a
>> BPF program which doesn’t respect the 8-bytes boundary will be discarded
>> as it will be considered as invalid.
>>
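The 8-byte boundary rule described in the paragraph above can be sketched as follows. This is a minimal illustration, not the kernel implementation: both helper names are hypothetical, and it assumes the usual IPv6 extension header convention that the Header Ext. Length field counts 8-byte units and excludes the first 8 bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: a final SRH length in bytes is acceptable only if
 * it is a non-zero multiple of 8 and still fits the 8-bit length field
 * (at most (255 + 1) * 8 = 2048 bytes).
 */
static bool srh_len_is_valid(uint32_t len_bytes)
{
	return len_bytes >= 8 && len_bytes <= 2048 && (len_bytes & 7) == 0;
}

/*
 * Hypothetical helper: derive the Header Ext. Length value from a valid
 * byte length (8-byte units, not counting the first 8 bytes).
 */
static uint8_t srh_hdr_ext_len(uint32_t len_bytes)
{
	assert(srh_len_is_valid(len_bytes));
	return (uint8_t)((len_bytes >> 3) - 1);
}
```

For example, an SRH carrying a single 16-byte segment is 24 bytes long, which maps to a length field of 2, while an intermediate length of 28 bytes (after adding a 4-byte TLV without padding) has no valid encoding. That is why the byte length has to be tracked separately while TLVs are being edited.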
>> Finally, a fourth helper is provided, bpf_lwt_push_encap, which is
>> available from the LWT BPF IN hook, but not from the seg6local BPF one.
>> This helper allows to encapsulate a Segment Routing Header (either with
>> a new outer IPv6 header, or by inlining it directly in the existing IPv6
>> header) into a non-SRv6 packet. This helper is required if we want to
>> offer the possibility to dynamically encapsulate a SRH for non-SRv6 packet,
>> as the BPF seg6local hook only works on traffic already containing a SRH.
>> This is the BPF equivalent of the seg6 LWT infrastructure, which achieves
>> the same purpose but with a static SRH per route.
>>
>> These helpers require CONFIG_IPV6=y (and not =m).
>>
>> Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com>
>> Acked-by: David Lebrun <dlebrun@google.com>
>
> One minor comment for follow-ups here below.
>
>> +BPF_CALL_4(bpf_lwt_seg6_store_bytes, struct sk_buff *, skb, u32, offset,
>> + const void *, from, u32, len)
>> +{
>> +#if IS_ENABLED(CONFIG_IPV6_SEG6_BPF)
>> + struct seg6_bpf_srh_state *srh_state =
>> + this_cpu_ptr(&seg6_bpf_srh_states);
>> + void *srh_tlvs, *srh_end, *ptr;
>> + struct ipv6_sr_hdr *srh;
>> + int srhoff = 0;
>> +
>> + if (ipv6_find_hdr(skb, &srhoff, IPPROTO_ROUTING, NULL, NULL) < 0)
>> + return -EINVAL;
>> +
>> + srh = (struct ipv6_sr_hdr *)(skb->data + srhoff);
>> + srh_tlvs = (void *)((char *)srh + ((srh->first_segment + 1) << 4));
>> + srh_end = (void *)((char *)srh + sizeof(*srh) + srh_state->hdrlen);
>> +
>> + ptr = skb->data + offset;
>> + if (ptr >= srh_tlvs && ptr + len <= srh_end)
>> + srh_state->valid = 0;
>> + else if (ptr < (void *)&srh->flags ||
>> + ptr + len > (void *)&srh->segments)
>> + return -EFAULT;
>> +
>> + if (unlikely(bpf_try_make_writable(skb, offset + len)))
>> + return -EFAULT;
>> +
>> + memcpy(skb->data + offset, from, len);
>> + return 0;
>> +#else /* CONFIG_IPV6_SEG6_BPF */
>> + return -EOPNOTSUPP;
>> +#endif
>> +}
>
> Instead of doing this inside the helper you can reject the program already
> in the lwt_*_func_proto() by returning NULL when !CONFIG_IPV6_SEG6_BPF. That
> way programs get rejected at verification time instead of runtime, so the
> user can probe availability more easily.
Mathieu, before this gets lost in archives, plan to follow-up on this one?
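The suggestion quoted above (reject at verification time rather than return -EOPNOTSUPP at runtime) can be illustrated with a self-contained sketch. Everything here is a hypothetical stand-in: the types, the enum and the FEATURE_SEG6_BPF macro are not the real BPF verifier API, and FEATURE_SEG6_BPF merely plays the role of IS_ENABLED(CONFIG_IPV6_SEG6_BPF).

```c
#include <stddef.h>

/* Hypothetical stand-ins for kernel types; not the real BPF API. */
struct func_proto {
	const char *name;
};

enum func_id {
	FUNC_SEG6_STORE_BYTES,
	FUNC_OTHER,
};

/* Plays the role of IS_ENABLED(CONFIG_IPV6_SEG6_BPF) in this sketch. */
#ifndef FEATURE_SEG6_BPF
#define FEATURE_SEG6_BPF 0
#endif

static const struct func_proto seg6_store_bytes_proto = {
	"seg6_store_bytes",
};

/*
 * Return NULL for a helper whose backing feature is compiled out, so the
 * caller (the verifier, in the kernel) can refuse to load any program
 * that references it.
 */
static const struct func_proto *lookup_func_proto(enum func_id id)
{
	switch (id) {
	case FUNC_SEG6_STORE_BYTES:
		return FEATURE_SEG6_BPF ? &seg6_store_bytes_proto : NULL;
	default:
		return NULL;
	}
}
```

With this shape the helper body no longer needs its own #if/#else, and user space can probe availability simply by trying to load a program that calls the helper.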
^ permalink raw reply
* Re: [RFC V5 PATCH 8/8] vhost: event suppression for packed ring
From: Wei Xu @ 2018-05-30 11:42 UTC (permalink / raw)
To: Jason Wang; +Cc: kvm, mst, netdev, linux-kernel, virtualization
In-Reply-To: <1527559830-8133-9-git-send-email-jasowang@redhat.com>
On Tue, May 29, 2018 at 10:10:30AM +0800, Jason Wang wrote:
> This patch introduces basic support for event suppression aka driver
> and device area.
>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
> drivers/vhost/vhost.c | 191 ++++++++++++++++++++++++++++++++++++---
> drivers/vhost/vhost.h | 10 +-
> include/uapi/linux/virtio_ring.h | 19 ++++
> 3 files changed, 204 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index a36e5ad2..112f680 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -1112,10 +1112,15 @@ static int vq_access_ok_packed(struct vhost_virtqueue *vq, unsigned int num,
> struct vring_used __user *used)
> {
> struct vring_desc_packed *packed = (struct vring_desc_packed *)desc;
> + struct vring_packed_desc_event *driver_event =
> + (struct vring_packed_desc_event *)avail;
> + struct vring_packed_desc_event *device_event =
> + (struct vring_packed_desc_event *)used;
>
> - /* FIXME: check device area and driver area */
> return access_ok(VERIFY_READ, packed, num * sizeof(*packed)) &&
> - access_ok(VERIFY_WRITE, packed, num * sizeof(*packed));
> + access_ok(VERIFY_WRITE, packed, num * sizeof(*packed)) &&
> + access_ok(VERIFY_READ, driver_event, sizeof(*driver_event)) &&
> + access_ok(VERIFY_WRITE, device_event, sizeof(*device_event));
> }
>
> static int vq_access_ok_split(struct vhost_virtqueue *vq, unsigned int num,
> @@ -1190,14 +1195,27 @@ static bool iotlb_access_ok(struct vhost_virtqueue *vq,
> return true;
> }
>
> -int vq_iotlb_prefetch(struct vhost_virtqueue *vq)
> +int vq_iotlb_prefetch_packed(struct vhost_virtqueue *vq)
> +{
> + int num = vq->num;
> +
> + return iotlb_access_ok(vq, VHOST_ACCESS_RO, (u64)(uintptr_t)vq->desc,
> + num * sizeof(*vq->desc), VHOST_ADDR_DESC) &&
> + iotlb_access_ok(vq, VHOST_ACCESS_WO, (u64)(uintptr_t)vq->desc,
> + num * sizeof(*vq->desc), VHOST_ADDR_DESC) &&
> + iotlb_access_ok(vq, VHOST_ACCESS_RO,
> + (u64)(uintptr_t)vq->driver_event,
> + sizeof(*vq->driver_event), VHOST_ADDR_AVAIL) &&
> + iotlb_access_ok(vq, VHOST_ACCESS_WO,
> + (u64)(uintptr_t)vq->device_event,
> + sizeof(*vq->device_event), VHOST_ADDR_USED);
> +}
> +
> +int vq_iotlb_prefetch_split(struct vhost_virtqueue *vq)
> {
> size_t s = vhost_has_feature(vq, VIRTIO_RING_F_EVENT_IDX) ? 2 : 0;
> unsigned int num = vq->num;
>
> - if (!vq->iotlb)
> - return 1;
> -
> return iotlb_access_ok(vq, VHOST_ACCESS_RO, (u64)(uintptr_t)vq->desc,
> num * sizeof(*vq->desc), VHOST_ADDR_DESC) &&
> iotlb_access_ok(vq, VHOST_ACCESS_RO, (u64)(uintptr_t)vq->avail,
> @@ -1209,6 +1227,17 @@ int vq_iotlb_prefetch(struct vhost_virtqueue *vq)
> num * sizeof(*vq->used->ring) + s,
> VHOST_ADDR_USED);
> }
> +
> +int vq_iotlb_prefetch(struct vhost_virtqueue *vq)
> +{
> + if (!vq->iotlb)
> + return 1;
> +
> + if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
> + return vq_iotlb_prefetch_packed(vq);
> + else
> + return vq_iotlb_prefetch_split(vq);
> +}
> EXPORT_SYMBOL_GPL(vq_iotlb_prefetch);
>
> /* Can we log writes? */
> @@ -1730,6 +1759,50 @@ static int vhost_update_used_flags(struct vhost_virtqueue *vq)
> return 0;
> }
>
> +static int vhost_update_device_flags(struct vhost_virtqueue *vq,
> + __virtio16 device_flags)
> +{
> + void __user *flags;
> +
> + if (vhost_put_user(vq, device_flags, &vq->device_event->flags,
> + VHOST_ADDR_USED) < 0)
> + return -EFAULT;
> + if (unlikely(vq->log_used)) {
> + /* Make sure the flag is seen before log. */
> + smp_wmb();
> + /* Log used flag write. */
> + flags = &vq->device_event->flags;
> + log_write(vq->log_base, vq->log_addr +
> + (flags - (void __user *)vq->device_event),
> + sizeof(vq->device_event->flags));
> + if (vq->log_ctx)
> + eventfd_signal(vq->log_ctx, 1);
> + }
> + return 0;
> +}
> +
> +static int vhost_update_device_off_wrap(struct vhost_virtqueue *vq,
> + __virtio16 device_off_wrap)
> +{
> + void __user *off_wrap;
> +
> + if (vhost_put_user(vq, device_off_wrap, &vq->device_event->off_wrap,
> + VHOST_ADDR_USED) < 0)
> + return -EFAULT;
> + if (unlikely(vq->log_used)) {
> + /* Make sure the flag is seen before log. */
> + smp_wmb();
> + /* Log used flag write. */
> + off_wrap = &vq->device_event->off_wrap;
> + log_write(vq->log_base, vq->log_addr +
> + (off_wrap - (void __user *)vq->device_event),
> + sizeof(vq->device_event->off_wrap));
> + if (vq->log_ctx)
> + eventfd_signal(vq->log_ctx, 1);
> + }
> + return 0;
> +}
> +
> static int vhost_update_avail_event(struct vhost_virtqueue *vq, u16 avail_event)
> {
> if (vhost_put_user(vq, cpu_to_vhost16(vq, vq->avail_idx),
> @@ -2683,16 +2756,13 @@ int vhost_add_used_n(struct vhost_virtqueue *vq,
> }
> EXPORT_SYMBOL_GPL(vhost_add_used_n);
>
> -static bool vhost_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
> +static bool vhost_notify_split(struct vhost_dev *dev,
> + struct vhost_virtqueue *vq)
> {
> __u16 old, new;
> __virtio16 event;
> bool v;
>
> - /* FIXME: check driver area */
> - if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
> - return true;
> -
> /* Flush out used index updates. This is paired
> * with the barrier that the Guest executes when enabling
> * interrupts. */
> @@ -2725,6 +2795,64 @@ static bool vhost_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
> return vring_need_event(vhost16_to_cpu(vq, event), new, old);
> }
>
> +static bool vhost_notify_packed(struct vhost_dev *dev,
> + struct vhost_virtqueue *vq)
> +{
> + __virtio16 event_off_wrap, event_flags;
> + __u16 old, new, off_wrap;
> + bool v;
> +
> + /* Flush out used descriptors updates. This is paired
> + * with the barrier that the Guest executes when enabling
> + * interrupts.
> + */
> + smp_mb();
> +
> + if (vhost_get_avail(vq, event_flags,
> + &vq->driver_event->flags) < 0) {
> + vq_err(vq, "Failed to get driver desc_event_flags");
> + return true;
> + }
> +
> + if (!vhost_has_feature(vq, VIRTIO_RING_F_EVENT_IDX))
> + return event_flags !=
> + cpu_to_vhost16(vq, RING_EVENT_FLAGS_DISABLE);
> +
> + old = vq->signalled_used;
> + v = vq->signalled_used_valid;
> + new = vq->signalled_used = vq->last_used_idx;
> + vq->signalled_used_valid = true;
> +
> + if (event_flags != cpu_to_vhost16(vq, RING_EVENT_FLAGS_DESC))
> + return event_flags !=
> + cpu_to_vhost16(vq, RING_EVENT_FLAGS_DISABLE);
> +
> + /* Read desc event flags before event_off and event_wrap */
> + smp_rmb();
> +
> + if (vhost_get_avail(vq, event_off_wrap,
> + &vq->driver_event->off_wrap) < 0) {
> + vq_err(vq, "Failed to get driver desc_event_off/wrap");
> + return true;
> + }
> +
> + off_wrap = vhost16_to_cpu(vq, event_off_wrap);
> +
> + if (unlikely(!v))
> + return true;
> +
> + return vhost_vring_packed_need_event(vq, vq->used_wrap_counter,
> + off_wrap, new, old);
> +}
> +
> +static bool vhost_notify(struct vhost_dev *dev, struct vhost_virtqueue *vq)
> +{
> + if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED))
> + return vhost_notify_packed(dev, vq);
> + else
> + return vhost_notify_split(dev, vq);
> +}
> +
> /* This actually signals the guest, using eventfd. */
> void vhost_signal(struct vhost_dev *dev, struct vhost_virtqueue *vq)
> {
> @@ -2802,10 +2930,34 @@ static bool vhost_enable_notify_packed(struct vhost_dev *dev,
> struct vhost_virtqueue *vq)
> {
> struct vring_desc_packed *d = vq->desc_packed + vq->avail_idx;
> - __virtio16 flags;
> + __virtio16 flags = RING_EVENT_FLAGS_ENABLE;
> int ret;
>
> - /* FIXME: disable notification through device area */
> + if (!(vq->used_flags & VRING_USED_F_NO_NOTIFY))
> + return false;
> + vq->used_flags &= ~VRING_USED_F_NO_NOTIFY;
'used_flags' was originally designed for 1.0, why should we pay attention to it here?
Wei
> +
> + if (vhost_has_feature(vq, VIRTIO_RING_F_EVENT_IDX)) {
> + __virtio16 off_wrap = cpu_to_vhost16(vq, vq->avail_idx |
> + vq->avail_wrap_counter << 15);
> +
> + ret = vhost_update_device_off_wrap(vq, off_wrap);
> + if (ret) {
> + vq_err(vq, "Failed to write to off warp at %p: %d\n",
> + &vq->device_event->off_wrap, ret);
> + return false;
> + }
> + /* Make sure off_wrap is wrote before flags */
> + smp_wmb();
> + flags = RING_EVENT_FLAGS_DESC;
> + }
> +
> + ret = vhost_update_device_flags(vq, flags);
> + if (ret) {
> + vq_err(vq, "Failed to enable notification at %p: %d\n",
> + &vq->device_event->flags, ret);
> + return false;
> + }
>
> /* They could have slipped one in as we were doing that: make
> * sure it's written, then check again. */
> @@ -2871,7 +3023,18 @@ EXPORT_SYMBOL_GPL(vhost_enable_notify);
> static void vhost_disable_notify_packed(struct vhost_dev *dev,
> struct vhost_virtqueue *vq)
> {
> - /* FIXME: disable notification through device area */
> + __virtio16 flags;
> + int r;
> +
> + if (vq->used_flags & VRING_USED_F_NO_NOTIFY)
> + return;
> + vq->used_flags |= VRING_USED_F_NO_NOTIFY;
> +
> + flags = cpu_to_vhost16(vq, RING_EVENT_FLAGS_DISABLE);
> + r = vhost_update_device_flags(vq, flags);
> + if (r)
> + vq_err(vq, "Failed to enable notification at %p: %d\n",
> + &vq->device_event->flags, r);
> }
>
> static void vhost_disable_notify_split(struct vhost_dev *dev,
> diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
> index 7543a46..b920582 100644
> --- a/drivers/vhost/vhost.h
> +++ b/drivers/vhost/vhost.h
> @@ -96,8 +96,14 @@ struct vhost_virtqueue {
> struct vring_desc __user *desc;
> struct vring_desc_packed __user *desc_packed;
> };
> - struct vring_avail __user *avail;
> - struct vring_used __user *used;
> + union {
> + struct vring_avail __user *avail;
> + struct vring_packed_desc_event __user *driver_event;
> + };
> + union {
> + struct vring_used __user *used;
> + struct vring_packed_desc_event __user *device_event;
> + };
> const struct vhost_umem_node *meta_iotlb[VHOST_NUM_ADDRS];
> struct file *kick;
> struct eventfd_ctx *call_ctx;
> diff --git a/include/uapi/linux/virtio_ring.h b/include/uapi/linux/virtio_ring.h
> index e297580..71c7a46 100644
> --- a/include/uapi/linux/virtio_ring.h
> +++ b/include/uapi/linux/virtio_ring.h
> @@ -75,6 +75,25 @@ struct vring_desc_packed {
> __virtio16 flags;
> };
>
> +/* Enable events */
> +#define RING_EVENT_FLAGS_ENABLE 0x0
> +/* Disable events */
> +#define RING_EVENT_FLAGS_DISABLE 0x1
> +/*
> + * Enable events for a specific descriptor
> + * (as specified by Descriptor Ring Change Event Offset/Wrap Counter).
> + * Only valid if VIRTIO_F_RING_EVENT_IDX has been negotiated.
> + */
> +#define RING_EVENT_FLAGS_DESC 0x2
> +/* The value 0x3 is reserved */
> +
> +struct vring_packed_desc_event {
> + /* Descriptor Ring Change Event Offset and Wrap Counter */
> + __virtio16 off_wrap;
> + /* Descriptor Ring Change Event Flags */
> + __virtio16 flags;
> +};
> +
> /* Virtio ring descriptors: 16 bytes. These can chain together via "next". */
> struct vring_desc {
> /* Address (guest-physical). */
> --
> 2.7.4
>
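For readers following the off_wrap handling in the quoted patch, the encoding it relies on (avail_idx | avail_wrap_counter << 15) can be sketched in isolation. The macro and function names below are illustrative only and do not appear in the patch; the endianness conversion done by cpu_to_vhost16() is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative names: bit 15 holds the wrap counter, bits 0..14 the offset. */
#define EVENT_WRAP_SHIFT 15
#define EVENT_OFF_MASK   0x7fffu

/* Combine a ring offset and a wrap counter into one 16-bit value. */
static uint16_t pack_off_wrap(uint16_t off, bool wrap)
{
	assert(off <= EVENT_OFF_MASK);
	return (uint16_t)(off | ((uint16_t)wrap << EVENT_WRAP_SHIFT));
}

/* Extract the ring offset (low 15 bits). */
static uint16_t off_wrap_offset(uint16_t v)
{
	return v & EVENT_OFF_MASK;
}

/* Extract the wrap counter (top bit). */
static bool off_wrap_counter(uint16_t v)
{
	return (v >> EVENT_WRAP_SHIFT) & 1;
}
```

Packing offset 5 with the wrap counter set yields 0x8005. Keeping both values in a single 16-bit word lets them be read and written together as one field.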
^ permalink raw reply
* [PATCH net-next] cxgb4: Add FORCE_PAUSE bit to 32 bit port caps
From: Ganesh Goudar @ 2018-05-30 11:45 UTC (permalink / raw)
To: netdev, davem
Cc: nirranjan, indranil, Ganesh Goudar, Santosh Rastapur,
Casey Leedom
Add FORCE_PAUSE bit to force local pause settings instead
of using auto negotiated values.
Signed-off-by: Santosh Rastapur <santosh@chelsio.com>
Signed-off-by: Casey Leedom <leedom@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
---
drivers/net/ethernet/chelsio/cxgb4/t4_hw.c | 10 +++++++++-
drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h | 5 +++--
2 files changed, 12 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
index 39da7e3..974a868 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4_hw.c
@@ -3941,6 +3941,7 @@ static fw_port_cap32_t fwcaps16_to_caps32(fw_port_cap16_t caps16)
CAP16_TO_CAP32(FC_RX);
CAP16_TO_CAP32(FC_TX);
CAP16_TO_CAP32(ANEG);
+ CAP16_TO_CAP32(FORCE_PAUSE);
CAP16_TO_CAP32(MDIAUTO);
CAP16_TO_CAP32(MDISTRAIGHT);
CAP16_TO_CAP32(FEC_RS);
@@ -3982,6 +3983,7 @@ static fw_port_cap16_t fwcaps32_to_caps16(fw_port_cap32_t caps32)
CAP32_TO_CAP16(802_3_PAUSE);
CAP32_TO_CAP16(802_3_ASM_DIR);
CAP32_TO_CAP16(ANEG);
+ CAP32_TO_CAP16(FORCE_PAUSE);
CAP32_TO_CAP16(MDIAUTO);
CAP32_TO_CAP16(MDISTRAIGHT);
CAP32_TO_CAP16(FEC_RS);
@@ -4014,6 +4016,8 @@ static inline fw_port_cap32_t cc_to_fwcap_pause(enum cc_pause cc_pause)
fw_pause |= FW_PORT_CAP32_FC_RX;
if (cc_pause & PAUSE_TX)
fw_pause |= FW_PORT_CAP32_FC_TX;
+ if (!(cc_pause & PAUSE_AUTONEG))
+ fw_pause |= FW_PORT_CAP32_FORCE_PAUSE;
return fw_pause;
}
@@ -4101,7 +4105,11 @@ int t4_link_l1cfg_core(struct adapter *adapter, unsigned int mbox,
rcap = lc->acaps | fw_fc | fw_fec | fw_mdi;
}
- if (rcap & ~lc->pcaps) {
+ /* Note that older Firmware doesn't have FW_PORT_CAP32_FORCE_PAUSE, so
+ * we need to exclude this from this check in order to maintain
+ * compatibility ...
+ */
+ if ((rcap & ~lc->pcaps) & ~FW_PORT_CAP32_FORCE_PAUSE) {
dev_err(adapter->pdev_dev,
"Requested Port Capabilities %#x exceed Physical Port Capabilities %#x\n",
rcap, lc->pcaps);
diff --git a/drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h b/drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h
index 2d91480..f1967cf 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h
+++ b/drivers/net/ethernet/chelsio/cxgb4/t4fw_api.h
@@ -2475,7 +2475,7 @@ enum fw_port_cap {
FW_PORT_CAP_MDISTRAIGHT = 0x0400,
FW_PORT_CAP_FEC_RS = 0x0800,
FW_PORT_CAP_FEC_BASER_RS = 0x1000,
- FW_PORT_CAP_FEC_RESERVED = 0x2000,
+ FW_PORT_CAP_FORCE_PAUSE = 0x2000,
FW_PORT_CAP_802_3_PAUSE = 0x4000,
FW_PORT_CAP_802_3_ASM_DIR = 0x8000,
};
@@ -2522,7 +2522,8 @@ enum fw_port_mdi {
#define FW_PORT_CAP32_FEC_RESERVED1 0x02000000UL
#define FW_PORT_CAP32_FEC_RESERVED2 0x04000000UL
#define FW_PORT_CAP32_FEC_RESERVED3 0x08000000UL
-#define FW_PORT_CAP32_RESERVED2 0xf0000000UL
+#define FW_PORT_CAP32_FORCE_PAUSE 0x10000000UL
+#define FW_PORT_CAP32_RESERVED2 0xe0000000UL
#define FW_PORT_CAP32_SPEED_S 0
#define FW_PORT_CAP32_SPEED_M 0xfff
--
2.1.0
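The compatibility check changed in t4_link_l1cfg_core() can be sketched on its own. The function name is hypothetical; only the bit value is taken from the patch (FW_PORT_CAP32_FORCE_PAUSE). The sketch shows how masking out FORCE_PAUSE keeps a request valid against older firmware that does not advertise the bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value of FW_PORT_CAP32_FORCE_PAUSE as defined in the patch. */
#define CAP32_FORCE_PAUSE 0x10000000UL

/*
 * Hypothetical helper: true if the requested capabilities contain bits
 * the port does not advertise, ignoring FORCE_PAUSE.
 */
static bool caps_exceed_physical(uint32_t rcap, uint32_t pcaps)
{
	return ((rcap & ~pcaps) & ~(uint32_t)CAP32_FORCE_PAUSE) != 0;
}
```

A request of 0x10000001 against physical capabilities of 0x1 passes, while any other unadvertised bit still triggers the error path.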
^ permalink raw reply related
* Re: [PATCH net-next] net: qcom/emac: fix unused variable
From: Timur Tabi @ 2018-05-30 12:10 UTC (permalink / raw)
To: YueHaibing, davem; +Cc: netdev, linux-kernel
In-Reply-To: <20180529104343.19448-1-yuehaibing@huawei.com>
On 5/29/18 5:43 AM, YueHaibing wrote:
> When CONFIG_ACPI isn't set, variable qdf2400_ops/qdf2432_ops isn't used.
> drivers/net/ethernet/qualcomm/emac/emac-sgmii.c:284:25: warning: ‘qdf2400_ops’ defined but not used [-Wunused-variable]
> static struct sgmii_ops qdf2400_ops = {
> ^~~~~~~~~~~
> drivers/net/ethernet/qualcomm/emac/emac-sgmii.c:276:25: warning: ‘qdf2432_ops’ defined but not used [-Wunused-variable]
> static struct sgmii_ops qdf2432_ops = {
> ^~~~~~~~~~~
>
> Move the declaration and functions inside the CONFIG_ACPI ifdef
> to fix the warning.
> Signed-off-by: YueHaibing <yuehaibing@huawei.com>
I already fixed this with:
https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=d377df784178bf5b0a39e75dc8b1ee86e1abb3f6
--
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc. Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.
^ permalink raw reply
* [PATCH] rtnetlink: Add more well known protocol values
From: Donald Sharp @ 2018-05-30 12:27 UTC (permalink / raw)
To: netdev, dsahern
FRRouting installs routes into the kernel associated with
the originating protocol. Add these values to the well
known values in rtnetlink.h.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
---
v2: Fixed whitespace issues
include/uapi/linux/rtnetlink.h | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index cabb210c93af..7d8502313c99 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -254,6 +254,11 @@ enum {
#define RTPROT_DHCP 16 /* DHCP client */
#define RTPROT_MROUTED 17 /* Multicast daemon */
#define RTPROT_BABEL 42 /* Babel daemon */
+#define RTPROT_BGP 186 /* BGP Routes */
+#define RTPROT_ISIS 187 /* ISIS Routes */
+#define RTPROT_OSPF 188 /* OSPF Routes */
+#define RTPROT_RIP 189 /* RIP Routes */
+#define RTPROT_EIGRP 192 /* EIGRP Routes */
/* rtm_scope
--
2.14.3
^ permalink raw reply related
* Re: [PATCH mlx5-next v2 11/13] IB/mlx5: Add flow counters binding support
From: Yishai Hadas @ 2018-05-30 12:31 UTC (permalink / raw)
To: Jason Gunthorpe
Cc: Leon Romanovsky, Doug Ledford, Leon Romanovsky, RDMA mailing list,
Boris Pismenny, Matan Barak, Raed Salem, Yishai Hadas,
Saeed Mahameed, linux-netdev, Alex Rosenbaum
In-Reply-To: <20180529195627.GA31423@ziepe.ca>
On 5/29/2018 10:56 PM, Jason Gunthorpe wrote:
> On Tue, May 29, 2018 at 04:09:15PM +0300, Leon Romanovsky wrote:
>> diff --git a/include/uapi/rdma/mlx5-abi.h b/include/uapi/rdma/mlx5-abi.h
>> index 508ea8c82da7..ef3f430a7050 100644
>> +++ b/include/uapi/rdma/mlx5-abi.h
>> @@ -443,4 +443,18 @@ enum {
>> enum {
>> MLX5_IB_CLOCK_INFO_V1 = 0,
>> };
>> +
>> +struct mlx5_ib_flow_counters_data {
>> + __aligned_u64 counters_data;
>> + __u32 ncounters;
>> + __u32 reserved;
>> +};
>> +
>> +struct mlx5_ib_create_flow {
>> + __u32 ncounters_data;
>> + __u32 reserved;
>> + /* Following are counters data based on ncounters_data */
>> + struct mlx5_ib_flow_counters_data data[];
>> +};
>> +
>> #endif /* MLX5_ABI_USER_H */
>
> This uapi thing still needs to be fixed as I pointed out before.
In V3 we can go with the below: no change in memory layout, but it
clarifies the code/usage.
struct mlx5_ib_flow_counters_desc {
__u32 description;
__u32 index;
};
struct mlx5_ib_flow_counters_data {
RDMA_UAPI_PTR(struct mlx5_ib_flow_counters_desc *, counters_data);
__u32 ncounters;
__u32 reserved;
};
struct mlx5_ib_create_flow {
__u32 ncounters_data;
__u32 reserved;
/* Following are counters data based on ncounters_data */
struct mlx5_ib_flow_counters_data data[];
};
> I still can't figure out why this should be a 2d array.
This comes to support the future case of multiple counters objects/specs
passed with the same flow. There is a need to differentiate mapping data
for each counters object and that is done via the 'ncounters_data' field
and the 2d array.
> I think it
> should be written simply as:
>
> struct mlx5_ib_flow_counter_desc {
> __u32 description;
> __u32 index;
> };
>
> struct mlx5_ib_create_flow {
> RDMA_UAPI_PTR(struct mlx5_ib_flow_counter_desc, counters_data);
> __u32 ncounters;
> __u32 reserved;
> };
>
> With the corresponding changes elsewhere.
>
This doesn't support the above use case.
> A flex array at the end of a struct means that the struct can never be
> extended again which seems like a terrible idea,
The header [1] has a fixed size and will always exist, even if there are
no counters. Future extensions [2] will be added in memory after the
flex array, whose size depends on 'ncounters_data'. This pattern is also
used in other extended APIs. [3]
struct mlx5_ib_create_flow {
__u32 ncounters_data;
__u32 reserved;
[1] /* Header is above ********
/* Following are counters data based on ncounters_data */
struct mlx5_ib_flow_counters_data data[];
[2] Future fields.
[3]
https://elixir.bootlin.com/linux/latest/source/include/uapi/rdma/ib_user_verbs.h#L1145
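The layout argument above (fixed header, then the flex array, then future fields) can be sketched as a small offset calculation. The struct and function names are shortened, hypothetical stand-ins for the uapi structs in the patch, with plain fixed-width types in place of __aligned_u64 and __u32; no actual future field is defined anywhere in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct mlx5_ib_flow_counters_data (16 bytes). */
struct counters_data {
	uint64_t counters_data;
	uint32_t ncounters;
	uint32_t reserved;
};

/* Stand-in for struct mlx5_ib_create_flow: 8-byte header, then flex array. */
struct create_flow {
	uint32_t ncounters_data;
	uint32_t reserved;
	struct counters_data data[];
};

/*
 * Hypothetical helper: byte offset at which a future extension field
 * would start, i.e. right after the last element of the flex array.
 */
static size_t create_flow_ext_offset(uint32_t ncounters_data)
{
	return offsetof(struct create_flow, data) +
	       (size_t)ncounters_data * sizeof(struct counters_data);
}
```

With no counters the extension area would start at offset 8, and with two entries at offset 40. Any parser of a future extension therefore has to read 'ncounters_data' first to locate it.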
^ permalink raw reply
* Re: [PATCH] rtnetlink: Add more well known protocol values
From: Donald Sharp @ 2018-05-30 12:32 UTC (permalink / raw)
To: netdev, David Ahern
In-Reply-To: <20180530122732.3688-1-sharpd@cumulusnetworks.com>
This patch is intended for net-next.
thanks!
donald
On Wed, May 30, 2018 at 8:27 AM, Donald Sharp
<sharpd@cumulusnetworks.com> wrote:
> FRRouting installs routes into the kernel associated with
> the originating protocol. Add these values to the well
> known values in rtnetlink.h.
>
> Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
> ---
> v2: Fixed whitespace issues
> include/uapi/linux/rtnetlink.h | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
> index cabb210c93af..7d8502313c99 100644
> --- a/include/uapi/linux/rtnetlink.h
> +++ b/include/uapi/linux/rtnetlink.h
> @@ -254,6 +254,11 @@ enum {
> #define RTPROT_DHCP 16 /* DHCP client */
> #define RTPROT_MROUTED 17 /* Multicast daemon */
> #define RTPROT_BABEL 42 /* Babel daemon */
> +#define RTPROT_BGP 186 /* BGP Routes */
> +#define RTPROT_ISIS 187 /* ISIS Routes */
> +#define RTPROT_OSPF 188 /* OSPF Routes */
> +#define RTPROT_RIP 189 /* RIP Routes */
> +#define RTPROT_EIGRP 192 /* EIGRP Routes */
>
> /* rtm_scope
>
> --
> 2.14.3
>
^ permalink raw reply
* [PATCH net-next] qed: Add srq core support for RoCE and iWARP
From: Yuval Bason @ 2018-05-30 13:11 UTC (permalink / raw)
To: yuval.bason, davem
Cc: netdev, jgg, dledford, linux-rdma, Michal Kalderon, Ariel Elior
This patch adds support for configuring SRQ and provides the necessary
APIs for rdma upper layer driver (qedr) to enable the SRQ feature.
Signed-off-by: Michal Kalderon <michal.kalderon@cavium.com>
Signed-off-by: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: Yuval Bason <yuval.bason@cavium.com>
---
drivers/net/ethernet/qlogic/qed/qed_cxt.c | 5 +-
drivers/net/ethernet/qlogic/qed/qed_cxt.h | 1 +
drivers/net/ethernet/qlogic/qed/qed_hsi.h | 2 +
drivers/net/ethernet/qlogic/qed/qed_iwarp.c | 23 ++++
drivers/net/ethernet/qlogic/qed/qed_main.c | 2 +
drivers/net/ethernet/qlogic/qed/qed_rdma.c | 179 +++++++++++++++++++++++++++-
drivers/net/ethernet/qlogic/qed/qed_rdma.h | 2 +
drivers/net/ethernet/qlogic/qed/qed_roce.c | 17 ++-
include/linux/qed/qed_rdma_if.h | 12 +-
9 files changed, 235 insertions(+), 8 deletions(-)
diff --git a/drivers/net/ethernet/qlogic/qed/qed_cxt.c b/drivers/net/ethernet/qlogic/qed/qed_cxt.c
index 820b226..7ed6aa0 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_cxt.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_cxt.c
@@ -47,6 +47,7 @@
#include "qed_hsi.h"
#include "qed_hw.h"
#include "qed_init_ops.h"
+#include "qed_rdma.h"
#include "qed_reg_addr.h"
#include "qed_sriov.h"
@@ -426,7 +427,7 @@ static void qed_cxt_set_srq_count(struct qed_hwfn *p_hwfn, u32 num_srqs)
p_mgr->srq_count = num_srqs;
}
-static u32 qed_cxt_get_srq_count(struct qed_hwfn *p_hwfn)
+u32 qed_cxt_get_srq_count(struct qed_hwfn *p_hwfn)
{
struct qed_cxt_mngr *p_mgr = p_hwfn->p_cxt_mngr;
@@ -2071,7 +2072,7 @@ static void qed_rdma_set_pf_params(struct qed_hwfn *p_hwfn,
u32 num_cons, num_qps, num_srqs;
enum protocol_type proto;
- num_srqs = min_t(u32, 32 * 1024, p_params->num_srqs);
+ num_srqs = min_t(u32, QED_RDMA_MAX_SRQS, p_params->num_srqs);
if (p_hwfn->mcp_info->func_info.protocol == QED_PCI_ETH_RDMA) {
DP_NOTICE(p_hwfn,
diff --git a/drivers/net/ethernet/qlogic/qed/qed_cxt.h b/drivers/net/ethernet/qlogic/qed/qed_cxt.h
index a4e9586..758a8b4 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_cxt.h
+++ b/drivers/net/ethernet/qlogic/qed/qed_cxt.h
@@ -235,6 +235,7 @@ u32 qed_cxt_get_proto_tid_count(struct qed_hwfn *p_hwfn,
enum protocol_type type);
u32 qed_cxt_get_proto_cid_start(struct qed_hwfn *p_hwfn,
enum protocol_type type);
+u32 qed_cxt_get_srq_count(struct qed_hwfn *p_hwfn);
int qed_cxt_free_proto_ilt(struct qed_hwfn *p_hwfn, enum protocol_type proto);
#define QED_CTX_WORKING_MEM 0
diff --git a/drivers/net/ethernet/qlogic/qed/qed_hsi.h b/drivers/net/ethernet/qlogic/qed/qed_hsi.h
index 8e1e6e1..82ce401 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_hsi.h
+++ b/drivers/net/ethernet/qlogic/qed/qed_hsi.h
@@ -9725,6 +9725,8 @@ enum iwarp_eqe_async_opcode {
IWARP_EVENT_TYPE_ASYNC_EXCEPTION_DETECTED,
IWARP_EVENT_TYPE_ASYNC_QP_IN_ERROR_STATE,
IWARP_EVENT_TYPE_ASYNC_CQ_OVERFLOW,
+ IWARP_EVENT_TYPE_ASYNC_SRQ_EMPTY,
+ IWARP_EVENT_TYPE_ASYNC_SRQ_LIMIT,
MAX_IWARP_EQE_ASYNC_OPCODE
};
diff --git a/drivers/net/ethernet/qlogic/qed/qed_iwarp.c b/drivers/net/ethernet/qlogic/qed/qed_iwarp.c
index 2a2b101..474e6cf 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_iwarp.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_iwarp.c
@@ -271,6 +271,8 @@ int qed_iwarp_create_qp(struct qed_hwfn *p_hwfn,
p_ramrod->sq_num_pages = qp->sq_num_pages;
p_ramrod->rq_num_pages = qp->rq_num_pages;
+ p_ramrod->srq_id.srq_idx = cpu_to_le16(qp->srq_id);
+ p_ramrod->srq_id.opaque_fid = cpu_to_le16(p_hwfn->hw_info.opaque_fid);
p_ramrod->qp_handle_for_cqe.hi = cpu_to_le32(qp->qp_handle.hi);
p_ramrod->qp_handle_for_cqe.lo = cpu_to_le32(qp->qp_handle.lo);
@@ -3004,8 +3006,11 @@ static int qed_iwarp_async_event(struct qed_hwfn *p_hwfn,
union event_ring_data *data,
u8 fw_return_code)
{
+ struct qed_rdma_events events = p_hwfn->p_rdma_info->events;
struct regpair *fw_handle = &data->rdma_data.async_handle;
struct qed_iwarp_ep *ep = NULL;
+ u16 srq_offset;
+ u16 srq_id;
u16 cid;
ep = (struct qed_iwarp_ep *)(uintptr_t)HILO_64(fw_handle->hi,
@@ -3067,6 +3072,24 @@ static int qed_iwarp_async_event(struct qed_hwfn *p_hwfn,
qed_iwarp_cid_cleaned(p_hwfn, cid);
break;
+ case IWARP_EVENT_TYPE_ASYNC_SRQ_EMPTY:
+ DP_NOTICE(p_hwfn, "IWARP_EVENT_TYPE_ASYNC_SRQ_EMPTY\n");
+ srq_offset = p_hwfn->p_rdma_info->srq_id_offset;
+ /* FW assigns value that is no greater than u16 */
+ srq_id = ((u16)le32_to_cpu(fw_handle->lo)) - srq_offset;
+ events.affiliated_event(events.context,
+ QED_IWARP_EVENT_SRQ_EMPTY,
+ &srq_id);
+ break;
+ case IWARP_EVENT_TYPE_ASYNC_SRQ_LIMIT:
+ DP_NOTICE(p_hwfn, "IWARP_EVENT_TYPE_ASYNC_SRQ_LIMIT\n");
+ srq_offset = p_hwfn->p_rdma_info->srq_id_offset;
+ /* FW assigns value that is no greater than u16 */
+ srq_id = ((u16)le32_to_cpu(fw_handle->lo)) - srq_offset;
+ events.affiliated_event(events.context,
+ QED_IWARP_EVENT_SRQ_LIMIT,
+ &srq_id);
+ break;
case IWARP_EVENT_TYPE_ASYNC_CQ_OVERFLOW:
DP_NOTICE(p_hwfn, "IWARP_EVENT_TYPE_ASYNC_CQ_OVERFLOW\n");
diff --git a/drivers/net/ethernet/qlogic/qed/qed_main.c b/drivers/net/ethernet/qlogic/qed/qed_main.c
index 68c4399..b04d57c 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_main.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_main.c
@@ -64,6 +64,7 @@
#define QED_ROCE_QPS (8192)
#define QED_ROCE_DPIS (8)
+#define QED_RDMA_SRQS QED_ROCE_QPS
static char version[] =
"QLogic FastLinQ 4xxxx Core Module qed " DRV_MODULE_VERSION "\n";
@@ -922,6 +923,7 @@ static void qed_update_pf_params(struct qed_dev *cdev,
if (IS_ENABLED(CONFIG_QED_RDMA)) {
params->rdma_pf_params.num_qps = QED_ROCE_QPS;
params->rdma_pf_params.min_dpis = QED_ROCE_DPIS;
+ params->rdma_pf_params.num_srqs = QED_RDMA_SRQS;
/* divide by 3 the MRs to avoid MF ILT overflow */
params->rdma_pf_params.gl_pi = QED_ROCE_PROTOCOL_INDEX;
}
diff --git a/drivers/net/ethernet/qlogic/qed/qed_rdma.c b/drivers/net/ethernet/qlogic/qed/qed_rdma.c
index a411f9c..bd23659 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_rdma.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_rdma.c
@@ -259,15 +259,29 @@ static int qed_rdma_alloc(struct qed_hwfn *p_hwfn,
goto free_cid_map;
}
+ /* Allocate bitmap for srqs */
+ p_rdma_info->num_srqs = qed_cxt_get_srq_count(p_hwfn);
+ rc = qed_rdma_bmap_alloc(p_hwfn, &p_rdma_info->srq_map,
+ p_rdma_info->num_srqs, "SRQ");
+ if (rc) {
+ DP_VERBOSE(p_hwfn, QED_MSG_RDMA,
+ "Failed to allocate srq bitmap, rc = %d\n", rc);
+ goto free_real_cid_map;
+ }
+
if (QED_IS_IWARP_PERSONALITY(p_hwfn))
rc = qed_iwarp_alloc(p_hwfn);
if (rc)
- goto free_cid_map;
+ goto free_srq_map;
DP_VERBOSE(p_hwfn, QED_MSG_RDMA, "Allocation successful\n");
return 0;
+free_srq_map:
+ kfree(p_rdma_info->srq_map.bitmap);
+free_real_cid_map:
+ kfree(p_rdma_info->real_cid_map.bitmap);
free_cid_map:
kfree(p_rdma_info->cid_map.bitmap);
free_tid_map:
@@ -351,6 +365,8 @@ static void qed_rdma_resc_free(struct qed_hwfn *p_hwfn)
qed_rdma_bmap_free(p_hwfn, &p_hwfn->p_rdma_info->cq_map, 1);
qed_rdma_bmap_free(p_hwfn, &p_hwfn->p_rdma_info->toggle_bits, 0);
qed_rdma_bmap_free(p_hwfn, &p_hwfn->p_rdma_info->tid_map, 1);
+ qed_rdma_bmap_free(p_hwfn, &p_hwfn->p_rdma_info->srq_map, 1);
+ qed_rdma_bmap_free(p_hwfn, &p_hwfn->p_rdma_info->real_cid_map, 1);
kfree(p_rdma_info->port);
kfree(p_rdma_info->dev);
@@ -431,6 +447,12 @@ static void qed_rdma_init_devinfo(struct qed_hwfn *p_hwfn,
if (cdev->rdma_max_sge)
dev->max_sge = min_t(u32, cdev->rdma_max_sge, dev->max_sge);
+ dev->max_srq_sge = QED_RDMA_MAX_SGE_PER_SRQ_WQE;
+ if (p_hwfn->cdev->rdma_max_srq_sge) {
+ dev->max_srq_sge = min_t(u32,
+ p_hwfn->cdev->rdma_max_srq_sge,
+ dev->max_srq_sge);
+ }
dev->max_inline = ROCE_REQ_MAX_INLINE_DATA_SIZE;
dev->max_inline = (cdev->rdma_max_inline) ?
@@ -474,6 +496,8 @@ static void qed_rdma_init_devinfo(struct qed_hwfn *p_hwfn,
dev->max_mr_mw_fmr_size = dev->max_mr_mw_fmr_pbl * PAGE_SIZE;
dev->max_pkey = QED_RDMA_MAX_P_KEY;
+ dev->max_srq = p_hwfn->p_rdma_info->num_srqs;
+ dev->max_srq_wr = QED_RDMA_MAX_SRQ_WQE_ELEM;
dev->max_qp_resp_rd_atomic_resc = RDMA_RING_PAGE_SIZE /
(RDMA_RESP_RD_ATOMIC_ELM_SIZE * 2);
dev->max_qp_req_rd_atomic_resc = RDMA_RING_PAGE_SIZE /
@@ -1628,6 +1652,156 @@ static void *qed_rdma_get_rdma_ctx(struct qed_dev *cdev)
return QED_LEADING_HWFN(cdev);
}
+int qed_rdma_modify_srq(void *rdma_cxt,
+ struct qed_rdma_modify_srq_in_params *in_params)
+{
+ struct rdma_srq_modify_ramrod_data *p_ramrod;
+ struct qed_hwfn *p_hwfn = rdma_cxt;
+ struct qed_sp_init_data init_data;
+ struct qed_spq_entry *p_ent;
+ u16 opaque_fid;
+ int rc;
+
+ memset(&init_data, 0, sizeof(init_data));
+ init_data.opaque_fid = p_hwfn->hw_info.opaque_fid;
+ init_data.comp_mode = QED_SPQ_MODE_EBLOCK;
+
+ rc = qed_sp_init_request(p_hwfn, &p_ent,
+ RDMA_RAMROD_MODIFY_SRQ,
+ p_hwfn->p_rdma_info->proto, &init_data);
+ if (rc)
+ return rc;
+
+ p_ramrod = &p_ent->ramrod.rdma_modify_srq;
+ p_ramrod->srq_id.srq_idx = cpu_to_le16(in_params->srq_id);
+ opaque_fid = p_hwfn->hw_info.opaque_fid;
+ p_ramrod->srq_id.opaque_fid = cpu_to_le16(opaque_fid);
+ p_ramrod->wqe_limit = cpu_to_le16(in_params->wqe_limit);
+
+ rc = qed_spq_post(p_hwfn, p_ent, NULL);
+ if (rc)
+ return rc;
+
+ DP_VERBOSE(p_hwfn, QED_MSG_RDMA, "modified SRQ id = %x",
+ in_params->srq_id);
+
+ return rc;
+}
+
+int qed_rdma_destroy_srq(void *rdma_cxt,
+ struct qed_rdma_destroy_srq_in_params *in_params)
+{
+ struct rdma_srq_destroy_ramrod_data *p_ramrod;
+ struct qed_hwfn *p_hwfn = rdma_cxt;
+ struct qed_sp_init_data init_data;
+ struct qed_spq_entry *p_ent;
+ struct qed_bmap *bmap;
+ u16 opaque_fid;
+ int rc;
+
+ opaque_fid = p_hwfn->hw_info.opaque_fid;
+
+ memset(&init_data, 0, sizeof(init_data));
+ init_data.opaque_fid = opaque_fid;
+ init_data.comp_mode = QED_SPQ_MODE_EBLOCK;
+
+ rc = qed_sp_init_request(p_hwfn, &p_ent,
+ RDMA_RAMROD_DESTROY_SRQ,
+ p_hwfn->p_rdma_info->proto, &init_data);
+ if (rc)
+ return rc;
+
+ p_ramrod = &p_ent->ramrod.rdma_destroy_srq;
+ p_ramrod->srq_id.srq_idx = cpu_to_le16(in_params->srq_id);
+ p_ramrod->srq_id.opaque_fid = cpu_to_le16(opaque_fid);
+
+ rc = qed_spq_post(p_hwfn, p_ent, NULL);
+ if (rc)
+ return rc;
+
+ bmap = &p_hwfn->p_rdma_info->srq_map;
+
+ spin_lock_bh(&p_hwfn->p_rdma_info->lock);
+ qed_bmap_release_id(p_hwfn, bmap, in_params->srq_id);
+ spin_unlock_bh(&p_hwfn->p_rdma_info->lock);
+
+ DP_VERBOSE(p_hwfn, QED_MSG_RDMA, "SRQ destroyed Id = %x",
+ in_params->srq_id);
+
+ return rc;
+}
+
+int qed_rdma_create_srq(void *rdma_cxt,
+ struct qed_rdma_create_srq_in_params *in_params,
+ struct qed_rdma_create_srq_out_params *out_params)
+{
+ struct rdma_srq_create_ramrod_data *p_ramrod;
+ struct qed_hwfn *p_hwfn = rdma_cxt;
+ struct qed_sp_init_data init_data;
+ enum qed_cxt_elem_type elem_type;
+ struct qed_spq_entry *p_ent;
+ u16 opaque_fid, srq_id;
+ struct qed_bmap *bmap;
+ u32 returned_id;
+ int rc;
+
+ bmap = &p_hwfn->p_rdma_info->srq_map;
+ spin_lock_bh(&p_hwfn->p_rdma_info->lock);
+ rc = qed_rdma_bmap_alloc_id(p_hwfn, bmap, &returned_id);
+ spin_unlock_bh(&p_hwfn->p_rdma_info->lock);
+
+ if (rc) {
+ DP_NOTICE(p_hwfn, "failed to allocate srq id\n");
+ return rc;
+ }
+
+ elem_type = QED_ELEM_SRQ;
+ rc = qed_cxt_dynamic_ilt_alloc(p_hwfn, elem_type, returned_id);
+ if (rc)
+ goto err;
+ /* returned id is no greater than u16 */
+ srq_id = (u16)returned_id;
+ opaque_fid = p_hwfn->hw_info.opaque_fid;
+
+ memset(&init_data, 0, sizeof(init_data));
+ opaque_fid = p_hwfn->hw_info.opaque_fid;
+ init_data.opaque_fid = opaque_fid;
+ init_data.comp_mode = QED_SPQ_MODE_EBLOCK;
+
+ rc = qed_sp_init_request(p_hwfn, &p_ent,
+ RDMA_RAMROD_CREATE_SRQ,
+ p_hwfn->p_rdma_info->proto, &init_data);
+ if (rc)
+ goto err;
+
+ p_ramrod = &p_ent->ramrod.rdma_create_srq;
+ DMA_REGPAIR_LE(p_ramrod->pbl_base_addr, in_params->pbl_base_addr);
+ p_ramrod->pages_in_srq_pbl = cpu_to_le16(in_params->num_pages);
+ p_ramrod->pd_id = cpu_to_le16(in_params->pd_id);
+ p_ramrod->srq_id.srq_idx = cpu_to_le16(srq_id);
+ p_ramrod->srq_id.opaque_fid = cpu_to_le16(opaque_fid);
+ p_ramrod->page_size = cpu_to_le16(in_params->page_size);
+ DMA_REGPAIR_LE(p_ramrod->producers_addr, in_params->prod_pair_addr);
+
+ rc = qed_spq_post(p_hwfn, p_ent, NULL);
+ if (rc)
+ goto err;
+
+ out_params->srq_id = srq_id;
+
+ DP_VERBOSE(p_hwfn, QED_MSG_RDMA,
+ "SRQ created Id = %x\n", out_params->srq_id);
+
+ return rc;
+
+err:
+ spin_lock_bh(&p_hwfn->p_rdma_info->lock);
+ qed_bmap_release_id(p_hwfn, bmap, returned_id);
+ spin_unlock_bh(&p_hwfn->p_rdma_info->lock);
+
+ return rc;
+}
+
bool qed_rdma_allocated_qps(struct qed_hwfn *p_hwfn)
{
bool result;
@@ -1773,6 +1947,9 @@ static int qed_roce_ll2_set_mac_filter(struct qed_dev *cdev,
.rdma_free_tid = &qed_rdma_free_tid,
.rdma_register_tid = &qed_rdma_register_tid,
.rdma_deregister_tid = &qed_rdma_deregister_tid,
+ .rdma_create_srq = &qed_rdma_create_srq,
+ .rdma_modify_srq = &qed_rdma_modify_srq,
+ .rdma_destroy_srq = &qed_rdma_destroy_srq,
.ll2_acquire_connection = &qed_ll2_acquire_connection,
.ll2_establish_connection = &qed_ll2_establish_connection,
.ll2_terminate_connection = &qed_ll2_terminate_connection,
diff --git a/drivers/net/ethernet/qlogic/qed/qed_rdma.h b/drivers/net/ethernet/qlogic/qed/qed_rdma.h
index 18ec9cb..6f722ee 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_rdma.h
+++ b/drivers/net/ethernet/qlogic/qed/qed_rdma.h
@@ -96,6 +96,8 @@ struct qed_rdma_info {
u8 num_cnqs;
u32 num_qps;
u32 num_mrs;
+ u32 num_srqs;
+ u16 srq_id_offset;
u16 queue_zone_base;
u16 max_queue_zones;
enum protocol_type proto;
diff --git a/drivers/net/ethernet/qlogic/qed/qed_roce.c b/drivers/net/ethernet/qlogic/qed/qed_roce.c
index 6acfd43..ee57fcd 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_roce.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_roce.c
@@ -65,6 +65,8 @@
u8 fw_event_code,
u16 echo, union event_ring_data *data, u8 fw_return_code)
{
+ struct qed_rdma_events events = p_hwfn->p_rdma_info->events;
+
if (fw_event_code == ROCE_ASYNC_EVENT_DESTROY_QP_DONE) {
u16 icid =
(u16)le32_to_cpu(data->rdma_data.rdma_destroy_qp_data.cid);
@@ -75,11 +77,18 @@
*/
qed_roce_free_real_icid(p_hwfn, icid);
} else {
- struct qed_rdma_events *events = &p_hwfn->p_rdma_info->events;
+ if (fw_event_code == ROCE_ASYNC_EVENT_SRQ_EMPTY ||
+ fw_event_code == ROCE_ASYNC_EVENT_SRQ_LIMIT) {
+ u16 srq_id = (u16)data->rdma_data.async_handle.lo;
+
+ events.affiliated_event(events.context, fw_event_code,
+ &srq_id);
+ } else {
+ union rdma_eqe_data rdata = data->rdma_data;
- events->affiliated_event(p_hwfn->p_rdma_info->events.context,
- fw_event_code,
- (void *)&data->rdma_data.async_handle);
+ events.affiliated_event(events.context, fw_event_code,
+ (void *)&rdata.async_handle);
+ }
}
return 0;
diff --git a/include/linux/qed/qed_rdma_if.h b/include/linux/qed/qed_rdma_if.h
index 4dd72ba..e05e320 100644
--- a/include/linux/qed/qed_rdma_if.h
+++ b/include/linux/qed/qed_rdma_if.h
@@ -485,7 +485,9 @@ enum qed_iwarp_event_type {
QED_IWARP_EVENT_ACTIVE_MPA_REPLY,
QED_IWARP_EVENT_LOCAL_ACCESS_ERROR,
QED_IWARP_EVENT_REMOTE_OPERATION_ERROR,
- QED_IWARP_EVENT_TERMINATE_RECEIVED
+ QED_IWARP_EVENT_TERMINATE_RECEIVED,
+ QED_IWARP_EVENT_SRQ_LIMIT,
+ QED_IWARP_EVENT_SRQ_EMPTY,
};
enum qed_tcp_ip_version {
@@ -646,6 +648,14 @@ struct qed_rdma_ops {
int (*rdma_alloc_tid)(void *rdma_cxt, u32 *itid);
void (*rdma_free_tid)(void *rdma_cxt, u32 itid);
+ int (*rdma_create_srq)(void *rdma_cxt,
+ struct qed_rdma_create_srq_in_params *iparams,
+ struct qed_rdma_create_srq_out_params *oparams);
+ int (*rdma_destroy_srq)(void *rdma_cxt,
+ struct qed_rdma_destroy_srq_in_params *iparams);
+ int (*rdma_modify_srq)(void *rdma_cxt,
+ struct qed_rdma_modify_srq_in_params *iparams);
+
int (*ll2_acquire_connection)(void *rdma_cxt,
struct qed_ll2_acquire_data *data);
--
1.8.3.1
* Re: [RFC PATCH ghak32 V2 00/13] audit: implement container id
From: Steve Grubb @ 2018-05-30 13:20 UTC (permalink / raw)
To: linux-audit-H+wXaHxf7aLQT0dZR+AlfA
Cc: simo-H+wXaHxf7aLQT0dZR+AlfA, jlayton-H+wXaHxf7aLQT0dZR+AlfA,
linux-api-u79uwXL29TY76Z2rM5mHXA,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, LKML,
eparis-FjpueFixGhCM4zKIHC2jIg, dhowells-H+wXaHxf7aLQT0dZR+AlfA,
carlos-H+wXaHxf7aLQT0dZR+AlfA, ebiederm-aS9lmoZGLiVWk0Htik3J/w,
luto-DgEjT+Ai2ygdnm+yROfE0A, netdev-u79uwXL29TY76Z2rM5mHXA,
linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
cgroups-u79uwXL29TY76Z2rM5mHXA,
viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn
In-Reply-To: <cover.1521179281.git.rgb-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
On Friday, March 16, 2018 5:00:27 AM EDT Richard Guy Briggs wrote:
> Implement audit kernel container ID.
>
> This patchset is a second RFC based on the proposal document (V3)
> posted:
> https://www.redhat.com/archives/linux-audit/2018-January/msg00014.html
So, if you work on a container orchestrator, how exactly is this set of
interfaces to be used and in what order?
Thanks,
-Steve
> The first patch implements the proc fs write to set the audit container
> ID of a process, emitting an AUDIT_CONTAINER record to announce the
> registration of that container ID on that process. This patch requires
> userspace support for record acceptance and proper type display.
>
> The second checks for children or co-threads and refuses to set the
> container ID if either are present. (This policy could be changed to
> set both with the same container ID provided they meet the rest of the
> requirements.)
>
> The third implements the auxiliary record AUDIT_CONTAINER_INFO if a
> container ID is identifiable with an event. This patch requires
> userspace support for proper type display.
>
> The fourth adds container ID filtering to the exit, exclude and user
> lists. This patch requires auditctl userspace support for the
> --containerid option.
>
> The 5th adds signal and ptrace support.
>
> The 6th creates a local audit context to be able to bind a standalone
> record with a locally created auxiliary record.
>
> The 7th, 8th, 9th, 10th patches add container ID records to standalone
> records. Some of these may end up being syscall auxiliary records and
> won't need this specific support since they'll be supported via
> syscalls.
>
> The 11th adds network namespace container ID labelling based on member
> tasks' container ID labels.
>
> The 12th adds container ID support to standalone netfilter records that
> don't have a task context and lists each container to which that net
> namespace belongs.
>
> The 13th implements reading the container ID from the proc filesystem
> for debugging. This patch isn't planned for upstream inclusion.
>
> Feedback please!
>
> Example: Set a container ID of 123456 to the "sleep" task:
> sleep 2&
> child=$!
> echo 123456 > /proc/$child/containerid; echo $?
> ausearch -ts recent -m container
> echo child:$child contid:$( cat /proc/$child/containerid)
> This should produce a record such as:
> type=CONTAINER msg=audit(1521122590.315:222): op=set pid=689 uid=0
> subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 auid=0 tty=pts0
> ses=3 opid=707 old-contid=18446744073709551615 contid=123456 res=1
>
> Example: Set a filter on a container ID 123459 on /tmp/tmpcontainerid:
> containerid=123459
> key=tmpcontainerid
> auditctl -a exit,always -F dir=/tmp -F perm=wa -F containerid=$containerid -F key=$key
> perl -e "sleep 1; open(my \$tmpfile, '>', \"/tmp/$key\"); close(\$tmpfile);" &
> child=$!
> echo $containerid > /proc/$child/containerid
> sleep 2
> ausearch -i -ts recent -k $key
> auditctl -d exit,always -F dir=/tmp -F perm=wa -F containerid=$containerid -F key=$key
> rm -f /tmp/$key
> This should produce an event such as:
> type=CONTAINER_INFO msg=audit(1521122591.614:227): op=task contid=123459
> type=PROCTITLE msg=audit(1521122591.614:227):
> proctitle=7065726C002D6500736C65657020313B206F70656E286D792024746D7066696C
> 652C20273E272C20222F746D702F746D70636F6E7461696E6572696422293B20636C6F73652
> 824746D7066696C65293B
> type=PATH msg=audit(1521122591.614:227): item=1
> name="/tmp/tmpcontainerid" inode=18427 dev=00:26 mode=0100644 ouid=0
> ogid=0 rdev=00:00 obj=unconfined_u:object_r:user_tmp_t:s0 nametype=CREATE
> cap_fp=0000000000000000 cap_fi=0000000000000000 cap_fe=0 cap_fver=0
> type=PATH msg=audit(1521122591.614:227): item=0 name="/tmp/" inode=13513
> dev=00:26 mode=041777 ouid=0 ogid=0 rdev=00:00
> obj=system_u:object_r:tmp_t:s0 nametype=PARENT cap_fp=0000000000000000
> cap_fi=0000000000000000 cap_fe=0 cap_fver=0
> type=CWD msg=audit(1521122591.614:227): cwd="/root"
> type=SYSCALL msg=audit(1521122591.614:227): arch=c000003e syscall=257
> success=yes exit=3 a0=ffffffffffffff9c a1=55db90a28900 a2=241 a3=1b6
> items=2 ppid=689 pid=724 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0
> sgid=0 fsgid=0 tty=pts0 ses=3 comm="perl" exe="/usr/bin/perl"
> subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
> key="tmpcontainerid"
>
> See:
> https://github.com/linux-audit/audit-kernel/issues/32
> https://github.com/linux-audit/audit-userspace/issues/40
> https://github.com/linux-audit/audit-testsuite/issues/64
>
> Richard Guy Briggs (13):
> audit: add container id
> audit: check children and threading before allowing containerid
> audit: log container info of syscalls
> audit: add containerid filtering
> audit: add containerid support for ptrace and signals
> audit: add support for non-syscall auxiliary records
> audit: add container aux record to watch/tree/mark
> audit: add containerid support for tty_audit
> audit: add containerid support for config/feature/user records
> audit: add containerid support for seccomp and anom_abend records
> audit: add support for containerid to network namespaces
> audit: NETFILTER_PKT: record each container ID associated with a netNS
> debug audit: read container ID of a process
>
> drivers/tty/tty_audit.c | 5 +-
> fs/proc/base.c | 53 ++++++++++++++++
> include/linux/audit.h | 43 +++++++++++++
> include/linux/init_task.h | 4 +-
> include/linux/sched.h | 1 +
> include/net/net_namespace.h | 12 ++++
> include/uapi/linux/audit.h | 8 ++-
> kernel/audit.c | 75 ++++++++++++++++++++---
> kernel/audit.h | 3 +
> kernel/audit_fsnotify.c | 5 +-
> kernel/audit_tree.c | 5 +-
> kernel/audit_watch.c | 33 +++++-----
> kernel/auditfilter.c | 52 +++++++++++++++-
> kernel/auditsc.c | 145 ++++++++++++++++++++++++++++++++++++++++++--
> kernel/nsproxy.c | 6 ++
> net/core/net_namespace.c | 45 ++++++++++++++
> net/netfilter/xt_AUDIT.c | 15 ++++-
> 17 files changed, 473 insertions(+), 37 deletions(-)
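
As a rough illustration of the ordering question, the quoted shell example (start the task, write its ID to /proc/$child/containerid, read it back) can be sketched as follows. This is only a sketch under stated assumptions: the containerid proc file is the interface proposed in this RFC and does not exist on mainline kernels, the helper names set_container_id, get_container_id and demo are hypothetical, and the proc_root parameter exists purely so the code can be exercised against a stand-in directory. Per the description of the second patch, the write would have to happen before the task has children or co-threads.

```python
import os
import tempfile

# Assumption: <proc_root>/<pid>/containerid is the file proposed in this RFC.
# It is not a mainline interface, so proc_root is a parameter for dry runs.
U64_MAX = 2**64 - 1  # the example record shows old-contid=18446744073709551615


def set_container_id(pid: int, contid: int, proc_root: str = "/proc") -> None:
    """Write contid the way `echo N > /proc/<pid>/containerid` does."""
    if not 0 <= contid <= U64_MAX:
        raise ValueError("container ID does not fit in an unsigned 64-bit value")
    path = os.path.join(proc_root, str(pid), "containerid")
    with open(path, "w") as f:
        f.write(f"{contid}\n")


def get_container_id(pid: int, proc_root: str = "/proc") -> int:
    """Read the ID back (the debug read described for the 13th patch)."""
    path = os.path.join(proc_root, str(pid), "containerid")
    with open(path) as f:
        return int(f.read().strip())


def demo() -> int:
    """Dry run against a temporary directory standing in for /proc."""
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "707"))  # hypothetical pid, as opid=707 above
        set_container_id(707, 123456, proc_root=root)
        return get_container_id(707, proc_root=root)
```

On a kernel carrying this patchset, an orchestrator would presumably call set_container_id() on the freshly started task before that task forks or spawns threads, then confirm the registration with `ausearch -ts recent -m container` as in the quoted example.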