Netdev List


* Re: [PATCH] change the comment of vti6_ioctl
From: David Miller @ 2018-04-30 15:57 UTC (permalink / raw)
  To: sunlw.fnst; +Cc: netdev, steffen.klassert, herbert
In-Reply-To: <20180429070552.2472-1-sunlw.fnst@cn.fujitsu.com>

From: Sun Lianwen <sunlw.fnst@cn.fujitsu.com>
Date: Sun, 29 Apr 2018 15:05:52 +0800

> The comment of vti6_ioctl() is wrong: it uses the name vti6_tnl_ioctl
> instead of vti6_ioctl.
> 
> Signed-off-by: Sun Lianwen <sunlw.fnst@cn.fujitsu.com>

Please CC: the IPSEC maintainers on future patch submissions to IPSEC
files, as per the top level MAINTAINERS file.

Steffen, please queue this up, thank you.

> ---
>  net/ipv6/ip6_vti.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/ipv6/ip6_vti.c b/net/ipv6/ip6_vti.c
> index c214ffec02f0..deadc4c3703b 100644
> --- a/net/ipv6/ip6_vti.c
> +++ b/net/ipv6/ip6_vti.c
> @@ -743,7 +743,7 @@ vti6_parm_to_user(struct ip6_tnl_parm2 *u, const struct __ip6_tnl_parm *p)
>  }
>  
>  /**
> - * vti6_tnl_ioctl - configure vti6 tunnels from userspace
> + * vti6_ioctl - configure vti6 tunnels from userspace
>   *   @dev: virtual device associated with tunnel
>   *   @ifr: parameters passed from userspace
>   *   @cmd: command to be performed
> -- 
> 2.17.0
> 
> 
> 

* Re: [PATCH V2 net-next 1/2] tcp: send in-queue bytes in cmsg upon read
From: David Miller @ 2018-04-30 15:56 UTC (permalink / raw)
  To: eric.dumazet
  Cc: soheil.kdev, netdev, ycheng, ncardwell, edumazet, willemb, soheil
In-Reply-To: <aec45003-3354-e49f-b032-5297e98722eb@gmail.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 30 Apr 2018 08:43:50 -0700

> I say sort of, because by the time we have any number, TCP might
> have received more packets anyway.

That's fine.

However, the number reported should have been true at least at some
finite point in time.

If you allow overlapping changes to either of the two variables during
the sampling, then you are reporting a number which was never true at
any point in time.

It is essentially garbage.

* Re: [PATCH net-next 0/2 v5] netns: uevent filtering
From: Eric W. Biederman @ 2018-04-30 15:55 UTC (permalink / raw)
  To: Christian Brauner
  Cc: davem, netdev, linux-kernel, avagin, ktkhai, serge, gregkh
In-Reply-To: <20180429104412.22445-1-christian.brauner@ubuntu.com>

Christian Brauner <christian.brauner@ubuntu.com> writes:

> Hey everyone,
>
> This is the new approach to uevent filtering as discussed (see the
> threads in [1], [2], and [3]). It only contains *non-functional
> changes*.
>
> This series deals with fixing up uevent filtering logic:
> - uevent filtering logic is simplified
> - locking time on uevent_sock_list is minimized
> - tagged and untagged kobjects are handled in separate codepaths
> - permissions for userspace are fixed for network device uevents in
>   network namespaces owned by non-initial user namespaces
>   Udev is now able to see those events correctly which it wasn't before.
>   For example, moving a physical device into a network namespace not
>   owned by the initial user namespace previously gave:
>
>   root@xen1:~# udevadm --debug monitor -k
>   calling: monitor
>   monitor will print the received events for:
>   KERNEL - the kernel uevent
>
>   sender uid=65534, message ignored
>   sender uid=65534, message ignored
>   sender uid=65534, message ignored
>   sender uid=65534, message ignored
>   sender uid=65534, message ignored
>
>   and now after the discussion and solution in [3] correctly gives:
>
>   root@xen1:~# udevadm --debug monitor -k
>   calling: monitor
>   monitor will print the received events for:
>   KERNEL - the kernel uevent
>
>   KERNEL[625.301042] add      /devices/pci0000:00/0000:00:02.0/0000:01:00.1/net/enp1s0f1 (net)
>   KERNEL[625.301109] move     /devices/pci0000:00/0000:00:02.0/0000:01:00.1/net/enp1s0f1 (net)
>   KERNEL[625.301138] move     /devices/pci0000:00/0000:00:02.0/0000:01:00.1/net/eth1 (net)
>   KERNEL[655.333272] remove /devices/pci0000:00/0000:00:02.0/0000:01:00.1/net/eth1 (net)
>
> Thanks!
> Christian
>
> [1]: https://lkml.org/lkml/2018/4/4/739
> [2]: https://lkml.org/lkml/2018/4/26/767
> [3]: https://lkml.org/lkml/2018/4/26/738

Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>

>
> Christian Brauner (2):
>   uevent: add alloc_uevent_skb() helper
>   netns: restrict uevents
>
>  lib/kobject_uevent.c | 178 ++++++++++++++++++++++++++++++-------------
>  1 file changed, 126 insertions(+), 52 deletions(-)

Eric

* Re: [PATCH net-next] libcxgb,cxgb4: use __skb_put_zero to simplfy code
From: David Miller @ 2018-04-30 15:54 UTC (permalink / raw)
  To: yuehaibing; +Cc: ganeshgr, johannes.berg, netdev, linux-kernel
In-Reply-To: <20180428043522.12408-1-yuehaibing@huawei.com>

From: YueHaibing <yuehaibing@huawei.com>
Date: Sat, 28 Apr 2018 12:35:22 +0800

> use helper __skb_put_zero to replace the pattern of __skb_put() && memset()
> 
> Signed-off-by: YueHaibing <yuehaibing@huawei.com>

Applied, thank you.
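
For readers unfamiliar with the pattern being replaced, here is a minimal userspace sketch of the idea. It is a toy model only: `struct toy_buf`, `toy_put()` and `toy_put_zero()` are hypothetical names invented for illustration, not the kernel's `struct sk_buff` API, and the patch itself is not shown in this thread.

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a packet buffer; NOT the kernel's struct sk_buff. */
struct toy_buf {
	unsigned char data[256];
	size_t len;	/* bytes currently in use */
};

/* Reserve len bytes at the tail and return a pointer to them.
 * Like the double-underscore kernel helpers, this does no bounds check. */
static void *toy_put(struct toy_buf *b, size_t len)
{
	void *tail = b->data + b->len;

	b->len += len;
	return tail;
}

/* Combined helper: reserve the bytes, then zero them in one call. */
static void *toy_put_zero(struct toy_buf *b, size_t len)
{
	void *tail = toy_put(b, len);

	memset(tail, 0, len);
	return tail;
}
```

A caller that previously wrote `p = toy_put(b, n); memset(p, 0, n);` can then write `p = toy_put_zero(b, n);`, which is the kind of simplification the commit message describes.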

* Re: [PATCH bpf-next] tools include uapi: Grab a copy of linux/erspan.h
From: Y Song @ 2018-04-30 15:45 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: William Tu, netdev, Yonghong Song
In-Reply-To: <be743a29-bbf2-1554-a961-0e71ee8ea030@iogearbox.net>

On Mon, Apr 30, 2018 at 7:33 AM, Daniel Borkmann <daniel@iogearbox.net> wrote:
> On 04/30/2018 04:26 PM, William Tu wrote:
>> Bring the erspan uapi header file so BPF tunnel helpers can use it.
>>
>> Fixes: 933a741e3b82 ("selftests/bpf: bpf tunnel test.")
>> Reported-by: Yonghong Song <yhs@fb.com>
>> Signed-off-by: William Tu <u9012063@gmail.com>
>
> Thanks for the patch, William! I also Cc'ed Yonghong here, so he has a
> chance to try it out.

Just tried it out. It works. Thanks for fixing!
Acked-by: Yonghong Song <yhs@fb.com>

* RE: smsc95xx: aligment issues
From: Woojung.Huh @ 2018-04-30 15:44 UTC (permalink / raw)
  To: stefan.wahren, Nisar.Sayed
  Cc: davem, linux-usb, netdev, popcornmix, james.hughes
In-Reply-To: <455810676.1044.1524902320201@email.1und1.de>

Hi Stefan,

Thanks for the report. We will try to reproduce the issue and will contact you if we need more details.

Regards,
Woojung

> -----Original Message-----
> From: Stefan Wahren [mailto:stefan.wahren@i2se.com]
> Sent: Saturday, April 28, 2018 3:59 AM
> To: Nisar Sayed - I17970 <Nisar.Sayed@microchip.com>; Woojung Huh - C21699
> <Woojung.Huh@microchip.com>
> Cc: David S. Miller <davem@davemloft.net>; linux-usb <linux-usb@vger.kernel.org>; netdev
> <netdev@vger.kernel.org>; popcorn mix <popcornmix@gmail.com>; James Hughes
> <james.hughes@raspberrypi.org>
> Subject: net: smsc95xx: aligment issues
> 
> Hi,
> after connecting a Raspberry Pi 1 B to my local network I'm seeing alignment issues under
> /proc/cpu/alignment:
> 
> User:		0
> System:		142 (_decode_session4+0x12c/0x3c8)
> Skipped:	0
> Half:		0
> Word:		0
> DWord:		127
> Multi:		15
> User faults:	2 (fixup)
> 
> I've also seen outputs with _csum_ipv6_magic.
> 
> Kernel config: bcm2835_defconfig
> Reproducible kernel trees: current linux-next, 4.17-rc2 and 4.14.37 (I didn't test older versions)
> 
> Please tell me if you need more information to narrow down this issue.
> 
> Best regards
> Stefan

* Re: [PATCH net-next] erspan: auto detect truncated packets.
From: David Miller @ 2018-04-30 15:44 UTC (permalink / raw)
  To: u9012063; +Cc: netdev
In-Reply-To: <1524863792-66068-1-git-send-email-u9012063@gmail.com>

From: William Tu <u9012063@gmail.com>
Date: Fri, 27 Apr 2018 14:16:32 -0700

> Currently the truncated bit is set only when the mirrored packet
> is larger than the MTU.  In some cases, the packet might already
> have been truncated before being sent to the erspan tunnel.  For
> those cases, the patch detects whether the IP header's total length
> is larger than the actual skb->len.  If so, the mirrored packet is
> truncated, and the patch sets the erspan truncate bit.
> 
> I tested the patch using the bpf_skb_change_tail helper function to
> shrink the packet and send it to the erspan tunnel.
> 
> Reported-by: Xiaoyan Jin <xiaoyanj@vmware.com>
> Signed-off-by: William Tu <u9012063@gmail.com>

Applied, thanks.
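
The check described in the commit message (IP total length versus the bytes actually present) can be sketched in userspace C as follows. This is an illustration under stated assumptions, not the code from the patch, which is not quoted in this thread: the function name `ipv4_looks_truncated()` is hypothetical, it handles IPv4 only, and `caplen` stands in for `skb->len`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return true when the IPv4 header claims more bytes than are present.
 * pkt points at the first byte of the IPv4 header; caplen is the number
 * of bytes actually available (the analogue of skb->len).
 */
static bool ipv4_looks_truncated(const uint8_t *pkt, size_t caplen)
{
	uint16_t tot_len;

	/* Hypothetical policy: without a full minimal header we cannot
	 * read Total Length, so this sketch makes no claim. */
	if (caplen < 20)
		return false;
	if ((pkt[0] >> 4) != 4)	/* not IPv4 */
		return false;

	/* Total Length is bytes 2-3 of the IPv4 header, big-endian. */
	tot_len = (uint16_t)((pkt[2] << 8) | pkt[3]);
	return tot_len > caplen;
}
```

A packet shrunk after its header was written (for example by a BPF helper, as in the test described above) keeps its original Total Length, which is why comparing the two values reveals the truncation.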

* Re: [PATCH V2 net-next 1/2] tcp: send in-queue bytes in cmsg upon read
From: Eric Dumazet @ 2018-04-30 15:43 UTC (permalink / raw)
  To: David Miller, soheil.kdev
  Cc: netdev, ycheng, ncardwell, edumazet, willemb, soheil
In-Reply-To: <20180430.113834.1760530542793231849.davem@davemloft.net>



On 04/30/2018 08:38 AM, David Miller wrote:
> From: Soheil Hassas Yeganeh <soheil.kdev@gmail.com>
> Date: Fri, 27 Apr 2018 14:57:32 -0400
> 
>> Since the socket lock is not held when calculating the size of
>> receive queue, TCP_INQ is a hint.  For example, it can overestimate
>> the queue size by one byte, if FIN is received.
> 
> I think it is even worse than that.
> 
> If another application comes in and does a recvmsg() in parallel with
> these calculations, you could even report a negative value.
> 
> These READ_ONCE() make it look like some of these issues are being
> addressed but they are not.
> 
> You could freeze the values just by taking sk->sk_lock.slock, but I
> don't know if that cost is considered acceptable or not.
> 
> Another idea is to sample both values in a loop, similar to a sequence
> lock sequence:
> 
> again:
> 	tmp1 = A;
> 	tmp2 = B;
> 	barrier();
> 	tmp3 = A;
> 	if (tmp1 != tmp3)
> 		goto again;
> 
> But the current state of affairs is not going to work well.
> 

We want a hint, and max_t(int, 0, ....)  does not return a negative value ?

If the hint is wrong in 0.1 % of the cases, we really do not care, it is not meant
to replace the existing precise ( well, sort of ) mechanism.

I say sort of, because by the time we have any number, TCP might have received more packets anyway.

* Re: KASAN: use-after-free Read in perf_trace_rpc_stats_latency
From: Chuck Lever @ 2018-04-30 15:39 UTC (permalink / raw)
  To: syzbot
  Cc: Anna Schumaker, Bruce Fields, David S . Miller, Jeff Layton, lkml,
	Linux NFS Mailing List, netdev, syzkaller-bugs, Trond Myklebust
In-Reply-To: <00000000000027a3e2056b10e81e@google.com>



> On Apr 30, 2018, at 9:34 AM, syzbot <syzbot+27db1f90e2b972a5f2d3@syzkaller.appspotmail.com> wrote:
> 
> Hello,
> 
> syzbot hit the following crash on bpf-next commit
> f60ad0a0c441530280a4918eca781a6a94dffa50 (Sun Apr 29 15:45:55 2018 +0000)
> Merge branch 'bpf_get_stack'
> syzbot dashboard link: https://syzkaller.appspot.com/bug?extid=27db1f90e2b972a5f2d3
> 
> Unfortunately, I don't have any reproducer for this crash yet.
> Raw console output: https://syzkaller.appspot.com/x/log.txt?id=6741221342969856
> Kernel config: https://syzkaller.appspot.com/x/.config?id=4410550353033654931
> compiler: gcc (GCC) 8.0.1 20180413 (experimental)
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+27db1f90e2b972a5f2d3@syzkaller.appspotmail.com
> It will help syzbot understand when the bug is fixed. See footer for details.
> If you forward the report, please keep this part and the footer.
> 
> rpcbind: RPC call returned error 22
> rpcbind: RPC call returned error 22
> rpcbind: RPC call returned error 22
> rpcbind: RPC call returned error 22
> ==================================================================
> BUG: KASAN: use-after-free in strlen+0x83/0xa0 lib/string.c:482
> Read of size 1 at addr ffff8801d6f0a1c0 by task syz-executor7/5079
> 
> CPU: 1 PID: 5079 Comm: syz-executor7 Not tainted 4.17.0-rc2+ #16
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
> Call Trace:
> __dump_stack lib/dump_stack.c:77 [inline]
> dump_stack+0x1b9/0x294 lib/dump_stack.c:113
> print_address_description+0x6c/0x20b mm/kasan/report.c:256
> kasan_report_error mm/kasan/report.c:354 [inline]
> kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
> __asan_report_load1_noabort+0x14/0x20 mm/kasan/report.c:430
> strlen+0x83/0xa0 lib/string.c:482
> trace_event_get_offsets_rpc_stats_latency include/trace/events/sunrpc.h:215 [inline]
> perf_trace_rpc_stats_latency+0x318/0x10d0 include/trace/events/sunrpc.h:215
> trace_rpc_stats_latency include/trace/events/sunrpc.h:215 [inline]
> rpc_count_iostats_metrics+0x594/0x8a0 net/sunrpc/stats.c:182
> rpc_count_iostats+0x76/0x90 net/sunrpc/stats.c:195
> xprt_release+0xa3b/0x1110 net/sunrpc/xprt.c:1351
> rpc_release_resources_task+0x20/0xa0 net/sunrpc/sched.c:1024
> rpc_release_task net/sunrpc/sched.c:1068 [inline]
> __rpc_execute+0x5e9/0xf50 net/sunrpc/sched.c:833
> rpc_execute+0x37f/0x480 net/sunrpc/sched.c:852
> rpc_run_task+0x615/0x8c0 net/sunrpc/clnt.c:1053
> rpc_call_sync+0x196/0x290 net/sunrpc/clnt.c:1082
> rpc_ping+0x155/0x1f0 net/sunrpc/clnt.c:2513
> rpc_create_xprt+0x282/0x3f0 net/sunrpc/clnt.c:479
> rpc_create+0x52e/0x900 net/sunrpc/clnt.c:587
> nfs_create_rpc_client+0x63e/0x850 fs/nfs/client.c:523
> nfs_init_client+0x74/0x100 fs/nfs/client.c:634
> nfs_get_client+0x1065/0x1500 fs/nfs/client.c:425
> nfs_init_server+0x364/0xfb0 fs/nfs/client.c:670
> nfs_create_server+0x86/0x5f0 fs/nfs/client.c:953
> nfs_try_mount+0x177/0xab0 fs/nfs/super.c:1884
> nfs_fs_mount+0x17de/0x2efd fs/nfs/super.c:2695
> mount_fs+0xae/0x328 fs/super.c:1267
> vfs_kern_mount.part.34+0xd4/0x4d0 fs/namespace.c:1037
> vfs_kern_mount fs/namespace.c:1027 [inline]
> do_new_mount fs/namespace.c:2518 [inline]
> do_mount+0x564/0x3070 fs/namespace.c:2848
> ksys_mount+0x12d/0x140 fs/namespace.c:3064
> __do_sys_mount fs/namespace.c:3078 [inline]
> __se_sys_mount fs/namespace.c:3075 [inline]
> __x64_sys_mount+0xbe/0x150 fs/namespace.c:3075
> do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
> entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x455979
> RSP: 002b:00007f1e2785bc68 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
> RAX: ffffffffffffffda RBX: 00007f1e2785c6d4 RCX: 0000000000455979
> RDX: 0000000020fb5ffc RSI: 0000000020343ff8 RDI: 000000002091dff8
> RBP: 000000000072bf50 R08: 000000002000a000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
> R13: 0000000000000440 R14: 00000000006fa6a0 R15: 0000000000000001
> 
> Allocated by task 5079:
> save_stack+0x43/0xd0 mm/kasan/kasan.c:448
> set_track mm/kasan/kasan.c:460 [inline]
> kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553
> __do_kmalloc mm/slab.c:3718 [inline]
> __kmalloc_track_caller+0x14a/0x760 mm/slab.c:3733
> kstrdup+0x39/0x70 mm/util.c:56
> xs_format_common_peer_ports+0x130/0x370 net/sunrpc/xprtsock.c:290
> xs_format_peer_addresses net/sunrpc/xprtsock.c:303 [inline]
> xs_setup_udp+0x5ea/0x880 net/sunrpc/xprtsock.c:3037
> xprt_create_transport+0x1d7/0x596 net/sunrpc/xprt.c:1433
> rpc_create+0x489/0x900 net/sunrpc/clnt.c:573
> nfs_create_rpc_client+0x63e/0x850 fs/nfs/client.c:523
> nfs_init_client+0x74/0x100 fs/nfs/client.c:634
> nfs_get_client+0x1065/0x1500 fs/nfs/client.c:425
> nfs_init_server+0x364/0xfb0 fs/nfs/client.c:670
> nfs_create_server+0x86/0x5f0 fs/nfs/client.c:953
> nfs_try_mount+0x177/0xab0 fs/nfs/super.c:1884
> nfs_fs_mount+0x17de/0x2efd fs/nfs/super.c:2695
> mount_fs+0xae/0x328 fs/super.c:1267
> vfs_kern_mount.part.34+0xd4/0x4d0 fs/namespace.c:1037
> vfs_kern_mount fs/namespace.c:1027 [inline]
> do_new_mount fs/namespace.c:2518 [inline]
> do_mount+0x564/0x3070 fs/namespace.c:2848
> ksys_mount+0x12d/0x140 fs/namespace.c:3064
> __do_sys_mount fs/namespace.c:3078 [inline]
> __se_sys_mount fs/namespace.c:3075 [inline]
> __x64_sys_mount+0xbe/0x150 fs/namespace.c:3075
> do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
> entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> Freed by task 6:
> save_stack+0x43/0xd0 mm/kasan/kasan.c:448
> set_track mm/kasan/kasan.c:460 [inline]
> __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521
> kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
> __cache_free mm/slab.c:3498 [inline]
> kfree+0xd9/0x260 mm/slab.c:3813
> xs_update_peer_port net/sunrpc/xprtsock.c:309 [inline]
> xs_set_port+0x105/0x180 net/sunrpc/xprtsock.c:1827
> rpcb_getport_done+0x224/0x2d0 net/sunrpc/rpcb_clnt.c:824
> rpc_exit_task+0xc9/0x2d0 net/sunrpc/sched.c:725
> __rpc_execute+0x28a/0xf50 net/sunrpc/sched.c:784
> rpc_async_schedule+0x16/0x20 net/sunrpc/sched.c:857
> process_one_work+0xc1e/0x1b50 kernel/workqueue.c:2145
> worker_thread+0x1cc/0x1440 kernel/workqueue.c:2279
> kthread+0x345/0x410 kernel/kthread.c:238
> ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412
> 
> The buggy address belongs to the object at ffff8801d6f0a1c0
> which belongs to the cache kmalloc-32 of size 32
> The buggy address is located 0 bytes inside of
> 32-byte region [ffff8801d6f0a1c0, ffff8801d6f0a1e0)
> The buggy address belongs to the page:
> page:ffffea00075bc280 count:1 mapcount:0 mapping:ffff8801d6f0a000 index:0xffff8801d6f0afc1
> flags: 0x2fffc0000000100(slab)
> raw: 02fffc0000000100 ffff8801d6f0a000 ffff8801d6f0afc1 0000000100000024
> raw: ffffea00075c52a0 ffff8801da801238 ffff8801da8001c0 0000000000000000
> page dumped because: kasan: bad access detected
> 
> Memory state around the buggy address:
> ffff8801d6f0a080: 01 fc fc fc fc fc fc fc 00 02 fc fc fc fc fc fc
> ffff8801d6f0a100: 00 02 fc fc fc fc fc fc 00 02 fc fc fc fc fc fc
>> ffff8801d6f0a180: fb fb fb fb fc fc fc fc fb fb fb fb fc fc fc fc
>                                           ^
> ffff8801d6f0a200: 00 02 fc fc fc fc fc fc 01 fc fc fc fc fc fc fc
> ffff8801d6f0a280: fb fb fb fb fc fc fc fc fb fb fb fb fc fc fc fc
> ==================================================================

The rpc_task survived longer than the transport. task->tk_xprt
points to freed memory by the time rpc_count_iostats_metrics
runs.

The naive fix is to remove the references to task->tk_xprt
in the trace point. I'm still checking to see if this is the
fully correct fix.


> ---
> This bug is generated by a dumb bot. It may contain errors.
> See https://goo.gl/tpsmEJ for details.
> Direct all questions to syzkaller@googlegroups.com.
> 
> syzbot will keep track of this bug report.
> If you forgot to add the Reported-by tag, once the fix for this bug is merged
> into any tree, please reply to this email with:
> #syz fix: exact-commit-title
> To mark this as a duplicate of another syzbot report, please reply with:
> #syz dup: exact-subject-of-another-report
> If it's a one-off invalid bug report, please reply with:
> #syz invalid
> Note: if the crash happens again, it will cause creation of a new bug report.
> Note: all commands must start from beginning of the line in the email body.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
Chuck Lever

* Re: [PATCH V2 net-next 1/2] tcp: send in-queue bytes in cmsg upon read
From: David Miller @ 2018-04-30 15:38 UTC (permalink / raw)
  To: soheil.kdev; +Cc: netdev, ycheng, ncardwell, edumazet, willemb, soheil
In-Reply-To: <20180427185733.36855-1-soheil.kdev@gmail.com>

From: Soheil Hassas Yeganeh <soheil.kdev@gmail.com>
Date: Fri, 27 Apr 2018 14:57:32 -0400

> Since the socket lock is not held when calculating the size of
> receive queue, TCP_INQ is a hint.  For example, it can overestimate
> the queue size by one byte, if FIN is received.

I think it is even worse than that.

If another application comes in and does a recvmsg() in parallel with
these calculations, you could even report a negative value.

These READ_ONCE() make it look like some of these issues are being
addressed but they are not.

You could freeze the values just by taking sk->sk_lock.slock, but I
don't know if that cost is considered acceptable or not.

Another idea is to sample both values in a loop, similar to a sequence
lock sequence:

again:
	tmp1 = A;
	tmp2 = B;
	barrier();
	tmp3 = A;
	if (tmp1 != tmp3)
		goto again;

But the current state of affairs is not going to work well.
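
The retry loop suggested above can be sketched in portable C11 as follows. This is a userspace illustration under assumptions, not kernel code: `struct counters` and `sample_diff()` are hypothetical names, C11 acquire loads stand in for the kernel's `READ_ONCE()` plus `barrier()`, and the thread does not say which TCP fields play the roles of A and B.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical pair of counters that another thread may update. */
struct counters {
	_Atomic uint32_t a;
	_Atomic uint32_t b;
};

/*
 * Read a, then b, then a again, and retry until a is unchanged across
 * the read of b. The pair (a, b) that is finally used was then valid at
 * the instant b was read, provided a never returns to an earlier value
 * between the two reads.
 */
static int32_t sample_diff(struct counters *c)
{
	uint32_t a1, a2, b;

	do {
		a1 = atomic_load_explicit(&c->a, memory_order_acquire);
		b  = atomic_load_explicit(&c->b, memory_order_acquire);
		a2 = atomic_load_explicit(&c->a, memory_order_acquire);
	} while (a1 != a2);

	return (int32_t)(b - a1);
}
```

The loop only bounds how stale the sample is; it does not stop either counter from changing right after the function returns, so the result is still a hint rather than a locked snapshot.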

* Re: simplify procfs code for seq_file instances V2
From: David Howells @ 2018-04-30 15:38 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: dhowells, Andrew Morton, Alexander Viro, linux-rtc,
	Alessandro Zummo, Alexandre Belloni, devel, linux-kernel,
	linux-scsi, linux-ide, Greg Kroah-Hartman, jfs-discussion,
	linux-afs, linux-acpi, netdev, netfilter-devel, Jiri Slaby,
	linux-ext4, Alexey Dobriyan, megaraidlinux.pdl, drbd-dev
In-Reply-To: <20180425154827.32251-1-hch@lst.de>

Note that your kernel hits the:

	inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
	swapper/0/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
		(ptrval) (fs_reclaim){?.+.}, at: fs_reclaim_acquire+0x12/0x35
	{HARDIRQ-ON-W} state was registered at:
	  fs_reclaim_acquire+0x32/0x35
	  kmem_cache_alloc_node_trace+0x49/0x2cf
	  alloc_worker+0x1d/0x49
	  init_rescuer.part.7+0x19/0x8f
	  workqueue_init+0xc0/0x1fe
	  kernel_init_freeable+0xdc/0x433
	  kernel_init+0xa/0xf5
	  ret_from_fork+0x24/0x30

bug, as described here:

	https://groups.google.com/forum/#!msg/syzkaller-bugs/sJC3Y3hOM08/aO3z9JXoAgAJ

David

* Re: [PATCH RFC iproute2-next 2/2] rdma: print provider resource attributes
From: Stephen Hemminger @ 2018-04-30 15:25 UTC (permalink / raw)
  To: Steve Wise; +Cc: dsahern, leon, netdev, linux-rdma
In-Reply-To: <9dc07757af0a98c444c5a40131a7c54922361e03.1525100473.git.swise@opengridcomputing.com>

On Mon, 30 Apr 2018 07:36:18 -0700
Steve Wise <swise@opengridcomputing.com> wrote:

> +#define nla_type(attr) ((attr)->nla_type & NLA_TYPE_MASK)
> +
> +void newline(struct rd *rd)
> +{
> +	if (rd->json_output)
> +		jsonw_end_array(rd->jw);
> +	else
> +		pr_out("\n");
> +}
> +
> +void newline_indent(struct rd *rd)
> +{
> +	newline(rd);
> +	if (!rd->json_output)
> +		pr_out("    ");
> +}
> +
> +static int print_provider_string(struct rd *rd, const char *key_str,
> +				 const char *val_str)
> +{
> +	if (rd->json_output) {
> +		jsonw_string_field(rd->jw, key_str, val_str);
> +		return 0;
> +	} else {
> +		return pr_out("%s %s ", key_str, val_str);
> +	}
> +}
> +
> +static int print_provider_s32(struct rd *rd, const char *key_str, int32_t val,
> +			      enum rdma_nldev_print_type print_type)
> +{
> +	if (rd->json_output) {
> +		jsonw_int_field(rd->jw, key_str, val);
> +		return 0;
> +	}
> +	switch (print_type) {
> +	case RDMA_NLDEV_PRINT_TYPE_UNSPEC:
> +		return pr_out("%s %d ", key_str, val);
> +	case RDMA_NLDEV_PRINT_TYPE_HEX:
> +		return pr_out("%s 0x%x ", key_str, val);
> +	default:
> +		return -EINVAL;
> +	}
> +}
> +

This code should be converted to use the json_print library, which handles the
different output modes, rather than rolling its own equivalent functionality.

* Re: [PATCH] connector: add parent pid and tgid to coredump and exit events
From: David Miller @ 2018-04-30 15:17 UTC (permalink / raw)
  To: zbr
  Cc: stefan.strogin, jderehag, netdev, linux-kernel, xe-linux-external,
	matt.helsley
In-Reply-To: <4667631525100490@web38o.yandex.ru>

From: Evgeniy Polyakov <zbr@ioremap.net>
Date: Mon, 30 Apr 2018 18:01:30 +0300

> Stefan, hi
> 
> Sorry for the delay.
> 
> 26.04.2018, 15:04, "Stefan Strogin" <stefan.strogin@gmail.com>:
>> Hi David, Evgeniy,
>>
>> Sorry to bother you, but could you please comment about the UAPI change and the patch?
> 
> With 4-byte pid_t everything looks fine, and I do not know of an arch where pid_t is currently larger, so it looks safe.
> 
> David, please pull it into your tree, or should it go via a different path?
> 
> Acked-by: Evgeniy Polyakov <zbr@ioremap.net>

After this much time it needs to be resubmitted.

* Re: [PATCH bpf-next 2/3] bpf: fix formatting for bpf_get_stack() helper doc
From: Quentin Monnet @ 2018-04-30 15:16 UTC (permalink / raw)
  To: David Ahern, Alexei Starovoitov
  Cc: daniel, ast, netdev, oss-drivers, Yonghong Song
In-Reply-To: <5cc7a2d9-2a76-c323-e607-2b0400758297@gmail.com>

2018-04-30 09:12 UTC-0600 ~ David Ahern <dsahern@gmail.com>
> On 4/30/18 9:08 AM, Alexei Starovoitov wrote:
>>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>>> index 530ff6588d8f..8daef7326bb7 100644
>>> --- a/include/uapi/linux/bpf.h
>>> +++ b/include/uapi/linux/bpf.h
>>> @@ -1770,33 +1770,33 @@ union bpf_attr {
>>>   *
>>>   * int bpf_get_stack(struct pt_regs *regs, void *buf, u32 size, u64 flags)
>>>   * 	Description
>>> - *		Return a user or a kernel stack in bpf program provided buffer.
>>> - *		To achieve this, the helper needs *ctx*, which is a pointer
>>> + * 		Return a user or a kernel stack in bpf program provided buffer.
>>> + * 		To achieve this, the helper needs *ctx*, which is a pointer
>> I still don't quite get the difference.
>> It's replacing 2 tabs in above with 1 space + 2 tabs ?

Yes, exactly (plus, in this case, the "::" a few lines below has a missing
tab).

>> Can you please teach the python script to accept both?
>> I bet that will be a recurring mistake and it's impossible to spot in code review.
> And checkpatch throws an error on the 1 space + 2 tabs so it gets
> confusing on which format should be used.

Sorry about that :/. I will send a patch to make the script more flexible.

Quentin

* Re: [PATCH bpf-next 2/3] bpf: fix formatting for bpf_get_stack() helper doc
From: David Ahern @ 2018-04-30 15:12 UTC (permalink / raw)
  To: Alexei Starovoitov, Quentin Monnet
  Cc: daniel, ast, netdev, oss-drivers, Yonghong Song
In-Reply-To: <20180430150833.gt2di56f4jembb2f@ast-mbp.dhcp.thefacebook.com>

On 4/30/18 9:08 AM, Alexei Starovoitov wrote:
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index 530ff6588d8f..8daef7326bb7 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -1770,33 +1770,33 @@ union bpf_attr {
>>   *
>>   * int bpf_get_stack(struct pt_regs *regs, void *buf, u32 size, u64 flags)
>>   * 	Description
>> - *		Return a user or a kernel stack in bpf program provided buffer.
>> - *		To achieve this, the helper needs *ctx*, which is a pointer
>> + * 		Return a user or a kernel stack in bpf program provided buffer.
>> + * 		To achieve this, the helper needs *ctx*, which is a pointer
> 
> I still don't quite get the difference.
> It's replacing 2 tabs in above with 1 space + 2 tabs ?
> Can you please teach the python script to accept both?
> I bet that will be a recurring mistake and it's impossible to spot in code review.
> 

And checkpatch throws an error on the 1 space + 2 tabs so it gets
confusing on which format should be used.

* [PATCH RFC iproute2-next 2/2] rdma: print provider resource attributes
From: Steve Wise @ 2018-04-30 14:36 UTC (permalink / raw)
  To: dsahern, leon; +Cc: stephen, netdev, linux-rdma
In-Reply-To: <cover.1525100473.git.swise@opengridcomputing.com>

This enhancement allows printing rdma device-specific state, if provided
by the kernel.  This is done in a generic manner, so the rdma tool doesn't
need to know about the details of every type of rdma device.

Provider attributes for an rdma resource are in the form of <key,
[print_type], value> tuples, where the key is a string and the value can
be any supported provider attribute.  The print_type attribute, if present,
provides a print format to use instead of the standard print format for the type.
For example, the default print format for a PROVIDER_S32 value is "%d ",
but it is "0x%x " if a print_type of PRINT_TYPE_HEX is included in the tuple.

Provider resources are only printed when the -dd flag is present.
If -p is present, then the output is formatted to not exceed 80 columns,
otherwise it is printed as a single row to be grep/awk friendly.
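
As a rough illustration of the tuple formatting described above, the following standalone sketch formats one signed 32-bit value with either the default or the hex format. The names `toy_print_type` and `toy_format_s32()` are hypothetical and exist only for this example; the patch below uses its own enum and helpers.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the print_type values described above. */
enum toy_print_type {
	TOY_PRINT_TYPE_UNSPEC,	/* no print_type in the tuple: default format */
	TOY_PRINT_TYPE_HEX,	/* print_type requests hexadecimal */
};

/*
 * Format one <key, [print_type], value> tuple into buf.
 * Returns snprintf()'s result, or -1 for an unknown print type.
 */
static int toy_format_s32(char *buf, size_t size, const char *key,
			  int32_t val, enum toy_print_type type)
{
	switch (type) {
	case TOY_PRINT_TYPE_UNSPEC:
		return snprintf(buf, size, "%s %d ", key, (int)val);
	case TOY_PRINT_TYPE_HEX:
		return snprintf(buf, size, "%s 0x%x ", key, (unsigned int)val);
	default:
		return -1;
	}
}
```

With these assumptions, a `flags` value of 0 with the hex print type would come out as `flags 0x0 `, matching the style of the example output in this message.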

Example output:

# rdma resource show qp lqpn 1028 -dd -p
link cxgb4_0/- lqpn 1028 rqpn 0 type RC state RTS rq-psn 0 sq-psn 0 path-mig-state MIGRATED pid 0 comm [nvme_rdma]
    sqid 1028 flushed 0 memsize 123968 cidx 85 pidx 85 wq_pidx 106 flush_cidx 85 in_use 0
    size 386 flags 0x0 rqid 1029 memsize 16768 cidx 43 pidx 41 wq_pidx 171 msn 44 rqt_hwaddr 0x2a8a5d00
    rqt_size 256 in_use 128 size 130 idx 43 wr_id 0xffff881057c03408 idx 40 wr_id 0xffff881057c033f0

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
---
 rdma/rdma.c  |   7 ++-
 rdma/rdma.h  |  11 ++++
 rdma/res.c   |  30 +++------
 rdma/utils.c | 194 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 221 insertions(+), 21 deletions(-)

diff --git a/rdma/rdma.c b/rdma/rdma.c
index b43e538..c7c8b83 100644
--- a/rdma/rdma.c
+++ b/rdma/rdma.c
@@ -132,6 +132,7 @@ int main(int argc, char **argv)
 	const char *batch_file = NULL;
 	bool pretty_output = false;
 	bool show_details = false;
+	bool show_provider_details = false;
 	bool json_output = false;
 	bool force = false;
 	char *filename;
@@ -152,7 +153,10 @@ int main(int argc, char **argv)
 			pretty_output = true;
 			break;
 		case 'd':
-			show_details = true;
+			if (show_details)
+				show_provider_details = true;
+			else
+				show_details = true;
 			break;
 		case 'j':
 			json_output = true;
@@ -180,6 +184,7 @@ int main(int argc, char **argv)
 	argv += optind;
 
 	rd.show_details = show_details;
+	rd.show_provider_details = show_provider_details;
 	rd.json_output = json_output;
 	rd.pretty_output = pretty_output;
 
diff --git a/rdma/rdma.h b/rdma/rdma.h
index 1908fc4..e9581fe 100644
--- a/rdma/rdma.h
+++ b/rdma/rdma.h
@@ -55,6 +55,7 @@ struct rd {
 	char **argv;
 	char *filename;
 	bool show_details;
+	bool show_provider_details;
 	struct list_head dev_map_list;
 	uint32_t dev_idx;
 	uint32_t port_idx;
@@ -115,4 +116,14 @@ int rd_recv_msg(struct rd *rd, mnl_cb_t callback, void *data, uint32_t seq);
 void rd_prepare_msg(struct rd *rd, uint32_t cmd, uint32_t *seq, uint16_t flags);
 int rd_dev_init_cb(const struct nlmsghdr *nlh, void *data);
 int rd_attr_cb(const struct nlattr *attr, void *data);
+int rd_attr_check(const struct nlattr *attr, int *typep);
+
+/*
+ * Print helpers
+ */
+void print_provider_table(struct rd *rd, struct nlattr *tb);
+void newline(struct rd *rd);
+void newline_indent(struct rd *rd);
+#define MAX_LINE_LENGTH 80
+
 #endif /* _RDMA_TOOL_H_ */
diff --git a/rdma/res.c b/rdma/res.c
index 1a0aab6..bc0aef5 100644
--- a/rdma/res.c
+++ b/rdma/res.c
@@ -439,10 +439,8 @@ static int res_qp_parse_cb(const struct nlmsghdr *nlh, void *data)
 		if (nla_line[RDMA_NLDEV_ATTR_RES_PID])
 			free(comm);
 
-		if (rd->json_output)
-			jsonw_end_array(rd->jw);
-		else
-			pr_out("\n");
+		print_provider_table(rd, nla_line[RDMA_NLDEV_ATTR_PROVIDER]);
+		newline(rd);
 	}
 	return MNL_CB_OK;
 }
@@ -678,10 +676,8 @@ static int res_cm_id_parse_cb(const struct nlmsghdr *nlh, void *data)
 		if (nla_line[RDMA_NLDEV_ATTR_RES_PID])
 			free(comm);
 
-		if (rd->json_output)
-			jsonw_end_array(rd->jw);
-		else
-			pr_out("\n");
+		print_provider_table(rd, nla_line[RDMA_NLDEV_ATTR_PROVIDER]);
+		newline(rd);
 	}
 	return MNL_CB_OK;
 }
@@ -804,10 +800,8 @@ static int res_cq_parse_cb(const struct nlmsghdr *nlh, void *data)
 		if (nla_line[RDMA_NLDEV_ATTR_RES_PID])
 			free(comm);
 
-		if (rd->json_output)
-			jsonw_end_array(rd->jw);
-		else
-			pr_out("\n");
+		print_provider_table(rd, nla_line[RDMA_NLDEV_ATTR_PROVIDER]);
+		newline(rd);
 	}
 	return MNL_CB_OK;
 }
@@ -919,10 +913,8 @@ static int res_mr_parse_cb(const struct nlmsghdr *nlh, void *data)
 		if (nla_line[RDMA_NLDEV_ATTR_RES_PID])
 			free(comm);
 
-		if (rd->json_output)
-			jsonw_end_array(rd->jw);
-		else
-			pr_out("\n");
+		print_provider_table(rd, nla_line[RDMA_NLDEV_ATTR_PROVIDER]);
+		newline(rd);
 	}
 	return MNL_CB_OK;
 }
@@ -1004,10 +996,8 @@ static int res_pd_parse_cb(const struct nlmsghdr *nlh, void *data)
 		if (nla_line[RDMA_NLDEV_ATTR_RES_PID])
 			free(comm);
 
-		if (rd->json_output)
-			jsonw_end_array(rd->jw);
-		else
-			pr_out("\n");
+		print_provider_table(rd, nla_line[RDMA_NLDEV_ATTR_PROVIDER]);
+		newline(rd);
 	}
 	return MNL_CB_OK;
 }
diff --git a/rdma/utils.c b/rdma/utils.c
index 49c967f..3d3225c 100644
--- a/rdma/utils.c
+++ b/rdma/utils.c
@@ -11,6 +11,7 @@
 
 #include "rdma.h"
 #include <ctype.h>
+#include <inttypes.h>
 
 int rd_argc(struct rd *rd)
 {
@@ -393,8 +394,32 @@ static const enum mnl_attr_data_type nldev_policy[RDMA_NLDEV_ATTR_MAX] = {
 	[RDMA_NLDEV_ATTR_RES_MRLEN] = MNL_TYPE_U64,
 	[RDMA_NLDEV_ATTR_NDEV_INDEX]		= MNL_TYPE_U32,
 	[RDMA_NLDEV_ATTR_NDEV_NAME]		= MNL_TYPE_NUL_STRING,
+	[RDMA_NLDEV_ATTR_PROVIDER] = MNL_TYPE_NESTED,
+	[RDMA_NLDEV_ATTR_PROVIDER_ENTRY] = MNL_TYPE_NESTED,
+	[RDMA_NLDEV_ATTR_PROVIDER_STRING] = MNL_TYPE_NUL_STRING,
+	[RDMA_NLDEV_ATTR_PROVIDER_PRINT_TYPE] = MNL_TYPE_U8,
+	[RDMA_NLDEV_ATTR_PROVIDER_S32] = MNL_TYPE_U32,
+	[RDMA_NLDEV_ATTR_PROVIDER_U32] = MNL_TYPE_U32,
+	[RDMA_NLDEV_ATTR_PROVIDER_S64] = MNL_TYPE_U64,
+	[RDMA_NLDEV_ATTR_PROVIDER_U64] = MNL_TYPE_U64,
 };
 
+int rd_attr_check(const struct nlattr *attr, int *typep)
+{
+	int type;
+
+	if (mnl_attr_type_valid(attr, RDMA_NLDEV_ATTR_MAX) < 0)
+		return MNL_CB_ERROR;
+
+	type = mnl_attr_get_type(attr);
+
+	if (mnl_attr_validate(attr, nldev_policy[type]) < 0)
+		return MNL_CB_ERROR;
+
+	*typep = nldev_policy[type];
+	return MNL_CB_OK;
+}
+
 int rd_attr_cb(const struct nlattr *attr, void *data)
 {
 	const struct nlattr **tb = data;
@@ -660,3 +685,172 @@ struct dev_map *dev_map_lookup(struct rd *rd, bool allow_port_index)
 	free(dev_name);
 	return dev_map;
 }
+
+#define nla_type(attr) ((attr)->nla_type & NLA_TYPE_MASK)
+
+void newline(struct rd *rd)
+{
+	if (rd->json_output)
+		jsonw_end_array(rd->jw);
+	else
+		pr_out("\n");
+}
+
+void newline_indent(struct rd *rd)
+{
+	newline(rd);
+	if (!rd->json_output)
+		pr_out("    ");
+}
+
+static int print_provider_string(struct rd *rd, const char *key_str,
+				 const char *val_str)
+{
+	if (rd->json_output) {
+		jsonw_string_field(rd->jw, key_str, val_str);
+		return 0;
+	} else {
+		return pr_out("%s %s ", key_str, val_str);
+	}
+}
+
+static int print_provider_s32(struct rd *rd, const char *key_str, int32_t val,
+			      enum rdma_nldev_print_type print_type)
+{
+	if (rd->json_output) {
+		jsonw_int_field(rd->jw, key_str, val);
+		return 0;
+	}
+	switch (print_type) {
+	case RDMA_NLDEV_PRINT_TYPE_UNSPEC:
+		return pr_out("%s %d ", key_str, val);
+	case RDMA_NLDEV_PRINT_TYPE_HEX:
+		return pr_out("%s 0x%x ", key_str, val);
+	default:
+		return -EINVAL;
+	}
+}
+
+static int print_provider_u32(struct rd *rd, const char *key_str, uint32_t val,
+			      enum rdma_nldev_print_type print_type)
+{
+	if (rd->json_output) {
+		jsonw_int_field(rd->jw, key_str, val);
+		return 0;
+	}
+	switch (print_type) {
+	case RDMA_NLDEV_PRINT_TYPE_UNSPEC:
+		return pr_out("%s %u ", key_str, val);
+	case RDMA_NLDEV_PRINT_TYPE_HEX:
+		return pr_out("%s 0x%x ", key_str, val);
+	default:
+		return -EINVAL;
+	}
+}
+
+static int print_provider_s64(struct rd *rd, const char *key_str, int64_t val,
+			      enum rdma_nldev_print_type print_type)
+{
+	if (rd->json_output) {
+		jsonw_int_field(rd->jw, key_str, val);
+		return 0;
+	}
+	switch (print_type) {
+	case RDMA_NLDEV_PRINT_TYPE_UNSPEC:
+		return pr_out("%s %" PRId64 " ", key_str, val);
+	case RDMA_NLDEV_PRINT_TYPE_HEX:
+		return pr_out("%s 0x%" PRIx64 " ", key_str, val);
+	default:
+		return -EINVAL;
+	}
+}
+
+static int print_provider_u64(struct rd *rd, const char *key_str, uint64_t val,
+			      enum rdma_nldev_print_type print_type)
+{
+	if (rd->json_output) {
+		jsonw_int_field(rd->jw, key_str, val);
+		return 0;
+	}
+	switch (print_type) {
+	case RDMA_NLDEV_PRINT_TYPE_UNSPEC:
+		return pr_out("%s %" PRIu64 " ", key_str, val);
+	case RDMA_NLDEV_PRINT_TYPE_HEX:
+		return pr_out("%s 0x%" PRIx64 " ", key_str, val);
+	default:
+		return -EINVAL;
+	}
+}
+
+static int print_provider_entry(struct rd *rd, struct nlattr *key_attr,
+				struct nlattr *val_attr,
+				enum rdma_nldev_print_type print_type)
+{
+	const char *key_str = mnl_attr_get_str(key_attr);
+	int attr_type = nla_type(val_attr);
+
+	switch (attr_type) {
+	case RDMA_NLDEV_ATTR_PROVIDER_STRING:
+		return print_provider_string(rd, key_str,
+				mnl_attr_get_str(val_attr));
+	case RDMA_NLDEV_ATTR_PROVIDER_S32:
+		return print_provider_s32(rd, key_str,
+				mnl_attr_get_u32(val_attr), print_type);
+	case RDMA_NLDEV_ATTR_PROVIDER_U32:
+		return print_provider_u32(rd, key_str,
+				mnl_attr_get_u32(val_attr), print_type);
+	case RDMA_NLDEV_ATTR_PROVIDER_S64:
+		return print_provider_s64(rd, key_str,
+				mnl_attr_get_u64(val_attr), print_type);
+	case RDMA_NLDEV_ATTR_PROVIDER_U64:
+		return print_provider_u64(rd, key_str,
+				mnl_attr_get_u64(val_attr), print_type);
+	}
+	return -EINVAL;
+}
+
+void print_provider_table(struct rd *rd, struct nlattr *tb)
+{
+	int print_type = RDMA_NLDEV_PRINT_TYPE_UNSPEC;
+	struct nlattr *tb_entry, *key = NULL, *val;
+	int type, cc = 0;
+
+	if (!rd->show_provider_details || !tb)
+		return;
+
+	if (rd->pretty_output)
+		newline_indent(rd);
+
+	/*
+	 * Provider attrs are tuples of {key, [print-type], value}.
+	 * The key must be a string.  If print-type is present, it 
+	 * defines an alternate printf format type vs the native format
+	 * for the attribute.  And the value can be any available
+	 * provider type.
+	 */
+	mnl_attr_for_each_nested(tb_entry, tb) {
+
+		if (cc > MAX_LINE_LENGTH) {
+			if (rd->pretty_output)
+				newline_indent(rd);
+			cc = 0;
+		}
+		if (rd_attr_check(tb_entry, &type) != MNL_CB_OK)
+			return;
+		if (!key) {
+			if (type != MNL_TYPE_NUL_STRING)
+				return;
+			key = tb_entry;
+		} else if (type == MNL_TYPE_U8) {
+			print_type = mnl_attr_get_u8(tb_entry);
+		} else {
+			val = tb_entry;
+			cc += print_provider_entry(rd, key, val, print_type);
+			if (cc < 0)
+				return;
+			print_type = RDMA_NLDEV_PRINT_TYPE_UNSPEC;
+			key = NULL;
+		}
+	}
+	return;
+}
-- 
1.8.3.1

^ permalink raw reply related
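
As a reading aid for the tuple walk in print_provider_table() above, here is a minimal user-space sketch of the same {key, [print-type], value} state machine. It does not use libmnl. The names struct item, enum item_kind, enum print_type and format_entries() are hypothetical and exist only for this illustration, and values are limited to u32 to keep it short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for netlink attributes; not part of iproute2. */
enum item_kind { ITEM_KEY, ITEM_PRINT_TYPE, ITEM_U32 };
enum print_type { PRINT_UNSPEC, PRINT_HEX };

struct item {
	enum item_kind kind;
	const char *str;	/* used when kind == ITEM_KEY */
	uint32_t num;		/* used for ITEM_PRINT_TYPE and ITEM_U32 */
};

/*
 * Walk a flat list of items grouped as {key, [print-type], value} and
 * format each pair into out. Returns the number of pairs formatted, or
 * -1 if a tuple is malformed or out is too small.
 */
static int format_entries(const struct item *items, size_t n,
			  char *out, size_t outlen)
{
	enum print_type pt = PRINT_UNSPEC;
	const char *key = NULL;
	size_t used = 0;
	int pairs = 0;

	assert(out && outlen > 0);
	out[0] = '\0';
	for (size_t i = 0; i < n; i++) {
		const struct item *it = &items[i];
		int rc;

		if (!key) {
			/* A tuple must start with its key. */
			if (it->kind != ITEM_KEY)
				return -1;
			key = it->str;
			continue;
		}
		if (it->kind == ITEM_PRINT_TYPE) {
			/* Optional format hint for the value that follows. */
			pt = (enum print_type)it->num;
			continue;
		}
		if (it->kind != ITEM_U32)
			return -1;
		if (pt == PRINT_HEX)
			rc = snprintf(out + used, outlen - used,
				      "%s 0x%x ", key, (unsigned int)it->num);
		else
			rc = snprintf(out + used, outlen - used,
				      "%s %u ", key, (unsigned int)it->num);
		if (rc < 0 || (size_t)rc >= outlen - used)
			return -1;
		used += (size_t)rc;
		pairs++;
		/* Reset the per-tuple state before the next key. */
		pt = PRINT_UNSPEC;
		key = NULL;
	}
	return pairs;
}
```

Unlike the patch, the sketch checks each formatting result on its own rather than testing an accumulated character count, which is one possible way to keep a late error from being masked by earlier output.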

* [PATCH RFC iproute2-next 1/2] rdma: update rdma_netlink.h to get provider attrs
From: Steve Wise @ 2018-04-30 14:36 UTC (permalink / raw)
  To: dsahern, leon; +Cc: stephen, netdev, linux-rdma
In-Reply-To: <cover.1525100473.git.swise@opengridcomputing.com>

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
---
 rdma/include/uapi/rdma/rdma_netlink.h | 37 ++++++++++++++++++++++++++++++++++-
 1 file changed, 36 insertions(+), 1 deletion(-)

diff --git a/rdma/include/uapi/rdma/rdma_netlink.h b/rdma/include/uapi/rdma/rdma_netlink.h
index 45474f1..faea9d5 100644
--- a/rdma/include/uapi/rdma/rdma_netlink.h
+++ b/rdma/include/uapi/rdma/rdma_netlink.h
@@ -249,10 +249,22 @@ enum rdma_nldev_command {
 	RDMA_NLDEV_NUM_OPS
 };
 
+enum {
+	RDMA_NLDEV_ATTR_ENTRY_STRLEN = 16,
+};
+
+enum rdma_nldev_print_type {
+	RDMA_NLDEV_PRINT_TYPE_UNSPEC,
+	RDMA_NLDEV_PRINT_TYPE_HEX,
+};
+
 enum rdma_nldev_attr {
 	/* don't change the order or add anything between, this is ABI! */
 	RDMA_NLDEV_ATTR_UNSPEC,
 
+	/* Pad attribute for 64b alignment */
+	RDMA_NLDEV_ATTR_PAD = RDMA_NLDEV_ATTR_UNSPEC,
+
 	/* Identifier for ib_device */
 	RDMA_NLDEV_ATTR_DEV_INDEX,		/* u32 */
 
@@ -387,8 +399,31 @@ enum rdma_nldev_attr {
 	RDMA_NLDEV_ATTR_RES_PD_ENTRY,		/* nested table */
 	RDMA_NLDEV_ATTR_RES_LOCAL_DMA_LKEY,	/* u32 */
 	RDMA_NLDEV_ATTR_RES_UNSAFE_GLOBAL_RKEY,	/* u32 */
+	/*
+	 * provider-specific attributes.
+	 */
+	RDMA_NLDEV_ATTR_PROVIDER,		/* nested table */
+	RDMA_NLDEV_ATTR_PROVIDER_ENTRY,		/* nested table */
+	RDMA_NLDEV_ATTR_PROVIDER_STRING,	/* string */
+	/*
+	 * u8 values from enum rdma_nldev_print_type
+	 */
+	RDMA_NLDEV_ATTR_PROVIDER_PRINT_TYPE,	/* u8 */
+	RDMA_NLDEV_ATTR_PROVIDER_S32,		/* s32 */
+	RDMA_NLDEV_ATTR_PROVIDER_U32,		/* u32 */
+	RDMA_NLDEV_ATTR_PROVIDER_S64,		/* s64 */
+	RDMA_NLDEV_ATTR_PROVIDER_U64,		/* u64 */
 
-	/* Netdev information for relevant protocols, like RoCE and iWARP */
+	/*
+	 * Provides logical name and index of netdevice which is
+	 * connected to physical port. This information is relevant
+	 * for RoCE and iWARP.
+	 *
+	 * The netdevices which are associated with containers are
+	 * supposed to be exported together with GID table once it
+	 * will be exposed through the netlink. Because the
+	 * associated netdevices are properties of GIDs.
+	 */
 	RDMA_NLDEV_ATTR_NDEV_INDEX,		/* u32 */
 	RDMA_NLDEV_ATTR_NDEV_NAME,		/* string */
 
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH RFC iproute2-next 0/2] RDMA tool provider resource tracking
From: Steve Wise @ 2018-04-30 15:01 UTC (permalink / raw)
  To: dsahern, leon; +Cc: stephen, netdev, linux-rdma

Hello,

This series enhances the iproute2 rdma tool to include displaying
provider-specific resource attributes.  It is the user-space part of
the kernel provider resource tracking series currently under
review [1].

This is an RFC and should not be merged yet.  Once [1] is in the
linux-rdma for-next branch (and all reviewing is complete), I'll post
a final version and request merging.

Thanks,

Steve.

[1] https://www.spinics.net/lists/linux-rdma/msg64013.html

Steve Wise (2):
  rdma: update rdma_netlink.h to get provider attrs
  rdma: print provider resource attributes

 rdma/include/uapi/rdma/rdma_netlink.h |  37 ++++++-
 rdma/rdma.c                           |   7 +-
 rdma/rdma.h                           |  11 ++
 rdma/res.c                            |  30 ++----
 rdma/utils.c                          | 194 ++++++++++++++++++++++++++++++++++
 5 files changed, 257 insertions(+), 22 deletions(-)

-- 
1.8.3.1

^ permalink raw reply

* Re: [PATCH bpf-next 2/3] bpf: fix formatting for bpf_get_stack() helper doc
From: Alexei Starovoitov @ 2018-04-30 15:08 UTC (permalink / raw)
  To: Quentin Monnet; +Cc: daniel, ast, netdev, oss-drivers, Yonghong Song
In-Reply-To: <20180430103905.12863-3-quentin.monnet@netronome.com>

On Mon, Apr 30, 2018 at 11:39:04AM +0100, Quentin Monnet wrote:
> Fix formatting (indent) for bpf_get_stack() helper documentation, so
> that the doc is rendered correctly with the Python script.
> 
> Fixes: c195651e565a ("bpf: add bpf_get_stack helper")
> Cc: Yonghong Song <yhs@fb.com>
> Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
> ---
> 
> Note: The error was a missing space between the '*' marking the
> comments, and the tabs. This expected mixed indent comes from the fact I
> started to write the doc as a RST, then copied my contents (tabs
> included) in the header file and added a " * " (with a space) prefix
> everywhere.
> 
> On second thought, using such an indent style was maybe... not my best idea
> ever. Anyway, if indent for documenting eBPF helpers really gets too painful, we
> could relax parsing rules in the Python script to make things easier.
> ---
>  include/uapi/linux/bpf.h | 54 ++++++++++++++++++++++++------------------------
>  1 file changed, 27 insertions(+), 27 deletions(-)
> 
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 530ff6588d8f..8daef7326bb7 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -1770,33 +1770,33 @@ union bpf_attr {
>   *
>   * int bpf_get_stack(struct pt_regs *regs, void *buf, u32 size, u64 flags)
>   * 	Description
> - *		Return a user or a kernel stack in bpf program provided buffer.
> - *		To achieve this, the helper needs *ctx*, which is a pointer
> + * 		Return a user or a kernel stack in bpf program provided buffer.
> + * 		To achieve this, the helper needs *ctx*, which is a pointer

I still don't quite get the difference.
Is it replacing the 2 tabs above with 1 space + 2 tabs?
Can you please teach the Python script to accept both?
I bet that will be a recurring mistake, and it's impossible to spot in code review.

^ permalink raw reply
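
To make the difference discussed above concrete, here is a small sketch of a prefix check that accepts both " *<tab><tab>text" and " * <tab><tab>text". The real parser is a Python script and is not shown in this thread; this C helper, doc_line_body(), is a hypothetical illustration of the relaxed rule only and does not reproduce the script's logic.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: given one line of a " *"-prefixed comment block,
 * return a pointer to the text after the prefix and indentation, or NULL
 * if the line does not start with " *". The space after '*' is optional,
 * so " *\t\ttext" and " * \t\ttext" are treated the same.
 */
static const char *doc_line_body(const char *line)
{
	assert(line);
	if (strncmp(line, " *", 2) != 0)
		return NULL;
	line += 2;
	if (*line == ' ')	/* optional space before the tabs */
		line++;
	while (*line == '\t')	/* indentation depth is ignored here */
		line++;
	return line;
}
```

With a rule like this, the two variants in the hunk above would parse identically, so the invisible one-character difference would no longer matter in review.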

* Re: [PATCH] connector: add parent pid and tgid to coredump and exit events
From: Evgeniy Polyakov @ 2018-04-30 15:01 UTC (permalink / raw)
  To: Stefan Strogin, Jesper Derehag, David Miller
  Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	xe-linux-external@cisco.com, matt.helsley@gmail.com
In-Reply-To: <32f14672-5715-3e21-ba85-b27dc8d2c6b0@gmail.com>

Stefan, hi

Sorry for the delay.

26.04.2018, 15:04, "Stefan Strogin" <stefan.strogin@gmail.com>:
> Hi David, Evgeniy,
>
> Sorry to bother you, but could you please comment about the UAPI change and the patch?

With a 4-byte pid_t everything looks fine, and I do not know of any arch where pid is currently larger, so it looks safe.

David, please pull it into your tree, or should it go via a different path?

Acked-by: Evgeniy Polyakov <zbr@ioremap.net>


>>  I don't see how it breaks UAPI. The point is that structures
>>  coredump_proc_event and exit_proc_event are members of *union*
>>  event_data, thus position of the existing data in the structure is
>>  unchanged. Furthermore, this change won't increase size of struct
>>  proc_event, because comm_proc_event (also a member of event_data) is
>>  of bigger size than the changed structures.
>>
>>  If I'm wrong, could you please explain what exactly will the change
>>  break in UAPI?
>>
>>  On 30/03/18 19:59, David Miller wrote:
>>>  From: Stefan Strogin <sstrogin@cisco.com>
>>>  Date: Thu, 29 Mar 2018 17:12:47 +0300
>>>
>>>>  diff --git a/include/uapi/linux/cn_proc.h b/include/uapi/linux/cn_proc.h
>>>>  index 68ff25414700..db210625cee8 100644
>>>>  --- a/include/uapi/linux/cn_proc.h
>>>>  +++ b/include/uapi/linux/cn_proc.h
>>>>  @@ -116,12 +116,16 @@ struct proc_event {
>>>>               struct coredump_proc_event {
>>>>                       __kernel_pid_t process_pid;
>>>>                       __kernel_pid_t process_tgid;
>>>>  + __kernel_pid_t parent_pid;
>>>>  + __kernel_pid_t parent_tgid;
>>>>               } coredump;
>>>>
>>>>               struct exit_proc_event {
>>>>                       __kernel_pid_t process_pid;
>>>>                       __kernel_pid_t process_tgid;
>>>>                       __u32 exit_code, exit_signal;
>>>>  + __kernel_pid_t parent_pid;
>>>>  + __kernel_pid_t parent_tgid;
>>>>               } exit;
>>>>
>>>>       } event_data;
>>>
>>>  I don't think you can add these members without breaking UAPI.

^ permalink raw reply
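
The union argument quoted above can be checked with a small layout sketch. The structures below are simplified, hypothetical mirrors of the ones in cn_proc.h: fixed-width types stand in for __kernel_pid_t and __u32, and only two union members are modelled, so this illustrates the reasoning rather than the real header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins; not the real UAPI definitions. */
struct exit_old {
	int32_t pid, tgid;
	uint32_t exit_code, exit_signal;
};

struct exit_new {
	int32_t pid, tgid;
	uint32_t exit_code, exit_signal;
	int32_t parent_pid, parent_tgid;	/* appended members */
};

struct comm_ev {
	int32_t pid, tgid;
	char comm[16];
};

union data_old { struct exit_old exit; struct comm_ev comm; };
union data_new { struct exit_new exit; struct comm_ev comm; };

/*
 * Returns 1 when appending the parent fields neither moves an existing
 * member nor grows the enclosing union.
 */
static int layout_is_compatible(void)
{
	return offsetof(struct exit_new, exit_code) ==
	       offsetof(struct exit_old, exit_code) &&
	       offsetof(struct exit_new, exit_signal) ==
	       offsetof(struct exit_old, exit_signal) &&
	       sizeof(union data_new) == sizeof(union data_old);
}
```

The union is sized by its largest member, so as long as the extended structure stays no larger than the comm event, the overall event size is unchanged.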

* [PATCH net-next 4/4] net/smc: determine vlan_id of stacked net_device
From: Ursula Braun @ 2018-04-30 14:51 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-s390, schwidefsky, heiko.carstens, raspl, ubraun
In-Reply-To: <20180430145119.72479-1-ubraun@linux.ibm.com>

An SMC link group is bound to a specific vlan_id. Its link uses
the RoCE-GIDs established for that vlan_id. This patch makes sure
the appropriate vlan_id is determined for stacked scenarios, for
instance a master bonding device with enslaved vlan devices.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
---
 net/smc/smc_core.c | 26 +++++++++++++++++++++++---
 1 file changed, 23 insertions(+), 3 deletions(-)

diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
index d9247765aff3..1f3ea62fac5c 100644
--- a/net/smc/smc_core.c
+++ b/net/smc/smc_core.c
@@ -360,7 +360,8 @@ void smc_lgr_terminate(struct smc_link_group *lgr)
 static int smc_vlan_by_tcpsk(struct socket *clcsock, unsigned short *vlan_id)
 {
 	struct dst_entry *dst = sk_dst_get(clcsock->sk);
-	int rc = 0;
+	struct net_device *ndev;
+	int i, nest_lvl, rc = 0;
 
 	*vlan_id = 0;
 	if (!dst) {
@@ -372,8 +373,27 @@ static int smc_vlan_by_tcpsk(struct socket *clcsock, unsigned short *vlan_id)
 		goto out_rel;
 	}
 
-	if (is_vlan_dev(dst->dev))
-		*vlan_id = vlan_dev_vlan_id(dst->dev);
+	ndev = dst->dev;
+	if (is_vlan_dev(ndev)) {
+		*vlan_id = vlan_dev_vlan_id(ndev);
+		goto out_rel;
+	}
+
+	rtnl_lock();
+	nest_lvl = dev_get_nest_level(ndev);
+	for (i = 0; i < nest_lvl; i++) {
+		struct list_head *lower = &ndev->adj_list.lower;
+
+		if (list_empty(lower))
+			break;
+		lower = lower->next;
+		ndev = (struct net_device *)netdev_lower_get_next(ndev, &lower);
+		if (is_vlan_dev(ndev)) {
+			*vlan_id = vlan_dev_vlan_id(ndev);
+			break;
+		}
+	}
+	rtnl_unlock();
 
 out_rel:
 	dst_release(dst);
-- 
2.13.5

^ permalink raw reply related
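
The lookup in the patch above follows the first lower device at each nesting level until it meets a vlan device. A user-space sketch of that walk, assuming a hypothetical struct fake_dev in place of struct net_device and a max_depth argument in place of dev_get_nest_level(), could look like this:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a stacked device; not a kernel structure. */
struct fake_dev {
	const char *name;
	int is_vlan;
	unsigned short vlan_id;
	const struct fake_dev *lower;	/* first lower device, or NULL */
};

/*
 * Return the vlan id of dev itself if it is a vlan device, otherwise
 * the id of the first vlan device found by following the first lower
 * device for at most max_depth levels. Returns 0 if none is found.
 */
static unsigned short vlan_id_of_stack(const struct fake_dev *dev,
				       int max_depth)
{
	assert(dev);
	if (dev->is_vlan)
		return dev->vlan_id;
	for (int i = 0; i < max_depth; i++) {
		dev = dev->lower;
		if (!dev)
			break;
		if (dev->is_vlan)
			return dev->vlan_id;
	}
	return 0;
}
```

Like the patch, the sketch only inspects one lower device per level, so a stack with several lower devices at the same level would need a wider search.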

* [PATCH net-next 3/4] net/smc: handle ioctls SIOCINQ, SIOCOUTQ, and SIOCOUTQNSD
From: Ursula Braun @ 2018-04-30 14:51 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-s390, schwidefsky, heiko.carstens, raspl, ubraun
In-Reply-To: <20180430145119.72479-1-ubraun@linux.ibm.com>

SIOCINQ returns the amount of unread data in the RMB.
SIOCOUTQ returns the amount of unsent or unacked sent data in the send
buffer.
SIOCOUTQNSD returns the amount of data prepared for sending, but
not yet sent.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
---
 net/smc/af_smc.c | 33 ++++++++++++++++++++++++++++++---
 1 file changed, 30 insertions(+), 3 deletions(-)

diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 961b8eff9553..823ea3371575 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -29,6 +29,7 @@
 #include <net/sock.h>
 #include <net/tcp.h>
 #include <net/smc.h>
+#include <asm/ioctls.h>
 
 #include "smc.h"
 #include "smc_clc.h"
@@ -1389,12 +1390,38 @@ static int smc_ioctl(struct socket *sock, unsigned int cmd,
 		     unsigned long arg)
 {
 	struct smc_sock *smc;
+	int answ;
 
 	smc = smc_sk(sock->sk);
-	if (smc->use_fallback)
+	if (smc->use_fallback) {
+		if (!smc->clcsock)
+			return -EBADF;
 		return smc->clcsock->ops->ioctl(smc->clcsock, cmd, arg);
-	else
-		return sock_no_ioctl(sock, cmd, arg);
+	}
+	switch (cmd) {
+	case SIOCINQ: /* same as FIONREAD */
+		if (smc->sk.sk_state == SMC_LISTEN)
+			return -EINVAL;
+		answ = atomic_read(&smc->conn.bytes_to_rcv);
+		break;
+	case SIOCOUTQ:
+		/* output queue size (not send + not acked) */
+		if (smc->sk.sk_state == SMC_LISTEN)
+			return -EINVAL;
+		answ = smc->conn.sndbuf_size -
+					atomic_read(&smc->conn.sndbuf_space);
+		break;
+	case SIOCOUTQNSD:
+		/* output queue size (not send only) */
+		if (smc->sk.sk_state == SMC_LISTEN)
+			return -EINVAL;
+		answ = smc_tx_prepared_sends(&smc->conn);
+		break;
+	default:
+		return -ENOIOCTLCMD;
+	}
+
+	return put_user(answ, (int __user *)arg);
 }
 
 static ssize_t smc_sendpage(struct socket *sock, struct page *page,
-- 
2.13.5

^ permalink raw reply related
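
From user space, the ioctls handled above are queried in the usual way. The sketch below shows SIOCINQ only and, because an SMC socket needs suitable hardware, uses an AF_UNIX socketpair as a stand-in that also answers this ioctl on Linux. The helper names unread_bytes() and demo_unread_after_write() are hypothetical.

```c
#include <assert.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/sockios.h>

/* Return the number of unread bytes queued on fd, or -1 on error. */
static int unread_bytes(int fd)
{
	int n = 0;

	assert(fd >= 0);
	if (ioctl(fd, SIOCINQ, &n) < 0)
		return -1;
	return n;
}

/*
 * Write five bytes into one end of a socketpair and report how many
 * bytes the other end sees as unread. Returns -1 on any failure.
 */
static int demo_unread_after_write(void)
{
	int sv[2], n;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;
	if (write(sv[0], "hello", 5) != 5)
		n = -1;
	else
		n = unread_bytes(sv[1]);
	close(sv[0]);
	close(sv[1]);
	return n;
}
```

SIOCOUTQ and SIOCOUTQNSD are requested the same way, with the request constant swapped and the result again written to an int.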

* [PATCH net-next 0/4] net/smc: fixes 2018/04/30
From: Ursula Braun @ 2018-04-30 14:51 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-s390, schwidefsky, heiko.carstens, raspl, ubraun

From: Ursula Braun <ursula.braun@linux.ibm.com>

Dave,

here are 4 smc patches for net-next covering different areas:
   * link health check
   * diagnostics for IPv6 smc sockets
   * ioctl
   * improvement for vlan determination

Thanks, Ursula

Karsten Graul (2):
  net/smc: periodic testlink support
  net/smc: ipv6 support for smc_diag.c

Ursula Braun (2):
  net/smc: handle ioctls SIOCINQ, SIOCOUTQ, and SIOCOUTQNSD
  net/smc: determine vlan_id of stacked net_device

 net/smc/af_smc.c   | 39 +++++++++++++++++++++++++++++-----
 net/smc/smc_core.c | 28 +++++++++++++++++++++---
 net/smc/smc_core.h |  4 ++++
 net/smc/smc_diag.c | 37 ++++++++++++++++++++++++--------
 net/smc/smc_llc.c  | 62 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
 net/smc/smc_llc.h  |  3 +++
 net/smc/smc_wr.c   |  1 +
 7 files changed, 156 insertions(+), 18 deletions(-)

-- 
2.13.5

^ permalink raw reply

* [PATCH net-next 2/4] net/smc: ipv6 support for smc_diag.c
From: Ursula Braun @ 2018-04-30 14:51 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-s390, schwidefsky, heiko.carstens, raspl, ubraun
In-Reply-To: <20180430145119.72479-1-ubraun@linux.ibm.com>

From: Karsten Graul <kgraul@linux.ibm.com>

Update smc_diag.c to support ipv6 addresses on the diagnosis interface.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
---
 net/smc/smc_diag.c | 37 ++++++++++++++++++++++++++++---------
 1 file changed, 28 insertions(+), 9 deletions(-)

diff --git a/net/smc/smc_diag.c b/net/smc/smc_diag.c
index 427b91c1c964..9a8d0db7bb88 100644
--- a/net/smc/smc_diag.c
+++ b/net/smc/smc_diag.c
@@ -38,17 +38,25 @@ static void smc_diag_msg_common_fill(struct smc_diag_msg *r, struct sock *sk)
 {
 	struct smc_sock *smc = smc_sk(sk);
 
-	r->diag_family = sk->sk_family;
 	if (!smc->clcsock)
 		return;
 	r->id.idiag_sport = htons(smc->clcsock->sk->sk_num);
 	r->id.idiag_dport = smc->clcsock->sk->sk_dport;
 	r->id.idiag_if = smc->clcsock->sk->sk_bound_dev_if;
 	sock_diag_save_cookie(sk, r->id.idiag_cookie);
-	memset(&r->id.idiag_src, 0, sizeof(r->id.idiag_src));
-	memset(&r->id.idiag_dst, 0, sizeof(r->id.idiag_dst));
-	r->id.idiag_src[0] = smc->clcsock->sk->sk_rcv_saddr;
-	r->id.idiag_dst[0] = smc->clcsock->sk->sk_daddr;
+	if (sk->sk_protocol == SMCPROTO_SMC6) {
+		r->diag_family = PF_INET6;
+		memcpy(&r->id.idiag_src, &smc->clcsock->sk->sk_v6_rcv_saddr,
+		       sizeof(smc->clcsock->sk->sk_v6_rcv_saddr));
+		memcpy(&r->id.idiag_dst, &smc->clcsock->sk->sk_v6_daddr,
+		       sizeof(smc->clcsock->sk->sk_v6_daddr));
+	} else {
+		r->diag_family = PF_INET;
+		memset(&r->id.idiag_src, 0, sizeof(r->id.idiag_src));
+		memset(&r->id.idiag_dst, 0, sizeof(r->id.idiag_dst));
+		r->id.idiag_src[0] = smc->clcsock->sk->sk_rcv_saddr;
+		r->id.idiag_dst[0] = smc->clcsock->sk->sk_daddr;
+	}
 }
 
 static int smc_diag_msg_attrs_fill(struct sock *sk, struct sk_buff *skb,
@@ -153,7 +161,8 @@ static int __smc_diag_dump(struct sock *sk, struct sk_buff *skb,
 	return -EMSGSIZE;
 }
 
-static int smc_diag_dump(struct sk_buff *skb, struct netlink_callback *cb)
+static int smc_diag_dump_proto(struct proto *prot, struct sk_buff *skb,
+			       struct netlink_callback *cb)
 {
 	struct net *net = sock_net(skb->sk);
 	struct nlattr *bc = NULL;
@@ -161,8 +170,8 @@ static int smc_diag_dump(struct sk_buff *skb, struct netlink_callback *cb)
 	struct sock *sk;
 	int rc = 0;
 
-	read_lock(&smc_proto.h.smc_hash->lock);
-	head = &smc_proto.h.smc_hash->ht;
+	read_lock(&prot->h.smc_hash->lock);
+	head = &prot->h.smc_hash->ht;
 	if (hlist_empty(head))
 		goto out;
 
@@ -175,7 +184,17 @@ static int smc_diag_dump(struct sk_buff *skb, struct netlink_callback *cb)
 	}
 
 out:
-	read_unlock(&smc_proto.h.smc_hash->lock);
+	read_unlock(&prot->h.smc_hash->lock);
+	return rc;
+}
+
+static int smc_diag_dump(struct sk_buff *skb, struct netlink_callback *cb)
+{
+	int rc = 0;
+
+	rc = smc_diag_dump_proto(&smc_proto, skb, cb);
+	if (!rc)
+		rc = smc_diag_dump_proto(&smc_proto6, skb, cb);
 	return rc;
 }
 
-- 
2.13.5

^ permalink raw reply related
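
The address handling in smc_diag_msg_common_fill() above boils down to two ways of filling a four-word address slot. A user-space sketch, assuming a hypothetical struct diag_addr in place of the idiag_src and idiag_dst arrays:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>

/* Hypothetical mirror of a 4 x 32-bit diag address slot. */
struct diag_addr {
	uint32_t w[4];
};

/* IPv4: only word 0 carries the address, the rest must be zero. */
static void fill_v4(struct diag_addr *a, const struct in_addr *v4)
{
	memset(a, 0, sizeof(*a));
	a->w[0] = v4->s_addr;	/* already in network byte order */
}

/* IPv6: all 16 bytes are significant, so copy the whole address. */
static void fill_v6(struct diag_addr *a, const struct in6_addr *v6)
{
	memcpy(a->w, v6, sizeof(*v6));
}
```

Zeroing before the IPv4 store matters because a reader of the slot cannot otherwise tell stale upper words from address bits.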

* [PATCH net-next 1/4] net/smc: periodic testlink support
From: Ursula Braun @ 2018-04-30 14:51 UTC (permalink / raw)
  To: davem; +Cc: netdev, linux-s390, schwidefsky, heiko.carstens, raspl, ubraun
In-Reply-To: <20180430145119.72479-1-ubraun@linux.ibm.com>

From: Karsten Graul <kgraul@linux.ibm.com>

Add periodic LLC testlink support to ensure the link is still active.
The interval time is initialized using the value of
sysctl_tcp_keepalive_time.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
---
 net/smc/af_smc.c   |  6 ++++--
 net/smc/smc_core.c |  2 ++
 net/smc/smc_core.h |  4 ++++
 net/smc/smc_llc.c  | 62 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
 net/smc/smc_llc.h  |  3 +++
 net/smc/smc_wr.c   |  1 +
 6 files changed, 75 insertions(+), 3 deletions(-)

diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c
index 20aa4175b9f8..961b8eff9553 100644
--- a/net/smc/af_smc.c
+++ b/net/smc/af_smc.c
@@ -294,6 +294,7 @@ static void smc_copy_sock_settings_to_smc(struct smc_sock *smc)
 
 static int smc_clnt_conf_first_link(struct smc_sock *smc)
 {
+	struct net *net = sock_net(smc->clcsock->sk);
 	struct smc_link_group *lgr = smc->conn.lgr;
 	struct smc_link *link;
 	int rest;
@@ -353,7 +354,7 @@ static int smc_clnt_conf_first_link(struct smc_sock *smc)
 	if (rc < 0)
 		return SMC_CLC_DECL_TCL;
 
-	link->state = SMC_LNK_ACTIVE;
+	smc_llc_link_active(link, net->ipv4.sysctl_tcp_keepalive_time);
 
 	return 0;
 }
@@ -715,6 +716,7 @@ void smc_close_non_accepted(struct sock *sk)
 
 static int smc_serv_conf_first_link(struct smc_sock *smc)
 {
+	struct net *net = sock_net(smc->clcsock->sk);
 	struct smc_link_group *lgr = smc->conn.lgr;
 	struct smc_link *link;
 	int rest;
@@ -769,7 +771,7 @@ static int smc_serv_conf_first_link(struct smc_sock *smc)
 		return rc;
 	}
 
-	link->state = SMC_LNK_ACTIVE;
+	smc_llc_link_active(link, net->ipv4.sysctl_tcp_keepalive_time);
 
 	return 0;
 }
diff --git a/net/smc/smc_core.c b/net/smc/smc_core.c
index f44f6803f7ff..d9247765aff3 100644
--- a/net/smc/smc_core.c
+++ b/net/smc/smc_core.c
@@ -310,6 +310,7 @@ static void smc_lgr_free_bufs(struct smc_link_group *lgr)
 /* remove a link group */
 void smc_lgr_free(struct smc_link_group *lgr)
 {
+	smc_llc_link_flush(&lgr->lnk[SMC_SINGLE_LINK]);
 	smc_lgr_free_bufs(lgr);
 	smc_link_clear(&lgr->lnk[SMC_SINGLE_LINK]);
 	kfree(lgr);
@@ -332,6 +333,7 @@ void smc_lgr_terminate(struct smc_link_group *lgr)
 	struct rb_node *node;
 
 	smc_lgr_forget(lgr);
+	smc_llc_link_inactive(&lgr->lnk[SMC_SINGLE_LINK]);
 
 	write_lock_bh(&lgr->conns_lock);
 	node = rb_first(&lgr->conns_all);
diff --git a/net/smc/smc_core.h b/net/smc/smc_core.h
index 07e2a393e6d9..97339f03ba79 100644
--- a/net/smc/smc_core.h
+++ b/net/smc/smc_core.h
@@ -79,6 +79,7 @@ struct smc_link {
 	dma_addr_t		wr_rx_dma_addr;	/* DMA address of wr_rx_bufs */
 	u64			wr_rx_id;	/* seq # of last recv WR */
 	u32			wr_rx_cnt;	/* number of WR recv buffers */
+	unsigned long		wr_rx_tstamp;	/* jiffies when last buf rx */
 
 	struct ib_reg_wr	wr_reg;		/* WR register memory region */
 	wait_queue_head_t	wr_reg_wait;	/* wait for wr_reg result */
@@ -101,6 +102,9 @@ struct smc_link {
 	int			llc_confirm_resp_rc; /* rc from conf_resp msg */
 	struct completion	llc_add;	/* wait for rx of add link */
 	struct completion	llc_add_resp;	/* wait for rx of add link rsp*/
+	struct delayed_work	llc_testlink_wrk; /* testlink worker */
+	struct completion	llc_testlink_resp; /* wait for rx of testlink */
+	int			llc_testlink_time; /* testlink interval */
 };
 
 /* For now we just allow one parallel link per link group. The SMC protocol
diff --git a/net/smc/smc_llc.c b/net/smc/smc_llc.c
index ea4b21981b4b..33b4d856f4c6 100644
--- a/net/smc/smc_llc.c
+++ b/net/smc/smc_llc.c
@@ -397,7 +397,8 @@ static void smc_llc_rx_test_link(struct smc_link *link,
 				 struct smc_llc_msg_test_link *llc)
 {
 	if (llc->hd.flags & SMC_LLC_FLAG_RESP) {
-		/* unused as long as we don't send this type of msg */
+		if (link->state == SMC_LNK_ACTIVE)
+			complete(&link->llc_testlink_resp);
 	} else {
 		smc_llc_send_test_link(link, llc->user_data, SMC_LLC_RESP);
 	}
@@ -502,6 +503,65 @@ static void smc_llc_rx_handler(struct ib_wc *wc, void *buf)
 	}
 }
 
+/***************************** worker ****************************************/
+
+static void smc_llc_testlink_work(struct work_struct *work)
+{
+	struct smc_link *link = container_of(to_delayed_work(work),
+					     struct smc_link, llc_testlink_wrk);
+	unsigned long next_interval;
+	struct smc_link_group *lgr;
+	unsigned long expire_time;
+	u8 user_data[16] = { 0 };
+	int rc;
+
+	lgr = container_of(link, struct smc_link_group, lnk[SMC_SINGLE_LINK]);
+	if (link->state != SMC_LNK_ACTIVE)
+		return;		/* don't reschedule worker */
+	expire_time = link->wr_rx_tstamp + link->llc_testlink_time;
+	if (time_is_after_jiffies(expire_time)) {
+		next_interval = expire_time - jiffies;
+		goto out;
+	}
+	reinit_completion(&link->llc_testlink_resp);
+	smc_llc_send_test_link(link, user_data, SMC_LLC_REQ);
+	/* receive TEST LINK response over RoCE fabric */
+	rc = wait_for_completion_interruptible_timeout(&link->llc_testlink_resp,
+						       SMC_LLC_WAIT_TIME);
+	if (rc <= 0) {
+		smc_lgr_terminate(lgr);
+		return;
+	}
+	next_interval = link->llc_testlink_time;
+out:
+	schedule_delayed_work(&link->llc_testlink_wrk, next_interval);
+}
+
+void smc_llc_link_active(struct smc_link *link, int testlink_time)
+{
+	init_completion(&link->llc_testlink_resp);
+	INIT_DELAYED_WORK(&link->llc_testlink_wrk, smc_llc_testlink_work);
+	link->state = SMC_LNK_ACTIVE;
+	if (testlink_time) {
+		link->llc_testlink_time = testlink_time * HZ;
+		schedule_delayed_work(&link->llc_testlink_wrk,
+				      link->llc_testlink_time);
+	}
+}
+
+/* called in tasklet context */
+void smc_llc_link_inactive(struct smc_link *link)
+{
+	link->state = SMC_LNK_INACTIVE;
+	cancel_delayed_work(&link->llc_testlink_wrk);
+}
+
+/* called in worker context */
+void smc_llc_link_flush(struct smc_link *link)
+{
+	cancel_delayed_work_sync(&link->llc_testlink_wrk);
+}
+
 /***************************** init, exit, misc ******************************/
 
 static struct smc_wr_rx_handler smc_llc_rx_handlers[] = {
diff --git a/net/smc/smc_llc.h b/net/smc/smc_llc.h
index e4a7d5e234d5..d6e42116485e 100644
--- a/net/smc/smc_llc.h
+++ b/net/smc/smc_llc.h
@@ -44,6 +44,9 @@ int smc_llc_send_delete_link(struct smc_link *link,
 			     enum smc_llc_reqresp reqresp);
 int smc_llc_send_test_link(struct smc_link *lnk, u8 user_data[16],
 			   enum smc_llc_reqresp reqresp);
+void smc_llc_link_active(struct smc_link *link, int testlink_time);
+void smc_llc_link_inactive(struct smc_link *link);
+void smc_llc_link_flush(struct smc_link *link);
 int smc_llc_init(void) __init;
 
 #endif /* SMC_LLC_H */
diff --git a/net/smc/smc_wr.c b/net/smc/smc_wr.c
index 1b8af23e6e2b..cc7c1bb60fe8 100644
--- a/net/smc/smc_wr.c
+++ b/net/smc/smc_wr.c
@@ -376,6 +376,7 @@ static inline void smc_wr_rx_process_cqes(struct ib_wc wc[], int num)
 	for (i = 0; i < num; i++) {
 		link = wc[i].qp->qp_context;
 		if (wc[i].status == IB_WC_SUCCESS) {
+			link->wr_rx_tstamp = jiffies;
 			smc_wr_rx_demultiplex(&wc[i]);
 			smc_wr_rx_post(link); /* refill WR RX */
 		} else {
-- 
2.13.5

^ permalink raw reply related
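
The scheduling decision in smc_llc_testlink_work() above can be summarised as: if something was received within the last interval, sleep only for the remainder, otherwise probe the link and sleep a full interval. A minimal sketch of that decision, with a hypothetical helper next_delay() and plain unsigned tick counts in place of jiffies:

```c
#include <assert.h>

/*
 * Decide what a keepalive worker should do next. Times are ticks of a
 * free-running unsigned counter, so the subtraction stays correct when
 * the counter wraps. Returns the delay until the next run and sets
 * *send_probe when the link has been idle for a full interval.
 */
static unsigned long next_delay(unsigned long now, unsigned long last_rx,
				unsigned long interval, int *send_probe)
{
	unsigned long idle = now - last_rx;

	assert(send_probe);
	if (idle < interval) {
		*send_probe = 0;
		return interval - idle;	/* sleep only for the remainder */
	}
	*send_probe = 1;
	return interval;
}
```

In the patch, a probe that gets no response within SMC_LLC_WAIT_TIME terminates the link group instead of rescheduling; the sketch leaves that step to the caller.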
