Netdev List

Netdev List
 help / color / mirror / Atom feed

* KMSAN: uninit-value in hsr_register_frame_in
From: syzbot @ 2019-02-11 21:53 UTC (permalink / raw)
  To: arvid.brodin, davem, glider, linux-kernel, netdev, syzkaller-bugs

Hello,

syzbot found the following crash on:

HEAD commit:    fa1981bee40f kmsan: fix defconfig build
git tree:       kmsan
console output: https://syzkaller.appspot.com/x/log.txt?x=12c53624c00000
kernel config:  https://syzkaller.appspot.com/x/.config?x=52c9737ec5618f82
dashboard link: https://syzkaller.appspot.com/bug?extid=b8152ab439b9c5174ffd
compiler:       clang version 8.0.0 (trunk 350509)
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=11a4f787400000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1797e960c00000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+b8152ab439b9c5174ffd@syzkaller.appspotmail.com

IPv6: ADDRCONF(NETDEV_UP): hsr0: link is not ready
IPv6: ADDRCONF(NETDEV_CHANGE): hsr0: link becomes ready
IPv6: ADDRCONF(NETDEV_UP): vxcan1: link is not ready
8021q: adding VLAN 0 to HW filter on device batadv0
==================================================================
BUG: KMSAN: uninit-value in hsr_register_frame_in+0x12c/0x200  
net/hsr/hsr_framereg.c:318
CPU: 0 PID: 10753 Comm: syz-executor132 Not tainted 5.0.0-rc1+ #9
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x173/0x1d0 lib/dump_stack.c:113
  kmsan_report+0x12e/0x2a0 mm/kmsan/kmsan.c:600
  __msan_warning+0x82/0xf0 mm/kmsan/kmsan_instr.c:313
  hsr_register_frame_in+0x12c/0x200 net/hsr/hsr_framereg.c:318
  hsr_forward_skb+0xd64/0x2e30 net/hsr/hsr_forward.c:372
  hsr_dev_xmit+0xf8/0x160 net/hsr/hsr_device.c:243
  __netdev_start_xmit include/linux/netdevice.h:4382 [inline]
  netdev_start_xmit include/linux/netdevice.h:4391 [inline]
  xmit_one net/core/dev.c:3278 [inline]
  dev_hard_start_xmit+0x604/0xc40 net/core/dev.c:3294
  __dev_queue_xmit+0x2e48/0x3b80 net/core/dev.c:3864
  dev_queue_xmit+0x4b/0x60 net/core/dev.c:3897
  packet_snd net/packet/af_packet.c:2932 [inline]
  packet_sendmsg+0x79bb/0x9760 net/packet/af_packet.c:2957
  sock_sendmsg_nosec net/socket.c:621 [inline]
  sock_sendmsg net/socket.c:631 [inline]
  __sys_sendto+0x8c4/0xac0 net/socket.c:1788
  __do_sys_sendto net/socket.c:1800 [inline]
  __se_sys_sendto+0x107/0x130 net/socket.c:1796
  __x64_sys_sendto+0x6e/0x90 net/socket.c:1796
  do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291
  entry_SYSCALL_64_after_hwframe+0x63/0xe7
RIP: 0033:0x441869
Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7  
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff  
ff 0f 83 bb 10 fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fffa42c7de8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000441869
RDX: 000000000000000e RSI: 00000000200000c0 RDI: 0000000000000003
RBP: 00000000004a8fb0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000402db0
R13: 0000000000402e40 R14: 0000000000000000 R15: 0000000000000000

Uninit was stored to memory at:
  kmsan_save_stack_with_flags mm/kmsan/kmsan.c:205 [inline]
  kmsan_save_stack mm/kmsan/kmsan.c:220 [inline]
  kmsan_internal_chain_origin+0x134/0x230 mm/kmsan/kmsan.c:426
  __msan_chain_origin+0x70/0xe0 mm/kmsan/kmsan_instr.c:200
  hsr_add_node net/hsr/hsr_framereg.c:152 [inline]
  hsr_get_node+0xc94/0xe10 net/hsr/hsr_framereg.c:198
  hsr_fill_frame_info net/hsr/hsr_forward.c:327 [inline]
  hsr_forward_skb+0x7e9/0x2e30 net/hsr/hsr_forward.c:370
  hsr_dev_xmit+0xf8/0x160 net/hsr/hsr_device.c:243
  __netdev_start_xmit include/linux/netdevice.h:4382 [inline]
  netdev_start_xmit include/linux/netdevice.h:4391 [inline]
  xmit_one net/core/dev.c:3278 [inline]
  dev_hard_start_xmit+0x604/0xc40 net/core/dev.c:3294
  __dev_queue_xmit+0x2e48/0x3b80 net/core/dev.c:3864
  dev_queue_xmit+0x4b/0x60 net/core/dev.c:3897
  packet_snd net/packet/af_packet.c:2932 [inline]
  packet_sendmsg+0x79bb/0x9760 net/packet/af_packet.c:2957
  sock_sendmsg_nosec net/socket.c:621 [inline]
  sock_sendmsg net/socket.c:631 [inline]
  __sys_sendto+0x8c4/0xac0 net/socket.c:1788
  __do_sys_sendto net/socket.c:1800 [inline]
  __se_sys_sendto+0x107/0x130 net/socket.c:1796
  __x64_sys_sendto+0x6e/0x90 net/socket.c:1796
  do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291
  entry_SYSCALL_64_after_hwframe+0x63/0xe7

Uninit was created at:
  kmsan_save_stack_with_flags mm/kmsan/kmsan.c:205 [inline]
  kmsan_internal_poison_shadow+0x92/0x150 mm/kmsan/kmsan.c:159
  kmsan_kmalloc+0xa6/0x130 mm/kmsan/kmsan_hooks.c:176
  kmsan_slab_alloc+0xe/0x10 mm/kmsan/kmsan_hooks.c:185
  slab_post_alloc_hook mm/slab.h:446 [inline]
  slab_alloc_node mm/slub.c:2754 [inline]
  __kmalloc_node_track_caller+0xe9e/0xff0 mm/slub.c:4377
  __kmalloc_reserve net/core/skbuff.c:140 [inline]
  __alloc_skb+0x309/0xa20 net/core/skbuff.c:208
  alloc_skb include/linux/skbuff.h:1012 [inline]
  alloc_skb_with_frags+0x1c7/0xac0 net/core/skbuff.c:5288
  sock_alloc_send_pskb+0xafd/0x10a0 net/core/sock.c:2091
  packet_alloc_skb net/packet/af_packet.c:2783 [inline]
  packet_snd net/packet/af_packet.c:2876 [inline]
  packet_sendmsg+0x6881/0x9760 net/packet/af_packet.c:2957
  sock_sendmsg_nosec net/socket.c:621 [inline]
  sock_sendmsg net/socket.c:631 [inline]
  __sys_sendto+0x8c4/0xac0 net/socket.c:1788
  __do_sys_sendto net/socket.c:1800 [inline]
  __se_sys_sendto+0x107/0x130 net/socket.c:1796
  __x64_sys_sendto+0x6e/0x90 net/socket.c:1796
  do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291
  entry_SYSCALL_64_after_hwframe+0x63/0xe7
==================================================================


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with  
syzbot.
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches

^ permalink raw reply

* Re: [PATCH 2/3] mm/gup: Introduce get_user_pages_fast_longterm()
From: John Hubbard @ 2019-02-11 22:01 UTC (permalink / raw)
  To: Ira Weiny
  Cc: Jason Gunthorpe, linux-rdma, linux-kernel, linux-mm,
	Daniel Borkmann, Davidlohr Bueso, netdev, Mike Marciniszyn,
	Dennis Dalessandro, Doug Ledford, Andrew Morton,
	Kirill A. Shutemov, Dan Williams
In-Reply-To: <20190211215238.GA23825@iweiny-DESK2.sc.intel.com>

On 2/11/19 1:52 PM, Ira Weiny wrote:
> On Mon, Feb 11, 2019 at 01:39:12PM -0800, John Hubbard wrote:
>> On 2/11/19 1:26 PM, Ira Weiny wrote:
>>> On Mon, Feb 11, 2019 at 01:13:56PM -0800, John Hubbard wrote:
>>>> On 2/11/19 12:39 PM, Jason Gunthorpe wrote:
>>>>> On Mon, Feb 11, 2019 at 12:16:42PM -0800, ira.weiny@intel.com wrote:
>>>>>> From: Ira Weiny <ira.weiny@intel.com>
>>>> [...]
> Fair enough.   But to do that correctly I think we will need to convert
> get_user_pages_fast() to use flags as well.  I have a version of this series
> which includes a patch does this, but the patch touched a lot of subsystems and
> a couple of different architectures...[1]
> 
> I can't test them all.  If we want to go that way I'm up for submitting the

I have a similar problem, and a similar list of call sites, for the
put_user_pages() conversion, so that file list looks familiar. And the
arch-specific gup implementations are about to complicate my life too. :)

> patch...  But if we remove longterm in the future we may be left with a
> get_user_pages_fast() which really only needs 1 flag.  But perhaps overall we
> would be better off?
> 
> Ira

I certainly think so, yes.


thanks,
-- 
John Hubbard
NVIDIA
> 
> 
> [1] mm/gup.c: Change GUP fast to use flags rather than write bool
> 
> To facilitate additional options to get_user_pages_fast change the
> singular write parameter to be the more generic gup_flags.
> 
> This patch currently does not change any functionality.  New
> functionality will follow in subsequent patches.
> 
> Many of the get_user_pages_fast call sites were unchanged because they
> already used FOLL_WRITE or 0 as appropriate.
> 
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> ---
>  arch/mips/mm/gup.c                         | 11 ++++++-----
>  arch/powerpc/kvm/book3s_64_mmu_hv.c        |  4 ++--
>  arch/powerpc/kvm/e500_mmu.c                |  2 +-
>  arch/powerpc/mm/mmu_context_iommu.c        |  4 ++--
>  arch/s390/kvm/interrupt.c                  |  2 +-
>  arch/s390/mm/gup.c                         | 12 ++++++------
>  arch/sh/mm/gup.c                           | 11 ++++++-----
>  arch/sparc/mm/gup.c                        |  9 +++++----
>  arch/x86/kvm/paging_tmpl.h                 |  2 +-
>  arch/x86/kvm/svm.c                         |  2 +-
>  drivers/fpga/dfl-afu-dma-region.c          |  2 +-
>  drivers/gpu/drm/via/via_dmablit.c          |  3 ++-
>  drivers/infiniband/hw/hfi1/user_pages.c    |  3 ++-
>  drivers/misc/genwqe/card_utils.c           |  2 +-
>  drivers/misc/vmw_vmci/vmci_host.c          |  2 +-
>  drivers/misc/vmw_vmci/vmci_queue_pair.c    |  6 ++++--
>  drivers/platform/goldfish/goldfish_pipe.c  |  3 ++-
>  drivers/rapidio/devices/rio_mport_cdev.c   |  4 +++-
>  drivers/sbus/char/oradax.c                 |  2 +-
>  drivers/scsi/st.c                          |  3 ++-
>  drivers/staging/gasket/gasket_page_table.c |  4 ++--
>  drivers/tee/tee_shm.c                      |  2 +-
>  drivers/vfio/vfio_iommu_spapr_tce.c        |  3 ++-
>  drivers/vhost/vhost.c                      |  2 +-
>  drivers/video/fbdev/pvr2fb.c               |  2 +-
>  drivers/virt/fsl_hypervisor.c              |  2 +-
>  drivers/xen/gntdev.c                       |  2 +-
>  fs/orangefs/orangefs-bufmap.c              |  2 +-
>  include/linux/mm.h                         |  4 ++--
>  kernel/futex.c                             |  2 +-
>  lib/iov_iter.c                             |  7 +++++--
>  mm/gup.c                                   | 10 +++++-----
>  mm/util.c                                  |  8 ++++----
>  net/ceph/pagevec.c                         |  2 +-
>  net/rds/info.c                             |  2 +-
>  net/rds/rdma.c                             |  3 ++-
>  36 files changed, 81 insertions(+), 65 deletions(-)
> 
> 

^ permalink raw reply

* KMSAN: uninit-value in capi_write
From: syzbot @ 2019-02-11 22:03 UTC (permalink / raw)
  To: davem, glider, isdn, keescook, linux-kernel, netdev, rdunlap,
	syzkaller-bugs, viro

Hello,

syzbot found the following crash on:

HEAD commit:    11587f6ee534 kmsan: remove pr_err
git tree:       kmsan
console output: https://syzkaller.appspot.com/x/log.txt?x=120bd84f400000
kernel config:  https://syzkaller.appspot.com/x/.config?x=c8a62a4eb8ea3e9f
dashboard link: https://syzkaller.appspot.com/bug?extid=0849c524d9c634f5ae66
compiler:       clang version 8.0.0 (trunk 350161)
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=14c53cd3400000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=13e2794b400000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+0849c524d9c634f5ae66@syzkaller.appspotmail.com

==================================================================
BUG: KMSAN: uninit-value in capi_write+0x791/0xa90  
drivers/isdn/capi/capi.c:700
CPU: 0 PID: 10025 Comm: syz-executor379 Not tainted 4.20.0-rc7+ #2
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x173/0x1d0 lib/dump_stack.c:113
  kmsan_report+0x12e/0x2a0 mm/kmsan/kmsan.c:613
  __msan_warning+0x82/0xf0 mm/kmsan/kmsan_instr.c:313
  capi_write+0x791/0xa90 drivers/isdn/capi/capi.c:700
  do_loop_readv_writev fs/read_write.c:703 [inline]
  do_iter_write+0x83e/0xd80 fs/read_write.c:961
  vfs_writev fs/read_write.c:1004 [inline]
  do_writev+0x397/0x840 fs/read_write.c:1039
  __do_sys_writev fs/read_write.c:1112 [inline]
  __se_sys_writev+0x9b/0xb0 fs/read_write.c:1109
  __x64_sys_writev+0x4a/0x70 fs/read_write.c:1109
  do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291
  entry_SYSCALL_64_after_hwframe+0x63/0xe7
RIP: 0033:0x440079
Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7  
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff  
ff 0f 83 fb 13 fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007ffd84911358 EFLAGS: 00000207 ORIG_RAX: 0000000000000014
RAX: ffffffffffffffda RBX: 00000000004002c8 RCX: 0000000000440079
RDX: 0000000000000001 RSI: 0000000020000180 RDI: 0000000000000003
RBP: 00000000006ca018 R08: 00000000004002c8 R09: 00000000004002c8
R10: 0000000000000000 R11: 0000000000000207 R12: 0000000000401900
R13: 0000000000401990 R14: 0000000000000000 R15: 0000000000000000

Uninit was created at:
  kmsan_save_stack_with_flags mm/kmsan/kmsan.c:204 [inline]
  kmsan_internal_poison_shadow+0x92/0x150 mm/kmsan/kmsan.c:158
  kmsan_kmalloc+0xa6/0x130 mm/kmsan/kmsan_hooks.c:176
  kmsan_slab_alloc+0xe/0x10 mm/kmsan/kmsan_hooks.c:185
  slab_post_alloc_hook mm/slab.h:446 [inline]
  slab_alloc_node mm/slub.c:2759 [inline]
  __kmalloc_node_track_caller+0xe18/0x1030 mm/slub.c:4383
  __kmalloc_reserve net/core/skbuff.c:137 [inline]
  __alloc_skb+0x309/0xa20 net/core/skbuff.c:205
  alloc_skb include/linux/skbuff.h:998 [inline]
  capi_write+0x12f/0xa90 drivers/isdn/capi/capi.c:691
  do_loop_readv_writev fs/read_write.c:703 [inline]
  do_iter_write+0x83e/0xd80 fs/read_write.c:961
  vfs_writev fs/read_write.c:1004 [inline]
  do_writev+0x397/0x840 fs/read_write.c:1039
  __do_sys_writev fs/read_write.c:1112 [inline]
  __se_sys_writev+0x9b/0xb0 fs/read_write.c:1109
  __x64_sys_writev+0x4a/0x70 fs/read_write.c:1109
  do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291
  entry_SYSCALL_64_after_hwframe+0x63/0xe7
==================================================================


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with  
syzbot.
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches

^ permalink raw reply

* Re: [PATCH 2/3] mm/gup: Introduce get_user_pages_fast_longterm()
From: Jason Gunthorpe @ 2019-02-11 22:06 UTC (permalink / raw)
  To: Ira Weiny
  Cc: John Hubbard, linux-rdma, linux-kernel, linux-mm, Daniel Borkmann,
	Davidlohr Bueso, netdev, Mike Marciniszyn, Dennis Dalessandro,
	Doug Ledford, Andrew Morton, Kirill A. Shutemov, Dan Williams
In-Reply-To: <20190211215238.GA23825@iweiny-DESK2.sc.intel.com>

On Mon, Feb 11, 2019 at 01:52:38PM -0800, Ira Weiny wrote:
> On Mon, Feb 11, 2019 at 01:39:12PM -0800, John Hubbard wrote:
> > On 2/11/19 1:26 PM, Ira Weiny wrote:
> > > On Mon, Feb 11, 2019 at 01:13:56PM -0800, John Hubbard wrote:
> > >> On 2/11/19 12:39 PM, Jason Gunthorpe wrote:
> > >>> On Mon, Feb 11, 2019 at 12:16:42PM -0800, ira.weiny@intel.com wrote:
> > >>>> From: Ira Weiny <ira.weiny@intel.com>
> > >> [...]
> > >> It seems to me that the longterm vs. short-term is of questionable value.
> > > 
> > > This is exactly why I did not post this before.  I've been waiting our other
> > > discussions on how GUP pins are going to be handled to play out.  But with the
> > > netdev thread today[1] it seems like we need to make sure we have a "safe" fast
> > > variant for a while.  Introducing FOLL_LONGTERM seemed like the cleanest way to
> > > do that even if we will not need the distinction in the future...  :-(
> > 
> > Yes, I agree. Below...
> > 
> > > [...]
> > > This is also why I did not change the get_user_pages_longterm because we could
> > > be ripping this all out by the end of the year...  (I hope. :-)
> > > 
> > > So while this does "pollute" the GUP family of calls I'm hoping it is not
> > > forever.
> > > 
> > > Ira
> > > 
> > > [1] https://lkml.org/lkml/2019/2/11/1789
> > > 
> > 
> > Yes, and to be clear, I think your patchset here is fine. It is easy to find
> > the FOLL_LONGTERM callers if and when we want to change anything. I just think
> > also it's appopriate to go a bit further, and use FOLL_LONGTERM all by itself.
> > 
> > That's because in either design outcome, it's better that way:
> > 
> > is just right. The gup API already has _fast and non-fast variants, and once
> > you get past a couple, you end up with a multiplication of names that really
> > work better as flags. We're there.
> > 
> > the _longterm API variants.
> 
> Fair enough.   But to do that correctly I think we will need to convert
> get_user_pages_fast() to use flags as well.  I have a version of this series
> which includes a patch does this, but the patch touched a lot of subsystems and
> a couple of different architectures...[1]

I think this should be done anyhow, it is trouble the two basically
identical interfaces have different signatures. This already caused a
bug in vfio..

I also wonder if someone should think about making fast into a flag
too..

But I'm not sure when fast should be used vs when it shouldn't :(

Jason

^ permalink raw reply

* linux-next: Fixes tag needs some work in the net-next tree
From: Stephen Rothwell @ 2019-02-11 22:17 UTC (permalink / raw)
  To: David Miller, Networking
  Cc: Linux Next Mailing List, Linux Kernel Mailing List, Tristram Ha

[-- Attachment #1: Type: text/plain, Size: 366 bytes --]

Hi all,

In commit

  cbd72b485214 ("net: dsa: microchip: add switch offload forwarding support")

Fixes tag

  Fixes: c2e866911e254067 ("microchip: break KSZ9477 DSA driver into two files")

has these problem(s):

  - Subject does not match target commit subject
    Just use
	git log -1 --format='Fixes: %h ("%s")'

-- 
Cheers,
Stephen Rothwell

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: [PATCH 0/3] Add gup fast + longterm and use it in HFI1
From: Jason Gunthorpe @ 2019-02-11 22:22 UTC (permalink / raw)
  To: Ira Weiny
  Cc: linux-rdma, linux-kernel, linux-mm, Daniel Borkmann, netdev,
	Mike Marciniszyn, Dennis Dalessandro, Doug Ledford, Andrew Morton,
	Kirill A. Shutemov, Dan Williams
In-Reply-To: <20190211214257.GA7891@iweiny-DESK2.sc.intel.com>

On Mon, Feb 11, 2019 at 01:42:57PM -0800, Ira Weiny wrote:
> On Mon, Feb 11, 2019 at 01:47:10PM -0700, Jason Gunthorpe wrote:
> > On Mon, Feb 11, 2019 at 12:34:17PM -0800, Davidlohr Bueso wrote:
> > > On Mon, 11 Feb 2019, ira.weiny@intel.com wrote:
> > > > Ira Weiny (3):
> > > >  mm/gup: Change "write" parameter to flags
> > > >  mm/gup: Introduce get_user_pages_fast_longterm()
> > > >  IB/HFI1: Use new get_user_pages_fast_longterm()
> > > 
> > > Out of curiosity, are you planning on having all rdma drivers
> > > use get_user_pages_fast_longterm()? Ie:
> > > 
> > > hw/mthca/mthca_memfree.c:       ret = get_user_pages_fast(uaddr & PAGE_MASK, 1, FOLL_WRITE, pages);
> > 
> > This one is certainly a mistake - this should be done with a umem.
> 
> It looks like this is mapping a page allocated by user space for a
> doorbell?!?!

Many drivers do this, the 'doorbell' is a PCI -> CPU thing of some sort

> This does not seem to be allocating memory regions.  Jason, do you
> want a patch to just convert these calls and consider it legacy
> code?

It needs to use umem like all the other drivers on this path.
Otherwise it doesn't get the page pinning logic right

There is also something else rotten with these longterm callsites,
they seem to have very different ideas how to handle RLIMIT_MEMLOCK.

ie vfio doesn't even touch pinned_vm.. and rdma is applying
RLIMIT_MEMLOCK to mm->pinned_vm, while vfio is using locked_vm.. No
idea which is right, but they should be the same, and this pattern
should probably be in core code someplace.

Jason

^ permalink raw reply

* Re: [PATCH 0/3] Add gup fast + longterm and use it in HFI1
From: Jason Gunthorpe @ 2019-02-11 22:23 UTC (permalink / raw)
  To: Weiny, Ira
  Cc: linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, Daniel Borkmann, Davidlohr Bueso,
	netdev@vger.kernel.org, Marciniszyn, Mike, Dalessandro, Dennis,
	Doug Ledford, Andrew Morton, Kirill A. Shutemov, Williams, Dan J
In-Reply-To: <2807E5FD2F6FDA4886F6618EAC48510E79BCF04C@CRSMSX101.amr.corp.intel.com>

On Mon, Feb 11, 2019 at 09:14:56PM +0000, Weiny, Ira wrote:
> > 
> > On Mon, Feb 11, 2019 at 12:16:40PM -0800, ira.weiny@intel.com wrote:
> > > From: Ira Weiny <ira.weiny@intel.com>
> > >
> > > NOTE: This series depends on my clean up patch to remove the write
> > > parameter from gup_fast_permitted()[1]
> > >
> > > HFI1 uses get_user_pages_fast() due to it performance advantages.
> > > Like RDMA,
> > > HFI1 pages can be held for a significant time.  But
> > > get_user_pages_fast() does not protect against mapping of FS DAX pages.
> > 
> > If HFI1 can use the _fast varient, can't all the general RDMA stuff use it too?
> > 
> > What is the guidance on when fast vs not fast should be use?
> 
> Right now it can't because it holds mmap_sem across the call.  Once
> Shiraz's patches are accepted removing the umem->hugetlb flag I
> think we can change umem.c.

Okay, that make sense, we should change it when Shiraz's patches are
merged
 
> Also, it specifies FOLL_FORCE which can't currently be specified
> with gup fast.  One idea I had was to change get_user_pages_fast()
> to use gup_flags instead of a single write flag.  But that proved to
> be a very big cosmetic change across a lot of callers so I went this
> way.

I think you should do it.. :)

Jason

^ permalink raw reply

* [PATCH] ser_gigaset: mark expected switch fall-through
From: Gustavo A. R. Silva @ 2019-02-11 22:34 UTC (permalink / raw)
  To: Paul Bolle, Karsten Keil
  Cc: gigaset307x-common, netdev, linux-kernel, Gustavo A. R. Silva,
	Kees Cook

In preparation to enabling -Wimplicit-fallthrough, mark switch
cases where we are expecting to fall through.

This patch fixes the following warning:

drivers/isdn/gigaset/ser-gigaset.c: In function ‘gigaset_tty_ioctl’:
drivers/isdn/gigaset/ser-gigaset.c:627:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
   switch (arg) {
   ^~~~~~
drivers/isdn/gigaset/ser-gigaset.c:638:2: note: here
  default:
  ^~~~~~~

Warning level 3 was used: -Wimplicit-fallthrough=3

Notice that, in this particular case, the code comment is modified
in accordance with what GCC is expecting to find.

This patch is part of the ongoing efforts to enable
-Wimplicit-fallthrough.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
---
 drivers/isdn/gigaset/ser-gigaset.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/isdn/gigaset/ser-gigaset.c b/drivers/isdn/gigaset/ser-gigaset.c
index ab0b63a4d045..e1de8b1a1001 100644
--- a/drivers/isdn/gigaset/ser-gigaset.c
+++ b/drivers/isdn/gigaset/ser-gigaset.c
@@ -633,7 +633,7 @@ gigaset_tty_ioctl(struct tty_struct *tty, struct file *file,
 			flush_send_queue(cs);
 			break;
 		}
-		/* Pass through */
+		/* fall through */
 
 	default:
 		/* pass through to underlying serial device */
-- 
2.20.1


^ permalink raw reply related

* RE: [PATCH 0/3] Add gup fast + longterm and use it in HFI1
From: Weiny, Ira @ 2019-02-11 22:40 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, Daniel Borkmann, netdev@vger.kernel.org,
	Marciniszyn, Mike, Dalessandro, Dennis, Doug Ledford,
	Andrew Morton, Kirill A. Shutemov, Williams, Dan J
In-Reply-To: <20190211222208.GJ24692@ziepe.ca>

> On Mon, Feb 11, 2019 at 01:42:57PM -0800, Ira Weiny wrote:
> > On Mon, Feb 11, 2019 at 01:47:10PM -0700, Jason Gunthorpe wrote:
> > > On Mon, Feb 11, 2019 at 12:34:17PM -0800, Davidlohr Bueso wrote:
> > > > On Mon, 11 Feb 2019, ira.weiny@intel.com wrote:
> > > > > Ira Weiny (3):
> > > > >  mm/gup: Change "write" parameter to flags
> > > > >  mm/gup: Introduce get_user_pages_fast_longterm()
> > > > >  IB/HFI1: Use new get_user_pages_fast_longterm()
> > > >
> > > > Out of curiosity, are you planning on having all rdma drivers use
> > > > get_user_pages_fast_longterm()? Ie:
> > > >
> > > > hw/mthca/mthca_memfree.c:       ret = get_user_pages_fast(uaddr &
> PAGE_MASK, 1, FOLL_WRITE, pages);
> > >
> > > This one is certainly a mistake - this should be done with a umem.
> >
> > It looks like this is mapping a page allocated by user space for a
> > doorbell?!?!
> 
> Many drivers do this, the 'doorbell' is a PCI -> CPU thing of some sort

My surprise is why does _userspace_ allocate this memory?

> 
> > This does not seem to be allocating memory regions.  Jason, do you
> > want a patch to just convert these calls and consider it legacy code?
> 
> It needs to use umem like all the other drivers on this path.
> Otherwise it doesn't get the page pinning logic right

Not sure what you mean regarding the pinning logic?

> 
> There is also something else rotten with these longterm callsites, they seem
> to have very different ideas how to handle RLIMIT_MEMLOCK.
> 
> ie vfio doesn't even touch pinned_vm.. and rdma is applying
> RLIMIT_MEMLOCK to mm->pinned_vm, while vfio is using locked_vm.. No
> idea which is right, but they should be the same, and this pattern should
> probably be in core code someplace.

Neither do I.  But AFAIK pinned_vm is a subset of locked_vm.

So should we be accounting both of the counters?

Ira


^ permalink raw reply

* [PATCH net] batman-adv: fix uninit-value in batadv_interface_tx()
From: Eric Dumazet @ 2019-02-11 22:41 UTC (permalink / raw)
  To: David S . Miller; +Cc: netdev, Eric Dumazet, Eric Dumazet, syzbot

KMSAN reported batadv_interface_tx() was possibly using a
garbage value [1]

batadv_get_vid() does have a pskb_may_pull() call
but batadv_interface_tx() does not actually make sure
this did not fail.

[1]
BUG: KMSAN: uninit-value in batadv_interface_tx+0x908/0x1e40 net/batman-adv/soft-interface.c:231
CPU: 0 PID: 10006 Comm: syz-executor469 Not tainted 4.20.0-rc7+ #5
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x173/0x1d0 lib/dump_stack.c:113
 kmsan_report+0x12e/0x2a0 mm/kmsan/kmsan.c:613
 __msan_warning+0x82/0xf0 mm/kmsan/kmsan_instr.c:313
 batadv_interface_tx+0x908/0x1e40 net/batman-adv/soft-interface.c:231
 __netdev_start_xmit include/linux/netdevice.h:4356 [inline]
 netdev_start_xmit include/linux/netdevice.h:4365 [inline]
 xmit_one net/core/dev.c:3257 [inline]
 dev_hard_start_xmit+0x607/0xc40 net/core/dev.c:3273
 __dev_queue_xmit+0x2e42/0x3bc0 net/core/dev.c:3843
 dev_queue_xmit+0x4b/0x60 net/core/dev.c:3876
 packet_snd net/packet/af_packet.c:2928 [inline]
 packet_sendmsg+0x8306/0x8f30 net/packet/af_packet.c:2953
 sock_sendmsg_nosec net/socket.c:621 [inline]
 sock_sendmsg net/socket.c:631 [inline]
 __sys_sendto+0x8c4/0xac0 net/socket.c:1788
 __do_sys_sendto net/socket.c:1800 [inline]
 __se_sys_sendto+0x107/0x130 net/socket.c:1796
 __x64_sys_sendto+0x6e/0x90 net/socket.c:1796
 do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291
 entry_SYSCALL_64_after_hwframe+0x63/0xe7
RIP: 0033:0x441889
Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 bb 10 fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007ffdda6fd468 EFLAGS: 00000216 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 0000000000441889
RDX: 000000000000000e RSI: 00000000200000c0 RDI: 0000000000000003
RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000216 R12: 00007ffdda6fd4c0
R13: 00007ffdda6fd4b0 R14: 0000000000000000 R15: 0000000000000000

Uninit was created at:
 kmsan_save_stack_with_flags mm/kmsan/kmsan.c:204 [inline]
 kmsan_internal_poison_shadow+0x92/0x150 mm/kmsan/kmsan.c:158
 kmsan_kmalloc+0xa6/0x130 mm/kmsan/kmsan_hooks.c:176
 kmsan_slab_alloc+0xe/0x10 mm/kmsan/kmsan_hooks.c:185
 slab_post_alloc_hook mm/slab.h:446 [inline]
 slab_alloc_node mm/slub.c:2759 [inline]
 __kmalloc_node_track_caller+0xe18/0x1030 mm/slub.c:4383
 __kmalloc_reserve net/core/skbuff.c:137 [inline]
 __alloc_skb+0x309/0xa20 net/core/skbuff.c:205
 alloc_skb include/linux/skbuff.h:998 [inline]
 alloc_skb_with_frags+0x1c7/0xac0 net/core/skbuff.c:5220
 sock_alloc_send_pskb+0xafd/0x10e0 net/core/sock.c:2083
 packet_alloc_skb net/packet/af_packet.c:2781 [inline]
 packet_snd net/packet/af_packet.c:2872 [inline]
 packet_sendmsg+0x661a/0x8f30 net/packet/af_packet.c:2953
 sock_sendmsg_nosec net/socket.c:621 [inline]
 sock_sendmsg net/socket.c:631 [inline]
 __sys_sendto+0x8c4/0xac0 net/socket.c:1788
 __do_sys_sendto net/socket.c:1800 [inline]
 __se_sys_sendto+0x107/0x130 net/socket.c:1796
 __x64_sys_sendto+0x6e/0x90 net/socket.c:1796
 do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291
 entry_SYSCALL_64_after_hwframe+0x63/0xe7

Fixes: c6c8fea29769 ("net: Add batman-adv meshing protocol")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc:	Marek Lindner <mareklindner@neomailbox.ch>
Cc:	Simon Wunderlich <sw@simonwunderlich.de>
Cc:	Antonio Quartulli <a@unstable.cc>
---
 net/batman-adv/soft-interface.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/batman-adv/soft-interface.c b/net/batman-adv/soft-interface.c
index b85ca809e509220b56492c34a9071c6c0d051f25..ffc83bebfe4038df24c5b8cce093c32a302dac18 100644
--- a/net/batman-adv/soft-interface.c
+++ b/net/batman-adv/soft-interface.c
@@ -227,6 +227,8 @@ static netdev_tx_t batadv_interface_tx(struct sk_buff *skb,
 
 	switch (ntohs(ethhdr->h_proto)) {
 	case ETH_P_8021Q:
+		if (!pskb_may_pull(skb, sizeof(*vhdr)))
+			goto dropped;
 		vhdr = vlan_eth_hdr(skb);
 
 		/* drop batman-in-batman packets to prevent loops */
-- 
2.20.1.791.gb4d0f1c61a-goog


^ permalink raw reply related

* Re: [Patch net-next v2] mlx5: use RCU lock in mlx5_eq_cq_get()
From: Saeed Mahameed @ 2019-02-11 22:41 UTC (permalink / raw)
  To: netdev@vger.kernel.org, xiyou.wangcong@gmail.com; +Cc: Tariq Toukan
In-Reply-To: <20190206230019.1303-1-xiyou.wangcong@gmail.com>

On Wed, 2019-02-06 at 15:00 -0800, Cong Wang wrote:
> mlx5_eq_cq_get() is called in IRQ handler, the spinlock inside
> gets a lot of contentions when we test some heavy workload
> with 60 RX queues and 80 CPU's, and it is clearly shown in the
> flame graph.
> 
> In fact, radix_tree_lookup() is perfectly fine with RCU read lock,
> we don't have to take a spinlock on this hot path. This is pretty
> much similar to commit 291c566a2891
> ("net/mlx4_core: Fix racy CQ (Completion Queue) free"). Slow paths
> are still serialized with the spinlock, and with synchronize_irq()
> it should be safe to just move the fast path to RCU read lock.
> 
> This patch itself reduces the latency by about 50% for our memcached
> workload on a 4.14 kernel we test. In upstream, as pointed out by
> Saeed,
> this spinlock gets some rework in commit 02d92f790364
> ("net/mlx5: CQ Database per EQ"), so the difference could be smaller.
> 
> Cc: Saeed Mahameed <saeedm@mellanox.com>
> Cc: Tariq Toukan <tariqt@mellanox.com>
> Acked-by: Saeed Mahameed <saeedm@mellanox.com>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
> 

Applied to mlx5-next

Will be sent to net-next in my next pull request

Thanks!


^ permalink raw reply

* [PATCH] isdn_v110: mark expected switch fall-through
From: Gustavo A. R. Silva @ 2019-02-11 22:42 UTC (permalink / raw)
  To: Karsten Keil; +Cc: netdev, linux-kernel, Gustavo A. R. Silva, Kees Cook

In preparation to enabling -Wimplicit-fallthrough, mark switch
cases where we are expecting to fall through.

This patch fixes the following warnings:

drivers/isdn/i4l/isdn_v110.c: In function ‘EncodeMatrix’:
drivers/isdn/i4l/isdn_v110.c:353:7: warning: this statement may fall through [-Wimplicit-fallthrough=]
    if (line >= mlen) {
       ^
drivers/isdn/i4l/isdn_v110.c:358:3: note: here
   case 128:
   ^~~~

Warning level 3 was used: -Wimplicit-fallthrough=3

Notice that, in this particular case, the code comment is modified
in accordance with what GCC is expecting to find.

This patch is part of the ongoing efforts to enable
-Wimplicit-fallthrough.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
---
 drivers/isdn/i4l/isdn_v110.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/isdn/i4l/isdn_v110.c b/drivers/isdn/i4l/isdn_v110.c
index 2a5f6668756c..d11fe76f138f 100644
--- a/drivers/isdn/i4l/isdn_v110.c
+++ b/drivers/isdn/i4l/isdn_v110.c
@@ -354,7 +354,7 @@ EncodeMatrix(unsigned char *buf, int len, unsigned char *m, int mlen)
 				printk(KERN_WARNING "isdn_v110 (EncodeMatrix): buffer full!\n");
 				return line;
 			}
-			/* else: fall through */
+			/* fall through */
 		case 128:
 			m[line] = 128;	/* leftmost -> set byte to 1000000 */
 			mbit = 64;	/* current bit in the matrix line */
-- 
2.20.1


^ permalink raw reply related

* Re: [PATCH 0/3] Add gup fast + longterm and use it in HFI1
From: Jason Gunthorpe @ 2019-02-11 22:50 UTC (permalink / raw)
  To: Weiny, Ira
  Cc: linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, Daniel Borkmann, netdev@vger.kernel.org,
	Marciniszyn, Mike, Dalessandro, Dennis, Doug Ledford,
	Andrew Morton, Kirill A. Shutemov, Williams, Dan J
In-Reply-To: <2807E5FD2F6FDA4886F6618EAC48510E79BCF37B@CRSMSX101.amr.corp.intel.com>

On Mon, Feb 11, 2019 at 10:40:02PM +0000, Weiny, Ira wrote:

> > Many drivers do this, the 'doorbell' is a PCI -> CPU thing of some sort
> 
> My surprise is why does _userspace_ allocate this memory?

Well, userspace needs to read the memory, so either userpace allocates
it and the kernel GUP's it, or userspace mmap's a kernel page which
was DMA mapped.

The GUP version lets the doorbells have lower alignment than a PAGE,
and thes RDMA drivers hard requires GUP->DMA to function..

So why not use a umem here? It already has to work.

> > > This does not seem to be allocating memory regions.  Jason, do you
> > > want a patch to just convert these calls and consider it legacy code?
> > 
> > It needs to use umem like all the other drivers on this path.
> > Otherwise it doesn't get the page pinning logic right
>
> Not sure what you mean regarding the pinning logic?

The RLIMIT_MEMLOCK stuff and so on.

> > There is also something else rotten with these longterm callsites,
> > they seem to have very different ideas how to handle
> > RLIMIT_MEMLOCK.
> > 
> > ie vfio doesn't even touch pinned_vm.. and rdma is applying
> > RLIMIT_MEMLOCK to mm->pinned_vm, while vfio is using locked_vm.. No
> > idea which is right, but they should be the same, and this pattern should
> > probably be in core code someplace.
> 
> Neither do I.  But AFAIK pinned_vm is a subset of locked_vm.

I thought so..

> So should we be accounting both of the counters?

Someone should check :)

Since we don't increment locked_vm when we increment pinned_vm and
vfio only checke RLIMIT_MEMLOCK against locked_vm one can certainly
exceed the limit by mixing and matching RDMA and VFIO pins in the same
process. Sure seems like there is a bug somewhere here.

Jason

^ permalink raw reply

* Re: [PATCH mlx5-next 2/2] net/mlx5: Factor out HCA capabilities functions
From: Saeed Mahameed @ 2019-02-11 22:52 UTC (permalink / raw)
  To: Jason Gunthorpe, leon@kernel.org
  Cc: netdev@vger.kernel.org, Moni Shoua, linux-rdma@vger.kernel.org,
	dledford@redhat.com
In-Reply-To: <20190211203154.GM24706@mellanox.com>

On Mon, 2019-02-11 at 20:32 +0000, Jason Gunthorpe wrote:
> On Mon, Feb 11, 2019 at 10:02:07PM +0200, Leon Romanovsky wrote:
> > On Mon, Feb 11, 2019 at 07:50:55PM +0000, Jason Gunthorpe wrote:
> > > On Mon, Feb 11, 2019 at 01:56:08PM +0200, Leon Romanovsky wrote:
> > > > From: Leon Romanovsky <leonro@mellanox.com>
> > > > 
> > > > Combine all HCA capabilities setters under one function
> > > > and compile out the ODP related function in case kernel
> > > > was compiled without ODP support.
> > > > 
> > > > Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
> > > >  .../net/ethernet/mellanox/mlx5/core/main.c    | 47
> > > > +++++++++++++------
> > > >  1 file changed, 33 insertions(+), 14 deletions(-)
> > > > 
> > > > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c
> > > > b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> > > > index 6d45518edbdc..d7145ab6105d 100644
> > > > +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> > > > @@ -459,6 +459,7 @@ static int handle_hca_cap_atomic(struct
> > > > mlx5_core_dev *dev)
> > > >  	return err;
> > > >  }
> > > > 
> > > > +#ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
> > > >  static int handle_hca_cap_odp(struct mlx5_core_dev *dev)
> > > >  {
> > > >  	void *set_hca_cap;
> > > > @@ -502,6 +503,7 @@ static int handle_hca_cap_odp(struct
> > > > mlx5_core_dev *dev)
> > > >  	kfree(set_ctx);
> > > >  	return err;
> > > >  }
> > > > +#endif
> > > > 
> > > >  static int handle_hca_cap(struct mlx5_core_dev *dev)
> > > >  {
> > > > @@ -576,6 +578,35 @@ static int handle_hca_cap(struct
> > > > mlx5_core_dev *dev)
> > > >  	return err;
> > > >  }
> > > > 
> > > > +static int set_hca_cap(struct mlx5_core_dev *dev)
> > > > +{
> > > > +	struct pci_dev *pdev = dev->pdev;
> > > > +	int err;
> > > > +
> > > > +	err = handle_hca_cap(dev);
> > > > +	if (err) {
> > > > +		dev_err(&pdev->dev, "handle_hca_cap failed\n");
> > > > +		goto out;
> > > > +	}
> > > > +
> > > > +	err = handle_hca_cap_atomic(dev);
> > > > +	if (err) {
> > > > +		dev_err(&pdev->dev, "handle_hca_cap_atomic
> > > > failed\n");
> > > > +		goto out;
> > > > +	}
> > > > +
> > > > +#ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
> > > > +	err = handle_hca_cap_odp(dev);
> > > > +	if (err) {
> > > > +		dev_err(&pdev->dev, "handle_hca_cap_odp
> > > > failed\n");
> > > > +		goto out;
> > > > +	}
> > > > +#endif
> > > 
> > > Adding
> > >   if (IS_ENABLED..)
> > >     return 0;
> > > 
> > > To the top of handle_hca_cap_odp is alot better.
> > 
> > Saeed gave comment that he prefers code to be compiled-out in case
> > config is not set. In you suggestion, the code will exist and only
> > with some optimizations enabled it will be thrown.
> 
> the kernel always compiles with optimizations that will throw away
> this code, it is a widely used pattern.
> 
> #ifdef creates a compilation test matrix that is very undesired.
> 

It is more about code separation and readability rather than compiler
optimizations, optimally i want to have a file(s) that will include all
IB stuff and compile out  directly from Makefile, of course we still
need stub functions in a header file which will yield the same
compilation test matrix, but code readability and separation will
become more clear.

-Saeed.

> Jason

^ permalink raw reply

* Re: [PATCH 2/3] mm/gup: Introduce get_user_pages_fast_longterm()
From: Dan Williams @ 2019-02-11 22:55 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Ira Weiny, John Hubbard, linux-rdma, Linux Kernel Mailing List,
	Linux MM, Daniel Borkmann, Davidlohr Bueso, Netdev,
	Mike Marciniszyn, Dennis Dalessandro, Doug Ledford, Andrew Morton,
	Kirill A. Shutemov
In-Reply-To: <20190211220658.GH24692@ziepe.ca>

On Mon, Feb 11, 2019 at 2:07 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Mon, Feb 11, 2019 at 01:52:38PM -0800, Ira Weiny wrote:
> > On Mon, Feb 11, 2019 at 01:39:12PM -0800, John Hubbard wrote:
> > > On 2/11/19 1:26 PM, Ira Weiny wrote:
> > > > On Mon, Feb 11, 2019 at 01:13:56PM -0800, John Hubbard wrote:
> > > >> On 2/11/19 12:39 PM, Jason Gunthorpe wrote:
> > > >>> On Mon, Feb 11, 2019 at 12:16:42PM -0800, ira.weiny@intel.com wrote:
> > > >>>> From: Ira Weiny <ira.weiny@intel.com>
> > > >> [...]
> > > >> It seems to me that the longterm vs. short-term is of questionable value.
> > > >
> > > > This is exactly why I did not post this before.  I've been waiting our other
> > > > discussions on how GUP pins are going to be handled to play out.  But with the
> > > > netdev thread today[1] it seems like we need to make sure we have a "safe" fast
> > > > variant for a while.  Introducing FOLL_LONGTERM seemed like the cleanest way to
> > > > do that even if we will not need the distinction in the future...  :-(
> > >
> > > Yes, I agree. Below...
> > >
> > > > [...]
> > > > This is also why I did not change the get_user_pages_longterm because we could
> > > > be ripping this all out by the end of the year...  (I hope. :-)
> > > >
> > > > So while this does "pollute" the GUP family of calls I'm hoping it is not
> > > > forever.
> > > >
> > > > Ira
> > > >
> > > > [1] https://lkml.org/lkml/2019/2/11/1789
> > > >
> > >
> > > Yes, and to be clear, I think your patchset here is fine. It is easy to find
> > > the FOLL_LONGTERM callers if and when we want to change anything. I just think
> > > also it's appopriate to go a bit further, and use FOLL_LONGTERM all by itself.
> > >
> > > That's because in either design outcome, it's better that way:
> > >
> > > is just right. The gup API already has _fast and non-fast variants, and once
> > > you get past a couple, you end up with a multiplication of names that really
> > > work better as flags. We're there.
> > >
> > > the _longterm API variants.
> >
> > Fair enough.   But to do that correctly I think we will need to convert
> > get_user_pages_fast() to use flags as well.  I have a version of this series
> > which includes a patch does this, but the patch touched a lot of subsystems and
> > a couple of different architectures...[1]
>
> I think this should be done anyhow, it is trouble the two basically
> identical interfaces have different signatures. This already caused a
> bug in vfio..
>
> I also wonder if someone should think about making fast into a flag
> too..
>
> But I'm not sure when fast should be used vs when it shouldn't :(

Effectively fast should always be used just in case the user cares
about performance. It's just that it may fail and need to fall back to
requiring the vma.

Personally I thought RDMA memory registration is a one-time / upfront
slow path so that non-fast-GUP is tolerable.

The workloads that *need* it are O_DIRECT users that can't tolerate a
vma lookup on every I/O.

^ permalink raw reply

* [PATCH] isdn: i4l: isdn_tty: Mark expected switch fall-through
From: Gustavo A. R. Silva @ 2019-02-11 22:38 UTC (permalink / raw)
  To: Karsten Keil; +Cc: netdev, linux-kernel, Gustavo A. R. Silva, Kees Cook

In preparation to enabling -Wimplicit-fallthrough, mark switch
cases where we are expecting to fall through.

This patch fixes the following warnings:

drivers/isdn/i4l/isdn_tty.c: In function ‘isdn_tty_edit_at’:
drivers/isdn/i4l/isdn_tty.c:3644:18: warning: this statement may fall through [-Wimplicit-fallthrough=]
       m->mdmcmdl = 0;
       ~~~~~~~~~~~^~~
drivers/isdn/i4l/isdn_tty.c:3646:5: note: here
     case 0:
     ^~~~

Warning level 3 was used: -Wimplicit-fallthrough=3

Notice that, in this particular case, the code comment is modified
in accordance with what GCC is expecting to find.

This patch is part of the ongoing efforts to enable
-Wimplicit-fallthrough.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
---
 drivers/isdn/i4l/isdn_tty.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/isdn/i4l/isdn_tty.c b/drivers/isdn/i4l/isdn_tty.c
index dc1cded716c1..43700fc19a31 100644
--- a/drivers/isdn/i4l/isdn_tty.c
+++ b/drivers/isdn/i4l/isdn_tty.c
@@ -3642,7 +3642,7 @@ isdn_tty_edit_at(const char *p, int count, modem_info *info)
 						break;
 					} else
 						m->mdmcmdl = 0;
-					/* Fall through, check for 'A' */
+					/* Fall through - check for 'A' */
 				case 0:
 					if (c == 'A') {
 						m->mdmcmd[m->mdmcmdl] = c;
-- 
2.20.1


^ permalink raw reply related

* RE: [PATCH 2/3] mm/gup: Introduce get_user_pages_fast_longterm()
From: Weiny, Ira @ 2019-02-11 23:04 UTC (permalink / raw)
  To: Williams, Dan J, Jason Gunthorpe
  Cc: John Hubbard, linux-rdma, Linux Kernel Mailing List, Linux MM,
	Daniel Borkmann, Davidlohr Bueso, Netdev, Marciniszyn, Mike,
	Dalessandro, Dennis, Doug Ledford, Andrew Morton,
	Kirill A. Shutemov
In-Reply-To: <CAPcyv4htDHmH7PVm_=HOWwRKtpcKTPSjrHPLqhwp2vhBUWL4-w@mail.gmail.com>

> 
> On Mon, Feb 11, 2019 at 2:07 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
> >
> > On Mon, Feb 11, 2019 at 01:52:38PM -0800, Ira Weiny wrote:
> > > On Mon, Feb 11, 2019 at 01:39:12PM -0800, John Hubbard wrote:
> > > > On 2/11/19 1:26 PM, Ira Weiny wrote:
> > > > > On Mon, Feb 11, 2019 at 01:13:56PM -0800, John Hubbard wrote:
> > > > >> On 2/11/19 12:39 PM, Jason Gunthorpe wrote:
> > > > >>> On Mon, Feb 11, 2019 at 12:16:42PM -0800, ira.weiny@intel.com
> wrote:
> > > > >>>> From: Ira Weiny <ira.weiny@intel.com>
> > > > >> [...]
> > > > >> It seems to me that the longterm vs. short-term is of questionable
> value.
> > > > >
> > > > > This is exactly why I did not post this before.  I've been
> > > > > waiting our other discussions on how GUP pins are going to be
> > > > > handled to play out.  But with the netdev thread today[1] it
> > > > > seems like we need to make sure we have a "safe" fast variant
> > > > > for a while.  Introducing FOLL_LONGTERM seemed like the cleanest
> > > > > way to do that even if we will not need the distinction in the
> > > > > future...  :-(
> > > >
> > > > Yes, I agree. Below...
> > > >
> > > > > [...]
> > > > > This is also why I did not change the get_user_pages_longterm
> > > > > because we could be ripping this all out by the end of the
> > > > > year...  (I hope. :-)
> > > > >
> > > > > So while this does "pollute" the GUP family of calls I'm hoping
> > > > > it is not forever.
> > > > >
> > > > > Ira
> > > > >
> > > > > [1] https://lkml.org/lkml/2019/2/11/1789
> > > > >
> > > >
> > > > Yes, and to be clear, I think your patchset here is fine. It is
> > > > easy to find the FOLL_LONGTERM callers if and when we want to
> > > > change anything. I just think also it's appopriate to go a bit further, and
> use FOLL_LONGTERM all by itself.
> > > >
> > > > That's because in either design outcome, it's better that way:
> > > >
> > > > is just right. The gup API already has _fast and non-fast
> > > > variants, and once you get past a couple, you end up with a
> > > > multiplication of names that really work better as flags. We're there.
> > > >
> > > > the _longterm API variants.
> > >
> > > Fair enough.   But to do that correctly I think we will need to convert
> > > get_user_pages_fast() to use flags as well.  I have a version of
> > > this series which includes a patch does this, but the patch touched
> > > a lot of subsystems and a couple of different architectures...[1]
> >
> > I think this should be done anyhow, it is trouble the two basically
> > identical interfaces have different signatures. This already caused a
> > bug in vfio..
> >
> > I also wonder if someone should think about making fast into a flag
> > too..
> >
> > But I'm not sure when fast should be used vs when it shouldn't :(
> 
> Effectively fast should always be used just in case the user cares about
> performance. It's just that it may fail and need to fall back to requiring the
> vma.
> 
> Personally I thought RDMA memory registration is a one-time / upfront slow
> path so that non-fast-GUP is tolerable.
> 
> The workloads that *need* it are O_DIRECT users that can't tolerate a vma
> lookup on every I/O.

There are some users who need to [un]register memory more often.  While not in the strict fast path these users would like the registrations to occur as fast as possible.  I don't personally have the results but our OPA team did do performance tests on the GUP vs GUP fast and for the hfi1 case fast was better.  I don't have any reason to believe that regular RDMA users would not also benefit.

Ira


^ permalink raw reply

* Re: [PATCH 2/3] mm/gup: Introduce get_user_pages_fast_longterm()
From: Jason Gunthorpe @ 2019-02-11 23:25 UTC (permalink / raw)
  To: Dan Williams
  Cc: Ira Weiny, John Hubbard, linux-rdma, Linux Kernel Mailing List,
	Linux MM, Daniel Borkmann, Davidlohr Bueso, Netdev,
	Mike Marciniszyn, Dennis Dalessandro, Doug Ledford, Andrew Morton,
	Kirill A. Shutemov
In-Reply-To: <CAPcyv4htDHmH7PVm_=HOWwRKtpcKTPSjrHPLqhwp2vhBUWL4-w@mail.gmail.com>

On Mon, Feb 11, 2019 at 02:55:10PM -0800, Dan Williams wrote:

> > I also wonder if someone should think about making fast into a flag
> > too..
> >
> > But I'm not sure when fast should be used vs when it shouldn't :(
> 
> Effectively fast should always be used just in case the user cares
> about performance. It's just that it may fail and need to fall back to
> requiring the vma.

But the fall back / slow path is hidden inside the API, so when should
the caller care? 

ie when should the caller care to use gup_fast vs gup_unlocked? (the
comments say they are the same, but this seems to be a mistake)

Based on some of the comments in the code it looks like this API is
trying to convert itself into:

long get_user_pages_locked(struct task_struct *tsk, struct mm_struct *mm,
                           unsigned long start, unsigned long nr_pages,
			   unsigned int gup_flags, struct page **pages,
			   struct vm_area_struct **vmas, bool *locked)

long get_user_pages_unlocked(struct task_struct *tsk, struct mm_struct *mm,
                             unsigned long start, unsigned long nr_pages,
			     unsigned int gup_flags, struct page **pages)

(and maybe a FOLL_FAST if there is some reason we have _fast and
_unlocked)

The reason I ask, is that if there is no reason for fast vs unlocked
then maybe Ira should convert HFI to use gup_unlocked and move the
'fast' code into unlocked?

ie move incrementally closer to the desired end-state here.

Jason

^ permalink raw reply

* Re: [PATCH] netfilter: nft_tunnel: Add NFTA_TUNNEL_MODE options
From: Pablo Neira Ayuso @ 2019-02-11 23:25 UTC (permalink / raw)
  To: wenxu; +Cc: netfilter-devel, netdev
In-Reply-To: <1548748277-2587-1-git-send-email-wenxu@ucloud.cn>

On Tue, Jan 29, 2019 at 03:51:17PM +0800, wenxu@ucloud.cn wrote:
> From: wenxu <wenxu@ucloud.cn>
> 
> nft "tunnel" expr match both the tun_info of RX and TX. This patch
> provide the NFTA_TUNNEL_MODE to individually match the tun_info of
> RX or TX.

Applied, thanks.

^ permalink raw reply

* Re: [PATCH net] dsa: mv88e6xxx: Ensure all pending interrupts are handled prior to exit
From: Andrew Lunn @ 2019-02-11 23:33 UTC (permalink / raw)
  To: John David Anglin; +Cc: Russell King, Vivien Didelot, Florian Fainelli, netdev
In-Reply-To: <6a1ebc61-3505-beb8-21cb-ea42ad9fe67e@bell.net>

> Signed-off-by:  John David Anglin <dave.anglin@bell.net>
> ---
>  drivers/net/dsa/mv88e6xxx/chip.c | 28 ++++++++++++++++++++++------
>  1 file changed, 22 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
> index 8dca2c949e73..12fd7ce3f1ff 100644
> --- a/drivers/net/dsa/mv88e6xxx/chip.c
> +++ b/drivers/net/dsa/mv88e6xxx/chip.c
> @@ -261,6 +261,7 @@ static irqreturn_t mv88e6xxx_g1_irq_thread_work(struct mv88e6xxx_chip *chip)
>  	unsigned int sub_irq;
>  	unsigned int n;
>  	u16 reg;
> +	u16 ctl1;
>  	int err;
> 
>  	mutex_lock(&chip->reg_lock);
> @@ -270,13 +271,28 @@ static irqreturn_t mv88e6xxx_g1_irq_thread_work(struct mv88e6xxx_chip *chip)
>  	if (err)
>  		goto out;
> 
> -	for (n = 0; n < chip->g1_irq.nirqs; ++n) {
> -		if (reg & (1 << n)) {
> -			sub_irq = irq_find_mapping(chip->g1_irq.domain, n);
> -			handle_nested_irq(sub_irq);
> -			++nhandled;
> +	do {
> +		for (n = 0; n < chip->g1_irq.nirqs; ++n) {
> +			if (reg & (1 << n)) {
> +				sub_irq = irq_find_mapping(chip->g1_irq.domain,
> +							   n);
> +				handle_nested_irq(sub_irq);
> +				++nhandled;
> +			}
>  		}
> -	}
> +
> +		mutex_lock(&chip->reg_lock);
> +		err = mv88e6xxx_g1_read(chip, MV88E6XXX_G1_CTL1, &ctl1);
> +		if (err)
> +			goto unlock;
> +		err = mv88e6xxx_g1_read(chip, MV88E6XXX_G1_STS, &reg);
> +unlock:
> +		mutex_unlock(&chip->reg_lock);
> +		if (err)
> +			goto out;
> +		ctl1 &= GENMASK(chip->g1_irq.nirqs, 0);
> +	} while (reg & ctl1);

Hi David

I just tested this on one of my boards. It loops endlessly:

[   47.173396] mv88e6xxx_g1_irq_thread_work: c881 a8 80                         
[   47.182108] mv88e6xxx_g1_irq_thread_work: c881 a8 80                         
[   47.190820] mv88e6xxx_g1_irq_thread_work: c881 a8 80                         
[   47.199535] mv88e6xxx_g1_irq_thread_work: c881 a8 80                         
[   47.208254] mv88e6xxx_g1_irq_thread_work: c881 a8 80   

These are reg, ctl1, reg & ctl1.

So there is an unhandled device interrupt. I think this is because
device interrupts are not masked before installing the interrupt
handler. But i've not fully got to the bottom of this yet.

	 Andrew

^ permalink raw reply

* Re: [PATCH] netfilter: conntrack: fix indentation issue
From: Pablo Neira Ayuso @ 2019-02-11 23:40 UTC (permalink / raw)
  To: Colin King
  Cc: Jozsef Kadlecsik, Florian Westphal, David S . Miller,
	netfilter-devel, coreteam, netdev, kernel-janitors, linux-kernel
In-Reply-To: <20190207131311.3725-1-colin.king@canonical.com>

On Thu, Feb 07, 2019 at 01:13:11PM +0000, Colin King wrote:
> From: Colin Ian King <colin.king@canonical.com>
> 
> A statement in an if block is not indented correctly. Fix this.

Applied, thanks.

^ permalink raw reply

* Re: [PATCH net-next] ipvs: Use struct_size() helper
From: Pablo Neira Ayuso @ 2019-02-11 23:40 UTC (permalink / raw)
  To: Simon Horman
  Cc: Gustavo A. R. Silva, Wensong Zhang, Julian Anastasov,
	Jozsef Kadlecsik, Florian Westphal, David S. Miller, netdev,
	lvs-devel, netfilter-devel, coreteam, linux-kernel
In-Reply-To: <20190208095648.la4ushbjxkqgqzb6@verge.net.au>

On Fri, Feb 08, 2019 at 10:56:48AM +0100, Simon Horman wrote:
> On Thu, Feb 07, 2019 at 06:44:56PM -0600, Gustavo A. R. Silva wrote:
> > One of the more common cases of allocation size calculations is finding
> > the size of a structure that has a zero-sized array at the end, along
> > with memory for some number of elements for that array. For example:
> > 
> > struct foo {
> >     int stuff;
> >     struct boo entry[];
> > };
> > 
> > size = sizeof(struct foo) + count * sizeof(struct boo);
> > instance = alloc(size, GFP_KERNEL)
> > 
> > Instead of leaving these open-coded and prone to type mistakes, we can
> > now use the new struct_size() helper:
> > 
> > size = struct_size(instance, entry, count);
> > 
> > This code was detected with the help of Coccinelle.
> > 
> > Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
> 
> Acked-by: Simon Horman <horms+renesas@verge.net.au>
> 
> Pablo, could you consider applying this?

Applied, thanks!

^ permalink raw reply

* Re: Kernel memory corruption in CIPSO labeled TCP packets processing.
From: Paul Moore @ 2019-02-11 23:43 UTC (permalink / raw)
  To: Nazarov Sergey
  Cc: linux-security-module@vger.kernel.org, selinux@vger.kernel.org,
	netdev@vger.kernel.org, Casey Schaufler
In-Reply-To: <34948711549920080@myt1-06117f29c1ea.qloud-c.yandex.net>

On Mon, Feb 11, 2019 at 4:21 PM Nazarov Sergey <s-nazarov@yandex.ru> wrote:
> Hi, Paul!
> What I need to do for this?

If you haven't already done so, go read
Documentation/process/submitting-patches.rst, that should guide you
through the process.  I would also suggest looking at both the git log
and the mailing list archives to see what others have done in terms of
commit descriptions, etc.

After that, if you have any questions let me know and I can help you out.

Thanks.

> 11.02.2019, 23:37, "Paul Moore" <paul@paul-moore.com>:
> > On Thu, Jan 31, 2019 at 8:20 AM Nazarov Sergey <s-nazarov@yandex.ru> wrote:
> >>  31.01.2019, 05:10, "Paul Moore" <paul@paul-moore.com>:
> >>  > This isn't how the rest of the stack works, look at
> >>  > ip_local_deliver_finish() for one example. Perhaps the behavior you
> >>  > are proposing is correct, but please show me where in the various RFC
> >>  > specs it is defined so that I can better understand why it should work
> >>  > this way.
> >>  > --
> >>  > paul moore
> >>  > www.paul-moore.com
> >>
> >>  Sorry, I was inattentive. ip_options_compile modifies srr option data, only if
> >>  skb is NULL. My last message could be ignored.
> >
> > Hi Nazarov,
> >
> > Do you plan on submitting these patches as a proper patchset for
> > review and merging?

-- 
paul moore
www.paul-moore.com

^ permalink raw reply

* Re: [PATCH net-next] netfilter: xt_recent: Use struct_size() in kvzalloc()
From: Pablo Neira Ayuso @ 2019-02-11 23:45 UTC (permalink / raw)
  To: Gustavo A. R. Silva
  Cc: Jozsef Kadlecsik, Florian Westphal, David S. Miller,
	netfilter-devel, coreteam, netdev, linux-kernel
In-Reply-To: <20190208005608.GA16399@embeddedor>

On Thu, Feb 07, 2019 at 06:56:08PM -0600, Gustavo A. R. Silva wrote:
> One of the more common cases of allocation size calculations is finding
> the size of a structure that has a zero-sized array at the end, along
> with memory for some number of elements for that array. For example:
> 
> struct foo {
>     int stuff;
>     void *entry[];
> };
> 
> size = sizeof(struct foo) + count * sizeof(void *);
> instance = alloc(size, GFP_KERNEL)
> 
> Instead of leaving these open-coded and prone to type mistakes, we can
> now use the new struct_size() helper:
> 
> size = struct_size(instance, entry, count);
> instance = alloc(size, GFP_KERNEL)
> 
> Notice that, in this case, variable sz is not necessary, hence it is
> removed.
> 
> This code was detected with the help of Coccinelle.

Applied, thanks.

^ permalink raw reply

* Re: [PATCH 2/3] mm/gup: Introduce get_user_pages_fast_longterm()
From: Ira Weiny @ 2019-02-12  0:08 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Dan Williams, John Hubbard, linux-rdma, Linux Kernel Mailing List,
	Linux MM, Daniel Borkmann, Davidlohr Bueso, Netdev,
	Mike Marciniszyn, Dennis Dalessandro, Doug Ledford, Andrew Morton,
	Kirill A. Shutemov
In-Reply-To: <20190211232510.GP24692@ziepe.ca>

On Mon, Feb 11, 2019 at 04:25:10PM -0700, Jason Gunthorpe wrote:
> On Mon, Feb 11, 2019 at 02:55:10PM -0800, Dan Williams wrote:
> 
> > > I also wonder if someone should think about making fast into a flag
> > > too..
> > >
> > > But I'm not sure when fast should be used vs when it shouldn't :(
> > 
> > Effectively fast should always be used just in case the user cares
> > about performance. It's just that it may fail and need to fall back to
> > requiring the vma.
> 
> But the fall back / slow path is hidden inside the API, so when should
> the caller care? 
> 
> ie when should the caller care to use gup_fast vs gup_unlocked? (the
> comments say they are the same, but this seems to be a mistake)
> 
> Based on some of the comments in the code it looks like this API is
> trying to convert itself into:
> 
> long get_user_pages_locked(struct task_struct *tsk, struct mm_struct *mm,
>                            unsigned long start, unsigned long nr_pages,
> 			   unsigned int gup_flags, struct page **pages,
> 			   struct vm_area_struct **vmas, bool *locked)
> 
> long get_user_pages_unlocked(struct task_struct *tsk, struct mm_struct *mm,
>                              unsigned long start, unsigned long nr_pages,
> 			     unsigned int gup_flags, struct page **pages)
> 
> (and maybe a FOLL_FAST if there is some reason we have _fast and
> _unlocked)
> 
> The reason I ask, is that if there is no reason for fast vs unlocked
> then maybe Ira should convert HFI to use gup_unlocked and move the
> 'fast' code into unlocked?
> 
> ie move incrementally closer to the desired end-state here.

If the pages are not in the page tables then fast is probably going to be
slightly slower because it will have to fall back after walking the tables and
finding something missing.

For PSM2 (MPI) applications are performance improvement was probably because
the memory in question was in the page tables and very much in use.

Ira

> 
> Jason

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox