Netdev List

Netdev List
 help / color / mirror / Atom feed

* KMSAN: uninit-value in hsr_register_frame_in
From: syzbot @ 2019-02-11 21:53 UTC (permalink / raw)
  To: arvid.brodin, davem, glider, linux-kernel, netdev, syzkaller-bugs

Hello,

syzbot found the following crash on:

HEAD commit:    fa1981bee40f kmsan: fix defconfig build
git tree:       kmsan
console output: https://syzkaller.appspot.com/x/log.txt?x=12c53624c00000
kernel config:  https://syzkaller.appspot.com/x/.config?x=52c9737ec5618f82
dashboard link: https://syzkaller.appspot.com/bug?extid=b8152ab439b9c5174ffd
compiler:       clang version 8.0.0 (trunk 350509)
syz repro:      https://syzkaller.appspot.com/x/repro.syz?x=11a4f787400000
C reproducer:   https://syzkaller.appspot.com/x/repro.c?x=1797e960c00000

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+b8152ab439b9c5174ffd@syzkaller.appspotmail.com

IPv6: ADDRCONF(NETDEV_UP): hsr0: link is not ready
IPv6: ADDRCONF(NETDEV_CHANGE): hsr0: link becomes ready
IPv6: ADDRCONF(NETDEV_UP): vxcan1: link is not ready
8021q: adding VLAN 0 to HW filter on device batadv0
==================================================================
BUG: KMSAN: uninit-value in hsr_register_frame_in+0x12c/0x200  
net/hsr/hsr_framereg.c:318
CPU: 0 PID: 10753 Comm: syz-executor132 Not tainted 5.0.0-rc1+ #9
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x173/0x1d0 lib/dump_stack.c:113
  kmsan_report+0x12e/0x2a0 mm/kmsan/kmsan.c:600
  __msan_warning+0x82/0xf0 mm/kmsan/kmsan_instr.c:313
  hsr_register_frame_in+0x12c/0x200 net/hsr/hsr_framereg.c:318
  hsr_forward_skb+0xd64/0x2e30 net/hsr/hsr_forward.c:372
  hsr_dev_xmit+0xf8/0x160 net/hsr/hsr_device.c:243
  __netdev_start_xmit include/linux/netdevice.h:4382 [inline]
  netdev_start_xmit include/linux/netdevice.h:4391 [inline]
  xmit_one net/core/dev.c:3278 [inline]
  dev_hard_start_xmit+0x604/0xc40 net/core/dev.c:3294
  __dev_queue_xmit+0x2e48/0x3b80 net/core/dev.c:3864
  dev_queue_xmit+0x4b/0x60 net/core/dev.c:3897
  packet_snd net/packet/af_packet.c:2932 [inline]
  packet_sendmsg+0x79bb/0x9760 net/packet/af_packet.c:2957
  sock_sendmsg_nosec net/socket.c:621 [inline]
  sock_sendmsg net/socket.c:631 [inline]
  __sys_sendto+0x8c4/0xac0 net/socket.c:1788
  __do_sys_sendto net/socket.c:1800 [inline]
  __se_sys_sendto+0x107/0x130 net/socket.c:1796
  __x64_sys_sendto+0x6e/0x90 net/socket.c:1796
  do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291
  entry_SYSCALL_64_after_hwframe+0x63/0xe7
RIP: 0033:0x441869
Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7  
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff  
ff 0f 83 bb 10 fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007fffa42c7de8 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 0000000000441869
RDX: 000000000000000e RSI: 00000000200000c0 RDI: 0000000000000003
RBP: 00000000004a8fb0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000402db0
R13: 0000000000402e40 R14: 0000000000000000 R15: 0000000000000000

Uninit was stored to memory at:
  kmsan_save_stack_with_flags mm/kmsan/kmsan.c:205 [inline]
  kmsan_save_stack mm/kmsan/kmsan.c:220 [inline]
  kmsan_internal_chain_origin+0x134/0x230 mm/kmsan/kmsan.c:426
  __msan_chain_origin+0x70/0xe0 mm/kmsan/kmsan_instr.c:200
  hsr_add_node net/hsr/hsr_framereg.c:152 [inline]
  hsr_get_node+0xc94/0xe10 net/hsr/hsr_framereg.c:198
  hsr_fill_frame_info net/hsr/hsr_forward.c:327 [inline]
  hsr_forward_skb+0x7e9/0x2e30 net/hsr/hsr_forward.c:370
  hsr_dev_xmit+0xf8/0x160 net/hsr/hsr_device.c:243
  __netdev_start_xmit include/linux/netdevice.h:4382 [inline]
  netdev_start_xmit include/linux/netdevice.h:4391 [inline]
  xmit_one net/core/dev.c:3278 [inline]
  dev_hard_start_xmit+0x604/0xc40 net/core/dev.c:3294
  __dev_queue_xmit+0x2e48/0x3b80 net/core/dev.c:3864
  dev_queue_xmit+0x4b/0x60 net/core/dev.c:3897
  packet_snd net/packet/af_packet.c:2932 [inline]
  packet_sendmsg+0x79bb/0x9760 net/packet/af_packet.c:2957
  sock_sendmsg_nosec net/socket.c:621 [inline]
  sock_sendmsg net/socket.c:631 [inline]
  __sys_sendto+0x8c4/0xac0 net/socket.c:1788
  __do_sys_sendto net/socket.c:1800 [inline]
  __se_sys_sendto+0x107/0x130 net/socket.c:1796
  __x64_sys_sendto+0x6e/0x90 net/socket.c:1796
  do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291
  entry_SYSCALL_64_after_hwframe+0x63/0xe7

Uninit was created at:
  kmsan_save_stack_with_flags mm/kmsan/kmsan.c:205 [inline]
  kmsan_internal_poison_shadow+0x92/0x150 mm/kmsan/kmsan.c:159
  kmsan_kmalloc+0xa6/0x130 mm/kmsan/kmsan_hooks.c:176
  kmsan_slab_alloc+0xe/0x10 mm/kmsan/kmsan_hooks.c:185
  slab_post_alloc_hook mm/slab.h:446 [inline]
  slab_alloc_node mm/slub.c:2754 [inline]
  __kmalloc_node_track_caller+0xe9e/0xff0 mm/slub.c:4377
  __kmalloc_reserve net/core/skbuff.c:140 [inline]
  __alloc_skb+0x309/0xa20 net/core/skbuff.c:208
  alloc_skb include/linux/skbuff.h:1012 [inline]
  alloc_skb_with_frags+0x1c7/0xac0 net/core/skbuff.c:5288
  sock_alloc_send_pskb+0xafd/0x10a0 net/core/sock.c:2091
  packet_alloc_skb net/packet/af_packet.c:2783 [inline]
  packet_snd net/packet/af_packet.c:2876 [inline]
  packet_sendmsg+0x6881/0x9760 net/packet/af_packet.c:2957
  sock_sendmsg_nosec net/socket.c:621 [inline]
  sock_sendmsg net/socket.c:631 [inline]
  __sys_sendto+0x8c4/0xac0 net/socket.c:1788
  __do_sys_sendto net/socket.c:1800 [inline]
  __se_sys_sendto+0x107/0x130 net/socket.c:1796
  __x64_sys_sendto+0x6e/0x90 net/socket.c:1796
  do_syscall_64+0xbc/0xf0 arch/x86/entry/common.c:291
  entry_SYSCALL_64_after_hwframe+0x63/0xe7
==================================================================


---
This bug is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzkaller@googlegroups.com.

syzbot will keep track of this bug report. See:
https://goo.gl/tpsmEJ#bug-status-tracking for how to communicate with  
syzbot.
syzbot can test patches for this bug, for details see:
https://goo.gl/tpsmEJ#testing-patches

^ permalink raw reply

* Re: [PATCH 2/3] mm/gup: Introduce get_user_pages_fast_longterm()
From: Ira Weiny @ 2019-02-11 21:52 UTC (permalink / raw)
  To: John Hubbard
  Cc: Jason Gunthorpe, linux-rdma, linux-kernel, linux-mm,
	Daniel Borkmann, Davidlohr Bueso, netdev, Mike Marciniszyn,
	Dennis Dalessandro, Doug Ledford, Andrew Morton,
	Kirill A. Shutemov, Dan Williams
In-Reply-To: <fc9c880b-24f8-7063-6094-00175bc27f7d@nvidia.com>

On Mon, Feb 11, 2019 at 01:39:12PM -0800, John Hubbard wrote:
> On 2/11/19 1:26 PM, Ira Weiny wrote:
> > On Mon, Feb 11, 2019 at 01:13:56PM -0800, John Hubbard wrote:
> >> On 2/11/19 12:39 PM, Jason Gunthorpe wrote:
> >>> On Mon, Feb 11, 2019 at 12:16:42PM -0800, ira.weiny@intel.com wrote:
> >>>> From: Ira Weiny <ira.weiny@intel.com>
> >> [...]
> >> It seems to me that the longterm vs. short-term is of questionable value.
> > 
> > This is exactly why I did not post this before.  I've been waiting our other
> > discussions on how GUP pins are going to be handled to play out.  But with the
> > netdev thread today[1] it seems like we need to make sure we have a "safe" fast
> > variant for a while.  Introducing FOLL_LONGTERM seemed like the cleanest way to
> > do that even if we will not need the distinction in the future...  :-(
> 
> Yes, I agree. Below...
> 
> > [...]
> > This is also why I did not change the get_user_pages_longterm because we could
> > be ripping this all out by the end of the year...  (I hope. :-)
> > 
> > So while this does "pollute" the GUP family of calls I'm hoping it is not
> > forever.
> > 
> > Ira
> > 
> > [1] https://lkml.org/lkml/2019/2/11/1789
> > 
> 
> Yes, and to be clear, I think your patchset here is fine. It is easy to find
> the FOLL_LONGTERM callers if and when we want to change anything. I just think
> also it's appopriate to go a bit further, and use FOLL_LONGTERM all by itself.
> 
> That's because in either design outcome, it's better that way:
> 
> -- If we keep the concept of "I'm a long-term gup call site", then FOLL_LONGTERM
> is just right. The gup API already has _fast and non-fast variants, and once
> you get past a couple, you end up with a multiplication of names that really
> work better as flags. We're there.
> 
> -- If we drop the concept, then you've already done part of the work, by removing
> the _longterm API variants.

Fair enough.   But to do that correctly I think we will need to convert
get_user_pages_fast() to use flags as well.  I have a version of this series
which includes a patch does this, but the patch touched a lot of subsystems and
a couple of different architectures...[1]

I can't test them all.  If we want to go that way I'm up for submitting the
patch...  But if we remove longterm in the future we may be left with a
get_user_pages_fast() which really only needs 1 flag.  But perhaps overall we
would be better off?

Ira


[1] mm/gup.c: Change GUP fast to use flags rather than write bool

To facilitate additional options to get_user_pages_fast change the
singular write parameter to be the more generic gup_flags.

This patch currently does not change any functionality.  New
functionality will follow in subsequent patches.

Many of the get_user_pages_fast call sites were unchanged because they
already used FOLL_WRITE or 0 as appropriate.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
---
 arch/mips/mm/gup.c                         | 11 ++++++-----
 arch/powerpc/kvm/book3s_64_mmu_hv.c        |  4 ++--
 arch/powerpc/kvm/e500_mmu.c                |  2 +-
 arch/powerpc/mm/mmu_context_iommu.c        |  4 ++--
 arch/s390/kvm/interrupt.c                  |  2 +-
 arch/s390/mm/gup.c                         | 12 ++++++------
 arch/sh/mm/gup.c                           | 11 ++++++-----
 arch/sparc/mm/gup.c                        |  9 +++++----
 arch/x86/kvm/paging_tmpl.h                 |  2 +-
 arch/x86/kvm/svm.c                         |  2 +-
 drivers/fpga/dfl-afu-dma-region.c          |  2 +-
 drivers/gpu/drm/via/via_dmablit.c          |  3 ++-
 drivers/infiniband/hw/hfi1/user_pages.c    |  3 ++-
 drivers/misc/genwqe/card_utils.c           |  2 +-
 drivers/misc/vmw_vmci/vmci_host.c          |  2 +-
 drivers/misc/vmw_vmci/vmci_queue_pair.c    |  6 ++++--
 drivers/platform/goldfish/goldfish_pipe.c  |  3 ++-
 drivers/rapidio/devices/rio_mport_cdev.c   |  4 +++-
 drivers/sbus/char/oradax.c                 |  2 +-
 drivers/scsi/st.c                          |  3 ++-
 drivers/staging/gasket/gasket_page_table.c |  4 ++--
 drivers/tee/tee_shm.c                      |  2 +-
 drivers/vfio/vfio_iommu_spapr_tce.c        |  3 ++-
 drivers/vhost/vhost.c                      |  2 +-
 drivers/video/fbdev/pvr2fb.c               |  2 +-
 drivers/virt/fsl_hypervisor.c              |  2 +-
 drivers/xen/gntdev.c                       |  2 +-
 fs/orangefs/orangefs-bufmap.c              |  2 +-
 include/linux/mm.h                         |  4 ++--
 kernel/futex.c                             |  2 +-
 lib/iov_iter.c                             |  7 +++++--
 mm/gup.c                                   | 10 +++++-----
 mm/util.c                                  |  8 ++++----
 net/ceph/pagevec.c                         |  2 +-
 net/rds/info.c                             |  2 +-
 net/rds/rdma.c                             |  3 ++-
 36 files changed, 81 insertions(+), 65 deletions(-)



^ permalink raw reply

* Re: [PATCH 2/3] mm/gup: Introduce get_user_pages_fast_longterm()
From: Dan Williams @ 2019-02-11 21:45 UTC (permalink / raw)
  To: John Hubbard
  Cc: Ira Weiny, Jason Gunthorpe, linux-rdma, Linux Kernel Mailing List,
	Linux MM, Daniel Borkmann, Davidlohr Bueso, Netdev,
	Mike Marciniszyn, Dennis Dalessandro, Doug Ledford, Andrew Morton,
	Kirill A. Shutemov
In-Reply-To: <fc9c880b-24f8-7063-6094-00175bc27f7d@nvidia.com>

On Mon, Feb 11, 2019 at 1:39 PM John Hubbard <jhubbard@nvidia.com> wrote:
>
> On 2/11/19 1:26 PM, Ira Weiny wrote:
> > On Mon, Feb 11, 2019 at 01:13:56PM -0800, John Hubbard wrote:
> >> On 2/11/19 12:39 PM, Jason Gunthorpe wrote:
> >>> On Mon, Feb 11, 2019 at 12:16:42PM -0800, ira.weiny@intel.com wrote:
> >>>> From: Ira Weiny <ira.weiny@intel.com>
> >> [...]
> >> It seems to me that the longterm vs. short-term is of questionable value.
> >
> > This is exactly why I did not post this before.  I've been waiting our other
> > discussions on how GUP pins are going to be handled to play out.  But with the
> > netdev thread today[1] it seems like we need to make sure we have a "safe" fast
> > variant for a while.  Introducing FOLL_LONGTERM seemed like the cleanest way to
> > do that even if we will not need the distinction in the future...  :-(
>
> Yes, I agree. Below...
>
> > [...]
> > This is also why I did not change the get_user_pages_longterm because we could
> > be ripping this all out by the end of the year...  (I hope. :-)
> >
> > So while this does "pollute" the GUP family of calls I'm hoping it is not
> > forever.
> >
> > Ira
> >
> > [1] https://lkml.org/lkml/2019/2/11/1789
> >
>
> Yes, and to be clear, I think your patchset here is fine. It is easy to find
> the FOLL_LONGTERM callers if and when we want to change anything. I just think
> also it's appopriate to go a bit further, and use FOLL_LONGTERM all by itself.
>
> That's because in either design outcome, it's better that way:
>
> -- If we keep the concept of "I'm a long-term gup call site", then FOLL_LONGTERM
> is just right. The gup API already has _fast and non-fast variants, and once
> you get past a couple, you end up with a multiplication of names that really
> work better as flags. We're there.
>
> -- If we drop the concept, then you've already done part of the work, by removing
> the _longterm API variants.
>

A problem I now see with the _longterm name is that it hides its true
intent. It's really a "dax can't use page cache tricks to make it seem
like this page is ok to access indefinitely, if the system needs this
page back your pin would prevent the forward progress of the system
state.". If the discussion results in a need to have an explicit file
state (immutable or lease) then we'll continue to need a gup pin type
distinction. If the discussion resolves to one of the silent options
(fail truncate, lie about truncate) then FOLL_LONGTERM might be able
to die at that point.

^ permalink raw reply

* Re: [PATCH v3 0/3 net-next] net: phy: disregard "Clause 22 registers present" bit in get_phy_c45_devs_in_pkg
From: David Miller @ 2019-02-11 21:43 UTC (permalink / raw)
  To: hkallweit1; +Cc: andrew, f.fainelli, netdev
In-Reply-To: <6bdfe4da-6c27-c07c-2eb1-0cc13007a6d9@gmail.com>

From: Heiner Kallweit <hkallweit1@gmail.com>
Date: Mon, 11 Feb 2019 22:34:34 +0100

> On 11.02.2019 22:19, David Miller wrote:
>> From: Heiner Kallweit <hkallweit1@gmail.com>
>> Date: Fri, 8 Feb 2019 20:19:17 +0100
>> 
>>> Bit 0 in register 1.5 doesn't represent a device but is a flag that
>>> Clause 22 registers are present. Therefore disregard this bit when
>>> populating the device list. If code needs this information it
>>> should read register 1.5 directly instead of accessing the device
>>> list.
>>> Because this bit doesn't represent a device don't define a
>>> MDIO_MMD_XYZ constant, just define a MDIO_DEVS_XYZ constant for
>>> the flag in the device list bitmap.
>>  ...
>> 
>> This doesn't apply cleanly to net-next, please respin.
>> 
> You applied v2 already, and with a follow-up patch I "upgraded"
> it to v3. So you can disregard this series.

Oh that's right, thanks.

^ permalink raw reply

* Re: [PATCH 0/3] Add gup fast + longterm and use it in HFI1
From: Ira Weiny @ 2019-02-11 21:42 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-rdma, linux-kernel, linux-mm, Daniel Borkmann, netdev,
	Mike Marciniszyn, Dennis Dalessandro, Doug Ledford, Andrew Morton,
	Kirill A. Shutemov, Dan Williams
In-Reply-To: <20190211204710.GE24692@ziepe.ca>

On Mon, Feb 11, 2019 at 01:47:10PM -0700, Jason Gunthorpe wrote:
> On Mon, Feb 11, 2019 at 12:34:17PM -0800, Davidlohr Bueso wrote:
> > On Mon, 11 Feb 2019, ira.weiny@intel.com wrote:
> > > Ira Weiny (3):
> > >  mm/gup: Change "write" parameter to flags
> > >  mm/gup: Introduce get_user_pages_fast_longterm()
> > >  IB/HFI1: Use new get_user_pages_fast_longterm()
> > 
> > Out of curiosity, are you planning on having all rdma drivers
> > use get_user_pages_fast_longterm()? Ie:
> > 
> > hw/mthca/mthca_memfree.c:       ret = get_user_pages_fast(uaddr & PAGE_MASK, 1, FOLL_WRITE, pages);
> 
> This one is certainly a mistake - this should be done with a umem.

It looks like this is mapping a page allocated by user space for a doorbell?!?!
And that this is supporting the old memory free cards.  I remember that these
cards used system memory instead of memory on the cards but why it expects user
space to allocate that memory and how it all works is way too old for me to
even try to remember.

This does not seem to be allocating memory regions.  Jason, do you want a patch
to just convert these calls and consider it legacy code?

Ira

> 
> Jason

^ permalink raw reply

* Re: [PATCHv1 net-next] devlink: Add WARN_ON to catch errors of not cleaning devlink objects
From: David Miller @ 2019-02-11 21:41 UTC (permalink / raw)
  To: parav; +Cc: netdev
In-Reply-To: <1549660500-26565-1-git-send-email-parav@mellanox.com>

From: Parav Pandit <parav@mellanox.com>
Date: Fri,  8 Feb 2019 15:15:00 -0600

> Add WARN_ON to make sure that all sub objects of a devlink device are
> cleanedup before freeing the devlink device.
> This helps to catch any driver bugs.
> 
> Signed-off-by: Parav Pandit <parav@mellanox.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH net-next v4 0/9] net: Remove switchdev_ops
From: Ido Schimmel @ 2019-02-11 21:40 UTC (permalink / raw)
  To: David Miller
  Cc: f.fainelli@gmail.com, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, devel@driverdev.osuosl.org,
	bridge@lists.linux-foundation.org, Jiri Pirko, andrew@lunn.ch,
	vivien.didelot@gmail.com
In-Reply-To: <20190211.121657.2204380606625115977.davem@davemloft.net>

On Mon, Feb 11, 2019 at 12:16:57PM -0800, David Miller wrote:
> From: Florian Fainelli <f.fainelli@gmail.com>
> Date: Mon, 11 Feb 2019 11:09:52 -0800
> 
> > David, I would like to get Ido's feedback on this to make sure I did not
> > miss something, thank you!
> 
> Ok, Ido please look at this when you can.

Will review tomorrow. I was occupied with other stuff last couple of
days. Sorry

^ permalink raw reply

* Re: [PATCH 2/3] mm/gup: Introduce get_user_pages_fast_longterm()
From: John Hubbard @ 2019-02-11 21:39 UTC (permalink / raw)
  To: Ira Weiny
  Cc: Jason Gunthorpe, linux-rdma, linux-kernel, linux-mm,
	Daniel Borkmann, Davidlohr Bueso, netdev, Mike Marciniszyn,
	Dennis Dalessandro, Doug Ledford, Andrew Morton,
	Kirill A. Shutemov, Dan Williams
In-Reply-To: <20190211212652.GA7790@iweiny-DESK2.sc.intel.com>

On 2/11/19 1:26 PM, Ira Weiny wrote:
> On Mon, Feb 11, 2019 at 01:13:56PM -0800, John Hubbard wrote:
>> On 2/11/19 12:39 PM, Jason Gunthorpe wrote:
>>> On Mon, Feb 11, 2019 at 12:16:42PM -0800, ira.weiny@intel.com wrote:
>>>> From: Ira Weiny <ira.weiny@intel.com>
>> [...]
>> It seems to me that the longterm vs. short-term is of questionable value.
> 
> This is exactly why I did not post this before.  I've been waiting our other
> discussions on how GUP pins are going to be handled to play out.  But with the
> netdev thread today[1] it seems like we need to make sure we have a "safe" fast
> variant for a while.  Introducing FOLL_LONGTERM seemed like the cleanest way to
> do that even if we will not need the distinction in the future...  :-(

Yes, I agree. Below...

> [...]
> This is also why I did not change the get_user_pages_longterm because we could
> be ripping this all out by the end of the year...  (I hope. :-)
> 
> So while this does "pollute" the GUP family of calls I'm hoping it is not
> forever.
> 
> Ira
> 
> [1] https://lkml.org/lkml/2019/2/11/1789
> 

Yes, and to be clear, I think your patchset here is fine. It is easy to find
the FOLL_LONGTERM callers if and when we want to change anything. I just think
also it's appopriate to go a bit further, and use FOLL_LONGTERM all by itself.

That's because in either design outcome, it's better that way:

-- If we keep the concept of "I'm a long-term gup call site", then FOLL_LONGTERM
is just right. The gup API already has _fast and non-fast variants, and once
you get past a couple, you end up with a multiplication of names that really
work better as flags. We're there.

-- If we drop the concept, then you've already done part of the work, by removing
the _longterm API variants.



thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply

* Re: [PATCH v3 0/3 net-next] net: phy: disregard "Clause 22 registers present" bit in get_phy_c45_devs_in_pkg
From: Heiner Kallweit @ 2019-02-11 21:34 UTC (permalink / raw)
  To: David Miller; +Cc: andrew, f.fainelli, netdev
In-Reply-To: <20190211.131900.826012556219014471.davem@davemloft.net>

On 11.02.2019 22:19, David Miller wrote:
> From: Heiner Kallweit <hkallweit1@gmail.com>
> Date: Fri, 8 Feb 2019 20:19:17 +0100
> 
>> Bit 0 in register 1.5 doesn't represent a device but is a flag that
>> Clause 22 registers are present. Therefore disregard this bit when
>> populating the device list. If code needs this information it
>> should read register 1.5 directly instead of accessing the device
>> list.
>> Because this bit doesn't represent a device don't define a
>> MDIO_MMD_XYZ constant, just define a MDIO_DEVS_XYZ constant for
>> the flag in the device list bitmap.
>  ...
> 
> This doesn't apply cleanly to net-next, please respin.
> 
You applied v2 already, and with a follow-up patch I "upgraded"
it to v3. So you can disregard this series.

> Thanks.
> 


^ permalink raw reply

* [PATCH net-next] net: phy: simplify genphy_config_eee_advert
From: Heiner Kallweit @ 2019-02-11 21:16 UTC (permalink / raw)
  To: Andrew Lunn, Florian Fainelli, David Miller; +Cc: netdev@vger.kernel.org

Use new function phy_modify_mmd_changed(), the result speaks for itself.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
 drivers/net/phy/phy_device.c | 27 ++++++---------------------
 1 file changed, 6 insertions(+), 21 deletions(-)

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 3d14e48ae..2c61282a2 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -1578,31 +1578,16 @@ static int genphy_config_advert(struct phy_device *phydev)
  */
 static int genphy_config_eee_advert(struct phy_device *phydev)
 {
-	int broken = phydev->eee_broken_modes;
-	int old_adv, adv;
+	int err;
 
 	/* Nothing to disable */
-	if (!broken)
+	if (!phydev->eee_broken_modes)
 		return 0;
 
-	/* If the following call fails, we assume that EEE is not
-	 * supported by the phy. If we read 0, EEE is not advertised
-	 * In both case, we don't need to continue
-	 */
-	adv = phy_read_mmd(phydev, MDIO_MMD_AN, MDIO_AN_EEE_ADV);
-	if (adv <= 0)
-		return 0;
-
-	old_adv = adv;
-	adv &= ~broken;
-
-	/* Advertising remains unchanged with the broken mask */
-	if (old_adv == adv)
-		return 0;
-
-	phy_write_mmd(phydev, MDIO_MMD_AN, MDIO_AN_EEE_ADV, adv);
-
-	return 1;
+	err = phy_modify_mmd_changed(phydev, MDIO_MMD_AN, MDIO_AN_EEE_ADV,
+				     phydev->eee_broken_modes, 0);
+	/* If the call failed, we assume that EEE is not supported */
+	return err < 0 ? 0 : err;
 }
 
 /**
-- 
2.20.1


^ permalink raw reply related

* Re: [PATCH 0/3] Add gup fast + longterm and use it in HFI1
From: Ira Weiny @ 2019-02-11 21:29 UTC (permalink / raw)
  To: linux-rdma, linux-kernel, linux-mm, Daniel Borkmann, netdev,
	Mike Marciniszyn, Dennis Dalessandro, Doug Ledford,
	Jason Gunthorpe, Andrew Morton, Kirill A. Shutemov, Dan Williams
In-Reply-To: <20190211203417.a2c2kbmjai43flyz@linux-r8p5>

On Mon, Feb 11, 2019 at 12:34:17PM -0800, Davidlohr Bueso wrote:
> On Mon, 11 Feb 2019, ira.weiny@intel.com wrote:
> > Ira Weiny (3):
> >  mm/gup: Change "write" parameter to flags
> >  mm/gup: Introduce get_user_pages_fast_longterm()
> >  IB/HFI1: Use new get_user_pages_fast_longterm()
> 
> Out of curiosity, are you planning on having all rdma drivers
> use get_user_pages_fast_longterm()? Ie:
> 
> hw/mthca/mthca_memfree.c:       ret = get_user_pages_fast(uaddr & PAGE_MASK, 1, FOLL_WRITE, pages);
> hw/qib/qib_user_sdma.c:         ret = get_user_pages_fast(addr, j, 0, pages);

I missed that when I change the other qib call to longterm...  :-(

Yes both of these should be changed.  Although I need to look into Jasons
comment WRT the mthca call.

Ira

> 
> Thanks,
> Davidlohr

^ permalink raw reply

* Re: [PATCH 2/3] mm/gup: Introduce get_user_pages_fast_longterm()
From: Ira Weiny @ 2019-02-11 21:26 UTC (permalink / raw)
  To: John Hubbard
  Cc: Jason Gunthorpe, linux-rdma, linux-kernel, linux-mm,
	Daniel Borkmann, Davidlohr Bueso, netdev, Mike Marciniszyn,
	Dennis Dalessandro, Doug Ledford, Andrew Morton,
	Kirill A. Shutemov, Dan Williams
In-Reply-To: <bcc03ee1-4c42-48c3-bc67-942c0f04875e@nvidia.com>

On Mon, Feb 11, 2019 at 01:13:56PM -0800, John Hubbard wrote:
> On 2/11/19 12:39 PM, Jason Gunthorpe wrote:
> > On Mon, Feb 11, 2019 at 12:16:42PM -0800, ira.weiny@intel.com wrote:
> >> From: Ira Weiny <ira.weiny@intel.com>
> [...]
> >> +static inline int get_user_pages_fast_longterm(unsigned long start, int nr_pages,
> >> +					       bool write, struct page **pages)
> >> +{
> >> +	return get_user_pages_fast(start, nr_pages, write, pages);
> >> +}
> >>  #endif /* CONFIG_FS_DAX */
> >>  
> >>  int get_user_pages_fast(unsigned long start, int nr_pages, int write,
> >> @@ -2615,6 +2622,7 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
> >>  #define FOLL_REMOTE	0x2000	/* we are working on non-current tsk/mm */
> >>  #define FOLL_COW	0x4000	/* internal GUP flag */
> >>  #define FOLL_ANON	0x8000	/* don't do file mappings */
> >> +#define FOLL_LONGTERM	0x10000	/* mapping is intended for a long term pin */
> > 
> > If we are adding a new flag, maybe we should get rid of the 'longterm'
> > entry points and just rely on the callers to pass the flag?
> > 
> > Jason
> > 
> 
> +1, I agree that the overall get_user_pages*() API family will be cleaner
> *without* get_user_pages_longterm*() calls. And this new flag makes that possible.
> So I'd like to see the "longerm" call replaced with just passing this flag. Maybe
> even as part of this patchset, but either way.

Yes I've thought about this as well.  I have a couple of different versions of
this series which I've been mulling over and this was one of the other
variations.  But see below...

> 
> Taking a moment to reflect on where I think this might go eventually (the notes
> below do not need to affect your patchset here, but this seems like a good place
> to mention this):
> 
> It seems to me that the longterm vs. short-term is of questionable value.

This is exactly why I did not post this before.  I've been waiting our other
discussions on how GUP pins are going to be handled to play out.  But with the
netdev thread today[1] it seems like we need to make sure we have a "safe" fast
variant for a while.  Introducing FOLL_LONGTERM seemed like the cleanest way to
do that even if we will not need the distinction in the future...  :-(

> It's actually better to just call get_user_pages(), and then if it really is
> long-term enough to matter internally, we'll see the pages marked as gup-pinned.
> If the gup pages are released before anyone (filesystem, that is) notices, then
> it must have been short term.
> 
> Doing it that way is self-maintaining. Of course, this assumes that we end up with
> a design that doesn't require being told, by the call sites, that a given gup
> call is intended for "long term" use. So I could be wrong about this direction, but
> let's please consider the possibility.

This is why I've been holding these patches.  I'm also not 100% sure if we will
need the longterm flag in the future.

This is also why I did not change the get_user_pages_longterm because we could
be ripping this all out by the end of the year...  (I hope. :-)

So while this does "pollute" the GUP family of calls I'm hoping it is not
forever.

Ira

[1] https://lkml.org/lkml/2019/2/11/1789

> 
> thanks,
> -- 
> John Hubbard
> NVIDIA

^ permalink raw reply

* Re: Kernel memory corruption in CIPSO labeled TCP packets processing.
From: Nazarov Sergey @ 2019-02-11 21:21 UTC (permalink / raw)
  To: Paul Moore
  Cc: linux-security-module@vger.kernel.org, selinux@vger.kernel.org,
	netdev@vger.kernel.org, Casey Schaufler
In-Reply-To: <CAHC9VhSjtQRGBUHvx73eVg32NMm4UN4qqaOXFyB9BMSx0Bz7dQ@mail.gmail.com>

Hi, Paul!
What I need to do for this?

11.02.2019, 23:37, "Paul Moore" <paul@paul-moore.com>:
> On Thu, Jan 31, 2019 at 8:20 AM Nazarov Sergey <s-nazarov@yandex.ru> wrote:
>>  31.01.2019, 05:10, "Paul Moore" <paul@paul-moore.com>:
>>  > This isn't how the rest of the stack works, look at
>>  > ip_local_deliver_finish() for one example. Perhaps the behavior you
>>  > are proposing is correct, but please show me where in the various RFC
>>  > specs it is defined so that I can better understand why it should work
>>  > this way.
>>  > --
>>  > paul moore
>>  > www.paul-moore.com
>>
>>  Sorry, I was inattentive. ip_options_compile modifies srr option data, only if
>>  skb is NULL. My last message could be ignored.
>
> Hi Nazarov,
>
> Do you plan on submitting these patches as a proper patchset for
> review and merging?
>
> --
> paul moore
> www.paul-moore.com

^ permalink raw reply

* Re: [PATCH net] net/x25: do not hold the cpu too long in x25_new_lci()
From: David Miller @ 2019-02-11 21:20 UTC (permalink / raw)
  To: edumazet; +Cc: netdev, eric.dumazet, syzkaller, andrew.hendry, linux-x25
In-Reply-To: <20190208204105.16314-1-edumazet@google.com>

From: Eric Dumazet <edumazet@google.com>
Date: Fri,  8 Feb 2019 12:41:05 -0800

> Due to quadratic behavior of x25_new_lci(), syzbot was able
> to trigger an rcu stall.
> 
> Fix this by not blocking BH for the whole duration of
> the function, and inserting a reschedule point when possible.
> 
> If we care enough, using a bitmap could get rid of the quadratic
> behavior.
> 
> syzbot report :
 ...
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reported-by: syzbot <syzkaller@googlegroups.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH v3 0/3 net-next] net: phy: disregard "Clause 22 registers present" bit in get_phy_c45_devs_in_pkg
From: David Miller @ 2019-02-11 21:19 UTC (permalink / raw)
  To: hkallweit1; +Cc: andrew, f.fainelli, netdev
In-Reply-To: <75c9d8ee-582f-f247-7595-d8732ac26c20@gmail.com>

From: Heiner Kallweit <hkallweit1@gmail.com>
Date: Fri, 8 Feb 2019 20:19:17 +0100

> Bit 0 in register 1.5 doesn't represent a device but is a flag that
> Clause 22 registers are present. Therefore disregard this bit when
> populating the device list. If code needs this information it
> should read register 1.5 directly instead of accessing the device
> list.
> Because this bit doesn't represent a device don't define a
> MDIO_MMD_XYZ constant, just define a MDIO_DEVS_XYZ constant for
> the flag in the device list bitmap.
 ...

This doesn't apply cleanly to net-next, please respin.

Thanks.

^ permalink raw reply

* [PATCH net-next 3/3] staging: fsl-dpaa2: ethsw: Remove getting PORT_BRIDGE_FLAGS
From: Florian Fainelli @ 2019-02-11 21:17 UTC (permalink / raw)
  To: netdev
  Cc: Florian Fainelli, David S. Miller, Ido Schimmel, open list,
	open list:STAGING SUBSYSTEM, moderated list:ETHERNET BRIDGE, jiri,
	andrew, vivien.didelot
In-Reply-To: <20190211211749.19847-1-f.fainelli@gmail.com>

There is no code that tries to get the attribute
SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS, remove support for doing that.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/staging/fsl-dpaa2/ethsw/ethsw.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/drivers/staging/fsl-dpaa2/ethsw/ethsw.c b/drivers/staging/fsl-dpaa2/ethsw/ethsw.c
index e559f4c25cf7..e5a14c7c777f 100644
--- a/drivers/staging/fsl-dpaa2/ethsw/ethsw.c
+++ b/drivers/staging/fsl-dpaa2/ethsw/ethsw.c
@@ -646,11 +646,6 @@ static int swdev_port_attr_get(struct net_device *netdev,
 	struct ethsw_port_priv *port_priv = netdev_priv(netdev);
 
 	switch (attr->id) {
-	case SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS:
-		attr->u.brport_flags =
-			(port_priv->ethsw_data->learning ? BR_LEARNING : 0) |
-			(port_priv->flood ? BR_FLOOD : 0);
-		break;
 	case SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS_SUPPORT:
 		attr->u.brport_flags_support = BR_LEARNING | BR_FLOOD;
 		break;
-- 
2.17.1


^ permalink raw reply related

* [PATCH net-next 2/3] rocker: Remove getting PORT_BRIDGE_FLAGS
From: Florian Fainelli @ 2019-02-11 21:17 UTC (permalink / raw)
  To: netdev
  Cc: Florian Fainelli, David S. Miller, Ido Schimmel, open list,
	open list:STAGING SUBSYSTEM, moderated list:ETHERNET BRIDGE, jiri,
	andrew, vivien.didelot
In-Reply-To: <20190211211749.19847-1-f.fainelli@gmail.com>

There is no code that attempts to get the
SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS attribute, remove support for that.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/ethernet/rocker/rocker.h       |  2 --
 drivers/net/ethernet/rocker/rocker_main.c  | 15 ---------------
 drivers/net/ethernet/rocker/rocker_ofdpa.c | 10 ----------
 3 files changed, 27 deletions(-)

diff --git a/drivers/net/ethernet/rocker/rocker.h b/drivers/net/ethernet/rocker/rocker.h
index 748fb12260a6..2b2e1c4f0dc3 100644
--- a/drivers/net/ethernet/rocker/rocker.h
+++ b/drivers/net/ethernet/rocker/rocker.h
@@ -109,8 +109,6 @@ struct rocker_world_ops {
 	int (*port_attr_bridge_flags_set)(struct rocker_port *rocker_port,
 					  unsigned long brport_flags,
 					  struct switchdev_trans *trans);
-	int (*port_attr_bridge_flags_get)(const struct rocker_port *rocker_port,
-					  unsigned long *p_brport_flags);
 	int (*port_attr_bridge_flags_support_get)(const struct rocker_port *
 						  rocker_port,
 						  unsigned long *
diff --git a/drivers/net/ethernet/rocker/rocker_main.c b/drivers/net/ethernet/rocker/rocker_main.c
index 66f72f8c46e5..5ce8d5aba603 100644
--- a/drivers/net/ethernet/rocker/rocker_main.c
+++ b/drivers/net/ethernet/rocker/rocker_main.c
@@ -1582,17 +1582,6 @@ rocker_world_port_attr_bridge_flags_set(struct rocker_port *rocker_port,
 						trans);
 }
 
-static int
-rocker_world_port_attr_bridge_flags_get(const struct rocker_port *rocker_port,
-					unsigned long *p_brport_flags)
-{
-	struct rocker_world_ops *wops = rocker_port->rocker->wops;
-
-	if (!wops->port_attr_bridge_flags_get)
-		return -EOPNOTSUPP;
-	return wops->port_attr_bridge_flags_get(rocker_port, p_brport_flags);
-}
-
 static int
 rocker_world_port_attr_bridge_flags_support_get(const struct rocker_port *
 						rocker_port,
@@ -2061,10 +2050,6 @@ static int rocker_port_attr_get(struct net_device *dev,
 	int err = 0;
 
 	switch (attr->id) {
-	case SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS:
-		err = rocker_world_port_attr_bridge_flags_get(rocker_port,
-							      &attr->u.brport_flags);
-		break;
 	case SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS_SUPPORT:
 		err = rocker_world_port_attr_bridge_flags_support_get(rocker_port,
 								      &attr->u.brport_flags_support);
diff --git a/drivers/net/ethernet/rocker/rocker_ofdpa.c b/drivers/net/ethernet/rocker/rocker_ofdpa.c
index bea7895930f6..9c07f6040720 100644
--- a/drivers/net/ethernet/rocker/rocker_ofdpa.c
+++ b/drivers/net/ethernet/rocker/rocker_ofdpa.c
@@ -2511,16 +2511,6 @@ static int ofdpa_port_attr_bridge_flags_set(struct rocker_port *rocker_port,
 	return err;
 }
 
-static int
-ofdpa_port_attr_bridge_flags_get(const struct rocker_port *rocker_port,
-				 unsigned long *p_brport_flags)
-{
-	const struct ofdpa_port *ofdpa_port = rocker_port->wpriv;
-
-	*p_brport_flags = ofdpa_port->brport_flags;
-	return 0;
-}
-
 static int
 ofdpa_port_attr_bridge_flags_support_get(const struct rocker_port *
 					 rocker_port,
-- 
2.17.1


^ permalink raw reply related

* [PATCH net-next 1/3] mlxsw: spectrum_switchdev: Remove getting PORT_BRIDGE_FLAGS
From: Florian Fainelli @ 2019-02-11 21:17 UTC (permalink / raw)
  To: netdev
  Cc: Florian Fainelli, David S. Miller, Ido Schimmel, open list,
	open list:STAGING SUBSYSTEM, moderated list:ETHERNET BRIDGE, jiri,
	andrew, vivien.didelot
In-Reply-To: <20190211211749.19847-1-f.fainelli@gmail.com>

There is no code that will query the SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS
attribute remove support for that.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 .../mellanox/mlxsw/spectrum_switchdev.c         | 17 -----------------
 1 file changed, 17 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
index 95e37de3e48f..4c5780f8f4b2 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
@@ -431,19 +431,6 @@ static void mlxsw_sp_bridge_vlan_put(struct mlxsw_sp_bridge_vlan *bridge_vlan)
 		mlxsw_sp_bridge_vlan_destroy(bridge_vlan);
 }
 
-static void mlxsw_sp_port_bridge_flags_get(struct mlxsw_sp_bridge *bridge,
-					   struct net_device *dev,
-					   unsigned long *brport_flags)
-{
-	struct mlxsw_sp_bridge_port *bridge_port;
-
-	bridge_port = mlxsw_sp_bridge_port_find(bridge, dev);
-	if (WARN_ON(!bridge_port))
-		return;
-
-	memcpy(brport_flags, &bridge_port->flags, sizeof(*brport_flags));
-}
-
 static int mlxsw_sp_port_attr_get(struct net_device *dev,
 				  struct switchdev_attr *attr)
 {
@@ -451,10 +438,6 @@ static int mlxsw_sp_port_attr_get(struct net_device *dev,
 	struct mlxsw_sp *mlxsw_sp = mlxsw_sp_port->mlxsw_sp;
 
 	switch (attr->id) {
-	case SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS:
-		mlxsw_sp_port_bridge_flags_get(mlxsw_sp->bridge, attr->orig_dev,
-					       &attr->u.brport_flags);
-		break;
 	case SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS_SUPPORT:
 		attr->u.brport_flags_support = BR_LEARNING | BR_FLOOD |
 					       BR_MCAST_FLOOD;
-- 
2.17.1


^ permalink raw reply related

* [PATCH net-next 0/3] Remove getting SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS
From: Florian Fainelli @ 2019-02-11 21:17 UTC (permalink / raw)
  To: netdev
  Cc: Florian Fainelli, David S. Miller, Ido Schimmel, open list,
	open list:STAGING SUBSYSTEM, moderated list:ETHERNET BRIDGE, jiri,
	andrew, vivien.didelot

Hi all,

AFAICT there is no code that attempts to get the value of the attribute
SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS while it is used with
switchdev_port_attr_set().

This is effectively no doing anything and it can slow down future work
that tries to make modifications in these areas so remove that.

David, there should be no dependency with previous patch series, but
again, feedback from Ido and Jiri would be welcome in case this was
added for a reason.

Thanks!

Florian Fainelli (3):
  mlxsw: spectrum_switchdev: Remove getting PORT_BRIDGE_FLAGS
  rocker: Remove getting PORT_BRIDGE_FLAGS
  staging: fsl-dpaa2: ethsw: Remove getting PORT_BRIDGE_FLAGS

 .../mellanox/mlxsw/spectrum_switchdev.c         | 17 -----------------
 drivers/net/ethernet/rocker/rocker.h            |  2 --
 drivers/net/ethernet/rocker/rocker_main.c       | 15 ---------------
 drivers/net/ethernet/rocker/rocker_ofdpa.c      | 10 ----------
 drivers/staging/fsl-dpaa2/ethsw/ethsw.c         |  5 -----
 5 files changed, 49 deletions(-)

-- 
2.17.1

^ permalink raw reply

* RE: [PATCH 0/3] Add gup fast + longterm and use it in HFI1
From: Weiny, Ira @ 2019-02-11 21:14 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, Daniel Borkmann, Davidlohr Bueso,
	netdev@vger.kernel.org, Marciniszyn, Mike, Dalessandro, Dennis,
	Doug Ledford, Andrew Morton, Kirill A. Shutemov, Williams, Dan J
In-Reply-To: <20190211204049.GB2771@ziepe.ca>

> 
> On Mon, Feb 11, 2019 at 12:16:40PM -0800, ira.weiny@intel.com wrote:
> > From: Ira Weiny <ira.weiny@intel.com>
> >
> > NOTE: This series depends on my clean up patch to remove the write
> > parameter from gup_fast_permitted()[1]
> >
> > HFI1 uses get_user_pages_fast() due to it performance advantages.
> > Like RDMA,
> > HFI1 pages can be held for a significant time.  But
> > get_user_pages_fast() does not protect against mapping of FS DAX pages.
> 
> If HFI1 can use the _fast varient, can't all the general RDMA stuff use it too?
> 
> What is the guidance on when fast vs not fast should be use?

Right now it can't because it holds mmap_sem across the call.  Once Shiraz's patches are accepted removing the umem->hugetlb flag I think we can change  umem.c.

Also, it specifies FOLL_FORCE which can't currently be specified with gup fast.  One idea I had was to change get_user_pages_fast() to use gup_flags instead of a single write flag.  But that proved to be a very big cosmetic change across a lot of callers so I went this way.

Ira

> 
> Jason

^ permalink raw reply

* Re: [PATCH 2/3] mm/gup: Introduce get_user_pages_fast_longterm()
From: John Hubbard @ 2019-02-11 21:13 UTC (permalink / raw)
  To: Jason Gunthorpe, ira.weiny
  Cc: linux-rdma, linux-kernel, linux-mm, Daniel Borkmann,
	Davidlohr Bueso, netdev, Mike Marciniszyn, Dennis Dalessandro,
	Doug Ledford, Andrew Morton, Kirill A. Shutemov, Dan Williams
In-Reply-To: <20190211203916.GA2771@ziepe.ca>

On 2/11/19 12:39 PM, Jason Gunthorpe wrote:
> On Mon, Feb 11, 2019 at 12:16:42PM -0800, ira.weiny@intel.com wrote:
>> From: Ira Weiny <ira.weiny@intel.com>
[...]
>> +static inline int get_user_pages_fast_longterm(unsigned long start, int nr_pages,
>> +					       bool write, struct page **pages)
>> +{
>> +	return get_user_pages_fast(start, nr_pages, write, pages);
>> +}
>>  #endif /* CONFIG_FS_DAX */
>>  
>>  int get_user_pages_fast(unsigned long start, int nr_pages, int write,
>> @@ -2615,6 +2622,7 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
>>  #define FOLL_REMOTE	0x2000	/* we are working on non-current tsk/mm */
>>  #define FOLL_COW	0x4000	/* internal GUP flag */
>>  #define FOLL_ANON	0x8000	/* don't do file mappings */
>> +#define FOLL_LONGTERM	0x10000	/* mapping is intended for a long term pin */
> 
> If we are adding a new flag, maybe we should get rid of the 'longterm'
> entry points and just rely on the callers to pass the flag?
> 
> Jason
> 

+1, I agree that the overall get_user_pages*() API family will be cleaner
*without* get_user_pages_longterm*() calls. And this new flag makes that possible.
So I'd like to see the "longerm" call replaced with just passing this flag. Maybe
even as part of this patchset, but either way.

Taking a moment to reflect on where I think this might go eventually (the notes
below do not need to affect your patchset here, but this seems like a good place
to mention this):

It seems to me that the longterm vs. short-term is of questionable value.
It's actually better to just call get_user_pages(), and then if it really is
long-term enough to matter internally, we'll see the pages marked as gup-pinned.
If the gup pages are released before anyone (filesystem, that is) notices, then
it must have been short term.

Doing it that way is self-maintaining. Of course, this assumes that we end up with
a design that doesn't require being told, by the call sites, that a given gup
call is intended for "long term" use. So I could be wrong about this direction, but
let's please consider the possibility.

thanks,
-- 
John Hubbard
NVIDIA

^ permalink raw reply

* [Patch net v2 3/3] net_sched: fix two more memory leaks in cls_tcindex
From: Cong Wang @ 2019-02-11 21:06 UTC (permalink / raw)
  To: netdev; +Cc: Cong Wang, Jamal Hadi Salim, Jiri Pirko
In-Reply-To: <20190211210616.9592-1-xiyou.wangcong@gmail.com>

struct tcindex_filter_result contains two parts:
struct tcf_exts and struct tcf_result.

For the local variable 'cr', its exts part is never used but
initialized without being released properly on success path. So
just completely remove the exts part to fix this leak.

For the local variable 'new_filter_result', it is never properly
released if not used by 'r' on success path.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 net/sched/cls_tcindex.c | 16 +++++++---------
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/net/sched/cls_tcindex.c b/net/sched/cls_tcindex.c
index 70ea5b1a7889..38bb882bb958 100644
--- a/net/sched/cls_tcindex.c
+++ b/net/sched/cls_tcindex.c
@@ -304,9 +304,9 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
 		  struct nlattr *est, bool ovr, struct netlink_ext_ack *extack)
 {
 	struct tcindex_filter_result new_filter_result, *old_r = r;
-	struct tcindex_filter_result cr;
 	struct tcindex_data *cp = NULL, *oldp;
 	struct tcindex_filter *f = NULL; /* make gcc behave */
+	struct tcf_result cr = {};
 	int err, balloc = 0;
 	struct tcf_exts e;
 
@@ -345,13 +345,10 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
 	cp->h = p->h;
 
 	err = tcindex_filter_result_init(&new_filter_result);
-	if (err < 0)
-		goto errout1;
-	err = tcindex_filter_result_init(&cr);
 	if (err < 0)
 		goto errout1;
 	if (old_r)
-		cr.res = r->res;
+		cr = r->res;
 
 	if (tb[TCA_TCINDEX_HASH])
 		cp->hash = nla_get_u32(tb[TCA_TCINDEX_HASH]);
@@ -442,8 +439,8 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
 	}
 
 	if (tb[TCA_TCINDEX_CLASSID]) {
-		cr.res.classid = nla_get_u32(tb[TCA_TCINDEX_CLASSID]);
-		tcf_bind_filter(tp, &cr.res, base);
+		cr.classid = nla_get_u32(tb[TCA_TCINDEX_CLASSID]);
+		tcf_bind_filter(tp, &cr, base);
 	}
 
 	if (old_r && old_r != r) {
@@ -455,7 +452,7 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
 	}
 
 	oldp = p;
-	r->res = cr.res;
+	r->res = cr;
 	tcf_exts_change(&r->exts, &e);
 
 	rcu_assign_pointer(tp->root, cp);
@@ -474,6 +471,8 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
 				; /* nothing */
 
 		rcu_assign_pointer(*fp, f);
+	} else {
+		tcf_exts_destroy(&new_filter_result.exts);
 	}
 
 	if (oldp)
@@ -486,7 +485,6 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
 	else if (balloc == 2)
 		kfree(cp->h);
 errout1:
-	tcf_exts_destroy(&cr.exts);
 	tcf_exts_destroy(&new_filter_result.exts);
 errout:
 	kfree(cp);
-- 
2.20.1


^ permalink raw reply related

* [Patch net v2 2/3] net_sched: fix a memory leak in cls_tcindex
From: Cong Wang @ 2019-02-11 21:06 UTC (permalink / raw)
  To: netdev; +Cc: Cong Wang, Jamal Hadi Salim, Jiri Pirko
In-Reply-To: <20190211210616.9592-1-xiyou.wangcong@gmail.com>

When tcindex_destroy() destroys all the filter results in
the perfect hash table, it invokes the walker to delete
each of them. However, results with class==0 are skipped
in either tcindex_walk() or tcindex_delete(), which causes
a memory leak reported by kmemleak.

This patch fixes it by skipping the walker and directly
deleting these filter results so we don't miss any filter
result.

As a result of this change, we have to initialize exts->net
properly in tcindex_alloc_perfect_hash(). For net-next, we
need to consider whether we should initialize ->net in
tcf_exts_init() instead, before that just directly test
CONFIG_NET_CLS_ACT=y.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 net/sched/cls_tcindex.c | 46 +++++++++++++++++++++++++++--------------
 1 file changed, 30 insertions(+), 16 deletions(-)

diff --git a/net/sched/cls_tcindex.c b/net/sched/cls_tcindex.c
index 79b52a637dda..70ea5b1a7889 100644
--- a/net/sched/cls_tcindex.c
+++ b/net/sched/cls_tcindex.c
@@ -221,14 +221,6 @@ static int tcindex_delete(struct tcf_proto *tp, void *arg, bool *last,
 	return 0;
 }
 
-static int tcindex_destroy_element(struct tcf_proto *tp,
-				   void *arg, struct tcf_walker *walker)
-{
-	bool last;
-
-	return tcindex_delete(tp, arg, &last, NULL);
-}
-
 static void tcindex_destroy_work(struct work_struct *work)
 {
 	struct tcindex_data *p = container_of(to_rcu_work(work),
@@ -279,7 +271,7 @@ static void tcindex_free_perfect_hash(struct tcindex_data *cp)
 	kfree(cp->perfect);
 }
 
-static int tcindex_alloc_perfect_hash(struct tcindex_data *cp)
+static int tcindex_alloc_perfect_hash(struct net *net, struct tcindex_data *cp)
 {
 	int i, err = 0;
 
@@ -293,6 +285,9 @@ static int tcindex_alloc_perfect_hash(struct tcindex_data *cp)
 				    TCA_TCINDEX_ACT, TCA_TCINDEX_POLICE);
 		if (err < 0)
 			goto errout;
+#ifdef CONFIG_NET_CLS_ACT
+		cp->perfect[i].exts.net = net;
+#endif
 	}
 
 	return 0;
@@ -341,7 +336,7 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
 	if (p->perfect) {
 		int i;
 
-		if (tcindex_alloc_perfect_hash(cp) < 0)
+		if (tcindex_alloc_perfect_hash(net, cp) < 0)
 			goto errout;
 		for (i = 0; i < cp->hash; i++)
 			cp->perfect[i].res = p->perfect[i].res;
@@ -410,7 +405,7 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
 	err = -ENOMEM;
 	if (!cp->perfect && !cp->h) {
 		if (valid_perfect_hash(cp)) {
-			if (tcindex_alloc_perfect_hash(cp) < 0)
+			if (tcindex_alloc_perfect_hash(net, cp) < 0)
 				goto errout_alloc;
 			balloc = 1;
 		} else {
@@ -566,13 +561,32 @@ static void tcindex_destroy(struct tcf_proto *tp,
 			    struct netlink_ext_ack *extack)
 {
 	struct tcindex_data *p = rtnl_dereference(tp->root);
-	struct tcf_walker walker;
+	int i;
 
 	pr_debug("tcindex_destroy(tp %p),p %p\n", tp, p);
-	walker.count = 0;
-	walker.skip = 0;
-	walker.fn = tcindex_destroy_element;
-	tcindex_walk(tp, &walker);
+
+	if (p->perfect) {
+		for (i = 0; i < p->hash; i++) {
+			struct tcindex_filter_result *r = p->perfect + i;
+
+			tcf_unbind_filter(tp, &r->res);
+			if (tcf_exts_get_net(&r->exts))
+				tcf_queue_work(&r->rwork,
+					       tcindex_destroy_rexts_work);
+			else
+				__tcindex_destroy_rexts(r);
+		}
+	}
+
+	for (i = 0; p->h && i < p->hash; i++) {
+		struct tcindex_filter *f, *next;
+		bool last;
+
+		for (f = rtnl_dereference(p->h[i]); f; f = next) {
+			next = rtnl_dereference(f->next);
+			tcindex_delete(tp, &f->result, &last, NULL);
+		}
+	}
 
 	tcf_queue_work(&p->rwork, tcindex_destroy_work);
 }
-- 
2.20.1


^ permalink raw reply related

* [Patch net v2 1/3] net_sched: fix a race condition in tcindex_destroy()
From: Cong Wang @ 2019-02-11 21:06 UTC (permalink / raw)
  To: netdev; +Cc: Cong Wang, Adrian, Ben Hutchings, Jamal Hadi Salim, Jiri Pirko
In-Reply-To: <20190211210616.9592-1-xiyou.wangcong@gmail.com>

tcindex_destroy() invokes tcindex_destroy_element() via
a walker to delete each filter result in its perfect hash
table, and tcindex_destroy_element() calls tcindex_delete()
which schedules tcf RCU works to do the final deletion work.
Unfortunately this races with the RCU callback
__tcindex_destroy(), which could lead to use-after-free as
reported by Adrian.

Fix this by migrating this RCU callback to tcf RCU work too,
as that workqueue is ordered, we will not have use-after-free.

Note, we don't need to hold netns refcnt because we don't call
tcf_exts_destroy() here.

Fixes: 27ce4f05e2ab ("net_sched: use tcf_queue_work() in tcindex filter")
Reported-by: Adrian <bugs@abtelecom.ro>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 net/sched/cls_tcindex.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/net/sched/cls_tcindex.c b/net/sched/cls_tcindex.c
index 9ccc93f257db..79b52a637dda 100644
--- a/net/sched/cls_tcindex.c
+++ b/net/sched/cls_tcindex.c
@@ -48,7 +48,7 @@ struct tcindex_data {
 	u32 hash;		/* hash table size; 0 if undefined */
 	u32 alloc_hash;		/* allocated size */
 	u32 fall_through;	/* 0: only classify if explicit match */
-	struct rcu_head rcu;
+	struct rcu_work rwork;
 };
 
 static inline int tcindex_filter_is_set(struct tcindex_filter_result *r)
@@ -229,9 +229,11 @@ static int tcindex_destroy_element(struct tcf_proto *tp,
 	return tcindex_delete(tp, arg, &last, NULL);
 }
 
-static void __tcindex_destroy(struct rcu_head *head)
+static void tcindex_destroy_work(struct work_struct *work)
 {
-	struct tcindex_data *p = container_of(head, struct tcindex_data, rcu);
+	struct tcindex_data *p = container_of(to_rcu_work(work),
+					      struct tcindex_data,
+					      rwork);
 
 	kfree(p->perfect);
 	kfree(p->h);
@@ -258,9 +260,11 @@ static int tcindex_filter_result_init(struct tcindex_filter_result *r)
 	return tcf_exts_init(&r->exts, TCA_TCINDEX_ACT, TCA_TCINDEX_POLICE);
 }
 
-static void __tcindex_partial_destroy(struct rcu_head *head)
+static void tcindex_partial_destroy_work(struct work_struct *work)
 {
-	struct tcindex_data *p = container_of(head, struct tcindex_data, rcu);
+	struct tcindex_data *p = container_of(to_rcu_work(work),
+					      struct tcindex_data,
+					      rwork);
 
 	kfree(p->perfect);
 	kfree(p);
@@ -478,7 +482,7 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base,
 	}
 
 	if (oldp)
-		call_rcu(&oldp->rcu, __tcindex_partial_destroy);
+		tcf_queue_work(&oldp->rwork, tcindex_partial_destroy_work);
 	return 0;
 
 errout_alloc:
@@ -570,7 +574,7 @@ static void tcindex_destroy(struct tcf_proto *tp,
 	walker.fn = tcindex_destroy_element;
 	tcindex_walk(tp, &walker);
 
-	call_rcu(&p->rcu, __tcindex_destroy);
+	tcf_queue_work(&p->rwork, tcindex_destroy_work);
 }
 
 
-- 
2.20.1


^ permalink raw reply related

* [Patch net v2 0/3] net_sched: some fixes for cls_tcindex
From: Cong Wang @ 2019-02-11 21:06 UTC (permalink / raw)
  To: netdev; +Cc: Cong Wang

This patchset contains 3 bug fixes for tcindex filter. Please check
each patch for details.

v2: fix a compile error in patch 2
    drop netns refcnt in patch 1

Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>

Cong Wang (3):
  net_sched: fix a race condition in tcindex_destroy()
  net_sched: fix a memory leak in cls_tcindex
  net_sched: fix two more memory leaks in cls_tcindex

 net/sched/cls_tcindex.c | 80 ++++++++++++++++++++++++-----------------
 1 file changed, 48 insertions(+), 32 deletions(-)

-- 
2.20.1


^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox