* Re: [PATCH net-next] net/ipv6: Fix ip6_convert_metrics() bug
From: David Miller @ 2018-04-20 15:36 UTC (permalink / raw)
To: edumazet; +Cc: netdev, eric.dumazet, dsa
In-Reply-To: <20180419161453.46977-1-edumazet@google.com>
From: Eric Dumazet <edumazet@google.com>
Date: Thu, 19 Apr 2018 09:14:53 -0700
> If ip6_convert_metrics() fails to allocate memory, it should not
> overwrite rt->fib6_metrics or we risk a crash later as syzbot found.
...
> Fixes: d4ead6b34b67 ("net/ipv6: move metrics from dst to rt6_info")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: David Ahern <dsa@cumulusnetworks.com>
> Reported-by: syzbot <syzkaller@googlegroups.com>
Applied, thanks Eric.
^ permalink raw reply
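For readers skimming the archive: the rule in the commit message ("do not overwrite the pointer if the allocation fails") can be sketched in userspace roughly as below. This is not the applied patch. All model_* names, the shared default object and the injectable allocator are assumptions made only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct model_metrics { unsigned int vals[4]; };
struct model_route { struct model_metrics *metrics; };

/* Hypothetical shared default that a route points at before conversion. */
struct model_metrics model_default_metrics;

/* alloc_fn lets a caller inject an allocation failure (assumption). */
int model_convert_metrics(struct model_route *rt,
			  void *(*alloc_fn)(size_t, size_t))
{
	struct model_metrics *p = alloc_fn(1, sizeof(*p));

	if (!p)
		return -ENOMEM;	/* rt->metrics is left untouched */

	rt->metrics = p;	/* publish only after the allocation succeeded */
	return 0;
}

/* Allocator that always fails, to exercise the error path. */
void *model_failing_alloc(size_t n, size_t size)
{
	(void)n;
	(void)size;
	return NULL;
}
```

Calling model_convert_metrics() with model_failing_alloc leaves the route pointing at its previous metrics, so later readers never dereference NULL. Calling it with calloc replaces the pointer.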
* Re: [PATCH net-next] sfc: set and clear interrupt affinity hints
From: David Miller @ 2018-04-20 15:37 UTC (permalink / raw)
To: bkenward; +Cc: netdev, linux-net-drivers
In-Reply-To: <7ba18d83-95d6-b83c-d19c-a50bcec79da0@solarflare.com>
From: Bert Kenward <bkenward@solarflare.com>
Date: Thu, 19 Apr 2018 17:37:25 +0100
> Use cpumask_local_spread to provide interrupt affinity hints
> for each queue. This will spread interrupts across NUMA local
> CPUs first, extending to remote nodes if needed.
>
> Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Applied.
^ permalink raw reply
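As a rough illustration of the "NUMA local CPUs first, then remote nodes" ordering described in the commit message, here is a userspace model. It is only a sketch of the selection order, not the kernel's cpumask_local_spread() implementation; model_local_spread() and the cpu_node[] table are hypothetical.

```c
#include <assert.h>

/* cpu_node[c] is the NUMA node of CPU c. Returns the i-th CPU in
 * "local node first, then the rest" order, or -1 if there are no CPUs.
 */
int model_local_spread(unsigned int i, int node,
		       const int *cpu_node, unsigned int ncpus)
{
	unsigned int c;

	if (!ncpus)
		return -1;
	i %= ncpus;	/* wrap when there are more queues than CPUs */

	for (c = 0; c < ncpus; c++)	/* first pass: CPUs on the local node */
		if (cpu_node[c] == node && i-- == 0)
			return (int)c;
	for (c = 0; c < ncpus; c++)	/* second pass: CPUs on remote nodes */
		if (cpu_node[c] != node && i-- == 0)
			return (int)c;
	return -1;	/* not reached when ncpus > 0 */
}
```

With cpu_node = { 0, 0, 1, 1 } and a device on node 1, queues 0 and 1 land on CPUs 2 and 3, and queues 2 and 3 spill over to CPUs 0 and 1.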
* Re: [PATCH v4 0/3] lan78xx: Read configuration from Device Tree
From: David Miller @ 2018-04-20 15:39 UTC (permalink / raw)
To: phil
Cc: woojung.huh, UNGLinuxDriver, robh+dt, mark.rutland, andrew,
f.fainelli, mchehab, gregkh, linus.walleij, akpm, rdunlap, netdev,
devicetree, linux-kernel, linux-usb
In-Reply-To: <1524157180-27276-1-git-send-email-phil@raspberrypi.org>
From: Phil Elwell <phil@raspberrypi.org>
Date: Thu, 19 Apr 2018 17:59:37 +0100
> The Microchip LAN78XX family of devices are Ethernet controllers with
> a USB interface. Despite being discoverable devices it can be useful to
> be able to configure them from Device Tree, particularly in low-cost
> applications without an EEPROM or programmed OTP.
>
> This patch set adds support for reading the MAC address and LED modes from
> Device Tree.
>
> v4:
> - Rename nodes in bindings doc.
>
> v3:
> - Move LED setting into PHY driver.
>
> v2:
> - Use eth_platform_get_mac_address.
> - Support up to 4 LEDs, and move LED mode constants into dt-bindings header.
> - Improve bindings document.
> - Remove EEE support.
Series applied, thanks Phil.
^ permalink raw reply
* Re: [PATCH net-next 4/5] tcp: implement mmap() for zero copy receive
From: Eric Dumazet @ 2018-04-20 15:39 UTC (permalink / raw)
To: Jonathan Corbet, Eric Dumazet
Cc: Eric Dumazet, David S . Miller, netdev, Neal Cardwell,
Yuchung Cheng, Soheil Hassas Yeganeh
In-Reply-To: <20180420091951.713c0b95@lwn.net>
On 04/20/2018 08:19 AM, Jonathan Corbet wrote:
> On Thu, 19 Apr 2018 18:01:32 -0700
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>> We can keep mmap() nice interface, granted we can add one hook like in following patch.
>>
>> David, do you think such patch would be acceptable by lkml and mm/fs maintainers ?
>>
>> Alternative would be implementing an ioctl() or getsockopt() operation,
>> but it seems less natural...
>
Hi Jonathan
> So I have little standing here, but what the heck, not letting that bother
> me has earned me a living for the last 20 years or so...:)
>
> I think you should consider switching over to an interface where you
> mmap() the region once, and use ioctl() to move the data into that region,
> for a couple of reasons beyond the locking issues you've already found:
>
> - The "mmap() consumes data" semantics are a bit ... strange, IMO.
> That's not what mmap() normally does. People expect ioctl() to do
> magic things, instead.
Well, the thing is that most of our use cases won't reuse the same mmap() area.
The RPC layer will provide all RPCs with their associated pages to RPC consumers.
RPC consumers will decide to keep these pages or consume them.
So having to do mmap() + another syscall to consume XXX bytes from the receive queue is not
going to save cpu cycles :/
Having the ability to call mmap() multiple times for the same TCP payload is not
going to be of any use in real applications. This is why I only support 'offset 0'
for the last mmap() parameter.
>
> - I would expect it to be a tiny bit faster, since you wouldn't be doing
> the VMA setup and teardown each time.
Maybe, in the degenerate case where we can reuse the same region over and over.
^ permalink raw reply
* Re: [PATCH v7 net-next 4/4] netvsc: refactor notifier/event handling code to use the failover framework
From: Michael S. Tsirkin @ 2018-04-20 15:43 UTC (permalink / raw)
To: Stephen Hemminger
Cc: Sridhar Samudrala, davem, netdev, virtualization, virtio-dev,
jesse.brandeburg, alexander.h.duyck, kubakici, jasowang,
loseweigh, jiri
In-Reply-To: <20180420082802.6ca37e4c@xeon-e3>
On Fri, Apr 20, 2018 at 08:28:02AM -0700, Stephen Hemminger wrote:
> On Thu, 19 Apr 2018 18:42:04 -0700
> Sridhar Samudrala <sridhar.samudrala@intel.com> wrote:
>
> > Use the registration/notification framework supported by the generic
> > failover infrastructure.
> >
> > Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
>
> Do what you want to other devices but leave netvsc alone.
> Adding these failover ops does not reduce the code size,
drivers/net/hyperv/Kconfig | 1 +
drivers/net/hyperv/hyperv_net.h | 2 +
drivers/net/hyperv/netvsc_drv.c | 208 ++++++++++------------------------------
3 files changed, 55 insertions(+), 156 deletions(-)
100 lines gone.
> and really is
> no benefit. The netvsc device driver needs to be backported to several
> other distributions and doing this makes that harder.
>
> I will NAK patches to change to common code for netvsc
Wow.
> especially the
> three device model.
AFAIK these patches do not change netvsc to a three device model.
> MS worked hard with distro vendors to support transparent
> mode, and we really can't have a new model;
That's why Sridhar worked hard to preserve a 2 device model for netvsc.
> or do backport.
>
> Plus, DPDK is now dependent on existing model.
DPDK does the kernel bypass thing, doesn't it? Why does the kernel care?
--
MST
^ permalink raw reply
* Re: [PATCH v7 net-next 4/4] netvsc: refactor notifier/event handling code to use the failover framework
From: David Miller @ 2018-04-20 15:46 UTC (permalink / raw)
To: stephen
Cc: sridhar.samudrala, mst, netdev, virtualization, virtio-dev,
jesse.brandeburg, alexander.h.duyck, kubakici, jasowang,
loseweigh, jiri
In-Reply-To: <20180420082802.6ca37e4c@xeon-e3>
From: Stephen Hemminger <stephen@networkplumber.org>
Date: Fri, 20 Apr 2018 08:28:02 -0700
> On Thu, 19 Apr 2018 18:42:04 -0700
> Sridhar Samudrala <sridhar.samudrala@intel.com> wrote:
>
>> Use the registration/notification framework supported by the generic
>> failover infrastructure.
>>
>> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
>
> Do what you want to other devices but leave netvsc alone.
> Adding these failover ops does not reduce the code size, and really is
> no benefit. The netvsc device driver needs to be backported to several
> other distributions and doing this makes that harder.
>
> I will NAK patches to change to common code for netvsc especially the
> three device model. MS worked hard with distro vendors to support transparent
> mode, and we really can't have a new model; or do backport.
>
> Plus, DPDK is now dependent on existing model.
Stephen, I understand your situation.
But the result we have now is undesirable and it happened because MS
worked with distro vendors on this rather than the upstream community
and entities with other devices with similar needs.
Please next time do the latter rather than the former.
Thank you.
^ permalink raw reply
* Re: [PATCH v7 net-next 4/4] netvsc: refactor notifier/event handling code to use the failover framework
From: Samudrala, Sridhar @ 2018-04-20 15:46 UTC (permalink / raw)
To: Stephen Hemminger
Cc: mst, davem, netdev, virtualization, virtio-dev, jesse.brandeburg,
alexander.h.duyck, kubakici, jasowang, loseweigh, jiri
In-Reply-To: <20180420082802.6ca37e4c@xeon-e3>
On 4/20/2018 8:28 AM, Stephen Hemminger wrote:
> On Thu, 19 Apr 2018 18:42:04 -0700
> Sridhar Samudrala <sridhar.samudrala@intel.com> wrote:
>
>> Use the registration/notification framework supported by the generic
>> failover infrastructure.
>>
>> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
> Do what you want to other devices but leave netvsc alone.
> Adding these failover ops does not reduce the code size, and really is
> no benefit. The netvsc device driver needs to be backported to several
> other distributions and doing this makes that harder.
>
> I will NAK patches to change to common code for netvsc especially the
> three device model. MS worked hard with distro vendors to support transparent
> mode, and we really can't have a new model; or do backport.
failover_ops are specifically added to support both 2-netdev and 3-netdev models.
This patch doesn't change the netvsc model. It still keeps its 2-netdev model. From
netvsc's point of view it is just moving some code from netvsc to the failover module,
and I also think the event handling and getbymac routines are more optimal.
> Plus, DPDK is now dependent on existing model.
^ permalink raw reply
* Re: [PATCH v7 net-next 4/4] netvsc: refactor notifier/event handling code to use the failover framework
From: David Miller @ 2018-04-20 15:47 UTC (permalink / raw)
To: mst
Cc: stephen, sridhar.samudrala, netdev, virtualization, virtio-dev,
jesse.brandeburg, alexander.h.duyck, kubakici, jasowang,
loseweigh, jiri
In-Reply-To: <20180420183505-mutt-send-email-mst@kernel.org>
From: "Michael S. Tsirkin" <mst@redhat.com>
Date: Fri, 20 Apr 2018 18:43:54 +0300
> On Fri, Apr 20, 2018 at 08:28:02AM -0700, Stephen Hemminger wrote:
>> Plus, DPDK is now dependent on existing model.
>
> DPDK does the kernel bypass thing, doesn't it? Why does the kernel care?
+1
^ permalink raw reply
* [PATCH] hv_netvsc: select needed ucs2_string routine
From: Stephen Hemminger @ 2018-04-20 15:48 UTC (permalink / raw)
To: davem; +Cc: netdev, Stephen Hemminger
The conversion of the RNDIS friendly name to UTF-8 uses a standard
kernel routine which is optional in the config. Therefore the build
would fail for some configurations. Resolve by selecting the needed
library.
Fixes: 0fe554a46a0f ("hv_netvsc: propogate Hyper-V friendly name into interface alias")
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
---
drivers/net/hyperv/Kconfig | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/hyperv/Kconfig b/drivers/net/hyperv/Kconfig
index 936968d23559..0765d5f61714 100644
--- a/drivers/net/hyperv/Kconfig
+++ b/drivers/net/hyperv/Kconfig
@@ -1,5 +1,6 @@
config HYPERV_NET
tristate "Microsoft Hyper-V virtual network driver"
depends on HYPERV
+ select UCS2_STRING
help
Select this option to enable the Hyper-V virtual network driver.
--
2.17.0
^ permalink raw reply related
* [PATCH for-rc] uapi: Fix SPDX tags for files referring to the 'OpenIB.org' license
From: Jason Gunthorpe @ 2018-04-20 15:49 UTC (permalink / raw)
To: linux-rdma
Cc: Kate Stewart, Philippe Ombredanne, Greg Kroah-Hartman,
Thomas Gleixner, Steve Winslow, Santosh Shilimkar, netdev,
linux-kernel, Dave Watson
Based on discussion with Kate Stewart this license is not a
BSD-2-Clause, but is now formally identified as Linux-OpenIB
by SPDX.
The key difference between the licenses is in the 'warranty'
paragraph.
if_infiniband.h refers to the 'OpenIB.org' license, but
does not include the text, instead it links to an obsolete
web site that contains a license that matches the BSD-2-Clause
SPDX. There is no 'three clause' version of the OpenIB.org
license.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
---
include/uapi/linux/if_infiniband.h | 2 +-
include/uapi/linux/rds.h | 2 +-
include/uapi/linux/tls.h | 2 +-
include/uapi/rdma/cxgb3-abi.h | 2 +-
include/uapi/rdma/cxgb4-abi.h | 2 +-
include/uapi/rdma/hns-abi.h | 2 +-
include/uapi/rdma/ib_user_cm.h | 2 +-
include/uapi/rdma/ib_user_ioctl_verbs.h | 2 +-
include/uapi/rdma/ib_user_mad.h | 2 +-
include/uapi/rdma/ib_user_sa.h | 2 +-
include/uapi/rdma/ib_user_verbs.h | 2 +-
include/uapi/rdma/mlx4-abi.h | 2 +-
include/uapi/rdma/mlx5-abi.h | 2 +-
include/uapi/rdma/mthca-abi.h | 2 +-
include/uapi/rdma/nes-abi.h | 2 +-
include/uapi/rdma/qedr-abi.h | 2 +-
include/uapi/rdma/rdma_user_cm.h | 2 +-
include/uapi/rdma/rdma_user_ioctl.h | 2 +-
include/uapi/rdma/rdma_user_rxe.h | 2 +-
19 files changed, 19 insertions(+), 19 deletions(-)
I propose to send this patch through the RDMA tree.
diff --git a/include/uapi/linux/if_infiniband.h b/include/uapi/linux/if_infiniband.h
index 050b92dcf8cf40..0fc33bf30e45a1 100644
--- a/include/uapi/linux/if_infiniband.h
+++ b/include/uapi/linux/if_infiniband.h
@@ -1,4 +1,4 @@
-/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) */
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) */
/*
* This software is available to you under a choice of one of two
* licenses. You may choose to be licensed under the terms of the GNU
diff --git a/include/uapi/linux/rds.h b/include/uapi/linux/rds.h
index a66b213de3d7a4..20c6bd0b00079e 100644
--- a/include/uapi/linux/rds.h
+++ b/include/uapi/linux/rds.h
@@ -1,4 +1,4 @@
-/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) */
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR Linux-OpenIB) */
/*
* Copyright (c) 2008 Oracle. All rights reserved.
*
diff --git a/include/uapi/linux/tls.h b/include/uapi/linux/tls.h
index c6633e97eca40b..ff02287495ac56 100644
--- a/include/uapi/linux/tls.h
+++ b/include/uapi/linux/tls.h
@@ -1,4 +1,4 @@
-/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) */
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR Linux-OpenIB) */
/*
* Copyright (c) 2016-2017, Mellanox Technologies. All rights reserved.
*
diff --git a/include/uapi/rdma/cxgb3-abi.h b/include/uapi/rdma/cxgb3-abi.h
index 9acb4b7a624633..85aed672f43e65 100644
--- a/include/uapi/rdma/cxgb3-abi.h
+++ b/include/uapi/rdma/cxgb3-abi.h
@@ -1,4 +1,4 @@
-/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) */
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR Linux-OpenIB) */
/*
* Copyright (c) 2006 Chelsio, Inc. All rights reserved.
*
diff --git a/include/uapi/rdma/cxgb4-abi.h b/include/uapi/rdma/cxgb4-abi.h
index 1fefd0140c26f6..a159ba8dcf8f13 100644
--- a/include/uapi/rdma/cxgb4-abi.h
+++ b/include/uapi/rdma/cxgb4-abi.h
@@ -1,4 +1,4 @@
-/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) */
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR Linux-OpenIB) */
/*
* Copyright (c) 2009-2010 Chelsio, Inc. All rights reserved.
*
diff --git a/include/uapi/rdma/hns-abi.h b/include/uapi/rdma/hns-abi.h
index 7092c8de4bd883..78613b609fa846 100644
--- a/include/uapi/rdma/hns-abi.h
+++ b/include/uapi/rdma/hns-abi.h
@@ -1,4 +1,4 @@
-/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) */
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR Linux-OpenIB) */
/*
* Copyright (c) 2016 Hisilicon Limited.
*
diff --git a/include/uapi/rdma/ib_user_cm.h b/include/uapi/rdma/ib_user_cm.h
index 4a8f9562f7cd9b..e2709bb8cb1802 100644
--- a/include/uapi/rdma/ib_user_cm.h
+++ b/include/uapi/rdma/ib_user_cm.h
@@ -1,4 +1,4 @@
-/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) */
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR Linux-OpenIB) */
/*
* Copyright (c) 2005 Topspin Communications. All rights reserved.
* Copyright (c) 2005 Intel Corporation. All rights reserved.
diff --git a/include/uapi/rdma/ib_user_ioctl_verbs.h b/include/uapi/rdma/ib_user_ioctl_verbs.h
index 04e46ea517d328..625545d862d7e4 100644
--- a/include/uapi/rdma/ib_user_ioctl_verbs.h
+++ b/include/uapi/rdma/ib_user_ioctl_verbs.h
@@ -1,4 +1,4 @@
-/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) */
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR Linux-OpenIB) */
/*
* Copyright (c) 2017-2018, Mellanox Technologies inc. All rights reserved.
*
diff --git a/include/uapi/rdma/ib_user_mad.h b/include/uapi/rdma/ib_user_mad.h
index ef92118dad9770..90c0cf228020dc 100644
--- a/include/uapi/rdma/ib_user_mad.h
+++ b/include/uapi/rdma/ib_user_mad.h
@@ -1,4 +1,4 @@
-/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) */
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR Linux-OpenIB) */
/*
* Copyright (c) 2004 Topspin Communications. All rights reserved.
* Copyright (c) 2005 Voltaire, Inc. All rights reserved.
diff --git a/include/uapi/rdma/ib_user_sa.h b/include/uapi/rdma/ib_user_sa.h
index 0d2607f0cd20c3..435155d6e1c6a5 100644
--- a/include/uapi/rdma/ib_user_sa.h
+++ b/include/uapi/rdma/ib_user_sa.h
@@ -1,4 +1,4 @@
-/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) */
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR Linux-OpenIB) */
/*
* Copyright (c) 2005 Intel Corporation. All rights reserved.
*
diff --git a/include/uapi/rdma/ib_user_verbs.h b/include/uapi/rdma/ib_user_verbs.h
index 9be07394fdbe50..6aeb03315b0bd5 100644
--- a/include/uapi/rdma/ib_user_verbs.h
+++ b/include/uapi/rdma/ib_user_verbs.h
@@ -1,4 +1,4 @@
-/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) */
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR Linux-OpenIB) */
/*
* Copyright (c) 2005 Topspin Communications. All rights reserved.
* Copyright (c) 2005, 2006 Cisco Systems. All rights reserved.
diff --git a/include/uapi/rdma/mlx4-abi.h b/include/uapi/rdma/mlx4-abi.h
index 04f64bc4045f1b..f745575281756d 100644
--- a/include/uapi/rdma/mlx4-abi.h
+++ b/include/uapi/rdma/mlx4-abi.h
@@ -1,4 +1,4 @@
-/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) */
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR Linux-OpenIB) */
/*
* Copyright (c) 2007 Cisco Systems, Inc. All rights reserved.
* Copyright (c) 2007, 2008 Mellanox Technologies. All rights reserved.
diff --git a/include/uapi/rdma/mlx5-abi.h b/include/uapi/rdma/mlx5-abi.h
index cb4a02c4a1cef0..fdaf00e206498c 100644
--- a/include/uapi/rdma/mlx5-abi.h
+++ b/include/uapi/rdma/mlx5-abi.h
@@ -1,4 +1,4 @@
-/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) */
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR Linux-OpenIB) */
/*
* Copyright (c) 2013-2015, Mellanox Technologies. All rights reserved.
*
diff --git a/include/uapi/rdma/mthca-abi.h b/include/uapi/rdma/mthca-abi.h
index ac756cd9e80772..91b12e1a6f43ce 100644
--- a/include/uapi/rdma/mthca-abi.h
+++ b/include/uapi/rdma/mthca-abi.h
@@ -1,4 +1,4 @@
-/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) */
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR Linux-OpenIB) */
/*
* Copyright (c) 2005 Topspin Communications. All rights reserved.
* Copyright (c) 2005, 2006 Cisco Systems. All rights reserved.
diff --git a/include/uapi/rdma/nes-abi.h b/include/uapi/rdma/nes-abi.h
index 35bfd4015d0705..f80495baa9697e 100644
--- a/include/uapi/rdma/nes-abi.h
+++ b/include/uapi/rdma/nes-abi.h
@@ -1,4 +1,4 @@
-/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) */
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR Linux-OpenIB) */
/*
* Copyright (c) 2006 - 2011 Intel Corporation. All rights reserved.
* Copyright (c) 2005 Topspin Communications. All rights reserved.
diff --git a/include/uapi/rdma/qedr-abi.h b/include/uapi/rdma/qedr-abi.h
index 8ba098900e9aac..24c658b3c79042 100644
--- a/include/uapi/rdma/qedr-abi.h
+++ b/include/uapi/rdma/qedr-abi.h
@@ -1,4 +1,4 @@
-/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) */
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR Linux-OpenIB) */
/* QLogic qedr NIC Driver
* Copyright (c) 2015-2016 QLogic Corporation
*
diff --git a/include/uapi/rdma/rdma_user_cm.h b/include/uapi/rdma/rdma_user_cm.h
index e1269024af47f0..0d1e78ebad0515 100644
--- a/include/uapi/rdma/rdma_user_cm.h
+++ b/include/uapi/rdma/rdma_user_cm.h
@@ -1,4 +1,4 @@
-/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) */
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR Linux-OpenIB) */
/*
* Copyright (c) 2005-2006 Intel Corporation. All rights reserved.
*
diff --git a/include/uapi/rdma/rdma_user_ioctl.h b/include/uapi/rdma/rdma_user_ioctl.h
index d223f4164a0f8d..d92d2721b28c5b 100644
--- a/include/uapi/rdma/rdma_user_ioctl.h
+++ b/include/uapi/rdma/rdma_user_ioctl.h
@@ -1,4 +1,4 @@
-/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) */
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR Linux-OpenIB) */
/*
* Copyright (c) 2016 Mellanox Technologies, LTD. All rights reserved.
*
diff --git a/include/uapi/rdma/rdma_user_rxe.h b/include/uapi/rdma/rdma_user_rxe.h
index 1f8a9e7daea43e..44ef6a3b7afc8c 100644
--- a/include/uapi/rdma/rdma_user_rxe.h
+++ b/include/uapi/rdma/rdma_user_rxe.h
@@ -1,4 +1,4 @@
-/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) */
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR Linux-OpenIB) */
/*
* Copyright (c) 2016 Mellanox Technologies Ltd. All rights reserved.
*
--
2.17.0
^ permalink raw reply related
* Re: [PATCH v1 net-next] lan78xx: Add support to dump lan78xx registers
From: David Miller @ 2018-04-20 15:50 UTC (permalink / raw)
To: raghuramchary.jallipalli; +Cc: netdev, unglinuxdriver, woojung.huh
In-Reply-To: <20180420061350.9340-1-raghuramchary.jallipalli@microchip.com>
From: Raghuram Chary J <raghuramchary.jallipalli@microchip.com>
Date: Fri, 20 Apr 2018 11:43:50 +0530
> In order to dump lan78xx family registers using ethtool, add
> support at lan78xx driver level.
>
> Signed-off-by: Raghuram Chary J <raghuramchary.jallipalli@microchip.com>
> ---
> v0->v1:
> * Remove one variable in the for loop.
Applied, thank you.
^ permalink raw reply
* Re: [PATCH net-next] tun: do not compute the rxhash, if not needed
From: David Miller @ 2018-04-20 15:51 UTC (permalink / raw)
To: pabeni; +Cc: netdev, jasowang
In-Reply-To: <1c43f8bc63407239c91df916b149d4fdbf26bed3.1524222969.git.pabeni@redhat.com>
From: Paolo Abeni <pabeni@redhat.com>
Date: Fri, 20 Apr 2018 13:18:16 +0200
> Currently, the tun driver, in absence of an eBPF steering program,
> always computes the rxhash in its rx path, even when such value
> is later unused due to additional checks (
>
> This changeset moves all the related checks just before the
> __skb_get_hash_symmetric(), so that the latter is no longer computed
> when unneeded.
>
> Also replace an unneeded RCU section with rcu_access_pointer().
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Applied, thank you.
^ permalink raw reply
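The general idea in the commit message (run the cheap checks first, and only then pay for the hash) can be sketched in userspace as below. This is not the tun driver code. The hash_needed flag is a placeholder for whatever checks the driver really performs, and FNV-1a merely stands in for __skb_get_hash_symmetric().

```c
#include <assert.h>
#include <stdint.h>

unsigned int model_hash_calls;	/* counts how often the hash is computed */

/* FNV-1a over the buffer; a stand-in for the real (more expensive) hash. */
uint32_t model_hash(const unsigned char *data, unsigned int len)
{
	uint32_t h = 2166136261u;
	unsigned int i;

	model_hash_calls++;
	for (i = 0; i < len; i++) {
		h ^= data[i];
		h *= 16777619u;
	}
	return h;
}

/* hash_needed is a hypothetical summary of the cheap checks that decide
 * whether the value will be consumed. Returns 0 when the hash is skipped.
 */
uint32_t model_rx(const unsigned char *data, unsigned int len, int hash_needed)
{
	if (!hash_needed)
		return 0;
	return model_hash(data, len);
}
```

The counter makes the saving visible: a packet that fails the checks never reaches model_hash().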
* [PATCH net-next 0/4] mm,tcp: provide mmap_hook to solve lockdep issue
From: Eric Dumazet @ 2018-04-20 15:55 UTC (permalink / raw)
To: David S . Miller
Cc: netdev, linux-kernel, Soheil Hassas Yeganeh, Eric Dumazet,
Eric Dumazet
This patch series provides a new mmap_hook for filesystems willing to grab
a mutex before mm->mmap_sem is taken, to ensure lockdep sanity.
This hook allows us to shorten tcp_mmap() execution time (while mmap_sem
is held), and improve multi-threading scalability.
Eric Dumazet (4):
mm: provide a mmap_hook infrastructure
net: implement sock_mmap_hook()
tcp: provide tcp_mmap_hook()
tcp: mmap: move the skb cleanup to tcp_mmap_hook()
include/linux/fs.h | 6 ++++++
include/linux/net.h | 1 +
include/net/tcp.h | 1 +
mm/util.c | 19 ++++++++++++++++++-
net/ipv4/af_inet.c | 1 +
net/ipv4/tcp.c | 39 ++++++++++++++++++++++++++++++---------
net/ipv6/af_inet6.c | 1 +
net/socket.c | 9 +++++++++
8 files changed, 67 insertions(+), 10 deletions(-)
--
2.17.0.484.g0c8726318c-goog
^ permalink raw reply
* [PATCH net-next 1/4] mm: provide a mmap_hook infrastructure
From: Eric Dumazet @ 2018-04-20 15:55 UTC (permalink / raw)
To: David S . Miller
Cc: netdev, linux-kernel, Soheil Hassas Yeganeh, Eric Dumazet,
Eric Dumazet
In-Reply-To: <20180420155542.122183-1-edumazet@google.com>
When adding the tcp mmap() implementation, I forgot that the socket lock
had to be taken before current->mm->mmap_sem. syzbot eventually caught
the bug.
This patch provides a new mmap_hook() method in struct file_operations
that might be provided by fs to implement finer control of what is
to be done before and after do_mmap_pgoff() and/or the mm->mmap_sem
acquire/release.
This is used in the following patches by the networking and TCP stacks
to solve the lockdep issue, and also allows some preparation
and cleanup work to be done before/after mmap_sem is held,
allowing better scalability in multi-threaded programs.
Fixes: 93ab6cc69162 ("tcp: implement mmap() for zero copy receive")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
---
include/linux/fs.h | 6 ++++++
mm/util.c | 19 ++++++++++++++++++-
2 files changed, 24 insertions(+), 1 deletion(-)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 92efaf1f89775f7b017477617dd983c10e0dc4d2..ef3526f84686585678861fc585efea974a69ca55 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1698,6 +1698,11 @@ struct block_device_operations;
#define NOMMU_VMFLAGS \
(NOMMU_MAP_READ | NOMMU_MAP_WRITE | NOMMU_MAP_EXEC)
+enum mmap_hook {
+ MMAP_HOOK_PREPARE,
+ MMAP_HOOK_ROLLBACK,
+ MMAP_HOOK_COMMIT,
+};
struct iov_iter;
@@ -1714,6 +1719,7 @@ struct file_operations {
long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
int (*mmap) (struct file *, struct vm_area_struct *);
+ int (*mmap_hook) (struct file *, enum mmap_hook);
unsigned long mmap_supported_flags;
int (*open) (struct inode *, struct file *);
int (*flush) (struct file *, fl_owner_t id);
diff --git a/mm/util.c b/mm/util.c
index 1fc4fa7576f762bbbf341f056ca6d0be803a423f..3ddb18ab367f069d5884083e992e999546ccd995 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -350,11 +350,28 @@ unsigned long vm_mmap_pgoff(struct file *file, unsigned long addr,
ret = security_mmap_file(file, prot, flag);
if (!ret) {
- if (down_write_killable(&mm->mmap_sem))
+ int (*mmap_hook)(struct file *, enum mmap_hook) = NULL;
+
+ if (file) {
+ mmap_hook = file->f_op->mmap_hook;
+
+ if (mmap_hook) {
+ ret = mmap_hook(file, MMAP_HOOK_PREPARE);
+ if (ret)
+ return ret;
+ }
+ }
+ if (down_write_killable(&mm->mmap_sem)) {
+ if (mmap_hook)
+ mmap_hook(file, MMAP_HOOK_ROLLBACK);
return -EINTR;
+ }
ret = do_mmap_pgoff(file, addr, len, prot, flag, pgoff,
&populate, &uf);
up_write(&mm->mmap_sem);
+ if (mmap_hook)
+ mmap_hook(file, IS_ERR_VALUE(ret) ? MMAP_HOOK_ROLLBACK :
+ MMAP_HOOK_COMMIT);
userfaultfd_unmap_complete(mm, &uf);
if (populate)
mm_populate(ret, populate);
--
2.17.0.484.g0c8726318c-goog
^ permalink raw reply related
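The PREPARE / ROLLBACK / COMMIT calling sequence from the patch above can be modelled in userspace as follows. This is a single-threaded sketch under stated assumptions: plain flags stand in for the socket lock and mm->mmap_sem, every model_* name is hypothetical, and the path where the killable wait for mmap_sem is interrupted is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum mmap_hook { MMAP_HOOK_PREPARE, MMAP_HOOK_ROLLBACK, MMAP_HOOK_COMMIT };

/* A flag standing in for a real lock; enough to check ordering. */
struct model_lock { int held; };

void model_acquire(struct model_lock *l) { assert(!l->held); l->held = 1; }
void model_release(struct model_lock *l) { assert(l->held); l->held = 0; }

struct model_file {
	struct model_lock obj_lock;	/* stands in for the socket lock */
	int (*mmap_hook)(struct model_file *, enum mmap_hook);
	int commits, rollbacks;		/* call counters for inspection */
};

struct model_lock model_mmap_sem;	/* stands in for mm->mmap_sem */

int model_hook(struct model_file *f, enum mmap_hook mode)
{
	if (mode == MMAP_HOOK_PREPARE) {
		/* Ordering rule: object lock first, mmap_sem second. */
		assert(!model_mmap_sem.held);
		model_acquire(&f->obj_lock);
		return 0;
	}
	if (mode == MMAP_HOOK_COMMIT)
		f->commits++;
	else
		f->rollbacks++;
	model_release(&f->obj_lock);
	return 0;
}

/* map_fails simulates the mapping step returning an error. */
long model_vm_mmap(struct model_file *f, int map_fails)
{
	long ret;

	if (f->mmap_hook) {
		ret = f->mmap_hook(f, MMAP_HOOK_PREPARE);
		if (ret)
			return ret;
	}
	model_acquire(&model_mmap_sem);
	ret = map_fails ? -ENOMEM : 0x1000;	/* fake mapping result */
	model_release(&model_mmap_sem);
	if (f->mmap_hook)
		f->mmap_hook(f, ret < 0 ? MMAP_HOOK_ROLLBACK : MMAP_HOOK_COMMIT);
	return ret;
}
```

The point of the shape is that the hook owner's lock brackets the mmap_sem section, and that COMMIT or ROLLBACK runs only after mmap_sem has been dropped, which is where slower cleanup work can live.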
* [PATCH net-next 3/4] tcp: provide tcp_mmap_hook()
From: Eric Dumazet @ 2018-04-20 15:55 UTC (permalink / raw)
To: David S . Miller
Cc: netdev, linux-kernel, Soheil Hassas Yeganeh, Eric Dumazet,
Eric Dumazet
In-Reply-To: <20180420155542.122183-1-edumazet@google.com>
Many socket operations can copy data between user and kernel space
while the socket lock is held. This means mm->mmap_sem can be taken
after the socket lock.
When implementing tcp mmap(), I forgot this and syzbot was kind enough
to bring this to my attention.
This patch adds tcp_mmap_hook(), allowing us to grab the socket lock
before vm_mmap_pgoff() grabs mm->mmap_sem.
This same hook is responsible for releasing the socket lock when
vm_mmap_pgoff() has released mm->mmap_sem (or failed to acquire it).
Note that follow-up patches can transfer code from tcp_mmap()
to tcp_mmap_hook() to shorten tcp_mmap() execution time
and thus increase mmap() performance in multi-threaded programs.
Fixes: 93ab6cc69162 ("tcp: implement mmap() for zero copy receive")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
---
include/net/tcp.h | 1 +
net/ipv4/af_inet.c | 1 +
net/ipv4/tcp.c | 25 ++++++++++++++++++++++---
net/ipv6/af_inet6.c | 1 +
4 files changed, 25 insertions(+), 3 deletions(-)
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 833154e3df173ea41aa16dd1ec739a175c679c5c..f68c8e8957840cacdbdd3d02bd149fce33ae324f 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -404,6 +404,7 @@ int tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int nonblock,
int flags, int *addr_len);
int tcp_set_rcvlowat(struct sock *sk, int val);
void tcp_data_ready(struct sock *sk);
+int tcp_mmap_hook(struct socket *sock, enum mmap_hook mode);
int tcp_mmap(struct file *file, struct socket *sock,
struct vm_area_struct *vma);
void tcp_parse_options(const struct net *net, const struct sk_buff *skb,
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 3ebf599cebaea4926decc1aad7274b12ec7e1566..af597440ff59c049b7fd02f7d7f79c23b9e195bb 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -995,6 +995,7 @@ const struct proto_ops inet_stream_ops = {
.sendmsg = inet_sendmsg,
.recvmsg = inet_recvmsg,
.mmap = tcp_mmap,
+ .mmap_hook = tcp_mmap_hook,
.sendpage = inet_sendpage,
.splice_read = tcp_splice_read,
.read_sock = tcp_read_sock,
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 4022073b0aeea9d07af0fa825b640a00512908a3..e913b2dd5df321f2789e8d5f233ede9c2f1d5624 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1726,6 +1726,28 @@ int tcp_set_rcvlowat(struct sock *sk, int val)
}
EXPORT_SYMBOL(tcp_set_rcvlowat);
+/* mmap() on TCP needs to grab socket lock before current->mm->mmap_sem
+ * is taken in vm_mmap_pgoff() to avoid possible dead locks.
+ */
+int tcp_mmap_hook(struct socket *sock, enum mmap_hook mode)
+{
+ struct sock *sk = sock->sk;
+
+ if (mode == MMAP_HOOK_PREPARE) {
+ lock_sock(sk);
+ /* TODO: Move here all the preparation work that can be done
+ * before having to grab current->mm->mmap_sem.
+ */
+ return 0;
+ }
+ /* TODO: Move here the stuff that can been done after
+ * current->mm->mmap_sem has been released.
+ */
+ release_sock(sk);
+ return 0;
+}
+EXPORT_SYMBOL(tcp_mmap_hook);
+
/* When user wants to mmap X pages, we first need to perform the mapping
* before freeing any skbs in receive queue, otherwise user would be unable
* to fallback to standard recvmsg(). This happens if some data in the
@@ -1756,8 +1778,6 @@ int tcp_mmap(struct file *file, struct socket *sock,
/* TODO: Maybe the following is not needed if pages are COW */
vma->vm_flags &= ~VM_MAYWRITE;
- lock_sock(sk);
-
ret = -ENOTCONN;
if (sk->sk_state == TCP_LISTEN)
goto out;
@@ -1833,7 +1853,6 @@ int tcp_mmap(struct file *file, struct socket *sock,
ret = 0;
out:
- release_sock(sk);
kvfree(pages_array);
return ret;
}
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 36d622c477b1ed3c5d2b753938444526344a6109..31ce68c001c223d3351f73453273ae517a051816 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -579,6 +579,7 @@ const struct proto_ops inet6_stream_ops = {
.sendmsg = inet_sendmsg, /* ok */
.recvmsg = inet_recvmsg, /* ok */
.mmap = tcp_mmap,
+ .mmap_hook = tcp_mmap_hook,
.sendpage = inet_sendpage,
.sendmsg_locked = tcp_sendmsg_locked,
.sendpage_locked = tcp_sendpage_locked,
--
2.17.0.484.g0c8726318c-goog
^ permalink raw reply related
* [PATCH net-next 4/4] tcp: mmap: move the skb cleanup to tcp_mmap_hook()
From: Eric Dumazet @ 2018-04-20 15:55 UTC (permalink / raw)
To: David S . Miller
Cc: netdev, linux-kernel, Soheil Hassas Yeganeh, Eric Dumazet,
Eric Dumazet
In-Reply-To: <20180420155542.122183-1-edumazet@google.com>
Freeing all skbs and sending ACKs is time consuming.
This is currently done while both current->mm->mmap_sem and the socket
lock are held, in tcp_mmap().
Thanks to the mmap_hook infrastructure, we can perform the cleanup
after current->mm->mmap_sem has been released, thus allowing
other threads to perform mm operations without delay.
Note that the preparation work (building the array of page
pointers) can also be done from tcp_mmap_hook() while
mmap_sem has not been taken yet, but this is another independent change.
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
net/ipv4/tcp.c | 20 +++++++++++---------
1 file changed, 11 insertions(+), 9 deletions(-)
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index e913b2dd5df321f2789e8d5f233ede9c2f1d5624..82f7c3e47253cecac6ea1819fbb7a0712058ec55 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1740,9 +1740,16 @@ int tcp_mmap_hook(struct socket *sock, enum mmap_hook mode)
*/
return 0;
}
- /* TODO: Move here the stuff that can been done after
- * current->mm->mmap_sem has been released.
- */
+ if (mode == MMAP_HOOK_COMMIT) {
+ u32 offset;
+
+ tcp_rcv_space_adjust(sk);
+
+ /* Clean up data we have read: This will do ACK frames. */
+ tcp_recv_skb(sk, tcp_sk(sk)->copied_seq, &offset);
+
+ tcp_cleanup_rbuf(sk, PAGE_SIZE);
+ }
release_sock(sk);
return 0;
}
@@ -1843,13 +1850,8 @@ int tcp_mmap(struct file *file, struct socket *sock,
if (ret)
goto out;
}
- /* operation is complete, we can 'consume' all skbs */
+ /* operation is complete, skbs will be freed from tcp_mmap_hook() */
tp->copied_seq = seq;
- tcp_rcv_space_adjust(sk);
-
- /* Clean up data we have read: This will do ACK frames. */
- tcp_recv_skb(sk, seq, &offset);
- tcp_cleanup_rbuf(sk, size);
ret = 0;
out:
--
2.17.0.484.g0c8726318c-goog
^ permalink raw reply related
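The locking order described in patch 4/4 can be pictured with a small userspace model. This is only a sketch under stated assumptions: `struct model`, `model_hook()` and `model_mmap()` are hypothetical names, the two booleans stand in for the socket lock and current->mm->mmap_sem, and no kernel API is used. It shows the slow cleanup running in the commit step, after the mm lock has been dropped.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
enum hook_mode { HOOK_PREPARE, HOOK_COMMIT };

struct model {
	bool sock_locked;	/* stands in for lock_sock()/release_sock() */
	bool mm_locked;		/* stands in for current->mm->mmap_sem */
	int pending_skbs;	/* buffers waiting to be freed */
	int freed_under_mm;	/* buffers freed while mm_locked was set */
};

static int model_hook(struct model *m, enum hook_mode mode)
{
	if (mode == HOOK_PREPARE) {
		m->sock_locked = true;	/* taken before the mm lock */
		return 0;
	}
	/* HOOK_COMMIT: the mm lock is already dropped, do the slow cleanup. */
	if (m->mm_locked)
		m->freed_under_mm += m->pending_skbs;
	m->pending_skbs = 0;
	m->sock_locked = false;
	return 0;
}

static int model_mmap(struct model *m, int npages)
{
	model_hook(m, HOOK_PREPARE);
	m->mm_locked = true;
	m->pending_skbs += npages;	/* the mapping step only records the work */
	m->mm_locked = false;
	return model_hook(m, HOOK_COMMIT);
}
```

In this model the counter freed_under_mm stays at zero, which is the property the patch is after: other threads waiting on the mm lock are not delayed by the cleanup.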
* [PATCH net-next 2/4] net: implement sock_mmap_hook()
From: Eric Dumazet @ 2018-04-20 15:55 UTC (permalink / raw)
To: David S . Miller
Cc: netdev, linux-kernel, Soheil Hassas Yeganeh, Eric Dumazet,
Eric Dumazet
In-Reply-To: <20180420155542.122183-1-edumazet@google.com>
sock_mmap_hook() is the mmap_hook handler provided for socket_file_ops.
A following patch will provide tcp_mmap_hook() for the TCP protocol.
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
include/linux/net.h | 1 +
net/socket.c | 9 +++++++++
2 files changed, 10 insertions(+)
diff --git a/include/linux/net.h b/include/linux/net.h
index 6554d3ba4396b3df49acac934ad16eeb71a695f4..5192bf502b11e42c3d9eb342ce67361916149bfa 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -181,6 +181,7 @@ struct proto_ops {
size_t total_len, int flags);
int (*mmap) (struct file *file, struct socket *sock,
struct vm_area_struct * vma);
+ int (*mmap_hook) (struct socket *sock, enum mmap_hook);
ssize_t (*sendpage) (struct socket *sock, struct page *page,
int offset, size_t size, int flags);
ssize_t (*splice_read)(struct socket *sock, loff_t *ppos,
diff --git a/net/socket.c b/net/socket.c
index f10f1d947c78c193b49379b0ec641d81367fb4cf..75a5c2ebe57e0621dae17c6c9e1a796ee818b107 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -131,6 +131,14 @@ static ssize_t sock_splice_read(struct file *file, loff_t *ppos,
struct pipe_inode_info *pipe, size_t len,
unsigned int flags);
+static int sock_mmap_hook(struct file *file, enum mmap_hook mode)
+{
+ struct socket *sock = file->private_data;
+
+ if (!sock->ops->mmap_hook)
+ return 0;
+ return sock->ops->mmap_hook(sock, mode);
+}
/*
* Socket files have a set of 'special' operations as well as the generic file ones. These don't appear
* in the operation structures but are done directly via the socketcall() multiplexor.
@@ -147,6 +155,7 @@ static const struct file_operations socket_file_ops = {
.compat_ioctl = compat_sock_ioctl,
#endif
.mmap = sock_mmap,
+ .mmap_hook = sock_mmap_hook,
.release = sock_close,
.fasync = sock_fasync,
.sendpage = sock_sendpage,
--
2.17.0.484.g0c8726318c-goog
^ permalink raw reply related
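The dispatch pattern used by sock_mmap_hook() in patch 2/4 (an optional per-protocol callback that is treated as a successful no-op when absent) can be sketched outside the kernel as follows. All `demo_*` names and `counting_hook()` are hypothetical and exist only for illustration; they are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>

enum hook_mode { HOOK_PREPARE, HOOK_COMMIT };

struct demo_sock;

/* Hypothetical ops table: the hook is optional and may be NULL. */
struct demo_ops {
	int (*mmap_hook)(struct demo_sock *sock, enum hook_mode mode);
};

struct demo_sock {
	const struct demo_ops *ops;
	int calls;		/* how many times a hook ran on this socket */
};

/* Same shape as sock_mmap_hook(): a missing hook means "nothing to do". */
static int demo_sock_mmap_hook(struct demo_sock *sock, enum hook_mode mode)
{
	if (!sock->ops->mmap_hook)
		return 0;
	return sock->ops->mmap_hook(sock, mode);
}

static int counting_hook(struct demo_sock *sock, enum hook_mode mode)
{
	(void)mode;
	sock->calls++;
	return 0;
}

static const struct demo_ops ops_without_hook = { .mmap_hook = NULL };
static const struct demo_ops ops_with_hook = { .mmap_hook = counting_hook };
```

Returning 0 for a missing hook lets protocols that do not care about the new callback keep their ops tables unchanged.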
* Re: [virtio-dev] Re: [PATCH v7 net-next 2/4] net: Introduce generic failover module
From: Alexander Duyck @ 2018-04-20 15:56 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Samudrala, Sridhar, Stephen Hemminger, David Miller, Netdev,
virtualization, virtio-dev, Brandeburg, Jesse, Duyck, Alexander H,
Jakub Kicinski, Jason Wang, Siwei Liu, Jiri Pirko
In-Reply-To: <20180420183021-mutt-send-email-mst@kernel.org>
On Fri, Apr 20, 2018 at 8:34 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Fri, Apr 20, 2018 at 08:21:00AM -0700, Samudrala, Sridhar wrote:
>> > > + finfo = netdev_priv(failover_dev);
>> > > +
>> > > + primary_dev = rtnl_dereference(finfo->primary_dev);
>> > > + standby_dev = rtnl_dereference(finfo->standby_dev);
>> > > +
>> > > + if (slave_dev != primary_dev && slave_dev != standby_dev)
>> > > + goto done;
>> > > +
>> > > + if ((primary_dev && failover_xmit_ready(primary_dev)) ||
>> > > + (standby_dev && failover_xmit_ready(standby_dev))) {
>> > > + netif_carrier_on(failover_dev);
>> > > + netif_tx_wake_all_queues(failover_dev);
>> > > + } else {
>> > > + netif_carrier_off(failover_dev);
>> > > + netif_tx_stop_all_queues(failover_dev);
>> > And I think it's a good idea to get stats from device here too.
>>
>> Not sure why we need to get stats from lower devs here?
>
> link down is often indication of a hardware problem.
> lower dev might stop responding down the road.
>
>> > > +static const struct net_device_ops failover_dev_ops = {
>> > > + .ndo_open = failover_open,
>> > > + .ndo_stop = failover_close,
>> > > + .ndo_start_xmit = failover_start_xmit,
>> > > + .ndo_select_queue = failover_select_queue,
>> > > + .ndo_get_stats64 = failover_get_stats,
>> > > + .ndo_change_mtu = failover_change_mtu,
>> > > + .ndo_set_rx_mode = failover_set_rx_mode,
>> > > + .ndo_validate_addr = eth_validate_addr,
>> > > + .ndo_features_check = passthru_features_check,
>> > xdp support?
>>
>> I think it should be possible to add it be calling the lower dev ndo_xdp routines
>> with proper checks. can we add this later?
>
> I'd be concerned that if you don't xdp userspace will keep poking
> at lower devs. Then it will stop working if you add this later.
The failover device is better off not providing in-driver XDP since
there are already skbs allocated by the time that we see the packet
here anyway. As such generic XDP is the preferred way to handle this
since it will work regardless of what lower devices are present.
The only advantage of having XDP down at the virtio or VF level would
be that it performs better, but at the cost of complexity since we
would need to rebind the eBPF program any time a device is hotplugged
out and then back in. For now the current approach is in keeping with
how bonding and other similar drivers are currently handling this.
Thanks.
- Alex
^ permalink raw reply
* Re: [PATCH bpf-next] libbpf: fixed build error for samples/bpf/
From: Martin KaFai Lau @ 2018-04-20 15:59 UTC (permalink / raw)
To: Björn Töpel; +Cc: ast, daniel, netdev, Björn Töpel
In-Reply-To: <20180420080516.16683-1-bjorn.topel@gmail.com>
On Fri, Apr 20, 2018 at 10:05:16AM +0200, Björn Töpel wrote:
> From: Björn Töpel <bjorn.topel@intel.com>
>
> Commit 8a138aed4a80 ("bpf: btf: Add BTF support to libbpf") did not
> include stdbool.h, so GCC complained when building samples/bpf/.
>
> In file included from /home/btopel/src/ext/linux/samples/bpf/libbpf.h:6:0,
> from /home/btopel/src/ext/linux/samples/bpf/test_lru_dist.c:24:
> /home/btopel/src/ext/linux/tools/lib/bpf/bpf.h:105:4: error: unknown type name ‘bool’; did you mean ‘_Bool’?
> bool do_log);
> ^~~~
> _Bool
>
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
> ---
> tools/lib/bpf/bpf.h | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
> index 01bda076310f..553b11ad52b3 100644
> --- a/tools/lib/bpf/bpf.h
> +++ b/tools/lib/bpf/bpf.h
> @@ -24,6 +24,7 @@
> #define __BPF_BPF_H
>
> #include <linux/bpf.h>
> +#include <stdbool.h>
Thanks for the fix!
^ permalink raw reply
* Re: [PATCH v7 net-next 4/4] netvsc: refactor notifier/event handling code to use the failover framework
From: Jiri Pirko @ 2018-04-20 16:00 UTC (permalink / raw)
To: Stephen Hemminger
Cc: Sridhar Samudrala, mst, davem, netdev, virtualization, virtio-dev,
jesse.brandeburg, alexander.h.duyck, kubakici, jasowang,
loseweigh
In-Reply-To: <20180420082802.6ca37e4c@xeon-e3>
Fri, Apr 20, 2018 at 05:28:02PM CEST, stephen@networkplumber.org wrote:
>On Thu, 19 Apr 2018 18:42:04 -0700
>Sridhar Samudrala <sridhar.samudrala@intel.com> wrote:
>
>> Use the registration/notification framework supported by the generic
>> failover infrastructure.
>>
>> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
>
>Do what you want to other devices but leave netvsc alone.
>Adding these failover ops does not reduce the code size, and really is
>no benefit. The netvsc device driver needs to be backported to several
>other distributions and doing this makes that harder.
We should not care about the backport burden when we are trying to make
things right. And things are not right. The current netvsc approach is
just a plain wrong shortcut. It should have been done in a generic way
from the very beginning. We are just trying to fix this situation.
Moreover, I believe that part of the fix is to convert netvsc to the 3
netdev solution too. The 2 netdev model is wrong.
>
>I will NAK patches to change to common code for netvsc especially the
>three device model. MS worked hard with distro vendors to support transparent
>mode, ans we really can't have a new model; or do backport.
>
>Plus, DPDK is now dependent on existing model.
Sorry, but nobody here cares about dpdk or other similar oddities.
^ permalink raw reply
* Re: [virtio-dev] Re: [PATCH v7 net-next 2/4] net: Introduce generic failover module
From: Michael S. Tsirkin @ 2018-04-20 16:03 UTC (permalink / raw)
To: Alexander Duyck
Cc: Samudrala, Sridhar, Stephen Hemminger, David Miller, Netdev,
virtualization, virtio-dev, Brandeburg, Jesse, Duyck, Alexander H,
Jakub Kicinski, Jason Wang, Siwei Liu, Jiri Pirko
In-Reply-To: <CAKgT0UeQTx7zJPK3K3eM9xxHfVyHXwJ-G_b8eqGn0bWAyt9aAg@mail.gmail.com>
On Fri, Apr 20, 2018 at 08:56:57AM -0700, Alexander Duyck wrote:
> On Fri, Apr 20, 2018 at 8:34 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Fri, Apr 20, 2018 at 08:21:00AM -0700, Samudrala, Sridhar wrote:
> >> > > + finfo = netdev_priv(failover_dev);
> >> > > +
> >> > > + primary_dev = rtnl_dereference(finfo->primary_dev);
> >> > > + standby_dev = rtnl_dereference(finfo->standby_dev);
> >> > > +
> >> > > + if (slave_dev != primary_dev && slave_dev != standby_dev)
> >> > > + goto done;
> >> > > +
> >> > > + if ((primary_dev && failover_xmit_ready(primary_dev)) ||
> >> > > + (standby_dev && failover_xmit_ready(standby_dev))) {
> >> > > + netif_carrier_on(failover_dev);
> >> > > + netif_tx_wake_all_queues(failover_dev);
> >> > > + } else {
> >> > > + netif_carrier_off(failover_dev);
> >> > > + netif_tx_stop_all_queues(failover_dev);
> >> > And I think it's a good idea to get stats from device here too.
> >>
> >> Not sure why we need to get stats from lower devs here?
> >
> > link down is often indication of a hardware problem.
> > lower dev might stop responding down the road.
> >
> >> > > +static const struct net_device_ops failover_dev_ops = {
> >> > > + .ndo_open = failover_open,
> >> > > + .ndo_stop = failover_close,
> >> > > + .ndo_start_xmit = failover_start_xmit,
> >> > > + .ndo_select_queue = failover_select_queue,
> >> > > + .ndo_get_stats64 = failover_get_stats,
> >> > > + .ndo_change_mtu = failover_change_mtu,
> >> > > + .ndo_set_rx_mode = failover_set_rx_mode,
> >> > > + .ndo_validate_addr = eth_validate_addr,
> >> > > + .ndo_features_check = passthru_features_check,
> >> > xdp support?
> >>
> >> I think it should be possible to add it be calling the lower dev ndo_xdp routines
> >> with proper checks. can we add this later?
> >
> > I'd be concerned that if you don't xdp userspace will keep poking
> > at lower devs. Then it will stop working if you add this later.
>
> The failover device is better off not providing in-driver XDP since
> there are already skbs allocated by the time that we see the packet
> here anyway. As such generic XDP is the preferred way to handle this
> since it will work regardless of what lower devices are present.
>
> The only advantage of having XDP down at the virtio or VF level would
> be that it performs better, but at the cost of complexity since we
> would need to rebind the eBPF program any time a device is hotplugged
> out and then back in. For now the current approach is in keeping with
> how bonding and other similar drivers are currently handling this.
>
> Thanks.
>
> - Alex
OK fair enough.
--
MST
^ permalink raw reply
* Re: [net-next 1/3] tipc: set default MTU for UDP media
From: kbuild test robot @ 2018-04-20 16:06 UTC (permalink / raw)
To: GhantaKrishnamurthy MohanKrishna
Cc: kbuild-all, tipc-discussion, jon.maloy, maloy, ying.xue,
mohan.krishna.ghanta.krishnamurthy, netdev, davem
In-Reply-To: <1524128780-2550-2-git-send-email-mohan.krishna.ghanta.krishnamurthy@ericsson.com>
[-- Attachment #1: Type: text/plain, Size: 6973 bytes --]
Hi GhantaKrishnamurthy,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on net-next/master]
url: https://github.com/0day-ci/linux/commits/GhantaKrishnamurthy-MohanKrishna/tipc-Confgiuration-of-MTU-for-media-UDP/20180420-224412
config: i386-randconfig-a0-201815 (attached as .config)
compiler: gcc-4.9 (Debian 4.9.4-2) 4.9.4
reproduce:
# save the attached .config to linux build tree
make ARCH=i386
Note: the linux-review/GhantaKrishnamurthy-MohanKrishna/tipc-Confgiuration-of-MTU-for-media-UDP/20180420-224412 HEAD 5757244a45c9114ee8a7ed60e9b074107605f6eb builds fine.
It only hurts bisectability.
All errors (new ones prefixed by >>):
net/tipc/udp_media.c: In function 'tipc_udp_enable':
net/tipc/udp_media.c:716:20: error: 'struct tipc_media' has no member named 'mtu'
b->mtu = b->media->mtu;
^
net/tipc/udp_media.c: At top level:
>> net/tipc/udp_media.c:805:2: error: unknown field 'mtu' specified in initializer
.mtu = TIPC_DEF_LINK_UDP_MTU,
^
vim +/mtu +805 net/tipc/udp_media.c
632
633 /**
634 * tipc_udp_enable - callback to create a new udp bearer instance
635 * @net: network namespace
636 * @b: pointer to generic tipc_bearer
637 * @attrs: netlink bearer configuration
638 *
639 * validate the bearer parameters and initialize the udp bearer
640 * rtnl_lock should be held
641 */
642 static int tipc_udp_enable(struct net *net, struct tipc_bearer *b,
643 struct nlattr *attrs[])
644 {
645 int err = -EINVAL;
646 struct udp_bearer *ub;
647 struct udp_media_addr remote = {0};
648 struct udp_media_addr local = {0};
649 struct udp_port_cfg udp_conf = {0};
650 struct udp_tunnel_sock_cfg tuncfg = {NULL};
651 struct nlattr *opts[TIPC_NLA_UDP_MAX + 1];
652 u8 node_id[NODE_ID_LEN] = {0,};
653
654 ub = kzalloc(sizeof(*ub), GFP_ATOMIC);
655 if (!ub)
656 return -ENOMEM;
657
658 INIT_LIST_HEAD(&ub->rcast.list);
659
660 if (!attrs[TIPC_NLA_BEARER_UDP_OPTS])
661 goto err;
662
663 if (nla_parse_nested(opts, TIPC_NLA_UDP_MAX,
664 attrs[TIPC_NLA_BEARER_UDP_OPTS],
665 tipc_nl_udp_policy, NULL))
666 goto err;
667
668 if (!opts[TIPC_NLA_UDP_LOCAL] || !opts[TIPC_NLA_UDP_REMOTE]) {
669 pr_err("Invalid UDP bearer configuration");
670 err = -EINVAL;
671 goto err;
672 }
673
674 err = tipc_parse_udp_addr(opts[TIPC_NLA_UDP_LOCAL], &local,
675 &ub->ifindex);
676 if (err)
677 goto err;
678
679 err = tipc_parse_udp_addr(opts[TIPC_NLA_UDP_REMOTE], &remote, NULL);
680 if (err)
681 goto err;
682
683 /* Autoconfigure own node identity if needed */
684 if (!tipc_own_id(net)) {
685 memcpy(node_id, local.ipv6.in6_u.u6_addr8, 16);
686 tipc_net_init(net, node_id, 0);
687 }
688 if (!tipc_own_id(net)) {
689 pr_warn("Failed to set node id, please configure manually\n");
690 err = -EINVAL;
691 goto err;
692 }
693
694 b->bcast_addr.media_id = TIPC_MEDIA_TYPE_UDP;
695 b->bcast_addr.broadcast = TIPC_BROADCAST_SUPPORT;
696 rcu_assign_pointer(b->media_ptr, ub);
697 rcu_assign_pointer(ub->bearer, b);
698 tipc_udp_media_addr_set(&b->addr, &local);
699 if (local.proto == htons(ETH_P_IP)) {
700 struct net_device *dev;
701
702 dev = __ip_dev_find(net, local.ipv4.s_addr, false);
703 if (!dev) {
704 err = -ENODEV;
705 goto err;
706 }
707 udp_conf.family = AF_INET;
708 udp_conf.local_ip.s_addr = htonl(INADDR_ANY);
709 udp_conf.use_udp_checksums = false;
710 ub->ifindex = dev->ifindex;
711 if (tipc_mtu_bad(dev, sizeof(struct iphdr) +
712 sizeof(struct udphdr))) {
713 err = -EINVAL;
714 goto err;
715 }
> 716 b->mtu = b->media->mtu;
717 #if IS_ENABLED(CONFIG_IPV6)
718 } else if (local.proto == htons(ETH_P_IPV6)) {
719 udp_conf.family = AF_INET6;
720 udp_conf.use_udp6_tx_checksums = true;
721 udp_conf.use_udp6_rx_checksums = true;
722 udp_conf.local_ip6 = in6addr_any;
723 b->mtu = 1280;
724 #endif
725 } else {
726 err = -EAFNOSUPPORT;
727 goto err;
728 }
729 udp_conf.local_udp_port = local.port;
730 err = udp_sock_create(net, &udp_conf, &ub->ubsock);
731 if (err)
732 goto err;
733 tuncfg.sk_user_data = ub;
734 tuncfg.encap_type = 1;
735 tuncfg.encap_rcv = tipc_udp_recv;
736 tuncfg.encap_destroy = NULL;
737 setup_udp_tunnel_sock(net, ub->ubsock, &tuncfg);
738
739 /**
740 * The bcast media address port is used for all peers and the ip
741 * is used if it's a multicast address.
742 */
743 memcpy(&b->bcast_addr.value, &remote, sizeof(remote));
744 if (tipc_udp_is_mcast_addr(&remote))
745 err = enable_mcast(ub, &remote);
746 else
747 err = tipc_udp_rcast_add(b, &remote);
748 if (err)
749 goto err;
750
751 return 0;
752 err:
753 if (ub->ubsock)
754 udp_tunnel_sock_release(ub->ubsock);
755 kfree(ub);
756 return err;
757 }
758
759 /* cleanup_bearer - break the socket/bearer association */
760 static void cleanup_bearer(struct work_struct *work)
761 {
762 struct udp_bearer *ub = container_of(work, struct udp_bearer, work);
763 struct udp_replicast *rcast, *tmp;
764
765 list_for_each_entry_safe(rcast, tmp, &ub->rcast.list, list) {
766 list_del_rcu(&rcast->list);
767 kfree_rcu(rcast, rcu);
768 }
769
770 if (ub->ubsock)
771 udp_tunnel_sock_release(ub->ubsock);
772 synchronize_net();
773 kfree(ub);
774 }
775
776 /* tipc_udp_disable - detach bearer from socket */
777 static void tipc_udp_disable(struct tipc_bearer *b)
778 {
779 struct udp_bearer *ub;
780
781 ub = rcu_dereference_rtnl(b->media_ptr);
782 if (!ub) {
783 pr_err("UDP bearer instance not found\n");
784 return;
785 }
786 if (ub->ubsock)
787 sock_set_flag(ub->ubsock->sk, SOCK_DEAD);
788 RCU_INIT_POINTER(ub->bearer, NULL);
789
790 /* sock_release need to be done outside of rtnl lock */
791 INIT_WORK(&ub->work, cleanup_bearer);
792 schedule_work(&ub->work);
793 }
794
795 struct tipc_media udp_media_info = {
796 .send_msg = tipc_udp_send_msg,
797 .enable_media = tipc_udp_enable,
798 .disable_media = tipc_udp_disable,
799 .addr2str = tipc_udp_addr2str,
800 .addr2msg = tipc_udp_addr2msg,
801 .msg2addr = tipc_udp_msg2addr,
802 .priority = TIPC_DEF_LINK_PRI,
803 .tolerance = TIPC_DEF_LINK_TOL,
804 .window = TIPC_DEF_LINK_WIN,
> 805 .mtu = TIPC_DEF_LINK_UDP_MTU,
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 31332 bytes --]
^ permalink raw reply
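Both errors in the report above come from using a struct member before it is declared. Since the report says the head of the series builds fine, the member is presumably added by a later patch in the same series. A minimal sketch of the situation, using hypothetical `demo_*` names and a placeholder MTU value rather than the real TIPC definitions:

```c
#include <assert.h>

#define DEMO_DEF_LINK_UDP_MTU 14000u	/* placeholder value for illustration */

struct demo_media {
	unsigned int priority;
	unsigned int window;
	unsigned int mtu;	/* removing this member reproduces both errors */
};

/* Designated initializer: every named field must exist in the struct. */
static struct demo_media demo_udp_media = {
	.priority = 10,
	.window   = 50,
	.mtu      = DEMO_DEF_LINK_UDP_MTU,
};

struct demo_bearer {
	struct demo_media *media;
	unsigned int mtu;
};

/* Mirrors the shape of "b->mtu = b->media->mtu" from the report. */
static void demo_enable(struct demo_bearer *b)
{
	b->mtu = b->media->mtu;
}
```

Keeping each patch buildable on its own means the struct member has to be introduced in the same patch as, or an earlier patch than, its first user.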
* Re: [virtio-dev] [pci PATCH v7 2/5] virtio_pci: Add support for unmanaged SR-IOV on virtio_pci devices
From: Alexander Duyck @ 2018-04-20 16:08 UTC (permalink / raw)
To: Michael S. Tsirkin, Daly, Dan, Rustad, Mark D
Cc: Bjorn Helgaas, Duyck, Alexander H, linux-pci, virtio-dev, kvm,
Netdev, LKML, linux-nvme, Keith Busch, netanel, Don Dutile,
Maximilian Heyne, Wang, Liang-min, David Woodhouse,
Christoph Hellwig, dwmw
In-Reply-To: <20180420180839-mutt-send-email-mst@kernel.org>
On Fri, Apr 20, 2018 at 8:28 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Fri, Apr 20, 2018 at 07:56:14AM -0700, Alexander Duyck wrote:
>> > I think for virtio it should include the feature bit, yes.
>> > Adding feature bit is very easy - post a patch to the virtio TC mailing
>> > list, wait about a week to give people time to respond (two weeks if it
>> > is around holidays and such).
>>
>> The problem is we are talking about hardware/FPGA, not software.
>> Adding a feature bit means going back and updating RTL. The software
>> side of things is easy, re-validating things after a hardware/FPGA
>> change not so much.
>>
>> If this is a hard requirement I may just drop the virtio patch, push
>> what I have, and leave it to Mark/Dan to deal with the necessary RTL
>> and code changes needed to support Virtio as I don't expect the
>> turnaround to be as easy as just a patch.
>>
>> Thanks.
>>
>> - Alex
>
> Let's focus on virtio in this thread.
That is kind of what I was thinking, and why I was thinking it might
make sense to make the virtio specific changes a separate patch set. I
could get the PCI bits taken care of in the meantime since they affect
generic PCI, NVMe, and the Amazon ENA interfaces.
> Involving the virtio TC in host/guest interface changes is a
> hard requirement. It's just too easy to create conflicts otherwise.
>
> So you guys should have just sent the proposal to the TC when you
> were doing your RTL and you would have been in the clear.
Agreed. I believe I brought this up when I was originally asked to
look into the coding for this.
> Generally adding a feature bit with any extension is a good idea:
> this way you merely reserve a feature bit for your feature through
> the TC and are more or less sure of forward and backward compatibility.
> It's incredibly easy.
Agreed, though in this case I am not sure it makes sense since this
isn't necessarily something that is a Virtio feature itself. It is
just a side effect of the fact that they are adding SR-IOV support to
a device that happens to emulate Virtio NET and apparently their PF
has to be identical to the VF other than the PCIe extended config
space.
> But maybe it's not needed here. I am not making the decisions myself.
> Not too late: post to the TC list and let's see what the response is.
> Without a feature bit you are making a change affecting all future
> implementations without exception so the bar is a bit higher: you need
> to actually post a spec text proposal not just a patch showing how to
> use the feature, and TC needs to vote on it. Voting takes a week,
> review a week or two depending on change complexity.
>
> Hope this helps,
>
> --
> MST
I think I will leave this for Dan and Mark to handle since I am still
not all that familiar with the hardware in use here. Once a decision
has been made, Dan and Mark could look at pushing either the one line
patch or something more complex involving a feature flag.
Thanks.
Alex
^ permalink raw reply
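The compatibility argument for a feature bit in the thread above can be illustrated with a small negotiation model. This is a sketch under assumptions: `DEMO_F_SRIOV` and its bit number are hypothetical, and the helpers only model the general idea that a feature is active when both the device offers it and the driver accepts it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_F_SRIOV 40	/* hypothetical bit number, not a reserved one */

/* A feature is in effect only if both sides advertise it. */
static uint64_t demo_negotiate(uint64_t device_features,
			       uint64_t driver_features)
{
	return device_features & driver_features;
}

static bool demo_has_feature(uint64_t features, unsigned int bit)
{
	return bit < 64 && ((features >> bit) & 1u);
}
```

With this scheme an older driver that does not know the bit simply never enables the behavior, and a newer driver leaves it off on devices that do not offer it, which is the forward and backward compatibility mentioned in the quoted reply.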
* Re: [RFC PATCH ghak32 V2 10/13] audit: add containerid support for seccomp and anom_abend records
From: Paul Moore @ 2018-04-20 16:11 UTC (permalink / raw)
To: Richard Guy Briggs
Cc: simo, jlayton, carlos, linux-api, containers, LKML, Eric Paris,
dhowells, Linux-Audit Mailing List, ebiederm, luto, netdev,
linux-fsdevel, cgroups, serge, viro
In-Reply-To: <20180420004218.tgndd474wgueyjzk@madcap2.tricolour.ca>
On Thu, Apr 19, 2018 at 8:42 PM, Richard Guy Briggs <rgb@redhat.com> wrote:
> On 2018-04-18 21:31, Paul Moore wrote:
>> On Fri, Mar 16, 2018 at 5:00 AM, Richard Guy Briggs <rgb@redhat.com> wrote:
>> > Add container ID auxiliary records to secure computing and abnormal end
>> > standalone records.
>> >
>> > Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
>> > ---
>> > kernel/auditsc.c | 10 ++++++++--
>> > 1 file changed, 8 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/kernel/auditsc.c b/kernel/auditsc.c
>> > index 7103d23..2f02ed9 100644
>> > --- a/kernel/auditsc.c
>> > +++ b/kernel/auditsc.c
>> > @@ -2571,6 +2571,7 @@ static void audit_log_task(struct audit_buffer *ab)
>> > void audit_core_dumps(long signr)
>> > {
>> > struct audit_buffer *ab;
>> > + struct audit_context *context = audit_alloc_local();
>>
>> Looking quickly at do_coredump() I *believe* we can use current here.
>>
>> > if (!audit_enabled)
>> > return;
>> > @@ -2578,19 +2579,22 @@ void audit_core_dumps(long signr)
>> > if (signr == SIGQUIT) /* don't care for those */
>> > return;
>> >
>> > - ab = audit_log_start(NULL, GFP_KERNEL, AUDIT_ANOM_ABEND);
>> > + ab = audit_log_start(context, GFP_KERNEL, AUDIT_ANOM_ABEND);
>> > if (unlikely(!ab))
>> > return;
>> > audit_log_task(ab);
>> > audit_log_format(ab, " sig=%ld res=1", signr);
>> > audit_log_end(ab);
>> > + audit_log_container_info(context, "abend", audit_get_containerid(current));
>> > + audit_free_context(context);
>> > }
>> >
>> > void __audit_seccomp(unsigned long syscall, long signr, int code)
>> > {
>> > struct audit_buffer *ab;
>> > + struct audit_context *context = audit_alloc_local();
>>
>> We can definitely use current here.
>
> Ok, so both syscall aux records. That elimintes this patch from the
> set, can go in independently.
Yep. It should help shrink the audit container ID patchset and
perhaps more importantly it should put some distance between the
connected-record debate and the audit container ID debate.
I understand we are going to need a "local" context for some things,
the network packets are probably the best example, but whenever
possible I would like to connect these records back to a task's
context.
--
paul moore
www.paul-moore.com
^ permalink raw reply
* Re: [RFC PATCH ghak32 V2 05/13] audit: add containerid support for ptrace and signals
From: Paul Moore @ 2018-04-20 16:13 UTC (permalink / raw)
To: Richard Guy Briggs
Cc: cgroups, containers, linux-api, Linux-Audit Mailing List,
linux-fsdevel, LKML, netdev, ebiederm, luto, jlayton, carlos,
dhowells, viro, simo, Eric Paris, serge
In-Reply-To: <20180420010320.panie6mtdafxl65y@madcap2.tricolour.ca>
On Thu, Apr 19, 2018 at 9:03 PM, Richard Guy Briggs <rgb@redhat.com> wrote:
> On 2018-04-18 20:32, Paul Moore wrote:
>> On Fri, Mar 16, 2018 at 5:00 AM, Richard Guy Briggs <rgb@redhat.com> wrote:
...
>> > /*
>> > * audit_log_container_info - report container info
>> > - * @tsk: task to be recorded
>> > * @context: task or local context for record
>> > + * @op: containerid string description
>> > + * @containerid: container ID to report
>> > */
>> > -int audit_log_container_info(struct task_struct *tsk, struct audit_context *context)
>> > +int audit_log_container_info(struct audit_context *context,
>> > + char *op, u64 containerid)
>> > {
>> > struct audit_buffer *ab;
>> >
>> > - if (!audit_containerid_set(tsk))
>> > + if (!cid_valid(containerid))
>> > return 0;
>> > /* Generate AUDIT_CONTAINER_INFO with container ID */
>> > ab = audit_log_start(context, GFP_KERNEL, AUDIT_CONTAINER_INFO);
>> > if (!ab)
>> > return -ENOMEM;
>> > - audit_log_format(ab, "contid=%llu", audit_get_containerid(tsk));
>> > + audit_log_format(ab, "op=%s contid=%llu", op, containerid);
>> > audit_log_end(ab);
>> > return 0;
>> > }
>>
>> Let's get these changes into the first patch where
>> audit_log_container_info() is defined. Why? This inserts a new field
>> into the record which is a no-no. Yes, it is one single patchset, but
>> they are still separate patches and who knows which patches a given
>> distribution and/or tree may decide to backport.
>
> Fair enough. That first thought went through my mind... Would it be
> sufficient to move that field addition to the first patch and leave the
> rest here to support trace and signals?
I should have been more clear ... yes, that's what I was thinking; the
record format is the important part as it's user visible.
--
paul moore
www.paul-moore.com
^ permalink raw reply
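The point about the record format being user visible can be made concrete with a userspace sketch of the formatting step. The `demo_*` names are hypothetical, and the "unset" sentinel is an assumption for illustration; only the "op=%s contid=%llu" format string is taken from the quoted patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define DEMO_CID_UNSET UINT64_MAX	/* assumed "unset" sentinel */

static int demo_cid_valid(uint64_t containerid)
{
	return containerid != DEMO_CID_UNSET;
}

/*
 * Returns the number of characters written, 0 if no record is emitted
 * because the container ID is unset, or -1 if the buffer is too small.
 */
static int demo_format_container_info(char *buf, size_t len,
				      const char *op, uint64_t containerid)
{
	int n;

	if (!demo_cid_valid(containerid))
		return 0;
	n = snprintf(buf, len, "op=%s contid=%llu", op,
		     (unsigned long long)containerid);
	if (n < 0 || (size_t)n >= len)
		return -1;
	return n;
}
```

A parser written against an earlier "contid=..." only layout would see a different first field once "op=" is added, which is why the field belongs in the patch that first defines the record.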