Netdev List
 help / color / mirror / Atom feed
* [pci PATCH v8 3/4] nvme: Migrate over to unmanaged SR-IOV support
From: Alexander Duyck @ 2018-04-20 16:31 UTC (permalink / raw)
  To: bhelgaas, alexander.h.duyck, linux-pci
  Cc: virtio-dev, kvm, netdev, dan.daly, linux-kernel, linux-nvme,
	keith.busch, netanel, ddutile, mheyne, liang-min.wang,
	mark.d.rustad, dwmw2, hch, dwmw
In-Reply-To: <20180420162633.46077.49012.stgit@ahduyck-green-test.jf.intel.com>

Instead of implementing our own version of a SR-IOV configuration stub in
the nvme driver we can just reuse the existing
pci_sriov_configure_simple function.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---

v5: Replaced call to pci_sriov_configure_unmanaged with
        pci_sriov_configure_simple
v6: Dropped "#ifdef" checks for IOV wrapping sriov_configure definition
v7: No code change, added Reviewed-by

 drivers/nvme/host/pci.c |   20 +-------------------
 1 file changed, 1 insertion(+), 19 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index b6f43b7..ad85cf35 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2581,24 +2581,6 @@ static void nvme_remove(struct pci_dev *pdev)
 	nvme_put_ctrl(&dev->ctrl);
 }
 
-static int nvme_pci_sriov_configure(struct pci_dev *pdev, int numvfs)
-{
-	int ret = 0;
-
-	if (numvfs == 0) {
-		if (pci_vfs_assigned(pdev)) {
-			dev_warn(&pdev->dev,
-				"Cannot disable SR-IOV VFs while assigned\n");
-			return -EPERM;
-		}
-		pci_disable_sriov(pdev);
-		return 0;
-	}
-
-	ret = pci_enable_sriov(pdev, numvfs);
-	return ret ? ret : numvfs;
-}
-
 #ifdef CONFIG_PM_SLEEP
 static int nvme_suspend(struct device *dev)
 {
@@ -2717,7 +2699,7 @@ static void nvme_error_resume(struct pci_dev *pdev)
 	.driver		= {
 		.pm	= &nvme_dev_pm_ops,
 	},
-	.sriov_configure = nvme_pci_sriov_configure,
+	.sriov_configure = pci_sriov_configure_simple,
 	.err_handler	= &nvme_err_handler,
 };
 

^ permalink raw reply related

* [pci PATCH v8 2/4] ena: Migrate over to unmanaged SR-IOV support
From: Alexander Duyck @ 2018-04-20 16:30 UTC (permalink / raw)
  To: bhelgaas, alexander.h.duyck, linux-pci
  Cc: virtio-dev, kvm, netdev, dan.daly, linux-kernel, linux-nvme,
	keith.busch, netanel, ddutile, mheyne, liang-min.wang,
	mark.d.rustad, dwmw2, hch, dwmw
In-Reply-To: <20180420162633.46077.49012.stgit@ahduyck-green-test.jf.intel.com>

Instead of implementing our own version of a SR-IOV configuration stub in
the ena driver we can just reuse the existing
pci_sriov_configure_simple function.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---

v5: Replaced call to pci_sriov_configure_unmanaged with
        pci_sriov_configure_simple
v6: Dropped "#ifdef" checks for IOV wrapping sriov_configure definition
v7: No change

 drivers/net/ethernet/amazon/ena/ena_netdev.c |   28 +-------------------------
 1 file changed, 1 insertion(+), 27 deletions(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index a822e70..f2af87d 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -3386,32 +3386,6 @@ static int ena_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 }
 
 /*****************************************************************************/
-static int ena_sriov_configure(struct pci_dev *dev, int numvfs)
-{
-	int rc;
-
-	if (numvfs > 0) {
-		rc = pci_enable_sriov(dev, numvfs);
-		if (rc != 0) {
-			dev_err(&dev->dev,
-				"pci_enable_sriov failed to enable: %d vfs with the error: %d\n",
-				numvfs, rc);
-			return rc;
-		}
-
-		return numvfs;
-	}
-
-	if (numvfs == 0) {
-		pci_disable_sriov(dev);
-		return 0;
-	}
-
-	return -EINVAL;
-}
-
-/*****************************************************************************/
-/*****************************************************************************/
 
 /* ena_remove - Device Removal Routine
  * @pdev: PCI device information struct
@@ -3526,7 +3500,7 @@ static int ena_resume(struct pci_dev *pdev)
 	.suspend    = ena_suspend,
 	.resume     = ena_resume,
 #endif
-	.sriov_configure = ena_sriov_configure,
+	.sriov_configure = pci_sriov_configure_simple,
 };
 
 static int __init ena_init(void)

^ permalink raw reply related

* [pci PATCH v8 1/4] pci: Add pci_sriov_configure_simple for PFs that don't manage VF resources
From: Alexander Duyck @ 2018-04-20 16:28 UTC (permalink / raw)
  To: bhelgaas, alexander.h.duyck, linux-pci
  Cc: virtio-dev, kvm, netdev, dan.daly, linux-kernel, linux-nvme,
	keith.busch, netanel, ddutile, mheyne, liang-min.wang,
	mark.d.rustad, dwmw2, hch, dwmw
In-Reply-To: <20180420162633.46077.49012.stgit@ahduyck-green-test.jf.intel.com>

This patch adds a common configuration function called
pci_sriov_configure_simple that will allow for managing VFs on devices
where the PF is not capable of managing VF resources.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Mark Rustad <mark.d.rustad@intel.com>
---

v5: New patch replacing pci_sriov_configure_unmanaged with
      pci_sriov_configure_simple
    Dropped bits related to autoprobe changes
v6: Defined pci_sriov_configure_simple as NULL if IOV is disabled
v7: Updated pci_sriov_configure_simple to drop need for err value
    Fixed comment explaining why pci_sriov_configure_simple is NULL

 drivers/pci/iov.c   |   31 +++++++++++++++++++++++++++++++
 include/linux/pci.h |    3 +++
 2 files changed, 34 insertions(+)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 677924a..3e0a7fd 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -807,3 +807,34 @@ int pci_sriov_get_totalvfs(struct pci_dev *dev)
 	return dev->sriov->total_VFs;
 }
 EXPORT_SYMBOL_GPL(pci_sriov_get_totalvfs);
+
+/**
+ * pci_sriov_configure_simple - helper to configure unmanaged SR-IOV
+ * @dev: the PCI device
+ * @nr_virtfn: number of virtual functions to enable, 0 to disable
+ *
+ * Used to provide generic enable/disable SR-IOV option for devices
+ * that do not manage the VFs generated by their driver. Return value
+ * is negative on error, or number of VFs allocated on success.
+ */
+int pci_sriov_configure_simple(struct pci_dev *dev, int nr_virtfn)
+{
+	might_sleep();
+
+	if (!dev->is_physfn)
+		return -ENODEV;
+
+	if (pci_vfs_assigned(dev)) {
+		pci_warn(dev,
+			 "Cannot modify SR-IOV while VFs are assigned\n");
+		return -EPERM;
+	}
+
+	if (!nr_virtfn) {
+		sriov_disable(dev);
+		return 0;
+	}
+
+	return sriov_enable(dev, nr_virtfn) ? : nr_virtfn;
+}
+EXPORT_SYMBOL_GPL(pci_sriov_configure_simple);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index ae42289..7d36e39 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1955,6 +1955,7 @@ static inline void pci_mmcfg_late_init(void) { }
 int pci_vfs_assigned(struct pci_dev *dev);
 int pci_sriov_set_totalvfs(struct pci_dev *dev, u16 numvfs);
 int pci_sriov_get_totalvfs(struct pci_dev *dev);
+int pci_sriov_configure_simple(struct pci_dev *dev, int nr_virtfn);
 resource_size_t pci_iov_resource_size(struct pci_dev *dev, int resno);
 void pci_vf_drivers_autoprobe(struct pci_dev *dev, bool probe);
 #else
@@ -1982,6 +1983,8 @@ static inline int pci_sriov_set_totalvfs(struct pci_dev *dev, u16 numvfs)
 { return 0; }
 static inline int pci_sriov_get_totalvfs(struct pci_dev *dev)
 { return 0; }
+/* this is expected to be used as a function pointer, just define as NULL */
+#define pci_sriov_configure_simple NULL
 static inline resource_size_t pci_iov_resource_size(struct pci_dev *dev, int resno)
 { return 0; }
 static inline void pci_vf_drivers_autoprobe(struct pci_dev *dev, bool probe) { }

^ permalink raw reply related

* [pci PATCH v8 0/4] Add support for unmanaged SR-IOV
From: Alexander Duyck @ 2018-04-20 16:28 UTC (permalink / raw)
  To: bhelgaas, alexander.h.duyck, linux-pci
  Cc: virtio-dev, kvm, netdev, dan.daly, linux-kernel, linux-nvme,
	keith.busch, netanel, ddutile, mheyne, liang-min.wang,
	mark.d.rustad, dwmw2, hch, dwmw

This series is meant to add support for SR-IOV on devices when the VFs are
not managed by the kernel. Examples of recent patches attempting to do this
include:
virto - https://patchwork.kernel.org/patch/10241225/
pci-stub - https://patchwork.kernel.org/patch/10109935/
vfio - https://patchwork.kernel.org/patch/10103353/
uio - https://patchwork.kernel.org/patch/9974031/

Since this is quickly blowing up into a multi-driver problem it is probably
best to implement this solution as generically as possible.

This series is an attempt to do that. What we do with this patch set is
provide a generic framework to enable SR-IOV in the case that the PF driver
doesn't support managing the VFs itself.

I based my patch set originally on the patch by Mark Rustad but there isn't
much left after going through and cleaning out the bits that were no longer
needed, and after incorporating the feedback from David Miller. At this point
the only items to be fully reused was his patch description which is now
present in patch 3 of the set.

This solution is limited in scope to just adding support for devices that
provide no functionality for SR-IOV other than allocating the VFs by
calling pci_enable_sriov. Previous sets had included patches for VFIO, but
for now I am dropping that as the scope of that work is larger then I
think I can take on at this time.

v2: Reduced scope back to just virtio_pci and vfio-pci
    Broke into 3 patch set from single patch
    Changed autoprobe behavior to always set when num_vfs is set non-zero
v3: Updated Documentation to clarify when sriov_unmanaged_autoprobe is used
    Wrapped vfio_pci_sriov_configure to fix build errors w/o SR-IOV in kernel
v4: Dropped vfio-pci patch
    Added ena and nvme to drivers now using pci_sriov_configure_unmanaged
    Dropped pci_disable_sriov call in virtio_pci to be consistent with ena
v5: Dropped sriov_unmanaged_autoprobe and pci_sriov_conifgure_unmanaged
    Added new patch that enables pci_sriov_configure_simple
    Updated drivers to use pci_sriov_configure_simple
v6: Defined pci_sriov_configure_simple as NULL when SR-IOV is not enabled
    Updated drivers to drop "#ifdef" checks for IOV
    Added pci-pf-stub as place for PF-only drivers to add support
v7: Dropped pci_id table explanation from pci-pf-stub driver
    Updated pci_sriov_configure_simple to drop need for err value
    Fixed comment explaining why pci_sriov_configure_simple is NULL
v8: Dropped virtio from the set, support to be added later after TC approval

Cc: Mark Rustad <mark.d.rustad@intel.com>
Cc: Maximilian Heyne <mheyne@amazon.de>
Cc: Liang-Min Wang <liang-min.wang@intel.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>

---

Alexander Duyck (4):
      pci: Add pci_sriov_configure_simple for PFs that don't manage VF resources
      ena: Migrate over to unmanaged SR-IOV support
      nvme: Migrate over to unmanaged SR-IOV support
      pci-pf-stub: Add PF driver stub for PFs that function only to enable VFs


 drivers/net/ethernet/amazon/ena/ena_netdev.c |   28 -------------
 drivers/nvme/host/pci.c                      |   20 ----------
 drivers/pci/Kconfig                          |   12 ++++++
 drivers/pci/Makefile                         |    2 +
 drivers/pci/iov.c                            |   31 +++++++++++++++
 drivers/pci/pci-pf-stub.c                    |   54 ++++++++++++++++++++++++++
 include/linux/pci.h                          |    3 +
 include/linux/pci_ids.h                      |    2 +
 8 files changed, 106 insertions(+), 46 deletions(-)
 create mode 100644 drivers/pci/pci-pf-stub.c

--

^ permalink raw reply

* Re: [RFC PATCH ghak32 V2 06/13] audit: add support for non-syscall auxiliary records
From: Paul Moore @ 2018-04-20 16:21 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: cgroups, containers, linux-api, Linux-Audit Mailing List,
	linux-fsdevel, LKML, netdev, ebiederm, luto, jlayton, carlos,
	dhowells, viro, simo, Eric Paris, serge
In-Reply-To: <20180420012346.udnga5pfdjoazcfc@madcap2.tricolour.ca>

On Thu, Apr 19, 2018 at 9:23 PM, Richard Guy Briggs <rgb@redhat.com> wrote:
> On 2018-04-18 20:39, Paul Moore wrote:
>> On Fri, Mar 16, 2018 at 5:00 AM, Richard Guy Briggs <rgb@redhat.com> wrote:
>> > Standalone audit records have the timestamp and serial number generated
>> > on the fly and as such are unique, making them standalone.  This new
>> > function audit_alloc_local() generates a local audit context that will
>> > be used only for a standalone record and its auxiliary record(s).  The
>> > context is discarded immediately after the local associated records are
>> > produced.
>> >
>> > Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
>> > ---
>> >  include/linux/audit.h |  8 ++++++++
>> >  kernel/auditsc.c      | 20 +++++++++++++++++++-
>> >  2 files changed, 27 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/include/linux/audit.h b/include/linux/audit.h
>> > index ed16bb6..c0b83cb 100644
>> > --- a/include/linux/audit.h
>> > +++ b/include/linux/audit.h
>> > @@ -227,7 +227,9 @@ static inline int audit_log_container_info(struct audit_context *context,
>> >  /* These are defined in auditsc.c */
>> >                                 /* Public API */
>> >  extern int  audit_alloc(struct task_struct *task);
>> > +extern struct audit_context *audit_alloc_local(void);
>> >  extern void __audit_free(struct task_struct *task);
>> > +extern void audit_free_context(struct audit_context *context);
>> >  extern void __audit_syscall_entry(int major, unsigned long a0, unsigned long a1,
>> >                                   unsigned long a2, unsigned long a3);
>> >  extern void __audit_syscall_exit(int ret_success, long ret_value);
>> > @@ -472,6 +474,12 @@ static inline int audit_alloc(struct task_struct *task)
>> >  {
>> >         return 0;
>> >  }
>> > +static inline struct audit_context *audit_alloc_local(void)
>> > +{
>> > +       return NULL;
>> > +}
>> > +static inline void audit_free_context(struct audit_context *context)
>> > +{ }
>> >  static inline void audit_free(struct task_struct *task)
>> >  { }
>> >  static inline void audit_syscall_entry(int major, unsigned long a0,
>> > diff --git a/kernel/auditsc.c b/kernel/auditsc.c
>> > index 2932ef1..7103d23 100644
>> > --- a/kernel/auditsc.c
>> > +++ b/kernel/auditsc.c
>> > @@ -959,8 +959,26 @@ int audit_alloc(struct task_struct *tsk)
>> >         return 0;
>> >  }
>> >
>> > -static inline void audit_free_context(struct audit_context *context)
>> > +struct audit_context *audit_alloc_local(void)
>> >  {
>> > +       struct audit_context *context;
>> > +
>> > +       if (!audit_ever_enabled)
>> > +               return NULL; /* Return if not auditing. */
>> > +
>> > +       context = audit_alloc_context(AUDIT_RECORD_CONTEXT);
>> > +       if (!context)
>> > +               return NULL;
>> > +       context->serial = audit_serial();
>> > +       context->ctime = current_kernel_time64();
>> > +       context->in_syscall = 1;
>> > +       return context;
>> > +}
>> > +
>> > +inline void audit_free_context(struct audit_context *context)
>> > +{
>> > +       if (!context)
>> > +               return;
>> >         audit_free_names(context);
>> >         unroll_tree_refs(context, NULL, 0);
>> >         free_tree_refs(context);
>>
>> I'm reserving the option to comment on this idea further as I make my
>> way through the patchset, but audit_free_context() definitely
>> shouldn't be declared as an inline function.
>
> Ok, I think I follow.  When it wasn't exported, inline was fine, but now
> that it has been exported, it should no longer be inlined ...

Pretty much.  Based on a few comments I've seen by compiler folks over
the years, my current thinking is that we shouldn't worry about
explicit inlining static functions in C files (header files are a
different story).  The basic idea being that the compiler almost
always does a better job than us stupid developers.

> ... or should use
> an intermediate function name to export so that local uses of it can
> remain inline.

Possibly, but my guess is that the compiler could (will?) do that by
itself for code that lives in the same file.

-- 
paul moore
www.paul-moore.com

^ permalink raw reply

* Re: [PATCH net-next 2/2] netns: isolate seqnums to use per-netns locks
From: Christian Brauner @ 2018-04-20 16:16 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: davem, netdev, linux-kernel, avagin, ktkhai, serge, gregkh
In-Reply-To: <20180420135627.GA8350@gmail.com>

On Fri, Apr 20, 2018 at 03:56:28PM +0200, Christian Brauner wrote:
> On Wed, Apr 18, 2018 at 11:52:47PM +0200, Christian Brauner wrote:
> > On Wed, Apr 18, 2018 at 11:55:52AM -0500, Eric W. Biederman wrote:
> > > Christian Brauner <christian.brauner@ubuntu.com> writes:
> > > 
> > > > Now that it's possible to have a different set of uevents in different
> > > > network namespaces, per-network namespace uevent sequence numbers are
> > > > introduced. This increases performance as locking is now restricted to the
> > > > network namespace affected by the uevent rather than locking
> > > > everything.
> > > 
> > > Numbers please.  I personally expect that the netlink mc_list issues
> > > will swamp any benefit you get from this.
> > 
> > I wouldn't see how this would be the case. The gist of this is:
> > Everytime you send a uevent into a network namespace *not* owned by
> > init_user_ns you currently *have* to take mutex_lock(uevent_sock_list)
> > effectively blocking the host from processing uevents even though
> > - the uevent you're receiving might be totally different from the
> >   uevent that you're sending
> > - the uevent socket of the non-init_user_ns owned network namespace
> >   isn't even recorded in the list.
> > 
> > The other argument is that we now have properly isolated network
> > namespaces wrt to uevents such that each netns can have its own set of
> > uevents. This can either happen by a sufficiently privileged userspace
> > process sending it uevents that are only dedicated to that specific
> > netns. Or - and this *has been true for a long time* - because network
> > devices are *properly namespaced*. Meaning a uevent for that network
> > device is *tied to a network namespace*. For both cases the uevent
> > sequence numbering will be absolutely misleading. For example, whenever
> > you create e.g. a new veth device in a new network namespace it
> > shouldn't be accounted against the initial network namespace but *only*
> > against the network namespace that has that device added to it.
> 
> Eric, I did the testing. Here's what I did:
> 
> I compiled two 4.17-rc1 Kernels:
> - one with per netns uevent seqnums with decoupled locking
> - one without per netns uevent seqnums with decoupled locking
> 
> # Testcase 1:
> Only Injecting Uevents into network namespaces not owned by the initial user
> namespace.
> - created 1000 new user namespace + network namespace pairs
> - opened a uevent listener in each of those namespace pairs
> - injected uevents into each of those network namespaces 10,000 times meaning
>   10,000,000 (10 million) uevents were injected. (The high number of
>   uevent injections should get rid of a lot of jitter.)
> - Calculated the mean transaction time.
> - *without* uevent sequence number namespacing:
>   67 μs
> - *with* uevent sequence number namespacing:
>   55 μs
> - makes a difference of 12 μs
> 
> # Testcase 2:
> Injecting Uevents into network namespaces not owned by the initial user
> namespace and network namespaces owned by the initial user namespace.
> - created 500 new user namespace + network namespace pairs
> - created 500 new network namespace pairs
> - opened a uevent listener in each of those namespace pairs
> - injected uevents into each of those network namespaces 10,000 times meaning
>   10,000,000 (10 million) uevents were injected. (The high number of
>   uevent injections should get rid of a lot of jitter.)
> - Calculated the mean transaction time.
> - *without* uevent sequence number namespacing:
>   572 μs
> - *with* uevent sequence number namespacing:
>   514 μs
> - makes a difference of 58 μs
> 
> So there's performance gain. The third case would be to create a bunch
> of hanging processes that send SIGSTOP to themselves but do not actually
> open a uevent socket in their respective namespaces and then inject
> uevents into them. I expect there to be an even more performance
> benefits since the rtnl_table_lock() isn't hit in this case because
> there are no listeners.

I did the third test-case as well so:
- created 500 new user namespace + network namespace pairs *without
  uevent listeners*
- created 500 new network namespace pairs *without uevent listeners*
- injected uevents into each of those network namespaces 10,000 times meaning
  10,000,000 (10 million) uevents were injected. (The high number of
  uevent injections should get rid of a lot of jitter.)
- Calculated the mean transaction time.
- *without* uevent sequence number namespacing:
  206 μs
- *with* uevent sequence number namespacing:
  163 μs
- makes a difference of 43 μs

So this test-case shows performance improvement as well.

Thanks!
Christian

^ permalink raw reply

* [PATCH net] bpf: sockmap remove dead check
From: Jann Horn @ 2018-04-20 16:16 UTC (permalink / raw)
  To: ast, daniel, netdev, linux-kernel, john.fastabend, jannh
  Cc: linux-kernel, John Fastabend

Remove dead code that bails on `attr->value_size > KMALLOC_MAX_SIZE` - the
previous check already bails on `attr->value_size != 4`.

Signed-off-by: Jann Horn <jannh@google.com>
---
 kernel/bpf/sockmap.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index 8dd9210d7db7..a3b21385e947 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -1442,9 +1442,6 @@ static struct bpf_map *sock_map_alloc(union bpf_attr *attr)
 	    attr->value_size != 4 || attr->map_flags & ~SOCK_CREATE_FLAG_MASK)
 		return ERR_PTR(-EINVAL);
 
-	if (attr->value_size > KMALLOC_MAX_SIZE)
-		return ERR_PTR(-E2BIG);
-
 	err = bpf_tcp_ulp_register();
 	if (err && err != -EEXIST)
 		return ERR_PTR(err);
-- 
2.17.0.484.g0c8726318c-goog

^ permalink raw reply related

* Re: [virtio-dev] [pci PATCH v7 2/5] virtio_pci: Add support for unmanaged SR-IOV on virtio_pci devices
From: Michael S. Tsirkin @ 2018-04-20 16:14 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Daly, Dan, Rustad, Mark D, Bjorn Helgaas, Duyck, Alexander H,
	linux-pci, virtio-dev, kvm, Netdev, LKML, linux-nvme, Keith Busch,
	netanel, Don Dutile, Maximilian Heyne, Wang, Liang-min,
	David Woodhouse, Christoph Hellwig, dwmw
In-Reply-To: <CAKgT0UeWzK=_8m9hhWfaKyYB02WXDsuCkgowS4hgQ9GkLMcAMA@mail.gmail.com>

On Fri, Apr 20, 2018 at 09:08:51AM -0700, Alexander Duyck wrote:
> On Fri, Apr 20, 2018 at 8:28 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Fri, Apr 20, 2018 at 07:56:14AM -0700, Alexander Duyck wrote:
> >> > I think for virtio it should include the feature bit, yes.
> >> > Adding feature bit is very easy - post a patch to the virtio TC mailing
> >> > list, wait about a week to give people time to respond (two weeks if it
> >> > is around holidays and such).
> >>
> >> The problem is we are talking about hardware/FPGA, not software.
> >> Adding a feature bit means going back and updating RTL. The software
> >> side of things is easy, re-validating things after a hardware/FPGA
> >> change not so much.
> >>
> >> If this is a hard requirement I may just drop the virtio patch, push
> >> what I have, and leave it to Mark/Dan to deal with the necessary RTL
> >> and code changes needed to support Virtio as I don't expect the
> >> turnaround to be as easy as just a patch.
> >>
> >> Thanks.
> >>
> >> - Alex
> >
> > Let's focus on virtio in this thread.
> 
> That is kind of what I was thinking, and why I was thinking it might
> make sense to make the virtio specific changes a separate patch set. I
> could get the PCI bits taken care of in the meantime since they effect
> genetic PCI, NVMe, and the Amazon ENA interfaces.
> 
> > Involving the virtio TC in host/guest interface changes is a
> > hard requirement. It's just too easy to create conflicts otherwise.
> >
> > So you guys should have just sent the proposal to the TC when you
> > were doing your RTL and you would have been in the clear.
> 
> Agreed. I believe I brought this up when I was originally asked to
> look into the coding for this.
> 
> > Generally adding a feature bit with any extension is a good idea:
> > this way you merely reserve a feature bit for your feature through
> > the TC and are more or less sure of forward and backward compatibility.
> > It's incredibly easy.
> 
> Agreed, though in this case I am not sure it makes sense since this
> isn't necessarily something that is a Virtio feature itself. It is
> just a side effect of the fact that they are adding SR-IOV support to
> a device that happens to emulate Virtio NET and apparently their PF
> has to be identical to the VF other than the PCIe extended config
> space.

I got that. My point is not everyone implementing SR-IOV will
want to do it like this. Others might want to have VFs
be different from PFs somehow. Feature bits ensure forward
not just backward compatibility.


> > But maybe it's not needed here.  I am not making the decisions myself.
> > Not too late: post to the TC list and let's see what the response is.
> > Without a feature bit you are making a change affecting all future
> > implementations without exception so the bar is a bit higher: you need
> > to actually post a spec text proposal not just a patch showing how to
> > use the feature, and TC needs to vote on it. Voting takes a week,
> > review a week or two depending on change complexity.
> >
> > Hope this helps,
> >
> > --
> > MST
> 
> I think I will leave this for Dan and Mark to handle since I am still
> not all that familiar with the hardware in use here. Once a decision
> has been made him and Mark could look at pushing either the one line
> patch or something more complex involving a feature flag.
> 
> Thanks.
> 
> Alex

As long as the TC is involved.

I know it's a bit of a strange thing to block it at the driver level,
the issue is with the device, but it's literally the only handle I have
to prevent people from doing out of spec hacks then pushing it all on us
to maintain.

-- 
MST

^ permalink raw reply

* Re: [RFC PATCH ghak32 V2 05/13] audit: add containerid support for ptrace and signals
From: Paul Moore @ 2018-04-20 16:13 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: cgroups, containers, linux-api, Linux-Audit Mailing List,
	linux-fsdevel, LKML, netdev, ebiederm, luto, jlayton, carlos,
	dhowells, viro, simo, Eric Paris, serge
In-Reply-To: <20180420010320.panie6mtdafxl65y@madcap2.tricolour.ca>

On Thu, Apr 19, 2018 at 9:03 PM, Richard Guy Briggs <rgb@redhat.com> wrote:
> On 2018-04-18 20:32, Paul Moore wrote:
>> On Fri, Mar 16, 2018 at 5:00 AM, Richard Guy Briggs <rgb@redhat.com> wrote:

...

>> >  /*
>> >   * audit_log_container_info - report container info
>> > - * @tsk: task to be recorded
>> >   * @context: task or local context for record
>> > + * @op: containerid string description
>> > + * @containerid: container ID to report
>> >   */
>> > -int audit_log_container_info(struct task_struct *tsk, struct audit_context *context)
>> > +int audit_log_container_info(struct audit_context *context,
>> > +                             char *op, u64 containerid)
>> >  {
>> >         struct audit_buffer *ab;
>> >
>> > -       if (!audit_containerid_set(tsk))
>> > +       if (!cid_valid(containerid))
>> >                 return 0;
>> >         /* Generate AUDIT_CONTAINER_INFO with container ID */
>> >         ab = audit_log_start(context, GFP_KERNEL, AUDIT_CONTAINER_INFO);
>> >         if (!ab)
>> >                 return -ENOMEM;
>> > -       audit_log_format(ab, "contid=%llu", audit_get_containerid(tsk));
>> > +       audit_log_format(ab, "op=%s contid=%llu", op, containerid);
>> >         audit_log_end(ab);
>> >         return 0;
>> >  }
>>
>> Let's get these changes into the first patch where
>> audit_log_container_info() is defined.  Why?  This inserts a new field
>> into the record which is a no-no.  Yes, it is one single patchset, but
>> they are still separate patches and who knows which patches a given
>> distribution and/or tree may decide to backport.
>
> Fair enough.  That first thought went through my mind...  Would it be
> sufficient to move that field addition to the first patch and leave the
> rest here to support trace and signals?

I should have been more clear ... yes, that's what I was thinking; the
record format is the important part as it's user visible.

-- 
paul moore
www.paul-moore.com

^ permalink raw reply

* Re: [RFC PATCH ghak32 V2 10/13] audit: add containerid support for seccomp and anom_abend records
From: Paul Moore @ 2018-04-20 16:11 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: simo, jlayton, carlos, linux-api, containers, LKML, Eric Paris,
	dhowells, Linux-Audit Mailing List, ebiederm, luto, netdev,
	linux-fsdevel, cgroups, serge, viro
In-Reply-To: <20180420004218.tgndd474wgueyjzk@madcap2.tricolour.ca>

On Thu, Apr 19, 2018 at 8:42 PM, Richard Guy Briggs <rgb@redhat.com> wrote:
> On 2018-04-18 21:31, Paul Moore wrote:
>> On Fri, Mar 16, 2018 at 5:00 AM, Richard Guy Briggs <rgb@redhat.com> wrote:
>> > Add container ID auxiliary records to secure computing and abnormal end
>> > standalone records.
>> >
>> > Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
>> > ---
>> >  kernel/auditsc.c | 10 ++++++++--
>> >  1 file changed, 8 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/kernel/auditsc.c b/kernel/auditsc.c
>> > index 7103d23..2f02ed9 100644
>> > --- a/kernel/auditsc.c
>> > +++ b/kernel/auditsc.c
>> > @@ -2571,6 +2571,7 @@ static void audit_log_task(struct audit_buffer *ab)
>> >  void audit_core_dumps(long signr)
>> >  {
>> >         struct audit_buffer *ab;
>> > +       struct audit_context *context = audit_alloc_local();
>>
>> Looking quickly at do_coredump() I *believe* we can use current here.
>>
>> >         if (!audit_enabled)
>> >                 return;
>> > @@ -2578,19 +2579,22 @@ void audit_core_dumps(long signr)
>> >         if (signr == SIGQUIT)   /* don't care for those */
>> >                 return;
>> >
>> > -       ab = audit_log_start(NULL, GFP_KERNEL, AUDIT_ANOM_ABEND);
>> > +       ab = audit_log_start(context, GFP_KERNEL, AUDIT_ANOM_ABEND);
>> >         if (unlikely(!ab))
>> >                 return;
>> >         audit_log_task(ab);
>> >         audit_log_format(ab, " sig=%ld res=1", signr);
>> >         audit_log_end(ab);
>> > +       audit_log_container_info(context, "abend", audit_get_containerid(current));
>> > +       audit_free_context(context);
>> >  }
>> >
>> >  void __audit_seccomp(unsigned long syscall, long signr, int code)
>> >  {
>> >         struct audit_buffer *ab;
>> > +       struct audit_context *context = audit_alloc_local();
>>
>> We can definitely use current here.
>
> Ok, so both syscall aux records.  That elimintes this patch from the
> set, can go in independently.

Yep.  It should help shrink the audit container ID patchset and
perhaps more importantly it should put some distance between the
connected-record debate and the audit container ID debate.

I understand we are going to need a "local" context for some things,
the network packets are probably the best example, but whenever
possible I would like to connect these records back to a task's
context.

-- 
paul moore
www.paul-moore.com

^ permalink raw reply

* Re: [virtio-dev] [pci PATCH v7 2/5] virtio_pci: Add support for unmanaged SR-IOV on virtio_pci devices
From: Alexander Duyck @ 2018-04-20 16:08 UTC (permalink / raw)
  To: Michael S. Tsirkin, Daly, Dan, Rustad, Mark D
  Cc: Bjorn Helgaas, Duyck, Alexander H, linux-pci, virtio-dev, kvm,
	Netdev, LKML, linux-nvme, Keith Busch, netanel, Don Dutile,
	Maximilian Heyne, Wang, Liang-min, David Woodhouse,
	Christoph Hellwig, dwmw
In-Reply-To: <20180420180839-mutt-send-email-mst@kernel.org>

On Fri, Apr 20, 2018 at 8:28 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Fri, Apr 20, 2018 at 07:56:14AM -0700, Alexander Duyck wrote:
>> > I think for virtio it should include the feature bit, yes.
>> > Adding feature bit is very easy - post a patch to the virtio TC mailing
>> > list, wait about a week to give people time to respond (two weeks if it
>> > is around holidays and such).
>>
>> The problem is we are talking about hardware/FPGA, not software.
>> Adding a feature bit means going back and updating RTL. The software
>> side of things is easy, re-validating things after a hardware/FPGA
>> change not so much.
>>
>> If this is a hard requirement I may just drop the virtio patch, push
>> what I have, and leave it to Mark/Dan to deal with the necessary RTL
>> and code changes needed to support Virtio as I don't expect the
>> turnaround to be as easy as just a patch.
>>
>> Thanks.
>>
>> - Alex
>
> Let's focus on virtio in this thread.

That is kind of what I was thinking, and why I was thinking it might
make sense to make the virtio specific changes a separate patch set. I
could get the PCI bits taken care of in the meantime since they effect
genetic PCI, NVMe, and the Amazon ENA interfaces.

> Involving the virtio TC in host/guest interface changes is a
> hard requirement. It's just too easy to create conflicts otherwise.
>
> So you guys should have just sent the proposal to the TC when you
> were doing your RTL and you would have been in the clear.

Agreed. I believe I brought this up when I was originally asked to
look into the coding for this.

> Generally adding a feature bit with any extension is a good idea:
> this way you merely reserve a feature bit for your feature through
> the TC and are more or less sure of forward and backward compatibility.
> It's incredibly easy.

Agreed, though in this case I am not sure it makes sense since this
isn't necessarily something that is a Virtio feature itself. It is
just a side effect of the fact that they are adding SR-IOV support to
a device that happens to emulate Virtio NET and apparently their PF
has to be identical to the VF other than the PCIe extended config
space.

> But maybe it's not needed here.  I am not making the decisions myself.
> Not too late: post to the TC list and let's see what the response is.
> Without a feature bit you are making a change affecting all future
> implementations without exception so the bar is a bit higher: you need
> to actually post a spec text proposal not just a patch showing how to
> use the feature, and TC needs to vote on it. Voting takes a week,
> review a week or two depending on change complexity.
>
> Hope this helps,
>
> --
> MST

I think I will leave this for Dan and Mark to handle since I am still
not all that familiar with the hardware in use here. Once a decision
has been made him and Mark could look at pushing either the one line
patch or something more complex involving a feature flag.

Thanks.

Alex

^ permalink raw reply

* Re: [net-next 1/3] tipc: set default MTU for UDP media
From: kbuild test robot @ 2018-04-20 16:06 UTC (permalink / raw)
  To: GhantaKrishnamurthy MohanKrishna
  Cc: kbuild-all, tipc-discussion, jon.maloy, maloy, ying.xue,
	mohan.krishna.ghanta.krishnamurthy, netdev, davem
In-Reply-To: <1524128780-2550-2-git-send-email-mohan.krishna.ghanta.krishnamurthy@ericsson.com>

[-- Attachment #1: Type: text/plain, Size: 6973 bytes --]

Hi GhantaKrishnamurthy,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on net-next/master]

url:    https://github.com/0day-ci/linux/commits/GhantaKrishnamurthy-MohanKrishna/tipc-Confgiuration-of-MTU-for-media-UDP/20180420-224412
config: i386-randconfig-a0-201815 (attached as .config)
compiler: gcc-4.9 (Debian 4.9.4-2) 4.9.4
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

Note: the linux-review/GhantaKrishnamurthy-MohanKrishna/tipc-Confgiuration-of-MTU-for-media-UDP/20180420-224412 HEAD 5757244a45c9114ee8a7ed60e9b074107605f6eb builds fine.
      It only hurts bisectibility.

All errors (new ones prefixed by >>):

   net/tipc/udp_media.c: In function 'tipc_udp_enable':
   net/tipc/udp_media.c:716:20: error: 'struct tipc_media' has no member named 'mtu'
      b->mtu = b->media->mtu;
                       ^
   net/tipc/udp_media.c: At top level:
>> net/tipc/udp_media.c:805:2: error: unknown field 'mtu' specified in initializer
     .mtu  = TIPC_DEF_LINK_UDP_MTU,
     ^

vim +/mtu +805 net/tipc/udp_media.c

   632	
   633	/**
   634	 * tipc_udp_enable - callback to create a new udp bearer instance
   635	 * @net:	network namespace
   636	 * @b:		pointer to generic tipc_bearer
   637	 * @attrs:	netlink bearer configuration
   638	 *
   639	 * validate the bearer parameters and initialize the udp bearer
   640	 * rtnl_lock should be held
   641	 */
   642	static int tipc_udp_enable(struct net *net, struct tipc_bearer *b,
   643				   struct nlattr *attrs[])
   644	{
   645		int err = -EINVAL;
   646		struct udp_bearer *ub;
   647		struct udp_media_addr remote = {0};
   648		struct udp_media_addr local = {0};
   649		struct udp_port_cfg udp_conf = {0};
   650		struct udp_tunnel_sock_cfg tuncfg = {NULL};
   651		struct nlattr *opts[TIPC_NLA_UDP_MAX + 1];
   652		u8 node_id[NODE_ID_LEN] = {0,};
   653	
   654		ub = kzalloc(sizeof(*ub), GFP_ATOMIC);
   655		if (!ub)
   656			return -ENOMEM;
   657	
   658		INIT_LIST_HEAD(&ub->rcast.list);
   659	
   660		if (!attrs[TIPC_NLA_BEARER_UDP_OPTS])
   661			goto err;
   662	
   663		if (nla_parse_nested(opts, TIPC_NLA_UDP_MAX,
   664				     attrs[TIPC_NLA_BEARER_UDP_OPTS],
   665				     tipc_nl_udp_policy, NULL))
   666			goto err;
   667	
   668		if (!opts[TIPC_NLA_UDP_LOCAL] || !opts[TIPC_NLA_UDP_REMOTE]) {
   669			pr_err("Invalid UDP bearer configuration");
   670			err = -EINVAL;
   671			goto err;
   672		}
   673	
   674		err = tipc_parse_udp_addr(opts[TIPC_NLA_UDP_LOCAL], &local,
   675					  &ub->ifindex);
   676		if (err)
   677			goto err;
   678	
   679		err = tipc_parse_udp_addr(opts[TIPC_NLA_UDP_REMOTE], &remote, NULL);
   680		if (err)
   681			goto err;
   682	
   683		/* Autoconfigure own node identity if needed */
   684		if (!tipc_own_id(net)) {
   685			memcpy(node_id, local.ipv6.in6_u.u6_addr8, 16);
   686			tipc_net_init(net, node_id, 0);
   687		}
   688		if (!tipc_own_id(net)) {
   689			pr_warn("Failed to set node id, please configure manually\n");
   690			err = -EINVAL;
   691			goto err;
   692		}
   693	
   694		b->bcast_addr.media_id = TIPC_MEDIA_TYPE_UDP;
   695		b->bcast_addr.broadcast = TIPC_BROADCAST_SUPPORT;
   696		rcu_assign_pointer(b->media_ptr, ub);
   697		rcu_assign_pointer(ub->bearer, b);
   698		tipc_udp_media_addr_set(&b->addr, &local);
   699		if (local.proto == htons(ETH_P_IP)) {
   700			struct net_device *dev;
   701	
   702			dev = __ip_dev_find(net, local.ipv4.s_addr, false);
   703			if (!dev) {
   704				err = -ENODEV;
   705				goto err;
   706			}
   707			udp_conf.family = AF_INET;
   708			udp_conf.local_ip.s_addr = htonl(INADDR_ANY);
   709			udp_conf.use_udp_checksums = false;
   710			ub->ifindex = dev->ifindex;
   711			if (tipc_mtu_bad(dev, sizeof(struct iphdr) +
   712					      sizeof(struct udphdr))) {
   713				err = -EINVAL;
   714				goto err;
   715			}
 > 716			b->mtu = b->media->mtu;
   717	#if IS_ENABLED(CONFIG_IPV6)
   718		} else if (local.proto == htons(ETH_P_IPV6)) {
   719			udp_conf.family = AF_INET6;
   720			udp_conf.use_udp6_tx_checksums = true;
   721			udp_conf.use_udp6_rx_checksums = true;
   722			udp_conf.local_ip6 = in6addr_any;
   723			b->mtu = 1280;
   724	#endif
   725		} else {
   726			err = -EAFNOSUPPORT;
   727			goto err;
   728		}
   729		udp_conf.local_udp_port = local.port;
   730		err = udp_sock_create(net, &udp_conf, &ub->ubsock);
   731		if (err)
   732			goto err;
   733		tuncfg.sk_user_data = ub;
   734		tuncfg.encap_type = 1;
   735		tuncfg.encap_rcv = tipc_udp_recv;
   736		tuncfg.encap_destroy = NULL;
   737		setup_udp_tunnel_sock(net, ub->ubsock, &tuncfg);
   738	
   739		/**
   740		 * The bcast media address port is used for all peers and the ip
   741		 * is used if it's a multicast address.
   742		 */
   743		memcpy(&b->bcast_addr.value, &remote, sizeof(remote));
   744		if (tipc_udp_is_mcast_addr(&remote))
   745			err = enable_mcast(ub, &remote);
   746		else
   747			err = tipc_udp_rcast_add(b, &remote);
   748		if (err)
   749			goto err;
   750	
   751		return 0;
   752	err:
   753		if (ub->ubsock)
   754			udp_tunnel_sock_release(ub->ubsock);
   755		kfree(ub);
   756		return err;
   757	}
   758	
   759	/* cleanup_bearer - break the socket/bearer association */
   760	static void cleanup_bearer(struct work_struct *work)
   761	{
   762		struct udp_bearer *ub = container_of(work, struct udp_bearer, work);
   763		struct udp_replicast *rcast, *tmp;
   764	
   765		list_for_each_entry_safe(rcast, tmp, &ub->rcast.list, list) {
   766			list_del_rcu(&rcast->list);
   767			kfree_rcu(rcast, rcu);
   768		}
   769	
   770		if (ub->ubsock)
   771			udp_tunnel_sock_release(ub->ubsock);
   772		synchronize_net();
   773		kfree(ub);
   774	}
   775	
   776	/* tipc_udp_disable - detach bearer from socket */
   777	static void tipc_udp_disable(struct tipc_bearer *b)
   778	{
   779		struct udp_bearer *ub;
   780	
   781		ub = rcu_dereference_rtnl(b->media_ptr);
   782		if (!ub) {
   783			pr_err("UDP bearer instance not found\n");
   784			return;
   785		}
   786		if (ub->ubsock)
   787			sock_set_flag(ub->ubsock->sk, SOCK_DEAD);
   788		RCU_INIT_POINTER(ub->bearer, NULL);
   789	
   790		/* sock_release need to be done outside of rtnl lock */
   791		INIT_WORK(&ub->work, cleanup_bearer);
   792		schedule_work(&ub->work);
   793	}
   794	
   795	struct tipc_media udp_media_info = {
   796		.send_msg	= tipc_udp_send_msg,
   797		.enable_media	= tipc_udp_enable,
   798		.disable_media	= tipc_udp_disable,
   799		.addr2str	= tipc_udp_addr2str,
   800		.addr2msg	= tipc_udp_addr2msg,
   801		.msg2addr	= tipc_udp_msg2addr,
   802		.priority	= TIPC_DEF_LINK_PRI,
   803		.tolerance	= TIPC_DEF_LINK_TOL,
   804		.window		= TIPC_DEF_LINK_WIN,
 > 805		.mtu		= TIPC_DEF_LINK_UDP_MTU,

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 31332 bytes --]

^ permalink raw reply

* Re: [virtio-dev] Re: [PATCH v7 net-next 2/4] net: Introduce generic failover module
From: Michael S. Tsirkin @ 2018-04-20 16:03 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Samudrala, Sridhar, Stephen Hemminger, David Miller, Netdev,
	virtualization, virtio-dev, Brandeburg, Jesse, Duyck, Alexander H,
	Jakub Kicinski, Jason Wang, Siwei Liu, Jiri Pirko
In-Reply-To: <CAKgT0UeQTx7zJPK3K3eM9xxHfVyHXwJ-G_b8eqGn0bWAyt9aAg@mail.gmail.com>

On Fri, Apr 20, 2018 at 08:56:57AM -0700, Alexander Duyck wrote:
> On Fri, Apr 20, 2018 at 8:34 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Fri, Apr 20, 2018 at 08:21:00AM -0700, Samudrala, Sridhar wrote:
> >> > > + finfo = netdev_priv(failover_dev);
> >> > > +
> >> > > + primary_dev = rtnl_dereference(finfo->primary_dev);
> >> > > + standby_dev = rtnl_dereference(finfo->standby_dev);
> >> > > +
> >> > > + if (slave_dev != primary_dev && slave_dev != standby_dev)
> >> > > +         goto done;
> >> > > +
> >> > > + if ((primary_dev && failover_xmit_ready(primary_dev)) ||
> >> > > +     (standby_dev && failover_xmit_ready(standby_dev))) {
> >> > > +         netif_carrier_on(failover_dev);
> >> > > +         netif_tx_wake_all_queues(failover_dev);
> >> > > + } else {
> >> > > +         netif_carrier_off(failover_dev);
> >> > > +         netif_tx_stop_all_queues(failover_dev);
> >> > And I think it's a good idea to get stats from device here too.
> >>
> >> Not sure why we need to get stats from lower devs here?
> >
> > link down is often indication of a hardware problem.
> > lower dev might stop responding down the road.
> >
> >> > > +static const struct net_device_ops failover_dev_ops = {
> >> > > + .ndo_open               = failover_open,
> >> > > + .ndo_stop               = failover_close,
> >> > > + .ndo_start_xmit         = failover_start_xmit,
> >> > > + .ndo_select_queue       = failover_select_queue,
> >> > > + .ndo_get_stats64        = failover_get_stats,
> >> > > + .ndo_change_mtu         = failover_change_mtu,
> >> > > + .ndo_set_rx_mode        = failover_set_rx_mode,
> >> > > + .ndo_validate_addr      = eth_validate_addr,
> >> > > + .ndo_features_check     = passthru_features_check,
> >> > xdp support?
> >>
> >> I think it should be possible to add it be calling the lower dev ndo_xdp routines
> >> with proper checks. can we add this later?
> >
> > I'd be concerned that if you don't xdp userspace will keep poking
> > at lower devs. Then it will stop working if you add this later.
> 
> The failover device is better off not providing in-driver XDP since
> there are already skbs allocated by the time that we see the packet
> here anyway. As such generic XDP is the preferred way to handle this
> since it will work regardless of what lower devices are present.
>
> The only advantage of having XDP down at the virtio or VF level would
> be that it performs better, but at the cost of complexity since we
> would need to rebind the eBPF program any time a device is hotplugged
> out and then back in. For now the current approach is in keeping with
> how bonding and other similar drivers are currently handling this.
> 
> Thanks.
> 
> - Alex

OK fair enough.

-- 
MST

^ permalink raw reply

* Re: [PATCH v7 net-next 4/4] netvsc: refactor notifier/event handling code to use the failover framework
From: Jiri Pirko @ 2018-04-20 16:00 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Sridhar Samudrala, mst, davem, netdev, virtualization, virtio-dev,
	jesse.brandeburg, alexander.h.duyck, kubakici, jasowang,
	loseweigh
In-Reply-To: <20180420082802.6ca37e4c@xeon-e3>

Fri, Apr 20, 2018 at 05:28:02PM CEST, stephen@networkplumber.org wrote:
>On Thu, 19 Apr 2018 18:42:04 -0700
>Sridhar Samudrala <sridhar.samudrala@intel.com> wrote:
>
>> Use the registration/notification framework supported by the generic
>> failover infrastructure.
>> 
>> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
>
>Do what you want to other devices but leave netvsc alone.
>Adding these failover ops does not reduce the code size, and really is
>no benefit.  The netvsc device driver needs to be backported to several
>other distributions and doing this makes that harder.

We should not care about the backport burden when we are trying to make
things right. And things are not right. The current netvsc approach is
just plain wrong shortcut. It should have been done in a generic way
from the very beginning. We are just trying to fix this situation.

Moreover, I believe that part of the fix is to convert netvsc to 3
netdev solution too. 2 netdev model is wrong.


>
>I will NAK patches to change to common code for netvsc especially the
>three device model.  MS worked hard with distro vendors to support transparent
>mode, ans we really can't have a new model; or do backport.
>
>Plus, DPDK is now dependent on existing model.

Sorry, but nobody here cares about dpdk or other similar oddities.

^ permalink raw reply

* Re: [PATCH bpf-next] libbpf: fixed build error for samples/bpf/
From: Martin KaFai Lau @ 2018-04-20 15:59 UTC (permalink / raw)
  To: Björn Töpel; +Cc: ast, daniel, netdev, Björn Töpel
In-Reply-To: <20180420080516.16683-1-bjorn.topel@gmail.com>

On Fri, Apr 20, 2018 at 10:05:16AM +0200, Björn Töpel wrote:
> From: Björn Töpel <bjorn.topel@intel.com>
> 
> Commit 8a138aed4a80 ("bpf: btf: Add BTF support to libbpf") did not
> include stdbool.h, so GCC complained when building samples/bpf/.
> 
> In file included from /home/btopel/src/ext/linux/samples/bpf/libbpf.h:6:0,
>                  from /home/btopel/src/ext/linux/samples/bpf/test_lru_dist.c:24:
> /home/btopel/src/ext/linux/tools/lib/bpf/bpf.h:105:4: error: unknown type name ‘bool’; did you mean ‘_Bool’?
>     bool do_log);
>     ^~~~
>     _Bool
> 
> Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
> ---
>  tools/lib/bpf/bpf.h | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
> index 01bda076310f..553b11ad52b3 100644
> --- a/tools/lib/bpf/bpf.h
> +++ b/tools/lib/bpf/bpf.h
> @@ -24,6 +24,7 @@
>  #define __BPF_BPF_H
>  
>  #include <linux/bpf.h>
> +#include <stdbool.h>
Thanks for the fix!

^ permalink raw reply

* Re: [virtio-dev] Re: [PATCH v7 net-next 2/4] net: Introduce generic failover module
From: Alexander Duyck @ 2018-04-20 15:56 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Samudrala, Sridhar, Stephen Hemminger, David Miller, Netdev,
	virtualization, virtio-dev, Brandeburg, Jesse, Duyck, Alexander H,
	Jakub Kicinski, Jason Wang, Siwei Liu, Jiri Pirko
In-Reply-To: <20180420183021-mutt-send-email-mst@kernel.org>

On Fri, Apr 20, 2018 at 8:34 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> On Fri, Apr 20, 2018 at 08:21:00AM -0700, Samudrala, Sridhar wrote:
>> > > + finfo = netdev_priv(failover_dev);
>> > > +
>> > > + primary_dev = rtnl_dereference(finfo->primary_dev);
>> > > + standby_dev = rtnl_dereference(finfo->standby_dev);
>> > > +
>> > > + if (slave_dev != primary_dev && slave_dev != standby_dev)
>> > > +         goto done;
>> > > +
>> > > + if ((primary_dev && failover_xmit_ready(primary_dev)) ||
>> > > +     (standby_dev && failover_xmit_ready(standby_dev))) {
>> > > +         netif_carrier_on(failover_dev);
>> > > +         netif_tx_wake_all_queues(failover_dev);
>> > > + } else {
>> > > +         netif_carrier_off(failover_dev);
>> > > +         netif_tx_stop_all_queues(failover_dev);
>> > And I think it's a good idea to get stats from device here too.
>>
>> Not sure why we need to get stats from lower devs here?
>
> link down is often indication of a hardware problem.
> lower dev might stop responding down the road.
>
>> > > +static const struct net_device_ops failover_dev_ops = {
>> > > + .ndo_open               = failover_open,
>> > > + .ndo_stop               = failover_close,
>> > > + .ndo_start_xmit         = failover_start_xmit,
>> > > + .ndo_select_queue       = failover_select_queue,
>> > > + .ndo_get_stats64        = failover_get_stats,
>> > > + .ndo_change_mtu         = failover_change_mtu,
>> > > + .ndo_set_rx_mode        = failover_set_rx_mode,
>> > > + .ndo_validate_addr      = eth_validate_addr,
>> > > + .ndo_features_check     = passthru_features_check,
>> > xdp support?
>>
>> I think it should be possible to add it be calling the lower dev ndo_xdp routines
>> with proper checks. can we add this later?
>
> I'd be concerned that if you don't xdp userspace will keep poking
> at lower devs. Then it will stop working if you add this later.

The failover device is better off not providing in-driver XDP since
there are already skbs allocated by the time that we see the packet
here anyway. As such generic XDP is the preferred way to handle this
since it will work regardless of what lower devices are present.

The only advantage of having XDP down at the virtio or VF level would
be that it performs better, but at the cost of complexity since we
would need to rebind the eBPF program any time a device is hotplugged
out and then back in. For now the current approach is in keeping with
how bonding and other similar drivers are currently handling this.

Thanks.

- Alex

^ permalink raw reply

* [PATCH net-next 2/4] net: implement sock_mmap_hook()
From: Eric Dumazet @ 2018-04-20 15:55 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, linux-kernel, Soheil Hassas Yeganeh, Eric Dumazet,
	Eric Dumazet
In-Reply-To: <20180420155542.122183-1-edumazet@google.com>

sock_mmap_hook() is the mmap_hook handler provided for socket_file_ops

Following patch will provide tcp_mmap_hook() for TCP protocol.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/linux/net.h | 1 +
 net/socket.c        | 9 +++++++++
 2 files changed, 10 insertions(+)

diff --git a/include/linux/net.h b/include/linux/net.h
index 6554d3ba4396b3df49acac934ad16eeb71a695f4..5192bf502b11e42c3d9eb342ce67361916149bfa 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -181,6 +181,7 @@ struct proto_ops {
 				      size_t total_len, int flags);
 	int		(*mmap)	     (struct file *file, struct socket *sock,
 				      struct vm_area_struct * vma);
+	int		(*mmap_hook) (struct socket *sock, enum mmap_hook);
 	ssize_t		(*sendpage)  (struct socket *sock, struct page *page,
 				      int offset, size_t size, int flags);
 	ssize_t 	(*splice_read)(struct socket *sock,  loff_t *ppos,
diff --git a/net/socket.c b/net/socket.c
index f10f1d947c78c193b49379b0ec641d81367fb4cf..75a5c2ebe57e0621dae17c6c9e1a796ee818b107 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -131,6 +131,14 @@ static ssize_t sock_splice_read(struct file *file, loff_t *ppos,
 				struct pipe_inode_info *pipe, size_t len,
 				unsigned int flags);
 
+static int sock_mmap_hook(struct file *file, enum mmap_hook mode)
+{
+	struct socket *sock = file->private_data;
+
+	if (!sock->ops->mmap_hook)
+		return 0;
+	return sock->ops->mmap_hook(sock, mode);
+}
 /*
  *	Socket files have a set of 'special' operations as well as the generic file ones. These don't appear
  *	in the operation structures but are done directly via the socketcall() multiplexor.
@@ -147,6 +155,7 @@ static const struct file_operations socket_file_ops = {
 	.compat_ioctl = compat_sock_ioctl,
 #endif
 	.mmap =		sock_mmap,
+	.mmap_hook =	sock_mmap_hook,
 	.release =	sock_close,
 	.fasync =	sock_fasync,
 	.sendpage =	sock_sendpage,
-- 
2.17.0.484.g0c8726318c-goog

^ permalink raw reply related

* [PATCH net-next 4/4] tcp: mmap: move the skb cleanup to tcp_mmap_hook()
From: Eric Dumazet @ 2018-04-20 15:55 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, linux-kernel, Soheil Hassas Yeganeh, Eric Dumazet,
	Eric Dumazet
In-Reply-To: <20180420155542.122183-1-edumazet@google.com>

Freeing all skbs and sending ACK is time consuming.

This is currently done while both current->mm->mmap_sem and socket
lock are held, in tcp_mmap()

Thanks to mmap_hook infrastructure, we can perform the cleanup
after current->mm->mmap_sem has been released, thus allowing
other threads to perform mm operations without delay.

Note that the preparation work (building the array of page
pointers) can also be done from tcp_mmap_hook() while
mmap_sem has not been taken yet, but this is another independent change.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/tcp.c | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index e913b2dd5df321f2789e8d5f233ede9c2f1d5624..82f7c3e47253cecac6ea1819fbb7a0712058ec55 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1740,9 +1740,16 @@ int tcp_mmap_hook(struct socket *sock, enum mmap_hook mode)
 		 */
 		return 0;
 	}
-	/* TODO: Move here the stuff that can been done after
-	 * current->mm->mmap_sem has been released.
-	 */
+	if (mode == MMAP_HOOK_COMMIT) {
+		u32 offset;
+
+		tcp_rcv_space_adjust(sk);
+
+		/* Clean up data we have read: This will do ACK frames. */
+		tcp_recv_skb(sk, tcp_sk(sk)->copied_seq, &offset);
+
+		tcp_cleanup_rbuf(sk, PAGE_SIZE);
+	}
 	release_sock(sk);
 	return 0;
 }
@@ -1843,13 +1850,8 @@ int tcp_mmap(struct file *file, struct socket *sock,
 		if (ret)
 			goto out;
 	}
-	/* operation is complete, we can 'consume' all skbs */
+	/* operation is complete, skbs will be freed from tcp_mmap_hook() */
 	tp->copied_seq = seq;
-	tcp_rcv_space_adjust(sk);
-
-	/* Clean up data we have read: This will do ACK frames. */
-	tcp_recv_skb(sk, seq, &offset);
-	tcp_cleanup_rbuf(sk, size);
 
 	ret = 0;
 out:
-- 
2.17.0.484.g0c8726318c-goog

^ permalink raw reply related

* [PATCH net-next 3/4] tcp: provide tcp_mmap_hook()
From: Eric Dumazet @ 2018-04-20 15:55 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, linux-kernel, Soheil Hassas Yeganeh, Eric Dumazet,
	Eric Dumazet
In-Reply-To: <20180420155542.122183-1-edumazet@google.com>

Many socket operations can copy data between user and kernel space
while socket lock is held. This means mm->mmap_sem can be taken
after socket lock.

When implementing tcp mmap(), I forgot this and syzbot was kind enough
to point this to my attention.

This patch adds tcp_mmap_hook(), allowing us to grab socket lock
before vm_mmap_pgoff() grabs mm->mmap_sem

This same hook is responsible for releasing socket lock when
vm_mmap_pgoff() has released mm->mmap_sem (or failed to acquire it)

Note that follow-up patches can transfer code from tcp_mmap()
to tcp_mmap_hook() to shorten tcp_mmap() execution time
and thus increase mmap() performance in multi-threaded programs.

Fixes: 93ab6cc69162 ("tcp: implement mmap() for zero copy receive")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
---
 include/net/tcp.h   |  1 +
 net/ipv4/af_inet.c  |  1 +
 net/ipv4/tcp.c      | 25 ++++++++++++++++++++++---
 net/ipv6/af_inet6.c |  1 +
 4 files changed, 25 insertions(+), 3 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 833154e3df173ea41aa16dd1ec739a175c679c5c..f68c8e8957840cacdbdd3d02bd149fce33ae324f 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -404,6 +404,7 @@ int tcp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int nonblock,
 		int flags, int *addr_len);
 int tcp_set_rcvlowat(struct sock *sk, int val);
 void tcp_data_ready(struct sock *sk);
+int tcp_mmap_hook(struct socket *sock, enum mmap_hook mode);
 int tcp_mmap(struct file *file, struct socket *sock,
 	     struct vm_area_struct *vma);
 void tcp_parse_options(const struct net *net, const struct sk_buff *skb,
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index 3ebf599cebaea4926decc1aad7274b12ec7e1566..af597440ff59c049b7fd02f7d7f79c23b9e195bb 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -995,6 +995,7 @@ const struct proto_ops inet_stream_ops = {
 	.sendmsg	   = inet_sendmsg,
 	.recvmsg	   = inet_recvmsg,
 	.mmap		   = tcp_mmap,
+	.mmap_hook	   = tcp_mmap_hook,
 	.sendpage	   = inet_sendpage,
 	.splice_read	   = tcp_splice_read,
 	.read_sock	   = tcp_read_sock,
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 4022073b0aeea9d07af0fa825b640a00512908a3..e913b2dd5df321f2789e8d5f233ede9c2f1d5624 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1726,6 +1726,28 @@ int tcp_set_rcvlowat(struct sock *sk, int val)
 }
 EXPORT_SYMBOL(tcp_set_rcvlowat);
 
+/* mmap() on TCP needs to grab socket lock before current->mm->mmap_sem
+ * is taken in vm_mmap_pgoff() to avoid possible dead locks.
+ */
+int tcp_mmap_hook(struct socket *sock, enum mmap_hook mode)
+{
+	struct sock *sk = sock->sk;
+
+	if (mode == MMAP_HOOK_PREPARE) {
+		lock_sock(sk);
+		/* TODO: Move here all the preparation work that can be done
+		 * before having to grab current->mm->mmap_sem.
+		 */
+		return 0;
+	}
+	/* TODO: Move here the stuff that can been done after
+	 * current->mm->mmap_sem has been released.
+	 */
+	release_sock(sk);
+	return 0;
+}
+EXPORT_SYMBOL(tcp_mmap_hook);
+
 /* When user wants to mmap X pages, we first need to perform the mapping
  * before freeing any skbs in receive queue, otherwise user would be unable
  * to fallback to standard recvmsg(). This happens if some data in the
@@ -1756,8 +1778,6 @@ int tcp_mmap(struct file *file, struct socket *sock,
 	/* TODO: Maybe the following is not needed if pages are COW */
 	vma->vm_flags &= ~VM_MAYWRITE;
 
-	lock_sock(sk);
-
 	ret = -ENOTCONN;
 	if (sk->sk_state == TCP_LISTEN)
 		goto out;
@@ -1833,7 +1853,6 @@ int tcp_mmap(struct file *file, struct socket *sock,
 
 	ret = 0;
 out:
-	release_sock(sk);
 	kvfree(pages_array);
 	return ret;
 }
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index 36d622c477b1ed3c5d2b753938444526344a6109..31ce68c001c223d3351f73453273ae517a051816 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -579,6 +579,7 @@ const struct proto_ops inet6_stream_ops = {
 	.sendmsg	   = inet_sendmsg,		/* ok		*/
 	.recvmsg	   = inet_recvmsg,		/* ok		*/
 	.mmap		   = tcp_mmap,
+	.mmap_hook	   = tcp_mmap_hook,
 	.sendpage	   = inet_sendpage,
 	.sendmsg_locked    = tcp_sendmsg_locked,
 	.sendpage_locked   = tcp_sendpage_locked,
-- 
2.17.0.484.g0c8726318c-goog

^ permalink raw reply related

* [PATCH net-next 1/4] mm: provide a mmap_hook infrastructure
From: Eric Dumazet @ 2018-04-20 15:55 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, linux-kernel, Soheil Hassas Yeganeh, Eric Dumazet,
	Eric Dumazet
In-Reply-To: <20180420155542.122183-1-edumazet@google.com>

When adding tcp mmap() implementation, I forgot that socket lock
had to be taken before current->mm->mmap_sem. syzbot eventually caught
the bug.

This patch provides a new mmap_hook() method in struct file_operations
that might be provided by fs to implement a finer control of whats
to be done before and after do_mmap_pgoff() and/or the mm->mmap_sem
acquire/release.

This is used in following patches by networking and TCP stacks
to solve the lockdep issue, and also allows some preparation
and cleanup work being done before/after mmap_sem is held,
allowing better scalability in multi-threading programs.

Fixes: 93ab6cc69162 ("tcp: implement mmap() for zero copy receive")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
---
 include/linux/fs.h |  6 ++++++
 mm/util.c          | 19 ++++++++++++++++++-
 2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 92efaf1f89775f7b017477617dd983c10e0dc4d2..ef3526f84686585678861fc585efea974a69ca55 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1698,6 +1698,11 @@ struct block_device_operations;
 #define NOMMU_VMFLAGS \
 	(NOMMU_MAP_READ | NOMMU_MAP_WRITE | NOMMU_MAP_EXEC)
 
+enum mmap_hook {
+	MMAP_HOOK_PREPARE,
+	MMAP_HOOK_ROLLBACK,
+	MMAP_HOOK_COMMIT,
+};
 
 struct iov_iter;
 
@@ -1714,6 +1719,7 @@ struct file_operations {
 	long (*unlocked_ioctl) (struct file *, unsigned int, unsigned long);
 	long (*compat_ioctl) (struct file *, unsigned int, unsigned long);
 	int (*mmap) (struct file *, struct vm_area_struct *);
+	int (*mmap_hook) (struct file *, enum mmap_hook);
 	unsigned long mmap_supported_flags;
 	int (*open) (struct inode *, struct file *);
 	int (*flush) (struct file *, fl_owner_t id);
diff --git a/mm/util.c b/mm/util.c
index 1fc4fa7576f762bbbf341f056ca6d0be803a423f..3ddb18ab367f069d5884083e992e999546ccd995 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -350,11 +350,28 @@ unsigned long vm_mmap_pgoff(struct file *file, unsigned long addr,
 
 	ret = security_mmap_file(file, prot, flag);
 	if (!ret) {
-		if (down_write_killable(&mm->mmap_sem))
+		int (*mmap_hook)(struct file *, enum mmap_hook) = NULL;
+
+		if (file) {
+			mmap_hook = file->f_op->mmap_hook;
+
+			if (mmap_hook) {
+				ret = mmap_hook(file, MMAP_HOOK_PREPARE);
+				if (ret)
+					return ret;
+			}
+		}
+		if (down_write_killable(&mm->mmap_sem)) {
+			if (mmap_hook)
+				mmap_hook(file, MMAP_HOOK_ROLLBACK);
 			return -EINTR;
+		}
 		ret = do_mmap_pgoff(file, addr, len, prot, flag, pgoff,
 				    &populate, &uf);
 		up_write(&mm->mmap_sem);
+		if (mmap_hook)
+			mmap_hook(file, IS_ERR(ret) ? MMAP_HOOK_ROLLBACK :
+						      MMAP_HOOK_COMMIT);
 		userfaultfd_unmap_complete(mm, &uf);
 		if (populate)
 			mm_populate(ret, populate);
-- 
2.17.0.484.g0c8726318c-goog

^ permalink raw reply related

* [PATCH net-next 0/4] mm,tcp: provide mmap_hook to solve lockdep issue
From: Eric Dumazet @ 2018-04-20 15:55 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, linux-kernel, Soheil Hassas Yeganeh, Eric Dumazet,
	Eric Dumazet

This patch series provide a new mmap_hook to fs willing to grab
a mutex before mm->mmap_sem is taken, to ensure lockdep sanity.

This hook allows us to shorten tcp_mmap() execution time (while mmap_sem
is held), and improve multi-threading scalability. 

Eric Dumazet (4):
  mm: provide a mmap_hook infrastructure
  net: implement sock_mmap_hook()
  tcp: provide tcp_mmap_hook()
  tcp: mmap: move the skb cleanup to tcp_mmap_hook()

 include/linux/fs.h  |  6 ++++++
 include/linux/net.h |  1 +
 include/net/tcp.h   |  1 +
 mm/util.c           | 19 ++++++++++++++++++-
 net/ipv4/af_inet.c  |  1 +
 net/ipv4/tcp.c      | 39 ++++++++++++++++++++++++++++++---------
 net/ipv6/af_inet6.c |  1 +
 net/socket.c        |  9 +++++++++
 8 files changed, 67 insertions(+), 10 deletions(-)

-- 
2.17.0.484.g0c8726318c-goog

^ permalink raw reply

* Re: [PATCH net-next] tun: do not compute the rxhash, if not needed
From: David Miller @ 2018-04-20 15:51 UTC (permalink / raw)
  To: pabeni; +Cc: netdev, jasowang
In-Reply-To: <1c43f8bc63407239c91df916b149d4fdbf26bed3.1524222969.git.pabeni@redhat.com>

From: Paolo Abeni <pabeni@redhat.com>
Date: Fri, 20 Apr 2018 13:18:16 +0200

> Currently, the tun driver, in absence of an eBPF steering program,
> always compute the rxhash in its rx path, even when such value
> is later unused due to additional checks (
> 
> This changeset moves the all the related checks just before the
> __skb_get_hash_symmetric(), so that the latter is no more computed
> when unneeded.
> 
> Also replace an unneeded RCU section with rcu_access_pointer().
> 
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Applied, thank you.

^ permalink raw reply

* Re: [PATCH v1 net-next] lan78xx: Add support to dump lan78xx registers
From: David Miller @ 2018-04-20 15:50 UTC (permalink / raw)
  To: raghuramchary.jallipalli; +Cc: netdev, unglinuxdriver, woojung.huh
In-Reply-To: <20180420061350.9340-1-raghuramchary.jallipalli@microchip.com>

From: Raghuram Chary J <raghuramchary.jallipalli@microchip.com>
Date: Fri, 20 Apr 2018 11:43:50 +0530

> In order to dump lan78xx family registers using ethtool, add
> support at lan78xx driver level.
> 
> Signed-off-by: Raghuram Chary J <raghuramchary.jallipalli@microchip.com>
> ---
> v0->v1:
>    * Remove one variable in the for loop.

Applied, thank you.

^ permalink raw reply

* [PATCH for-rc] uapi: Fix SPDX tags for files referring to the 'OpenIB.org' license
From: Jason Gunthorpe @ 2018-04-20 15:49 UTC (permalink / raw)
  To: linux-rdma
  Cc: Kate Stewart, Philippe Ombredanne, Greg Kroah-Hartman,
	Thomas Gleixner, Steve Winslow, Santosh Shilimkar, netdev,
	linux-kernel, Dave Watson

Based on discussion with Kate Stewart this license is not a
BSD-2-Clause, but is now formally identified as Linux-OpenIB
by SPDX.

The key difference between the licenses is in the 'warranty'
paragraph.

if_infiniband.h refers to the 'OpenIB.org' license, but
does not include the text, instead it links to an obsolete
web site that contains a license that matches the BSD-2-Clause
SPX. There is no 'three clause' version of the OpenIB.org
license.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
---
 include/uapi/linux/if_infiniband.h      | 2 +-
 include/uapi/linux/rds.h                | 2 +-
 include/uapi/linux/tls.h                | 2 +-
 include/uapi/rdma/cxgb3-abi.h           | 2 +-
 include/uapi/rdma/cxgb4-abi.h           | 2 +-
 include/uapi/rdma/hns-abi.h             | 2 +-
 include/uapi/rdma/ib_user_cm.h          | 2 +-
 include/uapi/rdma/ib_user_ioctl_verbs.h | 2 +-
 include/uapi/rdma/ib_user_mad.h         | 2 +-
 include/uapi/rdma/ib_user_sa.h          | 2 +-
 include/uapi/rdma/ib_user_verbs.h       | 2 +-
 include/uapi/rdma/mlx4-abi.h            | 2 +-
 include/uapi/rdma/mlx5-abi.h            | 2 +-
 include/uapi/rdma/mthca-abi.h           | 2 +-
 include/uapi/rdma/nes-abi.h             | 2 +-
 include/uapi/rdma/qedr-abi.h            | 2 +-
 include/uapi/rdma/rdma_user_cm.h        | 2 +-
 include/uapi/rdma/rdma_user_ioctl.h     | 2 +-
 include/uapi/rdma/rdma_user_rxe.h       | 2 +-
 19 files changed, 19 insertions(+), 19 deletions(-)

I propose to send this patch through the RDMA tree.

diff --git a/include/uapi/linux/if_infiniband.h b/include/uapi/linux/if_infiniband.h
index 050b92dcf8cf40..0fc33bf30e45a1 100644
--- a/include/uapi/linux/if_infiniband.h
+++ b/include/uapi/linux/if_infiniband.h
@@ -1,4 +1,4 @@
-/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) */
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) */
 /*
  * This software is available to you under a choice of one of two
  * licenses.  You may choose to be licensed under the terms of the GNU
diff --git a/include/uapi/linux/rds.h b/include/uapi/linux/rds.h
index a66b213de3d7a4..20c6bd0b00079e 100644
--- a/include/uapi/linux/rds.h
+++ b/include/uapi/linux/rds.h
@@ -1,4 +1,4 @@
-/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) */
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR Linux-OpenIB) */
 /*
  * Copyright (c) 2008 Oracle.  All rights reserved.
  *
diff --git a/include/uapi/linux/tls.h b/include/uapi/linux/tls.h
index c6633e97eca40b..ff02287495ac56 100644
--- a/include/uapi/linux/tls.h
+++ b/include/uapi/linux/tls.h
@@ -1,4 +1,4 @@
-/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) */
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR Linux-OpenIB) */
 /*
  * Copyright (c) 2016-2017, Mellanox Technologies. All rights reserved.
  *
diff --git a/include/uapi/rdma/cxgb3-abi.h b/include/uapi/rdma/cxgb3-abi.h
index 9acb4b7a624633..85aed672f43e65 100644
--- a/include/uapi/rdma/cxgb3-abi.h
+++ b/include/uapi/rdma/cxgb3-abi.h
@@ -1,4 +1,4 @@
-/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) */
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR Linux-OpenIB) */
 /*
  * Copyright (c) 2006 Chelsio, Inc. All rights reserved.
  *
diff --git a/include/uapi/rdma/cxgb4-abi.h b/include/uapi/rdma/cxgb4-abi.h
index 1fefd0140c26f6..a159ba8dcf8f13 100644
--- a/include/uapi/rdma/cxgb4-abi.h
+++ b/include/uapi/rdma/cxgb4-abi.h
@@ -1,4 +1,4 @@
-/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) */
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR Linux-OpenIB) */
 /*
  * Copyright (c) 2009-2010 Chelsio, Inc. All rights reserved.
  *
diff --git a/include/uapi/rdma/hns-abi.h b/include/uapi/rdma/hns-abi.h
index 7092c8de4bd883..78613b609fa846 100644
--- a/include/uapi/rdma/hns-abi.h
+++ b/include/uapi/rdma/hns-abi.h
@@ -1,4 +1,4 @@
-/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) */
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR Linux-OpenIB) */
 /*
  * Copyright (c) 2016 Hisilicon Limited.
  *
diff --git a/include/uapi/rdma/ib_user_cm.h b/include/uapi/rdma/ib_user_cm.h
index 4a8f9562f7cd9b..e2709bb8cb1802 100644
--- a/include/uapi/rdma/ib_user_cm.h
+++ b/include/uapi/rdma/ib_user_cm.h
@@ -1,4 +1,4 @@
-/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) */
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR Linux-OpenIB) */
 /*
  * Copyright (c) 2005 Topspin Communications.  All rights reserved.
  * Copyright (c) 2005 Intel Corporation.  All rights reserved.
diff --git a/include/uapi/rdma/ib_user_ioctl_verbs.h b/include/uapi/rdma/ib_user_ioctl_verbs.h
index 04e46ea517d328..625545d862d7e4 100644
--- a/include/uapi/rdma/ib_user_ioctl_verbs.h
+++ b/include/uapi/rdma/ib_user_ioctl_verbs.h
@@ -1,4 +1,4 @@
-/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) */
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR Linux-OpenIB) */
 /*
  * Copyright (c) 2017-2018, Mellanox Technologies inc.  All rights reserved.
  *
diff --git a/include/uapi/rdma/ib_user_mad.h b/include/uapi/rdma/ib_user_mad.h
index ef92118dad9770..90c0cf228020dc 100644
--- a/include/uapi/rdma/ib_user_mad.h
+++ b/include/uapi/rdma/ib_user_mad.h
@@ -1,4 +1,4 @@
-/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) */
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR Linux-OpenIB) */
 /*
  * Copyright (c) 2004 Topspin Communications.  All rights reserved.
  * Copyright (c) 2005 Voltaire, Inc. All rights reserved.
diff --git a/include/uapi/rdma/ib_user_sa.h b/include/uapi/rdma/ib_user_sa.h
index 0d2607f0cd20c3..435155d6e1c6a5 100644
--- a/include/uapi/rdma/ib_user_sa.h
+++ b/include/uapi/rdma/ib_user_sa.h
@@ -1,4 +1,4 @@
-/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) */
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR Linux-OpenIB) */
 /*
  * Copyright (c) 2005 Intel Corporation.  All rights reserved.
  *
diff --git a/include/uapi/rdma/ib_user_verbs.h b/include/uapi/rdma/ib_user_verbs.h
index 9be07394fdbe50..6aeb03315b0bd5 100644
--- a/include/uapi/rdma/ib_user_verbs.h
+++ b/include/uapi/rdma/ib_user_verbs.h
@@ -1,4 +1,4 @@
-/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) */
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR Linux-OpenIB) */
 /*
  * Copyright (c) 2005 Topspin Communications.  All rights reserved.
  * Copyright (c) 2005, 2006 Cisco Systems.  All rights reserved.
diff --git a/include/uapi/rdma/mlx4-abi.h b/include/uapi/rdma/mlx4-abi.h
index 04f64bc4045f1b..f745575281756d 100644
--- a/include/uapi/rdma/mlx4-abi.h
+++ b/include/uapi/rdma/mlx4-abi.h
@@ -1,4 +1,4 @@
-/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) */
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR Linux-OpenIB) */
 /*
  * Copyright (c) 2007 Cisco Systems, Inc. All rights reserved.
  * Copyright (c) 2007, 2008 Mellanox Technologies. All rights reserved.
diff --git a/include/uapi/rdma/mlx5-abi.h b/include/uapi/rdma/mlx5-abi.h
index cb4a02c4a1cef0..fdaf00e206498c 100644
--- a/include/uapi/rdma/mlx5-abi.h
+++ b/include/uapi/rdma/mlx5-abi.h
@@ -1,4 +1,4 @@
-/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) */
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR Linux-OpenIB) */
 /*
  * Copyright (c) 2013-2015, Mellanox Technologies. All rights reserved.
  *
diff --git a/include/uapi/rdma/mthca-abi.h b/include/uapi/rdma/mthca-abi.h
index ac756cd9e80772..91b12e1a6f43ce 100644
--- a/include/uapi/rdma/mthca-abi.h
+++ b/include/uapi/rdma/mthca-abi.h
@@ -1,4 +1,4 @@
-/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) */
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR Linux-OpenIB) */
 /*
  * Copyright (c) 2005 Topspin Communications.  All rights reserved.
  * Copyright (c) 2005, 2006 Cisco Systems.  All rights reserved.
diff --git a/include/uapi/rdma/nes-abi.h b/include/uapi/rdma/nes-abi.h
index 35bfd4015d0705..f80495baa9697e 100644
--- a/include/uapi/rdma/nes-abi.h
+++ b/include/uapi/rdma/nes-abi.h
@@ -1,4 +1,4 @@
-/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) */
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR Linux-OpenIB) */
 /*
  * Copyright (c) 2006 - 2011 Intel Corporation.  All rights reserved.
  * Copyright (c) 2005 Topspin Communications.  All rights reserved.
diff --git a/include/uapi/rdma/qedr-abi.h b/include/uapi/rdma/qedr-abi.h
index 8ba098900e9aac..24c658b3c79042 100644
--- a/include/uapi/rdma/qedr-abi.h
+++ b/include/uapi/rdma/qedr-abi.h
@@ -1,4 +1,4 @@
-/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) */
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR Linux-OpenIB) */
 /* QLogic qedr NIC Driver
  * Copyright (c) 2015-2016  QLogic Corporation
  *
diff --git a/include/uapi/rdma/rdma_user_cm.h b/include/uapi/rdma/rdma_user_cm.h
index e1269024af47f0..0d1e78ebad0515 100644
--- a/include/uapi/rdma/rdma_user_cm.h
+++ b/include/uapi/rdma/rdma_user_cm.h
@@ -1,4 +1,4 @@
-/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) */
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR Linux-OpenIB) */
 /*
  * Copyright (c) 2005-2006 Intel Corporation.  All rights reserved.
  *
diff --git a/include/uapi/rdma/rdma_user_ioctl.h b/include/uapi/rdma/rdma_user_ioctl.h
index d223f4164a0f8d..d92d2721b28c5b 100644
--- a/include/uapi/rdma/rdma_user_ioctl.h
+++ b/include/uapi/rdma/rdma_user_ioctl.h
@@ -1,4 +1,4 @@
-/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) */
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR Linux-OpenIB) */
 /*
  * Copyright (c) 2016 Mellanox Technologies, LTD. All rights reserved.
  *
diff --git a/include/uapi/rdma/rdma_user_rxe.h b/include/uapi/rdma/rdma_user_rxe.h
index 1f8a9e7daea43e..44ef6a3b7afc8c 100644
--- a/include/uapi/rdma/rdma_user_rxe.h
+++ b/include/uapi/rdma/rdma_user_rxe.h
@@ -1,4 +1,4 @@
-/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) */
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR Linux-OpenIB) */
 /*
  * Copyright (c) 2016 Mellanox Technologies Ltd. All rights reserved.
  *
-- 
2.17.0

^ permalink raw reply related

* [PATCH] hv_netvsc: select needed ucs2_string routine
From: Stephen Hemminger @ 2018-04-20 15:48 UTC (permalink / raw)
  To: davem; +Cc: netdev, Stephen Hemminger

The conversion of rndis friendly name to utf8 uses a standard
kernel routine which is optional in config. Therefore build
would fail for some configurations. Resolve by selecting needed
library.

Fixes: 0fe554a46a0f ("hv_netvsc: propogate Hyper-V friendly name into interface alias")
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
---
 drivers/net/hyperv/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/hyperv/Kconfig b/drivers/net/hyperv/Kconfig
index 936968d23559..0765d5f61714 100644
--- a/drivers/net/hyperv/Kconfig
+++ b/drivers/net/hyperv/Kconfig
@@ -1,5 +1,6 @@
 config HYPERV_NET
 	tristate "Microsoft Hyper-V virtual network driver"
 	depends on HYPERV
+	select UCS2_STRING
 	help
 	  Select this option to enable the Hyper-V virtual network driver.
-- 
2.17.0

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox