Netdev List
 help / color / mirror / Atom feed
* Re: [RFC PATCH ghak32 V2 05/13] audit: add containerid support for ptrace and signals
From: Paul Moore @ 2018-04-20 16:13 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: cgroups, containers, linux-api, Linux-Audit Mailing List,
	linux-fsdevel, LKML, netdev, ebiederm, luto, jlayton, carlos,
	dhowells, viro, simo, Eric Paris, serge
In-Reply-To: <20180420010320.panie6mtdafxl65y@madcap2.tricolour.ca>

On Thu, Apr 19, 2018 at 9:03 PM, Richard Guy Briggs <rgb@redhat.com> wrote:
> On 2018-04-18 20:32, Paul Moore wrote:
>> On Fri, Mar 16, 2018 at 5:00 AM, Richard Guy Briggs <rgb@redhat.com> wrote:

...

>> >  /*
>> >   * audit_log_container_info - report container info
>> > - * @tsk: task to be recorded
>> >   * @context: task or local context for record
>> > + * @op: containerid string description
>> > + * @containerid: container ID to report
>> >   */
>> > -int audit_log_container_info(struct task_struct *tsk, struct audit_context *context)
>> > +int audit_log_container_info(struct audit_context *context,
>> > +                             char *op, u64 containerid)
>> >  {
>> >         struct audit_buffer *ab;
>> >
>> > -       if (!audit_containerid_set(tsk))
>> > +       if (!cid_valid(containerid))
>> >                 return 0;
>> >         /* Generate AUDIT_CONTAINER_INFO with container ID */
>> >         ab = audit_log_start(context, GFP_KERNEL, AUDIT_CONTAINER_INFO);
>> >         if (!ab)
>> >                 return -ENOMEM;
>> > -       audit_log_format(ab, "contid=%llu", audit_get_containerid(tsk));
>> > +       audit_log_format(ab, "op=%s contid=%llu", op, containerid);
>> >         audit_log_end(ab);
>> >         return 0;
>> >  }
>>
>> Let's get these changes into the first patch where
>> audit_log_container_info() is defined.  Why?  This inserts a new field
>> into the record which is a no-no.  Yes, it is one single patchset, but
>> they are still separate patches and who knows which patches a given
>> distribution and/or tree may decide to backport.
>
> Fair enough.  That first thought went through my mind...  Would it be
> sufficient to move that field addition to the first patch and leave the
> rest here to support trace and signals?

I should have been more clear ... yes, that's what I was thinking; the
record format is the important part as it's user visible.

-- 
paul moore
www.paul-moore.com

^ permalink raw reply

* Re: [virtio-dev] [pci PATCH v7 2/5] virtio_pci: Add support for unmanaged SR-IOV on virtio_pci devices
From: Michael S. Tsirkin @ 2018-04-20 16:14 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Daly, Dan, Rustad, Mark D, Bjorn Helgaas, Duyck, Alexander H,
	linux-pci, virtio-dev, kvm, Netdev, LKML, linux-nvme, Keith Busch,
	netanel, Don Dutile, Maximilian Heyne, Wang, Liang-min,
	David Woodhouse, Christoph Hellwig, dwmw
In-Reply-To: <CAKgT0UeWzK=_8m9hhWfaKyYB02WXDsuCkgowS4hgQ9GkLMcAMA@mail.gmail.com>

On Fri, Apr 20, 2018 at 09:08:51AM -0700, Alexander Duyck wrote:
> On Fri, Apr 20, 2018 at 8:28 AM, Michael S. Tsirkin <mst@redhat.com> wrote:
> > On Fri, Apr 20, 2018 at 07:56:14AM -0700, Alexander Duyck wrote:
> >> > I think for virtio it should include the feature bit, yes.
> >> > Adding feature bit is very easy - post a patch to the virtio TC mailing
> >> > list, wait about a week to give people time to respond (two weeks if it
> >> > is around holidays and such).
> >>
> >> The problem is we are talking about hardware/FPGA, not software.
> >> Adding a feature bit means going back and updating RTL. The software
> >> side of things is easy, re-validating things after a hardware/FPGA
> >> change not so much.
> >>
> >> If this is a hard requirement I may just drop the virtio patch, push
> >> what I have, and leave it to Mark/Dan to deal with the necessary RTL
> >> and code changes needed to support Virtio as I don't expect the
> >> turnaround to be as easy as just a patch.
> >>
> >> Thanks.
> >>
> >> - Alex
> >
> > Let's focus on virtio in this thread.
> 
> That is kind of what I was thinking, and why I was thinking it might
> make sense to make the virtio specific changes a separate patch set. I
> could get the PCI bits taken care of in the meantime since they effect
> genetic PCI, NVMe, and the Amazon ENA interfaces.
> 
> > Involving the virtio TC in host/guest interface changes is a
> > hard requirement. It's just too easy to create conflicts otherwise.
> >
> > So you guys should have just sent the proposal to the TC when you
> > were doing your RTL and you would have been in the clear.
> 
> Agreed. I believe I brought this up when I was originally asked to
> look into the coding for this.
> 
> > Generally adding a feature bit with any extension is a good idea:
> > this way you merely reserve a feature bit for your feature through
> > the TC and are more or less sure of forward and backward compatibility.
> > It's incredibly easy.
> 
> Agreed, though in this case I am not sure it makes sense since this
> isn't necessarily something that is a Virtio feature itself. It is
> just a side effect of the fact that they are adding SR-IOV support to
> a device that happens to emulate Virtio NET and apparently their PF
> has to be identical to the VF other than the PCIe extended config
> space.

I got that. My point is not everyone implementing SR-IOV will
want to do it like this. Others might want to have VFs
be different from PFs somehow. Feature bits ensure forward
not just backward compatibility.


> > But maybe it's not needed here.  I am not making the decisions myself.
> > Not too late: post to the TC list and let's see what the response is.
> > Without a feature bit you are making a change affecting all future
> > implementations without exception so the bar is a bit higher: you need
> > to actually post a spec text proposal not just a patch showing how to
> > use the feature, and TC needs to vote on it. Voting takes a week,
> > review a week or two depending on change complexity.
> >
> > Hope this helps,
> >
> > --
> > MST
> 
> I think I will leave this for Dan and Mark to handle since I am still
> not all that familiar with the hardware in use here. Once a decision
> has been made him and Mark could look at pushing either the one line
> patch or something more complex involving a feature flag.
> 
> Thanks.
> 
> Alex

As long as the TC is involved.

I know it's a bit of a strange thing to block it at the driver level,
the issue is with the device, but it's literally the only handle I have
to prevent people from doing out of spec hacks then pushing it all on us
to maintain.

-- 
MST

^ permalink raw reply

* [PATCH net] bpf: sockmap remove dead check
From: Jann Horn @ 2018-04-20 16:16 UTC (permalink / raw)
  To: ast, daniel, netdev, linux-kernel, john.fastabend, jannh
  Cc: linux-kernel, John Fastabend

Remove dead code that bails on `attr->value_size > KMALLOC_MAX_SIZE` - the
previous check already bails on `attr->value_size != 4`.

Signed-off-by: Jann Horn <jannh@google.com>
---
 kernel/bpf/sockmap.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index 8dd9210d7db7..a3b21385e947 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -1442,9 +1442,6 @@ static struct bpf_map *sock_map_alloc(union bpf_attr *attr)
 	    attr->value_size != 4 || attr->map_flags & ~SOCK_CREATE_FLAG_MASK)
 		return ERR_PTR(-EINVAL);
 
-	if (attr->value_size > KMALLOC_MAX_SIZE)
-		return ERR_PTR(-E2BIG);
-
 	err = bpf_tcp_ulp_register();
 	if (err && err != -EEXIST)
 		return ERR_PTR(err);
-- 
2.17.0.484.g0c8726318c-goog

^ permalink raw reply related

* Re: [PATCH net-next 2/2] netns: isolate seqnums to use per-netns locks
From: Christian Brauner @ 2018-04-20 16:16 UTC (permalink / raw)
  To: Eric W. Biederman
  Cc: davem, netdev, linux-kernel, avagin, ktkhai, serge, gregkh
In-Reply-To: <20180420135627.GA8350@gmail.com>

On Fri, Apr 20, 2018 at 03:56:28PM +0200, Christian Brauner wrote:
> On Wed, Apr 18, 2018 at 11:52:47PM +0200, Christian Brauner wrote:
> > On Wed, Apr 18, 2018 at 11:55:52AM -0500, Eric W. Biederman wrote:
> > > Christian Brauner <christian.brauner@ubuntu.com> writes:
> > > 
> > > > Now that it's possible to have a different set of uevents in different
> > > > network namespaces, per-network namespace uevent sequence numbers are
> > > > introduced. This increases performance as locking is now restricted to the
> > > > network namespace affected by the uevent rather than locking
> > > > everything.
> > > 
> > > Numbers please.  I personally expect that the netlink mc_list issues
> > > will swamp any benefit you get from this.
> > 
> > I wouldn't see how this would be the case. The gist of this is:
> > Everytime you send a uevent into a network namespace *not* owned by
> > init_user_ns you currently *have* to take mutex_lock(uevent_sock_list)
> > effectively blocking the host from processing uevents even though
> > - the uevent you're receiving might be totally different from the
> >   uevent that you're sending
> > - the uevent socket of the non-init_user_ns owned network namespace
> >   isn't even recorded in the list.
> > 
> > The other argument is that we now have properly isolated network
> > namespaces wrt to uevents such that each netns can have its own set of
> > uevents. This can either happen by a sufficiently privileged userspace
> > process sending it uevents that are only dedicated to that specific
> > netns. Or - and this *has been true for a long time* - because network
> > devices are *properly namespaced*. Meaning a uevent for that network
> > device is *tied to a network namespace*. For both cases the uevent
> > sequence numbering will be absolutely misleading. For example, whenever
> > you create e.g. a new veth device in a new network namespace it
> > shouldn't be accounted against the initial network namespace but *only*
> > against the network namespace that has that device added to it.
> 
> Eric, I did the testing. Here's what I did:
> 
> I compiled two 4.17-rc1 Kernels:
> - one with per netns uevent seqnums with decoupled locking
> - one without per netns uevent seqnums with decoupled locking
> 
> # Testcase 1:
> Only Injecting Uevents into network namespaces not owned by the initial user
> namespace.
> - created 1000 new user namespace + network namespace pairs
> - opened a uevent listener in each of those namespace pairs
> - injected uevents into each of those network namespaces 10,000 times meaning
>   10,000,000 (10 million) uevents were injected. (The high number of
>   uevent injections should get rid of a lot of jitter.)
> - Calculated the mean transaction time.
> - *without* uevent sequence number namespacing:
>   67 μs
> - *with* uevent sequence number namespacing:
>   55 μs
> - makes a difference of 12 μs
> 
> # Testcase 2:
> Injecting Uevents into network namespaces not owned by the initial user
> namespace and network namespaces owned by the initial user namespace.
> - created 500 new user namespace + network namespace pairs
> - created 500 new network namespace pairs
> - opened a uevent listener in each of those namespace pairs
> - injected uevents into each of those network namespaces 10,000 times meaning
>   10,000,000 (10 million) uevents were injected. (The high number of
>   uevent injections should get rid of a lot of jitter.)
> - Calculated the mean transaction time.
> - *without* uevent sequence number namespacing:
>   572 μs
> - *with* uevent sequence number namespacing:
>   514 μs
> - makes a difference of 58 μs
> 
> So there's performance gain. The third case would be to create a bunch
> of hanging processes that send SIGSTOP to themselves but do not actually
> open a uevent socket in their respective namespaces and then inject
> uevents into them. I expect there to be an even more performance
> benefits since the rtnl_table_lock() isn't hit in this case because
> there are no listeners.

I did the third test-case as well so:
- created 500 new user namespace + network namespace pairs *without
  uevent listeners*
- created 500 new network namespace pairs *without uevent listeners*
- injected uevents into each of those network namespaces 10,000 times meaning
  10,000,000 (10 million) uevents were injected. (The high number of
  uevent injections should get rid of a lot of jitter.)
- Calculated the mean transaction time.
- *without* uevent sequence number namespacing:
  206 μs
- *with* uevent sequence number namespacing:
  163 μs
- makes a difference of 43 μs

So this test-case shows performance improvement as well.

Thanks!
Christian

^ permalink raw reply

* Re: [RFC PATCH ghak32 V2 06/13] audit: add support for non-syscall auxiliary records
From: Paul Moore @ 2018-04-20 16:21 UTC (permalink / raw)
  To: Richard Guy Briggs
  Cc: cgroups, containers, linux-api, Linux-Audit Mailing List,
	linux-fsdevel, LKML, netdev, ebiederm, luto, jlayton, carlos,
	dhowells, viro, simo, Eric Paris, serge
In-Reply-To: <20180420012346.udnga5pfdjoazcfc@madcap2.tricolour.ca>

On Thu, Apr 19, 2018 at 9:23 PM, Richard Guy Briggs <rgb@redhat.com> wrote:
> On 2018-04-18 20:39, Paul Moore wrote:
>> On Fri, Mar 16, 2018 at 5:00 AM, Richard Guy Briggs <rgb@redhat.com> wrote:
>> > Standalone audit records have the timestamp and serial number generated
>> > on the fly and as such are unique, making them standalone.  This new
>> > function audit_alloc_local() generates a local audit context that will
>> > be used only for a standalone record and its auxiliary record(s).  The
>> > context is discarded immediately after the local associated records are
>> > produced.
>> >
>> > Signed-off-by: Richard Guy Briggs <rgb@redhat.com>
>> > ---
>> >  include/linux/audit.h |  8 ++++++++
>> >  kernel/auditsc.c      | 20 +++++++++++++++++++-
>> >  2 files changed, 27 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/include/linux/audit.h b/include/linux/audit.h
>> > index ed16bb6..c0b83cb 100644
>> > --- a/include/linux/audit.h
>> > +++ b/include/linux/audit.h
>> > @@ -227,7 +227,9 @@ static inline int audit_log_container_info(struct audit_context *context,
>> >  /* These are defined in auditsc.c */
>> >                                 /* Public API */
>> >  extern int  audit_alloc(struct task_struct *task);
>> > +extern struct audit_context *audit_alloc_local(void);
>> >  extern void __audit_free(struct task_struct *task);
>> > +extern void audit_free_context(struct audit_context *context);
>> >  extern void __audit_syscall_entry(int major, unsigned long a0, unsigned long a1,
>> >                                   unsigned long a2, unsigned long a3);
>> >  extern void __audit_syscall_exit(int ret_success, long ret_value);
>> > @@ -472,6 +474,12 @@ static inline int audit_alloc(struct task_struct *task)
>> >  {
>> >         return 0;
>> >  }
>> > +static inline struct audit_context *audit_alloc_local(void)
>> > +{
>> > +       return NULL;
>> > +}
>> > +static inline void audit_free_context(struct audit_context *context)
>> > +{ }
>> >  static inline void audit_free(struct task_struct *task)
>> >  { }
>> >  static inline void audit_syscall_entry(int major, unsigned long a0,
>> > diff --git a/kernel/auditsc.c b/kernel/auditsc.c
>> > index 2932ef1..7103d23 100644
>> > --- a/kernel/auditsc.c
>> > +++ b/kernel/auditsc.c
>> > @@ -959,8 +959,26 @@ int audit_alloc(struct task_struct *tsk)
>> >         return 0;
>> >  }
>> >
>> > -static inline void audit_free_context(struct audit_context *context)
>> > +struct audit_context *audit_alloc_local(void)
>> >  {
>> > +       struct audit_context *context;
>> > +
>> > +       if (!audit_ever_enabled)
>> > +               return NULL; /* Return if not auditing. */
>> > +
>> > +       context = audit_alloc_context(AUDIT_RECORD_CONTEXT);
>> > +       if (!context)
>> > +               return NULL;
>> > +       context->serial = audit_serial();
>> > +       context->ctime = current_kernel_time64();
>> > +       context->in_syscall = 1;
>> > +       return context;
>> > +}
>> > +
>> > +inline void audit_free_context(struct audit_context *context)
>> > +{
>> > +       if (!context)
>> > +               return;
>> >         audit_free_names(context);
>> >         unroll_tree_refs(context, NULL, 0);
>> >         free_tree_refs(context);
>>
>> I'm reserving the option to comment on this idea further as I make my
>> way through the patchset, but audit_free_context() definitely
>> shouldn't be declared as an inline function.
>
> Ok, I think I follow.  When it wasn't exported, inline was fine, but now
> that it has been exported, it should no longer be inlined ...

Pretty much.  Based on a few comments I've seen by compiler folks over
the years, my current thinking is that we shouldn't worry about
explicit inlining static functions in C files (header files are a
different story).  The basic idea being that the compiler almost
always does a better job than us stupid developers.

> ... or should use
> an intermediate function name to export so that local uses of it can
> remain inline.

Possibly, but my guess is that the compiler could (will?) do that by
itself for code that lives in the same file.

-- 
paul moore
www.paul-moore.com

^ permalink raw reply

* [pci PATCH v8 0/4] Add support for unmanaged SR-IOV
From: Alexander Duyck @ 2018-04-20 16:28 UTC (permalink / raw)
  To: bhelgaas, alexander.h.duyck, linux-pci
  Cc: virtio-dev, kvm, netdev, dan.daly, linux-kernel, linux-nvme,
	keith.busch, netanel, ddutile, mheyne, liang-min.wang,
	mark.d.rustad, dwmw2, hch, dwmw

This series is meant to add support for SR-IOV on devices when the VFs are
not managed by the kernel. Examples of recent patches attempting to do this
include:
virto - https://patchwork.kernel.org/patch/10241225/
pci-stub - https://patchwork.kernel.org/patch/10109935/
vfio - https://patchwork.kernel.org/patch/10103353/
uio - https://patchwork.kernel.org/patch/9974031/

Since this is quickly blowing up into a multi-driver problem it is probably
best to implement this solution as generically as possible.

This series is an attempt to do that. What we do with this patch set is
provide a generic framework to enable SR-IOV in the case that the PF driver
doesn't support managing the VFs itself.

I based my patch set originally on the patch by Mark Rustad but there isn't
much left after going through and cleaning out the bits that were no longer
needed, and after incorporating the feedback from David Miller. At this point
the only items to be fully reused was his patch description which is now
present in patch 3 of the set.

This solution is limited in scope to just adding support for devices that
provide no functionality for SR-IOV other than allocating the VFs by
calling pci_enable_sriov. Previous sets had included patches for VFIO, but
for now I am dropping that as the scope of that work is larger then I
think I can take on at this time.

v2: Reduced scope back to just virtio_pci and vfio-pci
    Broke into 3 patch set from single patch
    Changed autoprobe behavior to always set when num_vfs is set non-zero
v3: Updated Documentation to clarify when sriov_unmanaged_autoprobe is used
    Wrapped vfio_pci_sriov_configure to fix build errors w/o SR-IOV in kernel
v4: Dropped vfio-pci patch
    Added ena and nvme to drivers now using pci_sriov_configure_unmanaged
    Dropped pci_disable_sriov call in virtio_pci to be consistent with ena
v5: Dropped sriov_unmanaged_autoprobe and pci_sriov_conifgure_unmanaged
    Added new patch that enables pci_sriov_configure_simple
    Updated drivers to use pci_sriov_configure_simple
v6: Defined pci_sriov_configure_simple as NULL when SR-IOV is not enabled
    Updated drivers to drop "#ifdef" checks for IOV
    Added pci-pf-stub as place for PF-only drivers to add support
v7: Dropped pci_id table explanation from pci-pf-stub driver
    Updated pci_sriov_configure_simple to drop need for err value
    Fixed comment explaining why pci_sriov_configure_simple is NULL
v8: Dropped virtio from the set, support to be added later after TC approval

Cc: Mark Rustad <mark.d.rustad@intel.com>
Cc: Maximilian Heyne <mheyne@amazon.de>
Cc: Liang-Min Wang <liang-min.wang@intel.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>

---

Alexander Duyck (4):
      pci: Add pci_sriov_configure_simple for PFs that don't manage VF resources
      ena: Migrate over to unmanaged SR-IOV support
      nvme: Migrate over to unmanaged SR-IOV support
      pci-pf-stub: Add PF driver stub for PFs that function only to enable VFs


 drivers/net/ethernet/amazon/ena/ena_netdev.c |   28 -------------
 drivers/nvme/host/pci.c                      |   20 ----------
 drivers/pci/Kconfig                          |   12 ++++++
 drivers/pci/Makefile                         |    2 +
 drivers/pci/iov.c                            |   31 +++++++++++++++
 drivers/pci/pci-pf-stub.c                    |   54 ++++++++++++++++++++++++++
 include/linux/pci.h                          |    3 +
 include/linux/pci_ids.h                      |    2 +
 8 files changed, 106 insertions(+), 46 deletions(-)
 create mode 100644 drivers/pci/pci-pf-stub.c

--

^ permalink raw reply

* [pci PATCH v8 1/4] pci: Add pci_sriov_configure_simple for PFs that don't manage VF resources
From: Alexander Duyck @ 2018-04-20 16:28 UTC (permalink / raw)
  To: bhelgaas, alexander.h.duyck, linux-pci
  Cc: virtio-dev, kvm, netdev, dan.daly, linux-kernel, linux-nvme,
	keith.busch, netanel, ddutile, mheyne, liang-min.wang,
	mark.d.rustad, dwmw2, hch, dwmw
In-Reply-To: <20180420162633.46077.49012.stgit@ahduyck-green-test.jf.intel.com>

This patch adds a common configuration function called
pci_sriov_configure_simple that will allow for managing VFs on devices
where the PF is not capable of managing VF resources.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Mark Rustad <mark.d.rustad@intel.com>
---

v5: New patch replacing pci_sriov_configure_unmanaged with
      pci_sriov_configure_simple
    Dropped bits related to autoprobe changes
v6: Defined pci_sriov_configure_simple as NULL if IOV is disabled
v7: Updated pci_sriov_configure_simple to drop need for err value
    Fixed comment explaining why pci_sriov_configure_simple is NULL

 drivers/pci/iov.c   |   31 +++++++++++++++++++++++++++++++
 include/linux/pci.h |    3 +++
 2 files changed, 34 insertions(+)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 677924a..3e0a7fd 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -807,3 +807,34 @@ int pci_sriov_get_totalvfs(struct pci_dev *dev)
 	return dev->sriov->total_VFs;
 }
 EXPORT_SYMBOL_GPL(pci_sriov_get_totalvfs);
+
+/**
+ * pci_sriov_configure_simple - helper to configure unmanaged SR-IOV
+ * @dev: the PCI device
+ * @nr_virtfn: number of virtual functions to enable, 0 to disable
+ *
+ * Used to provide generic enable/disable SR-IOV option for devices
+ * that do not manage the VFs generated by their driver. Return value
+ * is negative on error, or number of VFs allocated on success.
+ */
+int pci_sriov_configure_simple(struct pci_dev *dev, int nr_virtfn)
+{
+	might_sleep();
+
+	if (!dev->is_physfn)
+		return -ENODEV;
+
+	if (pci_vfs_assigned(dev)) {
+		pci_warn(dev,
+			 "Cannot modify SR-IOV while VFs are assigned\n");
+		return -EPERM;
+	}
+
+	if (!nr_virtfn) {
+		sriov_disable(dev);
+		return 0;
+	}
+
+	return sriov_enable(dev, nr_virtfn) ? : nr_virtfn;
+}
+EXPORT_SYMBOL_GPL(pci_sriov_configure_simple);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index ae42289..7d36e39 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1955,6 +1955,7 @@ static inline void pci_mmcfg_late_init(void) { }
 int pci_vfs_assigned(struct pci_dev *dev);
 int pci_sriov_set_totalvfs(struct pci_dev *dev, u16 numvfs);
 int pci_sriov_get_totalvfs(struct pci_dev *dev);
+int pci_sriov_configure_simple(struct pci_dev *dev, int nr_virtfn);
 resource_size_t pci_iov_resource_size(struct pci_dev *dev, int resno);
 void pci_vf_drivers_autoprobe(struct pci_dev *dev, bool probe);
 #else
@@ -1982,6 +1983,8 @@ static inline int pci_sriov_set_totalvfs(struct pci_dev *dev, u16 numvfs)
 { return 0; }
 static inline int pci_sriov_get_totalvfs(struct pci_dev *dev)
 { return 0; }
+/* this is expected to be used as a function pointer, just define as NULL */
+#define pci_sriov_configure_simple NULL
 static inline resource_size_t pci_iov_resource_size(struct pci_dev *dev, int resno)
 { return 0; }
 static inline void pci_vf_drivers_autoprobe(struct pci_dev *dev, bool probe) { }

^ permalink raw reply related

* [pci PATCH v8 2/4] ena: Migrate over to unmanaged SR-IOV support
From: Alexander Duyck @ 2018-04-20 16:30 UTC (permalink / raw)
  To: bhelgaas, alexander.h.duyck, linux-pci
  Cc: virtio-dev, kvm, netdev, dan.daly, linux-kernel, linux-nvme,
	keith.busch, netanel, ddutile, mheyne, liang-min.wang,
	mark.d.rustad, dwmw2, hch, dwmw
In-Reply-To: <20180420162633.46077.49012.stgit@ahduyck-green-test.jf.intel.com>

Instead of implementing our own version of a SR-IOV configuration stub in
the ena driver we can just reuse the existing
pci_sriov_configure_simple function.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---

v5: Replaced call to pci_sriov_configure_unmanaged with
        pci_sriov_configure_simple
v6: Dropped "#ifdef" checks for IOV wrapping sriov_configure definition
v7: No change

 drivers/net/ethernet/amazon/ena/ena_netdev.c |   28 +-------------------------
 1 file changed, 1 insertion(+), 27 deletions(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index a822e70..f2af87d 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -3386,32 +3386,6 @@ static int ena_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 }
 
 /*****************************************************************************/
-static int ena_sriov_configure(struct pci_dev *dev, int numvfs)
-{
-	int rc;
-
-	if (numvfs > 0) {
-		rc = pci_enable_sriov(dev, numvfs);
-		if (rc != 0) {
-			dev_err(&dev->dev,
-				"pci_enable_sriov failed to enable: %d vfs with the error: %d\n",
-				numvfs, rc);
-			return rc;
-		}
-
-		return numvfs;
-	}
-
-	if (numvfs == 0) {
-		pci_disable_sriov(dev);
-		return 0;
-	}
-
-	return -EINVAL;
-}
-
-/*****************************************************************************/
-/*****************************************************************************/
 
 /* ena_remove - Device Removal Routine
  * @pdev: PCI device information struct
@@ -3526,7 +3500,7 @@ static int ena_resume(struct pci_dev *pdev)
 	.suspend    = ena_suspend,
 	.resume     = ena_resume,
 #endif
-	.sriov_configure = ena_sriov_configure,
+	.sriov_configure = pci_sriov_configure_simple,
 };
 
 static int __init ena_init(void)

^ permalink raw reply related

* [pci PATCH v8 3/4] nvme: Migrate over to unmanaged SR-IOV support
From: Alexander Duyck @ 2018-04-20 16:31 UTC (permalink / raw)
  To: bhelgaas, alexander.h.duyck, linux-pci
  Cc: virtio-dev, kvm, netdev, dan.daly, linux-kernel, linux-nvme,
	keith.busch, netanel, ddutile, mheyne, liang-min.wang,
	mark.d.rustad, dwmw2, hch, dwmw
In-Reply-To: <20180420162633.46077.49012.stgit@ahduyck-green-test.jf.intel.com>

Instead of implementing our own version of a SR-IOV configuration stub in
the nvme driver we can just reuse the existing
pci_sriov_configure_simple function.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---

v5: Replaced call to pci_sriov_configure_unmanaged with
        pci_sriov_configure_simple
v6: Dropped "#ifdef" checks for IOV wrapping sriov_configure definition
v7: No code change, added Reviewed-by

 drivers/nvme/host/pci.c |   20 +-------------------
 1 file changed, 1 insertion(+), 19 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index b6f43b7..ad85cf35 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2581,24 +2581,6 @@ static void nvme_remove(struct pci_dev *pdev)
 	nvme_put_ctrl(&dev->ctrl);
 }
 
-static int nvme_pci_sriov_configure(struct pci_dev *pdev, int numvfs)
-{
-	int ret = 0;
-
-	if (numvfs == 0) {
-		if (pci_vfs_assigned(pdev)) {
-			dev_warn(&pdev->dev,
-				"Cannot disable SR-IOV VFs while assigned\n");
-			return -EPERM;
-		}
-		pci_disable_sriov(pdev);
-		return 0;
-	}
-
-	ret = pci_enable_sriov(pdev, numvfs);
-	return ret ? ret : numvfs;
-}
-
 #ifdef CONFIG_PM_SLEEP
 static int nvme_suspend(struct device *dev)
 {
@@ -2717,7 +2699,7 @@ static void nvme_error_resume(struct pci_dev *pdev)
 	.driver		= {
 		.pm	= &nvme_dev_pm_ops,
 	},
-	.sriov_configure = nvme_pci_sriov_configure,
+	.sriov_configure = pci_sriov_configure_simple,
 	.err_handler	= &nvme_err_handler,
 };
 

^ permalink raw reply related

* [pci PATCH v8 4/4] pci-pf-stub: Add PF driver stub for PFs that function only to enable VFs
From: Alexander Duyck @ 2018-04-20 16:31 UTC (permalink / raw)
  To: bhelgaas, alexander.h.duyck, linux-pci
  Cc: virtio-dev, kvm, netdev, dan.daly, linux-kernel, linux-nvme,
	keith.busch, netanel, ddutile, mheyne, liang-min.wang,
	mark.d.rustad, dwmw2, hch, dwmw
In-Reply-To: <20180420162633.46077.49012.stgit@ahduyck-green-test.jf.intel.com>

Add a new driver called "pci-pf-stub" to act as a "white-list" for PF
devices that provide no other functionality other then acting as a means of
allocating a set of VFs. For now I only have one example ID provided by
Amazon in terms of devices that require this functionality. The general
idea is that in the future we will see other devices added as vendors come
up with devices where the PF is more or less just a lightweight shim used
to allocate VFs.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
---

v6: New driver to address concerns about Amazon devices left unsupported
v7: Dropped pci_id table explanation from pci-pf-stub driver

 drivers/pci/Kconfig       |   12 ++++++++++
 drivers/pci/Makefile      |    2 ++
 drivers/pci/pci-pf-stub.c |   54 +++++++++++++++++++++++++++++++++++++++++++++
 include/linux/pci_ids.h   |    2 ++
 4 files changed, 70 insertions(+)
 create mode 100644 drivers/pci/pci-pf-stub.c

diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
index 34b56a8..cdef2a2 100644
--- a/drivers/pci/Kconfig
+++ b/drivers/pci/Kconfig
@@ -71,6 +71,18 @@ config PCI_STUB
 
 	  When in doubt, say N.
 
+config PCI_PF_STUB
+	tristate "PCI PF Stub driver"
+	depends on PCI
+	depends on PCI_IOV
+	help
+	  Say Y or M here if you want to enable support for devices that
+	  require SR-IOV support, while at the same time the PF itself is
+	  not providing any actual services on the host itself such as
+	  storage or networking.
+
+	  When in doubt, say N.
+
 config XEN_PCIDEV_FRONTEND
         tristate "Xen PCI Frontend"
         depends on PCI && X86 && XEN
diff --git a/drivers/pci/Makefile b/drivers/pci/Makefile
index 9419709..4e133d3 100644
--- a/drivers/pci/Makefile
+++ b/drivers/pci/Makefile
@@ -43,6 +43,8 @@ obj-$(CONFIG_PCI_SYSCALL) += syscall.o
 
 obj-$(CONFIG_PCI_STUB) += pci-stub.o
 
+obj-$(CONFIG_PCI_PF_STUB) += pci-pf-stub.o
+
 obj-$(CONFIG_PCI_ECAM) += ecam.o
 
 obj-$(CONFIG_XEN_PCIDEV_FRONTEND) += xen-pcifront.o
diff --git a/drivers/pci/pci-pf-stub.c b/drivers/pci/pci-pf-stub.c
new file mode 100644
index 0000000..9d5fdf2
--- /dev/null
+++ b/drivers/pci/pci-pf-stub.c
@@ -0,0 +1,54 @@
+// SPDX-License-Identifier: GPL-2.0
+/* pci-pf-stub - simple stub driver for PCI SR-IOV PF device
+ *
+ * This driver is meant to act as a "white-list" for devices that provde
+ * SR-IOV functionality while at the same time not actually needing a
+ * driver of their own.
+ */
+
+#include <linux/module.h>
+#include <linux/pci.h>
+
+/**
+ * pci_pf_stub_white_list - White list of devices to bind pci-pf-stub onto
+ *
+ * This table provides the list of IDs this driver is supposed to bind
+ * onto. You could think of this as a list of "quirked" devices where we
+ * are adding support for SR-IOV here since there are no other drivers
+ * that they would be running under.
+ */
+static const struct pci_device_id pci_pf_stub_white_list[] = {
+	{ PCI_VDEVICE(AMAZON, 0x0053) },
+	/* required last entry */
+	{ 0 }
+};
+MODULE_DEVICE_TABLE(pci, pci_pf_stub_white_list);
+
+static int pci_pf_stub_probe(struct pci_dev *dev,
+			     const struct pci_device_id *id)
+{
+	pci_info(dev, "claimed by pci-pf-stub\n");
+	return 0;
+}
+
+static struct pci_driver pf_stub_driver = {
+	.name			= "pci-pf-stub",
+	.id_table		= pci_pf_stub_white_list,
+	.probe			= pci_pf_stub_probe,
+	.sriov_configure	= pci_sriov_configure_simple,
+};
+
+static int __init pci_pf_stub_init(void)
+{
+	return pci_register_driver(&pf_stub_driver);
+}
+
+static void __exit pci_pf_stub_exit(void)
+{
+	pci_unregister_driver(&pf_stub_driver);
+}
+
+module_init(pci_pf_stub_init);
+module_exit(pci_pf_stub_exit);
+
+MODULE_LICENSE("GPL");
diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
index a637a7d..62dab14 100644
--- a/include/linux/pci_ids.h
+++ b/include/linux/pci_ids.h
@@ -2549,6 +2549,8 @@
 #define PCI_VENDOR_ID_CIRCUITCO		0x1cc8
 #define PCI_SUBSYSTEM_ID_CIRCUITCO_MINNOWBOARD	0x0001
 
+#define PCI_VENDOR_ID_AMAZON		0x1d0f
+
 #define PCI_VENDOR_ID_TEKRAM		0x1de1
 #define PCI_DEVICE_ID_TEKRAM_DC290	0xdc29
 

^ permalink raw reply related

* Re: [PATCH 0/8] arm: renesas: Change platform dependency to ARCH_RENESAS
From: Mark Brown @ 2018-04-20 16:48 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: alsa-devel, Kuninori Morimoto, Catalin Marinas, Will Deacon,
	Liam Girdwood, Laurent Pinchart, devel, Mauro Carvalho Chehab,
	Vinod Koul, Magnus Damm, Russell King, linux-media, Arnd Bergmann,
	Simon Horman, Dan Williams, Jaroslav Kysela, linux-arm-kernel,
	Sergei Shtylyov, Greg Kroah-Hartman, Takashi Iwai, linux-kernel,
	linux-renesas-soc, netdev, dmaengine
In-Reply-To: <1524230914-10175-1-git-send-email-geert+renesas@glider.be>


[-- Attachment #1.1: Type: text/plain, Size: 460 bytes --]

On Fri, Apr 20, 2018 at 03:28:26PM +0200, Geert Uytterhoeven wrote:

> The first 6 patches can be applied independently by subsystem
> maintainers.
> The last two patches depend on the first 6 patches, and are thus marked
> RFC.

Would it not make sense to try to apply everything en masse rather than
delaying?  I'm happy to apply the subsystem stuff but if it gets things
done quicker or more efficiently I'm also happy to have the lot merged
as one series.

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

[-- Attachment #2: Type: text/plain, Size: 169 bytes --]

_______________________________________________
devel mailing list
devel@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel

^ permalink raw reply

* [RFC iproute 0/5] print_uint issues
From: Stephen Hemminger @ 2018-04-20 17:15 UTC (permalink / raw)
  To: alin.nastac; +Cc: netdev, Stephen Hemminger

There are several issues with implicit conversions in json
handling print_uint function which were first seen and patches
proposed by Alin Nastac.  This is my swipe at them.

Final version will be some combination of his patches and
some of this.

Spotted some more changing print_uint into a macro,
but that got ugly.

Stephen Hemminger (5):
  ipneigh: fix missing format specifier
  json: make json print_uint take unsigned int
  flower: use 16 bit format where possible
  tcp_metrics: use print_luint
  mroute: use print_uint64

 include/json_print.h | 16 +++++++++++++---
 ip/ipmroute.c        | 10 +++++-----
 ip/ipneigh.c         |  2 +-
 ip/tcp_metrics.c     |  2 +-
 lib/json_print.c     | 33 ++++++++++++++++++++++++++++++---
 tc/f_flower.c        |  2 +-
 6 files changed, 51 insertions(+), 14 deletions(-)

-- 
2.17.0

^ permalink raw reply

* [RFC iproute 1/5] ipneigh: fix missing format specifier
From: Stephen Hemminger @ 2018-04-20 17:15 UTC (permalink / raw)
  To: alin.nastac; +Cc: netdev, Stephen Hemminger
In-Reply-To: <20180420171519.8028-1-stephen@networkplumber.org>

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 ip/ipneigh.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ip/ipneigh.c b/ip/ipneigh.c
index 4748381701e4..bd6e5c5e18ca 100644
--- a/ip/ipneigh.c
+++ b/ip/ipneigh.c
@@ -204,7 +204,7 @@ static void print_cacheinfo(const struct nda_cacheinfo *ci)
 
 	print_uint(PRINT_ANY, "used", " used %u", ci->ndm_used / hz);
 	print_uint(PRINT_ANY, "confirmed", "/%u", ci->ndm_confirmed / hz);
-	print_uint(PRINT_ANY, "updated", "/u", ci->ndm_updated / hz);
+	print_uint(PRINT_ANY, "updated", "/%u", ci->ndm_updated / hz);
 }
 
 static void print_neigh_state(unsigned int nud)
-- 
2.17.0

^ permalink raw reply related

* [RFC iproute 2/5] json: make json print_uint take unsigned int
From: Stephen Hemminger @ 2018-04-20 17:15 UTC (permalink / raw)
  To: alin.nastac; +Cc: netdev, Stephen Hemminger
In-Reply-To: <20180420171519.8028-1-stephen@networkplumber.org>

It is a source of possible bugs that json print handler print_uint
was doing implict conversion to 64 bit than printing with the
format specifier which often had only a unsigned format value.

Instead introduce wider range of print stubs for unsigned
integer types. No color versions of these necessary.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 include/json_print.h | 16 +++++++++++++---
 lib/json_print.c     | 33 ++++++++++++++++++++++++++++++---
 2 files changed, 43 insertions(+), 6 deletions(-)

diff --git a/include/json_print.h b/include/json_print.h
index 2ca7830adbd6..3d400ecf8f50 100644
--- a/include/json_print.h
+++ b/include/json_print.h
@@ -59,12 +59,22 @@ _PRINT_FUNC(int, int);
 _PRINT_FUNC(bool, bool);
 _PRINT_FUNC(null, const char*);
 _PRINT_FUNC(string, const char*);
-_PRINT_FUNC(uint, uint64_t);
-_PRINT_FUNC(hu, unsigned short);
 _PRINT_FUNC(hex, unsigned int);
 _PRINT_FUNC(0xhex, unsigned int);
-_PRINT_FUNC(lluint, unsigned long long int);
 _PRINT_FUNC(float, double);
 #undef _PRINT_FUNC
 
+#define _PRINT_FUNC(type_name, type)			\
+	void print_##type_name(enum output_type t,	\
+			       const char *key,		\
+			       const char *fmt,		\
+			       type value);
+
+_PRINT_FUNC(hu, unsigned short);
+_PRINT_FUNC(uint, unsigned int);
+_PRINT_FUNC(luint, unsigned long int);
+_PRINT_FUNC(uint64, uint64_t);
+_PRINT_FUNC(lluint, unsigned long long int);
+#undef _PRINT_FUNC
+
 #endif /* _JSON_PRINT_H_ */
diff --git a/lib/json_print.c b/lib/json_print.c
index bda7293350c3..696d8c01d3e6 100644
--- a/lib/json_print.c
+++ b/lib/json_print.c
@@ -116,12 +116,39 @@ void close_json_array(enum output_type type, const char *str)
 		}							\
 	}
 _PRINT_FUNC(int, int);
-_PRINT_FUNC(hu, unsigned short);
-_PRINT_FUNC(uint, uint64_t);
-_PRINT_FUNC(lluint, unsigned long long int);
 _PRINT_FUNC(float, double);
 #undef _PRINT_FUNC
 
+static void json_print_uint64(const char *key, uint64_t val)
+{
+	if (key)
+		jsonw_uint_field(_jw, key, val);
+	else
+		jsonw_uint(_jw, val);
+}
+
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wformat-nonliteral"
+#define PRINT_FUNC(type_name, type)		\
+	void print_##type_name(enum output_type t, \
+				const char *key,	\
+				const char *fmt,	\
+				type value)		\
+	{						\
+		if (_IS_JSON_CONTEXT(t))		\
+			json_print_uint64(key, value);	\
+		else if (_IS_FP_CONTEXT(t))		\
+			printf(fmt, value);		\
+	}
+
+PRINT_FUNC(hu, unsigned short);
+PRINT_FUNC(uint, unsigned int);
+PRINT_FUNC(lluint, unsigned long long int);
+PRINT_FUNC(luint, unsigned long int);
+PRINT_FUNC(uint64, uint64_t);
+#undef PRINT_FUNC
+#pragma GCC diagnostic pop
+
 void print_color_string(enum output_type type,
 			enum color_attr color,
 			const char *key,
-- 
2.17.0

^ permalink raw reply related

* [RFC iproute 3/5] flower: use 16 bit format where possible
From: Stephen Hemminger @ 2018-04-20 17:15 UTC (permalink / raw)
  To: alin.nastac; +Cc: netdev, Stephen Hemminger
In-Reply-To: <20180420171519.8028-1-stephen@networkplumber.org>

Should use print_hu not print_uint for 16 bit value.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 tc/f_flower.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tc/f_flower.c b/tc/f_flower.c
index 9d4bfd2f808b..ba8eb66cdd11 100644
--- a/tc/f_flower.c
+++ b/tc/f_flower.c
@@ -1234,7 +1234,7 @@ static void flower_print_port(char *name, struct rtattr *attr)
 		return;
 
 	sprintf(namefrm,"\n  %s %%u", name);
-	print_uint(PRINT_ANY, name, namefrm, rta_getattr_be16(attr));
+	print_hu(PRINT_ANY, name, namefrm, rta_getattr_be16(attr));
 }
 
 static void flower_print_tcp_flags(char *name, struct rtattr *flags_attr,
-- 
2.17.0

^ permalink raw reply related

* [RFC iproute 4/5] tcp_metrics: use print_luint
From: Stephen Hemminger @ 2018-04-20 17:15 UTC (permalink / raw)
  To: alin.nastac; +Cc: netdev, Stephen Hemminger
In-Reply-To: <20180420171519.8028-1-stephen@networkplumber.org>

Metrics are long unsigned, so use correct print function.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 ip/tcp_metrics.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ip/tcp_metrics.c b/ip/tcp_metrics.c
index 72dc980c92a6..3b82a4e1d2f2 100644
--- a/ip/tcp_metrics.c
+++ b/ip/tcp_metrics.c
@@ -139,7 +139,7 @@ static void print_tcp_metrics(struct rtattr *a)
 
 		print_uint(PRINT_JSON, name, NULL, val);
 		print_string(PRINT_FP, NULL, " %s ", name);
-		print_uint(PRINT_FP, NULL, "%lu", val);
+		print_luint(PRINT_FP, NULL, "%lu", val);
 	}
 
 	if (rtt) {
-- 
2.17.0

^ permalink raw reply related

* [RFC iproute 5/5] mroute: use print_uint64
From: Stephen Hemminger @ 2018-04-20 17:15 UTC (permalink / raw)
  To: alin.nastac; +Cc: netdev, Stephen Hemminger
In-Reply-To: <20180420171519.8028-1-stephen@networkplumber.org>

The values from multicast stats are 64 bit always.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 ip/ipmroute.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/ip/ipmroute.c b/ip/ipmroute.c
index 59c5b7718e18..eb6b2816324f 100644
--- a/ip/ipmroute.c
+++ b/ip/ipmroute.c
@@ -182,14 +182,14 @@ int print_mroute(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
 		struct rta_mfc_stats *mfcs = RTA_DATA(tb[RTA_MFC_STATS]);
 
 		print_string(PRINT_FP, NULL, "%s", _SL_);
-		print_uint(PRINT_ANY, "packets", "  %"PRIu64" packets,",
+		print_uint64(PRINT_ANY, "packets", "  %"PRIu64" packets,",
 			   mfcs->mfcs_packets);
-		print_uint(PRINT_ANY, "bytes", " %"PRIu64" bytes", mfcs->mfcs_bytes);
+		print_uint64(PRINT_ANY, "bytes", " %"PRIu64" bytes", mfcs->mfcs_bytes);
 
 		if (mfcs->mfcs_wrong_if)
-			print_uint(PRINT_ANY, "wrong_if",
-				   ", %"PRIu64" arrived on wrong iif.",
-				   mfcs->mfcs_wrong_if);
+			print_uint64(PRINT_ANY, "wrong_if",
+				     ", %"PRIu64" arrived on wrong iif.",
+				     mfcs->mfcs_wrong_if);
 	}
 
 	if (show_stats && tb[RTA_EXPIRES]) {
-- 
2.17.0

^ permalink raw reply related

* Re: [PATCH net-next 0/5] virtio-net: Add SCTP checksum offload support
From: Marcelo Ricardo Leitner @ 2018-04-20 17:22 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Vlad Yasevich, Vladislav Yasevich, netdev, linux-sctp,
	virtualization, jasowang, nhorman
In-Reply-To: <20180418162633-mutt-send-email-mst@kernel.org>

On Wed, Apr 18, 2018 at 05:06:46PM +0300, Michael S. Tsirkin wrote:
> On Tue, Apr 17, 2018 at 04:35:18PM -0400, Vlad Yasevich wrote:
> > On 04/02/2018 10:47 AM, Marcelo Ricardo Leitner wrote:
> > > On Mon, Apr 02, 2018 at 09:40:01AM -0400, Vladislav Yasevich wrote:
> > >> Now that we have SCTP offload capabilities in the kernel, we can add
> > >> them to virtio as well.  First step is SCTP checksum.
> > >
> > > Thanks.
> > >
> > >> As for GSO, the way sctp GSO is currently implemented buys us nothing
> > >> in added support to virtio.  To add true GSO, would require a lot of
> > >> re-work inside of SCTP and would require extensions to the virtio
> > >> net header to carry extra sctp data.
> > >
> > > Can you please elaborate more on this? Is this because SCTP GSO relies
> > > on the gso skb format for knowing how to segment it instead of having
> > > a list of sizes?
> > >
> >
> > it's mainly because all the true segmentation, placing data into chunks,
> > has already happened.  All that GSO does is allow for higher bundling
> > rate between VMs. If that is all SCTP GSO ever going to do, that fine,
> > but the goal is to do real GSO eventually and potentially reduce the
> > amount of memory copying we are doing.
> > If we do that, any current attempt at GSO in virtio would have to be
> > depricated and we'd need GSO2 or something like that.
>
> Batching helps virtualization *a lot* though.

Yep. The results posted by Xin in the other email give good insights
on it.

> Are there actual plans for GSO2? Is it just for SCTP?

No plans. In this context, at least, yes, just for SCTP.

It was a supposition in case we start doing a different GSO for SCTP,
one more like what we have for TCP.

Currently, as the SCTP GSO code doesn't leave the system, we can
update it if we want. But by the moment we add support for it in
virtio, we will have to be backwards compatible if we end up doing
SCTP GSO differently.

But again, I don't think such approach for SCTP GSO would be neither
feasible or worth. The complexity for it, to work across stream
schedules and late TSN allocation, would do more harm then good IMO.

>
> >
> > This is why, after doing the GSO support, I decided not to include it.
> >
> > -vlad
> > >   Marcelo
> > >

^ permalink raw reply

* Re: [pci PATCH v8 0/4] Add support for unmanaged SR-IOV
From: Randy Dunlap @ 2018-04-20 17:23 UTC (permalink / raw)
  To: Alexander Duyck, bhelgaas, linux-pci
  Cc: virtio-dev, kvm, netdev, dan.daly, linux-kernel, linux-nvme,
	keith.busch, netanel, ddutile, mheyne, liang-min.wang,
	mark.d.rustad, dwmw2, hch, dwmw
In-Reply-To: <20180420162633.46077.49012.stgit@ahduyck-green-test.jf.intel.com>

On 04/20/18 09:28, Alexander Duyck wrote:
> This series is meant to add support for SR-IOV on devices when the VFs are
> not managed by the kernel. Examples of recent patches attempting to do this
> include:
> virto - https://patchwork.kernel.org/patch/10241225/
> pci-stub - https://patchwork.kernel.org/patch/10109935/
> vfio - https://patchwork.kernel.org/patch/10103353/
> uio - https://patchwork.kernel.org/patch/9974031/

Hi,

Somewhere in this patch series it would be nice to tell us what the heck
a "PF" is.  :)

Thanks.

> Since this is quickly blowing up into a multi-driver problem it is probably
> best to implement this solution as generically as possible.
> 
> This series is an attempt to do that. What we do with this patch set is
> provide a generic framework to enable SR-IOV in the case that the PF driver
> doesn't support managing the VFs itself.
> 
> I based my patch set originally on the patch by Mark Rustad but there isn't
> much left after going through and cleaning out the bits that were no longer
> needed, and after incorporating the feedback from David Miller. At this point
> the only items to be fully reused was his patch description which is now
> present in patch 3 of the set.
> 
> This solution is limited in scope to just adding support for devices that
> provide no functionality for SR-IOV other than allocating the VFs by
> calling pci_enable_sriov. Previous sets had included patches for VFIO, but
> for now I am dropping that as the scope of that work is larger then I
> think I can take on at this time.
> 
> v2: Reduced scope back to just virtio_pci and vfio-pci
>     Broke into 3 patch set from single patch
>     Changed autoprobe behavior to always set when num_vfs is set non-zero
> v3: Updated Documentation to clarify when sriov_unmanaged_autoprobe is used
>     Wrapped vfio_pci_sriov_configure to fix build errors w/o SR-IOV in kernel
> v4: Dropped vfio-pci patch
>     Added ena and nvme to drivers now using pci_sriov_configure_unmanaged
>     Dropped pci_disable_sriov call in virtio_pci to be consistent with ena
> v5: Dropped sriov_unmanaged_autoprobe and pci_sriov_conifgure_unmanaged
>     Added new patch that enables pci_sriov_configure_simple
>     Updated drivers to use pci_sriov_configure_simple
> v6: Defined pci_sriov_configure_simple as NULL when SR-IOV is not enabled
>     Updated drivers to drop "#ifdef" checks for IOV
>     Added pci-pf-stub as place for PF-only drivers to add support
> v7: Dropped pci_id table explanation from pci-pf-stub driver
>     Updated pci_sriov_configure_simple to drop need for err value
>     Fixed comment explaining why pci_sriov_configure_simple is NULL
> v8: Dropped virtio from the set, support to be added later after TC approval
> 
> Cc: Mark Rustad <mark.d.rustad@intel.com>
> Cc: Maximilian Heyne <mheyne@amazon.de>
> Cc: Liang-Min Wang <liang-min.wang@intel.com>
> Cc: David Woodhouse <dwmw@amazon.co.uk>
> 
> ---
> 
> Alexander Duyck (4):
>       pci: Add pci_sriov_configure_simple for PFs that don't manage VF resources
>       ena: Migrate over to unmanaged SR-IOV support
>       nvme: Migrate over to unmanaged SR-IOV support
>       pci-pf-stub: Add PF driver stub for PFs that function only to enable VFs
> 
> 
>  drivers/net/ethernet/amazon/ena/ena_netdev.c |   28 -------------
>  drivers/nvme/host/pci.c                      |   20 ----------
>  drivers/pci/Kconfig                          |   12 ++++++
>  drivers/pci/Makefile                         |    2 +
>  drivers/pci/iov.c                            |   31 +++++++++++++++
>  drivers/pci/pci-pf-stub.c                    |   54 ++++++++++++++++++++++++++
>  include/linux/pci.h                          |    3 +
>  include/linux/pci_ids.h                      |    2 +
>  8 files changed, 106 insertions(+), 46 deletions(-)
>  create mode 100644 drivers/pci/pci-pf-stub.c
> 
> --
> 

-- 
~Randy

^ permalink raw reply

* Re: [PATCH] PCI: Add PCIe to pcie_print_link_status() messages
From: Bjorn Helgaas @ 2018-04-20 17:38 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: oss-drivers, Tal Gilboa, Tariq Toukan, Jacob Keller,
	Ganesh Goudar, Jeff Kirsher, intel-wired-lan, netdev,
	linux-kernel, linux-pci
In-Reply-To: <20180413181638.6424-1-jakub.kicinski@netronome.com>

On Fri, Apr 13, 2018 at 11:16:38AM -0700, Jakub Kicinski wrote:
> Currently the pcie_print_link_status() will print PCIe bandwidth
> and link width information but does not mention it is pertaining
> to the PCIe.  Since this and related functions are used exclusively
> by networking drivers today users may get confused into thinking
> that it's the NIC bandwidth that is being talked about.  Insert a
> "PCIe" into the messages.
> 
> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>

Applied to for-linus for v4.17, thanks!

> ---
>  drivers/pci/pci.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index aa86e904f93c..73a0a4993f6a 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -5273,11 +5273,11 @@ void pcie_print_link_status(struct pci_dev *dev)
>  	bw_avail = pcie_bandwidth_available(dev, &limiting_dev, &speed, &width);
>  
>  	if (bw_avail >= bw_cap)
> -		pci_info(dev, "%u.%03u Gb/s available bandwidth (%s x%d link)\n",
> +		pci_info(dev, "%u.%03u Gb/s available PCIe bandwidth (%s x%d link)\n",
>  			 bw_cap / 1000, bw_cap % 1000,
>  			 PCIE_SPEED2STR(speed_cap), width_cap);
>  	else
> -		pci_info(dev, "%u.%03u Gb/s available bandwidth, limited by %s x%d link at %s (capable of %u.%03u Gb/s with %s x%d link)\n",
> +		pci_info(dev, "%u.%03u Gb/s available PCIe bandwidth, limited by %s x%d link at %s (capable of %u.%03u Gb/s with %s x%d link)\n",
>  			 bw_avail / 1000, bw_avail % 1000,
>  			 PCIE_SPEED2STR(speed), width,
>  			 limiting_dev ? pci_name(limiting_dev) : "<unknown>",
> -- 
> 2.16.2
> 

^ permalink raw reply

* Re: [PATCH RFC net-next 00/11] udp gso
From: Alexander Duyck @ 2018-04-20 17:38 UTC (permalink / raw)
  To: Willem de Bruijn
  Cc: David Miller, Sowmini Varadhan, Samudrala, Sridhar, Netdev,
	Willem de Bruijn
In-Reply-To: <CAF=yD-+V7B67NbL8aELa9QR0Hx8bAvKNQ=JfjkwGGFtey-2FOw@mail.gmail.com>

On Wed, Apr 18, 2018 at 11:22 AM, Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
> On Wed, Apr 18, 2018 at 2:12 PM, Alexander Duyck
> <alexander.duyck@gmail.com> wrote:
>> On Wed, Apr 18, 2018 at 10:28 AM, David Miller <davem@davemloft.net> wrote:
>>> From: Sowmini Varadhan <sowmini.varadhan@oracle.com>
>>> Date: Wed, 18 Apr 2018 08:31:03 -0400
>>>
>>>> However, I share Sridhar's concerns about the very fundamental change
>>>> to UDP message boundary semantics here.  There is actually no such thing
>>>> as a "segment" in udp, so in general this feature makes me a little
>>>> uneasy.  Well behaved udp applications should already be sending mtu
>>>> sized datagrams. And the not-so-well-behaved ones are probably relying
>>>> on IP fragmentation/reassembly to take care of datagram boundary semantics
>>>> for them?
>>>>
>>>> As Sridhar points out, the feature is not really "negotiated" - one side
>>>> unilaterally sets the option. If the receiver is a classic/POSIX UDP
>>>> implementation, it will have no way of knowing that message boundaries
>>>> have been re-adjusted at the sender.
>>>
>>> There are no "semantics".
>>>
>>> What ends up on the wire is the same before the kernel/app changes as
>>> afterwards.
>>>
>>> The only difference is that instead of the application doing N - 1
>>> sendmsg() calls with mtu sized writes, it's giving everything all at
>>> once and asking the kernel to segment.
>>>
>>> It even gives the application control over the size of the packets,
>>> which I think is completely prudent in this situation.
>>
>> My only concern with the patch set is verifying what mitigations are
>> in case so that we aren't trying to set an MSS size that results in a
>> frame larger than MTU. I'm still digging through the code and trying
>> to grok it, but I figured I might just put the question out there to
>> may my reviewing easier.
>
> This is checked in udp_send_skb in
>
>                 const int hlen = skb_network_header_len(skb) +
>                                  sizeof(struct udphdr);
>
>                 if (hlen + cork->gso_size > cork->fragsize)
>                         return -EINVAL;
>
> At this point cork->fragsize will have been set in ip_setup_cork
> based on device or path mtu:
>
>         cork->fragsize = ip_sk_use_pmtu(sk) ?
>                          dst_mtu(&rt->dst) : rt->dst.dev->mtu;
>
> The feature bypasses the MTU sanity checks in __ip_append_data
> by setting local variable mtu to a network layer max
>
>         mtu = cork->gso_size ? IP_MAX_MTU : cork->fragsize;
>
> but the above should perform the same check, only later. The
> check in udp_send_skb is simpler as it does not have to deal
> with the case of fragmentation.
>
>> Also any plans for HW offload support for this? I vaguely recall that
>> the igb and ixgbe parts had support for something like this in
>> hardware. I would have to double check to see what exactly is
>> supported.
>
> I hadn't given that much thought until the request yesterday to
> expose the NETIF_F_GSO_UDP_L4 flag through ethtool. By
> virtue of having only a single fixed segmentation length, it
> appears reasonably straightforward to offload.

Actually I just got a chance to start on a review of things. Do we
need to have to use both GSO_UDP and and GSO_UDP_L4? It might be
better if we could split these up and specifically call out GSO_UDP as
UFO and GSO_UDP_L4 as being UDP segmentation.

My only concern is that I don't think devices would want to have to
try and advertise both GSO_UDP and GSO_UDP_L4 if they want to support
UDP segmentation only. Ideally we want to keep UFO separate from UDP
segmentation since they would be two different things.

Thanks.

- Alex

^ permalink raw reply

* Re: [PATCH RFC net-next 00/11] udp gso
From: Tushar Dave @ 2018-04-20 18:27 UTC (permalink / raw)
  To: Alexander Duyck, David Miller
  Cc: Sowmini Varadhan, Willem de Bruijn, Samudrala, Sridhar, Netdev,
	Willem de Bruijn
In-Reply-To: <CAKgT0UcefCpG2j7RQN8aUUgu2bud_RTFs_s+nFPY2GhKgE8L+A@mail.gmail.com>



On 04/18/2018 11:12 AM, Alexander Duyck wrote:
> On Wed, Apr 18, 2018 at 10:28 AM, David Miller <davem@davemloft.net> wrote:
>> From: Sowmini Varadhan <sowmini.varadhan@oracle.com>
>> Date: Wed, 18 Apr 2018 08:31:03 -0400
>>
>>> However, I share Sridhar's concerns about the very fundamental change
>>> to UDP message boundary semantics here.  There is actually no such thing
>>> as a "segment" in udp, so in general this feature makes me a little
>>> uneasy.  Well behaved udp applications should already be sending mtu
>>> sized datagrams. And the not-so-well-behaved ones are probably relying
>>> on IP fragmentation/reassembly to take care of datagram boundary semantics
>>> for them?
>>>
>>> As Sridhar points out, the feature is not really "negotiated" - one side
>>> unilaterally sets the option. If the receiver is a classic/POSIX UDP
>>> implementation, it will have no way of knowing that message boundaries
>>> have been re-adjusted at the sender.
>>
>> There are no "semantics".
>>
>> What ends up on the wire is the same before the kernel/app changes as
>> afterwards.
>>
>> The only difference is that instead of the application doing N - 1
>> sendmsg() calls with mtu sized writes, it's giving everything all at
>> once and asking the kernel to segment.
>>
>> It even gives the application control over the size of the packets,
>> which I think is completely prudent in this situation.
> 
> My only concern with the patch set is verifying what mitigations are
> in case so that we aren't trying to set an MSS size that results in a
> frame larger than MTU. I'm still digging through the code and trying
> to grok it, but I figured I might just put the question out there to
> may my reviewing easier.
> 
> Also any plans for HW offload support for this? I vaguely recall that
> the igb and ixgbe parts had support for something like this in
> hardware. I would have to double check to see what exactly is
> supported.

Alex,

If by HW support you meant UFO (UDP Fragmentation Offload), then I have
dig into that last year using ixgbe. And I found that Intel 10G HW does
break large UDP packets into MTU size however it does not generate
*true* IP fragments. Instead, when large (> MTU) size UDP packet is
given to NIC, HW generates unique UDP packets with distinct IP
fragments. This makes it impossible for receiving station to reassemble
them into one UDP packet.

I am not sure about igb!

-Tushar


> 
> Thanks.
> 
> - Alex
> 

^ permalink raw reply

* Re: [PATCH net-next 0/5] virtio-net: Add SCTP checksum offload support
From: Michael S. Tsirkin @ 2018-04-20 18:32 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner
  Cc: Vlad Yasevich, Vladislav Yasevich, netdev, linux-sctp,
	virtualization, jasowang, nhorman
In-Reply-To: <20180420172219.GR4716@localhost.localdomain>

On Fri, Apr 20, 2018 at 02:22:19PM -0300, Marcelo Ricardo Leitner wrote:
> On Wed, Apr 18, 2018 at 05:06:46PM +0300, Michael S. Tsirkin wrote:
> > On Tue, Apr 17, 2018 at 04:35:18PM -0400, Vlad Yasevich wrote:
> > > On 04/02/2018 10:47 AM, Marcelo Ricardo Leitner wrote:
> > > > On Mon, Apr 02, 2018 at 09:40:01AM -0400, Vladislav Yasevich wrote:
> > > >> Now that we have SCTP offload capabilities in the kernel, we can add
> > > >> them to virtio as well.  First step is SCTP checksum.
> > > >
> > > > Thanks.
> > > >
> > > >> As for GSO, the way sctp GSO is currently implemented buys us nothing
> > > >> in added support to virtio.  To add true GSO, would require a lot of
> > > >> re-work inside of SCTP and would require extensions to the virtio
> > > >> net header to carry extra sctp data.
> > > >
> > > > Can you please elaborate more on this? Is this because SCTP GSO relies
> > > > on the gso skb format for knowing how to segment it instead of having
> > > > a list of sizes?
> > > >
> > >
> > > it's mainly because all the true segmentation, placing data into chunks,
> > > has already happened.  All that GSO does is allow for higher bundling
> > > rate between VMs. If that is all SCTP GSO ever going to do, that fine,
> > > but the goal is to do real GSO eventually and potentially reduce the
> > > amount of memory copying we are doing.
> > > If we do that, any current attempt at GSO in virtio would have to be
> > > depricated and we'd need GSO2 or something like that.
> >
> > Batching helps virtualization *a lot* though.
> 
> Yep. The results posted by Xin in the other email give good insights
> on it.
> 
> > Are there actual plans for GSO2? Is it just for SCTP?
> 
> No plans. In this context, at least, yes, just for SCTP.
> 
> It was a supposition in case we start doing a different GSO for SCTP,
> one more like what we have for TCP.
> 
> Currently, as the SCTP GSO code doesn't leave the system, we can
> update it if we want. But by the moment we add support for it in
> virtio, we will have to be backwards compatible if we end up doing
> SCTP GSO differently.

At least for TX you can always just disable the optimization. Won't be
worse than what is there now. RX is trickier - but that's GRO
not GSO.

> But again, I don't think such approach for SCTP GSO would be neither
> feasible or worth. The complexity for it, to work across stream
> schedules and late TSN allocation, would do more harm then good IMO.
> 
> >
> > >
> > > This is why, after doing the GSO support, I decided not to include it.
> > >
> > > -vlad
> > > >   Marcelo
> > > >

^ permalink raw reply

* Re: [PATCH bpf-next v3 4/8] bpf: add documentation for eBPF helpers (23-32)
From: Quentin Monnet @ 2018-04-20 18:54 UTC (permalink / raw)
  To: Daniel Borkmann, ast; +Cc: netdev, oss-drivers, linux-doc, linux-man
In-Reply-To: <6f1b43c7-5d79-7419-1053-d0b29c1e2bb9@iogearbox.net>

2018-04-19 13:16 UTC+0200 ~ Daniel Borkmann <daniel@iogearbox.net>
> On 04/17/2018 04:34 PM, Quentin Monnet wrote:
>> Add documentation for eBPF helper functions to bpf.h user header file.
>> This documentation can be parsed with the Python script provided in
>> another commit of the patch series, in order to provide a RST document
>> that can later be converted into a man page.
>>
>> The objective is to make the documentation easily understandable and
>> accessible to all eBPF developers, including beginners.
>>
>> This patch contains descriptions for the following helper functions, all
>> written by Daniel:
>>
>> - bpf_get_prandom_u32()
>> - bpf_get_smp_processor_id()
>> - bpf_get_cgroup_classid()
>> - bpf_get_route_realm()
>> - bpf_skb_load_bytes()
>> - bpf_csum_diff()
>> - bpf_skb_get_tunnel_opt()
>> - bpf_skb_set_tunnel_opt()
>> - bpf_skb_change_proto()
>> - bpf_skb_change_type()
>>
>> v3:
>> - bpf_get_prandom_u32(): Fix helper name :(. Add description, including
>>   a note on the internal random state.
>> - bpf_get_smp_processor_id(): Add description, including a note on the
>>   processor id remaining stable during program run.
>> - bpf_get_cgroup_classid(): State that CONFIG_CGROUP_NET_CLASSID is
>>   required to use the helper. Add a reference to related documentation.
>>   State that placing a task in net_cls controller disables cgroup-bpf.
>> - bpf_get_route_realm(): State that CONFIG_CGROUP_NET_CLASSID is
>>   required to use this helper.
>> - bpf_skb_load_bytes(): Fix comment on current use cases for the helper.
>>
>> Cc: Daniel Borkmann <daniel@iogearbox.net>
>> Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
>> ---
>>  include/uapi/linux/bpf.h | 152 +++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 152 insertions(+)
>>
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index c59bf5b28164..d748f65a8f58 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h

[...]

>> @@ -615,6 +632,27 @@ union bpf_attr {
>>   * 	Return
>>   * 		0 on success, or a negative error in case of failure.
>>   *
>> + * u32 bpf_get_cgroup_classid(struct sk_buff *skb)
>> + * 	Description
>> + * 		Retrieve the classid for the current task, i.e. for the
>> + * 		net_cls (network classifier) cgroup to which *skb* belongs.
>> + *
>> + * 		This helper is only available is the kernel was compiled with
>> + * 		the **CONFIG_CGROUP_NET_CLASSID** configuration option set to
>> + * 		"**y**" or to "**m**".
>> + *
>> + * 		Note that placing a task into the net_cls controller completely
>> + * 		disables the execution of eBPF programs with the cgroup.
> 
> I'm not sure I follow the above sentence, what do you mean by that?

Please see comment below.

> I would definitely also add here that this helper is limited to cgroups v1
> only, and that it works on clsact TC egress hook but not the ingress one.
> 
>> + * 		Also note that, in the above description, the "network
>> + * 		classifier" cgroup does not designate a generic classifier, but
>> + * 		a particular mechanism that provides an interface to tag
>> + * 		network packets with a specific class identifier. See also the
> 
> The "generic classifier" part is a bit strange to parse. I would probably
> leave the first part out and explain that this provides a means to tag
> packets based on a user-provided ID for all traffic coming from the tasks
> belonging to the related cgroup.

In this paragraph and in the one above, I am trying to address Alexei
comments to the RFC v2:

    please add that kernel should be configured with
    CONFIG_NET_CLS_CGROUP=y|m
    and mention Documentation/cgroup-v1/net_cls.txt
    Otherwise 'network classifier' is way too generic.
    I'd also mention that placing a task into net_cls controller
    disables all of cgroup-bpf.

But I must admit I am struggling a bit on this helper. If I understand
correctly, "network classifier" is too generic and could be confused
with TC classifiers? Hence the attempt to avoid that in the second
paragraph... As for "placing a task into net_cls controller disables all
of cgroup-bpf", again if I understand correctly, I think it refers to
cgroup_sk_alloc_disable() being called in write_classid() in file
net/core/netclassid_cgroup.c. But I am not familiar with it and cannot
find a nice way to explain that in the doc :/.

>> + * 		related kernel documentation, available from the Linux sources
>> + * 		in file *Documentation/cgroup-v1/net_cls.txt*.
>> + * 	Return
>> + * 		The classid, or 0 for the default unconfigured classid.
>> + *
>>   * int bpf_skb_vlan_push(struct sk_buff *skb, __be16 vlan_proto, u16 vlan_tci)
>>   * 	Description
>>   * 		Push a *vlan_tci* (VLAN tag control information) of protocol
>> @@ -734,6 +772,16 @@ union bpf_attr {
>>   * 		are **TC_ACT_REDIRECT** on success or **TC_ACT_SHOT** on
>>   * 		error.
>>   *
>> + * u32 bpf_get_route_realm(struct sk_buff *skb)
>> + * 	Description
>> + * 		Retrieve the realm or the route, that is to say the
>> + * 		**tclassid** field of the destination for the *skb*. This
>> + * 		helper is available only if the kernel was compiled with
>> + * 		**CONFIG_IP_ROUTE_CLASSID** configuration option.
> 
> Could mention that this is a similar user provided tag like in the net_cls
> case with cgroups only that this applies to routes here (dst entries) which
> hold this tag.
> 
> Also, should say that this works with clsact TC egress hook or alternatively
> conventional classful egress qdiscs, but not on TC ingress. In case of clsact
> TC egress hook this has the advantage that the dst entry has not been dropped
> yet in the xmit path. Therefore, the dst entry does not need to be artificially
> held via netif_keep_dst() for a classful qdisc until the skb is freed.

Here I am not sure to follow, the advantage is for clsact, but this is
only for a classful qdisc that we can avoid holding the dst entry? How
does holding the entry with netif_keep_dst() translate, from a user
space point of view?

Thanks a lot for the exhaustive review!
Quentin

^ permalink raw reply

* [PATCH net] strparser: Do not call mod_delayed_work with a timeout of LONG_MAX
From: Doron Roberts-Kedes @ 2018-04-20 19:11 UTC (permalink / raw)
  To: David Miller; +Cc: Tejun Heo, netdev, Doron Roberts-Kedes

struct sock's sk_rcvtimeo is initialized to
LONG_MAX/MAX_SCHEDULE_TIMEOUT in sock_init_data. Calling
mod_delayed_work with a timeout of LONG_MAX causes spurious execution of
the work function. timer->expires is set equal to jiffies + LONG_MAX.
When timer_base->clk falls behind the current value of jiffies,
the delta between timer_base->clk and jiffies + LONG_MAX causes the
expiration to be in the past. Returning early from strp_start_timer if
timeo == LONG_MAX solves this problem.

Found while testing net/tls_sw recv path.

Fixes: 43a0c6751a322847 ("strparser: Stream parser for messages")
Reviewed-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Doron Roberts-Kedes <doronrk@fb.com>
---
 net/strparser/strparser.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/strparser/strparser.c b/net/strparser/strparser.c
index 805b139..092bebc 100644
--- a/net/strparser/strparser.c
+++ b/net/strparser/strparser.c
@@ -67,7 +67,7 @@ static void strp_abort_strp(struct strparser *strp, int err)
 
 static void strp_start_timer(struct strparser *strp, long timeo)
 {
-	if (timeo)
+	if (timeo && timeo != LONG_MAX)
 		mod_delayed_work(strp_wq, &strp->msg_timer_work, timeo);
 }
 
-- 
2.9.5

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox