Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [virtio-dev] Re: [PATCH net-next v2 0/5] virtio: support packed ring
From: Michael S. Tsirkin @ 2018-09-12 16:16 UTC (permalink / raw)
  To: Tiwei Bie
  Cc: Jason Wang, virtualization, linux-kernel, netdev, virtio-dev,
	wexu, jfreimann
In-Reply-To: <20180911053726.GA7472@debian>

On Tue, Sep 11, 2018 at 01:37:26PM +0800, Tiwei Bie wrote:
> On Mon, Sep 10, 2018 at 11:33:17AM +0800, Jason Wang wrote:
> > On 2018年09月10日 11:00, Tiwei Bie wrote:
> > > On Fri, Sep 07, 2018 at 09:00:49AM -0400, Michael S. Tsirkin wrote:
> > > > On Fri, Sep 07, 2018 at 09:22:25AM +0800, Tiwei Bie wrote:
> > > > > On Mon, Aug 27, 2018 at 05:00:40PM +0300, Michael S. Tsirkin wrote:
> > > > > > Are there still plans to test the performance with vost pmd?
> > > > > > vhost doesn't seem to show a performance gain ...
> > > > > > 
> > > > > I tried some performance tests with vhost PMD. In guest, the
> > > > > XDP program will return XDP_DROP directly. And in host, testpmd
> > > > > will do txonly fwd.
> > > > > 
> > > > > When burst size is 1 and packet size is 64 in testpmd and
> > > > > testpmd needs to iterate 5 Tx queues (but only the first two
> > > > > queues are enabled) to prepare and inject packets, I got ~12%
> > > > > performance boost (5.7Mpps -> 6.4Mpps). And if the vhost PMD
> > > > > is faster (e.g. just need to iterate the first two queues to
> > > > > prepare and inject packets), then I got similar performance
> > > > > for both rings (~9.9Mpps) (packed ring's performance can be
> > > > > lower, because it's more complicated in driver.)
> > > > > 
> > > > > I think packed ring makes vhost PMD faster, but it doesn't make
> > > > > the driver faster. In packed ring, the ring is simplified, and
> > > > > the handling of the ring in vhost (device) is also simplified,
> > > > > but things are not simplified in driver, e.g. although there is
> > > > > no desc table in the virtqueue anymore, driver still needs to
> > > > > maintain a private desc state table (which is still managed as
> > > > > a list in this patch set) to support the out-of-order desc
> > > > > processing in vhost (device).
> > > > > 
> > > > > I think this patch set is mainly to make the driver have a full
> > > > > functional support for the packed ring, which makes it possible
> > > > > to leverage the packed ring feature in vhost (device). But I'm
> > > > > not sure whether there is any other better idea, I'd like to
> > > > > hear your thoughts. Thanks!
> > > > Just this: Jens seems to report a nice gain with virtio and
> > > > vhost pmd across the board. Try to compare virtio and
> > > > virtio pmd to see what does pmd do better?
> > > The virtio PMD (drivers/net/virtio) in DPDK doesn't need to share
> > > the virtio ring operation code with other drivers and is highly
> > > optimized for network. E.g. in Rx, the Rx burst function won't
> > > chain descs. So the ID management for the Rx ring can be quite
> > > simple and straightforward, we just need to initialize these IDs
> > > when initializing the ring and don't need to change these IDs
> > > in data path anymore (the mergable Rx code in that patch set
> > > assumes the descs will be written back in order, which should be
> > > fixed. I.e., the ID in the desc should be used to index vq->descx[]).
> > > The Tx code in that patch set also assumes the descs will be
> > > written back by device in order, which should be fixed.
> > 
> > Yes it is. I think I've pointed it out in some early version of pmd patch.
> > So I suspect part (or all) of the boost may come from in order feature.
> > 
> > > 
> > > But in kernel virtio driver, the virtio_ring.c is very generic.
> > > The enqueue (virtqueue_add()) and dequeue (virtqueue_get_buf_ctx())
> > > functions need to support all the virtio devices and should be
> > > able to handle all the possible cases that may happen. So although
> > > the packed ring can be very efficient in some cases, currently
> > > the room to optimize the performance in kernel's virtio_ring.c
> > > isn't that much. If we want to take the fully advantage of the
> > > packed ring's efficiency, we need some further e.g. API changes
> > > in virtio_ring.c, which shouldn't be part of this patch set.
> > 
> > Could you please share more thoughts on this e.g how to improve the API?
> > Notice since the API is shared by both split ring and packed ring, it may
> > improve the performance of split ring as well. One can easily imagine a
> > batching API, but it does not have many real users now, the only case is the
> > XDP transmission which can accept an array of XDP frames.
> 
> I don't have detailed thoughts on this yet. But kernel's
> virtio_ring.c is quite generic compared with what we did
> in virtio PMD.

In what way? What are some things that aren't implemented there?

If what you say is true then we should take a careful look
and not supporting these generic things with packed layout.
Once we do support them it will be too late and we won't
be able to get performance back.



> > 
> > > So
> > > I still think this patch set is mainly to make the kernel virtio
> > > driver to have a full functional support of the packed ring, and
> > > we can't expect impressive performance boost with it.
> > 
> > We can only gain when virtio ring layout is the bottleneck. If there're
> > bottlenecks elsewhere, we probably won't see any increasing in the numbers.
> > Vhost-net is an example, and lots of optimizations have proved that virtio
> > ring is not the main bottleneck for the current codes. I suspect it also the
> > case of virtio driver. Did perf tell us any interesting things in virtio
> > driver?
> > 
> > Thanks
> > 
> > > 
> > > > 
> > > > > > On Wed, Jul 11, 2018 at 10:27:06AM +0800, Tiwei Bie wrote:
> > > > > > > Hello everyone,
> > > > > > > 
> > > > > > > This patch set implements packed ring support in virtio driver.
> > > > > > > 
> > > > > > > Some functional tests have been done with Jason's
> > > > > > > packed ring implementation in vhost:
> > > > > > > 
> > > > > > > https://lkml.org/lkml/2018/7/3/33
> > > > > > > 
> > > > > > > Both of ping and netperf worked as expected.
> > > > > > > 
> > > > > > > v1 -> v2:
> > > > > > > - Use READ_ONCE() to read event off_wrap and flags together (Jason);
> > > > > > > - Add comments related to ccw (Jason);
> > > > > > > 
> > > > > > > RFC (v6) -> v1:
> > > > > > > - Avoid extra virtio_wmb() in virtqueue_enable_cb_delayed_packed()
> > > > > > >    when event idx is off (Jason);
> > > > > > > - Fix bufs calculation in virtqueue_enable_cb_delayed_packed() (Jason);
> > > > > > > - Test the state of the desc at used_idx instead of last_used_idx
> > > > > > >    in virtqueue_enable_cb_delayed_packed() (Jason);
> > > > > > > - Save wrap counter (as part of queue state) in the return value
> > > > > > >    of virtqueue_enable_cb_prepare_packed();
> > > > > > > - Refine the packed ring definitions in uapi;
> > > > > > > - Rebase on the net-next tree;
> > > > > > > 
> > > > > > > RFC v5 -> RFC v6:
> > > > > > > - Avoid tracking addr/len/flags when DMA API isn't used (MST/Jason);
> > > > > > > - Define wrap counter as bool (Jason);
> > > > > > > - Use ALIGN() in vring_init_packed() (Jason);
> > > > > > > - Avoid using pointer to track `next` in detach_buf_packed() (Jason);
> > > > > > > - Add comments for barriers (Jason);
> > > > > > > - Don't enable RING_PACKED on ccw for now (noticed by Jason);
> > > > > > > - Refine the memory barrier in virtqueue_poll();
> > > > > > > - Add a missing memory barrier in virtqueue_enable_cb_delayed_packed();
> > > > > > > - Remove the hacks in virtqueue_enable_cb_prepare_packed();
> > > > > > > 
> > > > > > > RFC v4 -> RFC v5:
> > > > > > > - Save DMA addr, etc in desc state (Jason);
> > > > > > > - Track used wrap counter;
> > > > > > > 
> > > > > > > RFC v3 -> RFC v4:
> > > > > > > - Make ID allocation support out-of-order (Jason);
> > > > > > > - Various fixes for EVENT_IDX support;
> > > > > > > 
> > > > > > > RFC v2 -> RFC v3:
> > > > > > > - Split into small patches (Jason);
> > > > > > > - Add helper virtqueue_use_indirect() (Jason);
> > > > > > > - Just set id for the last descriptor of a list (Jason);
> > > > > > > - Calculate the prev in virtqueue_add_packed() (Jason);
> > > > > > > - Fix/improve desc suppression code (Jason/MST);
> > > > > > > - Refine the code layout for XXX_split/packed and wrappers (MST);
> > > > > > > - Fix the comments and API in uapi (MST);
> > > > > > > - Remove the BUG_ON() for indirect (Jason);
> > > > > > > - Some other refinements and bug fixes;
> > > > > > > 
> > > > > > > RFC v1 -> RFC v2:
> > > > > > > - Add indirect descriptor support - compile test only;
> > > > > > > - Add event suppression supprt - compile test only;
> > > > > > > - Move vring_packed_init() out of uapi (Jason, MST);
> > > > > > > - Merge two loops into one in virtqueue_add_packed() (Jason);
> > > > > > > - Split vring_unmap_one() for packed ring and split ring (Jason);
> > > > > > > - Avoid using '%' operator (Jason);
> > > > > > > - Rename free_head -> next_avail_idx (Jason);
> > > > > > > - Add comments for virtio_wmb() in virtqueue_add_packed() (Jason);
> > > > > > > - Some other refinements and bug fixes;
> > > > > > > 
> > > > > > > Thanks!
> > > > > > > 
> > > > > > > Tiwei Bie (5):
> > > > > > >    virtio: add packed ring definitions
> > > > > > >    virtio_ring: support creating packed ring
> > > > > > >    virtio_ring: add packed ring support
> > > > > > >    virtio_ring: add event idx support in packed ring
> > > > > > >    virtio_ring: enable packed ring
> > > > > > > 
> > > > > > >   drivers/s390/virtio/virtio_ccw.c   |   14 +
> > > > > > >   drivers/virtio/virtio_ring.c       | 1365 ++++++++++++++++++++++------
> > > > > > >   include/linux/virtio_ring.h        |    8 +-
> > > > > > >   include/uapi/linux/virtio_config.h |    3 +
> > > > > > >   include/uapi/linux/virtio_ring.h   |   43 +
> > > > > > >   5 files changed, 1157 insertions(+), 276 deletions(-)
> > > > > > > 
> > > > > > > -- 
> > > > > > > 2.18.0
> > > > > > ---------------------------------------------------------------------
> > > > > > To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> > > > > > For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> > > > > > 
> > 

^ permalink raw reply

* Re: [PATCH net-next v2 3/5] virtio_ring: add packed ring support
From: Michael S. Tsirkin @ 2018-09-12 16:22 UTC (permalink / raw)
  To: Tiwei Bie
  Cc: jasowang, virtualization, linux-kernel, netdev, virtio-dev, wexu,
	jfreimann
In-Reply-To: <20180711022711.7090-4-tiwei.bie@intel.com>

On Wed, Jul 11, 2018 at 10:27:09AM +0800, Tiwei Bie wrote:
> This commit introduces the support (without EVENT_IDX) for
> packed ring.
> 
> Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
> ---
>  drivers/virtio/virtio_ring.c | 495 ++++++++++++++++++++++++++++++++++-
>  1 file changed, 487 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index c4f8abc7445a..f317b485ba54 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -55,12 +55,21 @@
>  #define END_USE(vq)
>  #endif
>  
> +#define _VRING_DESC_F_AVAIL(b)	((__u16)(b) << 7)
> +#define _VRING_DESC_F_USED(b)	((__u16)(b) << 15)
> +
>  struct vring_desc_state {
>  	void *data;			/* Data for callback. */
>  	struct vring_desc *indir_desc;	/* Indirect descriptor, if any. */
>  };
>  
>  struct vring_desc_state_packed {
> +	void *data;			/* Data for callback. */
> +	struct vring_packed_desc *indir_desc; /* Indirect descriptor, if any. */
> +	int num;			/* Descriptor list length. */
> +	dma_addr_t addr;		/* Buffer DMA addr. */
> +	u32 len;			/* Buffer length. */
> +	u16 flags;			/* Descriptor flags. */
>  	int next;			/* The next desc state. */
>  };
>  

Idea: how about using data to chain these descriptors?
You can validate it's pointing within an array to distinguish
e.g. on detach.

-- 
MST

^ permalink raw reply

* Re: [PATCH net-next v3 0/7] net: aquantia: implement WOL and EEE support
From: Igor Russkikh @ 2018-09-12 11:42 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20180911.234820.1747706577550366387.davem@davemloft.net>


>> Discussion outcome regarding driver version bumps was not finished
>> (here https://patchwork.ozlabs.org/patch/954905/)
>> David, could you suggest the best way to proceed on this?
> 
> Having a channel for your driver that is outside of upstream and Linux
> distribution packages creates lots of problems.
> 
> When a user reports a problem with an upstream kernel, that verion
> dictates which driver source was being used.  There is not confusion
> or ambiguity.
> 
> For a distribution kernel, the distributor hashes out which driver
> they published in their kernel package when evaluating a bug reported
> to them.
> 
> None of these two entities is ready to evaluate and handle properly
> your custom scheme.
> 
> So generally I frown against separate distribution schemes.  It is
> in the final analysis an inferior experience for the user because
> you basically narrow all of their support channels for problems
> down to you and you alone.  The whole idea is to make it work the
> opposite way.
> 
> So in the upstream tree, really, the driver version is pretty useless.

Thank you for the comment, David.

I'll pass over these concerns to my company management.

BR, Igor

^ permalink raw reply

* Re: [PATCH] net: dsa: mv88e6xxx: Add MV88E6352 DT compatible
From: Andrew Lunn @ 2018-09-12 12:47 UTC (permalink / raw)
  To: Marek Vasut; +Cc: netdev
In-Reply-To: <b5cdfc7d-6c0a-b7ad-e0dd-f4d196f1811c@denx.de>

On Wed, Sep 12, 2018 at 10:38:41AM +0200, Marek Vasut wrote:
> On 09/12/2018 02:46 AM, Andrew Lunn wrote:
> > On Wed, Sep 12, 2018 at 02:04:54AM +0200, Marek Vasut wrote:
> >> On 09/12/2018 01:32 AM, Andrew Lunn wrote:
> >>>> compatible = "marvell,mv88e6352", "marvell,mv88e6085";
> >>>
> >>> Just "marvell,mv88e6085";
> >>>
> >>> Please take a look at all the other DT files using the Marvell
> >>> chips. You will only ever find "marvell,mv88e6085" or
> >>> "marvell,mv88e6190", because everything is compatible to one of these
> >>> two.
> >>
> >> Well, until you find a difference between those chips which you cannot
> >> discern based solely on the ID register content. Then it's better to
> >> have a compatible to match on which matches the actual chip.
> > 
> > Hi Marek
> > 
> > We have been around this loop before. The problem with putting in a
> > more specific compatible is that nothing is validating it. So it is
> > going to be wrong, simple because people cut/paste, and don't
> > necessary change it. So when we do need to look at it, we cannot look
> > at it, because it is wrong.
> > 
> > I would only add a more specific compatible if and when we need it, it
> > is actually used, and we can verify it is correct.
> 
> You may need it in the future and it may not be possible to adjust the
> DT then. So having a specific compatible and fallback compatible is I
> think the way to go. You can always match on the fallback and if the
> time comes, you can match on the specific one because it's there. In
> addition to that, such a DT would fully correctly describe the hardware,
> unlike one with only a fallback compatible.
> 
> Regarding validation, I don't see other systems using this approach
> (specific + fallback compatible) having a problem with validation.
> What is the trick there ?

The trick is, they actually use the specific version, not the
generic. So it is validated, because if the specific version is wrong,
the device generally does not work.

We have no way to do that in this case. We load the driver based on
the generic version. It then goes and looks at the ID registers, and
from that decides what the device really is.

So far, Marvell have not messed this up. The ID registers have been
correct. I trust the ID registers much more than anything in device
tree. It is baked into silicon.

But you are talking of a theoretical case where the ID register is
wrong. For this to get passed us and out their in the wild, with DT
blown in ROM, that probably means there are a batch with the correct
ID and a batch with incorrect IDs. Those with incorrect IDs are
failing. You are arguing that we should then enable the use of the
specific compatible. And i expect that breaks more devices than it
fixes, because people have cut/pasted the wrong value.

       Andrew

^ permalink raw reply

* Re: [PATCH net-next v3 01/17] asm: simd context helper API
From: Jason A. Donenfeld @ 2018-09-12 18:10 UTC (permalink / raw)
  To: kevin
  Cc: LKML, Netdev, David Miller, Greg Kroah-Hartman, Andrew Lutomirski,
	Thomas Gleixner, Samuel Neves, linux-arch
In-Reply-To: <20180912061433.GA8484@ip-172-31-15-78>

On Wed, Sep 12, 2018 at 8:14 AM Kevin Easton <kevin@guarana.org> wrote:
> Given that it's always supposed to be used like that, mightn't it be
> better if simd_relax() took a pointer to the context, so the call is
> just
>
>     simd_relax(&simd_context);
>
> ?
>
> The inlining means that there won't actually be a pointer dereference in
> the emitted code.
>
> If simd_put() also took a pointer then it could set the context back to
> HAVE_NO_SIMD as well?

That's sort of a neat idea. I guess in this scheme, you'd envision:

   simd_context_t simd_context;

   simd_get(&simd_context);
   simd_relax(&simd_context);
   simd_put(&simd_context);

And this way, if simd_context ever becomes a heavier struct, it can be
modified in place rather than returned by value from the function. On
the other hand, it's a little bit more annoying to type and makes it
harder to do declaration and initialization on the same line.

^ permalink raw reply

* Re: [PATCH v2 05/17] compat_ioctl: move more drivers to generic_compat_ioctl_ptrarg
From: Greg Kroah-Hartman @ 2018-09-12 18:13 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linux-fbdev-u79uwXL29TY76Z2rM5mHXA,
	linux-iio-u79uwXL29TY76Z2rM5mHXA,
	linux-pci-u79uwXL29TY76Z2rM5mHXA,
	linux-remoteproc-u79uwXL29TY76Z2rM5mHXA,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	platform-driver-x86-u79uwXL29TY76Z2rM5mHXA,
	sparclinux-u79uwXL29TY76Z2rM5mHXA,
	devel-gWbeCf7V1WCQmaza687I9mD2FQJk+8+b,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA,
	linux-nvdimm-hn68Rpc1hR1g9hUCZPvPmw,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	qat-linux-ral2JQCrhuEAvxtiuMwx3w,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	linux-input-u79uwXL29TY76Z2rM5mHXA,
	linux-media-u79uwXL29TY76Z2rM5mHXA,
	linaro-mm-sig-cunTk1MwBs8s++Sfvej+rw,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	ceph-devel-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-usb-u79uwXL29TY76Z2rM5mHXA,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-crypto-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linuxppc-dev-uLR06cmDAlY/bJ5BZ2RsiQ, David S. Miller,
	linux-btrfs-u79uwXL29TY76Z2rM5mHXA,
	viro-RmSDqhL/yNMiFSDQTTA3OLVCufUGDwFn
In-Reply-To: <20180912151134.436719-1-arnd-r2nGTMty4D4@public.gmane.org>

On Wed, Sep 12, 2018 at 05:08:52PM +0200, Arnd Bergmann wrote:
> The .ioctl and .compat_ioctl file operations have the same prototype so
> they can both point to the same function, which works great almost all
> the time when all the commands are compatible.
> 
> One exception is the s390 architecture, where a compat pointer is only
> 31 bit wide, and converting it into a 64-bit pointer requires calling
> compat_ptr(). Most drivers here will ever run in s390, but since we now
> have a generic helper for it, it's easy enough to use it consistently.
> 
> I double-checked all these drivers to ensure that all ioctl arguments
> are used as pointers or are ignored, but are not interpreted as integer
> values.
> 
> Signed-off-by: Arnd Bergmann <arnd-r2nGTMty4D4@public.gmane.org>

Acked-by: Greg Kroah-Hartman <gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org>

^ permalink raw reply

* Re: [PATCH v2 02/17] compat_ioctl: move drivers to generic_compat_ioctl_ptrarg
From: Greg Kroah-Hartman @ 2018-09-12 18:13 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: kvm, Alexander Shishkin, Jarkko Sakkinen, virtualization,
	Benjamin Tissoires, linux-mtd, Peter Huewe, linux1394-devel,
	devel, Jason Gunthorpe, Marek Vasut, linux-input, Tomas Winkler,
	Jiri Kosina, viro, Artem Bityutskiy, netdev, linux-usb,
	linux-kernel, Sudip Mukherjee, Stefan Richter, linux-fsdevel,
	linux-integrity, David S. Miller
In-Reply-To: <20180912150142.157913-2-arnd@arndb.de>

On Wed, Sep 12, 2018 at 05:01:03PM +0200, Arnd Bergmann wrote:
> Each of these drivers has a copy of the same trivial helper function to
> convert the pointer argument and then call the native ioctl handler.
> 
> We now have a generic implementation of that, so use it.
> 
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

^ permalink raw reply

* Re: [PATCH] net: dsa: mv88e6xxx: Add MV88E6352 DT compatible
From: Marek Vasut @ 2018-09-12 13:10 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: netdev
In-Reply-To: <20180912124738.GA24595@lunn.ch>

On 09/12/2018 02:47 PM, Andrew Lunn wrote:
> On Wed, Sep 12, 2018 at 10:38:41AM +0200, Marek Vasut wrote:
>> On 09/12/2018 02:46 AM, Andrew Lunn wrote:
>>> On Wed, Sep 12, 2018 at 02:04:54AM +0200, Marek Vasut wrote:
>>>> On 09/12/2018 01:32 AM, Andrew Lunn wrote:
>>>>>> compatible = "marvell,mv88e6352", "marvell,mv88e6085";
>>>>>
>>>>> Just "marvell,mv88e6085";
>>>>>
>>>>> Please take a look at all the other DT files using the Marvell
>>>>> chips. You will only ever find "marvell,mv88e6085" or
>>>>> "marvell,mv88e6190", because everything is compatible to one of these
>>>>> two.
>>>>
>>>> Well, until you find a difference between those chips which you cannot
>>>> discern based solely on the ID register content. Then it's better to
>>>> have a compatible to match on which matches the actual chip.
>>>
>>> Hi Marek
>>>
>>> We have been around this loop before. The problem with putting in a
>>> more specific compatible is that nothing is validating it. So it is
>>> going to be wrong, simple because people cut/paste, and don't
>>> necessary change it. So when we do need to look at it, we cannot look
>>> at it, because it is wrong.
>>>
>>> I would only add a more specific compatible if and when we need it, it
>>> is actually used, and we can verify it is correct.
>>
>> You may need it in the future and it may not be possible to adjust the
>> DT then. So having a specific compatible and fallback compatible is I
>> think the way to go. You can always match on the fallback and if the
>> time comes, you can match on the specific one because it's there. In
>> addition to that, such a DT would fully correctly describe the hardware,
>> unlike one with only a fallback compatible.
>>
>> Regarding validation, I don't see other systems using this approach
>> (specific + fallback compatible) having a problem with validation.
>> What is the trick there ?
> 
> The trick is, they actually use the specific version, not the
> generic. So it is validated, because if the specific version is wrong,
> the device generally does not work.

Many of them use the fallback version to avoid polluting the tables in
the kernel with redundant compat strings. The specific string is only
added into the driver if there is a HW difference which needs to be
dealt with.

> We have no way to do that in this case. We load the driver based on
> the generic version. It then goes and looks at the ID registers, and
> from that decides what the device really is.

And this works fine, until marvell screws it up in one of those chips.

> So far, Marvell have not messed this up. The ID registers have been
> correct. I trust the ID registers much more than anything in device
> tree. It is baked into silicon.

But the DT should correctly describe the hardware, if it doesn't, it's
just broken.

> But you are talking of a theoretical case where the ID register is
> wrong. For this to get passed us and out their in the wild, with DT
> blown in ROM, that probably means there are a batch with the correct
> ID and a batch with incorrect IDs. Those with incorrect IDs are
> failing. You are arguing that we should then enable the use of the
> specific compatible. And i expect that breaks more devices than it
> fixes, because people have cut/pasted the wrong value.

I am arguing you should have both specific compatible which describes
which exact chip is in that device AND a fallback compatible which the
driver matches on.

-- 
Best regards,
Marek Vasut

^ permalink raw reply

* Re: [RFC v2 1/2] netlink: add NLA_REJECT policy type
From: David Miller @ 2018-09-12 18:15 UTC (permalink / raw)
  To: johannes-cdvu00un1VgdHxzADdlk8Q
  Cc: linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA, mkubecek-AlSwsSmVLrQ,
	johannes.berg-ral2JQCrhuEAvxtiuMwx3w
In-Reply-To: <20180912083610.20857-1-johannes-cdvu00un1VgdHxzADdlk8Q@public.gmane.org>

From: Johannes Berg <johannes-cdvu00un1VgdHxzADdlk8Q@public.gmane.org>
Date: Wed, 12 Sep 2018 10:36:09 +0200

> From: Johannes Berg <johannes.berg-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> 
> In some situations some netlink attributes may be used for output
> only (kernel->userspace) or may be reserved for future use. It's
> then helpful to be able to prevent userspace from using them in
> messages sent to the kernel, since they'd otherwise be ignored and
> any future will become impossible if this happens.
> 
> Add NLA_REJECT to the policy which does nothing but reject (with
> EINVAL) validation of any messages containing this attribute.
> Allow for returning a specific extended ACK error message in the
> validation_data pointer.
> 
> While at it fix the indentation of NLA_BITFIELD32 and describe the
> validation_data pointer for it.
> 
> The specific case I have in mind now is a shared nested attribute
> containing request/response data, and it would be pointless and
> potentially confusing to have userspace include response data in
> the messages that actually contain a request.
> 
> Signed-off-by: Johannes Berg <johannes.berg-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>

This looks great, no objections to this idea or the facility.

It does, however, remind me about about the classic problem of how bad
we are at feature support detection because unrecognized attributes are
ignored.

I do really hope we can fully solve that problem some day.

^ permalink raw reply

* Re: [PATCH net-next v3 02/17] zinc: introduce minimal cryptography library
From: Jason A. Donenfeld @ 2018-09-12 18:16 UTC (permalink / raw)
  To: Eric Biggers
  Cc: Ard Biesheuvel, LKML, Netdev, David Miller, Greg Kroah-Hartman,
	Andrew Lutomirski, Samuel Neves, Jean-Philippe Aumasson,
	Linux Crypto Mailing List
In-Reply-To: <20180911220849.GC81235@gmail.com>

Hi Eric,

On Wed, Sep 12, 2018 at 12:08 AM Eric Biggers <ebiggers@kernel.org> wrote:
> I'd strongly prefer the assembly to be readable too.  Jason, I'm not sure if
> you've actually read through the asm from the OpenSSL implementations, but the
> generated .S files actually do lose a lot of semantic information that was in
> the original .pl scripts.

The thing to keep in mind is that the .S was not directly and blindly
generated from the .pl. We started with the output of the .pl, and
then, particularly in the case of x86_64, worked with it a lot, and
now it's something a bit different. We've definitely spent a lot of
time reading that assembly.

I'll see if I can improve the readability with some register name
remapping on ARM. No guarantees, but I'll play a bit and see if I can
make it a bit better.

Jason

^ permalink raw reply

* Re: [PATCH net-next v3 02/17] zinc: introduce minimal cryptography library
From: Ard Biesheuvel @ 2018-09-12 18:19 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: Eric Biggers, LKML, Netdev, David Miller, Greg Kroah-Hartman,
	Andrew Lutomirski, Samuel Neves, Jean-Philippe Aumasson,
	Linux Crypto Mailing List
In-Reply-To: <CAHmME9rxrC+CAR-xoXd-bZO1HYZ+TjvT_K4xgQwWXE53zi61xQ@mail.gmail.com>

On 12 September 2018 at 20:16, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> Hi Eric,
>
> On Wed, Sep 12, 2018 at 12:08 AM Eric Biggers <ebiggers@kernel.org> wrote:
>> I'd strongly prefer the assembly to be readable too.  Jason, I'm not sure if
>> you've actually read through the asm from the OpenSSL implementations, but the
>> generated .S files actually do lose a lot of semantic information that was in
>> the original .pl scripts.
>
> The thing to keep in mind is that the .S was not directly and blindly
> generated from the .pl. We started with the output of the .pl, and
> then, particularly in the case of x86_64, worked with it a lot, and
> now it's something a bit different. We've definitely spent a lot of
> time reading that assembly.
>

Can we please have those changes as a separate patch? Preferably to
the .pl file rather than the .S file, so we can easily distinguish the
code from upstream from the code that you modified.

> I'll see if I can improve the readability with some register name
> remapping on ARM. No guarantees, but I'll play a bit and see if I can
> make it a bit better.
>
> Jason

^ permalink raw reply

* Re: [PATCH net-next v3 02/17] zinc: introduce minimal cryptography library
From: Eric Biggers @ 2018-09-12 18:34 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Jason A. Donenfeld, LKML, Netdev, David Miller,
	Greg Kroah-Hartman, Andrew Lutomirski, Samuel Neves,
	Jean-Philippe Aumasson, Linux Crypto Mailing List
In-Reply-To: <CAKv+Gu9ZjX3A4bHHgRAkJtr+SKwti+d3dwE0fhGtmeQB_refqw@mail.gmail.com>

On Wed, Sep 12, 2018 at 08:19:21PM +0200, Ard Biesheuvel wrote:
> On 12 September 2018 at 20:16, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> > Hi Eric,
> >
> > On Wed, Sep 12, 2018 at 12:08 AM Eric Biggers <ebiggers@kernel.org> wrote:
> >> I'd strongly prefer the assembly to be readable too.  Jason, I'm not sure if
> >> you've actually read through the asm from the OpenSSL implementations, but the
> >> generated .S files actually do lose a lot of semantic information that was in
> >> the original .pl scripts.
> >
> > The thing to keep in mind is that the .S was not directly and blindly
> > generated from the .pl. We started with the output of the .pl, and
> > then, particularly in the case of x86_64, worked with it a lot, and
> > now it's something a bit different. We've definitely spent a lot of
> > time reading that assembly.
> >
> 
> Can we please have those changes as a separate patch? Preferably to
> the .pl file rather than the .S file, so we can easily distinguish the
> code from upstream from the code that you modified.
> 
> > I'll see if I can improve the readability with some register name
> > remapping on ARM. No guarantees, but I'll play a bit and see if I can
> > make it a bit better.
> >
> > Jason

FWIW, yesterday I made a modified version of poly1305-armv4.pl that generates an
asm file that works in kernel mode.  The changes are actually pretty small, and
I think we can get them upstream into OpenSSL like they were for sha256-armv4.pl
and sha512-armv4.pl.  I'll start a thread with Andy Polyakov and you two.

But I don't have time to help with all the many OpenSSL asm files Jason is
proposing, just maybe poly1305-armv4 and chacha-armv4 for now.

- Eric

^ permalink raw reply

* [PATCH net 0/4] s390/qeth: fixes 2018-09-12
From: Julian Wiedmann @ 2018-09-12 13:31 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, linux-s390, Martin Schwidefsky, Heiko Carstens,
	Stefan Raspl, Ursula Braun, Julian Wiedmann

Hi Dave,

please apply the following qeth fixes for -net.

Patch 1 resolves a regression in an error path, while patch 2 enables
the SG support by default that was newly introduced with 4.19.
Patch 3 takes care of a longstanding problem with large-order
allocations, and patch 4 fixes a potential out-of-bounds access.

Thanks,
Julian


Julian Wiedmann (3):
  s390/qeth: indicate error when netdev allocation fails
  s390/qeth: switch on SG by default for IQD devices
  s390/qeth: don't dump past end of unknown HW header

Wenjia Zhang (1):
  s390/qeth: use vzalloc for QUERY OAT buffer

 drivers/s390/net/qeth_core_main.c | 11 ++++++++---
 drivers/s390/net/qeth_l2_main.c   |  2 +-
 drivers/s390/net/qeth_l3_main.c   |  2 +-
 3 files changed, 10 insertions(+), 5 deletions(-)

-- 
2.16.4

^ permalink raw reply

* [PATCH net 1/4] s390/qeth: indicate error when netdev allocation fails
From: Julian Wiedmann @ 2018-09-12 13:31 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, linux-s390, Martin Schwidefsky, Heiko Carstens,
	Stefan Raspl, Ursula Braun, Julian Wiedmann
In-Reply-To: <20180912133135.12335-1-jwi@linux.ibm.com>

Bailing out on allocation error is nice, but we also need to tell the
ccwgroup core that creating the qeth groupdev failed.

Fixes: d3d1b205e89f ("s390/qeth: allocate netdevice early")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
---
 drivers/s390/net/qeth_core_main.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/s390/net/qeth_core_main.c b/drivers/s390/net/qeth_core_main.c
index 49f64eb3eab0..6b24face21d5 100644
--- a/drivers/s390/net/qeth_core_main.c
+++ b/drivers/s390/net/qeth_core_main.c
@@ -5768,8 +5768,10 @@ static int qeth_core_probe_device(struct ccwgroup_device *gdev)
 	qeth_update_from_chp_desc(card);
 
 	card->dev = qeth_alloc_netdev(card);
-	if (!card->dev)
+	if (!card->dev) {
+		rc = -ENOMEM;
 		goto err_card;
+	}
 
 	qeth_determine_capabilities(card);
 	enforced_disc = qeth_enforce_discipline(card);
-- 
2.16.4

^ permalink raw reply related

* [PATCH net 2/4] s390/qeth: switch on SG by default for IQD devices
From: Julian Wiedmann @ 2018-09-12 13:31 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, linux-s390, Martin Schwidefsky, Heiko Carstens,
	Stefan Raspl, Ursula Braun, Julian Wiedmann
In-Reply-To: <20180912133135.12335-1-jwi@linux.ibm.com>

Scatter-gather transmit brings a nice performance boost. Considering the
rather large MTU sizes at play, it's also totally the Right Thing To Do.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
---
 drivers/s390/net/qeth_core_main.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/s390/net/qeth_core_main.c b/drivers/s390/net/qeth_core_main.c
index 6b24face21d5..b60055e9cb1a 100644
--- a/drivers/s390/net/qeth_core_main.c
+++ b/drivers/s390/net/qeth_core_main.c
@@ -5706,6 +5706,8 @@ static struct net_device *qeth_alloc_netdev(struct qeth_card *card)
 		dev->priv_flags &= ~IFF_TX_SKB_SHARING;
 		dev->hw_features |= NETIF_F_SG;
 		dev->vlan_features |= NETIF_F_SG;
+		if (IS_IQD(card))
+			dev->features |= NETIF_F_SG;
 	}
 
 	return dev;
-- 
2.16.4

^ permalink raw reply related

* [PATCH net 3/4] s390/qeth: use vzalloc for QUERY OAT buffer
From: Julian Wiedmann @ 2018-09-12 13:31 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, linux-s390, Martin Schwidefsky, Heiko Carstens,
	Stefan Raspl, Ursula Braun, Wenjia Zhang, Julian Wiedmann
In-Reply-To: <20180912133135.12335-1-jwi@linux.ibm.com>

From: Wenjia Zhang <wenjia@linux.ibm.com>

qeth_query_oat_command() currently allocates the kernel buffer for
the SIOC_QETH_QUERY_OAT ioctl with kzalloc. So on systems with
fragmented memory, large allocations may fail (eg. the qethqoat tool by
default uses 132KB).

Solve this issue by using vzalloc, backing the allocation with
non-contiguous memory.

Signed-off-by: Wenjia Zhang <wenjia@linux.ibm.com>
Reviewed-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
---
 drivers/s390/net/qeth_core_main.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/s390/net/qeth_core_main.c b/drivers/s390/net/qeth_core_main.c
index b60055e9cb1a..de8282420f96 100644
--- a/drivers/s390/net/qeth_core_main.c
+++ b/drivers/s390/net/qeth_core_main.c
@@ -25,6 +25,7 @@
 #include <linux/netdevice.h>
 #include <linux/netdev_features.h>
 #include <linux/skbuff.h>
+#include <linux/vmalloc.h>
 
 #include <net/iucv/af_iucv.h>
 #include <net/dsfield.h>
@@ -4699,7 +4700,7 @@ static int qeth_query_oat_command(struct qeth_card *card, char __user *udata)
 
 	priv.buffer_len = oat_data.buffer_len;
 	priv.response_len = 0;
-	priv.buffer =  kzalloc(oat_data.buffer_len, GFP_KERNEL);
+	priv.buffer = vzalloc(oat_data.buffer_len);
 	if (!priv.buffer) {
 		rc = -ENOMEM;
 		goto out;
@@ -4740,7 +4741,7 @@ static int qeth_query_oat_command(struct qeth_card *card, char __user *udata)
 			rc = -EFAULT;
 
 out_free:
-	kfree(priv.buffer);
+	vfree(priv.buffer);
 out:
 	return rc;
 }
-- 
2.16.4

^ permalink raw reply related

* [PATCH net 4/4] s390/qeth: don't dump past end of unknown HW header
From: Julian Wiedmann @ 2018-09-12 13:31 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, linux-s390, Martin Schwidefsky, Heiko Carstens,
	Stefan Raspl, Ursula Braun, Julian Wiedmann
In-Reply-To: <20180912133135.12335-1-jwi@linux.ibm.com>

For inbound data with an unsupported HW header format, only dump the
actual HW header. We have no idea how much payload follows it, and what
it contains. Worst case, we dump past the end of the Inbound Buffer and
access whatever is located next in memory.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
---
 drivers/s390/net/qeth_l2_main.c | 2 +-
 drivers/s390/net/qeth_l3_main.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/s390/net/qeth_l2_main.c b/drivers/s390/net/qeth_l2_main.c
index 710fa74892ae..b5e38531733f 100644
--- a/drivers/s390/net/qeth_l2_main.c
+++ b/drivers/s390/net/qeth_l2_main.c
@@ -423,7 +423,7 @@ static int qeth_l2_process_inbound_buffer(struct qeth_card *card,
 		default:
 			dev_kfree_skb_any(skb);
 			QETH_CARD_TEXT(card, 3, "inbunkno");
-			QETH_DBF_HEX(CTRL, 3, hdr, QETH_DBF_CTRL_LEN);
+			QETH_DBF_HEX(CTRL, 3, hdr, sizeof(*hdr));
 			continue;
 		}
 		work_done++;
diff --git a/drivers/s390/net/qeth_l3_main.c b/drivers/s390/net/qeth_l3_main.c
index 7175086677fb..ada258c01a08 100644
--- a/drivers/s390/net/qeth_l3_main.c
+++ b/drivers/s390/net/qeth_l3_main.c
@@ -1390,7 +1390,7 @@ static int qeth_l3_process_inbound_buffer(struct qeth_card *card,
 		default:
 			dev_kfree_skb_any(skb);
 			QETH_CARD_TEXT(card, 3, "inbunkno");
-			QETH_DBF_HEX(CTRL, 3, hdr, QETH_DBF_CTRL_LEN);
+			QETH_DBF_HEX(CTRL, 3, hdr, sizeof(*hdr));
 			continue;
 		}
 		work_done++;
-- 
2.16.4

^ permalink raw reply related

* Re: [PATCH] net: dsa: mv88e6xxx: Add MV88E6352 DT compatible
From: Andrew Lunn @ 2018-09-12 13:32 UTC (permalink / raw)
  To: Marek Vasut; +Cc: netdev
In-Reply-To: <2efc9a68-fc8c-2a85-14f8-bc2c72d9957f@denx.de>

> But the DT should correctly describe the hardware, if it doesn't, it's
> just broken.

It is more subtle than that. It can be broken, yet work, because it
contains information which we don't use. I really expect there will be
cut/paste errors, meaning the more specific compatible is sometimes
wrong. But since at the moment we don't use it, such a broken DT blob
will work. Until the day we need to make use of the more specific
compatible because there really is broken silicon. At that point, we
introduce a regression. All the devices with broke, yet up until now
working DT blobs, stop working. Are you really going to argue they
where always broken, so we don't care we introduced a regression?

Anyway, this is just rehasing an old discussion. Please go read the
archive. See if you have anything new to add which was not discussed
before.

      Andrew

^ permalink raw reply

* Re: [PATCH] net: dsa: mv88e6xxx: Add MV88E6352 DT compatible
From: Marek Vasut @ 2018-09-12 13:35 UTC (permalink / raw)
  To: Andrew Lunn; +Cc: netdev
In-Reply-To: <20180912133222.GC24595@lunn.ch>

On 09/12/2018 03:32 PM, Andrew Lunn wrote:
>> But the DT should correctly describe the hardware, if it doesn't, it's
>> just broken.
> 
> It is more subtle than that. It can be broken, yet work, because it
> contains information which we don't use. I really expect there will be
> cut/paste errors, meaning the more specific compatible is sometimes
> wrong.

If your DT is bogus, nothing can be done about that.

> But since at the moment we don't use it, such a broken DT blob
> will work. Until the day we need to make use of the more specific
> compatible because there really is broken silicon. At that point, we
> introduce a regression. All the devices with broke, yet up until now
> working DT blobs, stop working. Are you really going to argue they
> where always broken, so we don't care we introduced a regression?

If the DT is broken, it's already a bug and we cannot do anything about
that but maybe fix the DT somehow.

> Anyway, this is just rehasing an old discussion. Please go read the
> archive. See if you have anything new to add which was not discussed
> before.

Hehe, I've seen those discussions before too.

-- 
Best regards,
Marek Vasut

^ permalink raw reply

* Re: [PATCH net-next] tcp: rate limit synflood warnings further
From: Willem de Bruijn @ 2018-09-12 13:39 UTC (permalink / raw)
  To: David Miller; +Cc: Network Development, Eric Dumazet, Willem de Bruijn
In-Reply-To: <20180911.233503.2281697703604725089.davem@davemloft.net>

On Wed, Sep 12, 2018 at 2:35 AM David Miller <davem@davemloft.net> wrote:
>
> From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
> Date: Sun,  9 Sep 2018 19:12:12 -0400
>
> > From: Willem de Bruijn <willemb@google.com>
> >
> > Convert pr_info to net_info_ratelimited to limit the total number of
> > synflood warnings.
> >
> > Commit 946cedccbd73 ("tcp: Change possible SYN flooding messages")
> > rate limits synflood warnings to one per listener.
> >
> > Workloads that open many listener sockets can still see a high rate of
> > log messages. Syzkaller is one frequent example.
> >
> > Signed-off-by: Willem de Bruijn <willemb@google.com>
>
> Applied, thanks Willem.
>
> Is this stable material?

Thanks. Probably not. A process has to keep opening new listeners
at high rate while under load. I've only seen syzkaller do that.

^ permalink raw reply

* Re: [PATCH net-next] virtio_net: ethtool tx napi configuration
From: Willem de Bruijn @ 2018-09-12 13:43 UTC (permalink / raw)
  To: Jason Wang
  Cc: Network Development, David Miller, caleb.raitto,
	Michael S. Tsirkin, Jon Olson (Google Drive), Willem de Bruijn
In-Reply-To: <ab603c53-f7f8-5e89-a7c6-0050a97abe7b@redhat.com>

On Tue, Sep 11, 2018 at 11:35 PM Jason Wang <jasowang@redhat.com> wrote:
>
>
>
> On 2018年09月11日 09:14, Willem de Bruijn wrote:
> >>>> I cook a fixup, and it looks works in my setup:
> >>>>
> >>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> >>>> index b320b6b14749..9181c3f2f832 100644
> >>>> --- a/drivers/net/virtio_net.c
> >>>> +++ b/drivers/net/virtio_net.c
> >>>> @@ -2204,10 +2204,17 @@ static int virtnet_set_coalesce(struct
> >>>> net_device *dev,
> >>>>                    return -EINVAL;
> >>>>
> >>>>            if (napi_weight ^ vi->sq[0].napi.weight) {
> >>>> -               if (dev->flags & IFF_UP)
> >>>> -                       return -EBUSY;
> >>>> -               for (i = 0; i < vi->max_queue_pairs; i++)
> >>>> +               for (i = 0; i < vi->max_queue_pairs; i++) {
> >>>> +                       struct netdev_queue *txq =
> >>>> +                              netdev_get_tx_queue(vi->dev, i);
> >>>> +
> >>>> + virtnet_napi_tx_disable(&vi->sq[i].napi);
> >>>> +                       __netif_tx_lock_bh(txq);
> >>>>                            vi->sq[i].napi.weight = napi_weight;
> >>>> +                       __netif_tx_unlock_bh(txq);
> >>>> +                       virtnet_napi_tx_enable(vi, vi->sq[i].vq,
> >>>> + &vi->sq[i].napi);
> >>>> +               }
> >>>>            }
> >>>>
> >>>>            return 0;
> >>> Thanks! It passes my simple stress test, too. Which consists of two
> >>> concurrent loops, one toggling the ethtool option, another running
> >>> TCP_RR.
> >>>
> >>>> The only left case is the speculative tx polling in RX NAPI. I think we
> >>>> don't need to care in this case since it was not a must for correctness.
> >>> As long as the txq lock is held that will be a noop, anyway. The other
> >>> concurrent action is skb_xmit_done. It looks correct to me, but need
> >>> to think about it a bit. The tricky transition is coming out of napi without
> >>> having >= 2 + MAX_SKB_FRAGS clean descriptors. If the queue is
> >>> stopped it may deadlock transmission in no-napi mode.
> >> Yes, maybe we can enable tx queue when napi weight is zero in
> >> virtnet_poll_tx().
> > Yes, that precaution should resolve that edge case.
> >
>
> I've done a stress test and it passes. The test contains:
>
> - vm with 2 queues
> - a bash script to enable and disable tx napi
> - two netperf UDP_STREAM sessions to send small packets

Great. That matches my results. Do you want to send the v2?

^ permalink raw reply

* Re: [PATCH 4.4 18/31] r8152: napi hangup fix after disconnect
From: Ben Hutchings @ 2018-09-12 18:54 UTC (permalink / raw)
  To: Jiri Slaby, linux-usb, netdev, David S. Miller
  Cc: stable, Greg Kroah-Hartman, LKML
In-Reply-To: <3b118091-d600-0be8-3204-c0f794ef6288@suse.cz>

On Sat, 2018-08-25 at 09:43 +0200, Jiri Slaby wrote:
> On 08/24/2018, 06:38 PM, Ben Hutchings wrote:
> > On Fri, 2018-07-20 at 14:13 +0200, Greg Kroah-Hartman wrote:
> > > 4.4-stable review patch.  If anyone has any objections, please let me know.
> > > 
> > > ------------------
> > > 
> > > From: Jiri Slaby <jslaby@suse.cz>
> > > 
> > > [ Upstream commit 0ee1f4734967af8321ecebaf9c74221ace34f2d5 ]
> > 
> > [...]
> > > --- a/drivers/net/usb/r8152.c
> > > +++ b/drivers/net/usb/r8152.c
> > > @@ -3139,7 +3139,8 @@ static int rtl8152_close(struct net_devi
> > >  #ifdef CONFIG_PM_SLEEP
> > >  	unregister_pm_notifier(&tp->pm_notifier);
> > >  #endif
> > > -	napi_disable(&tp->napi);
> > > +	if (!test_bit(RTL8152_UNPLUG, &tp->flags))
> > > +		napi_disable(&tp->napi);
> > >  	clear_bit(WORK_ENABLE, &tp->flags);
> > >  	usb_kill_urb(tp->intr_urb);
> > >  	cancel_delayed_work_sync(&tp->schedule);
> > 
> > This flag appears to be set only if the USB device is actually
> > disconnected.  In case the driver is unbound for some other reason
> > (like the module is removed), the same problem will occur.
> 
> Could you elaborate? I thought this would happen:
> module_exit -> usb_deregister -> usb_unbind_device -> rtl8152_disconnect
> -> unregister_netdev -> rtl8152_close
> 
> Am I missing something?

What I mean is that if the USB device has not been *physically*
disconnected then its usb_device::state will not be
USB_STATE_NOTATTACHED.  So rtl8152_disconnect() will not set the
RTL8152_UNPLUG flag and rtl8152_close() will still call napi_disable()
which will hang.

Some options to fix this:
- Add a separate flag which rtl8152_close() checks and
rtl8152_disconnect() always sets
- Call dev_close() before netif_napi_del()

Ben.

-- 
Ben Hutchings, Software Developer                         Codethink Ltd
https://www.codethink.co.uk/                 Dale House, 35 Dale Street
                                     Manchester, M1 2HF, United Kingdom

^ permalink raw reply

* Re: [net-next, v2, 1/2] net: stmmac: Rework coalesce timer and fix multi-queue races
From: Neil Armstrong @ 2018-09-12 13:50 UTC (permalink / raw)
  To: Jose Abreu, netdev
  Cc: Jerome Brunet, Martin Blumenstingl, David S. Miller, Joao Pinto,
	Giuseppe Cavallaro, Alexandre Torgue
In-Reply-To: <49853fcd-dac6-147b-736e-1cd2bd5924e5@synopsys.com>

Hi Jose,

On 11/09/2018 10:17, Jose Abreu wrote:
> On 10-09-2018 19:15, Neil Armstrong wrote:
>>
>> RX is still ok but now TX fails almost immediately...
>>
>> With 100ms report :
>>
>> $ iperf3 -c 192.168.1.47 -t 0 -p 5202 -R -i 0.1
>> Connecting to host 192.168.1.47, port 5202
>> Reverse mode, remote host 192.168.1.47 is sending
>> [  4] local 192.168.1.45 port 45900 connected to 192.168.1.47 port 5202
>> [ ID] Interval           Transfer     Bandwidth
>> [  4]   0.00-0.10   sec  10.9 MBytes   913 Mbits/sec
>> [  4]   0.10-0.20   sec  11.0 MBytes   923 Mbits/sec
>> [  4]   0.20-0.30   sec  6.34 MBytes   532 Mbits/sec
>> [  4]   0.30-0.40   sec  0.00 Bytes  0.00 bits/sec
>> [  4]   0.40-0.50   sec  0.00 Bytes  0.00 bits/sec
>> [  4]   0.50-0.60   sec  0.00 Bytes  0.00 bits/sec
>> [  4]   0.60-0.70   sec  0.00 Bytes  0.00 bits/sec
>> [  4]   0.70-0.80   sec  0.00 Bytes  0.00 bits/sec
>> [  4]   0.80-0.90   sec  0.00 Bytes  0.00 bits/sec
>> [  4]   0.90-1.00   sec  0.00 Bytes  0.00 bits/sec
>> [  4]   1.00-1.10   sec  0.00 Bytes  0.00 bits/sec
>> ^C[  4]   1.10-1.10   sec  0.00 Bytes  0.00 bits/sec
>> - - - - - - - - - - - - - - - - - - - - - - - - -
>> [ ID] Interval           Transfer     Bandwidth
>> [  4]   0.00-1.10   sec  0.00 Bytes  0.00 bits/sec                  sender
>> [  4]   0.00-1.10   sec  28.2 MBytes   214 Mbits/sec                  receiver
>> iperf3: interrupt - the client has terminated
>>
>> Neil
> 
> Ok, here goes another incremental patch. If this doesn't work can
> you please send me a link to the spec of the board you are using ?

Sorry for the delay...

Not better, sorry.

$ iperf3 -c 10.1.3.201 -p 5202 -R -t 0
Connecting to host 10.1.3.201, port 5202
Reverse mode, remote host 10.1.3.201 is sending
[  4] local 10.1.2.12 port 60612 connected to 10.1.3.201 port 5202
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.00   sec   110 MBytes   920 Mbits/sec
[  4]   1.00-2.00   sec   110 MBytes   926 Mbits/sec
[  4]   2.00-3.00   sec  1.94 MBytes  16.3 Mbits/sec
[  4]   3.00-4.00   sec  0.00 Bytes  0.00 bits/sec
^C[  4]   4.00-4.76   sec  0.00 Bytes  0.00 bits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-4.76   sec  0.00 Bytes  0.00 bits/sec                  sender
[  4]   0.00-4.76   sec   222 MBytes   391 Mbits/sec                  receiver
iperf3: interrupt - the client has terminated

The board is the Amlogic S400 with the A113D SoC, sorry there is no public spec for this board and for this SoC.

Neil

> 
> Thanks and Best Regards,
> Jose Miguel Abreu
> 

^ permalink raw reply

* Re: [PATCH net-next v2 2/2] net: stmmac: Fixup the tail addr setting in xmit path
From: Jose Abreu @ 2018-09-12 14:17 UTC (permalink / raw)
  To: Florian Fainelli, Jose Abreu, netdev
  Cc: David S. Miller, Joao Pinto, Giuseppe Cavallaro, Alexandre Torgue
In-Reply-To: <ea85407f-c1b3-4070-fd8b-bca55cb1962c@gmail.com>

Hi Florian,

On 10-09-2018 19:46, Florian Fainelli wrote:
>
> Can you include the appropriate Fixes tag here so this can easily be
> backported to relevant stable branches?

Well I guess it goes since forever but it can only cause a major
impact in xgmac2 operation, remaining shall be okay.

I didn't add a Fixes tag because xgmac2 was merged quite recently
... Will add in next version.

Thanks and Best Regards,
Jose Miguel Abreu

^ permalink raw reply

* Re: [PATCH net-next v2 1/2] net: stmmac: Rework coalesce timer and fix multi-queue races
From: Jose Abreu @ 2018-09-12 14:23 UTC (permalink / raw)
  To: Florian Fainelli, Jose Abreu, netdev, Tal Gilboa
  Cc: Jerome Brunet, Martin Blumenstingl, David S. Miller, Joao Pinto,
	Giuseppe Cavallaro, Alexandre Torgue
In-Reply-To: <f0288def-ecf3-851b-fe81-12f0f79a061d@gmail.com>

Hi Florian,

Thanks for your input.

On 10-09-2018 20:22, Florian Fainelli wrote:
> On 09/10/2018 02:14 AM, Jose Abreu wrote:
>> This follows David Miller advice and tries to fix coalesce timer in
>> multi-queue scenarios.
>>
>> We are now using per-queue coalesce values and per-queue TX timer.
>>
>> Coalesce timer default values was changed to 1ms and the coalesce frames
>> to 25.
>>
>> Tested in B2B setup between XGMAC2 and GMAC5.
> Why not revert the entire features for this merge window and work on
> getting it to work over the next weeks/merge windows?

It was already reverted but the performance drops a little bit
(not that much but I'm trying to fix it).

>
> The idea of using a timer to coalesce TX path when there is not a HW
> timer is a good idea and if this is made robust enough, you could even
> promote that as being a network stack library/feature that could be used
> by other drivers. In fact, this could be a great addition to the net DIM
> library (Tal, what do you think?)
>
> Here's a quick drive by review of things that appear wrong in the
> current driver (without your patches):
>
> - in stmmac_xmit(), in case we hit the !is_jumbo branch and we fail the
> DMA mapping, there is no timer cancellation, don't we want to abort the
> whole transmission?

I don't think this is needed because then tx pointer will not
advance and in stmmac_tx_clean we just won't perform any work.
Besides, we can have a pending timer from previous packets
running so canceling it can cause some problems.

>
> - stmmac_tx_clean() should probably use netif_lock_bh() to guard against
> the timer (soft IRQ context) and the the NAPI context (also soft IRQ)
> running in parallel on two different CPUs. This may not explain all
> problems, but these two things are fundamentally exclusive, because the
> timer is meant to emulate the interrupt after N packets, while NAPI
> executes when such a thing did actually occur

Ok, and now I'm also using __netif_tx_lock_bh(queue) to just lock
per queue instead of the whole TX.

>
> - stmmac_poll() should cancel pending timer(s) if it was able to reclaim
> packets, likewise stmmac_tx_timer() should re-enable TX interrupts if it
> reclaimed packets, since TX interrupts could have been left disabled
> from a prior NAPI run. These could be considered optimizations, since
> you could leave the TX timer running all the time, just adjust the
> deadline (based on line rate, MTU, IPG, number of fragments and their
> respective length), worst case, both NAPI and the timer clean up your TX
> ring, so you should always have room to push more packets

In next version I'm dropping the direct call to stmmac_tx_clean()
in the timer function and just scheduling NAPI instead.

Thanks and Best Regards,
Jose Miguel Abreu

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox