Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH 2/6] net: ethernet: ti: cpts: add support for ext rftclk selection
From: Rob Herring @ 2016-12-05 14:51 UTC (permalink / raw)
  To: Grygorii Strashko
  Cc: David S. Miller, netdev, Mugunthan V N, Richard Cochran,
	Sekhar Nori, linux-kernel, linux-omap, devicetree,
	Murali Karicheri, Wingman Kwok
In-Reply-To: <20161128230428.6872-3-grygorii.strashko@ti.com>

On Mon, Nov 28, 2016 at 05:04:24PM -0600, Grygorii Strashko wrote:
> Some CPTS instances, which can be found on KeyStone 2 1/10G Ethernet
> Switch Subsystems, can control an external multiplexer that selects
> one of up to 32 clocks for time sync reference (RFTCLK). This feature
> can be configured through CPTS_RFTCLK_SEL register (offset: x08).
> 
> Hence, introduce optional DT cpts_rftclk_sel poperty wich, if present,
> will specify CPTS reference clock. The cpts_rftclk_sel should be
> omitted in DT if HW doesn't support this feature. The external fixed
> rate clocks can be defined in board files as "fixed-clock".
> 
> Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
> ---
>  Documentation/devicetree/bindings/net/keystone-netcp.txt |  2 ++

Please group binding changes into a single patch.

>  drivers/net/ethernet/ti/cpts.c                           | 12 ++++++++++++
>  drivers/net/ethernet/ti/cpts.h                           |  8 +++++++-
>  3 files changed, 21 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/devicetree/bindings/net/keystone-netcp.txt b/Documentation/devicetree/bindings/net/keystone-netcp.txt
> index c37b54e..ec4a241 100644
> --- a/Documentation/devicetree/bindings/net/keystone-netcp.txt
> +++ b/Documentation/devicetree/bindings/net/keystone-netcp.txt
> @@ -114,6 +114,8 @@ Optional properties:
>  				driver to them if needed.
>  
>   Properties related to cpts configurations.
> +	- cpts-rftclk-sel: selects one of up to 32 clocks for time sync
> +		reference.  Default = 0.

Vendor prefix.

>  	- cpts_clock_mult/cpts_clock_shift:
>  		used for converting time counter cycles to ns as in
>  

^ permalink raw reply

* Re: [PATCH 1/6] net: ethernet: ti: netcp: add support of cpts
From: Rob Herring @ 2016-12-05 14:49 UTC (permalink / raw)
  To: Grygorii Strashko
  Cc: David S. Miller, netdev, Mugunthan V N, Richard Cochran,
	Sekhar Nori, linux-kernel, linux-omap, devicetree,
	Murali Karicheri, Wingman Kwok
In-Reply-To: <20161128230428.6872-2-grygorii.strashko@ti.com>

On Mon, Nov 28, 2016 at 05:04:23PM -0600, Grygorii Strashko wrote:
> From: WingMan Kwok <w-kwok2@ti.com>
> 
> This patch adds support of the cpts device found in the
> gbe and 10gbe ethernet switches on the keystone 2 SoCs
> (66AK2E/L/Hx, 66AK2Gx).
> 
> Signed-off-by: WingMan Kwok <w-kwok2@ti.com>
> Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
> ---
>  .../devicetree/bindings/net/keystone-netcp.txt     |   9 +
>  drivers/net/ethernet/ti/Kconfig                    |   7 +-
>  drivers/net/ethernet/ti/netcp.h                    |   2 +-
>  drivers/net/ethernet/ti/netcp_core.c               |  18 +-
>  drivers/net/ethernet/ti/netcp_ethss.c              | 437 ++++++++++++++++++++-
>  5 files changed, 459 insertions(+), 14 deletions(-)
> 
> diff --git a/Documentation/devicetree/bindings/net/keystone-netcp.txt b/Documentation/devicetree/bindings/net/keystone-netcp.txt
> index 04ba1dc..c37b54e 100644
> --- a/Documentation/devicetree/bindings/net/keystone-netcp.txt
> +++ b/Documentation/devicetree/bindings/net/keystone-netcp.txt
> @@ -113,6 +113,15 @@ Optional properties:
>  				will only initialize these ports and attach PHY
>  				driver to them if needed.
>  
> + Properties related to cpts configurations.
> +	- cpts_clock_mult/cpts_clock_shift:

Needs vendor prefix. Don't use '_'.

> +		used for converting time counter cycles to ns as in
> +
> +		ns = (cycles * clock_mult) >> _shift
> +
> +		Defaults: clock_mult, clock_shift = calculated from
> +		CPTS refclk

What does this mean?

> +
>  NetCP interface properties: Interface specification for NetCP sub-modules.
>  Required properties:
>  - rx-channel:	the navigator packet dma channel name for rx.

^ permalink raw reply

* Re: [PATCH 3/3] netns: fix net_generic() "id - 1" bloat
From: Alexey Dobriyan @ 2016-12-05 14:48 UTC (permalink / raw)
  To: David Laight
  Cc: davem@davemloft.net, netdev@vger.kernel.org, xemul@openvz.org
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6DB0234CCB@AcuExch.aculab.com>

On Mon, Dec 5, 2016 at 3:49 PM, David Laight <David.Laight@aculab.com> wrote:
> From: Alexey Dobriyan
>> Sent: 02 December 2016 01:22
>> net_generic() function is both a) inline and b) used ~600 times.
>>
>> It has the following code inside
>>
>>               ...
>>       ptr = ng->ptr[id - 1];
>>               ...
>>
>> "id" is never compile time constant so compiler is forced to subtract 1.
>> And those decrements or LEA [r32 - 1] instructions add up.
>>
>> We also start id'ing from 1 to catch bugs where pernet sybsystem id
>> is not initialized and 0. This is quite pointless idea (nothing will
>> work or immediate interference with first registered subsystem) in
>> general but it hints what needs to be done for code size reduction.
>>
>> Namely, overlaying allocation of pointer array and fixed part of
>> structure in the beginning and using usual base-0 addressing.
>>
>> Ids are just cookies, their exact values do not matter, so lets start
>> with 3 on x86_64.
> ...
>>  struct net_generic {
>> -     struct {
>> -             unsigned int len;
>> -             struct rcu_head rcu;
>> -     } s;
>> -
>> -     void *ptr[0];
>> +     union {
>> +             struct {
>> +                     unsigned int len;
>> +                     struct rcu_head rcu;
>> +             } s;
>> +
>> +             void *ptr[0];
>> +     };
>>  };
>
> That union is an accident waiting to happen.

I kind of disagree. Module authors should not be given matches,
but it is hard to screw up if net_generic() is all you're given.

> What might work is to offset the Ids by
> (offsetof(struct net_generic, ptr)/sizeof (void *)) instead of by 1.
> The subtract from the offset will then counter the structure offset
> - which is what you are trying to achieve.

If you suggest this layout

struct net_generic {
        struct {
        } s;
        void *ptr[0];
};

then is it not optimal because offset of "ptr" needs to be somewhere in code
either in some LEA or imm8 of the final MOV which is 1 byte more bloaty.

Here is test program

struct ng1 {
        union {
                struct {
                        unsigned int len;
                } s;
                void *ptr[0];
        };
};
struct ng2 {
        struct {
                unsigned int len;
        } s;
        void *ptr[0];
};
struct net {
        int x;
        struct ng1 *gen1;
        struct ng2 *gen2;
};
void *ng1(const struct net *net, unsigned int id)
{
        return net->gen1->ptr[id];
}
void *ng2(const struct net *net, unsigned int id)
{
        return net->gen2->ptr[id];
}


0000000000000000 <ng1>:
   0:   48 8b 47 08             mov    rax,QWORD PTR [rdi+0x8]
   4:   89 f6                   mov    esi,esi
   6:   48 8b 04 f0             mov    rax,QWORD PTR [rax+rsi*8]
   a:   c3                      ret


0000000000000010 <ng2>:
  10:   48 8b 47 10             mov    rax,QWORD PTR [rdi+0x10]
  14:   89 f6                   mov    esi,esi
  16:   48 8b 44 f0 [[[08]]]          mov    rax,QWORD PTR [rax+rsi*8+0x8]
  1b:   c3                      ret

I can send more type checking if you can tolerate "_" identifier

    struct {
        unsigned int _;
    } pernet_ops_id_t;

^ permalink raw reply

* Re: [RFC] udp: some improvements on RX path.
From: Eric Dumazet @ 2016-12-05 14:28 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: netdev
In-Reply-To: <1480944138.4694.37.camel@redhat.com>

On Mon, 2016-12-05 at 14:22 +0100, Paolo Abeni wrote:
> Hi Eric,
> 
> On Sun, 2016-12-04 at 18:43 -0800, Eric Dumazet wrote:
> > We currently access 3 cache lines from an skb in receive queue while
> > holding receive queue lock :
> > 
> > First cache line (contains ->next / prev pointers )
> > 2nd cache line (skb->peeked)
> > 3rd cache line (skb->truesize)
> > 
> > I believe we could get rid of skb->peeked completely.
> > 
> > I will cook a patch, but basically the idea is that the last owner of a
> > skb (right before skb->users becomes 0) can have the 'ownership' and
> > thus increase stats.
> 
> Agreed.
> 
> > The 3rd cache line miss is easily avoided by the following patch.
> 
> I run some performance tests on top of your patch "net: reorganize
> struct sock for better data locality", and I see an additional ~7%
> improvement on top of that, in the udp flood scenario. 
> 
> In my tests, the topmost perf offenders for the u/s process are now:
> 
>    9.98%  udp_sink  [kernel.kallsyms]  [k] udp_rmem_release
>    8.76%  udp_sink  [kernel.kallsyms]  [k] inet_recvmsg
>    6.71%  udp_sink  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
>    5.40%  udp_sink  [kernel.kallsyms]  [k] __skb_try_recv_datagram
>    5.19%  udp_sink  [kernel.kallsyms]  [k] copy_user_enhanced_fast_string
> 
> udp_rmem_release() spends most of its time doing:
> 
>     atomic_sub(size, &sk->sk_rmem_alloc);
> 
> I see a cacheline miss while accessing sk_rmem_alloc; most probably due
> to sk_rmem_alloc being updated by the writer outside the rx queue lock.
> 
> Moving such update inside the lock show remove this cache miss but will
> increase the pressure on the rx lock. What do you think ?

I want to accumulate the sk_rmem_alloc  deficit in another variable in
the same cache line than the secondary list, updated from process
context at udp recvmsg() time.

And only transfer the accumulated deficit in one go when the second
queue is emptied, or if the accumulated deficite is > (rcvbuf/2)

If done right, we should interfere between the softirq flooder(s) and
the process thread only once per batch.

> 
> inet_recvmsg() is there because with "net: reorganize struct sock for
> better data locality" we get a cache miss while accessing skc_rxhash in
> sock_rps_record_flow(); touching sk_drops is dirtying that cacheline -
> sorry for not noticing this before. Do you have CONFIG_RPS disabled ?
> 
> > But I also want to work on the idea I gave few days back, having a
> > separate queue and use splice to transfer the 'softirq queue' into
> > a calm queue in a different cache line.
> > 
> > I expect a 50 % performance increase under load, maybe 1.5 Mpps.
> 
> It should work nicely under contention, but won't that increase the
> overhead for the uncontended/single flow scenario ? the user space
> reader needs to acquire 2 lock when splicing the 'softirq queue'. On my
> system ksoftirqd and the u/s process work at similar speeds, so splicing
> will happen quite often. 

Well, the splice would happen only if you have more than one message in
the softirq queue. So no real overhead for uncontended flow scenario.


This reminds me of the busylock I added in __dev_xmit_skb(), which
basically is acquired only when we detect a possible contention on qdisc
lock.

Thanks.

^ permalink raw reply

* Re: [PATCH net-next v4 2/4] dt-bindings: net: add EEE capability constants
From: Rob Herring @ 2016-12-05 14:39 UTC (permalink / raw)
  To: Jerome Brunet
  Cc: netdev, devicetree, Florian Fainelli, Carlo Caione, Kevin Hilman,
	Giuseppe Cavallaro, Alexandre TORGUE, Martin Blumenstingl,
	Andre Roth, Andrew Lunn, Neil Armstrong, linux-amlogic,
	linux-arm-kernel, linux-kernel, Julia Lawall, Yegor Yefremov,
	Andreas Färber
In-Reply-To: <1480348229-25672-3-git-send-email-jbrunet@baylibre.com>

On Mon, Nov 28, 2016 at 04:50:26PM +0100, Jerome Brunet wrote:
> Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
> Tested-by: Yegor Yefremov <yegorslists@googlemail.com>
> Tested-by: Andreas Färber <afaerber@suse.de>
> Tested-by: Neil Armstrong <narmstrong@baylibre.com>
> ---
>  include/dt-bindings/net/mdio.h | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
>  create mode 100644 include/dt-bindings/net/mdio.h

Seems changes are wanted on this, but patches 2 and 3 should be 
combined. The header is part of the binding doc.

Rob

^ permalink raw reply

* Re: [PATCH net-next v6 6/6] samples/bpf: add userspace example for prohibiting sockets
From: David Ahern @ 2016-12-05 14:31 UTC (permalink / raw)
  To: David Laight, Alexei Starovoitov
  Cc: netdev@vger.kernel.org, daniel@zonque.org, ast@fb.com,
	daniel@iogearbox.net, maheshb@google.com, tgraf@suug.ch
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6DB0234BBD@AcuExch.aculab.com>

On 12/5/16 3:51 AM, David Laight wrote:
> You can at least improve the comment.

I fixed the header files for v7.

^ permalink raw reply

* Designing a safe RX-zero-copy Memory Model for Networking
From: Jesper Dangaard Brouer @ 2016-12-05 14:31 UTC (permalink / raw)
  To: netdev@vger.kernel.org
  Cc: brouer, linux-mm, John Fastabend, Willem de Bruijn,
	Björn Töpel, Karlsson, Magnus, Alexander Duyck,
	Mel Gorman, Tom Herbert, Brenden Blanco, Tariq Toukan,
	Saeed Mahameed, Jesse Brandeburg

Hi all,

This is my design for how to safely handle RX zero-copy in the network
stack, by using page_pool[1] and modifying NIC drivers.  Safely means
not leaking kernel info in pages mapped to userspace and resilience
so a malicious userspace app cannot crash the kernel.

It is only a design, and thus the purpose is for you to find any holes
in this design ;-)  Below text is also available as html see[2].

[1] https://prototype-kernel.readthedocs.io/en/latest/vm/page_pool/design/design.html
[2] https://prototype-kernel.readthedocs.io/en/latest/vm/page_pool/design/memory_model_nic.html

===========================
Memory Model for Networking
===========================

This design describes how the page_pool change the memory model for
networking in the NIC (Network Interface Card) drivers.

.. Note:: The catch for driver developers is that, once an application
          request zero-copy RX, then the driver must use a specific
          SKB allocation mode and might have to reconfigure the
          RX-ring.

Design target
=============

Allow the NIC to function as a normal Linux NIC and be shared in a
safe manor, between the kernel network stack and an accelerated
userspace application using RX zero-copy delivery.

Target is to provide the basis for building RX zero-copy solutions in
a memory safe manor.  An efficient communication channel for userspace
delivery is out of scope for this document, but OOM considerations are
discussed below (`Userspace delivery and OOM`_).

Background
==========

The SKB or ``struct sk_buff`` is the fundamental meta-data structure
for network packets in the Linux Kernel network stack.  It is a fairly
complex object and can be constructed in several ways.

From a memory perspective there are two ways depending on
RX-buffer/page state:

1) Writable packet page
2) Read-only packet page

To take full potential of the page_pool, the drivers must actually
support handling both options depending on the configuration state of
the page_pool.

Writable packet page
--------------------

When the RX packet page is writable, the SKB setup is fairly straight
forward.  The SKB->data (and skb->head) can point directly to the page
data, adjusting the offset according to drivers headroom (for adding
headers) and setting the length according to the DMA descriptor info.

The page/data need to be writable, because the network stack need to
adjust headers (like TimeToLive and checksum) or even add or remove
headers for encapsulation purposes.

A subtle catch, which also requires a writable page, is that the SKB
also have an accompanying "shared info" data-structure ``struct
skb_shared_info``.  This "skb_shared_info" is written into the
skb->data memory area at the end (skb->end) of the (header) data.  The
skb_shared_info contains semi-sensitive information, like kernel
memory pointers to other pages (which might be pointers to more packet
data).  This would be bad from a zero-copy point of view to leak this
kind of information.

Read-only packet page
---------------------

When the RX packet page is read-only, the construction of the SKB is
significantly more complicated and even involves one more memory
allocation.

1) Allocate a new separate writable memory area, and point skb->data
   here.  This is needed due to (above described) skb_shared_info.

2) Memcpy packet headers into this (skb->data) area.

3) Clear part of skb_shared_info struct in writable-area.

4) Setup pointer to packet-data in the page (in skb_shared_info->frags)
   and adjust the page_offset to be past the headers just copied.

It is useful (later) that the network stack have this notion that part
of the packet and a page can be read-only.  This implies that the
kernel will not "pollute" this memory with any sensitive information.
This is good from a zero-copy point of view, but bad from a
performance perspective.


NIC RX Zero-Copy
================

Doing NIC RX zero-copy involves mapping RX pages into userspace.  This
involves costly mapping and unmapping operations in the address space
of the userspace process.  Plus for doing this safely, the page memory
need to be cleared before using it, to avoid leaking kernel
information to userspace, also a costly operation.  The page_pool base
"class" of optimization is moving these kind of operations out of the
fastpath, by recycling and lifetime control.

Once a NIC RX-queue's page_pool have been configured for zero-copy
into userspace, then can packets still be allowed to travel the normal
stack?

Yes, this should be possible, because the driver can use the
SKB-read-only mode, which avoids polluting the page data with
kernel-side sensitive data.  This implies, when a driver RX-queue
switch page_pool to RX-zero-copy mode it MUST also switch to
SKB-read-only mode (for normal stack delivery for this RXq).

XDP can be used for controlling which pages that gets RX zero-copied
to userspace.  The page is still writable for the XDP program, but
read-only for normal stack delivery.


Kernel safety
-------------

For the paranoid, how do we protect the kernel from a malicious
userspace program.  Sure there will be a communication interface
between kernel and userspace, that synchronize ownership of pages.
But a userspace program can violate this interface, given pages are
kept VMA mapped, the program can in principle access all the memory
pages in the given page_pool.  This opens up for a malicious (or
defect) program modifying memory pages concurrently with the kernel
and DMA engine using them.

An easy way to get around userspace modifying page data contents is
simply to map pages read-only into userspace.

.. Note:: The first implementation target is read-only zero-copy RX
          page to userspace and require driver to use SKB-read-only
          mode.

Advanced: Allowing userspace write access?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

What if userspace need write access? Flipping the page permissions per
transfer will likely kill performance (as this likely affects the
TLB-cache).

I will argue that giving userspace write access is still possible,
without risking a kernel crash.  This is related to the SKB-read-only
mode that copies the packet headers (in to another memory area,
inaccessible to userspace).  The attack angle is to modify packet
headers after they passed some kernel network stack validation step
(as once headers are copied they are out of "reach").

Situation classes where memory page can be modified concurrently:

1) When DMA engine owns the page.  Not a problem, as DMA engine will
   simply overwrite data.

2) Just after DMA engine finish writing.  Not a problem, the packet
   will go through netstack validation and be rejected.

3) While XDP reads data. This can lead to XDP/eBPF program goes into a
   wrong code branch, but the eBPF virtual machine should not be able
   to crash the kernel. The worst outcome is a wrong or invalid XDP
   return code.

4) Before SKB with read-only page is constructed. Not a problem, the
   packet will go through netstack validation and be rejected.

5) After SKB with read-only page has been constructed.  Remember the
   packet headers were copied into a separate memory area, and the
   page data is pointed to with an offset passed the copied headers.
   Thus, userspace cannot modify the headers used for netstack
   validation.  It can only modify packet data contents, which is less
   critical as it cannot crash the kernel, and eventually this will be
   caught by packet checksum validation.

6) After netstack delivered packet to another userspace process. Not a
   problem, as it cannot crash the kernel.  It might corrupt
   packet-data being read by another userspace process, which one
   argument for requiring elevated privileges to get write access
   (like NET_CAP_ADMIN).


Userspace delivery and OOM
--------------------------

These RX pages are likely mapped to userspace via mmap(), so-far so
good.  It is key to performance to get an efficient way of signaling
between kernel and userspace, e.g what page are ready for consumption,
and when userspace are done with the page.

It is outside the scope of page_pool to provide such a queuing
structure, but the page_pool can offer some means of protecting the
system resource usage.  It is a classical problem that resources
(e.g. the page) must be returned in a timely manor, else the system,
in this case, will run out of memory.  Any system/design with
unbounded memory allocation can lead to Out-Of-Memory (OOM)
situations.

Communication between kernel and userspace is likely going to be some
kind of queue.  Given transferring packets individually will have too
much scheduling overhead.  A queue can implicitly function as a
bulking interface, and offers a natural way to split the workload
across CPU cores.

This essentially boils down-to a two queue system, with the RX-ring
queue and the userspace delivery queue.

Two bad situations exists for the userspace queue:

1) Userspace is not consuming objects fast-enough. This should simply
   result in packets getting dropped when enqueueing to a full
   userspace queue (as queue *must* implement some limit). Open
   question is; should this be reported or communicated to userspace.

2) Userspace is consuming objects fast, but not returning them in a
   timely manor.  This is a bad situation, because it threatens the
   system stability as it can lead to OOM.

The page_pool should somehow protect the system in case 2.  The
page_pool can detect the situation as it is able to track the number
of outstanding pages, due to the recycle feedback loop.  Thus, the
page_pool can have some configurable limit of allowed outstanding
pages, which can protect the system against OOM.

Note, the `Fbufs paper`_ propose to solve case 2 by allowing these
pages to be "pageable", i.e. swap-able, but that is not an option for
the page_pool as these pages are DMA mapped.

.. _`Fbufs paper`:
   http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.52.9688

Effect of blocking allocation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The effect of page_pool, in case 2, that denies more allocations
essentially result-in the RX-ring queue cannot be refilled and HW
starts dropping packets due to "out-of-buffers".  For NICs with
several HW RX-queues, this can be limited to a subset of queues (and
admin can control which RX queue with HW filters).

The question is if the page_pool can do something smarter in this
case, to signal the consumers of these pages, before the maximum limit
is hit (of allowed outstanding packets).  The MM-subsystem already
have a concept of emergency PFMEMALLOC reserves and associate
page-flags (e.g. page_is_pfmemalloc).  And the network stack already
handle and react to this.  Could the same PFMEMALLOC system be used
for marking pages when limit is close?

This requires further analysis. One can imagine; this could be used at
RX by XDP to mitigate the situation by dropping less-important frames.
Given XDP choose which pages are being send to userspace it might have
appropriate knowledge of what it relevant to drop(?).

.. Note:: An alternative idea is using a data-structure that blocks
          userspace from getting new pages before returning some.
          (out of scope for the page_pool)

--
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

Above document is taken at GitHub commit 47fa7c844f48fab8b
 https://github.com/netoptimizer/prototype-kernel/commit/47fa7c844f48fab8b

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH v2 net-next 7/8] net: reorganize struct sock for better data locality
From: Eric Dumazet @ 2016-12-05 14:30 UTC (permalink / raw)
  To: Paolo Abeni; +Cc: Eric Dumazet, David S . Miller, netdev, Yuchung Cheng
In-Reply-To: <1480941384.4694.13.camel@redhat.com>

On Mon, 2016-12-05 at 13:36 +0100, Paolo Abeni wrote:
> Hi Eric,

> I cherry-picked this patch only for some UDP benchmark. Under flood with
> many concurrent flows, I see this 20% improvement and a relevant
> decrease in system load.
> 
> Nice work, thanks Eric!
> 
> Tested-by: Paolo Abeni <pabeni@redhat.com>

Excellent, thanks a lot for testing !

^ permalink raw reply

* Re: [PATCH v2 net-next 1/2] flow dissector: ICMP support
From: Jiri Pirko @ 2016-12-05 14:27 UTC (permalink / raw)
  To: Simon Horman
  Cc: Tom Herbert, David Miller, Linux Kernel Network Developers,
	Jay Vosburgh, Veaceslav Falico, Andy Gospodarek, Jamal Hadi Salim,
	Jiri Pirko
In-Reply-To: <20161205142208.GA16425@penelope.horms.nl>

Mon, Dec 05, 2016 at 03:23:16PM CET, simon.horman@netronome.com wrote:
>On Sat, Dec 03, 2016 at 10:52:32AM -0800, Tom Herbert wrote:
>> On Sat, Dec 3, 2016 at 2:49 AM, Jiri Pirko <jiri@resnulli.us> wrote:
>> > Fri, Dec 02, 2016 at 09:31:41PM CET, simon.horman@netronome.com wrote:
>> >>Allow dissection of ICMP(V6) type and code. This re-uses transport layer
>> >>port dissection code as although ICMP is not a transport protocol and their
>> >>type and code are not ports this allows sharing of both code and storage.
>> >>
>> >>Signed-off-by: Simon Horman <simon.horman@netronome.com>
>
>...
>
>> > Digging into this a bit more. I think it would be much nice not to mix
>> > up l4 ports and icmp stuff.
>> >
>> > How about to have FLOW_DISSECTOR_KEY_ICMP
>> > and
>> > struct flow_dissector_key_icmp {
>> >         u8 type;
>> >         u8 code;
>> > };
>> >
>> > The you can make this structure and struct flow_dissector_key_ports into
>> > an union in struct flow_keys.
>> >
>> > Looks much cleaner to me.
>
>Hi Jiri,
>
>I wondered about that too. The main reason that I didn't do this the first
>time around is that I wasn't sure what to do to differentiate between ports
>and icmp in fl_init_dissector() given that using a union would could to
>mask bits being set in both if they are set in either.
>
>I've taken a stab at addressing that below along with implementing your
>suggestion. If the treatment in fl_init_dissector() is acceptable
>perhaps something similar should be done for FLOW_DISSECTOR_KEY_IPV[46]_ADDRS

Looks fine.

>too?
>
>> I agree, this patch adds to many conditionals into the fast path for
>> ICMP handling. Neither is there much point in using type and code as
>> input to the packet hash.
>
>Hi Tom,
>
>I'm not sure that I follow this. A packet may be ICMP or not regardless of
>if FLOW_DISSECTOR_KEY_PORT (and now FLOW_DISSECTOR_KEY_ICMP) port is set or
>not. I'd appreciate some guidance here.
>
>
>First-pass at implementing Jiri's suggestion follows:
>
>diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h
>index 5540dfa18872..475cd121496f 100644
>--- a/include/net/flow_dissector.h
>+++ b/include/net/flow_dissector.h
>@@ -91,14 +91,10 @@ struct flow_dissector_key_addrs {
> 
> /**
>  * flow_dissector_key_ports:
>- *	@ports: port numbers of Transport header or
>- *		type and code of ICMP header
>+ *	@ports: port numbers of Transport header
>  *		ports: source (high) and destination (low) port numbers
>  *		src: source port number
>  *		dst: destination port number
>- *		icmp: ICMP type (high) and code (low)
>- *		type: ICMP type
>- *		type: ICMP code
>  */
> struct flow_dissector_key_ports {
> 	union {
>@@ -107,6 +103,18 @@ struct flow_dissector_key_ports {
> 			__be16 src;
> 			__be16 dst;
> 		};
>+	};
>+};
>+
>+/**
>+ * flow_dissector_key_icmp:
>+ *	@ports: type and code of ICMP header
>+ *		icmp: ICMP type (high) and code (low)
>+ *		type: ICMP type
>+ *		code: ICMP code
>+ */
>+struct flow_dissector_key_icmp {
>+	union {
> 		__be16 icmp;
> 		struct {
> 			u8 type;
>@@ -133,6 +141,7 @@ enum flow_dissector_key_id {
> 	FLOW_DISSECTOR_KEY_IPV4_ADDRS, /* struct flow_dissector_key_ipv4_addrs */
> 	FLOW_DISSECTOR_KEY_IPV6_ADDRS, /* struct flow_dissector_key_ipv6_addrs */
> 	FLOW_DISSECTOR_KEY_PORTS, /* struct flow_dissector_key_ports */
>+	FLOW_DISSECTOR_KEY_ICMP, /* struct flow_dissector_key_icmp */
> 	FLOW_DISSECTOR_KEY_ETH_ADDRS, /* struct flow_dissector_key_eth_addrs */
> 	FLOW_DISSECTOR_KEY_TIPC_ADDRS, /* struct flow_dissector_key_tipc_addrs */
> 	FLOW_DISSECTOR_KEY_VLAN, /* struct flow_dissector_key_flow_vlan */
>@@ -171,7 +180,10 @@ struct flow_keys {
> 	struct flow_dissector_key_tags tags;
> 	struct flow_dissector_key_vlan vlan;
> 	struct flow_dissector_key_keyid keyid;
>-	struct flow_dissector_key_ports ports;
>+	union {
>+		struct flow_dissector_key_ports ports;
>+		struct flow_dissector_key_icmp icmp;
>+	};
> 	struct flow_dissector_key_addrs addrs;
> };
> 
>diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
>index 0584b4bb4390..33e5fa2b3e87 100644
>--- a/net/core/flow_dissector.c
>+++ b/net/core/flow_dissector.c
>@@ -139,6 +139,7 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
> 	struct flow_dissector_key_basic *key_basic;
> 	struct flow_dissector_key_addrs *key_addrs;
> 	struct flow_dissector_key_ports *key_ports;
>+	struct flow_dissector_key_icmp *key_icmp;
> 	struct flow_dissector_key_tags *key_tags;
> 	struct flow_dissector_key_vlan *key_vlan;
> 	struct flow_dissector_key_keyid *key_keyid;
>@@ -559,18 +560,25 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
> 		break;
> 	}
> 
>-	if (dissector_uses_key(flow_dissector,
>-			       FLOW_DISSECTOR_KEY_PORTS)) {
>-		key_ports = skb_flow_dissector_target(flow_dissector,
>-						      FLOW_DISSECTOR_KEY_PORTS,
>-						      target_container);
>-		if (flow_protos_are_icmp_any(proto, ip_proto))
>-			key_ports->icmp = skb_flow_get_be16(skb, nhoff, data,
>-							    hlen);
>-		else
>+	if (flow_protos_are_icmp_any(proto, ip_proto)) {
>+		if (dissector_uses_key(flow_dissector,
>+				       FLOW_DISSECTOR_KEY_ICMP)) {
>+			key_icmp = skb_flow_dissector_target(flow_dissector,
>+							     FLOW_DISSECTOR_KEY_ICMP,
>+							     target_container);
>+			key_icmp->icmp = skb_flow_get_be16(skb, nhoff, data,
>+							   hlen);
>+		}
>+	} else {
>+		if (dissector_uses_key(flow_dissector,
>+				       FLOW_DISSECTOR_KEY_PORTS)) {
>+			key_ports = skb_flow_dissector_target(flow_dissector,
>+							      FLOW_DISSECTOR_KEY_PORTS,
>+							      target_container);
> 			key_ports->ports = __skb_flow_get_ports(skb, nhoff,
> 								ip_proto, data,
> 								hlen);
>+		}
> 	}
> 
> out_good:
>@@ -975,6 +983,10 @@ static const struct flow_dissector_key flow_keys_dissector_keys[] = {
> 		.offset = offsetof(struct flow_keys, ports),
> 	},
> 	{
>+		.key_id = FLOW_DISSECTOR_KEY_ICMP,
>+		.offset = offsetof(struct flow_keys, icmp),
>+	},
>+	{
> 		.key_id = FLOW_DISSECTOR_KEY_VLAN,
> 		.offset = offsetof(struct flow_keys, vlan),
> 	},
>diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
>index c96b2a349779..f4ba69bd362f 100644
>--- a/net/sched/cls_flower.c
>+++ b/net/sched/cls_flower.c
>@@ -38,7 +38,10 @@ struct fl_flow_key {
> 		struct flow_dissector_key_ipv4_addrs ipv4;
> 		struct flow_dissector_key_ipv6_addrs ipv6;
> 	};
>-	struct flow_dissector_key_ports tp;
>+	union {
>+		struct flow_dissector_key_ports tp;
>+		struct flow_dissector_key_icmp icmp;
>+	};
> 	struct flow_dissector_key_keyid enc_key_id;
> 	union {
> 		struct flow_dissector_key_ipv4_addrs enc_ipv4;
>@@ -511,19 +514,23 @@ static int fl_set_key(struct net *net, struct nlattr **tb,
> 			       &mask->tp.dst, TCA_FLOWER_KEY_SCTP_DST_MASK,
> 			       sizeof(key->tp.dst));
> 	} else if (flow_basic_key_is_icmpv4(&key->basic)) {
>-		fl_set_key_val(tb, &key->tp.type, TCA_FLOWER_KEY_ICMPV4_TYPE,
>-			       &mask->tp.type, TCA_FLOWER_KEY_ICMPV4_TYPE_MASK,
>-			       sizeof(key->tp.type));
>-		fl_set_key_val(tb, &key->tp.code, TCA_FLOWER_KEY_ICMPV4_CODE,
>-			       &mask->tp.code, TCA_FLOWER_KEY_ICMPV4_CODE_MASK,
>-			       sizeof(key->tp.code));
>+		fl_set_key_val(tb, &key->icmp.type, TCA_FLOWER_KEY_ICMPV4_TYPE,
>+			       &mask->icmp.type,
>+			       TCA_FLOWER_KEY_ICMPV4_TYPE_MASK,
>+			       sizeof(key->icmp.type));
>+		fl_set_key_val(tb, &key->icmp.code, TCA_FLOWER_KEY_ICMPV4_CODE,
>+			       &mask->icmp.code,
>+			       TCA_FLOWER_KEY_ICMPV4_CODE_MASK,
>+			       sizeof(key->icmp.code));
> 	} else if (flow_basic_key_is_icmpv6(&key->basic)) {
>-		fl_set_key_val(tb, &key->tp.type, TCA_FLOWER_KEY_ICMPV6_TYPE,
>-			       &mask->tp.type, TCA_FLOWER_KEY_ICMPV6_TYPE_MASK,
>-			       sizeof(key->tp.type));
>-		fl_set_key_val(tb, &key->tp.code, TCA_FLOWER_KEY_ICMPV4_CODE,
>-			       &mask->tp.code, TCA_FLOWER_KEY_ICMPV4_CODE_MASK,
>-			       sizeof(key->tp.code));
>+		fl_set_key_val(tb, &key->icmp.type, TCA_FLOWER_KEY_ICMPV6_TYPE,
>+			       &mask->icmp.type,
>+			       TCA_FLOWER_KEY_ICMPV6_TYPE_MASK,
>+			       sizeof(key->icmp.type));
>+		fl_set_key_val(tb, &key->icmp.code, TCA_FLOWER_KEY_ICMPV4_CODE,
>+			       &mask->icmp.code,
>+			       TCA_FLOWER_KEY_ICMPV4_CODE_MASK,
>+			       sizeof(key->icmp.code));
> 	}
> 
> 	if (tb[TCA_FLOWER_KEY_ENC_IPV4_SRC] ||
>@@ -609,15 +616,16 @@ static int fl_init_hashtable(struct cls_fl_head *head,
> 		keys[cnt].key_id = id;						\
> 		keys[cnt].offset = FL_KEY_MEMBER_OFFSET(member);		\
> 		cnt++;								\
>-	} while(0);
>+	} while(0)
> 
> #define FL_KEY_SET_IF_MASKED(mask, keys, cnt, id, member)			\
> 	do {									\
> 		if (FL_KEY_IS_MASKED(mask, member))				\
> 			FL_KEY_SET(keys, cnt, id, member);			\
>-	} while(0);
>+	} while(0)
> 
> static void fl_init_dissector(struct cls_fl_head *head,
>+			      struct cls_fl_filter *f,
> 			      struct fl_flow_mask *mask)
> {
> 	struct flow_dissector_key keys[FLOW_DISSECTOR_KEY_MAX];
>@@ -631,8 +639,12 @@ static void fl_init_dissector(struct cls_fl_head *head,
> 			     FLOW_DISSECTOR_KEY_IPV4_ADDRS, ipv4);
> 	FL_KEY_SET_IF_MASKED(&mask->key, keys, cnt,
> 			     FLOW_DISSECTOR_KEY_IPV6_ADDRS, ipv6);
>-	FL_KEY_SET_IF_MASKED(&mask->key, keys, cnt,
>-			     FLOW_DISSECTOR_KEY_PORTS, tp);
>+	if (flow_basic_key_is_icmpv4(&f->key.basic))
>+		FL_KEY_SET_IF_MASKED(&mask->key, keys, cnt,
>+				     FLOW_DISSECTOR_KEY_ICMP, icmp);
>+	else
>+		FL_KEY_SET_IF_MASKED(&mask->key, keys, cnt,
>+				     FLOW_DISSECTOR_KEY_PORTS, tp);
> 	FL_KEY_SET_IF_MASKED(&mask->key, keys, cnt,
> 			     FLOW_DISSECTOR_KEY_VLAN, vlan);
> 	FL_KEY_SET_IF_MASKED(&mask->key, keys, cnt,
>@@ -652,6 +664,7 @@ static void fl_init_dissector(struct cls_fl_head *head,
> }
> 
> static int fl_check_assign_mask(struct cls_fl_head *head,
>+				struct cls_fl_filter *f,
> 				struct fl_flow_mask *mask)
> {
> 	int err;
>@@ -672,7 +685,7 @@ static int fl_check_assign_mask(struct cls_fl_head *head,
> 	memcpy(&head->mask, mask, sizeof(head->mask));
> 	head->mask_assigned = true;
> 
>-	fl_init_dissector(head, mask);
>+	fl_init_dissector(head, f, mask);
> 
> 	return 0;
> }
>@@ -785,7 +798,7 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
> 	if (err)
> 		goto errout;
> 
>-	err = fl_check_assign_mask(head, &mask);
>+	err = fl_check_assign_mask(head, fnew, &mask);
> 	if (err)
> 		goto errout;
> 
>@@ -1000,24 +1013,24 @@ static int fl_dump(struct net *net, struct tcf_proto *tp, unsigned long fh,
> 				  sizeof(key->tp.dst))))
> 		goto nla_put_failure;
> 	else if (flow_basic_key_is_icmpv4(&key->basic) &&
>-		 (fl_dump_key_val(skb, &key->tp.type,
>-				  TCA_FLOWER_KEY_ICMPV4_TYPE, &mask->tp.type,
>+		 (fl_dump_key_val(skb, &key->icmp.type,
>+				  TCA_FLOWER_KEY_ICMPV4_TYPE, &mask->icmp.type,
> 				  TCA_FLOWER_KEY_ICMPV4_TYPE_MASK,
>-				  sizeof(key->tp.type)) ||
>-		  fl_dump_key_val(skb, &key->tp.code,
>-				  TCA_FLOWER_KEY_ICMPV4_CODE, &mask->tp.code,
>+				  sizeof(key->icmp.type)) ||
>+		  fl_dump_key_val(skb, &key->icmp.code,
>+				  TCA_FLOWER_KEY_ICMPV4_CODE, &mask->icmp.code,
> 				  TCA_FLOWER_KEY_ICMPV4_CODE_MASK,
>-				  sizeof(key->tp.code))))
>+				  sizeof(key->icmp.code))))
> 		goto nla_put_failure;
> 	else if (flow_basic_key_is_icmpv6(&key->basic) &&
>-		 (fl_dump_key_val(skb, &key->tp.type,
>-				  TCA_FLOWER_KEY_ICMPV6_TYPE, &mask->tp.type,
>+		 (fl_dump_key_val(skb, &key->icmp.type,
>+				  TCA_FLOWER_KEY_ICMPV6_TYPE, &mask->icmp.type,
> 				  TCA_FLOWER_KEY_ICMPV6_TYPE_MASK,
>-				  sizeof(key->tp.type)) ||
>-		  fl_dump_key_val(skb, &key->tp.code,
>-				  TCA_FLOWER_KEY_ICMPV6_CODE, &mask->tp.code,
>+				  sizeof(key->icmp.type)) ||
>+		  fl_dump_key_val(skb, &key->icmp.code,
>+				  TCA_FLOWER_KEY_ICMPV6_CODE, &mask->icmp.code,
> 				  TCA_FLOWER_KEY_ICMPV6_CODE_MASK,
>-				  sizeof(key->tp.code))))
>+				  sizeof(key->icmp.code))))
> 		goto nla_put_failure;
> 
> 	if (key->enc_control.addr_type == FLOW_DISSECTOR_KEY_IPV4_ADDRS &&

^ permalink raw reply

* [PATCH net-next] net: dsa: mv88e6xxx: Use EDSA on mv88e6097
From: Stefan Eichenberger @ 2016-12-05 13:12 UTC (permalink / raw)
  To: andrew, vivien.didelot; +Cc: netdev, Stefan Eichenberger

Use DSA_TAG_PROTO_EDSA as tag_protocol for the mv88e6097. The 
initialisation was missing before.

Signed-off-by: Stefan Eichenberger <stefan.eichenberger@netmodule.com>
---
 drivers/net/dsa/mv88e6xxx/chip.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index ca453f3..7a6c587 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -3861,6 +3861,7 @@ static const struct mv88e6xxx_info mv88e6xxx_table[] = {
 		.global1_addr = 0x1b,
 		.age_time_coeff = 15000,
 		.g1_irqs = 8,
+		.tag_protocol = DSA_TAG_PROTO_EDSA,
 		.flags = MV88E6XXX_FLAGS_FAMILY_6097,
 		.ops = &mv88e6097_ops,
 	},
-- 
2.9.3

^ permalink raw reply related

* Re: [PATCH v2 net-next 1/2] flow dissector: ICMP support
From: Simon Horman @ 2016-12-05 14:23 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Jiri Pirko, David Miller, Linux Kernel Network Developers,
	Jay Vosburgh, Veaceslav Falico, Andy Gospodarek, Jamal Hadi Salim,
	Jiri Pirko
In-Reply-To: <CALx6S36EorReKwSs=qpS6uxCrL=_CQHf2BfV+Y=Kw-Dzfc241A@mail.gmail.com>

On Sat, Dec 03, 2016 at 10:52:32AM -0800, Tom Herbert wrote:
> On Sat, Dec 3, 2016 at 2:49 AM, Jiri Pirko <jiri@resnulli.us> wrote:
> > Fri, Dec 02, 2016 at 09:31:41PM CET, simon.horman@netronome.com wrote:
> >>Allow dissection of ICMP(V6) type and code. This re-uses transport layer
> >>port dissection code as although ICMP is not a transport protocol and their
> >>type and code are not ports this allows sharing of both code and storage.
> >>
> >>Signed-off-by: Simon Horman <simon.horman@netronome.com>

...

> > Digging into this a bit more. I think it would be much nice not to mix
> > up l4 ports and icmp stuff.
> >
> > How about to have FLOW_DISSECTOR_KEY_ICMP
> > and
> > struct flow_dissector_key_icmp {
> >         u8 type;
> >         u8 code;
> > };
> >
> > The you can make this structure and struct flow_dissector_key_ports into
> > an union in struct flow_keys.
> >
> > Looks much cleaner to me.

Hi Jiri,

I wondered about that too. The main reason that I didn't do this the first
time around is that I wasn't sure what to do to differentiate between ports
and icmp in fl_init_dissector() given that using a union would could to
mask bits being set in both if they are set in either.

I've taken a stab at addressing that below along with implementing your
suggestion. If the treatment in fl_init_dissector() is acceptable
perhaps something similar should be done for FLOW_DISSECTOR_KEY_IPV[46]_ADDRS
too?

> I agree, this patch adds to many conditionals into the fast path for
> ICMP handling. Neither is there much point in using type and code as
> input to the packet hash.

Hi Tom,

I'm not sure that I follow this. A packet may be ICMP or not regardless of
if FLOW_DISSECTOR_KEY_PORT (and now FLOW_DISSECTOR_KEY_ICMP) port is set or
not. I'd appreciate some guidance here.


First-pass at implementing Jiri's suggestion follows:

diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h
index 5540dfa18872..475cd121496f 100644
--- a/include/net/flow_dissector.h
+++ b/include/net/flow_dissector.h
@@ -91,14 +91,10 @@ struct flow_dissector_key_addrs {
 
 /**
  * flow_dissector_key_ports:
- *	@ports: port numbers of Transport header or
- *		type and code of ICMP header
+ *	@ports: port numbers of Transport header
  *		ports: source (high) and destination (low) port numbers
  *		src: source port number
  *		dst: destination port number
- *		icmp: ICMP type (high) and code (low)
- *		type: ICMP type
- *		type: ICMP code
  */
 struct flow_dissector_key_ports {
 	union {
@@ -107,6 +103,18 @@ struct flow_dissector_key_ports {
 			__be16 src;
 			__be16 dst;
 		};
+	};
+};
+
+/**
+ * flow_dissector_key_icmp:
+ *	@ports: type and code of ICMP header
+ *		icmp: ICMP type (high) and code (low)
+ *		type: ICMP type
+ *		code: ICMP code
+ */
+struct flow_dissector_key_icmp {
+	union {
 		__be16 icmp;
 		struct {
 			u8 type;
@@ -133,6 +141,7 @@ enum flow_dissector_key_id {
 	FLOW_DISSECTOR_KEY_IPV4_ADDRS, /* struct flow_dissector_key_ipv4_addrs */
 	FLOW_DISSECTOR_KEY_IPV6_ADDRS, /* struct flow_dissector_key_ipv6_addrs */
 	FLOW_DISSECTOR_KEY_PORTS, /* struct flow_dissector_key_ports */
+	FLOW_DISSECTOR_KEY_ICMP, /* struct flow_dissector_key_icmp */
 	FLOW_DISSECTOR_KEY_ETH_ADDRS, /* struct flow_dissector_key_eth_addrs */
 	FLOW_DISSECTOR_KEY_TIPC_ADDRS, /* struct flow_dissector_key_tipc_addrs */
 	FLOW_DISSECTOR_KEY_VLAN, /* struct flow_dissector_key_flow_vlan */
@@ -171,7 +180,10 @@ struct flow_keys {
 	struct flow_dissector_key_tags tags;
 	struct flow_dissector_key_vlan vlan;
 	struct flow_dissector_key_keyid keyid;
-	struct flow_dissector_key_ports ports;
+	union {
+		struct flow_dissector_key_ports ports;
+		struct flow_dissector_key_icmp icmp;
+	};
 	struct flow_dissector_key_addrs addrs;
 };
 
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 0584b4bb4390..33e5fa2b3e87 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -139,6 +139,7 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
 	struct flow_dissector_key_basic *key_basic;
 	struct flow_dissector_key_addrs *key_addrs;
 	struct flow_dissector_key_ports *key_ports;
+	struct flow_dissector_key_icmp *key_icmp;
 	struct flow_dissector_key_tags *key_tags;
 	struct flow_dissector_key_vlan *key_vlan;
 	struct flow_dissector_key_keyid *key_keyid;
@@ -559,18 +560,25 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
 		break;
 	}
 
-	if (dissector_uses_key(flow_dissector,
-			       FLOW_DISSECTOR_KEY_PORTS)) {
-		key_ports = skb_flow_dissector_target(flow_dissector,
-						      FLOW_DISSECTOR_KEY_PORTS,
-						      target_container);
-		if (flow_protos_are_icmp_any(proto, ip_proto))
-			key_ports->icmp = skb_flow_get_be16(skb, nhoff, data,
-							    hlen);
-		else
+	if (flow_protos_are_icmp_any(proto, ip_proto)) {
+		if (dissector_uses_key(flow_dissector,
+				       FLOW_DISSECTOR_KEY_ICMP)) {
+			key_icmp = skb_flow_dissector_target(flow_dissector,
+							     FLOW_DISSECTOR_KEY_ICMP,
+							     target_container);
+			key_icmp->icmp = skb_flow_get_be16(skb, nhoff, data,
+							   hlen);
+		}
+	} else {
+		if (dissector_uses_key(flow_dissector,
+				       FLOW_DISSECTOR_KEY_PORTS)) {
+			key_ports = skb_flow_dissector_target(flow_dissector,
+							      FLOW_DISSECTOR_KEY_PORTS,
+							      target_container);
 			key_ports->ports = __skb_flow_get_ports(skb, nhoff,
 								ip_proto, data,
 								hlen);
+		}
 	}
 
 out_good:
@@ -975,6 +983,10 @@ static const struct flow_dissector_key flow_keys_dissector_keys[] = {
 		.offset = offsetof(struct flow_keys, ports),
 	},
 	{
+		.key_id = FLOW_DISSECTOR_KEY_ICMP,
+		.offset = offsetof(struct flow_keys, icmp),
+	},
+	{
 		.key_id = FLOW_DISSECTOR_KEY_VLAN,
 		.offset = offsetof(struct flow_keys, vlan),
 	},
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index c96b2a349779..f4ba69bd362f 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -38,7 +38,10 @@ struct fl_flow_key {
 		struct flow_dissector_key_ipv4_addrs ipv4;
 		struct flow_dissector_key_ipv6_addrs ipv6;
 	};
-	struct flow_dissector_key_ports tp;
+	union {
+		struct flow_dissector_key_ports tp;
+		struct flow_dissector_key_icmp icmp;
+	};
 	struct flow_dissector_key_keyid enc_key_id;
 	union {
 		struct flow_dissector_key_ipv4_addrs enc_ipv4;
@@ -511,19 +514,23 @@ static int fl_set_key(struct net *net, struct nlattr **tb,
 			       &mask->tp.dst, TCA_FLOWER_KEY_SCTP_DST_MASK,
 			       sizeof(key->tp.dst));
 	} else if (flow_basic_key_is_icmpv4(&key->basic)) {
-		fl_set_key_val(tb, &key->tp.type, TCA_FLOWER_KEY_ICMPV4_TYPE,
-			       &mask->tp.type, TCA_FLOWER_KEY_ICMPV4_TYPE_MASK,
-			       sizeof(key->tp.type));
-		fl_set_key_val(tb, &key->tp.code, TCA_FLOWER_KEY_ICMPV4_CODE,
-			       &mask->tp.code, TCA_FLOWER_KEY_ICMPV4_CODE_MASK,
-			       sizeof(key->tp.code));
+		fl_set_key_val(tb, &key->icmp.type, TCA_FLOWER_KEY_ICMPV4_TYPE,
+			       &mask->icmp.type,
+			       TCA_FLOWER_KEY_ICMPV4_TYPE_MASK,
+			       sizeof(key->icmp.type));
+		fl_set_key_val(tb, &key->icmp.code, TCA_FLOWER_KEY_ICMPV4_CODE,
+			       &mask->icmp.code,
+			       TCA_FLOWER_KEY_ICMPV4_CODE_MASK,
+			       sizeof(key->icmp.code));
 	} else if (flow_basic_key_is_icmpv6(&key->basic)) {
-		fl_set_key_val(tb, &key->tp.type, TCA_FLOWER_KEY_ICMPV6_TYPE,
-			       &mask->tp.type, TCA_FLOWER_KEY_ICMPV6_TYPE_MASK,
-			       sizeof(key->tp.type));
-		fl_set_key_val(tb, &key->tp.code, TCA_FLOWER_KEY_ICMPV4_CODE,
-			       &mask->tp.code, TCA_FLOWER_KEY_ICMPV4_CODE_MASK,
-			       sizeof(key->tp.code));
+		fl_set_key_val(tb, &key->icmp.type, TCA_FLOWER_KEY_ICMPV6_TYPE,
+			       &mask->icmp.type,
+			       TCA_FLOWER_KEY_ICMPV6_TYPE_MASK,
+			       sizeof(key->icmp.type));
+		fl_set_key_val(tb, &key->icmp.code, TCA_FLOWER_KEY_ICMPV4_CODE,
+			       &mask->icmp.code,
+			       TCA_FLOWER_KEY_ICMPV4_CODE_MASK,
+			       sizeof(key->icmp.code));
 	}
 
 	if (tb[TCA_FLOWER_KEY_ENC_IPV4_SRC] ||
@@ -609,15 +616,16 @@ static int fl_init_hashtable(struct cls_fl_head *head,
 		keys[cnt].key_id = id;						\
 		keys[cnt].offset = FL_KEY_MEMBER_OFFSET(member);		\
 		cnt++;								\
-	} while(0);
+	} while(0)
 
 #define FL_KEY_SET_IF_MASKED(mask, keys, cnt, id, member)			\
 	do {									\
 		if (FL_KEY_IS_MASKED(mask, member))				\
 			FL_KEY_SET(keys, cnt, id, member);			\
-	} while(0);
+	} while(0)
 
 static void fl_init_dissector(struct cls_fl_head *head,
+			      struct cls_fl_filter *f,
 			      struct fl_flow_mask *mask)
 {
 	struct flow_dissector_key keys[FLOW_DISSECTOR_KEY_MAX];
@@ -631,8 +639,12 @@ static void fl_init_dissector(struct cls_fl_head *head,
 			     FLOW_DISSECTOR_KEY_IPV4_ADDRS, ipv4);
 	FL_KEY_SET_IF_MASKED(&mask->key, keys, cnt,
 			     FLOW_DISSECTOR_KEY_IPV6_ADDRS, ipv6);
-	FL_KEY_SET_IF_MASKED(&mask->key, keys, cnt,
-			     FLOW_DISSECTOR_KEY_PORTS, tp);
+	if (flow_basic_key_is_icmpv4(&f->key.basic))
+		FL_KEY_SET_IF_MASKED(&mask->key, keys, cnt,
+				     FLOW_DISSECTOR_KEY_ICMP, icmp);
+	else
+		FL_KEY_SET_IF_MASKED(&mask->key, keys, cnt,
+				     FLOW_DISSECTOR_KEY_PORTS, tp);
 	FL_KEY_SET_IF_MASKED(&mask->key, keys, cnt,
 			     FLOW_DISSECTOR_KEY_VLAN, vlan);
 	FL_KEY_SET_IF_MASKED(&mask->key, keys, cnt,
@@ -652,6 +664,7 @@ static void fl_init_dissector(struct cls_fl_head *head,
 }
 
 static int fl_check_assign_mask(struct cls_fl_head *head,
+				struct cls_fl_filter *f,
 				struct fl_flow_mask *mask)
 {
 	int err;
@@ -672,7 +685,7 @@ static int fl_check_assign_mask(struct cls_fl_head *head,
 	memcpy(&head->mask, mask, sizeof(head->mask));
 	head->mask_assigned = true;
 
-	fl_init_dissector(head, mask);
+	fl_init_dissector(head, f, mask);
 
 	return 0;
 }
@@ -785,7 +798,7 @@ static int fl_change(struct net *net, struct sk_buff *in_skb,
 	if (err)
 		goto errout;
 
-	err = fl_check_assign_mask(head, &mask);
+	err = fl_check_assign_mask(head, fnew, &mask);
 	if (err)
 		goto errout;
 
@@ -1000,24 +1013,24 @@ static int fl_dump(struct net *net, struct tcf_proto *tp, unsigned long fh,
 				  sizeof(key->tp.dst))))
 		goto nla_put_failure;
 	else if (flow_basic_key_is_icmpv4(&key->basic) &&
-		 (fl_dump_key_val(skb, &key->tp.type,
-				  TCA_FLOWER_KEY_ICMPV4_TYPE, &mask->tp.type,
+		 (fl_dump_key_val(skb, &key->icmp.type,
+				  TCA_FLOWER_KEY_ICMPV4_TYPE, &mask->icmp.type,
 				  TCA_FLOWER_KEY_ICMPV4_TYPE_MASK,
-				  sizeof(key->tp.type)) ||
-		  fl_dump_key_val(skb, &key->tp.code,
-				  TCA_FLOWER_KEY_ICMPV4_CODE, &mask->tp.code,
+				  sizeof(key->icmp.type)) ||
+		  fl_dump_key_val(skb, &key->icmp.code,
+				  TCA_FLOWER_KEY_ICMPV4_CODE, &mask->icmp.code,
 				  TCA_FLOWER_KEY_ICMPV4_CODE_MASK,
-				  sizeof(key->tp.code))))
+				  sizeof(key->icmp.code))))
 		goto nla_put_failure;
 	else if (flow_basic_key_is_icmpv6(&key->basic) &&
-		 (fl_dump_key_val(skb, &key->tp.type,
-				  TCA_FLOWER_KEY_ICMPV6_TYPE, &mask->tp.type,
+		 (fl_dump_key_val(skb, &key->icmp.type,
+				  TCA_FLOWER_KEY_ICMPV6_TYPE, &mask->icmp.type,
 				  TCA_FLOWER_KEY_ICMPV6_TYPE_MASK,
-				  sizeof(key->tp.type)) ||
-		  fl_dump_key_val(skb, &key->tp.code,
-				  TCA_FLOWER_KEY_ICMPV6_CODE, &mask->tp.code,
+				  sizeof(key->icmp.type)) ||
+		  fl_dump_key_val(skb, &key->icmp.code,
+				  TCA_FLOWER_KEY_ICMPV6_CODE, &mask->icmp.code,
 				  TCA_FLOWER_KEY_ICMPV6_CODE_MASK,
-				  sizeof(key->tp.code))))
+				  sizeof(key->icmp.code))))
 		goto nla_put_failure;
 
 	if (key->enc_control.addr_type == FLOW_DISSECTOR_KEY_IPV4_ADDRS &&

^ permalink raw reply related

* Re: [PATCH net-next] net: dsa: mv88e6xxx: Use EDSA on mv88e6097
From: Andrew Lunn @ 2016-12-05 14:12 UTC (permalink / raw)
  To: Stefan Eichenberger; +Cc: vivien.didelot, netdev, Stefan Eichenberger
In-Reply-To: <20161205131242.19370-1-stefan.eichenberger@netmodule.com>

On Mon, Dec 05, 2016 at 02:12:42PM +0100, Stefan Eichenberger wrote:
> Use DSA_TAG_PROTO_EDSA as tag_protocol for the mv88e6097. The 
> initialisation was missing before.
> 
> Signed-off-by: Stefan Eichenberger <stefan.eichenberger@netmodule.com>

Upps, sorry. My error.

This is net-next only.

Fixes: a1f482aa8c33 ("net: dsa: mv88e6xxx: Move the tagging protocol into info")
Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew

^ permalink raw reply

* Re: [patch net v2] net: fec: fix compile with CONFIG_M5272
From: Nikita Yushchenko @ 2016-12-05 14:03 UTC (permalink / raw)
  To: kbuild test robot
  Cc: kbuild-all, David S. Miller, Fugang Duan, Troy Kisky, Andrew Lunn,
	Eric Nelson, Philippe Reynes, Johannes Berg, netdev, Chris Healy,
	Fabio Estevam, linux-kernel
In-Reply-To: <60acf0f4-2f1f-2b2c-06ed-b1d8cb03525d@cogentembedded.com>

diff --git a/drivers/net/ethernet/freescale/fec_main.c
b/drivers/net/ethernet/freescale/fec_main.c
index 741cf4a57bfc..12aef1b15356 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -3301,7 +3301,7 @@ fec_probe(struct platform_device *pdev)

 	/* Init network device */
 	ndev = alloc_etherdev_mqs(sizeof(struct fec_enet_private) +
-				  FEC_STAT_SIZE, num_tx_qs, num_rx_qs);
+				  FEC_STATS_SIZE, num_tx_qs, num_rx_qs);
 	if (!ndev)
 		return -ENOMEM;


> Aieee   I was typing too fast today, sorry...
> 
> send separate "fix for the fix", or re-send patch without that silly typo?
> 
> Nikita
> 
>> Hi Nikita,
>>
>> [auto build test ERROR on net/master]
>>
>> url:    https://github.com/0day-ci/linux/commits/Nikita-Yushchenko/net-fec-fix-compile-with-CONFIG_M5272/20161205-181735
>> config: arm-multi_v5_defconfig (attached as .config)
>> compiler: arm-linux-gnueabi-gcc (Debian 6.1.1-9) 6.1.1 20160705
>> reproduce:
>>         wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
>>         chmod +x ~/bin/make.cross
>>         # save the attached .config to linux build tree
>>         make.cross ARCH=arm 
>>
>> All errors (new ones prefixed by >>):
>>
>>    drivers/net/ethernet/freescale/fec_main.c: In function 'fec_probe':
>>>> drivers/net/ethernet/freescale/fec_main.c:3304:7: error: 'FEC_STAT_SIZE' undeclared (first use in this function)
>>           FEC_STAT_SIZE, num_tx_qs, num_rx_qs);
>>           ^~~~~~~~~~~~~
>>    drivers/net/ethernet/freescale/fec_main.c:3304:7: note: each undeclared identifier is reported only once for each function it appears in
>>
>> vim +/FEC_STAT_SIZE +3304 drivers/net/ethernet/freescale/fec_main.c
>>
>>   3298		int num_rx_qs;
>>   3299	
>>   3300		fec_enet_get_queue_num(pdev, &num_tx_qs, &num_rx_qs);
>>   3301	
>>   3302		/* Init network device */
>>   3303		ndev = alloc_etherdev_mqs(sizeof(struct fec_enet_private) +
>>> 3304					  FEC_STAT_SIZE, num_tx_qs, num_rx_qs);
>>   3305		if (!ndev)
>>   3306			return -ENOMEM;
>>   3307	
>>
>> ---
>> 0-DAY kernel test infrastructure                Open Source Technology Center
>> https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
>>

^ permalink raw reply related

* Re: [patch net v2] net: fec: fix compile with CONFIG_M5272
From: Nikita Yushchenko @ 2016-12-05 13:55 UTC (permalink / raw)
  To: kbuild test robot
  Cc: kbuild-all, David S. Miller, Fugang Duan, Troy Kisky, Andrew Lunn,
	Eric Nelson, Philippe Reynes, Johannes Berg, netdev, Chris Healy,
	Fabio Estevam, linux-kernel
In-Reply-To: <201612052104.HSl9jlj9%fengguang.wu@intel.com>

Aieee   I was typing too fast today, sorry...

send separate "fix for the fix", or re-send patch without that silly typo?

Nikita

> Hi Nikita,
> 
> [auto build test ERROR on net/master]
> 
> url:    https://github.com/0day-ci/linux/commits/Nikita-Yushchenko/net-fec-fix-compile-with-CONFIG_M5272/20161205-181735
> config: arm-multi_v5_defconfig (attached as .config)
> compiler: arm-linux-gnueabi-gcc (Debian 6.1.1-9) 6.1.1 20160705
> reproduce:
>         wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
>         chmod +x ~/bin/make.cross
>         # save the attached .config to linux build tree
>         make.cross ARCH=arm 
> 
> All errors (new ones prefixed by >>):
> 
>    drivers/net/ethernet/freescale/fec_main.c: In function 'fec_probe':
>>> drivers/net/ethernet/freescale/fec_main.c:3304:7: error: 'FEC_STAT_SIZE' undeclared (first use in this function)
>           FEC_STAT_SIZE, num_tx_qs, num_rx_qs);
>           ^~~~~~~~~~~~~
>    drivers/net/ethernet/freescale/fec_main.c:3304:7: note: each undeclared identifier is reported only once for each function it appears in
> 
> vim +/FEC_STAT_SIZE +3304 drivers/net/ethernet/freescale/fec_main.c
> 
>   3298		int num_rx_qs;
>   3299	
>   3300		fec_enet_get_queue_num(pdev, &num_tx_qs, &num_rx_qs);
>   3301	
>   3302		/* Init network device */
>   3303		ndev = alloc_etherdev_mqs(sizeof(struct fec_enet_private) +
>> 3304					  FEC_STAT_SIZE, num_tx_qs, num_rx_qs);
>   3305		if (!ndev)
>   3306			return -ENOMEM;
>   3307	
> 
> ---
> 0-DAY kernel test infrastructure                Open Source Technology Center
> https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
> 

^ permalink raw reply

* Re: [PATCH 2/2] net: rfkill: Add rfkill-any LED trigger
From: Johannes Berg @ 2016-12-05 13:54 UTC (permalink / raw)
  To: Michał Kępień, kbuild test robot
  Cc: kbuild-all-JC7UmRfGjtg, David S . Miller,
	linux-wireless-u79uwXL29TY76Z2rM5mHXA,
	netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA
In-Reply-To: <20161201200804.GA3872-d0zqvqtRkg7MisJl3QRgO5JNpPtI3pG4@public.gmane.org>


> Thanks, these are obviously all valid concerns.  Sorry for being
> sloppy
> with the ifdefs.  If I get positive feedback on the proposed feature
> itself, all these issues (and the warning pointed out in the other
> message) will be resolved in v2.

Looks fine, please do that.

johannes

^ permalink raw reply

* [PATCH v2 4/4] ARM: dts: hix5hd2: add gmac generic compatible and clock names
From: Dongpo Li @ 2016-12-05 13:28 UTC (permalink / raw)
  To: robh+dt, mark.rutland, mturquette, sboyd, linux, zhangfei.gao,
	yisen.zhuang, salil.mehta, davem, arnd, andrew
  Cc: xuejiancheng, benjamin.chenhao, caizhiyong, netdev, devicetree,
	linux-kernel, Dongpo Li
In-Reply-To: <1480944481-118803-1-git-send-email-lidongpo@hisilicon.com>

Add gmac generic compatible and clock names.

Signed-off-by: Dongpo Li <lidongpo@hisilicon.com>
---
 arch/arm/boot/dts/hisi-x5hd2.dtsi | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/arm/boot/dts/hisi-x5hd2.dtsi b/arch/arm/boot/dts/hisi-x5hd2.dtsi
index fdcc23d..0da76c5 100644
--- a/arch/arm/boot/dts/hisi-x5hd2.dtsi
+++ b/arch/arm/boot/dts/hisi-x5hd2.dtsi
@@ -436,18 +436,20 @@
 		};
 
 		gmac0: ethernet@1840000 {
-			compatible = "hisilicon,hix5hd2-gmac";
+			compatible = "hisilicon,hix5hd2-gemac", "hisilicon,hisi-gemac-v1";
 			reg = <0x1840000 0x1000>,<0x184300c 0x4>;
 			interrupts = <0 71 4>;
 			clocks = <&clock HIX5HD2_MAC0_CLK>;
+			clock-names = "mac_core";
 			status = "disabled";
 		};
 
 		gmac1: ethernet@1841000 {
-			compatible = "hisilicon,hix5hd2-gmac";
+			compatible = "hisilicon,hix5hd2-gemac", "hisilicon,hisi-gemac-v1";
 			reg = <0x1841000 0x1000>,<0x1843010 0x4>;
 			interrupts = <0 72 4>;
 			clocks = <&clock HIX5HD2_MAC1_CLK>;
+			clock-names = "mac_core";
 			status = "disabled";
 		};
 
-- 
2.8.2

^ permalink raw reply related

* [PATCH v2 1/4] net: hix5hd2_gmac: add generic compatible string
From: Dongpo Li @ 2016-12-05 13:27 UTC (permalink / raw)
  To: robh+dt, mark.rutland, mturquette, sboyd, linux, zhangfei.gao,
	yisen.zhuang, salil.mehta, davem, arnd, andrew
  Cc: xuejiancheng, benjamin.chenhao, caizhiyong, netdev, devicetree,
	linux-kernel, Dongpo Li
In-Reply-To: <1480944481-118803-1-git-send-email-lidongpo@hisilicon.com>

The "hix5hd2" is SoC name, add the generic ethernet driver name.
The "hisi-gemac-v1" is the basic version and "hisi-gemac-v2" adds
the SG/TXCSUM/TSO/UFO features.

Signed-off-by: Dongpo Li <lidongpo@hisilicon.com>
---
 .../devicetree/bindings/net/hisilicon-hix5hd2-gmac.txt    |  9 +++++++--
 drivers/net/ethernet/hisilicon/hix5hd2_gmac.c             | 15 +++++++++++----
 2 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/hisilicon-hix5hd2-gmac.txt b/Documentation/devicetree/bindings/net/hisilicon-hix5hd2-gmac.txt
index 75d398b..75920f0 100644
--- a/Documentation/devicetree/bindings/net/hisilicon-hix5hd2-gmac.txt
+++ b/Documentation/devicetree/bindings/net/hisilicon-hix5hd2-gmac.txt
@@ -1,7 +1,12 @@
 Hisilicon hix5hd2 gmac controller
 
 Required properties:
-- compatible: should be "hisilicon,hix5hd2-gmac".
+- compatible: should contain one of the following SoC strings:
+	* "hisilicon,hix5hd2-gemac"
+	* "hisilicon,hi3798cv200-gemac"
+	and one of the following version string:
+	* "hisilicon,hisi-gemac-v1"
+	* "hisilicon,hisi-gemac-v2"
 - reg: specifies base physical address(s) and size of the device registers.
   The first region is the MAC register base and size.
   The second region is external interface control register.
@@ -20,7 +25,7 @@ Required properties:
 
 Example:
 	gmac0: ethernet@f9840000 {
-		compatible = "hisilicon,hix5hd2-gmac";
+		compatible = "hisilicon,hix5hd2-gemac", "hisilicon,hisi-gemac-v1";
 		reg = <0xf9840000 0x1000>,<0xf984300c 0x4>;
 		interrupts = <0 71 4>;
 		#address-cells = <1>;
diff --git a/drivers/net/ethernet/hisilicon/hix5hd2_gmac.c b/drivers/net/ethernet/hisilicon/hix5hd2_gmac.c
index e69a6be..27cb2e6 100644
--- a/drivers/net/ethernet/hisilicon/hix5hd2_gmac.c
+++ b/drivers/net/ethernet/hisilicon/hix5hd2_gmac.c
@@ -189,6 +189,10 @@
 #define dma_cnt(n)			((n) >> 5)
 #define dma_byte(n)			((n) << 5)
 
+#define HW_CAP_TSO			BIT(0)
+#define GEMAC_V1			0
+#define GEMAC_V2			(GEMAC_V1 | HW_CAP_TSO)
+
 struct hix5hd2_desc {
 	__le32 buff_addr;
 	__le32 cmd;
@@ -1021,7 +1025,10 @@ static int hix5hd2_dev_remove(struct platform_device *pdev)
 }
 
 static const struct of_device_id hix5hd2_of_match[] = {
-	{.compatible = "hisilicon,hix5hd2-gmac",},
+	{ .compatible = "hisilicon,hisi-gemac-v1", .data = (void *)GEMAC_V1 },
+	{ .compatible = "hisilicon,hisi-gemac-v2", .data = (void *)GEMAC_V2 },
+	{ .compatible = "hisilicon,hix5hd2-gemac", .data = (void *)GEMAC_V1 },
+	{ .compatible = "hisilicon,hi3798cv200-gemac", .data = (void *)GEMAC_V2 },
 	{},
 };
 
@@ -1029,7 +1036,7 @@ MODULE_DEVICE_TABLE(of, hix5hd2_of_match);
 
 static struct platform_driver hix5hd2_dev_driver = {
 	.driver = {
-		.name = "hix5hd2-gmac",
+		.name = "hisi-gemac",
 		.of_match_table = hix5hd2_of_match,
 	},
 	.probe = hix5hd2_dev_probe,
@@ -1038,6 +1045,6 @@ static struct platform_driver hix5hd2_dev_driver = {
 
 module_platform_driver(hix5hd2_dev_driver);
 
-MODULE_DESCRIPTION("HISILICON HIX5HD2 Ethernet driver");
+MODULE_DESCRIPTION("HISILICON Gigabit Ethernet MAC driver");
 MODULE_LICENSE("GPL v2");
-MODULE_ALIAS("platform:hix5hd2-gmac");
+MODULE_ALIAS("platform:hisi-gemac");
-- 
2.8.2

^ permalink raw reply related

* [PATCH v2 2/4] net: hix5hd2_gmac: add tx scatter-gather feature
From: Dongpo Li @ 2016-12-05 13:27 UTC (permalink / raw)
  To: robh+dt, mark.rutland, mturquette, sboyd, linux, zhangfei.gao,
	yisen.zhuang, salil.mehta, davem, arnd, andrew
  Cc: xuejiancheng, benjamin.chenhao, caizhiyong, netdev, devicetree,
	linux-kernel, Dongpo Li
In-Reply-To: <1480944481-118803-1-git-send-email-lidongpo@hisilicon.com>

"hisi-gemac-v2" adds the SG/TXCSUM/TSO/UFO features.
This patch only adds the SG(scatter-gather) driver for transmitting,
the drivers of other features will be submitted later.

Signed-off-by: Dongpo Li <lidongpo@hisilicon.com>
---
 drivers/net/ethernet/hisilicon/hix5hd2_gmac.c | 198 ++++++++++++++++++++++++--
 1 file changed, 187 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hix5hd2_gmac.c b/drivers/net/ethernet/hisilicon/hix5hd2_gmac.c
index 27cb2e6..18af55b 100644
--- a/drivers/net/ethernet/hisilicon/hix5hd2_gmac.c
+++ b/drivers/net/ethernet/hisilicon/hix5hd2_gmac.c
@@ -11,6 +11,7 @@
 #include <linux/interrupt.h>
 #include <linux/etherdevice.h>
 #include <linux/platform_device.h>
+#include <linux/of_device.h>
 #include <linux/of_net.h>
 #include <linux/of_mdio.h>
 #include <linux/clk.h>
@@ -183,6 +184,8 @@
 #define DESC_DATA_LEN_OFF		16
 #define DESC_BUFF_LEN_OFF		0
 #define DESC_DATA_MASK			0x7ff
+#define DESC_SG				BIT(30)
+#define DESC_FRAGS_NUM_OFF		11
 
 /* DMA descriptor ring helpers */
 #define dma_ring_incr(n, s)		(((n) + 1) & ((s) - 1))
@@ -192,6 +195,7 @@
 #define HW_CAP_TSO			BIT(0)
 #define GEMAC_V1			0
 #define GEMAC_V2			(GEMAC_V1 | HW_CAP_TSO)
+#define HAS_CAP_TSO(hw_cap)		((hw_cap) & HW_CAP_TSO)
 
 struct hix5hd2_desc {
 	__le32 buff_addr;
@@ -205,6 +209,27 @@ struct hix5hd2_desc_sw {
 	unsigned int	size;
 };
 
+struct hix5hd2_sg_desc_ring {
+	struct sg_desc *desc;
+	dma_addr_t phys_addr;
+};
+
+struct frags_info {
+	__le32 addr;
+	__le32 size;
+};
+
+/* hardware supported max skb frags num */
+#define SG_MAX_SKB_FRAGS	17
+struct sg_desc {
+	__le32 total_len;
+	__le32 resvd0;
+	__le32 linear_addr;
+	__le32 linear_len;
+	/* reserve one more frags for memory alignment */
+	struct frags_info frags[SG_MAX_SKB_FRAGS + 1];
+};
+
 #define QUEUE_NUMS	4
 struct hix5hd2_priv {
 	struct hix5hd2_desc_sw pool[QUEUE_NUMS];
@@ -212,6 +237,7 @@ struct hix5hd2_priv {
 #define rx_bq		pool[1]
 #define tx_bq		pool[2]
 #define tx_rq		pool[3]
+	struct hix5hd2_sg_desc_ring tx_ring;
 
 	void __iomem *base;
 	void __iomem *ctrl_base;
@@ -225,6 +251,7 @@ struct hix5hd2_priv {
 	struct device_node *phy_node;
 	phy_interface_t	phy_mode;
 
+	unsigned long hw_cap;
 	unsigned int speed;
 	unsigned int duplex;
 
@@ -515,6 +542,27 @@ static int hix5hd2_rx(struct net_device *dev, int limit)
 	return num;
 }
 
+static void hix5hd2_clean_sg_desc(struct hix5hd2_priv *priv,
+				  struct sk_buff *skb, u32 pos)
+{
+	struct sg_desc *desc;
+	dma_addr_t addr;
+	u32 len;
+	int i;
+
+	desc = priv->tx_ring.desc + pos;
+
+	addr = le32_to_cpu(desc->linear_addr);
+	len = le32_to_cpu(desc->linear_len);
+	dma_unmap_single(priv->dev, addr, len, DMA_TO_DEVICE);
+
+	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
+		addr = le32_to_cpu(desc->frags[i].addr);
+		len = le32_to_cpu(desc->frags[i].size);
+		dma_unmap_page(priv->dev, addr, len, DMA_TO_DEVICE);
+	}
+}
+
 static void hix5hd2_xmit_reclaim(struct net_device *dev)
 {
 	struct sk_buff *skb;
@@ -542,8 +590,15 @@ static void hix5hd2_xmit_reclaim(struct net_device *dev)
 		pkts_compl++;
 		bytes_compl += skb->len;
 		desc = priv->tx_rq.desc + pos;
-		addr = le32_to_cpu(desc->buff_addr);
-		dma_unmap_single(priv->dev, addr, skb->len, DMA_TO_DEVICE);
+
+		if (skb_shinfo(skb)->nr_frags) {
+			hix5hd2_clean_sg_desc(priv, skb, pos);
+		} else {
+			addr = le32_to_cpu(desc->buff_addr);
+			dma_unmap_single(priv->dev, addr, skb->len,
+					 DMA_TO_DEVICE);
+		}
+
 		priv->tx_skb[pos] = NULL;
 		dev_consume_skb_any(skb);
 		pos = dma_ring_incr(pos, TX_DESC_NUM);
@@ -604,12 +659,66 @@ static irqreturn_t hix5hd2_interrupt(int irq, void *dev_id)
 	return IRQ_HANDLED;
 }
 
+static u32 hix5hd2_get_desc_cmd(struct sk_buff *skb, unsigned long hw_cap)
+{
+	u32 cmd = 0;
+
+	if (HAS_CAP_TSO(hw_cap)) {
+		if (skb_shinfo(skb)->nr_frags)
+			cmd |= DESC_SG;
+		cmd |= skb_shinfo(skb)->nr_frags << DESC_FRAGS_NUM_OFF;
+	} else {
+		cmd |= DESC_FL_FULL |
+			((skb->len & DESC_DATA_MASK) << DESC_BUFF_LEN_OFF);
+	}
+
+	cmd |= (skb->len & DESC_DATA_MASK) << DESC_DATA_LEN_OFF;
+	cmd |= DESC_VLD_BUSY;
+
+	return cmd;
+}
+
+static int hix5hd2_fill_sg_desc(struct hix5hd2_priv *priv,
+				struct sk_buff *skb, u32 pos)
+{
+	struct sg_desc *desc;
+	dma_addr_t addr;
+	int ret;
+	int i;
+
+	desc = priv->tx_ring.desc + pos;
+
+	desc->total_len = cpu_to_le32(skb->len);
+	addr = dma_map_single(priv->dev, skb->data, skb_headlen(skb),
+			      DMA_TO_DEVICE);
+	if (unlikely(dma_mapping_error(priv->dev, addr)))
+		return -EINVAL;
+	desc->linear_addr = cpu_to_le32(addr);
+	desc->linear_len = cpu_to_le32(skb_headlen(skb));
+
+	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
+		skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
+		int len = frag->size;
+
+		addr = skb_frag_dma_map(priv->dev, frag, 0, len, DMA_TO_DEVICE);
+		ret = dma_mapping_error(priv->dev, addr);
+		if (unlikely(ret))
+			return -EINVAL;
+		desc->frags[i].addr = cpu_to_le32(addr);
+		desc->frags[i].size = cpu_to_le32(len);
+	}
+
+	return 0;
+}
+
 static int hix5hd2_net_xmit(struct sk_buff *skb, struct net_device *dev)
 {
 	struct hix5hd2_priv *priv = netdev_priv(dev);
 	struct hix5hd2_desc *desc;
 	dma_addr_t addr;
 	u32 pos;
+	u32 cmd;
+	int ret;
 
 	/* software write pointer */
 	pos = dma_cnt(readl_relaxed(priv->base + TX_BQ_WR_ADDR));
@@ -620,18 +729,31 @@ static int hix5hd2_net_xmit(struct sk_buff *skb, struct net_device *dev)
 		return NETDEV_TX_BUSY;
 	}
 
-	addr = dma_map_single(priv->dev, skb->data, skb->len, DMA_TO_DEVICE);
-	if (dma_mapping_error(priv->dev, addr)) {
-		dev_kfree_skb_any(skb);
-		return NETDEV_TX_OK;
-	}
-
 	desc = priv->tx_bq.desc + pos;
+
+	cmd = hix5hd2_get_desc_cmd(skb, priv->hw_cap);
+	desc->cmd = cpu_to_le32(cmd);
+
+	if (skb_shinfo(skb)->nr_frags) {
+		ret = hix5hd2_fill_sg_desc(priv, skb, pos);
+		if (unlikely(ret)) {
+			dev_kfree_skb_any(skb);
+			dev->stats.tx_dropped++;
+			return NETDEV_TX_OK;
+		}
+		addr = priv->tx_ring.phys_addr + pos * sizeof(struct sg_desc);
+	} else {
+		addr = dma_map_single(priv->dev, skb->data, skb->len,
+				      DMA_TO_DEVICE);
+		if (unlikely(dma_mapping_error(priv->dev, addr))) {
+			dev_kfree_skb_any(skb);
+			dev->stats.tx_dropped++;
+			return NETDEV_TX_OK;
+		}
+	}
 	desc->buff_addr = cpu_to_le32(addr);
+
 	priv->tx_skb[pos] = skb;
-	desc->cmd = cpu_to_le32(DESC_VLD_BUSY | DESC_FL_FULL |
-				(skb->len & DESC_DATA_MASK) << DESC_DATA_LEN_OFF |
-				(skb->len & DESC_DATA_MASK) << DESC_BUFF_LEN_OFF);
 
 	/* ensure desc updated */
 	wmb();
@@ -866,10 +988,40 @@ static int hix5hd2_init_hw_desc_queue(struct hix5hd2_priv *priv)
 	return -ENOMEM;
 }
 
+static int hix5hd2_init_sg_desc_queue(struct hix5hd2_priv *priv)
+{
+	struct sg_desc *desc;
+	dma_addr_t phys_addr;
+
+	desc = (struct sg_desc *)dma_alloc_coherent(priv->dev,
+				TX_DESC_NUM * sizeof(struct sg_desc),
+				&phys_addr, GFP_KERNEL);
+	if (!desc)
+		return -ENOMEM;
+
+	priv->tx_ring.desc = desc;
+	priv->tx_ring.phys_addr = phys_addr;
+
+	return 0;
+}
+
+static void hix5hd2_destroy_sg_desc_queue(struct hix5hd2_priv *priv)
+{
+	if (priv->tx_ring.desc) {
+		dma_free_coherent(priv->dev,
+				  TX_DESC_NUM * sizeof(struct sg_desc),
+				  priv->tx_ring.desc, priv->tx_ring.phys_addr);
+		priv->tx_ring.desc = NULL;
+	}
+}
+
+static const struct of_device_id hix5hd2_of_match[];
+
 static int hix5hd2_dev_probe(struct platform_device *pdev)
 {
 	struct device *dev = &pdev->dev;
 	struct device_node *node = dev->of_node;
+	const struct of_device_id *of_id = NULL;
 	struct net_device *ndev;
 	struct hix5hd2_priv *priv;
 	struct resource *res;
@@ -887,6 +1039,13 @@ static int hix5hd2_dev_probe(struct platform_device *pdev)
 	priv->dev = dev;
 	priv->netdev = ndev;
 
+	of_id = of_match_device(hix5hd2_of_match, dev);
+	if (!of_id) {
+		ret = -EINVAL;
+		goto out_free_netdev;
+	}
+	priv->hw_cap = (unsigned long)of_id->data;
+
 	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
 	priv->base = devm_ioremap_resource(dev, res);
 	if (IS_ERR(priv->base)) {
@@ -976,11 +1135,24 @@ static int hix5hd2_dev_probe(struct platform_device *pdev)
 	ndev->ethtool_ops = &hix5hd2_ethtools_ops;
 	SET_NETDEV_DEV(ndev, dev);
 
+	if (HAS_CAP_TSO(priv->hw_cap))
+		ndev->hw_features |= NETIF_F_SG;
+
+	ndev->features |= ndev->hw_features | NETIF_F_HIGHDMA;
+	ndev->vlan_features |= ndev->features;
+
 	ret = hix5hd2_init_hw_desc_queue(priv);
 	if (ret)
 		goto out_phy_node;
 
 	netif_napi_add(ndev, &priv->napi, hix5hd2_poll, NAPI_POLL_WEIGHT);
+
+	if (HAS_CAP_TSO(priv->hw_cap)) {
+		ret = hix5hd2_init_sg_desc_queue(priv);
+		if (ret)
+			goto out_destroy_queue;
+	}
+
 	ret = register_netdev(priv->netdev);
 	if (ret) {
 		netdev_err(ndev, "register_netdev failed!");
@@ -992,6 +1164,8 @@ static int hix5hd2_dev_probe(struct platform_device *pdev)
 	return ret;
 
 out_destroy_queue:
+	if (HAS_CAP_TSO(priv->hw_cap))
+		hix5hd2_destroy_sg_desc_queue(priv);
 	netif_napi_del(&priv->napi);
 	hix5hd2_destroy_hw_desc_queue(priv);
 out_phy_node:
@@ -1016,6 +1190,8 @@ static int hix5hd2_dev_remove(struct platform_device *pdev)
 	mdiobus_unregister(priv->bus);
 	mdiobus_free(priv->bus);
 
+	if (HAS_CAP_TSO(priv->hw_cap))
+		hix5hd2_destroy_sg_desc_queue(priv);
 	hix5hd2_destroy_hw_desc_queue(priv);
 	of_node_put(priv->phy_node);
 	cancel_work_sync(&priv->tx_timeout_task);
-- 
2.8.2

^ permalink raw reply related

* Re: [RFC] udp: some improvements on RX path.
From: Paolo Abeni @ 2016-12-05 13:22 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev
In-Reply-To: <1480905784.18162.509.camel@edumazet-glaptop3.roam.corp.google.com>

Hi Eric,

On Sun, 2016-12-04 at 18:43 -0800, Eric Dumazet wrote:
> We currently access 3 cache lines from an skb in receive queue while
> holding receive queue lock :
> 
> First cache line (contains ->next / prev pointers )
> 2nd cache line (skb->peeked)
> 3rd cache line (skb->truesize)
> 
> I believe we could get rid of skb->peeked completely.
> 
> I will cook a patch, but basically the idea is that the last owner of a
> skb (right before skb->users becomes 0) can have the 'ownership' and
> thus increase stats.

Agreed.

> The 3rd cache line miss is easily avoided by the following patch.

I run some performance tests on top of your patch "net: reorganize
struct sock for better data locality", and I see an additional ~7%
improvement on top of that, in the udp flood scenario. 

In my tests, the topmost perf offenders for the u/s process are now:

   9.98%  udp_sink  [kernel.kallsyms]  [k] udp_rmem_release
   8.76%  udp_sink  [kernel.kallsyms]  [k] inet_recvmsg
   6.71%  udp_sink  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
   5.40%  udp_sink  [kernel.kallsyms]  [k] __skb_try_recv_datagram
   5.19%  udp_sink  [kernel.kallsyms]  [k] copy_user_enhanced_fast_string

udp_rmem_release() spends most of its time doing:

    atomic_sub(size, &sk->sk_rmem_alloc);

I see a cacheline miss while accessing sk_rmem_alloc; most probably due
to sk_rmem_alloc being updated by the writer outside the rx queue lock.

Moving such update inside the lock show remove this cache miss but will
increase the pressure on the rx lock. What do you think ?

inet_recvmsg() is there because with "net: reorganize struct sock for
better data locality" we get a cache miss while accessing skc_rxhash in
sock_rps_record_flow(); touching sk_drops is dirtying that cacheline -
sorry for not noticing this before. Do you have CONFIG_RPS disabled ?

> But I also want to work on the idea I gave few days back, having a
> separate queue and use splice to transfer the 'softirq queue' into
> a calm queue in a different cache line.
> 
> I expect a 50 % performance increase under load, maybe 1.5 Mpps.

It should work nicely under contention, but won't that increase the
overhead for the uncontended/single flow scenario ? the user space
reader needs to acquire 2 lock when splicing the 'softirq queue'. On my
system ksoftirqd and the u/s process work at similar speeds, so splicing
will happen quite often. 

Paolo

^ permalink raw reply

* [PATCH v2 3/4] net: hix5hd2_gmac: add reset control and clock signals
From: Dongpo Li @ 2016-12-05 13:28 UTC (permalink / raw)
  To: robh+dt, mark.rutland, mturquette, sboyd, linux, zhangfei.gao,
	yisen.zhuang, salil.mehta, davem, arnd, andrew
  Cc: xuejiancheng, benjamin.chenhao, caizhiyong, netdev, devicetree,
	linux-kernel, Dongpo Li
In-Reply-To: <1480944481-118803-1-git-send-email-lidongpo@hisilicon.com>

Add three reset control signals, "mac_core_rst", "mac_ifc_rst" and
"phy_rst".
The following diagram explained how the reset signals work.

                        SoC
|-----------------------------------------------------
|                               ------                |
|                               | cpu |               |
|                               ------                |
|                                  |                  |
|                              ------------ AMBA bus  |
|                         GMAC     |                  |
|                            ----------------------   |
| ------------- mac_core_rst | --------------      |  |
| |clock and   |-------------->|   mac core  |     |  |
| |reset       |             | --------------      |  |
| |generator   |----         |       |             |  |
| -------------     |        | ----------------    |  |
|          |        ---------->| mac interface |   |  |
|          |     mac_ifc_rst | ----------------    |  |
|          |                 |       |             |  |
|          |                 | ------------------  |  |
|          |phy_rst          | | RGMII interface | |  |
|          |                 | ------------------  |  |
|          |                 ----------------------   |
|----------|------------------------------------------|
           |                          |
           |                      ----------
           |--------------------- |PHY chip |
                                  ----------

The "mac_core_rst" represents "mac core reset signal", it resets
the mac core including packet processing unit, descriptor processing unit,
tx engine, rx engine, control unit.
The "mac_ifc_rst" represents "mac interface reset signal", it resets
the mac interface. The mac interface unit connects mac core and
data interface like MII/RMII/RGMII. After we set a new value of
interface mode, we must reset mac interface to reload the new mode value.
The "mac_core_rst" and "mac_ifc_rst" are both optional to be
backward compatible with the hix5hd2 SoC.
The "phy_rst" represents "phy reset signal", it does a hardware reset
on the PHY chip. This reset signal is optional if the PHY can work well
without the hardware reset.

Add one more clock signal, the existing is MAC core clock,
and the new one is MAC interface clock.
The MAC interface clock is optional to be backward compatible with
the hix5hd2 SoC.

Signed-off-by: Dongpo Li <lidongpo@hisilicon.com>
---
 .../bindings/net/hisilicon-hix5hd2-gmac.txt        |  20 ++-
 drivers/net/ethernet/hisilicon/hix5hd2_gmac.c      | 139 +++++++++++++++++++--
 2 files changed, 144 insertions(+), 15 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/hisilicon-hix5hd2-gmac.txt b/Documentation/devicetree/bindings/net/hisilicon-hix5hd2-gmac.txt
index 75920f0..063c02d 100644
--- a/Documentation/devicetree/bindings/net/hisilicon-hix5hd2-gmac.txt
+++ b/Documentation/devicetree/bindings/net/hisilicon-hix5hd2-gmac.txt
@@ -17,6 +17,16 @@ Required properties:
 - phy-handle: see ethernet.txt [1].
 - mac-address: see ethernet.txt [1].
 - clocks: clock phandle and specifier pair.
+- clock-names: contain the clock name "mac_core"(required) and "mac_ifc"(optional).
+- resets: should contain the phandle to the MAC core reset signal(optional),
+	the MAC interface reset signal(optional)
+	and the PHY reset signal(optional).
+- reset-names: contain the reset signal name "mac_core"(optional),
+	"mac_ifc"(optional) and "phy"(optional).
+- hisilicon,phy-reset-delays-us: triplet of delays if PHY reset signal given.
+	The 1st cell is reset pre-delay in micro seconds.
+	The 2nd cell is reset pulse in micro seconds.
+	The 3rd cell is reset post-delay in micro seconds.
 
 - PHY subnode: inherits from phy binding [2]
 
@@ -25,15 +35,19 @@ Required properties:
 
 Example:
 	gmac0: ethernet@f9840000 {
-		compatible = "hisilicon,hix5hd2-gemac", "hisilicon,hisi-gemac-v1";
+		compatible = "hisilicon,hi3798cv200-gemac", "hisilicon,hisi-gemac-v2";
 		reg = <0xf9840000 0x1000>,<0xf984300c 0x4>;
 		interrupts = <0 71 4>;
 		#address-cells = <1>;
 		#size-cells = <0>;
-		phy-mode = "mii";
+		phy-mode = "rgmii";
 		phy-handle = <&phy2>;
 		mac-address = [00 00 00 00 00 00];
-		clocks = <&clock HIX5HD2_MAC0_CLK>;
+		clocks = <&crg HISTB_ETH0_MAC_CLK>, <&crg HISTB_ETH0_MACIF_CLK>;
+		clock-names = "mac_core", "mac_ifc";
+		resets = <&crg 0xcc 8>, <&crg 0xcc 10>, <&crg 0xcc 12>;
+		reset-names = "mac_core", "mac_ifc", "phy";
+		hisilicon,phy-reset-delays-us = <10000 10000 30000>;
 
 		phy2: ethernet-phy@2 {
 			reg = <2>;
diff --git a/drivers/net/ethernet/hisilicon/hix5hd2_gmac.c b/drivers/net/ethernet/hisilicon/hix5hd2_gmac.c
index 18af55b..ee7e9ce 100644
--- a/drivers/net/ethernet/hisilicon/hix5hd2_gmac.c
+++ b/drivers/net/ethernet/hisilicon/hix5hd2_gmac.c
@@ -14,6 +14,7 @@
 #include <linux/of_device.h>
 #include <linux/of_net.h>
 #include <linux/of_mdio.h>
+#include <linux/reset.h>
 #include <linux/clk.h>
 #include <linux/circ_buf.h>
 
@@ -197,6 +198,15 @@
 #define GEMAC_V2			(GEMAC_V1 | HW_CAP_TSO)
 #define HAS_CAP_TSO(hw_cap)		((hw_cap) & HW_CAP_TSO)
 
+#define PHY_RESET_DELAYS_PROPERTY	"hisilicon,phy-reset-delays-us"
+
+enum phy_reset_delays {
+	PRE_DELAY,
+	PULSE,
+	POST_DELAY,
+	DELAYS_NUM,
+};
+
 struct hix5hd2_desc {
 	__le32 buff_addr;
 	__le32 cmd;
@@ -255,12 +265,26 @@ struct hix5hd2_priv {
 	unsigned int speed;
 	unsigned int duplex;
 
-	struct clk *clk;
+	struct clk *mac_core_clk;
+	struct clk *mac_ifc_clk;
+	struct reset_control *mac_core_rst;
+	struct reset_control *mac_ifc_rst;
+	struct reset_control *phy_rst;
+	u32 phy_reset_delays[DELAYS_NUM];
 	struct mii_bus *bus;
 	struct napi_struct napi;
 	struct work_struct tx_timeout_task;
 };
 
+static inline void hix5hd2_mac_interface_reset(struct hix5hd2_priv *priv)
+{
+	if (!priv->mac_ifc_rst)
+		return;
+
+	reset_control_assert(priv->mac_ifc_rst);
+	reset_control_deassert(priv->mac_ifc_rst);
+}
+
 static void hix5hd2_config_port(struct net_device *dev, u32 speed, u32 duplex)
 {
 	struct hix5hd2_priv *priv = netdev_priv(dev);
@@ -293,6 +317,7 @@ static void hix5hd2_config_port(struct net_device *dev, u32 speed, u32 duplex)
 	if (duplex)
 		val |= GMAC_FULL_DUPLEX;
 	writel_relaxed(val, priv->ctrl_base);
+	hix5hd2_mac_interface_reset(priv);
 
 	writel_relaxed(BIT_MODE_CHANGE_EN, priv->base + MODE_CHANGE_EN);
 	if (speed == SPEED_1000)
@@ -807,16 +832,26 @@ static int hix5hd2_net_open(struct net_device *dev)
 	struct phy_device *phy;
 	int ret;
 
-	ret = clk_prepare_enable(priv->clk);
+	ret = clk_prepare_enable(priv->mac_core_clk);
+	if (ret < 0) {
+		netdev_err(dev, "failed to enable mac core clk %d\n", ret);
+		return ret;
+	}
+
+	ret = clk_prepare_enable(priv->mac_ifc_clk);
 	if (ret < 0) {
-		netdev_err(dev, "failed to enable clk %d\n", ret);
+		clk_disable_unprepare(priv->mac_core_clk);
+		netdev_err(dev, "failed to enable mac ifc clk %d\n", ret);
 		return ret;
 	}
 
 	phy = of_phy_connect(dev, priv->phy_node,
 			     &hix5hd2_adjust_link, 0, priv->phy_mode);
-	if (!phy)
+	if (!phy) {
+		clk_disable_unprepare(priv->mac_ifc_clk);
+		clk_disable_unprepare(priv->mac_core_clk);
 		return -ENODEV;
+	}
 
 	phy_start(phy);
 	hix5hd2_hw_init(priv);
@@ -847,7 +882,8 @@ static int hix5hd2_net_close(struct net_device *dev)
 		phy_disconnect(dev->phydev);
 	}
 
-	clk_disable_unprepare(priv->clk);
+	clk_disable_unprepare(priv->mac_ifc_clk);
+	clk_disable_unprepare(priv->mac_core_clk);
 
 	return 0;
 }
@@ -1015,6 +1051,48 @@ static void hix5hd2_destroy_sg_desc_queue(struct hix5hd2_priv *priv)
 	}
 }
 
+static inline void hix5hd2_mac_core_reset(struct hix5hd2_priv *priv)
+{
+	if (!priv->mac_core_rst)
+		return;
+
+	reset_control_assert(priv->mac_core_rst);
+	reset_control_deassert(priv->mac_core_rst);
+}
+
+static void hix5hd2_sleep_us(u32 time_us)
+{
+	u32 time_ms;
+
+	if (!time_us)
+		return;
+
+	time_ms = DIV_ROUND_UP(time_us, 1000);
+	if (time_ms < 20)
+		usleep_range(time_us, time_us + 500);
+	else
+		msleep(time_ms);
+}
+
+static void hix5hd2_phy_reset(struct hix5hd2_priv *priv)
+{
+	/* To make sure PHY hardware reset success,
+	 * we must keep PHY in deassert state first and
+	 * then complete the hardware reset operation
+	 */
+	reset_control_deassert(priv->phy_rst);
+	hix5hd2_sleep_us(priv->phy_reset_delays[PRE_DELAY]);
+
+	reset_control_assert(priv->phy_rst);
+	/* delay some time to ensure reset ok,
+	 * this depends on PHY hardware feature
+	 */
+	hix5hd2_sleep_us(priv->phy_reset_delays[PULSE]);
+	reset_control_deassert(priv->phy_rst);
+	/* delay some time to ensure later MDIO access */
+	hix5hd2_sleep_us(priv->phy_reset_delays[POST_DELAY]);
+}
+
 static const struct of_device_id hix5hd2_of_match[];
 
 static int hix5hd2_dev_probe(struct platform_device *pdev)
@@ -1060,23 +1138,55 @@ static int hix5hd2_dev_probe(struct platform_device *pdev)
 		goto out_free_netdev;
 	}
 
-	priv->clk = devm_clk_get(&pdev->dev, NULL);
-	if (IS_ERR(priv->clk)) {
-		netdev_err(ndev, "failed to get clk\n");
+	priv->mac_core_clk = devm_clk_get(&pdev->dev, "mac_core");
+	if (IS_ERR(priv->mac_core_clk)) {
+		netdev_err(ndev, "failed to get mac core clk\n");
 		ret = -ENODEV;
 		goto out_free_netdev;
 	}
 
-	ret = clk_prepare_enable(priv->clk);
+	ret = clk_prepare_enable(priv->mac_core_clk);
 	if (ret < 0) {
-		netdev_err(ndev, "failed to enable clk %d\n", ret);
+		netdev_err(ndev, "failed to enable mac core clk %d\n", ret);
 		goto out_free_netdev;
 	}
 
+	priv->mac_ifc_clk = devm_clk_get(&pdev->dev, "mac_ifc");
+	if (IS_ERR(priv->mac_ifc_clk))
+		priv->mac_ifc_clk = NULL;
+
+	ret = clk_prepare_enable(priv->mac_ifc_clk);
+	if (ret < 0) {
+		netdev_err(ndev, "failed to enable mac ifc clk %d\n", ret);
+		goto out_disable_mac_core_clk;
+	}
+
+	priv->mac_core_rst = devm_reset_control_get(dev, "mac_core");
+	if (IS_ERR(priv->mac_core_rst))
+		priv->mac_core_rst = NULL;
+	hix5hd2_mac_core_reset(priv);
+
+	priv->mac_ifc_rst = devm_reset_control_get(dev, "mac_ifc");
+	if (IS_ERR(priv->mac_ifc_rst))
+		priv->mac_ifc_rst = NULL;
+
+	priv->phy_rst = devm_reset_control_get(dev, "phy");
+	if (IS_ERR(priv->phy_rst)) {
+		priv->phy_rst = NULL;
+	} else {
+		ret = of_property_read_u32_array(node,
+						 PHY_RESET_DELAYS_PROPERTY,
+						 priv->phy_reset_delays,
+						 DELAYS_NUM);
+		if (ret)
+			goto out_disable_clk;
+		hix5hd2_phy_reset(priv);
+	}
+
 	bus = mdiobus_alloc();
 	if (bus == NULL) {
 		ret = -ENOMEM;
-		goto out_free_netdev;
+		goto out_disable_clk;
 	}
 
 	bus->priv = priv;
@@ -1159,7 +1269,8 @@ static int hix5hd2_dev_probe(struct platform_device *pdev)
 		goto out_destroy_queue;
 	}
 
-	clk_disable_unprepare(priv->clk);
+	clk_disable_unprepare(priv->mac_ifc_clk);
+	clk_disable_unprepare(priv->mac_core_clk);
 
 	return ret;
 
@@ -1174,6 +1285,10 @@ static int hix5hd2_dev_probe(struct platform_device *pdev)
 	mdiobus_unregister(bus);
 err_free_mdio:
 	mdiobus_free(bus);
+out_disable_clk:
+	clk_disable_unprepare(priv->mac_ifc_clk);
+out_disable_mac_core_clk:
+	clk_disable_unprepare(priv->mac_core_clk);
 out_free_netdev:
 	free_netdev(ndev);
 
-- 
2.8.2

^ permalink raw reply related

* [PATCH v2 0/4] net: hix5hd2_gmac: add tx sg feature and reset/clock control signals
From: Dongpo Li @ 2016-12-05 13:27 UTC (permalink / raw)
  To: robh+dt, mark.rutland, mturquette, sboyd, linux, zhangfei.gao,
	yisen.zhuang, salil.mehta, davem, arnd, andrew
  Cc: xuejiancheng, benjamin.chenhao, caizhiyong, netdev, devicetree,
	linux-kernel, Dongpo Li

The "hix5hd2" is SoC name, add the generic ethernet driver compatible string.
The "hisi-gemac-v1" is the basic version and "hisi-gemac-v2" adds
the SG/TXCSUM/TSO/UFO features.
This patch set only adds the SG(scatter-gather) driver for transmitting,
the drivers of other features will be submitted later.

Add the MAC reset control signals and clock signals.
We make these signals optional to be backward compatible with
the hix5hd2 SoC.

Changes in v2:
- Make the compatible string changes be a separate patch and
the most specific string come first than the generic string
as advised by Rob.
- Make the MAC reset control signals and clock signals optional
to be backward compatible with the hix5hd2 SoC.
- Change the compatible string and give the clock a specific name
in hix5hd2 dts file.

Dongpo Li (4):
  net: hix5hd2_gmac: add generic compatible string
  net: hix5hd2_gmac: add tx scatter-gather feature
  net: hix5hd2_gmac: add reset control and clock signals
  ARM: dts: hix5hd2: add gmac generic compatible and clock names

 .../bindings/net/hisilicon-hix5hd2-gmac.txt        |  27 +-
 arch/arm/boot/dts/hisi-x5hd2.dtsi                  |   6 +-
 drivers/net/ethernet/hisilicon/hix5hd2_gmac.c      | 352 +++++++++++++++++++--
 3 files changed, 352 insertions(+), 33 deletions(-)

-- 
2.8.2

^ permalink raw reply

* Re: [patch net v2] net: fec: fix compile with CONFIG_M5272
From: kbuild test robot @ 2016-12-05 13:18 UTC (permalink / raw)
  To: Nikita Yushchenko
  Cc: kbuild-all, David S. Miller, Fugang Duan, Troy Kisky, Andrew Lunn,
	Eric Nelson, Philippe Reynes, Johannes Berg, netdev, Chris Healy,
	Fabio Estevam, linux-kernel, Nikita Yushchenko
In-Reply-To: <1480925763-20254-1-git-send-email-nikita.yoush@cogentembedded.com>

[-- Attachment #1: Type: text/plain, Size: 1513 bytes --]

Hi Nikita,

[auto build test ERROR on net/master]

url:    https://github.com/0day-ci/linux/commits/Nikita-Yushchenko/net-fec-fix-compile-with-CONFIG_M5272/20161205-181735
config: arm-multi_v5_defconfig (attached as .config)
compiler: arm-linux-gnueabi-gcc (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=arm 

All errors (new ones prefixed by >>):

   drivers/net/ethernet/freescale/fec_main.c: In function 'fec_probe':
>> drivers/net/ethernet/freescale/fec_main.c:3304:7: error: 'FEC_STAT_SIZE' undeclared (first use in this function)
          FEC_STAT_SIZE, num_tx_qs, num_rx_qs);
          ^~~~~~~~~~~~~
   drivers/net/ethernet/freescale/fec_main.c:3304:7: note: each undeclared identifier is reported only once for each function it appears in

vim +/FEC_STAT_SIZE +3304 drivers/net/ethernet/freescale/fec_main.c

  3298		int num_rx_qs;
  3299	
  3300		fec_enet_get_queue_num(pdev, &num_tx_qs, &num_rx_qs);
  3301	
  3302		/* Init network device */
  3303		ndev = alloc_etherdev_mqs(sizeof(struct fec_enet_private) +
> 3304					  FEC_STAT_SIZE, num_tx_qs, num_rx_qs);
  3305		if (!ndev)
  3306			return -ENOMEM;
  3307	

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 27879 bytes --]

^ permalink raw reply

* Re: [PATCH 1/1] NET: usb: cdc_ncm: adding MBIM RESET_FUNCTION request and modifying ncm bind common code
From: Daniele Palmas @ 2016-12-05 13:04 UTC (permalink / raw)
  To: Bjørn Mork; +Cc: Oliver Neukum, linux-usb, netdev
In-Reply-To: <87a8ca1yat.fsf@miraculix.mork.no>

Hi,

2016-12-05 11:10 GMT+01:00 Bjørn Mork <bjorn@mork.no>:
> Daniele Palmas <dnlplm@gmail.com> writes:
>
>> I went back to this and further checking revealed that MBIM function
>> reset is not needed and the only problem is related to commit
>> 48906f62c96cc2cd35753e59310cb70eb08cc6a5 cdc_ncm: toggle altsetting to
>> force reset before setup, that, however, is mandatory for some Huawei
>> modems.
>
> Interesting.  That does sound like an odd firmware bug to me. I have a
> bit of a hard time understanding how this can be.  Using data interface
> altsetting 0 to reset the function is documented in the NCM spec:
>

Agree, this is likely a firmware bug and it is also proved by all the
other modems that accept the procedure without issues.

> "
>   7.2 Using Alternate Settings to Reset an NCM Function
>
>   To place the network aspects of a function in a known state, the host
>   shall:
>   • select alternate setting 0 of the NCM Data Interface (this is the
>     setting with no endpoints). This can be done explicitly using
>     SetInterface, or implicitly using SetConfiguration. See [USB30] for
>     details.
>   • select the NCM operational parameters by sending commands to the NCM
>     Communication Interface, then
>   • select the second alternate interface setting of the NCM Data
>     Interface (this is the setting with a bulk IN endpoint and a bulk
>     OUT endpoint).
>
>   Whenever alternate setting 0 is selected by the host, the function
>   shall:
>   • flush function buffers
>   • reset the packet filter to its default state
>   • clear all multicast address filters
>   • clear all power filters set using
>     SetEthernetPowerManagementPatternFilter
>   • reset statistics counters to zero
>   • restore its Ethernet address to its default state
>   • reset its IN NTB size to the value given by field dwNtbInMaxSize
>     from the NTB Parameter Struc- ture
>   • reset the NTB format to NTB-16
>   • reset the current Maximum Datagram Size to a function-specific
>     default. If SetMaxDatagramSize is not supported, then the maximum
>     datagram size shall be the same as the value in wMaxSegmentSize of
>     the Ethernet Networking Functional Descriptor. If SetMaxDatagramSize
>     is supported by the function, then the host must either query the
>     function to determine the current effective maximum datagram size,
>     or must explicitly set the maximum datagram size. If the host wishes
>     to set the Maximum Datagram Size, it may do so prior to selecting
>     the second alternate interface set- ting of the data
>     interface. Doing so will ensure that the change takes effect prior
>     to send or receiving data.
>   • reset CRC mode so that the function will not append CRCs to
>     datagrams sent on the IN pipe
>   • reset NTB sequence numbers to zero
>
>   When the host selects the second alternate interface setting of the
>   NCM Data Interface, the function shall perform the following actions
>   in the following order.
>   • If connected to the network, the function shall send a
>     ConnectionSpeedChange notification to the host indicating the
>     current connection speed.
>   • Whether connected or not, the function shall then send a
>     NetworkConnection notification to the host, with wValue indicating
>     the current state of network connectivity
> "
>
> The addition of the "RESET" request in MBIM did not change these
> requirements AFAICS.  Supporting NCM style "altsetting 0 reset" is
> required by default inheritance.  The description of dual NCM/MBIM
> functions in the MBIM spec further emphasizes this:
>
> "
>   Regardless of the previous sequence of SET_INTERFACE commands targeting
>   the Communication Interface, the host is able to put the function into
>   a defined state by the following sequence:
>
>   1. SET_INTERFACE(Data Interface, 0)
>   2. SET_INTERFACE(Communication Interface, desired alternate setting)
>   3. Sending the required class commands for the targeted alternate
>     setting
>   4. SET_INTERFACE(Data Interface, 1 if Communication Interface,0 and
>     Data Interface,2 Communication Interface 1)
> "
>
>
> This, along with the lack of *any* sort of discussion of the
> relation/interfaction between "MBIM RESET" and "data interface
> altsetting 0" makes the MBIM RESET control request either ambigious or
> redundant.  Or both...
>
> FWIW, I've always considered RESET a symptom of the sloppy MBIM spec
> development.  It did not exactly work to their advantage that the first
> (and only?) published errata simply was an excuse to add new features
> (the "MBIM extended functional descriptor" and "MBIM loopback testmode")
> without updating the revision number.  Causing even more confusion,
> since we now have two different MBIMv1.0 specs.
>
> Well, whatever.  No need to get all frustrated about that. We have to
> deal with whatever the firmware developers serve us.  And,
> unfortunately, it is not surprising that unclear and ambigious specs
> leads to incompatible firmware implementations.
>
> I wonder, what happens if you unload the MBIM driver and let the USB
> core reset the data interface to altsetting 0?  Will the device then
> fail when the driver is loaded again?  If not, then where is the
> difference?  And can we possibly reorder the driver set_interface
> requests to make them similar to the USB core reset
>

I will check this.

Thanks,
Daniele

> Until now, I was under the impression that this was the
> only documented way to reset both NCM and MBIM functions (since the
> RESET is MBIM specific),
>
>> My proposal is to add something like
>> CDC_MBIM_FLAG_AVOID_ALTSETTING_TOGGLE quirk to avoid the toggle.
>>
>> To have a conservative approach, I would add only Telit LE922 to this
>> quirk and keep also the usleep_range delay, since I do not have a
>> clear view on all the VIDs/PIDs affected by this quirk. Then other
>> modems, once tested, can be added to the quirk if the delay is not
>> enough.
>>
>> If this approach, though ugly, has chances to be accepted I can write
>> a new patch.
>
> Yes, that sounds like the best approach at the moment.  I have
> unfortunately not as much time to experiment with this as I seem to
> need.  And the EM7455 workaround (the delay) is a bit hard to verify,
> since it is easily affected by small additional delays caused by the
> debugging itself.  This can cause bogus test results.
>
>
> Bjørn

^ permalink raw reply

* Re: [PATCH net-next] net/sched: cls_flower: Set the filter Hardware device for all use-cases
From: Simon Horman @ 2016-12-05 13:00 UTC (permalink / raw)
  To: Hadar Hen Zion; +Cc: David S. Miller, netdev, Jiri Pirko, Or Gerlitz
In-Reply-To: <1480857919-25194-1-git-send-email-hadarh@mellanox.com>

Hi Hadar,

On Sun, Dec 04, 2016 at 03:25:19PM +0200, Hadar Hen Zion wrote:
> Check if the returned device from tcf_exts_get_dev function supports tc
> offload and in case the rule can't be offloaded, set the filter hw_dev
> parameter to the original device given by the user.
> 
> The filter hw_device parameter should always be set by fl_hw_replace_filter
> function, since this pointer is used by dump stats and destroy
> filter for each flower rule (offloaded or not).
> 
> Fixes: 7091d8c7055d ('net/sched: cls_flower: Add offload support using egress Hardware device')
> Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
> Reported-by: Simon Horman <horms@verge.net.au>

Thanks for fixing this, it looks good to me now.

I apologise for not reporting this from my netronome.com address,
which was my intention.

Tested-by: Simon Horman <simon.horman@netronome.com>

> ---
>  net/sched/cls_flower.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
> index c5cea78..29a9e6d 100644
> --- a/net/sched/cls_flower.c
> +++ b/net/sched/cls_flower.c
> @@ -236,8 +236,11 @@ static int fl_hw_replace_filter(struct tcf_proto *tp,
>  	int err;
>  
>  	if (!tc_can_offload(dev, tp)) {
> -		if (tcf_exts_get_dev(dev, &f->exts, &f->hw_dev))
> +		if (tcf_exts_get_dev(dev, &f->exts, &f->hw_dev) ||
> +		    (f->hw_dev && !tc_can_offload(f->hw_dev, tp))) {
> +			f->hw_dev = dev;
>  			return tc_skip_sw(f->flags) ? -EINVAL : 0;
> +		}
>  		dev = f->hw_dev;
>  		tc->egress_dev = true;
>  	} else {
> -- 
> 1.8.3.1
> 

^ permalink raw reply

* RE: [PATCH 3/3] netns: fix net_generic() "id - 1" bloat
From: David Laight @ 2016-12-05 12:49 UTC (permalink / raw)
  To: 'Alexey Dobriyan', davem@davemloft.net
  Cc: netdev@vger.kernel.org, xemul@openvz.org
In-Reply-To: <20161202012132.GD31170@avx2>

From: Alexey Dobriyan
> Sent: 02 December 2016 01:22
> net_generic() function is both a) inline and b) used ~600 times.
> 
> It has the following code inside
> 
> 		...
> 	ptr = ng->ptr[id - 1];
> 		...
> 
> "id" is never compile time constant so compiler is forced to subtract 1.
> And those decrements or LEA [r32 - 1] instructions add up.
> 
> We also start id'ing from 1 to catch bugs where pernet sybsystem id
> is not initialized and 0. This is quite pointless idea (nothing will
> work or immediate interference with first registered subsystem) in
> general but it hints what needs to be done for code size reduction.
> 
> Namely, overlaying allocation of pointer array and fixed part of
> structure in the beginning and using usual base-0 addressing.
> 
> Ids are just cookies, their exact values do not matter, so lets start
> with 3 on x86_64.
...
>  struct net_generic {
> -	struct {
> -		unsigned int len;
> -		struct rcu_head rcu;
> -	} s;
> -
> -	void *ptr[0];
> +	union {
> +		struct {
> +			unsigned int len;
> +			struct rcu_head rcu;
> +		} s;
> +
> +		void *ptr[0];
> +	};
>  };

That union is an accident waiting to happen.
What might work is to offset the Ids by
(offsetof(struct net_generic, ptr)/sizeof (void *)) instead of by 1.
The subtract from the offset will then counter the structure offset
- which is what you are trying to achieve.

	David.

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox