Netdev List
* Re: [RFC PATCH net v2 2/3] dt: bindings: add ethernet phy eee-disable-advert option documentation
From: Jerome Brunet @ 2016-11-21 16:16 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: netdev, devicetree, Florian Fainelli, Alexandre TORGUE,
	Neil Armstrong, Martin Blumenstingl, Kevin Hilman, linux-kernel,
	Andre Roth, linux-amlogic, Carlo Caione, Giuseppe Cavallaro,
	linux-arm-kernel
In-Reply-To: <20161121160149.GF1922-g2DYL2Zd6BY@public.gmane.org>

On Mon, 2016-11-21 at 17:01 +0100, Andrew Lunn wrote:
> On Mon, Nov 21, 2016 at 04:35:23PM +0100, Jerome Brunet wrote:
> > 
> > Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
> > ---
> >  Documentation/devicetree/bindings/net/phy.txt | 5 +++++
> >  1 file changed, 5 insertions(+)
> > 
> > diff --git a/Documentation/devicetree/bindings/net/phy.txt b/Documentation/devicetree/bindings/net/phy.txt
> > index bc1c3c8bf8fa..7f066b7c1e2c 100644
> > --- a/Documentation/devicetree/bindings/net/phy.txt
> > +++ b/Documentation/devicetree/bindings/net/phy.txt
> > @@ -35,6 +35,11 @@ Optional Properties:
> >  - broken-turn-around: If set, indicates the PHY device does not correctly
> >    release the turn around line low at the end of a MDIO transaction.
> >  
> > +- eee-advert-disable: Bits to clear in the MDIO_AN_EEE_ADV register to
> > +  disable EEE modes. Example
> > +    * 0x4: disable EEE for 1000T,
> > +    * 0x6: disable EEE for 100TX and 1000T
> > +
> 
> Hi Jerome
> 
> I like the direction this patchset is taking. But hex values are
> pretty unfriendly. 

Agreed

> Please add a set of boolean properties, and do the
> mapping to hex in the C code.
> 
> That would also make extending this API easier. e.g. say you have a
> 10Gbps PHY with EEE, and you need to disable it. This hex value
> quickly gets ugly, eee-advert-disable-10000 is nice and simple.

What I did not realize when doing this patch for the realtek driver is
that there are already 6 valid modes defined in the kernel:

#define MDIO_EEE_100TX		MDIO_AN_EEE_ADV_100TX	/* 100TX EEE cap */
#define MDIO_EEE_1000T		MDIO_AN_EEE_ADV_1000T	/* 1000T EEE cap */
#define MDIO_EEE_10GT		0x0008	/* 10GT EEE cap */
#define MDIO_EEE_1000KX		0x0010	/* 1000KX EEE cap */
#define MDIO_EEE_10GKX4		0x0020	/* 10G KX4 EEE cap */
#define MDIO_EEE_10GKR		0x0040	/* 10G KR EEE cap */

I took care of only 2 in the case of realtek.c since it only supports
MDIO_EEE_100TX and MDIO_EEE_1000T.

Defining a property for each is certainly doable but it does not look
very nice either. If the list grows in the future, it will get even
messier, especially if you want to disable everything.

What do you think about keeping a single mask value but using the defines
above in the DT? It would be more readable than hex and easy to
extend, don't you think?

These defines are already part of the uapi, so I guess we can use them
in the DT bindings?
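
To make the two options concrete, here is a minimal userspace sketch,
not driver code. It assumes 100TX is bit 0x2 (which follows from the
0x4/0x6 examples in the binding text); the per-mode property names and
both helper functions are hypothetical and exist only for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Local copies of the mode bits listed above. */
#define EEE_100TX	0x0002
#define EEE_1000T	0x0004
#define EEE_10GT	0x0008
#define EEE_1000KX	0x0010
#define EEE_10GKX4	0x0020
#define EEE_10GKR	0x0040

/* Hypothetical boolean property names, one per mode. */
struct eee_prop {
	const char *name;
	unsigned int bit;
};

const struct eee_prop eee_props[] = {
	{ "eee-advert-disable-100tx",  EEE_100TX },
	{ "eee-advert-disable-1000t",  EEE_1000T },
	{ "eee-advert-disable-10gt",   EEE_10GT },
	{ "eee-advert-disable-1000kx", EEE_1000KX },
	{ "eee-advert-disable-10gkx4", EEE_10GKX4 },
	{ "eee-advert-disable-10gkr",  EEE_10GKR },
};

/* Option 1: build the mask from whichever boolean properties are present. */
unsigned int eee_mask_from_bools(const char *const *present, size_t n)
{
	unsigned int mask = 0;
	size_t i, j;

	for (i = 0; i < n; i++)
		for (j = 0; j < sizeof(eee_props) / sizeof(eee_props[0]); j++)
			if (strcmp(present[i], eee_props[j].name) == 0)
				mask |= eee_props[j].bit;
	return mask;
}

/* Option 2: the DT carries the mask itself; just clear those bits. */
unsigned int eee_apply_disable(unsigned int advert, unsigned int disable_mask)
{
	return advert & ~disable_mask;
}
```

Either way the driver ends up with the same mask; the difference is only
whether the DT author writes six property names or ORs the defines.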

> 
> 	Andrew

* RE: [PATCH for-next 03/11] IB/hns: Optimize the logic of allocating memory using APIs
From: Salil Mehta @ 2016-11-21 16:12 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: dledford@redhat.com, Huwei (Xavier), oulijun,
	mehta.salil.lnk@gmail.com, linux-rdma@vger.kernel.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Linuxarm,
	Zhangping (ZP)
In-Reply-To: <20161116083602.GH4240@leon.nu>

> -----Original Message-----
> From: Leon Romanovsky [mailto:leon@kernel.org]
> Sent: Wednesday, November 16, 2016 8:36 AM
> To: Salil Mehta
> Cc: dledford@redhat.com; Huwei (Xavier); oulijun;
> mehta.salil.lnk@gmail.com; linux-rdma@vger.kernel.org;
> netdev@vger.kernel.org; linux-kernel@vger.kernel.org; Linuxarm;
> Zhangping (ZP)
> Subject: Re: [PATCH for-next 03/11] IB/hns: Optimize the logic of
> allocating memory using APIs
> 
> On Tue, Nov 15, 2016 at 03:52:46PM +0000, Salil Mehta wrote:
> > > -----Original Message-----
> > > From: Leon Romanovsky [mailto:leon@kernel.org]
> > > Sent: Wednesday, November 09, 2016 7:22 AM
> > > To: Salil Mehta
> > > Cc: dledford@redhat.com; Huwei (Xavier); oulijun;
> > > mehta.salil.lnk@gmail.com; linux-rdma@vger.kernel.org;
> > > netdev@vger.kernel.org; linux-kernel@vger.kernel.org; Linuxarm;
> > > Zhangping (ZP)
> > > Subject: Re: [PATCH for-next 03/11] IB/hns: Optimize the logic of
> > > allocating memory using APIs
> > >
> > > On Fri, Nov 04, 2016 at 04:36:25PM +0000, Salil Mehta wrote:
> > > > From: "Wei Hu (Xavier)" <xavier.huwei@huawei.com>
> > > >
> > > > This patch modified the logic of allocating memory using APIs in
> > > > hns RoCE driver. We used kcalloc instead of kmalloc_array and
> > > > bitmap_zero. And When kcalloc failed, call vzalloc to alloc
> > > > memory.
> > > >
> > > > Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
> > > > Signed-off-by: Ping Zhang <zhangping5@huawei.com>
> > > > Signed-off-by: Salil Mehta  <salil.mehta@huawei.com>
> > > > ---
> > > >  drivers/infiniband/hw/hns/hns_roce_mr.c |   15 ++++++++-------
> > > >  1 file changed, 8 insertions(+), 7 deletions(-)
> > > >
> > > > diff --git a/drivers/infiniband/hw/hns/hns_roce_mr.c
> > > b/drivers/infiniband/hw/hns/hns_roce_mr.c
> > > > index fb87883..d3dfb5f 100644
> > > > --- a/drivers/infiniband/hw/hns/hns_roce_mr.c
> > > > +++ b/drivers/infiniband/hw/hns/hns_roce_mr.c
> > > > @@ -137,11 +137,12 @@ static int hns_roce_buddy_init(struct
> > > hns_roce_buddy *buddy, int max_order)
> > > >
> > > >  	for (i = 0; i <= buddy->max_order; ++i) {
> > > >  		s = BITS_TO_LONGS(1 << (buddy->max_order - i));
> > > > -		buddy->bits[i] = kmalloc_array(s, sizeof(long), GFP_KERNEL);
> > > > -		if (!buddy->bits[i])
> > > > -			goto err_out_free;
> > > > -
> > > > -		bitmap_zero(buddy->bits[i], 1 << (buddy->max_order - i));
> > > > +		buddy->bits[i] = kcalloc(s, sizeof(long), GFP_KERNEL);
> > > > +		if (!buddy->bits[i]) {
> > > > +			buddy->bits[i] = vzalloc(s * sizeof(long));
> > >
> > > I wonder, why don't you use directly vzalloc instead of kcalloc
> > > fallback?
> > As we know, we will have physically contiguous pages if the kcalloc
> > call succeeds. This gives us a chance of better performance than
> > allocations which are only virtually contiguous, as returned by
> > vzalloc(). Therefore, the latter is only used as a fallback when our
> > memory request cannot be satisfied by kcalloc.
> >
> > Are you suggesting that there will not be much performance penalty
> > if we use just vzalloc ?
> 
> Not exactly,
> I asked it, because we have similar code in our drivers and this
> construction looks strange to me.
> 
> 1. If performance is critical, we will use kmalloc.
> 2. If performance is not critical, we will use vmalloc.
> 
> But in this case, such construction shows me that we can live with
> vmalloc performance and kmalloc allocation are not really needed.
> 
> In your specific case, I'm not sure that kcalloc will ever fail.
Performance is definitely critical here. Though, I agree this is a
somewhat unusual way of allocating memory. In practice, we were
encountering memory allocation failures with kmalloc (the allocation
size is on the high side and grows exponentially), so we ended up using
vmalloc as a fallback. It is a very naïve allocation scheme.

Maybe we need to rethink this part of the allocation scheme? Alternatively,
I can pull back this particular patch for now, or just live with vzalloc()
until we figure out a proper solution.
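
For a sense of scale, here is a small userspace sketch of the size
computed in the loop of the patch, s = BITS_TO_LONGS(1 << (max_order - i)).
The macros are local re-definitions and the two helper names are
hypothetical; only the formula comes from the patch.

```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG		(sizeof(long) * CHAR_BIT)
#define BITS_TO_LONGS(n)	(((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Bytes requested for the bitmap of order i, as in the patched loop. */
size_t buddy_bitmap_bytes(unsigned int max_order, unsigned int i)
{
	size_t bits = (size_t)1 << (max_order - i);

	return BITS_TO_LONGS(bits) * sizeof(long);
}

/* Sum of the bitmap sizes over all orders 0..max_order. */
size_t buddy_total_bytes(unsigned int max_order)
{
	size_t total = 0;
	unsigned int i;

	for (i = 0; i <= max_order; i++)
		total += buddy_bitmap_bytes(max_order, i);
	return total;
}
```

The order-0 bitmap doubles with every increment of max_order (128 KiB at
max_order = 20, 256 KiB at 21), which is why a physically contiguous
allocation becomes harder to satisfy as max_order grows.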

> 
> Thanks
> 
> 
> >
> > >
> > > > +			if (!buddy->bits[i])
> > > > +				goto err_out_free;
> > > > +		}
> > > >  	}


* Re: Synopsys Ethernet QoS Driver
From: Joao Pinto @ 2016-11-21 16:11 UTC (permalink / raw)
  To: Lars Persson, Joao Pinto
  Cc: Giuseppe CAVALLARO, Rayagond Kokatanur, Rabin Vincent, mued dib,
	David Miller, Jeff Kirsher, jiri@mellanox.com,
	saeedm@mellanox.com, idosch@mellanox.com, netdev,
	linux-kernel@vger.kernel.org, CARLOS.PALMINHA@synopsys.com,
	Andreas Irestål, alexandre.torgue@st.com,
	linux-arm-kernel@lists.infradead.org
In-Reply-To: <A001080B-2DC8-48D9-BD82-8276A9B3BE3D@axis.com>

On 21-11-2016 15:43, Lars Persson wrote:
> 
> 
>> 21 nov. 2016 kl. 16:06 skrev Joao Pinto <Joao.Pinto@synopsys.com>:
>>
>>> On 21-11-2016 14:25, Giuseppe CAVALLARO wrote:
>>>> On 11/21/2016 2:28 PM, Lars Persson wrote:
>>>>
>>>>
>>>>> 21 nov. 2016 kl. 13:53 skrev Giuseppe CAVALLARO <peppe.cavallaro@st.com>:
>>>>>
>>>>> Hello Joao
>>>>>
>>>>>> On 11/21/2016 1:32 PM, Joao Pinto wrote:
>>>>>> Hello,
>>>>>>
>>>>>>>> On 21-11-2016 05:29, Rayagond Kokatanur wrote:
>>>>>>>>> On Sat, Nov 19, 2016 at 7:26 PM, Rabin Vincent <rabin@rab.in> wrote:
>>>>>>>>> On Fri, Nov 18, 2016 at 02:20:27PM +0000, Joao Pinto wrote:
>>>>>>>>> For now we are interested in improving the synopsys QoS driver under
>>>>>>>>> /nect/ethernet/synopsys. For now the driver structure consists of a
>>>>>>>>> single file
>>>>>>>>> called dwc_eth_qos.c, containing synopsys ethernet qos common ops and

snip (...)

>>>>> The stmmac driver has been running for many years on several platforms
>>>>> (sh4, stm32, arm, x86, mips ...) and it supports a huge number of
>>>>> configurations, from the 3.1x to the 3.7x databooks.
>>>>>
>>>>> It also supports QoS hardware; for example, 4.00a, 4.10a and 4.20a
>>>>> are fully working.
>>>>>
>>>>> Also the stmmac has platform, device-tree and pcie supports and
>>>>> a lot of maintained glue-logic files.
>>>>>
>>>>> It is fully documented inside the kernel tree.
>>>>>
>>>>> I am happy to have new enhancements from other developers.
>>>>> So, on my side, if you want to spend your time on improving it on your
>>>>> platforms please feel free to do it!
>>>>>
>>>>> Concerning the stmicro/stmmac naming, it comes from a really old
>>>>> story and I have no issue with adopting new folder/file names.
>>>>>
>>>>> I am also open to merging fixes and changes from ethernet/synopsis.
>>>>> I want to point you to some benchmarks made by Alex some months ago
>>>>> (IIRC) that showed stmmac as the winner (due to the several optimizations
>>>>> analyzed and reviewed in this mailing list).
>>>>>
>>>>> Peppe
>>>>>
>>>>
>>>> Hello Joao and others,
>>>>
>>
>> Hi Lars,
>>
>>>> As the maintainer of dwc_eth_qos.c I prefer also that we put efforts on the
>>>> most mature driver, the stmmac.
>>>>
>>>> I hope that the code can migrate into an ethernet/synopsys folder to keep the
>>>> convention of naming the folder after the vendor. This makes it easy for
>>>> others to find the driver.
>>>>
>>>> The dwc_eth_qos.c will eventually be removed and its DT binding interface can
>>>> then be implemented in the stmmac driver.
>>
>> So your idea is to pick the ethernet/stmmac and rename it to ethernet/synopsys
>> and try to improve the structure and add the missing QoS features to it?
> 
> Indeed this is what I prefer.

Ok, it makes sense.
Just out of curiosity, the target setup is the following:
https://www.youtube.com/watch?v=8V-LB5y2Cos
but instead of using internal drivers, we want to use mainline drivers only.

Thanks!

> 
>>
>>>
>>> Thanks Lars, I will be happy to support all you on this transition
>>> and I agree on renaming all.
>>>
>>> peppe
>>>
>>>
>>>> - Lars
>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> (See http://lists.openwall.net/netdev/2016/02/29/127)
>>>>>>>>
>>>>>>>> The former only supports 4.x of the hardware.
>>>>>>>>
>>>>>>>> The latter supports 4.x and 3.x and already has a platform glue driver
>>>>>>>> with support for several platforms, a PCI glue driver, and a core driver
>>>>>>>> with several features not present in the former (for example: TX/RX
>>>>>>>> interrupt coalescing, EEE, PTP).
>>>>>>>>
>>>>>>>> Have you evaluated both drivers?  Why have you decided to work on the
>>>>>>>> former rather than the latter?
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>


* Re: Netperf UDP issue with connected sockets
From: Jesper Dangaard Brouer @ 2016-11-21 16:03 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Rick Jones, netdev, Saeed Mahameed, Tariq Toukan, brouer
In-Reply-To: <1479408683.8455.273.camel@edumazet-glaptop3.roam.corp.google.com>


On Thu, 17 Nov 2016 10:51:23 -0800
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> On Thu, 2016-11-17 at 19:30 +0100, Jesper Dangaard Brouer wrote:
> 
> > The point is I can see a socket Send-Q forming, thus we do know the
> > application has something to send, and thus there is a possibility for
> > non-opportunistic bulking. Allowing/implementing bulk enqueue from
> > socket layer into qdisc layer should be fairly simple (and the rest of
> > xmit_more is already in place).
> 
> 
> As I said, you are fooled by TX completions.

Obviously TX completions play a role, yes, and I bet I can adjust the
TX completion to cause xmit_more to happen, at the expense of
introducing added latency.

The point is the "bloated" spinlock in __dev_queue_xmit is still caused
by the MMIO tailptr/doorbell.  The added cost occurs when enqueueing
packets, and results in the inability to get enough packets into the
qdisc to get xmit_more going (on my system).  I argue that a bulk enqueue
API would allow us to get past the hurdle of transitioning into
xmit_more mode more easily.


> Please make sure to increase the sndbuf limits !
> 
> echo 2129920 >/proc/sys/net/core/wmem_default

Testing with this makes no difference.

 $ grep -H . /proc/sys/net/core/wmem_default
 /proc/sys/net/core/wmem_default:2129920


> lpaa23:~# sar -n DEV 1 10|grep eth1
>                 IFACE   rxpck/s    txpck/s    rxkB/s     txkB/s   rxcmp/s   txcmp/s  rxmcst/s
> 10:49:25         eth1      7.00 9273283.00      0.61 2187214.90      0.00      0.00      0.00
> 10:49:26         eth1      1.00 9230795.00      0.06 2176787.57      0.00      0.00      1.00
> 10:49:27         eth1      2.00 9247906.00      0.17 2180915.45      0.00      0.00      0.00
> 10:49:28         eth1      3.00 9246542.00      0.23 2180790.38      0.00      0.00      1.00
> Average:         eth1      2.50 9018045.70      0.25 2126893.82      0.00      0.00      0.50

Very impressive numbers: 9.2Mpps TX.

What is this test?  What kind of traffic? Multiple CPUs?


> lpaa23:~# ethtool -S eth1|grep more; sleep 1;ethtool -S eth1|grep more
>      xmit_more: 2251366909
>      xmit_more: 2256011392
> 
> lpaa23:~# echo 2256011392-2251366909 | bc
> 4644483

The xmit_more definitely works on your system, but I cannot get it to
"kick in" on my setup.  Once xmit_more is active, the
"bloated" spinlock problem should go away.


(Tests with "udp_flood --pmtu 3 --send")

Forcing TX completion to happen on the same CPU, no xmit_more:

 ~/git/network-testing/bin/ethtool_stats.pl --sec 2 --dev mlx5p2
 Show adapter(s) (mlx5p2) statistics (ONLY that changed!)
 Ethtool(mlx5p2  ) stat:    104592908 (    104,592,908) <= tx0_bytes /sec
 Ethtool(mlx5p2  ) stat:        39059 (         39,059) <= tx0_nop /sec
 Ethtool(mlx5p2  ) stat:      1743215 (      1,743,215) <= tx0_packets /sec 
 Ethtool(mlx5p2  ) stat:    104719986 (    104,719,986) <= tx_bytes /sec
 Ethtool(mlx5p2  ) stat:    111774540 (    111,774,540) <= tx_bytes_phy /sec
 Ethtool(mlx5p2  ) stat:      1745333 (      1,745,333) <= tx_csum_partial /sec
 Ethtool(mlx5p2  ) stat:      1745333 (      1,745,333) <= tx_packets /sec
 Ethtool(mlx5p2  ) stat:      1746477 (      1,746,477) <= tx_packets_phy /sec
 Ethtool(mlx5p2  ) stat:    111483434 (    111,483,434) <= tx_prio1_bytes /sec
 Ethtool(mlx5p2  ) stat:      1741928 (      1,741,928) <= tx_prio1_packets /sec

Forcing TX completion to happen on remote CPU, some xmit_more:

 Show adapter(s) (mlx5p2) statistics (ONLY that changed!)
 Ethtool(mlx5p2  ) stat:    128485892 (    128,485,892) <= tx0_bytes /sec
 Ethtool(mlx5p2  ) stat:        31840 (         31,840) <= tx0_nop /sec
 Ethtool(mlx5p2  ) stat:      2141432 (      2,141,432) <= tx0_packets /sec
 Ethtool(mlx5p2  ) stat:          350 (            350) <= tx0_xmit_more /sec
 Ethtool(mlx5p2  ) stat:    128486459 (    128,486,459) <= tx_bytes /sec
 Ethtool(mlx5p2  ) stat:    137052191 (    137,052,191) <= tx_bytes_phy /sec
 Ethtool(mlx5p2  ) stat:      2141441 (      2,141,441) <= tx_csum_partial /sec
 Ethtool(mlx5p2  ) stat:      2141441 (      2,141,441) <= tx_packets /sec
 Ethtool(mlx5p2  ) stat:      2141441 (      2,141,441) <= tx_packets_phy /sec
 Ethtool(mlx5p2  ) stat:    137051300 (    137,051,300) <= tx_prio1_bytes /sec
 Ethtool(mlx5p2  ) stat:      2141427 (      2,141,427) <= tx_prio1_packets /sec
 Ethtool(mlx5p2  ) stat:          350 (            350) <= tx_xmit_more /sec
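
To put the two sets of counters side by side, a rough sketch of the
ratio of xmit_more events to transmitted packets follows. The helper
name is hypothetical, and the comparison assumes the quoted ethtool
delta (4644483 over one second) and the sar txpck/s figures describe
the same steady-state load, which the thread does not confirm.

```c
/*
 * Fraction of transmitted packets that were flagged with xmit_more,
 * given two per-second counters. Returns 0 when no packets were sent.
 */
double xmit_more_ratio(unsigned long long xmit_more,
		       unsigned long long tx_packets)
{
	if (tx_packets == 0)
		return 0.0;
	return (double)xmit_more / (double)tx_packets;
}
```

With 4644483 xmit_more events against roughly 9.2M packets/sec, about
half of the packets on the quoted system are sent with xmit_more set.
In the remote-CPU run above, 350 against 2141441 packets/sec is well
under 0.1%.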



>    PerfTop:   76969 irqs/sec  kernel:96.6%  exact: 100.0% [4000Hz cycles:pp],  (all, 48 CPUs)
>---------------------------------------------------------------------------------------------
>     11.64%  [kernel]  [k] skb_set_owner_w               
>      6.21%  [kernel]  [k] queued_spin_lock_slowpath     
>      4.76%  [kernel]  [k] _raw_spin_lock                
>      4.40%  [kernel]  [k] __ip_make_skb                 
>      3.10%  [kernel]  [k] sock_wfree                    
>      2.87%  [kernel]  [k] ipt_do_table                  
>      2.76%  [kernel]  [k] fq_dequeue                    
>      2.71%  [kernel]  [k] mlx4_en_xmit                  
>      2.50%  [kernel]  [k] __dev_queue_xmit              
>      2.29%  [kernel]  [k] __ip_append_data.isra.40      
>      2.28%  [kernel]  [k] udp_sendmsg                   
>      2.01%  [kernel]  [k] __alloc_skb                   
>      1.90%  [kernel]  [k] napi_consume_skb              
>      1.63%  [kernel]  [k] udp_send_skb                  
>      1.62%  [kernel]  [k] skb_release_data              
>      1.62%  [kernel]  [k] entry_SYSCALL_64_fastpath     
>      1.56%  [kernel]  [k] dev_hard_start_xmit           
>      1.55%  udpsnd    [.] __libc_send                   
>      1.48%  [kernel]  [k] netif_skb_features            
>      1.42%  [kernel]  [k] __qdisc_run                   
>      1.35%  [kernel]  [k] sk_dst_check                  
>      1.33%  [kernel]  [k] sock_def_write_space          
>      1.30%  [kernel]  [k] kmem_cache_alloc_node_trace   
>      1.29%  [kernel]  [k] __local_bh_enable_ip          
>      1.21%  [kernel]  [k] copy_user_enhanced_fast_string
>      1.08%  [kernel]  [k] __kmalloc_reserve.isra.40     
>      1.08%  [kernel]  [k] SYSC_sendto                   
>      1.07%  [kernel]  [k] kmem_cache_alloc_node         
>      0.95%  [kernel]  [k] ip_finish_output2             
>      0.95%  [kernel]  [k] ktime_get                     
>      0.91%  [kernel]  [k] validate_xmit_skb             
>      0.88%  [kernel]  [k] sock_alloc_send_pskb          
>      0.82%  [kernel]  [k] sock_sendmsg                  

My perf outputs below...

Forcing TX completion to happen on the same CPU, no xmit_more:

# Overhead  CPU  Command     Shared Object     Symbol                         
# ........  ...  ..........  ................. ...............................
#
    12.17%  000  udp_flood   [kernel.vmlinux]  [k] _raw_spin_lock             
     5.03%  000  udp_flood   [mlx5_core]       [k] mlx5e_sq_xmit              
     3.13%  000  udp_flood   [kernel.vmlinux]  [k] __ip_append_data.isra.47   
     2.85%  000  udp_flood   [kernel.vmlinux]  [k] entry_SYSCALL_64           
     2.75%  000  udp_flood   [mlx5_core]       [k] mlx5e_poll_tx_cq           
     2.61%  000  udp_flood   [kernel.vmlinux]  [k] sock_def_write_space       
     2.48%  000  udp_flood   [kernel.vmlinux]  [k] skb_set_owner_w            
     2.25%  000  udp_flood   [kernel.vmlinux]  [k] __alloc_skb                
     2.21%  000  udp_flood   [kernel.vmlinux]  [k] udp_sendmsg                
     2.19%  000  udp_flood   [kernel.vmlinux]  [k] __slab_free                
     2.08%  000  udp_flood   [kernel.vmlinux]  [k] sock_wfree                 
     2.06%  000  udp_flood   [kernel.vmlinux]  [k] __ip_make_skb              
     1.93%  000  udp_flood   [mlx5_core]       [k] mlx5e_get_cqe              
     1.93%  000  udp_flood   libc-2.17.so      [.] __libc_send                
     1.80%  000  udp_flood   [kernel.vmlinux]  [k] entry_SYSCALL_64_fastpath  
     1.64%  000  udp_flood   [kernel.vmlinux]  [k] kfree                      
     1.61%  000  udp_flood   [kernel.vmlinux]  [k] ip_finish_output2          
     1.59%  000  udp_flood   [kernel.vmlinux]  [k] __local_bh_enable_ip       
     1.57%  000  udp_flood   [kernel.vmlinux]  [k] __dev_queue_xmit           
     1.49%  000  udp_flood   [kernel.vmlinux]  [k] __kmalloc_node_track_caller
     1.38%  000  udp_flood   [kernel.vmlinux]  [k] kmem_cache_alloc_node      
     1.30%  000  udp_flood   [kernel.vmlinux]  [k] dst_release                
     1.26%  000  udp_flood   [kernel.vmlinux]  [k] ksize                      
     1.26%  000  udp_flood   [kernel.vmlinux]  [k] sk_dst_check               
     1.22%  000  udp_flood   [kernel.vmlinux]  [k] SYSC_sendto                
     1.22%  000  udp_flood   [kernel.vmlinux]  [k] ip_send_check              


Forcing TX completion to happen on remote CPU, some xmit_more:

# Overhead  CPU  Command      Shared Object     Symbol                        
# ........  ...  ............ ................  ..............................
#
    11.67%  002  udp_flood   [kernel.vmlinux]  [k] _raw_spin_lock             
     7.61%  002  udp_flood   [kernel.vmlinux]  [k] skb_set_owner_w            
     6.15%  002  udp_flood   [mlx5_core]       [k] mlx5e_sq_xmit              
     3.05%  002  udp_flood   [kernel.vmlinux]  [k] entry_SYSCALL_64           
     2.89%  002  udp_flood   [kernel.vmlinux]  [k] __ip_append_data.isra.47   
     2.78%  000  swapper     [mlx5_core]       [k] mlx5e_poll_tx_cq           
     2.65%  002  udp_flood   [kernel.vmlinux]  [k] sk_dst_check               
     2.36%  002  udp_flood   [kernel.vmlinux]  [k] __alloc_skb                
     2.22%  002  udp_flood   [kernel.vmlinux]  [k] ip_finish_output2          
     2.07%  000  swapper     [kernel.vmlinux]  [k] __slab_free                
     2.06%  002  udp_flood   [kernel.vmlinux]  [k] udp_sendmsg                
     1.97%  002  udp_flood   [kernel.vmlinux]  [k] ksize                      
     1.92%  002  udp_flood   [kernel.vmlinux]  [k] entry_SYSCALL_64_fastpath  
     1.82%  002  udp_flood   [kernel.vmlinux]  [k] __ip_make_skb              
     1.79%  002  udp_flood   libc-2.17.so      [.] __libc_send                
     1.62%  002  udp_flood   [kernel.vmlinux]  [k] __kmalloc_node_track_caller
     1.53%  002  udp_flood   [kernel.vmlinux]  [k] __local_bh_enable_ip       
     1.48%  002  udp_flood   [kernel.vmlinux]  [k] sock_alloc_send_pskb       
     1.43%  002  udp_flood   [kernel.vmlinux]  [k] __dev_queue_xmit           
     1.39%  002  udp_flood   [kernel.vmlinux]  [k] ip_send_check              
     1.39%  002  udp_flood   [kernel.vmlinux]  [k] kmem_cache_alloc_node      
     1.37%  002  udp_flood   [kernel.vmlinux]  [k] dst_release                
     1.21%  002  udp_flood   [kernel.vmlinux]  [k] udp_send_skb               
     1.18%  002  udp_flood   [kernel.vmlinux]  [k] __fget_light               
     1.16%  002  udp_flood   [kernel.vmlinux]  [k] kfree                      
     1.15%  000  swapper     [kernel.vmlinux]  [k] sock_wfree                 
     1.14%  002  udp_flood   [kernel.vmlinux]  [k] SYSC_sendto                

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer


* Re: [RFC PATCH net v2 2/3] dt: bindings: add ethernet phy eee-disable-advert option documentation
From: Andrew Lunn @ 2016-11-21 16:01 UTC (permalink / raw)
  To: Jerome Brunet
  Cc: netdev, devicetree, Florian Fainelli, Alexandre TORGUE,
	Neil Armstrong, Martin Blumenstingl, Kevin Hilman, linux-kernel,
	Andre Roth, linux-amlogic, Carlo Caione, Giuseppe Cavallaro,
	linux-arm-kernel
In-Reply-To: <1479742524-30222-3-git-send-email-jbrunet@baylibre.com>

On Mon, Nov 21, 2016 at 04:35:23PM +0100, Jerome Brunet wrote:
> Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
> ---
>  Documentation/devicetree/bindings/net/phy.txt | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/net/phy.txt b/Documentation/devicetree/bindings/net/phy.txt
> index bc1c3c8bf8fa..7f066b7c1e2c 100644
> --- a/Documentation/devicetree/bindings/net/phy.txt
> +++ b/Documentation/devicetree/bindings/net/phy.txt
> @@ -35,6 +35,11 @@ Optional Properties:
>  - broken-turn-around: If set, indicates the PHY device does not correctly
>    release the turn around line low at the end of a MDIO transaction.
>  
> +- eee-advert-disable: Bits to clear in the MDIO_AN_EEE_ADV register to
> +  disable EEE modes. Example
> +    * 0x4: disable EEE for 1000T,
> +    * 0x6: disable EEE for 100TX and 1000T
> +

Hi Jerome

I like the direction this patchset is taking. But hex values are
pretty unfriendly. Please add a set of boolean properties, and do the
mapping to hex in the C code.

That would also make extending this API easier. e.g. say you have a
10Gbps PHY with EEE, and you need to disable it. This hex value
quickly gets ugly, eee-advert-disable-10000 is nice and simple.

	Andrew


* Re: [net-next PATCH v2 4/5] virtio_net: add dedicated XDP transmit queues
From: John Fastabend @ 2016-11-21 15:56 UTC (permalink / raw)
  To: Daniel Borkmann, eric.dumazet, mst, kubakici, shm, davem,
	alexei.starovoitov
  Cc: netdev, bblanco, john.r.fastabend, brouer, tgraf
In-Reply-To: <5832DE60.4000200@iogearbox.net>

On 16-11-21 03:45 AM, Daniel Borkmann wrote:
> On 11/20/2016 03:51 AM, John Fastabend wrote:
>> XDP requires using isolated transmit queues to avoid interference
>> with normal networking stack (BQL, NETDEV_TX_BUSY, etc). This patch
>> adds a XDP queue per cpu when a XDP program is loaded and does not
>> expose the queues to the OS via the normal API call to
>> netif_set_real_num_tx_queues(). This way the stack will never push
>> an skb to these queues.
>>
>> However virtio/vhost/qemu implementation only allows for creating
>> TX/RX queue pairs at this time so creating only TX queues was not
>> possible. And because the associated RX queues are being created I
>> went ahead and exposed these to the stack and let the backend use
>> them. This creates more RX queues visible to the network stack than
>> TX queues which is worth mentioning but does not cause any issues as
>> far as I can tell.
>>
>> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
>> ---

[...]

>>       }
>>
>> +    curr_qp = vi->curr_queue_pairs - vi->xdp_queue_pairs;
>> +    if (prog)
>> +        xdp_qp = nr_cpu_ids;
>> +
>> +    /* XDP requires extra queues for XDP_TX */
>> +    if (curr_qp + xdp_qp > vi->max_queue_pairs) {
>> +        netdev_warn(dev, "request %i queues but max is %i\n",
>> +                curr_qp + xdp_qp, vi->max_queue_pairs);
>> +        return -ENOMEM;
>> +    }
>> +
>> +    err = virtnet_set_queues(vi, curr_qp + xdp_qp);
>> +    if (err) {
>> +        dev_warn(&dev->dev, "XDP Device queue allocation failure.\n");
>> +        return err;
>> +    }
>> +
>>       if (prog) {
>> -        prog = bpf_prog_add(prog, vi->max_queue_pairs - 1);
>> -        if (IS_ERR(prog))
>> +        prog = bpf_prog_add(prog, vi->max_queue_pairs);
> 
> I think this change is not correct, it would be off by one now.
> The previous 'vi->max_queue_pairs - 1' was actually correct here.
> dev_change_xdp_fd() already gives you a reference (see the doc on
> enum xdp_netdev_command in netdevice.h).


Right, this was an error, thanks for catching it. I'll send a v3, and
maybe draft a test for XDP ref counting to catch this in the future.
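
The off-by-one can be illustrated with a toy reference counter. This is
a sketch only: toy_prog, toy_prog_add() and toy_attach() are
hypothetical stand-ins and not the kernel's struct bpf_prog or
bpf_prog_add(). It assumes, as pointed out in the review, that the
caller already holds one reference when the program reaches the driver.

```c
/* Toy model of a program with a reference count. */
struct toy_prog {
	int refcnt;
};

/* Stand-in for taking n extra references at once. */
void toy_prog_add(struct toy_prog *prog, int n)
{
	prog->refcnt += n;
}

/*
 * The program arrives with one reference already held, so only
 * (queue_pairs - 1) more are needed to end up with one per queue pair.
 */
int toy_attach(struct toy_prog *prog, int queue_pairs)
{
	toy_prog_add(prog, queue_pairs - 1);
	return prog->refcnt;
}
```

Adding queue_pairs instead of queue_pairs - 1 leaves one reference that
is never dropped when the per-queue references are released.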

.John


* Re: wl1251 & mac address & calibration data
From: Pali Rohár @ 2016-11-21 15:51 UTC (permalink / raw)
  To: Kalle Valo, Pavel Machek, Ivaylo Dimitrov, Sebastian Reichel,
	Aaro Koskinen, Tony Lindgren
  Cc: linux-wireless, netdev, linux-kernel
In-Reply-To: <201611111820.52072@pali>

On Friday 11 November 2016 18:20:50 Pali Rohár wrote:
> Hi! I will open discussion about mac address and calibration data for 
> wl1251 wireless chip again...
> 
> Problem: Mac address & calibration data for wl1251 chip on Nokia N900 
> are stored on second nand partition (mtd1) in special proprietary format 
> which is used only for Nokia N900 (probably on N8x0 and N9 too). 
> Wireless driver wl1251.ko cannot work without mac address and 
> calibration data.
> 
> Absence of a mac address causes the driver to generate a random mac
> address at every kernel boot, which causes a couple of problems (unstable
> identifier of the wireless device due to udev permanent storage rules;
> unpredictable behaviour for dhcp mac address assignment, mac address
> filtering, ...).
> 
> Currently there is no way to set (permanent) mac address for network 
> interface from userspace. And it does not make sense to implement in 
> linux kernel large parser for proprietary format of second nand 
> partition where is mac address stored only for one device -- Nokia N900.
> 
> Driver wl1251.ko loads calibration data via request_firmware() from the
> file wl1251-nvs.bin. There is an "example" calibration file in the
> linux-firmware repository, but it is not suitable for normal usage as
> real calibration data are device specific.
> 
> So questions are:
> 
> 1) How to set mac address from userspace for that wl1251 interface? In 
> userspace I can write parser for that proprietary format of nand 
> partition and extract mac address from it

Proposed solutions for 1)

* Introduce a new ioctl for setting the permanent mac address from
  userspace. Currently we have an ioctl only for the get request

* Use request_firmware() (with the flag from 2)) to ask userspace for the
  mac address. This is already used by the wl12xx driver (as the mac
  address is part of the calibration data firmware file)

* Allow setting the mac address via a sysfs file, e.g.
  /sys/class/ieee80211/phy0/macaddress
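
For the sysfs variant, whatever userspace writes has to be validated
before it is accepted. The following is a minimal userspace sketch of
such a check, assuming the text form "aa:bb:cc:dd:ee:ff"; parse_mac()
and MAC_LEN are hypothetical names, and rejecting multicast and all-zero
addresses is an assumption about what a driver would want, not
something stated in this thread.

```c
#include <stdio.h>

#define MAC_LEN 6

/*
 * Parse "aa:bb:cc:dd:ee:ff" (optionally followed by one newline) into
 * six bytes. Returns 0 on success, -1 on malformed, multicast or
 * all-zero input.
 */
int parse_mac(const char *text, unsigned char mac[MAC_LEN])
{
	unsigned int b[MAC_LEN];
	unsigned int nonzero = 0;
	int consumed = 0;
	int i;

	if (sscanf(text, "%2x:%2x:%2x:%2x:%2x:%2x%n",
		   &b[0], &b[1], &b[2], &b[3], &b[4], &b[5],
		   &consumed) != MAC_LEN)
		return -1;
	/* Tolerate the trailing newline that "echo" appends. */
	if (text[consumed] == '\n')
		consumed++;
	if (text[consumed] != '\0')
		return -1;
	/* Lowest bit of the first octet marks a multicast address. */
	if (b[0] & 1)
		return -1;
	for (i = 0; i < MAC_LEN; i++)
		nonzero |= b[i];
	if (!nonzero)
		return -1;
	for (i = 0; i < MAC_LEN; i++)
		mac[i] = (unsigned char)b[i];
	return 0;
}
```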

> 2) How to send calibration data to wl1251 driver? Those are again stored 
> in proprietary format and I can write userspace parser for it.

Proposed solution for 2)

Introduce a new flag for request_firmware(), so that it first tries to
use the userspace helper to load the firmware file, with the possibility
of falling back to direct VFS access.


So... what do you think about it?

-- 
Pali Rohár
pali.rohar@gmail.com


* Re: Synopsys Ethernet QoS Driver
From: Lars Persson @ 2016-11-21 15:43 UTC (permalink / raw)
  To: Joao Pinto
  Cc: Giuseppe CAVALLARO, Rayagond Kokatanur, Rabin Vincent, mued dib,
	David Miller, Jeff Kirsher, jiri@mellanox.com,
	saeedm@mellanox.com, idosch@mellanox.com, netdev,
	linux-kernel@vger.kernel.org, CARLOS.PALMINHA@synopsys.com,
	Andreas Irestål, alexandre.torgue@st.com,
	linux-arm-kernel@lists.infradead.org
In-Reply-To: <b126021d-e902-7a53-4d7a-e73ea89a66cf@synopsys.com>



> 21 nov. 2016 kl. 16:06 skrev Joao Pinto <Joao.Pinto@synopsys.com>:
> 
>> On 21-11-2016 14:25, Giuseppe CAVALLARO wrote:
>>> On 11/21/2016 2:28 PM, Lars Persson wrote:
>>> 
>>> 
>>>> 21 nov. 2016 kl. 13:53 skrev Giuseppe CAVALLARO <peppe.cavallaro@st.com>:
>>>> 
>>>> Hello Joao
>>>> 
>>>>> On 11/21/2016 1:32 PM, Joao Pinto wrote:
>>>>> Hello,
>>>>> 
>>>>>>> On 21-11-2016 05:29, Rayagond Kokatanur wrote:
>>>>>>>> On Sat, Nov 19, 2016 at 7:26 PM, Rabin Vincent <rabin@rab.in> wrote:
>>>>>>>> On Fri, Nov 18, 2016 at 02:20:27PM +0000, Joao Pinto wrote:
>>>>>>>> For now we are interested in improving the synopsys QoS driver under
>>>>>>>> /nect/ethernet/synopsys. For now the driver structure consists of a
>>>>>>>> single file
>>>>>>>> called dwc_eth_qos.c, containing synopsys ethernet qos common ops and
>>>>>>>> platform
>>>>>>>> related stuff.
>>>>>>>> 
>>>>>>>> Our strategy would be:
>>>>>>>> 
>>>>>>>> a) Implement a platform glue driver (dwc_eth_qos_pltfm.c)
>>>>>>>> b) Implement a pci glue driver (dwc_eth_qos_pci.c)
>>>>>>>> c) Implement a "core driver" (dwc_eth_qos.c) that would only have
>>>>>>>> Ethernet QoS
>>>>>>>> related stuff to be reused by the platform / pci drivers
>>>>>>>> d) Add a set of features to the "core driver" that we have available
>>>>>>>> internally
>>>>>>> 
>>>>>>> Note that there are actually two drivers in mainline for this hardware:
>>>>>>> 
>>>>>>> drivers/net/ethernet/synopsis/
>>>>>>> drivers/net/ethernet/stmicro/stmmac/
>>>>>> 
>>>>>> Yes the later driver (drivers/net/ethernet/stmicro/stmmac/) supports
>>>>>> both 3.x and 4.x. It has glue layer for pci, platform, core etc,
>>>>>> please refer this driver once before you start.
>>>>>> 
>>>>>> You can start adding missing feature of 4.x in stmmac driver.
>>>>> 
>>>>> Thanks you all for all the info.
>>>>> Well, I think we are in a good position to organize the ethernet drivers
>>>>> concerning Synopsys IPs.
>>>>> 
>>>>> First of all, in my opinion, it does not make sense to have a ethernet/synopsis
>>>>> (typo :)) when ethernet/stmicro is also for a synopsys IP. If we have another
>>>>> vendor using the same IP it should be able to reuse the commonn operations. But
>>>>> I would put that discussion for later :)
>>>>> 
>>>>> For now I suggest that for we create ethernet/qos and create there a folder
>>>>> called dwc (designware controller) where all the synopsys qos IP specific code
>>>>> in order to be reused for example by ethernet/stmicro/stmmac/. We just have to
>>>>> figure out a clean interface for "client drivers" like stmmac to interact with
>>>>> the new qos driver.
>>>>> 
>>>>> What do you think about this approach?
>>>> 
>>>> The stmmac drivers run since many years on several platforms
>>>> (sh4, stm32, arm, x86, mips ...) and it supports an huge of amount of
>>>> configurations starting from 3.1x to 3.7x databooks.
>>>> 
>>>> It also supports QoS hardware; for example, 4.00a, 4.10a and 4.20a
>>>> are fully working.
>>>> 
>>>> Also the stmmac has platform, device-tree and pcie supports and
>>>> a lot of maintained glue-logic files.
>>>> 
>>>> It is fully documented inside the kernel tree.
>>>> 
>>>> I am happy to have new enhancements from other developers.
>>>> So, on my side, if you want to spend your time on improving it on your
>>>> platforms please feel free to do it!
>>>> 
>>>> Concerning the stmicro/stmmac naming, these come from a really old
>>>> story and have no issue to adopt new folder/file names.
>>>> 
>>>> I am also open to merge fixes and changes from ethernet/synopsis.
>>>> I want to point you on some benchmarks made by Alex some months ago
>>>> (IIRC) that showed an stmmac winner (due to the several optimizations
>>>> analyzed and reviewed in this mailing list).
>>>> 
>>>> Peppe
>>>> 
>>> 
>>> Hello Joao and others,
>>> 
> 
> Hi Lars,
> 
>>> As the maintainer of dwc_eth_qos.c I prefer also that we put efforts on the
>>> most mature driver, the stmmac.
>>> 
>>> I hope that the code can migrate into an ethernet/synopsys folder to keep the
>>> convention of naming the folder after the vendor. This makes it easy for
>>> others to find the driver.
>>> 
>>> The dwc_eth_qos.c will eventually be removed and its DT binding interface can
>>> then be implemented in the stmmac driver.
> 
> So your ideia is to pick the ethernet/stmmac and rename it to ethernet/synopsys
> and try to improve the structure and add the missing QoS features to it?

Indeed this is what I prefer.

> 
>> 
>> Thanks Lars, I will be happy to support all you on this transition
>> and I agree on renaming all.
>> 
>> peppe
>> 
>> 
>>> - Lars
>>> 
>>>>> 
>>>>> 
>>>>>> 
>>>>>>> 
>>>>>>> (See http://lists.openwall.net/netdev/2016/02/29/127)
>>>>>>> 
>>>>>>> The former only supports 4.x of the hardware.
>>>>>>> 
>>>>>>> The later supports 4.x and 3.x and already has a platform glue driver
>>>>>>> with support for several platforms, a PCI glue driver, and a core driver
>>>>>>> with several features not present in the former (for example: TX/RX
>>>>>>> interrupt coalescing, EEE, PTP).
>>>>>>> 
>>>>>>> Have you evaluated both drivers?  Why have you decided to work on the
>>>>>>> former rather than the latter?
>>>>>> 
>>>>>> 
>>>>> 
>>>>> Thanks.
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>> 
>> 
> 


* [RFC PATCH net v2 3/3] ARM64: dts: meson: odroidc2: disable advertisement EEE for GbE.
From: Jerome Brunet @ 2016-11-21 15:35 UTC (permalink / raw)
  To: netdev, devicetree, Florian Fainelli
  Cc: Alexandre TORGUE, Neil Armstrong, Martin Blumenstingl,
	Kevin Hilman, linux-kernel, Andre Roth, linux-amlogic,
	Carlo Caione, Giuseppe Cavallaro, linux-arm-kernel, Jerome Brunet
In-Reply-To: <1479742524-30222-1-git-send-email-jbrunet@baylibre.com>

Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
---
 arch/arm64/boot/dts/amlogic/meson-gxbb-odroidc2.dts | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/arch/arm64/boot/dts/amlogic/meson-gxbb-odroidc2.dts b/arch/arm64/boot/dts/amlogic/meson-gxbb-odroidc2.dts
index e6e3491d48a5..b34da077b2f8 100644
--- a/arch/arm64/boot/dts/amlogic/meson-gxbb-odroidc2.dts
+++ b/arch/arm64/boot/dts/amlogic/meson-gxbb-odroidc2.dts
@@ -98,3 +98,18 @@
 	pinctrl-0 = <&i2c_a_pins>;
 	pinctrl-names = "default";
 };
+
+&ethmac {
+	phy-handle = <&eth_phy0>;
+
+	mdio {
+		compatible = "snps,dwmac-mdio";
+		#address-cells = <1>;
+		#size-cells = <0>;
+
+		eth_phy0: ethernet-phy@0 {
+			reg = <0>;
+			eee-advert-disable = <0x4>;
+		};
+	};
+};
-- 
2.7.4


* [RFC PATCH net v2 2/3] dt: bindings: add ethernet phy eee-disable-advert option documentation
From: Jerome Brunet @ 2016-11-21 15:35 UTC (permalink / raw)
  To: netdev, devicetree, Florian Fainelli
  Cc: Jerome Brunet, Carlo Caione, Kevin Hilman, Giuseppe Cavallaro,
	Alexandre TORGUE, Martin Blumenstingl, Andre Roth, Neil Armstrong,
	linux-amlogic, linux-arm-kernel, linux-kernel
In-Reply-To: <1479742524-30222-1-git-send-email-jbrunet@baylibre.com>

Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
---
 Documentation/devicetree/bindings/net/phy.txt | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/Documentation/devicetree/bindings/net/phy.txt b/Documentation/devicetree/bindings/net/phy.txt
index bc1c3c8bf8fa..7f066b7c1e2c 100644
--- a/Documentation/devicetree/bindings/net/phy.txt
+++ b/Documentation/devicetree/bindings/net/phy.txt
@@ -35,6 +35,11 @@ Optional Properties:
 - broken-turn-around: If set, indicates the PHY device does not correctly
   release the turn around line low at the end of a MDIO transaction.
 
+- eee-advert-disable: Bits to clear in the MDIO_AN_EEE_ADV register to
+  disable EEE modes. Example
+    * 0x4: disable EEE for 1000T,
+    * 0x6: disable EEE for 100TX and 1000T
+
 Example:
 
 ethernet-phy@0 {
-- 
2.7.4
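
For readers decoding the example values in the binding text, a small sketch
in plain C. The two bit values are those of MDIO_EEE_100TX and
MDIO_EEE_1000T in include/uapi/linux/mdio.h; describe_eee_disable() is a
hypothetical helper written for this illustration and is not part of the
patch.

```c
#include <stdint.h>

#define MDIO_EEE_100TX	0x0002	/* 100TX EEE capability */
#define MDIO_EEE_1000T	0x0004	/* 1000T EEE capability */

/*
 * Hypothetical helper: name the modes an eee-advert-disable value turns off.
 * Only the two modes mentioned in the binding text are decoded; any other
 * bits in the mask are ignored here.
 */
const char *describe_eee_disable(uint32_t mask)
{
	switch (mask & (MDIO_EEE_100TX | MDIO_EEE_1000T)) {
	case MDIO_EEE_100TX | MDIO_EEE_1000T:
		return "100TX and 1000T";
	case MDIO_EEE_1000T:
		return "1000T";
	case MDIO_EEE_100TX:
		return "100TX";
	default:
		return "none";
	}
}
```

Under these definitions, 0x4 selects 1000T only and 0x6 selects both 100TX
and 1000T, matching the two examples in the binding.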


* [RFC PATCH net v2 1/3] net: phy: add an option to disable EEE advertisement
From: Jerome Brunet @ 2016-11-21 15:35 UTC (permalink / raw)
  To: netdev, devicetree, Florian Fainelli
  Cc: Jerome Brunet, Carlo Caione, Kevin Hilman, Giuseppe Cavallaro,
	Alexandre TORGUE, Martin Blumenstingl, Andre Roth, Neil Armstrong,
	linux-amlogic, linux-arm-kernel, linux-kernel
In-Reply-To: <1479742524-30222-1-git-send-email-jbrunet@baylibre.com>

This patch adds an option to disable EEE advertisement in the generic PHY
by providing a mask of prohibited modes corresponding to the value found in
the MDIO_AN_EEE_ADV register.

On some platforms, PHY Low power idle seems to be causing issues, even
breaking the link in some cases. The patch provides a convenient way for
these platforms to disable EEE advertisement and work around the issue.

Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
---
 drivers/net/phy/phy.c        |  3 ++
 drivers/net/phy/phy_device.c | 80 +++++++++++++++++++++++++++++++++++++++-----
 include/linux/phy.h          |  3 ++
 3 files changed, 77 insertions(+), 9 deletions(-)

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index f424b867f73e..a44ee14bd953 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -1348,6 +1348,9 @@ int phy_ethtool_set_eee(struct phy_device *phydev, struct ethtool_eee *data)
 {
 	int val = ethtool_adv_to_mmd_eee_adv_t(data->advertised);
 
+	/* Mask prohibited EEE modes */
+	val &= ~phydev->eee_advert_disabled;
+
 	phy_write_mmd_indirect(phydev, MDIO_AN_EEE_ADV, MDIO_MMD_AN, val);
 
 	return 0;
diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 1a4bf8acad78..74c628e046cb 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -1116,6 +1116,43 @@ static int genphy_config_advert(struct phy_device *phydev)
 }
 
 /**
+ * genphy_config_eee_advert - disable unwanted eee mode advertisement
+ * @phydev: target phy_device struct
+ *
+ * Description: Writes MDIO_AN_EEE_ADV after disabling unsupported energy
+ *   efficient ethernet modes. Returns 0 if the PHY's advertisement hasn't
+ *   changed, and 1 if it has changed.
+ */
+static int genphy_config_eee_advert(struct phy_device *phydev)
+{
+	u32 disabled = phydev->eee_advert_disabled;
+	int old_adv, adv;
+
+	/* Nothing to disable */
+	if (!disabled)
+		return 0;
+
+	/* If the following call fails, we assume that EEE is not
+	 * supported by the phy. If we read 0, EEE is not advertised.
+	 * In both cases, we don't need to continue
+	 */
+	adv = phy_read_mmd_indirect(phydev, MDIO_AN_EEE_ADV, MDIO_MMD_AN);
+	if (adv <= 0)
+		return 0;
+
+	old_adv = adv;
+	adv &= ~disabled;
+
+	/* Advertising remains unchanged with the ban */
+	if (old_adv == adv)
+		return 0;
+
+	phy_write_mmd_indirect(phydev, MDIO_AN_EEE_ADV, MDIO_MMD_AN, adv);
+
+	return 1;
+}
+
+/**
  * genphy_setup_forced - configures/forces speed/duplex from @phydev
  * @phydev: target phy_device struct
  *
@@ -1173,15 +1210,20 @@ EXPORT_SYMBOL(genphy_restart_aneg);
  */
 int genphy_config_aneg(struct phy_device *phydev)
 {
-	int result;
+	int err, changed;
+
+	changed = genphy_config_eee_advert(phydev);
 
 	if (AUTONEG_ENABLE != phydev->autoneg)
 		return genphy_setup_forced(phydev);
 
-	result = genphy_config_advert(phydev);
-	if (result < 0) /* error */
-		return result;
-	if (result == 0) {
+	err = genphy_config_advert(phydev);
+	if (err < 0) /* error */
+		return err;
+
+	changed |= err;
+
+	if (changed == 0) {
 		/* Advertisement hasn't changed, but maybe aneg was never on to
 		 * begin with?  Or maybe phy was isolated?
 		 */
@@ -1191,16 +1233,16 @@ int genphy_config_aneg(struct phy_device *phydev)
 			return ctl;
 
 		if (!(ctl & BMCR_ANENABLE) || (ctl & BMCR_ISOLATE))
-			result = 1; /* do restart aneg */
+			changed = 1; /* do restart aneg */
 	}
 
 	/* Only restart aneg if we are advertising something different
 	 * than we were before.
 	 */
-	if (result > 0)
-		result = genphy_restart_aneg(phydev);
+	if (changed > 0)
+		return genphy_restart_aneg(phydev);
 
-	return result;
+	return 0;
 }
 EXPORT_SYMBOL(genphy_config_aneg);
 
@@ -1558,6 +1600,21 @@ static void of_set_phy_supported(struct phy_device *phydev)
 		__set_phy_supported(phydev, max_speed);
 }
 
+static void of_set_phy_eee_disable(struct phy_device *phydev)
+{
+	struct device_node *node = phydev->mdio.dev.of_node;
+	u32 disabled;
+
+	if (!IS_ENABLED(CONFIG_OF_MDIO))
+		return;
+
+	if (!node)
+		return;
+
+	if (!of_property_read_u32(node, "eee-advert-disable", &disabled))
+		phydev->eee_advert_disabled = disabled;
+}
+
 /**
  * phy_probe - probe and init a PHY device
  * @dev: device to probe and init
@@ -1595,6 +1652,11 @@ static int phy_probe(struct device *dev)
 	of_set_phy_supported(phydev);
 	phydev->advertising = phydev->supported;
 
+	/* Get the EEE modes we want to prohibit. We will ask
+	 * the PHY to stop advertising these modes later on
+	 */
+	of_set_phy_eee_disable(phydev);
+
 	/* Set the state to READY by default */
 	phydev->state = PHY_READY;
 
diff --git a/include/linux/phy.h b/include/linux/phy.h
index e25f1830fbcf..7f2ea0af16d1 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -401,6 +401,9 @@ struct phy_device {
 	u32 advertising;
 	u32 lp_advertising;
 
+	/* Energy efficient ethernet modes which should be prohibited */
+	u32 eee_advert_disabled;
+
 	int autoneg;
 
 	int link_timeout;
-- 
2.7.4
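
The masking step described in the commit message can be sketched in plain C,
without any MDIO access. eee_mask_advert() is a hypothetical stand-in for
the register handling in genphy_config_eee_advert(): the caller passes the
value read from MDIO_AN_EEE_ADV instead of the function reading it. The
sketch keeps that value in a signed int so that a negative error code from
the read stays detectable.

```c
#include <stdint.h>

#define MDIO_EEE_100TX	0x0002	/* 100TX EEE capability */
#define MDIO_EEE_1000T	0x0004	/* 1000T EEE capability */

/*
 * Hypothetical helper, not part of the patch.
 *
 * adv:      value read from MDIO_AN_EEE_ADV, negative on read error
 * disabled: mask of prohibited modes (the eee-advert-disable value)
 * new_adv:  receives the value that would be written back
 *
 * Returns 1 if the advertisement changes, 0 otherwise.
 */
int eee_mask_advert(int adv, uint32_t disabled, int *new_adv)
{
	*new_adv = adv;

	/* Nothing to disable, read failed, or nothing advertised */
	if (!disabled || adv <= 0)
		return 0;

	/* Clear the prohibited modes */
	*new_adv = (int)((uint32_t)adv & ~disabled);

	return *new_adv != adv;
}
```

For example, with 100TX and 1000T advertised (0x6) and a mask of 0x4, the
value to write back is 0x2 and the function reports a change, which is what
makes the caller restart autonegotiation.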


* [RFC PATCH net v2 0/3] Fix OdroidC2 Gigabit Tx link issue
From: Jerome Brunet @ 2016-11-21 15:35 UTC (permalink / raw)
  To: netdev, devicetree, Florian Fainelli
  Cc: Jerome Brunet, Carlo Caione, Kevin Hilman, Giuseppe Cavallaro,
	Alexandre TORGUE, Martin Blumenstingl, Andre Roth, Neil Armstrong,
	linux-amlogic, linux-arm-kernel, linux-kernel

This patchset fixes an issue with the OdroidC2 board (DWMAC + RTL8211F).
Initially reported as a low Tx throughput issue at gigabit speed, the
problem is that the platform enters LPI too often. This eventually breaks
the link (both Tx and Rx), and the interface has to be brought down and up
again to get the Rx path working.

The root cause of this issue is not fully understood yet, but disabling EEE
advertisement on the PHY prevents this feature from being negotiated.
With this change, the link is stable and reliable, with the expected
throughput performance.

The patchset adds options in the generic phy driver to disable EEE
advertisement, through device tree. The way it is done is very similar
to the handling of the max-speed property.

This V2 is posted as an RFC. Since it changes the generic PHY code,
it probably requires a bit more care.
If you are not comfortable taking it for the coming rc, I can rebase on
net-next instead.

Changes since V1: [1]
 - Disable the advertisement of EEE in the generic code instead of the
   realtek driver.

[1] : http://lkml.kernel.org/r/1479220154-25851-1-git-send-email-jbrunet@baylibre.com

Jerome Brunet (3):
  net: phy: add an option to disable EEE advertisement
  dt: bindings: add ethernet phy eee-disable-advert option documentation
  ARM64: dts: meson: odroidc2: disable advertisement EEE for GbE.

 Documentation/devicetree/bindings/net/phy.txt      |  5 ++
 .../arm64/boot/dts/amlogic/meson-gxbb-odroidc2.dts | 15 ++++
 drivers/net/phy/phy.c                              |  3 +
 drivers/net/phy/phy_device.c                       | 80 +++++++++++++++++++---
 include/linux/phy.h                                |  3 +
 5 files changed, 97 insertions(+), 9 deletions(-)

-- 
2.7.4


* Re: Synopsys Ethernet QoS Driver
From: Joao Pinto @ 2016-11-21 15:14 UTC (permalink / raw)
  To: Giuseppe CAVALLARO, Joao Pinto, Rayagond Kokatanur, Rabin Vincent
  Cc: mued dib, David Miller, Jeff Kirsher, jiri, saeedm, idosch,
	netdev, linux-kernel, CARLOS.PALMINHA, andreas.irestal,
	alexandre.torgue, lars.persson,
	linux-arm-kernel@lists.infradead.org
In-Reply-To: <7318c77e-3011-03f3-e673-54e3bb0554aa@st.com>

On 21-11-2016 15:03, Giuseppe CAVALLARO wrote:
> On 11/21/2016 4:00 PM, Joao Pinto wrote:
>> On 21-11-2016 14:36, Giuseppe CAVALLARO wrote:
>>> Hello Joao
>>>
>>> On 11/21/2016 2:48 PM, Joao Pinto wrote:
>>>> Synopsys QoS IP is a separated hardware component, so it should be reusable by
>>>> all implementations using it and so have its own "core driver" and platform +
>>>> pci glue drivers. This is necessary for example in hardware validation, where
>>>> you prototype an IP and instantiate its drivers and test it.
>>>>
>>>> Was there a strong reason to integrate QoS features directly in stmmac and not
>>>> in synopsys/dwc_eth_qos.*?
>>>
>>> We decided to enhance the stmmac on supporting the QoS for several
>>> reasons; for example the common APIs that the driver already exposed and
>>> actually suitable for other SYNP chips. Then, PTP, EEE,
>>> S/RGMII, MMC could be shared among different chips with a minimal
>>> effort.  This meant a lot of code already ready.
>>>
>>> For sure, the net-core, Ethtool, mdio parts were reused. Same for the
>>> glue logic files.
>>> For the latter, this helped to easily bring-up new platforms also
>>> because the stmmac uses the HW cap register to auto-configure many
>>> parts of the MAC core, DMA and modules. This helped many users, AFAIK.
>>>
>>> For validation purpose, this is my experience, the stmmac helped
>>> a lot because people used the same code to validate different HW
>>> and it was easy to switch to a platform to another one in order to
>>> verify / check if the support was ok or if a regression was introduced.
>>> This is important for complex supports like PTP or EEE.
>>>
>>> Hoping this can help.
>>>
>>> Do not hesitate to contact me for further details
>>
>> Thanks for the highly detailed info.
>> My target application is to prototype the Ethernet QoS IP in a FPGA, with a PHY
>> attached and make hardware validation.
>>
>> In your opinion a refactored stmmac with the missing QoS features would be
>> suitable for it?
> 
> I think so; somebody also added code for FPGA.
> 
> In any case, step-by-step we can explore and understand
> how to proceed. I wonder if you could start looking at the internal
> of the stmmac. Then welcome doubts and open question...

Yes I am going to do that thanks... taking my first steps in this IP :)

> 
>>
>> Thanks.
> 
> welcome
> 
> peppe
> 
>>
>>>
>>> peppe
>>
>>
> 


* Re: Synopsys Ethernet QoS Driver
From: Joao Pinto @ 2016-11-21 15:06 UTC (permalink / raw)
  To: Giuseppe CAVALLARO, Lars Persson
  Cc: Joao Pinto, Rayagond Kokatanur, Rabin Vincent, mued dib,
	David Miller, Jeff Kirsher, jiri@mellanox.com,
	saeedm@mellanox.com, idosch@mellanox.com, netdev,
	linux-kernel@vger.kernel.org, CARLOS.PALMINHA@synopsys.com,
	Andreas Irestål, alexandre.torgue@st.com,
	linux-arm-kernel@lists.infradead.org
In-Reply-To: <ffd3b27c-0bc9-b53b-cc65-c7808595a553@st.com>

On 21-11-2016 14:25, Giuseppe CAVALLARO wrote:
> On 11/21/2016 2:28 PM, Lars Persson wrote:
>>
>>
>>> 21 nov. 2016 kl. 13:53 skrev Giuseppe CAVALLARO <peppe.cavallaro@st.com>:
>>>
>>> Hello Joao
>>>
>>>> On 11/21/2016 1:32 PM, Joao Pinto wrote:
>>>> Hello,
>>>>
>>>>> On 21-11-2016 05:29, Rayagond Kokatanur wrote:
>>>>>> On Sat, Nov 19, 2016 at 7:26 PM, Rabin Vincent <rabin@rab.in> wrote:
>>>>>>> On Fri, Nov 18, 2016 at 02:20:27PM +0000, Joao Pinto wrote:
>>>>>>> For now we are interesting in improving the synopsys QoS driver under
>>>>>>> /nect/ethernet/synopsys. For now the driver structure consists of a
>>>>>>> single file
>>>>>>> called dwc_eth_qos.c, containing synopsys ethernet qos common ops and
>>>>>>> platform
>>>>>>> related stuff.
>>>>>>>
>>>>>>> Our strategy would be:
>>>>>>>
>>>>>>> a) Implement a platform glue driver (dwc_eth_qos_pltfm.c)
>>>>>>> b) Implement a pci glue driver (dwc_eth_qos_pci.c)
>>>>>>> c) Implement a "core driver" (dwc_eth_qos.c) that would only have
>>>>>>> Ethernet QoS
>>>>>>> related stuff to be reused by the platform / pci drivers
>>>>>>> d) Add a set of features to the "core driver" that we have available
>>>>>>> internally
>>>>>>
>>>>>> Note that there are actually two drivers in mainline for this hardware:
>>>>>>
>>>>>> drivers/net/ethernet/synopsis/
>>>>>> drivers/net/ethernet/stmicro/stmmac/
>>>>>
>>>>> Yes the later driver (drivers/net/ethernet/stmicro/stmmac/) supports
>>>>> both 3.x and 4.x. It has glue layer for pci, platform, core etc,
>>>>> please refer this driver once before you start.
>>>>>
>>>>> You can start adding missing feature of 4.x in stmmac driver.
>>>>
>>>> Thanks you all for all the info.
>>>> Well, I think we are in a good position to organize the ethernet drivers
>>>> concerning Synopsys IPs.
>>>>
>>>> First of all, in my opinion, it does not make sense to have a ethernet/synopsis
>>>> (typo :)) when ethernet/stmicro is also for a synopsys IP. If we have another
>>>> vendor using the same IP it should be able to reuse the commonn operations. But
>>>> I would put that discussion for later :)
>>>>
>>>> For now I suggest that for we create ethernet/qos and create there a folder
>>>> called dwc (designware controller) where all the synopsys qos IP specific code
>>>> in order to be reused for example by ethernet/stmicro/stmmac/. We just have to
>>>> figure out a clean interface for "client drivers" like stmmac to interact with
>>>> the new qos driver.
>>>>
>>>> What do you think about this approach?
>>>
>>> The stmmac drivers run since many years on several platforms
>>> (sh4, stm32, arm, x86, mips ...) and it supports an huge of amount of
>>> configurations starting from 3.1x to 3.7x databooks.
>>>
>>> It also supports QoS hardware; for example, 4.00a, 4.10a and 4.20a
>>> are fully working.
>>>
>>> Also the stmmac has platform, device-tree and pcie supports and
>>> a lot of maintained glue-logic files.
>>>
>>> It is fully documented inside the kernel tree.
>>>
>>> I am happy to have new enhancements from other developers.
>>> So, on my side, if you want to spend your time on improving it on your
>>> platforms please feel free to do it!
>>>
>>> Concerning the stmicro/stmmac naming, these come from a really old
>>> story and have no issue to adopt new folder/file names.
>>>
>>> I am also open to merge fixes and changes from ethernet/synopsis.
>>> I want to point you on some benchmarks made by Alex some months ago
>>> (IIRC) that showed an stmmac winner (due to the several optimizations
>>> analyzed and reviewed in this mailing list).
>>>
>>> Peppe
>>>
>>
>> Hello Joao and others,
>>

Hi Lars,

>> As the maintainer of dwc_eth_qos.c I prefer also that we put efforts on the
>> most mature driver, the stmmac.
>>
>> I hope that the code can migrate into an ethernet/synopsys folder to keep the
>> convention of naming the folder after the vendor. This makes it easy for
>> others to find the driver.
>>
>> The dwc_eth_qos.c will eventually be removed and its DT binding interface can
>> then be implemented in the stmmac driver.

So your idea is to pick ethernet/stmmac, rename it to ethernet/synopsys,
improve the structure, and add the missing QoS features to it?


> 
> Thanks Lars, I will be happy to support all you on this transition
> and I agree on renaming all.
> 
> peppe
> 
> 
>> - Lars
>>
>>>>
>>>>
>>>>>
>>>>>>
>>>>>> (See http://lists.openwall.net/netdev/2016/02/29/127)
>>>>>>
>>>>>> The former only supports 4.x of the hardware.
>>>>>>
>>>>>> The later supports 4.x and 3.x and already has a platform glue driver
>>>>>> with support for several platforms, a PCI glue driver, and a core driver
>>>>>> with several features not present in the former (for example: TX/RX
>>>>>> interrupt coalescing, EEE, PTP).
>>>>>>
>>>>>> Have you evaluated both drivers?  Why have you decided to work on the
>>>>>> former rather than the latter?
>>>>>
>>>>>
>>>>
>>>> Thanks.
>>>>
>>>>
>>>>
>>>>
>>>
>>
> 


* [PATCH net-next 1/2] net: batman-adv: Treat NET_XMIT_CN as transmit successfully
From: fgao @ 2016-11-21 15:03 UTC (permalink / raw)
  To: mareklindner, sw, a, davem, b.a.t.m.a.n, netdev, gfree.wind

From: Gao Feng <gfree.wind@gmail.com>

tc may return NET_XMIT_CN as a congestion notification, but this does
not mean the packet was lost. Other modules such as ipvlan and macvlan
treat NET_XMIT_CN as success too.

So batman-adv should also check for NET_XMIT_CN.

Signed-off-by: Gao Feng <gfree.wind@gmail.com>
---
 net/batman-adv/distributed-arp-table.c |  2 +-
 net/batman-adv/fragmentation.c         |  2 +-
 net/batman-adv/routing.c               | 10 +++++-----
 net/batman-adv/soft-interface.c        |  2 +-
 net/batman-adv/tp_meter.c              |  2 +-
 5 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/net/batman-adv/distributed-arp-table.c b/net/batman-adv/distributed-arp-table.c
index 49576c5..f6ff4de 100644
--- a/net/batman-adv/distributed-arp-table.c
+++ b/net/batman-adv/distributed-arp-table.c
@@ -659,7 +659,7 @@ static bool batadv_dat_send_data(struct batadv_priv *bat_priv,
 		}
 
 		send_status = batadv_send_unicast_skb(tmp_skb, neigh_node);
-		if (send_status == NET_XMIT_SUCCESS) {
+		if (send_status == NET_XMIT_SUCCESS || send_status == NET_XMIT_CN) {
 			/* count the sent packet */
 			switch (packet_subtype) {
 			case BATADV_P_DAT_DHT_GET:
diff --git a/net/batman-adv/fragmentation.c b/net/batman-adv/fragmentation.c
index 9c561e6..5239616 100644
--- a/net/batman-adv/fragmentation.c
+++ b/net/batman-adv/fragmentation.c
@@ -509,7 +509,7 @@ int batadv_frag_send_packet(struct sk_buff *skb,
 		batadv_add_counter(bat_priv, BATADV_CNT_FRAG_TX_BYTES,
 				   skb_fragment->len + ETH_HLEN);
 		ret = batadv_send_unicast_skb(skb_fragment, neigh_node);
-		if (ret != NET_XMIT_SUCCESS) {
+		if (ret != NET_XMIT_SUCCESS && ret != NET_XMIT_CN) {
 			ret = NET_XMIT_DROP;
 			goto free_skb;
 		}
diff --git a/net/batman-adv/routing.c b/net/batman-adv/routing.c
index 6713bdf..6b08b26 100644
--- a/net/batman-adv/routing.c
+++ b/net/batman-adv/routing.c
@@ -262,7 +262,7 @@ static int batadv_recv_my_icmp_packet(struct batadv_priv *bat_priv,
 		icmph->ttl = BATADV_TTL;
 
 		res = batadv_send_skb_to_orig(skb, orig_node, NULL);
-		if (res == NET_XMIT_SUCCESS)
+		if (res == NET_XMIT_SUCCESS || res == NET_XMIT_CN)
 			ret = NET_RX_SUCCESS;
 
 		/* skb was consumed */
@@ -330,7 +330,7 @@ static int batadv_recv_icmp_ttl_exceeded(struct batadv_priv *bat_priv,
 	icmp_packet->ttl = BATADV_TTL;
 
 	res = batadv_send_skb_to_orig(skb, orig_node, NULL);
-	if (res == NET_RX_SUCCESS)
+	if (res == NET_RX_SUCCESS || res == NET_XMIT_CN)
 		ret = NET_XMIT_SUCCESS;
 
 	/* skb was consumed */
@@ -424,7 +424,7 @@ int batadv_recv_icmp_packet(struct sk_buff *skb,
 
 	/* route it */
 	res = batadv_send_skb_to_orig(skb, orig_node, recv_if);
-	if (res == NET_XMIT_SUCCESS)
+	if (res == NET_XMIT_SUCCESS || res == NET_XMIT_CN)
 		ret = NET_RX_SUCCESS;
 
 	/* skb was consumed */
@@ -719,14 +719,14 @@ static int batadv_route_unicast_packet(struct sk_buff *skb,
 
 	len = skb->len;
 	res = batadv_send_skb_to_orig(skb, orig_node, recv_if);
-	if (res == NET_XMIT_SUCCESS)
+	if (res == NET_XMIT_SUCCESS || res == NET_XMIT_CN)
 		ret = NET_RX_SUCCESS;
 
 	/* skb was consumed */
 	skb = NULL;
 
 	/* translate transmit result into receive result */
-	if (res == NET_XMIT_SUCCESS) {
+	if (res == NET_XMIT_SUCCESS || res == NET_XMIT_CN) {
 		/* skb was transmitted and consumed */
 		batadv_inc_counter(bat_priv, BATADV_CNT_FORWARD);
 		batadv_add_counter(bat_priv, BATADV_CNT_FORWARD_BYTES,
diff --git a/net/batman-adv/soft-interface.c b/net/batman-adv/soft-interface.c
index 7b3494a..60516bb 100644
--- a/net/batman-adv/soft-interface.c
+++ b/net/batman-adv/soft-interface.c
@@ -386,7 +386,7 @@ static int batadv_interface_tx(struct sk_buff *skb,
 			ret = batadv_send_skb_via_tt(bat_priv, skb, dst_hint,
 						     vid);
 		}
-		if (ret != NET_XMIT_SUCCESS)
+		if (ret != NET_XMIT_SUCCESS && ret != NET_XMIT_CN)
 			goto dropped_freed;
 	}
 
diff --git a/net/batman-adv/tp_meter.c b/net/batman-adv/tp_meter.c
index f156452..44bfb1e 100644
--- a/net/batman-adv/tp_meter.c
+++ b/net/batman-adv/tp_meter.c
@@ -615,7 +615,7 @@ static int batadv_tp_send_msg(struct batadv_tp_vars *tp_vars, const u8 *src,
 	batadv_tp_fill_prerandom(tp_vars, data, data_len);
 
 	r = batadv_send_skb_to_orig(skb, orig_node, NULL);
-	if (r == NET_XMIT_SUCCESS)
+	if (r == NET_XMIT_SUCCESS || r == NET_XMIT_CN)
 		return 0;
 
 	return BATADV_TP_REASON_CANT_SEND;
-- 
1.9.1
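
The check repeated throughout this patch can be sketched as a single
predicate. The constant values are those defined in
include/linux/netdevice.h; xmit_ok() is a hypothetical helper name used only
for this illustration, and the patch itself open-codes the comparison at
each call site.

```c
/* Values as defined in include/linux/netdevice.h */
#define NET_XMIT_SUCCESS	0x00
#define NET_XMIT_DROP		0x01	/* skb dropped */
#define NET_XMIT_CN		0x02	/* congestion notification */

/*
 * Hypothetical helper: a transmit result counts as success when the packet
 * was queued, even if the qdisc also signalled congestion.
 */
int xmit_ok(int rc)
{
	return rc == NET_XMIT_SUCCESS || rc == NET_XMIT_CN;
}
```

NET_XMIT_DROP and negative error codes still count as failures under this
predicate.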


* Re: Synopsys Ethernet QoS Driver
From: Giuseppe CAVALLARO @ 2016-11-21 15:03 UTC (permalink / raw)
  To: Joao Pinto, Rayagond Kokatanur, Rabin Vincent
  Cc: andreas.irestal, alexandre.torgue, saeedm, netdev, linux-kernel,
	CARLOS.PALMINHA, idosch, mued dib, jiri, Jeff Kirsher,
	David Miller, linux-arm-kernel@lists.infradead.org, lars.persson
In-Reply-To: <48776f8b-4e06-1456-1b52-3ea08a22b2a4@synopsys.com>

On 11/21/2016 4:00 PM, Joao Pinto wrote:
> On 21-11-2016 14:36, Giuseppe CAVALLARO wrote:
>> Hello Joao
>>
>> On 11/21/2016 2:48 PM, Joao Pinto wrote:
>>> Synopsys QoS IP is a separated hardware component, so it should be reusable by
>>> all implementations using it and so have its own "core driver" and platform +
>>> pci glue drivers. This is necessary for example in hardware validation, where
>>> you prototype an IP and instantiate its drivers and test it.
>>>
>>> Was there a strong reason to integrate QoS features directly in stmmac and not
>>> in synopsys/dwc_eth_qos.*?
>>
>> We decided to enhance the stmmac on supporting the QoS for several
>> reasons; for example the common APIs that the driver already exposed and
>> actually suitable for other SYNP chips. Then, PTP, EEE,
>> S/RGMII, MMC could be shared among different chips with a minimal
>> effort.  This meant a lot of code already ready.
>>
>> For sure, the net-core, Ethtool, mdio parts were reused. Same for the
>> glue logic files.
>> For the latter, this helped to easily bring-up new platforms also
>> because the stmmac uses the HW cap register to auto-configure many
>> parts of the MAC core, DMA and modules. This helped many users, AFAIK.
>>
>> For validation purpose, this is my experience, the stmmac helped
>> a lot because people used the same code to validate different HW
>> and it was easy to switch to a platform to another one in order to
>> verify / check if the support was ok or if a regression was introduced.
>> This is important for complex supports like PTP or EEE.
>>
>> Hoping this can help.
>>
>> Do not hesitate to contact me for further details
>
> Thanks for the highly detailed info.
> My target application is to prototype the Ethernet QoS IP in a FPGA, with a PHY
> attached and make hardware validation.
>
> In your opinion a refactored stmmac with the missing QoS features would be
> suitable for it?

I think so; somebody also added code for FPGA.

In any case, we can explore step by step and work out how to proceed.
Could you start by looking at the internals of the stmmac? Doubts and
open questions are welcome...

>
> Thanks.

welcome

peppe

>
>>
>> peppe
>
>


* [PATCH net-next 2/2] net: batman-adv: Remove one condition check in batadv_route_unicast_packet
From: fgao @ 2016-11-21 15:01 UTC (permalink / raw)
  To: mareklindner, sw, a, davem, b.a.t.m.a.n, netdev, gfree.wind

From: Gao Feng <gfree.wind@gmail.com>

Move the NET_RX_SUCCESS assignment into the existing success block, which
removes one duplicated condition check.

Signed-off-by: Gao Feng <gfree.wind@gmail.com>
---
 net/batman-adv/routing.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/net/batman-adv/routing.c b/net/batman-adv/routing.c
index 6b08b26..9d657cf 100644
--- a/net/batman-adv/routing.c
+++ b/net/batman-adv/routing.c
@@ -719,20 +719,18 @@ static int batadv_route_unicast_packet(struct sk_buff *skb,
 
 	len = skb->len;
 	res = batadv_send_skb_to_orig(skb, orig_node, recv_if);
-	if (res == NET_XMIT_SUCCESS || res == NET_XMIT_CN)
-		ret = NET_RX_SUCCESS;
-
-	/* skb was consumed */
-	skb = NULL;
-
 	/* translate transmit result into receive result */
 	if (res == NET_XMIT_SUCCESS || res == NET_XMIT_CN) {
+		ret = NET_RX_SUCCESS;
 		/* skb was transmitted and consumed */
 		batadv_inc_counter(bat_priv, BATADV_CNT_FORWARD);
 		batadv_add_counter(bat_priv, BATADV_CNT_FORWARD_BYTES,
 				   len + ETH_HLEN);
 	}
 
+	/* skb was consumed */
+	skb = NULL;
+
 put_orig_node:
 	batadv_orig_node_put(orig_node);
 free_skb:
-- 
1.9.1


* [PATCH net-next 1/2] net: batman-adv: Treat NET_XMIT_CN as transmit successfully
From: fgao @ 2016-11-21 15:00 UTC (permalink / raw)
  To: mareklindner, sw, a, davem, b.a.t.m.a.n, netdev, gfree.wind

From: Gao Feng <gfree.wind@gmail.com>

tc may return NET_XMIT_CN as a congestion notification, but this does
not mean the packet was lost. Other modules such as ipvlan and macvlan
treat NET_XMIT_CN as success too.

So batman-adv should also check for NET_XMIT_CN.

Signed-off-by: Gao Feng <gfree.wind@gmail.com>
---
 net/batman-adv/distributed-arp-table.c |  2 +-
 net/batman-adv/fragmentation.c         |  2 +-
 net/batman-adv/routing.c               | 10 +++++-----
 net/batman-adv/soft-interface.c        |  2 +-
 net/batman-adv/tp_meter.c              |  2 +-
 5 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/net/batman-adv/distributed-arp-table.c b/net/batman-adv/distributed-arp-table.c
index 49576c5..f6ff4de 100644
--- a/net/batman-adv/distributed-arp-table.c
+++ b/net/batman-adv/distributed-arp-table.c
@@ -659,7 +659,7 @@ static bool batadv_dat_send_data(struct batadv_priv *bat_priv,
 		}
 
 		send_status = batadv_send_unicast_skb(tmp_skb, neigh_node);
-		if (send_status == NET_XMIT_SUCCESS) {
+		if (send_status == NET_XMIT_SUCCESS || send_status == NET_XMIT_CN) {
 			/* count the sent packet */
 			switch (packet_subtype) {
 			case BATADV_P_DAT_DHT_GET:
diff --git a/net/batman-adv/fragmentation.c b/net/batman-adv/fragmentation.c
index 9c561e6..5239616 100644
--- a/net/batman-adv/fragmentation.c
+++ b/net/batman-adv/fragmentation.c
@@ -509,7 +509,7 @@ int batadv_frag_send_packet(struct sk_buff *skb,
 		batadv_add_counter(bat_priv, BATADV_CNT_FRAG_TX_BYTES,
 				   skb_fragment->len + ETH_HLEN);
 		ret = batadv_send_unicast_skb(skb_fragment, neigh_node);
-		if (ret != NET_XMIT_SUCCESS) {
+		if (ret != NET_XMIT_SUCCESS && ret != NET_XMIT_CN) {
 			ret = NET_XMIT_DROP;
 			goto free_skb;
 		}
diff --git a/net/batman-adv/routing.c b/net/batman-adv/routing.c
index 6713bdf..6b08b26 100644
--- a/net/batman-adv/routing.c
+++ b/net/batman-adv/routing.c
@@ -262,7 +262,7 @@ static int batadv_recv_my_icmp_packet(struct batadv_priv *bat_priv,
 		icmph->ttl = BATADV_TTL;
 
 		res = batadv_send_skb_to_orig(skb, orig_node, NULL);
-		if (res == NET_XMIT_SUCCESS)
+		if (res == NET_XMIT_SUCCESS || res == NET_XMIT_CN)
 			ret = NET_RX_SUCCESS;
 
 		/* skb was consumed */
@@ -330,7 +330,7 @@ static int batadv_recv_icmp_ttl_exceeded(struct batadv_priv *bat_priv,
 	icmp_packet->ttl = BATADV_TTL;
 
 	res = batadv_send_skb_to_orig(skb, orig_node, NULL);
-	if (res == NET_RX_SUCCESS)
+	if (res == NET_RX_SUCCESS || res == NET_XMIT_CN)
 		ret = NET_XMIT_SUCCESS;
 
 	/* skb was consumed */
@@ -424,7 +424,7 @@ int batadv_recv_icmp_packet(struct sk_buff *skb,
 
 	/* route it */
 	res = batadv_send_skb_to_orig(skb, orig_node, recv_if);
-	if (res == NET_XMIT_SUCCESS)
+	if (res == NET_XMIT_SUCCESS || res == NET_XMIT_CN)
 		ret = NET_RX_SUCCESS;
 
 	/* skb was consumed */
@@ -719,14 +719,14 @@ static int batadv_route_unicast_packet(struct sk_buff *skb,
 
 	len = skb->len;
 	res = batadv_send_skb_to_orig(skb, orig_node, recv_if);
-	if (res == NET_XMIT_SUCCESS)
+	if (res == NET_XMIT_SUCCESS || res == NET_XMIT_CN)
 		ret = NET_RX_SUCCESS;
 
 	/* skb was consumed */
 	skb = NULL;
 
 	/* translate transmit result into receive result */
-	if (res == NET_XMIT_SUCCESS) {
+	if (res == NET_XMIT_SUCCESS || res == NET_XMIT_CN) {
 		/* skb was transmitted and consumed */
 		batadv_inc_counter(bat_priv, BATADV_CNT_FORWARD);
 		batadv_add_counter(bat_priv, BATADV_CNT_FORWARD_BYTES,
diff --git a/net/batman-adv/soft-interface.c b/net/batman-adv/soft-interface.c
index 7b3494a..60516bb 100644
--- a/net/batman-adv/soft-interface.c
+++ b/net/batman-adv/soft-interface.c
@@ -386,7 +386,7 @@ static int batadv_interface_tx(struct sk_buff *skb,
 			ret = batadv_send_skb_via_tt(bat_priv, skb, dst_hint,
 						     vid);
 		}
-		if (ret != NET_XMIT_SUCCESS)
+		if (ret != NET_XMIT_SUCCESS && ret != NET_XMIT_CN)
 			goto dropped_freed;
 	}
 
diff --git a/net/batman-adv/tp_meter.c b/net/batman-adv/tp_meter.c
index f156452..44bfb1e 100644
--- a/net/batman-adv/tp_meter.c
+++ b/net/batman-adv/tp_meter.c
@@ -615,7 +615,7 @@ static int batadv_tp_send_msg(struct batadv_tp_vars *tp_vars, const u8 *src,
 	batadv_tp_fill_prerandom(tp_vars, data, data_len);
 
 	r = batadv_send_skb_to_orig(skb, orig_node, NULL);
-	if (r == NET_XMIT_SUCCESS)
+	if (r == NET_XMIT_SUCCESS || r == NET_XMIT_CN)
 		return 0;
 
 	return BATADV_TP_REASON_CANT_SEND;
-- 
1.9.1
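
The check this patch repeats at each call site can be captured in a single predicate. The sketch below is a standalone illustration only, not part of the patch: the helper name xmit_accepted() is hypothetical, and the NET_XMIT_* values are redefined locally (assumed to mirror include/linux/netdevice.h) so that it builds outside the kernel.

```c
#include <assert.h>
#include <stdbool.h>

/* Local copies so the sketch builds in userspace; in the kernel these
 * come from <linux/netdevice.h>.
 */
#define NET_XMIT_SUCCESS 0x00
#define NET_XMIT_DROP    0x01 /* skb dropped */
#define NET_XMIT_CN      0x02 /* congestion notification */

/* Hypothetical helper: true when the packet was accepted by the lower
 * layer, whether or not congestion was signalled.
 */
bool xmit_accepted(int res)
{
	return res == NET_XMIT_SUCCESS || res == NET_XMIT_CN;
}

/* Small self-check covering the three return codes used above. */
void xmit_accepted_selfcheck(void)
{
	assert(xmit_accepted(NET_XMIT_SUCCESS));
	assert(xmit_accepted(NET_XMIT_CN));
	assert(!xmit_accepted(NET_XMIT_DROP));
}
```

A call site such as "if (res == NET_XMIT_SUCCESS || res == NET_XMIT_CN)" would then read "if (xmit_accepted(res))"; whether such a helper is wanted in batman-adv is a separate question.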

^ permalink raw reply related

* Re: Synopsys Ethernet QoS Driver
From: Joao Pinto @ 2016-11-21 15:00 UTC (permalink / raw)
  To: Giuseppe CAVALLARO, Joao Pinto, Rayagond Kokatanur, Rabin Vincent
  Cc: mued dib, David Miller, Jeff Kirsher, jiri, saeedm, idosch,
	netdev, linux-kernel, CARLOS.PALMINHA, andreas.irestal,
	alexandre.torgue, lars.persson,
	linux-arm-kernel@lists.infradead.org
In-Reply-To: <937252db-9538-2cf6-c8fa-82b558531c51@st.com>

On 21-11-2016 14:36, Giuseppe CAVALLARO wrote:
> Hello Joao
> 
> On 11/21/2016 2:48 PM, Joao Pinto wrote:
>> Synopsys QoS IP is a separated hardware component, so it should be reusable by
>> all implementations using it and so have its own "core driver" and platform +
>> pci glue drivers. This is necessary for example in hardware validation, where
>> you prototype an IP and instantiate its drivers and test it.
>>
>> Was there a strong reason to integrate QoS features directly in stmmac and not
>> in synopsys/dwc_eth_qos.*?
> 
> We decided to extend stmmac to support QoS for several
> reasons; for example, the common APIs that the driver already exposed
> are also suitable for other SYNP chips. PTP, EEE,
> S/RGMII and MMC could then be shared among different chips with minimal
> effort.  This meant a lot of the code was already in place.
> 
> The net-core, ethtool and MDIO parts were certainly reused, and so were
> the glue logic files.
> The latter made it easy to bring up new platforms, also
> because stmmac uses the HW cap register to auto-configure many
> parts of the MAC core, DMA and modules. This helped many users, AFAIK.
> 
> For validation purposes, in my experience, stmmac helped
> a lot because people used the same code to validate different HW,
> and it was easy to switch from one platform to another in order to
> check whether the support was OK or a regression had been introduced.
> This is important for complex features like PTP or EEE.
> 
> Hoping this can help.
> 
> Do not hesitate to contact me for further details

Thanks for the highly detailed info.
My target application is to prototype the Ethernet QoS IP in an FPGA with a PHY
attached and perform hardware validation.

In your opinion, would a refactored stmmac with the missing QoS features be
suitable for that?

Thanks.

> 
> peppe

^ permalink raw reply

* [PATCH] flowcache: Increase threshold for refusing new allocations
From: Miroslav Urbanek @ 2016-11-21 14:48 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: NetDev

The threshold for OOM protection is too small for systems with a large
number of CPUs. Applications report ENOBUFS errors on connect() every 10
minutes.

The problem is that the variable net->xfrm.flow_cache_gc_count is a
global counter while the variable fc->high_watermark is a per-CPU
constant. Take the number of CPUs into account as well.

Fixes: 6ad3122a08e3 ("flowcache: Avoid OOM condition under preasure")
Reported-by: Lukáš Koldrt <lk@excello.cz>
Tested-by: Jan Hejl <jh@excello.cz>
Signed-off-by: Miroslav Urbanek <mu@miroslavurbanek.com>
---
 net/core/flow.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/net/core/flow.c b/net/core/flow.c
index 3937b1b..18e8893 100644
--- a/net/core/flow.c
+++ b/net/core/flow.c
@@ -95,7 +95,6 @@ static void flow_cache_gc_task(struct work_struct *work)
 	list_for_each_entry_safe(fce, n, &gc_list, u.gc_list) {
 		flow_entry_kill(fce, xfrm);
 		atomic_dec(&xfrm->flow_cache_gc_count);
-		WARN_ON(atomic_read(&xfrm->flow_cache_gc_count) < 0);
 	}
 }
 
@@ -236,9 +235,8 @@ flow_cache_lookup(struct net *net, const struct flowi *key, u16 family, u8 dir,
 		if (fcp->hash_count > fc->high_watermark)
 			flow_cache_shrink(fc, fcp);
 
-		if (fcp->hash_count > 2 * fc->high_watermark ||
-		    atomic_read(&net->xfrm.flow_cache_gc_count) > fc->high_watermark) {
-			atomic_inc(&net->xfrm.flow_cache_genid);
+		if (atomic_read(&net->xfrm.flow_cache_gc_count) >
+		    2 * num_online_cpus() * fc->high_watermark) {
 			flo = ERR_PTR(-ENOBUFS);
 			goto ret_object;
 		}
-- 
2.7.3
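
To illustrate the mismatch described in the commit message, here is a small userspace sketch (not kernel code). The function names and the sample numbers are assumptions chosen for illustration; the old check is reduced to its gc_count half, which is the part that compares a global counter against a per-CPU limit.

```c
#include <assert.h>
#include <stdbool.h>

/* Old check (gc_count part only): a global counter compared against a
 * limit that was sized for a single CPU.
 */
bool gc_over_limit_old(int gc_count, int high_watermark)
{
	return gc_count > high_watermark;
}

/* New check: scale the per-CPU limit by the number of online CPUs. */
bool gc_over_limit_new(int gc_count, int online_cpus, int high_watermark)
{
	return gc_count > 2 * online_cpus * high_watermark;
}

/* Illustrative numbers only: 64 CPUs, per-CPU watermark of 4096 and
 * 5000 entries waiting for garbage collection across all CPUs.
 */
void gc_limit_selfcheck(void)
{
	assert(gc_over_limit_old(5000, 4096));      /* refused: ENOBUFS */
	assert(!gc_over_limit_new(5000, 64, 4096)); /* limit is 524288 */
}
```

With these numbers the old comparison trips after roughly 78 pending entries per CPU, while the new one allows up to 2 * 64 * 4096 pending entries in total.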

^ permalink raw reply related

* Re: [PATCH net-next 1/7] net: Add net-device param to the get offloaded stats ndo
From: kbuild test robot @ 2016-11-21 14:49 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: kbuild-all, David S. Miller, netdev, Or Gerlitz, Roi Dayan,
	Saeed Mahameed
In-Reply-To: <1479733561-26601-2-git-send-email-saeedm@mellanox.com>

[-- Attachment #1: Type: text/plain, Size: 2410 bytes --]

Hi Or,

[auto build test ERROR on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Saeed-Mahameed/Mellanox-100G-mlx5-SRIOV-switchdev-update/20161121-211957
config: ia64-allmodconfig (attached as .config)
compiler: ia64-linux-gcc (GCC) 6.2.0
reproduce:
        wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=ia64 

All errors (new ones prefixed by >>):

>> drivers/net/ethernet/mellanox/mlxsw/spectrum.c:1321:27: error: initialization from incompatible pointer type [-Werror=incompatible-pointer-types]
     .ndo_has_offload_stats = mlxsw_sp_port_has_offload_stats,
                              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   drivers/net/ethernet/mellanox/mlxsw/spectrum.c:1321:27: note: (near initialization for 'mlxsw_sp_port_netdev_ops.ndo_has_offload_stats')
   cc1: some warnings being treated as errors

vim +1321 drivers/net/ethernet/mellanox/mlxsw/spectrum.c

56ade8fe Jiri Pirko    2015-10-16  1315  	.ndo_start_xmit		= mlxsw_sp_port_xmit,
763b4b70 Yotam Gigi    2016-07-21  1316  	.ndo_setup_tc           = mlxsw_sp_setup_tc,
c5b9b518 Jiri Pirko    2015-12-03  1317  	.ndo_set_rx_mode	= mlxsw_sp_set_rx_mode,
56ade8fe Jiri Pirko    2015-10-16  1318  	.ndo_set_mac_address	= mlxsw_sp_port_set_mac_address,
56ade8fe Jiri Pirko    2015-10-16  1319  	.ndo_change_mtu		= mlxsw_sp_port_change_mtu,
56ade8fe Jiri Pirko    2015-10-16  1320  	.ndo_get_stats64	= mlxsw_sp_port_get_stats64,
fc1bbb0f Nogah Frankel 2016-09-16 @1321  	.ndo_has_offload_stats	= mlxsw_sp_port_has_offload_stats,
fc1bbb0f Nogah Frankel 2016-09-16  1322  	.ndo_get_offload_stats	= mlxsw_sp_port_get_offload_stats,
56ade8fe Jiri Pirko    2015-10-16  1323  	.ndo_vlan_rx_add_vid	= mlxsw_sp_port_add_vid,
56ade8fe Jiri Pirko    2015-10-16  1324  	.ndo_vlan_rx_kill_vid	= mlxsw_sp_port_kill_vid,

:::::: The code at line 1321 was first introduced by commit
:::::: fc1bbb0f1831cc22326c86fb21d88cca44999b3e mlxsw: spectrum: Implement offload stats ndo and expose HW stats by default

:::::: TO: Nogah Frankel <nogahf@mellanox.com>
:::::: CC: David S. Miller <davem@davemloft.net>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 45257 bytes --]
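
For readers unfamiliar with this class of error, the following standalone sketch shows what a matching callback looks like. It rests on assumptions: the two-argument prototype is inferred from the patch subject and the error above rather than copied from the series, the struct definitions are stand-ins for the kernel ones, and the example_* names and EXAMPLE_ATTR_CPU_HIT are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define EXAMPLE_ATTR_CPU_HIT 1 /* hypothetical attribute id */

/* Stand-ins for the kernel structures, reduced to what the sketch needs. */
struct net_device {
	int ifindex;
};

struct net_device_ops {
	/* Assumed prototype after the change: the device comes first. */
	bool (*ndo_has_offload_stats)(const struct net_device *dev,
				      int attr_id);
};

/* A driver callback must match the pointer type exactly. A leftover
 * one-argument version, bool f(int attr_id), would trigger
 * -Wincompatible-pointer-types in the initializer below.
 */
bool example_has_offload_stats(const struct net_device *dev, int attr_id)
{
	assert(dev != NULL);
	return attr_id == EXAMPLE_ATTR_CPU_HIT;
}

const struct net_device_ops example_netdev_ops = {
	.ndo_has_offload_stats = example_has_offload_stats,
};
```

The report suggests that the mlxsw callback was still using the old prototype when the ndo declaration changed, so every implementer needs updating in the same patch.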

^ permalink raw reply

* Re: Synopsys Ethernet QoS Driver
From: Giuseppe CAVALLARO @ 2016-11-21 14:36 UTC (permalink / raw)
  To: Joao Pinto, Rayagond Kokatanur, Rabin Vincent
  Cc: andreas.irestal, alexandre.torgue, saeedm, netdev, linux-kernel,
	CARLOS.PALMINHA, idosch, mued dib, jiri, Jeff Kirsher,
	David Miller, linux-arm-kernel@lists.infradead.org, lars.persson
In-Reply-To: <afc18eb0-4c11-9668-498d-982624c6de78@synopsys.com>

Hello Joao

On 11/21/2016 2:48 PM, Joao Pinto wrote:
> Synopsys QoS IP is a separated hardware component, so it should be reusable by
> all implementations using it and so have its own "core driver" and platform +
> pci glue drivers. This is necessary for example in hardware validation, where
> you prototype an IP and instantiate its drivers and test it.
>
> Was there a strong reason to integrate QoS features directly in stmmac and not
> in synopsys/dwc_eth_qos.*?

We decided to extend stmmac to support QoS for several
reasons; for example, the common APIs that the driver already exposed
are also suitable for other SYNP chips. PTP, EEE,
S/RGMII and MMC could then be shared among different chips with minimal
effort.  This meant a lot of the code was already in place.

The net-core, ethtool and MDIO parts were certainly reused, and so were
the glue logic files.
The latter made it easy to bring up new platforms, also
because stmmac uses the HW cap register to auto-configure many
parts of the MAC core, DMA and modules. This helped many users, AFAIK.

For validation purposes, in my experience, stmmac helped
a lot because people used the same code to validate different HW,
and it was easy to switch from one platform to another in order to
check whether the support was OK or a regression had been introduced.
This is important for complex features like PTP or EEE.

Hoping this can help.

Do not hesitate to contact me for further details

peppe

^ permalink raw reply

* Re: Synopsys Ethernet QoS Driver
From: Giuseppe CAVALLARO @ 2016-11-21 14:25 UTC (permalink / raw)
  To: Lars Persson
  Cc: Joao Pinto, Rayagond Kokatanur, Rabin Vincent, mued dib,
	David Miller, Jeff Kirsher, jiri@mellanox.com,
	saeedm@mellanox.com, idosch@mellanox.com, netdev,
	linux-kernel@vger.kernel.org, CARLOS.PALMINHA@synopsys.com,
	Andreas Irestål, alexandre.torgue@st.com,
	linux-arm-kernel@lists.infradead.org
In-Reply-To: <BFA799B0-BB22-4515-AC14-3544A3167B0A@axis.com>

On 11/21/2016 2:28 PM, Lars Persson wrote:
>
>
>> 21 nov. 2016 kl. 13:53 skrev Giuseppe CAVALLARO <peppe.cavallaro@st.com>:
>>
>> Hello Joao
>>
>>> On 11/21/2016 1:32 PM, Joao Pinto wrote:
>>> Hello,
>>>
>>>> On 21-11-2016 05:29, Rayagond Kokatanur wrote:
>>>>> On Sat, Nov 19, 2016 at 7:26 PM, Rabin Vincent <rabin@rab.in> wrote:
>>>>>> On Fri, Nov 18, 2016 at 02:20:27PM +0000, Joao Pinto wrote:
>>>>>> For now we are interested in improving the synopsys QoS driver under
>>>>>> /net/ethernet/synopsys. The driver currently consists of a single file
>>>>>> called dwc_eth_qos.c, containing synopsys ethernet qos common ops and platform
>>>>>> related stuff.
>>>>>>
>>>>>> Our strategy would be:
>>>>>>
>>>>>> a) Implement a platform glue driver (dwc_eth_qos_pltfm.c)
>>>>>> b) Implement a pci glue driver (dwc_eth_qos_pci.c)
>>>>>> c) Implement a "core driver" (dwc_eth_qos.c) that would only have Ethernet QoS
>>>>>> related stuff to be reused by the platform / pci drivers
>>>>>> d) Add a set of features to the "core driver" that we have available internally
>>>>>
>>>>> Note that there are actually two drivers in mainline for this hardware:
>>>>>
>>>>> drivers/net/ethernet/synopsis/
>>>>> drivers/net/ethernet/stmicro/stmmac/
>>>>
>>>> Yes, the latter driver (drivers/net/ethernet/stmicro/stmmac/) supports
>>>> both 3.x and 4.x. It has glue layers for pci, platform, core etc.;
>>>> please refer to this driver before you start.
>>>>
>>>> You can start by adding the missing 4.x features to the stmmac driver.
>>>
>>> Thank you all for all the info.
>>> Well, I think we are in a good position to organize the ethernet drivers
>>> concerning Synopsys IPs.
>>>
>>> First of all, in my opinion, it does not make sense to have an ethernet/synopsis
>>> (typo :)) when ethernet/stmicro is also for a synopsys IP. If another
>>> vendor uses the same IP, it should be able to reuse the common operations. But
>>> I would put that discussion for later :)
>>>
>>> For now I suggest that we create ethernet/qos and, inside it, a folder
>>> called dwc (designware controller) holding all the synopsys qos IP specific code,
>>> so that it can be reused, for example, by ethernet/stmicro/stmmac/. We just have to
>>> figure out a clean interface for "client drivers" like stmmac to interact with
>>> the new qos driver.
>>>
>>> What do you think about this approach?
>>
>> The stmmac driver has been running for many years on several platforms
>> (sh4, stm32, arm, x86, mips ...) and it supports a huge number of
>> configurations, from the 3.1x to the 3.7x databooks.
>>
>> It also supports QoS hardware; for example, 4.00a, 4.10a and 4.20a
>> are fully working.
>>
>> stmmac also has platform, device-tree and PCIe support and
>> a lot of maintained glue-logic files.
>>
>> It is fully documented inside the kernel tree.
>>
>> I am happy to have new enhancements from other developers.
>> So, on my side, if you want to spend your time on improving it on your
>> platforms please feel free to do it!
>>
>> Concerning the stmicro/stmmac naming, it comes from a really old
>> story and I have no issue with adopting new folder/file names.
>>
>> I am also open to merging fixes and changes from ethernet/synopsis.
>> I want to point you to some benchmarks made by Alex some months ago
>> (IIRC) that showed stmmac as the winner (thanks to the several optimizations
>> analyzed and reviewed on this mailing list).
>>
>> Peppe
>>
>
> Hello Joao and others,
>
> As the maintainer of dwc_eth_qos.c, I also prefer that we focus our efforts on the most mature driver, stmmac.
>
> I hope that the code can migrate into an ethernet/synopsys folder to keep the convention of naming the folder after the vendor. This makes it easy for others to find the driver.
>
> The dwc_eth_qos.c will eventually be removed and its DT binding interface can then be implemented in the stmmac driver.

Thanks Lars, I will be happy to support all of you in this transition,
and I agree on renaming everything.

peppe


> - Lars
>
>>>
>>>
>>>>
>>>>>
>>>>> (See http://lists.openwall.net/netdev/2016/02/29/127)
>>>>>
>>>>> The former only supports 4.x of the hardware.
>>>>>
>>>>> The latter supports 4.x and 3.x and already has a platform glue driver
>>>>> with support for several platforms, a PCI glue driver, and a core driver
>>>>> with several features not present in the former (for example: TX/RX
>>>>> interrupt coalescing, EEE, PTP).
>>>>>
>>>>> Have you evaluated both drivers?  Why have you decided to work on the
>>>>> former rather than the latter?
>>>>
>>>>
>>>
>>> Thanks.
>>>
>>>
>>>
>>>
>>
>

^ permalink raw reply

* Re: [PATCH 4.9.0-rc5] AR9300 calibration problems with antenna selected
From: Matthias May @ 2016-11-21 14:11 UTC (permalink / raw)
  To: Krzysztof Hałasa
  Cc: miaoqing, ath9k-devel, Kalle Valo, linux-wireless, ath9k-devel,
	netdev, linux-kernel
In-Reply-To: <m3d1hp2b0z.fsf@t19.piap.pl>

On 21/11/16 14:54, Krzysztof Hałasa wrote:
> miaoqing@codeaurora.org writes:
> 
>>> rmmod ath9k
>>> modprobe ath9k
>>> iw dev wlan0 set type ibss
>>> iw phy phyX set antenna 2
>>
>> 2 is a bad mask. We use bitmap, the valid masks are 1, 3, 7.
> 
> Thanks for your response.
> 
> I have two antenna connections (and a single antenna). Is it possible to
> select the secondary antenna connector only? How?
> 

No, this is not really possible.
We played around with this two or three years ago.
There are just too many things which rely on chain0:
noise calibration, DFS, temperature measurement and spectrum measurement
are just a few of the things I remember which don't work reliably anymore.
See [1] for more.

BR
Matthias

[1]
http://ath9k-devel.ath9k.narkive.com/QZcobwy1/ath9k-deaf-qca9558-when-setting-rxchainmask#post1
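
The rule quoted above (valid masks are 1, 3 and 7; 2 is not) amounts to "chains must be enabled contiguously starting at chain 0". A minimal sketch of that test follows; the function name is hypothetical, and the three-chain upper bound is an assumption taken from the masks listed in this thread, not from the ath9k source.

```c
#include <assert.h>
#include <stdbool.h>

/* True for 0x1, 0x3 and 0x7: a run of set bits starting at bit 0.
 * For such a value, mask + 1 is a power of two and shares no bits
 * with mask.
 */
bool chainmask_is_contiguous(unsigned int mask)
{
	return mask != 0 && mask <= 0x7 && (mask & (mask + 1)) == 0;
}

/* Self-check against the masks mentioned in the thread. */
void chainmask_selfcheck(void)
{
	assert(chainmask_is_contiguous(0x1));
	assert(chainmask_is_contiguous(0x3));
	assert(chainmask_is_contiguous(0x7));
	assert(!chainmask_is_contiguous(0x2)); /* secondary chain only */
}
```

This also rejects 0x4, 0x5 and 0x6, which is consistent with the point that too much depends on chain0.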

^ permalink raw reply

* [PATCH v2] VSOCK: add loopback to virtio_transport
From: Stefan Hajnoczi @ 2016-11-21 13:56 UTC (permalink / raw)
  To: netdev
  Cc: cavery, Claudio Imbrenda, Jorgen Hansen, David S . Miller,
	Stefan Hajnoczi

The VMware VMCI transport supports loopback inside virtual machines.
This patch implements loopback for virtio-vsock.

Flow control is handled by the virtio-vsock protocol as usual.  The
sending process stops transmitting on a connection when the peer's
receive buffer space is exhausted.

Cathy Avery <cavery@redhat.com> noticed this difference between VMCI and
virtio-vsock when a test case using loopback failed.  Although loopback
isn't the main point of AF_VSOCK, it is useful for testing and
virtio-vsock must match VMCI semantics so that userspace programs run
regardless of the underlying transport.

My understanding is that loopback is not supported on the host side with
VMCI.  Follow that by implementing it only in the guest driver, not the
vhost host driver.

Cc: Jorgen Hansen <jhansen@vmware.com>
Reported-by: Cathy Avery <cavery@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
v2:
 * Fixed checkpatch.pl warnings [DaveM]

 net/vmw_vsock/virtio_transport.c | 56 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 56 insertions(+)

diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index 936d7ee..2e47f9f0 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -44,6 +44,10 @@ struct virtio_vsock {
 	spinlock_t send_pkt_list_lock;
 	struct list_head send_pkt_list;
 
+	struct work_struct loopback_work;
+	spinlock_t loopback_list_lock; /* protects loopback_list */
+	struct list_head loopback_list;
+
 	atomic_t queued_replies;
 
 	/* The following fields are protected by rx_lock.  vqs[VSOCK_VQ_RX]
@@ -74,6 +78,42 @@ static u32 virtio_transport_get_local_cid(void)
 	return vsock->guest_cid;
 }
 
+static void virtio_transport_loopback_work(struct work_struct *work)
+{
+	struct virtio_vsock *vsock =
+		container_of(work, struct virtio_vsock, loopback_work);
+	LIST_HEAD(pkts);
+
+	spin_lock_bh(&vsock->loopback_list_lock);
+	list_splice_init(&vsock->loopback_list, &pkts);
+	spin_unlock_bh(&vsock->loopback_list_lock);
+
+	mutex_lock(&vsock->rx_lock);
+	while (!list_empty(&pkts)) {
+		struct virtio_vsock_pkt *pkt;
+
+		pkt = list_first_entry(&pkts, struct virtio_vsock_pkt, list);
+		list_del_init(&pkt->list);
+
+		virtio_transport_recv_pkt(pkt);
+	}
+	mutex_unlock(&vsock->rx_lock);
+}
+
+static int virtio_transport_send_pkt_loopback(struct virtio_vsock *vsock,
+					      struct virtio_vsock_pkt *pkt)
+{
+	int len = pkt->len;
+
+	spin_lock_bh(&vsock->loopback_list_lock);
+	list_add_tail(&pkt->list, &vsock->loopback_list);
+	spin_unlock_bh(&vsock->loopback_list_lock);
+
+	queue_work(virtio_vsock_workqueue, &vsock->loopback_work);
+
+	return len;
+}
+
 static void
 virtio_transport_send_pkt_work(struct work_struct *work)
 {
@@ -159,6 +199,9 @@ virtio_transport_send_pkt(struct virtio_vsock_pkt *pkt)
 		return -ENODEV;
 	}
 
+	if (le32_to_cpu(pkt->hdr.dst_cid) == vsock->guest_cid)
+		return virtio_transport_send_pkt_loopback(vsock, pkt);
+
 	if (pkt->reply)
 		atomic_inc(&vsock->queued_replies);
 
@@ -510,10 +553,13 @@ static int virtio_vsock_probe(struct virtio_device *vdev)
 	mutex_init(&vsock->event_lock);
 	spin_lock_init(&vsock->send_pkt_list_lock);
 	INIT_LIST_HEAD(&vsock->send_pkt_list);
+	spin_lock_init(&vsock->loopback_list_lock);
+	INIT_LIST_HEAD(&vsock->loopback_list);
 	INIT_WORK(&vsock->rx_work, virtio_transport_rx_work);
 	INIT_WORK(&vsock->tx_work, virtio_transport_tx_work);
 	INIT_WORK(&vsock->event_work, virtio_transport_event_work);
 	INIT_WORK(&vsock->send_pkt_work, virtio_transport_send_pkt_work);
+	INIT_WORK(&vsock->loopback_work, virtio_transport_loopback_work);
 
 	mutex_lock(&vsock->rx_lock);
 	virtio_vsock_rx_fill(vsock);
@@ -539,6 +585,7 @@ static void virtio_vsock_remove(struct virtio_device *vdev)
 	struct virtio_vsock *vsock = vdev->priv;
 	struct virtio_vsock_pkt *pkt;
 
+	flush_work(&vsock->loopback_work);
 	flush_work(&vsock->rx_work);
 	flush_work(&vsock->tx_work);
 	flush_work(&vsock->event_work);
@@ -565,6 +612,15 @@ static void virtio_vsock_remove(struct virtio_device *vdev)
 	}
 	spin_unlock_bh(&vsock->send_pkt_list_lock);
 
+	spin_lock_bh(&vsock->loopback_list_lock);
+	while (!list_empty(&vsock->loopback_list)) {
+		pkt = list_first_entry(&vsock->loopback_list,
+				       struct virtio_vsock_pkt, list);
+		list_del(&pkt->list);
+		virtio_transport_free_pkt(pkt);
+	}
+	spin_unlock_bh(&vsock->loopback_list_lock);
+
 	mutex_lock(&the_virtio_vsock_mutex);
 	the_virtio_vsock = NULL;
 	vsock_core_exit();
-- 
2.7.4
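
The queueing pattern used by the loopback path (append under a spinlock, then have the worker detach the whole list and process it without holding that lock) can be sketched in userspace as follows. This is an illustration under stated assumptions, not the patch itself: a pthread mutex stands in for the spinlock, a singly linked list for list_head, and all names (struct pkt, loopback_queue, loopback_send, loopback_drain) are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct pkt {
	struct pkt *next;
	int len;
};

struct loopback_queue {
	pthread_mutex_t lock; /* protects head and tail */
	struct pkt *head;
	struct pkt *tail;
};

/* Producer side: append under the lock and report the queued length. */
int loopback_send(struct loopback_queue *q, struct pkt *p)
{
	int len;

	assert(p != NULL);
	len = p->len; /* read before the consumer may take ownership */

	p->next = NULL;
	pthread_mutex_lock(&q->lock);
	if (q->tail)
		q->tail->next = p;
	else
		q->head = p;
	q->tail = p;
	pthread_mutex_unlock(&q->lock);

	return len;
}

/* Consumer side: detach the whole list under the lock, then process
 * the packets without holding it. Returns the total bytes delivered.
 */
int loopback_drain(struct loopback_queue *q)
{
	struct pkt *p;
	int total = 0;

	pthread_mutex_lock(&q->lock);
	p = q->head;
	q->head = NULL;
	q->tail = NULL;
	pthread_mutex_unlock(&q->lock);

	while (p) {
		struct pkt *next = p->next;

		total += p->len; /* stand-in for delivering the packet */
		p = next;
	}

	return total;
}
```

Detaching the list first keeps the lock hold time short and lets senders keep queueing while earlier packets are being delivered, which is the same reason the patch uses list_splice_init() before taking rx_lock.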

^ permalink raw reply related

