Netdev List


* Re: [net-next PATCH v2 4/5] virtio_net: add dedicated XDP transmit queues
From: John Fastabend @ 2016-11-21 15:56 UTC
  To: Daniel Borkmann, eric.dumazet, mst, kubakici, shm, davem,
	alexei.starovoitov
  Cc: netdev, bblanco, john.r.fastabend, brouer, tgraf
In-Reply-To: <5832DE60.4000200@iogearbox.net>

On 16-11-21 03:45 AM, Daniel Borkmann wrote:
> On 11/20/2016 03:51 AM, John Fastabend wrote:
>> XDP requires using isolated transmit queues to avoid interference
>> with normal networking stack (BQL, NETDEV_TX_BUSY, etc). This patch
>> adds an XDP queue per CPU when an XDP program is loaded and does not
>> expose the queues to the OS via the normal API call to
>> netif_set_real_num_tx_queues(). This way the stack will never push
>> an skb to these queues.
>>
>> However virtio/vhost/qemu implementation only allows for creating
>> TX/RX queue pairs at this time so creating only TX queues was not
>> possible. And because the associated RX queues are being created I
>> went ahead and exposed these to the stack and let the backend use
>> them. This creates more RX queues visible to the network stack than
>> TX queues which is worth mentioning but does not cause any issues as
>> far as I can tell.
>>
>> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
>> ---

[...]

>>       }
>>
>> +    curr_qp = vi->curr_queue_pairs - vi->xdp_queue_pairs;
>> +    if (prog)
>> +        xdp_qp = nr_cpu_ids;
>> +
>> +    /* XDP requires extra queues for XDP_TX */
>> +    if (curr_qp + xdp_qp > vi->max_queue_pairs) {
>> +        netdev_warn(dev, "request %i queues but max is %i\n",
>> +                curr_qp + xdp_qp, vi->max_queue_pairs);
>> +        return -ENOMEM;
>> +    }
>> +
>> +    err = virtnet_set_queues(vi, curr_qp + xdp_qp);
>> +    if (err) {
>> +        dev_warn(&dev->dev, "XDP Device queue allocation failure.\n");
>> +        return err;
>> +    }
>> +
>>       if (prog) {
>> -        prog = bpf_prog_add(prog, vi->max_queue_pairs - 1);
>> -        if (IS_ERR(prog))
>> +        prog = bpf_prog_add(prog, vi->max_queue_pairs);
> 
> I think this change is not correct, it would be off by one now.
> The previous 'vi->max_queue_pairs - 1' was actually correct here.
> dev_change_xdp_fd() already gives you a reference (see the doc on
> enum xdp_netdev_command in netdevice.h).


Right, this was an error; thanks for checking it. I'll send a v3, and
maybe draft a test for XDP ref counting to catch this in the future.

.John
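A minimal sketch of the counting under discussion, assuming the semantics Daniel describes: dev_change_xdp_fd() already hands the driver one program reference, and each of the max_queue_pairs RX queues must end up holding exactly one. The helper names are hypothetical and model only the arithmetic, not virtio_net itself; assertions like the ones in a ref-counting test would target exactly these numbers.

```c
#include <stdbool.h>

/* Hypothetical model of the virtio_net XDP setup arithmetic; not driver code. */

/* Does the device have enough queue pairs for the stack's pairs plus one
 * XDP TX pair per CPU (only needed while a program is attached)? */
static bool xdp_queues_fit(unsigned int curr_qp, unsigned int nr_cpus,
                           bool prog_attached, unsigned int max_qp)
{
    unsigned int xdp_qp = prog_attached ? nr_cpus : 0;

    return curr_qp + xdp_qp <= max_qp;
}

/* References to add with bpf_prog_add(): the caller already holds one,
 * which covers the first RX queue. Assumes max_qp >= 1. */
static unsigned int xdp_extra_refs(unsigned int max_qp)
{
    return max_qp - 1;
}

/* References held once every RX queue points at the program. */
static unsigned int xdp_total_refs(unsigned int max_qp)
{
    return 1 /* from the caller */ + xdp_extra_refs(max_qp);
}
```

In this model, with 8 queue pairs the v2 call bpf_prog_add(prog, 8) would leave 9 references for 8 queues, one more than teardown ever drops.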

* Re: [RFC PATCH net v2 2/3] dt: bindings: add ethernet phy eee-disable-advert option documentation
From: Andrew Lunn @ 2016-11-21 16:01 UTC
  To: Jerome Brunet
  Cc: netdev, devicetree, Florian Fainelli, Alexandre TORGUE,
	Neil Armstrong, Martin Blumenstingl, Kevin Hilman, linux-kernel,
	Andre Roth, linux-amlogic, Carlo Caione, Giuseppe Cavallaro,
	linux-arm-kernel
In-Reply-To: <1479742524-30222-3-git-send-email-jbrunet@baylibre.com>

On Mon, Nov 21, 2016 at 04:35:23PM +0100, Jerome Brunet wrote:
> Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
> ---
>  Documentation/devicetree/bindings/net/phy.txt | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/net/phy.txt b/Documentation/devicetree/bindings/net/phy.txt
> index bc1c3c8bf8fa..7f066b7c1e2c 100644
> --- a/Documentation/devicetree/bindings/net/phy.txt
> +++ b/Documentation/devicetree/bindings/net/phy.txt
> @@ -35,6 +35,11 @@ Optional Properties:
>  - broken-turn-around: If set, indicates the PHY device does not correctly
>    release the turn around line low at the end of a MDIO transaction.
>  
> +- eee-advert-disable: Bits to clear in the MDIO_AN_EEE_ADV register to
> +  disable EEE modes. Example
> +    * 0x4: disable EEE for 1000T,
> +    * 0x6: disable EEE for 100TX and 1000T
> +

Hi Jerome

I like the direction this patchset is taking. But hex values are
pretty unfriendly. Please add a set of boolean properties, and do the
mapping to hex in the C code.

That would also make extending this API easier. e.g. say you have a
10Gbps PHY with EEE, and you need to disable it. This hex value
quickly gets ugly, eee-advert-disable-10000 is nice and simple.

	Andrew
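One way to read the boolean-property suggestion, sketched in userspace C: each property maps to one bit of the EEE advertisement register, and the driver ORs together the bits of the properties that are present. The property names are hypothetical (the binding had not settled on any), has_prop() is a stand-in for the kernel's of_property_read_bool(), and the bit values are those of MDIO_AN_EEE_ADV_100TX (0x0002) and MDIO_AN_EEE_ADV_1000T (0x0004), which match the 0x4 and 0x6 examples in the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values of MDIO_AN_EEE_ADV_100TX / MDIO_AN_EEE_ADV_1000T. */
#define EEE_ADV_100TX 0x0002
#define EEE_ADV_1000T 0x0004

/* Hypothetical property names mapped to the bit each one clears. */
static const struct {
    const char *prop;
    uint16_t bit;
} eee_disable_map[] = {
    { "eee-advert-disable-100",  EEE_ADV_100TX },
    { "eee-advert-disable-1000", EEE_ADV_1000T },
};

/* Stand-in for of_property_read_bool(): is `name` among the node's props? */
static int has_prop(const char *const *props, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(props[i], name) == 0)
            return 1;
    return 0;
}

/* Fold the boolean properties present on a node into a register mask. */
static uint16_t eee_disable_mask(const char *const *props, size_t n)
{
    uint16_t mask = 0;

    for (size_t i = 0; i < sizeof(eee_disable_map) / sizeof(eee_disable_map[0]); i++)
        if (has_prop(props, n, eee_disable_map[i].prop))
            mask |= eee_disable_map[i].bit;
    return mask;
}
```

With this layout, supporting a 10GBASE-T PHY later would be one more table row rather than a new hex value for DT authors to work out.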

* Re: Netperf UDP issue with connected sockets
From: Jesper Dangaard Brouer @ 2016-11-21 16:03 UTC
  To: Eric Dumazet; +Cc: Rick Jones, netdev, Saeed Mahameed, Tariq Toukan, brouer
In-Reply-To: <1479408683.8455.273.camel@edumazet-glaptop3.roam.corp.google.com>


On Thu, 17 Nov 2016 10:51:23 -0800
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> On Thu, 2016-11-17 at 19:30 +0100, Jesper Dangaard Brouer wrote:
> 
> > The point is I can see a socket Send-Q forming, thus we do know the
> > application has something to send, and thus there is a possibility for
> > non-opportunistic bulking. Allowing/implementing bulk enqueue from
> > socket layer into qdisc layer should be fairly simple (and the rest of
> > xmit_more is already in place).
> 
> 
> As I said, you are fooled by TX completions.

Obviously TX completions play a role, yes, and I bet I can adjust the
TX completion to cause xmit_more to happen, at the expense of
introducing added latency.

The point is the "bloated" spinlock in __dev_queue_xmit is still caused
by the MMIO tailptr/doorbell.  The added cost occurs when enqueueing
packets, and results in the inability to get enough packets into the
qdisc to get xmit_more going (on my system).  I argue that a bulk enqueue
API would allow us to get past the hurdle of transitioning into
xmit_more mode more easily.
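The argument can be illustrated with a toy count of doorbell writes. It assumes the general xmit_more contract, where a driver postpones the MMIO tail-pointer write while the stack signals that more packets follow; model_enqueue() stands in for the proposed bulk enqueue API and is hypothetical, not an existing kernel function.

```c
#include <stdbool.h>

/* Toy model, not driver code: counts packets and doorbell (MMIO) writes. */
struct txq_model {
    unsigned long packets;
    unsigned long doorbells;
};

/* One packet handed to the driver; the doorbell is rung only when no more
 * packets are announced to follow. */
static void model_xmit(struct txq_model *q, bool xmit_more)
{
    q->packets++;
    if (!xmit_more)
        q->doorbells++;   /* tail-pointer write to the NIC */
}

/* Hypothetical bulk enqueue: n packets handed over together, every packet
 * except the last one flagged with xmit_more. */
static void model_enqueue(struct txq_model *q, unsigned int n)
{
    for (unsigned int i = 0; i < n; i++)
        model_xmit(q, i + 1 < n);
}
```

In this model, 64 single-packet enqueues cost 64 doorbells, while eight 8-packet bursts deliver the same 64 packets for 8 doorbells.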


> Please make sure to increase the sndbuf limits !
> 
> echo 2129920 >/proc/sys/net/core/wmem_default

Testing with this makes no difference.

 $ grep -H . /proc/sys/net/core/wmem_default
 /proc/sys/net/core/wmem_default:2129920


> lpaa23:~# sar -n DEV 1 10|grep eth1
                  IFACE   rxpck/s    txpck/s    rxkB/s     txkB/s   rxcmp/s   txcmp/s  rxmcst/s
> 10:49:25         eth1      7.00 9273283.00      0.61 2187214.90      0.00      0.00      0.00
> 10:49:26         eth1      1.00 9230795.00      0.06 2176787.57      0.00      0.00      1.00
> 10:49:27         eth1      2.00 9247906.00      0.17 2180915.45      0.00      0.00      0.00
> 10:49:28         eth1      3.00 9246542.00      0.23 2180790.38      0.00      0.00      1.00
> Average:         eth1      2.50 9018045.70      0.25 2126893.82      0.00      0.00      0.50

Very impressive numbers: 9.2 Mpps TX.

What is this test?  What kind of traffic? Multiple CPUs?


> lpaa23:~# ethtool -S eth1|grep more; sleep 1;ethtool -S eth1|grep more
>      xmit_more: 2251366909
>      xmit_more: 2256011392
> 
> lpaa23:~# echo 2256011392-2251366909 | bc
> 4644483

xmit_more definitely works on your system, but I cannot get it to
"kick in" on my setup.  Once xmit_more is active, the "bloated"
spinlock problem should go away.
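The bc calculation quoted above is a per-second counter delta, the same quantity the ethtool_stats.pl output reports. A small sketch of that delta and of the share of packets carrying xmit_more; the helper names are made up for illustration.

```c
#include <stdint.h>

/* Change in a monotonically increasing 64-bit counter (e.g. the xmit_more
 * value from `ethtool -S`) between two samples. Unsigned arithmetic also
 * gives the right result if the counter wrapped once in between. */
static uint64_t counter_delta(uint64_t before, uint64_t after)
{
    return after - before;
}

/* Packets carrying xmit_more per million transmitted packets. */
static uint64_t xmit_more_ppm(uint64_t xmit_more_per_sec,
                              uint64_t tx_packets_per_sec)
{
    if (tx_packets_per_sec == 0)
        return 0;
    return xmit_more_per_sec * 1000000u / tx_packets_per_sec;
}
```

counter_delta(2251366909, 2256011392) reproduces the 4644483 above. Fed the remote-CPU numbers below (350 xmit_more/sec out of 2,141,441 packets/sec), xmit_more_ppm() returns 163, roughly 0.016% of packets.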


(Tests with "udp_flood --pmtu 3 --send")

Forcing TX completion to happen on the same CPU, no xmit_more:

 ~/git/network-testing/bin/ethtool_stats.pl --sec 2 --dev mlx5p2
 Show adapter(s) (mlx5p2) statistics (ONLY that changed!)
 Ethtool(mlx5p2  ) stat:    104592908 (    104,592,908) <= tx0_bytes /sec
 Ethtool(mlx5p2  ) stat:        39059 (         39,059) <= tx0_nop /sec
 Ethtool(mlx5p2  ) stat:      1743215 (      1,743,215) <= tx0_packets /sec 
 Ethtool(mlx5p2  ) stat:    104719986 (    104,719,986) <= tx_bytes /sec
 Ethtool(mlx5p2  ) stat:    111774540 (    111,774,540) <= tx_bytes_phy /sec
 Ethtool(mlx5p2  ) stat:      1745333 (      1,745,333) <= tx_csum_partial /sec
 Ethtool(mlx5p2  ) stat:      1745333 (      1,745,333) <= tx_packets /sec
 Ethtool(mlx5p2  ) stat:      1746477 (      1,746,477) <= tx_packets_phy /sec
 Ethtool(mlx5p2  ) stat:    111483434 (    111,483,434) <= tx_prio1_bytes /sec
 Ethtool(mlx5p2  ) stat:      1741928 (      1,741,928) <= tx_prio1_packets /sec

Forcing TX completion to happen on remote CPU, some xmit_more:

 Show adapter(s) (mlx5p2) statistics (ONLY that changed!)
 Ethtool(mlx5p2  ) stat:    128485892 (    128,485,892) <= tx0_bytes /sec
 Ethtool(mlx5p2  ) stat:        31840 (         31,840) <= tx0_nop /sec
 Ethtool(mlx5p2  ) stat:      2141432 (      2,141,432) <= tx0_packets /sec
 Ethtool(mlx5p2  ) stat:          350 (            350) <= tx0_xmit_more /sec
 Ethtool(mlx5p2  ) stat:    128486459 (    128,486,459) <= tx_bytes /sec
 Ethtool(mlx5p2  ) stat:    137052191 (    137,052,191) <= tx_bytes_phy /sec
 Ethtool(mlx5p2  ) stat:      2141441 (      2,141,441) <= tx_csum_partial /sec
 Ethtool(mlx5p2  ) stat:      2141441 (      2,141,441) <= tx_packets /sec
 Ethtool(mlx5p2  ) stat:      2141441 (      2,141,441) <= tx_packets_phy /sec
 Ethtool(mlx5p2  ) stat:    137051300 (    137,051,300) <= tx_prio1_bytes /sec
 Ethtool(mlx5p2  ) stat:      2141427 (      2,141,427) <= tx_prio1_packets /sec
 Ethtool(mlx5p2  ) stat:          350 (            350) <= tx_xmit_more /sec



>    PerfTop:   76969 irqs/sec  kernel:96.6%  exact: 100.0% [4000Hz cycles:pp],  (all, 48 CPUs)
>---------------------------------------------------------------------------------------------
>     11.64%  [kernel]  [k] skb_set_owner_w               
>      6.21%  [kernel]  [k] queued_spin_lock_slowpath     
>      4.76%  [kernel]  [k] _raw_spin_lock                
>      4.40%  [kernel]  [k] __ip_make_skb                 
>      3.10%  [kernel]  [k] sock_wfree                    
>      2.87%  [kernel]  [k] ipt_do_table                  
>      2.76%  [kernel]  [k] fq_dequeue                    
>      2.71%  [kernel]  [k] mlx4_en_xmit                  
>      2.50%  [kernel]  [k] __dev_queue_xmit              
>      2.29%  [kernel]  [k] __ip_append_data.isra.40      
>      2.28%  [kernel]  [k] udp_sendmsg                   
>      2.01%  [kernel]  [k] __alloc_skb                   
>      1.90%  [kernel]  [k] napi_consume_skb              
>      1.63%  [kernel]  [k] udp_send_skb                  
>      1.62%  [kernel]  [k] skb_release_data              
>      1.62%  [kernel]  [k] entry_SYSCALL_64_fastpath     
>      1.56%  [kernel]  [k] dev_hard_start_xmit           
>      1.55%  udpsnd    [.] __libc_send                   
>      1.48%  [kernel]  [k] netif_skb_features            
>      1.42%  [kernel]  [k] __qdisc_run                   
>      1.35%  [kernel]  [k] sk_dst_check                  
>      1.33%  [kernel]  [k] sock_def_write_space          
>      1.30%  [kernel]  [k] kmem_cache_alloc_node_trace   
>      1.29%  [kernel]  [k] __local_bh_enable_ip          
>      1.21%  [kernel]  [k] copy_user_enhanced_fast_string
>      1.08%  [kernel]  [k] __kmalloc_reserve.isra.40     
>      1.08%  [kernel]  [k] SYSC_sendto                   
>      1.07%  [kernel]  [k] kmem_cache_alloc_node         
>      0.95%  [kernel]  [k] ip_finish_output2             
>      0.95%  [kernel]  [k] ktime_get                     
>      0.91%  [kernel]  [k] validate_xmit_skb             
>      0.88%  [kernel]  [k] sock_alloc_send_pskb          
>      0.82%  [kernel]  [k] sock_sendmsg                  

My perf outputs below...

Forcing TX completion to happen on the same CPU, no xmit_more:

# Overhead  CPU  Command     Shared Object     Symbol                         
# ........  ...  ..........  ................. ...............................
#
    12.17%  000  udp_flood   [kernel.vmlinux]  [k] _raw_spin_lock             
     5.03%  000  udp_flood   [mlx5_core]       [k] mlx5e_sq_xmit              
     3.13%  000  udp_flood   [kernel.vmlinux]  [k] __ip_append_data.isra.47   
     2.85%  000  udp_flood   [kernel.vmlinux]  [k] entry_SYSCALL_64           
     2.75%  000  udp_flood   [mlx5_core]       [k] mlx5e_poll_tx_cq           
     2.61%  000  udp_flood   [kernel.vmlinux]  [k] sock_def_write_space       
     2.48%  000  udp_flood   [kernel.vmlinux]  [k] skb_set_owner_w            
     2.25%  000  udp_flood   [kernel.vmlinux]  [k] __alloc_skb                
     2.21%  000  udp_flood   [kernel.vmlinux]  [k] udp_sendmsg                
     2.19%  000  udp_flood   [kernel.vmlinux]  [k] __slab_free                
     2.08%  000  udp_flood   [kernel.vmlinux]  [k] sock_wfree                 
     2.06%  000  udp_flood   [kernel.vmlinux]  [k] __ip_make_skb              
     1.93%  000  udp_flood   [mlx5_core]       [k] mlx5e_get_cqe              
     1.93%  000  udp_flood   libc-2.17.so      [.] __libc_send                
     1.80%  000  udp_flood   [kernel.vmlinux]  [k] entry_SYSCALL_64_fastpath  
     1.64%  000  udp_flood   [kernel.vmlinux]  [k] kfree                      
     1.61%  000  udp_flood   [kernel.vmlinux]  [k] ip_finish_output2          
     1.59%  000  udp_flood   [kernel.vmlinux]  [k] __local_bh_enable_ip       
     1.57%  000  udp_flood   [kernel.vmlinux]  [k] __dev_queue_xmit           
     1.49%  000  udp_flood   [kernel.vmlinux]  [k] __kmalloc_node_track_caller
     1.38%  000  udp_flood   [kernel.vmlinux]  [k] kmem_cache_alloc_node      
     1.30%  000  udp_flood   [kernel.vmlinux]  [k] dst_release                
     1.26%  000  udp_flood   [kernel.vmlinux]  [k] ksize                      
     1.26%  000  udp_flood   [kernel.vmlinux]  [k] sk_dst_check               
     1.22%  000  udp_flood   [kernel.vmlinux]  [k] SYSC_sendto                
     1.22%  000  udp_flood   [kernel.vmlinux]  [k] ip_send_check              


Forcing TX completion to happen on remote CPU, some xmit_more:

# Overhead  CPU  Command      Shared Object     Symbol                        
# ........  ...  ............ ................  ..............................
#
    11.67%  002  udp_flood   [kernel.vmlinux]  [k] _raw_spin_lock             
     7.61%  002  udp_flood   [kernel.vmlinux]  [k] skb_set_owner_w            
     6.15%  002  udp_flood   [mlx5_core]       [k] mlx5e_sq_xmit              
     3.05%  002  udp_flood   [kernel.vmlinux]  [k] entry_SYSCALL_64           
     2.89%  002  udp_flood   [kernel.vmlinux]  [k] __ip_append_data.isra.47   
     2.78%  000  swapper     [mlx5_core]       [k] mlx5e_poll_tx_cq           
     2.65%  002  udp_flood   [kernel.vmlinux]  [k] sk_dst_check               
     2.36%  002  udp_flood   [kernel.vmlinux]  [k] __alloc_skb                
     2.22%  002  udp_flood   [kernel.vmlinux]  [k] ip_finish_output2          
     2.07%  000  swapper     [kernel.vmlinux]  [k] __slab_free                
     2.06%  002  udp_flood   [kernel.vmlinux]  [k] udp_sendmsg                
     1.97%  002  udp_flood   [kernel.vmlinux]  [k] ksize                      
     1.92%  002  udp_flood   [kernel.vmlinux]  [k] entry_SYSCALL_64_fastpath  
     1.82%  002  udp_flood   [kernel.vmlinux]  [k] __ip_make_skb              
     1.79%  002  udp_flood   libc-2.17.so      [.] __libc_send                
     1.62%  002  udp_flood   [kernel.vmlinux]  [k] __kmalloc_node_track_caller
     1.53%  002  udp_flood   [kernel.vmlinux]  [k] __local_bh_enable_ip       
     1.48%  002  udp_flood   [kernel.vmlinux]  [k] sock_alloc_send_pskb       
     1.43%  002  udp_flood   [kernel.vmlinux]  [k] __dev_queue_xmit           
     1.39%  002  udp_flood   [kernel.vmlinux]  [k] ip_send_check              
     1.39%  002  udp_flood   [kernel.vmlinux]  [k] kmem_cache_alloc_node      
     1.37%  002  udp_flood   [kernel.vmlinux]  [k] dst_release                
     1.21%  002  udp_flood   [kernel.vmlinux]  [k] udp_send_skb               
     1.18%  002  udp_flood   [kernel.vmlinux]  [k] __fget_light               
     1.16%  002  udp_flood   [kernel.vmlinux]  [k] kfree                      
     1.15%  000  swapper     [kernel.vmlinux]  [k] sock_wfree                 
     1.14%  002  udp_flood   [kernel.vmlinux]  [k] SYSC_sendto                

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

* Re: Synopsys Ethernet QoS Driver
From: Joao Pinto @ 2016-11-21 16:11 UTC
  To: Lars Persson, Joao Pinto
  Cc: Giuseppe CAVALLARO, Rayagond Kokatanur, Rabin Vincent, mued dib,
	David Miller, Jeff Kirsher, jiri@mellanox.com,
	saeedm@mellanox.com, idosch@mellanox.com, netdev,
	linux-kernel@vger.kernel.org, CARLOS.PALMINHA@synopsys.com,
	Andreas Irestål, alexandre.torgue@st.com,
	linux-arm-kernel@lists.infradead.org
In-Reply-To: <A001080B-2DC8-48D9-BD82-8276A9B3BE3D@axis.com>

On 21-11-2016 15:43, Lars Persson wrote:
> 
> 
>> 21 nov. 2016 kl. 16:06 skrev Joao Pinto <Joao.Pinto@synopsys.com>:
>>
>>> On 21-11-2016 14:25, Giuseppe CAVALLARO wrote:
>>>> On 11/21/2016 2:28 PM, Lars Persson wrote:
>>>>
>>>>
>>>>> 21 nov. 2016 kl. 13:53 skrev Giuseppe CAVALLARO <peppe.cavallaro@st.com>:
>>>>>
>>>>> Hello Joao
>>>>>
>>>>>> On 11/21/2016 1:32 PM, Joao Pinto wrote:
>>>>>> Hello,
>>>>>>
>>>>>>>> On 21-11-2016 05:29, Rayagond Kokatanur wrote:
>>>>>>>>> On Sat, Nov 19, 2016 at 7:26 PM, Rabin Vincent <rabin@rab.in> wrote:
>>>>>>>>> On Fri, Nov 18, 2016 at 02:20:27PM +0000, Joao Pinto wrote:
>>>>>>>>> For now we are interested in improving the synopsys QoS driver under
>>>>>>>>> /net/ethernet/synopsys. For now the driver structure consists of a
>>>>>>>>> single file called dwc_eth_qos.c, containing synopsys ethernet qos common ops and

snip (...)

>>>>> The stmmac driver has run for many years on several platforms
>>>>> (sh4, stm32, arm, x86, mips ...) and it supports a huge number of
>>>>> configurations, from the 3.1x to the 3.7x databooks.
>>>>>
>>>>> It also supports QoS hardware; for example, 4.00a, 4.10a and 4.20a
>>>>> are fully working.
>>>>>
>>>>> stmmac also has platform, device-tree and PCIe support, and
>>>>> a lot of maintained glue-logic files.
>>>>>
>>>>> It is fully documented inside the kernel tree.
>>>>>
>>>>> I am happy to have new enhancements from other developers.
>>>>> So, on my side, if you want to spend your time on improving it on your
>>>>> platforms please feel free to do it!
>>>>>
>>>>> Concerning the stmicro/stmmac naming, it comes from a really old
>>>>> story, and I have no issue adopting new folder/file names.
>>>>>
>>>>> I am also open to merging fixes and changes from ethernet/synopsys.
>>>>> I want to point you to some benchmarks made by Alex some months ago
>>>>> (IIRC) that showed stmmac as the winner (due to the several optimizations
>>>>> analyzed and reviewed on this mailing list).
>>>>>
>>>>> Peppe
>>>>>
>>>>
>>>> Hello Joao and others,
>>>>
>>
>> Hi Lars,
>>
>>>> As the maintainer of dwc_eth_qos.c, I also prefer that we put our efforts
>>>> into the most mature driver, stmmac.
>>>>
>>>> I hope that the code can migrate into an ethernet/synopsys folder to keep the
>>>> convention of naming the folder after the vendor. This makes it easy for
>>>> others to find the driver.
>>>>
>>>> The dwc_eth_qos.c will eventually be removed and its DT binding interface can
>>>> then be implemented in the stmmac driver.
>>
>> So your idea is to take ethernet/stmmac, rename it to ethernet/synopsys,
>> improve the structure and add the missing QoS features to it?
> 
> Indeed this is what I prefer.

OK, that makes sense.
Just out of curiosity, the target setup is the following:
https://www.youtube.com/watch?v=8V-LB5y2Cos
but instead of using internal drivers, we want to use mainline drivers only.

Thanks!

> 
>>
>>>
>>> Thanks Lars, I will be happy to support all of you in this transition,
>>> and I agree with renaming everything.
>>>
>>> peppe
>>>
>>>
>>>> - Lars
>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> (See http://lists.openwall.net/netdev/2016/02/29/127)
>>>>>>>>
>>>>>>>> The former only supports 4.x of the hardware.
>>>>>>>>
>>>>>>>> The latter supports 4.x and 3.x and already has a platform glue driver
>>>>>>>> with support for several platforms, a PCI glue driver, and a core driver
>>>>>>>> with several features not present in the former (for example: TX/RX
>>>>>>>> interrupt coalescing, EEE, PTP).
>>>>>>>>
>>>>>>>> Have you evaluated both drivers?  Why have you decided to work on the
>>>>>>>> former rather than the latter?
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>

* RE: [PATCH for-next 03/11] IB/hns: Optimize the logic of allocating memory using APIs
From: Salil Mehta @ 2016-11-21 16:12 UTC
  To: Leon Romanovsky
  Cc: dledford@redhat.com, Huwei (Xavier), oulijun,
	mehta.salil.lnk@gmail.com, linux-rdma@vger.kernel.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Linuxarm,
	Zhangping (ZP)
In-Reply-To: <20161116083602.GH4240@leon.nu>

> -----Original Message-----
> From: Leon Romanovsky [mailto:leon@kernel.org]
> Sent: Wednesday, November 16, 2016 8:36 AM
> To: Salil Mehta
> Cc: dledford@redhat.com; Huwei (Xavier); oulijun;
> mehta.salil.lnk@gmail.com; linux-rdma@vger.kernel.org;
> netdev@vger.kernel.org; linux-kernel@vger.kernel.org; Linuxarm;
> Zhangping (ZP)
> Subject: Re: [PATCH for-next 03/11] IB/hns: Optimize the logic of
> allocating memory using APIs
> 
> On Tue, Nov 15, 2016 at 03:52:46PM +0000, Salil Mehta wrote:
> > > -----Original Message-----
> > > From: Leon Romanovsky [mailto:leon@kernel.org]
> > > Sent: Wednesday, November 09, 2016 7:22 AM
> > > To: Salil Mehta
> > > Cc: dledford@redhat.com; Huwei (Xavier); oulijun;
> > > mehta.salil.lnk@gmail.com; linux-rdma@vger.kernel.org;
> > > netdev@vger.kernel.org; linux-kernel@vger.kernel.org; Linuxarm;
> > > Zhangping (ZP)
> > > Subject: Re: [PATCH for-next 03/11] IB/hns: Optimize the logic of
> > > allocating memory using APIs
> > >
> > > On Fri, Nov 04, 2016 at 04:36:25PM +0000, Salil Mehta wrote:
> > > > From: "Wei Hu (Xavier)" <xavier.huwei@huawei.com>
> > > >
> > > > This patch modifies the logic of allocating memory using APIs in
> > > > the hns RoCE driver. We use kcalloc instead of kmalloc_array and
> > > > bitmap_zero, and when kcalloc fails, we call vzalloc to allocate
> > > > memory.
> > > >
> > > > Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
> > > > Signed-off-by: Ping Zhang <zhangping5@huawei.com>
> > > > Signed-off-by: Salil Mehta  <salil.mehta@huawei.com>
> > > > ---
> > > >  drivers/infiniband/hw/hns/hns_roce_mr.c |   15 ++++++++-------
> > > >  1 file changed, 8 insertions(+), 7 deletions(-)
> > > >
> > > > diff --git a/drivers/infiniband/hw/hns/hns_roce_mr.c b/drivers/infiniband/hw/hns/hns_roce_mr.c
> > > > index fb87883..d3dfb5f 100644
> > > > --- a/drivers/infiniband/hw/hns/hns_roce_mr.c
> > > > +++ b/drivers/infiniband/hw/hns/hns_roce_mr.c
> > > > @@ -137,11 +137,12 @@ static int hns_roce_buddy_init(struct hns_roce_buddy *buddy, int max_order)
> > > >
> > > >  	for (i = 0; i <= buddy->max_order; ++i) {
> > > >  		s = BITS_TO_LONGS(1 << (buddy->max_order - i));
> > > > -		buddy->bits[i] = kmalloc_array(s, sizeof(long), GFP_KERNEL);
> > > > -		if (!buddy->bits[i])
> > > > -			goto err_out_free;
> > > > -
> > > > -		bitmap_zero(buddy->bits[i], 1 << (buddy->max_order - i));
> > > > +		buddy->bits[i] = kcalloc(s, sizeof(long), GFP_KERNEL);
> > > > +		if (!buddy->bits[i]) {
> > > > +			buddy->bits[i] = vzalloc(s * sizeof(long));
> > >
> > > I wonder why you don't use vzalloc directly instead of a kcalloc
> > > fallback?
> > We know we will have physically contiguous pages if the kcalloc
> > call succeeds. This gives us a chance of better performance than
> > allocations that are only virtually contiguous, as returned by
> > vzalloc(). Therefore, the latter has only been used as a fallback
> > when our memory request cannot be satisfied by kcalloc.
> >
> > Are you suggesting that there will not be much performance penalty
> > if we use just vzalloc?
> 
> Not exactly.
> I asked because we have similar code in our drivers and this
> construction looks strange to me.
> 
> 1. If performance is critical, we will use kmalloc.
> 2. If performance is not critical, we will use vmalloc.
> 
> But in this case, such a construction shows me that we can live with
> vmalloc performance and the kmalloc allocation is not really needed.
> 
> In your specific case, I'm not sure that kcalloc will ever fail.
Performance is definitely critical here, though I agree this is a
somewhat unusual way of allocating memory. In fact, we were encountering
memory allocation failures with kmalloc (the allocation sizes are on
the higher side and grow exponentially), so we ended up using
vmalloc as a fallback. It is a very naïve allocation scheme.

Maybe we need to rethink this part of the allocation scheme? Alternatively,
I can pull back this particular patch for now, or just live with vzalloc()
until we figure out a proper solution.
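To put a number on "exponential": in hns_roce_buddy_init() the order-i bitmap holds 1 << (max_order - i) bits, so the order-0 bitmap dominates and doubles with every increase of max_order. A sketch of the sizes, using userspace copies of the two kernel macros; max_order = 20 in the note below is an illustrative value, not one taken from the driver.

```c
#include <limits.h>
#include <stddef.h>

/* Userspace copies of the kernel helpers used in hns_roce_buddy_init(). */
#ifndef BITS_PER_LONG
#define BITS_PER_LONG (CHAR_BIT * sizeof(long))
#endif
#define BITS_TO_LONGS(nr) (((nr) + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Bytes of the order-i bitmap: BITS_TO_LONGS(1 << (max_order - i)) longs.
 * Assumes i <= max_order. */
static size_t buddy_bitmap_bytes(unsigned int max_order, unsigned int i)
{
    return BITS_TO_LONGS((size_t)1 << (max_order - i)) * sizeof(long);
}

/* Total bytes for all orders 0..max_order. */
static size_t buddy_total_bytes(unsigned int max_order)
{
    size_t total = 0;

    for (unsigned int i = 0; i <= max_order; i++)
        total += buddy_bitmap_bytes(max_order, i);
    return total;
}
```

With max_order = 20 the order-0 bitmap alone is 131072 bytes (128 KiB). With 4 KiB pages that is a 32-page, order-5 contiguous allocation, the kind of high-order request that is most likely to fail on a fragmented system.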

> 
> Thanks
> 
> 
> >
> > >
> > > > +			if (!buddy->bits[i])
> > > > +				goto err_out_free;
> > > > +		}
> > > >  	}

* Re: [RFC PATCH net v2 2/3] dt: bindings: add ethernet phy eee-disable-advert option documentation
From: Jerome Brunet @ 2016-11-21 16:16 UTC
  To: Andrew Lunn
  Cc: netdev, devicetree, Florian Fainelli, Alexandre TORGUE,
	Neil Armstrong, Martin Blumenstingl, Kevin Hilman, linux-kernel,
	Andre Roth, linux-amlogic, Carlo Caione, Giuseppe Cavallaro,
	linux-arm-kernel
In-Reply-To: <20161121160149.GF1922-g2DYL2Zd6BY@public.gmane.org>

On Mon, 2016-11-21 at 17:01 +0100, Andrew Lunn wrote:
> On Mon, Nov 21, 2016 at 04:35:23PM +0100, Jerome Brunet wrote:
> > 
> > Signed-off-by: Jerome Brunet <jbrunet@baylibre.com>
> > ---
> >  Documentation/devicetree/bindings/net/phy.txt | 5 +++++
> >  1 file changed, 5 insertions(+)
> > 
> > diff --git a/Documentation/devicetree/bindings/net/phy.txt b/Documentation/devicetree/bindings/net/phy.txt
> > index bc1c3c8bf8fa..7f066b7c1e2c 100644
> > --- a/Documentation/devicetree/bindings/net/phy.txt
> > +++ b/Documentation/devicetree/bindings/net/phy.txt
> > @@ -35,6 +35,11 @@ Optional Properties:
> >  - broken-turn-around: If set, indicates the PHY device does not correctly
> >    release the turn around line low at the end of a MDIO transaction.
> >  
> > +- eee-advert-disable: Bits to clear in the MDIO_AN_EEE_ADV register to
> > +  disable EEE modes. Example
> > +    * 0x4: disable EEE for 1000T,
> > +    * 0x6: disable EEE for 100TX and 1000T
> > +
> 
> Hi Jerome
> 
> I like the direction this patchset is taking. But hex values are
> pretty unfriendly. 

Agreed

> Please add a set of boolean properties, and do the
> mapping to hex in the C code.
> 
> That would also make extending this API easier. e.g. say you have a
> 10Gbps PHY with EEE, and you need to disable it. This hex value
> quickly gets ugly, eee-advert-disable-10000 is nice and simple.

What I did not realize when doing this patch for the realtek driver is
that there are already 6 valid modes defined in the kernel:

#define MDIO_EEE_100TX		MDIO_AN_EEE_ADV_100TX	/* 100TX EEE cap */
#define MDIO_EEE_1000T		MDIO_AN_EEE_ADV_1000T	/* 1000T EEE cap */
#define MDIO_EEE_10GT		0x0008	/* 10GT EEE cap */
#define MDIO_EEE_1000KX		0x0010	/* 1000KX EEE cap */
#define MDIO_EEE_10GKX4		0x0020	/* 10G KX4 EEE cap */
#define MDIO_EEE_10GKR		0x0040	/* 10G KR EEE cap */

I took care of only 2 in the case of realtek.c since it only supports
MDIO_EEE_100TX and MDIO_EEE_1000T.

Defining a property for each is certainly doable, but it does not look
very nice either. If the list grows in the future, it will get even
messier, especially if you want to disable everything.

What do you think about keeping a single mask value but using the defines
above in the DT? It would be more readable than hex and easy to
extend, don't you think?

These defines are already part of the uapi, so I guess we can use them
in the DT bindings?

> 
> 	Andrew
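A sketch of the single-mask alternative, using the values quoted above. It assumes the constants would be made visible to DTS files (which the kernel build runs through the C preprocessor), so a node could say something like eee-advert-disable = <(MDIO_EEE_100TX | MDIO_EEE_1000T)>; and the driver then clears those bits from its EEE advertisement. eee_apply_disable_mask() is a hypothetical helper.

```c
#include <stdint.h>

/* Values as quoted above; MDIO_EEE_100TX and MDIO_EEE_1000T alias
 * MDIO_AN_EEE_ADV_100TX (0x0002) and MDIO_AN_EEE_ADV_1000T (0x0004). */
#define MDIO_EEE_100TX  0x0002
#define MDIO_EEE_1000T  0x0004
#define MDIO_EEE_10GT   0x0008
#define MDIO_EEE_1000KX 0x0010
#define MDIO_EEE_10GKX4 0x0020
#define MDIO_EEE_10GKR  0x0040

/* Hypothetical helper: clear the modes named in a DT-provided mask from
 * the EEE advertisement value. */
static uint16_t eee_apply_disable_mask(uint16_t adv, uint16_t disable_mask)
{
    return adv & (uint16_t)~disable_mask;
}
```

Disabling every mode is then one OR of all six constants (0x7e) rather than six separate properties, and the 0x6 example from the patch becomes MDIO_EEE_100TX | MDIO_EEE_1000T.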

* Re: [mm PATCH v3 21/23] mm: Add support for releasing multiple instances of a page
From: Alexander Duyck @ 2016-11-21 16:21 UTC
  To: Andrew Morton
  Cc: Alexander Duyck, linux-mm, Netdev, linux-kernel@vger.kernel.org
In-Reply-To: <20161118152716.3f7acf6e25f142846909b2f6@linux-foundation.org>

On Fri, Nov 18, 2016 at 3:27 PM, Andrew Morton
<akpm@linux-foundation.org> wrote:
> On Thu, 10 Nov 2016 06:36:06 -0500 Alexander Duyck <alexander.h.duyck@intel.com> wrote:
>
>> This patch adds a function that allows us to batch free a page that has
>> multiple references outstanding.  Specifically this function can be used to
>> drop a page being used in the page frag alloc cache.  With this drivers can
>> make use of functionality similar to the page frag alloc cache without
>> having to do any workarounds for the fact that there is no function that
>> frees multiple references.
>>
>> ...
>>
>> --- a/include/linux/gfp.h
>> +++ b/include/linux/gfp.h
>> @@ -506,6 +506,8 @@ extern void free_hot_cold_page(struct page *page, bool cold);
>>  extern void free_hot_cold_page_list(struct list_head *list, bool cold);
>>
>>  struct page_frag_cache;
>> +extern void __page_frag_drain(struct page *page, unsigned int order,
>> +                           unsigned int count);
>>  extern void *__alloc_page_frag(struct page_frag_cache *nc,
>>                              unsigned int fragsz, gfp_t gfp_mask);
>>  extern void __free_page_frag(void *addr);
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 0fbfead..54fea40 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -3912,6 +3912,20 @@ static struct page *__page_frag_refill(struct page_frag_cache *nc,
>>       return page;
>>  }
>>
>> +void __page_frag_drain(struct page *page, unsigned int order,
>> +                    unsigned int count)
>> +{
>> +     VM_BUG_ON_PAGE(page_ref_count(page) == 0, page);
>> +
>> +     if (page_ref_sub_and_test(page, count)) {
>> +             if (order == 0)
>> +                     free_hot_cold_page(page, false);
>> +             else
>> +                     __free_pages_ok(page, order);
>> +     }
>> +}
>> +EXPORT_SYMBOL(__page_frag_drain);
>
> It's an exported-to-modules library function.  It should be documented,
> please?  The page-frag API is only partially documented, but that's no
> excuse.

Okay.  I assume you want the documentation as a follow-up patch since
I received a notice that the patch was added to -mm?

> And perhaps documentation will help explain the naming choice.  Why
> "drain"?  I'd have expected "put"?

The idea was that this is supposed to be a counterpart to
__page_frag_refill.  Basically it is a function we can use if we need
to tear down the page frag cache and free the backing page.  If you
want, I could rename these functions to make it clear that this is
meant to drain a frag cache rather than just free a page frag.  I had
originally thought about coming up with an mput or something like that
since we are dropping multiple references, but since we already had
__page_frag_refill I went with __page_frag_drain.
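A userspace model of the semantics being named, assuming only what the quoted patch shows: __page_frag_drain() drops count references in one atomic step, and whoever drops the last one frees the page. C11 atomics stand in for page_ref_sub_and_test(); the order-based choice between free_hot_cold_page() and __free_pages_ok() is left out, and struct page_model is hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page: only the reference count. */
struct page_model {
    atomic_uint refcount;
};

/* Model of page_ref_sub_and_test(): subtract `count` atomically and
 * report whether the count reached zero, i.e. whether the caller
 * dropped the last reference and must free the page. */
static bool model_ref_sub_and_test(struct page_model *p, unsigned int count)
{
    /* atomic_fetch_sub() returns the old value; new value is old - count. */
    return atomic_fetch_sub(&p->refcount, count) == count;
}
```

The test sequence below shows why a bulk drop is useful: a cache that pre-charged the page with extra references can return all of its unused ones at once, and the page is freed only when the last outstanding fragment is released.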

> And why the leading underscores.  The page-frag API is pretty weird :(
>
> And inconsistent.  __alloc_page_frag -> page_frag_alloc,
> __free_page_frag -> page_frag_free(), etc.  I must have been asleep
> when I let that lot through.

The leading underscores are inherited.  Most of it has to do with the
fact that this is a backing API for the netdev sk_buff allocator.
When this stuff existed in net it was already named this way and I
just moved it over.  I'm not sure whether you approved it, as I don't
see an Acked-by or Signed-off-by from you on the patch.  The timing of
it was such that I think Linus approved it and it was then pulled in
through Dave's tree.

If you would like I could look at doing a couple of renaming patches
so that we make the API a bit more consistent.  I could move the
__alloc and __free to what you have suggested, and then take a look at
trying to rename the refill/drain to be a bit more consistent in terms
of what they are supposed to work on and how they are supposed to be
used.

- Alex

^ permalink raw reply

* Re: [PATCH net 07/18] net/ena: refactor ena_get_stats64 to be atomic context safe
From: kbuild test robot @ 2016-11-21 16:23 UTC (permalink / raw)
  To: Netanel Belgazal
  Cc: kbuild-all, linux-kernel, davem, netdev, Netanel Belgazal, dwmw,
	zorik, alex, saeed, msw, aliguori, nafea
In-Reply-To: <1479631547-29354-8-git-send-email-netanel@annapurnalabs.com>

[-- Attachment #1: Type: text/plain, Size: 3508 bytes --]

Hi Netanel,

[auto build test WARNING on net/master]

url:    https://github.com/0day-ci/linux/commits/Netanel-Belgazal/Update-ENA-driver-to-version-1-1-2/20161120-165649
config: i386-randconfig-h1-11212236 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

Note: it may well be a FALSE warning. FWIW you are at least aware of it now.
http://gcc.gnu.org/wiki/Better_Uninitialized_Warnings

All warnings (new ones prefixed by >>):

   In file included from include/linux/mmzone.h:15:0,
                    from include/linux/gfp.h:5,
                    from include/linux/cpu_rmap.h:14,
                    from drivers/net/ethernet/amazon/ena/ena_netdev.c:36:
   drivers/net/ethernet/amazon/ena/ena_netdev.c: In function 'ena_get_stats64':
>> include/linux/seqlock.h:204:19: warning: 'rx_ring' may be used uninitialized in this function [-Wmaybe-uninitialized]
     return unlikely(s->sequence != start);
                      ^~
   drivers/net/ethernet/amazon/ena/ena_netdev.c:2188:19: note: 'rx_ring' was declared here
     struct ena_ring *rx_ring, *tx_ring;
                      ^~~~~~~

vim +/rx_ring +204 include/linux/seqlock.h

4f988f15 Linus Torvalds 2012-05-04  188  /**
3c22cd57 Nick Piggin    2011-01-07  189   * __read_seqcount_retry - end a seq-read critical section (without barrier)
3c22cd57 Nick Piggin    2011-01-07  190   * @s: pointer to seqcount_t
3c22cd57 Nick Piggin    2011-01-07  191   * @start: count, from read_seqcount_begin
3c22cd57 Nick Piggin    2011-01-07  192   * Returns: 1 if retry is required, else 0
3c22cd57 Nick Piggin    2011-01-07  193   *
3c22cd57 Nick Piggin    2011-01-07  194   * __read_seqcount_retry is like read_seqcount_retry, but has no smp_rmb()
3c22cd57 Nick Piggin    2011-01-07  195   * barrier. Callers should ensure that smp_rmb() or equivalent ordering is
3c22cd57 Nick Piggin    2011-01-07  196   * provided before actually loading any of the variables that are to be
3c22cd57 Nick Piggin    2011-01-07  197   * protected in this critical section.
3c22cd57 Nick Piggin    2011-01-07  198   *
3c22cd57 Nick Piggin    2011-01-07  199   * Use carefully, only in critical code, and comment how the barrier is
3c22cd57 Nick Piggin    2011-01-07  200   * provided.
3c22cd57 Nick Piggin    2011-01-07  201   */
3c22cd57 Nick Piggin    2011-01-07  202  static inline int __read_seqcount_retry(const seqcount_t *s, unsigned start)
3c22cd57 Nick Piggin    2011-01-07  203  {
3c22cd57 Nick Piggin    2011-01-07 @204  	return unlikely(s->sequence != start);
3c22cd57 Nick Piggin    2011-01-07  205  }
3c22cd57 Nick Piggin    2011-01-07  206  
3c22cd57 Nick Piggin    2011-01-07  207  /**
3c22cd57 Nick Piggin    2011-01-07  208   * read_seqcount_retry - end a seq-read critical section
3c22cd57 Nick Piggin    2011-01-07  209   * @s: pointer to seqcount_t
3c22cd57 Nick Piggin    2011-01-07  210   * @start: count, from read_seqcount_begin
3c22cd57 Nick Piggin    2011-01-07  211   * Returns: 1 if retry is required, else 0
3c22cd57 Nick Piggin    2011-01-07  212   *

:::::: The code at line 204 was first introduced by commit
:::::: 3c22cd5709e8143444a6d08682a87f4c57902df3 kernel: optimise seqlock

:::::: TO: Nick Piggin <npiggin@kernel.dk>
:::::: CC: Nick Piggin <npiggin@kernel.dk>
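For readers unfamiliar with the read side documented above, here is a rough userspace sketch of the begin/read/retry loop using C11 atomics. It follows the fence placement commonly used for seqlocks in the C11 memory model and is not the kernel's seqcount_t implementation. The demo_* names are hypothetical. The reader assigns its outputs on every pass before the retry check, so there is no path on which they are returned uninitialized.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stats block guarded by a sequence counter. Payload fields are
 * atomics accessed with relaxed ordering, so a torn read (which the retry
 * loop then discards) is not a C11 data race. */
struct demo_stats {
	atomic_uint sequence;		/* odd while a writer is mid-update */
	atomic_uint_fast64_t packets;
	atomic_uint_fast64_t bytes;
};

static void demo_write_begin(struct demo_stats *s)
{
	atomic_fetch_add_explicit(&s->sequence, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
}

static void demo_write_end(struct demo_stats *s)
{
	assert(atomic_load_explicit(&s->sequence, memory_order_relaxed) & 1u);
	atomic_fetch_add_explicit(&s->sequence, 1, memory_order_release);
}

/* Like read_seqcount_begin(): wait until no writer is active. */
static unsigned demo_read_begin(struct demo_stats *s)
{
	unsigned seq;

	while ((seq = atomic_load_explicit(&s->sequence,
					   memory_order_acquire)) & 1u)
		;	/* writer in progress: spin */
	return seq;
}

/* Like read_seqcount_retry(): order the data loads before re-reading the
 * counter, then return 1 if a writer ran since @start, else 0. */
static int demo_read_retry(struct demo_stats *s, unsigned start)
{
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&s->sequence, memory_order_relaxed) != start;
}

static void demo_update(struct demo_stats *s, uint64_t len)
{
	demo_write_begin(s);
	atomic_fetch_add_explicit(&s->packets, 1, memory_order_relaxed);
	atomic_fetch_add_explicit(&s->bytes, len, memory_order_relaxed);
	demo_write_end(s);
}

static void demo_read(struct demo_stats *s, uint64_t *packets, uint64_t *bytes)
{
	unsigned start;

	do {
		start = demo_read_begin(s);
		*packets = atomic_load_explicit(&s->packets, memory_order_relaxed);
		*bytes = atomic_load_explicit(&s->bytes, memory_order_relaxed);
	} while (demo_read_retry(s, start));
}
```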

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 31320 bytes --]

^ permalink raw reply

* Re: [PATCH net-next v3 0/4] Couple of BPF refcount fixes for mlx5
From: David Miller @ 2016-11-21 16:26 UTC (permalink / raw)
  To: daniel; +Cc: alexei.starovoitov, bblanco, zhiyisun, ranas, saeedm, netdev
In-Reply-To: <cover.1479514784.git.daniel@iogearbox.net>

From: Daniel Borkmann <daniel@iogearbox.net>
Date: Sat, 19 Nov 2016 01:44:59 +0100

> Various mlx5 bugs on eBPF refcount handling found during review.
> Last patch in series adds a __must_check to BPF helpers to make
> sure we won't run into it again w/o compiler complaining first.
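As a rough illustration of what the __must_check annotation buys, the sketch below marks a hypothetical refcount-taking helper with the GCC/Clang warn_unused_result attribute, which is what the kernel's __must_check expands to with those compilers. A caller that ignores the returned pointer now gets a compiler warning. NULL stands in for the kernel's ERR_PTR() convention, and DEMO_MAX_REFCNT is an assumed limit.

```c
#include <assert.h>
#include <stddef.h>

#define demo_must_check __attribute__((warn_unused_result))
#define DEMO_MAX_REFCNT 32768	/* assumed cap, for illustration only */

struct demo_prog {
	int refcnt;
};

/* Take @i extra references. NULL means "refused", and the caller must react. */
static demo_must_check struct demo_prog *demo_prog_add(struct demo_prog *prog,
						       int i)
{
	assert(i >= 0);
	if (prog->refcnt > DEMO_MAX_REFCNT - i)
		return NULL;
	prog->refcnt += i;
	return prog;
}
```

With this annotation, writing `demo_prog_add(&p, 1);` as a bare statement triggers -Wunused-result.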

Series applied, thanks Daniel.

^ permalink raw reply

* Re: [PATCH net-next] bnx2: use READ_ONCE() instead of barrier()
From: David Miller @ 2016-11-21 16:32 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev, rasesh.mody, harish.patil
In-Reply-To: <1479596231.8455.354.camel@edumazet-glaptop3.roam.corp.google.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sat, 19 Nov 2016 14:57:11 -0800

> From: Eric Dumazet <edumazet@google.com>
> 
> barrier() is a big hammer compared to READ_ONCE(),
> and requires comments explaining what is protected.
> 
> READ_ONCE() is more precise and compiler should generate
> better overall code.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
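A simplified userspace sketch of the difference: barrier() is an empty asm with a "memory" clobber, so the compiler must assume that all memory may have changed. A READ_ONCE()-style volatile access only forces one fresh load of the named object. Like the kernel macro, it is a compiler-level constraint and adds no CPU memory barrier. The macros below cover scalar types only and stand in for the kernel's READ_ONCE()/WRITE_ONCE() rather than copying them. The ring structure and its field names are hypothetical, loosely modeled on a TX-availability check.

```c
#include <assert.h>

/* Scalar-only stand-ins: a volatile access forces exactly one load or store
 * of this object without invalidating the compiler's view of other memory. */
#define DEMO_READ_ONCE(x)	(*(const volatile __typeof__(x) *)&(x))
#define DEMO_WRITE_ONCE(x, v)	(*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical TX ring indices, advanced by different contexts. */
struct demo_tx_ring {
	unsigned int prod;	/* advanced by the xmit path */
	unsigned int cons;	/* advanced by TX completion */
};

/* Free descriptors. Unsigned arithmetic handles index wrap-around. */
static unsigned int demo_tx_avail(struct demo_tx_ring *r, unsigned int size)
{
	unsigned int used = DEMO_READ_ONCE(r->prod) - DEMO_READ_ONCE(r->cons);

	assert(used <= size);
	return size - used;
}
```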

Applied.

^ permalink raw reply

* Re: [PATCH v2 net-next] mlx4: avoid unnecessary dirtying of critical fields
From: David Miller @ 2016-11-21 16:33 UTC (permalink / raw)
  To: eric.dumazet; +Cc: ttoukan.linux, netdev, tariqt
In-Reply-To: <1479662676.8455.364.camel@edumazet-glaptop3.roam.corp.google.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sun, 20 Nov 2016 09:24:36 -0800

> From: Eric Dumazet <edumazet@google.com>
> 
> While stressing a 40Gbit mlx4 NIC with busy polling, I found false
> sharing in mlx4 driver that can be easily avoided.
> 
> This patch brings an additional 7 % performance improvement in UDP_RR
> workload.
> 
> 1) If we received no frame during one mlx4_en_process_rx_cq()
>    invocation, no need to call mlx4_cq_set_ci() and/or dirty ring->cons
> 
> 2) Do not refill rx buffers if we have plenty of them.
>    This avoids false sharing and allows some bulk/batch optimizations.
>    Page allocator and its locks will thank us.
> 
> Finally, mlx4_en_poll_rx_cq() should not return 0 if it determined
> cpu handling NIC IRQ should be changed. We should return budget-1
> instead, to not fool net_rx_action() and its netdev_budget.
> 
> 
> v2: keep AVG_PERF_COUNTER(... polled) even if polled is 0
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
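Points 1) and 2) above can be sketched roughly as follows. This is a hypothetical model, not mlx4 code. The shared consumer index is written only when at least one frame was processed, and buffers are refilled in one batch once DEMO_REFILL_BATCH or more are missing (the batch size is an assumption). The budget-1 return value is not modeled.

```c
#include <assert.h>

/* Hypothetical RX ring model; field names are illustrative, not mlx4's. */
struct demo_rx_ring {
	unsigned int cons;		/* shared consumer index (hot cache line) */
	unsigned int pending;		/* frames completed by the "NIC" */
	unsigned int posted;		/* RX buffers currently posted */
	unsigned int size;		/* ring capacity */
	unsigned int cons_writes;	/* stores to cons, counted for illustration */
	unsigned int refills;		/* batched refill passes */
};

#define DEMO_REFILL_BATCH 8	/* assumption: refill once this many are missing */

static int demo_process_rx(struct demo_rx_ring *r, int budget)
{
	int polled = 0;

	while (polled < budget && r->pending > 0) {
		assert(r->posted > 0);
		r->pending--;
		r->posted--;		/* buffer consumed by a received frame */
		polled++;
	}

	/* 1) Nothing received: do not touch (dirty) the shared index. */
	if (polled == 0)
		return 0;

	r->cons += polled;
	r->cons_writes++;

	/* 2) Refill only when enough buffers are missing to form a batch. */
	if (r->posted + DEMO_REFILL_BATCH <= r->size) {
		r->posted = r->size;
		r->refills++;
	}
	return polled;
}
```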

Applied.

^ permalink raw reply

* Re: [PATCH v2] ethernet: stmmac: make DWMAC_STM32 depend on it's associated SoC
From: David Miller @ 2016-11-21 16:34 UTC (permalink / raw)
  To: pbrobinson; +Cc: peppe.cavallaro, alexandre.torgue, mcoquelin.stm32, netdev
In-Reply-To: <20161120172238.7919-1-pbrobinson@gmail.com>

From: Peter Robinson <pbrobinson@gmail.com>
Date: Sun, 20 Nov 2016 17:22:38 +0000

> There's not much point, except compile test, enabling the stmmac
> platform drivers unless the STM32 SoC is enabled. It's not
> useful without it.
> 
> Signed-off-by: Peter Robinson <pbrobinson@gmail.com>

Applied.

^ permalink raw reply

* Re: [PATCH net-next] udp: avoid one cache line miss in recvmsg()
From: David Miller @ 2016-11-21 16:27 UTC (permalink / raw)
  To: eric.dumazet; +Cc: netdev
In-Reply-To: <1479518283.8455.312.camel@edumazet-glaptop3.roam.corp.google.com>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Fri, 18 Nov 2016 17:18:03 -0800

> From: Eric Dumazet <edumazet@google.com>
> 
> UDP_SKB_CB(skb)->partial_cov is located at offset 66 in skb,
> forcing a cold cache line to be read into the cpu cache.
> 
> We can avoid this cache line miss for UDP sockets,
> as partial_cov has a meaning only for UDPLite.
> 
> Signed-off-by: Eric Dumazet <edumazet@google.com>
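The ordering trick can be illustrated with a tiny sketch. It tests the cheap, already-hot conditions first and dereferences the cold per-skb field only when the socket is UDP-Lite, relying on && short-circuiting. The counter exists only to make the skipped access observable. The names and the exact condition are hypothetical and are not taken from the patched udp_recvmsg().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the skb control block's cold field. */
struct demo_skb {
	bool partial_cov;
};

static unsigned int demo_cold_reads;	/* counts touches of the cold field */

static bool demo_partial_cov(const struct demo_skb *skb)
{
	demo_cold_reads++;	/* in the kernel this would be a cache-line miss */
	return skb->partial_cov;
}

/* Hot conditions first; the cold field is read only for UDP-Lite sockets. */
static bool demo_needs_slow_path(const struct demo_skb *skb, bool is_udplite,
				 unsigned int copied, unsigned int ulen,
				 bool peeking)
{
	return copied < ulen || peeking ||
	       (is_udplite && demo_partial_cov(skb));
}
```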

Applied.

^ permalink raw reply

* Re: [PATCH net-next v2 0/4] geneve: Use LWT more effectively.
From: David Miller @ 2016-11-21 16:28 UTC (permalink / raw)
  To: pshelar; +Cc: netdev
In-Reply-To: <1479521411-53012-1-git-send-email-pshelar@ovn.org>

From: Pravin B Shelar <pshelar@ovn.org>
Date: Fri, 18 Nov 2016 18:10:07 -0800

> The following patch series makes use of the geneve LWT code path
> for the geneve netdev type of device.
> This allows us to simplify the geneve module.
> 
> v1-v2:
> Fix warning reported by kbuild test robot.

This doesn't apply cleanly to net-next, please respin.

Thanks.

^ permalink raw reply

* Re: [RFC PATCH net v2 2/3] dt: bindings: add ethernet phy eee-disable-advert option documentation
From: Andrew Lunn @ 2016-11-21 16:47 UTC (permalink / raw)
  To: Jerome Brunet
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, devicetree-u79uwXL29TY76Z2rM5mHXA,
	Florian Fainelli, Alexandre TORGUE, Neil Armstrong,
	Martin Blumenstingl, Kevin Hilman,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Andre Roth,
	linux-amlogic-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Carlo Caione,
	Giuseppe Cavallaro,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r
In-Reply-To: <1479744993.17538.85.camel-rdvid1DuHRBWk0Htik3J/w@public.gmane.org>

> What I did not realize when doing this patch for the realtek driver is
> that there are already 6 valid modes defined in the kernel:
> 
> #define MDIO_EEE_100TX		MDIO_AN_EEE_ADV_100TX	/* 100TX EEE cap */
> #define MDIO_EEE_1000T		MDIO_AN_EEE_ADV_1000T	/* 1000T EEE cap */
> #define MDIO_EEE_10GT		0x0008	/* 10GT EEE cap */
> #define MDIO_EEE_1000KX		0x0010	/* 1000KX EEE cap */
> #define MDIO_EEE_10GKX4		0x0020	/* 10G KX4 EEE cap */
> #define MDIO_EEE_10GKR		0x0040	/* 10G KR EEE cap */
> 
> I took care of only 2 in the case of realtek.c since it only support
> MDIO_EEE_100TX and MDIO_EEE_1000T.
> 
> Defining a property for each is certainly doable but it does not look
> very nice either. If it extends in the future, it will get even
> messier, especially if you want to disable everything.

Yes, agreed.
 
> What do you think about keeping a single mask value but using the defines
> above in the DT? It would be more readable than hex and easy to
> extend, don't you think?
>
> These defines are already part of the uapi, so I guess we can use those
> in the DT bindings?

I don't think they are accessible from the dtc include path. You will
need to make a copy, in include/dt-bindings/net/phy.h

But yes, using these defines is a good idea.

     Andrew
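A sketch of how a driver might apply such a mask, assuming a single hypothetical DT property that carries an OR of the MDIO_EEE_* values. The numeric values match include/uapi/linux/mdio.h. The helper name and the property semantics are illustrative only, since the binding was still under discussion.

```c
#include <assert.h>
#include <stdint.h>

/* EEE capability bits, with the values used in include/uapi/linux/mdio.h. */
#define MDIO_EEE_100TX	0x0002
#define MDIO_EEE_1000T	0x0004
#define MDIO_EEE_10GT	0x0008
#define MDIO_EEE_1000KX	0x0010
#define MDIO_EEE_10GKX4	0x0020
#define MDIO_EEE_10GKR	0x0040

/* Hypothetical helper: remove the modes a DT mask marks as disabled from
 * what the PHY would otherwise advertise. */
static uint16_t demo_eee_filter_adv(uint16_t supported, uint32_t disable_mask)
{
	assert((disable_mask & ~0x007eu) == 0);	/* only the bits above */
	return (uint16_t)(supported & ~disable_mask);
}
```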

^ permalink raw reply

* [PATCH] net: phy: micrel: fix KSZ8041FTL supported value
From: Kirill Esipov @ 2016-11-21 16:53 UTC (permalink / raw)
  To: netdev; +Cc: linux-kernel, Kirill Esipov

Fix setting of the SUPPORTED_FIBRE bit: it is not present in the
features of KSZ8041, so masking with '&=' could never set it. Mask
first, then OR the bit in.

Signed-off-by: Kirill Esipov <yesipov@gmail.com>
---
 drivers/net/phy/micrel.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/phy/micrel.c b/drivers/net/phy/micrel.c
index 081df68..ea92d52 100644
--- a/drivers/net/phy/micrel.c
+++ b/drivers/net/phy/micrel.c
@@ -318,12 +318,12 @@ static int ksz8041_config_init(struct phy_device *phydev)
 	/* Limit supported and advertised modes in fiber mode */
 	if (of_property_read_bool(of_node, "micrel,fiber-mode")) {
 		phydev->dev_flags |= MICREL_PHY_FXEN;
-		phydev->supported &= SUPPORTED_FIBRE |
-				     SUPPORTED_100baseT_Full |
+		phydev->supported &= SUPPORTED_100baseT_Full |
 				     SUPPORTED_100baseT_Half;
-		phydev->advertising &= ADVERTISED_FIBRE |
-				       ADVERTISED_100baseT_Full |
+		phydev->supported |= SUPPORTED_FIBRE;
+		phydev->advertising &= ADVERTISED_100baseT_Full |
 				       ADVERTISED_100baseT_Half;
+		phydev->advertising |= ADVERTISED_FIBRE;
 		phydev->autoneg = AUTONEG_DISABLE;
 	}
 
-- 
2.7.4
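Why the original code could not work: "x &= MASK" can only clear bits, so if the fibre bit was never in the PHY's feature set, no AND mask will set it. The sketch below uses made-up bit values, not the kernel's SUPPORTED_* constants, to contrast the two versions.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit values, not the kernel's SUPPORTED_* constants. */
#define DEMO_SUP_100_HALF	(1u << 0)
#define DEMO_SUP_100_FULL	(1u << 1)
#define DEMO_SUP_FIBRE		(1u << 2)
#define DEMO_SUP_1000_FULL	(1u << 3)

/* Before the fix: '&=' can only clear bits, so FIBRE stays unset if absent. */
static uint32_t demo_limit_old(uint32_t supported)
{
	supported &= DEMO_SUP_FIBRE | DEMO_SUP_100_FULL | DEMO_SUP_100_HALF;
	return supported;
}

/* After the fix: restrict to the 100 Mb/s modes, then set FIBRE explicitly. */
static uint32_t demo_limit_new(uint32_t supported)
{
	supported &= DEMO_SUP_100_FULL | DEMO_SUP_100_HALF;
	supported |= DEMO_SUP_FIBRE;
	return supported;
}
```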

^ permalink raw reply related

* Re: [PATCH net-next 1/1] driver: macvlan: Remove duplicated IFF_UP condition check in macvlan_forward_source
From: David Miller @ 2016-11-21 16:59 UTC (permalink / raw)
  To: fgao; +Cc: kaber, netdev, gfree.wind
In-Reply-To: <1479687998-456-1-git-send-email-fgao@ikuai8.com>

From: fgao@ikuai8.com
Date: Mon, 21 Nov 2016 08:26:38 +0800

> From: Gao Feng <gfree.wind@gmail.com>
> 
> The function macvlan_forward_source_one has already checked the flag
> IFF_UP, so needn't check it outside in macvlan_forward_source too.
> 
> Signed-off-by: Gao Feng <gfree.wind@gmail.com>
> ---
>  v2: Remove the IFF_UP check in macvlan_forward_source instead of macvlan_forward_source_one
>  v1: Initial patch

Applied.

^ permalink raw reply

* Re: [PATCH net] ipv6 addrconf: Implemented enhanced DAD (RFC7527)
From: Erik Nordmark @ 2016-11-21 17:10 UTC (permalink / raw)
  To: Hannes Frederic Sowa, netdev
In-Reply-To: <b9854c18-f71c-2e71-d352-531bca72eeb4@stressinduktion.org>

On 11/16/16 10:49 PM, Hannes Frederic Sowa wrote:
> I thought about even removing the sysctl altogether and enable enhanced
> DAD by default. ;)
>
> I am in favor of enabling it by default.
>
> But given that there could be broken implementations out there, we
> should give users a choice and provide.
OK, I'll make it the default and send out a new version of the patch. I 
was told I should base the patch on net-next instead of linux-stable so 
I'll move it there.
>
> Could you always generate a nonce in the interface structure? You could
> check the sysctl in the send and receive path to attach and check the
> nonce. This has the advantage that you don't need to delete the
> interface and recreate it to enable/disable enhanced dad on an interface
> (also you can do away with the loop around get_random_bytes to make
> sure its value is not zero, as we don't depend on a non-zero nonce
> variable to signal enabling of the feature, see below).
The nonce is per interface address and not per interface. Furthermore, 
the RFC says that on a retry of DAD the nodes will end up using a 
different nonce implying that even for the same interface address it 
should pick a different nonce for each DAD attempt.
Note that there is no automatic retry of DAD (per RFC4862), and each
try would check the current sysctl setting, so I don't think
pre-generating the nonce would change the behavior.

>> Is that because get_random_bytes() will not fill in anything if there is
>> insufficient entropy available?
> No, just because 0 is a possible return value from the random number
> generator. ;)

Ah - makes sense.

Thanks again for the review,
    Erik

>>>>        inc = ipv6_addr_is_multicast(daddr);
>>>>
>>>> @@ -797,6 +811,16 @@ static void ndisc_recv_ns(struct sk_buff
>>>>    have_ifp:
>>>>            if (ifp->flags & (IFA_F_TENTATIVE|IFA_F_OPTIMISTIC)) {
>>>>                if (dad) {
>>>> +                if (nonce != 0 && ifp->dad_nonce == nonce) {
>>>> +                    /* Matching nonce if looped back */
>>>> +                    if (net_ratelimit())
>>>> +                        ND_PRINTK(2, notice,
>>>> +                              "%s: IPv6 DAD loopback for address %pI6c nonce %llu ignored\n",
>>>> +                               ifp->idev->dev->name,
>>>> +                               &ifp->addr,
>>>> +                               nonce);
>>> If we print the nonce for debugging reasons, we should keep the
>>> endianness consistent between the wire and the debug output.
>> How about printing it as colon-separated hex bytes since that is more
>> clear than decimal?
>> Would follow the network byte order in the packet.
> I would be totally fine with it. It will be probably easier to switch to
> a char[6] array for the nonce then.
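A rough userspace sketch of the two points settled above. It draws a fresh nonce for each DAD attempt, redrawing while the value is all-zero (the patch under review uses zero to mean "no nonce"), and prints the nonce as colon-separated hex bytes in wire order. It uses the 6-byte array suggested in the thread and Linux's getrandom(). The demo_* names are hypothetical, and this is not the kernel's ndisc code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/random.h>

#define DEMO_NONCE_LEN 6	/* the char[6] suggested in the thread */

static bool demo_nonce_is_zero(const unsigned char n[DEMO_NONCE_LEN])
{
	static const unsigned char zero[DEMO_NONCE_LEN];

	return memcmp(n, zero, DEMO_NONCE_LEN) == 0;
}

/* Fresh nonce per DAD attempt. An all-zero value is redrawn because zero
 * means "no nonce". Returns false if no randomness could be obtained. */
static bool demo_new_dad_nonce(unsigned char n[DEMO_NONCE_LEN])
{
	do {
		if (getrandom(n, DEMO_NONCE_LEN, 0) != DEMO_NONCE_LEN)
			return false;
	} while (demo_nonce_is_zero(n));
	return true;
}

/* Format as "xx:xx:xx:xx:xx:xx" in wire (network byte) order. */
static void demo_format_nonce(const unsigned char n[DEMO_NONCE_LEN],
			      char out[3 * DEMO_NONCE_LEN])
{
	for (int i = 0; i < DEMO_NONCE_LEN; i++)
		snprintf(out + 3 * i, 3 * DEMO_NONCE_LEN - 3 * i, "%02x%s",
			 n[i], i + 1 < DEMO_NONCE_LEN ? ":" : "");
}
```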

^ permalink raw reply

* Re: [PATCH for-next 03/11] IB/hns: Optimize the logic of allocating memory using APIs
From: Leon Romanovsky @ 2016-11-21 17:14 UTC (permalink / raw)
  To: Salil Mehta
  Cc: dledford@redhat.com, Huwei (Xavier), oulijun,
	mehta.salil.lnk@gmail.com, linux-rdma@vger.kernel.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org, Linuxarm,
	Zhangping (ZP)
In-Reply-To: <F4CC6FACFEB3C54C9141D49AD221F7F91A7AD7DF@lhreml503-mbx>

[-- Attachment #1: Type: text/plain, Size: 4775 bytes --]

On Mon, Nov 21, 2016 at 04:12:38PM +0000, Salil Mehta wrote:
> > -----Original Message-----
> > From: Leon Romanovsky [mailto:leon@kernel.org]
> > Sent: Wednesday, November 16, 2016 8:36 AM
> > To: Salil Mehta
> > Cc: dledford@redhat.com; Huwei (Xavier); oulijun;
> > mehta.salil.lnk@gmail.com; linux-rdma@vger.kernel.org;
> > netdev@vger.kernel.org; linux-kernel@vger.kernel.org; Linuxarm;
> > Zhangping (ZP)
> > Subject: Re: [PATCH for-next 03/11] IB/hns: Optimize the logic of
> > allocating memory using APIs
> >
> > On Tue, Nov 15, 2016 at 03:52:46PM +0000, Salil Mehta wrote:
> > > > -----Original Message-----
> > > > From: Leon Romanovsky [mailto:leon@kernel.org]
> > > > Sent: Wednesday, November 09, 2016 7:22 AM
> > > > To: Salil Mehta
> > > > Cc: dledford@redhat.com; Huwei (Xavier); oulijun;
> > > > mehta.salil.lnk@gmail.com; linux-rdma@vger.kernel.org;
> > > > netdev@vger.kernel.org; linux-kernel@vger.kernel.org; Linuxarm;
> > > > Zhangping (ZP)
> > > > Subject: Re: [PATCH for-next 03/11] IB/hns: Optimize the logic of
> > > > allocating memory using APIs
> > > >
> > > > On Fri, Nov 04, 2016 at 04:36:25PM +0000, Salil Mehta wrote:
> > > > > From: "Wei Hu (Xavier)" <xavier.huwei@huawei.com>
> > > > >
> > > > > This patch modified the logic of allocating memory using APIs in
> > > > > hns RoCE driver. We used kcalloc instead of kmalloc_array and
> > > > > bitmap_zero. When kcalloc fails, fall back to vzalloc to
> > > > > allocate the memory.
> > > > >
> > > > > Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
> > > > > Signed-off-by: Ping Zhang <zhangping5@huawei.com>
> > > > > Signed-off-by: Salil Mehta  <salil.mehta@huawei.com>
> > > > > ---
> > > > >  drivers/infiniband/hw/hns/hns_roce_mr.c |   15 ++++++++-------
> > > > >  1 file changed, 8 insertions(+), 7 deletions(-)
> > > > >
> > > > > diff --git a/drivers/infiniband/hw/hns/hns_roce_mr.c b/drivers/infiniband/hw/hns/hns_roce_mr.c
> > > > > index fb87883..d3dfb5f 100644
> > > > > --- a/drivers/infiniband/hw/hns/hns_roce_mr.c
> > > > > +++ b/drivers/infiniband/hw/hns/hns_roce_mr.c
> > > > > @@ -137,11 +137,12 @@ static int hns_roce_buddy_init(struct hns_roce_buddy *buddy, int max_order)
> > > > >
> > > > >  	for (i = 0; i <= buddy->max_order; ++i) {
> > > > >  		s = BITS_TO_LONGS(1 << (buddy->max_order - i));
> > > > > -		buddy->bits[i] = kmalloc_array(s, sizeof(long), GFP_KERNEL);
> > > > > -		if (!buddy->bits[i])
> > > > > -			goto err_out_free;
> > > > > -
> > > > > -		bitmap_zero(buddy->bits[i], 1 << (buddy->max_order - i));
> > > > > +		buddy->bits[i] = kcalloc(s, sizeof(long), GFP_KERNEL);
> > > > > +		if (!buddy->bits[i]) {
> > > > > +			buddy->bits[i] = vzalloc(s * sizeof(long));
> > > >
> > > > I wonder, why don't you use directly vzalloc instead of kcalloc
> > > > fallback?
> > > As we know, we will have physically contiguous pages if the kcalloc
> > > call succeeds. This gives us a chance of better performance than
> > > allocations that are only virtually contiguous, as with
> > > vzalloc(). Therefore, the latter is only used as a fallback
> > > when our memory request cannot be satisfied by kcalloc.
> > >
> > > Are you suggesting that there will not be much performance penalty
> > > if we use just vzalloc ?
> >
> > Not exactly,
> > I asked it, because we have similar code in our drivers and this
> > construction looks strange to me.
> >
> > 1. If performance is critical, we will use kmalloc.
> > 2. If performance is not critical, we will use vmalloc.
> >
> > But in this case, such construction shows me that we can live with
> > vmalloc performance and kmalloc allocation are not really needed.
> >
> > In your specific case, I'm not sure that kcalloc will ever fail.
> Performance is definitely critical here. Though, I agree this is a bit
> of an unusual way to allocate memory. In practice, we were encountering
> memory allocation failures using kmalloc (as you can see, the allocation
> size is on the higher side and grows exponentially), so we ended up using
> vmalloc as a fallback. It is a very naïve allocation scheme.

I understand it, we did the same, see our mlx5_vzalloc call.
BTW, we used __GFP_NOWARN flag, which you should consider to use
in your case too.

>
> Maybe we need to rethink this allocation scheme part? Also, I can pull
> back this particular patch for now or just live with vzalloc() till
> we figure out proper solution to this?

It is up to you, I don't think that you should drop it, AFAIK, there is
no other proper solution.

>
> >
> > Thanks
> >
> >
> > >
> > > >
> > > > > +			if (!buddy->bits[i])
> > > > > +				goto err_out_free;
> > > > > +		}
> > > > >  	}

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* Re: Netperf UDP issue with connected sockets
From: Eric Dumazet @ 2016-11-21 18:10 UTC (permalink / raw)
  To: Jesper Dangaard Brouer; +Cc: Rick Jones, netdev, Saeed Mahameed, Tariq Toukan
In-Reply-To: <20161121170351.50a09ee1@redhat.com>

On Mon, 2016-11-21 at 17:03 +0100, Jesper Dangaard Brouer wrote:
> On Thu, 17 Nov 2016 10:51:23 -0800
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
> > On Thu, 2016-11-17 at 19:30 +0100, Jesper Dangaard Brouer wrote:
> > 
> > > The point is I can see a socket Send-Q forming, thus we do know the
> > > application has something to send, and thus a possibility for
> > > non-opportunistic bulking. Allowing/implementing bulk enqueue from
> > > socket layer into qdisc layer, should be fairly simple (and rest of
> > > xmit_more is already in place).    
> > 
> > 
> > As I said, you are fooled by TX completions.
> 
> Obviously TX completions play a role yes, and I bet I can adjust the
> TX completion to cause xmit_more to happen, at the expense of
> introducing added latency.
> 
> The point is the "bloated" spinlock in __dev_queue_xmit is still caused
> by the MMIO tailptr/doorbell.  The added cost occurs when enqueueing
> packets, and results in the inability to get enough packets into the
> qdisc to get xmit_more going (on my system).  I argue that a bulk enqueue
> API would allow us to get past the hurdle of transitioning into
> xmit_more mode more easily.
> 

This is very nice, but we already have bulk enqueue, it is called
xmit_more.

Kernel does not know your application is sending a packet after the one
you send.

xmit_more is not often used when applications/stacks send many small packets.

qdisc is empty (one enqueued packet is immediately dequeued so
skb->xmit_more is 0), and even bypassed (TCQ_F_CAN_BYPASS)

Not sure if this has been tried before, but the doorbell avoidance could
be done by the driver itself, because it knows a TX completion will come
shortly (well... if softirqs are not delayed too much !)

Doorbell would be forced only if :

(    "skb->xmit_more is not set" AND "TX engine is not 'started yet'" )
OR
( too many [1] packets were put in TX ring buffer, no point deferring
more)

Start the pump, but once it is started, let the doorbells be done by
TX completion.

ndo_start_xmit and TX completion handler would have to maintain a shared
state describing if packets were ready but doorbell deferred.

Note that TX completion means "if at least one packet was drained",
otherwise busy polling, constantly calling napi->poll() would force a
doorbell too soon for devices sharing a NAPI for both RX and TX.

But then, maybe busy poll would like to force a doorbell...

I could try these ideas on mlx4 shortly.

[1] limit could be derived from active "ethtool -c" params, eg tx-frames
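The proposal amounts to a small state machine shared between ndo_start_xmit and the TX completion handler. The sketch below models it in userspace under stated assumptions: a counter replaces the MMIO doorbell write, "started" means the NIC has uncompleted descriptors, and the deferral limit is a plain field (the thread suggests deriving it from the ethtool -c tx-frames setting). None of these names come from mlx4 or any other driver.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-queue TX state shared by xmit and TX completion. */
struct demo_txq {
	bool started;		/* NIC has descriptors it has not completed */
	unsigned int deferred;	/* posted but not yet announced by doorbell */
	unsigned int inflight;	/* announced, awaiting completion */
	unsigned int limit;	/* max packets to defer (e.g. from tx-frames) */
	unsigned int doorbells;	/* MMIO writes issued, for illustration */
};

static void demo_ring_doorbell(struct demo_txq *q)
{
	q->doorbells++;		/* stands in for the tail-pointer MMIO write */
	q->inflight += q->deferred;
	q->deferred = 0;
	q->started = true;
}

/* ndo_start_xmit side: force a doorbell only when the pump is idle and no
 * more packets are coming, or when too many packets are already deferred. */
static void demo_xmit(struct demo_txq *q, bool xmit_more)
{
	q->deferred++;
	if ((!xmit_more && !q->started) || q->deferred >= q->limit)
		demo_ring_doorbell(q);
}

/* TX completion side: acts only if at least one packet was drained, so an
 * empty poll (e.g. busy polling) does not force an early doorbell. */
static void demo_tx_complete(struct demo_txq *q, unsigned int drained)
{
	if (drained == 0)
		return;
	assert(drained <= q->inflight);
	q->inflight -= drained;
	if (q->deferred)
		demo_ring_doorbell(q);
	else if (q->inflight == 0)
		q->started = false;
}
```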

^ permalink raw reply

* Re: [PATCH net 1/1] net: l2tp: Treat NET_XMIT_CN as success in l2tp_eth_dev_xmit
From: David Miller @ 2016-11-21 18:11 UTC (permalink / raw)
  To: fgao; +Cc: edumazet, javier, netdev, gfree.wind
In-Reply-To: <1479689781-2125-1-git-send-email-fgao@ikuai8.com>

From: fgao@ikuai8.com
Date: Mon, 21 Nov 2016 08:56:21 +0800

> From: Gao Feng <gfree.wind@gmail.com>
> 
> tc could return NET_XMIT_CN as a congestion notification, but
> it does not mean the packet is lost. Other modules like ipvlan,
> macvlan, and others treat NET_XMIT_CN as success too.
> So l2tp_eth_dev_xmit should add the NET_XMIT_CN check.
> 
> Signed-off-by: Gao Feng <gfree.wind@gmail.com>
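The accounting rule the patch applies can be sketched as below. The three return codes carry the values from include/linux/netdevice.h. The stats structure and the helper are hypothetical. NET_XMIT_CN signals congestion and does not necessarily mean this particular packet was dropped, so it is counted as sent.

```c
#include <assert.h>

/* Return codes with the values used in include/linux/netdevice.h. */
#define NET_XMIT_SUCCESS	0x00
#define NET_XMIT_DROP		0x01
#define NET_XMIT_CN		0x02

/* Hypothetical per-device TX counters. */
struct demo_tx_stats {
	unsigned long tx_packets;
	unsigned long tx_bytes;
	unsigned long tx_dropped;
};

/* Count NET_XMIT_CN like success; anything else is a drop. */
static void demo_account_xmit(struct demo_tx_stats *st, int ret,
			      unsigned int len)
{
	if (ret == NET_XMIT_SUCCESS || ret == NET_XMIT_CN) {
		st->tx_packets++;
		st->tx_bytes += len;
	} else {
		st->tx_dropped++;
	}
}
```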

Applied.

^ permalink raw reply

* Re: [PATCH] VSOCK: add loopback to virtio_transport
From: Jorgen S. Hansen @ 2016-11-21 12:40 UTC (permalink / raw)
  To: Stefan Hajnoczi, netdev@vger.kernel.org
  Cc: cavery@redhat.com, Claudio Imbrenda, David S . Miller
In-Reply-To: <1479397763-22319-1-git-send-email-stefanha@redhat.com>

Hi Stefan,

That should make it on par with the VMCI transport.

Thanks,
Jørgen

________________________________________
From: Stefan Hajnoczi <stefanha@redhat.com>
Sent: Thursday, November 17, 2016 4:49 PM
To: netdev@vger.kernel.org
Cc: cavery@redhat.com; Claudio Imbrenda; Jorgen S. Hansen; David S . Miller; Stefan Hajnoczi
Subject: [PATCH] VSOCK: add loopback to virtio_transport

The VMware VMCI transport supports loopback inside virtual machines.
This patch implements loopback for virtio-vsock.

Flow control is handled by the virtio-vsock protocol as usual.  The
sending process stops transmitting on a connection when the peer's
receive buffer space is exhausted.

Cathy Avery <cavery@redhat.com> noticed this difference between VMCI and
virtio-vsock when a test case using loopback failed.  Although loopback
isn't the main point of AF_VSOCK, it is useful for testing and
virtio-vsock must match VMCI semantics so that userspace programs run
regardless of the underlying transport.

My understanding is that loopback is not supported on the host side with
VMCI.  Follow suit by implementing it only in the guest driver, not the
vhost host driver.

Cc: Jorgen Hansen <jhansen@vmware.com>
Reported-by: Cathy Avery <cavery@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 net/vmw_vsock/virtio_transport.c | 57 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 57 insertions(+)

diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index 936d7ee..f2c4071 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -44,6 +44,10 @@ struct virtio_vsock {
        spinlock_t send_pkt_list_lock;
        struct list_head send_pkt_list;

+       struct work_struct loopback_work;
+       spinlock_t loopback_list_lock;
+       struct list_head loopback_list;
+
        atomic_t queued_replies;

        /* The following fields are protected by rx_lock.  vqs[VSOCK_VQ_RX]
@@ -74,6 +78,42 @@ static u32 virtio_transport_get_local_cid(void)
        return vsock->guest_cid;
 }

+static void virtio_transport_loopback_work(struct work_struct *work)
+{
+       struct virtio_vsock *vsock =
+               container_of(work, struct virtio_vsock, loopback_work);
+       LIST_HEAD(pkts);
+
+       spin_lock_bh(&vsock->loopback_list_lock);
+       list_splice_init(&vsock->loopback_list, &pkts);
+       spin_unlock_bh(&vsock->loopback_list_lock);
+
+       mutex_lock(&vsock->rx_lock);
+       while (!list_empty(&pkts)) {
+               struct virtio_vsock_pkt *pkt;
+
+               pkt = list_first_entry(&pkts, struct virtio_vsock_pkt, list);
+               list_del_init(&pkt->list);
+
+               virtio_transport_recv_pkt(pkt);
+       }
+       mutex_unlock(&vsock->rx_lock);
+}
+
+static int virtio_transport_send_pkt_loopback(struct virtio_vsock *vsock,
+                                             struct virtio_vsock_pkt *pkt)
+{
+       int len = pkt->len;
+
+       spin_lock_bh(&vsock->loopback_list_lock);
+       list_add_tail(&pkt->list, &vsock->loopback_list);
+       spin_unlock_bh(&vsock->loopback_list_lock);
+
+       queue_work(virtio_vsock_workqueue, &vsock->loopback_work);
+
+       return len;
+}
+
 static void
 virtio_transport_send_pkt_work(struct work_struct *work)
 {
@@ -159,6 +199,10 @@ virtio_transport_send_pkt(struct virtio_vsock_pkt *pkt)
                return -ENODEV;
        }

+       if (le32_to_cpu(pkt->hdr.dst_cid) == vsock->guest_cid) {
+               return virtio_transport_send_pkt_loopback(vsock, pkt);
+       }
+
        if (pkt->reply)
                atomic_inc(&vsock->queued_replies);

@@ -510,10 +554,13 @@ static int virtio_vsock_probe(struct virtio_device *vdev)
        mutex_init(&vsock->event_lock);
        spin_lock_init(&vsock->send_pkt_list_lock);
        INIT_LIST_HEAD(&vsock->send_pkt_list);
+       spin_lock_init(&vsock->loopback_list_lock);
+       INIT_LIST_HEAD(&vsock->loopback_list);
        INIT_WORK(&vsock->rx_work, virtio_transport_rx_work);
        INIT_WORK(&vsock->tx_work, virtio_transport_tx_work);
        INIT_WORK(&vsock->event_work, virtio_transport_event_work);
        INIT_WORK(&vsock->send_pkt_work, virtio_transport_send_pkt_work);
+       INIT_WORK(&vsock->loopback_work, virtio_transport_loopback_work);

        mutex_lock(&vsock->rx_lock);
        virtio_vsock_rx_fill(vsock);
@@ -539,6 +586,7 @@ static void virtio_vsock_remove(struct virtio_device *vdev)
        struct virtio_vsock *vsock = vdev->priv;
        struct virtio_vsock_pkt *pkt;

+       flush_work(&vsock->loopback_work);
        flush_work(&vsock->rx_work);
        flush_work(&vsock->tx_work);
        flush_work(&vsock->event_work);
@@ -565,6 +613,15 @@ static void virtio_vsock_remove(struct virtio_device *vdev)
        }
        spin_unlock_bh(&vsock->send_pkt_list_lock);

+       spin_lock_bh(&vsock->loopback_list_lock);
+       while (!list_empty(&vsock->loopback_list)) {
+               pkt = list_first_entry(&vsock->loopback_list,
+                                      struct virtio_vsock_pkt, list);
+               list_del(&pkt->list);
+               virtio_transport_free_pkt(pkt);
+       }
+       spin_unlock_bh(&vsock->loopback_list_lock);
+
        mutex_lock(&the_virtio_vsock_mutex);
        the_virtio_vsock = NULL;
        vsock_core_exit();
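The loopback path follows a common pattern. Senders append under a short spinlock, and the worker detaches the whole list under that lock (list_splice_init() in the patch) and then processes the packets without holding it. Below is a minimal C11 userspace sketch of that pattern. It assumes a hypothetical singly linked FIFO and an atomic_flag spinlock in place of the kernel's list and spinlock APIs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct demo_pkt {
	struct demo_pkt *next;
	int len;
};

struct demo_loopback {
	atomic_flag lock;		/* plays the role of loopback_list_lock */
	struct demo_pkt *head, *tail;	/* FIFO of queued packets */
};

static void demo_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		;	/* spin */
}

static void demo_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

/* Sender side: append under the lock (the patch then calls queue_work()). */
static void demo_loopback_send(struct demo_loopback *lb, struct demo_pkt *pkt)
{
	pkt->next = NULL;
	demo_lock(&lb->lock);
	if (lb->tail)
		lb->tail->next = pkt;
	else
		lb->head = pkt;
	lb->tail = pkt;
	demo_unlock(&lb->lock);
}

/* Worker side: detach the whole queue under the lock, like list_splice_init(),
 * so the packets can then be delivered in order with the lock released. */
static struct demo_pkt *demo_loopback_take_all(struct demo_loopback *lb)
{
	struct demo_pkt *pkts;

	demo_lock(&lb->lock);
	pkts = lb->head;
	lb->head = lb->tail = NULL;
	demo_unlock(&lb->lock);
	return pkts;
}
```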

^ permalink raw reply related

* Re: [PATCH net] tcp: zero ca_priv area when switching cc algorithms
From: David Miller @ 2016-11-21 18:14 UTC (permalink / raw)
  To: fw; +Cc: netdev
In-Reply-To: <1479719317-22437-1-git-send-email-fw@strlen.de>

From: Florian Westphal <fw@strlen.de>
Date: Mon, 21 Nov 2016 10:08:37 +0100

> We need to zero out the private data area when application switches
> connection to different algorithm (TCP_CONGESTION setsockopt).
> 
> When congestion ops get assigned at connect time everything is already
> zeroed because sk_alloc uses GFP_ZERO flag.  But in the setsockopt case
> this contains whatever previous cc placed there.
> 
> Signed-off-by: Florian Westphal <fw@strlen.de>
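A minimal sketch of the idea behind the fix, using hypothetical types. When the congestion-control ops change on a live socket, the per-algorithm private area is cleared before the new algorithm's init hook runs, so the new algorithm never sees the previous one's leftovers. DEMO_CA_PRIV_SIZE is illustrative and is not the kernel's ICSK_CA_PRIV_SIZE.

```c
#include <assert.h>
#include <string.h>

#define DEMO_CA_PRIV_SIZE 64	/* illustrative size only */

struct demo_ca_ops {
	const char *name;
	void (*init)(unsigned char *priv);	/* may be NULL */
};

struct demo_sock {
	const struct demo_ca_ops *ca_ops;
	unsigned char ca_priv[DEMO_CA_PRIV_SIZE];	/* algorithm scratch */
};

/* Switch algorithms (the setsockopt case): wipe the old private state first. */
static void demo_set_congestion(struct demo_sock *sk,
				const struct demo_ca_ops *ops)
{
	assert(ops != NULL);
	sk->ca_ops = ops;
	memset(sk->ca_priv, 0, sizeof(sk->ca_priv));
	if (ops->init)
		ops->init(sk->ca_priv);
}
```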

Good catch, applied, thanks Florian.

^ permalink raw reply

* Re: [PATCH net-next 0/2] bridge: add support for IGMPv3 and MLDv2 querier
From: David Miller @ 2016-11-21 18:17 UTC (permalink / raw)
  To: nikolay; +Cc: netdev, roopa, sashok, stephen, liuhangbin
In-Reply-To: <1479729805-23108-1-git-send-email-nikolay@cumulusnetworks.com>

From: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Date: Mon, 21 Nov 2016 13:03:23 +0100

> This patch-set adds support for IGMPv3 and MLDv2 querier in the bridge.
> Two new options which can be toggled via netlink and sysfs are added that
> control the version per-bridge:
>  multicast_igmp_version - default 2, can be set to 3
>  multicast_mld_version - default 1, can be set to 2 (this option is
>                          disabled if CONFIG_IPV6=n)
> 
> Note that the names do not include "querier", I think that these options
> can be re-used later as more IGMPv3 support is added to the bridge so we
> can avoid adding more options to switch between v2 and v3 behaviour.
> 
> The set uses the already existing br_ip{4,6}_multicast_alloc_query
> functions and adds the appropriate header based on the chosen version.
> 
> For the initial support I have removed the compatibility implementation
> (RFC3376 sec 7.3.1, 7.3.2; RFC3810 sec 8.3.1, 8.3.2), because there are
> some details that we need to sort out.
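A sketch of how the two per-bridge options might gate query construction. The accepted ranges follow the cover letter (IGMP 2 or 3, MLD 1 or 2). The setters and the struct are hypothetical. The sizes are the fixed parts of the IGMPv2 and IGMPv3 query headers (struct igmphdr, struct igmpv3_query) and of the MLDv1 and MLDv2 queries (struct mld_msg, struct mld2_query), excluding any source list.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-bridge settings mirroring the two new options. */
struct demo_br_mcast {
	unsigned char igmp_version;	/* default 2, may be set to 3 */
	unsigned char mld_version;	/* default 1, may be set to 2 */
};

static int demo_set_igmp_version(struct demo_br_mcast *br, unsigned long val)
{
	if (val != 2 && val != 3)
		return -EINVAL;
	br->igmp_version = (unsigned char)val;
	return 0;
}

static int demo_set_mld_version(struct demo_br_mcast *br, unsigned long val)
{
	if (val != 1 && val != 2)
		return -EINVAL;
	br->mld_version = (unsigned char)val;
	return 0;
}

/* Fixed query header size in bytes, before any source addresses. */
static size_t demo_igmp_query_len(const struct demo_br_mcast *br)
{
	return br->igmp_version == 3 ? 12 : 8;
}

static size_t demo_mld_query_len(const struct demo_br_mcast *br)
{
	return br->mld_version == 2 ? 28 : 24;
}
```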

Series applied, thanks.

^ permalink raw reply

* Re: [PATCH v2 next 0/2] tcp: make undo_cwnd mandatory for congestion modules
From: David Miller @ 2016-11-21 18:20 UTC (permalink / raw)
  To: fw; +Cc: netdev
In-Reply-To: <1479734318-30607-1-git-send-email-fw@strlen.de>

From: Florian Westphal <fw@strlen.de>
Date: Mon, 21 Nov 2016 14:18:36 +0100

> highspeed, illinois, scalable, veno and yeah congestion control algorithms
> don't provide an 'undo_cwnd' function.  This makes the stack default to a
> 'reno undo' which doubles cwnd.  However, the ssthresh implementations of
> these algorithms do not halve the slowstart threshold. This causes an issue
> similar to the one fixed for dctcp in ce6dd23329b1e ("dctcp: avoid bogus
> doubling of cwnd after loss").
> 
> In light of this it seems better to remove the fallback and make undo_cwnd
> mandatory.
> 
> First patch fixes those spots where reno undo seems incorrect by providing
> .undo_cwnd functions, second patch removes the fallback.

Series applied, thanks for following up on this.
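Worked numbers make the problem concrete. Take a hypothetical algorithm that backs off to 7/8 of cwnd on loss. With cwnd = 100, ssthresh becomes 88, and a Reno-style undo of max(cwnd, 2 * ssthresh) restores 176, well above the pre-loss window. If the ssthresh hook records the pre-loss cwnd, as the earlier dctcp fix did, the undo returns a sane value. The sketch below uses hypothetical names and assumes the Reno fallback has the max(cwnd, ssthresh << 1) form.

```c
#include <assert.h>
#include <stdint.h>

struct demo_tp {
	uint32_t snd_cwnd;
	uint32_t snd_ssthresh;
	uint32_t loss_cwnd;	/* cwnd saved when the loss was detected */
};

static uint32_t demo_max(uint32_t a, uint32_t b)
{
	return a > b ? a : b;
}

/* Reno-style undo: only correct if ssthresh was half the pre-loss cwnd. */
static uint32_t demo_reno_undo_cwnd(const struct demo_tp *tp)
{
	return demo_max(tp->snd_cwnd, tp->snd_ssthresh << 1);
}

/* Hypothetical gentler back-off (to 7/8) that remembers the pre-loss cwnd. */
static uint32_t demo_gentle_ssthresh(struct demo_tp *tp)
{
	tp->loss_cwnd = tp->snd_cwnd;
	return demo_max(tp->snd_cwnd - (tp->snd_cwnd >> 3), 2);
}

/* Matching undo: restore at most the window held before the loss. */
static uint32_t demo_gentle_undo_cwnd(const struct demo_tp *tp)
{
	return demo_max(tp->snd_cwnd, tp->loss_cwnd);
}
```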

^ permalink raw reply
