Netdev List
 help / color / mirror / Atom feed
* Re: [PATCH v2 16/16] net: Add support for networking over Thunderbolt cable
From: David Miller @ 2017-09-27 18:23 UTC (permalink / raw)
  To: mika.westerberg
  Cc: gregkh, andreas.noever, michael.jamet, yehezkel.bernat,
	amir.jer.levy, Mario.Limonciello, lukas, andriy.shevchenko,
	andrew, linux-kernel, netdev
In-Reply-To: <20170927172702.GE4630@lahna.fi.intel.com>

From: Mika Westerberg <mika.westerberg@linux.intel.com>
Date: Wed, 27 Sep 2017 20:27:02 +0300

> On Wed, Sep 27, 2017 at 09:27:09AM -0700, David Miller wrote:
>> From: Mika Westerberg <mika.westerberg@linux.intel.com>
>> Date: Wed, 27 Sep 2017 16:42:38 +0300
>> 
>> > Using build_skb() then would require to allocate larger buffer, that
>> > includes NET_SKB_PAD + SKB_DATA_ALIGN(skb_shared_info) and that exceeds
>> > page size. Is this something supported by build_skb()? It was not clear
>> > to me based on the code and other users of build_skb() but I may be
>> > missing something.
>> 
>> You need NET_SKB_PAD before and SKB_DATA_ALIGN(skb_shared_info) afterwards.
>> An order 1 page, if that's what you need, should work just fine.
> 
> I mean in order to fit a single ThunderboltIP frame, I would need to
> allocate NET_SKB_PAD+4096+SKB_DATA_ALIGN(skb_shared_info) size buffer.

Which would be an order 1 page or 8192 bytes.

> Is that still fine for build_skb()? Also can I use that with
> skb_add_rx_frag() which seem to take single page?

Again, an order 1 page should work fine.

^ permalink raw reply

* Re: [PATCH net-next] ipv6: do lazy dst->__use update when per cpu dst is available
From: Eric Dumazet @ 2017-09-27 18:03 UTC (permalink / raw)
  To: Wei Wang
  Cc: Paolo Abeni, Martin KaFai Lau, Linux Kernel Network Developers,
	David S. Miller, Hannes Frederic Sowa
In-Reply-To: <CAEA6p_DZxGesjymubWoWx7hQjaZ7Vdchh=cdOW9tAr4F7QvzXg@mail.gmail.com>

>> diff --git a/net/ipv6/route.c b/net/ipv6/route.c

>
> Hi Paolo,
>
> Eric and I discussed about this issue recently as well :).
>
> What about the following change:
>
> diff --git a/include/net/dst.h b/include/net/dst.h
> index 93568bd0a352..33e1d86bcef6 100644
> --- a/include/net/dst.h
> +++ b/include/net/dst.h
> @@ -258,14 +258,18 @@ static inline void dst_hold(struct dst_entry *dst)
>  static inline void dst_use(struct dst_entry *dst, unsigned long time)
>  {
>         dst_hold(dst);
> -       dst->__use++;
> -       dst->lastuse = time;
> +       if (dst->lastuse != time) {
> +               dst->__use++;
> +               dst->lastuse = time;
> +       }
>  }
>
>  static inline void dst_use_noref(struct dst_entry *dst, unsigned long time)
>  {
> -       dst->__use++;
> -       dst->lastuse = time;
> +       if (dst->lastuse != time) {
> +               dst->__use++;
> +               dst->lastuse = time;
> +       }
>  }
>
>  static inline struct dst_entry *dst_clone(struct dst_entry *dst)
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index 26cc9f483b6d..e195f093add3 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -1170,8 +1170,7 @@ struct rt6_info *ip6_pol_route(struct net *net,
> struct fib6_table *table,
>
>                 struct rt6_info *pcpu_rt;
>
> -               rt->dst.lastuse = jiffies;
> -               rt->dst.__use++;
> +               dst_use_noref(rt, jiffies);
>                 pcpu_rt = rt6_get_pcpu_route(rt);
>
>                 if (pcpu_rt) {
>
>
> This way we always only update dst->__use and dst->lastuse at most
> once per jiffy. And we don't really need to update pcpu and then do
> the copy over from pcpu_rt to rt operation.
>
> Another thing is that I don't really see any places making use of
> dst->__use. So maybe we can also get rid of this dst->__use field?
>
> Thanks.
> Wei

Paolo, given we are very close to send Wei awesome work about IPv6
routing cache,
could we ask you to wait few days before doing the same work from your side ?

Main issue is the rwlock, and we are converting it to full RCU.

You are sending patches that are making our job very difficult IMO.

We chose to mimic the change done in neighbour code years ago
( 0ed8ddf4045fcfcac36bad753dc4046118c603ec )

Thanks.

^ permalink raw reply

* Re: [PATCH net-next] ipv6: do lazy dst->__use update when per cpu dst is available
From: Wei Wang @ 2017-09-27 17:44 UTC (permalink / raw)
  To: Paolo Abeni, Eric Dumazet, Martin KaFai Lau
  Cc: Linux Kernel Network Developers, David S. Miller,
	Hannes Frederic Sowa
In-Reply-To: <9d32116e61596b0de6a584b6cf05cae3ce7abb56.1506532145.git.pabeni@redhat.com>

On Wed, Sep 27, 2017 at 10:14 AM, Paolo Abeni <pabeni@redhat.com> wrote:
> When a host is under high ipv6 load, the updates of the ingress
> route '__use' field are a source of relevant contention: such
> field is updated for each packet and several cores can access
> concurrently the dst, even if percpu dst entries are available
> and used.
>
> The __use value is just a rough indication of the dst usage: is
> already updated concurrently from multiple CPUs without any lock,
> so we can decrease the contention leveraging the percpu dst to perform
> __use bulk updates: if a per cpu dst entry is found, we account on
> such entry and we flush the percpu counter once per jiffy.
>
> Performace gain under UDP flood is as follows:
>
> nr RX queues    before          after           delta
>                 kpps            kpps            (%)
> 2               2316            2688            16
> 3               3033            3605            18
> 4               3963            4328            9
> 5               4379            5253            19
> 6               5137            6000            16
>
> Performance gain under TCP syn flood should be measurable as well.
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
>  net/ipv6/route.c | 18 ++++++++++++++++--
>  1 file changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index 26cc9f483b6d..e69f304de950 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -1170,12 +1170,24 @@ struct rt6_info *ip6_pol_route(struct net *net, struct fib6_table *table,
>
>                 struct rt6_info *pcpu_rt;
>
> -               rt->dst.lastuse = jiffies;
> -               rt->dst.__use++;
>                 pcpu_rt = rt6_get_pcpu_route(rt);
>
>                 if (pcpu_rt) {
> +                       unsigned long ts;
> +
>                         read_unlock_bh(&table->tb6_lock);
> +
> +                       /* do lazy updates of rt->dst->__use, at most once
> +                        * per jiffy, to avoid contention on such cacheline.
> +                        */
> +                       ts = jiffies;
> +                       pcpu_rt->dst.__use++;
> +                       if (pcpu_rt->dst.lastuse != ts) {
> +                               rt->dst.__use += pcpu_rt->dst.__use;
> +                               rt->dst.lastuse = ts;
> +                               pcpu_rt->dst.__use = 0;
> +                               pcpu_rt->dst.lastuse = ts;
> +                       }
>                 } else {
>                         /* We have to do the read_unlock first
>                          * because rt6_make_pcpu_route() may trigger
> @@ -1185,6 +1197,8 @@ struct rt6_info *ip6_pol_route(struct net *net, struct fib6_table *table,
>                         read_unlock_bh(&table->tb6_lock);
>                         pcpu_rt = rt6_make_pcpu_route(rt);
>                         dst_release(&rt->dst);
> +                       rt->dst.lastuse = jiffies;
> +                       rt->dst.__use++;
>                 }
>
>                 trace_fib6_table_lookup(net, pcpu_rt, table->tb6_id, fl6);
> --
> 2.13.5
>

Hi Paolo,

Eric and I discussed about this issue recently as well :).

What about the following change:

diff --git a/include/net/dst.h b/include/net/dst.h
index 93568bd0a352..33e1d86bcef6 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -258,14 +258,18 @@ static inline void dst_hold(struct dst_entry *dst)
 static inline void dst_use(struct dst_entry *dst, unsigned long time)
 {
        dst_hold(dst);
-       dst->__use++;
-       dst->lastuse = time;
+       if (dst->lastuse != time) {
+               dst->__use++;
+               dst->lastuse = time;
+       }
 }

 static inline void dst_use_noref(struct dst_entry *dst, unsigned long time)
 {
-       dst->__use++;
-       dst->lastuse = time;
+       if (dst->lastuse != time) {
+               dst->__use++;
+               dst->lastuse = time;
+       }
 }

 static inline struct dst_entry *dst_clone(struct dst_entry *dst)
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 26cc9f483b6d..e195f093add3 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1170,8 +1170,7 @@ struct rt6_info *ip6_pol_route(struct net *net,
struct fib6_table *table,

                struct rt6_info *pcpu_rt;

-               rt->dst.lastuse = jiffies;
-               rt->dst.__use++;
+               dst_use_noref(rt, jiffies);
                pcpu_rt = rt6_get_pcpu_route(rt);

                if (pcpu_rt) {


This way we always only update dst->__use and dst->lastuse at most
once per jiffy. And we don't really need to update pcpu and then do
the copy over from pcpu_rt to rt operation.

Another thing is that I don't really see any places making use of
dst->__use. So maybe we can also get rid of this dst->__use field?

Thanks.
Wei

^ permalink raw reply related

* Re: [PATCH net-next 2/6] bpf: add meta pointer for direct access
From: Alexei Starovoitov @ 2017-09-27 17:32 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: John Fastabend, Daniel Borkmann, davem, peter.waskiewicz.jr,
	jakub.kicinski, netdev, Andy Gospodarek
In-Reply-To: <20170927165457.4265bfc3@redhat.com>

On Wed, Sep 27, 2017 at 04:54:57PM +0200, Jesper Dangaard Brouer wrote:
> On Wed, 27 Sep 2017 06:35:40 -0700
> John Fastabend <john.fastabend@gmail.com> wrote:
> 
> > On 09/27/2017 02:26 AM, Jesper Dangaard Brouer wrote:
> > > On Tue, 26 Sep 2017 21:58:53 +0200
> > > Daniel Borkmann <daniel@iogearbox.net> wrote:
> > >   
> > >> On 09/26/2017 09:13 PM, Jesper Dangaard Brouer wrote:
> > >> [...]  
> > >>> I'm currently implementing a cpumap type, that transfers raw XDP frames
> > >>> to another CPU, and the SKB is allocated on the remote CPU.  (It
> > >>> actually works extremely well).    
> > >>
> > >> Meaning you let all the XDP_PASS packets get processed on a
> > >> different CPU, so you can reserve the whole CPU just for
> > >> prefiltering, right?   
> > > 
> > > Yes, exactly.  Except I use the XDP_REDIRECT action to steer packets.
> > > The trick is using the map-flush point, to transfer packets in bulk to
> > > the remote CPU (single call IPC is too slow), but at the same time
> > > flush single packets if NAPI didn't see a bulk.
> > >   
> > >> Do you have some numbers to share at this point, just curious when
> > >> you mention it works extremely well.  
> > > 
> > > Sure... I've done a lot of benchmarking on this patchset ;-)
> > > I have a benchmark program called xdp_redirect_cpu [1][2], that collect
> > > stats via tracepoints (atm I'm limiting bulking 8 packets, and have
> > > tracepoints at bulk spots, to amortize tracepoint cost 25ns/8=3.125ns)
> > > 
> > >  [1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/samples/bpf/xdp_redirect_cpu_kern.c
> > >  [2] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/samples/bpf/xdp_redirect_cpu_user.c
> > > 
> > > Here I'm installing a DDoS program that drops UDP port 9 (pktgen
> > > packets) on RX CPU=0.  I'm forcing my netperf to hit the same CPU, that
> > > the 11.9Mpps DDoS attack is hitting.
> > > 
> > > Running XDP/eBPF prog_num:4
> > > XDP-cpumap      CPU:to  pps            drop-pps    extra-info
> > > XDP-RX          0       12,030,471     11,966,982  0          
> > > XDP-RX          total   12,030,471     11,966,982 
> > > cpumap-enqueue    0:2   63,488         0           0          
> > > cpumap-enqueue  sum:2   63,488         0           0          
> > > cpumap_kthread  2       63,488         0           3          time_exceed
> > > cpumap_kthread  total   63,488         0           0          
> > > redirect_err    total   0              0          
> > > 
> > > $ netperf -H 172.16.0.2 -t TCP_CRR  -l 10 -D1 -T5,5 -- -r 1024,1024
> > > Local /Remote
> > > Socket Size   Request  Resp.   Elapsed  Trans.
> > > Send   Recv   Size     Size    Time     Rate         
> > > bytes  Bytes  bytes    bytes   secs.    per sec   
> > > 
> > > 16384  87380  1024     1024    10.00    12735.97   
> > > 16384  87380 
> > > 
> > > The netperf TCP_CRR performance is the same, without XDP loaded.
> > >   
> > 
> > Just curious could you also try this with RPS enabled (or does this have
> > RPS enabled). RPS should effectively do the same thing but higher in the
> > stack. I'm curious what the delta would be. Might be another interesting
> > case and fairly easy to setup if you already have the above scripts.
> 
> Yes, I'm essentially competing with RSP, thus such a comparison is very
> relevant...
> 
> This is only a 6 CPUs system. Allocate 2 CPUs to RPS receive and let
> other 4 CPUS process packet.
> 
> Summary of RPS (Receive Packet Steering) performance:
>  * End result is 6.3 Mpps max performance
>  * netperf TCP_CRR is 1 trans/sec.
>  * Each RX-RPS CPU stall at ~3.2Mpps.
> 
> The full test report below with setup:
> 
> The mask needed::
> 
>  perl -e 'printf "%b\n",0x3C'
>  111100
> 
> RPS setup::
> 
>  sudo sh -c 'echo 32768 > /proc/sys/net/core/rps_sock_flow_entries'
> 
>  for N in $(seq 0 5) ; do \
>    sudo sh -c "echo 8192 > /sys/class/net/ixgbe1/queues/rx-$N/rps_flow_cnt" ; \
>    sudo sh -c "echo 3c > /sys/class/net/ixgbe1/queues/rx-$N/rps_cpus" ; \
>    grep -H . /sys/class/net/ixgbe1/queues/rx-$N/rps_cpus ; \
>  done
> 
> Reduce RX queues to two ::
> 
>  ethtool -L ixgbe1 combined 2
> 
> IRQ align to CPU numbers::
> 
>  $ ~/setup01.sh
>  Not root, running with sudo
>   --- Disable Ethernet flow-control ---
>  rx unmodified, ignoring
>  tx unmodified, ignoring
>  no pause parameters changed, aborting
>  rx unmodified, ignoring
>  tx unmodified, ignoring
>  no pause parameters changed, aborting
>   --- Align IRQs ---
>  /proc/irq/54/ixgbe1-TxRx-0/../smp_affinity_list:0
>  /proc/irq/55/ixgbe1-TxRx-1/../smp_affinity_list:1
>  /proc/irq/56/ixgbe1/../smp_affinity_list:0-5
> 
> $ grep -H . /sys/class/net/ixgbe1/queues/rx-*/rps_cpus
> /sys/class/net/ixgbe1/queues/rx-0/rps_cpus:3c
> /sys/class/net/ixgbe1/queues/rx-1/rps_cpus:3c
> 
> Generator is sending: 12,715,782 tx_packets /sec
> 
>  ./pktgen_sample04_many_flows.sh -vi ixgbe2 -m 00:1b:21:bb:9a:84 \
>     -d 172.16.0.2 -t8
> 
> $ nstat > /dev/null && sleep 1 && nstat
> #kernel
> IpInReceives                    6346544            0.0
> IpInDelivers                    6346544            0.0
> IpOutRequests                   1020               0.0
> IcmpOutMsgs                     1020               0.0
> IcmpOutDestUnreachs             1020               0.0
> IcmpMsgOutType3                 1020               0.0
> UdpNoPorts                      6346898            0.0
> IpExtInOctets                   291964714          0.0
> IpExtOutOctets                  73440              0.0
> IpExtInNoECTPkts                6347063            0.0
> 
> $ mpstat -P ALL -u -I SCPU -I SUM
> 
> Average:     CPU    %usr   %nice    %sys   %irq   %soft  %idle
> Average:     all    0.00    0.00    0.00   0.42   72.97  26.61
> Average:       0    0.00    0.00    0.00   0.17   99.83   0.00
> Average:       1    0.00    0.00    0.00   0.17   99.83   0.00
> Average:       2    0.00    0.00    0.00   0.67   60.37  38.96
> Average:       3    0.00    0.00    0.00   0.67   58.70  40.64
> Average:       4    0.00    0.00    0.00   0.67   59.53  39.80
> Average:       5    0.00    0.00    0.00   0.67   58.93  40.40
> 
> Average:     CPU    intr/s
> Average:     all 152067.22
> Average:       0  50064.73
> Average:       1  50089.35
> Average:       2  45095.17
> Average:       3  44875.04
> Average:       4  44906.32
> Average:       5  45152.08
> 
> Average:     CPU     TIMER/s   NET_TX/s   NET_RX/s TASKLET/s  SCHED/s     RCU/s
> Average:       0      609.48       0.17   49431.28      0.00     2.66     21.13
> Average:       1      567.55       0.00   49498.00      0.00     2.66     21.13
> Average:       2      998.34       0.00   43941.60      4.16    82.86     68.22
> Average:       3      540.60       0.17   44140.27      0.00    85.52    108.49
> Average:       4      537.27       0.00   44219.63      0.00    84.53     64.89
> Average:       5      530.78       0.17   44445.59      0.00    85.02     90.52
> 
> From mpstat it looks like it is the RX-RPS CPUs that are the bottleneck.
> 
> Show adapter(s) (ixgbe1) statistics (ONLY that changed!)
> Ethtool(ixgbe1) stat:     11109531 (   11,109,531) <= fdir_miss /sec
> Ethtool(ixgbe1) stat:    380632356 (  380,632,356) <= rx_bytes /sec
> Ethtool(ixgbe1) stat:    812792611 (  812,792,611) <= rx_bytes_nic /sec
> Ethtool(ixgbe1) stat:      1753550 (    1,753,550) <= rx_missed_errors /sec
> Ethtool(ixgbe1) stat:      4602487 (    4,602,487) <= rx_no_dma_resources /sec
> Ethtool(ixgbe1) stat:      6343873 (    6,343,873) <= rx_packets /sec
> Ethtool(ixgbe1) stat:     10946441 (   10,946,441) <= rx_pkts_nic /sec
> Ethtool(ixgbe1) stat:    190287853 (  190,287,853) <= rx_queue_0_bytes /sec
> Ethtool(ixgbe1) stat:      3171464 (    3,171,464) <= rx_queue_0_packets /sec
> Ethtool(ixgbe1) stat:    190344503 (  190,344,503) <= rx_queue_1_bytes /sec
> Ethtool(ixgbe1) stat:      3172408 (    3,172,408) <= rx_queue_1_packets /sec
> 
> Notice, each RX-CPU can only process 3.1Mpps.
> 
> RPS RX-CPU(0):
> 
>  # Overhead  CPU  Symbol
>  # ........  ...  .......................................
>  #
>     11.72%  000  [k] ixgbe_poll
>     11.29%  000  [k] _raw_spin_lock
>     10.35%  000  [k] dev_gro_receive
>      8.36%  000  [k] __build_skb
>      7.35%  000  [k] __skb_get_hash
>      6.22%  000  [k] enqueue_to_backlog
>      5.89%  000  [k] __skb_flow_dissect
>      4.43%  000  [k] inet_gro_receive
>      4.19%  000  [k] ___slab_alloc
>      3.90%  000  [k] queued_spin_lock_slowpath
>      3.85%  000  [k] kmem_cache_alloc
>      3.06%  000  [k] build_skb
>      2.66%  000  [k] get_rps_cpu
>      2.57%  000  [k] napi_gro_receive
>      2.34%  000  [k] eth_type_trans
>      1.81%  000  [k] __cmpxchg_double_slab.isra.61
>      1.47%  000  [k] ixgbe_alloc_rx_buffers
>      1.43%  000  [k] get_partial_node.isra.81
>      0.84%  000  [k] swiotlb_sync_single
>      0.74%  000  [k] udp4_gro_receive
>      0.73%  000  [k] netif_receive_skb_internal
>      0.72%  000  [k] udp_gro_receive
>      0.63%  000  [k] skb_gro_reset_offset
>      0.49%  000  [k] __skb_flow_get_ports
>      0.48%  000  [k] llist_add_batch
>      0.36%  000  [k] swiotlb_sync_single_for_cpu
>      0.34%  000  [k] __slab_alloc
> 
> 
> Remote RPS-CPU(3) getting packets::
> 
>  # Overhead  CPU  Symbol
>  # ........  ...  ..............................................
>  #
>     33.02%  003  [k] poll_idle
>     10.99%  003  [k] __netif_receive_skb_core
>     10.45%  003  [k] page_frag_free
>      8.49%  003  [k] ip_rcv
>      4.19%  003  [k] fib_table_lookup
>      2.84%  003  [k] __udp4_lib_rcv
>      2.81%  003  [k] __slab_free
>      2.23%  003  [k] __udp4_lib_lookup
>      2.09%  003  [k] ip_route_input_rcu
>      2.07%  003  [k] kmem_cache_free
>      2.06%  003  [k] udp_v4_early_demux
>      1.73%  003  [k] ip_rcv_finish

Very interesting data.
So above perf report compares to xdp-redirect-cpu this one:
Perf top on a CPU(3) that have to alloc and free SKBs etc.

# Overhead  CPU  Symbol
# ........  ...  .......................................
#
    15.51%  003  [k] fib_table_lookup
     8.91%  003  [k] cpu_map_kthread_run
     8.04%  003  [k] build_skb
     7.88%  003  [k] page_frag_free
     5.13%  003  [k] kmem_cache_alloc
     4.76%  003  [k] ip_route_input_rcu
     4.59%  003  [k] kmem_cache_free
     4.02%  003  [k] __udp4_lib_rcv
     3.20%  003  [k] fib_validate_source
     3.02%  003  [k] __netif_receive_skb_core
     3.02%  003  [k] udp_v4_early_demux
     2.90%  003  [k] ip_rcv
     2.80%  003  [k] ip_rcv_finish

right?
and in RPS case the consumer cpu is 33% idle whereas in redirect-cpu
you can load it up all the way.
Am I interpreting all this correctly that with RPS cpu0 cannot
distributed the packets to other cpus fast enough and that's
a bottleneck?
whereas in redirect-cpu you're doing early packet distribution
before skb alloc?
So in other words with redirect-cpu all consumer cpus are doing
skb alloc and in RPS cpu0 is allocating skbs for all ?
and that's where 6M->12M performance gain comes from?

^ permalink raw reply

* [PATCH V3] r8152:  add Linksys USB3GIGV1 id
From: Grant Grundler @ 2017-09-27 17:28 UTC (permalink / raw)
  To: Hayes Wang, Oliver Neukum
  Cc: linux-usb, David S . Miller, LKML, netdev, Grant Grundler

This linksys dongle by default comes up in cdc_ether mode.
This patch allows r8152 to claim the device:
   Bus 002 Device 002: ID 13b1:0041 Linksys

Signed-off-by: Grant Grundler <grundler@chromium.org>
---
 drivers/net/usb/cdc_ether.c | 10 ++++++++++
 drivers/net/usb/r8152.c     |  2 ++
 2 files changed, 12 insertions(+)

V3: for backwards compat, add #ifdef CONFIG_USB_RTL8152 around
    the cdc_ether blacklist entry so the cdc_ether driver can
    still claim the device if r8152 driver isn't configured.

V2: add LINKSYS_VENDOR_ID to cdc_ether blacklist

diff --git a/drivers/net/usb/cdc_ether.c b/drivers/net/usb/cdc_ether.c
index 8ab281b478f2..446dcc0f1f70 100644
--- a/drivers/net/usb/cdc_ether.c
+++ b/drivers/net/usb/cdc_ether.c
@@ -546,6 +546,7 @@ static const struct driver_info wwan_info = {
 #define DELL_VENDOR_ID		0x413C
 #define REALTEK_VENDOR_ID	0x0bda
 #define SAMSUNG_VENDOR_ID	0x04e8
+#define LINKSYS_VENDOR_ID	0x13b1
 #define LENOVO_VENDOR_ID	0x17ef
 #define NVIDIA_VENDOR_ID	0x0955
 #define HP_VENDOR_ID		0x03f0
@@ -737,6 +738,15 @@ static const struct usb_device_id	products[] = {
 	.driver_info = 0,
 },
 
+#ifdef CONFIG_USB_RTL8152
+/* Linksys USB3GIGV1 Ethernet Adapter */
+{
+	USB_DEVICE_AND_INTERFACE_INFO(LINKSYS_VENDOR_ID, 0x0041, USB_CLASS_COMM,
+			USB_CDC_SUBCLASS_ETHERNET, USB_CDC_PROTO_NONE),
+	.driver_info = 0,
+},
+#endif
+
 /* ThinkPad USB-C Dock (based on Realtek RTL8153) */
 {
 	USB_DEVICE_AND_INTERFACE_INFO(LENOVO_VENDOR_ID, 0x3062, USB_CLASS_COMM,
diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index ceb78e2ea4f0..941ece08ba78 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -613,6 +613,7 @@ enum rtl8152_flags {
 #define VENDOR_ID_MICROSOFT		0x045e
 #define VENDOR_ID_SAMSUNG		0x04e8
 #define VENDOR_ID_LENOVO		0x17ef
+#define VENDOR_ID_LINKSYS		0x13b1
 #define VENDOR_ID_NVIDIA		0x0955
 
 #define MCU_TYPE_PLA			0x0100
@@ -5316,6 +5317,7 @@ static const struct usb_device_id rtl8152_table[] = {
 	{REALTEK_USB_DEVICE(VENDOR_ID_LENOVO,  0x7205)},
 	{REALTEK_USB_DEVICE(VENDOR_ID_LENOVO,  0x720c)},
 	{REALTEK_USB_DEVICE(VENDOR_ID_LENOVO,  0x7214)},
+	{REALTEK_USB_DEVICE(VENDOR_ID_LINKSYS, 0x0041)},
 	{REALTEK_USB_DEVICE(VENDOR_ID_NVIDIA,  0x09ff)},
 	{}
 };
-- 
2.14.2.822.g60be5d43e6-goog

^ permalink raw reply related

* Re: [PATCH v2 16/16] net: Add support for networking over Thunderbolt cable
From: Mika Westerberg @ 2017-09-27 17:27 UTC (permalink / raw)
  To: David Miller
  Cc: gregkh, andreas.noever, michael.jamet, yehezkel.bernat,
	amir.jer.levy, Mario.Limonciello, lukas, andriy.shevchenko,
	andrew, linux-kernel, netdev
In-Reply-To: <20170927.092709.1826177647592316221.davem@davemloft.net>

On Wed, Sep 27, 2017 at 09:27:09AM -0700, David Miller wrote:
> From: Mika Westerberg <mika.westerberg@linux.intel.com>
> Date: Wed, 27 Sep 2017 16:42:38 +0300
> 
> > Using build_skb() then would require to allocate larger buffer, that
> > includes NET_SKB_PAD + SKB_DATA_ALIGN(skb_shared_info) and that exceeds
> > page size. Is this something supported by build_skb()? It was not clear
> > to me based on the code and other users of build_skb() but I may be
> > missing something.
> 
> You need NET_SKB_PAD before and SKB_DATA_ALIGN(skb_shared_info) afterwards.
> An order 1 page, if that's what you need, should work just fine.

I mean in order to fit a single ThunderboltIP frame, I would need to
allocate NET_SKB_PAD+4096+SKB_DATA_ALIGN(skb_shared_info) size buffer.
Is that still fine for build_skb()? Also can I use that with
skb_add_rx_frag() which seem to take single page?

ThunderboltIP protocol basically takes advantage of TSO/LRO but it
actually does not do any segmentation. Instead it just splits the 64kB
large package into smaller 4k frames (which each include 12 byte header)
and pushes those over the Thunderbolt medium. The receiver side then
does the opposite.

Thanks and sorry for dummy questions. I'm just not too familiar with
the networking subsystem (yet).

^ permalink raw reply

* [PATCH net-next] ipv6: do lazy dst->__use update when per cpu dst is available
From: Paolo Abeni @ 2017-09-27 17:14 UTC (permalink / raw)
  To: netdev; +Cc: David S. Miller, Hannes Frederic Sowa, Wei Wang

When a host is under high ipv6 load, the updates of the ingress
route '__use' field are a source of relevant contention: such
field is updated for each packet and several cores can access
concurrently the dst, even if percpu dst entries are available
and used.

The __use value is just a rough indication of the dst usage: is
already updated concurrently from multiple CPUs without any lock,
so we can decrease the contention leveraging the percpu dst to perform
__use bulk updates: if a per cpu dst entry is found, we account on
such entry and we flush the percpu counter once per jiffy.

Performace gain under UDP flood is as follows:

nr RX queues	before		after		delta
		kpps		kpps		(%)
2		2316		2688		16
3		3033		3605		18
4		3963		4328		9
5		4379		5253		19
6		5137		6000		16

Performance gain under TCP syn flood should be measurable as well.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/ipv6/route.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 26cc9f483b6d..e69f304de950 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1170,12 +1170,24 @@ struct rt6_info *ip6_pol_route(struct net *net, struct fib6_table *table,
 
 		struct rt6_info *pcpu_rt;
 
-		rt->dst.lastuse = jiffies;
-		rt->dst.__use++;
 		pcpu_rt = rt6_get_pcpu_route(rt);
 
 		if (pcpu_rt) {
+			unsigned long ts;
+
 			read_unlock_bh(&table->tb6_lock);
+
+			/* do lazy updates of rt->dst->__use, at most once
+			 * per jiffy, to avoid contention on such cacheline.
+			 */
+			ts = jiffies;
+			pcpu_rt->dst.__use++;
+			if (pcpu_rt->dst.lastuse != ts) {
+				rt->dst.__use += pcpu_rt->dst.__use;
+				rt->dst.lastuse = ts;
+				pcpu_rt->dst.__use = 0;
+				pcpu_rt->dst.lastuse = ts;
+			}
 		} else {
 			/* We have to do the read_unlock first
 			 * because rt6_make_pcpu_route() may trigger
@@ -1185,6 +1197,8 @@ struct rt6_info *ip6_pol_route(struct net *net, struct fib6_table *table,
 			read_unlock_bh(&table->tb6_lock);
 			pcpu_rt = rt6_make_pcpu_route(rt);
 			dst_release(&rt->dst);
+			rt->dst.lastuse = jiffies;
+			rt->dst.__use++;
 		}
 
 		trace_fib6_table_lookup(net, pcpu_rt, table->tb6_id, fl6);
-- 
2.13.5

^ permalink raw reply related

* Re: [PATCH 2/2] uapi: add a compatibility layer between linux/uio.h and glibc
From: Lee Duncan @ 2017-09-27 16:46 UTC (permalink / raw)
  To: linux-kernel, netdev; +Cc: Dmitry V. Levin, David Miller, Lee Duncan
In-Reply-To: <20170221201914.GA28360@altlinux.org>

Ping? I never saw any response to this.

On Wed, 22 Feb 2017 05:29:47 +0300, Dmitry V Levin wrote:
> Do not define struct iovec in linux/uio.h when <sys/uio.h> or <fcntl.h>
> is already included and provides these definitions.
> 
> This fixes the following compilation error when <sys/uio.h> or <fcntl.h>
> is included before <linux/uio.h>:
> 
> /usr/include/linux/uio.h:16:8: error: redefinition of 'struct iovec'
> 
> Signed-off-by: Dmitry V. Levin <ldv@...linux.org>
> ---
>  include/uapi/linux/libc-compat.h | 10 ++++++++++
>  include/uapi/linux/uio.h         |  3 +++
>  2 files changed, 13 insertions(+)
> 
> diff --git a/include/uapi/linux/libc-compat.h b/include/uapi/linux/libc-compat.h
> index 481e3b1..9b88586 100644
> --- a/include/uapi/linux/libc-compat.h
> +++ b/include/uapi/linux/libc-compat.h
> @@ -205,6 +205,13 @@
>  #define __UAPI_DEF_TIMEZONE		1
>  #endif
>  
> +/* Coordinate with glibc bits/uio.h header. */
> +#if defined(_SYS_UIO_H) || defined(_FCNTL_H)
> +#define __UAPI_DEF_IOVEC		0
> +#else
> +#define __UAPI_DEF_IOVEC		1
> +#endif
> +
>  /* Definitions for xattr.h */
>  #if defined(_SYS_XATTR_H)
>  #define __UAPI_DEF_XATTR		0
> @@ -261,6 +268,9 @@
>  #define __UAPI_DEF_TIMEVAL			1
>  #define __UAPI_DEF_TIMEZONE			1
>  
> +/* Definitions for uio.h */
> +#define __UAPI_DEF_IOVEC		1
> +
>  /* Definitions for xattr.h */
>  #define __UAPI_DEF_XATTR		1
>  
> diff --git a/include/uapi/linux/uio.h b/include/uapi/linux/uio.h
> index 2731d56..e6e12cf 100644
> --- a/include/uapi/linux/uio.h
> +++ b/include/uapi/linux/uio.h
> @@ -9,15 +9,18 @@
>  #ifndef _UAPI__LINUX_UIO_H
>  #define _UAPI__LINUX_UIO_H
>  
> +#include <linux/libc-compat.h>
>  #include <linux/compiler.h>
>  #include <linux/types.h>
>  
>  
> +#if __UAPI_DEF_IOVEC
>  struct iovec
>  {
>  	void __user *iov_base;	/* BSD uses caddr_t (1003.1g requires void *) */
>  	__kernel_size_t iov_len; /* Must be size_t (1003.1g) */
>  };
> +#endif /* __UAPI_DEF_IOVEC */
>  
>  /*
>   *	UIO_MAXIOV shall be at least 16 1003.1g (5.4.1.1)
> -- 
> ldv

-- 
Lee Duncan
SUSE Labs

^ permalink raw reply

* RE: [PATCH net v2] net: dsa: mv88e6xxx: lock mutex when freeing IRQs
From: David Laight @ 2017-09-27 16:40 UTC (permalink / raw)
  To: 'Andrew Lunn'
  Cc: 'Vivien Didelot', netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, kernel@savoirfairelinux.com,
	David S. Miller, Florian Fainelli
In-Reply-To: <20170927130711.GD13516@lunn.ch>

From: Andrew Lunn
> Sent: 27 September 2017 14:07
> To: David Laight
> On Wed, Sep 27, 2017 at 09:06:01AM +0000, David Laight wrote:
> > From: Vivien Didelot
> > > Sent: 26 September 2017 19:57
> > > mv88e6xxx_g2_irq_free locks the registers mutex, but not
> > > mv88e6xxx_g1_irq_free, which results in a stack trace from
> > > assert_reg_lock when unloading the mv88e6xxx module. Fix this.
> > >
> > > Fixes: 3460a5770ce9 ("net: dsa: mv88e6xxx: Mask g1 interrupts and free interrupt")
> > > Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
> > > ---
> > >  drivers/net/dsa/mv88e6xxx/chip.c | 2 ++
> > >  1 file changed, 2 insertions(+)
> > >
> > > diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
> > > index c6678aa9b4ef..e7ff7483d2fb 100644
> > > --- a/drivers/net/dsa/mv88e6xxx/chip.c
> > > +++ b/drivers/net/dsa/mv88e6xxx/chip.c
> > > @@ -3947,7 +3947,9 @@ static void mv88e6xxx_remove(struct mdio_device *mdiodev)
> > >  	if (chip->irq > 0) {
> > >  		if (chip->info->g2_irqs > 0)
> > >  			mv88e6xxx_g2_irq_free(chip);
> > > +		mutex_lock(&chip->reg_lock);
> > >  		mv88e6xxx_g1_irq_free(chip);
> > > +		mutex_unlock(&chip->reg_lock);
> >
> > Isn't the irq_free code likely to have to sleep waiting for any
> > ISR to complete??
> 
> Hi David
> 
> Possibly. But this is a mutex, not a spinlock. So sleeping is O.K.
> Or am i missing something?

Looks like I was missing some coffee :-)

	David

^ permalink raw reply

* Re: [PATCH V2] r8152: add Linksys USB3GIGV1 id
From: Grant Grundler @ 2017-09-27 16:39 UTC (permalink / raw)
  To: Oliver Neukum
  Cc: Doug Anderson, Grant Grundler, David S . Miller,
	Greg Kroah-Hartman, Hayes Wang, LKML, linux-usb, netdev
In-Reply-To: <1506496556.28482.3.camel@suse.com>

On Wed, Sep 27, 2017 at 12:15 AM, Oliver Neukum <oneukum@suse.com> wrote:
> Am Dienstag, den 26.09.2017, 08:19 -0700 schrieb Doug Anderson:
>>
>> I know that for at least some of the adapters in the CDC Ethernet
>> blacklist it was claimed that the CDC Ethernet support in the adapter
>> was kinda broken anyway so the blacklist made sense.  ...but for the
>> Linksys Gigabit adapter the CDC Ethernet driver seems to work OK, it's
>> just not quite as full featured / efficient as the R8152 driver.
>>
>> Is that not a concern?  I guess you could tell people in this
>> situation that they simply need to enable the R8152 driver to get
>> continued support for their Ethernet adapter?
>
> Hi,
>
> yes, it is a valid concern. An #ifdef will be needed.

Good idea - I will post V3 shortly.

I'm assuming you mean to add #ifdef CONFIG_USB_RTL8152 around the
blacklist entry in cdc_ether driver.

cheers,
grant

^ permalink raw reply

* [PATCH net-next] tcp: fix under-evaluated ssthresh in TCP Vegas
From: Hoang Tran @ 2017-09-27 16:30 UTC (permalink / raw)
  To: netdev
  Cc: Hoang Tran, David S. Miller, Alexey Kuznetsov, Hideaki YOSHIFUJI,
	linux-kernel

With the commit 76174004a0f19785 (tcp: do not slow start when cwnd equals
ssthresh), the comparison to the reduced cwnd in tcp_vegas_ssthresh() would
under-evaluate the ssthresh.

Signed-off-by: Hoang Tran <hoang.tran@uclouvain.be>
---
 net/ipv4/tcp_vegas.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_vegas.c b/net/ipv4/tcp_vegas.c
index 218cfcc..ee113ff 100644
--- a/net/ipv4/tcp_vegas.c
+++ b/net/ipv4/tcp_vegas.c
@@ -158,7 +158,7 @@ EXPORT_SYMBOL_GPL(tcp_vegas_cwnd_event);
 
 static inline u32 tcp_vegas_ssthresh(struct tcp_sock *tp)
 {
-	return  min(tp->snd_ssthresh, tp->snd_cwnd-1);
+	return  min(tp->snd_ssthresh, tp->snd_cwnd);
 }
 
 static void tcp_vegas_cong_avoid(struct sock *sk, u32 ack, u32 acked)
-- 
2.7.4

^ permalink raw reply related

* Re: [PATCH net-next] libbpf: use map_flags when creating maps
From: Alexei Starovoitov @ 2017-09-27 16:29 UTC (permalink / raw)
  To: Craig Gallek, Daniel Borkmann, David S . Miller; +Cc: Chonggang Li, netdev
In-Reply-To: <20170927140458.44337-1-kraigatgoog@gmail.com>

On 9/27/17 7:04 AM, Craig Gallek wrote:
> From: Craig Gallek <kraig@google.com>
>
> This extends struct bpf_map_def to include a flags field.  Note that
> this has the potential to break the validation logic in
> bpf_object__validate_maps and bpf_object__init_maps as they use
> sizeof(struct bpf_map_def) as a minimal allowable size of a map section.
> Any bpf program compiled with a smaller struct bpf_map_def will fail this
> check.
>
> I don't believe this will be an issue in practice as both compile-time
> definitions of struct bpf_map_def (in samples/bpf/bpf_load.h and
> tools/testing/selftests/bpf/bpf_helpers.h) have always been larger
> than this newly updated version in libbpf.h.
>
> Signed-off-by: Craig Gallek <kraig@google.com>
> ---
>  tools/lib/bpf/libbpf.c | 2 +-
>  tools/lib/bpf/libbpf.h | 1 +
>  2 files changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> index 35f6dfcdc565..6bea85f260a3 100644
> --- a/tools/lib/bpf/libbpf.c
> +++ b/tools/lib/bpf/libbpf.c
> @@ -874,7 +874,7 @@ bpf_object__create_maps(struct bpf_object *obj)
>  				      def->key_size,
>  				      def->value_size,
>  				      def->max_entries,
> -				      0);
> +				      def->map_flags);
>  		if (*pfd < 0) {
>  			size_t j;
>  			int err = *pfd;
> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
> index 7959086eb9c9..6e20003109e0 100644
> --- a/tools/lib/bpf/libbpf.h
> +++ b/tools/lib/bpf/libbpf.h
> @@ -207,6 +207,7 @@ struct bpf_map_def {
>  	unsigned int key_size;
>  	unsigned int value_size;
>  	unsigned int max_entries;
> +	unsigned int map_flags;
>  };

yes it will break loading of pre-compiled .o
Instead of breaking, let's fix the loader to do it the way
samples/bpf/bpf_load.c does.
See commit 156450d9d964 ("samples/bpf: make bpf_load.c code compatible 
with ELF maps section changes")

^ permalink raw reply

* Re: [PATCH v2 16/16] net: Add support for networking over Thunderbolt cable
From: David Miller @ 2017-09-27 16:27 UTC (permalink / raw)
  To: mika.westerberg
  Cc: gregkh, andreas.noever, michael.jamet, yehezkel.bernat,
	amir.jer.levy, Mario.Limonciello, lukas, andriy.shevchenko,
	andrew, linux-kernel, netdev
In-Reply-To: <20170927134238.GC4630@lahna.fi.intel.com>

From: Mika Westerberg <mika.westerberg@linux.intel.com>
Date: Wed, 27 Sep 2017 16:42:38 +0300

> Using build_skb() then would require to allocate larger buffer, that
> includes NET_SKB_PAD + SKB_DATA_ALIGN(skb_shared_info) and that exceeds
> page size. Is this something supported by build_skb()? It was not clear
> to me based on the code and other users of build_skb() but I may be
> missing something.

You need NET_SKB_PAD before and SKB_DATA_ALIGN(skb_shared_info) afterwards.
An order 1 page, if that's what you need, should work just fine.

^ permalink raw reply

* Re: [PATCH] net/ipv4: Update sk_for_each_entry_offset_rcu macro to utilize rcu methods hlist_next_rcu. This fixes the warnings thrown by sparse regarding net/ipv4/udp.c on line 1974.
From: Tim Hansen @ 2017-09-27 16:25 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, linux-kernel
In-Reply-To: <20170926.200339.2029420994787006076.davem@davemloft.net>

> Tue, Sep 26, 2017 at 08:03:39PM -0700, David Miller wrote:
> From: Tim Hansen <devtimhansen@gmail.com>
> Date: Tue, 26 Sep 2017 20:54:05 -0400
>
> > Signed-off-by: Tim Hansen <devtimhansen@gmail.com>
>
> This is a poor patch submission on many levels.
>

Apologies Dave, this is my first patch. I appreciate the quick review and helpful feedback.

> But the main problem, is that there is no use of
> sk_for_each_entry_offset_rcu() in any of my networking kernel trees.
>

Using the get_maintainers.pl on include/net/sock.h brings up your name and the netdev mailing list.  Forgive me if I'm misunderstanding what you mean by this?

> Referencing code by line number never works, you have to mention
> what version of the kernel, what tree, and where in what fucntion
> the problem is occurring.
>

I am using your tree net tree for now: git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git on HEAD. HEAD is c2cc187e53011c1c4931055984657da9085c763b for me currently on your tree.

Before I was on the 4.13 tag pulled from linus' tree.

The line number is indeed useless in hindsight since there are many different trees. I won't do that again.

Using sparse 0.5.0 on HEAD of your net tree, I run make C=1 net/ipv4/. It throws the error:

"net/ipv4/udp.c:1981:9: error: incompatible types in comparison expression (different address spaces)
net/ipv4/udp.c:1981:9: error: incompatible types in comparison expression (different address spaces)"

This points to the function sk_for_each_entry_offset_rcu() in __udp4_lib_mcast_deliver in net/ipv4/udp.c.

Inspecting this macro in include/net/sock.h is what lead to this patch.

Applying the patch silences those warnings but clearly this is -not- a proper way of fixing the error. Any guidance would be greatly appreciated.

> Secondly, sk_for_each_entry_offset_rcu() is not meant to be used
> in _raw() contexts.  This is why it's not called
> sk_for_each_entry_offset_rcu_raw().

Absolutely makes sense. I am not familar with the kernel naming standards fully yet but this is obvious in hindsight.

>
> The sparse warning is probably legitimate, and points to a bug.
>
> But nobody can tell where becuase you haven't told us what tree
> and where this happens.

Hopefully my reply has enough detail for reproduction and further debugging. Please let me know if I should supply any additional information.

^ permalink raw reply

* Re: [PATCH v3 1/1] ip_tunnel: add mpls over gre encapsulation
From: Amine Kherbouche @ 2017-09-27 16:24 UTC (permalink / raw)
  To: Roopa Prabhu; +Cc: netdev@vger.kernel.org, xeb, David Lamparter
In-Reply-To: <CAJieiUgvHmXDmY3rvVj4u_1r=m0kOfDE_HYYYkY+Z73K9ZbwKw@mail.gmail.com>



On 09/27/2017 06:20 PM, Roopa Prabhu wrote:
> I think its better to bring the patch back in.

Sounds good, ok

^ permalink raw reply

* Re: [PATCH v2 02/16] thunderbolt: Add support for XDomain properties
From: David Miller @ 2017-09-27 16:22 UTC (permalink / raw)
  To: mika.westerberg
  Cc: gregkh, andreas.noever, michael.jamet, yehezkel.bernat,
	amir.jer.levy, Mario.Limonciello, lukas, andriy.shevchenko,
	andrew, linux-kernel, netdev
In-Reply-To: <20170927113241.GB4630@lahna.fi.intel.com>

From: Mika Westerberg <mika.westerberg@linux.intel.com>
Date: Wed, 27 Sep 2017 14:32:41 +0300

> Just for my education, is there some rule which tells when __packed is
> to be used? For example the above structures are all 32-bit aligned but
> how about something like:
> 
> struct foo {
> 	u32 value1;
> 	u8 value2;
> };
> 
> If the on-wire format requires such structures I assume __packed
> is needed here?

Usually header elements are 32-bit aligned in a protocol, so it wouldn't
be specified like that.

The only legitimate case I've seen is where things are purposefully
misaligned within the header, like this:

struct foo {
	u16	x;
	u64	y;
	u16	z;
};

Where the 'y' element is 2-byte aligned.

Fortunately, those situations are extremely rare.

^ permalink raw reply

* Re: [PATCH v3 1/1] ip_tunnel: add mpls over gre encapsulation
From: Roopa Prabhu @ 2017-09-27 16:20 UTC (permalink / raw)
  To: Amine Kherbouche; +Cc: netdev@vger.kernel.org, xeb, David Lamparter
In-Reply-To: <1ce61dd3-293b-c26b-e173-3bdae4ba46a7@6wind.com>

On Wed, Sep 27, 2017 at 9:08 AM, Amine Kherbouche
<amine.kherbouche@6wind.com> wrote:
>
>
> On 09/27/2017 05:36 PM, Roopa Prabhu wrote:
>>
>> Amine, one small nit here.., if you define mpls_gre_rcv in gre header
>> (like you had initially), you could do the below...
>>
>> #if IS_ENABLED(CONFIG_MPLS)
>> mpls_gre_rcv()
>> {
>>      /* real func */
>> }
>> #else
>> mpls_gre_rcv()
>> {
>>     kfree_skb(skb)
>>     return NET_RX_DROP
>> }
>> #endif
>>
>> and the check in gre_rcv() reduces to
>>
>> if (unlikely(tpi.proto == htons(ETH_P_MPLS_UC)))
>>         return mpls_gre_rcv(skb, hdr_len);
>>
>> Which looks much cleaner.
>
>
> If I do that, do I have to add back the patch that export mpls_forward() or
> just merge it with this one ?

I think its better to bring the patch back in.

^ permalink raw reply

* Re: [PATCH v3 1/1] ip_tunnel: add mpls over gre encapsulation
From: Amine Kherbouche @ 2017-09-27 16:08 UTC (permalink / raw)
  To: Roopa Prabhu; +Cc: netdev@vger.kernel.org, xeb, David Lamparter
In-Reply-To: <CAJieiUgkOwJvrEq9vw+2-Up0qc9hvxt3M6LWnf8FSyx9VRX-Dw@mail.gmail.com>



On 09/27/2017 05:36 PM, Roopa Prabhu wrote:
> Amine, one small nit here.., if you define mpls_gre_rcv in gre header
> (like you had initially), you could do the below...
>
> #if IS_ENABLED(CONFIG_MPLS)
> mpls_gre_rcv()
> {
>      /* real func */
> }
> #else
> mpls_gre_rcv()
> {
>     kfree_skb(skb)
>     return NET_RX_DROP
> }
> #endif
>
> and the check in gre_rcv() reduces to
>
> if (unlikely(tpi.proto == htons(ETH_P_MPLS_UC)))
>         return mpls_gre_rcv(skb, hdr_len);
>
> Which looks much cleaner.

If I do that, do I have to add back the patch that export mpls_forward() 
or just merge it with this one ?

^ permalink raw reply

* RE: [PATCH v2 02/16] thunderbolt: Add support for XDomain properties
From: David Laight @ 2017-09-27 16:06 UTC (permalink / raw)
  To: 'Mika Westerberg', David Miller
  Cc: gregkh@linuxfoundation.org, andreas.noever@gmail.com,
	michael.jamet@intel.com, yehezkel.bernat@intel.com,
	amir.jer.levy@intel.com, Mario.Limonciello@dell.com,
	lukas@wunner.de, andriy.shevchenko@linux.intel.com,
	andrew@lunn.ch, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org
In-Reply-To: <20170927113241.GB4630@lahna.fi.intel.com>

From: Mika Westerberg
> Sent: 27 September 2017 12:33
...
> Just for my education, is there some rule which tells when __packed is
> to be used? For example the above structures are all 32-bit aligned but
> how about something like:
> 
> struct foo {
> 	u32 value1;
> 	u8 value2;
> };
> 
> If the on-wire format requires such structures I assume __packed
> is needed here?

You've endianness considerations as well with on-wire formats.

__packed indicates two things:
1) There will be no padding bytes between fields.
2) The structure itself might appear on any byte boundary.

The latter causes the compiler to do byte memory accesses and
shifts to load/store the data on some architectures.
So only mark things __packed when they might be misaligned in
memory.

	David

^ permalink raw reply

* Re: [iproute PATCH v2 0/3] Check user supplied interface name lengths
From: Phil Sutter @ 2017-09-27 16:05 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: netdev
In-Reply-To: <20170927084249.0591ee3a@shemminger-XPS-13-9360>

On Wed, Sep 27, 2017 at 08:42:49AM +0100, Stephen Hemminger wrote:
> On Tue, 26 Sep 2017 18:35:45 +0200
> Phil Sutter <phil@nwl.cc> wrote:
> 
> > This series adds explicit checks for user-supplied interface names to
> > make sure their length fits Linux's requirements.
> > 
> > The first two patches simplify interface name parsing in some places -
> > these are side-effects of working on the actual implementation provided
> > in patch three.
> > 
> > Changes since v1:
> > - Patches 1 and 2 introduced.
> > - Changes to patch 3 are listed in there.
> > 
> > Phil Sutter (3):
> >   ip{6,}tunnel: Avoid copying user-supplied interface name around
> >   tc: flower: No need to cache indev arg
> >   Check user supplied interface name lengths
> > 
> >  include/utils.h |  1 +
> >  ip/ip6tunnel.c  |  9 +++++----
> >  ip/ipl2tp.c     |  3 ++-
> >  ip/iplink.c     | 27 ++++++++-------------------
> >  ip/ipmaddr.c    |  1 +
> >  ip/iprule.c     |  4 ++++
> >  ip/iptunnel.c   | 27 +++++++++++++--------------
> >  ip/iptuntap.c   |  4 +++-
> >  lib/utils.c     | 10 ++++++++++
> >  misc/arpd.c     |  1 +
> >  tc/f_flower.c   |  6 ++----
> >  11 files changed, 50 insertions(+), 43 deletions(-)
> > 
> 
> I like the idea, and checking arguments is good.

Cool!

> Why not merge the check and copy and put in lib/utils.c
> 
> int get_ifname(char *name, const char *arg)
> {
> ...

What do you have in mind exactly? There are basically three situations
to which check_ifname() is added:

1) Simple pointer caching:

   | check_ifname("name", *argv);
   | name = *argv;

2) Value caching:

   | check_ifname("name", *argv);
   | strncpy(name, *argv, IFNAMSIZ);

3) Direct netlink attribute creation:

   | check_ifname("name", *argv);
   | addattr_l(&req.n, sizeof(req), IFNAME, *argv, strlen(*argv) + 1);

To cover them all, I could introduce the following:

| char *check_ifname(const char *name, const char *argv)
| {
| 	/* check *arg, call invarg() if invalid */
| 	return *argv;
| }
| 
| void copy_ifname(char *dst, const char *name, const char *argv)
| {
| 	strncpy(dst, check_ifname(name, argv), IFNAMSIZ);
| }
| 
| void addattr_ifname(struct nlmsghdr *n, int maxlen, int type,
| 		    const char *name, const char *argv)
| {
| 	addattr_l(n, maxlen, type, check_ifname(name, argv),
| 		  strlen(*argv) + 1);
| }

What do you think?

Cheers, Phil

^ permalink raw reply

* Re: [PATCH net-next 5/6] net: dsa: mv88e6xxx: Move mv88e6xxx_port_db_load_purge()
From: Vivien Didelot @ 2017-09-27 15:51 UTC (permalink / raw)
  To: Andrew Lunn, David Miller; +Cc: netdev, Andrew Lunn
In-Reply-To: <1506464764-12699-6-git-send-email-andrew@lunn.ch>

Andrew Lunn <andrew@lunn.ch> writes:

> This function is going to be needed by a soon to be added new
> function. Move it earlier so we can avoid a forward declaration.
> No code changes.

s/code/functional/

> Signed-off-by: Andrew Lunn <andrew@lunn.ch>

Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>

^ permalink raw reply

* Re: [PATCH net-next 4/6] net: dsa: mv88e6xxx: Print offending port when vlan check fails
From: Vivien Didelot @ 2017-09-27 15:49 UTC (permalink / raw)
  To: Andrew Lunn, David Miller; +Cc: netdev, Andrew Lunn
In-Reply-To: <1506464764-12699-5-git-send-email-andrew@lunn.ch>

Andrew Lunn <andrew@lunn.ch> writes:

> When testing if a VLAN is one more than one bridge, we print an error
> message that the VLAN is already in use somewhere else. Print both the
> new port which would like the VLAN, and the port which already has it,
> to aid debugging.
>
> Signed-off-by: Andrew Lunn <andrew@lunn.ch>

Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>

^ permalink raw reply

* Re: [PATCH v2 net-next 0/2] net/sched: support tunnel options in cls_flower and act_tunnel_key
From: Jiri Benc @ 2017-09-27 15:46 UTC (permalink / raw)
  To: Simon Horman
  Cc: David Miller, Jiri Pirko, Jamal Hadi Salim, Cong Wang, netdev,
	oss-drivers
In-Reply-To: <1506500194-17637-1-git-send-email-simon.horman@netronome.com>

On Wed, 27 Sep 2017 10:16:32 +0200, Simon Horman wrote:
> * Geneve
> 
>   In the case of Geneve options are TLVs[1]. My reading is that in collect
>   metadata mode the kernel does not appear to do anything other than pass
>   them around as a bytestring.
> 
>   [1] https://tools.ietf.org/html/draft-ietf-nvo3-geneve-05#section-3.5
[...]
> 
> Neither of the above appear to assume any structure for the data.

But that's not true. Geneve uses TLVs, you even mentioned that
yourself. Matching on a block of TLVs as a bytestring doesn't make
sense. The TLV fields may be in any order.

We need better matching here. Bytestring is useless for Geneve.

NACK for this direction of option matching. We'd need to introduce
matching on TLVs sooner or later anyway and this would be just a never
used compat cruft that we need to keep around forever.

 Jiri

^ permalink raw reply

* Re: [PATCH v3 1/1] ip_tunnel: add mpls over gre encapsulation
From: Roopa Prabhu @ 2017-09-27 15:36 UTC (permalink / raw)
  To: Amine Kherbouche; +Cc: netdev@vger.kernel.org, xeb, David Lamparter
In-Reply-To: <198fd7591bc1daae4727ee8b950e116b59f2d4c3.1506504229.git.amine.kherbouche@6wind.com>

On Wed, Sep 27, 2017 at 2:37 AM, Amine Kherbouche
<amine.kherbouche@6wind.com> wrote:
> This commit introduces the MPLSoGRE support (RFC 4023), using ip tunnel
> API.
>
> Encap:
>   - Add a new iptunnel type mpls.
>   - Share tx path: gre type mpls loaded from skb->protocol.
>
> Decap:
>   - pull gre hdr and call mpls_forward().
>
> Signed-off-by: Amine Kherbouche <amine.kherbouche@6wind.com>
> ---
>  include/linux/mpls.h           |  2 ++
>  include/uapi/linux/if_tunnel.h |  1 +
>  net/ipv4/ip_gre.c              | 11 +++++++++
>  net/ipv6/ip6_gre.c             | 11 +++++++++
>  net/mpls/af_mpls.c             | 52 ++++++++++++++++++++++++++++++++++++++++++
>  5 files changed, 77 insertions(+)
>
> diff --git a/include/linux/mpls.h b/include/linux/mpls.h
> index 384fb22..57203c1 100644
> --- a/include/linux/mpls.h
> +++ b/include/linux/mpls.h
> @@ -8,4 +8,6 @@
>  #define MPLS_TC_MASK           (MPLS_LS_TC_MASK >> MPLS_LS_TC_SHIFT)
>  #define MPLS_LABEL_MASK                (MPLS_LS_LABEL_MASK >> MPLS_LS_LABEL_SHIFT)
>
> +int mpls_gre_rcv(struct sk_buff *skb, int gre_hdr_len);
> +
>  #endif  /* _LINUX_MPLS_H */
> diff --git a/include/uapi/linux/if_tunnel.h b/include/uapi/linux/if_tunnel.h
> index 2e52088..a2f48c0 100644
> --- a/include/uapi/linux/if_tunnel.h
> +++ b/include/uapi/linux/if_tunnel.h
> @@ -84,6 +84,7 @@ enum tunnel_encap_types {
>         TUNNEL_ENCAP_NONE,
>         TUNNEL_ENCAP_FOU,
>         TUNNEL_ENCAP_GUE,
> +       TUNNEL_ENCAP_MPLS,
>  };
>
>  #define TUNNEL_ENCAP_FLAG_CSUM         (1<<0)
> diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
> index 9cee986..0a898f4 100644
> --- a/net/ipv4/ip_gre.c
> +++ b/net/ipv4/ip_gre.c
> @@ -32,6 +32,9 @@
>  #include <linux/netfilter_ipv4.h>
>  #include <linux/etherdevice.h>
>  #include <linux/if_ether.h>
> +#if IS_ENABLED(CONFIG_MPLS)
> +#include <linux/mpls.h>
> +#endif
>
>  #include <net/sock.h>
>  #include <net/ip.h>
> @@ -412,6 +415,14 @@ static int gre_rcv(struct sk_buff *skb)
>                         return 0;
>         }
>
> +       if (unlikely(tpi.proto == htons(ETH_P_MPLS_UC))) {
> +#if IS_ENABLED(CONFIG_MPLS)
> +               return mpls_gre_rcv(skb, hdr_len);
> +#else
> +               goto drop;
> +#endif
> +       }
> +

Amine, one small nit here.., if you define mpls_gre_rcv in gre header
(like you had initially), you could do the below...

#if IS_ENABLED(CONFIG_MPLS)
mpls_gre_rcv()
{
     /* real func */
}
#else
mpls_gre_rcv()
{
    kfree_skb(skb)
    return NET_RX_DROP
}
#endif

and the check in gre_rcv() reduces to

if (unlikely(tpi.proto == htons(ETH_P_MPLS_UC)))
        return mpls_gre_rcv(skb, hdr_len);

Which looks much cleaner.

Other than that, looks great. pls add my Acked-by: Roopa Prabhu
<roopa@cumulusnetworks.com> to your next version.

thanks!

^ permalink raw reply

* Re: [PATCH net-next 1/6] net: dsa: Fix SWITCHDEV_ATTR_ID_PORT_PARENT_ID
From: Vivien Didelot @ 2017-09-27 15:31 UTC (permalink / raw)
  To: Andrew Lunn, David Miller; +Cc: netdev, Andrew Lunn
In-Reply-To: <1506464764-12699-2-git-send-email-andrew@lunn.ch>

Hi Andrew,

Andrew Lunn <andrew@lunn.ch> writes:

> @@ -354,13 +354,16 @@ static int dsa_slave_port_attr_get(struct net_device *dev,
>  				   struct switchdev_attr *attr)
>  {
>  	struct dsa_slave_priv *p = netdev_priv(dev);
> -	struct dsa_switch *ds = p->dp->ds;
>  
>  	switch (attr->id) {
> -	case SWITCHDEV_ATTR_ID_PORT_PARENT_ID:
> -		attr->u.ppid.id_len = sizeof(ds->index);
> -		memcpy(&attr->u.ppid.id, &ds->index, attr->u.ppid.id_len);
> +	case SWITCHDEV_ATTR_ID_PORT_PARENT_ID: {
> +		struct dsa_switch *ds = p->dp->ds;
> +		struct dsa_switch_tree *dst = ds->dst;
> +
> +		attr->u.ppid.id_len = sizeof(dst->tree);
> +		memcpy(&attr->u.ppid.id, &dst->tree, sizeof(dst->tree));
>  		break;
> +	}
>  	case SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS_SUPPORT:
>  		attr->u.brport_flags_support = 0;
>  		break;

I am pretty sure the kernel folks won't like blocks within case
statments. Simply declare dst = p->dp->ds->dst at the beginning.

Also keeping attr->u.ppid.id_len as memcpy's third argument like other
switchdev users do would be prefered.

Otherwise using the tree index is indeed the way to go:

Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>


Thanks,

        Vivien

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox