Netdev List
 help / color / mirror / Atom feed
* Re: [PATCHv2 net-next 1/6] sctp: add sctp_info dump api for sctp_diag
From: Eric Dumazet @ 2016-04-09 17:21 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: Xin Long, network dev, linux-sctp, Marcelo Ricardo Leitner,
	Vlad Yasevich, daniel, davem
In-Reply-To: <57091D90.4030905@mojatatu.com>

On Sat, 2016-04-09 at 11:19 -0400, Jamal Hadi Salim wrote:
> On 16-04-09 01:16 AM, Eric Dumazet wrote:
> 
> >
> > Lots of holes in this structure...
> >
> >
> 
> 
> I may have mentioned to you that there is 8 bit hole in tcp_info too ;->
> (above tcpi_rto). Adding an 8 bit explicit pad seems useful
> since it is already in the wild. I was going to send the patch after
> netdev11 but  forgot.

Well, once a hole is there, nothing we can do really, because of
compatibility with old kernels / old binaries.


But when a _new_ structure is defined, this is the time where we can ask
for doing sensible things ;)

^ permalink raw reply

* Re: [RFC PATCH v2 1/5] bpf: add PHYS_DEV prog type for early driver filter
From: Alexei Starovoitov @ 2016-04-09 17:00 UTC (permalink / raw)
  To: Tom Herbert
  Cc: Brenden Blanco, David S. Miller, Linux Kernel Network Developers,
	Or Gerlitz, Daniel Borkmann, Jesper Dangaard Brouer, Eric Dumazet,
	Edward Cree, john fastabend, Thomas Graf, Johannes Berg,
	eranlinuxmellanox, Lorenzo Colitti
In-Reply-To: <CALx6S34m8cVNgvuGp845bicixodfavH9cj-rARSwwEAvFCjd7g@mail.gmail.com>

On Sat, Apr 09, 2016 at 08:17:04AM -0300, Tom Herbert wrote:
> >
> > +/* user return codes for PHYS_DEV prog type */
> > +enum bpf_phys_dev_action {
> > +       BPF_PHYS_DEV_DROP,
> > +       BPF_PHYS_DEV_OK,
> 
> I don't like OK. Maybe this mean LOCAL. We also need FORWARD (not sure
> how to handle GRO yet).
> 
> I would suggest that we format the return code as code:subcode, where
> the above are codes. subcode is relevant to major code. For instance
> in forwarding the subcodes indicate a forwarding instruction (maybe a
> queue). DROP subcodes might answer why.

for tc redirect we use hidden percpu variable to pass additional
info together with return code. The cost of it is extra bpf_redirect() call.
Here we can do better and embed such info for xmit,
but subcodes for drop is slippery slop, since it's adding concepts
to design that are not going to be used by everyone.
If necessary bpf programs can count drop reasons internally.
Drops due to protocol!=ipv6 or drops due to ip frags present
will be program internal reasons. No need to expose them in api.

We need to get xmit part implemented first and see how it looks
before deciding on this part of api.
Right now I think we do not need tx queue number actually.
The prog should just return 'XMIT' and xdp(driver) side will decide
which tx queue to use.

> One other API issue is how to deal with encapsulation. In this case a
> header may be prepended to the packet, I assume there are BPF helper
> functions and we don't need to return a new length or start?

a bit of history:
for tc+bpf we've been trying to come up with clean helpers to do
header push/pop and it was very difficult, since skb keeps a ton
of metedata about header offsets, csum offsets, encap flag, etc
we've lost the count on number of different approaches we've
implemented and discarded.
For XDP there is no such issue.
Likely we'll have single bpf_packet_change(ctx, off, len) helper
that will grow(len) or trim(-len) bytes at offset(off) in the packet.
ctx->len will be adjusted automatically by the helper.
The headroom, tailroom will not be exposed and will never be known
to the bpf side. It's up to the helper and the driver to decide how to
insert N bytes at offset M. If the driver reserved headroom in dma
buffer, it can grow into it, if not it can grow tail and move
the whole packet. For performance reasons we obviously want some
headroom in dma buffer, but it's not exposed to bpf.

But it could be that directly adjusting ctx->len and ctx->data is faster.
For cls_bpf ctx->data is hidden and packet access is done via
special instructions and helpers. For XDP we can hopefully do better
and do packet access with direct loads. I outlined that plan
in the previous thread.

^ permalink raw reply

* Re: [RFC PATCH v2 5/5] Add sample for adding simple drop program to link
From: Brenden Blanco @ 2016-04-09 16:43 UTC (permalink / raw)
  To: Jamal Hadi Salim
  Cc: davem, netdev, tom, alexei.starovoitov, ogerlitz, daniel, brouer,
	eric.dumazet, ecree, john.fastabend, tgraf, johannes,
	eranlinuxmellanox, lorenzo
In-Reply-To: <57091625.1010206@mojatatu.com>

On Sat, Apr 09, 2016 at 10:48:05AM -0400, Jamal Hadi Salim wrote:
> On 16-04-08 12:48 AM, Brenden Blanco wrote:
> >Add a sample program that only drops packets at the
> >BPF_PROG_TYPE_PHYS_DEV hook of a link. With the drop-only program,
> >observed single core rate is ~19.5Mpps.
> >
> >Other tests were run, for instance without the dropcnt increment or
> >without reading from the packet header, the packet rate was mostly
> >unchanged.
> >
> >$ perf record -a samples/bpf/netdrvx1 $(</sys/class/net/eth0/ifindex)
> >proto 17:   19596362 drops/s
> >
> >./pktgen_sample03_burst_single_flow.sh -i $DEV -d $IP -m $MAC -t 4
> >Running... ctrl^C to stop
> >Device: eth4@0
> >Result: OK: 7873817(c7872245+d1572) usec, 38801823 (60byte,0frags)
> >   4927955pps 2365Mb/sec (2365418400bps) errors: 0
> >Device: eth4@1
> >Result: OK: 7873817(c7872123+d1693) usec, 38587342 (60byte,0frags)
> >   4900715pps 2352Mb/sec (2352343200bps) errors: 0
> >Device: eth4@2
> >Result: OK: 7873817(c7870929+d2888) usec, 38718848 (60byte,0frags)
> >   4917417pps 2360Mb/sec (2360360160bps) errors: 0
> >Device: eth4@3
> >Result: OK: 7873818(c7872193+d1625) usec, 38796346 (60byte,0frags)
> >   4927259pps 2365Mb/sec (2365084320bps) errors: 0
> >
> >perf report --no-children:
> >  29.48%  ksoftirqd/6  [mlx4_en]         [k] mlx4_en_process_rx_cq
> >  18.17%  ksoftirqd/6  [mlx4_en]         [k] mlx4_en_alloc_frags
> >   8.19%  ksoftirqd/6  [mlx4_en]         [k] mlx4_en_free_frag
> >   5.35%  ksoftirqd/6  [kernel.vmlinux]  [k] get_page_from_freelist
> >   2.92%  ksoftirqd/6  [kernel.vmlinux]  [k] free_pages_prepare
> >   2.90%  ksoftirqd/6  [mlx4_en]         [k] mlx4_call_bpf
> >   2.72%  ksoftirqd/6  [fjes]            [k] 0x000000000000af66
> >   2.37%  ksoftirqd/6  [kernel.vmlinux]  [k] swiotlb_sync_single_for_cpu
> >   1.92%  ksoftirqd/6  [kernel.vmlinux]  [k] percpu_array_map_lookup_elem
> >   1.83%  ksoftirqd/6  [kernel.vmlinux]  [k] free_one_page
> >   1.70%  ksoftirqd/6  [kernel.vmlinux]  [k] swiotlb_sync_single
> >   1.69%  ksoftirqd/6  [kernel.vmlinux]  [k] bpf_map_lookup_elem
> >   1.33%  swapper      [kernel.vmlinux]  [k] intel_idle
> >   1.32%  ksoftirqd/6  [fjes]            [k] 0x000000000000af90
> >   1.21%  ksoftirqd/6  [kernel.vmlinux]  [k] sk_load_byte_positive_offset
> >   1.07%  ksoftirqd/6  [kernel.vmlinux]  [k] __alloc_pages_nodemask
> >   0.89%  ksoftirqd/6  [kernel.vmlinux]  [k] __rmqueue
> >   0.84%  ksoftirqd/6  [mlx4_en]         [k] mlx4_alloc_pages.isra.23
> >   0.79%  ksoftirqd/6  [kernel.vmlinux]  [k] net_rx_action
> >
> >machine specs:
> >  receiver - Intel E5-1630 v3 @ 3.70GHz
> >  sender - Intel E5645 @ 2.40GHz
> >  Mellanox ConnectX-3 @40G
> >
> 
> 
> Ok, sorry - should have looked this far before sending earlier email.
> So when you run concurently you see about 5Mpps per core but if you
> shoot all traffic at a single core you see 20Mpps?
No, only sender is multiple, receiver is still single core. The flow is
the same in all 4 of the send threads. Note that only ksoftirqd/6 is
active.
> 
> Devil's advocate question:
> If the bottleneck is the driver - is there an advantage in adding the
> bpf code at all in the driver?
Only by adding this hook into the driver has it become the bottleneck.
Prior to this, the bottleneck was later in the codepath, primarily in
allocations.

If a packet is to be dropped, and a determination can be made with fewer
cpu cycles spent, then there is more time for the goodput.

Beyond that, even if the skb allocation gets 10x or 100x or whatever
improvement, there is still a non-zero cost associated, and dropping bad
packets with minimal time spent has value. The same argument holds for
physical nic forwarding decisions.

> I am curious than before to see the comparison for the same bpf code
> running at tc level vs in the driver..
Here is a perf report for drop in the clsact qdisc with direct-action,
which Daniel earlier showed to have the best performance to-date. On my
machine, this gets about 6.5Mpps drop single core. Drop due to failed
IP lookup (not shown here) is worse @4.5Mpps.

  9.24%  ksoftirqd/3  [mlx4_en]          [k] mlx4_en_process_rx_cq
  8.50%  ksoftirqd/3  [kernel.vmlinux]   [k] dev_gro_receive
  7.24%  ksoftirqd/3  [kernel.vmlinux]   [k] __netif_receive_skb_core
  5.47%  ksoftirqd/3  [mlx4_en]          [k] mlx4_en_complete_rx_desc
  4.74%  ksoftirqd/3  [kernel.vmlinux]   [k] kmem_cache_free
  3.94%  ksoftirqd/3  [mlx4_en]          [k] mlx4_en_alloc_frags
  3.42%  ksoftirqd/3  [kernel.vmlinux]   [k] napi_gro_frags
  3.34%  ksoftirqd/3  [kernel.vmlinux]   [k] inet_gro_receive
  3.32%  ksoftirqd/3  [kernel.vmlinux]   [k] __build_skb
  3.28%  ksoftirqd/3  [kernel.vmlinux]   [k] __napi_alloc_skb
  2.94%  ksoftirqd/3  [cls_bpf]          [k] cls_bpf_classify
  2.88%  ksoftirqd/3  [kernel.vmlinux]   [k] ktime_get_with_offset
  2.50%  ksoftirqd/3  [kernel.vmlinux]   [k] eth_type_trans
  2.40%  ksoftirqd/3  [kernel.vmlinux]   [k] kmem_cache_alloc
  2.29%  ksoftirqd/3  [kernel.vmlinux]   [k] skb_release_data
  2.25%  ksoftirqd/3  [kernel.vmlinux]   [k] gro_pull_from_frag0
  2.09%  ksoftirqd/3  [kernel.vmlinux]   [k] netif_receive_skb_internal
  1.99%  ksoftirqd/3  [kernel.vmlinux]   [k] memcpy_erms
  1.73%  ksoftirqd/3  [kernel.vmlinux]   [k] napi_get_frags
  1.66%  ksoftirqd/3  [kernel.vmlinux]   [k] __udp4_lib_lookup
  1.60%  ksoftirqd/3  [kernel.vmlinux]   [k] tc_classify
  1.25%  ksoftirqd/3  [kernel.vmlinux]   [k] kfree_skb
  1.24%  ksoftirqd/3  [kernel.vmlinux]   [k] get_page_from_freelist
  1.24%  ksoftirqd/3  [kernel.vmlinux]   [k] skb_gro_reset_offset
  1.16%  ksoftirqd/3  [kernel.vmlinux]   [k] udp4_gro_receive
  1.12%  ksoftirqd/3  [kernel.vmlinux]   [k] udp_gro_receive
  0.93%  ksoftirqd/3  [kernel.vmlinux]   [k] __free_page_frag
  0.91%  ksoftirqd/3  [kernel.vmlinux]   [k] skb_release_head_state
  0.89%  ksoftirqd/3  [kernel.vmlinux]   [k] __alloc_page_frag
  0.88%  ksoftirqd/3  [kernel.vmlinux]   [k] udp4_lib_lookup_skb
  0.83%  swapper      [kernel.vmlinux]   [k] intel_idle
  0.81%  ksoftirqd/3  [kernel.vmlinux]   [k] kfree_skbmem
  0.77%  ksoftirqd/3  [kernel.vmlinux]   [k] skb_release_all
  0.76%  ksoftirqd/3  [mlx4_en]          [k] mlx4_en_free_frag
  0.68%  ksoftirqd/3  [kernel.vmlinux]   [k] __netif_receive_skb
  0.64%  ksoftirqd/3  [kernel.vmlinux]   [k] free_pages_prepare
  0.53%  ksoftirqd/3  [kernel.vmlinux]   [k] read_tsc
  0.43%  ksoftirqd/3  [kernel.vmlinux]   [k] swiotlb_sync_single
  0.38%  ksoftirqd/3  [kernel.vmlinux]   [k] __memcpy
  0.37%  ksoftirqd/3  [kernel.vmlinux]   [k] bpf_map_lookup_elem
  0.35%  ksoftirqd/3  [kernel.vmlinux]   [k] __memcg_kmem_put_cache
  0.32%  ksoftirqd/3  [kernel.vmlinux]   [k] swiotlb_sync_single_for_cpu
  0.32%  ksoftirqd/3  [kernel.vmlinux]   [k] free_one_page
  0.25%  ksoftirqd/3  [kernel.vmlinux]   [k] __alloc_pages_nodemask
  0.23%  ksoftirqd/3  [kernel.vmlinux]   [k] net_rx_action
  0.22%  ksoftirqd/3  [kernel.vmlinux]   [k] __free_pages_ok
  0.21%  ksoftirqd/3  [mlx4_en]          [k] mlx4_alloc_pages.isra.23
  0.17%  ksoftirqd/3  [kernel.vmlinux]   [k] percpu_array_map_lookup_elem
  0.17%  ksoftirqd/3  [kernel.vmlinux]   [k] PageHuge
  0.15%  ksoftirqd/3  [wmi]              [k] 0x0000000000005d49
  0.15%  ksoftirqd/3  [kernel.vmlinux]   [k] __rmqueue
  0.13%  ksoftirqd/3  [wmi]              [k] 0x0000000000005d60

> 
> cheers,
> jamal

^ permalink raw reply

* Re: [PATCH net-next 3/6] qed/qede: Add VXLAN tunnel slowpath configuration support
From: Jesse Gross @ 2016-04-09 16:29 UTC (permalink / raw)
  To: Manish Chopra
  Cc: David Miller, Linux Kernel Network Developers, Ariel.Elior,
	Yuval.Mintz
In-Reply-To: <1460207825-3622-4-git-send-email-manish.chopra@qlogic.com>

On Sat, Apr 9, 2016 at 10:17 AM, Manish Chopra <manish.chopra@qlogic.com> wrote:
> diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c
> index 518af32..9a82d42 100644
> --- a/drivers/net/ethernet/qlogic/qede/qede_main.c
> +++ b/drivers/net/ethernet/qlogic/qede/qede_main.c
[...]
> +static netdev_features_t qede_features_check(struct sk_buff *skb,
> +                                            struct net_device *dev,
> +                                            netdev_features_t features)
> +{
> +       return vxlan_features_check(skb, features);
> +}

This is going to restrict the set of protocols that can be offloaded
to those that look exactly like VXLAN. In particular, it means that
you won't be able offload Geneve with options. I don't think this what
you mean to do given that you are supporting these protocols in the
other patches and I know this hardware has more capabilities than just
VXLAN. I think that you want to do a header length check - similar to
what the Intel drivers are doing for example.

I noticed that the bnx2x driver has a similar issue as well.

^ permalink raw reply

* Re: [PATCHv2 net-next 1/6] sctp: add sctp_info dump api for sctp_diag
From: Xin Long @ 2016-04-09 16:10 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: network dev, linux-sctp, Marcelo Ricardo Leitner, Vlad Yasevich,
	daniel, davem
In-Reply-To: <1460179179.6473.476.camel@edumazet-glaptop3.roam.corp.google.com>

On Sat, Apr 9, 2016 at 1:19 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Sat, 2016-04-09 at 12:53 +0800, Xin Long wrote:
>> sctp_diag will dump some important details of sctp's assoc or ep, we use
>> sctp_info to describe them,  sctp_get_sctp_info to get them, and export
>> it to sctp_diag.ko.
>>
>
>
>> +int sctp_get_sctp_info(struct sock *sk, struct sctp_association *asoc,
>> +                    struct sctp_info *info)
>> +{
>> +     struct sctp_transport *prim;
>> +     struct list_head *pos, *temp;
>> +     int mask;
>> +
>> +     memset(info, 0, sizeof(*info));
>> +     if (!asoc) {
>> +             struct sctp_sock *sp = sctp_sk(sk);
>> +
>> +             info->sctpi_s_autoclose = sp->autoclose;
>> +             info->sctpi_s_adaptation_ind = sp->adaptation_ind;
>> +             info->sctpi_s_pd_point = sp->pd_point;
>> +             info->sctpi_s_nodelay = sp->nodelay;
>> +             info->sctpi_s_disable_fragments = sp->disable_fragments;
>> +             info->sctpi_s_v4mapped = sp->v4mapped;
>> +             info->sctpi_s_frag_interleave = sp->frag_interleave;
>> +
>> +             return 0;
>> +     }
>> +
>> +     info->sctpi_tag = asoc->c.my_vtag;
>> +     info->sctpi_state = asoc->state;
>> +     info->sctpi_rwnd = asoc->a_rwnd;
>> +     info->sctpi_unackdata = asoc->unack_data;
>> +     info->sctpi_penddata = sctp_tsnmap_pending(&asoc->peer.tsn_map);
>> +     info->sctpi_instrms = asoc->c.sinit_max_instreams;
>> +     info->sctpi_outstrms = asoc->c.sinit_num_ostreams;
>> +     list_for_each_safe(pos, temp, &asoc->base.inqueue.in_chunk_list)
>> +             info->sctpi_inqueue++;
>> +     list_for_each_safe(pos, temp, &asoc->outqueue.out_chunk_list)
>> +             info->sctpi_outqueue++;
>
> Is this safe ?
>
> Do you own the lock on socket or whatever lock protecting this list ?
>
there are 2 places will call these codes,
1. sctp_diag_dump -> sctp_for_each_transport -> sctp_tsp_dump
this one will use lock_sock to protect them. I think this one is ok.

1. sctp_diag_dump_one -> sctp_transport_lookup_process-> sctp_tsp_dump_one
this one just holds the tsp. and we're using  list_for_each_safe here now,
isn't it enough ?


>
>> +     info->sctpi_overall_error = asoc->overall_error_count;
>> +     info->sctpi_max_burst = asoc->max_burst;
>> +     info->sctpi_maxseg = asoc->frag_point;
>> +     info->sctpi_peer_rwnd = asoc->peer.rwnd;
>> +     info->sctpi_peer_tag = asoc->c.peer_vtag;
>> +
>> +     mask = asoc->peer.ecn_capable << 1;
>> +     mask = (mask | asoc->peer.ipv4_address) << 1;
>> +     mask = (mask | asoc->peer.ipv6_address) << 1;
>> +     mask = (mask | asoc->peer.hostname_address) << 1;
>> +     mask = (mask | asoc->peer.asconf_capable) << 1;
>> +     mask = (mask | asoc->peer.prsctp_capable) << 1;
>> +     mask = (mask | asoc->peer.auth_capable);
>> +     info->sctpi_peer_capable = mask;
>> +     mask = asoc->peer.sack_needed << 1;
>> +     mask = (mask | asoc->peer.sack_generation) << 1;
>> +     mask = (mask | asoc->peer.zero_window_announced);
>> +     info->sctpi_peer_sack = mask;
>> +
>> +     info->sctpi_isacks = asoc->stats.isacks;
>> +     info->sctpi_osacks = asoc->stats.osacks;
>> +     info->sctpi_opackets = asoc->stats.opackets;
>> +     info->sctpi_ipackets = asoc->stats.ipackets;
>> +     info->sctpi_rtxchunks = asoc->stats.rtxchunks;
>> +     info->sctpi_outofseqtsns = asoc->stats.outofseqtsns;
>> +     info->sctpi_idupchunks = asoc->stats.idupchunks;
>> +     info->sctpi_gapcnt = asoc->stats.gapcnt;
>> +     info->sctpi_ouodchunks = asoc->stats.ouodchunks;
>> +     info->sctpi_iuodchunks = asoc->stats.iuodchunks;
>> +     info->sctpi_oodchunks = asoc->stats.oodchunks;
>> +     info->sctpi_iodchunks = asoc->stats.iodchunks;
>> +     info->sctpi_octrlchunks = asoc->stats.octrlchunks;
>> +     info->sctpi_ictrlchunks = asoc->stats.ictrlchunks;
>> +
>> +     prim = asoc->peer.primary_path;
>> +     memcpy(&info->sctpi_p_address, &prim->ipaddr,
>> +            sizeof(struct sockaddr_storage));
>> +     info->sctpi_p_state = prim->state;
>> +     info->sctpi_p_cwnd = prim->cwnd;
>> +     info->sctpi_p_srtt = prim->srtt;
>> +     info->sctpi_p_rto = jiffies_to_msecs(prim->rto);
>> +     info->sctpi_p_hbinterval = prim->hbinterval;
>> +     info->sctpi_p_pathmaxrxt = prim->pathmaxrxt;
>> +     info->sctpi_p_sackdelay = jiffies_to_msecs(prim->sackdelay);
>> +     info->sctpi_p_ssthresh = prim->ssthresh;
>> +     info->sctpi_p_partial_bytes_acked = prim->partial_bytes_acked;
>> +     info->sctpi_p_flight_size = prim->flight_size;
>> +     info->sctpi_p_error = prim->error_count;
>> +
>> +     return 0;
>> +}
>> +EXPORT_SYMBOL_GPL(sctp_get_sctp_info);
>
> info is not guaranteed to be aligned on 8 bytes.
>
> You need to use put_unaligned()
>
> Check commit ff5d749772018 ("tcp: beware of alignments in
> tcp_get_info()") for details.
Ok,  I will.

>
>
>

^ permalink raw reply

* Re: [PATCH net-next 1/6] net: Make vxlan/geneve default udp ports public
From: Jesse Gross @ 2016-04-09 16:06 UTC (permalink / raw)
  To: Manish Chopra
  Cc: David Miller, Linux Kernel Network Developers, Ariel.Elior,
	Yuval.Mintz
In-Reply-To: <1460207825-3622-2-git-send-email-manish.chopra@qlogic.com>

On Sat, Apr 9, 2016 at 10:17 AM, Manish Chopra <manish.chopra@qlogic.com> wrote:
> Rationale behind this change is that with some OVS configuration
> UDP ports doesn't get notified to the driver using
> .ndo_[add|del]_vxlan_port. So for the driver to work with
> these specific ports in that environment we need to have them configured
> on adapter by default for the required hardware offload support.

I think you are referring to old out of tree code - no version of
upstream OVS does this. In addition, any old code won't work against
the new kernels that would include this driver update anyways so there
won't be a benefit in any case.

Please just use the normal registration mechanism that is already
exposed. I also noticed that in the Geneve case you aren't currently
registering for port notifications and just using the assigned port
number in all cases, which isn't right.

^ permalink raw reply

* Re: [RFC PATCH 07/11] GENEVE: Add option to mangle IP IDs on inner headers when using TSO
From: Jesse Gross @ 2016-04-09 15:52 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Alexander Duyck, Herbert Xu, Tom Herbert, Eric Dumazet,
	Linux Kernel Network Developers, David Miller
In-Reply-To: <CAKgT0UcehJaKhEQCATuyp48Q0gqofk7JjwsakNo1KgJd2YH-TA@mail.gmail.com>

On Fri, Apr 8, 2016 at 7:04 PM, Alexander Duyck
<alexander.duyck@gmail.com> wrote:
> On Fri, Apr 8, 2016 at 2:40 PM, Jesse Gross <jesse@kernel.org> wrote:
>> Maybe I missed it but I didn't see any checks for the DF bit being set
>> when we transmit a packet with NETIF_F_TSO_MANGLEID. Even if I am
>> comfortable mangling my IDs in the DF case, I don't think this would
>> ever extend to non-DF packets. In the documentation you noted that it
>> is the driver's responsibility to do this check but I couldn't find it
>> in either ixgbe or igb. It would also be nice if the core stack could
>> enforce it somehow as well rather than each driver.
>
> Yeah I had glossed over that in the igb and ixgbe patches.  A check is
> only really needed for the incrementing to non-incrementing case and I
> wasn't sure how common it was to have TCP with an IP header that
> didn't set the DF bit.  In the case of the outer headers igb and ixgbe
> will increment the IP ID always so we don't have to worry about if DF
> is set of not there.  For the inner headers I had fudged it a bit and
> didn't add the validation.  If needed I can see about adding that
> shortly.

TCP without the DF bit set is not the default but it is possible (it
can be enabled by setting /proc/sys/net/ipv4/ip_no_pmtu_disc). I also
did a quick check of some Internet services and at least some of them
seem to return TCP without DF, so it's not too rare.

^ permalink raw reply

* Re: [PATCHv2 net-next 1/6] sctp: add sctp_info dump api for sctp_diag
From: Xin Long @ 2016-04-09 15:45 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: network dev, linux-sctp, Marcelo Ricardo Leitner, Vlad Yasevich,
	daniel, davem
In-Reply-To: <1460178984.6473.473.camel@edumazet-glaptop3.roam.corp.google.com>

On Sat, Apr 9, 2016 at 1:16 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Sat, 2016-04-09 at 12:53 +0800, Xin Long wrote:
>> sctp_diag will dump some important details of sctp's assoc or ep, we use
>> sctp_info to describe them,  sctp_get_sctp_info to get them, and export
>> it to sctp_diag.ko.
>>
>> Signed-off-by: Xin Long <lucien.xin@gmail.com>
>> ---
>>  include/linux/sctp.h    | 65 +++++++++++++++++++++++++++++++++++++
>>  include/net/sctp/sctp.h |  3 ++
>>  net/sctp/socket.c       | 86 +++++++++++++++++++++++++++++++++++++++++++++++++
>>  3 files changed, 154 insertions(+)
>>
>> diff --git a/include/linux/sctp.h b/include/linux/sctp.h
>> index a9414fd..a448ebc 100644
>> --- a/include/linux/sctp.h
>> +++ b/include/linux/sctp.h
>> @@ -705,4 +705,69 @@ typedef struct sctp_auth_chunk {
>>       sctp_authhdr_t auth_hdr;
>>  } __packed sctp_auth_chunk_t;
>>
>> +struct sctp_info {
>> +     __u32   sctpi_tag;
>> +     __u32   sctpi_state;
>> +     __u32   sctpi_rwnd;
>> +     __u16   sctpi_unackdata;
>> +     __u16   sctpi_penddata;
>> +     __u16   sctpi_instrms;
>> +     __u16   sctpi_outstrms;
>> +     __u32   sctpi_fragmentation_point;
>> +     __u32   sctpi_inqueue;
>> +     __u32   sctpi_outqueue;
>> +     __u32   sctpi_overall_error;
>> +     __u32   sctpi_max_burst;
>> +     __u32   sctpi_maxseg;
>> +     __u32   sctpi_peer_rwnd;
>> +     __u32   sctpi_peer_tag;
>> +     __u8    sctpi_peer_capable;
>> +     __u8    sctpi_peer_sack;
>> +
>> +     /* assoc status info */
>> +     __u64   sctpi_isacks;
>> +     __u64   sctpi_osacks;
>> +     __u64   sctpi_opackets;
>> +     __u64   sctpi_ipackets;
>> +     __u64   sctpi_rtxchunks;
>> +     __u64   sctpi_outofseqtsns;
>> +     __u64   sctpi_idupchunks;
>> +     __u64   sctpi_gapcnt;
>> +     __u64   sctpi_ouodchunks;
>> +     __u64   sctpi_iuodchunks;
>> +     __u64   sctpi_oodchunks;
>> +     __u64   sctpi_iodchunks;
>> +     __u64   sctpi_octrlchunks;
>> +     __u64   sctpi_ictrlchunks;
>> +
>> +     /* primary transport info */
>> +     struct sockaddr_storage sctpi_p_address;
>> +     __s32   sctpi_p_state;
>> +     __u32   sctpi_p_cwnd;
>> +     __u32   sctpi_p_srtt;
>> +     __u32   sctpi_p_rto;
>> +     __u32   sctpi_p_hbinterval;
>> +     __u32   sctpi_p_pathmaxrxt;
>> +     __u32   sctpi_p_sackdelay;
>> +     __u32   sctpi_p_sackfreq;
>> +     __u32   sctpi_p_ssthresh;
>> +     __u32   sctpi_p_partial_bytes_acked;
>> +     __u32   sctpi_p_flight_size;
>> +     __u16   sctpi_p_error;
>> +
>> +     /* sctp sock info */
>> +     __u32   sctpi_s_autoclose;
>> +     __u32   sctpi_s_adaptation_ind;
>> +     __u32   sctpi_s_pd_point;
>> +     __u8    sctpi_s_nodelay;
>> +     __u8    sctpi_s_disable_fragments;
>> +     __u8    sctpi_s_v4mapped;
>> +     __u8    sctpi_s_frag_interleave;
>> +};
>> +
>
> Lots of holes in this structure...
>
>
will check and improve it later.
thanks.

^ permalink raw reply

* Re: [next-queue PATCH 0/3] Add support for GSO partial to Intel NIC drivers
From: Alexander Duyck @ 2016-04-09 15:41 UTC (permalink / raw)
  To: Jeff Kirsher
  Cc: Alexander Duyck, Herbert Xu, Tom Herbert, Jesse Gross,
	Eric Dumazet, intel-wired-lan, Netdev, David Miller
In-Reply-To: <1460185149.2982.6.camel@intel.com>

On Fri, Apr 8, 2016 at 11:59 PM, Jeff Kirsher
<jeffrey.t.kirsher@intel.com> wrote:
> On Fri, 2016-04-08 at 17:06 -0400, Alexander Duyck wrote:
>> So these are the patches needed to enable tunnel segmentation
>> offloads on
>> the igb, igbvf, ixgbe, and ixgbevf drivers.  In addition this patch
>> extends
>> the i40e and i40evf drivers to include segmentation support for
>> tunnels
>> with outer checksums.
>>
>> The net performance gain for these patches are pretty significant.
>> In the
>> case of i40e a tunnel with outer checksums showed the following
>> improvement:
>> Throughput Throughput  Local Local   Result
>>            Units       CPU   Service Tag
>>                        Util  Demand
>>                        %
>> 14066.29   10^6bits/s  3.49  0.651   "before"
>> 20618.16   10^6bits/s  3.09  0.393   "after"
>>
>> For ixgbe similar results were seen:
>> Throughput Throughput  Local  Local   Result
>>            Units       CPU    Service Tag
>>                        Util   Demand
>>                        %
>> 12879.89   10^6bits/s  10.00  0.763   "before"
>> 14286.77   10^6bits/s  5.74   0.395   "after"
>>
>> These patches all rely on the TSO_MANGLEID and GSO_PARTIAL patches so
>> I
>> would not recommend applying them until those patches have first been
>> applied.
>
> Sorry I did not see this until after I tried applying your series. :-(
>
> Maybe the two dependent patches should have been in the series, so I
> and others do not waste their time.  Or not send this until the two
> patches were accepted.

Sorry I meant to send these as an RFC but sent it out with the
next-queue tag as I had gotten a bit distracted.

I shouldn't need to resubmit these until the other patches are
accepted so I will probably follow that route.

Thanks.

- Alex

^ permalink raw reply

* Re: [PATCHv2 net-next 4/6] sctp: add the sctp_diag.c file
From: Xin Long @ 2016-04-09 15:40 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: network dev, linux-sctp, Marcelo Ricardo Leitner, Vlad Yasevich,
	daniel, davem
In-Reply-To: <1460181116.6473.483.camel@edumazet-glaptop3.roam.corp.google.com>

On Sat, Apr 9, 2016 at 1:51 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Sat, 2016-04-09 at 12:53 +0800, Xin Long wrote:
>> This one will implement all the interface of inet_diag, inet_diag_handler.
>> which includes sctp_diag_dump, sctp_diag_dump_one and sctp_diag_get_info.
>
>
>> +static int inet_assoc_diag_fill(struct sock *sk,
>> +                             struct sctp_association *asoc,
>> +                             struct sk_buff *skb,
>> +                             const struct inet_diag_req_v2 *req,
>> +                             struct user_namespace *user_ns,
>> +                             int portid, u32 seq, u16 nlmsg_flags,
>> +                             const struct nlmsghdr *unlh)
>> +{
>> +     const struct inet_sock *inet = inet_sk(sk);
>> +     const struct inet_diag_handler *handler;
>> +     int ext = req->idiag_ext;
>> +     struct inet_diag_msg *r;
>> +     struct nlmsghdr  *nlh;
>> +     struct nlattr *attr;
>> +     void *info = NULL;
>> +     union sctp_addr laddr, paddr;
>> +     struct dst_entry *dst;
>> +     struct sctp_infox infox;
>> +
>> +     handler = inet_diag_get_handler(req->sdiag_protocol);
>> +     BUG_ON(!handler);
>> +
>> +     nlh = nlmsg_put(skb, portid, seq, unlh->nlmsg_type, sizeof(*r),
>> +                     nlmsg_flags);
>> +     if (!nlh)
>> +             return -EMSGSIZE;
>> +
>> +     r = nlmsg_data(nlh);
>> +     BUG_ON(!sk_fullsock(sk));
>> +
>> +     laddr = list_entry(asoc->base.bind_addr.address_list.next,
>> +                        struct sctp_sockaddr_entry, list)->a;
>> +     paddr = asoc->peer.primary_path->ipaddr;
>> +     dst = asoc->peer.primary_path->dst;
>> +
>> +     r->idiag_family = sk->sk_family;
>> +     r->id.idiag_sport = htons(asoc->base.bind_addr.port);
>> +     r->id.idiag_dport = htons(asoc->peer.port);
>> +     r->id.idiag_if = dst ? dst->dev->ifindex : 0;
>> +     sock_diag_save_cookie(sk, r->id.idiag_cookie);
>> +
>> +#if IS_ENABLED(CONFIG_IPV6)
>> +     if (sk->sk_family == AF_INET6) {
>> +             *(struct in6_addr *)r->id.idiag_src = laddr.v6.sin6_addr;
>> +             *(struct in6_addr *)r->id.idiag_dst = paddr.v6.sin6_addr;
>> +     } else
>> +#endif
>> +     {
>> +             memset(&r->id.idiag_src, 0, sizeof(r->id.idiag_src));
>> +             memset(&r->id.idiag_dst, 0, sizeof(r->id.idiag_dst));
>> +
>> +             r->id.idiag_src[0] = laddr.v4.sin_addr.s_addr;
>> +             r->id.idiag_dst[0] = paddr.v4.sin_addr.s_addr;
>> +     }
>> +
>> +     r->idiag_state = asoc->state;
>> +     r->idiag_timer = SCTP_EVENT_TIMEOUT_T3_RTX;
>> +     r->idiag_retrans = asoc->rtx_data_chunks;
>> +#define EXPIRES_IN_MS(tmo)  DIV_ROUND_UP((tmo - jiffies) * 1000, HZ)
>> +     r->idiag_expires =
>> +             EXPIRES_IN_MS(asoc->timeouts[SCTP_EVENT_TIMEOUT_T3_RTX]);
>> +#undef EXPIRES_IN_MS
>> +
>> +     if (nla_put_u8(skb, INET_DIAG_SHUTDOWN, sk->sk_shutdown))
>> +             goto errout;
>> +
>> +     /* IPv6 dual-stack sockets use inet->tos for IPv4 connections,
>> +      * hence this needs to be included regardless of socket family.
>> +      */
>> +     if (ext & (1 << (INET_DIAG_TOS - 1)))
>> +             if (nla_put_u8(skb, INET_DIAG_TOS, inet->tos) < 0)
>> +                     goto errout;
>> +
>> +#if IS_ENABLED(CONFIG_IPV6)
>> +     if (r->idiag_family == AF_INET6) {
>> +             if (ext & (1 << (INET_DIAG_TCLASS - 1)))
>> +                     if (nla_put_u8(skb, INET_DIAG_TCLASS,
>> +                                    inet6_sk(sk)->tclass) < 0)
>> +                             goto errout;
>> +
>> +             if (((1 << sk->sk_state) & (TCPF_LISTEN | TCPF_CLOSE)) &&
>> +                 nla_put_u8(skb, INET_DIAG_SKV6ONLY, ipv6_only_sock(sk)))
>> +                     goto errout;
>> +     }
>> +#endif
>> +
>> +     r->idiag_uid = from_kuid_munged(user_ns, sock_i_uid(sk));
>> +     r->idiag_inode = sock_i_ino(sk);
>> +
>> +     if (ext & (1 << (INET_DIAG_MEMINFO - 1))) {
>> +             struct inet_diag_meminfo minfo = {
>> +                     .idiag_rmem = sk_rmem_alloc_get(sk),
>> +                     .idiag_wmem = sk->sk_wmem_queued,
>> +                     .idiag_fmem = sk->sk_forward_alloc,
>> +                     .idiag_tmem = sk_wmem_alloc_get(sk),
>> +             };
>> +
>
> All this code looks familiar.
>
> Why inet_sk_diag_fill() is not used instead ?
>
it's hard to reuse  inet_sk_diag_fill(), cause some of them are from
assoc.

yes, there are some duplicate codes. if we want to avoid this.
we have to extract new function for this part, and it will change
more in inet_diag.


>> +             if (nla_put(skb, INET_DIAG_MEMINFO, sizeof(minfo), &minfo) < 0)
>> +                     goto errout;
>> +     }
>> +
>> +     if (ext & (1 << (INET_DIAG_SKMEMINFO - 1)))
>> +             if (sock_diag_put_meminfo(sk, skb, INET_DIAG_SKMEMINFO))
>> +                     goto errout;
>> +
>> +     if ((ext & (1 << (INET_DIAG_INFO - 1))) && handler->idiag_info_size) {
>> +             attr = nla_reserve(skb, INET_DIAG_INFO,
>> +                                handler->idiag_info_size);
>> +             if (!attr)
>> +                     goto errout;
>> +
>> +             info = nla_data(attr);
>> +     }
>> +     infox.sctpinfo = (struct sctp_info *)info;
>> +     infox.asoc = asoc;
>> +     handler->idiag_get_info(sk, r, &infox);
>> +
>> +     if (ext & (1 << (INET_DIAG_CONG - 1)))
>> +             if (nla_put_string(skb, INET_DIAG_CONG, "reno") < 0)
>> +                     goto errout;
>> +
>> +     if (inet_sctp_fill_laddrs(skb, &asoc->base.bind_addr.address_list))
>> +             goto errout;
>> +
>> +     if (inet_sctp_fill_paddrs(skb, asoc))
>> +             goto errout;
>> +
>> +     nlmsg_end(skb, nlh);
>> +     return 0;
>> +
>> +errout:
>> +     nlmsg_cancel(skb, nlh);
>> +     return -EMSGSIZE;
>> +}
>> +
>> +static int inet_ep_diag_fill(struct sock *sk, struct sctp_endpoint *ep,
>> +                          struct sk_buff *skb,
>> +                          const struct inet_diag_req_v2 *req,
>> +                          struct user_namespace *user_ns,
>> +                          u32 portid, u32 seq, u16 nlmsg_flags,
>> +                          const struct nlmsghdr *unlh)
>> +{
>> +     const struct inet_sock *inet = inet_sk(sk);
>> +     const struct inet_diag_handler *handler;
>> +     int ext = req->idiag_ext;
>> +     struct inet_diag_msg *r;
>> +     struct nlmsghdr  *nlh;
>> +     struct nlattr *attr;
>> +     void *info = NULL;
>> +     struct sctp_infox infox;
>> +
>> +     handler = inet_diag_get_handler(req->sdiag_protocol);
>> +     BUG_ON(!handler);
>> +
>> +     nlh = nlmsg_put(skb, portid, seq, unlh->nlmsg_type, sizeof(*r),
>> +                     nlmsg_flags);
>> +     if (!nlh)
>> +             return -EMSGSIZE;
>> +
>> +     r = nlmsg_data(nlh);
>> +     BUG_ON(!sk_fullsock(sk));
>> +
>> +     inet_diag_msg_common_fill(r, sk);
>> +     r->idiag_state = sk->sk_state;
>> +     r->idiag_timer = 0;
>> +     r->idiag_retrans = 0;
>> +
>> +     if (nla_put_u8(skb, INET_DIAG_SHUTDOWN, sk->sk_shutdown))
>> +             goto errout;
>> +
>> +     /* IPv6 dual-stack sockets use inet->tos for IPv4 connections,
>> +      * hence this needs to be included regardless of socket family.
>> +      */
>> +     if (ext & (1 << (INET_DIAG_TOS - 1)))
>> +             if (nla_put_u8(skb, INET_DIAG_TOS, inet->tos) < 0)
>> +                     goto errout;
>> +
>> +#if IS_ENABLED(CONFIG_IPV6)
>> +     if (r->idiag_family == AF_INET6) {
>> +             if (ext & (1 << (INET_DIAG_TCLASS - 1)))
>> +                     if (nla_put_u8(skb, INET_DIAG_TCLASS,
>> +                                    inet6_sk(sk)->tclass) < 0)
>> +                             goto errout;
>> +
>> +             if (((1 << sk->sk_state) & (TCPF_LISTEN | TCPF_CLOSE)) &&
>> +                 nla_put_u8(skb, INET_DIAG_SKV6ONLY, ipv6_only_sock(sk)))
>> +                     goto errout;
>> +     }
>> +#endif
>> +
>> +     r->idiag_uid = from_kuid_munged(user_ns, sock_i_uid(sk));
>> +     r->idiag_inode = sock_i_ino(sk);
>> +
>> +     if (ext & (1 << (INET_DIAG_MEMINFO - 1))) {
>> +             struct inet_diag_meminfo minfo = {
>> +                     .idiag_rmem = sk_rmem_alloc_get(sk),
>> +                     .idiag_wmem = sk->sk_wmem_queued,
>> +                     .idiag_fmem = sk->sk_forward_alloc,
>> +                     .idiag_tmem = sk_wmem_alloc_get(sk),
>> +             };
>> +
>
> Again, looks a lot of duplication.
>
> Also you missed that INET_DIAG_MEMINFO is kind of obsolete,
> now we have sock_diag_put_meminfo()
>
you meant we can remove it here ?
yes, it seems similar with INET_DIAG_SKMEMINFO.
but I do not know if userpace may use  INET_DIAG_MEMINFO now.

>
>> +             if (nla_put(skb, INET_DIAG_MEMINFO, sizeof(minfo), &minfo) < 0)
>> +                     goto errout;
>> +     }
>> +
>> +     if (ext & (1 << (INET_DIAG_SKMEMINFO - 1)))
>> +             if (sock_diag_put_meminfo(sk, skb, INET_DIAG_SKMEMINFO))
>> +                     goto errout;
>> +
>> +     if ((ext & (1 << (INET_DIAG_INFO - 1))) && handler->idiag_info_size) {
>> +             attr = nla_reserve(skb, INET_DIAG_INFO,
>> +                                handler->idiag_info_size);
>> +             if (!attr)
>> +                     goto errout;
>> +
>> +             info = nla_data(attr);
>> +     }
>> +     infox.sctpinfo = (struct sctp_info *)info;
>> +     infox.asoc = NULL;
>> +     handler->idiag_get_info(sk, r, &infox);
>> +
>> +     if (inet_sctp_fill_laddrs(skb, &ep->base.bind_addr.address_list))
>> +             goto errout;
>> +
>> +     nlmsg_end(skb, nlh);
>> +     return 0;
>> +
>> +errout:
>> +     nlmsg_cancel(skb, nlh);
>> +     return -EMSGSIZE;
>> +}
>> +
>> +static size_t inet_assoc_attr_size(struct sctp_association *asoc)
>> +{
>> +     int addrlen = sizeof(struct sockaddr_storage);
>> +     int addrcnt = 0;
>> +     struct sctp_sockaddr_entry *laddr;
>> +
>> +     list_for_each_entry_rcu(laddr, &asoc->base.bind_addr.address_list,
>> +                             list)
>> +             addrcnt++;
>> +
>> +     return    nla_total_size(sizeof(struct tcp_info))
>
> Are you sure you want to use tcp_info ???
here is a mistake, it should be:
    nla_total_size(sizeof(struct sctp_info))

thanks.
>
>> +             + nla_total_size(1) /* INET_DIAG_SHUTDOWN */
>> +             + nla_total_size(1) /* INET_DIAG_TOS */
>> +             + nla_total_size(1) /* INET_DIAG_TCLASS */
>> +             + nla_total_size(addrlen * asoc->peer.transport_count)
>> +             + nla_total_size(addrlen * addrcnt)
>> +             + nla_total_size(sizeof(struct inet_diag_meminfo))
>> +             + nla_total_size(sizeof(struct inet_diag_msg))
>> +             + nla_total_size(sizeof(struct sctp_info))
and will remove this one.

>> +             + 64;
>> +}
>
>
>

^ permalink raw reply

* Re: [RFC PATCH v2 1/5] bpf: add PHYS_DEV prog type for early driver filter
From: Jamal Hadi Salim @ 2016-04-09 15:29 UTC (permalink / raw)
  To: Tom Herbert, Alexei Starovoitov
  Cc: Jesper Dangaard Brouer, Brenden Blanco, David S. Miller,
	Linux Kernel Network Developers, Or Gerlitz, Daniel Borkmann,
	Eric Dumazet, Edward Cree, john fastabend, Thomas Graf,
	Johannes Berg, eranlinuxmellanox, Lorenzo Colitti, linux-mm
In-Reply-To: <CALx6S36d74D-8Rx762nmNwb1TF0M0sBfojBhUF96prJiYmDYiQ@mail.gmail.com>

On 16-04-09 07:29 AM, Tom Herbert wrote:

> +1. Forwarding which will be a common application almost always
> requires modification (decrement TTL), and header data split has
> always been a weak feature since the device has to have some arbitrary
> rules about what headers needs to be split out (either implements
> protocol specific parsing or some fixed length).

Then this is sensible. I was cruising the threads and
confused by your earlier emails Tom because you talked
about XPS etc. It sounded like the idea evolved into putting
the whole freaking stack on bpf.
If this is _forwarding only_ it maybe useful to look at
Alexey's old code in particular the DMA bits;
he built his own lookup algorithm but sounds like bpf is
a much better fit today.

cheers,
jamal

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCHv2 net-next 1/6] sctp: add sctp_info dump api for sctp_diag
From: Jamal Hadi Salim @ 2016-04-09 15:19 UTC (permalink / raw)
  To: Eric Dumazet, Xin Long
  Cc: network dev, linux-sctp, Marcelo Ricardo Leitner, Vlad Yasevich,
	daniel, davem
In-Reply-To: <1460178984.6473.473.camel@edumazet-glaptop3.roam.corp.google.com>

On 16-04-09 01:16 AM, Eric Dumazet wrote:

>
> Lots of holes in this structure...
>
>


I may have mentioned to you that there is 8 bit hole in tcp_info too ;->
(above tcpi_rto). Adding an 8 bit explicit pad seems useful
since it is already in the wild. I was going to send the patch after
netdev11 but  forgot.

cheers,
jamal

^ permalink raw reply

* Re: [PATCHv2 net-next 1/6] sctp: add sctp_info dump api for sctp_diag
From: Jamal Hadi Salim @ 2016-04-09 15:16 UTC (permalink / raw)
  To: Xin Long, network dev, linux-sctp
  Cc: Marcelo Ricardo Leitner, Vlad Yasevich, daniel, davem
In-Reply-To: <c507274a984bd1b0a7e7a59d1e825352536efd25.1460177331.git.lucien.xin@gmail.com>

Appreciate these patches. Finally some love for sctp.
Small comment below:

On 16-04-09 12:53 AM, Xin Long wrote:
> sctp_diag will dump some important details of sctp's assoc or ep, we use
> sctp_info to describe them,  sctp_get_sctp_info to get them, and export
> it to sctp_diag.ko.
>
> Signed-off-by: Xin Long <lucien.xin@gmail.com>
> ---
>   include/linux/sctp.h    | 65 +++++++++++++++++++++++++++++++++++++
>   include/net/sctp/sctp.h |  3 ++
>   net/sctp/socket.c       | 86 +++++++++++++++++++++++++++++++++++++++++++++++++
>   3 files changed, 154 insertions(+)
>
> diff --git a/include/linux/sctp.h b/include/linux/sctp.h
> index a9414fd..a448ebc 100644
> --- a/include/linux/sctp.h
> +++ b/include/linux/sctp.h
> @@ -705,4 +705,69 @@ typedef struct sctp_auth_chunk {
>   	sctp_authhdr_t auth_hdr;
>   } __packed sctp_auth_chunk_t;
>
> +struct sctp_info {
> +	__u32	sctpi_tag;
> +	__u32	sctpi_state;
> +	__u32	sctpi_rwnd;
> +	__u16	sctpi_unackdata;
> +	__u16	sctpi_penddata;
> +	__u16	sctpi_instrms;
> +	__u16	sctpi_outstrms;
> +	__u32	sctpi_fragmentation_point;
> +	__u32	sctpi_inqueue;
> +	__u32	sctpi_outqueue;
> +	__u32	sctpi_overall_error;
> +	__u32	sctpi_max_burst;
> +	__u32	sctpi_maxseg;
> +	__u32	sctpi_peer_rwnd;
> +	__u32	sctpi_peer_tag;
> +	__u8	sctpi_peer_capable;
> +	__u8	sctpi_peer_sack;
> +
> +	/* assoc status info */
> +	__u64	sctpi_isacks;
> +	__u64	sctpi_osacks;
> +	__u64	sctpi_opackets;
> +	__u64	sctpi_ipackets;
> +	__u64	sctpi_rtxchunks;
> +	__u64	sctpi_outofseqtsns;
> +	__u64	sctpi_idupchunks;
> +	__u64	sctpi_gapcnt;
> +	__u64	sctpi_ouodchunks;
> +	__u64	sctpi_iuodchunks;
> +	__u64	sctpi_oodchunks;
> +	__u64	sctpi_iodchunks;
> +	__u64	sctpi_octrlchunks;
> +	__u64	sctpi_ictrlchunks;
> +
> +	/* primary transport info */
> +	struct sockaddr_storage	sctpi_p_address;
> +	__s32	sctpi_p_state;
> +	__u32	sctpi_p_cwnd;
> +	__u32	sctpi_p_srtt;
> +	__u32	sctpi_p_rto;
> +	__u32	sctpi_p_hbinterval;
> +	__u32	sctpi_p_pathmaxrxt;
> +	__u32	sctpi_p_sackdelay;
> +	__u32	sctpi_p_sackfreq;
> +	__u32	sctpi_p_ssthresh;
> +	__u32	sctpi_p_partial_bytes_acked;
> +	__u32	sctpi_p_flight_size;
> +	__u16	sctpi_p_error;
> +
> +	/* sctp sock info */
> +	__u32	sctpi_s_autoclose;
> +	__u32	sctpi_s_adaptation_ind;
> +	__u32	sctpi_s_pd_point;
> +	__u8	sctpi_s_nodelay;
> +	__u8	sctpi_s_disable_fragments;
> +	__u8	sctpi_s_v4mapped;
> +	__u8	sctpi_s_frag_interleave;
> +};
> +


Can you double check to make sure this is 32 bit aligned
(no holes) maybe in your case 64 bit aligned?
Sticking  +	__u16	sctpi_p_error in there seems
to kill it.

Also, any plans to do the netlink events and destroy features?

cheers,
jamal

^ permalink raw reply

* [PATCH net-next] ipv6: fix inet6_lookup_listener()
From: Eric Dumazet @ 2016-04-09 15:01 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, Maciej Żenczykowski

From: Eric Dumazet <edumazet@google.com>

A stupid refactoring bug in inet6_lookup_listener() needs to be fixed
in order to get proper SO_REUSEPORT behavior.

Fixes: 3b24d854cb35 ("tcp/dccp: do not touch listener sk_refcnt under synflood")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Maciej Żenczykowski <maze@google.com>
---
 net/ipv6/inet6_hashtables.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/inet6_hashtables.c b/net/ipv6/inet6_hashtables.c
index 607da088344d..f1678388fb0d 100644
--- a/net/ipv6/inet6_hashtables.c
+++ b/net/ipv6/inet6_hashtables.c
@@ -137,7 +137,7 @@ struct sock *inet6_lookup_listener(struct net *net,
 	sk_for_each(sk, &ilb->head) {
 		score = compute_score(sk, net, hnum, daddr, dif);
 		if (score > hiscore) {
-			hiscore = score;
+			reuseport = sk->sk_reuseport;
 			if (reuseport) {
 				phash = inet6_ehashfn(net, daddr, hnum,
 						      saddr, sport);
@@ -148,7 +148,7 @@ struct sock *inet6_lookup_listener(struct net *net,
 				matches = 1;
 			}
 			result = sk;
-			reuseport = sk->sk_reuseport;
+			hiscore = score;
 		} else if (score == hiscore && reuseport) {
 			matches++;
 			if (reciprocal_scale(phash, matches) == 0)

^ permalink raw reply related

* Re: [RFC PATCH v2 5/5] Add sample for adding simple drop program to link
From: Jamal Hadi Salim @ 2016-04-09 14:48 UTC (permalink / raw)
  To: Brenden Blanco, davem
  Cc: netdev, tom, alexei.starovoitov, ogerlitz, daniel, brouer,
	eric.dumazet, ecree, john.fastabend, tgraf, johannes,
	eranlinuxmellanox, lorenzo
In-Reply-To: <1460090930-11219-5-git-send-email-bblanco@plumgrid.com>

On 16-04-08 12:48 AM, Brenden Blanco wrote:
> Add a sample program that only drops packets at the
> BPF_PROG_TYPE_PHYS_DEV hook of a link. With the drop-only program,
> observed single core rate is ~19.5Mpps.
>
> Other tests were run, for instance without the dropcnt increment or
> without reading from the packet header, the packet rate was mostly
> unchanged.
>
> $ perf record -a samples/bpf/netdrvx1 $(</sys/class/net/eth0/ifindex)
> proto 17:   19596362 drops/s
>
> ./pktgen_sample03_burst_single_flow.sh -i $DEV -d $IP -m $MAC -t 4
> Running... ctrl^C to stop
> Device: eth4@0
> Result: OK: 7873817(c7872245+d1572) usec, 38801823 (60byte,0frags)
>    4927955pps 2365Mb/sec (2365418400bps) errors: 0
> Device: eth4@1
> Result: OK: 7873817(c7872123+d1693) usec, 38587342 (60byte,0frags)
>    4900715pps 2352Mb/sec (2352343200bps) errors: 0
> Device: eth4@2
> Result: OK: 7873817(c7870929+d2888) usec, 38718848 (60byte,0frags)
>    4917417pps 2360Mb/sec (2360360160bps) errors: 0
> Device: eth4@3
> Result: OK: 7873818(c7872193+d1625) usec, 38796346 (60byte,0frags)
>    4927259pps 2365Mb/sec (2365084320bps) errors: 0
>
> perf report --no-children:
>   29.48%  ksoftirqd/6  [mlx4_en]         [k] mlx4_en_process_rx_cq
>   18.17%  ksoftirqd/6  [mlx4_en]         [k] mlx4_en_alloc_frags
>    8.19%  ksoftirqd/6  [mlx4_en]         [k] mlx4_en_free_frag
>    5.35%  ksoftirqd/6  [kernel.vmlinux]  [k] get_page_from_freelist
>    2.92%  ksoftirqd/6  [kernel.vmlinux]  [k] free_pages_prepare
>    2.90%  ksoftirqd/6  [mlx4_en]         [k] mlx4_call_bpf
>    2.72%  ksoftirqd/6  [fjes]            [k] 0x000000000000af66
>    2.37%  ksoftirqd/6  [kernel.vmlinux]  [k] swiotlb_sync_single_for_cpu
>    1.92%  ksoftirqd/6  [kernel.vmlinux]  [k] percpu_array_map_lookup_elem
>    1.83%  ksoftirqd/6  [kernel.vmlinux]  [k] free_one_page
>    1.70%  ksoftirqd/6  [kernel.vmlinux]  [k] swiotlb_sync_single
>    1.69%  ksoftirqd/6  [kernel.vmlinux]  [k] bpf_map_lookup_elem
>    1.33%  swapper      [kernel.vmlinux]  [k] intel_idle
>    1.32%  ksoftirqd/6  [fjes]            [k] 0x000000000000af90
>    1.21%  ksoftirqd/6  [kernel.vmlinux]  [k] sk_load_byte_positive_offset
>    1.07%  ksoftirqd/6  [kernel.vmlinux]  [k] __alloc_pages_nodemask
>    0.89%  ksoftirqd/6  [kernel.vmlinux]  [k] __rmqueue
>    0.84%  ksoftirqd/6  [mlx4_en]         [k] mlx4_alloc_pages.isra.23
>    0.79%  ksoftirqd/6  [kernel.vmlinux]  [k] net_rx_action
>
> machine specs:
>   receiver - Intel E5-1630 v3 @ 3.70GHz
>   sender - Intel E5645 @ 2.40GHz
>   Mellanox ConnectX-3 @40G
>


Ok, sorry - should have looked this far before sending earlier email.
So when you run concurently you see about 5Mpps per core but if you
shoot all traffic at a single core you see 20Mpps?

Devil's advocate question:
If the bottleneck is the driver - is there an advantage in adding the
bpf code at all in the driver?
I am curious than before to see the comparison for the same bpf code
running at tc level vs in the driver..

cheers,
jamal

^ permalink raw reply

* Re: [RFC PATCH v2 0/5] Add driver bpf hook for early packet drop
From: Jamal Hadi Salim @ 2016-04-09 14:37 UTC (permalink / raw)
  To: Brenden Blanco, davem
  Cc: netdev, tom, alexei.starovoitov, ogerlitz, daniel, brouer,
	eric.dumazet, ecree, john.fastabend, tgraf, johannes,
	eranlinuxmellanox, lorenzo
In-Reply-To: <1460090874-10497-1-git-send-email-bblanco@plumgrid.com>

On 16-04-08 12:47 AM, Brenden Blanco wrote:
> This patch set introduces new infrastructure for programmatically
> processing packets in the earliest stages of rx, as part of an effort
> others are calling Express Data Path (XDP) [1]. Start this effort by
> introducing a new bpf program type for early packet filtering, before even
> an skb has been allocated.
>
> With this, hope to enable line rate filtering, with this initial
> implementation providing drop/allow action only.
>
> Patch 1 introduces the new prog type and helpers for validating the bpf
> program. A new userspace struct is defined containing only len as a field,
> with others to follow in the future.
> In patch 2, create a new ndo to pass the fd to support drivers.
> In patch 3, expose a new rtnl option to userspace.
> In patch 4, enable support in mlx4 driver. No skb allocation is required,
> instead a static percpu skb is kept in the driver and minimally initialized
> for each driver frag.
> In patch 5, create a sample drop and count program. With single core,
> achieved ~20 Mpps drop rate on a 40G mlx4. This includes packet data
> access, bpf array lookup, and increment.

Hrm. This doesnt sound very high (less than 50%?).
Is the driver the main overhead?
I'd be curious, for comparison, if you just dropped everything
without bpf and alternatively with tc + bpf of the same program
on the one cpu.
Numbers we had for the NUC with tc on single core were a bit higher
than 20Mpps but there was no driver overhead - so i expected
to see much higher numbers if you did it at the driver...
Note back in the day Alexey(not Alexei;->) had a built-in driver
level forwarder;
however the advantage there was derived out of packets being DMAed
from ingress to egress port after some simple lookup.

cheers,
jamal

^ permalink raw reply

* Re: [PATCH net-next v2 1/2] rtnetlink: add new RTM_GETSTATS message to dump link stats
From: Jamal Hadi Salim @ 2016-04-09 14:30 UTC (permalink / raw)
  To: Roopa Prabhu, netdev; +Cc: davem
In-Reply-To: <1460183892-57286-2-git-send-email-roopa@cumulusnetworks.com>


Thanks for doing the work Roopa and I apologize for late comments
below:

On 16-04-09 02:38 AM, Roopa Prabhu wrote:
> From: Roopa Prabhu <roopa@cumulusnetworks.com>
>

> This patch also allows for af family stats (an example af stats for IPV6
> is available with the second patch in the series).
>
> Like any other rtnetlink message, RTM_GETSTATS can be used to get stats of
> a single interface or all interfaces with NLM_F_DUMP.
>
> Future possible new types of stat attributes:
> - IFLA_MPLS_STATS  (nested. for mpls/mdev stats)
> - IFLA_EXTENDED_STATS (nested. extended software netdev stats like bridge,
>    vlan, vxlan etc)
> - IFLA_EXTENDED_HW_STATS (nested. extended hardware stats which are
>    available via ethtool today)
>

I got the extended_hw_stats (which are very common in a lot of ASICS) if
you mean stats on packet sizes. But would the other extended stats not
just be per netdev kind specific? We have concept of XSTATS which maybe
a fit.

> This patch also declares a filter mask for all stat attributes.
> User has to provide a mask of stats attributes to query. This will be
> specified in a new hdr 'struct if_stats_msg' for stats messages.
>
> Without any attributes in the filter_mask, no stats will be returned.
>

Should such a command then not be rejected with an error code?

> +/* STATS section */
> +
> +struct if_stats_msg {
> +	__u8  family;
> +	__u32 ifindex;
> +	__u32 filter_mask;
> +};

Needs to be 32 bit aligned.
Do you need 32 bits for the filter mask?
Perhaps a 16bit mask and an 8bit pad for future use.

struct if_stats_msg {
            __u32 ifindex;
	   __u16 filter_mask;
	   __u8  family;
            __u8 pad; /* future use */
};

Or you could reverse those (from smallest to largest).
BTW, any plans to do the cool feature where i inject a timeout period
and i just get STATS events ;-> The filter struct would have to be more
sophisticated - user would need to pass a list of ifindices and
filter_mask as well as timeout.

cheers,
jamal

^ permalink raw reply

* [PATCH net-next 1/6] net: Make vxlan/geneve default udp ports public
From: Manish Chopra @ 2016-04-09 13:17 UTC (permalink / raw)
  To: davem; +Cc: netdev, Ariel.Elior, Yuval.Mintz
In-Reply-To: <1460207825-3622-1-git-send-email-manish.chopra@qlogic.com>

This patch defines default UDP ports for vxlan and geneve
in their respective header files to be accessed by the driver.

Rationale behind this change is that with some OVS configuration
UDP ports doesn't get notified to the driver using
.ndo_[add|del]_vxlan_port. So for the driver to work with
these specific ports in that environment we need to have them configured
on adapter by default for the required hardware offload support.

Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
---
 drivers/net/geneve.c | 4 +---
 drivers/net/vxlan.c  | 2 +-
 include/net/geneve.h | 1 +
 include/net/vxlan.h  | 2 ++
 4 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index a9fbf17..4f8a1bb 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -23,8 +23,6 @@
 
 #define GENEVE_NETDEV_VER	"0.6"
 
-#define GENEVE_UDP_PORT		6081
-
 #define GENEVE_N_VID		(1u << 24)
 #define GENEVE_VID_MASK		(GENEVE_N_VID - 1)
 
@@ -1361,7 +1359,7 @@ static int geneve_configure(struct net *net, struct net_device *dev,
 static int geneve_newlink(struct net *net, struct net_device *dev,
 			  struct nlattr *tb[], struct nlattr *data[])
 {
-	__be16 dst_port = htons(GENEVE_UDP_PORT);
+	__be16 dst_port = htons(GENEVE_DEF_UDP_PORT);
 	__u8 ttl = 0, tos = 0;
 	bool metadata = false;
 	union geneve_addr remote = geneve_remote_unspec;
diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 9f36340..1d7af21 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -62,7 +62,7 @@
  * The IANA assigned port is 4789, but the Linux default is 8472
  * for compatibility with early adopters.
  */
-static unsigned short vxlan_port __read_mostly = 8472;
+static unsigned short vxlan_port __read_mostly = VXLAN_DEF_UDP_PORT;
 module_param_named(udp_port, vxlan_port, ushort, 0444);
 MODULE_PARM_DESC(udp_port, "Destination UDP port");
 
diff --git a/include/net/geneve.h b/include/net/geneve.h
index e6c23dc..3c3ee4a 100644
--- a/include/net/geneve.h
+++ b/include/net/geneve.h
@@ -5,6 +5,7 @@
 #include <net/udp_tunnel.h>
 #endif
 
+#define GENEVE_DEF_UDP_PORT 6081
 
 /* Geneve Header:
  *  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
diff --git a/include/net/vxlan.h b/include/net/vxlan.h
index 2f168f0..5d1b27f 100644
--- a/include/net/vxlan.h
+++ b/include/net/vxlan.h
@@ -9,6 +9,8 @@
 #include <linux/udp.h>
 #include <net/dst_metadata.h>
 
+#define VXLAN_DEF_UDP_PORT 8472
+
 /* VXLAN protocol (RFC 7348) header:
  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  * |R|R|R|R|I|R|R|R|               Reserved                        |
-- 
2.7.2

^ permalink raw reply related

* [PATCH net-next 0/6] qed/qede: Add tunneling support
From: Manish Chopra @ 2016-04-09 13:16 UTC (permalink / raw)
  To: davem; +Cc: netdev, Ariel.Elior, Yuval.Mintz

Hi David,

This patch series adds support for VXLAN, GRE and GENEVE tunnels
to be used over this driver. With this support, adapter can perform
TSO offload, inner/outer checksums offloads on TX and RX for
encapsulated packets.

Almost all the changes in this series addresses our driver modules,
except the first patch which is for general infrastructure change.

Please consider applying this series to net-next.

Thanks,
Manish

Manish Chopra (6):
  net: Make vxlan/geneve default udp ports public
  qed: Add infrastructure support for tunneling
  qed/qede: Add VXLAN tunnel slowpath configuration support
  qed/qede: Add GENEVE tunnel slowpath configuration support
  qed: Enable GRE tunnel slowpath configuration
  qede: Add fastpath support for tunneling

 drivers/net/ethernet/qlogic/Kconfig                |  21 ++
 drivers/net/ethernet/qlogic/qed/qed.h              |  46 ++++
 drivers/net/ethernet/qlogic/qed/qed_dev.c          |   6 +-
 drivers/net/ethernet/qlogic/qed/qed_dev_api.h      |   2 +
 drivers/net/ethernet/qlogic/qed/qed_hsi.h          |  51 ++++-
 .../net/ethernet/qlogic/qed/qed_init_fw_funcs.c    | 127 ++++++++++
 drivers/net/ethernet/qlogic/qed/qed_l2.c           |  31 +++
 drivers/net/ethernet/qlogic/qed/qed_main.c         |  15 +-
 drivers/net/ethernet/qlogic/qed/qed_reg_addr.h     |  31 +++
 drivers/net/ethernet/qlogic/qed/qed_sp.h           |   7 +
 drivers/net/ethernet/qlogic/qed/qed_sp_commands.c  | 255 +++++++++++++++++++++
 drivers/net/ethernet/qlogic/qede/qede.h            |   6 +-
 drivers/net/ethernet/qlogic/qede/qede_main.c       | 198 +++++++++++++++-
 drivers/net/geneve.c                               |   4 +-
 drivers/net/vxlan.c                                |   2 +-
 include/linux/qed/qed_eth_if.h                     |  10 +
 include/net/geneve.h                               |   1 +
 include/net/vxlan.h                                |   2 +
 18 files changed, 796 insertions(+), 19 deletions(-)

-- 
2.7.2

^ permalink raw reply

* [PATCH net-next 5/6] qed: Enable GRE tunnel slowpath configuration
From: Manish Chopra @ 2016-04-09 13:17 UTC (permalink / raw)
  To: davem; +Cc: netdev, Ariel.Elior, Yuval.Mintz
In-Reply-To: <1460207825-3622-1-git-send-email-manish.chopra@qlogic.com>

Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
---
 drivers/net/ethernet/qlogic/qed/qed_main.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/qlogic/qed/qed_main.c b/drivers/net/ethernet/qlogic/qed/qed_main.c
index 406de14..63db1ef 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_main.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_main.c
@@ -779,10 +779,14 @@ static int qed_slowpath_start(struct qed_dev *cdev,
 
 	memset(&tunn_info, 0, sizeof(tunn_info));
 	tunn_info.tunn_mode |=  1 << QED_MODE_VXLAN_TUNN |
+				1 << QED_MODE_L2GRE_TUNN |
+				1 << QED_MODE_IPGRE_TUNN |
 				1 << QED_MODE_L2GENEVE_TUNN |
 				1 << QED_MODE_IPGENEVE_TUNN;
 
 	tunn_info.tunn_clss_vxlan = QED_TUNN_CLSS_MAC_VLAN;
+	tunn_info.tunn_clss_l2gre = QED_TUNN_CLSS_MAC_VLAN;
+	tunn_info.tunn_clss_ipgre = QED_TUNN_CLSS_MAC_VLAN;
 
 	rc = qed_hw_init(cdev, &tunn_info, true,
 			 cdev->int_params.out.int_mode,
-- 
2.7.2

^ permalink raw reply related

* [PATCH net-next 4/6] qed/qede: Add GENEVE tunnel slowpath configuration support
From: Manish Chopra @ 2016-04-09 13:17 UTC (permalink / raw)
  To: davem; +Cc: netdev, Ariel.Elior, Yuval.Mintz
In-Reply-To: <1460207825-3622-1-git-send-email-manish.chopra@qlogic.com>

This patch configures adapter for GENEVE tunnel offloads
to be performed by the adapter.

Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
---
 drivers/net/ethernet/qlogic/Kconfig          | 10 ++++++++++
 drivers/net/ethernet/qlogic/qed/qed_main.c   |  5 ++++-
 drivers/net/ethernet/qlogic/qede/qede_main.c | 12 +++++++++++-
 3 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/Kconfig b/drivers/net/ethernet/qlogic/Kconfig
index 7a65522..c0a11b5 100644
--- a/drivers/net/ethernet/qlogic/Kconfig
+++ b/drivers/net/ethernet/qlogic/Kconfig
@@ -114,4 +114,14 @@ config QEDE_VXLAN
 	  support for Virtual eXtensible Local Area Network (VXLAN)
 	  in the driver.
 
+config QEDE_GENEVE
+	bool "Generic Network Virtualization Encapsulation (GENEVE) support"
+	depends on QEDE && GENEVE && !(QEDE=y && GENEVE=m)
+	---help---
+	  This allows one to create GENEVE virtual interfaces that provide
+	  Layer 2 Networks over Layer 3 Networks. GENEVE is often used
+	  to tunnel virtual network infrastructure in virtualized environments.
+	  Say Y here if you want to enable hardware offload support for
+	  Generic Network Virtualization Encapsulation (GENEVE) in the driver.
+
 endif # NET_VENDOR_QLOGIC
diff --git a/drivers/net/ethernet/qlogic/qed/qed_main.c b/drivers/net/ethernet/qlogic/qed/qed_main.c
index 3b406ac..406de14 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_main.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_main.c
@@ -778,7 +778,10 @@ static int qed_slowpath_start(struct qed_dev *cdev,
 	data = cdev->firmware->data;
 
 	memset(&tunn_info, 0, sizeof(tunn_info));
-	tunn_info.tunn_mode |=  1 << QED_MODE_VXLAN_TUNN;
+	tunn_info.tunn_mode |=  1 << QED_MODE_VXLAN_TUNN |
+				1 << QED_MODE_L2GENEVE_TUNN |
+				1 << QED_MODE_IPGENEVE_TUNN;
+
 	tunn_info.tunn_clss_vxlan = QED_TUNN_CLSS_MAC_VLAN;
 
 	rc = qed_hw_init(cdev, &tunn_info, true,
diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c
index 9a82d42..b2cde75 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_main.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_main.c
@@ -36,6 +36,7 @@
 #include <linux/random.h>
 #include <net/ip6_checksum.h>
 #include <linux/bitops.h>
+#include <net/geneve.h>
 
 #include "qede.h"
 
@@ -1864,17 +1865,26 @@ static void qede_del_vxlan_port(struct net_device *dev,
 	set_bit(QEDE_SP_VXLAN_PORT_DELETE, &edev->sp_flags);
 	schedule_delayed_work(&edev->sp_task, 0);
 }
+#endif
 
+#if defined(CONFIG_QEDE_VXLAN) || defined(CONFIG_QEDE_GENEVE)
 static void qede_config_def_udp_tunnel_ports(struct qede_dev *edev)
 {
 	struct qed_tunn_params tunn_params;
 
 	memset(&tunn_params, 0, sizeof(tunn_params));
 
+#ifdef CONFIG_QEDE_VXLAN
 	if (!edev->vxlan_dst_port) {
 		tunn_params.update_vxlan_port = 1;
 		tunn_params.vxlan_port = VXLAN_DEF_UDP_PORT;
 	}
+#endif
+
+#ifdef CONFIG_QEDE_GENEVE
+	tunn_params.update_geneve_port = 1;
+	tunn_params.geneve_port = GENEVE_DEF_UDP_PORT;
+#endif
 
 	qed_ops->tunn_config(edev->cdev, &tunn_params);
 }
@@ -3167,7 +3177,7 @@ static int qede_load(struct qede_dev *edev, enum qede_load_mode mode)
 	edev->ops->common->get_link(edev->cdev, &link_output);
 	qede_link_update(edev, &link_output);
 
-#ifdef CONFIG_QEDE_VXLAN
+#if defined(CONFIG_QEDE_VXLAN) || defined(CONFIG_QEDE_GENEVE)
 	qede_config_def_udp_tunnel_ports(edev);
 #endif
 	DP_INFO(edev, "Ending successfully qede load\n");
-- 
2.7.2

^ permalink raw reply related

* [PATCH net-next 3/6] qed/qede: Add VXLAN tunnel slowpath configuration support
From: Manish Chopra @ 2016-04-09 13:17 UTC (permalink / raw)
  To: davem; +Cc: netdev, Ariel.Elior, Yuval.Mintz
In-Reply-To: <1460207825-3622-1-git-send-email-manish.chopra@qlogic.com>

This patch enables VXLAN tunnel on the adapter and
add support for driver hooks to configure UDP ports
for VXLAN tunnel offload to be performed by the adapter.

Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
---
 drivers/net/ethernet/qlogic/Kconfig               | 11 +++
 drivers/net/ethernet/qlogic/qed/qed_main.c        |  8 ++-
 drivers/net/ethernet/qlogic/qed/qed_sp_commands.c |  3 +-
 drivers/net/ethernet/qlogic/qede/qede.h           |  5 +-
 drivers/net/ethernet/qlogic/qede/qede_main.c      | 87 +++++++++++++++++++++++
 5 files changed, 111 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/Kconfig b/drivers/net/ethernet/qlogic/Kconfig
index ddcfcab..7a65522 100644
--- a/drivers/net/ethernet/qlogic/Kconfig
+++ b/drivers/net/ethernet/qlogic/Kconfig
@@ -103,4 +103,15 @@ config QEDE
 	depends on QED
 	---help---
 	  This enables the support for ...
+
+config QEDE_VXLAN
+	bool "Virtual eXtensible Local Area Network support"
+	default n
+	depends on QEDE && VXLAN && !(QEDE=y && VXLAN=m)
+	---help---
+	  This enables hardware offload support for VXLAN protocol over
+	  qede module. Say Y here if you want to enable hardware offload
+	  support for Virtual eXtensible Local Area Network (VXLAN)
+	  in the driver.
+
 endif # NET_VENDOR_QLOGIC
diff --git a/drivers/net/ethernet/qlogic/qed/qed_main.c b/drivers/net/ethernet/qlogic/qed/qed_main.c
index 01c0207..3b406ac 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_main.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_main.c
@@ -744,6 +744,7 @@ static void qed_update_pf_params(struct qed_dev *cdev,
 static int qed_slowpath_start(struct qed_dev *cdev,
 			      struct qed_slowpath_params *params)
 {
+	struct qed_tunn_start_params tunn_info;
 	struct qed_mcp_drv_version drv_version;
 	const u8 *data = NULL;
 	struct qed_hwfn *hwfn;
@@ -776,7 +777,12 @@ static int qed_slowpath_start(struct qed_dev *cdev,
 	/* Start the slowpath */
 	data = cdev->firmware->data;
 
-	rc = qed_hw_init(cdev, NULL, true, cdev->int_params.out.int_mode,
+	memset(&tunn_info, 0, sizeof(tunn_info));
+	tunn_info.tunn_mode |=  1 << QED_MODE_VXLAN_TUNN;
+	tunn_info.tunn_clss_vxlan = QED_TUNN_CLSS_MAC_VLAN;
+
+	rc = qed_hw_init(cdev, &tunn_info, true,
+			 cdev->int_params.out.int_mode,
 			 true, data);
 	if (rc)
 		goto err2;
diff --git a/drivers/net/ethernet/qlogic/qed/qed_sp_commands.c b/drivers/net/ethernet/qlogic/qed/qed_sp_commands.c
index 306da70..7ccd96e 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_sp_commands.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_sp_commands.c
@@ -353,7 +353,8 @@ int qed_sp_pf_start(struct qed_hwfn *p_hwfn,
 	DMA_REGPAIR_LE(p_ramrod->consolid_q_pbl_addr,
 		       p_hwfn->p_consq->chain.pbl.p_phys_table);
 
-	qed_tunn_set_pf_start_params(p_hwfn, NULL, NULL);
+	qed_tunn_set_pf_start_params(p_hwfn, p_tunn,
+				     &p_ramrod->tunnel_config);
 	p_hwfn->hw_info.personality = PERSONALITY_ETH;
 
 	DP_VERBOSE(p_hwfn, QED_MSG_SPQ,
diff --git a/drivers/net/ethernet/qlogic/qede/qede.h b/drivers/net/ethernet/qlogic/qede/qede.h
index d023251..0b66d4a 100644
--- a/drivers/net/ethernet/qlogic/qede/qede.h
+++ b/drivers/net/ethernet/qlogic/qede/qede.h
@@ -167,6 +167,7 @@ struct qede_dev {
 	bool accept_any_vlan;
 	struct delayed_work		sp_task;
 	unsigned long			sp_flags;
+	u16				vxlan_dst_port;
 };
 
 enum QEDE_STATE {
@@ -287,7 +288,9 @@ struct qede_fastpath {
 #define QEDE_CSUM_ERROR			BIT(0)
 #define QEDE_CSUM_UNNECESSARY		BIT(1)
 
-#define QEDE_SP_RX_MODE		1
+#define QEDE_SP_RX_MODE			1
+#define QEDE_SP_VXLAN_PORT_ADD		2
+#define QEDE_SP_VXLAN_PORT_DELETE	3
 
 union qede_reload_args {
 	u16 mtu;
diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c
index 518af32..9a82d42 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_main.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_main.c
@@ -1830,6 +1830,63 @@ static void qede_vlan_mark_nonconfigured(struct qede_dev *edev)
 	edev->accept_any_vlan = false;
 }
 
+#ifdef CONFIG_QEDE_VXLAN
+static void qede_add_vxlan_port(struct net_device *dev,
+				sa_family_t sa_family, __be16 port)
+{
+	struct qede_dev *edev = netdev_priv(dev);
+	u16 t_port = ntohs(port);
+
+	if (edev->vxlan_dst_port)
+		return;
+
+	edev->vxlan_dst_port = t_port;
+
+	DP_VERBOSE(edev, QED_MSG_DEBUG, "Added vxlan port=%d", t_port);
+
+	set_bit(QEDE_SP_VXLAN_PORT_ADD, &edev->sp_flags);
+	schedule_delayed_work(&edev->sp_task, 0);
+}
+
+static void qede_del_vxlan_port(struct net_device *dev,
+				sa_family_t sa_family, __be16 port)
+{
+	struct qede_dev *edev = netdev_priv(dev);
+	u16 t_port = ntohs(port);
+
+	if (t_port != edev->vxlan_dst_port)
+		return;
+
+	edev->vxlan_dst_port = 0;
+
+	DP_VERBOSE(edev, QED_MSG_DEBUG, "Deleted vxlan port=%d", t_port);
+
+	set_bit(QEDE_SP_VXLAN_PORT_DELETE, &edev->sp_flags);
+	schedule_delayed_work(&edev->sp_task, 0);
+}
+
+static void qede_config_def_udp_tunnel_ports(struct qede_dev *edev)
+{
+	struct qed_tunn_params tunn_params;
+
+	memset(&tunn_params, 0, sizeof(tunn_params));
+
+	if (!edev->vxlan_dst_port) {
+		tunn_params.update_vxlan_port = 1;
+		tunn_params.vxlan_port = VXLAN_DEF_UDP_PORT;
+	}
+
+	qed_ops->tunn_config(edev->cdev, &tunn_params);
+}
+#endif
+
+static netdev_features_t qede_features_check(struct sk_buff *skb,
+					     struct net_device *dev,
+					     netdev_features_t features)
+{
+	return vxlan_features_check(skb, features);
+}
+
 static const struct net_device_ops qede_netdev_ops = {
 	.ndo_open = qede_open,
 	.ndo_stop = qede_close,
@@ -1841,6 +1898,11 @@ static const struct net_device_ops qede_netdev_ops = {
 	.ndo_vlan_rx_add_vid = qede_vlan_rx_add_vid,
 	.ndo_vlan_rx_kill_vid = qede_vlan_rx_kill_vid,
 	.ndo_get_stats64 = qede_get_stats64,
+#ifdef CONFIG_QEDE_VXLAN
+	.ndo_add_vxlan_port = qede_add_vxlan_port,
+	.ndo_del_vxlan_port = qede_del_vxlan_port,
+#endif
+	.ndo_features_check = qede_features_check,
 };
 
 /* -------------------------------------------------------------------------
@@ -2013,6 +2075,8 @@ static void qede_sp_task(struct work_struct *work)
 {
 	struct qede_dev *edev = container_of(work, struct qede_dev,
 					     sp_task.work);
+	struct qed_dev *cdev = edev->cdev;
+
 	mutex_lock(&edev->qede_lock);
 
 	if (edev->state == QEDE_STATE_OPEN) {
@@ -2020,6 +2084,26 @@ static void qede_sp_task(struct work_struct *work)
 			qede_config_rx_mode(edev->ndev);
 	}
 
+	if (test_and_clear_bit(QEDE_SP_VXLAN_PORT_ADD, &edev->sp_flags)) {
+		struct qed_tunn_params tunn_params;
+
+		memset(&tunn_params, 0, sizeof(tunn_params));
+		tunn_params.update_vxlan_port = 1;
+		tunn_params.vxlan_port = edev->vxlan_dst_port;
+		qed_ops->tunn_config(cdev, &tunn_params);
+	}
+
+	if (test_and_clear_bit(QEDE_SP_VXLAN_PORT_DELETE, &edev->sp_flags)) {
+		struct qed_tunn_params tunn_params;
+
+		memset(&tunn_params, 0, sizeof(tunn_params));
+		tunn_params.update_vxlan_port = 1;
+
+		/* Update with default vxlan udp port */
+		tunn_params.vxlan_port = VXLAN_DEF_UDP_PORT;
+		qed_ops->tunn_config(cdev, &tunn_params);
+	}
+
 	mutex_unlock(&edev->qede_lock);
 }
 
@@ -3083,6 +3167,9 @@ static int qede_load(struct qede_dev *edev, enum qede_load_mode mode)
 	edev->ops->common->get_link(edev->cdev, &link_output);
 	qede_link_update(edev, &link_output);
 
+#ifdef CONFIG_QEDE_VXLAN
+	qede_config_def_udp_tunnel_ports(edev);
+#endif
 	DP_INFO(edev, "Ending successfully qede load\n");
 
 	return 0;
-- 
2.7.2

^ permalink raw reply related

* [PATCH net-next 6/6] qede: Add fastpath support for tunneling
From: Manish Chopra @ 2016-04-09 13:17 UTC (permalink / raw)
  To: davem; +Cc: netdev, Ariel.Elior, Yuval.Mintz
In-Reply-To: <1460207825-3622-1-git-send-email-manish.chopra@qlogic.com>

This patch enables netdev tunneling features and adds
TX/RX fastpath support for tunneling in driver.

Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
---
 drivers/net/ethernet/qlogic/qede/qede.h      |   1 +
 drivers/net/ethernet/qlogic/qede/qede_main.c | 101 ++++++++++++++++++++++++---
 2 files changed, 92 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qede/qede.h b/drivers/net/ethernet/qlogic/qede/qede.h
index 0b66d4a..71a0066 100644
--- a/drivers/net/ethernet/qlogic/qede/qede.h
+++ b/drivers/net/ethernet/qlogic/qede/qede.h
@@ -287,6 +287,7 @@ struct qede_fastpath {
 
 #define QEDE_CSUM_ERROR			BIT(0)
 #define QEDE_CSUM_UNNECESSARY		BIT(1)
+#define QEDE_TUNN_CSUM_UNNECESSARY	BIT(2)
 
 #define QEDE_SP_RX_MODE			1
 #define QEDE_SP_VXLAN_PORT_ADD		2
diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c
index b2cde75..b041a76 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_main.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_main.c
@@ -320,6 +320,9 @@ static u32 qede_xmit_type(struct qede_dev *edev,
 	    (ipv6_hdr(skb)->nexthdr == NEXTHDR_IPV6))
 		*ipv6_ext = 1;
 
+	if (skb->encapsulation)
+		rc |= XMIT_ENC;
+
 	if (skb_is_gso(skb))
 		rc |= XMIT_LSO;
 
@@ -381,6 +384,16 @@ static int map_frag_to_bd(struct qede_dev *edev,
 	return 0;
 }
 
+static u16 qede_get_skb_hlen(struct sk_buff *skb, bool is_encap_pkt)
+{
+	if (is_encap_pkt)
+		return (skb_inner_transport_header(skb) +
+			inner_tcp_hdrlen(skb) - skb->data);
+	else
+		return (skb_transport_header(skb) +
+			tcp_hdrlen(skb) - skb->data);
+}
+
 /* +2 for 1st BD for headers and 2nd BD for headlen (if required) */
 #if ((MAX_SKB_FRAGS + 2) > ETH_TX_MAX_BDS_PER_NON_LSO_PACKET)
 static bool qede_pkt_req_lin(struct qede_dev *edev, struct sk_buff *skb,
@@ -391,8 +404,7 @@ static bool qede_pkt_req_lin(struct qede_dev *edev, struct sk_buff *skb,
 	if (xmit_type & XMIT_LSO) {
 		int hlen;
 
-		hlen = skb_transport_header(skb) +
-		       tcp_hdrlen(skb) - skb->data;
+		hlen = qede_get_skb_hlen(skb, xmit_type & XMIT_ENC);
 
 		/* linear payload would require its own BD */
 		if (skb_headlen(skb) > hlen)
@@ -500,7 +512,18 @@ netdev_tx_t qede_start_xmit(struct sk_buff *skb,
 		first_bd->data.bd_flags.bitfields |=
 			1 << ETH_TX_1ST_BD_FLAGS_L4_CSUM_SHIFT;
 
-		first_bd->data.bitfields |= cpu_to_le16(temp);
+		if (xmit_type & XMIT_ENC) {
+			first_bd->data.bd_flags.bitfields |=
+				1 << ETH_TX_1ST_BD_FLAGS_IP_CSUM_SHIFT;
+		} else {
+			/* In cases when OS doesn't indicate for inner offloads
+			 * when packet is tunnelled, we need to override the HW
+			 * tunnel configuration so that packets are treated as
+			 * regular non tunnelled packets and no inner offloads
+			 * are done by the hardware.
+			 */
+			first_bd->data.bitfields |= cpu_to_le16(temp);
+		}
 
 		/* If the packet is IPv6 with extension header, indicate that
 		 * to FW and pass few params, since the device cracker doesn't
@@ -516,10 +539,15 @@ netdev_tx_t qede_start_xmit(struct sk_buff *skb,
 		third_bd->data.lso_mss =
 			cpu_to_le16(skb_shinfo(skb)->gso_size);
 
-		first_bd->data.bd_flags.bitfields |=
-		1 << ETH_TX_1ST_BD_FLAGS_IP_CSUM_SHIFT;
-		hlen = skb_transport_header(skb) +
-		       tcp_hdrlen(skb) - skb->data;
+		if (unlikely(xmit_type & XMIT_ENC)) {
+			first_bd->data.bd_flags.bitfields |=
+				1 << ETH_TX_1ST_BD_FLAGS_TUNN_IP_CSUM_SHIFT;
+			hlen = qede_get_skb_hlen(skb, true);
+		} else {
+			first_bd->data.bd_flags.bitfields |=
+				1 << ETH_TX_1ST_BD_FLAGS_IP_CSUM_SHIFT;
+			hlen = qede_get_skb_hlen(skb, false);
+		}
 
 		/* @@@TBD - if will not be removed need to check */
 		third_bd->data.bitfields |=
@@ -853,6 +881,9 @@ static void qede_set_skb_csum(struct sk_buff *skb, u8 csum_flag)
 
 	if (csum_flag & QEDE_CSUM_UNNECESSARY)
 		skb->ip_summed = CHECKSUM_UNNECESSARY;
+
+	if (csum_flag & QEDE_TUNN_CSUM_UNNECESSARY)
+		skb->csum_level = 1;
 }
 
 static inline void qede_skb_receive(struct qede_dev *edev,
@@ -1142,13 +1173,47 @@ err:
 	tpa_info->skb = NULL;
 }
 
-static u8 qede_check_csum(u16 flag)
+static bool qede_tunn_exist(u16 flag)
+{
+	return !!(flag & (PARSING_AND_ERR_FLAGS_TUNNELEXIST_MASK <<
+			  PARSING_AND_ERR_FLAGS_TUNNELEXIST_SHIFT));
+}
+
+static u8 qede_check_tunn_csum(u16 flag)
+{
+	u16 csum_flag = 0;
+	u8 tcsum = 0;
+
+	if (flag & (PARSING_AND_ERR_FLAGS_TUNNELL4CHKSMWASCALCULATED_MASK <<
+		    PARSING_AND_ERR_FLAGS_TUNNELL4CHKSMWASCALCULATED_SHIFT))
+		csum_flag |= PARSING_AND_ERR_FLAGS_TUNNELL4CHKSMERROR_MASK <<
+			     PARSING_AND_ERR_FLAGS_TUNNELL4CHKSMERROR_SHIFT;
+
+	if (flag & (PARSING_AND_ERR_FLAGS_L4CHKSMWASCALCULATED_MASK <<
+		    PARSING_AND_ERR_FLAGS_L4CHKSMWASCALCULATED_SHIFT)) {
+		csum_flag |= PARSING_AND_ERR_FLAGS_L4CHKSMERROR_MASK <<
+			     PARSING_AND_ERR_FLAGS_L4CHKSMERROR_SHIFT;
+		tcsum = QEDE_TUNN_CSUM_UNNECESSARY;
+	}
+
+	csum_flag |= PARSING_AND_ERR_FLAGS_TUNNELIPHDRERROR_MASK <<
+		     PARSING_AND_ERR_FLAGS_TUNNELIPHDRERROR_SHIFT |
+		     PARSING_AND_ERR_FLAGS_IPHDRERROR_MASK <<
+		     PARSING_AND_ERR_FLAGS_IPHDRERROR_SHIFT;
+
+	if (csum_flag & flag)
+		return QEDE_CSUM_ERROR;
+
+	return QEDE_CSUM_UNNECESSARY | tcsum;
+}
+
+static u8 qede_check_notunn_csum(u16 flag)
 {
 	u16 csum_flag = 0;
 	u8 csum = 0;
 
-	if ((PARSING_AND_ERR_FLAGS_L4CHKSMWASCALCULATED_MASK <<
-	     PARSING_AND_ERR_FLAGS_L4CHKSMWASCALCULATED_SHIFT) & flag) {
+	if (flag & (PARSING_AND_ERR_FLAGS_L4CHKSMWASCALCULATED_MASK <<
+		    PARSING_AND_ERR_FLAGS_L4CHKSMWASCALCULATED_SHIFT)) {
 		csum_flag |= PARSING_AND_ERR_FLAGS_L4CHKSMERROR_MASK <<
 			     PARSING_AND_ERR_FLAGS_L4CHKSMERROR_SHIFT;
 		csum = QEDE_CSUM_UNNECESSARY;
@@ -1163,6 +1228,14 @@ static u8 qede_check_csum(u16 flag)
 	return csum;
 }
 
+static u8 qede_check_csum(u16 flag)
+{
+	if (!qede_tunn_exist(flag))
+		return qede_check_notunn_csum(flag);
+	else
+		return qede_check_tunn_csum(flag);
+}
+
 static int qede_rx_int(struct qede_fastpath *fp, int budget)
 {
 	struct qede_dev *edev = fp->edev;
@@ -1985,6 +2058,14 @@ static void qede_init_ndev(struct qede_dev *edev)
 		      NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM |
 		      NETIF_F_TSO | NETIF_F_TSO6;
 
+	/* Encap features*/
+	hw_features |= NETIF_F_GSO_GRE | NETIF_F_GSO_UDP_TUNNEL |
+		       NETIF_F_TSO_ECN;
+	ndev->hw_enc_features = NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM |
+				NETIF_F_SG | NETIF_F_TSO | NETIF_F_TSO_ECN |
+				NETIF_F_TSO6 | NETIF_F_GSO_GRE |
+				NETIF_F_GSO_UDP_TUNNEL | NETIF_F_RXCSUM;
+
 	ndev->vlan_features = hw_features | NETIF_F_RXHASH | NETIF_F_RXCSUM |
 			      NETIF_F_HIGHDMA;
 	ndev->features = hw_features | NETIF_F_RXHASH | NETIF_F_RXCSUM |
-- 
2.7.2

^ permalink raw reply related

* [PATCH net-next 2/6] qed: Add infrastructure support for tunneling
From: Manish Chopra @ 2016-04-09 13:17 UTC (permalink / raw)
  To: davem; +Cc: netdev, Ariel.Elior, Yuval.Mintz
In-Reply-To: <1460207825-3622-1-git-send-email-manish.chopra@qlogic.com>

This patch adds various structure/APIs needed to configure/enable different
tunnel [VXLAN/GRE/GENEVE] parameters on the adapter.

Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
---
 drivers/net/ethernet/qlogic/qed/qed.h              |  46 ++++
 drivers/net/ethernet/qlogic/qed/qed_dev.c          |   6 +-
 drivers/net/ethernet/qlogic/qed/qed_dev_api.h      |   2 +
 drivers/net/ethernet/qlogic/qed/qed_hsi.h          |  51 ++++-
 .../net/ethernet/qlogic/qed/qed_init_fw_funcs.c    | 127 +++++++++++
 drivers/net/ethernet/qlogic/qed/qed_l2.c           |  31 +++
 drivers/net/ethernet/qlogic/qed/qed_main.c         |   2 +-
 drivers/net/ethernet/qlogic/qed/qed_reg_addr.h     |  31 +++
 drivers/net/ethernet/qlogic/qed/qed_sp.h           |   7 +
 drivers/net/ethernet/qlogic/qed/qed_sp_commands.c  | 254 +++++++++++++++++++++
 include/linux/qed/qed_eth_if.h                     |  10 +
 11 files changed, 563 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qed/qed.h b/drivers/net/ethernet/qlogic/qed/qed.h
index fcb8e9b..d4da74b 100644
--- a/drivers/net/ethernet/qlogic/qed/qed.h
+++ b/drivers/net/ethernet/qlogic/qed/qed.h
@@ -74,6 +74,51 @@ struct qed_rt_data {
 	bool	*b_valid;
 };
 
+enum qed_tunn_mode {
+	QED_MODE_L2GENEVE_TUNN,
+	QED_MODE_IPGENEVE_TUNN,
+	QED_MODE_L2GRE_TUNN,
+	QED_MODE_IPGRE_TUNN,
+	QED_MODE_VXLAN_TUNN,
+};
+
+enum qed_tunn_clss {
+	QED_TUNN_CLSS_MAC_VLAN,
+	QED_TUNN_CLSS_MAC_VNI,
+	QED_TUNN_CLSS_INNER_MAC_VLAN,
+	QED_TUNN_CLSS_INNER_MAC_VNI,
+	MAX_QED_TUNN_CLSS,
+};
+
+struct qed_tunn_start_params {
+	unsigned long	tunn_mode;
+	u16		vxlan_udp_port;
+	u16		geneve_udp_port;
+	u8		update_vxlan_udp_port;
+	u8		update_geneve_udp_port;
+	u8		tunn_clss_vxlan;
+	u8		tunn_clss_l2geneve;
+	u8		tunn_clss_ipgeneve;
+	u8		tunn_clss_l2gre;
+	u8		tunn_clss_ipgre;
+};
+
+struct qed_tunn_update_params {
+	unsigned long	tunn_mode_update_mask;
+	unsigned long	tunn_mode;
+	u16		vxlan_udp_port;
+	u16		geneve_udp_port;
+	u8		update_rx_pf_clss;
+	u8		update_tx_pf_clss;
+	u8		update_vxlan_udp_port;
+	u8		update_geneve_udp_port;
+	u8		tunn_clss_vxlan;
+	u8		tunn_clss_l2geneve;
+	u8		tunn_clss_ipgeneve;
+	u8		tunn_clss_l2gre;
+	u8		tunn_clss_ipgre;
+};
+
 /* The PCI personality is not quite synonymous to protocol ID:
  * 1. All personalities need CORE connections
  * 2. The Ethernet personality may support also the RoCE protocol
@@ -430,6 +475,7 @@ struct qed_dev {
 	u8				num_hwfns;
 	struct qed_hwfn			hwfns[MAX_HWFNS_PER_DEVICE];
 
+	unsigned long			tunn_mode;
 	u32				drv_type;
 
 	struct qed_eth_stats		*reset_stats;
diff --git a/drivers/net/ethernet/qlogic/qed/qed_dev.c b/drivers/net/ethernet/qlogic/qed/qed_dev.c
index b7d100f..bdae5a5 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_dev.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_dev.c
@@ -558,6 +558,7 @@ static int qed_hw_init_port(struct qed_hwfn *p_hwfn,
 
 static int qed_hw_init_pf(struct qed_hwfn *p_hwfn,
 			  struct qed_ptt *p_ptt,
+			  struct qed_tunn_start_params *p_tunn,
 			  int hw_mode,
 			  bool b_hw_start,
 			  enum qed_int_mode int_mode,
@@ -625,7 +626,7 @@ static int qed_hw_init_pf(struct qed_hwfn *p_hwfn,
 		qed_int_igu_enable(p_hwfn, p_ptt, int_mode);
 
 		/* send function start command */
-		rc = qed_sp_pf_start(p_hwfn, p_hwfn->cdev->mf_mode);
+		rc = qed_sp_pf_start(p_hwfn, p_tunn, p_hwfn->cdev->mf_mode);
 		if (rc)
 			DP_NOTICE(p_hwfn, "Function start ramrod failed\n");
 	}
@@ -672,6 +673,7 @@ static void qed_reset_mb_shadow(struct qed_hwfn *p_hwfn,
 }
 
 int qed_hw_init(struct qed_dev *cdev,
+		struct qed_tunn_start_params *p_tunn,
 		bool b_hw_start,
 		enum qed_int_mode int_mode,
 		bool allow_npar_tx_switch,
@@ -724,7 +726,7 @@ int qed_hw_init(struct qed_dev *cdev,
 		/* Fall into */
 		case FW_MSG_CODE_DRV_LOAD_FUNCTION:
 			rc = qed_hw_init_pf(p_hwfn, p_hwfn->p_main_ptt,
-					    p_hwfn->hw_info.hw_mode,
+					    p_tunn, p_hwfn->hw_info.hw_mode,
 					    b_hw_start, int_mode,
 					    allow_npar_tx_switch);
 			break;
diff --git a/drivers/net/ethernet/qlogic/qed/qed_dev_api.h b/drivers/net/ethernet/qlogic/qed/qed_dev_api.h
index d6c7ddf..6aac3f8 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_dev_api.h
+++ b/drivers/net/ethernet/qlogic/qed/qed_dev_api.h
@@ -62,6 +62,7 @@ void qed_resc_setup(struct qed_dev *cdev);
  * @brief qed_hw_init -
  *
  * @param cdev
+ * @param p_tunn
  * @param b_hw_start
  * @param int_mode - interrupt mode [msix, inta, etc.] to use.
  * @param allow_npar_tx_switch - npar tx switching to be used
@@ -72,6 +73,7 @@ void qed_resc_setup(struct qed_dev *cdev);
  * @return int
  */
 int qed_hw_init(struct qed_dev *cdev,
+		struct qed_tunn_start_params *p_tunn,
 		bool b_hw_start,
 		enum qed_int_mode int_mode,
 		bool allow_npar_tx_switch,
diff --git a/drivers/net/ethernet/qlogic/qed/qed_hsi.h b/drivers/net/ethernet/qlogic/qed/qed_hsi.h
index a368f5e..15e02ab 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_hsi.h
+++ b/drivers/net/ethernet/qlogic/qed/qed_hsi.h
@@ -46,7 +46,7 @@ enum common_ramrod_cmd_id {
 	COMMON_RAMROD_PF_STOP /* PF Function Stop Ramrod */,
 	COMMON_RAMROD_RESERVED,
 	COMMON_RAMROD_RESERVED2,
-	COMMON_RAMROD_RESERVED3,
+	COMMON_RAMROD_PF_UPDATE,
 	COMMON_RAMROD_EMPTY,
 	MAX_COMMON_RAMROD_CMD_ID
 };
@@ -626,6 +626,42 @@ struct pf_start_ramrod_data {
 	u8				reserved0[4];
 };
 
+/* tunnel configuration */
+struct pf_update_tunnel_config {
+	u8	update_rx_pf_clss;
+	u8	update_tx_pf_clss;
+	u8	set_vxlan_udp_port_flg;
+	u8	set_geneve_udp_port_flg;
+	u8	tx_enable_vxlan;
+	u8	tx_enable_l2geneve;
+	u8	tx_enable_ipgeneve;
+	u8	tx_enable_l2gre;
+	u8	tx_enable_ipgre;
+	u8	tunnel_clss_vxlan;
+	u8	tunnel_clss_l2geneve;
+	u8	tunnel_clss_ipgeneve;
+	u8	tunnel_clss_l2gre;
+	u8	tunnel_clss_ipgre;
+	__le16	vxlan_udp_port;
+	__le16	geneve_udp_port;
+	__le16	reserved[3];
+};
+
+struct pf_update_ramrod_data {
+	u32				reserved[2];
+	u32				reserved_1[6];
+	struct pf_update_tunnel_config	tunnel_config;
+};
+
+/* Tunnel classification scheme */
+enum tunnel_clss {
+	TUNNEL_CLSS_MAC_VLAN = 0,
+	TUNNEL_CLSS_MAC_VNI,
+	TUNNEL_CLSS_INNER_MAC_VLAN,
+	TUNNEL_CLSS_INNER_MAC_VNI,
+	MAX_TUNNEL_CLSS
+};
+
 enum ports_mode {
 	ENGX2_PORTX1 /* 2 engines x 1 port */,
 	ENGX2_PORTX2 /* 2 engines x 2 ports */,
@@ -1603,6 +1639,19 @@ bool qed_send_qm_stop_cmd(struct qed_hwfn	*p_hwfn,
 			  u16			start_pq,
 			  u16			num_pqs);
 
+void qed_set_vxlan_dest_port(struct qed_hwfn *p_hwfn,
+			     struct qed_ptt  *p_ptt, u16 dest_port);
+void qed_set_vxlan_enable(struct qed_hwfn *p_hwfn,
+			  struct qed_ptt *p_ptt, bool vxlan_enable);
+void qed_set_gre_enable(struct qed_hwfn *p_hwfn,
+			struct qed_ptt  *p_ptt, bool eth_gre_enable,
+			bool ip_gre_enable);
+void qed_set_geneve_dest_port(struct qed_hwfn *p_hwfn,
+			      struct qed_ptt *p_ptt, u16 dest_port);
+void qed_set_geneve_enable(struct qed_hwfn *p_hwfn,
+			   struct qed_ptt *p_ptt, bool eth_geneve_enable,
+			   bool ip_geneve_enable);
+
 /* Ystorm flow control mode. Use enum fw_flow_ctrl_mode */
 #define YSTORM_FLOW_CONTROL_MODE_OFFSET  (IRO[0].base)
 #define YSTORM_FLOW_CONTROL_MODE_SIZE    (IRO[0].size)
diff --git a/drivers/net/ethernet/qlogic/qed/qed_init_fw_funcs.c b/drivers/net/ethernet/qlogic/qed/qed_init_fw_funcs.c
index f55ebdc..1dd5324 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_init_fw_funcs.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_init_fw_funcs.c
@@ -788,3 +788,130 @@ bool qed_send_qm_stop_cmd(struct qed_hwfn *p_hwfn,
 
 	return true;
 }
+
+static void
+qed_set_tunnel_type_enable_bit(unsigned long *var, int bit, bool enable)
+{
+	if (enable)
+		set_bit(bit, var);
+	else
+		clear_bit(bit, var);
+}
+
+#define PRS_ETH_TUNN_FIC_FORMAT	-188897008
+
+void qed_set_vxlan_dest_port(struct qed_hwfn *p_hwfn,
+			     struct qed_ptt *p_ptt,
+			     u16 dest_port)
+{
+	qed_wr(p_hwfn, p_ptt, PRS_REG_VXLAN_PORT, dest_port);
+	qed_wr(p_hwfn, p_ptt, NIG_REG_VXLAN_PORT, dest_port);
+	qed_wr(p_hwfn, p_ptt, PBF_REG_VXLAN_PORT, dest_port);
+}
+
+void qed_set_vxlan_enable(struct qed_hwfn *p_hwfn,
+			  struct qed_ptt *p_ptt,
+			  bool vxlan_enable)
+{
+	unsigned long reg_val = 0;
+	u8 shift;
+
+	reg_val = qed_rd(p_hwfn, p_ptt, PRS_REG_ENCAPSULATION_TYPE_EN);
+	shift = PRS_REG_ENCAPSULATION_TYPE_EN_VXLAN_ENABLE_SHIFT;
+	qed_set_tunnel_type_enable_bit(&reg_val, shift, vxlan_enable);
+
+	qed_wr(p_hwfn, p_ptt, PRS_REG_ENCAPSULATION_TYPE_EN, reg_val);
+
+	if (reg_val)
+		qed_wr(p_hwfn, p_ptt, PRS_REG_OUTPUT_FORMAT_4_0,
+		       PRS_ETH_TUNN_FIC_FORMAT);
+
+	reg_val = qed_rd(p_hwfn, p_ptt, NIG_REG_ENC_TYPE_ENABLE);
+	shift = NIG_REG_ENC_TYPE_ENABLE_VXLAN_ENABLE_SHIFT;
+	qed_set_tunnel_type_enable_bit(&reg_val, shift, vxlan_enable);
+
+	qed_wr(p_hwfn, p_ptt, NIG_REG_ENC_TYPE_ENABLE, reg_val);
+
+	qed_wr(p_hwfn, p_ptt, DORQ_REG_L2_EDPM_TUNNEL_VXLAN_EN,
+	       vxlan_enable ? 1 : 0);
+}
+
+void qed_set_gre_enable(struct qed_hwfn *p_hwfn, struct qed_ptt *p_ptt,
+			bool eth_gre_enable, bool ip_gre_enable)
+{
+	unsigned long reg_val = 0;
+	u8 shift;
+
+	reg_val = qed_rd(p_hwfn, p_ptt, PRS_REG_ENCAPSULATION_TYPE_EN);
+	shift = PRS_REG_ENCAPSULATION_TYPE_EN_ETH_OVER_GRE_ENABLE_SHIFT;
+	qed_set_tunnel_type_enable_bit(&reg_val, shift, eth_gre_enable);
+
+	shift = PRS_REG_ENCAPSULATION_TYPE_EN_IP_OVER_GRE_ENABLE_SHIFT;
+	qed_set_tunnel_type_enable_bit(&reg_val, shift, ip_gre_enable);
+	qed_wr(p_hwfn, p_ptt, PRS_REG_ENCAPSULATION_TYPE_EN, reg_val);
+	if (reg_val)
+		qed_wr(p_hwfn, p_ptt, PRS_REG_OUTPUT_FORMAT_4_0,
+		       PRS_ETH_TUNN_FIC_FORMAT);
+
+	reg_val = qed_rd(p_hwfn, p_ptt, NIG_REG_ENC_TYPE_ENABLE);
+	shift = NIG_REG_ENC_TYPE_ENABLE_ETH_OVER_GRE_ENABLE_SHIFT;
+	qed_set_tunnel_type_enable_bit(&reg_val, shift, eth_gre_enable);
+
+	shift = NIG_REG_ENC_TYPE_ENABLE_IP_OVER_GRE_ENABLE_SHIFT;
+	qed_set_tunnel_type_enable_bit(&reg_val, shift, ip_gre_enable);
+	qed_wr(p_hwfn, p_ptt, NIG_REG_ENC_TYPE_ENABLE, reg_val);
+
+	qed_wr(p_hwfn, p_ptt, DORQ_REG_L2_EDPM_TUNNEL_GRE_ETH_EN,
+	       eth_gre_enable ? 1 : 0);
+	qed_wr(p_hwfn, p_ptt, DORQ_REG_L2_EDPM_TUNNEL_GRE_IP_EN,
+	       ip_gre_enable ? 1 : 0);
+}
+
+void qed_set_geneve_dest_port(struct qed_hwfn *p_hwfn,
+			      struct qed_ptt *p_ptt,
+			      u16 dest_port)
+{
+	qed_wr(p_hwfn, p_ptt, PRS_REG_NGE_PORT, dest_port);
+	qed_wr(p_hwfn, p_ptt, NIG_REG_NGE_PORT, dest_port);
+	qed_wr(p_hwfn, p_ptt, PBF_REG_NGE_PORT, dest_port);
+}
+
+void qed_set_geneve_enable(struct qed_hwfn *p_hwfn,
+			   struct qed_ptt *p_ptt,
+			   bool eth_geneve_enable,
+			   bool ip_geneve_enable)
+{
+	unsigned long reg_val = 0;
+	u8 shift;
+
+	reg_val = qed_rd(p_hwfn, p_ptt, PRS_REG_ENCAPSULATION_TYPE_EN);
+	shift = PRS_REG_ENCAPSULATION_TYPE_EN_ETH_OVER_GENEVE_ENABLE_SHIFT;
+	qed_set_tunnel_type_enable_bit(&reg_val, shift, eth_geneve_enable);
+
+	shift = PRS_REG_ENCAPSULATION_TYPE_EN_IP_OVER_GENEVE_ENABLE_SHIFT;
+	qed_set_tunnel_type_enable_bit(&reg_val, shift, ip_geneve_enable);
+
+	qed_wr(p_hwfn, p_ptt, PRS_REG_ENCAPSULATION_TYPE_EN, reg_val);
+	if (reg_val)
+		qed_wr(p_hwfn, p_ptt, PRS_REG_OUTPUT_FORMAT_4_0,
+		       PRS_ETH_TUNN_FIC_FORMAT);
+
+	qed_wr(p_hwfn, p_ptt, NIG_REG_NGE_ETH_ENABLE,
+	       eth_geneve_enable ? 1 : 0);
+	qed_wr(p_hwfn, p_ptt, NIG_REG_NGE_IP_ENABLE, ip_geneve_enable ? 1 : 0);
+
+	/* comp ver */
+	reg_val = (ip_geneve_enable || eth_geneve_enable) ? 1 : 0;
+	qed_wr(p_hwfn, p_ptt, NIG_REG_NGE_COMP_VER, reg_val);
+	qed_wr(p_hwfn, p_ptt, PBF_REG_NGE_COMP_VER, reg_val);
+	qed_wr(p_hwfn, p_ptt, PRS_REG_NGE_COMP_VER, reg_val);
+
+	/* EDPM with geneve tunnel not supported in BB_B0 */
+	if (QED_IS_BB_B0(p_hwfn->cdev))
+		return;
+
+	qed_wr(p_hwfn, p_ptt, DORQ_REG_L2_EDPM_TUNNEL_NGE_ETH_EN,
+	       eth_geneve_enable ? 1 : 0);
+	qed_wr(p_hwfn, p_ptt, DORQ_REG_L2_EDPM_TUNNEL_NGE_IP_EN,
+	       ip_geneve_enable ? 1 : 0);
+}
diff --git a/drivers/net/ethernet/qlogic/qed/qed_l2.c b/drivers/net/ethernet/qlogic/qed/qed_l2.c
index 3f35c6c..68e45eb 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_l2.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_l2.c
@@ -1899,6 +1899,36 @@ static int qed_stop_txq(struct qed_dev *cdev,
 	return 0;
 }
 
+static int qed_tunn_configure(struct qed_dev *cdev,
+			      struct qed_tunn_params *tunn_params)
+{
+	struct qed_tunn_update_params tunn_info;
+	int i, rc;
+
+	memset(&tunn_info, 0, sizeof(tunn_info));
+	if (tunn_params->update_vxlan_port == 1) {
+		tunn_info.update_vxlan_udp_port = 1;
+		tunn_info.vxlan_udp_port = tunn_params->vxlan_port;
+	}
+
+	if (tunn_params->update_geneve_port == 1) {
+		tunn_info.update_geneve_udp_port = 1;
+		tunn_info.geneve_udp_port = tunn_params->geneve_port;
+	}
+
+	for_each_hwfn(cdev, i) {
+		struct qed_hwfn *hwfn = &cdev->hwfns[i];
+
+		rc = qed_sp_pf_update_tunn_cfg(hwfn, &tunn_info,
+					       QED_SPQ_MODE_EBLOCK, NULL);
+
+		if (rc)
+			return rc;
+	}
+
+	return 0;
+}
+
 static int qed_configure_filter_rx_mode(struct qed_dev *cdev,
 					enum qed_filter_rx_mode_type type)
 {
@@ -2041,6 +2071,7 @@ static const struct qed_eth_ops qed_eth_ops_pass = {
 	.fastpath_stop = &qed_fastpath_stop,
 	.eth_cqe_completion = &qed_fp_cqe_completion,
 	.get_vport_stats = &qed_get_vport_stats,
+	.tunn_config = &qed_tunn_configure,
 };
 
 const struct qed_eth_ops *qed_get_eth_ops(u32 version)
diff --git a/drivers/net/ethernet/qlogic/qed/qed_main.c b/drivers/net/ethernet/qlogic/qed/qed_main.c
index 26d40db..01c0207 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_main.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_main.c
@@ -776,7 +776,7 @@ static int qed_slowpath_start(struct qed_dev *cdev,
 	/* Start the slowpath */
 	data = cdev->firmware->data;
 
-	rc = qed_hw_init(cdev, true, cdev->int_params.out.int_mode,
+	rc = qed_hw_init(cdev, NULL, true, cdev->int_params.out.int_mode,
 			 true, data);
 	if (rc)
 		goto err2;
diff --git a/drivers/net/ethernet/qlogic/qed/qed_reg_addr.h b/drivers/net/ethernet/qlogic/qed/qed_reg_addr.h
index c15b162..55451a4 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_reg_addr.h
+++ b/drivers/net/ethernet/qlogic/qed/qed_reg_addr.h
@@ -427,4 +427,35 @@
 	0x2aae60UL
 #define PGLUE_B_REG_PF_BAR1_SIZE \
 	0x2aae64UL
+#define PRS_REG_ENCAPSULATION_TYPE_EN	0x1f0730UL
+#define PRS_REG_GRE_PROTOCOL		0x1f0734UL
+#define PRS_REG_VXLAN_PORT		0x1f0738UL
+#define PRS_REG_OUTPUT_FORMAT_4_0	0x1f099cUL
+#define NIG_REG_ENC_TYPE_ENABLE		0x501058UL
+
+#define NIG_REG_ENC_TYPE_ENABLE_ETH_OVER_GRE_ENABLE		(0x1 << 0)
+#define NIG_REG_ENC_TYPE_ENABLE_ETH_OVER_GRE_ENABLE_SHIFT	0
+#define NIG_REG_ENC_TYPE_ENABLE_IP_OVER_GRE_ENABLE		(0x1 << 1)
+#define NIG_REG_ENC_TYPE_ENABLE_IP_OVER_GRE_ENABLE_SHIFT	1
+#define NIG_REG_ENC_TYPE_ENABLE_VXLAN_ENABLE			(0x1 << 2)
+#define NIG_REG_ENC_TYPE_ENABLE_VXLAN_ENABLE_SHIFT		2
+
+#define NIG_REG_VXLAN_PORT		0x50105cUL
+#define PBF_REG_VXLAN_PORT		0xd80518UL
+#define PBF_REG_NGE_PORT		0xd8051cUL
+#define PRS_REG_NGE_PORT		0x1f086cUL
+#define NIG_REG_NGE_PORT		0x508b38UL
+
+#define DORQ_REG_L2_EDPM_TUNNEL_GRE_ETH_EN	0x10090cUL
+#define DORQ_REG_L2_EDPM_TUNNEL_GRE_IP_EN	0x100910UL
+#define DORQ_REG_L2_EDPM_TUNNEL_VXLAN_EN	0x100914UL
+#define DORQ_REG_L2_EDPM_TUNNEL_NGE_IP_EN	0x10092cUL
+#define DORQ_REG_L2_EDPM_TUNNEL_NGE_ETH_EN	0x100930UL
+
+#define NIG_REG_NGE_IP_ENABLE			0x508b28UL
+#define NIG_REG_NGE_ETH_ENABLE			0x508b2cUL
+#define NIG_REG_NGE_COMP_VER			0x508b30UL
+#define PBF_REG_NGE_COMP_VER			0xd80524UL
+#define PRS_REG_NGE_COMP_VER			0x1f0878UL
+
 #endif
diff --git a/drivers/net/ethernet/qlogic/qed/qed_sp.h b/drivers/net/ethernet/qlogic/qed/qed_sp.h
index d39f914..4b91cb3 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_sp.h
+++ b/drivers/net/ethernet/qlogic/qed/qed_sp.h
@@ -52,6 +52,7 @@ int qed_eth_cqe_completion(struct qed_hwfn *p_hwfn,
 
 union ramrod_data {
 	struct pf_start_ramrod_data pf_start;
+	struct pf_update_ramrod_data pf_update;
 	struct rx_queue_start_ramrod_data rx_queue_start;
 	struct rx_queue_update_ramrod_data rx_queue_update;
 	struct rx_queue_stop_ramrod_data rx_queue_stop;
@@ -338,12 +339,14 @@ int qed_sp_init_request(struct qed_hwfn *p_hwfn,
  * to the internal RAM of the UStorm by the Function Start Ramrod.
  *
  * @param p_hwfn
+ * @param p_tunn
  * @param mode
  *
  * @return int
  */
 
 int qed_sp_pf_start(struct qed_hwfn *p_hwfn,
+		    struct qed_tunn_start_params *p_tunn,
 		    enum qed_mf_mode mode);
 
 /**
@@ -362,4 +365,8 @@ int qed_sp_pf_start(struct qed_hwfn *p_hwfn,
 
 int qed_sp_pf_stop(struct qed_hwfn *p_hwfn);
 
+int qed_sp_pf_update_tunn_cfg(struct qed_hwfn *p_hwfn,
+			      struct qed_tunn_update_params *p_tunn,
+			      enum spq_mode comp_mode,
+			      struct qed_spq_comp_cb *p_comp_data);
 #endif
diff --git a/drivers/net/ethernet/qlogic/qed/qed_sp_commands.c b/drivers/net/ethernet/qlogic/qed/qed_sp_commands.c
index 1c06c37..306da70 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_sp_commands.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_sp_commands.c
@@ -87,7 +87,217 @@ int qed_sp_init_request(struct qed_hwfn *p_hwfn,
 	return 0;
 }
 
+static enum tunnel_clss qed_tunn_get_clss_type(u8 type)
+{
+	switch (type) {
+	case QED_TUNN_CLSS_MAC_VLAN:
+		return TUNNEL_CLSS_MAC_VLAN;
+	case QED_TUNN_CLSS_MAC_VNI:
+		return TUNNEL_CLSS_MAC_VNI;
+	case QED_TUNN_CLSS_INNER_MAC_VLAN:
+		return TUNNEL_CLSS_INNER_MAC_VLAN;
+	case QED_TUNN_CLSS_INNER_MAC_VNI:
+		return TUNNEL_CLSS_INNER_MAC_VNI;
+	default:
+		return TUNNEL_CLSS_MAC_VLAN;
+	}
+}
+
+static void
+qed_tunn_set_pf_fix_tunn_mode(struct qed_hwfn *p_hwfn,
+			      struct qed_tunn_update_params *p_src,
+			      struct pf_update_tunnel_config *p_tunn_cfg)
+{
+	unsigned long cached_tunn_mode = p_hwfn->cdev->tunn_mode;
+	unsigned long update_mask = p_src->tunn_mode_update_mask;
+	unsigned long tunn_mode = p_src->tunn_mode;
+	unsigned long new_tunn_mode = 0;
+
+	if (test_bit(QED_MODE_L2GRE_TUNN, &update_mask)) {
+		if (test_bit(QED_MODE_L2GRE_TUNN, &tunn_mode))
+			__set_bit(QED_MODE_L2GRE_TUNN, &new_tunn_mode);
+	} else {
+		if (test_bit(QED_MODE_L2GRE_TUNN, &cached_tunn_mode))
+			__set_bit(QED_MODE_L2GRE_TUNN, &new_tunn_mode);
+	}
+
+	if (test_bit(QED_MODE_IPGRE_TUNN, &update_mask)) {
+		if (test_bit(QED_MODE_IPGRE_TUNN, &tunn_mode))
+			__set_bit(QED_MODE_IPGRE_TUNN, &new_tunn_mode);
+	} else {
+		if (test_bit(QED_MODE_IPGRE_TUNN, &cached_tunn_mode))
+			__set_bit(QED_MODE_IPGRE_TUNN, &new_tunn_mode);
+	}
+
+	if (test_bit(QED_MODE_VXLAN_TUNN, &update_mask)) {
+		if (test_bit(QED_MODE_VXLAN_TUNN, &tunn_mode))
+			__set_bit(QED_MODE_VXLAN_TUNN, &new_tunn_mode);
+	} else {
+		if (test_bit(QED_MODE_VXLAN_TUNN, &cached_tunn_mode))
+			__set_bit(QED_MODE_VXLAN_TUNN, &new_tunn_mode);
+	}
+
+	if (p_src->update_geneve_udp_port) {
+		p_tunn_cfg->set_geneve_udp_port_flg = 1;
+		p_tunn_cfg->geneve_udp_port =
+				cpu_to_le16(p_src->geneve_udp_port);
+	}
+
+	if (test_bit(QED_MODE_L2GENEVE_TUNN, &update_mask)) {
+		if (test_bit(QED_MODE_L2GENEVE_TUNN, &tunn_mode))
+			__set_bit(QED_MODE_L2GENEVE_TUNN, &new_tunn_mode);
+	} else {
+		if (test_bit(QED_MODE_L2GENEVE_TUNN, &cached_tunn_mode))
+			__set_bit(QED_MODE_L2GENEVE_TUNN, &new_tunn_mode);
+	}
+
+	if (test_bit(QED_MODE_IPGENEVE_TUNN, &update_mask)) {
+		if (test_bit(QED_MODE_IPGENEVE_TUNN, &tunn_mode))
+			__set_bit(QED_MODE_IPGENEVE_TUNN, &new_tunn_mode);
+	} else {
+		if (test_bit(QED_MODE_IPGENEVE_TUNN, &cached_tunn_mode))
+			__set_bit(QED_MODE_IPGENEVE_TUNN, &new_tunn_mode);
+	}
+
+	p_src->tunn_mode = new_tunn_mode;
+}
+
+static void
+qed_tunn_set_pf_update_params(struct qed_hwfn *p_hwfn,
+			      struct qed_tunn_update_params *p_src,
+			      struct pf_update_tunnel_config *p_tunn_cfg)
+{
+	unsigned long tunn_mode = p_src->tunn_mode;
+	enum tunnel_clss type;
+
+	qed_tunn_set_pf_fix_tunn_mode(p_hwfn, p_src, p_tunn_cfg);
+	p_tunn_cfg->update_rx_pf_clss = p_src->update_rx_pf_clss;
+	p_tunn_cfg->update_tx_pf_clss = p_src->update_tx_pf_clss;
+
+	type = qed_tunn_get_clss_type(p_src->tunn_clss_vxlan);
+	p_tunn_cfg->tunnel_clss_vxlan  = type;
+
+	type = qed_tunn_get_clss_type(p_src->tunn_clss_l2gre);
+	p_tunn_cfg->tunnel_clss_l2gre = type;
+
+	type = qed_tunn_get_clss_type(p_src->tunn_clss_ipgre);
+	p_tunn_cfg->tunnel_clss_ipgre = type;
+
+	if (p_src->update_vxlan_udp_port) {
+		p_tunn_cfg->set_vxlan_udp_port_flg = 1;
+		p_tunn_cfg->vxlan_udp_port = cpu_to_le16(p_src->vxlan_udp_port);
+	}
+
+	if (test_bit(QED_MODE_L2GRE_TUNN, &tunn_mode))
+		p_tunn_cfg->tx_enable_l2gre = 1;
+
+	if (test_bit(QED_MODE_IPGRE_TUNN, &tunn_mode))
+		p_tunn_cfg->tx_enable_ipgre = 1;
+
+	if (test_bit(QED_MODE_VXLAN_TUNN, &tunn_mode))
+		p_tunn_cfg->tx_enable_vxlan = 1;
+
+	if (p_src->update_geneve_udp_port) {
+		p_tunn_cfg->set_geneve_udp_port_flg = 1;
+		p_tunn_cfg->geneve_udp_port =
+				cpu_to_le16(p_src->geneve_udp_port);
+	}
+
+	if (test_bit(QED_MODE_L2GENEVE_TUNN, &tunn_mode))
+		p_tunn_cfg->tx_enable_l2geneve = 1;
+
+	if (test_bit(QED_MODE_IPGENEVE_TUNN, &tunn_mode))
+		p_tunn_cfg->tx_enable_ipgeneve = 1;
+
+	type = qed_tunn_get_clss_type(p_src->tunn_clss_l2geneve);
+	p_tunn_cfg->tunnel_clss_l2geneve = type;
+
+	type = qed_tunn_get_clss_type(p_src->tunn_clss_ipgeneve);
+	p_tunn_cfg->tunnel_clss_ipgeneve = type;
+}
+
+static void qed_set_hw_tunn_mode(struct qed_hwfn *p_hwfn,
+				 struct qed_ptt *p_ptt,
+				 unsigned long tunn_mode)
+{
+	u8 l2gre_enable = 0, ipgre_enable = 0, vxlan_enable = 0;
+	u8 l2geneve_enable = 0, ipgeneve_enable = 0;
+
+	if (test_bit(QED_MODE_L2GRE_TUNN, &tunn_mode))
+		l2gre_enable = 1;
+
+	if (test_bit(QED_MODE_IPGRE_TUNN, &tunn_mode))
+		ipgre_enable = 1;
+
+	if (test_bit(QED_MODE_VXLAN_TUNN, &tunn_mode))
+		vxlan_enable = 1;
+
+	qed_set_gre_enable(p_hwfn, p_ptt, l2gre_enable, ipgre_enable);
+	qed_set_vxlan_enable(p_hwfn, p_ptt, vxlan_enable);
+
+	if (test_bit(QED_MODE_L2GENEVE_TUNN, &tunn_mode))
+		l2geneve_enable = 1;
+
+	if (test_bit(QED_MODE_IPGENEVE_TUNN, &tunn_mode))
+		ipgeneve_enable = 1;
+
+	qed_set_geneve_enable(p_hwfn, p_ptt, l2geneve_enable,
+			      ipgeneve_enable);
+}
+
+static void
+qed_tunn_set_pf_start_params(struct qed_hwfn *p_hwfn,
+			     struct qed_tunn_start_params *p_src,
+			     struct pf_start_tunnel_config *p_tunn_cfg)
+{
+	unsigned long tunn_mode;
+	enum tunnel_clss type;
+
+	if (!p_src)
+		return;
+
+	tunn_mode = p_src->tunn_mode;
+	type = qed_tunn_get_clss_type(p_src->tunn_clss_vxlan);
+	p_tunn_cfg->tunnel_clss_vxlan = type;
+	type = qed_tunn_get_clss_type(p_src->tunn_clss_l2gre);
+	p_tunn_cfg->tunnel_clss_l2gre = type;
+	type = qed_tunn_get_clss_type(p_src->tunn_clss_ipgre);
+	p_tunn_cfg->tunnel_clss_ipgre = type;
+
+	if (p_src->update_vxlan_udp_port) {
+		p_tunn_cfg->set_vxlan_udp_port_flg = 1;
+		p_tunn_cfg->vxlan_udp_port = cpu_to_le16(p_src->vxlan_udp_port);
+	}
+
+	if (test_bit(QED_MODE_L2GRE_TUNN, &tunn_mode))
+		p_tunn_cfg->tx_enable_l2gre = 1;
+
+	if (test_bit(QED_MODE_IPGRE_TUNN, &tunn_mode))
+		p_tunn_cfg->tx_enable_ipgre = 1;
+
+	if (test_bit(QED_MODE_VXLAN_TUNN, &tunn_mode))
+		p_tunn_cfg->tx_enable_vxlan = 1;
+
+	if (p_src->update_geneve_udp_port) {
+		p_tunn_cfg->set_geneve_udp_port_flg = 1;
+		p_tunn_cfg->geneve_udp_port =
+				cpu_to_le16(p_src->geneve_udp_port);
+	}
+
+	if (test_bit(QED_MODE_L2GENEVE_TUNN, &tunn_mode))
+		p_tunn_cfg->tx_enable_l2geneve = 1;
+
+	if (test_bit(QED_MODE_IPGENEVE_TUNN, &tunn_mode))
+		p_tunn_cfg->tx_enable_ipgeneve = 1;
+
+	type = qed_tunn_get_clss_type(p_src->tunn_clss_l2geneve);
+	p_tunn_cfg->tunnel_clss_l2geneve = type;
+	type = qed_tunn_get_clss_type(p_src->tunn_clss_ipgeneve);
+	p_tunn_cfg->tunnel_clss_ipgeneve = type;
+}
+
 int qed_sp_pf_start(struct qed_hwfn *p_hwfn,
+		    struct qed_tunn_start_params *p_tunn,
 		    enum qed_mf_mode mode)
 {
 	struct pf_start_ramrod_data *p_ramrod = NULL;
@@ -143,6 +353,7 @@ int qed_sp_pf_start(struct qed_hwfn *p_hwfn,
 	DMA_REGPAIR_LE(p_ramrod->consolid_q_pbl_addr,
 		       p_hwfn->p_consq->chain.pbl.p_phys_table);
 
+	qed_tunn_set_pf_start_params(p_hwfn, NULL, NULL);
 	p_hwfn->hw_info.personality = PERSONALITY_ETH;
 
 	DP_VERBOSE(p_hwfn, QED_MSG_SPQ,
@@ -153,6 +364,49 @@ int qed_sp_pf_start(struct qed_hwfn *p_hwfn,
 	return qed_spq_post(p_hwfn, p_ent, NULL);
 }
 
+/* Set pf update ramrod command params */
+int qed_sp_pf_update_tunn_cfg(struct qed_hwfn *p_hwfn,
+			      struct qed_tunn_update_params *p_tunn,
+			      enum spq_mode comp_mode,
+			      struct qed_spq_comp_cb *p_comp_data)
+{
+	struct qed_spq_entry *p_ent = NULL;
+	struct qed_sp_init_data init_data;
+	int rc = -EINVAL;
+
+	/* Get SPQ entry */
+	memset(&init_data, 0, sizeof(init_data));
+	init_data.cid = qed_spq_get_cid(p_hwfn);
+	init_data.opaque_fid = p_hwfn->hw_info.opaque_fid;
+	init_data.comp_mode = comp_mode;
+	init_data.p_comp_data = p_comp_data;
+
+	rc = qed_sp_init_request(p_hwfn, &p_ent,
+				 COMMON_RAMROD_PF_UPDATE, PROTOCOLID_COMMON,
+				 &init_data);
+	if (rc)
+		return rc;
+
+	qed_tunn_set_pf_update_params(p_hwfn, p_tunn,
+				      &p_ent->ramrod.pf_update.tunnel_config);
+
+	rc = qed_spq_post(p_hwfn, p_ent, NULL);
+	if (rc)
+		return rc;
+
+	if (p_tunn->update_vxlan_udp_port)
+		qed_set_vxlan_dest_port(p_hwfn, p_hwfn->p_main_ptt,
+					p_tunn->vxlan_udp_port);
+	if (p_tunn->update_geneve_udp_port)
+		qed_set_geneve_dest_port(p_hwfn, p_hwfn->p_main_ptt,
+					 p_tunn->geneve_udp_port);
+
+	qed_set_hw_tunn_mode(p_hwfn, p_hwfn->p_main_ptt, p_tunn->tunn_mode);
+	p_hwfn->cdev->tunn_mode = p_tunn->tunn_mode;
+
+	return rc;
+}
+
 int qed_sp_pf_stop(struct qed_hwfn *p_hwfn)
 {
 	struct qed_spq_entry *p_ent = NULL;
diff --git a/include/linux/qed/qed_eth_if.h b/include/linux/qed/qed_eth_if.h
index e1d6983..713334d 100644
--- a/include/linux/qed/qed_eth_if.h
+++ b/include/linux/qed/qed_eth_if.h
@@ -111,6 +111,13 @@ struct qed_queue_start_common_params {
 	u16 sb_idx;
 };
 
+struct qed_tunn_params {
+	u16 vxlan_port;
+	u8 update_vxlan_port;
+	u16 geneve_port;
+	u8 update_geneve_port;
+};
+
 struct qed_eth_cb_ops {
 	struct qed_common_cb_ops common;
 };
@@ -165,6 +172,9 @@ struct qed_eth_ops {
 
 	void (*get_vport_stats)(struct qed_dev *cdev,
 				struct qed_eth_stats *stats);
+
+	int (*tunn_config)(struct qed_dev *cdev,
+			   struct qed_tunn_params *params);
 };
 
 const struct qed_eth_ops *qed_get_eth_ops(u32 version);
-- 
2.7.2

^ permalink raw reply related

* Re: [RFC PATCH v2 1/5] bpf: add PHYS_DEV prog type for early driver filter
From: Tom Herbert @ 2016-04-09 13:17 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Brenden Blanco, David S. Miller, Linux Kernel Network Developers,
	Alexei Starovoitov, Or Gerlitz, Daniel Borkmann, Eric Dumazet,
	Edward Cree, john fastabend, Thomas Graf, Johannes Berg,
	eranlinuxmellanox, Lorenzo Colitti
In-Reply-To: <20160409142759.25d8464a@redhat.com>

On Sat, Apr 9, 2016 at 9:27 AM, Jesper Dangaard Brouer
<brouer@redhat.com> wrote:
> On Sat, 9 Apr 2016 08:17:04 -0300
> Tom Herbert <tom@herbertland.com> wrote:
>
>> One other API issue is how to deal with encapsulation. In this case a
>> header may be prepended to the packet, I assume there are BPF helper
>> functions and we don't need to return a new length or start?
>
> That reminds me.  Do the BPF program need to know the head-room, then?
>
Right, that is basically my question. Can we have a helper function in
BPF that will prepend n bytes to the buffer? (I don't think we want
expose a notion of headroom).

Tom

> --
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   Author of http://www.iptv-analyzer.org
>   LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox