Netdev List


* Re: Soft lockup in inet_put_port on 4.6
From: Josef Bacik @ 2016-12-08 21:36 UTC
  To: Hannes Frederic Sowa; +Cc: Tom Herbert, Linux Kernel Network Developers
In-Reply-To: <1481231024.1911284.813071977.72AF4DEE@webmail.messagingengine.com>

On Thu, Dec 8, 2016 at 4:03 PM, Hannes Frederic Sowa 
<hannes@stressinduktion.org> wrote:
> Hello Tom,
> 
> On Wed, Dec 7, 2016, at 00:06, Tom Herbert wrote:
>> We are seeing a fair number of machines getting into softlockup in 4.6
>> kernel. As near as I can tell this is happening on the spinlock in
>> bind hash bucket. When inet_csk_get_port exits and does spin_unlock_bh
>> the TCP timer runs and we hit lockup in inet_put_port (presumably on
>> same lock). It seems like the lock isn't properly being unlocked
>> somewhere but I don't readily see it.
>>
>> Any ideas?
>
> Likewise we received reports that pretty much look the same on our
> heavily patched kernel. Did you have a chance to investigate or
> reproduce the problem?
>
> I am wondering if you would be able to take a complete thread stack dump
> if you can reproduce this to check if one of the user space processes is
> looping inside finding a free port?

We can reproduce the problem at will and are still trying to run it
down.  I'll try to find one of the boxes that dumped a core and get
a bt of everybody.  Thanks,

Josef


* Re: [net-next PATCH v5 1/6] net: virtio dynamically disable/enable LRO
From: Michael S. Tsirkin @ 2016-12-08 21:36 UTC
  To: John Fastabend
  Cc: daniel, shm, davem, tgraf, alexei.starovoitov, john.r.fastabend,
	netdev, brouer
In-Reply-To: <20161207201111.28121.4879.stgit@john-Precision-Tower-5810>

On Wed, Dec 07, 2016 at 12:11:11PM -0800, John Fastabend wrote:
> This adds support for dynamically setting the LRO feature flag. The
> message to control guest features in the backend uses the
> CTRL_GUEST_OFFLOADS msg type.
> 
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> ---
>  drivers/net/virtio_net.c |   40 +++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 39 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index a21d93a..a5c47b1 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1419,6 +1419,36 @@ static void virtnet_init_settings(struct net_device *dev)
>  	.set_settings = virtnet_set_settings,
>  };
>  
> +static int virtnet_set_features(struct net_device *netdev,
> +				netdev_features_t features)
> +{
> +	struct virtnet_info *vi = netdev_priv(netdev);
> +	struct virtio_device *vdev = vi->vdev;
> +	struct scatterlist sg;
> +	u64 offloads = 0;
> +
> +	if (features & NETIF_F_LRO)
> +		offloads |= (1 << VIRTIO_NET_F_GUEST_TSO4) |
> +			    (1 << VIRTIO_NET_F_GUEST_TSO6);
> +
> +	if (features & NETIF_F_RXCSUM)
> +		offloads |= (1 << VIRTIO_NET_F_GUEST_CSUM);
> +
> +	if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS)) {
> +		sg_init_one(&sg, &offloads, sizeof(uint64_t));
> +		if (!virtnet_send_command(vi,
> +					  VIRTIO_NET_CTRL_GUEST_OFFLOADS,
> +					  VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET,
> +					  &sg)) {

Hmm I just realised that this will slow down setups that bridge
virtio net interfaces since bridge calls this if provided.
See below.

> +			dev_warn(&netdev->dev,
> +				 "Failed to set guest offloads by virtnet command.\n");
> +			return -EINVAL;
> +		}
> +	}

Hmm if VIRTIO_NET_F_CTRL_GUEST_OFFLOADS is off, this fails
silently. It might actually be a good idea to avoid
breaking setups.

> +
> +	return 0;
> +}
> +
>  static const struct net_device_ops virtnet_netdev = {
>  	.ndo_open            = virtnet_open,
>  	.ndo_stop   	     = virtnet_close,
> @@ -1435,6 +1465,7 @@ static void virtnet_init_settings(struct net_device *dev)
>  #ifdef CONFIG_NET_RX_BUSY_POLL
>  	.ndo_busy_poll		= virtnet_busy_poll,
>  #endif
> +	.ndo_set_features	= virtnet_set_features,
>  };
>  
>  static void virtnet_config_changed_work(struct work_struct *work)
> @@ -1815,6 +1846,12 @@ static int virtnet_probe(struct virtio_device *vdev)
>  	if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_CSUM))
>  		dev->features |= NETIF_F_RXCSUM;
>  
> +	if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) &&
> +	    virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6)) {
> +		dev->features |= NETIF_F_LRO;
> +		dev->hw_features |= NETIF_F_LRO;

So the issue is I think that the virtio "LRO" isn't really
LRO, it's typically just GRO forwarded to guests.
So these are easily re-split along MTU boundaries,
which makes it ok to forward these across bridges.

It's not nice that we don't document this in the spec,
but it's the reality and people rely on this.

For now, how about doing a custom thing and just disable/enable
it as XDP is attached/detached?

> +	}
> +
>  	dev->vlan_features = dev->features;
>  
>  	/* MTU range: 68 - 65535 */
> @@ -2057,7 +2094,8 @@ static int virtnet_restore(struct virtio_device *vdev)
>  	VIRTIO_NET_F_CTRL_RX, VIRTIO_NET_F_CTRL_VLAN, \
>  	VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_MQ, \
>  	VIRTIO_NET_F_CTRL_MAC_ADDR, \
> -	VIRTIO_NET_F_MTU
> +	VIRTIO_NET_F_MTU, \
> +	VIRTIO_NET_F_CTRL_GUEST_OFFLOADS
>  
>  static unsigned int features[] = {
>  	VIRTNET_FEATURES,


* Re: [net-next PATCH v5 5/6] virtio_net: add XDP_TX support
From: John Fastabend @ 2016-12-08 21:25 UTC
  To: Michael S. Tsirkin
  Cc: daniel, shm, davem, tgraf, alexei.starovoitov, john.r.fastabend,
	netdev, brouer
In-Reply-To: <20161208231708-mutt-send-email-mst@kernel.org>

On 16-12-08 01:18 PM, Michael S. Tsirkin wrote:
> On Thu, Dec 08, 2016 at 10:18:22AM -0800, John Fastabend wrote:
>>> I guess this helps because it just slows down the guest.
>>> I don't much like it ...
>>
>> I left it like this copying the pattern in balloon and input drivers. I
>> can change it back to the previous pattern where it is only called if
>> there are no errors. It has been running fine with the old pattern now
>> for an hour or so.
>>
>> .John
> 
> I'd prefer internal consistency. Could be a patch on top
> if this helps. I'm happy it isn't actually buggy.
> Let me know whether you want to post v6 or
> want me to ack this one.
> 

I think, because DaveM has closed net-next, ACK'ing this set would be
great to get it into Dave's tree. Then I can post a small "fix" so that it
is consistent between the normal stack and the xdp tx path after that.

Thanks,
John


* Re: [net-next PATCH v5 5/6] virtio_net: add XDP_TX support
From: Michael S. Tsirkin @ 2016-12-08 21:18 UTC
  To: John Fastabend
  Cc: daniel, shm, davem, tgraf, alexei.starovoitov, john.r.fastabend,
	netdev, brouer
In-Reply-To: <5849A3EE.7090603@gmail.com>

On Thu, Dec 08, 2016 at 10:18:22AM -0800, John Fastabend wrote:
> > I guess this helps because it just slows down the guest.
> > I don't much like it ...
> 
> I left it like this copying the pattern in balloon and input drivers. I
> can change it back to the previous pattern where it is only called if
> there are no errors. It has been running fine with the old pattern now
> for an hour or so.
> 
> .John

I'd prefer internal consistency. Could be a patch on top
if this helps. I'm happy it isn't actually buggy.
Let me know whether you want to post v6 or
want me to ack this one.


* Re: [PATCH v2 net-next 0/4] udp: receive path optimizations
From: Jesper Dangaard Brouer @ 2016-12-08 21:17 UTC
  To: Eric Dumazet; +Cc: brouer, David S . Miller, netdev, Paolo Abeni, Eric Dumazet
In-Reply-To: <20161208214819.30138d12@redhat.com>

On Thu, 8 Dec 2016 21:48:19 +0100
Jesper Dangaard Brouer <brouer@redhat.com> wrote:

> On Thu,  8 Dec 2016 09:38:55 -0800
> Eric Dumazet <edumazet@google.com> wrote:
> 
> > This patch series provides about 100 % performance increase under flood.   
> 
> Could you please explain a bit more about what kind of testing you are
> doing that can show 100% performance improvement?
> 
> I've tested this patchset and my tests show *huge* speed-ups, but
> reaping the performance benefit depends heavily on setup and enabling
> the right UDP socket settings, and most importantly where the
> performance bottleneck is: ksoftirqd(producer) or udp_sink(consumer).
> 
> Basic setup: Unload all netfilter, and enable ip_early_demux.
>  sysctl net/ipv4/ip_early_demux=1
> 
> Test generator pktgen UDP packets single flow, 50Gbit/s mlx5 NICs.
>  - Vary packet size between 64 and 1514.

Below, I've added the baseline tests.

Baseline test on net-next at commit c9fba3ed3a4

> Packet-size: 64
> $ sudo taskset -c 4 ./udp_sink --port 9 --count $((10**7))
>                                 ns/pkt  pps             cycles/pkt
> recvMmsg/32  	run: 0 10000000	537.70	1859756.90	2155
> recvmsg   	run: 0 10000000	510.84	1957541.83	2047
> read      	run: 0 10000000	583.40	1714077.14	2338
> recvfrom  	run: 0 10000000	600.09	1666411.49	2405

Packet-size: 64 (baseline)
$ sudo taskset -c 4 ./udp_sink --port 9 --count $((10**7))
recvMmsg/32  	run: 0 10000000	499.75	2001016.09	2003
recvmsg   	run: 0 10000000	455.84	2193740.92	1827
read      	run: 0 10000000	566.99	1763703.49	2272
recvfrom  	run: 0 10000000	581.02	1721098.87	2328
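
The three columns in these tables are tied together by simple arithmetic, which the following sketch spells out. The function names are illustrative only, and the CPU clock is an assumption: the cycles/pkt figures are consistent with roughly 4 GHz, but that value is inferred from the numbers and is not stated in the mail.

```c
/* Packets per second implied by an average per-packet cost in nanoseconds. */
double pps_from_ns(double ns_per_pkt)
{
	return 1e9 / ns_per_pkt;
}

/* Cycles per packet for a given cost and an assumed CPU clock in GHz. */
double cycles_from_ns(double ns_per_pkt, double cpu_ghz)
{
	return ns_per_pkt * cpu_ghz;
}
```

Feeding the ns/pkt column into pps_from_ns() reproduces the pps column to within the rounding of the printed ns/pkt value.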

 
> The ksoftirq thread "costs" more than udp_sink, which is idle, and the UDP
> queue does not get full enough. Thus, the patchset does not have any
> effect.
> 
> 
> Try increasing the pktgen packet size, as this increases the copy cost of
> udp_sink.  Thus, a queue can now form, and the udp_sink CPU has almost no
> idle cycles.  The "read" and "recvfrom" tests did experience some idle
> cycles.
> 
> Packet-size: 1514
> $ sudo taskset -c 4 ./udp_sink --port 9 --count $((10**7))
>                                 ns/pkt  pps             cycles/pkt
> recvMmsg/32  	run: 0 10000000	435.88	2294204.11	1747
> recvmsg   	run: 0 10000000	458.06	2183100.64	1835
> read      	run: 0 10000000	520.34	1921826.18	2085
> recvfrom  	run: 0 10000000	515.48	1939935.27	2066

Packet-size: 1514 (baseline)
$ sudo taskset -c 4 ./udp_sink --port 9 --count $((10**7))
recvMmsg/32  	run: 0 10000000	453.88	2203231.81	1819
recvmsg   	run: 0 10000000	488.31	2047869.13	1957
read      	run: 0 10000000	480.99	2079058.69	1927
recvfrom  	run: 0 10000000	522.64	1913349.26	2094


> Next trick, connected UDP:
> 
> Using a connected UDP socket (combined with ip_early_demux) removes the
> FIB_lookup from the ksoftirq and improves the tipping point.
> 
> Packet-size: 64
> $ sudo taskset -c 4 ./udp_sink --port 9 --count $((10**7)) --connect
>                                 ns/pkt  pps             cycles/pkt
> recvMmsg/32  	run: 0 10000000	391.18	2556361.62	1567
> recvmsg   	run: 0 10000000	422.95	2364349.69	1695
> read      	run: 0 10000000	425.29	2351338.10	1704
> recvfrom  	run: 0 10000000	476.74	2097577.57	1910

Packet-size: 64 (baseline)
$ sudo taskset -c 4 ./udp_sink --port 9 --count $((10**7)) --connect
recvMmsg/32  	run: 0 10000000	438.55	2280255.77	1757
recvmsg   	run: 0 10000000	496.73	2013156.99	1990
read      	run: 0 10000000	412.17	2426170.58	1652
recvfrom  	run: 0 10000000	471.77	2119662.99	1890


> Change/increase packet size:
> 
> Packet-size: 1514
> $ sudo taskset -c 4 ./udp_sink --port 9 --count $((10**7)) --connect
>                                 ns/pkt  pps             cycles/pkt
> recvMmsg/32  	run: 0 10000000	457.56	2185481.94	1833
> recvmsg   	run: 0 10000000	479.42	2085837.49	1921
> read      	run: 0 10000000	398.05	2512233.13	1595
> recvfrom  	run: 0 10000000	391.07	2557096.95	1567

Packet-size: 1514 (baseline)
$ sudo taskset -c 4 ./udp_sink --port 9 --count $((10**7)) --connect
recvMmsg/32  	run: 0 10000000	491.11	2036205.63	1968
recvmsg   	run: 0 10000000	514.37	1944138.31	2061
read      	run: 0 10000000	444.02	2252147.84	1779
recvfrom  	run: 0 10000000	426.58	2344247.20	1709


> A bit strange: changing the packet size flipped which syscall is the
> fastest.
> 
> It is also interesting to see what the ksoftirq limit is:
> 
> Results from "nstat" while using recvmsg show that ksoftirq is
> handling 2.6 Mpps, and the consumer/udp_sink is the bottleneck at 2 Mpps.
> 
> [skylake ~]$ nstat > /dev/null && sleep 1  && nstat
> #kernel
> IpInReceives                    2667577            0.0
> IpInDelivers                    2667577            0.0
> UdpInDatagrams                  2083580            0.0
> UdpInErrors                     583995             0.0
> UdpRcvbufErrors                 583995             0.0
> IpExtInOctets                   4001340000         0.0
> IpExtInNoECTPkts                2667559            0.0

(baseline 1514 bytes recvmsg)
$ nstat > /dev/null && sleep 1  && nstat
#kernel
IpInReceives                    2702424            0.0
IpInDelivers                    2702423            0.0
UdpInDatagrams                  1950184            0.0
UdpInErrors                     752239             0.0
UdpRcvbufErrors                 752239             0.0
IpExtInOctets                   4053642000         0.0
IpExtInNoECTPkts                2702428            0.0
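
To compare the two nstat snapshots, it helps to express the receive-buffer drops as a fraction of what the IP layer delivered. A minimal sketch follows; rcvbuf_drop_fraction() is a hypothetical helper name, and the inputs are the IpInDelivers and UdpRcvbufErrors counters shown above.

```c
/* Fraction of delivered IP packets that were dropped because the UDP
 * receive buffer was full. Returns 0.0 when nothing was delivered. */
double rcvbuf_drop_fraction(unsigned long long ip_in_delivers,
			    unsigned long long udp_rcvbuf_errors)
{
	if (ip_in_delivers == 0)
		return 0.0;
	return (double)udp_rcvbuf_errors / (double)ip_in_delivers;
}
```

With the counters above this gives roughly 22% dropped with the patchset and roughly 28% for the baseline, in line with the consumer being the bottleneck in both runs.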

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


* Re: [net-next PATCH v5 0/6] XDP for virtio_net
From: Michael S. Tsirkin @ 2016-12-08 21:16 UTC
  To: David Miller
  Cc: john.fastabend, daniel, shm, tgraf, alexei.starovoitov,
	john.r.fastabend, netdev, brouer
In-Reply-To: <20161208.141702.1346950420275854265.davem@davemloft.net>

On Thu, Dec 08, 2016 at 02:17:02PM -0500, David Miller wrote:
> From: John Fastabend <john.fastabend@gmail.com>
> Date: Wed, 07 Dec 2016 12:10:47 -0800
> 
> > This implements virtio_net for the mergeable buffers and big_packet
> > modes. I tested this with vhost_net running on qemu and did not see
> > any issues. For testing num_buf > 1 I added a hack to vhost driver
> > to only put 100 bytes per buffer.
>  ...
> 
> So where are we with this?
> 
> I'm not too thrilled with the idea of making XDP_TX optional or
> something like that.  If someone enables XDP, there is a tradeoff.

The issue is the inability of XDP TX to share xmit queues with the net stack.
I'm guessing virtio is not the only card that has a limited
number of queues, is it? Is it really so hard to lock the queue
and check it's running? Could be optional in case resources are there
...

-- 
MST


* Re: [PATCH v2 net-next 0/4] udp: receive path optimizations
From: Eric Dumazet @ 2016-12-08 21:13 UTC
  To: Jesper Dangaard Brouer
  Cc: Eric Dumazet, David S . Miller, netdev, Paolo Abeni
In-Reply-To: <20161208214819.30138d12@redhat.com>

On Thu, 2016-12-08 at 21:48 +0100, Jesper Dangaard Brouer wrote:
> On Thu,  8 Dec 2016 09:38:55 -0800
> Eric Dumazet <edumazet@google.com> wrote:
> 
> > This patch series provides about 100 % performance increase under flood. 
> 
> Could you please explain a bit more about what kind of testing you are
> doing that can show 100% performance improvement?
> 
> I've tested this patchset and my tests show *huge* speed-ups, but
> reaping the performance benefit depends heavily on setup and enabling
> the right UDP socket settings, and most importantly where the
> performance bottleneck is: ksoftirqd(producer) or udp_sink(consumer).

Right.

So here at Google we do not try (yet) to downgrade our expensive
Multiqueue Nics into dumb NICS from last decade by using a single queue
on them. Maybe it will happen when we can process 10Mpps per core,
but we are not there yet  ;)

So my test is using a NIC, programmed with 8 queues, on a dual-socket
machine. (2 physical packages)

4 queues are handled by 4 cpus on socket0 (NUMA node 0)
4 queues are handled by 4 cpus on socket1 (NUMA node 1)

So I explicitly put my poor single thread UDP application in the worst
condition, having skbs produced on two NUMA nodes. 

Then my load generator uses trafgen, with spoofed UDP source addresses,
like a UDP flood would use. Or typical DNS traffic, malicious or not.

So I have 8 cpus all trying to queue packets in a single UDP socket.

Of course, a real high performance server would use 8 UDP sockets, and
SO_REUSEPORT with a nice eBPF filter to spread the packets based on the
queue/cpu they arrived on.

In the case you have one cpu that you need to share between ksoftirq and
all user threads, then your test results depend on process scheduler
decisions more than anything we can code in network land.

It is actually easy for user space to get more than 50% of the cycles,
and 'starve' ksoftirqd.


* Re: [net-next PATCH v5 0/6] XDP for virtio_net
From: Michael S. Tsirkin @ 2016-12-08 21:10 UTC
  To: John Fastabend
  Cc: Alexei Starovoitov, David Miller, daniel, shm, tgraf,
	john.r.fastabend, netdev, brouer
In-Reply-To: <20161208225807-mutt-send-email-mst@kernel.org>

On Thu, Dec 08, 2016 at 10:58:52PM +0200, Michael S. Tsirkin wrote:
> On Thu, Dec 08, 2016 at 12:46:07PM -0800, John Fastabend wrote:
> > On 16-12-08 11:38 AM, Alexei Starovoitov wrote:
> > > On Thu, Dec 08, 2016 at 02:17:02PM -0500, David Miller wrote:
> > >> From: John Fastabend <john.fastabend@gmail.com>
> > >> Date: Wed, 07 Dec 2016 12:10:47 -0800
> > >>
> > >>> This implements virtio_net for the mergeable buffers and big_packet
> > >>> modes. I tested this with vhost_net running on qemu and did not see
> > >>> any issues. For testing num_buf > 1 I added a hack to vhost driver
> > >>> to only put 100 bytes per buffer.
> > >>  ...
> > >>
> > >> So where are we with this?
> > 
> > There is one possible issue with a hang that Michael pointed out. I can
> > either spin a v6 or if you pull this v5 series in I can post a bugfix
> > for it. I am not seeing the issue in practice; XDP virtio has been up
> > and running on my box here for days without issue.
> 
> 
> I'd prefer it fixed. Alternatively, apply just 1-3 for now.

Looks like there's no issue though after all.
I misunderstood things.


> > All the concerns below are really future XDP ideas and unrelated to
> > this series or at least not required for this series to be applied IMO.
> > 
> > >>
> > >> I'm not too thrilled with the idea of making XDP_TX optional or
> > >> something like that.  If someone enables XDP, there is a tradeoff.
> > >>
> > >> I also have reservations about the idea to make jumbo frames work
> > >> without giving XDP access to the whole packet.  If it wants to push or
> > >> pop a header, it might need to know the whole packet length.  How will
> > >> you pass that to the XDP program?
> > >>
> > >> Some kinds of encapsulation require trailers, thus precluding access
> > >> to the entire packet precludes those kinds of transformations.
> > > 
> > > +1
> > 
> > This was sort of speculative on my side; it is certainly not dependent on
> > the series here. I agree that we don't want to get into a state where
> > program X runs here and not there and only runs after doing magic
> > incantations, etc. I would only propose it if there is a clean way to
> > implement this.
> > 
> > > 
> > >> This is why we want simple, linear, buffer access for XDP.
> > >>
> > >> Even the most seemingly minor exception turns into a huge complicated
> > >> mess.
> > > 
> > > +1
> > 
> > Yep.
> > 
> > > 
> > > and from the other thread:
> > >>> Can't we disable XDP_TX somehow? Many people might only want RX drop,
> > >>> and extra queues are not always there.
> > >>>
> > >>
> > >> Alexei, Daniel, any thoughts on this?
> > > 
> > > I don't like it.
> > > 
> > 
> > OK alternatively we can make more queues available in virtio which might
> > be the better solution.
> > 
> > >> I know we were trying to claim some base level of feature support for
> > >> all XDP drivers. I am sympathetic to this argument though for DDOS we
> > >> do not need XDP_TX support. And virtio can become queue constrained
> > >> in some cases.
> > > 
> > > especially for ddos case doing lro/gro is not helpful.
> > 
> > Fair enough but disabling LRO to handle the case where you "might" get
> > a DDOS will hurt normal good traffic.
> > 
> > > I frankly don't see a use case where you'd want to steer a packet
> > > all the way into VM just to drop them there?
> > 
> > VM to VM traffic is my use case. And in that model we need XDP at the
> > virtio or vhost layer in case of malicious/broken/untrusted VM. I have
> > some vhost patches under development for when net-next opens up again.
> > 
> > > Without XDP_TX it's too crippled. adjust_head() won't be possible,
> > 
> > Just a nit but any reason not to support adjust_head and then XDP_PASS.
> > I don't have a use case in mind but also see no reason to preclude it.
> > 
> > > packet mangling would have to be disabled and so on.
> > > If xdp program doesn't see raw packet it can only parse the headers of
> > > this jumbo meta-packet and drop it, but for virtio it's really too late.
> > > ddos protection needs to be done at the earliest hw nic receive.
> > 
> > VM to VM traffic never touches hw nic.
> > 
> > > I think if driver claims xdp support it needs to support
> > > drop/pass/tx and adjust_head. For metadata passing up into stack from xdp
> > > we need adjust_head, for encap/decap we need it too. And lro is in the way
> > > of such transformations.
> > > We struggled a lot with cls_bpf due to all metadata inside skb that needs
> > > to be kept correct. Feeding non-raw packets into xdp is a rat hole.
> > > 
> > 
> > In summary:
> > 
> > I think it's worth investigating getting LRO working but agree we can't
> > sacrifice any of the existing features or complicate the code to do it.
> > If the result of investigating is it can't be done then that is how it
> > is.
> > 
> > Jumbo frames I care very little about in reality so should not have
> > mentioned it.
> > 
> > Requiring XDP drivers to support all features is fine for me; I can make
> > the virtio queue scheme a bit more flexible. Michael might have some
> > opinion on this though.
> > 
> > This series shouldn't be blocked by any of the above.
> > 
> > Thanks,
> > .John


* Re: [PATCH net-next 4/5] liquidio VF timestamp
From: Or Gerlitz @ 2016-12-08 21:09 UTC
  To: Raghu Vatsavayi
  Cc: David Miller, Linux Netdev List, Raghu Vatsavayi, Derek Chickles,
	Satanand Burla, Felix Manlunas
In-Reply-To: <1481230848-2393-5-git-send-email-rvatsavayi@caviumnetworks.com>

On Thu, Dec 8, 2016 at 11:00 PM, Raghu Vatsavayi
<rvatsavayi@caviumnetworks.com> wrote:
> Adds support for VF timestamp.

Same here: what's the use case? Do you have per-VF HW clocks to set?
How does it work if VF A sets up X and VF B sets up Y?


* Re: [net-next PATCH v5 0/6] XDP for virtio_net
From: Alexei Starovoitov @ 2016-12-08 21:08 UTC
  To: John Fastabend
  Cc: David Miller, daniel, mst, shm, tgraf, john.r.fastabend, netdev,
	brouer
In-Reply-To: <5849C68F.7080707@gmail.com>

On Thu, Dec 08, 2016 at 12:46:07PM -0800, John Fastabend wrote:
> 
> Fair enough but disabling LRO to handle the case where you "might" get
> a DDOS will hurt normal good traffic.

the xdp_pass path is not optimized right now. so even without VMs
we need some work to do. lro or not-lro is imo secondary.

> > I frankly don't see a use case where you'd want to steer a packet
> > all the way into VM just to drop them there?
> 
> VM to VM traffic is my use case. And in that model we need XDP at the
> virtio or vhost layer in case of malicious/broken/untrusted VM. I have
> some vhost patches under development for when net-next opens up again.

excellent. looking forward to vhost patches.

> > Without XDP_TX it's too crippled. adjust_head() won't be possible,
> 
> Just a nit but any reason not to support adjust_head and then XDP_PASS.
> I don't have a use case in mind but also see no reason to preclude it.

adjust_head and xdp_pass need to be supported. No doubt.
the use case is metadata passing between xdp and upper layers.

> In summary:
> 
> I think it's worth investigating getting LRO working but agree we can't
> sacrifice any of the existing features or complicate the code to do it.
> If the result of investigating is it can't be done then that is how it
> is.

agree

> Requiring XDP drivers to support all features is fine for me; I can make
> the virtio queue scheme a bit more flexible. Michael might have some
> opinion on this though.

I say right now all the features are pretty much must have,
but in the future we will become selective.
Like zero-copy a page from dma into user space probably doesn't
make sense to do for virtio.
Multi-port TX from virtio into phys netdev doesn't make sense either.

> This series shouldn't be blocked by any of the above.

completely agree.
since we abandoned e1000+xdp patches, the virtio+xdp is the only
thing on the table that allows us to do convenient development
and testing of xdp programs.
We've talked about a repository of blessed xdp programs.
They all would need to be routinely and automatically tested.
So virtio+xdp is a must have feature to me.


* Re: [PATCH net-next 2/5] liquidio VF vxlan
From: Or Gerlitz @ 2016-12-08 21:08 UTC
  To: Raghu Vatsavayi
  Cc: David Miller, Linux Netdev List, Raghu Vatsavayi, Derek Chickles,
	Satanand Burla, Felix Manlunas
In-Reply-To: <1481230848-2393-3-git-send-email-rvatsavayi@caviumnetworks.com>

On Thu, Dec 8, 2016 at 11:00 PM, Raghu Vatsavayi
<rvatsavayi@caviumnetworks.com> wrote:

> Adds VF vxlan offload support.

What's the use case for that? A VM running a VTEP? Doesn't that part
need to run on the host?

Or.


* Re: [net-next PATCH v5 5/6] virtio_net: add XDP_TX support
From: Michael S. Tsirkin @ 2016-12-08 21:08 UTC
  To: John Fastabend
  Cc: daniel, shm, davem, tgraf, alexei.starovoitov, john.r.fastabend,
	netdev, brouer
In-Reply-To: <5849A3EE.7090603@gmail.com>

On Thu, Dec 08, 2016 at 10:18:22AM -0800, John Fastabend wrote:
> On 16-12-07 10:11 PM, Michael S. Tsirkin wrote:
> > On Wed, Dec 07, 2016 at 12:12:45PM -0800, John Fastabend wrote:
> >> This adds support for the XDP_TX action to virtio_net. When an XDP
> >> program is run and returns the XDP_TX action the virtio_net XDP
> >> implementation will transmit the packet on a TX queue that aligns
> >> with the current CPU that the XDP packet was processed on.
> >>
> >> Before sending the packet the header is zeroed.  Also XDP is expected
> >> to handle checksum correctly so no checksum offload  support is
> >> provided.
> >>
> >> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> >> ---
> >>  drivers/net/virtio_net.c |   99 +++++++++++++++++++++++++++++++++++++++++++---
> >>  1 file changed, 92 insertions(+), 7 deletions(-)
> >>
> >> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> >> index 28b1196..8e5b13c 100644
> >> --- a/drivers/net/virtio_net.c
> >> +++ b/drivers/net/virtio_net.c
> >> @@ -330,12 +330,57 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> >>  	return skb;
> >>  }
> >>  
> >> +static void virtnet_xdp_xmit(struct virtnet_info *vi,
> >> +			     struct receive_queue *rq,
> >> +			     struct send_queue *sq,
> >> +			     struct xdp_buff *xdp)
> >> +{
> >> +	struct page *page = virt_to_head_page(xdp->data);
> >> +	struct virtio_net_hdr_mrg_rxbuf *hdr;
> >> +	unsigned int num_sg, len;
> >> +	void *xdp_sent;
> >> +	int err;
> >> +
> >> +	/* Free up any pending old buffers before queueing new ones. */
> >> +	while ((xdp_sent = virtqueue_get_buf(sq->vq, &len)) != NULL) {
> >> +		struct page *sent_page = virt_to_head_page(xdp_sent);
> >> +
> >> +		if (vi->mergeable_rx_bufs)
> >> +			put_page(sent_page);
> >> +		else
> >> +			give_pages(rq, sent_page);
> >> +	}
> > 
> > Looks like this is the only place where you do virtqueue_get_buf.
> > No interrupt handler?
> > This means that if you fill up the queue, nothing will clean it
> > and things will get stuck.
> 
> hmm OK so the callbacks should be implemented to do this and a pair
> of virtqueue_enable_cb_prepare()/virtqueue_disable_cb() used to enable
> and disable callbacks if packets are enqueued.

Oh I didn't realize XDP never stops processing packets,
even if they are never freed.
In that case you do not need callbacks.

> Also in the normal xmit path via start_xmit() will the same condition
> happen? It looks like free_old_xmit_skbs for example is only called if
> a packet is sent could we end up holding on to skbs in this case? I
> don't see free_old_xmit_skbs being called from any callbacks?

Right - all it does is restart the queue. That's why we don't support
BQL right now.

> > Can this be the issue you saw?
> 
> nope see below I was mishandling the big_packets page cleanup path in
> the error case.
> 
> > 
> > 
> >> +
> >> +	/* Zero header and leave csum up to XDP layers */
> >> +	hdr = xdp->data;
> >> +	memset(hdr, 0, vi->hdr_len);
> >> +
> >> +	num_sg = 1;
> >> +	sg_init_one(sq->sg, xdp->data, xdp->data_end - xdp->data);
> >> +	err = virtqueue_add_outbuf(sq->vq, sq->sg, num_sg,
> >> +				   xdp->data, GFP_ATOMIC);
> >> +	if (unlikely(err)) {
> >> +		if (vi->mergeable_rx_bufs)
> >> +			put_page(page);
> >> +		else
> >> +			give_pages(rq, page);
> >> +	} else if (!vi->mergeable_rx_bufs) {
> >> +		/* If not mergeable bufs must be big packets so cleanup pages */
> >> +		give_pages(rq, (struct page *)page->private);
> >> +		page->private = 0;
> >> +	}
> >> +
> >> +	virtqueue_kick(sq->vq);
> > 
> > Is this unconditional kick a work-around for hang
> > we could not figure out yet?
> 
> I tracked the original issue down to how I handled the big_packet page
> cleanups.
> 
> > I guess this helps because it just slows down the guest.
> > I don't much like it ...
> 
> I left it like this copying the pattern in balloon and input drivers. I
> can change it back to the previous pattern where it is only called if
> there are no errors. It has been running fine with the old pattern now
> for an hour or so.
> 
> .John

OK makes sense.
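
The conclusion reached above, that no completion callback is needed because every XDP_TX call first reclaims finished buffers before queueing a new one, can be illustrated with a toy model. This is a sketch only: struct toy_txq, toy_device_complete() and toy_xmit() are hypothetical names and do not correspond to the virtio API, and the model counts slots instead of tracking real buffers.

```c
/* Toy model of a TX ring: 'used' slots are occupied, and 'done' of them
 * have been completed by the device but not yet reclaimed by the driver. */
struct toy_txq {
	int cap;
	int used;
	int done;
};

/* Device side: mark up to n in-flight buffers as completed. */
void toy_device_complete(struct toy_txq *q, int n)
{
	int in_flight = q->used - q->done;

	q->done += (n < in_flight) ? n : in_flight;
}

/* Driver side: reclaim completed buffers first, then try to queue one more.
 * Returns 0 on success, -1 if the ring is still full (packet dropped). */
int toy_xmit(struct toy_txq *q)
{
	q->used -= q->done;
	q->done = 0;

	if (q->used == q->cap)
		return -1;

	q->used++;
	return 0;
}
```

A full ring only causes drops until the device completes something; the next toy_xmit() call then frees the slots itself, which is why the model never gets stuck as long as transmit attempts keep arriving.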


* Re: Soft lockup in inet_put_port on 4.6
From: Hannes Frederic Sowa @ 2016-12-08 21:03 UTC
  To: Tom Herbert, Linux Kernel Network Developers, Josef Bacik
In-Reply-To: <CALx6S36OVUqAxq9vNnfHp2eJOuG+gSSg896zzaZoc3Og4tyxFw@mail.gmail.com>

Hello Tom,

On Wed, Dec 7, 2016, at 00:06, Tom Herbert wrote:
> We are seeing a fair number of machines getting into softlockup in 4.6
> kernel. As near as I can tell this is happening on the spinlock in
> bind hash bucket. When inet_csk_get_port exits and does spin_unlock_bh
> the TCP timer runs and we hit lockup in inet_put_port (presumably on
> same lock). It seems like the lock isn't properly being unlocked
> somewhere but I don't readily see it.
> 
> Any ideas?

Likewise we received reports that pretty much look the same on our
heavily patched kernel. Did you have a chance to investigate or
reproduce the problem?

I am wondering if you would be able to take a complete thread stack dump
if you can reproduce this to check if one of the user space processes is
looping inside finding a free port?

Thanks,
Hannes


* [PATCH net-next 5/5] liquidio VF error handling
From: Raghu Vatsavayi @ 2016-12-08 21:00 UTC
  To: davem
  Cc: netdev, Raghu Vatsavayi, Raghu Vatsavayi, Derek Chickles,
	Satanand Burla, Felix Manlunas
In-Reply-To: <1481230848-2393-1-git-send-email-rvatsavayi@caviumnetworks.com>

Adds support for VF error handling.

Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@caviumnetworks.com>
Signed-off-by: Derek Chickles <derek.chickles@caviumnetworks.com>
Signed-off-by: Satanand Burla <satananda.burla@caviumnetworks.com>
Signed-off-by: Felix Manlunas <felix.manlunas@caviumnetworks.com>
---
 drivers/net/ethernet/cavium/liquidio/lio_vf_main.c | 139 +++++++++++++++++++++
 1 file changed, 139 insertions(+)

diff --git a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
index dc0e1f6..70d96c1 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
@@ -175,6 +175,144 @@ static int wait_for_pending_requests(struct octeon_device *oct)
 	return 0;
 }
 
+/**
+ * \brief Cause device to go quiet so it can be safely removed/reset/etc
+ * @param oct Pointer to Octeon device
+ */
+static void pcierror_quiesce_device(struct octeon_device *oct)
+{
+	int i;
+
+	/* Disable the input and output queues now. No more packets will
+	 * arrive from Octeon, but we should wait for all packet processing
+	 * to finish.
+	 */
+
+	/* To allow for in-flight requests */
+	schedule_timeout_uninterruptible(100);
+
+	if (wait_for_pending_requests(oct))
+		dev_err(&oct->pci_dev->dev, "There were pending requests\n");
+
+	/* Force all requests waiting to be fetched by OCTEON to complete. */
+	for (i = 0; i < MAX_OCTEON_INSTR_QUEUES(oct); i++) {
+		struct octeon_instr_queue *iq;
+
+		if (!(oct->io_qmask.iq & BIT_ULL(i)))
+			continue;
+		iq = oct->instr_queue[i];
+
+		if (atomic_read(&iq->instr_pending)) {
+			spin_lock_bh(&iq->lock);
+			iq->fill_cnt = 0;
+			iq->octeon_read_index = iq->host_write_index;
+			iq->stats.instr_processed +=
+			    atomic_read(&iq->instr_pending);
+			lio_process_iq_request_list(oct, iq, 0);
+			spin_unlock_bh(&iq->lock);
+		}
+	}
+
+	/* Force all pending ordered list requests to time out. */
+	lio_process_ordered_list(oct, 1);
+
+	/* We do not need to wait for output queue packets to be processed. */
+}
+
+/**
+ * \brief Cleanup PCI AER uncorrectable error status
+ * @param dev Pointer to PCI device
+ */
+static void cleanup_aer_uncorrect_error_status(struct pci_dev *dev)
+{
+	u32 status, mask;
+	int pos = 0x100;
+
+	pr_info("%s :\n", __func__);
+
+	pci_read_config_dword(dev, pos + PCI_ERR_UNCOR_STATUS, &status);
+	pci_read_config_dword(dev, pos + PCI_ERR_UNCOR_SEVER, &mask);
+	if (dev->error_state == pci_channel_io_normal)
+		status &= ~mask; /* Clear corresponding nonfatal bits */
+	else
+		status &= mask; /* Clear corresponding fatal bits */
+	pci_write_config_dword(dev, pos + PCI_ERR_UNCOR_STATUS, status);
+}
+
+/**
+ * \brief Stop all PCI IO to a given device
+ * @param oct Pointer to Octeon device
+ */
+static void stop_pci_io(struct octeon_device *oct)
+{
+	struct msix_entry *msix_entries;
+	int i;
+
+	/* No more instructions will be forwarded. */
+	atomic_set(&oct->status, OCT_DEV_IN_RESET);
+
+	for (i = 0; i < oct->ifcount; i++)
+		netif_device_detach(oct->props[i].netdev);
+
+	/* Disable interrupts  */
+	oct->fn_list.disable_interrupt(oct, OCTEON_ALL_INTR);
+
+	pcierror_quiesce_device(oct);
+	if (oct->msix_on) {
+		msix_entries = (struct msix_entry *)oct->msix_entries;
+		for (i = 0; i < oct->num_msix_irqs; i++) {
+			/* clear the affinity_cpumask */
+			irq_set_affinity_hint(msix_entries[i].vector,
+					      NULL);
+			free_irq(msix_entries[i].vector,
+				 &oct->ioq_vector[i]);
+		}
+		pci_disable_msix(oct->pci_dev);
+		kfree(oct->msix_entries);
+		oct->msix_entries = NULL;
+		octeon_free_ioq_vector(oct);
+	}
+	dev_dbg(&oct->pci_dev->dev, "Device state is now %s\n",
+		lio_get_state_string(&oct->status));
+
+	/* making it a common function for all OCTEON models */
+	cleanup_aer_uncorrect_error_status(oct->pci_dev);
+
+	pci_disable_device(oct->pci_dev);
+}
+
+/**
+ * \brief called when PCI error is detected
+ * @param pdev Pointer to PCI device
+ * @param state The current pci connection state
+ *
+ * This function is called after a PCI bus error affecting
+ * this device has been detected.
+ */
+static pci_ers_result_t liquidio_pcie_error_detected(struct pci_dev *pdev,
+						     pci_channel_state_t state)
+{
+	struct octeon_device *oct = pci_get_drvdata(pdev);
+
+	/* Non-correctable Non-fatal errors */
+	if (state == pci_channel_io_normal) {
+		dev_err(&oct->pci_dev->dev, "Non-correctable non-fatal error reported:\n");
+		cleanup_aer_uncorrect_error_status(oct->pci_dev);
+		return PCI_ERS_RESULT_CAN_RECOVER;
+	}
+
+	/* Non-correctable Fatal errors */
+	dev_err(&oct->pci_dev->dev, "Non-correctable FATAL reported by PCI AER driver\n");
+	stop_pci_io(oct);
+
+	return PCI_ERS_RESULT_DISCONNECT;
+}
+
+/* For PCI-E Advanced Error Recovery (AER) Interface */
+static const struct pci_error_handlers liquidio_vf_err_handler = {
+	.error_detected = liquidio_pcie_error_detected,
+};
+
 static const struct pci_device_id liquidio_vf_pci_tbl[] = {
 	{
 		PCI_VENDOR_ID_CAVIUM, OCTEON_CN23XX_VF_VID,
@@ -191,6 +329,7 @@ static int wait_for_pending_requests(struct octeon_device *oct)
 	.id_table	= liquidio_vf_pci_tbl,
 	.probe		= liquidio_vf_probe,
 	.remove		= liquidio_vf_remove,
+	.err_handler	= &liquidio_vf_err_handler,    /* For AER */
 };
 
 /**
-- 
1.8.3.1

^ permalink raw reply related
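
The bit selection done by cleanup_aer_uncorrect_error_status() in the patch above can be modeled in user space. The sketch below is not driver code: the function name aer_uncor_bits_to_clear and its parameters are hypothetical, and it assumes the usual AER convention that a set bit in the Uncorrectable Error Severity register marks that error as fatal and that the status register is cleared by writing 1s to it.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * User-space model of the mask logic in cleanup_aer_uncorrect_error_status().
 *
 * status:         value read from the Uncorrectable Error Status register
 * severity:       value read from the Uncorrectable Error Severity register
 *                 (assumed: a set bit marks that error as fatal)
 * channel_normal: true when dev->error_state == pci_channel_io_normal
 *
 * Returns the value the driver would write back to the status register.
 */
uint32_t aer_uncor_bits_to_clear(uint32_t status, uint32_t severity,
				 bool channel_normal)
{
	if (channel_normal)
		return status & ~severity;	/* pending non-fatal errors */

	return status & severity;		/* pending fatal errors */
}
```

With status 0x00101010 and severity 0x00001000, the non-fatal path selects 0x00100010 and the fatal path selects 0x00001000, so each path only acknowledges the class of error it is handling.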

* [PATCH net-next 4/5] liquidio VF timestamp
From: Raghu Vatsavayi @ 2016-12-08 21:00 UTC (permalink / raw)
  To: davem
  Cc: netdev, Raghu Vatsavayi, Raghu Vatsavayi, Derek Chickles,
	Satanand Burla, Felix Manlunas
In-Reply-To: <1481230848-2393-1-git-send-email-rvatsavayi@caviumnetworks.com>

Adds support for VF timestamp.

Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@caviumnetworks.com>
Signed-off-by: Derek Chickles <derek.chickles@caviumnetworks.com>
Signed-off-by: Satanand Burla <satananda.burla@caviumnetworks.com>
Signed-off-by: Felix Manlunas <felix.manlunas@caviumnetworks.com>
---
 drivers/net/ethernet/cavium/liquidio/lio_vf_main.c | 180 ++++++++++++++++++++-
 1 file changed, 179 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
index 97e9b6b..dc0e1f6 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
@@ -42,6 +42,7 @@
 #define   LIO_IFSTATE_DROQ_OPS             0x01
 #define   LIO_IFSTATE_REGISTERED           0x02
 #define   LIO_IFSTATE_RUNNING              0x04
+#define   LIO_IFSTATE_RX_TIMESTAMP_ENABLED 0x08
 
 struct liquidio_if_cfg_context {
 	int octeon_id;
@@ -65,6 +66,12 @@ struct liquidio_rx_ctl_context {
 	int cond;
 };
 
+struct oct_timestamp_resp {
+	u64 rh;
+	u64 timestamp;
+	u64 status;
+};
+
 union tx_info {
 	u64 u64;
 	struct {
@@ -1894,6 +1901,169 @@ static int liquidio_change_mtu(struct net_device *netdev, int new_mtu)
 	return 0;
 }
 
+/**
+ * \brief Handler for SIOCSHWTSTAMP ioctl
+ * @param netdev network device
+ * @param ifr interface request
+ * @param cmd command
+ */
+static int hwtstamp_ioctl(struct net_device *netdev, struct ifreq *ifr)
+{
+	struct lio *lio = GET_LIO(netdev);
+	struct hwtstamp_config conf;
+
+	if (copy_from_user(&conf, ifr->ifr_data, sizeof(conf)))
+		return -EFAULT;
+
+	if (conf.flags)
+		return -EINVAL;
+
+	switch (conf.tx_type) {
+	case HWTSTAMP_TX_ON:
+	case HWTSTAMP_TX_OFF:
+		break;
+	default:
+		return -ERANGE;
+	}
+
+	switch (conf.rx_filter) {
+	case HWTSTAMP_FILTER_NONE:
+		break;
+	case HWTSTAMP_FILTER_ALL:
+	case HWTSTAMP_FILTER_SOME:
+	case HWTSTAMP_FILTER_PTP_V1_L4_EVENT:
+	case HWTSTAMP_FILTER_PTP_V1_L4_SYNC:
+	case HWTSTAMP_FILTER_PTP_V1_L4_DELAY_REQ:
+	case HWTSTAMP_FILTER_PTP_V2_L4_EVENT:
+	case HWTSTAMP_FILTER_PTP_V2_L4_SYNC:
+	case HWTSTAMP_FILTER_PTP_V2_L4_DELAY_REQ:
+	case HWTSTAMP_FILTER_PTP_V2_L2_EVENT:
+	case HWTSTAMP_FILTER_PTP_V2_L2_SYNC:
+	case HWTSTAMP_FILTER_PTP_V2_L2_DELAY_REQ:
+	case HWTSTAMP_FILTER_PTP_V2_EVENT:
+	case HWTSTAMP_FILTER_PTP_V2_SYNC:
+	case HWTSTAMP_FILTER_PTP_V2_DELAY_REQ:
+		conf.rx_filter = HWTSTAMP_FILTER_ALL;
+		break;
+	default:
+		return -ERANGE;
+	}
+
+	if (conf.rx_filter == HWTSTAMP_FILTER_ALL)
+		ifstate_set(lio, LIO_IFSTATE_RX_TIMESTAMP_ENABLED);
+
+	else
+		ifstate_reset(lio, LIO_IFSTATE_RX_TIMESTAMP_ENABLED);
+
+	return copy_to_user(ifr->ifr_data, &conf, sizeof(conf)) ? -EFAULT : 0;
+}
+
+/**
+ * \brief ioctl handler
+ * @param netdev network device
+ * @param ifr interface request
+ * @param cmd command
+ */
+static int liquidio_ioctl(struct net_device *netdev, struct ifreq *ifr, int cmd)
+{
+	switch (cmd) {
+	case SIOCSHWTSTAMP:
+		return hwtstamp_ioctl(netdev, ifr);
+	default:
+		return -EOPNOTSUPP;
+	}
+}
+
+static void handle_timestamp(struct octeon_device *oct, u32 status, void *buf)
+{
+	struct sk_buff *skb = (struct sk_buff *)buf;
+	struct octnet_buf_free_info *finfo;
+	struct oct_timestamp_resp *resp;
+	struct octeon_soft_command *sc;
+	struct lio *lio;
+
+	finfo = (struct octnet_buf_free_info *)skb->cb;
+	lio = finfo->lio;
+	sc = finfo->sc;
+	oct = lio->oct_dev;
+	resp = (struct oct_timestamp_resp *)sc->virtrptr;
+
+	if (status != OCTEON_REQUEST_DONE) {
+		dev_err(&oct->pci_dev->dev, "Tx timestamp instruction failed. Status: %llx\n",
+			CVM_CAST64(status));
+		resp->timestamp = 0;
+	}
+
+	octeon_swap_8B_data(&resp->timestamp, 1);
+
+	if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_IN_PROGRESS)) {
+		struct skb_shared_hwtstamps ts;
+		u64 ns = resp->timestamp;
+
+		netif_info(lio, tx_done, lio->netdev,
+			   "Got resulting SKBTX_HW_TSTAMP skb=%p ns=%016llu\n",
+			   skb, (unsigned long long)ns);
+		ts.hwtstamp = ns_to_ktime(ns + lio->ptp_adjust);
+		skb_tstamp_tx(skb, &ts);
+	}
+
+	octeon_free_soft_command(oct, sc);
+	tx_buffer_free(skb);
+}
+
+/** \brief Send a data packet that will be timestamped
+ * @param oct octeon device
+ * @param ndata pointer to network data
+ * @param finfo pointer to private network data
+ */
+static int send_nic_timestamp_pkt(struct octeon_device *oct,
+				  struct octnic_data_pkt *ndata,
+				  struct octnet_buf_free_info *finfo)
+{
+	struct octeon_soft_command *sc;
+	int ring_doorbell;
+	struct lio *lio;
+	int retval;
+	u32 len;
+
+	lio = finfo->lio;
+
+	sc = octeon_alloc_soft_command_resp(oct, &ndata->cmd,
+					    sizeof(struct oct_timestamp_resp));
+	finfo->sc = sc;
+
+	if (!sc) {
+		dev_err(&oct->pci_dev->dev, "No memory for timestamped data packet\n");
+		return IQ_SEND_FAILED;
+	}
+
+	if (ndata->reqtype == REQTYPE_NORESP_NET)
+		ndata->reqtype = REQTYPE_RESP_NET;
+	else if (ndata->reqtype == REQTYPE_NORESP_NET_SG)
+		ndata->reqtype = REQTYPE_RESP_NET_SG;
+
+	sc->callback = handle_timestamp;
+	sc->callback_arg = finfo->skb;
+	sc->iq_no = ndata->q_no;
+
+	len = (u32)((struct octeon_instr_ih3 *)(&sc->cmd.cmd3.ih3))->dlengsz;
+
+	ring_doorbell = 1;
+
+	retval = octeon_send_command(oct, sc->iq_no, ring_doorbell, &sc->cmd,
+				     sc, len, ndata->reqtype);
+
+	if (retval == IQ_SEND_FAILED) {
+		dev_err(&oct->pci_dev->dev, "timestamp data packet failed status: %x\n",
+			retval);
+		octeon_free_soft_command(oct, sc);
+	} else {
+		netif_info(lio, tx_queued, lio->netdev, "Queued timestamp packet\n");
+	}
+
+	return retval;
+}
+
 /** \brief Transmit networks packets to the Octeon interface
  * @param skbuff   skbuff struct to be passed to network layer.
  * @param netdev   pointer to network device
@@ -1986,6 +2156,10 @@ static int liquidio_xmit(struct sk_buff *skb, struct net_device *netdev)
 			cmdsetup.s.transport_csum = 1;
 		}
 	}
+	if (unlikely(skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP)) {
+		skb_shinfo(skb)->tx_flags |= SKBTX_IN_PROGRESS;
+		cmdsetup.s.timestamp = 1;
+	}
 
 	if (!skb_shinfo(skb)->nr_frags) {
 		cmdsetup.s.u.datasize = skb->len;
@@ -2110,7 +2284,10 @@ static int liquidio_xmit(struct sk_buff *skb, struct net_device *netdev)
 		irh->vlan = skb_vlan_tag_get(skb) & VLAN_VID_MASK;
 	}
 
-	status = octnet_send_nic_data_pkt(oct, &ndata);
+	if (unlikely(cmdsetup.s.timestamp))
+		status = send_nic_timestamp_pkt(oct, &ndata, finfo);
+	else
+		status = octnet_send_nic_data_pkt(oct, &ndata);
 	if (status == IQ_SEND_FAILED)
 		goto lio_xmit_failed;
 
@@ -2382,6 +2559,7 @@ static void liquidio_del_vxlan_port(struct net_device *netdev,
 	.ndo_vlan_rx_add_vid    = liquidio_vlan_rx_add_vid,
 	.ndo_vlan_rx_kill_vid   = liquidio_vlan_rx_kill_vid,
 	.ndo_change_mtu		= liquidio_change_mtu,
+	.ndo_do_ioctl		= liquidio_ioctl,
 	.ndo_fix_features	= liquidio_fix_features,
 	.ndo_set_features	= liquidio_set_features,
 	.ndo_udp_tunnel_add     = liquidio_add_vxlan_port,
-- 
1.8.3.1

^ permalink raw reply related
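
For reference, the SIOCSHWTSTAMP path added by the patch above would be exercised from user space roughly as follows. This is a minimal sketch, not part of the series: the helper names fill_hwtstamp_config and set_hwtstamp are hypothetical, and the caller is assumed to supply an open socket (for example AF_INET/SOCK_DGRAM) and the VF interface name.

```c
#define _DEFAULT_SOURCE
#include <string.h>
#include <net/if.h>
#include <sys/ioctl.h>
#include <linux/net_tstamp.h>
#include <linux/sockios.h>

/* Fill cfg with a request for TX timestamps and the given RX filter. */
void fill_hwtstamp_config(struct hwtstamp_config *cfg, int rx_filter)
{
	/* flags stays 0; hwtstamp_ioctl() in the patch rejects non-zero flags */
	memset(cfg, 0, sizeof(*cfg));
	cfg->tx_type = HWTSTAMP_TX_ON;
	cfg->rx_filter = rx_filter;
}

/*
 * Issue SIOCSHWTSTAMP on an already opened socket.
 * Returns 0 on success, or -1 with errno set by ioctl().
 * On success the driver has copied the applied configuration back into cfg.
 */
int set_hwtstamp(int sock, const char *ifname, struct hwtstamp_config *cfg)
{
	struct ifreq ifr;

	memset(&ifr, 0, sizeof(ifr));
	strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);
	ifr.ifr_data = (char *)cfg;

	return ioctl(sock, SIOCSHWTSTAMP, &ifr);
}
```

Going by the switch statement in hwtstamp_ioctl(), a request for a specific PTP filter such as HWTSTAMP_FILTER_PTP_V2_EVENT should come back with rx_filter widened to HWTSTAMP_FILTER_ALL, so callers should read cfg after the call rather than assume the requested filter was applied.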

* [PATCH net-next 3/5] liquidio VF ethtool stats
From: Raghu Vatsavayi @ 2016-12-08 21:00 UTC (permalink / raw)
  To: davem
  Cc: netdev, Raghu Vatsavayi, Raghu Vatsavayi, Derek Chickles,
	Satanand Burla, Felix Manlunas
In-Reply-To: <1481230848-2393-1-git-send-email-rvatsavayi@caviumnetworks.com>

Adds support for VF ethtool stats.

Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@caviumnetworks.com>
Signed-off-by: Derek Chickles <derek.chickles@caviumnetworks.com>
Signed-off-by: Satanand Burla <satananda.burla@caviumnetworks.com>
Signed-off-by: Felix Manlunas <felix.manlunas@caviumnetworks.com>
---
 .../ethernet/cavium/liquidio/cn23xx_vf_device.h    |   2 +
 drivers/net/ethernet/cavium/liquidio/lio_ethtool.c | 512 +++++++++++++++++----
 drivers/net/ethernet/cavium/liquidio/lio_vf_main.c |  79 ++++
 3 files changed, 495 insertions(+), 98 deletions(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.h b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.h
index 6715df3..3f98c73 100644
--- a/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.h
+++ b/drivers/net/ethernet/cavium/liquidio/cn23xx_vf_device.h
@@ -44,5 +44,7 @@ struct octeon_cn23xx_vf {
 
 int cn23xx_setup_octeon_vf_device(struct octeon_device *oct);
 
+u32 cn23xx_vf_get_oq_ticks(struct octeon_device *oct, u32 time_intr_in_us);
+
 void cn23xx_dump_vf_initialized_regs(struct octeon_device *oct);
 #endif
diff --git a/drivers/net/ethernet/cavium/liquidio/lio_ethtool.c b/drivers/net/ethernet/cavium/liquidio/lio_ethtool.c
index e233796..b00c300 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_ethtool.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_ethtool.c
@@ -29,6 +29,7 @@
 #include "cn66xx_regs.h"
 #include "cn66xx_device.h"
 #include "cn23xx_pf_device.h"
+#include "cn23xx_vf_device.h"
 
 static int octnet_get_link_stats(struct net_device *netdev);
 
@@ -72,6 +73,7 @@ enum {
 
 #define OCT_ETHTOOL_REGDUMP_LEN  4096
 #define OCT_ETHTOOL_REGDUMP_LEN_23XX  (4096 * 11)
+#define OCT_ETHTOOL_REGDUMP_LEN_23XX_VF  (4096 * 2)
 #define OCT_ETHTOOL_REGSVER  1
 
 /* statistics of PF */
@@ -147,6 +149,19 @@ enum {
 	"link_state_changes",
 };
 
+/* statistics of VF */
+static const char oct_vf_stats_strings[][ETH_GSTRING_LEN] = {
+	"rx_packets",
+	"tx_packets",
+	"rx_bytes",
+	"tx_bytes",
+	"rx_errors", /* jabber_err + l2_err+frame_err */
+	"tx_errors", /* fw_err_pko + fw_err_link+fw_err_drop */
+	"rx_dropped", /* total_rcvd - fw_total_rcvd + dmac_drop + fw_err_drop */
+	"tx_dropped",
+	"link_state_changes",
+};
+
 /* statistics of host tx queue */
 static const char oct_iq_stats_strings[][ETH_GSTRING_LEN] = {
 	"packets",		/*oct->instr_queue[iq_no]->stats.tx_done*/
@@ -192,25 +207,28 @@ enum {
 #define OCTNIC_NCMD_AUTONEG_ON  0x1
 #define OCTNIC_NCMD_PHY_ON      0x2
 
-static int lio_get_settings(struct net_device *netdev, struct ethtool_cmd *ecmd)
+static int lio_get_link_ksettings(struct net_device *netdev,
+				  struct ethtool_link_ksettings *ecmd)
 {
 	struct lio *lio = GET_LIO(netdev);
 	struct octeon_device *oct = lio->oct_dev;
 	struct oct_link_info *linfo;
+	u32 supported, advertising;
 
 	linfo = &lio->linfo;
 
 	if (linfo->link.s.if_mode == INTERFACE_MODE_XAUI ||
 	    linfo->link.s.if_mode == INTERFACE_MODE_RXAUI ||
 	    linfo->link.s.if_mode == INTERFACE_MODE_XFI) {
-		ecmd->port = PORT_FIBRE;
-		ecmd->supported =
-			(SUPPORTED_10000baseT_Full | SUPPORTED_FIBRE |
-			 SUPPORTED_Pause);
-		ecmd->advertising =
-			(ADVERTISED_10000baseT_Full | ADVERTISED_Pause);
-		ecmd->transceiver = XCVR_EXTERNAL;
-		ecmd->autoneg = AUTONEG_DISABLE;
+		ecmd->base.port = PORT_FIBRE;
+		supported = (SUPPORTED_10000baseT_Full | SUPPORTED_FIBRE |
+			     SUPPORTED_Pause);
+		advertising = (ADVERTISED_10000baseT_Full | ADVERTISED_Pause);
+		ethtool_convert_legacy_u32_to_link_mode(
+			ecmd->link_modes.supported, supported);
+		ethtool_convert_legacy_u32_to_link_mode(
+			ecmd->link_modes.advertising, advertising);
+		ecmd->base.autoneg = AUTONEG_DISABLE;
 
 	} else {
 		dev_err(&oct->pci_dev->dev, "Unknown link interface reported %d\n",
@@ -218,11 +236,11 @@ static int lio_get_settings(struct net_device *netdev, struct ethtool_cmd *ecmd)
 	}
 
 	if (linfo->link.s.link_up) {
-		ethtool_cmd_speed_set(ecmd, linfo->link.s.speed);
-		ecmd->duplex = linfo->link.s.duplex;
+		ecmd->base.speed = linfo->link.s.speed;
+		ecmd->base.duplex = linfo->link.s.duplex;
 	} else {
-		ethtool_cmd_speed_set(ecmd, SPEED_UNKNOWN);
-		ecmd->duplex = DUPLEX_UNKNOWN;
+		ecmd->base.speed = SPEED_UNKNOWN;
+		ecmd->base.duplex = DUPLEX_UNKNOWN;
 	}
 
 	return 0;
@@ -246,6 +264,23 @@ static int lio_get_settings(struct net_device *netdev, struct ethtool_cmd *ecmd)
 }
 
 static void
+lio_get_vf_drvinfo(struct net_device *netdev, struct ethtool_drvinfo *drvinfo)
+{
+	struct octeon_device *oct;
+	struct lio *lio;
+
+	lio = GET_LIO(netdev);
+	oct = lio->oct_dev;
+
+	memset(drvinfo, 0, sizeof(struct ethtool_drvinfo));
+	strcpy(drvinfo->driver, "liquidio_vf");
+	strcpy(drvinfo->version, LIQUIDIO_VERSION);
+	strncpy(drvinfo->fw_version, oct->fw_info.liquidio_firmware_version,
+		ETHTOOL_FWVERS_LEN);
+	strncpy(drvinfo->bus_info, pci_name(oct->pci_dev), 32);
+}
+
+static void
 lio_ethtool_get_channels(struct net_device *dev,
 			 struct ethtool_channels *channel)
 {
@@ -982,6 +1017,109 @@ static void lio_set_msglevel(struct net_device *netdev, u32 msglvl)
 	}
 }
 
+static void lio_vf_get_ethtool_stats(struct net_device *netdev,
+				     struct ethtool_stats *stats
+				     __attribute__((unused)),
+				     u64 *data)
+{
+	struct net_device_stats *netstats = &netdev->stats;
+	struct lio *lio = GET_LIO(netdev);
+	struct octeon_device *oct_dev = lio->oct_dev;
+	int i = 0, j, vj;
+
+	netdev->netdev_ops->ndo_get_stats(netdev);
+	/* sum of oct->droq[oq_no]->stats->rx_pkts_received */
+	data[i++] = CVM_CAST64(netstats->rx_packets);
+	/* sum of oct->instr_queue[iq_no]->stats.tx_done */
+	data[i++] = CVM_CAST64(netstats->tx_packets);
+	/* sum of oct->droq[oq_no]->stats->rx_bytes_received */
+	data[i++] = CVM_CAST64(netstats->rx_bytes);
+	/* sum of oct->instr_queue[iq_no]->stats.tx_tot_bytes */
+	data[i++] = CVM_CAST64(netstats->tx_bytes);
+	data[i++] = CVM_CAST64(netstats->rx_errors);
+	data[i++] = CVM_CAST64(netstats->tx_errors);
+	 /* sum of oct->droq[oq_no]->stats->rx_dropped +
+	  * oct->droq[oq_no]->stats->dropped_nodispatch +
+	  * oct->droq[oq_no]->stats->dropped_toomany +
+	  * oct->droq[oq_no]->stats->dropped_nomem
+	  */
+	data[i++] = CVM_CAST64(netstats->rx_dropped);
+	/* sum of oct->instr_queue[iq_no]->stats.tx_dropped */
+	data[i++] = CVM_CAST64(netstats->tx_dropped);
+	/* lio->link_changes */
+	data[i++] = CVM_CAST64(lio->link_changes);
+
+	for (vj = 0; vj < lio->linfo.num_txpciq; vj++) {
+		j = lio->linfo.txpciq[vj].s.q_no;
+
+		/* packets to network port */
+		/* # of packets tx to network */
+		data[i++] = CVM_CAST64(oct_dev->instr_queue[j]->stats.tx_done);
+		 /* # of bytes tx to network */
+		data[i++] = CVM_CAST64(
+				oct_dev->instr_queue[j]->stats.tx_tot_bytes);
+		/* # of packets dropped */
+		data[i++] = CVM_CAST64(
+				oct_dev->instr_queue[j]->stats.tx_dropped);
+		/* # of tx fails due to queue full */
+		data[i++] = CVM_CAST64(
+				oct_dev->instr_queue[j]->stats.tx_iq_busy);
+		/* XXX gather entries sent */
+		data[i++] = CVM_CAST64(
+				oct_dev->instr_queue[j]->stats.sgentry_sent);
+
+		/* instruction to firmware: data and control */
+		/* # of instructions to the queue */
+		data[i++] = CVM_CAST64(
+				oct_dev->instr_queue[j]->stats.instr_posted);
+		/* # of instructions processed */
+		data[i++] =
+		    CVM_CAST64(oct_dev->instr_queue[j]->stats.instr_processed);
+		/* # of instructions could not be processed */
+		data[i++] =
+		    CVM_CAST64(oct_dev->instr_queue[j]->stats.instr_dropped);
+		/* bytes sent through the queue */
+		data[i++] = CVM_CAST64(
+				oct_dev->instr_queue[j]->stats.bytes_sent);
+		/* tso request */
+		data[i++] = CVM_CAST64(oct_dev->instr_queue[j]->stats.tx_gso);
+		/* vxlan request */
+		data[i++] = CVM_CAST64(oct_dev->instr_queue[j]->stats.tx_vxlan);
+		/* txq restart */
+		data[i++] = CVM_CAST64(
+				oct_dev->instr_queue[j]->stats.tx_restart);
+	}
+
+	/* RX */
+	for (vj = 0; vj < lio->linfo.num_rxpciq; vj++) {
+		j = lio->linfo.rxpciq[vj].s.q_no;
+
+		/* packets sent to TCP/IP network stack */
+		/* # of packets to network stack */
+		data[i++] = CVM_CAST64(
+				oct_dev->droq[j]->stats.rx_pkts_received);
+		/* # of bytes to network stack */
+		data[i++] = CVM_CAST64(
+				oct_dev->droq[j]->stats.rx_bytes_received);
+		data[i++] = CVM_CAST64(oct_dev->droq[j]->stats.dropped_nomem +
+				       oct_dev->droq[j]->stats.dropped_toomany +
+				       oct_dev->droq[j]->stats.rx_dropped);
+		data[i++] = CVM_CAST64(oct_dev->droq[j]->stats.dropped_nomem);
+		data[i++] = CVM_CAST64(oct_dev->droq[j]->stats.dropped_toomany);
+		data[i++] = CVM_CAST64(oct_dev->droq[j]->stats.rx_dropped);
+
+		/* control and data path */
+		data[i++] = CVM_CAST64(oct_dev->droq[j]->stats.pkts_received);
+		data[i++] = CVM_CAST64(oct_dev->droq[j]->stats.bytes_received);
+		data[i++] =
+			CVM_CAST64(oct_dev->droq[j]->stats.dropped_nodispatch);
+
+		data[i++] = CVM_CAST64(oct_dev->droq[j]->stats.rx_vxlan);
+		data[i++] =
+		    CVM_CAST64(oct_dev->droq[j]->stats.rx_alloc_failure);
+	}
+}
+
 static void lio_get_priv_flags_strings(struct lio *lio, u8 *data)
 {
 	struct octeon_device *oct_dev = lio->oct_dev;
@@ -989,6 +1127,7 @@ static void lio_get_priv_flags_strings(struct lio *lio, u8 *data)
 
 	switch (oct_dev->chip_id) {
 	case OCTEON_CN23XX_PF_VID:
+	case OCTEON_CN23XX_VF_VID:
 		for (i = 0; i < ARRAY_SIZE(oct_priv_flags_strings); i++) {
 			sprintf(data, "%s", oct_priv_flags_strings[i]);
 			data += ETH_GSTRING_LEN;
@@ -1050,12 +1189,61 @@ static void lio_get_strings(struct net_device *netdev, u32 stringset, u8 *data)
 	}
 }
 
+static void lio_vf_get_strings(struct net_device *netdev, u32 stringset,
+			       u8 *data)
+{
+	int num_iq_stats, num_oq_stats, i, j;
+	struct lio *lio = GET_LIO(netdev);
+	struct octeon_device *oct_dev = lio->oct_dev;
+	int num_stats;
+
+	switch (stringset) {
+	case ETH_SS_STATS:
+		num_stats = ARRAY_SIZE(oct_vf_stats_strings);
+		for (j = 0; j < num_stats; j++) {
+			sprintf(data, "%s", oct_vf_stats_strings[j]);
+			data += ETH_GSTRING_LEN;
+		}
+
+		num_iq_stats = ARRAY_SIZE(oct_iq_stats_strings);
+		for (i = 0; i < MAX_OCTEON_INSTR_QUEUES(oct_dev); i++) {
+			if (!(oct_dev->io_qmask.iq & BIT_ULL(i)))
+				continue;
+			for (j = 0; j < num_iq_stats; j++) {
+				sprintf(data, "tx-%d-%s", i,
+					oct_iq_stats_strings[j]);
+				data += ETH_GSTRING_LEN;
+			}
+		}
+
+		num_oq_stats = ARRAY_SIZE(oct_droq_stats_strings);
+		for (i = 0; i < MAX_OCTEON_OUTPUT_QUEUES(oct_dev); i++) {
+			if (!(oct_dev->io_qmask.oq & BIT_ULL(i)))
+				continue;
+			for (j = 0; j < num_oq_stats; j++) {
+				sprintf(data, "rx-%d-%s", i,
+					oct_droq_stats_strings[j]);
+				data += ETH_GSTRING_LEN;
+			}
+		}
+		break;
+
+	case ETH_SS_PRIV_FLAGS:
+		lio_get_priv_flags_strings(lio, data);
+		break;
+	default:
+		netif_info(lio, drv, lio->netdev, "Unknown Stringset !!\n");
+		break;
+	}
+}
+
 static int lio_get_priv_flags_ss_count(struct lio *lio)
 {
 	struct octeon_device *oct_dev = lio->oct_dev;
 
 	switch (oct_dev->chip_id) {
 	case OCTEON_CN23XX_PF_VID:
+	case OCTEON_CN23XX_VF_VID:
 		return ARRAY_SIZE(oct_priv_flags_strings);
 	case OCTEON_CN68XX:
 	case OCTEON_CN66XX:
@@ -1083,6 +1271,23 @@ static int lio_get_sset_count(struct net_device *netdev, int sset)
 	}
 }
 
+static int lio_vf_get_sset_count(struct net_device *netdev, int sset)
+{
+	struct lio *lio = GET_LIO(netdev);
+	struct octeon_device *oct_dev = lio->oct_dev;
+
+	switch (sset) {
+	case ETH_SS_STATS:
+		return (ARRAY_SIZE(oct_vf_stats_strings) +
+			ARRAY_SIZE(oct_iq_stats_strings) * oct_dev->num_iqs +
+			ARRAY_SIZE(oct_droq_stats_strings) * oct_dev->num_oqs);
+	case ETH_SS_PRIV_FLAGS:
+		return lio_get_priv_flags_ss_count(lio);
+	default:
+		return -EOPNOTSUPP;
+	}
+}
+
 static int lio_get_intr_coalesce(struct net_device *netdev,
 				 struct ethtool_coalesce *intr_coal)
 {
@@ -1095,6 +1300,7 @@ static int lio_get_intr_coalesce(struct net_device *netdev,
 
 	switch (oct->chip_id) {
 	case OCTEON_CN23XX_PF_VID:
+	case OCTEON_CN23XX_VF_VID:
 		if (!intrmod_cfg->rx_enable) {
 			intr_coal->rx_coalesce_usecs = intrmod_cfg->rx_usecs;
 			intr_coal->rx_max_coalesced_frames =
@@ -1141,7 +1347,7 @@ static int lio_get_intr_coalesce(struct net_device *netdev,
 		intr_coal->rx_max_coalesced_frames_low =
 		    intrmod_cfg->rx_mincnt_trigger;
 	}
-	if (OCTEON_CN23XX_PF(oct) &&
+	if ((OCTEON_CN23XX_PF(oct) || OCTEON_CN23XX_VF(oct)) &&
 	    (intrmod_cfg->tx_enable)) {
 		intr_coal->use_adaptive_tx_coalesce = intrmod_cfg->tx_enable;
 		intr_coal->tx_max_coalesced_frames_high =
@@ -1499,6 +1705,26 @@ static int oct_cfg_adaptive_intr(struct lio *lio, struct ethtool_coalesce
 		oct->intrmod.rx_frames = rx_max_coalesced_frames;
 		break;
 	}
+	case OCTEON_CN23XX_VF_VID: {
+		int q_no;
+
+		if (!intr_coal->rx_max_coalesced_frames)
+			rx_max_coalesced_frames = oct->intrmod.rx_frames;
+		else
+			rx_max_coalesced_frames =
+			    intr_coal->rx_max_coalesced_frames;
+		for (q_no = 0; q_no < oct->num_oqs; q_no++) {
+			octeon_write_csr64(
+			    oct, CN23XX_VF_SLI_OQ_PKT_INT_LEVELS(q_no),
+			    (octeon_read_csr64(
+				 oct, CN23XX_VF_SLI_OQ_PKT_INT_LEVELS(q_no)) &
+			     (0x3fffff00000000UL)) |
+				rx_max_coalesced_frames);
+			/* consider writing to resend bit here */
+		}
+		oct->intrmod.rx_frames = rx_max_coalesced_frames;
+		break;
+	}
 	default:
 		return -EINVAL;
 	}
@@ -1552,6 +1778,27 @@ static int oct_cfg_rx_intrtime(struct lio *lio,
 		oct->intrmod.rx_usecs = rx_coalesce_usecs;
 		break;
 	}
+	case OCTEON_CN23XX_VF_VID: {
+		u64 time_threshold;
+		int q_no;
+
+		if (!intr_coal->rx_coalesce_usecs)
+			rx_coalesce_usecs = oct->intrmod.rx_usecs;
+		else
+			rx_coalesce_usecs = intr_coal->rx_coalesce_usecs;
+
+		time_threshold =
+		    cn23xx_vf_get_oq_ticks(oct, (u32)rx_coalesce_usecs);
+		for (q_no = 0; q_no < oct->num_oqs; q_no++) {
+			octeon_write_csr64(
+				oct, CN23XX_VF_SLI_OQ_PKT_INT_LEVELS(q_no),
+				(oct->intrmod.rx_frames |
+				 (time_threshold << 32)));
+			/* consider setting resend bit */
+		}
+		oct->intrmod.rx_usecs = rx_coalesce_usecs;
+		break;
+	}
 	default:
 		return -EINVAL;
 	}
@@ -1573,6 +1820,7 @@ static int oct_cfg_rx_intrtime(struct lio *lio,
 	case OCTEON_CN68XX:
 	case OCTEON_CN66XX:
 		break;
+	case OCTEON_CN23XX_VF_VID:
 	case OCTEON_CN23XX_PF_VID: {
 		int q_no;
 
@@ -1631,6 +1879,7 @@ static int lio_set_intr_coalesce(struct net_device *netdev,
 		}
 		break;
 	case OCTEON_CN23XX_PF_VID:
+	case OCTEON_CN23XX_VF_VID:
 		break;
 	default:
 		return -EINVAL;
@@ -1693,86 +1942,6 @@ static int lio_get_ts_info(struct net_device *netdev,
 	return 0;
 }
 
-static int lio_set_settings(struct net_device *netdev, struct ethtool_cmd *ecmd)
-{
-	struct lio *lio = GET_LIO(netdev);
-	struct octeon_device *oct = lio->oct_dev;
-	struct oct_link_info *linfo;
-	struct octnic_ctrl_pkt nctrl;
-	int ret = 0;
-
-	/* get the link info */
-	linfo = &lio->linfo;
-
-	if (ecmd->autoneg != AUTONEG_ENABLE && ecmd->autoneg != AUTONEG_DISABLE)
-		return -EINVAL;
-
-	if (ecmd->autoneg == AUTONEG_DISABLE && ((ecmd->speed != SPEED_100 &&
-						  ecmd->speed != SPEED_10) ||
-						 (ecmd->duplex != DUPLEX_HALF &&
-						  ecmd->duplex != DUPLEX_FULL)))
-		return -EINVAL;
-
-	/* Ethtool Support is not provided for XAUI, RXAUI, and XFI Interfaces
-	 * as they operate at fixed Speed and Duplex settings
-	 */
-	if (linfo->link.s.if_mode == INTERFACE_MODE_XAUI ||
-	    linfo->link.s.if_mode == INTERFACE_MODE_RXAUI ||
-	    linfo->link.s.if_mode == INTERFACE_MODE_XFI) {
-		dev_info(&oct->pci_dev->dev,
-			 "Autonegotiation, duplex and speed settings cannot be modified.\n");
-		return -EINVAL;
-	}
-
-	memset(&nctrl, 0, sizeof(struct octnic_ctrl_pkt));
-
-	nctrl.ncmd.u64 = 0;
-	nctrl.ncmd.s.cmd = OCTNET_CMD_SET_SETTINGS;
-	nctrl.iq_no = lio->linfo.txpciq[0].s.q_no;
-	nctrl.wait_time = 1000;
-	nctrl.netpndev = (u64)netdev;
-	nctrl.cb_fn = liquidio_link_ctrl_cmd_completion;
-
-	/* Passing the parameters sent by ethtool like Speed, Autoneg & Duplex
-	 * to SE core application using ncmd.s.more & ncmd.s.param
-	 */
-	if (ecmd->autoneg == AUTONEG_ENABLE) {
-		/* Autoneg ON */
-		nctrl.ncmd.s.more = OCTNIC_NCMD_PHY_ON |
-				     OCTNIC_NCMD_AUTONEG_ON;
-		nctrl.ncmd.s.param1 = ecmd->advertising;
-	} else {
-		/* Autoneg OFF */
-		nctrl.ncmd.s.more = OCTNIC_NCMD_PHY_ON;
-
-		nctrl.ncmd.s.param2 = ecmd->duplex;
-
-		nctrl.ncmd.s.param1 = ecmd->speed;
-	}
-
-	ret = octnet_send_nic_ctrl_pkt(lio->oct_dev, &nctrl);
-	if (ret < 0) {
-		dev_err(&oct->pci_dev->dev, "Failed to set settings\n");
-		return -1;
-	}
-
-	return 0;
-}
-
-static int lio_nway_reset(struct net_device *netdev)
-{
-	if (netif_running(netdev)) {
-		struct ethtool_cmd ecmd;
-
-		memset(&ecmd, 0, sizeof(struct ethtool_cmd));
-		ecmd.autoneg = 0;
-		ecmd.speed = 0;
-		ecmd.duplex = 0;
-		lio_set_settings(netdev, &ecmd);
-	}
-	return 0;
-}
-
 /* Return register dump len. */
 static int lio_get_regs_len(struct net_device *dev)
 {
@@ -1782,6 +1951,8 @@ static int lio_get_regs_len(struct net_device *dev)
 	switch (oct->chip_id) {
 	case OCTEON_CN23XX_PF_VID:
 		return OCT_ETHTOOL_REGDUMP_LEN_23XX;
+	case OCTEON_CN23XX_VF_VID:
+		return OCT_ETHTOOL_REGDUMP_LEN_23XX_VF;
 	default:
 		return OCT_ETHTOOL_REGDUMP_LEN;
 	}
@@ -2007,6 +2178,123 @@ static int cn23xx_read_csr_reg(char *s, struct octeon_device *oct)
 	return len;
 }
 
+static int cn23xx_vf_read_csr_reg(char *s, struct octeon_device *oct)
+{
+	int len = 0;
+	u32 reg;
+	int i;
+
+	/* PCI  Window Registers */
+
+	len += sprintf(s + len, "\n\t Octeon CSR Registers\n\n");
+
+	for (i = 0; i < (oct->sriov_info.rings_per_vf); i++) {
+		reg = CN23XX_VF_SLI_OQ_BUFF_INFO_SIZE(i);
+		len += sprintf(s + len,
+			       "\n[%08x] (SLI_PKT%d_OUT_SIZE): %016llx\n",
+			       reg, i, (u64)octeon_read_csr64(oct, reg));
+	}
+
+	for (i = 0; i < (oct->sriov_info.rings_per_vf); i++) {
+		reg = CN23XX_VF_SLI_IQ_INSTR_COUNT64(i);
+		len += sprintf(s + len,
+			       "\n[%08x] (SLI_PKT_IN_DONE%d_CNTS): %016llx\n",
+			       reg, i, (u64)octeon_read_csr64(oct, reg));
+	}
+
+	for (i = 0; i < (oct->sriov_info.rings_per_vf); i++) {
+		reg = CN23XX_VF_SLI_OQ_PKTS_CREDIT(i);
+		len += sprintf(s + len,
+			       "\n[%08x] (SLI_PKT%d_SLIST_BAOFF_DBELL): %016llx\n",
+			       reg, i, (u64)octeon_read_csr64(oct, reg));
+	}
+
+	for (i = 0; i < (oct->sriov_info.rings_per_vf); i++) {
+		reg = CN23XX_VF_SLI_OQ_SIZE(i);
+		len += sprintf(s + len,
+			       "\n[%08x] (SLI_PKT%d_SLIST_FIFO_RSIZE): %016llx\n",
+			       reg, i, (u64)octeon_read_csr64(oct, reg));
+	}
+
+	for (i = 0; i < (oct->sriov_info.rings_per_vf); i++) {
+		reg = CN23XX_VF_SLI_OQ_PKT_CONTROL(i);
+		len += sprintf(s + len,
+			       "\n[%08x] (SLI_PKT%d__OUTPUT_CONTROL): %016llx\n",
+			       reg, i, (u64)octeon_read_csr64(oct, reg));
+	}
+
+	for (i = 0; i < (oct->sriov_info.rings_per_vf); i++) {
+		reg = CN23XX_VF_SLI_OQ_BASE_ADDR64(i);
+		len += sprintf(s + len,
+			       "\n[%08x] (SLI_PKT%d_SLIST_BADDR): %016llx\n",
+			       reg, i, (u64)octeon_read_csr64(oct, reg));
+	}
+
+	for (i = 0; i < (oct->sriov_info.rings_per_vf); i++) {
+		reg = CN23XX_VF_SLI_OQ_PKT_INT_LEVELS(i);
+		len += sprintf(s + len,
+			       "\n[%08x] (SLI_PKT%d_INT_LEVELS): %016llx\n",
+			       reg, i, (u64)octeon_read_csr64(oct, reg));
+	}
+
+	for (i = 0; i < (oct->sriov_info.rings_per_vf); i++) {
+		reg = CN23XX_VF_SLI_OQ_PKTS_SENT(i);
+		len += sprintf(s + len, "\n[%08x] (SLI_PKT%d_CNTS): %016llx\n",
+			       reg, i, (u64)octeon_read_csr64(oct, reg));
+	}
+
+	for (i = 0; i < (oct->sriov_info.rings_per_vf); i++) {
+		reg = 0x100c0 + i * CN23XX_VF_OQ_OFFSET;
+		len += sprintf(s + len,
+			       "\n[%08x] (SLI_PKT%d_ERROR_INFO): %016llx\n",
+			       reg, i, (u64)octeon_read_csr64(oct, reg));
+	}
+
+	for (i = 0; i < (oct->sriov_info.rings_per_vf); i++) {
+		reg = 0x100d0 + i * CN23XX_VF_IQ_OFFSET;
+		len += sprintf(s + len,
+			       "\n[%08x] (SLI_PKT%d_VF_INT_SUM): %016llx\n",
+			       reg, i, (u64)octeon_read_csr64(oct, reg));
+	}
+
+	for (i = 0; i < (oct->sriov_info.rings_per_vf); i++) {
+		reg = CN23XX_VF_SLI_IQ_PKT_CONTROL64(i);
+		len += sprintf(s + len,
+			       "\n[%08x] (SLI_PKT%d_INPUT_CONTROL): %016llx\n",
+			       reg, i, (u64)octeon_read_csr64(oct, reg));
+	}
+
+	for (i = 0; i < (oct->sriov_info.rings_per_vf); i++) {
+		reg = CN23XX_VF_SLI_IQ_BASE_ADDR64(i);
+		len += sprintf(s + len,
+			       "\n[%08x] (SLI_PKT%d_INSTR_BADDR): %016llx\n",
+			       reg, i, (u64)octeon_read_csr64(oct, reg));
+	}
+
+	for (i = 0; i < (oct->sriov_info.rings_per_vf); i++) {
+		reg = CN23XX_VF_SLI_IQ_DOORBELL(i);
+		len += sprintf(s + len,
+			       "\n[%08x] (SLI_PKT%d_INSTR_BAOFF_DBELL): %016llx\n",
+			       reg, i, (u64)octeon_read_csr64(oct, reg));
+	}
+
+	for (i = 0; i < (oct->sriov_info.rings_per_vf); i++) {
+		reg = CN23XX_VF_SLI_IQ_SIZE(i);
+		len += sprintf(s + len,
+			       "\n[%08x] (SLI_PKT%d_INSTR_FIFO_RSIZE): %016llx\n",
+			       reg, i, (u64)octeon_read_csr64(oct, reg));
+	}
+
+	for (i = 0; i < (oct->sriov_info.rings_per_vf); i++) {
+		reg = CN23XX_VF_SLI_IQ_INSTR_COUNT64(i);
+		len += sprintf(s + len,
+			       "\n[%08x] (SLI_PKT_IN_DONE%d_CNTS): %016llx\n",
+			       reg, i, (u64)octeon_read_csr64(oct, reg));
+	}
+
+	return len;
+}
+
 static int cn6xxx_read_csr_reg(char *s, struct octeon_device *oct)
 {
 	u32 reg;
@@ -2153,6 +2441,10 @@ static void lio_get_regs(struct net_device *dev,
 		memset(regbuf, 0, OCT_ETHTOOL_REGDUMP_LEN_23XX);
 		len += cn23xx_read_csr_reg(regbuf + len, oct);
 		break;
+	case OCTEON_CN23XX_VF_VID:
+		memset(regbuf, 0, OCT_ETHTOOL_REGDUMP_LEN_23XX_VF);
+		len += cn23xx_vf_read_csr_reg(regbuf + len, oct);
+		break;
 	case OCTEON_CN68XX:
 	case OCTEON_CN66XX:
 		memset(regbuf, 0, OCT_ETHTOOL_REGDUMP_LEN);
@@ -2183,7 +2475,7 @@ static int lio_set_priv_flags(struct net_device *netdev, u32 flags)
 }
 
 static const struct ethtool_ops lio_ethtool_ops = {
-	.get_settings		= lio_get_settings,
+	.get_link_ksettings	= lio_get_link_ksettings,
 	.get_link		= ethtool_op_get_link,
 	.get_drvinfo		= lio_get_drvinfo,
 	.get_ringparam		= lio_ethtool_get_ringparam,
@@ -2200,8 +2492,26 @@ static int lio_set_priv_flags(struct net_device *netdev, u32 flags)
 	.get_msglevel		= lio_get_msglevel,
 	.set_msglevel		= lio_set_msglevel,
 	.get_sset_count		= lio_get_sset_count,
-	.nway_reset		= lio_nway_reset,
-	.set_settings		= lio_set_settings,
+	.get_coalesce		= lio_get_intr_coalesce,
+	.set_coalesce		= lio_set_intr_coalesce,
+	.get_priv_flags		= lio_get_priv_flags,
+	.set_priv_flags		= lio_set_priv_flags,
+	.get_ts_info		= lio_get_ts_info,
+};
+
+static const struct ethtool_ops lio_vf_ethtool_ops = {
+	.get_link_ksettings	= lio_get_link_ksettings,
+	.get_link		= ethtool_op_get_link,
+	.get_drvinfo		= lio_get_vf_drvinfo,
+	.get_ringparam		= lio_ethtool_get_ringparam,
+	.get_channels		= lio_ethtool_get_channels,
+	.get_strings		= lio_vf_get_strings,
+	.get_ethtool_stats	= lio_vf_get_ethtool_stats,
+	.get_regs_len		= lio_get_regs_len,
+	.get_regs		= lio_get_regs,
+	.get_msglevel		= lio_get_msglevel,
+	.set_msglevel		= lio_set_msglevel,
+	.get_sset_count		= lio_vf_get_sset_count,
 	.get_coalesce		= lio_get_intr_coalesce,
 	.set_coalesce		= lio_set_intr_coalesce,
 	.get_priv_flags		= lio_get_priv_flags,
@@ -2211,5 +2521,11 @@ static int lio_set_priv_flags(struct net_device *netdev, u32 flags)
 
 void liquidio_set_ethtool_ops(struct net_device *netdev)
 {
-	netdev->ethtool_ops = &lio_ethtool_ops;
+	struct lio *lio = GET_LIO(netdev);
+	struct octeon_device *oct = lio->oct_dev;
+
+	if (OCTEON_CN23XX_VF(oct))
+		netdev->ethtool_ops = &lio_vf_ethtool_ops;
+	else
+		netdev->ethtool_ops = &lio_ethtool_ops;
 }
diff --git a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
index 5d1023b..97e9b6b 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
@@ -1824,6 +1824,56 @@ static int liquidio_set_mac(struct net_device *netdev, void *p)
 }
 
 /**
+ * \brief Net device get_stats
+ * @param netdev network device
+ */
+static struct net_device_stats *liquidio_get_stats(struct net_device *netdev)
+{
+	struct lio *lio = GET_LIO(netdev);
+	struct net_device_stats *stats = &netdev->stats;
+	u64 pkts = 0, drop = 0, bytes = 0;
+	struct oct_droq_stats *oq_stats;
+	struct oct_iq_stats *iq_stats;
+	struct octeon_device *oct;
+	int i, iq_no, oq_no;
+
+	oct = lio->oct_dev;
+
+	for (i = 0; i < lio->linfo.num_txpciq; i++) {
+		iq_no = lio->linfo.txpciq[i].s.q_no;
+		iq_stats = &oct->instr_queue[iq_no]->stats;
+		pkts += iq_stats->tx_done;
+		drop += iq_stats->tx_dropped;
+		bytes += iq_stats->tx_tot_bytes;
+	}
+
+	stats->tx_packets = pkts;
+	stats->tx_bytes = bytes;
+	stats->tx_dropped = drop;
+
+	pkts = 0;
+	drop = 0;
+	bytes = 0;
+
+	for (i = 0; i < lio->linfo.num_rxpciq; i++) {
+		oq_no = lio->linfo.rxpciq[i].s.q_no;
+		oq_stats = &oct->droq[oq_no]->stats;
+		pkts += oq_stats->rx_pkts_received;
+		drop += (oq_stats->rx_dropped +
+			 oq_stats->dropped_nodispatch +
+			 oq_stats->dropped_toomany +
+			 oq_stats->dropped_nomem);
+		bytes += oq_stats->rx_bytes_received;
+	}
+
+	stats->rx_bytes = bytes;
+	stats->rx_packets = pkts;
+	stats->rx_dropped = drop;
+
+	return stats;
+}
+
+/**
  * \brief Net device change_mtu
  * @param netdev network device
  */
@@ -2325,6 +2375,7 @@ static void liquidio_del_vxlan_port(struct net_device *netdev,
 	.ndo_open		= liquidio_open,
 	.ndo_stop		= liquidio_stop,
 	.ndo_start_xmit		= liquidio_xmit,
+	.ndo_get_stats		= liquidio_get_stats,
 	.ndo_set_mac_address	= liquidio_set_mac,
 	.ndo_set_rx_mode	= liquidio_set_mcast_list,
 	.ndo_tx_timeout		= liquidio_tx_timeout,
@@ -2614,6 +2665,13 @@ static int setup_nic_devices(struct octeon_device *octeon_dev)
 			goto setup_nic_dev_fail;
 		}
 
+		/* Register ethtool support */
+		liquidio_set_ethtool_ops(netdev);
+		if (lio->oct_dev->chip_id == OCTEON_CN23XX_VF_VID)
+			octeon_dev->priv_flags = OCT_PRIV_FLAG_DEFAULT;
+		else
+			octeon_dev->priv_flags = 0x0;
+
 		if (netdev->features & NETIF_F_LRO)
 			liquidio_set_feature(netdev, OCTNET_CMD_LRO_ENABLE,
 					     OCTNIC_LROIPV4 | OCTNIC_LROIPV6);
@@ -2679,6 +2737,7 @@ static int setup_nic_devices(struct octeon_device *octeon_dev)
  */
 static int liquidio_init_nic_module(struct octeon_device *oct)
 {
+	struct oct_intrmod_cfg *intrmod_cfg;
 	int num_nic_ports = 1;
 	int i, retval = 0;
 
@@ -2700,6 +2759,26 @@ static int liquidio_init_nic_module(struct octeon_device *oct)
 		goto octnet_init_failure;
 	}
 
+	/* Initialize interrupt moderation params */
+	intrmod_cfg = &oct->intrmod;
+	intrmod_cfg->rx_enable = 1;
+	intrmod_cfg->check_intrvl = LIO_INTRMOD_CHECK_INTERVAL;
+	intrmod_cfg->maxpkt_ratethr = LIO_INTRMOD_MAXPKT_RATETHR;
+	intrmod_cfg->minpkt_ratethr = LIO_INTRMOD_MINPKT_RATETHR;
+	intrmod_cfg->rx_maxcnt_trigger = LIO_INTRMOD_RXMAXCNT_TRIGGER;
+	intrmod_cfg->rx_maxtmr_trigger = LIO_INTRMOD_RXMAXTMR_TRIGGER;
+	intrmod_cfg->rx_mintmr_trigger = LIO_INTRMOD_RXMINTMR_TRIGGER;
+	intrmod_cfg->rx_mincnt_trigger = LIO_INTRMOD_RXMINCNT_TRIGGER;
+	intrmod_cfg->tx_enable = 1;
+	intrmod_cfg->tx_maxcnt_trigger = LIO_INTRMOD_TXMAXCNT_TRIGGER;
+	intrmod_cfg->tx_mincnt_trigger = LIO_INTRMOD_TXMINCNT_TRIGGER;
+	intrmod_cfg->rx_frames = CFG_GET_OQ_INTR_PKT(octeon_get_conf(oct));
+	intrmod_cfg->rx_usecs = CFG_GET_OQ_INTR_TIME(octeon_get_conf(oct));
+	intrmod_cfg->tx_frames = CFG_GET_IQ_INTR_PKT(octeon_get_conf(oct));
+	dev_dbg(&oct->pci_dev->dev, "Network interfaces ready\n");
+
+	return retval;
+
 octnet_init_failure:
 
 	oct->ifcount = 0;
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH net-next 2/5] liquidio VF vxlan
From: Raghu Vatsavayi @ 2016-12-08 21:00 UTC (permalink / raw)
  To: davem
  Cc: netdev, Raghu Vatsavayi, Raghu Vatsavayi, Derek Chickles,
	Satanand Burla, Felix Manlunas
In-Reply-To: <1481230848-2393-1-git-send-email-rvatsavayi@caviumnetworks.com>

Adds VF vxlan offload support.

Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@caviumnetworks.com>
Signed-off-by: Derek Chickles <derek.chickles@caviumnetworks.com>
Signed-off-by: Satanand Burla <satananda.burla@caviumnetworks.com>
Signed-off-by: Felix Manlunas <felix.manlunas@caviumnetworks.com>
---
 drivers/net/ethernet/cavium/liquidio/lio_vf_main.c | 97 +++++++++++++++++++++-
 1 file changed, 94 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
index bcfc927..5d1023b 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
@@ -1398,12 +1398,24 @@ static u16 select_q(struct net_device *dev, struct sk_buff *skb,
 		skb->protocol = eth_type_trans(skb, skb->dev);
 
 		if ((netdev->features & NETIF_F_RXCSUM) &&
-		    (rh->r_dh.csum_verified & CNNIC_CSUM_VERIFIED))
+		    (((rh->r_dh.encap_on) &&
+		      (rh->r_dh.csum_verified & CNNIC_TUN_CSUM_VERIFIED)) ||
+		     (!(rh->r_dh.encap_on) &&
+		      (rh->r_dh.csum_verified & CNNIC_CSUM_VERIFIED))))
 			/* checksum has already been verified */
 			skb->ip_summed = CHECKSUM_UNNECESSARY;
 		else
 			skb->ip_summed = CHECKSUM_NONE;
 
+		/* Setting Encapsulation field on basis of status received
+		 * from the firmware
+		 */
+		if (rh->r_dh.encap_on) {
+			skb->encapsulation = 1;
+			skb->csum_level = 1;
+			droq->stats.rx_vxlan++;
+		}
+
 		/* inbound VLAN tag */
 		if ((netdev->features & NETIF_F_HW_VLAN_CTAG_RX) &&
 		    rh->r_dh.vlan) {
@@ -1916,8 +1928,14 @@ static int liquidio_xmit(struct sk_buff *skb, struct net_device *netdev)
 	cmdsetup.u64 = 0;
 	cmdsetup.s.iq_no = iq_no;
 
-	if (skb->ip_summed == CHECKSUM_PARTIAL)
-		cmdsetup.s.transport_csum = 1;
+	if (skb->ip_summed == CHECKSUM_PARTIAL) {
+		if (skb->encapsulation) {
+			cmdsetup.s.tnl_csum = 1;
+			stats->tx_vxlan++;
+		} else {
+			cmdsetup.s.transport_csum = 1;
+		}
+	}
 
 	if (!skb_shinfo(skb)->nr_frags) {
 		cmdsetup.s.u.datasize = skb->len;
@@ -2177,6 +2195,40 @@ static int liquidio_set_rxcsum_command(struct net_device *netdev, int command,
 	return ret;
 }
 
+/** Sending command to add/delete VxLAN UDP port to firmware
+ * @param netdev                pointer to network device
+ * @param command               OCTNET_CMD_VXLAN_PORT_CONFIG
+ * @param vxlan_port            VxLAN port to be added or deleted
+ * @param vxlan_cmd_bit         OCTNET_CMD_VXLAN_PORT_ADD,
+ *                              OCTNET_CMD_VXLAN_PORT_DEL
+ * @returns                     SUCCESS or FAILURE
+ */
+static int liquidio_vxlan_port_command(struct net_device *netdev, int command,
+				       u16 vxlan_port, u8 vxlan_cmd_bit)
+{
+	struct lio *lio = GET_LIO(netdev);
+	struct octeon_device *oct = lio->oct_dev;
+	struct octnic_ctrl_pkt nctrl;
+	int ret = 0;
+
+	memset(&nctrl, 0, sizeof(struct octnic_ctrl_pkt));
+	nctrl.ncmd.s.cmd = command;
+	nctrl.ncmd.s.more = vxlan_cmd_bit;
+	nctrl.ncmd.s.param1 = vxlan_port;
+	nctrl.iq_no = lio->linfo.txpciq[0].s.q_no;
+	nctrl.wait_time = 100;
+	nctrl.netpndev = (u64)netdev;
+	nctrl.cb_fn = liquidio_link_ctrl_cmd_completion;
+
+	ret = octnet_send_nic_ctrl_pkt(lio->oct_dev, &nctrl);
+	if (ret < 0) {
+		dev_err(&oct->pci_dev->dev,
+			"DEVFLAGS VxLAN port add/delete failed in core (ret : 0x%x)\n",
+			ret);
+	}
+	return ret;
+}
+
 /** \brief Net device fix features
  * @param netdev  pointer to network device
  * @param request features requested
@@ -2245,6 +2297,30 @@ static int liquidio_set_features(struct net_device *netdev,
 	return 0;
 }
 
+static void liquidio_add_vxlan_port(struct net_device *netdev,
+				    struct udp_tunnel_info *ti)
+{
+	if (ti->type != UDP_TUNNEL_TYPE_VXLAN)
+		return;
+
+	liquidio_vxlan_port_command(netdev,
+				    OCTNET_CMD_VXLAN_PORT_CONFIG,
+				    htons(ti->port),
+				    OCTNET_CMD_VXLAN_PORT_ADD);
+}
+
+static void liquidio_del_vxlan_port(struct net_device *netdev,
+				    struct udp_tunnel_info *ti)
+{
+	if (ti->type != UDP_TUNNEL_TYPE_VXLAN)
+		return;
+
+	liquidio_vxlan_port_command(netdev,
+				    OCTNET_CMD_VXLAN_PORT_CONFIG,
+				    htons(ti->port),
+				    OCTNET_CMD_VXLAN_PORT_DEL);
+}
+
 static const struct net_device_ops lionetdevops = {
 	.ndo_open		= liquidio_open,
 	.ndo_stop		= liquidio_stop,
@@ -2257,6 +2333,8 @@ static int liquidio_set_features(struct net_device *netdev,
 	.ndo_change_mtu		= liquidio_change_mtu,
 	.ndo_fix_features	= liquidio_fix_features,
 	.ndo_set_features	= liquidio_set_features,
+	.ndo_udp_tunnel_add     = liquidio_add_vxlan_port,
+	.ndo_udp_tunnel_del     = liquidio_del_vxlan_port,
 	.ndo_select_queue	= select_q,
 };
 
@@ -2462,6 +2540,19 @@ static int setup_nic_devices(struct octeon_device *octeon_dev)
 				      | NETIF_F_LRO;
 		netif_set_gso_max_size(netdev, OCTNIC_GSO_MAX_SIZE);
 
+		/* Copy of transmit encapsulation capabilities:
+		 * TSO, TSO6, Checksums for this device
+		 */
+		lio->enc_dev_capability = NETIF_F_IP_CSUM
+					  | NETIF_F_IPV6_CSUM
+					  | NETIF_F_GSO_UDP_TUNNEL
+					  | NETIF_F_HW_CSUM | NETIF_F_SG
+					  | NETIF_F_RXCSUM
+					  | NETIF_F_TSO | NETIF_F_TSO6
+					  | NETIF_F_LRO;
+
+		netdev->hw_enc_features =
+		    (lio->enc_dev_capability & ~NETIF_F_LRO);
 		netdev->vlan_features = lio->dev_capability;
 		/* Add any unchangeable hw features */
 		lio->dev_capability |= NETIF_F_HW_VLAN_CTAG_FILTER |
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH net-next 1/5] liquidio VF vlan support
From: Raghu Vatsavayi @ 2016-12-08 21:00 UTC (permalink / raw)
  To: davem
  Cc: netdev, Raghu Vatsavayi, Raghu Vatsavayi, Derek Chickles,
	Satanand Burla, Felix Manlunas
In-Reply-To: <1481230848-2393-1-git-send-email-rvatsavayi@caviumnetworks.com>

Adds support for VF vlan features.

Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@caviumnetworks.com>
Signed-off-by: Derek Chickles <derek.chickles@caviumnetworks.com>
Signed-off-by: Satanand Burla <satananda.burla@caviumnetworks.com>
Signed-off-by: Felix Manlunas <felix.manlunas@caviumnetworks.com>
---
 drivers/net/ethernet/cavium/liquidio/lio_vf_main.c | 80 ++++++++++++++++++++++
 1 file changed, 80 insertions(+)

diff --git a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
index 9989ac3..bcfc927 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_vf_main.c
@@ -1350,6 +1350,7 @@ static u16 select_q(struct net_device *dev, struct sk_buff *skb,
 		container_of(param, struct octeon_droq, napi);
 	struct net_device *netdev = (struct net_device *)arg;
 	struct sk_buff *skb = (struct sk_buff *)skbuff;
+	u16 vtag = 0;
 
 	if (netdev) {
 		struct lio *lio = GET_LIO(netdev);
@@ -1403,6 +1404,16 @@ static u16 select_q(struct net_device *dev, struct sk_buff *skb,
 		else
 			skb->ip_summed = CHECKSUM_NONE;
 
+		/* inbound VLAN tag */
+		if ((netdev->features & NETIF_F_HW_VLAN_CTAG_RX) &&
+		    rh->r_dh.vlan) {
+			u16 priority = rh->r_dh.priority;
+			u16 vid = rh->r_dh.vlan;
+
+			vtag = (priority << VLAN_PRIO_SHIFT) | vid;
+			__vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q), vtag);
+		}
+
 		packet_was_received = (napi_gro_receive(napi, skb) != GRO_DROP);
 
 		if (packet_was_received) {
@@ -2025,6 +2036,12 @@ static int liquidio_xmit(struct sk_buff *skb, struct net_device *netdev)
 		tx_info->s.gso_segs = skb_shinfo(skb)->gso_segs;
 	}
 
+	/* HW insert VLAN tag */
+	if (skb_vlan_tag_present(skb)) {
+		irh->priority = skb_vlan_tag_get(skb) >> VLAN_PRIO_SHIFT;
+		irh->vlan = skb_vlan_tag_get(skb) & VLAN_VID_MASK;
+	}
+
 	status = octnet_send_nic_data_pkt(oct, &ndata);
 	if (status == IQ_SEND_FAILED)
 		goto lio_xmit_failed;
@@ -2074,6 +2091,61 @@ static void liquidio_tx_timeout(struct net_device *netdev)
 	txqs_wake(netdev);
 }
 
+static int
+liquidio_vlan_rx_add_vid(struct net_device *netdev,
+			 __be16 proto __attribute__((unused)), u16 vid)
+{
+	struct lio *lio = GET_LIO(netdev);
+	struct octeon_device *oct = lio->oct_dev;
+	struct octnic_ctrl_pkt nctrl;
+	int ret = 0;
+
+	memset(&nctrl, 0, sizeof(struct octnic_ctrl_pkt));
+
+	nctrl.ncmd.u64 = 0;
+	nctrl.ncmd.s.cmd = OCTNET_CMD_ADD_VLAN_FILTER;
+	nctrl.ncmd.s.param1 = vid;
+	nctrl.iq_no = lio->linfo.txpciq[0].s.q_no;
+	nctrl.wait_time = 100;
+	nctrl.netpndev = (u64)netdev;
+	nctrl.cb_fn = liquidio_link_ctrl_cmd_completion;
+
+	ret = octnet_send_nic_ctrl_pkt(lio->oct_dev, &nctrl);
+	if (ret < 0) {
+		dev_err(&oct->pci_dev->dev, "Add VLAN filter failed in core (ret: 0x%x)\n",
+			ret);
+	}
+
+	return ret;
+}
+
+static int
+liquidio_vlan_rx_kill_vid(struct net_device *netdev,
+			  __be16 proto __attribute__((unused)), u16 vid)
+{
+	struct lio *lio = GET_LIO(netdev);
+	struct octeon_device *oct = lio->oct_dev;
+	struct octnic_ctrl_pkt nctrl;
+	int ret = 0;
+
+	memset(&nctrl, 0, sizeof(struct octnic_ctrl_pkt));
+
+	nctrl.ncmd.u64 = 0;
+	nctrl.ncmd.s.cmd = OCTNET_CMD_DEL_VLAN_FILTER;
+	nctrl.ncmd.s.param1 = vid;
+	nctrl.iq_no = lio->linfo.txpciq[0].s.q_no;
+	nctrl.wait_time = 100;
+	nctrl.netpndev = (u64)netdev;
+	nctrl.cb_fn = liquidio_link_ctrl_cmd_completion;
+
+	ret = octnet_send_nic_ctrl_pkt(lio->oct_dev, &nctrl);
+	if (ret < 0) {
+		dev_err(&oct->pci_dev->dev, "Del VLAN filter failed in core (ret: 0x%x)\n",
+			ret);
+	}
+	return ret;
+}
+
 /** Sending command to enable/disable RX checksum offload
  * @param netdev                pointer to network device
  * @param command               OCTNET_CMD_TNL_RX_CSUM_CTL
@@ -2180,6 +2252,8 @@ static int liquidio_set_features(struct net_device *netdev,
 	.ndo_set_mac_address	= liquidio_set_mac,
 	.ndo_set_rx_mode	= liquidio_set_mcast_list,
 	.ndo_tx_timeout		= liquidio_tx_timeout,
+	.ndo_vlan_rx_add_vid    = liquidio_vlan_rx_add_vid,
+	.ndo_vlan_rx_kill_vid   = liquidio_vlan_rx_kill_vid,
 	.ndo_change_mtu		= liquidio_change_mtu,
 	.ndo_fix_features	= liquidio_fix_features,
 	.ndo_set_features	= liquidio_set_features,
@@ -2388,6 +2462,12 @@ static int setup_nic_devices(struct octeon_device *octeon_dev)
 				      | NETIF_F_LRO;
 		netif_set_gso_max_size(netdev, OCTNIC_GSO_MAX_SIZE);
 
+		netdev->vlan_features = lio->dev_capability;
+		/* Add any unchangeable hw features */
+		lio->dev_capability |= NETIF_F_HW_VLAN_CTAG_FILTER |
+				       NETIF_F_HW_VLAN_CTAG_RX |
+				       NETIF_F_HW_VLAN_CTAG_TX;
+
 		netdev->features = (lio->dev_capability & ~NETIF_F_LRO);
 
 		netdev->hw_features = lio->dev_capability;
-- 
1.8.3.1

^ permalink raw reply related

* [PATCH net-next 0/5] liquidio VF offloads and stats
From: Raghu Vatsavayi @ 2016-12-08 21:00 UTC (permalink / raw)
  To: davem; +Cc: netdev, Raghu Vatsavayi

Dave,

The following is the final patch series completing the liquidio
VF driver support. These patches contain minor changes related
to offloads and stats.

Please apply the patches in the following order, as some of them
depend on earlier patches.

Raghu Vatsavayi (5):
  liquidio VF vlan support
  liquidio VF vxlan
  liquidio VF ethtool stats
  liquidio VF timestamp
  liquidio VF error handling

 .../ethernet/cavium/liquidio/cn23xx_vf_device.h    |   2 +
 drivers/net/ethernet/cavium/liquidio/lio_ethtool.c | 512 ++++++++++++++----
 drivers/net/ethernet/cavium/liquidio/lio_vf_main.c | 575 ++++++++++++++++++++-
 3 files changed, 987 insertions(+), 102 deletions(-)

-- 
1.8.3.1

^ permalink raw reply

* Re: [net-next PATCH v5 0/6] XDP for virtio_net
From: Michael S. Tsirkin @ 2016-12-08 20:58 UTC (permalink / raw)
  To: John Fastabend
  Cc: Alexei Starovoitov, David Miller, daniel, shm, tgraf,
	john.r.fastabend, netdev, brouer
In-Reply-To: <5849C68F.7080707@gmail.com>

On Thu, Dec 08, 2016 at 12:46:07PM -0800, John Fastabend wrote:
> On 16-12-08 11:38 AM, Alexei Starovoitov wrote:
> > On Thu, Dec 08, 2016 at 02:17:02PM -0500, David Miller wrote:
> >> From: John Fastabend <john.fastabend@gmail.com>
> >> Date: Wed, 07 Dec 2016 12:10:47 -0800
> >>
> >>> This implements virtio_net for the mergeable buffers and big_packet
> >>> modes. I tested this with vhost_net running on qemu and did not see
> >>> any issues. For testing num_buf > 1 I added a hack to vhost driver
> >>> to only put 100 bytes per buffer.
> >>  ...
> >>
> >> So where are we with this?
> 
> There is one possible issue with a hang that Michael pointed out. I can
> either spin a v6 or, if you pull this v5 series in, I can post a bugfix
> for it. I am not seeing the issue in practice; XDP virtio has been up
> and running on my box here for days without issue.


I'd prefer it fixed. Alternatively, apply just 1-3 for now.

> All the concerns below are really future XDP ideas and unrelated to
> this series, or at least not required for this series to be applied IMO.
> 
> >>
> >> I'm not too thrilled with the idea of making XDP_TX optional or
> >> something like that.  If someone enables XDP, there is a tradeoff.
> >>
> >> I also have reservations about the idea to make jumbo frames work
> >> without giving XDP access to the whole packet.  If it wants to push or
> >> pop a header, it might need to know the whole packet length.  How will
> >> you pass that to the XDP program?
> >>
> >> Some kinds of encapsulation require trailers, thus precluding access
> >> to the entire packet precludes those kinds of transformations.
> > 
> > +1
> 
> This was sort of speculative on my side; it is certainly not dependent
> on the series here. I agree that we don't want to get into a state where
> program X runs here and not there and only runs after doing magic
> incantations, etc. I would only propose it if there is a clean way to
> implement this.
> 
> > 
> >> This is why we want simple, linear, buffer access for XDP.
> >>
> >> Even the most seemingly minor exception turns into a huge complicated
> >> mess.
> > 
> > +1
> 
> Yep.
> 
> > 
> > and from the other thread:
> >>> Can't we disable XDP_TX somehow? Many people might only want RX drop,
> >>> and extra queues are not always there.
> >>>
> >>
> >> Alexei, Daniel, any thoughts on this?
> > 
> > I don't like it.
> > 
> 
> OK, alternatively we can make more queues available in virtio, which
> might be the better solution.
> 
> >> I know we were trying to claim some base level of feature support for
> >> all XDP drivers. I am sympathetic to this argument though for DDOS we
> >> do not need XDP_TX support. And virtio can become queue constrained
> >> in some cases.
> > 
> > especially for ddos case doing lro/gro is not helpful.
> 
> Fair enough but disabling LRO to handle the case where you "might" get
> a DDOS will hurt normal good traffic.
> 
> > I frankly don't see a use case where you'd want to steer a packet
> > all the way into VM just to drop them there?
> 
> VM to VM traffic is my use case. And in that model we need XDP at the
> virtio or vhost layer in case of a malicious/broken/untrusted VM. I have
> some vhost patches under development for when net-next opens up again.
> 
> > Without XDP_TX it's too crippled. adjust_head() won't be possible,
> 
> Just a nit, but is there any reason not to support adjust_head and then
> XDP_PASS? I don't have a use case in mind, but I also see no reason to
> preclude it.
> 
> > packet mangling would have to be disabled and so on.
> > If xdp program doesn't see raw packet it can only parse the headers of
> > this jumbo meta-packet and drop it, but for virtio it's really too late.
> > ddos protection needs to be done at the earliest hw nic receive.
> 
> VM to VM traffic never touches hw nic.
> 
> > I think if driver claims xdp support it needs to support
> > drop/pass/tx and adjust_head. For metadata passing up into stack from xdp
> > we need adjust_head, for encap/decap we need it too. And lro is in the way
> > of such transformations.
> > We struggled a lot with cls_bpf due to all metadata inside skb that needs
> > to be kept correct. Feeding non-raw packets into xdp is a rat hole.
> > 
> 
> In summary:
> 
> I think it's worth investigating getting LRO working but agree we can't
> sacrifice any of the existing features or complicate the code to do it.
> If the result of investigating is it can't be done then that is how it
> is.
> 
> Jumbo frames I care very little about in reality so should not have
> mentioned it.
> 
> Requiring XDP drivers to support all features is fine for me; I can make
> the virtio queue scheme a bit more flexible. Michael might have some
> opinion on this though.
> 
> This series shouldn't be blocked by any of the above.
> 
> Thanks,
> .John

^ permalink raw reply

* Re: [PATCH v3 net-next 2/3] openvswitch: Use is_skb_forwardable() for length check.
From: Eric Garver @ 2016-12-08 20:50 UTC (permalink / raw)
  To: Pravin Shelar; +Cc: Jiri Benc, Jarno Rajahalme, Linux Kernel Network Developers
In-Reply-To: <CAOrHB_BTRP69VDeA2_cSf_qKh05Wj=OinaqU3Uu=52Wus6xS8w@mail.gmail.com>

On Sun, Dec 04, 2016 at 04:22:40PM -0800, Pravin Shelar wrote:
> On Fri, Dec 2, 2016 at 1:25 AM, Jiri Benc <jbenc@redhat.com> wrote:
> > On Thu, 1 Dec 2016 11:50:00 -0800, Pravin Shelar wrote:
> >> This is not changing any behavior compared to current OVS vlan checks.
> >> Single vlan header is not considered for MTU check.
> >
> > It is changing it.
> >
> > Consider the case when there's an interface with MTU 1500 forwarding to
> > an interface with MTU 1496. Obviously, full-sized vlan frames
> > ingressing on the first interface are not forwardable to the second
> > one. Yet, if the vlan tag is accelerated (and thus not counted in
> > skb->len), is_skb_forwardable happily returns true because of the check
> >
> >         len = dev->mtu + dev->hard_header_len + VLAN_HLEN;
> >         if (skb->len <= len)
> >
> OK, this case would be allowed due to this patch. But the core Linux
> stack and bridge are using this check, so why not just use the same
> forwarding check in OVS too? This makes it consistent with core
> networking forwarding expectations.

Should we not also follow the "skbs are untagged" approach that the rest
of the kernel uses? I'm referring to patches 1 and 2 from Jiri's series
"openvswitch: make vlan handling consistent".

With those changes is_skb_forwardable() would behave as expected here.
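
To make the 1500 -> 1496 example quoted above concrete, here is a
minimal userspace sketch of the length comparison. It models only the
two lines quoted from is_skb_forwardable(); the IFF_UP and GSO checks are
left out, and the names model_dev, model_skb, len_check_passes and l3_len
are made up for illustration, not kernel API.

```c
#include <stdbool.h>

#define ETH_HLEN  14	/* Ethernet header length */
#define VLAN_HLEN 4	/* one 802.1Q tag */

/* Hypothetical stand-ins for the net_device / sk_buff fields involved. */
struct model_dev {
	unsigned int mtu;
	unsigned int hard_header_len;
};

struct model_skb {
	unsigned int len;	/* bytes in the buffer, including ETH_HLEN */
	bool inband_vlan;	/* true if the tag is part of len (not accelerated) */
};

/* Mirrors the quoted check: len = mtu + hard_header_len + VLAN_HLEN. */
static bool len_check_passes(const struct model_dev *dev,
			     const struct model_skb *skb)
{
	unsigned int len = dev->mtu + dev->hard_header_len + VLAN_HLEN;

	return skb->len <= len;
}

/* L3 payload that actually has to fit into the egress MTU. */
static unsigned int l3_len(const struct model_skb *skb)
{
	return skb->len - ETH_HLEN - (skb->inband_vlan ? VLAN_HLEN : 0);
}
```

With an egress MTU of 1496, a full-sized frame whose tag is accelerated
(len 1514) passes the check even though its 1500-byte payload does not
fit, while the same frame with the tag in-band (len 1518) is rejected.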

^ permalink raw reply

* Re: [PATCH v2 net-next 0/4] udp: receive path optimizations
From: Jesper Dangaard Brouer @ 2016-12-08 20:48 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: brouer, David S . Miller, netdev, Paolo Abeni, Eric Dumazet
In-Reply-To: <1481218739-27089-1-git-send-email-edumazet@google.com>

On Thu,  8 Dec 2016 09:38:55 -0800
Eric Dumazet <edumazet@google.com> wrote:

> This patch series provides about 100 % performance increase under flood. 

Could you please explain a bit more about what kind of testing you are
doing that can show 100% performance improvement?

I've tested this patchset and my tests show *huge* speed-ups, but
reaping the performance benefit depends heavily on the setup, on
enabling the right UDP socket settings, and most importantly on where
the performance bottleneck is: ksoftirqd (producer) or udp_sink
(consumer).

Basic setup: unload all netfilter modules and enable ip_early_demux.
 sysctl net/ipv4/ip_early_demux=1

Test generator: pktgen, UDP packets, single flow, 50Gbit/s mlx5 NICs.
 - Vary packet size between 64 and 1514.

Packet-size: 64
$ sudo taskset -c 4 ./udp_sink --port 9 --count $((10**7))
                                ns/pkt  pps             cycles/pkt
recvMmsg/32  	run: 0 10000000	537.70	1859756.90	2155
recvmsg   	run: 0 10000000	510.84	1957541.83	2047
read      	run: 0 10000000	583.40	1714077.14	2338
recvfrom  	run: 0 10000000	600.09	1666411.49	2405

The ksoftirqd thread "costs" more than udp_sink, which has idle cycles,
so the UDP queue does not get full enough. Thus, the patchset does not
have any effect here.
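
The three result columns are related by simple arithmetic, which helps
when comparing runs. A small sketch follows; the helper names are
illustrative only, and the ~4 GHz clock is an assumption inferred from
the ratio of cycles/pkt to ns/pkt in the table above (2155 / 537.70),
not something reported by the tool.

```c
/* Packets per second from the average cost of one packet in nanoseconds. */
static double pps_from_ns(double ns_per_pkt)
{
	return 1e9 / ns_per_pkt;
}

/* CPU cycles per packet, given the clock frequency in GHz (cycles per ns). */
static double cycles_from_ns(double ns_per_pkt, double cpu_ghz)
{
	return ns_per_pkt * cpu_ghz;
}
```

For the recvMmsg row, pps_from_ns(537.70) comes out near 1.86 Mpps,
which matches the pps column up to rounding of the ns/pkt value.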

Try increasing the pktgen packet size, as this increases the copy cost
in udp_sink.  Thus, a queue can now form, and the udp_sink CPU has
almost no idle cycles.  The "read" and "recvfrom" tests did experience
some idle cycles.

Packet-size: 1514
$ sudo taskset -c 4 ./udp_sink --port 9 --count $((10**7))
                                ns/pkt  pps             cycles/pkt
recvMmsg/32  	run: 0 10000000	435.88	2294204.11	1747
recvmsg   	run: 0 10000000	458.06	2183100.64	1835
read      	run: 0 10000000	520.34	1921826.18	2085
recvfrom  	run: 0 10000000	515.48	1939935.27	2066

Next trick: connected UDP.

Using a connected UDP socket (combined with ip_early_demux) removes the
FIB lookup from ksoftirqd, and moves the tipping point to a better place.

Packet-size: 64
$ sudo taskset -c 4 ./udp_sink --port 9 --count $((10**7)) --connect
                                ns/pkt  pps             cycles/pkt
recvMmsg/32  	run: 0 10000000	391.18	2556361.62	1567
recvmsg   	run: 0 10000000	422.95	2364349.69	1695
read      	run: 0 10000000	425.29	2351338.10	1704
recvfrom  	run: 0 10000000	476.74	2097577.57	1910

Change/increase packet size:

Packet-size: 1514
$ sudo taskset -c 4 ./udp_sink --port 9 --count $((10**7)) --connect
                                ns/pkt  pps             cycles/pkt
recvMmsg/32  	run: 0 10000000	457.56	2185481.94	1833
recvmsg   	run: 0 10000000	479.42	2085837.49	1921
read      	run: 0 10000000	398.05	2512233.13	1595
recvfrom  	run: 0 10000000	391.07	2557096.95	1567

A bit strange: changing the packet size flipped which syscall is the
fastest.

It is also interesting to see where the ksoftirqd limit is.

The result from "nstat" while using recvmsg shows that ksoftirqd is
handling 2.6 Mpps, and the consumer (udp_sink) is the bottleneck at
2 Mpps.

[skylake ~]$ nstat > /dev/null && sleep 1  && nstat
#kernel
IpInReceives                    2667577            0.0
IpInDelivers                    2667577            0.0
UdpInDatagrams                  2083580            0.0
UdpInErrors                     583995             0.0
UdpRcvbufErrors                 583995             0.0
IpExtInOctets                   4001340000         0.0
IpExtInNoECTPkts                2667559            0.0
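
As a rough cross-check of those counters, the sketch below computes the
share of delivered packets that were dropped at the socket. It assumes
the one-second nstat interval shown above, so the counter values can be
read as packets per second; the helper name drop_fraction is
illustrative only.

```c
#include <stdint.h>

/* Fraction of IP-delivered packets that the UDP socket dropped because
 * its receive buffer was full (UdpRcvbufErrors / IpInDelivers).
 */
static double drop_fraction(uint64_t ip_in_delivers, uint64_t udp_rcvbuf_errors)
{
	if (ip_in_delivers == 0)
		return 0.0;
	return (double)udp_rcvbuf_errors / (double)ip_in_delivers;
}
```

With the values above, drop_fraction(2667577, 583995) is roughly 0.22,
i.e. about one packet in five arrives while the receive buffer is full.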

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* Re: [net-next PATCH v5 0/6] XDP for virtio_net
From: John Fastabend @ 2016-12-08 20:46 UTC (permalink / raw)
  To: Alexei Starovoitov, David Miller
  Cc: daniel, mst, shm, tgraf, john.r.fastabend, netdev, brouer,
	Michael S. Tsirkin
In-Reply-To: <20161208193814.GA1954@ast-mbp.thefacebook.com>

On 16-12-08 11:38 AM, Alexei Starovoitov wrote:
> On Thu, Dec 08, 2016 at 02:17:02PM -0500, David Miller wrote:
>> From: John Fastabend <john.fastabend@gmail.com>
>> Date: Wed, 07 Dec 2016 12:10:47 -0800
>>
>>> This implements virtio_net for the mergeable buffers and big_packet
>>> modes. I tested this with vhost_net running on qemu and did not see
>>> any issues. For testing num_buf > 1 I added a hack to vhost driver
>>> to only put 100 bytes per buffer.
>>  ...
>>
>> So where are we with this?

There is one possible issue with a hang that Michael pointed out. I can
either spin a v6 or, if you pull this v5 series in, I can post a bugfix
for it. I am not seeing the issue in practice; XDP virtio has been up
and running on my box here for days without issue.

All the concerns below are really future XDP ideas and unrelated to
this series, or at least not required for this series to be applied IMO.

>>
>> I'm not too thrilled with the idea of making XDP_TX optional or
>> something like that.  If someone enables XDP, there is a tradeoff.
>>
>> I also have reservations about the idea to make jumbo frames work
>> without giving XDP access to the whole packet.  If it wants to push or
>> pop a header, it might need to know the whole packet length.  How will
>> you pass that to the XDP program?
>>
>> Some kinds of encapsulation require trailers, thus precluding access
>> to the entire packet precludes those kinds of transformations.
> 
> +1

This was sort of speculative on my side; it is certainly not dependent
on the series here. I agree that we don't want to get into a state where
program X runs here and not there and only runs after doing magic
incantations, etc. I would only propose it if there is a clean way to
implement this.

> 
>> This is why we want simple, linear, buffer access for XDP.
>>
>> Even the most seemingly minor exception turns into a huge complicated
>> mess.
> 
> +1

Yep.

> 
> and from the other thread:
>>> Can't we disable XDP_TX somehow? Many people might only want RX drop,
>>> and extra queues are not always there.
>>>
>>
>> Alexei, Daniel, any thoughts on this?
> 
> I don't like it.
> 

OK alternatively we can make more queues available in virtio which might
be the better solution.

>> I know we were trying to claim some base level of feature support for
>> all XDP drivers. I am sympathetic to this argument though for DDOS we
>> do not need XDP_TX support. And virtio can become queue constrained
>> in some cases.
> 
> especially for ddos case doing lro/gro is not helpful.

Fair enough but disabling LRO to handle the case where you "might" get
a DDOS will hurt normal good traffic.

> I frankly don't see a use case where you'd want to steer a packet
> all the way into VM just to drop them there?

VM to VM traffic is my use case. And in that model we need XDP at the
virtio or vhost layer in case of malicious/broken/untrusted VM. I have
some vhost patches under development for when net-next opens up again.

> Without XDP_TX it's too crippled. adjust_head() won't be possible,

Just a nit, but is there any reason not to support adjust_head and then
XDP_PASS? I don't have a use case in mind but also see no reason to
preclude it.

> packet mangling would have to be disabled and so on.
> If xdp program doesn't see raw packet it can only parse the headers of
> this jumbo meta-packet and drop it, but for virtio it's really too late.
> ddos protection needs to be done at the earliest hw nic receive.

VM to VM traffic never touches hw nic.

> I think if driver claims xdp support it needs to support
> drop/pass/tx and adjust_head. For metadata passing up into stack from xdp
> we need adjust_head, for encap/decap we need it too. And lro is in the way
> of such transformations.
> We struggled a lot with cls_bpf due to all metadata inside skb that needs
> to be kept correct. Feeding non-raw packets into xdp is a rat hole.
> 

In summary:

I think it's worth investigating getting LRO working but agree we can't
sacrifice any of the existing features or complicate the code to do it.
If the result of investigating is it can't be done then that is how it
is.

Jumbo frames I care very little about in reality, so I should not have
mentioned them.

Requiring XDP drivers to support all features is fine for me; I can make
the virtio queue scheme a bit more flexible. Michael might have some
opinion on this though.

This series shouldn't be blocked by any of the above.

Thanks,
.John

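The argument above for simple, linear buffer access can be illustrated
with a small user-space model of a head adjustment. This is only a
sketch under stated assumptions: struct sketch_xdp_buff,
sketch_adjust_head() and SKETCH_ETH_HLEN are hypothetical names, loosely
modeled on the idea of bounds-checking a move of the packet start, and
are not the virtio_net or kernel implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an XDP buffer: one linear memory region. */
struct sketch_xdp_buff {
	uint8_t *data_hard_start; /* start of the headroom */
	uint8_t *data;            /* first byte of the packet */
	uint8_t *data_end;        /* one past the last byte of the packet */
};

/* Assumed minimum that must remain after popping: an Ethernet header. */
#define SKETCH_ETH_HLEN 14

/*
 * Move the packet start by delta bytes. A negative delta grows the
 * packet into the headroom (push a header), a positive delta shrinks
 * it (pop a header). Returns 0 on success, -1 if the move would leave
 * the headroom or leave less than SKETCH_ETH_HLEN bytes of packet.
 */
static int sketch_adjust_head(struct sketch_xdp_buff *xdp, int delta)
{
	ptrdiff_t headroom = xdp->data - xdp->data_hard_start;
	ptrdiff_t len = xdp->data_end - xdp->data;

	if (delta < 0 && -(ptrdiff_t)delta > headroom)
		return -1;
	if (delta > 0 && len - (ptrdiff_t)delta < SKETCH_ETH_HLEN)
		return -1;

	xdp->data += delta;
	return 0;
}
```

With a single linear buffer both checks are plain pointer arithmetic
and the whole packet length is always known. Once the packet is spread
over several buffers, as with LRO or jumbo frames, neither the total
length nor a trailer position is available from these three pointers.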

* Re: [PATCH net-next] net: sock_rps_record_flow() is for connected sockets
From: Tom Herbert @ 2016-12-08 20:44 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Eric Dumazet, Paolo Abeni, David Miller, netdev, Willem de Bruijn
In-Reply-To: <1481227516.1898563.813013233.52CC6646@webmail.messagingengine.com>

On Thu, Dec 8, 2016 at 12:05 PM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> Hello,
>
> On Thu, Dec 8, 2016, at 20:15, Tom Herbert wrote:
>> On Thu, Dec 8, 2016 at 10:02 AM, Eric Dumazet <eric.dumazet@gmail.com>
>> wrote:
>> > On Thu, 2016-12-08 at 09:49 -0800, Tom Herbert wrote:
>> >
>> >> Of course that would only help on systems where no one enables encaps,
>> >> i.e. it looks good in the simple benchmarks but in real life if just
>> >> one socket enables encap everyone else takes the hit. Alternatively,
>> >> maybe we could do early demux when we do the lookup in GRO to
>> >> eliminate the extra lookup?
>> >
>> > Well, if you do the lookup in GRO, won't it be done for every incoming
>> > MSS, instead of once per GRO packet ?
>>
>> We should be able to avoid that. We already do the lookup for every
>> UDP packet going into GRO, would only need to take the refcnt once for
>> the whole GRO packet.
>>
>> >
>> > Anyway, the flooded UDP sockets out there are not normally connected
>>
>> We still should be able to use early demux in that case, just can't
>> avoid the route lookup. I wonder if we might be able to cache a soft
>> route maybe for the last local destination received to help the
>> unconnected sockets case...
>>
>> In any case, I can take a look at doing early demux from within UDP GRO.
>
> Early demux already breaks ip rules: we might set up a rule so an
> incoming packet might depending on the rule not find an input route at
> all and would be forwarded. Same problem might occur with VRF, when you
> have multiple ip addresses in different "realms".
>
> That said, I don't see why we can't be more aggressive for GRO in the
> unconnected case: we simply must make sure that the current namespace
> holds the ip address, which is simply a hash lookup. After that we can
> even accept packets for a wildcard-bound socket.
>
Or just depend on encapsulation sockets to bind to an address. That
would eliminate most of the ambiguity especially if it can be pushed into
a device that is trying to parse encapsulation. We would need new
interfaces to support that in HW, or use n-tuple filtering (which I
still maintain is the only right way to do it).

Tom

> Probably we should disable this logic as soon as vrf and/or
> rules are active to have correct semantics.
>
> Bye,
> Hannes

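The check suggested in the quoted text (confirm with a hash lookup that
the current namespace holds the destination address, and back off once
VRF or ip rules are active) might be sketched roughly as follows. This
is a user-space illustration only: struct sketch_netns, sketch_hash(),
sketch_add_local(), sketch_is_local() and sketch_gro_may_deliver() are
hypothetical names, and the tiny open-addressing table is an assumption
chosen for brevity, not how the kernel tracks local addresses.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_BUCKETS 16

/* Hypothetical per-namespace set of local IPv4 addresses; 0 = empty. */
struct sketch_netns {
	uint32_t addrs[SKETCH_BUCKETS];
};

/* Multiplicative hash; the top 4 bits select one of 16 buckets. */
static unsigned int sketch_hash(uint32_t addr)
{
	return (uint32_t)(addr * 2654435761u) >> 28;
}

/* Insert with linear probing; returns false if addr is 0 or table full. */
static bool sketch_add_local(struct sketch_netns *ns, uint32_t addr)
{
	unsigned int i, h = sketch_hash(addr);

	if (addr == 0)
		return false;
	for (i = 0; i < SKETCH_BUCKETS; i++) {
		unsigned int slot = (h + i) % SKETCH_BUCKETS;

		if (ns->addrs[slot] == 0 || ns->addrs[slot] == addr) {
			ns->addrs[slot] = addr;
			return true;
		}
	}
	return false;
}

/* Probe until the address or an empty slot is found. */
static bool sketch_is_local(const struct sketch_netns *ns, uint32_t addr)
{
	unsigned int i, h = sketch_hash(addr);

	if (addr == 0)
		return false;
	for (i = 0; i < SKETCH_BUCKETS; i++) {
		unsigned int slot = (h + i) % SKETCH_BUCKETS;

		if (ns->addrs[slot] == addr)
			return true;
		if (ns->addrs[slot] == 0)
			return false;
	}
	return false;
}

/*
 * Decide whether the aggressive GRO path may be used for a packet to
 * daddr. With VRF or ip rules active (policy_active), always fall
 * back to the normal input path to keep routing semantics correct.
 */
static bool sketch_gro_may_deliver(const struct sketch_netns *ns,
				   uint32_t daddr, bool policy_active)
{
	if (policy_active)
		return false;
	return sketch_is_local(ns, daddr);
}
```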

* Re: [PATCH net-next v6 0/2] net/sched: cls_flower: Support matching on ICMP
From: Or Gerlitz @ 2016-12-08 20:43 UTC (permalink / raw)
  To: David Miller; +Cc: Simon Horman, Jiri Pirko, Linux Netdev List
In-Reply-To: <20161208.115810.554381162900423199.davem@davemloft.net>

On Thu, Dec 8, 2016 at 6:58 PM, David Miller <davem@davemloft.net> wrote:

> Simon and Or, you both added extensions to cls_flower at the same
> time.  Or's changes went in first, so his UAPI numbers did not change.
> Simon, your changes went in next so your numbers did change and
> therefore you will have to recompile any userland components you were
> using for testing.

Yeah, I guess you had to do some rebasing there for Simon's series...
thanks for taking care
