Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: Soft lockup in inet_put_port on 4.6
From: Hannes Frederic Sowa @ 2016-12-08 21:03 UTC (permalink / raw)
  To: Tom Herbert, Linux Kernel Network Developers, Josef Bacik
In-Reply-To: <CALx6S36OVUqAxq9vNnfHp2eJOuG+gSSg896zzaZoc3Og4tyxFw@mail.gmail.com>

Hello Tom,

On Wed, Dec 7, 2016, at 00:06, Tom Herbert wrote:
> We are seeing a fair number of machines getting into softlockup in 4.6
> kernel. As near as I can tell this is happening on the spinlock in
> bind hash bucket. When inet_csk_get_port exits and does spinunlock_bh
> the TCP timer runs and we hit lockup in inet_put_port (presumably on
> same lock). It seems like the locked isn't properly be unlocked
> somewhere but I don't readily see it.
> 
> Any ideas?

Likewise we received reports that pretty much look the same on our
heavily patched kernel. Did you have a chance to investigate or
reproduce the problem?

I am wondering if you would be able to take a complete thread stack dump
if you can reproduce this to check if one of the user space processes is
looping inside finding a free port?

Thanks,
Hannes

^ permalink raw reply

* Re: [net-next PATCH v5 5/6] virtio_net: add XDP_TX support
From: Michael S. Tsirkin @ 2016-12-08 21:08 UTC (permalink / raw)
  To: John Fastabend
  Cc: daniel, shm, davem, tgraf, alexei.starovoitov, john.r.fastabend,
	netdev, brouer
In-Reply-To: <5849A3EE.7090603@gmail.com>

On Thu, Dec 08, 2016 at 10:18:22AM -0800, John Fastabend wrote:
> On 16-12-07 10:11 PM, Michael S. Tsirkin wrote:
> > On Wed, Dec 07, 2016 at 12:12:45PM -0800, John Fastabend wrote:
> >> This adds support for the XDP_TX action to virtio_net. When an XDP
> >> program is run and returns the XDP_TX action the virtio_net XDP
> >> implementation will transmit the packet on a TX queue that aligns
> >> with the current CPU that the XDP packet was processed on.
> >>
> >> Before sending the packet the header is zeroed.  Also XDP is expected
> >> to handle checksum correctly so no checksum offload  support is
> >> provided.
> >>
> >> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> >> ---
> >>  drivers/net/virtio_net.c |   99 +++++++++++++++++++++++++++++++++++++++++++---
> >>  1 file changed, 92 insertions(+), 7 deletions(-)
> >>
> >> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> >> index 28b1196..8e5b13c 100644
> >> --- a/drivers/net/virtio_net.c
> >> +++ b/drivers/net/virtio_net.c
> >> @@ -330,12 +330,57 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> >>  	return skb;
> >>  }
> >>  
> >> +static void virtnet_xdp_xmit(struct virtnet_info *vi,
> >> +			     struct receive_queue *rq,
> >> +			     struct send_queue *sq,
> >> +			     struct xdp_buff *xdp)
> >> +{
> >> +	struct page *page = virt_to_head_page(xdp->data);
> >> +	struct virtio_net_hdr_mrg_rxbuf *hdr;
> >> +	unsigned int num_sg, len;
> >> +	void *xdp_sent;
> >> +	int err;
> >> +
> >> +	/* Free up any pending old buffers before queueing new ones. */
> >> +	while ((xdp_sent = virtqueue_get_buf(sq->vq, &len)) != NULL) {
> >> +		struct page *sent_page = virt_to_head_page(xdp_sent);
> >> +
> >> +		if (vi->mergeable_rx_bufs)
> >> +			put_page(sent_page);
> >> +		else
> >> +			give_pages(rq, sent_page);
> >> +	}
> > 
> > Looks like this is the only place where you do virtqueue_get_buf.
> > No interrupt handler?
> > This means that if you fill up the queue, nothing will clean it
> > and things will get stuck.
> 
> hmm OK so the callbacks should be implemented to do this and a pair
> of virtqueue_enable_cb_prepare()/virtqueue_disable_cb() used to enable
> and disable callbacks if packets are enqueued.

Oh I didn't realize XDP never stops processing packets,
even if they are never freed.
In that case you do not need callbacks.

> Also in the normal xmit path via start_xmit() will the same condition
> happen? It looks like free_old_xmit_skbs for example is only called if
> a packet is sent could we end up holding on to skbs in this case? I
> don't see free_old_xmit_skbs being called from any callbacks?

Right - all it does is restart the queue. That's why we don't support
BQL right now.

> > Can this be the issue you saw?
> 
> nope see below I was mishandling the big_packets page cleanup path in
> the error case.
> 
> > 
> > 
> >> +
> >> +	/* Zero header and leave csum up to XDP layers */
> >> +	hdr = xdp->data;
> >> +	memset(hdr, 0, vi->hdr_len);
> >> +
> >> +	nu_sg = 1;
> >> +	sg_init_one(sq->sg, xdp->data, xdp->data_end - xdp->data);
> >> +	err = virtqueue_add_outbuf(sq->vq, sq->sg, num_sg,
> >> +				   xdp->data, GFP_ATOMIC);
> >> +	if (unlikely(err)) {
> >> +		if (vi->mergeable_rx_bufs)
> >> +			put_page(page);
> >> +		else
> >> +			give_pages(rq, page);
> >> +	} else if (!vi->mergeable_rx_bufs) {
> >> +		/* If not mergeable bufs must be big packets so cleanup pages */
> >> +		give_pages(rq, (struct page *)page->private);
> >> +		page->private = 0;
> >> +	}
> >> +
> >> +	virtqueue_kick(sq->vq);
> > 
> > Is this unconditional kick a work-around for hang
> > we could not figure out yet?
> 
> I tracked the original issue down to how I handled the big_packet page
> cleanups.
> 
> > I guess this helps because it just slows down the guest.
> > I don't much like it ...
> 
> I left it like this copying the pattern in balloon and input drivers. I
> can change it back to the previous pattern where it is only called if
> there is no errors. It has been running fine with the old pattern now
> for an hour or so.
> 
> .John

OK makes sense.

^ permalink raw reply

* Re: [PATCH net-next 2/5] liquidio VF vxlan
From: Or Gerlitz @ 2016-12-08 21:08 UTC (permalink / raw)
  To: Raghu Vatsavayi
  Cc: David Miller, Linux Netdev List, Raghu Vatsavayi, Derek Chickles,
	Satanand Burla, Felix Manlunas
In-Reply-To: <1481230848-2393-3-git-send-email-rvatsavayi@caviumnetworks.com>

On Thu, Dec 8, 2016 at 11:00 PM, Raghu Vatsavayi
<rvatsavayi@caviumnetworks.com> wrote:

> Adds VF vxlan offload support.

What's the use case for that? a VM running a VTEP, isn't that part
needs to run @ the host?

Or.

^ permalink raw reply

* Re: [net-next PATCH v5 0/6] XDP for virtio_net
From: Alexei Starovoitov @ 2016-12-08 21:08 UTC (permalink / raw)
  To: John Fastabend
  Cc: David Miller, daniel, mst, shm, tgraf, john.r.fastabend, netdev,
	brouer
In-Reply-To: <5849C68F.7080707@gmail.com>

On Thu, Dec 08, 2016 at 12:46:07PM -0800, John Fastabend wrote:
> 
> Fair enough but disabling LRO to handle the case where you "might" get
> a DDOS will hurt normal good traffic.

the xdp_pass path is not optimized right now. so even without VMs
we need some work to do. lro or not-lro is imo secondary.

> > I frankly don't see a use case where you'd want to steer a packet
> > all the way into VM just to drop them there?
> 
> VM to VM traffic is my use case. And in that model we need XDP at the
> virtio or vhost layer in case of malicious/broke/untrusted VM. I have
> some vhost patches under development for when net-next opens up again.

excellent. looking forward to vhost patches.

> > Without XDP_TX it's too crippled. adjust_head() won't be possible,
> 
> Just a nit but any reason not to support adjust_head and then XDP_PASS.
> I don't have a use case in mind but also see no reason to preclude it.

adjust_head and xdp_pass needs to be supported. No doubt.
the use case is metadata passing between xdp and upper layers.

> In summary:
> 
> I think its worth investigating getting LRO working but agree we can't
> sacrifice any of the existing features or complicate the code to do it.
> If the result of investigating is it can't be done then that is how it
> is.

agree

> Requiring XDP drivers to support all features is fine for me I can make
> the virtio queue scheme a bit more flexible. Michael might have some
> opinion on this though.

I say right now all the features are pretty much must have,
but in the future we will become selective.
Like zero-copy a page from dma into user space probably doesn't
make sense to do for virtio.
Multi-port TX from virtio into phys netdev doesn't make sense either.

> This series shouldn't be blocked by any of the above.

completely agree.
since we abandoned e1000+xdp patches, the virtio+xdp is the only
thing on the table that allows us to do convenient development
and testing of xdp programs.
We've talked about a repository of blessed xdp programs.
They all would need to be routinely and automatically tested.
So virtio+xdp is a must have feature to me.

^ permalink raw reply

* Re: [PATCH net-next 4/5] liquidio VF timestamp
From: Or Gerlitz @ 2016-12-08 21:09 UTC (permalink / raw)
  To: Raghu Vatsavayi
  Cc: David Miller, Linux Netdev List, Raghu Vatsavayi, Derek Chickles,
	Satanand Burla, Felix Manlunas
In-Reply-To: <1481230848-2393-5-git-send-email-rvatsavayi@caviumnetworks.com>

On Thu, Dec 8, 2016 at 11:00 PM, Raghu Vatsavayi
<rvatsavayi@caviumnetworks.com> wrote:
> Adds support for VF timestamp.

same here, what's the use case? do you have per VF HW clocks to set?
How it works if VF A does setup of X and VF B setup of Y

^ permalink raw reply

* Re: [net-next PATCH v5 0/6] XDP for virtio_net
From: Michael S. Tsirkin @ 2016-12-08 21:10 UTC (permalink / raw)
  To: John Fastabend
  Cc: Alexei Starovoitov, David Miller, daniel, shm, tgraf,
	john.r.fastabend, netdev, brouer
In-Reply-To: <20161208225807-mutt-send-email-mst@kernel.org>

On Thu, Dec 08, 2016 at 10:58:52PM +0200, Michael S. Tsirkin wrote:
> On Thu, Dec 08, 2016 at 12:46:07PM -0800, John Fastabend wrote:
> > On 16-12-08 11:38 AM, Alexei Starovoitov wrote:
> > > On Thu, Dec 08, 2016 at 02:17:02PM -0500, David Miller wrote:
> > >> From: John Fastabend <john.fastabend@gmail.com>
> > >> Date: Wed, 07 Dec 2016 12:10:47 -0800
> > >>
> > >>> This implements virtio_net for the mergeable buffers and big_packet
> > >>> modes. I tested this with vhost_net running on qemu and did not see
> > >>> any issues. For testing num_buf > 1 I added a hack to vhost driver
> > >>> to only but 100 bytes per buffer.
> > >>  ...
> > >>
> > >> So where are we with this?
> > 
> > There is one possible issue with a hang that Michael pointed out. I can
> > either spin a v6 or if you pull this v5 series in I can post a bugfix
> > for it. I am not seeing the issue in practice XDP virtio has been up
> > and running on my box here for days without issue.
> 
> 
> I'd prefer it fixed. Alternatively, apply just 1-3 for now.

Looks like there's no issue though after all.
I misunderstood things.


> > All the concerns below are really future XDP ideas and unrelated to
> > this series or at least not required for this series to applied IMO.
> > 
> > >>
> > >> I'm not too thrilled with the idea of making XDP_TX optional or
> > >> something like that.  If someone enables XDP, there is a tradeoff.
> > >>
> > >> I also have reservations about the idea to make jumbo frames work
> > >> without giving XDP access to the whole packet.  If it wants to push or
> > >> pop a header, it might need to know the whole packet length.  How will
> > >> you pass that to the XDP program?
> > >>
> > >> Some kinds of encapsulation require trailers, thus preclusing access
> > >> to the entire packet precludes those kinds of transformations.
> > > 
> > > +1
> > 
> > This was sort of speculative on my side it is certainly not dependent on
> > the series here. I agree that we don't want to get into a state where
> > program X runs here and not there and only runs after doing magic
> > incantations, etc. I would only propose it if there is a clean way to
> > implement this.
> > 
> > > 
> > >> This is why we want simple, linear, buffer access for XDP.
> > >>
> > >> Even the most seemingly minor exception turns into a huge complicated
> > >> mess.
> > > 
> > > +1
> > 
> > Yep.
> > 
> > > 
> > > and from the other thread:
> > >>> Can't we disable XDP_TX somehow? Many people might only want RX drop,
> > >>> and extra queues are not always there.
> > >>>
> > >>
> > >> Alexei, Daniel, any thoughts on this?
> > > 
> > > I don't like it.
> > > 
> > 
> > OK alternatively we can make more queues available in virtio which might
> > be the better solution.
> > 
> > >> I know we were trying to claim some base level of feature support for
> > >> all XDP drivers. I am sympathetic to this argument though for DDOS we
> > >> do not need XDP_TX support. And virtio can become queue constrained
> > >> in some cases.
> > > 
> > > especially for ddos case doing lro/gro is not helpful.
> > 
> > Fair enough but disabling LRO to handle the case where you "might" get
> > a DDOS will hurt normal good traffic.
> > 
> > > I frankly don't see a use case where you'd want to steer a packet
> > > all the way into VM just to drop them there?
> > 
> > VM to VM traffic is my use case. And in that model we need XDP at the
> > virtio or vhost layer in case of malicious/broke/untrusted VM. I have
> > some vhost patches under development for when net-next opens up again.
> > 
> > > Without XDP_TX it's too crippled. adjust_head() won't be possible,
> > 
> > Just a nit but any reason not to support adjust_head and then XDP_PASS.
> > I don't have a use case in mind but also see no reason to preclude it.
> > 
> > > packet mangling would have to be disabled and so on.
> > > If xdp program doesn't see raw packet it can only parse the headers of
> > > this jumbo meta-packet and drop it, but for virtio it's really too late.
> > > ddos protection needs to be done at the earliest hw nic receive.
> > 
> > VM to VM traffic never touches hw nic.
> > 
> > > I think if driver claims xdp support it needs to support
> > > drop/pass/tx and adjust_head. For metadata passing up into stack from xdp
> > > we need adjust_head, for encap/decap we need it too. And lro is in the way
> > > of such transformations.
> > > We struggled a lot with cls_bpf due to all metadata inside skb that needs
> > > to be kept correct. Feeding non-raw packets into xdp is a rat hole.
> > > 
> > 
> > In summary:
> > 
> > I think its worth investigating getting LRO working but agree we can't
> > sacrifice any of the existing features or complicate the code to do it.
> > If the result of investigating is it can't be done then that is how it
> > is.
> > 
> > Jumbo frames I care very little about in reality so should not have
> > mentioned it.
> > 
> > Requiring XDP drivers to support all features is fine for me I can make
> > the virtio queue scheme a bit more flexible. Michael might have some
> > opinion on this though.
> > 
> > This series shouldn't be blocked by any of the above.
> > 
> > Thanks,
> > .John

^ permalink raw reply

* Re: [PATCH v2 net-next 0/4] udp: receive path optimizations
From: Eric Dumazet @ 2016-12-08 21:13 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Eric Dumazet, David S . Miller, netdev, Paolo Abeni
In-Reply-To: <20161208214819.30138d12@redhat.com>

On Thu, 2016-12-08 at 21:48 +0100, Jesper Dangaard Brouer wrote:
> On Thu,  8 Dec 2016 09:38:55 -0800
> Eric Dumazet <edumazet@google.com> wrote:
> 
> > This patch series provides about 100 % performance increase under flood. 
> 
> Could you please explain a bit more about what kind of testing you are
> doing that can show 100% performance improvement?
> 
> I've tested this patchset and my tests show *huge* speeds ups, but
> reaping the performance benefit depend heavily on setup and enabling
> the right UDP socket settings, and most importantly where the
> performance bottleneck is: ksoftirqd(producer) or udp_sink(consumer).

Right.

So here at Google we do not try (yet) to downgrade our expensive
Multiqueue Nics into dumb NICS from last decade by using a single queue
on them. Maybe it will happen when we can process 10Mpps per core,
but we are not there yet  ;)

So my test is using a NIC, programmed with 8 queues, on a dual-socket
machine. (2 physical packages)

4 queues are handled by 4 cpus on socket0 (NUMA node 0)
4 queues are handled by 4 cpus on socket1 (NUMA node 1)

So I explicitly put my poor single thread UDP application in the worst
condition, having skbs produced on two NUMA nodes. 

Then my load generator use trafgen, with spoofed UDP source addresses,
like a UDP flood would use. Or typical DNS traffic, malicious or not.

So I have 8 cpus all trying to queue packets in a single UDP socket.

Of course, a real high performance server would use 8 UDP sockets, and
SO_REUSEPORT with nice eBPF filter to spread the packets based on the
queue/cpu they arrived.

In the case you have one cpu that you need to share between ksoftirq and
all user threads, then your test results depend on process scheduler
decisions more than anything we can code in network land.

It is actually easy for user space to get more than 50% of the cycles,
and 'starve' ksoftirqd.

^ permalink raw reply

* Re: [net-next PATCH v5 0/6] XDP for virtio_net
From: Michael S. Tsirkin @ 2016-12-08 21:16 UTC (permalink / raw)
  To: David Miller
  Cc: john.fastabend, daniel, shm, tgraf, alexei.starovoitov,
	john.r.fastabend, netdev, brouer
In-Reply-To: <20161208.141702.1346950420275854265.davem@davemloft.net>

On Thu, Dec 08, 2016 at 02:17:02PM -0500, David Miller wrote:
> From: John Fastabend <john.fastabend@gmail.com>
> Date: Wed, 07 Dec 2016 12:10:47 -0800
> 
> > This implements virtio_net for the mergeable buffers and big_packet
> > modes. I tested this with vhost_net running on qemu and did not see
> > any issues. For testing num_buf > 1 I added a hack to vhost driver
> > to only but 100 bytes per buffer.
>  ...
> 
> So where are we with this?
> 
> I'm not too thrilled with the idea of making XDP_TX optional or
> something like that.  If someone enables XDP, there is a tradeoff.

The issue is inability of XDP TX to share xmit queues with net stack.
I'm guessing virtio is not the only card that has a limited
number of queues, is it? Is it really so hard to lock the queue
and check it's running? Could be optional in case resources are there
...

-- 
MST

^ permalink raw reply

* Re: [PATCH v2 net-next 0/4] udp: receive path optimizations
From: Jesper Dangaard Brouer @ 2016-12-08 21:17 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: brouer, David S . Miller, netdev, Paolo Abeni, Eric Dumazet
In-Reply-To: <20161208214819.30138d12@redhat.com>

On Thu, 8 Dec 2016 21:48:19 +0100
Jesper Dangaard Brouer <brouer@redhat.com> wrote:

> On Thu,  8 Dec 2016 09:38:55 -0800
> Eric Dumazet <edumazet@google.com> wrote:
> 
> > This patch series provides about 100 % performance increase under flood.   
> 
> Could you please explain a bit more about what kind of testing you are
> doing that can show 100% performance improvement?
> 
> I've tested this patchset and my tests show *huge* speeds ups, but
> reaping the performance benefit depend heavily on setup and enabling
> the right UDP socket settings, and most importantly where the
> performance bottleneck is: ksoftirqd(producer) or udp_sink(consumer).
> 
> Basic setup: Unload all netfilter, and enable ip_early_demux.
>  sysctl net/ipv4/ip_early_demux=1
> 
> Test generator pktgen UDP packets single flow, 50Gbit/s mlx5 NICs.
>  - Vary packet size between 64 and 1514.

Below, I've added the baseline tests.

Baseline test on net-next at commit c9fba3ed3a4

> Packet-size: 64
> $ sudo taskset -c 4 ./udp_sink --port 9 --count $((10**7))
>                                 ns/pkt  pps             cycles/pkt
> recvMmsg/32  	run: 0 10000000	537.70	1859756.90	2155
> recvmsg   	run: 0 10000000	510.84	1957541.83	2047
> read      	run: 0 10000000	583.40	1714077.14	2338
> recvfrom  	run: 0 10000000	600.09	1666411.49	2405

Packet-size: 64 (baseline)
$ sudo taskset -c 4 ./udp_sink --port 9 --count $((10**7))
recvMmsg/32  	run: 0 10000000	499.75	2001016.09	2003
recvmsg   	run: 0 10000000	455.84	2193740.92	1827
read      	run: 0 10000000	566.99	1763703.49	2272
recvfrom  	run: 0 10000000	581.02	1721098.87	2328

 
> The ksoftirq thread "cost" more than udp_sink, which is idle, and UDP
> queue does not get full-enough. Thus, patchset does not have any
> effect.
> 
> 
> Try to increase pktgen packet size, as this increase the copy cost of
> udp_sink.  Thus, a queue can now form, and udp_sink CPU almost have no
> idle cycles.  The "read" and "readfrom" did experience some idle
> cycles.
> 
> Packet-size: 1514
> $ sudo taskset -c 4 ./udp_sink --port 9 --count $((10**7))
>                                 ns/pkt  pps             cycles/pkt
> recvMmsg/32  	run: 0 10000000	435.88	2294204.11	1747
> recvmsg   	run: 0 10000000	458.06	2183100.64	1835
> read      	run: 0 10000000	520.34	1921826.18	2085
> recvfrom  	run: 0 10000000	515.48	1939935.27	2066

Packet-size: 1514 (baseline)
$ sudo taskset -c 4 ./udp_sink --port 9 --count $((10**7))
recvMmsg/32  	run: 0 10000000	453.88	2203231.81	1819
recvmsg   	run: 0 10000000	488.31	2047869.13	1957
read      	run: 0 10000000	480.99	2079058.69	1927
recvfrom  	run: 0 10000000	522.64	1913349.26	2094


> Next trick connected UDP:
> 
> Use connected UDP socket (combined with ip_early_demux), removes the
> FIB_lookup from the ksoftirq, and cause tipping point to be better.
> 
> Packet-size: 64
> $ sudo taskset -c 4 ./udp_sink --port 9 --count $((10**7)) --connect
>                                 ns/pkt  pps             cycles/pkt
> recvMmsg/32  	run: 0 10000000	391.18	2556361.62	1567
> recvmsg   	run: 0 10000000	422.95	2364349.69	1695
> read      	run: 0 10000000	425.29	2351338.10	1704
> recvfrom  	run: 0 10000000	476.74	2097577.57	1910

Packet-size: 64 (baseline)
$ sudo taskset -c 4 ./udp_sink --port 9 --count $((10**7)) --connect
recvMmsg/32  	run: 0 10000000	438.55	2280255.77	1757
recvmsg   	run: 0 10000000	496.73	2013156.99	1990
read      	run: 0 10000000	412.17	2426170.58	1652
recvfrom  	run: 0 10000000	471.77	2119662.99	1890


> Change/increase packet size:
> 
> Packet-size: 1514
> $ sudo taskset -c 4 ./udp_sink --port 9 --count $((10**7)) --connect
>                                 ns/pkt  pps             cycles/pkt
> recvMmsg/32  	run: 0 10000000	457.56	2185481.94	1833
> recvmsg   	run: 0 10000000	479.42	2085837.49	1921
> read      	run: 0 10000000	398.05	2512233.13	1595
> recvfrom  	run: 0 10000000	391.07	2557096.95	1567

Packet-size: 1514 (baseline)
$ sudo taskset -c 4 ./udp_sink --port 9 --count $((10**7)) --connect
recvMmsg/32  	run: 0 10000000	491.11	2036205.63	1968
recvmsg   	run: 0 10000000	514.37	1944138.31	2061
read      	run: 0 10000000	444.02	2252147.84	1779
recvfrom  	run: 0 10000000	426.58	2344247.20	1709


> A bit strange, changing the packet size, flipped what is the fastest
> syscall.
> 
> It is also interesting to see that ksoftirq limit is:
> 
> Result from "nstat" while using recvmsg, show that ksoftirq is
> handling 2.6 Mpps, and consumer/udp_sink is bottleneck with 2Mpps.
> 
> [skylake ~]$ nstat > /dev/null && sleep 1  && nstat
> #kernel
> IpInReceives                    2667577            0.0
> IpInDelivers                    2667577            0.0
> UdpInDatagrams                  2083580            0.0
> UdpInErrors                     583995             0.0
> UdpRcvbufErrors                 583995             0.0
> IpExtInOctets                   4001340000         0.0
> IpExtInNoECTPkts                2667559            0.0

(baseline 1514 bytes recvmsg)
$ nstat > /dev/null && sleep 1  && nstat
#kernel
IpInReceives                    2702424            0.0
IpInDelivers                    2702423            0.0
UdpInDatagrams                  1950184            0.0
UdpInErrors                     752239             0.0
UdpRcvbufErrors                 752239             0.0
IpExtInOctets                   4053642000         0.0
IpExtInNoECTPkts                2702428            0.0

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply

* Re: [net-next PATCH v5 5/6] virtio_net: add XDP_TX support
From: Michael S. Tsirkin @ 2016-12-08 21:18 UTC (permalink / raw)
  To: John Fastabend
  Cc: daniel, shm, davem, tgraf, alexei.starovoitov, john.r.fastabend,
	netdev, brouer
In-Reply-To: <5849A3EE.7090603@gmail.com>

On Thu, Dec 08, 2016 at 10:18:22AM -0800, John Fastabend wrote:
> > I guess this helps because it just slows down the guest.
> > I don't much like it ...
> 
> I left it like this copying the pattern in balloon and input drivers. I
> can change it back to the previous pattern where it is only called if
> there is no errors. It has been running fine with the old pattern now
> for an hour or so.
> 
> .John

I'd prefer internal consistency. Could be a patch on top
if this helps. I'm happy it isn't actually buggy.
Let me know whether you want to post v6 or
want me to ack this one.

^ permalink raw reply

* Re: [net-next PATCH v5 5/6] virtio_net: add XDP_TX support
From: John Fastabend @ 2016-12-08 21:25 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: daniel, shm, davem, tgraf, alexei.starovoitov, john.r.fastabend,
	netdev, brouer
In-Reply-To: <20161208231708-mutt-send-email-mst@kernel.org>

On 16-12-08 01:18 PM, Michael S. Tsirkin wrote:
> On Thu, Dec 08, 2016 at 10:18:22AM -0800, John Fastabend wrote:
>>> I guess this helps because it just slows down the guest.
>>> I don't much like it ...
>>
>> I left it like this copying the pattern in balloon and input drivers. I
>> can change it back to the previous pattern where it is only called if
>> there is no errors. It has been running fine with the old pattern now
>> for an hour or so.
>>
>> .John
> 
> I'd prefer internal consistency. Could be a patch on top
> if this helps. I'm happy it isn't actually buggy.
> Let me know whether you want to post v6 or
> want me to ack this one.
> 

I think because DaveM has closed net-next ACK'ing this set would be
great to get it in Dave's tree. Then I can post a small "fix" so that it
is consistent between normal stack and xdp tx path after that.

Thanks,
John

^ permalink raw reply

* Re: [net-next PATCH v5 1/6] net: virtio dynamically disable/enable LRO
From: Michael S. Tsirkin @ 2016-12-08 21:36 UTC (permalink / raw)
  To: John Fastabend
  Cc: daniel, shm, davem, tgraf, alexei.starovoitov, john.r.fastabend,
	netdev, brouer
In-Reply-To: <20161207201111.28121.4879.stgit@john-Precision-Tower-5810>

On Wed, Dec 07, 2016 at 12:11:11PM -0800, John Fastabend wrote:
> This adds support for dynamically setting the LRO feature flag. The
> message to control guest features in the backend uses the
> CTRL_GUEST_OFFLOADS msg type.
> 
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> ---
>  drivers/net/virtio_net.c |   40 +++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 39 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index a21d93a..a5c47b1 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1419,6 +1419,36 @@ static void virtnet_init_settings(struct net_device *dev)
>  	.set_settings = virtnet_set_settings,
>  };
>  
> +static int virtnet_set_features(struct net_device *netdev,
> +				netdev_features_t features)
> +{
> +	struct virtnet_info *vi = netdev_priv(netdev);
> +	struct virtio_device *vdev = vi->vdev;
> +	struct scatterlist sg;
> +	u64 offloads = 0;
> +
> +	if (features & NETIF_F_LRO)
> +		offloads |= (1 << VIRTIO_NET_F_GUEST_TSO4) |
> +			    (1 << VIRTIO_NET_F_GUEST_TSO6);
> +
> +	if (features & NETIF_F_RXCSUM)
> +		offloads |= (1 << VIRTIO_NET_F_GUEST_CSUM);
> +
> +	if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS)) {
> +		sg_init_one(&sg, &offloads, sizeof(uint64_t));
> +		if (!virtnet_send_command(vi,
> +					  VIRTIO_NET_CTRL_GUEST_OFFLOADS,
> +					  VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET,
> +					  &sg)) {

Hmm I just realised that this will slow down setups that bridge
virtio net interfaces since bridge calls this if provided.
See below.

> +			dev_warn(&netdev->dev,
> +				 "Failed to set guest offloads by virtnet command.\n");
> +			return -EINVAL;
> +		}
> +	}

Hmm if VIRTIO_NET_F_CTRL_GUEST_OFFLOADS is off, this fails
silently. It might actually be a good idea to avoid
breaking setups.

> +
> +	return 0;
> +}
> +
>  static const struct net_device_ops virtnet_netdev = {
>  	.ndo_open            = virtnet_open,
>  	.ndo_stop   	     = virtnet_close,
> @@ -1435,6 +1465,7 @@ static void virtnet_init_settings(struct net_device *dev)
>  #ifdef CONFIG_NET_RX_BUSY_POLL
>  	.ndo_busy_poll		= virtnet_busy_poll,
>  #endif
> +	.ndo_set_features	= virtnet_set_features,
>  };
>  
>  static void virtnet_config_changed_work(struct work_struct *work)
> @@ -1815,6 +1846,12 @@ static int virtnet_probe(struct virtio_device *vdev)
>  	if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_CSUM))
>  		dev->features |= NETIF_F_RXCSUM;
>  
> +	if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) &&
> +	    virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6)) {
> +		dev->features |= NETIF_F_LRO;
> +		dev->hw_features |= NETIF_F_LRO;

So the issue is I think that the virtio "LRO" isn't really
LRO, it's typically just GRO forwarded to guests.
So these are easily re-split along MTU boundaries,
which makes it ok to forward these across bridges.

It's not nice that we don't document this in the spec,
but it's the reality and people rely on this.

For now, how about doing a custom thing and just disable/enable
it as XDP is attached/detached?

> +	}
> +
>  	dev->vlan_features = dev->features;
>  
>  	/* MTU range: 68 - 65535 */
> @@ -2057,7 +2094,8 @@ static int virtnet_restore(struct virtio_device *vdev)
>  	VIRTIO_NET_F_CTRL_RX, VIRTIO_NET_F_CTRL_VLAN, \
>  	VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_MQ, \
>  	VIRTIO_NET_F_CTRL_MAC_ADDR, \
> -	VIRTIO_NET_F_MTU
> +	VIRTIO_NET_F_MTU, \
> +	VIRTIO_NET_F_CTRL_GUEST_OFFLOADS
>  
>  static unsigned int features[] = {
>  	VIRTNET_FEATURES,

^ permalink raw reply

* Re: Soft lockup in inet_put_port on 4.6
From: Josef Bacik @ 2016-12-08 21:36 UTC (permalink / raw)
  To: Hannes Frederic Sowa; +Cc: Tom Herbert, Linux Kernel Network Developers
In-Reply-To: <1481231024.1911284.813071977.72AF4DEE@webmail.messagingengine.com>

On Thu, Dec 8, 2016 at 4:03 PM, Hannes Frederic Sowa 
<hannes@stressinduktion.org> wrote:
> Hello Tom,
> 
> On Wed, Dec 7, 2016, at 00:06, Tom Herbert wrote:
>>  We are seeing a fair number of machines getting into softlockup in 
>> 4.6
>>  kernel. As near as I can tell this is happening on the spinlock in
>>  bind hash bucket. When inet_csk_get_port exits and does 
>> spinunlock_bh
>>  the TCP timer runs and we hit lockup in inet_put_port (presumably on
>>  same lock). It seems like the locked isn't properly be unlocked
>>  somewhere but I don't readily see it.
>> 
>>  Any ideas?
> 
> Likewise we received reports that pretty much look the same on our
> heavily patched kernel. Did you have a chance to investigate or
> reproduce the problem?
> 
> I am wondering if you would be able to take a complete thread stack 
> dump
> if you can reproduce this to check if one of the user space processes 
> is
> looping inside finding a free port?

We can reproduce the problem at will, still trying to run down the 
problem.  I'll try and find one of the boxes that dumped a core and get 
a bt of everybody.  Thanks,

Josef

^ permalink raw reply

* Re: [net-next PATCH v5 5/6] virtio_net: add XDP_TX support
From: Michael S. Tsirkin @ 2016-12-08 21:45 UTC (permalink / raw)
  To: John Fastabend
  Cc: daniel, shm, davem, tgraf, alexei.starovoitov, john.r.fastabend,
	netdev, brouer
In-Reply-To: <5849CFD4.60103@gmail.com>

On Thu, Dec 08, 2016 at 01:25:40PM -0800, John Fastabend wrote:
> On 16-12-08 01:18 PM, Michael S. Tsirkin wrote:
> > On Thu, Dec 08, 2016 at 10:18:22AM -0800, John Fastabend wrote:
> >>> I guess this helps because it just slows down the guest.
> >>> I don't much like it ...
> >>
> >> I left it like this copying the pattern in balloon and input drivers. I
> >> can change it back to the previous pattern where it is only called if
> >> there is no errors. It has been running fine with the old pattern now
> >> for an hour or so.
> >>
> >> .John
> > 
> > I'd prefer internal consistency. Could be a patch on top
> > if this helps. I'm happy it isn't actually buggy.
> > Let me know whether you want to post v6 or
> > want me to ack this one.
> > 
> 
> I think because DaveM has closed net-next ACK'ing this set would be
> great to get it in Dave's tree. Then I can post a small "fix" so that it
> is consistent between normal stack and xdp tx path after that.
> 
> Thanks,
> John

I can merge it through my tree too - I don't think there's
any dependency on net-next, is there?

-- 
MST

^ permalink raw reply

* Re: [net-next PATCH v5 5/6] virtio_net: add XDP_TX support
From: John Fastabend @ 2016-12-08 21:51 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: daniel, shm, davem, tgraf, alexei.starovoitov, john.r.fastabend,
	netdev, brouer
In-Reply-To: <20161208234513-mutt-send-email-mst@kernel.org>

On 16-12-08 01:45 PM, Michael S. Tsirkin wrote:
> On Thu, Dec 08, 2016 at 01:25:40PM -0800, John Fastabend wrote:
>> On 16-12-08 01:18 PM, Michael S. Tsirkin wrote:
>>> On Thu, Dec 08, 2016 at 10:18:22AM -0800, John Fastabend wrote:
>>>>> I guess this helps because it just slows down the guest.
>>>>> I don't much like it ...
>>>>
>>>> I left it like this copying the pattern in balloon and input drivers. I
>>>> can change it back to the previous pattern where it is only called if
>>>> there is no errors. It has been running fine with the old pattern now
>>>> for an hour or so.
>>>>
>>>> .John
>>>
>>> I'd prefer internal consistency. Could be a patch on top
>>> if this helps. I'm happy it isn't actually buggy.
>>> Let me know whether you want to post v6 or
>>> want me to ack this one.
>>>
>>
>> I think because DaveM has closed net-next ACK'ing this set would be
>> great to get it in Dave's tree. Then I can post a small "fix" so that it
>> is consistent between normal stack and xdp tx path after that.
>>
>> Thanks,
>> John
> 
> I can merge it through my tree too - I don't think there's
> any dependency on net-next, is there?
> 

Its all changes to virtio_net except for patch 2 which is against
filter.c/filter.h which I guess you could possibly get a merge conflict
on. But it would be fairly trivial.

I don't have a preference on which tree it goes through whichever makes
the most sense.

.John

^ permalink raw reply

* Re: [PATCH 1/2] net: ethernet: sxgbe: remove private tx queue lock
From: Pavel Machek @ 2016-12-08 21:54 UTC (permalink / raw)
  To: Lino Sanfilippo
  Cc: Francois Romieu, bh74.an, ks.giri, vipul.pandya, peppe.cavallaro,
	alexandre.torgue, davem, linux-kernel, netdev
In-Reply-To: <051e3043-8b58-0591-36e3-99e2267f67f4@gmx.de>

[-- Attachment #1: Type: text/plain, Size: 2207 bytes --]

On Thu 2016-12-08 21:32:12, Lino Sanfilippo wrote:
> Hi,
> 
> On 08.12.2016 00:15, Francois Romieu wrote:
> > Lino Sanfilippo <LinoSanfilippo@gmx.de> :
> >> The driver uses a private lock for synchronization between the xmit
> >> function and the xmit completion handler, but since the NETIF_F_LLTX flag
> >> is not set, the xmit function is also called with the xmit_lock held.
> >> 
> >> On the other hand the xmit completion handler first takes the private lock
> >> and (in case that the tx queue has been stopped) the xmit_lock, leading
> >> to a reverse locking order and the potential danger of a deadlock.
> > 
> > netif_tx_stop_queue is used by:
> > 1. xmit function before releasing lock and returning.
> > 2. sxgbe_restart_tx_queue()
> >    <- sxgbe_tx_interrupt
> >    <- sxgbe_reset_all_tx_queues()
> >       <- sxgbe_tx_timeout()
> > 
> > Given xmit won't be called again until tx queue is enabled, it's not clear
> > how a deadlock could happen due to #1.
> > 
> 
> 
> After spending more thoughts on this I tend to agree with you. Yes, we have the
> different locking order for the xmit_lock and the private lock in two concurrent
> threads. And one of the first things one learns about locking is that this is a
> good way to create a deadlock sooner or later. But in our case the deadlock 
> can only occur if the xmit function and the tx completion handler perceive different
>  states for the tx queue, or to be more specific: 
> the completion handler sees the tx queue in state "stopped" while the xmit handler 
> sees it in state "running" at the same time. Only then both functions would try to
> take both locks, which could lead to a deadlock.
> 
> OTOH Pavel said that he actually could produce a deadlock. Now I wonder if this is caused
> by that locking scheme (in a way I have not figured out yet) or if it is a different issue.

Pavel has some problems, but that's on different hardware.. and it is
possible that it is deadlock (or something else) somewhere else.

Best regards,
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply

* [PATCH net-next 1/3] net/mlx5e: use %pad format string for dma_addr_t
From: Arnd Bergmann @ 2016-12-08 21:57 UTC (permalink / raw)
  To: Saeed Mahameed, Matan Barak, Leon Romanovsky
  Cc: Arnd Bergmann, David S. Miller, Daniel Jurgens, Tariq Toukan,
	netdev, linux-rdma, linux-kernel

On 32-bit ARM with 64-bit dma_addr_t I get this warning about an
incorrect format string:

In file included from /git/arm-soc/drivers/net/ethernet/mellanox/mlx5/core/alloc.c:42:0:
drivers/net/ethernet/mellanox/mlx5/core/alloc.c: In function ‘mlx5_frag_buf_alloc_node’:
drivers/net/ethernet/mellanox/mlx5/core/alloc.c:134:12: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]

We have the special %pad format for printing dma_addr_t, so use that
to print the correct address and avoid the warning.

Fixes: 1c1b522808a1 ("net/mlx5e: Implement Fragmented Work Queue (WQ)")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 drivers/net/ethernet/mellanox/mlx5/core/alloc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/alloc.c b/drivers/net/ethernet/mellanox/mlx5/core/alloc.c
index 44791de5afe6..66bd213f35ce 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/alloc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/alloc.c
@@ -130,8 +130,8 @@ int mlx5_frag_buf_alloc_node(struct mlx5_core_dev *dev, int size,
 		if (frag->map & ((1 << buf->page_shift) - 1)) {
 			dma_free_coherent(&dev->pdev->dev, frag_sz,
 					  buf->frags[i].buf, buf->frags[i].map);
-			mlx5_core_warn(dev, "unexpected map alignment: 0x%p, page_shift=%d\n",
-				       (void *)frag->map, buf->page_shift);
+			mlx5_core_warn(dev, "unexpected map alignment: %pad, page_shift=%d\n",
+				       &frag->map, buf->page_shift);
 			goto err_free_buf;
 		}
 		size -= frag_sz;
-- 
2.9.0

^ permalink raw reply related

* [PATCH net-next 2/3] net: xgene: move xgene_cle_ptree_ewdn data off stack
From: Arnd Bergmann @ 2016-12-08 21:57 UTC (permalink / raw)
  To: Iyappan Subramanian, Keyur Chudgar
  Cc: Arnd Bergmann, David S. Miller, Khuong Dinh, Quan Nguyen,
	Tanmay Inamdar, netdev, linux-kernel
In-Reply-To: <20161208215727.44841-1-arnd@arndb.de>

The array for initializing the cle is set up on the stack with
almost entirely constant data and then passed to a function that
converts it into HW specific bit patterns. With the latest
addition, the size of this array has grown to the point that
we get a warning about potential stack overflow in allmodconfig
builds:

xgene_enet_cle.c: In function ‘xgene_enet_cle_init’:
xgene_enet_cle.c:836:1: error: the frame size of 1032 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]

Looking a bit deeper at the usage, I noticed that the only modification
of the data is in dead code, as we don't even use the cle module
for phy_mode other than PHY_INTERFACE_MODE_XGMII. This means we
can simply mark the structure constant and access it directly rather
than passing the pointer down through another structure, making
the code more efficient at the same time as avoiding the
warning.

Fixes: a809701fed15 ("drivers: net: xgene: fix: RSS for non-TCP/UDP")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 drivers/net/ethernet/apm/xgene/xgene_enet_cle.c | 768 ++++++++++++------------
 drivers/net/ethernet/apm/xgene/xgene_enet_cle.h |   2 -
 2 files changed, 381 insertions(+), 389 deletions(-)

diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_cle.c b/drivers/net/ethernet/apm/xgene/xgene_enet_cle.c
index 1dc6c20cd82b..e1a51d8892fc 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_cle.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_cle.c
@@ -79,10 +79,10 @@ static void xgene_cle_kn_to_hw(struct xgene_cle_ptree_kn *kn, u32 *buf)
 	}
 }
 
-static void xgene_cle_dn_to_hw(struct xgene_cle_ptree_ewdn *dn,
+static void xgene_cle_dn_to_hw(const struct xgene_cle_ptree_ewdn *dn,
 			       u32 *buf, u32 jb)
 {
-	struct xgene_cle_ptree_branch *br;
+	const struct xgene_cle_ptree_branch *br;
 	u32 i, j = 0;
 	u32 npp;
 
@@ -205,17 +205,385 @@ static int xgene_cle_setup_dbptr(struct xgene_enet_pdata *pdata,
 	return 0;
 }
 
+static const struct xgene_cle_ptree_ewdn xgene_init_ptree_dn[] = {
+	{
+		/* PKT_TYPE_NODE */
+		.node_type = EWDN,
+		.last_node = 0,
+		.hdr_len_store = 1,
+		.hdr_extn = NO_BYTE,
+		.byte_store = NO_BYTE,
+		.search_byte_store = NO_BYTE,
+		.result_pointer = DB_RES_DROP,
+		.num_branches = 2,
+		.branch = {
+			{
+				/* IPV4 */
+				.valid = 1,
+				.next_packet_pointer = 22,
+				.jump_bw = JMP_FW,
+				.jump_rel = JMP_ABS,
+				.operation = EQT,
+				.next_node = PKT_PROT_NODE,
+				.next_branch = 0,
+				.data = 0x8,
+				.mask = 0x0
+			},
+			{
+				.valid = 0,
+				.next_packet_pointer = 262,
+				.jump_bw = JMP_FW,
+				.jump_rel = JMP_ABS,
+				.operation = EQT,
+				.next_node = LAST_NODE,
+				.next_branch = 0,
+				.data = 0x0,
+				.mask = 0xffff
+			}
+		},
+	},
+	{
+		/* PKT_PROT_NODE */
+		.node_type = EWDN,
+		.last_node = 0,
+		.hdr_len_store = 1,
+		.hdr_extn = NO_BYTE,
+		.byte_store = NO_BYTE,
+		.search_byte_store = NO_BYTE,
+		.result_pointer = DB_RES_DROP,
+		.num_branches = 3,
+		.branch = {
+			{
+				/* TCP */
+				.valid = 1,
+				.next_packet_pointer = 26,
+				.jump_bw = JMP_FW,
+				.jump_rel = JMP_ABS,
+				.operation = EQT,
+				.next_node = RSS_IPV4_TCP_NODE,
+				.next_branch = 0,
+				.data = 0x0600,
+				.mask = 0x00ff
+			},
+			{
+				/* UDP */
+				.valid = 1,
+				.next_packet_pointer = 26,
+				.jump_bw = JMP_FW,
+				.jump_rel = JMP_ABS,
+				.operation = EQT,
+				.next_node = RSS_IPV4_UDP_NODE,
+				.next_branch = 0,
+				.data = 0x1100,
+				.mask = 0x00ff
+			},
+			{
+				.valid = 0,
+				.next_packet_pointer = 26,
+				.jump_bw = JMP_FW,
+				.jump_rel = JMP_ABS,
+				.operation = EQT,
+				.next_node = RSS_IPV4_OTHERS_NODE,
+				.next_branch = 0,
+				.data = 0x0,
+				.mask = 0xffff
+			}
+		}
+	},
+	{
+		/* RSS_IPV4_TCP_NODE */
+		.node_type = EWDN,
+		.last_node = 0,
+		.hdr_len_store = 1,
+		.hdr_extn = NO_BYTE,
+		.byte_store = NO_BYTE,
+		.search_byte_store = BOTH_BYTES,
+		.result_pointer = DB_RES_DROP,
+		.num_branches = 6,
+		.branch = {
+			{
+				/* SRC IPV4 B01 */
+				.valid = 0,
+				.next_packet_pointer = 28,
+				.jump_bw = JMP_FW,
+				.jump_rel = JMP_ABS,
+				.operation = EQT,
+				.next_node = RSS_IPV4_TCP_NODE,
+				.next_branch = 1,
+				.data = 0x0,
+				.mask = 0xffff
+			},
+			{
+				/* SRC IPV4 B23 */
+				.valid = 0,
+				.next_packet_pointer = 30,
+				.jump_bw = JMP_FW,
+				.jump_rel = JMP_ABS,
+				.operation = EQT,
+				.next_node = RSS_IPV4_TCP_NODE,
+				.next_branch = 2,
+				.data = 0x0,
+				.mask = 0xffff
+			},
+			{
+				/* DST IPV4 B01 */
+				.valid = 0,
+				.next_packet_pointer = 32,
+				.jump_bw = JMP_FW,
+				.jump_rel = JMP_ABS,
+				.operation = EQT,
+				.next_node = RSS_IPV4_TCP_NODE,
+				.next_branch = 3,
+				.data = 0x0,
+				.mask = 0xffff
+			},
+			{
+				/* DST IPV4 B23 */
+				.valid = 0,
+				.next_packet_pointer = 34,
+				.jump_bw = JMP_FW,
+				.jump_rel = JMP_ABS,
+				.operation = EQT,
+				.next_node = RSS_IPV4_TCP_NODE,
+				.next_branch = 4,
+				.data = 0x0,
+				.mask = 0xffff
+			},
+			{
+				/* TCP SRC Port */
+				.valid = 0,
+				.next_packet_pointer = 36,
+				.jump_bw = JMP_FW,
+				.jump_rel = JMP_ABS,
+				.operation = EQT,
+				.next_node = RSS_IPV4_TCP_NODE,
+				.next_branch = 5,
+				.data = 0x0,
+				.mask = 0xffff
+			},
+			{
+				/* TCP DST Port */
+				.valid = 0,
+				.next_packet_pointer = 256,
+				.jump_bw = JMP_FW,
+				.jump_rel = JMP_ABS,
+				.operation = EQT,
+				.next_node = LAST_NODE,
+				.next_branch = 0,
+				.data = 0x0,
+				.mask = 0xffff
+			}
+		}
+	},
+	{
+		/* RSS_IPV4_UDP_NODE */
+		.node_type = EWDN,
+		.last_node = 0,
+		.hdr_len_store = 1,
+		.hdr_extn = NO_BYTE,
+		.byte_store = NO_BYTE,
+		.search_byte_store = BOTH_BYTES,
+		.result_pointer = DB_RES_DROP,
+		.num_branches = 6,
+		.branch = {
+			{
+				/* SRC IPV4 B01 */
+				.valid = 0,
+				.next_packet_pointer = 28,
+				.jump_bw = JMP_FW,
+				.jump_rel = JMP_ABS,
+				.operation = EQT,
+				.next_node = RSS_IPV4_UDP_NODE,
+				.next_branch = 1,
+				.data = 0x0,
+				.mask = 0xffff
+			},
+			{
+				/* SRC IPV4 B23 */
+				.valid = 0,
+				.next_packet_pointer = 30,
+				.jump_bw = JMP_FW,
+				.jump_rel = JMP_ABS,
+				.operation = EQT,
+				.next_node = RSS_IPV4_UDP_NODE,
+				.next_branch = 2,
+				.data = 0x0,
+				.mask = 0xffff
+			},
+			{
+				/* DST IPV4 B01 */
+				.valid = 0,
+				.next_packet_pointer = 32,
+				.jump_bw = JMP_FW,
+				.jump_rel = JMP_ABS,
+				.operation = EQT,
+				.next_node = RSS_IPV4_UDP_NODE,
+				.next_branch = 3,
+				.data = 0x0,
+				.mask = 0xffff
+			},
+			{
+				/* DST IPV4 B23 */
+				.valid = 0,
+				.next_packet_pointer = 34,
+				.jump_bw = JMP_FW,
+				.jump_rel = JMP_ABS,
+				.operation = EQT,
+				.next_node = RSS_IPV4_UDP_NODE,
+				.next_branch = 4,
+				.data = 0x0,
+				.mask = 0xffff
+			},
+			{
+				/* TCP SRC Port */
+				.valid = 0,
+				.next_packet_pointer = 36,
+				.jump_bw = JMP_FW,
+				.jump_rel = JMP_ABS,
+				.operation = EQT,
+				.next_node = RSS_IPV4_UDP_NODE,
+				.next_branch = 5,
+				.data = 0x0,
+				.mask = 0xffff
+			},
+			{
+				/* TCP DST Port */
+				.valid = 0,
+				.next_packet_pointer = 258,
+				.jump_bw = JMP_FW,
+				.jump_rel = JMP_ABS,
+				.operation = EQT,
+				.next_node = LAST_NODE,
+				.next_branch = 0,
+				.data = 0x0,
+				.mask = 0xffff
+			}
+		}
+	},
+	{
+		/* RSS_IPV4_OTHERS_NODE */
+		.node_type = EWDN,
+		.last_node = 0,
+		.hdr_len_store = 1,
+		.hdr_extn = NO_BYTE,
+		.byte_store = NO_BYTE,
+		.search_byte_store = BOTH_BYTES,
+		.result_pointer = DB_RES_DROP,
+		.num_branches = 6,
+		.branch = {
+			{
+				/* SRC IPV4 B01 */
+				.valid = 0,
+				.next_packet_pointer = 28,
+				.jump_bw = JMP_FW,
+				.jump_rel = JMP_ABS,
+				.operation = EQT,
+				.next_node = RSS_IPV4_OTHERS_NODE,
+				.next_branch = 1,
+				.data = 0x0,
+				.mask = 0xffff
+			},
+			{
+				/* SRC IPV4 B23 */
+				.valid = 0,
+				.next_packet_pointer = 30,
+				.jump_bw = JMP_FW,
+				.jump_rel = JMP_ABS,
+				.operation = EQT,
+				.next_node = RSS_IPV4_OTHERS_NODE,
+				.next_branch = 2,
+				.data = 0x0,
+				.mask = 0xffff
+			},
+			{
+				/* DST IPV4 B01 */
+				.valid = 0,
+				.next_packet_pointer = 32,
+				.jump_bw = JMP_FW,
+				.jump_rel = JMP_ABS,
+				.operation = EQT,
+				.next_node = RSS_IPV4_OTHERS_NODE,
+				.next_branch = 3,
+				.data = 0x0,
+				.mask = 0xffff
+			},
+			{
+				/* DST IPV4 B23 */
+				.valid = 0,
+				.next_packet_pointer = 34,
+				.jump_bw = JMP_FW,
+				.jump_rel = JMP_ABS,
+				.operation = EQT,
+				.next_node = RSS_IPV4_OTHERS_NODE,
+				.next_branch = 4,
+				.data = 0x0,
+				.mask = 0xffff
+			},
+			{
+				/* TCP SRC Port */
+				.valid = 0,
+				.next_packet_pointer = 36,
+				.jump_bw = JMP_FW,
+				.jump_rel = JMP_ABS,
+				.operation = EQT,
+				.next_node = RSS_IPV4_OTHERS_NODE,
+				.next_branch = 5,
+				.data = 0x0,
+				.mask = 0xffff
+			},
+			{
+				/* TCP DST Port */
+				.valid = 0,
+				.next_packet_pointer = 260,
+				.jump_bw = JMP_FW,
+				.jump_rel = JMP_ABS,
+				.operation = EQT,
+				.next_node = LAST_NODE,
+				.next_branch = 0,
+				.data = 0x0,
+				.mask = 0xffff
+			}
+		}
+	},
+
+	{
+		/* LAST NODE */
+		.node_type = EWDN,
+		.last_node = 1,
+		.hdr_len_store = 1,
+		.hdr_extn = NO_BYTE,
+		.byte_store = NO_BYTE,
+		.search_byte_store = NO_BYTE,
+		.result_pointer = DB_RES_DROP,
+		.num_branches = 1,
+		.branch = {
+			{
+				.valid = 0,
+				.next_packet_pointer = 0,
+				.jump_bw = JMP_FW,
+				.jump_rel = JMP_ABS,
+				.operation = EQT,
+				.next_node = MAX_NODES,
+				.next_branch = 0,
+				.data = 0,
+				.mask = 0xffff
+			}
+		}
+	}
+};
+
 static int xgene_cle_setup_node(struct xgene_enet_pdata *pdata,
 				struct xgene_enet_cle *cle)
 {
 	struct xgene_cle_ptree *ptree = &cle->ptree;
-	struct xgene_cle_ptree_ewdn *dn = ptree->dn;
+	const struct xgene_cle_ptree_ewdn *dn = xgene_init_ptree_dn;
+	int num_dn = ARRAY_SIZE(xgene_init_ptree_dn);
 	struct xgene_cle_ptree_kn *kn = ptree->kn;
 	u32 buf[CLE_DRAM_REGS];
 	int i, j, ret;
 
 	memset(buf, 0, sizeof(buf));
-	for (i = 0; i < ptree->num_dn; i++) {
+	for (i = 0; i < num_dn; i++) {
 		xgene_cle_dn_to_hw(&dn[i], buf, cle->jump_bytes);
 		ret = xgene_cle_dram_wr(cle, buf, 17, i + ptree->start_node,
 					PTREE_RAM, CLE_CMD_WR);
@@ -225,8 +593,8 @@ static int xgene_cle_setup_node(struct xgene_enet_pdata *pdata,
 
 	/* continue node index for key node */
 	memset(buf, 0, sizeof(buf));
-	for (j = i; j < (ptree->num_kn + ptree->num_dn); j++) {
-		xgene_cle_kn_to_hw(&kn[j - ptree->num_dn], buf);
+	for (j = i; j < (ptree->num_kn + num_dn); j++) {
+		xgene_cle_kn_to_hw(&kn[j - num_dn], buf);
 		ret = xgene_cle_dram_wr(cle, buf, 17, j + ptree->start_node,
 					PTREE_RAM, CLE_CMD_WR);
 		if (ret)
@@ -407,392 +775,20 @@ static int xgene_enet_cle_init(struct xgene_enet_pdata *pdata)
 	struct xgene_enet_cle *enet_cle = &pdata->cle;
 	u32 def_qid, def_fpsel, def_nxtfpsel, pool_id;
 	struct xgene_cle_dbptr dbptr[DB_MAX_PTRS];
-	struct xgene_cle_ptree_branch *br;
 	struct xgene_cle_ptree *ptree;
 	struct xgene_cle_ptree_kn kn;
 	int ret;
-	struct xgene_cle_ptree_ewdn ptree_dn[] = {
-		{
-			/* PKT_TYPE_NODE */
-			.node_type = EWDN,
-			.last_node = 0,
-			.hdr_len_store = 1,
-			.hdr_extn = NO_BYTE,
-			.byte_store = NO_BYTE,
-			.search_byte_store = NO_BYTE,
-			.result_pointer = DB_RES_DROP,
-			.num_branches = 2,
-			.branch = {
-				{
-					/* IPV4 */
-					.valid = 1,
-					.next_packet_pointer = 22,
-					.jump_bw = JMP_FW,
-					.jump_rel = JMP_ABS,
-					.operation = EQT,
-					.next_node = PKT_PROT_NODE,
-					.next_branch = 0,
-					.data = 0x8,
-					.mask = 0x0
-				},
-				{
-					.valid = 0,
-					.next_packet_pointer = 262,
-					.jump_bw = JMP_FW,
-					.jump_rel = JMP_ABS,
-					.operation = EQT,
-					.next_node = LAST_NODE,
-					.next_branch = 0,
-					.data = 0x0,
-					.mask = 0xffff
-				}
-			},
-		},
-		{
-			/* PKT_PROT_NODE */
-			.node_type = EWDN,
-			.last_node = 0,
-			.hdr_len_store = 1,
-			.hdr_extn = NO_BYTE,
-			.byte_store = NO_BYTE,
-			.search_byte_store = NO_BYTE,
-			.result_pointer = DB_RES_DROP,
-			.num_branches = 3,
-			.branch = {
-				{
-					/* TCP */
-					.valid = 1,
-					.next_packet_pointer = 26,
-					.jump_bw = JMP_FW,
-					.jump_rel = JMP_ABS,
-					.operation = EQT,
-					.next_node = RSS_IPV4_TCP_NODE,
-					.next_branch = 0,
-					.data = 0x0600,
-					.mask = 0x00ff
-				},
-				{
-					/* UDP */
-					.valid = 1,
-					.next_packet_pointer = 26,
-					.jump_bw = JMP_FW,
-					.jump_rel = JMP_ABS,
-					.operation = EQT,
-					.next_node = RSS_IPV4_UDP_NODE,
-					.next_branch = 0,
-					.data = 0x1100,
-					.mask = 0x00ff
-				},
-				{
-					.valid = 0,
-					.next_packet_pointer = 26,
-					.jump_bw = JMP_FW,
-					.jump_rel = JMP_ABS,
-					.operation = EQT,
-					.next_node = RSS_IPV4_OTHERS_NODE,
-					.next_branch = 0,
-					.data = 0x0,
-					.mask = 0xffff
-				}
-			}
-		},
-		{
-			/* RSS_IPV4_TCP_NODE */
-			.node_type = EWDN,
-			.last_node = 0,
-			.hdr_len_store = 1,
-			.hdr_extn = NO_BYTE,
-			.byte_store = NO_BYTE,
-			.search_byte_store = BOTH_BYTES,
-			.result_pointer = DB_RES_DROP,
-			.num_branches = 6,
-			.branch = {
-				{
-					/* SRC IPV4 B01 */
-					.valid = 0,
-					.next_packet_pointer = 28,
-					.jump_bw = JMP_FW,
-					.jump_rel = JMP_ABS,
-					.operation = EQT,
-					.next_node = RSS_IPV4_TCP_NODE,
-					.next_branch = 1,
-					.data = 0x0,
-					.mask = 0xffff
-				},
-				{
-					/* SRC IPV4 B23 */
-					.valid = 0,
-					.next_packet_pointer = 30,
-					.jump_bw = JMP_FW,
-					.jump_rel = JMP_ABS,
-					.operation = EQT,
-					.next_node = RSS_IPV4_TCP_NODE,
-					.next_branch = 2,
-					.data = 0x0,
-					.mask = 0xffff
-				},
-				{
-					/* DST IPV4 B01 */
-					.valid = 0,
-					.next_packet_pointer = 32,
-					.jump_bw = JMP_FW,
-					.jump_rel = JMP_ABS,
-					.operation = EQT,
-					.next_node = RSS_IPV4_TCP_NODE,
-					.next_branch = 3,
-					.data = 0x0,
-					.mask = 0xffff
-				},
-				{
-					/* DST IPV4 B23 */
-					.valid = 0,
-					.next_packet_pointer = 34,
-					.jump_bw = JMP_FW,
-					.jump_rel = JMP_ABS,
-					.operation = EQT,
-					.next_node = RSS_IPV4_TCP_NODE,
-					.next_branch = 4,
-					.data = 0x0,
-					.mask = 0xffff
-				},
-				{
-					/* TCP SRC Port */
-					.valid = 0,
-					.next_packet_pointer = 36,
-					.jump_bw = JMP_FW,
-					.jump_rel = JMP_ABS,
-					.operation = EQT,
-					.next_node = RSS_IPV4_TCP_NODE,
-					.next_branch = 5,
-					.data = 0x0,
-					.mask = 0xffff
-				},
-				{
-					/* TCP DST Port */
-					.valid = 0,
-					.next_packet_pointer = 256,
-					.jump_bw = JMP_FW,
-					.jump_rel = JMP_ABS,
-					.operation = EQT,
-					.next_node = LAST_NODE,
-					.next_branch = 0,
-					.data = 0x0,
-					.mask = 0xffff
-				}
-			}
-		},
-		{
-			/* RSS_IPV4_UDP_NODE */
-			.node_type = EWDN,
-			.last_node = 0,
-			.hdr_len_store = 1,
-			.hdr_extn = NO_BYTE,
-			.byte_store = NO_BYTE,
-			.search_byte_store = BOTH_BYTES,
-			.result_pointer = DB_RES_DROP,
-			.num_branches = 6,
-			.branch = {
-				{
-					/* SRC IPV4 B01 */
-					.valid = 0,
-					.next_packet_pointer = 28,
-					.jump_bw = JMP_FW,
-					.jump_rel = JMP_ABS,
-					.operation = EQT,
-					.next_node = RSS_IPV4_UDP_NODE,
-					.next_branch = 1,
-					.data = 0x0,
-					.mask = 0xffff
-				},
-				{
-					/* SRC IPV4 B23 */
-					.valid = 0,
-					.next_packet_pointer = 30,
-					.jump_bw = JMP_FW,
-					.jump_rel = JMP_ABS,
-					.operation = EQT,
-					.next_node = RSS_IPV4_UDP_NODE,
-					.next_branch = 2,
-					.data = 0x0,
-					.mask = 0xffff
-				},
-				{
-					/* DST IPV4 B01 */
-					.valid = 0,
-					.next_packet_pointer = 32,
-					.jump_bw = JMP_FW,
-					.jump_rel = JMP_ABS,
-					.operation = EQT,
-					.next_node = RSS_IPV4_UDP_NODE,
-					.next_branch = 3,
-					.data = 0x0,
-					.mask = 0xffff
-				},
-				{
-					/* DST IPV4 B23 */
-					.valid = 0,
-					.next_packet_pointer = 34,
-					.jump_bw = JMP_FW,
-					.jump_rel = JMP_ABS,
-					.operation = EQT,
-					.next_node = RSS_IPV4_UDP_NODE,
-					.next_branch = 4,
-					.data = 0x0,
-					.mask = 0xffff
-				},
-				{
-					/* TCP SRC Port */
-					.valid = 0,
-					.next_packet_pointer = 36,
-					.jump_bw = JMP_FW,
-					.jump_rel = JMP_ABS,
-					.operation = EQT,
-					.next_node = RSS_IPV4_UDP_NODE,
-					.next_branch = 5,
-					.data = 0x0,
-					.mask = 0xffff
-				},
-				{
-					/* TCP DST Port */
-					.valid = 0,
-					.next_packet_pointer = 258,
-					.jump_bw = JMP_FW,
-					.jump_rel = JMP_ABS,
-					.operation = EQT,
-					.next_node = LAST_NODE,
-					.next_branch = 0,
-					.data = 0x0,
-					.mask = 0xffff
-				}
-			}
-		},
-		{
-			/* RSS_IPV4_OTHERS_NODE */
-			.node_type = EWDN,
-			.last_node = 0,
-			.hdr_len_store = 1,
-			.hdr_extn = NO_BYTE,
-			.byte_store = NO_BYTE,
-			.search_byte_store = BOTH_BYTES,
-			.result_pointer = DB_RES_DROP,
-			.num_branches = 6,
-			.branch = {
-				{
-					/* SRC IPV4 B01 */
-					.valid = 0,
-					.next_packet_pointer = 28,
-					.jump_bw = JMP_FW,
-					.jump_rel = JMP_ABS,
-					.operation = EQT,
-					.next_node = RSS_IPV4_OTHERS_NODE,
-					.next_branch = 1,
-					.data = 0x0,
-					.mask = 0xffff
-				},
-				{
-					/* SRC IPV4 B23 */
-					.valid = 0,
-					.next_packet_pointer = 30,
-					.jump_bw = JMP_FW,
-					.jump_rel = JMP_ABS,
-					.operation = EQT,
-					.next_node = RSS_IPV4_OTHERS_NODE,
-					.next_branch = 2,
-					.data = 0x0,
-					.mask = 0xffff
-				},
-				{
-					/* DST IPV4 B01 */
-					.valid = 0,
-					.next_packet_pointer = 32,
-					.jump_bw = JMP_FW,
-					.jump_rel = JMP_ABS,
-					.operation = EQT,
-					.next_node = RSS_IPV4_OTHERS_NODE,
-					.next_branch = 3,
-					.data = 0x0,
-					.mask = 0xffff
-				},
-				{
-					/* DST IPV4 B23 */
-					.valid = 0,
-					.next_packet_pointer = 34,
-					.jump_bw = JMP_FW,
-					.jump_rel = JMP_ABS,
-					.operation = EQT,
-					.next_node = RSS_IPV4_OTHERS_NODE,
-					.next_branch = 4,
-					.data = 0x0,
-					.mask = 0xffff
-				},
-				{
-					/* TCP SRC Port */
-					.valid = 0,
-					.next_packet_pointer = 36,
-					.jump_bw = JMP_FW,
-					.jump_rel = JMP_ABS,
-					.operation = EQT,
-					.next_node = RSS_IPV4_OTHERS_NODE,
-					.next_branch = 5,
-					.data = 0x0,
-					.mask = 0xffff
-				},
-				{
-					/* TCP DST Port */
-					.valid = 0,
-					.next_packet_pointer = 260,
-					.jump_bw = JMP_FW,
-					.jump_rel = JMP_ABS,
-					.operation = EQT,
-					.next_node = LAST_NODE,
-					.next_branch = 0,
-					.data = 0x0,
-					.mask = 0xffff
-				}
-			}
-		},
 
-		{
-			/* LAST NODE */
-			.node_type = EWDN,
-			.last_node = 1,
-			.hdr_len_store = 1,
-			.hdr_extn = NO_BYTE,
-			.byte_store = NO_BYTE,
-			.search_byte_store = NO_BYTE,
-			.result_pointer = DB_RES_DROP,
-			.num_branches = 1,
-			.branch = {
-				{
-					.valid = 0,
-					.next_packet_pointer = 0,
-					.jump_bw = JMP_FW,
-					.jump_rel = JMP_ABS,
-					.operation = EQT,
-					.next_node = MAX_NODES,
-					.next_branch = 0,
-					.data = 0,
-					.mask = 0xffff
-				}
-			}
-		}
-	};
+	if (pdata->phy_mode != PHY_INTERFACE_MODE_XGMII)
+		return -EINVAL;
 
 	ptree = &enet_cle->ptree;
 	ptree->start_pkt = 12; /* Ethertype */
-	if (pdata->phy_mode == PHY_INTERFACE_MODE_XGMII) {
-		ret = xgene_cle_setup_rss(pdata);
-		if (ret) {
-			netdev_err(pdata->ndev, "RSS initialization failed\n");
-			return ret;
-		}
-	} else {
-		br = &ptree_dn[PKT_PROT_NODE].branch[0];
-		br->valid = 0;
-		br->next_packet_pointer = 260;
-		br->next_node = LAST_NODE;
-		br->data = 0x0000;
-		br->mask = 0xffff;
+
+	ret = xgene_cle_setup_rss(pdata);
+	if (ret) {
+		netdev_err(pdata->ndev, "RSS initialization failed\n");
+		return ret;
 	}
 
 	def_qid = xgene_enet_dst_ring_num(pdata->rx_ring[0]);
@@ -825,10 +821,8 @@ static int xgene_enet_cle_init(struct xgene_enet_pdata *pdata)
 	kn.key[0].priority = 0;
 	kn.key[0].result_pointer = DB_RES_ACCEPT;
 
-	ptree->dn = ptree_dn;
 	ptree->kn = &kn;
 	ptree->dbptr = dbptr;
-	ptree->num_dn = MAX_NODES;
 	ptree->num_kn = 1;
 	ptree->num_dbptr = DB_MAX_PTRS;
 
diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_cle.h b/drivers/net/ethernet/apm/xgene/xgene_enet_cle.h
index 290d5d159ec2..18fe8d56082c 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_cle.h
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_cle.h
@@ -278,10 +278,8 @@ struct xgene_cle_dbptr {
 };
 
 struct xgene_cle_ptree {
-	struct xgene_cle_ptree_ewdn *dn;
 	struct xgene_cle_ptree_kn *kn;
 	struct xgene_cle_dbptr *dbptr;
-	u32 num_dn;
 	u32 num_kn;
 	u32 num_dbptr;
 	u32 start_node;
-- 
2.9.0

^ permalink raw reply related

* Re: [PATCH net-next 2/2] net: ethernet: Initial driver for Synopsys DWC XLGMAC
From: Pavel Machek @ 2016-12-08 21:57 UTC (permalink / raw)
  To: Jie Deng
  Cc: davem, f.fainelli, netdev, linux-kernel, CARLOS.PALMINHA,
	lars.persson, thomas.lendacky, peppe.cavallaro, Joao.Pinto
In-Reply-To: <bea5ea07-cbab-35fe-434c-1cfcb96225ef@synopsys.com>

[-- Attachment #1: Type: text/plain, Size: 2399 bytes --]

On Thu 2016-12-08 13:58:47, Jie Deng wrote:
> 
> 
> On 2016/12/7 17:48, Pavel Machek wrote:
> > Hi!
> >
> >> This patch provides the initial driver for 25/40/50/100 GbE
> >> devices using Synopsys DWC Enterprise Ethernet (XLGMAC)
> >>
> >> Signed-off-by: Jie Deng <jiedeng@synopsys.com>
> > I trust this is different hardware from stmmac?
> Yes, different hardware. Currently,  Synopsys has several Ethernet IP cores. But
> drivers for these hardwares are scattered in different places. I really hope
> that they can be centralized into driver/net/ethernet/synopsys for maintenance.
> This makes things easy for others to find the corresponding drivers.
>     - QoS
>         drivers/net/ethernet/synopsys/dwc_eth_qos.c
>         drivers/net/ethernet/stmicro/stmmac/          
>     - XGMAC
>         driver/net/ethernet/amd/xgbe/
>     - XLGMAC
>         No driver for this hardware in mainline at present.


> >> diff --git a/MAINTAINERS b/MAINTAINERS
> >> index 331f6af..738f818 100644
> >> --- a/MAINTAINERS
> >> +++ b/MAINTAINERS
> >> @@ -10639,6 +10639,12 @@ S: Supported
> >>  F: Documentation/devicetree/bindings/net/snps,dwc-qos-ethernet.txt
> >>  F: drivers/net/ethernet/synopsys/dwc_eth_qos.c
> >>  
> >> +SYNOPSYS DESIGNWARE ENTERPRISE ETHERNET (XLGMAC) DRIVER
> >> +M:	jiedeng <jiedeng@synopsys.com>
> > Jie Deng here?
> Oh, should be Jie Deng here. Thank you!
> >
> >> @@ -0,0 +1,228 @@
> >> +/*
> >> + * Synopsys DesignWare Ethernet Driver
> > Your patch title names hardware differently...?
> Yes, currently, it should be XLGMAC driver. I extracted the parts can be shared
> for different Synopsys IPs.
> 
> The XLGMAC driver can be divided into three parts
> 
> - The DWC ethernet core layer (DWC ECL):
> Currently, it supports XLGMAC. But I think this part can be shared for different
> Synopsys IPs.
> 
> - The DWC MAC HW adapter layer (DWC MHAL):
> This part was designed to include special support for a specific MAC IP.
> Currently, XLGMAC, and I think we can add
> something to support XGMAC and QoS in the future.
> 
> - The glue adapter layer (GAL)

Ok, thanks for description. Hopefully it can be merged in a way that
is not too confusing...

Best regards,
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply

* [PATCH net-next 3/3] net: xgene: avoid bogus maybe-uninitialized warning
From: Arnd Bergmann @ 2016-12-08 21:57 UTC (permalink / raw)
  To: Iyappan Subramanian, Keyur Chudgar
  Cc: Arnd Bergmann, David S. Miller, Quan Nguyen, Khuong Dinh, Toan Le,
	netdev, linux-kernel
In-Reply-To: <20161208215727.44841-1-arnd@arndb.de>

In some configurations, gcc cannot trace the state of variables
across a spin_unlock() barrier, leading to a warning about
correct code:

xgene_enet_main.c: In function 'xgene_enet_start_xmit':
../../../phy/mdio-xgene.h:112:14: error: 'mss_index' may be used uninitialized in this function [-Werror=maybe-uninitialized]

Here we can trivially move the assignment before that spin_unlock,
which reliably avoids the warning.

Fixes: e3978673f514 ("drivers: net: xgene: Fix MSS programming")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 drivers/net/ethernet/apm/xgene/xgene_enet_main.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
index 6c7eea8b36af..dba4b883e9a3 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
@@ -319,11 +319,11 @@ static int xgene_enet_setup_mss(struct net_device *ndev, u32 mss)
 		}
 	}
 
-	spin_unlock(&pdata->mss_lock);
-
 	/* No slots with ref_count = 0 available, return busy */
 	if (!mss_index_found)
-		return -EBUSY;
+		mss_index = -EBUSY;
+
+	spin_unlock(&pdata->mss_lock);
 
 	return mss_index;
 }
-- 
2.9.0

^ permalink raw reply related

* Re: net-next closing, README
From: Timur Tabi @ 2016-12-08 22:02 UTC (permalink / raw)
  To: David Miller; +Cc: netdev
In-Reply-To: <20161207.162845.1424733568267357691.davem@davemloft.net>

On Wed, Dec 7, 2016 at 3:28 PM, David Miller <davem@davemloft.net> wrote:

> Therefore, please do not submit any new features or cleanups for
> net-next.  Bug fixes for problems introduced in net-next are fine,
> however.

I posted a v3 of my "QDF2400 support" patchset today, although v1 was
posted two weeks ago.  Am I breaking the rule? I'm really hoping to
get those patches into 4.10, if that's okay.

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.

^ permalink raw reply

* Re: [PATCH 1/2] net: ethernet: sxgbe: remove private tx queue lock
From: Lino Sanfilippo @ 2016-12-08 22:12 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Francois Romieu, bh74.an, ks.giri, vipul.pandya, peppe.cavallaro,
	alexandre.torgue, davem, linux-kernel, netdev
In-Reply-To: <20161208215409.GA12472@amd>

Hi,

On 08.12.2016 22:54, Pavel Machek wrote:
> On Thu 2016-12-08 21:32:12, Lino Sanfilippo wrote:
>> Hi,
>> 
>> On 08.12.2016 00:15, Francois Romieu wrote:
>> > Lino Sanfilippo <LinoSanfilippo@gmx.de> :
>> >> The driver uses a private lock for synchronization between the xmit
>> >> function and the xmit completion handler, but since the NETIF_F_LLTX flag
>> >> is not set, the xmit function is also called with the xmit_lock held.
>> >> 
>> >> On the other hand the xmit completion handler first takes the private lock
>> >> and (in case that the tx queue has been stopped) the xmit_lock, leading
>> >> to a reverse locking order and the potential danger of a deadlock.
>> > 
>> > netif_tx_stop_queue is used by:
>> > 1. xmit function before releasing lock and returning.
>> > 2. sxgbe_restart_tx_queue()
>> >    <- sxgbe_tx_interrupt
>> >    <- sxgbe_reset_all_tx_queues()
>> >       <- sxgbe_tx_timeout()
>> > 
>> > Given xmit won't be called again until tx queue is enabled, it's not clear
>> > how a deadlock could happen due to #1.
>> > 
>> 
>> 
>> After spending more thoughts on this I tend to agree with you. Yes, we have the
>> different locking order for the xmit_lock and the private lock in two concurrent
>> threads. And one of the first things one learns about locking is that this is a
>> good way to create a deadlock sooner or later. But in our case the deadlock 
>> can only occur if the xmit function and the tx completion handler perceive different
>>  states for the tx queue, or to be more specific: 
>> the completion handler sees the tx queue in state "stopped" while the xmit handler 
>> sees it in state "running" at the same time. Only then both functions would try to
>> take both locks, which could lead to a deadlock.
>> 
>> OTOH Pavel said that he actually could produce a deadlock. Now I wonder if this is caused
>> by that locking scheme (in a way I have not figured out yet) or if it is a different issue.
> 
> Pavel has some problems, but that's on different hardware.. and it is
> possible that it is deadlock (or something else) somewhere else.
> 

Right, it is different hardware. But the locking situation in xmit function and tx completion handler
is very similar in both drivers. So if a deadlock is not possible in sxgbe it should 
also not be possible in stmmac (at least not due to the different locking order). 
So maybe there is no real issue that we could fix with removing the private lock and we should
keep it as it is.

Regards,
Lino

^ permalink raw reply

* Re: [PATCH net-next 1/2] net: phy: add extension of phy-mode for XLGMII
From: Florian Fainelli @ 2016-12-08 22:15 UTC (permalink / raw)
  To: Jie Deng, davem, netdev
  Cc: linux-kernel, CARLOS.PALMINHA, lars.persson, thomas.lendacky
In-Reply-To: <6d4a65c5488364cdaec72eb22e742aa7e94dfebb.1481075763.git.jiedeng@synopsys.com>

On 12/06/2016 07:57 PM, Jie Deng wrote:
> This patch adds phy-mode support for Synopsys XLGMAC

The functional changes look good, but I would like to see some
description of what the XL part stands for here.

While you are modifying this, do you also mind submitting a Device Tree
specification change:

https://www.devicetree.org/specifications/

Thanks!

> 
> Signed-off-by: Jie Deng <jiedeng@synopsys.com>
> ---
>  Documentation/devicetree/bindings/net/ethernet.txt | 1 +
>  include/linux/phy.h                                | 3 +++
>  2 files changed, 4 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/net/ethernet.txt b/Documentation/devicetree/bindings/net/ethernet.txt
> index 0515095..2378f00 100644
> --- a/Documentation/devicetree/bindings/net/ethernet.txt
> +++ b/Documentation/devicetree/bindings/net/ethernet.txt
> @@ -28,6 +28,7 @@ The following properties are common to the Ethernet controllers:
>    * "rtbi"
>    * "smii"
>    * "xgmii"
> +  * "xlgmii"
>    * "trgmii"
>  - phy-connection-type: the same as "phy-mode" property but described in ePAPR;
>  - phy-handle: phandle, specifies a reference to a node representing a PHY
> diff --git a/include/linux/phy.h b/include/linux/phy.h
> index feb8a98..b52f9f8 100644
> --- a/include/linux/phy.h
> +++ b/include/linux/phy.h
> @@ -79,6 +79,7 @@
>  	PHY_INTERFACE_MODE_RTBI,
>  	PHY_INTERFACE_MODE_SMII,
>  	PHY_INTERFACE_MODE_XGMII,
> +	PHY_INTERFACE_MODE_XLGMII,
>  	PHY_INTERFACE_MODE_MOCA,
>  	PHY_INTERFACE_MODE_QSGMII,
>  	PHY_INTERFACE_MODE_TRGMII,
> @@ -136,6 +137,8 @@ static inline const char *phy_modes(phy_interface_t interface)
>  		return "smii";
>  	case PHY_INTERFACE_MODE_XGMII:
>  		return "xgmii";
> +	case PHY_INTERFACE_MODE_XLGMII:
> +		return "xlgmii";
>  	case PHY_INTERFACE_MODE_MOCA:
>  		return "moca";
>  	case PHY_INTERFACE_MODE_QSGMII:
> 


-- 
Florian

^ permalink raw reply

* Re: [net-next PATCH v5 0/6] XDP for virtio_net
From: David Miller @ 2016-12-08 22:16 UTC (permalink / raw)
  To: john.fastabend
  Cc: alexei.starovoitov, daniel, mst, shm, tgraf, john.r.fastabend,
	netdev, brouer
In-Reply-To: <5849C68F.7080707@gmail.com>

From: John Fastabend <john.fastabend@gmail.com>
Date: Thu, 8 Dec 2016 12:46:07 -0800

> On 16-12-08 11:38 AM, Alexei Starovoitov wrote:
>> On Thu, Dec 08, 2016 at 02:17:02PM -0500, David Miller wrote:
>>> From: John Fastabend <john.fastabend@gmail.com>
>>> Date: Wed, 07 Dec 2016 12:10:47 -0800
>>>
>>>> This implements virtio_net for the mergeable buffers and big_packet
>>>> modes. I tested this with vhost_net running on qemu and did not see
>>>> any issues. For testing num_buf > 1 I added a hack to vhost driver
>>>> to only but 100 bytes per buffer.
>>>  ...
>>>
>>> So where are we with this?
> 
> There is one possible issue with a hang that Michael pointed out. I can
> either spin a v6 or if you pull this v5 series in I can post a bugfix
> for it. I am not seeing the issue in practice XDP virtio has been up
> and running on my box here for days without issue.

If that's the case then please spin a v6 and I'll apply it.

^ permalink raw reply

* Re: [PATCH 1/2] net: ethernet: sxgbe: remove private tx queue lock
From: Pavel Machek @ 2016-12-08 22:18 UTC (permalink / raw)
  To: Lino Sanfilippo
  Cc: Francois Romieu, bh74.an, ks.giri, vipul.pandya, peppe.cavallaro,
	alexandre.torgue, davem, linux-kernel, netdev
In-Reply-To: <9b55b51c-bbbf-7f80-fb67-9df88a288708@gmx.de>

[-- Attachment #1: Type: text/plain, Size: 2997 bytes --]

On Thu 2016-12-08 23:12:10, Lino Sanfilippo wrote:
> Hi,
> 
> On 08.12.2016 22:54, Pavel Machek wrote:
> > On Thu 2016-12-08 21:32:12, Lino Sanfilippo wrote:
> >> Hi,
> >> 
> >> On 08.12.2016 00:15, Francois Romieu wrote:
> >> > Lino Sanfilippo <LinoSanfilippo@gmx.de> :
> >> >> The driver uses a private lock for synchronization between the xmit
> >> >> function and the xmit completion handler, but since the NETIF_F_LLTX flag
> >> >> is not set, the xmit function is also called with the xmit_lock held.
> >> >> 
> >> >> On the other hand the xmit completion handler first takes the private lock
> >> >> and (in case that the tx queue has been stopped) the xmit_lock, leading
> >> >> to a reverse locking order and the potential danger of a deadlock.
> >> > 
> >> > netif_tx_stop_queue is used by:
> >> > 1. xmit function before releasing lock and returning.
> >> > 2. sxgbe_restart_tx_queue()
> >> >    <- sxgbe_tx_interrupt
> >> >    <- sxgbe_reset_all_tx_queues()
> >> >       <- sxgbe_tx_timeout()
> >> > 
> >> > Given xmit won't be called again until tx queue is enabled, it's not clear
> >> > how a deadlock could happen due to #1.
> >> > 
> >> 
> >> 
> >> After spending more thoughts on this I tend to agree with you. Yes, we have the
> >> different locking order for the xmit_lock and the private lock in two concurrent
> >> threads. And one of the first things one learns about locking is that this is a
> >> good way to create a deadlock sooner or later. But in our case the deadlock 
> >> can only occur if the xmit function and the tx completion handler perceive different
> >>  states for the tx queue, or to be more specific: 
> >> the completion handler sees the tx queue in state "stopped" while the xmit handler 
> >> sees it in state "running" at the same time. Only then both functions would try to
> >> take both locks, which could lead to a deadlock.
> >> 
> >> OTOH Pavel said that he actually could produce a deadlock. Now I wonder if this is caused
> >> by that locking scheme (in a way I have not figured out yet) or if it is a different issue.
> > 
> > Pavel has some problems, but that's on different hardware.. and it is
> > possible that it is deadlock (or something else) somewhere else.
> > 
> 
> Right, it is different hardware. But the locking situation in xmit function and tx completion handler
> is very similar in both drivers. So if a deadlock is not possible in sxgbe it should 
> also not be possible in stmmac (at least not due to the different locking order). 
> So maybe there is no real issue that we could fix with removing the private lock and we should
> keep it as it is.

Well.. the locking is pretty confused there. Having private lock that
mirrors lock from network layer is confusing and ugly... that should
be reason to fix it.
									Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox