Netdev List

Netdev List
 help / color / mirror / Atom feed

* [Patch net v2] af_unix: conditionally use freezable blocking calls in read
From: Cong Wang @ 2016-11-17 23:55 UTC (permalink / raw)
  To: netdev
  Cc: dvyukov, Cong Wang, Tejun Heo, Colin Cross, Rafael J. Wysocki,
	Hannes Frederic Sowa

Commit 2b15af6f95 ("af_unix: use freezable blocking calls in read")
converts schedule_timeout() to its freezable version, it was probably
correct at that time, but later, commit 2b514574f7e8
("net: af_unix: implement splice for stream af_unix sockets") breaks
the strong requirement for a freezable sleep, according to
commit 0f9548ca1091:

    We shouldn't try_to_freeze if locks are held.  Holding a lock can cause a
    deadlock if the lock is later acquired in the suspend or hibernate path
    (e.g.  by dpm).  Holding a lock can also cause a deadlock in the case of
    cgroup_freezer if a lock is held inside a frozen cgroup that is later
    acquired by a process outside that group.

The pipe_lock is still held at that point.

So use freezable version only for the recvmsg call path, avoid impact for
Android.

Fixes: 2b514574f7e8 ("net: af_unix: implement splice for stream af_unix sockets")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Colin Cross <ccross@android.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 net/unix/af_unix.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 5d1c14a..2358f26 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -2199,7 +2199,8 @@ static int unix_dgram_recvmsg(struct socket *sock, struct msghdr *msg,
  *	Sleep until more data has arrived. But check for races..
  */
 static long unix_stream_data_wait(struct sock *sk, long timeo,
-				  struct sk_buff *last, unsigned int last_len)
+				  struct sk_buff *last, unsigned int last_len,
+				  bool freezable)
 {
 	struct sk_buff *tail;
 	DEFINE_WAIT(wait);
@@ -2220,7 +2221,10 @@ static long unix_stream_data_wait(struct sock *sk, long timeo,
 
 		sk_set_bit(SOCKWQ_ASYNC_WAITDATA, sk);
 		unix_state_unlock(sk);
-		timeo = freezable_schedule_timeout(timeo);
+		if (freezable)
+			timeo = freezable_schedule_timeout(timeo);
+		else
+			timeo = schedule_timeout(timeo);
 		unix_state_lock(sk);
 
 		if (sock_flag(sk, SOCK_DEAD))
@@ -2250,7 +2254,8 @@ struct unix_stream_read_state {
 	unsigned int splice_flags;
 };
 
-static int unix_stream_read_generic(struct unix_stream_read_state *state)
+static int unix_stream_read_generic(struct unix_stream_read_state *state,
+				    bool freezable)
 {
 	struct scm_cookie scm;
 	struct socket *sock = state->socket;
@@ -2330,7 +2335,7 @@ static int unix_stream_read_generic(struct unix_stream_read_state *state)
 			mutex_unlock(&u->iolock);
 
 			timeo = unix_stream_data_wait(sk, timeo, last,
-						      last_len);
+						      last_len, freezable);
 
 			if (signal_pending(current)) {
 				err = sock_intr_errno(timeo);
@@ -2472,7 +2477,7 @@ static int unix_stream_recvmsg(struct socket *sock, struct msghdr *msg,
 		.flags = flags
 	};
 
-	return unix_stream_read_generic(&state);
+	return unix_stream_read_generic(&state, true);
 }
 
 static int unix_stream_splice_actor(struct sk_buff *skb,
@@ -2503,7 +2508,7 @@ static ssize_t unix_stream_splice_read(struct socket *sock,  loff_t *ppos,
 	    flags & SPLICE_F_NONBLOCK)
 		state.flags = MSG_DONTWAIT;
 
-	return unix_stream_read_generic(&state);
+	return unix_stream_read_generic(&state, false);
 }
 
 static int unix_shutdown(struct socket *sock, int mode)
-- 
2.1.0

^ permalink raw reply related

* Re: Virtio_net support vxlan encapsulation package TSO offload discuss
From: Jarno Rajahalme @ 2016-11-17 23:10 UTC (permalink / raw)
  To: Jason Wang
  Cc: Zhangming (James, Euler), netdev@vger.kernel.org,
	Michael S. Tsirkin, Vlad Yasevic, Amnon Ilan
In-Reply-To: <0438366c-8b0e-270e-cbd1-334c1d655428@redhat.com>

I worked on the same issue a few months back. I rebased my proof-of-concept code to the current net-next and posted an RFC patch a moment ago.

I have zero experience on QEMU feature negotiation or extending the virtio_net spec. Since the virtio_net handling code is now all done using shared code, this should work for macvtap as well, not sure if macvtap needs some control plane changes.

I posted a separate patch to make af_packet also use the shared infra for virtio_net handling yesterday. My RFC patch assumes that af_packet need not be touched, i.e., assumes the af_packet patch is applied, even though the patches apply to net-next in either order.

  Jarno

> On Nov 16, 2016, at 11:27 PM, Jason Wang <jasowang@redhat.com> wrote:
> 
> 
> 
> On 2016年11月17日 09:31, Zhangming (James, Euler) wrote:
>> On 2016年11月15日 11:28, Jason Wang wrote:
>>> On 2016年11月10日 14:19, Zhangming (James, Euler) wrote:
>>>> On 2016年11月09日 15:14, Jason Wang wrote:
>>>>> On 2016年11月08日 19:58, Zhangming (James, Euler) wrote:
>>>>>> On 2016年11月08日 19:17, Jason Wang wrote:
>>>>>> 
>>>>>>> On 2016年11月08日 19:13, Jason Wang wrote:
>>>>>>>> Cc Michael
>>>>>>>> 
>>>>>>>> On 2016年11月08日 16:34, Zhangming (James, Euler) wrote:
>>>>>>>>> In container scenario, OVS is installed in the Virtual machine,
>>>>>>>>> and all the containers connected to the OVS will communicated
>>>>>>>>> through VXLAN encapsulation.
>>>>>>>>> 
>>>>>>>>> By now, virtio_net does not support TSO offload for VXLAN
>>>>>>>>> encapsulated TSO package. In this condition, the performance is
>>>>>>>>> not good, sender is bottleneck
>>>>>>>>> 
>>>>>>>>> I googled this scenario, but I didn’t find any information. Will
>>>>>>>>> virtio_net support VXLAN encapsulation package TSO offload later?
>>>>>>>>> 
>>>>>>>> Yes and for both sender and receiver.
>>>>>>>> 
>>>>>>>>> My idea is virtio_net open encapsulated TSO offload, and
>>>>>>>>> transport encapsulation info to TUN, TUN will parse the info and
>>>>>>>>> build skb with encapsulation info.
>>>>>>>>> 
>>>>>>>>> OVS or kernel on the host should be modified to support this.
>>>>>>>>> Using this method, the TCP performance aremore than 2x as before.
>>>>>>>>> 
>>>>>>>>> Any advice and suggestions for this idea or new idea will be
>>>>>>>>> greatly appreciated!
>>>>>>>>> 
>>>>>>>>> Best regards,
>>>>>>>>> 
>>>>>>>>>      James zhang
>>>>>>>>> 
>>>>>>>> Sounds very good. And we may also need features bits
>>>>>>>> (VIRTIO_NET_F_GUEST|HOST_GSO_X) for this.
>>>>>>>> 
>>>>>>>> This is in fact one of items in networking todo list. (See
>>>>>>>> http://www.linux-kvm.org/page/NetworkingTodo). While at it, we'd
>>>>>>>> better support not only VXLAN but also other tunnels.
>>>>>>> Cc Vlad who is working on extending virtio-net headers.
>>>>>>> 
>>>>>>>> We can start with the spec work, or if you've already had some
>>>>>>>> bits you can post them as RFC for early review.
>>>>>>>> 
>>>>>>>> Thanks
>>>>>> Below is my demo code
>>>>>> Virtio_net.c
>>>>>> static int virtnet_probe(struct virtio_device *vdev), add belows codes:
>>>>>>           if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF) ||				// avoid gso segment, it should be negotiation later, because in the demo I reuse num_buffers.
>>>>>>               virtio_has_feature(vdev, VIRTIO_F_VERSION_1)) {
>>>>>>                   dev->hw_enc_features |= NETIF_F_TSO;
>>>>>>                   dev->hw_enc_features |= NETIF_F_ALL_CSUM;
>>>>>>                   dev->hw_enc_features |= NETIF_F_GSO_UDP_TUNNEL;
>>>>>>                   dev->hw_enc_features |= NETIF_F_GSO_UDP_TUNNEL_CSUM;
>>>>>>                   dev->hw_enc_features |=
>>>>>> NETIF_F_GSO_TUNNEL_REMCSUM;
>>>>>> 
>>>>>>                   dev->features |= NETIF_F_GSO_UDP_TUNNEL;
>>>>>>                   dev->features |= NETIF_F_GSO_UDP_TUNNEL_CSUM;
>>>>>>                   dev->features |= NETIF_F_GSO_TUNNEL_REMCSUM;
>>>>>>           }
>>>>>> 
>>>>>> static int xmit_skb(struct send_queue *sq, struct sk_buff *skb), add
>>>>>> below to pieces of codes
>>>>>> 
>>>>>>                   if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_TUNNEL)
>>>>>>                           hdr->hdr.gso_type |= VIRTIO_NET_HDR_GSO_TUNNEL;
>>>>>>                   if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_TUNNEL_CSUM)
>>>>>>                           hdr->hdr.gso_type |= VIRTIO_NET_HDR_GSO_TUNNEL_CSUM;
>>>>>>                   if (skb_shinfo(skb)->gso_type & SKB_GSO_TUNNEL_REMCSUM)
>>>>>>                           hdr->hdr.gso_type |=
>>>>>> VIRTIO_NET_HDR_GSO_TUNNEL_REMCSUM;
>>>>>> 
>>>>>>           if (skb->encapsulation && skb_is_gso(skb)) {
>>>>>>                   inner_mac_len = skb_inner_network_header(skb) - skb_inner_mac_header(skb);
>>>>>>                   tnl_len = skb_inner_mac_header(skb) - skb_mac_header(skb);
>>>>>>                   if ( !(inner_mac_len >> DATA_LEN_SHIFT) && !(tnl_len >> DATA_LEN_SHIFT) ) {
>>>>>>                           hdr->hdr.flags |= VIRTIO_NET_HDR_F_ENCAPSULATION;
>>>>>>                           hdr->num_buffers = (__virtio16)((inner_mac_len << DATA_LEN_SHIFT) | tnl_len);		//we reuse num_buffers for simple , we should add extend member for later.
>>>>>>                   }  else
>>>>>>                           hdr->num_buffers = 0;
>>>>>>           }
>>>>>> 
>>>>>> Tun.c
>>>>>>                   if (memcpy_fromiovecend((void *)&hdr, iv, offset, tun->vnet_hdr_sz))		//read header with negotiation length
>>>>>>                           return -EFAULT;
>>>>>> 
>>>>>>                   if (hdr.gso_type & VIRTIO_NET_HDR_GSO_TUNNEL)					//set tunnel gso info
>>>>>>                           skb_shinfo(skb)->gso_type |= SKB_GSO_UDP_TUNNEL;
>>>>>>                   if (hdr.gso_type & VIRTIO_NET_HDR_GSO_TUNNEL_CSUM)
>>>>>>                           skb_shinfo(skb)->gso_type |= SKB_GSO_UDP_TUNNEL_CSUM;
>>>>>>                   if (hdr.gso_type & VIRTIO_NET_HDR_GSO_TUNNEL_REMCSUM)
>>>>>>                           skb_shinfo(skb)->gso_type |=
>>>>>> SKB_GSO_TUNNEL_REMCSUM;
>>>>>> 
>>>>>>           if (hdr.flags & VIRTIO_NET_HDR_F_ENCAPSULATION) {						//read tunnel info from header and set to built skb.
>>>>>>                   tnl_len = tun16_to_cpu(tun, hdr.num_buffers) & TUN_TNL_LEN_MASK;
>>>>>>                   payload_mac_len = tun16_to_cpu(tun, hdr.num_buffers) >> TUN_DATA_LEN_SHIFT;
>>>>>>                   mac_len = skb_network_header(skb) - skb_mac_header(skb);
>>>>>>                   skb_set_inner_mac_header(skb, tnl_len - mac_len);
>>>>>>                   skb_set_inner_network_header(skb, tnl_len + payload_mac_len - mac_len);
>>>>>>                   skb->encapsulation = 1;
>>>>>>           }
>>>>>> 
>>>>>> 
>>>>> Something like this, and you probably need do something more:
>>>>> 
>>>>> - use net-next.git to generate the patch (for the latest code)
>>>>> - add feature negotiation
>>>>> - tun/macvtap/qemu patches for this, you can start with tun/macvtap
>>>>> patches
>>>>> - support for all other SKB_GSO_* types which is not supported
>>>>> - use a new field instead of num_buffers
>>>>> - a virtio spec patch to describe the support for encapsulation
>>>>> offload
>>>>> 
>>>>> Thanks
>>>> Thank you for your advice, I will start it right now.
>>>> 
>>>> Thanks
>>> Cool, one more question: while at it, I think you may want to add support for dpdk too?
>>> 
>>> Thanks
>> Do you mean that the patch should be compatible with virtio pmd, or give virtio pmd patch?
>> 
>> Thanks
> 
> I mean it's better to prepare patches for both virtio pmd and dpdk.
> 
> Thanks

^ permalink raw reply

* Re: Netperf UDP issue with connected sockets
From: Rick Jones @ 2016-11-17 23:08 UTC (permalink / raw)
  To: Eric Dumazet, Jesper Dangaard Brouer; +Cc: netdev, Saeed Mahameed, Tariq Toukan
In-Reply-To: <1479419042.8455.280.camel@edumazet-glaptop3.roam.corp.google.com>

On 11/17/2016 01:44 PM, Eric Dumazet wrote:
> because netperf sends the same message
> over and over...

Well, sort of, by default.  That can be altered to a degree.

The global -F option should cause netperf to fill the buffers in its 
send ring with data from the specified file.  The number of buffers in 
the send ring can be controlled via the global -W option.  The number of 
elements in the ring will default to one more than the initial SO_SNDBUF 
size divided by the send size.

raj@tardy:~/netperf2_trunk$ strace -v -o /tmp/netperf.strace src/netperf 
-F src/nettest_omni.c -t UDP_STREAM -l 1 -- -m 1472

...

socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP) = 4
getsockopt(4, SOL_SOCKET, SO_SNDBUF, [212992], [4]) = 0
getsockopt(4, SOL_SOCKET, SO_RCVBUF, [212992], [4]) = 0
setsockopt(4, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
bind(4, {sa_family=AF_INET, sin_port=htons(0), 
sin_addr=inet_addr("0.0.0.0")}, 16) = 0
setsockopt(4, SOL_SOCKET, SO_DONTROUTE, [1], 4) = 0
setsockopt(4, SOL_IP, IP_RECVERR, [1], 4) = 0
open("src/nettest_omni.c", O_RDONLY)    = 5
fstat(5, {st_dev=makedev(8, 2), st_ino=82075297, st_mode=S_IFREG|0664, 
st_nlink=1, st_uid=1000, st_gid=1000, st_blksize=4096, st_blocks=456, 
st_size=230027, st_atime=2016/11/16-09:49:29, 
st_mtime=2016/11/16-09:49:24, st_ctime=2016/11/16-09:49:24}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) 
= 0x7f3099f62000
read(5, "#ifdef HAVE_CONFIG_H\n#include <c"..., 4096) = 4096
read(5, "_INTEGER *intvl_two_ptr = &intvl"..., 4096) = 4096
read(5, "interval_count = interval_burst;"..., 4096) = 4096
read(5, ";\n\n/* these will control the wid"..., 4096) = 4096
read(5, "\n  LOCAL_SECURITY_ENABLED_NUM,\n "..., 4096) = 4096
read(5, "      &dwBytes,  \n              "..., 4096) = 4096

...

rt_sigaction(SIGALRM, {0x402ea6, [ALRM], SA_RESTORER|SA_INTERRUPT, 
0x7f30994a7cb0}, NULL, 8) = 0
rt_sigaction(SIGINT, {0x402ea6, [INT], SA_RESTORER|SA_INTERRUPT, 
0x7f30994a7cb0}, NULL, 8) = 0
alarm(1)                                = 0
sendto(4, "#ifdef HAVE_CONFIG_H\n#include <c"..., 1472, 0, 
{sa_family=AF_INET, sin_port=htons(58088), 
sin_addr=inet_addr("127.0.0.1")}, 16) = 1472
sendto(4, " used\\n\\\n    -m local,remote   S"..., 1472, 0, 
{sa_family=AF_INET, sin_port=htons(58088), 
sin_addr=inet_addr("127.0.0.1")}, 16) = 1472
sendto(4, " do here but clear the legacy fl"..., 1472, 0, 
{sa_family=AF_INET, sin_port=htons(58088), 
sin_addr=inet_addr("127.0.0.1")}, 16) = 1472
sendto(4, "e before we scan the test-specif"..., 1472, 0, 
{sa_family=AF_INET, sin_port=htons(58088), 
sin_addr=inet_addr("127.0.0.1")}, 16) = 1472
sendto(4, "\n\n\tfprintf(where,\n\t\ttput_fmt_1_l"..., 1472, 0, 
{sa_family=AF_INET, sin_port=htons(58088), 
sin_addr=inet_addr("127.0.0.1")}, 16) = 1472

Of course, it will continue to send the same messages from the send_ring 
over and over instead of putting different data into the buffers each 
time, but if one has a sufficiently large -W option specified...
happy benchmarking,

rick jones

^ permalink raw reply

* [RFC PATCH net-next] virtio_net: Support UDP Tunnel offloads.
From: Jarno Rajahalme @ 2016-11-17 23:01 UTC (permalink / raw)
  To: netdev; +Cc: jarno, james.zhangming, mst, vyasevic, ailan

This patch is a proof-of-concept I did a few months ago for UDP tunnel
offload support in virtio_net interface, and rebased on to the current
net-next.

Real implementation needs to extend the virtio_net header rather than
piggy-backing on existing fields.  Inner MAC length (or inner network
offset) also needs to be passed as a new field.  Control plane (QEMU)
also needs to be updated.

All testing was done using Geneve, but this should work for all UDP
tunnels the same.

Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
---
 drivers/net/tun.c               |  7 ++++-
 drivers/net/virtio_net.c        | 16 +++++++---
 include/linux/skbuff.h          |  5 ++++
 include/linux/virtio_net.h      | 66 ++++++++++++++++++++++++++++++-----------
 include/uapi/linux/virtio_net.h |  7 +++++
 5 files changed, 78 insertions(+), 23 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 1588469..36f3219 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -198,7 +198,9 @@ struct tun_struct {
 	struct net_device	*dev;
 	netdev_features_t	set_features;
 #define TUN_USER_FEATURES (NETIF_F_HW_CSUM|NETIF_F_TSO_ECN|NETIF_F_TSO| \
-			  NETIF_F_TSO6|NETIF_F_UFO)
+			   NETIF_F_TSO6|NETIF_F_UFO|NETIF_F_GSO_UDP_TUNNEL| \
+			   NETIF_F_GSO_UDP_TUNNEL_CSUM| \
+			   NETIF_F_GSO_TUNNEL_REMCSUM)
 
 	int			align;
 	int			vnet_hdr_sz;
@@ -1877,6 +1879,9 @@ static int set_offload(struct tun_struct *tun, unsigned long arg)
 
 		if (arg & TUN_F_UFO) {
 			features |= NETIF_F_UFO;
+#if 1
+			features |= NETIF_F_GSO_UDP_TUNNEL|NETIF_F_GSO_UDP_TUNNEL_CSUM|NETIF_F_GSO_TUNNEL_REMCSUM;
+#endif
 			arg &= ~TUN_F_UFO;
 		}
 	}
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index ca5239a..eb8d887 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1789,7 +1789,10 @@ static int virtnet_probe(struct virtio_device *vdev)
 
 		if (virtio_has_feature(vdev, VIRTIO_NET_F_GSO)) {
 			dev->hw_features |= NETIF_F_TSO | NETIF_F_UFO
-				| NETIF_F_TSO_ECN | NETIF_F_TSO6;
+				| NETIF_F_TSO_ECN | NETIF_F_TSO6
+				| NETIF_F_GSO_UDP_TUNNEL
+				| NETIF_F_GSO_UDP_TUNNEL_CSUM
+				| NETIF_F_GSO_TUNNEL_REMCSUM;
 		}
 		/* Individual feature bits: what can host handle? */
 		if (virtio_has_feature(vdev, VIRTIO_NET_F_HOST_TSO4))
@@ -1798,13 +1801,18 @@ static int virtnet_probe(struct virtio_device *vdev)
 			dev->hw_features |= NETIF_F_TSO6;
 		if (virtio_has_feature(vdev, VIRTIO_NET_F_HOST_ECN))
 			dev->hw_features |= NETIF_F_TSO_ECN;
-		if (virtio_has_feature(vdev, VIRTIO_NET_F_HOST_UFO))
+		if (virtio_has_feature(vdev, VIRTIO_NET_F_HOST_UFO)) {
 			dev->hw_features |= NETIF_F_UFO;
-
+#if 1
+			dev->hw_features |= NETIF_F_GSO_UDP_TUNNEL;
+			dev->hw_features |= NETIF_F_GSO_UDP_TUNNEL_CSUM;
+			dev->hw_features |= NETIF_F_GSO_TUNNEL_REMCSUM;
+#endif
+		}
 		dev->features |= NETIF_F_GSO_ROBUST;
 
 		if (gso)
-			dev->features |= dev->hw_features & (NETIF_F_ALL_TSO|NETIF_F_UFO);
+			dev->features |= dev->hw_features & (NETIF_F_ALL_TSO|NETIF_F_UFO|NETIF_F_GSO_UDP_TUNNEL|NETIF_F_GSO_UDP_TUNNEL_CSUM|NETIF_F_GSO_TUNNEL_REMCSUM);
 		/* (!csum && gso) case will be fixed by register_netdev() */
 	}
 	if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_CSUM))
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index a4aeeca..992ad30 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2115,6 +2115,11 @@ static inline unsigned char *skb_inner_mac_header(const struct sk_buff *skb)
 	return skb->head + skb->inner_mac_header;
 }
 
+static inline int skb_inner_mac_offset(const struct sk_buff *skb)
+{
+	return skb_inner_mac_header(skb) - skb->data;
+}
+
 static inline void skb_reset_inner_mac_header(struct sk_buff *skb)
 {
 	skb->inner_mac_header = skb->data - skb->head;
diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
index 1c912f8..17384d1 100644
--- a/include/linux/virtio_net.h
+++ b/include/linux/virtio_net.h
@@ -8,10 +8,19 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
 					const struct virtio_net_hdr *hdr,
 					bool little_endian)
 {
-	unsigned short gso_type = 0;
+	u16 start = __virtio16_to_cpu(little_endian, hdr->csum_start);
+
+	if (hdr->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) {
+		u16 off = __virtio16_to_cpu(little_endian, hdr->csum_offset);
+
+		if (!skb_partial_csum_set(skb, start, off))
+			return -EINVAL;
+	}
 
 	if (hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE) {
-		switch (hdr->gso_type & ~VIRTIO_NET_HDR_GSO_ECN) {
+		unsigned short gso_type = 0;
+
+		switch (hdr->gso_type & ~VIRTIO_NET_HDR_GSO_FLAGS) {
 		case VIRTIO_NET_HDR_GSO_TCPV4:
 			gso_type = SKB_GSO_TCPV4;
 			break;
@@ -27,23 +36,28 @@ static inline int virtio_net_hdr_to_skb(struct sk_buff *skb,
 
 		if (hdr->gso_type & VIRTIO_NET_HDR_GSO_ECN)
 			gso_type |= SKB_GSO_TCP_ECN;
+		if (hdr->gso_type & VIRTIO_NET_HDR_GSO_UDP_TUNNEL)
+			gso_type |= SKB_GSO_UDP_TUNNEL;
+		if (hdr->gso_type & VIRTIO_NET_HDR_GSO_UDP_TUNNEL_CSUM)
+			gso_type |= SKB_GSO_UDP_TUNNEL_CSUM;
+		if (hdr->gso_type & VIRTIO_NET_HDR_GSO_TUNNEL_REMCSUM) {
+			gso_type |= SKB_GSO_TUNNEL_REMCSUM;
+			skb->remcsum_offload = true;
+		}
+		if (gso_type & (SKB_GSO_UDP_TUNNEL | SKB_GSO_UDP_TUNNEL_CSUM)) {
+			u16 hdr_len = __virtio16_to_cpu(little_endian,
+							hdr->hdr_len);
+			skb->encapsulation = 1;
+			skb_set_inner_mac_header(skb, hdr_len);
+			skb_set_inner_network_header(skb, hdr_len + ETH_HLEN);
+			/* XXX: What if start is not set? */
+			skb_set_inner_transport_header(skb, start);
+		}
 
 		if (hdr->gso_size == 0)
 			return -EINVAL;
-	}
-
-	if (hdr->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) {
-		u16 start = __virtio16_to_cpu(little_endian, hdr->csum_start);
-		u16 off = __virtio16_to_cpu(little_endian, hdr->csum_offset);
-
-		if (!skb_partial_csum_set(skb, start, off))
-			return -EINVAL;
-	}
-
-	if (hdr->gso_type != VIRTIO_NET_HDR_GSO_NONE) {
-		u16 gso_size = __virtio16_to_cpu(little_endian, hdr->gso_size);
-
-		skb_shinfo(skb)->gso_size = gso_size;
+		skb_shinfo(skb)->gso_size = __virtio16_to_cpu(little_endian,
+							      hdr->gso_size);
 		skb_shinfo(skb)->gso_type = gso_type;
 
 		/* Header must be checked, and gso_segs computed. */
@@ -64,8 +78,8 @@ static inline int virtio_net_hdr_from_skb(const struct sk_buff *skb,
 		struct skb_shared_info *sinfo = skb_shinfo(skb);
 
 		/* This is a hint as to how much should be linear. */
-		hdr->hdr_len = __cpu_to_virtio16(little_endian,
-						 skb_headlen(skb));
+		u16 hdr_len = skb_headlen(skb);
+
 		hdr->gso_size = __cpu_to_virtio16(little_endian,
 						  sinfo->gso_size);
 		if (sinfo->gso_type & SKB_GSO_TCPV4)
@@ -78,6 +92,22 @@ static inline int virtio_net_hdr_from_skb(const struct sk_buff *skb,
 			return -EINVAL;
 		if (sinfo->gso_type & SKB_GSO_TCP_ECN)
 			hdr->gso_type |= VIRTIO_NET_HDR_GSO_ECN;
+		if (sinfo->gso_type & SKB_GSO_UDP_TUNNEL)
+			hdr->gso_type |= VIRTIO_NET_HDR_GSO_UDP_TUNNEL;
+		if (sinfo->gso_type & SKB_GSO_UDP_TUNNEL_CSUM)
+			hdr->gso_type |= VIRTIO_NET_HDR_GSO_UDP_TUNNEL_CSUM;
+		if (sinfo->gso_type & SKB_GSO_TUNNEL_REMCSUM)
+			hdr->gso_type |= VIRTIO_NET_HDR_GSO_TUNNEL_REMCSUM;
+
+		if (sinfo->gso_type &
+		    (SKB_GSO_UDP_TUNNEL | SKB_GSO_UDP_TUNNEL_CSUM))
+			/* For encapsulated packets 'hdr_len' is the offset to
+			 * the beginning of the inner packet.  This way the
+			 * encapsulation can remain ignorant of the size of the
+			 * UDP tunnel header.
+			 */
+			hdr_len = skb_inner_mac_offset(skb);
+		hdr->hdr_len = __cpu_to_virtio16(little_endian, hdr_len);
 	} else
 		hdr->gso_type = VIRTIO_NET_HDR_GSO_NONE;
 
diff --git a/include/uapi/linux/virtio_net.h b/include/uapi/linux/virtio_net.h
index fc353b5..833950b 100644
--- a/include/uapi/linux/virtio_net.h
+++ b/include/uapi/linux/virtio_net.h
@@ -93,7 +93,14 @@ struct virtio_net_hdr_v1 {
 #define VIRTIO_NET_HDR_GSO_TCPV4	1	/* GSO frame, IPv4 TCP (TSO) */
 #define VIRTIO_NET_HDR_GSO_UDP		3	/* GSO frame, IPv4 UDP (UFO) */
 #define VIRTIO_NET_HDR_GSO_TCPV6	4	/* GSO frame, IPv6 TCP */
+#define VIRTIO_NET_HDR_GSO_UDP_TUNNEL	0x10	/* GSO frame, UDP tunnel */
+#define VIRTIO_NET_HDR_GSO_UDP_TUNNEL_CSUM 0x20	/* GSO frame, UDP tnl w CSUM */
+#define VIRTIO_NET_HDR_GSO_TUNNEL_REMCSUM 0x40	/* TUNNEL with TSO & REMCSUM */
 #define VIRTIO_NET_HDR_GSO_ECN		0x80	/* TCP has ECN set */
+#define VIRTIO_NET_HDR_GSO_FLAGS (VIRTIO_NET_HDR_GSO_UDP_TUNNEL | \
+				  VIRTIO_NET_HDR_GSO_UDP_TUNNEL_CSUM | \
+				  VIRTIO_NET_HDR_GSO_TUNNEL_REMCSUM | \
+				  VIRTIO_NET_HDR_GSO_ECN)
 	__u8 gso_type;
 	__virtio16 hdr_len;	/* Ethernet + IP + tcp/udp hdrs */
 	__virtio16 gso_size;	/* Bytes to append to hdr_len per frame */
-- 
2.1.4

^ permalink raw reply related

* Re: [Patch net] af_unix: revert "af_unix: use freezable blocking calls in read"
From: Cong Wang @ 2016-11-17 22:47 UTC (permalink / raw)
  To: Colin Cross; +Cc: netdev, Dmitry Vyukov, Tejun Heo, Rafael J. Wysocki
In-Reply-To: <CAMbhsRSQ+eVid4OG_xVMDyuZGfOJ2gj6DvHakOH5vbu3ySHF3A@mail.gmail.com>

On Thu, Nov 17, 2016 at 2:30 PM, Colin Cross <ccross@android.com> wrote:
> On Thu, Nov 17, 2016 at 2:09 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
>> Commit 2b15af6f95 ("af_unix: use freezable blocking calls in read")
>> converts schedule_timeout() to its freezable version, it was probably
>> correct at that time, but later, commit 2b514574f7e8
>> ("net: af_unix: implement splice for stream af_unix sockets") breaks
>> the strong requirement for a freezable sleep, according to
>> commit 0f9548ca1091:
>>
>>     We shouldn't try_to_freeze if locks are held.  Holding a lock can cause a
>>     deadlock if the lock is later acquired in the suspend or hibernate path
>>     (e.g.  by dpm).  Holding a lock can also cause a deadlock in the case of
>>     cgroup_freezer if a lock is held inside a frozen cgroup that is later
>>     acquired by a process outside that group.
>>
>> The pipe_lock is still held at that point. So just revert commit 2b15af6f95.
>
> On my phone 77 threads are blocked in unix_stream_recvmsg.  A simple
> revert of this patch will cause every one of those threads to wake up
> twice per suspend cycle, which can be multiple times a second.  How
> about adding a freezable flag to unix_stream_read_state so
> unix_stream_recvmsg can stay freezable, and unix_stream_splice_read
> can be unfreezable?

Fair enough, I didn't know it could have such an impact.
I will send v2.

Thanks.

^ permalink raw reply

* Re: Netperf UDP issue with connected sockets
From: Alexander Duyck @ 2016-11-17 22:39 UTC (permalink / raw)
  To: David Laight
  Cc: Jesper Dangaard Brouer, Eric Dumazet, Rick Jones,
	netdev@vger.kernel.org
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6DB0222982@AcuExch.aculab.com>

On Thu, Nov 17, 2016 at 9:34 AM, David Laight <David.Laight@aculab.com> wrote:
> From: Jesper Dangaard Brouer
>> Sent: 17 November 2016 14:58
>> On Thu, 17 Nov 2016 06:17:38 -0800
>> Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>
>> > On Thu, 2016-11-17 at 14:42 +0100, Jesper Dangaard Brouer wrote:
>> >
>> > > I can see that qdisc layer does not activate xmit_more in this case.
>> > >
>> >
>> > Sure. Not enough pressure from the sender(s).
>> >
>> > The bottleneck is not the NIC or qdisc in your case, meaning that BQL
>> > limit is kept at a small value.
>> >
>> > (BTW not all NIC have expensive doorbells)
>>
>> I believe this NIC mlx5 (50G edition) does.
>>
>> I'm seeing UDP TX of 1656017.55 pps, which is per packet:
>> 2414 cycles(tsc) 603.86 ns
>>
>> Perf top shows (with my own udp_flood, that avoids __ip_select_ident):
>>
>>  Samples: 56K of event 'cycles', Event count (approx.): 51613832267
>>    Overhead  Command        Shared Object        Symbol
>>  +    8.92%  udp_flood      [kernel.vmlinux]     [k] _raw_spin_lock
>>    - _raw_spin_lock
>>       + 90.78% __dev_queue_xmit
>>       + 7.83% dev_queue_xmit
>>       + 1.30% ___slab_alloc
>>  +    5.59%  udp_flood      [kernel.vmlinux]     [k] skb_set_owner_w
>>  +    4.77%  udp_flood      [mlx5_core]          [k] mlx5e_sq_xmit
>>  +    4.09%  udp_flood      [kernel.vmlinux]     [k] fib_table_lookup
>>  +    4.00%  swapper        [mlx5_core]          [k] mlx5e_poll_tx_cq
>>  +    3.11%  udp_flood      [kernel.vmlinux]     [k] __ip_route_output_key_hash
>>  +    2.49%  swapper        [kernel.vmlinux]     [k] __slab_free
>>
>> In this setup the spinlock in __dev_queue_xmit should be uncongested.
>> An uncongested spin_lock+unlock cost 32 cycles(tsc) 8.198 ns on this system.
>>
>> But 8.92% of the time is spend on it, which corresponds to a cost of 215
>> cycles (2414*0.0892).  This cost is too high, thus something else is
>> going on... I claim this mysterious extra cost is the tailptr/doorbell.
>
> Try adding code to ring the doorbell twice.
> If this doesn't slow things down then it isn't (likely to be) responsible
> for the delay you are seeing.
>
>         David
>

The problem isn't only the doorbell.  It is doorbell plus a locked
transaction on x86 results in a long wait until the doorbell write has
been completed.

You could batch a bunch of doorbell writes together and it isn't an
issue unless you do something like writel(), wmb(), writel(), wmb(),
then you will see the effect double since the write memory barrier is
what is forcing the delays.

- Alex

^ permalink raw reply

* Re: [Patch net] af_unix: revert "af_unix: use freezable blocking calls in read"
From: Colin Cross @ 2016-11-17 22:30 UTC (permalink / raw)
  To: Cong Wang; +Cc: netdev, dvyukov, Tejun Heo, Rafael J. Wysocki
In-Reply-To: <1479420564-15799-1-git-send-email-xiyou.wangcong@gmail.com>

On Thu, Nov 17, 2016 at 2:09 PM, Cong Wang <xiyou.wangcong@gmail.com> wrote:
> Commit 2b15af6f95 ("af_unix: use freezable blocking calls in read")
> converts schedule_timeout() to its freezable version, it was probably
> correct at that time, but later, commit 2b514574f7e8
> ("net: af_unix: implement splice for stream af_unix sockets") breaks
> the strong requirement for a freezable sleep, according to
> commit 0f9548ca1091:
>
>     We shouldn't try_to_freeze if locks are held.  Holding a lock can cause a
>     deadlock if the lock is later acquired in the suspend or hibernate path
>     (e.g.  by dpm).  Holding a lock can also cause a deadlock in the case of
>     cgroup_freezer if a lock is held inside a frozen cgroup that is later
>     acquired by a process outside that group.
>
> The pipe_lock is still held at that point. So just revert commit 2b15af6f95.

On my phone 77 threads are blocked in unix_stream_recvmsg.  A simple
revert of this patch will cause every one of those threads to wake up
twice per suspend cycle, which can be multiple times a second.  How
about adding a freezable flag to unix_stream_read_state so
unix_stream_recvmsg can stay freezable, and unix_stream_splice_read
can be unfreezable?

^ permalink raw reply

* Re: net: BUG still has locks held in unix_stream_splice_read
From: Hannes Frederic Sowa @ 2016-11-17 22:27 UTC (permalink / raw)
  To: Cong Wang, Al Viro
  Cc: Dmitry Vyukov, David Miller, Eric Dumazet, netdev, LKML,
	syzkaller, Colin Cross, Mandeep Singh Baines
In-Reply-To: <CAM_iQpUEEDE+OcfX66YJDC6dA+b-URHhUtWtv+sn-t5Esk_FWw@mail.gmail.com>

On 17.11.2016 22:44, Cong Wang wrote:
> On Sun, Oct 9, 2016 at 8:14 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>> E.g what will happen if some code does a read on AF_UNIX socket with
>> some local mutex held?  AFAICS, there are exactly two callers of
>> freezable_schedule_timeout() - this one and one in XFS; the latter is
>> in a kernel thread where we do have good warranties about the locking
>> environment, but here it's in the bleeding ->recvmsg/->splice_read and
>> for those assumption that caller doesn't hold any locks is pretty
>> strong, especially since it's not documented anywhere.
>>
>> What's going on there?
> 
> Commit 2b15af6f95 ("af_unix: use freezable blocking calls in read")
> converts schedule_timeout() to its freezable version, it was probably correct
> at that time, but later, commit 2b514574f7e88c8498027ee366
> ("net: af_unix: implement splice for stream af_unix sockets") breaks its
> requirement for a freezable sleep:
> 
>     commit 0f9548ca10916dec166eaf74c816bded7d8e611d
> 
>     lockdep: check that no locks held at freeze time
> 
>     We shouldn't try_to_freeze if locks are held.  Holding a lock can cause a
>     deadlock if the lock is later acquired in the suspend or hibernate path
>     (e.g.  by dpm).  Holding a lock can also cause a deadlock in the case of
>     cgroup_freezer if a lock is held inside a frozen cgroup that is later
>     acquired by a process outside that group.
> 
> So probably we just need to revert commit 2b15af6f95 now.
> 
> I am going to send a revert for at least -net and -stable, since Dmitry
> saw this warning again.

I am not an expert on freezing but this looks around right from the
freezer code. Awesome, thanks a lot for spotting this one!

^ permalink raw reply

* Re: Debugging Ethernet issues
From: Måns Rullgård @ 2016-11-17 22:17 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Sebastian Frias, Mason, Andrew Lunn, netdev, Sergei Shtylyov,
	Tom Lendacky, Zach Brown, Shaohui Xie, Tim Beale, Brian Hill,
	Vince Bridgers, Balakumaran Kannan, David S. Miller,
	Kirill Kapranov
In-Reply-To: <8efc016a-3390-3bec-d74c-7c215c151b2f@gmail.com>

Florian Fainelli <f.fainelli@gmail.com> writes:

> On 11/14/2016 11:00 AM, Måns Rullgård wrote:
>> Florian Fainelli <f.fainelli@gmail.com> writes:
>> 
>>> On 11/14/2016 10:20 AM, Florian Fainelli wrote:
>>>> On 11/14/2016 09:59 AM, Sebastian Frias wrote:
>>>>> On 11/14/2016 06:32 PM, Florian Fainelli wrote:
>>>>>> On 11/14/2016 07:33 AM, Mason wrote:
>>>>>>> On 14/11/2016 15:58, Mason wrote:
>>>>>>>
>>>>>>>> nb8800 26000.ethernet eth0: Link is Up - 100Mbps/Full - flow control rx/tx
>>>>>>>> vs
>>>>>>>> nb8800 26000.ethernet eth0: Link is Up - 100Mbps/Full - flow control off
>>>>>>>>
>>>>>>>> I'm not sure whether "flow control" is relevant...
>>>>>>>
>>>>>>> Based on phy_print_status()
>>>>>>> phydev->pause ? "rx/tx" : "off"
>>>>>>> I added the following patch.
>>>>>>>
>>>>>>> diff --git a/drivers/net/ethernet/aurora/nb8800.c b/drivers/net/ethernet/aurora/nb8800.c
>>>>>>> index defc22a15f67..4e758c1cfa4e 100644
>>>>>>> --- a/drivers/net/ethernet/aurora/nb8800.c
>>>>>>> +++ b/drivers/net/ethernet/aurora/nb8800.c
>>>>>>> @@ -667,6 +667,8 @@ static void nb8800_link_reconfigure(struct net_device *dev)
>>>>>>>         struct phy_device *phydev = priv->phydev;
>>>>>>>         int change = 0;
>>>>>>>  
>>>>>>> +       printk("%s from %pf\n", __func__, __builtin_return_address(0));
>>>>>>> +
>>>>>>>         if (phydev->link) {
>>>>>>>                 if (phydev->speed != priv->speed) {
>>>>>>>                         priv->speed = phydev->speed;
>>>>>>> @@ -1274,9 +1276,9 @@ static int nb8800_hw_init(struct net_device *dev)
>>>>>>>         nb8800_writeb(priv, NB8800_PQ2, val & 0xff);
>>>>>>>  
>>>>>>>         /* Auto-negotiate by default */
>>>>>>> -       priv->pause_aneg = true;
>>>>>>> -       priv->pause_rx = true;
>>>>>>> -       priv->pause_tx = true;
>>>>>>> +       priv->pause_aneg = false;
>>>>>>> +       priv->pause_rx = false;
>>>>>>> +       priv->pause_tx = false;
>>>>>>>  
>>>>>>>         nb8800_mc_init(dev, 0);
>>>>>>>  
>>>>>>>
>> 
>> [...]
>> 
>>>>>> And the time difference is clearly accounted for auto-negotiation time
>>>>>> here, as you can see it takes about 3 seconds for Gigabit Ethernet to
>>>>>> auto-negotiate and that seems completely acceptable and normal to me
>>>>>> since it is a more involved process than lower speeds.
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> OK, so now it works (by accident?) even on 100 Mbps switch, but it still
>>>>>>> prints "flow control rx/tx"...
>>>>>>
>>>>>> Because your link partner advertises flow control, and that's what
>>>>>> phydev->pause and phydev->asym_pause report (I know it's confusing, but
>>>>>> that's what it is at the moment).
>>>>>
>>>>> Thanks.
>>>>> Could you confirm that Mason's patch is correct and/or that it does not
>>>>> has negative side-effects?
>>>>
>>>> The patch is not correct nor incorrect per-se, it changes the default
>>>> policy of having pause frames advertised by default to not having them
>>>> advertised by default.
>> 
>> I was advised to advertise flow control by default back when I was
>> working on the driver, and I think it makes sense to do so.
>> 
>>>> This influences both your Ethernet MAC and the link partner in that
>>>> the result is either flow control is enabled (before) or it is not
>>>> (with the patch). There must be something amiss if you see packet
>>>> loss or some kind of problem like that with an early exchange such as
>>>> DHCP. Flow control tend to kick in under higher packet rates (at
>>>> least, that's what you expect).
>>>>
>>>>>
>>>>> Right now we know that Mason's patch makes this work, but we do not
>>>>> understand why nor its implications.
>>>>
>>>> You need to understand why, right now, the way this problem is
>>>> presented, you came up with a workaround, not with the root cause or the
>>>> solution. What does your link partner (switch?) reports, that is, what
>>>> is the ethtool output when you have a link up from  your nb8800 adapter?
>>>
>>> Actually, nb8800_pause_config() seems to be doing a complete MAC/DMA
>>> reconfiguration when pause frames get auto-negotiated while the link is
>>> UP,
>> 
>> This is due to a silly hardware limitation.  The register containing the
>> flow control bits can't be written while rx is enabled.
>
> You do a DMA stop, but you don't disable the MAC receiver unlike what
> nb8800_stop() does, why is not calling nb8800_mac_rx() necessary here?

Oh, right.  That's because the RXC_CR register (where the flow control
bits are) can't be modified when the RCR_EN bit (rx dma enable) is set.
The MAC core register controlled by nb8800_mac_rx() doesn't matter here.
There is no way of changing the flow control setting without briefly
stopping rx dma.

None of this should be relevant here though since everything should be
all set up before dma is enabled the first time.

>>> and it does not differentiate being called from
>>> ethtool::set_pauseparam or the PHYLIB adjust_link callback (which it
>>> probably should),
>> 
>> Differentiate how?
>
> Differentiate in that when you are called from adjust_link, why bother
> checking with netif_running() since you are only configuring the pause
> settings when phydev->link is set. Not that this matters much, but
> that's something the caller can tell you.

netif_running() can be true or false independently of the link state.

-- 
Måns Rullgård

^ permalink raw reply

* Re: [PATCH v2 net-next] lan78xx: relocate mdix setting to phy driver
From: Florian Fainelli @ 2016-11-17 22:16 UTC (permalink / raw)
  To: Woojung.Huh, davem, netdev; +Cc: andrew, UNGLinuxDriver
In-Reply-To: <9235D6609DB808459E95D78E17F2E43D40966AA4@CHN-SV-EXMX02.mchp-main.com>

On 11/17/2016 02:10 PM, Woojung.Huh@microchip.com wrote:
> From: Woojung Huh <woojung.huh@microchip.com>
> 
> Relocate mdix code to phy driver to be called at config_init().
> 
> Signed-off-by: Woojung Huh <woojung.huh@microchip.com>

Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
-- 
Florian

^ permalink raw reply

* [PATCH v2 net-next] lan78xx: relocate mdix setting to phy driver
From: Woojung.Huh @ 2016-11-17 22:10 UTC (permalink / raw)
  To: davem, f.fainelli, netdev; +Cc: andrew, UNGLinuxDriver

From: Woojung Huh <woojung.huh@microchip.com>

Relocate mdix code to phy driver to be called at config_init().

Signed-off-by: Woojung Huh <woojung.huh@microchip.com>
---
 drivers/net/phy/microchip.c | 36 +++++++++++++++++++++-
 drivers/net/usb/lan78xx.c   | 73 ++-------------------------------------------
 2 files changed, 38 insertions(+), 71 deletions(-)

diff --git a/drivers/net/phy/microchip.c b/drivers/net/phy/microchip.c
index 7c00e50..eb4db22 100644
--- a/drivers/net/phy/microchip.c
+++ b/drivers/net/phy/microchip.c
@@ -106,6 +106,40 @@ static int lan88xx_set_wol(struct phy_device *phydev,
 	return 0;
 }
 
+static void lan88xx_set_mdix(struct phy_device *phydev)
+{
+	int buf;
+	int val;
+
+	switch (phydev->mdix) {
+	case ETH_TP_MDI:
+		val = LAN88XX_EXT_MODE_CTRL_MDI_;
+		break;
+	case ETH_TP_MDI_X:
+		val = LAN88XX_EXT_MODE_CTRL_MDI_X_;
+		break;
+	case ETH_TP_MDI_AUTO:
+		val = LAN88XX_EXT_MODE_CTRL_AUTO_MDIX_;
+		break;
+	default:
+		return;
+	}
+
+	phy_write(phydev, LAN88XX_EXT_PAGE_ACCESS, LAN88XX_EXT_PAGE_SPACE_1);
+	buf = phy_read(phydev, LAN88XX_EXT_MODE_CTRL);
+	buf &= ~LAN88XX_EXT_MODE_CTRL_MDIX_MASK_;
+	buf |= val;
+	phy_write(phydev, LAN88XX_EXT_MODE_CTRL, buf);
+	phy_write(phydev, LAN88XX_EXT_PAGE_ACCESS, LAN88XX_EXT_PAGE_SPACE_0);
+}
+
+static int lan88xx_config_aneg(struct phy_device *phydev)
+{
+	lan88xx_set_mdix(phydev);
+
+	return genphy_config_aneg(phydev);
+}
+
 static struct phy_driver microchip_phy_driver[] = {
 {
 	.phy_id		= 0x0007c130,
@@ -120,7 +154,7 @@ static struct phy_driver microchip_phy_driver[] = {
 	.remove		= lan88xx_remove,
 
 	.config_init	= genphy_config_init,
-	.config_aneg	= genphy_config_aneg,
+	.config_aneg	= lan88xx_config_aneg,
 	.read_status	= genphy_read_status,
 
 	.ack_interrupt	= lan88xx_phy_ack_interrupt,
diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c
index cf2857f..0c459e9 100644
--- a/drivers/net/usb/lan78xx.c
+++ b/drivers/net/usb/lan78xx.c
@@ -1471,62 +1471,12 @@ static void lan78xx_set_msglevel(struct net_device *net, u32 level)
 	dev->msg_enable = level;
 }
 
-static int lan78xx_get_mdix_status(struct net_device *net)
-{
-	struct phy_device *phydev = net->phydev;
-	int buf;
-
-	phy_write(phydev, LAN88XX_EXT_PAGE_ACCESS, LAN88XX_EXT_PAGE_SPACE_1);
-	buf = phy_read(phydev, LAN88XX_EXT_MODE_CTRL);
-	phy_write(phydev, LAN88XX_EXT_PAGE_ACCESS, LAN88XX_EXT_PAGE_SPACE_0);
-
-	return buf;
-}
-
-static void lan78xx_set_mdix_status(struct net_device *net, __u8 mdix_ctrl)
-{
-	struct lan78xx_net *dev = netdev_priv(net);
-	struct phy_device *phydev = net->phydev;
-	int buf;
-
-	if (mdix_ctrl == ETH_TP_MDI) {
-		phy_write(phydev, LAN88XX_EXT_PAGE_ACCESS,
-			  LAN88XX_EXT_PAGE_SPACE_1);
-		buf = phy_read(phydev, LAN88XX_EXT_MODE_CTRL);
-		buf &= ~LAN88XX_EXT_MODE_CTRL_MDIX_MASK_;
-		phy_write(phydev, LAN88XX_EXT_MODE_CTRL,
-			  buf | LAN88XX_EXT_MODE_CTRL_MDI_);
-		phy_write(phydev, LAN88XX_EXT_PAGE_ACCESS,
-			  LAN88XX_EXT_PAGE_SPACE_0);
-	} else if (mdix_ctrl == ETH_TP_MDI_X) {
-		phy_write(phydev, LAN88XX_EXT_PAGE_ACCESS,
-			  LAN88XX_EXT_PAGE_SPACE_1);
-		buf = phy_read(phydev, LAN88XX_EXT_MODE_CTRL);
-		buf &= ~LAN88XX_EXT_MODE_CTRL_MDIX_MASK_;
-		phy_write(phydev, LAN88XX_EXT_MODE_CTRL,
-			  buf | LAN88XX_EXT_MODE_CTRL_MDI_X_);
-		phy_write(phydev, LAN88XX_EXT_PAGE_ACCESS,
-			  LAN88XX_EXT_PAGE_SPACE_0);
-	} else if (mdix_ctrl == ETH_TP_MDI_AUTO) {
-		phy_write(phydev, LAN88XX_EXT_PAGE_ACCESS,
-			  LAN88XX_EXT_PAGE_SPACE_1);
-		buf = phy_read(phydev, LAN88XX_EXT_MODE_CTRL);
-		buf &= ~LAN88XX_EXT_MODE_CTRL_MDIX_MASK_;
-		phy_write(phydev, LAN88XX_EXT_MODE_CTRL,
-			  buf | LAN88XX_EXT_MODE_CTRL_AUTO_MDIX_);
-		phy_write(phydev, LAN88XX_EXT_PAGE_ACCESS,
-			  LAN88XX_EXT_PAGE_SPACE_0);
-	}
-	dev->mdix_ctrl = mdix_ctrl;
-}
-
 static int lan78xx_get_link_ksettings(struct net_device *net,
 				      struct ethtool_link_ksettings *cmd)
 {
 	struct lan78xx_net *dev = netdev_priv(net);
 	struct phy_device *phydev = net->phydev;
 	int ret;
-	int buf;
 
 	ret = usb_autopm_get_interface(dev->intf);
 	if (ret < 0)
@@ -1534,20 +1484,6 @@ static int lan78xx_get_link_ksettings(struct net_device *net,
 
 	ret = phy_ethtool_ksettings_get(phydev, cmd);
 
-	buf = lan78xx_get_mdix_status(net);
-
-	buf &= LAN88XX_EXT_MODE_CTRL_MDIX_MASK_;
-	if (buf == LAN88XX_EXT_MODE_CTRL_AUTO_MDIX_) {
-		cmd->base.eth_tp_mdix = ETH_TP_MDI_AUTO;
-		cmd->base.eth_tp_mdix_ctrl = ETH_TP_MDI_AUTO;
-	} else if (buf == LAN88XX_EXT_MODE_CTRL_MDI_) {
-		cmd->base.eth_tp_mdix = ETH_TP_MDI;
-		cmd->base.eth_tp_mdix_ctrl = ETH_TP_MDI;
-	} else if (buf == LAN88XX_EXT_MODE_CTRL_MDI_X_) {
-		cmd->base.eth_tp_mdix = ETH_TP_MDI_X;
-		cmd->base.eth_tp_mdix_ctrl = ETH_TP_MDI_X;
-	}
-
 	usb_autopm_put_interface(dev->intf);
 
 	return ret;
@@ -1565,9 +1501,6 @@ static int lan78xx_set_link_ksettings(struct net_device *net,
 	if (ret < 0)
 		return ret;
 
-	if (dev->mdix_ctrl != cmd->base.eth_tp_mdix_ctrl)
-		lan78xx_set_mdix_status(net, cmd->base.eth_tp_mdix_ctrl);
-
 	/* change speed & duplex */
 	ret = phy_ethtool_ksettings_set(phydev, cmd);
 
@@ -2019,6 +1952,9 @@ static int lan78xx_phy_init(struct lan78xx_net *dev)
 		phydev->irq = 0;
 	netdev_dbg(dev->net, "phydev->irq = %d\n", phydev->irq);
 
+	/* set to AUTOMDIX */
+	phydev->mdix = ETH_TP_MDI_AUTO;
+
 	ret = phy_connect_direct(dev->net, phydev,
 				 lan78xx_link_status_change,
 				 PHY_INTERFACE_MODE_GMII);
@@ -2028,9 +1964,6 @@ static int lan78xx_phy_init(struct lan78xx_net *dev)
 		return -EIO;
 	}
 
-	/* set to AUTOMDIX */
-	lan78xx_set_mdix_status(dev->net, ETH_TP_MDI_AUTO);
-
 	/* MAC doesn't support 1000T Half */
 	phydev->supported &= ~SUPPORTED_1000baseT_Half;
 
-- 
2.7.4

^ permalink raw reply related

* [Patch net] af_unix: revert "af_unix: use freezable blocking calls in read"
From: Cong Wang @ 2016-11-17 22:09 UTC (permalink / raw)
  To: netdev; +Cc: dvyukov, Cong Wang, Tejun Heo, Colin Cross, Rafael J. Wysocki

Commit 2b15af6f95 ("af_unix: use freezable blocking calls in read")
converts schedule_timeout() to its freezable version, it was probably
correct at that time, but later, commit 2b514574f7e8
("net: af_unix: implement splice for stream af_unix sockets") breaks
the strong requirement for a freezable sleep, according to
commit 0f9548ca1091:

    We shouldn't try_to_freeze if locks are held.  Holding a lock can cause a
    deadlock if the lock is later acquired in the suspend or hibernate path
    (e.g.  by dpm).  Holding a lock can also cause a deadlock in the case of
    cgroup_freezer if a lock is held inside a frozen cgroup that is later
    acquired by a process outside that group.

The pipe_lock is still held at that point. So just revert commit 2b15af6f95.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Colin Cross <ccross@android.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
---
 net/unix/af_unix.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 5d1c14a..6f37f9e 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -116,7 +116,6 @@
 #include <linux/mount.h>
 #include <net/checksum.h>
 #include <linux/security.h>
-#include <linux/freezer.h>
 
 struct hlist_head unix_socket_table[2 * UNIX_HASH_SIZE];
 EXPORT_SYMBOL_GPL(unix_socket_table);
@@ -2220,7 +2219,7 @@ static long unix_stream_data_wait(struct sock *sk, long timeo,
 
 		sk_set_bit(SOCKWQ_ASYNC_WAITDATA, sk);
 		unix_state_unlock(sk);
-		timeo = freezable_schedule_timeout(timeo);
+		timeo = schedule_timeout(timeo);
 		unix_state_lock(sk);
 
 		if (sock_flag(sk, SOCK_DEAD))
-- 
2.1.0

^ permalink raw reply related

* Re: [net-next PATCH v2] net: dummy: Introduce dummy virtual functions
From: Or Gerlitz @ 2016-11-17 22:04 UTC (permalink / raw)
  To: Phil Sutter; +Cc: David Miller, Linux Netdev List, Sabrina Dubroca
In-Reply-To: <20161114130205.18333-1-phil@nwl.cc>

On Mon, Nov 14, 2016 at 3:02 PM, Phil Sutter <phil@nwl.cc> wrote:

> Due to the assumption that all PFs are PCI devices, this implementation
> is not completely straightforward: In order to allow for
> rtnl_fill_ifinfo() to see the dummy VFs, a fake PCI parent device is
> attached to the dummy netdev. This has to happen at the right spot so
> register_netdevice() does not get confused. This patch abuses
> ndo_fix_features callback for that. In ndo_uninit callback, the fake
> parent is removed again for the same purpose.

So you did some mimic-ing of PCI interface, how do you let the user to
config the number of VFs? though a module param? why? if the module
param only serves to say how many VF the device supports, maybe
support the maximum possible by PCI spec and skip the module param?


> +module_param(num_vfs, int, 0);
> +MODULE_PARM_DESC(num_vfs, "Number of dummy VFs per dummy device");
> +

> @@ -190,6 +382,7 @@ static int __init dummy_init_one(void)
>         err = register_netdevice(dev_dummy);
>         if (err < 0)
>                 goto err;
> +
>         return 0;

nit, remove this added blank line..

^ permalink raw reply

* Re: [PATCH net 1/3] net: phy: realtek: add eee advertisement disable options
From: Jerome Brunet @ 2016-11-17 21:48 UTC (permalink / raw)
  To: Anand Moon
  Cc: netdev, devicetree, Florian Fainelli, Alexandre TORGUE,
	Neil Armstrong, Martin Blumenstingl, Kevin Hilman, Linux Kernel,
	Andre Roth, linux-amlogic, Carlo Caione, Giuseppe Cavallaro,
	linux-arm-kernel
In-Reply-To: <CANAwSgTB5=8tVj1FrZF3EgjK5mjUMwLYb90NeWSFDCsRjsuB6A@mail.gmail.com>

On Thu, 2016-11-17 at 23:30 +0530, Anand Moon wrote:
> Hi Jerone,
> 
> > > How about adding callback functionality for .soft_reset to handle
> > > BMCR
> > > where we update the Auto-Negotiation for the phy,
> > > as per the datasheet of the rtl8211f.

I think BMCR is already pretty well handled by the genphy, don't you
think ?

> > 
> > I'm not sure I understand how this would help with our issue (and
> > EEE).
> > Am I missing something or is it something unrelated that you would
> > like
> > to see happening on this driver ?
> > 
> [snip]
> 
> I was just tying other phy module to understand the feature.
> But in order to improve  the throughput I tried to integrate blow u-
> boot commit.
> 
> commit 3d6af748ebd831524cb22a29433e9092af469ec7
> Author: Shengzhou Liu <Shengzhou.Liu@freescale.com>
> Date:   Thu Mar 12 18:54:59 2015 +0800
> 
>     net/phy: Add support for realtek RTL8211F
> 
>     RTL8211F has different registers from RTL8211E.
>     This patch adds support for RTL8211F PHY which
>     can be found on Freescale's T1023 RDB board.
> 
> And added the similar functionality to  .config_aneg    =
> &rtl8211f_config_aneg,
> 

I assume this is the commit you are referring to : 
http://git.denx.de/?p=u-boot.git;a=commit;h=3d6af748ebd831524cb22a29433
e9092af469ec7

I tried looking a this particular commit and the other ones in
realtek.c history of u-boot. I don't really see what it does that linux
is not already doing.

> And I seem to have better results in through put with periodic drop
> but it recovers.
> -----
> odroid@odroid64:~$ iperf3 -c 10.0.0.102 -p 2006 -i 1 -t 100 -V
> iperf 3.0.11
> Linux odroid64 4.9.0-rc5-xc2ml #18 SMP PREEMPT Thu Nov 17 22:56:00 

[...]

> 
> Test Complete. Summary Results:
> [ ID] Interval           Transfer     Bandwidth       Retr
> [  4]   0.00-100.00 sec  10.5 GBytes   902
> Mbits/sec    4             sender
> [  4]   0.00-100.00 sec  10.5 GBytes   902
> Mbits/sec                  receiver
> CPU Utilization: local/sender 5.6% (0.2%u/5.4%s), remote/receiver
> 17.1% (1.2%u/15.9%s)
> 

That's the kind of throughput we have on the C2 once the link is
reliable (with EEE switch off for GbE)

> Can your confirm this at your end.
> Once confirm I will try to send this as a fix for this issue.
> 

I'm testing the code to disable EEE in a generic way. I'll post the RFC
for it asap.

> -Best Regards
> Anand Moon

^ permalink raw reply

* Re: stmmac/RTL8211F/Meson GXBB: TX throughput problems
From: Jerome Brunet @ 2016-11-17 21:47 UTC (permalink / raw)
  To: André Roth
  Cc: Martin Blumenstingl, Johnson Leung, Giuseppe CAVALLARO,
	linux-amlogic, Alexandre Torgue, netdev
In-Reply-To: <20161117194405.4ca7899b@gmail.com>

On Thu, 2016-11-17 at 19:44 +0100, André Roth wrote:
> Hi all,
> 
> > 
> > I checked again the kernel
> > at https://github.com/hardkernel/linux/tree/ odroidc2-3.14.y. The
> > version you mention (3.14.65-73) seems to be:
> > sha1: c75d5f4d1516cdd86d90a9d1c565bb0ed9251036 tag: jenkins-deb
> > s905
> > kernel-73
> 
> I downloaded the prebuilt image from hardkernel, I did not build the
> kernel myself. but hardkernel has an earlier release of the same
> kernel
> version, which works fine too. I assume they would have committed the
> change in the newer version..
>  
> > 
> > In this particular version, both realtek drivers:
> > - drivers/net/phy/realtek.c
> > - drivers/amlogic/ethernet/phy/am_realtek.c
> > 
> > have the hack to disable 1000M advertisement. I don't understand
> > how
> > it possible for you to have 1000Base-T Full Duplex with this, maybe
> > I'm missing something here ?
> 
> that's what I don't understand as well...
> 
> the patched kernel shows the following:
> 
> $ uname -a
> Linux T-06 4.9.0-rc4+ #21 SMP PREEMPT Sun Nov 13 12:07:19 UTC 2016
> 
> $ sudo ethtool eth0
> Settings for eth0:
>         Supported ports: [ TP MII ]
>         Supported link modes:   10baseT/Half 10baseT/Full 
>                                 100baseT/Half 100baseT/Full 
>                                 1000baseT/Full 
>         Supported pause frame use: No
>         Supports auto-negotiation: Yes
>         Advertised link modes:  10baseT/Half 10baseT/Full 
>                                 100baseT/Half 100baseT/Full 
>                                 1000baseT/Full 
>         Advertised pause frame use: No
>         Advertised auto-negotiation: Yes
>         Link partner advertised link modes:  10baseT/Half
> 10baseT/Full 
>                                              100baseT/Half
> 100baseT/Full 1000baseT/Half 1000baseT/Full 
>         Link partner advertised pause frame use: Symmetric Receive-
> only
>         Link partner advertised auto-negotiation: Yes
>         Speed: 1000Mb/s
>         Duplex: Full
>         Port: MII
>         PHYAD: 0
>         Transceiver: external
>         Auto-negotiation: on
>         Supports Wake-on: ug
>         Wake-on: d
>         Current message level: 0x0000003f (63)
>                                drv probe link timer ifdown ifup
>         Link detected: yes
> 
> $ sudo ethtool --show-eee eth0
> EEE Settings for eth0:
>         EEE status: disabled
>         Tx LPI: disabled
>         Supported EEE link modes:  100baseT/Full 
>                                    1000baseT/Full 
>         Advertised EEE link modes:  100baseT/Full 
>         Link partner advertised EEE link modes:  100baseT/Full 
>                                                  1000baseT/Full 
> 
> can it be that "EEE link modes" and the "normal" link modes are two
> different things ? 

Exactly, They are.
Hardkernel code disable both.

With hardkernel's kernel, you should not have 1000baseT/Full in
"Advertised link modes" and you would have nothing reported in
"Advertised EEE link modes"

> 
> > 
> > If you did compile the kernel yourself, could you check the 2 file
> > mentioned above ? Just to be sure there was no patch applied at the
> > last minute, which would not show up in the git history of
> > hardkernel ?
> 
> I cannot check this easily at the moment..
> 
> Regards,
> 
>  André
> 

^ permalink raw reply

* Re: net: BUG still has locks held in unix_stream_splice_read
From: Cong Wang @ 2016-11-17 21:44 UTC (permalink / raw)
  To: Al Viro
  Cc: Dmitry Vyukov, David Miller, Hannes Frederic Sowa, Eric Dumazet,
	netdev, LKML, syzkaller, Colin Cross, Mandeep Singh Baines
In-Reply-To: <20161010031450.GW19539@ZenIV.linux.org.uk>

On Sun, Oct 9, 2016 at 8:14 PM, Al Viro <viro@zeniv.linux.org.uk> wrote:
> E.g what will happen if some code does a read on AF_UNIX socket with
> some local mutex held?  AFAICS, there are exactly two callers of
> freezable_schedule_timeout() - this one and one in XFS; the latter is
> in a kernel thread where we do have good warranties about the locking
> environment, but here it's in the bleeding ->recvmsg/->splice_read and
> for those assumption that caller doesn't hold any locks is pretty
> strong, especially since it's not documented anywhere.
>
> What's going on there?

Commit 2b15af6f95 ("af_unix: use freezable blocking calls in read")
converts schedule_timeout() to its freezable version, it was probably correct
at that time, but later, commit 2b514574f7e88c8498027ee366
("net: af_unix: implement splice for stream af_unix sockets") breaks its
requirement for a freezable sleep:

    commit 0f9548ca10916dec166eaf74c816bded7d8e611d

    lockdep: check that no locks held at freeze time

    We shouldn't try_to_freeze if locks are held.  Holding a lock can cause a
    deadlock if the lock is later acquired in the suspend or hibernate path
    (e.g.  by dpm).  Holding a lock can also cause a deadlock in the case of
    cgroup_freezer if a lock is held inside a frozen cgroup that is later
    acquired by a process outside that group.

So probably we just need to revert commit 2b15af6f95 now.

I am going to send a revert for at least -net and -stable, since Dmitry
saw this warning again.

^ permalink raw reply

* Re: Netperf UDP issue with connected sockets
From: Eric Dumazet @ 2016-11-17 21:44 UTC (permalink / raw)
  To: Jesper Dangaard Brouer; +Cc: Rick Jones, netdev, Saeed Mahameed, Tariq Toukan
In-Reply-To: <20161117221941.3b525181@redhat.com>

On Thu, 2016-11-17 at 22:19 +0100, Jesper Dangaard Brouer wrote:

> 
> Maybe you can share your udp flood "udpsnd" program source?

Very ugly. This is based on what I wrote when tracking the UDP v6
checksum bug (4f2e4ad56a65f3b7d64c258e373cb71e8d2499f4 net: mangle zero
checksum in skb_checksum_help()), because netperf sends the same message
over and over...


Use -d 2   to remove the ip_idents_reserve() overhead.


#define _GNU_SOURCE

#include <errno.h>
#include <error.h>
#include <linux/errqueue.h>
#include <netinet/in.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

char buffer[1400];

int main(int argc, char** argv) {
  int fd, i;
  struct sockaddr_in6 addr;
  char *host = "2002:af6:798::1";
  int family = AF_INET6;
  int discover = -1;

  while ((i = getopt(argc, argv, "4H:d:")) != -1) {
    switch (i) {
    case 'H': host = optarg; break;
    case '4': family = AF_INET; break;
    case 'd': discover = atoi(optarg); break;
    }
  }
  fd = socket(family, SOCK_DGRAM, 0);
  if (fd < 0)
    error(1, errno, "failed to create socket");
  if (discover != -1)
    setsockopt(fd, SOL_IP, IP_MTU_DISCOVER,
               &discover, sizeof(discover));

  memset(&addr, 0, sizeof(addr));
  if (family == AF_INET6) {
	  addr.sin6_family = AF_INET6;
	  addr.sin6_port = htons(9);
      inet_pton(family, host, (void *)&addr.sin6_addr.s6_addr);
  } else {
    struct sockaddr_in *in = (struct sockaddr_in *)&addr;
    in->sin_family = family;
    in->sin_port = htons(9);
      inet_pton(family, host, &in->sin_addr);
  }
  connect(fd, (struct sockaddr *)&addr,
          (family == AF_INET6) ? sizeof(addr) :
                                 sizeof(struct sockaddr_in));
  memset(buffer, 1, 1400);
  for (i = 0; i < 655360000; i++) {
    memcpy(buffer, &i, sizeof(i));
    send(fd, buffer, 100 + rand() % 200, 0);
  }
  return 0;
}

^ permalink raw reply

* Re: Netperf UDP issue with connected sockets
From: Jesper Dangaard Brouer @ 2016-11-17 21:19 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Rick Jones, netdev, Saeed Mahameed, Tariq Toukan, brouer
In-Reply-To: <1479408683.8455.273.camel@edumazet-glaptop3.roam.corp.google.com>

On Thu, 17 Nov 2016 10:51:23 -0800
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> On Thu, 2016-11-17 at 19:30 +0100, Jesper Dangaard Brouer wrote:
> 
> > The point is I can see a socket Send-Q forming, thus we do know the
> > application have something to send. Thus, and possibility for
> > non-opportunistic bulking. Allowing/implementing bulk enqueue from
> > socket layer into qdisc layer, should be fairly simple (and rest of
> > xmit_more is already in place).    
> 
> 
> As I said, you are fooled by TX completions.
> 
> Please make sure to increase the sndbuf limits !
> 
> echo 2129920 >/proc/sys/net/core/wmem_default
> 
> lpaa23:~# sar -n DEV 1 10|grep eth1
> 10:49:25         eth1      7.00 9273283.00      0.61 2187214.90      0.00      0.00      0.00
> 10:49:26         eth1      1.00 9230795.00      0.06 2176787.57      0.00      0.00      1.00
> 10:49:27         eth1      2.00 9247906.00      0.17 2180915.45      0.00      0.00      0.00
> 10:49:28         eth1      3.00 9246542.00      0.23 2180790.38      0.00      0.00      1.00
> 10:49:29         eth1      1.00 9239218.00      0.06 2179044.83      0.00      0.00      0.00
> 10:49:30         eth1      3.00 9248775.00      0.23 2181257.84      0.00      0.00      1.00
> 10:49:31         eth1      4.00 9225471.00      0.65 2175772.75      0.00      0.00      0.00
> 10:49:32         eth1      2.00 9253536.00      0.33 2182666.44      0.00      0.00      1.00
> 10:49:33         eth1      1.00 9265900.00      0.06 2185598.40      0.00      0.00      0.00
> 10:49:34         eth1      1.00 6949031.00      0.06 1638889.63      0.00      0.00      1.00
> Average:         eth1      2.50 9018045.70      0.25 2126893.82      0.00      0.00      0.50
> 
> 
> lpaa23:~# ethtool -S eth1|grep more; sleep 1;ethtool -S eth1|grep more
>      xmit_more: 2251366909
>      xmit_more: 2256011392
> 
> lpaa23:~# echo 2256011392-2251366909 | bc
> 4644483

xmit more not happen that frequently for my setup, it does happen
sometimes. And I do monitor with "ethtool -S".

~/git/network-testing/bin/ethtool_stats.pl --sec 2 --dev mlx5p2
Show adapter(s) (mlx5p2) statistics (ONLY that changed!)
Ethtool(mlx5p2  ) stat:     92900913 (     92,900,913) <= tx0_bytes /sec
Ethtool(mlx5p2  ) stat:        36073 (         36,073) <= tx0_nop /sec
Ethtool(mlx5p2  ) stat:      1548349 (      1,548,349) <= tx0_packets /sec
Ethtool(mlx5p2  ) stat:            1 (              1) <= tx0_xmit_more /sec
Ethtool(mlx5p2  ) stat:     92884899 (     92,884,899) <= tx_bytes /sec
Ethtool(mlx5p2  ) stat:     99297696 (     99,297,696) <= tx_bytes_phy /sec
Ethtool(mlx5p2  ) stat:      1548082 (      1,548,082) <= tx_csum_partial /sec
Ethtool(mlx5p2  ) stat:      1548082 (      1,548,082) <= tx_packets /sec
Ethtool(mlx5p2  ) stat:      1551527 (      1,551,527) <= tx_packets_phy /sec
Ethtool(mlx5p2  ) stat:     99076658 (     99,076,658) <= tx_prio1_bytes /sec
Ethtool(mlx5p2  ) stat:      1548073 (      1,548,073) <= tx_prio1_packets /sec
Ethtool(mlx5p2  ) stat:     92936078 (     92,936,078) <= tx_vport_unicast_bytes /sec
Ethtool(mlx5p2  ) stat:      1548934 (      1,548,934) <= tx_vport_unicast_packets /sec
Ethtool(mlx5p2  ) stat:            1 (              1) <= tx_xmit_more /sec

(after several attempts I got:)
$ ethtool -S mlx5p2|grep more; sleep 1;ethtool -S mlx5p2|grep more
     tx_xmit_more: 14048
     tx0_xmit_more: 14048
     tx_xmit_more: 14049
     tx0_xmit_more: 14049

This was with:
 $ grep -H . /proc/sys/net/core/wmem_default
 /proc/sys/net/core/wmem_default:2129920

>    PerfTop:   76969 irqs/sec  kernel:96.6%  exact: 100.0% [4000Hz cycles:pp],  (all, 48 CPUs)
> ---------------------------------------------------------------------------------------------
> 
>     11.64%  [kernel]  [k] skb_set_owner_w               
>      6.21%  [kernel]  [k] queued_spin_lock_slowpath     
>      4.76%  [kernel]  [k] _raw_spin_lock                
>      4.40%  [kernel]  [k] __ip_make_skb                 
>      3.10%  [kernel]  [k] sock_wfree                    
>      2.87%  [kernel]  [k] ipt_do_table                  
>      2.76%  [kernel]  [k] fq_dequeue                    
>      2.71%  [kernel]  [k] mlx4_en_xmit                  
>      2.50%  [kernel]  [k] __dev_queue_xmit              
>      2.29%  [kernel]  [k] __ip_append_data.isra.40      
>      2.28%  [kernel]  [k] udp_sendmsg                   
>      2.01%  [kernel]  [k] __alloc_skb                   
>      1.90%  [kernel]  [k] napi_consume_skb              
>      1.63%  [kernel]  [k] udp_send_skb                  
>      1.62%  [kernel]  [k] skb_release_data              
>      1.62%  [kernel]  [k] entry_SYSCALL_64_fastpath     
>      1.56%  [kernel]  [k] dev_hard_start_xmit           
>      1.55%  udpsnd    [.] __libc_send                   
>      1.48%  [kernel]  [k] netif_skb_features            
>      1.42%  [kernel]  [k] __qdisc_run                   
>      1.35%  [kernel]  [k] sk_dst_check                  
>      1.33%  [kernel]  [k] sock_def_write_space          
>      1.30%  [kernel]  [k] kmem_cache_alloc_node_trace   
>      1.29%  [kernel]  [k] __local_bh_enable_ip          
>      1.21%  [kernel]  [k] copy_user_enhanced_fast_string
>      1.08%  [kernel]  [k] __kmalloc_reserve.isra.40     
>      1.08%  [kernel]  [k] SYSC_sendto                   
>      1.07%  [kernel]  [k] kmem_cache_alloc_node         
>      0.95%  [kernel]  [k] ip_finish_output2             
>      0.95%  [kernel]  [k] ktime_get                     
>      0.91%  [kernel]  [k] validate_xmit_skb             
>      0.88%  [kernel]  [k] sock_alloc_send_pskb          
>      0.82%  [kernel]  [k] sock_sendmsg                  

I'm more interested in why I see fib_table_lookup() and
__ip_route_output_key_hash() when you don't ?!?  There must be some
mistake in my setup!

Maybe you can share your udp flood "udpsnd" program source?

Maybe I'm missing some important sysctl /proc/net/sys/ ?

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

p.s. I placed my testing software here:
 https://github.com/netoptimizer/network-testing/tree/master/src

^ permalink raw reply

* [PATCH] net: fec: Detect and recover receive queue hangs
From: Chris Lesiak @ 2016-11-17 21:14 UTC (permalink / raw)
  To: Fugang Duan; +Cc: netdev, linux-kernel, Jaccon Bastiaansen, chris.lesiak

This corrects a problem that appears to be similar to ERR006358.  But
while ERR006358 is a race when the tx queue transitions from empty to
not empty, this problem is a race when the rx queue transitions from
full to not full.

The symptom is a receive queue that is stuck.  The ENET_RDAR register
will read 0, indicating that there are no empty receive descriptors in
the receive ring.  Since no additional frames can be queued, no RXF
interrupts occur.

This problem can be triggered with a 1 Gb link and about 400 Mbps of
traffic.

This patch detects this condition, sets the work_rx bit, and
reschedules the poll method.

Signed-off-by: Chris Lesiak <chris.lesiak@licor.com>
---
 drivers/net/ethernet/freescale/fec_main.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index fea0f33..8a87037 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -1588,6 +1588,34 @@ fec_enet_interrupt(int irq, void *dev_id)
 	return ret;
 }
 
+static inline bool
+fec_enet_recover_rxq(struct fec_enet_private *fep, u16 queue_id)
+{
+	int work_bit = (queue_id == 0) ? 2 : ((queue_id == 1) ? 0 : 1);
+
+	if (readl(fep->rx_queue[queue_id]->bd.reg_desc_active))
+		return false;
+
+	dev_notice_once(&fep->pdev->dev, "Recovered rx queue\n");
+
+	fep->work_rx |= 1 << work_bit;
+
+	return true;
+}
+
+static inline bool fec_enet_recover_rxqs(struct fec_enet_private *fep)
+{
+	unsigned int q;
+	bool ret = false;
+
+	for (q = 0; q < fep->num_rx_queues; q++) {
+		if (fec_enet_recover_rxq(fep, q))
+			ret = true;
+	}
+
+	return ret;
+}
+
 static int fec_enet_rx_napi(struct napi_struct *napi, int budget)
 {
 	struct net_device *ndev = napi->dev;
@@ -1601,6 +1629,9 @@ static int fec_enet_rx_napi(struct napi_struct *napi, int budget)
 	if (pkts < budget) {
 		napi_complete(napi);
 		writel(FEC_DEFAULT_IMASK, fep->hwp + FEC_IMASK);
+
+		if (fec_enet_recover_rxqs(fep) && napi_reschedule(napi))
+			writel(FEC_NAPI_IMASK, fep->hwp + FEC_IMASK);
 	}
 	return pkts;
 }
-- 
2.5.5

^ permalink raw reply related

* Re: [PATCH net 1/1] net sched filters: pass netlink message flags in event notification
From: Cong Wang @ 2016-11-17 21:02 UTC (permalink / raw)
  To: Roman Mashak
  Cc: David Miller, Linux Kernel Network Developers, Jamal Hadi Salim
In-Reply-To: <1479334570-25159-1-git-send-email-mrv@mojatatu.com>

On Wed, Nov 16, 2016 at 2:16 PM, Roman Mashak <mrv@mojatatu.com> wrote:
> Userland client should be able to read an event, and reflect it back to
> the kernel, therefore it needs to extract complete set of netlink flags.
>
> For example, this will allow "tc monitor" to distinguish Add and Replace
> operations.
>
> Signed-off-by: Roman Mashak <mrv@mojatatu.com>
> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
> ---
>  net/sched/cls_api.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
> index 2b2a797..8e93d4a 100644
> --- a/net/sched/cls_api.c
> +++ b/net/sched/cls_api.c
> @@ -112,7 +112,7 @@ static void tfilter_notify_chain(struct net *net, struct sk_buff *oskb,
>
>         for (it_chain = chain; (tp = rtnl_dereference(*it_chain)) != NULL;
>              it_chain = &tp->next)
> -               tfilter_notify(net, oskb, n, tp, 0, event, false);
> +               tfilter_notify(net, oskb, n, tp, n->nlmsg_flags, event, false);


I must miss something, why does it make sense to pass n->nlmsg_flags
as 'fh' to tfilter_notify()??


>  }
>
>  /* Select new prio value from the range, managed by kernel. */
> @@ -430,7 +430,8 @@ static int tfilter_notify(struct net *net, struct sk_buff *oskb,
>         if (!skb)
>                 return -ENOBUFS;
>
> -       if (tcf_fill_node(net, skb, tp, fh, portid, n->nlmsg_seq, 0, event) <= 0) {
> +       if (tcf_fill_node(net, skb, tp, fh, portid, n->nlmsg_seq,
> +                         n->nlmsg_flags, event) <= 0) {


This part makes sense.

^ permalink raw reply

* Re: [PATCH net-next v2 2/4] bpf, mlx5: fix various refcount issues in mlx5e_xdp_set
From: Daniel Borkmann @ 2016-11-17 20:03 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: David S. Miller, Alexei Starovoitov, Brenden Blanco, Zhiyi Sun,
	Rana Shahout, Saeed Mahameed, Linux Netdev List
In-Reply-To: <CALzJLG-ZQxF0t1N5ry2OUqcBjrs+7XzuWb_dRDOjEnRf1975Hg@mail.gmail.com>

On 11/17/2016 10:48 AM, Saeed Mahameed wrote:
> On Wed, Nov 16, 2016 at 5:26 PM, Daniel Borkmann <daniel@iogearbox.net> wrote:
>> On 11/16/2016 03:30 PM, Saeed Mahameed wrote:
>>> On Wed, Nov 16, 2016 at 3:54 PM, Daniel Borkmann <daniel@iogearbox.net>
>>> wrote:
>>>> On 11/16/2016 01:25 PM, Saeed Mahameed wrote:
>>>>> On Wed, Nov 16, 2016 at 2:04 AM, Daniel Borkmann <daniel@iogearbox.net>
>>>>> wrote:
>>>>>>
>>>>>> There are multiple issues in mlx5e_xdp_set():
>>>>>>
>>>>>> 1) The batched bpf_prog_add() is currently not checked for errors! When
>>>>>>       doing so, it should be done at an earlier point in time to makes
>>>>>> sure
>>>>>>       that we cannot fail anymore at the time we want to set the program
>>>>>> for
>>>>>>       each channel. This only means that we have to undo the
>>>>>> bpf_prog_add()
>>>>>>       in case we return due to required reset.
>>>>>>
>>>>>> 2) When swapping the priv->xdp_prog, then no extra reference count must
>>>>>> be
>>>>>>       taken since we got that from call path via dev_change_xdp_fd()
>>>>>> already.
>>>>>>       Otherwise, we'd never be able to free the program. Also,
>>>>>> bpf_prog_add()
>>>>>>       without checking the return code could fail.
>>>>>>
>>>>>> Fixes: 86994156c736 ("net/mlx5e: XDP fast RX drop bpf programs
>>>>>> support")
>>>>>> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
>>>>>> ---
>>>>>>     drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 25
>>>>>> ++++++++++++++++++-----
>>>>>>     1 file changed, 20 insertions(+), 5 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>>>>> b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>>>>> index ab0c336..cf26672 100644
>>>>>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>>>>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
>>>>>> @@ -3142,6 +3142,17 @@ static int mlx5e_xdp_set(struct net_device
>>>>>> *netdev, struct bpf_prog *prog)
>>>>>>                    goto unlock;
>>>>>>            }
>>>>>>
>>>>>> +       if (prog) {
>>>>>> +               /* num_channels is invariant here, so we can take the
>>>>>> +                * batched reference right upfront.
>>>>>> +                */
>>>>>> +               prog = bpf_prog_add(prog, priv->params.num_channels);
>>>>>> +               if (IS_ERR(prog)) {
>>>>>> +                       err = PTR_ERR(prog);
>>>>>> +                       goto unlock;
>>>>>> +               }
>>>>>> +       }
>>>>>> +
>>>>>
>>>>> With this approach you will end up taking a ref count twice per RQ! on
>>>>> the first time xdp_set is called i.e (old_prog = NULL, prog != NULL).
>>>>> One ref will be taken per RQ/Channel from the above code, and since
>>>>> reset will be TRUE mlx5e_open_locked will be called and another ref
>>>>> count will be taken on mlx5e_create_rq.
>>>>>
>>>>> The issue here is that we have two places for ref count accounting,
>>>>> xdp_set and  mlx5e_create_rq. Having ref-count updates in
>>>>> mlx5e_create_rq is essential for num_channels configuration changes
>>>>> (mlx5e_set_ringparam).
>>>>>
>>>>> Our previous approach made sure that only one path will do the ref
>>>>> counting (mlx5e_open_locked vs. mlx5e_xdp_set batch ref update when
>>>>> reset == false).
>>>>
>>>> That is correct, for a short time bpf_prog_add() was charged also when
>>>> we reset. When we reset, we will then jump to unlock_put and safely undo
>>>> this since we took ref from mlx5e_create_rq() already in that case, and
>>>> return successfully. That was intentional, so that eventually we end up
>>>> just taking a /single/ ref per RQ when we exit mlx5e_xdp_set(), but more
>>>> below ...
>>>>
>>>> [...]
>>>>>
>>>>> 2. Keep the current approach and make sure to not update the ref count
>>>>> twice, you can batch update only if (!reset && was_open) otherwise you
>>>>> can rely on mlx5e_open_locked to take the num_channels ref count for
>>>>> you.
>>>>>
>>>>> Personally I prefer option 2, since we will keep the current logic
>>>>> which will allow configuration updates such as (mlx5e_set_ringparam)
>>>>> to not worry about ref counting since it will be done in the reset
>>>>> flow.
>>>>
>>>> ... agree on keeping current approach. I actually like the idea, so we'd
>>>> end up with this simpler version for the batched ref then.
>>>
>>> Great :).
>>>
>>> So let's do the batched update only if we are not going to reset (we
>>> already know that in advance), instead of the current patch where you
>>> batch update unconditionally and then
>>>    unlock_put in case reset was performed (which is just redundant and
>>> confusing).
>>>
>>> Please note that if open_locked fails you need to goto unlock_put.
>>
>> Sorry, I'm not quite sure I follow you here; are you /now/ commenting on
>> the original patch or on the already updated diff I did from my very last
>> email, that is, http://patchwork.ozlabs.org/patch/695564/ ?
>
> I am commenting on this patch  "V2 2/4".
> this patch will take batched ref count unconditionally and release the
> extra refs "unlock_put" if reset was performed.
> I am asking to take the batched ref only if we are not going to reset
> in advance to save the extra "unlock_put"

Okay, sure, it's clear and we discussed about that already earlier, and
my response back then was the diff in the /end/ of this email here:

   https://www.spinics.net/lists/netdev/msg404709.html

Hence my confusion why we were back on the original patch after that. ;)
Anyway, I'll get the updated set with the suggestion out tomorrow, thanks
for the review!

>>>> Note, your "bpf_prog_add(prog, 1) // one ref for the device." is not
>>>> needed
>>>> since this we already got that one through dev_change_xdp_fd() as
>>>> mentioned.
>>>
>>> If it is not needed then why we need bpf_prog_put on mlx5e_nic_cleanup
>>> in your next patch? this doesn't look symmetric (right) !
>>> you shouldn't release a reference you didn't take.
>>> Overall with this series the driver can take num_channels refs and
>>> will release num_channels refs on mlx5e_close. we shouldn't release
>>> one extra ref on NIC cleanup.
>>
>> I already explained this in the commit description; when dev_change_xdp_fd()
>> is called and sees a valid fd, it does bpf_prog_get_type(), which calls the
>> __bpf_prog_get() taking a ref on the program (bpf_prog_inc()). That is then
>> passed down to ops->ndo_xdp(). Only in case of error from the ->ndo_xdp()
>> callback, we bpf_prog_put() this reference from dev_change_xdp_fd() side.
>>
>> For drivers that implement against this ndo, it means that you need N-1 refs
>> in addition. Have a look at the other two in-tree users, which do it
>> correctly,
>> that is mlx4_xdp_set() and nfp_net_xdp_setup().
>>
>> It's documented here (include/linux/netdevice.h) with ndo_xdp referring to
>> it:
>>
>> /* These structures hold the attributes of xdp state that are being passed
>>   * to the netdevice through the xdp op.
>>   */
>> enum xdp_netdev_command {
>>          /* Set or clear a bpf program used in the earliest stages of packet
>>           * rx. The prog will have been loaded as BPF_PROG_TYPE_XDP. The
>> callee
>>           * is responsible for calling bpf_prog_put on any old progs that are
>>           * stored. In case of error, the callee need not release the new
>> prog
>>           * reference, but on success it takes ownership and must
>> bpf_prog_put
>>           * when it is no longer used.
>>           */
>>          XDP_SETUP_PROG,
>> [...]
>> };
>>
>> I think reason why this is rather confusing would be that initially, it was
>> just a single prog for all queues, but it was requested along the way to
>> move
>> prog pointer down to queues instead and let them have their own ref, so that
>> some time in future individual progs from a subset of the queues can be
>> exchanged.
>>
>> Since the way xdp in mlx5 is implemented, you currently have the
>> priv->xdp_prog
>> as the control prog. That's okay right now, but requires to drop the last
>> ref
>> on priv->xdp_prog via bpf_prog_put() when netdev is dismantled and
>> priv->xdp_prog
>> still present.
>
> Got it, but i would rather make the stack cleanup those refs once it
> receives an unregester_netdev event instead of counting on netdev to
> do it, as i said before, like any other resource added to the netdev
> by the stack (vlan/mac/etc..).
>
> but this is a general discussion of xdp API, it won't block this series :).
>
> Saeed.

^ permalink raw reply

* [PATCH net-next v4 5/5] net: dsa: mv88e6xxx: Select IRQ_DOMAIN
From: Florian Fainelli @ 2016-11-17 19:19 UTC (permalink / raw)
  To: netdev
  Cc: davem, mw, arnd, gregory.clement, Shaohui.Xie, andrew,
	Florian Fainelli
In-Reply-To: <20161117191914.11077-1-f.fainelli@gmail.com>

Some architectures may not define IRQ_DOMAIN (like m32r), fixes
undefined references to IRQ_DOMAIN functions.

Fixes: dc30c35be720 ("net: dsa: mv88e6xxx: Implement interrupt support.")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/dsa/mv88e6xxx/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/dsa/mv88e6xxx/Kconfig b/drivers/net/dsa/mv88e6xxx/Kconfig
index 486668813e15..1aaa7a95ebc4 100644
--- a/drivers/net/dsa/mv88e6xxx/Kconfig
+++ b/drivers/net/dsa/mv88e6xxx/Kconfig
@@ -1,6 +1,7 @@
 config NET_DSA_MV88E6XXX
 	tristate "Marvell 88E6xxx Ethernet switch fabric support"
 	depends on NET_DSA
+	select IRQ_DOMAIN
 	select NET_DSA_TAG_EDSA
 	select NET_DSA_TAG_DSA
 	help
-- 
2.9.3

^ permalink raw reply related

* [PATCH net-next v4 4/5] net: marvell: Allow drivers to be built with COMPILE_TEST
From: Florian Fainelli @ 2016-11-17 19:19 UTC (permalink / raw)
  To: netdev
  Cc: davem, mw, arnd, gregory.clement, Shaohui.Xie, andrew,
	Florian Fainelli
In-Reply-To: <20161117191914.11077-1-f.fainelli@gmail.com>

All Marvell Ethernet drivers actually build fine with COMPILE_TEST with
a few warnings. We need to add a few HAS_DMA dependencies to fix linking
failures on problematic architectures like m32r.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/ethernet/marvell/Kconfig | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/marvell/Kconfig b/drivers/net/ethernet/marvell/Kconfig
index 2664827ddecd..d74d4e6f0b34 100644
--- a/drivers/net/ethernet/marvell/Kconfig
+++ b/drivers/net/ethernet/marvell/Kconfig
@@ -5,7 +5,7 @@
 config NET_VENDOR_MARVELL
 	bool "Marvell devices"
 	default y
-	depends on PCI || CPU_PXA168 || MV64X60 || PPC32 || PLAT_ORION || INET
+	depends on PCI || CPU_PXA168 || MV64X60 || PPC32 || PLAT_ORION || INET || COMPILE_TEST
 	---help---
 	  If you have a network (Ethernet) card belonging to this class, say Y.
 
@@ -18,7 +18,8 @@ if NET_VENDOR_MARVELL
 
 config MV643XX_ETH
 	tristate "Marvell Discovery (643XX) and Orion ethernet support"
-	depends on (MV64X60 || PPC32 || PLAT_ORION) && INET
+	depends on (MV64X60 || PPC32 || PLAT_ORION || COMPILE_TEST) && INET
+	depends on HAS_DMA
 	select PHYLIB
 	select MVMDIO
 	---help---
@@ -55,7 +56,8 @@ config MVNETA_BM_ENABLE
 
 config MVNETA
 	tristate "Marvell Armada 370/38x/XP network interface support"
-	depends on PLAT_ORION
+	depends on PLAT_ORION || COMPILE_TEST
+	depends on HAS_DMA
 	select MVMDIO
 	select FIXED_PHY
 	---help---
@@ -77,7 +79,8 @@ config MVNETA_BM
 
 config MVPP2
 	tristate "Marvell Armada 375 network interface support"
-	depends on MACH_ARMADA_375
+	depends on MACH_ARMADA_375 || COMPILE_TEST
+	depends on HAS_DMA
 	select MVMDIO
 	---help---
 	  This driver supports the network interface units in the
-- 
2.9.3

^ permalink raw reply related

* [PATCH net-next v4 3/5] bus: mvebu-bus: Provide inline stub for mvebu_mbus_get_dram_win_info
From: Florian Fainelli @ 2016-11-17 19:19 UTC (permalink / raw)
  To: netdev
  Cc: davem, mw, arnd, gregory.clement, Shaohui.Xie, andrew,
	Florian Fainelli
In-Reply-To: <20161117191914.11077-1-f.fainelli@gmail.com>

In preparation for allowing CONFIG_MVNETA_BM to build with COMPILE_TEST,
provide an inline stub for mvebu_mbus_get_dram_win_info().

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 include/linux/mbus.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/include/linux/mbus.h b/include/linux/mbus.h
index 2931aa43dab1..0d3f14fd2621 100644
--- a/include/linux/mbus.h
+++ b/include/linux/mbus.h
@@ -82,6 +82,7 @@ static inline int mvebu_mbus_get_io_win_info(phys_addr_t phyaddr, u32 *size,
 }
 #endif
 
+#ifdef CONFIG_MVEBU_MBUS
 int mvebu_mbus_save_cpu_target(u32 __iomem *store_addr);
 void mvebu_mbus_get_pcie_mem_aperture(struct resource *res);
 void mvebu_mbus_get_pcie_io_aperture(struct resource *res);
@@ -97,5 +98,12 @@ int mvebu_mbus_init(const char *soc, phys_addr_t mbus_phys_base,
 		    size_t mbus_size, phys_addr_t sdram_phys_base,
 		    size_t sdram_size);
 int mvebu_mbus_dt_init(bool is_coherent);
+#else
+static inline int mvebu_mbus_get_dram_win_info(phys_addr_t phyaddr, u8 *target,
+					       u8 *attr)
+{
+	return -EINVAL;
+}
+#endif /* CONFIG_MVEBU_MBUS */
 
 #endif /* __LINUX_MBUS_H */
-- 
2.9.3

^ permalink raw reply related

* [PATCH net-next v4 2/5] net: fsl: Allow most drivers to be built with COMPILE_TEST
From: Florian Fainelli @ 2016-11-17 19:19 UTC (permalink / raw)
  To: netdev
  Cc: davem, mw, arnd, gregory.clement, Shaohui.Xie, andrew,
	Florian Fainelli
In-Reply-To: <20161117191914.11077-1-f.fainelli@gmail.com>

There are only a handful of Freescale Ethernet drivers that don't
actually build with COMPILE_TEST:

* FEC, for which we would need to define a default register layout if no
  supported architecture is defined

* UCC_GETH which depends on PowerPC cpm.h header (which could be moved
  to a generic location)

* GIANFAR needs to depend on HAS_DMA to fix linking failures on some
  architectures (like m32r)

We need to fix an unmet dependency to get there though:
warning: (FSL_XGMAC_MDIO) selects OF_MDIO which has unmet direct
dependencies (OF && PHYLIB)

which would result in CONFIG_OF_MDIO=[ym] without CONFIG_OF to be set.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
---
 drivers/net/ethernet/freescale/Kconfig | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/freescale/Kconfig b/drivers/net/ethernet/freescale/Kconfig
index aa3f615886b4..0d415516b577 100644
--- a/drivers/net/ethernet/freescale/Kconfig
+++ b/drivers/net/ethernet/freescale/Kconfig
@@ -8,7 +8,7 @@ config NET_VENDOR_FREESCALE
 	depends on FSL_SOC || QUICC_ENGINE || CPM1 || CPM2 || PPC_MPC512x || \
 		   M523x || M527x || M5272 || M528x || M520x || M532x || \
 		   ARCH_MXC || ARCH_MXS || (PPC_MPC52xx && PPC_BESTCOMM) || \
-		   ARCH_LAYERSCAPE
+		   ARCH_LAYERSCAPE || COMPILE_TEST
 	---help---
 	  If you have a network (Ethernet) card belonging to this class, say Y.
 
@@ -65,6 +65,7 @@ config FSL_PQ_MDIO
 config FSL_XGMAC_MDIO
 	tristate "Freescale XGMAC MDIO"
 	select PHYLIB
+	depends on OF
 	select OF_MDIO
 	---help---
 	  This driver supports the MDIO bus on the Fman 10G Ethernet MACs, and
@@ -85,6 +86,7 @@ config UGETH_TX_ON_DEMAND
 
 config GIANFAR
 	tristate "Gianfar Ethernet"
+	depends on HAS_DMA
 	select FSL_PQ_MDIO
 	select PHYLIB
 	select CRC32
-- 
2.9.3

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox