Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH net-next] net: dsa: remove tag_protocol from dsa_switch
From: David Miller @ 2016-04-21 17:44 UTC (permalink / raw)
  To: vivien.didelot; +Cc: netdev, linux-kernel, kernel, f.fainelli, andrew
In-Reply-To: <1461018244-5371-1-git-send-email-vivien.didelot@savoirfairelinux.com>

From: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Date: Mon, 18 Apr 2016 18:24:04 -0400

> Having the tag protocol in dsa_switch_driver for setup time and in
> dsa_switch_tree for runtime is enough. Remove dsa_switch's one.
> 
> Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>

Applied.

^ permalink raw reply

* Re: [PATCH net] tcp: Fix SOF_TIMESTAMPING_TX_ACK when handling dup acks
From: David Miller @ 2016-04-21 17:46 UTC (permalink / raw)
  To: kafai
  Cc: netdev, kernel-team, edumazet, ncardwell, soheil.kdev, willemb,
	ycheng
In-Reply-To: <1461019193-3034571-1-git-send-email-kafai@fb.com>

From: Martin KaFai Lau <kafai@fb.com>
Date: Mon, 18 Apr 2016 15:39:53 -0700

> Assuming SOF_TIMESTAMPING_TX_ACK is on. When dup acks are received,
> it could incorrectly think that a skb has already
> been acked and queue a SCM_TSTAMP_ACK cmsg to the
> sk->sk_error_queue.
> 
> In tcp_ack_tstamp(), it checks
> 'between(shinfo->tskey, prior_snd_una, tcp_sk(sk)->snd_una - 1)'.
> If prior_snd_una == tcp_sk(sk)->snd_una like the following packetdrill
> script, between() returns true but the tskey is actually not acked.
> e.g. try between(3, 2, 1).
> 
> The fix is to replace between() with one before() and one !before().
> By doing this, the -1 offset on the tcp_sk(sk)->snd_una can also be
> removed.
 ...
> Signed-off-by: Martin KaFai Lau <kafai@fb.com>

Applied, thank you.

^ permalink raw reply

* Re: drop all fragments inside tx queue if one gets dropped
From: Michael Richardson @ 2016-04-21 17:48 UTC (permalink / raw)
  To: Rick Jones; +Cc: netdev, linux-wpan, Alexander Aring
In-Reply-To: <5717EA7E.1020606@hpe.com>

[-- Attachment #1: Type: text/plain, Size: 1473 bytes --]

Rick Jones <rick.jones2@hpe.com> wrote:
    > I don't recall seeing similar poor behaviour in Linux; I would have
    > assumed
    > that the intra-stack flow-control "took care" of it.  Perhaps there is
    > something specific to wpan which precludes that?

The major user of big UDP packets in the 1990s was NFS.
I fondly recall deploying wsize=1024,rsize=1024 for NFS mounts between HPUX,
Apollo and Sun machines across the intersite Ottawa BNR "WAN"

NFS mounts now use TCP by default, and NFS is not a well used protocol
outside of clueful people (everyone else uses CIFS), and modern machines have
way DMA engines in their ethernet that can accomodate more than enough
xmit buffers to perhaps make this moot.  But, I dealt with this very problem
with a Linux NFS server that would get GbE XOFF's by a broken Cisco switch
that wouldn't always XON, and would seem to drop it's queue. (Turning QoS off
on the cisco switch made it tolerable.)

Still, Alex, it would be worth looking at whether the NFS UDP transmitter
does anything clueful to keep from overwhelming the ethernet layer.

wpan deals with sub-IP fragmentation of 1280 (or larger) IPv6 packets into
127 6lowpan fragments for transmission over 802.15.4 interfaces which run at
typically 250kb/s.   Those radios are typically SPI interfaced (often with
bit-banged SPI interfaces), and the radio MAC has only a single transmit
buffer (which is also the receive buffer!).

Packet trains would be nice to have.

^ permalink raw reply

* Re: [PATCH net-next] perf, bpf: minimize the size of perf_trace_() tracepoint handler
From: David Miller @ 2016-04-21 17:49 UTC (permalink / raw)
  To: ast
  Cc: rostedt, peterz, mingo, daniel, acme, wangnan0, jbacik,
	brendan.d.gregg, netdev, linux-kernel, kernel-team
In-Reply-To: <1461035510-2810305-1-git-send-email-ast@fb.com>

From: Alexei Starovoitov <ast@fb.com>
Date: Mon, 18 Apr 2016 20:11:50 -0700

> move trace_call_bpf() into helper function to minimize the size
> of perf_trace_*() tracepoint handlers.
>     text	   data	    bss	    dec	 	   hex	filename
> 10541679	5526646	2945024	19013349	1221ee5	vmlinux_before
> 10509422	5526646	2945024	18981092	121a0e4	vmlinux_after
> 
> It may seem that perf_fetch_caller_regs() can also be moved,
> but that is incorrect, since ip/sp will be wrong.
> 
> bpf+tracepoint performance is not affected, since
> perf_swevent_put_recursion_context() is now inlined.
> export_symbol_gpl can also be dropped.
> 
> No measurable change in normal perf tracepoints.
> 
> Suggested-by: Steven Rostedt <rostedt@goodmis.org>
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Applied, thanks Alexei.

^ permalink raw reply

* [PATCH net v2 0/3] drivers: net: cpsw: phy-handle fixes
From: David Rivshin (Allworx) @ 2016-04-21 17:50 UTC (permalink / raw)
  To: netdev-u79uwXL29TY76Z2rM5mHXA, linux-omap-u79uwXL29TY76Z2rM5mHXA
  Cc: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, David Miller, Mugunthan V N,
	Grygorii Strashko, Andrew Goodbody, Markus Brunner,
	Nicolas Chauvet

From: David Rivshin <drivshin-5fOYsn7Fw8lBDgjK7y7TUQ@public.gmane.org>

The first patch fixes a bug that makes dual_emac mode break if
either slave uses the phy-handle property in the devicetree.

The second patch fixes some cosmetic problems with error messages,
and also makes the binding documentation more explicit.

The third patch cleans up the fixed-link case to work like
the now-fixed phy-handle case.

I have tested on the following hardware configurations:
 - (EVMSK) dual emac, phy_id property in both slaves
 - (EVMSK) dual emac, phy-handle property in both slaves
 - (BeagleBoneBlack) single emac, phy_id property
 - (custom) single emac, fixed-link subnode

Nicolas Chauvet reported testing on an HP t410 (dm8148).

Markus Brunner reported testing v1 on the following [1]:
 - emac0 with phy_id and emac1 with fixed phy
 - emac0 with phy-handle and emac1 with fixed phy
 - emac0 with fixed phy and emac1 with fixed phy


Changes since v1 [2]:
- Rebased
- Added Tested-by from Nicolas Chauvet on all patches
- Added Acked-by from Rob Herring for the binding change in patch 2 [3]

[1] http://www.spinics.net/lists/netdev/msg357890.html
[2] http://www.spinics.net/lists/netdev/msg357772.html
[3] http://www.spinics.net/lists/netdev/msg358254.html

David Rivshin (3):
  drivers: net: cpsw: fix parsing of phy-handle DT property in dual_emac
    config
  drivers: net: cpsw: fix error messages when using phy-handle DT
    property
  drivers: net: cpsw: use of_phy_connect() in fixed-link case

 Documentation/devicetree/bindings/net/cpsw.txt |  4 +--
 drivers/net/ethernet/ti/cpsw.c                 | 41 +++++++++++++-------------
 drivers/net/ethernet/ti/cpsw.h                 |  1 +
 3 files changed, 23 insertions(+), 23 deletions(-)

-- 
2.5.5

--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH net-next] macvlan: fix failure during registration v2
From: Eric W. Biederman @ 2016-04-21 17:44 UTC (permalink / raw)
  To: Francesco Ruggeri; +Cc: netdev, David S. Miller, Mahesh Bandewar
In-Reply-To: <1461188301-4466-1-git-send-email-fruggeri@arista.com>

Francesco Ruggeri <fruggeri@arista.com> writes:

> If macvlan_common_newlink fails in register_netdevice after macvlan_init
> then it decrements port->count twice, first in macvlan_uninit (from
> register_netdevice or rollback_registered) and then again in
> macvlan_common_newlink.
> A similar problem may exist in the ipvlan driver.
> This patch consolidates modifications to port->count into macvlan_init
> and macvlan_uninit (thanks to Eric Biederman for suggesting this approach).
> In macvtap_device_event it also avoids cleaning up in NETDEV_UNREGISTER
> if NETDEV_REGISTER had previously failed.
>
> Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>

The macvlan_init bits obviously corect.

The macvtap bits I don't really understand what is going on.

> diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
> index 95394ed..e770221 100644
> --- a/drivers/net/macvtap.c
> +++ b/drivers/net/macvtap.c
> @@ -1303,6 +1303,8 @@ static int macvtap_device_event(struct notifier_block *unused,
>  		}
>  		break;
>  	case NETDEV_UNREGISTER:
> +		if (vlan->minor == 0)
> +			break;

I don't understand this bit.  A minor of 0 is never assigned.  That is
clear from the code.  On what code path can you get here without
assigning a minor?

My gut says this either needs a big fat comment explaining what is going
on, or a slight refactoring of the code (like moving port count
increments into macvlan_init) that makes it so this code path can't
happen.

>  		devt = MKDEV(MAJOR(macvtap_major), vlan->minor);
>  		device_destroy(macvtap_class, devt);
>  		macvtap_free_minor(vlan);

Eric

^ permalink raw reply

* [PATCH v2 net-next] tcp-tso: do not split TSO packets at retransmit time
From: Eric Dumazet @ 2016-04-21 17:55 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Neal Cardwell, Yuchung Cheng, Eric Dumazet, Eric Dumazet

Linux TCP stack painfully segments all TSO/GSO packets before retransmits.

This was fine back in the days when TSO/GSO were emerging, with their
bugs, but we believe the dark age is over.

Keeping big packets in write queues, but also in stack traversal
has a lot of benefits.
 - Less memory overhead, because write queues have less skbs
 - Less cpu overhead at ACK processing.
 - Better SACK processing, as lot of studies mentioned how
   awful linux was at this ;)
 - Less cpu overhead to send the rtx packets
   (IP stack traversal, netfilter traversal, drivers...)
 - Better latencies in presence of losses.
 - Smaller spikes in fq like packet schedulers, as retransmits
   are not constrained by TCP Small Queues.

1 % packet losses are common today, and at 100Gbit speeds, this
translates to ~80,000 losses per second.
Losses are often correlated, and we see many retransmit events
leading to 1-MSS train of packets, at the time hosts are already
under stress.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
---
 include/net/tcp.h     |  4 ++--
 net/ipv4/tcp_input.c  |  2 +-
 net/ipv4/tcp_output.c | 64 +++++++++++++++++++++++----------------------------
 net/ipv4/tcp_timer.c  |  4 ++--
 4 files changed, 34 insertions(+), 40 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index fd40f8c64d5f..0dc272dcd772 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -538,8 +538,8 @@ __u32 cookie_v6_init_sequence(const struct sk_buff *skb, __u16 *mss);
 void __tcp_push_pending_frames(struct sock *sk, unsigned int cur_mss,
 			       int nonagle);
 bool tcp_may_send_now(struct sock *sk);
-int __tcp_retransmit_skb(struct sock *, struct sk_buff *);
-int tcp_retransmit_skb(struct sock *, struct sk_buff *);
+int __tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb, int segs);
+int tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb, int segs);
 void tcp_retransmit_timer(struct sock *sk);
 void tcp_xmit_retransmit_queue(struct sock *);
 void tcp_simple_retransmit(struct sock *);
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 90e0d9256b74..729e489b5608 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -5543,7 +5543,7 @@ static bool tcp_rcv_fastopen_synack(struct sock *sk, struct sk_buff *synack,
 	if (data) { /* Retransmit unacked data in SYN */
 		tcp_for_write_queue_from(data, sk) {
 			if (data == tcp_send_head(sk) ||
-			    __tcp_retransmit_skb(sk, data))
+			    __tcp_retransmit_skb(sk, data, 1))
 				break;
 		}
 		tcp_rearm_rto(sk);
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 6451b83d81e9..4876b256a70a 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2266,7 +2266,7 @@ void tcp_send_loss_probe(struct sock *sk)
 	if (WARN_ON(!skb || !tcp_skb_pcount(skb)))
 		goto rearm_timer;
 
-	if (__tcp_retransmit_skb(sk, skb))
+	if (__tcp_retransmit_skb(sk, skb, 1))
 		goto rearm_timer;
 
 	/* Record snd_nxt for loss detection. */
@@ -2551,17 +2551,17 @@ static void tcp_retrans_try_collapse(struct sock *sk, struct sk_buff *to,
  * state updates are done by the caller.  Returns non-zero if an
  * error occurred which prevented the send.
  */
-int __tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb)
+int __tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb, int segs)
 {
-	struct tcp_sock *tp = tcp_sk(sk);
 	struct inet_connection_sock *icsk = inet_csk(sk);
+	struct tcp_sock *tp = tcp_sk(sk);
 	unsigned int cur_mss;
-	int err;
+	int diff, len, err;
+
 
-	/* Inconslusive MTU probe */
-	if (icsk->icsk_mtup.probe_size) {
+	/* Inconclusive MTU probe */
+	if (icsk->icsk_mtup.probe_size)
 		icsk->icsk_mtup.probe_size = 0;
-	}
 
 	/* Do not sent more than we queued. 1/4 is reserved for possible
 	 * copying overhead: fragmentation, tunneling, mangling etc.
@@ -2594,30 +2594,27 @@ int __tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb)
 	    TCP_SKB_CB(skb)->seq != tp->snd_una)
 		return -EAGAIN;
 
-	if (skb->len > cur_mss) {
-		if (tcp_fragment(sk, skb, cur_mss, cur_mss, GFP_ATOMIC))
+	len = cur_mss * segs;
+	if (skb->len > len) {
+		if (tcp_fragment(sk, skb, len, cur_mss, GFP_ATOMIC))
 			return -ENOMEM; /* We'll try again later. */
 	} else {
-		int oldpcount = tcp_skb_pcount(skb);
+		if (skb_unclone(skb, GFP_ATOMIC))
+			return -ENOMEM;
 
-		if (unlikely(oldpcount > 1)) {
-			if (skb_unclone(skb, GFP_ATOMIC))
-				return -ENOMEM;
-			tcp_init_tso_segs(skb, cur_mss);
-			tcp_adjust_pcount(sk, skb, oldpcount - tcp_skb_pcount(skb));
-		}
+		diff = tcp_skb_pcount(skb);
+		tcp_set_skb_tso_segs(skb, cur_mss);
+		diff -= tcp_skb_pcount(skb);
+		if (diff)
+			tcp_adjust_pcount(sk, skb, diff);
+		if (skb->len < cur_mss)
+			tcp_retrans_try_collapse(sk, skb, cur_mss);
 	}
 
 	/* RFC3168, section 6.1.1.1. ECN fallback */
 	if ((TCP_SKB_CB(skb)->tcp_flags & TCPHDR_SYN_ECN) == TCPHDR_SYN_ECN)
 		tcp_ecn_clear_syn(sk, skb);
 
-	tcp_retrans_try_collapse(sk, skb, cur_mss);
-
-	/* Make a copy, if the first transmission SKB clone we made
-	 * is still in somebody's hands, else make a clone.
-	 */
-
 	/* make sure skb->data is aligned on arches that require it
 	 * and check if ack-trimming & collapsing extended the headroom
 	 * beyond what csum_start can cover.
@@ -2633,20 +2630,22 @@ int __tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb)
 	}
 
 	if (likely(!err)) {
+		segs = tcp_skb_pcount(skb);
+
 		TCP_SKB_CB(skb)->sacked |= TCPCB_EVER_RETRANS;
 		/* Update global TCP statistics. */
-		TCP_INC_STATS(sock_net(sk), TCP_MIB_RETRANSSEGS);
+		TCP_ADD_STATS(sock_net(sk), TCP_MIB_RETRANSSEGS, segs);
 		if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_SYN)
 			NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPSYNRETRANS);
-		tp->total_retrans++;
+		tp->total_retrans += segs;
 	}
 	return err;
 }
 
-int tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb)
+int tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb, int segs)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
-	int err = __tcp_retransmit_skb(sk, skb);
+	int err = __tcp_retransmit_skb(sk, skb, segs);
 
 	if (err == 0) {
 #if FASTRETRANS_DEBUG > 0
@@ -2737,6 +2736,7 @@ void tcp_xmit_retransmit_queue(struct sock *sk)
 
 	tcp_for_write_queue_from(skb, sk) {
 		__u8 sacked = TCP_SKB_CB(skb)->sacked;
+		int segs;
 
 		if (skb == tcp_send_head(sk))
 			break;
@@ -2744,14 +2744,8 @@ void tcp_xmit_retransmit_queue(struct sock *sk)
 		if (!hole)
 			tp->retransmit_skb_hint = skb;
 
-		/* Assume this retransmit will generate
-		 * only one packet for congestion window
-		 * calculation purposes.  This works because
-		 * tcp_retransmit_skb() will chop up the
-		 * packet to be MSS sized and all the
-		 * packet counting works out.
-		 */
-		if (tcp_packets_in_flight(tp) >= tp->snd_cwnd)
+		segs = tp->snd_cwnd - tcp_packets_in_flight(tp);
+		if (segs <= 0)
 			return;
 
 		if (fwd_rexmitting) {
@@ -2788,7 +2782,7 @@ begin_fwd:
 		if (sacked & (TCPCB_SACKED_ACKED|TCPCB_SACKED_RETRANS))
 			continue;
 
-		if (tcp_retransmit_skb(sk, skb))
+		if (tcp_retransmit_skb(sk, skb, segs))
 			return;
 
 		NET_INC_STATS_BH(sock_net(sk), mib_idx);
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index 49bc474f8e35..373b03e78aaa 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -404,7 +404,7 @@ void tcp_retransmit_timer(struct sock *sk)
 			goto out;
 		}
 		tcp_enter_loss(sk);
-		tcp_retransmit_skb(sk, tcp_write_queue_head(sk));
+		tcp_retransmit_skb(sk, tcp_write_queue_head(sk), 1);
 		__sk_dst_reset(sk);
 		goto out_reset_timer;
 	}
@@ -436,7 +436,7 @@ void tcp_retransmit_timer(struct sock *sk)
 
 	tcp_enter_loss(sk);
 
-	if (tcp_retransmit_skb(sk, tcp_write_queue_head(sk)) > 0) {
+	if (tcp_retransmit_skb(sk, tcp_write_queue_head(sk), 1) > 0) {
 		/* Retransmission failed because of local congestion,
 		 * do not backoff.
 		 */
-- 
2.8.0.rc3.226.g39d4020

^ permalink raw reply related

* Re: [PATCHv3 net-next] net: use jiffies_to_msecs to replace EXPIRES_IN_MS in inet/sctp_diag
From: David Miller @ 2016-04-21 17:56 UTC (permalink / raw)
  To: lucien.xin
  Cc: netdev, linux-sctp, marcelo.leitner, vyasevich, daniel,
	eric.dumazet, jsitnick
In-Reply-To: <87a0e83132ce64a3fe2266a5323fa6577a2e5c96.1461049801.git.lucien.xin@gmail.com>

From: Xin Long <lucien.xin@gmail.com>
Date: Tue, 19 Apr 2016 15:10:01 +0800

> EXPIRES_IN_MS macro comes from net/ipv4/inet_diag.c and dates
> back to before jiffies_to_msecs() has been introduced.
> 
> Now we can remove it and use jiffies_to_msecs().
> 
> Signed-off-by: Xin Long <lucien.xin@gmail.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH net-next] NLA_BINARY misuse bug in HSR
From: David Miller @ 2016-04-21 17:59 UTC (permalink / raw)
  To: mail; +Cc: arvid.brodin, netdev, daniel, julia.lawall, peter.heise
In-Reply-To: <20160419113428.GA10532@aircraft-controller>

From: Peter Heise <mail@pheise.de>
Date: Tue, 19 Apr 2016 13:34:28 +0200

> Removed .type field from NLA to do proper length checking.
> Reported by Daniel Borkmann and Julia Lawall.
> 
> Signed-off-by: Peter Heise <peter.heise@airbus.com>

Applied.

^ permalink raw reply

* Re: [PATCH 1/2] [v4] net: emac: emac gigabit ethernet controller driver
From: Timur Tabi @ 2016-04-21 18:03 UTC (permalink / raw)
  To: Florian Fainelli, netdev, linux-kernel, devicetree, linux-arm-msm,
	sdharia, Shanker Donthineni, Greg Kroah-Hartman, vikrams, cov,
	gavidov, Rob Herring, andrew, bjorn.andersson, Mark Langsdorf,
	Jon Masters, Andy Gross, David S. Miller
In-Reply-To: <57100962.40404@gmail.com>

Florian Fainelli wrote:

> Well, PHYLIB does prefer using MDIO accesses to "speak" to PHYs,
> built-in or external, but there is always the option of investing into
> some custom development with the subsystem to make it play nicely with
> your HW.

So I've done some more research, and I believe that the internal phy is 
not a candidate for phylib, but the external phy (which is a real phy) 
might be.  There's no MDIO bus to the internal phy.

Does this mean that I will need to enable a PHY driver, and that driver 
will control the external phy?  If so, then does that mean that I would 
delete all to code in my driver that calls emac_phy_read() and 
emac_phy_write()?  For example, I wouldn't need emac_phy_link_check() 
any more?

>> The MDIO bus on these chips is not accessible as a separate entity.  It
>> is melded (for lack of a better word) into the EMAC itself.  That's why
>> there is a "qcom,no-external-phy" property.  You could, in theory, wire
>> the internal phy of one SOC directly to the internal phy of another SOC,
>> and use that as in interconnect between SOCs.  I don't know of any such
>> use-cases however.
>
> The fact the MDIO bus is built-into the MAC is really not a problem
> here, there are tons of drivers that deal with that just fine, yet, the
> DT binding needs to reflect that properly by having a sub-node of the
> Ethernet MAC which is a MDIO bus controller node. If external or
> internal PHYs are accessible through that MDIO bus, they also need to
> appear as child-nodes of that MDIO bus controller node.

Does the compatible property of the phy node (for the external phy) need 
to list the actual external phy?  That is, should it look like this:

	phy0: ethernet-phy@0 {
		compatible = "qcom,fsm9900-emac-phy";
		reg = <0>;
	}

or this:

	phy0: ethernet-phy@0 {
		compatible = "athr,whatever-phy";
		reg = <0>;
	}



> Can we just say that, an absence of PHY specified in the Device Tree (no
> phy-handle property and PHY not a child node of the MDIO bus), means
> that there is no external PHY?

Yes, that works.

>
> [snip]
>
>>> Do you need to maintain these flags when most, if not all of them
>>> already exist in dev->flags or dev->features?
>>
>> So you're saying that, for example, in emac_set_features() I should
>> remove this:
>>
>>      if (netdev->features & NETIF_F_HW_VLAN_CTAG_RX)
>>          set_bit(EMAC_STATUS_VLANSTRIP_EN, &adpt->status);
>>      else
>>          clear_bit(EMAC_STATUS_VLANSTRIP_EN, &adpt->status);
>>
>> and then in emac_mac_mode_config(), I should do this instead:
>>
>> void emac_mac_mode_config(struct emac_adapter *adpt)
>> {
>>      struct net_device *netdev = adpt->netdev;
>>
>>      if (netdev->features & NETIF_F_HW_VLAN_CTAG_RX)
>>          mac |= VLAN_STRIP;
>>      else
>>          mac &= ~VLAN_STRIP;
>>
>>
>> If so, then what do I do in emac_rx_mode_set()?  Should I delete this
>> entire block:
>>
>>      /* Check for Promiscuous and All Multicast modes */
>>      if (netdev->flags & IFF_PROMISC) {
>>          set_bit(EMAC_STATUS_PROMISC_EN, &adpt->status);
>>      } else if (netdev->flags & IFF_ALLMULTI) {
>>          set_bit(EMAC_STATUS_MULTIALL_EN, &adpt->status);
>>          clear_bit(EMAC_STATUS_PROMISC_EN, &adpt->status);
>>      } else {
>>          clear_bit(EMAC_STATUS_MULTIALL_EN, &adpt->status);
>>          clear_bit(EMAC_STATUS_PROMISC_EN, &adpt->status);
>>      }
>>
>> It does look like Gilad is just mirroring the flags/features variable
>> into adpt->status.  What I can't figure out is why.  It seems completely
>> redundant, but I have a nagging feeling that there is a good reason.
>
> Yes, I think your set_features and set_rx_mode functions would be
> greatly simplified, if each of them did take care of programming the HW
> immediately based on function arguments/flags. Unless absolutely
> required (e.g: suspend/resume, outside of the scope of the function
> etc..) having bookeeping variables is always something that can be out
> of sync, so better avoid them as much as possible.

Ok, I'll try to clean this up.

>> So I should move the netif_start_queue() to the end of this function?
>> Sorry if that's a stupid question, but I know little about the MAC side
>> of network drivers.
>
> That's fine, yes moving netif_start_queue() at the far end of the
> function is a good change.

Ok.

>>>> +/* Bring down the interface/HW */
>>>> +void emac_mac_down(struct emac_adapter *adpt, bool reset)
>>>> +{
>>>> +    struct net_device *netdev = adpt->netdev;
>>>> +    struct emac_phy *phy = &adpt->phy;
>>>> +    unsigned long flags;
>>>> +
>>>> +    set_bit(EMAC_STATUS_DOWN, &adpt->status);
>>>
>>> Do you need to maintain that? Would not netif_running() tell you what
>>> you want if you reflect the carrier state properly?
>>
>> I think that emac_work_thread_link_check() handles this.  It's a timer
>> thread that polls the link state and calls netif_carrier_off() if the
>> link is down.  Is that sufficient?
>>
>
> Probably, then again, with PHYLIB you have the option of either
> switching the PHY to interrupt mode (thsus saving the polling_), or it
> polls the PHY for link statuses every HZ.

I'll have to check and see if interrupt mode is even an option.  So 
phylib can do the polling for me?

>>> Since you have a producer index, you should consider checking
>>> skb->xmit_more to know whether you can update the register now, or
>>> later, which could save some expensive operation and batch TX.
>>
>> I'll have to figure out what means and get back to you.  When would
>> "later" be?
>
> After the driver gets accepted mainline for instance would seem fine.
> Considering how this seems to work, something like this is usally all
> that is needed:
>
> if (!skb->xmit_more || netif_xmit_stopped(txq)
> 	/* write producer index to get HW to transmit */

Oh, I thought you meant later in the code somewhere.  At a later date 
with another patch sounds great to me, though.

>>>> +irqreturn_t emac_isr(int _irq, void *data)
>>>> +{
>>>> +    struct emac_irq *irq = data;
>>>> +    struct emac_adapter *adpt = container_of(irq, struct
>>>> emac_adapter, irq);
>>>> +    struct emac_rx_queue *rx_q = &adpt->rx_q;
>>>> +
>>>> +    int max_ints = 1;
>>>> +    u32 isr, status;
>>>> +
>>>> +    /* disable the interrupt */
>>>> +    writel(0, adpt->base + EMAC_INT_MASK);
>>>> +
>>>> +    do {
>>>
>>> With max_ints = 1, this is essentially the same as no loop, so just
>>> inline it to reduce the indentation.
>>
>> In another internal version of this driver, max_ints is set to 5.  Could
>> this be some way of processing multiple packets in one interrupt?  Isn't
>> that something that NAPI already takes care of, anyway?
>
> Yes, NAPI is going to mitigate the cost of taking an interrupt and
> scheduling your bottom-half/soft IRQ for actual packet processing, it is
> the recommended way to mitigate the number of interrupts in the receive
> path (and transmit for that matter).

I'll clean up the code and remove max_ints.

>
>>
>>>> +        isr = readl_relaxed(adpt->base + EMAC_INT_STATUS);
>>>> +        status = isr & irq->mask;
>>>> +
>>>> +        if (status == 0)
>>>> +            break;
>>>> +
>>>> +        if (status & ISR_ERROR) {
>>>> +            netif_warn(adpt,  intr, adpt->netdev,
>>>> +                   "warning: error irq status 0x%lx\n",
>>>> +                   status & ISR_ERROR);
>>>> +            /* reset MAC */
>>>> +            set_bit(EMAC_STATUS_TASK_REINIT_REQ, &adpt->status);
>>>> +            emac_work_thread_reschedule(adpt);
>>>> +        }
>>>> +
>>>> +        /* Schedule the napi for receive queue with interrupt
>>>> +         * status bit set
>>>> +         */
>>>> +        if ((status & rx_q->intr)) {
>>>> +            if (napi_schedule_prep(&rx_q->napi)) {
>>>> +                irq->mask &= ~rx_q->intr;
>>>> +                __napi_schedule(&rx_q->napi);
>>>> +            }
>>>> +        }
>>>> +
>>>> +        if (status & TX_PKT_INT)
>>>> +            emac_mac_tx_process(adpt, &adpt->tx_q);
>>>
>>> You should consider using a NAPI instance for reclaiming TX buffers as
>>> well.
>>
>> I'll have to figure out what means and get back to you.
>
> drivers/net/ethernet/broadcom/bcmsysport.c is an example driver that
> reclaims transmitted buffers in NAPI. What that means is, take the TX
> completion interrupt, schedule a NAPI instance to run, and this NAPI
> instance cleans up the entire TX queue (it is not bounded, like the RX
> NAPI instance). It is really just moving the freeing of SKBs into
> softIRQ context vs. hardIRQ.

Thanks.  I don't think I'll get to any of the NAPI fixes in v5 of this 
driver.  I want to make sure I get the phylib conversion correct first.

>>>> +/* Configure VLAN tag strip/insert feature */
>>>> +static int emac_set_features(struct net_device *netdev,
>>>> +                 netdev_features_t features)
>>>> +{
>>>> +    struct emac_adapter *adpt = netdev_priv(netdev);
>>>> +
>>>> +    netdev_features_t changed = features ^ netdev->features;
>>>> +
>>>> +    if (!(changed & (NETIF_F_HW_VLAN_CTAG_TX |
>>>> NETIF_F_HW_VLAN_CTAG_RX)))
>>>> +        return 0;
>>>> +
>>>> +    netdev->features = features;
>>>> +    if (netdev->features & NETIF_F_HW_VLAN_CTAG_RX)
>>>> +        set_bit(EMAC_STATUS_VLANSTRIP_EN, &adpt->status);
>>>> +    else
>>>> +        clear_bit(EMAC_STATUS_VLANSTRIP_EN, &adpt->status);
>>>
>>> What about TX vlan offload?
>>
>> I don't know what that is.
>
> TX VLAN offload would be that you can specify the VLAN id somewhere in a
> packet's descriptor and have the HW automatically build an Ethernet
> frame with the correct VLAN id, and all the Ethernet frame payload
> appropriately placed at the correct offsets, with no cost for the CPU
> but indicating that information (and not having to do a memmove() to
> insert the 802.1Q tag).

I have no idea if our hardware supports that.  I'll make a note of TX 
VLAN offload and submit a separate patch if I can make it work.

>> and I've never understood why it's necessary to fall back to 32-bits if
>> 64 bits fails.  Isn't 64 bits a superset of 32 bits?  The driver is
>> saying that the hardware supports all of DDR.  How could fail, and how
>> could 32-bit succeed if 64-bits fails?
>
> I believe there could be cases where the HW is capable of addressing
> more physical memory than the CPU itself (usually unlikely, but it
> could),there could be cases where the HW is behind an IOMMMU which only
> has a window into the DDR, and that could prevent a higher DMA_BIT_MASK
> from being successfully configured.

So, so I'm going to add dma-ranges support (I posted another patch asked 
for feedback, but I haven't gotten it yet).

For ACPI, we're going to depend on IORT to set the DMA mask for us.

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora
Forum, a Linux Foundation collaborative project.

^ permalink raw reply

* Re: [PATCH net 9/9] net/mlx5e: Reset link modes upon setting speed to zero
From: David Miller @ 2016-04-21 18:04 UTC (permalink / raw)
  To: saeedm; +Cc: netdev, ogerlitz, talal, eranbe
In-Reply-To: <1461069222-27076-10-git-send-email-saeedm@mellanox.com>

From: Saeed Mahameed <saeedm@mellanox.com>
Date: Tue, 19 Apr 2016 15:33:42 +0300

> Upon ethtool request to set speed to 0 we handle it as a special request
> to reset link modes to Device's defaults.
> 
> Fixes: f62b8bb8f2d3 ("net/mlx5: Extend mlx5_core to support ConnectX-4
> Ethernet functionality")
> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

Please don't try to sneak things like this into the patches you submit.

If you continue to add weird stuff like this, I will never _ever_ be
able to trust you guys and have a high degree of confidence in your
changes.  If you continue like this, I will always have to audit your
patches very strictly which is very time consuming for me.

Do not extend ethtool's semantics in a way which suits you specifically.

If we want to have this semantic, you must first propose it as a
global semantic which then in turn can be adopted by all drivers
supporting ethtool.

Thank you.

^ permalink raw reply

* [PATCH net 1/1] qlcnic: Update version to 5.3.64
From: Manish Chopra @ 2016-04-21 17:25 UTC (permalink / raw)
  To: davem; +Cc: netdev, Dept-GELinuxNICDev

Just updating the version as many fixes got
accumulated over 5.3.63

Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
---
 drivers/net/ethernet/qlogic/qlcnic/qlcnic.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic.h b/drivers/net/ethernet/qlogic/qlcnic/qlcnic.h
index 55007f1..caf6ddb 100644
--- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic.h
+++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic.h
@@ -37,8 +37,8 @@
 
 #define _QLCNIC_LINUX_MAJOR 5
 #define _QLCNIC_LINUX_MINOR 3
-#define _QLCNIC_LINUX_SUBVERSION 63
-#define QLCNIC_LINUX_VERSIONID  "5.3.63"
+#define _QLCNIC_LINUX_SUBVERSION 64
+#define QLCNIC_LINUX_VERSIONID  "5.3.64"
 #define QLCNIC_DRV_IDC_VER  0x01
 #define QLCNIC_DRIVER_VERSION  ((_QLCNIC_LINUX_MAJOR << 16) |\
 		 (_QLCNIC_LINUX_MINOR << 8) | (_QLCNIC_LINUX_SUBVERSION))
-- 
2.7.2

^ permalink raw reply related

* Re: [PATCH net 9/9] net/mlx5e: Reset link modes upon setting speed to zero
From: David Miller @ 2016-04-21 18:07 UTC (permalink / raw)
  To: saeedm; +Cc: netdev, ogerlitz, talal, eranbe
In-Reply-To: <20160421.140454.143155848960923980.davem@davemloft.net>

From: David Miller <davem@davemloft.net>
Date: Thu, 21 Apr 2016 14:04:54 -0400 (EDT)

> From: Saeed Mahameed <saeedm@mellanox.com>
> Date: Tue, 19 Apr 2016 15:33:42 +0300
> 
>> Upon ethtool request to set speed to 0 we handle it as a special request
>> to reset link modes to Device's defaults.
>> 
>> Fixes: f62b8bb8f2d3 ("net/mlx5: Extend mlx5_core to support ConnectX-4
>> Ethernet functionality")
>> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
> 
> Please don't try to sneak things like this into the patches you submit.

Also, this patch series doesn't even compile:

[davem@dhcp-10-15-49-210 net]$ make -s -j16
Makefile:679: Cannot use CONFIG_KCOV: -fsanitize-coverage=trace-pc is not supported by compiler
  DESCEND  objtool
drivers/infiniband/hw/mlx5/main.c: In function ‘mlx5_query_hca_port’:
drivers/infiniband/hw/mlx5/main.c:721:32: error: passing argument 2 of ‘mlx5_query_port_max_mtu’ from incompatible pointer type [-Werror=incompatible-pointer-types]
  mlx5_query_port_max_mtu(mdev, &max_mtu, port);
                                ^
In file included from drivers/infiniband/hw/mlx5/main.c:45:0:
include/linux/mlx5/port.h:58:6: note: expected ‘u16 * {aka short unsigned int *}’ but argument is of type ‘int *’
 void mlx5_query_port_max_mtu(struct mlx5_core_dev *dev, u16 *max_mtu, u8 port);
      ^
drivers/infiniband/hw/mlx5/main.c:725:33: error: passing argument 2 of ‘mlx5_query_port_oper_mtu’ from incompatible pointer type [-Werror=incompatible-pointer-types]
  mlx5_query_port_oper_mtu(mdev, &oper_mtu, port);
                                 ^
In file included from drivers/infiniband/hw/mlx5/main.c:45:0:
include/linux/mlx5/port.h:59:6: note: expected ‘u16 * {aka short unsigned int *}’ but argument is of type ‘int *’
 void mlx5_query_port_oper_mtu(struct mlx5_core_dev *dev, u16 *oper_mtu,
      ^
cc1: some warnings being treated as errors
scripts/Makefile.build:291: recipe for target 'drivers/infiniband/hw/mlx5/main.o' failed
make[4]: *** [drivers/infiniband/hw/mlx5/main.o] Error 1

^ permalink raw reply

* Re: [patch -next] geneve: testing the wrong variable in geneve6_build_skb()
From: David Miller @ 2016-04-21 18:08 UTC (permalink / raw)
  To: dan.carpenter
  Cc: aduyck, jesse, linville, pshelar, jbenc, daniel, tom, netdev,
	kernel-janitors
In-Reply-To: <20160419143056.GD4876@mwanda>

From: Dan Carpenter <dan.carpenter@oracle.com>
Date: Tue, 19 Apr 2016 17:30:56 +0300

> We intended to test "err" and not "skb".
> 
> Fixes: aed069df099c ('ip_tunnel_core: iptunnel_handle_offloads returns int and doesn't free skb')
> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>

Applied, thanks Dan.

^ permalink raw reply

* Re: linux-next: zillions of lockdep whinges in include/net/sock.h:1408
From: Hannes Frederic Sowa @ 2016-04-21 18:10 UTC (permalink / raw)
  To: Eric Dumazet, Valdis.Kletnieks; +Cc: David S. Miller, netdev, linux-kernel
In-Reply-To: <1461245496.7627.17.camel@edumazet-glaptop3.roam.corp.google.com>

On 21.04.2016 15:31, Eric Dumazet wrote:
> On Thu, 2016-04-21 at 05:05 -0400, Valdis.Kletnieks@vt.edu wrote:
>> On Thu, 21 Apr 2016 09:42:12 +0200, Hannes Frederic Sowa said:
>>> Hi,
>>>
>>> On Thu, Apr 21, 2016, at 02:30, Valdis Kletnieks wrote:
>>>> linux-next 20160420 is whining at an incredible rate - in 20 minutes of
>>>> uptime, I piled up some 41,000 hits from all over the place (cleaned up
>>>> to skip the CPU and PID so the list isn't quite so long):
>>>
>>> Thanks for the report. Can you give me some more details:
>>>
>>> Is this an nfs socket? Do you by accident know if this socket went
>>> through xs_reclassify_socket at any point? We do hold the appropriate
>>> locks at that point but I fear that the lockdep reinitialization
>>> confused lockdep.
>>
>> It wasn't an NFS socket, as NFS wasn't even active at the time.  I'm reasonably
>> sure that multiple sockets were in play, given that tcp_v6_rcv and
>> udpv6_queue_rcv_skb were both implicated.  I strongly suspect that pretty much
>> any IPv6 traffic could do it - the frequency dropped off quite a bit when I
>> closed firefox, which is usually a heavy network hitter on my laptop.
> 
> 
> Looks like the following patch is needed, can you try it please ?
> 
> Thanks !
> 
> diff --git a/include/net/sock.h b/include/net/sock.h
> index d997ec13a643..db8301c76d50 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -1350,7 +1350,8 @@ static inline bool lockdep_sock_is_held(const struct sock *csk)
>  {
>  	struct sock *sk = (struct sock *)csk;
>  
> -	return lockdep_is_held(&sk->sk_lock) ||
> +	return !debug_locks ||
> +	       lockdep_is_held(&sk->sk_lock) ||
>  	       lockdep_is_held(&sk->sk_lock.slock);
>  }
>  #endif

I am a little bit lost because I cannot reproduce this bug. I thought
maybe it has something to do with single cpu spin_locks which don't
update the lockdep_maps but I couldn't reproduce it. If debug_locks get
flipped you should see something in dmesg, too. Maybe you have this
handy? Was there another lockdep splat before the networking ones? Also
the config would be helpful.

Thanks,
Hannes

^ permalink raw reply

* Re: [PATCH] ixgbevf: Fix relaxed order settings in VF driver
From: Alexander Duyck @ 2016-04-21 18:13 UTC (permalink / raw)
  To: Babu Moger
  Cc: Jeff Kirsher, Brandeburg, Jesse, shannon nelson, Carolyn Wyborny,
	Skidmore, Donald C, Bruce W Allan, John Ronciak, Mitch Williams,
	intel-wired-lan, Netdev, linux-kernel@vger.kernel.org,
	Sowmini Varadhan
In-Reply-To: <1461259276-54151-1-git-send-email-babu.moger@oracle.com>

On Thu, Apr 21, 2016 at 10:21 AM, Babu Moger <babu.moger@oracle.com> wrote:
> Current code writes the tx/rx relaxed order without reading it first.
> This can lead to unintended consequences as we are forcibly writing
> other bits.

The consequences were very much intended as there are situations where
enabling relaxed ordering can lead to data corruption.

> We noticed this problem while testing VF driver on sparc. Relaxed
> order settings for rx queue were all messed up which was causing
> performance drop with VF interface.

What additional relaxed ordering bits are you enabling on Sparc?  I'm
assuming it is just the Rx data write back but I want to verify.

> Fixed it by reading the registers first and setting the specific
> bit of interest. With this change we are able to match the bandwidth
> equivalent to PF interface.
>
> Signed-off-by: Babu Moger <babu.moger@oracle.com>

Fixed is a relative term here since you are only chasing performance
from what I can tell.  We need to make certain that this doesn't break
the driver on any other architectures by leading to things like data
corruption.

- Alex

^ permalink raw reply

* Re: [net-next PATCH 1/2] netdev_features: Fold NETIF_F_ALL_TSO into NETIF_F_GSO_SOFTWARE
From: David Miller @ 2016-04-21 18:15 UTC (permalink / raw)
  To: aduyck; +Cc: netdev, alexander.duyck
In-Reply-To: <20160419180219.11003.78651.stgit@ahduyck-xeon-server>

From: Alexander Duyck <aduyck@mirantis.com>
Date: Tue, 19 Apr 2016 14:02:19 -0400

> This patch folds NETIF_F_ALL_TSO into the bitmask for NETIF_F_GSO_SOFTWARE.
> The idea is to avoid duplication of defines since the only difference
> between the two was the GSO_UDP bit.
> 
> Signed-off-by: Alexander Duyck <aduyck@mirantis.com>

Applied.

^ permalink raw reply

* Re: [net-next PATCH 2/2] veth: Update features to include all tunnel GSO types
From: David Miller @ 2016-04-21 18:15 UTC (permalink / raw)
  To: aduyck; +Cc: netdev, alexander.duyck
In-Reply-To: <20160419180226.11003.43326.stgit@ahduyck-xeon-server>

From: Alexander Duyck <aduyck@mirantis.com>
Date: Tue, 19 Apr 2016 14:02:26 -0400

> This patch adds support for the checksum enabled versions of UDP and GRE
> tunnels.  With this change we should be able to send and receive GSO frames
> of these types over the veth pair without needing to segment the packets.
> 
> Signed-off-by: Alexander Duyck <aduyck@mirantis.com>

Applied.

^ permalink raw reply

* [PATCH net v2 1/3] drivers: net: cpsw: fix parsing of phy-handle DT property in dual_emac config
From: David Rivshin (Allworx) @ 2016-04-21 18:19 UTC (permalink / raw)
  To: netdev, linux-omap
  Cc: linux-arm-kernel, devicetree, linux-kernel, David Miller,
	Mugunthan V N, Grygorii Strashko, Andrew Goodbody, Markus Brunner,
	Nicolas Chauvet
In-Reply-To: <1461261035-5578-1-git-send-email-drivshin.allworx@gmail.com>

From: David Rivshin <drivshin@allworx.com>

Commit 9e42f715264ff158478fa30eaed847f6e131366b ("drivers: net: cpsw: add
phy-handle parsing") saved the "phy-handle" phandle into a new cpsw_priv
field. However, phy connections are per-slave, so the phy_node field should
be in cpsw_slave_data rather than cpsw_priv.

This would go unnoticed in a single emac configuration. But in dual_emac
mode, the last "phy-handle" property parsed for either slave would be used
by both of them, causing them both to refer to the same phy_device.

Fixes: 9e42f715264f ("drivers: net: cpsw: add phy-handle parsing")
Signed-off-by: David Rivshin <drivshin@allworx.com>
Tested-by: Nicolas Chauvet <kwizart@gmail.com>
---
I would suggest this for -stable. It should apply cleanly as far back
as 4.4.

Changes since v1 [1]:
- Rebased (no conflicts)
- Added Tested-by from Nicolas Chauvet

[1] https://patchwork.ozlabs.org/patch/560326/

 drivers/net/ethernet/ti/cpsw.c | 13 ++++++-------
 drivers/net/ethernet/ti/cpsw.h |  1 +
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 42fdfd4..d69cb3f 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -363,15 +363,14 @@ static inline void slave_write(struct cpsw_slave *slave, u32 val, u32 offset)
 	__raw_writel(val, slave->regs + offset);
 }
 
 struct cpsw_priv {
 	spinlock_t			lock;
 	struct platform_device		*pdev;
 	struct net_device		*ndev;
-	struct device_node		*phy_node;
 	struct napi_struct		napi_rx;
 	struct napi_struct		napi_tx;
 	struct device			*dev;
 	struct cpsw_platform_data	data;
 	struct cpsw_ss_regs __iomem	*regs;
 	struct cpsw_wr_regs __iomem	*wr_regs;
 	u8 __iomem			*hw_stats;
@@ -1144,16 +1143,16 @@ static void cpsw_slave_open(struct cpsw_slave *slave, struct cpsw_priv *priv)
 
 	if (priv->data.dual_emac)
 		cpsw_add_dual_emac_def_ale_entries(priv, slave, slave_port);
 	else
 		cpsw_ale_add_mcast(priv->ale, priv->ndev->broadcast,
 				   1 << slave_port, 0, 0, ALE_MCAST_FWD_2);
 
-	if (priv->phy_node)
-		slave->phy = of_phy_connect(priv->ndev, priv->phy_node,
+	if (slave->data->phy_node)
+		slave->phy = of_phy_connect(priv->ndev, slave->data->phy_node,
 				 &cpsw_adjust_link, 0, slave->data->phy_if);
 	else
 		slave->phy = phy_connect(priv->ndev, slave->data->phy_id,
 				 &cpsw_adjust_link, slave->data->phy_if);
 	if (IS_ERR(slave->phy)) {
 		dev_err(priv->dev, "phy %s not found on slave %d\n",
 			slave->data->phy_id, slave->slave_num);
@@ -1936,20 +1935,19 @@ static void cpsw_slave_init(struct cpsw_slave *slave, struct cpsw_priv *priv,
 
 	slave->data	= data;
 	slave->regs	= regs + slave_reg_ofs;
 	slave->sliver	= regs + sliver_reg_ofs;
 	slave->port_vlan = data->dual_emac_res_vlan;
 }
 
-static int cpsw_probe_dt(struct cpsw_priv *priv,
+static int cpsw_probe_dt(struct cpsw_platform_data *data,
 			 struct platform_device *pdev)
 {
 	struct device_node *node = pdev->dev.of_node;
 	struct device_node *slave_node;
-	struct cpsw_platform_data *data = &priv->data;
 	int i = 0, ret;
 	u32 prop;
 
 	if (!node)
 		return -EINVAL;
 
 	if (of_property_read_u32(node, "slaves", &prop)) {
@@ -2029,15 +2027,16 @@ static int cpsw_probe_dt(struct cpsw_priv *priv,
 		int lenp;
 		const __be32 *parp;
 
 		/* This is no slave child node, continue */
 		if (strcmp(slave_node->name, "slave"))
 			continue;
 
-		priv->phy_node = of_parse_phandle(slave_node, "phy-handle", 0);
+		slave_data->phy_node = of_parse_phandle(slave_node,
+							"phy-handle", 0);
 		parp = of_get_property(slave_node, "phy_id", &lenp);
 		if (of_phy_is_fixed_link(slave_node)) {
 			struct device_node *phy_node;
 			struct phy_device *phy_dev;
 
 			/* In the case of a fixed PHY, the DT node associated
 			 * to the PHY is the Ethernet MAC DT node.
@@ -2271,15 +2270,15 @@ static int cpsw_probe(struct platform_device *pdev)
 	 * This may be required here for child devices.
 	 */
 	pm_runtime_enable(&pdev->dev);
 
 	/* Select default pin state */
 	pinctrl_pm_select_default_state(&pdev->dev);
 
-	if (cpsw_probe_dt(priv, pdev)) {
+	if (cpsw_probe_dt(&priv->data, pdev)) {
 		dev_err(&pdev->dev, "cpsw: platform data missing\n");
 		ret = -ENODEV;
 		goto clean_runtime_disable_ret;
 	}
 	data = &priv->data;
 
 	if (is_valid_ether_addr(data->slave_data[0].mac_addr)) {
diff --git a/drivers/net/ethernet/ti/cpsw.h b/drivers/net/ethernet/ti/cpsw.h
index 442a703..e50afd1 100644
--- a/drivers/net/ethernet/ti/cpsw.h
+++ b/drivers/net/ethernet/ti/cpsw.h
@@ -14,14 +14,15 @@
 #ifndef __CPSW_H__
 #define __CPSW_H__
 
 #include <linux/if_ether.h>
 #include <linux/phy.h>
 
 struct cpsw_slave_data {
+	struct device_node *phy_node;
 	char		phy_id[MII_BUS_ID_SIZE];
 	int		phy_if;
 	u8		mac_addr[ETH_ALEN];
 	u16		dual_emac_res_vlan;	/* Reserved VLAN for DualEMAC */
 };
 
 struct cpsw_platform_data {
-- 
2.5.5

^ permalink raw reply related

* Re: [PATCH net-next v2 0/4] libnl: enhance API to ease 64bit alignment for attribute
From: David Miller @ 2016-04-21 18:28 UTC (permalink / raw)
  To: nicolas.dichtel; +Cc: netdev, roopa, eric.dumazet, tgraf, jhs
In-Reply-To: <1461257907-4458-1-git-send-email-nicolas.dichtel@6wind.com>

From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Date: Thu, 21 Apr 2016 18:58:23 +0200

> Here is a proposal to add more helpers in the libnetlink to manage 64-bit
> alignment issues.
> Note that this series was only tested on x86 by tweeking
> CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS and adding some traces.
> 
> The first patch adds helpers for 64bit alignment and other patches
> use them.
> 
> We could also add helpers for nla_put_u64() and its variants if needed.
> 
> v1 -> v2:
>  - remove patch #1
>  - split patch #2 (now #1 and #2)
>  - add nla_need_padding_for_64bit()

I like it, nice work Nicolas.

Applied to net-next.

I did a quick scan and the following jumped out at me as cases we need
to fix up as well:

1) xfrm_user
2) tcp_info
3) taskstats
4) pkt_{cls,sched}
5) openvswitch
etc.

Most of these are statistic cases just like all of the existing ones
we have fixed so far.

^ permalink raw reply

* [PATCH net v2 2/3] drivers: net: cpsw: fix error messages when using phy-handle DT property
From: David Rivshin (Allworx) @ 2016-04-21 18:26 UTC (permalink / raw)
  To: netdev, linux-omap
  Cc: linux-arm-kernel, devicetree, linux-kernel, David Miller,
	Mugunthan V N, Grygorii Strashko, Andrew Goodbody, Markus Brunner,
	Nicolas Chauvet
In-Reply-To: <1461261035-5578-1-git-send-email-drivshin.allworx@gmail.com>

From: David Rivshin <drivshin@allworx.com>

The phy-handle, phy_id, and fixed-link properties are mutually exclusive,
and only one need be specified. However if phy-handle was specified, an
error message would complain about the lack of phy_id or fixed-link.

Also, if phy-handle was specified and the subsequent of_phy_connect()
failed, the error message still referenced slaved->data->phy_id, which
would be empty. Instead, use the name of the device_node as a useful
identifier.

Fixes: 9e42f715264f ("drivers: net: cpsw: add phy-handle parsing")
Signed-off-by: David Rivshin <drivshin@allworx.com>
Acked-by: Rob Herring <robh@kernel.org>
Tested-by: Nicolas Chauvet <kwizart@gmail.com>
---
If would like this for -stable it should apply cleanly as far back
as 4.5. It failes on 4.4 due to some context differences, but can be
applied with 'git am -C2'. Or, I can produce a separate patch against
linux-4.4.y if preferred.

Changes since v1 [1]:
- Rebased (no conflicts)
- Added Tested-by from Nicolas Chauvet
- Added Acked-by from Rob Herring for the binding change

[1] https://patchwork.ozlabs.org/patch/560324/

 Documentation/devicetree/bindings/net/cpsw.txt |  4 ++--
 drivers/net/ethernet/ti/cpsw.c                 | 17 +++++++++++++----
 2 files changed, 15 insertions(+), 6 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/cpsw.txt b/Documentation/devicetree/bindings/net/cpsw.txt
index 28a4781..3033c0f 100644
--- a/Documentation/devicetree/bindings/net/cpsw.txt
+++ b/Documentation/devicetree/bindings/net/cpsw.txt
@@ -46,16 +46,16 @@ Optional properties:
 - dual_emac_res_vlan	: Specifies VID to be used to segregate the ports
 - mac-address		: See ethernet.txt file in the same directory
 - phy_id		: Specifies slave phy id
 - phy-handle		: See ethernet.txt file in the same directory
 
 Slave sub-nodes:
 - fixed-link		: See fixed-link.txt file in the same directory
-			  Either the property phy_id, or the sub-node
-			  fixed-link can be specified
+
+Note: Exactly one of phy_id, phy-handle, or fixed-link must be specified.
 
 Note: "ti,hwmods" field is used to fetch the base address and irq
 resources from TI, omap hwmod data base during device registration.
 Future plan is to migrate hwmod data base contents into device tree
 blob so that, all the required data will be used from device tree dts
 file.
 
diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index d69cb3f..3c81413 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -1150,16 +1150,19 @@ static void cpsw_slave_open(struct cpsw_slave *slave, struct cpsw_priv *priv)
 	if (slave->data->phy_node)
 		slave->phy = of_phy_connect(priv->ndev, slave->data->phy_node,
 				 &cpsw_adjust_link, 0, slave->data->phy_if);
 	else
 		slave->phy = phy_connect(priv->ndev, slave->data->phy_id,
 				 &cpsw_adjust_link, slave->data->phy_if);
 	if (IS_ERR(slave->phy)) {
-		dev_err(priv->dev, "phy %s not found on slave %d\n",
-			slave->data->phy_id, slave->slave_num);
+		dev_err(priv->dev, "phy \"%s\" not found on slave %d\n",
+			slave->data->phy_node ?
+				slave->data->phy_node->full_name :
+				slave->data->phy_id,
+			slave->slave_num);
 		slave->phy = NULL;
 	} else {
 		phy_attached_info(slave->phy);
 
 		phy_start(slave->phy);
 
 		/* Configure GMII_SEL register */
@@ -2030,15 +2033,19 @@ static int cpsw_probe_dt(struct cpsw_platform_data *data,
 		/* This is no slave child node, continue */
 		if (strcmp(slave_node->name, "slave"))
 			continue;
 
 		slave_data->phy_node = of_parse_phandle(slave_node,
 							"phy-handle", 0);
 		parp = of_get_property(slave_node, "phy_id", &lenp);
-		if (of_phy_is_fixed_link(slave_node)) {
+		if (slave_data->phy_node) {
+			dev_dbg(&pdev->dev,
+				"slave[%d] using phy-handle=\"%s\"\n",
+				i, slave_data->phy_node->full_name);
+		} else if (of_phy_is_fixed_link(slave_node)) {
 			struct device_node *phy_node;
 			struct phy_device *phy_dev;
 
 			/* In the case of a fixed PHY, the DT node associated
 			 * to the PHY is the Ethernet MAC DT node.
 			 */
 			ret = of_phy_register_fixed_link(slave_node);
@@ -2067,15 +2074,17 @@ static int cpsw_probe_dt(struct cpsw_platform_data *data,
 			if (!mdio) {
 				dev_err(&pdev->dev, "Missing mdio platform device\n");
 				return -EINVAL;
 			}
 			snprintf(slave_data->phy_id, sizeof(slave_data->phy_id),
 				 PHY_ID_FMT, mdio->name, phyid);
 		} else {
-			dev_err(&pdev->dev, "No slave[%d] phy_id or fixed-link property\n", i);
+			dev_err(&pdev->dev,
+				"No slave[%d] phy_id, phy-handle, or fixed-link property\n",
+				i);
 			goto no_phy_slave;
 		}
 		slave_data->phy_if = of_get_phy_mode(slave_node);
 		if (slave_data->phy_if < 0) {
 			dev_err(&pdev->dev, "Missing or malformed slave[%d] phy-mode property\n",
 				i);
 			return slave_data->phy_if;
-- 
2.5.5

^ permalink raw reply related

* Re: [PATCH] drivers: net: cpsw: fix wrong regs access in cpsw_ndo_open
From: David Miller @ 2016-04-21 18:31 UTC (permalink / raw)
  To: grygorii.strashko; +Cc: netdev, mugunthanvnm, nsekhar, linux-kernel, linux-omap
In-Reply-To: <1461089389-17508-1-git-send-email-grygorii.strashko@ti.com>

From: Grygorii Strashko <grygorii.strashko@ti.com>
Date: Tue, 19 Apr 2016 21:09:49 +0300

> The cpsw_ndo_open() could try to access CPSW registers before
> calling pm_runtime_get_sync(). This will trigger L3 error:
> 
>  WARNING: CPU: 0 PID: 21 at drivers/bus/omap_l3_noc.c:147 l3_interrupt_handler+0x220/0x34c()
>  44000000.ocp:L3 Custom Error: MASTER M2 (64-bit) TARGET L4_FAST (Idle): Data Access in Supervisor mode during Functional access
> 
> and CPSW will stop functioning.
> 
> Hence, fix it by moving pm_runtime_get_sync() before the first access
> to CPSW registers in cpsw_ndo_open().
> 
> Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>

Applied, thanks.

^ permalink raw reply

* Re: [PATCH net-next] macvlan: fix failure during registration v2
From: Francesco Ruggeri @ 2016-04-21 18:39 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: netdev, David S. Miller, Mahesh Bandewar
In-Reply-To: <8737qe6en1.fsf@x220.int.ebiederm.org>

On Thu, Apr 21, 2016 at 10:44 AM, Eric W. Biederman
<ebiederm@xmission.com> wrote:
<
>> diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
>> index 95394ed..e770221 100644
>> --- a/drivers/net/macvtap.c
>> +++ b/drivers/net/macvtap.c
>> @@ -1303,6 +1303,8 @@ static int macvtap_device_event(struct notifier_block *unused,
>>               }
>>               break;
>>       case NETDEV_UNREGISTER:
>> +             if (vlan->minor == 0)
>> +                     break;
>
> I don't understand this bit.  A minor of 0 is never assigned.  That is
> clear from the code.  On what code path can you get here without
> assigning a minor?
>

You can have vlan->minor == 0 if macvtap_device_event(NETDEV_REGISTER)
failed.

macvtap_device_event is invoked in the context of macvtap_newlink and
it can fail if for example a macvtap interface using the same ifindex
already exists in a different namespace. That is how we originally ran
into the port->count issue.
In that case the sequence is

macvtap_newlink
  macvlan_common_newlink
    register_netdevice
      call_netdevice_notifiers(NETDEV_REGISTER, dev)
        macvtap_device_event(NETDEV_REGISTER)
          <fail here, vlan->minor = 0>
      rollback_registered(dev);
        rollback_registered_many
          call_netdevice_notifiers(NETDEV_UNREGISTER, dev);
            macvtap_device_event(NETDEV_UNREGISTER)
              <nothing to do here>

Should this bit go into a separate patch?
Would a comment like this help:

/* We can have vlan->minor == 0 if NETDEV_REGISTER above failed */

Marc Angel posted https://marc.info/?l=linux-netdev&m=146116146925511&w=2
about conflicts if one tries to create macvtap interfaces with the same
ifindex in different namespaces.

Thanks,
Francesco

>
> Eric

^ permalink raw reply

* [PATCH net v2 3/3] drivers: net: cpsw: use of_phy_connect() in fixed-link case
From: David Rivshin (Allworx) @ 2016-04-21 18:41 UTC (permalink / raw)
  To: netdev-u79uwXL29TY76Z2rM5mHXA, linux-omap-u79uwXL29TY76Z2rM5mHXA
  Cc: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, David Miller, Mugunthan V N,
	Grygorii Strashko, Andrew Goodbody, Markus Brunner,
	Nicolas Chauvet
In-Reply-To: <1461261035-5578-1-git-send-email-drivshin.allworx-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>

From: David Rivshin <drivshin-5fOYsn7Fw8lBDgjK7y7TUQ@public.gmane.org>

If a fixed-link DT subnode is used, the phy_device was looked up so
that a PHY ID string could be constructed and passed to phy_connect().
This is not necessary, as the device_node can be passed directly to
of_phy_connect() instead. This reuses the same codepath as if the
phy-handle DT property was used.

Signed-off-by: David Rivshin <drivshin-5fOYsn7Fw8lBDgjK7y7TUQ@public.gmane.org>
Tested-by: Nicolas Chauvet <kwizart-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
---

Changes since v1 [1]:
- Rebased (trivial conflict, e5a03bfd modified the deleted snprintf)
- Added Tested-by from Nicolas Chauvet

[1] https://patchwork.ozlabs.org/patch/560327/


 drivers/net/ethernet/ti/cpsw.c | 11 +----------
 1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 3c81413..245f919 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -2038,30 +2038,21 @@ static int cpsw_probe_dt(struct cpsw_platform_data *data,
 							"phy-handle", 0);
 		parp = of_get_property(slave_node, "phy_id", &lenp);
 		if (slave_data->phy_node) {
 			dev_dbg(&pdev->dev,
 				"slave[%d] using phy-handle=\"%s\"\n",
 				i, slave_data->phy_node->full_name);
 		} else if (of_phy_is_fixed_link(slave_node)) {
-			struct device_node *phy_node;
-			struct phy_device *phy_dev;
-
 			/* In the case of a fixed PHY, the DT node associated
 			 * to the PHY is the Ethernet MAC DT node.
 			 */
 			ret = of_phy_register_fixed_link(slave_node);
 			if (ret)
 				return ret;
-			phy_node = of_node_get(slave_node);
-			phy_dev = of_phy_find_device(phy_node);
-			if (!phy_dev)
-				return -ENODEV;
-			snprintf(slave_data->phy_id, sizeof(slave_data->phy_id),
-				 PHY_ID_FMT, phy_dev->mdio.bus->id,
-				 phy_dev->mdio.addr);
+			slave_data->phy_node = of_node_get(slave_node);
 		} else if (parp) {
 			u32 phyid;
 			struct device_node *mdio_node;
 			struct platform_device *mdio;
 
 			if (lenp != (sizeof(__be32) * 2)) {
 				dev_err(&pdev->dev, "Invalid slave[%d] phy_id property\n", i);
-- 
2.5.5

--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* Re: [PATCH net 0/2] tcp: Merge timestamp info when coalescing skbs
From: David Miller @ 2016-04-21 18:41 UTC (permalink / raw)
  To: kafai; +Cc: netdev, edumazet, ncardwell, soheil, willemb, ycheng, kernel-team
In-Reply-To: <1461130769-1442865-1-git-send-email-kafai@fb.com>

From: Martin KaFai Lau <kafai@fb.com>
Date: Tue, 19 Apr 2016 22:39:27 -0700

> This series is separated from the RFC series related to
> tcp_sendmsg(MSG_EOR) and it is targeting for the net branch.
> This patchset is focusing on fixing cases where TCP
> timestamp could be lost after coalescing skbs.
> 
> A BPF prog is used to kprobe to sock_queue_err_skb()
> and print out the value of serr->ee.ee_data.  The BPF
> prog (run-able from bcc) is attached here:
 ...

Series applied, thank you.

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox