Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCHv3 net-next] net: use jiffies_to_msecs to replace EXPIRES_IN_MS in inet/sctp_diag
From: David Miller @ 2016-04-21 17:56 UTC (permalink / raw)
  To: lucien.xin
  Cc: netdev, linux-sctp, marcelo.leitner, vyasevich, daniel,
	eric.dumazet, jsitnick
In-Reply-To: <87a0e83132ce64a3fe2266a5323fa6577a2e5c96.1461049801.git.lucien.xin@gmail.com>

From: Xin Long <lucien.xin@gmail.com>
Date: Tue, 19 Apr 2016 15:10:01 +0800

> EXPIRES_IN_MS macro comes from net/ipv4/inet_diag.c and dates
> back to before jiffies_to_msecs() has been introduced.
> 
> Now we can remove it and use jiffies_to_msecs().
> 
> Signed-off-by: Xin Long <lucien.xin@gmail.com>

Applied, thanks.

^ permalink raw reply

* [PATCH v2 net-next] tcp-tso: do not split TSO packets at retransmit time
From: Eric Dumazet @ 2016-04-21 17:55 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Neal Cardwell, Yuchung Cheng, Eric Dumazet, Eric Dumazet

Linux TCP stack painfully segments all TSO/GSO packets before retransmits.

This was fine back in the days when TSO/GSO were emerging, with their
bugs, but we believe the dark age is over.

Keeping big packets in write queues, but also in stack traversal
has a lot of benefits.
 - Less memory overhead, because write queues have less skbs
 - Less cpu overhead at ACK processing.
 - Better SACK processing, as lot of studies mentioned how
   awful linux was at this ;)
 - Less cpu overhead to send the rtx packets
   (IP stack traversal, netfilter traversal, drivers...)
 - Better latencies in presence of losses.
 - Smaller spikes in fq like packet schedulers, as retransmits
   are not constrained by TCP Small Queues.

1 % packet losses are common today, and at 100Gbit speeds, this
translates to ~80,000 losses per second.
Losses are often correlated, and we see many retransmit events
leading to 1-MSS train of packets, at the time hosts are already
under stress.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
---
 include/net/tcp.h     |  4 ++--
 net/ipv4/tcp_input.c  |  2 +-
 net/ipv4/tcp_output.c | 64 +++++++++++++++++++++++----------------------------
 net/ipv4/tcp_timer.c  |  4 ++--
 4 files changed, 34 insertions(+), 40 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index fd40f8c64d5f..0dc272dcd772 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -538,8 +538,8 @@ __u32 cookie_v6_init_sequence(const struct sk_buff *skb, __u16 *mss);
 void __tcp_push_pending_frames(struct sock *sk, unsigned int cur_mss,
 			       int nonagle);
 bool tcp_may_send_now(struct sock *sk);
-int __tcp_retransmit_skb(struct sock *, struct sk_buff *);
-int tcp_retransmit_skb(struct sock *, struct sk_buff *);
+int __tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb, int segs);
+int tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb, int segs);
 void tcp_retransmit_timer(struct sock *sk);
 void tcp_xmit_retransmit_queue(struct sock *);
 void tcp_simple_retransmit(struct sock *);
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 90e0d9256b74..729e489b5608 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -5543,7 +5543,7 @@ static bool tcp_rcv_fastopen_synack(struct sock *sk, struct sk_buff *synack,
 	if (data) { /* Retransmit unacked data in SYN */
 		tcp_for_write_queue_from(data, sk) {
 			if (data == tcp_send_head(sk) ||
-			    __tcp_retransmit_skb(sk, data))
+			    __tcp_retransmit_skb(sk, data, 1))
 				break;
 		}
 		tcp_rearm_rto(sk);
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 6451b83d81e9..4876b256a70a 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2266,7 +2266,7 @@ void tcp_send_loss_probe(struct sock *sk)
 	if (WARN_ON(!skb || !tcp_skb_pcount(skb)))
 		goto rearm_timer;
 
-	if (__tcp_retransmit_skb(sk, skb))
+	if (__tcp_retransmit_skb(sk, skb, 1))
 		goto rearm_timer;
 
 	/* Record snd_nxt for loss detection. */
@@ -2551,17 +2551,17 @@ static void tcp_retrans_try_collapse(struct sock *sk, struct sk_buff *to,
  * state updates are done by the caller.  Returns non-zero if an
  * error occurred which prevented the send.
  */
-int __tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb)
+int __tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb, int segs)
 {
-	struct tcp_sock *tp = tcp_sk(sk);
 	struct inet_connection_sock *icsk = inet_csk(sk);
+	struct tcp_sock *tp = tcp_sk(sk);
 	unsigned int cur_mss;
-	int err;
+	int diff, len, err;
+
 
-	/* Inconslusive MTU probe */
-	if (icsk->icsk_mtup.probe_size) {
+	/* Inconclusive MTU probe */
+	if (icsk->icsk_mtup.probe_size)
 		icsk->icsk_mtup.probe_size = 0;
-	}
 
 	/* Do not sent more than we queued. 1/4 is reserved for possible
 	 * copying overhead: fragmentation, tunneling, mangling etc.
@@ -2594,30 +2594,27 @@ int __tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb)
 	    TCP_SKB_CB(skb)->seq != tp->snd_una)
 		return -EAGAIN;
 
-	if (skb->len > cur_mss) {
-		if (tcp_fragment(sk, skb, cur_mss, cur_mss, GFP_ATOMIC))
+	len = cur_mss * segs;
+	if (skb->len > len) {
+		if (tcp_fragment(sk, skb, len, cur_mss, GFP_ATOMIC))
 			return -ENOMEM; /* We'll try again later. */
 	} else {
-		int oldpcount = tcp_skb_pcount(skb);
+		if (skb_unclone(skb, GFP_ATOMIC))
+			return -ENOMEM;
 
-		if (unlikely(oldpcount > 1)) {
-			if (skb_unclone(skb, GFP_ATOMIC))
-				return -ENOMEM;
-			tcp_init_tso_segs(skb, cur_mss);
-			tcp_adjust_pcount(sk, skb, oldpcount - tcp_skb_pcount(skb));
-		}
+		diff = tcp_skb_pcount(skb);
+		tcp_set_skb_tso_segs(skb, cur_mss);
+		diff -= tcp_skb_pcount(skb);
+		if (diff)
+			tcp_adjust_pcount(sk, skb, diff);
+		if (skb->len < cur_mss)
+			tcp_retrans_try_collapse(sk, skb, cur_mss);
 	}
 
 	/* RFC3168, section 6.1.1.1. ECN fallback */
 	if ((TCP_SKB_CB(skb)->tcp_flags & TCPHDR_SYN_ECN) == TCPHDR_SYN_ECN)
 		tcp_ecn_clear_syn(sk, skb);
 
-	tcp_retrans_try_collapse(sk, skb, cur_mss);
-
-	/* Make a copy, if the first transmission SKB clone we made
-	 * is still in somebody's hands, else make a clone.
-	 */
-
 	/* make sure skb->data is aligned on arches that require it
 	 * and check if ack-trimming & collapsing extended the headroom
 	 * beyond what csum_start can cover.
@@ -2633,20 +2630,22 @@ int __tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb)
 	}
 
 	if (likely(!err)) {
+		segs = tcp_skb_pcount(skb);
+
 		TCP_SKB_CB(skb)->sacked |= TCPCB_EVER_RETRANS;
 		/* Update global TCP statistics. */
-		TCP_INC_STATS(sock_net(sk), TCP_MIB_RETRANSSEGS);
+		TCP_ADD_STATS(sock_net(sk), TCP_MIB_RETRANSSEGS, segs);
 		if (TCP_SKB_CB(skb)->tcp_flags & TCPHDR_SYN)
 			NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_TCPSYNRETRANS);
-		tp->total_retrans++;
+		tp->total_retrans += segs;
 	}
 	return err;
 }
 
-int tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb)
+int tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb, int segs)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
-	int err = __tcp_retransmit_skb(sk, skb);
+	int err = __tcp_retransmit_skb(sk, skb, segs);
 
 	if (err == 0) {
 #if FASTRETRANS_DEBUG > 0
@@ -2737,6 +2736,7 @@ void tcp_xmit_retransmit_queue(struct sock *sk)
 
 	tcp_for_write_queue_from(skb, sk) {
 		__u8 sacked = TCP_SKB_CB(skb)->sacked;
+		int segs;
 
 		if (skb == tcp_send_head(sk))
 			break;
@@ -2744,14 +2744,8 @@ void tcp_xmit_retransmit_queue(struct sock *sk)
 		if (!hole)
 			tp->retransmit_skb_hint = skb;
 
-		/* Assume this retransmit will generate
-		 * only one packet for congestion window
-		 * calculation purposes.  This works because
-		 * tcp_retransmit_skb() will chop up the
-		 * packet to be MSS sized and all the
-		 * packet counting works out.
-		 */
-		if (tcp_packets_in_flight(tp) >= tp->snd_cwnd)
+		segs = tp->snd_cwnd - tcp_packets_in_flight(tp);
+		if (segs <= 0)
 			return;
 
 		if (fwd_rexmitting) {
@@ -2788,7 +2782,7 @@ begin_fwd:
 		if (sacked & (TCPCB_SACKED_ACKED|TCPCB_SACKED_RETRANS))
 			continue;
 
-		if (tcp_retransmit_skb(sk, skb))
+		if (tcp_retransmit_skb(sk, skb, segs))
 			return;
 
 		NET_INC_STATS_BH(sock_net(sk), mib_idx);
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index 49bc474f8e35..373b03e78aaa 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -404,7 +404,7 @@ void tcp_retransmit_timer(struct sock *sk)
 			goto out;
 		}
 		tcp_enter_loss(sk);
-		tcp_retransmit_skb(sk, tcp_write_queue_head(sk));
+		tcp_retransmit_skb(sk, tcp_write_queue_head(sk), 1);
 		__sk_dst_reset(sk);
 		goto out_reset_timer;
 	}
@@ -436,7 +436,7 @@ void tcp_retransmit_timer(struct sock *sk)
 
 	tcp_enter_loss(sk);
 
-	if (tcp_retransmit_skb(sk, tcp_write_queue_head(sk)) > 0) {
+	if (tcp_retransmit_skb(sk, tcp_write_queue_head(sk), 1) > 0) {
 		/* Retransmission failed because of local congestion,
 		 * do not backoff.
 		 */
-- 
2.8.0.rc3.226.g39d4020

^ permalink raw reply related

* Re: [PATCH net-next] macvlan: fix failure during registration v2
From: Eric W. Biederman @ 2016-04-21 17:44 UTC (permalink / raw)
  To: Francesco Ruggeri; +Cc: netdev, David S. Miller, Mahesh Bandewar
In-Reply-To: <1461188301-4466-1-git-send-email-fruggeri@arista.com>

Francesco Ruggeri <fruggeri@arista.com> writes:

> If macvlan_common_newlink fails in register_netdevice after macvlan_init
> then it decrements port->count twice, first in macvlan_uninit (from
> register_netdevice or rollback_registered) and then again in
> macvlan_common_newlink.
> A similar problem may exist in the ipvlan driver.
> This patch consolidates modifications to port->count into macvlan_init
> and macvlan_uninit (thanks to Eric Biederman for suggesting this approach).
> In macvtap_device_event it also avoids cleaning up in NETDEV_UNREGISTER
> if NETDEV_REGISTER had previously failed.
>
> Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>

The macvlan_init bits obviously corect.

The macvtap bits I don't really understand what is going on.

> diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
> index 95394ed..e770221 100644
> --- a/drivers/net/macvtap.c
> +++ b/drivers/net/macvtap.c
> @@ -1303,6 +1303,8 @@ static int macvtap_device_event(struct notifier_block *unused,
>  		}
>  		break;
>  	case NETDEV_UNREGISTER:
> +		if (vlan->minor == 0)
> +			break;

I don't understand this bit.  A minor of 0 is never assigned.  That is
clear from the code.  On what code path can you get here without
assigning a minor?

My gut says this either needs a big fat comment explaining what is going
on, or a slight refactoring of the code (like moving port count
increments into macvlan_init) that makes it so this code path can't
happen.

>  		devt = MKDEV(MAJOR(macvtap_major), vlan->minor);
>  		device_destroy(macvtap_class, devt);
>  		macvtap_free_minor(vlan);

Eric

^ permalink raw reply

* [PATCH net v2 0/3] drivers: net: cpsw: phy-handle fixes
From: David Rivshin (Allworx) @ 2016-04-21 17:50 UTC (permalink / raw)
  To: netdev-u79uwXL29TY76Z2rM5mHXA, linux-omap-u79uwXL29TY76Z2rM5mHXA
  Cc: linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, David Miller, Mugunthan V N,
	Grygorii Strashko, Andrew Goodbody, Markus Brunner,
	Nicolas Chauvet

From: David Rivshin <drivshin-5fOYsn7Fw8lBDgjK7y7TUQ@public.gmane.org>

The first patch fixes a bug that makes dual_emac mode break if
either slave uses the phy-handle property in the devicetree.

The second patch fixes some cosmetic problems with error messages,
and also makes the binding documentation more explicit.

The third patch cleans up the fixed-link case to work like
the now-fixed phy-handle case.

I have tested on the following hardware configurations:
 - (EVMSK) dual emac, phy_id property in both slaves
 - (EVMSK) dual emac, phy-handle property in both slaves
 - (BeagleBoneBlack) single emac, phy_id property
 - (custom) single emac, fixed-link subnode

Nicolas Chauvet reported testing on an HP t410 (dm8148).

Markus Brunner reported testing v1 on the following [1]:
 - emac0 with phy_id and emac1 with fixed phy
 - emac0 with phy-handle and emac1 with fixed phy
 - emac0 with fixed phy and emac1 with fixed phy


Changes since v1 [2]:
- Rebased
- Added Tested-by from Nicolas Chauvet on all patches
- Added Acked-by from Rob Herring for the binding change in patch 2 [3]

[1] http://www.spinics.net/lists/netdev/msg357890.html
[2] http://www.spinics.net/lists/netdev/msg357772.html
[3] http://www.spinics.net/lists/netdev/msg358254.html

David Rivshin (3):
  drivers: net: cpsw: fix parsing of phy-handle DT property in dual_emac
    config
  drivers: net: cpsw: fix error messages when using phy-handle DT
    property
  drivers: net: cpsw: use of_phy_connect() in fixed-link case

 Documentation/devicetree/bindings/net/cpsw.txt |  4 +--
 drivers/net/ethernet/ti/cpsw.c                 | 41 +++++++++++++-------------
 drivers/net/ethernet/ti/cpsw.h                 |  1 +
 3 files changed, 23 insertions(+), 23 deletions(-)

-- 
2.5.5

--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH net-next] perf, bpf: minimize the size of perf_trace_() tracepoint handler
From: David Miller @ 2016-04-21 17:49 UTC (permalink / raw)
  To: ast
  Cc: rostedt, peterz, mingo, daniel, acme, wangnan0, jbacik,
	brendan.d.gregg, netdev, linux-kernel, kernel-team
In-Reply-To: <1461035510-2810305-1-git-send-email-ast@fb.com>

From: Alexei Starovoitov <ast@fb.com>
Date: Mon, 18 Apr 2016 20:11:50 -0700

> move trace_call_bpf() into helper function to minimize the size
> of perf_trace_*() tracepoint handlers.
>     text	   data	    bss	    dec	 	   hex	filename
> 10541679	5526646	2945024	19013349	1221ee5	vmlinux_before
> 10509422	5526646	2945024	18981092	121a0e4	vmlinux_after
> 
> It may seem that perf_fetch_caller_regs() can also be moved,
> but that is incorrect, since ip/sp will be wrong.
> 
> bpf+tracepoint performance is not affected, since
> perf_swevent_put_recursion_context() is now inlined.
> export_symbol_gpl can also be dropped.
> 
> No measurable change in normal perf tracepoints.
> 
> Suggested-by: Steven Rostedt <rostedt@goodmis.org>
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Applied, thanks Alexei.

^ permalink raw reply

* Re: drop all fragments inside tx queue if one gets dropped
From: Michael Richardson @ 2016-04-21 17:48 UTC (permalink / raw)
  To: Rick Jones; +Cc: netdev, linux-wpan, Alexander Aring
In-Reply-To: <5717EA7E.1020606@hpe.com>

[-- Attachment #1: Type: text/plain, Size: 1473 bytes --]

Rick Jones <rick.jones2@hpe.com> wrote:
    > I don't recall seeing similar poor behaviour in Linux; I would have
    > assumed
    > that the intra-stack flow-control "took care" of it.  Perhaps there is
    > something specific to wpan which precludes that?

The major user of big UDP packets in the 1990s was NFS.
I fondly recall deploying wsize=1024,rsize=1024 for NFS mounts between HPUX,
Apollo and Sun machines across the intersite Ottawa BNR "WAN"

NFS mounts now use TCP by default, and NFS is not a well used protocol
outside of clueful people (everyone else uses CIFS), and modern machines have
way DMA engines in their ethernet that can accomodate more than enough
xmit buffers to perhaps make this moot.  But, I dealt with this very problem
with a Linux NFS server that would get GbE XOFF's by a broken Cisco switch
that wouldn't always XON, and would seem to drop it's queue. (Turning QoS off
on the cisco switch made it tolerable.)

Still, Alex, it would be worth looking at whether the NFS UDP transmitter
does anything clueful to keep from overwhelming the ethernet layer.

wpan deals with sub-IP fragmentation of 1280 (or larger) IPv6 packets into
127 6lowpan fragments for transmission over 802.15.4 interfaces which run at
typically 250kb/s.   Those radios are typically SPI interfaced (often with
bit-banged SPI interfaces), and the radio MAC has only a single transmit
buffer (which is also the receive buffer!).

Packet trains would be nice to have.

^ permalink raw reply

* Re: [PATCH net] tcp: Fix SOF_TIMESTAMPING_TX_ACK when handling dup acks
From: David Miller @ 2016-04-21 17:46 UTC (permalink / raw)
  To: kafai
  Cc: netdev, kernel-team, edumazet, ncardwell, soheil.kdev, willemb,
	ycheng
In-Reply-To: <1461019193-3034571-1-git-send-email-kafai@fb.com>

From: Martin KaFai Lau <kafai@fb.com>
Date: Mon, 18 Apr 2016 15:39:53 -0700

> Assuming SOF_TIMESTAMPING_TX_ACK is on. When dup acks are received,
> it could incorrectly think that a skb has already
> been acked and queue a SCM_TSTAMP_ACK cmsg to the
> sk->sk_error_queue.
> 
> In tcp_ack_tstamp(), it checks
> 'between(shinfo->tskey, prior_snd_una, tcp_sk(sk)->snd_una - 1)'.
> If prior_snd_una == tcp_sk(sk)->snd_una like the following packetdrill
> script, between() returns true but the tskey is actually not acked.
> e.g. try between(3, 2, 1).
> 
> The fix is to replace between() with one before() and one !before().
> By doing this, the -1 offset on the tcp_sk(sk)->snd_una can also be
> removed.
 ...
> Signed-off-by: Martin KaFai Lau <kafai@fb.com>

Applied, thank you.

^ permalink raw reply

* Re: [PATCH net-next] net: dsa: remove tag_protocol from dsa_switch
From: David Miller @ 2016-04-21 17:44 UTC (permalink / raw)
  To: vivien.didelot; +Cc: netdev, linux-kernel, kernel, f.fainelli, andrew
In-Reply-To: <1461018244-5371-1-git-send-email-vivien.didelot@savoirfairelinux.com>

From: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Date: Mon, 18 Apr 2016 18:24:04 -0400

> Having the tag protocol in dsa_switch_driver for setup time and in
> dsa_switch_tree for runtime is enough. Remove dsa_switch's one.
> 
> Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>

Applied.

^ permalink raw reply

* Re: [PATCHv2 net] openvswitch: Orphan skbs before IPv6 defrag
From: David Miller @ 2016-04-21 17:42 UTC (permalink / raw)
  To: joe; +Cc: netdev, fw, netfilter-devel, diproiettod, pablo
In-Reply-To: <1461016307-14098-1-git-send-email-joe@ovn.org>

From: Joe Stringer <joe@ovn.org>
Date: Mon, 18 Apr 2016 14:51:47 -0700

> This is the IPv6 counterpart to commit 8282f27449bf ("inet: frag: Always
> orphan skbs inside ip_defrag()").
> 
> Prior to commit 029f7f3b8701 ("netfilter: ipv6: nf_defrag: avoid/free
> clone operations"), ipv6 fragments sent to nf_ct_frag6_gather() would be
> cloned (implicitly orphaning) prior to queueing for reassembly. As such,
> when the IPv6 message is eventually reassembled, the skb->sk for all
> fragments would be NULL. After that commit was introduced, rather than
> cloning, the original skbs were queued directly without orphaning. The
> end result is that all frags except for the first and last may have a
> socket attached.
> 
> This commit explicitly orphans such skbs during nf_ct_frag6_gather() to
> prevent BUG_ON(skb->sk) during a later call to ip6_fragment().
 ...
> Fixes: 029f7f3b8701 ("netfilter: ipv6: nf_defrag: avoid/free clone
> operations")
> Reported-by: Daniele Di Proietto <diproiettod@vmware.com>
> Signed-off-by: Joe Stringer <joe@ovn.org>

Applied and queued up for -stable, thanks.

^ permalink raw reply

* Re: [RFC PATCH net-next 7/8] net: ipv4: listified version of ip_rcv
From: Edward Cree @ 2016-04-21 17:24 UTC (permalink / raw)
  To: netdev, David Miller; +Cc: Jesper Dangaard Brouer, linux-net-drivers
In-Reply-To: <5716347D.3030808@solarflare.com>

On 19/04/16 14:37, Edward Cree wrote:
> Also involved adding a way to run a netfilter hook over a list of packets.
Turns out, this breaks the build if netfilter is *disabled*, because I forgot to
add a stub in that case.  Next version of patch (if there is one) will have the
following under #ifndef CONFIG_NETFILTER:

static inline void
NF_HOOK_LIST(uint8_t pf, unsigned int hook, struct net *net, struct sock *sk,
	     struct sk_buff_head *list, struct sk_buff_head *sublist,
	     struct net_device *in, struct net_device *out,
	     int (*okfn)(struct net *, struct sock *, struct sk_buff *))
{
	__skb_queue_head_init(sublist);
	/* Move everything to the sublist */
	skb_queue_splice_init(list, sublist);
}

-Ed

^ permalink raw reply

* [PATCH] ixgbevf: Fix relaxed order settings in VF driver
From: Babu Moger @ 2016-04-21 17:21 UTC (permalink / raw)
  To: jeffrey.t.kirsher, jesse.brandeburg, shannon.nelson,
	carolyn.wyborny, donald.c.skidmore, bruce.w.allan, john.ronciak,
	mitch.a.williams
  Cc: intel-wired-lan, netdev, linux-kernel, babu.moger,
	sowmini.varadhan

Current code writes the tx/rx relaxed order without reading it first.
This can lead to unintended consequences as we are forcibly writing
other bits.

We noticed this problem while testing VF driver on sparc. Relaxed
order settings for rx queue were all messed up which was causing
performance drop with VF interface.

Fixed it by reading the registers first and setting the specific
bit of interest. With this change we are able to match the bandwidth
equivalent to PF interface.

Signed-off-by: Babu Moger <babu.moger@oracle.com>
---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c |    9 +++++++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 0ea14c0..51abff1 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -1545,6 +1545,7 @@ static inline void ixgbevf_irq_enable(struct ixgbevf_adapter *adapter)
 static void ixgbevf_configure_tx_ring(struct ixgbevf_adapter *adapter,
 				      struct ixgbevf_ring *ring)
 {
+	u32 regval;
 	struct ixgbe_hw *hw = &adapter->hw;
 	u64 tdba = ring->dma;
 	int wait_loop = 10;
@@ -1565,8 +1566,10 @@ static void ixgbevf_configure_tx_ring(struct ixgbevf_adapter *adapter,
 	IXGBE_WRITE_REG(hw, IXGBE_VFTDWBAL(reg_idx), 0);
 
 	/* enable relaxed ordering */
+	regval = IXGBE_READ_REG(hw, IXGBE_VFDCA_TXCTRL(reg_idx));
+
 	IXGBE_WRITE_REG(hw, IXGBE_VFDCA_TXCTRL(reg_idx),
-			(IXGBE_DCA_TXCTRL_DESC_RRO_EN |
+			(regval | IXGBE_DCA_TXCTRL_DESC_RRO_EN |
 			 IXGBE_DCA_TXCTRL_DATA_RRO_EN));
 
 	/* reset head and tail pointers */
@@ -1734,6 +1737,7 @@ static void ixgbevf_setup_vfmrqc(struct ixgbevf_adapter *adapter)
 static void ixgbevf_configure_rx_ring(struct ixgbevf_adapter *adapter,
 				      struct ixgbevf_ring *ring)
 {
+	u32 regval;
 	struct ixgbe_hw *hw = &adapter->hw;
 	u64 rdba = ring->dma;
 	u32 rxdctl;
@@ -1749,8 +1753,9 @@ static void ixgbevf_configure_rx_ring(struct ixgbevf_adapter *adapter,
 			ring->count * sizeof(union ixgbe_adv_rx_desc));
 
 	/* enable relaxed ordering */
+	regval = IXGBE_READ_REG(hw, IXGBE_VFDCA_RXCTRL(reg_idx));
 	IXGBE_WRITE_REG(hw, IXGBE_VFDCA_RXCTRL(reg_idx),
-			IXGBE_DCA_RXCTRL_DESC_RRO_EN);
+			regval | IXGBE_DCA_RXCTRL_DESC_RRO_EN);
 
 	/* reset head and tail pointers */
 	IXGBE_WRITE_REG(hw, IXGBE_VFRDH(reg_idx), 0);
-- 
1.7.1

^ permalink raw reply related

* [PATCH net-next v2 2/4] rtnl: use the new API to align IFLA_STATS*
From: Nicolas Dichtel @ 2016-04-21 16:58 UTC (permalink / raw)
  To: netdev; +Cc: davem, roopa, eric.dumazet, tgraf, jhs, Nicolas Dichtel
In-Reply-To: <1461257907-4458-1-git-send-email-nicolas.dichtel@6wind.com>

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 net/core/rtnetlink.c | 22 +++++-----------------
 1 file changed, 5 insertions(+), 17 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 4a47a9aceb1d..5ec059d52823 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1051,14 +1051,9 @@ static noinline_for_stack int rtnl_fill_stats(struct sk_buff *skb,
 {
 	struct rtnl_link_stats64 *sp;
 	struct nlattr *attr;
-	int err;
-
-	err = nla_align_64bit(skb, IFLA_PAD);
-	if (err)
-		return err;
 
-	attr = nla_reserve(skb, IFLA_STATS64,
-			   sizeof(struct rtnl_link_stats64));
+	attr = nla_reserve_64bit(skb, IFLA_STATS64,
+				 sizeof(struct rtnl_link_stats64), IFLA_PAD);
 	if (!attr)
 		return -EMSGSIZE;
 
@@ -3469,17 +3464,10 @@ static int rtnl_fill_statsinfo(struct sk_buff *skb, struct net_device *dev,
 
 	if (filter_mask & IFLA_STATS_FILTER_BIT(IFLA_STATS_LINK_64)) {
 		struct rtnl_link_stats64 *sp;
-		int err;
-
-		/* if necessary, add a zero length NOP attribute so that
-		 * IFLA_STATS_LINK_64 will be 64-bit aligned
-		 */
-		err = nla_align_64bit(skb, IFLA_STATS_UNSPEC);
-		if (err)
-			goto nla_put_failure;
 
-		attr = nla_reserve(skb, IFLA_STATS_LINK_64,
-				   sizeof(struct rtnl_link_stats64));
+		attr = nla_reserve_64bit(skb, IFLA_STATS_LINK_64,
+					 sizeof(struct rtnl_link_stats64),
+					 IFLA_STATS_UNSPEC);
 		if (!attr)
 			goto nla_put_failure;
 
-- 
2.4.2

^ permalink raw reply related

* [PATCH net-next v2 4/4] ip6mr: align RTA_MFC_STATS on 64-bit
From: Nicolas Dichtel @ 2016-04-21 16:58 UTC (permalink / raw)
  To: netdev; +Cc: davem, roopa, eric.dumazet, tgraf, jhs, Nicolas Dichtel
In-Reply-To: <1461257907-4458-1-git-send-email-nicolas.dichtel@6wind.com>

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 net/ipv6/ip6mr.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c
index a10e77103c88..bf678324fd52 100644
--- a/net/ipv6/ip6mr.c
+++ b/net/ipv6/ip6mr.c
@@ -2268,7 +2268,7 @@ static int __ip6mr_fill_mroute(struct mr6_table *mrt, struct sk_buff *skb,
 	mfcs.mfcs_packets = c->mfc_un.res.pkt;
 	mfcs.mfcs_bytes = c->mfc_un.res.bytes;
 	mfcs.mfcs_wrong_if = c->mfc_un.res.wrong_if;
-	if (nla_put(skb, RTA_MFC_STATS, sizeof(mfcs), &mfcs) < 0)
+	if (nla_put_64bit(skb, RTA_MFC_STATS, sizeof(mfcs), &mfcs, RTA_PAD) < 0)
 		return -EMSGSIZE;
 
 	rtm->rtm_type = RTN_MULTICAST;
@@ -2411,7 +2411,7 @@ static int mr6_msgsize(bool unresolved, int maxvif)
 		      + nla_total_size(0)	/* RTA_MULTIPATH */
 		      + maxvif * NLA_ALIGN(sizeof(struct rtnexthop))
 						/* RTA_MFC_STATS */
-		      + nla_total_size(sizeof(struct rta_mfc_stats))
+		      + nla_total_size_64bit(sizeof(struct rta_mfc_stats))
 		;
 
 	return len;
-- 
2.4.2

^ permalink raw reply related

* [PATCH net-next v2 3/4] ipmr: align RTA_MFC_STATS on 64-bit
From: Nicolas Dichtel @ 2016-04-21 16:58 UTC (permalink / raw)
  To: netdev; +Cc: davem, roopa, eric.dumazet, tgraf, jhs, Nicolas Dichtel
In-Reply-To: <1461257907-4458-1-git-send-email-nicolas.dichtel@6wind.com>

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 include/uapi/linux/rtnetlink.h | 1 +
 net/ipv4/ipmr.c                | 4 ++--
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index cc885c4e9065..a94e0b69c769 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -317,6 +317,7 @@ enum rtattr_type_t {
 	RTA_ENCAP_TYPE,
 	RTA_ENCAP,
 	RTA_EXPIRES,
+	RTA_PAD,
 	__RTA_MAX
 };
 
diff --git a/net/ipv4/ipmr.c b/net/ipv4/ipmr.c
index 395e2814a46d..21a38e296fe2 100644
--- a/net/ipv4/ipmr.c
+++ b/net/ipv4/ipmr.c
@@ -2104,7 +2104,7 @@ static int __ipmr_fill_mroute(struct mr_table *mrt, struct sk_buff *skb,
 	mfcs.mfcs_packets = c->mfc_un.res.pkt;
 	mfcs.mfcs_bytes = c->mfc_un.res.bytes;
 	mfcs.mfcs_wrong_if = c->mfc_un.res.wrong_if;
-	if (nla_put(skb, RTA_MFC_STATS, sizeof(mfcs), &mfcs) < 0)
+	if (nla_put_64bit(skb, RTA_MFC_STATS, sizeof(mfcs), &mfcs, RTA_PAD) < 0)
 		return -EMSGSIZE;
 
 	rtm->rtm_type = RTN_MULTICAST;
@@ -2237,7 +2237,7 @@ static size_t mroute_msgsize(bool unresolved, int maxvif)
 		      + nla_total_size(0)	/* RTA_MULTIPATH */
 		      + maxvif * NLA_ALIGN(sizeof(struct rtnexthop))
 						/* RTA_MFC_STATS */
-		      + nla_total_size(sizeof(struct rta_mfc_stats))
+		      + nla_total_size_64bit(sizeof(struct rta_mfc_stats))
 		;
 
 	return len;
-- 
2.4.2

^ permalink raw reply related

* [PATCH net-next v2 1/4] libnl: add more helpers to align attributes on 64-bit
From: Nicolas Dichtel @ 2016-04-21 16:58 UTC (permalink / raw)
  To: netdev; +Cc: davem, roopa, eric.dumazet, tgraf, jhs, Nicolas Dichtel
In-Reply-To: <1461257907-4458-1-git-send-email-nicolas.dichtel@6wind.com>

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
 include/net/netlink.h | 39 +++++++++++++++-----
 lib/nlattr.c          | 99 +++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 130 insertions(+), 8 deletions(-)

diff --git a/include/net/netlink.h b/include/net/netlink.h
index 3c1fd92a52c8..6f51a8a06498 100644
--- a/include/net/netlink.h
+++ b/include/net/netlink.h
@@ -244,13 +244,21 @@ int nla_memcpy(void *dest, const struct nlattr *src, int count);
 int nla_memcmp(const struct nlattr *nla, const void *data, size_t size);
 int nla_strcmp(const struct nlattr *nla, const char *str);
 struct nlattr *__nla_reserve(struct sk_buff *skb, int attrtype, int attrlen);
+struct nlattr *__nla_reserve_64bit(struct sk_buff *skb, int attrtype,
+				   int attrlen, int padattr);
 void *__nla_reserve_nohdr(struct sk_buff *skb, int attrlen);
 struct nlattr *nla_reserve(struct sk_buff *skb, int attrtype, int attrlen);
+struct nlattr *nla_reserve_64bit(struct sk_buff *skb, int attrtype,
+				 int attrlen, int padattr);
 void *nla_reserve_nohdr(struct sk_buff *skb, int attrlen);
 void __nla_put(struct sk_buff *skb, int attrtype, int attrlen,
 	       const void *data);
+void __nla_put_64bit(struct sk_buff *skb, int attrtype, int attrlen,
+		     const void *data, int padattr);
 void __nla_put_nohdr(struct sk_buff *skb, int attrlen, const void *data);
 int nla_put(struct sk_buff *skb, int attrtype, int attrlen, const void *data);
+int nla_put_64bit(struct sk_buff *skb, int attrtype, int attrlen,
+		  const void *data, int padattr);
 int nla_put_nohdr(struct sk_buff *skb, int attrlen, const void *data);
 int nla_append(struct sk_buff *skb, int attrlen, const void *data);
 
@@ -1231,6 +1239,27 @@ static inline int nla_validate_nested(const struct nlattr *start, int maxtype,
 }
 
 /**
+ * nla_need_padding_for_64bit - test 64-bit alignment of the next attribute
+ * @skb: socket buffer the message is stored in
+ *
+ * Return true if padding is needed to align the next attribute (nla_data()) to
+ * a 64-bit aligned area.
+ */
+static inline bool nla_need_padding_for_64bit(struct sk_buff *skb)
+{
+#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
+	/* The nlattr header is 4 bytes in size, that's why we test
+	 * if the skb->data _is_ aligned.  A NOP attribute, plus
+	 * nlattr header for next attribute, will make nla_data()
+	 * 8-byte aligned.
+	 */
+	if (IS_ALIGNED((unsigned long)skb_tail_pointer(skb), 8))
+		return true;
+#endif
+	return false;
+}
+
+/**
  * nla_align_64bit - 64-bit align the nla_data() of next attribute
  * @skb: socket buffer the message is stored in
  * @padattr: attribute type for the padding
@@ -1244,16 +1273,10 @@ static inline int nla_validate_nested(const struct nlattr *start, int maxtype,
  */
 static inline int nla_align_64bit(struct sk_buff *skb, int padattr)
 {
-#ifndef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
-	/* The nlattr header is 4 bytes in size, that's why we test
-	 * if the skb->data _is_ aligned.  This NOP attribute, plus
-	 * nlattr header for next attribute, will make nla_data()
-	 * 8-byte aligned.
-	 */
-	if (IS_ALIGNED((unsigned long)skb_tail_pointer(skb), 8) &&
+	if (nla_need_padding_for_64bit(skb) &&
 	    !nla_reserve(skb, padattr, 0))
 		return -EMSGSIZE;
-#endif
+
 	return 0;
 }
 
diff --git a/lib/nlattr.c b/lib/nlattr.c
index f5907d23272d..2b82f1e2ebc2 100644
--- a/lib/nlattr.c
+++ b/lib/nlattr.c
@@ -355,6 +355,29 @@ struct nlattr *__nla_reserve(struct sk_buff *skb, int attrtype, int attrlen)
 EXPORT_SYMBOL(__nla_reserve);
 
 /**
+ * __nla_reserve_64bit - reserve room for attribute on the skb and align it
+ * @skb: socket buffer to reserve room on
+ * @attrtype: attribute type
+ * @attrlen: length of attribute payload
+ *
+ * Adds a netlink attribute header to a socket buffer and reserves
+ * room for the payload but does not copy it. It also ensure that this
+ * attribute will be 64-bit aign.
+ *
+ * The caller is responsible to ensure that the skb provides enough
+ * tailroom for the attribute header and payload.
+ */
+struct nlattr *__nla_reserve_64bit(struct sk_buff *skb, int attrtype,
+				   int attrlen, int padattr)
+{
+	if (nla_need_padding_for_64bit(skb))
+		nla_align_64bit(skb, padattr);
+
+	return __nla_reserve(skb, attrtype, attrlen);
+}
+EXPORT_SYMBOL(__nla_reserve_64bit);
+
+/**
  * __nla_reserve_nohdr - reserve room for attribute without header
  * @skb: socket buffer to reserve room on
  * @attrlen: length of attribute payload
@@ -397,6 +420,35 @@ struct nlattr *nla_reserve(struct sk_buff *skb, int attrtype, int attrlen)
 EXPORT_SYMBOL(nla_reserve);
 
 /**
+ * nla_reserve_64bit - reserve room for attribute on the skb and align it
+ * @skb: socket buffer to reserve room on
+ * @attrtype: attribute type
+ * @attrlen: length of attribute payload
+ *
+ * Adds a netlink attribute header to a socket buffer and reserves
+ * room for the payload but does not copy it. It also ensure that this
+ * attribute will be 64-bit aign.
+ *
+ * Returns NULL if the tailroom of the skb is insufficient to store
+ * the attribute header and payload.
+ */
+struct nlattr *nla_reserve_64bit(struct sk_buff *skb, int attrtype, int attrlen,
+				 int padattr)
+{
+	size_t len;
+
+	if (nla_need_padding_for_64bit(skb))
+		len = nla_total_size_64bit(attrlen);
+	else
+		len = nla_total_size(attrlen);
+	if (unlikely(skb_tailroom(skb) < len))
+		return NULL;
+
+	return __nla_reserve_64bit(skb, attrtype, attrlen, padattr);
+}
+EXPORT_SYMBOL(nla_reserve_64bit);
+
+/**
  * nla_reserve_nohdr - reserve room for attribute without header
  * @skb: socket buffer to reserve room on
  * @attrlen: length of attribute payload
@@ -436,6 +488,26 @@ void __nla_put(struct sk_buff *skb, int attrtype, int attrlen,
 EXPORT_SYMBOL(__nla_put);
 
 /**
+ * __nla_put_64bit - Add a netlink attribute to a socket buffer and align it
+ * @skb: socket buffer to add attribute to
+ * @attrtype: attribute type
+ * @attrlen: length of attribute payload
+ * @data: head of attribute payload
+ *
+ * The caller is responsible to ensure that the skb provides enough
+ * tailroom for the attribute header and payload.
+ */
+void __nla_put_64bit(struct sk_buff *skb, int attrtype, int attrlen,
+		     const void *data, int padattr)
+{
+	struct nlattr *nla;
+
+	nla = __nla_reserve_64bit(skb, attrtype, attrlen, padattr);
+	memcpy(nla_data(nla), data, attrlen);
+}
+EXPORT_SYMBOL(__nla_put_64bit);
+
+/**
  * __nla_put_nohdr - Add a netlink attribute without header
  * @skb: socket buffer to add attribute to
  * @attrlen: length of attribute payload
@@ -474,6 +546,33 @@ int nla_put(struct sk_buff *skb, int attrtype, int attrlen, const void *data)
 EXPORT_SYMBOL(nla_put);
 
 /**
+ * nla_put_64bit - Add a netlink attribute to a socket buffer and align it
+ * @skb: socket buffer to add attribute to
+ * @attrtype: attribute type
+ * @attrlen: length of attribute payload
+ * @data: head of attribute payload
+ *
+ * Returns -EMSGSIZE if the tailroom of the skb is insufficient to store
+ * the attribute header and payload.
+ */
+int nla_put_64bit(struct sk_buff *skb, int attrtype, int attrlen,
+		  const void *data, int padattr)
+{
+	size_t len;
+
+	if (nla_need_padding_for_64bit(skb))
+		len = nla_total_size_64bit(attrlen);
+	else
+		len = nla_total_size(attrlen);
+	if (unlikely(skb_tailroom(skb) < len))
+		return -EMSGSIZE;
+
+	__nla_put_64bit(skb, attrtype, attrlen, data, padattr);
+	return 0;
+}
+EXPORT_SYMBOL(nla_put_64bit);
+
+/**
  * nla_put_nohdr - Add a netlink attribute without header
  * @skb: socket buffer to add attribute to
  * @attrlen: length of attribute payload
-- 
2.4.2

^ permalink raw reply related

* [PATCH net-next v2 0/4] libnl: enhance API to ease 64bit alignment for attribute
From: Nicolas Dichtel @ 2016-04-21 16:58 UTC (permalink / raw)
  To: netdev; +Cc: davem, roopa, eric.dumazet, tgraf, jhs
In-Reply-To: <1461142655-5067-1-git-send-email-nicolas.dichtel@6wind.com>

Here is a proposal to add more helpers in the libnetlink to manage 64-bit
alignment issues.
Note that this series was only tested on x86 by tweeking
CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS and adding some traces.

The first patch adds helpers for 64bit alignment and other patches
use them.

We could also add helpers for nla_put_u64() and its variants if needed.

v1 -> v2:
 - remove patch #1
 - split patch #2 (now #1 and #2)
 - add nla_need_padding_for_64bit()

 include/net/netlink.h          | 39 +++++++++++++----
 include/uapi/linux/rtnetlink.h |  1 +
 lib/nlattr.c                   | 99 ++++++++++++++++++++++++++++++++++++++++++
 net/core/rtnetlink.c           | 22 +++-------
 net/ipv4/ipmr.c                |  4 +-
 net/ipv6/ip6mr.c               |  4 +-
 6 files changed, 140 insertions(+), 29 deletions(-)

Comments are welcomed,
Regards,
Nicolas

^ permalink raw reply

* Re: [RFC PATCH v3 net-next 2/3] tcp: Handle eor bit when coalescing skb
From: Martin KaFai Lau @ 2016-04-21 16:56 UTC (permalink / raw)
  To: Soheil Hassas Yeganeh
  Cc: netdev, Eric Dumazet, Neal Cardwell, Willem de Bruijn,
	Yuchung Cheng, Kernel Team
In-Reply-To: <CACSApvbyD7wFy+y2MrShKRMCTg-zF6UkP-Le867N=ssCYa-Yng@mail.gmail.com>

On Wed, Apr 20, 2016 at 04:04:54PM -0400, Soheil Hassas Yeganeh wrote:
> > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> > index a6e4a83..96bdf98 100644
> > --- a/net/ipv4/tcp_output.c
> > +++ b/net/ipv4/tcp_output.c
> > @@ -2494,6 +2494,7 @@ static void tcp_collapse_retrans(struct sock *sk, struct sk_buff *skb)
> >          * packet counting does not break.
> >          */
> >         TCP_SKB_CB(skb)->sacked |= TCP_SKB_CB(next_skb)->sacked & TCPCB_EVER_RETRANS;
> > +       TCP_SKB_CB(skb)->eor = TCP_SKB_CB(next_skb)->eor;
> >
> >         /* changed transmit queue under us so clear hints */
> >         tcp_clear_retrans_hints_partial(tp);
> > @@ -2545,6 +2546,9 @@ static void tcp_retrans_try_collapse(struct sock *sk, struct sk_buff *to,
> >                 if (!tcp_can_collapse(sk, skb))
> >                         break;
> >
> > +               if (TCP_SKB_CB(to)->eor)
> > +                       break;
> > +
>
> nit: Perhaps a better place to check for eor is right after entering
> the loop? to skip a few instructions and tcp_can_collapse, in an
> unlikely case eor is set.
hmm... Not sure I understand it.
You meant moving the unlikely case before (or after?) the more likely
cases which may have a better chance to break the loop sooner?

^ permalink raw reply

* [RFC PATCH v3 2/2] ppp: add rtnetlink device creation support
From: Guillaume Nault @ 2016-04-21 16:22 UTC (permalink / raw)
  To: netdev
  Cc: linux-ppp, David Miller, Paul Mackerras, Stephen Hemminger,
	walter harms
In-Reply-To: <cover.1461251392.git.g.nault@alphalink.fr>

Define PPP device handler for use with rtnetlink.
The only PPP specific attribute is IFLA_PPP_DEV_FD. It is mandatory and
contains the file descriptor of the associated /dev/ppp instance (the
file descriptor which would have been used for ioctl(PPPIOCNEWUNIT) in
the ioctl-based API). The PPP device is removed when this file
descriptor is released (same behaviour as with ioctl based PPP
devices).

PPP devices created with the rtnetlink API behave like the ones created
with ioctl(PPPIOCNEWUNIT). In particular existing ioctls work the same
way, no matter how the PPP device was created.
The rtnl callbacks are also assigned to ioctl based PPP devices. This
way, rtnl messages have the same effect on any PPP devices.
The immediate effect is that all PPP devices, even ioctl-based
ones, can now be removed with "ip link del".

A minor difference still exists between ioctl and rtnl based PPP
interfaces: in the device name, the number following the "ppp" prefix
corresponds to the PPP unit number for ioctl based devices, while it is
just an unrelated incrementing index for rtnl ones.

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
---
 drivers/net/ppp/ppp_generic.c | 115 ++++++++++++++++++++++++++++++++++++++++--
 include/uapi/linux/if_link.h  |   8 +++
 2 files changed, 120 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c
index f581a77..8b3db3a 100644
--- a/drivers/net/ppp/ppp_generic.c
+++ b/drivers/net/ppp/ppp_generic.c
@@ -46,6 +46,7 @@
 #include <linux/device.h>
 #include <linux/mutex.h>
 #include <linux/slab.h>
+#include <linux/file.h>
 #include <asm/unaligned.h>
 #include <net/slhc_vj.h>
 #include <linux/atomic.h>
@@ -186,6 +187,7 @@ struct channel {
 struct ppp_config {
 	struct file *file;
 	s32 unit;
+	bool ifname_is_set;
 };
 
 /*
@@ -286,6 +288,7 @@ static int unit_get(struct idr *p, void *ptr);
 static int unit_set(struct idr *p, void *ptr, int n);
 static void unit_put(struct idr *p, int n);
 static void *unit_find(struct idr *p, int n);
+static void ppp_setup(struct net_device *dev);
 
 static const struct net_device_ops ppp_netdev_ops;
 
@@ -964,7 +967,7 @@ static struct pernet_operations ppp_net_ops = {
 	.size = sizeof(struct ppp_net),
 };
 
-static int ppp_unit_register(struct ppp *ppp, int unit)
+static int ppp_unit_register(struct ppp *ppp, int unit, bool ifname_is_set)
 {
 	struct ppp_net *pn = ppp_pernet(ppp->ppp_net);
 	int ret;
@@ -994,7 +997,8 @@ static int ppp_unit_register(struct ppp *ppp, int unit)
 	}
 	ppp->file.index = ret;
 
-	snprintf(ppp->dev->name, IFNAMSIZ, "ppp%i", ppp->file.index);
+	if (!ifname_is_set)
+		snprintf(ppp->dev->name, IFNAMSIZ, "ppp%i", ppp->file.index);
 
 	ret = register_netdevice(ppp->dev);
 	if (ret < 0)
@@ -1043,7 +1047,7 @@ static int ppp_dev_configure(struct net *src_net, struct net_device *dev,
 	ppp->active_filter = NULL;
 #endif /* CONFIG_PPP_FILTER */
 
-	err = ppp_unit_register(ppp, conf->unit);
+	err = ppp_unit_register(ppp, conf->unit, conf->ifname_is_set);
 	if (err < 0)
 		return err;
 
@@ -1052,6 +1056,99 @@ static int ppp_dev_configure(struct net *src_net, struct net_device *dev,
 	return 0;
 }
 
+static const struct nla_policy ppp_nl_policy[IFLA_PPP_MAX + 1] = {
+	[IFLA_PPP_DEV_FD]	= { .type = NLA_S32 },
+};
+
+static int ppp_nl_validate(struct nlattr *tb[], struct nlattr *data[])
+{
+	if (!data)
+		return -EINVAL;
+
+	if (!data[IFLA_PPP_DEV_FD])
+		return -EINVAL;
+	if (nla_get_s32(data[IFLA_PPP_DEV_FD]) < 0)
+		return -EBADF;
+
+	return 0;
+}
+
+static int ppp_nl_newlink(struct net *src_net, struct net_device *dev,
+			  struct nlattr *tb[], struct nlattr *data[])
+{
+	struct ppp_config conf = {
+		.unit = -1,
+		.ifname_is_set = true,
+	};
+	struct file *file;
+	int err;
+
+	file = fget(nla_get_s32(data[IFLA_PPP_DEV_FD]));
+	if (!file)
+		return -EBADF;
+
+	/* rtnl_lock is already held here, but ppp_create_interface() locks
+	 * ppp_mutex before holding rtnl_lock. Using mutex_trylock() avoids
+	 * possible deadlock due to lock order inversion, at the cost of
+	 * pushing the problem back to userspace.
+	 */
+	if (!mutex_trylock(&ppp_mutex)) {
+		err = -EBUSY;
+		goto out;
+	}
+
+	if (file->f_op != &ppp_device_fops || file->private_data) {
+		err = -EBADF;
+		goto out_unlock;
+	}
+
+	conf.file = file;
+	err = ppp_dev_configure(src_net, dev, &conf);
+
+out_unlock:
+	mutex_unlock(&ppp_mutex);
+out:
+	fput(file);
+
+	return err;
+}
+
+static void ppp_nl_dellink(struct net_device *dev, struct list_head *head)
+{
+	unregister_netdevice_queue(dev, head);
+}
+
+static size_t ppp_nl_get_size(const struct net_device *dev)
+{
+	return 0;
+}
+
+static int ppp_nl_fill_info(struct sk_buff *skb, const struct net_device *dev)
+{
+	return 0;
+}
+
+static struct net *ppp_nl_get_link_net(const struct net_device *dev)
+{
+	struct ppp *ppp = netdev_priv(dev);
+
+	return ppp->ppp_net;
+}
+
+static struct rtnl_link_ops ppp_link_ops __read_mostly = {
+	.kind		= "ppp",
+	.maxtype	= IFLA_PPP_MAX,
+	.policy		= ppp_nl_policy,
+	.priv_size	= sizeof(struct ppp),
+	.setup		= ppp_setup,
+	.validate	= ppp_nl_validate,
+	.newlink	= ppp_nl_newlink,
+	.dellink	= ppp_nl_dellink,
+	.get_size	= ppp_nl_get_size,
+	.fill_info	= ppp_nl_fill_info,
+	.get_link_net	= ppp_nl_get_link_net,
+};
+
 #define PPP_MAJOR	108
 
 /* Called at boot time if ppp is compiled into the kernel,
@@ -1080,11 +1177,19 @@ static int __init ppp_init(void)
 		goto out_chrdev;
 	}
 
+	err = rtnl_link_register(&ppp_link_ops);
+	if (err) {
+		pr_err("failed to register rtnetlink PPP handler\n");
+		goto out_class;
+	}
+
 	/* not a big deal if we fail here :-) */
 	device_create(ppp_class, NULL, MKDEV(PPP_MAJOR, 0), NULL, "ppp");
 
 	return 0;
 
+out_class:
+	class_destroy(ppp_class);
 out_chrdev:
 	unregister_chrdev(PPP_MAJOR, "ppp");
 out_net:
@@ -2829,6 +2934,7 @@ static int ppp_create_interface(struct net *net, struct file *file, int *unit)
 	struct ppp_config conf = {
 		.file = file,
 		.unit = *unit,
+		.ifname_is_set = false,
 	};
 	struct ppp *ppp;
 	struct net_device *dev;
@@ -2840,6 +2946,7 @@ static int ppp_create_interface(struct net *net, struct file *file, int *unit)
 		goto err;
 	}
 	dev_net_set(dev, net);
+	dev->rtnl_link_ops = &ppp_link_ops;
 
 	rtnl_lock();
 
@@ -3046,6 +3153,7 @@ static void __exit ppp_cleanup(void)
 	/* should never happen */
 	if (atomic_read(&ppp_unit_count) || atomic_read(&channel_count))
 		pr_err("PPP: removing module but units remain!\n");
+	rtnl_link_unregister(&ppp_link_ops);
 	unregister_chrdev(PPP_MAJOR, "ppp");
 	device_destroy(ppp_class, MKDEV(PPP_MAJOR, 0));
 	class_destroy(ppp_class);
@@ -3104,4 +3212,5 @@ EXPORT_SYMBOL(ppp_register_compressor);
 EXPORT_SYMBOL(ppp_unregister_compressor);
 MODULE_LICENSE("GPL");
 MODULE_ALIAS_CHARDEV(PPP_MAJOR, 0);
+MODULE_ALIAS_RTNL_LINK("ppp");
 MODULE_ALIAS("devname:ppp");
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index ba69d44..6b2ec17 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -517,6 +517,14 @@ enum {
 };
 #define IFLA_GENEVE_MAX	(__IFLA_GENEVE_MAX - 1)
 
+/* PPP section */
+enum {
+	IFLA_PPP_UNSPEC,
+	IFLA_PPP_DEV_FD,
+	__IFLA_PPP_MAX
+};
+#define IFLA_PPP_MAX (__IFLA_PPP_MAX - 1)
+
 /* Bonding section */
 
 enum {
-- 
2.8.1


^ permalink raw reply related

* [RFC PATCH v3 1/2] ppp: define reusable device creation functions
From: Guillaume Nault @ 2016-04-21 16:22 UTC (permalink / raw)
  To: netdev
  Cc: linux-ppp, David Miller, Paul Mackerras, Stephen Hemminger,
	walter harms
In-Reply-To: <cover.1461251392.git.g.nault@alphalink.fr>

Move PPP device initialisation and registration out of
ppp_create_interface().
This prepares code for device registration with rtnetlink.

While there, simplify the prototype of ppp_create_interface():
  * Since ppp_dev_configure() takes care of setting file->private_data,
    there's no need to return a ppp structure to ppp_unattached_ioctl()
    anymore.
  * The unit parameter is made read/write so that ppp_create_interface()
    can tell which unit number has been assigned.

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
---
 drivers/net/ppp/ppp_generic.c | 206 ++++++++++++++++++++++++------------------
 1 file changed, 118 insertions(+), 88 deletions(-)

diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c
index f572b31..f581a77 100644
--- a/drivers/net/ppp/ppp_generic.c
+++ b/drivers/net/ppp/ppp_generic.c
@@ -183,6 +183,11 @@ struct channel {
 #endif /* CONFIG_PPP_MULTILINK */
 };
 
+struct ppp_config {
+	struct file *file;
+	s32 unit;
+};
+
 /*
  * SMP locking issues:
  * Both the ppp.rlock and ppp.wlock locks protect the ppp.channels
@@ -269,8 +274,7 @@ static void ppp_ccp_peek(struct ppp *ppp, struct sk_buff *skb, int inbound);
 static void ppp_ccp_closed(struct ppp *ppp);
 static struct compressor *find_compressor(int type);
 static void ppp_get_stats(struct ppp *ppp, struct ppp_stats *st);
-static struct ppp *ppp_create_interface(struct net *net, int unit,
-					struct file *file, int *retp);
+static int ppp_create_interface(struct net *net, struct file *file, int *unit);
 static void init_ppp_file(struct ppp_file *pf, int kind);
 static void ppp_destroy_interface(struct ppp *ppp);
 static struct ppp *ppp_find_unit(struct ppp_net *pn, int unit);
@@ -853,12 +857,12 @@ static int ppp_unattached_ioctl(struct net *net, struct ppp_file *pf,
 		/* Create a new ppp unit */
 		if (get_user(unit, p))
 			break;
-		ppp = ppp_create_interface(net, unit, file, &err);
-		if (!ppp)
+		err = ppp_create_interface(net, file, &unit);
+		if (err < 0)
 			break;
-		file->private_data = &ppp->file;
+
 		err = -EFAULT;
-		if (put_user(ppp->file.index, p))
+		if (put_user(unit, p))
 			break;
 		err = 0;
 		break;
@@ -960,6 +964,94 @@ static struct pernet_operations ppp_net_ops = {
 	.size = sizeof(struct ppp_net),
 };
 
+static int ppp_unit_register(struct ppp *ppp, int unit)
+{
+	struct ppp_net *pn = ppp_pernet(ppp->ppp_net);
+	int ret;
+
+	mutex_lock(&pn->all_ppp_mutex);
+
+	if (unit < 0) {
+		ret = unit_get(&pn->units_idr, ppp);
+		if (ret < 0)
+			goto err;
+	} else {
+		/* Caller asked for a specific unit number. Fail with -EEXIST
+		 * if unavailable. For backward compatibility, return -EEXIST
+		 * too if idr allocation fails; this makes pppd retry without
+		 * requesting a specific unit number.
+		 */
+		if (unit_find(&pn->units_idr, unit)) {
+			ret = -EEXIST;
+			goto err;
+		}
+		ret = unit_set(&pn->units_idr, ppp, unit);
+		if (ret < 0) {
+			/* Rewrite error for backward compatibility */
+			ret = -EEXIST;
+			goto err;
+		}
+	}
+	ppp->file.index = ret;
+
+	snprintf(ppp->dev->name, IFNAMSIZ, "ppp%i", ppp->file.index);
+
+	ret = register_netdevice(ppp->dev);
+	if (ret < 0)
+		goto err_unit;
+
+	atomic_inc(&ppp_unit_count);
+
+	mutex_unlock(&pn->all_ppp_mutex);
+
+	return 0;
+
+err_unit:
+	unit_put(&pn->units_idr, ppp->file.index);
+err:
+	mutex_unlock(&pn->all_ppp_mutex);
+
+	return ret;
+}
+
+static int ppp_dev_configure(struct net *src_net, struct net_device *dev,
+			     const struct ppp_config *conf)
+{
+	struct ppp *ppp = netdev_priv(dev);
+	int indx;
+	int err;
+
+	ppp->dev = dev;
+	ppp->ppp_net = src_net;
+	ppp->mru = PPP_MRU;
+	ppp->owner = conf->file;
+
+	init_ppp_file(&ppp->file, INTERFACE);
+	ppp->file.hdrlen = PPP_HDRLEN - 2; /* don't count proto bytes */
+
+	for (indx = 0; indx < NUM_NP; ++indx)
+		ppp->npmode[indx] = NPMODE_PASS;
+	INIT_LIST_HEAD(&ppp->channels);
+	spin_lock_init(&ppp->rlock);
+	spin_lock_init(&ppp->wlock);
+#ifdef CONFIG_PPP_MULTILINK
+	ppp->minseq = -1;
+	skb_queue_head_init(&ppp->mrq);
+#endif /* CONFIG_PPP_MULTILINK */
+#ifdef CONFIG_PPP_FILTER
+	ppp->pass_filter = NULL;
+	ppp->active_filter = NULL;
+#endif /* CONFIG_PPP_FILTER */
+
+	err = ppp_unit_register(ppp, conf->unit);
+	if (err < 0)
+		return err;
+
+	conf->file->private_data = &ppp->file;
+
+	return 0;
+}
+
 #define PPP_MAJOR	108
 
 /* Called at boot time if ppp is compiled into the kernel,
@@ -2732,102 +2824,40 @@ ppp_get_stats(struct ppp *ppp, struct ppp_stats *st)
  * or if there is already a unit with the requested number.
  * unit == -1 means allocate a new number.
  */
-static struct ppp *ppp_create_interface(struct net *net, int unit,
-					struct file *file, int *retp)
+static int ppp_create_interface(struct net *net, struct file *file, int *unit)
 {
+	struct ppp_config conf = {
+		.file = file,
+		.unit = *unit,
+	};
 	struct ppp *ppp;
-	struct ppp_net *pn;
-	struct net_device *dev = NULL;
-	int ret = -ENOMEM;
-	int i;
+	struct net_device *dev;
+	int err;
 
 	dev = alloc_netdev(sizeof(struct ppp), "", NET_NAME_ENUM, ppp_setup);
-	if (!dev)
-		goto out1;
-
-	pn = ppp_pernet(net);
-
-	ppp = netdev_priv(dev);
-	ppp->dev = dev;
-	ppp->mru = PPP_MRU;
-	init_ppp_file(&ppp->file, INTERFACE);
-	ppp->file.hdrlen = PPP_HDRLEN - 2;	/* don't count proto bytes */
-	ppp->owner = file;
-	for (i = 0; i < NUM_NP; ++i)
-		ppp->npmode[i] = NPMODE_PASS;
-	INIT_LIST_HEAD(&ppp->channels);
-	spin_lock_init(&ppp->rlock);
-	spin_lock_init(&ppp->wlock);
-#ifdef CONFIG_PPP_MULTILINK
-	ppp->minseq = -1;
-	skb_queue_head_init(&ppp->mrq);
-#endif /* CONFIG_PPP_MULTILINK */
-#ifdef CONFIG_PPP_FILTER
-	ppp->pass_filter = NULL;
-	ppp->active_filter = NULL;
-#endif /* CONFIG_PPP_FILTER */
-
-	/*
-	 * drum roll: don't forget to set
-	 * the net device is belong to
-	 */
+	if (!dev) {
+		err = -ENOMEM;
+		goto err;
+	}
 	dev_net_set(dev, net);
 
 	rtnl_lock();
-	mutex_lock(&pn->all_ppp_mutex);
-
-	if (unit < 0) {
-		unit = unit_get(&pn->units_idr, ppp);
-		if (unit < 0) {
-			ret = unit;
-			goto out2;
-		}
-	} else {
-		ret = -EEXIST;
-		if (unit_find(&pn->units_idr, unit))
-			goto out2; /* unit already exists */
-		/*
-		 * if caller need a specified unit number
-		 * lets try to satisfy him, otherwise --
-		 * he should better ask us for new unit number
-		 *
-		 * NOTE: yes I know that returning EEXIST it's not
-		 * fair but at least pppd will ask us to allocate
-		 * new unit in this case so user is happy :)
-		 */
-		unit = unit_set(&pn->units_idr, ppp, unit);
-		if (unit < 0)
-			goto out2;
-	}
-
-	/* Initialize the new ppp unit */
-	ppp->file.index = unit;
-	sprintf(dev->name, "ppp%d", unit);
 
-	ret = register_netdevice(dev);
-	if (ret != 0) {
-		unit_put(&pn->units_idr, unit);
-		netdev_err(ppp->dev, "PPP: couldn't register device %s (%d)\n",
-			   dev->name, ret);
-		goto out2;
-	}
-
-	ppp->ppp_net = net;
+	err = ppp_dev_configure(net, dev, &conf);
+	if (err < 0)
+		goto err_dev;
+	ppp = netdev_priv(dev);
+	*unit = ppp->file.index;
 
-	atomic_inc(&ppp_unit_count);
-	mutex_unlock(&pn->all_ppp_mutex);
 	rtnl_unlock();
 
-	*retp = 0;
-	return ppp;
+	return 0;
 
-out2:
-	mutex_unlock(&pn->all_ppp_mutex);
+err_dev:
 	rtnl_unlock();
 	free_netdev(dev);
-out1:
-	*retp = ret;
-	return NULL;
+err:
+	return err;
 }
 
 /*
-- 
2.8.1


^ permalink raw reply related

* [RFC PATCH v3 0/2] ppp: add rtnetlink support
From: Guillaume Nault @ 2016-04-21 16:22 UTC (permalink / raw)
  To: netdev
  Cc: linux-ppp, David Miller, Paul Mackerras, Stephen Hemminger,
	walter harms

PPP devices lack the ability to be customised at creation time. In
particular they can't be created in a given netns or with a particular
name. Moving or renaming the device after creation is possible, but
creates undesirable transient effects on servers where PPP devices are
constantly created and removed, as users connect and disconnect.
Implementing rtnetlink support solves this problem.

The rtnetlink handlers implemented in this series are minimal, and can
only replace the PPPIOCNEWUNIT ioctl. The rest of PPP ioctls remains
necessary for any other operation on channels and units.
It is perfectly possible to mix PPP devices created by rtnl
and by ioctl(PPPIOCNEWUNIT). Devices will behave in the same way.

If necessary, rtnetlink support could be extended to provide some of
the functionalities brought by ppp_net_ioctl() and ppp_ioctl(). This
would let external programs, like "ip link", set or retrieve PPP device
information. However, I haven't made my mind on the usefulness of this
approach, so this isn't implemented in this series.

This series doesn't try to invert lock ordering between ppp_mutex and
rtnl_lock. mutex_trylock() is used instead, which greatly simplifies
things.
A user visible difference brought by this series is that old PPP
interfaces (those created with ioctl(PPPIOCNEWUNIT)), can now be
removed by "ip link del", just like new rtnl based PPP devices.

Changes since v2:
  - Define ->rtnl_link_ops for ioctl based PPP devices, so they can
    handle rtnl messages just like rtnl based ones (suggested by
    Stephen Hemminger).
  - Move back to original lock ordering between ppp_mutex and rtnl_lock
    to simplify patch series. Handle lock inversion issue using
    mutex_trylock() (suggested by Stephen Hemminger).
  - Do file descriptor lookup directly in ppp_nl_newlink(), to simplify
    ppp_dev_configure().

Changes since v1:
  - Rebase on net-next.
  - Invert locking order wrt. ppp_mutex and rtnl_lock and protect
    file->private_data with ppp_mutex.

Guillaume Nault (2):
  ppp: define reusable device creation functions
  ppp: add rtnetlink device creation support

 drivers/net/ppp/ppp_generic.c | 315 ++++++++++++++++++++++++++++++------------
 include/uapi/linux/if_link.h  |   8 ++
 2 files changed, 235 insertions(+), 88 deletions(-)

-- 
2.8.1

^ permalink raw reply

* Re: [net-next 00/17][pull request] 100GbE Intel Wired LAN Driver Updates 2016-04-20
From: David Miller @ 2016-04-21 16:21 UTC (permalink / raw)
  To: jeffrey.t.kirsher; +Cc: netdev, nhorman, sassmann, jogreene, john.ronciak
In-Reply-To: <1461218969-68578-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Date: Wed, 20 Apr 2016 23:09:12 -0700

> This series contains updates to fm10k only.

Pulled, thanks Jeff.

^ permalink raw reply

* Re: [RFC PATCH] gro: Partly revert "net: gro: allow to build full sized skb"
From: Alexander Duyck @ 2016-04-21 16:02 UTC (permalink / raw)
  To: Steffen Klassert; +Cc: Eric Dumazet, Sowmini Varadhan, Netdev
In-Reply-To: <20160421074042.GF3347@gauss.secunet.com>

On Thu, Apr 21, 2016 at 12:40 AM, Steffen Klassert
<steffen.klassert@secunet.com> wrote:
> This partly reverts the below mentioned patch because on
> forwarding, such skbs can't be offloaded to a NIC.
>
> We need this to get IPsec GRO for forwarding to work properly,
> otherwise the GRO aggregated packets get segmented again by
> the GSO layer. Although discovered when implementing IPsec GRO,
> this is a general problem in the forwarding path.

I'm confused as to why you would need this to get IPsec GRO forwarding
to work.  Are you having to go through a device that doesn't have
NETIF_F_FRAGLIST defined?  Also what is the issue with having to go
through the GSO layer on segmentation?  It seems like we might be able
to do something like what we did with GSO partial to split frames so
that they are in chunks that wouldn't require NETIF_F_FRAGLIST.  Then
you could get the best of both worlds in that the stack would only
process one super-frame, and the transmitter could TSO a series of
frames that are some fixed MSS in size.

- Alex

^ permalink raw reply

* Re: [PATCH net-next 0/7] net: network drivers should not depend on geneve/vxlan
From: David Miller @ 2016-04-21 15:53 UTC (permalink / raw)
  To: hannes; +Cc: netdev, jesse
In-Reply-To: <1461224591.4101216.585253121.498EBE7C@webmail.messagingengine.com>

From: Hannes Frederic Sowa <hannes@stressinduktion.org>
Date: Thu, 21 Apr 2016 09:43:11 +0200

> could you look at this again? If you have further feedback I am happy to
> incooperate those.

I will.

^ permalink raw reply

* Re: [PATCH V2] net: stmmac: socfpga: Remove re-registration of reset controller
From: Dinh Nguyen @ 2016-04-21 14:52 UTC (permalink / raw)
  To: Marek Vasut, netdev
  Cc: peppe.cavallaro, alexandre.torgue, Matthew Gerlach,
	David S . Miller
In-Reply-To: <5718BEDB.4010303@denx.de>



On 04/21/2016 06:51 AM, Marek Vasut wrote:
>>>>
>>>> if you modify the patch to call stmmac_dvr_probe() before calling
>>>> socfpga_dwmac_init(), then you would already have the reset control
>>>> information.
>>>
>>> I was under the impression that you must call socfpga_dwmac_init()
>>> before stmmac_dvr_probe() for whatever hardware-related reason. If
>>> you are absolutely certain this is not necessary, then that's just
>>> perfect and the patch can be simplified even further -- just remove
>>> the call to socfpga_dwmac_init() from probe altogether , the dwmac
>>> core code will call plat_dat->init at the end of probe .
>>>
>>> So shall we do that ? I am happy to spin V3 like that if you confirm
>>> that it's legal to do things in the aforementioned order.
>>>
>>
>> AFAICT, I don't see any reason why the socfpga_dwmac_init() has to go
>> before the stmmac_dvr_probe(). I tested this by using U-Boot to put the
>> PHY modes in the system manager into a different mode, and put the
>> ethernet IP into reset. Linux was able to use ethernet just fine with
>> the aformentioned order.
> 
> Ha, the dwmac code does not call the ->init, so you have to call it from
> the socfpga-dwmac probe afterall. I will send a V3 now and do
> it just like you suggested. I will send a RFC for calling ->init and
> ->exit from the stmmac common code too.
> 

Yes, I saw this too. Was going to wait until after this patch to
follow-up. Looks like you already did! Thanks!

Dinh

^ permalink raw reply

* Re: [PATCH V3] net: stmmac: socfpga: Remove re-registration of reset controller
From: Dinh Nguyen @ 2016-04-21 14:39 UTC (permalink / raw)
  To: Marek Vasut, netdev
  Cc: peppe.cavallaro, alexandre.torgue, Matthew Gerlach,
	David S . Miller
In-Reply-To: <1461240710-5381-1-git-send-email-marex@denx.de>



On 04/21/2016 07:11 AM, Marek Vasut wrote:
> Both socfpga_dwmac_parse_data() in dwmac-socfpga.c and stmmac_dvr_probe()
> in stmmac_main.c functions call devm_reset_control_get() to register an
> reset controller for the stmmac. This results in an attempt to register
> two reset controllers for the same non-shared reset line.
> 
> The first attempt to register the reset controller works fine. The second
> attempt fails with warning from the reset controller core, see below.
> The warning is produced because the reset line is non-shared and thus
> it is allowed to have only up-to one reset controller associated with
> that reset line, not two or more.
> 
> The solution has multiple parts. First, the original socfpga_dwmac_init()
> is tweaked to use reset controller pointer from the stmmac_priv (private
> data of the stmmac core) instead of the local instance, which was used
> before. The local re-registration of the reset controller is removed.
> 
> Next, the socfpga_dwmac_init() is moved after stmmac_dvr_probe() in the
> probe function. This order is legal according to Altera and it makes the
> code much easier, since there is no need to temporarily register and
> unregister the reset controller ; the reset controller is already registered
> by the stmmac_dvr_probe().
> 
> Finally, plat_dat->exit and socfpga_dwmac_exit() is no longer necessary,
> since the functionality is already performed by the stmmac core.
> 
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 1 at drivers/reset/core.c:187 __of_reset_control_get+0x218/0x270
> Modules linked in:
> CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.6.0-rc4-next-20160419-00015-gabb2477-dirty #4
> Hardware name: Altera SOCFPGA
> [<c010f290>] (unwind_backtrace) from [<c010b82c>] (show_stack+0x10/0x14)
> [<c010b82c>] (show_stack) from [<c0373da4>] (dump_stack+0x94/0xa8)
> [<c0373da4>] (dump_stack) from [<c011bcc0>] (__warn+0xec/0x104)
> [<c011bcc0>] (__warn) from [<c011bd88>] (warn_slowpath_null+0x20/0x28)
> [<c011bd88>] (warn_slowpath_null) from [<c03a6eb4>] (__of_reset_control_get+0x218/0x270)
> [<c03a6eb4>] (__of_reset_control_get) from [<c03a701c>] (__devm_reset_control_get+0x54/0x90)
> [<c03a701c>] (__devm_reset_control_get) from [<c041fa30>] (stmmac_dvr_probe+0x1b4/0x8e8)
> [<c041fa30>] (stmmac_dvr_probe) from [<c04298c8>] (socfpga_dwmac_probe+0x1b8/0x28c)
> [<c04298c8>] (socfpga_dwmac_probe) from [<c03d6ffc>] (platform_drv_probe+0x4c/0xb0)
> [<c03d6ffc>] (platform_drv_probe) from [<c03d54ec>] (driver_probe_device+0x224/0x2bc)
> [<c03d54ec>] (driver_probe_device) from [<c03d5630>] (__driver_attach+0xac/0xb0)
> [<c03d5630>] (__driver_attach) from [<c03d382c>] (bus_for_each_dev+0x6c/0xa0)
> [<c03d382c>] (bus_for_each_dev) from [<c03d4ad4>] (bus_add_driver+0x1a4/0x21c)
> [<c03d4ad4>] (bus_add_driver) from [<c03d60ac>] (driver_register+0x78/0xf8)
> [<c03d60ac>] (driver_register) from [<c0101760>] (do_one_initcall+0x40/0x170)
> [<c0101760>] (do_one_initcall) from [<c0800e38>] (kernel_init_freeable+0x1dc/0x27c)
> [<c0800e38>] (kernel_init_freeable) from [<c05d1bd4>] (kernel_init+0x8/0x114)
> [<c05d1bd4>] (kernel_init) from [<c01076f8>] (ret_from_fork+0x14/0x3c)
> ---[ end trace 059d2fbe87608fa9 ]---
> 
> Signed-off-by: Marek Vasut <marex@denx.de>
> Cc: Matthew Gerlach <mgerlach@opensource.altera.com>
> Cc: Dinh Nguyen <dinguyen@opensource.altera.com>
> Cc: David S. Miller <davem@davemloft.net>
> ---
> V2: Add missing stmmac_rst = NULL; into socfpga_dwmac_init_probe()
> V3: Greatly simplify the code by moving socfpga_dwmac_init() after
>     stmmac_dvr_probe(), which is legal. This removes the need for
>     temporary registration of the reset controller, which is super.
>

Tested-by: Dinh Nguyen <dinguyen@opensource.altera.com>

Thanks,
Dinh

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox