Netdev List
 help / color / mirror / Atom feed
* Re: [iproute PATCH v2 1/7] ipntable: Make sure filter.name is NULL-terminated
From: Phil Sutter @ 2017-08-18 10:52 UTC (permalink / raw)
  To: David Laight; +Cc: Stephen Hemminger, netdev@vger.kernel.org
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6DD005A7B4@AcuExch.aculab.com>

Hi David,

On Fri, Aug 18, 2017 at 09:19:16AM +0000, David Laight wrote:
> From: Phil Sutter
> > Sent: 17 August 2017 18:09
> > To: Stephen Hemminger
> > Cc: netdev@vger.kernel.org
> > Subject: [iproute PATCH v2 1/7] ipntable: Make sure filter.name is NULL-terminated
> > 
> > Signed-off-by: Phil Sutter <phil@nwl.cc>
> > ---
> >  ip/ipntable.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/ip/ipntable.c b/ip/ipntable.c
> > index 879626ee4f491..7be1f04d33d90 100644
> > --- a/ip/ipntable.c
> > +++ b/ip/ipntable.c
> > @@ -633,7 +633,8 @@ static int ipntable_show(int argc, char **argv)
> >  		} else if (strcmp(*argv, "name") == 0) {
> >  			NEXT_ARG();
> > 
> > -			strncpy(filter.name, *argv, sizeof(filter.name));
> > +			strncpy(filter.name, *argv, sizeof(filter.name) - 1);
> > +			filter.name[sizeof(filter.name) - 1] = '\0';
> 
> Why not check for overflow instead?
> 			if (filter.name[sizeof(filter.name) - 1])
> 				usage("filer name too long");

sizeof(filter.name) is 1024, which is maybe a bit over the top for
something a user would input. So I found a better way avoiding all this
at once: I made filter.name a const char *, then just assigned *argv to
it. This should be safe since rtnl_dump_filter() and therefore
print_ntable() callback is called from inside ipntable_show() so *argv
is not accessed outside of it's scope.

What do you think?

Thanks, Phil

^ permalink raw reply

* Re: [PATCH] net: sunrpc: svcsock: fix NULL-pointer exception
From: Jeff Layton @ 2017-08-18 10:27 UTC (permalink / raw)
  To: Vadim Lomovtsev, trond.myklebust-7I+n7zu2hftEKMMhf/gKZA,
	anna.schumaker-HgOvQuBEEgTQT0dZR+AlfA,
	bfields-uC3wQj2KruNg9hUCZPvPmw, davem-fT/PcQaiUtIeIZ0/mPfg9Q,
	linux-nfs-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	pabeni-H+wXaHxf7aLQT0dZR+AlfA
In-Reply-To: <1503050447-13362-1-git-send-email-vlomovts-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

On Fri, 2017-08-18 at 06:00 -0400, Vadim Lomovtsev wrote:
> While running nfs/connectathon tests kernel NULL-pointer exception
> has been observed due to races in svcsock.c.
> 
> Race is appear when kernel accepts connection by kernel_accept
> (which creates new socket) and start queuing ingress packets
> to new socket. This happanes in ksoftirq context which concurrently
> on a differnt core while new socket setup is not done yet.
> 
> The fix is to re-order socket user data init sequence, add NULL-ptr
> check before callback call along with barriers to prevent kernel crash.
> 
> Test results: nfs/connectathon reports '0' failed tests for about 200+ iterations.
> 
> Crash log:
> ---<-snip->---
> [ 6708.638984] Unable to handle kernel NULL pointer dereference at virtual address 00000000
> [ 6708.647093] pgd = ffff0000094e0000
> [ 6708.650497] [00000000] *pgd=0000010ffff90003, *pud=0000010ffff90003, *pmd=0000010ffff80003, *pte=0000000000000000
> [ 6708.660761] Internal error: Oops: 86000005 [#1] SMP
> [ 6708.665630] Modules linked in: nfsv3 nfnetlink_queue nfnetlink_log nfnetlink rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache overlay xt_CONNSECMARK xt_SECMARK xt_conntrack iptable_security ip_tables ah4 xfrm4_mode_transport sctp tun binfmt_misc ext4 jbd2 mbcache loop tcp_diag udp_diag inet_diag rpcrdma ib_isert iscsi_target_mod ib_iser rdma_cm iw_cm libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_ucm ib_uverbs ib_umad ib_cm ib_core nls_koi8_u nls_cp932 ts_kmp nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack vfat fat ghash_ce sha2_ce sha1_ce cavium_rng_vf i2c_thunderx sg thunderx_edac i2c_smbus edac_core cavium_rng nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c nicvf nicpf ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgb
 lt fb_sys_fops
> [ 6708.736446]  ttm drm i2c_core thunder_bgx thunder_xcv mdio_thunder mdio_cavium dm_mirror dm_region_hash dm_log dm_mod [last unloaded: stap_3c300909c5b3f46dcacd49aab3334af_87021]
> [ 6708.752275] CPU: 84 PID: 0 Comm: swapper/84 Tainted: G        W  OE   4.11.0-4.el7.aarch64 #1
> [ 6708.760787] Hardware name: www.cavium.com CRB-2S/CRB-2S, BIOS 0.3 Mar 13 2017
> [ 6708.767910] task: ffff810006842e80 task.stack: ffff81000689c000
> [ 6708.773822] PC is at 0x0
> [ 6708.776739] LR is at svc_data_ready+0x38/0x88 [sunrpc]
> [ 6708.781866] pc : [<0000000000000000>] lr : [<ffff0000029d7378>] pstate: 60000145
> [ 6708.789248] sp : ffff810ffbad3900
> [ 6708.792551] x29: ffff810ffbad3900 x28: ffff000008c73d58
> [ 6708.797853] x27: 0000000000000000 x26: ffff81000bbe1e00
> [ 6708.803156] x25: 0000000000000020 x24: ffff800f7410bf28
> [ 6708.808458] x23: ffff000008c63000 x22: ffff000008c63000
> [ 6708.813760] x21: ffff800f7410bf28 x20: ffff81000bbe1e00
> [ 6708.819063] x19: ffff810012412400 x18: 00000000d82a9df2
> [ 6708.824365] x17: 0000000000000000 x16: 0000000000000000
> [ 6708.829667] x15: 0000000000000000 x14: 0000000000000001
> [ 6708.834969] x13: 0000000000000000 x12: 722e736f622e676e
> [ 6708.840271] x11: 00000000f814dd99 x10: 0000000000000000
> [ 6708.845573] x9 : 7374687225000000 x8 : 0000000000000000
> [ 6708.850875] x7 : 0000000000000000 x6 : 0000000000000000
> [ 6708.856177] x5 : 0000000000000028 x4 : 0000000000000000
> [ 6708.861479] x3 : 0000000000000000 x2 : 00000000e5000000
> [ 6708.866781] x1 : 0000000000000000 x0 : ffff81000bbe1e00
> [ 6708.872084]
> [ 6708.873565] Process swapper/84 (pid: 0, stack limit = 0xffff81000689c000)
> [ 6708.880341] Stack: (0xffff810ffbad3900 to 0xffff8100068a0000)
> [ 6708.886075] Call trace:
> [ 6708.888513] Exception stack(0xffff810ffbad3710 to 0xffff810ffbad3840)
> [ 6708.894942] 3700:                                   ffff810012412400 0001000000000000
> [ 6708.902759] 3720: ffff810ffbad3900 0000000000000000 0000000060000145 ffff800f79300000
> [ 6708.910577] 3740: ffff000009274d00 00000000000003ea 0000000000000015 ffff000008c63000
> [ 6708.918395] 3760: ffff810ffbad3830 ffff800f79300000 000000000000004d 0000000000000000
> [ 6708.926212] 3780: ffff810ffbad3890 ffff0000080f88dc ffff800f79300000 000000000000004d
> [ 6708.934030] 37a0: ffff800f7930093c ffff000008c63000 0000000000000000 0000000000000140
> [ 6708.941848] 37c0: ffff000008c2c000 0000000000040b00 ffff81000bbe1e00 0000000000000000
> [ 6708.949665] 37e0: 00000000e5000000 0000000000000000 0000000000000000 0000000000000028
> [ 6708.957483] 3800: 0000000000000000 0000000000000000 0000000000000000 7374687225000000
> [ 6708.965300] 3820: 0000000000000000 00000000f814dd99 722e736f622e676e 0000000000000000
> [ 6708.973117] [<          (null)>]           (null)
> [ 6708.977824] [<ffff0000086f9fa4>] tcp_data_queue+0x754/0xc5c
> [ 6708.983386] [<ffff0000086fa64c>] tcp_rcv_established+0x1a0/0x67c
> [ 6708.989384] [<ffff000008704120>] tcp_v4_do_rcv+0x15c/0x22c
> [ 6708.994858] [<ffff000008707418>] tcp_v4_rcv+0xaf0/0xb58
> [ 6709.000077] [<ffff0000086df784>] ip_local_deliver_finish+0x10c/0x254
> [ 6709.006419] [<ffff0000086dfea4>] ip_local_deliver+0xf0/0xfc
> [ 6709.011980] [<ffff0000086dfad4>] ip_rcv_finish+0x208/0x3a4
> [ 6709.017454] [<ffff0000086e018c>] ip_rcv+0x2dc/0x3c8
> [ 6709.022328] [<ffff000008692fc8>] __netif_receive_skb_core+0x2f8/0xa0c
> [ 6709.028758] [<ffff000008696068>] __netif_receive_skb+0x38/0x84
> [ 6709.034580] [<ffff00000869611c>] netif_receive_skb_internal+0x68/0xdc
> [ 6709.041010] [<ffff000008696bc0>] napi_gro_receive+0xcc/0x1a8
> [ 6709.046690] [<ffff0000014b0fc4>] nicvf_cq_intr_handler+0x59c/0x730 [nicvf]
> [ 6709.053559] [<ffff0000014b1380>] nicvf_poll+0x38/0xb8 [nicvf]
> [ 6709.059295] [<ffff000008697a6c>] net_rx_action+0x2f8/0x464
> [ 6709.064771] [<ffff000008081824>] __do_softirq+0x11c/0x308
> [ 6709.070164] [<ffff0000080d14e4>] irq_exit+0x12c/0x174
> [ 6709.075206] [<ffff00000813101c>] __handle_domain_irq+0x78/0xc4
> [ 6709.081027] [<ffff000008081608>] gic_handle_irq+0x94/0x190
> [ 6709.086501] Exception stack(0xffff81000689fdf0 to 0xffff81000689ff20)
> [ 6709.092929] fde0:                                   0000810ff2ec0000 ffff000008c10000
> [ 6709.100747] fe00: ffff000008c70ef4 0000000000000001 0000000000000000 ffff810ffbad9b18
> [ 6709.108565] fe20: ffff810ffbad9c70 ffff8100169d3800 ffff810006843ab0 ffff81000689fe80
> [ 6709.116382] fe40: 0000000000000bd0 0000ffffdf979cd0 183f5913da192500 0000ffff8a254ce4
> [ 6709.124200] fe60: 0000ffff8a254b78 0000aaab10339808 0000000000000000 0000ffff8a0c2a50
> [ 6709.132018] fe80: 0000ffffdf979b10 ffff000008d6d450 ffff000008c10000 ffff000008d6d000
> [ 6709.139836] fea0: 0000000000000054 ffff000008cd3dbc 0000000000000000 0000000000000000
> [ 6709.147653] fec0: 0000000000000000 0000000000000000 0000000000000000 ffff81000689ff20
> [ 6709.155471] fee0: ffff000008085240 ffff81000689ff20 ffff000008085244 0000000060000145
> [ 6709.163289] ff00: ffff81000689ff10 ffff00000813f1e4 ffffffffffffffff ffff00000813f238
> [ 6709.171107] [<ffff000008082eb4>] el1_irq+0xb4/0x140
> [ 6709.175976] [<ffff000008085244>] arch_cpu_idle+0x44/0x11c
> [ 6709.181368] [<ffff0000087bf3b8>] default_idle_call+0x20/0x30
> [ 6709.187020] [<ffff000008116d50>] do_idle+0x158/0x1e4
> [ 6709.191973] [<ffff000008116ff4>] cpu_startup_entry+0x2c/0x30
> [ 6709.197624] [<ffff00000808e7cc>] secondary_start_kernel+0x13c/0x160
> [ 6709.203878] [<0000000001bc71c4>] 0x1bc71c4
> [ 6709.207967] Code: bad PC value
> [ 6709.211061] SMP: stopping secondary CPUs
> [ 6709.218830] Starting crashdump kernel...
> [ 6709.222749] Bye!
> ---<-snip>---
> 
> Signed-off-by: Vadim Lomovtsev <vlomovts-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> ---
>  net/sunrpc/svcsock.c | 24 ++++++++++++++++++------
>  1 file changed, 18 insertions(+), 6 deletions(-)
> 
> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> index 2b720fa..b6496f3 100644
> --- a/net/sunrpc/svcsock.c
> +++ b/net/sunrpc/svcsock.c
> @@ -421,7 +421,9 @@ static void svc_data_ready(struct sock *sk)
>  		dprintk("svc: socket %p(inet %p), busy=%d\n",
>  			svsk, sk,
>  			test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
> -		svsk->sk_odata(sk);
> +		rmb();
> +		if (svsk->sk_odata)
> +			svsk->sk_odata(sk);

I'm not convinced these checks will really do much, though you might
need the rmb() calls there...see below...

>  		if (!test_and_set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags))
>  			svc_xprt_enqueue(&svsk->sk_xprt);
>  	}
> @@ -437,7 +439,9 @@ static void svc_write_space(struct sock *sk)
>  	if (svsk) {
>  		dprintk("svc: socket %p(inet %p), write_space busy=%d\n",
>  			svsk, sk, test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
> -		svsk->sk_owspace(sk);
> +		rmb();
> +		if (svsk->sk_owspace)
> +			svsk->sk_owspace(sk);
>  		svc_xprt_enqueue(&svsk->sk_xprt);
>  	}
>  }
> @@ -760,8 +764,12 @@ static void svc_tcp_listen_data_ready(struct sock *sk)
>  	dprintk("svc: socket %p TCP (listen) state change %d\n",
>  		sk, sk->sk_state);
>  
> -	if (svsk)
> -		svsk->sk_odata(sk);
> +	if (svsk) { 
> +		rmb();
> +		if (svsk->sk_odata)
> +			svsk->sk_odata(sk);
> +	}
> +
>  	/*
>  	 * This callback may called twice when a new connection
>  	 * is established as a child socket inherits everything
> @@ -794,7 +802,10 @@ static void svc_tcp_state_change(struct sock *sk)
>  	if (!svsk)
>  		printk("svc: socket %p: no user data\n", sk);
>  	else {
> -		svsk->sk_ostate(sk);
> +		rmb();
> +		if (svsk->sk_ostate)
> +			svsk->sk_ostate(sk);
> +
>  		if (sk->sk_state != TCP_ESTABLISHED) {
>  			set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags);
>  			svc_xprt_enqueue(&svsk->sk_xprt);
> @@ -1381,12 +1392,13 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *serv,
>  		return ERR_PTR(err);
>  	}
>  
> -	inet->sk_user_data = svsk;
>  	svsk->sk_sock = sock;
>  	svsk->sk_sk = inet;
>  	svsk->sk_ostate = inet->sk_state_change;
>  	svsk->sk_odata = inet->sk_data_ready;
>  	svsk->sk_owspace = inet->sk_write_space;
> +	wmb();
> +	inet->sk_user_data = svsk;
>  
>  	/* Initialize the socket */
>  	if (sock->type == SOCK_DGRAM)

Since we always check whether svsk is NULL before calling the functions,
then the above hunk and maybe the rmb() calls might be all that's needed
here. I don't see how we'd ever get a sk_odata value (for instance)
that's NULL with the above change.

-- 
Jeff Layton <jlayton-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH net-next] tcp: export drops counter to /proc/net/tcp{,6}
From: Stéphan Gorget @ 2017-08-18 10:21 UTC (permalink / raw)
  To: netdev
  Cc: Stéphan Gorget, Jeethu Rao, David S . Miller,
	Alexei Starovoitov, Eric Dumazet, kernel-team, linux-kernel

Those counters are exported for raw and udp but not for tcp, though they
are incremented.

An example where it is useful is chasing listen overflow. Listen overflow
are counted as a global counter in LINUX_MIB_LISTENOVERFLOWS accessible
in /proc/net/netstat but there is no way to find related drops in the
information exported for tcp. With this patch it will make possible to
correlate growth of LINUX_MIB_LISTENOVERFLOWS with growth of drops for
a tcp socket.

This patch does it for /proc/net/tcp and /proc/net/tcp6, the drops
counter is being incremented in function like tcp_listendrop() which
would help surface error cases like listen overflow and track them to
the correct socket.

That work has been done for raw sockets (/proc/net/raw and
/proc/net/raw6) by Wang Chen in commit 33c732c36169
("[IPV4]: Add raw drops counter.") and commit a92aa318b4b3
("[IPV6]: Add raw6 drops counter.") and by Eric Dumazet for udp ones
(/proc/net/udp and /proc/net/udp6) in commit cb61cb9b8b5e
("udp: sk_drops handling").

Signed-off-by: Stéphan Gorget <sgorget@fb.com>
Signed-off-by: Jeethu Rao <jeethu@fb.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Alexei Starovoitov <ast@fb.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: kernel-team@fb.com
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
---
 Documentation/networking/proc_net_tcp.txt | 30 ++++++++++++++++--------------
 net/ipv4/tcp_ipv4.c                       |  5 +++--
 net/ipv6/tcp_ipv6.c                       |  7 ++++---
 3 files changed, 23 insertions(+), 19 deletions(-)

diff --git a/Documentation/networking/proc_net_tcp.txt b/Documentation/networking/proc_net_tcp.txt
index 4a79209..38bda94 100644
--- a/Documentation/networking/proc_net_tcp.txt
+++ b/Documentation/networking/proc_net_tcp.txt
@@ -24,20 +24,22 @@ up into 3 parts because of the length of the line):
       |        |----------------------> receive-queue
       |-------------------------------> transmit-queue
 
-   1000        0 54165785 4 cd1e6040 25 4 27 3 -1
-    |          |    |     |    |     |  | |  | |--> slow start size threshold, 
-    |          |    |     |    |     |  | |  |      or -1 if the threshold
-    |          |    |     |    |     |  | |  |      is >= 0xFFFF
-    |          |    |     |    |     |  | |  |----> sending congestion window
-    |          |    |     |    |     |  | |-------> (ack.quick<<1)|ack.pingpong
-    |          |    |     |    |     |  |---------> Predicted tick of soft clock
-    |          |    |     |    |     |              (delayed ACK control data)
-    |          |    |     |    |     |------------> retransmit timeout
-    |          |    |     |    |------------------> location of socket in memory
-    |          |    |     |-----------------------> socket reference count
-    |          |    |-----------------------------> inode
-    |          |----------------------------------> unanswered 0-window probes
-    |---------------------------------------------> uid
+   1000        0 54165785 4 cd1e6040 25 4 27 3 -1 0
+    |          |    |     |    |     |  | |  | |  |--> number of drops
+    |          |    |     |    |     |  | |  | |-----> slow start size
+    |          |    |     |    |     |  | |  |         threshold, or -1 if the
+    |          |    |     |    |     |  | |  |         threshold is >= 0xFFFF
+    |          |    |     |    |     |  | |  |-------> sending congestion window
+    |          |    |     |    |     |  | |----------> (ack.quick<<1) |
+    |          |    |     |    |     |  |               ack.pingpong
+    |          |    |     |    |     |  |------------> Predicted tick of soft clock
+    |          |    |     |    |     |                 (delayed ACK control data)
+    |          |    |     |    |     |---------------> retransmit timeout
+    |          |    |     |    |---------------------> location of socket in memory
+    |          |    |     |--------------------------> socket reference count
+    |          |    |--------------------------------> inode
+    |          |-------------------------------------> unanswered 0-window probes
+    |------------------------------------------------> uid
 
 timer_active:
   0  no timer is pending
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 5af8b80..173e20a 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -2255,7 +2255,7 @@ static void get_tcp4_sock(struct sock *sk, struct seq_file *f, int i)
 		rx_queue = max_t(int, tp->rcv_nxt - tp->copied_seq, 0);
 
 	seq_printf(f, "%4d: %08X:%04X %08X:%04X %02X %08X:%08X %02X:%08lX "
-			"%08X %5u %8d %lu %d %pK %lu %lu %u %u %d",
+			"%08X %5u %8d %lu %d %pK %lu %lu %u %u %d %d",
 		i, src, srcp, dest, destp, state,
 		tp->write_seq - tp->snd_una,
 		rx_queue,
@@ -2272,7 +2272,8 @@ static void get_tcp4_sock(struct sock *sk, struct seq_file *f, int i)
 		tp->snd_cwnd,
 		state == TCP_LISTEN ?
 		    fastopenq->max_qlen :
-		    (tcp_in_initial_slowstart(tp) ? -1 : tp->snd_ssthresh));
+		    (tcp_in_initial_slowstart(tp) ? -1 : tp->snd_ssthresh),
+		state == TCP_LISTEN ? atomic_read(&sk->sk_drops) : 0);
 }
 
 static void get_timewait4_sock(const struct inet_timewait_sock *tw,
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index d79a1af..1cd296a 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1795,8 +1795,8 @@ static void get_tcp6_sock(struct seq_file *seq, struct sock *sp, int i)
 		rx_queue = max_t(int, tp->rcv_nxt - tp->copied_seq, 0);
 
 	seq_printf(seq,
-		   "%4d: %08X%08X%08X%08X:%04X %08X%08X%08X%08X:%04X "
-		   "%02X %08X:%08X %02X:%08lX %08X %5u %8d %lu %d %pK %lu %lu %u %u %d\n",
+		   "%4d: %08X%08X%08X%08X:%04X %08X%08X%08X%08X:%04X %02X "
+		   "%08X:%08X %02X:%08lX %08X %5u %8d %lu %d %pK %lu %lu %u %u %d %d\n",
 		   i,
 		   src->s6_addr32[0], src->s6_addr32[1],
 		   src->s6_addr32[2], src->s6_addr32[3], srcp,
@@ -1818,7 +1818,8 @@ static void get_tcp6_sock(struct seq_file *seq, struct sock *sp, int i)
 		   tp->snd_cwnd,
 		   state == TCP_LISTEN ?
 			fastopenq->max_qlen :
-			(tcp_in_initial_slowstart(tp) ? -1 : tp->snd_ssthresh)
+			(tcp_in_initial_slowstart(tp) ? -1 : tp->snd_ssthresh),
+		   state == TCP_LISTEN ? atomic_read(&sp->sk_drops) : 0
 		   );
 }
 
-- 
2.9.5

^ permalink raw reply related

* Re: [iproute PATCH v2 1/7] devlink: No need for this self-assignment
From: Phil Sutter @ 2017-08-18 10:20 UTC (permalink / raw)
  To: Jiri Pirko; +Cc: Stephen Hemminger, netdev
In-Reply-To: <20170817194850.GA24011@nanopsycho.orion>

On Thu, Aug 17, 2017 at 09:48:50PM +0200, Jiri Pirko wrote:
> Thu, Aug 17, 2017 at 07:09:25PM CEST, phil@nwl.cc wrote:
> >dl_argv_handle_both() will either assign to handle_bit or error out in
> >which case the variable is not used by the caller.
> 
> I'm pretty sure that I did this to silence the compiler. If the compiler
> bug is fixed now, good.

That might depend on the compiler you used, so maybe you just want to
give it a try in your environment? If it still happens, we can keep this
self-assignment of course since it shouldn't harm.

Thanks, Phil

^ permalink raw reply

* [PATCH] net: sunrpc: svcsock: fix NULL-pointer exception
From: Vadim Lomovtsev @ 2017-08-18 10:00 UTC (permalink / raw)
  To: trond.myklebust, anna.schumaker, bfields, jlayton, davem,
	linux-nfs, netdev, linux-kernel, pabeni
  Cc: Vadim Lomovtsev

While running nfs/connectathon tests kernel NULL-pointer exception
has been observed due to races in svcsock.c.

Race is appear when kernel accepts connection by kernel_accept
(which creates new socket) and start queuing ingress packets
to new socket. This happanes in ksoftirq context which concurrently
on a differnt core while new socket setup is not done yet.

The fix is to re-order socket user data init sequence, add NULL-ptr
check before callback call along with barriers to prevent kernel crash.

Test results: nfs/connectathon reports '0' failed tests for about 200+ iterations.

Crash log:
---<-snip->---
[ 6708.638984] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[ 6708.647093] pgd = ffff0000094e0000
[ 6708.650497] [00000000] *pgd=0000010ffff90003, *pud=0000010ffff90003, *pmd=0000010ffff80003, *pte=0000000000000000
[ 6708.660761] Internal error: Oops: 86000005 [#1] SMP
[ 6708.665630] Modules linked in: nfsv3 nfnetlink_queue nfnetlink_log nfnetlink rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache overlay xt_CONNSECMARK xt_SECMARK xt_conntrack iptable_security ip_tables ah4 xfrm4_mode_transport sctp tun binfmt_misc ext4 jbd2 mbcache loop tcp_diag udp_diag inet_diag rpcrdma ib_isert iscsi_target_mod ib_iser rdma_cm iw_cm libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_ucm ib_uverbs ib_umad ib_cm ib_core nls_koi8_u nls_cp932 ts_kmp nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack vfat fat ghash_ce sha2_ce sha1_ce cavium_rng_vf i2c_thunderx sg thunderx_edac i2c_smbus edac_core cavium_rng nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c nicvf nicpf ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt
  fb_sys_fops
[ 6708.736446]  ttm drm i2c_core thunder_bgx thunder_xcv mdio_thunder mdio_cavium dm_mirror dm_region_hash dm_log dm_mod [last unloaded: stap_3c300909c5b3f46dcacd49aab3334af_87021]
[ 6708.752275] CPU: 84 PID: 0 Comm: swapper/84 Tainted: G        W  OE   4.11.0-4.el7.aarch64 #1
[ 6708.760787] Hardware name: www.cavium.com CRB-2S/CRB-2S, BIOS 0.3 Mar 13 2017
[ 6708.767910] task: ffff810006842e80 task.stack: ffff81000689c000
[ 6708.773822] PC is at 0x0
[ 6708.776739] LR is at svc_data_ready+0x38/0x88 [sunrpc]
[ 6708.781866] pc : [<0000000000000000>] lr : [<ffff0000029d7378>] pstate: 60000145
[ 6708.789248] sp : ffff810ffbad3900
[ 6708.792551] x29: ffff810ffbad3900 x28: ffff000008c73d58
[ 6708.797853] x27: 0000000000000000 x26: ffff81000bbe1e00
[ 6708.803156] x25: 0000000000000020 x24: ffff800f7410bf28
[ 6708.808458] x23: ffff000008c63000 x22: ffff000008c63000
[ 6708.813760] x21: ffff800f7410bf28 x20: ffff81000bbe1e00
[ 6708.819063] x19: ffff810012412400 x18: 00000000d82a9df2
[ 6708.824365] x17: 0000000000000000 x16: 0000000000000000
[ 6708.829667] x15: 0000000000000000 x14: 0000000000000001
[ 6708.834969] x13: 0000000000000000 x12: 722e736f622e676e
[ 6708.840271] x11: 00000000f814dd99 x10: 0000000000000000
[ 6708.845573] x9 : 7374687225000000 x8 : 0000000000000000
[ 6708.850875] x7 : 0000000000000000 x6 : 0000000000000000
[ 6708.856177] x5 : 0000000000000028 x4 : 0000000000000000
[ 6708.861479] x3 : 0000000000000000 x2 : 00000000e5000000
[ 6708.866781] x1 : 0000000000000000 x0 : ffff81000bbe1e00
[ 6708.872084]
[ 6708.873565] Process swapper/84 (pid: 0, stack limit = 0xffff81000689c000)
[ 6708.880341] Stack: (0xffff810ffbad3900 to 0xffff8100068a0000)
[ 6708.886075] Call trace:
[ 6708.888513] Exception stack(0xffff810ffbad3710 to 0xffff810ffbad3840)
[ 6708.894942] 3700:                                   ffff810012412400 0001000000000000
[ 6708.902759] 3720: ffff810ffbad3900 0000000000000000 0000000060000145 ffff800f79300000
[ 6708.910577] 3740: ffff000009274d00 00000000000003ea 0000000000000015 ffff000008c63000
[ 6708.918395] 3760: ffff810ffbad3830 ffff800f79300000 000000000000004d 0000000000000000
[ 6708.926212] 3780: ffff810ffbad3890 ffff0000080f88dc ffff800f79300000 000000000000004d
[ 6708.934030] 37a0: ffff800f7930093c ffff000008c63000 0000000000000000 0000000000000140
[ 6708.941848] 37c0: ffff000008c2c000 0000000000040b00 ffff81000bbe1e00 0000000000000000
[ 6708.949665] 37e0: 00000000e5000000 0000000000000000 0000000000000000 0000000000000028
[ 6708.957483] 3800: 0000000000000000 0000000000000000 0000000000000000 7374687225000000
[ 6708.965300] 3820: 0000000000000000 00000000f814dd99 722e736f622e676e 0000000000000000
[ 6708.973117] [<          (null)>]           (null)
[ 6708.977824] [<ffff0000086f9fa4>] tcp_data_queue+0x754/0xc5c
[ 6708.983386] [<ffff0000086fa64c>] tcp_rcv_established+0x1a0/0x67c
[ 6708.989384] [<ffff000008704120>] tcp_v4_do_rcv+0x15c/0x22c
[ 6708.994858] [<ffff000008707418>] tcp_v4_rcv+0xaf0/0xb58
[ 6709.000077] [<ffff0000086df784>] ip_local_deliver_finish+0x10c/0x254
[ 6709.006419] [<ffff0000086dfea4>] ip_local_deliver+0xf0/0xfc
[ 6709.011980] [<ffff0000086dfad4>] ip_rcv_finish+0x208/0x3a4
[ 6709.017454] [<ffff0000086e018c>] ip_rcv+0x2dc/0x3c8
[ 6709.022328] [<ffff000008692fc8>] __netif_receive_skb_core+0x2f8/0xa0c
[ 6709.028758] [<ffff000008696068>] __netif_receive_skb+0x38/0x84
[ 6709.034580] [<ffff00000869611c>] netif_receive_skb_internal+0x68/0xdc
[ 6709.041010] [<ffff000008696bc0>] napi_gro_receive+0xcc/0x1a8
[ 6709.046690] [<ffff0000014b0fc4>] nicvf_cq_intr_handler+0x59c/0x730 [nicvf]
[ 6709.053559] [<ffff0000014b1380>] nicvf_poll+0x38/0xb8 [nicvf]
[ 6709.059295] [<ffff000008697a6c>] net_rx_action+0x2f8/0x464
[ 6709.064771] [<ffff000008081824>] __do_softirq+0x11c/0x308
[ 6709.070164] [<ffff0000080d14e4>] irq_exit+0x12c/0x174
[ 6709.075206] [<ffff00000813101c>] __handle_domain_irq+0x78/0xc4
[ 6709.081027] [<ffff000008081608>] gic_handle_irq+0x94/0x190
[ 6709.086501] Exception stack(0xffff81000689fdf0 to 0xffff81000689ff20)
[ 6709.092929] fde0:                                   0000810ff2ec0000 ffff000008c10000
[ 6709.100747] fe00: ffff000008c70ef4 0000000000000001 0000000000000000 ffff810ffbad9b18
[ 6709.108565] fe20: ffff810ffbad9c70 ffff8100169d3800 ffff810006843ab0 ffff81000689fe80
[ 6709.116382] fe40: 0000000000000bd0 0000ffffdf979cd0 183f5913da192500 0000ffff8a254ce4
[ 6709.124200] fe60: 0000ffff8a254b78 0000aaab10339808 0000000000000000 0000ffff8a0c2a50
[ 6709.132018] fe80: 0000ffffdf979b10 ffff000008d6d450 ffff000008c10000 ffff000008d6d000
[ 6709.139836] fea0: 0000000000000054 ffff000008cd3dbc 0000000000000000 0000000000000000
[ 6709.147653] fec0: 0000000000000000 0000000000000000 0000000000000000 ffff81000689ff20
[ 6709.155471] fee0: ffff000008085240 ffff81000689ff20 ffff000008085244 0000000060000145
[ 6709.163289] ff00: ffff81000689ff10 ffff00000813f1e4 ffffffffffffffff ffff00000813f238
[ 6709.171107] [<ffff000008082eb4>] el1_irq+0xb4/0x140
[ 6709.175976] [<ffff000008085244>] arch_cpu_idle+0x44/0x11c
[ 6709.181368] [<ffff0000087bf3b8>] default_idle_call+0x20/0x30
[ 6709.187020] [<ffff000008116d50>] do_idle+0x158/0x1e4
[ 6709.191973] [<ffff000008116ff4>] cpu_startup_entry+0x2c/0x30
[ 6709.197624] [<ffff00000808e7cc>] secondary_start_kernel+0x13c/0x160
[ 6709.203878] [<0000000001bc71c4>] 0x1bc71c4
[ 6709.207967] Code: bad PC value
[ 6709.211061] SMP: stopping secondary CPUs
[ 6709.218830] Starting crashdump kernel...
[ 6709.222749] Bye!
---<-snip>---

Signed-off-by: Vadim Lomovtsev <vlomovts@redhat.com>
---
 net/sunrpc/svcsock.c | 24 ++++++++++++++++++------
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 2b720fa..b6496f3 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -421,7 +421,9 @@ static void svc_data_ready(struct sock *sk)
 		dprintk("svc: socket %p(inet %p), busy=%d\n",
 			svsk, sk,
 			test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
-		svsk->sk_odata(sk);
+		rmb();
+		if (svsk->sk_odata)
+			svsk->sk_odata(sk);
 		if (!test_and_set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags))
 			svc_xprt_enqueue(&svsk->sk_xprt);
 	}
@@ -437,7 +439,9 @@ static void svc_write_space(struct sock *sk)
 	if (svsk) {
 		dprintk("svc: socket %p(inet %p), write_space busy=%d\n",
 			svsk, sk, test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
-		svsk->sk_owspace(sk);
+		rmb();
+		if (svsk->sk_owspace)
+			svsk->sk_owspace(sk);
 		svc_xprt_enqueue(&svsk->sk_xprt);
 	}
 }
@@ -760,8 +764,12 @@ static void svc_tcp_listen_data_ready(struct sock *sk)
 	dprintk("svc: socket %p TCP (listen) state change %d\n",
 		sk, sk->sk_state);
 
-	if (svsk)
-		svsk->sk_odata(sk);
+	if (svsk) { 
+		rmb();
+		if (svsk->sk_odata)
+			svsk->sk_odata(sk);
+	}
+
 	/*
 	 * This callback may called twice when a new connection
 	 * is established as a child socket inherits everything
@@ -794,7 +802,10 @@ static void svc_tcp_state_change(struct sock *sk)
 	if (!svsk)
 		printk("svc: socket %p: no user data\n", sk);
 	else {
-		svsk->sk_ostate(sk);
+		rmb();
+		if (svsk->sk_ostate)
+			svsk->sk_ostate(sk);
+
 		if (sk->sk_state != TCP_ESTABLISHED) {
 			set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags);
 			svc_xprt_enqueue(&svsk->sk_xprt);
@@ -1381,12 +1392,13 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *serv,
 		return ERR_PTR(err);
 	}
 
-	inet->sk_user_data = svsk;
 	svsk->sk_sock = sock;
 	svsk->sk_sk = inet;
 	svsk->sk_ostate = inet->sk_state_change;
 	svsk->sk_odata = inet->sk_data_ready;
 	svsk->sk_owspace = inet->sk_write_space;
+	wmb();
+	inet->sk_user_data = svsk;
 
 	/* Initialize the socket */
 	if (sock->type == SOCK_DGRAM)
-- 
1.8.3.1

^ permalink raw reply related

* Re: [PATCH] net: stmmac: socfgpa: Ensure emac bit set in sys manager for MII/GMII/SGMII.
From: Sergei Shtylyov @ 2017-08-18  9:58 UTC (permalink / raw)
  To: Stephan Gatzka, peppe.cavallaro, alexandre.torgue, netdev,
	linux-kernel
In-Reply-To: <1503039314-575-1-git-send-email-stephan.gatzka@gmail.com>

Hello!

On 8/18/2017 9:55 AM, Stephan Gatzka wrote:

> When using MII/GMII/SGMII in the Altera SoC, the phy needs to be
> wired through the FPGA. To ensure correct behavior, the appropriate
> bit in the System Manager FPGA Interface Group register needs to be
> set.
> 
> Signed-off-by: Stephan Gatzka <stephan.gatzka@gmail.com>
> ---
>   drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c | 5 ++++-
>   1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c
> index 17d4bba..d7c231b 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c
> @@ -269,7 +269,10 @@ static int socfpga_dwmac_set_phy_mode(struct socfpga_dwmac *dwmac)
>   	ctrl &= ~(SYSMGR_EMACGRP_CTRL_PHYSEL_MASK << reg_shift);
>   	ctrl |= val << reg_shift;
>   
> -	if (dwmac->f2h_ptp_ref_clk) {
> +	if ((dwmac->f2h_ptp_ref_clk) ||
> +	    (phymode == PHY_INTERFACE_MODE_MII) ||
> +	    (phymode == PHY_INTERFACE_MODE_GMII) ||
> +	    (phymode == PHY_INTERFACE_MODE_SGMII)) {

    Inner parens not needed at all (especially the first pair).

[...]

MBR, Sergei

^ permalink raw reply

* RE: [iproute PATCH v2 2/2] ifcfg: Quote left-hand side of [ ] expression
From: David Laight @ 2017-08-18  9:32 UTC (permalink / raw)
  To: 'Phil Sutter', Stephen Hemminger; +Cc: netdev@vger.kernel.org
In-Reply-To: <20170817170932.24900-3-phil@nwl.cc>

From: Phil Sutter
> Sent: 17 August 2017 18:10
> This prevents word-splitting and therefore leads to more accurate error
> message in case 'grep -c' prints something other than a number.
> 
> Signed-off-by: Phil Sutter <phil@nwl.cc>
> ---
>  ip/ifcfg | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/ip/ifcfg b/ip/ifcfg
> index 083d9df36742f..30a2dc49816cd 100644
> --- a/ip/ifcfg
> +++ b/ip/ifcfg
> @@ -131,7 +131,7 @@ noarp=$?
> 
>  ip route add unreachable 224.0.0.0/24 >& /dev/null
>  ip route add unreachable 255.255.255.255 >& /dev/null
> -if [ `ip link ls $dev | grep -c MULTICAST` -ge 1 ]; then
> +if [ "`ip link ls $dev | grep -c MULTICAST`" -ge 1 ]; then

You could drag all these scripts into the 1990's by using $(...)
instead of `...`.

	David

^ permalink raw reply

* RE: [iproute PATCH v2 4/5] tc/tc_filter: Make sure filter name is not empty
From: David Laight @ 2017-08-18  9:30 UTC (permalink / raw)
  To: 'Phil Sutter', Stephen Hemminger; +Cc: netdev@vger.kernel.org
In-Reply-To: <20170817170931.24545-5-phil@nwl.cc>

From: Phil Sutter
> Sent: 17 August 2017 18:10
> The later check for 'k[0] != 0' requires a non-empty filter name,
> otherwise NULL pointer dereference in 'q' might happen.
> 
> Signed-off-by: Phil Sutter <phil@nwl.cc>
> ---
>  tc/tc_filter.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/tc/tc_filter.c b/tc/tc_filter.c
> index b13fb9185d4fd..a799edb35886d 100644
> --- a/tc/tc_filter.c
> +++ b/tc/tc_filter.c
> @@ -412,6 +412,9 @@ static int tc_filter_get(int cmd, unsigned int flags, int argc, char **argv)
>  			usage();
>  			return 0;
>  		} else {
> +			if (!strlen(*argv))
> +				invarg("invalid filter name", *argv);

That is nearly as bad as:
	p[strlen(p)] = 0;

> +
>  			strncpy(k, *argv, sizeof(k)-1);
> 
>  			q = get_filter_kind(k);
> --
> 2.13.1

^ permalink raw reply

* Re: [PATCH v2 0/6] net: stmmac: Detect PHY location with phy-is-integrated
From: Corentin Labbe @ 2017-08-18  9:24 UTC (permalink / raw)
  To: robh+dt, mark.rutland, linux, maxime.ripard, wens,
	peppe.cavallaro, alexandre.torgue
  Cc: devicetree, linux-arm-kernel, linux-kernel, netdev
In-Reply-To: <20170817075149.16178-1-clabbe.montjoie@gmail.com>

On Thu, Aug 17, 2017 at 09:51:43AM +0200, Corentin Labbe wrote:
> Hello
> 
> The current way to find if the phy is internal is to compare DT phy-mode
> and emac_variant/internal_phy.
> But it will negate a possible future SoC where an external PHY use the
> same phy mode than the integrated one.
> 
> This patchs series adds a new way to find if the PHY is integrated, via
> the phy-is-integrated DT property.
> 
> Since it exists both integrated and external ethernet-phy@1, they are merged in
> the final DTB and so share all properties.
> For avoiding this, the phy-is-integrated is added only to board DT.
> 
> The first five patchs should go via the sunxi tree.
> the last one should go via the net tree.
> Note that this serie will need backporting the patch
> "Documentation: net: phy: Add phy-is-integrated binding" which is in net-next
> 
> Thanks
> Regards
> 
> Changes since v1:
> - Dropped phy-is-integrated documentation patch since another same patch was already merged
> - Moved phy-is-integrated from SoC dtsi to final board DT.
> 
> Corentin Labbe (6):
>   ARM: sun8i: orangepipc: Set phy-is-integrated to the internal phy node
>   ARM: sun8i: beelink-x2: Set phy-is-integrated to the internal phy node
>   ARM: sun8i: nanopi-neo: Set phy-is-integrated to the internal phy node
>   ARM: sun8i: orangepi-2: Set phy-is-integrated to the internal phy node
>   ARM: sun8i: orangepi-one: Set phy-is-integrated to the internal phy
>     node
>   net: stmmac: dwmac-sun8i: choose internal PHY via phy-is-integrated
> 
>  arch/arm/boot/dts/sun8i-h3-beelink-x2.dts         |  4 ++++
>  arch/arm/boot/dts/sun8i-h3-nanopi-neo.dts         |  4 ++++
>  arch/arm/boot/dts/sun8i-h3-orangepi-2.dts         |  4 ++++
>  arch/arm/boot/dts/sun8i-h3-orangepi-one.dts       |  4 ++++
>  arch/arm/boot/dts/sun8i-h3-orangepi-pc.dts        |  4 ++++
>  drivers/net/ethernet/stmicro/stmmac/dwmac-sun8i.c | 16 ++++++++--------
>  6 files changed, 28 insertions(+), 8 deletions(-)
> 
> -- 
> 2.13.0
> 

Hello

Self NACK, the comment from Rob on previous series is pertinent.
I will send a v3 which use a mdio-mux for solving all problems.

Regards

^ permalink raw reply

* RE: [iproute PATCH v2 2/3] iproute_lwtunnel: Argument to strerror must be positive
From: David Laight @ 2017-08-18  9:21 UTC (permalink / raw)
  To: 'Phil Sutter', Stephen Hemminger; +Cc: netdev@vger.kernel.org
In-Reply-To: <20170817170932.24812-3-phil@nwl.cc>

From: Phil Sutter
> Sent: 17 August 2017 18:10
> Signed-off-by: Phil Sutter <phil@nwl.cc>
> ---
>  ip/iproute_lwtunnel.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/ip/iproute_lwtunnel.c b/ip/iproute_lwtunnel.c
> index 398ab5e077ed8..1a3dc4d4c0ed9 100644
> --- a/ip/iproute_lwtunnel.c
> +++ b/ip/iproute_lwtunnel.c
> @@ -643,7 +643,7 @@ static int lwt_parse_bpf(struct rtattr *rta, size_t len,
>  	err = bpf_parse_common(bpf_type, &cfg, &bpf_cb_ops, &x);
>  	if (err < 0) {
>  		fprintf(stderr, "Failed to parse eBPF program: %s\n",
> -			strerror(err));
> +			strerror(-err));

If we are in userspace I'd expect errno values to be +ve.
Returning a -ve errno is very non-standard.

	David

^ permalink raw reply

* RE: [iproute PATCH v2 1/7] ipntable: Make sure filter.name is NULL-terminated
From: David Laight @ 2017-08-18  9:19 UTC (permalink / raw)
  To: 'Phil Sutter', Stephen Hemminger; +Cc: netdev@vger.kernel.org
In-Reply-To: <20170817170932.24659-2-phil@nwl.cc>

From: Phil Sutter
> Sent: 17 August 2017 18:09
> To: Stephen Hemminger
> Cc: netdev@vger.kernel.org
> Subject: [iproute PATCH v2 1/7] ipntable: Make sure filter.name is NULL-terminated
> 
> Signed-off-by: Phil Sutter <phil@nwl.cc>
> ---
>  ip/ipntable.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/ip/ipntable.c b/ip/ipntable.c
> index 879626ee4f491..7be1f04d33d90 100644
> --- a/ip/ipntable.c
> +++ b/ip/ipntable.c
> @@ -633,7 +633,8 @@ static int ipntable_show(int argc, char **argv)
>  		} else if (strcmp(*argv, "name") == 0) {
>  			NEXT_ARG();
> 
> -			strncpy(filter.name, *argv, sizeof(filter.name));
> +			strncpy(filter.name, *argv, sizeof(filter.name) - 1);
> +			filter.name[sizeof(filter.name) - 1] = '\0';

Why not check for overflow instead?
			if (filter.name[sizeof(filter.name) - 1])
				usage("filer name too long");

	David

^ permalink raw reply

* Re: Something hitting my total number of connections to the server
From: Akshat Kakkar @ 2017-08-18  9:14 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev
In-Reply-To: <1502969805.4936.154.camel@edumazet-glaptop3.roam.corp.google.com>

On Thu, Aug 17, 2017 at 5:06 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Thu, 2017-08-17 at 14:35 +0530, Akshat Kakkar wrote:
>
>> I upgraded to 4.4 but still experiencing same issue.
>> Please help.
>
> Still too old kernel, shoot again ;)
>
>


Sorry but that's the maximum I can try as of now as its the LT version.

Besides, this issue was not present in 2.6.32 but came with 3.10 and
still there in 4.4, so I doubt if it has to do with some kernel and/or
kernel parameters much as you guys are good enough not to keep an
issue for so long (around 3 years).

So please help.

^ permalink raw reply

* [PATCH v2 0/5] ARM: dts: rcar-gen2: Convert to new CPG/MSSR bindings
From: Geert Uytterhoeven @ 2017-08-18  9:11 UTC (permalink / raw)
  To: Simon Horman, Magnus Damm
  Cc: linux-renesas-soc, linux-clk, linux-arm-kernel, devicetree,
	Geert Uytterhoeven, Andrew Lunn, Florian Fainelli, netdev

	Hi Simon, Magnus,

Currently Renesas R-Car Gen2 SoCs use the common clk-rcar-gen2,
clk-mstp, and clk-div6 drivers, which depend on most clocks being
described in DT.  Especially the module (MSTP) clocks are cumbersome and
error prone, due to 3 arrays (clocks, clock-indices, and
clock-output-names) to be kept in sync. In addition, the clk-mstp driver
cannot be extended easily to also support module resets, which are
provided by the same hardware module.

Hence when developing support for R-Car Gen3 SoCs, another approach was
chosen, which led to the CPG/MSSR driver core, and SoC-specific
subdrivers (initially for R-Car Gen3, but later also for RZ/G1).

This series converts the various R-Car Gen2 DTSes to migrate to the new
CPG/MSSR drivers that were added in v4.13-rc1.

Note that module reset descriptions will be added later.

Changes compared to v1:
  - Rebased.

Dependencies:
  - renesas-devel-20170818-v4.13-rc5.

Known issues:
  - The CPG/MSSR driver is initialized later than the old clk-rcar-gen2
    driver, causing changes of initialization order for other drivers.

    Currently the PHY subsystem does not support probe deferral

	+irq: no irq domain found for /interrupt-controller@e61c0000 !

	-Micrel KSZ8041RNLI ee700000.ethernet-ffffffff:01: attached PHY driver [Micrel KSZ8041RNLI] (mii_bus:phy_addr=ee700000.ethernet-ffffffff:01, irq=182)
	+Micrel KSZ8041RNLI ee700000.ethernet-ffffffff:01: attached PHY driver [Micrel KSZ8041RNLI] (mii_bus:phy_addr=ee700000.ethernet-ffffffff:01, irq=-1)

    leading to the Ethernet PHY falling back to polling instead of using an
    interrupt.  This can be remedied by "of_mdio: Fix broken PHY IRQ in
    case of probe deferral" (https://patchwork.kernel.org/patch/9734175/),
    which is still pending approval.

    Fortunately the impact of this is limited to a small delay in the
    detection of cable (un)plugging.
    Note that when using the PHY interrupt, cable unplugging is reported
    instantaneous, but plugging takes ca. 1.5-1.8 seconds.  Hence when
    using polling with the standard interval of 1 second, the link comes up
    in about the same time.

For your convenience, this series is also available in the
topic/rcar2-cpg-mssr-dt-v2 branch of my renesas-drivers git repository at
git://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-drivers.git.

This has been tested on r8a7790/lager, r8a7791/koelsch, r8a7792/blanche,
r8a7793/gose, and r8a7794/alt.  /sys/kernel/debug/clk/clk_summary has
been compared before and after the conversion.

Thanks for applying!

Geert Uytterhoeven (5):
  ARM: dts: r8a7790: Convert to new CPG/MSSR bindings
  ARM: dts: r8a7791: Convert to new CPG/MSSR bindings
  ARM: dts: r8a7792: Convert to new CPG/MSSR bindings
  ARM: dts: r8a7793: Convert to new CPG/MSSR bindings
  ARM: dts: r8a7794: Convert to new CPG/MSSR bindings

 arch/arm/boot/dts/r8a7790-lager.dts   |   7 +-
 arch/arm/boot/dts/r8a7790.dtsi        | 557 ++++++----------------------------
 arch/arm/boot/dts/r8a7791-koelsch.dts |   4 +-
 arch/arm/boot/dts/r8a7791-porter.dts  |   4 +-
 arch/arm/boot/dts/r8a7791.dtsi        | 557 +++++++---------------------------
 arch/arm/boot/dts/r8a7792-blanche.dts |   3 +-
 arch/arm/boot/dts/r8a7792-wheat.dts   |   3 +-
 arch/arm/boot/dts/r8a7792.dtsi        | 333 ++++----------------
 arch/arm/boot/dts/r8a7793-gose.dts    |   4 +-
 arch/arm/boot/dts/r8a7793.dtsi        | 459 +++++-----------------------
 arch/arm/boot/dts/r8a7794-alt.dts     |   3 +-
 arch/arm/boot/dts/r8a7794-silk.dts    |   3 +-
 arch/arm/boot/dts/r8a7794.dtsi        | 528 +++++---------------------------
 13 files changed, 430 insertions(+), 2035 deletions(-)

-- 
2.7.4

Gr{oetje,eeting}s,

						Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
							    -- Linus Torvalds

^ permalink raw reply

* [PATCH 4/6] dpaa_eth: add NETIF_F_RXHASH
From: Madalin Bucur @ 2017-08-18  8:56 UTC (permalink / raw)
  To: netdev, davem; +Cc: linuxppc-dev, linux-kernel
In-Reply-To: <1503046588-24349-1-git-send-email-madalin.bucur@nxp.com>

Set the skb hash when then FMan Keygen hash result is available.

Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
---
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.c     | 19 ++++++++++++++++---
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.h     |  1 +
 drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c |  9 +++++++--
 drivers/net/ethernet/freescale/fman/fman_port.c    | 11 +++++++++++
 drivers/net/ethernet/freescale/fman/fman_port.h    |  2 ++
 5 files changed, 37 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
index 6d89e74..ef30038 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
@@ -236,7 +236,7 @@ static int dpaa_netdev_init(struct net_device *net_dev,
 	net_dev->max_mtu = dpaa_get_max_mtu();
 
 	net_dev->hw_features |= (NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM |
-				 NETIF_F_LLTX);
+				 NETIF_F_LLTX | NETIF_F_RXHASH);
 
 	net_dev->hw_features |= NETIF_F_SG | NETIF_F_HIGHDMA;
 	/* The kernels enables GSO automatically, if we declare NETIF_F_SG.
@@ -2237,12 +2237,13 @@ static enum qman_cb_dqrr_result rx_default_dqrr(struct qman_portal *portal,
 	dma_addr_t addr = qm_fd_addr(fd);
 	enum qm_fd_format fd_format;
 	struct net_device *net_dev;
-	u32 fd_status;
+	u32 fd_status, hash_offset;
 	struct dpaa_bp *dpaa_bp;
 	struct dpaa_priv *priv;
 	unsigned int skb_len;
 	struct sk_buff *skb;
 	int *count_ptr;
+	void *vaddr;
 
 	fd_status = be32_to_cpu(fd->status);
 	fd_format = qm_fd_get_format(fd);
@@ -2288,7 +2289,8 @@ static enum qman_cb_dqrr_result rx_default_dqrr(struct qman_portal *portal,
 	dma_unmap_single(dpaa_bp->dev, addr, dpaa_bp->size, DMA_FROM_DEVICE);
 
 	/* prefetch the first 64 bytes of the frame or the SGT start */
-	prefetch(phys_to_virt(addr) + qm_fd_get_offset(fd));
+	vaddr = phys_to_virt(addr);
+	prefetch(vaddr + qm_fd_get_offset(fd));
 
 	fd_format = qm_fd_get_format(fd);
 	/* The only FD types that we may receive are contig and S/G */
@@ -2309,6 +2311,14 @@ static enum qman_cb_dqrr_result rx_default_dqrr(struct qman_portal *portal,
 
 	skb->protocol = eth_type_trans(skb, net_dev);
 
+	if (net_dev->features & NETIF_F_RXHASH && priv->keygen_in_use &&
+	    !fman_port_get_hash_result_offset(priv->mac_dev->port[RX],
+					      &hash_offset))
+		skb_set_hash(skb, be32_to_cpu(*(u32 *)(vaddr + hash_offset)),
+			     // if L4 exists, it was used in the hash generation
+			     be32_to_cpu(fd->status) & FM_FD_STAT_L4CV ?
+				PKT_HASH_TYPE_L4 : PKT_HASH_TYPE_L3);
+
 	skb_len = skb->len;
 
 	if (unlikely(netif_receive_skb(skb) == NET_RX_DROP))
@@ -2774,6 +2784,9 @@ static int dpaa_eth_probe(struct platform_device *pdev)
 	if (err)
 		goto init_ports_failed;
 
+	/* Rx traffic distribution based on keygen hashing defaults to on */
+	priv->keygen_in_use = true;
+
 	priv->percpu_priv = devm_alloc_percpu(dev, *priv->percpu_priv);
 	if (!priv->percpu_priv) {
 		dev_err(dev, "devm_alloc_percpu() failed\n");
diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h
index 496a12c..bd94220 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h
@@ -159,6 +159,7 @@ struct dpaa_priv {
 	struct list_head dpaa_fq_list;
 
 	u8 num_tc;
+	bool keygen_in_use;
 	u32 msg_enable;	/* net_device message level */
 
 	struct {
diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c b/drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c
index 965f652..faea674 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c
@@ -402,6 +402,8 @@ static void dpaa_get_strings(struct net_device *net_dev, u32 stringset,
 static int dpaa_get_hash_opts(struct net_device *dev,
 			      struct ethtool_rxnfc *cmd)
 {
+	struct dpaa_priv *priv = netdev_priv(dev);
+
 	cmd->data = 0;
 
 	switch (cmd->flow_type) {
@@ -409,7 +411,8 @@ static int dpaa_get_hash_opts(struct net_device *dev,
 	case TCP_V6_FLOW:
 	case UDP_V4_FLOW:
 	case UDP_V6_FLOW:
-		cmd->data |= RXH_L4_B_0_1 | RXH_L4_B_2_3;
+		if (priv->keygen_in_use)
+			cmd->data |= RXH_L4_B_0_1 | RXH_L4_B_2_3;
 		/* Fall through */
 	case IPV4_FLOW:
 	case IPV6_FLOW:
@@ -421,7 +424,8 @@ static int dpaa_get_hash_opts(struct net_device *dev,
 	case AH_V6_FLOW:
 	case ESP_V4_FLOW:
 	case ESP_V6_FLOW:
-		cmd->data |= RXH_IP_SRC | RXH_IP_DST;
+		if (priv->keygen_in_use)
+			cmd->data |= RXH_IP_SRC | RXH_IP_DST;
 		break;
 	default:
 		cmd->data = 0;
@@ -458,6 +462,7 @@ static void dpaa_set_hash(struct net_device *net_dev, bool enable)
 	rxport = mac_dev->port[0];
 
 	fman_port_use_kg_hash(rxport, enable);
+	priv->keygen_in_use = enable;
 }
 
 static int dpaa_set_hash_opts(struct net_device *dev,
diff --git a/drivers/net/ethernet/freescale/fman/fman_port.c b/drivers/net/ethernet/freescale/fman/fman_port.c
index b0ad9c4..451bae7 100644
--- a/drivers/net/ethernet/freescale/fman/fman_port.c
+++ b/drivers/net/ethernet/freescale/fman/fman_port.c
@@ -1720,6 +1720,17 @@ u32 fman_port_get_qman_channel_id(struct fman_port *port)
 }
 EXPORT_SYMBOL(fman_port_get_qman_channel_id);
 
+int fman_port_get_hash_result_offset(struct fman_port *port, u32 *offset)
+{
+	if (port->buffer_offsets.hash_result_offset == ILLEGAL_BASE)
+		return -EINVAL;
+
+	*offset = port->buffer_offsets.hash_result_offset;
+
+	return 0;
+}
+EXPORT_SYMBOL(fman_port_get_hash_result_offset);
+
 static int fman_port_probe(struct platform_device *of_dev)
 {
 	struct fman_port *port;
diff --git a/drivers/net/ethernet/freescale/fman/fman_port.h b/drivers/net/ethernet/freescale/fman/fman_port.h
index 5a99611..e86ca6a 100644
--- a/drivers/net/ethernet/freescale/fman/fman_port.h
+++ b/drivers/net/ethernet/freescale/fman/fman_port.h
@@ -151,6 +151,8 @@ int fman_port_enable(struct fman_port *port);
 
 u32 fman_port_get_qman_channel_id(struct fman_port *port);
 
+int fman_port_get_hash_result_offset(struct fman_port *port, u32 *offset);
+
 struct fman_port *fman_port_bind(struct device *dev);
 
 #endif /* __FMAN_PORT_H */
-- 
2.1.0

^ permalink raw reply related

* [PATCH 6/6] dpaa_eth: check allocation result
From: Madalin Bucur @ 2017-08-18  8:56 UTC (permalink / raw)
  To: netdev, davem; +Cc: linuxppc-dev, linux-kernel
In-Reply-To: <1503046588-24349-1-git-send-email-madalin.bucur@nxp.com>

Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
---
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
index ef30038..ff7f153 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
@@ -2557,6 +2557,9 @@ static struct dpaa_bp *dpaa_bp_alloc(struct device *dev)
 
 	dpaa_bp->bpid = FSL_DPAA_BPID_INV;
 	dpaa_bp->percpu_count = devm_alloc_percpu(dev, *dpaa_bp->percpu_count);
+	if (!dpaa_bp->percpu_count)
+		return ERR_PTR(-ENOMEM);
+
 	dpaa_bp->config_count = FSL_DPAA_ETH_MAX_BUF_COUNT;
 
 	dpaa_bp->seed_cb = dpaa_bp_seed;
-- 
2.1.0

^ permalink raw reply related

* [PATCH 5/6] Documentation: networking: add RSS information
From: Madalin Bucur @ 2017-08-18  8:56 UTC (permalink / raw)
  To: netdev, davem; +Cc: linuxppc-dev, linux-kernel
In-Reply-To: <1503046588-24349-1-git-send-email-madalin.bucur@nxp.com>

Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
---
 Documentation/networking/dpaa.txt | 68 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 67 insertions(+), 1 deletion(-)

diff --git a/Documentation/networking/dpaa.txt b/Documentation/networking/dpaa.txt
index 76e016d..f88194f 100644
--- a/Documentation/networking/dpaa.txt
+++ b/Documentation/networking/dpaa.txt
@@ -13,6 +13,7 @@ Contents
 	- Configuring DPAA Ethernet in your kernel
 	- DPAA Ethernet Frame Processing
 	- DPAA Ethernet Features
+	- DPAA IRQ Affinity and Receive Side Scaling
 	- Debugging
 
 DPAA Ethernet Overview
@@ -147,7 +148,10 @@ gradually.
 
 The driver has Rx and Tx checksum offloading for UDP and TCP. Currently the Rx
 checksum offload feature is enabled by default and cannot be controlled through
-ethtool.
+ethtool. Also, rx-flow-hash and rx-hashing was added. The addition of RSS
+provides a big performance boost for the forwarding scenarios, allowing
+different traffic flows received by one interface to be processed by different
+CPUs in parallel.
 
 The driver has support for multiple prioritized Tx traffic classes. Priorities
 range from 0 (lowest) to 3 (highest). These are mapped to HW workqueues with
@@ -166,6 +170,68 @@ classes as follows:
 tc qdisc add dev <int> root handle 1: \
 	 mqprio num_tc 4 map 0 0 0 0 1 1 1 1 2 2 2 2 3 3 3 3 hw 1
 
+DPAA IRQ Affinity and Receive Side Scaling
+==========================================
+
+Traffic coming on the DPAA Rx queues or on the DPAA Tx confirmation
+queues is seen by the CPU as ingress traffic on a certain portal.
+The DPAA QMan portal interrupts are affined each to a certain CPU.
+The same portal interrupt services all the QMan portal consumers.
+
+By default the DPAA Ethernet driver enables RSS, making use of the
+DPAA FMan Parser and Keygen blocks to distribute traffic on 128
+hardware frame queues using a hash on IP v4/v6 source and destination
+and L4 source and destination ports, in present in the received frame.
+When RSS is disabled, all traffic received by a certain interface is
+received on the default Rx frame queue. The default DPAA Rx frame
+queues are configured to put the received traffic into a pool channel
+that allows any available CPU portal to dequeue the ingress traffic.
+The default frame queues have the HOLDACTIVE option set, ensuring that
+traffic bursts from a certain queue are serviced by the same CPU.
+This ensures a very low rate of frame reordering. A drawback of this
+is that only one CPU at a time can service the traffic received by a
+certain interface when RSS is not enabled.
+
+To implement RSS, the DPAA Ethernet driver allocates an extra set of
+128 Rx frame queues that are configured to dedicated channels, in a
+round-robin manner. The mapping of the frame queues to CPUs is now
+hardcoded, there is no indirection table to move traffic for a certain
+FQ (hash result) to another CPU. The ingress traffic arriving on one
+of these frame queues will arrive at the same portal and will always
+be processed by the same CPU. This ensures intra-flow order preservation
+and workload distribution for multiple traffic flows.
+
+RSS can be turned off for a certain interface using ethtool, i.e.
+
+	# ethtool -N fm1-mac9 rx-flow-hash tcp4 ""
+
+To turn it back on, one needs to set rx-flow-hash for tcp4/6 or udp4/6:
+
+	# ethtool -N fm1-mac9 rx-flow-hash udp4 sfdn
+
+There is no independent control for individual protocols, any command
+run for one of tcp4|udp4|ah4|esp4|sctp4|tcp6|udp6|ah6|esp6|sctp6 is
+going to control the rx-flow-hashing for all protocols on that interface.
+
+Besides using the FMan Keygen computed hash for spreading traffic on the
+128 Rx FQs, the DPAA Ethernet driver also sets the skb hash value when
+the NETIF_F_RXHASH feature is on (active by default). This can be turned
+on or off through ethtool, i.e.:
+
+	# ethtool -K fm1-mac9 rx-hashing off
+	# ethtool -k fm1-mac9 | grep hash
+	receive-hashing: off
+	# ethtool -K fm1-mac9 rx-hashing on
+	Actual changes:
+	receive-hashing: on
+	# ethtool -k fm1-mac9 | grep hash
+	receive-hashing: on
+
+Please note that Rx hashing depends upon the rx-flow-hashing being on
+for that interface - turning off rx-flow-hashing will also disable the
+rx-hashing (without ethtool reporting it as off as that depends on the
+NETIF_F_RXHASH feature flag).
+
 Debugging
 =========
 
-- 
2.1.0

^ permalink raw reply related

* [PATCH 3/6] dpaa_eth: enable Rx hashing control
From: Madalin Bucur @ 2017-08-18  8:56 UTC (permalink / raw)
  To: netdev, davem; +Cc: linuxppc-dev, linux-kernel
In-Reply-To: <1503046588-24349-1-git-send-email-madalin.bucur@nxp.com>

Allow ethtool control of the Rx flow hashing. By default RSS is
enabled, this allows to turn it off by bypassing the FMan Keygen
block and sending all traffic on the default Rx frame queue.

Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
---
 drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c | 113 +++++++++++++++++++++
 1 file changed, 113 insertions(+)

diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c b/drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c
index aad825088..965f652 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c
@@ -399,6 +399,117 @@ static void dpaa_get_strings(struct net_device *net_dev, u32 stringset,
 	memcpy(strings, dpaa_stats_global, size);
 }
 
+static int dpaa_get_hash_opts(struct net_device *dev,
+			      struct ethtool_rxnfc *cmd)
+{
+	cmd->data = 0;
+
+	switch (cmd->flow_type) {
+	case TCP_V4_FLOW:
+	case TCP_V6_FLOW:
+	case UDP_V4_FLOW:
+	case UDP_V6_FLOW:
+		cmd->data |= RXH_L4_B_0_1 | RXH_L4_B_2_3;
+		/* Fall through */
+	case IPV4_FLOW:
+	case IPV6_FLOW:
+	case SCTP_V4_FLOW:
+	case SCTP_V6_FLOW:
+	case AH_ESP_V4_FLOW:
+	case AH_ESP_V6_FLOW:
+	case AH_V4_FLOW:
+	case AH_V6_FLOW:
+	case ESP_V4_FLOW:
+	case ESP_V6_FLOW:
+		cmd->data |= RXH_IP_SRC | RXH_IP_DST;
+		break;
+	default:
+		cmd->data = 0;
+		break;
+	}
+
+	return 0;
+}
+
+static int dpaa_get_rxnfc(struct net_device *dev, struct ethtool_rxnfc *cmd,
+			  u32 *unused)
+{
+	int ret = -EOPNOTSUPP;
+
+	switch (cmd->cmd) {
+	case ETHTOOL_GRXFH:
+		ret = dpaa_get_hash_opts(dev, cmd);
+		break;
+	default:
+		break;
+	}
+
+	return ret;
+}
+
+static void dpaa_set_hash(struct net_device *net_dev, bool enable)
+{
+	struct mac_device *mac_dev;
+	struct fman_port *rxport;
+	struct dpaa_priv *priv;
+
+	priv = netdev_priv(net_dev);
+	mac_dev = priv->mac_dev;
+	rxport = mac_dev->port[0];
+
+	fman_port_use_kg_hash(rxport, enable);
+}
+
+static int dpaa_set_hash_opts(struct net_device *dev,
+			      struct ethtool_rxnfc *nfc)
+{
+	int ret = -EINVAL;
+
+	/* we support hashing on IPv4/v6 src/dest IP and L4 src/dest port */
+	if (nfc->data &
+	    ~(RXH_IP_SRC | RXH_IP_DST | RXH_L4_B_0_1 | RXH_L4_B_2_3))
+		return -EINVAL;
+
+	switch (nfc->flow_type) {
+	case TCP_V4_FLOW:
+	case TCP_V6_FLOW:
+	case UDP_V4_FLOW:
+	case UDP_V6_FLOW:
+	case IPV4_FLOW:
+	case IPV6_FLOW:
+	case SCTP_V4_FLOW:
+	case SCTP_V6_FLOW:
+	case AH_ESP_V4_FLOW:
+	case AH_ESP_V6_FLOW:
+	case AH_V4_FLOW:
+	case AH_V6_FLOW:
+	case ESP_V4_FLOW:
+	case ESP_V6_FLOW:
+		dpaa_set_hash(dev, !!nfc->data);
+		ret = 0;
+		break;
+	default:
+		break;
+	}
+
+	return ret;
+}
+
+static int dpaa_set_rxnfc(struct net_device *dev, struct ethtool_rxnfc *cmd)
+{
+	int ret = -EOPNOTSUPP;
+
+	switch (cmd->cmd) {
+	case ETHTOOL_SRXFH:
+		ret = dpaa_set_hash_opts(dev, cmd);
+		break;
+	default:
+		break;
+	}
+
+	return ret;
+}
+
 const struct ethtool_ops dpaa_ethtool_ops = {
 	.get_drvinfo = dpaa_get_drvinfo,
 	.get_msglevel = dpaa_get_msglevel,
@@ -412,4 +523,6 @@ const struct ethtool_ops dpaa_ethtool_ops = {
 	.get_strings = dpaa_get_strings,
 	.get_link_ksettings = dpaa_get_link_ksettings,
 	.set_link_ksettings = dpaa_set_link_ksettings,
+	.get_rxnfc = dpaa_get_rxnfc,
+	.set_rxnfc = dpaa_set_rxnfc,
 };
-- 
2.1.0

^ permalink raw reply related

* [PATCH 2/6] dpaa_eth: use multiple Rx frame queues
From: Madalin Bucur @ 2017-08-18  8:56 UTC (permalink / raw)
  To: netdev, davem; +Cc: linuxppc-dev, linux-kernel
In-Reply-To: <1503046588-24349-1-git-send-email-madalin.bucur@nxp.com>

Add a block of 128 Rx frame queues per port. The FMan hardware will
send traffic on one of these queues based on the FMan port Parse
Classify Distribute setup. The hash computed by the FMan Keygen
block will select the Rx FQ.

Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
---
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.c     | 50 +++++++++++++++++++---
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.h     |  1 +
 .../net/ethernet/freescale/dpaa/dpaa_eth_sysfs.c   |  3 ++
 3 files changed, 47 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
index c7fa285..6d89e74 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
@@ -158,7 +158,7 @@ MODULE_PARM_DESC(tx_timeout, "The Tx timeout in ms");
 #define DPAA_RX_PRIV_DATA_SIZE	(u16)(DPAA_TX_PRIV_DATA_SIZE + \
 					dpaa_rx_extra_headroom)
 
-#define DPAA_ETH_RX_QUEUES	128
+#define DPAA_ETH_PCD_RXQ_NUM	128
 
 #define DPAA_ENQUEUE_RETRIES	100000
 
@@ -169,6 +169,7 @@ struct fm_port_fqs {
 	struct dpaa_fq *tx_errq;
 	struct dpaa_fq *rx_defq;
 	struct dpaa_fq *rx_errq;
+	struct dpaa_fq *rx_pcdq;
 };
 
 /* All the dpa bps in use at any moment */
@@ -628,6 +629,7 @@ static inline void dpaa_assign_wq(struct dpaa_fq *fq, int idx)
 		fq->wq = 5;
 		break;
 	case FQ_TYPE_RX_DEFAULT:
+	case FQ_TYPE_RX_PCD:
 		fq->wq = 6;
 		break;
 	case FQ_TYPE_TX:
@@ -688,6 +690,7 @@ static int dpaa_alloc_all_fqs(struct device *dev, struct list_head *list,
 			      struct fm_port_fqs *port_fqs)
 {
 	struct dpaa_fq *dpaa_fq;
+	u32 fq_base, fq_base_aligned, i;
 
 	dpaa_fq = dpaa_fq_alloc(dev, 0, 1, list, FQ_TYPE_RX_ERROR);
 	if (!dpaa_fq)
@@ -701,6 +704,26 @@ static int dpaa_alloc_all_fqs(struct device *dev, struct list_head *list,
 
 	port_fqs->rx_defq = &dpaa_fq[0];
 
+	/* the PCD FQIDs range needs to be aligned for correct operation */
+	if (qman_alloc_fqid_range(&fq_base, 2 * DPAA_ETH_PCD_RXQ_NUM))
+		goto fq_alloc_failed;
+
+	fq_base_aligned = ALIGN(fq_base, DPAA_ETH_PCD_RXQ_NUM);
+
+	for (i = fq_base; i < fq_base_aligned; i++)
+		qman_release_fqid(i);
+
+	for (i = fq_base_aligned + DPAA_ETH_PCD_RXQ_NUM;
+	     i < (fq_base + 2 * DPAA_ETH_PCD_RXQ_NUM); i++)
+		qman_release_fqid(i);
+
+	dpaa_fq = dpaa_fq_alloc(dev, fq_base_aligned, DPAA_ETH_PCD_RXQ_NUM,
+				list, FQ_TYPE_RX_PCD);
+	if (!dpaa_fq)
+		goto fq_alloc_failed;
+
+	port_fqs->rx_pcdq = &dpaa_fq[0];
+
 	if (!dpaa_fq_alloc(dev, 0, DPAA_ETH_TXQ_NUM, list, FQ_TYPE_TX_CONF_MQ))
 		goto fq_alloc_failed;
 
@@ -870,13 +893,14 @@ static void dpaa_fq_setup(struct dpaa_priv *priv,
 			  const struct dpaa_fq_cbs *fq_cbs,
 			  struct fman_port *tx_port)
 {
-	int egress_cnt = 0, conf_cnt = 0, num_portals = 0, cpu;
+	int egress_cnt = 0, conf_cnt = 0, num_portals = 0, portal_cnt = 0, cpu;
 	const cpumask_t *affine_cpus = qman_affine_cpus();
-	u16 portals[NR_CPUS];
+	u16 channels[NR_CPUS];
 	struct dpaa_fq *fq;
 
 	for_each_cpu(cpu, affine_cpus)
-		portals[num_portals++] = qman_affine_channel(cpu);
+		channels[num_portals++] = qman_affine_channel(cpu);
+
 	if (num_portals == 0)
 		dev_err(priv->net_dev->dev.parent,
 			"No Qman software (affine) channels found");
@@ -890,6 +914,12 @@ static void dpaa_fq_setup(struct dpaa_priv *priv,
 		case FQ_TYPE_RX_ERROR:
 			dpaa_setup_ingress(priv, fq, &fq_cbs->rx_errq);
 			break;
+		case FQ_TYPE_RX_PCD:
+			if (!num_portals)
+				continue;
+			dpaa_setup_ingress(priv, fq, &fq_cbs->rx_defq);
+			fq->channel = channels[portal_cnt++ % num_portals];
+			break;
 		case FQ_TYPE_TX:
 			dpaa_setup_egress(priv, fq, tx_port,
 					  &fq_cbs->egress_ern);
@@ -1039,7 +1069,8 @@ static int dpaa_fq_init(struct dpaa_fq *dpaa_fq, bool td_enable)
 		/* Put all the ingress queues in our "ingress CGR". */
 		if (priv->use_ingress_cgr &&
 		    (dpaa_fq->fq_type == FQ_TYPE_RX_DEFAULT ||
-		     dpaa_fq->fq_type == FQ_TYPE_RX_ERROR)) {
+		     dpaa_fq->fq_type == FQ_TYPE_RX_ERROR ||
+		     dpaa_fq->fq_type == FQ_TYPE_RX_PCD)) {
 			initfq.we_mask |= cpu_to_be16(QM_INITFQ_WE_CGID);
 			initfq.fqd.fq_ctrl |= cpu_to_be16(QM_FQCTRL_CGE);
 			initfq.fqd.cgid = (u8)priv->ingress_cgr.cgrid;
@@ -1170,7 +1201,7 @@ static int dpaa_eth_init_tx_port(struct fman_port *port, struct dpaa_fq *errq,
 
 static int dpaa_eth_init_rx_port(struct fman_port *port, struct dpaa_bp **bps,
 				 size_t count, struct dpaa_fq *errq,
-				 struct dpaa_fq *defq,
+				 struct dpaa_fq *defq, struct dpaa_fq *pcdq,
 				 struct dpaa_buffer_layout *buf_layout)
 {
 	struct fman_buffer_prefix_content buf_prefix_content;
@@ -1190,6 +1221,10 @@ static int dpaa_eth_init_rx_port(struct fman_port *port, struct dpaa_bp **bps,
 	rx_p = &params.specific_params.rx_params;
 	rx_p->err_fqid = errq->fqid;
 	rx_p->dflt_fqid = defq->fqid;
+	if (pcdq) {
+		rx_p->pcd_base_fqid = pcdq->fqid;
+		rx_p->pcd_fqs_count = DPAA_ETH_PCD_RXQ_NUM;
+	}
 
 	count = min(ARRAY_SIZE(rx_p->ext_buf_pools.ext_buf_pool), count);
 	rx_p->ext_buf_pools.num_of_pools_used = (u8)count;
@@ -1234,7 +1269,8 @@ static int dpaa_eth_init_ports(struct mac_device *mac_dev,
 		return err;
 
 	err = dpaa_eth_init_rx_port(rxport, bps, count, port_fqs->rx_errq,
-				    port_fqs->rx_defq, &buf_layout[RX]);
+				    port_fqs->rx_defq, port_fqs->rx_pcdq,
+				    &buf_layout[RX]);
 
 	return err;
 }
diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h
index 9941a78..496a12c 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.h
@@ -52,6 +52,7 @@
 enum dpaa_fq_type {
 	FQ_TYPE_RX_DEFAULT = 1, /* Rx Default FQs */
 	FQ_TYPE_RX_ERROR,	/* Rx Error FQs */
+	FQ_TYPE_RX_PCD,		/* Rx Parse Classify Distribute FQs */
 	FQ_TYPE_TX,		/* "Real" Tx FQs */
 	FQ_TYPE_TX_CONFIRM,	/* Tx default Conf FQ (actually an Rx FQ) */
 	FQ_TYPE_TX_CONF_MQ,	/* Tx conf FQs (one for each Tx FQ) */
diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth_sysfs.c b/drivers/net/ethernet/freescale/dpaa/dpaa_eth_sysfs.c
index ec75d1c..0d9b185 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth_sysfs.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth_sysfs.c
@@ -71,6 +71,9 @@ static ssize_t dpaa_eth_show_fqids(struct device *dev,
 		case FQ_TYPE_RX_ERROR:
 			str = "Rx error";
 			break;
+		case FQ_TYPE_RX_PCD:
+			str = "Rx PCD";
+			break;
 		case FQ_TYPE_TX_CONFIRM:
 			str = "Tx default confirmation";
 			break;
-- 
2.1.0

^ permalink raw reply related

* [PATCH 1/6] fsl/fman: enable FMan Keygen
From: Madalin Bucur @ 2017-08-18  8:56 UTC (permalink / raw)
  To: netdev, davem; +Cc: linuxppc-dev, linux-kernel
In-Reply-To: <1503046588-24349-1-git-send-email-madalin.bucur@nxp.com>

From: Iordache Florinel-R70177 <florinel.iordache@nxp.com>

Add support for the FMan Keygen with a hardcoded scheme to spread
incoming traffic on a FQ range based on source and destination IPs
and ports.

Signed-off-by: Iordache Florinel <florinel.iordache@nxp.com>
Signed-off-by: Madalin Bucur <madalin.bucur@nxp.com>
---
 drivers/net/ethernet/freescale/fman/Makefile      |   2 +-
 drivers/net/ethernet/freescale/fman/fman.c        |  26 +
 drivers/net/ethernet/freescale/fman/fman.h        |   2 +
 drivers/net/ethernet/freescale/fman/fman_keygen.c | 783 ++++++++++++++++++++++
 drivers/net/ethernet/freescale/fman/fman_keygen.h |  46 ++
 drivers/net/ethernet/freescale/fman/fman_port.c   |  40 +-
 drivers/net/ethernet/freescale/fman/fman_port.h   |   5 +
 7 files changed, 902 insertions(+), 2 deletions(-)
 create mode 100644 drivers/net/ethernet/freescale/fman/fman_keygen.c
 create mode 100644 drivers/net/ethernet/freescale/fman/fman_keygen.h

diff --git a/drivers/net/ethernet/freescale/fman/Makefile b/drivers/net/ethernet/freescale/fman/Makefile
index 6049177..2c38119 100644
--- a/drivers/net/ethernet/freescale/fman/Makefile
+++ b/drivers/net/ethernet/freescale/fman/Makefile
@@ -4,6 +4,6 @@ obj-$(CONFIG_FSL_FMAN) += fsl_fman.o
 obj-$(CONFIG_FSL_FMAN) += fsl_fman_port.o
 obj-$(CONFIG_FSL_FMAN) += fsl_mac.o
 
-fsl_fman-objs	:= fman_muram.o fman.o fman_sp.o
+fsl_fman-objs	:= fman_muram.o fman.o fman_sp.o fman_keygen.o
 fsl_fman_port-objs := fman_port.o
 fsl_mac-objs:= mac.o fman_dtsec.o fman_memac.o fman_tgec.o
diff --git a/drivers/net/ethernet/freescale/fman/fman.c b/drivers/net/ethernet/freescale/fman/fman.c
index e714b8f..491a5ac 100644
--- a/drivers/net/ethernet/freescale/fman/fman.c
+++ b/drivers/net/ethernet/freescale/fman/fman.c
@@ -34,6 +34,7 @@
 
 #include "fman.h"
 #include "fman_muram.h"
+#include "fman_keygen.h"
 
 #include <linux/fsl/guts.h>
 #include <linux/slab.h>
@@ -56,6 +57,7 @@
 /* Modules registers offsets */
 #define BMI_OFFSET		0x00080000
 #define QMI_OFFSET		0x00080400
+#define KG_OFFSET		0x000C1000
 #define DMA_OFFSET		0x000C2000
 #define FPM_OFFSET		0x000C3000
 #define IMEM_OFFSET		0x000C4000
@@ -617,6 +619,7 @@ struct fman {
 	struct fman_qmi_regs __iomem *qmi_regs;
 	struct fman_dma_regs __iomem *dma_regs;
 	struct fman_hwp_regs __iomem *hwp_regs;
+	struct fman_kg_regs __iomem *kg_regs;
 	fman_exceptions_cb *exception_cb;
 	fman_bus_error_cb *bus_error_cb;
 	/* Spinlock for FMan use */
@@ -631,6 +634,8 @@ struct fman {
 	/* Fifo in MURAM */
 	unsigned long fifo_offset;
 	size_t fifo_size;
+	/* KeyGen handle */
+	struct fman_keygen *keygen;
 
 	u32 liodn_base[64];
 	u32 liodn_offset[64];
@@ -1811,6 +1816,7 @@ static int fman_config(struct fman *fman)
 	fman->qmi_regs = base_addr + QMI_OFFSET;
 	fman->dma_regs = base_addr + DMA_OFFSET;
 	fman->hwp_regs = base_addr + HWP_OFFSET;
+	fman->kg_regs = base_addr + KG_OFFSET;
 	fman->base_addr = base_addr;
 
 	spin_lock_init(&fman->spinlock);
@@ -2083,6 +2089,11 @@ static int fman_init(struct fman *fman)
 	/* Init HW Parser */
 	hwp_init(fman->hwp_regs);
 
+	/* Init KeyGen */
+	fman->keygen = keygen_init(fman->kg_regs);
+	if (!fman->keygen)
+		return -EINVAL;
+
 	err = enable(fman, cfg);
 	if (err != 0)
 		return err;
@@ -2562,6 +2573,21 @@ int fman_get_rx_extra_headroom(void)
 EXPORT_SYMBOL(fman_get_rx_extra_headroom);
 
 /**
+ * fman_get_keygen
+ *
+ * @fman:	A Pointer to FMan device
+ *
+ * Get the handle to KeyGen module part of FM driver
+ *
+ * Return: Handle to KeyGen
+ */
+struct fman_keygen *fman_get_keygen(struct fman *fman)
+{
+	return fman->keygen;
+}
+EXPORT_SYMBOL(fman_get_keygen);
+
+/**
  * fman_bind
  * @dev:	FMan OF device pointer
  *
diff --git a/drivers/net/ethernet/freescale/fman/fman.h b/drivers/net/ethernet/freescale/fman/fman.h
index f53e147..291990e 100644
--- a/drivers/net/ethernet/freescale/fman/fman.h
+++ b/drivers/net/ethernet/freescale/fman/fman.h
@@ -320,6 +320,8 @@ u16 fman_get_max_frm(void);
 
 int fman_get_rx_extra_headroom(void);
 
+struct fman_keygen *fman_get_keygen(struct fman *fman);
+
 struct fman *fman_bind(struct device *dev);
 
 #endif /* __FM_H */
diff --git a/drivers/net/ethernet/freescale/fman/fman_keygen.c b/drivers/net/ethernet/freescale/fman/fman_keygen.c
new file mode 100644
index 0000000..f54da3c
--- /dev/null
+++ b/drivers/net/ethernet/freescale/fman/fman_keygen.c
@@ -0,0 +1,783 @@
+/*
+ * Copyright 2017 NXP
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in the
+ *       documentation and/or other materials provided with the distribution.
+ *     * Neither the name of NXP nor the
+ *       names of its contributors may be used to endorse or promote products
+ *       derived from this software without specific prior written permission.
+ *
+ *
+ * ALTERNATIVELY, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") as published by the Free Software
+ * Foundation, either version 2 of that License or (at your option) any
+ * later version.
+ *
+ * THIS SOFTWARE IS PROVIDED BY NXP ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ * DISCLAIMED. IN NO EVENT SHALL NXP BE LIABLE FOR ANY
+ * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/slab.h>
+
+#include "fman_keygen.h"
+
+/* Maximum number of HW Ports */
+#define FMAN_MAX_NUM_OF_HW_PORTS		64
+
+/* Maximum number of KeyGen Schemes */
+#define FM_KG_MAX_NUM_OF_SCHEMES		32
+
+/* Number of generic KeyGen Generic Extract Command Registers */
+#define FM_KG_NUM_OF_GENERIC_REGS		8
+
+/* Dummy port ID */
+#define DUMMY_PORT_ID				0
+
+/* Select Scheme Value Register */
+#define KG_SCH_DEF_USE_KGSE_DV_0		2
+#define KG_SCH_DEF_USE_KGSE_DV_1		3
+
+/* Registers Shifting values */
+#define FM_KG_KGAR_NUM_SHIFT			16
+#define KG_SCH_DEF_L4_PORT_SHIFT		8
+#define KG_SCH_DEF_IP_ADDR_SHIFT		18
+#define KG_SCH_HASH_CONFIG_SHIFT_SHIFT		24
+
+/* KeyGen Registers bit field masks: */
+
+/* Enable bit field mask for KeyGen General Configuration Register */
+#define FM_KG_KGGCR_EN				0x80000000
+
+/* KeyGen Global Registers bit field masks */
+#define FM_KG_KGAR_GO				0x80000000
+#define FM_KG_KGAR_READ				0x40000000
+#define FM_KG_KGAR_WRITE			0x00000000
+#define FM_KG_KGAR_SEL_SCHEME_ENTRY		0x00000000
+#define FM_KG_KGAR_SCM_WSEL_UPDATE_CNT		0x00008000
+
+#define FM_KG_KGAR_ERR				0x20000000
+#define FM_KG_KGAR_SEL_CLS_PLAN_ENTRY		0x01000000
+#define FM_KG_KGAR_SEL_PORT_ENTRY		0x02000000
+#define FM_KG_KGAR_SEL_PORT_WSEL_SP		0x00008000
+#define FM_KG_KGAR_SEL_PORT_WSEL_CPP		0x00004000
+
+/* Error events exceptions */
+#define FM_EX_KG_DOUBLE_ECC			0x80000000
+#define FM_EX_KG_KEYSIZE_OVERFLOW		0x40000000
+
+/* Scheme Registers bit field masks */
+#define KG_SCH_MODE_EN				0x80000000
+#define KG_SCH_VSP_NO_KSP_EN			0x80000000
+#define KG_SCH_HASH_CONFIG_SYM			0x40000000
+
+/* Known Protocol field codes */
+#define KG_SCH_KN_PORT_ID		0x80000000
+#define KG_SCH_KN_MACDST		0x40000000
+#define KG_SCH_KN_MACSRC		0x20000000
+#define KG_SCH_KN_TCI1			0x10000000
+#define KG_SCH_KN_TCI2			0x08000000
+#define KG_SCH_KN_ETYPE			0x04000000
+#define KG_SCH_KN_PPPSID		0x02000000
+#define KG_SCH_KN_PPPID			0x01000000
+#define KG_SCH_KN_MPLS1			0x00800000
+#define KG_SCH_KN_MPLS2			0x00400000
+#define KG_SCH_KN_MPLS_LAST		0x00200000
+#define KG_SCH_KN_IPSRC1		0x00100000
+#define KG_SCH_KN_IPDST1		0x00080000
+#define KG_SCH_KN_PTYPE1		0x00040000
+#define KG_SCH_KN_IPTOS_TC1		0x00020000
+#define KG_SCH_KN_IPV6FL1		0x00010000
+#define KG_SCH_KN_IPSRC2		0x00008000
+#define KG_SCH_KN_IPDST2		0x00004000
+#define KG_SCH_KN_PTYPE2		0x00002000
+#define KG_SCH_KN_IPTOS_TC2		0x00001000
+#define KG_SCH_KN_IPV6FL2		0x00000800
+#define KG_SCH_KN_GREPTYPE		0x00000400
+#define KG_SCH_KN_IPSEC_SPI		0x00000200
+#define KG_SCH_KN_IPSEC_NH		0x00000100
+#define KG_SCH_KN_IPPID			0x00000080
+#define KG_SCH_KN_L4PSRC		0x00000004
+#define KG_SCH_KN_L4PDST		0x00000002
+#define KG_SCH_KN_TFLG			0x00000001
+
+/* NIA values */
+#define NIA_ENG_BMI			0x00500000
+#define NIA_BMI_AC_ENQ_FRAME		0x00000002
+#define ENQUEUE_KG_DFLT_NIA		(NIA_ENG_BMI | NIA_BMI_AC_ENQ_FRAME)
+
+/* Hard-coded configuration:
+ * These values are used as hard-coded values for KeyGen configuration
+ * and they replace user selections for this hard-coded version
+ */
+
+/* Hash distribution shift */
+#define DEFAULT_HASH_DIST_FQID_SHIFT		0
+
+/* Hash shift */
+#define DEFAULT_HASH_SHIFT			0
+
+/* Symmetric hash usage:
+ * Warning:
+ * - the value for symmetric hash usage must be in accordance with hash
+ *	key defined below
+ * - according to tests performed, spreading is not working if symmetric
+ *	hash is set on true
+ * So ultimately symmetric hash functionality should be always disabled:
+ */
+#define DEFAULT_SYMMETRIC_HASH			false
+
+/* Hash Key extraction fields: */
+#define DEFAULT_HASH_KEY_EXTRACT_FIELDS		\
+	(KG_SCH_KN_IPSRC1 | KG_SCH_KN_IPDST1 | \
+	    KG_SCH_KN_L4PSRC | KG_SCH_KN_L4PDST)
+
+/* Default values to be used as hash key in case IPv4 or L4 (TCP, UDP)
+ * don't exist in the frame
+ */
+/* Default IPv4 address */
+#define DEFAULT_HASH_KEY_IPv4_ADDR		0x0A0A0A0A
+/* Default L4 port */
+#define DEFAULT_HASH_KEY_L4_PORT		0x0B0B0B0B
+
+/* KeyGen Memory Mapped Registers: */
+
+/* Scheme Configuration RAM Registers */
+struct fman_kg_scheme_regs {
+	u32 kgse_mode;		/* 0x100: MODE */
+	u32 kgse_ekfc;		/* 0x104: Extract Known Fields Command */
+	u32 kgse_ekdv;		/* 0x108: Extract Known Default Value */
+	u32 kgse_bmch;		/* 0x10C: Bit Mask Command High */
+	u32 kgse_bmcl;		/* 0x110: Bit Mask Command Low */
+	u32 kgse_fqb;		/* 0x114: Frame Queue Base */
+	u32 kgse_hc;		/* 0x118: Hash Command */
+	u32 kgse_ppc;		/* 0x11C: Policer Profile Command */
+	u32 kgse_gec[FM_KG_NUM_OF_GENERIC_REGS];
+			/* 0x120: Generic Extract Command */
+	u32 kgse_spc;
+		/* 0x140: KeyGen Scheme Entry Statistic Packet Counter */
+	u32 kgse_dv0;	/* 0x144: KeyGen Scheme Entry Default Value 0 */
+	u32 kgse_dv1;	/* 0x148: KeyGen Scheme Entry Default Value 1 */
+	u32 kgse_ccbs;
+		/* 0x14C: KeyGen Scheme Entry Coarse Classification Bit*/
+	u32 kgse_mv;	/* 0x150: KeyGen Scheme Entry Match vector */
+	u32 kgse_om;	/* 0x154: KeyGen Scheme Entry Operation Mode bits */
+	u32 kgse_vsp;
+		/* 0x158: KeyGen Scheme Entry Virtual Storage Profile */
+};
+
+/* Port Partition Configuration Registers */
+struct fman_kg_pe_regs {
+	u32 fmkg_pe_sp;		/* 0x100: KeyGen Port entry Scheme Partition */
+	u32 fmkg_pe_cpp;
+		/* 0x104: KeyGen Port Entry Classification Plan Partition */
+};
+
+/* General Configuration and Status Registers
+ * Global Statistic Counters
+ * KeyGen Global Registers
+ */
+struct fman_kg_regs {
+	u32 fmkg_gcr;	/* 0x000: KeyGen General Configuration Register */
+	u32 res004;	/* 0x004: Reserved */
+	u32 res008;	/* 0x008: Reserved */
+	u32 fmkg_eer;	/* 0x00C: KeyGen Error Event Register */
+	u32 fmkg_eeer;	/* 0x010: KeyGen Error Event Enable Register */
+	u32 res014;	/* 0x014: Reserved */
+	u32 res018;	/* 0x018: Reserved */
+	u32 fmkg_seer;	/* 0x01C: KeyGen Scheme Error Event Register */
+	u32 fmkg_seeer;	/* 0x020: KeyGen Scheme Error Event Enable Register */
+	u32 fmkg_gsr;	/* 0x024: KeyGen Global Status Register */
+	u32 fmkg_tpc;	/* 0x028: Total Packet Counter Register */
+	u32 fmkg_serc;	/* 0x02C: Soft Error Capture Register */
+	u32 res030[4];	/* 0x030: Reserved */
+	u32 fmkg_fdor;	/* 0x034: Frame Data Offset Register */
+	u32 fmkg_gdv0r;	/* 0x038: Global Default Value Register 0 */
+	u32 fmkg_gdv1r;	/* 0x03C: Global Default Value Register 1 */
+	u32 res04c[6];	/* 0x040: Reserved */
+	u32 fmkg_feer;	/* 0x044: Force Error Event Register */
+	u32 res068[38];	/* 0x048: Reserved */
+	union {
+		u32 fmkg_indirect[63];	/* 0x100: Indirect Access Registers */
+		struct fman_kg_scheme_regs fmkg_sch; /* Scheme Registers */
+		struct fman_kg_pe_regs fmkg_pe; /* Port Partition Registers */
+	};
+	u32 fmkg_ar;	/* 0x1FC: KeyGen Action Register */
+};
+
+/* KeyGen Scheme data */
+struct keygen_scheme {
+	bool used;	/* Specifies if this scheme is used */
+	u8 hw_port_id;
+		/* Hardware port ID
+		 * schemes sharing between multiple ports is not
+		 * currently supported
+		 * so we have only one port id bound to a scheme
+		 */
+	u32 base_fqid;
+		/* Base FQID:
+		 * Must be between 1 and 2^24-1
+		 * If hash is used and an even distribution is
+		 * expected according to hash_fqid_count,
+		 * base_fqid must be aligned to hash_fqid_count
+		 */
+	u32 hash_fqid_count;
+		/* FQ range for hash distribution:
+		 * Must be a power of 2
+		 * Represents the range of queues for spreading
+		 */
+	bool use_hashing;	/* Usage of Hashing and spreading over FQ */
+	bool symmetric_hash;	/* Symmetric Hash option usage */
+	u8 hashShift;
+		/* Hash result right shift.
+		 * Select the 24 bits out of the 64 hash result.
+		 * 0 means using the 24 LSB's, otherwise
+		 * use the 24 LSB's after shifting right
+		 */
+	u32 match_vector;	/* Match Vector */
+};
+
+/* KeyGen driver data */
+struct fman_keygen {
+	struct keygen_scheme schemes[FM_KG_MAX_NUM_OF_SCHEMES];
+				/* Array of schemes */
+	struct fman_kg_regs __iomem *keygen_regs;	/* KeyGen registers */
+};
+
+/* keygen_write_ar_wait
+ *
+ * Write Action Register with specified value, wait for GO bit field to be
+ * idle and then read the error
+ *
+ * regs: KeyGen registers
+ * fmkg_ar: Action Register value
+ *
+ * Return: Zero for success or error code in case of failure
+ */
+static int keygen_write_ar_wait(struct fman_kg_regs __iomem *regs, u32 fmkg_ar)
+{
+	iowrite32be(fmkg_ar, &regs->fmkg_ar);
+
+	/* Wait for GO bit field to be idle */
+	while (fmkg_ar & FM_KG_KGAR_GO)
+		fmkg_ar = ioread32be(&regs->fmkg_ar);
+
+	if (fmkg_ar & FM_KG_KGAR_ERR)
+		return -EINVAL;
+
+	return 0;
+}
+
+/* build_ar_scheme
+ *
+ * Build Action Register value for scheme settings
+ *
+ * scheme_id: Scheme ID
+ * update_counter: update scheme counter
+ * write: true for action to write the scheme or false for read action
+ *
+ * Return: AR value
+ */
+static u32 build_ar_scheme(u8 scheme_id, bool update_counter, bool write)
+{
+	u32 rw = (u32)(write ? FM_KG_KGAR_WRITE : FM_KG_KGAR_READ);
+
+	return (u32)(FM_KG_KGAR_GO |
+			rw |
+			FM_KG_KGAR_SEL_SCHEME_ENTRY |
+			DUMMY_PORT_ID |
+			((u32)scheme_id << FM_KG_KGAR_NUM_SHIFT) |
+			(update_counter ? FM_KG_KGAR_SCM_WSEL_UPDATE_CNT : 0));
+}
+
+/* build_ar_bind_scheme
+ *
+ * Build Action Register value for port binding to schemes
+ *
+ * hwport_id: HW Port ID
+ * write: true for action to write the bind or false for read action
+ *
+ * Return: AR value
+ */
+static u32 build_ar_bind_scheme(u8 hwport_id, bool write)
+{
+	u32 rw = write ? (u32)FM_KG_KGAR_WRITE : (u32)FM_KG_KGAR_READ;
+
+	return (u32)(FM_KG_KGAR_GO |
+			rw |
+			FM_KG_KGAR_SEL_PORT_ENTRY |
+			hwport_id |
+			FM_KG_KGAR_SEL_PORT_WSEL_SP);
+}
+
+/* keygen_write_sp
+ *
+ * Write Scheme Partition Register with specified value
+ *
+ * regs: KeyGen Registers
+ * sp: Scheme Partition register value
+ * add: true to add a scheme partition or false to clear
+ *
+ * Return: none
+ */
+static void keygen_write_sp(struct fman_kg_regs __iomem *regs, u32 sp, bool add)
+{
+	u32 tmp;
+
+	tmp = ioread32be(&regs->fmkg_pe.fmkg_pe_sp);
+
+	if (add)
+		tmp |= sp;
+	else
+		tmp &= ~sp;
+
+	iowrite32be(tmp, &regs->fmkg_pe.fmkg_pe_sp);
+}
+
+/* build_ar_bind_cls_plan
+ *
+ * Build Action Register value for Classification Plan
+ *
+ * hwport_id: HW Port ID
+ * write: true for action to write the CP or false for read action
+ *
+ * Return: AR value
+ */
+static u32 build_ar_bind_cls_plan(u8 hwport_id, bool write)
+{
+	u32 rw = write ? (u32)FM_KG_KGAR_WRITE : (u32)FM_KG_KGAR_READ;
+
+	return (u32)(FM_KG_KGAR_GO |
+			rw |
+			FM_KG_KGAR_SEL_PORT_ENTRY |
+			hwport_id |
+			FM_KG_KGAR_SEL_PORT_WSEL_CPP);
+}
+
+/* keygen_write_cpp
+ *
+ * Write Classification Plan Partition Register with specified value
+ *
+ * regs: KeyGen Registers
+ * cpp: CPP register value
+ *
+ * Return: none
+ */
+static void keygen_write_cpp(struct fman_kg_regs __iomem *regs, u32 cpp)
+{
+	iowrite32be(cpp, &regs->fmkg_pe.fmkg_pe_cpp);
+}
+
+/* keygen_write_scheme
+ *
+ * Write all Schemes Registers with specified values
+ *
+ * regs: KeyGen Registers
+ * scheme_id: Scheme ID
+ * scheme_regs: Scheme registers values desired to be written
+ * update_counter: update scheme counter
+ *
+ * Return: Zero for success or error code in case of failure
+ */
+static int keygen_write_scheme(struct fman_kg_regs __iomem *regs, u8 scheme_id,
+			       struct fman_kg_scheme_regs *scheme_regs,
+				bool update_counter)
+{
+	u32 ar_reg;
+	int err, i;
+
+	/* Write indirect scheme registers */
+	iowrite32be(scheme_regs->kgse_mode, &regs->fmkg_sch.kgse_mode);
+	iowrite32be(scheme_regs->kgse_ekfc, &regs->fmkg_sch.kgse_ekfc);
+	iowrite32be(scheme_regs->kgse_ekdv, &regs->fmkg_sch.kgse_ekdv);
+	iowrite32be(scheme_regs->kgse_bmch, &regs->fmkg_sch.kgse_bmch);
+	iowrite32be(scheme_regs->kgse_bmcl, &regs->fmkg_sch.kgse_bmcl);
+	iowrite32be(scheme_regs->kgse_fqb, &regs->fmkg_sch.kgse_fqb);
+	iowrite32be(scheme_regs->kgse_hc, &regs->fmkg_sch.kgse_hc);
+	iowrite32be(scheme_regs->kgse_ppc, &regs->fmkg_sch.kgse_ppc);
+	iowrite32be(scheme_regs->kgse_spc, &regs->fmkg_sch.kgse_spc);
+	iowrite32be(scheme_regs->kgse_dv0, &regs->fmkg_sch.kgse_dv0);
+	iowrite32be(scheme_regs->kgse_dv1, &regs->fmkg_sch.kgse_dv1);
+	iowrite32be(scheme_regs->kgse_ccbs, &regs->fmkg_sch.kgse_ccbs);
+	iowrite32be(scheme_regs->kgse_mv, &regs->fmkg_sch.kgse_mv);
+	iowrite32be(scheme_regs->kgse_om, &regs->fmkg_sch.kgse_om);
+	iowrite32be(scheme_regs->kgse_vsp, &regs->fmkg_sch.kgse_vsp);
+
+	for (i = 0 ; i < FM_KG_NUM_OF_GENERIC_REGS ; i++)
+		iowrite32be(scheme_regs->kgse_gec[i],
+			    &regs->fmkg_sch.kgse_gec[i]);
+
+	/* Write AR (Action register) */
+	ar_reg = build_ar_scheme(scheme_id, update_counter, true);
+	err = keygen_write_ar_wait(regs, ar_reg);
+	if (err != 0) {
+		pr_err("Writing Action Register failed\n");
+		return err;
+	}
+
+	return err;
+}
+
+/* get_free_scheme_id
+ *
+ * Find the first free scheme available to be used
+ *
+ * keygen: KeyGen handle
+ * scheme_id: pointer to scheme id
+ *
+ * Return: 0 on success, -EINVAL when the are no available free schemes
+ */
+static int get_free_scheme_id(struct fman_keygen *keygen, u8 *scheme_id)
+{
+	u8 i;
+
+	for (i = 0; i < FM_KG_MAX_NUM_OF_SCHEMES; i++)
+		if (!keygen->schemes[i].used) {
+			*scheme_id = i;
+			return 0;
+		}
+
+	return -EINVAL;
+}
+
+/* get_scheme
+ *
+ * Provides the scheme for specified ID
+ *
+ * keygen: KeyGen handle
+ * scheme_id: Scheme ID
+ *
+ * Return: handle to required scheme
+ */
+static struct keygen_scheme *get_scheme(struct fman_keygen *keygen,
+					u8 scheme_id)
+{
+	if (scheme_id >= FM_KG_MAX_NUM_OF_SCHEMES)
+		return NULL;
+	return &keygen->schemes[scheme_id];
+}
+
+/* keygen_bind_port_to_schemes
+ *
+ * Bind the port to schemes
+ *
+ * keygen: KeyGen handle
+ * scheme_id: id of the scheme to bind to
+ * bind: true to bind the port or false to unbind it
+ *
+ * Return: Zero for success or error code in case of failure
+ */
+static int keygen_bind_port_to_schemes(struct fman_keygen *keygen,
+				       u8 scheme_id,
+					bool bind)
+{
+	struct fman_kg_regs __iomem *keygen_regs = keygen->keygen_regs;
+	struct keygen_scheme *scheme;
+	u32 ar_reg;
+	u32 schemes_vector = 0;
+	int err;
+
+	scheme = get_scheme(keygen, scheme_id);
+	if (!scheme) {
+		pr_err("Requested Scheme does not exist\n");
+		return -EINVAL;
+	}
+	if (!scheme->used) {
+		pr_err("Cannot bind port to an invalid scheme\n");
+		return -EINVAL;
+	}
+
+	schemes_vector |= 1 << (31 - scheme_id);
+
+	ar_reg = build_ar_bind_scheme(scheme->hw_port_id, false);
+	err = keygen_write_ar_wait(keygen_regs, ar_reg);
+	if (err != 0) {
+		pr_err("Reading Action Register failed\n");
+		return err;
+	}
+
+	keygen_write_sp(keygen_regs, schemes_vector, bind);
+
+	ar_reg = build_ar_bind_scheme(scheme->hw_port_id, true);
+	err = keygen_write_ar_wait(keygen_regs, ar_reg);
+	if (err != 0) {
+		pr_err("Writing Action Register failed\n");
+		return err;
+	}
+
+	return 0;
+}
+
+/* keygen_scheme_setup
+ *
+ * Setup the scheme according to required configuration
+ *
+ * keygen: KeyGen handle
+ * scheme_id: scheme ID
+ * enable: true to enable scheme or false to disable it
+ *
+ * Return: Zero for success or error code in case of failure
+ */
+static int keygen_scheme_setup(struct fman_keygen *keygen, u8 scheme_id,
+			       bool enable)
+{
+	struct fman_kg_regs __iomem *keygen_regs = keygen->keygen_regs;
+	struct fman_kg_scheme_regs scheme_regs;
+	struct keygen_scheme *scheme;
+	u32 tmp_reg;
+	int err;
+
+	scheme = get_scheme(keygen, scheme_id);
+	if (!scheme) {
+		pr_err("Requested Scheme does not exist\n");
+		return -EINVAL;
+	}
+	if (enable && scheme->used) {
+		pr_err("The requested Scheme is already used\n");
+		return -EINVAL;
+	}
+
+	/* Clear scheme registers */
+	memset(&scheme_regs, 0, sizeof(struct fman_kg_scheme_regs));
+
+	/* Setup all scheme registers: */
+	tmp_reg = 0;
+
+	if (enable) {
+		/* Enable Scheme */
+		tmp_reg |= KG_SCH_MODE_EN;
+		/* Enqueue frame NIA */
+		tmp_reg |= ENQUEUE_KG_DFLT_NIA;
+	}
+
+	scheme_regs.kgse_mode = tmp_reg;
+
+	scheme_regs.kgse_mv = scheme->match_vector;
+
+	/* Scheme don't override StorageProfile:
+	 * valid only for DPAA_VERSION >= 11
+	 */
+	scheme_regs.kgse_vsp = KG_SCH_VSP_NO_KSP_EN;
+
+	/* Configure Hard-Coded Rx Hashing: */
+
+	if (scheme->use_hashing) {
+		/* configure kgse_ekfc */
+		scheme_regs.kgse_ekfc = DEFAULT_HASH_KEY_EXTRACT_FIELDS;
+
+		/* configure kgse_ekdv */
+		tmp_reg = 0;
+		tmp_reg |= (KG_SCH_DEF_USE_KGSE_DV_0 <<
+				KG_SCH_DEF_IP_ADDR_SHIFT);
+		tmp_reg |= (KG_SCH_DEF_USE_KGSE_DV_1 <<
+				KG_SCH_DEF_L4_PORT_SHIFT);
+		scheme_regs.kgse_ekdv = tmp_reg;
+
+		/* configure kgse_dv0 */
+		scheme_regs.kgse_dv0 = DEFAULT_HASH_KEY_IPv4_ADDR;
+		/* configure kgse_dv1 */
+		scheme_regs.kgse_dv1 = DEFAULT_HASH_KEY_L4_PORT;
+
+		/* configure kgse_hc  */
+		tmp_reg = 0;
+		tmp_reg |= ((scheme->hash_fqid_count - 1) <<
+				DEFAULT_HASH_DIST_FQID_SHIFT);
+		tmp_reg |= scheme->hashShift << KG_SCH_HASH_CONFIG_SHIFT_SHIFT;
+
+		if (scheme->symmetric_hash) {
+			/* Normally extraction key should be verified if
+			 * complies with symmetric hash
+			 * But because extraction is hard-coded, we are sure
+			 * the key is symmetric
+			 */
+			tmp_reg |= KG_SCH_HASH_CONFIG_SYM;
+		}
+		scheme_regs.kgse_hc = tmp_reg;
+	} else {
+		scheme_regs.kgse_ekfc = 0;
+		scheme_regs.kgse_hc = 0;
+		scheme_regs.kgse_ekdv = 0;
+		scheme_regs.kgse_dv0 = 0;
+		scheme_regs.kgse_dv1 = 0;
+	}
+
+	/* configure kgse_fqb: Scheme FQID base */
+	tmp_reg = 0;
+	tmp_reg |= scheme->base_fqid;
+	scheme_regs.kgse_fqb = tmp_reg;
+
+	/* features not used by hard-coded configuration */
+	scheme_regs.kgse_bmch = 0;
+	scheme_regs.kgse_bmcl = 0;
+	scheme_regs.kgse_spc = 0;
+
+	/* Write scheme registers */
+	err = keygen_write_scheme(keygen_regs, scheme_id, &scheme_regs, true);
+	if (err != 0) {
+		pr_err("Writing scheme registers failed\n");
+		return err;
+	}
+
+	/* Update used field for Scheme */
+	scheme->used = enable;
+
+	return 0;
+}
+
+/* keygen_init
+ *
+ * KeyGen initialization:
+ * Initializes and enables KeyGen, allocate driver memory, setup registers,
+ * clear port bindings, invalidate all schemes
+ *
+ * keygen_regs: KeyGen registers base address
+ *
+ * Return: Handle to KeyGen driver
+ */
+struct fman_keygen *keygen_init(struct fman_kg_regs __iomem *keygen_regs)
+{
+	struct fman_keygen *keygen;
+	u32 ar;
+	int i;
+
+	/* Allocate memory for KeyGen driver */
+	keygen = kzalloc(sizeof(*keygen), GFP_KERNEL);
+	if (!keygen)
+		return NULL;
+
+	keygen->keygen_regs = keygen_regs;
+
+	/* KeyGen initialization (for Master partition):
+	 * Setup KeyGen registers
+	 */
+	iowrite32be(ENQUEUE_KG_DFLT_NIA, &keygen_regs->fmkg_gcr);
+
+	iowrite32be(FM_EX_KG_DOUBLE_ECC | FM_EX_KG_KEYSIZE_OVERFLOW,
+		    &keygen_regs->fmkg_eer);
+
+	iowrite32be(0, &keygen_regs->fmkg_fdor);
+	iowrite32be(0, &keygen_regs->fmkg_gdv0r);
+	iowrite32be(0, &keygen_regs->fmkg_gdv1r);
+
+	/* Clear binding between ports to schemes and classification plans
+	 * so that all ports are not bound to any scheme/classification plan
+	 */
+	for (i = 0; i < FMAN_MAX_NUM_OF_HW_PORTS; i++) {
+		/* Clear all pe sp schemes registers */
+		keygen_write_sp(keygen_regs, 0xffffffff, false);
+		ar = build_ar_bind_scheme(i, true);
+		keygen_write_ar_wait(keygen_regs, ar);
+
+		/* Clear all pe cpp classification plans registers */
+		keygen_write_cpp(keygen_regs, 0);
+		ar = build_ar_bind_cls_plan(i, true);
+		keygen_write_ar_wait(keygen_regs, ar);
+	}
+
+	/* Enable all scheme interrupts */
+	iowrite32be(0xFFFFFFFF, &keygen_regs->fmkg_seer);
+	iowrite32be(0xFFFFFFFF, &keygen_regs->fmkg_seeer);
+
+	/* Enable KyeGen */
+	iowrite32be(ioread32be(&keygen_regs->fmkg_gcr) | FM_KG_KGGCR_EN,
+		    &keygen_regs->fmkg_gcr);
+
+	return keygen;
+}
+EXPORT_SYMBOL(keygen_init);
+
+/* keygen_port_hashing_init
+ *
+ * Initializes a port for Rx Hashing with specified configuration parameters
+ *
+ * keygen: KeyGen handle
+ * hw_port_id: HW Port ID
+ * hash_base_fqid: Hashing Base FQID used for spreading
+ * hash_size: Hashing size
+ *
+ * Return: Zero for success or error code in case of failure
+ */
+int keygen_port_hashing_init(struct fman_keygen *keygen, u8 hw_port_id,
+			     u32 hash_base_fqid, u32 hash_size)
+{
+	struct keygen_scheme *scheme;
+	u8 scheme_id;
+	int err;
+
+	/* Validate Scheme configuration parameters */
+	if (hash_base_fqid == 0 || (hash_base_fqid & ~0x00FFFFFF)) {
+		pr_err("Base FQID must be between 1 and 2^24-1\n");
+		return -EINVAL;
+	}
+	if (hash_size == 0 || (hash_size & (hash_size - 1)) != 0) {
+		pr_err("Hash size must be power of two\n");
+		return -EINVAL;
+	}
+
+	/* Find a free scheme */
+	err = get_free_scheme_id(keygen, &scheme_id);
+	if (err) {
+		pr_err("The maximum number of available Schemes has been exceeded\n");
+		return -EINVAL;
+	}
+
+	/* Create and configure Hard-Coded Scheme: */
+
+	scheme = get_scheme(keygen, scheme_id);
+	if (!scheme) {
+		pr_err("Requested Scheme does not exist\n");
+		return -EINVAL;
+	}
+	if (scheme->used) {
+		pr_err("The requested Scheme is already used\n");
+		return -EINVAL;
+	}
+
+	/* Clear all scheme fields because the scheme may have been
+	 * previously used
+	 */
+	memset(scheme, 0, sizeof(struct keygen_scheme));
+
+	/* Setup scheme: */
+	scheme->hw_port_id = hw_port_id;
+	scheme->use_hashing = true;
+	scheme->base_fqid = hash_base_fqid;
+	scheme->hash_fqid_count = hash_size;
+	scheme->symmetric_hash = DEFAULT_SYMMETRIC_HASH;
+	scheme->hashShift = DEFAULT_HASH_SHIFT;
+
+	/* All Schemes in hard-coded configuration
+	 * are Indirect Schemes
+	 */
+	scheme->match_vector = 0;
+
+	err = keygen_scheme_setup(keygen, scheme_id, true);
+	if (err != 0) {
+		pr_err("Scheme setup failed\n");
+		return err;
+	}
+
+	/* Bind Rx port to Scheme */
+	err = keygen_bind_port_to_schemes(keygen, scheme_id, true);
+	if (err != 0) {
+		pr_err("Binding port to schemes failed\n");
+		return err;
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL(keygen_port_hashing_init);
diff --git a/drivers/net/ethernet/freescale/fman/fman_keygen.h b/drivers/net/ethernet/freescale/fman/fman_keygen.h
new file mode 100644
index 0000000..c4640de
--- /dev/null
+++ b/drivers/net/ethernet/freescale/fman/fman_keygen.h
@@ -0,0 +1,46 @@
+/*
+ * Copyright 2017 NXP
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *     * Redistributions of source code must retain the above copyright
+ *       notice, this list of conditions and the following disclaimer.
+ *     * Redistributions in binary form must reproduce the above copyright
+ *       notice, this list of conditions and the following disclaimer in the
+ *       documentation and/or other materials provided with the distribution.
+ *     * Neither the name of NXP nor the
+ *       names of its contributors may be used to endorse or promote products
+ *       derived from this software without specific prior written permission.
+ *
+ *
+ * ALTERNATIVELY, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") as published by the Free Software
+ * Foundation, either version 2 of that License or (at your option) any
+ * later version.
+ *
+ * THIS SOFTWARE IS PROVIDED BY NXP ``AS IS'' AND ANY
+ * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+ * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+ * DISCLAIMED. IN NO EVENT SHALL NXP BE LIABLE FOR ANY
+ * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
+ * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+ * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
+ * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
+ * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef __KEYGEN_H
+#define __KEYGEN_H
+
+#include <linux/io.h>
+
+struct fman_keygen;
+struct fman_kg_regs;
+
+struct fman_keygen *keygen_init(struct fman_kg_regs __iomem *keygen_regs);
+
+int keygen_port_hashing_init(struct fman_keygen *keygen, u8 hw_port_id,
+			     u32 hash_base_fqid, u32 hash_size);
+
+#endif /* __KEYGEN_H */
diff --git a/drivers/net/ethernet/freescale/fman/fman_port.c b/drivers/net/ethernet/freescale/fman/fman_port.c
index 49bfa11..b0ad9c4 100644
--- a/drivers/net/ethernet/freescale/fman/fman_port.c
+++ b/drivers/net/ethernet/freescale/fman/fman_port.c
@@ -35,6 +35,7 @@
 #include "fman_port.h"
 #include "fman.h"
 #include "fman_sp.h"
+#include "fman_keygen.h"
 
 #include <linux/io.h>
 #include <linux/slab.h>
@@ -184,6 +185,7 @@
 #define NIA_ENG_QMI_ENQ					0x00540000
 #define NIA_ENG_QMI_DEQ					0x00580000
 #define NIA_ENG_HWP					0x00440000
+#define NIA_ENG_HWK					0x00480000
 #define NIA_BMI_AC_ENQ_FRAME				0x00000002
 #define NIA_BMI_AC_TX_RELEASE				0x000002C0
 #define NIA_BMI_AC_RELEASE				0x000000C0
@@ -394,6 +396,8 @@ struct fman_port_bpools {
 struct fman_port_cfg {
 	u32 dflt_fqid;
 	u32 err_fqid;
+	u32 pcd_base_fqid;
+	u32 pcd_fqs_count;
 	u8 deq_sp;
 	bool deq_high_priority;
 	enum fman_port_deq_type deq_type;
@@ -1271,6 +1275,10 @@ static void set_rx_dflt_cfg(struct fman_port *port,
 		port_params->specific_params.rx_params.err_fqid;
 	port->cfg->dflt_fqid =
 		port_params->specific_params.rx_params.dflt_fqid;
+	port->cfg->pcd_base_fqid =
+		port_params->specific_params.rx_params.pcd_base_fqid;
+	port->cfg->pcd_fqs_count =
+		port_params->specific_params.rx_params.pcd_fqs_count;
 }
 
 static void set_tx_dflt_cfg(struct fman_port *port,
@@ -1398,6 +1406,24 @@ int fman_port_config(struct fman_port *port, struct fman_port_params *params)
 EXPORT_SYMBOL(fman_port_config);
 
 /**
+ * fman_port_use_kg_hash
+ * port:        A pointer to a FM Port module.
+ * Sets the HW KeyGen or the BMI as HW Parser next engine, enabling
+ * or bypassing the KeyGen hashing of Rx traffic
+ */
+void fman_port_use_kg_hash(struct fman_port *port, bool enable)
+{
+	if (enable)
+		/* After the Parser frames go to KeyGen */
+		iowrite32be(NIA_ENG_HWK, &port->bmi_regs->rx.fmbm_rfpne);
+	else
+		/* After the Parser frames go to BMI */
+		iowrite32be(NIA_ENG_BMI | NIA_BMI_AC_ENQ_FRAME,
+			    &port->bmi_regs->rx.fmbm_rfpne);
+}
+EXPORT_SYMBOL(fman_port_use_kg_hash);
+
+/**
  * fman_port_init
  * port:	A pointer to a FM Port module.
  * Initializes the FM PORT module by defining the software structure and
@@ -1407,9 +1433,10 @@ EXPORT_SYMBOL(fman_port_config);
  */
 int fman_port_init(struct fman_port *port)
 {
+	struct fman_port_init_params params;
+	struct fman_keygen *keygen;
 	struct fman_port_cfg *cfg;
 	int err;
-	struct fman_port_init_params params;
 
 	if (is_init_done(port->cfg))
 		return -EINVAL;
@@ -1472,6 +1499,17 @@ int fman_port_init(struct fman_port *port)
 	if (err)
 		return err;
 
+	if (port->cfg->pcd_fqs_count) {
+		keygen = fman_get_keygen(port->dts_params.fman);
+		err = keygen_port_hashing_init(keygen, port->port_id,
+					       port->cfg->pcd_base_fqid,
+					       port->cfg->pcd_fqs_count);
+		if (err)
+			return err;
+
+		fman_port_use_kg_hash(port, true);
+	}
+
 	kfree(port->cfg);
 	port->cfg = NULL;
 
diff --git a/drivers/net/ethernet/freescale/fman/fman_port.h b/drivers/net/ethernet/freescale/fman/fman_port.h
index 8ba9017..5a99611 100644
--- a/drivers/net/ethernet/freescale/fman/fman_port.h
+++ b/drivers/net/ethernet/freescale/fman/fman_port.h
@@ -100,6 +100,9 @@ struct fman_port;
 struct fman_port_rx_params {
 	u32 err_fqid;			/* Error Queue Id. */
 	u32 dflt_fqid;			/* Default Queue Id. */
+	u32 pcd_base_fqid;		/* PCD base Queue Id. */
+	u32 pcd_fqs_count;		/* Number of PCD FQs. */
+
 	/* Which external buffer pools are used
 	 * (up to FMAN_PORT_MAX_EXT_POOLS_NUM), and their sizes.
 	 */
@@ -134,6 +137,8 @@ struct fman_port_params {
 
 int fman_port_config(struct fman_port *port, struct fman_port_params *params);
 
+void fman_port_use_kg_hash(struct fman_port *port, bool enable);
+
 int fman_port_init(struct fman_port *port);
 
 int fman_port_cfg_buf_prefix_content(struct fman_port *port,
-- 
2.1.0

^ permalink raw reply related

* [PATCH 0/6] Add RSS to DPAA 1.x Ethernet driver
From: Madalin Bucur @ 2017-08-18  8:56 UTC (permalink / raw)
  To: netdev, davem; +Cc: linuxppc-dev, linux-kernel

This patch set introduces Receive Side Scaling for the DPAA Ethernet
driver. Documentation is updated with details related to the new
feature and limitations that apply.
Added also a small fix.

Iordache Florinel-R70177 (1):
  fsl/fman: enable FMan Keygen

Madalin Bucur (5):
  dpaa_eth: use multiple Rx frame queues
  dpaa_eth: enable Rx hashing control
  dpaa_eth: add NETIF_F_RXHASH
  Documentation: networking: add RSS information
  dpaa_eth: check allocation result

 Documentation/networking/dpaa.txt                  |  68 +-
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.c     |  72 +-
 drivers/net/ethernet/freescale/dpaa/dpaa_eth.h     |   2 +
 .../net/ethernet/freescale/dpaa/dpaa_eth_sysfs.c   |   3 +
 drivers/net/ethernet/freescale/dpaa/dpaa_ethtool.c | 118 ++++
 drivers/net/ethernet/freescale/fman/Makefile       |   2 +-
 drivers/net/ethernet/freescale/fman/fman.c         |  26 +
 drivers/net/ethernet/freescale/fman/fman.h         |   2 +
 drivers/net/ethernet/freescale/fman/fman_keygen.c  | 783 +++++++++++++++++++++
 drivers/net/ethernet/freescale/fman/fman_keygen.h  |  46 ++
 drivers/net/ethernet/freescale/fman/fman_port.c    |  51 +-
 drivers/net/ethernet/freescale/fman/fman_port.h    |   7 +
 12 files changed, 1167 insertions(+), 13 deletions(-)
 create mode 100644 drivers/net/ethernet/freescale/fman/fman_keygen.c
 create mode 100644 drivers/net/ethernet/freescale/fman/fman_keygen.h

-- 
2.1.0

^ permalink raw reply

* Re: [PATCHv2 net] net: sched: fix NULL pointer dereference when action calls some targets
From: Pablo Neira Ayuso @ 2017-08-18  8:29 UTC (permalink / raw)
  To: Xin Long; +Cc: network dev, davem, Cong Wang
In-Reply-To: <f9e3e9338eeb819c40efb13fed33514a3df23c14.1503025296.git.lucien.xin@gmail.com>

On Fri, Aug 18, 2017 at 11:01:36AM +0800, Xin Long wrote:
> As we know in some target's checkentry it may dereference par.entryinfo
> to check entry stuff inside. But when sched action calls xt_check_target,
> par.entryinfo is set with NULL. It would cause kernel panic when calling
> some targets.
> 
> It can be reproduce with:
>   # tc qd add dev eth1 ingress handle ffff:
>   # tc filter add dev eth1 parent ffff: u32 match u32 0 0 action xt \
>     -j ECN --ecn-tcp-remove
> 
> It could also crash kernel when using target CLUSTERIP or TPROXY.
> 
> By now there's no proper value for par.entryinfo in ipt_init_target,
> but it can not be set with NULL. This patch is to void all these
> panics by setting it with an ipt_entry obj with all members = 0.
> 
> Note that this issue has been there since the very beginning.
> 
> Signed-off-by: Xin Long <lucien.xin@gmail.com>

Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>

^ permalink raw reply

* Re: [PATCH net-next] bpf: fix a return in sockmap_get_from_fd()
From: Daniel Borkmann @ 2017-08-18  8:21 UTC (permalink / raw)
  To: Dan Carpenter, Alexei Starovoitov, John Fastabend; +Cc: netdev, kernel-janitors
In-Reply-To: <20170818071210.wyq37kura6wz6bx6@mwanda>

On 08/18/2017 09:27 AM, Dan Carpenter wrote:
> "map" is a valid pointer.  We wanted to return "err" instead.  Also
> let's return a zero literal at the end.
>
> Fixes: 174a79ff9515 ("bpf: sockmap with sk redirect support")
> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>

Acked-by: Daniel Borkmann <daniel@iogearbox.net>

^ permalink raw reply

* [PATCH net-next 7/7] s390/qeth: use skb_cow_head() for L2 OSA xmit
From: Julian Wiedmann @ 2017-08-18  8:19 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, linux-s390, Martin Schwidefsky, Heiko Carstens,
	Stefan Raspl, Ursula Braun, Julian Wiedmann
In-Reply-To: <20170818081910.48869-1-jwi@linux.vnet.ibm.com>

Taking a full copy via skb_realloc_headroom() on every xmit is overkill
and wastes CPU time; all we actually need is to push on the qeth_hdr.
So rework the L2 OSA TX path to avoid the copy.
Minor complications arise because struct qeth_hdr must not cross a page
boundary. So add a new helper qeth_push_hdr() that catches this, and
falls back to the hdr cache that we already use for IQDs.

This change uncovered that qeth's TX completion takes rather long.
Now that we no longer free the original skb straight away and thus call
skb->destructor later than before, throughput regresses significantly.
For now, restore old behaviour by adding an explicit skb_orphan(),
and a big TODO to improve the TX completion time.

Tested-by: Nils Hoppmann <niho@de.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
---
 drivers/s390/net/qeth_core.h      |  1 +
 drivers/s390/net/qeth_core_main.c | 28 +++++++++++++++++++
 drivers/s390/net/qeth_l2_main.c   | 58 ++++++++++++++++++++++++---------------
 3 files changed, 65 insertions(+), 22 deletions(-)

diff --git a/drivers/s390/net/qeth_core.h b/drivers/s390/net/qeth_core.h
index 5753fbc485d5..59e09854c4f7 100644
--- a/drivers/s390/net/qeth_core.h
+++ b/drivers/s390/net/qeth_core.h
@@ -985,6 +985,7 @@ int qeth_set_features(struct net_device *, netdev_features_t);
 int qeth_recover_features(struct net_device *);
 netdev_features_t qeth_fix_features(struct net_device *, netdev_features_t);
 int qeth_vm_request_mac(struct qeth_card *card);
+int qeth_push_hdr(struct sk_buff *skb, struct qeth_hdr **hdr, unsigned int len);
 
 /* exports for OSN */
 int qeth_osn_assist(struct net_device *, void *, int);
diff --git a/drivers/s390/net/qeth_core_main.c b/drivers/s390/net/qeth_core_main.c
index ffefdd97abca..bae7440abc01 100644
--- a/drivers/s390/net/qeth_core_main.c
+++ b/drivers/s390/net/qeth_core_main.c
@@ -3890,6 +3890,34 @@ int qeth_hdr_chk_and_bounce(struct sk_buff *skb, struct qeth_hdr **hdr, int len)
 }
 EXPORT_SYMBOL_GPL(qeth_hdr_chk_and_bounce);
 
+/**
+ * qeth_push_hdr() - push a qeth_hdr onto an skb.
+ * @skb: skb that the qeth_hdr should be pushed onto.
+ * @hdr: double pointer to a qeth_hdr. When returning with >= 0,
+ *	 it contains a valid pointer to a qeth_hdr.
+ * @len: length of the hdr that needs to be pushed on.
+ *
+ * Returns the pushed length. If the header can't be pushed on
+ * (eg. because it would cross a page boundary), it is allocated from
+ * the cache instead and 0 is returned.
+ * Error to create the hdr is indicated by returning with < 0.
+ */
+int qeth_push_hdr(struct sk_buff *skb, struct qeth_hdr **hdr, unsigned int len)
+{
+	if (skb_headroom(skb) >= len &&
+	    qeth_get_elements_for_range((addr_t)skb->data - len,
+					(addr_t)skb->data) == 1) {
+		*hdr = skb_push(skb, len);
+		return len;
+	}
+	/* fall back */
+	*hdr = kmem_cache_alloc(qeth_core_header_cache, GFP_ATOMIC);
+	if (!*hdr)
+		return -ENOMEM;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(qeth_push_hdr);
+
 static void __qeth_fill_buffer(struct sk_buff *skb,
 			       struct qeth_qdio_out_buffer *buf,
 			       bool is_first_elem, unsigned int offset)
diff --git a/drivers/s390/net/qeth_l2_main.c b/drivers/s390/net/qeth_l2_main.c
index c85fadf21b38..760b023eae95 100644
--- a/drivers/s390/net/qeth_l2_main.c
+++ b/drivers/s390/net/qeth_l2_main.c
@@ -705,9 +705,11 @@ static int qeth_l2_xmit_iqd(struct qeth_card *card, struct sk_buff *skb,
 static int qeth_l2_xmit_osa(struct qeth_card *card, struct sk_buff *skb,
 			    struct qeth_qdio_out_q *queue, int cast_type)
 {
+	int push_len = sizeof(struct qeth_hdr);
 	unsigned int elements, nr_frags;
-	struct sk_buff *skb_copy;
-	struct qeth_hdr *hdr;
+	unsigned int hdr_elements = 0;
+	struct qeth_hdr *hdr = NULL;
+	unsigned int hd_len = 0;
 	int rc;
 
 	/* fix hardware limitation: as long as we do not have sbal
@@ -727,38 +729,44 @@ static int qeth_l2_xmit_osa(struct qeth_card *card, struct sk_buff *skb,
 	}
 	nr_frags = skb_shinfo(skb)->nr_frags;
 
-	/* create a copy with writeable headroom */
-	skb_copy = skb_realloc_headroom(skb, sizeof(struct qeth_hdr));
-	if (!skb_copy)
-		return -ENOMEM;
-	hdr = skb_push(skb_copy, sizeof(struct qeth_hdr));
-	qeth_l2_fill_header(hdr, skb_copy, cast_type,
-			    skb_copy->len - sizeof(*hdr));
-	if (skb_copy->ip_summed == CHECKSUM_PARTIAL)
-		qeth_l2_hdr_csum(card, hdr, skb_copy);
-
-	elements = qeth_get_elements_no(card, skb_copy, 0, 0);
+	rc = skb_cow_head(skb, push_len);
+	if (rc)
+		return rc;
+	push_len = qeth_push_hdr(skb, &hdr, push_len);
+	if (push_len < 0)
+		return push_len;
+	if (!push_len) {
+		/* hdr was allocated from cache */
+		hd_len = sizeof(*hdr);
+		hdr_elements = 1;
+	}
+	qeth_l2_fill_header(hdr, skb, cast_type, skb->len - push_len);
+	if (skb->ip_summed == CHECKSUM_PARTIAL)
+		qeth_l2_hdr_csum(card, hdr, skb);
+
+	elements = qeth_get_elements_no(card, skb, hdr_elements, 0);
 	if (!elements) {
 		rc = -E2BIG;
 		goto out;
 	}
-	if (qeth_hdr_chk_and_bounce(skb_copy, &hdr, sizeof(*hdr))) {
-		rc = -EINVAL;
-		goto out;
-	}
-	rc = qeth_do_send_packet(card, queue, skb_copy, hdr, 0, 0, elements);
+	elements += hdr_elements;
+
+	/* TODO: remove the skb_orphan() once TX completion is fast enough */
+	skb_orphan(skb);
+	rc = qeth_do_send_packet(card, queue, skb, hdr, 0, hd_len, elements);
 out:
 	if (!rc) {
-		/* tx success, free dangling original */
-		dev_kfree_skb_any(skb);
 		if (card->options.performance_stats && nr_frags) {
 			card->perf_stats.sg_skbs_sent++;
 			/* nr_frags + skb->data */
 			card->perf_stats.sg_frags_sent += nr_frags + 1;
 		}
 	} else {
-		/* tx fail, free copy */
-		dev_kfree_skb_any(skb_copy);
+		if (hd_len)
+			kmem_cache_free(qeth_core_header_cache, hdr);
+		if (rc == -EBUSY)
+			/* roll back to ETH header */
+			skb_pull(skb, push_len);
 	}
 	return rc;
 }
@@ -1011,6 +1019,12 @@ static int qeth_l2_setup_netdev(struct qeth_card *card)
 			card->dev->vlan_features |= NETIF_F_RXCSUM;
 		}
 	}
+	if (card->info.type != QETH_CARD_TYPE_OSN &&
+	    card->info.type != QETH_CARD_TYPE_IQD) {
+		card->dev->priv_flags &= ~IFF_TX_SKB_SHARING;
+		card->dev->needed_headroom = sizeof(struct qeth_hdr);
+	}
+
 	card->info.broadcast_capable = 1;
 	qeth_l2_request_initial_mac(card);
 	card->dev->gso_max_size = (QETH_MAX_BUFFER_ELEMENTS(card) - 1) *
-- 
2.11.2

^ permalink raw reply related

* [PATCH net-next 6/7] s390/qeth: unify code to build header elements
From: Julian Wiedmann @ 2017-08-18  8:19 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, linux-s390, Martin Schwidefsky, Heiko Carstens,
	Stefan Raspl, Ursula Braun, Julian Wiedmann
In-Reply-To: <20170818081910.48869-1-jwi@linux.vnet.ibm.com>

After plenty of refactoring, use hd_len as single indication that
the skb needs a dedicated header element.

This preserves existing behaviour for TSO, as 'hdr' always points
to skb->data.

Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
---
 drivers/s390/net/qeth_core_main.c | 29 +++++++++++++++--------------
 1 file changed, 15 insertions(+), 14 deletions(-)

diff --git a/drivers/s390/net/qeth_core_main.c b/drivers/s390/net/qeth_core_main.c
index cef9f54d0eb9..ffefdd97abca 100644
--- a/drivers/s390/net/qeth_core_main.c
+++ b/drivers/s390/net/qeth_core_main.c
@@ -3953,37 +3953,38 @@ static void __qeth_fill_buffer(struct sk_buff *skb,
 	buf->next_element_to_fill = element;
 }
 
+/**
+ * qeth_fill_buffer() - map skb into an output buffer
+ * @queue:	QDIO queue to submit the buffer on
+ * @buf:	buffer to transport the skb
+ * @skb:	skb to map into the buffer
+ * @hdr:	qeth_hdr for this skb. Either at skb->data, or allocated
+ *		from qeth_core_header_cache.
+ * @offset:	when mapping the skb, start at skb->data + offset
+ * @hd_len:	if > 0, build a dedicated header element of this size
+ */
 static int qeth_fill_buffer(struct qeth_qdio_out_q *queue,
 			    struct qeth_qdio_out_buffer *buf,
 			    struct sk_buff *skb, struct qeth_hdr *hdr,
 			    unsigned int offset, unsigned int hd_len)
 {
-	struct qdio_buffer *buffer;
+	struct qdio_buffer *buffer = buf->buffer;
 	bool is_first_elem = true;
 	int flush_cnt = 0;
 
-	buffer = buf->buffer;
 	refcount_inc(&skb->users);
 	skb_queue_tail(&buf->skb_list, skb);
 
-	if (hdr->hdr.l3.id == QETH_HEADER_TYPE_TSO) {
-		int element = buf->next_element_to_fill;
-		is_first_elem = false;
-
-		/*fill first buffer entry only with header information */
-		buffer->element[element].addr = skb->data;
-		buffer->element[element].length = hd_len;
-		buffer->element[element].eflags = SBAL_EFLAGS_FIRST_FRAG;
-		buf->next_element_to_fill++;
-	/* IQD */
-	} else if (offset) {
+	/* build dedicated header element */
+	if (hd_len) {
 		int element = buf->next_element_to_fill;
 		is_first_elem = false;
 
 		buffer->element[element].addr = hdr;
 		buffer->element[element].length = hd_len;
 		buffer->element[element].eflags = SBAL_EFLAGS_FIRST_FRAG;
-		buf->is_header[element] = 1;
+		/* remember to free cache-allocated qeth_hdr: */
+		buf->is_header[element] = ((void *)hdr != skb->data);
 		buf->next_element_to_fill++;
 	}
 
-- 
2.11.2

^ permalink raw reply related

* [PATCH net-next 5/7] s390/qeth: pass full IQD header length to fill_buffer()
From: Julian Wiedmann @ 2017-08-18  8:19 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, linux-s390, Martin Schwidefsky, Heiko Carstens,
	Stefan Raspl, Ursula Braun, Julian Wiedmann
In-Reply-To: <20170818081910.48869-1-jwi@linux.vnet.ibm.com>

This is a prerequisite for unifying the code to build header elements.
The TSO header has a different size, so we can no longer rely on implicitly
adding the size of a normal qeth_hdr.

No functional change.

Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
---
 drivers/s390/net/qeth_core_main.c | 3 +--
 drivers/s390/net/qeth_l2_main.c   | 2 +-
 drivers/s390/net/qeth_l3_main.c   | 3 ++-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/s390/net/qeth_core_main.c b/drivers/s390/net/qeth_core_main.c
index 4a5c3028dfb6..cef9f54d0eb9 100644
--- a/drivers/s390/net/qeth_core_main.c
+++ b/drivers/s390/net/qeth_core_main.c
@@ -3981,8 +3981,7 @@ static int qeth_fill_buffer(struct qeth_qdio_out_q *queue,
 		is_first_elem = false;
 
 		buffer->element[element].addr = hdr;
-		buffer->element[element].length = sizeof(struct qeth_hdr) +
-							hd_len;
+		buffer->element[element].length = hd_len;
 		buffer->element[element].eflags = SBAL_EFLAGS_FIRST_FRAG;
 		buf->is_header[element] = 1;
 		buf->next_element_to_fill++;
diff --git a/drivers/s390/net/qeth_l2_main.c b/drivers/s390/net/qeth_l2_main.c
index a6233ab562f0..c85fadf21b38 100644
--- a/drivers/s390/net/qeth_l2_main.c
+++ b/drivers/s390/net/qeth_l2_main.c
@@ -695,7 +695,7 @@ static int qeth_l2_xmit_iqd(struct qeth_card *card, struct sk_buff *skb,
 		goto out;
 	}
 	rc = qeth_do_send_packet_fast(card, queue, skb, hdr, data_offset,
-				      data_offset);
+				      sizeof(*hdr) + data_offset);
 out:
 	if (rc)
 		kmem_cache_free(qeth_core_header_cache, hdr);
diff --git a/drivers/s390/net/qeth_l3_main.c b/drivers/s390/net/qeth_l3_main.c
index 02400bbcb610..ab661a431f7c 100644
--- a/drivers/s390/net/qeth_l3_main.c
+++ b/drivers/s390/net/qeth_l3_main.c
@@ -2670,6 +2670,7 @@ static netdev_tx_t qeth_l3_hard_start_xmit(struct sk_buff *skb,
 	if (card->info.type == QETH_CARD_TYPE_IQD) {
 		new_skb = skb;
 		data_offset = ETH_HLEN;
+		hd_len = sizeof(*hdr);
 		hdr = kmem_cache_alloc(qeth_core_header_cache, GFP_ATOMIC);
 		if (!hdr)
 			goto tx_drop;
@@ -2771,7 +2772,7 @@ static netdev_tx_t qeth_l3_hard_start_xmit(struct sk_buff *skb,
 					 hd_len, elements);
 	} else
 		rc = qeth_do_send_packet_fast(card, queue, new_skb, hdr,
-					      data_offset, 0);
+					      data_offset, hd_len);
 
 	if (!rc) {
 		card->stats.tx_packets++;
-- 
2.11.2

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox