Netdev List
* [net-next 8/8] Revert "igb: Fix a deadlock in igb_sriov_reinit"
From: Jeff Kirsher @ 2016-04-07  4:37 UTC (permalink / raw)
  To: davem; +Cc: Arika Chen, netdev, nhorman, sassmann, jogreene, Jeff Kirsher
In-Reply-To: <1460003853-133949-1-git-send-email-jeffrey.t.kirsher@intel.com>

From: Arika Chen <arika.chen@huawei.com>

This reverts commit 3eb14ea8d958 ("igb: Fix a deadlock in
igb_sriov_reinit").
It is the same as commit f468adc944ef ("igb: missing rtnl_unlock in
igb_sriov_reinit()").
igb_resume() does not take rtnl_lock() before this point, so the extra
rtnl_unlock() will cause a deadlock.

Signed-off-by: Arika Chen <arika.chen@huawei.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
---
 drivers/net/ethernet/intel/igb/igb_main.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c
index ff0476c..8e96c35 100644
--- a/drivers/net/ethernet/intel/igb/igb_main.c
+++ b/drivers/net/ethernet/intel/igb/igb_main.c
@@ -7579,7 +7579,6 @@ static int igb_resume(struct device *dev)
 
 	if (igb_init_interrupt_scheme(adapter, true)) {
 		dev_err(&pdev->dev, "Unable to allocate memory for queues\n");
-		rtnl_unlock();
 		return -ENOMEM;
 	}
 
-- 
2.5.5

^ permalink raw reply related

* RE: [PATCH net-next] net: add the AF_KCM entries to family name tables
From: Dexuan Cui @ 2016-04-07  4:52 UTC (permalink / raw)
  To: David Miller; +Cc: netdev@vger.kernel.org
In-Reply-To: <20160406.235914.1353594601077158951.davem@davemloft.net>

> From: David Miller [mailto:davem@davemloft.net]
> Sent: Thursday, April 7, 2016 11:59
> To: Dexuan Cui <decui@microsoft.com>
> Cc: netdev@vger.kernel.org
> Subject: Re: [PATCH net-next] net: add the AF_KCM entries to family name
> tables
> 
> From: Dexuan Cui <decui@microsoft.com>
> Date: Thu, 7 Apr 2016 01:54:18 +0000
> 
> > Can you please apply this to net-next too?
> 
> That will happen transparently the next time I merge 'net' into
> 'net-next'.
> 
> It will happen at a time of my own choosing, and usually occurs
> when I do a push of my 'net' tree to Linus and he takes it in,
> and I know people need some 'net' things in 'net-next'.

Thanks for the explanation!

So, at present, let me only post the single AF_HYPERV patch to
net-next and hold the patch that adds AF_HYPERV entries to the family
name tables.

Thanks,
-- Dexuan

^ permalink raw reply

* [PATCH net-next] tcp/dccp: fix inet_reuseport_add_sock()
From: Eric Dumazet @ 2016-04-07  5:07 UTC (permalink / raw)
  To: David Ahern, David Miller; +Cc: netdev@vger.kernel.org, edumazet
In-Reply-To: <5705A392.2000207@cumulusnetworks.com>

From: Eric Dumazet <edumazet@google.com>

David Ahern reported panics in __inet_hash() caused by my recent commit.

The reason is inet_reuseport_add_sock() was still using
sk_nulls_for_each_rcu() instead of sk_for_each_rcu().
SO_REUSEPORT enabled listeners were causing an instant crash.

While chasing this bug, I found that I forgot to clear SOCK_RCU_FREE
flag, as it is inherited from the parent at clone time.

Fixes: 3b24d854cb35 ("tcp/dccp: do not touch listener sk_refcnt under synflood")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: David Ahern <dsa@cumulusnetworks.com>
---
 net/ipv4/inet_connection_sock.c |    3 +++
 net/ipv4/inet_hashtables.c      |    3 +--
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index bc5196ea1bdf..ab69da2d2a77 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -661,6 +661,9 @@ struct sock *inet_csk_clone_lock(const struct sock *sk,
 		inet_sk(newsk)->inet_sport = htons(inet_rsk(req)->ir_num);
 		newsk->sk_write_space = sk_stream_write_space;
 
+		/* listeners have SOCK_RCU_FREE, not the children */
+		sock_reset_flag(newsk, SOCK_RCU_FREE);
+
 		newsk->sk_mark = inet_rsk(req)->ir_mark;
 		atomic64_set(&newsk->sk_cookie,
 			     atomic64_read(&inet_rsk(req)->ir_cookie));
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index 98ba03b6f87d..fcadb670f50b 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -439,10 +439,9 @@ static int inet_reuseport_add_sock(struct sock *sk,
 						     bool match_wildcard))
 {
 	struct sock *sk2;
-	struct hlist_nulls_node *node;
 	kuid_t uid = sock_i_uid(sk);
 
-	sk_nulls_for_each_rcu(sk2, node, &ilb->head) {
+	sk_for_each_rcu(sk2, &ilb->head) {
 		if (sk2 != sk &&
 		    sk2->sk_family == sk->sk_family &&
 		    ipv6_only_sock(sk2) == ipv6_only_sock(sk) &&

^ permalink raw reply related

* Re: [PATCH RFC] net: decrease the length of backlog queue immediately after it's detached from sk
From: Yang Yingliang @ 2016-04-07  6:01 UTC (permalink / raw)
  To: Sergei Shtylyov, netdev; +Cc: davem, eric.dumazet
In-Reply-To: <56FBCCF6.80203@cogentembedded.com>



On 2016/3/30 20:56, Sergei Shtylyov wrote:
> Hello.
>
> On 3/30/2016 8:16 AM, Yang Yingliang wrote:
>
>> When task A hold the sk owned in tcp_sendmsg, if lots of packets
>> arrive and the packets will be added to backlog queue. The packets
>> will be handled in release_sock called from tcp_sendmsg. When the
>> sk_backlog is removed from sk, the length will not decrease until
>> all the packets in backlog queue are handled. This may leads to the
>> new packets be dropped because the lenth is too big. So set the
>> lenth to 0 immediately after it's detached from sk.
>
>     Length?
>
>> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
> [...]
>
> MBR, Sergei
>
>
Yes. It's a typo.

Thanks
Yang

^ permalink raw reply

* Re: [PATCH RFC] net: decrease the length of backlog queue immediately after it's detached from sk
From: Yang Yingliang @ 2016-04-07  5:59 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev, davem, Ding Tianhong
In-Reply-To: <1459345637.6473.205.camel@edumazet-glaptop3.roam.corp.google.com>



On 2016/3/30 21:47, Eric Dumazet wrote:
> On Wed, 2016-03-30 at 13:56 +0800, Yang Yingliang wrote:
>
>> Sorry, I made a mistake. I am very sure my kernel has these two patches.
>> And I can get some dropping of the packets in 10Gb eth.
>>
>> # netstat -s | grep -i backlog
>>       TCPBacklogDrop: 4135
>> # netstat -s | grep -i backlog
>>       TCPBacklogDrop: 4167
>
> Sender will retransmit and the receiver backlog will likely be emptied
> before the packets arrive again.
>
> Are you sure these are TCP drops ?
Yes.

>
> Which 10Gb NIC is it ? (ethtool -i eth0)
The NIC driver is not upstream. And my system is arm64.

>
> What is the max size of sendmsg() chunks are generated by your apps ?
256KB

>
> Are they forcing small SO_RCVBUF or SO_SNDBUF ?
I am not sure.
I added some debug messages in the kernel:
[2016-04-06 10:56:55][ 1365.477140] TCP: rcvbuf:10485760 sndbuf:2097152 
limit:12582912 backloglen:12402232 rmem_alloc:0 truesize:53320
[2016-04-06 10:56:55][ 1365.477170] TCP: rcvbuf:10485760 sndbuf:2097152 
limit:12582912 backloglen:12460884 rmem_alloc:55986 truesize:58652
[2016-04-06 10:56:55][ 1365.477192] TCP: rcvbuf:10485760 sndbuf:2097152 
limit:12582912 backloglen:12506206 rmem_alloc:0 truesize:45322
[2016-04-06 10:56:55][ 1365.477226] TCP: rcvbuf:10485760 sndbuf:2097152 
limit:12582912 backloglen:12519536 rmem_alloc:7998 truesize:13330
[2016-04-06 10:56:55][ 1365.477254] TCP: rcvbuf:10485760 sndbuf:2097152 
limit:12582912 backloglen:12575522 rmem_alloc:0 truesize:55986
[2016-04-06 10:56:55][ 1365.477282] TCP: rcvbuf:10485760 sndbuf:2097152 
limit:12582912 backloglen:12634174 rmem_alloc:0 truesize:58652
[2016-04-06 10:56:55][ 1365.477301] TCP: rcvbuf:10485760 sndbuf:2097152 
limit:12582912 backloglen:12634174 rmem_alloc:26660 truesize:31992
[2016-04-06 10:56:55][ 1365.477321] TCP: rcvbuf:10485760 sndbuf:2097152 
limit:12582912 backloglen:12634174 rmem_alloc:58652 truesize:26660
[2016-04-06 10:56:55][ 1365.477341] TCP: rcvbuf:10485760 sndbuf:2097152 
limit:12582912 backloglen:12634174 rmem_alloc:58652 truesize:42656
[2016-04-06 10:56:55][ 1365.477384] TCP: rcvbuf:10485760 sndbuf:2097152 
limit:12582912 backloglen:12634174 rmem_alloc:0 truesize:58652
[2016-04-06 10:56:55][ 1365.477403] TCP: rcvbuf:10485760 sndbuf:2097152 
limit:12582912 backloglen:12634174 rmem_alloc:0 truesize:34658

>
> What percentage of drops do you have ?
netstat -s | grep -i TCPBacklogDrop increases 20-40 per second.
It's about 1.2% (117724(TCPBacklogDrop)/214502873(InSegs of cat 
/proc/net/snmp)).

>
> Here (at Google), we have less than one backlog drop per billion
> packets, on host facing the public Internet.
>
> If a TCP sender sends a burst of tiny packets because it is misbehaving,
> you absolutely will drop packets, especially if applications use
> sendmsg() with very big lengths and big SO_SNDBUF.
>
> Trying to not drop these hostile packets as you did is simply opening
> your host to DOS attacks.
>
> Eventually, we should even drop earlier in TCP stack (before taking
> socket lock).
>
>
How about expanding the buffer like this:

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 6d204f3..da1bc16 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -281,6 +281,7 @@ extern unsigned int sysctl_tcp_notsent_lowat;
  extern int sysctl_tcp_min_tso_segs;
  extern int sysctl_tcp_autocorking;
  extern int sysctl_tcp_invalid_ratelimit;
+extern int sysctl_tcp_backlog_buf_multi;

  extern atomic_long_t tcp_memory_allocated;
  extern struct percpu_counter tcp_sockets_allocated;
diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index f0e8297..9511410 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -631,6 +631,13 @@ static struct ctl_table ipv4_table[] = {
  		.mode		= 0644,
  		.proc_handler	= proc_dointvec
  	},
+	{
+		.procname	= "tcp_backlog_buf_multi",
+		.data		= &sysctl_tcp_backlog_buf_multi,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec,
+	},
  #ifdef CONFIG_NETLABEL
  	{
  		.procname	= "cipso_cache_enable",
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 87463c8..337ad55 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -101,6 +101,8 @@ int sysctl_tcp_thin_dupack __read_mostly;
  int sysctl_tcp_moderate_rcvbuf __read_mostly = 1;
  int sysctl_tcp_early_retrans __read_mostly = 3;
  int sysctl_tcp_invalid_ratelimit __read_mostly = HZ/2;
+int sysctl_tcp_backlog_buf_multi __read_mostly = 1;
+EXPORT_SYMBOL(sysctl_tcp_backlog_buf_multi);

  #define FLAG_DATA		0x01 /* Incoming frame contained data.		*/
  #define FLAG_WIN_UPDATE		0x02 /* Incoming ACK was a window update.	*/
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 13b92d5..39272f3 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1635,7 +1635,8 @@ process:
  		if (!tcp_prequeue(sk, skb))
  			ret = tcp_v4_do_rcv(sk, skb);
  	} else if (unlikely(sk_add_backlog(sk, skb,
-					   sk->sk_rcvbuf + sk->sk_sndbuf))) {
+					   (sk->sk_rcvbuf + sk->sk_sndbuf) *
+					   sysctl_tcp_backlog_buf_multi))) {
  		bh_unlock_sock(sk);
  		NET_INC_STATS_BH(net, LINUX_MIB_TCPBACKLOGDROP);
  		goto discard_and_relse;
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index c1147ac..1e8f709 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1433,7 +1433,8 @@ process:
  		if (!tcp_prequeue(sk, skb))
  			ret = tcp_v6_do_rcv(sk, skb);
  	} else if (unlikely(sk_add_backlog(sk, skb,
-					   sk->sk_rcvbuf + sk->sk_sndbuf))) {
+					   (sk->sk_rcvbuf + sk->sk_sndbuf) *
+					   sysctl_tcp_backlog_buf_multi))) {
  		bh_unlock_sock(sk);
  		NET_INC_STATS_BH(net, LINUX_MIB_TCPBACKLOGDROP);
  		goto discard_and_relse;
-- 

^ permalink raw reply related
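
The debug output in the message above can be checked by hand. The following is a minimal userspace sketch, not kernel code: it assumes the backlog is treated as full when backlog length plus rmem_alloc exceeds the limit, and it takes the limit as (rcvbuf + sndbuf) times the proposed multiplier, as in the diff. The helper names backlog_limit() and rcvqueues_full() are hypothetical.

```c
#include <stdbool.h>

/*
 * Limit handed to the backlog check: (rcvbuf + sndbuf) scaled by the
 * proposed tcp_backlog_buf_multi value (1 keeps the current behaviour).
 */
static unsigned int backlog_limit(unsigned int rcvbuf, unsigned int sndbuf,
				  unsigned int multi)
{
	return (rcvbuf + sndbuf) * multi;
}

/*
 * Assumed form of the "queue is full" test: the bytes already queued
 * (backlog plus receive-queue memory) are compared against the limit.
 */
static bool rcvqueues_full(unsigned int backlog_len, unsigned int rmem_alloc,
			   unsigned int limit)
{
	return backlog_len + rmem_alloc > limit;
}
```

With the logged values, backlog_limit(10485760, 2097152, 1) gives 12582912, which matches the "limit" field. Under the assumed check, the first sample (backloglen 12402232, rmem_alloc 0) is still below the limit, while the samples with backloglen 12634174 are above it and would be dropped.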

* Backport Security Fix for CVE-2015-8787 to v4.1
From: Yuki Machida @ 2016-04-07  6:40 UTC (permalink / raw)
  To: davem; +Cc: netdev, kamatam, pablo

Hi David,

I confirmed that the patch for CVE-2015-8787 is not applied in v4.1.21.
Could you please apply the patch to 4.1-stable?

CVE-2015-8787
Upstream commit 94f9cd81436c85d8c3a318ba92e236ede73752fc

Regards,
Yuki Machida

^ permalink raw reply

* [PATCH] net: mark DECnet as broken
From: Vegard Nossum @ 2016-04-07  7:22 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, linux-kernel, Vegard Nossum, Eric Dumazet, Sasha Levin

There are NULL pointer dereference bugs in DECnet which can be triggered
by unprivileged users and have been reported multiple times to LKML,
however nobody seems confident enough in the proposed fixes to merge them
and the consensus seems to be that nobody cares enough about DECnet to
see it fixed anyway.

To shield unsuspecting users from the possible DOS, we should mark this
BROKEN until somebody who actually uses this code can fix it.

Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Link: https://lkml.org/lkml/2015/12/17/666
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: David Miller <davem@davemloft.net>
---
 net/decnet/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/decnet/Kconfig b/net/decnet/Kconfig
index f3393e1..b040172 100644
--- a/net/decnet/Kconfig
+++ b/net/decnet/Kconfig
@@ -3,6 +3,7 @@
 #
 config DECNET
 	tristate "DECnet Support"
+	depends on BROKEN
 	---help---
 	  The DECnet networking protocol was used in many products made by
 	  Digital (now Compaq).  It provides reliable stream and sequenced
-- 
1.9.1

^ permalink raw reply related

* [PATCH] netlink: don't send NETLINK_URELEASE for unbound sockets
From: Johannes Berg @ 2016-04-07  7:31 UTC (permalink / raw)
  To: netdev; +Cc: Dmitry Ivanov, linux-wireless

From: Dmitry Ivanov <dmitrijs.ivanovs@ubnt.com>

All existing users of NETLINK_URELEASE use it to clean up resources that
were previously allocated to a socket via some command. As a result, no
users require getting this notification for unbound sockets.

Sending it for unbound sockets, however, is a problem because any user
(including unprivileged users) can create a socket that uses the same ID
as an existing socket. Binding this new socket will fail, but if the
NETLINK_URELEASE notification is generated for such sockets, the users
thereof will be tricked into thinking the socket that they allocated the
resources for is closed.

In the nl80211 case, this will cause destruction of virtual interfaces
that still belong to an existing hostapd process; this is the case that
Dmitry noticed. In the NFC case, it will cause a poll abort. In the case
of netlink log/queue it will cause them to stop reporting events, as if
NFULNL_CFG_CMD_UNBIND/NFQNL_CFG_CMD_UNBIND had been called.

Fix this problem by checking that the socket is bound before generating
the NETLINK_URELEASE notification.

Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Ivanov <dima@ubnt.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
---
 net/netlink/af_netlink.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 215fc08c02ab..330ebd600f25 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -688,7 +688,7 @@ static int netlink_release(struct socket *sock)
 
 	skb_queue_purge(&sk->sk_write_queue);
 
-	if (nlk->portid) {
+	if (nlk->portid && nlk->bound) {
 		struct netlink_notify n = {
 						.net = sock_net(sk),
 						.protocol = sk->sk_protocol,
-- 
2.7.0

^ permalink raw reply related
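
The one-line change above can be illustrated with a small userspace model. This is only a sketch: struct nl_sock_model and both helper functions are hypothetical stand-ins for the two fields the patch tests, not the kernel's struct netlink_sock.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two fields the patch looks at. */
struct nl_sock_model {
	uint32_t portid;	/* ID requested for the socket */
	bool bound;		/* true only once binding succeeded */
};

/* Condition before the patch: any non-zero portid triggers the notify. */
static bool notify_release_old(const struct nl_sock_model *s)
{
	return s->portid != 0;
}

/* Condition after the patch: the socket must also be bound. */
static bool notify_release_new(const struct nl_sock_model *s)
{
	return s->portid != 0 && s->bound;
}
```

A socket that requested an ID already in use, and therefore never became bound, has a non-zero portid with bound == false. The old condition would still send the notification for it; the new one does not, while a properly bound socket is notified in both cases.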

* Re: [PATCH net-next] bpf: simplify verifier register state assignments
From: Daniel Borkmann @ 2016-04-07  7:39 UTC (permalink / raw)
  To: Alexei Starovoitov; +Cc: David S . Miller, netdev
In-Reply-To: <1459996761-2926623-1-git-send-email-ast@fb.com>

On 04/07/2016 04:39 AM, Alexei Starovoitov wrote:
> verifier is using the following structure to track the state of registers:
> struct reg_state {
>      enum bpf_reg_type type;
>      union {
>          int imm;
>          struct bpf_map *map_ptr;
>      };
> };
> and later on in states_equal() does memcmp(&old->regs[i], &cur->regs[i],..)
> to find equivalent states.
> Throughout the code of verifier there are assignments to 'imm' and 'map_ptr'
> fields and it's not obvious that most of the assignments into 'imm' don't
> need to clear extra 4 bytes (like mark_reg_unknown_value() does) to make sure
> that memcmp doesn't go over junk left from 'map_ptr' assignment.
>
> Simplify the code by converting 'int' into 'long'
>
> Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
> Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Acked-by: Daniel Borkmann <daniel@iogearbox.net>

^ permalink raw reply
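
The memcmp() concern in the quoted commit message can be reproduced in userspace. This is a sketch under assumptions: the two structs below are simplified models of the verifier state (an int stands in for the enum, struct map_model is an opaque placeholder), and the behaviour shown is what common compilers do on LP64 targets, since the C standard leaves the untouched union bytes unspecified.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

struct map_model;	/* opaque placeholder for struct bpf_map */

/* Model of the state before the patch: 'imm' is narrower than 'map_ptr'. */
struct reg_state_int {
	int type;
	union {
		int imm;
		struct map_model *map_ptr;
	};
};

/* Model of the state after the patch: 'imm' is a long. */
struct reg_state_long {
	int type;
	union {
		long imm;
		struct map_model *map_ptr;
	};
};

/* Reuse a slot that held a pointer, then compare with a fresh state. */
static bool int_imm_equal_after_reuse(void)
{
	struct reg_state_int a, b;

	memset(&a, 0, sizeof(a));
	memset(&b, 0, sizeof(b));
	a.map_ptr = (struct map_model *)UINTPTR_MAX;	/* never dereferenced */
	a.imm = 0;	/* overwrites only sizeof(int) bytes */
	b.imm = 0;
	return memcmp(&a, &b, sizeof(a)) == 0;
}

static bool long_imm_equal_after_reuse(void)
{
	struct reg_state_long a, b;

	memset(&a, 0, sizeof(a));
	memset(&b, 0, sizeof(b));
	a.map_ptr = (struct map_model *)UINTPTR_MAX;	/* never dereferenced */
	a.imm = 0;	/* overwrites sizeof(long) bytes */
	b.imm = 0;
	return memcmp(&a, &b, sizeof(a)) == 0;
}
```

Where pointers are wider than int, the int variant typically compares unequal because the upper bytes of the old pointer survive the 'imm' store. Where long is as wide as a pointer, the long variant compares equal without any extra clearing.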

* Re: [PATCH] net: mark DECnet as broken
From: James Cameron @ 2016-04-07  7:50 UTC (permalink / raw)
  To: Vegard Nossum
  Cc: David Miller, netdev, linux-kernel, Eric Dumazet, Sasha Levin
In-Reply-To: <1460013763-22985-1-git-send-email-vegard.nossum@oracle.com>

On Thu, Apr 07, 2016 at 09:22:43AM +0200, Vegard Nossum wrote:
> There are NULL pointer dereference bugs in DECnet which can be triggered
> by unprivileged users and have been reported multiple times to LKML,
> however nobody seems confident enough in the proposed fixes to merge them
> and the consensus seems to be that nobody cares enough about DECnet to
> see it fixed anyway.
> 
> To shield unsuspecting users from the possible DOS, we should mark this
> BROKEN until somebody who actually uses this code can fix it.
> 
> Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
> Link: https://lkml.org/lkml/2015/12/17/666
> Cc: Eric Dumazet <eric.dumazet@gmail.com>
> Cc: Sasha Levin <sasha.levin@oracle.com>
> Cc: David Miller <davem@davemloft.net>

Reviewed-by: James Cameron <quozl@laptop.org>

(An old DECnet application programmer from way back, ah what fun!)

> ---
>  net/decnet/Kconfig | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/net/decnet/Kconfig b/net/decnet/Kconfig
> index f3393e1..b040172 100644
> --- a/net/decnet/Kconfig
> +++ b/net/decnet/Kconfig
> @@ -3,6 +3,7 @@
>  #
>  config DECNET
>  	tristate "DECnet Support"
> +	depends on BROKEN
>  	---help---
>  	  The DECnet networking protocol was used in many products made by
>  	  Digital (now Compaq).  It provides reliable stream and sequenced

fwiw, then Compaq merged into HP.

> -- 
> 1.9.1
> 

-- 
James Cameron
http://quozl.netrek.org/

^ permalink raw reply

* Re: [RFC PATCH 0/2] selinux: avoid nf hooks overhead when not needed
From: Paolo Abeni @ 2016-04-07  7:59 UTC (permalink / raw)
  To: Casey Schaufler
  Cc: linux-security-module, David S. Miller, James Morris, Paul Moore,
	Andreas Gruenbacher, Stephen Smalley, Florian Westphal, netdev,
	selinux
In-Reply-To: <570582FC.7030807@schaufler-ca.com>

Hi Casey,

On Wed, 2016-04-06 at 14:43 -0700, Casey Schaufler wrote:
> On 4/6/2016 2:51 AM, Paolo Abeni wrote:
> > Currently, selinux always registers iptables POSTROUTING hooks regardless of
> > the running policy needs for any action to be performed by them.
> >
> > Even the socket_sock_rcv_skb() is always registered, but it can result in a no-op
> > depending on the current policy configuration.
> >
> > The above invocations in the kernel datapath are cause of measurable
> > overhead in networking performance test.
> >
> > This patch series adds explicit notification for netlabel status change 
> > (other relevant status change, like xfrm and secmark, are already notified to
> > LSM) and use this information in selinux to register the above hooks only when
> > the current status makes them relevant, deregistering them when no-op
> >
> > Avoiding the LSM hooks overhead, in netperf UDP_STREAM test with small packets,
> > gives about 5% performance improvement on rx and about 8% on tx.
> >
> > Paolo Abeni (2):
> >   security: add hook for netlabel status change notification
> >   selinux: implement support for dynamic net hook [de-]registration

Thank you for the feedback. The patch series is an RFC, so it's still
rough and not yet well tested in all possible scenarios.

> Did you consider the fact that netlabel and the LSM socket
> hooks are used by Smack as well as SELinux? 

Actually yes. The patch series itself is explicitly targeted at reducing
some overhead introduced by selinux in network loads (I'm sorry, now I
see that the last sentence in the cover letter is misleading), and it
tries to achieve that result without affecting other LSM users.

The first patch in the series just introduces an optional LSM hook
(netlbl_changed) that is invoked every time the
'netlabel_mgmt_protocount' value is changed. It does not modify the
behavior or meaning of any of the existing hooks and/or netlabel APIs.
It's up to the security module to leverage (or not) the new one.

> Did you measure the impact that your changes have on Smack? 

Actually I didn't. This is one of the reasons I posted the patch as RFC.
By design, security modules not implementing 'netlbl_changed' should
not be affected. Am I missing something?

Regards,

Paolo


^ permalink raw reply

* [PATCH 0/1] net: stmmac: socfpga: Ensure emac bit set in System Manager for PTP
From: Phil Reid @ 2016-04-07  7:55 UTC (permalink / raw)
  To: peppe.cavallaro, netdev; +Cc: Phil Reid

Enable PTP FPGA clock, pps and ext trig connections to stmmac.
Note: This hardware configuration is not officially supported by Altera.


Phil Reid (1):
  net: stmmac: socfpga: Ensure emac bit set in System Manager for PTP

 drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

-- 
1.8.3.1

^ permalink raw reply

* [PATCH 1/1] net: stmmac: socfpga: Ensure emac bit set in System Manager for PTP
From: Phil Reid @ 2016-04-07  7:55 UTC (permalink / raw)
  To: peppe.cavallaro, netdev; +Cc: Phil Reid
In-Reply-To: <1460015735-8946-1-git-send-email-preid@electromag.com.au>

When using the PTP FPGA-to-HPS clock source for the stmmac module,
the appropriate bit in the System Manager FPGA Interface Group register
needs to be set. The bootloader setup does not set it when the
HPS emac pins are being used for this emac module.

This allows the PTP clock to be sourced from the FPGA and also connects
the PTP pps and ext trig signals to the stmmac PTP hardware.

Patch proposed by Phil Collins.

Signed-off-by: Phil Reid <preid@electromag.com.au>
---
 drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c
index f0d797a..44022b1 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.c
@@ -34,6 +34,9 @@
 #define SYSMGR_EMACGRP_CTRL_PHYSEL_MASK 0x00000003
 #define SYSMGR_EMACGRP_CTRL_PTP_REF_CLK_MASK 0x00000010
 
+#define SYSMGR_FPGAGRP_MODULE_REG  0x00000028
+#define SYSMGR_FPGAGRP_MODULE_EMAC 0x00000004
+
 #define EMAC_SPLITTER_CTRL_REG			0x0
 #define EMAC_SPLITTER_CTRL_SPEED_MASK		0x3
 #define EMAC_SPLITTER_CTRL_SPEED_10		0x2
@@ -148,7 +151,7 @@ static int socfpga_dwmac_setup(struct socfpga_dwmac *dwmac)
 	int phymode = dwmac->interface;
 	u32 reg_offset = dwmac->reg_offset;
 	u32 reg_shift = dwmac->reg_shift;
-	u32 ctrl, val;
+	u32 ctrl, val, module;
 
 	switch (phymode) {
 	case PHY_INTERFACE_MODE_RGMII:
@@ -175,12 +178,19 @@ static int socfpga_dwmac_setup(struct socfpga_dwmac *dwmac)
 	ctrl &= ~(SYSMGR_EMACGRP_CTRL_PHYSEL_MASK << reg_shift);
 	ctrl |= val << reg_shift;
 
-	if (dwmac->f2h_ptp_ref_clk)
+	if (dwmac->f2h_ptp_ref_clk) {
 		ctrl |= SYSMGR_EMACGRP_CTRL_PTP_REF_CLK_MASK << (reg_shift / 2);
-	else
+		regmap_read(sys_mgr_base_addr, SYSMGR_FPGAGRP_MODULE_REG,
+			    &module);
+		module |= (SYSMGR_FPGAGRP_MODULE_EMAC << (reg_shift / 2));
+		regmap_write(sys_mgr_base_addr, SYSMGR_FPGAGRP_MODULE_REG,
+			     module);
+	} else {
 		ctrl &= ~(SYSMGR_EMACGRP_CTRL_PTP_REF_CLK_MASK << (reg_shift / 2));
+	}
 
 	regmap_write(sys_mgr_base_addr, reg_offset, ctrl);
+
 	return 0;
 }
 
-- 
1.8.3.1

^ permalink raw reply related

* Re: [PATCH v2 2/2] sctp: delay calls to sk_data_ready() as much as possible
From: Jakub Sitnicki @ 2016-04-07  8:05 UTC (permalink / raw)
  To: Marcelo Ricardo Leitner; +Cc: netdev, Neil Horman, Vlad Yasevich, linux-sctp
In-Reply-To: <703257ed516669b180fcce57e6745b1853da9a95.1459952558.git.marcelo.leitner@gmail.com>

On Wed, Apr 06, 2016 at 07:53 PM CEST, Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> wrote:
> Currently, the processing of multiple chunks in a single SCTP packet
> leads to multiple calls to sk_data_ready, causing multiple wake up
> signals which are costly and doesn't make it wake up any faster.
>
> With this patch it will notice that the wake up is pending and will do it
> before leaving the state machine interpreter, latest place possible to
> do it reliably and cleanly.
>
> Note that sk_data_ready events are not dependent on asocs, unlike waking
> up writers.
>
> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
> ---

[...]

> diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
> index 7fe56d0acabf66cfd8fe29dfdb45f7620b470ac7..e7042f9ce63b0cfca50cae252f51b60b68cb5731 100644
> --- a/net/sctp/sm_sideeffect.c
> +++ b/net/sctp/sm_sideeffect.c
> @@ -1742,6 +1742,11 @@ out:
>  			error = sctp_outq_uncork(&asoc->outqueue, gfp);
>  	} else if (local_cork)
>  		error = sctp_outq_uncork(&asoc->outqueue, gfp);
> +
> +	if (sctp_sk(ep->base.sk)->pending_data_ready) {
> +		ep->base.sk->sk_data_ready(ep->base.sk);
> +		sctp_sk(ep->base.sk)->pending_data_ready = 0;
> +	}
>  	return error;
>  nomem:
>  	error = -ENOMEM;

Would it make sense to introduce a local variable for ep->base.sk (and
make this function 535+1 lines long ;-)

      struct sock *sk = ep->base.sk;

... like sctp_ulpq_tail_event() does?

Thanks,
Jakub

^ permalink raw reply
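
The effect of deferring the wake-up, as described in the quoted commit message, can be sketched in userspace. Everything here is a simplified model under assumptions: struct sock_model, data_ready() and the two processing functions are hypothetical and only count wake-ups; the local 'sk' pointer mirrors the suggestion made in the review.

```c
#include <stdbool.h>

/* Hypothetical model of a socket that counts wake-up calls. */
struct sock_model {
	bool pending_data_ready;
	int wakeups;
};

static void data_ready(struct sock_model *sk)
{
	sk->wakeups++;
}

/* Old behaviour in this model: one wake-up per chunk in the packet. */
static int wakeups_immediate(int chunks)
{
	struct sock_model sock = { false, 0 };
	struct sock_model *sk = &sock;

	for (int i = 0; i < chunks; i++)
		data_ready(sk);
	return sk->wakeups;
}

/* New behaviour in this model: chunks only mark the wake-up as pending. */
static int wakeups_deferred(int chunks)
{
	struct sock_model sock = { false, 0 };
	struct sock_model *sk = &sock;

	for (int i = 0; i < chunks; i++)
		sk->pending_data_ready = true;

	/* Single wake-up on the way out, only if something was queued. */
	if (sk->pending_data_ready) {
		data_ready(sk);
		sk->pending_data_ready = false;
	}
	return sk->wakeups;
}
```

In this model a packet carrying four chunks triggers four wake-ups with the immediate variant and one with the deferred variant, and a packet that queues no data triggers none.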

* Re: [PATCH RFC] net: decrease the length of backlog queue immediately after it's detached from sk
From: Eric Dumazet @ 2016-04-07 10:21 UTC (permalink / raw)
  To: Yang Yingliang; +Cc: netdev, davem, Ding Tianhong
In-Reply-To: <5705F759.9020003@huawei.com>

On Thu, 2016-04-07 at 13:59 +0800, Yang Yingliang wrote:
> 
> On 2016/3/30 21:47, Eric Dumazet wrote:
> > On Wed, 2016-03-30 at 13:56 +0800, Yang Yingliang wrote:
> >
> >> Sorry, I made a mistake. I am very sure my kernel has these two patches.
> >> And I can get some dropping of the packets in 10Gb eth.
> >>
> >> # netstat -s | grep -i backlog
> >>       TCPBacklogDrop: 4135
> >> # netstat -s | grep -i backlog
> >>       TCPBacklogDrop: 4167
> >
> > > Sender will retransmit and the receiver backlog will likely be emptied
> > > before the packets arrive again.
> >
> > Are you sure these are TCP drops ?
> Yes.
> 
> >
> > Which 10Gb NIC is it ? (ethtool -i eth0)
> The NIC driver is not upstream. And my system is arm64.
> 
> >
> > What is the max size of sendmsg() chunks are generated by your apps ?
> 256KB
> 
> >
> > Are they forcing small SO_RCVBUF or SO_SNDBUF ?
> I am not sure.
> I added some debug messages in the kernel:
> [2016-04-06 10:56:55][ 1365.477140] TCP: rcvbuf:10485760 sndbuf:2097152 
> limit:12582912 backloglen:12402232 rmem_alloc:0 truesize:53320
> [2016-04-06 10:56:55][ 1365.477170] TCP: rcvbuf:10485760 sndbuf:2097152 
> limit:12582912 backloglen:12460884 rmem_alloc:55986 truesize:58652
> [2016-04-06 10:56:55][ 1365.477192] TCP: rcvbuf:10485760 sndbuf:2097152 
> limit:12582912 backloglen:12506206 rmem_alloc:0 truesize:45322
> [2016-04-06 10:56:55][ 1365.477226] TCP: rcvbuf:10485760 sndbuf:2097152 
> limit:12582912 backloglen:12519536 rmem_alloc:7998 truesize:13330
> [2016-04-06 10:56:55][ 1365.477254] TCP: rcvbuf:10485760 sndbuf:2097152 
> limit:12582912 backloglen:12575522 rmem_alloc:0 truesize:55986
> [2016-04-06 10:56:55][ 1365.477282] TCP: rcvbuf:10485760 sndbuf:2097152 
> limit:12582912 backloglen:12634174 rmem_alloc:0 truesize:58652
> [2016-04-06 10:56:55][ 1365.477301] TCP: rcvbuf:10485760 sndbuf:2097152 
> limit:12582912 backloglen:12634174 rmem_alloc:26660 truesize:31992
> [2016-04-06 10:56:55][ 1365.477321] TCP: rcvbuf:10485760 sndbuf:2097152 
> limit:12582912 backloglen:12634174 rmem_alloc:58652 truesize:26660
> [2016-04-06 10:56:55][ 1365.477341] TCP: rcvbuf:10485760 sndbuf:2097152 
> limit:12582912 backloglen:12634174 rmem_alloc:58652 truesize:42656
> [2016-04-06 10:56:55][ 1365.477384] TCP: rcvbuf:10485760 sndbuf:2097152 
> limit:12582912 backloglen:12634174 rmem_alloc:0 truesize:58652
> [2016-04-06 10:56:55][ 1365.477403] TCP: rcvbuf:10485760 sndbuf:2097152 
> limit:12582912 backloglen:12634174 rmem_alloc:0 truesize:34658
> 
> >
> > What percentage of drops do you have ?
> netstat -s | grep -i TCPBacklogDrop increases 20-40 per second.
> It's about 1.2% (117724(TCPBacklogDrop)/214502873(InSegs of cat 
> /proc/net/snmp)).
> 
> >
> > Here (at Google), we have less than one backlog drop per billion
> > packets, on host facing the public Internet.
> >
> > If a TCP sender sends a burst of tiny packets because it is misbehaving,
> > you absolutely will drop packets, especially if applications use
> > sendmsg() with very big lengths and big SO_SNDBUF.
> >
> > Trying to not drop these hostile packets as you did is simply opening
> > your host to DOS attacks.
> >
> > Eventually, we should even drop earlier in TCP stack (before taking
> > socket lock).
> >
> >
> How about expanding the buffer like this:

Please do not send patches before really understanding the issue you
have.

Having a backlog of 12506206 bytes is ridiculous. Dropping packets is
absolutely fine if this ever happens.

Something is really wrong on your host, or the sender simply does not
comply with the TCP protocol (ignoring the receiver window entirely).

Since you added a trace of truesize, please also trace skb->len.

^ permalink raw reply

* Re: [PATCH net-next v2] macvlan: Support interface operstate properly
From: Nikolay Aleksandrov @ 2016-04-07 11:05 UTC (permalink / raw)
  To: Debabrata Banerjee, Patrick McHardy, netdev
In-Reply-To: <1459982173-791-1-git-send-email-dbanerje@akamai.com>

On 04/07/2016 12:36 AM, Debabrata Banerjee wrote:
> Set appropriate macvlan interface status based on lower device and our
> status. Can be up, down, or lowerlayerdown.
What about dormant ?

> 
> de7d244d0 improved operstate by setting it from unknown to up, however
> it did not handle transferring down or lowerlayerdown.
No, this is not correct. It did not set it to "up", lowerlayerdown is currently
being set when the lower device is carrier-off, the only thing not handled is
the lower device going to admin down state. Up until this patch lowerlayerdown is
correctly propagated and set based on lower and macvlan device's carrier states,
after this patch you also include the device flags which makes things inconsistent
because you can overwrite the operstate set by link_watch (or user) afterwards, and
this is especially true for the dormant case. In fact you won't be able to set dormant
on a macvlan device, it'll always get overwritten.
Another (minor) inconsistency is that user space will no longer get IFF_RUNNING in the
flags from dev_get_flags() when the lower device is down but carrier-on and the macvlan
is now in LOWERLAYERDOWN, so it will also show up as NO-CARRIER:
6: mac1@eth2: <NO-CARRIER,BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue state LOWERLAYERDOWN mode DEFAULT group default qlen 1000

But you can also see packets going through the macvlan device while it is in
the state above which is confusing.
# tcpdump -e -n -i mac1
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on mac1, link-type EN10MB (Ethernet), capture size 262144 bytes
12:08:37.330771 3a:83:f3:52:47:3b > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 192.168.99.2 tell 192.168.99.1, length 28
12:08:38.332846 3a:83:f3:52:47:3b > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 192.168.99.2 tell 192.168.99.1, length 28

That being said I understand the need to switch to lowerlayerdown when the lower
device is in "down", which is basically the most important change of this patch.
The rest is already handled by link watch based on carrier state. By now people
are used to having lowerlayerdown when there's no carrier, now it can also mean
that the lower device has been brought admin down.

Here's another interesting state:
6: mac1@eth2: <NO-CARRIER,BROADCAST,MULTICAST,UP,LOWER_UP,DORMANT,M-DOWN> mtu 1500 qdisc noqueue state LOWERLAYERDOWN mode DEFAULT group default qlen 1000
Prior to this patch the macvlan would stay in dormant state and that state would also
propagate to devices stacked on top of it.

> 
> Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com>
> ---
> v2: Fix locking and update commit message
> 
>  drivers/net/macvlan.c | 47 +++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 45 insertions(+), 2 deletions(-)
> 

^ permalink raw reply

* Re: [PATCH] ieee802154/adf7242: fix memory leak of firmware
From: Michael Hennerich @ 2016-04-07 11:09 UTC (permalink / raw)
  To: Sudip Mukherjee, Alexander Aring; +Cc: linux-kernel, linux-wpan, netdev
In-Reply-To: <1460027764-27428-1-git-send-email-sudipm.mukherjee@gmail.com>

On 04/07/2016 01:16 PM, Sudip Mukherjee wrote:
> If the firmware upload or the firmware verification fails then we
> print the error message and exit, but we miss releasing the
> firmware.
>
> Signed-off-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk>

Acked-by: Michael Hennerich <michael.hennerich@analog.com>

> ---
>   drivers/net/ieee802154/adf7242.c | 2 ++
>   1 file changed, 2 insertions(+)
>
> diff --git a/drivers/net/ieee802154/adf7242.c b/drivers/net/ieee802154/adf7242.c
> index 89154c0..91d4531 100644
> --- a/drivers/net/ieee802154/adf7242.c
> +++ b/drivers/net/ieee802154/adf7242.c
> @@ -1030,6 +1030,7 @@ static int adf7242_hw_init(struct adf7242_local *lp)
>   	if (ret) {
>   		dev_err(&lp->spi->dev,
>   			"upload firmware failed with %d\n", ret);
> +		release_firmware(fw);
>   		return ret;
>   	}
>
> @@ -1037,6 +1038,7 @@ static int adf7242_hw_init(struct adf7242_local *lp)
>   	if (ret) {
>   		dev_err(&lp->spi->dev,
>   			"verify firmware failed with %d\n", ret);
> +		release_firmware(fw);
>   		return ret;
>   	}
>
>


-- 
Greetings,
Michael

^ permalink raw reply

* [PATCH] ieee802154/adf7242: fix memory leak of firmware
From: Sudip Mukherjee @ 2016-04-07 11:16 UTC (permalink / raw)
  To: Michael Hennerich, Alexander Aring
  Cc: linux-kernel, linux-wpan, netdev, Sudip Mukherjee

If the firmware upload or the firmware verification fails then we
print the error message and exit, but we miss releasing the
firmware.
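
For readers comparing approaches: the patch releases the firmware on each
error path. A different shape, shown here only as a user-space sketch and
not as what the patch does, funnels every exit through one cleanup label.
blob_get(), blob_put(), hw_init_sketch() and the live_blobs counter are
hypothetical stand-ins for request_firmware()/release_firmware(), and the
sketch assumes the blob is no longer needed once verification has run.

```c
#include <assert.h>
#include <stdlib.h>

static int live_blobs;	/* buffers allocated but not yet released */

/* Hypothetical stand-in for request_firmware(). */
static void *blob_get(void)
{
	void *p = malloc(16);

	if (p)
		live_blobs++;
	return p;
}

/* Hypothetical stand-in for release_firmware(). */
static void blob_put(void *p)
{
	assert(p != NULL);
	free(p);
	live_blobs--;
}

/* Every exit after a successful blob_get() passes through the "out" label,
 * so a failing step cannot skip the release.
 */
static int hw_init_sketch(int upload_ret, int verify_ret)
{
	void *fw = blob_get();
	int ret;

	if (!fw)
		return -1;

	ret = upload_ret;	/* pretend result of the upload step */
	if (ret)
		goto out;

	ret = verify_ret;	/* pretend result of the verify step */
out:
	blob_put(fw);
	return ret;
}
```

Whichever shape is used, the property to check is the same: after the
function returns, on success or failure, nothing is left allocated.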

Signed-off-by: Sudip Mukherjee <sudip.mukherjee@codethink.co.uk>
---
 drivers/net/ieee802154/adf7242.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ieee802154/adf7242.c b/drivers/net/ieee802154/adf7242.c
index 89154c0..91d4531 100644
--- a/drivers/net/ieee802154/adf7242.c
+++ b/drivers/net/ieee802154/adf7242.c
@@ -1030,6 +1030,7 @@ static int adf7242_hw_init(struct adf7242_local *lp)
 	if (ret) {
 		dev_err(&lp->spi->dev,
 			"upload firmware failed with %d\n", ret);
+		release_firmware(fw);
 		return ret;
 	}
 
@@ -1037,6 +1038,7 @@ static int adf7242_hw_init(struct adf7242_local *lp)
 	if (ret) {
 		dev_err(&lp->spi->dev,
 			"verify firmware failed with %d\n", ret);
+		release_firmware(fw);
 		return ret;
 	}
 
-- 
1.9.1

^ permalink raw reply related

* Re: [PATCH v7 net-next 1/1] hv_sock: introduce Hyper-V Sockets
From: Joe Perches @ 2016-04-07 11:29 UTC (permalink / raw)
  To: Dexuan Cui, gregkh, davem, netdev, linux-kernel, devel, olaf, apw,
	jasowang, kys, haiyangz
  Cc: vkuznets
In-Reply-To: <1460033410-28710-1-git-send-email-decui@microsoft.com>

On Thu, 2016-04-07 at 05:50 -0700, Dexuan Cui wrote:
> Hyper-V Sockets (hv_sock) supplies a byte-stream based communication
> mechanism between the host and the guest. It's somewhat like TCP over
> VMBus, but the transportation layer (VMBus) is much simpler than IP.

style trivia:

> diff --git a/net/hv_sock/af_hvsock.c b/net/hv_sock/af_hvsock.c
[]
> +static struct sock *__hvsock_find_bound_socket(const struct sockaddr_hv *addr)
> +{
> +	struct hvsock_sock *hvsk;
> + 
> +	list_for_each_entry(hvsk, &hvsock_bound_list, bound_list)
> +		if (uuid_equals(addr->shv_service_id,
> +				hvsk->local_addr.shv_service_id))
> +			return hvsock_to_sk(hvsk);

Because there's an if, it's generally nicer to use
braces in the list_for_each
> +static struct sock *__hvsock_find_connected_socket_by_channel(
> +	const struct vmbus_channel *channel)
> +{
> +	struct hvsock_sock *hvsk;
> +
> +	list_for_each_entry(hvsk, &hvsock_connected_list, connected_list)
> +		if (hvsk->channel == channel)
> +			return hvsock_to_sk(hvsk);
> +	return NULL;

here too

> +static int hvsock_sendmsg(struct socket *sock, struct msghdr *msg, size_t len)
> +{
[]
> +	if (msg->msg_flags & ~MSG_DONTWAIT) {
> +		pr_err("hvsock_sendmsg: unsupported flags=0x%x\n",
> +		       msg->msg_flags);

All the pr_<level> messages with embedded function
names could use "%s:", __func__
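
The two style points above can be illustrated with a small user-space
sketch. It is not hv_sock code: struct node, find_by_channel() and
check_flags() are hypothetical names, a plain linked list stands in for
list_for_each_entry(), and snprintf() stands in for pr_err().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a kernel list node; not the hv_sock type. */
struct node {
	int channel;
	struct node *next;
};

/* Braces around the loop body make the nested if/return easier to read. */
static struct node *find_by_channel(struct node *head, int channel)
{
	struct node *n;

	for (n = head; n != NULL; n = n->next) {
		if (n->channel == channel)
			return n;
	}
	return NULL;
}

/* The function name comes from __func__ rather than a hard-coded string,
 * so renaming the function cannot leave a stale name in the message.
 */
static int check_flags(unsigned int flags, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	if (flags & ~0x40u) {	/* 0x40 stands in for MSG_DONTWAIT */
		snprintf(buf, len, "%s: unsupported flags=0x%x",
			 __func__, flags);
		return -1;
	}
	buf[0] = '\0';
	return 0;
}
```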

^ permalink raw reply

* RE: [PATCH v7 net-next 1/1] hv_sock: introduce Hyper-V Sockets
From: Dexuan Cui @ 2016-04-07 11:47 UTC (permalink / raw)
  To: Joe Perches, gregkh@linuxfoundation.org, davem@davemloft.net,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	devel@linuxdriverproject.org, olaf@aepfle.de, apw@canonical.com,
	jasowang@redhat.com, KY Srinivasan, Haiyang Zhang
In-Reply-To: <1460028573.6715.88.camel@perches.com>

> From: Joe Perches [mailto:joe@perches.com]
> Sent: Thursday, April 7, 2016 19:30
> To: Dexuan Cui <decui@microsoft.com>; gregkh@linuxfoundation.org;
> davem@davemloft.net; netdev@vger.kernel.org; linux-kernel@vger.kernel.org;
> devel@linuxdriverproject.org; olaf@aepfle.de; apw@canonical.com;
> jasowang@redhat.com; KY Srinivasan <kys@microsoft.com>; Haiyang Zhang
> <haiyangz@microsoft.com>
> Cc: vkuznets@redhat.com
> Subject: Re: [PATCH v7 net-next 1/1] hv_sock: introduce Hyper-V Sockets
> 
> On Thu, 2016-04-07 at 05:50 -0700, Dexuan Cui wrote:
> > Hyper-V Sockets (hv_sock) supplies a byte-stream based communication
> > mechanism between the host and the guest. It's somewhat like TCP over
> > VMBus, but the transportation layer (VMBus) is much simpler than IP.
> 
> style trivia:
> 
> > diff --git a/net/hv_sock/af_hvsock.c b/net/hv_sock/af_hvsock.c
> []
> > +static struct sock *__hvsock_find_bound_socket(const struct sockaddr_hv
> *addr)
> > +{
> > +	struct hvsock_sock *hvsk;
> > +
> > +	list_for_each_entry(hvsk, &hvsock_bound_list, bound_list)
> > +		if (uuid_equals(addr->shv_service_id,
> > +				hvsk->local_addr.shv_service_id))
> > +			return hvsock_to_sk(hvsk);
> 
> Because there's an if, it's generally nicer to use
> braces in the list_for_each

Thanks for the suggestion, Joe!
I'll add {}.

> > +static struct sock *__hvsock_find_connected_socket_by_channel(
> > +	const struct vmbus_channel *channel)
> > +{
> > +	struct hvsock_sock *hvsk;
> > +
> > +	list_for_each_entry(hvsk, &hvsock_connected_list, connected_list)
> > +		if (hvsk->channel == channel)
> > +			return hvsock_to_sk(hvsk);
> > +	return NULL;
> 
> here too
I'll fix this too.

> > +static int hvsock_sendmsg(struct socket *sock, struct msghdr *msg, size_t len)
> > +{
> []
> > +	if (msg->msg_flags & ~MSG_DONTWAIT) {
> > +		pr_err("hvsock_sendmsg: unsupported flags=0x%x\n",
> > +		       msg->msg_flags);
> 
> All the pr_<level> messages with embedded function
> names could use "%s:", __func__
I'll fix this.

Thanks,
-- Dexuan

^ permalink raw reply

* [PATCH v4 1/2] RDS: memory allocated must be align to 8
From: Shamir Rabinovitch @ 2016-04-07 11:57 UTC (permalink / raw)
  To: rds-devel, netdev; +Cc: davem, shamir.rabinovitch, santosh.shilimkar

Fix an issue in 'rds_ib_cong_recv' when accessing unaligned memory
allocated by 'rds_page_remainder_alloc' through a uint64_t pointer.
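
To make the rounding concrete, here is a user-space sketch of the offset
arithmetic. ALIGN_UP() is a local stand-in for the kernel's ALIGN(), and
SKETCH_PAGE_SIZE and advance_offset() are hypothetical names with an
assumed 4096-byte page; none of this is the RDS code itself.

```c
#include <assert.h>
#include <stddef.h>

/* User-space stand-in for the kernel's ALIGN(): round x up to a multiple
 * of a, where a must be a power of two.
 */
#define ALIGN_UP(x, a) (((x) + ((a) - 1)) & ~((size_t)(a) - 1))

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size for the example */

/* Advance an allocation offset by the rounded-up size and report whether
 * the page is used up. Returns 1 when the page should be dropped.
 */
static int advance_offset(size_t *offset, size_t bytes)
{
	assert(offset != NULL);
	*offset += ALIGN_UP(bytes, 8);
	return *offset >= SKETCH_PAGE_SIZE;
}
```

Because every step adds a multiple of 8, the next allocation handed out
from the same page starts on an 8-byte boundary.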

Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
---
 net/rds/page.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/rds/page.c b/net/rds/page.c
index 616f21f..e2b5a58 100644
--- a/net/rds/page.c
+++ b/net/rds/page.c
@@ -135,8 +135,8 @@ int rds_page_remainder_alloc(struct scatterlist *scat, unsigned long bytes,
 			if (rem->r_offset != 0)
 				rds_stats_inc(s_page_remainder_hit);
 
-			rem->r_offset += bytes;
-			if (rem->r_offset == PAGE_SIZE) {
+			rem->r_offset += ALIGN(bytes, 8);
+			if (rem->r_offset >= PAGE_SIZE) {
 				__free_page(rem->r_page);
 				rem->r_page = NULL;
 			}
-- 
1.7.1

^ permalink raw reply related

* [PATCH v4 2/2] RDS: fix congestion map corruption for PAGE_SIZE > 4k
From: Shamir Rabinovitch @ 2016-04-07 11:57 UTC (permalink / raw)
  To: rds-devel, netdev; +Cc: davem, shamir.rabinovitch, santosh.shilimkar
In-Reply-To: <1460030256-16791-1-git-send-email-shamir.rabinovitch@oracle.com>

When PAGE_SIZE > 4k, a single page can contain 2 RDS fragments. If
'rds_ib_cong_recv' ignores the RDS fragment offset into the page, it
reads the data fragment as a far congestion map update and corrupts
the RDS connection's far congestion map.
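
The effect of ignoring the offset can be sketched in user space. The
names below (struct frag, frag_read(), fill_page()) are hypothetical, and
the sizes are assumptions: a 4096-byte fragment and a page that holds
exactly two of them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FRAG_SIZE 4096u			/* assumed RDS fragment size */
#define BIG_PAGE_SIZE (2 * FRAG_SIZE)	/* hypothetical two-fragment page */

/* Hypothetical stand-in for a scatterlist entry: the page, and where in
 * that page the fragment starts.
 */
struct frag {
	unsigned char *page;
	size_t offset;
};

/* Read one byte of a fragment, honouring its offset into the page. */
static unsigned char frag_read(const struct frag *f, size_t frag_off)
{
	assert(f != NULL && frag_off < FRAG_SIZE);
	return f->page[f->offset + frag_off];
}

/* Fill a page so the first fragment is all 0xAA and the second all 0x55. */
static void fill_page(unsigned char *page)
{
	memset(page, 0xAA, FRAG_SIZE);
	memset(page + FRAG_SIZE, 0x55, FRAG_SIZE);
}
```

Reading the second fragment from the page base would return 0xAA, the
neighbouring fragment's data; adding the fragment's own offset returns
0x55 as intended.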

Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com>
---
 net/rds/ib_recv.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/rds/ib_recv.c b/net/rds/ib_recv.c
index 977fb86..abc8cc8 100644
--- a/net/rds/ib_recv.c
+++ b/net/rds/ib_recv.c
@@ -796,7 +796,7 @@ static void rds_ib_cong_recv(struct rds_connection *conn,
 
 		addr = kmap_atomic(sg_page(&frag->f_sg));
 
-		src = addr + frag_off;
+		src = addr + frag->f_sg.offset + frag_off;
 		dst = (void *)map->m_page_addrs[map_page] + map_off;
 		for (k = 0; k < to_copy; k += 8) {
 			/* Record ports that became uncongested, ie
-- 
1.7.1

^ permalink raw reply related

* [PATCH 0/2] drivers: net: cpsw: fix ale calls and drop host_port field from cpsw_priv
From: Grygorii Strashko @ 2016-04-07 12:16 UTC (permalink / raw)
  To: David S. Miller, netdev, Mugunthan V N
  Cc: Sekhar Nori, linux-kernel, linux-omap, Grygorii Strashko

This cleanup series is intended to:
 - fix port_mask parameters in ale calls and drop unnecessary shifts
 - drop host_port field from struct cpsw_priv

Nothing critical. Tested on am437x-idk-evm in dual mac and switch modes.

Grygorii Strashko (2):
  drivers: net: cpsw: fix port_mask parameters in ale calls
  drivers: net: cpsw: drop host_port field from struct cpsw_priv

 drivers/net/ethernet/ti/cpsw.c | 52 +++++++++++++++++-------------------------
 1 file changed, 21 insertions(+), 31 deletions(-)

-- 
2.8.0

^ permalink raw reply

* [PATCH 1/2] drivers: net: cpsw: fix port_mask parameters in ale calls
From: Grygorii Strashko @ 2016-04-07 12:16 UTC (permalink / raw)
  To: David S. Miller, netdev, Mugunthan V N
  Cc: Sekhar Nori, linux-kernel, linux-omap, Grygorii Strashko
In-Reply-To: <1460031404-28594-1-git-send-email-grygorii.strashko@ti.com>

ALE APIs expect to receive port masks as input values for arguments
port_mask, untag, reg_mcast, unreg_mcast. But there are a few places in
the code where port masks are passed left-shifted by cpsw_priv->host_port,
like below:

 cpsw_ale_add_vlan(priv->ale, priv->data.default_vlan,
		  ALE_ALL_PORTS << priv->host_port,
		  ALE_ALL_PORTS << priv->host_port, 0, 0);

and cpsw is still working just because priv->host_port == 0
and has never ever been changed.

Hence, fix port_mask parameters in ALE API calls and drop
"<< priv->host_port" from all places where it's used to
shift a valid port mask.
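
A small user-space sketch shows why the shift was only harmless by
accident. SKETCH_ALE_ALL_PORTS is an assumed value (0x7, covering host
port 0 and two slave ports) and shifted_mask() is a hypothetical helper;
check cpsw.c for the real definitions.

```c
#include <assert.h>

/* Assumed value for illustration: host port 0 plus slave ports 1 and 2. */
#define SKETCH_ALE_ALL_PORTS 0x7u

/* What the old call sites computed: the mask shifted by the host port
 * number.
 */
static unsigned int shifted_mask(unsigned int host_port)
{
	assert(host_port < 3);
	return SKETCH_ALE_ALL_PORTS << host_port;
}
```

With host_port == 0 the shift is a no-op. With any other value the
result no longer includes bit 0 and sets a bit beyond the three ports,
so it stops being an "all ports" mask.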

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
---
 drivers/net/ethernet/ti/cpsw.c | 22 +++++++++-------------
 1 file changed, 9 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 42fdfd4..5292e70 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -535,7 +535,7 @@ static const struct cpsw_stats cpsw_gstrings_stats[] = {
 				ALE_VLAN, slave->port_vlan, 0);		\
 		} else {						\
 			cpsw_ale_add_mcast(priv->ale, addr,		\
-				ALE_ALL_PORTS << priv->host_port,	\
+				ALE_ALL_PORTS,				\
 				0, 0, 0);				\
 		}							\
 	} while (0)
@@ -602,8 +602,7 @@ static void cpsw_set_promiscious(struct net_device *ndev, bool enable)
 			cpsw_ale_control_set(ale, 0, ALE_AGEOUT, 1);
 
 			/* Clear all mcast from ALE */
-			cpsw_ale_flush_multicast(ale, ALE_ALL_PORTS <<
-						 priv->host_port, -1);
+			cpsw_ale_flush_multicast(ale, ALE_ALL_PORTS, -1);
 
 			/* Flood All Unicast Packets to Host port */
 			cpsw_ale_control_set(ale, 0, ALE_P0_UNI_FLOOD, 1);
@@ -648,8 +647,7 @@ static void cpsw_ndo_set_rx_mode(struct net_device *ndev)
 	cpsw_ale_set_allmulti(priv->ale, priv->ndev->flags & IFF_ALLMULTI);
 
 	/* Clear all mcast from ALE */
-	cpsw_ale_flush_multicast(priv->ale, ALE_ALL_PORTS << priv->host_port,
-				 vid);
+	cpsw_ale_flush_multicast(priv->ale, ALE_ALL_PORTS, vid);
 
 	if (!netdev_mc_empty(ndev)) {
 		struct netdev_hw_addr *ha;
@@ -1172,7 +1170,6 @@ static void cpsw_slave_open(struct cpsw_slave *slave, struct cpsw_priv *priv)
 static inline void cpsw_add_default_vlan(struct cpsw_priv *priv)
 {
 	const int vlan = priv->data.default_vlan;
-	const int port = priv->host_port;
 	u32 reg;
 	int i;
 	int unreg_mcast_mask;
@@ -1190,9 +1187,9 @@ static inline void cpsw_add_default_vlan(struct cpsw_priv *priv)
 	else
 		unreg_mcast_mask = ALE_PORT_1 | ALE_PORT_2;
 
-	cpsw_ale_add_vlan(priv->ale, vlan, ALE_ALL_PORTS << port,
-			  ALE_ALL_PORTS << port, ALE_ALL_PORTS << port,
-			  unreg_mcast_mask << port);
+	cpsw_ale_add_vlan(priv->ale, vlan, ALE_ALL_PORTS,
+			  ALE_ALL_PORTS, ALE_ALL_PORTS,
+			  unreg_mcast_mask);
 }
 
 static void cpsw_init_host_port(struct cpsw_priv *priv)
@@ -1273,8 +1270,7 @@ static int cpsw_ndo_open(struct net_device *ndev)
 		cpsw_add_default_vlan(priv);
 	else
 		cpsw_ale_add_vlan(priv->ale, priv->data.default_vlan,
-				  ALE_ALL_PORTS << priv->host_port,
-				  ALE_ALL_PORTS << priv->host_port, 0, 0);
+				  ALE_ALL_PORTS, ALE_ALL_PORTS, 0, 0);
 
 	if (!cpsw_common_res_usage_state(priv)) {
 		struct cpsw_priv *priv_sl0 = cpsw_get_slave_priv(priv, 0);
@@ -1666,7 +1662,7 @@ static inline int cpsw_add_vlan_ale_entry(struct cpsw_priv *priv,
 	}
 
 	ret = cpsw_ale_add_vlan(priv->ale, vid, port_mask, 0, port_mask,
-				unreg_mcast_mask << priv->host_port);
+				unreg_mcast_mask);
 	if (ret != 0)
 		return ret;
 
@@ -1738,7 +1734,7 @@ static int cpsw_ndo_vlan_rx_kill_vid(struct net_device *ndev,
 		return ret;
 
 	ret = cpsw_ale_del_ucast(priv->ale, priv->mac_addr,
-				 priv->host_port, ALE_VLAN, vid);
+				 HOST_PORT_NUM, ALE_VLAN, vid);
 	if (ret != 0)
 		return ret;
 
-- 
2.8.0

^ permalink raw reply related

* [PATCH 2/2] drivers: net: cpsw: drop host_port field from struct cpsw_priv
From: Grygorii Strashko @ 2016-04-07 12:16 UTC (permalink / raw)
  To: David S. Miller, netdev, Mugunthan V N
  Cc: Sekhar Nori, linux-kernel, linux-omap, Grygorii Strashko
In-Reply-To: <1460031404-28594-1-git-send-email-grygorii.strashko@ti.com>

The host_port field is constantly assigned to 0 and this value has
never changed (since the time the cpsw driver was introduced). Moreover,
if this field is assigned a non-zero value, it will break current
driver functionality.

Hence, there is no reason to continue maintaining this host_port
field and it can be removed, and the HOST_PORT_NUM and ALE_PORT_HOST
defines can be used instead.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
---
 drivers/net/ethernet/ti/cpsw.c | 30 ++++++++++++------------------
 1 file changed, 12 insertions(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 5292e70..54bcc38 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -381,7 +381,6 @@ struct cpsw_priv {
 	u32				coal_intvl;
 	u32				bus_freq_mhz;
 	int				rx_packet_max;
-	int				host_port;
 	struct clk			*clk;
 	u8				mac_addr[ETH_ALEN];
 	struct cpsw_slave		*slaves;
@@ -531,7 +530,7 @@ static const struct cpsw_stats cpsw_gstrings_stats[] = {
 			int slave_port = cpsw_get_slave_port(priv,	\
 						slave->slave_num);	\
 			cpsw_ale_add_mcast(priv->ale, addr,		\
-				1 << slave_port | 1 << priv->host_port,	\
+				1 << slave_port | ALE_PORT_HOST,	\
 				ALE_VLAN, slave->port_vlan, 0);		\
 		} else {						\
 			cpsw_ale_add_mcast(priv->ale, addr,		\
@@ -542,10 +541,7 @@ static const struct cpsw_stats cpsw_gstrings_stats[] = {
 
 static inline int cpsw_get_slave_port(struct cpsw_priv *priv, u32 slave_num)
 {
-	if (priv->host_port == 0)
-		return slave_num + 1;
-	else
-		return slave_num;
+	return slave_num + 1;
 }
 
 static void cpsw_set_promiscious(struct net_device *ndev, bool enable)
@@ -1090,7 +1086,7 @@ static inline void cpsw_add_dual_emac_def_ale_entries(
 		struct cpsw_priv *priv, struct cpsw_slave *slave,
 		u32 slave_port)
 {
-	u32 port_mask = 1 << slave_port | 1 << priv->host_port;
+	u32 port_mask = 1 << slave_port | ALE_PORT_HOST;
 
 	if (priv->version == CPSW_VERSION_1)
 		slave_write(slave, slave->port_vlan, CPSW1_PORT_VLAN);
@@ -1101,7 +1097,7 @@ static inline void cpsw_add_dual_emac_def_ale_entries(
 	cpsw_ale_add_mcast(priv->ale, priv->ndev->broadcast,
 			   port_mask, ALE_VLAN, slave->port_vlan, 0);
 	cpsw_ale_add_ucast(priv->ale, priv->mac_addr,
-		priv->host_port, ALE_VLAN | ALE_SECURE, slave->port_vlan);
+		HOST_PORT_NUM, ALE_VLAN | ALE_SECURE, slave->port_vlan);
 }
 
 static void soft_reset_slave(struct cpsw_slave *slave)
@@ -1202,7 +1198,7 @@ static void cpsw_init_host_port(struct cpsw_priv *priv)
 	cpsw_ale_start(priv->ale);
 
 	/* switch to vlan unaware mode */
-	cpsw_ale_control_set(priv->ale, priv->host_port, ALE_VLAN_AWARE,
+	cpsw_ale_control_set(priv->ale, HOST_PORT_NUM, ALE_VLAN_AWARE,
 			     CPSW_ALE_VLAN_AWARE);
 	control_reg = readl(&priv->regs->control);
 	control_reg |= CPSW_VLAN_AWARE;
@@ -1216,14 +1212,14 @@ static void cpsw_init_host_port(struct cpsw_priv *priv)
 		     &priv->host_port_regs->cpdma_tx_pri_map);
 	__raw_writel(0, &priv->host_port_regs->cpdma_rx_chan_map);
 
-	cpsw_ale_control_set(priv->ale, priv->host_port,
+	cpsw_ale_control_set(priv->ale, HOST_PORT_NUM,
 			     ALE_PORT_STATE, ALE_PORT_STATE_FORWARD);
 
 	if (!priv->data.dual_emac) {
-		cpsw_ale_add_ucast(priv->ale, priv->mac_addr, priv->host_port,
+		cpsw_ale_add_ucast(priv->ale, priv->mac_addr, HOST_PORT_NUM,
 				   0, 0);
 		cpsw_ale_add_mcast(priv->ale, priv->ndev->broadcast,
-				   1 << priv->host_port, 0, 0, ALE_MCAST_FWD_2);
+				   ALE_PORT_HOST, 0, 0, ALE_MCAST_FWD_2);
 	}
 }
 
@@ -1616,9 +1612,9 @@ static int cpsw_ndo_set_mac_address(struct net_device *ndev, void *p)
 		flags = ALE_VLAN;
 	}
 
-	cpsw_ale_del_ucast(priv->ale, priv->mac_addr, priv->host_port,
+	cpsw_ale_del_ucast(priv->ale, priv->mac_addr, HOST_PORT_NUM,
 			   flags, vid);
-	cpsw_ale_add_ucast(priv->ale, addr->sa_data, priv->host_port,
+	cpsw_ale_add_ucast(priv->ale, addr->sa_data, HOST_PORT_NUM,
 			   flags, vid);
 
 	memcpy(priv->mac_addr, addr->sa_data, ETH_ALEN);
@@ -1667,7 +1663,7 @@ static inline int cpsw_add_vlan_ale_entry(struct cpsw_priv *priv,
 		return ret;
 
 	ret = cpsw_ale_add_ucast(priv->ale, priv->mac_addr,
-				 priv->host_port, ALE_VLAN, vid);
+				 HOST_PORT_NUM, ALE_VLAN, vid);
 	if (ret != 0)
 		goto clean_vid;
 
@@ -1679,7 +1675,7 @@ static inline int cpsw_add_vlan_ale_entry(struct cpsw_priv *priv,
 
 clean_vlan_ucast:
 	cpsw_ale_del_ucast(priv->ale, priv->mac_addr,
-			    priv->host_port, ALE_VLAN, vid);
+			   HOST_PORT_NUM, ALE_VLAN, vid);
 clean_vid:
 	cpsw_ale_del_vlan(priv->ale, vid, 0);
 	return ret;
@@ -2148,7 +2144,6 @@ static int cpsw_probe_dual_emac(struct platform_device *pdev,
 	priv_sl2->bus_freq_mhz = priv->bus_freq_mhz;
 
 	priv_sl2->regs = priv->regs;
-	priv_sl2->host_port = priv->host_port;
 	priv_sl2->host_port_regs = priv->host_port_regs;
 	priv_sl2->wr_regs = priv->wr_regs;
 	priv_sl2->hw_stats = priv->hw_stats;
@@ -2317,7 +2312,6 @@ static int cpsw_probe(struct platform_device *pdev)
 		goto clean_runtime_disable_ret;
 	}
 	priv->regs = ss_regs;
-	priv->host_port = HOST_PORT_NUM;
 
 	/* Need to enable clocks with runtime PM api to access module
 	 * registers
-- 
2.8.0

^ permalink raw reply related

