Netdev List
* Re: [Patch net] net_sched: remove cls_flower idr on failure
From: David Miller @ 2017-09-21 22:14 UTC (permalink / raw)
  To: xiyou.wangcong; +Cc: netdev, chrism, jiri
In-Reply-To: <20170920161845.28753-1-xiyou.wangcong@gmail.com>

From: Cong Wang <xiyou.wangcong@gmail.com>
Date: Wed, 20 Sep 2017 09:18:45 -0700

> Fixes: c15ab236d69d ("net/sched: Change cls_flower to use IDR")
> Cc: Chris Mi <chrism@mellanox.com>
> Cc: Jiri Pirko <jiri@mellanox.com>
> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>

Applied, thanks.


* Re: Kernel 4.13.0-rc4-next-20170811 - IP Routing / Forwarding performance vs Core/RSS number / HT on
From: Florian Fainelli @ 2017-09-21 22:07 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Paweł Staszewski, Paolo Abeni, Jesper Dangaard Brouer,
	Linux Kernel Network Developers, Alexander Duyck
In-Reply-To: <1506030895.29839.153.camel@edumazet-glaptop3.roam.corp.google.com>

On 09/21/2017 02:54 PM, Eric Dumazet wrote:
> On Thu, 2017-09-21 at 14:41 -0700, Florian Fainelli wrote:
> 
>> Would not this apply to pretty much any stacked device setup though? It
>> seems like any network device that just queues up its packet on another
>> physical device for actual transmission may need that (e.g: DSA, bond,
>> team, more.?)
> 
> We support bonding and team already.

Right, so that seems to mostly leave us with DSA at least. What about
other devices that also have IFF_NO_QUEUE set?
-- 
Florian


* [PATCH] e1000: avoid null pointer dereference on invalid stat type
From: Colin King @ 2017-09-21 22:01 UTC (permalink / raw)
  To: Jeff Kirsher, intel-wired-lan, netdev; +Cc: kernel-janitors, linux-kernel

From: Colin Ian King <colin.king@canonical.com>

Currently, if the stat type is invalid, data[i] is set either by
dereferencing a null pointer p or by reading from a stale location
left over from a previous valid stat type.  Fix this by setting p to
NULL when the stat type is invalid and only setting data[i] if p is
not NULL.

Detected by CoverityScan, CID#113385 ("Explicit null dereferenced")

Signed-off-by: Colin Ian King <colin.king@canonical.com>
---
 drivers/net/ethernet/intel/e1000/e1000_ethtool.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000/e1000_ethtool.c b/drivers/net/ethernet/intel/e1000/e1000_ethtool.c
index ec8aa4562cc9..2ef6f08b580b 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_ethtool.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_ethtool.c
@@ -1824,11 +1824,12 @@ static void e1000_get_ethtool_stats(struct net_device *netdev,
 {
 	struct e1000_adapter *adapter = netdev_priv(netdev);
 	int i;
-	char *p = NULL;
 	const struct e1000_stats *stat = e1000_gstrings_stats;
 
 	e1000_update_stats(adapter);
 	for (i = 0; i < E1000_GLOBAL_STATS_LEN; i++) {
+		char *p;
+
 		switch (stat->type) {
 		case NETDEV_STATS:
 			p = (char *)netdev + stat->stat_offset;
@@ -1837,12 +1838,13 @@ static void e1000_get_ethtool_stats(struct net_device *netdev,
 			p = (char *)adapter + stat->stat_offset;
 			break;
 		default:
+			p = NULL;
 			WARN_ONCE(1, "Invalid E1000 stat type: %u index %d\n",
 				  stat->type, i);
 			break;
 		}
 
-		if (stat->sizeof_stat == sizeof(u64))
+		if (p && stat->sizeof_stat == sizeof(u64))
 			data[i] = *(u64 *)p;
 		else
 			data[i] = *(u32 *)p;
-- 
2.14.1
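
A minimal user-space sketch of the guard described in the commit
message, for illustration only. Note that in the hunk above a NULL p
makes the combined condition false, so control still reaches the else
branch and dereferences p; the sketch tests the pointer on its own
first. struct stat_desc, enum stat_type and read_stat() are
hypothetical stand-ins, not the driver's names, and returning 0 for an
invalid type is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the driver's stat descriptor. */
enum stat_type { STAT_NETDEV, STAT_ADAPTER, STAT_INVALID };

struct stat_desc {
	enum stat_type type;
	size_t size;	/* sizeof(uint64_t) or sizeof(uint32_t) */
	size_t offset;	/* byte offset into the owning structure */
};

/* Return the stat value, or 0 when the type is not recognised. */
static uint64_t read_stat(const struct stat_desc *d,
			  const void *netdev, const void *adapter)
{
	const char *p;

	assert(d != NULL);

	switch (d->type) {
	case STAT_NETDEV:
		p = (const char *)netdev + d->offset;
		break;
	case STAT_ADAPTER:
		p = (const char *)adapter + d->offset;
		break;
	default:
		p = NULL;
		break;
	}

	/* Test the pointer on its own, before choosing the width. */
	if (!p)
		return 0;

	if (d->size == sizeof(uint64_t)) {
		uint64_t v;

		memcpy(&v, p, sizeof(v));
		return v;
	} else {
		uint32_t v;

		memcpy(&v, p, sizeof(v));
		return v;
	}
}
```

With this shape an invalid type can never reach either dereference,
whatever size the descriptor claims.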


* [RFC PATCH 2/2] userns: control capabilities of some user namespaces
From: Mahesh Bandewar @ 2017-09-21 21:56 UTC (permalink / raw)
  To: LKML, Netdev
  Cc: Kees Cook, Serge Hallyn, Eric W . Biederman, Eric Dumazet,
	David Miller, Mahesh Bandewar, Mahesh Bandewar

From: Mahesh Bandewar <maheshb@google.com>

With this new notion of "controlled" user-namespaces, the controlled
user-namespaces are marked at the time of their creation while the
capabilities of processes that belong to them are controlled using the
global mask.

The init user-ns is always uncontrolled. A process that has SYS_ADMIN
and belongs to an uncontrolled user-ns can create another (child)
user-namespace that is uncontrolled. Any other process (one that
either does not have SYS_ADMIN or belongs to a controlled user-ns) can
only create a user-ns that is controlled.

The global capability whitelist (controlled_userns_caps_whitelist) is
consulted at capability check time. It leaves the semantics unchanged
for processes that belong to an uncontrolled user-ns. Processes that
belong to a controlled user-ns, however, are subject to different checks:

   (a) if the capability in question is controlled and process belongs
       to controlled user-ns, then it's always denied.
   (b) if the capability in question is NOT controlled then fall back
       to the traditional check.

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
---
 include/linux/capability.h     |  1 +
 include/linux/user_namespace.h | 20 ++++++++++++++++++++
 kernel/capability.c            |  5 +++++
 kernel/user_namespace.c        |  3 +++
 security/commoncap.c           |  8 ++++++++
 5 files changed, 37 insertions(+)

diff --git a/include/linux/capability.h b/include/linux/capability.h
index 6c0b9677c03f..b8c6cac18658 100644
--- a/include/linux/capability.h
+++ b/include/linux/capability.h
@@ -250,6 +250,7 @@ extern bool ptracer_capable(struct task_struct *tsk, struct user_namespace *ns);
 extern int get_vfs_caps_from_disk(const struct dentry *dentry, struct cpu_vfs_cap_data *cpu_caps);
 int proc_douserns_caps_whitelist(struct ctl_table *table, int write,
 				 void __user *buff, size_t *lenp, loff_t *ppos);
+bool is_capability_controlled(int cap);
 
 extern int cap_convert_nscap(struct dentry *dentry, void **ivalue, size_t size);
 
diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h
index c18e01252346..e890fe81b47e 100644
--- a/include/linux/user_namespace.h
+++ b/include/linux/user_namespace.h
@@ -22,6 +22,7 @@ struct uid_gid_map {	/* 64 bytes -- 1 cache line */
 };
 
 #define USERNS_SETGROUPS_ALLOWED 1UL
+#define USERNS_CONTROLLED	 2UL
 
 #define USERNS_INIT_FLAGS USERNS_SETGROUPS_ALLOWED
 
@@ -102,6 +103,16 @@ static inline void put_user_ns(struct user_namespace *ns)
 		__put_user_ns(ns);
 }
 
+static inline bool is_user_ns_controlled(const struct user_namespace *ns)
+{
+	return ns->flags & USERNS_CONTROLLED;
+}
+
+static inline void mark_user_ns_controlled(struct user_namespace *ns)
+{
+	ns->flags |= USERNS_CONTROLLED;
+}
+
 struct seq_operations;
 extern const struct seq_operations proc_uid_seq_operations;
 extern const struct seq_operations proc_gid_seq_operations;
@@ -160,6 +171,15 @@ static inline struct ns_common *ns_get_owner(struct ns_common *ns)
 {
 	return ERR_PTR(-EPERM);
 }
+
+static inline bool is_user_ns_controlled(const struct user_namespace *ns)
+{
+	return false;
+}
+
+static inline void mark_user_ns_controlled(struct user_namespace *ns)
+{
+}
 #endif
 
 #endif /* _LINUX_USER_H */
diff --git a/kernel/capability.c b/kernel/capability.c
index 62dbe3350c1b..40a38cc4ff43 100644
--- a/kernel/capability.c
+++ b/kernel/capability.c
@@ -510,6 +510,11 @@ bool ptracer_capable(struct task_struct *tsk, struct user_namespace *ns)
 }
 
 /* Controlled-userns capabilities routines */
+bool is_capability_controlled(int cap)
+{
+	return !cap_raised(controlled_userns_caps_whitelist, cap);
+}
+
 #ifdef CONFIG_SYSCTL
 int proc_douserns_caps_whitelist(struct ctl_table *table, int write,
 				 void __user *buff, size_t *lenp, loff_t *ppos)
diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index c490f1e4313b..f393ea5108f0 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -53,6 +53,9 @@ static void set_cred_user_ns(struct cred *cred, struct user_namespace *user_ns)
 	cred->cap_effective = CAP_FULL_SET;
 	cred->cap_ambient = CAP_EMPTY_SET;
 	cred->cap_bset = CAP_FULL_SET;
+	if (!ns_capable(user_ns->parent, CAP_SYS_ADMIN) ||
+	    is_user_ns_controlled(user_ns->parent))
+		mark_user_ns_controlled(user_ns);
 #ifdef CONFIG_KEYS
 	key_put(cred->request_key_auth);
 	cred->request_key_auth = NULL;
diff --git a/security/commoncap.c b/security/commoncap.c
index 6bf72b175b49..26f41602da10 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -73,6 +73,14 @@ int cap_capable(const struct cred *cred, struct user_namespace *targ_ns,
 {
 	struct user_namespace *ns = targ_ns;
 
+	/* If the capability is controlled and user-ns that process
+	 * belongs-to is 'controlled' then return EPERM and no need
+	 * to check the user-ns hierarchy.
+	 */
+	if (is_user_ns_controlled(cred->user_ns) &&
+	    is_capability_controlled(cap))
+		return -EPERM;
+
 	/* See if cred has the capability in the target user namespace
 	 * by examining the target user namespace and all of the target
 	 * user namespace's parents.
-- 
2.14.1.821.g8fa685d3b7-goog
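
The check order described in (a) and (b) of the commit message can be
modelled in user space roughly as follows. This is only a sketch under
stated assumptions: the whitelist is held in a plain 64-bit mask rather
than a kernel_cap_t, and struct ns_view, capability_is_controlled() and
controlled_precheck() are hypothetical names, not the functions added
by the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Capability numbers as defined in uapi/linux/capability.h. */
#define MODEL_CAP_NET_RAW	13
#define MODEL_CAP_SYS_ADMIN	21

/* Hypothetical, reduced view of a user namespace. */
struct ns_view {
	bool controlled;
};

/* A capability is "controlled" when it is missing from the whitelist. */
static bool capability_is_controlled(uint64_t whitelist, int cap)
{
	assert(cap >= 0 && cap < 64);
	return !(whitelist & (UINT64_C(1) << cap));
}

/*
 * Returns -EPERM when the request is denied outright (case a), or 0
 * when the caller should go on to the traditional check (case b).
 */
static int controlled_precheck(const struct ns_view *ns,
			       uint64_t whitelist, int cap)
{
	assert(ns != NULL);

	if (ns->controlled && capability_is_controlled(whitelist, cap))
		return -EPERM;
	return 0;
}
```

A return value of 0 here does not grant anything; it only means the
usual hierarchy walk still has to decide.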


* [RFC PATCH 1/2] capability: introduce sysctl for controlled user-ns capability whitelist
From: Mahesh Bandewar @ 2017-09-21 21:56 UTC (permalink / raw)
  To: LKML, Netdev
  Cc: Kees Cook, Serge Hallyn, Eric W . Biederman, Eric Dumazet,
	David Miller, Mahesh Bandewar, Mahesh Bandewar

From: Mahesh Bandewar <maheshb@google.com>

Add a sysctl variable kernel.controlled_userns_caps_whitelist. It
takes as input a capability mask expressed as two comma-separated hex
u32 words. The mask, however, is stored in the kernel as a kernel_cap_t.

Any capability that is not part of this mask is controlled and will
be denied to processes in a controlled user-ns.

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
CC: Serge Hallyn <serge@hallyn.com>
CC: Kees Cook <keescook@chromium.org>
CC: "Eric W. Biederman" <ebiederm@xmission.com>

---
 Documentation/sysctl/kernel.txt | 21 ++++++++++++++++++
 include/linux/capability.h      |  3 +++
 kernel/capability.c             | 47 +++++++++++++++++++++++++++++++++++++++++
 kernel/sysctl.c                 |  5 +++++
 4 files changed, 76 insertions(+)

diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index ce61d1fe08ca..ec0d74476f48 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -25,6 +25,7 @@ show up in /proc/sys/kernel:
 - bootloader_version	     [ X86 only ]
 - callhome		     [ S390 only ]
 - cap_last_cap
+- controlled_userns_caps_whitelist
 - core_pattern
 - core_pipe_limit
 - core_uses_pid
@@ -186,6 +187,26 @@ CAP_LAST_CAP from the kernel.
 
 ==============================================================
 
+controlled_userns_caps_whitelist
+
+Capability mask that is whitelisted for "controlled" user namespaces.
+Any capability that is missing from this mask will not be allowed to
+any process that is attached to a controlled-userns. e.g. if CAP_NET_RAW
+is not part of this mask, then processes running inside any controlled
+userns's will not be allowed to perform action that needs CAP_NET_RAW
+capability. However, processes that are attached to a parent user-ns
+hierarchy that is *not* controlled and has CAP_NET_RAW can continue
+performing those actions. User-namespaces are marked "controlled" at
+the time of their creation based on the capabilities of the creator.
+A process that does not have CAP_SYS_ADMIN will create user-namespaces
+that are controlled.
+
+The value is expressed as two comma separated hex words (u32). This
+sysctl is available in init-ns and users with CAP_SYS_ADMIN in init-ns
+are allowed to make changes.
+
+==============================================================
+
 core_pattern:
 
 core_pattern is used to specify a core dumpfile pattern name.
diff --git a/include/linux/capability.h b/include/linux/capability.h
index b52e278e4744..6c0b9677c03f 100644
--- a/include/linux/capability.h
+++ b/include/linux/capability.h
@@ -13,6 +13,7 @@
 #define _LINUX_CAPABILITY_H
 
 #include <uapi/linux/capability.h>
+#include <linux/sysctl.h>
 
 
 #define _KERNEL_CAPABILITY_VERSION _LINUX_CAPABILITY_VERSION_3
@@ -247,6 +248,8 @@ extern bool ptracer_capable(struct task_struct *tsk, struct user_namespace *ns);
 
 /* audit system wants to get cap info from files as well */
 extern int get_vfs_caps_from_disk(const struct dentry *dentry, struct cpu_vfs_cap_data *cpu_caps);
+int proc_douserns_caps_whitelist(struct ctl_table *table, int write,
+				 void __user *buff, size_t *lenp, loff_t *ppos);
 
 extern int cap_convert_nscap(struct dentry *dentry, void **ivalue, size_t size);
 
diff --git a/kernel/capability.c b/kernel/capability.c
index f97fe77ceb88..62dbe3350c1b 100644
--- a/kernel/capability.c
+++ b/kernel/capability.c
@@ -28,6 +28,8 @@ EXPORT_SYMBOL(__cap_empty_set);
 
 int file_caps_enabled = 1;
 
+kernel_cap_t controlled_userns_caps_whitelist = CAP_FULL_SET;
+
 static int __init file_caps_disable(char *str)
 {
 	file_caps_enabled = 0;
@@ -506,3 +508,48 @@ bool ptracer_capable(struct task_struct *tsk, struct user_namespace *ns)
 	rcu_read_unlock();
 	return (ret == 0);
 }
+
+/* Controlled-userns capabilities routines */
+#ifdef CONFIG_SYSCTL
+int proc_douserns_caps_whitelist(struct ctl_table *table, int write,
+				 void __user *buff, size_t *lenp, loff_t *ppos)
+{
+	DECLARE_BITMAP(caps_bitmap, CAP_LAST_CAP);
+	struct ctl_table caps_table;
+	char tbuf[NAME_MAX];
+	int ret;
+
+	ret = bitmap_from_u32array(caps_bitmap, CAP_LAST_CAP,
+				   controlled_userns_caps_whitelist.cap,
+				   _KERNEL_CAPABILITY_U32S);
+	if (ret != CAP_LAST_CAP)
+		return -1;
+
+	scnprintf(tbuf, NAME_MAX, "%*pb", CAP_LAST_CAP, caps_bitmap);
+
+	caps_table.data = tbuf;
+	caps_table.maxlen = NAME_MAX;
+	caps_table.mode = table->mode;
+	ret = proc_dostring(&caps_table, write, buff, lenp, ppos);
+	if (ret)
+		return ret;
+	if (write) {
+		kernel_cap_t tmp;
+
+		if (!capable(CAP_SYS_ADMIN))
+			return -EPERM;
+
+		ret = bitmap_parse_user(buff, *lenp, caps_bitmap, CAP_LAST_CAP);
+		if (ret)
+			return ret;
+
+		ret = bitmap_to_u32array(tmp.cap, _KERNEL_CAPABILITY_U32S,
+					 caps_bitmap, CAP_LAST_CAP);
+		if (ret != CAP_LAST_CAP)
+			return -1;
+
+		controlled_userns_caps_whitelist = tmp;
+	}
+	return 0;
+}
+#endif /* CONFIG_SYSCTL */
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 6648fbbb8157..9903cf0de287 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -1229,6 +1229,11 @@ static struct ctl_table kern_table[] = {
 		.extra2		= &one,
 	},
 #endif
+	{
+		.procname	= "controlled_userns_caps_whitelist",
+		.mode		= 0644,
+		.proc_handler	= proc_douserns_caps_whitelist,
+	},
 	{ }
 };
 
-- 
2.14.1.821.g8fa685d3b7-goog
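
To make the input format concrete, here is a small user-space sketch
that parses "two comma separated hex words (u32)" into one 64-bit
mask. It is not the kernel's parser: parse_caps_mask() is a
hypothetical helper, and treating the first word as the upper 32 bits
is an assumption about the ordering.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Parse "hhhhhhhh,hhhhhhhh" into a 64-bit mask.
 * Assumption: the first word holds bits 63..32, the second bits 31..0.
 * Returns 0 on success, -1 on malformed input.
 */
static int parse_caps_mask(const char *s, uint64_t *mask)
{
	unsigned int hi, lo;
	char trailing;

	assert(s != NULL && mask != NULL);

	/* Exactly two hex words; anything but whitespace after them fails. */
	if (sscanf(s, "%8x,%8x %c", &hi, &lo, &trailing) != 2)
		return -1;

	*mask = ((uint64_t)hi << 32) | lo;
	return 0;
}
```

Under that ordering assumption, an input such as "0000003f,ffffdfff"
yields a mask with bit 13 (CAP_NET_RAW) cleared.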


* [RFC PATCH 0/2] capability controlled user-namespaces
From: Mahesh Bandewar @ 2017-09-21 21:56 UTC (permalink / raw)
  To: LKML, Netdev
  Cc: Kees Cook, Serge Hallyn, Eric W . Biederman, Eric Dumazet,
	David Miller, Mahesh Bandewar, Mahesh Bandewar

From: Mahesh Bandewar <maheshb@google.com>

TL;DR version
-------------
Creating a sandbox environment with namespaces is challenging
considering what these sandboxed processes can engage in, e.g.
CVE-2017-6074, CVE-2017-7184, CVE-2017-7308, to name just a few.
The current form of user-namespaces, however, if changed a bit, can
allow us to create a sandbox environment without locking down
user-namespaces.

Detailed version
----------------

Problem
-------
User-namespaces in their current form have increased the attack surface, as
any process can acquire capabilities which are not available to it (by
default) by performing a combination of clone()/unshare()/setns() syscalls.

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <sched.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <netinet/in.h>

    int main(int ac, char **av)
    {
        int sock = -1;

        printf("Attempting to open RAW socket before unshare()...\n");
        sock = socket(AF_INET6, SOCK_RAW, IPPROTO_RAW);
        if (sock < 0) {
            perror("socket() SOCK_RAW failed");
        } else {
            printf("Successfully opened RAW-Sock before unshare().\n");
            close(sock);
            sock = -1;
        }

        if (unshare(CLONE_NEWUSER | CLONE_NEWNET) < 0) {
            perror("unshare() failed");
            return 1;
        }

        printf("Attempting to open RAW socket after unshare()...\n");
        sock = socket(AF_INET6, SOCK_RAW, IPPROTO_RAW);
        if (sock < 0) {
            perror("socket() SOCK_RAW failed");
        } else {
            printf("Successfully opened RAW-Sock after unshare().\n");
            close(sock);
            sock = -1;
        }

        return 0;
    }

The above example shows how easy it is to acquire the NET_RAW capability.
Once acquired, these processes could exploit the above-mentioned or
similar issues (discovered or not) with malicious intent. Note that
this is just an example and the problem/solution is not limited to
the NET_RAW capability *only*.

The easiest fix one can apply here is to lock down user-namespaces, which
many distros do (i.e. don't allow users to create user namespaces),
but unfortunately that prevents everyone from using them.

Approach
--------
Introduce a notion of 'controlled' user-namespaces. Every process on
the host is allowed to create user-namespaces (governed by the limit
imposed by the per-ns sysctl); however, user-namespaces created by
sandboxed processes are marked as 'controlled'. This mark is used at
capability check time in conjunction with a global capability
whitelist. If a capability is not whitelisted, it is denied to
processes that belong to controlled user-namespaces.

Once a user-ns is marked as 'controlled', all its child
user-namespaces are marked as 'controlled' too.

The global whitelist is a list of capabilities governed by a sysctl.
A (privileged) user in init-ns can modify it, and it applies to all
controlled user-namespaces on the host.

Marking user-namespaces controlled without modifying the whitelist is
equivalent to the current behavior. The default value of the whitelist
includes all capabilities, so compatibility is maintained. However, it
gives admins a fine-grained ability to control various capabilities
system-wide without locking down user-namespaces.

Please see individual patches in this series.

Mahesh Bandewar (2):
  capability: introduce sysctl for controlled user-ns capability
    whitelist
  userns: control capabilities of some user namespaces

 Documentation/sysctl/kernel.txt | 21 +++++++++++++++++
 include/linux/capability.h      |  4 ++++
 include/linux/user_namespace.h  | 20 ++++++++++++++++
 kernel/capability.c             | 52 +++++++++++++++++++++++++++++++++++++++++
 kernel/sysctl.c                 |  5 ++++
 kernel/user_namespace.c         |  3 +++
 security/commoncap.c            |  8 +++++++
 7 files changed, 113 insertions(+)

-- 
2.14.1.821.g8fa685d3b7-goog
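
The marking rule from the Approach section can be sketched in user
space as below. This is a simplified model, not the kernel code:
struct userns_node and userns_node_create() are hypothetical, and the
creator's SYS_ADMIN status is passed in as a plain flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced model of a user namespace. */
struct userns_node {
	const struct userns_node *parent;
	bool controlled;
};

/*
 * A new namespace is controlled if its creator lacks SYS_ADMIN or if
 * the parent namespace is already controlled, so the mark is inherited
 * by every descendant.
 */
static void userns_node_create(struct userns_node *child,
			       const struct userns_node *parent,
			       bool creator_has_sys_admin)
{
	assert(child != NULL && parent != NULL);

	child->parent = parent;
	child->controlled = !creator_has_sys_admin || parent->controlled;
}
```

Once a namespace in the chain is controlled, holding SYS_ADMIN inside
it no longer produces an uncontrolled child.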


* Re: Kernel 4.13.0-rc4-next-20170811 - IP Routing / Forwarding performance vs Core/RSS number / HT on
From: Eric Dumazet @ 2017-09-21 21:54 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Paweł Staszewski, Paolo Abeni, Jesper Dangaard Brouer,
	Linux Kernel Network Developers, Alexander Duyck
In-Reply-To: <6607c631-580d-825b-6205-6f6ee688ce32@gmail.com>

On Thu, 2017-09-21 at 14:41 -0700, Florian Fainelli wrote:

> Would not this apply to pretty much any stacked device setup though? It
> seems like any network device that just queues up its packet on another
> physical device for actual transmission may need that (e.g: DSA, bond,
> team, more.?)

We support bonding and team already.


* Re: [PATCH net] bpf: one perf event close won't free bpf program attached by another perf event
From: Alexei Starovoitov @ 2017-09-21 21:53 UTC (permalink / raw)
  To: Peter Zijlstra, Yonghong Song; +Cc: Steven Rostedt, daniel, netdev, kernel-team
In-Reply-To: <20170921111706.343om7252gcagco6@hirez.programming.kicks-ass.net>

On 9/21/17 4:17 AM, Peter Zijlstra wrote:
> On Wed, Sep 20, 2017 at 10:20:13PM -0700, Yonghong Song wrote:
>>> (2). trace_event_call->perf_events are per cpu data structure, that
>>> means, some filtering logic is needed to avoid the same perf_event prog
>>> is executing twice.
>>
>> What I mean here is that the trace_event_call->perf_events need to be
>> checked on ALL cpus since bpf prog should be executed regardless of
>> cpu affiliation. It is possible that the same perf_event in different
>> per_cpu bucket and hence filtering is needed to avoid the same
>> perf_event bpf_prog is executed twice.
>
> An event will only ever be on a single CPU's list at any one time IIRC.

yes, but doing for_each_cpu there is not an option. too slow.
struct trace_event_call is the only stable argument in
perf_trace_##call(), so we gotta have a pointer there for stuff
we need to run.
This patch added another annoying pointer, since it's the simplest
bugfix for stable. For net-next we're going to remove it, since
we're working on multi-prog support for kprobes/tracepoints.
(right now there is only one prog allowed and that's very limiting)
With multi-prog that bpf_prog_owner pointer will be removed and
existing 'struct bpf_prog *prog' pointer will be replaced with
something else.

> Now, hysterically perf_event_set_bpf_prog used the tracepoint crud
> because that already had bpf bits in. But it might make sense to look at
> unifying the bpf stuff across all the different event types. Have them
> all use event->prog.

it sounds good in theory, but in practice we need a separate
'stuff to run' pointer in both perf_event and trace_event_call,
since that's what is being passed to overflow_handler and perf_trace_##call.

> I suspect that would break a fair bunch of bpf proglets, since the data
> access to the trace data would be completely different, but it would be
> much nicer to not have this distinction based on event type.

such things are certainly an abi.
kprobe+bpf has to see struct pt_regs
perf_event+bpf has to see struct bpf_perf_event_data and
tracepoint+bpf has to see struct foo { fields }
The fields will change every time tracepoint is changed.
That's fine.
But we cannot unify kprobe with tracepoints with perf_event prog types.
And frankly I don't see the need.
Note that in pt_regs we don't need to populate everything.
The 'optimized fprobe' we were talking about at plumbers we
would populate di,si,dx,cx,sp since most of the kprobe+bpf progs
don't care about the other regs and especially cpu flags.
So plenty of room for tweaks and optimizations.


* Re: Kernel 4.13.0-rc4-next-20170811 - IP Routing / Forwarding performance vs Core/RSS number / HT on
From: Paweł Staszewski @ 2017-09-21 21:43 UTC (permalink / raw)
  To: Florian Fainelli, Eric Dumazet
  Cc: Paolo Abeni, Jesper Dangaard Brouer,
	Linux Kernel Network Developers, Alexander Duyck
In-Reply-To: <6607c631-580d-825b-6205-6f6ee688ce32@gmail.com>



On 2017-09-21 at 23:41, Florian Fainelli wrote:
> On 09/21/2017 02:26 PM, Paweł Staszewski wrote:
>>
>> On 2017-08-15 at 11:11, Paweł Staszewski wrote:
>>> diff --git a/net/8021q/vlan_netlink.c b/net/8021q/vlan_netlink.c
>>> index
>>> 5e831de3103e2f7092c7fa15534def403bc62fb4..9472de846d5c0960996261cb2843032847fa4bf7
>>> 100644
>>> --- a/net/8021q/vlan_netlink.c
>>> +++ b/net/8021q/vlan_netlink.c
>>> @@ -143,6 +143,7 @@ static int vlan_newlink(struct net *src_net,
>>> struct net_device *dev,
>>>        vlan->vlan_proto = proto;
>>>        vlan->vlan_id     = nla_get_u16(data[IFLA_VLAN_ID]);
>>>        vlan->real_dev     = real_dev;
>>> +    dev->priv_flags |= (real_dev->priv_flags & IFF_XMIT_DST_RELEASE);
>>>        vlan->flags     = VLAN_FLAG_REORDER_HDR;
>>>          err = vlan_check_real_dev(real_dev, vlan->vlan_proto,
>>> vlan->vlan_id);
>> Any plans for this patch to go normal into the kernel ?
> Would not this apply to pretty much any stacked device setup though? It
> seems like any network device that just queues up its packet on another
> physical device for actual transmission may need that (e.g: DSA, bond,
> team, more.?)
Some devices like bond have it.

Maybe when the first patch was made, vlans were not taken into account.
I did not check all of them :)

But I know Eric will :)


* Re: Kernel 4.13.0-rc4-next-20170811 - IP Routing / Forwarding performance vs Core/RSS number / HT on
From: Florian Fainelli @ 2017-09-21 21:41 UTC (permalink / raw)
  To: Paweł Staszewski, Eric Dumazet
  Cc: Paolo Abeni, Jesper Dangaard Brouer,
	Linux Kernel Network Developers, Alexander Duyck
In-Reply-To: <5d32f5cf-ca69-1f6b-5bca-cdcd4dc414e2@itcare.pl>

On 09/21/2017 02:26 PM, Paweł Staszewski wrote:
> 
> 
> On 2017-08-15 at 11:11, Paweł Staszewski wrote:
>> diff --git a/net/8021q/vlan_netlink.c b/net/8021q/vlan_netlink.c
>> index
>> 5e831de3103e2f7092c7fa15534def403bc62fb4..9472de846d5c0960996261cb2843032847fa4bf7
>> 100644
>> --- a/net/8021q/vlan_netlink.c
>> +++ b/net/8021q/vlan_netlink.c
>> @@ -143,6 +143,7 @@ static int vlan_newlink(struct net *src_net,
>> struct net_device *dev,
>>       vlan->vlan_proto = proto;
>>       vlan->vlan_id     = nla_get_u16(data[IFLA_VLAN_ID]);
>>       vlan->real_dev     = real_dev;
>> +    dev->priv_flags |= (real_dev->priv_flags & IFF_XMIT_DST_RELEASE);
>>       vlan->flags     = VLAN_FLAG_REORDER_HDR;
>>         err = vlan_check_real_dev(real_dev, vlan->vlan_proto,
>> vlan->vlan_id); 
> 
> Any plans for this patch to go normal into the kernel ?

Would not this apply to pretty much any stacked device setup though? It
seems like any network device that just queues up its packet on another
physical device for actual transmission may need that (e.g: DSA, bond,
team, more.?)
-- 
Florian
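
For readers following the thread, the one-line change under discussion
copies a single private flag from the lower (real) device to the
stacked device. A user-space sketch of that bit operation, with a
hypothetical struct dev_model and made-up bit positions (the kernel
defines the real ones):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit positions, chosen only for this model. */
#define MODEL_XMIT_DST_RELEASE	(UINT32_C(1) << 0)
#define MODEL_OTHER_FLAG	(UINT32_C(1) << 1)

/* Hypothetical, reduced model of a network device. */
struct dev_model {
	uint32_t priv_flags;
};

/*
 * Copy only the dst-release bit from the lower device; every other
 * bit on the upper device is left exactly as it was.
 */
static void inherit_dst_release(struct dev_model *upper,
				const struct dev_model *lower)
{
	assert(upper != NULL && lower != NULL);

	upper->priv_flags |= lower->priv_flags & MODEL_XMIT_DST_RELEASE;
}
```

Any stacked device that simply hands packets to a lower device could
apply the same masking step, which is the question raised above.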


* Re: [PATCH 2/2] ip_tunnel: add mpls over gre encapsulation
From: Amine Kherbouche @ 2017-09-21 21:39 UTC (permalink / raw)
  To: Francois Romieu; +Cc: netdev, xeb, roopa, equinox
In-Reply-To: <20170921212536.GA814@electric-eye.fr.zoreil.com>

Hi Francois,

Thanks for the feedback, I'll address it in the next version.

On 21/09/2017 23:25, Francois Romieu wrote:
> Amine Kherbouche <amine.kherbouche@6wind.com> :
> [...]
>> diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
>> index 36ea2ad..060ed07 100644
>> --- a/net/mpls/af_mpls.c
>> +++ b/net/mpls/af_mpls.c
> [...]
>> @@ -39,6 +40,40 @@ static int one = 1;
>>   static int label_limit = (1 << 20) - 1;
>>   static int ttl_max = 255;
>>   
>> +size_t ipgre_mpls_encap_hlen(struct ip_tunnel_encap *e)
>> +{
>> +	return sizeof(struct mpls_shim_hdr);
>> +}
>> +
>> +int ipgre_mpls_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e,
>> +			    u8 *protocol, struct flowi4 *fl4)
>> +{
>> +	return 0;
>> +}
>> +
>> +static const struct ip_tunnel_encap_ops mpls_iptun_ops = {
>> +	.encap_hlen = ipgre_mpls_encap_hlen,
>> +	.build_header = ipgre_mpls_build_header,
>> +};
> Nit: af_mpls.c uses tab before '=' in such places.
>
>> +
>> +int ipgre_tunnel_encap_add_mpls_ops(void)
>> +{
>> +	int ret;
>> +
>> +	ret = ip_tunnel_encap_add_ops(&mpls_iptun_ops, TUNNEL_ENCAP_MPLS);
> ip_tunnel_encap_add_ops is CONFIG_NET_IP_TUNNEL dependent.
>
> Afaics CONFIG_MPLS does not enforce it.
>
> [...]
>> @@ -2486,6 +2521,7 @@ static int __init mpls_init(void)
>>   		      0);
>>   	rtnl_register(PF_MPLS, RTM_GETNETCONF, mpls_netconf_get_devconf,
>>   		      mpls_netconf_dump_devconf, 0);
>> +	ipgre_tunnel_encap_add_mpls_ops();
>>   	err = 0;
>>   out:
>>   	return err;
> ipgre_tunnel_encap_add_mpls_ops status return code is not checked.
>
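
The last review comment (unchecked return code) comes down to the
usual init pattern of propagating a registration failure instead of
overwriting it with 0. A generic user-space sketch, with hypothetical
names (subsystem_init() and the two stubs are not from the patch):

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical registration stubs standing in for the real helper. */
static int stub_register_ok(void)   { return 0; }
static int stub_register_busy(void) { return -EBUSY; }

/* Model of an init routine that propagates a registration failure. */
static int subsystem_init(int (*register_ops)(void))
{
	int err;

	assert(register_ops != NULL);

	err = register_ops();
	if (err)
		goto out;	/* do not fall through to "err = 0" */

	/* ... further setup would go here ... */
	err = 0;
out:
	return err;
}
```

A real init routine would also have to undo any earlier registrations
before returning the error; that unwinding is left out of this sketch.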


* Re: net: macb: fail when there's no PHY
From: Brandon Streiff @ 2017-09-21 21:35 UTC (permalink / raw)
  To: Grant Edwards, Florian Fainelli; +Cc: netdev, Brandon Streiff
In-Reply-To: <20170921203608.GB30148@grante>

> On Thu, Sep 21, 2017 at 01:05:57PM -0700, Florian Fainelli wrote:
>
>>> It looks like the macb driver still can't handle boards that don't
>>> have a PHY.  Is that correct?
>> 
>> Not since:
>> 
>> dacdbb4dfc1a1a1378df8ebc914d4fe82259ed46 ("net: macb: add fixed-link
>> node support")
>
> Yep, it's obvious now that I've got the diff in front of me.
>
> Thanks!
>
> [I just started working with device tree for the first time yesterday,
> and I must say it's way better than the "old days" which required all
> sorts of ugly to produce a kernel that could work on two slightly
> different boards.]
>
> -- 
> Grant

I have a board that's in a similar boat. My workaround was to undo
portions of dacdbb4dfc1a with the following patch; this lets me still
use fixed-link and have MDIO (to configure a switch), but not require
a PHY.

There was a patch set last year by Harini Katakam ("net: macb: Add MDIO
driver for accessing multiple PHY devices") that might ultimately be a
better approach to tackling this problem, although I haven't seen any
further chatter on it.

---
 drivers/net/ethernet/cadence/macb_main.c | 38 +++++++++++++++-----------------
 1 file changed, 18 insertions(+), 20 deletions(-)

diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index 1741cda..a45848e 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -564,30 +564,28 @@ static int macb_mii_init(struct macb *bp)
 				goto err_out_unregister_bus;
 			}
 			bp->phy_node = of_node_get(np);
+		}
 
-			err = mdiobus_register(bp->mii_bus);
-		} else {
-			/* try dt phy registration */
-			err = of_mdiobus_register(bp->mii_bus, np);
+		/* try dt phy registration */
+		err = of_mdiobus_register(bp->mii_bus, np);
 
-			/* fallback to standard phy registration if no phy were
-			 * found during dt phy registration
-			 */
-			if (!err && !phy_find_first(bp->mii_bus)) {
-				for (i = 0; i < PHY_MAX_ADDR; i++) {
-					struct phy_device *phydev;
-
-					phydev = mdiobus_scan(bp->mii_bus, i);
-					if (IS_ERR(phydev) &&
-					    PTR_ERR(phydev) != -ENODEV) {
-						err = PTR_ERR(phydev);
-						break;
-					}
+		/* fallback to standard phy registration if no phy were
+		 * found during dt phy registration
+		 */
+		if (!err && !phy_find_first(bp->mii_bus)) {
+			for (i = 0; i < PHY_MAX_ADDR; i++) {
+				struct phy_device *phydev;
+
+				phydev = mdiobus_scan(bp->mii_bus, i);
+				if (IS_ERR(phydev) &&
+				    PTR_ERR(phydev) != -ENODEV) {
+					err = PTR_ERR(phydev);
+					break;
 				}
-
-				if (err)
-					goto err_out_unregister_bus;
 			}
+
+			if (err)
+				goto err_out_unregister_bus;
 		}
 	} else {
 		for (i = 0; i < PHY_MAX_ADDR; i++)
-- 
2.1.4
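
The fallback loop in the diff treats "no device at this address" as
normal and only aborts on other errors. A user-space sketch of that
control flow, where the scan callback, the stubs and scan_all_addrs()
are hypothetical and the callback returns 0 or a negative errno rather
than a pointer:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MODEL_MAX_ADDR 32	/* number of bus addresses in this model */

typedef int (*scan_fn)(int addr);

/* Hypothetical buses: one PHY at address 3, none at all, and a faulty one. */
static int stub_one_phy(int addr)   { return addr == 3 ? 0 : -ENODEV; }
static int stub_empty_bus(int addr) { (void)addr; return -ENODEV; }
static int stub_bus_error(int addr) { return addr == 5 ? -EIO : -ENODEV; }

/*
 * Scan every address. -ENODEV just means "nothing here" and is skipped;
 * any other error stops the scan and is returned. Finding no PHY at all
 * is not an error.
 */
static int scan_all_addrs(scan_fn scan, int *found)
{
	int i;

	assert(scan != NULL && found != NULL);

	*found = 0;
	for (i = 0; i < MODEL_MAX_ADDR; i++) {
		int ret = scan(i);

		if (ret == 0)
			(*found)++;
		else if (ret != -ENODEV)
			return ret;
	}
	return 0;
}
```

An empty bus therefore returns 0 with *found == 0, which is the case a
fixed-link board without a PHY relies on.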


* Re: Kernel 4.13.0-rc4-next-20170811 - IP Routing / Forwarding performance vs Core/RSS number / HT on
From: Paweł Staszewski @ 2017-09-21 21:34 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Paolo Abeni, Jesper Dangaard Brouer,
	Linux Kernel Network Developers, Alexander Duyck
In-Reply-To: <1506029642.29839.151.camel@edumazet-glaptop3.roam.corp.google.com>



On 2017-09-21 at 23:34, Eric Dumazet wrote:
> On Thu, 2017-09-21 at 23:26 +0200, Paweł Staszewski wrote:
>> On 2017-08-15 at 11:11, Paweł Staszewski wrote:
>>> diff --git a/net/8021q/vlan_netlink.c b/net/8021q/vlan_netlink.c
>>> index
>>> 5e831de3103e2f7092c7fa15534def403bc62fb4..9472de846d5c0960996261cb2843032847fa4bf7
>>> 100644
>>> --- a/net/8021q/vlan_netlink.c
>>> +++ b/net/8021q/vlan_netlink.c
>>> @@ -143,6 +143,7 @@ static int vlan_newlink(struct net *src_net,
>>> struct net_device *dev,
>>>        vlan->vlan_proto = proto;
>>>        vlan->vlan_id     = nla_get_u16(data[IFLA_VLAN_ID]);
>>>        vlan->real_dev     = real_dev;
>>> +    dev->priv_flags |= (real_dev->priv_flags & IFF_XMIT_DST_RELEASE);
>>>        vlan->flags     = VLAN_FLAG_REORDER_HDR;
>>>          err = vlan_check_real_dev(real_dev, vlan->vlan_proto,
>>> vlan->vlan_id);
>> Any plans for this patch to go normal into the kernel ?
>>
>> So far im using it for about 3 weeks on all my linux based routers - and
>> no problems.
> Yes, I was about to submit it, as I mentioned it few hours ago to you ;)
>
>
>
>

Yes, I saw your point 2) in the previous emails :)
But there was no patch for it in the previous reply, so I was thinking
that maybe there were too many things to do and you forgot about it :)

Thanks
Paweł

^ permalink raw reply

* Re: Kernel 4.13.0-rc4-next-20170811 - IP Routing / Forwarding performance vs Core/RSS number / HT on
From: Eric Dumazet @ 2017-09-21 21:34 UTC (permalink / raw)
  To: Paweł Staszewski
  Cc: Paolo Abeni, Jesper Dangaard Brouer,
	Linux Kernel Network Developers, Alexander Duyck
In-Reply-To: <5d32f5cf-ca69-1f6b-5bca-cdcd4dc414e2@itcare.pl>

On Thu, 2017-09-21 at 23:26 +0200, Paweł Staszewski wrote:
> 
> On 2017-08-15 at 11:11, Paweł Staszewski wrote:
> > diff --git a/net/8021q/vlan_netlink.c b/net/8021q/vlan_netlink.c
> > index 
> > 5e831de3103e2f7092c7fa15534def403bc62fb4..9472de846d5c0960996261cb2843032847fa4bf7 
> > 100644
> > --- a/net/8021q/vlan_netlink.c
> > +++ b/net/8021q/vlan_netlink.c
> > @@ -143,6 +143,7 @@ static int vlan_newlink(struct net *src_net, 
> > struct net_device *dev,
> >       vlan->vlan_proto = proto;
> >       vlan->vlan_id     = nla_get_u16(data[IFLA_VLAN_ID]);
> >       vlan->real_dev     = real_dev;
> > +    dev->priv_flags |= (real_dev->priv_flags & IFF_XMIT_DST_RELEASE);
> >       vlan->flags     = VLAN_FLAG_REORDER_HDR;
> >         err = vlan_check_real_dev(real_dev, vlan->vlan_proto, 
> > vlan->vlan_id); 
> 
> Are there any plans for this patch to go into the mainline kernel?
> 
> So far I have been using it for about 3 weeks on all my Linux-based
> routers, with no problems.

Yes, I was about to submit it, as I mentioned to you a few hours ago ;)

^ permalink raw reply

* Re: Kernel 4.13.0-rc4-next-20170811 - IP Routing / Forwarding performance vs Core/RSS number / HT on
From: Paweł Staszewski @ 2017-09-21 21:26 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Paolo Abeni, Jesper Dangaard Brouer,
	Linux Kernel Network Developers, Alexander Duyck
In-Reply-To: <4b1efff7-4f91-fd78-beb8-2c7ebcf18895@itcare.pl>



On 2017-08-15 at 11:11, Paweł Staszewski wrote:
> diff --git a/net/8021q/vlan_netlink.c b/net/8021q/vlan_netlink.c
> index 
> 5e831de3103e2f7092c7fa15534def403bc62fb4..9472de846d5c0960996261cb2843032847fa4bf7 
> 100644
> --- a/net/8021q/vlan_netlink.c
> +++ b/net/8021q/vlan_netlink.c
> @@ -143,6 +143,7 @@ static int vlan_newlink(struct net *src_net, 
> struct net_device *dev,
>       vlan->vlan_proto = proto;
>       vlan->vlan_id     = nla_get_u16(data[IFLA_VLAN_ID]);
>       vlan->real_dev     = real_dev;
> +    dev->priv_flags |= (real_dev->priv_flags & IFF_XMIT_DST_RELEASE);
>       vlan->flags     = VLAN_FLAG_REORDER_HDR;
>         err = vlan_check_real_dev(real_dev, vlan->vlan_proto, 
> vlan->vlan_id); 

Are there any plans for this patch to go into the mainline kernel?

So far I have been using it for about 3 weeks on all my Linux-based
routers, with no problems.
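
For readers following the thread, here is a minimal userspace sketch of
what the added line does: it copies a single flag bit from the underlying
device to the VLAN device and leaves every other bit alone. The struct,
the flag values, and the helper name are hypothetical stand-ins chosen
for illustration, not the kernel definitions.

```c
#include <stdint.h>

/* Hypothetical bit positions for illustration only; the real values
 * live in the kernel headers and are not reproduced here. */
#define DEMO_IFF_XMIT_DST_RELEASE (1u << 0)
#define DEMO_IFF_OTHER            (1u << 1)

/* Hypothetical stand-in for the part of struct net_device used here. */
struct demo_dev {
	uint32_t priv_flags;
};

/* Mirrors the shape of:
 *   dev->priv_flags |= (real_dev->priv_flags & IFF_XMIT_DST_RELEASE);
 * Only the masked bit is copied; unrelated bits of real_dev are ignored. */
static void demo_inherit_dst_release(struct demo_dev *dev,
				     const struct demo_dev *real_dev)
{
	dev->priv_flags |= real_dev->priv_flags & DEMO_IFF_XMIT_DST_RELEASE;
}
```

Because the update uses |=, it can only set the bit. If the upper device
already has the bit set and the lower device does not, the bit stays set.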

^ permalink raw reply

* Re: [PATCH 2/2] ip_tunnel: add mpls over gre encapsulation
From: Francois Romieu @ 2017-09-21 21:25 UTC (permalink / raw)
  To: Amine Kherbouche; +Cc: netdev, xeb, roopa, equinox
In-Reply-To: <1505985924-12479-3-git-send-email-amine.kherbouche@6wind.com>

Amine Kherbouche <amine.kherbouche@6wind.com> :
[...]
> diff --git a/net/mpls/af_mpls.c b/net/mpls/af_mpls.c
> index 36ea2ad..060ed07 100644
> --- a/net/mpls/af_mpls.c
> +++ b/net/mpls/af_mpls.c
[...]
> @@ -39,6 +40,40 @@ static int one = 1;
>  static int label_limit = (1 << 20) - 1;
>  static int ttl_max = 255;
>  
> +size_t ipgre_mpls_encap_hlen(struct ip_tunnel_encap *e)
> +{
> +	return sizeof(struct mpls_shim_hdr);
> +}
> +
> +int ipgre_mpls_build_header(struct sk_buff *skb, struct ip_tunnel_encap *e,
> +			    u8 *protocol, struct flowi4 *fl4)
> +{
> +	return 0;
> +}
> +
> +static const struct ip_tunnel_encap_ops mpls_iptun_ops = {
> +	.encap_hlen = ipgre_mpls_encap_hlen,
> +	.build_header = ipgre_mpls_build_header,
> +};

Nit: af_mpls.c uses tab before '=' in such places.

> +
> +int ipgre_tunnel_encap_add_mpls_ops(void)
> +{
> +	int ret;
> +
> +	ret = ip_tunnel_encap_add_ops(&mpls_iptun_ops, TUNNEL_ENCAP_MPLS);

ip_tunnel_encap_add_ops is CONFIG_NET_IP_TUNNEL dependent.

Afaics CONFIG_MPLS does not enforce it.

[...]
> @@ -2486,6 +2521,7 @@ static int __init mpls_init(void)
>  		      0);
>  	rtnl_register(PF_MPLS, RTM_GETNETCONF, mpls_netconf_get_devconf,
>  		      mpls_netconf_dump_devconf, 0);
> +	ipgre_tunnel_encap_add_mpls_ops();
>  	err = 0;
>  out:
>  	return err;

The return code of ipgre_tunnel_encap_add_mpls_ops is not checked.
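
A minimal userspace sketch of the last point, assuming the usual
goto-out error style: the init routine keeps the status of the
registration call and returns it instead of discarding it. The names
demo_register_ops and demo_init, and the choice of -EBUSY as the failure
value, are hypothetical and only illustrate the pattern; this is not the
patch author's code.

```c
#include <errno.h>

/* Hypothetical stand-in for a registration call that can fail. */
static int demo_register_ops(int slot_busy)
{
	return slot_busy ? -EBUSY : 0;
}

/* Init routine that propagates the registration status to its caller
 * rather than ignoring it. */
static int demo_init(int slot_busy)
{
	int err;

	err = demo_register_ops(slot_busy);
	if (err)
		goto out;	/* a real init would also undo earlier steps */

	/* ... any further setup would go here ... */
out:
	return err;
}
```

In a real init function the error path would also have to unwind whatever
was registered before the failing call.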

-- 
Ueimor

^ permalink raw reply

* Re: [PATCH net] packet: hold bind lock when rebinding to fanout hook
From: Willem de Bruijn @ 2017-09-21 21:10 UTC (permalink / raw)
  To: David Miller; +Cc: Willem de Bruijn, Network Development, nixiaoming
In-Reply-To: <20170920.140327.2188227671204769350.davem@davemloft.net>

On Wed, Sep 20, 2017 at 5:03 PM, David Miller <davem@davemloft.net> wrote:
> From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
> Date: Fri, 15 Sep 2017 10:07:46 -0400
>
>> On Thu, Sep 14, 2017 at 5:14 PM, Willem de Bruijn <willemb@google.com> wrote:
>>> Packet socket bind operations must hold the po->bind_lock. This keeps
>>> po->running consistent with whether the socket is actually on a ptype
>>> list to receive packets.
>>>
>>> fanout_add unbinds a socket and its packet_rcv/tpacket_rcv call, then
>>> binds the fanout object to receive through packet_rcv_fanout.
>>>
>>> Make it hold the po->bind_lock when testing po->running and rebinding.
>>> Else, it can race with other rebind operations, such as that in
>>> packet_set_ring from packet_rcv to tpacket_rcv. Concurrent updates
>>> can result in a socket being added to a fanout group twice, causing
>>> use-after-free KASAN bug reports, among others.
>>>
>>> Reported independently by both trinity and syzkaller.
>>> Verified that the syzkaller reproducer passes after this patch.
>>>
>>
>> I forgot to add the Fixes tag, sorry.
>>
>> Fixes: dc99f600698d ("packet: Add fanout support.")
>
> Applied and queued up for stable as it fixes this race and I can't
> see any new problems it introduces.
>
> But boy is this one messy area.

Yeah, I've been staring at this code for a while now. But I don't see
an obvious way to simplify it. packet_notifier does not hold the socket
lock, so even if all of bind, set_ring and fanout_add do hold that,
bind_lock is still needed. packet_mmap may not take the socket lock,
so pg_vec_lock must remain to synchronize with packet_set_ring
even if for no other reason. I had a look at using rcu pointers for rx_ring
and tx_ring, to avoid taking that lock in the datapath and possibly updating
without the unbind/bind dance. But that update needs to be atomic with
purging the socket queue, so the socket must be properly quiesced.

> The scariest thing to me now is the save/restore sequence done by
> packet_set_ring(), for example.
>
>         spin_lock(&po->bind_lock);
>         was_running = po->running;
>         num = po->num;
>         if (was_running) {
>                 po->num = 0;
>                 __unregister_prot_hook(sk, false);
>         }
>         spin_unlock(&po->bind_lock);
>  ...
>         spin_lock(&po->bind_lock);
>         if (was_running) {
>                 po->num = num;
>                 register_prot_hook(sk);
>         }
>         spin_unlock(&po->bind_lock);
>
> The socket is also locked during this sequence but that doesn't
> prevent parallel changes to the running state.
>
> Since po->bind_lock is dropped, it's possible for another thread
> to grab bind_lock and bind it meanwhile.
>
> The above code seems to assume that can't happen, and that
> register_prot_hook() will always see po->running set to zero
> and rebind the socket.
>
> If the race happens we'll have weird state, because we did not
> rebind yet we modified po->num.

It appears that the only path that may try to bind without holding the
socket lock is packet_notifier. That skips register_prot_hook if !po->num.

> We seem to have a hierarchy of sleeping and non-sleeping locks
> that do not work well together.

Given the number of recent bugs that were fixed by locking the socket
inside a particular setsockopt case, I think that we should lock that
entire function, similar to other protocol families (after verifying that
all cases can indeed sleep).

Another issue that looks fragile is the test for po->fanout in
packet_do_bind before taking the socket lock.
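
To make the interleaving around the save/restore sequence concrete, here
is a single-threaded replay of one ordering, written as a userspace
sketch. It models only the behaviour described in this mail: the
notifier-style path skips the rebind while num is zero, and registering
is a no-op while the socket is already running. The struct and all demo_*
names are hypothetical, and replaying one ordering is an illustration,
not a proof that no other race exists.

```c
#include <stdbool.h>

/* Hypothetical model of the fields discussed above; not the kernel struct. */
struct demo_sock {
	int num;        /* bound protocol, 0 while temporarily unbound */
	bool running;   /* true while the hook is registered */
	int registered; /* how many times the hook has been registered */
};

/* Registers only if the socket is not already running. */
static void demo_register(struct demo_sock *po)
{
	if (!po->running) {
		po->running = true;
		po->registered++;
	}
}

static void demo_unregister(struct demo_sock *po)
{
	po->running = false;
}

/* Notifier-style path: skips the rebind while num is zero. */
static void demo_notifier_up(struct demo_sock *po)
{
	if (po->num)
		demo_register(po);
}

/* Replays: save and unbind, notifier fires in the window where the
 * lock would be dropped, then restore and rebind. */
static void demo_set_ring_with_notifier(struct demo_sock *po)
{
	bool was_running = po->running;
	int num = po->num;

	if (was_running) {
		po->num = 0;
		demo_unregister(po);
	}

	demo_notifier_up(po);	/* runs inside the unlocked window */

	if (was_running) {
		po->num = num;
		demo_register(po);
	}
}
```

In this replay the notifier does nothing because num was zeroed first, so
the final rebind is the only registration that happens.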

^ permalink raw reply

* pull-request: ieee802154 2017-09-20
From: Stefan Schmidt @ 2017-09-21 20:56 UTC (permalink / raw)
  To: David S. Miller; +Cc: linux-wpan, alex.aring, marcel, netdev

Hello Dave.

[Resend with netdev in cc]

Here comes a pull request for ieee802154 changes I have queued up for
this merge window.

Normally these have been coming through the bluetooth tree, but as these
three have been falling through the cracks so far and I have to review
and ack all of them anyway, I think it makes sense if I save the
bluetooth people some work and handle them directly.

It's the first pull request I send to you so please let me know if I did
something wrong or if you prefer a different format.

regards
Stefan Schmidt

The following changes since commit 8ca712c373a462cfa1b62272870b6c2c74aa83f9:

  Merge branch 'net-speedup-netns-create-delete-time' (2017-09-19 16:32:24 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan-next.git ieee802154-for-davem-2017-09-20

for you to fetch changes up to d5dd29e4dafef4baad7bf529ad73cafeb13e1aa8:

  ieee802154: atusb: Driver for Busware HUL dongle (2017-09-20 13:37:16 +0200)

----------------------------------------------------------------
Support for the hulusb hardware inside the atusb driver by Josef
Filzmaier and 802.15.4 MAC security compliance fix by Diogenes Pereira.
----------------------------------------------------------------
Diogenes Pereira (2):
      mac802154: replace hardcoded value with macro
      mac802154: Fix MAC header and payload encrypted

Josef Filzmaier (1):
      ieee802154: atusb: Driver for Busware HUL dongle

 drivers/net/ieee802154/atusb.c | 317 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++----------------
 drivers/net/ieee802154/atusb.h |   8 ++++
 net/mac802154/llsec.c          |  14 ++++--
 3 files changed, 295 insertions(+), 44 deletions(-)

^ permalink raw reply

* [PATCH 5/5] xfrm: eradicate size_t
From: Alexey Dobriyan @ 2017-09-21 20:48 UTC (permalink / raw)
  To: steffen.klassert; +Cc: herbert, davem, netdev
In-Reply-To: <20170921204543.GB13550@avx2>

All netlink message sizes are a) unsigned, b) can't be >= 4GB in size
because netlink doesn't support >= 64KB messages in the first place.

All those size_t across the code are a scam especially across networking
which likes to work with small numbers like 1500 or 65536.

Propagate unsignedness and flip some "int" to "unsigned int" as well.

This is preparation for switching nlmsg_new() to "unsigned int".

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
---

 net/xfrm/xfrm_user.c |   44 +++++++++++++++++++++++---------------------
 1 file changed, 23 insertions(+), 21 deletions(-)

--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -458,9 +458,9 @@ static int xfrm_alloc_replay_state_esn(struct xfrm_replay_state_esn **replay_esn
 	return 0;
 }
 
-static inline int xfrm_user_sec_ctx_size(struct xfrm_sec_ctx *xfrm_ctx)
+static inline unsigned int xfrm_user_sec_ctx_size(struct xfrm_sec_ctx *xfrm_ctx)
 {
-	int len = 0;
+	unsigned int len = 0;
 
 	if (xfrm_ctx) {
 		len += sizeof(struct xfrm_user_sec_ctx);
@@ -1031,7 +1031,7 @@ static inline int xfrm_nlmsg_multicast(struct net *net, struct sk_buff *skb,
 		return -1;
 }
 
-static inline size_t xfrm_spdinfo_msgsize(void)
+static inline unsigned int xfrm_spdinfo_msgsize(void)
 {
 	return NLMSG_ALIGN(4)
 	       + nla_total_size(sizeof(struct xfrmu_spdinfo))
@@ -1157,7 +1157,7 @@ static int xfrm_get_spdinfo(struct sk_buff *skb, struct nlmsghdr *nlh,
 	return nlmsg_unicast(net->xfrm.nlsk, r_skb, sportid);
 }
 
-static inline size_t xfrm_sadinfo_msgsize(void)
+static inline unsigned int xfrm_sadinfo_msgsize(void)
 {
 	return NLMSG_ALIGN(4)
 	       + nla_total_size(sizeof(struct xfrmu_sadhinfo))
@@ -1633,7 +1633,7 @@ static inline int copy_to_user_sec_ctx(struct xfrm_policy *xp, struct sk_buff *s
 		return copy_sec_ctx(xp->security, skb);
 	return 0;
 }
-static inline size_t userpolicy_type_attrsize(void)
+static inline unsigned int userpolicy_type_attrsize(void)
 {
 #ifdef CONFIG_XFRM_SUB_POLICY
 	return nla_total_size(sizeof(struct xfrm_userpolicy_type));
@@ -1850,9 +1850,9 @@ static int xfrm_flush_sa(struct sk_buff *skb, struct nlmsghdr *nlh,
 	return 0;
 }
 
-static inline size_t xfrm_aevent_msgsize(struct xfrm_state *x)
+static inline unsigned int xfrm_aevent_msgsize(struct xfrm_state *x)
 {
-	size_t replay_size = x->replay_esn ?
+	unsigned int replay_size = x->replay_esn ?
 			      xfrm_replay_state_esn_len(x->replay_esn) :
 			      sizeof(struct xfrm_replay_state);
 
@@ -2321,8 +2321,8 @@ static int copy_to_user_kmaddress(const struct xfrm_kmaddress *k, struct sk_buff
 	return nla_put(skb, XFRMA_KMADDRESS, sizeof(uk), &uk);
 }
 
-static inline size_t xfrm_migrate_msgsize(int num_migrate, int with_kma,
-					  int with_encp)
+static inline unsigned int xfrm_migrate_msgsize(int num_migrate, int with_kma,
+						int with_encp)
 {
 	return NLMSG_ALIGN(sizeof(struct xfrm_userpolicy_id))
 	      + (with_kma ? nla_total_size(sizeof(struct xfrm_kmaddress)) : 0)
@@ -2566,7 +2566,7 @@ static void xfrm_netlink_rcv(struct sk_buff *skb)
 	mutex_unlock(&net->xfrm.xfrm_cfg_mutex);
 }
 
-static inline size_t xfrm_expire_msgsize(void)
+static inline unsigned int xfrm_expire_msgsize(void)
 {
 	return NLMSG_ALIGN(sizeof(struct xfrm_user_expire))
 	       + nla_total_size(sizeof(struct xfrm_mark));
@@ -2654,9 +2654,9 @@ static int xfrm_notify_sa_flush(const struct km_event *c)
 	return xfrm_nlmsg_multicast(net, skb, 0, XFRMNLGRP_SA);
 }
 
-static inline size_t xfrm_sa_len(struct xfrm_state *x)
+static inline unsigned int xfrm_sa_len(struct xfrm_state *x)
 {
-	size_t l = 0;
+	unsigned int l = 0;
 	if (x->aead)
 		l += nla_total_size(aead_len(x->aead));
 	if (x->aalg) {
@@ -2701,8 +2701,9 @@ static int xfrm_notify_sa(struct xfrm_state *x, const struct km_event *c)
 	struct xfrm_usersa_id *id;
 	struct nlmsghdr *nlh;
 	struct sk_buff *skb;
-	int len = xfrm_sa_len(x);
-	int headlen, err;
+	unsigned int len = xfrm_sa_len(x);
+	unsigned int headlen;
+	int err;
 
 	headlen = sizeof(*p);
 	if (c->event == XFRM_MSG_DELSA) {
@@ -2776,8 +2777,8 @@ static int xfrm_send_state_notify(struct xfrm_state *x, const struct km_event *c
 
 }
 
-static inline size_t xfrm_acquire_msgsize(struct xfrm_state *x,
-					  struct xfrm_policy *xp)
+static inline unsigned int xfrm_acquire_msgsize(struct xfrm_state *x,
+						struct xfrm_policy *xp)
 {
 	return NLMSG_ALIGN(sizeof(struct xfrm_user_acquire))
 	       + nla_total_size(sizeof(struct xfrm_user_tmpl) * xp->xfrm_nr)
@@ -2900,7 +2901,7 @@ static struct xfrm_policy *xfrm_compile_policy(struct sock *sk, int opt,
 	return xp;
 }
 
-static inline size_t xfrm_polexpire_msgsize(struct xfrm_policy *xp)
+static inline unsigned int xfrm_polexpire_msgsize(struct xfrm_policy *xp)
 {
 	return NLMSG_ALIGN(sizeof(struct xfrm_user_polexpire))
 	       + nla_total_size(sizeof(struct xfrm_user_tmpl) * xp->xfrm_nr)
@@ -2957,13 +2958,14 @@ static int xfrm_exp_policy_notify(struct xfrm_policy *xp, int dir, const struct
 
 static int xfrm_notify_policy(struct xfrm_policy *xp, int dir, const struct km_event *c)
 {
-	int len = nla_total_size(sizeof(struct xfrm_user_tmpl) * xp->xfrm_nr);
+	unsigned int len = nla_total_size(sizeof(struct xfrm_user_tmpl) * xp->xfrm_nr);
 	struct net *net = xp_net(xp);
 	struct xfrm_userpolicy_info *p;
 	struct xfrm_userpolicy_id *id;
 	struct nlmsghdr *nlh;
 	struct sk_buff *skb;
-	int headlen, err;
+	unsigned int headlen;
+	int err;
 
 	headlen = sizeof(*p);
 	if (c->event == XFRM_MSG_DELPOLICY) {
@@ -3070,7 +3072,7 @@ static int xfrm_send_policy_notify(struct xfrm_policy *xp, int dir, const struct
 
 }
 
-static inline size_t xfrm_report_msgsize(void)
+static inline unsigned int xfrm_report_msgsize(void)
 {
 	return NLMSG_ALIGN(sizeof(struct xfrm_user_report));
 }
@@ -3115,7 +3117,7 @@ static int xfrm_send_report(struct net *net, u8 proto,
 	return xfrm_nlmsg_multicast(net, skb, 0, XFRMNLGRP_REPORT);
 }
 
-static inline size_t xfrm_mapping_msgsize(void)
+static inline unsigned int xfrm_mapping_msgsize(void)
 {
 	return NLMSG_ALIGN(sizeof(struct xfrm_user_mapping));
 }

^ permalink raw reply

* [PATCH 4/5] xfrm: make xfrm_replay_state_esn_len() return unsigned int
From: Alexey Dobriyan @ 2017-09-21 20:47 UTC (permalink / raw)
  To: steffen.klassert; +Cc: herbert, davem, netdev
In-Reply-To: <20170921204543.GB13550@avx2>

Replay detection bitmaps can't have negative length.

Comparisons with nla_len() are left signed just in case a negative value
can sneak in there.

Propagate unsignedness for code size savings:

	add/remove: 0/0 grow/shrink: 0/5 up/down: 0/-38 (-38)
	function                                     old     new   delta
	xfrm_state_construct                        1802    1800      -2
	xfrm_update_ae_params                        295     289      -6
	xfrm_state_migrate                          1345    1339      -6
	xfrm_replay_notify_esn                       349     337     -12
	xfrm_replay_notify_bmp                       345     333     -12

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
---

 include/net/xfrm.h   |    2 +-
 net/xfrm/xfrm_user.c |   10 +++++-----
 2 files changed, 6 insertions(+), 6 deletions(-)

--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -1779,7 +1779,7 @@ static inline unsigned int xfrm_alg_auth_len(const struct xfrm_algo_auth *alg)
 	return sizeof(*alg) + ((alg->alg_key_len + 7) / 8);
 }
 
-static inline int xfrm_replay_state_esn_len(struct xfrm_replay_state_esn *replay_esn)
+static inline unsigned int xfrm_replay_state_esn_len(struct xfrm_replay_state_esn *replay_esn)
 {
 	return sizeof(*replay_esn) + replay_esn->bmp_len * sizeof(__u32);
 }
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -130,7 +130,7 @@ static inline int verify_replay(struct xfrm_usersa_info *p,
 		if (rs->bmp_len > XFRMA_REPLAY_ESN_MAX / sizeof(rs->bmp[0]) / 8)
 			return -EINVAL;
 
-		if (nla_len(rt) < xfrm_replay_state_esn_len(rs) &&
+		if (nla_len(rt) < (int)xfrm_replay_state_esn_len(rs) &&
 		    nla_len(rt) != sizeof(*rs))
 			return -EINVAL;
 	}
@@ -404,7 +404,7 @@ static inline int xfrm_replay_verify_len(struct xfrm_replay_state_esn *replay_es
 					 struct nlattr *rp)
 {
 	struct xfrm_replay_state_esn *up;
-	int ulen;
+	unsigned int ulen;
 
 	if (!replay_esn || !rp)
 		return 0;
@@ -414,7 +414,7 @@ static inline int xfrm_replay_verify_len(struct xfrm_replay_state_esn *replay_es
 
 	/* Check the overall length and the internal bitmap length to avoid
 	 * potential overflow. */
-	if (nla_len(rp) < ulen ||
+	if (nla_len(rp) < (int)ulen ||
 	    xfrm_replay_state_esn_len(replay_esn) != ulen ||
 	    replay_esn->bmp_len != up->bmp_len)
 		return -EINVAL;
@@ -430,14 +430,14 @@ static int xfrm_alloc_replay_state_esn(struct xfrm_replay_state_esn **replay_esn
 				       struct nlattr *rta)
 {
 	struct xfrm_replay_state_esn *p, *pp, *up;
-	int klen, ulen;
+	unsigned int klen, ulen;
 
 	if (!rta)
 		return 0;
 
 	up = nla_data(rta);
 	klen = xfrm_replay_state_esn_len(up);
-	ulen = nla_len(rta) >= klen ? klen : sizeof(*up);
+	ulen = nla_len(rta) >= (int)klen ? klen : sizeof(*up);
 
 	p = kzalloc(klen, GFP_KERNEL);
 	if (!p)
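
A small userspace sketch of why the (int) casts are kept in the
comparisons above. When an int is compared with an unsigned int, C
converts the int to unsigned first, so a negative length turns into a
very large value and a "too short" check would be skipped. The helper
names are hypothetical: attr_len plays the role of the int returned by
nla_len(), and need plays the role of an unsigned length such as the one
returned by xfrm_replay_state_esn_len().

```c
/* Models the comparison without the cast. The conversion to unsigned
 * int is written out explicitly here; it is what C would do implicitly.
 * A negative attr_len becomes a huge value, so this returns 0. */
static int demo_too_short_unsigned(int attr_len, unsigned int need)
{
	return (unsigned int)attr_len < need;
}

/* Models the comparison with the (int) cast from the patch. The
 * comparison stays signed, so a negative attr_len is reported as
 * too short. */
static int demo_too_short_signed(int attr_len, unsigned int need)
{
	return attr_len < (int)need;
}
```

For non-negative lengths both helpers agree; they differ only when
attr_len is negative.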

^ permalink raw reply

* [PATCH 3/5] xfrm: make xfrm_alg_auth_len() return unsigned int
From: Alexey Dobriyan @ 2017-09-21 20:47 UTC (permalink / raw)
  To: steffen.klassert; +Cc: herbert, davem, netdev
In-Reply-To: <20170921204543.GB13550@avx2>

Key lengths can't be negative.

Comparison with nla_len() is left signed just in case a negative value
can sneak in there.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
---

 include/net/xfrm.h   |    2 +-
 net/xfrm/xfrm_user.c |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -1774,7 +1774,7 @@ static inline unsigned int xfrm_alg_len(const struct xfrm_algo *alg)
 	return sizeof(*alg) + ((alg->alg_key_len + 7) / 8);
 }
 
-static inline int xfrm_alg_auth_len(const struct xfrm_algo_auth *alg)
+static inline unsigned int xfrm_alg_auth_len(const struct xfrm_algo_auth *alg)
 {
 	return sizeof(*alg) + ((alg->alg_key_len + 7) / 8);
 }
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -68,7 +68,7 @@ static int verify_auth_trunc(struct nlattr **attrs)
 		return 0;
 
 	algp = nla_data(rt);
-	if (nla_len(rt) < xfrm_alg_auth_len(algp))
+	if (nla_len(rt) < (int)xfrm_alg_auth_len(algp))
 		return -EINVAL;
 
 	algp->alg_name[sizeof(algp->alg_name) - 1] = '\0';

^ permalink raw reply

* [PATCH 2/5] xfrm: make xfrm_alg_len() return unsigned int
From: Alexey Dobriyan @ 2017-09-21 20:46 UTC (permalink / raw)
  To: steffen.klassert; +Cc: herbert, davem, netdev
In-Reply-To: <20170921204543.GB13550@avx2>

Key lengths can't be negative.

Comparison with nla_len() is left signed just in case a negative value
can sneak in there.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
---

 include/net/xfrm.h   |    2 +-
 net/xfrm/xfrm_user.c |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -1769,7 +1769,7 @@ static inline unsigned int aead_len(struct xfrm_algo_aead *alg)
 	return sizeof(*alg) + ((alg->alg_key_len + 7) / 8);
 }
 
-static inline int xfrm_alg_len(const struct xfrm_algo *alg)
+static inline unsigned int xfrm_alg_len(const struct xfrm_algo *alg)
 {
 	return sizeof(*alg) + ((alg->alg_key_len + 7) / 8);
 }
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -42,7 +42,7 @@ static int verify_one_alg(struct nlattr **attrs, enum xfrm_attr_type_t type)
 		return 0;
 
 	algp = nla_data(rt);
-	if (nla_len(rt) < xfrm_alg_len(algp))
+	if (nla_len(rt) < (int)xfrm_alg_len(algp))
 		return -EINVAL;
 
 	switch (type) {

^ permalink raw reply

* [PATCH 1/5] xfrm: make aead_len() return unsigned int
From: Alexey Dobriyan @ 2017-09-21 20:45 UTC (permalink / raw)
  To: steffen.klassert; +Cc: herbert, davem, netdev

Key lengths can't be negative.

Comparison with nla_len() is left signed just in case a negative value
can sneak in there.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
---

 include/net/xfrm.h   |    2 +-
 net/xfrm/xfrm_user.c |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -1764,7 +1764,7 @@ static inline int xfrm_acquire_is_on(struct net *net)
 }
 #endif
 
-static inline int aead_len(struct xfrm_algo_aead *alg)
+static inline unsigned int aead_len(struct xfrm_algo_aead *alg)
 {
 	return sizeof(*alg) + ((alg->alg_key_len + 7) / 8);
 }
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -84,7 +84,7 @@ static int verify_aead(struct nlattr **attrs)
 		return 0;
 
 	algp = nla_data(rt);
-	if (nla_len(rt) < aead_len(algp))
+	if (nla_len(rt) < (int)aead_len(algp))
 		return -EINVAL;
 
 	algp->alg_name[sizeof(algp->alg_name) - 1] = '\0';

^ permalink raw reply

* [PATCH] net: use 32-bit arithmetic while allocating net device
From: Alexey Dobriyan @ 2017-09-21 20:33 UTC (permalink / raw)
  To: davem; +Cc: netdev

The private part of the allocation is never big enough to warrant size_t.

Space savings:

	add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-10 (-10)
	function                                     old     new   delta
	alloc_netdev_mqs                            1120    1110     -10

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
---

 net/core/dev.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -7989,7 +7989,7 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name,
 		unsigned int txqs, unsigned int rxqs)
 {
 	struct net_device *dev;
-	size_t alloc_size;
+	unsigned int alloc_size;
 	struct net_device *p;
 
 	BUG_ON(strlen(name) >= sizeof(dev->name));

^ permalink raw reply

* Re: net: macb: fail when there's no PHY
From: Grant Edwards @ 2017-09-21 20:36 UTC (permalink / raw)
  To: Florian Fainelli; +Cc: netdev
In-Reply-To: <66c0a032-4d20-69f1-deb4-6c65af6ec740@gmail.com>

On Thu, Sep 21, 2017 at 01:05:57PM -0700, Florian Fainelli wrote:

>> It looks like the macb driver still can't handle boards that don't
>> have a PHY.  Is that correct?
> 
> Not since:
> 
> dacdbb4dfc1a1a1378df8ebc914d4fe82259ed46 ("net: macb: add fixed-link
> node support")

Yep, it's obvious now that I've got the diff in front of me.

Thanks!

[I just started working with device tree for the first time yesterday,
and I must say it's way better than the "old days" which required all
sorts of ugly to produce a kernel that could work on two slightly
different boards.]

-- 
Grant

^ permalink raw reply

